CN110276451A - Weight-normalization-based deep neural network compression method - Google Patents
Weight-normalization-based deep neural network compression method Download PDF Info
- Publication number
- CN110276451A (application CN201910575103.3A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The invention discloses a weight-normalization-based deep neural network compression method. The quantization of the weights is decomposed into three steps: the weights are first normalized, then quantized to extremely low bit widths based on minimum quantization error, and finally the quantized weights are denormalized; the forward propagation of the neural network is carried out with the quantized weights. To train the quantized network, the derivative of the step-shaped quantization function is approximated so that the gradients of the network can be back-propagated, and the gradients are accumulated in the floating-point weights. The weight-normalization-based quantization mitigates the long-tail distribution of the weights to a certain extent, which reduces the quantization error and thereby improves model performance. The invention can compress the weights of mainstream neural networks to 2 or 3 bits while keeping the loss in model performance small.
Description
Technical field
The invention provides a weight-normalization-based deep neural network compression method. It relates to compressing (quantizing) the parameters of neural networks: the weights of a model can be quantized to extremely low bit widths (2 or 3 bits). It is applicable to compressing the parameters of various mainstream neural networks, such as ResNet and MobileNet, so that the models can be deployed on mobile devices.
Background art
With the development of deep learning, deep neural networks have become the dominant models in machine learning. However, deep learning models have a large number of parameters and require great computational cost, which hinders their deployment on mobile and embedded devices. According to existing research, deep neural networks contain substantial redundancy, so the parameters of a model can be greatly compressed while the performance of the model does not decline significantly.
Weight quantization is a principal method of model compression. Although existing methods can quantize the parameters of a model to 8 bits with almost no performance loss, quantizing the weights to extremely low bit widths often brings a large loss of performance.
A mainstream weight quantization approach quantizes the weights by minimizing the quantization error, but this approach is affected by the long-tail distribution of the weights, which leads to a large relative quantization error and in turn a loss of quantized-model performance.
Summary of the invention
Object of the invention: current weight quantization methods based on minimum quantization error are affected by the long-tail distribution of the weights and can incur a large relative quantization error. In view of this problem, the invention provides a weight-normalization-based deep neural network compression method. The weights are first normalized, then quantized based on minimum quantization error, and finally the quantized weights are denormalized. At the same time, the derivative of the step-shaped quantization function is approximated so that back-propagation through the network can proceed normally. Because the invention normalizes the weights by the maximum-magnitude element, that element receives a gradient different from the other elements, so in every iteration the maximum-magnitude element moves quickly toward zero. After many iterations the long-tail distribution of the weights is therefore weakened, a smaller relative quantization error is obtained, and the performance loss of the quantized model is reduced.
Technical solution: a weight-normalization-based deep neural network compression method. During forward propagation of the neural network, the weights are first normalized, then quantized based on minimum quantization error, then the quantized weights are denormalized, and forward propagation is carried out with the quantized weights. During back-propagation, the derivative of the step-shaped quantization function is approximated so that gradients can be back-propagated, enabling end-to-end training; the gradients are accumulated in the floating-point weights.
The forward propagation first normalizes the weights, then quantizes them based on minimum quantization error, and then denormalizes the quantized weights. The specific steps are as follows:
Step 100: obtain the parameters of a pre-trained full-precision model; for each filter of every convolutional layer and fully connected layer, vectorize the parameters to obtain w ∈ R^M.
Step 101: normalize the parameters w, using the maximum-magnitude element of w to normalize each element of w into [-1, 1], i.e. ŵ = w / max(|w|).
Step 102: solve for the optimal quantization basis α by minimizing the quantization error with respect to ŵ, obtaining the corresponding set of quantized values V(α).
Step 103: quantize the normalized weights ŵ to obtain ŵ_q, i.e. ŵ_q = Π_{V(α)}(ŵ); the projection function (quantization function) Π_{V(α)}(·) projects each element of ŵ onto the set of quantized values V(α).
Step 104: denormalize the quantized normalized weights ŵ_q to obtain the quantized weights w_q, so that the quantized weights have the same magnitude as the original parameters w, i.e. w_q = ŵ_q · detach(max(|w|)); the detach(·) operation treats its argument as a constant.
Step 105: convolve (or, for a fully connected layer, multiply) the obtained quantized weights w_q with the input x of this layer of the neural network to obtain the layer output y.
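The three-step forward pass (steps 100–105) can be sketched in NumPy. This is an illustrative reconstruction, not the patent's implementation: the function and variable names are ours, and the projection Π_{V(α)} is implemented as a nearest-value search over V(α).

```python
import numpy as np

def quantize_weights(w, alpha):
    """Forward quantization sketch: normalize (step 101), project onto
    V(alpha) (step 103), denormalize (step 104). `alpha` is the
    quantization basis, one scalar per bit."""
    scale = np.max(np.abs(w))                 # maximum-magnitude element of w
    w_hat = w / scale                         # step 101: normalize into [-1, 1]
    K = len(alpha)
    # enumerate all 2^K binary codes e_l in {-1, 1}^K
    codes = np.array([[1.0 if (l >> k) & 1 else -1.0 for k in range(K)]
                      for l in range(2 ** K)])
    values = codes @ alpha                    # quantized-value set V(alpha)
    # step 103: project each normalized weight to its nearest value in V(alpha)
    idx = np.argmin(np.abs(w_hat[:, None] - values[None, :]), axis=1)
    w_hat_q = values[idx]
    # step 104: denormalize so w_q has the same magnitude as w (scale detached)
    return w_hat_q * scale

w = np.array([0.9, -0.45, 0.1, -0.02])
wq = quantize_weights(w, alpha=np.array([0.5]))   # K = 1, V(alpha) = {-0.5, 0.5}
# -> [0.45, -0.45, 0.45, -0.45]
```

With K = 1 the projection reduces to a scaled sign function; larger K enumerates richer value sets such as {-1.0, -0.5, 0.5, 1.0}.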
The back-propagation approximates the derivative of the step-shaped quantization function so that the neural network can be back-propagated. The specific steps are as follows:
Step 200: through back-propagation of the gradient, obtain the gradient ∂L/∂w_q of the neural network loss function L with respect to the quantized weights w_q.
Step 201: through back-propagation of the gradient, obtain the gradient with respect to the quantized normalized weights ŵ_q.
Step 202: approximate the gradient of the step function Π(·), i.e. ∂Π/∂ŵ ≈ 1, obtaining the gradient with respect to the normalized weights ŵ.
Step 203: through back-propagation of the gradient, obtain the gradient with respect to the original floating-point parameters w, where w_i is the i-th element of w.
Step 204: update the floating-point weights w using the gradient ∂L/∂w.
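Steps 200–204 can be sketched as follows. This is a plausible NumPy reconstruction under the straight-through estimator (∂Π/∂ŵ ≈ 1) with the denormalization scale detached; the extra gradient term for the maximum-magnitude element follows from differentiating ŵ = w / max(|w|) and is consistent with the summary's claim that this element is pulled toward zero, but the patent's exact step-203 formula (an image in the original) may differ.

```python
import numpy as np

def ste_backward(grad_wq, w):
    """Backward sketch (steps 200-204): straight-through estimator.
    The projection Pi gets derivative ~1 (step 202) and the
    denormalization scale detach(max|w|) is treated as a constant.
    Differentiating the normalization w_hat = w / max(|w|) gives the
    maximum-magnitude element an extra gradient term (our plausible
    reconstruction of step 203)."""
    s = np.max(np.abs(w))
    m = np.argmax(np.abs(w))          # index of the maximum-magnitude element
    grad_w_hat = grad_wq * s          # steps 201-202: through w_q = s * w_hat_q, dPi ~ 1
    grad_w = grad_w_hat / s           # step 203: d(w_i / s) / d(w_i) = 1 / s
    # extra term for w_m: d(w_j / s) / d(w_m) = -sign(w_m) * w_j / s^2
    grad_w[m] -= np.sign(w[m]) * np.sum(w * grad_w_hat) / s ** 2
    return grad_w                     # step 204: w <- w - lr * grad_w
```

The extra term couples every element's gradient into the maximum-magnitude element, which is the mechanism the summary credits with weakening the long-tail distribution.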
The optimization objective for solving the optimal quantization basis α by minimizing the quantization error with respect to ŵ is:
min_{α,B} ||ŵ − Bα||₂²
where M is the dimension of the weight vector, K is the bit width of the quantization, B ∈ {−1, 1}^{M×K} is the binary coding of ŵ, and α ∈ R^K is the quantization basis. α defines the set V(α) of all quantized values, V(α) = {α^T e_l | 1 ≤ l ≤ 2^K}, where the vectors e_l ∈ {−1, 1}^K enumerate all K-bit binary codes. The objective is solved by alternating optimization: first fix B and compute the optimal α, at which point the problem reduces to a regression problem, yielding a new α; then fix α and compute the optimal B, at which point the problem becomes a projection problem, i.e. each element of ŵ is projected onto the set V(α) and its binary code becomes the new B. This process is iterated until convergence.
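The alternating optimization can be sketched as follows. This is an illustrative NumPy version: the initialization heuristic and the round count are our choices, and the α-update uses a least-squares solve as the "regression problem".

```python
import numpy as np

def solve_basis(w_hat, K=2, rounds=20):
    """Alternating-optimization sketch for min_{alpha,B} ||w_hat - B @ alpha||^2
    with B in {-1,1}^{MxK}. Fixing alpha, B is a projection onto V(alpha);
    fixing B, alpha has a closed-form least-squares solution."""
    codes = np.array([[1.0 if (l >> k) & 1 else -1.0 for k in range(K)]
                      for l in range(2 ** K)])        # all 2^K codes e_l
    alpha = np.array([np.mean(np.abs(w_hat)) / 2 ** k for k in range(K)])
    for _ in range(rounds):
        values = codes @ alpha                        # current V(alpha)
        # fix alpha: project each element to its nearest value -> new B
        idx = np.argmin(np.abs(w_hat[:, None] - values[None, :]), axis=1)
        B = codes[idx]
        # fix B: least-squares ("regression") solve -> new alpha
        alpha, *_ = np.linalg.lstsq(B, w_hat, rcond=None)
    return alpha, B
```

In practice a convergence check on α (rather than a fixed round count) matches the "iterate until convergence" wording more closely.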
The overall flow of model training is as follows: for each filter parameter w in the convolutional and fully connected layers of the neural network, compute the quantized parameters w_q and use w_q for the forward propagation of the network; compute the loss function L from the forward pass, then back-propagate the gradients, approximating the derivative of the step function Π(·), to finally obtain the gradient with respect to the floating-point parameters w and update w. Training requires many iterations until convergence. The final model only needs to store the binary coding corresponding to the quantized normalized weights ŵ_q and the quantization basis for prediction; the floating-point parameters w need not be stored.
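A toy single-layer training iteration under this flow might look like the following sketch. All of the assumptions here are ours, not the patent's: a single linear layer, squared loss, a 1-bit stand-in quantizer, and the max-element gradient term omitted so that the normalization scales cancel in the straight-through estimator.

```python
import numpy as np

def train_step(w, x, target, lr=0.01):
    """One toy training iteration: quantize, forward with w_q,
    back-propagate through the simplified STE, and update the
    FLOATING-POINT weights (gradients accumulate in floats)."""
    s = np.max(np.abs(w))
    w_hat = w / s                                  # normalize
    w_hat_q = np.where(w_hat >= 0, 0.5, -0.5)      # 1-bit stand-in projection
    wq = w_hat_q * s                               # denormalize (s detached)
    y = x @ wq                                     # forward with quantized weights
    loss = 0.5 * (y - target) ** 2                 # squared loss
    grad_wq = (y - target) * x                     # dL/dw_q
    grad_w = grad_wq                               # simplified STE: scales cancel
    return w - lr * grad_w, loss                   # updated float weights
```

Repeating this step until convergence mirrors the flow above; only the binary codes and the quantization basis would be kept for deployment.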
Beneficial effects: compared with the prior art, the weight-normalization-based neural network weight quantization method provided by the invention can quantize the weights of a neural network to extremely low bit widths while keeping the performance of the model from degrading significantly. Because the normalization uses the maximum-magnitude element, that element receives a gradient different from the other elements, so in every iteration the maximum-magnitude element moves quickly toward zero; after many iterations the long-tail distribution of the weights is weakened, a smaller quantization error is obtained, and the performance loss of the quantized model is reduced.
Brief description of the drawings
Fig. 1 is a schematic diagram of the forward and backward propagation of the quantization process of the invention;
Fig. 2 is the overall training flow chart of the quantized-weight neural network of the invention;
Fig. 3 is the solution procedure for the quantization basis α of the invention;
Fig. 4 is the prediction process using the trained quantized model;
Fig. 5 compares the floating-point weight distributions of the method of the invention and of "High-accuracy compact neural networks based on learnable quantization functions".
Specific embodiment
Combined with specific embodiments below, the present invention is furture elucidated, it should be understood that these embodiments are merely to illustrate the present invention
Rather than limit the scope of the invention, after the present invention has been read, those skilled in the art are to various equivalences of the invention
The modification of form falls within the application range as defined in the appended claims.
In the weight-normalization-based deep neural network compression method, the forward and backward propagation of the quantization process are as shown in Fig. 1. The forward pass of the quantization function: each filter weight w is first normalized to obtain ŵ; the quantization basis α is then solved based on minimum quantization error, the set of quantized values V(α) is obtained from α, and the normalized weights ŵ are quantized according to V(α), the projection function Π_{V(α)}(·) projecting each element of ŵ onto the set V(α) to obtain ŵ_q. Finally, denormalization yields the quantized weights w_q; denormalization guarantees that the magnitude of the quantized weights is consistent with that of the floating-point weights. The quantized weights w_q are used for the forward propagation of the network. The backward pass of the quantization process approximates the gradient of the step-shaped quantization function Π(·); the derivatives of the normalization and denormalization operations can be obtained directly without approximation. The gradient with respect to the floating-point weights w is finally obtained (see step 204) and accumulated in the floating-point weights, enabling end-to-end training.
In the weight-normalization-based deep neural network compression method, the overall training process is shown in Fig. 2. First a pre-trained full-precision model is obtained; then, following the forward pass of the quantizer in Fig. 1, the floating-point weights w of each filter in all convolutional and fully connected layers of the network are quantized, and the quantized weights w_q are used to forward-propagate the input training data and compute the network loss function L. The gradient with respect to the quantized weights w_q is then obtained through back-propagation of the network, the gradient with respect to the floating-point weights w is obtained following the backward pass of the quantizer in Fig. 1, and the floating-point weights w are updated; the updates are repeated until convergence. The floating-point weights w and the quantization basis α are saved, and can be further converted into the compressed (low-precision) format before model prediction.
The solution procedure for the quantization basis α is shown in Fig. 3. Alternating optimization is used: the quantization basis α is first initialized and the set V(α) of all quantized values is computed; each element of ŵ is then projected onto V(α) to obtain its binary code, which becomes the new B; from the current B and ŵ, the new closed-form optimal solution of the quantization basis α is computed. These steps are repeated for a certain number of rounds.
The process of prediction with the trained quantized model is shown in Fig. 4. The binary coding B of ŵ is first computed from the floating-point weights w obtained by training and the quantization basis α; the quantization basis after denormalization corresponding to the quantized weights w_q, α′ = α × max(|w|), is then computed. The final model only needs to store the quantization basis α′ and the binary coding B, which greatly reduces the storage overhead of the model. The inner products in the convolution and fully connected operations can then be computed as follows: since B_ij is binary with values in {−1, 1}, computing B_ij · x_i only requires taking x_i with a positive or negative sign according to the value of B_ij, without any multiplication. A large number of multiplications in the inner products can therefore be eliminated, which can accelerate inference on dedicated hardware. Moreover, the method can be combined with existing methods that also quantize the input x to low bit widths, greatly increasing the inference speed of the network.
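The multiplication-free inner product described above can be sketched as follows (illustrative only; `np.where` emulates the sign-select that dedicated hardware would perform with adds and subtracts).

```python
import numpy as np

def binary_inner_product(B, alpha_prime, x):
    """Sketch of the multiplication-free inner product w_q^T x with
    w_q = B @ alpha_prime and B in {-1,1}^(MxK). Since B[i,k]*x[i] is
    just +x[i] or -x[i], the M*K products reduce to sign-selects and
    additions; only K multiplications with the basis remain."""
    acc = np.zeros(B.shape[1])
    for k in range(B.shape[1]):
        # take +x_i or -x_i according to B[i,k] -- no products with B
        acc[k] = np.sum(np.where(B[:, k] > 0, x, -x))
    return acc @ alpha_prime

B = np.array([[1, -1], [-1, 1], [1, 1]])
x = np.array([1.0, 2.0, 3.0])
alpha_prime = np.array([0.5, 0.25])
y = binary_inner_product(B, alpha_prime, x)   # equals (B @ alpha_prime) @ x
# -> 2.0
```

The result matches the dense computation `(B @ alpha_prime) @ x`, but the per-element products with B have been replaced by sign flips.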
The invention was tested on two datasets, comparing the method of the invention with the existing state-of-the-art method, "High-accuracy compact neural networks based on learnable quantization functions", proposed by Dongqing Zhang et al. at the 2018 European Conference on Computer Vision (ECCV).
The first dataset is CIFAR-100, which contains 60K 32x32 RGB images in 100 classes with 600 images per class, split into 50K training images and 10K test images. The experimental results are shown in Table 1; the evaluation metric is Top-1 classification accuracy and the network used is ResNet20. It can be seen that the method of the invention improves considerably on the existing state of the art, and with 4-bit quantized weights can even exceed the accuracy of the full-precision model.
Table 1: Top-1 classification accuracy (%) of the invention with ResNet20 on the CIFAR-100 dataset.
Table 2: Top-1/Top-5 classification accuracy (%) of the invention with MobileNetV1 on the ImageNet dataset.
The second dataset is ImageNet, which contains 1.28M training images and 50K test images in 1000 classes. The experimental results are shown in Table 2; the evaluation metrics are Top-1 and Top-5 classification accuracy and the network used is MobileNetV1. It can be seen that the method of the invention improves considerably on the existing state of the art.
The weight distributions obtained by the weight-normalization-based deep neural network compression method are shown in Fig. 5, using 2-bit quantized weights. The upper four plots are the floating-point weight distributions obtained by the invention, corresponding to four selected convolutional/fully connected layers; the lower four plots are the floating-point weight distributions of the corresponding layers obtained by "High-accuracy compact neural networks based on learnable quantization functions". The points on the x-axis represent the average of the quantized values over all filters of the layer, and "mse" denotes the average relative quantization error over the filters of the layer. Comparing the upper and lower weight distributions shows that the invention avoids the long-tail distribution of the weights and at the same time obtains a smaller relative quantization error "mse", so the quantized model achieves better performance.
Claims (5)
1. A weight-normalization-based deep neural network compression method, characterized in that: during forward propagation of the neural network, the weights are first normalized, then quantized based on minimum quantization error, then the quantized weights are denormalized, and forward propagation is carried out with the quantized weights; during back-propagation, the derivative of the step-shaped quantization function is approximated so that gradients can be back-propagated, enabling end-to-end training, with the gradients accumulated in the floating-point weights.
2. The weight-normalization-based deep neural network compression method according to claim 1, characterized in that, during forward propagation, the weights are first normalized, then quantized based on minimum quantization error, and then the quantized weights are obtained by denormalization, with the following specific steps:
Step 100: obtain the parameters of a pre-trained full-precision model; for each filter of every convolutional layer and fully connected layer, vectorize the parameters to obtain w ∈ R^M;
Step 101: normalize the parameters w, using the maximum-magnitude element of w to normalize each element of w into [-1, 1], i.e. ŵ = w / max(|w|);
Step 102: solve for the optimal quantization basis α by minimizing the quantization error with respect to ŵ, obtaining the corresponding set of quantized values V(α);
Step 103: quantize the normalized weights ŵ to obtain ŵ_q, i.e. ŵ_q = Π_{V(α)}(ŵ), where the projection function (quantization function) Π_{V(α)}(·) projects each element of ŵ onto the set of quantized values V(α);
Step 104: denormalize the quantized normalized weights ŵ_q to obtain the quantized weights w_q, so that the quantized weights have the same magnitude as the original parameters w, i.e. w_q = ŵ_q · detach(max(|w|)), where the detach(·) operation treats its argument as a constant;
Step 105: convolve (or, for a fully connected layer, multiply) the obtained quantized weights w_q with the input x of this layer of the neural network to obtain the layer output y.
3. The weight-normalization-based deep neural network compression method according to claim 1, characterized in that, during back-propagation, the derivative of the step-shaped quantization function is approximated so that the neural network can be back-propagated, with the following specific steps:
Step 200: through back-propagation of the gradient, obtain the gradient ∂L/∂w_q of the neural network loss function L with respect to the quantized weights w_q;
Step 201: through back-propagation of the gradient, obtain the gradient with respect to the quantized normalized weights ŵ_q;
Step 202: approximate the gradient of the step function Π(·), i.e. ∂Π/∂ŵ ≈ 1, obtaining the gradient with respect to the normalized weights ŵ;
Step 203: through back-propagation of the gradient, obtain the gradient with respect to the original floating-point parameters w;
Step 204: update the floating-point weights w using the gradient ∂L/∂w.
4. The weight-normalization-based deep neural network compression method according to claim 1, characterized in that the optimization objective for solving the optimal quantization basis α by minimizing the quantization error with respect to ŵ is min_{α,B} ||ŵ − Bα||₂², where M is the dimension of the weight vector, K is the bit width of the quantization, B ∈ {−1, 1}^{M×K} is the binary coding of ŵ, and α ∈ R^K is the quantization basis; α defines the set V(α) of all quantized values, V(α) = {α^T e_l | 1 ≤ l ≤ 2^K}, where the vectors e_l ∈ {−1, 1}^K enumerate all K-bit binary codes; the objective is solved by alternating optimization: first fix B and compute the optimal α, at which point the problem reduces to a regression problem, yielding a new α; then fix α and compute the optimal B, at which point the problem becomes a projection problem, i.e. each element of ŵ is projected onto the set V(α) and its binary code becomes the new B; this process is iterated until convergence.
5. The weight-normalization-based deep neural network compression method according to claim 1, wherein the overall flow of model training is: for each filter parameter w in all convolutional and fully connected layers of the neural network, compute the quantized parameters w_q and use w_q for the forward propagation of the network; compute the loss function L from the forward pass, then back-propagate the gradients, approximating the derivative of the step function Π(·), to finally obtain the gradient with respect to the floating-point parameters w and update w; training requires many iterations until convergence; the final model only needs to store the binary coding corresponding to the quantized normalized weights ŵ_q and the quantization basis for prediction, without storing the floating-point parameters w.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910575103.3A CN110276451A (en) | 2019-06-28 | 2019-06-28 | Weight-normalization-based deep neural network compression method
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910575103.3A CN110276451A (en) | 2019-06-28 | 2019-06-28 | Weight-normalization-based deep neural network compression method
Publications (1)
Publication Number | Publication Date |
---|---|
CN110276451A true CN110276451A (en) | 2019-09-24 |
Family
ID=67962561
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910575103.3A Pending CN110276451A (en) Weight-normalization-based deep neural network compression method
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110276451A (en) |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2581546A (en) * | 2019-08-22 | 2020-08-26 | Imagination Tech Ltd | Methods and systems for converting weights of a deep neural network from a first number format to a second number format |
CN112418391A (en) * | 2019-08-22 | 2021-02-26 | 畅想科技有限公司 | Method and system for transforming weights of deep neural network |
GB2581546B (en) * | 2019-08-22 | 2021-03-31 | Imagination Tech Ltd | Methods and systems for converting weights of a deep neural network from a first number format to a second number format |
CN112418391B (en) * | 2019-08-22 | 2022-07-08 | 畅想科技有限公司 | Method and system for transforming weights of deep neural network |
US11188817B2 (en) | 2019-08-22 | 2021-11-30 | Imagination Technologies Limited | Methods and systems for converting weights of a deep neural network from a first number format to a second number format |
CN110851563A (en) * | 2019-10-08 | 2020-02-28 | 杭州电子科技大学 | Neighbor document searching method based on coding navigable stretch chart |
CN110851563B (en) * | 2019-10-08 | 2021-11-09 | 杭州电子科技大学 | Neighbor document searching method based on coding navigable stretch chart |
CN113795869B (en) * | 2019-11-22 | 2023-08-18 | 腾讯美国有限责任公司 | Neural network model processing method, device and medium |
CN113795869A (en) * | 2019-11-22 | 2021-12-14 | 腾讯美国有限责任公司 | Method and apparatus for quantization, adaptive block partitioning and codebook coding and decoding for neural network model compression |
CN110969251A (en) * | 2019-11-28 | 2020-04-07 | 中国科学院自动化研究所 | Neural network model quantification method and device based on label-free data |
CN110969251B (en) * | 2019-11-28 | 2023-10-31 | 中国科学院自动化研究所 | Neural network model quantification method and device based on label-free data |
WO2022135174A1 (en) * | 2020-12-24 | 2022-06-30 | 北京有竹居网络技术有限公司 | Quantization method and apparatus for text feature extraction model, and device and storage medium |
CN112686031B (en) * | 2020-12-24 | 2023-09-08 | 北京有竹居网络技术有限公司 | Quantization method, device, equipment and storage medium of text feature extraction model |
CN112686031A (en) * | 2020-12-24 | 2021-04-20 | 北京有竹居网络技术有限公司 | Text feature extraction model quantification method, device, equipment and storage medium |
CN113688990A (en) * | 2021-09-09 | 2021-11-23 | 贵州电网有限责任公司 | No-data quantitative training method for power edge calculation classification neural network |
CN113610232B (en) * | 2021-09-28 | 2022-02-22 | 苏州浪潮智能科技有限公司 | Network model quantization method and device, computer equipment and storage medium |
WO2023050707A1 (en) * | 2021-09-28 | 2023-04-06 | 苏州浪潮智能科技有限公司 | Network model quantization method and apparatus, and computer device and storage medium |
CN113610232A (en) * | 2021-09-28 | 2021-11-05 | 苏州浪潮智能科技有限公司 | Network model quantization method and device, computer equipment and storage medium |
CN114925829A (en) * | 2022-07-18 | 2022-08-19 | 山东海量信息技术研究院 | Neural network training method and device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110276451A (en) | Weight-normalization-based deep neural network compression method | |
WO2022141754A1 (en) | Automatic pruning method and platform for general compression architecture of convolutional neural network | |
WO2020238237A1 (en) | Power exponent quantization-based neural network compression method | |
CN111612147A (en) | Quantization method of deep convolutional network | |
CN113159173A (en) | Convolutional neural network model compression method combining pruning and knowledge distillation | |
CN113595993B (en) | Vehicle-mounted sensing equipment joint learning method for model structure optimization under edge calculation | |
CN111931906A (en) | Deep neural network mixing precision quantification method based on structure search | |
CN114239861A (en) | Model compression method and system based on multi-teacher combined guidance quantification | |
CN108268950A (en) | Iterative neural network quantization method and system based on vector quantization | |
CN114626550A (en) | Distributed model collaborative training method and system | |
Wang et al. | Global aligned structured sparsity learning for efficient image super-resolution | |
Zhang et al. | ACP: adaptive channel pruning for efficient neural networks | |
CN112686384A (en) | Bit-width-adaptive neural network quantization method and device | |
Chen et al. | DNN gradient lossless compression: Can GenNorm be the answer? | |
Liu et al. | Flexi-compression: a flexible model compression method for autonomous driving | |
CN114154626B (en) | Filter pruning method for image classification task | |
CN115564987A (en) | Training method and application of image classification model based on meta-learning | |
CN116051861A (en) | Non-anchor frame target detection method based on heavy parameterization | |
CN113033653B (en) | Edge-cloud cooperative deep neural network model training method | |
CN112990336B (en) | Deep three-dimensional point cloud classification network construction method based on competitive attention fusion | |
WO2022141189A1 (en) | Automatic search method and apparatus for precision and decomposition rank of recurrent neural network | |
CN114648123A (en) | Convolutional neural network hierarchical reasoning time prediction method and device | |
CN113159318A (en) | Neural network quantification method and device, electronic equipment and storage medium | |
CN112488291B (en) | 8-Bit quantization compression method for neural network | |
CN115496200B (en) | Neural network quantization model training method, device and equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication
SE01 | Entry into force of request for substantive examination
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190924