CN110276451A - Weight-normalization-based deep neural network compression method - Google Patents

Weight-normalization-based deep neural network compression method Download PDF

Info

Publication number
CN110276451A
CN110276451A (application CN201910575103.3A)
Authority
CN
China
Prior art keywords
weight
quantization
neural network
normalized
gradient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910575103.3A
Other languages
Chinese (zh)
Inventor
李武军
蔡文朴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CN201910575103.3A
Publication of CN110276451A
Legal status: Pending

Links

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a weight-normalization-based deep neural network compression method. The quantization of the weights is decomposed into three steps: the weights are first normalized, then quantized to extremely low bit-widths by minimizing the quantization error, and finally de-normalized; the forward pass of the neural network is computed with the quantized weights. To train the network with quantized weights, the invention approximates the derivative of the step-shaped quantization function so that the gradients of the network can be back-propagated, the gradients being accumulated in the floating-point weights. Weight quantization based on weight normalization avoids, to a certain extent, the long-tail distribution of the weights, thereby reducing the quantization error and in turn improving the performance of the model. The invention can compress the weights of current mainstream neural networks to 2 or 3 bits while ensuring that the performance of the model suffers no large loss.

Description

Weight-normalization-based deep neural network compression method
Technical field
The present invention provides a weight-normalization-based deep neural network compression method. It relates to compressing (quantizing) the parameters of neural networks: the weights of a model can be quantized to extremely low bit-widths (2 or 3 bits). The method is suitable for compressing the parameters of various mainstream neural networks, such as ResNet and MobileNet, so that the models can be deployed on mobile devices.
Background technique
With the development of deep learning, deep neural networks have become the main models of machine learning. However, deep learning models have a large number of parameters and require great computational overhead, which hinders their deployment on mobile and embedded devices. According to existing research, deep neural networks contain substantial redundancy, so the parameters of a model can be greatly compressed while ensuring that its performance does not decline significantly.
Weight quantization is a main approach to model compression. Although existing methods can quantize the parameters of a model to 8 bits with almost no performance loss, quantizing the weights to extremely low bit-widths often brings a large performance loss.
A mainstream weight quantization approach quantizes the weights by minimizing the quantization error, but this approach is affected by the long-tail distribution of the weights, which leads to a large relative quantization error and in turn a loss in the performance of the quantized model.
Summary of the invention
Objective of the invention: current weight quantization methods based on minimizing the quantization error are affected by the long-tail distribution of the weights, which introduces a large relative quantization error. In view of this problem, the present invention provides a weight-normalization-based deep neural network compression method. The weights are first normalized, then quantized by minimizing the quantization error, and finally de-normalized to obtain the quantized weights. At the same time, the derivative of the step-shaped quantization function is approximated so that back-propagation through the network can proceed normally. Because the invention normalizes the weights by the element of maximum magnitude, that element receives a gradient different from those of the other elements, pulling it quickly towards 0 in each iteration; after many iterations the long-tail distribution of the weights is weakened, a smaller relative quantization error is obtained, and the performance loss of the quantized model is reduced.
Technical solution: a weight-normalization-based deep neural network compression method. During the forward pass of the neural network, the weights are first normalized, then quantized by minimizing the quantization error, and finally de-normalized to obtain the quantized weights; the forward pass of the neural network is computed with the quantized weights. During the backward pass, the derivative of the step-shaped quantization function is approximated so that gradients can be back-propagated and the network can be trained end to end, the gradients being accumulated in the floating-point weights.
The forward pass first normalizes the weights, then quantizes them by minimizing the quantization error, and finally de-normalizes the quantized weights. The specific steps are as follows:
Step 100: obtain the parameters of a pre-trained full-precision model; for each filter of every convolutional and fully-connected layer, flatten its parameters into a vector w ∈ R^M.
Step 101: normalize w by its element of maximum magnitude, so that every element falls in [-1, 1]: ŵ = w / max(|w|).
Step 102: solve for the optimal quantization basis α by minimizing the quantization error of ŵ, obtaining the corresponding set of quantized values V(α).
Step 103: quantize the normalized weights, ŵ_q = Π_{V(α)}(ŵ), where the projection (quantization) function Π_{V(α)}(·) projects each element of ŵ to the set V(α) of quantized values.
Step 104: de-normalize the quantized normalized weights ŵ_q to obtain the quantized weights w_q = ŵ_q · detach(max(|w|)), the purpose being to keep the quantized weights at the same magnitude as the original parameters w; the detach(·) operation treats its argument as a constant.
Step 105: compute the output y of the layer by convolving (or fully connecting) the quantized weights w_q with the layer input x.
The backward pass approximates the derivative of the step-shaped quantization function so that gradients can be back-propagated through the network. The specific steps are as follows:
Step 200: by back-propagation, obtain the gradient ∂L/∂w_q of the neural network loss function L with respect to the quantized weights w_q.
Step 201: by the chain rule, obtain the gradient with respect to the quantized normalized weights ŵ_q.
Step 202: approximate the gradient of the step-shaped function Π(·) by ∂Π(x)/∂x ≈ 1, obtaining the gradient with respect to the normalized weights ŵ.
Step 203: by the chain rule through ŵ = w / max(|w|), obtain the gradient with respect to the original floating-point parameters w, of the form
∂L/∂w_i = (1 / max(|w|)) · ( ∂L/∂ŵ_i − 1[i = i*] · sign(w_{i*}) · Σ_j ŵ_j · ∂L/∂ŵ_j ),  with i* = argmax_j |w_j|,
where w_i is the i-th element of w; the extra term received by the maximum-magnitude element is what pulls it towards 0.
Step 204: update the floating-point weights w using the gradient ∂L/∂w.
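The approximation of step 202 can equivalently be written as an explicit autograd function; the sketch below (illustrative, assuming the same brute-force projection as above) passes the incoming gradient through the step-shaped projection unchanged, after which the chain rule yields steps 201 and 203 automatically:

```python
import torch

class ProjectToV(torch.autograd.Function):
    """Projection Pi_{V(alpha)}(.) with the approximate gradient of step 202."""

    @staticmethod
    def forward(ctx, w_hat, values):
        # project each element of w_hat to the nearest value in V(alpha)
        idx = (w_hat.unsqueeze(-1) - values).abs().argmin(dim=-1)
        return values[idx]

    @staticmethod
    def backward(ctx, grad_output):
        # dPi(x)/dx ~= 1: the gradient w.r.t. w_hat passes through unchanged;
        # no gradient is propagated to the quantized values themselves
        return grad_output, None
```

Calling ProjectToV.apply(w_hat, values) can replace the straight-through line in the previous sketch.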
The optimization objective for solving the optimal quantization basis α by minimizing the quantization error of ŵ is
min_{α, B} ‖ŵ − B^⊤ α‖²,
where M is the dimension of the weight vector, K is the number of quantization bits, B ∈ {−1, 1}^{K×M} is the binary encoding of ŵ, and α ∈ R^K is the quantization basis. α defines the set V(α) of all quantized values, V(α) = {α^⊤ e_l | 1 ≤ l ≤ 2^K}, where e_l ∈ {−1, 1}^K enumerates all K-bit binary codes. The objective is solved by alternating optimization: first fix B and compute the optimal α — the problem reduces to a least-squares regression whose closed-form solution yields the new α; then fix α and compute the optimal B — the problem reduces to a projection, in which each element of ŵ is projected to the nearest value in V(α) and the corresponding binary code becomes the new B. The two steps are iterated until convergence.
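A minimal NumPy sketch of this alternating optimization for one filter follows (the initialization of α and the fixed number of rounds are assumptions; the text only requires iterating until convergence):

```python
import numpy as np

def solve_basis(w_hat, K, rounds=20):
    """Alternating minimization of ||w_hat - B^T alpha||^2 for one filter."""
    # enumerate all 2^K binary codes e_l in {-1, 1}^K
    codes = np.array([[1.0 if (l >> k) & 1 else -1.0 for k in range(K)]
                      for l in range(2 ** K)])
    alpha = np.abs(w_hat).mean() * 2.0 ** -np.arange(K)   # assumed initialization
    B = None
    for _ in range(rounds):
        # fix alpha: project each element of w_hat onto V(alpha) -> new B
        values = codes @ alpha                            # (2^K,) quantized values
        idx = np.abs(w_hat[:, None] - values[None, :]).argmin(axis=1)
        B = codes[idx].T                                  # (K, M) binary encoding
        # fix B: least-squares regression gives the closed-form new alpha
        alpha, *_ = np.linalg.lstsq(B.T, w_hat, rcond=None)
    return alpha, B
```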
The overall flow of model training is as follows: quantize the parameters w of each filter in the convolutional and fully-connected layers of the neural network into w_q, and run the forward pass of the network with w_q; compute the loss function L from the forward pass, then back-propagate the gradients, approximating the derivative of the step-shaped function Π(·) along the way; finally obtain the gradient with respect to the floating-point parameters w and update them. Training iterates until convergence. The final model only needs to store the binary encoding and quantization basis corresponding to the quantized normalized weights ŵ_q for prediction; the floating-point parameters w need not be stored.
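The overall flow can be condensed into a short PyTorch training loop (a sketch under the assumption that the model quantizes its filters inside forward(), e.g. with the quantize_filter routine sketched above; the optimizer and hyper-parameters are placeholders):

```python
import torch

def train(model, loader, epochs=90, lr=1e-2):
    """Training flow of Fig. 2: forward with w_q, backward through the
    step-function approximation, updates accumulated in the float weights w."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            loss = loss_fn(model(x), y)   # forward pass uses quantized w_q
            opt.zero_grad()
            loss.backward()               # backward through the approximation
            opt.step()                    # gradients accumulate in float w
    return model                          # afterwards only B and the basis are kept
```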
Beneficial effects: compared with the prior art, the weight-normalization-based neural network weight quantization method provided by the invention can quantize the weights of a neural network to extremely low bit-widths while ensuring that the performance of the model suffers no significant loss. Normalizing the weights by the element of maximum magnitude gives that element a gradient different from those of the other elements, pulling it quickly towards 0 in each iteration; after many iterations the long-tail distribution of the weights is weakened, a smaller quantization error is obtained, and the performance loss of the quantized model is reduced.
Detailed description of the invention
Fig. 1 is a schematic diagram of the forward and backward passes of the quantization process of the invention;
Fig. 2 is the overall training flow chart of a neural network with quantized weights according to the invention;
Fig. 3 shows the procedure for solving the quantization basis α of the invention;
Fig. 4 shows the prediction process using a trained quantized model;
Fig. 5 compares the floating-point weight distributions of the method of the invention and of "LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks".
Specific embodiment
The present invention is further elucidated below with reference to specific embodiments. It should be understood that these embodiments are merely illustrative and do not limit the scope of the invention; after reading the present disclosure, modifications of various equivalent forms by those skilled in the art fall within the scope defined by the appended claims.
In the weight-normalization-based deep neural network compression method, the forward and backward passes of the quantization process are as shown in Fig. 1. The forward pass of the quantization function proceeds as follows: first normalize each filter's weights w to obtain ŵ; then solve for the quantization basis α by minimizing the quantization error, obtain the set V(α) of quantized values from α, and quantize the normalized weights according to V(α) — the projection function Π_{V(α)}(·) projects each element of ŵ to V(α), yielding ŵ_q; finally de-normalize to obtain the quantized weights w_q, the de-normalization guaranteeing that the magnitude of the quantized weights is consistent with that of the floating-point weights, and compute the forward pass of the network with w_q. The backward pass of the quantization process approximates the gradient of the step-shaped quantization function Π(·); the derivatives of the normalization and de-normalization operations can be obtained directly without approximation. The gradient with respect to the floating-point weights w is finally obtained (as shown in step 203) and accumulated in the floating-point weights, so that the network can be trained end to end.
In the weight-normalization-based deep neural network compression method, the overall training process is as shown in Fig. 2. First obtain a pre-trained full-precision model; then, following the forward pass of the quantizer of Fig. 1, quantize the floating-point weights w of each filter of every convolutional and fully-connected layer in the network, run the forward pass of the network on the input training data with the quantized weights w_q, and compute the loss function L of the network. Then, through back-propagation, obtain the gradient with respect to the quantized weights w_q and, following the backward pass of the quantizer of Fig. 1, obtain the gradient with respect to the floating-point weights w; update the floating-point weights w and repeat until convergence. The floating-point weights w and the quantization basis α are saved, and can be further converted into a compressed (low-precision) format before model prediction.
The procedure for solving the quantization basis α is shown in Fig. 3. It uses alternating optimization: first initialize the quantization basis α and compute the set V(α) of all quantized values; then project each element of ŵ to V(α) to obtain the corresponding binary code, which becomes the new B; from the current B and ŵ, compute the new closed-form optimal solution of the quantization basis α. The above steps are repeated for a certain number of rounds.
The prediction process using a trained quantized model is shown in Fig. 4. First, from the trained floating-point weights w and quantization basis α, compute the binary encoding B of ŵ; then compute the quantization basis corresponding to the de-normalized quantized weights w_q, α′ = α × max(|w|). The final model only needs to store the quantization basis α′ and the binary encoding B, which substantially reduces the storage overhead of the model. The inner products in convolution and fully-connected operations can then be computed as
w_q^⊤ x = Σ_{k=1..K} α′_k ( Σ_{i=1..M} B_{ki} x_i ).
Since B_{ki} is binary with values in {−1, 1}, computing B_{ki} · x_i only requires taking x_i with a positive or negative sign according to the value of B_{ki}, without any multiplication. A large number of multiplications in the inner product can therefore be eliminated, which can accelerate inference on suitable hardware. Furthermore, the method can be combined with existing methods that also quantize the input x to low bit-widths, greatly increasing the inference speed of the neural network.
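The multiplication-free inner product can be illustrated with a short NumPy sketch (an element-wise illustration only; a real deployment would use bit-packed codes and hardware-friendly kernels):

```python
import numpy as np

def quantized_inner_product(B, alpha_p, x):
    """Compute w_q^T x = sum_k alpha'_k * (B_k . x) without multiplying by B.

    B:       (K, M) binary codes with entries in {-1, +1}
    alpha_p: (K,) de-normalized basis alpha' = alpha * max(|w|)
    x:       (M,) layer input
    """
    acc = 0.0
    for k in range(B.shape[0]):
        # each B[k, i] * x_i is just +x_i or -x_i: sign selection, no multiply
        acc += alpha_p[k] * np.where(B[k] > 0, x, -x).sum()
    return acc                          # only K scalar multiplications in total
```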
The invention was tested on two datasets, comparing the effect of the method of the invention with the existing best method, "LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks", proposed by Dongqing Zhang et al. in a paper at the 2018 European Conference on Computer Vision (ECCV).
The first dataset is CIFAR-100, which contains 60K 32×32 RGB images in 100 classes, with 600 images per class, split into 50K training images and 10K test images. The experimental results are shown in Table 1; the evaluation metric is Top-1 classification accuracy, and the network used is ResNet20. The method of the invention improves considerably over the existing best method, and with 4-bit quantized weights it can even exceed the accuracy of the full-precision model.
Table 1 shows the Top-1 classification accuracy (%) of the invention using ResNet20 on the CIFAR-100 dataset.
Table 2 shows the Top-1/Top-5 classification accuracy (%) of the invention using MobileNetV1 on the ImageNet dataset.
The second dataset is ImageNet, which contains 1.28M training images and 50K test images in 1000 classes. The experimental results are shown in Table 2; the evaluation metrics are Top-1 and Top-5 classification accuracy, and the network used is MobileNetV1. The method of the invention again improves considerably over the existing best method.
The weight distributions obtained by the weight-normalization-based deep neural network compression method are shown in Fig. 5, using 2-bit quantized weights. The upper four plots show the floating-point weight distributions obtained by the invention, each corresponding to one of four selected convolutional/fully-connected layers. The lower four plots show the floating-point weight distributions obtained by the corresponding layers of "LQ-Nets"; the points on the x-axis mark the average of the quantized values over all filters of the layer. "mse" denotes the average relative quantization error over the filters of the layer, the relative quantization error of each filter being the squared quantization error normalized by the squared norm of the weights, ‖ŵ − ŵ_q‖² / ‖ŵ‖². Comparing the upper and lower distributions shows that the invention avoids the long-tail distribution of the weights while obtaining a smaller relative quantization error "mse", so the quantized model achieves better performance.
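For reference, the per-filter relative quantization error plotted in Fig. 5 can be computed as follows (a sketch under the definition given above):

```python
import numpy as np

def relative_quantization_error(w_hat, w_hat_q):
    """Relative quantization error of one filter: ||w_hat - w_hat_q||^2 / ||w_hat||^2."""
    return np.sum((w_hat - w_hat_q) ** 2) / np.sum(w_hat ** 2)
```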

Claims (5)

1. A weight-normalization-based deep neural network compression method, characterized in that: during the forward pass of the neural network, the weights are first normalized, then quantized by minimizing the quantization error, and finally de-normalized to obtain the quantized weights; the forward pass of the neural network is computed with the quantized weights; during the backward pass, the derivative of the step-shaped quantization function is approximated so that gradients can be back-propagated and the network can be trained end to end, the gradients being accumulated in the floating-point weights.
2. The weight-normalization-based deep neural network compression method of claim 1, characterized in that during the forward pass the weights are first normalized, then quantized by minimizing the quantization error, and finally de-normalized, with the specific steps of:
Step 100: obtaining the parameters of a pre-trained full-precision model, and for each filter of every convolutional and fully-connected layer, flattening its parameters into a vector w ∈ R^M;
Step 101: normalizing w by its element of maximum magnitude so that every element falls in [-1, 1]: ŵ = w / max(|w|);
Step 102: solving for the optimal quantization basis α by minimizing the quantization error of ŵ, obtaining the corresponding set of quantized values V(α);
Step 103: quantizing the normalized weights, ŵ_q = Π_{V(α)}(ŵ), the projection (quantization) function Π_{V(α)}(·) projecting each element of ŵ to the set V(α) of quantized values;
Step 104: de-normalizing the quantized normalized weights ŵ_q to obtain the quantized weights w_q = ŵ_q · detach(max(|w|)), so that the quantized weights have the same magnitude as the original parameters w, the detach(·) operation treating its argument as a constant;
Step 105: computing the output y of the layer by convolving (or fully connecting) the quantized weights w_q with the layer input x.
3. The weight-normalization-based deep neural network compression method of claim 1, characterized in that during the backward pass the derivative of the step-shaped quantization function is approximated so that gradients can be back-propagated, with the specific steps of:
Step 200: obtaining, by back-propagation, the gradient ∂L/∂w_q of the neural network loss function L with respect to the quantized weights w_q;
Step 201: obtaining, by the chain rule, the gradient with respect to the quantized normalized weights ŵ_q;
Step 202: approximating the gradient of the step-shaped function Π(·) by ∂Π(x)/∂x ≈ 1, obtaining the gradient with respect to the normalized weights ŵ;
Step 203: obtaining, by the chain rule through ŵ = w / max(|w|), the gradient with respect to the original floating-point parameters w, of the form ∂L/∂w_i = (1 / max(|w|)) · ( ∂L/∂ŵ_i − 1[i = i*] · sign(w_{i*}) · Σ_j ŵ_j · ∂L/∂ŵ_j ), where w_i is the i-th element of w and i* = argmax_j |w_j|;
Step 204: updating the floating-point weights w using the gradient ∂L/∂w.
4. The weight-normalization-based deep neural network compression method of claim 1, characterized in that the optimization objective for solving the optimal quantization basis α by minimizing the quantization error of ŵ is min_{α, B} ‖ŵ − B^⊤ α‖², where M is the dimension of the weight vector, K is the number of quantization bits, B ∈ {−1, 1}^{K×M} is the binary encoding of ŵ, and α ∈ R^K is the quantization basis; α defines the set V(α) of all quantized values, V(α) = {α^⊤ e_l | 1 ≤ l ≤ 2^K}, where e_l ∈ {−1, 1}^K enumerates all K-bit binary codes; the objective is solved by alternating optimization: first fixing B and computing the optimal α, the problem reducing to a least-squares regression that yields the new α; then fixing α and computing the optimal B, the problem reducing to a projection in which each element of ŵ is projected to the set V(α) and the corresponding binary code becomes the new B; the above process is iterated until convergence.
5. The weight-normalization-based deep neural network compression method of claim 1, characterized in that the overall flow of model training is: quantizing the parameters w of each filter in all convolutional and fully-connected layers of the neural network into the quantized parameters w_q; running the forward pass of the neural network with w_q; computing the loss function L from the forward pass and then back-propagating the gradients, the derivative of the step-shaped function Π(·) being approximated along the way; finally obtaining the gradient with respect to the floating-point parameters w and updating them, training iterating until convergence; the final model only needs to store the binary encoding and quantization basis corresponding to the quantized normalized weights ŵ_q for prediction, without storing the floating-point parameters w.
CN201910575103.3A 2019-06-28 2019-06-28 Weight-normalization-based deep neural network compression method Pending CN110276451A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910575103.3A CN110276451A (en) 2019-06-28 2019-06-28 Weight-normalization-based deep neural network compression method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910575103.3A CN110276451A (en) 2019-06-28 2019-06-28 Weight-normalization-based deep neural network compression method

Publications (1)

Publication Number Publication Date
CN110276451A (en) 2019-09-24

Family

ID=67962561

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910575103.3A Pending CN110276451A (en) Weight-normalization-based deep neural network compression method

Country Status (1)

Country Link
CN (1) CN110276451A (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2581546A (en) * 2019-08-22 2020-08-26 Imagination Tech Ltd Methods and systems for converting weights of a deep neural network from a first number format to a second number format
CN112418391A (en) * 2019-08-22 2021-02-26 畅想科技有限公司 Method and system for transforming weights of deep neural network
GB2581546B (en) * 2019-08-22 2021-03-31 Imagination Tech Ltd Methods and systems for converting weights of a deep neural network from a first number format to a second number format
CN112418391B (en) * 2019-08-22 2022-07-08 畅想科技有限公司 Method and system for transforming weights of deep neural network
US11188817B2 (en) 2019-08-22 2021-11-30 Imagination Technologies Limited Methods and systems for converting weights of a deep neural network from a first number format to a second number format
CN110851563A (en) * 2019-10-08 2020-02-28 杭州电子科技大学 Neighbor document searching method based on coding navigable stretch chart
CN110851563B (en) * 2019-10-08 2021-11-09 杭州电子科技大学 Neighbor document searching method based on coding navigable stretch chart
CN113795869B (en) * 2019-11-22 2023-08-18 腾讯美国有限责任公司 Neural network model processing method, device and medium
CN113795869A (en) * 2019-11-22 2021-12-14 腾讯美国有限责任公司 Method and apparatus for quantization, adaptive block partitioning and codebook coding and decoding for neural network model compression
CN110969251A (en) * 2019-11-28 2020-04-07 中国科学院自动化研究所 Neural network model quantification method and device based on label-free data
CN110969251B (en) * 2019-11-28 2023-10-31 中国科学院自动化研究所 Neural network model quantification method and device based on label-free data
WO2022135174A1 (en) * 2020-12-24 2022-06-30 北京有竹居网络技术有限公司 Quantization method and apparatus for text feature extraction model, and device and storage medium
CN112686031B (en) * 2020-12-24 2023-09-08 北京有竹居网络技术有限公司 Quantization method, device, equipment and storage medium of text feature extraction model
CN112686031A (en) * 2020-12-24 2021-04-20 北京有竹居网络技术有限公司 Text feature extraction model quantification method, device, equipment and storage medium
CN113688990A (en) * 2021-09-09 2021-11-23 贵州电网有限责任公司 No-data quantitative training method for power edge calculation classification neural network
CN113610232B (en) * 2021-09-28 2022-02-22 苏州浪潮智能科技有限公司 Network model quantization method and device, computer equipment and storage medium
WO2023050707A1 (en) * 2021-09-28 2023-04-06 苏州浪潮智能科技有限公司 Network model quantization method and apparatus, and computer device and storage medium
CN113610232A (en) * 2021-09-28 2021-11-05 苏州浪潮智能科技有限公司 Network model quantization method and device, computer equipment and storage medium
CN114925829A (en) * 2022-07-18 2022-08-19 山东海量信息技术研究院 Neural network training method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110276451A (en) Weight-normalization-based deep neural network compression method
WO2022141754A1 (en) Automatic pruning method and platform for general compression architecture of convolutional neural network
WO2020238237A1 (en) Power exponent quantization-based neural network compression method
CN111612147A (en) Quantization method of deep convolutional network
CN113159173A (en) Convolutional neural network model compression method combining pruning and knowledge distillation
CN113595993B (en) Vehicle-mounted sensing equipment joint learning method for model structure optimization under edge calculation
CN111931906A (en) Deep neural network mixing precision quantification method based on structure search
CN114239861A (en) Model compression method and system based on multi-teacher combined guidance quantification
CN108268950A (en) Iterative neural network quantization method and system based on vector quantization
CN114626550A (en) Distributed model collaborative training method and system
Wang et al. Global aligned structured sparsity learning for efficient image super-resolution
Zhang et al. ACP: adaptive channel pruning for efficient neural networks
CN112686384A (en) Bit-width-adaptive neural network quantization method and device
Chen et al. DNN gradient lossless compression: Can GenNorm be the answer?
Liu et al. Flexi-compression: a flexible model compression method for autonomous driving
CN114154626B (en) Filter pruning method for image classification task
CN115564987A (en) Training method and application of image classification model based on meta-learning
CN116051861A (en) Non-anchor frame target detection method based on heavy parameterization
CN113033653B (en) Edge-cloud cooperative deep neural network model training method
CN112990336B (en) Deep three-dimensional point cloud classification network construction method based on competitive attention fusion
WO2022141189A1 (en) Automatic search method and apparatus for precision and decomposition rank of recurrent neural network
CN114648123A (en) Convolutional neural network hierarchical reasoning time prediction method and device
CN113159318A (en) Neural network quantification method and device, electronic equipment and storage medium
CN112488291B (en) 8-Bit quantization compression method for neural network
CN115496200B (en) Neural network quantization model training method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20190924)