CN108805257A - A kind of neural network quantization method based on parameter norm - Google Patents
A kind of neural network quantization method based on parameter norm
- Publication number
- CN108805257A (application number CN201810387893.8A)
- Authority
- CN
- China
- Prior art keywords
- quantization
- parameter
- neural network
- loss
- center
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
The present invention provides a neural network quantization method based on a parameter norm. The method includes: for a given pre-trained neural network parameter model, counting the values of the parameters of the layers to be quantized and dividing quantization centers accordingly; calculating, from the selected quantization centers, the quantization loss of the parameters of each layer to be quantized; adding the quantization loss to the classification error loss of the pre-trained neural network parameter model to obtain a total loss, and performing back-propagation optimization, during which the quantization centers are also updated; and, once training ends, performing the quantization operation on the corresponding layers according to the quantization centers to obtain the quantized compact model. The method can divide the weight centers automatically and, by imposing a simple quantization loss and using the same optimizer as conventional training, quantize the neural network model into a compact version of the original model, reducing network storage volume and computational complexity.
Description
Technical Field
The invention relates to the field of neural networks, in particular to a neural network quantification method based on a parameter norm.
Background
As early as the end of the last century, Yann LeCun et al. successfully used neural networks to recognize handwritten zip codes on mail. In recent years, new neural network structures have emerged one after another and achieved results far beyond those of traditional algorithms, making major breakthroughs in fields such as computer vision, speech processing and recommendation systems, and they are now widely applied in industries such as the internet, intelligent devices and security equipment.
In order to achieve better results, the network parameters are optimized and learned in a supervised manner on large-scale labeled data sets during training. Meanwhile, to learn the data more thoroughly, the corresponding network structures keep developing towards higher capacity and higher complexity. However, as the number of layers and parameters grows, both computation time and storage cost increase sharply, so that training and deployment of existing neural networks must rely on large-scale server clusters. This is impractical for mobile devices and wearable devices in the mobile-internet setting.
Several effective algorithms have been proposed for the problem of neural network compression. One well-known family of methods is quantization, which converts high-precision floating-point parameters into a low-precision representation or retains only a small number of high-precision quantization centers. However, most of these algorithms rely on conventional uniform quantization or clustering-based quantization; for example, the invention patent with grant number CN105184362B obtains quantization codebooks through K-means clustering. Such methods are not optimized jointly with the characteristics of the neural network and perform quantization purely from a mathematical and statistical point of view. As a result, the quantized model often suffers a large drop in accuracy relative to the original model and is difficult to use in practical application scenarios. How to integrate the quantization operation into the training process of the neural network has therefore become a new research direction.
Disclosure of Invention
In order to solve the above problems, a neural network quantization algorithm based on a parameter norm is provided, which combines neural network training with quantization and overcomes the problem of low accuracy after quantization.
The invention provides a neural network quantification method based on a parameter norm, which comprises the following steps:
for a given pre-training neural network parameter model, dividing a quantization center by counting the values of the parameters of the required quantization layer;
calculating a quantization loss of the parameter for each quantization layer based on the selected quantization center;
adding the quantization loss and the classification error loss of the pre-trained neural network parameter model to obtain a total loss, and performing back-propagation optimization;
judging whether the training requirement is met, if so, entering the next step, and if not, updating the quantization center;
and carrying out quantization operation on the corresponding layer according to the quantization center to obtain a quantized compression model.
By automatically dividing the quantization centers, the burden of manual analysis is reduced as far as possible and different network parameters are better accommodated. By calculating the quantization loss, the gap between the current network parameters and the quantization targets is obtained and serves as one of the terms for evaluating and optimizing the degree of quantization. By combining the quantization loss with the training loss, the neural network can maintain high accuracy while being quantized. During training, the quantization centers are continuously updated as the parameters change, which reduces the error introduced during training. After quantization training ends, the network weight parameters are quantized using the obtained quantization regions and centers, achieving the goal of model compression.
Optionally, the dividing of the quantization centers comprises the steps of:
for a neural network training model, a quantization level parameter n_l, i.e. the number of quantization centers, is given for each layer l to be quantized;
the weight parameters w_l of layer l are counted to obtain the corresponding maximum value max(w_l) and minimum value min(w_l);
different quantization centers and regions are then obtained from the maximum and minimum values.
Optionally, the different quantization centers and regions are obtained by linear division between the minimum and the maximum value, where l denotes the l-th network weight parameter layer, d_l denotes the spacing of adjacent quantization centers or regions, r_i^l denotes the end points of the different quantization regions, R_i^l denotes the range of the i-th quantization region, and c_i^l denotes the value of the i-th quantization center, i being an integer.
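The linear division can be pictured with a short sketch. The following NumPy fragment is only an illustrative reading of this step, not the claimed formula: it assumes n_l regions of equal width spanning [min(w_l), max(w_l)], with the quantization center of each region taken at its midpoint; the function name divide_centers is likewise just a label used here.

```python
import numpy as np

def divide_centers(w, n):
    """Linearly divide [min(w), max(w)] into n quantization regions.

    Returns the region end points r_0..r_n (n + 1 values) and the region
    centers c_1..c_n (n values).  Equal-width regions with mid-point
    centers are an assumption of this sketch.
    """
    w = np.asarray(w).ravel()
    w_min, w_max = w.min(), w.max()
    d = (w_max - w_min) / n                     # spacing d_l of adjacent regions
    ends = w_min + d * np.arange(n + 1)         # region end points
    centers = w_min + d * (np.arange(n) + 0.5)  # one center per region
    return ends, centers
```

Under this reading, each weight belongs to exactly one region and the spacing of adjacent centers equals the region width d_l.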
Optionally, the calculation of the loss comprises the steps of:
for the parameters w_l of layer l, the parameter norm loss is calculated according to the quantization region and the quantization center in which each parameter lies.
Optionally, w_j^l being the j-th weight parameter in w_l, the quantization region R_i^l in which it lies is found by comparing it with the end points of the quantization regions;
the L1 loss |w_j^l − c_i^l| or the L2 loss (w_j^l − c_i^l)^2 of w_j^l is computed;
summing over the weights of the layer, where m is the number of all weights in that layer, gives the L1 loss or L2 loss of the l-th layer.
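As an illustrative sketch (continuing the NumPy fragment above, with the same equal-width region assumption), the layer quantization loss could be computed as follows; normalising the sum by the number of weights m is an assumption of this sketch.

```python
import numpy as np

def quantization_loss(w, ends, centers, norm=2):
    """L1 or L2 parameter-norm loss of the layer weights w against the
    centers of the quantization regions in which they fall."""
    w = np.asarray(w).ravel()
    # index of the region containing each weight (edge values clipped into range)
    idx = np.clip(np.searchsorted(ends, w, side="right") - 1, 0, len(centers) - 1)
    diff = w - centers[idx]
    loss = np.abs(diff).sum() if norm == 1 else (diff ** 2).sum()
    return loss / w.size  # divided by m, the number of weights in the layer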
Optionally, the optimization operation comprises the steps of:
t samples {x^(1), x^(2), …, x^(t)} are selected from the neural network training set for learning and optimization of the network, where the target corresponding to x^(i) is y^(i);
the classification error loss L_c of the neural network is calculated, the total quantization loss L_q is calculated, and the two are added in a certain proportion to obtain the total loss L:
where θ is the weight parameter of the neural network model, f(x^(i); θ) is the output of the neural network, L_CE is the cross-entropy loss function, and α denotes the quantization loss scale factor;
the gradient g is calculated using the chain rule:
updating the weight parameters in sequence:
θ=θ-εg
where ε is the learning rate of the optimizer.
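To make the combination concrete, the sketch below computes the scalar L whose gradient g is then taken by the chain rule, after which θ = θ − εg is applied. It reuses quantization_loss from the sketch above; averaging the cross-entropy over the t samples and the exact weighting by α are assumptions of this sketch, not the patent's precise formula.

```python
import numpy as np

def total_loss(logits, labels, layer_weights, layer_ends, layer_centers, alpha):
    """Total loss L = classification error L_c + alpha * total quantization loss L_q.

    logits: (t, classes) network outputs f(x^(i); theta) for the t samples;
    labels: (t,) integer targets y^(i); layer_weights/ends/centers: per-layer lists.
    """
    # cross-entropy classification error L_c, averaged over the mini-batch
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    l_c = -np.log(p[np.arange(len(labels)), labels]).mean()
    # total quantization loss L_q, summed over the layers to be quantized
    l_q = sum(quantization_loss(w, e, c)
              for w, e, c in zip(layer_weights, layer_ends, layer_centers))
    return l_c + alpha * l_q
```

In practice the gradient of this scalar with respect to θ is obtained by automatic differentiation, and the update θ = θ − εg is then taken by the optimizer.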
Optionally, the quantization center update comprises the following steps:
for the parameters w_l of the l-th layer, the mean of the parameters is calculated within the quantization region in which each parameter lies;
after one round of optimization training, the mean u_i^l of the i-th quantization region is the average of the weight parameters lying in that region;
where m_i^l denotes the number of weight parameters in the i-th quantization region, r_i^l the end points of the different quantization regions, and c_i^l the value of the i-th quantization center, i being an integer.
Optionally, the i-th quantization center is updated by combining its previous value with the region mean u_i^l, where β is the quantization center update rate parameter.
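One common way to realise such an update is an exponential-moving-average step, sketched below; the exact form c_i ← (1 − β)·c_i + β·mean_i is an assumption of this sketch, as is leaving regions that currently contain no weights unchanged.

```python
import numpy as np

def update_centers(w, ends, centers, beta):
    """Move each quantization center towards the mean of the weights that
    currently fall in its region (assumed form: c_i <- (1-beta)*c_i + beta*mean_i)."""
    w = np.asarray(w).ravel()
    idx = np.clip(np.searchsorted(ends, w, side="right") - 1, 0, len(centers) - 1)
    new_centers = np.array(centers, dtype=float)
    for i in range(len(centers)):
        members = w[idx == i]
        if members.size:                      # regions with no weights keep their old center
            new_centers[i] = (1 - beta) * centers[i] + beta * members.mean()
    return new_centers
```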
Optionally, the quantization operation comprises the steps of:
the parameters of each layer are quantized in sequence according to the quantization region and the quantization center in which each parameter lies.
Optionally, in the quantization operation each parameter is replaced by the value of the quantization center of the region in which it lies, where q_l denotes the quantization result of the parameters of the l-th layer, c_i^l the value of the i-th quantization center, w_j^l the j-th weight parameter of w_l, r_i^l the end points of the different quantization regions, and R_i^l the range of the i-th quantization region.
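The final quantization step itself can be sketched directly from this description (same region conventions assumed as in the fragments above): every weight is replaced by the center of the region in which it lies.

```python
import numpy as np

def quantize_layer(w, ends, centers):
    """Replace every weight by the value of the quantization center of the
    region in which it lies, giving the quantized layer q_l."""
    w = np.asarray(w)
    idx = np.clip(np.searchsorted(ends, w.ravel(), side="right") - 1, 0, len(centers) - 1)
    return np.asarray(centers)[idx].reshape(w.shape)
```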
The invention has the advantages that:
By adding a layer quantization loss based on the parameter norm to the network, the neural network can train and optimize its parameters during the quantization process, reducing the accuracy loss caused by quantization. Specifically: by automatically dividing the quantization centers, the burden of manual analysis is reduced as far as possible and different network parameters are better accommodated. By calculating the quantization loss, the gap between the current network parameters and the quantization targets is obtained and serves as one of the terms for evaluating and optimizing the degree of quantization. By combining the quantization loss with the training loss, the neural network can maintain high accuracy while being quantized. During training, the quantization centers are continuously updated as the parameters change, which reduces the error introduced during training. After quantization training ends, the network weight parameters are quantized using the obtained quantization regions and centers, achieving the goal of model compression. In summary, the method of the present invention can reduce the drop in accuracy while quantizing the network.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the specific embodiments. The drawings are only for purposes of illustrating the particular embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 illustrates an exemplary flow diagram of a neural network quantization algorithm in accordance with an embodiment of the present invention;
fig. 2 is a network structure diagram illustrating a neural network quantization algorithm according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings of the specification, it being understood that the preferred embodiments described herein are merely for illustrating and explaining the present invention, and are not intended to limit the present invention, and that the embodiments and features of the embodiments in the present invention may be combined with each other without conflict.
The embodiment of the invention provides a neural network quantization algorithm based on a parameter norm; the flow of quantization compression of the neural network provided by the embodiment is shown in fig. 1. By automatically dividing the quantization centers, the burden of manual analysis is reduced as far as possible and different network parameters are better accommodated. By calculating the quantization loss, the gap between the current network parameters and the quantization targets is obtained and serves as one of the terms for evaluating and optimizing the degree of quantization. By combining the quantization loss with the training loss, the neural network can maintain high accuracy while being quantized. During training, the quantization centers are continuously updated as the parameters change, which reduces the error introduced during training. After quantization training ends, the network weight parameters are quantized using the obtained quantization regions and centers, achieving the goal of model compression.
The neural network quantization method in the embodiment of the present invention is described in detail below.
As shown in fig. 1, an exemplary flowchart of a neural network quantization method in an embodiment of the present invention is shown, where the method includes the following steps:
step S101: according to the input neural network initial model, a quantization series parameter n is given to the layer l to be quantized in sequencelI.e. the number of quantization centers. In the embodiment of the present invention, the tested network model is Alexnet, and the structure thereof is shown in fig. 2. Weighting parameter w of l layerlMaking statistics to obtain corresponding maximum value max(wl) And minimum value min (w)l). According to the maximum value and the minimum value, linear division is carried out to obtain different quantization centersAnd a quantization region
The quantities involved are as follows: l denotes the l-th network weight parameter layer, d_l denotes the spacing of adjacent quantization centers or regions, r_i^l denotes the end points of the different quantization regions, R_i^l denotes the range of the i-th quantization region, and c_i^l denotes the value of the i-th quantization center, i being an integer.
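For instance, if min(w_l) = −0.2, max(w_l) = 0.2 and n_l = 4, then d_l = 0.1 and the region end points are −0.2, −0.1, 0, 0.1 and 0.2; taking each center at the midpoint of its region (one possible reading of the linear division), the centers are −0.15, −0.05, 0.05 and 0.15.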
Step S102: parameter w for l layerslAnd calculating the parameter norm loss according to the quantization region and the quantization center where the parameter norm loss is located.Is wlIn (1)The jth weight parameter is found by comparing the rangesIn the quantization region i, calculatingL of1Loss or L2Loss, after summing, to obtain L of the L-th layer1Loss or L2And (4) loss. The classification error is obtained by calculating the cross entropy loss of the label of the network output and input data.
Specifically, the L1 loss |w_j^l − c_i^l| or the L2 loss (w_j^l − c_i^l)^2 of w_j^l is computed; after summing over the weights, the L1 loss or L2 loss of the l-th layer is obtained, where m is the number of all weights in that layer.
Step S103: and updating and optimizing network parameters by using a back propagation algorithm in the field of neural networks according to the calculated classification error loss and layer quantization loss. And then, according to the conditions of whether the requirement of quantization precision is met or not, whether the preset training times are met or not and the like, judging whether the training is continued or is finished, and respectively skipping to the steps S201 and S104. If the training requirement is not met, the process goes to step S201. If the training requirement is met, the process jumps to step S104.
The optimization operation comprises the following steps:
t samples {x^(1), x^(2), …, x^(t)} are selected from the neural network training set for learning and optimization of the network, where the target corresponding to x^(i) is y^(i);
the classification error loss L_c of the neural network is calculated, the total quantization loss L_q is calculated, and the two are added in a certain proportion to obtain the total loss L:
where θ is the weight parameter of the neural network model, f(x^(i); θ) is the output of the neural network, L_CE is the cross-entropy loss function, and α denotes the quantization loss scale factor;
the gradient g is calculated using the chain rule:
updating the weight parameters in sequence:
θ=θ-εg
where ε is the learning rate of the optimizer.
Step S104: and after the quantization is finished, obtaining the training optimization parameters of the quantization layer and the quantization centers updated for multiple times. And carrying out quantization operation in sequence according to the quantization region and the quantization center of the parameter, so that the quantization value of the parameter is equal to the quantization center of the quantization region of the parameter.
The quantization operation comprises the steps of:
q_l is the quantization result of the parameters w_l of the l-th layer; the quantization operation is performed in sequence according to the quantization region and quantization center:
i.e. the parameter quantization value is equal to the quantization center value of the quantization region in which the parameter value is located.
Step S201: parameter w for the l-th layerlAccording to the quantization area where the network weight parameter is located, average value calculation of the parameter is carried out to obtain the average value of each quantization area after the network weight parameter is updatedAnd updating the ith quantization center by combining the quantization center value before updating. After that, the process goes to step S102.
The quantization center update comprises the following steps:
for the parameters w_l of the l-th layer, the mean of the parameters is calculated within the quantization region in which each parameter lies;
after one round of optimization training, the mean u_i^l of the i-th quantization region is the average of the weight parameters lying in that region;
where m_i^l denotes the number of weight parameters in the i-th quantization region, r_i^l the end points of the different quantization regions, and c_i^l the value of the i-th quantization center, i being an integer;
the i-th quantization center is then updated by combining its previous value with the region mean u_i^l, where β is the quantization center update rate parameter.
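Putting the steps of fig. 1 together, the following PyTorch sketch walks once through the whole flow on a toy linear layer; the library, the toy data, the mean-squared quantization loss, the moving-average center update and all hyper-parameter values (n_l, α, β, ε) are illustrative assumptions, not part of the claimed method.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(20, 5)                      # toy stand-in for one layer to be quantized
x, y = torch.randn(256, 20), torch.randint(0, 5, (256,))
n_l, alpha, beta, lr = 8, 1e-3, 0.5, 0.1

# S101: linear division of the weight range into n_l regions and centers
w0 = model.weight.detach()
d = (w0.max() - w0.min()) / n_l
ends = w0.min() + d * torch.arange(n_l + 1)
centers = w0.min() + d * (torch.arange(n_l) + 0.5)

opt = torch.optim.SGD(model.parameters(), lr=lr)
for _ in range(200):
    # S102: quantization loss of the weights against their region centers
    idx = torch.clamp(torch.bucketize(model.weight.detach(), ends) - 1, 0, n_l - 1)
    loss_q = ((model.weight - centers[idx]) ** 2).mean()
    # S103: total loss = classification error + alpha * quantization loss, then back-propagation
    loss = F.cross_entropy(model(x), y) + alpha * loss_q
    opt.zero_grad()
    loss.backward()
    opt.step()
    # S201: move each center towards the mean of the weights currently in its region
    with torch.no_grad():
        w = model.weight.flatten()
        idx = torch.clamp(torch.bucketize(w, ends) - 1, 0, n_l - 1)
        for i in range(n_l):
            members = w[idx == i]
            if members.numel():
                centers[i] = (1 - beta) * centers[i] + beta * members.mean()

# S104: final quantization -- every weight is replaced by its region's center
with torch.no_grad():
    idx = torch.clamp(torch.bucketize(model.weight, ends) - 1, 0, n_l - 1)
    model.weight.copy_(centers[idx])
```

The region end points are kept fixed in this sketch while only the centers move; in a fuller implementation each quantized layer would carry its own n_l, end points and centers.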
According to the invention, by adding a network layer quantization loss based on the parameter norm, the neural network can train and optimize its parameters during the quantization process, reducing the accuracy loss caused by quantization. Specifically: by automatically dividing the quantization centers, the burden of manual analysis is reduced as far as possible and different network parameters are better accommodated. By calculating the quantization loss, the gap between the current network parameters and the quantization targets is obtained and serves as one of the terms for evaluating and optimizing the degree of quantization. By combining the quantization loss with the training loss, the neural network can maintain high accuracy while being quantized. During training, the quantization centers are continuously updated as the parameters change, which reduces the error introduced during training. After quantization training ends, the network weight parameters are quantized using the obtained quantization regions and centers, achieving the goal of model compression. In summary, the method of the present invention can reduce the drop in accuracy while quantizing the network.
The steps of the embodiments of the present invention have now been described. The method provided by the embodiments can thus combine training optimization with parameter quantization and improve the accuracy of the quantized network.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (10)
1. A method for neural network quantization based on parameter norms, the method comprising:
for a given pre-training neural network parameter model, dividing a quantization center by counting the values of the parameters of the required quantization layer;
calculating a quantization loss of the parameter for each quantization layer based on the selected quantization center;
adding the quantization loss and the classification error loss of the pre-training neural network parameter model to obtain a total loss, and performing back propagation optimization;
judging whether the training requirement is met, if so, entering the next step, and if not, updating the quantization center;
and carrying out quantization operation on the corresponding layer according to the quantization center to obtain a quantized compression model.
2. The neural network quantization method of claim 1, wherein said dividing the quantization centers comprises the steps of:
for a neural network model, sequentially giving a quantization progression parameter, namely a quantization center number, to each corresponding layer to be quantized;
counting the weight parameters of each corresponding layer to obtain the maximum value and the minimum value of the weight parameters;
and obtaining different quantization centers and regions according to the maximum value and the minimum value.
3. The neural network quantization method of claim 2, wherein the different quantization centers and regions are specified by the following formula:
wherein l represents the l-th network weight parameter layer, w_l denotes a weight parameter, max(w_l) denotes the maximum value, min(w_l) denotes the minimum value, d_l represents the spacing of adjacent quantization centers or regions, r_i^l represents the end points of the different quantization regions, R_i^l indicates the range of the i-th quantization region, and c_i^l represents the value of the i-th quantization center, where i is an integer.
4. The neural network quantization method of claim 1, wherein said calculating a quantization loss for the parameters of each quantization layer comprises the steps of:
and for the parameters of the layer needing to be quantized, calculating the parameter norm loss according to the quantization region and the quantization center where the parameters are located.
5. The neural network quantization method of claim 3 or 4, wherein w_j^l is the j-th weight parameter of w_l, and the quantization region in which w_j^l lies is found by comparing it with the ranges of the quantization regions;
the L1 loss |w_j^l − c_i^l| or the L2 loss (w_j^l − c_i^l)^2 of w_j^l is computed;
after summing, the L1 loss or L2 loss of the l-th layer is obtained;
where m is the number of all weights in that layer.
6. The neural network quantization method of claim 1, wherein said back propagation optimization operation comprises the steps of:
selecting a plurality of samples from the data set of the pre-trained neural network parameter model;
calculating the classification error loss of the pre-training neural network parameter model, calculating the total quantization loss, and adding the classification error loss and the total quantization loss according to a certain proportion to obtain the total loss;
calculating the gradient using a chain rule;
and updating the weight parameters in sequence.
7. A neural network quantization method according to claim 1 or 3, wherein said quantization centre update comprises the steps of:
calculating the mean value of the parameters of the corresponding quantization layer according to the quantization region where the parameters are located;
after one round of optimization training, the mean u_i^l of the i-th quantization region is obtained as the average of the weight parameters lying in that region, where m_i^l is the number of weight parameters in the i-th quantization region.
8. The neural network quantization method of claim 7,
the i-th quantization center is updated by combining u_i^l, the mean value of the i-th quantization region, with c_i^l, the value of the i-th quantization center, where i is an integer and β is a quantization center update rate parameter.
9. The neural network quantization method of claim 1, wherein said quantization operation comprises the steps of:
the parameters of each layer are quantized in sequence according to the quantization region and the quantization center in which each parameter lies.
10. The neural network quantization method of claim 9, wherein the quantization operation is formulated as follows:
wherein q_l is the quantization result of the parameters of the l-th layer, c_i^l represents the value of the i-th quantization center, w_j^l is the j-th weight parameter of w_l, r_i^l represents the end points of the different quantization regions, and R_i^l indicates the range of the i-th quantization region.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810387893.8A CN108805257A (en) | 2018-04-26 | 2018-04-26 | A kind of neural network quantization method based on parameter norm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108805257A true CN108805257A (en) | 2018-11-13 |
Family
ID=64094082
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810387893.8A Pending CN108805257A (en) | 2018-04-26 | 2018-04-26 | A kind of neural network quantization method based on parameter norm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108805257A (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111406263A (en) * | 2018-11-28 | 2020-07-10 | 深圳市大疆创新科技有限公司 | Method and device for searching neural network architecture |
CN109993296B (en) * | 2019-04-01 | 2020-12-29 | 安徽寒武纪信息科技有限公司 | Quantitative implementation method and related product |
CN109993296A (en) * | 2019-04-01 | 2019-07-09 | 北京中科寒武纪科技有限公司 | Quantify implementation method and Related product |
CN110059822A (en) * | 2019-04-24 | 2019-07-26 | 苏州浪潮智能科技有限公司 | One kind compressing quantization method based on channel packet low bit neural network parameter |
WO2020258071A1 (en) * | 2019-06-26 | 2020-12-30 | Intel Corporation | Universal loss-error-aware quantization for deep neural networks with flexible ultra-low-bit weights and activations |
CN112532251A (en) * | 2019-09-17 | 2021-03-19 | 华为技术有限公司 | Data processing method and device |
CN113361678A (en) * | 2020-03-04 | 2021-09-07 | 北京百度网讯科技有限公司 | Training method and device of neural network model |
CN111738419B (en) * | 2020-06-19 | 2024-01-12 | 北京百度网讯科技有限公司 | Quantification method and device for neural network model |
CN111738419A (en) * | 2020-06-19 | 2020-10-02 | 北京百度网讯科技有限公司 | Quantification method and device of neural network model |
CN112329923A (en) * | 2020-11-24 | 2021-02-05 | 杭州海康威视数字技术股份有限公司 | Model compression method and device, electronic equipment and readable storage medium |
CN112329923B (en) * | 2020-11-24 | 2024-05-28 | 杭州海康威视数字技术股份有限公司 | Model compression method and device, electronic equipment and readable storage medium |
CN114692865A (en) * | 2020-12-31 | 2022-07-01 | 安徽寒武纪信息科技有限公司 | Neural network quantitative training method and device and related products |
CN114363631B (en) * | 2021-12-09 | 2022-08-05 | 慧之安信息技术股份有限公司 | Deep learning-based audio and video processing method and device |
CN114363631A (en) * | 2021-12-09 | 2022-04-15 | 慧之安信息技术股份有限公司 | Deep learning-based audio and video processing method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108805257A (en) | A kind of neural network quantization method based on parameter norm | |
CN109408731B (en) | Multi-target recommendation method, multi-target recommendation model generation method and device | |
Wu et al. | Easyquant: Post-training quantization via scale optimization | |
CN110969251B (en) | Neural network model quantification method and device based on label-free data | |
Pomerat et al. | On neural network activation functions and optimizers in relation to polynomial regression | |
CN109002889B (en) | Adaptive iterative convolution neural network model compression method | |
CN110809772A (en) | System and method for improving optimization of machine learning models | |
JP6950756B2 (en) | Neural network rank optimizer and optimization method | |
CN110826692B (en) | Automatic model compression method, device, equipment and storage medium | |
Mo et al. | Neural architecture search for keyword spotting | |
CN111898689A (en) | Image classification method based on neural network architecture search | |
CN109034175B (en) | Image processing method, device and equipment | |
CN115774854B (en) | Text classification method and device, electronic equipment and storage medium | |
CN110717103A (en) | Improved collaborative filtering method based on stack noise reduction encoder | |
CN113961765A (en) | Searching method, device, equipment and medium based on neural network model | |
Li et al. | Using feature entropy to guide filter pruning for efficient convolutional networks | |
Zhao et al. | An investigation on different underlying quantization schemes for pre-trained language models | |
CN112598078B (en) | Hybrid precision training method and device, electronic equipment and storage medium | |
Kwak et al. | Quantization aware training with order strategy for CNN | |
CN113656707A (en) | Financing product recommendation method, system, storage medium and equipment | |
US11507782B2 (en) | Method, device, and program product for determining model compression rate | |
US20200372363A1 (en) | Method of Training Artificial Neural Network Using Sparse Connectivity Learning | |
CN116976461A (en) | Federal learning method, apparatus, device and medium | |
Yamada et al. | Weight Features for Predicting Future Model Performance of Deep Neural Networks. | |
US11195094B2 (en) | Neural network connection reduction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20181113 |