WO2020237904A1 - A neural network compression method based on power exponent quantization - Google Patents
A neural network compression method based on power exponent quantization
- Publication number
- WO2020237904A1 (PCT/CN2019/105485, CN2019105485W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- weight parameter
- neural network
- weight
- quantized
- power exponent
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
Definitions
- The invention discloses a neural network compression method based on power exponent quantization, relates to artificial-intelligence neural network technology, and belongs to the technical field of computing, calculating, and counting.
- Deep Neural Network (DNN)
- The main strategy of existing quantization compression methods is to restrict the parameters of a convolutional neural network to a finite set of values instead of the traditional real number set, so that the same data can be represented with fewer bits and storage space is saved.
- Although methods such as binary and ternary quantization can reduce hardware complexity, the classification accuracy loss caused by quantization is too large to meet requirements.
- Dynamic-precision methods can meet the accuracy requirements, but their hardware implementation is very complex, which hinders deployment on embedded terminal devices.
- Training a deep neural network itself already consumes substantial computing resources and time.
- Forming and updating the huge parameter matrices requires a large amount of computing resources and time.
- Ordinary quantization compression methods can effectively address this cost in computing resources and time, but they sacrifice the classification accuracy of the network model, usually by a non-negligible margin, and the huge parameter model still cannot be deployed effectively on miniaturized embedded terminal devices.
- This application aims to provide a neural network compression scheme based on power exponent quantization that better mitigates the precision loss after quantization compression and reduces the difficulty of hardware implementation.
- The purpose of the present invention is to address the shortcomings of the above background technology by providing a neural network compression method based on power exponent quantization. The method quantizes a full-precision network model into a low-precision model, substantially reduces the effect of quantization on classification accuracy, lowers the complexity of deploying the network model on hardware, and thereby solves the technical problems of accuracy loss and difficult hardware implementation that affect existing quantization compression methods.
- A network compression method based on power exponent quantization includes the following steps:
- Step 1: Using a public data set and network model from a network resource library, tune the parameters and train the convolutional neural network on the data set; test the obtained weight parameters and adjust the network model and parameters until the trained model reaches the target accuracy, yielding the full-precision weight parameters.
- Step 2: Determine the grouping threshold according to the split rate of each layer and the sorted absolute values of the weight parameters. According to this threshold, divide the weight parameters into two mutually exclusive groups: the group whose absolute values are less than the threshold remains unchanged, and the group whose absolute values are greater than the threshold is quantized.
- Step 3: Based on the preset bit width, compare each original weight in the above-threshold group with the powers of 2 in the bit-width range and with 0, and quantize it to the power of 2 or 0 with the smallest difference from the original weight.
- Step 4: Return the group of weights whose absolute values are less than the threshold to the original network model and data set for retraining, and update the original weight parameters with the retrained weight parameters.
- Step 5: Divide the retrained weight parameters into two groups according to the grouping threshold and grouping method of Step 2; quantize the group whose absolute values are greater than the threshold, keep the quantized weights unchanged, and retrain the unquantized weights in the network so that it converges again.
- Step 6: Repeat Steps 2 to 5 until all network weights are quantized.
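One grouping-and-quantization pass of the steps above can be sketched in NumPy. This is a minimal illustration only: the patent trains in Caffe, and the function names, the in-place update, and the example split rate are assumptions.

```python
import numpy as np

def nearest_pow2_or_zero(w, n2, n1):
    """Quantize a single weight to +/-2^k (n2 <= k <= n1) or 0,
    whichever has the smallest absolute difference."""
    candidates = [0.0]
    for k in range(n2, n1 + 1):
        candidates.extend([2.0 ** k, -(2.0 ** k)])
    return min(candidates, key=lambda c: abs(w - c))

def quantize_step(weights, mask, split_rate, n2, n1):
    """One pass of Steps 2-3: quantize the largest-magnitude fraction
    of the still-trainable weights. `mask` is 1 for not-yet-quantized
    weights and 0 for frozen (already quantized) ones."""
    free = np.flatnonzero(mask)                     # still-trainable indices
    order = np.argsort(-np.abs(weights[free]))      # sort by |w|, descending
    n_quant = int(np.ceil(split_rate * len(free)))  # portion to quantize now
    for i in free[order[:n_quant]]:
        weights[i] = nearest_pow2_or_zero(weights[i], n2, n1)
        mask[i] = 0                                 # freeze quantized weight
    return weights, mask
```

Retraining (Steps 4-5) would then update only the entries where the mask is still 1, and the pass repeats until the mask is all zeros.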
- The database for network training is the ImageNet data set.
- The data is trained and quantized based on the Caffe deep learning framework.
- Floating-point calculations are used to test the data of each layer during forward propagation; the network model and parameters are then adjusted so that the model reaches the highest possible classification accuracy, and the weight parameters at that point are obtained.
- The specific grouping method is: sort the weight tensor by absolute value and determine the corresponding threshold according to the selected split rate; put the weights whose absolute values are greater than or equal to the threshold into one group and the weights whose absolute values are less than the threshold into the other, so that the two groups are mutually exclusive.
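A threshold chosen this way is simply an order statistic of the absolute values. A minimal sketch (interpreting the split rate as the fraction of weights quantized per pass is an assumption):

```python
import numpy as np

def grouping_threshold(weights, split_rate):
    """Return the |w| value at the boundary: the top `split_rate`
    fraction of weights by magnitude falls at or above it."""
    flat = np.sort(np.abs(weights).ravel())[::-1]  # |w| in descending order
    k = max(1, int(np.ceil(split_rate * flat.size)))
    return flat[k - 1]                             # k-th largest |w|
```

For example, with weights [0.9, 0.3, -0.6, 0.05] and a split rate of 0.5, the threshold is 0.6, so 0.9 and -0.6 form the group to be quantized first.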
- The method of selecting a power of 2 or 0 to replace an original weight is: take the difference between the original weight parameter and each candidate, namely 2 raised to every power exponent in the range n2 to n1, the opposite of each such power of 2, and 0; then choose the power of 2, the opposite of a power of 2, or 0 with the smallest difference to replace the original weight parameter.
- In step 4, the way to ensure that quantized weights are not updated is: after each weight is quantized, its accompanying "company" variable changes from the initial 1 to 0, while the company variable of each unquantized weight remains 1. During the update, the company variable is multiplied by the gradient, ensuring that the gradient of every quantized weight is 0, so that only the unquantized weights are updated.
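The company variable is, in effect, a per-weight gradient mask. A minimal sketch of the masked update (plain SGD here is an assumption; the patent performs retraining inside Caffe):

```python
import numpy as np

def masked_sgd_step(weights, grads, company, lr=0.01):
    """Update only unquantized weights: `company` is 1 where a weight
    is still trainable and 0 where it has been quantized, so frozen
    weights receive an effective gradient of zero."""
    return weights - lr * grads * company
```

A quantized weight (company = 0) keeps its power-of-2 value exactly, since its gradient contribution is zeroed out before the update.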
- The present invention proposes a neural network compression method that keeps the range of parameter values uncompressed and, to a certain extent, reduces the impact on final classification accuracy. Quantizing the initial full-precision network parameter model to powers of 2 or 0 reduces the scale of the network weight parameter model while also limiting the effect of quantization on classification accuracy.
- Because the invention quantizes the initial parameter model to powers of 2 or 0, the multiplication operations involved in neural network inference and training can be realized by simple shifts in hardware, and the quantized low-precision weights can be stored as compact codes. This reduces the storage size of the model on embedded devices, lowers the hardware complexity of the network parameter model, and cuts the amount of computation and the number of parameters during network operation, which facilitates applying network models on compact embedded terminal equipment and other hardware.
- FIG. 1 is a flowchart of the present invention.
- FIG. 2 is a schematic diagram of the structure of partial quantization and retraining.
- the present invention provides a neural network compression method based on power exponent quantization as shown in Fig. 1, and includes the following five steps.
- Step 1, obtain the initialization parameter model: obtain the data set, adjust the network model, set the parameters, and train the network model until it converges and reaches a certain accuracy, obtaining the full-precision network model weight parameters.
- ImageNet can be selected as the training database.
- Step 2, weight parameter grouping: pre-set different split rates for different network models and data sets, then determine the current grouping threshold from the absolute values of the weights and the split rate. The weight parameters whose absolute values are greater than the threshold are divided into one group to be quantized, and those whose absolute values are less than the threshold are divided into another group to be retrained.
- Step 3, quantization: n1 and n2 are determined from the initial parameter matrix, and the value range of a quantized weight parameter is {±2^n2, ..., ±2^n1, 0}. Traverse the power exponent range [n2, n1], take the difference between the current weight parameter to be quantized and each power of 2, the opposite of each power of 2, and 0, and find the candidate with the smallest difference from the weight; replace the original weight parameter to be quantized with that power of 2, opposite of a power of 2, or 0, thereby quantizing the original weight parameter.
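This traversal can be vectorized over a whole weight tensor by comparing every weight against all candidates at once (a sketch; the candidate-set layout {0, ±2^n2, ..., ±2^n1} is taken from the text, the implementation itself is an assumption):

```python
import numpy as np

def quantize_tensor(weights, n2, n1):
    """Map each weight to the member of {0, +/-2^n2, ..., +/-2^n1}
    with the smallest absolute difference."""
    pows = 2.0 ** np.arange(n2, n1 + 1)
    cands = np.concatenate(([0.0], pows, -pows))              # all candidates
    diffs = np.abs(weights.ravel()[:, None] - cands[None, :]) # |w - candidate|
    return cands[np.argmin(diffs, axis=1)].reshape(weights.shape)
```

For a tensor with values [0.9, -0.6, 0.01] and the range [-4, 0], this yields [1.0, -0.5, 0.0]: the last weight is closer to 0 than to the smallest magnitude 2^-4.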
- Step 4, partial retraining: return the part whose absolute values are less than the threshold to the original network model and retrain it on the same data set, so that the network converges again and the weight parameters are updated.
- Step 5, cyclic grouping quantization: keep the quantized weight parameters unchanged, regroup the retrained weight parameters according to the grouping threshold and their absolute values after retraining, quantize the group whose absolute values are greater than the threshold, and retrain and update the remaining group whose absolute values are less than the threshold so that the network converges again. Repeat this cyclic grouping-quantization process until all network parameters are quantized.
- FIG. 2 shows a schematic diagram of the partial quantization and training structure of the present invention.
- Weight parameters are quantized to a power of 2, the opposite of a power of 2, or 0; the light gray marks indicate the 11 unquantized weight parameters that still participate in retraining.
- After retraining, 5 of those 11 weight parameters whose absolute values are greater than the threshold are selected for quantization, and the remaining 6 weights are retrained. The above steps are repeated until all 16 weight parameters are quantized.
- the present invention also provides a feasible solution for the quantization requirements of different bit widths.
- The weights can be quantized to different powers of 2, where the power exponent takes values between n1 and n2: n1 is determined by the largest absolute value among the weight parameters, and n2 is determined by the configured bit width b.
- Different quantization bit widths lead to different detection accuracy after quantization; in general, the larger the quantization bit width, the smaller the effect of quantization on the original accuracy.
- The quantization bit width can be changed as needed, and the original weight parameter model can be quantized into a weight parameter model composed of powers of 2 or 0, so as to adapt to the different bit-width requirements of different devices.
- Keeping the parameter value range uncompressed reduces the impact of quantization on the final classification accuracy to a certain extent.
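One plausible reading of how b, n1, and n2 interact is sketched below. The exact formulas are not given in the text; the choices here (n1 from the largest absolute weight, and b bits split into a sign bit plus 2^(b-1) - 1 magnitude codes with one code reserved for zero-like use) are assumptions in the style of incremental network quantization schemes.

```python
import math

def exponent_range(max_abs_weight, b):
    """Pick [n2, n1] so that b bits can encode a sign and
    2**(b - 1) - 1 distinct power-of-2 magnitudes."""
    n1 = math.floor(math.log2(max_abs_weight))  # largest useful exponent
    levels = 2 ** (b - 1) - 1                   # representable magnitudes
    n2 = n1 - (levels - 1)                      # smallest exponent kept
    return n2, n1
```

With max |w| = 0.9 and b = 4, this gives n1 = -1 and n2 = -7, i.e. candidate values {0, ±2^-7, ..., ±2^-1}; enlarging b extends the range downward, matching the observation that larger bit widths hurt accuracy less.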
- The hardware complexity of the network parameter model is reduced because multiplications can be implemented as simple shifts, which cuts the amount of computation and the number of parameters during network operation and facilitates implementing network models on small embedded terminal devices.
Abstract
Description
Claims (7)
- A neural network compression method based on power exponent quantization, characterized in that: the initial weight parameter values are obtained when the neural network reaches the target accuracy; a grouping threshold is determined according to the split rate of each layer of the neural network and the sorted absolute values of the weight parameters; the weight parameters whose absolute values are greater than the threshold are divided into one group of data to be quantized, and those whose absolute values are less than the threshold are divided into another group of data to be retrained; the power exponent value range is determined from the weight parameter with the largest absolute value among the initial values and the configured quantization bit width; the exponent range is traversed, and each parameter to be quantized is replaced by the base-2 power, the opposite of a base-2 power, or 0 with the smallest difference from it; and, according to the grouping threshold, the cyclic operation of grouping and re-quantizing the retrained data is performed until the network converges.
- The neural network compression method based on power exponent quantization according to claim 1, characterized in that quantized weight parameters are guaranteed not to be updated during the cyclic grouping and re-quantization operation, specifically: the company variable of each quantized weight parameter changes from the initial 1 to 0, while the company variable of each unquantized weight parameter remains 1; when retraining the weight parameters whose absolute values are less than the threshold, the company variable of each quantized weight parameter is multiplied by the gradient to ensure that the gradient of the quantized weight parameter is 0.
- The neural network compression method based on power exponent quantization according to claim 1, characterized in that the ImageNet data set is selected as the database for network training.
- The neural network compression method based on power exponent quantization according to claim 1, characterized in that the data is trained and quantized based on the Caffe learning framework; floating-point calculation is used during forward propagation to test the data of each layer, and the network model and parameters are adjusted to obtain the initial weight parameter values at which the neural network reaches the target accuracy.
- A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the compression method of claim 1.
- A miniaturized embedded terminal device, characterized by comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the compression method of claim 1 according to the bit width when executing the program.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---
CN201910445413.3 | 2019-05-27 | |
CN201910445413.3A CN110245753A (zh) | 2019-05-27 | 2019-05-27 | A neural network compression method based on power exponent quantization
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020237904A1 (zh) | 2020-12-03
Family
ID=67885162
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/105485 WO2020237904A1 (zh) | 2019-05-27 | 2019-09-11 | A neural network compression method based on power exponent quantization
PCT/CN2020/071124 WO2020238237A1 (zh) | 2019-05-27 | 2020-01-09 | A neural network compression method based on power exponent quantization
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/071124 WO2020238237A1 (zh) | 2019-05-27 | 2020-01-09 | A neural network compression method based on power exponent quantization
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110245753A (zh) |
WO (2) | WO2020237904A1 (zh) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110245753A (zh) * | 2019-05-27 | 2019-09-17 | Southeast University | A neural network compression method based on power exponent quantization |
CN112561050B (zh) * | 2019-09-25 | 2023-09-05 | Hangzhou Hikvision Digital Technology Co., Ltd. | A neural network model training method and device |
CN110852439B (zh) * | 2019-11-20 | 2024-02-02 | ByteDance Ltd. | Data processing method and device, and storage medium |
CN111222561B (zh) * | 2019-12-31 | 2023-06-09 | Shenzhen Intellifusion Technologies Co., Ltd. | Image recognition neural network processing method, device and system |
CN113112009B (zh) * | 2020-01-13 | 2023-04-18 | Cambricon Technologies Corp., Ltd. | Method, device and computer-readable storage medium for neural network data quantization |
CN111563593B (zh) * | 2020-05-08 | 2023-09-15 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Training method and device for neural network models |
CN113487036B (zh) * | 2021-06-24 | 2022-06-17 | Zhejiang University | Distributed training method and device for machine learning models, electronic equipment, and medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107644254A (zh) * | 2017-09-09 | 2018-01-30 | Fudan University | Convolutional neural network weight parameter quantization training method and system |
US20180075338A1 (en) * | 2016-09-12 | 2018-03-15 | International Business Machines Corporation | Convolutional neural networks using resistive processing unit array |
CN109102064A (zh) * | 2018-06-26 | 2018-12-28 | Hangzhou Xiongmai Integrated Circuit Technology Co., Ltd. | A high-precision neural network quantization compression method |
CN109344893A (zh) * | 2018-09-25 | 2019-02-15 | Central China Normal University | An image classification method and system based on a mobile terminal |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107688850B (zh) * | 2017-08-08 | 2021-04-13 | Xilinx, Inc. | A deep neural network compression method |
CN108229681A (zh) * | 2017-12-28 | 2018-06-29 | Zhengzhou Yunhai Information Technology Co., Ltd. | Neural network model compression method, system, device and readable storage medium |
CN109523016B (zh) * | 2018-11-21 | 2020-09-01 | University of Jinan | Multi-value quantization deep neural network compression method and system for embedded systems |
CN109635936A (zh) * | 2018-12-29 | 2019-04-16 | Hangzhou Guoxin Technology Co., Ltd. | A neural network pruning and quantization method based on retraining |
CN110245753A (zh) * | 2019-05-27 | 2019-09-17 | Southeast University | A neural network compression method based on power exponent quantization |
2019
- 2019-05-27: CN application CN201910445413.3A, patent CN110245753A (zh), status: pending
- 2019-09-11: WO application PCT/CN2019/105485, patent WO2020237904A1 (zh), status: application filing
2020
- 2020-01-09: WO application PCT/CN2020/071124, patent WO2020238237A1 (zh), status: application filing
Also Published As
Publication number | Publication date |
---|---|
CN110245753A (zh) | 2019-09-17 |
WO2020238237A1 (zh) | 2020-12-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020238237A1 (zh) | A neural network compression method based on power exponent quantization | |
CN106250939B (zh) | Handwritten character recognition method based on an FPGA+ARM multilayer convolutional neural network | |
CN111242287A (zh) | A neural network compression method based on channel L1-norm pruning | |
US11775833B2 (en) | Accelerated TR-L-BFGS algorithm for neural network | |
CN107239825B (zh) | Deep neural network compression method considering load balancing | |
CN107516129B (zh) | Deep network compression method based on dimension-adaptive Tucker decomposition | |
CN108229681A (zh) | Neural network model compression method, system, device and readable storage medium | |
WO2021258752A1 (zh) | A 4-bit quantization method and system for neural networks | |
CN106570559A (zh) | A data processing method and device based on neural networks | |
CN110276451A (zh) | A deep neural network compression method based on weight normalization | |
CN113011571B (zh) | INT8 offline quantization and integer inference method based on the Transformer model | |
CN111126602A (zh) | A recurrent neural network model compression method based on convolution kernel similarity pruning | |
CN108985453A (zh) | Deep neural network model compression method based on asymmetric ternary weight quantization | |
CN108734264A (zh) | Deep neural network model compression method and device, storage medium, and terminal | |
CN110837890A (zh) | A fixed-point weight quantization method for lightweight convolutional neural networks | |
CN111160524A (zh) | A two-stage convolutional neural network model compression method | |
CN112598129A (zh) | Tunable hardware-aware pruning and mapping framework based on a ReRAM neural network accelerator | |
TW202022798A (zh) | Method for processing convolutional neural networks | |
WO2023020456A1 (zh) | Quantization method, apparatus, device and storage medium for network models | |
CN112488070A (zh) | A neural network compression method for object detection in remote sensing images | |
WO2022222649A1 (zh) | Training method, apparatus, device and storage medium for neural network models | |
CN114677548A (zh) | Neural network image classification system and method based on resistive memory | |
CN110110852B (zh) | A method for porting deep learning networks to the FPGA platform | |
CN112990420A (zh) | A pruning method for convolutional neural network models | |
Qi et al. | Learning low resource consumption cnn through pruning and quantization |
Legal Events
Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19930744; Country of ref document: EP; Kind code of ref document: A1
| NENP | Non-entry into the national phase | Ref country code: DE
| 122 | Ep: pct application non-entry in european phase | Ref document number: 19930744; Country of ref document: EP; Kind code of ref document: A1
| 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 23/05/2022)