WO2020237904A1 - A neural network compression method based on power-exponent quantization - Google Patents

A neural network compression method based on power-exponent quantization

Info

Publication number
WO2020237904A1
Authority
WO
WIPO (PCT)
Prior art keywords
weight parameter
neural network
weight
quantized
power exponent
Prior art date
Application number
PCT/CN2019/105485
Other languages
English (en)
French (fr)
Inventor
陆生礼
庞伟
刘昊
樊迎博
花硕硕
缪烨昊
Original Assignee
东南大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 东南大学
Publication of WO2020237904A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Definitions

  • The invention discloses a neural network compression method based on power-exponent quantization, relates to artificial-intelligence neural network technology, and belongs to the technical field of computing, calculating, and counting.
  • The main strategy of existing quantization compression methods is to restrict the range of a convolutional neural network's parameters to a finite set instead of the traditional set of real numbers, so that fewer bits can represent the same data and storage space is saved.
  • Although methods such as binary and ternary quantization can reduce hardware complexity, the classification accuracy drops too much under their quantization to meet requirements.
  • Although dynamic precision can meet the accuracy requirements, its hardware implementation is very complex, which is unfavorable for application in embedded terminal devices.
  • The training of a deep neural network itself already consumes substantial computing resources and time.
  • The formation and updating of the huge parameter matrix require large amounts of computing resources and time.
  • An ordinary quantization compression method can effectively reduce this cost in computing resources and time, but it sacrifices the classification accuracy of the network model, usually by a non-negligible amount, and the huge parameter model still cannot be applied effectively to miniaturized embedded terminal devices.
  • This application aims to provide a neural network compression scheme based on power-exponent quantization that better addresses the loss of precision after quantization compression and reduces the difficulty of hardware implementation.
  • The purpose of the present invention is to address the shortcomings of the above background technology by providing a neural network compression method based on power-exponent quantization, which quantizes a full-precision network model into a low-precision model, substantially reducing the effect of quantization on classification accuracy while lowering the complexity of applying the network model in hardware, and solving the technical problems that neural networks compressed by existing quantization methods lose accuracy and are difficult to implement in hardware.
  • A network compression method based on power-exponent quantization includes the following steps:
  • Step 1: Using a data set and network model published in a public network repository, tune the parameters and train the convolutional neural network on the data set; test the resulting weight parameters and adjust the network model and parameters until the trained model reaches the target accuracy, obtaining the weight parameters at high precision;
  • Step 2: For the initialized weight parameters, determine the grouping threshold from each layer's split rate and the ranking of the weight parameters' absolute values; divide the weight parameters by this threshold into two mutually exclusive groups, where the group of weights whose absolute values are below the threshold remains unchanged and the group whose absolute values exceed the threshold is quantized;
  • Step 3: Based on the preset bit width, compare each original weight against every power of 2 within the bit-width range and against 0, and quantize each weight in the group whose absolute values exceed the threshold to the power of 2 or 0 with the smallest difference from the original weight;
  • Step 4: Return the group of weights whose absolute values are below the threshold to the original network model and data set for retraining, and update the original weight parameters with the retrained weight parameters;
  • Step 5: Divide the retrained weight parameters into two groups according to the grouping threshold and grouping method of step 2, quantize the group whose absolute values exceed the threshold, and retrain the group whose absolute values are below it, keeping the quantized weights unchanged and retraining the unquantized weights in the network so that the network converges again;
  • Step 6: Repeat steps 2 to 5 until all network weights have been quantized.
  • The database for network training is the ImageNet data set.
  • The data are trained and quantized based on the Caffe learning framework.
  • Floating-point computation is used to test the data of each layer during forward propagation, and the network model and parameters are then adjusted so that the network model reaches the highest possible classification accuracy, yielding the weight parameters at that point.
  • The specific grouping method is: sort the weight tensor by absolute value, determine the corresponding threshold from the chosen split rate, place the weights whose absolute values are greater than or equal to the threshold in one group, and place those whose absolute values are below the threshold in another group, so that the two groups are mutually exclusive.
  • The method of selecting the power of 2 or 0 that replaces an original weight is: take the difference between the original weight parameter and each power of 2, each negative power of 2 (with power exponent in the range n2 to n1), and 0, and use the power of 2, negative power of 2, or 0 with the smallest difference to replace the original weight parameter.
  • In step 4, quantized weights are kept from updating as follows: after each weight is quantized, the company variable attached to that weight changes from its initial 1 to 0, while the company variable of every unquantized weight remains 1; during an update, the company variable is multiplied with the gradient, guaranteeing that the gradient of every quantized weight is 0, so that only the unquantized weights are updated.
  • The present invention proposes a neural network compression method that keeps the parameter value range uncompressed and, while limiting the impact on the final classification accuracy to a certain extent, quantizes the initial full-precision network parameter model to powers of 2 or 0, reducing the scale of the network weight parameter model as well as the effect of quantization on classification accuracy;
  • By quantizing the initialized parameter model to powers of 2 or 0, the present invention allows the multiplications involved in neural network inference and training to be realized by simple shifts in hardware, and the quantized low-precision weights can be stored in encoded form. This shrinks the model's storage footprint on embedded devices, lowers the complexity of implementing the network parameter model in hardware, and reduces the computation and parameter volume during network operation, making it convenient to deploy the network model on hardware such as small embedded terminal devices.
  • FIG. 1 is a flowchart of the present invention.
  • FIG. 2 is a schematic diagram of the partial quantization and retraining structure.
  • The present invention provides a neural network compression method based on power-exponent quantization, as shown in FIG. 1, comprising the following five steps.
  • Step 1, obtain the initialized parameter model: obtain the data set, adjust the network model, and set the parameters to train the network model until it converges and reaches a certain accuracy, yielding the full-precision network model weight parameters.
  • ImageNet can be selected as the training database.
  • Step 2, weight parameter grouping: preset different split rates for different network models and data sets, then determine the current grouping threshold from the sorted absolute values of the weights and the split rate; the weight parameters whose absolute values exceed the threshold form one group to be quantized, and the weight parameters whose absolute values are below the threshold form another group to be retrained.
  • n1 and n2 are determined from the initial parameter matrix, and the quantized weight parameters take values in the set {±2^n1, ±2^(n1-1), …, ±2^n2, 0}; a worked numeric example follows at the end of this section. Traverse the power-exponent range [n2, n1], subtract each power of 2, each negative power of 2, and 0 from the current weight parameter to be quantized, find the power of 2, negative power of 2, or 0 with the smallest difference from that weight, and replace the original weight parameter to be quantized with it, thereby quantizing the original weight parameter.
  • Step 4, partial retraining: the portion whose absolute values are below the threshold is returned to the original network model and retrained on the same data set so that the network converges again, updating to obtain new weight parameters.
  • Step 5, cyclic grouping quantization: keep the already quantized weight parameters unchanged, regroup the retrained weight parameters according to the grouping threshold and their post-retraining absolute values, quantize the group whose absolute values exceed the threshold, keep the quantized weight parameters fixed, and retrain and update the other group whose absolute values are below the threshold so that the network converges again. Repeat this cycle of grouping and quantization until all network parameters are quantized.
  • FIG. 2 shows a schematic diagram of the partial quantization and retraining structure of the present invention.
  • The dark gray marks indicate weight parameters quantized to a power of 2, the negative of a power of 2, or 0, and the light gray marks indicate the 11 unquantized weight parameters that must participate in retraining.
  • From the 11 retrained weight parameters, the 5 weights whose absolute values exceed the threshold are again selected for quantization, and the remaining 6 weights are retrained. These steps are repeated until all 16 weight parameters are quantized.
  • The present invention also provides a feasible solution for quantization requirements at different bit widths.
  • The weights can be quantized to different powers of 2, where the power exponent lies between n2 and n1; n1 is determined by the weight parameter with the largest absolute value, and n2 is determined by the chosen bit width b.
  • Different quantization bit widths yield different post-quantization target-detection accuracies; to a certain extent, the larger the quantization bit width, the smaller the effect of quantization on the original accuracy.
  • The quantization bit width can be changed as needed, and the original weight parameter model can be quantized into a weight parameter model composed of powers of 2 or 0, meeting the differing bit-width requirements of different devices.
  • Keeping the parameter value range uncompressed reduces, to a certain extent, the impact of quantization on the final classification accuracy.
  • The complexity of implementing the network parameter model in hardware is reduced, since multiplications in hardware are realized by simple shifts; the computation and parameter volume during network operation shrink, which facilitates implementing such network models on small embedded terminal devices.
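  • As a worked numeric illustration of the two formulas above (the figures are illustrative assumptions, not taken from the patent): with bit width b = 5 and a largest absolute weight max_weight = 0.9, n1 = floor(log2(0.9 * 4.0/3.0)) = floor(log2(1.2)) = 0 and n2 = n1 + 1 - 2^(b-2) = 1 - 8 = -7, so every quantized weight takes a value in {±2^0, ±2^-1, …, ±2^-7, 0}; for example, an original weight of 0.30 would be replaced by 2^-2 = 0.25, the member of that set closest to it.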

Abstract

The present invention relates to the technical field of artificial-intelligence neural networks and specifically discloses a neural network compression method based on power-exponent quantization. In the method, a convolutional neural network is trained on an external data set and its initialized weight parameters are obtained; the weight parameters are divided into two groups according to the magnitude of their absolute values and a grouping threshold; the group whose absolute values exceed the grouping threshold is quantized, based on a preset bit width and the weight parameter with the largest absolute value, to powers of 2 or 0; the weight parameters below the grouping threshold are retrained, and the cycle of grouping and requantization is then repeated until the network converges. The present invention keeps the parameter value range uncompressed while reducing, to a certain extent, the impact of quantization on the final target-detection accuracy, solving the problems of excessive accuracy loss after quantization and high difficulty of hardware implementation.

Description

A neural network compression method based on power-exponent quantization
Technical Field
The invention discloses a neural network compression method based on power-exponent quantization, relates to artificial-intelligence neural network technology, and belongs to the technical field of computing, calculating, and counting.
Background Art
In recent years, with the growth of computing power, the accumulation of big data, and advances in learning algorithms, deep learning represented by the deep neural network (DNN) has gradually replaced traditional machine-learning algorithms and is widely applied in natural language processing, target detection and recognition, data mining, and many other fields, becoming the foremost research focus in today's artificial-intelligence field. The success of deep learning relies heavily on huge data sets and parameters numbering in the millions, but both inference and training require enormous storage space and computing resources and consume large amounts of energy, which has spurred research on the quantization compression of neural network models. Quantizing and compressing huge parameter models, and replacing floating-point operations with quantized low-bit-width fixed-point numbers, makes it possible to apply deep neural networks in embedded terminal devices such as mobile phones.
The main strategy of existing quantization compression methods is to restrict the range of a convolutional neural network's parameters to a finite set instead of the traditional set of real numbers, so that fewer bits can represent the same data and storage space is saved. Among such methods, binary and ternary quantization can lower hardware complexity, but the classification accuracy drops too much under their quantization to meet requirements; dynamic precision can meet the accuracy requirements, but its hardware implementation is very complex, which is unfavorable for application in embedded terminal devices.
The training of a deep neural network itself already consumes substantial computing resources and time, and forming and updating the huge parameter matrix demands even more. Ordinary quantization compression methods can effectively reduce this cost in computing resources and time, but they sacrifice the classification precision of the network model, usually by a non-negligible amount, and the huge parameter model still cannot be applied effectively to miniaturized embedded terminal devices. This application aims to provide a neural network compression scheme based on power-exponent quantization that better addresses the precision loss after quantization compression and reduces the difficulty of hardware implementation.
Summary of the Invention
The purpose of the present invention is to address the shortcomings of the above background technology by providing a neural network compression method based on power-exponent quantization, which quantizes a full-precision network model into a low-precision model, substantially reducing the effect of quantization on classification accuracy while lowering the complexity of applying the network model in hardware, thereby solving the technical problems that neural networks compressed by existing quantization compression methods lose accuracy and are difficult to implement in hardware.
To achieve the above objective, the present invention adopts the following technical solution:
A network compression method based on power-exponent quantization includes the following steps:
Step 1: Using a data set and network model published in a public network repository, tune the parameters and train the convolutional neural network on the data set; test the resulting weight parameters and adjust the network model and parameters until the trained model reaches the target accuracy, obtaining the weight parameters at high precision;
Step 2: For the initialized weight parameters, determine the grouping threshold from each layer's split rate and the ranking of the weight parameters' absolute values, and divide the weight parameters by this threshold into two mutually exclusive groups, where the group of weights whose absolute values are below the threshold remains unchanged and the group whose absolute values exceed the threshold is quantized;
Step 3: Based on the preset bit width, compare each original weight against every power of 2 within the bit-width range and against 0, and quantize each weight in the group whose absolute values exceed the threshold to the power of 2 or 0 with the smallest difference from the original weight;
Step 4: Return the group of weights whose absolute values are below the threshold to the original network model and data set for retraining, and update the original weight parameters with the retrained weight parameters;
Step 5: Divide the retrained weight parameters into two groups according to the grouping threshold and grouping method of step 2, quantize the group whose absolute values exceed the threshold, and retrain the group whose absolute values are below it, keeping the quantized weights unchanged and retraining the unquantized weights in the network so that the network converges again;
Step 6: Repeat steps 2 to 5 until all network weights have been quantized.
In step 1 above, the ImageNet data set is selected as the database for network training.
In step 1 above, the data are trained and quantized based on the Caffe learning framework; floating-point computation is used during forward propagation to test the data of each layer, and the network model and parameters are then adjusted so that the network model reaches the highest possible classification accuracy, yielding the weight parameters at that point.
In step 2 above, the specific grouping method is: sort the weight tensor by absolute value, determine the corresponding threshold from the chosen split rate, place the weights whose absolute values are greater than or equal to the threshold in one group, and place the weights whose absolute values are below the threshold in another group, so that the two groups are mutually exclusive.
In step 3 above, after the bit width b to be quantized is set, the upper limit n1 of the power exponent, which corresponds to the power of 2 for the largest weight in the weight matrix, is computed as n1 = floor(log2(max_weight * 4.0/3.0)), where max_weight is the weight with the largest absolute value in the weight matrix and floor denotes rounding down. The lower limit n2 of the power exponent is obtained from the formula n2 = n1 + 1 - 2^(b-2); that is, the power exponent ranges from n2 to n1, and the quantized weight parameters take values in the set {±2^n1, ±2^(n1-1), …, ±2^n2, 0}.
In step 3 above, the power of 2 or 0 that replaces an original weight is selected as follows: take the difference between the original weight parameter and each power of 2, each negative power of 2 (with power exponent in the range n2 to n1), and 0, and use the power of 2, negative power of 2, or 0 with the smallest difference to replace the original weight parameter.
In step 4 above, quantized weights are kept from updating as follows: after each weight is quantized, the company variable attached to that weight changes from its initial 1 to 0, while the company variable of every unquantized weight remains 1; during an update, the company variable is multiplied with the gradient, guaranteeing that the gradient of every quantized weight is 0, so that only the unquantized weights are updated.
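As a minimal sketch of this masked update (a NumPy illustration under assumptions, not the patent's Caffe implementation; the function name sgd_step and the learning rate are ours):

    import numpy as np

    def sgd_step(weights, grads, company, lr=0.01):
        # Quantized weights carry company == 0, so their gradient is
        # zeroed and they stay fixed; unquantized weights (company == 1)
        # receive the normal SGD update.
        return weights - lr * (company * grads)

    weights = np.array([0.50, -0.25, 0.10, -0.05])
    company = np.array([0.0, 0.0, 1.0, 1.0])   # first two already quantized
    grads   = np.array([0.30, -0.20, 0.15, 0.40])
    print(sgd_step(weights, grads, company))   # first two entries unchanged

Multiplying the mask into the gradient, rather than skipping the weights in code, keeps the update rule uniform across the whole tensor, which is convenient in frameworks that apply updates layer-wide.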
Adopting the above technical solution, the present invention has the following beneficial effects:
(1) The present invention proposes a neural network compression method that keeps the parameter value range uncompressed and, while limiting the impact on the final classification accuracy to a certain extent, quantizes the initial full-precision network parameter model to powers of 2 or 0, reducing the scale of the network weight parameter model as well as the effect of quantization on classification accuracy;
(2) By quantizing the initialized parameter model to powers of 2 or 0, the present invention allows the multiplications involved in neural network inference and training to be realized by simple shifts in hardware, and the quantized low-precision weights can be stored in encoded form, which shrinks the model's storage footprint on embedded devices, lowers the complexity of implementing the network parameter model in hardware, and reduces the computation and parameter volume during network operation, making it convenient to deploy the network model on hardware such as small embedded terminal devices.
Brief Description of the Drawings
FIG. 1 is a flowchart of the present invention.
FIG. 2 is a schematic diagram of the partial quantization and retraining structure.
Detailed Description of the Embodiments
The technical solution of the invention is described in detail below with reference to the drawings.
The present invention provides a neural network compression method based on power-exponent quantization, as shown in FIG. 1, comprising the following five steps.
Step 1, obtain the initialized parameter model: obtain the data set, adjust the network model, and set the parameters to train the network model until it converges and reaches a certain accuracy, yielding the full-precision network model weight parameters; ImageNet can be selected as the training database.
Step 2, weight parameter grouping: preset different split rates for different network models and data sets, then determine the current grouping threshold from the ordering of the weights' absolute values and the split rate; the weight parameters whose absolute values exceed the threshold form one group to be quantized, and the weight parameters whose absolute values are below the threshold form another group to be retrained.
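As a minimal sketch of this grouping step (a NumPy illustration under assumptions: the split rate is read as the fraction of weights, largest in magnitude, to quantize in the current round, with simple ceiling rounding; the function name grouping_threshold is ours):

    import numpy as np

    def grouping_threshold(weights, split_rate):
        # Threshold t such that roughly `split_rate` of the weights
        # (the largest in absolute value) satisfy |w| >= t.
        flat = np.sort(np.abs(weights).ravel())[::-1]   # descending |w|
        k = max(int(np.ceil(split_rate * flat.size)), 1)
        return flat[k - 1]

    weights = np.array([[0.9, -0.05, 0.30, -0.60],
                        [0.02, 0.45, -0.12, 0.70]])
    t = grouping_threshold(weights, split_rate=0.5)
    to_quantize = np.abs(weights) >= t    # group quantized in step 3
    to_retrain  = ~to_quantize            # group retrained in step 4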
Step 3, partial quantization: set the bit width b to be quantized and obtain max_weight, the weight parameter with the largest absolute value. The upper limit of the power exponent is obtained from the formula n1 = floor(log2(max_weight * 4.0/3.0)); from the preset quantization bit width b, the lower limit is obtained from the formula n2 = n1 + 1 - 2^(b-2). The power exponent therefore ranges from n2 to n1, where n1 and n2 are determined by the initial parameter matrix, and the quantized weight parameters take values in the set {±2^n1, ±2^(n1-1), …, ±2^n2, 0}. Traverse the power-exponent range [n2, n1], subtract each power of 2, each negative power of 2, and 0 from the current weight parameter to be quantized, find the power of 2, negative power of 2, or 0 with the smallest difference from that weight, and replace the original weight parameter to be quantized with it, thereby quantizing the original weight parameter.
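A minimal sketch of this quantization step, directly implementing the formulas above in NumPy (the function names pow2_range and quantize_pow2 are ours; the patent's actual Caffe implementation is not shown):

    import numpy as np

    def pow2_range(max_weight, b):
        # n1 and n2 from the patent's formulas for bit width b.
        n1 = int(np.floor(np.log2(max_weight * 4.0 / 3.0)))
        n2 = n1 + 1 - 2 ** (b - 2)
        return n1, n2

    def quantize_pow2(w, n1, n2):
        # Replace each entry of w with the closest member of
        # {+/-2^n1, ..., +/-2^n2, 0}.
        exps = np.arange(n2, n1 + 1)
        candidates = np.concatenate(([0.0], 2.0 ** exps, -(2.0 ** exps)))
        idx = np.argmin(np.abs(w.reshape(-1, 1) - candidates), axis=1)
        return candidates[idx].reshape(w.shape)

    # With b = 5 and a largest |weight| of 0.9: n1 = 0, n2 = -7.
    w = np.array([0.9, -0.3, 0.07, 0.003])
    n1, n2 = pow2_range(np.max(np.abs(w)), b=5)
    print(quantize_pow2(w, n1, n2))   # -> 1.0, -0.25, 0.0625, 0.0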
Step 4, partial retraining: the portion whose absolute values are below the threshold is returned to the original network model and retrained on the same data set so that the network converges again, updating to obtain new weight parameters.
Step 5, cyclic grouping quantization: keep the already quantized weight parameters unchanged, regroup the retrained weight parameters according to the grouping threshold and their post-retraining absolute values, quantize the group whose absolute values exceed the threshold, keep the quantized weight parameters fixed, and retrain and update the other group whose absolute values are below the threshold so that the network converges again. Repeat this cycle of grouping and quantization until all network parameters are quantized.
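Tying the steps together, a condensed sketch of the whole cycle (reusing grouping_threshold, pow2_range, and quantize_pow2 from the sketches above, and assuming numpy is imported as np; retrain is a placeholder for retraining the network with the company mask zeroing the gradients of quantized weights, as in step 4):

    def incremental_pow2_quantization(weights, b, split_rate, retrain):
        # n1, n2 are fixed once, from the initial parameter matrix.
        n1, n2 = pow2_range(np.max(np.abs(weights)), b)
        company = np.ones_like(weights)        # 1 = not yet quantized
        while company.any():
            live = company.astype(bool)
            t = grouping_threshold(weights[live], split_rate)
            pick = (np.abs(weights) >= t) & live
            weights[pick] = quantize_pow2(weights[pick], n1, n2)
            company[pick] = 0.0                # freeze the newly quantized group
            if company.any():
                weights = retrain(weights, company)  # steps 4-5
        return weights

Each iteration quantizes at least the largest-magnitude live weight, so the loop terminates once every weight has been frozen.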
FIG. 2 shows a schematic diagram of the partial quantization and retraining structure of the present invention. As in the example, after a 4x4 original weight parameter model is obtained, the 5 weights whose absolute values exceed the threshold determined by the split rate and the ordering of the weight parameters' absolute values are selected (marked in dark gray in the figure) and quantized to a power of 2, the negative of a power of 2, or 0; the light gray marks indicate the 11 unquantized weight parameters that participate in retraining. From these 11 retrained weight parameters, 5 weights whose absolute values exceed the threshold are again selected for quantization, and the remaining 6 weights are retrained; the above steps are repeated until all 16 weight parameters are quantized.
The present invention also provides a feasible solution for quantization requirements at different bit widths: by setting different bit widths b, the weights are quantized to different ranges of powers of 2, where the power exponent lies between n2 and n1; n1 is determined by the weight parameter with the largest absolute value, and n2 is determined by the chosen bit width b. Different quantization bit widths yield different post-quantization target-detection accuracies; to a certain extent, the larger the quantization bit width, the smaller the effect of quantization on the original accuracy.
In summary, with the technical solution disclosed by the present invention, the quantization bit width can be changed as needed and the original weight parameter model can be quantized into a weight parameter model composed of powers of 2 or 0, meeting the differing bit-width requirements of different devices. Keeping the parameter value range uncompressed reduces, to a certain extent, the impact of quantization on the final classification accuracy. The complexity of implementing the network parameter model in hardware is reduced, since multiplications in hardware are realized by simple shifts; the computation and parameter volume during network operation shrink, which facilitates implementing such network models on small embedded terminal devices.
The above embodiments merely illustrate the technical concept of the present invention and do not limit its protection scope; every equivalent technical solution made according to the technical concept proposed by the present invention falls within the protection scope of the present invention.

Claims (7)

  1. A neural network compression method based on power-exponent quantization, characterized in that: the initial weight parameter values of a neural network that has reached the target accuracy are obtained; a grouping threshold is determined from the split rate of each layer of the neural network and the ranking of the weight parameters' absolute values; the weight parameters whose absolute values exceed the threshold are divided into one group of data to be quantized, and the weight parameters whose absolute values are below the threshold are divided into another group of data to be retrained; the power-exponent range is determined from the weight parameter with the largest absolute value among the initial weight parameter values and the set quantization bit width; the power-exponent range is traversed, and each parameter to be quantized is replaced with the power of 2, negative power of 2, or 0 having the smallest difference from that parameter; and, according to the grouping threshold, the retrained data undergo the cyclic operation of grouping and requantization until the network converges.
  2. The neural network compression method based on power-exponent quantization according to claim 1, characterized in that the upper limit of the power-exponent range is computed from the formula n1 = floor(log2(max_weight * 4.0/3.0)) and the lower limit from the formula n2 = n1 + 1 - 2^(b-2), and the quantized weight parameters take values in the set {±2^n1, ±2^(n1-1), …, ±2^n2, 0}, where n1 and n2 are the upper and lower limits of the power-exponent range, b is the set quantization bit width, max_weight is the weight parameter with the largest absolute value among the initial weight parameter values, and floor denotes the rounding-down operation.
  3. The neural network compression method based on power-exponent quantization according to claim 1, characterized in that, during the cyclic operation of grouping and requantization, the quantized weight parameters are guaranteed not to update by the following specific method: the company variable of each quantized weight parameter changes from its initial 1 to 0, while the company variable of every unquantized weight parameter remains 1; when the weight parameters whose absolute values are below the threshold are retrained, the company variable of each quantized weight parameter is multiplied with the gradient to guarantee that the gradient of every quantized weight parameter is 0.
  4. The neural network compression method based on power-exponent quantization according to claim 1, characterized in that the ImageNet data set is selected as the database for network training.
  5. The neural network compression method based on power-exponent quantization according to claim 1, characterized in that the data are trained and quantized based on the Caffe learning framework; floating-point computation is used during forward propagation to test the data of each layer, and the network model and parameters are adjusted to obtain the initial weight parameter values of the neural network at the target accuracy.
  6. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the compression method of claim 1.
  7. A miniaturized embedded terminal device, characterized by comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor implements the compression method of claim 1 according to the bit width when executing the program.
PCT/CN2019/105485 2019-05-27 2019-09-11 Neural network compression method based on power-exponent quantization WO2020237904A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910445413.3 2019-05-27
CN201910445413.3A CN110245753A (zh) 2019-05-27 2019-05-27 Neural network compression method based on power-exponent quantization

Publications (1)

Publication Number Publication Date
WO2020237904A1 (zh)

Family

ID=67885162

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/CN2019/105485 WO2020237904A1 (zh) 2019-05-27 2019-09-11 Neural network compression method based on power-exponent quantization
PCT/CN2020/071124 WO2020238237A1 (zh) 2019-05-27 2020-01-09 Neural network compression method based on power-exponent quantization

Family Applications After (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/071124 WO2020238237A1 (zh) 2019-05-27 2020-01-09 Neural network compression method based on power-exponent quantization

Country Status (2)

Country Link
CN (1) CN110245753A (zh)
WO (2) WO2020237904A1 (zh)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110245753A (zh) * 2019-05-27 2019-09-17 东南大学 Neural network compression method based on power-exponent quantization
CN112561050B (zh) * 2019-09-25 2023-09-05 杭州海康威视数字技术股份有限公司 Neural network model training method and device
CN110852439B (zh) * 2019-11-20 2024-02-02 字节跳动有限公司 Data processing method and device, and storage medium
CN111222561B (zh) * 2019-12-31 2023-06-09 深圳云天励飞技术股份有限公司 Image recognition neural network processing method, device and system
CN113112009B (zh) * 2020-01-13 2023-04-18 中科寒武纪科技股份有限公司 Method, device and computer-readable storage medium for neural network data quantization
CN111563593B (zh) * 2020-05-08 2023-09-15 北京百度网讯科技有限公司 Training method and device for a neural network model
CN113487036B (zh) * 2021-06-24 2022-06-17 浙江大学 Distributed training method and device for machine-learning models, electronic device, and medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107644254A (zh) * 2017-09-09 2018-01-30 复旦大学 Convolutional neural network weight parameter quantization training method and system
US20180075338A1 (en) * 2016-09-12 2018-03-15 International Business Machines Corporation Convolutional neural networks using resistive processing unit array
CN109102064A (zh) * 2018-06-26 2018-12-28 杭州雄迈集成电路技术有限公司 High-precision neural network quantization compression method
CN109344893A (zh) * 2018-09-25 2019-02-15 华中师范大学 Image classification method and system based on a mobile terminal

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107688850B (zh) * 2017-08-08 2021-04-13 赛灵思公司 Deep neural network compression method
CN108229681A (zh) * 2017-12-28 2018-06-29 郑州云海信息技术有限公司 Neural network model compression method, system, device, and readable storage medium
CN109523016B (zh) * 2018-11-21 2020-09-01 济南大学 Multi-value quantization deep neural network compression method and system for embedded systems
CN109635936A (zh) * 2018-12-29 2019-04-16 杭州国芯科技股份有限公司 Neural network pruning and quantization method based on retraining
CN110245753A (zh) * 2019-05-27 2019-09-17 东南大学 Neural network compression method based on power-exponent quantization

Also Published As

Publication number Publication date
CN110245753A (zh) 2019-09-17
WO2020238237A1 (zh) 2020-12-03

Similar Documents

Publication Publication Date Title
WO2020238237A1 (zh) Neural network compression method based on power-exponent quantization
CN106250939B (zh) Handwritten character recognition method based on an FPGA+ARM multilayer convolutional neural network
CN111242287A (zh) Neural network compression method based on channel L1-norm pruning
US11775833B2 (en) Accelerated TR-L-BFGS algorithm for neural network
CN107239825B (zh) Deep neural network compression method considering load balancing
CN107516129A (zh) Deep network compression method based on dimension-adaptive Tucker decomposition
CN108229681A (zh) Neural network model compression method, system, device, and readable storage medium
WO2021258752A1 (zh) 4-bit quantization method and system for neural networks
CN106570559A (zh) Data processing method and device based on a neural network
CN110276451A (zh) Deep neural network compression method based on weight normalization
CN113011571B (zh) INT8 offline quantization and integer inference method based on the Transformer model
CN111126602A (zh) Recurrent neural network model compression method based on convolution-kernel similarity pruning
CN108985453A (zh) Deep neural network model compression method based on asymmetric ternary weight quantization
CN108734264A (zh) Deep neural network model compression method and device, storage medium, and terminal
CN110837890A (zh) Fixed-point weight quantization method for lightweight convolutional neural networks
CN111160524A (zh) Two-stage convolutional neural network model compression method
CN112598129A (zh) Adjustable hardware-aware pruning and mapping framework based on a ReRAM neural network accelerator
TW202022798A (zh) Method of processing a convolutional neural network
WO2023020456A1 (zh) Network model quantization method, apparatus, device, and storage medium
CN112488070A (zh) Neural network compression method for remote-sensing image target detection
WO2022222649A1 (zh) Neural network model training method, apparatus, device, and storage medium
CN114677548A (zh) Neural network image classification system and method based on resistive random-access memory
CN110110852B (zh) Method for porting a deep learning network to the FPGA platform
CN112990420A (zh) Pruning method for convolutional neural network models
Qi et al. Learning low resource consumption cnn through pruning and quantization

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19930744

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19930744

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 23/05/2022)
