WO2019080483A1 - Neural network computation acceleration method and system based on non-uniform quantization and look-up table - Google Patents

Neural network computation acceleration method and system based on non-uniform quantization and look-up table

Info

Publication number
WO2019080483A1
Authority
WO
WIPO (PCT)
Prior art keywords
layer
neural network
input
parameters
uniform quantization
Prior art date
Application number
PCT/CN2018/087117
Other languages
French (fr)
Chinese (zh)
Inventor
江帆
王瑜
盛骁
韩松
单羿
Original Assignee
北京深鉴智能科技有限公司
Priority date
Filing date
Publication date
Application filed by 北京深鉴智能科技有限公司
Publication of WO2019080483A1 publication Critical patent/WO2019080483A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Definitions

  • the present invention relates to artificial neural networks, and more particularly to methods and systems for accelerating neural network computation using non-uniform quantization and lookup tables.
  • ANN: Artificial Neural Networks
  • NN: neural network
  • mainstream technologies for network compression include pruning, quantization, distillation, and the like.
  • well-designed, low-computation networks such as SqueezeNet, MobileNets, and ShuffleNet have also achieved good results in some applications.
  • Deep Compression proposes non-uniform quantization of the parameters (such as weights) of each layer of the neural network alongside pruning, thereby reducing computational complexity and speeding up computation.
  • however, in the multiplication of parameters and inputs, only the parameters are quantized; this removes only part of the workload, and multiplication remains the main source of computational complexity.
  • Deep Compression only performs non-uniform quantization on the parameters (such as weights) of the neural network.
  • the contribution of the present invention is to adopt non-uniform quantization throughout: on top of the quantized parameters, a calibration method quantizes the input of each layer of the network, and a lookup table further replaces multiplication to accelerate the computation of the neural network.
  • a method of accelerating neural network computation using non-uniform quantization and lookup tables can include: performing non-uniform quantization on the parameters of each layer of the neural network; performing non-uniform quantization on the inputs of each layer of the neural network; constructing a lookup table for each layer by multiplying all quantized parameter values of the layer with all quantized input values of the layer; and, during the forward computation of the neural network, looking up the result of each parameter-input multiplication in the layer's lookup table, computing layer by layer until all layers are completed.
  • the non-uniform quantization of the input of each layer of the neural network can be accomplished by a calibration method.
  • a system for accelerating neural network computation using non-uniform quantization and lookup tables may include: a network parameter quantization unit for performing non-uniform quantization on the parameters of each layer of the neural network; an input quantization unit for performing non-uniform quantization on the inputs of each layer of the neural network; a lookup table for each layer, constructed by multiplying all quantized parameter values of the layer with all quantized input values of the layer; and a main processing unit for performing the forward computation of the neural network.
  • during the forward computation, the main processing unit looks up the result of each parameter-input multiplication in the layer's lookup table, computing layer by layer until all layers are completed.
  • the input quantization unit may be configured to perform the non-uniform quantization of the input of each layer of the neural network by a calibration method.
  • a computer readable medium for recording instructions executable by a processor which, when executed by the processor, cause the processor to perform a method of accelerating neural network computation using non-uniform quantization and lookup tables, including: performing non-uniform quantization on the parameters of each layer of the neural network; performing non-uniform quantization on the inputs of each layer; constructing a lookup table for each layer by multiplying all quantized parameter values of the layer with all quantized input values of the layer; and, during the forward computation of the neural network, looking up the result of each parameter-input multiplication in the layer's lookup table, computing layer by layer until all layers are completed.
  • the invention performs non-uniform quantization on the parameters and inputs of the neural network, and further uses a lookup table instead of multiplication, thereby accelerating the calculation of the neural network.
  • FIG. 1 illustrates the process of non-uniform quantization of the input of a neural network using a calibration method.
  • FIG. 2 is a block diagram illustrating a system for accelerating neural network calculations using non-uniform quantization and lookup tables in accordance with the present invention.
  • FIG. 3 is a flow chart illustrating a method of accelerating neural network computation using non-uniform quantization and lookup tables in accordance with the present invention.
  • Deep Compression only non-uniformly quantizes the parameters of the neural network, such as the weights.
  • the invention adopts non-uniform quantization throughout: on top of the quantized parameters, a calibration method quantizes the input of each layer of the network, and a lookup table further replaces multiplication to accelerate the computation of the neural network.
  • compared with the quantization of the parameters, the quantization of the input is much more complicated.
  • the fundamental reason is the uncertainty of the input.
  • the input to each layer in the neural network depends on the output of the previous layer and is ultimately determined by the input of the entire network. Therefore, the input of each layer has a certain randomness.
  • this document quantizes the input (activation) of the neural network by a calibration method.
  • the so-called calibration can also be understood as a kind of training.
  • from a certain number of calibration samples, the N quantized values of the input are computed, completing the non-uniform quantization of the input.
  • to distinguish it from the training of the neural network itself, the term calibration is used hereinafter.
  • N, i.e., the number of quantized values, is generally taken as a power of 2, N = 2^n, which is called quantization to n bits, or n-bit quantization.
  • FIG. 1 illustrates the process of non-uniform quantization of the input of a neural network using a calibration method.
  • Figure 1 shows the quantization process for any layer in a neural network.
  • depending on the complexity of the network structure, the number of calibration samples can range from tens to thousands.
  • K-means clustering is used to obtain N cluster centers for each batch, and a further K-means clustering over the cluster centers of all batches yields the final N quantized values.
  • the specific steps are as follows:
  • once the parameters and inputs are both quantized, the set of possible multiplications in the neural network is finite, and a lookup table becomes a suitable acceleration method.
  • for example, if the parameters are quantized to 4 bits (16 quantized values) and the input is also quantized to 4 bits, there are only 16 × 16 = 256 possible products.
  • FIG. 2 is a block diagram illustrating a system for accelerating neural network calculations using non-uniform quantization and lookup tables in accordance with the present invention.
  • a system 200 for accelerating neural network computation using non-uniform quantization and lookup tables in accordance with the present invention can include a network parameter quantization unit 201 for non-uniformly quantizing the parameters (e.g., weights) of each layer of the neural network.
  • an input quantization unit 202 for non-uniformly quantizing the input of each layer of the neural network (for the first layer, the original input; for subsequent layers, the output of the previous layer); a lookup table 203 for each layer, constructed by multiplying all quantized parameter values of the layer with all quantized input values of the layer; and a main processing unit 204 for performing the forward computation of the neural network.
  • during the forward computation, the main processing unit 204 looks up the result of each parameter-input multiplication in the lookup table 203 of the layer, computing layer by layer until all layers are completed.
  • the network parameter quantization unit 201 can be configured to non-uniformly quantize the parameters in accordance with the Deep Compression method.
  • the input quantization unit 202 can be configured to perform the non-uniform quantization of the input of each layer of the neural network by a calibration method.
  • the calibration finally yields the quantized values {c_1, c_2, ..., c_N} of the input.
  • FIG. 3 is a flow chart illustrating a method of accelerating neural network computation using non-uniform quantization and lookup tables in accordance with the present invention.
  • the method 300 for accelerating neural network computation using non-uniform quantization and lookup tables in accordance with the present invention begins in step S310, where non-uniform quantization of parameters of each layer of the neural network is performed.
  • a Deep Compression method may be adopted, but the use of other methods for non-uniform quantization of the parameters is not excluded.
  • for the Deep Compression method, see S. Han, H. Mao, W. J. Dally; Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding; arXiv:1510.00149, 2015, which is hereby incorporated by reference in its entirety.
  • in step S320, the input of each layer of the neural network is non-uniformly quantized.
  • in step S320, the non-uniform quantization of the input of each layer of the neural network can be accomplished by a calibration method, specifically by the calibration steps described above.
  • although the quantization of the input is accomplished by a calibration method, and more specifically by K-means clustering through the specific steps described above, the invention does not exclude the use of other methods to non-uniformly quantize the input.
  • in step S330, a lookup table for each layer is constructed by multiplying all quantized values of the parameters of the layer with all quantized values of the inputs of the layer.
  • in step S340, during the forward computation of the neural network, the result of each parameter-input multiplication is looked up in the layer's lookup table, computing layer by layer until all layers are completed.
  • method 300 can end.
  • Non-transitory computer readable media include various types of tangible storage media.
  • examples of non-transitory computer readable media include magnetic recording media (such as floppy disks, magnetic tapes, and hard disk drives), magneto-optical recording media (such as magneto-optical disks), CD-ROM (Compact Disc Read Only Memory), CD-R, CD-R/W, and semiconductor memory (such as ROM, PROM (programmable ROM), EPROM (erasable PROM), flash ROM, and RAM (random access memory)).
  • these programs can also be provided to a computer by using various types of transitory computer readable media.
  • examples of transitory computer readable media include electrical signals, optical signals, and electromagnetic waves. A transitory computer readable medium can provide a program to a computer via a wired communication path, such as a wire or optical fiber, or via a wireless communication path.
  • a computer program or a computer readable medium for recording instructions executable by a processor which, when executed by the processor, cause the processor to perform a method of accelerating neural network computation using non-uniform quantization and lookup tables, including the following operations: performing non-uniform quantization on the parameters of each layer of the neural network; performing non-uniform quantization on the inputs of each layer; constructing a lookup table for each layer by multiplying all quantized parameter values of the layer with all quantized input values of the layer; and, during the forward computation of the neural network, looking up the result of each parameter-input multiplication in the layer's lookup table, computing layer by layer until all layers are completed.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Compression Of Band Width Or Redundancy In Fax (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A neural network computation acceleration method and system based on non-uniform quantization and a look-up table. The method (300) comprises: performing non-uniform quantization on parameters of each layer of a neural network (S310); performing non-uniform quantization on inputs of each layer of the neural network (S320); constructing a look-up table for each layer by multiplying each of the quantization values of the parameters of said layer by each of the quantization values of the inputs of said layer (S330); and when forward computation of the neural network is to be performed, looking up a result of multiplication computation of the parameters and inputs of each layer in the look-up table of said layer, and performing computation layer by layer until all computation is done (S340). The method performs non-uniform quantization on all parameters and inputs of a neural network, and further adopts a look-up table to replace multiplication computation, thus accelerating computation of the neural network.

Description

Method and System for Accelerating Neural Network Computation Using Non-uniform Quantization and Lookup Tables

Technical Field

The present invention relates to artificial neural networks, and more particularly to methods and systems for accelerating neural network computation using non-uniform quantization and lookup tables.

Background
Artificial Neural Networks (ANN), also referred to simply as neural networks (NN), are mathematical computation models that mimic the behavioral characteristics of animal neural networks and perform distributed, parallel information processing. In recent years, neural networks have developed rapidly and are widely used in many fields, such as image recognition, speech recognition, natural language processing, weather forecasting, gene expression, and content recommendation.

As neural networks have grown deeper, the number of parameters and the demand for computing resources have increased accordingly. On mobile devices and in embedded application scenarios, however, computing power and power consumption are strictly limited, so network compression techniques have attracted increasing attention.

Mainstream network compression techniques include pruning, quantization, and distillation. In addition, carefully designed low-computation networks such as SqueezeNet, MobileNets, and ShuffleNet have achieved good results in some applications.

Researchers have also proposed the concept of Deep Compression, which performs non-uniform quantization of the parameters (e.g., weights) of each layer of the neural network alongside pruning, thereby reducing computational complexity and speeding up neural network computation. See S. Han, H. Mao, W. J. Dally; Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding; arXiv:1510.00149, 2015. However, in the multiplication of parameters and inputs, only the parameters are quantized; this removes only part of the workload, and multiplication remains the main source of computational complexity.

It is therefore desirable to have a method for accelerating neural network computation that quantizes not only the parameters but also the inputs, and converts the multiplication of two finite sets of quantized values into a lookup table, replacing multiplication with table lookups to further reduce computational complexity and increase computation speed.
Summary of the Invention

In light of the above, it is an object of the present invention to provide a method of accelerating neural network computation.

As described above, Deep Compression only non-uniformly quantizes the parameters (e.g., weights) of the neural network. The contribution of the present invention is to adopt non-uniform quantization throughout: on top of the quantized parameters, a calibration method quantizes the input of each layer of the network, and a lookup table further replaces multiplication to accelerate neural network computation.

According to a first aspect of the present invention, there is provided a method of accelerating neural network computation using non-uniform quantization and lookup tables. The method can include: performing non-uniform quantization on the parameters of each layer of the neural network; performing non-uniform quantization on the inputs of each layer of the neural network; constructing a lookup table for each layer by multiplying all quantized parameter values of the layer with all quantized input values of the layer; and, during the forward computation of the neural network, looking up the result of each parameter-input multiplication in the layer's lookup table, computing layer by layer until all layers are completed.

In the method according to the first aspect of the invention, the non-uniform quantization of the input of each layer of the neural network can be accomplished by a calibration method.

More specifically, the calibration can be performed by the following steps: dividing the calibration data into M batches; performing a forward computation of the network on each batch to obtain the input of the layer, {X_1, X_2, ..., X_M}; performing K-means clustering on all elements of the i-th batch X_i to obtain N = 2^n cluster centers and the sample count of each cluster, {(c_i1, cnt_i1), (c_i2, cnt_i2), ..., (c_iN, cnt_iN)}; and performing a further K-means clustering over the cluster centers of all M batches to finally obtain the N cluster centers that serve as the quantized values {c_1, c_2, ..., c_N} of the input.

According to a second aspect of the present invention, there is provided a system for accelerating neural network computation using non-uniform quantization and lookup tables. The system can include: a network parameter quantization unit for performing non-uniform quantization on the parameters of each layer of the neural network; an input quantization unit for performing non-uniform quantization on the inputs of each layer of the neural network; a lookup table for each layer, constructed by multiplying all quantized parameter values of the layer with all quantized input values of the layer; and a main processing unit for performing the forward computation of the neural network. During the forward computation, the main processing unit looks up the result of each parameter-input multiplication in the layer's lookup table, computing layer by layer until all layers are completed.

In the system according to the second aspect of the invention, the input quantization unit may be configured to accomplish the non-uniform quantization of the input of each layer of the neural network by a calibration method.

More specifically, the input quantization unit may be configured to perform the calibration by: dividing the calibration data into M batches; performing a forward computation of the network on each batch to obtain the input of the layer, {X_1, X_2, ..., X_M}; performing K-means clustering on all elements of the i-th batch X_i to obtain N = 2^n cluster centers and the sample count of each cluster, {(c_i1, cnt_i1), (c_i2, cnt_i2), ..., (c_iN, cnt_iN)}; and performing a further K-means clustering over the cluster centers of all M batches to finally obtain the N cluster centers that serve as the quantized values {c_1, c_2, ..., c_N} of the input.

According to a third aspect of the present invention, there is provided a computer readable medium for recording instructions executable by a processor which, when executed by the processor, cause the processor to perform a method of accelerating neural network computation using non-uniform quantization and lookup tables, including the following operations: performing non-uniform quantization on the parameters of each layer of the neural network; performing non-uniform quantization on the inputs of each layer of the neural network; constructing a lookup table for each layer by multiplying all quantized parameter values of the layer with all quantized input values of the layer; and, during the forward computation of the neural network, looking up the result of each parameter-input multiplication in the layer's lookup table, computing layer by layer until all layers are completed.

The invention performs non-uniform quantization on both the parameters and the inputs of the neural network, and further uses lookup tables instead of multiplication, thereby accelerating the computation of the neural network.
Brief Description of the Drawings

The invention is described below in connection with the embodiments with reference to the accompanying drawings. In the drawings:

FIG. 1 illustrates the process of non-uniform quantization of the input of a neural network using a calibration method.

FIG. 2 is a block diagram illustrating a system for accelerating neural network computation using non-uniform quantization and lookup tables in accordance with the present invention.

FIG. 3 is a flow chart illustrating a method of accelerating neural network computation using non-uniform quantization and lookup tables in accordance with the present invention.

Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the invention. The technical solution of the present invention is further described below with reference to the accompanying drawings and embodiments.

As noted above, Deep Compression only non-uniformly quantizes the parameters (e.g., weights) of the neural network. The present invention adopts non-uniform quantization throughout: on top of the quantized parameters, a calibration method quantizes the input of each layer of the network, and a lookup table further replaces multiplication to accelerate the computation of the neural network.

The innovative techniques of the present invention are described in two parts below.

Quantizing the input

At the prediction stage, the parameters of the network are already fixed, so Deep Compression can directly perform K-means clustering on the network parameters; the resulting N cluster centers are the N quantized values of the non-uniform quantization.

Compared with parameter quantization, the quantization of the input is much more complicated; the fundamental reason is the uncertainty of the input. The input of each layer in the neural network depends on the output of the previous layer and is ultimately determined by the input of the entire network, so the input of each layer has a certain randomness. This document quantizes the input (activation) of the neural network by a calibration method. Calibration can be understood as a kind of training: from a certain number of calibration samples (training samples), the N quantized values of the input are computed, completing the non-uniform quantization of the input. To distinguish it from the training of the neural network itself, the term calibration is used hereinafter. In practice, N (the number of quantized values) is generally taken as a power of 2, i.e., N = 2^n, which is called quantization to n bits, or n-bit quantization.
FIG. 1 illustrates the process of non-uniform quantization of the input of a neural network using a calibration method; it shows the quantization process for any one layer of the network. FIG. 1 depicts 3-bit quantization, in which calibration ultimately quantizes the input to 8 (2^3) quantized values. Depending on the complexity of the network structure, the number of calibration samples can range from tens to thousands. We divide the calibration data into M batches (denoted Batch 1, Batch 2, ..., Batch M in FIG. 1), use K-means clustering to obtain N cluster centers for each batch, and then perform a further K-means clustering over the cluster centers of all batches to obtain the final N quantized values. The specific steps are as follows (an illustrative sketch follows the list):

1. Perform a forward computation of the network on each batch to obtain the input of this layer, denoted {X_1, X_2, ..., X_M};

2. Perform K-means clustering on all elements of the i-th batch X_i to obtain N = 2^n cluster centers (centroids) and the sample count of each cluster, denoted {(c_i1, cnt_i1), (c_i2, cnt_i2), ..., (c_iN, cnt_iN)};

3. Perform a further K-means clustering over the cluster centers of all M batches to finally obtain the N cluster centers that serve as the quantized values {c_1, c_2, ..., c_N} of the input.
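To make the procedure concrete, the following is a minimal Python sketch of the two-stage K-means calibration using scikit-learn. The function name is an assumption, and the use of the per-cluster sample counts as weights for the second clustering is also an assumption: the patent records the counts but does not state how they are used.

```python
import numpy as np
from sklearn.cluster import KMeans

def calibrate_layer_inputs(batches, n_bits=3, seed=0):
    """Two-stage K-means calibration of one layer's inputs (illustrative sketch).

    batches: list of M numpy arrays, each holding the layer's input X_i for one
             calibration batch (obtained by a forward pass of the network).
    Returns the N = 2**n_bits quantized values {c_1, ..., c_N}.
    """
    n_clusters = 2 ** n_bits  # N = 2^n quantized values

    centers, counts = [], []
    for x in batches:
        # Stage 1: cluster all elements of batch X_i into N clusters.
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
        labels = km.fit_predict(x.reshape(-1, 1))
        centers.append(km.cluster_centers_.ravel())                # (c_i1, ..., c_iN)
        counts.append(np.bincount(labels, minlength=n_clusters))   # (cnt_i1, ..., cnt_iN)

    all_centers = np.concatenate(centers).reshape(-1, 1)  # M*N batch centers
    all_counts = np.concatenate(counts).astype(float)

    # Stage 2: cluster the M*N batch centers into the final N centers.
    # Weighting by the per-cluster sample counts is an assumption here.
    km2 = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    km2.fit(all_centers, sample_weight=all_counts)
    return np.sort(km2.cluster_centers_.ravel())          # {c_1, ..., c_N}
```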
Accelerating the forward computation of the neural network using lookup tables

Once both the parameters and the inputs of the neural network have been non-uniformly quantized, the set of multiplications that can occur in the neural network is finite, and a lookup table becomes a suitable acceleration method.

Take any layer in the network as an example, and suppose the parameters are quantized to 4 bits (16 quantized values) and the input is also quantized to 4 bits. The multiplications in the neural network come from multiplying inputs by parameters, so there are only 16 × 16 = 256 possible multiplication combinations. We compute all possible products in advance and store them in a lookup table; during forward computation, multiplication can be skipped entirely, and the result of each product is obtained directly from the table.
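Under the same illustrative assumptions, the per-layer table can be precomputed as an outer product of the two sets of quantized values. The helper below is a sketch, not from the patent:

```python
import numpy as np

def build_lookup_table(weight_values, input_values):
    """Precompute all products of quantized weights and quantized inputs.

    weight_values: the layer's quantized parameter values (e.g., 16 for 4-bit).
    input_values:  the layer's quantized input values (e.g., 16 for 4-bit).
    Returns a table where table[i, j] = weight_values[i] * input_values[j].
    """
    return np.outer(weight_values, input_values)  # e.g., 16 x 16 = 256 entries

# Example usage: 4-bit weights and 4-bit inputs give a 256-entry table, and
# each product is then fetched as lut[w_idx, x_idx] instead of multiplied.
```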
Using non-uniform quantization and the lookup-table technique, the overall steps for accelerating the forward computation of the neural network are as follows (an illustrative sketch follows the list):

1. Non-uniformly quantize the parameters using the Deep Compression method;

2. Quantize the inputs, using the calibration data to compute the quantized values of the inputs;

3. Precompute a lookup table for each layer from the quantized values of the parameters and inputs;

4. During forward computation, for any input feature, quantize the input of each layer and then obtain the multiplication results by table lookup, computing layer by layer until all layers are completed.
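Putting the steps together, the following is a minimal sketch of the forward pass of one fully connected layer with multiplications replaced by lookups into the precomputed table. The nearest-center quantization step and the index-based weight representation are illustrative assumptions; bias terms, activation functions, and other layer types are omitted:

```python
import numpy as np

def quantize_to_indices(x, centers):
    """Map each element of x to the index of its nearest quantized value."""
    return np.abs(x.reshape(-1, 1) - centers.reshape(1, -1)).argmin(axis=1)

def fc_forward_with_lut(x, w_idx, x_centers, lut):
    """Forward pass of one fully connected layer using table lookups.

    x:         raw input vector of the layer, shape (in_dim,).
    w_idx:     quantized weight indices, shape (out_dim, in_dim).
    x_centers: the layer's N quantized input values {c_1, ..., c_N}.
    lut:       lut[i, j] = w_values[i] * x_values[j], precomputed per layer.
    """
    x_idx = quantize_to_indices(x, x_centers)   # quantize this layer's input
    products = lut[w_idx, x_idx[None, :]]       # look up every w*x product
    return products.sum(axis=1)                 # accumulate; no multiplications

# Layer by layer: y = fc_forward_with_lut(x, ...), then x = y for the next layer.
```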
From the above description, a system and a method for accelerating neural network computation using non-uniform quantization and lookup tables can be constructed.

FIG. 2 is a block diagram illustrating a system for accelerating neural network computation using non-uniform quantization and lookup tables in accordance with the present invention.

As shown in FIG. 2, a system 200 for accelerating neural network computation using non-uniform quantization and lookup tables in accordance with the present invention can include: a network parameter quantization unit 201 for performing non-uniform quantization on the parameters (e.g., weights) of each layer of the neural network; an input quantization unit 202 for performing non-uniform quantization on the input of each layer of the neural network (for the first layer, the original input; for subsequent layers, the output of the previous layer); a lookup table 203 for each layer, constructed by multiplying all quantized parameter values of the layer with all quantized input values of the layer; and a main processing unit 204 for performing the forward computation of the neural network. During the forward computation, the main processing unit 204 looks up the result of each parameter-input multiplication in the lookup table 203 of the layer, computing layer by layer until all layers are completed.

In the system 200, the network parameter quantization unit 201 can be configured to non-uniformly quantize the parameters in accordance with the Deep Compression method. The input quantization unit 202 can be configured to accomplish the non-uniform quantization of the input of each layer of the neural network by a calibration method. More specifically, the input quantization unit 202 can be configured to perform the calibration by: dividing the calibration data into M batches; performing a forward computation of the network on each batch to obtain the input of the layer, {X_1, X_2, ..., X_M}; performing K-means clustering on all elements of the i-th batch X_i to obtain N = 2^n cluster centers and the sample count of each cluster, {(c_i1, cnt_i1), (c_i2, cnt_i2), ..., (c_iN, cnt_iN)}; and performing a further K-means clustering over the cluster centers of all M batches to finally obtain the N cluster centers that serve as the quantized values {c_1, c_2, ..., c_N} of the input.
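For illustration only, the units 201 to 204 could be composed as follows, reusing the hypothetical helpers sketched earlier (calibrate_layer_inputs, build_lookup_table, fc_forward_with_lut); the class name and interfaces are assumptions, not part of the patent:

```python
class LutAcceleratedNetwork:
    """Illustrative sketch of system 200 (units 201-204 wired together)."""

    def __init__(self, quantized_layers, n_bits=4):
        # quantized_layers: per layer, a pair (weight_values, weight_indices)
        # as produced by the network parameter quantization unit (201).
        self.layers = quantized_layers
        self.n_bits = n_bits
        self.input_centers = []  # per-layer quantized input values (unit 202)
        self.luts = []           # per-layer lookup tables (203)

    def calibrate(self, batches_per_layer):
        # Unit 202: calibrate each layer's input from forward-computed batches,
        # then build each layer's lookup table (203) from the quantized values.
        for (w_vals, _), batches in zip(self.layers, batches_per_layer):
            centers = calibrate_layer_inputs(batches, n_bits=self.n_bits)
            self.input_centers.append(centers)
            self.luts.append(build_lookup_table(w_vals, centers))

    def forward(self, x):
        # Unit 204: forward computation, layer by layer, via table lookups.
        for (_, w_idx), centers, lut in zip(self.layers, self.input_centers, self.luts):
            x = fc_forward_with_lut(x, w_idx, centers, lut)
        return x
```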
FIG. 3 is a flow chart illustrating a method of accelerating neural network computation using non-uniform quantization and lookup tables in accordance with the present invention.

As shown in FIG. 3, the method 300 for accelerating neural network computation using non-uniform quantization and lookup tables in accordance with the present invention begins at step S310, in which the parameters of each layer of the neural network are non-uniformly quantized.

Those skilled in the art will understand that, as described above, the non-uniform quantization of the parameters of each layer of the neural network may adopt the Deep Compression method, although the use of other methods of non-uniform parameter quantization is not excluded. For the Deep Compression method, see S. Han, H. Mao, W. J. Dally; Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding; arXiv:1510.00149, 2015, which is hereby incorporated by reference in its entirety.

In step S320, the input of each layer of the neural network is non-uniformly quantized. This can be accomplished by a calibration method. Specifically, as described above, the calibration can be performed by the following steps:

1. Divide the calibration data into M batches;

2. Perform a forward computation of the network on each batch to obtain the input of this layer, {X_1, X_2, ..., X_M};

3. Perform K-means clustering on all elements of the i-th batch X_i to obtain N = 2^n cluster centers and the sample count of each cluster, {(c_i1, cnt_i1), (c_i2, cnt_i2), ..., (c_iN, cnt_iN)};

4. Perform a further K-means clustering over the cluster centers of all M batches to finally obtain the N cluster centers that serve as the quantized values {c_1, c_2, ..., c_N} of the input.
Those skilled in the art will understand that although, in the specific embodiment, the quantization of the input is accomplished by a calibration method, and more specifically by K-means clustering through the specific steps described above, the invention does not exclude the use of other methods to non-uniformly quantize the input.

Next, in step S330, a lookup table for each layer is constructed by multiplying all quantized values of the parameters of the layer with all quantized values of the inputs of the layer.

In step S340, during the forward computation of the neural network, the result of each parameter-input multiplication is looked up in the layer's lookup table, computing layer by layer until all layers are completed.

The method 300 can then end.

One of ordinary skill in the art will recognize that the method of the present invention can be implemented as a computer program. As described above in connection with FIG. 3, the method according to the above embodiments may be carried out by one or more programs including instructions that cause a computer or processor to perform the algorithms described in connection with the drawings. These programs can be stored on and provided to a computer or processor using various types of non-transitory computer readable media, which include various types of tangible storage media. Examples of non-transitory computer readable media include magnetic recording media (such as floppy disks, magnetic tapes, and hard disk drives), magneto-optical recording media (such as magneto-optical disks), CD-ROM (compact disc read-only memory), CD-R, CD-R/W, and semiconductor memory (such as ROM, PROM (programmable ROM), EPROM (erasable PROM), flash ROM, and RAM (random access memory)). Further, these programs can be provided to a computer using various types of transitory computer readable media. Examples of transitory computer readable media include electrical signals, optical signals, and electromagnetic waves. A transitory computer readable medium can provide a program to a computer via a wired communication path, such as a wire or optical fiber, or via a wireless communication path.

Therefore, according to the present invention, a computer program or a computer readable medium can also be provided for recording instructions executable by a processor which, when executed by the processor, cause the processor to perform a method of accelerating neural network computation using non-uniform quantization and lookup tables, including the following operations: performing non-uniform quantization on the parameters of each layer of the neural network; performing non-uniform quantization on the inputs of each layer of the neural network; constructing a lookup table for each layer by multiplying all quantized parameter values of the layer with all quantized input values of the layer; and, during the forward computation of the neural network, looking up the result of each parameter-input multiplication in the layer's lookup table, computing layer by layer until all layers are completed.

Various embodiments and implementations of the present invention have been described above. However, the spirit and scope of the present invention are not limited thereto; those skilled in the art will be able to make further applications in accordance with the teachings of the present invention, and such applications are within the scope of the present invention.

That is, the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention and are not intended to limit its embodiments. Other variations or modifications may be made by those of ordinary skill in the art on the basis of the above description; there is neither a need nor a way to exhaust all implementations here. Any modification, substitution, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (7)

  1. A method for accelerating neural network computation using non-uniform quantization and lookup tables, comprising:
    performing non-uniform quantization on the parameters of each layer of the neural network;
    performing non-uniform quantization on the inputs of each layer of the neural network;
    constructing a lookup table for each layer by multiplying all quantized values of the parameters of the layer with all quantized values of the inputs of the layer; and
    during the forward computation of the neural network, for the multiplication of the parameters of each layer with the inputs, looking up the result of the multiplication in the layer's lookup table, computing layer by layer until all layers are completed.
  2. The method according to claim 1, wherein the non-uniform quantization of the input of each layer of the neural network is accomplished by a calibration method.
  3. The method according to claim 2, wherein the calibration is performed by the following steps:
    dividing the calibration data into M batches;
    performing a forward computation of the network on each batch to obtain the input of the layer, {X_1, X_2, ..., X_M};
    performing K-means clustering on all elements of the i-th batch X_i to obtain N = 2^n cluster centers and the sample count of each cluster, {(c_i1, cnt_i1), (c_i2, cnt_i2), ..., (c_iN, cnt_iN)}; and
    performing a further K-means clustering over the cluster centers of all M batches to finally obtain the N cluster centers that serve as the quantized values {c_1, c_2, ..., c_N} of the input.
  4. A system for accelerating neural network computation using non-uniform quantization and lookup tables, comprising:
    a network parameter quantization unit for performing non-uniform quantization on the parameters of each layer of the neural network;
    an input quantization unit for performing non-uniform quantization on the inputs of each layer of the neural network;
    a lookup table for each layer, constructed by multiplying all quantized values of the parameters of the layer with all quantized values of the inputs of the layer; and
    a main processing unit for performing the forward computation of the neural network,
    wherein, during the forward computation of the neural network, the main processing unit, for the multiplication of the parameters of each layer with the inputs, looks up the result of the multiplication in the layer's lookup table, computing layer by layer until all layers are completed.
  5. The system according to claim 4, wherein the input quantization unit is configured to accomplish the non-uniform quantization of the input of each layer of the neural network by a calibration method.
  6. The system according to claim 5, wherein the input quantization unit is configured to perform the calibration by:
    dividing the calibration data into M batches;
    performing a forward computation of the network on each batch to obtain the input of the layer, {X_1, X_2, ..., X_M};
    performing K-means clustering on all elements of the i-th batch X_i to obtain N = 2^n cluster centers and the sample count of each cluster, {(c_i1, cnt_i1), (c_i2, cnt_i2), ..., (c_iN, cnt_iN)}; and
    performing a further K-means clustering over the cluster centers of all M batches to finally obtain the N cluster centers that serve as the quantized values {c_1, c_2, ..., c_N} of the input.
  7. A computer readable medium for recording instructions executable by a processor which, when executed by the processor, cause the processor to perform a method of accelerating neural network computation using non-uniform quantization and lookup tables, including the following operations:
    performing non-uniform quantization on the parameters of each layer of the neural network;
    performing non-uniform quantization on the inputs of each layer of the neural network;
    constructing a lookup table for each layer by multiplying all quantized values of the parameters of the layer with all quantized values of the inputs of the layer; and
    during the forward computation of the neural network, for the multiplication of the parameters of each layer with the inputs, looking up the result of the multiplication in the layer's lookup table, computing layer by layer until all layers are completed.
PCT/CN2018/087117 2017-10-23 2018-05-16 Neural network computation acceleration method and system based on non-uniform quantization and look-up table WO2019080483A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710990577.5A CN109697508A (en) 2017-10-23 2017-10-23 Utilize the method and system of non-uniform quantizing and look-up table accelerans network query function
CN201710990577.5 2017-10-23

Publications (1)

Publication Number Publication Date
WO2019080483A1 (en)

Family

ID=66226693

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/087117 WO2019080483A1 (en) 2017-10-23 2018-05-16 Neural network computation acceleration method and system based on non-uniform quantization and look-up table

Country Status (2)

Country Link
CN (1) CN109697508A (en)
WO (1) WO2019080483A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0901008A1 (en) * 1997-09-04 1999-03-10 Ford Global Technologies, Inc. Method of generating correction tables for misfire detection using neural networks
CN102780542A (en) * 2012-07-19 2012-11-14 南京邮电大学 Gain factor adjustment method for Hopfield neural network signal blind detection
CN105184362A (en) * 2015-08-21 2015-12-23 中国科学院自动化研究所 Depth convolution neural network acceleration and compression method based on parameter quantification
CN106909970A (en) * 2017-01-12 2017-06-30 南京大学 A kind of two-value weight convolutional neural networks hardware accelerator computing module based on approximate calculation
CN106919942A (en) * 2017-01-18 2017-07-04 华南理工大学 For the acceleration compression method of the depth convolutional neural networks of handwritten Kanji recognition

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106485316B (en) * 2016-10-31 2019-04-02 北京百度网讯科技有限公司 Neural network model compression method and device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0901008A1 (en) * 1997-09-04 1999-03-10 Ford Global Technologies, Inc. Method of generating correction tables for misfire detection using neural networks
CN102780542A (en) * 2012-07-19 2012-11-14 南京邮电大学 Gain factor adjustment method for Hopfield neural network signal blind detection
CN105184362A (en) * 2015-08-21 2015-12-23 中国科学院自动化研究所 Depth convolution neural network acceleration and compression method based on parameter quantification
CN106909970A (en) * 2017-01-12 2017-06-30 南京大学 A kind of two-value weight convolutional neural networks hardware accelerator computing module based on approximate calculation
CN106919942A (en) * 2017-01-18 2017-07-04 华南理工大学 For the acceleration compression method of the depth convolutional neural networks of handwritten Kanji recognition

Also Published As

Publication number Publication date
CN109697508A (en) 2019-04-30

Similar Documents

Publication Publication Date Title
US10229356B1 (en) Error tolerant neural network model compression
TWI791610B (en) Method and apparatus for quantizing artificial neural network and floating-point neural network
KR102589303B1 (en) Method and apparatus for generating fixed point type neural network
US10152676B1 (en) Distributed training of models using stochastic gradient descent
US11271876B2 (en) Utilizing a graph neural network to identify supporting text phrases and generate digital query responses
US10984308B2 (en) Compression method for deep neural networks with load balance
US10032463B1 (en) Speech processing with learned representation of user interaction history
TWI722434B (en) Self-tuning incremental model compression method in deep neural network
CN110992935B (en) Computing system for training neural networks
Alvarez et al. On the efficient representation and execution of deep acoustic models
WO2019037700A1 (en) Speech emotion detection method and apparatus, computer device, and storage medium
CN110852421A (en) Model generation method and device
CN110852438A (en) Model generation method and device
JP7287397B2 (en) Information processing method, information processing apparatus, and information processing program
TW202139073A (en) Neural network weight encoding
CN108509422B (en) Incremental learning method and device for word vectors and electronic equipment
KR20200089588A (en) Electronic device and method for controlling the electronic device thereof
US10140581B1 (en) Conditional random field model compression
TWI738048B (en) Arithmetic framework system and method for operating floating-to-fixed arithmetic framework
Jiang et al. A low-latency LSTM accelerator using balanced sparsity based on FPGA
Huai et al. Latency-constrained DNN architecture learning for edge systems using zerorized batch normalization
CN110874635A (en) Deep neural network model compression method and device
JP2022042467A (en) Artificial neural network model learning method and system
JP2024043504A (en) Methods, devices, electronic devices and media for accelerating neural network model inference
WO2019080483A1 (en) Neural network computation acceleration method and system based on non-uniform quantization and look-up table

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18869565

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18869565

Country of ref document: EP

Kind code of ref document: A1