WO2020253692A1 - 一种针对深度学习网络参数的量化方法 - Google Patents

一种针对深度学习网络参数的量化方法 Download PDF

Info

Publication number
WO2020253692A1
WO2020253692A1 PCT/CN2020/096430 CN2020096430W WO2020253692A1 WO 2020253692 A1 WO2020253692 A1 WO 2020253692A1 CN 2020096430 W CN2020096430 W CN 2020096430W WO 2020253692 A1 WO2020253692 A1 WO 2020253692A1
Authority
WO
WIPO (PCT)
Prior art keywords
network
parameters
deep learning
quantizer
soft
Prior art date
Application number
PCT/CN2020/096430
Other languages
English (en)
French (fr)
Inventor
韦逸
赵明敏
赵民建
Original Assignee
浙江大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浙江大学 filed Critical 浙江大学
Publication of WO2020253692A1 publication Critical patent/WO2020253692A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Definitions

  • the invention belongs to the field of deep learning, and is a quantization method for deep learning network parameters.
  • Deep learning network has been gradually created since 2006 with the introduction of a learning algorithm for deep belief networks based on cascading restricted Boltzmann machines. It is an emerging field in the field of artificial intelligence. Subject, the main content of its research is the modeling of multilayer neural network and the problem of algorithm learning. Deep learning network methods have been successfully used in many other fields, such as image processing, natural language processing, etc.
  • Deep learning is an emerging multi-layer neural network learning algorithm. Because it relieves the local minimum in traditional network training, it has attracted widespread attention in the field of machine learning. With the development in recent years, deep learning networks do not only refer to multi-layer neural networks, but generally refer to multi-layer networks composed of complex network structures. Deep learning networks can be divided into two types, one is model-driven depth Learning network, this type of network is constructed based on known knowledge and mechanisms, usually by expanding known iterative algorithms into networks, such as LAMP and LISTA algorithms; the second is a data-driven deep learning method, which regards the network as a black box And rely on a large amount of data to train this network. Common fully connected networks and deep convolutional networks belong to this method.
  • the present invention proposes a quantizer specifically for deep learning networks, which is determined by learning. Quantization function.
  • the quantizer is usually expressed as a separate hard step function.
  • the hard step function is not derivable everywhere, and most of the regional derivatives are zero. It is difficult to introduce the network to carry out the backward gradient transfer process. Therefore, the present invention specifically designs a derivable everywhere Soft step function, and introduces learnable parameters to adjust the shape of the step function.
  • the soft step function can be introduced into the network after the training is completed, and the network parameters are fixed to train the parameters of the quantizer.
  • a quantizer suitable for network parameters can be obtained, which not only reduces the storage overhead of the network, but also reduces the performance loss caused by quantization.
  • the purpose of the present invention is that in a large deep network, due to the complex network structure and the deep number of layers, the network parameters included in the network are often too many, which will cause huge storage overhead. In the parameter update system, a large number of network parameters also add a huge burden to the transmission.
  • a quantification method for deep learning network parameters is proposed. The present invention adopts the following technical solutions:
  • mapping process represented by the large deep network is:
  • y represents the input signal
  • is the learning parameter included in the deep network.
  • the training data is Where y m is the input data, s m is the label, M is the number of training data.
  • TanhSum(x) is designed, which is composed of multiple tanh( ⁇ ) functions.
  • the expression of the soft step function is:
  • a TanhSum(x) function of 2l+1 steps is composed of 2l tanh( ⁇ ) functions;
  • is the sharpness coefficient, which is a hyperparameter, which needs to be set before network training, sharp The coefficient determines the smoothness of the soft step function. The larger the coefficient, the closer to the hard step function; the number of quantization steps 2l+1, the limit G bound and the interval between adjacent steps G are all hyperparameters determined in step (3) .
  • the soft step function Q s (x) after introducing learnable parameters can be expressed as:
  • w 1t is used to adjust the height of the t-th step
  • w 2t is used to adjust the width of the t-th step
  • b 1t is used to adjust the position of the t-th step in the x-axis direction
  • b 2t is used to adjust the t-th step The position of the step in the y-axis direction.
  • the L2 norm is chosen as the cost function to learn the learnable parameters of this quantizer.
  • the soft step function with learning parameters is introduced into the deep learning network, the learning parameters are quantified, and the quantizer parameters are learned through the same training data.
  • is the learning parameter of the large-scale deep network after training
  • ⁇ ,l,G b ⁇ is the hyperparameter determined in step (4).
  • the training process adopts the annealing strategy, that is, gradually increase the value of the sharpness coefficient ⁇ during the training process, and gradually make the soft step function approach the separated hard step function.
  • the present invention makes full use of the deep learning method. Therefore, the present invention specifically designs a soft step function that can be guided everywhere, and introduces learnable parameters to adjust The shape of the step function.
  • the soft step function can be introduced into the network after the training is completed, and the network parameters are fixed to train the parameters of the quantizer.
  • a quantizer suitable for network parameters can be obtained.
  • the steps of the quantizer are non-uniform, and its shape is adjusted with the specific distribution of network parameters.
  • Introducing the trained quantizer into the network to quantify the network parameters can not only greatly reduce the storage overhead of the network, but also minimize the network performance loss caused by the quantization parameters.
  • Figure 1 is a schematic diagram of the network structure of an example large-scale deep network
  • Figure 2 is a schematic diagram of the network structure of the quantizer applied to a large deep network
  • Figure 3 shows the specific shape of the trained quantizer
  • Figure 4 shows the network performance after quantization using the trained quantizer.
  • the invented quantization method for deep learning network parameters is applied to specific scenarios for clearer description.
  • a deep network LcgNetV used in massive MIMO signal detection in the wireless communication field.
  • the network is composed of multiple layers of the same structure.
  • the network can realize the function of inputting and receiving signals and detecting the transmitted signals.
  • mapping process represented by the large deep network is:
  • y represents the input signal
  • is the learning parameter included in the deep network.
  • the training data is Where y m is the input data, s m is the label, M is the number of training data.
  • TanhSum(x) is designed, which is composed of multiple tanh( ⁇ ) functions.
  • the expression of the soft step function is:
  • a TanhSum(x) function of 2l+1 steps is composed of 2l tanh( ⁇ ) functions;
  • is the sharpness coefficient, which is a hyperparameter, which needs to be set before network training, sharp The coefficient determines the smoothness of the soft step function. The larger the coefficient, the closer to the hard step function; the number of quantization steps 2l+1, the network parameter limit G b, and the adjacent step interval G are all determined in step (3) Hyperparameters.
  • the soft step function Q s (x) after introducing learnable parameters can be expressed as:
  • w 1t is used to adjust the height of the t-th step
  • w 2t is used to adjust the width of the t-th step
  • b 1t is used to adjust the position of the t-th step in the x-axis direction
  • b 2t is used to adjust the t-th step The position of the step in the y-axis direction.
  • the L2 norm is chosen as the cost function to learn the learnable parameters of this quantizer.
  • is the learning parameter of the large deep network after training, here is ⁇ ,l,G b ⁇ are the hyperparameters determined in step (4).
  • the training process adopts the annealing strategy, that is, gradually increase the value of the sharpness coefficient ⁇ during the training process, and gradually make the soft step function approach the separated hard step function.
  • the sharpness coefficient ⁇ during training is ⁇ 10,100,500 ⁇ in order, when the normalization is equal
  • the training is terminated when the square error no longer decreases.
  • the soft step function obtained from training is solidified into a quantizer to quantify the parameters of the deep learning network.
  • Figure 3 compares the shapes of different quantizers under 3bit quantization (a) and 4bit quantization (b).
  • Hard quantizer represents different quantizers based on hard step functions
  • soft quantizer represents the quantizer proposed by the present invention. It can be seen from the figure that in the quantizer proposed in the present invention, the quantization steps are not uniform, and it can be seen that the quantizer is adjusted according to the specific distribution of network parameters.
  • Figure 4 compares the performance curves of the example network LcgNetV under different quantizers. The performance is measured by the detection bit error rate under different sex to noise ratios.
  • the LcgNetV curve represents the unquantified detection performance.
  • QLcgNetV hard 3bit and QLcgNetV hard 4bit represent the detection performance of LcgNetV after 3bit and 4bit quantization with ordinary hard step functions.
  • QLcgNetV soft 3bit and QLcgNetV soft 4bit represent the The detection performance of LcgNetV is proposed after the number of quantizers are quantized by 3bit and 4bit. It can be seen from the figure that the performance provided by the quantizer provided by the present invention is significantly better than the performance provided by the ordinary quantizer. The 3bit quantization result of the quantizer proposed by the present invention even exceeds that of the ordinary quantizer 4bit quantization. result.
  • the invention is a deep learning beam domain channel estimation method applied to a millimeter wave massive MIMO system based on a lens antenna and based on an approximate message propagation algorithm.
  • the deep learning beam-domain channel estimation method based on the approximate message propagation algorithm, we require protection as an invention.
  • the above are only specific implementations for specific applications, but the true spirit and scope of the present invention are not limited to this. Any person skilled in the art can modify, equivalently replace, improve, etc., to implement channel estimation methods for different applications .
  • the present invention is defined by the claims and their equivalent technical solutions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Image Analysis (AREA)

Abstract

一种针对深度学习网络参数的量化方法。包括如下步骤:(1)构建深度学习网络,产生训练数据;(2)利用大量训练数据对所构建的深度学习网络进行训练,确定网络参数的值;(3)提取出学习参数,确定超参数;(4)设计软阶梯函数,由超参数确定所设计软阶梯函数的具体表达式,并引入可学习参数,使之形状可调整;(5)将该带有学习参数的软阶梯函数引入深度学习网络,量化学习参数,通过相同的训练数据学习量化器参数,训练过程采用退火策略;(6)训练所得软阶梯函数固化为量化器,对深度学习网络进行量化。该方法能够有效降低量化所引起的性能损失,大大降低了深度网络所需的存储开销。

Description

一种针对深度学习网络参数的量化方法 技术领域
本发明属于深度学习领域,是一种针对深度学习网络参数的量化方法。
背景技术
深度学习网络这一学科是自从2006年起,随着基于层叠的限制玻尔兹曼机的深度信念网络的学习算法的提出,而逐渐开创起来的,它在人工智能领域中是一门新兴的学科,其研究的主要内容,就是多层神经网络的建模和算法学习的问题。深度学习网络方法已经成功运用于其他很多领域,比如说图像处理,自然语言处理等。
深度学习是一种新兴的多层神经网络学习算法,因其缓解了传统网络训练中局部最小性,引起机器学习领域的广泛关注。随着近几年的发展,深度学习网络不仅仅指多层神经网络,而是泛指由复杂网络结构构成的多层网络,主要可将深度学习网络分为两种,一是模型驱动的深度学习网络,该类网络根据已知的知识和机制构建,通常是将已知的迭代算法展开成网络,比如LAMP和LISTA算法;二是数据驱动深度学习方法,此方法将网络看做是黑盒并依赖大量数据训练这个网络,常见的全连接网络以及深度卷积网络都属于此种方法。深度网络凭借其多层网络结构,在很多领域获得了很好的应用,但是与此同时,随着网络层次的增加,随之而来的是数量庞大的网络参数,不仅难以学习,在存储网络结构时需要大量的硬件开销。
对于大型深度网络,量化网络参数不是为一种压缩网络存储空间的方法。另外,在不同的应用中,训练所得的网络参数具有不同的分布,如果采用常用的量化器,容易引起较大的量化误差,本发明提出了专门针对深度学习网络的量化器,通过学习来确定量化函数。量化器通常表示为分离的硬阶梯函数,硬阶梯函数并非处处可导,且绝大部分区域导数为零,难以引入网络进行后向梯度传递过程,因此本发明专门设计了一种处处可导的软阶梯函数,并且引入了可学习参数以调整该阶梯函数的形状。该软阶梯函数可以引入训练完成后的网络中,固定网络参数,以训练量化器的参数。通过学习,可得到适应于网络参数的量化器,不仅减少了网络的存储开销,并且减少了由于量化引起的性能损失。
发明内容
本发明的目的是针对在大型深度网络中,由于网络结构复杂,层数深,网络中所包含的网络参数往往数量过多,这会引起巨大的存储开销。在参数更新的系统中,大量网络 参数也给传输增加了巨大的负担。提出了一种针对深度学习网络参数的量化方法。本发明采用如下技术方案:
(1)构建所需的深度学习网络结构,并根据问题产生训练数据;
(2)利用大量训练数据对所构建的深度学习网络进行训练,确定网络参数的值。大型深度网络所表示的映射过程为:
Figure PCTCN2020096430-appb-000001
其中y表示输入信号,
Figure PCTCN2020096430-appb-000002
表示网络的输出信号,Θ为深度网络中所包含学习参数。训练数据为
Figure PCTCN2020096430-appb-000003
其中y m是输入数据,s m是标签,M是训练数据的数量。
(3)提取训练后的网络参数,寻找界限,将绝对值最大的网络参数作为界限G b。确定所需量化的阶梯数2l+1,根据阶梯数可确认所量化比特数
Figure PCTCN2020096430-appb-000004
根据量化阶梯数L以及界限确定相邻阶梯间隔为
Figure PCTCN2020096430-appb-000005
(4)设计可引入网络的处处可导的软阶梯函数,由(3)所确认的超参数确定软阶梯函数的函数具体表达式,并引入可学习参数。具体做法如下:
为了赋予阶梯函数处处可导的能力使其能够引入网络训练,设计专门的软阶梯函数TanhSum(x),其由多个tanh(·)函数组成,该软阶梯函数的表达式为:
Figure PCTCN2020096430-appb-000006
其中2l+1表示该阶梯函数台阶数,一个2l+1台阶的TanhSum(x)函数由2l个tanh(·)函数组成;σ是为锐利系数,为超参数,需要在网络训练之前设置,锐利系数决定了该软阶梯函数的平滑程度,该系数越大,越接近硬阶梯函数;量化台阶数2l+1、界限G bound以及相邻阶梯间隔G都是步骤(3)中所确定的超参数。
将可学习参数引入该量化器,使其能够被学习并根据深度学习网络中参数的特征调整其形状,引入可学习参数后的软阶梯函数Q s(x)可表示为:
Figure PCTCN2020096430-appb-000007
其中,w 1t用于调整第t个阶梯的高度,w 2t用于调整第t个阶梯的宽度,b 1t用于调整第t个阶梯在x轴方向的位置,b 2t用于调整第t个阶梯在y轴方向的位置。选择L2范数作为代价函数对此量化器的可学习参数进行学习。
(5)将该带有学习参数的软阶梯函数引入深度学习网络,量化学习参数,通过相同的训练数据学习量化器参数。
采用L2函数作为损失函数
Figure PCTCN2020096430-appb-000008
Figure PCTCN2020096430-appb-000009
其中,
Figure PCTCN2020096430-appb-000010
为训练大型深度网络的训练数据,
Figure PCTCN2020096430-appb-000011
为量化器中所包含的学习参数,Θ是训练后的大型深度网络学习参数,{σ,l,G b}是步骤(4)中所确定的超参数。
训练过程采用退火策略,即在训练过程中逐渐增加锐利系数σ的值,逐渐使软阶梯函数趋近于分离的硬阶梯函数。
(6)训练所得软阶梯函数固化为量化器,对深度学习网络进行量化。
本发明针对在大型深度网络中参数众多而导致存储开销巨大的情况,充分利用了深度学习的方法,因此本发明专门设计了一种处处可导的软阶梯函数,并且引入了可学习参数以调整该阶梯函数的形状。该软阶梯函数可以引入训练完成后的网络中,固定网络参数,以训练量化器的参数。通过学习,可得到适应于网络参数的量化器,将量化器的阶梯是非均匀的,其形状随着网络参数的具体分布作出了调整。将训练所得量化器引入网络中对网络参数进行量化,不仅能够大量减少了网络的存储开销,还尽可能降低了由于量化参数而导致的网络性能损失。
附图说明
本申请上述的和/或附加的方面和优点从下面结合附图对实施例的描述中将变得明显和容易理解,其中:
图1是示例大型深度网络的网络结构示意图;
图2是量化器应用于大型深度网络的网络结构示意图;;
图3是训练所得量化器具体形状;
图4是运用所训练量化器量化后的网络性能。
具体实施方式
为了使本发明的技术方案和优点变得更加清晰,接下来将结合附图对技术方案的具体实施方式作更加详细地说明:
此处将所发明的针对深度学习网络参数的量化方法应用于具体场景中以便更清晰的说明。考虑一个运用于无线通信领域大规模MIMO信号检测的深度网络LcgNetV,该网络由 相同结构的多层构成,该网络可实现输入接收信号,检测出发射信号的功能。
(1)构建所需的深度学习网络结构LcgNetV,该网络由L层网络构成,每一层网络具有相同的结构,单层网络结构由图1所示,其中
Figure PCTCN2020096430-appb-000012
代表检测信号,为单层网络的输出,
Figure PCTCN2020096430-appb-000013
为网络各层之间的传递变量,第一层网络的输入
Figure PCTCN2020096430-appb-000014
即为接收信号y rm,最后一层网络的输出
Figure PCTCN2020096430-appb-000015
为由网络所得的检测信号,{α (i)(i)}为第i层所包含的待学习的网络参数。根据问题产生训练数据
Figure PCTCN2020096430-appb-000016
M为训练数据的数量;
(2)利用大量训练数据对所构建的深度学习网络进行训练,确定网络参数的值。大型深度网络所表示的映射过程为:
Figure PCTCN2020096430-appb-000017
其中y表示输入信号,
Figure PCTCN2020096430-appb-000018
表示网络的输出信号,Θ为深度网络中所包含学习参数。训练数据为
Figure PCTCN2020096430-appb-000019
其中y m是输入数据,s m是标签,M是训练数据的数量。
(3)提取训练后的网络参数,寻找界限,将绝对值最大的网络参数作为界限G b。确定所需量化的阶梯数2l+1,根据阶梯数可确认所量化比特数
Figure PCTCN2020096430-appb-000020
根据量化阶梯数2l+1以及界限确定相邻阶梯间隔为
Figure PCTCN2020096430-appb-000021
此处我们选择量化台阶数2l+1为7和15,所对应的量化比特数为3bit和4bit。根据训练结果,G b=2.5。
(4)设计可引入网络的处处可导的软阶梯函数,由(3)所确认的超参数确定软阶梯函数的函数具体表达式,并引入可学习参数。具体做法如下:
为了赋予阶梯函数处处可导的能力使其能够引入网络训练,设计专门的软阶梯函数TanhSum(x),其由多个tanh(·)函数组成,该软阶梯函数的表达式为:
Figure PCTCN2020096430-appb-000022
其中2l+1表示该阶梯函数台阶数,一个2l+1台阶的TanhSum(x)函数由2l个tanh(·)函数组成;σ是为锐利系数,为超参数,需要在网络训练之前设置,锐利系数决定了该软阶梯函数的平滑程度,该系数越大,越接近硬阶梯函数;量化台阶数2l+1、网络参数界限G b以及相邻阶梯间隔G都是步骤(3)中所确定的超参数。
将可学习参数引入该量化器,使其能够被学习并根据深度学习网络中参数的特征调整其形状,引入可学习参数后的软阶梯函数Q s(x)可表示为:
Figure PCTCN2020096430-appb-000023
其中,w 1t用于调整第t个阶梯的高度,w 2t用于调整第t个阶梯的宽度,b 1t用于调整第t个阶梯在x轴方向的位置,b 2t用于调整第t个阶梯在y轴方向的位置。选择L2范数作为代价函数对此量化器的可学习参数进行学习。
(5)固定深度学习网络参数,将该带有可学习参数的软阶梯函数引入深度学习网络,量化网络参数,通过步骤(1)所述的训练数据学习量化器参数。引入量化器的深度网络模型如图2所示,该网络的所有参数都通过相同的量化器进行量化。
采用L2函数作为损失函数
Figure PCTCN2020096430-appb-000024
Figure PCTCN2020096430-appb-000025
其中,
Figure PCTCN2020096430-appb-000026
为训练大型深度网络的训练数据,
Figure PCTCN2020096430-appb-000027
为量化器中所包含的学习参数,Θ是训练后的大型深度网络学习参数,此处为
Figure PCTCN2020096430-appb-000028
{σ,l,G b}是步骤(4)中所确定的超参数。
训练过程采用退火策略,即在训练过程中逐渐增加锐利系数σ的值,逐渐使软阶梯函数趋近于分离的硬阶梯函数,此处训练时锐利系数σ依次为{10,100,500},当归一化均方误差不再下降时训练终止。
(6)训练所得软阶梯函数固化为量化器,对深度学习网络参数进行量化。
图3对比了在3bit量化(a)和4bit量化(b)下不同量化器的形状,其中hard quantizer表示的不同基于硬阶梯函数的量化器,soft quantizer表示的是本发明所提出的量化器。可以从图中看出,本发明所提出的量化器中,量化台阶是不均匀的,可见该量化器随着网络参数的具体分布作出了调整。
图4对比了所示例网络LcgNetV在不同量化器下的性能曲线,性能由不同性噪比下的检测误比特率衡量。LcgNetV曲线代表未经量化的检测性能,QLcgNetV hard 3bit和QLcgNetV hard 4bit表示的是用普通硬阶梯函数经过3bit和4bit量化后放入LcgNetV的检测性能,QLcgNetV soft 3bit和QLcgNetV soft 4bit表示的是用所提出量化器数经过3bit和4bit量化后放入LcgNetV的检测性能。从图中可以看出本发明所提供的量化器所提供的性能要明显好于用普通量化器提供的性能,用本发明所提出的量化器3bit量化的结果甚至超过了普通量化器4bit量化的结果。
本发明是一种应用于基于透镜天线的毫米波大规模MIMO系统,基于近似消息传播算法的深度学习波束域信道估计方法。针对基于近似消息传播算法的深度学习波束域信道估计方法,我们要求将作为发明进行保护。以上所述仅为特定应用场合的具体实施方式,但 本发明的真实精神和范围不局限于此,任何熟悉本领域的技术人员可以修改、等同替换、改进等,实现不同应用场合的信道估计方法。本发明由权利要求书及其等效技术方案来限定。

Claims (4)

  1. 一种针对深度学习网络参数的量化方法,其特征在于包括以下步骤:
    (1)构建深度学习网络,并根据问题产生训练数据;
    (2)利用训练数据对所构建的深度学习网络进行训练,确定网络参数;
    (3)提取步骤(2)所述的网络参数,根据网络参数界限和量化台阶数确定量化比特数和相邻量化台阶间隔;
    (4)设计量化器,所述量化器由带有可学习参数的处处可导的软阶梯函数构成,由步骤(3)所述的网络参数界限、量化台阶数和相邻量化台阶间隔确定软阶梯函数的函数表达式,并引入可学习参数作为量化器参数;
    (5)将步骤(4)所述的量化器引入深度学习网络并量化步骤(2)得到的网络参数,采用步骤(1)所述的训练数据训练量化器参数,训练过程采用退火策略;
    (6)利用步骤(5)得到的训练后的量化器参数,运用训练好的量化器对步骤(2)得到的网络参数进行量化。
  2. 如权利要求1所述的针对深度学习网络参数的量化方法,其特征在于所述的步骤(3)具体为:
    提取训练后的网络参数,寻找界限,将绝对值最大的网络参数作为网络参数界限G b;确定所需的量化台阶数2l+1,根据量化台阶数确认所需的量化比特数
    Figure PCTCN2020096430-appb-100001
    根据量化台阶数2l+1以及网络参数界限G b确定相邻量化台阶间隔为
    Figure PCTCN2020096430-appb-100002
  3. 如权利要求1所述的针对深度学习网络参数的量化方法,其特征在于所述的步骤(4)具体为:
    设计可引入深度学习网络的处处可导的软阶梯函数TanhSum(x),其由多个tanh(·)函数组成,该软阶梯函数的表达式为:
    Figure PCTCN2020096430-appb-100003
    其中2l+1表示该软阶梯函数台阶数,一个2l+1台阶的TanhSum(x)函数由2l个tanh(·)函数组成;σ是为锐利系数,为超参数决定了该软阶梯函数的平滑程度,σ越大,越接近硬阶梯函数;量化台阶数2l+1、网络参数界限G b以及相邻量化台阶间隔G都是步骤(3)中所确定的超参数;
    将可学习参数引入软阶梯函数,使其能够被学习并根据深度学习网络参数的特征调整 形状,引入可学习参数后的软阶梯函数Q s(x)可表示为:
    Figure PCTCN2020096430-appb-100004
    其中,w 1t用于调整第t个台阶的高度,w 2t用于调整第t个台阶的宽度,b 1t用于调整第t个台阶在x轴方向的位置,b 2t用于调整第t个台阶在y轴方向的位置;
    将带有可学习参数的软阶梯函数作为量化器,选择L2范数作为代价函数对此量化器的可学习参数进行学习。
  4. 如权利要求1所述的针对深度学习网络参数的量化方法,其特征在于所述的步骤(5)具体为:
    将步骤(4)得到的带有可学习参数的软阶梯函数引入训练后的深度学习网络以量化步骤(2)得到的网络参数,运用训练数据训练量化器参数;训练过程采用退火策略,依次增大σ,使软阶梯函数逐渐趋近于硬阶梯函数,训练过程中采用步骤(1)所述的训练数据进行训练,当归一化均方误差不再下降时训练终止。
PCT/CN2020/096430 2019-06-17 2020-06-16 一种针对深度学习网络参数的量化方法 WO2020253692A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910521633.XA CN110378467A (zh) 2019-06-17 2019-06-17 一种针对深度学习网络参数的量化方法
CN201910521633.X 2019-06-17

Publications (1)

Publication Number Publication Date
WO2020253692A1 true WO2020253692A1 (zh) 2020-12-24

Family

ID=68249558

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/096430 WO2020253692A1 (zh) 2019-06-17 2020-06-16 一种针对深度学习网络参数的量化方法

Country Status (2)

Country Link
CN (1) CN110378467A (zh)
WO (1) WO2020253692A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110378467A (zh) * 2019-06-17 2019-10-25 浙江大学 一种针对深度学习网络参数的量化方法
CN112564118B (zh) * 2020-11-23 2022-03-18 广西大学 一种分布式可拓展量子深宽度学习的实时电压控制方法

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106980641A (zh) * 2017-02-09 2017-07-25 上海交通大学 基于卷积神经网络的无监督哈希快速图片检索系统及方法
US20180107925A1 (en) * 2016-10-19 2018-04-19 Samsung Electronics Co., Ltd. Method and apparatus for neural network quantization
CN108717570A (zh) * 2018-05-23 2018-10-30 电子科技大学 一种脉冲神经网络参数量化方法
CN110378467A (zh) * 2019-06-17 2019-10-25 浙江大学 一种针对深度学习网络参数的量化方法

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103399487B (zh) * 2013-07-30 2015-10-21 东北石油大学 一种基于非线性多入多出mimo系统的解耦控制方法及其装置
US10373050B2 (en) * 2015-05-08 2019-08-06 Qualcomm Incorporated Fixed point neural network based on floating point neural network quantization
CN105790813B (zh) * 2016-05-17 2018-11-06 重庆邮电大学 一种大规模mimo下基于深度学习的码本选择方法
US20180107926A1 (en) * 2016-10-19 2018-04-19 Samsung Electronics Co., Ltd. Method and apparatus for neural network quantization
CN106656461B (zh) * 2016-11-25 2019-05-28 中国石油大学(华东) 一种信号量化情形下的混沌神经网络保密通信方法
CN109670057B (zh) * 2019-01-03 2021-06-29 电子科技大学 一种渐进式的端到端深度特征量化系统及方法

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180107925A1 (en) * 2016-10-19 2018-04-19 Samsung Electronics Co., Ltd. Method and apparatus for neural network quantization
CN106980641A (zh) * 2017-02-09 2017-07-25 上海交通大学 基于卷积神经网络的无监督哈希快速图片检索系统及方法
CN108717570A (zh) * 2018-05-23 2018-10-30 电子科技大学 一种脉冲神经网络参数量化方法
CN110378467A (zh) * 2019-06-17 2019-10-25 浙江大学 一种针对深度学习网络参数的量化方法

Also Published As

Publication number Publication date
CN110378467A (zh) 2019-10-25

Similar Documents

Publication Publication Date Title
WO2018209932A1 (zh) 多量化深度二值特征学习方法及装置
CN108648188B (zh) 一种基于生成对抗网络的无参考图像质量评价方法
CN107832787B (zh) 基于双谱自编码特征的雷达辐射源识别方法
WO2020253692A1 (zh) 一种针对深度学习网络参数的量化方法
WO2020238237A1 (zh) 一种基于幂指数量化的神经网络压缩方法
CN111901024B (zh) 基于抗拟合深度学习的mimo信道状态信息反馈方法
CN110445581B (zh) 基于卷积神经网络降低信道译码误码率的方法
CN110276451A (zh) 一种基于权重归一化的深度神经网络压缩方法
CN112215054B (zh) 一种用于水声信号去噪的深度生成对抗方法
CN107885787A (zh) 基于谱嵌入的多视角特征融合的图像检索方法
WO2018076331A1 (zh) 一种神经网络训练方法及装置
Tian et al. A data reconstruction algorithm based on neural network for compressed sensing
CN113806559B (zh) 一种基于关系路径与双层注意力的知识图谱嵌入方法
CN105184742A (zh) 一种基于拉普拉斯图特征向量的稀疏编码的图像去噪方法
CN110351212A (zh) 快衰落信道下基于卷积神经网络的信道估计方法
CN113467949A (zh) 边缘计算环境下用于分布式dnn训练的梯度压缩方法
CN110942106A (zh) 一种基于平方平均的池化卷积神经网络图像分类方法
CN107809399B (zh) 一种针对量化接收信号的多天线毫米波信道估计方法
WO2022227957A1 (zh) 一种基于图自编码器的融合子空间聚类方法及系统
CN108737298B (zh) 一种基于图像处理的scma盲检测方法
CN112818152A (zh) 一种深度聚类模型的数据增强方法和装置
WO2021117163A1 (ja) 光通信の状態推定方法、装置及びプログラム
CN113762500B (zh) 一种卷积神经网络在量化时提高模型精度的训练方法
CN106028043B (zh) 基于新的邻域函数的三维自组织映射图像编码方法
CN110794893A (zh) 一种面向量子计算的多层噪声高精度温度控制方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20826550

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20826550

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 20826550

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 160922)

122 Ep: pct application non-entry in european phase

Ref document number: 20826550

Country of ref document: EP

Kind code of ref document: A1