WO2020253692A1 - A quantization method for deep learning network parameters - Google Patents
A quantization method for deep learning network parameters
- Publication number
- WO2020253692A1 (PCT/CN2020/096430)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- network
- parameters
- deep learning
- quantizer
- soft
- Prior art date
- 2019-06-17
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
Definitions
- The invention belongs to the field of deep learning and provides a quantization method for deep learning network parameters.
- Deep learning networks have developed gradually since 2006, when a learning algorithm for deep belief networks based on cascaded restricted Boltzmann machines was introduced. Deep learning is an emerging subject within artificial intelligence whose main research content is the modeling of multilayer neural networks and the corresponding learning algorithms. Deep learning methods have been used successfully in many other fields, such as image processing and natural language processing.
- Deep learning is an emerging multi-layer neural network learning paradigm. Because it alleviates the local-minimum problem of traditional network training, it has attracted widespread attention in machine learning. With developments in recent years, the term no longer refers only to multi-layer neural networks, but more generally to multi-layer networks with complex structures. Deep learning networks can be divided into two types. The first is the model-driven deep learning network, constructed from known knowledge and mechanisms, usually by unrolling a known iterative algorithm into a network, as in the LAMP and LISTA algorithms. The second is the data-driven deep learning method, which treats the network as a black box and relies on large amounts of data to train it; common fully connected networks and deep convolutional networks belong to this type.
- The present invention proposes a quantizer designed specifically for deep learning networks, whose quantization function is determined by learning.
- A quantizer is usually expressed as a discontinuous hard step function.
- The hard step function is not differentiable everywhere, and its derivative is zero over most of its domain, so it is difficult to introduce into a network for backward gradient propagation. Therefore, the present invention specifically designs an everywhere-differentiable soft step function and introduces learnable parameters to adjust the shape of the step function.
- The soft step function can be introduced into the network after its training is completed; the network parameters are then fixed while the quantizer parameters are trained.
- In this way, a quantizer suited to the network parameters is obtained, which both reduces the storage overhead of the network and reduces the performance loss caused by quantization.
- The present invention addresses the following problem: in a large deep network, the complex structure and large number of layers mean that the network contains very many parameters, which causes huge storage overhead. In systems that exchange parameter updates, the large number of network parameters also places a heavy burden on transmission.
- To this end, a quantization method for deep learning network parameters is proposed. The present invention adopts the following technical solution:
- The mapping process represented by the large deep network can be written as ŝ = f(y; θ), where:
- y represents the input signal,
- θ denotes the learnable parameters of the deep network.
- The training data are {(y_m, s_m)}, m = 1, ..., M, where y_m is the input data, s_m is the label, and M is the number of training samples.
- An everywhere-differentiable soft step function TanhSum(x) is designed, composed of multiple tanh(·) functions.
- The soft step function is expressed as a sum of shifted tanh(·) terms: a TanhSum(x) function with 2l+1 steps is composed of 2l tanh(·) functions.
- σ is the sharpness coefficient, a hyperparameter that must be set before network training; it determines the smoothness of the soft step function, and the larger σ is, the closer the function is to a hard step function. The number of quantization steps 2l+1, the network parameter bound G_b, and the interval G between adjacent steps are all hyperparameters determined in step (3). A code sketch of one such construction follows.
- After introducing learnable parameters, the soft step function Q_s(x) can be expressed as a sum of 2l parameterized tanh(·) terms, where:
- w_1t adjusts the height of the t-th step,
- w_2t adjusts the width of the t-th step,
- b_1t adjusts the position of the t-th step along the x-axis,
- b_2t adjusts the position of the t-th step along the y-axis.
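The formula for Q_s(x) likewise appears only as an image in the source, so the parameterization below is a hedged reconstruction from the stated roles of w_1t, w_2t, b_1t, and b_2t (height, width, x-position, y-position of the t-th step). Initialization reproduces the uniform TanhSum staircase, so learning starts from the unadapted quantizer:

```python
import torch
import torch.nn as nn

class SoftQuantizer(nn.Module):
    """Soft step function Q_s(x) with learnable shape parameters.

    Plausible form (an assumption, not the patent's exact formula):
        Q_s(x) = sum_t [ w1[t] * tanh(sigma * (w2[t] * x + b1[t])) + b2[t] ]
    w1: step heights, w2: step widths, b1: x-positions, b2: y-positions.
    """
    def __init__(self, l: int = 3, G: float = 0.1, sigma: float = 10.0):
        super().__init__()
        edges = (torch.arange(1, 2 * l + 1).float() - l - 0.5) * G
        self.sigma = sigma  # sharpness; annealed from outside during training
        self.w1 = nn.Parameter(torch.full((2 * l,), 0.5 * G))  # heights
        self.w2 = nn.Parameter(torch.ones(2 * l))              # widths
        self.b1 = nn.Parameter(-edges)                         # x-positions
        self.b2 = nn.Parameter(torch.zeros(2 * l))             # y-positions

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        t = torch.tanh(self.sigma * (self.w2 * x.unsqueeze(-1) + self.b1))
        return (self.w1 * t + self.b2).sum(-1)
```

Because every term is differentiable in both x and the shape parameters, gradients flow to w1, w2, b1, b2 during training, letting the staircase become non-uniform.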
- The L2 norm is chosen as the cost function for learning the quantizer's learnable parameters.
- The soft step function with learnable parameters is introduced into the deep learning network, the network's learned parameters are quantized, and the quantizer parameters are learned from the same training data.
- θ denotes the learned parameters of the large-scale deep network after training.
- {σ, l, G_b} are the hyperparameters determined in step (4).
- The training process adopts an annealing strategy: the value of the sharpness coefficient σ is gradually increased during training, so that the soft step function gradually approaches the discontinuous hard step function.
- The present invention makes full use of deep learning methods. It specifically designs an everywhere-differentiable soft step function and introduces learnable parameters to adjust the shape of the step function.
- The soft step function can be introduced into the network after its training is completed; the network parameters are then fixed while the quantizer parameters are trained.
- In this way, a quantizer suited to the network parameters is obtained.
- The steps of the quantizer are non-uniform, and its shape adapts to the specific distribution of the network parameters.
- Introducing the trained quantizer into the network to quantize the network parameters not only greatly reduces the storage overhead of the network but also minimizes the network performance loss caused by quantization.
- Figure 1 is a schematic diagram of the network structure of an example large-scale deep network
- Figure 2 is a schematic diagram of the network structure of the quantizer applied to a large deep network
- Figure 3 shows the specific shape of the trained quantizer
- Figure 4 shows the network performance after quantization using the trained quantizer.
- For clearer description, the invented quantization method for deep learning network parameters is applied below to a specific scenario.
- The example is LcgNetV, a deep network used for massive MIMO signal detection in the wireless communication field.
- The network is composed of multiple layers with the same structure.
- The network takes the received signal as input and detects the transmitted signal.
- The mapping process represented by this large deep network can be written as ŝ = f(y; θ), where:
- y represents the input (received) signal,
- θ denotes the learnable parameters of the deep network.
- The training data are {(y_m, s_m)}, m = 1, ..., M, where y_m is the input data, s_m is the label, and M is the number of training samples.
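Purely for illustration, training pairs (y_m, s_m) for such a detection network could be generated as below. The linear model y = Hs + n with BPSK symbols is an assumption made for this sketch; the patent only states that y_m is the input and s_m the label:

```python
import torch

def make_training_data(M: int = 10000, n_tx: int = 16, n_rx: int = 64,
                       snr_db: float = 20.0):
    """Generate M (y_m, s_m) pairs under an assumed linear MIMO model."""
    s = torch.randint(0, 2, (M, n_tx)).float() * 2 - 1   # BPSK labels s_m
    H = torch.randn(M, n_rx, n_tx) / n_rx ** 0.5         # i.i.d. Gaussian channels
    n = 10 ** (-snr_db / 20) * torch.randn(M, n_rx)      # receiver noise
    y = (H @ s.unsqueeze(-1)).squeeze(-1) + n            # y = H s + n
    return y, s
```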
- An everywhere-differentiable soft step function TanhSum(x) is designed, composed of multiple tanh(·) functions.
- The soft step function is expressed as a sum of shifted tanh(·) terms: a TanhSum(x) function with 2l+1 steps is composed of 2l tanh(·) functions.
- σ is the sharpness coefficient, a hyperparameter that must be set before network training; it determines the smoothness of the soft step function, and the larger σ is, the closer the function is to a hard step function. The number of quantization steps 2l+1, the network parameter bound G_b, and the interval G between adjacent steps are all hyperparameters determined in step (3).
- After introducing learnable parameters, the soft step function Q_s(x) can be expressed as a sum of 2l parameterized tanh(·) terms, where:
- w_1t adjusts the height of the t-th step,
- w_2t adjusts the width of the t-th step,
- b_1t adjusts the position of the t-th step along the x-axis,
- b_2t adjusts the position of the t-th step along the y-axis.
- The L2 norm is chosen as the cost function for learning the quantizer's learnable parameters.
- θ denotes the learned parameters of the large deep network after training; {σ, l, G_b} are the hyperparameters determined in step (4).
- The training process adopts an annealing strategy: the value of the sharpness coefficient σ is gradually increased during training, so that the soft step function gradually approaches the discontinuous hard step function.
- The sharpness coefficient σ during training takes the values {10, 100, 500} in order; training is terminated when the normalized mean square error no longer decreases. A sketch of this stage follows.
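A minimal sketch of this quantizer-training stage, assuming the SoftQuantizer sketched earlier: the trained network parameters θ are frozen and flattened into one tensor, only the quantizer parameters are optimized, and σ is annealed through {10, 100, 500}. The patent trains with the training data through the network and stops when the normalized mean square error no longer decreases; for brevity this sketch applies the L2 cost directly in parameter space and uses a fixed step budget, both simplifications:

```python
import torch

def train_quantizer(quantizer, theta: torch.Tensor,
                    steps_per_stage: int = 200, lr: float = 1e-3):
    """Anneal sigma through {10, 100, 500} while fitting the quantizer.

    theta: flat tensor of the frozen, trained network parameters.
    """
    opt = torch.optim.Adam(quantizer.parameters(), lr=lr)
    for sigma in (10.0, 100.0, 500.0):   # annealing schedule from the patent
        quantizer.sigma = sigma
        for _ in range(steps_per_stage):
            opt.zero_grad()
            q = quantizer(theta)
            # normalized L2 cost; the patent terminates training when
            # this NMSE no longer decreases
            loss = ((q - theta) ** 2).sum() / (theta ** 2).sum()
            loss.backward()
            opt.step()
    return quantizer
```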
- The soft step function obtained from training is then solidified into a hard quantizer to quantize the parameters of the deep learning network.
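One way to realize this solidification, sketched under the assumption that pushing σ to a very large value turns each smooth tanh ramp into a near-ideal step, so the learned soft function acts as a non-uniform hard quantizer; the threshold value used here is an assumption:

```python
import torch

def solidify_and_quantize(quantizer, theta: torch.Tensor,
                          sigma_hard: float = 1e4) -> torch.Tensor:
    """Freeze the learned soft quantizer into a hard one and apply it."""
    quantizer.sigma = sigma_hard      # tanh ramps become near-ideal steps
    with torch.no_grad():
        return quantizer(theta)       # non-uniformly quantized parameters
```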
- Figure 3 compares the shapes of different quantizers under 3-bit quantization (a) and 4-bit quantization (b).
- The hard quantizer curves represent quantizers based on hard step functions,
- while the soft quantizer curves represent the quantizer proposed by the present invention. It can be seen from the figure that the quantization steps of the proposed quantizer are non-uniform: the quantizer adjusts itself to the specific distribution of the network parameters.
- Figure 4 compares the performance curves of the example network LcgNetV under different quantizers. Performance is measured by the detection bit error rate at different signal-to-noise ratios.
- The LcgNetV curve represents the unquantized detection performance.
- QLcgNetV hard 3bit and QLcgNetV hard 4bit represent the detection performance of LcgNetV after 3-bit and 4-bit quantization with ordinary hard step functions.
- QLcgNetV soft 3bit and QLcgNetV soft 4bit represent the detection performance of LcgNetV after 3-bit and 4-bit quantization with the proposed quantizer. It can be seen from the figure that the performance of the proposed quantizer is significantly better than that of the ordinary quantizer; the 3-bit result of the proposed quantizer even exceeds the 4-bit result of the ordinary quantizer.
- The invention described above is a quantization method for deep learning network parameters, for which we claim protection.
- The above is only a specific implementation for a specific application, but the true spirit and scope of the present invention are not limited to it. Any person skilled in the art may make modifications, equivalent replacements, and improvements to apply the method to different applications.
- The present invention is defined by the claims and their equivalent technical solutions.
Claims (4)
- A quantization method for deep learning network parameters, characterized by comprising the following steps: (1) construct a deep learning network and generate training data according to the problem; (2) train the constructed deep learning network with the training data to determine the network parameters; (3) extract the network parameters of step (2), and determine the number of quantization bits and the interval between adjacent quantization steps according to the network parameter bound and the number of quantization steps; (4) design a quantizer consisting of an everywhere-differentiable soft step function with learnable parameters, the functional expression of the soft step function being determined by the network parameter bound, the number of quantization steps, and the interval between adjacent quantization steps from step (3), and introduce learnable parameters as the quantizer parameters; (5) introduce the quantizer of step (4) into the deep learning network to quantize the network parameters obtained in step (2), and train the quantizer parameters with the training data of step (1), the training process adopting an annealing strategy; (6) with the trained quantizer parameters obtained in step (5), apply the trained quantizer to quantize the network parameters obtained in step (2).
- The quantization method for deep learning network parameters according to claim 1, characterized in that step (4) is specifically: design an everywhere-differentiable soft step function TanhSum(x) that can be introduced into a deep learning network and is composed of multiple tanh(·) functions, where 2l+1 denotes the number of steps of the soft step function and a TanhSum(x) function with 2l+1 steps is composed of 2l tanh(·) functions; σ is the sharpness coefficient, a hyperparameter that determines the smoothness of the soft step function, and the larger σ is, the closer the function is to a hard step function; the number of quantization steps 2l+1, the network parameter bound G_b, and the interval G between adjacent quantization steps are all hyperparameters determined in step (3); introduce learnable parameters into the soft step function so that it can be learned and its shape adjusted according to the characteristics of the deep learning network parameters; the soft step function Q_s(x) with learnable parameters introduced is parameterized by w_1t, w_2t, b_1t, and b_2t, where w_1t adjusts the height of the t-th step, w_2t adjusts the width of the t-th step, b_1t adjusts the position of the t-th step along the x-axis, and b_2t adjusts the position of the t-th step along the y-axis; the soft step function with learnable parameters is used as the quantizer, and the L2 norm is chosen as the cost function for learning the learnable parameters of this quantizer.
- The quantization method for deep learning network parameters according to claim 1, characterized in that step (5) is specifically: introduce the soft step function with learnable parameters obtained in step (4) into the trained deep learning network to quantize the network parameters obtained in step (2), and train the quantizer parameters with the training data; the training process adopts an annealing strategy, increasing σ stepwise so that the soft step function gradually approaches the hard step function; training uses the training data described in step (1), and terminates when the normalized mean square error no longer decreases.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201910521633.XA (CN110378467A, zh) | 2019-06-17 | 2019-06-17 | A quantization method for deep learning network parameters
CN201910521633.X | 2019-06-17 | |
Publications (1)
Publication Number | Publication Date
---|---
WO2020253692A1 (zh) | 2020-12-24
Family
ID=68249558
Family Applications (1)
Application Number | Title | Priority Date | Filing Date
---|---|---|---
PCT/CN2020/096430 (WO2020253692A1, zh) | A quantization method for deep learning network parameters | 2019-06-17 | 2020-06-16
Country Status (2)
Country | Link
---|---
CN | CN110378467A (zh)
WO | WO2020253692A1 (zh)
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN110378467A (zh) * | 2019-06-17 | 2019-10-25 | Zhejiang University | A quantization method for deep learning network parameters
CN112564118B (zh) * | 2020-11-23 | 2022-03-18 | Guangxi University | A real-time voltage control method based on distributed scalable quantum deep-width learning
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN106980641A (zh) * | 2017-02-09 | 2017-07-25 | Shanghai Jiao Tong University | Unsupervised hashing fast image retrieval system and method based on convolutional neural networks
US20180107925A1 (en) * | 2016-10-19 | 2018-04-19 | Samsung Electronics Co., Ltd. | Method and apparatus for neural network quantization
CN108717570A (zh) * | 2018-05-23 | 2018-10-30 | University of Electronic Science and Technology of China | A parameter quantization method for spiking neural networks
CN110378467A (zh) * | 2019-06-17 | 2019-10-25 | Zhejiang University | A quantization method for deep learning network parameters
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN103399487B (zh) * | 2013-07-30 | 2015-10-21 | Northeast Petroleum University | A decoupling control method and device for nonlinear MIMO systems
US10373050B2 (en) * | 2015-05-08 | 2019-08-06 | Qualcomm Incorporated | Fixed point neural network based on floating point neural network quantization
CN105790813B (zh) * | 2016-05-17 | 2018-11-06 | Chongqing University of Posts and Telecommunications | A deep-learning-based codebook selection method for massive MIMO
US20180107926A1 (en) * | 2016-10-19 | 2018-04-19 | Samsung Electronics Co., Ltd. | Method and apparatus for neural network quantization
CN106656461B (zh) * | 2016-11-25 | 2019-05-28 | China University of Petroleum (East China) | A chaotic neural network secure communication method under signal quantization
CN109670057B (zh) * | 2019-01-03 | 2021-06-29 | University of Electronic Science and Technology of China | A progressive end-to-end deep feature quantization system and method
- 2019-06-17: CN patent application CN201910521633.XA filed (CN110378467A, zh); not active, withdrawn
- 2020-06-16: PCT application PCT/CN2020/096430 filed (WO2020253692A1, zh); active, application filing
Also Published As
Publication number | Publication date |
---|---|
CN110378467A (zh) | 2019-10-25 |
Similar Documents
Publication | Publication Date | Title
---|---|---
WO2018209932A1 (zh) | | Multi-quantization deep binary feature learning method and device
CN108648188B (zh) | | A no-reference image quality assessment method based on generative adversarial networks
CN107832787B (zh) | | Radar emitter identification method based on bispectrum self-encoded features
WO2020253692A1 (zh) | | A quantization method for deep learning network parameters
WO2020238237A1 (zh) | | A neural network compression method based on power-exponent quantization
CN111901024B (zh) | | A MIMO channel state information feedback method based on overfitting-resistant deep learning
CN110445581B (zh) | | A method for reducing the channel decoding error rate based on convolutional neural networks
CN110276451A (zh) | | A deep neural network compression method based on weight normalization
CN112215054B (zh) | | A deep generative adversarial method for underwater acoustic signal denoising
CN107885787A (zh) | | An image retrieval method based on multi-view feature fusion with spectral embedding
WO2018076331A1 (zh) | | A neural network training method and device
Tian et al. | | A data reconstruction algorithm based on neural network for compressed sensing
CN113806559B (zh) | | A knowledge graph embedding method based on relation paths and two-layer attention
CN105184742A (zh) | | An image denoising method using sparse coding based on Laplacian graph eigenvectors
CN110351212A (zh) | | A channel estimation method based on convolutional neural networks for fast-fading channels
CN113467949A (zh) | | A gradient compression method for distributed DNN training in edge computing environments
CN110942106A (zh) | | A pooled convolutional neural network image classification method based on the square mean
CN107809399B (zh) | | A multi-antenna millimeter-wave channel estimation method for quantized received signals
WO2022227957A1 (zh) | | A fused subspace clustering method and system based on graph autoencoders
CN108737298B (zh) | | An SCMA blind detection method based on image processing
CN112818152A (zh) | | A data augmentation method and device for deep clustering models
WO2021117163A1 (ja) | | Optical communication state estimation method, device, and program
CN113762500B (zh) | | A training method for improving model accuracy during quantization of convolutional neural networks
CN106028043B (zh) | | A three-dimensional self-organizing map image coding method based on a new neighborhood function
CN110794893A (zh) | | A high-precision multi-layer noise temperature control method for quantum computing
Legal Events
Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20826550; Country of ref document: EP; Kind code of ref document: A1
| NENP | Non-entry into the national phase | Ref country code: DE
| 122 | Ep: pct application non-entry in european phase | Ref document number: 20826550; Country of ref document: EP; Kind code of ref document: A1
| 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 160922)