CN113128116B - Pure integer quantization method for lightweight neural network - Google Patents
- Publication number
- CN113128116B (application CN202110421738.5A)
- Authority
- CN
- China
- Prior art keywords
- feature map
- channel
- weights
- layer
- maximum value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0495—Quantised networks; Sparse networks; Compressed networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Computer Hardware Design (AREA)
- Geometry (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Image Processing (AREA)
Abstract
The application provides a pure integer quantization method for a lightweight neural network, comprising the following steps: obtaining the maximum value of the pixel values of each channel of the current-layer feature map; dividing the pixel value of each pixel of each channel of the feature map by the t-th power of the corresponding channel maximum, where t ∈ [0,1]; multiplying the value of each channel of the weights by the maximum value of the corresponding feature-map channel; and convolving the processed feature map with the processed weights to obtain the next-layer feature map. The proposed algorithm is verified on SkyNet and MobileNet respectively: it achieves lossless INT8 quantization on SkyNet and the highest quantization accuracy on MobileNet v2.
Description
Technical Field
The application relates to a quantization method for a lightweight neural network.
Background
In recent years, a great deal of work has explored quantization techniques for traditional models, but applying these techniques to lightweight networks incurs a significant loss of accuracy. For example, Jacob Benoit et al., "Quantization and training of neural networks for efficient integer-arithmetic-only inference" (CVPR, pages 2704-2713, 2018), report ImageNet accuracy dropping from 73.03% to 0.1% when quantizing MobileNet v2; Raghuraman Krishnamoorthi, "Quantizing deep convolutional networks for efficient inference: A whitepaper" (CoRR, abs/1806.08342, 2018), reports a 2% accuracy loss. To recover this lost accuracy, many works employ retraining or quantization-aware training, but these techniques are time-consuming and require access to the training dataset. Nagel et al. propose the DFQ algorithm to address this problem; they argue that differences in weight distribution cause traditional quantization methods to perform poorly on models employing depthwise separable convolution. They therefore propose cross-layer weight equalization, which adjusts the balance of weights between different layers. However, this technique applies only to network models with ReLU as the activation function, whereas most current lightweight networks employ ReLU6, and directly replacing ReLU6 with ReLU again causes a significant loss of accuracy. Moreover, the method of Nagel et al. is not suitable for pure integer quantization.
Disclosure of Invention
The technical problem the application aims to solve is as follows: simply combining lightweight neural network techniques with quantization techniques results either in significant accuracy degradation or in long retraining time; furthermore, many quantization methods quantize only the weights and feature maps, while the bias and quantization coefficients remain floating-point numbers, which is very unfriendly to ASIC/FPGA implementations.
In order to solve the above technical problems, the technical scheme of the application provides a pure integer quantization method for a lightweight neural network, comprising the following steps:
Step 1: let the feature map have N channels, N ≥ 1, and obtain the maximum value of the pixel values of each channel of the current-layer feature map;
Step 2: the pixels of each channel of the feature map are processed as follows:
the pixel value of each pixel of the nth channel of the feature map is divided by the t-th power of the maximum value of the nth channel obtained in step 1, where t ∈ [0,1];
there are N groups of weights corresponding to the N channels of the next-layer feature map, each group consisting of N weights corresponding to the N channels of the current-layer feature map, and each group of weights is processed as follows:
the N weights in the nth group are each multiplied by the maximum value of the corresponding channel among the N channel maxima obtained in step 1;
Step 3: the feature map processed in step 2 is convolved with the N groups of weights processed in step 2 to obtain the next-layer feature map.
Preferably, when t = 0, no imbalance transfer is performed; when t = 1, the imbalance among the channels of the current-layer feature map is completely transferred to the weights of the following layer.
Preferably, the current layer is any layer except the last layer in the lightweight neural network.
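Steps 1 to 3 above can be sketched in NumPy (a minimal illustration, not the patent's reference implementation; the array shapes and function name are assumptions). Scaling each weight channel by the same factor max^t that was divided out of the matching feature-map channel keeps the convolution output unchanged for any t:

```python
import numpy as np

def imbalance_transfer(fmap, weights, t=1.0):
    """Transfer inter-channel imbalance of a feature map into the
    next layer's weights (illustrative sketch of steps 1-3).

    fmap:    current-layer feature map, shape (C_in, H, W)
    weights: next-layer conv weights, shape (C_out, C_in, kH, kW)
    t:       imbalance-transfer coefficient in [0, 1]
    """
    # Step 1: per-channel maximum of the feature map
    ch_max = np.abs(fmap).max(axis=(1, 2))          # shape (C_in,)
    scale = ch_max ** t
    # Step 2a: divide each feature-map channel by max^t ...
    fmap_eq = fmap / scale[:, None, None]
    # Step 2b: ... and multiply the matching input channel of every
    # weight group by the same factor, so the product is unchanged
    w_eq = weights * scale[None, :, None, None]
    return fmap_eq, w_eq
```

For a 1×1 convolution this invariance can be checked directly: the outputs before and after the transfer agree to floating-point precision, while the rescaled feature map is far better balanced across channels.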
The proposed algorithm is verified on SkyNet and MobileNet respectively: it achieves lossless INT8 quantization on SkyNet and the highest quantization accuracy on MobileNet v2.
Drawings
FIG. 1 is a schematic diagram of a 1×1 convolution with imbalance transfer.
Detailed Description
The application will be further illustrated with reference to specific examples. It is to be understood that these examples are illustrative of the present application and are not intended to limit the scope of the present application. Furthermore, it should be understood that various changes and modifications can be made by one skilled in the art after reading the teachings of the present application, and such equivalents are intended to fall within the scope of the application as defined in the appended claims.
The inventors analyzed and modeled the quantization flow of neural networks and found that the balance of a tensor can serve as a predictor of its quantization error. Guided by this index, the application proposes a tunable imbalance transfer algorithm to optimize the quantization error of the feature map, with the following specific content:
in view of the current neural network computing mode, the weights can be quantized channel by channel, and the feature images can only be quantized layer by layer, so that the quantization error of the weights is smaller, but the quantization error of the feature images is larger.
The application divides the pixel value of each pixel of each channel of the current-layer feature map by the maximum pixel value of the channel in which it is located, and then performs quantization, thereby achieving the equivalent of channel-by-channel quantization. To keep the computation result unchanged, the value of each channel of the weights convolved with the feature map is multiplied by the maximum value of the corresponding feature-map channel. In this way, the imbalance among the channels of the current-layer feature map is transferred entirely to the weights of the following layer.
In practice, however, transferring all of the inter-channel imbalance of the feature map is not the optimal solution. To adjust the degree of imbalance transfer, the application introduces a hyperparameter, the imbalance transfer coefficient t: the pixel value of each pixel of each channel of the feature map is divided by the t-th power of the maximum pixel value of its channel, with t ranging from 0 to 1. When t = 0, no imbalance is transferred; when t = 1, all of the imbalance is transferred as described above. By tuning t, the optimal quantization accuracy can be obtained. This operation applies to any network model and any convolution kernel size.
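A hypothetical selection loop for t might look as follows, using the relative per-tensor quantization error of the rescaled feature map as a cheap proxy score (the names and the proxy metric are assumptions; a full pipeline would also score the rescaled weights and end-to-end model accuracy):

```python
import numpy as np

def rel_quant_error(x, bits=8):
    """Relative error of symmetric per-tensor quantize-dequantize."""
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1)
    q = np.round(x / scale) * scale
    return np.abs(q - x).mean() / np.abs(x).mean()

def sweep_t(fmap, ts):
    """Feature-map quantization error for each candidate coefficient t."""
    ch_max = np.abs(fmap).max(axis=(1, 2))        # per-channel maxima
    errors = []
    for t in ts:
        # partial imbalance transfer: divide each channel by max^t
        fmap_eq = fmap / (ch_max ** t)[:, None, None]
        errors.append(rel_quant_error(fmap_eq))
    return errors
```

On a strongly imbalanced feature map the error shrinks as t grows toward 1, since every channel is brought to a comparable range before a single per-tensor scale is applied.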
Fig. 1 shows a schematic diagram of imbalance transfer for a 1×1 convolution; tensors circled by dotted lines share the same quantization coefficient. The pixel value of each pixel of each channel of A1 is divided by the maximum pixel value of its channel, and the corresponding channel of W2 is multiplied by the same maximum, so the computation result is unchanged while the balance of A1 is greatly increased. At the same time, the balance of the weights is not significantly reduced. The quantization error of the feature map is therefore reduced, improving the accuracy of the quantized model.
Claims (2)
1. A pure integer quantization method for a lightweight neural network, comprising the following steps:
Step 1: let the feature map have N channels, N ≥ 1, and obtain the maximum value of the pixel values of each channel of the current-layer feature map;
Step 2: the pixels of each channel of the feature map are processed as follows:
the pixel value of each pixel of the nth channel of the feature map is divided by the t-th power of the maximum value of the nth channel obtained in step 1, where t ∈ [0,1];
there are N groups of weights corresponding to the N channels of the next-layer feature map, each group consisting of N weights corresponding to the N channels of the current-layer feature map, and each group of weights is processed as follows:
the N weights in the nth group are each multiplied by the maximum value of the corresponding channel among the N channel maxima obtained in step 1;
Step 3: the feature map processed in step 2 is convolved with the N groups of weights processed in step 2 to obtain the next-layer feature map;
when t = 0, no imbalance transfer is performed; when t = 1, the imbalance among the channels of the current-layer feature map is completely transferred to the weights of the following layer.
2. The pure integer quantization method as defined in claim 1, wherein the current layer is any layer of the lightweight neural network other than the last layer.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110421738.5A CN113128116B (en) | 2021-04-20 | 2021-04-20 | Pure integer quantization method for lightweight neural network |
PCT/CN2021/119513 WO2022222369A1 (en) | 2021-04-20 | 2021-09-22 | Integer-only quantification method for lightweight neural network |
US17/799,933 US11934954B2 (en) | 2021-04-20 | 2021-09-22 | Pure integer quantization method for lightweight neural network (LNN) |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110421738.5A CN113128116B (en) | 2021-04-20 | 2021-04-20 | Pure integer quantization method for lightweight neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113128116A CN113128116A (en) | 2021-07-16 |
CN113128116B true CN113128116B (en) | 2023-09-26 |
Family
ID=76779184
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110421738.5A Active CN113128116B (en) | 2021-04-20 | 2021-04-20 | Pure integer quantization method for lightweight neural network |
Country Status (3)
Country | Link |
---|---|
US (1) | US11934954B2 (en) |
CN (1) | CN113128116B (en) |
WO (1) | WO2022222369A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113128116B (en) * | 2021-04-20 | 2023-09-26 | 上海科技大学 | Pure integer quantization method for lightweight neural network |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105528589A (en) * | 2015-12-31 | 2016-04-27 | 上海科技大学 | Single image crowd counting algorithm based on multi-column convolutional neural network |
WO2018073975A1 (en) * | 2016-10-21 | 2018-04-26 | Nec Corporation | Improved sparse convolution neural network |
CN110930320A (en) * | 2019-11-06 | 2020-03-27 | 南京邮电大学 | Image defogging method based on lightweight convolutional neural network |
CN111402143A (en) * | 2020-06-03 | 2020-07-10 | 腾讯科技(深圳)有限公司 | Image processing method, device, equipment and computer readable storage medium |
CN111937010A (en) * | 2018-03-23 | 2020-11-13 | 亚马逊技术股份有限公司 | Accelerated quantized multiplication and addition operations |
CN112560355A (en) * | 2021-02-22 | 2021-03-26 | 常州微亿智造科技有限公司 | Method and device for predicting Mach number of wind tunnel based on convolutional neural network |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2000074850A2 (en) * | 1999-06-03 | 2000-12-14 | University Of Washington | Microfluidic devices for transverse electrophoresis and isoelectric focusing |
WO2005048185A1 (en) * | 2003-11-17 | 2005-05-26 | Auckland University Of Technology | Transductive neuro fuzzy inference method for personalised modelling |
KR102601604B1 (en) * | 2017-08-04 | 2023-11-13 | 삼성전자주식회사 | Method and apparatus for quantizing parameter of neural network |
JP6977864B2 (en) * | 2018-03-02 | 2021-12-08 | 日本電気株式会社 | Inference device, convolution operation execution method and program |
US11755880B2 (en) * | 2018-03-09 | 2023-09-12 | Canon Kabushiki Kaisha | Method and apparatus for optimizing and applying multilayer neural network model, and storage medium |
US10527699B1 (en) | 2018-08-01 | 2020-01-07 | The Board Of Trustees Of The Leland Stanford Junior University | Unsupervised deep learning for multi-channel MRI model estimation |
US11704555B2 (en) * | 2019-06-24 | 2023-07-18 | Baidu Usa Llc | Batch normalization layer fusion and quantization method for model inference in AI neural network engine |
CN111311538B (en) | 2019-12-28 | 2023-06-06 | 北京工业大学 | Multi-scale lightweight road pavement detection method based on convolutional neural network |
US11477464B2 (en) * | 2020-09-16 | 2022-10-18 | Qualcomm Incorporated | End-to-end neural network based video coding |
CN112418397B (en) | 2020-11-19 | 2021-10-26 | 重庆邮电大学 | Image classification method based on lightweight convolutional neural network |
CN112488070A (en) | 2020-12-21 | 2021-03-12 | 上海交通大学 | Neural network compression method for remote sensing image target detection |
CN113128116B (en) | 2021-04-20 | 2023-09-26 | 上海科技大学 | Pure integer quantization method for lightweight neural network |
2021
- 2021-04-20 CN CN202110421738.5A patent/CN113128116B/en active Active
- 2021-09-22 WO PCT/CN2021/119513 patent/WO2022222369A1/en active Application Filing
- 2021-09-22 US US17/799,933 patent/US11934954B2/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105528589A (en) * | 2015-12-31 | 2016-04-27 | 上海科技大学 | Single image crowd counting algorithm based on multi-column convolutional neural network |
WO2018073975A1 (en) * | 2016-10-21 | 2018-04-26 | Nec Corporation | Improved sparse convolution neural network |
CN111937010A (en) * | 2018-03-23 | 2020-11-13 | 亚马逊技术股份有限公司 | Accelerated quantized multiplication and addition operations |
CN110930320A (en) * | 2019-11-06 | 2020-03-27 | 南京邮电大学 | Image defogging method based on lightweight convolutional neural network |
CN111402143A (en) * | 2020-06-03 | 2020-07-10 | 腾讯科技(深圳)有限公司 | Image processing method, device, equipment and computer readable storage medium |
CN112560355A (en) * | 2021-02-22 | 2021-03-26 | 常州微亿智造科技有限公司 | Method and device for predicting Mach number of wind tunnel based on convolutional neural network |
Non-Patent Citations (1)
Title |
---|
Design and Implementation of a Real-Time Dehazing Hardware Accelerator Based on Image Fusion; Liu Guanyu; 《信息科技》 (Information Technology); full text * |
Also Published As
Publication number | Publication date |
---|---|
US11934954B2 (en) | 2024-03-19 |
WO2022222369A1 (en) | 2022-10-27 |
US20230196095A1 (en) | 2023-06-22 |
CN113128116A (en) | 2021-07-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111260022B (en) | Full INT8 fixed-point quantization method for convolutional neural network | |
CN109087273B (en) | Image restoration method, storage medium and system based on enhanced neural network | |
CN113011571B (en) | INT8 offline quantization and integer inference method based on Transformer model | |
CN113052868B (en) | Method and device for training matting model and image matting | |
CN111612147A (en) | Quantization method of deep convolutional network | |
JP2020119518A (en) | Method and device for transforming cnn layers to optimize cnn parameter quantization to be used for mobile devices or compact networks with high precision via hardware optimization | |
CN111696149A (en) | Quantization method for stereo matching algorithm based on CNN | |
CN113128116B (en) | Pure integer quantization method for lightweight neural network | |
CN114139683A (en) | Neural network accelerator model quantization method | |
US20200372340A1 (en) | Neural network parameter optimization method and neural network computing method and apparatus suitable for hardware implementation | |
CN111985495A (en) | Model deployment method, device, system and storage medium | |
CN112465844A (en) | Multi-class loss function for image semantic segmentation and design method thereof | |
US11531884B2 (en) | Separate quantization method of forming combination of 4-bit and 8-bit data of neural network | |
CN112465140A (en) | Convolutional neural network model compression method based on packet channel fusion | |
CN108322749A (en) | The coefficient optimization method of RDOQ, the accelerating method and device of RDOQ | |
WO2018076331A1 (en) | Neural network training method and apparatus | |
CN114708496A (en) | Remote sensing change detection method based on improved spatial pooling pyramid | |
CN112183726A (en) | Neural network full-quantization method and system | |
CN110837885B (en) | Sigmoid function fitting method based on probability distribution | |
CN112446487A (en) | Method, device, system and storage medium for training and applying neural network model | |
CN116524173A (en) | Deep learning network model optimization method based on parameter quantization | |
CN116227563A (en) | Convolutional neural network compression and acceleration method based on data quantization | |
CN115062690A (en) | Bearing fault diagnosis method based on domain adaptive network | |
WO2022247368A1 (en) | Methods, systems, and mediafor low-bit neural networks using bit shift operations | |
CN110751259A (en) | Network layer operation method and device in deep neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||