CN113128116B - Pure integer quantization method for lightweight neural network - Google Patents

Pure integer quantization method for lightweight neural network

Info

Publication number
CN113128116B
CN113128116B
Authority
CN
China
Prior art keywords
feature map
channel
weights
layer
maximum value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110421738.5A
Other languages
Chinese (zh)
Other versions
CN113128116A (en)
Inventor
姜伟雄 (Jiang Weixiong)
哈亚军 (Ha Yajun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ShanghaiTech University
Original Assignee
ShanghaiTech University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ShanghaiTech University filed Critical ShanghaiTech University
Priority to CN202110421738.5A priority Critical patent/CN113128116B/en
Publication of CN113128116A publication Critical patent/CN113128116A/en
Priority to PCT/CN2021/119513 priority patent/WO2022222369A1/en
Priority to US17/799,933 priority patent/US11934954B2/en
Application granted
Publication of CN113128116B publication Critical patent/CN113128116B/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0495 Quantised networks; Sparse networks; Compressed networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/20 Design optimisation, verification or simulation
    • G06F 30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Image Processing (AREA)

Abstract

The application provides a pure integer quantization method for a lightweight neural network, comprising the following steps: obtaining the maximum value of the pixel values of each channel of the current layer's feature map; dividing the pixel value of each pixel of each channel of the feature map by the t-th power of that maximum value, where t ∈ [0,1]; multiplying each channel of the weights by the maximum value of the corresponding feature-map channel; and convolving the processed feature map with the processed weights to obtain the next layer's feature map. The proposed algorithm is verified on SkyNet and MobileNet respectively: it achieves lossless INT8 quantization on SkyNet and the highest quantization accuracy on MobileNet v2.

Description

Pure integer quantization method for lightweight neural network
Technical Field
The application relates to a quantization method for a lightweight neural network.
Background
In recent years, a great deal of work has explored quantization techniques for traditional models, but applying these techniques to lightweight networks causes a significant loss of accuracy. For example, Benoit Jacob et al., "Quantization and training of neural networks for efficient integer-arithmetic-only inference" (CVPR, pages 2704-2713, 2018), reduced the ImageNet accuracy of MobileNet v2 from 73.03% to 0.1% when quantizing it; Raghuraman Krishnamoorthi, "Quantizing deep convolutional networks for efficient inference: A whitepaper" (CoRR, abs/1806.08342, 2018), reported a 2% loss of accuracy. To recover these losses, many works employ retraining or quantization-aware training, but these techniques are time-consuming and require dataset support. Nagel et al. proposed the DFQ algorithm to solve this problem, attributing the poor performance of traditional quantization methods on models that employ depthwise separable convolutions to differences in the distributions of the weights. To this end, Nagel et al. proposed cross-layer weight equalization, which adjusts the balance of the weights across layers. However, this technique applies only to network models that use ReLU as the activation function, whereas most current lightweight networks employ ReLU6, and directly replacing ReLU6 with ReLU in turn causes a significant loss of accuracy. Moreover, the method of Nagel et al. is not suitable for pure integer quantization.
Disclosure of Invention
The application aims to solve the following technical problems: simply combining lightweight neural network techniques with quantization techniques results in either a significant degradation in accuracy or a long retraining time; furthermore, many quantization methods quantize only the weights and the feature maps, while the bias and the quantization coefficients remain floating-point numbers, which is very unfriendly to ASIC/FPGA implementations.
In order to solve the above technical problems, the technical scheme of the application provides a pure integer quantization method for a lightweight neural network, comprising the following steps:
Step 1: let the feature map have N channels, where N ≥ 1, and obtain the maximum value of the pixel values of each channel of the current layer's feature map;
Step 2: the pixels of each channel of the feature map are processed as follows:
the pixel value of each pixel of the n-th channel of the feature map is divided by the t-th power of the maximum value of the n-th channel obtained in step 1, where t ∈ [0,1];
there are N groups of weights corresponding to the N channels of the next layer's feature map, each group consisting of N weights corresponding to the N channels of the current layer's feature map, and each group of weights is processed as follows:
the N weights in the n-th group are correspondingly multiplied by the maximum values of the pixel values of the N channels obtained in step 1;
Step 3: the feature map processed in step 2 is convolved with the N groups of weights processed in step 2 to obtain the next layer's feature map (a code sketch of these steps is given at the end of this section).
Preferably, when t = 0, no imbalance transfer is performed; when t = 1, the imbalance among the channels of the current layer's feature map is completely transferred to the weights of the following layer.
Preferably, the current layer is any layer of the lightweight neural network except the last layer.
The algorithm provided by the application has been verified on SkyNet and MobileNet respectively: it achieves lossless INT8 quantization on SkyNet and the highest quantization accuracy on MobileNet v2.
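The following is a minimal NumPy sketch of steps 1-3, included for illustration only. The tensor layout (channels-first feature map, output-channel-major weights) and the name imbalance_transfer are assumptions, not part of the patent; note that for the convolution result to be preserved, the weights are multiplied by the same factor (the per-channel maximum raised to the power t) that the feature map was divided by.

import numpy as np

def imbalance_transfer(fmap, weights, t):
    # fmap:    current-layer feature map, shape (N, H, W), N channels
    # weights: next-layer weights, shape (N_out, N, kH, kW); the claims take N_out = N
    # t:       imbalance transfer coefficient, 0 <= t <= 1
    # Step 1: per-channel maximum of the (absolute) pixel values.
    ch_max = np.abs(fmap).max(axis=(1, 2))          # shape (N,)
    ch_max = np.maximum(ch_max, 1e-12)              # guard against all-zero channels
    factor = ch_max ** t
    # Step 2a: divide every pixel of channel n by max_n ** t.
    fmap_t = fmap / factor[:, None, None]
    # Step 2b: multiply the n-th kernel of every weight group by max_n ** t,
    # so the convolution of step 3 yields an unchanged result.
    weights_t = weights * factor[None, :, None, None]
    # Step 3 (the convolution itself, and the quantization of fmap_t and
    # weights_t, are performed afterwards by the inference engine).
    return fmap_t, weights_t

With t = 1 this reproduces the complete transfer described above; with t = 0 it is the identity.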
Drawings
FIG. 1 is a schematic diagram of a 1×1 convolution with imbalance transfer.
Detailed Description
The application will be further illustrated with reference to specific examples. It is to be understood that these examples are illustrative of the present application and are not intended to limit the scope of the present application. Furthermore, it should be understood that various changes and modifications can be made by one skilled in the art after reading the teachings of the present application, and such equivalents are intended to fall within the scope of the application as defined in the appended claims.
The inventors analyzed and modeled the quantization flow of a neural network and found that the balance of a tensor can be used as a predictor of its quantization error. Guided by this index, the application proposes an adjustable imbalance transfer algorithm to reduce the quantization error of the feature map, with the following specific contents:
in view of the current neural network computing mode, the weights can be quantized channel by channel, and the feature images can only be quantized layer by layer, so that the quantization error of the weights is smaller, but the quantization error of the feature images is larger.
The application divides the pixel value of each pixel of each channel of the current layer's feature map by the maximum value of the pixel values of the channel in which it is located, and then performs quantization, thereby realizing the equivalent of channel-by-channel quantization. To keep the computation result unchanged, each channel of the weights convolved with the feature map is multiplied by the maximum value of the corresponding feature-map channel. In this way, the imbalance among the channels of the current layer's feature map is transferred entirely to the weights of the following layer.
In practice, however, transferring all of the inter-channel imbalance of the feature map is not optimal. To adjust the degree of imbalance transfer, the application introduces an additional hyperparameter, the imbalance transfer coefficient t: in this step, the pixel value of each pixel of each channel of the feature map is divided by the t-th power of the maximum value of the pixel values of its channel, with t ranging from 0 to 1. When t = 0, no imbalance is transferred; when t = 1, all of the imbalance is transferred as described above. By adjusting t, the optimal quantization accuracy can be obtained. This operation applies to any network model and any convolution kernel size.
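One way t might be swept is sketched below (illustrative only; the toy criterion scores just the feature-map quantization error mapped back to the original domain, whereas in practice t would be chosen by the accuracy of the quantized model on a validation set, since larger t also degrades the balance of the weights).

import numpy as np

def fmap_quant_error(fmap, t, n_bits=8):
    # Transfer imbalance with coefficient t, quantize with one per-layer
    # scale, then measure the error mapped back to the original domain.
    ch_max = np.maximum(np.abs(fmap).max(axis=(1, 2)), 1e-12)
    factor = (ch_max ** t)[:, None, None]
    f = fmap / factor
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(f).max() / qmax
    fq = np.clip(np.round(f / scale), -qmax, qmax) * scale
    return float(np.mean(((fq - f) * factor) ** 2))

rng = np.random.default_rng(1)
fmap = np.stack([rng.normal(0.0, s, (8, 8)) for s in (0.1, 0.5, 2.0, 10.0)])
for t in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
    print(f"t = {t:.1f} -> feature-map MSE = {fmap_quant_error(fmap, t):.6f}")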
FIG. 1 shows a schematic diagram of imbalance transfer for a 1×1 convolution; tensors circled by dotted lines share the same quantization coefficient. The pixel value of each pixel of each channel of A1 is divided by the maximum value of its channel, and the corresponding channel of W2 is multiplied by that maximum, so the computation result is unchanged while the balance of A1 is greatly increased. At the same time, the balance of the weights is not significantly reduced. The quantization error of the feature map is therefore reduced, and the accuracy of the quantized model is improved.
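The invariance that FIG. 1 illustrates can be checked numerically. In the sketch below (the variable names A1 and W2 follow the figure; everything else is an illustrative assumption), a 1×1 convolution is written as a channel-mixing matrix multiplication, and the output is identical before and after the transfer with t = 1.

import numpy as np

rng = np.random.default_rng(2)
N, M, H, W = 4, 3, 8, 8              # input channels, output channels, spatial size
A1 = rng.normal(size=(N, H, W))      # current-layer feature map
W2 = rng.normal(size=(M, N))         # a 1x1 convolution is a channel-mixing matrix

def conv1x1(fmap, w):
    n, h, wd = fmap.shape
    return (w @ fmap.reshape(n, h * wd)).reshape(-1, h, wd)

ch_max = np.abs(A1).max(axis=(1, 2))
A1_t = A1 / ch_max[:, None, None]    # each channel divided by its own maximum
W2_t = W2 * ch_max[None, :]          # maxima folded into the matching W2 channels

print(np.allclose(conv1x1(A1, W2), conv1x1(A1_t, W2_t)))   # True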

Claims (2)

1. A pure integer quantization method for a lightweight neural network, comprising the steps of:
step 1: let the feature map have N channels, where N ≥ 1, and obtain the maximum value of the pixel values of each channel of the current layer's feature map;
step 2: the pixels of each channel of the feature map are processed as follows:
the pixel value of each pixel of the n-th channel of the feature map is divided by the t-th power of the maximum value of the n-th channel obtained in step 1, where t ∈ [0,1];
there are N groups of weights corresponding to the N channels of the next layer's feature map, each group consisting of N weights corresponding to the N channels of the current layer's feature map, and each group of weights is processed as follows:
the N weights in the n-th group are correspondingly multiplied by the maximum values of the pixel values of the N channels obtained in step 1;
step 3: the feature map processed in step 2 is convolved with the N groups of weights processed in step 2 to obtain the next layer's feature map;
when t = 0, no imbalance transfer is performed; when t = 1, the imbalance among the channels of the current layer's feature map is completely transferred to the weights of the following layer.
2. The pure integer quantization method of claim 1, wherein the current layer is any layer of the lightweight neural network except the last layer.
CN202110421738.5A 2021-04-20 2021-04-20 Pure integer quantization method for lightweight neural network Active CN113128116B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202110421738.5A CN113128116B (en) 2021-04-20 2021-04-20 Pure integer quantization method for lightweight neural network
PCT/CN2021/119513 WO2022222369A1 (en) 2021-04-20 2021-09-22 Integer-only quantification method for lightweight neural network
US17/799,933 US11934954B2 (en) 2021-04-20 2021-09-22 Pure integer quantization method for lightweight neural network (LNN)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110421738.5A CN113128116B (en) 2021-04-20 2021-04-20 Pure integer quantization method for lightweight neural network

Publications (2)

Publication Number Publication Date
CN113128116A CN113128116A (en) 2021-07-16
CN113128116B (en) 2023-09-26

Family

ID=76779184

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110421738.5A Active CN113128116B (en) 2021-04-20 2021-04-20 Pure integer quantization method for lightweight neural network

Country Status (3)

Country Link
US (1) US11934954B2 (en)
CN (1) CN113128116B (en)
WO (1) WO2022222369A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113128116B (en) * 2021-04-20 2023-09-26 ShanghaiTech University Pure integer quantization method for lightweight neural network

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105528589A (en) * 2015-12-31 2016-04-27 上海科技大学 Single image crowd counting algorithm based on multi-column convolutional neural network
WO2018073975A1 (en) * 2016-10-21 2018-04-26 Nec Corporation Improved sparse convolution neural network
CN110930320A (en) * 2019-11-06 2020-03-27 南京邮电大学 Image defogging method based on lightweight convolutional neural network
CN111402143A (en) * 2020-06-03 2020-07-10 腾讯科技(深圳)有限公司 Image processing method, device, equipment and computer readable storage medium
CN111937010A (en) * 2018-03-23 2020-11-13 亚马逊技术股份有限公司 Accelerated quantized multiplication and addition operations
CN112560355A (en) * 2021-02-22 2021-03-26 常州微亿智造科技有限公司 Method and device for predicting Mach number of wind tunnel based on convolutional neural network

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000074850A2 (en) * 1999-06-03 2000-12-14 University Of Washington Microfluidic devices for transverse electrophoresis and isoelectric focusing
WO2005048185A1 (en) * 2003-11-17 2005-05-26 Auckland University Of Technology Transductive neuro fuzzy inference method for personalised modelling
KR102601604B1 (en) * 2017-08-04 2023-11-13 삼성전자주식회사 Method and apparatus for quantizing parameter of neural network
JP6977864B2 (en) * 2018-03-02 2021-12-08 日本電気株式会社 Inference device, convolution operation execution method and program
US11755880B2 (en) * 2018-03-09 2023-09-12 Canon Kabushiki Kaisha Method and apparatus for optimizing and applying multilayer neural network model, and storage medium
US10527699B1 (en) 2018-08-01 2020-01-07 The Board Of Trustees Of The Leland Stanford Junior University Unsupervised deep learning for multi-channel MRI model estimation
US11704555B2 (en) * 2019-06-24 2023-07-18 Baidu Usa Llc Batch normalization layer fusion and quantization method for model inference in AI neural network engine
CN111311538B (en) 2019-12-28 2023-06-06 北京工业大学 Multi-scale lightweight road pavement detection method based on convolutional neural network
US11477464B2 (en) * 2020-09-16 2022-10-18 Qualcomm Incorporated End-to-end neural network based video coding
CN112418397B (en) 2020-11-19 2021-10-26 重庆邮电大学 Image classification method based on lightweight convolutional neural network
CN112488070A (en) 2020-12-21 2021-03-12 上海交通大学 Neural network compression method for remote sensing image target detection
CN113128116B (en) 2021-04-20 2023-09-26 上海科技大学 Pure integer quantization method for lightweight neural network

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105528589A (en) * 2015-12-31 2016-04-27 上海科技大学 Single image crowd counting algorithm based on multi-column convolutional neural network
WO2018073975A1 (en) * 2016-10-21 2018-04-26 Nec Corporation Improved sparse convolution neural network
CN111937010A (en) * 2018-03-23 2020-11-13 亚马逊技术股份有限公司 Accelerated quantized multiplication and addition operations
CN110930320A (en) * 2019-11-06 2020-03-27 南京邮电大学 Image defogging method based on lightweight convolutional neural network
CN111402143A (en) * 2020-06-03 2020-07-10 腾讯科技(深圳)有限公司 Image processing method, device, equipment and computer readable storage medium
CN112560355A (en) * 2021-02-22 2021-03-26 常州微亿智造科技有限公司 Method and device for predicting Mach number of wind tunnel based on convolutional neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Design and Implementation of a Real-Time Dehazing Hardware Accelerator Based on Image Fusion; Liu Guanyu; Information Science & Technology; full text *

Also Published As

Publication number Publication date
US11934954B2 (en) 2024-03-19
WO2022222369A1 (en) 2022-10-27
US20230196095A1 (en) 2023-06-22
CN113128116A (en) 2021-07-16

Similar Documents

Publication Publication Date Title
CN111260022B (en) Full INT8 fixed-point quantization method for convolutional neural network
CN109087273B (en) Image restoration method, storage medium and system based on enhanced neural network
CN113011571B (en) INT8 offline quantization and integer inference method based on Transformer model
CN113052868B (en) Method and device for training matting model and image matting
CN111612147A (en) Quantization method of deep convolutional network
JP2020119518A (en) Method and device for transforming cnn layers to optimize cnn parameter quantization to be used for mobile devices or compact networks with high precision via hardware optimization
CN111696149A (en) Quantization method for stereo matching algorithm based on CNN
CN113128116B (en) Pure integer quantization method for lightweight neural network
CN114139683A (en) Neural network accelerator model quantization method
US20200372340A1 (en) Neural network parameter optimization method and neural network computing method and apparatus suitable for hardware implementation
CN111985495A (en) Model deployment method, device, system and storage medium
CN112465844A (en) Multi-class loss function for image semantic segmentation and design method thereof
US11531884B2 (en) Separate quantization method of forming combination of 4-bit and 8-bit data of neural network
CN112465140A (en) Convolutional neural network model compression method based on packet channel fusion
CN108322749A (en) The coefficient optimization method of RDOQ, the accelerating method and device of RDOQ
WO2018076331A1 (en) Neural network training method and apparatus
CN114708496A (en) Remote sensing change detection method based on improved spatial pooling pyramid
CN112183726A (en) Neural network full-quantization method and system
CN110837885B (en) Sigmoid function fitting method based on probability distribution
CN112446487A (en) Method, device, system and storage medium for training and applying neural network model
CN116524173A (en) Deep learning network model optimization method based on parameter quantization
CN116227563A (en) Convolutional neural network compression and acceleration method based on data quantization
CN115062690A (en) Bearing fault diagnosis method based on domain adaptive network
WO2022247368A1 (en) Methods, systems, and mediafor low-bit neural networks using bit shift operations
CN110751259A (en) Network layer operation method and device in deep neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant