CN113762496B - Method for reducing low-bit convolutional neural network reasoning operation complexity - Google Patents

Method for reducing low-bit convolutional neural network reasoning operation complexity

Info

Publication number
CN113762496B
CN113762496B (application CN202010497777.9A)
Authority
CN
China
Prior art keywords
quantization
quantized
feature map
int
bit
Prior art date
Legal status
Active
Application number
CN202010497777.9A
Other languages
Chinese (zh)
Other versions
CN113762496A (en)
Inventor
张东
Current Assignee
Hefei Ingenic Technology Co ltd
Original Assignee
Hefei Ingenic Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hefei Ingenic Technology Co., Ltd.
Priority to CN202010497777.9A
Publication of CN113762496A
Application granted
Publication of CN113762496B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 5/00: Computing arrangements using knowledge-based models
    • G06N 5/04: Inference or reasoning models
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT]
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a method for reducing the inference operation complexity of a low-bit convolutional neural network, comprising the following steps. S1: after neural network training is finished, quantization is performed using the stored data; the quantization of the i-th layer is assumed to be $x_{i+1} = Q_A(\delta_i(s_{BN}(Q_w(w_i)\cdot x_i) + b_i))$, wherein $\delta_i$ is the activation function, $Q_A$ is the quantization formula of the feature map, and $Q_w$ is the quantization formula of the weight. S2: when the parameters of the formula in S1 meet the conditions, the quantized output is obtained by fixed-point arithmetic as $Q_A(\delta_i(s_w s_x s_{BN}(w_{int}\cdot x_{int} + b_i/(s_w s_x s_{BN}))))$. S3: the thresholds are determined from the quantization of the feature map: the quantization formula of the feature map directly yields the thresholds $(0.5, 1.5, \ldots, 2^k - 0.5)$, where $k$ is the quantized bit width; since the distance between adjacent thresholds is 1.0, only $T_1 = \frac{0.5}{s_w s_x s_{BN}}$ needs to be saved at the final quantization, the threshold series being $T_j = (2j - 1)\,T_1$. S4: since the values of the quantized feature map are already determined when quantizing to low bit, and $Q_A$ is uniform quantization, the expression $\delta_i(s_w s_x s_{BN}(w_{int}\cdot x_{int} + b_i/(s_w s_x s_{BN})))$ from S2 obtains the final quantized result by comparison with a series of thresholds $(T_1, T_2, \ldots, T_n)$. The application solves the problems of high computational complexity and high computational resource requirements in the low-bit model inference process.

Description

Method for reducing low-bit convolutional neural network reasoning operation complexity
Technical Field
The invention relates to the technical field of neural network acceleration, in particular to a method for reducing the reasoning operation complexity of a low-bit convolutional neural network.
Background
In recent years, with the rapid development of technology, the era of big data has arrived. Deep learning, with the deep neural network (DNN) as its model, has achieved remarkable results in many key fields of artificial intelligence, such as image recognition, reinforcement learning, and semantic analysis. The convolutional neural network (CNN), as a typical DNN structure, can effectively extract hidden-layer features of images and classify them accurately, and in recent years it has been widely applied in the fields of image recognition and detection.
In particular, the prior art converts 32 bits to low bit by multiply-and-shift: the result of the quantized convolution operation is stored as a 32-bit integer, and multiplication and shift operations are then performed according to pre-computed parameters to convert the 32-bit value to low bit.
However, when 32 bits are quantized to low bit in this way, a series of addition and comparison operations has to be performed in the quantization process to preserve accuracy after quantization, which greatly increases the computational complexity and the computational resources required; the cost is often prohibitive, especially when quantizing to 2 bits.
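To make this prior-art baseline concrete, here is a minimal sketch of such a multiply-and-shift requantization in NumPy; the function name, the multiplier/shift values and the rounding behaviour are illustrative assumptions, not details taken from this application:

```python
import numpy as np

def requantize_multiply_shift(acc32, multiplier, shift, k=2):
    """Rescale a 32-bit convolution accumulator to a k-bit feature map with a
    precomputed integer multiplier and right shift approximating the
    floating-point scale factor (illustrative prior-art scheme)."""
    widened = acc32.astype(np.int64)           # widen so the product cannot overflow
    scaled = (widened * multiplier) >> shift   # fixed-point multiply-and-shift
    return np.clip(scaled, 0, 2**k - 1).astype(np.int32)

# Example: requantize a few 32-bit accumulators to 2 bits (scale ~ 21 / 2**14).
acc = np.array([-500, 0, 700, 1500, 4000], dtype=np.int32)
print(requantize_multiply_shift(acc, multiplier=21, shift=14))  # -> [0 0 0 1 3]
```

Note that the widened product is a 64-bit multiplication per output value, which is exactly the kind of operation the present method seeks to avoid.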
Furthermore, the common terminology in the prior art is as follows:
Convolutional Neural Network (CNN): a class of feedforward neural networks that involve convolution computations and have a deep structure.
Quantization: the process of approximating the continuous values of a signal (or a large number of possible discrete values) by a finite number of (or fewer) discrete values.
Low bit: data quantized to a width of 8, 4 or 2 bits.
Inference: computation performed with the stored data after neural network training is finished.
Disclosure of Invention
The application provides a method for reducing the inference operation complexity of a low-bit convolutional neural network, aiming to overcome the defects in the prior art and to solve the problems of high computational complexity and high computational resource requirements in the existing low-bit model inference process.
Specifically, the invention provides a method for reducing the inference operation complexity of a low-bit convolutional neural network, which comprises the following steps:
S1, after the neural network training is finished, the stored data is used for quantization.
Let the quantization of the i-th layer be:

$$x_{i+1} = Q_A\left(\delta_i\left(s_{BN}\left(Q_w(w_i)\cdot x_i\right) + b_i\right)\right)$$

wherein $\delta_i$ is the activation function, $Q_A$ is the quantization formula of the feature map, $Q_w$ is the quantization formula of the weight, $s_{BN}$ is the (per-channel) batch-normalization scale, and $b_i$ is the bias;
S2, when the parameters of the formula in S1 meet the following conditions:
1) $Q_w(w_i) = s_w\,w_{int}$, i.e. the quantized weight is represented by a fixed-point number scaled by a floating-point scalar $s_w$, where $w_{int}$ is a fixed-point number expressed as an integer;
2) $x_i = s_x\,x_{int}$, i.e. the quantized feature map is represented by a fixed-point number scaled by a floating-point scalar $s_x$, where $x_{int}$ is a fixed-point number expressed as an integer;
3) $\delta_i$ is a monotonic function;
then the quantized output is obtained by fixed-point arithmetic, namely:

$$x_{i+1} = Q_A\left(\delta_i\left(s_w s_x s_{BN}\left(w_{int}\cdot x_{int} + \frac{b_i}{s_w s_x s_{BN}}\right)\right)\right)$$
S3, determining the thresholds from the quantization of the feature map:
the quantization formula of the feature map is

$$Q_A(x) = \mathrm{clip}\left(\mathrm{round}(x),\ 0,\ 2^k - 1\right)$$

from which the thresholds $(0.5,\ 1.5,\ \ldots,\ 2^k - 0.5)$ are directly deduced, where $k$ is the quantized bit width;
since the distance between adjacent thresholds is 1.0, only the scaled first threshold needs to be saved at the final quantization, wherein $T_1 = \frac{0.5}{s_w s_x s_{BN}}$; the threshold series is then $T_j = (2j - 1)\,T_1$ for $j = 1, \ldots, 2^k - 1$, where $k$ is the quantized bit width;
S4, since the values of the quantized feature map are already determined when quantizing to low bit, and $Q_A$ is uniform quantization, $\delta_i\left(s_w s_x s_{BN}\left(w_{int}\cdot x_{int} + b_i/(s_w s_x s_{BN})\right)\right)$ from S2 obtains the final quantized result by comparison with the series of thresholds $(T_1, T_2, \ldots, T_n)$ determined in step S3.
When quantizing to the low bit width of 2 bits in step S2, the quantized feature map takes the values 0, 1, 2 and 3.
Since $\delta_i$ is a monotonic function in step S2 and $s_w s_x > 0$, the quantized result can equally be obtained by comparing $w_{int}\cdot x_{int} + \frac{b_i}{s_w s_x s_{BN}}$ directly with the thresholds $T_j$.
In step S4, since $s_{BN}$ differs from channel to channel, the thresholds must be saved separately for each channel, as illustrated in the sketch below.
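To make steps S2 through S4 concrete, the following minimal sketch precomputes per-channel thresholds and obtains the k-bit output purely by comparisons on the fixed-point accumulator. It assumes that $\delta_i$ acts as the identity over the quantized range and that $s_w s_x s_{BN} > 0$; all names and numeric values are illustrative assumptions, not details taken from the application:

```python
import numpy as np

def make_thresholds(s_w, s_x, s_bn, k=2):
    """Per-channel thresholds T_j = (j - 0.5) / (s_w * s_x * s_bn), j = 1 .. 2^k - 1.
    s_bn is a per-channel vector, so one threshold series is kept per channel;
    since adjacent thresholds differ by the constant 1/(s_w*s_x*s_bn), storing
    T_1 alone per channel would suffice."""
    j = np.arange(1, 2**k)                       # 1 .. 2^k - 1
    scale = s_w * s_x * s_bn                     # combined per-channel scale
    return (j[None, :] - 0.5) / scale[:, None]   # shape: (channels, 2^k - 1)

def quantize_by_threshold(acc_int, bias, thresholds):
    """The k-bit output is the count of thresholds that the fixed-point
    accumulator (w_int . x_int) plus the folded bias term exceeds."""
    v = acc_int + bias[:, None]                  # biased accumulator, per channel
    return (v[:, :, None] > thresholds[:, None, :]).sum(axis=-1)

# 2-bit example with 2 output channels and 4 accumulator values per channel.
s_bn = np.array([0.9, 1.1])
thr = make_thresholds(s_w=0.05, s_x=0.1, s_bn=s_bn, k=2)
acc = np.array([[20, 150, 400, 900], [20, 150, 400, 900]])
bias = np.array([0.0, 30.0])                     # stands for b_i/(s_w*s_x*s_bn)
print(quantize_by_threshold(acc, bias, thr))     # values in {0, 1, 2, 3}
```

With this arrangement the inner loop contains only integer additions and comparisons; the floating-point scales survive only inside the precomputed thresholds, which is where the reduction in operation complexity comes from.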
Thus, the present application has the following advantages:
1. the 32-bit values are quantized to low bit directly through threshold comparison, reducing the complexity of the operation;
2. the overall running time of the quantized model is reduced;
3. the demand for computational resources is reduced;
4. 64-bit by 64-bit operations are avoided.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate and together with the description serve to explain the application.
FIG. 1 is a schematic flow chart of the method of the present invention.
Detailed Description
In order that the technical content and advantages of the present invention may be more clearly understood, a further detailed description of the present invention will now be made with reference to the accompanying drawings.
As shown in fig. 1, the method for reducing the inference operation complexity of the low-bit convolutional neural network comprises the following steps:
S1, after the neural network training is finished, the stored data is used for quantization.
Let the quantization of the i-th layer be:

$$x_{i+1} = Q_A\left(\delta_i\left(s_{BN}\left(Q_w(w_i)\cdot x_i\right) + b_i\right)\right)$$

wherein $\delta_i$ is the activation function, $Q_A$ is the quantization formula of the feature map, $Q_w$ is the quantization formula of the weight, $s_{BN}$ is the (per-channel) batch-normalization scale, and $b_i$ is the bias;
S2, when the parameters of the formula in S1 meet the following conditions:
1) $Q_w(w_i) = s_w\,w_{int}$, i.e. the quantized weight is represented by a fixed-point number scaled by a floating-point scalar $s_w$, where $w_{int}$ is a fixed-point number expressed as an integer;
2) $x_i = s_x\,x_{int}$, i.e. the quantized feature map is represented by a fixed-point number scaled by a floating-point scalar $s_x$, where $x_{int}$ is a fixed-point number expressed as an integer;
3) $\delta_i$ is a monotonic function;
then the quantized output is obtained by fixed-point arithmetic, namely:

$$x_{i+1} = Q_A\left(\delta_i\left(s_w s_x s_{BN}\left(w_{int}\cdot x_{int} + \frac{b_i}{s_w s_x s_{BN}}\right)\right)\right)$$
S3, determining the thresholds from the quantization of the feature map:
the quantization formula of the feature map is

$$Q_A(x) = \mathrm{clip}\left(\mathrm{round}(x),\ 0,\ 2^k - 1\right)$$

from which the thresholds $(0.5,\ 1.5,\ \ldots,\ 2^k - 0.5)$ are directly deduced, where $k$ is the quantized bit width;
since the distance between adjacent thresholds is 1.0, only the scaled first threshold needs to be saved at the final quantization, wherein $T_1 = \frac{0.5}{s_w s_x s_{BN}}$; the threshold series is then $T_j = (2j - 1)\,T_1$ for $j = 1, \ldots, 2^k - 1$, where $k$ is the quantized bit width;
S4, since the values of the quantized feature map are already determined when quantizing to low bit, and $Q_A$ is uniform quantization, $\delta_i\left(s_w s_x s_{BN}\left(w_{int}\cdot x_{int} + b_i/(s_w s_x s_{BN})\right)\right)$ from S2 obtains the final quantized result by comparison with the series of thresholds $(T_1, T_2, \ldots, T_n)$ determined in step S3.
In particular, the method of the application can also be expressed as follows:
Suppose the quantization calculation of the i-th layer is as follows:

$$x_{i+1} = Q_A\left(\delta_i\left(s_{BN}\left(Q_w(w_i)\cdot x_i\right) + b_i\right)\right)$$

wherein $\delta_i$ is the activation function, $Q_A$ is the quantization formula of the feature map, and $Q_w$ is the quantization formula of the weight.
The parameters in the above formula meet the following conditions:
1. $Q_w(w_i) = s_w\,w_{int}$, i.e. the quantized weight can be represented by a fixed-point number scaled by a floating-point scalar, where $w_{int}$ is a fixed-point number expressed as an integer;
2. $x_i = s_x\,x_{int}$, i.e. the quantized feature map can be represented by a fixed-point number scaled by a floating-point scalar, where $x_{int}$ is a fixed-point number expressed as an integer;
3. $\delta_i$ is a monotonic function.
So the final result can be obtained by fixed-point computation:

$$x_{i+1} = Q_A\left(\delta_i\left(s_w s_x s_{BN}\left(w_{int}\cdot x_{int} + \frac{b_i}{s_w s_x s_{BN}}\right)\right)\right)$$
Since the values of the quantized feature map are in fact determined when quantizing to low bit (taking 2 bits as an example, the feature map takes the values 0, 1, 2, 3) and $Q_A$ is uniform quantization, $\delta_i\left(s_w s_x s_{BN}\left(w_{int}\cdot x_{int} + b_i/(s_w s_x s_{BN})\right)\right)$ can be compared with a series of thresholds $(T_1, T_2, \ldots, T_n)$ to obtain the quantized result; and since $\delta_i$ is a monotonic function and $s_w s_x > 0$, the quantized result can also be obtained by comparing $w_{int}\cdot x_{int} + \frac{b_i}{s_w s_x s_{BN}}$ directly with the thresholds.
The determination of the threshold needs to start with the quantization formula of the feature map.
The quantization formula of the feature map is:

$$Q_A(x) = \mathrm{clip}\left(\mathrm{round}(x),\ 0,\ 2^k - 1\right)$$

From the above formula the thresholds can be directly deduced to be $(0.5,\ 1.5,\ \ldots,\ 2^k - 0.5)$, where $k$ is the quantized bit width. Since the distance between adjacent thresholds is 1.0, at the final quantization we only need to save the scaled first threshold, wherein $T_1 = \frac{0.5}{s_w s_x s_{BN}}$; the threshold series is then $T_j = (2j - 1)\,T_1$, $j = 1, \ldots, 2^k - 1$, where $k$ is the quantized bit width. Since $s_{BN}$ differs from channel to channel, the thresholds must be saved separately for each channel.
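As a quick numerical consistency check of this derivation, the following sketch (assuming $\delta_i$ is the identity and an arbitrary combined scale $s$ standing in for $s_w s_x s_{BN}$) verifies for 2 bits that counting crossed thresholds reproduces the conventional round-and-clip quantizer:

```python
import numpy as np

# k = 2: feature map values 0..3. For integer accumulators v and combined
# scale s, round/clip of s*v must agree with counting how many thresholds
# (j - 0.5)/s the value v exceeds.
k, s = 2, 0.37
v = np.arange(-2, 14)                               # example accumulator values
float_path = np.clip(np.round(s * v), 0, 2**k - 1)  # conventional quantization
thresholds = (np.arange(1, 2**k) - 0.5) / s         # only 2^k - 1 values stored
int_path = (v[:, None] > thresholds[None, :]).sum(axis=1)
assert np.array_equal(float_path, int_path)         # identical results
print(int_path)
```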
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, and various modifications and variations can be made to the embodiments of the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (4)

1. A method for reducing the complexity of the reasoning operation of a low-bit convolutional neural network, which is suitable for image recognition and detection, comprising the following steps:
S1, after the neural network training is finished, the stored data is used for quantization.
Let the quantization of the i-th layer be:

$$x_{i+1} = Q_A\left(\delta_i\left(s_{BN}\left(Q_w(w_i)\cdot x_i\right) + b_i\right)\right)$$

wherein $\delta_i$ is the activation function, $Q_A$ is the quantization formula of the feature map, $Q_w$ is the quantization formula of the weight, $s_{BN}$ is the (per-channel) batch-normalization scale, and $b_i$ is the bias;
S2, when the parameters of the formula in S1 meet the following conditions:
1) $Q_w(w_i) = s_w\,w_{int}$, i.e. the quantized weight is represented by a fixed-point number scaled by a floating-point scalar $s_w$, where $w_{int}$ is a fixed-point number expressed as an integer;
2) $x_i = s_x\,x_{int}$, i.e. the quantized feature map is represented by a fixed-point number scaled by a floating-point scalar $s_x$, where $x_{int}$ is a fixed-point number expressed as an integer;
3) $\delta_i$ is a monotonic function;
then the quantized output is obtained by fixed-point arithmetic, namely:

$$x_{i+1} = Q_A\left(\delta_i\left(s_w s_x s_{BN}\left(w_{int}\cdot x_{int} + \frac{b_i}{s_w s_x s_{BN}}\right)\right)\right)$$
S3, determining the thresholds from the quantization of the feature map:
the quantization formula of the feature map is

$$Q_A(x) = \mathrm{clip}\left(\mathrm{round}(x),\ 0,\ 2^k - 1\right)$$

from which the thresholds $(0.5,\ 1.5,\ \ldots,\ 2^k - 0.5)$ are directly deduced, where $k$ is the quantized bit width;
since the distance between adjacent thresholds is 1.0, only the scaled first threshold needs to be saved at the final quantization, wherein $T_1 = \frac{0.5}{s_w s_x s_{BN}}$; the threshold series is then $T_j = (2j - 1)\,T_1$ for $j = 1, \ldots, 2^k - 1$, where $k$ is the quantized bit width;
S4, since the values of the quantized feature map are already determined when quantizing to low bit, and $Q_A$ is uniform quantization, $\delta_i\left(s_w s_x s_{BN}\left(w_{int}\cdot x_{int} + b_i/(s_w s_x s_{BN})\right)\right)$ from S2 obtains the final quantized result by comparison with the series of thresholds $(T_1, T_2, \ldots, T_n)$ determined in step S3.
2. The method for reducing the complexity of the reasoning operation of the low-bit convolutional neural network according to claim 1, wherein when quantizing to the low bit width of 2 bits in step S2, the quantized feature map takes the values 0, 1, 2 and 3.
3. The method for reducing the complexity of the low-bit convolutional neural network inference operation according to claim 1, wherein in step S2 $\delta_i$ is a monotonic function and $s_w s_x > 0$, so the quantized result can also be obtained by comparing $w_{int}\cdot x_{int} + \frac{b_i}{s_w s_x s_{BN}}$ directly with the thresholds $T_j$.
4. The method of claim 1, wherein in step S4, since $s_{BN}$ differs for each channel, the thresholds are saved separately for each channel.
CN202010497777.9A 2020-06-04 2020-06-04 Method for reducing low-bit convolutional neural network reasoning operation complexity Active CN113762496B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010497777.9A CN113762496B (en) 2020-06-04 2020-06-04 Method for reducing low-bit convolutional neural network reasoning operation complexity


Publications (2)

Publication Number Publication Date
CN113762496A CN113762496A (en) 2021-12-07
CN113762496B (en) 2024-05-03

Family

ID=78783418

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010497777.9A Active CN113762496B (en) 2020-06-04 2020-06-04 Method for reducing low-bit convolutional neural network reasoning operation complexity

Country Status (1)

Country Link
CN (1) CN113762496B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11270187B2 (en) * 2017-11-07 2022-03-08 Samsung Electronics Co., Ltd Method and apparatus for learning low-precision neural network that combines weight quantization and activation quantization
US11755880B2 (en) * 2018-03-09 2023-09-12 Canon Kabushiki Kaisha Method and apparatus for optimizing and applying multilayer neural network model, and storage medium
US11645493B2 (en) * 2018-05-04 2023-05-09 Microsoft Technology Licensing, Llc Flow for quantized neural networks

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107944458A (en) * 2017-12-08 2018-04-20 北京维大成科技有限公司 A kind of image-recognizing method and device based on convolutional neural networks
JP2019160319A (en) * 2018-03-09 2019-09-19 キヤノン株式会社 Method and device for optimizing and applying multi-layer neural network model, and storage medium
CN111105007A (en) * 2018-10-26 2020-05-05 中国科学院半导体研究所 Compression acceleration method of deep convolutional neural network for target detection
GB201821150D0 * 2018-12-21 2019-02-06 Imagination Tech Ltd Methods and systems for selecting quantisation parameters for deep neural networks using back-propagation
CN109389212A (en) * 2018-12-30 2019-02-26 南京大学 A kind of restructural activation quantization pond system towards low-bit width convolutional neural networks
US10592799B1 (en) * 2019-01-23 2020-03-17 StradVision, Inc. Determining FL value by using weighted quantization loss values to thereby quantize CNN parameters and feature values to be used for optimizing hardware applicable to mobile devices or compact networks with high precision
CN110188877A (en) * 2019-05-30 2019-08-30 苏州浪潮智能科技有限公司 A kind of neural network compression method and device
CN110363281A (en) * 2019-06-06 2019-10-22 上海交通大学 A kind of convolutional neural networks quantization method, device, computer and storage medium

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Krishnamoorthi R. Quantizing deep convolutional networks for efficient inference: A whitepaper. arXiv preprint arXiv:1806.08342, 2018. *
Zhuang B. et al. Towards effective low-bitwidth convolutional neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018. *
Fu Qiang et al. Research on low-bit-width quantized inference for convolutional neural networks. Computer & Digital Engineering, 2019. *
Mou Shuai. Research on acceleration and compression of deep neural networks based on bit quantization. China Master's Theses Full-text Database, Information Science & Technology, 2018-06-15. *
Cai Ruichu et al. Quantization and compression methods of convolutional neural networks for "edge" applications. Journal of Computer Applications, No. 09, 2018-04-23. *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant