CN113743601A - Method for retraining compensation activation function of low-bit quantization network - Google Patents

Method for retraining compensation activation function of low-bit quantization network

Info

Publication number
CN113743601A
CN113743601A
Authority
CN
China
Prior art keywords
activation function
size
quantization
relu
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010460267.4A
Other languages
Chinese (zh)
Inventor
周飞飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Ingenic Technology Co ltd
Original Assignee
Hefei Ingenic Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Ingenic Technology Co ltd filed Critical Hefei Ingenic Technology Co ltd
Priority to CN202010460267.4A
Publication of CN113743601A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/15 Correlation function computation including computation of convolution operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention provides a method for retraining a compensation activation function of a low-bit quantization network. In the full-precision quantization model, the output of the original activation function is divided by the activation-function compression size, the original model formula is computed, and the result is then multiplied back by the corresponding activation-function compression size; that is, the full-precision data distribution is kept and only the dispersion of the feature data is modified.

Description

Method for retraining compensation activation function of low-bit quantization network
Technical Field
The invention relates to the technical field of neural networks, and in particular to a method for retraining a compensation activation function of a low-bit quantization network.
Background
With the rapid development of computer technology, algorithms based on convolutional neural networks have been applied successfully in many recognition fields. In the prior art, as the number of stacked layers in convolutional neural network models increases, the features must be quantized to low bits so that the models can run normally on mobile terminals with a reduced amount of computation. However, adding a series of extra quantization operation nodes to the activation function causes precision loss and long model convergence times during low-bit quantization retraining. In the prior art, the corresponding quantization operation nodes are added to the activation function on top of the full-precision model, and training is started from scratch based on the full-precision parameter information.
In the prior art, when a 32-bit model is quantized to low bits, fine tuning must be performed on the basis of the original full-precision model in order to preserve the accuracy after quantization. Because the corresponding quantization operations are added to the activation function, the parameter distribution of the low-bit model differs considerably from that of the full-precision model. The low-bit model therefore has to be retrained on the basis of the full-precision model, yet the full-precision level is often difficult to reach, and the model may even fail to converge.
Furthermore, the common terminology in the prior art is as follows:
Activation function: each neuron node in a neural network receives the output values of the neurons in the previous layer as its input and passes its output on to the next layer; the neuron nodes of the input layer pass the input attribute values directly to the next layer (hidden layer or output layer). In a multi-layer neural network, there is a functional relationship between the output of an upper-layer node and the input of a lower-layer node, and this function is called the activation function (also called the excitation function). As shown in FIG. 2, Relu(x) = max(0, x); the figure shows the Relu function and its derivative.
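For reference, the Relu function and its derivative described above can be written as the following minimal sketch (Python with NumPy is used here purely for illustration and is not part of the patent):

import numpy as np

def relu(x):
    # Relu(x) = max(0, x): positive inputs pass through, negative inputs become 0
    return np.maximum(x, 0.0)

def relu_derivative(x):
    # Derivative of Relu: 1 where x > 0, 0 elsewhere (the value at x = 0 is taken as 0)
    return (x > 0).astype(np.float64)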
Disclosure of Invention
The invention aims to compensate for the difficult or even failed model convergence caused by adding the corresponding quantization operation nodes to the activation function, to overcome the defects of the prior art, and to solve the problems of network non-convergence and long training time during low-bit quantization fine tuning and retraining.
The method belongs to the class of techniques that perform quantization retraining on the basis of a full-precision deep neural network. It effectively improves the convergence of the model during network retraining and removes the non-convergence that the quantization nodes added to the activation function would otherwise cause, which prevents the model from reaching the convergence of the full-precision model. With this method the model reaches the full-precision level faster, and the risk that the model fails to converge because of the quantization nodes added to the activation function is effectively reduced.
Specifically, the invention provides a method for retraining a compensation activation function of a low-bit quantization network, characterized in that, in the full-precision quantization model, the output of the original activation function is divided by the activation-function compression size, the original model formula is computed, and the result is then multiplied back by the corresponding activation-function compression size; that is, the full-precision data distribution is kept and only the dispersion of the feature data is modified.
The method further comprises the steps of:
S1, assume that the feature of the i-th layer is X_i, and quantize the feature with an activation function;
S2, add the corresponding quantization nodes on top of the full-precision model so that the feature is quantized to low bits, specifically by the following formulas:
X_i = clip(X_i, relu_size)
X_i = Round{(X_i / relu_size) * 2^bit} / 2^bit
where relu_size is the activation-function compression size, the feature is compressed to [0, relu_size], and the feature compression range is defined by the compression size;
S3, multiply the result of the formulas in S2 by the corresponding relu_size, which is equivalent to keeping the full-precision data distribution and modifying only the dispersion of the feature data (see the sketch below).
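A minimal sketch of steps S1 to S3, assuming the layer feature X_i is a NumPy array; the function and argument names are illustrative and not taken from the patent:

import numpy as np

def quantize_feature(x_i, relu_size, bit):
    # S2: X_i = clip(X_i, relu_size), i.e. compress the feature to [0, relu_size]
    x_i = np.clip(x_i, 0.0, relu_size)
    # S2: X_i = Round{(X_i / relu_size) * 2^bit} / 2^bit
    levels = 2 ** bit
    x_i = np.round((x_i / relu_size) * levels) / levels
    # S3: multiply back by the corresponding relu_size so the full-precision data
    # distribution is kept and only the dispersion (quantization step) changes
    return x_i * relu_size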
In step S1, quantizing the feature with an activation function includes using the relu6 activation function for 8-bit feature quantization and the relu3 activation function for 4-bit feature quantization.
The relu6 activation function is:
relu6(x)=min(max(x,0),6)∈[0,6];
relu6(x) = 0 for x ≤ 0; relu6(x) = x for 0 < x < 6; relu6(x) = 6 for x ≥ 6.
Thus, the present application has the following advantages:
1. the problem of model non-convergence caused by adding quantization nodes to the activation function during low-bit model quantization is solved;
2. the quantization retraining time of the model from full precision to low bit is reduced.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention.
FIG. 1 is a schematic flow diagram of the process of the present invention.
FIG. 2 is an image of the Relu function and its derivative in the method of the present invention.
FIG. 3 is an image of the Relu6 activation function and its derivative in the method of the present invention.
Detailed Description
In order that the technical contents and advantages of the present invention can be more clearly understood, the present invention will now be described in further detail with reference to the accompanying drawings.
The invention relates to a method for retraining a compensation activation function of a low-bit quantization network: in the full-precision quantization model, the output of the original activation function is divided by the activation-function compression size, the original model formula is computed, and the result is then multiplied back by the corresponding activation-function compression size; that is, the full-precision data distribution is kept and only the dispersion of the feature data is modified.
As shown in fig. 1, the method further comprises the steps of:
S1, assume that the feature of the i-th layer is X_i, and quantize the feature with an activation function;
S2, add the corresponding quantization nodes on top of the full-precision model so that the feature is quantized to low bits, specifically by the following formulas:
X_i = clip(X_i, relu_size)
X_i = Round{(X_i / relu_size) * 2^bit} / 2^bit
where relu_size is the activation-function compression size, the feature is compressed to [0, relu_size], and the feature compression range is defined by the compression size;
S3, multiply the result of the formulas in S2 by the corresponding relu_size, which is equivalent to keeping the full-precision data distribution and modifying only the dispersion of the feature data.
The activation function is: f(x) = max(0, x).
In step S1, quantizing the feature with an activation function includes using the relu6 activation function for 8-bit feature quantization, and the relu3 activation function relu3(x) = min(max(x, 0), 3) for 4-bit feature quantization.
As shown in FIG. 3, Relu is linearly activated (equal to x) in the region x > 0, which may make the activated values too large and affect the stability of the model. To counteract the linear-growth part of the Relu excitation function, the Relu6 function may be used. The relu6 activation function is:
relu6(x)=min(max(x,0),6)∈[0,6];
relu6(x) = 0 for x ≤ 0; relu6(x) = x for 0 < x < 6; relu6(x) = 6 for x ≥ 6.
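For illustration, relu6 and relu3 can be written as the following sketch (NumPy assumed; equivalent to the formulas above):

import numpy as np

def relu6(x):
    # relu6(x) = min(max(x, 0), 6): Relu with its linear growth capped at 6,
    # used for 8-bit feature quantization
    return np.minimum(np.maximum(x, 0.0), 6.0)

def relu3(x):
    # relu3(x) = min(max(x, 0), 3): the analogous cap used for 4-bit feature quantization
    return np.minimum(np.maximum(x, 0.0), 3.0)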
in step S2, a feature compression range is defined according to the compressed data size, including when relu _ size is equal to 6, the feature is compressed to [0, 6 ]. Under normal conditions, the relu _ size of 8 bits is selected to be 6, and the relu _ size of 4 bits is selected to be 3, which are only recommended values under normal conditions, and specific numerical values can be finely adjusted according to a training model.
In fact, for ease of understanding, the technical solution of the present invention can also be explained as follows:
suppose feature of i-th layer is XiIn order to quantize the feature parameters to low bits and keep the fixed point number fully utilized, a relu6 activation function is generally adopted for 8-bit quantization features, and a relu3 activation function is adopted for 4-bit quantization features.
Xi=clip(Xi,relu_size)
Xi=Round{(Xi/relu_size)*2bit}/2bit
The above formula is a process of adding corresponding quantization nodes on the basis of full precision to realize feature quantization to low bit;
the relu _ size is the activation function compressed data size, such as relu _ size equal to 6, feature compressed to [0, 6], defined according to the compressed data size.
With these quantization operation nodes, the original activation function in the full-precision quantization model is divided by relu_size, so the distribution of the full-precision model is distorted; retraining is needed to ensure that the low-bit quantization model keeps high precision and low loss, and during quantization a large learning rate has to be used for the quantized training, so that the convergence of the model is difficult to guarantee.
The improvement is as follows: the result of the formulas above is multiplied by the corresponding relu_size, which is equivalent to keeping the full-precision data distribution and modifying only the dispersion of the feature data, so the low-bit model can keep low loss and high precision by directly fine tuning with a low learning rate.
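To make the improvement concrete, the following sketch (NumPy; the names are hypothetical and not part of the patent text) contrasts the uncompensated quantization node, whose output lies in [0, 1] and therefore shifts the feature distribution away from the full-precision model, with the compensated node, which multiplies back by relu_size so that only the quantization step of the features changes:

import numpy as np

def quantize_uncompensated(x, relu_size, bit):
    # Original node: divide by relu_size and round; the output lies in [0, 1],
    # so the feature distribution no longer matches the full-precision model
    x = np.clip(x, 0.0, relu_size)
    return np.round((x / relu_size) * 2 ** bit) / 2 ** bit

def quantize_compensated(x, relu_size, bit):
    # Improved node: multiply back by relu_size; the output stays in [0, relu_size],
    # so only the dispersion (grid spacing relu_size / 2^bit) is modified
    return quantize_uncompensated(x, relu_size, bit) * relu_size

# Example with the recommended 8-bit setting relu_size = 6
features = np.random.uniform(0.0, 8.0, size=(4, 4))
quantized = quantize_compensated(features, relu_size=6.0, bit=8)
# 'quantized' lies on the grid {0, 6/256, 12/256, ..., 6} and stays close to
# clip(features, 0, 6), which is why a low learning rate suffices for fine tuning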
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes may be made to the embodiment of the present invention by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (6)

1. A method for retraining a compensation activation function of a low-bit quantization network, characterized in that, in the full-precision quantization model, the output of the original activation function is divided by the activation-function compression size, the original model formula is computed, and the result is then multiplied back by the corresponding activation-function compression size, i.e. the full-precision data distribution is kept and only the dispersion of the feature data is modified.
2. The method of claim 1, wherein the method further comprises the steps of:
S1, assuming that the feature of the i-th layer is X_i, quantizing the feature with an activation function;
S2, adding the corresponding quantization nodes on top of the full-precision model so that the feature is quantized to low bits, specifically by the following formulas:
X_i = clip(X_i, relu_size)
X_i = Round{(X_i / relu_size) * 2^bit} / 2^bit
wherein relu_size is the activation-function compression size, the feature is compressed to [0, relu_size], and the feature compression range is defined by the compression size;
S3, multiplying the result of the formulas in S2 by the corresponding relu_size, which is equivalent to keeping the full-precision data distribution and modifying only the dispersion of the feature data.
3. The method of claim 2, wherein the activation function is selected from the group consisting of: f(x) = max(0, x).
4. The method for retraining a compensation activation function of a low-bit quantization network as claimed in claim 3, wherein quantizing the feature with an activation function in step S1 comprises using the relu6 activation function for 8-bit feature quantization and the relu3 activation function relu3(x) = min(max(x, 0), 3) for 4-bit feature quantization.
5. The method of claim 4, wherein the relu6 activation function is:
relu6(x)=min(max(x,0),6)∈[0,6];
relu6(x) = 0 for x ≤ 0; relu6(x) = x for 0 < x < 6; relu6(x) = 6 for x ≥ 6.
6. The method for retraining a compensation activation function of a low-bit quantization network as claimed in claim 2, wherein step S2 defines the feature compression range according to the compression size, including compressing the feature to [0, 6] when relu_size is equal to 6.
CN202010460267.4A 2020-05-27 2020-05-27 Method for retraining compensation activation function of low-bit quantization network Pending CN113743601A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010460267.4A CN113743601A (en) 2020-05-27 2020-05-27 Method for retraining compensation activation function of low-bit quantization network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010460267.4A CN113743601A (en) 2020-05-27 2020-05-27 Method for retraining compensation activation function of low-bit quantization network

Publications (1)

Publication Number Publication Date
CN113743601A (en) 2021-12-03

Family

ID=78723670

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010460267.4A Pending CN113743601A (en) 2020-05-27 2020-05-27 Method for retraining compensation activation function of low-bit quantization network

Country Status (1)

Country Link
CN (1) CN113743601A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116721399A (en) * 2023-07-26 2023-09-08 之江实验室 Point cloud target detection method and device for quantitative perception training
CN116721399B (en) * 2023-07-26 2023-11-14 之江实验室 Point cloud target detection method and device for quantitative perception training


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination