CN110135580B - Convolution network full integer quantization method and application method thereof - Google Patents


Info

Publication number
CN110135580B
Authority
CN
China
Prior art keywords
network
output
weight
integer
quantization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910344069.9A
Other languages
Chinese (zh)
Other versions
CN110135580A (en
Inventor
钟胜
周锡雄
王建辉
商雄
蔡智
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN201910344069.9A priority Critical patent/CN110135580B/en
Publication of CN110135580A publication Critical patent/CN110135580A/en
Application granted granted Critical
Publication of CN110135580B publication Critical patent/CN110135580B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a full integer quantization method for convolutional networks, belonging to the technical field of convolutional network quantization and compression. The input feature map, network weights, and output feature map of the convolutional network are all expressed as integers, so that the forward inference of each network layer involves only integer computation. To preserve performance after integer quantization, the network is retrained, and the result of full integer inference is simulated during training. The invention also discloses an application method of the full integer quantized convolutional network. Compared with a convolutional network expressed in single-precision floating point, the scheme occupies fewer resources and infers faster; compared with a fixed-point quantized network, it expresses the input, output, and weights of the network with fixed-length integers, does not need to account for the growing bit width of the layer-by-layer output results, has stronger regularity, and is therefore better suited to resource-limited platforms such as FPGAs and ASICs.

Description

Convolution network full integer quantization method and application method thereof
Technical Field
The invention belongs to the technical field of quantization compression of a convolutional network, and particularly relates to a convolutional network full integer quantization method and an application method thereof.
Background
Since AlexNet was published in 2012, deep learning methods represented by convolutional neural networks have achieved year-by-year breakthroughs in target discrimination and recognition, and the accuracy of existing complex networks can exceed 95%; however, such networks were not designed with deployment on resource-limited embedded platforms in mind. Resource-constrained applications, such as AR/VR, smartphones, and FPGA/ASIC platforms, require quantization and compression of the models to reduce their size and computing-resource demands so that they fit these embedded platforms.
Facing the model quantization and compression problem, there are two main approaches. The first is to design a more efficient, lightweight network structure that accommodates constrained computing resources, such as MobileNet or ShuffleNet. The second is to apply low-bit quantization to the intermediate results of an existing network, including weights, inputs, and outputs, reducing the network's computing-resource requirements and computing latency while leaving the network structure unchanged and preserving its accuracy.
For the second approach, existing low-bit quantization methods include TWN, BNN, and XNOR-Net. These methods reduce the weights and input/output quantities of the network to 1 or 3 bits, so that the multiply-accumulate operations of the convolution process can be replaced by XNOR and shift operations, reducing the use of computing resources. However, this approach has a significant drawback: the loss of accuracy is large. As for other quantization methods, they do not consider actual deployment in hardware; they quantize only the network weights, focusing on meeting the demand for storage resources while ignoring the demand for computing resources.
Disclosure of Invention
In view of the above drawbacks or needs for improvement in the prior art, the present invention provides a convolution network full integer quantization method and an application method thereof, which aim to express the input, output, and weights of a network with fixed-length integers; the quantization method keeps the accuracy loss of the network to about 5% while reducing the consumption of computing and storage resources.
In order to achieve the above object, the present invention provides a convolution network full integer quantization method, which comprises the following steps:
(1) obtaining a model, a floating point type weight and a training data set of a convolutional network, and initializing the network;
(2) for each convolution layer, firstly, calculating the distribution range of input IN, output OUT and weight WT of each layer through a floating point type reasoning process, and respectively calculating the maximum absolute extreme value of the input IN, the output OUT and the weight WT;
(3) updating the maximum absolute extreme values of the three in the training process of the current layer;
(4) performing integer quantization on the input and the weights of the current layer in the convolutional network according to the maximum absolute extreme values of the input IN, the output OUT, and the weight WT;
(5) according to the input and the weight of the integer quantization, the output of the integer quantization of the current layer is solved;
(6) carrying out inverse quantization on the output of the integer quantization of the current layer, reducing the output into a floating point type, and outputting to the next layer; if the next layer is the batch norm layer, merging the parameters of the batch norm layer into the current layer by adopting a merging means; repeating the steps (3) to (6) until the last layer in the convolutional network;
(7) back propagation, continuously updating the weight until the network converges, and storing the quantized weight and the additional parameters; the parameters after integer quantization are used in the forward derivation process of full integer, and integer is used to replace the original floating point operation.
Further, in step (3), updating the maximum absolute extreme values of the three during training specifically means updating them with an exponential moving average algorithm:
x_n = α·x_(n−1) + (1 − α)·x
wherein x_n is the maximum absolute extreme value of the input, output, or weight after this update, x_(n−1) is the maximum absolute extreme value from the previous update, x is the input, output, or weight extreme value obtained in the current calculation, and α is a weighting coefficient.
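A minimal sketch of this update, assuming α = 0.99 as in the embodiment described later (function and variable names are illustrative, not from the patent):

```python
# Exponential-moving-average update of a tracked extreme value:
#   x_n = alpha * x_{n-1} + (1 - alpha) * x
def ema_update(x_prev, x_new, alpha=0.99):
    return alpha * x_prev + (1 - alpha) * x_new

# Track the running maximum absolute value of a layer quantity over batches.
def update_abs_max(running_max, batch_values, alpha=0.99):
    batch_abs_max = max(abs(v) for v in batch_values)
    return ema_update(running_max, batch_abs_max, alpha)
```

Because α is close to 1, the tracked extreme value reflects the statistics of many batches rather than a single input, which is the property the patent relies on for generalization.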
Further, the step (4) is specifically as follows:
input integer quantization:
Q_IN = clamp(IN/S1)
wherein Q_IN represents the integer-quantized input; S1 = max{|IN|}/Γ, Γ = 2^N; N represents the number of quantization bits; clamp() truncates the part after the decimal point; max{|IN|} represents the maximum absolute extreme value of the input;
integer quantization of the weights:
Q_WT = clamp(WT/S2)
wherein Q_WT represents the integer-quantized weights; S2 = max{|WT|}/Γ, Γ = 2^N; max{|WT|} represents the maximum absolute extreme value of the weights.
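The two quantization formulas above can be sketched as follows, assuming N = 8 as in the embodiment and direct truncation for clamp(); names and list-based data handling are illustrative, and a separate clip to the signed int8 range is omitted for brevity:

```python
# Integer quantization with scale S = max{|x|} / 2^N and truncating clamp().
def quantize(values, abs_max, n_bits=8):
    scale = abs_max / (2 ** n_bits)          # S = max{|x|} / 2^N
    return [int(v / scale) for v in values]  # int() truncates the decimal part

# Inverse quantization restores an approximate floating point value.
def dequantize(q_values, abs_max, n_bits=8):
    scale = abs_max / (2 ** n_bits)
    return [q * scale for q in q_values]
```

For example, with abs_max = 1.0 and N = 8, the value 0.5 maps to 128 and 0.99 maps to 253 (truncated, not rounded).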
Further, the step (5) is specifically:
the output of the integer quantization, Q _ OUT, is:
Q_OUT=Q_IN×Q_WT×M
M=S1×S2/S3
wherein Q_IN represents the integer-quantized input and Q_WT the integer-quantized weights; since M = S1 × S2/S3 is floating point, let
M ≈ C / 2^S
The derivation of the parameters C and S is as follows:
first solve M = S1 × S2/S3:
wherein S1 = max{|IN|}/Γ, Γ = 2^N, and max{|IN|} represents the maximum absolute extreme value of the input; S2 = max{|WT|}/Γ, and max{|WT|} represents the maximum absolute extreme value of the weights; S3 = max{|OUT|}/Γ, and max{|OUT|} represents the maximum absolute extreme value of the output; N represents the number of quantization bits;
repeatedly multiply or divide M by 2 until 0 < M < 0.5; with a initialized to 0, each time M is multiplied by 2, a = a + 1, and each time M is divided by 2, a = a − 1; this count gives the final value of a;
then preset a value v, with 0 < v ≤ 32, and solve S and C according to the following formulas:
S = v + a
C = round(M × 2^v)
0 < C ≤ 2^v
where round() denotes rounding to the nearest integer.
Further, the integer-quantized output Q_OUT is:
Q_OUT = Q_IN × Q_WT × M
Before the output is integer-quantized, the nonlinear activation of Q_IN × Q_WT is carried out; the nonlinear activation adopts a shift approximation operation.
Further, the nonlinear activation of Q_IN × Q_WT is specifically:
nonlinear activation is performed on y = Q_IN × Q_WT by using a leaky activation function, whose specific form is as follows:
f(y) = y, for y ≥ 0; f(y) = 0.1·y, for y < 0
To ensure that Q_IN × Q_WT remains an integer after nonlinear activation, the above formula is approximated by shifts, as follows:
f(y) = y, for y ≥ 0; f(y) = (y + y<<1) >> 5, for y < 0
wherein y<<1 indicates that the binary value y is shifted left by one bit, and (y + y<<1) >> 5 indicates that the binary value (y + y<<1) is shifted right by 5 bits; the final nonlinearly activated Q_IN × Q_WT remains an integer.
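A minimal sketch of the shift-approximated activation on a single integer; Python's arithmetic right shift on negative integers floors toward negative infinity, which is assumed here to match the intended hardware behavior:

```python
# Shift approximation of the leaky activation on an integer value:
# for negative y, 0.1*y is replaced by (y + (y << 1)) >> 5 = 3y/32 ~= 0.094*y,
# so the result stays an integer.
def leaky_shift(y):
    if y >= 0:
        return y
    return (y + (y << 1)) >> 5  # 3*y / 32, floored
```

For instance, leaky_shift(-100) yields -10, close to 0.1 × (-100), while leaky_shift(64) passes positive values through unchanged.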
Further, if the next layer in the step (6) is a batch norm layer, merging the parameters of the batch norm layer into the current layer by adopting a merging means specifically comprises:
the calculation process of the batch norm layer is:
y = γ·(x − μ)/√(σ² + ε) + β
wherein x represents the input, y represents the output, ε represents a small value added to the denominator, μ represents the output mean, σ represents the output standard deviation, γ is a parameter generated in the calculation process of the batch norm layer, and β represents the bias;
since the batch norm follows the convolution process, the convolution process is expressed as:
x = Σ w × fmap(i, j)
wherein fmap(i, j) is the image feature at input-image position (i, j), w is the weight, and x is the convolution output fed into the batch norm layer;
therefore, merging the batch norm layer parameters into the convolution process gives:
the merged weight:
w_fold = γ·w/√(σ² + ε)
the merged bias:
β_fold = β − γ·μ/√(σ² + ε)
the convolution process after merging: y = Σ w_fold × fmap(i, j) + β_fold.
According to another aspect of the present invention, there is provided an application method of a full integer quantization convolution network, the application method comprising the steps of:
s1, obtaining a model, a floating point type weight and a training data set of the convolutional network, and initializing the network;
s2, for each convolution layer, firstly, the distribution range of the input IN, the output OUT and the weight WT of each layer is obtained through the reasoning process of a floating point form, and the maximum absolute extreme values of the input IN, the output OUT and the weight WT are respectively obtained;
s3, updating the maximum absolute extreme values of the three in the training process of the current layer;
s4, performing integer quantization on the input and the weights of the current layer in the convolutional network according to the maximum absolute extreme values of the input IN, the output OUT, and the weight WT;
s5, obtaining the output of the current layer integer quantization according to the input and weight of the integer quantization;
s6, carrying out inverse quantization on the output of the integer quantization of the current layer, reducing the output into a floating point type and outputting to the next layer; if the next layer is the batch norm layer, merging the parameters of the batch norm layer into the current layer by adopting a merging means; repeatedly and sequentially executing the steps S3 to S6 until the last layer in the convolutional network;
s7, used for back propagation, continuously updating the weight until the network convergence, saving the quantized weight, and additional parameters; the parameters after integer quantization are used in the forward derivation process of full integer, and integer is used for replacing the original floating point operation;
s8, inputting the image of the target to be detected into a full integer quantization convolution network, and dividing the image of the target to be detected into S × S grids;
s9, setting n anchor boxes with fixed aspect ratios, predicting n anchor boxes for each grid, each anchor box independently predicting the coordinates (x, y, w, h) of the target, the confidence p, and the probabilities of m categories; wherein x, y represent the target coordinates, and w, h represent the width and height of the target;
s10, according to the probability corresponding to each category calculated in the previous step, firstly, carrying out preliminary screening through a fixed threshold, filtering out candidate frames with the confidence coefficient lower than the threshold in the corresponding category, and then removing overlapped target frames through a non-maximum inhibition method;
and S11, selecting the targets with the corresponding probability exceeding the threshold in different categories for the reserved target frames to be displayed visually, and outputting the target detection result.
Generally, compared with the prior art, the above technical solution conceived by the present invention has the following beneficial effects:
(1) the invention adopts a full integer quantization method in which the input, output, and weights of the network are expressed by fixed-length integers; the quantization method can control the accuracy loss of the network to about 5 percent, and since the forward propagation process comprises only fixed-length integer multiplications, the demand on computing resources is friendlier;
(2) the absolute value extreme values of input and output of the network are calculated by adopting an exponential moving average algorithm, then quantization operation is carried out through the extreme values, the exponential moving average algorithm counts the distribution characteristics of a batch of data, so that the quantization result can meet the numerical characteristics of the batch of data and is not limited to specific input, and the quantization method is a necessary guarantee for generalization in practical application;
(3) merging measures are taken for the batch norm layer, parameters of the batch norm layer are directly merged to the convolutional layer, the process of quantifying the batch norm layer is directly omitted, and meanwhile, the process does not need to consider calculation of the batch norm layer when the network carries out forward reasoning;
(4) the shift activation process is moved before the quantization of the network output result: the shift activation operation is performed first on the intermediate output result, and the quantization of the network output is carried out afterwards. The method is based on the following reasoning: if the output were quantized to 8 bits before executing the shift activation, the shift would operate on an 8-bit signed number, whose precision is correspondingly coarse; before the output is quantized, it is expressed with a 32-bit value, and performing the shift activation on this value gives much finer precision. Thus, by changing the order of the operations, the error caused by the shift approximation of the activation layer is reduced.
Drawings
FIG. 1 is a training flow diagram of a full integer quantization method of the present invention;
FIG. 2 is a diagram illustrating an example of the structure of a convolutional neural network in an embodiment of the present invention;
FIG. 3 is a diagram illustrating a batch norm integration method according to the present invention;
FIG. 4 is an exemplary diagram of the cancellation of quantization and dequantization between adjacent layers of a network in the present invention;
FIG. 5 is a schematic diagram of the full integer forward derivation process of the present invention;
FIG. 6 is a graph of the target detection results before quantization;
FIG. 7 is a graph of the target detection results after quantization.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
As shown in fig. 1, the method of the present invention comprises the steps of:
(1) obtaining a model, a floating point type weight and a training data set of a convolutional network, and initializing the network;
specifically, the supporting embodiment of the invention adopts a network structure of YOLOV 2-tiny. Referring to FIG. 2, there are 6 max pool layers, 9 convolutional layers followed by a batch norm. The training framework employs a darknet, which is written in c language and opens open sources. The Yolo web author provides floating point type weights on the personal home page for download. The training data was trained using VOC2012 and VOC2007 data sets, which contain 20 classes of targets, and a total of 9963+ 11540-21503 labeled data. The width of an input image for initializing the network is 416 pixels, the height of the input image is 416 pixels, the number of channels of the image is 3, the number of pictures subjected to iterative training each time is 64, the momentum is 0.9, the learning rate is 0.001, the maximum iteration number is 60200, the network output is the position, the size and the confidence coefficient of a target in the image, and due to the fact that cross redundancy exists in detection results, the detection results need to be fused by a non-maximum suppression method, and therefore the output result of each detected target corresponds uniquely.
(2) For each convolution layer, firstly, the distribution range of input, output and weight of each convolution layer is obtained through a floating point type reasoning process, the maximum absolute extreme value | max | of the input, output and weight of each convolution layer is respectively obtained, and the extreme value is updated by using an exponential moving average algorithm (EMA) in a training process;
specifically, each layer of network weight includes parameters w and β, and input and output need to be quantized, which requires statistics of the maximum absolute values of 4 groups of w, β, IN, and OUT. In order to enable statistical absolute maxima, reflecting the statistical characteristics of the data set, rather than maxima under a particular input image, these extrema need to be updated using the EMA. The specific formula is as follows: x is the number ofn=αxn-1+(1-α)x。
xnValue, x, reserved for the current endn-1The value reserved for the last iteration process, and x is the result of this calculation. Alpha is a weight coefficient, generally selected between 0.9 and 1, and in the embodiment of the invention, alpha is 0.99.
(3) Quantize the input and the weights of the network according to the obtained maximum absolute values using the following quantization formulas, so that they can be expressed in int8:
quantized input: Q_IN = clamp(IN/S1)
quantized weight: Q_WT = clamp(WT/S2)
quantization coefficients: S1 = |MAX|/Γ, |MAX| = max{|IN|}, Γ = 2^N
S2 = |MAX|/Γ, |MAX| = max{|WT|}, Γ = 2^N
wherein N represents the number of quantization bits; IN is the input, WT is the weight, max{|IN|} is the maximum absolute extreme value of the input, and max{|WT|} is the maximum absolute extreme value of the weight;
Specifically, experience shows that the absolute values of each layer's input and weights lie in the range 0 to 1. A linear transformation using the statistical maximum absolute value normalizes the weights and the input to [−127, 127] by the above formulas. When a value is rounded to an integer, direct truncation is used rather than round-to-nearest; in the formulas, clamp() represents the truncation operation: int = clamp(float). In the embodiment of the invention, N = 8.
(4) The quantized output of the current layer can be obtained from the quantized input and weights. To ensure that the network output is also an integer value, quantization is performed using the following formulas:
floating point output: OUT = IN × WT = Q_IN × Q_WT × S1 × S2
quantized output: Q_OUT = OUT/S3 = Q_IN × Q_WT × (S1 × S2/S3)
where S3 is the output quantization coefficient. Since M = S1 × S2/S3 is a floating point number, to ensure the network inference process is pure integer computation, it can be approximated by a multiplication and a shift, and the coefficients C and S generated by the approximation are stored as parameters, as follows:
approximate calculation:
M ≈ C / 2^S
Q_OUT = (Q_IN × Q_WT × C) >> S
Specifically, since M = S1 × S2/S3 is a floating point number, and it must be ensured that the quantized output can be represented as an integer without floating point operations in the calculation, M is computed approximately: let M = M_Δ × 2^(−a). To keep the bit width of the integer multiplication as small as possible while keeping the approximate calculation accurate, the numerical range of C must be selected; in the embodiment of the invention, 0 < C ≤ 2^v with v = 24.
The calculation that solves for C and S repeatedly multiplies or divides M by 2 until 0 < M_Δ < 0.5. With a initialized to 0, each time M is multiplied by 2, a is incremented by 1, and each time M is divided by 2, a is decremented by 1. Finally, C = round(M_Δ × 2^v) and S = v + a, where round() denotes rounding to the nearest integer.
(5) Before a layer's result is output to the next layer, a nonlinear activation process is required. This process is a floating point operation, so to simulate the full-integer forward propagation calculation, a shift approximation is adopted for it. The result (in int8 representation) after the approximated shift activation is restored to a floating point representation by inverse quantization and output to the next layer; processes (2) to (5) are repeated until the last layer of the network. For a network with batch norm layers, a merging step is needed to fold the batch norm layer parameters directly into the previous layer.
Specifically, for a network with a batch norm layer, a merging approach needs to be taken, as shown in fig. 3. The specific implementation process is as follows: the batch norm calculation is described by the formula
y = γ·(x − μ)/√(σ² + ε) + β
wherein μ represents the output mean; ε is a small value added to the denominator to prevent division by zero when dividing by the variance, defaulting to 1e-5; σ represents the output standard deviation; γ is a parameter generated by the batch norm process; and β represents the bias. Since the batch norm follows the convolution process, i.e., x = Σ w × fmap(i, j), where w is the network weight and fmap(i, j) is the input feature map, a simple transformation integrates the batch norm into the convolution process, expressed as follows:
the merged weight w:
w_fold = γ·w/√(σ² + ε)
the merged bias β:
β_fold = β − γ·μ/√(σ² + ε)
the convolution process after merging: y = Σ w_fold × fmap(i, j) + β_fold
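A sketch of the merging step for a single channel, treating w, γ, μ, σ, and β as per-channel scalars for simplicity (an illustrative simplification; in practice these are vectors over the output channels):

```python
import math

# Fold batch norm parameters into the preceding convolution.
# epsilon defaults to 1e-5 as stated in the text; names are illustrative.
def fold_batchnorm(w, gamma, mu, sigma, beta, eps=1e-5):
    denom = math.sqrt(sigma ** 2 + eps)
    w_fold = gamma * w / denom             # merged weight
    beta_fold = beta - gamma * mu / denom  # merged bias
    return w_fold, beta_fold
```

Applying the folded parameters, w_fold × fmap + β_fold reproduces batch norm applied to the original convolution output, so the batch norm layer disappears from the inference graph.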
The invention applies a shift approximation to the nonlinear activation function, guaranteeing a full-integer forward derivation process. The invention adopts a leaky activation function, of the specific form:
f(x) = x, for x ≥ 0; f(x) = 0.1·x, for x < 0
The activation function mainly involves two operations: a data comparison and a floating point multiplication. To ensure that the forward derivation process uses only integer calculation, the invention adopts a shift approximation for it, of the specific form:
f(y) = y, for y ≥ 0; f(y) = (y + y<<1) >> 5, for y < 0
The shift approximation of the invention is numerically equivalent to the approximation:
0.1 ≈ 3/32 = 0.09375
In the actual calculation process, the shift activation operation is performed before the quantization of the final output in step (4). The bit width of the final output value is kept consistent with that of the input value, preparing for the forward derivation of the next layer, and the error caused by the shift approximation of the activation layer is thereby reduced.
(6) Back propagation, continuously updating the weight until the network converges, and storing the quantized weight and the additional parameters; the parameters after integer quantization can be used in the forward derivation process of full integer, and integer is used to replace the original floating point operation.
Specifically, assuming the convolutional layer has L_M input channels, L_N output channels, and a convolution kernel of size K, the storage space required after integer quantization is about 1/4 of that required before, as shown below.
After quantization:
Storage_int8 = L_M × L_N × K × K + L_N + 2 × sizeof(int32)/sizeof(int8)
Before quantization:
Storage_float = (L_M × L_N × K × K + L_N + bn × L_N × 3) × sizeof(float), bn ∈ {0, 1}
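A sketch of the storage comparison, under one plausible reading of the formulas above (int8 weights at 1 byte each; biases and the two rescale parameters C and S stored as int32 at 4 bytes each; this grouping is an assumption):

```python
# Per-layer storage in bytes after integer quantization (assumed grouping:
# int8 weights, int32 biases, plus int32 C and S).
def storage_int8(l_m, l_n, k):
    return l_m * l_n * k * k * 1 + (l_n + 2) * 4

# Per-layer storage in bytes before quantization (float weights, biases,
# and, when bn=1, three batch-norm parameter vectors per output channel).
def storage_float(l_m, l_n, k, bn=1):
    return (l_m * l_n * k * k + l_n + bn * l_n * 3) * 4
```

For a 16-in, 32-out, 3×3 layer this gives a ratio near 0.25, consistent with the roughly 4× reduction claimed in the text.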
as shown in fig. 4, there is a quantization as well as an inverse quantization process between the two layers. In the actual forward derivation process, the two can cancel each other out, so in the actual calculation process, only the inverse quantization processing needs to be performed on the output of the last layer of the network, and only the full integer calculation exists in the middle layer, as shown in fig. 5.
In addition, the performance of the invention was measured using the darknet framework: quantization was performed on the YOLOv2-tiny network structure, and comparing the average mAP values before and after quantization, the loss was 5.1%, as shown in Table 1:
Category | Before quantization | After quantization | Error
Boat | 0.1415 | 0.1657 | 0.0242
Bird | 0.1807 | 0.1621 | -0.0186
Train | 0.5145 | 0.4441 | -0.0704
Bus | 0.5306 | 0.4669 | -0.0637
Person | 0.4633 | 0.4061 | -0.0572
Dog | 0.3379 | 0.3023 | -0.0356
Diningtable | 0.3433 | 0.238 | -0.1053
Sheep | 0.3322 | 0.2644 | -0.0678
Pottedplant | 0.0864 | 0.0756 | -0.0108
Sofa | 0.3187 | 0.2076 | -0.1111
Car | 0.5195 | 0.4358 | -0.0837
Aeroplane | 0.4157 | 0.2801 | -0.1356
Bicycle | 0.48 | 0.4563 | -0.0237
Tvmonitor | 0.4029 | 0.3335 | -0.0694
Bottle | 0.0522 | 0.037 | -0.0152
Motorbike | 0.536 | 0.4221 | -0.1139
Cat | 0.3847 | 0.3633 | -0.0214
Chair | 0.1776 | 0.1235 | -0.0541
Cow | 0.3049 | 0.2972 | -0.0077
Horse | 0.5222 | 0.4384 | -0.0838
Average mAP | 0.3521 | 0.301 | -0.0511
TABLE 1
The invention uses the parameters before and after quantization to perform target detection and identification:
inputting a given image into the convolutional network, and dividing the image into S × S grids;
setting n anchor boxes with fixed length-width ratios, predicting the n anchor boxes for each grid, and independently predicting the coordinates (x, y, w, h), the confidence (p) and the probability of 20 categories of the target by each anchor box;
performing non-maximum suppression (NMS) on the extracted S, S and n targets, removing overlapped frames, and keeping a prediction frame with high confidence;
and outputting and visually displaying the result.
For a certain class of targets, the confidence of the corresponding class in all candidate frames needs to be calculated, and the calculation process is shown as the following formula:
P(class)=P(class|obj)×P(obj)
wherein P(class) represents the final confidence of a class of target in a candidate box, P(class|obj) represents the class score regressed in the candidate box, and P(obj) represents the probability, regressed in the candidate box, that the box contains a target. After the probability of the corresponding category is calculated, a preliminary screening is first performed with a fixed threshold to filter out candidate boxes with low confidence in the corresponding category, and then overlapping target boxes are removed by non-maximum suppression (NMS).
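The confidence computation and the fixed-threshold pre-filter can be sketched as follows (a Python sketch; the box scores and the threshold value are made up for illustration):

```python
# Sketch of the per-box class confidence P(class) = P(class|obj) * P(obj)
# followed by the fixed-threshold pre-filter. The scores and threshold are
# illustrative.

def class_confidence(p_class_given_obj, p_obj):
    return p_class_given_obj * p_obj

candidates = [
    # (P(class|obj), P(obj))
    (0.9, 0.8),   # confident box -> 0.72, kept
    (0.6, 0.3),   # weak objectness -> 0.18, filtered out
    (0.5, 0.7),   # -> 0.35, kept
]
THRESHOLD = 0.25   # hypothetical fixed threshold

kept = [class_confidence(pc, po) for pc, po in candidates
        if class_confidence(pc, po) >= THRESHOLD]
assert len(kept) == 2   # the 0.18-confidence box was filtered out
```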
The non-maximum suppression (NMS) removal of overlapping boxes is performed according to each category, and the process is summarized as follows:
(1) sorting P (class) of a certain class in all the candidate frames in a descending order, and marking all the frames in an unprocessed state;
(2) calculating the overlap rate between the frame with the maximum probability and each of the other frames; if the overlap rate exceeds 0.5, keeping the frame with the maximum probability, removing the overlapping frames accordingly, and marking them as processed;
(3) finding out the second largest target frame of P (class) in sequence, and marking according to the step (2);
(4) repeating steps (2) - (3) until all frames are marked as processed;
(5) and selecting the targets exceeding the threshold in the P (class) for visual display and outputting the result for the reserved target frames.
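Steps (1)-(5) above can be sketched as follows, assuming the overlap rate is intersection-over-union (IoU) and using the 0.5 threshold from step (2); the box coordinates and scores are illustrative:

```python
# Sketch of the per-class NMS described in steps (1)-(5). Boxes are
# (x1, y1, x2, y2); the overlap rate is taken to be IoU (an assumption),
# with the 0.5 threshold from step (2).

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, overlap_thresh=0.5):
    # (1) sort candidates by P(class) in descending order
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep, suppressed = [], set()
    for i in order:
        if i in suppressed:
            continue
        keep.append(i)                        # keep the highest-scoring frame
        for j in order:
            # (2) suppress frames overlapping it by more than the threshold
            if j != i and j not in suppressed and \
                    iou(boxes[i], boxes[j]) > overlap_thresh:
                suppressed.add(j)
    return keep                               # (3)-(4) repeat until all marked

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # → [0, 2]: the two overlapping boxes collapse to one
```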
As shown in fig. 6, which is a schematic diagram of the recognition results of an ordinary convolutional network on an image; target detection and recognition on the same picture with the fully integer-quantized convolutional network is shown in fig. 7. It can be seen that the performance loss after full integer quantization is small: the recognition effect is nearly the same as that of the ordinary convolutional network, while detection and recognition are faster and consume fewer computing resources.
It will be appreciated by those skilled in the art that the foregoing is only a preferred embodiment of the invention, and is not intended to limit the invention, such that various modifications, equivalents and improvements may be made without departing from the spirit and scope of the invention.

Claims (7)

1. An application method of a full integer quantization convolution network, the application method comprising the steps of:
s1, obtaining a model, a floating point type weight and a training data set of the convolutional network, and initializing the network;
s2, for each convolution layer, first obtaining the distribution ranges of the input IN, the output OUT and the weight WT of the layer through a floating-point inference process, and obtaining the maximum absolute extreme values of IN, OUT and WT respectively;
s3, updating the maximum absolute extreme values of the three in the training process of the current layer;
s4, performing integer quantization on the input and the weight of the current layer in the convolution network according to the maximum absolute extreme values of the input IN, the output OUT and the weight WT;
s5, obtaining the output of the current layer integer quantization according to the input and weight of the integer quantization;
s6, carrying out inverse quantization on the output of the integer quantization of the current layer, reducing the output into a floating point type and outputting to the next layer; if the next layer is the batch norm layer, merging the parameters of the batch norm layer into the current layer by adopting a merging means; repeatedly and sequentially executing the steps S3 to S6 until the last layer in the convolutional network;
s7, performing back propagation and continuously updating the weights until the network converges, then saving the quantized weights and the additional parameters; the integer-quantized parameters are used in the full-integer forward derivation process, replacing the original floating-point operations with integer operations;
s8, inputting the image of the target to be detected into a full integer quantization convolution network, and dividing the image of the target to be detected into S × S grids;
s9, setting n anchor boxes with fixed length-width ratios, predicting n anchor boxes for each grid, and independently predicting the coordinates (x, y, w, h), the confidence coefficient p and the probability of m categories of the target by each anchor box; wherein x, y represent the target coordinates, w, h represent the height and width of the target;
s10, according to the probability corresponding to each category calculated in the previous step, firstly, carrying out preliminary screening through a fixed threshold, filtering out candidate frames with the confidence coefficient lower than the threshold in the corresponding category, and then removing overlapped target frames through a non-maximum inhibition method;
and S11, selecting the targets with the corresponding probability exceeding the threshold in different categories for the reserved target frames to be displayed visually, and outputting the target detection result.
2. The method of claim 1, wherein the maximum absolute extremum of the full integer quantization convolutional network is updated in the training process in step S3, specifically, the maximum absolute extremum is updated by using an exponential moving average algorithm:
x_n = α · x_(n-1) + (1 − α) · x
wherein x_n is the maximum absolute extreme value of the input, output, or weight after this update, x_(n-1) is the maximum absolute extreme value from the previous update, x is the input, output, or weight value obtained in the current calculation, and α is a weight coefficient.
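This exponential-moving-average update can be sketched as follows (α = 0.99 and the per-batch extremum values are illustrative):

```python
# Minimal sketch of the exponential-moving-average update
# x_n = alpha * x_(n-1) + (1 - alpha) * x used in claim 2 to track the
# maximum absolute extremum during training. alpha = 0.99 and the
# per-batch |max| values below are illustrative.

def ema_update(prev, current, alpha=0.99):
    return alpha * prev + (1 - alpha) * current

running_max = 1.0                        # extremum after some warm-up
for batch_max in [1.2, 0.9, 1.1, 1.0]:   # hypothetical per-batch |max| values
    running_max = ema_update(running_max, batch_max)

# The running value moves only slowly toward the recent batch statistics.
assert 0.9 < running_max < 1.2
```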
3. The method of claim 1, wherein the step S4 is specifically as follows:
input integer quantization:
Q_IN=clamp(IN/S1)
wherein Q_IN represents the integer-quantized input; S1 = max{|IN|}/2^N; N represents the number of quantization bits; clamp() truncates the part after the decimal point; max{|IN|} represents the maximum absolute extreme value of the input;
integer quantization of weights:
Q_WT=clamp(WT/S2)
wherein Q_WT represents the integer-quantized weight; S2 = max{|WT|}/2^N; max{|WT|} represents the maximum absolute extreme value of the weight.
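The symmetric quantization of this claim can be sketched as follows (N = 8 and the tensor values are illustrative; clamp() is modeled as truncation of the fractional part, following the claim's description):

```python
# Sketch of the symmetric integer quantization of claim 3:
# S = max{|X|} / 2^N and Q = clamp(X / S), with clamp() modeled as
# truncation toward zero. N = 8 and the values are illustrative.

def quantize_tensor(values, n_bits=8):
    max_abs = max(abs(v) for v in values)   # max{|X|}
    scale = max_abs / (2 ** n_bits)         # S = max{|X|} / 2^N
    q = [int(v / scale) for v in values]    # clamp(): truncate the fraction
    return q, scale

q, scale = quantize_tensor([0.5, -1.0, 0.25])
print(q)  # → [128, -256, 64]
```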
4. The method of claim 1, wherein the step S5 is specifically as follows:
the output of the integer quantization, Q _ OUT, is:
Q_OUT=Q_IN×Q_WT×M
M=S1×S2/S3
wherein Q_IN represents the integer-quantized input and Q_WT the integer-quantized weight; since M = S1 × S2/S3 is a floating-point number, let
M ≈ C / 2^S
so that the output can be computed entirely in integers as Q_OUT = (Q_IN × Q_WT × C) >> S.
The derivation process of the parameter C and the parameter S is as follows:
first, compute M = S1 × S2/S3:
wherein S1 = max{|IN|}/2^N, and max{|IN|} represents the maximum absolute extreme value of the input; S2 = max{|WT|}/2^N, and max{|WT|} represents the maximum absolute extreme value of the weight; S3 = max{|OUT|}/2^N, and max{|OUT|} represents the maximum absolute extreme value of the output; N represents the number of quantization bits;
repeatedly multiplying or dividing M by 2 until 0 < M < 0.5, with a initialized to 0; each time M is multiplied by 2, a becomes a + 1, and each time M is divided by 2, a becomes a − 1; counting in this way gives the final value of a;
then presetting a value of v, wherein v is more than 0 and less than or equal to 32, and solving S and C according to the following formula:
S = v + a
C = round(M × 2^v)
0 < C ≤ 2^v
where round () means to return round rounding.
5. The method of claim 4, wherein the output Q _ OUT of the full integer quantization convolution network is:
Q_OUT=Q_IN×Q_WT×M
before the output is integer-quantized, nonlinear activation is performed on Q_IN × Q_WT, and the nonlinear activation adopts a shift approximation operation.
6. The method of claim 5, wherein the non-linear activation of Q _ IN and Q _ WT is specifically:
nonlinear activation is performed on y = Q_IN × Q_WT by using a leaky activation function, in the specific form:

f(y) = y,       y ≥ 0
f(y) = 0.1·y,   y < 0
to ensure that Q_IN × Q_WT remains an integer after nonlinear activation, the above equation is approximated with shifts, as follows:

f(y) = y,                  y ≥ 0
f(y) = (y + y<<1) >> 5,    y < 0

wherein y<<1 indicates shifting the binary value y left by one bit, and (y + y<<1) >> 5 indicates shifting (y + y<<1) right by 5 bits; since (y + y<<1) >> 5 = 3y/32 ≈ 0.1·y, the final nonlinearly activated Q_IN × Q_WT remains an integer.
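The shift approximation can be sketched as follows (note the explicit parentheses: the intended expression is y plus y shifted left by one, i.e. 3y):

```python
# Sketch of the claim-6 shift-approximated leaky activation. For negative
# y the multiplication by 0.1 is replaced by (y + (y << 1)) >> 5, i.e.
# 3y/32 = 0.09375*y, so the result stays an integer.

def leaky_shift(y):
    if y >= 0:
        return y
    return (y + (y << 1)) >> 5   # arithmetic shift: 3y >> 5 ≈ 0.1 * y

print(leaky_shift(100))    # → 100 (positive values pass through unchanged)
print(leaky_shift(-100))   # → -10 (exact 0.1 * (-100) would also give -10)
```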
7. The method of claim 1, wherein if the next layer is a batch norm layer in step S6, merging the parameters of the batch norm layer into the current layer by using a merging means specifically comprises:
the calculation process of the batch norm layer is as follows:
y = γ · (x − μ) / √(σ² + ε) + β
wherein x represents the input, y the output, ε a small value added to the denominator, μ the output mean, σ the output standard deviation, γ a scale parameter generated in the calculation process of the batch norm layer, and β the bias;
since the batch norm follows the convolution process, the convolution process is expressed as:
y=∑w×fmap(i,j)
wherein fmap (i, j) is an image feature at the input image (i, j); w is a weight; y represents an output;
therefore, merging the batch norm layer parameters into the convolution process by adopting a merging means is as follows:
the combined weight is as follows:
w_fold = γ · w / √(σ² + ε)
combined bias:
β_fold = β − γ · μ / √(σ² + ε)
The convolution process after combination: y = ∑ w_fold × fmap(i, j) + β_fold.
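The folding can be sketched and checked numerically as follows (all parameter values are illustrative; the assertion confirms that convolution followed by batch norm equals the folded convolution):

```python
# Sketch of the claim-7 batch-norm folding:
#   w_fold    = gamma * w / sqrt(sigma^2 + eps)
#   beta_fold = beta - gamma * mu / sqrt(sigma^2 + eps)
# Parameter values are illustrative.
import math

def fold_bn(w, gamma, beta, mu, sigma, eps=1e-5):
    std = math.sqrt(sigma ** 2 + eps)
    return gamma * w / std, beta - gamma * mu / std

w, gamma, beta, mu, sigma, eps = 0.8, 1.5, 0.1, 0.4, 2.0, 1e-5
x = 3.0                                     # one input feature value

conv = w * x                                # y = sum w * fmap(i, j)
bn = gamma * (conv - mu) / math.sqrt(sigma ** 2 + eps) + beta

w_fold, beta_fold = fold_bn(w, gamma, beta, mu, sigma, eps)
folded = w_fold * x + beta_fold             # y = sum w_fold * fmap + beta_fold
assert abs(bn - folded) < 1e-9              # conv + BN == folded conv
```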