CN110135580A

CN110135580A - A kind of full integer quantization method and its application method of convolutional network

Info

Publication number: CN110135580A
Application number: CN201910344069.9A
Authority: CN
Inventors: 钟胜; 周锡雄; 商雄; 蔡智
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2019-04-26
Filing date: 2019-04-26
Publication date: 2019-08-16
Anticipated expiration: 2039-04-26
Also published as: CN110135580B

Abstract

The invention discloses a kind of full integer quantization methods of convolutional network, belong to the quantization compression technique area of convolutional network.The present invention is all made of integer expression to the input feature vector figure of convolutional network, network weight and output characteristic pattern, and the forward inference process of every layer network pertains only to integer calculating.To ensure the performance after integer quantization, the present invention needs to network re -training, and in training the full integer reasoning of analog network result.The invention also achieves a kind of application methods of full integer quantization convolutional network.Compared to the convolutional network of single-precision floating point expression, resource occupied by the present invention program is less, and inference speed is faster；Compared to the network of fixed point quantization, the present invention is all made of the expression of regular length integer to the input, output and weight of network, and without considering that the bit wide bring of output result of layer-by-layer network influences, regularity is stronger, more suitable for resourceoriented restricted platform, such as FPGA/ASIC platform application.

Description

A kind of full integer quantization method and its application method of convolutional network

Technical field

The invention belongs to the quantization compression technique areas of convolutional network, more particularly, to a kind of full integer of convolutional network Quantization method and its application method.

Background technique

It has been the deep learning method of representative in target point using convolutional neural networks since Alex-Net in 2012 is delivered Not, identify that the performance in field makes a breakthrough year by year, the precision of existing complex network can reach 95% or more, these networks At the beginning of design, the deployment for the embedded platform not being limited in view of resourceoriented.For resourceoriented be limited application, Such as: AR/VR, smart phone, FPGA/ASIC application, need to carry out quantization compression to model, for reduce the size of model with And the demand of computing resource, to adapt to the deployment of these embedded platforms.

Quantify compression problem in face of model, be primarily present two kinds of approach: the first is for model structure itself, and design is more Efficiently/light-type network, for adapting to limited computing resource, such as Mobile Net, Shuffle Net.It is for second Low bit quantization is carried out, in network to the intermediate result of network, including weight, input, output for existing network structure Structure is constant, guarantee neural network accuracy in the case where, reduce the demand of the computing resource of network, reduce the computation delay of network.

In face of the above-mentioned second way, the method for already present low bit quantization has at present: TWN, BNN, XOR-NET.This For a little methods by the weight of network and input, output quantization at 1 bit or 3 bits, this makes the multiply-add operation of convolution process It can be substituted with exclusive or+shifting function, the use of computing resource can be reduced.But there are notable defects for this method: loss of significance It is larger.As for other quantization methods, the practical deployment in hardware is not considered, is quantified only for the weight in network, The considerations of focusing on meeting the needs of storage resource, and having ignored to computational resource requirements.

Summary of the invention

Aiming at the above defects or improvement requirements of the prior art, the present invention provides a kind of full integer quantization sides of convolutional network Method and its application method, its object is to which the input, output, weight of network are expressed using regular length integer, the quantization side Method controls the loss of significance of network 5% or so, while the consumption of computing resource, storage resources and Internet resources.

To achieve the above object, the present invention provides a kind of full integer quantization method of convolutional network, the method includes with Lower step:

(1) model, floating type weight and the training dataset of convolutional network are obtained, and initializes network；

(2) to each layer of convolutional layer, the reasoning process of relocatable is first passed through, seeks each layer of input IN, output OUT With the distribution of weight WT, and the maximum absolute extreme of three is sought respectively；

(3) the maximum absolute extreme of three is updated during the training of current layer；

(4) according to input IN, export the maximum absolute extreme of OUT and weight WT to the input of current layer in convolutional network and Weight carries out integer quantization；

(5) input and weight quantified according to integer, the output for asking current layer integer to quantify；

(6) inverse quantization is carried out to the output of current layer integer quantization, is reduced to floating type, output to next layer；If next Layer is batch norm layers, then takes merger means that batch norm layer parameter is merged into current layer；Repeatedly step (3)~ (6), up to the last layer in convolutional network；

(7) weight is constantly updated in backpropagation, until network convergence, saves the weight and additional parameter of quantization；Through It crosses integer and quantifies later parameter for, to derivation process, replacing original floating-point operation with integer before full integer.

Further, in the step (3) during training update three maximum absolute extreme specifically, using Index sliding average algorithm updates maximum absolute extreme:

xⁿ=α x^n-1+(1-α)x

Wherein, xⁿThe maximum absolute extreme of input, output or weight, x are updated for this^n-1Input, output were updated for last time Or the maximum absolute extreme of weight, x are input, output or the weight that this is calculated, α is weight coefficient.

Further, the step (4) specifically:

The integer of input quantifies:

Q_IN=clamp (IN/S1)

Wherein, Q_IN indicates integer quantization input；S1=max | IN | }/Γ, Γ=2^N；N indicates the digit of quantization； Clamp () indicates part after truncation decimal point；Max | IN | } indicate the maximum absolute extreme inputted；

The integer of weight quantifies:

Q_WT=clamp (WT/S2)

Wherein, Q_WT indicates the integer quantization of weight；S2=max | WT | }/Γ, | Γ=2^N；Max | WT | } indicate power The maximum absolute extreme of weight.

Further, the step (5) specifically:

The output Q_OUT of integer quantization are as follows:

Q_OUT=Q_IN × Q_WT × M

M=S1 × S2/S3

Wherein, Q_IN indicates integer quantization input；Q_WT indicates the integer quantization of weight；Since M=S1 × S2/S3 is floating Point-type, therefore enableWherein, the derivation process of parameter C and parameter S are as follows:

M, M=S1 × S2/S3 are solved first:

Wherein, S1=max { | IN | }/Γ, Γ=2^N, max | IN | } indicate the maximum absolute extreme inputted；S2=max | WT | }/Γ, | Γ=2^N, max { | WT | } indicates the maximum absolute extreme of weight；S3=max | OUT | }/Γ, | Γ=2^N, Max | OUT | } indicate the maximum absolute extreme exported；N indicates the digit of quantization；

Again to M repeatedly multiplied by 2 or divided by 2, so that finally enable 0 < M < 0.5, enable a=0, each M multiplied by 2 a=a+1, Divided by 2 a=a-1, statistics obtains the value of final a；

The value of v is preset later, and S and C are sought in 0 v≤32 < according to the following formula:

S=v+a

(M × 2 C=round^v)

0 C≤2 <^v

Wherein, round () indicates to return to round.

Further, the output Q_OUT of the integer quantization are as follows:

Q_OUT=Q_IN × Q_WT × M

Before carrying out integer quantization to output, nonlinear activation first is carried out to Q_IN and Q_WT, the nonlinear activation uses Shift approximation operation.

It is further, described that nonlinear activation is carried out to Q_IN and Q_WT specifically:

Using leaky activation primitive Q_IN × Q_WT progress nonlinear activation, leaky activation primitive, concrete form is as follows:

Still it is integer after carrying out nonlinear activation for guarantee Q_IN × Q_WT, above formula is subjected to displacement approximation operation, is obtained:

Wherein, y<<1 indicates binary y moving to left one, (y+y<<1)>>it 5 indicates binary (y+y<<1) 5 are moved to right, the Q_IN × Q_WT for eventually passing through nonlinear activation remains as integer.

Further, the step (6) if in next layer be batch norm layers, take merger means by batch Norm layer parameter is merged into current layer specifically:

The calculating process that norm layers of batch are as follows:

Wherein, x indicates input, and y indicates output, and ε indicates denominator added value, and μ indicates output mean value, and σ indicates outputting standard Difference, γ are the parameters that norm layers of calculating process of batch generates, and β indicates biasing；

Since batch norm linking is behind convolution process, convolution process is indicated are as follows:

Y=∑ w × fmap (i, j)

Wherein, fmap (i, j) is the characteristics of image at input picture (i, j)；W is weight；Y indicates output；

Therefore merger means are taken to be merged into batch norm layer parameter in convolution process i.e. are as follows:

Weight after merging:

Biasing after merging:

Convolution process after merging: y=∑ w_fold × fmap (i, j)+β _ fold.

It is another aspect of this invention to provide that the present invention provides a kind of application method of full integer quantization convolutional network, institute State application method the following steps are included:

S1, obtains model, floating type weight and the training dataset of convolutional network, and initializes network；

S2 first passes through the reasoning process of relocatable to each layer of convolutional layer, seeks each layer of input IN, output OUT With the distribution of weight WT, and the maximum absolute extreme of three is sought respectively；

S3 updates the maximum absolute extreme of three during the training of current layer；

S4, according to input IN, export the maximum absolute extreme of OUT and weight WT to the input of current layer in convolutional network and Weight carries out integer quantization；

S5, according to input and weight that integer quantifies, the output for asking current layer integer to quantify；

S6 carries out inverse quantization to the output of current layer integer quantization, is reduced to floating type, output to next layer；If next Layer is batch norm layers, then takes merger means that batch norm layer parameter is merged into current layer；It successively executes repeatedly Step S3 to S6, until the last layer in convolutional network；

S7 is used for backpropagation, constantly updates weight, until network convergence, saves the weight of quantization, and additional ginseng Number；Parameter after integer quantifies is for, to derivation process, replacing original floating-point operation with integer before full integer；

The image of target to be detected is input to full integer and quantifies convolutional network, the image of target to be detected is divided by S8 At S*S grid；

S9 sets the anchor box of n fixed aspect ratio, to n anchor box of each grid forecasting, each The probability of the coordinate (x, y, w, h) of anchor box independent prediction target, confidence level p and m classification；Wherein, x, y indicate target Coordinate, w, h indicate the height and width of target；

S10, the corresponding probability of each classification calculated according to front first pass through fixed threshold and carry out primary dcreening operation, filter out corresponding Confidence level is lower than the candidate frame of threshold value in classification, then by the method for non-maximum restraining, removes the target frame of overlapping；

S11, for the target frame remained, selecting different classes of middle corresponding probability is more than that the target of thresholding carries out visually Change display, exports object detection results.

In general, through the invention it is contemplated above technical scheme is compared with the prior art, have below beneficial to effect Fruit:

(1) input, output, weight of network are used regular length integer using the method for full integer quantization by the present invention Expression, this quantization method can make the loss of significance of network control 5% or so, since propagated forward process only includes solid The multiplication of measured length integer calculates, this is also more friendly for the demand of computing resource；

(2) the absolute value extreme value of input, the output of network is calculated using index sliding average algorithm, then passes through this extreme value Quantization operation is carried out, what index sliding average algorithm counted is the distribution character of batch of data, so that quantized result can satisfy The numerical characteristic of this batch of data, and it is not limited to specific input, this is that this quantization method can be extensive in practical application Necessary guarantee；

(3) merger measure is taken for norm layers of batch, norm layers of parameter of batch is directly incorporated into convolutional layer, This will directly be saved for norm layers of process quantified of batch, while this process is but also network carries out forward inference When without the concern for norm layers of batch of calculating；

(4) process of displacement activation is advanced to before quantization network output result, it is first for the intermediate result of output Displacement activation operation is carried out, then carries out the quantization of network output again, this way foundation is: if being first 8 output quantization Position, then displacement activation is executed, it is equivalent to and a 8bit signed number is operated, precision isAnd it is exporting Before being quantized, it uses 32bit numerical expression, carries out displacement activation operation, and precision isThus, pass through execution sequence Change, it is possible to reduce due to the displacement approximation operation bring error of active coating.

Detailed description of the invention

Fig. 1 is the training flow chart of full integer quantization method of the invention；

Fig. 2 is the topology example figure of convolutional neural networks in the embodiment of the present invention；

Fig. 3 is batch norm integration method exemplary diagram in the present invention；

Fig. 4 is the exemplary diagram for quantifying to offset with inverse quantization between network adjacent layer in the present invention；

Fig. 5 be in the present invention before full integer to derivation process schematic diagram；

Fig. 6 is object detection results figure before quantifying；

Fig. 7 is object detection results figure after quantization.

Specific embodiment

In order to make the objectives, technical solutions, and advantages of the present invention clearer, with reference to the accompanying drawings and embodiments, right The present invention is further elaborated.It should be appreciated that described herein, specific examples are only used to explain the present invention, not For limiting the present invention.As long as in addition, technical characteristic involved in the various embodiments of the present invention described below that Not constituting conflict between this can be combined with each other.

As shown in Figure 1, the method for the present invention the following steps are included:

Specifically, the matched embodiment of the present invention uses the network structure of YOLOV2-tiny.With reference to Fig. 2, including 6 layers of max Pool layers, 9 layers of convolutional layer, subsidiary batch norm behind convolutional layer.Training frame uses darknet, and the frame is with c language volume It writes and disclosure is increased income.Yolo network author provides floating type weight in personal homepage, available for download.Training data uses VOC2012 and VOC2007 data set is trained, which includes 20 class targets, amounts to 9963+11540=21503 Labeled data.The width of the input picture of initialization network is 416 pixels, is highly 416 pixels, and the port number of image is 3, often It is secondary be iterated trained picture number be 64, momentum 0.9, learning rate 0.001, maximum number of iterations 60200, network Output is position, size and the confidence level of the target in image, since testing result is there are crossing redundancy, is needed using non-very big The method that value inhibits merges testing result, so that the target output result for detecting each uniquely corresponds to.

(2) to each layer of convolutional layer, the reasoning process of relocatable is first passed through, seeks each layer of input, output, weight Distribution, and the maximum absolute extreme of three is sought respectively | max |, and utilization index sliding average is calculated in the training process Method (EMA) updates the extreme value；

Specifically, the parameter that every layer network weight includes is w, β, in addition needing to quantify input and output, altogether Need to count the maximum value of w, β, IN, OUT totally 4 groups of numbers.In order to make the bare maximum after statistics, response data collection Statistical nature, and the maximum under nonspecific input picture need to be updated these extreme values using EMA.Specific formula: xⁿ=α x^n-1+(1-α)x。

xⁿFor the value currently finally retained, x^n-1For the value that last iteration process retains, x is the result that this is calculated.α is Weight coefficient is generally selected between 0.9~1, in an embodiment of the present invention α=0.99.

(3) input of network, weight are quantified according to the maximum value sought using following quantitative formula, makes it It can be expressed with int8；

Quantization input: Q_IN=clamp (IN/S1)

Quantization weight: Q_WT=clamp (WT/S2)

Quantization parameter: S1=| MAX |/Γ, | MAX |=max | IN | }, Γ=2^N

S2=| MAX |/Γ, | MAX |=max | WT | }, Γ=2^N

Wherein, Γ=2^NIndicate the digit of quantization；IN is input, and WT is weight, max { | IN | } be the maximum of input absolutely Extreme value, max { | WT | } are the maximum absolute extremes of weight；

Specifically, rule of thumb, the input of each layer network and weight absolute value are in 0~1 range, utilize statistics Maximum value carry out linear transformation, using above-mentioned formula by weight and the equal Regularization of input to [- 127,127], this hair It is bright when logarithm is rounded, using mode is directly truncated, rather than the evidence obtaining mode that rounds up is used, in above formula Clamp () indicates break-in operation: int=clamp (float).In an embodiment of the present invention, N=8.

(4) according to the input for the quantization sought, weight, the quantization of current layer can be asked to export.Also it is to guarantee that network exports Integer numerical value is quantified using following formula:

Floating-point output: OUT=IN × WT=Q_IN × Q_WT × S1 × S2

Quantization output: Q_OUT=OUT/S3=Q_IN × Q_WT × (S1 × S2/S3)

Wherein S3 is output quantization coefficient.Since M=S1 × S2/S3 is floating number, it is for guarantee network reasoning process Integer calculates, can be with multiplication and displacement come approximate calculation, and coefficient C, S that the approximation is generated are saved as parameter, It is specific as follows:

Approximate calculation:

It specifically, is floating number due to M=S1S2/S3, due to needing to guarantee that the output valve after quantization can use integer table Show, while calculating process being made not to be related to floating-point operation, needs to carry out approximate calculation to M, enableTo guarantee multiplication of integers Bit wide it is small as far as possible, and the result of approximate calculation is more acurrate, needs to select the numberical range of C.In the embodiment of the present invention, limit 0 C≤2 <^v, v=24.

Solve the calculating process of C, S are as follows: first to M by being multiplied by repeatedly or divided by 2, finally enabling 0 < M^Δ<0.5.If a initial value Multiply 2 every time for 0, M to allow for a 1, M is added to remove 2 every time and allow for a subtracting 1.Finally enable C=round (M^Δ×2^v), S=v+a, Round () indicates round.

(5) needed before web results are exported to lower layer by nonlinear activation (active) process, this process sheet as Floating-point operation is needed to this process for the full integer calculating process that analogue forward is propagated using displacement approximation operation.It will activation Result (in8 expression) after operation displacement is approximate after taking inverse quantization, is reduced to floating type expression, output to next layer；Repeatedly (2) to (5) process, until network the last layer.For the network with batch norm layers, need first to take merger means, Batch norm layer parameter is directly merged into previous layer network.

Specifically, it for the network containing norm layers of batch, needs to take merging means, as shown in Figure 3.It has The implementation process of body: mathematical formulae can be used for batch normThe calculating process is described, wherein μ table Show output mean value, ε indicates denominator added value, prevents from being defaulted as 1e-5, σ indicates outputting standard divided by occurring operating except 0 when variance Difference, γ are the parameters that batch norm process generates, and β indicates biasing；Since batch norm linking is behind convolution process, i.e., X=∑ w × fmap (i, j), w are the weight of network, and fmap (i, j) is the characteristic pattern of input.By simple transformation, can incite somebody to action Batch norm is integrated into convolution process, and deformation process is expressed as follows:

Weight w after merging:

Biasing β after merging:

Convolution process after merging: y=∑ w_fold × fmap (i, j)+β _ fold

It is approximate using displacement to nonlinear activation function in the present invention, it ensure that before full integer to derivation process.The present invention Middle to use leaky activation primitive, concrete form is as follows:

It mainly include two parts operation for above-mentioned activation primitive: data judgement and floating-point multiplication.For before guaranteeing to pushing away It leads process only to be calculated with integer, the present invention uses it displacement approximate calculation, and concrete form is as follows:

Displacement approximation of the invention, is numerically equivalent to following approximation:

In actual calculating process, the operation for shifting activation will execute before the quantization final output of step (4).Make The bit wide of final output valve and the bit wide of input value are consistent, and to prepare before next layer to derivation process, processing can subtract in this way Less due to the displacement approximation operation bring error of active coating.

(6) weight is constantly updated in backpropagation, until network convergence, saves the weight and additional parameter of quantization；Through Crossing integer and quantifying later parameter can be used for before full integer replacing original floating-point operation with integer to derivation process.

Specifically, if setting the input channel of convolutional layer as L_M, output channel L_N, convolution kernel size is K, then integer The memory space that quantization front and back needs is as follows, is 1/4 before quantization after quantization.

After quantization:

Storage_int8=L_M × L_N × K × K+L_N+2 × sizeof (int32)/sizeof (int8)

Before quantization:

Storage_float=(L_M × L_N × K × K+L_N+bn × L_N × 3) sizeof (float), bn={ 0,1 }

As shown in figure 4, due to there is quantization and de-quantization process between two layers.In actual forward direction derivation process, The two can cancel out each other, therefore, in actual calculating process, it is only necessary to do at inverse quantization in the output of the last layer of network Reason, there is only full integers to calculate for middle layer, as shown in Figure 5.

In addition, the present invention surveys its performance using darknet frame: the amount of progress in the network structure of YOLO v2-tiny Change processing, to the average map value of ratio front and back, losing is 5.1%, as shown in table 1:

Classification	Before quantization	After quantization	Error
				Boat	0.1415	0.1657	0.0242
Bird	0.1807	0.1621	-0.0186
				Train	0.5145	0.4441	-0.0704
Bus	0.5306	0.4669	-0.0637
				Person	0.4633	0.4061	-0.0572
Dog	0.3379	0.3023	-0.0356
				Diningtable	0.3433	0.238	-0.1053
Sheep	0.3322	0.2644	-0.0678
				Pottedplant	0.0864	0.0756	-0.0108
Sofa	0.3187	0.2076	-0.1111
				Car	0.5195	0.4358	-0.0837
Aeroplane	0.4157	0.2801	-0.1356
				Bicycle	0.48	0.4563	-0.0237
Tvmonitor	0.4029	0.3335	-0.0694
				Bottle	0.0522	0.037	-0.0152
Motorbike	0.536	0.4221	-0.1139
				Cat	0.3847	0.3633	-0.0214
Chair	0.1776	0.1235	-0.0541
				Cow	0.3049	0.2972	-0.0077
Horse	0.5222	0.4384	-0.0838
				Average mAP	0.3521	0.301	-0.0511

Table 1

The present invention carries out detection identification to target using the parameter of quantization front and back

Given image inputs convolutional network, divides an image into S*S grid；

The anchor box for setting n fixed aspect ratio, to each grid forecasting n anchor box, each anchor The probability of the coordinate (x, y, w, h) of box independent prediction target, confidence level (p) and 20 classifications；

Non-maximum restraining (NMS) is carried out for S*S*n target of extraction, removal overlapping frame retains the pre- of high confidence level Survey frame；

As a result it exports, visualization display.

It is directed to a certain class target, the confidence level of the respective classes in all candidate frames need to be calculated, calculating process is as follows Shown in formula:

P (class)=P (class | obj) × P (obj)

Wherein P (class) indicates a certain final confidence level of classification target in some candidate frame, and P (class | obj) is indicated should The numerical value of the respective classes returned out in candidate frame, P (obj) indicate to return out in the candidate frame there are the probability of target.Meter After the probability for calculating corresponding classification, first passes through fixed threshold and carry out primary dcreening operation, filter out the low candidate frame of confidence level in respective classes, then By the method for NMS (non-maximum restraining), the target frame of overlapping is removed.

Non-maximum restraining (NMS) removal overlapping frame, is to carry out by each classification, process is summarized as follows:

(1) P (class) a kind of to certain in all candidate frames is ranked up by sequence from big to small, and will be all Frame all marks untreated state；

(2) Duplication for the frame and other frames for calculating maximum probability then retains maximum probability if Duplication is more than 0.5 Frame, remove other frames accordingly, and be marked as processed；

(3) that second largest target frame of P (class) is found out by cis-position, is marked according still further to step (2)；

(4) repeat step (2)~(3) until institute it is framed be all marked as it is processed；

(5) it for the target frame remained, selects to carry out visualization display more than the target of thresholding in P (class), it is defeated Result out.

As shown in fig. 6, carrying out the effect diagram after images steganalysis for common convolutional network；Quantified using full integer Convolutional network afterwards carries out target detection identification to same picture, and effect is as shown in Figure 7；As can be seen that being quantified using integer The performance loss of convolutional network afterwards is little, and recognition effect is almost even better compared to common convolutional network, but detects Faster, consumption computing resource is less for recognition speed.

The above content as it will be easily appreciated by one skilled in the art that the foregoing is merely illustrative of the preferred embodiments of the present invention, It is not intended to limit the invention, any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention, It should all be included in the protection scope of the present invention.

Claims

1. a kind of full integer quantization method of convolutional network, which is characterized in that the described method comprises the following steps:

(2) to each layer of convolutional layer, the reasoning process of relocatable is first passed through, seeks each layer of input IN, output OUT and power The distribution of weight WT, and the maximum absolute extreme of three is sought respectively；

(4) according to input and weight of the maximum absolute extreme to current layer in convolutional network for inputting IN, exporting OUT and weight WT Carry out integer quantization；

(6) inverse quantization is carried out to the output of current layer integer quantization, is reduced to floating type, output to next layer；If next layer is Norm layers of batch, then take merger means that batch norm layer parameter is merged into current layer；Step (3)~(6) repeatedly, Until the last layer in convolutional network；

(7) weight is constantly updated in backpropagation, until network convergence, saves the weight and additional parameter of quantization；By whole Type quantifies later parameter for, to derivation process, replacing original floating-point operation with integer before full integer.

2. the full integer quantization method of a kind of convolutional network according to claim 1, which is characterized in that in the step (3) The maximum absolute extreme of three is updated during training specifically, updating maximum absolute pole using index sliding average algorithm Value:

xⁿ=α x^n-1+(1-α)x

Wherein, xⁿThe maximum absolute extreme of input, output or weight, x are updated for this^n-1Input, output or power were updated for last time The maximum absolute extreme of weight, x are input, output or the weight that this is calculated, and α is weight coefficient.

3. the full integer quantization method of a kind of convolutional network according to claim 1, which is characterized in that step (4) tool Body are as follows:

The integer of input quantifies:

Q_IN=clamp (IN/S1)

Wherein, Q_IN indicates integer quantization input；S1=max | IN | }/Γ, Γ=2^N；N indicates the digit of quantization；clamp() Indicate that part after decimal point is truncated in expression；Max | IN | } indicate the maximum absolute extreme inputted；

The integer of weight quantifies:

Q_WT=clamp (WT/S2)

Wherein, Q_WT indicates the integer quantization of weight；S2=max | WT | }/Γ, | Γ=2^N；Max | WT | } indicate weight most Big absolute extreme.

4. the full integer quantization method of a kind of convolutional network according to claim 1, which is characterized in that step (5) tool Body are as follows:

The output Q_OUT of integer quantization are as follows:

Q_OUT=Q_IN × Q_WT × M

M=S1 × S2/S3

Wherein, Q_IN indicates integer quantization input；Q_WT indicates the integer quantization of weight；Since M=S1 × S2/S3 is floating-point Type, therefore enableWherein, the derivation process of parameter C and parameter S are as follows:

M, M=S1 × S2/S3 are solved first:

Wherein, S1=max { | IN | }/Γ, Γ=2^N, max | IN | } indicate the maximum absolute extreme inputted；S2=max | WT |/Γ, | Γ=2^N, max { | WT | } indicates the maximum absolute extreme of weight；S3=max | OUT | }/Γ, | Γ=2^N, max | OUT | } indicate the maximum absolute extreme exported；N indicates the digit of quantization；

Again to M repeatedly multiplied by 2 or divided by 2, so that finally enabling 0 < M < 0.5, a=0 is enabled, each M is multiplied by 2 a=a+1, divided by 2 Then a=a-1, statistics obtain the value of final a；

S=v+a

(M × 2 C=round^v)

0 C≤2 <^v

Wherein, round () indicates to return to round.

5. the full integer quantization method of a kind of convolutional network according to claim 4, which is characterized in that the integer quantization Export Q_OUT are as follows:

Q_OUT=Q_IN × Q_WT × M

Before carrying out integer quantization to output, nonlinear activation first is carried out to Q_IN and Q_WT, the nonlinear activation is using displacement Approximation operation.

6. the full integer quantization method of a kind of convolutional network according to claim 5, which is characterized in that described to Q_IN and Q_ WT carries out nonlinear activation specifically:

Wherein, y<<1 indicates binary y moving to left one, (y+y<<1)>>it 5 indicates binary (y+y<<1) moving to right 5 Position, the Q_IN × Q_WT for eventually passing through nonlinear activation remain as integer.

7. the full integer quantization method of a kind of convolutional network according to claim 1, which is characterized in that in the step (6) If next layer is batch norm layers, take merger means that batch norm layer parameter is merged into current layer specifically:

The calculating process that norm layers of batch are as follows:

Wherein, x indicates input, and y indicates output, and ε indicates denominator added value, and μ indicates output mean value, and σ indicates that outputting standard is poor, γ It is the parameter that norm layers of calculating process of batch generates, β indicates biasing；

Y=∑ w × fmap (i, j)

Weight after merging:

Biasing after merging:

Convolution process after merging: y=∑ w_fold × fmap (i, j)+β _ fold.

8. a kind of application method of full integer quantization convolutional network, which is characterized in that the application method the following steps are included:

S2 first passes through the reasoning process of relocatable to each layer of convolutional layer, seeks each layer of input IN, output OUT and power The distribution of weight WT, and the maximum absolute extreme of three is sought respectively；

S4 according to input IN, exports input and weight of the maximum absolute extreme of OUT and weight WT to current layer in convolutional network Carry out integer quantization；

S6 carries out inverse quantization to the output of current layer integer quantization, is reduced to floating type, output to next layer；If next layer is Norm layers of batch, then take merger means that batch norm layer parameter is merged into current layer；Step is successively executed repeatedly S3 to S6, until the last layer in convolutional network；

S7 is used for backpropagation, constantly updates weight, until network convergence, saves the weight and additional parameter of quantization；Through It crosses integer and quantifies later parameter for, to derivation process, replacing original floating-point operation with integer before full integer；

The image of target to be detected is input to full integer and quantifies convolutional network, the image of target to be detected is divided into S*S by S8 A grid；

S9 sets the anchor box of n fixed aspect ratio, to each grid forecasting n anchor box, each anchor The probability of the coordinate (x, y, w, h) of box independent prediction target, confidence level p and m classification；Wherein, x, y expression coordinates of targets, w, The height and width of h expression target；

S10, the corresponding probability of each classification calculated according to front first pass through fixed threshold and carry out primary dcreening operation, filter out respective classes Middle confidence level is lower than the candidate frame of threshold value, then by the method for non-maximum restraining, removes the target frame of overlapping；

S11 selects different classes of middle corresponding probability to carry out visualization more than the target of thresholding and shows for the target frame remained Show, exports object detection results.