CN114372565A - Target detection network compression method for edge device - Google Patents
Target detection network compression method for edge device
- Publication number
- CN114372565A (application CN202210038592.0A)
- Authority
- CN
- China
- Prior art keywords: network, skynet, quantization, layer, merging
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06N3/045: Combinations of networks (G Physics; G06 Computing; G06N Computing arrangements based on specific computational models; G06N3/02 Neural networks; G06N3/04 Architecture, e.g. interconnection topology)
- G06N3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
Abstract
The invention belongs to the technical field of target detection on edge devices, and discloses a target detection network compression method for edge devices, which comprises the following steps: optimizing the SkyNet network structure, and quantizing the feature maps and weight parameters; and reconstructing the forward-inference structure and merging part of the calculation processing in the depthwise separable convolutions. Starting from the perspective of algorithm optimization and taking SkyNet as an example, the invention provides a compression technique for target detection networks, thereby reducing the difficulty of deploying them on edge devices. The pruning makes the network better suited to edge devices; the quantization greatly reduces the size of the network model; and the merging greatly reduces the computational load of the network.
Description
Technical Field
The invention belongs to the technical field of target detection on edge devices, and particularly relates to a target detection network compression method for edge devices.
Background
At present, the task of target detection is to locate and identify targets of interest in an image; it is widely used in scenarios such as autonomous driving, face detection, and video surveillance. In recent years, target detection algorithms based on convolutional neural networks have achieved better performance than traditional methods, but because of their huge computational and parameter requirements, most convolutional neural networks are deployed on general-purpose CPUs or GPUs, which consume considerable power and occupy considerable space, and real-time detection on edge devices remains difficult. A lightweight network or a network-compression approach is therefore urgently needed so that convolutional neural networks can perform inference directly on edge devices.
To solve this problem, prior art 1 proposes the lightweight network MobileNet, and prior art 2 proposes the lightweight network Xception; both replace standard convolution with depthwise separable convolution (DSC) to reduce the amount of calculation and the number of parameters, improving the operating efficiency of target detection networks on edge devices to some extent. Prior art 3 designs the hardware-friendly network SkyNet for edge devices on the basis of depthwise separable convolution; compared with MobileNet and Xception, the SkyNet structure is more regular and the module reuse rate is higher, but the deployment platform still faces considerable computational requirements.
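The saving from depthwise separable convolution can be illustrated with a short parameter-count comparison (a sketch; the channel and kernel sizes below are illustrative, not taken from the prior-art networks):

```python
def std_conv_params(c_in, c_out, k):
    # Standard convolution: one k x k kernel per (input, output) channel pair.
    return c_in * c_out * k * k

def dsc_params(c_in, c_out, k):
    # Depthwise separable convolution: one k x k depthwise kernel per input
    # channel, followed by a 1 x 1 pointwise convolution across channels.
    return c_in * k * k + c_in * c_out

# Illustrative layer: 96 -> 192 channels, 3 x 3 kernels.
std = std_conv_params(96, 192, 3)
dsc = dsc_params(96, 192, 3)
print(std, dsc, round(std / dsc, 1))  # 165888 19296 8.6
```

For these sizes the DSC layer needs roughly one eighth of the parameters, which is the source of the efficiency gain the text describes.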
To meet edge application scenarios, a number of neural network optimization techniques have been proposed, chiefly network compression, which divides roughly into quantization of the network and pruning of the network. Quantization replaces high-precision numbers with low-precision numbers in the convolution calculation, trading a small precision loss to avoid floating-point arithmetic, or replaces groups of weight values with cluster centroids. Pruning exploits network sparsity: since smaller weights in a neural network have less influence on the final prediction, computation can be skipped by testing whether a weight is zero.
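The zero-weight skipping idea can be sketched as follows (a hypothetical illustration of the principle, not the patent's pruning procedure):

```python
import numpy as np

def sparse_dot(w, x):
    # Exploit sparsity: a zero weight contributes nothing to the sum,
    # so its multiplication can be skipped entirely.
    total = 0.0
    for wi, xi in zip(w, x):
        if wi != 0.0:  # judge whether the weight is zero
            total += wi * xi
    return total

w = np.array([0.0, 0.5, 0.0, -1.0])  # pruned weight vector (illustrative)
x = np.array([3.0, 2.0, 7.0, 1.0])
print(sparse_dot(w, x))  # 0.5*2.0 + (-1.0)*1.0 = 0.0
```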
Through the above analysis, the problems and defects of the prior art are as follows: existing target detection networks run inefficiently on edge devices, place high demands on the hardware, consume considerable power and space, and cannot perform real-time detection on edge devices.
The difficulty in solving these problems is: a convolutional neural network can achieve high-precision target detection, but its large parameter count and computational load make implementation on edge devices difficult.
The significance of solving them is: with the present compression method, the network parameter count and computational load can both be reduced, so that the network can be deployed on low-power, low-cost edge devices with the precision loss kept within a reasonable range.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a target detection network compression method for edge equipment.
The invention is realized as follows: a target detection network compression method for edge devices comprises the following steps:
optimizing the SkyNet network structure, and quantizing the feature maps and weight parameters; and reconstructing the forward-inference structure and merging part of the calculation processing in the depthwise separable convolutions.
Further, the target detection network compression method for the edge device comprises the following steps:
step one, removing the bypass branch structure in the SkyNet network and deleting part of the channels output by the first layer, so as to perform optimized pruning of the SkyNet network and obtain the optimized SkyNet network;
step two, compressing the SkyNet network by quantizing the weights to 7 bits and the feature maps to 8 bits, fusing the bias parameters and scaling coefficients and fixed-pointing them to 32 bits, and then retraining;
and step three, merging the convolution layers and normalization layers, merging the activation, quantization, inverse-quantization, and saturation-truncation processing into a FETCH operation, and thereby merging the SkyNet network structure.
Further, the performing optimized cutting of the SkyNet network includes the following steps:
first, the SkyNet branches are pruned, taking each depthwise separable convolution as a minimum unit layer, and the output of the first layer is pruned to 32 channels;
secondly, pooling is added after the sixth layer and the last layer is changed to a depthwise separable convolution, optimizing the whole network structure into a straight-through (sequential) form.
Further, the optimized SkyNet network includes:
a 3-channel input layer CHL3; intermediate layers CHL32, CHL96, CHL192, CHL384, CHL512, and CHL96; and a regression layer CHL30;
the convolution between each pair of layers of the optimized SkyNet network uses depthwise separable convolution.
Further, the compression of the SkyNet network after retraining comprises the following steps:
(1) selecting the maximum value in the convolution kernels corresponding to each output channel as the quantization maximum, and performing weight quantization in maximum-value mode:
q_w = w × scale_w;
wherein w denotes the vector of original weights corresponding to each channel; q_w denotes the vector of quantized weights; and scale_w denotes the scaling coefficient, a scalar;
(2) selecting a threshold by means of the KL relative entropy, and quantizing the feature map with saturated quantization: a threshold T is selected, values of the original distribution within ±T are scaled proportionally to the range −127 to +127, and values outside the range are saturated, the saturation value being taken directly to represent them.
Further, selecting a threshold value by using the KL relative entropy, and performing feature map quantization by using saturated quantization comprises:
1) selecting the threshold by means of the KL relative entropy:
DKL(p||q) = H(p, q) − H(p);
wherein p denotes the original distribution before quantization, q denotes the distribution after quantization with threshold T, H(p) denotes the information entropy of the original distribution, H(p, q) denotes the cross entropy of the original and quantized distributions, and DKL(p||q) denotes the KL relative entropy;
2) calculating the scaling coefficient scale_fm:
scale_fm = 127 / T;
3) fixed-pointing the bias and scaling coefficients: the floating-point units appearing in forward inference are merged, amplified, and rounded, and the final coefficients are stored as 32-bit integers;
4) calculating the inverse quantization coefficients after merging, amplification, and rounding:
Scale_merge = int(scale_next_fm / (scale_w × scale_fm) × shift_coe);
Bias_merge = int(bias × scale_next_fm × shift_coe);
wherein scale_w denotes the weight quantization coefficient; scale_fm denotes the quantization coefficient of the preceding layer's feature map; bias denotes the bias; scale_next_fm denotes the next layer's quantization coefficient; Scale_merge denotes the merged, amplified, and rounded inverse quantization coefficient; Bias_merge denotes the merged bias coefficient; and shift_coe denotes the amplification factor.
Further, the merging the SkyNet network structure includes:
(1) merging the convolution layer and the normalization layer, the merged output y3 being:
y3 = Wx + B;
wherein:
W = γw / √(σ² + ε), B = γ(b − μ) / √(σ² + ε) + β;
wherein y1 = wx + b denotes the convolution output; x denotes the input; w the weight; b the bias; x, w, and b are vectors; μ denotes the mean; σ the standard deviation; γ the scaling coefficient; β the scaling offset; ε = 1e-6; W denotes the post-fusion weight; and B the post-fusion bias;
(2) merging the activation, quantization, inverse-quantization, and saturation-truncation processing into a FETCH operation; the FETCH operation converts 32 bits to 8 bits and performs ReLU activation and saturation truncation.
Further, performing the ReLU activation and saturation truncation comprises:
testing the sign of the input data: if positive, saturation truncation is performed; if negative, the activated value is set to 0.
Another object of the present invention is to provide an object detection network compression system for an edge device, comprising:
a network pruning module, for performing optimized pruning of the SkyNet network by removing the bypass branch structure and deleting part of the channels output by the first layer, to obtain the optimized SkyNet network;
a network compression module, for compressing the SkyNet network by quantizing the weights to 7 bits and the feature maps to 8 bits, fusing the bias parameters and scaling coefficients and fixed-pointing them to 32 bits, and then retraining;
and a network structure merging module, for merging the convolution layers and normalization layers, merging the activation, quantization, inverse-quantization, and saturation-truncation processing into a FETCH operation, and merging the SkyNet network structure.
By combining all the technical schemes, the invention has the advantages and positive effects that:
the invention is respectively sent from the angle of algorithm optimization, takes SkyNet as an example, and provides a compression processing technology for the target detection network, thereby reducing the deployment difficulty of the target detection network on edge equipment. The invention cuts the network and is more suitable for the edge device. The invention carries out quantization processing and greatly reduces the size of the network model. The invention carries out merging processing, and greatly reduces the calculated amount of the network.
Drawings
Fig. 1 is a flowchart of a target detection network compression method for an edge device according to an embodiment of the present invention.
Fig. 2 is a diagram of the optimized Skynet network structure according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a saturated truncated scaling quantization according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a merged convolutional layer and a normalization layer according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of the FETCH operation provided by the embodiment of the present invention.
Fig. 6 is a schematic diagram of a calculation process of each layer before fixed-point processing according to an embodiment of the present invention.
Fig. 7 is a schematic diagram of a calculation process of each layer after the fixed point processing provided by the embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In view of the problems in the prior art, the present invention provides a target detection network compression method for an edge device, and the following describes the present invention in detail with reference to the accompanying drawings.
The target detection network compression method for the edge device provided by the embodiment of the invention comprises the following steps:
optimizing the SkyNet network structure, and quantizing the feature maps and weight parameters; and reconstructing the forward-inference structure and merging part of the calculation processing in the depthwise separable convolutions.
As shown in fig. 1, the method for compressing an object detection network for an edge device according to an embodiment of the present invention includes the following steps:
S101, performing optimized pruning of the SkyNet network by removing the bypass branch structure and deleting part of the channels output by the first layer, to obtain the optimized SkyNet network;
S102, compressing the SkyNet network by quantizing the weights to 7 bits and the feature maps to 8 bits, fusing the bias parameters and scaling coefficients and fixed-pointing them to 32 bits, and then retraining;
S103, merging the convolution layers and normalization layers, merging the activation, quantization, inverse-quantization, and saturation-truncation processing into a FETCH operation, and merging the SkyNet network structure.
The optimized pruning of the SkyNet network provided by the embodiment of the invention comprises the following steps:
first, the SkyNet branches are pruned, taking each depthwise separable convolution as a minimum unit layer, and the output of the first layer is pruned to 32 channels;
secondly, pooling is added after the sixth layer and the last layer is changed to a depthwise separable convolution, optimizing the whole network structure into a straight-through (sequential) form.
The optimized SkyNet network provided by the embodiment of the invention comprises the following components:
a 3-channel input layer CHL3; intermediate layers CHL32, CHL96, CHL192, CHL384, CHL512, and CHL96; and a regression layer CHL30;
the convolution between each pair of layers of the optimized SkyNet network uses depthwise separable convolution.
The SkyNet network compression method after retraining provided by the embodiment of the invention comprises the following steps:
(1) selecting the maximum value in the convolution kernels corresponding to each output channel as the quantization maximum, and performing weight quantization in maximum-value mode:
q_w = w × scale_w;
wherein w denotes the vector of original weights corresponding to each channel; q_w denotes the vector of quantized weights; and scale_w denotes the scaling coefficient, a scalar;
(2) selecting a threshold by means of the KL relative entropy, and quantizing the feature map with saturated quantization: a threshold T is selected, values of the original distribution within ±T are scaled proportionally to the range −127 to +127, and values outside the range are saturated, the saturation value being taken directly to represent them.
The method for selecting the threshold value by utilizing the KL relative entropy and quantizing the characteristic diagram by adopting the saturation quantization comprises the following steps:
1) selecting the threshold by means of the KL relative entropy:
DKL(p||q) = H(p, q) − H(p);
wherein p denotes the original distribution before quantization, q denotes the distribution after quantization with threshold T, H(p) denotes the information entropy of the original distribution, H(p, q) denotes the cross entropy of the original and quantized distributions, and DKL(p||q) denotes the KL relative entropy;
2) calculating the scaling coefficient scale_fm:
scale_fm = 127 / T;
3) fixed-pointing the bias and scaling coefficients: the floating-point units appearing in forward inference are merged, amplified, and rounded, and the final coefficients are stored as 32-bit integers;
4) calculating the inverse quantization coefficients after merging, amplification, and rounding:
Scale_merge = int(scale_next_fm / (scale_w × scale_fm) × shift_coe);
Bias_merge = int(bias × scale_next_fm × shift_coe);
wherein scale_w denotes the weight quantization coefficient; scale_fm denotes the quantization coefficient of the preceding layer's feature map; bias denotes the bias; scale_next_fm denotes the next layer's quantization coefficient; Scale_merge denotes the merged, amplified, and rounded inverse quantization coefficient; Bias_merge denotes the merged bias coefficient; and shift_coe denotes the amplification factor.
The SkyNet network structure merging method provided by the embodiment of the invention comprises the following steps:
(1) merging the convolution layer and the normalization layer, the merged output y3 being:
y3 = Wx + B;
wherein:
W = γw / √(σ² + ε), B = γ(b − μ) / √(σ² + ε) + β;
wherein y1 = wx + b denotes the convolution output; x denotes the input; w the weight; b the bias; x, w, and b are vectors; μ denotes the mean; σ the standard deviation; γ the scaling coefficient; β the scaling offset; ε = 1e-6; W denotes the post-fusion weight; and B the post-fusion bias;
(2) merging the activation, quantization, inverse-quantization, and saturation-truncation processing into a FETCH operation; the FETCH operation converts 32 bits to 8 bits and performs ReLU activation and saturation truncation.
The ReLU activation and saturation truncation provided by the embodiment of the invention comprise:
testing the sign of the input data: if positive, saturation truncation is performed; if negative, the activated value is set to 0.
The technical solution of the present invention is further described with reference to the following specific embodiments.
Example 1:
the target detection network compression method for the edge device provided by the embodiment of the invention comprises the following steps:
(1) Optimized pruning
The SkyNet branches are pruned, taking each depthwise separable convolution as a minimum unit layer, and the output of the first layer is pruned to 32 channels; pooling is then added after the sixth layer, and finally the last layer is changed to a depthwise separable convolution, so that the whole network structure is optimized into a straight-through (sequential) form.
As shown in fig. 2, the optimized SkyNet comprises 8 layers in total: a 3-channel input layer (CHL3), intermediate layers (CHL32, CHL96, CHL192, CHL384, CHL512, CHL96), and a regression layer (CHL30). The convolution between each pair of layers is realized with depthwise separable convolution (DSC); the network structure is regular, which facilitates module reuse and yields an efficient computation structure.
The input image size used in the present invention is 160 × 160, and the image size of each layer is shown in table 1:
TABLE 1 detailed dimensions of the layers
(2) Compressing a model
The quantization consists of three parts: quantization of the weights, quantization of the feature maps, and fixed-pointing of the bias and scaling coefficients.
The weights are quantized in maximum-value mode, the maximum being chosen as the largest value in the convolution kernels corresponding to each output channel.
Let the original weights corresponding to each channel be the vector w, the quantized weights the vector q_w, and the scaling coefficient the scalar scale_w; their relationship is given by equations (1) and (2). For convenience of operation the boundary is taken as 63, so the resulting q_w is distributed between −63 and 63:
scale_w = 63 / max(|w|) (1)
q_w = w × scale_w (2)
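The maximum-value weight quantization above can be sketched as follows (a sketch assuming scale_w = 63 / max|w|, consistent with the ±63 boundary stated in the text; the channel weights are illustrative):

```python
import numpy as np

def quantize_weights(w, bound=63):
    # Maximum-value quantization for one output channel: scale so that the
    # largest magnitude maps to the boundary, then round to integers.
    scale_w = bound / np.max(np.abs(w))           # assumed form of equation (1)
    q_w = np.round(w * scale_w).astype(np.int32)  # equation (2)
    return q_w, scale_w

w = np.array([0.25, -0.5, 0.125])  # illustrative channel weights
q_w, scale_w = quantize_weights(w)
print(q_w.tolist(), scale_w)  # [32, -63, 16] 126.0
```

At inference time, the integer q_w stands in for w, and the scalar scale_w is folded into the layer's dequantization coefficient.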
The feature maps are quantized with saturated quantization, and the threshold is selected by means of the KL relative entropy, which markedly reduces the precision loss.
Saturated quantization is shown in fig. 3: a threshold T is selected, values of the original distribution within ±T are scaled proportionally to the range −127 to +127, and the values shown in red in the figure, which fall outside the range, are saturated, the saturation value being taken directly to represent them.
Let the original distribution before quantization be p, the distribution after quantization with threshold T be q, the information entropy of the original distribution be H(p), the cross entropy of the original and quantized distributions be H(p, q), and the KL relative entropy be DKL(p||q); then:
DKL(p||q) = H(p, q) − H(p) (3)
DKL(p||q) = Σ p(i) × log(p(i) / q(i)) (4)
The threshold T minimizing DKL(p||q) then gives the scaling coefficient scale_fm, where:
scale_fm = 127 / T (5)
and (4) performing fixed-point processing on the bias and scaling coefficients, merging floating point number units appearing in forward reasoning, amplifying and rounding, and storing the coefficients by adopting 32-bit integer numbers. And setting the weight quantization coefficient as scale _ w, the current layer feature map quantization coefficient as scale _ fm, the bias as bias, and the next layer quantization coefficient as scale _ next _ fm.
Setting the inverse quantization coefficient after merging, amplification and rounding as Scale _ merge, the Bias coefficient as Bias _ merge, and the amplification factor as shift _ coe, including:
Bias_merge=int(bias×scale_next_fm×shift_coe) (7)
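The merged-coefficient computation above can be sketched as follows (shift_coe taken as 2^16 for illustration and the coefficient values made up; the Scale_merge formula is the standard requantization form, assumed here from the surrounding definitions):

```python
def merge_coefficients(scale_w, scale_fm, bias, scale_next_fm, shift_coe=2**16):
    # Fold the forward pass's floating-point rescaling into two 32-bit
    # integers: an inverse-quantization scale and a bias term.
    scale_merge = int(scale_next_fm / (scale_w * scale_fm) * shift_coe)
    bias_merge = int(bias * scale_next_fm * shift_coe)
    return scale_merge, bias_merge

# Illustrative coefficients; a real network derives them from calibration.
scale_merge, bias_merge = merge_coefficients(
    scale_w=126.0, scale_fm=50.8, bias=0.1, scale_next_fm=42.3)
print(scale_merge, bias_merge)
```

With both coefficients stored as integers, the per-layer rescaling in forward inference needs only integer multiplies followed by a shift by shift_coe.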
(3) merging network structures
The convolution layer and the normalization layer are merged. The depthwise separable convolution contains normalization layers, which during training accelerate network convergence, control overfitting, and mitigate gradient vanishing and gradient explosion. Once model training is complete, all parameters are fixed; at this point the convolution-layer parameters and normalization-layer parameters in the network can be merged, as shown in fig. 4, which effectively simplifies the network structure, reduces the amount of calculation, and improves computational efficiency.
Let the convolution output be y1, the input x, the weight w, and the bias b, with x, w, and b vectors; let the normalization-layer output be y2, the mean μ, the standard deviation σ, the scaling coefficient γ, and the scaling offset β, and let ε = 1e-6. The convolution calculation formula is:
y1 = wx + b (9)
The normalization-layer calculation formula is:
y2 = γ × (y1 − μ) / √(σ² + ε) + β (10)
Substituting equation (9) into equation (10), and letting W be the post-fusion weight and B the post-fusion bias, the combined output y3 is:
y3 = Wx + B (11)
wherein:
W = γw / √(σ² + ε) (12)
B = γ(b − μ) / √(σ² + ε) + β (13)
and after fusion, normalization layer calculation is not needed any more, so that the size of the model is reduced, the calculation resource is saved, and the performance is improved for forward reasoning.
The activation, quantization, inverse-quantization, and saturation-truncation operations are merged into a single FETCH operation, as shown in fig. 5.
The FETCH operation performs the 32-bit to 8-bit conversion, essentially completing both the ReLU activation and the saturation truncation. After calculation with fixed-point data, the result must be scaled back by 1/shift_coe; for edge devices, since shift_coe is a power of 2, this is simply a bit shift. The sign of the input data is tested along the way: if positive, saturation truncation is performed; if negative, the activated value is 0.
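The FETCH behavior described above can be sketched as follows (shift_coe assumed to be 2^16 for illustration; the saturation bound 127 matches the 8-bit feature-map range):

```python
def fetch(acc32, shift=16, q_max=127):
    # FETCH: ReLU, then rescale by 1/shift_coe via a bit shift (shift_coe is
    # a power of two), then saturation-truncate into the 8-bit range.
    if acc32 <= 0:
        return 0              # negative input: activated value is 0
    val = acc32 >> shift      # divide by shift_coe with a right shift
    return min(val, q_max)    # saturate at 127

print(fetch(-5), fetch(300 << 16), fetch(42 << 16))  # 0 127 42
```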
Fig. 6 shows the inference steps before the fixed-point processing and network-structure merging, and fig. 7 shows the inference steps after optimization.
Example 2:
(1) Optimized pruning and retraining
After the SkyNet network structure has been optimized into the straight-through form, the model is retrained. Table 2 compares the accuracy before and after pruning of SkyNet; with average precision (AP, as shown in formula 23) as the evaluation index, the drop in precision is less than 0.03, so the efficiency of real-time calculation of SkyNet on the edge device is greatly improved while the practical application requirements are still met.
TABLE 2 comparison of accuracy before and after Skynet pruning
Model type | Average precision
Complete model | 0.797
Model after pruning | 0.770
(2) Compressing models and merging network structures
The pruned and retrained model is quantized, and the activation, quantization, inverse-quantization, and saturation-truncation operations are merged; compared with the uncompressed model, the precision drops by only 2.34%, while the size of the network model is reduced by 74.5%.
A comparison of model size, average precision, and precision loss before and after model compression optimization is shown in Table 3.
 | Before compression | After compression
Data type | 32-bit floating point | 7-bit/8-bit/32-bit fixed point
Model size (MB) | 1.41 | 0.359
Average precision (AP) | 0.770 | 0.752
Precision loss | --- | 2.34%
Compression ratio | --- | 74.5%
The above description is only a specific embodiment of the present invention and is not intended to limit the scope of protection, which is defined by the appended claims; any modifications, equivalents, and improvements made within the spirit and scope of the invention are intended to be covered by the claims.
Claims (8)
1. An object detection network compression method for an edge device, the object detection network compression method for an edge device comprising:
optimizing the SkyNet network structure, and quantizing the feature maps and weight parameters; and reconstructing the forward-inference structure and merging part of the calculation processing in the depthwise separable convolutions.
2. The object detection network compression method for edge devices of claim 1, wherein the object detection network compression method for edge devices comprises the steps of:
step one, removing the bypass branch structure in the SkyNet network and deleting part of the channels output by the first layer, so as to perform optimized pruning of the SkyNet network and obtain the optimized SkyNet network;
step two, compressing the SkyNet network by quantizing the weights to 7 bits and the feature maps to 8 bits, fusing the bias parameters and scaling coefficients and fixed-pointing them to 32 bits, and then retraining;
and step three, merging the convolution layers and normalization layers, merging the activation, quantization, inverse-quantization, and saturation-truncation processing into a FETCH operation, and thereby merging the SkyNet network structure.
3. The target detection network compression method for an edge device of claim 2, wherein the performing optimized pruning of the SkyNet network comprises the steps of:
first, pruning the SkyNet bypass branches; taking each depthwise separable convolution as the minimum unit layer, pruning the output of the first layer to 32 channels;
secondly, adding pooling processing after the sixth layer, modifying the last layer into a depthwise separable convolution, and optimizing the whole network into a straight-through (single-branch) structure.
4. The target detection network compression method for an edge device of claim 2, wherein the optimized SkyNet network comprises:
a 3-channel input layer CHL3; intermediate layers CHL32, CHL96, CHL192, CHL384, CHL512 and CHL96; and a regression layer CHL30;
the convolution between each layer of the optimized SkyNet network uses depthwise separable convolution.
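The depthwise separable convolution that connects the layers above can be sketched as a per-channel spatial filter followed by a 1×1 channel-mixing step. The following is a minimal numpy illustration (function name, 3×3 kernel size, stride 1 and "same" zero padding are assumptions for the sketch, not part of the claims):

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_kernels):
    """Depthwise 3x3 convolution followed by a pointwise 1x1 convolution.

    x          : (C_in, H, W) input feature map
    dw_kernels : (C_in, 3, 3) one 3x3 kernel per input channel
    pw_kernels : (C_out, C_in) 1x1 kernels mixing channels
    Returns (C_out, H, W); 'same' zero padding, stride 1.
    """
    c_in, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    dw_out = np.zeros_like(x, dtype=np.float64)
    for c in range(c_in):                 # depthwise: each channel filtered alone
        for i in range(h):
            for j in range(w):
                dw_out[c, i, j] = np.sum(padded[c, i:i+3, j:j+3] * dw_kernels[c])
    # pointwise: a 1x1 convolution is a matrix product over the channel axis
    return np.tensordot(pw_kernels, dw_out, axes=([1], [0]))
```

Using the depthwise convolution as the minimum unit layer, pruning reduces to dropping rows of `pw_kernels` (output channels), which is what cutting the first layer to 32 channels amounts to.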
5. The method of object detection network compression for an edge device of claim 2, wherein the compression of the retrained SkyNet network comprises the steps of:
(1) selecting the maximum value in the convolution kernel corresponding to each output channel as the maximum value of quantization, and performing weight quantization by adopting a maximum value quantization mode:
q_w=w×scale_w;
wherein w represents the vector of original weights corresponding to each channel; q_w represents the vector of quantized weights; scale_w represents the scaling factor, a scalar;
(2) selecting a threshold value using the KL relative entropy and quantizing the feature map with a saturating quantization method: selecting a threshold T, scaling the values of the original distribution within the range ±T proportionally into −127 to +127, saturating the part outside that range, and representing the out-of-range part directly by the saturation value.
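The two quantization steps of claim 5 can be sketched as follows. This is a minimal numpy illustration; the symmetric 7-bit range of ±63 for the weights and the function names are assumptions made for the sketch, not values stated in the claim:

```python
import numpy as np

def quantize_weights(w):
    """Per-output-channel max-value quantization, step (1):
    scale_w = 63 / max|w| for each channel (assumed symmetric 7-bit range)."""
    scale_w = 63.0 / np.max(np.abs(w), axis=(1, 2, 3))   # one scalar per channel
    q_w = np.round(w * scale_w[:, None, None, None]).astype(np.int8)
    return q_w, scale_w

def quantize_feature_map(fm, T):
    """Saturating 8-bit feature-map quantization, step (2): values inside +/-T
    are scaled into [-127, 127]; values outside are clipped to the saturation
    value."""
    scale_fm = 127.0 / T
    q = np.round(fm * scale_fm)
    return np.clip(q, -127, 127).astype(np.int8), scale_fm
```

For a weight channel whose largest magnitude is 2.0, the scale is 31.5 and the maximum weight maps to 63; a feature-map value beyond the threshold T maps directly to the saturation value ±127.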
6. The method according to claim 5, wherein the selecting the threshold value using the KL relative entropy and performing the feature map quantization using saturation quantization comprises:
1) selecting a threshold value using the KL relative entropy:
D_KL(p||q) = H(p, q) − H(p);
wherein p represents the original distribution before quantization, q represents the distribution after quantization with threshold T, H(p) represents the information entropy of the original distribution, H(p, q) represents the cross entropy of the original and quantized distributions, and D_KL(p||q) represents the KL relative entropy;
2) calculating the scaling factor scale_fm:
scale_fm = 127/T;
3) performing fixed-point processing on the bias and scaling coefficients: merging the floating-point factors appearing in forward inference, amplifying and rounding them, and storing the final coefficients as 32-bit integers;
4) calculating the dequantization coefficients after merging, amplification and rounding:
Scale_merge = int(scale_next_fm/(scale_w×scale_fm)×shift_coe);
Bias_merge = int(bias×scale_next_fm×shift_coe);
wherein scale_w represents the weight quantization coefficient; scale_fm represents the quantization coefficient of the preceding layer's feature map; bias represents the bias; scale_next_fm represents the quantization coefficient of the next layer; Scale_merge represents the dequantization coefficient after merging, amplification and rounding; Bias_merge represents the merged bias coefficient; shift_coe represents the amplification factor.
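The KL-based threshold search of claim 6, step 1), can be illustrated with a small numpy sketch. The bin counts (2048 histogram bins, 128 quantization levels) and function names are assumptions in the spirit of common entropy-calibration practice, not values taken from the claim:

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p||q) = H(p, q) - H(p) = sum p*log(p/q), over bins where p, q > 0."""
    mask = (p > 0) & (q > 0)
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def select_threshold(activations, bins=2048, num_quant_bins=128):
    """Scan candidate thresholds T; keep the one whose saturated, coarsely
    quantized distribution q is closest in KL divergence to the original p."""
    hist, edges = np.histogram(np.abs(activations), bins=bins)
    best_T, best_kl = edges[-1], np.inf
    for i in range(num_quant_bins, bins + 1):
        p = hist[:i].astype(np.float64)
        p[i - 1] += hist[i:].sum()          # saturate the tail into the last bin
        # simulate quantization: merge i fine bins down to num_quant_bins levels
        q = np.zeros(i)
        chunk = i / num_quant_bins
        for j in range(num_quant_bins):
            lo, hi = int(j * chunk), int((j + 1) * chunk)
            q[lo:hi] = p[lo:hi].sum() / (hi - lo)
        p /= p.sum()
        q /= q.sum()
        kl = kl_divergence(p, q)
        if kl < best_kl:
            best_kl, best_T = kl, edges[i]
    return best_T
```

Once T is fixed, scale_fm = 127/T, and the merged coefficients of step 4) are computed once offline so that the forward pass needs only integer multiplies and shifts.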
7. The method of object detection network compression for an edge device of claim 2, wherein said merging the SkyNet network structure comprises:
(1) merging the convolution layer and the normalization layer to obtain the merged output y3:
y1 = w×x + b;
y3 = W×x + B;
wherein:
W = γ×w/√(σ² + ε);
B = γ×(b − μ)/√(σ² + ε) + β;
wherein y1 represents the convolution output; x represents the input; w represents the weight; b represents the bias; x, w and b are vectors; μ represents the mean; σ represents the standard deviation; γ represents the scaling coefficient; β represents the scaling offset; ε = 1e-6; W represents the fused weight; B represents the fused bias;
(2) merging the activation, quantization, inverse quantization and saturation truncation into a FETCH operation; the FETCH operation converts 32-bit values into 8-bit values while performing ReLU activation and saturation truncation, which comprises:
judging the sign of the input data: if positive, performing saturation truncation; if negative, setting the activated value to 0.
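The two parts of claim 7 can be sketched together in numpy: folding the normalization layer into the convolution weights, and a FETCH step that takes a 32-bit accumulator down to 8 bits. The fixed-point arithmetic in `fetch` is an assumption about one plausible form of the merged coefficients; the function names are illustrative:

```python
import numpy as np

def fold_bn(w, b, gamma, beta, mu, sigma, eps=1e-6):
    """Fold batch normalization into the preceding convolution, per part (1):
    W = gamma*w/sqrt(sigma^2 + eps),  B = gamma*(b - mu)/sqrt(...) + beta."""
    std = np.sqrt(sigma ** 2 + eps)
    return gamma * w / std, gamma * (b - mu) / std + beta

def fetch(acc32, scale_merge, bias_merge, shift_coe):
    """FETCH step of part (2): merged rescale, ReLU activation and saturation
    truncation, converting a 32-bit accumulator to an 8-bit value."""
    acc32 = np.asarray(acc32, dtype=np.int64)
    out = (acc32 * scale_merge + bias_merge) // shift_coe  # merged requantize
    out = np.maximum(out, 0)                               # ReLU: negative -> 0
    return np.minimum(out, 127).astype(np.int8)            # saturation truncation
```

Because the rescale, activation and truncation are fused into one pass over the accumulator, no intermediate floating-point tensor is materialized at inference time.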
8. An object detection network compression system for an edge device, implementing the object detection network compression method for the edge device according to any one of claims 1 to 7, wherein the object detection network compression system for the edge device comprises:
the network pruning module, used for performing optimized pruning of the SkyNet network by removing the bypass branch structure in the SkyNet network and deleting part of the channels output by the first layer, to obtain the optimized SkyNet network;
the network compression module, used for compressing the SkyNet network by quantizing the weights to 7 bits, quantizing the feature maps to 8 bits, fusing the bias parameters and scaling coefficients, fixing them to 32-bit fixed point, and retraining;
and the network structure merging module, used for merging the convolution layer and the normalization layer, merging the activation, quantization, inverse quantization and saturation truncation processing into a FETCH operation, and merging the SkyNet network structure.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210038592.0A CN114372565B (en) | 2022-01-13 | 2022-01-13 | Target detection network compression method for edge equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114372565A true CN114372565A (en) | 2022-04-19 |
CN114372565B CN114372565B (en) | 2024-10-15 |
Family
ID=81144914
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210038592.0A Active CN114372565B (en) | 2022-01-13 | 2022-01-13 | Target detection network compression method for edge equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114372565B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113505774A (en) * | 2021-07-14 | 2021-10-15 | 青岛全掌柜科技有限公司 | Novel policy identification model size compression method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111612147A (en) * | 2020-06-30 | 2020-09-01 | 上海富瀚微电子股份有限公司 | Quantization method of deep convolutional network |
US20210035331A1 (en) * | 2019-07-31 | 2021-02-04 | Hewlett Packard Enterprise Development Lp | Deep neural network color space optimization |
CN112488070A (en) * | 2020-12-21 | 2021-03-12 | 上海交通大学 | Neural network compression method for remote sensing image target detection |
CN113807330A (en) * | 2021-11-19 | 2021-12-17 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | Three-dimensional sight estimation method and device for resource-constrained scene |
Also Published As
Publication number | Publication date |
---|---|
CN114372565B (en) | 2024-10-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108764471B (en) | Neural network cross-layer pruning method based on feature redundancy analysis | |
CN110378468B (en) | Neural network accelerator based on structured pruning and low bit quantization | |
CN113159173B (en) | Convolutional neural network model compression method combining pruning and knowledge distillation | |
CN111079781B (en) | Lightweight convolutional neural network image recognition method based on low rank and sparse decomposition | |
CN109002889B (en) | Adaptive iterative convolution neural network model compression method | |
CN112163628A (en) | Method for improving target real-time identification network structure suitable for embedded equipment | |
CN112329922A (en) | Neural network model compression method and system based on mass spectrum data set | |
CN113222138A (en) | Convolutional neural network compression method combining layer pruning and channel pruning | |
CN110533022B (en) | Target detection method, system, device and storage medium | |
CN113011570A (en) | Adaptive high-precision compression method and system of convolutional neural network model | |
CN110781912A (en) | Image classification method based on channel expansion inverse convolution neural network | |
CN113111889B (en) | Target detection network processing method for edge computing end | |
CN111696149A (en) | Quantization method for stereo matching algorithm based on CNN | |
CN110751265A (en) | Lightweight neural network construction method and system and electronic equipment | |
CN110503135A (en) | Deep learning model compression method and system for the identification of power equipment edge side | |
CN113595993A (en) | Vehicle-mounted sensing equipment joint learning method for model structure optimization under edge calculation | |
CN113610192A (en) | Neural network lightweight method and system based on continuous pruning | |
CN117333497A (en) | Mask supervision strategy-based three-dimensional medical image segmentation method for efficient modeling | |
CN112597919A (en) | Real-time medicine box detection method based on YOLOv3 pruning network and embedded development board | |
Qi et al. | Learning low resource consumption cnn through pruning and quantization | |
CN114372565A (en) | Target detection network compression method for edge device | |
CN112488291B (en) | 8-Bit quantization compression method for neural network | |
CN112561054B (en) | Neural network filter pruning method based on batch characteristic heat map | |
CN112613604A (en) | Neural network quantification method and device | |
CN117151178A (en) | FPGA-oriented CNN customized network quantification acceleration method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||