CN114372565A - Target detection network compression method for edge device - Google Patents
Target detection network compression method for edge device
- Publication number
- CN114372565A (application CN202210038592.0A)
- Authority
- CN
- China
- Prior art keywords: network, skynet, quantization, layer, merging
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06N3/045: Combinations of networks (G Physics; G06 Computing; G06N Computing arrangements based on specific computational models; G06N3/02 Neural networks; G06N3/04 Architecture, e.g. interconnection topology)
- G06N3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
Abstract
The invention belongs to the technical field of target detection on edge devices, and discloses a target detection network compression method for edge devices, which comprises the following steps: optimizing the SkyNet network structure, and quantizing the feature maps and weight parameters; and reconstructing the forward-inference structure and merging part of the calculation processing in the depthwise separable convolutions. Starting from the perspective of algorithm optimization and taking SkyNet as an example, the invention provides a compression technique for target detection networks, thereby reducing the difficulty of deploying them on edge devices. The pruning makes the network better suited to edge devices; the quantization greatly reduces the size of the network model; and the merging greatly reduces the computational load of the network.
Description
Technical Field
The invention belongs to the technical field of target detection on edge devices, and particularly relates to a target detection network compression method for edge devices.
Background
At present, the task of target detection is to locate and identify targets of interest in an image; it is widely used in scenarios such as autonomous driving, face detection, and video surveillance. In recent years, target detection algorithms based on convolutional neural networks have achieved better performance than traditional methods, but because of their huge computational and parameter requirements, most convolutional neural networks are deployed on general-purpose CPUs or GPUs, which consume considerable power and occupy considerable space, and real-time detection on edge devices remains difficult. A lightweight network or a network-compression approach is therefore urgently needed so that convolutional neural networks can perform inference directly on edge devices.
To solve this problem, prior art 1 proposes the lightweight network MobileNet, and prior art 2 proposes the lightweight network Xception; both replace standard convolution with depthwise separable convolution (DSC) to reduce the amount of calculation and the number of parameters, improving the operating efficiency of target detection networks on edge devices to some extent. Prior art 3 designs the hardware-friendly network SkyNet for edge devices on the basis of depthwise separable convolution; compared with MobileNet and Xception, the SkyNet structure is more regular and the module reuse rate is higher, but the deployment platform still faces considerable computational requirements.
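The saving from depthwise separable convolution can be illustrated with a short parameter-count comparison (a sketch; the channel and kernel sizes below are illustrative, not taken from the prior-art networks):

```python
def std_conv_params(c_in, c_out, k):
    # Standard convolution: one k x k kernel per (input, output) channel pair.
    return c_in * c_out * k * k

def dsc_params(c_in, c_out, k):
    # Depthwise separable convolution: one k x k depthwise kernel per input
    # channel, followed by a 1 x 1 pointwise convolution across channels.
    return c_in * k * k + c_in * c_out

# Illustrative layer: 96 -> 192 channels, 3 x 3 kernels.
std = std_conv_params(96, 192, 3)
dsc = dsc_params(96, 192, 3)
print(std, dsc, round(std / dsc, 1))  # 165888 19296 8.6
```

For these sizes the DSC layer needs roughly one eighth of the parameters, which is the source of the efficiency gain the text describes.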
To meet edge application scenarios, a number of neural network optimization techniques have been proposed, chiefly network compression, which divides roughly into quantization of the network and pruning of the network. Quantization replaces high-precision numbers with low-precision numbers in the convolution calculation, trading a small precision loss to avoid floating-point arithmetic, or replaces groups of weight values with cluster centroids. Pruning exploits network sparsity: since smaller weights in a neural network have less influence on the final prediction, computation can be skipped by testing whether a weight is zero.
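The zero-weight skipping idea can be sketched as follows (a hypothetical illustration of the principle, not the patent's pruning procedure):

```python
import numpy as np

def sparse_dot(w, x):
    # Exploit sparsity: a zero weight contributes nothing to the sum,
    # so its multiplication can be skipped entirely.
    total = 0.0
    for wi, xi in zip(w, x):
        if wi != 0.0:  # judge whether the weight is zero
            total += wi * xi
    return total

w = np.array([0.0, 0.5, 0.0, -1.0])  # pruned weight vector (illustrative)
x = np.array([3.0, 2.0, 7.0, 1.0])
print(sparse_dot(w, x))  # 0.5*2.0 + (-1.0)*1.0 = 0.0
```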
Through the above analysis, the problems and defects of the prior art are as follows: existing target detection networks run inefficiently on edge devices, place high demands on the hardware, consume considerable power and space, and cannot perform real-time detection on edge devices.
The difficulty in solving these problems is: a convolutional neural network can achieve high-precision target detection, but its large parameter count and computational load make implementation on edge devices difficult.
The significance of solving them is: with the present compression method, the network parameter count and computational load can both be reduced, so that the network can be deployed on low-power, low-cost edge devices with the precision loss kept within a reasonable range.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a target detection network compression method for edge equipment.
The invention is realized as follows: a target detection network compression method for edge devices comprises the following steps:
optimizing the SkyNet network structure, and quantizing the feature maps and weight parameters; and reconstructing the forward-inference structure and merging part of the calculation processing in the depthwise separable convolutions.
Further, the target detection network compression method for the edge device comprises the following steps:
step one, removing the bypass branch structure in the SkyNet network and deleting part of the channels output by the first layer, so as to perform optimized pruning of the SkyNet network and obtain the optimized SkyNet network;
step two, compressing the SkyNet network by quantizing the weights to 7 bits and the feature maps to 8 bits, fusing the bias parameters and scaling coefficients and fixed-pointing them to 32 bits, and then retraining;
and step three, merging the convolution layers and normalization layers, merging the activation, quantization, inverse-quantization, and saturation-truncation processing into a FETCH operation, and thereby merging the SkyNet network structure.
Further, the performing optimized cutting of the SkyNet network includes the following steps:
first, the SkyNet branches are pruned, taking each depthwise separable convolution as a minimum unit layer, and the output of the first layer is pruned to 32 channels;
secondly, pooling is added after the sixth layer and the last layer is changed to a depthwise separable convolution, optimizing the whole network structure into a straight-through (sequential) form.
Further, the optimized SkyNet network includes:
a 3-channel input layer CHL3; intermediate layers CHL32, CHL96, CHL192, CHL384, CHL512, and CHL96; and a regression layer CHL30;
the convolution between each pair of layers of the optimized SkyNet network uses depthwise separable convolution.
Further, the compression of the SkyNet network after retraining comprises the following steps:
(1) selecting the maximum value in the convolution kernels corresponding to each output channel as the quantization maximum, and performing weight quantization in maximum-value mode:
q_w = w × scale_w;
wherein w denotes the vector of original weights corresponding to each channel; q_w denotes the vector of quantized weights; and scale_w denotes the scaling coefficient, a scalar;
(2) selecting a threshold by means of the KL relative entropy, and quantizing the feature map with saturated quantization: a threshold T is selected, values of the original distribution within ±T are scaled proportionally to the range −127 to +127, and values outside the range are saturated, the saturation value being taken directly to represent them.
Further, selecting a threshold value by using the KL relative entropy, and performing feature map quantization by using saturated quantization comprises:
1) selecting the threshold by means of the KL relative entropy:
DKL(p||q) = H(p, q) − H(p);
wherein p denotes the original distribution before quantization, q denotes the distribution after quantization with threshold T, H(p) denotes the information entropy of the original distribution, H(p, q) denotes the cross entropy of the original and quantized distributions, and DKL(p||q) denotes the KL relative entropy;
2) calculating the scaling coefficient scale_fm:
scale_fm = 127 / T;
3) fixed-pointing the bias and scaling coefficients: the floating-point units appearing in forward inference are merged, amplified, and rounded, and the final coefficients are stored as 32-bit integers;
4) calculating the inverse quantization coefficients after merging, amplification, and rounding:
Scale_merge = int(scale_next_fm / (scale_w × scale_fm) × shift_coe);
Bias_merge = int(bias × scale_next_fm × shift_coe);
wherein scale_w denotes the weight quantization coefficient; scale_fm denotes the quantization coefficient of the preceding layer's feature map; bias denotes the bias; scale_next_fm denotes the next layer's quantization coefficient; Scale_merge denotes the merged, amplified, and rounded inverse quantization coefficient; Bias_merge denotes the merged bias coefficient; and shift_coe denotes the amplification factor.
Further, the merging the SkyNet network structure includes:
(1) merging the convolution layer and the normalization layer, the merged output y3 being:
y3 = Wx + B;
wherein:
W = γw / √(σ² + ε), B = γ(b − μ) / √(σ² + ε) + β;
wherein y1 = wx + b denotes the convolution output; x denotes the input; w the weight; b the bias; x, w, and b are vectors; μ denotes the mean; σ the standard deviation; γ the scaling coefficient; β the scaling offset; ε = 1e-6; W denotes the post-fusion weight; and B the post-fusion bias;
(2) merging the activation, quantization, inverse-quantization, and saturation-truncation processing into a FETCH operation; the FETCH operation converts 32 bits to 8 bits and performs ReLU activation and saturation truncation.
Further, performing the ReLU activation and saturation truncation comprises:
testing the sign of the input data: if positive, saturation truncation is performed; if negative, the activated value is set to 0.
Another object of the present invention is to provide an object detection network compression system for an edge device, comprising:
a network pruning module, for performing optimized pruning of the SkyNet network by removing the bypass branch structure and deleting part of the channels output by the first layer, to obtain the optimized SkyNet network;
a network compression module, for compressing the SkyNet network by quantizing the weights to 7 bits and the feature maps to 8 bits, fusing the bias parameters and scaling coefficients and fixed-pointing them to 32 bits, and then retraining;
and a network structure merging module, for merging the convolution layers and normalization layers, merging the activation, quantization, inverse-quantization, and saturation-truncation processing into a FETCH operation, and merging the SkyNet network structure.
By combining all the technical schemes, the invention has the advantages and positive effects that:
the invention is respectively sent from the angle of algorithm optimization, takes SkyNet as an example, and provides a compression processing technology for the target detection network, thereby reducing the deployment difficulty of the target detection network on edge equipment. The invention cuts the network and is more suitable for the edge device. The invention carries out quantization processing and greatly reduces the size of the network model. The invention carries out merging processing, and greatly reduces the calculated amount of the network.
Drawings
Fig. 1 is a flowchart of a target detection network compression method for an edge device according to an embodiment of the present invention.
Fig. 2 is a diagram of the optimized Skynet network structure according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a saturated truncated scaling quantization according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a merged convolutional layer and a normalization layer according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of the FETCH operation provided by the embodiment of the present invention.
Fig. 6 is a schematic diagram of a calculation process of each layer before fixed-point processing according to an embodiment of the present invention.
Fig. 7 is a schematic diagram of a calculation process of each layer after the fixed point processing provided by the embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In view of the problems in the prior art, the present invention provides a target detection network compression method for an edge device, and the following describes the present invention in detail with reference to the accompanying drawings.
The target detection network compression method for the edge device provided by the embodiment of the invention comprises the following steps:
optimizing the SkyNet network structure, and quantizing the feature maps and weight parameters; and reconstructing the forward-inference structure and merging part of the calculation processing in the depthwise separable convolutions.
As shown in fig. 1, the method for compressing an object detection network for an edge device according to an embodiment of the present invention includes the following steps:
S101, performing optimized pruning of the SkyNet network by removing the bypass branch structure and deleting part of the channels output by the first layer, to obtain the optimized SkyNet network;
S102, compressing the SkyNet network by quantizing the weights to 7 bits and the feature maps to 8 bits, fusing the bias parameters and scaling coefficients and fixed-pointing them to 32 bits, and then retraining;
S103, merging the convolution layers and normalization layers, merging the activation, quantization, inverse-quantization, and saturation-truncation processing into a FETCH operation, and merging the SkyNet network structure.
The optimized pruning of the SkyNet network provided by the embodiment of the invention comprises the following steps:
first, the SkyNet branches are pruned, taking each depthwise separable convolution as a minimum unit layer, and the output of the first layer is pruned to 32 channels;
secondly, pooling is added after the sixth layer and the last layer is changed to a depthwise separable convolution, optimizing the whole network structure into a straight-through (sequential) form.
The optimized SkyNet network provided by the embodiment of the invention comprises the following components:
a 3-channel input layer CHL3; intermediate layers CHL32, CHL96, CHL192, CHL384, CHL512, and CHL96; and a regression layer CHL30;
the convolution between each pair of layers of the optimized SkyNet network uses depthwise separable convolution.
The SkyNet network compression method after retraining provided by the embodiment of the invention comprises the following steps:
(1) selecting the maximum value in the convolution kernels corresponding to each output channel as the quantization maximum, and performing weight quantization in maximum-value mode:
q_w = w × scale_w;
wherein w denotes the vector of original weights corresponding to each channel; q_w denotes the vector of quantized weights; and scale_w denotes the scaling coefficient, a scalar;
(2) selecting a threshold by means of the KL relative entropy, and quantizing the feature map with saturated quantization: a threshold T is selected, values of the original distribution within ±T are scaled proportionally to the range −127 to +127, and values outside the range are saturated, the saturation value being taken directly to represent them.
The method for selecting the threshold value by utilizing the KL relative entropy and quantizing the characteristic diagram by adopting the saturation quantization comprises the following steps:
1) selecting the threshold by means of the KL relative entropy:
DKL(p||q) = H(p, q) − H(p);
wherein p denotes the original distribution before quantization, q denotes the distribution after quantization with threshold T, H(p) denotes the information entropy of the original distribution, H(p, q) denotes the cross entropy of the original and quantized distributions, and DKL(p||q) denotes the KL relative entropy;
2) calculating the scaling coefficient scale_fm:
scale_fm = 127 / T;
3) fixed-pointing the bias and scaling coefficients: the floating-point units appearing in forward inference are merged, amplified, and rounded, and the final coefficients are stored as 32-bit integers;
4) calculating the inverse quantization coefficients after merging, amplification, and rounding:
Scale_merge = int(scale_next_fm / (scale_w × scale_fm) × shift_coe);
Bias_merge = int(bias × scale_next_fm × shift_coe);
wherein scale_w denotes the weight quantization coefficient; scale_fm denotes the quantization coefficient of the preceding layer's feature map; bias denotes the bias; scale_next_fm denotes the next layer's quantization coefficient; Scale_merge denotes the merged, amplified, and rounded inverse quantization coefficient; Bias_merge denotes the merged bias coefficient; and shift_coe denotes the amplification factor.
The SkyNet network structure merging method provided by the embodiment of the invention comprises the following steps:
(1) merging the convolution layer and the normalization layer, the merged output y3 being:
y3 = Wx + B;
wherein:
W = γw / √(σ² + ε), B = γ(b − μ) / √(σ² + ε) + β;
wherein y1 = wx + b denotes the convolution output; x denotes the input; w the weight; b the bias; x, w, and b are vectors; μ denotes the mean; σ the standard deviation; γ the scaling coefficient; β the scaling offset; ε = 1e-6; W denotes the post-fusion weight; and B the post-fusion bias;
(2) merging the activation, quantization, inverse-quantization, and saturation-truncation processing into a FETCH operation; the FETCH operation converts 32 bits to 8 bits and performs ReLU activation and saturation truncation.
The ReLU activation and saturation truncation provided by the embodiment of the invention comprise:
testing the sign of the input data: if positive, saturation truncation is performed; if negative, the activated value is set to 0.
The technical solution of the present invention is further described with reference to the following specific embodiments.
Example 1:
the target detection network compression method for the edge device provided by the embodiment of the invention comprises the following steps:
(1) Optimized pruning
The SkyNet branches are pruned, taking each depthwise separable convolution as a minimum unit layer, and the output of the first layer is pruned to 32 channels; pooling is then added after the sixth layer, and finally the last layer is changed to a depthwise separable convolution, so that the whole network structure is optimized into a straight-through (sequential) form.
As shown in fig. 2, the optimized SkyNet comprises 8 layers in total: a 3-channel input layer (CHL3), intermediate layers (CHL32, CHL96, CHL192, CHL384, CHL512, CHL96), and a regression layer (CHL30). The convolution between each pair of layers is realized with depthwise separable convolution (DSC); the network structure is regular, which facilitates module reuse and yields an efficient computation structure.
The input image size used in the present invention is 160 × 160, and the image size of each layer is shown in table 1:
TABLE 1 detailed dimensions of the layers
(2) Compressing a model
The quantization consists of three parts: quantization of the weights, quantization of the feature maps, and fixed-pointing of the bias and scaling coefficients.
The weights are quantized in maximum-value mode, the maximum being chosen as the largest value in the convolution kernels corresponding to each output channel.
Let the original weights corresponding to each channel be the vector w, the quantized weights the vector q_w, and the scaling coefficient the scalar scale_w; their relationship is given by equations (1) and (2). For convenience of operation the boundary is taken as 63, so the resulting q_w is distributed between −63 and 63:
scale_w = 63 / max(|w|) (1)
q_w = w × scale_w (2)
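The maximum-value weight quantization above can be sketched as follows (a sketch assuming scale_w = 63 / max|w|, consistent with the ±63 boundary stated in the text; the channel weights are illustrative):

```python
import numpy as np

def quantize_weights(w, bound=63):
    # Maximum-value quantization for one output channel: scale so that the
    # largest magnitude maps to the boundary, then round to integers.
    scale_w = bound / np.max(np.abs(w))           # assumed form of equation (1)
    q_w = np.round(w * scale_w).astype(np.int32)  # equation (2)
    return q_w, scale_w

w = np.array([0.25, -0.5, 0.125])  # illustrative channel weights
q_w, scale_w = quantize_weights(w)
print(q_w.tolist(), scale_w)  # [32, -63, 16] 126.0
```

At inference time, the integer q_w stands in for w, and the scalar scale_w is folded into the layer's dequantization coefficient.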
The feature maps are quantized with saturated quantization, and the threshold is selected by means of the KL relative entropy, which markedly reduces the precision loss.
Saturated quantization is shown in fig. 3: a threshold T is selected, values of the original distribution within ±T are scaled proportionally to the range −127 to +127, and the values shown in red in the figure, which fall outside the range, are saturated, the saturation value being taken directly to represent them.
Let the original distribution before quantization be p, the distribution after quantization with threshold T be q, the information entropy of the original distribution be H(p), the cross entropy of the original and quantized distributions be H(p, q), and the KL relative entropy be DKL(p||q); then:
DKL(p||q) = H(p, q) − H(p) (3)
DKL(p||q) = Σ p(i) × log(p(i) / q(i)) (4)
The threshold T minimizing DKL(p||q) then gives the scaling coefficient scale_fm, where:
scale_fm = 127 / T (5)
and (4) performing fixed-point processing on the bias and scaling coefficients, merging floating point number units appearing in forward reasoning, amplifying and rounding, and storing the coefficients by adopting 32-bit integer numbers. And setting the weight quantization coefficient as scale _ w, the current layer feature map quantization coefficient as scale _ fm, the bias as bias, and the next layer quantization coefficient as scale _ next _ fm.
Setting the inverse quantization coefficient after merging, amplification and rounding as Scale _ merge, the Bias coefficient as Bias _ merge, and the amplification factor as shift _ coe, including:
Bias_merge=int(bias×scale_next_fm×shift_coe) (7)
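The merged-coefficient computation above can be sketched as follows (shift_coe taken as 2^16 for illustration and the coefficient values made up; the Scale_merge formula is the standard requantization form, assumed here from the surrounding definitions):

```python
def merge_coefficients(scale_w, scale_fm, bias, scale_next_fm, shift_coe=2**16):
    # Fold the forward pass's floating-point rescaling into two 32-bit
    # integers: an inverse-quantization scale and a bias term.
    scale_merge = int(scale_next_fm / (scale_w * scale_fm) * shift_coe)
    bias_merge = int(bias * scale_next_fm * shift_coe)
    return scale_merge, bias_merge

# Illustrative coefficients; a real network derives them from calibration.
scale_merge, bias_merge = merge_coefficients(
    scale_w=126.0, scale_fm=50.8, bias=0.1, scale_next_fm=42.3)
print(scale_merge, bias_merge)
```

With both coefficients stored as integers, the per-layer rescaling in forward inference needs only integer multiplies followed by a shift by shift_coe.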
(3) merging network structures
The convolution layer and the normalization layer are merged. The depthwise separable convolution contains normalization layers, which during training accelerate network convergence, control overfitting, and mitigate gradient vanishing and gradient explosion. Once model training is complete, all parameters are fixed; at this point the convolution-layer parameters and normalization-layer parameters in the network can be merged, as shown in fig. 4, which effectively simplifies the network structure, reduces the amount of calculation, and improves computational efficiency.
Let the convolution output be y1, the input x, the weight w, and the bias b, with x, w, and b vectors; let the normalization-layer output be y2, the mean μ, the standard deviation σ, the scaling coefficient γ, and the scaling offset β, and let ε = 1e-6. The convolution calculation formula is:
y1 = wx + b (9)
The normalization-layer calculation formula is:
y2 = γ × (y1 − μ) / √(σ² + ε) + β (10)
Substituting equation (9) into equation (10), and letting W be the post-fusion weight and B the post-fusion bias, the combined output y3 is:
y3 = Wx + B (11)
wherein:
W = γw / √(σ² + ε) (12)
B = γ(b − μ) / √(σ² + ε) + β (13)
and after fusion, normalization layer calculation is not needed any more, so that the size of the model is reduced, the calculation resource is saved, and the performance is improved for forward reasoning.
The activation, quantization, inverse-quantization, and saturation-truncation operations are merged into a single FETCH operation, as shown in fig. 5.
The FETCH operation performs the 32-bit to 8-bit conversion, essentially completing both the ReLU activation and the saturation truncation. After calculation with fixed-point data, the result must be scaled back by 1/shift_coe; for edge devices, since shift_coe is a power of 2, this is simply a bit shift. The sign of the input data is tested along the way: if positive, saturation truncation is performed; if negative, the activated value is 0.
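The FETCH behavior described above can be sketched as follows (shift_coe assumed to be 2^16 for illustration; the saturation bound 127 matches the 8-bit feature-map range):

```python
def fetch(acc32, shift=16, q_max=127):
    # FETCH: ReLU, then rescale by 1/shift_coe via a bit shift (shift_coe is
    # a power of two), then saturation-truncate into the 8-bit range.
    if acc32 <= 0:
        return 0              # negative input: activated value is 0
    val = acc32 >> shift      # divide by shift_coe with a right shift
    return min(val, q_max)    # saturate at 127

print(fetch(-5), fetch(300 << 16), fetch(42 << 16))  # 0 127 42
```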
Fig. 6 shows the inference steps before the fixed-point processing and network-structure merging, and fig. 7 shows the inference steps after optimization.
Example 2:
(1) Optimized pruning and retraining
After the SkyNet network structure has been optimized into the straight-through form, the model is retrained. Table 2 compares the accuracy before and after pruning of SkyNet; with average precision (AP, as shown in formula 23) as the evaluation index, the drop in precision is less than 0.03, so the efficiency of real-time calculation of SkyNet on the edge device is greatly improved while the practical application requirements are still met.
TABLE 2 comparison of accuracy before and after Skynet pruning
Model type | Average precision
Complete model | 0.797
Model after pruning | 0.770
(2) Compressing models and merging network structures
The pruned and retrained model is quantized, and the activation, quantization, inverse-quantization, and saturation-truncation operations are merged; compared with the uncompressed model, the precision drops by only 2.34%, while the size of the network model is reduced by 74.5%.
A comparison of model size, average precision, and precision loss before and after model compression optimization is shown in Table 3.
 | Before compression | After compression
Data type | 32-bit floating point | 7-bit/8-bit/32-bit fixed point
Model size (MB) | 1.41 | 0.359
Average precision (AP) | 0.770 | 0.752
Precision loss | --- | 2.34%
Compression ratio | --- | 74.5%
The above description is only a specific embodiment of the present invention and is not intended to limit the scope of protection, which is defined by the appended claims; any modifications, equivalents, and improvements made within the spirit and scope of the invention are intended to be covered by the claims.
Claims (8)
1. An object detection network compression method for an edge device, the object detection network compression method for an edge device comprising:
optimizing the SkyNet network structure, and quantizing the feature maps and weight parameters; and reconstructing the forward-inference structure and merging part of the calculation processing in the depthwise separable convolutions.
2. The object detection network compression method for edge devices of claim 1, wherein the object detection network compression method for edge devices comprises the steps of:
step one, removing the bypass branch structure in the SkyNet network and deleting part of the channels output by the first layer, so as to perform optimized pruning of the SkyNet network and obtain the optimized SkyNet network;
step two, compressing the SkyNet network by quantizing the weights to 7 bits and the feature maps to 8 bits, fusing the bias parameters and scaling coefficients and fixed-pointing them to 32 bits, and then retraining;
and step three, merging the convolution layers and normalization layers, merging the activation, quantization, inverse-quantization, and saturation-truncation processing into a FETCH operation, and thereby merging the SkyNet network structure.
3. The target detection network compression method for an edge device of claim 2, wherein the performing optimized pruning of the SkyNet network comprises the steps of:
first, pruning the SkyNet bypass branches; taking each depthwise separable convolution as the minimum unit layer, pruning the output of the first layer to 32 channels;
secondly, adding pooling processing after the sixth layer, modifying the last layer into a depthwise separable convolution, and optimizing the whole network into a straight-through (single-branch) structure.
4. The target detection network compression method for an edge device of claim 2, wherein the optimized SkyNet network comprises:
a 3-channel input layer CHL3; intermediate layers CHL32, CHL96, CHL192, CHL384, CHL512 and CHL96; and a regression layer CHL30;
the convolution between each layer of the optimized SkyNet network uses depthwise separable convolution.
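The depthwise separable convolution that connects the layers above can be sketched as a per-channel spatial filter followed by a 1×1 channel-mixing step. The following is a minimal numpy illustration (function name, 3×3 kernel size, stride 1 and "same" zero padding are assumptions for the sketch, not part of the claims):

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_kernels):
    """Depthwise 3x3 convolution followed by a pointwise 1x1 convolution.

    x          : (C_in, H, W) input feature map
    dw_kernels : (C_in, 3, 3) one 3x3 kernel per input channel
    pw_kernels : (C_out, C_in) 1x1 kernels mixing channels
    Returns (C_out, H, W); 'same' zero padding, stride 1.
    """
    c_in, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    dw_out = np.zeros_like(x, dtype=np.float64)
    for c in range(c_in):                 # depthwise: each channel filtered alone
        for i in range(h):
            for j in range(w):
                dw_out[c, i, j] = np.sum(padded[c, i:i+3, j:j+3] * dw_kernels[c])
    # pointwise: a 1x1 convolution is a matrix product over the channel axis
    return np.tensordot(pw_kernels, dw_out, axes=([1], [0]))
```

Using the depthwise convolution as the minimum unit layer, pruning reduces to dropping rows of `pw_kernels` (output channels), which is what cutting the first layer to 32 channels amounts to.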
5. The method of object detection network compression for an edge device of claim 2, wherein the compression of the retrained SkyNet network comprises the steps of:
(1) selecting the maximum value in the convolution kernel corresponding to each output channel as the maximum value of quantization, and performing weight quantization by adopting a maximum value quantization mode:
q_w=w×scale_w;
wherein w represents the vector of original weights corresponding to each channel; q_w represents the vector of quantized weights; scale_w represents the scaling factor, a scalar;
(2) selecting a threshold value using the KL relative entropy and quantizing the feature map with a saturating quantization method: selecting a threshold T, scaling the values of the original distribution within the range ±T proportionally into −127 to +127, saturating the part outside that range, and representing the out-of-range part directly by the saturation value.
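The two quantization steps of claim 5 can be sketched as follows. This is a minimal numpy illustration; the symmetric 7-bit range of ±63 for the weights and the function names are assumptions made for the sketch, not values stated in the claim:

```python
import numpy as np

def quantize_weights(w):
    """Per-output-channel max-value quantization, step (1):
    scale_w = 63 / max|w| for each channel (assumed symmetric 7-bit range)."""
    scale_w = 63.0 / np.max(np.abs(w), axis=(1, 2, 3))   # one scalar per channel
    q_w = np.round(w * scale_w[:, None, None, None]).astype(np.int8)
    return q_w, scale_w

def quantize_feature_map(fm, T):
    """Saturating 8-bit feature-map quantization, step (2): values inside +/-T
    are scaled into [-127, 127]; values outside are clipped to the saturation
    value."""
    scale_fm = 127.0 / T
    q = np.round(fm * scale_fm)
    return np.clip(q, -127, 127).astype(np.int8), scale_fm
```

For a weight channel whose largest magnitude is 2.0, the scale is 31.5 and the maximum weight maps to 63; a feature-map value beyond the threshold T maps directly to the saturation value ±127.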
6. The method according to claim 5, wherein the selecting the threshold value using the KL relative entropy and performing the feature map quantization using saturation quantization comprises:
1) selecting a threshold value using the KL relative entropy:
D_KL(p||q) = H(p, q) − H(p);
wherein p represents the original distribution before quantization, q represents the distribution after quantization with threshold T, H(p) represents the information entropy of the original distribution, H(p, q) represents the cross entropy of the original and quantized distributions, and D_KL(p||q) represents the KL relative entropy;
2) calculating the scaling factor scale_fm:
scale_fm = 127/T;
3) performing fixed-point processing on the bias and scaling coefficients: merging the floating-point factors appearing in forward inference, amplifying and rounding them, and storing the final coefficients as 32-bit integers;
4) calculating the dequantization coefficients after merging, amplification and rounding:
Scale_merge = int(scale_next_fm/(scale_w×scale_fm)×shift_coe);
Bias_merge = int(bias×scale_next_fm×shift_coe);
wherein scale_w represents the weight quantization coefficient; scale_fm represents the quantization coefficient of the preceding layer's feature map; bias represents the bias; scale_next_fm represents the quantization coefficient of the next layer; Scale_merge represents the dequantization coefficient after merging, amplification and rounding; Bias_merge represents the merged bias coefficient; shift_coe represents the amplification factor.
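The KL-based threshold search of claim 6, step 1), can be illustrated with a small numpy sketch. The bin counts (2048 histogram bins, 128 quantization levels) and function names are assumptions in the spirit of common entropy-calibration practice, not values taken from the claim:

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p||q) = H(p, q) - H(p) = sum p*log(p/q), over bins where p, q > 0."""
    mask = (p > 0) & (q > 0)
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def select_threshold(activations, bins=2048, num_quant_bins=128):
    """Scan candidate thresholds T; keep the one whose saturated, coarsely
    quantized distribution q is closest in KL divergence to the original p."""
    hist, edges = np.histogram(np.abs(activations), bins=bins)
    best_T, best_kl = edges[-1], np.inf
    for i in range(num_quant_bins, bins + 1):
        p = hist[:i].astype(np.float64)
        p[i - 1] += hist[i:].sum()          # saturate the tail into the last bin
        # simulate quantization: merge i fine bins down to num_quant_bins levels
        q = np.zeros(i)
        chunk = i / num_quant_bins
        for j in range(num_quant_bins):
            lo, hi = int(j * chunk), int((j + 1) * chunk)
            q[lo:hi] = p[lo:hi].sum() / (hi - lo)
        p /= p.sum()
        q /= q.sum()
        kl = kl_divergence(p, q)
        if kl < best_kl:
            best_kl, best_T = kl, edges[i]
    return best_T
```

Once T is fixed, scale_fm = 127/T, and the merged coefficients of step 4) are computed once offline so that the forward pass needs only integer multiplies and shifts.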
7. The method of object detection network compression for an edge device of claim 2, wherein said merging the SkyNet network structure comprises:
(1) merging the convolution layer and the normalization layer to obtain the merged output y3:
y1 = w×x + b;
y3 = W×x + B;
wherein:
W = γ×w/√(σ² + ε);
B = γ×(b − μ)/√(σ² + ε) + β;
wherein y1 represents the convolution output; x represents the input; w represents the weight; b represents the bias; x, w and b are vectors; μ represents the mean; σ represents the standard deviation; γ represents the scaling coefficient; β represents the scaling offset; ε = 1e-6; W represents the fused weight; B represents the fused bias;
(2) merging the activation, quantization, inverse quantization and saturation truncation into a FETCH operation; the FETCH operation converts 32-bit values into 8-bit values while performing ReLU activation and saturation truncation, which comprises:
judging the sign of the input data: if positive, performing saturation truncation; if negative, setting the activated value to 0.
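The two parts of claim 7 can be sketched together in numpy: folding the normalization layer into the convolution weights, and a FETCH step that takes a 32-bit accumulator down to 8 bits. The fixed-point arithmetic in `fetch` is an assumption about one plausible form of the merged coefficients; the function names are illustrative:

```python
import numpy as np

def fold_bn(w, b, gamma, beta, mu, sigma, eps=1e-6):
    """Fold batch normalization into the preceding convolution, per part (1):
    W = gamma*w/sqrt(sigma^2 + eps),  B = gamma*(b - mu)/sqrt(...) + beta."""
    std = np.sqrt(sigma ** 2 + eps)
    return gamma * w / std, gamma * (b - mu) / std + beta

def fetch(acc32, scale_merge, bias_merge, shift_coe):
    """FETCH step of part (2): merged rescale, ReLU activation and saturation
    truncation, converting a 32-bit accumulator to an 8-bit value."""
    acc32 = np.asarray(acc32, dtype=np.int64)
    out = (acc32 * scale_merge + bias_merge) // shift_coe  # merged requantize
    out = np.maximum(out, 0)                               # ReLU: negative -> 0
    return np.minimum(out, 127).astype(np.int8)            # saturation truncation
```

Because the rescale, activation and truncation are fused into one pass over the accumulator, no intermediate floating-point tensor is materialized at inference time.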
8. An object detection network compression system for an edge device, implementing the object detection network compression method for the edge device according to any one of claims 1 to 7, wherein the object detection network compression system for the edge device comprises:
the network pruning module, used for performing optimized pruning of the SkyNet network by removing the bypass branch structure in the SkyNet network and deleting part of the channels output by the first layer, to obtain the optimized SkyNet network;
the network compression module, used for compressing the SkyNet network by quantizing the weights to 7 bits, quantizing the feature maps to 8 bits, fusing the bias parameters and scaling coefficients, fixing them to 32-bit fixed point, and retraining;
and the network structure merging module, used for merging the convolution layer and the normalization layer, merging the activation, quantization, inverse quantization and saturation truncation processing into a FETCH operation, and merging the SkyNet network structure.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210038592.0A CN114372565B (en) | 2022-01-13 | 2022-01-13 | Target detection network compression method for edge equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114372565A true CN114372565A (en) | 2022-04-19 |
CN114372565B CN114372565B (en) | 2024-10-15 |
Family
ID=81144914
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210038592.0A Active CN114372565B (en) | 2022-01-13 | 2022-01-13 | Target detection network compression method for edge equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114372565B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113505774A (en) * | 2021-07-14 | 2021-10-15 | 青岛全掌柜科技有限公司 | Novel policy identification model size compression method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111612147A (en) * | 2020-06-30 | 2020-09-01 | 上海富瀚微电子股份有限公司 | Quantization method of deep convolutional network |
US20210035331A1 (en) * | 2019-07-31 | 2021-02-04 | Hewlett Packard Enterprise Development Lp | Deep neural network color space optimization |
CN112488070A (en) * | 2020-12-21 | 2021-03-12 | 上海交通大学 | Neural network compression method for remote sensing image target detection |
CN113807330A (en) * | 2021-11-19 | 2021-12-17 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | Three-dimensional sight estimation method and device for resource-constrained scene |
Also Published As
Publication number | Publication date |
---|---|
CN114372565B (en) | 2024-10-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108764471B (en) | Neural network cross-layer pruning method based on feature redundancy analysis | |
CN110378468B (en) | Neural network accelerator based on structured pruning and low bit quantization | |
CN113159173B (en) | Convolutional neural network model compression method combining pruning and knowledge distillation | |
CN111079781B (en) | Lightweight convolutional neural network image recognition method based on low rank and sparse decomposition | |
CN109002889B (en) | Adaptive iterative convolution neural network model compression method | |
CN112163628A (en) | Method for improving target real-time identification network structure suitable for embedded equipment | |
CN112329922A (en) | Neural network model compression method and system based on mass spectrum data set | |
CN113222138A (en) | Convolutional neural network compression method combining layer pruning and channel pruning | |
CN110533022B (en) | Target detection method, system, device and storage medium | |
CN113011570A (en) | Adaptive high-precision compression method and system of convolutional neural network model | |
CN110781912A (en) | Image classification method based on channel expansion inverse convolution neural network | |
CN113111889B (en) | Target detection network processing method for edge computing end | |
CN111696149A (en) | Quantization method for stereo matching algorithm based on CNN | |
CN110751265A (en) | Lightweight neural network construction method and system and electronic equipment | |
CN110503135A (en) | Deep learning model compression method and system for the identification of power equipment edge side | |
CN113595993A (en) | Vehicle-mounted sensing equipment joint learning method for model structure optimization under edge calculation | |
CN113610192A (en) | Neural network lightweight method and system based on continuous pruning | |
CN117333497A (en) | Mask supervision strategy-based three-dimensional medical image segmentation method for efficient modeling | |
CN112597919A (en) | Real-time medicine box detection method based on YOLOv3 pruning network and embedded development board | |
Qi et al. | Learning low resource consumption cnn through pruning and quantization | |
CN114372565A (en) | Target detection network compression method for edge device | |
CN112488291B (en) | 8-Bit quantization compression method for neural network | |
CN112561054B (en) | Neural network filter pruning method based on batch characteristic heat map | |
CN112613604A (en) | Neural network quantification method and device | |
CN117151178A (en) | FPGA-oriented CNN customized network quantification acceleration method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||