CN110458841A - A method to improve the running rate of image segmentation - Google Patents
- Publication number
- CN110458841A (application CN201910535642.4A)
- Authority
- CN
- China
- Prior art keywords
- convolution
- size
- image
- convolution kernel
- kernel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
Abstract
A method for improving the running rate of image segmentation, comprising: step 1, designing multi-scale dilated convolution kernels; step 2, designing a channel convolution network; step 3, designing a fully convolutional connection and a deconvolution network. Through the deconvolution and fully convolutional operations, the network can accept images of arbitrary size, perform semantic analysis on every pixel of the image, achieve fast image segmentation, and locate image features quickly and accurately.
Description
Technical Field
The present invention relates to a method for changing the rate of image segmentation.
Technical Background
In recent years, with the rapid development of computer science and technology, computer-based image processing and image object detection have advanced at an unprecedented pace. Deep learning, which learns from massive collections of digital images and extracts key object features, has surpassed human performance in object detection and brought one surprise after another to industry. With the resurgence of neural networks, video-image methods based on convolutional neural networks have become the mainstream technology for image segmentation and recognition, using template matching, edge feature extraction, gradient histograms, and other techniques to achieve accurate image recognition. Although neural-network-based image feature detection can effectively recognize objects in complex scenes, far outperforming traditional methods, it still has shortcomings: (1) weak robustness to noise; (2) the Dropout method solves the overfitting problem and improves the convolutional neural network model and its parameters, but accuracy drops slightly; (3) deformable and separable convolution structures improve model generalization and strengthen feature extraction, but perform poorly on object recognition in complex scenes; (4) a newer image segmentation approach, end-to-end prediction, directly predicts per-pixel classification and achieves pixel-level localization of target objects, but the model suffers from a large number of parameters, low efficiency, and coarse segmentation. In short, both traditional detection methods and video-image methods suffer from cumbersome operation, low recognition accuracy, slow recognition, and coarse segmentation.
Summary of the Invention
In order to overcome the above shortcomings of the prior art, the present invention provides, for the sample problem, a fully convolutional method for improving the running rate of image segmentation. The invention adopts a deep learning framework and optimizes and improves the convolutional neural network: channel convolution is used to reduce the number of model parameters, and multi-scale dilated convolution is used to enrich image features, solving the small-receptive-field problem of traditional networks.
To achieve the above object, the present invention adopts the following technical solution:
A method for improving the running rate of image segmentation, comprising the following steps:
Step 1: design multi-scale dilated convolution kernels.
To overcome the difficulty of enlarging the receptive field with traditional convolution and max pooling, the present invention adopts dilated (atrous) convolution kernels: the sampling rate (rate) of a traditional kernel is increased, spreading the original kernel out so that it becomes "fluffy".
This enlarges the receptive field while keeping the original amount of computation, so that the information used for image segmentation is sufficiently precise. The receptive-field size based on the dilated convolution kernel is then computed as
where F is the receptive-field size of the current layer, and rate is the sampling rate of the dilated kernel, i.e. the number of spacings; the rate of a traditional convolution kernel can be regarded as 1, while the sampling rate of a dilated convolution is regarded as 2 (or more). The traditional formula for the convolutional receptive field is
where F_{i-1} is the receptive-field size of the previous layer, k_i is the convolution or pooling kernel size of the i-th layer, n is the total number of convolution layers, and s_i is the stride of the i-th layer's convolution kernel.
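The formula images from the original filing are not preserved in this text. Under the symbol definitions above, the standard receptive-field formulas take the following form (a hedged reconstruction consistent with the surrounding definitions, not a verbatim copy of the patent's equations):

```latex
% Traditional convolution (rate = 1):
F_i = F_{i-1} + (k_i - 1)\prod_{j=1}^{i-1} s_j
% Dilated convolution with sampling rate "rate"
% (effective kernel size k_i + (k_i - 1)(\mathrm{rate} - 1)):
F_i = F_{i-1} + (k_i - 1)\,\mathrm{rate}\prod_{j=1}^{i-1} s_j
```

For rate = 1 the second formula reduces to the first, matching the statement that a traditional kernel can be regarded as having rate 1.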
Drawing on the idea of multi-scale image variation, the multi-scale dilated convolution diversifies the sampling rate and the kernel size, so that the feature extraction process can adapt to targets of different sizes. The multi-scale dilated convolution is computed as
where y[i] is the convolution sum at the i-th stride position, K is the convolution kernel, k is a parameter coordinate within the kernel (k ∈ K), w[k] is the kernel weight at k, and rate is the sampling rate, which may take the values 1, 2, or 3.
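As a minimal numerical sketch of this sum (not part of the patent; the function name and toy signal below are illustrative assumptions), the dilated 1-D convolution y[i] = Σ_{k∈K} x[i + rate·k]·w[k] can be implemented directly:

```python
import numpy as np

def dilated_conv1d(x, w, rate):
    """Valid 1-D dilated convolution: y[i] = sum_k x[i + rate*k] * w[k]."""
    K = len(w)
    span = rate * (K - 1) + 1                 # effective (spread-out) kernel extent
    n_out = len(x) - span + 1
    return np.array([sum(x[i + rate * k] * w[k] for k in range(K))
                     for i in range(n_out)])

x = np.arange(10, dtype=float)                # hypothetical 1-D input signal
w = np.array([1.0, 1.0, 1.0])                 # 3-tap kernel

y1 = dilated_conv1d(x, w, rate=1)             # ordinary convolution, extent 3
y2 = dilated_conv1d(x, w, rate=2)             # dilated: same 3 weights, extent 5
```

With the same three weights, rate = 2 covers a span of five input samples, which is exactly the "fluffy" enlargement of the receptive field described above, at no extra computation per output.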
Step 2: design the channel convolution network.
Since traditional convolution is a dimension-raising operation, channel convolution can be used from the start to achieve dimension reduction in feature convolution. First, the traditional convolution is split into a two-layer convolution, similar to the group operation in ResNet. Without affecting accuracy, this new structure shortens the computation time to about 1/8 of the original and reduces the number of parameters to about 1/9; it can readily be deployed on mobile devices for real-time target detection, and the model compression effect is pronounced.
For traditional convolution, suppose the number of input feature channels is M, the width and height of each convolution kernel are both D_k, and the number of kernels is N. Then each sliding position of the convolution involves N parameter groups of size M·D_k·D_k, with the sliding stride set to s. The size of the image after sliding is computed as
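Equation (5) appears as an image in the original and is not preserved here; a standard output-size formula consistent with the symbols h', w', pad, and s used below is (a reconstruction, not a verbatim copy):

```latex
h' = \left\lfloor \frac{h + 2\,\mathrm{pad} - D_k}{s} \right\rfloor + 1,
\qquad
w' = \left\lfloor \frac{w + 2\,\mathrm{pad} - D_k}{s} \right\rfloor + 1
```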
where h' and w' are the height and width after convolution, and pad is the padding of the borders. Since each point of the h'·w' output corresponds to N parameter groups of size M·D_k·D_k, the total operation count is N·M·D_k·D_k·h'·w' (6)
With the improved channel convolution, the convolution proceeds in two steps:
1) Convolve the M channels separately with a D_k·D_k·M convolution. Sliding with the same stride s, the size after convolution is h'×w', and the operation count of this step is D_k·D_k·M·h'·w' (7)
2) Apply 1·1·N convolution kernels for dimension-raising feature extraction. The feature maps from the previous step are convolved again with stride 1: each of the original M channel features is processed by N kernels, so the count of this step is M·N·h'·w'·1·1 (8)
Combining the two steps, the final operation count of channel convolution is D_k·D_k·M·h'·w' + M·N·h'·w' (9)
As above, the ratio of the improved channel convolution count to the traditional convolution count is (D_k·D_k·M·h'·w' + M·N·h'·w') / (N·M·D_k·D_k·h'·w') = 1/N + 1/D_k² (10)
From the analysis of formula (10), if a 3×3 convolution kernel is used, the channel convolution operation reduces the parameter count to roughly 1/9 of the original.
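The counts of Eqs. (6)–(9) and the ratio of Eq. (10) can be checked numerically; the shape values below are illustrative assumptions, not figures from the patent:

```python
# Illustrative shapes (assumptions): M input channels, N kernels of size Dk x Dk,
# output feature map of size h' x w'.
M, N, Dk = 64, 128, 3
h_out, w_out = 56, 56

standard  = N * M * Dk * Dk * h_out * w_out      # Eq. (6): traditional convolution
depthwise = Dk * Dk * M * h_out * w_out          # Eq. (7): per-channel convolution
pointwise = M * N * h_out * w_out                # Eq. (8): 1x1xN dimension raising
channel   = depthwise + pointwise                # Eq. (9): two-step channel conv

ratio = channel / standard                       # Eq. (10): equals 1/N + 1/Dk**2
```

For a 3×3 kernel the 1/D_k² term dominates, giving the roughly 1/9 reduction stated above.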
Step 3: design the fully convolutional connection and deconvolution network.
The final layers of a traditional network structure take a fixed size, so the input image must first be resized to fixed dimensions, which is unfavorable for obtaining, for example, the length coordinates of a logistics vehicle. Moreover, a traditional fully connected network loses a definite amount of spatial coordinate information, distorting the spatial information of the image and preventing precise localization of the target. To solve this information-loss problem, the present invention adopts a fully convolutional connection to precisely locate the position coordinates of features in the image.
The fully connected part of a traditional network converts the front convolutional feature map [b, c, h, w] into [b, c·h·w], e.g. [b, 4096], and then into [b, cls], where b is the batch size and cls is the number of classes. A fully convolutional network instead follows the convolutional part with 1×1 convolutions and has no fully connected layer; hence the name. The full convolution is computed as
where 1 ≤ n ≤ N; y_n[i][j] is the value at position (i, j) after convolution with the n-th kernel; s_i is the horizontal stride; s_j is the vertical stride; k_n is the n-th convolution kernel; D_k is the kernel width and height, corresponding to D_k·D_k in step 2; δ_i, δ_j are positions within the kernel, with 0 ≤ δ_i, δ_j ≤ D_k; the layer has N different types of kernels in total. The sliding convolution operation of the kernels can be converted into a multiplication of two matrices, so the convolution of the corresponding image pixels can be expressed as
where the left matrix has dimensions [N, M·D_k·D_k], the right matrix has dimensions [M·D_k·D_k, w'·h'], and the dimensions after convolution are [N, w'·h']. In the right matrix, I denotes the image (img), with subscripts for image width and image height in turn, i.e. I_wh.
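The claim that sliding convolution can be rewritten as a product of an [N, M·D_k·D_k] matrix and an [M·D_k·D_k, w'·h'] matrix can be sketched with an im2col-style unfolding (a hedged illustration; names and shapes are assumptions, not the patent's implementation):

```python
import numpy as np

def im2col(x, Dk, stride=1):
    """Unfold an [M, h, w] input into a matrix of shape [M*Dk*Dk, h'*w']."""
    M, h, w = x.shape
    h2 = (h - Dk) // stride + 1
    w2 = (w - Dk) // stride + 1
    cols = np.empty((M * Dk * Dk, h2 * w2))
    for i in range(h2):
        for j in range(w2):
            patch = x[:, i * stride:i * stride + Dk, j * stride:j * stride + Dk]
            cols[:, i * w2 + j] = patch.ravel()
    return cols, h2, w2

rng = np.random.default_rng(0)
M, N, Dk = 2, 4, 3
x = rng.standard_normal((M, 8, 8))
kernels = rng.standard_normal((N, M, Dk, Dk))

cols, h2, w2 = im2col(x, Dk)                 # right matrix: [M*Dk*Dk, w'*h']
W = kernels.reshape(N, M * Dk * Dk)          # left matrix:  [N, M*Dk*Dk]
y = (W @ cols).reshape(N, h2, w2)            # convolution as a matrix product

# Brute-force sliding convolution, for comparison with the matrix form.
direct = np.empty((N, h2, w2))
for n in range(N):
    for i in range(h2):
        for j in range(w2):
            direct[n, i, j] = np.sum(x[:, i:i + Dk, j:j + Dk] * kernels[n])
```

The matrix product and the brute-force sliding convolution produce identical results, which is why the [N, w'·h'] output described in the text can be computed as one matrix multiplication.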
Finally, a deconvolution operation converts [N, w'·h'] back to the size of the input image, so that the specific semantic information represented by each pixel can be identified precisely and spatial-information loss is avoided. The deconvolution operation is equivalent to the inverse of the convolution, namely
where the weights corresponding to the convolution kernels k_1, …, k_N change from their original values to weights adjusted through training, carrying the semantic-information features of the image.
Therefore, through the deconvolution and fully convolutional operations, the network can be applied to images of arbitrary size, perform semantic analysis on every pixel of the image, achieve fast image segmentation, and locate image features quickly and accurately.
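A minimal 1-D sketch of the deconvolution (transposed convolution) restoring the input length, assuming a naive scatter-add implementation rather than the patent's trained network:

```python
import numpy as np

def conv_transpose1d(y, w, stride=1):
    """Naive 1-D transposed convolution: scatter-add each input value
    through the kernel (the adjoint of valid convolution)."""
    w = np.asarray(w, dtype=float)
    K = len(w)
    out = np.zeros(stride * (len(y) - 1) + K)
    for i, v in enumerate(y):
        out[i * stride:i * stride + K] += v * w
    return out

x = np.arange(6.0)                                       # hypothetical input row
w = np.array([1.0, 2.0, 1.0])
y = np.array([np.dot(x[i:i + 3], w) for i in range(4)])  # valid conv: length 6 -> 4
up = conv_transpose1d(y, w)                              # transposed conv: 4 -> 6
```

A valid convolution maps length L to L − D_k + 1; the transposed operation maps it back to L, which is how [N, w'·h'] can be restored to the input image size for per-pixel semantics.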
The advantages of the present invention are:
For the sample problem, the present invention adopts a fully convolutional method to improve the running rate of image segmentation. Its most prominent feature is the lightweight treatment of the images: segmentation efficiency is improved while segmentation accuracy is preserved, and the number of model parameters is reduced by channel convolution. Multi-scale dilated convolution kernels are also set up, which reasonably and simply enlarge the receptive field of the model and enhance its generalization. The algorithm can be widely applied in image localization and recognition, for example vehicle recognition in logistics parks.
Description of Drawings
Fig. 1 is a schematic diagram of the convolution operation of an existing traditional convolution kernel;
Fig. 2 is a schematic diagram of the convolution operation of the improved dilated convolution kernel of the present invention;
Figs. 3a to 3c show the multi-scale dilated convolution kernels of the present invention: Fig. 3a is a dilated kernel with sampling rate 1, Fig. 3b with sampling rate 2, and Fig. 3c with sampling rate 3;
Fig. 4 is the existing convolution method;
Fig. 5 is the channel convolution method of the present invention;
Fig. 6 is the channel convolution structure of the present invention;
Fig. 7 is the fully convolutional network design structure of the present invention;
Fig. 8 is a schematic diagram of the full-convolution matrix calculation process of the present invention.
Note: in Fig. 6, DW is the channel convolution group, a fixed combination of channel convolution kernels; BN is the batch normalization operation, which addresses the shift of intermediate-layer data distributions during training; Conv is the convolution layer operation; ReLU is the rectified linear unit, an activation function.
Note: in Fig. 8, k_1, …, k_N are the N convolution kernels, and the associated weight symbol is the position weight of the n-th convolution kernel.
Detailed Description
In order to overcome the above shortcomings of the prior art, the present invention provides, for the sample problem, a fully convolutional image segmentation method. It adopts a deep learning framework and optimizes and improves the convolutional neural network: channel convolution is used to reduce the number of model parameters, and multi-scale dilated convolution is used to enrich image features, solving the small-receptive-field problem of traditional networks.
To achieve the above object, the present invention adopts the following technical solution:
The method comprises the three steps set out in the Summary of the Invention: Step 1, designing the multi-scale dilated convolution kernels; Step 2, designing the channel convolution network; and Step 3, designing the fully convolutional connection and deconvolution network. The design details and formulas of each step are as described above.
为了验证该发明的优越性,以物流园区车辆为实例,构建以下网络模型,进行对照实验:In order to verify the superiority of the invention, taking the logistics park vehicle as an example, the following network model is constructed and a control experiment is carried out:
First, network construction: images of four types of logistics vehicles (vans, tractor trucks, dump trucks, and tank trucks) were collected from a logistics park and divided into a training set of 8,000 images (2,000 per class) and a test set of 4,000 images (1,000 per class). The parameter configuration of the network model is listed in Table 1 below.
In Table 1: k is the convolution kernel size; s is the stride; p is the padding size; DW denotes a channel (depthwise) convolution group, i.e., a fixed combination of per-channel convolution kernels. Residual summation aids gradient propagation in large networks; the activation and batch normalization (BN) operations of each layer help speed up training; ReLU, the rectified linear unit, is the activation function.
Table 1 Parameter design of the network model structure
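The parameter saving from the DW channel-convolution groups in Table 1 can be checked with a small calculation. This sketch assumes the standard depthwise-separable factorization (a depthwise D_k×D_k stage followed by a pointwise 1×1 stage); the exact composition of the patent's DW groups is not restated here.

```python
def conv_params(M, N, Dk):
    """Parameter counts: standard vs. depthwise-separable (DW) convolution.

    Standard conv: N kernels spanning all M channels -> N * M * Dk * Dk.
    DW group:      M depthwise Dk x Dk kernels plus N pointwise 1x1
                   kernels over M channels          -> M * Dk * Dk + N * M.
    The ratio simplifies to 1/N + 1/Dk^2, i.e., a large reduction.
    """
    standard = N * M * Dk * Dk
    dw = M * Dk * Dk + N * M
    return standard, dw, dw / standard
```

With M = N = 256 and D_k = 3, the DW form needs roughly 11.5% of the standard parameters, consistent with the memory reduction reported later in Table 2.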
The computer used in this example was equipped with a Gigabyte NVIDIA GTX 1080 Ti graphics card with 11 GB of video memory at 1607 MHz.
Finally, the model test performance of this example's network was compared with that of the traditional network; the results are shown in Table 2.
Table 2 Performance comparison of lightweight segmentation models
In Table 2, the evaluation index MPA denotes mean pixel accuracy; MA denotes mean accuracy, the ratio of the foreground area to the label area; and MIoU denotes mean intersection over union, the ratio of the correctly predicted region to the union of the predicted area and the label area. The unit M·pic⁻¹ is the memory occupied when training one image, in megabytes (M); the unit ms·iter⁻¹ is the time required per iteration, in milliseconds (ms). With channel convolution, the video memory occupied is reduced by 51%, training speed is improved by 78%, and test speed by 79%; every segmentation and localization metric improves substantially, with MIoU showing the largest gain.
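The evaluation indices in Table 2 can be computed from a per-class confusion matrix. The following is a minimal NumPy sketch based on the standard definitions named above; the exact formulas used in the experiment are an assumption.

```python
import numpy as np

def seg_metrics(conf):
    """MPA and MIoU from a class confusion matrix conf[true, pred]."""
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)                            # correctly predicted pixels per class
    mpa = np.mean(tp / conf.sum(axis=1))          # mean pixel accuracy
    union = conf.sum(axis=1) + conf.sum(axis=0) - tp  # |pred ∪ label| per class
    miou = np.mean(tp / union)                    # mean intersection over union
    return mpa, miou
```

For a two-class matrix [[3, 1], [1, 3]] this gives MPA = 0.75 and MIoU = 0.6, matching the definitions of pixel accuracy and intersection over union.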
This example verifies that the improved method can indeed raise the performance of model testing, that is, the running rate of image segmentation.
The advantages of this scheme are:
For the sample problem, the invention adopts a fully convolutional method to raise the running rate of image segmentation. Its most prominent feature is the lightweight processing of the image: segmentation efficiency is improved while segmentation accuracy is preserved, and the number of model parameters is reduced through channel convolution. Multi-scale dilated (atrous) convolution kernels are also set up, which enlarge the receptive field of the model in a reasonable and simple way and strengthen its generalization. The algorithm can be widely applied in image localization and recognition, for example vehicle identification in logistics parks.
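How multi-scale dilated kernels enlarge the receptive field without adding parameters can be seen from the effective kernel extent. The helper below uses the standard dilation formula as an illustration; it is not taken from the patent text.

```python
def dilated_kernel_extent(Dk, d):
    """Effective spatial extent of a Dk x Dk kernel with dilation rate d.

    Inserting d - 1 zeros between adjacent kernel taps stretches the
    kernel to Dk + (Dk - 1) * (d - 1) pixels per side, enlarging the
    receptive field while the parameter count stays at Dk * Dk.
    """
    return Dk + (Dk - 1) * (d - 1)
```

For example, a 3×3 kernel covers 3, 5, and 9 pixels per side at dilation rates 1, 2, and 4, which is the "reasonable and simple" receptive-field enlargement described above.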
The content described in the embodiments of this specification merely enumerates realization forms of the inventive concept. The scope of protection of the present invention should not be regarded as limited to the specific forms stated in the embodiments; it also extends to equivalent technical means that a person skilled in the art could conceive on the basis of the inventive concept.
Claims (1)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910535642.4A CN110458841B (en) | 2019-06-20 | 2019-06-20 | Method for improving image segmentation running speed |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110458841A true CN110458841A (en) | 2019-11-15 |
CN110458841B CN110458841B (en) | 2021-06-08 |
Family
ID=68480779
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910535642.4A Active CN110458841B (en) | 2019-06-20 | 2019-06-20 | Method for improving image segmentation running speed |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110458841B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111626267A (en) * | 2019-09-17 | 2020-09-04 | Shandong University of Science and Technology | Hyperspectral remote sensing image classification method using dilated convolution |
CN111967401A (en) * | 2020-08-19 | 2020-11-20 | Shanghai Eye Control Technology Co., Ltd. | Target detection method, device and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107169974A (en) * | 2017-05-26 | 2017-09-15 | University of Science and Technology of China | An image segmentation method based on multi-supervised fully convolutional neural networks |
US20180260956A1 (en) * | 2017-03-10 | 2018-09-13 | TuSimple | System and method for semantic segmentation using hybrid dilated convolution (HDC) |
CN108776969A (en) * | 2018-05-24 | 2018-11-09 | Fudan University | Breast ultrasound image lesion segmentation method based on a fully convolutional network |
CN108830855A (en) * | 2018-04-02 | 2018-11-16 | South China University of Technology | A fully convolutional network semantic segmentation method based on multi-scale low-level feature fusion |
CN108921196A (en) * | 2018-06-01 | 2018-11-30 | Nanjing University of Posts and Telecommunications | A semantic segmentation method based on an improved fully convolutional neural network |
CN109410185A (en) * | 2018-10-10 | 2019-03-01 | Tencent Technology (Shenzhen) Co., Ltd. | An image segmentation method, device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||