CN109214505B - Full convolution target detection method of densely connected convolution neural network - Google Patents
- Publication number
- CN109214505B CN109214505B CN201810998184.3A CN201810998184A CN109214505B CN 109214505 B CN109214505 B CN 109214505B CN 201810998184 A CN201810998184 A CN 201810998184A CN 109214505 B CN109214505 B CN 109214505B
- Authority
- CN
- China
- Prior art keywords
- feature
- layer
- network
- neural network
- convolutional neural
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000001514 detection method Methods 0.000 title claims abstract description 15
- 238000013528 artificial neural network Methods 0.000 title claims abstract 6
- 238000000034 method Methods 0.000 claims abstract description 24
- 238000013507 mapping Methods 0.000 claims abstract 17
- 238000013527 convolutional neural network Methods 0.000 claims description 48
- 238000000605 extraction Methods 0.000 claims description 24
- 238000006243 chemical reaction Methods 0.000 claims description 18
- 238000011176 pooling Methods 0.000 claims description 12
- 238000012549 training Methods 0.000 claims description 10
- 230000004913 activation Effects 0.000 claims description 9
- 230000008569 process Effects 0.000 claims description 9
- 230000000007 visual effect Effects 0.000 claims description 6
- 238000012545 processing Methods 0.000 claims description 5
- 238000004364 calculation method Methods 0.000 claims description 3
- 230000009467 reduction Effects 0.000 claims description 3
- 230000000694 effects Effects 0.000 claims 1
- 238000005259 measurement Methods 0.000 claims 1
- 238000013473 artificial intelligence Methods 0.000 abstract description 2
- 230000007547 defect Effects 0.000 abstract 1
- 230000006870 function Effects 0.000 description 22
- 150000001875 compounds Chemical class 0.000 description 2
- 238000010276 construction Methods 0.000 description 2
- 230000007423 decrease Effects 0.000 description 2
- 230000004807 localization Effects 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 230000000717 retained effect Effects 0.000 description 2
- 238000005070 sampling Methods 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
Images
Classifications
- G06N 3/045 — Computing arrangements based on biological models › Neural networks › Architecture, e.g. interconnection topology › Combinations of networks
- G06N 3/084 — Computing arrangements based on biological models › Neural networks › Learning methods › Backpropagation, e.g. using gradient descent
- G06T 7/0002 — Image analysis › Inspection of images, e.g. flaw detection
- G06T 7/10 — Image analysis › Segmentation; edge detection
Abstract
Description
Technical Field
The present invention relates to the field of artificial intelligence, and more particularly to a fully convolutional object detection method based on a densely connected convolutional neural network.
Background
Convolutional neural networks detect features invariantly: after an object is translated or rotated, the network can still recognize it as the same object. However, for objects that occupy only a small area of the image, their information is lost while the network extracts features, so such targets cannot be detected accurately. Recent research has shown that "multi-scale" feature representations can effectively improve detection accuracy for objects of different scales. Image pyramids have been tried for multi-scale detection: an image is first rescaled to several sizes, and the images at each scale are fed into the convolutional neural network. This approach, however, demands so much computation and memory that it is not feasible.
Summary of the Invention
To overcome the inability of existing methods to detect multi-scale objects accurately, the present invention provides a fully convolutional object detection method based on a densely connected convolutional neural network.
To achieve the above object of the invention, the technical solution adopted is as follows:
A fully convolutional object detection method based on a densely connected convolutional neural network, comprising the following steps:
Step S1: build the feature extraction network DenseNet. The feature extraction network consists of multiple densely connected blocks and transition layers; the densely connected blocks can identify more discriminative visual features in the image. After the input image passes through the feature extraction network, the features output by each densely connected block, which have different semantics and different resolutions, are retained.
Step S2: build the feature pyramid network FPN. The per-layer features retained in step S1 are fed into the FPN and stacked by feature scale, forming a bottom-up, scale-increasing, low-semantic feature pyramid. Starting from the lowest layer, the features of each layer undergo a convolution along the "parallel path" to gain higher semantics; meanwhile the convolved features are upsampled to the scale of the layer above and merged with that layer's features, and the merged features continue upward until the top of the pyramid. This step is repeated until the complete feature pyramid has been built.
Step S3: build the fully convolutional predictor (FCP) network. The FCP is a predictor that outputs object bounding-box information and classification probabilities at the same time, and it makes predictions on the feature maps of all scales in the feature pyramid. The predictor passes an input feature map through a convolutional neural network and outputs a vector of size S*S*(B*5+C) as the prediction, which is equivalent to dividing the original image into S*S grid cells and predicting B bounding boxes for each cell. Each bounding box carries five values: the center-coordinate offsets (t_x, t_y), the width and height offsets (t_w, t_h), and the confidence t_0 of the predicted box; in addition, the probabilities of C object classes are predicted for each cell.
Step S4: train the whole network. Collect target images and feed them into the network. The parameters of every layer are initialized in the Xavier manner; loss gradients are computed by stochastic gradient descent on a loss function composed of bounding-box coordinate regression and object classification, and the parameters of all layers in the whole network are fine-tuned by backpropagation.
Preferably, the specific sub-steps of step S1 are as follows:
Step S101: adapt an existing trained densely connected convolutional neural network model to obtain a preliminary feature extraction network model;
Step S102: in implementation, the densely connected convolutional neural network is divided into multiple densely connected blocks, and different densely connected blocks are connected through transition layers;
Step S103: a densely connected block contains multiple convolutional layers, and the input of each convolutional layer is the superposition of the outputs of all preceding convolutional layers within the same block. Let the input of the l-th convolutional layer in a block be x_l and its output y_l; then x_l = (x_1 + y_1 + ... + y_{l-1}) and y_l = H(x_l), where H(.) is defined as the activation function;
Step S104: H(.) is the activation function that follows each convolutional layer. Here it is a composite operation: the input x_l first passes through a BN operation, then a ReLU function, and finally a convolutional layer, whose result is the output of the whole activation function;
Step S105: because different densely connected blocks have different spatial sizes, they are connected to each other through a transition layer. The transition layer takes the output of the preceding densely connected block as input, applies a BN operation, then a convolutional layer, and finally a pooling layer that adjusts the spatial size of the feature map to match the input of the next densely connected block; here the pooling layer is assumed to shrink the spatial size of the feature map to 1/n of the original;
Step S106: densely connected blocks and transition layers alternate several times, so that the spatial size of the feature map shrinks after every densely connected block while the number of channels of the feature map grows; let the feature map output by the last convolutional layer of each densely connected block be C_m;
Step S107: remove the global average pooling layer and the fully connected classification layer of the existing densely connected convolutional neural network, and take the feature map output by the last convolutional layer of the last densely connected block as the output of the feature extraction network.
Preferably, the specific sub-steps of step S2 are as follows:
Step S201: the FPN consists of a "bottom-up feature pyramid" and a "parallel path". The FPN first obtains from the feature extraction network the visual features of its layers, which carry different semantics at different scales, and then stacks them in the "bottom-up" structure to generate a feature pyramid of lower-semantic features;
Step S202: take the feature map output in step S107 as the first input of the FPN. A convolutional layer adjusts the number of channels of the input feature map to a constant d, and the channel-adjusted feature map serves as the lowest-layer feature map of the pyramid; the per-layer feature maps of the pyramid are denoted D, with this lowest layer denoted D_m;
Step S203: the main task of the "bottom-up path" in the FPN is to upsample the feature map of the pyramid layer below, with an upsampling factor n equal to the reciprocal of the reduction factor of the pooling layers in the feature extraction network; the resulting feature map has the same spatial size as the feature map output by the corresponding densely connected block in step S1;
Step S204: the "parallel path" in the FPN takes the feature map output by each densely connected block in step S1 as input, and then uses a convolutional layer to adjust the number of channels of the output feature map to d;
Step S205: steps S203 and S204 yield two feature maps identical in spatial size and number of channels. They are added element-wise, and the sum passes through a convolutional layer that reduces the aliasing introduced by upsampling, which yields the feature map of the next pyramid layer. Denoting the operations applied to the inputs in steps S203 and S204 as f(.) and g(.) respectively, D_m = g(C_m) and D_k = φ(f(D_{k+1}) + g(C_k)) with 0 < k < m, where φ denotes the convolution operation of this step;
Step S206: repeat steps S203, S204 and S205, so that the whole feature pyramid is built layer by layer upward from its lowest layer.
Preferably, the specific sub-steps of step S3 are as follows:
Step S301: step S2 produced a feature pyramid whose feature scale increases layer by layer from bottom to top while the number of channels of every layer stays the same, the spatial sizes of the feature maps of two adjacent layers differing by a scale factor of n. Build a predictor that outputs object bounding-box information and classification probabilities at the same time; the predictor acts on the features of every layer of the feature pyramid, so that the network can exploit feature maps of different scales;
Step S302: construction of the predictor that outputs object bounding-box information and classification probabilities. It takes the feature map of one pyramid layer as input and, after processing by two fully connected layers, outputs a vector of size S*S*(B*5+C) as the prediction, which is equivalent to dividing the original image into S*S grid cells and predicting B bounding boxes for each cell. Each bounding box carries five values: the center-coordinate offsets (t_x, t_y), the width and height offsets (t_w, t_h), and the confidence t_0 of the predicted box; in addition, the probabilities of C object classes are predicted for each cell;
Step S303: calculation of the coordinate values:
x = c_x + σ(t_x)
y = c_y + σ(t_y)
σ(t_0) = Pr(object) * IOU(b, object)
where x and y are the actual coordinates of the bounding-box center in the image, w and h are respectively the width and height of the bounding box, (c_x, c_y) are the coordinates of the top-left corner of the grid cell, and p_w and p_h are respectively the width and height of the input image.
Preferably, the specific sub-steps of step S4 are as follows:
Step S401, image collection: collect images containing various kinds of targets in daily life as training images; through processing, each image is accompanied by the bounding-box and class information of the targets it contains;
Step S402: build a cost function for each predicted quantity for training: one term for the center coordinates of the bounding boxes, one term for the widths and heights of the bounding boxes, and one term for the predicted classes, where λ_coord and λ_noobj make the cost function balance between the bounding-box costs and the probability costs, 1_i^obj indicates that a target appears in the i-th grid cell, and 1_ij^obj indicates that the j-th bounding box in the i-th grid cell is responsible for the predicted target; the sum of these terms gives the final cost function.
Step S403: feed the labeled data collected in step S401 into the network. The parameters of every layer are initialized in the Xavier manner; loss gradients are computed by stochastic gradient descent on the loss function composed of bounding-box coordinate regression and object classification, and the parameters of all layers in the whole network are fine-tuned by backpropagation, thereby training the network.
Preferably, in step S1, performing feature extraction with a network structure in which densely connected blocks and transition layers alternate allows more discriminative feature maps to be extracted from the image.
Preferably, the FPN network composed of the "bottom-up feature pyramid" and the "parallel path" can effectively exploit high-semantic low-scale and high-scale low-semantic feature maps to build a feature pyramid with high-semantic features, large scale, and strong localization information.
Compared with the prior art, the beneficial effects of the present invention are as follows:
The present invention provides a fully convolutional object detection method based on a densely connected convolutional neural network, characterized in that it can effectively use multi-scale feature maps for object detection, so that the convolutional neural network detects objects of different scales in the same image with high accuracy.
Description of Drawings
Figure 1 is a flow chart of the present invention.
Detailed Description
The accompanying drawings are for illustration only and shall not be construed as limiting this patent;
The present invention is further described below with reference to the accompanying drawings and embodiments.
Example 1
As shown in Figure 1, the present invention provides a fully convolutional object detection method based on a densely connected convolutional neural network, comprising the following steps:
Step S1: build the feature extraction network DenseNet. The feature extraction network consists of multiple densely connected blocks and transition layers; the densely connected blocks can identify more discriminative visual features in the image. After the input image passes through the feature extraction network, the features output by each densely connected block, which have different semantics and different resolutions, are retained.
Step S2: build the feature pyramid network FPN. The per-layer features retained in step S1 are fed into the FPN and stacked by feature scale, forming a bottom-up, scale-increasing, low-semantic feature pyramid. Starting from the lowest layer, the features of each layer undergo a convolution along the "parallel path" to gain higher semantics; meanwhile the convolved features are upsampled to the scale of the layer above and merged with that layer's features, and the merged features continue upward until the top of the pyramid. This step is repeated until the complete feature pyramid has been built.
Step S3: build the fully convolutional predictor (FCP) network. The FCP is a predictor that outputs object bounding-box information and classification probabilities at the same time, and it makes predictions on the feature maps of all scales in the feature pyramid. The predictor passes an input feature map through a convolutional neural network and outputs a vector of size S*S*(B*5+C) as the prediction, which is equivalent to dividing the original image into S*S grid cells and predicting B bounding boxes for each cell. Each bounding box carries five values: the center-coordinate offsets (t_x, t_y), the width and height offsets (t_w, t_h), and the confidence t_0 of the predicted box; in addition, the probabilities of C object classes are predicted for each cell.
Step S4: train the whole network. Collect target images and feed them into the network. The parameters of every layer are initialized in the Xavier manner; loss gradients are computed by stochastic gradient descent on a loss function composed of bounding-box coordinate regression and object classification, and the parameters of all layers in the whole network are fine-tuned by backpropagation.
Preferably, the specific sub-steps of step S1 are as follows:
Step S101: adapt an existing trained densely connected convolutional neural network model to obtain a preliminary feature extraction network model;
Step S102: in implementation, the densely connected convolutional neural network is divided into multiple densely connected blocks, and different densely connected blocks are connected through transition layers;
Step S103: a densely connected block contains multiple convolutional layers, and the input of each convolutional layer is the superposition of the outputs of all preceding convolutional layers within the same block. Let the input of the l-th convolutional layer in a block be x_l and its output y_l; then x_l = (x_1 + y_1 + ... + y_{l-1}) and y_l = H(x_l), where H(.) is defined as the activation function;
Step S104: H(.) is the activation function that follows each convolutional layer. Here it is a composite operation: the input x_l first passes through a BN operation, then a ReLU function, and finally a convolutional layer, whose result is the output of the whole activation function;
Step S105: because different densely connected blocks have different spatial sizes, they are connected to each other through a transition layer. The transition layer takes the output of the preceding densely connected block as input, applies a BN operation, then a convolutional layer, and finally a pooling layer that adjusts the spatial size of the feature map to match the input of the next densely connected block; here the pooling layer is assumed to shrink the spatial size of the feature map to 1/n of the original;
Step S106: densely connected blocks and transition layers alternate several times, so that the spatial size of the feature map shrinks after every densely connected block while the number of channels of the feature map grows; let the feature map output by the last convolutional layer of each densely connected block be C_m;
Step S107: remove the global average pooling layer and the fully connected classification layer of the existing densely connected convolutional neural network, and take the feature map output by the last convolutional layer of the last densely connected block as the output of the feature extraction network.
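The block arithmetic of steps S103–S107 can be made concrete with a short sketch. The following PyTorch code is a minimal illustration, not the patent's reference implementation: the layer count, growth rate, and pooling factor n = 2 are assumed values, and the "superposition" of earlier outputs is realized as channel concatenation, as in the original DenseNet.

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """One convolutional layer of a dense block, with the composite
    activation H(.) = BN -> ReLU -> Conv of step S104."""
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv = nn.Conv2d(in_channels, growth_rate,
                              kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        return self.conv(self.relu(self.bn(x)))

class DenseBlock(nn.Module):
    """Densely connected block (step S103): each layer takes the
    superposition of all earlier outputs in the block, realized here as
    channel concatenation as in the original DenseNet."""
    def __init__(self, num_layers, in_channels, growth_rate):
        super().__init__()
        self.layers = nn.ModuleList(
            DenseLayer(in_channels + i * growth_rate, growth_rate)
            for i in range(num_layers))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)

class TransitionLayer(nn.Module):
    """Transition layer (step S105): BN -> Conv -> pooling; the pooling
    shrinks the spatial size to 1/n of the original (n = 2 assumed)."""
    def __init__(self, in_channels, out_channels, n=2):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_channels)
        self.conv = nn.Conv2d(in_channels, out_channels,
                              kernel_size=1, bias=False)
        self.pool = nn.AvgPool2d(kernel_size=n, stride=n)

    def forward(self, x):
        return self.pool(self.conv(self.bn(x)))
```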
Preferably, the specific sub-steps of step S2 are as follows:
Step S201: the FPN consists of a "bottom-up feature pyramid" and a "parallel path". The FPN first obtains from the feature extraction network the visual features of its layers, which carry different semantics at different scales, and then stacks them in the "bottom-up" structure to generate a feature pyramid of lower-semantic features;
Step S202: take the feature map output in step S107 as the first input of the FPN. A convolutional layer adjusts the number of channels of the input feature map to a constant d, and the channel-adjusted feature map serves as the lowest-layer feature map of the pyramid; the per-layer feature maps of the pyramid are denoted D, with this lowest layer denoted D_m;
Step S203: the main task of the "bottom-up path" in the FPN is to upsample the feature map of the pyramid layer below, with an upsampling factor n equal to the reciprocal of the reduction factor of the pooling layers in the feature extraction network; the resulting feature map has the same spatial size as the feature map output by the corresponding densely connected block in step S1;
Step S204: the "parallel path" in the FPN takes the feature map output by each densely connected block in step S1 as input, and then uses a convolutional layer to adjust the number of channels of the output feature map to d;
Step S205: steps S203 and S204 yield two feature maps identical in spatial size and number of channels. They are added element-wise, and the sum passes through a convolutional layer that reduces the aliasing introduced by upsampling, which yields the feature map of the next pyramid layer. Denoting the operations applied to the inputs in steps S203 and S204 as f(.) and g(.) respectively, D_m = g(C_m) and D_k = φ(f(D_{k+1}) + g(C_k)) with 0 < k < m, where φ denotes the convolution operation of this step;
Step S206: repeat steps S203, S204 and S205, so that the whole feature pyramid is built layer by layer upward from its lowest layer.
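As an illustration of steps S202–S206, the following sketch implements the merging rule D_m = g(C_m), D_k = φ(f(D_{k+1}) + g(C_k)). The channel constant d = 256, the factor n = 2, and nearest-neighbor upsampling are assumptions made for the example; only the structure follows the text above.

```python
import torch.nn as nn
import torch.nn.functional as F

class FeaturePyramid(nn.Module):
    """Builds D_m, ..., D_1 from the dense-block outputs C_1, ..., C_m
    (steps S202-S206)."""
    def __init__(self, block_channels, d=256, n=2):
        super().__init__()
        self.n = n
        # g(.): parallel path, one 1x1 conv per dense block (step S204)
        self.parallel = nn.ModuleList(
            nn.Conv2d(c, d, kernel_size=1) for c in block_channels)
        # phi(.): 3x3 conv reducing upsampling aliasing (step S205)
        self.smooth = nn.ModuleList(
            nn.Conv2d(d, d, kernel_size=3, padding=1)
            for _ in block_channels[:-1])

    def forward(self, blocks):
        # blocks = [C_1, ..., C_m], ordered from largest to smallest
        # spatial size; D_m = g(C_m) starts the pyramid (step S202).
        d_maps = [self.parallel[-1](blocks[-1])]
        for k in reversed(range(len(blocks) - 1)):
            up = F.interpolate(d_maps[-1], scale_factor=self.n)  # f(.), step S203
            d_maps.append(self.smooth[k](up + self.parallel[k](blocks[k])))
        return d_maps  # [D_m, ..., D_1], scale increasing layer by layer
```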
Preferably, the specific sub-steps of step S3 are as follows:
Step S301: step S2 produced a feature pyramid whose feature scale increases layer by layer from bottom to top while the number of channels of every layer stays the same, the spatial sizes of the feature maps of two adjacent layers differing by a scale factor of n. Build a predictor that outputs object bounding-box information and classification probabilities at the same time; the predictor acts on the features of every layer of the feature pyramid, so that the network can exploit feature maps of different scales;
Step S302: construction of the predictor that outputs object bounding-box information and classification probabilities. It takes the feature map of one pyramid layer as input and, after processing by two fully connected layers, outputs a vector of size S*S*(B*5+C) as the prediction, which is equivalent to dividing the original image into S*S grid cells and predicting B bounding boxes for each cell. Each bounding box carries five values: the center-coordinate offsets (t_x, t_y), the width and height offsets (t_w, t_h), and the confidence t_0 of the predicted box; in addition, the probabilities of C object classes are predicted for each cell;
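A sketch of this predictor head, assuming the two fully connected layers described above operate on a flattened feature map; the hidden width of 4096 is an illustrative assumption, and in_features is the channel × height × width product of the pyramid level being fed in.

```python
import torch.nn as nn

class PredictorHead(nn.Module):
    """Maps one pyramid feature map to an S*S*(B*5+C) prediction vector
    through two fully connected layers (step S302)."""
    def __init__(self, in_features, S, B, C, hidden=4096):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, S * S * (B * 5 + C)))

    def forward(self, feature_map):
        return self.net(feature_map)
```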
Step S303: calculation of the coordinate values:
x = c_x + σ(t_x)
y = c_y + σ(t_y)
σ(t_0) = Pr(object) * IOU(b, object)
where x and y are the actual coordinates of the bounding-box center in the image, w and h are respectively the width and height of the bounding box, (c_x, c_y) are the coordinates of the top-left corner of the grid cell, and p_w and p_h are respectively the width and height of the input image.
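The decoding of step S303 is sketched below. The x, y, and confidence formulas are as given above; the width/height formulas are not reproduced in the available text, so the exponential form w = p_w·e^{t_w}, h = p_h·e^{t_h} used here is an assumption borrowed from YOLO-style detectors rather than something the patent states.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Turn predicted offsets into an actual box (step S303).
    (cx, cy) is the top-left corner of the grid cell; (pw, ph) are the
    input image width and height, as defined above."""
    x = cx + sigmoid(tx)           # x = c_x + sigma(t_x)
    y = cy + sigmoid(ty)           # y = c_y + sigma(t_y)
    # The patent's own w, h formulas did not survive extraction; the
    # exponential YOLO-style decoding below is an assumption.
    w = pw * math.exp(tw)
    h = ph * math.exp(th)
    return x, y, w, h
```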
Preferably, the specific sub-steps of step S4 are as follows:
Step S401, image collection: collect images containing various kinds of targets in daily life as training images; through processing, each image is accompanied by the bounding-box and class information of the targets it contains;
Step S402: build a cost function for each predicted quantity for training: one term for the center coordinates of the bounding boxes, one term for the widths and heights of the bounding boxes, and one term for the predicted classes, where λ_coord and λ_noobj make the cost function balance between the bounding-box costs and the probability costs, 1_i^obj indicates that a target appears in the i-th grid cell, and 1_ij^obj indicates that the j-th bounding box in the i-th grid cell is responsible for the predicted target; the sum of these terms gives the final cost function.
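The formula images for the individual cost terms are not reproduced in the available text. Their description — λ_coord and λ_noobj weights, the indicators 1_i^obj and 1_ij^obj, and separate squared-error terms for centers, widths and heights, confidences, and class probabilities — matches the standard YOLO-style cost, given below as an assumed reconstruction rather than the patent's verbatim formula:

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\mathrm{coord}} \sum_{i=0}^{S^2}\sum_{j=0}^{B}
  \mathbb{1}_{ij}^{\mathrm{obj}}\!\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right] \\
&+ \lambda_{\mathrm{coord}} \sum_{i=0}^{S^2}\sum_{j=0}^{B}
  \mathbb{1}_{ij}^{\mathrm{obj}}\!\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2
  +\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right] \\
&+ \sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{\mathrm{obj}}\,(C_i-\hat{C}_i)^2
 + \lambda_{\mathrm{noobj}} \sum_{i=0}^{S^2}\sum_{j=0}^{B}
  \mathbb{1}_{ij}^{\mathrm{noobj}}\,(C_i-\hat{C}_i)^2 \\
&+ \sum_{i=0}^{S^2}\mathbb{1}_{i}^{\mathrm{obj}}
  \sum_{c\in\mathrm{classes}}\left(p_i(c)-\hat{p}_i(c)\right)^2
\end{aligned}
```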
Step S403: feed the labeled data collected in step S401 into the network. The parameters of every layer are initialized in the Xavier manner; loss gradients are computed by stochastic gradient descent on the loss function composed of bounding-box coordinate regression and object classification, and the parameters of all layers in the whole network are fine-tuned by backpropagation, thereby training the network.
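A minimal sketch of the training procedure of step S403: Xavier initialization followed by SGD with backpropagation. The learning rate, momentum, epoch count, and the loader and compute_loss objects are illustrative assumptions; compute_loss stands in for the combined box-regression and classification cost above.

```python
import torch
import torch.nn as nn

def init_xavier(module):
    """Xavier initialization for each layer's parameters (step S403)."""
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

def train(network, loader, compute_loss, epochs=10, lr=1e-3):
    """SGD loop over the loss composed of bounding-box coordinate
    regression and object classification."""
    network.apply(init_xavier)
    optimizer = torch.optim.SGD(network.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for images, targets in loader:    # labeled data from step S401
            loss = compute_loss(network(images), targets)
            optimizer.zero_grad()
            loss.backward()               # backpropagate the loss gradient
            optimizer.step()              # fine-tune all layers
```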
Preferably, in step S1, performing feature extraction with a network structure in which densely connected blocks and transition layers alternate allows more discriminative feature maps to be extracted from the image.
Preferably, the FPN network composed of the "bottom-up feature pyramid" and the "parallel path" can effectively exploit high-semantic low-scale and high-scale low-semantic feature maps to build a feature pyramid with high-semantic features, large scale, and strong localization information.
Obviously, the above embodiments of the present invention are merely examples given to illustrate the invention clearly, and are not intended to limit its embodiments. On the basis of the above description, those of ordinary skill in the art can make changes or modifications in other different forms. It is neither necessary nor possible to exhaustively list all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the claims of the present invention.
Claims (7)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810998184.3A CN109214505B (en) | 2018-08-29 | 2018-08-29 | Full convolution target detection method of densely connected convolution neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810998184.3A CN109214505B (en) | 2018-08-29 | 2018-08-29 | Full convolution target detection method of densely connected convolution neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109214505A CN109214505A (en) | 2019-01-15 |
CN109214505B true CN109214505B (en) | 2022-07-01 |
Family
ID=64985668
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810998184.3A Active CN109214505B (en) | 2018-08-29 | 2018-08-29 | Full convolution target detection method of densely connected convolution neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109214505B (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109815886B (en) * | 2019-01-21 | 2020-12-18 | 南京邮电大学 | A pedestrian and vehicle detection method and system based on improved YOLOv3 |
CN109871823B (en) * | 2019-03-11 | 2021-08-31 | 中国电子科技集团公司第五十四研究所 | Satellite image ship detection method combining rotating frame and context information |
CN110009622B (en) * | 2019-04-04 | 2022-02-01 | 武汉精立电子技术有限公司 | Display panel appearance defect detection network and defect detection method thereof |
CN110060274A (en) * | 2019-04-12 | 2019-07-26 | 北京影谱科技股份有限公司 | The visual target tracking method and device of neural network based on the dense connection of depth |
CN110322509B (en) * | 2019-06-26 | 2021-11-12 | 重庆邮电大学 | Target positioning method, system and computer equipment based on hierarchical class activation graph |
CN110555371A (en) * | 2019-07-19 | 2019-12-10 | 华瑞新智科技(北京)有限公司 | Wild animal information acquisition method and device based on unmanned aerial vehicle |
CN110689081B (en) * | 2019-09-30 | 2020-08-21 | 中国科学院大学 | Weak supervision target classification and positioning method based on bifurcation learning |
CN112184641A (en) * | 2020-09-15 | 2021-01-05 | 佛山中纺联检验技术服务有限公司 | Small target object detection method |
CN112016535A (en) * | 2020-10-26 | 2020-12-01 | 成都合能创越软件有限公司 | Vehicle-mounted garbage traceability method and system based on edge calculation and block chain |
CN112560778B (en) * | 2020-12-25 | 2022-05-27 | 万里云医疗信息科技(北京)有限公司 | DR image body part identification method, device, equipment and readable storage medium |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10755172B2 (en) * | 2016-06-22 | 2020-08-25 | Massachusetts Institute Of Technology | Secure training of multi-party deep neural network |
CN106250812B (en) * | 2016-07-15 | 2019-08-20 | 汤一平 | A kind of model recognizing method based on quick R-CNN deep neural network |
CN106844442A (en) * | 2016-12-16 | 2017-06-13 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | Multi-modal Recognition with Recurrent Neural Network Image Description Methods based on FCN feature extractions |
CN107437096B (en) * | 2017-07-28 | 2020-06-26 | 北京大学 | Image Classification Method Based on Parameter Efficient Deep Residual Network Model |
CN108182388A (en) * | 2017-12-14 | 2018-06-19 | 哈尔滨工业大学(威海) | A kind of motion target tracking method based on image |
- 2018-08-29: CN CN201810998184.3A patent/CN109214505B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN109214505A (en) | 2019-01-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109214505B (en) | Full convolution target detection method of densely connected convolution neural network | |
CN110321923B (en) | Target detection method, system and medium for fusion of feature layers of different scales of receptive fields | |
CN109584248B (en) | Infrared target instance segmentation method based on feature fusion and dense connection network | |
CN114202672A (en) | A small object detection method based on attention mechanism | |
CN112084869B (en) | Compact quadrilateral representation-based building target detection method | |
CN107392901A (en) | A kind of method for transmission line part intelligence automatic identification | |
CN111310756B (en) | Damaged corn particle detection and classification method based on deep learning | |
CN111753682B (en) | Hoisting area dynamic monitoring method based on target detection algorithm | |
CN107480730A (en) | Power equipment identification model construction method and system, the recognition methods of power equipment | |
CN112132818B (en) | Pulmonary nodule detection and clinical analysis method constructed based on graph convolution neural network | |
CN110310253B (en) | Digital slice classification method and device | |
CN105551036A (en) | Training method and device for deep learning network | |
CN105335725A (en) | Gait identification identity authentication method based on feature fusion | |
CN111898419B (en) | Partition landslide detection system and method based on cascaded deep convolutional neural network | |
CN111738114B (en) | Vehicle target detection method based on accurate sampling of remote sensing images without anchor points | |
CN114241422A (en) | Student classroom behavior detection method based on ESRGAN and improved YOLOv5s | |
CN107633226A (en) | A kind of human action Tracking Recognition method and system | |
CN112818969A (en) | Knowledge distillation-based face pose estimation method and system | |
CN108052929A (en) | Parking space state detection method, system, readable storage medium storing program for executing and computer equipment | |
CN114170526A (en) | Remote sensing image multi-scale target detection and identification method based on lightweight network | |
CN112507861A (en) | Pedestrian detection method based on multilayer convolution feature fusion | |
CN111310821A (en) | Multi-view feature fusion method, system, computer equipment and storage medium | |
CN110322509A (en) | Object localization method, system and computer equipment based on level Class Activation figure | |
CN116310837B (en) | SAR ship target rotation detection method and system | |
CN116416534A (en) | Unmanned aerial vehicle spare area identification method facing protection target |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |