CN109214505B - Full convolution target detection method of densely connected convolution neural network - Google Patents
- Publication number
- CN109214505B CN109214505B CN201810998184.3A CN201810998184A CN109214505B CN 109214505 B CN109214505 B CN 109214505B CN 201810998184 A CN201810998184 A CN 201810998184A CN 109214505 B CN109214505 B CN 109214505B
- Authority
- CN
- China
- Prior art keywords
- feature
- layer
- network
- neural network
- convolutional neural
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000001514 detection method Methods 0.000 title claims abstract description 15
- 238000013528 artificial neural network Methods 0.000 title claims abstract 6
- 238000000034 method Methods 0.000 claims abstract description 24
- 238000013507 mapping Methods 0.000 claims abstract 17
- 238000013527 convolutional neural network Methods 0.000 claims description 48
- 238000000605 extraction Methods 0.000 claims description 24
- 238000006243 chemical reaction Methods 0.000 claims description 18
- 238000011176 pooling Methods 0.000 claims description 12
- 238000012549 training Methods 0.000 claims description 10
- 230000004913 activation Effects 0.000 claims description 9
- 230000008569 process Effects 0.000 claims description 9
- 230000000007 visual effect Effects 0.000 claims description 6
- 238000012545 processing Methods 0.000 claims description 5
- 238000004364 calculation method Methods 0.000 claims description 3
- 230000009467 reduction Effects 0.000 claims description 3
- 230000000694 effects Effects 0.000 claims 1
- 238000005259 measurement Methods 0.000 claims 1
- 238000013473 artificial intelligence Methods 0.000 abstract description 2
- 230000007547 defect Effects 0.000 abstract 1
- 230000006870 function Effects 0.000 description 22
- 150000001875 compounds Chemical class 0.000 description 2
- 238000010276 construction Methods 0.000 description 2
- 230000007423 decrease Effects 0.000 description 2
- 230000004807 localization Effects 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 230000000717 retained effect Effects 0.000 description 2
- 238000005070 sampling Methods 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
Images
Classifications
- G06N 3/045 — Computing arrangements based on biological models › Neural networks › Architecture, e.g. interconnection topology › Combinations of networks
- G06N 3/084 — Computing arrangements based on biological models › Neural networks › Learning methods › Backpropagation, e.g. using gradient descent
- G06T 7/0002 — Image analysis › Inspection of images, e.g. flaw detection
- G06T 7/10 — Image analysis › Segmentation; edge detection
Abstract
Description
Technical Field
The present invention relates to the field of artificial intelligence, and more particularly to a fully convolutional object detection method based on a densely connected convolutional neural network.
Background
Convolutional neural networks detect features invariantly: after an object is translated or rotated, the network can still recognize it as the same object. However, for objects that occupy only a small area of the image, their information is lost while the network extracts features, so such targets cannot be detected accurately. Recent research has shown that "multi-scale" feature representations can effectively improve detection accuracy for objects of different scales. Image pyramids have been tried for multi-scale detection: an image is first rescaled to several sizes, and the images at each scale are fed into the convolutional neural network. This approach, however, demands so much computation and memory that it is not feasible.
Summary of the Invention
To overcome the inability of existing methods to detect multi-scale objects accurately, the present invention provides a fully convolutional object detection method based on a densely connected convolutional neural network.
To achieve the above object of the invention, the technical solution adopted is as follows:
A fully convolutional object detection method based on a densely connected convolutional neural network, comprising the following steps:
Step S1: build the feature extraction network DenseNet. The feature extraction network consists of multiple densely connected blocks and transition layers; the densely connected blocks can identify more discriminative visual features in the image. After the input image passes through the feature extraction network, the features output by each densely connected block, which have different semantics and different resolutions, are retained.
Step S2: build the feature pyramid network FPN. The per-layer features retained in step S1 are fed into the FPN and stacked by feature scale, forming a bottom-up, scale-increasing, low-semantic feature pyramid. Starting from the lowest layer, the features of each layer undergo a convolution along the "parallel path" to gain higher semantics; meanwhile the convolved features are upsampled to the scale of the layer above and merged with that layer's features, and the merged features continue upward until the top of the pyramid. This step is repeated until the complete feature pyramid has been built.
Step S3: build the fully convolutional predictor (FCP) network. The FCP is a predictor that outputs object bounding-box information and classification probabilities at the same time, and it makes predictions on the feature maps of all scales in the feature pyramid. The predictor passes an input feature map through a convolutional neural network and outputs a vector of size S*S*(B*5+C) as the prediction, which is equivalent to dividing the original image into S*S grid cells and predicting B bounding boxes for each cell. Each bounding box carries five values: the center-coordinate offsets (t_x, t_y), the width and height offsets (t_w, t_h), and the confidence t_0 of the predicted box; in addition, the probabilities of C object classes are predicted for each cell.
Step S4: train the whole network. Collect target images and feed them into the network. The parameters of every layer are initialized in the Xavier manner; loss gradients are computed by stochastic gradient descent on a loss function composed of bounding-box coordinate regression and object classification, and the parameters of all layers in the whole network are fine-tuned by backpropagation.
Preferably, the specific sub-steps of step S1 are as follows:
Step S101: adapt an existing trained densely connected convolutional neural network model to obtain a preliminary feature extraction network model;
Step S102: in implementation, the densely connected convolutional neural network is divided into multiple densely connected blocks, and different densely connected blocks are connected through transition layers;
Step S103: a densely connected block contains multiple convolutional layers, and the input of each convolutional layer is the superposition of the outputs of all preceding convolutional layers within the same block. Let the input of the l-th convolutional layer in a block be x_l and its output y_l; then x_l = (x_1 + y_1 + ... + y_{l-1}) and y_l = H(x_l), where H(.) is defined as the activation function;
Step S104: H(.) is the activation function that follows each convolutional layer. Here it is a composite operation: the input x_l first passes through a BN operation, then a ReLU function, and finally a convolutional layer, whose result is the output of the whole activation function;
Step S105: because different densely connected blocks have different spatial sizes, they are connected to each other through a transition layer. The transition layer takes the output of the preceding densely connected block as input, applies a BN operation, then a convolutional layer, and finally a pooling layer that adjusts the spatial size of the feature map to match the input of the next densely connected block; here the pooling layer is assumed to shrink the spatial size of the feature map to 1/n of the original;
Step S106: densely connected blocks and transition layers alternate several times, so that the spatial size of the feature map shrinks after every densely connected block while the number of channels of the feature map grows; let the feature map output by the last convolutional layer of each densely connected block be C_m;
Step S107: remove the global average pooling layer and the fully connected classification layer of the existing densely connected convolutional neural network, and take the feature map output by the last convolutional layer of the last densely connected block as the output of the feature extraction network.
Preferably, the specific sub-steps of step S2 are as follows:
Step S201: the FPN consists of a "bottom-up feature pyramid" and a "parallel path". The FPN first obtains from the feature extraction network the visual features of its layers, which carry different semantics at different scales, and then stacks them in the "bottom-up" structure to generate a feature pyramid of lower-semantic features;
Step S202: take the feature map output in step S107 as the first input of the FPN. A convolutional layer adjusts the number of channels of the input feature map to a constant d, and the channel-adjusted feature map serves as the lowest-layer feature map of the pyramid; the per-layer feature maps of the pyramid are denoted D, with this lowest layer denoted D_m;
Step S203: the main task of the "bottom-up path" in the FPN is to upsample the feature map of the pyramid layer below, with an upsampling factor n equal to the reciprocal of the reduction factor of the pooling layers in the feature extraction network; the resulting feature map has the same spatial size as the feature map output by the corresponding densely connected block in step S1;
Step S204: the "parallel path" in the FPN takes the feature map output by each densely connected block in step S1 as input, and then uses a convolutional layer to adjust the number of channels of the output feature map to d;
Step S205: steps S203 and S204 yield two feature maps identical in spatial size and number of channels. They are added element-wise, and the sum passes through a convolutional layer that reduces the aliasing introduced by upsampling, which yields the feature map of the next pyramid layer. Denoting the operations applied to the inputs in steps S203 and S204 as f(.) and g(.) respectively, D_m = g(C_m) and D_k = φ(f(D_{k+1}) + g(C_k)) with 0 < k < m, where φ denotes the convolution operation of this step;
Step S206: repeat steps S203, S204 and S205, so that the whole feature pyramid is built layer by layer upward from its lowest layer.
Preferably, the specific sub-steps of step S3 are as follows:
Step S301: step S2 produced a feature pyramid whose feature scale increases layer by layer from bottom to top while the number of channels of every layer stays the same, the spatial sizes of the feature maps of two adjacent layers differing by a scale factor of n. Build a predictor that outputs object bounding-box information and classification probabilities at the same time; the predictor acts on the features of every layer of the feature pyramid, so that the network can exploit feature maps of different scales;
Step S302: construction of the predictor that outputs object bounding-box information and classification probabilities. It takes the feature map of one pyramid layer as input and, after processing by two fully connected layers, outputs a vector of size S*S*(B*5+C) as the prediction, which is equivalent to dividing the original image into S*S grid cells and predicting B bounding boxes for each cell. Each bounding box carries five values: the center-coordinate offsets (t_x, t_y), the width and height offsets (t_w, t_h), and the confidence t_0 of the predicted box; in addition, the probabilities of C object classes are predicted for each cell;
Step S303: calculation of the coordinate values:
x = c_x + σ(t_x)
y = c_y + σ(t_y)
σ(t_0) = Pr(object) * IOU(b, object)
where x and y are the actual coordinates of the bounding-box center in the image, w and h are respectively the width and height of the bounding box, (c_x, c_y) are the coordinates of the top-left corner of the grid cell, and p_w and p_h are respectively the width and height of the input image.
Preferably, the specific sub-steps of step S4 are as follows:
Step S401, image collection: collect images containing various kinds of targets in daily life as training images; through processing, each image is accompanied by the bounding-box and class information of the targets it contains;
Step S402: build a cost function for each predicted quantity for training: one term for the center coordinates of the bounding boxes, one term for the widths and heights of the bounding boxes, and one term for the predicted classes, where λ_coord and λ_noobj make the cost function balance between the bounding-box costs and the probability costs, 1_i^obj indicates that a target appears in the i-th grid cell, and 1_ij^obj indicates that the j-th bounding box in the i-th grid cell is responsible for the predicted target; the sum of these terms gives the final cost function.
Step S403: feed the labeled data collected in step S401 into the network. The parameters of every layer are initialized in the Xavier manner; loss gradients are computed by stochastic gradient descent on the loss function composed of bounding-box coordinate regression and object classification, and the parameters of all layers in the whole network are fine-tuned by backpropagation, thereby training the network.
Preferably, in step S1, performing feature extraction with a network structure in which densely connected blocks and transition layers alternate allows more discriminative feature maps to be extracted from the image.
Preferably, the FPN network composed of the "bottom-up feature pyramid" and the "parallel path" can effectively exploit high-semantic low-scale and high-scale low-semantic feature maps to build a feature pyramid with high-semantic features, large scale, and strong localization information.
Compared with the prior art, the beneficial effects of the present invention are as follows:
The present invention provides a fully convolutional object detection method based on a densely connected convolutional neural network, characterized in that it can effectively use multi-scale feature maps for object detection, so that the convolutional neural network detects objects of different scales in the same image with high accuracy.
Description of Drawings
Figure 1 is a flow chart of the present invention.
Detailed Description
The accompanying drawings are for illustration only and shall not be construed as limiting this patent;
The present invention is further described below with reference to the accompanying drawings and embodiments.
Example 1
As shown in Figure 1, the present invention provides a fully convolutional object detection method based on a densely connected convolutional neural network, comprising the following steps:
Step S1: build the feature extraction network DenseNet. The feature extraction network consists of multiple densely connected blocks and transition layers; the densely connected blocks can identify more discriminative visual features in the image. After the input image passes through the feature extraction network, the features output by each densely connected block, which have different semantics and different resolutions, are retained.
Step S2: build the feature pyramid network FPN. The per-layer features retained in step S1 are fed into the FPN and stacked by feature scale, forming a bottom-up, scale-increasing, low-semantic feature pyramid. Starting from the lowest layer, the features of each layer undergo a convolution along the "parallel path" to gain higher semantics; meanwhile the convolved features are upsampled to the scale of the layer above and merged with that layer's features, and the merged features continue upward until the top of the pyramid. This step is repeated until the complete feature pyramid has been built.
Step S3: build the fully convolutional predictor (FCP) network. The FCP is a predictor that outputs object bounding-box information and classification probabilities at the same time, and it makes predictions on the feature maps of all scales in the feature pyramid. The predictor passes an input feature map through a convolutional neural network and outputs a vector of size S*S*(B*5+C) as the prediction, which is equivalent to dividing the original image into S*S grid cells and predicting B bounding boxes for each cell. Each bounding box carries five values: the center-coordinate offsets (t_x, t_y), the width and height offsets (t_w, t_h), and the confidence t_0 of the predicted box; in addition, the probabilities of C object classes are predicted for each cell.
Step S4: train the whole network. Collect target images and feed them into the network. The parameters of every layer are initialized in the Xavier manner; loss gradients are computed by stochastic gradient descent on a loss function composed of bounding-box coordinate regression and object classification, and the parameters of all layers in the whole network are fine-tuned by backpropagation.
Preferably, the specific sub-steps of step S1 are as follows:
Step S101: adapt an existing trained densely connected convolutional neural network model to obtain a preliminary feature extraction network model;
Step S102: in implementation, the densely connected convolutional neural network is divided into multiple densely connected blocks, and different densely connected blocks are connected through transition layers;
Step S103: a densely connected block contains multiple convolutional layers, and the input of each convolutional layer is the superposition of the outputs of all preceding convolutional layers within the same block. Let the input of the l-th convolutional layer in a block be x_l and its output y_l; then x_l = (x_1 + y_1 + ... + y_{l-1}) and y_l = H(x_l), where H(.) is defined as the activation function;
Step S104: H(.) is the activation function that follows each convolutional layer. Here it is a composite operation: the input x_l first passes through a BN operation, then a ReLU function, and finally a convolutional layer, whose result is the output of the whole activation function;
Step S105: because different densely connected blocks have different spatial sizes, they are connected to each other through a transition layer. The transition layer takes the output of the preceding densely connected block as input, applies a BN operation, then a convolutional layer, and finally a pooling layer that adjusts the spatial size of the feature map to match the input of the next densely connected block; here the pooling layer is assumed to shrink the spatial size of the feature map to 1/n of the original;
Step S106: densely connected blocks and transition layers alternate several times, so that the spatial size of the feature map shrinks after every densely connected block while the number of channels of the feature map grows; let the feature map output by the last convolutional layer of each densely connected block be C_m;
Step S107: remove the global average pooling layer and the fully connected classification layer of the existing densely connected convolutional neural network, and take the feature map output by the last convolutional layer of the last densely connected block as the output of the feature extraction network.
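The block arithmetic of steps S103–S107 can be made concrete with a short sketch. The following PyTorch code is a minimal illustration, not the patent's reference implementation: the layer count, growth rate, and pooling factor n = 2 are assumed values, and the "superposition" of earlier outputs is realized as channel concatenation, as in the original DenseNet.

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """One convolutional layer of a dense block, with the composite
    activation H(.) = BN -> ReLU -> Conv of step S104."""
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv = nn.Conv2d(in_channels, growth_rate,
                              kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        return self.conv(self.relu(self.bn(x)))

class DenseBlock(nn.Module):
    """Densely connected block (step S103): each layer takes the
    superposition of all earlier outputs in the block, realized here as
    channel concatenation as in the original DenseNet."""
    def __init__(self, num_layers, in_channels, growth_rate):
        super().__init__()
        self.layers = nn.ModuleList(
            DenseLayer(in_channels + i * growth_rate, growth_rate)
            for i in range(num_layers))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)

class TransitionLayer(nn.Module):
    """Transition layer (step S105): BN -> Conv -> pooling; the pooling
    shrinks the spatial size to 1/n of the original (n = 2 assumed)."""
    def __init__(self, in_channels, out_channels, n=2):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_channels)
        self.conv = nn.Conv2d(in_channels, out_channels,
                              kernel_size=1, bias=False)
        self.pool = nn.AvgPool2d(kernel_size=n, stride=n)

    def forward(self, x):
        return self.pool(self.conv(self.bn(x)))
```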
Preferably, the specific sub-steps of step S2 are as follows:
Step S201: the FPN consists of a "bottom-up feature pyramid" and a "parallel path". The FPN first obtains from the feature extraction network the visual features of its layers, which carry different semantics at different scales, and then stacks them in the "bottom-up" structure to generate a feature pyramid of lower-semantic features;
Step S202: take the feature map output in step S107 as the first input of the FPN. A convolutional layer adjusts the number of channels of the input feature map to a constant d, and the channel-adjusted feature map serves as the lowest-layer feature map of the pyramid; the per-layer feature maps of the pyramid are denoted D, with this lowest layer denoted D_m;
Step S203: the main task of the "bottom-up path" in the FPN is to upsample the feature map of the pyramid layer below, with an upsampling factor n equal to the reciprocal of the reduction factor of the pooling layers in the feature extraction network; the resulting feature map has the same spatial size as the feature map output by the corresponding densely connected block in step S1;
Step S204: the "parallel path" in the FPN takes the feature map output by each densely connected block in step S1 as input, and then uses a convolutional layer to adjust the number of channels of the output feature map to d;
Step S205: steps S203 and S204 yield two feature maps identical in spatial size and number of channels. They are added element-wise, and the sum passes through a convolutional layer that reduces the aliasing introduced by upsampling, which yields the feature map of the next pyramid layer. Denoting the operations applied to the inputs in steps S203 and S204 as f(.) and g(.) respectively, D_m = g(C_m) and D_k = φ(f(D_{k+1}) + g(C_k)) with 0 < k < m, where φ denotes the convolution operation of this step;
Step S206: repeat steps S203, S204 and S205, so that the whole feature pyramid is built layer by layer upward from its lowest layer.
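As an illustration of steps S202–S206, the following sketch implements the merging rule D_m = g(C_m), D_k = φ(f(D_{k+1}) + g(C_k)). The channel constant d = 256, the factor n = 2, and nearest-neighbor upsampling are assumptions made for the example; only the structure follows the text above.

```python
import torch.nn as nn
import torch.nn.functional as F

class FeaturePyramid(nn.Module):
    """Builds D_m, ..., D_1 from the dense-block outputs C_1, ..., C_m
    (steps S202-S206)."""
    def __init__(self, block_channels, d=256, n=2):
        super().__init__()
        self.n = n
        # g(.): parallel path, one 1x1 conv per dense block (step S204)
        self.parallel = nn.ModuleList(
            nn.Conv2d(c, d, kernel_size=1) for c in block_channels)
        # phi(.): 3x3 conv reducing upsampling aliasing (step S205)
        self.smooth = nn.ModuleList(
            nn.Conv2d(d, d, kernel_size=3, padding=1)
            for _ in block_channels[:-1])

    def forward(self, blocks):
        # blocks = [C_1, ..., C_m], ordered from largest to smallest
        # spatial size; D_m = g(C_m) starts the pyramid (step S202).
        d_maps = [self.parallel[-1](blocks[-1])]
        for k in reversed(range(len(blocks) - 1)):
            up = F.interpolate(d_maps[-1], scale_factor=self.n)  # f(.), step S203
            d_maps.append(self.smooth[k](up + self.parallel[k](blocks[k])))
        return d_maps  # [D_m, ..., D_1], scale increasing layer by layer
```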
Preferably, the specific sub-steps of step S3 are as follows:
Step S301: step S2 produced a feature pyramid whose feature scale increases layer by layer from bottom to top while the number of channels of every layer stays the same, the spatial sizes of the feature maps of two adjacent layers differing by a scale factor of n. Build a predictor that outputs object bounding-box information and classification probabilities at the same time; the predictor acts on the features of every layer of the feature pyramid, so that the network can exploit feature maps of different scales;
Step S302: construction of the predictor that outputs object bounding-box information and classification probabilities. It takes the feature map of one pyramid layer as input and, after processing by two fully connected layers, outputs a vector of size S*S*(B*5+C) as the prediction, which is equivalent to dividing the original image into S*S grid cells and predicting B bounding boxes for each cell. Each bounding box carries five values: the center-coordinate offsets (t_x, t_y), the width and height offsets (t_w, t_h), and the confidence t_0 of the predicted box; in addition, the probabilities of C object classes are predicted for each cell;
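A sketch of this predictor head, assuming the two fully connected layers described above operate on a flattened feature map; the hidden width of 4096 is an illustrative assumption, and in_features is the channel × height × width product of the pyramid level being fed in.

```python
import torch.nn as nn

class PredictorHead(nn.Module):
    """Maps one pyramid feature map to an S*S*(B*5+C) prediction vector
    through two fully connected layers (step S302)."""
    def __init__(self, in_features, S, B, C, hidden=4096):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, S * S * (B * 5 + C)))

    def forward(self, feature_map):
        return self.net(feature_map)
```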
Step S303: calculation of the coordinate values:
x = c_x + σ(t_x)
y = c_y + σ(t_y)
σ(t_0) = Pr(object) * IOU(b, object)
where x and y are the actual coordinates of the bounding-box center in the image, w and h are respectively the width and height of the bounding box, (c_x, c_y) are the coordinates of the top-left corner of the grid cell, and p_w and p_h are respectively the width and height of the input image.
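The decoding of step S303 is sketched below. The x, y, and confidence formulas are as given above; the width/height formulas are not reproduced in the available text, so the exponential form w = p_w·e^{t_w}, h = p_h·e^{t_h} used here is an assumption borrowed from YOLO-style detectors rather than something the patent states.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Turn predicted offsets into an actual box (step S303).
    (cx, cy) is the top-left corner of the grid cell; (pw, ph) are the
    input image width and height, as defined above."""
    x = cx + sigmoid(tx)           # x = c_x + sigma(t_x)
    y = cy + sigmoid(ty)           # y = c_y + sigma(t_y)
    # The patent's own w, h formulas did not survive extraction; the
    # exponential YOLO-style decoding below is an assumption.
    w = pw * math.exp(tw)
    h = ph * math.exp(th)
    return x, y, w, h
```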
Preferably, the specific sub-steps of step S4 are as follows:
Step S401, image collection: collect images containing various kinds of targets in daily life as training images; through processing, each image is accompanied by the bounding-box and class information of the targets it contains;
Step S402: build a cost function for each predicted quantity for training: one term for the center coordinates of the bounding boxes, one term for the widths and heights of the bounding boxes, and one term for the predicted classes, where λ_coord and λ_noobj make the cost function balance between the bounding-box costs and the probability costs, 1_i^obj indicates that a target appears in the i-th grid cell, and 1_ij^obj indicates that the j-th bounding box in the i-th grid cell is responsible for the predicted target; the sum of these terms gives the final cost function.
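The formula images for the individual cost terms are not reproduced in the available text. Their description — λ_coord and λ_noobj weights, the indicators 1_i^obj and 1_ij^obj, and separate squared-error terms for centers, widths and heights, confidences, and class probabilities — matches the standard YOLO-style cost, given below as an assumed reconstruction rather than the patent's verbatim formula:

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\mathrm{coord}} \sum_{i=0}^{S^2}\sum_{j=0}^{B}
  \mathbb{1}_{ij}^{\mathrm{obj}}\!\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right] \\
&+ \lambda_{\mathrm{coord}} \sum_{i=0}^{S^2}\sum_{j=0}^{B}
  \mathbb{1}_{ij}^{\mathrm{obj}}\!\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2
  +\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right] \\
&+ \sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{\mathrm{obj}}\,(C_i-\hat{C}_i)^2
 + \lambda_{\mathrm{noobj}} \sum_{i=0}^{S^2}\sum_{j=0}^{B}
  \mathbb{1}_{ij}^{\mathrm{noobj}}\,(C_i-\hat{C}_i)^2 \\
&+ \sum_{i=0}^{S^2}\mathbb{1}_{i}^{\mathrm{obj}}
  \sum_{c\in\mathrm{classes}}\left(p_i(c)-\hat{p}_i(c)\right)^2
\end{aligned}
```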
Step S403: feed the labeled data collected in step S401 into the network. The parameters of every layer are initialized in the Xavier manner; loss gradients are computed by stochastic gradient descent on the loss function composed of bounding-box coordinate regression and object classification, and the parameters of all layers in the whole network are fine-tuned by backpropagation, thereby training the network.
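A minimal sketch of the training procedure of step S403: Xavier initialization followed by SGD with backpropagation. The learning rate, momentum, epoch count, and the loader and compute_loss objects are illustrative assumptions; compute_loss stands in for the combined box-regression and classification cost above.

```python
import torch
import torch.nn as nn

def init_xavier(module):
    """Xavier initialization for each layer's parameters (step S403)."""
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

def train(network, loader, compute_loss, epochs=10, lr=1e-3):
    """SGD loop over the loss composed of bounding-box coordinate
    regression and object classification."""
    network.apply(init_xavier)
    optimizer = torch.optim.SGD(network.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for images, targets in loader:    # labeled data from step S401
            loss = compute_loss(network(images), targets)
            optimizer.zero_grad()
            loss.backward()               # backpropagate the loss gradient
            optimizer.step()              # fine-tune all layers
```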
Preferably, in step S1, performing feature extraction with a network structure in which densely connected blocks and transition layers alternate allows more discriminative feature maps to be extracted from the image.
Preferably, the FPN network composed of the "bottom-up feature pyramid" and the "parallel path" can effectively exploit high-semantic low-scale and high-scale low-semantic feature maps to build a feature pyramid with high-semantic features, large scale, and strong localization information.
Obviously, the above embodiments of the present invention are merely examples given to illustrate the invention clearly, and are not intended to limit its embodiments. On the basis of the above description, those of ordinary skill in the art can make changes or modifications in other different forms. It is neither necessary nor possible to exhaustively list all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the claims of the present invention.
Claims (7)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810998184.3A CN109214505B (en) | 2018-08-29 | 2018-08-29 | Full convolution target detection method of densely connected convolution neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810998184.3A CN109214505B (en) | 2018-08-29 | 2018-08-29 | Full convolution target detection method of densely connected convolution neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109214505A CN109214505A (en) | 2019-01-15 |
CN109214505B true CN109214505B (en) | 2022-07-01 |
Family
ID=64985668
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810998184.3A Active CN109214505B (en) | 2018-08-29 | 2018-08-29 | Full convolution target detection method of densely connected convolution neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109214505B (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109815886B (en) * | 2019-01-21 | 2020-12-18 | 南京邮电大学 | A pedestrian and vehicle detection method and system based on improved YOLOv3 |
CN109871823B (en) * | 2019-03-11 | 2021-08-31 | 中国电子科技集团公司第五十四研究所 | Satellite image ship detection method combining rotating frame and context information |
CN110009622B (en) * | 2019-04-04 | 2022-02-01 | 武汉精立电子技术有限公司 | Display panel appearance defect detection network and defect detection method thereof |
CN110060274A (en) * | 2019-04-12 | 2019-07-26 | 北京影谱科技股份有限公司 | The visual target tracking method and device of neural network based on the dense connection of depth |
CN110322509B (en) * | 2019-06-26 | 2021-11-12 | 重庆邮电大学 | Target positioning method, system and computer equipment based on hierarchical class activation graph |
CN110555371A (en) * | 2019-07-19 | 2019-12-10 | 华瑞新智科技(北京)有限公司 | Wild animal information acquisition method and device based on unmanned aerial vehicle |
CN110689081B (en) * | 2019-09-30 | 2020-08-21 | 中国科学院大学 | Weak supervision target classification and positioning method based on bifurcation learning |
CN112184641A (en) * | 2020-09-15 | 2021-01-05 | 佛山中纺联检验技术服务有限公司 | Small target object detection method |
CN112016535A (en) * | 2020-10-26 | 2020-12-01 | 成都合能创越软件有限公司 | Vehicle-mounted garbage traceability method and system based on edge calculation and block chain |
CN112560778B (en) * | 2020-12-25 | 2022-05-27 | 万里云医疗信息科技(北京)有限公司 | DR image body part identification method, device, equipment and readable storage medium |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10755172B2 (en) * | 2016-06-22 | 2020-08-25 | Massachusetts Institute Of Technology | Secure training of multi-party deep neural network |
CN106250812B (en) * | 2016-07-15 | 2019-08-20 | 汤一平 | A kind of model recognizing method based on quick R-CNN deep neural network |
CN106844442A (en) * | 2016-12-16 | 2017-06-13 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | Multi-modal Recognition with Recurrent Neural Network Image Description Methods based on FCN feature extractions |
CN107437096B (en) * | 2017-07-28 | 2020-06-26 | 北京大学 | Image Classification Method Based on Parameter Efficient Deep Residual Network Model |
CN108182388A (en) * | 2017-12-14 | 2018-06-19 | 哈尔滨工业大学(威海) | A kind of motion target tracking method based on image |
- 2018-08-29: CN CN201810998184.3A patent/CN109214505B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN109214505A (en) | 2019-01-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109214505B (en) | Full convolution target detection method of densely connected convolution neural network | |
CN110321923B (en) | Target detection method, system and medium for fusion of feature layers of different scales of receptive fields | |
CN109584248B (en) | Infrared target instance segmentation method based on feature fusion and dense connection network | |
CN114202672A (en) | A small object detection method based on attention mechanism | |
CN112084869B (en) | Compact quadrilateral representation-based building target detection method | |
CN107392901A (en) | A kind of method for transmission line part intelligence automatic identification | |
CN111310756B (en) | Damaged corn particle detection and classification method based on deep learning | |
CN111753682B (en) | Hoisting area dynamic monitoring method based on target detection algorithm | |
CN107480730A (en) | Power equipment identification model construction method and system, the recognition methods of power equipment | |
CN112132818B (en) | Pulmonary nodule detection and clinical analysis method constructed based on graph convolution neural network | |
CN110310253B (en) | Digital slice classification method and device | |
CN105551036A (en) | Training method and device for deep learning network | |
CN105335725A (en) | Gait identification identity authentication method based on feature fusion | |
CN111898419B (en) | Partition landslide detection system and method based on cascaded deep convolutional neural network | |
CN111738114B (en) | Vehicle target detection method based on accurate sampling of remote sensing images without anchor points | |
CN114241422A (en) | Student classroom behavior detection method based on ESRGAN and improved YOLOv5s | |
CN107633226A (en) | A kind of human action Tracking Recognition method and system | |
CN112818969A (en) | Knowledge distillation-based face pose estimation method and system | |
CN108052929A (en) | Parking space state detection method, system, readable storage medium storing program for executing and computer equipment | |
CN114170526A (en) | Remote sensing image multi-scale target detection and identification method based on lightweight network | |
CN112507861A (en) | Pedestrian detection method based on multilayer convolution feature fusion | |
CN111310821A (en) | Multi-view feature fusion method, system, computer equipment and storage medium | |
CN110322509A (en) | Object localization method, system and computer equipment based on level Class Activation figure | |
CN116310837B (en) | SAR ship target rotation detection method and system | |
CN116416534A (en) | Unmanned aerial vehicle spare area identification method facing protection target |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |