CN116311062A - A small target detection method on expressway - Google Patents
A small target detection method on expressway
- Publication number
- CN116311062A (application CN202310269546.6A)
- Authority
- CN
- China
- Legal status: Pending
Classifications
- G06V20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06N3/08: Learning methods (computing arrangements based on neural networks)
- G06V10/42: Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
- G06V10/764: Recognition using pattern recognition or machine learning, using classification, e.g. of video objects
- G06V10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- G06V10/82: Recognition using pattern recognition or machine learning, using neural networks
- G06V2201/07: Target detection
Abstract
Description
Technical Field
The invention belongs to the technical field of image recognition and computer vision, and in particular relates to a method for detecting small targets on expressways.
Background Art
Expressways are a symbol of modernization and a manifestation of a country's comprehensive national strength; their construction and operation touch every aspect of national economic and social life. However, objects other than vehicles appear on expressways, such as cargo spilled from trucks, animals, and litter, and these pose serious safety hazards. With computer vision technology, cameras collect real-time images to detect foreign objects on the expressway so that they can be dealt with promptly, keeping traffic flowing smoothly.
Existing object detection methods are based on deep learning. Typically, datasets covering several target categories are collected first, a general-purpose detection model is trained on them, and the trained model is then deployed for detection. Although current deep-learning detectors achieve high accuracy in general, foreign objects in images captured on expressways occupy only a few pixels, offer few usable features, and require precise localization, so such targets are hard to detect and easily missed altogether. Existing detection models therefore perform poorly in practice on expressways.
Summary of the Invention
The purpose of the present invention is to address the above problems by proposing a small-target detection method for expressways, overcoming the difficulty that conventional detection models have in achieving good results on such targets.
To achieve the above object, the technical scheme adopted by the present invention is as follows.
The expressway small-target detection method proposed by the present invention comprises the following steps:
S1. Obtain an unlabeled data set $X=\{x_1, x_2, \ldots, x_l, \ldots, x_N\}$ and perform data-enhancement processing on each input image in the unlabeled data set to form a corresponding reconstructed image $\hat{x}_l$, where $x_l$ denotes the $l$-th input image, $l = 1, 2, \ldots, N$;
S2. Establish a target detection network model and detect the reconstructed image $\hat{x}_l$ to obtain the corresponding target detection result. The target detection network model comprises a feature extraction module, a dynamic instance interaction head, and a classification-and-regression branch unit; the feature extraction module adopts an FPN network, and the dynamic instance interaction head comprises N feature extraction units, each comprising a self-attention module, a fully connected layer, a first convolutional layer, a second convolutional layer, a ReLU function, and a view operation. The target detection network model performs the following operations:
S21. Input the reconstructed image $\hat{x}_l$ into the feature extraction module to obtain the corresponding multi-scale feature maps;
S22. Set N proposal boxes and corresponding proposal features, where each proposal box is expressed as a four-dimensional vector of normalized center coordinates, height, and width, and each proposal feature has the same dimension as the output features of the feature extraction module;
S23. Pair the proposal boxes with the multi-scale feature maps one by one and obtain the corresponding RoI features through the RoIAlign operation;
S24. Input the proposal feature and RoI feature of each proposal box, in one-to-one correspondence, into a feature extraction unit of the dynamic instance interaction head to obtain the corresponding target box and target feature (a code sketch of one such unit follows step S26). Each feature extraction unit performs the following operations:
apply a self-attention operation to the proposal features using the self-attention module to obtain a first feature;
convert the first feature into a one-dimensional vector through the fully connected layer to form a second feature;
input the RoI feature and the second feature into the first convolutional layer, pass the result through the second convolutional layer and the ReLU function in turn, and then adjust the dimensions with a view operation to obtain the corresponding target feature;
S25. Update the proposal boxes and proposal features to be the target boxes and target features, and return to step S23 until the set number of iterations is completed, obtaining the interaction features;
S26. Input the interaction features into the classification-and-regression branch unit to obtain the target detection result.
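As an illustration of steps S23 and S24, the following is a minimal sketch of one feature extraction unit, written in PyTorch in the style of the dynamic heads popularized by Sparse R-CNN. The module names, the 256-dimensional features, the 7×7 RoI pooling, and the hidden width of the dynamic convolutions are illustrative assumptions rather than values fixed by this description.

```python
import torch
import torch.nn as nn

class DynamicInteractionUnit(nn.Module):
    """One feature extraction unit (sketch): self-attention over the proposal
    feature, then dynamic convolution of the RoI feature with parameters
    generated from the attended proposal feature."""

    def __init__(self, d_model=256, pool_size=7, d_dyn=64):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, num_heads=8)
        # Fully connected layer producing the "second feature": a 1-D vector
        # holding the parameters of the first and second dynamic convolutions.
        self.param_fc = nn.Linear(d_model, 2 * d_model * d_dyn)
        self.d_model, self.d_dyn = d_model, d_dyn
        self.norm1, self.norm2 = nn.LayerNorm(d_dyn), nn.LayerNorm(d_model)
        self.relu = nn.ReLU(inplace=True)
        self.out_fc = nn.Linear(pool_size * pool_size * d_model, d_model)

    def forward(self, roi_feat, prop_feat):
        # roi_feat:  (N, d_model, 7, 7) RoI features from RoIAlign
        # prop_feat: (N, d_model) proposal features
        x = prop_feat.unsqueeze(1)                   # sequence of N proposals
        attn, _ = self.self_attn(x, x, x)            # first feature
        first = prop_feat + attn.squeeze(1)          # residual connection

        # Second feature: flattened parameters of two per-instance 1x1 convs.
        params = self.param_fc(first)
        w1 = params[:, :self.d_model * self.d_dyn].view(-1, self.d_model, self.d_dyn)
        w2 = params[:, self.d_model * self.d_dyn:].view(-1, self.d_dyn, self.d_model)

        # First and second "convolutions" applied over the 49 spatial positions.
        feat = roi_feat.flatten(2).permute(0, 2, 1)  # (N, 49, d_model)
        feat = self.relu(self.norm1(torch.bmm(feat, w1)))
        feat = self.relu(self.norm2(torch.bmm(feat, w2)))

        # view operation: flatten and project back to the feature dimension.
        return self.out_fc(feat.flatten(1))          # (N, d_model) target feature
```

In the full model this unit would be applied once per refinement round, with the resulting target features and regressed boxes fed back as the next round's proposals, as step S25 describes.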
Preferably, the data-enhancement processing that forms a reconstructed image $\hat{x}_l$ from each input image in the unlabeled data set is implemented by a data enhancement module comprising a first encoder, a second encoder, and a decoder, which performs the following operations (a code sketch follows step S135):
S11. Use the unlabeled data set $X=\{x_1, x_2, \ldots, x_l, \ldots, x_N\}$ to train the second encoder $E_\theta$, with learnable parameters $\theta$, and the decoder $D_\phi$, with learnable parameters $\phi$, on the masked-reconstruction objective, where $M \in \{0,1\}^{W \times H}$ denotes the block-wise binary mask, $W$ denotes the pixel width of the input image $x$, and $H$ denotes its pixel height;
S12. Divide each input image into S image blocks;
S13. Perform the following operations on each divided input image:
S131. Use the first encoder to convert the divided input image into vectors;
S132. Obtain the attention map $\mathrm{Attn}_i$ of the $i$-th image block based on the attention strategy:

$$\mathrm{Attn}_i = q_{cls} \cdot k_i, \quad i \in \{0, 1, \ldots, p^2-1\}$$

where $q_{cls}$ denotes the query of the image-block sequence, $k_i$ denotes the key embedding of the $i$-th image block, and $p$ denotes the number of image blocks along one side of the image;
S133. Sort the attention maps and take the index set $\Omega$ of the top K:

$$\Omega = \operatorname{top\text{-}rank}(\mathrm{Attn}, K)$$

where $\operatorname{top\text{-}rank}(\cdot, K)$ returns the indices of the K largest elements and $\mathrm{Attn}$ denotes the set of the $\mathrm{Attn}_i$;
S134. Obtain the binary mask $M^*$:

$$M^*_{u,v} = \begin{cases} 0, & \exists\, \Omega_i \in \Omega:\; u = \lfloor \Omega_i / p \rfloor,\; v = \operatorname{mod}(\Omega_i, p) \\ 1, & \text{otherwise} \end{cases}$$

where $\lfloor \cdot \rfloor$ denotes the rounding-down operation, $\operatorname{mod}(\cdot)$ denotes the modulo operation, and $\Omega_i$ denotes the $i$-th element of the index set $\Omega$;
S135. Obtain the masked image $M^* \odot x$ from the binary mask $M^*$, divide the masked image into non-overlapping image blocks, discard the blocks occluded by the binary mask, and feed the remaining visible blocks into the pre-trained second encoder and decoder to generate the corresponding reconstructed image $\hat{x}$.
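A minimal sketch of the attention-guided masking of steps S132 to S135 is given below. It assumes a ViT-style first encoder that exposes the class-token query $q_{cls}$ and the per-block key embeddings $k_i$; the function names and tensor shapes are illustrative assumptions.

```python
import torch

def attention_guided_mask(q_cls, keys, p, K):
    """Build the block-wise binary mask M* of steps S132-S134.

    q_cls: (d,) query of the image-block sequence (class token)
    keys:  (p*p, d) key embeddings k_i of the p*p image blocks
    p:     number of image blocks per image side
    K:     number of top-attention blocks to mask
    Returns M* of shape (p, p): 0 at masked block positions, 1 elsewhere.
    """
    attn = keys @ q_cls                         # Attn_i = q_cls . k_i
    omega = torch.topk(attn, K).indices         # Omega = top-rank(Attn, K)
    mask = torch.ones(p, p)
    rows = torch.div(omega, p, rounding_mode="floor")   # floor(Omega_i / p)
    cols = omega % p                                    # mod(Omega_i, p)
    mask[rows, cols] = 0.0                      # occlude most-attended blocks
    return mask

def reconstruct(x, mask, encoder, decoder, block_h, block_w):
    """Step S135 (sketch): mask the image, keep the visible blocks, and let
    the pre-trained second encoder and decoder reconstruct the image."""
    pixel_mask = mask.repeat_interleave(block_h, 0).repeat_interleave(block_w, 1)
    masked = x * pixel_mask                     # masked image M* (.) x
    return decoder(encoder(masked))             # reconstructed image
```

Masking the most-attended blocks forces the decoder to re-synthesize exactly the regions carrying object evidence, which is consistent with the stated aim of obtaining more accurate boxes for small targets.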
Preferably, the loss function $\mathcal{L}$ of the target detection network model is calculated as follows:

$$\mathcal{L} = \lambda_{cls}\,\mathcal{L}_{cls} + \lambda_{L1}\,\mathcal{L}_{L1} + \lambda_{diou}\,\mathcal{L}_{diou}$$

where

$$\mathcal{L}_{cls} = -\sum_{j=1}^{T} \frac{\gamma_j}{\gamma_b}\,\alpha_t\,(1-p_t)^{\gamma_j}\log(p_t), \qquad \gamma_j = \gamma_b + \gamma_v^j, \qquad \gamma_v^j = s\,(1-g_j)$$

$$\mathcal{L}_{L1} = \sum_{z=1}^{n} \lvert y_{pz} - y_{gz} \rvert, \qquad \mathcal{L}_{diou} = 1 - IOU + \frac{\rho^2(b_p, b_g)}{c^2}$$

In these expressions, $\mathcal{L}_{cls}$ is the equalized focal loss between the predicted and true classifications, $\mathcal{L}_{L1}$ is the L1 loss between the predicted and ground-truth boxes, and $\mathcal{L}_{diou}$ is the distance intersection-over-union loss between the predicted and ground-truth boxes; $\lambda_{cls}$, $\lambda_{L1}$, and $\lambda_{diou}$ are their respective coefficients. $\alpha_t$ is a weighting factor balancing the numbers of positive and negative samples, and $p_t$ is the predicted probability of a positive sample. $\gamma_j$ is the focusing coefficient of the $j$-th class, $j = 1, 2, \ldots, T$, with $T$ the total number of classes; it is decoupled into a first component $\gamma_b$, which controls the basic behavior of the classifier, and a variable second component $\gamma_v^j$, which is selected by a gradient-guidance mechanism, where $g_j$ is the cumulative gradient ratio of positive to negative samples of the $j$-th class, constrained to the range $[0, 1]$, and $s$ is a scale factor that determines the upper bound of $\gamma_j$. $y_{pz}$ and $y_{gz}$ are the predicted and ground-truth values, $z = 1, 2, \ldots, n$, with $n$ the number of target objects. $\rho(b_p, b_g)$ is the Euclidean distance between the center point $b_p$ of the predicted box and the center point $b_g$ of the ground-truth box, $c$ is the diagonal length of the smallest rectangle covering both the predicted and the ground-truth box, $\rho^2(b_p, b_g)/c^2$ is the penalty term, and $IOU$ is the intersection-over-union.
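As a concrete reading of the classification term, the sketch below implements the equalized focal loss with the decoupled focusing coefficient described above. The per-class sigmoid formulation and the default values of $\alpha_t$, $\gamma_b$, and $s$ are illustrative assumptions; the cumulative gradient ratios $g_j$ would in practice be accumulated during training.

```python
import torch

def equalized_focal_loss(logits, targets, g, alpha_t=0.25, gamma_b=2.0, s=4.0):
    """Equalized focal loss (sketch).

    logits:  (B, T) raw class scores
    targets: (B, T) one-hot labels
    g:       (T,) cumulative gradient ratio of positive to negative samples
             per class, kept within [0, 1] as described in the text
    """
    gamma_v = s * (1.0 - g)                   # variable component (gradient-guided)
    gamma_j = gamma_b + gamma_v               # decoupled focusing coefficient
    weight = gamma_j / gamma_b                # rare classes contribute more loss

    p = torch.sigmoid(logits)
    p_t = torch.where(targets > 0, p, 1 - p)  # probability of the correct decision
    focal = (1 - p_t) ** gamma_j              # per-class focusing of hard samples
    bce = -torch.log(p_t.clamp(min=1e-8))
    return (weight * alpha_t * focal * bce).sum() / targets.sum().clamp(min=1)
```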
Preferably, the number of iterations E = 6, and the number of proposal boxes and proposal features N = 100.
Compared with the prior art, the beneficial effects of the present invention are as follows.
Unlike conventional detection models, the method recognizes small-pixel targets with high accuracy and adapts well to abnormal weather scenes on expressways, so abnormal objects on the road can be detected more precisely. A masked-reconstruction data enhancement method is used to obtain more accurate boxes for small targets. In view of the few features and imbalanced samples of small targets, the loss function is improved: an equalized focal loss is adopted to alleviate class imbalance and balance the loss contributions of positive/negative and easy/hard samples, thereby raising the accuracy of small-target detection and making the method better suited to expressways.
Brief Description of the Drawings
Fig. 1 is a flow chart of the expressway small-target detection method of the present invention;
Fig. 2 is a schematic structural diagram of the target detection network model of the present invention;
Fig. 3 is a schematic structural diagram of the data enhancement module of the present invention;
Fig. 4 is a schematic structural diagram of the feature extraction module of the present invention;
Fig. 5 is a schematic diagram of the interaction process of the dynamic instance interaction head of the present invention.
Detailed Description of the Embodiments
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. Apparently, the described embodiments are only some, not all, of the embodiments of the application. Based on the embodiments in this application, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the scope of protection of this application.
It should be noted that when a component is said to be "connected" to another component, it may be directly connected to the other component or intervening components may be present. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by a person skilled in the technical field to which this application belongs. The terms used in the specification of this application are only for the purpose of describing specific embodiments and are not intended to limit the application.
As shown in Figs. 1-5, an expressway small-target detection method comprises the following steps.
S1. Obtain an unlabeled data set $X=\{x_1, x_2, \ldots, x_l, \ldots, x_N\}$ and perform data-enhancement processing on each input image in the unlabeled data set to form a corresponding reconstructed image $\hat{x}_l$, where $x_l$ denotes the $l$-th input image, $l = 1, 2, \ldots, N$.
In one embodiment, the data-enhancement processing that forms a reconstructed image $\hat{x}_l$ from each input image in the unlabeled data set is implemented by a data enhancement module comprising a first encoder, a second encoder, and a decoder, which performs the following operations:
S11. Use the unlabeled data set $X=\{x_1, x_2, \ldots, x_l, \ldots, x_N\}$ to train the second encoder $E_\theta$, with learnable parameters $\theta$, and the decoder $D_\phi$, with learnable parameters $\phi$, on the masked-reconstruction objective, where $M \in \{0,1\}^{W \times H}$ denotes the block-wise binary mask, $W$ denotes the pixel width of the input image $x$, and $H$ denotes its pixel height;
S12. Divide each input image into S image blocks;
S13. Perform the following operations on each divided input image:
S131. Use the first encoder to convert the divided input image into vectors;
S132. Obtain the attention map $\mathrm{Attn}_i$ of the $i$-th image block based on the attention strategy:

$$\mathrm{Attn}_i = q_{cls} \cdot k_i, \quad i \in \{0, 1, \ldots, p^2-1\}$$

where $q_{cls}$ denotes the query of the image-block sequence, $k_i$ denotes the key embedding of the $i$-th image block, and $p$ denotes the number of image blocks along one side of the image;
S133. Sort the attention maps and take the index set $\Omega$ of the top K:

$$\Omega = \operatorname{top\text{-}rank}(\mathrm{Attn}, K)$$

where $\operatorname{top\text{-}rank}(\cdot, K)$ returns the indices of the K largest elements and $\mathrm{Attn}$ denotes the set of the $\mathrm{Attn}_i$;
S134. Obtain the binary mask $M^*$:

$$M^*_{u,v} = \begin{cases} 0, & \exists\, \Omega_i \in \Omega:\; u = \lfloor \Omega_i / p \rfloor,\; v = \operatorname{mod}(\Omega_i, p) \\ 1, & \text{otherwise} \end{cases}$$

where $\lfloor \cdot \rfloor$ denotes the rounding-down operation, $\operatorname{mod}(\cdot)$ denotes the modulo operation, and $\Omega_i$ denotes the $i$-th element of the index set $\Omega$;
S135. Obtain the masked image $M^* \odot x$ from the binary mask $M^*$, divide the masked image into non-overlapping image blocks, discard the blocks occluded by the binary mask, and feed the remaining visible blocks into the pre-trained second encoder and decoder to generate the corresponding reconstructed image $\hat{x}$.
As shown in Fig. 3, the data enhancement module comprises the first encoder (Encoder), the second encoder (encoder), and the decoder (decoder). After the attention operation, a heat map (Heat map) visualizes the relationships in the data, and Top-k denotes the sorting of the attention maps.
S2. Establish a target detection network model and detect the reconstructed image $\hat{x}_l$ to obtain the corresponding target detection result. The target detection network model comprises a feature extraction module, a dynamic instance interaction head, and a classification-and-regression branch unit; the feature extraction module adopts an FPN network, and the dynamic instance interaction head comprises N feature extraction units, each comprising a self-attention module, a fully connected layer, a first convolutional layer, a second convolutional layer, a ReLU function, and a view operation. The target detection network model performs the following operations:
S21. Input the reconstructed image $\hat{x}_l$ into the feature extraction module to obtain the corresponding multi-scale feature maps;
S22. Set N proposal boxes and corresponding proposal features, where each proposal box is expressed as a four-dimensional vector of normalized center coordinates, height, and width, and each proposal feature has the same dimension as the output features of the feature extraction module;
S23. Pair the proposal boxes with the multi-scale feature maps one by one and obtain the corresponding RoI features through the RoIAlign operation;
S24. Input the proposal feature and RoI feature of each proposal box, in one-to-one correspondence, into a feature extraction unit of the dynamic instance interaction head to obtain the corresponding target box and target feature. Each feature extraction unit performs the following operations:
apply a self-attention operation to the proposal features using the self-attention module to obtain a first feature;
convert the first feature into a one-dimensional vector through the fully connected layer to form a second feature;
input the RoI feature and the second feature into the first convolutional layer, pass the result through the second convolutional layer and the ReLU function in turn, and then adjust the dimensions with a view operation to obtain the corresponding target feature;
S25. Update the proposal boxes and proposal features to be the target boxes and target features, and return to step S23 until the set number of iterations is completed, obtaining the interaction features;
S26. Input the interaction features into the classification-and-regression branch unit to obtain the target detection result.
As shown in Fig. 5, Proposal Feat denotes the proposal features, Roi Feat denotes the RoI features, Self-Attention denotes the self-attention module, and Parmas denotes the second feature. In Fig. 2, the feature vectors denote the proposal features and RoI features of each proposal box.
In one embodiment, the loss function $\mathcal{L}$ of the target detection network model is calculated as follows:

$$\mathcal{L} = \lambda_{cls}\,\mathcal{L}_{cls} + \lambda_{L1}\,\mathcal{L}_{L1} + \lambda_{diou}\,\mathcal{L}_{diou}$$

where

$$\mathcal{L}_{cls} = -\sum_{j=1}^{T} \frac{\gamma_j}{\gamma_b}\,\alpha_t\,(1-p_t)^{\gamma_j}\log(p_t), \qquad \gamma_j = \gamma_b + \gamma_v^j, \qquad \gamma_v^j = s\,(1-g_j)$$

$$\mathcal{L}_{L1} = \sum_{z=1}^{n} \lvert y_{pz} - y_{gz} \rvert, \qquad \mathcal{L}_{diou} = 1 - IOU + \frac{\rho^2(b_p, b_g)}{c^2}$$

Here $\mathcal{L}_{cls}$ is the equalized focal loss between the predicted and true classifications, $\mathcal{L}_{L1}$ is the L1 loss between the predicted and ground-truth boxes, $\mathcal{L}_{diou}$ is the distance intersection-over-union loss, and $\lambda_{cls}$, $\lambda_{L1}$, $\lambda_{diou}$ are their respective coefficients; $\alpha_t$, $p_t$, $\gamma_j$, $\gamma_b$, $\gamma_v^j$, $g_j$, $s$, $y_{pz}$, $y_{gz}$, $n$, $T$, $\rho(b_p, b_g)$, $c$, the penalty term $\rho^2(b_p, b_g)/c^2$, and $IOU$ are as defined in the summary above.
In one embodiment, the number of iterations E = 6, and the number of proposal boxes and proposal features N = 100.
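For orientation, the iterative refinement of steps S23 to S25 can be sketched as the loop below. For brevity it pools from a single FPN level (spatial scale 1/4, i.e. P2) instead of matching boxes to pyramid levels, and cls_head, reg_head, and the interaction units are assumed modules, so this is a simplification rather than the exact procedure.

```python
import torch
from torchvision.ops import roi_align

def detect(feat_map, boxes, feats, units, cls_head, reg_head):
    """E rounds of RoIAlign + dynamic instance interaction (sketch).

    feat_map: (1, 256, H/4, W/4) one FPN level; boxes: (N, 4) proposal
    boxes in absolute (x1, y1, x2, y2) coordinates; feats: (N, 256)
    proposal features; units: list of E interaction units (E = 6 here).
    """
    for unit in units:
        rois = torch.cat([boxes.new_zeros(len(boxes), 1), boxes], dim=1)
        roi_feat = roi_align(feat_map, rois, output_size=7, spatial_scale=0.25)
        feats = unit(roi_feat, feats)       # dynamic instance interaction (S24)
        boxes = boxes + reg_head(feats)     # updated target boxes (S25)
    return cls_head(feats), boxes           # interaction features -> S26
```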
Specifically, the feature extraction module in this embodiment adopts a ResNet-based FPN network. The FPN network is a feature pyramid, which is prior art; its structure is shown in Fig. 4 and can be obtained by the following steps: (1) a bottom-up path through the backbone, using the feature activations output by the last residual structure of each stage, with the outputs of the residual modules conv2, conv3, conv4, and conv5 denoted $\{C_2, C_3, C_4, C_5\}$; (2) a top-down path with lateral connections, in which the deep feature maps are upsampled to obtain higher-resolution maps that are then merged with the bottom-up feature maps through lateral connections, as shown in Fig. 4. A feature pyramid from $P_2$ to $P_5$ is thus constructed. With $l$ denoting the pyramid level, the resolution of the feature map at each level is $2^l$ times lower than the input image, and all pyramid levels have 256 channels. Let the size of the reconstructed image be $h \times w$, where $h$ is its height and $w$ its width. The outputs of each stage of the FPN network are shown in Table 1 below:
Table 1

| Pyramid level | Output feature map size | Channels |
|---|---|---|
| $P_2$ | $h/4 \times w/4$ | 256 |
| $P_3$ | $h/8 \times w/8$ | 256 |
| $P_4$ | $h/16 \times w/16$ | 256 |
| $P_5$ | $h/32 \times w/32$ | 256 |
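For reference, torchvision provides a generic FPN module matching the lateral-connection, top-down structure of Fig. 4. The sketch below wires it to the C2 to C5 channel widths of a ResNet-50 backbone, which is an assumption, since the embodiment only states that the FPN is ResNet-based.

```python
import torch
from torchvision.ops import FeaturePyramidNetwork

# Channel widths of ResNet-50 stages conv2-conv5 (assumed backbone).
fpn = FeaturePyramidNetwork(in_channels_list=[256, 512, 1024, 2048],
                            out_channels=256)

h, w = 768, 1280  # reconstructed-image size h x w (illustrative values)
feats = {
    "C2": torch.rand(1, 256,  h // 4,  w // 4),
    "C3": torch.rand(1, 512,  h // 8,  w // 8),
    "C4": torch.rand(1, 1024, h // 16, w // 16),
    "C5": torch.rand(1, 2048, h // 32, w // 32),
}
pyramid = fpn(feats)  # P2-P5 maps, each with 256 channels as in Table 1
for name, f in pyramid.items():
    print(name, tuple(f.shape))
```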
Both the proposal boxes and the proposal features are learnable, and they correspond one to one. A set of learnable target boxes is used as region proposals; each is represented by four parameters ranging from 0 to 1, namely the normalized center coordinates, height, and width. The parameters of the proposal boxes are updated during training by the backpropagation algorithm. Backpropagation, currently the most common method for training artificial neural networks, propagates the output-layer error backwards layer by layer and updates the network parameters via partial derivatives so as to minimize the error loss function. The learnable proposal boxes are statistics of potential target locations in the training set and can be regarded as an initial guess, irrespective of the input, of the regions in an image most likely to contain targets. However, these boxes provide only coarse localization and lose the pose and shape of the object, which hampers subsequent classification and regression; the learnable proposal features are therefore used to characterize each instance. Each is a high-dimensional latent vector that encodes rich instance properties.
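A minimal sketch of the learnable proposal boxes and features follows; the whole-image initialization mirrors the common Sparse R-CNN convention and is an assumption, while N = 100 and the 4-d normalized box parameterization come from the description above.

```python
import torch.nn as nn

N, d_model = 100, 256

# N proposal boxes as learnable 4-d vectors (cx, cy, h, w) in [0, 1].
proposal_boxes = nn.Embedding(N, 4)
nn.init.constant_(proposal_boxes.weight[:, :2], 0.5)  # centers mid-image
nn.init.constant_(proposal_boxes.weight[:, 2:], 1.0)  # full-image extent

# N proposal features: one high-dimensional latent vector per box, with
# the same dimension as the FPN output features.
proposal_features = nn.Embedding(N, d_model)

# Both are ordinary parameters, so backpropagation updates them jointly
# with the rest of the detector during training.
```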
The classification and regression branch unit is prior art; for example, the regression prediction is performed by a three-layer perceptron and the classification prediction by a linear mapping layer, which are not elaborated here. The target detection network model applies a set prediction loss over the fixed-size set of classification and box-coordinate predictions. The set-based loss produces an optimal bipartite matching between predictions and ground-truth objects. For example, if the model outputs 100 target boxes, the ground-truth boxes are also padded to a set of 100, so that both the predictions and the ground truth are sets of 100 elements. Bipartite matching is performed with the Hungarian algorithm, i.e., the elements of the prediction set and the ground-truth set are put in one-to-one correspondence so as to minimize the matching cost, and the loss is then computed over the matched positive and negative sample pairs.
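The set-based matching described here can be sketched with SciPy's Hungarian solver. The cost below combines only the classification probability and the box L1 distance, a simplification of the full matching cost, and all shapes are illustrative.

```python
import torch
from scipy.optimize import linear_sum_assignment

def hungarian_match(pred_probs, pred_boxes, gt_labels, gt_boxes):
    """One-to-one matching of N predictions to n ground-truth objects.

    pred_probs: (N, T) class probabilities;  pred_boxes: (N, 4)
    gt_labels:  (n,) class indices;          gt_boxes:   (n, 4)
    Returns (pred_idx, gt_idx) minimizing the total matching cost.
    """
    cost_cls = -pred_probs[:, gt_labels]              # (N, n)
    cost_l1 = torch.cdist(pred_boxes, gt_boxes, p=1)  # (N, n)
    cost = (cost_cls + cost_l1).cpu().numpy()
    pred_idx, gt_idx = linear_sum_assignment(cost)    # Hungarian algorithm
    return pred_idx, gt_idx  # matched pairs are positives, the rest negatives
```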
In the loss function $\mathcal{L}$ of the target detection network model, $\gamma_j$ balances hard and easy samples, while $\gamma_b$ controls a basic behavior of the classifier and does not act on the class-imbalance problem. $\gamma_v^j$ is a variable parameter that determines how much attention is paid to learning the $j$-th class under positive-negative imbalance, and a gradient-guidance mechanism is used to select it. To better meet practical needs, $g_j$ is kept within the range $[0, 1]$. The weight coefficient balances the loss contributions of the different classes, letting rare samples contribute more loss than common ones: for rare-class data the weight coefficient is set to a larger value to increase its loss contribution, while for frequent-class data it stays close to 1. The final loss is the sum over all pairs, normalized by the number of objects in the training batch.
Since the kinds of spilled goods, animals, and litter that appear on expressways are limited, and the numbers per class are extremely imbalanced, a focusing coefficient and a weight coefficient are added to the traditional focal loss to address the extreme class imbalance. This embodiment adopts the distance intersection-over-union (DIoU). Because objects scattered on an expressway occupy a very small portion of the real-time camera image, they are small targets, and the predicted box is often larger than the ground-truth box, forming a containment relationship in which the loss is the same whether the target box lies at the center or at a corner of the predicted box. A penalty term is therefore added to measure the distance between the center points of the target box and the predicted box.
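A sketch of the distance-IoU term with its center-distance penalty $\rho^2(b_p, b_g)/c^2$ follows; the corner-format (x1, y1, x2, y2) boxes are an illustrative choice.

```python
import torch

def diou_loss(pred, gt):
    """DIoU loss for boxes in (x1, y1, x2, y2) format, both of shape (n, 4)."""
    lt = torch.max(pred[:, :2], gt[:, :2])
    rb = torch.min(pred[:, 2:], gt[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    iou = inter / (area_p + area_g - inter).clamp(min=1e-8)

    # Penalty: squared center distance over the squared diagonal of the
    # smallest rectangle covering both boxes, so a small target box fully
    # contained in a larger predicted box still receives a useful gradient.
    rho2 = (((pred[:, :2] + pred[:, 2:]) - (gt[:, :2] + gt[:, 2:])) ** 2).sum(1) / 4
    enc = torch.max(pred[:, 2:], gt[:, 2:]) - torch.min(pred[:, :2], gt[:, :2])
    c2 = (enc ** 2).sum(dim=1).clamp(min=1e-8)

    return (1 - iou + rho2 / c2).mean()
```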
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments have been described; however, as long as a combination of these technical features involves no contradiction, it should be regarded as falling within the scope of this specification.
The above embodiments only express several specific and detailed implementations of the present application and should not therefore be construed as limiting the scope of the patent application. It should be noted that a person of ordinary skill in the art can make several modifications and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent application shall be subject to the appended claims.
Claims (4)
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310269546.6A | 2023-03-20 | 2023-03-20 | A small target detection method on expressway |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN116311062A | 2023-06-23 |
Family (ID: 86802786)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202310269546.6A | A small target detection method on expressway | 2023-03-20 | 2023-03-20 |
Cited By (2)

| Publication Number | Priority Date | Publication Date | Title |
|---|---|---|---|
| CN117576217A | 2024-01-12 | 2024-02-20 | Object pose estimation method based on single-instance image reconstruction |
| CN117576217B | 2024-01-12 | 2024-03-26 | An object pose estimation method based on single instance image reconstruction |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |