CN116311062A - Highway small target detection method - Google Patents

Highway small target detection method

Info

Publication number
CN116311062A
CN202310269546.6A · CN116311062A
Authority
CN
China
Prior art keywords: features, image, representing, target detection, target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310269546.6A
Other languages
Chinese (zh)
Inventor
邵奇可
郑泖琛
叶文武
颜世航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202310269546.6A priority Critical patent/CN116311062A/en
Publication of CN116311062A publication Critical patent/CN116311062A/en
Pending legal-status Critical Current

Classifications

    • G06V 20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects (G06V 20/00 Scenes; scene-specific elements; G06V 20/50 Context or environment of the image)
    • G06N 3/08: Learning methods (G06N 3/00 Computing arrangements based on biological models; G06N 3/02 Neural networks)
    • G06V 10/42: Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation (G06V 10/40 Extraction of image or video features)
    • G06V 10/764: Recognition using pattern recognition or machine learning, using classification, e.g. of video objects (G06V 10/70)
    • G06V 10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting (G06V 10/77 Processing image or video features in feature spaces)
    • G06V 10/82: Recognition using pattern recognition or machine learning, using neural networks (G06V 10/70)
    • G06V 2201/07: Target detection (G06V 2201/00 Indexing scheme relating to image or video recognition or understanding)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a method for detecting small targets on an expressway, which comprises the following steps: acquiring an unlabeled data set and performing data enhancement processing on each input image in the unlabeled data set to form a corresponding reconstructed image x̂; establishing a target detection network model and detecting the reconstructed image x̂ to obtain the corresponding target detection result. Unlike conventional target detection models, the method achieves higher accuracy in identifying small-pixel targets and adapts well to abnormal weather scenes on expressways, so abnormal objects on the expressway can be detected more accurately. A data enhancement method based on masked reconstruction yields more accurate bounding boxes for small target objects; the loss function is improved for the characteristics of small targets, namely few available features and unbalanced samples, and a balanced focal loss function is adopted to alleviate the class-imbalance problem, thereby improving the accuracy of small target detection and making the method better suited to expressway application.

Description

Highway small target detection method
Technical Field
The invention belongs to the technical field of image recognition and computer vision, and particularly relates to a method for detecting a small target on a highway.
Background
The expressway is a symbol of modernization and a reflection of a country's comprehensive national strength; its construction and operation touch many aspects of the national economy and social life. However, objects other than vehicles, such as cargo, animals and garbage, can appear on the expressway and pose serious safety hazards. With computer vision technology, cameras can collect real-time images to detect foreign objects appearing on the expressway, so that measures can be taken in time and the expressway kept clear.
Existing target detection methods are mostly based on deep learning. Typically, data sets of several target classes are first collected, a generic target detection model is then trained on them, and the trained model is finally used for detection. Although current deep-learning-based methods achieve high detection accuracy in general, in images collected on expressways the foreign objects occupy few pixels, offer few usable features and require precise localization, so such targets are difficult to detect and may even be missed. The practical effect of existing target detection models applied to the expressway is therefore far from ideal.
Disclosure of Invention
The invention aims to solve the above problems and provides a method for detecting small targets on an expressway, so as to overcome the difficulty that conventional target detection models can hardly achieve a good detection effect in this scenario.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
the invention provides a method for detecting a small target on a highway, which comprises the following steps:
s1, acquiring an unlabeled data set X= { X 1 ,x 2 ,…,x l ,…,x N Performing data enhancement processing on each input image in the unlabeled dataset to form a corresponding reconstructed image
Figure BDA0004134168850000011
x l Representing the first input image, l=1, 2, …, N;
s2, establishing a target detection network model and reconstructing an image
Figure BDA0004134168850000012
Detecting to obtain a corresponding target detection result, wherein the target detection network model comprises a feature extraction module, a dynamic instance interaction head and a classification and regression branch unit, the feature extraction module adopts an FPN network, the dynamic instance interaction head comprises N feature extraction units, each feature extraction unit comprises a self-attention module, a full connection layer, a first convolution layer, a second convolution layer, a ReLu function and view operation, and the target detection network model executes the following operations:
s21, reconstructing an image
Figure BDA0004134168850000013
Inputting a feature extraction module to obtain a corresponding multi-scale feature map;
s22, setting N suggestion frames and corresponding suggestion features, wherein the suggestion frames are expressed as four-dimensional vectors formed by normalized center coordinates, heights and widths, and the suggestion features have the same dimension as the output features of the feature extraction module;
s23, obtaining corresponding ROI features through RoIAlign operation by correspondingly enabling the suggestion frames and the multi-scale feature map one by one;
s24, inputting the suggested features and the ROI features of each suggested frame into a feature extraction unit of the dynamic instance interaction head in a one-to-one correspondence mode, obtaining corresponding target frames and target features, and executing the following operations by the feature extraction unit:
performing self-attention operation on the suggested features by using a self-attention module to obtain first features;
converting the first feature into a one-dimensional vector through the full connection layer to form a second feature;
inputting the ROI features and the second features into a first convolution layer, sequentially passing through the second convolution layer and a ReLu function, and then adopting view operation to adjust the dimensions to obtain corresponding target features;
s25, updating the suggested frame and the suggested features to be the target frame and the target features, and returning to the step S23 until the iteration times are completed, so as to obtain interaction features;
s26, inputting the interaction characteristics into a classification and regression branch unit to obtain a target detection result.
Preferably, the data enhancement processing of each input image in the unlabeled data set to form the corresponding reconstructed image x̂ is realized by a data enhancement module, the data enhancement module comprising a first encoder, a second encoder and a decoder and performing the following operations:
s11, adopting an unlabeled data set X= { X 1 ,x 2 ,…,x l ,…,x N Training a second encoder and decoder, wherein the second encoder E θ Is satisfied by the learnable parameter theta
Figure BDA0004134168850000021
Decoder->
Figure BDA0004134168850000022
Satisfy->
Figure BDA0004134168850000023
M∈{0,1} W×H A block-wise binary mask representing pixels of an image block size W x H, W representing the pixel width of the input image x and H representing the pixel height of the input image x;
s12, dividing each input image into S image blocks;
s13, executing the following operations on each divided input image:
s131, converting the divided input image into vectors by using a first encoder;
s132, acquiring attention map Attn of the ith image block based on the attention policy i
Attn i =q cls ·k i ,i∈{0,1,…,p 2 -1}
in the formula ,qcls Queries representing sequences of image blocks, k i Key embedding representing the ith image block, p representing the size of the image block;
s133, acquiring the first K index sets omega by sequencing each attention attempt:
Ω=top-rank(Attn,K)
in the formula, top-rank (·, K) represents the index of the last K largest elements returned, and Attn represents Attn i Is a collection of (3);
s134, obtaining binary mask M *
Figure BDA0004134168850000024
in the formula ,
Figure BDA0004134168850000025
represents a round-down operation, mod (·) represents a modulo operation, Ω i Representing the i-th element in the index set Ω;
s135, according to binary mask M * Acquiring a masking image M * As indicated by x, dividing the masking image into non-overlapping image blocks and discarding the image blocks blocked by the binary mask, and feeding the remaining visible image blocks into a pre-trained second encoder and decoder to generate a corresponding reconstructed image
Figure BDA0004134168850000026
Preferably, the loss function L of the target detection network model is calculated as follows:

L = λ_cls · L_cls + λ_L1 · L_L1 + λ_diou · L_diou

L_cls = -α_t (1 - p_t)^{γ_j} log(p_t)

L_L1 = Σ_{z=1}^{n} |y_pz - y_gz|

L_diou = 1 - IOU + ρ²(b_p, b_g) / c²

where L_cls is the balanced focal loss between the predicted and true classifications, L_L1 is the L1 loss between the predicted and real frames, L_diou is the distance intersection-over-union loss between the predicted and real frames, and λ_cls, λ_L1, λ_diou are the coefficients corresponding in turn to L_cls, L_L1 and L_diou; α_t is the weight factor balancing the numbers of positive and negative samples, p_t is the probability that the prediction is a positive sample, γ_j is the focusing coefficient of the j-th class, j = 1, 2, …, T, and T is the total number of classes; γ_j is decoupled into a first component γ_b and a second component γ_j^v, where the first component γ_b controls the basic behavior of the classifier, γ_j^v is a variable parameter, and a gradient guidance mechanism is used to select γ_j^v = s(1 - g_j); g_j represents the cumulative gradient ratio of the positive and negative samples of the j-th class, with value range [0,1], and s is the scale factor determining the upper limit of γ_j; y_pz represents the predicted value, y_gz represents the true value, z = 1, 2, …, n, and n represents the number of target objects; ρ²(b_p, b_g) represents the squared distance between the center point b_p of the predicted frame and the center point b_g of the real frame, c represents the diagonal distance of the smallest rectangle covering both the predicted and real frames, ρ²(b_p, b_g)/c² is the penalty term, and IOU represents the intersection-over-union.
Preferably, the number of iterations E = 6, and the number of suggestion boxes and suggested features N = 100.
Compared with the prior art, the invention has the beneficial effects that:
the method is different from the traditional target detection model, has higher precision in identifying the small pixel targets, has the characteristics of strong adaptability to abnormal weather scenes of the expressway and the like, can more accurately detect abnormal objects on the expressway, obtains a more accurate frame for detecting the small target objects by using a data enhancement method of masking reconstruction, aims at the characteristics of small target objects, such as few characteristics and unbalanced samples, improves a loss function, adopts an equilibrium focus loss function to relieve the problem of unbalanced categories, balances the loss contribution of the difficult samples of positive and negative samples, thereby improving the precision of small target detection, and is better applied to the expressway.
Drawings
FIG. 1 is a flow chart of a method for detecting a small target on a highway according to the present invention;
FIG. 2 is a schematic diagram of a target detection network model according to the present invention;
FIG. 3 is a schematic diagram of a data enhancement module according to the present invention;
FIG. 4 is a schematic diagram of a feature extraction module according to the present invention;
FIG. 5 is a schematic diagram of an interaction process of the dynamic instance interaction head of the present invention.
Detailed Description
The following description of the technical solutions in the embodiments of the present application will be made clearly and completely with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
It will be understood that when an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
As shown in fig. 1 to 5, a method for detecting a small target on an expressway includes the steps of:
s1, acquiring an unlabeled data set X= { X 1 ,x 2 ,…,x l ,…,x N And data augmentation of each input image in the unlabeled datasetPerforming strong processing to form corresponding reconstructed image
Figure BDA0004134168850000041
x l The first input image is represented, i=1, 2, …, N.
In one embodiment, the data enhancement processing of each input image in the unlabeled data set to form the corresponding reconstructed image x̂ is realized by a data enhancement module, the data enhancement module comprising a first encoder, a second encoder and a decoder and performing the following operations:
s11, adopting an unlabeled data set X= { X 1 ,x 2 ,…,x l ,…,x N Training a second encoder and decoder, wherein the second encoder E θ Is satisfied by the learnable parameter theta
Figure BDA0004134168850000042
Decoder->
Figure BDA0004134168850000043
Satisfy->
Figure BDA0004134168850000044
M∈{0,1} W×H A block-wise binary mask representing pixels of an image block size W x H, W representing the pixel width of the input image x and H representing the pixel height of the input image x;
s12, dividing each input image into S image blocks;
s13, executing the following operations on each divided input image:
s131, converting the divided input image into vectors by using a first encoder;
s132, acquiring attention map Attn of the ith image block based on the attention policy i
Attn i =q cls ·k i ,i∈{0,1,…,p 2 -1}
in the formula ,qcls Queries representing sequences of image blocks, k i Key embedding representing the ith image block, p representing the size of the image block;
s133, acquiring the first K index sets omega by sequencing each attention attempt:
Ω=top-rank(Attn,K)
in the formula, top-rank (·, K) represents the index of the last K largest elements returned, and Attn represents Attn i Is a collection of (3);
s134, obtaining binary mask M *
Figure BDA0004134168850000045
in the formula ,
Figure BDA0004134168850000046
represents a round-down operation, mod (·) represents a modulo operation, Ω i Representing the i-th element in the index set Ω;
s135, according to binary mask M * Acquiring a masking image M * As indicated by x, dividing the masking image into non-overlapping image blocks and discarding the image blocks blocked by the binary mask, and feeding the remaining visible image blocks into a pre-trained second encoder and decoder to generate a corresponding reconstructed image
Figure BDA0004134168850000047
As shown in fig. 3, the data enhancement module includes a first encoder (Encoder), a second encoder (Encoder) and a decoder (Decoder); the relationships between data after the attention operation are visualized through a heat map (Heat map), and Top-k denotes the sorting of the attention maps.
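As an aid to reading S132–S134, a minimal sketch of the attention-guided mask construction might look as follows; the tensor shapes and the convention that the top-K attended blocks are the ones set to 0 in the mask are assumptions, not statements about the exact implementation.

```python
# Minimal sketch (assumed shapes and mask convention) of S132-S134.
import torch

def build_attention_mask(q_cls, keys, p, top_k):
    """q_cls: (d,) query of the image-block sequence; keys: (p*p, d) key embeddings of the blocks.
    Returns a (p, p) block-wise binary mask; here the top-K attended blocks are set to 0."""
    attn = keys @ q_cls                                 # Attn_i = q_cls . k_i, i in {0, ..., p^2 - 1}
    omega = torch.topk(attn, top_k).indices             # Omega = top-rank(Attn, K)
    mask = torch.ones(p, p)
    rows = torch.div(omega, p, rounding_mode='floor')   # floor(Omega_i / p)
    cols = omega % p                                    # mod(Omega_i, p)
    mask[rows, cols] = 0.0                              # blocks indexed by Omega are marked in M*
    return mask
```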
S2, establishing a target detection network model and detecting the reconstructed image x̂ to obtain the corresponding target detection result, wherein the target detection network model comprises a feature extraction module, a dynamic instance interaction head and a classification and regression branch unit; the feature extraction module adopts an FPN network, the dynamic instance interaction head comprises N feature extraction units, and each feature extraction unit comprises a self-attention module, a fully connected layer, a first convolution layer, a second convolution layer, a ReLU function and a view operation; the target detection network model executes the following operations:
s21, reconstructing an image
Figure BDA0004134168850000051
Inputting a feature extraction module to obtain a corresponding multi-scale feature map;
s22, setting N suggestion frames and corresponding suggestion features, wherein the suggestion frames are expressed as four-dimensional vectors formed by normalized center coordinates, heights and widths, and the suggestion features have the same dimension as the output features of the feature extraction module;
s23, obtaining corresponding ROI features through RoIAlign operation by correspondingly enabling the suggestion frames and the multi-scale feature map one by one;
s24, inputting the suggested features and the ROI features of each suggested frame into a feature extraction unit of the dynamic instance interaction head in a one-to-one correspondence mode, obtaining corresponding target frames and target features, and executing the following operations by the feature extraction unit:
performing self-attention operation on the suggested features by using a self-attention module to obtain first features;
converting the first feature into a one-dimensional vector through the full connection layer to form a second feature;
inputting the ROI features and the second features into a first convolution layer, sequentially passing through the second convolution layer and a ReLu function, and then adopting view operation to adjust the dimensions to obtain corresponding target features;
s25, updating the suggested frame and the suggested features to be the target frame and the target features, and returning to the step S23 until the iteration times are completed, so as to obtain interaction features;
s26, inputting the interaction characteristics into a classification and regression branch unit to obtain a target detection result.
As shown in fig. 5, Propos Feat denotes the suggested feature, Roi Feat denotes the ROI feature, Self-Attention denotes the self-attention module, and Parmas denotes the second feature. In fig. 2, the feature vector represents the suggested feature and ROI feature of each suggestion box.
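To make the interaction in S24 concrete, the following is a simplified PyTorch sketch of one feature extraction unit; the channel sizes, the use of generated 1×1 (dynamic) convolution parameters and the final projection are assumptions for illustration, and the target-box regression is left to the regression branch.

```python
import torch
import torch.nn as nn

class InteractionUnitSketch(nn.Module):
    """Sketch of one feature extraction unit of the dynamic instance interaction head
    (dimensions and the dynamic-convolution realization are illustrative assumptions)."""
    def __init__(self, d_model=256, d_hidden=64, pooled=7):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        self.fc_params = nn.Linear(d_model, 2 * d_model * d_hidden)   # produces the "second feature"
        self.norm = nn.LayerNorm(d_hidden)
        self.relu = nn.ReLU(inplace=True)
        self.out_proj = nn.Linear(d_model * pooled * pooled, d_model)
        self.d_model, self.d_hidden = d_model, d_hidden

    def forward(self, proposal_feats, roi_feats):
        """proposal_feats: (N, d_model); roi_feats: (N, d_model, 7, 7) -> target features (N, d_model)."""
        n = proposal_feats.size(0)
        x = proposal_feats.unsqueeze(0)                               # (1, N, d_model)
        first, _ = self.self_attn(x, x, x)                            # self-attention -> "first feature"
        params = self.fc_params(first.squeeze(0))                     # FC layer -> 1-D parameter vector
        w1 = params[:, :self.d_model * self.d_hidden].view(n, self.d_model, self.d_hidden)
        w2 = params[:, self.d_model * self.d_hidden:].view(n, self.d_hidden, self.d_model)
        roi = roi_feats.flatten(2).permute(0, 2, 1)                   # (N, 49, d_model)
        h = self.relu(self.norm(torch.bmm(roi, w1)))                  # stands in for the first convolution
        h = self.relu(torch.bmm(h, w2))                               # stands in for the second convolution
        return self.out_proj(h.reshape(n, -1))                        # view/reshape + projection back
```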
In one embodiment, the loss function L of the target detection network model is calculated as follows:

L = λ_cls · L_cls + λ_L1 · L_L1 + λ_diou · L_diou

L_cls = -α_t (1 - p_t)^{γ_j} log(p_t)

L_L1 = Σ_{z=1}^{n} |y_pz - y_gz|

L_diou = 1 - IOU + ρ²(b_p, b_g) / c²

where L_cls is the balanced focal loss between the predicted and true classifications, L_L1 is the L1 loss between the predicted and real frames, L_diou is the distance intersection-over-union loss between the predicted and real frames, and λ_cls, λ_L1, λ_diou are the coefficients corresponding in turn to L_cls, L_L1 and L_diou; α_t is the weight factor balancing the numbers of positive and negative samples, p_t is the probability that the prediction is a positive sample, γ_j is the focusing coefficient of the j-th class, j = 1, 2, …, T, and T is the total number of classes; γ_j is decoupled into a first component γ_b and a second component γ_j^v, where the first component γ_b controls the basic behavior of the classifier, γ_j^v is a variable parameter, and a gradient guidance mechanism is used to select γ_j^v = s(1 - g_j); g_j represents the cumulative gradient ratio of the positive and negative samples of the j-th class, with value range [0,1], and s is the scale factor determining the upper limit of γ_j; y_pz represents the predicted value, y_gz represents the true value, z = 1, 2, …, n, and n represents the number of target objects; ρ²(b_p, b_g) represents the squared distance between the center point b_p of the predicted frame and the center point b_g of the real frame, c represents the diagonal distance of the smallest rectangle covering both the predicted and real frames, ρ²(b_p, b_g)/c² is the penalty term, and IOU represents the intersection-over-union.
In one embodiment, the number of iterations E = 6, and the number of suggestion boxes and suggested features N = 100.
Specifically, the feature extraction module in this embodiment uses an FPN network based on ResNet. The FPN network is a feature pyramid whose structure is shown in fig. 4 and which is obtained by the following steps: (1) the bottom-up path: through the backbone, the feature activation output of the last residual structure of each stage is used, and the outputs of the residual modules conv2, conv3, conv4 and conv5 are denoted as {C_2, C_3, C_4, C_5}; (2) the top-down path: the deep feature map is up-sampled to obtain a map with higher resolution, and the up-sampled feature map is then fused with the corresponding bottom-up feature map through a lateral connection, as shown in fig. 4, so as to construct the feature maps P_2 to P_5. The pyramid level is denoted by l, the resolution of the feature map at level l is 2^l times lower than that of the input image, and all pyramid levels have 256 channels. The reconstructed image x̂ has size h × w, where h is the height and w is the width of the reconstructed image. The outputs of the stages of the FPN network are listed in Table 1 of the original specification (the table itself is provided as a figure).
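A minimal sketch of the FPN described above is given below; the backbone channel counts for C2–C5 and the use of element-wise addition for the top-down fusion follow the standard FPN design and are assumptions rather than details taken from the patent.

```python
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    """Builds P2-P5 (256 channels each) from backbone maps C2-C5; a standard-FPN sketch."""
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
        self.output = nn.ModuleList(nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                    for _ in in_channels)

    def forward(self, c2, c3, c4, c5):
        feats = [lat(c) for lat, c in zip(self.lateral, (c2, c3, c4, c5))]
        # top-down path: up-sample the deeper map and fuse it with the lateral connection
        for i in range(len(feats) - 1, 0, -1):
            feats[i - 1] = feats[i - 1] + F.interpolate(feats[i], size=feats[i - 1].shape[-2:])
        return [out(f) for out, f in zip(self.output, feats)]          # [P2, P3, P4, P5]
```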
The suggestion boxes and the suggested features are both learnable and are in one-to-one correspondence. A set of learnable target boxes is used as region proposals; each box is represented by four parameters ranging from 0 to 1, namely the normalized center coordinates, height and width. The parameters of the suggestion boxes are updated during training by the back-propagation algorithm. Back-propagation is currently the most common method for training artificial neural networks: it propagates the error at the output layer backwards layer by layer and updates the network parameters by computing partial derivatives so that the error loss function is minimized. These learnable suggestion boxes are statistics of potential target locations in the training set and can be seen as initial guesses of the regions most likely to contain targets, regardless of the input. However, such boxes only provide rough positioning information and lose the pose and shape of the object, which is detrimental to subsequent classification and regression; therefore, each instance is also characterized by a learnable suggested feature, a high-dimensional latent vector encoding rich instance properties.
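As an illustration of the learnable region proposals described above, such boxes and features can be held as embedding tables that are updated by back-propagation; the centered whole-image initialization shown here is an assumed choice, not one stated in the patent.

```python
import torch.nn as nn

num_proposals, d_model = 100, 256
proposal_boxes = nn.Embedding(num_proposals, 4)        # learnable normalized (cx, cy, w, h) per proposal
proposal_feats = nn.Embedding(num_proposals, d_model)  # learnable high-dimensional suggested features
nn.init.constant_(proposal_boxes.weight[:, :2], 0.5)   # assumed init: boxes centered on the image
nn.init.constant_(proposal_boxes.weight[:, 2:], 1.0)   # assumed init: boxes covering the whole image
```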
The classification and regression branch unit follows the prior art: for example, regression prediction is performed with a three-layer perceptron and classification prediction is realized by a linear mapping layer, which are not described in detail here. The target detection network model applies a set prediction loss to the fixed-size set of classification and box-coordinate predictions, and this loss produces an optimal bipartite matching between the predicted and real objects. For example, if the target detection network model predicts 100 target boxes, the real boxes are padded to a set of 100 as well, so that both the prediction set and the ground-truth set contain 100 elements. The elements of the prediction set and of the ground-truth set are put into one-to-one correspondence by bipartite matching with the Hungarian algorithm so that the matching loss is minimized, and the loss is then computed over the matched positive and negative sample pairs.
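A small sketch of the one-to-one set matching mentioned above, using the Hungarian algorithm as implemented by scipy.optimize.linear_sum_assignment; the random cost matrix is a placeholder, and in practice it would be built from the classification, L1 and DIoU terms of each candidate pair.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_match(cost_matrix):
    """cost_matrix[i, j]: matching cost between predicted box i and (padded) real box j;
    returns one-to-one index pairs that minimize the total matching cost."""
    rows, cols = linear_sum_assignment(cost_matrix)
    return list(zip(rows.tolist(), cols.tolist()))

costs = np.random.rand(100, 100)    # placeholder 100 x 100 cost matrix for illustration only
pairs = hungarian_match(costs)      # e.g. [(0, 17), (1, 4), ...]
```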
In the loss function L of the target detection network model, γ_j plays the role of balancing hard and easy samples, while γ_b controls the basic behavior of the classifier and does not act on the class-imbalance problem. The variable component γ_j^v determines how much attention the j-th class pays to the positive-negative imbalance problem, and a gradient guidance mechanism is used to select γ_j^v; to better meet the requirements, g_j is kept within the range [0,1] in practice. The weight coefficient is used to balance the loss contributions of different classes, so that rare samples contribute more to the loss than common samples: for rare classes the weight coefficient is set to a larger value to increase their loss contribution, while for frequent classes the weight coefficient remains around 1. The final loss is the sum over all pairs, normalized by the number of objects in the training batch.
The kinds of spilled goods, animals and garbage occurring on the expressway are not very numerous, and the numbers of the different categories are extremely unbalanced. To address this extreme class imbalance, a focusing coefficient and a weight coefficient are added on the basis of the traditional focal loss function. This embodiment also adopts the distance intersection-over-union (DIoU) loss: because scattered objects occupy a small proportion of the image captured by the real-time camera, the predicted box is often larger than the real box and tends to contain it, in which case the plain overlap loss is the same whether the target box lies at the center or at a corner of the predicted box. The penalty term ρ²(b_p, b_g)/c², which measures the distance between the center points of the target box and the predicted box, is therefore added.
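For readers who prefer code, the sketch below shows one plausible realization of the DIoU term and of a class-dependent focal term with a decoupled focusing coefficient; the exact closed form of the balanced focal loss in the patent is given only as an image, so the expressions here (in particular gamma_j = gamma_b + s * (1 - g_j)) are an assumed reading of the surrounding text rather than the patent's own formula.

```python
import torch

def diou_loss(pred, gt):
    """DIoU loss for (x1, y1, x2, y2) boxes: 1 - IOU + rho^2(b_p, b_g) / c^2."""
    ix1, iy1 = torch.max(pred[:, 0], gt[:, 0]), torch.max(pred[:, 1], gt[:, 1])
    ix2, iy2 = torch.min(pred[:, 2], gt[:, 2]), torch.min(pred[:, 3], gt[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    iou = inter / (area_p + area_g - inter + 1e-7)
    centre_p = (pred[:, :2] + pred[:, 2:]) / 2                   # b_p
    centre_g = (gt[:, :2] + gt[:, 2:]) / 2                       # b_g
    rho2 = ((centre_p - centre_g) ** 2).sum(dim=1)               # rho^2(b_p, b_g)
    ex1, ey1 = torch.min(pred[:, 0], gt[:, 0]), torch.min(pred[:, 1], gt[:, 1])
    ex2, ey2 = torch.max(pred[:, 2], gt[:, 2]), torch.max(pred[:, 3], gt[:, 3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-7              # squared diagonal of enclosing box
    return 1.0 - iou + rho2 / c2

def balanced_focal_term(p_t, alpha_t, gamma_b, s, g_j):
    """Per-sample focal term with a class-dependent focusing coefficient (assumed closed form)."""
    gamma_j = gamma_b + s * (1.0 - g_j)                          # decoupled focusing coefficient
    return -alpha_t * (1.0 - p_t) ** gamma_j * torch.log(p_t.clamp(min=1e-7))
```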
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as the combinations of technical features are not contradictory, they should be considered to fall within the scope of this description.
The above embodiments merely represent several specific and detailed implementations of the present application and are not to be construed as limiting the scope of the claims. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Accordingly, the protection scope of the present application shall be subject to the appended claims.

Claims (4)

1. A method for detecting a small target on an expressway, characterized by comprising the following steps:
s1, acquiring an unlabeled data set X= { X 1 ,x 2 ,…,x l ,…,x N Performing data enhancement processing on each input image in the unlabeled dataset to form a corresponding reconstruction mapImage forming apparatus
Figure FDA0004134168840000011
x l Representing the first input image, l=1, 2, …, N;
s2, establishing a target detection network model and reconstructing an image
Figure FDA0004134168840000012
Detecting to obtain a corresponding target detection result, wherein the target detection network model comprises a feature extraction module, a dynamic instance interaction head and a classification and regression branch unit, the feature extraction module adopts an FPN network, the dynamic instance interaction head comprises N feature extraction units, each feature extraction unit comprises a self-attention module, a full-connection layer, a first convolution layer, a second convolution layer, a ReLu function and view operation, and the target detection network model executes the following operations:
s21, reconstructing an image
Figure FDA0004134168840000013
Inputting a feature extraction module to obtain a corresponding multi-scale feature map;
s22, setting N suggestion boxes and corresponding suggestion features, wherein the suggestion boxes are expressed as four-dimensional vectors formed by normalized center coordinates, heights and widths, and the suggestion features have the same dimension as the output features of the feature extraction module;
s23, obtaining corresponding ROI features through RoIAlign operation by correspondingly enabling the suggestion frames and the multi-scale feature map one by one;
s24, inputting the suggested features and the ROI features of each suggested frame into a feature extraction unit of a dynamic instance interaction head in a one-to-one correspondence mode, obtaining corresponding target frames and target features, and executing the following operations by the feature extraction unit:
performing self-attention operation on the suggested features by using a self-attention module to obtain first features;
converting the first feature into a one-dimensional vector through the full connection layer to form a second feature;
inputting the ROI features and the second features into a first convolution layer, sequentially passing through the second convolution layer and a ReLu function, and then adopting view operation to adjust the dimensions to obtain corresponding target features;
s25, updating the suggested frame and the suggested features to be the target frame and the target features, and returning to the step S23 until the iteration times are completed, so as to obtain interaction features;
s26, inputting the interaction characteristics into a classification and regression branch unit to obtain a target detection result.
2. The highway small target detection method according to claim 1, wherein the data enhancement processing of each input image in the unlabeled data set to form the corresponding reconstructed image x̂ is realized by a data enhancement module, the data enhancement module comprising a first encoder, a second encoder and a decoder and performing the following operations:
s11, adopting an unlabeled data set X= { X 1 ,x 2 ,…,x l ,…,x N Training a second encoder and decoder, wherein the second encoder E θ Is satisfied by the learnable parameter theta
Figure FDA0004134168840000017
Said decoder->
Figure FDA0004134168840000014
Satisfy the following requirements
Figure FDA0004134168840000015
M∈{0,1} W×H A block-wise binary mask representing pixels of an image block size W x H, W representing the pixel width of the input image x and H representing the pixel height of the input image x;
s12, dividing each input image into S image blocks;
s13, executing the following operations on each divided input image:
s131, converting the divided input image into vectors by using a first encoder;
s132, acquiring attention map Attn of the ith image block based on the attention policy i
Attn i =q cls ·k i ,i∈{0,1,…,p 2 -1}
in the formula ,qcls Queries representing sequences of image blocks, k i Key embedding representing the ith image block, p representing the size of the image block;
s133, acquiring the first K index sets omega by sequencing each attention attempt:
Ω=top-rankuttn,K)
in the formula, top-rank (·, K) represents the index of the last K largest elements returned, and Attn represents Attn i Is a collection of (3);
s134, obtaining binary mask M *
Figure FDA0004134168840000021
in the formula ,
Figure FDA0004134168840000022
represents a round-down operation, mod (·) represents a modulo operation, Ω i Representing the i-th element in the index set Ω;
s135, according to binary mask M * Acquiring a masking image M * As indicated by x, dividing the masking image into non-overlapping image blocks and discarding the image blocks blocked by the binary mask, and feeding the remaining visible image blocks into a pre-trained second encoder and decoder to generate a corresponding reconstructed image
Figure FDA0004134168840000023
3. The highway small target detection method according to claim 1, wherein the loss function L of the target detection network model is calculated as follows:

L = λ_cls · L_cls + λ_L1 · L_L1 + λ_diou · L_diou

L_cls = -α_t (1 - p_t)^{γ_j} log(p_t)

L_L1 = Σ_{z=1}^{n} |y_pz - y_gz|

L_diou = 1 - IOU + ρ²(b_p, b_g) / c²

where L_cls is the balanced focal loss between the predicted and true classifications, L_L1 is the L1 loss between the predicted and real frames, L_diou is the distance intersection-over-union loss between the predicted and real frames, and λ_cls, λ_L1, λ_diou are the coefficients corresponding in turn to L_cls, L_L1 and L_diou; α_t is the weight factor balancing the numbers of positive and negative samples, p_t is the probability that the prediction is a positive sample, γ_j is the focusing coefficient of the j-th class, j = 1, 2, …, T, and T is the total number of classes; γ_j is decoupled into a first component γ_b and a second component γ_j^v, where the first component γ_b controls the basic behavior of the classifier, γ_j^v is a variable parameter, and a gradient guidance mechanism is used to select γ_j^v = s(1 - g_j); g_j represents the cumulative gradient ratio of the positive and negative samples of the j-th class, with value range [0,1], and s is the scale factor determining the upper limit of γ_j; y_pz represents the predicted value, y_gz represents the true value, z = 1, 2, …, n, and n represents the number of target objects; ρ²(b_p, b_g) represents the squared distance between the center point b_p of the predicted frame and the center point b_g of the real frame, c represents the diagonal distance of the smallest rectangle covering both the predicted and real frames, ρ²(b_p, b_g)/c² is the penalty term, and IOU represents the intersection-over-union.
4. The highway small target detection method according to claim 1, wherein the number of iterations E = 6, and the number of suggestion boxes and suggested features N = 100.
CN202310269546.6A 2023-03-20 2023-03-20 Highway small target detection method Pending CN116311062A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310269546.6A CN116311062A (en) 2023-03-20 2023-03-20 Highway small target detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310269546.6A CN116311062A (en) 2023-03-20 2023-03-20 Highway small target detection method

Publications (1)

Publication Number Publication Date
CN116311062A true CN116311062A (en) 2023-06-23

Family

ID=86802786

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310269546.6A Pending CN116311062A (en) 2023-03-20 2023-03-20 Highway small target detection method

Country Status (1)

Country Link
CN (1) CN116311062A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117576217A (en) * 2024-01-12 2024-02-20 电子科技大学 Object pose estimation method based on single-instance image reconstruction
CN117576217B (en) * 2024-01-12 2024-03-26 电子科技大学 Object pose estimation method based on single-instance image reconstruction


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination