CN114220053B - Unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching - Google Patents


Info

Publication number
CN114220053B
CN114220053B (application CN202111534212.4A)
Authority
CN
China
Prior art keywords
vehicle
map
feature
layer
target
Prior art date
Legal status
Active
Application number
CN202111534212.4A
Other languages
Chinese (zh)
Other versions
CN114220053A (en)
Inventor
吕京国
白颖奇
曹逸飞
王琛
贺柳良
Current Assignee
Beijing Lingyun Space Technology Co.,Ltd.
Original Assignee
Beijing University of Civil Engineering and Architecture
Priority date
Filing date
Publication date
Application filed by Beijing University of Civil Engineering and Architecture filed Critical Beijing University of Civil Engineering and Architecture
Priority claimed from CN202111534212.4A
Publication of CN114220053A
Application granted
Publication of CN114220053B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems


Abstract

The invention provides an unmanned aerial vehicle (UAV) video vehicle retrieval method based on vehicle feature matching, which comprises the following steps: input each image frame into a trained light suppression model and a feature-enhanced multi-scale vehicle detection module to obtain a plurality of vehicle detection result boxes; crop the image inside each vehicle detection result box to obtain z detected vehicle maps; input each detected vehicle map together with the target vehicle map S into a multi-feature joint vehicle search network for feature matching to obtain the detected vehicle map of the target vehicle, thereby completing retrieval and positioning of the target vehicle. The method is suitable for UAV video shot in different complex scenes, largely removes the influence of illumination-induced loss of vehicle detail and of target size changes at different UAV flight heights, solves the problem that the vehicle to be queried is hard to find among many targets, and retrieves the vehicle to be queried more accurately.

Description

Unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching
Technical Field
The invention belongs to the technical field of intelligent remote sensing information processing, and particularly relates to an unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching.
Background
Ground surveillance video acquires road information from fixed cameras installed at key locations such as crossroads and expressway entrances, and has the advantages of all-weather operation and low sensitivity to the environment. Vehicle retrieval based on ground surveillance video mainly falls into two categories: (1) traditional vehicle retrieval methods, which extract detail features of the target vehicle, such as Haar, SIFT and HOG features, through algorithms such as bag of visual words and deep hashing; their limited ability to characterize vehicles leaves them weak at distinguishing similar vehicles; (2) deep-learning-based vehicle retrieval methods, which train a neural network on a large amount of sample data so that it can extract vehicle features and complete the retrieval task. These methods extract vehicle semantic information with classical object detection networks such as Faster R-CNN, YOLOv3 and SPP-Net, and achieve high retrieval accuracy in simple scenes.
However, because the ground surveillance camera shoots at an oblique angle, the video contains vehicles at many different scales, so vehicles are missed during detection, which in turn reduces retrieval accuracy. Moreover, since the camera position is fixed, the vehicle to be retrieved appears in the frame only briefly, which is of limited help to subsequent tracking tasks.
Unlike ground surveillance, a UAV offers low cost, rapid deployment, flexible maneuvering and a wide monitoring range; vehicle retrieval on UAV surveillance video can not only quickly perform the retrieval task at any intersection, but also continue with tasks such as tracking the target vehicle after retrieval succeeds. However, because the size of every vehicle in the video changes with the UAV's flight height, vehicles that are too large or too small are missed when the candidate boxes are poorly designed or the network is too deep, owing to insufficient candidate-box regression capability and dilution of target information. At the same time, because the UAV is usually flown in good weather, over-bright regions often appear in the video and vehicle details are lost, so vehicles in those regions are missed.
Disclosure of Invention
Aiming at the problem of missed detection when the prior art is directly applied to unmanned aerial vehicle video vehicle retrieval, the invention provides an unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching, which can effectively solve the problem.
The technical scheme adopted by the invention is as follows:
the invention provides an unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching, which comprises the following steps:
step 1, determining a target vehicle map S to be retrieved;
step 2, shooting the ground by the unmanned aerial vehicle to obtain video data of the unmanned aerial vehicle;
step 3, executing steps 4 to 8 on each frame of image of the unmanned aerial vehicle video data, and judging whether each frame of image contains a target vehicle image S to be retrieved:
recording the current image frame as Frm (t), wherein t is the frame number of the current image frame, and judging whether the image frame Frm (t) contains a target vehicle map S to be searched by adopting the following steps 4-8:
Step 4, input the image frame Frm(t) into a trained light suppression model for feature extraction and light suppression processing, obtaining an illumination suppression feature map with n layers, denoted F_RestrainMap;
Step 5, input the illumination suppression feature map F_RestrainMap into the feature-enhanced multi-scale vehicle detection module and obtain the z vehicle detection result boxes in the image frame Frm(t):
Step 5.1, the illumination suppression feature map F_RestrainMap has n layers, each denoted layer_i, i = 1...n; for each layer_i, execute steps 5.1.1 to 5.1.3 to obtain its dependency weight value w''_i:
Step 5.1.1, compute the mean of all pixel values in layer_i and use it as the initial weight w_i of layer_i;
Step 5.1.2, input the initial weight w_i of layer_i into a fully connected layer and map it into the (0, 1) feature space with a sigmoid activation function, outputting the normalized weight w'_i of layer_i;
Step 5.1.3, establish a piecewise function that suppresses or enhances the normalized weight w'_i segment by segment, obtaining the dependency weight value w''_i of layer_i:
[piecewise function for w''_i, given as an image in the original document]
where:
epsilon is a system constant used to adjust how strongly the dependency weight value influences the layer;
Step 5.2, this yields the dependency weight values of the n layers of F_RestrainMap: w''_1 ... w''_n;
combine w''_1 ... w''_n into the 1 × n dependency weight vector W'' of F_RestrainMap;
using the dependency weight vector W'' as a convolution kernel, convolve F_RestrainMap to obtain the layer-enhanced feature map F_EhcMap;
Step 5.3, input the layer-enhanced feature map F_EhcMap into the small target response layer to obtain the small target salient feature map F_SmallMap;
the small target salient feature map F_SmallMap contains more vehicle detail information, which improves the success rate of small target vehicle detection when the UAV flies high;
Step 5.4, input the small target salient feature map F_SmallMap into the large target response layer to obtain the large target salient feature map F_LargeMap;
the large target salient feature map F_LargeMap contains more semantic information, which improves the accuracy of large target vehicle detection when the UAV flies low;
Step 5.5, input the small target salient feature map F_SmallMap into the result box generation layer, obtaining p small target vehicle detection result boxes Box_Small(1)...Box_Small(p) in the image frame Frm(t);
input the large target salient feature map F_LargeMap into the result box generation layer, obtaining q large target vehicle detection result boxes Box_Large(1)...Box_Large(q) in the image frame Frm(t);
The specific method comprises the following steps:
Step 5.5.1, take each pixel of the small target salient feature map F_SmallMap as an anchor point and generate several candidate boxes of different sizes centered on each anchor point; in this way, candidate boxes are obtained for all pixels of F_SmallMap;
step 5.5.2, calculating to obtain the vehicle probability value of each candidate box;
Step 5.5.3, screen the candidate boxes and remove those whose vehicle probability value is below a preset threshold, obtaining candidate boxes A_1, A_2 ... A_p, where p is the number of candidate boxes;
Step 5.5.4, calculate the regression parameters of each candidate box in A_1, A_2 ... A_p; each candidate box has the following regression parameters: width, height, and anchor point offset;
Step 5.5.5, map the anchor point coordinates of each candidate box in A_1, A_2 ... A_p and the corresponding regression parameters back to the image frame Frm(t), obtaining p small target vehicle detection result boxes Box_Small(1)...Box_Small(p) in Frm(t);
Step 5.5.6, replace the small target salient feature map F_SmallMap in step 5.5.1 with the large target salient feature map F_LargeMap, increase the initial generation size of the candidate boxes in step 5.5.1, and follow steps 5.5.1-5.5.5 to obtain q large target vehicle detection result boxes Box_Large(1)...Box_Large(q) in the image frame Frm(t);
Step 5.6, the p small target vehicle detection result boxes Box_Small(1)...Box_Small(p) and the q large target vehicle detection result boxes Box_Large(1)...Box_Large(q) in the image frame Frm(t) are collectively referred to as p + q vehicle detection result boxes;
for the p + q vehicle detection result boxes obtained in Frm(t), calculate the similarity coefficient between every two boxes; if the similarity coefficient is smaller than a set threshold, do nothing; if it is larger than the set threshold, merge the two boxes into one vehicle detection result box; z vehicle detection result boxes are finally obtained, denoted Box(1)...Box(z);
Step 6, crop the image inside each vehicle detection result box in the image frame Frm(t), obtaining z detected vehicle maps;
step 7, inputting each detected vehicle map and the target vehicle map S into a multi-feature united vehicle search network for feature matching to obtain a detected vehicle map of the target vehicle; the position of the detected vehicle map in the image frame frm (t) is the position of the target vehicle in the image frame frm (t), so that the retrieval and positioning of the target vehicle are completed;
Step 8, if the matching degrees of all detected vehicle maps with the target vehicle map S in the current image frame Frm(t) are below the set threshold, the target vehicle is not present in Frm(t), and retrieval continues with the image frame Frm(t+1) at the next time.
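For orientation, the per-frame loop of steps 3 to 8 can be sketched as below. This is a minimal sketch, not the patent's implementation: the callables light_suppression, detect_vehicles, crop and match_vehicles are hypothetical placeholders standing in for the light suppression model (step 4), the feature-enhanced multi-scale detection module (step 5), the cropping of result boxes (step 6) and the multi-feature joint vehicle search network (step 7).

```python
def retrieve_target(video_frames, target_map_S, light_suppression, detect_vehicles,
                    crop, match_vehicles, match_threshold=0.5):
    """Sketch of steps 3-8; all callables are injected placeholders, not patent-defined APIs."""
    for t, frame in enumerate(video_frames):                  # current image frame Frm(t)
        f_restrain_map = light_suppression(frame)             # step 4: illumination suppression feature map
        boxes = detect_vehicles(f_restrain_map, frame)        # step 5: z vehicle detection result boxes
        crops = [crop(frame, box) for box in boxes]           # step 6: z detected vehicle maps
        scores = [match_vehicles(c, target_map_S) for c in crops]  # step 7: feature matching
        if scores and max(scores) >= match_threshold:
            best = scores.index(max(scores))
            return t, boxes[best]                             # position of the target vehicle in Frm(t)
    return None                                               # step 8: target not found in any frame
```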
Preferably, step 4 specifically comprises:
step 4.1, constructing a light suppression model;
the light suppression model is a dual-branch network comprising a learning branch network and a suppression branch network; the learning branch comprises, connected in series, a convolution layer conv1, a shallow feature selection layer f_1() and a deep feature selection layer f_2(); the suppression branch comprises, connected in series, a convolution layer conv1', a shallow feature selection layer f'_1() and a deep feature selection layer f'_2();
Step 4.2, obtaining a group of training sample pairs;
each group of training sample pairs comprises a normal light image I and an over-bright light image I' under the visual angle of the unmanned aerial vehicle; the light over-bright image I' is obtained by randomly adding a brightness value to the light normal image I; the a groups of training sample pairs are respectively expressed as: (I)1,I′1),(I2,I′2),...,(Ia,I′a);
Step 4.3, performing off-line training on the light-inhibiting model constructed in the step 4.1 by adopting a group of training samples, wherein an objective function of the off-line training is as follows:
Figure BDA0003412564660000061
wherein:
Losslight suppressionRepresenting a light loss suppressing function;
argmin () represents the value of the variable at which the target function takes the minimum value;
f′1(I′j) Represents light over-bright image I'jInput to shallow feature chosen layer f'1() Then outputting a shallow layer characteristic value;
f′2(I′j) Represents light over-bright image I'jInputting into deep layer characteristic choosing layer f'2() Then, outputting the deep characteristic value;
f1(Ij) Representing normal light images IjInput to the shallow feature selection layer f1() Then outputting a shallow layer characteristic value;
f2(Ij) Representing normal light images IjInputting into deep characteristic selection layer f2() Then, outputting the deep characteristic value;
Figure BDA0003412564660000062
represents: the square of the L2 norm;
gamma represents a penalty coefficient and is controlled by artificial setting
Figure BDA0003412564660000063
The effect on the light loss suppressing function, the larger its value,
Figure BDA0003412564660000064
the greater the effect on the light loss suppressing function;
Step 4.4, the offline training of the light suppression model weakens the suppression branch network's sensitivity to brightness features, so that the suppression branch can suppress illumination features in over-bright images shot by the UAV and improve the salience of vehicle detail features under the UAV viewing angle;
therefore, the image frame Frm(t) is input to the suppression branch network of the trained light suppression model, obtaining the illumination suppression feature map F_RestrainMap.
Preferably, in step 5.6, merging two vehicle detection result boxes into one vehicle detection result box is specifically as follows:
let the two vehicle detection result boxes to be merged be the vehicle detection result box Box_Small(1) and the vehicle detection result box Box_Large(1), and denote the merged vehicle detection result box Box(1); then:
the center of Box(1) is the midpoint of the line connecting the centers of Box_Small(1) and Box_Large(1);
the height of Box(1) is the average of the heights of Box_Small(1) and Box_Large(1);
the width of Box(1) is the average of the widths of Box_Small(1) and Box_Large(1).
Preferably, in step 7, the multi-feature combined vehicle search network establishment method is as follows:
and establishing a multi-feature joint vehicle search network by taking the vehicle color feature and the vehicle type feature as vehicle global features and taking the vehicle side view, the vehicle front view, the vehicle rear view, the vehicle top view and the non-vehicle view as vehicle local features.
Preferably, step 7 specifically comprises:
step 7.1, constructing a multi-feature united vehicle search network; the multi-feature combined vehicle search network comprises a global feature identification module and a local feature matching module;
step 7.2, inputting the z detected vehicle images and the target vehicle image S into a global feature recognition module respectively, and obtaining z' suspected vehicle images with the same color and the same vehicle type as the target vehicle image S by adopting the following method;
the global feature identification module comprises a shared feature layer, a vehicle color feature layer and a vehicle type feature layer;
step 7.2.1, identifying the color characteristics of the target vehicle map S, comprising the steps of:
Step 7.2.1.1, input the target vehicle map S into the shared feature layer to obtain the shared feature map F_ShrMap;
Step 7.2.1.2, input the shared feature map F_ShrMap into the vehicle color feature layer to obtain the vehicle color feature vector V_Color; the vehicle color feature layer comprises conv4_Color, a max pooling layer Maxpool and a fully connected layer FC_Color;
Step 7.2.1.3, multiply the vehicle color feature vector V_Color with the shared feature map F_ShrMap by matrix broadcasting to obtain the color-sensitive feature map F_ColorMap;
Step 7.2.1.4, using the color-sensitive feature map F_ColorMap as a convolution kernel, cross-convolve the target vehicle map S to obtain the color feature enhancement map S'_Color, which strengthens the response of the target vehicle map S to color features;
Step 7.2.1.5, input the color feature enhancement map S'_Color sequentially into the shared feature layer, Conv4_Color, Conv5_Color, the max pooling layer and the fully connected layer, and obtain the color type of the target vehicle map S through a non-maximum suppression algorithm;
step 7.2.2, obtaining the vehicle type of the target vehicle map S by adopting the same method, and further obtaining the color type and the vehicle type of each detected vehicle map;
step 7.2.3, judging whether a detected vehicle image with the same color and the same vehicle type as the target vehicle image S exists in the z detected vehicle images, and if not, directly retrieving the next frame of image;
if yes, extract all detected vehicle maps with the same color and the same vehicle type as the target vehicle map S; assuming z' are extracted in total, the extracted z' detected vehicle maps are called suspected vehicle maps, denoted D_c, where c = 1...z';
Step 7.3, input the target vehicle map S and each suspected vehicle map D_c into the local feature matching module; the local feature matching module obtains the vehicle mean vector matrix V_s of the target vehicle map S with a matching algorithm;
the local feature matching module obtains the suspected vehicle mean vector matrix V_c of each suspected vehicle map D_c with the same matching algorithm;
the local feature matching module comprises a feature extraction layer, a feature sparse convolution layer Conv6 and a fully connected layer FC_sight;
the local feature matching module performs feature matching on the target vehicle map S to obtain its vehicle mean vector matrix V_s as follows:
step 7.3.1, performing grid segmentation on the target vehicle map S through 4-by-4 grids to obtain 16 vehicle sub-block maps;
Step 7.3.2, input each vehicle sub-block map into the feature extraction layer to obtain the corresponding vehicle sub-block feature map F_subMap(m), m = 1...16;
Step 7.3.3, input each vehicle sub-block feature map F_subMap(m) into the feature sparse convolution layer Conv6 to obtain the corresponding sparse feature map F_sparseMap(m);
And 7.3.4, determining the view angle type of the vehicle sub-block map:
input each sparse feature map F_sparseMap(m) into the fully connected layer FC_sight and obtain the view angle category of the vehicle sub-block map through non-maximum suppression; the view angle categories are side view, front view, rear view, top view and non-vehicle view;
step 7.3.5, determining the view angle vector of the view angle category of the vehicle sub-block diagram:
if the view angle category is side view, front view, rear view or top view, extract the features of each sparse feature map F_sparseMap(m) and reshape them into a one-dimensional feature vector, which serves as the view angle vector corresponding to the vehicle sub-block map; the view angle vectors are divided by view angle category into side view vectors, front view vectors, rear view vectors and top view vectors;
if the visual angle category is a non-vehicle view, discarding;
step 7.3.6, determine the view mean vector for each view category:
obtaining the visual angle vector mean value of each vehicle sub-block image of the same visual angle category in the target vehicle image S, and respectively obtaining a side visual angle mean value vector, a front visual angle mean value vector, a rear visual angle mean value vector and a top visual angle mean value vector;
if a certain visual angle type does not exist, the visual angle mean vector does not exist, and all elements of the visual angle mean vector are set to be 0;
thus the view angle mean vectors V_cl of the four view angle categories are obtained, where cl = 1, 2, 3, 4 denotes the side view mean vector V_1, the front view mean vector V_2, the rear view mean vector V_3 and the top view mean vector V_4; the view angle mean vectors V_cl of the four view angle categories form the vehicle mean vector matrix V_s of the target vehicle map S;
correspondingly, the suspected vehicle mean vectors V'_cl of the four view angle categories of each suspected vehicle map D_c are obtained and form the suspected vehicle mean vector matrix V_c of the suspected vehicle map D_c;
Step 7.4, calculating a target vehicle map S and each suspected vehicle map DcThe number Num of the viewing angle mean value vectors of the common viewing angle category is obtained by adopting the following formulacCorresponding feature matching value Match;
Figure BDA0003412564660000101
wherein, lambda is the weight of the number of the visual angle mean vectors; t represents transposition; tr represents the trace of the matrix and represents the sum of the main diagonal elements of the matrix;
Step 7.5, when several suspected vehicle maps D_c have a feature matching value Match above the threshold, determine the suspected vehicle map of the target vehicle among them by non-maximum suppression; the position of that suspected vehicle map in the image frame Frm(t) is the position of the target vehicle in Frm(t);
when the feature matching values Match of the target vehicle map S with all suspected vehicle maps D_c are below the threshold, the image frame Frm(t) does not contain the target vehicle.
Preferably, in step 7.3.3, so that the sparse feature map fully expresses the features of the vehicle sub-block feature map F_subMap(m) and information loss during compression is reduced, the compression loss function Loss_sparse is used during training:
Loss_sparse = Min( F_subMap(m) - (F_sparseMap(m) * W_Tran) )
where W_Tran is the upsampling weight obtained by deconvolution.
The unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching provided by the invention has the following advantages:
the invention provides an unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching, which is suitable for videos shot by unmanned aerial vehicles in different complex scenes, eliminates the influences of insufficient vehicle detail information caused by illumination and target size change of the unmanned aerial vehicles at different heights to the maximum extent, solves the problem that vehicles to be queried are difficult to find in numerous targets, and can more accurately retrieve the vehicles to be queried.
Drawings
Fig. 1 is a schematic flow chart of an unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching according to the present invention;
FIG. 2 is a structural diagram of a light-suppressing model;
FIG. 3 is a structural diagram of each feature selection layer f;
FIG. 4 is a structural diagram of the suppression branch network;
FIG. 5 is a diagram of a vehicle multi-dimensional feature probability identification network.
Detailed Description
In order to make the technical problems, technical solutions and advantageous effects solved by the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides an unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching, which mainly comprises the following steps: constructing and training a light suppression model to generate an illumination suppression characteristic diagram; constructing a multi-scale vehicle detection module with enhanced characteristics, and acquiring all vehicle detection result frames in a current frame; respectively intercepting images in each vehicle detection result frame in an image frame Frm (t) to obtain z detection vehicle images; inputting each detected vehicle map and the target vehicle map S into a multi-feature united vehicle search network for feature matching to obtain a detected vehicle map of the target vehicle; the position of the detected vehicle map in the image frame frm (t) is the position of the target vehicle in the image frame frm (t), thereby completing the retrieval and positioning of the target vehicle. The invention provides an unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching, which is suitable for videos shot by unmanned aerial vehicles in different complex scenes, eliminates the influences of insufficient vehicle detail information caused by illumination and target size change of the unmanned aerial vehicles at different heights to the maximum extent, solves the problem that vehicles to be queried are difficult to find in numerous targets, and can more accurately retrieve the vehicles to be queried.
The invention provides an unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching, which comprises the following steps with reference to fig. 1:
step 1, determining a target vehicle map S to be retrieved;
step 2, shooting the ground by the unmanned aerial vehicle to obtain video data of the unmanned aerial vehicle;
step 3, executing steps 4 to 8 on each frame of image of the unmanned aerial vehicle video data, and judging whether each frame of image contains a target vehicle image S to be retrieved:
recording the current image frame as Frm (t), wherein t is the frame number of the current image frame, and judging whether the image frame Frm (t) contains a target vehicle map S to be searched by adopting the following steps 4-8:
Step 4, input the image frame Frm(t) into a trained light suppression model for feature extraction and light suppression processing, obtaining an illumination suppression feature map with n layers, denoted F_RestrainMap;
Specifically, UAV video is often shot under strong light, so the brightness of the images is too high; the vehicle retrieval method then has difficulty extracting effective information from the video images, which causes missed detections. Therefore a light suppression model is adopted and trained with image pairs consisting of a normal-light image and an over-bright image, so that the model can suppress illumination features in over-bright images and improve detection accuracy when the light is too strong.
The step 4 specifically comprises the following steps:
step 4.1, constructing a light suppression model;
As shown in fig. 2, the structure of the light suppression model: the light suppression model is a dual-branch network comprising a learning branch network and a suppression branch network; the learning branch comprises, connected in series, a convolution layer conv1, a shallow feature selection layer f_1() and a deep feature selection layer f_2(); the suppression branch comprises, connected in series, a convolution layer conv1', a shallow feature selection layer f'_1() and a deep feature selection layer f'_2();
As a specific implementation manner, when initially untrained, the network structures of the learning branch network and the suppression branch network are the same, and the branch network structure of each side is shown in the following table:
table 1: convolution kernel parameter of light suppression model backbone network
[table content given as an image in the original document and not reproduced here]
The structure of each feature selection layer f is shown in fig. 3 and comprises three 1 × 1 convolution kernels, two 3 × 3 convolution kernels, and two max pooling layers (Maxpool).
Step 4.2, obtain a groups of training sample pairs;
each training sample pair comprises a normal-light image I and an over-bright image I' under the UAV viewing angle; the over-bright image I' is obtained by randomly adding a brightness value to the normal-light image I; the a groups of training sample pairs are denoted (I_1, I'_1), (I_2, I'_2), ..., (I_a, I'_a);
Step 4.3, perform offline training of the light suppression model constructed in step 4.1 with the a groups of training sample pairs; the objective function of the offline training is:
[light suppression loss function, given as an image in the original document]
where:
Loss_suppression denotes the light suppression loss function;
argmin() denotes the value of the variables at which the objective function takes its minimum;
f'_1(I'_j) denotes the shallow feature value output after the over-bright image I'_j is input to the shallow feature selection layer f'_1();
f'_2(I'_j) denotes the deep feature value output after the over-bright image I'_j is input to the deep feature selection layer f'_2();
f_1(I_j) denotes the shallow feature value output after the normal-light image I_j is input to the shallow feature selection layer f_1();
f_2(I_j) denotes the deep feature value output after the normal-light image I_j is input to the deep feature selection layer f_2();
||·||_2^2 denotes the square of the L2 norm;
gamma denotes a penalty coefficient, set manually, that controls the influence of the deep-feature term on the light suppression loss; the larger its value, the greater that influence;
Step 4.4, the offline training of the light suppression model weakens the suppression branch network's sensitivity to brightness features, so that the suppression branch can suppress illumination features in over-bright images shot by the UAV and improve the salience of vehicle detail features under the UAV viewing angle;
therefore, the image frame Frm(t) is input to the suppression branch network of the trained light suppression model, obtaining the illumination suppression feature map F_RestrainMap.
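The light suppression loss itself appears only as an image in the original, so the sketch below assembles it from the term definitions above: a shallow-feature discrepancy plus a deep-feature discrepancy weighted by gamma. Treat the exact combination, and the placement of gamma, as assumptions.

```python
import torch

def light_suppression_loss(f1_out, f2_out, f1p_out, f2p_out, gamma=1.0):
    """Hedged sketch of the offline training objective for one pair (I_j, I'_j).

    f1_out, f2_out   : f_1(I_j), f_2(I_j) from the learning branch (normal-light image)
    f1p_out, f2p_out : f'_1(I'_j), f'_2(I'_j) from the suppression branch (over-bright image)
    gamma            : penalty coefficient; its placement on the deep term is an assumption
    """
    shallow_term = torch.sum((f1p_out - f1_out) ** 2)   # ||f'_1(I'_j) - f_1(I_j)||_2^2
    deep_term = torch.sum((f2p_out - f2_out) ** 2)      # ||f'_2(I'_j) - f_2(I_j)||_2^2
    return shallow_term + gamma * deep_term             # summed over the a training pairs during training
```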
To sum up, in order to make the light suppression model have light suppression capability, the light suppression model adopts a dual-branch network, and the branch network on each side includes: a convolution layer and two feature selection layers; because there are many objects in the unmanned aerial vehicle video, the light feature suppression is respectively carried out on the shallow feature and the deep feature of the image by adopting two feature selection layers, the network learning capability is enhanced, and the suppression effect is improved.
In the on-line detection of the light-suppressing model, only the suppression branch network is used, and as shown in fig. 4, the structure of the suppression branch network is shown.
As shown in fig. 3, the suppression branch network processes the image frame Frm(t) as follows:
1) The image frame Frm(t) passes through the conv1 layer to obtain the low-dimensional feature map F_LowMap; the purpose is to fuse feature information of different layers of the input image and improve the salience of object features.
2) Input the low-dimensional feature map F_LowMap into the shallow feature selection layer f_1(); F_LowMap passes through conv2_1 and conv2_2 and, on the one hand, outputs the mid-dimensional feature map F_MidMap; on the other hand, it continues through conv2_3 and conv2_4 and outputs the high-dimensional feature map F_HighMap.
3) Apply 3 × 3 max pooling to the low-dimensional feature map F_LowMap to obtain the feature map F'_LowMap; apply 2 × 2 max pooling to the mid-dimensional feature map F_MidMap to obtain the feature map F'_MidMap;
through this step, the low-dimensional feature map F_LowMap and the mid-dimensional feature map F_MidMap are rescaled so that the processed feature maps F'_LowMap and F'_MidMap have the same size as the high-dimensional feature map F_HighMap;
4) Concatenate the feature map F'_LowMap, the feature map F'_MidMap and the high-dimensional feature map F_HighMap and, after convolution by conv2_5, output the multi-dimensional feature map F_MultiMap;
specifically, because the UAV's onboard computing capability is weak and vehicle targets are small in the captured video images, multi-dimensional feature fusion not only reduces the amount of computation but also, through fusion, improves the utilization of features of different dimensions and obtains more object feature information.
The shallow feature selection layer f_1() has shallow features that are sensitive to the texture and shape of objects and can suppress brightness in most regions of the image; however, because the UAV's field of view is wide, textures and geometric features of many non-target objects are present and interfere with brightness suppression. Therefore, after the shallow feature selection layer f_1(), the deep feature selection layer f_2() is needed for further feature processing.
5) Take the multi-dimensional feature map F_MultiMap as a new feature map F_LowMap and input it into the deep feature selection layer f_2(); repeat the above steps to obtain the illumination suppression feature map F_RestrainMap.
As the network depth increases, the deep features of the deep feature selection layer f_2() become more sensitive to semantic features, which effectively suppresses the interference caused by non-object textures and geometric features and compensates for the shortcomings of the shallow feature selection layer f_1(). A minimal code sketch of this processing follows.
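The sketch below follows steps 1) to 5) above. It is illustrative only: channel counts, kernel sizes and strides are assumptions (the real parameters are in Table 1, which is an image), and adaptive pooling is used in place of the 3 × 3 / 2 × 2 max pooling purely to keep the shapes consistent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureSelectionLayer(nn.Module):
    """Sketch of one feature selection layer f (fig. 3); all hyperparameters are assumed."""
    def __init__(self, ch=32):
        super().__init__()
        self.conv2_1 = nn.Conv2d(ch, ch, 3, stride=2, padding=1)
        self.conv2_2 = nn.Conv2d(ch, ch, 3, padding=1)   # output: F_MidMap
        self.conv2_3 = nn.Conv2d(ch, ch, 3, stride=2, padding=1)
        self.conv2_4 = nn.Conv2d(ch, ch, 3, padding=1)   # output: F_HighMap
        self.conv2_5 = nn.Conv2d(3 * ch, ch, 1)          # fusion convolution

    def forward(self, f_low_map):
        f_mid_map = self.conv2_2(self.conv2_1(f_low_map))
        f_high_map = self.conv2_4(self.conv2_3(f_mid_map))
        size = f_high_map.shape[-2:]
        # The patent rescales with 3x3 / 2x2 max pooling; adaptive pooling is used here
        # only so the three maps can be concatenated regardless of input size.
        f_low_p = F.adaptive_max_pool2d(f_low_map, size)
        f_mid_p = F.adaptive_max_pool2d(f_mid_map, size)
        return self.conv2_5(torch.cat([f_low_p, f_mid_p, f_high_map], dim=1))  # F_MultiMap

class SuppressionBranch(nn.Module):
    """Sketch of the suppression branch: conv1' followed by f'_1() and f'_2()."""
    def __init__(self, ch=32):
        super().__init__()
        self.conv1 = nn.Conv2d(3, ch, 3, padding=1)
        self.f1 = FeatureSelectionLayer(ch)   # shallow feature selection layer f'_1()
        self.f2 = FeatureSelectionLayer(ch)   # deep feature selection layer f'_2()

    def forward(self, frame):
        f_low_map = self.conv1(frame)          # step 1)
        f_multi_map = self.f1(f_low_map)       # steps 2)-4)
        return self.f2(f_multi_map)            # step 5): illumination suppression feature map F_RestrainMap
```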
Step 5, input the illumination suppression feature map F_RestrainMap into the feature-enhanced multi-scale vehicle detection module and obtain the z vehicle detection result boxes in the image frame Frm(t):
In particular, a large number of objects with similar appearances exist in the UAV video image, such as roadside rectangular electric boxes and elongated sunshades. To make vehicles more prominent, the invention provides a feature-enhanced multi-scale vehicle detection module that makes vehicle extraction easier.
Meanwhile, the size of a vehicle in the UAV video image changes with the UAV's height: when the flight height is low, vehicles appear large in the image, and a shallow network misses and falsely detects them because its receptive field is insufficient; when the flight height is too high, vehicles appear too small, and a deep network loses their information through excessive convolution, which also causes missed detections.
Step 5.1, the illumination suppression feature map F_RestrainMap has n layers, each denoted layer_i, i = 1...n; for each layer_i, execute steps 5.1.1 to 5.1.3 to obtain its dependency weight value w''_i:
Step 5.1.1, compute the mean of all pixel values in layer_i and use it as the initial weight w_i of layer_i;
Step 5.1.2, input the initial weight w_i of layer_i into a fully connected layer and map it into the (0, 1) feature space with a sigmoid activation function, outputting the normalized weight w'_i of layer_i;
Step 5.1.3, establish a piecewise function that suppresses or enhances the normalized weight w'_i segment by segment, obtaining the dependency weight value w''_i of layer_i:
[piecewise function for w''_i, given as an image in the original document]
where:
epsilon is a system constant used to adjust how strongly the dependency weight value influences the layer;
Step 5.2, this yields the dependency weight values of the n layers of F_RestrainMap: w''_1 ... w''_n;
combine w''_1 ... w''_n into the 1 × n dependency weight vector W'' of F_RestrainMap;
using the dependency weight vector W'' as a convolution kernel, convolve F_RestrainMap to obtain the layer-enhanced feature map F_EhcMap; a minimal code sketch of this weighting follows.
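A minimal sketch of that weighting, assuming the 1 × n kernel amounts to weighting each layer by its w''_i. The piecewise function is only given as an image in the original, so the thresholds and the use of epsilon here are purely illustrative assumptions.

```python
import torch
import torch.nn as nn

class LayerDependencyWeighting(nn.Module):
    """Sketch of steps 5.1-5.2 applied to F_RestrainMap of shape (n, H, W)."""
    def __init__(self, epsilon=0.1):
        super().__init__()
        self.fc = nn.Linear(1, 1)     # fully connected layer of step 5.1.2
        self.epsilon = epsilon        # system constant of step 5.1.3

    def piecewise(self, w_norm):
        # Illustrative suppression/enhancement; the real piecewise function and its
        # breakpoints are not reproduced in the text.
        return torch.where(w_norm < 0.5,
                           w_norm * (1.0 - self.epsilon),   # suppress weakly responding layers
                           w_norm * (1.0 + self.epsilon))   # enhance strongly responding layers

    def forward(self, f_restrain_map):
        w_init = f_restrain_map.mean(dim=(1, 2))                           # step 5.1.1: w_i
        w_norm = torch.sigmoid(self.fc(w_init.unsqueeze(1))).squeeze(1)    # step 5.1.2: w'_i in (0, 1)
        w_dep = self.piecewise(w_norm)                                     # step 5.1.3: w''_i
        # Step 5.2 (assumed interpretation): apply W'' layer-wise to obtain F_EhcMap.
        return f_restrain_map * w_dep.view(-1, 1, 1)
```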
Step 5.3, input the layer-enhanced feature map F_EhcMap into the small target response layer to obtain the small target salient feature map F_SmallMap;
the small target response layer can adopt a 1 × 1 convolution layer, whose purpose is to reduce the feature map depth and improve the success rate of small target detection.
The small target salient feature map F_SmallMap contains more vehicle detail information, which improves the success rate of small target vehicle detection when the UAV flies high;
Step 5.4, input the small target salient feature map F_SmallMap into the large target response layer to obtain the large target salient feature map F_LargeMap;
the large target response layer can adopt two 3 × 3 convolution layers, which enlarge the receptive field and improve the success rate of large target detection; for the same receptive field, two 3 × 3 convolution layers also require less computation than one 5 × 5 convolution layer.
The large target salient feature map F_LargeMap contains more semantic information, which improves the accuracy of large target vehicle detection when the UAV flies low; a minimal sketch of the two response layers follows.
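A short sketch of the two response layers; the channel counts are assumptions.

```python
import torch.nn as nn

# Small target response layer: a single 1x1 convolution (reduces feature map depth).
small_target_response = nn.Conv2d(64, 32, kernel_size=1)        # F_EhcMap -> F_SmallMap

# Large target response layer: two 3x3 convolutions (enlarges the receptive field).
large_target_response = nn.Sequential(
    nn.Conv2d(32, 32, kernel_size=3, padding=1),
    nn.Conv2d(32, 32, kernel_size=3, padding=1),                # F_SmallMap -> F_LargeMap
)
```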
Step 5.5, input the small target salient feature map F_SmallMap into the result box generation layer, obtaining p small target vehicle detection result boxes Box_Small(1)...Box_Small(p) in the image frame Frm(t);
input the large target salient feature map F_LargeMap into the result box generation layer, obtaining q large target vehicle detection result boxes Box_Large(1)...Box_Large(q) in the image frame Frm(t);
The specific method comprises the following steps:
Step 5.5.1, take each pixel of the small target salient feature map F_SmallMap as an anchor point and generate several candidate boxes of different sizes centered on each anchor point; in this way, candidate boxes are obtained for all pixels of F_SmallMap;
for example, 6 candidate boxes with different sizes are generated by taking each anchor point as a center; when small target detection is performed, 3 candidate frames with the area of 8 and 3 candidate frames with the area of 16 can be generated according to the length-width ratio of 1: 1, 1: 2 and 2: 1.
When large object detection is performed, 3 candidate frames with an area of 32 and 3 candidate frames with an area of 64 may be generated in a ratio of length to width of 1: 1, 1: 2, 2: 1.
The length-width ratio of 1: 1, 1: 2 and 2: 1 is set according to the length-width ratio of the vehicle in the unmanned aerial vehicle window.
Step 5.5.2, calculating to obtain the vehicle probability value of each candidate box;
for example, 1-by-1 convolution layer is adopted to reshape the candidate frame into a 1-dimensional vector, and then the vehicle probability value of the candidate frame is calculated by using a sigmoid function.
Step 5.5.3, screen the candidate boxes and remove those whose vehicle probability value is below a preset threshold (for example, a threshold of 0.6), obtaining candidate boxes A_1, A_2 ... A_p, where p is the number of candidate boxes;
Step 5.5.4, calculate the regression parameters of each candidate box in A_1, A_2 ... A_p; each candidate box has the following regression parameters: width, height, and anchor point offset;
Step 5.5.5, map the anchor point coordinates of each candidate box in A_1, A_2 ... A_p and the corresponding regression parameters back to the image frame Frm(t), obtaining p small target vehicle detection result boxes Box_Small(1)...Box_Small(p) in Frm(t);
Step 5.5.6, replace the small target salient feature map F_SmallMap in step 5.5.1 with the large target salient feature map F_LargeMap, increase the initial generation size of the candidate boxes in step 5.5.1, and follow steps 5.5.1-5.5.5 to obtain q large target vehicle detection result boxes Box_Large(1)...Box_Large(q) in the image frame Frm(t); a sketch of this generation procedure follows.
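A sketch of steps 5.5.1-5.5.5, assuming the probability and regression heads have already been applied to the salient feature map and that boxes are kept as (cx, cy, w, h); the feature-map stride used to map anchors back to Frm(t) is an assumed parameter.

```python
import numpy as np

def generate_result_boxes(prob_map, reg_map, base_sizes=(8, 16), ratios=(1.0, 0.5, 2.0),
                          stride=8, threshold=0.6):
    """Sketch of steps 5.5.1-5.5.5 on one salient feature map.

    prob_map : (H, W, K) vehicle probability of the K candidate boxes per anchor point
    reg_map  : (H, W, K, 4) regression parameters (dw, dh, dx, dy) per candidate box
    """
    H, W, K = prob_map.shape
    boxes = []
    for y in range(H):
        for x in range(W):                                     # step 5.5.1: every pixel is an anchor
            for k in range(K):
                if prob_map[y, x, k] < threshold:              # step 5.5.3: screen by probability
                    continue
                size = base_sizes[k // len(ratios)]
                ratio = ratios[k % len(ratios)]
                w0, h0 = size * np.sqrt(ratio), size / np.sqrt(ratio)
                dw, dh, dx, dy = reg_map[y, x, k]              # step 5.5.4: regression parameters
                cx, cy = (x + dx) * stride, (y + dy) * stride  # step 5.5.5: map back to Frm(t)
                boxes.append((cx, cy, w0 + dw, h0 + dh, float(prob_map[y, x, k])))
    return boxes  # p small-target (or, with larger base sizes, q large-target) result boxes
```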
Step 5.6, the p small target vehicle detection result boxes Box_Small(1)...Box_Small(p) and the q large target vehicle detection result boxes Box_Large(1)...Box_Large(q) in the image frame Frm(t) are collectively referred to as p + q vehicle detection result boxes;
for the p + q vehicle detection result boxes obtained in Frm(t), calculate the similarity coefficient between every two boxes; if the similarity coefficient is smaller than a set threshold, do nothing; if it is larger than the set threshold, merge the two boxes into one vehicle detection result box; z vehicle detection result boxes are finally obtained, denoted Box(1)...Box(z);
for example, if the Jaccard similarity coefficient Ja > 0.8 between the two candidate frames, the merge operation is performed.
Assume the two candidate boxes are denoted Box_Small(1) and Box_Large(1); the Jaccard similarity coefficient is then the intersection of the two boxes divided by their union:
Ja = area(Box_Small(1) ∩ Box_Large(1)) / area(Box_Small(1) ∪ Box_Large(1))
In step 5.6, merging two vehicle detection result boxes into one vehicle detection result box is specifically as follows:
let the two vehicle detection result boxes to be merged be the vehicle detection result box Box_Small(1) and the vehicle detection result box Box_Large(1), and denote the merged vehicle detection result box Box(1); then:
the center of Box(1) is the midpoint of the line connecting the centers of Box_Small(1) and Box_Large(1);
the height of Box(1) is the average of the heights of Box_Small(1) and Box_Large(1);
the width of Box(1) is the average of the widths of Box_Small(1) and Box_Large(1). A sketch of the similarity test and merge follows.
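A sketch of the similarity test and merge, assuming boxes stored as (cx, cy, w, h); the Jaccard coefficient is the standard intersection-over-union described in step 5.6.

```python
def jaccard(box_a, box_b):
    """Jaccard similarity coefficient (intersection over union) of two (cx, cy, w, h) boxes."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def merge_boxes(box_small, box_large):
    """Merged box: midpoint of the two centers, averaged width and height (step 5.6)."""
    return ((box_small[0] + box_large[0]) / 2,   # center x
            (box_small[1] + box_large[1]) / 2,   # center y
            (box_small[2] + box_large[2]) / 2,   # width
            (box_small[3] + box_large[3]) / 2)   # height
```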
Step 6, crop the image inside each vehicle detection result box in the image frame Frm(t), obtaining z detected vehicle maps;
step 7, inputting each detected vehicle map and the target vehicle map S into a multi-feature united vehicle search network for feature matching to obtain a detected vehicle map of the target vehicle; the position of the detected vehicle map in the image frame frm (t) is the position of the target vehicle in the image frame frm (t), so that the retrieval and positioning of the target vehicle are completed;
in step 7, the multi-feature united vehicle search network establishment method is as follows:
and establishing a multi-feature joint vehicle search network by taking the vehicle color feature and the vehicle type feature as vehicle global features and taking the vehicle side view, the vehicle front view, the vehicle rear view, the vehicle top view and the non-vehicle view as vehicle local features.
The step 7 specifically comprises the following steps:
step 7.1, constructing a multi-feature united vehicle search network; the multi-feature combined vehicle search network comprises a global feature identification module and a local feature matching module;
step 7.2, inputting the z detected vehicle images and the target vehicle image S into a global feature recognition module respectively, and obtaining z' suspected vehicle images with the same color and the same vehicle type as the target vehicle image S by adopting the following method;
the global feature identification module comprises a shared feature layer, a vehicle color feature layer and a vehicle type feature layer;
step 7.2.1, identifying the color characteristics of the target vehicle map S, comprising the steps of:
Step 7.2.1.1, input the target vehicle map S into the shared feature layer to obtain the shared feature map F_ShrMap;
Step 7.2.1.2, input the shared feature map F_ShrMap into the vehicle color feature layer to obtain the vehicle color feature vector V_Color; the vehicle color feature layer comprises conv4_Color, a max pooling layer Maxpool and a fully connected layer FC_Color;
Step 7.2.1.3, multiply the vehicle color feature vector V_Color with the shared feature map F_ShrMap by matrix broadcasting to obtain the color-sensitive feature map F_ColorMap;
Step 7.2.1.4, using the color-sensitive feature map F_ColorMap as a convolution kernel, cross-convolve the target vehicle map S to obtain the color feature enhancement map S'_Color, which strengthens the response of the target vehicle map S to color features;
Step 7.2.1.5, input the color feature enhancement map S'_Color sequentially into the shared feature layer, Conv4_Color, Conv5_Color, the max pooling layer and the fully connected layer, and obtain the color type of the target vehicle map S through a non-maximum suppression algorithm; a sketch of the broadcast multiplication in step 7.2.1.3 follows.
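The broadcast multiplication of step 7.2.1.3 can be sketched as below; the tensor shapes are assumptions.

```python
import torch

def color_sensitive_map(f_shr_map, v_color):
    """Sketch of step 7.2.1.3: broadcast-multiply V_Color with F_ShrMap to get F_ColorMap.

    f_shr_map : (C, H, W) shared feature map F_ShrMap (shape assumed)
    v_color   : (C,) vehicle color feature vector V_Color (shape assumed)
    """
    return f_shr_map * v_color.view(-1, 1, 1)   # broadcast over the spatial dimensions
```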
step 7.2.2, obtaining the vehicle type of the target vehicle map S by adopting the same method, and further obtaining the color type and the vehicle type of each detected vehicle map;
step 7.2.3, judging whether a detected vehicle image with the same color and the same vehicle type as the target vehicle image S exists in the z detected vehicle images, and if not, directly retrieving the next frame of image;
if yes, extract all detected vehicle maps with the same color and the same vehicle type as the target vehicle map S; assuming z' are extracted in total, the extracted z' detected vehicle maps are called suspected vehicle maps, denoted D_c, where c = 1...z';
Step 7.3, input the target vehicle map S and each suspected vehicle map D_c into the local feature matching module; the local feature matching module obtains the vehicle mean vector matrix V_s of the target vehicle map S with a matching algorithm;
the local feature matching module obtains the suspected vehicle mean vector matrix V_c of each suspected vehicle map D_c with the same matching algorithm;
the local feature matching module comprises a feature extraction layer, a feature sparse convolution layer Conv6 and a fully connected layer FC_sight;
the local feature matching module performs feature matching on the target vehicle map S to obtain its vehicle mean vector matrix V_s as follows:
step 7.3.1, performing grid segmentation on the target vehicle map S through 4-by-4 grids to obtain 16 vehicle sub-block maps;
Step 7.3.2, input each vehicle sub-block map into the feature extraction layer to obtain the corresponding vehicle sub-block feature map F_subMap(m), m = 1...16;
Step 7.3.3, input each vehicle sub-block feature map F_subMap(m) into the feature sparse convolution layer Conv6 to obtain the corresponding sparse feature map F_sparseMap(m);
in step 7.3.3, so that the sparse feature map fully expresses the features of the vehicle sub-block feature map F_subMap(m) and information loss during compression is reduced, the compression loss function Loss_sparse is used during training:
Loss_sparse = Min( F_subMap(m) - (F_sparseMap(m) * W_Tran) )
where W_Tran is the upsampling weight obtained by deconvolution. A sketch of this loss follows.
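A sketch of this loss, assuming W_Tran is realized as a transposed convolution (deconvolution) whose output matches the shape of F_subMap(m); the elementwise reduction over the difference is likewise an assumption, since the text only writes Min(...).

```python
import torch
import torch.nn as nn

class SparseCompressionLoss(nn.Module):
    """Sketch of Loss_sparse: penalize information lost when Conv6 compresses F_subMap(m)."""
    def __init__(self, sparse_ch, sub_ch, stride=2):
        super().__init__()
        # W_Tran modelled as deconvolution weights; kernel size and stride are assumptions.
        self.w_tran = nn.ConvTranspose2d(sparse_ch, sub_ch, kernel_size=stride, stride=stride)

    def forward(self, f_sub_map, f_sparse_map):
        upsampled = self.w_tran(f_sparse_map)                 # F_sparseMap(m) * W_Tran
        return torch.mean(torch.abs(f_sub_map - upsampled))   # minimized during training
```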
And 7.3.4, determining the view angle type of the vehicle sub-block map:
input each sparse feature map F_sparseMap(m) into the fully connected layer FC_sight and obtain the view angle category of the vehicle sub-block map through non-maximum suppression; the view angle categories are side view, front view, rear view, top view and non-vehicle view;
step 7.3.5, determining the view angle vector of the view angle category of the vehicle sub-block diagram:
if the view angle category is side view, front view, rear view or top view, extract the features of each sparse feature map F_sparseMap(m) and reshape them into a one-dimensional feature vector, which serves as the view angle vector corresponding to the vehicle sub-block map; the view angle vectors are divided by view angle category into side view vectors, front view vectors, rear view vectors and top view vectors;
if the visual angle category is a non-vehicle view, discarding;
step 7.3.6, determine the view mean vector for each view category:
obtaining the visual angle vector mean value of each vehicle sub-block image of the same visual angle category in the target vehicle image S, and respectively obtaining a side visual angle mean value vector, a front visual angle mean value vector, a rear visual angle mean value vector and a top visual angle mean value vector;
if a certain visual angle type does not exist, the visual angle mean vector does not exist, and all elements of the visual angle mean vector are set to be 0;
thus the view angle mean vectors V_cl of the four view angle categories are obtained, where cl = 1, 2, 3, 4 denotes the side view mean vector V_1, the front view mean vector V_2, the rear view mean vector V_3 and the top view mean vector V_4; the view angle mean vectors V_cl of the four view angle categories form the vehicle mean vector matrix V_s of the target vehicle map S;
correspondingly, the suspected vehicle mean vectors V'_cl of the four view angle categories of each suspected vehicle map D_c are obtained and form the suspected vehicle mean vector matrix V_c of the suspected vehicle map D_c; a sketch of this construction follows.
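A sketch of steps 7.3.1-7.3.6 for one vehicle map, with the Conv6 sparse feature extractor and the FC_sight view classifier supplied as callables; as described above, non-vehicle sub-blocks are discarded and missing view categories yield all-zero vectors.

```python
import numpy as np

VIEW_CLASSES = ("side", "front", "rear", "top")   # the fifth class, "non-vehicle", is discarded

def vehicle_mean_vector_matrix(vehicle_map, sparse_features, classify_view, grid=4):
    """Sketch of the matching algorithm for one vehicle map (the target S or a suspect D_c).

    sparse_features : callable, sub-block image -> F_sparseMap(m) (array-like)
    classify_view   : callable, F_sparseMap(m) -> one of VIEW_CLASSES or "non-vehicle"
    """
    h, w = vehicle_map.shape[:2]
    per_class = {cl: [] for cl in VIEW_CLASSES}
    for r in range(grid):                                     # step 7.3.1: 4x4 grid split
        for c in range(grid):
            block = vehicle_map[r * h // grid:(r + 1) * h // grid,
                                c * w // grid:(c + 1) * w // grid]
            f_sparse = sparse_features(block)                 # steps 7.3.2-7.3.3
            view = classify_view(f_sparse)                    # step 7.3.4
            if view in per_class:                             # step 7.3.5: keep vehicle views only
                per_class[view].append(np.asarray(f_sparse).ravel())
    dim = next((v[0].size for v in per_class.values() if v), 1)
    rows = []
    for cl in VIEW_CLASSES:                                   # step 7.3.6: per-category mean vectors
        vecs = per_class[cl]
        rows.append(np.mean(vecs, axis=0) if vecs else np.zeros(dim))
    return np.stack(rows)                                     # 4 x dim vehicle mean vector matrix
```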
Step 7.4, count the number Num_c of view angle mean vectors in the view angle categories shared by the target vehicle map S and each suspected vehicle map D_c, and obtain the feature matching value Match corresponding to each suspected vehicle map D_c with the following formula:
[Match formula, given as an image in the original document]
where lambda is the weight of the number of view angle mean vectors; T denotes transposition; tr denotes the trace of a matrix, the sum of its main diagonal elements;
Step 7.5, when several suspected vehicle maps D_c have feature matching values Match higher than the threshold, the suspected vehicle map of the target vehicle is determined among them by non-maximum suppression, and the position of that suspected vehicle map in the image frame Frm(t) is the position of the target vehicle in the image frame Frm(t);
when the feature matching values Match between the target vehicle map S and all suspected vehicle maps D_c are lower than the threshold, the image frame Frm(t) does not contain the target vehicle; a sketch of this matching step follows.
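Below is a minimal sketch, in Python with NumPy, of the matching step. The published Match formula is available only as an image, so the combination λ·Num + Tr(V_sᵀ·V_c) used here is an assumption for illustration, and the selection of the final box is simplified to taking the highest score above the threshold rather than full non-maximum suppression; all function names are hypothetical.

import numpy as np

def match_score(V_s, V_c, lam=1.0):
    # Assumed form: lam * Num + Tr(V_s^T @ V_c); Num counts view categories whose
    # mean vectors are non-zero in both matrices (absent views are all-zero rows).
    shared = np.logical_and(np.any(V_s != 0, axis=1), np.any(V_c != 0, axis=1))
    num = int(shared.sum())
    return lam * num + float(np.trace(V_s.T @ V_c))

def locate_target(V_s, suspect_matrices, boxes, threshold, lam=1.0):
    # Step 7.5 (simplified): keep suspects whose Match exceeds the threshold and
    # return the best-scoring box; None means the frame does not contain the target.
    scored = [(match_score(V_s, V_c, lam), box)
              for V_c, box in zip(suspect_matrices, boxes)]
    scored = [sb for sb in scored if sb[0] > threshold]
    return max(scored, key=lambda sb: sb[0])[1] if scored else None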
Step 8, if the matching degrees of all the detected vehicle maps in the current image frame Frm(t) with the target vehicle map S are lower than the set threshold, that is, the target vehicle does not exist in the current image frame Frm(t), retrieval continues with the image frame Frm(t+1) at the next moment.
The invention provides an unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching that is suitable for videos captured by unmanned aerial vehicles in different complex scenes. It minimizes the loss of vehicle detail information caused by illumination and by target size changes at different flight heights, solves the problem that the vehicle to be queried is difficult to find among numerous targets, and retrieves the vehicle to be queried more accurately.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and improvements can be made without departing from the principle of the present invention, and such modifications and improvements should also be considered within the scope of the present invention.

Claims (6)

1. An unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching is characterized by comprising the following steps:
step 1, determining a target vehicle map S to be retrieved;
step 2, shooting the ground by the unmanned aerial vehicle to obtain video data of the unmanned aerial vehicle;
step 3, executing steps 4 to 8 on each frame of image of the unmanned aerial vehicle video data, and judging whether each frame of image contains a target vehicle image S to be retrieved:
recording the current image frame as Frm (t), wherein t is the frame number of the current image frame, and judging whether the image frame Frm (t) contains a target vehicle map S to be searched by adopting the following steps 4-8:
step 4, inputting the image frame Frm(t) into a trained light suppression model, and performing feature extraction and light suppression processing to obtain an illumination suppression feature map comprising n layers, denoted F_RestrainMap;
the light suppression model is a double-branch network comprising a learning branch network and a suppression branch network; the learning branch network comprises, connected in series, a convolution layer conv1, a shallow feature selection layer f_1() and a deep feature selection layer f_2(); the suppression branch network comprises, connected in series, a convolution layer conv1', a shallow feature selection layer f'_1() and a deep feature selection layer f'_2();
step 5, inputting the illumination suppression feature map F_RestrainMap into a feature-enhanced multi-scale vehicle detection module, and acquiring z vehicle detection result frames in the image frame Frm(t):
step 5.1, the illumination suppression feature map F_RestrainMap has n layers; each layer is denoted layer_i, i = 1, ..., n, and steps 5.1.1-5.1.3 are executed to obtain the dependency weight value w″_i of layer_i:
step 5.1.1, calculating the average value of all pixel points of layer_i as the initial weight w_i of layer_i;
step 5.1.2, inputting the initial weight w_i of layer_i into a fully connected layer, and mapping w_i to the (0, 1) feature space through a sigmoid activation function, thereby outputting the normalized weight value w'_i of layer_i;
step 5.1.3, establishing a piecewise function, and performing piecewise suppression or enhancement on the normalized weight value w'_i of layer_i to obtain the dependency weight value w″_i of layer_i:
[Piecewise function rendered as an image in the original patent: w″_i is obtained from w'_i by piecewise suppression or enhancement, adjusted by the system constant ε]
wherein ε represents a system constant used to adjust the degree of influence of the dependency weight value on the layer;
step 5.2, obtaining the dependency weight values of the n layers of the illumination suppression feature map F_RestrainMap, namely w″_1 ... w″_n;
combining w″_1 ... w″_n to obtain the 1 × n dependency weight vector W″ of the illumination suppression feature map F_RestrainMap;
using the dependency weight vector W″ as a convolution kernel to convolve the illumination suppression feature map F_RestrainMap, obtaining the layer enhancement feature map F_EhcMap;
step 5.3, inputting the layer enhancement feature map F_EhcMap into the small target response layer to obtain the small target salient feature map F_SmallMap;
wherein the small target salient feature map F_SmallMap contains more vehicle detail information, which improves the success rate of small target vehicle detection when the flying height of the unmanned aerial vehicle is high;
step 5.4, inputting the small target salient feature map F_SmallMap into the large target response layer to obtain the large target salient feature map F_LargeMap;
wherein the large target salient feature map F_LargeMap contains more semantic information, which improves the accuracy of large target vehicle detection when the flying height of the unmanned aerial vehicle is low;
step 5.5, inputting the small target salient feature map F_SmallMap into the result frame generation layer, so that p small target vehicle detection result frames Box_Small(1) ... Box_Small(p) are obtained in the image frame Frm(t);
inputting the large target salient feature map F_LargeMap into the result frame generation layer, so that q large target vehicle detection result frames Box_Large(1) ... Box_Large(q) are obtained in the image frame Frm(t);
The specific method comprises the following steps:
step 5.5.1, each pixel point in the small target salient feature map F_SmallMap is used as an anchor point, and several candidate frames of different sizes are generated centered on each anchor point; thus, candidate frames are obtained for all pixel points of the small target salient feature map F_SmallMap;
step 5.5.2, calculating the vehicle probability value of each candidate frame;
step 5.5.3, screening the candidate frames and removing those whose vehicle probability value is lower than a preset threshold, obtaining candidate frames A_1, A_2 ... A_p, where p represents the number of candidate frames;
step 5.5.4, calculating the regression parameters of each of the candidate frames A_1, A_2 ... A_p, each candidate frame having the following regression parameters: width, height and anchor point offset;
step 5.5.5, mapping the anchor point coordinates of each candidate frame A_1, A_2 ... A_p and the corresponding regression parameters back to the image frame Frm(t), so that p small target vehicle detection result frames Box_Small(1) ... Box_Small(p) are obtained in the image frame Frm(t);
step 5.5.6, replacing the small target salient feature map F_SmallMap in step 5.5.1 with the large target salient feature map F_LargeMap, increasing the initial generation size of the candidate frames in step 5.5.1, and obtaining q large target vehicle detection result frames Box_Large(1) ... Box_Large(q) in the image frame Frm(t) by the method of steps 5.5.1-5.5.5;
step 5.6, the p small target vehicle detection result frames Box_Small(1) ... Box_Small(p) and the q large target vehicle detection result frames Box_Large(1) ... Box_Large(q) in the image frame Frm(t) are collectively referred to as p + q vehicle detection result frames;
for the p + q vehicle detection result frames obtained in the image frame Frm(t), calculating the similarity coefficient between any two vehicle detection result frames; if the similarity coefficient is smaller than the set threshold, no processing is performed; if the similarity coefficient is larger than the set threshold, the two vehicle detection result frames are combined into one vehicle detection result frame, finally obtaining z vehicle detection result frames, denoted Box(1) ... Box(z);
step 6, cropping the image within each vehicle detection result frame in the image frame Frm(t) to obtain z detected vehicle maps;
step 7, inputting each detected vehicle map and the target vehicle map S into a multi-feature joint vehicle search network for feature matching to obtain the detected vehicle map of the target vehicle; the position of that detected vehicle map in the image frame Frm(t) is the position of the target vehicle in the image frame Frm(t), so that the retrieval and positioning of the target vehicle are completed;
the multi-feature joint vehicle search network comprises a global feature recognition module and a local feature matching module; the global feature recognition module comprises a shared feature layer, a vehicle color feature layer and a vehicle type feature layer; the local feature matching module comprises a feature extraction layer, a feature sparse convolution layer Conv6 and a fully connected layer FC_sight;
step 8, if the matching degrees of all the detected vehicle maps in the current image frame Frm(t) with the target vehicle map S are lower than the set threshold, that is, the target vehicle does not exist in the current image frame Frm(t), retrieval continues with the image frame Frm(t+1) at the next moment.
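As a non-authoritative illustration of steps 5.1-5.2 of claim 1, the following Python/NumPy sketch computes per-layer dependency weights and applies them to the illumination suppression feature map. The fully connected mapping is reduced to a per-layer affine transform, the piecewise suppress/enhance rule is an assumed stand-in (the patented piecewise function is published only as an image), and the final reweighting is a simplified reading of the 1 × n convolution; all names are hypothetical.

import numpy as np

def dependency_weight_vector(f_restrain_map, fc_weight, fc_bias, eps=0.1):
    # f_restrain_map: array of shape (n, H, W); returns the 1 x n weight vector W″.
    w = f_restrain_map.mean(axis=(1, 2))                        # step 5.1.1: per-layer pixel mean
    w_norm = 1.0 / (1.0 + np.exp(-(fc_weight * w + fc_bias)))   # step 5.1.2: FC + sigmoid
    # step 5.1.3 (assumed rule): suppress weak layers, enhance strong ones by eps
    w_dep = np.where(w_norm < 0.5, w_norm * (1.0 - eps), w_norm * (1.0 + eps))
    return w_dep.reshape(1, -1)                                 # step 5.2: 1 x n vector W″

def enhance_layers(f_restrain_map, w_dep):
    # Simplified reweighting of the n layers with W″ (stand-in for the convolution),
    # producing the layer enhancement feature map F_EhcMap.
    return f_restrain_map * w_dep.reshape(-1, 1, 1)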
2. The unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching according to claim 1, wherein the step 4 specifically comprises:
step 4.1, constructing a light suppression model;
the light suppression model is a double-branch network comprising a learning branch network and a suppression branch network; the learning branch network comprises, connected in series, a convolution layer conv1, a shallow feature selection layer f_1() and a deep feature selection layer f_2(); the suppression branch network comprises, connected in series, a convolution layer conv1', a shallow feature selection layer f'_1() and a deep feature selection layer f'_2();
step 4.2, obtaining a groups of training sample pairs;
each group of training sample pairs comprises a normal-light image I and an over-bright image I' at the unmanned aerial vehicle's viewing angle; the over-bright image I' is obtained by randomly adding a brightness value to the normal-light image I; the a groups of training sample pairs are respectively expressed as: (I_1, I'_1), (I_2, I'_2), ..., (I_a, I'_a);
step 4.3, performing offline training on the light suppression model constructed in step 4.1 with the a groups of training sample pairs, the objective function of the offline training being:
[Objective function rendered as an image in the original patent; over the a training pairs, it minimizes the squared L2 differences between the suppression-branch features f'_1(I'_j), f'_2(I'_j) and the learning-branch features f_1(I_j), f_2(I_j), with a penalty coefficient γ weighting one of the two terms]
wherein:
Loss_light-suppression denotes the light suppression loss function;
argmin() denotes the value of the variable at which the objective function takes its minimum;
f'_1(I'_j) denotes the shallow feature value output after the over-bright image I'_j is input into the shallow feature selection layer f'_1();
f'_2(I'_j) denotes the deep feature value output after the over-bright image I'_j is input into the deep feature selection layer f'_2();
f_1(I_j) denotes the shallow feature value output after the normal-light image I_j is input into the shallow feature selection layer f_1();
f_2(I_j) denotes the deep feature value output after the normal-light image I_j is input into the deep feature selection layer f_2();
||·||_2^2 denotes the square of the L2 norm;
γ denotes a manually set penalty coefficient that controls the influence of its associated feature-difference term on the light suppression loss function: the larger its value, the greater that term's influence on the light suppression loss function;
step 4.4, offline training of the light suppression model weakens the sensitivity of the suppression branch network to brightness features, so that the suppression branch network can suppress the illumination features of over-bright images captured by the unmanned aerial vehicle and improve the saliency of vehicle detail features at the unmanned aerial vehicle's viewing angle;
therefore, the image frame Frm(t) is input into the suppression branch network of the trained light suppression model to obtain the illumination suppression feature map F_RestrainMap.
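The sketch below, in Python with NumPy, illustrates the offline training objective of claim 2 under a stated assumption: since the published objective is available only as an image, it is assumed here to sum, over the a training pairs, the squared L2 differences between suppression-branch and learning-branch features, with γ weighting the deep-feature term; the function names are hypothetical.

import numpy as np

def light_suppression_loss(pairs, f1, f2, f1_sup, f2_sup, gamma=1.0):
    # pairs: list of (I_normal, I_bright) training sample pairs.
    # f1/f2: learning-branch shallow/deep feature functions on normal-light images.
    # f1_sup/f2_sup: suppression-branch shallow/deep feature functions on over-bright images.
    loss = 0.0
    for I, I_bright in pairs:
        shallow = np.sum((f1_sup(I_bright) - f1(I)) ** 2)  # ||f'_1(I'_j) - f_1(I_j)||_2^2
        deep = np.sum((f2_sup(I_bright) - f2(I)) ** 2)     # ||f'_2(I'_j) - f_2(I_j)||_2^2
        loss += shallow + gamma * deep                     # gamma-weighted deep term (assumed)
    return loss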
3. The unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching of claim 1, wherein in step 5.6, two vehicle detection result frames are combined into one vehicle detection result frame, specifically:
let the two vehicle detection result frames to be combined be the vehicle detection result frame Box_Small(1) and the vehicle detection result frame Box_Large(1), and let the merged vehicle detection result frame be denoted Box(1); then:
the center point of Box(1) is the midpoint of the line connecting the center point of Box_Small(1) and the center point of Box_Large(1);
the height of Box(1) is the average of the height of Box_Small(1) and the height of Box_Large(1);
the width of Box(1) is the average of the width of Box_Small(1) and the width of Box_Large(1).
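A minimal Python sketch of this merging rule follows; the dictionary keys used for the box geometry are hypothetical.

def merge_boxes(box_small, box_large):
    # Each box is a dict with center coordinates and size: {'cx', 'cy', 'h', 'w'}.
    return {
        "cx": (box_small["cx"] + box_large["cx"]) / 2.0,  # midpoint of the two centers
        "cy": (box_small["cy"] + box_large["cy"]) / 2.0,
        "h": (box_small["h"] + box_large["h"]) / 2.0,     # average height
        "w": (box_small["w"] + box_large["w"]) / 2.0,     # average width
    }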
4. The unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching of claim 1, wherein in step 7, the multi-feature joint vehicle search network establishment method is as follows:
establishing the multi-feature joint vehicle search network by taking the vehicle color feature and the vehicle type feature as vehicle global features, and taking the vehicle side view, the vehicle front view, the vehicle rear view, the vehicle top view and the non-vehicle view as vehicle local features.
5. The unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching according to claim 4, wherein the step 7 specifically comprises:
step 7.1, constructing the multi-feature joint vehicle search network; the multi-feature joint vehicle search network comprises a global feature recognition module and a local feature matching module;
step 7.2, inputting the z detected vehicle images and the target vehicle image S into a global feature recognition module respectively, and obtaining z' suspected vehicle images with the same color and the same vehicle type as the target vehicle image S by adopting the following method;
the global feature identification module comprises a shared feature layer, a vehicle color feature layer and a vehicle type feature layer;
step 7.2.1, identifying the color characteristics of the target vehicle map S, comprising the steps of:
step 7.2.1.1, inputting the target vehicle map S into the shared feature layer to obtain the shared feature map F_ShrMap;
step 7.2.1.2, inputting the shared feature map F_ShrMap into the vehicle color feature layer to obtain the vehicle color feature vector V_Color; wherein the vehicle color feature layer comprises Conv4_Color, a max pooling layer Maxpool and a fully connected layer FC_Color;
step 7.2.1.3, multiplying the vehicle color feature vector V_Color with the shared feature map F_ShrMap by matrix broadcasting to obtain the color-sensitive feature map F_ColorMap;
step 7.2.1.4, using the color-sensitive feature map F_ColorMap as a convolution kernel and cross-convolving it with the target vehicle map S to obtain the color feature enhancement map S'_Color, which enhances the response of the target vehicle map S to color features;
step 7.2.1.5, inputting the color feature enhancement map S'_Color sequentially into the shared feature layer, Conv4_Color, Conv5_Color, the max pooling layer and the fully connected layer, and obtaining the color category of the target vehicle map S through a non-maximum suppression algorithm;
step 7.2.2, obtaining the vehicle type of the target vehicle map S by adopting the same method, and further obtaining the color type and the vehicle type of each detected vehicle map;
step 7.2.3, judging whether a detected vehicle image with the same color and the same vehicle type as the target vehicle image S exists in the z detected vehicle images, and if not, directly searching the next frame of image;
if yes, extracting all detected vehicle maps with the same color and the same vehicle type as the target vehicle map S; assuming z' are extracted in total, the extracted z' detected vehicle maps are referred to as suspected vehicle maps, denoted suspected vehicle map D_c, where c = 1 ... z';
step 7.3, inputting the target vehicle map S and each suspected vehicle map D_c into the local feature matching module, which obtains the vehicle mean vector matrix V_s of the target vehicle map S by a matching algorithm;
the local feature matching module adopts the same matching algorithm to obtain the suspected vehicle mean vector matrix V_c of each suspected vehicle map D_c;
wherein the local feature matching module comprises a feature extraction layer, a feature sparse convolution layer Conv6 and a fully connected layer FC_sight;
the local feature matching module performs feature matching on the target vehicle map S to obtain the vehicle mean vector matrix V_s of the target vehicle map S, specifically through the following steps:
step 7.3.1, performing grid segmentation on the target vehicle map S through 4-by-4 grids to obtain 16 vehicle sub-block maps;
step 7.3.2, respectively inputting each vehicle sub-block map into the feature extraction layer to obtain the corresponding vehicle sub-block feature map F_subMap(m), m = 1 ... 16;
step 7.3.3, inputting each vehicle sub-block feature map F_subMap(m) into the feature sparse convolution layer Conv6 to obtain the corresponding sparse feature map F_sparseMap(m);
step 7.3.4, determining the view angle category of the vehicle sub-block map:
each sparse feature map F_sparseMap(m) is input into the fully connected layer FC_sight, and the view angle category of the vehicle sub-block map is obtained through non-maximum suppression; the view angle categories comprise five classes: side view, front view, rear view, top view and non-vehicle view;
step 7.3.5, determining the view angle vector for the view angle category of each vehicle sub-block map:
if the view angle category is side view, front view, rear view or top view, the features of the corresponding sparse feature map F_sparseMap(m) are extracted and reshaped into a one-dimensional feature vector, which serves as the view angle vector of that vehicle sub-block map; the view angle vectors are divided by view angle category into side view vectors, front view vectors, rear view vectors and top view vectors;
if the view angle category is non-vehicle view, the sub-block is discarded;
step 7.3.6, determine the view mean vector for each view category:
obtaining a view angle vector mean value of each vehicle sub-block map of the same view angle category in the target vehicle map S, and respectively obtaining a side view angle mean value vector, a front view angle mean value vector, a rear view angle mean value vector and a top view angle mean value vector;
if a certain visual angle type does not exist, the visual angle mean vector does not exist, and all elements of the visual angle mean vector are set to be 0;
thus, the view angle mean vectors V_cl of the four view angle categories are obtained, where cl = 1, 2, 3, 4 denotes the side view mean vector V_1, the front view mean vector V_2, the rear view mean vector V_3 and the top view mean vector V_4; the four view angle mean vectors V_cl constitute the vehicle mean vector matrix V_s of the target vehicle map S;
correspondingly, the suspected vehicle mean vectors V'_cl of the four view angle categories are obtained for each suspected vehicle map D_c to construct the suspected vehicle mean vector matrix V_c of the suspected vehicle map D_c;
step 7.4, counting the number Num of view angle mean vectors of the view angle categories shared by the target vehicle map S and each suspected vehicle map D_c, and calculating with the following formula the feature matching value Match corresponding to each suspected vehicle map D_c;
[Formula rendered as an image in the original patent; Match is a function of the weight λ, the count Num, and the trace Tr of a matrix product of V_s and V_c involving transposition]
wherein λ is the weight of the number of view angle mean vectors; T denotes transposition; Tr denotes the trace of a matrix, i.e. the sum of its main diagonal elements;
step 7.5, when several suspected vehicle maps D_c have feature matching values Match higher than the threshold, the suspected vehicle map of the target vehicle is determined among them by non-maximum suppression, and the position of that suspected vehicle map in the image frame Frm(t) is the position of the target vehicle in the image frame Frm(t);
when the feature matching values Match between the target vehicle map S and all suspected vehicle maps D_c are lower than the threshold, the image frame Frm(t) does not contain the target vehicle.
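As a non-authoritative illustration of steps 7.2.1.3-7.2.1.4 of this claim, the following Python sketch (using NumPy and SciPy) broadcasts the color feature vector over the shared feature map and cross-correlates the result with the target vehicle map; the channel-collapsing step and all shapes and names are assumptions made for illustration.

import numpy as np
from scipy.signal import correlate2d

def color_feature_enhancement(target_map, shared_map, v_color):
    # target_map: (H, W) grayscale target vehicle map; shared_map: (C, h, w) shared
    # feature map F_ShrMap; v_color: (C,) vehicle color feature vector V_Color.
    color_sensitive = shared_map * v_color[:, None, None]  # step 7.2.1.3: broadcast multiply
    kernel = color_sensitive.mean(axis=0)                   # collapse channels (assumption)
    return correlate2d(target_map, kernel, mode="same")     # step 7.2.1.4: S'_Color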
6. The unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching of claim 1, wherein in step 7.3.3, in order to make the sparse feature map fully express the features of the vehicle sub-block feature map F_subMap(m) and to reduce information loss during compression, a compression loss function Loss_sparse is adopted during training:
Loss_sparse = Min(F_subMap(m) - F_sparseMap(m) * W_Tran)
in the formula, W_Tran is the upsampling weight obtained by deconvolution.
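A minimal sketch of this compression loss follows, in Python with NumPy; for illustration the deconvolution-based upsampling W_Tran is reduced to a plain matrix product on flattened features, and the function returns the squared reconstruction error that training would minimize (names and shapes are assumptions).

import numpy as np

def compression_loss(f_sub, f_sparse, w_tran):
    # f_sub: (d,) flattened F_subMap(m); f_sparse: (k,) flattened F_sparseMap(m);
    # w_tran: (k, d) upsampling weights standing in for the deconvolution W_Tran.
    reconstruction = f_sparse @ w_tran                   # F_sparseMap(m) * W_Tran
    return float(np.sum((f_sub - reconstruction) ** 2))  # error minimized during training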
CN202111534212.4A 2021-12-15 2021-12-15 Unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching Active CN114220053B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111534212.4A CN114220053B (en) 2021-12-15 2021-12-15 Unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111534212.4A CN114220053B (en) 2021-12-15 2021-12-15 Unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching

Publications (2)

Publication Number Publication Date
CN114220053A CN114220053A (en) 2022-03-22
CN114220053B true CN114220053B (en) 2022-06-03

Family

ID=80702585

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111534212.4A Active CN114220053B (en) 2021-12-15 2021-12-15 Unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching

Country Status (1)

Country Link
CN (1) CN114220053B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102436738A (en) * 2011-09-26 2012-05-02 同济大学 Traffic monitoring device based on unmanned aerial vehicle (UAV)
CN110110624A (en) * 2019-04-24 2019-08-09 江南大学 A kind of Human bodys' response method based on DenseNet network and the input of frame difference method feature
CN110443208A (en) * 2019-08-08 2019-11-12 南京工业大学 YOLOv 2-based vehicle target detection method, system and equipment
CN110717387A (en) * 2019-09-02 2020-01-21 东南大学 Real-time vehicle detection method based on unmanned aerial vehicle platform

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107885764B (en) * 2017-09-21 2020-12-18 银江股份有限公司 Rapid Hash vehicle retrieval method based on multitask deep learning
US11036216B2 (en) * 2018-09-26 2021-06-15 International Business Machines Corporation Voice-controllable unmanned aerial vehicle for object retrieval and delivery
CN109815886B (en) * 2019-01-21 2020-12-18 南京邮电大学 Pedestrian and vehicle detection method and system based on improved YOLOv3
CN109977812B (en) * 2019-03-12 2023-02-24 南京邮电大学 Vehicle-mounted video target detection method based on deep learning
US20200301015A1 (en) * 2019-03-21 2020-09-24 Foresight Ai Inc. Systems and methods for localization
WO2021207999A1 (en) * 2020-04-16 2021-10-21 华为技术有限公司 Vehicle positioning method and apparatus, and positioning map layer generation method and apparatus
CN112149643B (en) * 2020-11-09 2022-02-22 西北工业大学 Vehicle weight identification method for unmanned aerial vehicle platform based on multi-stage attention mechanism
CN112381043A (en) * 2020-11-27 2021-02-19 华南理工大学 Flag detection method
CN112766087A (en) * 2021-01-04 2021-05-07 武汉大学 Optical remote sensing image ship detection method based on knowledge distillation


Also Published As

Publication number Publication date
CN114220053A (en) 2022-03-22

Similar Documents

Publication Publication Date Title
CN108304873B (en) Target detection method and system based on high-resolution optical satellite remote sensing image
CN108573276B (en) Change detection method based on high-resolution remote sensing image
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN108460356B (en) Face image automatic processing system based on monitoring system
CN113065558A (en) Lightweight small target detection method combined with attention mechanism
CN111046880B (en) Infrared target image segmentation method, system, electronic equipment and storage medium
CN111767882A (en) Multi-mode pedestrian detection method based on improved YOLO model
CN110246141B (en) Vehicle image segmentation method based on joint corner pooling under complex traffic scene
CN109684922B (en) Multi-model finished dish identification method based on convolutional neural network
CN105404886B (en) Characteristic model generation method and characteristic model generating means
CN109903331B (en) Convolutional neural network target detection method based on RGB-D camera
CN106897673B (en) Retinex algorithm and convolutional neural network-based pedestrian re-identification method
CN110929593B (en) Real-time significance pedestrian detection method based on detail discrimination
CN110263712B (en) Coarse and fine pedestrian detection method based on region candidates
CN109034184B (en) Grading ring detection and identification method based on deep learning
KR102320985B1 (en) Learning method and learning device for improving segmentation performance to be used for detecting road user events using double embedding configuration in multi-camera system and testing method and testing device using the same
CN107273832B (en) License plate recognition method and system based on integral channel characteristics and convolutional neural network
CN101356539A (en) Method and system for detecting a human in a test image of a scene acquired by a camera
CN110659550A (en) Traffic sign recognition method, traffic sign recognition device, computer equipment and storage medium
CN110334703B (en) Ship detection and identification method in day and night image
CN111274964B (en) Detection method for analyzing water surface pollutants based on visual saliency of unmanned aerial vehicle
CN115661777A (en) Semantic-combined foggy road target detection algorithm
CN113569981A (en) Power inspection bird nest detection method based on single-stage target detection network
CN116363535A (en) Ship detection method in unmanned aerial vehicle aerial image based on convolutional neural network
Zhao et al. Image dehazing based on haze degree classification

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20230518

Address after: 3032B, 3rd Floor, Building 9, No.16 Fengguan Road, Fengtai District, Beijing, 100071

Patentee after: Beijing Lingyun Space Technology Co.,Ltd.

Address before: 100044 No. 1, Exhibition Road, Beijing, Xicheng District

Patentee before: Beijing University of Civil Engineering and Architecture