CN114220053B - Unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching - Google Patents
Unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching
- Publication number
- CN114220053B (application CN202111534212.4A)
- Authority
- CN
- China
- Prior art keywords
- vehicle
- map
- feature
- layer
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention provides an unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching, which comprises the following steps: inputting each image frame into a trained light suppression model and a feature-enhanced multi-scale vehicle detection module to obtain a plurality of vehicle detection result frames; cropping the image inside each vehicle detection result frame of the image frame to obtain z detected vehicle maps; and inputting each detected vehicle map together with the target vehicle map S into a multi-feature joint vehicle search network for feature matching to obtain the detected vehicle map of the target vehicle, thereby completing retrieval and positioning of the target vehicle. The method is suitable for videos shot by unmanned aerial vehicles in different complex scenes, removes to the greatest extent the loss of vehicle detail caused by illumination and by target-size changes at different flight heights, eases the difficulty of finding the queried vehicle among numerous targets, and retrieves the queried vehicle more accurately.
Description
Technical Field
The invention belongs to the technical field of intelligent remote sensing information processing, and particularly relates to an unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching.
Background
Ground surveillance video acquires road information through fixed cameras installed at key locations such as crossroads and expressway entrances, and has the advantages of all-weather operation and low sensitivity to the environment. Vehicle retrieval based on ground surveillance video mainly falls into two categories: (1) traditional vehicle retrieval methods, which extract detail features of the target vehicle, such as Haar, SIFT and HOG features, through algorithms such as bag of visual words and deep hashing; their vehicle characterization capability is limited, so their ability to distinguish similar vehicles is weak; (2) vehicle retrieval methods based on deep learning, which train a neural network with a large amount of sample data so that the network can extract vehicle features and complete the retrieval task. These methods extract vehicle semantic information based on classical object detection networks such as Faster R-CNN, YOLOv3 and SPP-Net, and achieve high retrieval accuracy in simple scenes.
However, because the shooting angle of a ground surveillance camera is oblique, the video contains vehicles of widely varying scales, which causes missed detections and in turn reduces vehicle retrieval accuracy. Moreover, because the camera position is fixed, the vehicle to be searched appears in the picture only briefly, which provides limited help for subsequent tracking tasks.
Unlike ground surveillance, an unmanned aerial vehicle has the advantages of low cost, rapid deployment, flexible maneuvering and a wide monitoring range; vehicle retrieval on unmanned aerial vehicle surveillance video can not only quickly accomplish retrieval at any intersection but also continue with tasks such as tracking the target vehicle after a successful retrieval. However, because the sizes of the vehicles in the video change with the flight height of the unmanned aerial vehicle, vehicles that are too large or too small are missed when the candidate frames are designed unreasonably or the network is too deep, owing to insufficient candidate-frame regression capability and dilution of target information. Meanwhile, because the unmanned aerial vehicle is usually flown in good weather, the video often contains regions of excessive brightness in which vehicle details are lost, so vehicles in those over-bright regions are missed.
Disclosure of Invention
Aiming at the problem of missed detection when the prior art is directly applied to unmanned aerial vehicle video vehicle retrieval, the invention provides an unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching, which can effectively solve the problem.
The technical scheme adopted by the invention is as follows:
the invention provides an unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching, which comprises the following steps:
step 1, determining a target vehicle map S to be retrieved;
recording the current image frame as Frm (t), wherein t is the frame number of the current image frame, and judging whether the image frame Frm (t) contains a target vehicle map S to be searched by adopting the following steps 4-8:
step 4, inputting the image frame frm (t) into a trained light suppression model, and performing feature extraction and light suppression processing to obtain an illumination suppression feature map comprising n image layers, which is marked as FRestrainMap;
Step 5, an illumination inhibition characteristic diagram FRestrainMap is input into a multi-scale vehicle detection module with enhanced features, and z vehicle detection result frames in an image frame frm (t) are acquired:
step 5.1, the illumination suppression feature map FRestrainMap has n layers, each denoted layer_i, i = 1...n; for each layer, execute steps 5.1.1 to 5.1.3 to obtain the dependence weight w″_i of layer_i:
step 5.1.1, calculate the average value of all pixel points in layer_i as the initial weight w_i of layer_i;
step 5.1.2, input the initial weight w_i of layer_i into a fully connected layer and map it to the (0, 1) feature space through a sigmoid activation function, thereby outputting the normalized weight w′_i of layer_i;
step 5.1.3, establish a piecewise function and apply piecewise suppression or enhancement to the normalized weight w′_i of layer_i to obtain the dependence weight w″_i of layer_i:
Wherein:
epsilon represents a system constant and is used for adjusting the influence degree of the dependence weight value on the layer;
step 5.2, obtain the dependence weights of the n layers of the illumination suppression feature map FRestrainMap, respectively: w″_1 ... w″_n;
combine w″_1 ... w″_n to obtain a 1 × n dependence weight vector W″ of the illumination suppression feature map FRestrainMap;
using the dependence weight vector W″ as a convolution kernel, convolve the illumination suppression feature map FRestrainMap to obtain the layer-enhanced feature map FEhcMap;
step 5.3, input the layer-enhanced feature map FEhcMap into the small-target response layer to obtain the small-target salient feature map FSmallMap;
wherein the small-target salient feature map FSmallMap contains more vehicle detail information, which improves the success rate of small-target vehicle detection when the unmanned aerial vehicle flies at a higher altitude;
step 5.4, input the small-target salient feature map FSmallMap into the large-target response layer to obtain the large-target salient feature map FLargeMap;
wherein the large-target salient feature map FLargeMap contains more semantic information, which improves the accuracy of large-target vehicle detection when the unmanned aerial vehicle flies at a lower altitude;
step 5.5, input the small-target salient feature map FSmallMap into the result frame generation layer to obtain, in the image frame Frm(t), p small-target vehicle detection result frames BoxSmall(1)...BoxSmall(p);
input the large-target salient feature map FLargeMap into the result frame generation layer to obtain, in the image frame Frm(t), q large-target vehicle detection result frames BoxLarge(1)...BoxLarge(q);
The specific method comprises the following steps:
step 5.5.1, take each pixel point in the small-target salient feature map FSmallMap as an anchor point and generate several candidate frames of different sizes centered on each anchor point; thus, candidate frames are obtained for all pixel points in the small-target salient feature map FSmallMap;
step 5.5.2, calculating to obtain the vehicle probability value of each candidate box;
step 5.5.3, screen the candidate frames and remove those whose vehicle probability value is lower than a preset threshold, obtaining candidate frames A_1, A_2...A_p; wherein p represents the number of candidate frames;
step 5.5.4, calculate the regression parameters of each candidate frame in A_1, A_2...A_p; each candidate frame has the following regression parameters: width, height, and anchor point offset;
step 5.5.5, map the anchor point coordinates of each candidate frame in A_1, A_2...A_p and the corresponding regression parameters back to the image frame Frm(t), thereby obtaining p small-target vehicle detection result frames BoxSmall(1)...BoxSmall(p) in the image frame Frm(t);
step 5.5.6, replace the small-target salient feature map FSmallMap in step 5.5.1 with the large-target salient feature map FLargeMap, increase the initial generation size of the candidate frames in step 5.5.1, and apply the method of steps 5.5.1-5.5.5 to obtain q large-target vehicle detection result frames BoxLarge(1)...BoxLarge(q) in the image frame Frm(t);
step 5.6, the p small-target vehicle detection result frames BoxSmall(1)...BoxSmall(p) and the q large-target vehicle detection result frames BoxLarge(1)...BoxLarge(q) in the image frame Frm(t) are collectively referred to as the p + q vehicle detection result frames;
for the p + q vehicle detection result frames obtained in the image frame Frm(t), calculate the similarity coefficient between any two vehicle detection result frames; if the similarity coefficient is smaller than a set threshold, do nothing; if the similarity coefficient is larger than the set threshold, merge the two vehicle detection result frames into one vehicle detection result frame, finally obtaining z vehicle detection result frames, denoted Box(1)...Box(z);
step 6, respectively intercepting images in each vehicle detection result frame in an image frame Frm (t) to obtain z detection vehicle images;
step 7, inputting each detected vehicle map and the target vehicle map S into a multi-feature united vehicle search network for feature matching to obtain a detected vehicle map of the target vehicle; the position of the detected vehicle map in the image frame frm (t) is the position of the target vehicle in the image frame frm (t), so that the retrieval and positioning of the target vehicle are completed;
step 8, if the matching degrees of all the detected vehicle maps and the target vehicle map S in the current image frame frm (t) are lower than the set threshold, that is, the target vehicle does not exist in the current image frame frm (t), the image frame Frm (t +1) at the next time is continuously retrieved.
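For readers approaching the pipeline from an implementation angle, the following minimal Python sketch summarizes the per-frame control flow of steps 1-8; the function names light_suppress, detect_vehicles, crop and match_features are hypothetical placeholders for the modules detailed later, and the threshold value is illustrative.

```python
# Minimal sketch of the per-frame retrieval loop of steps 1-8 (placeholder functions).
def retrieve_target(video_frames, target_vehicle_map_S, match_threshold=0.5):
    for t, frm in enumerate(video_frames):                      # step 3: iterate over frames Frm(t)
        f_restrain_map = light_suppress(frm)                    # step 4: illumination suppression feature map
        boxes = detect_vehicles(f_restrain_map)                 # step 5: z vehicle detection result frames
        detected_maps = [crop(frm, box) for box in boxes]       # step 6: z detected vehicle maps
        scores = [match_features(d, target_vehicle_map_S)       # step 7: multi-feature joint matching
                  for d in detected_maps]
        if scores and max(scores) >= match_threshold:
            best = scores.index(max(scores))
            return t, boxes[best]                               # position of the target vehicle in Frm(t)
    return None                                                 # step 8: target not found in any frame
```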
Preferably, step 4 specifically comprises:
step 4.1, constructing a light suppression model;
the light suppression model is a dual-branch network comprising a learning branch network and a suppression branch network; the learning branch network comprises, connected in series, a convolution layer conv1, a shallow feature selection layer f1() and a deep feature selection layer f2(); the suppression branch network comprises, connected in series, a convolution layer conv1′, a shallow feature selection layer f′1() and a deep feature selection layer f′2();
Step 4.2, obtaining a group of training sample pairs;
each group of training sample pairs comprises a normal light image I and an over-bright light image I' under the visual angle of the unmanned aerial vehicle; the light over-bright image I' is obtained by randomly adding a brightness value to the light normal image I; the a groups of training sample pairs are respectively expressed as: (I)1,I′1),(I2,I′2),...,(Ia,I′a);
step 4.3, perform off-line training on the light suppression model constructed in step 4.1 using the a groups of training sample pairs, with the following objective function:
wherein:
Loss_light-suppression represents the light suppression loss function;
argmin() represents the value of the variables at which the objective function takes its minimum;
f′1(I′_j) represents the shallow feature value output after the over-bright image I′_j is input into the shallow feature selection layer f′1();
f′2(I′_j) represents the deep feature value output after the over-bright image I′_j is input into the deep feature selection layer f′2();
f1(I_j) represents the shallow feature value output after the normal-light image I_j is input into the shallow feature selection layer f1();
f2(I_j) represents the deep feature value output after the normal-light image I_j is input into the deep feature selection layer f2();
gamma represents a penalty coefficient that is set manually to control the influence of the deep-feature term on the light suppression loss function; the larger its value, the greater that influence;
4.4, the sensitivity of the suppression branch network to the brightness characteristics is weakened by performing off-line training on the light suppression model, so that the suppression branch network can perform illumination characteristic suppression on the image with overhigh brightness, which is shot by the unmanned aerial vehicle, and the significance of the detailed characteristics of the vehicle under the view angle of the unmanned aerial vehicle is improved;
Therefore, the image frame Frm(t) is input into the suppression branch network of the trained light suppression model to obtain the illumination suppression feature map FRestrainMap.
Preferably, in step 5.6, the two vehicle detection result frames are combined into one vehicle detection result frame, specifically:
let the two vehicle detection result frames to be merged be the vehicle detection result frame BoxSmall(1) and the vehicle detection result frame BoxLarge(1); the merged vehicle detection result frame is denoted Box(1); then:
the center point of Box(1) is the midpoint of the line connecting the center points of BoxSmall(1) and BoxLarge(1);
the height of Box(1) is the average of the heights of BoxSmall(1) and BoxLarge(1);
the width of Box(1) is the average of the widths of BoxSmall(1) and BoxLarge(1).
Preferably, in step 7, the multi-feature combined vehicle search network establishment method is as follows:
and establishing a multi-feature joint vehicle search network by taking the vehicle color feature and the vehicle type feature as vehicle global features and taking the vehicle side view, the vehicle front view, the vehicle rear view, the vehicle top view and the non-vehicle view as vehicle local features.
Preferably, step 7 specifically comprises:
step 7.1, constructing a multi-feature united vehicle search network; the multi-feature combined vehicle search network comprises a global feature identification module and a local feature matching module;
step 7.2, inputting the z detected vehicle images and the target vehicle image S into a global feature recognition module respectively, and obtaining z' suspected vehicle images with the same color and the same vehicle type as the target vehicle image S by adopting the following method;
the global feature identification module comprises a shared feature layer, a vehicle color feature layer and a vehicle type feature layer;
step 7.2.1, identifying the color characteristics of the target vehicle map S, comprising the steps of:
step 7.2.1.1, inputting the target vehicle map S into the shared characteristic layer to obtain a shared characteristic map FShrMap;
step 7.2.1.2, input the shared feature map FShrMap into the vehicle color feature layer to obtain the vehicle color feature vector VColor; wherein the vehicle color feature layer comprises conv4Color, a max pooling layer Maxpool and a fully connected layer FCColor;
step 7.2.1.3, multiply the vehicle color feature vector VColor with the shared feature map FShrMap in matrix-broadcast fashion to obtain the color-sensitive feature map FColorMap;
step 7.2.1.4, using the color-sensitive feature map FColorMap as a convolution kernel, cross-convolve the target vehicle map S to obtain the color-feature-enhanced map S′Color, enhancing the response of the target vehicle map S to color features;
step 7.2.1.5, input the color-feature-enhanced map S′Color sequentially into the shared feature layer, Conv4Color, Conv5Color, the max pooling layer and the fully connected layer, and obtain the color class of the target vehicle map S through a non-maximum suppression algorithm;
step 7.2.2, obtaining the vehicle type of the target vehicle map S by adopting the same method, and further obtaining the color type and the vehicle type of each detected vehicle map;
step 7.2.3, judging whether a detected vehicle image with the same color and the same vehicle type as the target vehicle image S exists in the z detected vehicle images, and if not, directly retrieving the next frame of image;
if yes, extract all detected vehicle maps with the same color and the same vehicle type as the target vehicle map S; assuming z′ are extracted in total, the extracted z′ detected vehicle maps are called suspected vehicle maps and denoted: suspected vehicle map D_c, where c = 1...z′;
step 7.3, input the target vehicle map S and each suspected vehicle map D_c into a local feature matching module; the local feature matching module uses a matching algorithm to obtain the vehicle mean vector matrix V_s of the target vehicle map S;
the local feature matching module uses the same matching algorithm to obtain the suspected-vehicle mean vector matrix V_c of each suspected vehicle map D_c;
wherein the local feature matching module comprises a feature extraction layer, a feature sparse convolution layer Conv6 and a fully connected layer FCsight;
the local feature matching module performs feature matching on the target vehicle map S to obtain the vehicle mean vector matrix V_s of the target vehicle map S as follows:
step 7.3.1, performing grid segmentation on the target vehicle map S through 4-by-4 grids to obtain 16 vehicle sub-block maps;
step 7.3.2, respectively inputting each vehicle sub-block map into the feature extraction layer to obtain corresponding vehicle sub-block feature maps FsubMap(m),m=1...16;
Step 7.3.3, each vehicle sub-block feature map FsubMap (m) is input to the feature sparse convolution layer Conv6 to obtain the corresponding sparse feature map FsparseMap(m);
step 7.3.4, determine the view-angle class of the vehicle sub-block map:
input each sparse feature map FsparseMap(m) into the fully connected layer FCsight and obtain the view-angle class of the vehicle sub-block map through non-maximum suppression; the view-angle classes comprise five categories: side view, front view, rear view, top view and non-vehicle view;
step 7.3.5, determining the view angle vector of the view angle category of the vehicle sub-block diagram:
if the view-angle class is side view, front view, rear view or top view, extract the features of the sparse feature map FsparseMap(m) and reshape them into a one-dimensional feature vector, which is used as the view-angle vector corresponding to that vehicle sub-block map; the view-angle vectors are divided by view-angle class into: side view vectors, front view vectors, rear view vectors and top view vectors;
if the visual angle category is a non-vehicle view, discarding;
step 7.3.6, determine the view mean vector for each view category:
obtaining the visual angle vector mean value of each vehicle sub-block image of the same visual angle category in the target vehicle image S, and respectively obtaining a side visual angle mean value vector, a front visual angle mean value vector, a rear visual angle mean value vector and a top visual angle mean value vector;
if a certain visual angle type does not exist, the visual angle mean vector does not exist, and all elements of the visual angle mean vector are set to be 0;
thus, the view-angle mean vectors V_cl of the four view-angle classes are obtained, where cl = 1, 2, 3, 4 denotes the side view mean vector V_1, the front view mean vector V_2, the rear view mean vector V_3 and the top view mean vector V_4; the view-angle mean vectors V_cl of the four view-angle classes constitute the vehicle mean vector matrix V_s of the target vehicle map S;
correspondingly, the suspected-vehicle mean vectors V′_cl of the four view-angle classes are obtained for each suspected vehicle map D_c, constituting the suspected-vehicle mean vector matrix V_c of the suspected vehicle map D_c;
step 7.4, count the number Num_c of view-angle mean vectors of the view-angle classes shared by the target vehicle map S and each suspected vehicle map D_c, and use the following formula to obtain the feature matching value Match corresponding to each suspected vehicle map D_c;
wherein, lambda is the weight of the number of the visual angle mean vectors; t represents transposition; tr represents the trace of the matrix and represents the sum of the main diagonal elements of the matrix;
step 7.5, when the feature matching values Match of several suspected vehicle maps D_c are higher than the threshold, determine the suspected vehicle map of the target vehicle from these suspected vehicle maps D_c by non-maximum suppression; the position of that suspected vehicle map in the image frame Frm(t) is the position of the target vehicle in the image frame Frm(t);
when the feature matching values Match between the target vehicle map S and all suspected vehicle maps D_c are lower than the threshold, the target vehicle is not contained in the image frame Frm(t).
Preferably, in step 7.3.3, so that the sparse feature map fully expresses the features in the vehicle sub-block feature map FsubMap(m) and information loss during compression is reduced, the compression loss function Loss_sparse is adopted in training:
Loss_sparse = Min( FsubMap(m) − FsparseMap(m) * W_Tran )
in the formula:
W_Tran is the upsampling weight obtained by deconvolution.
The unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching provided by the invention has the following advantages:
the invention provides an unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching, which is suitable for videos shot by unmanned aerial vehicles in different complex scenes, eliminates the influences of insufficient vehicle detail information caused by illumination and target size change of the unmanned aerial vehicles at different heights to the maximum extent, solves the problem that vehicles to be queried are difficult to find in numerous targets, and can more accurately retrieve the vehicles to be queried.
Drawings
Fig. 1 is a schematic flow chart of an unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching according to the present invention;
FIG. 2 is a structural diagram of a light-suppressing model;
FIG. 3 is a structural diagram of each feature selection layer f;
FIG. 4 is a block diagram of a right-hand branched network;
FIG. 5 is a diagram of a vehicle multi-dimensional feature probability identification network.
Detailed Description
In order to make the technical problems, technical solutions and advantageous effects solved by the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides an unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching, which mainly comprises the following steps: constructing and training a light suppression model to generate an illumination suppression characteristic diagram; constructing a multi-scale vehicle detection module with enhanced characteristics, and acquiring all vehicle detection result frames in a current frame; respectively intercepting images in each vehicle detection result frame in an image frame Frm (t) to obtain z detection vehicle images; inputting each detected vehicle map and the target vehicle map S into a multi-feature united vehicle search network for feature matching to obtain a detected vehicle map of the target vehicle; the position of the detected vehicle map in the image frame frm (t) is the position of the target vehicle in the image frame frm (t), thereby completing the retrieval and positioning of the target vehicle. The invention provides an unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching, which is suitable for videos shot by unmanned aerial vehicles in different complex scenes, eliminates the influences of insufficient vehicle detail information caused by illumination and target size change of the unmanned aerial vehicles at different heights to the maximum extent, solves the problem that vehicles to be queried are difficult to find in numerous targets, and can more accurately retrieve the vehicles to be queried.
The invention provides an unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching, which comprises the following steps with reference to fig. 1:
step 1, determining a target vehicle map S to be retrieved;
recording the current image frame as Frm (t), wherein t is the frame number of the current image frame, and judging whether the image frame Frm (t) contains a target vehicle map S to be searched by adopting the following steps 4-8:
step 4, inputting the image frame frm (t) into a trained light suppression model, and performing feature extraction and light suppression processing to obtain an illumination suppression feature map comprising n image layers, which is marked as FRestrainMap;
Specifically, unmanned aerial vehicle video data often suffer from excessive image brightness because the light is strong during shooting, which makes it difficult for vehicle retrieval methods to extract effective information from the video images and leads to missed detections. Therefore, a light suppression model is adopted and trained with image pairs consisting of a normal-light image and an over-bright image, so that the light suppression model can suppress the illumination features of over-bright images and improve detection accuracy when the light is too strong.
The step 4 specifically comprises the following steps:
step 4.1, constructing a light suppression model;
as shown in fig. 2, the structure of the light suppression model; the light suppression model is a dual-branch network comprising a learning branch network and a suppression branch network; the learning branch network comprises, connected in series, a convolution layer conv1, a shallow feature selection layer f1() and a deep feature selection layer f2(); the suppression branch network comprises, connected in series, a convolution layer conv1′, a shallow feature selection layer f′1() and a deep feature selection layer f′2();
As a specific implementation manner, when initially untrained, the network structures of the learning branch network and the suppression branch network are the same, and the branch network structure of each side is shown in the following table:
table 1: convolution kernel parameter of light suppression model backbone network
The structure of each feature selection layer f is shown in fig. 3 and includes three 1 × 1 convolution kernels, two 3 × 3 convolution kernels and two max pooling layers (Maxpool).
step 4.2, obtain a groups of training sample pairs;
each group of training sample pairs comprises a normal-light image I and an over-bright image I′ under the unmanned aerial vehicle viewing angle; the over-bright image I′ is obtained by randomly adding a brightness value to the normal-light image I; the a groups of training sample pairs are respectively expressed as: (I_1, I′_1), (I_2, I′_2), ..., (I_a, I′_a);
step 4.3, perform off-line training on the light suppression model constructed in step 4.1 using the a groups of training sample pairs, with the following objective function:
wherein:
Loss_light-suppression represents the light suppression loss function;
argmin() represents the value of the variables at which the objective function takes its minimum;
f′1(I′_j) represents the shallow feature value output after the over-bright image I′_j is input into the shallow feature selection layer f′1();
f′2(I′_j) represents the deep feature value output after the over-bright image I′_j is input into the deep feature selection layer f′2();
f1(I_j) represents the shallow feature value output after the normal-light image I_j is input into the shallow feature selection layer f1();
f2(I_j) represents the deep feature value output after the normal-light image I_j is input into the deep feature selection layer f2();
gamma represents a penalty coefficient that is set manually to control the influence of the deep-feature term on the light suppression loss function; the larger its value, the greater that influence;
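The objective function of step 4.3 is given in the original only as a figure; the following PyTorch-style sketch shows one plausible form consistent with the symbol definitions above, using an L1 distance between branch features (the exact distance measure and reduction are assumptions).

```python
import torch

def light_suppression_loss(f1_I, f2_I, f1p_Ip, f2p_Ip, gamma=1.0):
    """Hedged sketch of the step-4.3 objective: the suppression branch is trained so that
    its shallow/deep features of the over-bright image I' approach the learning branch's
    features of the normal-light image I. The L1 distance is an assumption."""
    shallow_term = torch.mean(torch.abs(f1p_Ip - f1_I))   # || f'1(I') - f1(I) ||
    deep_term = torch.mean(torch.abs(f2p_Ip - f2_I))      # || f'2(I') - f2(I) ||
    return shallow_term + gamma * deep_term               # gamma weights the deep-feature penalty
```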
4.4, the sensitivity of the suppression branch network to the brightness characteristics is weakened by performing off-line training on the light suppression model, so that the suppression branch network can perform illumination characteristic suppression on the image with overhigh brightness, which is shot by the unmanned aerial vehicle, and the significance of the detail characteristics of the vehicle at the view angle of the unmanned aerial vehicle is improved;
therefore, the image frame Frm(t) is input into the suppression branch network of the trained light suppression model to obtain the illumination suppression feature map FRestrainMap.
To sum up, in order to make the light suppression model have light suppression capability, the light suppression model adopts a dual-branch network, and the branch network on each side includes: a convolution layer and two feature selection layers; because there are many objects in the unmanned aerial vehicle video, the light feature suppression is respectively carried out on the shallow feature and the deep feature of the image by adopting two feature selection layers, the network learning capability is enhanced, and the suppression effect is improved.
During online detection, only the suppression branch network of the light suppression model is used; its structure is shown in fig. 4.
As shown in fig. 3, the suppression branch network processes the image frame Frm(t) as follows:
1) image frame Frm (t) passes through conv1 layer to obtain low-dimensional feature map FLowMap; the object is to fuse the feature information of different layers of an input image and improve the significance of object features.
2) input the low-dimensional feature map FLowMap into the shallow feature selection layer f1(); after passing through conv2_1 and conv2_2, the low-dimensional feature map FLowMap outputs, on the one hand, the mid-dimensional feature map FMidMap; on the other hand, it continues through conv2_3 and conv2_4 to output the high-dimensional feature map FhighMap;
3) apply 3 × 3 max pooling to the low-dimensional feature map FLowMap to obtain the feature map F′LowMap; apply 2 × 2 max pooling to the mid-dimensional feature map FMidMap to obtain the feature map F′MidMap;
wherein, through this step, the low-dimensional feature map FLowMap and the mid-dimensional feature map FMidMap are scaled so that the processed feature maps F′LowMap and F′MidMap have the same size as the high-dimensional feature map FhighMap;
4) concatenate the feature maps F′LowMap, F′MidMap and the high-dimensional feature map FhighMap, and after convolution by conv2_5 output the multi-dimensional feature map FMultiMap;
Specifically, because the onboard computing capability of the unmanned aerial vehicle is weak and the vehicle targets in the captured video images are small, the multi-dimensional feature map fusion not only reduces the amount of computation but also improves the utilization of features of different dimensions, obtaining more object feature information.
Wherein: the features of the shallow feature selection layer f1() are shallow, sensitive to the texture and shape features of objects, and able to suppress the brightness of most regions in the image; however, because the field of view of the unmanned aerial vehicle is wide, the image also contains the texture and geometric features of many non-object regions, and these features interfere with brightness suppression. Therefore, after the shallow feature selection layer f1(), a deep feature selection layer f2() is needed for further feature processing.
5) take the multi-dimensional feature map FMultiMap as the new low-dimensional feature map FLowMap, input it into the deep feature selection layer f2(), and repeat the above steps to obtain the illumination suppression feature map FRestrainMap.
As the network depth increases, the deep features of the deep feature selection layer f2() become more sensitive to semantic features, effectively suppress the interference caused by non-object textures and geometric features, and make up for the shortcomings of the shallow feature selection layer f1().
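As a hedged illustration of how the suppression branch described in steps 1)-5) could be assembled, the sketch below uses PyTorch; channel widths, strides and the use of adaptive pooling to equalize feature-map sizes are assumptions, since Table 1 is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureSelectionLayer(nn.Module):
    """Sketch of one feature selection layer f(): three 1x1 convs, two 3x3 convs and
    two poolings, following steps 1)-4); channel widths and strides are assumptions."""
    def __init__(self, c_in, c_mid):
        super().__init__()
        self.conv2_1 = nn.Conv2d(c_in, c_mid, 1)
        self.conv2_2 = nn.Conv2d(c_mid, c_mid, 3, stride=2, padding=1)
        self.conv2_3 = nn.Conv2d(c_mid, c_mid, 1)
        self.conv2_4 = nn.Conv2d(c_mid, c_mid, 3, stride=2, padding=1)
        self.conv2_5 = nn.Conv2d(c_in + 2 * c_mid, c_mid, 1)

    def forward(self, f_low):
        f_mid = self.conv2_2(self.conv2_1(f_low))                  # mid-dimensional map FMidMap
        f_high = self.conv2_4(self.conv2_3(f_mid))                 # high-dimensional map FhighMap
        # adaptive pooling stands in for the fixed 3x3 / 2x2 max pooling of step 3)
        f_low_p = F.adaptive_max_pool2d(f_low, f_high.shape[-2:])  # F'LowMap, scaled to FhighMap size
        f_mid_p = F.adaptive_max_pool2d(f_mid, f_high.shape[-2:])  # F'MidMap, scaled to FhighMap size
        fused = torch.cat([f_low_p, f_mid_p, f_high], dim=1)
        return self.conv2_5(fused)                                 # multi-dimensional map FMultiMap

class SuppressionBranch(nn.Module):
    """conv1' followed by the shallow and deep feature selection layers f'1(), f'2()."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, 3, padding=1)   # conv1'
        self.f1 = FeatureSelectionLayer(32, 64)       # shallow feature selection layer f'1()
        self.f2 = FeatureSelectionLayer(64, 128)      # deep feature selection layer f'2()

    def forward(self, frame):
        f_low = self.conv1(frame)                     # low-dimensional feature map FLowMap
        shallow = self.f1(f_low)                      # FMultiMap reused as the new FLowMap
        return self.f2(shallow)                       # illumination suppression map FRestrainMap
```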
Step 5, the illumination inhibition characteristic diagram FRestrainMap is input into a multi-scale vehicle detection module with enhanced features, and z vehicle detection result frames in an image frame frm (t) are acquired:
in particular, a large number of objects with similar appearances exist in the video image of the unmanned aerial vehicle, such as roadside rectangular electric boxes, long-shaped sunshade umbrellas and the like. In order to make the vehicle more prominent, the invention provides a multi-scale vehicle detection module with enhanced features, thereby being more beneficial to extracting the vehicle.
Meanwhile, the size of the vehicles in the unmanned aerial vehicle video changes with the flight height of the unmanned aerial vehicle: when the flight height is low, the vehicles appear large in the video image, and a shallow network produces missed and false detections because its receptive field is insufficient; when the flight height is too high, the vehicles appear too small, and a deep network loses information through excessive convolution, also causing missed detections.
step 5.1, the illumination suppression feature map FRestrainMap has n layers, each denoted layer_i, i = 1...n; for each layer, execute steps 5.1.1 to 5.1.3 to obtain the dependence weight w″_i of layer_i:
step 5.1.1, calculate the average value of all pixel points in layer_i as the initial weight w_i of layer_i;
step 5.1.2, input the initial weight w_i of layer_i into a fully connected layer and map it to the (0, 1) feature space through a sigmoid activation function, thereby outputting the normalized weight w′_i of layer_i;
step 5.1.3, establish a piecewise function and apply piecewise suppression or enhancement to the normalized weight w′_i of layer_i to obtain the dependence weight w″_i of layer_i:
Wherein:
epsilon represents a system constant and is used for adjusting the influence degree of the dependence weight value on the layer;
step 5.2, obtain the dependence weights of the n layers of the illumination suppression feature map FRestrainMap, respectively: w″_1 ... w″_n;
combine w″_1 ... w″_n to obtain a 1 × n dependence weight vector W″ of the illumination suppression feature map FRestrainMap;
using the dependence weight vector W″ as a convolution kernel, convolve the illumination suppression feature map FRestrainMap to obtain the layer-enhanced feature map FEhcMap;
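A minimal PyTorch sketch of steps 5.1-5.2 follows; the piecewise suppress/enhance rule and its thresholds are assumed for illustration (the patent's piecewise function and the constant ε appear only in a figure), and the dependence vector W″ is applied here as a per-layer (depthwise 1 × 1) reweighting of FRestrainMap.

```python
import torch

def layer_enhance(f_restrain_map, fc, eps=0.5, low=0.3, high=0.7):
    """f_restrain_map: (B, n, H, W). `fc` is a torch.nn.Linear(1, 1) producing the
    normalized weight; thresholds `low`/`high` and the enhance/suppress factors below
    are assumed examples of the piecewise function of step 5.1.3."""
    n = f_restrain_map.shape[1]
    w = f_restrain_map.mean(dim=(0, 2, 3))                      # step 5.1.1: per-layer mean -> w_i
    w_norm = torch.sigmoid(fc(w.unsqueeze(1))).squeeze(1)       # step 5.1.2: FC + sigmoid -> w'_i in (0, 1)
    w_dep = torch.where(w_norm > high, w_norm * (1 + eps),      # step 5.1.3: enhance strong layers,
            torch.where(w_norm < low, w_norm * (1 - eps),       #             suppress weak layers,
                        w_norm))                                #             keep mid-range layers
    return f_restrain_map * w_dep.view(1, n, 1, 1)              # W'' as per-layer 1x1 weighting -> FEhcMap

# usage: f_ehc_map = layer_enhance(f_restrain_map, torch.nn.Linear(1, 1))
```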
step 5.3, input the layer-enhanced feature map FEhcMap into the small-target response layer to obtain the small-target salient feature map FSmallMap;
wherein the small-target response layer may adopt a 1 × 1 convolution layer; the purpose of the 1 × 1 convolution layer is to reduce the feature map depth and improve the success rate of small-target detection.
wherein the small-target salient feature map FSmallMap contains more vehicle detail information, which improves the success rate of small-target vehicle detection when the unmanned aerial vehicle flies at a higher altitude;
step 5.4, input the small-target salient feature map FSmallMap into the large-target response layer to obtain the large-target salient feature map FLargeMap;
wherein the large-target response layer may adopt two 3 × 3 convolution layers; the two 3 × 3 convolution layers enlarge the receptive field and improve the success rate of large-target detection, while requiring less computation than a single 5 × 5 convolution layer with the same receptive field.
wherein the large-target salient feature map FLargeMap contains more semantic information, which improves the accuracy of large-target vehicle detection when the unmanned aerial vehicle flies at a lower altitude;
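The two response layers of steps 5.3-5.4 can be sketched as follows; channel counts are illustrative assumptions.

```python
import torch.nn as nn

class ResponseLayers(nn.Module):
    """Sketch of steps 5.3-5.4: a 1x1 convolution as the small-target response layer,
    two 3x3 convolutions as the large-target response layer."""
    def __init__(self, c):
        super().__init__()
        self.small = nn.Conv2d(c, c // 2, kernel_size=1)        # reduces feature-map depth
        self.large = nn.Sequential(
            nn.Conv2d(c // 2, c // 2, 3, padding=1),            # two 3x3 convs enlarge the
            nn.Conv2d(c // 2, c // 2, 3, padding=1))            # receptive field

    def forward(self, f_ehc_map):
        f_small = self.small(f_ehc_map)   # small-target salient feature map FSmallMap
        f_large = self.large(f_small)     # large-target salient feature map FLargeMap
        return f_small, f_large
```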
step 5.5, input the small-target salient feature map FSmallMap into the result frame generation layer to obtain, in the image frame Frm(t), p small-target vehicle detection result frames BoxSmall(1)...BoxSmall(p);
input the large-target salient feature map FLargeMap into the result frame generation layer to obtain, in the image frame Frm(t), q large-target vehicle detection result frames BoxLarge(1)...BoxLarge(q);
The specific method comprises the following steps:
step 5.5.1, take each pixel point in the small-target salient feature map FSmallMap as an anchor point and generate several candidate frames of different sizes centered on each anchor point; thus, candidate frames are obtained for all pixel points in the small-target salient feature map FSmallMap;
for example, 6 candidate boxes with different sizes are generated by taking each anchor point as a center; when small target detection is performed, 3 candidate frames with the area of 8 and 3 candidate frames with the area of 16 can be generated according to the length-width ratio of 1: 1, 1: 2 and 2: 1.
When large object detection is performed, 3 candidate frames with an area of 32 and 3 candidate frames with an area of 64 may be generated in a ratio of length to width of 1: 1, 1: 2, 2: 1.
The length-width ratio of 1: 1, 1: 2 and 2: 1 is set according to the length-width ratio of the vehicle in the unmanned aerial vehicle window.
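A sketch of the candidate-frame generation of step 5.5.1 under the example parameters above; interpreting "area" literally as width × height is an assumption.

```python
import numpy as np

def generate_candidate_boxes(anchor_x, anchor_y, areas=(8, 16), ratios=(1.0, 0.5, 2.0)):
    """For one anchor point: 3 aspect ratios (1:1, 1:2, 2:1) at each of two areas give
    6 candidate boxes, returned as (x_center, y_center, width, height)."""
    boxes = []
    for area in areas:
        for ratio in ratios:                  # ratio = width / height
            w = np.sqrt(area * ratio)
            h = np.sqrt(area / ratio)
            boxes.append((anchor_x, anchor_y, w, h))
    return boxes
```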
Step 5.5.2, calculating to obtain the vehicle probability value of each candidate box;
for example, 1-by-1 convolution layer is adopted to reshape the candidate frame into a 1-dimensional vector, and then the vehicle probability value of the candidate frame is calculated by using a sigmoid function.
step 5.5.3, screen the candidate frames and remove those whose vehicle probability value is lower than a preset threshold (for example, a threshold of 0.6), obtaining candidate frames A_1, A_2...A_p; wherein p represents the number of candidate frames;
step 5.5.4, calculate the regression parameters of each candidate frame in A_1, A_2...A_p; each candidate frame has the following regression parameters: width, height, and anchor point offset;
step 5.5.5, map the anchor point coordinates of each candidate frame in A_1, A_2...A_p and the corresponding regression parameters back to the image frame Frm(t), thereby obtaining p small-target vehicle detection result frames BoxSmall(1)...BoxSmall(p) in the image frame Frm(t);
step 5.5.6, replace the small-target salient feature map FSmallMap in step 5.5.1 with the large-target salient feature map FLargeMap, increase the initial generation size of the candidate frames in step 5.5.1, and apply the method of steps 5.5.1-5.5.5 to obtain q large-target vehicle detection result frames BoxLarge(1)...BoxLarge(q) in the image frame Frm(t);
step 5.6, the p small-target vehicle detection result frames BoxSmall(1)...BoxSmall(p) and the q large-target vehicle detection result frames BoxLarge(1)...BoxLarge(q) in the image frame Frm(t) are collectively referred to as the p + q vehicle detection result frames;
for the p + q vehicle detection result frames obtained in the image frame Frm(t), calculate the similarity coefficient between any two vehicle detection result frames; if the similarity coefficient is smaller than a set threshold, do nothing; if the similarity coefficient is larger than the set threshold, merge the two vehicle detection result frames into one vehicle detection result frame, finally obtaining z vehicle detection result frames, denoted Box(1)...Box(z);
for example, if the Jaccard similarity coefficient Ja > 0.8 between the two candidate frames, the merge operation is performed.
Assume the two candidate frames are BoxSmall(1) and BoxLarge(1); the Jaccard similarity coefficient is then the ratio of the area of their intersection to the area of their union:
Ja(BoxSmall(1), BoxLarge(1)) = Area(BoxSmall(1) ∩ BoxLarge(1)) / Area(BoxSmall(1) ∪ BoxLarge(1))
in step 5.6, the two vehicle detection result frames are combined into one vehicle detection result frame, which specifically comprises the following steps:
let the two vehicle detection result frames to be merged be the vehicle detection result frame BoxSmall(1) and the vehicle detection result frame BoxLarge(1); the merged vehicle detection result frame is denoted Box(1); then:
the center point of Box(1) is the midpoint of the line connecting the center points of BoxSmall(1) and BoxLarge(1);
the height of Box(1) is the average of the heights of BoxSmall(1) and BoxLarge(1);
the width of Box(1) is the average of the widths of BoxSmall(1) and BoxLarge(1).
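The Jaccard-based merging of step 5.6 can be sketched as follows for boxes given as (x1, y1, x2, y2) corners; the Jaccard formula in the original appears as a figure, so the standard intersection-over-union form is used here.

```python
def jaccard(box_a, box_b):
    """Standard Jaccard (intersection-over-union) coefficient of two corner-format boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def merge_boxes(box_small, box_large):
    """Merge rule of step 5.6: midpoint of the two centers, average width and height."""
    cx = ((box_small[0] + box_small[2]) / 2 + (box_large[0] + box_large[2]) / 2) / 2
    cy = ((box_small[1] + box_small[3]) / 2 + (box_large[1] + box_large[3]) / 2) / 2
    w = ((box_small[2] - box_small[0]) + (box_large[2] - box_large[0])) / 2
    h = ((box_small[3] - box_small[1]) + (box_large[3] - box_large[1])) / 2
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```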
Step 6, respectively intercepting images in each vehicle detection result frame in an image frame Frm (t) to obtain z detected vehicle images;
step 7, inputting each detected vehicle map and the target vehicle map S into a multi-feature united vehicle search network for feature matching to obtain a detected vehicle map of the target vehicle; the position of the detected vehicle map in the image frame frm (t) is the position of the target vehicle in the image frame frm (t), so that the retrieval and positioning of the target vehicle are completed;
in step 7, the multi-feature united vehicle search network establishment method is as follows:
and establishing a multi-feature joint vehicle search network by taking the vehicle color feature and the vehicle type feature as vehicle global features and taking the vehicle side view, the vehicle front view, the vehicle rear view, the vehicle top view and the non-vehicle view as vehicle local features.
The step 7 specifically comprises the following steps:
step 7.1, constructing a multi-feature united vehicle search network; the multi-feature combined vehicle search network comprises a global feature identification module and a local feature matching module;
step 7.2, inputting the z detected vehicle images and the target vehicle image S into a global feature recognition module respectively, and obtaining z' suspected vehicle images with the same color and the same vehicle type as the target vehicle image S by adopting the following method;
the global feature identification module comprises a shared feature layer, a vehicle color feature layer and a vehicle type feature layer;
step 7.2.1, identifying the color characteristics of the target vehicle map S, comprising the steps of:
step 7.2.1.1, inputting the target vehicle map S into the shared characteristic layer to obtain a shared characteristic map FShrMap;
step 7.2.1.2, input the shared feature map FShrMap into the vehicle color feature layer to obtain the vehicle color feature vector VColor; wherein the vehicle color feature layer comprises conv4Color, a max pooling layer Maxpool and a fully connected layer FCColor;
step 7.2.1.3, multiply the vehicle color feature vector VColor with the shared feature map FShrMap in matrix-broadcast fashion to obtain the color-sensitive feature map FColorMap;
step 7.2.1.4, using the color-sensitive feature map FColorMap as a convolution kernel, cross-convolve the target vehicle map S to obtain the color-feature-enhanced map S′Color, enhancing the response of the target vehicle map S to color features;
step 7.2.1.5, input the color-feature-enhanced map S′Color sequentially into the shared feature layer, Conv4Color, Conv5Color, the max pooling layer and the fully connected layer, and obtain the color class of the target vehicle map S through a non-maximum suppression algorithm;
step 7.2.2, obtaining the vehicle type of the target vehicle map S by adopting the same method, and further obtaining the color type and the vehicle type of each detected vehicle map;
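The color branch of steps 7.2.1.3-7.2.1.4 (and, analogously, the vehicle-type branch) can be sketched as below; how the channels of FColorMap are reconciled with the 3-channel target vehicle map S for the cross-convolution is not specified in the text, so the channel-averaging step is an assumption.

```python
import torch
import torch.nn.functional as F

def color_feature_enhance(S, f_shr_map, v_color):
    """S: (1, 3, H, W) target vehicle map; f_shr_map: (1, C, h, w) shared feature map;
    v_color: (C,) vehicle color feature vector. Returns S'Color as a single-channel map."""
    f_color_map = f_shr_map * v_color.view(1, -1, 1, 1)         # matrix-broadcast multiply -> FColorMap
    kernel = f_color_map.mean(dim=1, keepdim=True)              # collapse channels (assumed)
    kernel = kernel.expand(1, S.shape[1], -1, -1).contiguous()  # one kernel slice per channel of S
    pad = (kernel.shape[-2] // 2, kernel.shape[-1] // 2)
    return F.conv2d(S, kernel, padding=pad)                     # cross-convolution -> S'Color
```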
step 7.2.3, judging whether a detected vehicle image with the same color and the same vehicle type as the target vehicle image S exists in the z detected vehicle images, and if not, directly retrieving the next frame of image;
if yes, extract all detected vehicle maps with the same color and the same vehicle type as the target vehicle map S; assuming z′ are extracted in total, the extracted z′ detected vehicle maps are called suspected vehicle maps and denoted: suspected vehicle map D_c, where c = 1...z′;
step 7.3, input the target vehicle map S and each suspected vehicle map D_c into a local feature matching module; the local feature matching module uses a matching algorithm to obtain the vehicle mean vector matrix V_s of the target vehicle map S;
the local feature matching module uses the same matching algorithm to obtain the suspected-vehicle mean vector matrix V_c of each suspected vehicle map D_c;
wherein the local feature matching module comprises a feature extraction layer, a feature sparse convolution layer Conv6 and a fully connected layer FCsight;
the local feature matching module performs feature matching on the target vehicle map S to obtain the vehicle mean vector matrix V_s of the target vehicle map S as follows:
step 7.3.1, performing grid segmentation on the target vehicle map S through 4-by-4 grids to obtain 16 vehicle sub-block maps;
step 7.3.2, respectively inputting each vehicle sub-block map into the feature extraction layer to obtain corresponding vehicle sub-block feature maps FsubMap(m),m=1...16;
Step 7.3.3, each vehicle sub-block feature map FsubMap (m) is input to the feature sparse convolution layer Conv6 to obtain the corresponding sparse feature map FsparseMap(m);
In step 7.3.3, so that the sparse feature map fully expresses the features in the vehicle sub-block feature map FsubMap(m) and information loss during compression is reduced, the compression loss function Loss_sparse is adopted in training:
Loss_sparse = Min( FsubMap(m) − FsparseMap(m) * W_Tran )
in the formula:
W_Tran is the upsampling weight obtained by deconvolution.
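A hedged PyTorch sketch of this compression loss: Conv6 compresses FsubMap(m), a transposed convolution whose weight plays the role of W_Tran upsamples the result, and the reconstruction residual is minimized; channel counts, kernel sizes and the mean-absolute residual are assumptions.

```python
import torch
import torch.nn as nn

class SparseCompression(nn.Module):
    """Sketch of the step-7.3.3 compression: Conv6 produces FsparseMap(m), and the
    reconstruction FsparseMap(m) * W_Tran is compared against FsubMap(m)."""
    def __init__(self, c_in=256, c_sparse=64):
        super().__init__()
        self.conv6 = nn.Conv2d(c_in, c_sparse, 3, stride=2, padding=1)   # feature sparse conv layer Conv6
        self.deconv = nn.ConvTranspose2d(c_sparse, c_in, 2, stride=2)    # upsampling weight W_Tran

    def forward(self, f_sub_map):
        f_sparse_map = self.conv6(f_sub_map)                             # FsparseMap(m)
        recon = self.deconv(f_sparse_map)                                # FsparseMap(m) * W_Tran
        loss_sparse = torch.mean(torch.abs(f_sub_map - recon))           # Loss_sparse to be minimized
        return f_sparse_map, loss_sparse
```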
step 7.3.4, determine the view-angle class of the vehicle sub-block map:
input each sparse feature map FsparseMap(m) into the fully connected layer FCsight and obtain the view-angle class of the vehicle sub-block map through non-maximum suppression; the view-angle classes comprise five categories: side view, front view, rear view, top view and non-vehicle view;
step 7.3.5, determining the view angle vector of the view angle category of the vehicle sub-block diagram:
if the view-angle class is side view, front view, rear view or top view, extract the features of the sparse feature map FsparseMap(m) and reshape them into a one-dimensional feature vector, which is used as the view-angle vector corresponding to that vehicle sub-block map; the view-angle vectors are divided by view-angle class into: side view vectors, front view vectors, rear view vectors and top view vectors;
if the visual angle category is a non-vehicle view, discarding;
step 7.3.6, determine the view mean vector for each view category:
obtaining the visual angle vector mean value of each vehicle sub-block image of the same visual angle category in the target vehicle image S, and respectively obtaining a side visual angle mean value vector, a front visual angle mean value vector, a rear visual angle mean value vector and a top visual angle mean value vector;
if a certain visual angle type does not exist, the visual angle mean vector does not exist, and all elements of the visual angle mean vector are set to be 0;
thus, the view-angle mean vectors V_cl of the four view-angle classes are obtained, where cl = 1, 2, 3, 4 denotes the side view mean vector V_1, the front view mean vector V_2, the rear view mean vector V_3 and the top view mean vector V_4; the view-angle mean vectors V_cl of the four view-angle classes constitute the vehicle mean vector matrix V_s of the target vehicle map S;
correspondingly, the suspected-vehicle mean vectors V′_cl of the four view-angle classes are obtained for each suspected vehicle map D_c, constituting the suspected-vehicle mean vector matrix V_c of the suspected vehicle map D_c;
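The construction of the vehicle mean vector matrix V_s (and likewise V_c) in steps 7.3.5-7.3.6 can be sketched as follows; the row ordering (side, front, rear, top) follows the text, and `dim` is the length of a reshaped sparse feature vector.

```python
import numpy as np

SIDE, FRONT, REAR, TOP, NON_VEHICLE = 0, 1, 2, 3, 4

def vehicle_mean_vector_matrix(view_classes, view_vectors, dim):
    """Average the view-angle vectors of the 16 sub-blocks per view-angle class;
    a class with no sub-blocks contributes an all-zero row."""
    matrix = np.zeros((4, dim))
    for cl in (SIDE, FRONT, REAR, TOP):
        vecs = [v for c, v in zip(view_classes, view_vectors) if c == cl]   # non-vehicle blocks are discarded
        if vecs:
            matrix[cl] = np.mean(vecs, axis=0)                              # view-angle mean vector V_cl
    return matrix                                                           # vehicle mean vector matrix V_s
```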
step 7.4, count the number Num_c of view-angle mean vectors of the view-angle classes shared by the target vehicle map S and each suspected vehicle map D_c, and use the following formula to obtain the feature matching value Match corresponding to each suspected vehicle map D_c;
wherein, lambda is the weight of the number of the visual angle mean vectors; t represents transposition; tr represents the trace of the matrix and represents the sum of the main diagonal elements of the matrix;
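The Match formula itself appears only as a figure; the sketch below encodes one plausible reading consistent with the symbols listed above (λ, transpose, trace, Num_c) and should be treated as an assumption rather than the patent's exact expression.

```python
import numpy as np

def match_score(V_s, V_c, lam=0.1):
    """Assumed form: Match = lambda * Num_c + Tr(V_s @ V_c.T), where Num_c counts the
    view-angle classes present (non-zero rows) in both 4 x dim matrices."""
    common = np.logical_and(np.any(V_s != 0, axis=1), np.any(V_c != 0, axis=1))
    num_c = int(np.sum(common))                          # number of shared view-angle classes
    return lam * num_c + float(np.trace(V_s @ V_c.T))    # trace = sum of per-class dot products
```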
step 7.5, when the feature matching values Match of several suspected vehicle maps D_c are higher than the threshold, determine the suspected vehicle map of the target vehicle from these suspected vehicle maps D_c by non-maximum suppression; the position of that suspected vehicle map in the image frame Frm(t) is the position of the target vehicle in the image frame Frm(t);
when the feature matching values Match between the target vehicle map S and all suspected vehicle maps D_c are lower than the threshold, the target vehicle is not contained in the image frame Frm(t).
Step 8, if the matching degrees of all the detected vehicle maps and the target vehicle map S in the current image frame frm (t) are lower than the set threshold, that is, the target vehicle does not exist in the current image frame frm (t), the image frame Frm (t +1) at the next time is continuously retrieved.
The invention provides an unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching, which is suitable for videos shot by unmanned aerial vehicles in different complex scenes, eliminates the influences of insufficient vehicle detail information caused by illumination and target size change of the unmanned aerial vehicles at different heights to the maximum extent, solves the problem that vehicles to be queried are difficult to find in numerous targets, and can more accurately retrieve the vehicles to be queried.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and improvements can be made without departing from the principle of the present invention, and such modifications and improvements should also be considered within the scope of the present invention.
Claims (6)
1. An unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching is characterized by comprising the following steps:
step 1, determining a target vehicle map S to be retrieved;
step 2, the unmanned aerial vehicle films the ground to obtain unmanned aerial vehicle video data;
step 3, executing steps 4 to 8 on each image frame of the unmanned aerial vehicle video data to judge whether it contains the target vehicle map S to be retrieved:
the current image frame is recorded as Frm(t), where t is its frame number, and whether the image frame Frm(t) contains the target vehicle map S to be retrieved is judged by the following steps 4–8:
step 4, inputting the image frame Frm(t) into the trained light suppression model and performing feature extraction and light suppression processing to obtain an illumination suppression feature map comprising n layers, denoted F_RestrainMap;
the light suppression model is a dual-branch network comprising a learning branch network and a suppression branch network; the learning branch network comprises, connected in series, a convolution layer conv1, a shallow feature selection layer f_1() and a deep feature selection layer f_2(); the suppression branch network comprises, connected in series, a convolution layer conv1′, a shallow feature selection layer f′_1() and a deep feature selection layer f′_2();
step 5, inputting the illumination suppression feature map F_RestrainMap into the feature-enhanced multi-scale vehicle detection module to acquire z vehicle detection result frames in the image frame Frm(t):
step 5.1, the illumination suppression feature map F_RestrainMap has n layers, each denoted layer_i with i = 1, …, n; steps 5.1.1–5.1.3 are executed for each layer to obtain its dependence weight value w″_i:
step 5.1.1, calculating the average value of all pixels in layer_i and using it as the initial weight w_i of layer_i;
step 5.1.2, inputting the initial weight w_i of layer_i into a fully connected layer and mapping it to the (0, 1) feature space through a sigmoid activation function, thereby outputting the normalized weight value w′_i of layer_i;
step 5.1.3, establishing a piecewise function to perform piecewise suppression or enhancement on the normalized weight value w′_i of layer_i, obtaining the dependence weight value w″_i of layer_i:
Wherein:
ε denotes a system constant used to adjust the degree of influence of the dependence weight value on the layer;
step 5.2, obtaining the dependence weight values of the n layers of the illumination suppression feature map F_RestrainMap, namely w″_1 … w″_n;
combining w″_1 … w″_n to obtain the 1 × n dependence weight vector W″ of the illumination suppression feature map F_RestrainMap;
using the dependence weight vector W″ as a convolution kernel to convolve the illumination suppression feature map F_RestrainMap, obtaining the layer-enhanced feature map F_EhcMap;
step 5.3, inputting the layer-enhanced feature map F_EhcMap into the small-target response layer to obtain the small-target salient feature map F_SmallMap;
wherein the small-target salient feature map F_SmallMap contains more vehicle detail information, which improves the success rate of small-target vehicle detection when the flying height of the unmanned aerial vehicle is high;
step 5.4, inputting the small-target salient feature map F_SmallMap into the large-target response layer to obtain the large-target salient feature map F_LargeMap;
wherein the large-target salient feature map F_LargeMap contains more semantic information, which improves the accuracy of large-target vehicle detection when the flying height of the unmanned aerial vehicle is low;
step 5.5, inputting the small-target salient feature map F_SmallMap into the result-frame generation layer, so that p small-target vehicle detection result frames Box_Small(1) … Box_Small(p) are obtained in the image frame Frm(t);
inputting the large-target salient feature map F_LargeMap into the result-frame generation layer, so that q large-target vehicle detection result frames Box_Large(1) … Box_Large(q) are obtained in the image frame Frm(t);
The specific method comprises the following steps:
step 5.5.1, taking each pixel in the small-target salient feature map F_SmallMap as an anchor point and generating several candidate frames of different sizes centered on each anchor point; thus candidate frames are obtained for all pixels in F_SmallMap;
step 5.5.2, calculating the vehicle probability value of each candidate frame;
step 5.5.3, screening the candidate frames and removing those whose vehicle probability value is lower than a preset threshold, obtaining the candidate frames A_1, A_2 … A_p, where p denotes the number of remaining candidate frames;
step 5.5.4, calculating the regression parameters of each candidate frame among A_1, A_2 … A_p; each candidate frame has the following regression parameters: width, height and anchor-point offset;
step 5.5.5, mapping the anchor-point coordinates of each candidate frame A_1, A_2 … A_p and the corresponding regression parameters back to the image frame Frm(t), so that p small-target vehicle detection result frames Box_Small(1) … Box_Small(p) are obtained in Frm(t);
step 5.5.6, replacing the small-target salient feature map F_SmallMap in step 5.5.1 with the large-target salient feature map F_LargeMap, increasing the initial generation size of the candidate frames in step 5.5.1, and applying the method of steps 5.5.1–5.5.5 to obtain q large-target vehicle detection result frames Box_Large(1) … Box_Large(q) in the image frame Frm(t);
step 5.6, the p small-target vehicle detection result frames Box_Small(1) … Box_Small(p) and the q large-target vehicle detection result frames Box_Large(1) … Box_Large(q) in the image frame Frm(t) are collectively referred to as the p + q vehicle detection result frames;
for the p + q vehicle detection result frames obtained in the image frame Frm(t), calculating the similarity coefficient between any two result frames; if the similarity coefficient is smaller than a set threshold, no processing is performed; if it is larger than the set threshold, the two result frames are merged into one vehicle detection result frame, finally obtaining z vehicle detection result frames, denoted Box(1) … Box(z);
step 6, cropping the image inside each vehicle detection result frame in the image frame Frm(t) to obtain z detected vehicle maps;
step 7, inputting each detected vehicle map and the target vehicle map S into a multi-feature joint vehicle search network for feature matching to obtain the detected vehicle map of the target vehicle; the position of this detected vehicle map in the image frame Frm(t) is the position of the target vehicle in Frm(t), thereby completing the retrieval and positioning of the target vehicle;
the multi-feature joint vehicle search network comprises a global feature recognition module and a local feature matching module; the global feature recognition module comprises a shared feature layer, a vehicle color feature layer and a vehicle type feature layer; the local feature matching module comprises a feature extraction layer, a feature sparse convolution layer Conv6 and a fully connected layer FC_sight;
step 8, if the matching degrees between all detected vehicle maps in the current image frame Frm(t) and the target vehicle map S are lower than the set threshold, i.e. the target vehicle does not exist in Frm(t), retrieval continues with the image frame Frm(t+1) at the next time instant.
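The sketch below illustrates the per-layer dependence weighting of steps 5.1–5.2 in claim 1. The piecewise suppression/enhancement function is shown only as an image in the source, so the break points `low`/`high` and the ±ε adjustment are assumptions, and the fully connected mapping is reduced to a scalar affine transform for brevity.

```python
import numpy as np

def dependence_weights(f_restrain_map, fc_w=1.0, fc_b=0.0, eps=0.2, low=0.3, high=0.7):
    """Sketch of steps 5.1-5.2 for an n-layer feature map of shape (n, H, W).
    Returns the 1 x n dependence weight vector W'' and the reweighted map."""
    n = f_restrain_map.shape[0]
    w = f_restrain_map.reshape(n, -1).mean(axis=1)            # 5.1.1: per-layer pixel mean
    w_norm = 1.0 / (1.0 + np.exp(-(fc_w * w + fc_b)))         # 5.1.2: FC + sigmoid -> (0, 1)
    w_dep = np.where(w_norm < low, np.clip(w_norm - eps, 0.0, 1.0),     # suppress weak layers
             np.where(w_norm > high, np.clip(w_norm + eps, 0.0, 1.0),   # enhance strong layers
                      w_norm))                                          # 5.1.3 (assumed form)
    # 5.2: per-layer scaling stands in for convolving with the 1 x n kernel W''.
    f_ehc_map = f_restrain_map * w_dep[:, None, None]
    return w_dep, f_ehc_map
```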
2. The unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching according to claim 1, wherein the step 4 specifically comprises:
step 4.1, constructing a light suppression model;
the light suppression model is a dual-branch network comprising a learning branch network and a suppression branch network; the learning branch network comprises, connected in series, a convolution layer conv1, a shallow feature selection layer f_1() and a deep feature selection layer f_2(); the suppression branch network comprises, connected in series, a convolution layer conv1′, a shallow feature selection layer f′_1() and a deep feature selection layer f′_2();
step 4.2, obtaining a groups of training sample pairs, where a denotes the number of groups;
each group of training sample pairs comprises a normal-light image I and an over-bright image I′ under the view angle of the unmanned aerial vehicle; the over-bright image I′ is obtained by randomly adding a brightness value to the normal-light image I; the a groups of training sample pairs are denoted (I_1, I′_1), (I_2, I′_2), …, (I_a, I′_a);
step 4.3, performing off-line training on the light suppression model constructed in step 4.1 with the a groups of training sample pairs, the objective function of the off-line training being as follows:
wherein:
Loss_light-suppression denotes the light suppression loss function;
argmin() denotes the value of the variable at which the objective function attains its minimum;
f′_1(I′_j) denotes the shallow feature value output after the over-bright image I′_j is input to the shallow feature selection layer f′_1();
f′_2(I′_j) denotes the deep feature value output after the over-bright image I′_j is input to the deep feature selection layer f′_2();
f_1(I_j) denotes the shallow feature value output after the normal-light image I_j is input to the shallow feature selection layer f_1();
f_2(I_j) denotes the deep feature value output after the normal-light image I_j is input to the deep feature selection layer f_2();
γ denotes a manually set penalty coefficient that controls the influence of the corresponding term on the light suppression loss function; the larger its value, the greater that influence;
step 4.4, off-line training of the light suppression model weakens the sensitivity of the suppression branch network to brightness features, so that the suppression branch network can suppress the illumination features of over-bright images captured by the unmanned aerial vehicle and improve the saliency of vehicle detail features at the unmanned aerial vehicle view angle;
therefore, the image frame Frm(t) is input to the suppression branch network of the trained light suppression model to obtain the illumination suppression feature map F_RestrainMap.
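A PyTorch sketch of the claim-2 objective follows. The loss formula itself is an image in the source; the mean-squared form below, with γ weighting the deep-feature term, is an assumption built from the terms f_1, f_2, f′_1, f′_2 and γ described above, and the callable names are illustrative.

```python
import torch
import torch.nn.functional as F

def light_suppression_loss(learn_shallow, learn_deep, supp_shallow, supp_deep,
                           img_normal, img_bright, gamma=0.1):
    """Pull the suppression branch's features on the over-bright image I'_j
    toward the learning branch's features on the normal-light image I_j,
    with gamma weighting the deep-feature term (assumed squared-error form)."""
    shallow_term = F.mse_loss(supp_shallow(img_bright), learn_shallow(img_normal))
    deep_term = F.mse_loss(supp_deep(img_bright), learn_deep(img_normal))
    return shallow_term + gamma * deep_term
```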
3. The unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching of claim 1, wherein in step 5.6, two vehicle detection result frames are combined into one vehicle detection result frame, specifically:
let the two vehicle detection result frames to be merged be the detection result frame Box_Small(1) and the detection result frame Box_Large(1), and let the merged vehicle detection result frame be denoted Box(1); then:
the center point of Box(1) is the midpoint of the line connecting the center point of Box_Small(1) and the center point of Box_Large(1);
the height of Box(1) is the average of the heights of Box_Small(1) and Box_Large(1);
the width of Box(1) is the average of the widths of Box_Small(1) and Box_Large(1).
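A direct sketch of the claim-3 merging rule, assuming boxes are given as (cx, cy, width, height) tuples in image coordinates:

```python
def merge_result_frames(box_small, box_large):
    """Merge two detection result frames: the merged center is the midpoint of
    the two centers, and width/height are the averages of the two boxes."""
    cx = (box_small[0] + box_large[0]) / 2.0
    cy = (box_small[1] + box_large[1]) / 2.0
    w = (box_small[2] + box_large[2]) / 2.0
    h = (box_small[3] + box_large[3]) / 2.0
    return (cx, cy, w, h)
```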
4. The unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching of claim 1, wherein in step 7, the multi-feature joint vehicle search network establishment method is as follows:
the multi-feature joint vehicle search network is established by taking the vehicle color feature and the vehicle type feature as the vehicle global features, and the vehicle side view, front view, rear view, top view and non-vehicle view as the vehicle local features.
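For reference, the claim-4 feature split can be written down as a small data structure; the class names below are illustrative only.

```python
from dataclasses import dataclass
from enum import Enum

class ViewClass(Enum):
    """Vehicle local-feature view-angle categories (non-vehicle views are discarded)."""
    SIDE = 1
    FRONT = 2
    REAR = 3
    TOP = 4
    NON_VEHICLE = 5

@dataclass
class GlobalFeatures:
    """Vehicle global features: color category and vehicle type."""
    color: str
    vehicle_type: str
```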
5. The unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching according to claim 4, wherein the step 7 specifically comprises:
step 7.1, constructing a multi-feature joint vehicle search network; the multi-feature joint vehicle search network comprises a global feature recognition module and a local feature matching module;
step 7.2, inputting the z detected vehicle maps and the target vehicle map S respectively into the global feature recognition module, and obtaining z′ suspected vehicle maps with the same color and the same vehicle type as the target vehicle map S by the following method;
the global feature recognition module comprises a shared feature layer, a vehicle color feature layer and a vehicle type feature layer;
step 7.2.1, identifying the color characteristics of the target vehicle map S, comprising the steps of:
step 7.2.1.1, inputting the target vehicle map S into the shared feature layer to obtain the shared feature map F_ShrMap;
step 7.2.1.2, inputting the shared feature map F_ShrMap into the vehicle color feature layer to obtain the vehicle color feature vector V_Color; the vehicle color feature layer comprises Conv4_Color, a max-pooling layer Maxpool and a fully connected layer FC_Color;
step 7.2.1.3, multiplying the vehicle color feature vector V_Color with the shared feature map F_ShrMap in matrix-broadcast fashion to obtain the color-sensitive feature map F_ColorMap;
step 7.2.1.4, using the color-sensitive feature map F_ColorMap as a convolution kernel to cross-convolve the target vehicle map S, obtaining the color feature enhancement map S′_Color and enhancing the response of the target vehicle map S to color features;
step 7.2.1.5, inputting the color feature enhancement map S′_Color sequentially into the shared feature layer, Conv4_Color, Conv5_Color, the max-pooling layer and the fully connected layer, and obtaining the color category of the target vehicle map S through a non-maximum suppression algorithm;
step 7.2.2, obtaining the vehicle type of the target vehicle map S by adopting the same method, and further obtaining the color type and the vehicle type of each detected vehicle map;
step 7.2.3, judging whether a detected vehicle image with the same color and the same vehicle type as the target vehicle image S exists in the z detected vehicle images, and if not, directly searching the next frame of image;
if yes, extracting all detected vehicle maps with the same color and the same vehicle type as the target vehicle map S; assuming z′ are extracted in total, the extracted z′ detected vehicle maps are referred to as suspected vehicle maps and denoted D_c, where c = 1 … z′;
step 7.3, inputting the target vehicle map S and each suspected vehicle map D_c respectively into the local feature matching module, which applies a matching algorithm to obtain the vehicle mean vector matrix V_s of the target vehicle map S;
the local feature matching module applies the same matching algorithm to obtain the suspected vehicle mean vector matrix V_c of each suspected vehicle map D_c;
wherein the local feature matching module comprises a feature extraction layer, a feature sparse convolution layer Conv6 and a fully connected layer FC_sight;
the local feature matching module performs feature matching on the target vehicle map S to obtain its vehicle mean vector matrix V_s, specifically as follows:
step 7.3.1, dividing the target vehicle map S by a 4 × 4 grid to obtain 16 vehicle sub-block maps;
step 7.3.2, inputting each vehicle sub-block map into the feature extraction layer to obtain the corresponding vehicle sub-block feature map F_subMap(m), m = 1 … 16;
step 7.3.3, inputting each vehicle sub-block feature map F_subMap(m) into the feature sparse convolution layer Conv6 to obtain the corresponding sparse feature map F_sparseMap(m);
step 7.3.4, determining the view-angle category of each vehicle sub-block map:
inputting each sparse feature map F_sparseMap(m) into the fully connected layer FC_sight and obtaining the view-angle category of the vehicle sub-block map through non-maximum suppression; the view-angle categories comprise five classes: side view, front view, rear view, top view and non-vehicle view;
step 7.3.5, determining the view-angle vector of the view-angle category of each vehicle sub-block map:
if the view-angle category is side view, front view, rear view or top view, extracting the features of the corresponding sparse feature map F_sparseMap(m) and reshaping them into a one-dimensional feature vector, which serves as the view-angle vector of that vehicle sub-block map; the view-angle vectors are divided by category into side view vectors, front view vectors, rear view vectors and top view vectors;
if the view-angle category is non-vehicle view, the sub-block is discarded;
step 7.3.6, determining the view-angle mean vector of each view-angle category:
obtaining the mean of the view-angle vectors of the vehicle sub-block maps belonging to the same view-angle category in the target vehicle map S, yielding a side view mean vector, a front view mean vector, a rear view mean vector and a top view mean vector respectively;
if a certain view-angle category does not exist, its view-angle mean vector does not exist either, and all elements of that mean vector are set to 0;
thus, the view-angle mean vectors V_cl of the four view-angle categories are obtained, where cl = 1, 2, 3, 4 denotes the side view mean vector V_1, the front view mean vector V_2, the rear view mean vector V_3 and the top view mean vector V_4; the four view-angle mean vectors V_cl form the vehicle mean vector matrix V_s of the target vehicle map S;
correspondingly, the suspected-vehicle mean vectors V′_cl of the four view-angle categories are obtained for each suspected vehicle map D_c, constructing the suspected vehicle mean vector matrix V_c of the suspected vehicle map D_c;
step 7.4, counting the number Num of view-angle mean vectors of the view-angle categories shared by the target vehicle map S and each suspected vehicle map D_c, and computing the feature matching value Match corresponding to each suspected vehicle map D_c by the following formula;
wherein λ is the weight of the number of view-angle mean vectors; T denotes transposition; tr denotes the trace of a matrix, i.e. the sum of its main diagonal elements;
step 7.5, when several suspected vehicle maps D_c have a feature matching value Match higher than the threshold, non-maximum suppression is applied to these suspected vehicle maps D_c to determine the suspected vehicle map of the target vehicle, and the position of this suspected vehicle map in the image frame Frm(t) is the position of the target vehicle in Frm(t);
when the feature matching values Match between the target vehicle map S and all suspected vehicle maps D_c are lower than the threshold, the target vehicle is not contained in the image frame Frm(t).
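The sketch below covers steps 7.3.1 and 7.3.4–7.3.5 of claim 5: splitting a vehicle map by a 4 × 4 grid and grouping the per-block view-angle vectors by predicted category. The `classify` and `embed` callables stand in for FC_sight with non-maximum suppression and the Conv6 sparse features respectively, and are assumptions.

```python
import numpy as np

def split_into_subblocks(vehicle_map, grid=4):
    """Step 7.3.1: split an H x W x C vehicle map by a grid x grid lattice
    into grid*grid sub-block maps (16 for the 4 x 4 grid); border remainders
    are dropped for simplicity."""
    h, w = vehicle_map.shape[:2]
    hs, ws = h // grid, w // grid
    return [vehicle_map[r * hs:(r + 1) * hs, c * ws:(c + 1) * ws]
            for r in range(grid) for c in range(grid)]

def group_view_vectors(blocks, classify, embed):
    """Steps 7.3.4-7.3.5: classify each sub-block's view angle (1-4 = vehicle
    views, 5 = non-vehicle) and collect the reshaped one-dimensional feature
    vectors per category, discarding non-vehicle blocks."""
    grouped = {}
    for block in blocks:
        view_class = classify(block)
        if view_class == 5:                    # non-vehicle view: discard
            continue
        grouped.setdefault(view_class, []).append(np.asarray(embed(block)).reshape(-1))
    return grouped                             # feeds the mean_vector_matrix() sketch shown earlier
```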
6. The unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching of claim 1, wherein in step 7.3.3, in order to make the sparse feature map sufficiently express the features in the vehicle sub-block feature map F_subMap(m) and reduce the information loss of the compression process, the compression loss function Loss_sparse is adopted in training:
Loss_sparse = Min(F_subMap(m) − (F_sparseMap(m) * W_Tran))
In the formula:
W_Tran denotes the upsampling weights obtained by deconvolution.
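A PyTorch sketch of the claim-6 compression loss follows. Reading Min as minimizing the reconstruction residual, the mean absolute residual is used here; the stride and the use of conv_transpose2d for the W_Tran upsampling are assumptions.

```python
import torch
import torch.nn.functional as F

def compression_loss(f_sub, f_sparse, w_tran, stride=2):
    """Upsample the sparse feature map with the deconvolution weights W_Tran and
    penalize its deviation from the vehicle sub-block feature map F_subMap(m).
    f_sparse: (N, C_in, h, w); w_tran: (C_in, C_out, kH, kW); f_sub must match
    the upsampled spatial size for this sketch to run."""
    upsampled = F.conv_transpose2d(f_sparse, w_tran, stride=stride)
    return (f_sub - upsampled).abs().mean()
```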
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111534212.4A CN114220053B (en) | 2021-12-15 | 2021-12-15 | Unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114220053A CN114220053A (en) | 2022-03-22 |
CN114220053B true CN114220053B (en) | 2022-06-03 |
Family
ID=80702585
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111534212.4A Active CN114220053B (en) | 2021-12-15 | 2021-12-15 | Unmanned aerial vehicle video vehicle retrieval method based on vehicle feature matching |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114220053B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102436738A (en) * | 2011-09-26 | 2012-05-02 | 同济大学 | Traffic monitoring device based on unmanned aerial vehicle (UAV) |
CN110110624A (en) * | 2019-04-24 | 2019-08-09 | 江南大学 | A kind of Human bodys' response method based on DenseNet network and the input of frame difference method feature |
CN110443208A (en) * | 2019-08-08 | 2019-11-12 | 南京工业大学 | YOLOv 2-based vehicle target detection method, system and equipment |
CN110717387A (en) * | 2019-09-02 | 2020-01-21 | 东南大学 | Real-time vehicle detection method based on unmanned aerial vehicle platform |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107885764B (en) * | 2017-09-21 | 2020-12-18 | 银江股份有限公司 | Rapid Hash vehicle retrieval method based on multitask deep learning |
US11036216B2 (en) * | 2018-09-26 | 2021-06-15 | International Business Machines Corporation | Voice-controllable unmanned aerial vehicle for object retrieval and delivery |
CN109815886B (en) * | 2019-01-21 | 2020-12-18 | 南京邮电大学 | Pedestrian and vehicle detection method and system based on improved YOLOv3 |
CN109977812B (en) * | 2019-03-12 | 2023-02-24 | 南京邮电大学 | Vehicle-mounted video target detection method based on deep learning |
US20200301015A1 (en) * | 2019-03-21 | 2020-09-24 | Foresight Ai Inc. | Systems and methods for localization |
WO2021207999A1 (en) * | 2020-04-16 | 2021-10-21 | 华为技术有限公司 | Vehicle positioning method and apparatus, and positioning map layer generation method and apparatus |
CN112149643B (en) * | 2020-11-09 | 2022-02-22 | 西北工业大学 | Vehicle weight identification method for unmanned aerial vehicle platform based on multi-stage attention mechanism |
CN112381043A (en) * | 2020-11-27 | 2021-02-19 | 华南理工大学 | Flag detection method |
CN112766087A (en) * | 2021-01-04 | 2021-05-07 | 武汉大学 | Optical remote sensing image ship detection method based on knowledge distillation |
Also Published As
Publication number | Publication date |
---|---|
CN114220053A (en) | 2022-03-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
2023-05-18 | TR01 | Transfer of patent right | Patentee after: Beijing Lingyun Space Technology Co.,Ltd., 3032B, 3rd Floor, Building 9, No.16 Fengguan Road, Fengtai District, Beijing, 100071; Patentee before: Beijing University of Civil Engineering and Architecture, No. 1, Exhibition Road, Xicheng District, Beijing, 100044 |