CN107885764B: Rapid hash vehicle retrieval method based on multi-task deep learning
Classifications
G06F16/36: Information retrieval of unstructured textual data; creation of semantic tools, e.g. ontology or thesauri
G06F16/5838: Retrieval of still image data characterised by metadata automatically derived from the content, using colour
G06K9/6277: Classification techniques based on a parametric (probabilistic) model
G06N3/0454: Neural-network architectures using a combination of multiple neural nets
Abstract
A fast hash vehicle retrieval method based on multi-task deep learning comprises a multi-task deep convolutional neural network used for deep learning and training of recognition, a feature fusion method of segmented compact hash codes and instance features used to improve retrieval precision and the practicability of the retrieval method, a locality-sensitive hash reordering algorithm used to improve retrieval performance, and a cross-modal retrieval method used to improve the robustness and accuracy of the retrieval engine. First, a multi-task deep convolutional network method for segmented learning of hash codes is provided: image semantics and image representation are combined, the relations between related tasks are exploited to improve retrieval precision and refine image features, and minimized image coding makes the learned vehicle features more robust. Second, a feature pyramid network is selected to extract instance features of the vehicle image. Third, the extracted features are retrieved with a locality-sensitive hash reordering method. Finally, a cross-modal auxiliary vehicle retrieval method is adopted for the special case in which a query image of the target vehicle cannot be obtained.
Description
Technical Field
The invention relates to the application of artificial intelligence, digital image processing, convolutional neural networks and computer vision in the field of public safety, and belongs to the field of intelligent transportation.
Background
Today, smart cities and intelligent transportation are developing rapidly, and the demand in public safety systems for large-scale image monitoring, vehicle identification in video databases and vehicle retrieval is growing just as fast.
In the prior art, vehicle retrieval methods mainly extract the license plate information of a target vehicle and then retrieve the motor vehicle according to that information. This is typically done by identifying the license plate number of the vehicle from a monitoring image and then identifying the motor vehicle bearing that license plate number in other monitoring images. Although searching by license plate number alone is easy to implement, it cannot effectively retrieve a motor vehicle whose license plate information is unavailable, such as a fake-plate vehicle.
Vehicle retrieval technology based on appearance features not only makes up for the limitations and defects of traditional license plate recognition, but also has very important practical significance and broad application prospects in intelligent vehicle retrieval, especially in violation inspection, hit-and-run pursuit, locking onto criminal suspects' vehicles, fake-plate vehicle identification, and accelerating the efficiency of criminal investigation.
Existing vehicle retrieval methods basically use algorithms such as SIFT, SURF and DoG to extract whole-image features of the target vehicle image as the target features, extract whole-image features of each vehicle image in the database with the same algorithm as the features to be matched, calculate the Euclidean distance between the target features and each feature to be matched, and take the vehicle corresponding to the feature with the smallest Euclidean distance as the target vehicle.
Vehicle retrieval requires finding a specific target vehicle among a series of similarly contoured vehicles, which makes the task all the more challenging; furthermore, the influence of practical conditions, such as the monitoring environment, weather and lighting, must be taken into account.
In recent years, deep learning has developed rapidly in the field of computer vision. Deep learning can use a large number of training samples and hidden layers to learn the abstract information of an image layer by layer, acquiring image features more comprehensively and directly. A digital image is described by a matrix, and a convolutional neural network starts from local information blocks to describe the overall structure of the image, so convolutional neural networks are mostly adopted to solve problems at the intersection of computer vision and deep learning. Around improving detection precision and detection time, deep convolutional neural network technology has evolved from R-CNN and Fast R-CNN to Faster R-CNN, with further gains in precision, speed, end-to-end operation and practicability, covering almost all fields from classification to detection, segmentation and localization. Applying deep learning technology to vehicle retrieval is a research field with practical application value.
Reordering is a technique commonly used in image retrieval to improve retrieval performance; for example, initial retrieval results may be reordered through the visual-feature matching relationships between image pairs. However, the reordering effect depends strongly on whether the visual features used are effective enough to represent the image.
In similar-vehicle search, since many vehicles look alike, the extracted visual features are also similar and cannot distinguish different vehicle types, so similar vehicles cannot be retrieved well by a reordering method that directly uses the matching relationships between image pairs.
Query expansion is a common method used in search technology to improve recall and accuracy. Query expansion adds new keywords to the original query and searches again: for example, a search engine runs the user's query once, selects suitable keywords from the retrieved documents, and adds those keywords to the query for a second search, thereby finding more related documents. Query expansion can therefore effectively improve the recall rate of information retrieval, but the prior art provides no query expansion method specific to the vehicle as an object in an image.
Compared with the traditional vehicle retrieval method based on the license plate number, the method of the Chinese patent application with application No. 201510744990.4 not only avoids dependence on license plate recognition accuracy but can also retrieve fake-plate and unlicensed vehicles. However, this technology is still a computer vision technology of the pre-deep-learning era.
The Chinese patent application with application No. 201610671729.0 discloses a big-data-based vehicle retrieval method and device. The method includes: extracting brand features of the target vehicle in the target vehicle image; determining the probability that each pixel in the target vehicle image corresponds to each marker, where the markers include one or more of annual inspection marks, ornaments and hanging decorations; determining the position of each marker in the target vehicle image according to those probabilities and a probability threshold for each marker; extracting image features of each marker according to its position in the target vehicle image; and searching for the target vehicle among the vehicle images to be searched according to the image features of each marker and the brand features of the target vehicle. Although this technology adopts deep learning, it is single-task deep learning, whereas vehicle retrieval is a typical multi-task deep learning problem.
Chinese patent application No. 201410381577.1 discloses a query expansion method and device for similar-vehicle retrieval. The method comprises: determining the vehicle model information of an image to be queried that contains a vehicle; selecting several sample images that meet preset conditions from the vehicle model template library corresponding to that vehicle model information; and forming a query expansion image set from the sample images, so that the sample images in the query expansion image set replace the image to be queried when querying the target database, where the vehicle model template library contains sample images for each vehicle model. The method can improve the recall rate and accuracy of vehicle image retrieval, but it is an image retrieval technology of the early deep learning era.
Chinese patent application No. 201410652730.X discloses an image-based motor vehicle retrieval method and apparatus. The method comprises: acquiring a first image containing the motor vehicle to be retrieved; determining a first appearance contour of the motor vehicle to be retrieved from the first image; dividing the image within the first appearance contour into several regions and extracting the image features of each region with different step lengths; combining the image features of all regions to obtain the overall image features of the motor vehicle to be retrieved; and comparing the overall image features of the motor vehicle to be retrieved with pre-extracted overall image features of the target motor vehicle to obtain a comparison result. This method, too, is an image retrieval technology of the early deep learning era.
Disclosure of Invention
Aiming at the problems of efficiently using the massive video data generated in the field of public safety and improving vehicle retrieval efficiency in the big data era, the invention provides a fast hash retrieval method based on multi-task deep learning, which effectively exploits the relevance among detection and recognition tasks and the diversity of basic checkpoint vehicle information to achieve real-time retrieval, yielding a multi-task deep learning fast hash vehicle retrieval method with high retrieval precision and good robustness.
The technical scheme adopted by the invention for solving the technical problems is as follows:
A fast hash vehicle retrieval method based on multi-task deep learning comprises the following steps:
first, constructing a multi-task deep convolutional neural network for deep learning and training of recognition;
second, adopting a feature fusion method of segmented compact hash codes and instance features;
third, adopting a locality-sensitive hash reordering algorithm;
and fourth, adopting a cross-modal retrieval method to realize vehicle retrieval.
Further, in the first step, Faster R-CNN is adopted as the basic network of the multi-task convolutional neural network. The front of the network is a 3 × 3 convolutional layer called conv1, followed by 4 stacked convolution modules named conv2_x to conv5_x, the modules containing {2, 3, 3, 3} units respectively; conv1 to conv4_3 serve as the shared network. Then comes the RPN, i.e. the region proposal network: the RPN takes an image of any scale as input and outputs a set of rectangular target proposal boxes, each box comprising 4 position coordinate variables and a score; the targets of the rectangular target proposal boxes are vehicle objects. To generate region proposal boxes, a small network slides over the convolutional feature map output by the last shared convolutional layer; this network is fully connected to an n × n spatial window of the input convolutional feature map. Each sliding window is mapped to a low-dimensional vector, one sliding window of each feature map corresponding to one value; this vector is output to two sibling fully-connected layers.
The estimated probability that each proposal box is a target/non-target comes from a classification layer realized by a two-class softmax layer; the k proposal boxes are parameterized relative to k corresponding reference boxes called anchors.
Each anchor is centered at the center of the current sliding window and corresponds to one scale and one aspect ratio; using 3 scales and 3 aspect ratios yields k = 9 anchors at each sliding position.
To train the RPN, each anchor is assigned a binary label marking whether it is a target. Positive labels are assigned to two types of anchors: (I) the anchor with the highest Intersection-over-Union (IoU) overlap with a real target bounding box, i.e. the ground truth (GT); (II) anchors whose IoU overlap with any GT bounding box is greater than 0.7. Note that one GT bounding box may assign positive labels to multiple anchors. Negative labels are assigned to anchors whose IoU with all GT bounding boxes is below 0.3; anchors that are neither positive nor negative have no effect on the training objective and are discarded.
Following the multi-task loss in Faster R-CNN, the objective function is minimized; the loss function for an image is defined as

L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)   (1)

where i is the index of an anchor, p_i is the predicted probability that the i-th anchor is a target, and the ground-truth label p_i^* is 1 if the anchor is positive and 0 if it is negative; t_i is a vector representing the 4 parameterized coordinates of the predicted bounding box, and t_i^* is the coordinate vector of the GT bounding box corresponding to a positive anchor; \lambda is a balance weight, N_{cls} normalizes the cls term by the mini-batch size, and N_{reg} normalizes the reg term by the number of anchor positions. The classification loss L_{cls} is the log loss over the two classes, motor-vehicle target vs. road background:

L_{cls}(p_i, p_i^*) = -\log\left[ p_i^* p_i + (1 - p_i^*)(1 - p_i) \right]   (2)

The regression loss L_{reg} is defined by the following function:

L_{reg}(t_i, t_i^*) = R(t_i - t_i^*)   (3)

where R is a robust loss function, the smooth L_1 loss calculated by equation (4):

\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5 x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}   (4)

where x is a variable.
Further, in the multi-task deep convolutional neural network, designing a multi-task objective function is crucial; the multi-task objective function is expressed by formula (5):

\min_{\{w^t\}_{t=1}^{T}} \sum_{t=1}^{T} \left[ \sum_{i=1}^{N} L\left( y_i^t, f(x_i^t; w^t) \right) + \Phi(w^t) \right]   (5)

where f(x_i^t; w^t) is a function of the input feature vector x_i^t and the weight parameter w^t, L(\cdot) is a loss function, \Phi(w^t) is the regularization value of the weight parameters, and T is the total number of tasks; the training data of the t-th task is recorded as \{(x_i^t, y_i^t)\}, where t \in (1, T), i \in (1, N), N is the total number of training samples, and x_i^t and y_i^t respectively represent the feature vector and the label of the i-th sample.

For the loss function, softmax is used with the log-likelihood cost function to train the features of the last layer and realize multi-task image classification; the softmax loss function is defined by formula (6):

L_{softmax} = -\frac{1}{m} \sum_{i=1}^{m} \log \frac{ e^{ W_{y_i}^{T} x_i + b_{y_i} } }{ \sum_{j=1}^{n} e^{ W_j^{T} x_i + b_j } }   (6)

where x_i is the i-th depth feature with label y_i, W_j is the j-th column of the weights in the last fully-connected layer, b is the bias term, and m and n are the number of processed samples and the number of classes, respectively.
Further, in the second step, the feature fusion process for the segmented compact hash codes and the instance features is as follows:
in the vehicle image feature extraction stage, a softmax activation function first limits the values to the threshold range [0, 1]; then a segmented threshold function promotes the output of binary hash codes, and a segmented learning and coding strategy reduces the redundancy among hash codes to improve feature robustness; finally, the hash codes obtained by segmented learning are fused by feature fusion, yielding the vehicle-feature segmented compact hash code.
As for the instance features of a vehicle, the implementation is: the last unit of each shared stacked convolution module from conv2_x to conv5_x is combined with the output of the RPN, and a pyramid pooling layer and a vector flattening layer are added to accommodate convolutional feature map inputs of different sizes while flattening the convolved three-dimensional features into one-dimensional feature vectors, referred to as the instance features of the vehicle.
Finally, the vehicle segmented compact hash code features and the instance features are fused again to obtain the feature vector used for retrieval.
The vehicle-feature segmented compact hash code is realized by the following method. There are T tasks in total, and c^t classes exist under each task; m^t denotes the fully-connected output vector of each task, and the softmax activation function maps the fully-connected layer output into [0, 1], calculated by formula (7):

q_j^t = \frac{ e^{ \theta_j^T m^t } }{ \sum_{k=1}^{c^t} e^{ \theta_k^T m^t } }, \quad j = 1, \ldots, c^t   (7)

where \theta represents a random hyperplane, m^t represents the fully-connected output vector of each task, c^t represents the number of classes under each task, and q^t represents the fully-connected layer output.

For the binary output of the excitation segmented coding module, a threshold segmentation function is used for binarization:

H_j^t = \begin{cases} 1, & q_j^t \ge 0.5 \\ 0, & q_j^t < 0.5 \end{cases}   (8)

where q^t represents the fully-connected layer output and H^t represents the binary output of the excitation segmented coding module.

Finally, the H^t are fused into the vehicle segmented compact hash code vector f_A:

f_A = [\alpha^1 H^1; \alpha^2 H^2; \ldots; \alpha^T H^T]   (9)

where f_A represents the vehicle segmented compact hash code vector, H^t represents the binary output of the excitation segmented coding module, t \in (1, T), and \alpha^t represents a weighting coefficient calculated by equation (10); multiplying each H vector by the coefficient \alpha^t compensates for the errors caused by the uneven class distribution among the different tasks.
In the third step, the feature vector used for retrieval is obtained by fusing the vehicle segmented compact hash code features and the instance features, as follows:
output feature-map sizes of \{4^2, 8^2, 16^2, 16^2\} are selected for the deepest units of conv2_x through conv5_x, respectively. For a given input image I of size h \times w, the activation of a convolution module convx_x is a three-dimensional tensor T of size h' \times w' \times d, containing a series of two-dimensional feature maps S = \{S_n\}, n \in (1, d), where S_n has size h' \times w' and corresponds to the feature map of the n-th channel. T is fed into the pyramid pooling layer to obtain a three-dimensional tensor T' of size l \times l \times d, still containing a series of feature maps S' = \{S'_n\}, n \in (1, d), each S'_n of size l \times l. Each S'_n is traversed by a sliding window of size k \times k that selects the maximum value, so that S'_n becomes l/k \times l/k; the S'_n of each channel is then fused into a one-dimensional vector, the same operation is performed on the d channels in turn, and finally the individual feature vector f_B of size (1, l/k \times d) is obtained. The final retrieval feature vector f is calculated as shown in formula (11):

f = [f_A; f_B]   (11)

where f is the feature vector for vehicle retrieval, f_B is the instance feature vector, i.e. the individuality feature vector, and f_A represents the vehicle segmented compact hash code vector.
In the third step, similar samples are mapped into the same bucket with high probability; the hash function h(\cdot) of the locality-sensitive hash satisfies the following condition:

P\{ h(f_{Aq}) = h(f_A) \} = \mathrm{sim}(f_{Aq}, f_A)   (12)

where \mathrm{sim}(f_{Aq}, f_A) denotes the similarity of f_{Aq} and f_A, and h(f_A) and h(f_{Aq}) denote the hashes of f_A and f_{Aq}; the similarity measure is directly related to a distance function \sigma, calculated by equation (13):

\mathrm{sim}(f_{Aq}, f_A) = 1 - \sigma(f_{Aq}, f_A)   (13)

A typical class of locality-sensitive hash functions is given by random projection and thresholding, computed by equation (14):

h(f_A) = \mathrm{sign}(W f_A + b)   (14)

where W is a random hyperplane vector and b is a random intercept.
In the third step, after the query image has been mapped into a similarity bucket through its segmented compact hash code, the images returned from the bucket are reordered using their instance features in combination with formula (15); the reordering calculation is shown in equation (15):

dis_k = y \cdot \cos(f_{Bq}, f_B^k) + (1 - y) \cdot \varepsilon \cdot \cos(f_{Bq}, f_B^k)   (15)

where k denotes the k-th image in the bucket, \varepsilon denotes a penalty factor with \varepsilon > 1, \cos denotes the cosine distance formula, and y indicates whether the pre-mapping codes f_{Aq} and f_A^k are equal: y is 1 if they are equal and 0 otherwise; f_A^k represents the k-th image's vehicle segmented compact hash code vector and f_{Aq} represents the query's vehicle segmented compact hash code vector. When different segmented compact hash codes are mapped into the same bucket, the penalty factor \varepsilon lengthens the distance between an erroneous retrieval result and the input query image; a smaller dis indicates a higher similarity.
In the fourth step, the cross-modal retrieval method constructs a group of deep neural networks to map image and text data into a common semantic space by feature learning, so as to realize the semantic coupling of data of different modalities. A deep convolutional neural network extracts the semantic features of the image modality directly from the input image; the text is represented by word vectors, and a one-dimensional convolutional neural network extracts the semantic features of the text modality from the word vector representation. First, the segmented compact hash f_A of the vehicle is dynamically generated by the deep convolutional neural network; then a retrieval feature vector is generated from the text, so that the feature vectors generated from both can be searched with the same retrieval system.
The semantic feature of the text modality is a feature vector extracted from the text; as the first step of the extraction algorithm, the text must first be split. The feature vector of the text comes from the text's terms, and the method specifically comprises the following steps:
Input: a text O; Output: a set of roughly similar images;
STEP 1: initialization: (1) parse the text file into a term vector; (2) remove stop words and repeated words; (3) check the terms to ensure the correctness of the parsing;
STEP 2: take the minimal randomly combined term vector R = (r_1, r_2, \ldots, r_n) from O;
STEP 3: integrate R and f_A by sequential and segmented compact hash coding to obtain the text attribute feature \hat{f}_{ATxt}; at this point f_{ATxt} has a dimension smaller than that of R;
STEP 4: search using the locality-sensitive reordering hash algorithm;
STEP 5: return the similar image group I.
The text attribute feature function \hat{f}_{ATxt} is expressed by equation (16):

\hat{f}_{ATxt} = \mathrm{sign}(A^T R)   (16)

where A^T is the transpose of the vehicle segmented compact hash code, R is the minimal vector of randomly combined terms, \hat{f}_{ATxt} is the text attribute feature function, and sign denotes the sign function;

R = \mathrm{diag}(r_1, r_2, \ldots, r_n)   (17)

where diag denotes forming a diagonal matrix from the feature vector extracted from the text, and the vehicle segmented compact hash code A^T is initialized to the all-ones vector of size (1 \times c).
The technical conception of the invention is as follows: first, a multi-task deep convolutional network method for segmented learning of hash codes is provided, combining image semantics and image representation, exploiting the relations between related tasks to improve retrieval precision and refine image features, while minimized image coding makes the learned vehicle features more robust; second, a feature pyramid network is selected to extract instance features of the vehicle image; third, the extracted features are retrieved using a locality-sensitive hash reordering method; finally, a cross-modal auxiliary vehicle retrieval method is adopted for the special case in which a query image of the target vehicle cannot be obtained.
The retrieval feature vector generated from text has the same form as the segmented compact hash code vector generated by the convolutional network, so the feature vectors generated from both sources can be retrieved by the same retrieval system without additional training.
The deep convolutional neural network model constructed by the method is an end-to-end learning system, as shown in FIG. 1; the model integrates text feature representation, image feature learning, text feature learning, cross-modal retrieval and reordering into the same learning framework.
The invention has the following beneficial effects:
1) A multi-task deep learning vehicle appearance recognition framework is designed. Weight sharing in the correlated parallel processing among tasks improves the generalization ability of the system, weakens the influence of overfitting on the neural network, and alleviates the weak classifier generalization caused by insufficient samples; different network structures were tried, and finally mutually correlated tasks are fused to maximize the sharing of network parameters.
2) A segmented approach is employed, combined with the multi-task network architecture, to learn hash codes and reduce the redundancy between binary hash codes. Each task is responsible for learning a part of the hash code without connection to the others, and the vector fusion method described herein yields an accurate image feature representation of each vehicle, called the vehicle's segmented compact feature. A feature pyramid network that captures the image's instance features is constructed from a multi-layer combination of the shared stacked convolutional layers, a pyramid pooling layer and a vector flattening (Vector flat) layer, and finally the image representations carrying the two kinds of feature-dimension information are re-fused into the final retrieval feature vector.
3) A locality-sensitive hash reordering retrieval method is provided to quickly match the acquired retrieval features, so as to meet the practical application requirements of intelligent transportation. The retrieval method first maps the images in the query library to buckets using the segmented compact hash code, then re-sorts the images within a bucket using the instance feature vectors, screening out the top-K most similar images by relying on the different feature dimensions of vehicles; the mapping of coding vectors avoids one-to-one image comparison, achieving fast real-time retrieval.
4) For special situations in which image information of a vehicle cannot be acquired, such as a blurred camera view at night, excessive illumination in the daytime, or a failed camera, the invention provides a cross-modal auxiliary retrieval mode to meet the actual requirements of different environments: vehicle characteristics are summarized by manual judgment and converted into text data that is fed into the retrieval network to realize auxiliary retrieval.
Drawings
FIG. 1 is the overall network framework for fast hash retrieval with the multi-task deep convolutional neural network;
FIG. 2 is a schematic representation of a reordering sequence;
FIG. 3 is an illustration of a text feature vector generation process;
FIG. 4 is a diagram of the RPN network architecture;
FIG. 5 is a diagram of the multi-task Faster R-CNN deep convolutional network.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to FIGS. 1 to 5, a fast hash vehicle retrieval method based on multi-task deep learning includes:
first, constructing a multi-task deep convolutional neural network for deep learning and training of recognition;
second, adopting a feature fusion method of segmented compact hash codes and instance features;
third, adopting a locality-sensitive hash reordering algorithm;
and fourth, adopting a cross-modal retrieval method to realize vehicle retrieval.
In the first step, the multi-task deep convolutional neural network for deep learning and training of recognition is shown in FIG. 1. Faster R-CNN is adopted as the basic network of the multi-task convolutional neural network. The front of the network is a 3 × 3 convolutional layer called conv1, followed by 4 stacked convolution modules named conv2_x to conv5_x, the modules containing {2, 3, 3, 3} units respectively; conv1 to conv4_3 serve as the shared network. Then comes the RPN, i.e. the region proposal network, shown in FIG. 4: the RPN takes an image of any scale as input and outputs a set of rectangular target proposal boxes, each box comprising 4 position coordinate variables and a score; the targets of the rectangular target proposal boxes are vehicle objects. To generate region proposal boxes, a small network slides over the convolutional feature map output by the last shared convolutional layer; this network is fully connected to an n × n spatial window of the input convolutional feature map. Each sliding window is mapped to a low-dimensional vector, one sliding window of each feature map corresponding to one value; this vector is output to two sibling fully-connected layers.
The estimated probability that each proposal box is a target/non-target comes from a classification layer realized by a two-class softmax layer; the k proposal boxes are parameterized relative to k corresponding reference boxes called anchors.
Each anchor is centered at the center of the current sliding window and corresponds to one scale and one aspect ratio; using 3 scales and 3 aspect ratios yields k = 9 anchors at each sliding position.
To train the RPN, each anchor is assigned a binary label marking whether it is a target. Positive labels are assigned to two types of anchors: (I) the anchor with the highest Intersection-over-Union (IoU) overlap with a real target bounding box, i.e. the ground truth (GT); (II) anchors whose IoU overlap with any GT bounding box is greater than 0.7. Note that one GT bounding box may assign positive labels to multiple anchors. Negative labels are assigned to anchors whose IoU with all GT bounding boxes is below 0.3; anchors that are neither positive nor negative have no effect on the training objective and are discarded.
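The anchor labeling rule above can be illustrated with a short sketch (an illustrative reconstruction in Python, not code from the patent; the box representation and helper names are assumptions):

```python
import numpy as np

def iou(a, g):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], g[0]), max(a[1], g[1])
    ix2, iy2 = min(a[2], g[2]), min(a[3], g[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_g = (g[2] - g[0]) * (g[3] - g[1])
    return inter / (area_a + area_g - inter)

def label_anchors(anchors, gt_boxes, pos_iou=0.7, neg_iou=0.3):
    """Return +1 (vehicle), 0 (background) or -1 (discarded) per anchor."""
    overlaps = np.array([[iou(a, g) for g in gt_boxes] for a in anchors])
    labels = -np.ones(len(anchors), dtype=int)       # neither pos nor neg: discarded
    labels[overlaps.max(axis=1) < neg_iou] = 0       # IoU < 0.3 with all GT boxes
    labels[overlaps.max(axis=1) > pos_iou] = 1       # rule (II): IoU > 0.7 with some GT
    labels[overlaps.argmax(axis=0)] = 1              # rule (I): best anchor per GT box
    return labels
```

Applying rule (I) last ensures that the highest-IoU anchor of each GT box is kept positive even if its overlap falls below 0.7.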
With these definitions, the objective function is minimized following the multi-task loss in Faster R-CNN; the loss function for an image is defined as

L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)   (1)

where i is the index of an anchor, p_i is the predicted probability that the i-th anchor is a target, and the ground-truth label p_i^* is 1 if the anchor is positive and 0 if it is negative; t_i is a vector representing the 4 parameterized coordinates of the predicted bounding box, and t_i^* is the coordinate vector of the GT bounding box corresponding to a positive anchor; \lambda is a balance weight, here \lambda = 10; N_{cls} normalizes the cls term by the mini-batch size, here N_{cls} = 256; N_{reg} normalizes the reg term by the number of anchor positions, here N_{reg} = 2400. The classification loss L_{cls} is the log loss over the two classes, motor-vehicle target vs. road background:

L_{cls}(p_i, p_i^*) = -\log\left[ p_i^* p_i + (1 - p_i^*)(1 - p_i) \right]   (2)

The regression loss L_{reg} is defined by the following function:

L_{reg}(t_i, t_i^*) = R(t_i - t_i^*)   (3)

where R is a robust loss function, the smooth L_1 loss calculated by equation (4):

\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5 x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}   (4)

where x is a variable.
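A minimal numerical sketch of equations (1) to (4) follows (per-anchor array shapes and function names are illustrative assumptions, not from the patent):

```python
import numpy as np

def smooth_l1(x):
    """Robust smooth-L1 loss of equation (4), applied elementwise."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * ax ** 2, ax - 0.5)

def rpn_loss(p, p_star, t, t_star, lam=10.0, n_cls=256, n_reg=2400):
    """Multi-task loss of equation (1) for one image.
    p: (N,) predicted target probabilities, p_star: (N,) binary GT labels,
    t, t_star: (N, 4) predicted / GT parameterized box coordinates."""
    p = np.clip(p, 1e-7, 1 - 1e-7)                                            # avoid log(0)
    cls = -np.sum(p_star * np.log(p) + (1 - p_star) * np.log(1 - p)) / n_cls  # eq. (2)
    reg = np.sum(p_star[:, None] * smooth_l1(t - t_star)) / n_reg             # eq. (3), positives only
    return cls + lam * reg                                                    # eq. (1)
```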
The multi-task deep convolutional neural network is shown in FIG. 5. In order to integrate multiple tasks for learning and training, designing a multi-task objective function is crucial; the multi-task objective function is expressed by formula (5):

\min_{\{w^t\}_{t=1}^{T}} \sum_{t=1}^{T} \left[ \sum_{i=1}^{N} L\left( y_i^t, f(x_i^t; w^t) \right) + \Phi(w^t) \right]   (5)

where f(x_i^t; w^t) is a function of the input feature vector x_i^t and the weight parameter w^t, L(\cdot) is a loss function, \Phi(w^t) is the regularization value of the weight parameters, and T is the total number of tasks; the training data of the t-th task is recorded as \{(x_i^t, y_i^t)\}, where t \in (1, T), i \in (1, N), N is the total number of training samples, and x_i^t and y_i^t respectively represent the feature vector and the label of the i-th sample.

For the loss function, softmax is used with the log-likelihood cost function to train the features of the last layer and realize multi-task image classification; the softmax loss function is defined by formula (6):

L_{softmax} = -\frac{1}{m} \sum_{i=1}^{m} \log \frac{ e^{ W_{y_i}^{T} x_i + b_{y_i} } }{ \sum_{j=1}^{n} e^{ W_j^{T} x_i + b_j } }   (6)

where x_i is the i-th depth feature with label y_i, W_j is the j-th column of the weights in the last fully-connected layer, b is the bias term, and m and n are the number of processed samples and the number of classes, respectively.
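Equation (6) corresponds to the standard softmax log-likelihood cost; a sketch (the array shapes are assumptions):

```python
import numpy as np

def softmax_loss(x, y, W, b):
    """Softmax log-likelihood cost of equation (6).
    x: (m, d) depth features, y: (m,) integer class labels,
    W: (d, n) last fully-connected weights, b: (n,) bias."""
    logits = x @ W + b                                  # W_j^T x_i + b_j
    logits -= logits.max(axis=1, keepdims=True)         # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(y)), y].mean()       # averaged over the m samples
```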
The feature fusion method of the segmented compact hash codes and the instance features is shown in FIG. 1. On the one hand, in the vehicle image feature extraction stage, a softmax activation function first limits the values to the threshold range [0, 1]; then a segmented threshold function promotes the output of binary hash codes, and a segmented learning and coding strategy reduces the redundancy among hash codes to improve feature robustness; finally, the hash codes obtained by segmented learning are fused by feature fusion, yielding the vehicle-feature segmented compact hash code.
On the other hand, for the instance features of the vehicle: inspired by image pyramid techniques, the vehicle instance features extracted from the convolutional layers are further fused with the compact features extracted by the multi-task deep learning vehicle retrieval network, making the retrieval result more accurate and reliable. The implementation is: the last unit of each shared stacked convolution module from conv2_x to conv5_x is combined with the output of the RPN, and a pyramid pooling layer and a vector flattening layer are added to accommodate convolutional feature map inputs of different sizes while flattening the convolved three-dimensional features into one-dimensional feature vectors, referred to as the instance features of the vehicle.
Finally, the vehicle segmented compact hash code features and the instance features are fused again to obtain the feature vector used for retrieval.
The vehicle-feature segmented compact hash code is realized by the following method. There are T tasks in total, and c^t classes exist under each task; m^t denotes the fully-connected output vector of each task, and the softmax activation function maps the fully-connected layer output into [0, 1], calculated by formula (7):

q_j^t = \frac{ e^{ \theta_j^T m^t } }{ \sum_{k=1}^{c^t} e^{ \theta_k^T m^t } }, \quad j = 1, \ldots, c^t   (7)

where \theta represents a random hyperplane, m^t represents the fully-connected output vector of each task, c^t represents the number of classes under each task, and q^t represents the fully-connected layer output.

For the binary output of the excitation segmented coding module, a threshold segmentation function is used for binarization:

H_j^t = \begin{cases} 1, & q_j^t \ge 0.5 \\ 0, & q_j^t < 0.5 \end{cases}   (8)

where q^t represents the fully-connected layer output and H^t represents the binary output of the excitation segmented coding module.

Finally, the H^t are fused into the vehicle segmented compact hash code vector f_A:

f_A = [\alpha^1 H^1; \alpha^2 H^2; \ldots; \alpha^T H^T]   (9)

where f_A represents the vehicle segmented compact hash code vector, H^t represents the binary output of the excitation segmented coding module, t \in (1, T), and \alpha^t represents a weighting coefficient calculated by equation (10); multiplying each H vector by the coefficient \alpha^t compensates for the errors caused by the uneven class distribution among the different tasks.
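The per-task pipeline of equations (7) to (9) can be sketched as follows (the 0.5 binarization threshold and the equal task weights in the example are assumptions; equation (10) for alpha^t is not reproduced here):

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def segmented_compact_hash(task_outputs, alphas, thresh=0.5):
    """Fuse per-task binary codes into the vehicle hash vector f_A (eq. (9)).
    task_outputs: list of T fully-connected output vectors m^t;
    alphas: list of T weighting coefficients alpha^t (eq. (10))."""
    segments = []
    for m_t, a_t in zip(task_outputs, alphas):
        q_t = softmax(m_t)                      # eq. (7): squash into [0, 1]
        h_t = (q_t >= thresh).astype(float)     # eq. (8): threshold segmentation
        segments.append(a_t * h_t)              # weight this task's segment
    return np.concatenate(segments)             # f_A = [a^1 H^1; ...; a^T H^T]

# e.g. three tasks (type / colour / brand) with 4, 8 and 16 classes:
f_A = segmented_compact_hash(
    [np.random.randn(4), np.random.randn(8), np.random.randn(16)],
    alphas=[1.0, 1.0, 1.0])
```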
The feature vector used for retrieval is obtained by fusing the vehicle segmented compact hash code features and the instance features; the specific implementation is as follows:
output feature-map sizes of \{4^2, 8^2, 16^2, 16^2\} are selected for the deepest units of conv2_x through conv5_x, respectively. For a given input image I of size h \times w, the activation of a convolution module convx_x is a three-dimensional tensor T of size h' \times w' \times d, containing a series of two-dimensional feature maps S = \{S_n\}, n \in (1, d), where S_n has size h' \times w' and corresponds to the feature map of the n-th channel. T is fed into the pyramid pooling layer to obtain a three-dimensional tensor T' of size l \times l \times d, still containing a series of feature maps S' = \{S'_n\}, n \in (1, d), each S'_n of size l \times l. Each S'_n is traversed by a sliding window of size k \times k that selects the maximum value, so that S'_n becomes l/k \times l/k; the S'_n of each channel is then fused into a one-dimensional vector, the same operation is performed on the d channels in turn, and finally the individual feature vector f_B of size (1, l/k \times d) is obtained. The final retrieval feature vector f is calculated as shown in formula (11):

f = [f_A; f_B]   (11)

where f is the feature vector for vehicle retrieval, f_B is the instance feature vector, i.e. the individuality feature vector, and f_A represents the vehicle segmented compact hash code vector.
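A sketch of the instance-feature path described above (the adaptive pooling and the per-channel fusion by a row-wise maximum are assumptions, chosen so the output size matches the stated (1, l/k × d)):

```python
import numpy as np

def pyramid_instance_feature(t, l=16, k=4):
    """Pool a conv activation t of shape (h', w', d) to (l, l, d), slide a
    k x k max window to get (l/k, l/k, d), fuse each channel to a vector
    and flatten to f_B of length l/k * d."""
    h, w, d = t.shape
    pooled = np.empty((l, l, d))
    for i in range(l):                                   # pyramid (adaptive max) pooling
        for j in range(l):
            ys, ye = i * h // l, max((i + 1) * h // l, i * h // l + 1)
            xs, xe = j * w // l, max((j + 1) * w // l, j * w // l + 1)
            pooled[i, j] = t[ys:ye, xs:xe].max(axis=(0, 1))
    s = pooled.reshape(l // k, k, l // k, k, d).max(axis=(1, 3))  # k x k sliding max
    return s.max(axis=0).reshape(-1)                     # per-channel fusion, size l/k * d

f_B = pyramid_instance_feature(np.random.rand(32, 32, 8))
f = np.concatenate([np.ones(16), f_B])                   # eq. (11): f = [f_A; f_B]
```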
The locality-sensitive hash reordering algorithm for improving search performance is shown in FIG. 2; the idea of the algorithm is to map similar samples into the same bucket with high probability. The hash function h(\cdot) of the locality-sensitive hash satisfies the following condition:

P\{ h(f_{Aq}) = h(f_A) \} = \mathrm{sim}(f_{Aq}, f_A)   (12)

where \mathrm{sim}(f_{Aq}, f_A) denotes the similarity of f_{Aq} and f_A, and h(f_A) and h(f_{Aq}) denote the hashes of f_A and f_{Aq}; the similarity measure is directly related to a distance function \sigma, calculated by equation (13):

\mathrm{sim}(f_{Aq}, f_A) = 1 - \sigma(f_{Aq}, f_A)   (13)

A typical class of locality-sensitive hash functions is given by random projection and thresholding, computed by equation (14):

h(f_A) = \mathrm{sign}(W f_A + b)   (14)

where W is a random hyperplane vector and b is a random intercept.
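A minimal bucket index implementing equation (14) (the bit width, the seed and the {0,1} key encoding are assumptions):

```python
import numpy as np

class RandomProjectionLSH:
    """Random-projection LSH of eq. (14): h(f_A) = sign(W f_A + b)."""
    def __init__(self, dim, n_bits=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((n_bits, dim))   # random hyperplanes
        self.b = rng.standard_normal(n_bits)          # random intercepts
        self.buckets = {}

    def key(self, f_A):
        # sign(.) mapped to a {0,1} tuple so it can serve as a dict key
        return tuple((self.W @ f_A + self.b >= 0).astype(int))

    def insert(self, f_A, image_id):
        self.buckets.setdefault(self.key(f_A), []).append(image_id)

    def query(self, f_Aq):
        return self.buckets.get(self.key(f_Aq), [])   # candidates from one bucket
```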
In the feature fusion method of the segmented compact hash codes and the instance features, in order to bring similar images closer, after the query image has been mapped into a similarity bucket through its segmented compact hash code, the images returned from the bucket are reordered using their instance features in combination with formula (15); the reordering calculation is shown in equation (15):

dis_k = y \cdot \cos(f_{Bq}, f_B^k) + (1 - y) \cdot \varepsilon \cdot \cos(f_{Bq}, f_B^k)   (15)

where k denotes the k-th image in the bucket, \varepsilon denotes a penalty factor with \varepsilon > 1, \cos denotes the cosine distance formula, and y indicates whether the pre-mapping codes f_{Aq} and f_A^k are equal: y is 1 if they are equal and 0 otherwise; f_A^k represents the k-th image's vehicle segmented compact hash code vector and f_{Aq} represents the query's vehicle segmented compact hash code vector.
The purpose of the added coefficient is to preserve the correctness of the LSH mapping: the similarity of the instance feature vectors is computed directly when the segmented compact hash codes are identical, and when different segmented compact hash codes are mapped into the same bucket, the penalty factor \varepsilon lengthens the distance between an erroneous retrieval result and the input query image; a smaller dis indicates a higher similarity.
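The reordering of equation (15) can be sketched as follows (the penalty value eps = 2.0 is an assumption, since its exact value is not stated):

```python
import numpy as np

def cosine_distance(u, v):
    return 1.0 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def reorder(f_Aq, f_Bq, candidates, eps=2.0):
    """Re-rank bucket candidates by eq. (15); each candidate is a pair
    (f_A_k, f_B_k) of hash code and instance feature. eps > 1 penalises
    candidates whose pre-mapping hash code differs from the query's."""
    dis = []
    for f_A_k, f_B_k in candidates:
        y = 1.0 if np.array_equal(f_Aq, f_A_k) else 0.0   # same hash code?
        dis.append((y + (1.0 - y) * eps) * cosine_distance(f_Bq, f_B_k))
    return np.argsort(dis)                                # smaller dis = more similar
```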
The cross-modal retrieval method constructs a group of deep neural networks to map image and text data into a common semantic space by feature learning, so as to realize the semantic coupling of data of different modalities. A deep convolutional neural network extracts the semantic features of the image modality directly from the input image; the text is represented by word vectors, and a one-dimensional convolutional neural network extracts the semantic features of the text modality from the word vector representation. First, the segmented compact hash f_A of the vehicle is dynamically generated by the deep convolutional neural network; then a retrieval feature vector is generated from the text, so that the feature vectors generated from both can be searched with the same retrieval system. The specific implementation process is shown in FIG. 3.
The semantic feature of the text modality is a feature vector extracted from the text; as the first step of the extraction algorithm, the text must first be split. The feature vector of the text comes from the text's terms, and the method specifically comprises the following steps:
Input: a text O; Output: a set of roughly similar images;
STEP 1: initialization: (1) parse the text file into a term vector; (2) remove stop words and repeated words; (3) check the terms to ensure the correctness of the parsing;
STEP 2: take the minimal randomly combined term vector R = (r_1, r_2, \ldots, r_n) from O;
STEP 3: integrate R and f_A by sequential and segmented compact hash coding to obtain the text attribute feature \hat{f}_{ATxt}; at this point f_{ATxt} has a dimension smaller than that of R;
STEP 4: search using the locality-sensitive reordering hash algorithm;
STEP 5: return the similar image group I.
The text attribute feature function \hat{f}_{ATxt} is expressed by equation (16):

\hat{f}_{ATxt} = \mathrm{sign}(A^T R)   (16)

where A^T is the transpose of the vehicle segmented compact hash code, R is the minimal vector of randomly combined terms, \hat{f}_{ATxt} is the text attribute feature function, and sign denotes the sign function;

R = \mathrm{diag}(r_1, r_2, \ldots, r_n)   (17)

where diag denotes forming a diagonal matrix from the feature vector extracted from the text, and the vehicle segmented compact hash code A^T is initialized to the all-ones vector of size (1 \times c).
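A loose sketch of STEPs 1 to 3 and equation (16) (the vocabulary, the term-to-bit coupling matrix A, and the thresholding of sign to {0,1} are all assumptions for illustration):

```python
import numpy as np

def text_attribute_feature(terms, vocab, A):
    """Map parsed query terms to a binary code in the hash space, eq. (16).
    terms: entry words parsed from the text; vocab: term -> index;
    A: (len(vocab), c) coupling matrix, initialised to all ones."""
    r = np.zeros(len(vocab))
    for w in set(terms):                        # de-duplicated entries (STEP 1)
        if w in vocab:
            r[vocab[w]] = 1.0                   # minimal term vector R (STEP 2)
    return np.where(A.T @ r > 0, 1.0, 0.0)      # f_ATxt = sign(A^T R), as a binary code

vocab = {"red": 0, "suv": 1, "sedan": 2}
A = np.ones((len(vocab), 8))                    # all-ones initialisation, c = 8 bits
code = text_attribute_feature(["red", "suv"], vocab, A)
```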
The above description covers only preferred embodiments of the present invention and is not intended to limit the invention; any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.
Claims (9)
1. A fast hash vehicle retrieval method based on multi-task deep learning, characterized in that the vehicle retrieval method comprises the following steps:
first, constructing a multi-task deep convolutional neural network for deep learning and training of recognition;
second, adopting a feature fusion method of segmented compact hash codes and instance features;
third, adopting a locality-sensitive hash reordering algorithm;
fourth, adopting a cross-modal retrieval method to realize vehicle retrieval;
in the multi-task deep convolutional neural network, designing a multi-task objective function is crucial; the multi-task objective function is expressed by formula (5):

\min_{\{w^t\}_{t=1}^{T}} \sum_{t=1}^{T} \left[ \sum_{i=1}^{N} L\left( y_i^t, f(x_i^t; w^t) \right) + \Phi(w^t) \right]   (5)

where f(x_i^t; w^t) is a function of the input feature vector x_i^t and the weight parameter w^t, L(\cdot) is a loss function, \Phi(w^t) is the regularization value of the weight parameters, and T is the total number of tasks; the training data of the t-th task is recorded as \{(x_i^t, y_i^t)\}, where t \in (1, T), i \in (1, N), N is the total number of training samples, and x_i^t and y_i^t respectively represent the feature vector and the label of the i-th sample;
for the loss function, softmax is used with the log-likelihood cost function to train the features of the last layer and realize multi-task image classification; the softmax loss function is defined by formula (6):

L_{softmax} = -\frac{1}{m} \sum_{i=1}^{m} \log \frac{ e^{ W_{y_i}^{T} x_i + b_{y_i} } }{ \sum_{j=1}^{n} e^{ W_j^{T} x_i + b_j } }   (6)

where x_i is the i-th depth feature with label y_i, W_j is the j-th column of the weights in the last fully-connected layer, b is the bias term, and m and n are the number of processed samples and the number of classes, respectively.
2. The fast hash vehicle retrieval method based on multi-task deep learning according to claim 1, characterized in that: in the first step, Faster R-CNN is used as the basic network of the multi-task convolutional neural network; the front of the network is a 3 × 3 convolutional layer called conv1, followed by 4 stacked convolution modules named conv2_x to conv5_x, the modules containing {2, 3, 3, 3} units respectively, with conv1 to conv4_3 serving as the shared network; then comes the RPN, i.e. the region proposal network: the RPN takes an image of any scale as input and outputs a set of rectangular target proposal boxes, each box comprising 4 position coordinate variables and a score, the targets of the rectangular target proposal boxes being vehicle objects; to generate region proposal boxes, a small network slides over the convolutional feature map output by the last shared convolutional layer, this network being fully connected to an n × n spatial window of the input convolutional feature map; each sliding window is mapped to a low-dimensional vector, one sliding window of each feature map corresponding to one value, and this vector is output to two sibling fully-connected layers;
the estimated probability that each proposal box is a target/non-target comes from a classification layer realized by a two-class softmax layer; the k proposal boxes are parameterized relative to k corresponding reference boxes called anchors;
each anchor is centered at the center of the current sliding window and corresponds to one scale and one aspect ratio; using 3 scales and 3 aspect ratios yields k = 9 anchors at each sliding position;
to train the RPN, each anchor is assigned a binary label marking whether it is a target; positive labels are assigned to two types of anchors: (I) the anchor with the highest Intersection-over-Union (IoU) overlap with a real target bounding box, i.e. the ground truth (GT); (II) anchors whose IoU overlap with any GT bounding box is greater than 0.7; note that one GT bounding box may assign positive labels to multiple anchors; negative labels are assigned to anchors whose IoU with all GT bounding boxes is below 0.3; anchors that are neither positive nor negative have no effect on the training objective and are discarded;
following the multi-task loss in Faster R-CNN, the objective function is minimized; the loss function for an image is defined as

L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)   (1)

where i is the index of an anchor, p_i is the predicted probability that the i-th anchor is a target, and the ground-truth label p_i^* is 1 if the anchor is positive and 0 if it is negative; t_i is a vector representing the 4 parameterized coordinates of the predicted bounding box, and t_i^* is the coordinate vector of the GT bounding box corresponding to a positive anchor; \lambda is a balance weight, N_{cls} normalizes the cls term by the mini-batch size, and N_{reg} normalizes the reg term by the number of anchor positions; the classification loss L_{cls} is the log loss over the two classes, motor-vehicle target vs. road background:

L_{cls}(p_i, p_i^*) = -\log\left[ p_i^* p_i + (1 - p_i^*)(1 - p_i) \right]   (2)

the regression loss L_{reg} is defined by the following function:

L_{reg}(t_i, t_i^*) = R(t_i - t_i^*)   (3)

where R is a robust loss function, the smooth L_1 loss calculated by equation (4):

\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5 x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}   (4)

where x is a variable.
3. The fast hash vehicle retrieval method based on multi-task deep learning according to claim 1 or 2, characterized in that: in the second step, the feature fusion method of the segmented compact hash codes and the instance features comprises the following steps:
in the vehicle image feature extraction stage, a softmax activation function first limits the values to the threshold range [0, 1]; then a segmented threshold function promotes the output of binary hash codes, and a segmented learning and coding strategy reduces the redundancy among hash codes to improve feature robustness; finally, the hash codes obtained by segmented learning are fused by feature fusion, yielding the vehicle-feature segmented compact hash code;
for the instance features of a vehicle, the implementation is: the last unit of each shared stacked convolution module from conv2_x to conv5_x is combined with the output of the RPN, and a pyramid pooling layer and a vector flattening layer are added to accommodate convolutional feature map inputs of different sizes while flattening the convolved three-dimensional features into one-dimensional feature vectors, referred to as the instance features of the vehicle;
finally, the vehicle segmented compact hash code features and the instance features are fused again to obtain the feature vector used for retrieval.
4. The fast hash vehicle retrieval method based on multi-task deep learning according to claim 3, characterized in that: the vehicle-feature segmented compact hash code is realized by the following method; there are T tasks in total, and c^t classes exist under each task; m^t denotes the fully-connected output vector of each task, and the softmax activation function maps the fully-connected layer output into [0, 1], calculated by formula (7):

q_j^t = \frac{ e^{ \theta_j^T m^t } }{ \sum_{k=1}^{c^t} e^{ \theta_k^T m^t } }, \quad j = 1, \ldots, c^t   (7)

where \theta represents a random hyperplane, m^t represents the fully-connected output vector of each task, c^t represents the number of classes under each task, and q^t represents the fully-connected layer output;
for the binary output of the excitation segmented coding module, a threshold segmentation function is used for binarization:

H_j^t = \begin{cases} 1, & q_j^t \ge 0.5 \\ 0, & q_j^t < 0.5 \end{cases}   (8)

where q^t represents the fully-connected layer output and H^t represents the binary output of the excitation segmented coding module;
finally, the H^t are fused into the vehicle segmented compact hash code vector f_A:

f_A = [\alpha^1 H^1; \alpha^2 H^2; \ldots; \alpha^T H^T]   (9)

where f_A represents the vehicle segmented compact hash code vector, H^t represents the binary output of the excitation segmented coding module, t \in (1, T), and \alpha^t represents a weighting coefficient calculated by equation (10); multiplying each H vector by the coefficient \alpha^t compensates for the errors caused by the uneven class distribution among the different tasks.
5. The fast hash vehicle retrieval method based on multitask deep learning according to claim 4, wherein: in the third step, the feature vector for retrieval is obtained by fusing the compact features and the example features of the vehicle segment compact hash code, and the process is as follows:
the deepest layers of conv2_x through conv5_x are selected with output feature map sizes of {4^2, 8^2, 16^2, 16^2}, respectively; for a given input image I of size h × w, the activation of a convolution layer convx_x is a three-dimensional tensor T of size h′ × w′ × d containing a series of two-dimensional feature maps S = {S_n}, n ∈ (1, d), where S_n is the feature map of the n-th channel, of size h′ × w′; T is fed into the pyramid pooling layer to obtain a three-dimensional tensor T′ of size l × l × d, which still contains a series of feature maps S′ = {S′_n}, n ∈ (1, d), each S′_n of size l × l; each S′_n is traversed by a sliding window of size k × k that selects the maximum value, so that the size of S′_n becomes l/k × l/k; the S′_n of each channel is then fused into a one-dimensional vector, the same operation is performed on the d channels in turn, and the individual feature vector f_B of size (1, l/k × d) is finally obtained; the final retrieval feature vector f is calculated by the method shown in equation (11):
f = [f_A; f_B]    (11)
where f is the feature vector used for vehicle retrieval, f_B is the instance feature vector (the individual feature vector), and f_A denotes the vehicle segmented compact hash code vector.
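The pooling chain of this claim can be sketched as follows; the claim does not fully specify the per-channel "fusing into a one-dimensional vector", so the row-wise max used here is a labeled assumption, and the k × k window is read as stride-k (non-overlapping), which matches the stated l/k × l/k output size:

```python
import numpy as np

def instance_feature(t_prime, k=2):
    # t_prime: pooled tensor T' of shape (l, l, d); l must be divisible by k.
    l, _, d = t_prime.shape
    channels = []
    for n in range(d):
        s_n = t_prime[:, :, n]
        # Non-overlapping k x k max windows: (l, l) -> (l/k, l/k).
        pooled = s_n.reshape(l // k, k, l // k, k).max(axis=(1, 3))
        # Assumed per-channel fusion: reduce each channel to a (l/k,) vector.
        channels.append(pooled.max(axis=0))
    return np.concatenate(channels)   # f_B of length (l/k) * d
```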
6. The fast hash vehicle retrieval method based on multitask deep learning according to claim 5, wherein in the third step, similar samples are mapped into the same bucket with high probability; the hash function h(·) of the locality-sensitive hash satisfies the following condition:
P{h(f_Aq) = h(f_A)} = sim(f_Aq, f_A)    (12)
where sim(f_Aq, f_A) denotes the similarity between f_Aq and f_A, h(f_A) denotes the hash of f_A, and h(f_Aq) denotes the hash of f_Aq; the similarity measure is directly related to a distance function σ, calculated by equation (13);
a typical class of locality-sensitive hash functions is obtained by random projection and thresholding, calculated by equation (14):
h(f_A) = sign(W f_A + b)    (14)
where W is a random hyperplane vector and b is a random intercept.
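Equation (14) admits a direct sketch; the bit width and the use of a byte key for bucketing are illustrative choices:

```python
import numpy as np

def lsh_bits(f_a, W, b):
    # Random projection + threshold (equation (14)): one bit per hyperplane,
    # so similar f_A vectors fall into the same bucket with high probability.
    return (W @ f_a + b > 0).astype(np.uint8)

rng = np.random.default_rng(0)
dim, n_bits = 64, 16                      # illustrative sizes
W = rng.standard_normal((n_bits, dim))    # random hyperplane vectors
b = rng.standard_normal(n_bits)           # random intercepts
bucket_key = lsh_bits(rng.standard_normal(dim), W, b).tobytes()
```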
7. The multitask deep learning based fast hash vehicle retrieval method according to claim 6, wherein in the third step, after the query image is mapped into a similarity bucket by its segmented compact hash code, the images returned from the bucket are reordered using their instance features; the reordering is calculated as shown in equation (15):
where k denotes the k-th image in the bucket; the penalty factor lengthens the distance between an erroneous retrieval result and the input query image; cos denotes the cosine distance; y indicates whether f_Aq and f_A^(k) are equal before mapping, with y = 1 if equal and y = 0 otherwise; f_A^(k) denotes the segmented compact hash code vector of the k-th image and f_Aq denotes that of the query; a smaller dis indicates higher similarity.
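Since the exact form of equation (15) is not recoverable from the text, the sketch below is only a hedged reading: cosine distance on instance features, with an additive penalty applied when the candidate's segmented hash differs from the query's; the penalty placement and value are assumptions:

```python
import numpy as np

def rerank(f_bq, f_aq, bucket, penalty=2.0):
    # bucket: list of (f_b_k, f_a_k) pairs for the images returned from the bucket.
    dists = []
    for f_b_k, f_a_k in bucket:
        y = 1.0 if np.array_equal(f_aq, f_a_k) else 0.0
        cos_dist = 1.0 - (f_bq @ f_b_k) / (np.linalg.norm(f_bq)
                                           * np.linalg.norm(f_b_k))
        dists.append(cos_dist + (1.0 - y) * penalty)  # smaller dis = more similar
    return np.argsort(dists)   # indices of bucket images, best first
```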
8. The fast hash vehicle retrieval method based on multitask deep learning according to claim 1 or 2, characterized in that in the fourth step, the cross-modal retrieval method constructs a group of deep neural networks that map image and text data into a common semantic space by feature learning, so as to achieve semantic coupling of the different modalities; a deep convolutional neural network extracts the semantic features of the image modality directly from the input image, the text is represented by word vectors, and a one-dimensional convolutional neural network extracts the semantic features of the text modality from the word-vector representation; first, the segmented compact hash code f_A of the vehicle is dynamically generated by the deep convolutional neural network; then, a retrieval feature vector is generated from the text, so that the feature vectors generated from the two modalities can be retrieved with the same retrieval system.
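A minimal PyTorch sketch of the text branch described here (a 1-D convolution over word vectors); all layer sizes are illustrative assumptions, and the sigmoid stands in for the [0,1]-limiting activation so the output can be thresholded like the image-side hash:

```python
import torch
import torch.nn as nn

class TextSemanticBranch(nn.Module):
    # Maps a sequence of word vectors to a hash-sized feature in [0, 1],
    # so text and image features live in the same retrieval space.
    def __init__(self, emb_dim=300, n_bits=48):
        super().__init__()
        self.conv = nn.Conv1d(emb_dim, 256, kernel_size=3, padding=1)
        self.fc = nn.Linear(256, n_bits)

    def forward(self, word_vecs):                  # (batch, seq_len, emb_dim)
        x = self.conv(word_vecs.transpose(1, 2))   # (batch, 256, seq_len)
        x = torch.relu(x).max(dim=2).values        # global max over the sequence
        return torch.sigmoid(self.fc(x))           # in [0, 1], thresholdable
```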
9. The multitask deep learning based fast hash vehicle retrieval method according to claim 8, wherein the semantic features of the text modality are feature vectors extracted from the text; as the first step of the extraction algorithm, the text must first be split; the feature vector of the text comes from the terms of the text, and the method comprises the following steps (a sketch follows the claim):
Input: a text O; Output: a set of roughly similar images;
STEP 1: initialization: (1) parse the text file into a term vector; (2) remove short words and repeated words; (3) check the terms to ensure the correctness of the parsing;
STEP 2: take the minimal vector R of randomly combined terms from O: R = (r_1, r_2, ..., r_n);
STEP 3: integrate R and f_A by sequential and segmented compact hash coding to obtain the text attribute feature f_ATxt; at this point the dimension of f_ATxt is smaller than that of R;
STEP 4: retrieve using the locality-sensitive reordering hash algorithm;
STEP 5: return the similar image group I;
where the text attribute feature function f_ATxt is expressed by equation (16):
where A^T denotes the transposed matrix of the vehicle segmented compact hash code, R denotes the minimal vector of randomly combined terms, f_ATxt is the text attribute feature function, and sign denotes the sign function;
where diag denotes taking a diagonal matrix, the inner expression denotes the feature vector extracted from the text, and the vehicle segmented compact hash code A^T is initialized to the all-ones vector of size (1 × c).
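STEP 1 through STEP 3 can be sketched as below; the tokenization, the scalar term weights, and the sign(A · R) reading of equation (16) are all labeled assumptions, since the exact diag(·) construction is not recoverable from the text (per the claim, A starts as an all-ones vector):

```python
import numpy as np

def text_attribute_feature(text, A, term_weight):
    # STEP 1: parse the text into terms, dropping short and repeated words.
    terms = [w for w in text.lower().split() if len(w) > 2]
    terms = list(dict.fromkeys(terms))            # de-duplicate, keep order
    # STEP 2: minimal term vector R (scalar weights are an assumed stand-in
    # for the claim's term representation).
    r = np.array([term_weight.get(w, 0.0) for w in terms], dtype=float)
    n = A.shape[1]
    r = r[:n] if r.size >= n else np.pad(r, (0, n - r.size))
    # STEP 3: hedged reading of equation (16): project through the hash
    # matrix and binarize with the sign function.
    return np.sign(A @ r)
```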
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
CN201710857318.5A | 2017-09-21 | 2017-09-21 | Rapid Hash vehicle retrieval method based on multitask deep learning
Publications (2)
Publication Number | Publication Date
CN107885764A | 2018-04-06
CN107885764B | 2020-12-18
Family
ID=61780800
Family Applications (1)
Application Number | Title | Priority Date | Filing Date
CN201710857318.5A (Active) | Rapid Hash vehicle retrieval method based on multitask deep learning | 2017-09-21 | 2017-09-21
Country Status (1)
Country | Link
CN | CN107885764B (en)
Legal Events
Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant