CN107885764B: Rapid hash vehicle retrieval method based on multi-task deep learning
Classifications
G06F16/36: Information retrieval of unstructured textual data; creation of semantic tools, e.g. ontology or thesauri
G06F16/5838: Retrieval of still image data characterised by metadata automatically derived from the content, using colour
G06K9/6277: Classification techniques based on a parametric (probabilistic) model
G06N3/0454: Neural-network architectures using a combination of multiple neural nets
Abstract
A fast hash vehicle retrieval method based on multi-task deep learning comprises a multi-task deep convolutional neural network used for deep learning and training of recognition, a feature fusion method of segmented compact hash codes and instance features used to improve retrieval precision and the practicability of the retrieval method, a locality-sensitive hash reordering algorithm used to improve retrieval performance, and a cross-modal retrieval method used to improve the robustness and accuracy of the retrieval engine. First, a multi-task deep convolutional network method for segmented learning of hash codes is provided: image semantics and image representation are combined, the relations between related tasks are exploited to improve retrieval precision and refine image features, and minimized image coding makes the learned vehicle features more robust. Second, a feature pyramid network is selected to extract instance features of the vehicle image. Third, the extracted features are retrieved with a locality-sensitive hash reordering method. Finally, a cross-modal auxiliary vehicle retrieval method is adopted for the special case in which a query image of the target vehicle cannot be obtained.
Description
Technical Field
The invention relates to the application of artificial intelligence, digital image processing, convolutional neural networks and computer vision in the field of public safety, and belongs to the field of intelligent transportation.
Background
Today, smart cities and intelligent transportation are developing rapidly, and the demand in public safety systems for large-scale image monitoring, vehicle identification in video databases and vehicle retrieval is growing just as fast.
In the prior art, vehicle retrieval methods mainly extract the license plate information of a target vehicle and then retrieve the motor vehicle according to that information. This is typically done by identifying the license plate number of the vehicle from a monitoring image and then identifying the motor vehicle bearing that license plate number in other monitoring images. Although searching by license plate number alone is easy to implement, it cannot effectively retrieve a motor vehicle whose license plate information is unavailable, such as a fake-plate vehicle.
Vehicle retrieval technology based on appearance features not only makes up for the limitations and defects of traditional license plate recognition, but also has very important practical significance and broad application prospects in intelligent vehicle retrieval, especially in violation inspection, hit-and-run pursuit, locking onto criminal suspects' vehicles, fake-plate vehicle identification, and accelerating the efficiency of criminal investigation.
Existing vehicle retrieval methods basically use algorithms such as SIFT, SURF and DoG to extract whole-image features of the target vehicle image as the target features, extract whole-image features of each vehicle image in the database with the same algorithm as the features to be matched, calculate the Euclidean distance between the target features and each feature to be matched, and take the vehicle corresponding to the feature with the smallest Euclidean distance as the target vehicle.
Vehicle retrieval requires finding a specific target vehicle among a series of similarly contoured vehicles, which makes the task all the more challenging; furthermore, the influence of practical conditions, such as the monitoring environment, weather and lighting, must be taken into account.
In recent years, deep learning has developed rapidly in the field of computer vision. Deep learning can use a large number of training samples and hidden layers to learn the abstract information of an image layer by layer, acquiring image features more comprehensively and directly. A digital image is described by a matrix, and a convolutional neural network starts from local information blocks to describe the overall structure of the image, so convolutional neural networks are mostly adopted to solve problems at the intersection of computer vision and deep learning. Around improving detection precision and detection time, deep convolutional neural network technology has evolved from R-CNN and Fast R-CNN to Faster R-CNN, with further gains in precision, speed, end-to-end operation and practicability, covering almost all fields from classification to detection, segmentation and localization. Applying deep learning technology to vehicle retrieval is a research field with practical application value.
Reordering is a technique commonly used in image retrieval to improve retrieval performance; for example, initial retrieval results may be reordered through the visual-feature matching relationships between image pairs. However, the reordering effect depends strongly on whether the visual features used are effective enough to represent the image.
In similar-vehicle search, since many vehicles look alike, the extracted visual features are also similar and cannot distinguish different vehicle types, so similar vehicles cannot be retrieved well by a reordering method that directly uses the matching relationships between image pairs.
Query expansion is a common method used in search technology to improve recall and accuracy. Query expansion adds new keywords to the original query and searches again: for example, a search engine runs the user's query once, selects suitable keywords from the retrieved documents, and adds those keywords to the query for a second search, thereby finding more related documents. Query expansion can therefore effectively improve the recall rate of information retrieval, but the prior art provides no query expansion method specific to the vehicle as an object in an image.
Compared with the traditional vehicle retrieval method based on the license plate number, the method of the Chinese patent application with application No. 201510744990.4 not only avoids dependence on license plate recognition accuracy but can also retrieve fake-plate and unlicensed vehicles. However, this technology is still a computer vision technology of the pre-deep-learning era.
The Chinese patent application with application No. 201610671729.0 discloses a big-data-based vehicle retrieval method and device. The method includes: extracting brand features of the target vehicle in the target vehicle image; determining the probability that each pixel in the target vehicle image corresponds to each marker, where the markers include one or more of annual inspection marks, ornaments and hanging decorations; determining the position of each marker in the target vehicle image according to those probabilities and a probability threshold for each marker; extracting image features of each marker according to its position in the target vehicle image; and searching for the target vehicle among the vehicle images to be searched according to the image features of each marker and the brand features of the target vehicle. Although this technology adopts deep learning, it is single-task deep learning, whereas vehicle retrieval is a typical multi-task deep learning problem.
Chinese patent application No. 201410381577.1 discloses a query expansion method and device for similar-vehicle retrieval. The method comprises: determining the vehicle model information of an image to be queried that contains a vehicle; selecting several sample images that meet preset conditions from the vehicle model template library corresponding to that vehicle model information; and forming a query expansion image set from the sample images, so that the sample images in the query expansion image set replace the image to be queried when querying the target database, where the vehicle model template library contains sample images for each vehicle model. The method can improve the recall rate and accuracy of vehicle image retrieval, but it is an image retrieval technology of the early deep learning era.
Chinese patent application No. 201410652730.X discloses an image-based motor vehicle retrieval method and apparatus. The method comprises: acquiring a first image containing the motor vehicle to be retrieved; determining a first appearance contour of the motor vehicle to be retrieved from the first image; dividing the image within the first appearance contour into several regions and extracting the image features of each region with different step lengths; combining the image features of all regions to obtain the overall image features of the motor vehicle to be retrieved; and comparing the overall image features of the motor vehicle to be retrieved with pre-extracted overall image features of the target motor vehicle to obtain a comparison result. This method, too, is an image retrieval technology of the early deep learning era.
Disclosure of Invention
Aiming at the problems of efficiently using the massive video data generated in the field of public safety and improving vehicle retrieval efficiency in the big data era, the invention provides a fast hash retrieval method based on multi-task deep learning, which effectively exploits the relevance among detection and recognition tasks and the diversity of basic checkpoint vehicle information to achieve real-time retrieval, yielding a multi-task deep learning fast hash vehicle retrieval method with high retrieval precision and good robustness.
The technical scheme adopted by the invention for solving the technical problems is as follows:
A fast hash vehicle retrieval method based on multi-task deep learning comprises the following steps:
first, constructing a multi-task deep convolutional neural network for deep learning and training of recognition;
second, adopting a feature fusion method of segmented compact hash codes and instance features;
third, adopting a locality-sensitive hash reordering algorithm;
and fourth, adopting a cross-modal retrieval method to realize vehicle retrieval.
Further, in the first step, Faster R-CNN is adopted as the basic network of the multi-task convolutional neural network. The front of the network is a 3 × 3 convolutional layer called conv1, followed by 4 stacked convolution modules named conv2_x to conv5_x, the modules containing {2, 3, 3, 3} units respectively; conv1 to conv4_3 serve as the shared network. Then comes the RPN, i.e. the region proposal network: the RPN takes an image of any scale as input and outputs a set of rectangular target proposal boxes, each box comprising 4 position coordinate variables and a score; the targets of the rectangular target proposal boxes are vehicle objects. To generate region proposal boxes, a small network slides over the convolutional feature map output by the last shared convolutional layer; this network is fully connected to an n × n spatial window of the input convolutional feature map. Each sliding window is mapped to a low-dimensional vector, one sliding window of each feature map corresponding to one value; this vector is output to two sibling fully-connected layers.
The estimated probability that each proposal box is a target/non-target comes from a classification layer realized by a two-class softmax layer; the k proposal boxes are parameterized relative to k corresponding reference boxes called anchors.
Each anchor is centered at the center of the current sliding window and corresponds to one scale and one aspect ratio; using 3 scales and 3 aspect ratios yields k = 9 anchors at each sliding position.
To train the RPN, each anchor is assigned a binary label marking whether it is a target. Positive labels are assigned to two types of anchors: (I) the anchor with the highest Intersection-over-Union (IoU) overlap with a real target bounding box, i.e. the ground truth (GT); (II) anchors whose IoU overlap with any GT bounding box is greater than 0.7. Note that one GT bounding box may assign positive labels to multiple anchors. Negative labels are assigned to anchors whose IoU with all GT bounding boxes is below 0.3; anchors that are neither positive nor negative have no effect on the training objective and are discarded.
Following the multi-task loss in Faster R-CNN, the objective function is minimized; the loss function for an image is defined as

L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)   (1)

where i is the index of an anchor, p_i is the predicted probability that the i-th anchor is a target, and the ground-truth label p_i^* is 1 if the anchor is positive and 0 if it is negative; t_i is a vector representing the 4 parameterized coordinates of the predicted bounding box, and t_i^* is the coordinate vector of the GT bounding box corresponding to a positive anchor; \lambda is a balance weight, N_{cls} normalizes the cls term by the mini-batch size, and N_{reg} normalizes the reg term by the number of anchor positions. The classification loss L_{cls} is the log loss over the two classes, motor-vehicle target vs. road background:

L_{cls}(p_i, p_i^*) = -\log\left[ p_i^* p_i + (1 - p_i^*)(1 - p_i) \right]   (2)

The regression loss L_{reg} is defined by the following function:

L_{reg}(t_i, t_i^*) = R(t_i - t_i^*)   (3)

where R is a robust loss function, the smooth L_1 loss calculated by equation (4):

\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5 x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}   (4)

where x is a variable.
Further, in the multi-task deep convolutional neural network, designing a multi-task objective function is crucial; the multi-task objective function is expressed by formula (5):

\min_{\{w^t\}_{t=1}^{T}} \sum_{t=1}^{T} \left[ \sum_{i=1}^{N} L\left( y_i^t, f(x_i^t; w^t) \right) + \Phi(w^t) \right]   (5)

where f(x_i^t; w^t) is a function of the input feature vector x_i^t and the weight parameter w^t, L(\cdot) is a loss function, \Phi(w^t) is the regularization value of the weight parameters, and T is the total number of tasks; the training data of the t-th task is recorded as \{(x_i^t, y_i^t)\}, where t \in (1, T), i \in (1, N), N is the total number of training samples, and x_i^t and y_i^t respectively represent the feature vector and the label of the i-th sample.

For the loss function, softmax is used with the log-likelihood cost function to train the features of the last layer and realize multi-task image classification; the softmax loss function is defined by formula (6):

L_{softmax} = -\frac{1}{m} \sum_{i=1}^{m} \log \frac{ e^{ W_{y_i}^{T} x_i + b_{y_i} } }{ \sum_{j=1}^{n} e^{ W_j^{T} x_i + b_j } }   (6)

where x_i is the i-th depth feature with label y_i, W_j is the j-th column of the weights in the last fully-connected layer, b is the bias term, and m and n are the number of processed samples and the number of classes, respectively.
Further, in the second step, the feature fusion process for the segmented compact hash codes and the instance features is as follows:
in the vehicle image feature extraction stage, a softmax activation function first limits the values to the threshold range [0, 1]; then a segmented threshold function promotes the output of binary hash codes, and a segmented learning and coding strategy reduces the redundancy among hash codes to improve feature robustness; finally, the hash codes obtained by segmented learning are fused by feature fusion, yielding the vehicle-feature segmented compact hash code.
As for the instance features of a vehicle, the implementation is: the last unit of each shared stacked convolution module from conv2_x to conv5_x is combined with the output of the RPN, and a pyramid pooling layer and a vector flattening layer are added to accommodate convolutional feature map inputs of different sizes while flattening the convolved three-dimensional features into one-dimensional feature vectors, referred to as the instance features of the vehicle.
Finally, the vehicle segmented compact hash code features and the instance features are fused again to obtain the feature vector used for retrieval.
The vehicle-feature segmented compact hash code is realized by the following method. There are T tasks in total, and c^t classes exist under each task; m^t denotes the fully-connected output vector of each task, and the softmax activation function maps the fully-connected layer output into [0, 1], calculated by formula (7):

q_j^t = \frac{ e^{ \theta_j^T m^t } }{ \sum_{k=1}^{c^t} e^{ \theta_k^T m^t } }, \quad j = 1, \ldots, c^t   (7)

where \theta represents a random hyperplane, m^t represents the fully-connected output vector of each task, c^t represents the number of classes under each task, and q^t represents the fully-connected layer output.

For the binary output of the excitation segmented coding module, a threshold segmentation function is used for binarization:

H_j^t = \begin{cases} 1, & q_j^t \ge 0.5 \\ 0, & q_j^t < 0.5 \end{cases}   (8)

where q^t represents the fully-connected layer output and H^t represents the binary output of the excitation segmented coding module.

Finally, the H^t are fused into the vehicle segmented compact hash code vector f_A:

f_A = [\alpha^1 H^1; \alpha^2 H^2; \ldots; \alpha^T H^T]   (9)

where f_A represents the vehicle segmented compact hash code vector, H^t represents the binary output of the excitation segmented coding module, t \in (1, T), and \alpha^t represents a weighting coefficient calculated by equation (10); multiplying each H vector by the coefficient \alpha^t compensates for the errors caused by the uneven class distribution among the different tasks.
In the third step, the feature vector used for retrieval is obtained by fusing the vehicle segmented compact hash code features and the instance features, as follows:
output feature-map sizes of \{4^2, 8^2, 16^2, 16^2\} are selected for the deepest units of conv2_x through conv5_x, respectively. For a given input image I of size h \times w, the activation of a convolution module convx_x is a three-dimensional tensor T of size h' \times w' \times d, containing a series of two-dimensional feature maps S = \{S_n\}, n \in (1, d), where S_n has size h' \times w' and corresponds to the feature map of the n-th channel. T is fed into the pyramid pooling layer to obtain a three-dimensional tensor T' of size l \times l \times d, still containing a series of feature maps S' = \{S'_n\}, n \in (1, d), each S'_n of size l \times l. Each S'_n is traversed by a sliding window of size k \times k that selects the maximum value, so that S'_n becomes l/k \times l/k; the S'_n of each channel is then fused into a one-dimensional vector, the same operation is performed on the d channels in turn, and finally the individual feature vector f_B of size (1, l/k \times d) is obtained. The final retrieval feature vector f is calculated as shown in formula (11):

f = [f_A; f_B]   (11)

where f is the feature vector for vehicle retrieval, f_B is the instance feature vector, i.e. the individuality feature vector, and f_A represents the vehicle segmented compact hash code vector.
In the third step, similar samples are mapped into the same bucket with high probability; the hash function h(\cdot) of the locality-sensitive hash satisfies the following condition:

P\{ h(f_{Aq}) = h(f_A) \} = \mathrm{sim}(f_{Aq}, f_A)   (12)

where \mathrm{sim}(f_{Aq}, f_A) denotes the similarity of f_{Aq} and f_A, and h(f_A) and h(f_{Aq}) denote the hashes of f_A and f_{Aq}; the similarity measure is directly related to a distance function \sigma, calculated by equation (13):

\mathrm{sim}(f_{Aq}, f_A) = 1 - \sigma(f_{Aq}, f_A)   (13)

A typical class of locality-sensitive hash functions is given by random projection and thresholding, computed by equation (14):

h(f_A) = \mathrm{sign}(W f_A + b)   (14)

where W is a random hyperplane vector and b is a random intercept.
In the third step, after the query image has been mapped into a similarity bucket through its segmented compact hash code, the images returned from the bucket are reordered using their instance features in combination with formula (15); the reordering calculation is shown in equation (15):

dis_k = y \cdot \cos(f_{Bq}, f_B^k) + (1 - y) \cdot \varepsilon \cdot \cos(f_{Bq}, f_B^k)   (15)

where k denotes the k-th image in the bucket, \varepsilon denotes a penalty factor with \varepsilon > 1, \cos denotes the cosine distance formula, and y indicates whether the pre-mapping codes f_{Aq} and f_A^k are equal: y is 1 if they are equal and 0 otherwise; f_A^k represents the k-th image's vehicle segmented compact hash code vector and f_{Aq} represents the query's vehicle segmented compact hash code vector. When different segmented compact hash codes are mapped into the same bucket, the penalty factor \varepsilon lengthens the distance between an erroneous retrieval result and the input query image; a smaller dis indicates a higher similarity.
In the fourth step, the cross-modal retrieval method constructs a group of deep neural networks to map image and text data into a common semantic space by feature learning, so as to realize the semantic coupling of data of different modalities. A deep convolutional neural network extracts the semantic features of the image modality directly from the input image; the text is represented by word vectors, and a one-dimensional convolutional neural network extracts the semantic features of the text modality from the word vector representation. First, the segmented compact hash f_A of the vehicle is dynamically generated by the deep convolutional neural network; then a retrieval feature vector is generated from the text, so that the feature vectors generated from both can be searched with the same retrieval system.
The semantic feature of the text modality is a feature vector extracted from the text; as the first step of the extraction algorithm, the text must first be split. The feature vector of the text comes from the text's terms, and the method specifically comprises the following steps:
Input: a text O; Output: a set of roughly similar images;
STEP 1: initialization: (1) parse the text file into a term vector; (2) remove stop words and repeated words; (3) check the terms to ensure the correctness of the parsing;
STEP 2: take the minimal randomly combined term vector R = (r_1, r_2, \ldots, r_n) from O;
STEP 3: integrate R and f_A by sequential and segmented compact hash coding to obtain the text attribute feature \hat{f}_{ATxt}; at this point f_{ATxt} has a dimension smaller than that of R;
STEP 4: search using the locality-sensitive reordering hash algorithm;
STEP 5: return the similar image group I.
The text attribute feature function \hat{f}_{ATxt} is expressed by equation (16):

\hat{f}_{ATxt} = \mathrm{sign}(A^T R)   (16)

where A^T is the transpose of the vehicle segmented compact hash code, R is the minimal vector of randomly combined terms, \hat{f}_{ATxt} is the text attribute feature function, and sign denotes the sign function;

R = \mathrm{diag}(r_1, r_2, \ldots, r_n)   (17)

where diag denotes forming a diagonal matrix from the feature vector extracted from the text, and the vehicle segmented compact hash code A^T is initialized to the all-ones vector of size (1 \times c).
The technical conception of the invention is as follows: first, a multi-task deep convolutional network method for segmented learning of hash codes is provided, combining image semantics and image representation, exploiting the relations between related tasks to improve retrieval precision and refine image features, while minimized image coding makes the learned vehicle features more robust; second, a feature pyramid network is selected to extract instance features of the vehicle image; third, the extracted features are retrieved using a locality-sensitive hash reordering method; finally, a cross-modal auxiliary vehicle retrieval method is adopted for the special case in which a query image of the target vehicle cannot be obtained.
The retrieval feature vector generated from text has the same form as the segmented compact hash code vector generated by the convolutional network, so the feature vectors generated from both sources can be retrieved by the same retrieval system without additional training.
The deep convolutional neural network model constructed by the method is an end-to-end learning system, as shown in FIG. 1; the model integrates text feature representation, image feature learning, text feature learning, cross-modal retrieval and reordering into the same learning framework.
The invention has the following beneficial effects:
1) A multi-task deep learning vehicle appearance recognition framework is designed. Weight sharing in the correlated parallel processing among tasks improves the generalization ability of the system, weakens the influence of overfitting on the neural network, and alleviates the weak classifier generalization caused by insufficient samples; different network structures were tried, and finally mutually correlated tasks are fused to maximize the sharing of network parameters.
2) A segmented approach is employed, combined with the multi-task network architecture, to learn hash codes and reduce the redundancy between binary hash codes. Each task is responsible for learning a part of the hash code without connection to the others, and the vector fusion method described herein yields an accurate image feature representation of each vehicle, called the vehicle's segmented compact feature. A feature pyramid network that captures the image's instance features is constructed from a multi-layer combination of the shared stacked convolutional layers, a pyramid pooling layer and a vector flattening (Vector flat) layer, and finally the image representations carrying the two kinds of feature-dimension information are re-fused into the final retrieval feature vector.
3) A locality-sensitive hash reordering retrieval method is provided to quickly match the acquired retrieval features, so as to meet the practical application requirements of intelligent transportation. The retrieval method first maps the images in the query library to buckets using the segmented compact hash code, then re-sorts the images within a bucket using the instance feature vectors, screening out the top-K most similar images by relying on the different feature dimensions of vehicles; the mapping of coding vectors avoids one-to-one image comparison, achieving fast real-time retrieval.
4) For special situations in which image information of a vehicle cannot be acquired, such as a blurred camera view at night, excessive illumination in the daytime, or a failed camera, the invention provides a cross-modal auxiliary retrieval mode to meet the actual requirements of different environments: vehicle characteristics are summarized by manual judgment and converted into text data that is fed into the retrieval network to realize auxiliary retrieval.
Drawings
FIG. 1 is the overall network framework for fast hash retrieval with the multi-task deep convolutional neural network;
FIG. 2 is a schematic representation of a reordering sequence;
FIG. 3 is an illustration of a text feature vector generation process;
FIG. 4 is a diagram of the RPN network architecture;
FIG. 5 is a diagram of the multi-task Faster R-CNN deep convolutional network.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to FIGS. 1 to 5, a fast hash vehicle retrieval method based on multi-task deep learning includes:
first, constructing a multi-task deep convolutional neural network for deep learning and training of recognition;
second, adopting a feature fusion method of segmented compact hash codes and instance features;
third, adopting a locality-sensitive hash reordering algorithm;
and fourth, adopting a cross-modal retrieval method to realize vehicle retrieval.
In the first step, the multi-task deep convolutional neural network for deep learning and training of recognition is shown in FIG. 1. Faster R-CNN is adopted as the basic network of the multi-task convolutional neural network. The front of the network is a 3 × 3 convolutional layer called conv1, followed by 4 stacked convolution modules named conv2_x to conv5_x, the modules containing {2, 3, 3, 3} units respectively; conv1 to conv4_3 serve as the shared network. Then comes the RPN, i.e. the region proposal network, shown in FIG. 4: the RPN takes an image of any scale as input and outputs a set of rectangular target proposal boxes, each box comprising 4 position coordinate variables and a score; the targets of the rectangular target proposal boxes are vehicle objects. To generate region proposal boxes, a small network slides over the convolutional feature map output by the last shared convolutional layer; this network is fully connected to an n × n spatial window of the input convolutional feature map. Each sliding window is mapped to a low-dimensional vector, one sliding window of each feature map corresponding to one value; this vector is output to two sibling fully-connected layers.
The estimated probability that each proposal box is a target/non-target comes from a classification layer realized by a two-class softmax layer; the k proposal boxes are parameterized relative to k corresponding reference boxes called anchors.
Each anchor is centered at the center of the current sliding window and corresponds to one scale and one aspect ratio; using 3 scales and 3 aspect ratios yields k = 9 anchors at each sliding position.
To train the RPN, each anchor is assigned a binary label marking whether it is a target. Positive labels are assigned to two types of anchors: (I) the anchor with the highest Intersection-over-Union (IoU) overlap with a real target bounding box, i.e. the ground truth (GT); (II) anchors whose IoU overlap with any GT bounding box is greater than 0.7. Note that one GT bounding box may assign positive labels to multiple anchors. Negative labels are assigned to anchors whose IoU with all GT bounding boxes is below 0.3; anchors that are neither positive nor negative have no effect on the training objective and are discarded.
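The anchor labeling rule above can be illustrated with a short sketch (an illustrative reconstruction in Python, not code from the patent; the box representation and helper names are assumptions):

```python
import numpy as np

def iou(a, g):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], g[0]), max(a[1], g[1])
    ix2, iy2 = min(a[2], g[2]), min(a[3], g[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_g = (g[2] - g[0]) * (g[3] - g[1])
    return inter / (area_a + area_g - inter)

def label_anchors(anchors, gt_boxes, pos_iou=0.7, neg_iou=0.3):
    """Return +1 (vehicle), 0 (background) or -1 (discarded) per anchor."""
    overlaps = np.array([[iou(a, g) for g in gt_boxes] for a in anchors])
    labels = -np.ones(len(anchors), dtype=int)       # neither pos nor neg: discarded
    labels[overlaps.max(axis=1) < neg_iou] = 0       # IoU < 0.3 with all GT boxes
    labels[overlaps.max(axis=1) > pos_iou] = 1       # rule (II): IoU > 0.7 with some GT
    labels[overlaps.argmax(axis=0)] = 1              # rule (I): best anchor per GT box
    return labels
```

Applying rule (I) last ensures that the highest-IoU anchor of each GT box is kept positive even if its overlap falls below 0.7.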
With these definitions, the objective function is minimized following the multi-task loss in Faster R-CNN; the loss function for an image is defined as

L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)   (1)

where i is the index of an anchor, p_i is the predicted probability that the i-th anchor is a target, and the ground-truth label p_i^* is 1 if the anchor is positive and 0 if it is negative; t_i is a vector representing the 4 parameterized coordinates of the predicted bounding box, and t_i^* is the coordinate vector of the GT bounding box corresponding to a positive anchor; \lambda is a balance weight, here \lambda = 10; N_{cls} normalizes the cls term by the mini-batch size, here N_{cls} = 256; N_{reg} normalizes the reg term by the number of anchor positions, here N_{reg} = 2400. The classification loss L_{cls} is the log loss over the two classes, motor-vehicle target vs. road background:

L_{cls}(p_i, p_i^*) = -\log\left[ p_i^* p_i + (1 - p_i^*)(1 - p_i) \right]   (2)

The regression loss L_{reg} is defined by the following function:

L_{reg}(t_i, t_i^*) = R(t_i - t_i^*)   (3)

where R is a robust loss function, the smooth L_1 loss calculated by equation (4):

\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5 x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}   (4)

where x is a variable.
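A minimal numerical sketch of equations (1) to (4) follows (per-anchor array shapes and function names are illustrative assumptions, not from the patent):

```python
import numpy as np

def smooth_l1(x):
    """Robust smooth-L1 loss of equation (4), applied elementwise."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * ax ** 2, ax - 0.5)

def rpn_loss(p, p_star, t, t_star, lam=10.0, n_cls=256, n_reg=2400):
    """Multi-task loss of equation (1) for one image.
    p: (N,) predicted target probabilities, p_star: (N,) binary GT labels,
    t, t_star: (N, 4) predicted / GT parameterized box coordinates."""
    p = np.clip(p, 1e-7, 1 - 1e-7)                                            # avoid log(0)
    cls = -np.sum(p_star * np.log(p) + (1 - p_star) * np.log(1 - p)) / n_cls  # eq. (2)
    reg = np.sum(p_star[:, None] * smooth_l1(t - t_star)) / n_reg             # eq. (3), positives only
    return cls + lam * reg                                                    # eq. (1)
```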
The multi-task deep convolutional neural network is shown in FIG. 5. In order to integrate multiple tasks for learning and training, designing a multi-task objective function is crucial; the multi-task objective function is expressed by formula (5):

\min_{\{w^t\}_{t=1}^{T}} \sum_{t=1}^{T} \left[ \sum_{i=1}^{N} L\left( y_i^t, f(x_i^t; w^t) \right) + \Phi(w^t) \right]   (5)

where f(x_i^t; w^t) is a function of the input feature vector x_i^t and the weight parameter w^t, L(\cdot) is a loss function, \Phi(w^t) is the regularization value of the weight parameters, and T is the total number of tasks; the training data of the t-th task is recorded as \{(x_i^t, y_i^t)\}, where t \in (1, T), i \in (1, N), N is the total number of training samples, and x_i^t and y_i^t respectively represent the feature vector and the label of the i-th sample.

For the loss function, softmax is used with the log-likelihood cost function to train the features of the last layer and realize multi-task image classification; the softmax loss function is defined by formula (6):

L_{softmax} = -\frac{1}{m} \sum_{i=1}^{m} \log \frac{ e^{ W_{y_i}^{T} x_i + b_{y_i} } }{ \sum_{j=1}^{n} e^{ W_j^{T} x_i + b_j } }   (6)

where x_i is the i-th depth feature with label y_i, W_j is the j-th column of the weights in the last fully-connected layer, b is the bias term, and m and n are the number of processed samples and the number of classes, respectively.
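Equation (6) corresponds to the standard softmax log-likelihood cost; a sketch (the array shapes are assumptions):

```python
import numpy as np

def softmax_loss(x, y, W, b):
    """Softmax log-likelihood cost of equation (6).
    x: (m, d) depth features, y: (m,) integer class labels,
    W: (d, n) last fully-connected weights, b: (n,) bias."""
    logits = x @ W + b                                  # W_j^T x_i + b_j
    logits -= logits.max(axis=1, keepdims=True)         # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(y)), y].mean()       # averaged over the m samples
```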
The feature fusion method of the segmented compact hash codes and the instance features is shown in FIG. 1. On the one hand, in the vehicle image feature extraction stage, a softmax activation function first limits the values to the threshold range [0, 1]; then a segmented threshold function promotes the output of binary hash codes, and a segmented learning and coding strategy reduces the redundancy among hash codes to improve feature robustness; finally, the hash codes obtained by segmented learning are fused by feature fusion, yielding the vehicle-feature segmented compact hash code.
On the other hand, for the instance features of the vehicle: inspired by image pyramid techniques, the vehicle instance features extracted from the convolutional layers are further fused with the compact features extracted by the multi-task deep learning vehicle retrieval network, making the retrieval result more accurate and reliable. The implementation is: the last unit of each shared stacked convolution module from conv2_x to conv5_x is combined with the output of the RPN, and a pyramid pooling layer and a vector flattening layer are added to accommodate convolutional feature map inputs of different sizes while flattening the convolved three-dimensional features into one-dimensional feature vectors, referred to as the instance features of the vehicle.
Finally, the vehicle segmented compact hash code features and the instance features are fused again to obtain the feature vector used for retrieval.
The vehicle-feature segmented compact hash code is realized by the following method. There are T tasks in total, and c^t classes exist under each task; m^t denotes the fully-connected output vector of each task, and the softmax activation function maps the fully-connected layer output into [0, 1], calculated by formula (7):

q_j^t = \frac{ e^{ \theta_j^T m^t } }{ \sum_{k=1}^{c^t} e^{ \theta_k^T m^t } }, \quad j = 1, \ldots, c^t   (7)

where \theta represents a random hyperplane, m^t represents the fully-connected output vector of each task, c^t represents the number of classes under each task, and q^t represents the fully-connected layer output.

For the binary output of the excitation segmented coding module, a threshold segmentation function is used for binarization:

H_j^t = \begin{cases} 1, & q_j^t \ge 0.5 \\ 0, & q_j^t < 0.5 \end{cases}   (8)

where q^t represents the fully-connected layer output and H^t represents the binary output of the excitation segmented coding module.

Finally, the H^t are fused into the vehicle segmented compact hash code vector f_A:

f_A = [\alpha^1 H^1; \alpha^2 H^2; \ldots; \alpha^T H^T]   (9)

where f_A represents the vehicle segmented compact hash code vector, H^t represents the binary output of the excitation segmented coding module, t \in (1, T), and \alpha^t represents a weighting coefficient calculated by equation (10); multiplying each H vector by the coefficient \alpha^t compensates for the errors caused by the uneven class distribution among the different tasks.
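The per-task pipeline of equations (7) to (9) can be sketched as follows (the 0.5 binarization threshold and the equal task weights in the example are assumptions; equation (10) for alpha^t is not reproduced here):

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def segmented_compact_hash(task_outputs, alphas, thresh=0.5):
    """Fuse per-task binary codes into the vehicle hash vector f_A (eq. (9)).
    task_outputs: list of T fully-connected output vectors m^t;
    alphas: list of T weighting coefficients alpha^t (eq. (10))."""
    segments = []
    for m_t, a_t in zip(task_outputs, alphas):
        q_t = softmax(m_t)                      # eq. (7): squash into [0, 1]
        h_t = (q_t >= thresh).astype(float)     # eq. (8): threshold segmentation
        segments.append(a_t * h_t)              # weight this task's segment
    return np.concatenate(segments)             # f_A = [a^1 H^1; ...; a^T H^T]

# e.g. three tasks (type / colour / brand) with 4, 8 and 16 classes:
f_A = segmented_compact_hash(
    [np.random.randn(4), np.random.randn(8), np.random.randn(16)],
    alphas=[1.0, 1.0, 1.0])
```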
The feature vector used for retrieval is obtained by fusing the vehicle segmented compact hash code features and the instance features; the specific implementation is as follows:
output feature-map sizes of \{4^2, 8^2, 16^2, 16^2\} are selected for the deepest units of conv2_x through conv5_x, respectively. For a given input image I of size h \times w, the activation of a convolution module convx_x is a three-dimensional tensor T of size h' \times w' \times d, containing a series of two-dimensional feature maps S = \{S_n\}, n \in (1, d), where S_n has size h' \times w' and corresponds to the feature map of the n-th channel. T is fed into the pyramid pooling layer to obtain a three-dimensional tensor T' of size l \times l \times d, still containing a series of feature maps S' = \{S'_n\}, n \in (1, d), each S'_n of size l \times l. Each S'_n is traversed by a sliding window of size k \times k that selects the maximum value, so that S'_n becomes l/k \times l/k; the S'_n of each channel is then fused into a one-dimensional vector, the same operation is performed on the d channels in turn, and finally the individual feature vector f_B of size (1, l/k \times d) is obtained. The final retrieval feature vector f is calculated as shown in formula (11):

f = [f_A; f_B]   (11)

where f is the feature vector for vehicle retrieval, f_B is the instance feature vector, i.e. the individuality feature vector, and f_A represents the vehicle segmented compact hash code vector.
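A sketch of the instance-feature path described above (the adaptive pooling and the per-channel fusion by a row-wise maximum are assumptions, chosen so the output size matches the stated (1, l/k × d)):

```python
import numpy as np

def pyramid_instance_feature(t, l=16, k=4):
    """Pool a conv activation t of shape (h', w', d) to (l, l, d), slide a
    k x k max window to get (l/k, l/k, d), fuse each channel to a vector
    and flatten to f_B of length l/k * d."""
    h, w, d = t.shape
    pooled = np.empty((l, l, d))
    for i in range(l):                                   # pyramid (adaptive max) pooling
        for j in range(l):
            ys, ye = i * h // l, max((i + 1) * h // l, i * h // l + 1)
            xs, xe = j * w // l, max((j + 1) * w // l, j * w // l + 1)
            pooled[i, j] = t[ys:ye, xs:xe].max(axis=(0, 1))
    s = pooled.reshape(l // k, k, l // k, k, d).max(axis=(1, 3))  # k x k sliding max
    return s.max(axis=0).reshape(-1)                     # per-channel fusion, size l/k * d

f_B = pyramid_instance_feature(np.random.rand(32, 32, 8))
f = np.concatenate([np.ones(16), f_B])                   # eq. (11): f = [f_A; f_B]
```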
The locality-sensitive hash reordering algorithm for improving search performance is shown in FIG. 2; the idea of the algorithm is to map similar samples into the same bucket with high probability. The hash function h(\cdot) of the locality-sensitive hash satisfies the following condition:

P\{ h(f_{Aq}) = h(f_A) \} = \mathrm{sim}(f_{Aq}, f_A)   (12)

where \mathrm{sim}(f_{Aq}, f_A) denotes the similarity of f_{Aq} and f_A, and h(f_A) and h(f_{Aq}) denote the hashes of f_A and f_{Aq}; the similarity measure is directly related to a distance function \sigma, calculated by equation (13):

\mathrm{sim}(f_{Aq}, f_A) = 1 - \sigma(f_{Aq}, f_A)   (13)

A typical class of locality-sensitive hash functions is given by random projection and thresholding, computed by equation (14):

h(f_A) = \mathrm{sign}(W f_A + b)   (14)

where W is a random hyperplane vector and b is a random intercept.
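A minimal bucket index implementing equation (14) (the bit width, the seed and the {0,1} key encoding are assumptions):

```python
import numpy as np

class RandomProjectionLSH:
    """Random-projection LSH of eq. (14): h(f_A) = sign(W f_A + b)."""
    def __init__(self, dim, n_bits=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((n_bits, dim))   # random hyperplanes
        self.b = rng.standard_normal(n_bits)          # random intercepts
        self.buckets = {}

    def key(self, f_A):
        # sign(.) mapped to a {0,1} tuple so it can serve as a dict key
        return tuple((self.W @ f_A + self.b >= 0).astype(int))

    def insert(self, f_A, image_id):
        self.buckets.setdefault(self.key(f_A), []).append(image_id)

    def query(self, f_Aq):
        return self.buckets.get(self.key(f_Aq), [])   # candidates from one bucket
```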
In the feature fusion method of the segmented compact hash codes and the instance features, in order to bring similar images closer, after the query image has been mapped into a similarity bucket through its segmented compact hash code, the images returned from the bucket are reordered using their instance features in combination with formula (15); the reordering calculation is shown in equation (15):

dis_k = y \cdot \cos(f_{Bq}, f_B^k) + (1 - y) \cdot \varepsilon \cdot \cos(f_{Bq}, f_B^k)   (15)

where k denotes the k-th image in the bucket, \varepsilon denotes a penalty factor with \varepsilon > 1, \cos denotes the cosine distance formula, and y indicates whether the pre-mapping codes f_{Aq} and f_A^k are equal: y is 1 if they are equal and 0 otherwise; f_A^k represents the k-th image's vehicle segmented compact hash code vector and f_{Aq} represents the query's vehicle segmented compact hash code vector.
The purpose of the added coefficient is to preserve the correctness of the LSH mapping: the similarity of the instance feature vectors is computed directly when the segmented compact hash codes are identical, and when different segmented compact hash codes are mapped into the same bucket, the penalty factor \varepsilon lengthens the distance between an erroneous retrieval result and the input query image; a smaller dis indicates a higher similarity.
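The reordering of equation (15) can be sketched as follows (the penalty value eps = 2.0 is an assumption, since its exact value is not stated):

```python
import numpy as np

def cosine_distance(u, v):
    return 1.0 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def reorder(f_Aq, f_Bq, candidates, eps=2.0):
    """Re-rank bucket candidates by eq. (15); each candidate is a pair
    (f_A_k, f_B_k) of hash code and instance feature. eps > 1 penalises
    candidates whose pre-mapping hash code differs from the query's."""
    dis = []
    for f_A_k, f_B_k in candidates:
        y = 1.0 if np.array_equal(f_Aq, f_A_k) else 0.0   # same hash code?
        dis.append((y + (1.0 - y) * eps) * cosine_distance(f_Bq, f_B_k))
    return np.argsort(dis)                                # smaller dis = more similar
```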
The cross-modal retrieval method constructs a group of deep neural networks to map image and text data into a common semantic space by feature learning, so as to realize the semantic coupling of data of different modalities. A deep convolutional neural network extracts the semantic features of the image modality directly from the input image; the text is represented by word vectors, and a one-dimensional convolutional neural network extracts the semantic features of the text modality from the word vector representation. First, the segmented compact hash f_A of the vehicle is dynamically generated by the deep convolutional neural network; then a retrieval feature vector is generated from the text, so that the feature vectors generated from both can be searched with the same retrieval system. The specific implementation process is shown in FIG. 3.
The semantic feature of the text modality is a feature vector extracted from the text; as the first step of the extraction algorithm, the text must first be split. The feature vector of the text comes from the text's terms, and the method specifically comprises the following steps:
Input: a text O; Output: a set of roughly similar images;
STEP 1: initialization: (1) parse the text file into a term vector; (2) remove stop words and repeated words; (3) check the terms to ensure the correctness of the parsing;
STEP 2: take the minimal randomly combined term vector R = (r_1, r_2, \ldots, r_n) from O;
STEP 3: integrate R and f_A by sequential and segmented compact hash coding to obtain the text attribute feature \hat{f}_{ATxt}; at this point f_{ATxt} has a dimension smaller than that of R;
STEP 4: search using the locality-sensitive reordering hash algorithm;
STEP 5: return the similar image group I.
The text attribute feature function \hat{f}_{ATxt} is expressed by equation (16):

\hat{f}_{ATxt} = \mathrm{sign}(A^T R)   (16)

where A^T is the transpose of the vehicle segmented compact hash code, R is the minimal vector of randomly combined terms, \hat{f}_{ATxt} is the text attribute feature function, and sign denotes the sign function;

R = \mathrm{diag}(r_1, r_2, \ldots, r_n)   (17)

where diag denotes forming a diagonal matrix from the feature vector extracted from the text, and the vehicle segmented compact hash code A^T is initialized to the all-ones vector of size (1 \times c).
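A loose sketch of STEPs 1 to 3 and equation (16) (the vocabulary, the term-to-bit coupling matrix A, and the thresholding of sign to {0,1} are all assumptions for illustration):

```python
import numpy as np

def text_attribute_feature(terms, vocab, A):
    """Map parsed query terms to a binary code in the hash space, eq. (16).
    terms: entry words parsed from the text; vocab: term -> index;
    A: (len(vocab), c) coupling matrix, initialised to all ones."""
    r = np.zeros(len(vocab))
    for w in set(terms):                        # de-duplicated entries (STEP 1)
        if w in vocab:
            r[vocab[w]] = 1.0                   # minimal term vector R (STEP 2)
    return np.where(A.T @ r > 0, 1.0, 0.0)      # f_ATxt = sign(A^T R), as a binary code

vocab = {"red": 0, "suv": 1, "sedan": 2}
A = np.ones((len(vocab), 8))                    # all-ones initialisation, c = 8 bits
code = text_attribute_feature(["red", "suv"], vocab, A)
```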
The above description covers only preferred embodiments of the present invention and is not intended to limit the invention; any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.
Claims (9)
1. A fast hash vehicle retrieval method based on multi-task deep learning, characterized in that the vehicle retrieval method comprises the following steps:
first, constructing a multi-task deep convolutional neural network for deep learning and training of recognition;
second, adopting a feature fusion method of segmented compact hash codes and instance features;
third, adopting a locality-sensitive hash reordering algorithm;
fourth, adopting a cross-modal retrieval method to realize vehicle retrieval;
in the multi-task deep convolutional neural network, designing a multi-task objective function is crucial; the multi-task objective function is expressed by formula (5):

\min_{\{w^t\}_{t=1}^{T}} \sum_{t=1}^{T} \left[ \sum_{i=1}^{N} L\left( y_i^t, f(x_i^t; w^t) \right) + \Phi(w^t) \right]   (5)

where f(x_i^t; w^t) is a function of the input feature vector x_i^t and the weight parameter w^t, L(\cdot) is a loss function, \Phi(w^t) is the regularization value of the weight parameters, and T is the total number of tasks; the training data of the t-th task is recorded as \{(x_i^t, y_i^t)\}, where t \in (1, T), i \in (1, N), N is the total number of training samples, and x_i^t and y_i^t respectively represent the feature vector and the label of the i-th sample;
for the loss function, softmax is used with the log-likelihood cost function to train the features of the last layer and realize multi-task image classification; the softmax loss function is defined by formula (6):

L_{softmax} = -\frac{1}{m} \sum_{i=1}^{m} \log \frac{ e^{ W_{y_i}^{T} x_i + b_{y_i} } }{ \sum_{j=1}^{n} e^{ W_j^{T} x_i + b_j } }   (6)

where x_i is the i-th depth feature with label y_i, W_j is the j-th column of the weights in the last fully-connected layer, b is the bias term, and m and n are the number of processed samples and the number of classes, respectively.
2. The fast hash vehicle retrieval method based on multi-task deep learning according to claim 1, characterized in that: in the first step, Faster R-CNN is used as the basic network of the multi-task convolutional neural network; the front of the network is a 3 × 3 convolutional layer called conv1, followed by 4 stacked convolution modules named conv2_x to conv5_x, the modules containing {2, 3, 3, 3} units respectively, with conv1 to conv4_3 serving as the shared network; then comes the RPN, i.e. the region proposal network: the RPN takes an image of any scale as input and outputs a set of rectangular target proposal boxes, each box comprising 4 position coordinate variables and a score, the targets of the rectangular target proposal boxes being vehicle objects; to generate region proposal boxes, a small network slides over the convolutional feature map output by the last shared convolutional layer, this network being fully connected to an n × n spatial window of the input convolutional feature map; each sliding window is mapped to a low-dimensional vector, one sliding window of each feature map corresponding to one value, and this vector is output to two sibling fully-connected layers;
the estimated probability that each proposal box is a target/non-target comes from a classification layer realized by a two-class softmax layer; the k proposal boxes are parameterized relative to k corresponding reference boxes called anchors;
each anchor is centered at the center of the current sliding window and corresponds to one scale and one aspect ratio; using 3 scales and 3 aspect ratios yields k = 9 anchors at each sliding position;
to train the RPN, each anchor is assigned a binary label marking whether it is a target; positive labels are assigned to two types of anchors: (I) the anchor with the highest Intersection-over-Union (IoU) overlap with a real target bounding box, i.e. the ground truth (GT); (II) anchors whose IoU overlap with any GT bounding box is greater than 0.7; note that one GT bounding box may assign positive labels to multiple anchors; negative labels are assigned to anchors whose IoU with all GT bounding boxes is below 0.3; anchors that are neither positive nor negative have no effect on the training objective and are discarded;
following the multi-task loss in Faster R-CNN, the objective function is minimized; the loss function for an image is defined as

L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)   (1)

where i is the index of an anchor, p_i is the predicted probability that the i-th anchor is a target, and the ground-truth label p_i^* is 1 if the anchor is positive and 0 if it is negative; t_i is a vector representing the 4 parameterized coordinates of the predicted bounding box, and t_i^* is the coordinate vector of the GT bounding box corresponding to a positive anchor; \lambda is a balance weight, N_{cls} normalizes the cls term by the mini-batch size, and N_{reg} normalizes the reg term by the number of anchor positions; the classification loss L_{cls} is the log loss over the two classes, motor-vehicle target vs. road background:

L_{cls}(p_i, p_i^*) = -\log\left[ p_i^* p_i + (1 - p_i^*)(1 - p_i) \right]   (2)

the regression loss L_{reg} is defined by the following function:

L_{reg}(t_i, t_i^*) = R(t_i - t_i^*)   (3)

where R is a robust loss function, the smooth L_1 loss calculated by equation (4):

\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5 x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}   (4)

where x is a variable.
3. The fast hash vehicle retrieval method based on multi-task deep learning according to claim 1 or 2, characterized in that: in the second step, the feature fusion method of the segmented compact hash codes and the instance features comprises the following steps:
in the vehicle image feature extraction stage, a softmax activation function first limits the values to the threshold range [0, 1]; then a segmented threshold function promotes the output of binary hash codes, and a segmented learning and coding strategy reduces the redundancy among hash codes to improve feature robustness; finally, the hash codes obtained by segmented learning are fused by feature fusion, yielding the vehicle-feature segmented compact hash code;
for the instance features of a vehicle, the implementation is: the last unit of each shared stacked convolution module from conv2_x to conv5_x is combined with the output of the RPN, and a pyramid pooling layer and a vector flattening layer are added to accommodate convolutional feature map inputs of different sizes while flattening the convolved three-dimensional features into one-dimensional feature vectors, referred to as the instance features of the vehicle;
finally, the vehicle segmented compact hash code features and the instance features are fused again to obtain the feature vector used for retrieval.
4. The fast hash vehicle retrieval method based on multi-task deep learning according to claim 3, characterized in that: the vehicle-feature segmented compact hash code is realized by the following method; there are T tasks in total, and c^t classes exist under each task; m^t denotes the fully-connected output vector of each task, and the softmax activation function maps the fully-connected layer output into [0, 1], calculated by formula (7):

q_j^t = \frac{ e^{ \theta_j^T m^t } }{ \sum_{k=1}^{c^t} e^{ \theta_k^T m^t } }, \quad j = 1, \ldots, c^t   (7)

where \theta represents a random hyperplane, m^t represents the fully-connected output vector of each task, c^t represents the number of classes under each task, and q^t represents the fully-connected layer output;
for the binary output of the excitation segmented coding module, a threshold segmentation function is used for binarization:

H_j^t = \begin{cases} 1, & q_j^t \ge 0.5 \\ 0, & q_j^t < 0.5 \end{cases}   (8)

where q^t represents the fully-connected layer output and H^t represents the binary output of the excitation segmented coding module;
finally, the H^t are fused into the vehicle segmented compact hash code vector f_A:

f_A = [\alpha^1 H^1; \alpha^2 H^2; \ldots; \alpha^T H^T]   (9)

where f_A represents the vehicle segmented compact hash code vector, H^t represents the binary output of the excitation segmented coding module, t \in (1, T), and \alpha^t represents a weighting coefficient calculated by equation (10); multiplying each H vector by the coefficient \alpha^t compensates for the errors caused by the uneven class distribution among the different tasks.
5. The fast hash vehicle retrieval method based on multitask deep learning according to claim 4, wherein: in the third step, the feature vector for retrieval is obtained by fusing the compact features and the example features of the vehicle segment compact hash code, and the process is as follows:
the deepest layers of conv2_x through conv5_x are selected with output feature map sizes of {4^2, 8^2, 16^2, 16^2}, respectively; for a given input image I of size h × w, the activation of a convolution layer convx_x is a three-dimensional tensor T of size h′ × w′ × d containing a series of two-dimensional feature maps S = {S_n}, n ∈ (1, d), where S_n is the feature map of the n-th channel, of size h′ × w′; T is fed into the pyramid pooling layer to obtain a three-dimensional tensor T′ of size l × l × d, which still contains a series of feature maps S′ = {S′_n}, n ∈ (1, d), each S′_n of size l × l; each S′_n is traversed by a sliding window of size k × k that selects the maximum value, so that the size of S′_n becomes l/k × l/k; the S′_n of each channel is then fused into a one-dimensional vector, the same operation is performed on the d channels in turn, and the individual feature vector f_B of size (1, l/k × d) is finally obtained; the final retrieval feature vector f is calculated by the method shown in equation (11):
f = [f_A; f_B]    (11)
where f is the feature vector used for vehicle retrieval, f_B is the instance feature vector (the individual feature vector), and f_A denotes the vehicle segmented compact hash code vector.
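The pooling chain of this claim can be sketched as follows; the claim does not fully specify the per-channel "fusing into a one-dimensional vector", so the row-wise max used here is a labeled assumption, and the k × k window is read as stride-k (non-overlapping), which matches the stated l/k × l/k output size:

```python
import numpy as np

def instance_feature(t_prime, k=2):
    # t_prime: pooled tensor T' of shape (l, l, d); l must be divisible by k.
    l, _, d = t_prime.shape
    channels = []
    for n in range(d):
        s_n = t_prime[:, :, n]
        # Non-overlapping k x k max windows: (l, l) -> (l/k, l/k).
        pooled = s_n.reshape(l // k, k, l // k, k).max(axis=(1, 3))
        # Assumed per-channel fusion: reduce each channel to a (l/k,) vector.
        channels.append(pooled.max(axis=0))
    return np.concatenate(channels)   # f_B of length (l/k) * d
```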
6. The fast hash vehicle retrieval method based on multitask deep learning according to claim 5, wherein in the third step, similar samples are mapped into the same bucket with high probability; the hash function h(·) of the locality-sensitive hash satisfies the following condition:
P{h(f_Aq) = h(f_A)} = sim(f_Aq, f_A)    (12)
where sim(f_Aq, f_A) denotes the similarity between f_Aq and f_A, h(f_A) denotes the hash of f_A, and h(f_Aq) denotes the hash of f_Aq; the similarity measure is directly related to a distance function σ, calculated by equation (13);
a typical class of locality-sensitive hash functions is obtained by random projection and thresholding, calculated by equation (14):
h(f_A) = sign(W f_A + b)    (14)
where W is a random hyperplane vector and b is a random intercept.
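Equation (14) admits a direct sketch; the bit width and the use of a byte key for bucketing are illustrative choices:

```python
import numpy as np

def lsh_bits(f_a, W, b):
    # Random projection + threshold (equation (14)): one bit per hyperplane,
    # so similar f_A vectors fall into the same bucket with high probability.
    return (W @ f_a + b > 0).astype(np.uint8)

rng = np.random.default_rng(0)
dim, n_bits = 64, 16                      # illustrative sizes
W = rng.standard_normal((n_bits, dim))    # random hyperplane vectors
b = rng.standard_normal(n_bits)           # random intercepts
bucket_key = lsh_bits(rng.standard_normal(dim), W, b).tobytes()
```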
7. The multitask deep learning based fast hash vehicle retrieval method according to claim 6, wherein in the third step, after the query image is mapped into a similarity bucket by its segmented compact hash code, the images returned from the bucket are reordered using their instance features; the reordering is calculated as shown in equation (15):
where k denotes the k-th image in the bucket; the penalty factor lengthens the distance between an erroneous retrieval result and the input query image; cos denotes the cosine distance; y indicates whether f_Aq and f_A^(k) are equal before mapping, with y = 1 if equal and y = 0 otherwise; f_A^(k) denotes the segmented compact hash code vector of the k-th image and f_Aq denotes that of the query; a smaller dis indicates higher similarity.
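Since the exact form of equation (15) is not recoverable from the text, the sketch below is only a hedged reading: cosine distance on instance features, with an additive penalty applied when the candidate's segmented hash differs from the query's; the penalty placement and value are assumptions:

```python
import numpy as np

def rerank(f_bq, f_aq, bucket, penalty=2.0):
    # bucket: list of (f_b_k, f_a_k) pairs for the images returned from the bucket.
    dists = []
    for f_b_k, f_a_k in bucket:
        y = 1.0 if np.array_equal(f_aq, f_a_k) else 0.0
        cos_dist = 1.0 - (f_bq @ f_b_k) / (np.linalg.norm(f_bq)
                                           * np.linalg.norm(f_b_k))
        dists.append(cos_dist + (1.0 - y) * penalty)  # smaller dis = more similar
    return np.argsort(dists)   # indices of bucket images, best first
```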
8. The fast hash vehicle retrieval method based on multitask deep learning according to claim 1 or 2, characterized in that in the fourth step, the cross-modal retrieval method constructs a group of deep neural networks that map image and text data into a common semantic space by feature learning, so as to achieve semantic coupling of the different modalities; a deep convolutional neural network extracts the semantic features of the image modality directly from the input image, the text is represented by word vectors, and a one-dimensional convolutional neural network extracts the semantic features of the text modality from the word-vector representation; first, the segmented compact hash code f_A of the vehicle is dynamically generated by the deep convolutional neural network; then, a retrieval feature vector is generated from the text, so that the feature vectors generated from the two modalities can be retrieved with the same retrieval system.
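A minimal PyTorch sketch of the text branch described here (a 1-D convolution over word vectors); all layer sizes are illustrative assumptions, and the sigmoid stands in for the [0,1]-limiting activation so the output can be thresholded like the image-side hash:

```python
import torch
import torch.nn as nn

class TextSemanticBranch(nn.Module):
    # Maps a sequence of word vectors to a hash-sized feature in [0, 1],
    # so text and image features live in the same retrieval space.
    def __init__(self, emb_dim=300, n_bits=48):
        super().__init__()
        self.conv = nn.Conv1d(emb_dim, 256, kernel_size=3, padding=1)
        self.fc = nn.Linear(256, n_bits)

    def forward(self, word_vecs):                  # (batch, seq_len, emb_dim)
        x = self.conv(word_vecs.transpose(1, 2))   # (batch, 256, seq_len)
        x = torch.relu(x).max(dim=2).values        # global max over the sequence
        return torch.sigmoid(self.fc(x))           # in [0, 1], thresholdable
```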
9. The multitask deep learning based fast hash vehicle retrieval method according to claim 8, wherein the semantic features of the text modality are feature vectors extracted from the text; as the first step of the extraction algorithm, the text must first be split; the feature vector of the text comes from the terms of the text, and the method comprises the following steps (a sketch follows the claim):
Input: a text O; Output: a set of roughly similar images;
STEP 1: initialization: (1) parse the text file into a term vector; (2) remove short words and repeated words; (3) check the terms to ensure the correctness of the parsing;
STEP 2: take the minimal vector R of randomly combined terms from O: R = (r_1, r_2, ..., r_n);
STEP 3: integrate R and f_A by sequential and segmented compact hash coding to obtain the text attribute feature f_ATxt; at this point the dimension of f_ATxt is smaller than that of R;
STEP 4: retrieve using the locality-sensitive reordering hash algorithm;
STEP 5: return the similar image group I;
where the text attribute feature function f_ATxt is expressed by equation (16):
where A^T denotes the transposed matrix of the vehicle segmented compact hash code, R denotes the minimal vector of randomly combined terms, f_ATxt is the text attribute feature function, and sign denotes the sign function;
where diag denotes taking a diagonal matrix, the inner expression denotes the feature vector extracted from the text, and the vehicle segmented compact hash code A^T is initialized to the all-ones vector of size (1 × c).
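STEP 1 through STEP 3 can be sketched as below; the tokenization, the scalar term weights, and the sign(A · R) reading of equation (16) are all labeled assumptions, since the exact diag(·) construction is not recoverable from the text (per the claim, A starts as an all-ones vector):

```python
import numpy as np

def text_attribute_feature(text, A, term_weight):
    # STEP 1: parse the text into terms, dropping short and repeated words.
    terms = [w for w in text.lower().split() if len(w) > 2]
    terms = list(dict.fromkeys(terms))            # de-duplicate, keep order
    # STEP 2: minimal term vector R (scalar weights are an assumed stand-in
    # for the claim's term representation).
    r = np.array([term_weight.get(w, 0.0) for w in terms], dtype=float)
    n = A.shape[1]
    r = r[:n] if r.size >= n else np.pad(r, (0, n - r.size))
    # STEP 3: hedged reading of equation (16): project through the hash
    # matrix and binarize with the sign function.
    return np.sign(A @ r)
```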
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
CN201710857318.5A | 2017-09-21 | 2017-09-21 | Rapid Hash vehicle retrieval method based on multitask deep learning
Publications (2)
Publication Number | Publication Date
CN107885764A | 2018-04-06
CN107885764B | 2020-12-18
Family
ID=61780800
Family Applications (1)
Application Number | Title | Priority Date | Filing Date
CN201710857318.5A (Active) | Rapid Hash vehicle retrieval method based on multitask deep learning | 2017-09-21 | 2017-09-21
Country Status (1)
Country | Link
CN | CN107885764B (en)
Legal Events
Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant