CN107885764B: Rapid hash vehicle retrieval method based on multi-task deep learning
Classifications
G06F16/36: Information retrieval of unstructured textual data; creation of semantic tools, e.g. ontology or thesauri
G06F16/5838: Retrieval of still image data characterised by metadata automatically derived from the content, using colour
G06K9/6277: Classification techniques based on a parametric (probabilistic) model
G06N3/0454: Neural-network architectures using a combination of multiple neural nets
Abstract
A fast hash vehicle retrieval method based on multi-task deep learning comprises a multi-task deep convolutional neural network used for deep learning and training of recognition, a feature fusion method of segmented compact hash codes and instance features used to improve retrieval precision and the practicability of the retrieval method, a locality-sensitive hash reordering algorithm used to improve retrieval performance, and a cross-modal retrieval method used to improve the robustness and accuracy of the retrieval engine. First, a multi-task deep convolutional network method for segmented learning of hash codes is provided: image semantics and image representation are combined, the relations between related tasks are exploited to improve retrieval precision and refine image features, and minimized image coding makes the learned vehicle features more robust. Second, a feature pyramid network is selected to extract instance features of the vehicle image. Third, the extracted features are retrieved with a locality-sensitive hash reordering method. Finally, a cross-modal auxiliary vehicle retrieval method is adopted for the special case in which a query image of the target vehicle cannot be obtained.
Description
Technical Field
The invention relates to the application of artificial intelligence, digital image processing, convolutional neural networks and computer vision in the field of public safety, and belongs to the field of intelligent transportation.
Background
Today, smart cities and intelligent transportation are developing rapidly, and the demand in public safety systems for large-scale image monitoring, vehicle identification in video databases and vehicle retrieval is growing just as fast.
In the prior art, vehicle retrieval methods mainly extract the license plate information of a target vehicle and then retrieve the motor vehicle according to that information. This is typically done by identifying the license plate number of the vehicle from a monitoring image and then identifying the motor vehicle bearing that license plate number in other monitoring images. Although searching by license plate number alone is easy to implement, it cannot effectively retrieve a motor vehicle whose license plate information is unavailable, such as a fake-plate vehicle.
Vehicle retrieval technology based on appearance features not only makes up for the limitations and defects of traditional license plate recognition, but also has very important practical significance and broad application prospects in intelligent vehicle retrieval, especially in violation inspection, hit-and-run pursuit, locking onto criminal suspects' vehicles, fake-plate vehicle identification, and accelerating the efficiency of criminal investigation.
Existing vehicle retrieval methods basically use algorithms such as SIFT, SURF and DoG to extract whole-image features of the target vehicle image as the target features, extract whole-image features of each vehicle image in the database with the same algorithm as the features to be matched, calculate the Euclidean distance between the target features and each feature to be matched, and take the vehicle corresponding to the feature with the smallest Euclidean distance as the target vehicle.
Vehicle retrieval requires finding a specific target vehicle among a series of similarly contoured vehicles, which makes the task all the more challenging; furthermore, the influence of practical conditions, such as the monitoring environment, weather and lighting, must be taken into account.
In recent years, deep learning has developed rapidly in the field of computer vision. Deep learning can use a large number of training samples and hidden layers to learn the abstract information of an image layer by layer, acquiring image features more comprehensively and directly. A digital image is described by a matrix, and a convolutional neural network starts from local information blocks to describe the overall structure of the image, so convolutional neural networks are mostly adopted to solve problems at the intersection of computer vision and deep learning. Around improving detection precision and detection time, deep convolutional neural network technology has evolved from R-CNN and Fast R-CNN to Faster R-CNN, with further gains in precision, speed, end-to-end operation and practicability, covering almost all fields from classification to detection, segmentation and localization. Applying deep learning technology to vehicle retrieval is a research field with practical application value.
Reordering is a technique commonly used in image retrieval to improve retrieval performance; for example, initial retrieval results may be reordered through the visual-feature matching relationships between image pairs. However, the reordering effect depends strongly on whether the visual features used are effective enough to represent the image.
In similar-vehicle search, since many vehicles look alike, the extracted visual features are also similar and cannot distinguish different vehicle types, so similar vehicles cannot be retrieved well by a reordering method that directly uses the matching relationships between image pairs.
Query expansion is a common method used in search technology to improve recall and accuracy. Query expansion adds new keywords to the original query and searches again: for example, a search engine runs the user's query once, selects suitable keywords from the retrieved documents, and adds those keywords to the query for a second search, thereby finding more related documents. Query expansion can therefore effectively improve the recall rate of information retrieval, but the prior art provides no query expansion method specific to the vehicle as an object in an image.
Compared with the traditional vehicle retrieval method based on the license plate number, the method of the Chinese patent application with application No. 201510744990.4 not only avoids dependence on license plate recognition accuracy but can also retrieve fake-plate and unlicensed vehicles. However, this technology is still a computer vision technology of the pre-deep-learning era.
The Chinese patent application with application No. 201610671729.0 discloses a big-data-based vehicle retrieval method and device. The method includes: extracting brand features of the target vehicle in the target vehicle image; determining the probability that each pixel in the target vehicle image corresponds to each marker, where the markers include one or more of annual inspection marks, ornaments and hanging decorations; determining the position of each marker in the target vehicle image according to those probabilities and a probability threshold for each marker; extracting image features of each marker according to its position in the target vehicle image; and searching for the target vehicle among the vehicle images to be searched according to the image features of each marker and the brand features of the target vehicle. Although this technology adopts deep learning, it is single-task deep learning, whereas vehicle retrieval is a typical multi-task deep learning problem.
Chinese patent application No. 201410381577.1 discloses a query expansion method and device for similar-vehicle retrieval. The method comprises: determining the vehicle model information of an image to be queried that contains a vehicle; selecting several sample images that meet preset conditions from the vehicle model template library corresponding to that vehicle model information; and forming a query expansion image set from the sample images, so that the sample images in the query expansion image set replace the image to be queried when querying the target database, where the vehicle model template library contains sample images for each vehicle model. The method can improve the recall rate and accuracy of vehicle image retrieval, but it is an image retrieval technology of the early deep learning era.
Chinese patent application No. 201410652730.X discloses an image-based motor vehicle retrieval method and apparatus. The method comprises: acquiring a first image containing the motor vehicle to be retrieved; determining a first appearance contour of the motor vehicle to be retrieved from the first image; dividing the image within the first appearance contour into several regions and extracting the image features of each region with different step lengths; combining the image features of all regions to obtain the overall image features of the motor vehicle to be retrieved; and comparing the overall image features of the motor vehicle to be retrieved with pre-extracted overall image features of the target motor vehicle to obtain a comparison result. This method, too, is an image retrieval technology of the early deep learning era.
Disclosure of Invention
Aiming at the problems of efficiently using the massive video data generated in the field of public safety and improving vehicle retrieval efficiency in the big data era, the invention provides a fast hash retrieval method based on multi-task deep learning, which effectively exploits the relevance among detection and recognition tasks and the diversity of basic checkpoint vehicle information to achieve real-time retrieval, yielding a multi-task deep learning fast hash vehicle retrieval method with high retrieval precision and good robustness.
The technical scheme adopted by the invention for solving the technical problems is as follows:
A fast hash vehicle retrieval method based on multi-task deep learning comprises the following steps:
first, constructing a multi-task deep convolutional neural network for deep learning and training of recognition;
second, adopting a feature fusion method of segmented compact hash codes and instance features;
third, adopting a locality-sensitive hash reordering algorithm;
and fourth, adopting a cross-modal retrieval method to realize vehicle retrieval.
Further, in the first step, Faster R-CNN is adopted as the basic network of the multi-task convolutional neural network. The front of the network is a 3 × 3 convolutional layer called conv1, followed by 4 stacked convolution modules named conv2_x to conv5_x, the modules containing {2, 3, 3, 3} units respectively; conv1 to conv4_3 serve as the shared network. Then comes the RPN, i.e. the region proposal network: the RPN takes an image of any scale as input and outputs a set of rectangular target proposal boxes, each box comprising 4 position coordinate variables and a score; the targets of the rectangular target proposal boxes are vehicle objects. To generate region proposal boxes, a small network slides over the convolutional feature map output by the last shared convolutional layer; this network is fully connected to an n × n spatial window of the input convolutional feature map. Each sliding window is mapped to a low-dimensional vector, one sliding window of each feature map corresponding to one value; this vector is output to two sibling fully-connected layers.
The estimated probability that each proposal box is a target/non-target comes from a classification layer realized by a two-class softmax layer; the k proposal boxes are parameterized relative to k corresponding reference boxes called anchors.
Each anchor is centered at the center of the current sliding window and corresponds to one scale and one aspect ratio; using 3 scales and 3 aspect ratios yields k = 9 anchors at each sliding position.
To train the RPN, each anchor is assigned a binary label marking whether it is a target. Positive labels are assigned to two types of anchors: (I) the anchor with the highest Intersection-over-Union (IoU) overlap with a real target bounding box, i.e. the ground truth (GT); (II) anchors whose IoU overlap with any GT bounding box is greater than 0.7. Note that one GT bounding box may assign positive labels to multiple anchors. Negative labels are assigned to anchors whose IoU with all GT bounding boxes is below 0.3; anchors that are neither positive nor negative have no effect on the training objective and are discarded.
Following the multi-task loss in Faster R-CNN, the objective function is minimized; the loss function for an image is defined as

L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)   (1)

where i is the index of an anchor, p_i is the predicted probability that the i-th anchor is a target, and the ground-truth label p_i^* is 1 if the anchor is positive and 0 if it is negative; t_i is a vector representing the 4 parameterized coordinates of the predicted bounding box, and t_i^* is the coordinate vector of the GT bounding box corresponding to a positive anchor; \lambda is a balance weight, N_{cls} normalizes the cls term by the mini-batch size, and N_{reg} normalizes the reg term by the number of anchor positions. The classification loss L_{cls} is the log loss over the two classes, motor-vehicle target vs. road background:

L_{cls}(p_i, p_i^*) = -\log\left[ p_i^* p_i + (1 - p_i^*)(1 - p_i) \right]   (2)

The regression loss L_{reg} is defined by the following function:

L_{reg}(t_i, t_i^*) = R(t_i - t_i^*)   (3)

where R is a robust loss function, the smooth L_1 loss calculated by equation (4):

\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5 x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}   (4)

where x is a variable.
Further, in the multi-task deep convolutional neural network, designing a multi-task objective function is crucial; the multi-task objective function is expressed by formula (5):

\min_{\{w^t\}_{t=1}^{T}} \sum_{t=1}^{T} \left[ \sum_{i=1}^{N} L\left( y_i^t, f(x_i^t; w^t) \right) + \Phi(w^t) \right]   (5)

where f(x_i^t; w^t) is a function of the input feature vector x_i^t and the weight parameter w^t, L(\cdot) is a loss function, \Phi(w^t) is the regularization value of the weight parameters, and T is the total number of tasks; the training data of the t-th task is recorded as \{(x_i^t, y_i^t)\}, where t \in (1, T), i \in (1, N), N is the total number of training samples, and x_i^t and y_i^t respectively represent the feature vector and the label of the i-th sample.

For the loss function, softmax is used with the log-likelihood cost function to train the features of the last layer and realize multi-task image classification; the softmax loss function is defined by formula (6):

L_{softmax} = -\frac{1}{m} \sum_{i=1}^{m} \log \frac{ e^{ W_{y_i}^{T} x_i + b_{y_i} } }{ \sum_{j=1}^{n} e^{ W_j^{T} x_i + b_j } }   (6)

where x_i is the i-th depth feature with label y_i, W_j is the j-th column of the weights in the last fully-connected layer, b is the bias term, and m and n are the number of processed samples and the number of classes, respectively.
Further, in the second step, the feature fusion process for the segmented compact hash codes and the instance features is as follows:
in the vehicle image feature extraction stage, a softmax activation function first limits the values to the threshold range [0, 1]; then a segmented threshold function promotes the output of binary hash codes, and a segmented learning and coding strategy reduces the redundancy among hash codes to improve feature robustness; finally, the hash codes obtained by segmented learning are fused by feature fusion, yielding the vehicle-feature segmented compact hash code.
As for the instance features of a vehicle, the implementation is: the last unit of each shared stacked convolution module from conv2_x to conv5_x is combined with the output of the RPN, and a pyramid pooling layer and a vector flattening layer are added to accommodate convolutional feature map inputs of different sizes while flattening the convolved three-dimensional features into one-dimensional feature vectors, referred to as the instance features of the vehicle.
Finally, the vehicle segmented compact hash code features and the instance features are fused again to obtain the feature vector used for retrieval.
The vehicle-feature segmented compact hash code is realized by the following method. There are T tasks in total, and c^t classes exist under each task; m^t denotes the fully-connected output vector of each task, and the softmax activation function maps the fully-connected layer output into [0, 1], calculated by formula (7):

q_j^t = \frac{ e^{ \theta_j^T m^t } }{ \sum_{k=1}^{c^t} e^{ \theta_k^T m^t } }, \quad j = 1, \ldots, c^t   (7)

where \theta represents a random hyperplane, m^t represents the fully-connected output vector of each task, c^t represents the number of classes under each task, and q^t represents the fully-connected layer output.

For the binary output of the excitation segmented coding module, a threshold segmentation function is used for binarization:

H_j^t = \begin{cases} 1, & q_j^t \ge 0.5 \\ 0, & q_j^t < 0.5 \end{cases}   (8)

where q^t represents the fully-connected layer output and H^t represents the binary output of the excitation segmented coding module.

Finally, the H^t are fused into the vehicle segmented compact hash code vector f_A:

f_A = [\alpha^1 H^1; \alpha^2 H^2; \ldots; \alpha^T H^T]   (9)

where f_A represents the vehicle segmented compact hash code vector, H^t represents the binary output of the excitation segmented coding module, t \in (1, T), and \alpha^t represents a weighting coefficient calculated by equation (10); multiplying each H vector by the coefficient \alpha^t compensates for the errors caused by the uneven class distribution among the different tasks.
In the third step, the feature vector used for retrieval is obtained by fusing the vehicle segmented compact hash code features and the instance features, as follows:
output feature-map sizes of \{4^2, 8^2, 16^2, 16^2\} are selected for the deepest units of conv2_x through conv5_x, respectively. For a given input image I of size h \times w, the activation of a convolution module convx_x is a three-dimensional tensor T of size h' \times w' \times d, containing a series of two-dimensional feature maps S = \{S_n\}, n \in (1, d), where S_n has size h' \times w' and corresponds to the feature map of the n-th channel. T is fed into the pyramid pooling layer to obtain a three-dimensional tensor T' of size l \times l \times d, still containing a series of feature maps S' = \{S'_n\}, n \in (1, d), each S'_n of size l \times l. Each S'_n is traversed by a sliding window of size k \times k that selects the maximum value, so that S'_n becomes l/k \times l/k; the S'_n of each channel is then fused into a one-dimensional vector, the same operation is performed on the d channels in turn, and finally the individual feature vector f_B of size (1, l/k \times d) is obtained. The final retrieval feature vector f is calculated as shown in formula (11):

f = [f_A; f_B]   (11)

where f is the feature vector for vehicle retrieval, f_B is the instance feature vector, i.e. the individuality feature vector, and f_A represents the vehicle segmented compact hash code vector.
In the third step, similar samples are mapped into the same bucket with high probability; the hash function h(\cdot) of the locality-sensitive hash satisfies the following condition:

P\{ h(f_{Aq}) = h(f_A) \} = \mathrm{sim}(f_{Aq}, f_A)   (12)

where \mathrm{sim}(f_{Aq}, f_A) denotes the similarity of f_{Aq} and f_A, and h(f_A) and h(f_{Aq}) denote the hashes of f_A and f_{Aq}; the similarity measure is directly related to a distance function \sigma, calculated by equation (13):

\mathrm{sim}(f_{Aq}, f_A) = 1 - \sigma(f_{Aq}, f_A)   (13)

A typical class of locality-sensitive hash functions is given by random projection and thresholding, computed by equation (14):

h(f_A) = \mathrm{sign}(W f_A + b)   (14)

where W is a random hyperplane vector and b is a random intercept.
In the third step, after the query image has been mapped into a similarity bucket through its segmented compact hash code, the images returned from the bucket are reordered using their instance features in combination with formula (15); the reordering calculation is shown in equation (15):

dis_k = y \cdot \cos(f_{Bq}, f_B^k) + (1 - y) \cdot \varepsilon \cdot \cos(f_{Bq}, f_B^k)   (15)

where k denotes the k-th image in the bucket, \varepsilon denotes a penalty factor with \varepsilon > 1, \cos denotes the cosine distance formula, and y indicates whether the pre-mapping codes f_{Aq} and f_A^k are equal: y is 1 if they are equal and 0 otherwise; f_A^k represents the k-th image's vehicle segmented compact hash code vector and f_{Aq} represents the query's vehicle segmented compact hash code vector. When different segmented compact hash codes are mapped into the same bucket, the penalty factor \varepsilon lengthens the distance between an erroneous retrieval result and the input query image; a smaller dis indicates a higher similarity.
In the fourth step, the cross-modal retrieval method constructs a group of deep neural networks to map image and text data into a common semantic space by feature learning, so as to realize the semantic coupling of data of different modalities. A deep convolutional neural network extracts the semantic features of the image modality directly from the input image; the text is represented by word vectors, and a one-dimensional convolutional neural network extracts the semantic features of the text modality from the word vector representation. First, the segmented compact hash f_A of the vehicle is dynamically generated by the deep convolutional neural network; then a retrieval feature vector is generated from the text, so that the feature vectors generated from both can be searched with the same retrieval system.
The semantic feature of the text modality is a feature vector extracted from the text; as the first step of the extraction algorithm, the text must first be split. The feature vector of the text comes from the text's terms, and the method specifically comprises the following steps:
Input: a text O; Output: a set of roughly similar images;
STEP 1: initialization: (1) parse the text file into a term vector; (2) remove stop words and repeated words; (3) check the terms to ensure the correctness of the parsing;
STEP 2: take the minimal randomly combined term vector R = (r_1, r_2, \ldots, r_n) from O;
STEP 3: integrate R and f_A by sequential and segmented compact hash coding to obtain the text attribute feature \hat{f}_{ATxt}; at this point f_{ATxt} has a dimension smaller than that of R;
STEP 4: search using the locality-sensitive reordering hash algorithm;
STEP 5: return the similar image group I.
The text attribute feature function \hat{f}_{ATxt} is expressed by equation (16):

\hat{f}_{ATxt} = \mathrm{sign}(A^T R)   (16)

where A^T is the transpose of the vehicle segmented compact hash code, R is the minimal vector of randomly combined terms, \hat{f}_{ATxt} is the text attribute feature function, and sign denotes the sign function;

R = \mathrm{diag}(r_1, r_2, \ldots, r_n)   (17)

where diag denotes forming a diagonal matrix from the feature vector extracted from the text, and the vehicle segmented compact hash code A^T is initialized to the all-ones vector of size (1 \times c).
The technical conception of the invention is as follows: first, a multi-task deep convolutional network method for segmented learning of hash codes is provided, combining image semantics and image representation, exploiting the relations between related tasks to improve retrieval precision and refine image features, while minimized image coding makes the learned vehicle features more robust; second, a feature pyramid network is selected to extract instance features of the vehicle image; third, the extracted features are retrieved using a locality-sensitive hash reordering method; finally, a cross-modal auxiliary vehicle retrieval method is adopted for the special case in which a query image of the target vehicle cannot be obtained.
The retrieval feature vector generated from text has the same form as the segmented compact hash code vector generated by the convolutional network, so the feature vectors generated from both sources can be retrieved by the same retrieval system without additional training.
The deep convolutional neural network model constructed by the method is an end-to-end learning system, as shown in FIG. 1; the model integrates text feature representation, image feature learning, text feature learning, cross-modal retrieval and reordering into the same learning framework.
The invention has the following beneficial effects:
1) A multi-task deep learning vehicle appearance recognition framework is designed. Weight sharing in the correlated parallel processing among tasks improves the generalization ability of the system, weakens the influence of overfitting on the neural network, and alleviates the weak classifier generalization caused by insufficient samples; different network structures were tried, and finally mutually correlated tasks are fused to maximize the sharing of network parameters.
2) A segmented approach is employed, combined with the multi-task network architecture, to learn hash codes and reduce the redundancy between binary hash codes. Each task is responsible for learning a part of the hash code without connection to the others, and the vector fusion method described herein yields an accurate image feature representation of each vehicle, called the vehicle's segmented compact feature. A feature pyramid network that captures the image's instance features is constructed from a multi-layer combination of the shared stacked convolutional layers, a pyramid pooling layer and a vector flattening (Vector flat) layer, and finally the image representations carrying the two kinds of feature-dimension information are re-fused into the final retrieval feature vector.
3) A locality-sensitive hash reordering retrieval method is provided to quickly match the acquired retrieval features, so as to meet the practical application requirements of intelligent transportation. The retrieval method first maps the images in the query library to buckets using the segmented compact hash code, then re-sorts the images within a bucket using the instance feature vectors, screening out the top-K most similar images by relying on the different feature dimensions of vehicles; the mapping of coding vectors avoids one-to-one image comparison, achieving fast real-time retrieval.
4) For special situations in which image information of a vehicle cannot be acquired, such as a blurred camera view at night, excessive illumination in the daytime, or a failed camera, the invention provides a cross-modal auxiliary retrieval mode to meet the actual requirements of different environments: vehicle characteristics are summarized by manual judgment and converted into text data that is fed into the retrieval network to realize auxiliary retrieval.
Drawings
FIG. 1 is the overall network framework for fast hash retrieval with the multi-task deep convolutional neural network;
FIG. 2 is a schematic representation of a reordering sequence;
FIG. 3 is an illustration of a text feature vector generation process;
FIG. 4 is a diagram of the RPN network architecture;
FIG. 5 is a diagram of the multi-task Faster R-CNN deep convolutional network.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to FIGS. 1 to 5, a fast hash vehicle retrieval method based on multi-task deep learning includes:
first, constructing a multi-task deep convolutional neural network for deep learning and training of recognition;
second, adopting a feature fusion method of segmented compact hash codes and instance features;
third, adopting a locality-sensitive hash reordering algorithm;
and fourth, adopting a cross-modal retrieval method to realize vehicle retrieval.
In the first step, the multi-task deep convolutional neural network for deep learning and training of recognition is shown in FIG. 1. Faster R-CNN is adopted as the basic network of the multi-task convolutional neural network. The front of the network is a 3 × 3 convolutional layer called conv1, followed by 4 stacked convolution modules named conv2_x to conv5_x, the modules containing {2, 3, 3, 3} units respectively; conv1 to conv4_3 serve as the shared network. Then comes the RPN, i.e. the region proposal network, shown in FIG. 4: the RPN takes an image of any scale as input and outputs a set of rectangular target proposal boxes, each box comprising 4 position coordinate variables and a score; the targets of the rectangular target proposal boxes are vehicle objects. To generate region proposal boxes, a small network slides over the convolutional feature map output by the last shared convolutional layer; this network is fully connected to an n × n spatial window of the input convolutional feature map. Each sliding window is mapped to a low-dimensional vector, one sliding window of each feature map corresponding to one value; this vector is output to two sibling fully-connected layers.
The estimated probability that each proposal box is a target/non-target comes from a classification layer realized by a two-class softmax layer; the k proposal boxes are parameterized relative to k corresponding reference boxes called anchors.
Each anchor is centered at the center of the current sliding window and corresponds to one scale and one aspect ratio; using 3 scales and 3 aspect ratios yields k = 9 anchors at each sliding position.
To train the RPN, each anchor is assigned a binary label marking whether it is a target. Positive labels are assigned to two types of anchors: (I) the anchor with the highest Intersection-over-Union (IoU) overlap with a real target bounding box, i.e. the ground truth (GT); (II) anchors whose IoU overlap with any GT bounding box is greater than 0.7. Note that one GT bounding box may assign positive labels to multiple anchors. Negative labels are assigned to anchors whose IoU with all GT bounding boxes is below 0.3; anchors that are neither positive nor negative have no effect on the training objective and are discarded.
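The anchor labeling rule above can be illustrated with a short sketch (an illustrative reconstruction in Python, not code from the patent; the box representation and helper names are assumptions):

```python
import numpy as np

def iou(a, g):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], g[0]), max(a[1], g[1])
    ix2, iy2 = min(a[2], g[2]), min(a[3], g[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_g = (g[2] - g[0]) * (g[3] - g[1])
    return inter / (area_a + area_g - inter)

def label_anchors(anchors, gt_boxes, pos_iou=0.7, neg_iou=0.3):
    """Return +1 (vehicle), 0 (background) or -1 (discarded) per anchor."""
    overlaps = np.array([[iou(a, g) for g in gt_boxes] for a in anchors])
    labels = -np.ones(len(anchors), dtype=int)       # neither pos nor neg: discarded
    labels[overlaps.max(axis=1) < neg_iou] = 0       # IoU < 0.3 with all GT boxes
    labels[overlaps.max(axis=1) > pos_iou] = 1       # rule (II): IoU > 0.7 with some GT
    labels[overlaps.argmax(axis=0)] = 1              # rule (I): best anchor per GT box
    return labels
```

Applying rule (I) last ensures that the highest-IoU anchor of each GT box is kept positive even if its overlap falls below 0.7.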
With these definitions, the objective function is minimized following the multi-task loss in Faster R-CNN; the loss function for an image is defined as

L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)   (1)

where i is the index of an anchor, p_i is the predicted probability that the i-th anchor is a target, and the ground-truth label p_i^* is 1 if the anchor is positive and 0 if it is negative; t_i is a vector representing the 4 parameterized coordinates of the predicted bounding box, and t_i^* is the coordinate vector of the GT bounding box corresponding to a positive anchor; \lambda is a balance weight, here \lambda = 10; N_{cls} normalizes the cls term by the mini-batch size, here N_{cls} = 256; N_{reg} normalizes the reg term by the number of anchor positions, here N_{reg} = 2400. The classification loss L_{cls} is the log loss over the two classes, motor-vehicle target vs. road background:

L_{cls}(p_i, p_i^*) = -\log\left[ p_i^* p_i + (1 - p_i^*)(1 - p_i) \right]   (2)

The regression loss L_{reg} is defined by the following function:

L_{reg}(t_i, t_i^*) = R(t_i - t_i^*)   (3)

where R is a robust loss function, the smooth L_1 loss calculated by equation (4):

\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5 x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}   (4)

where x is a variable.
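A minimal numerical sketch of equations (1) to (4) follows (per-anchor array shapes and function names are illustrative assumptions, not from the patent):

```python
import numpy as np

def smooth_l1(x):
    """Robust smooth-L1 loss of equation (4), applied elementwise."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * ax ** 2, ax - 0.5)

def rpn_loss(p, p_star, t, t_star, lam=10.0, n_cls=256, n_reg=2400):
    """Multi-task loss of equation (1) for one image.
    p: (N,) predicted target probabilities, p_star: (N,) binary GT labels,
    t, t_star: (N, 4) predicted / GT parameterized box coordinates."""
    p = np.clip(p, 1e-7, 1 - 1e-7)                                            # avoid log(0)
    cls = -np.sum(p_star * np.log(p) + (1 - p_star) * np.log(1 - p)) / n_cls  # eq. (2)
    reg = np.sum(p_star[:, None] * smooth_l1(t - t_star)) / n_reg             # eq. (3), positives only
    return cls + lam * reg                                                    # eq. (1)
```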
The multi-task deep convolutional neural network is shown in FIG. 5. In order to integrate multiple tasks for learning and training, designing a multi-task objective function is crucial; the multi-task objective function is expressed by formula (5):

\min_{\{w^t\}_{t=1}^{T}} \sum_{t=1}^{T} \left[ \sum_{i=1}^{N} L\left( y_i^t, f(x_i^t; w^t) \right) + \Phi(w^t) \right]   (5)

where f(x_i^t; w^t) is a function of the input feature vector x_i^t and the weight parameter w^t, L(\cdot) is a loss function, \Phi(w^t) is the regularization value of the weight parameters, and T is the total number of tasks; the training data of the t-th task is recorded as \{(x_i^t, y_i^t)\}, where t \in (1, T), i \in (1, N), N is the total number of training samples, and x_i^t and y_i^t respectively represent the feature vector and the label of the i-th sample.

For the loss function, softmax is used with the log-likelihood cost function to train the features of the last layer and realize multi-task image classification; the softmax loss function is defined by formula (6):

L_{softmax} = -\frac{1}{m} \sum_{i=1}^{m} \log \frac{ e^{ W_{y_i}^{T} x_i + b_{y_i} } }{ \sum_{j=1}^{n} e^{ W_j^{T} x_i + b_j } }   (6)

where x_i is the i-th depth feature with label y_i, W_j is the j-th column of the weights in the last fully-connected layer, b is the bias term, and m and n are the number of processed samples and the number of classes, respectively.
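Equation (6) corresponds to the standard softmax log-likelihood cost; a sketch (the array shapes are assumptions):

```python
import numpy as np

def softmax_loss(x, y, W, b):
    """Softmax log-likelihood cost of equation (6).
    x: (m, d) depth features, y: (m,) integer class labels,
    W: (d, n) last fully-connected weights, b: (n,) bias."""
    logits = x @ W + b                                  # W_j^T x_i + b_j
    logits -= logits.max(axis=1, keepdims=True)         # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(y)), y].mean()       # averaged over the m samples
```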
The feature fusion method of the segmented compact hash codes and the instance features is shown in FIG. 1. On the one hand, in the vehicle image feature extraction stage, a softmax activation function first limits the values to the threshold range [0, 1]; then a segmented threshold function promotes the output of binary hash codes, and a segmented learning and coding strategy reduces the redundancy among hash codes to improve feature robustness; finally, the hash codes obtained by segmented learning are fused by feature fusion, yielding the vehicle-feature segmented compact hash code.
On the other hand, for the instance features of the vehicle: inspired by image pyramid techniques, the vehicle instance features extracted from the convolutional layers are further fused with the compact features extracted by the multi-task deep learning vehicle retrieval network, making the retrieval result more accurate and reliable. The implementation is: the last unit of each shared stacked convolution module from conv2_x to conv5_x is combined with the output of the RPN, and a pyramid pooling layer and a vector flattening layer are added to accommodate convolutional feature map inputs of different sizes while flattening the convolved three-dimensional features into one-dimensional feature vectors, referred to as the instance features of the vehicle.
Finally, the vehicle segmented compact hash code features and the instance features are fused again to obtain the feature vector used for retrieval.
The vehicle-feature segmented compact hash code is realized by the following method. There are T tasks in total, and c^t classes exist under each task; m^t denotes the fully-connected output vector of each task, and the softmax activation function maps the fully-connected layer output into [0, 1], calculated by formula (7):

q_j^t = \frac{ e^{ \theta_j^T m^t } }{ \sum_{k=1}^{c^t} e^{ \theta_k^T m^t } }, \quad j = 1, \ldots, c^t   (7)

where \theta represents a random hyperplane, m^t represents the fully-connected output vector of each task, c^t represents the number of classes under each task, and q^t represents the fully-connected layer output.

For the binary output of the excitation segmented coding module, a threshold segmentation function is used for binarization:

H_j^t = \begin{cases} 1, & q_j^t \ge 0.5 \\ 0, & q_j^t < 0.5 \end{cases}   (8)

where q^t represents the fully-connected layer output and H^t represents the binary output of the excitation segmented coding module.

Finally, the H^t are fused into the vehicle segmented compact hash code vector f_A:

f_A = [\alpha^1 H^1; \alpha^2 H^2; \ldots; \alpha^T H^T]   (9)

where f_A represents the vehicle segmented compact hash code vector, H^t represents the binary output of the excitation segmented coding module, t \in (1, T), and \alpha^t represents a weighting coefficient calculated by equation (10); multiplying each H vector by the coefficient \alpha^t compensates for the errors caused by the uneven class distribution among the different tasks.
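The per-task pipeline of equations (7) to (9) can be sketched as follows (the 0.5 binarization threshold and the equal task weights in the example are assumptions; equation (10) for alpha^t is not reproduced here):

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def segmented_compact_hash(task_outputs, alphas, thresh=0.5):
    """Fuse per-task binary codes into the vehicle hash vector f_A (eq. (9)).
    task_outputs: list of T fully-connected output vectors m^t;
    alphas: list of T weighting coefficients alpha^t (eq. (10))."""
    segments = []
    for m_t, a_t in zip(task_outputs, alphas):
        q_t = softmax(m_t)                      # eq. (7): squash into [0, 1]
        h_t = (q_t >= thresh).astype(float)     # eq. (8): threshold segmentation
        segments.append(a_t * h_t)              # weight this task's segment
    return np.concatenate(segments)             # f_A = [a^1 H^1; ...; a^T H^T]

# e.g. three tasks (type / colour / brand) with 4, 8 and 16 classes:
f_A = segmented_compact_hash(
    [np.random.randn(4), np.random.randn(8), np.random.randn(16)],
    alphas=[1.0, 1.0, 1.0])
```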
The feature vector used for retrieval is obtained by fusing the vehicle segmented compact hash code features and the instance features; the specific implementation is as follows:
output feature-map sizes of \{4^2, 8^2, 16^2, 16^2\} are selected for the deepest units of conv2_x through conv5_x, respectively. For a given input image I of size h \times w, the activation of a convolution module convx_x is a three-dimensional tensor T of size h' \times w' \times d, containing a series of two-dimensional feature maps S = \{S_n\}, n \in (1, d), where S_n has size h' \times w' and corresponds to the feature map of the n-th channel. T is fed into the pyramid pooling layer to obtain a three-dimensional tensor T' of size l \times l \times d, still containing a series of feature maps S' = \{S'_n\}, n \in (1, d), each S'_n of size l \times l. Each S'_n is traversed by a sliding window of size k \times k that selects the maximum value, so that S'_n becomes l/k \times l/k; the S'_n of each channel is then fused into a one-dimensional vector, the same operation is performed on the d channels in turn, and finally the individual feature vector f_B of size (1, l/k \times d) is obtained. The final retrieval feature vector f is calculated as shown in formula (11):

f = [f_A; f_B]   (11)

where f is the feature vector for vehicle retrieval, f_B is the instance feature vector, i.e. the individuality feature vector, and f_A represents the vehicle segmented compact hash code vector.
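A sketch of the instance-feature path described above (the adaptive pooling and the per-channel fusion by a row-wise maximum are assumptions, chosen so the output size matches the stated (1, l/k × d)):

```python
import numpy as np

def pyramid_instance_feature(t, l=16, k=4):
    """Pool a conv activation t of shape (h', w', d) to (l, l, d), slide a
    k x k max window to get (l/k, l/k, d), fuse each channel to a vector
    and flatten to f_B of length l/k * d."""
    h, w, d = t.shape
    pooled = np.empty((l, l, d))
    for i in range(l):                                   # pyramid (adaptive max) pooling
        for j in range(l):
            ys, ye = i * h // l, max((i + 1) * h // l, i * h // l + 1)
            xs, xe = j * w // l, max((j + 1) * w // l, j * w // l + 1)
            pooled[i, j] = t[ys:ye, xs:xe].max(axis=(0, 1))
    s = pooled.reshape(l // k, k, l // k, k, d).max(axis=(1, 3))  # k x k sliding max
    return s.max(axis=0).reshape(-1)                     # per-channel fusion, size l/k * d

f_B = pyramid_instance_feature(np.random.rand(32, 32, 8))
f = np.concatenate([np.ones(16), f_B])                   # eq. (11): f = [f_A; f_B]
```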
The locality-sensitive hash reordering algorithm for improving search performance is shown in FIG. 2; the idea of the algorithm is to map similar samples into the same bucket with high probability. The hash function h(\cdot) of the locality-sensitive hash satisfies the following condition:

P\{ h(f_{Aq}) = h(f_A) \} = \mathrm{sim}(f_{Aq}, f_A)   (12)

where \mathrm{sim}(f_{Aq}, f_A) denotes the similarity of f_{Aq} and f_A, and h(f_A) and h(f_{Aq}) denote the hashes of f_A and f_{Aq}; the similarity measure is directly related to a distance function \sigma, calculated by equation (13):

\mathrm{sim}(f_{Aq}, f_A) = 1 - \sigma(f_{Aq}, f_A)   (13)

A typical class of locality-sensitive hash functions is given by random projection and thresholding, computed by equation (14):

h(f_A) = \mathrm{sign}(W f_A + b)   (14)

where W is a random hyperplane vector and b is a random intercept.
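A minimal bucket index implementing equation (14) (the bit width, the seed and the {0,1} key encoding are assumptions):

```python
import numpy as np

class RandomProjectionLSH:
    """Random-projection LSH of eq. (14): h(f_A) = sign(W f_A + b)."""
    def __init__(self, dim, n_bits=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((n_bits, dim))   # random hyperplanes
        self.b = rng.standard_normal(n_bits)          # random intercepts
        self.buckets = {}

    def key(self, f_A):
        # sign(.) mapped to a {0,1} tuple so it can serve as a dict key
        return tuple((self.W @ f_A + self.b >= 0).astype(int))

    def insert(self, f_A, image_id):
        self.buckets.setdefault(self.key(f_A), []).append(image_id)

    def query(self, f_Aq):
        return self.buckets.get(self.key(f_Aq), [])   # candidates from one bucket
```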
In the feature fusion method of the segmented compact hash codes and the instance features, in order to bring similar images closer, after the query image has been mapped into a similarity bucket through its segmented compact hash code, the images returned from the bucket are reordered using their instance features in combination with formula (15); the reordering calculation is shown in equation (15):

dis_k = y \cdot \cos(f_{Bq}, f_B^k) + (1 - y) \cdot \varepsilon \cdot \cos(f_{Bq}, f_B^k)   (15)

where k denotes the k-th image in the bucket, \varepsilon denotes a penalty factor with \varepsilon > 1, \cos denotes the cosine distance formula, and y indicates whether the pre-mapping codes f_{Aq} and f_A^k are equal: y is 1 if they are equal and 0 otherwise; f_A^k represents the k-th image's vehicle segmented compact hash code vector and f_{Aq} represents the query's vehicle segmented compact hash code vector.
The purpose of the added coefficient is to preserve the correctness of the LSH mapping: the similarity of the instance feature vectors is computed directly when the segmented compact hash codes are identical, and when different segmented compact hash codes are mapped into the same bucket, the penalty factor \varepsilon lengthens the distance between an erroneous retrieval result and the input query image; a smaller dis indicates a higher similarity.
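The reordering of equation (15) can be sketched as follows (the penalty value eps = 2.0 is an assumption, since its exact value is not stated):

```python
import numpy as np

def cosine_distance(u, v):
    return 1.0 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def reorder(f_Aq, f_Bq, candidates, eps=2.0):
    """Re-rank bucket candidates by eq. (15); each candidate is a pair
    (f_A_k, f_B_k) of hash code and instance feature. eps > 1 penalises
    candidates whose pre-mapping hash code differs from the query's."""
    dis = []
    for f_A_k, f_B_k in candidates:
        y = 1.0 if np.array_equal(f_Aq, f_A_k) else 0.0   # same hash code?
        dis.append((y + (1.0 - y) * eps) * cosine_distance(f_Bq, f_B_k))
    return np.argsort(dis)                                # smaller dis = more similar
```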
The cross-modal retrieval method constructs a group of deep neural networks to map image and text data into a common semantic space by feature learning, so as to realize the semantic coupling of data of different modalities. A deep convolutional neural network extracts the semantic features of the image modality directly from the input image; the text is represented by word vectors, and a one-dimensional convolutional neural network extracts the semantic features of the text modality from the word vector representation. First, the segmented compact hash f_A of the vehicle is dynamically generated by the deep convolutional neural network; then a retrieval feature vector is generated from the text, so that the feature vectors generated from both can be searched with the same retrieval system. The specific implementation process is shown in FIG. 3.
The semantic feature of the text modality is a feature vector extracted from the text; as the first step of the extraction algorithm, the text must first be split. The feature vector of the text comes from the text's terms, and the method specifically comprises the following steps:
Input: a text O; Output: a set of roughly similar images;
STEP 1: initialization: (1) parse the text file into a term vector; (2) remove stop words and repeated words; (3) check the terms to ensure the correctness of the parsing;
STEP 2: take the minimal randomly combined term vector R = (r_1, r_2, \ldots, r_n) from O;
STEP 3: integrate R and f_A by sequential and segmented compact hash coding to obtain the text attribute feature \hat{f}_{ATxt}; at this point f_{ATxt} has a dimension smaller than that of R;
STEP 4: search using the locality-sensitive reordering hash algorithm;
STEP 5: return the similar image group I.
The text attribute feature function \hat{f}_{ATxt} is expressed by equation (16):

\hat{f}_{ATxt} = \mathrm{sign}(A^T R)   (16)

where A^T is the transpose of the vehicle segmented compact hash code, R is the minimal vector of randomly combined terms, \hat{f}_{ATxt} is the text attribute feature function, and sign denotes the sign function;

R = \mathrm{diag}(r_1, r_2, \ldots, r_n)   (17)

where diag denotes forming a diagonal matrix from the feature vector extracted from the text, and the vehicle segmented compact hash code A^T is initialized to the all-ones vector of size (1 \times c).
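A loose sketch of STEPs 1 to 3 and equation (16) (the vocabulary, the term-to-bit coupling matrix A, and the thresholding of sign to {0,1} are all assumptions for illustration):

```python
import numpy as np

def text_attribute_feature(terms, vocab, A):
    """Map parsed query terms to a binary code in the hash space, eq. (16).
    terms: entry words parsed from the text; vocab: term -> index;
    A: (len(vocab), c) coupling matrix, initialised to all ones."""
    r = np.zeros(len(vocab))
    for w in set(terms):                        # de-duplicated entries (STEP 1)
        if w in vocab:
            r[vocab[w]] = 1.0                   # minimal term vector R (STEP 2)
    return np.where(A.T @ r > 0, 1.0, 0.0)      # f_ATxt = sign(A^T R), as a binary code

vocab = {"red": 0, "suv": 1, "sedan": 2}
A = np.ones((len(vocab), 8))                    # all-ones initialisation, c = 8 bits
code = text_attribute_feature(["red", "suv"], vocab, A)
```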
The above description covers only preferred embodiments of the present invention and is not intended to limit the invention; any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.
Claims (9)
1. A fast hash vehicle retrieval method based on multi-task deep learning, characterized in that the vehicle retrieval method comprises the following steps:
first, constructing a multi-task deep convolutional neural network for deep learning and training of recognition;
second, adopting a feature fusion method of segmented compact hash codes and instance features;
third, adopting a locality-sensitive hash reordering algorithm;
fourth, adopting a cross-modal retrieval method to realize vehicle retrieval;
in the multi-task deep convolutional neural network, designing a multi-task objective function is crucial; the multi-task objective function is expressed by formula (5):

\min_{\{w^t\}_{t=1}^{T}} \sum_{t=1}^{T} \left[ \sum_{i=1}^{N} L\left( y_i^t, f(x_i^t; w^t) \right) + \Phi(w^t) \right]   (5)

where f(x_i^t; w^t) is a function of the input feature vector x_i^t and the weight parameter w^t, L(\cdot) is a loss function, \Phi(w^t) is the regularization value of the weight parameters, and T is the total number of tasks; the training data of the t-th task is recorded as \{(x_i^t, y_i^t)\}, where t \in (1, T), i \in (1, N), N is the total number of training samples, and x_i^t and y_i^t respectively represent the feature vector and the label of the i-th sample;
for the loss function, softmax is used with the log-likelihood cost function to train the features of the last layer and realize multi-task image classification; the softmax loss function is defined by formula (6):

L_{softmax} = -\frac{1}{m} \sum_{i=1}^{m} \log \frac{ e^{ W_{y_i}^{T} x_i + b_{y_i} } }{ \sum_{j=1}^{n} e^{ W_j^{T} x_i + b_j } }   (6)

where x_i is the i-th depth feature with label y_i, W_j is the j-th column of the weights in the last fully-connected layer, b is the bias term, and m and n are the number of processed samples and the number of classes, respectively.
2. The fast hash vehicle retrieval method based on multi-task deep learning according to claim 1, characterized in that: in the first step, Faster R-CNN is used as the basic network of the multi-task convolutional neural network; the front of the network is a 3 × 3 convolutional layer called conv1, followed by 4 stacked convolution modules named conv2_x to conv5_x, the modules containing {2, 3, 3, 3} units respectively, with conv1 to conv4_3 serving as the shared network; then comes the RPN, i.e. the region proposal network: the RPN takes an image of any scale as input and outputs a set of rectangular target proposal boxes, each box comprising 4 position coordinate variables and a score, the targets of the rectangular target proposal boxes being vehicle objects; to generate region proposal boxes, a small network slides over the convolutional feature map output by the last shared convolutional layer, this network being fully connected to an n × n spatial window of the input convolutional feature map; each sliding window is mapped to a low-dimensional vector, one sliding window of each feature map corresponding to one value, and this vector is output to two sibling fully-connected layers;
the estimated probability that each proposal box is a target/non-target comes from a classification layer realized by a two-class softmax layer; the k proposal boxes are parameterized relative to k corresponding reference boxes called anchors;
each anchor is centered at the center of the current sliding window and corresponds to one scale and one aspect ratio; using 3 scales and 3 aspect ratios yields k = 9 anchors at each sliding position;
to train the RPN, each anchor is assigned a binary label marking whether it is a target; positive labels are assigned to two types of anchors: (I) the anchor with the highest Intersection-over-Union (IoU) overlap with a real target bounding box, i.e. the ground truth (GT); (II) anchors whose IoU overlap with any GT bounding box is greater than 0.7; note that one GT bounding box may assign positive labels to multiple anchors; negative labels are assigned to anchors whose IoU with all GT bounding boxes is below 0.3; anchors that are neither positive nor negative have no effect on the training objective and are discarded;
following the multi-task loss in Faster R-CNN, the objective function is minimized; the loss function for an image is defined as

L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)   (1)

where i is the index of an anchor, p_i is the predicted probability that the i-th anchor is a target, and the ground-truth label p_i^* is 1 if the anchor is positive and 0 if it is negative; t_i is a vector representing the 4 parameterized coordinates of the predicted bounding box, and t_i^* is the coordinate vector of the GT bounding box corresponding to a positive anchor; \lambda is a balance weight, N_{cls} normalizes the cls term by the mini-batch size, and N_{reg} normalizes the reg term by the number of anchor positions; the classification loss L_{cls} is the log loss over the two classes, motor-vehicle target vs. road background:

L_{cls}(p_i, p_i^*) = -\log\left[ p_i^* p_i + (1 - p_i^*)(1 - p_i) \right]   (2)

the regression loss L_{reg} is defined by the following function:

L_{reg}(t_i, t_i^*) = R(t_i - t_i^*)   (3)

where R is a robust loss function, the smooth L_1 loss calculated by equation (4):

\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5 x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}   (4)

where x is a variable.
3. The fast hash vehicle retrieval method based on multi-task deep learning according to claim 1 or 2, characterized in that: in the second step, the feature fusion method of the segmented compact hash codes and the instance features comprises the following steps:
in the vehicle image feature extraction stage, a softmax activation function first limits the values to the threshold range [0, 1]; then a segmented threshold function promotes the output of binary hash codes, and a segmented learning and coding strategy reduces the redundancy among hash codes to improve feature robustness; finally, the hash codes obtained by segmented learning are fused by feature fusion, yielding the vehicle-feature segmented compact hash code;
for the instance features of a vehicle, the implementation is: the last unit of each shared stacked convolution module from conv2_x to conv5_x is combined with the output of the RPN, and a pyramid pooling layer and a vector flattening layer are added to accommodate convolutional feature map inputs of different sizes while flattening the convolved three-dimensional features into one-dimensional feature vectors, referred to as the instance features of the vehicle;
finally, the vehicle segmented compact hash code features and the instance features are fused again to obtain the feature vector used for retrieval.
4. The fast hash vehicle retrieval method based on multi-task deep learning according to claim 3, characterized in that: the vehicle-feature segmented compact hash code is realized by the following method; there are T tasks in total, and c^t classes exist under each task; m^t denotes the fully-connected output vector of each task, and the softmax activation function maps the fully-connected layer output into [0, 1], calculated by formula (7):

q_j^t = \frac{ e^{ \theta_j^T m^t } }{ \sum_{k=1}^{c^t} e^{ \theta_k^T m^t } }, \quad j = 1, \ldots, c^t   (7)

where \theta represents a random hyperplane, m^t represents the fully-connected output vector of each task, c^t represents the number of classes under each task, and q^t represents the fully-connected layer output;
for the binary output of the excitation segmented coding module, a threshold segmentation function is used for binarization:

H_j^t = \begin{cases} 1, & q_j^t \ge 0.5 \\ 0, & q_j^t < 0.5 \end{cases}   (8)

where q^t represents the fully-connected layer output and H^t represents the binary output of the excitation segmented coding module;
finally, the H^t are fused into the vehicle segmented compact hash code vector f_A:

f_A = [\alpha^1 H^1; \alpha^2 H^2; \ldots; \alpha^T H^T]   (9)

where f_A represents the vehicle segmented compact hash code vector, H^t represents the binary output of the excitation segmented coding module, t \in (1, T), and \alpha^t represents a weighting coefficient calculated by equation (10); multiplying each H vector by the coefficient \alpha^t compensates for the errors caused by the uneven class distribution among the different tasks.
5. The fast hash vehicle retrieval method based on multitask deep learning according to claim 4, wherein: in the third step, the feature vector for retrieval is obtained by fusing the compact features and the example features of the vehicle segment compact hash code, and the process is as follows:
the deepest layers of conv2_x through conv5_x are selected with output feature map sizes of {4^2, 8^2, 16^2, 16^2}, respectively; for a given input image I of size h × w, the activation of a convolution layer convx_x is a three-dimensional tensor T of size h′ × w′ × d containing a series of two-dimensional feature maps S = {S_n}, n ∈ (1, d), where S_n is the feature map of the n-th channel, of size h′ × w′; T is fed into the pyramid pooling layer to obtain a three-dimensional tensor T′ of size l × l × d, which still contains a series of feature maps S′ = {S′_n}, n ∈ (1, d), each S′_n of size l × l; each S′_n is traversed by a sliding window of size k × k that selects the maximum value, so that the size of S′_n becomes l/k × l/k; the S′_n of each channel is then fused into a one-dimensional vector, the same operation is performed on the d channels in turn, and the individual feature vector f_B of size (1, l/k × d) is finally obtained; the final retrieval feature vector f is calculated by the method shown in equation (11):
f = [f_A; f_B]    (11)
where f is the feature vector used for vehicle retrieval, f_B is the instance feature vector (the individual feature vector), and f_A denotes the vehicle segmented compact hash code vector.
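The pooling chain of this claim can be sketched as follows; the claim does not fully specify the per-channel "fusing into a one-dimensional vector", so the row-wise max used here is a labeled assumption, and the k × k window is read as stride-k (non-overlapping), which matches the stated l/k × l/k output size:

```python
import numpy as np

def instance_feature(t_prime, k=2):
    # t_prime: pooled tensor T' of shape (l, l, d); l must be divisible by k.
    l, _, d = t_prime.shape
    channels = []
    for n in range(d):
        s_n = t_prime[:, :, n]
        # Non-overlapping k x k max windows: (l, l) -> (l/k, l/k).
        pooled = s_n.reshape(l // k, k, l // k, k).max(axis=(1, 3))
        # Assumed per-channel fusion: reduce each channel to a (l/k,) vector.
        channels.append(pooled.max(axis=0))
    return np.concatenate(channels)   # f_B of length (l/k) * d
```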
6. The fast hash vehicle retrieval method based on multitask deep learning according to claim 5, wherein in the third step, similar samples are mapped into the same bucket with high probability; the hash function h(·) of the locality-sensitive hash satisfies the following condition:
P{h(f_Aq) = h(f_A)} = sim(f_Aq, f_A)    (12)
where sim(f_Aq, f_A) denotes the similarity between f_Aq and f_A, h(f_A) denotes the hash of f_A, and h(f_Aq) denotes the hash of f_Aq; the similarity measure is directly related to a distance function σ, calculated by equation (13);
a typical class of locality-sensitive hash functions is obtained by random projection and thresholding, calculated by equation (14):
h(f_A) = sign(W f_A + b)    (14)
where W is a random hyperplane vector and b is a random intercept.
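Equation (14) admits a direct sketch; the bit width and the use of a byte key for bucketing are illustrative choices:

```python
import numpy as np

def lsh_bits(f_a, W, b):
    # Random projection + threshold (equation (14)): one bit per hyperplane,
    # so similar f_A vectors fall into the same bucket with high probability.
    return (W @ f_a + b > 0).astype(np.uint8)

rng = np.random.default_rng(0)
dim, n_bits = 64, 16                      # illustrative sizes
W = rng.standard_normal((n_bits, dim))    # random hyperplane vectors
b = rng.standard_normal(n_bits)           # random intercepts
bucket_key = lsh_bits(rng.standard_normal(dim), W, b).tobytes()
```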
7. The multitask deep learning based fast hash vehicle retrieval method according to claim 6, wherein in the third step, after the query image is mapped into a similarity bucket by its segmented compact hash code, the images returned from the bucket are reordered using their instance features; the reordering is calculated as shown in equation (15):
where k denotes the k-th image in the bucket; the penalty factor lengthens the distance between an erroneous retrieval result and the input query image; cos denotes the cosine distance; y indicates whether f_Aq and f_A^(k) are equal before mapping, with y = 1 if equal and y = 0 otherwise; f_A^(k) denotes the segmented compact hash code vector of the k-th image and f_Aq denotes that of the query; a smaller dis indicates higher similarity.
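Since the exact form of equation (15) is not recoverable from the text, the sketch below is only a hedged reading: cosine distance on instance features, with an additive penalty applied when the candidate's segmented hash differs from the query's; the penalty placement and value are assumptions:

```python
import numpy as np

def rerank(f_bq, f_aq, bucket, penalty=2.0):
    # bucket: list of (f_b_k, f_a_k) pairs for the images returned from the bucket.
    dists = []
    for f_b_k, f_a_k in bucket:
        y = 1.0 if np.array_equal(f_aq, f_a_k) else 0.0
        cos_dist = 1.0 - (f_bq @ f_b_k) / (np.linalg.norm(f_bq)
                                           * np.linalg.norm(f_b_k))
        dists.append(cos_dist + (1.0 - y) * penalty)  # smaller dis = more similar
    return np.argsort(dists)   # indices of bucket images, best first
```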
8. The fast hash vehicle retrieval method based on multitask deep learning according to claim 1 or 2, characterized in that in the fourth step, the cross-modal retrieval method constructs a group of deep neural networks that map image and text data into a common semantic space by feature learning, so as to achieve semantic coupling of the different modalities; a deep convolutional neural network extracts the semantic features of the image modality directly from the input image, the text is represented by word vectors, and a one-dimensional convolutional neural network extracts the semantic features of the text modality from the word-vector representation; first, the segmented compact hash code f_A of the vehicle is dynamically generated by the deep convolutional neural network; then, a retrieval feature vector is generated from the text, so that the feature vectors generated from the two modalities can be retrieved with the same retrieval system.
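A minimal PyTorch sketch of the text branch described here (a 1-D convolution over word vectors); all layer sizes are illustrative assumptions, and the sigmoid stands in for the [0,1]-limiting activation so the output can be thresholded like the image-side hash:

```python
import torch
import torch.nn as nn

class TextSemanticBranch(nn.Module):
    # Maps a sequence of word vectors to a hash-sized feature in [0, 1],
    # so text and image features live in the same retrieval space.
    def __init__(self, emb_dim=300, n_bits=48):
        super().__init__()
        self.conv = nn.Conv1d(emb_dim, 256, kernel_size=3, padding=1)
        self.fc = nn.Linear(256, n_bits)

    def forward(self, word_vecs):                  # (batch, seq_len, emb_dim)
        x = self.conv(word_vecs.transpose(1, 2))   # (batch, 256, seq_len)
        x = torch.relu(x).max(dim=2).values        # global max over the sequence
        return torch.sigmoid(self.fc(x))           # in [0, 1], thresholdable
```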
9. The multitask deep learning based fast hash vehicle retrieval method according to claim 8, wherein the semantic features of the text modality are feature vectors extracted from the text; as the first step of the extraction algorithm, the text must first be split; the feature vector of the text comes from the terms of the text, and the method comprises the following steps (a sketch follows the claim):
Input: a text O; Output: a set of roughly similar images;
STEP 1: initialization: (1) parse the text file into a term vector; (2) remove short words and repeated words; (3) check the terms to ensure the correctness of the parsing;
STEP 2: take the minimal vector R of randomly combined terms from O: R = (r_1, r_2, ..., r_n);
STEP 3: integrate R and f_A by sequential and segmented compact hash coding to obtain the text attribute feature f_ATxt; at this point the dimension of f_ATxt is smaller than that of R;
STEP 4: retrieve using the locality-sensitive reordering hash algorithm;
STEP 5: return the similar image group I;
where the text attribute feature function f_ATxt is expressed by equation (16):
where A^T denotes the transposed matrix of the vehicle segmented compact hash code, R denotes the minimal vector of randomly combined terms, f_ATxt is the text attribute feature function, and sign denotes the sign function;
where diag denotes taking a diagonal matrix, the inner expression denotes the feature vector extracted from the text, and the vehicle segmented compact hash code A^T is initialized to the all-ones vector of size (1 × c).
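STEP 1 through STEP 3 can be sketched as below; the tokenization, the scalar term weights, and the sign(A · R) reading of equation (16) are all labeled assumptions, since the exact diag(·) construction is not recoverable from the text (per the claim, A starts as an all-ones vector):

```python
import numpy as np

def text_attribute_feature(text, A, term_weight):
    # STEP 1: parse the text into terms, dropping short and repeated words.
    terms = [w for w in text.lower().split() if len(w) > 2]
    terms = list(dict.fromkeys(terms))            # de-duplicate, keep order
    # STEP 2: minimal term vector R (scalar weights are an assumed stand-in
    # for the claim's term representation).
    r = np.array([term_weight.get(w, 0.0) for w in terms], dtype=float)
    n = A.shape[1]
    r = r[:n] if r.size >= n else np.pad(r, (0, n - r.size))
    # STEP 3: hedged reading of equation (16): project through the hash
    # matrix and binarize with the sign function.
    return np.sign(A @ r)
```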
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
CN201710857318.5A | 2017-09-21 | 2017-09-21 | Rapid Hash vehicle retrieval method based on multitask deep learning
Publications (2)
Publication Number | Publication Date
CN107885764A | 2018-04-06
CN107885764B | 2020-12-18
Family
ID=61780800
Family Applications (1)
Application Number | Title | Priority Date | Filing Date
CN201710857318.5A (Active) | Rapid Hash vehicle retrieval method based on multitask deep learning | 2017-09-21 | 2017-09-21
Country Status (1)
Country | Link
CN | CN107885764B (en)
Legal Events
Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant