CN107885764B - Rapid Hash vehicle retrieval method based on multitask deep learning

Info

Publication number: CN107885764B
Application number: CN201710857318.5A
Authority: CN (China)
Prior art keywords: vehicle, vector, retrieval, feature, hash
Legal status: Active
Other languages: Chinese (zh)
Other versions: CN107885764A
Inventors: 汤一平, 温晓岳, 柳展, 张文广, 樊锦祥
Current Assignee: Enjoyor Co Ltd
Original Assignee: Enjoyor Co Ltd
Application filed by Enjoyor Co Ltd
Priority to CN201710857318.5A
Publication of CN107885764A
Application granted
Publication of CN107885764B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5838Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using colour
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62Methods or arrangements for recognition using electronic means
    • G06K9/6267Classification techniques
    • G06K9/6268Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
    • G06K9/6277Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches based on a parametric (probabilistic) model, e.g. based on Neyman-Pearson lemma, likelihood ratio, Receiver Operating Characteristic [ROC] curve plotting a False Acceptance Rate [FAR] versus a False Reject Rate [FRR]
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06NCOMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computer systems based on biological models
    • G06N3/02Computer systems based on biological models using neural network models
    • G06N3/04Architectures, e.g. interconnection topology
    • G06N3/0454Architectures, e.g. interconnection topology using a combination of multiple neural nets

Abstract

A fast hash vehicle retrieval method based on multi-task deep learning comprises: a multi-task deep convolutional neural network for deep learning and recognition training; a feature fusion method combining segmented compact hash codes with instance features to improve retrieval precision and practicability; a locality-sensitive hash reordering algorithm to improve retrieval performance; and a cross-modal retrieval method to improve the robustness and accuracy of the retrieval engine. First, a method in which a multi-task deep convolutional network learns hash codes in segments is proposed: image semantics and image representation are combined, the relations between related tasks are exploited to improve retrieval precision and refine image features, and minimized image coding makes the learned vehicle features more robust. Second, a feature pyramid network is selected to extract instance features of the vehicle image. Third, the extracted features are retrieved with a locality-sensitive hash reordering method. Finally, a cross-modal auxiliary retrieval method is adopted for the special case in which no image of the query vehicle is available.

Description

Rapid Hash vehicle retrieval method based on multitask deep learning
Technical Field
The invention relates to the application of artificial intelligence, digital image processing, convolutional neural networks and computer vision in the field of public safety, and belongs to the field of intelligent transportation.
Background
Smart cities and intelligent transportation are developing rapidly, and the demand in public safety systems for large-scale image monitoring, vehicle identification in video databases and vehicle retrieval is growing just as quickly.
In the prior art, vehicle retrieval mainly extracts the license plate information of a target vehicle and then retrieves the vehicle according to that information. Typically, the license plate number is recognized in one monitored image, and vehicles bearing that number are then identified in other monitored images. Although retrieval by license plate number alone is easy to implement, it cannot effectively retrieve a vehicle whose license plate information is unavailable, such as a fake-plate vehicle.
Vehicle retrieval technology based on appearance features not only compensates for the limitations and defects of traditional license plate recognition, but also has great practical significance and broad application prospects in intelligent vehicle retrieval, especially in violation inspection, hit-and-run pursuit, locking onto criminal suspects' vehicles, fake-plate vehicle identification, and accelerating criminal investigation.
Existing vehicle retrieval methods basically use algorithms such as SIFT, SURF and DoG to extract whole-image features of a target vehicle image as the target features, use the same algorithm to extract whole-image features of every vehicle image in a database as the features to be matched, compute the Euclidean distance between the target features and each feature to be matched, and take the vehicle whose feature is closest in Euclidean distance as the target vehicle.
Vehicle retrieval requires finding a specific target vehicle among a series of vehicles with similar contours, which makes the task especially challenging; furthermore, the influence of practical conditions, such as the monitoring environment, weather and lighting, must also be taken into account.
In recent years, deep learning has developed rapidly in the field of computer vision. Deep learning can use a large number of training samples and hidden layers to learn the abstract information of an image layer by layer, acquiring image features more comprehensively and directly. A digital image is described by a matrix, and a convolutional neural network starts from local information blocks to describe the overall structure of the image, so convolutional neural networks are widely adopted for computer vision problems solved with deep learning. Around improving detection precision and detection time, deep convolutional neural network technology has progressed from R-CNN and Fast R-CNN to Faster R-CNN, with further gains in precision, speed, end-to-end operation and practicality, covering almost all fields from classification to detection, segmentation and localization. Applying deep learning technology to vehicle retrieval is therefore a research field with practical application value.
Reordering is commonly used in image retrieval to improve retrieval performance; for example, initial retrieval results may be reordered through the visual feature matching relationship between image pairs. However, the reordering effect depends strongly on whether the visual features used represent the image effectively enough.
In similar-vehicle search, many vehicles are close in appearance, so the extracted visual features are also similar and cannot distinguish different vehicles; a reordering method that directly uses the matching relationship between image pairs therefore retrieves similar vehicles poorly.
Query expansion is a common method in search technology to improve recall and accuracy. It adds new keywords to the original query and searches again: for example, a search engine runs the user's query once, selects suitable keywords from the retrieved files, and adds them to the query for a second search, thereby finding more related files. Query expansion can thus effectively improve the recall rate of information retrieval, but the prior art offers no query expansion method specific to vehicles in images.
Compared with the traditional vehicle retrieval method based on the license plate number, the method of Chinese patent application No. 201510744990.4 not only avoids dependence on license plate recognition accuracy but can also retrieve fake-plate and cloned-plate vehicles. However, it is still a computer vision technology of the pre-deep-learning era.
Chinese patent application No. 201610671729.0 discloses a vehicle retrieval method and device based on big data. The method includes: extracting brand features of the target vehicle in the target vehicle image; determining the probability that each pixel in the target vehicle image corresponds to each marker, where the markers include one or more of annual inspection marks, ornaments and hanging decorations; determining the position of each marker in the image from these probabilities and a per-marker probability threshold; extracting image features of each marker at its position; and searching for the target vehicle among the vehicle images to be searched according to the image features of each marker and the brand features of the target vehicle. Although this technology adopts deep learning, it is single-task deep learning, whereas vehicle retrieval is a typical multi-task deep learning problem.
Chinese patent application No. 201410381577.1 discloses a query expansion method and device for similar-vehicle retrieval. The method includes: determining the vehicle model of an image to be queried that contains a vehicle; selecting several sample images meeting preset conditions from the vehicle model template library corresponding to that model; and forming a query expansion image set from these samples so that they replace the original image when querying the target database. The method can improve the recall and accuracy of vehicle image retrieval, but it is an image retrieval technology of the early deep learning era.
Chinese patent application No. 201410652730.X discloses an image-based motor vehicle retrieval method and apparatus. The method includes: acquiring a first image containing the motor vehicle to be retrieved; determining a first appearance contour of that vehicle in the image; dividing the image within the contour into several regions and extracting the image features of each region with different step lengths; combining the features of all regions into overall image features of the vehicle; and comparing these with pre-extracted overall image features of the target vehicle to obtain a comparison result. This, too, is an image retrieval technology of the early deep learning era.
Disclosure of Invention
Aiming at the problem of how, in the big-data era, to efficiently use the massive video data generated in the field of public safety and improve vehicle retrieval efficiency, the invention provides a fast hash retrieval method based on multi-task deep learning that effectively uses the relevance among detection and identification tasks and the diversity of basic checkpoint vehicle information to achieve real-time retrieval; in sum, it provides a multi-task deep learning fast hash vehicle retrieval method with high retrieval precision and good robustness.
The technical scheme adopted by the invention for solving the technical problems is as follows:
a fast Hash vehicle retrieval method based on multitask deep learning comprises the following steps:
the method comprises the steps of firstly, constructing a multitask deep convolution neural network for deep learning and training identification;
secondly, adopting a feature fusion method of the segmented compact hash codes and the example features;
thirdly, adopting a locality sensitive hash reordering algorithm;
and fourthly, a cross-modal retrieval method is adopted to realize vehicle retrieval.
Further, in the first step, Faster R-CNN is adopted as the base network of the multi-task convolutional neural network. The front of the network is a 3 × 3 convolutional layer called conv1, followed by 4 stacked convolution modules named conv2_x to conv5_x, containing {2, 3, 3, 3} units respectively; conv1 through conv4_3 serve as the shared network. Next comes the RPN (region proposal network), which takes an image of any scale as input and outputs a set of rectangular target proposal boxes, each with 4 position coordinates and a score. To generate region proposals, a small network slides over the convolutional feature map output by the last shared convolutional layer; this network is fully connected to an n × n spatial window of the input feature map. Each sliding window is mapped to a low-dimensional vector (one value per feature map per sliding position), and this vector is fed into two sibling fully connected layers.
The RPN takes an image of any scale as input and outputs a set of rectangular target proposal boxes, each with 4 position coordinates and a score; the targets of the proposal boxes are vehicle objects.
The estimated probability that each proposal box is target/non-target is produced by a classification layer implemented with a two-class softmax layer. The k proposal boxes are parameterized relative to k corresponding reference boxes called anchors.
Each anchor is centered at the current sliding window and corresponds to one scale and one aspect ratio; using 3 scales and 3 aspect ratios yields k = 9 anchors at each sliding position.
To train the RPN, each anchor is assigned a binary label marking whether it is a target. Positive labels are assigned to two kinds of anchors: (I) the anchor with the highest Intersection-over-Union (IoU) overlap with a ground-truth (GT) bounding box; (II) any anchor with IoU overlap greater than 0.7 with some GT bounding box. Note that one GT bounding box may assign positive labels to multiple anchors. Negative labels are assigned to anchors whose IoU with all GT bounding boxes is below 0.3. Anchors that are neither positive nor negative contribute nothing to the training objective and are discarded.
Following the multi-task loss in Faster R-CNN, the objective function is minimized. The loss function for an image is defined as

$$L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}}\sum_i L_{cls}(p_i,p_i^*) + \lambda\frac{1}{N_{reg}}\sum_i p_i^* L_{reg}(t_i,t_i^*) \tag{1}$$

where $i$ is the index of an anchor, $p_i$ is the predicted probability that anchor $i$ is a target, and the GT label $p_i^*$ is 1 if the anchor is positive and 0 if it is negative; $t_i$ is a vector of the 4 parameterized coordinates of the predicted bounding box, and $t_i^*$ is the coordinate vector of the GT box corresponding to a positive anchor; $\lambda$ is a balance weight; $N_{cls}$ normalizes the cls term by the mini-batch size, and $N_{reg}$ normalizes the reg term by the number of anchor positions. The classification loss $L_{cls}$ is the log loss over two classes, motor-vehicle target vs. road background:

$$L_{cls}(p_i,p_i^*) = -\log\left[p_i^* p_i + (1-p_i^*)(1-p_i)\right] \tag{2}$$

The regression loss $L_{reg}$ is defined by

$$L_{reg}(t_i,t_i^*) = R(t_i - t_i^*) \tag{3}$$

where $R$ is the robust smooth $L_1$ loss function, computed by equation (4):

$$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases} \tag{4}$$

where $x$ is a variable.
Further, in the multi-task deep convolutional neural network it is crucial to design a multi-task objective function, expressed by formula (5):

$$\min_{\{w_t\}} \sum_{t=1}^{T}\sum_{i=1}^{N} L\big(f(x_i^t; w_t),\, y_i^t\big) + \Phi(w_t) \tag{5}$$

where $f(x_i^t; w_t)$ is the output for input feature vector $x_i^t$ under weight parameter $w_t$, $L(\cdot)$ is a loss function, $\Phi(w_t)$ is the regularization value of the weight parameter, and $T$ is the total number of tasks; the training data of the $t$-th task is $\{(x_i^t, y_i^t)\}$ with $t \in (1,T)$, $i \in (1,N)$, where $N$ is the total number of training samples and $x_i^t, y_i^t$ are the feature vector and label of the $i$-th sample.
For the loss function, softmax with the log-likelihood cost is used to train the features of the last layer for multi-task image classification; the softmax loss is defined by formula (6):

$$L_{softmax} = -\sum_{i=1}^{m} \log \frac{e^{W_{y_i}^{T} x_i + b_{y_i}}}{\sum_{j=1}^{n} e^{W_j^{T} x_i + b_j}} \tag{6}$$

where $x_i$ is the $i$-th depth feature, $W_j$ is the $j$-th column of the weights in the last fully connected layer, $b$ is the bias term, and $m, n$ are the number of processed samples and the number of classes, respectively.
Further, in the second step, the feature fusion of the segmented compact hash code and the instance features proceeds as follows:
In the vehicle image feature extraction stage, the softmax activation function first constrains the outputs to [0,1]; a threshold segmentation function then drives the outputs toward binary hash codes, and a segmented learning and coding strategy reduces redundancy among the hash codes to improve feature robustness; finally, the hash codes learned per segment are fused by feature fusion to obtain the vehicle-feature segmented compact hash code.
For the instance features of a vehicle, the implementation is: take the last unit of each shared stacked convolution module from conv2_x to conv5_x, combine it with the output of the RPN, and add a pyramid pooling layer and a vector flattening layer to accommodate convolutional feature map inputs of different sizes while flattening the convolved three-dimensional features into one-dimensional feature vectors, called the instance features of the vehicle.
Finally, the vehicle segmented compact hash codes and the instance features are fused again to obtain the feature vector used for retrieval.
The vehicle-feature segmented compact hash code is realized as follows. There are $T$ tasks in total, with $c_t$ classes under each task; $m_t$ denotes the fully connected output vector of each task, and the softmax activation function constrains the fully connected layer output to $[0,1]$, computed by formula (7):

$$q_t = \mathrm{softmax}(\theta^{T} m_t) \tag{7}$$

where $\theta$ represents a random hyperplane, $m_t$ the fully connected output vector of each task, $c_t$ the classes under each task, and $q_t$ the fully connected layer output.
To excite binary output from the segmented coding module, a threshold segmentation function binarizes $q_t$:

$$H_t = \begin{cases} 1, & q_t \ge 0.5 \\ 0, & q_t < 0.5 \end{cases} \tag{8}$$

where $q_t$ is the fully connected layer output and $H_t$ is the binary output of the segmented coding module.
Finally, the $H_t$ are fused into the vehicle segmented compact hash code vector $f_A$:

$$f_A = [\alpha_1 H_1;\, \alpha_2 H_2;\, \ldots;\, \alpha_T H_T] \tag{9}$$

where $f_A$ is the vehicle segmented compact hash code vector and $\alpha_t$, $t \in (1,T)$, is a coefficient calculated by formula (10); multiplying each $H_t$ by the coefficient $\alpha_t$ compensates for errors caused by uneven classification among the different tasks.
In the third step, the feature vector for retrieval is obtained by fusing the vehicle segmented compact hash code with the instance features, as follows:
For the deepest layers of conv2_x through conv5_x, select $\{4^2, 8^2, 16^2, 16^2\}$ as the output sizes of the feature maps. For a given input image $I$ of size $h \times w$, the activation of convolution convx_x is a three-dimensional tensor $T$ of size $h' \times w' \times d$ containing a series of two-dimensional feature maps $S = \{S_n\}$, $n \in (1,d)$, where $S_n$ of size $h' \times w'$ is the feature map of the $n$-th channel. $T$ is sent into the pyramid pooling layer to obtain a three-dimensional tensor $T'$ of size $l \times l \times d$, still containing a series of feature maps $S' = \{S'_n\}$, $n \in (1,d)$, with $S'_n$ of size $l \times l$. Each $S'_n$ is traversed with a sliding window of size $k \times k$ selecting the maximum value, so $S'_n$ becomes $l/k \times l/k$; the $S'_n$ of each channel are fused into a one-dimensional vector, the same operation is applied to the $d$ channels in turn, and finally the individual feature vector $f_B$ of size $(1, l/k \times d)$ is obtained. The final retrieval feature vector $f$ is computed by formula (11):

$$f = [f_A;\, f_B] \tag{11}$$

where $f$ is the feature vector for vehicle retrieval, $f_B$ is the instance feature vector, i.e. the individual feature vector, and $f_A$ is the vehicle segmented compact hash code vector.
In the third step, similar samples are mapped into the same bucket with high probability; the hash function $h(\cdot)$ of the locality-sensitive hash satisfies the condition

$$P\{h(f_{Aq}) = h(f_A)\} = sim(f_{Aq}, f_A) \tag{12}$$

where $sim(f_{Aq}, f_A)$ denotes the similarity of $f_{Aq}$ and $f_A$, and $h(f_A)$ and $h(f_{Aq})$ denote the hash functions of $f_A$ and $f_{Aq}$; the similarity measure is directly related to a distance function $\sigma$, computed by formula (13).
A typical family of locality-sensitive hash functions is given by random projection and thresholding, computed by formula (14):

$$h(f_A) = \mathrm{sign}(W f_A + b) \tag{14}$$

where $W$ is a random hyperplane vector and $b$ is a random intercept.
In the third step, after the query image is mapped into a similarity bucket through its segmented compact hash code, the images returned from the bucket are reordered using their instance features in combination with formula (15), where $k$ denotes the $k$-th image in the bucket, $\beta$ is a penalty factor, $\cos$ is the cosine distance formula, and $y$ indicates whether $f_{Aq}$ and $f_A^k$ were equal before mapping ($y = 1$ if equal, 0 otherwise); $f_A^k$ is the segmented compact hash code vector of the $k$-th image and $f_{Aq}$ is that of the query. The penalty factor pushes erroneous retrieval results farther from the input query image; a smaller $dis$ indicates a higher similarity.
In the fourth step, the cross-modal retrieval method constructs a group of deep neural networks that map image and text data into a common semantic space by feature learning, achieving semantic coupling of data from different modalities. A deep convolutional neural network extracts the semantic features of the image modality directly from the input image; the text is represented by word vectors, and a one-dimensional convolutional neural network extracts the semantic features of the text modality from that representation. First, the segmented compact hash code $f_A$ of the vehicle is generated dynamically by the deep convolutional neural network; then a retrieval feature vector is generated from the text, so that the feature vectors generated from both can be searched by the same retrieval system.
The semantic features of the text modality are feature vectors extracted from the text; as the first step of the extraction algorithm, the text must first be split. The feature vector of the text comes from its terms, specifically:
Input: a text O. Output: a set of roughly similar images.
STEP 1: Initialization: (1) parse the text file into a term vector; (2) remove short and repeated words; (3) check the terms to ensure the parsing is correct.
STEP 2: Take the minimum randomly combined term vector $R = (r_1, r_2, \ldots, r_n)$ from O.
STEP 3: Integrate $R$ with $f_A$ by sequential segmented compact hash coding to obtain the text attribute feature $f_{A_{Txt}}$, whose dimension is at this point less than that of $R$.
STEP 4: Search with the locality-sensitive reordering hash algorithm.
STEP 5: Return the similar image group I.
The text attribute feature function $f_{A_{Txt}}$ is expressed by formula (16), in which $A^T$ is the transposed matrix of the vehicle segmented compact hash code, $R$ is the minimum randomly combined term vector, $f_{A_{Txt}}$ is the text attribute feature function, and sign denotes the sign function; diag denotes taking a diagonal matrix, the feature vector is extracted from the text, and the vehicle segmented compact hash code $A^T$ is initialized to the all-ones vector of size $(1 \times c)$.
The technical conception of the invention is as follows: first, a method in which a multi-task deep convolutional network learns hash codes in segments is proposed, combining image semantics with image representation, exploiting the relations between related tasks to improve retrieval precision and refine image features, and using minimized image coding to make the learned vehicle features more robust; second, a feature pyramid network is selected to extract instance features of the vehicle image; third, the extracted features are retrieved with a locality-sensitive hash reordering method; finally, a cross-modal auxiliary vehicle retrieval method is adopted for the special case in which no image of the query vehicle is available.
The retrieval feature vector generated from text has the same form as the segmented compact hash code vector generated by the convolutional network, so the feature vectors generated from both can be retrieved by the same retrieval system without additional training.
The deep convolutional neural network model constructed by the method is an end-to-end learning system, as shown in fig. 1; the model integrates text feature representation, image feature learning, text feature learning, cross-modal retrieval and reordering into the same learning framework.
The invention has the following beneficial effects:
1) A multi-task deep learning vehicle appearance recognition framework is designed. Weight sharing during the correlated parallel processing of tasks improves the generalization ability of the system, weakens the influence of overfitting on the neural network, and alleviates the weak classifier generalization caused by insufficient samples; different network structures were tried, and finally mutually correlated tasks are fused to maximize network parameter sharing.
2) A segmented approach is employed to learn hash codes with the multi-task network architecture, reducing redundancy between binary hash codes. Each task learns one part of the hash code independently of the others, and the vector fusion method provided herein yields an accurate image feature representation of each vehicle, called the vehicle's segmented compact feature. Instance features of the image are captured by a feature pyramid built from a multi-layer combination of the shared stacked convolutional layers, a pyramid pooling layer and a vector flattening layer, and the two image representations carrying different feature-dimension information are finally re-fused into the final retrieval feature vector.
3) A locality-sensitive hash reordering retrieval method is provided to match the acquired retrieval features quickly and meet the practical demands of intelligent transportation. The method first maps the images in the query library to buckets using the segmented compact hash code, then re-sorts the images in a bucket by their instance feature vectors, screening out the topK most similar images according to the vehicles' different feature dimensions; the mapping of coding vectors avoids one-to-one image comparison, achieving fast real-time retrieval.
4) For special situations in which image information of a vehicle cannot be acquired, such as a blurred camera view at night, over-strong daytime illumination, or a failed camera, the invention provides a cross-modal auxiliary retrieval mode to meet the practical needs of different environments: vehicle characteristics are summarized by manual judgment and converted into text data that is fed into the retrieval network to realize auxiliary retrieval.
Drawings
FIG. 1 is an overall network framework for fast hash retrieval of a multitasking deep convolutional neural network;
FIG. 2 is a schematic representation of a reordering sequence;
FIG. 3 is an illustration of a text feature vector generation process;
fig. 4 is a diagram of an RPN network architecture;
FIG. 5 is a diagram of a multitask Faster R-CNN deep convolutional network.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to fig. 1 to 5, a fast hash vehicle retrieval method based on multitask deep learning includes:
the method comprises the steps of firstly, constructing a multitask deep convolution neural network for deep learning and training identification;
secondly, adopting a feature fusion method of the segmented compact hash codes and the example features;
thirdly, adopting a locality sensitive hash reordering algorithm;
and fourthly, a cross-modal retrieval method is adopted to realize vehicle retrieval.
In the first step, the multi-task deep convolutional neural network for deep learning and recognition training is shown in fig. 1. Faster R-CNN is adopted as the base network of the multi-task convolutional neural network. The front of the network is a 3 × 3 convolutional layer called conv1, followed by 4 stacked convolution modules named conv2_x to conv5_x, containing {2, 3, 3, 3} units respectively; conv1 through conv4_3 serve as the shared network. Next comes the RPN (region proposal network), which takes an image of any scale as input and outputs a set of rectangular target proposal boxes, each with 4 position coordinates and a score. To generate region proposals, a small network slides over the convolutional feature map output by the last shared convolutional layer; this network is fully connected to an n × n spatial window of the input feature map. Each sliding window is mapped to a low-dimensional vector (one value per feature map per sliding position), and this vector is fed into two sibling fully connected layers.
The RPN is shown in fig. 4; it takes an image of any scale as input and outputs a set of rectangular target proposal boxes, each with 4 position coordinates and a score; the targets of the proposal boxes are vehicle objects.
The estimated probability that each proposal box is target/non-target is produced by a classification layer implemented with a two-class softmax layer. The k proposal boxes are parameterized relative to k corresponding reference boxes called anchors.
Each anchor is centered at the current sliding window and corresponds to one scale and one aspect ratio; using 3 scales and 3 aspect ratios yields k = 9 anchors at each sliding position.
To train the RPN, each anchor is assigned a binary label marking whether it is a target. Positive labels are assigned to two kinds of anchors: (I) the anchor with the highest Intersection-over-Union (IoU) overlap with a ground-truth (GT) bounding box; (II) any anchor with IoU overlap greater than 0.7 with some GT bounding box. Note that one GT bounding box may assign positive labels to multiple anchors. Negative labels are assigned to anchors whose IoU with all GT bounding boxes is below 0.3. Anchors that are neither positive nor negative contribute nothing to the training objective and are discarded.
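As an illustration only, the following is a minimal NumPy sketch of this IoU-based anchor labeling; the function names and the -1 "ignore" convention are assumptions, not from the patent:

```python
import numpy as np

def iou(boxes, gt):
    """IoU between each anchor box and each GT box; boxes are (x1, y1, x2, y2)."""
    lt = np.maximum(boxes[:, None, :2], gt[None, :, :2])   # intersection top-left
    rb = np.minimum(boxes[:, None, 2:], gt[None, :, 2:])   # intersection bottom-right
    wh = np.clip(rb - lt, 0, None)
    inter = wh[..., 0] * wh[..., 1]
    area_a = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    return inter / (area_a[:, None] + area_g[None, :] - inter)

def assign_anchor_labels(anchors, gt_boxes, hi=0.7, lo=0.3):
    """1 = positive, 0 = negative, -1 = ignored (neither), per the rules above."""
    overlaps = iou(anchors, gt_boxes)            # shape (num_anchors, num_gt)
    labels = -np.ones(len(anchors), dtype=np.int8)
    max_iou = overlaps.max(axis=1)
    labels[max_iou < lo] = 0                     # IoU below 0.3 with every GT box
    labels[max_iou > hi] = 1                     # rule (II): IoU above 0.7
    labels[overlaps.argmax(axis=0)] = 1          # rule (I): best anchor per GT box
    return labels
```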
With these definitions, the objective function is minimized following the multi-task loss in Faster R-CNN. The loss function for an image is defined as

$$L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}}\sum_i L_{cls}(p_i,p_i^*) + \lambda\frac{1}{N_{reg}}\sum_i p_i^* L_{reg}(t_i,t_i^*) \tag{1}$$

where $i$ is the index of an anchor, $p_i$ is the predicted probability that anchor $i$ is a target, and the GT label $p_i^*$ is 1 if the anchor is positive and 0 if it is negative; $t_i$ is a vector of the 4 parameterized coordinates of the predicted bounding box, and $t_i^*$ is the coordinate vector of the GT box corresponding to a positive anchor; $\lambda$ is a balance weight, here $\lambda = 10$; $N_{cls}$ normalizes the cls term by the mini-batch size, here $N_{cls} = 256$; and $N_{reg}$ normalizes the reg term by the number of anchor positions, here $N_{reg} = 2400$. The classification loss $L_{cls}$ is the log loss over two classes, motor-vehicle target vs. road background:

$$L_{cls}(p_i,p_i^*) = -\log\left[p_i^* p_i + (1-p_i^*)(1-p_i)\right] \tag{2}$$

The regression loss $L_{reg}$ is defined by

$$L_{reg}(t_i,t_i^*) = R(t_i - t_i^*) \tag{3}$$

where $R$ is the robust smooth $L_1$ loss function, computed by equation (4):

$$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases} \tag{4}$$

where $x$ is a variable.
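A PyTorch sketch of losses (1)-(4) with the constants stated above; the tensor shapes and function names are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def smooth_l1(x):
    """Equation (4): 0.5 x^2 for |x| < 1, |x| - 0.5 otherwise."""
    ax = x.abs()
    return torch.where(ax < 1, 0.5 * ax ** 2, ax - 0.5)

def rpn_loss(p, p_star, t, t_star, lam=10.0, n_cls=256, n_reg=2400):
    """Equation (1) with lambda = 10, N_cls = 256, N_reg = 2400.
    p: predicted target probability per anchor; p_star: binary GT label;
    t, t_star: predicted / GT parameterized box coordinates, shape (A, 4)."""
    l_cls = F.binary_cross_entropy(p, p_star.float(), reduction="sum") / n_cls
    l_reg = (p_star.float()[:, None] * smooth_l1(t - t_star)).sum() / n_reg
    return l_cls + lam * l_reg
```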
The multi-task deep convolutional neural network is shown in fig. 5. To integrate several tasks for learning and training, it is crucial to design a multi-task objective function, expressed by formula (5):

$$\min_{\{w_t\}} \sum_{t=1}^{T}\sum_{i=1}^{N} L\big(f(x_i^t; w_t),\, y_i^t\big) + \Phi(w_t) \tag{5}$$

where $f(x_i^t; w_t)$ is the output for input feature vector $x_i^t$ under weight parameter $w_t$, $L(\cdot)$ is a loss function, $\Phi(w_t)$ is the regularization value of the weight parameter, and $T$ is the total number of tasks; the training data of the $t$-th task is $\{(x_i^t, y_i^t)\}$ with $t \in (1,T)$, $i \in (1,N)$, where $N$ is the total number of training samples and $x_i^t, y_i^t$ are the feature vector and label of the $i$-th sample.
For the loss function, softmax with the log-likelihood cost is used to train the features of the last layer for multi-task image classification; the softmax loss is defined by formula (6):

$$L_{softmax} = -\sum_{i=1}^{m} \log \frac{e^{W_{y_i}^{T} x_i + b_{y_i}}}{\sum_{j=1}^{n} e^{W_j^{T} x_i + b_j}} \tag{6}$$

where $x_i$ is the $i$-th depth feature, $W_j$ is the $j$-th column of the weights in the last fully connected layer, $b$ is the bias term, and $m, n$ are the number of processed samples and the number of classes, respectively.
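A compact PyTorch sketch of the multi-task objective (5) built from per-task softmax losses (6); the uniform task weighting and the L2 form of Phi(w_t) are assumptions:

```python
import torch
import torch.nn.functional as F

def multitask_softmax_loss(logits_per_task, labels_per_task,
                           weight_decay=0.0, params=None):
    """Sum the softmax log-likelihood loss of every task head (eq. 6 per task),
    plus an optional L2 regularizer standing in for Phi(w_t) in eq. (5)."""
    loss = sum(F.cross_entropy(z, y)
               for z, y in zip(logits_per_task, labels_per_task))
    if params is not None:
        loss = loss + weight_decay * sum(w.pow(2).sum() for w in params)
    return loss
```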
The feature fusion method of the segmented compact hash code and the instance features is shown in fig. 1. On one hand, in the vehicle image feature extraction stage, the softmax activation function first constrains the outputs to [0,1]; a threshold segmentation function then drives the outputs toward binary hash codes, and a segmented learning and coding strategy reduces redundancy among the hash codes to improve feature robustness; finally, the hash codes learned per segment are fused by feature fusion to obtain the vehicle-feature segmented compact hash code.
On the other hand, for the instance features of the vehicle: inspired by image pyramid techniques, the vehicle instance features extracted from the convolutional layers are further fused with the compact features extracted by the multi-task deep learning vehicle retrieval network, making the retrieval result more accurate and reliable. The implementation is: take the last unit of each shared stacked convolution module from conv2_x to conv5_x, combine it with the output of the RPN, and add a pyramid pooling layer and a vector flattening layer to accommodate convolutional feature map inputs of different sizes while flattening the convolved three-dimensional features into one-dimensional feature vectors, called the instance features of the vehicle.
Finally, the vehicle segmented compact hash codes and the instance features are fused again to obtain the feature vector used for retrieval.
The vehicle-feature segmented compact hash code is realized as follows. There are $T$ tasks in total, with $c_t$ classes under each task; $m_t$ denotes the fully connected output vector of each task, and the softmax activation function constrains the fully connected layer output to $[0,1]$, computed by formula (7):

$$q_t = \mathrm{softmax}(\theta^{T} m_t) \tag{7}$$

where $\theta$ represents a random hyperplane, $m_t$ the fully connected output vector of each task, $c_t$ the classes under each task, and $q_t$ the fully connected layer output.
To excite binary output from the segmented coding module, a threshold segmentation function binarizes $q_t$:

$$H_t = \begin{cases} 1, & q_t \ge 0.5 \\ 0, & q_t < 0.5 \end{cases} \tag{8}$$

where $q_t$ is the fully connected layer output and $H_t$ is the binary output of the segmented coding module.
Finally, the $H_t$ are fused into the vehicle segmented compact hash code vector $f_A$:

$$f_A = [\alpha_1 H_1;\, \alpha_2 H_2;\, \ldots;\, \alpha_T H_T] \tag{9}$$

where $f_A$ is the vehicle segmented compact hash code vector and $\alpha_t$, $t \in (1,T)$, is a coefficient calculated by formula (10); multiplying each $H_t$ by the coefficient $\alpha_t$ compensates for errors caused by uneven classification among the different tasks.
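A PyTorch sketch of equations (7)-(9); since formula (10) for the coefficients alpha_t is not reproduced in the source, uniform weights are assumed here, as is the 0.5 binarization threshold:

```python
import torch

def segmented_compact_hash(task_logits, alphas=None):
    """Softmax each task's fully connected output, binarize with a threshold
    segmentation function, weight each segment by alpha_t, and concatenate
    into the vehicle segmented compact hash code vector f_A (eq. 9)."""
    T = len(task_logits)
    if alphas is None:
        alphas = [1.0 / T] * T                   # assumed stand-in for eq. (10)
    segments = []
    for m_t, a_t in zip(task_logits, alphas):
        q_t = torch.softmax(m_t, dim=-1)         # eq. (7): constrain to [0, 1]
        h_t = (q_t >= 0.5).float()               # eq. (8): threshold segmentation
        segments.append(a_t * h_t)
    return torch.cat(segments, dim=-1)           # f_A = [a_1 H_1; ...; a_T H_T]
```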
The feature vector for retrieval is obtained by fusing the vehicle segmented compact hash code with the instance features; the specific implementation is as follows:
For the deepest layers of conv2_x through conv5_x, select $\{4^2, 8^2, 16^2, 16^2\}$ as the output sizes of the feature maps. For a given input image $I$ of size $h \times w$, the activation of convolution convx_x is a three-dimensional tensor $T$ of size $h' \times w' \times d$ containing a series of two-dimensional feature maps $S = \{S_n\}$, $n \in (1,d)$, where $S_n$ of size $h' \times w'$ is the feature map of the $n$-th channel. $T$ is sent into the pyramid pooling layer to obtain a three-dimensional tensor $T'$ of size $l \times l \times d$, still containing a series of feature maps $S' = \{S'_n\}$, $n \in (1,d)$, with $S'_n$ of size $l \times l$. Each $S'_n$ is traversed with a sliding window of size $k \times k$ selecting the maximum value, so $S'_n$ becomes $l/k \times l/k$; the $S'_n$ of each channel are fused into a one-dimensional vector, the same operation is applied to the $d$ channels in turn, and finally the individual feature vector $f_B$ of size $(1, l/k \times d)$ is obtained. The final retrieval feature vector $f$ is computed by formula (11):

$$f = [f_A;\, f_B] \tag{11}$$

where $f$ is the feature vector for vehicle retrieval, $f_B$ is the instance feature vector, i.e. the individual feature vector, and $f_A$ is the vehicle segmented compact hash code vector.
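A minimal PyTorch sketch of this instance-feature path and the fusion of equation (11); the pooling sizes l and k, and the exact per-channel fusion into a flat vector, are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def instance_feature(conv_map, l=16, k=4):
    """Pyramid pooling to l x l, max over k x k sliding windows, then a vector
    flattening layer producing the one-dimensional instance feature f_B.
    conv_map: (N, d, h', w'); l and k are illustrative values."""
    t = F.adaptive_max_pool2d(conv_map, (l, l))   # pyramid pooling layer
    t = F.max_pool2d(t, kernel_size=k)            # k x k sliding window, keep max
    return t.flatten(start_dim=1)                 # vector flattening layer -> f_B

def retrieval_feature(f_a, f_b):
    """Equation (11): f = [f_A; f_B], the fused retrieval feature vector."""
    return torch.cat([f_a, f_b], dim=-1)
```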
The locality-sensitive hash reordering algorithm for improving search performance is shown in fig. 2; its idea is to map similar samples into the same bucket with high probability. The hash function $h(\cdot)$ of the locality-sensitive hash satisfies the condition

$$P\{h(f_{Aq}) = h(f_A)\} = sim(f_{Aq}, f_A) \tag{12}$$

where $sim(f_{Aq}, f_A)$ denotes the similarity of $f_{Aq}$ and $f_A$, and $h(f_A)$ and $h(f_{Aq})$ denote the hash functions of $f_A$ and $f_{Aq}$; the similarity measure is directly related to a distance function $\sigma$, computed by formula (13).
A typical family of locality-sensitive hash functions is given by random projection and thresholding, computed by formula (14):

$$h(f_A) = \mathrm{sign}(W f_A + b) \tag{14}$$

where $W$ is a random hyperplane vector and $b$ is a random intercept.
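A minimal NumPy sketch of this random-projection LSH of equation (14); the bit width and the bucket-key encoding are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_lsh(dim, n_bits=16):
    """h(f_A) = sign(W f_A + b) with a random hyperplane matrix W and a random
    intercept b; the resulting bit string is used as the bucket key."""
    W = rng.standard_normal((n_bits, dim))
    b = rng.standard_normal(n_bits)
    def h(f_a):
        bits = (W @ f_a + b) > 0
        return bits.astype(np.uint8).tobytes()    # hashable bucket key
    return h

# Usage sketch: map every database vector into buckets, then look up the query:
# buckets.setdefault(h(f_a), []).append(image_id)
```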
In the feature fusion method of the segmented compact hash code and the instance features, to bring similar images closer, after the query image is mapped into a similarity bucket through its segmented compact hash code, the images returned from the bucket are reordered using their instance features in combination with formula (15), where $k$ denotes the $k$-th image in the bucket, $\beta$ is a penalty factor, $\cos$ is the cosine distance formula, and $y$ indicates whether $f_{Aq}$ and $f_A^k$ were equal before mapping ($y = 1$ if equal, 0 otherwise); $f_A^k$ is the segmented compact hash code vector of the $k$-th image and $f_{Aq}$ is that of the query.
The added coefficient ensures the correctness of the LSH mapping: the similarity of the instance feature vectors is computed when the segmented compact hash codes are identical, and when different segmented compact hash codes are mapped into the same bucket, the penalty factor pushes erroneous retrieval results farther from the input query image; a smaller $dis$ indicates a higher similarity.
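The formula image for equation (15) is not reproduced in the source, so the sketch below only illustrates the behavior described above (cosine distance on instance features, a penalty factor beta when the hash codes differ); the exact combination and the value of beta are assumptions:

```python
import numpy as np

def rerank(query_fa, query_fb, bucket, beta=2.0):
    """Reorder the images of one bucket: score by cosine distance of instance
    features f_B; when the stored hash code differs from the query's (y = 0),
    multiply by the penalty factor beta > 1. Smaller dis = more similar."""
    def cos_dist(a, b):
        return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    scored = []
    for image_id, f_a_k, f_b_k in bucket:         # bucket: (id, f_A^k, f_B^k)
        y = np.array_equal(query_fa, f_a_k)
        dis = cos_dist(query_fb, f_b_k) * (1.0 if y else beta)
        scored.append((dis, image_id))
    return [i for _, i in sorted(scored, key=lambda s: s[0])]
```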
The cross-modal retrieval method constructs a group of deep neural networks that map image and text data into a common semantic space by feature learning, achieving semantic coupling of data from different modalities. A deep convolutional neural network extracts the semantic features of the image modality directly from the input image; the text is represented by word vectors, and a one-dimensional convolutional neural network extracts the semantic features of the text modality from that representation. First, the segmented compact hash code $f_A$ of the vehicle is generated dynamically by the deep convolutional neural network; then a retrieval feature vector is generated from the text, so that the feature vectors generated from both can be searched by the same retrieval system. The specific implementation process is shown in fig. 3.
The semantic features of the text modality are feature vectors extracted from the text; as the first step of the extraction algorithm, the text must first be split. The feature vector of the text comes from its terms, specifically:
Input: a text O. Output: a set of roughly similar images.
STEP 1: Initialization: (1) parse the text file into a term vector; (2) remove short and repeated words; (3) check the terms to ensure the parsing is correct.
STEP 2: Take the minimum randomly combined term vector $R = (r_1, r_2, \ldots, r_n)$ from O.
STEP 3: Integrate $R$ with $f_A$ by sequential segmented compact hash coding to obtain the text attribute feature $f_{A_{Txt}}$, whose dimension is at this point less than that of $R$.
STEP 4: Search with the locality-sensitive reordering hash algorithm.
STEP 5: Return the similar image group I.
The text attribute feature function $f_{A_{Txt}}$ is expressed by formula (16), in which $A^T$ is the transposed matrix of the vehicle segmented compact hash code, $R$ is the minimum randomly combined term vector, $f_{A_{Txt}}$ is the text attribute feature function, and sign denotes the sign function; diag denotes taking a diagonal matrix, the feature vector is extracted from the text, and the vehicle segmented compact hash code $A^T$ is initialized to the all-ones vector of size $(1 \times c)$.
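Since the formula image for equation (16) is not reproduced in the source, the following Python sketch shows only one plausible reading of STEPs 1-3, with sign(A^T R) assumed as the projection; vocab_hash and all other names are illustrative:

```python
import numpy as np

def text_attribute_feature(terms, vocab_hash, code_len):
    """Parse a term list into the vector R, then project it into the same
    segmented-compact-hash space as f_A so text and image queries share one
    retrieval system. vocab_hash maps each known term to the hash-code
    positions it activates; this mapping is an assumption."""
    A = np.ones(code_len)                          # A^T initialized to all ones
    R = np.zeros(code_len)
    for term in dict.fromkeys(terms):              # de-duplicate, keep order
        for pos in vocab_hash.get(term, []):
            R[pos] = 1.0
    return np.sign(A * R)                          # assumed form of eq. (16)

# Usage sketch: f_txt = text_attribute_feature(["red", "suv"], vocab_hash, 64),
# then feed f_txt to the same locality-sensitive reordering hash search as f_A.
```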
The above description is only exemplary of the preferred embodiments of the present invention, and is not intended to limit the present invention, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. A fast hash vehicle retrieval method based on multi-task deep learning, characterized in that the vehicle retrieval method comprises the steps of:
the method comprises the steps of firstly, constructing a multitask deep convolution neural network for deep learning and training identification;
secondly, adopting a feature fusion method of the segmented compact hash codes and the example features;
thirdly, adopting a locality sensitive hash reordering algorithm;
fourthly, a cross-modal retrieval method is adopted to realize vehicle retrieval;
in the multi-task deep convolutional neural network it is crucial to design a multi-task objective function, expressed by formula (5):

$$\min_{\{w_t\}} \sum_{t=1}^{T}\sum_{i=1}^{N} L\big(f(x_i^t; w_t),\, y_i^t\big) + \Phi(w_t) \tag{5}$$

where $f(x_i^t; w_t)$ is the output for input feature vector $x_i^t$ under weight parameter $w_t$, $L(\cdot)$ is a loss function, $\Phi(w_t)$ is the regularization value of the weight parameter, and $T$ is the total number of tasks; the training data of the $t$-th task is $\{(x_i^t, y_i^t)\}$ with $t \in (1,T)$, $i \in (1,N)$, where $N$ is the total number of training samples and $x_i^t, y_i^t$ are the feature vector and label of the $i$-th sample;
for the loss function, softmax with the log-likelihood cost is used to train the features of the last layer for multi-task image classification; the softmax loss is defined by formula (6):

$$L_{softmax} = -\sum_{i=1}^{m} \log \frac{e^{W_{y_i}^{T} x_i + b_{y_i}}}{\sum_{j=1}^{n} e^{W_j^{T} x_i + b_j}} \tag{6}$$

where $x_i$ is the $i$-th depth feature, $W_j$ is the $j$-th column of the weights in the last fully connected layer, $b$ is the bias term, and $m, n$ are the number of processed samples and the number of classes, respectively.
2. The fast hash vehicle retrieval method based on multi-task deep learning according to claim 1, characterized in that: in the first step, Faster R-CNN is used as the base network of the multi-task convolutional neural network; the front of the network is a 3 × 3 convolutional layer called conv1, followed by 4 stacked convolution modules named conv2_x to conv5_x, containing {2, 3, 3, 3} units respectively, with conv1 through conv4_3 as the shared network; next comes the RPN (region proposal network), which takes an image of any scale as input and outputs a set of rectangular target proposal boxes, each with 4 position coordinates and a score; to generate region proposals, a small network slides over the convolutional feature map output by the last shared convolutional layer, fully connected to an n × n spatial window of the input feature map; each sliding window is mapped to a low-dimensional vector (one value per feature map per sliding position), and this vector is fed into two sibling fully connected layers;
the RPN takes an image of any scale as input and outputs a set of rectangular target proposal boxes, each with 4 position coordinates and a score; the targets of the proposal boxes are vehicle objects;
the estimated probability that each proposal box is target/non-target is produced by a classification layer implemented with a two-class softmax layer; the k proposal boxes are parameterized relative to k corresponding reference boxes called anchors;
each anchor is centered at the current sliding window and corresponds to one scale and one aspect ratio; using 3 scales and 3 aspect ratios yields k = 9 anchors at each sliding position;
to train the RPN, each anchor is assigned a binary label marking whether it is a target; positive labels are assigned to two kinds of anchors: (I) the anchor with the highest Intersection-over-Union (IoU) overlap with a ground-truth (GT) bounding box; (II) any anchor with IoU overlap greater than 0.7 with some GT bounding box; note that one GT bounding box may assign positive labels to multiple anchors; negative labels are assigned to anchors whose IoU with all GT bounding boxes is below 0.3; anchors that are neither positive nor negative contribute nothing to the training objective and are discarded;
following the multi-task loss in Faster R-CNN, the objective function is minimized; the loss function for an image is defined as

$$L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}}\sum_i L_{cls}(p_i,p_i^*) + \lambda\frac{1}{N_{reg}}\sum_i p_i^* L_{reg}(t_i,t_i^*) \tag{1}$$

where $i$ is the index of an anchor, $p_i$ is the predicted probability that anchor $i$ is a target, and the GT label $p_i^*$ is 1 if the anchor is positive and 0 if it is negative; $t_i$ is a vector of the 4 parameterized coordinates of the predicted bounding box, and $t_i^*$ is the coordinate vector of the GT box corresponding to a positive anchor; $\lambda$ is a balance weight; $N_{cls}$ normalizes the cls term by the mini-batch size, and $N_{reg}$ normalizes the reg term by the number of anchor positions; the classification loss $L_{cls}$ is the log loss over two classes, motor-vehicle target vs. road background:

$$L_{cls}(p_i,p_i^*) = -\log\left[p_i^* p_i + (1-p_i^*)(1-p_i)\right] \tag{2}$$

the regression loss $L_{reg}$ is defined by

$$L_{reg}(t_i,t_i^*) = R(t_i - t_i^*) \tag{3}$$

where $R$ is the robust smooth $L_1$ loss function, computed by equation (4):

$$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases} \tag{4}$$

where $x$ is a variable.
3. The fast hash vehicle retrieval method based on multi-task deep learning according to claim 1 or 2, characterized in that: in the second step, the feature fusion of the segmented compact hash code and the instance features proceeds as follows:
in the vehicle image feature extraction stage, the softmax activation function first constrains the outputs to [0,1]; a threshold segmentation function then drives the outputs toward binary hash codes, and a segmented learning and coding strategy reduces redundancy among the hash codes to improve feature robustness; finally, the hash codes learned per segment are fused by feature fusion to obtain the vehicle-feature segmented compact hash code;
for the instance features of a vehicle, the implementation is: take the last unit of each shared stacked convolution module from conv2_x to conv5_x, combine it with the output of the RPN, and add a pyramid pooling layer and a vector flattening layer to accommodate convolutional feature map inputs of different sizes while flattening the convolved three-dimensional features into one-dimensional feature vectors, called the instance features of the vehicle;
finally, the vehicle segmented compact hash codes and the instance features are fused again to obtain the feature vector used for retrieval.
4. The fast hash vehicle retrieval method based on multi-task deep learning according to claim 3, characterized in that: the vehicle-feature segmented compact hash code is realized as follows; there are $T$ tasks in total, with $c_t$ classes under each task; $m_t$ denotes the fully connected output vector of each task, and the softmax activation function constrains the fully connected layer output to $[0,1]$, computed by formula (7):

$$q_t = \mathrm{softmax}(\theta^{T} m_t) \tag{7}$$

where $\theta$ represents a random hyperplane, $m_t$ the fully connected output vector of each task, $c_t$ the classes under each task, and $q_t$ the fully connected layer output;
to excite binary output from the segmented coding module, a threshold segmentation function binarizes $q_t$:

$$H_t = \begin{cases} 1, & q_t \ge 0.5 \\ 0, & q_t < 0.5 \end{cases} \tag{8}$$

where $q_t$ is the fully connected layer output and $H_t$ is the binary output of the segmented coding module;
finally, the $H_t$ are fused into the vehicle segmented compact hash code vector $f_A$:

$$f_A = [\alpha_1 H_1;\, \alpha_2 H_2;\, \ldots;\, \alpha_T H_T] \tag{9}$$

where $f_A$ is the vehicle segmented compact hash code vector and $\alpha_t$, $t \in (1,T)$, is a coefficient calculated by formula (10); multiplying each $H_t$ by the coefficient $\alpha_t$ compensates for errors caused by uneven classification among the different tasks.
5. The fast hash vehicle retrieval method based on multi-task deep learning according to claim 4, characterized in that: in the third step, the feature vector for retrieval is obtained by fusing the vehicle segmented compact hash code with the instance features, as follows:
for the deepest layers of conv2_x through conv5_x, select $\{4^2, 8^2, 16^2, 16^2\}$ as the output sizes of the feature maps; for a given input image $I$ of size $h \times w$, the activation of convolution convx_x is a three-dimensional tensor $T$ of size $h' \times w' \times d$ containing a series of two-dimensional feature maps $S = \{S_n\}$, $n \in (1,d)$, where $S_n$ of size $h' \times w'$ is the feature map of the $n$-th channel; $T$ is sent into the pyramid pooling layer to obtain a three-dimensional tensor $T'$ of size $l \times l \times d$, still containing a series of feature maps $S' = \{S'_n\}$, $n \in (1,d)$, with $S'_n$ of size $l \times l$; each $S'_n$ is traversed with a sliding window of size $k \times k$ selecting the maximum value, so $S'_n$ becomes $l/k \times l/k$; the $S'_n$ of each channel are fused into a one-dimensional vector, the same operation is applied to the $d$ channels in turn, and finally the individual feature vector $f_B$ of size $(1, l/k \times d)$ is obtained; the final retrieval feature vector $f$ is computed by formula (11):

$$f = [f_A;\, f_B] \tag{11}$$

where $f$ is the feature vector for vehicle retrieval, $f_B$ is the instance feature vector, i.e. the individual feature vector, and $f_A$ is the vehicle segmented compact hash code vector.
6. The fast hash vehicle retrieval method based on multitask deep learning according to claim 5, wherein: in the third step, similar samples are mapped into the same barrel with high probability; the hash function h () of the locality sensitive hash satisfies the following condition:
P{h(f_Aq) = h(f_A)} = sim(f_Aq, f_A)   (12)
where sim(f_Aq, f_A) denotes the similarity between f_Aq and f_A, h(f_A) denotes the hash of f_A, and h(f_Aq) denotes the hash of f_Aq; the similarity measure is directly related to a distance function σ and is calculated using formula (13);
a typical family of locality-sensitive hash functions is given by random projection and thresholding, calculated using formula (14):
h(f_A) = sign(W f_A + b)   (14)
where W is a random hyperplane vector and b is a random intercept.
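A sketch of locality-sensitive hashing by random projection as in formula (14), bucketing database codes and retrieving the query's bucket; the class name and bit width are illustrative:

```python
import numpy as np
from collections import defaultdict

class RandomProjectionLSH:
    """Locality-sensitive hashing by random projection, formula (14):
    h(x) = sign(W x + b); nearby vectors collide in the same bucket."""
    def __init__(self, dim, n_bits=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((n_bits, dim))  # random hyperplanes
        self.b = rng.standard_normal(n_bits)         # random intercepts
        self.buckets = defaultdict(list)

    def key(self, x):
        # one bit per hyperplane: which side of W x + b = 0 the vector falls on
        return tuple((self.W @ x + self.b >= 0).astype(int))

    def insert(self, x, label):
        self.buckets[self.key(x)].append(label)

    def query(self, x):
        return self.buckets.get(self.key(x), [])

# Index database hash codes, then look up the query's bucket
lsh = RandomProjectionLSH(dim=64)
db = np.random.randint(0, 2, (1000, 64)).astype(float)
for i, code in enumerate(db):
    lsh.insert(code, i)
candidates = lsh.query(db[0])   # coarse candidate set for re-ranking
```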
7. The multitask deep learning based fast hash vehicle retrieval method according to claim 6, wherein: in the third step, after the query image has been mapped into its similarity bucket through the segmented compact hash code, the images returned from the bucket are re-ranked using their instance features in combination with formula (15); the re-ranking calculation method is shown in formula (15):
where k denotes the k-th image in the bucket, a penalty factor is applied, cos denotes the cosine distance formula, and y indicates whether f_Aq and the k-th image's vehicle segmented compact hash code vector are equal before mapping: y = 1 if they are equal and 0 otherwise; f_Aq denotes the query's vehicle segmented compact hash code vector; the penalty factor lengthens the distance between erroneous retrieval results and the input query image, and a smaller dis indicates a higher similarity.
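Since formula (15) itself is not reproduced in this text, the following sketch only illustrates one plausible reading of the description above: the cosine distance over instance features plus a penalty applied when the candidate's hash code differs from the query's; the penalty value and all names are assumptions:

```python
import numpy as np

def rerank(query_fB, query_fA, bucket):
    """Re-ranks bucket candidates by instance features.

    bucket: list of (f_B, f_A) pairs for the images returned from the bucket.
    Assumed form: dis = cosine_distance(f_B) + penalty * (1 - y),
    where y = 1 if the candidate's hash code equals the query's.
    """
    penalty = 10.0                                     # illustrative value
    scored = []
    for k, (fB_k, fA_k) in enumerate(bucket):
        cos_dist = 1.0 - np.dot(query_fB, fB_k) / (
            np.linalg.norm(query_fB) * np.linalg.norm(fB_k))
        y = float(np.array_equal(query_fA, fA_k))      # 1 if codes match
        dis = cos_dist + (1.0 - y) * penalty           # push mismatches away
        scored.append((dis, k))
    return [k for _, k in sorted(scored)]              # smaller dis = more similar
```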
8. The fast hash vehicle retrieval method based on multitask deep learning according to claim 1 or 2, characterized in that: in the fourth step, the cross-modal retrieval method constructs a group of deep neural networks that map image and text data into a common semantic space through feature learning, so as to achieve semantic coupling of data from different modalities; a deep convolutional neural network extracts the semantic features of the image modality directly from the input image, while text is represented as word vectors and a one-dimensional convolutional neural network extracts the semantic features of the text modality from the word-vector representation; first, the segmented compact hash code f_A of the vehicle is dynamically generated by the deep convolutional neural network; then a retrieval feature vector is generated from the text, so that the feature vectors generated from text and from images can be searched with the same retrieval system.
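A sketch of the text branch described in claim 8, where word vectors pass through a one-dimensional convolutional network into the common semantic space; all layer sizes are illustrative assumptions, not the patent's values:

```python
import torch
import torch.nn as nn

class TextSemanticNet(nn.Module):
    """Sketch of the text branch: word vectors in, semantic feature out."""
    def __init__(self, embed_dim=128, out_dim=64):
        super().__init__()
        self.conv = nn.Conv1d(embed_dim, 256, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveMaxPool1d(1)   # pool over the word axis
        self.fc = nn.Linear(256, out_dim)     # project into the shared space

    def forward(self, word_vectors):          # (batch, embed_dim, n_words)
        x = torch.relu(self.conv(word_vectors))
        return self.fc(self.pool(x).squeeze(-1))

# A query sentence of 12 word vectors mapped into the common semantic space
words = torch.randn(1, 128, 12)
text_feature = TextSemanticNet()(words)
```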
9. The multitask deep learning based fast hash vehicle retrieval method according to claim 8, wherein: extracting the semantic features of the text modality means extracting a feature vector from the text; as the first step of the extraction algorithm, the text first needs to be split, and the feature vector of the text comes from the text's terms; the method specifically comprises the following steps:
Input: a text O; Output: a set of roughly similar images;
STEP 1: initialization: (1) parse the text file into a term vector; (2) remove trivial words and duplicate words; (3) check the terms to ensure the correctness of the parsing;
STEP 2: take from O the minimal randomly combined term vector R = (r_1, r_2, …, r_n);
STEP 3: integrate R and f_A sequentially with the segmented compact hash code to obtain the text attribute feature f_ATxt; at this point the dimension of f_ATxt is less than that of R;
STEP 4: search using the locality-sensitive re-ranking hash algorithm;
STEP 5: return the group of similar images I;
where the text attribute feature function f_ATxt is expressed by formula (16):

f_ATxt = sign(A^T R)   (16)
where A^T denotes the transpose of the vehicle segmented compact hash code, R denotes the minimal randomly combined term vector, f_ATxt is the text attribute feature function, and sign denotes the sign function;
in the formula, diag denotes forming a diagonal matrix, the expression extracts a feature vector from the text, and the vehicle segmented compact hash code A^T is initialized to the all-ones vector of size (1 × c).
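A sketch of formula (16) as reconstructed above, f_ATxt = sign(A^T R); the dimensions and the matrix form of A are illustrative assumptions:

```python
import numpy as np

def text_attribute_feature(A, R):
    """Sketch of formula (16): f_ATxt = sign(A^T R).

    A: vehicle segmented compact hash code matrix (initialized to all ones)
    R: minimal randomly combined term vector of the text
    """
    return np.sign(A.T @ R)

c, n = 64, 128                 # illustrative dimensions
A = np.ones((n, c))            # all-ones initialization, as in claim 9
R = np.random.randn(n)         # stand-in term vector
f_ATxt = text_attribute_feature(A, R)   # length-c code, comparable with f_A
```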
CN201710857318.5A 2017-09-21 2017-09-21 Rapid Hash vehicle retrieval method based on multitask deep learning Active CN107885764B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710857318.5A CN107885764B (en) 2017-09-21 2017-09-21 Rapid Hash vehicle retrieval method based on multitask deep learning

Publications (2)

Publication Number Publication Date
CN107885764A CN107885764A (en) 2018-04-06
CN107885764B true CN107885764B (en) 2020-12-18

Family

ID=61780800






Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant