CN108108657B - Method for correcting locality sensitive Hash vehicle retrieval based on multitask deep learning - Google Patents

Publication number: CN108108657B (granted); other version: CN108108657A (application)
Application number: CN201711135951.XA
Authority: CN (China); original document in Chinese (zh)
Inventors: 何霞, 汤一平, 陈朋, 王丽冉, 袁公萍, 金宇杰
Applicant and current assignee: Zhejiang University of Technology ZJUT
Legal status: Active

Classifications

    • G06V 20/584: Scenes; context or environment of the image; recognition of moving objects or obstacles, e.g. vehicles or pedestrians, and of traffic objects, of vehicle lights or traffic lights
    • G06F 16/325: Information retrieval; indexing; data structures therefor; hash tables
    • G06F 18/253: Pattern recognition; analysing; fusion techniques of extracted features
    • G06N 3/045: Computing arrangements based on biological models; neural networks; architecture; combinations of networks
    • G06N 3/084: Neural networks; learning methods; backpropagation, e.g. using gradient descent
    • G06V 10/462: Extraction of image or video features; descriptors for shape, contour or point-related descriptors; salient features, e.g. scale-invariant feature transforms [SIFT]
    • G06V 10/56: Extraction of image or video features relating to colour


Abstract

A modified locality-sensitive hash vehicle retrieval method based on multitask deep learning adopts a multitask end-to-end convolutional neural network that recognizes the vehicle type, vehicle series, vehicle logo, color and license plate of the vehicle simultaneously in a segmented, parallel manner; a network module extracts vehicle-image instance features based on a feature pyramid; the vehicle features in the database are sorted with a modified locality-sensitive hash ranking algorithm; and a cross-modal text retrieval method covers the case in which a query vehicle image cannot be obtained. The invention provides a multi-task end-to-end convolutional neural network and a modified locality-sensitive hash vehicle retrieval method, which effectively improve the automation and intelligence of vehicle retrieval and meet the image retrieval requirements of the big-data era with less storage space and higher retrieval speed.

Description

Method for correcting locality sensitive Hash vehicle retrieval based on multitask deep learning
Technical Field
The invention relates to the application of computer vision, pattern recognition, information retrieval, multi-task learning, similarity measurement, deep self-coding convolutional neural networks and deep learning to the field of image retrieval, and in particular to a modified locality-sensitive hash vehicle retrieval method based on multi-task deep learning.
Background
With the rapid development of the social economy, motor vehicles have increasingly become essential means of transport in daily life, and also essential tools for criminals and terrorists engaged in illegal activities. Current vehicle monitoring is generally based on license-plate recognition; once suspect vehicles use fake plates, no plates, or continually replace their plates, they evade the tracking and recognition of existing checkpoints.
Image-based vehicle feature recognition involves image processing, pattern recognition, computer vision, and related technical fields; current research on the technology at home and abroad can be roughly divided into three directions: (1) vehicle type recognition based on the license plate, which only recognizes license-plate information from the image and does not directly analyze the type of vehicle; its classification granularity is coarse and it cannot distinguish fake-licensed vehicles; (2) vehicle type recognition based on the vehicle logo, which in practical application cannot achieve an ideal effect because of objective factors such as the small size of the logo, lighting, and occlusion; and (3) vehicle type recognition based on appearance features, which compared with the former two has better robustness and finer recognition categories, accurate to the brand, series, model, model year, and the like of the vehicle.
Appearance-based vehicle feature recognition is mainly completed in three steps: vehicle segmentation, feature extraction, and classification. Traditional vehicle type recognition methods mainly include template matching, statistical pattern recognition, neural network recognition, bionic (topological) pattern recognition, and support vector machines. These methods each have drawbacks, and none can simultaneously satisfy the two most important indexes of vehicle type classification, speed and accuracy. The factors that most influence these two indexes are the extracted vehicle features and fast localization of the vehicle, so feature extraction and fast target localization are the key to the whole recognition process. The extraction of vehicle features is influenced by many factors: there are many vehicle types without obvious distinguishing features; vehicle motion and the height and angle of the camera cause large feature differences for the same type; and weather and illumination also interfere.
The development of deep learning has advanced image structuring and feature extraction. Early checkpoint systems had weak intelligent-analysis capability, low picture quality, and low license-plate recognition accuracy; target vehicles often had to be sought manually from massive vehicle-passing pictures or video according to intrinsic information such as brand, model, and color. Because front-line police forces are limited, the labor intensity is high, vehicle models are diverse, and lighting angles are uncertain, the accuracy and timeliness of such searches cannot be guaranteed; especially for emergencies, the best opportunity for handling is often missed. A vehicle-feature deep-learning system performs structured feature analysis and recognition on the vehicle-passing pictures obtained by front-end or simple checkpoints and fully mines the valuable information in massive checkpoint pictures. It can improve the accuracy of license-plate and vehicle-type recognition, enrich the recognized vehicle-feature information, and realize recognition and detection of vehicle sub-brand, body color, unfastened seat belts, driver phone use, sun-visor state, and the like. By finely correcting the vehicle-passing data, it replaces the single traditional means of analysis that relies only on the license plate, provides more practical vehicle prevention-and-control applications for checkpoint traffic police, enables effective early warning and control of high-risk vehicles, optimizes police deployment for targeted vehicle investigation, effectively locks suspect vehicles in large numbers of vehicle-related cases, improves criminal-investigation efficiency, and shifts public-security prevention and control from passive after-the-fact investigation to active advance early warning.
Chinese patent application No. CN201510744990.4 discloses a vehicle retrieval method based on similarity learning: after a vehicle region is given, SIFT feature points are obtained and described, and a clustering algorithm then discretizes the SIFT features. To compensate for the SIFT features' lack of position information, neighborhood features are further generated from the discrete SIFT feature distribution in the neighborhood and serve as the final feature-point description; each vehicle picture is represented by a batch of features, the features of a pair of similar vehicle pictures form a positive sample, and the features of a pair of different vehicle pictures form a negative sample. After a large number of positive and negative samples are collected, similarity learning is carried out with a random-forest method, and the obtained classifier can judge whether two vehicles are similar, achieving similar-vehicle retrieval. With SIFT features alone, however, this technique cannot extract vehicle features sufficiently.
Chinese patent application No. CN201610711333.4 discloses a vehicle retrieval method and device based on big data. The method comprises: extracting each vehicle-inspection mark of the target vehicle from the target vehicle image; fusing the vehicle-inspection marks according to the positional relation among them to obtain a plurality of fusion areas, each containing at least one mark; determining the shape and color of each mark contained in each fusion area; and searching for the target vehicle in the vehicle images to be searched layer by layer according to the number of marks, the number of fusion areas, and the number, shape, and color of the marks in each fusion area. This technique uses only a single feature to retrieve the vehicle.
Chinese patent application No. CN201710451957.1 discloses a fake-licensed vehicle retrieval and recognition system based on machine vision, mainly comprising a vehicle image acquisition system, a database system, and a retrieval system. The system retrieves a suspect vehicle by means of the characteristics of its on-board decorations, such as ornaments and annual-inspection labels: features are acquired from the on-board decoration region image, and vehicle retrieval is performed by sparse coding of that region, solving the problem of finding a target vehicle among massive traffic-scene images so that fake-licensed vehicles are accurately recognized and discovered. The time complexity of this technique is high on large databases.
In summary, an image retrieval technique using a deep self-coding convolutional neural network and a modified locality-sensitive hash reordering method must address the following problems: 1) how to accurately segment the whole image of the detected vehicle from a complex background, and how to learn and train with as little labeled image data as possible while still obtaining the vehicle-type feature data; 2) how to classify vehicle types into finer categories and recognize more information such as the brand, series, and body color of the vehicle, and, on the other hand, how to process the vehicle type, license plate, and vehicle logo in parallel within the same deep convolutional neural network, i.e., realize multi-task parallel computation in deep learning to improve vehicle identity recognition; 3) how to design a method for extracting instance features from a vehicle image to search for vehicles of similar type and model; 4) how to use the extracted features to establish hierarchical deep search and obtain more accurate retrieval results; 5) how to reduce the large storage consumption, low retrieval speed, and similar problems of an image retrieval system in the big-data era.
Disclosure of Invention
Aiming at the problems that the existing vehicle retrieval technology has a low level of automation and intelligence, lacks deep learning, can hardly obtain accurate retrieval results, consumes much storage space, retrieves slowly, and can hardly meet the image retrieval requirements of the big-data era, the invention provides an end-to-end vehicle image retrieval method with hierarchical deep search based on a deep self-coding convolutional neural network.
In order to solve the technical problems, the invention provides the following technical scheme:
A modified locality-sensitive hash vehicle retrieval method based on multitask deep learning comprises the following steps:
1) construct a multi-task end-to-end convolutional neural network for deep learning and training recognition, and deeply learn the attribute information of the vehicle, including vehicle type, vehicle series, vehicle logo, color, and license plate, through training data and a layer-by-layer progressive network structure;
2) construct the vehicle-attribute hash code with the multitask convolutional neural network of step 1), adopting a segmented parallel learning and coding strategy;
3) construct a feature pyramid module from a pyramid pooling layer and a vector compression layer, to adapt to convolutional feature-map inputs of different sizes and extract the instance features of the vehicle;
4) construct a locality-sensitive reordering algorithm using the instance features obtained in step 3);
5) construct a cross-modal retrieval method for the case in which a retrieval vehicle image cannot be obtained, realizing vehicle retrieval.
Furthermore, the multi-task end-to-end convolutional neural network for deep learning and training recognition comprises a shared convolution module, a region-of-interest coordinate regression and recognition module, a multi-task learning module, and an instance feature extraction module;
A shared convolution module: the shared network consists of 5 convolution modules, where the last layers of conv2_x through conv5_x have output feature-map sizes of $4^2, 8^2, 16^2, 16^2$, respectively; conv1 contains only a single convolutional layer and serves as the input layer;
A region-of-interest coordinate regression and recognition module follows the shared convolution module. This module takes an image of any size as input and outputs a set of rectangular prediction boxes for the target region, comprising the position coordinates of each prediction box and the probability scores of the categories in the data set. To generate region proposal boxes, the input image first passes through the shared convolution layers to produce a feature map, on which multi-scale convolution is then performed, implemented as follows: at each sliding-window position, 3 scales and 3 aspect ratios are used, each centered on the current sliding window and corresponding to one scale and aspect ratio, which are then mapped back to 9 candidate regions of different scales on the original image; for a shared convolution feature map of size w × h, there are w × h × 9 candidate regions in total; finally, the classification layer outputs scores for w × h × 9 × 2 candidate regions, i.e., the estimated probability that each region is target/non-target, and the regression layer outputs w × h × 9 × 4 parameters, i.e., the coordinate parameters of the candidate regions;
When the RPN is trained, each candidate region is assigned a binary label marking whether it is an object target, as follows: 1) a positive label is assigned to the candidate region with the highest IoU (Intersection-over-Union) overlap with a ground-truth (GT) target region; 2) a positive label is also assigned to any candidate region whose IoU overlap with some GT bounding box exceeds 0.7; 3) a negative label is assigned to candidate regions whose IoU with all GT bounding boxes is below 0.3, and regions falling between the two thresholds do not contribute to training.
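The label-assignment rules above can be made concrete with a short sketch. The following Python (NumPy) code is an illustrative reading of rules 1)-3), not the patent's implementation; the array layouts and helper names are assumptions.

```python
import numpy as np

def iou(boxes, gt):
    """IoU between each box in `boxes` (N,4) and each GT box in `gt` (M,4),
    boxes given as (x1, y1, x2, y2). Returns an (N, M) matrix."""
    x1 = np.maximum(boxes[:, None, 0], gt[None, :, 0])
    y1 = np.maximum(boxes[:, None, 1], gt[None, :, 1])
    x2 = np.minimum(boxes[:, None, 2], gt[None, :, 2])
    y2 = np.minimum(boxes[:, None, 3], gt[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    return inter / (area_a[:, None] + area_g[None, :] - inter)

def assign_rpn_labels(anchors, gt_boxes):
    """1 = positive, 0 = negative, -1 = ignored (IoU between 0.3 and 0.7)."""
    overlaps = iou(anchors, gt_boxes)          # (N, M)
    max_iou = overlaps.max(axis=1)             # best GT overlap per anchor
    labels = np.full(len(anchors), -1, dtype=np.int8)
    labels[max_iou < 0.3] = 0                  # rule 3: negative below 0.3
    labels[max_iou >= 0.7] = 1                 # rule 2: positive above 0.7
    labels[overlaps.argmax(axis=0)] = 1        # rule 1: best anchor per GT
    return labels
```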
With these definitions, the objective function is minimized. The loss function for an image is defined as:

$$L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*) \quad (1)$$

where $i$ is the index of a candidate region and $p_i$ is the predicted probability that candidate region $i$ is a target. The ground-truth label $p_i^*$ is 1 if the candidate region is labeled positive and 0 if it is labeled negative. $t_i$ is a vector representing the 4 parameterized coordinates of the predicted bounding box, and $t_i^*$ is the coordinate vector of the corresponding GT bounding box. $N_{cls}$ and $N_{reg}$ are the normalization coefficients of the classification loss and the position-regression loss, respectively, and $\lambda$ is the weight parameter between the two. The classification loss $L_{cls}$ is the log loss over the two classes, target and non-target:

$$L_{cls}(p_i, p_i^*) = -\log\left[p_i^* p_i + (1 - p_i^*)(1 - p_i)\right] \quad (2)$$

The position-regression loss $L_{reg}$ is defined by the following function:

$$L_{reg}(t_i, t_i^*) = R(t_i - t_i^*) \quad (3)$$

where $R$ is the robust smooth-$L_1$ loss function:

$$\text{smooth}_{L_1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases} \quad (4)$$
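For reference, here is a minimal NumPy sketch of the loss in Eqs. (1)-(4); taking λ = 10 and normalizing the regression term by the number of positive anchors are assumptions, as the text does not fix $N_{cls}$, $N_{reg}$, or λ.

```python
import numpy as np

def smooth_l1(x):
    """Smooth-L1 of Eq. (4): 0.5*x^2 where |x| < 1, |x| - 0.5 elsewhere."""
    ax = np.abs(x)
    return np.where(ax < 1, 0.5 * x ** 2, ax - 0.5)

def rpn_loss(p, p_star, t, t_star, lam=10.0):
    """Eq. (1): log loss (Eq. 2) over all anchors plus a lambda-weighted
    smooth-L1 box term (Eq. 3) over positive anchors only.
    p: (N,) predicted target probabilities; p_star: (N,) 0/1 labels;
    t, t_star: (N, 4) predicted and GT box parameterizations."""
    eps = 1e-12
    l_cls = -np.log(p_star * p + (1 - p_star) * (1 - p) + eps)  # Eq. (2)
    l_reg = smooth_l1(t - t_star).sum(axis=1)                   # Eqs. (3)-(4)
    n_cls, n_reg = len(p), max(p_star.sum(), 1.0)
    return l_cls.sum() / n_cls + lam * (p_star * l_reg).sum() / n_reg
```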
However, training a multitask deep-learning network is not an easy process to implement, because the information at different task levels has different learning difficulty and convergence speed; designing a good multitask objective function is therefore crucial. The multitask joint training process is as follows: assume the total number of tasks is $T$ and record the training data of the $t$-th task as $\{(x_i^t, y_i^t)\}$, where $t \in (1, T)$, $i \in (1, N)$, $N$ is the total number of training samples, and $x_i^t$ and $y_i^t$ are the feature vector and the label of the $i$-th sample, respectively. The multitask objective function is expressed as:

$$\min_{\{w_t\}} \sum_{t=1}^{T} \sum_{i=1}^{N} L\left(y_i^t, f(x_i^t; w_t)\right) + \Phi(w_t) \quad (5)$$

where $f(x_i^t; w_t)$ is the network prediction computed from the input feature vector $x_i^t$ and the weight parameter $w_t$, $L(\cdot)$ is a loss function, and $\Phi(w_t)$ is the regularization value of the weight parameter;

For the loss function, the features of the last layer are trained with softmax together with a log-likelihood cost function to realize image classification. The softmax loss function is defined as follows:

$$L_s = -\sum_{i=1}^{m} \log \frac{e^{W_{y_i}^T x_i + b_{y_i}}}{\sum_{j=1}^{n} e^{W_j^T x_i + b_j}} \quad (6)$$

where $x_i$ is the $i$-th depth feature, $W_j$ is the $j$-th column of the weights in the last fully-connected layer, $b$ is the bias term, and $m$ and $n$ are the number of processed samples and the number of classes, respectively;
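A minimal NumPy rendering of the softmax loss of Eq. (6), assuming integer class labels and a weight matrix whose columns index the classes:

```python
import numpy as np

def softmax_loss(X, y, W, b):
    """Eq. (6): mean negative log-likelihood of the true class.
    X: (m, d) depth features; y: (m,) integer labels; W: (d, n); b: (n,)."""
    logits = X @ W + b                                   # (m, n)
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(y)), y].mean()
```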
Convolutional neural network training is a back-propagation process similar to the BP algorithm: the error function is propagated backwards, and the convolution parameters and biases are optimized by stochastic gradient descent until the network converges or the maximum number of iterations is reached;
Back propagation compares the network outputs with the sample labels: for multi-class recognition over $c$ classes with $N$ training samples, a squared-error cost function is adopted, and the final output error of the network is computed by formula (7):

$$E^N = \frac{1}{2} \sum_{n=1}^{N} \sum_{k=1}^{c} \left(t_k^n - y_k^n\right)^2 \quad (7)$$

where $E^N$ is the squared-error cost function, $t_k^n$ is the $k$-th dimension of the label of the $n$-th sample, and $y_k^n$ is the corresponding $k$-th output of the network prediction for the $n$-th sample;

When the error function is propagated backwards, a computation similar to the traditional BP algorithm is adopted, in the specific form of formula (8):

$$\delta^l = (W^{l+1})^T \delta^{l+1} \circ f'(u^l), \qquad u^l = W^l x^{l-1} + b^l \quad (8)$$

where $\delta^l$ denotes the error term of the current layer and $\delta^{l+1}$ that of the following layer, $W^{l+1}$ is the mapping matrix of the following layer, $f'$ denotes the derivative of the activation function (with error maps upsampled through pooling layers), $u^l$ denotes the pre-activation output of the current layer, $x^{l-1}$ denotes the input from the preceding layer, and $W^l$ is the mapping weight matrix of this layer.
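A compact sketch of the delta rule of Eq. (8) for one fully-connected layer; the sigmoid activation is assumed purely for illustration:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def sigmoid_prime(u):
    s = sigmoid(u)
    return s * (1.0 - s)

def backprop_delta(delta_next, W_next, u_l):
    """Eq. (8): delta^l = (W^{l+1})^T delta^{l+1}, elementwise-scaled by
    f'(u^l), where u^l = W^l x^{l-1} + b^l was saved on the forward pass."""
    return (W_next.T @ delta_next) * sigmoid_prime(u_l)
```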
Furthermore, the tasks in the multi-task learning process are related, i.e., information is shared among the tasks; when several tasks are trained simultaneously, the network uses the shared information among tasks to strengthen the inductive-bias capability of the system and the generalization capability of the classifier. The multitask network is divided into five subtasks by adding five fully-connected layers after the region-of-interest module; each fully-connected layer is followed by a softmax activation function that normalizes its output to the interval [0, 1], and the normalized value is then fed into a segmentation function to produce the binary-code output. This segmented learning and coding strategy reduces the redundancy among the hash codes and thereby enhances the robustness of the learned features;

The multi-task learning network is divided into $T$ tasks, each containing $c_t$ classes; the one-dimensional fully-connected output of each task is denoted $m_t$. First, the output of the fully-connected layer is normalized to $[0, 1]$ with the softmax activation function:

$$\sigma_t^{(j)} = \frac{e^{m_t^{(j)}}}{\sum_{k=1}^{c_t} e^{m_t^{(k)}}} \quad (9)$$

The normalized value is then fed into a threshold segmentation function for binarization, giving the binary output of the fully-connected layer:

$$H_t^{(j)} = \begin{cases} 1, & \sigma_t^{(j)} \ge 0.5 \\ 0, & \sigma_t^{(j)} < 0.5 \end{cases} \quad (10)$$

Finally, to obtain the vehicle-attribute hash code learned in segments in parallel by the multitask convolutional network, the $H_t$ obtained from formula (10) are fused again in proportion into the vector $f_A$:

$$f_A = [\alpha_1 H_1; \alpha_2 H_2; \ldots; \alpha_t H_t] \quad (11)$$

where $\alpha_t$ in formula (11) is a penalty factor determined from the per-task class counts by formula (12); multiplying each $H_t$ by the penalty factor $\alpha_t$ compensates the error caused by the different numbers of classes among the tasks.
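The coding pipeline of Eqs. (9)-(12) can be sketched as follows; since the exact form of the penalty factor in Eq. (12) is not recoverable here, $\alpha_t = c_t / \sum_t c_t$ is assumed purely for illustration.

```python
import numpy as np

def softmax(m):
    """Eq. (9): normalize a fully-connected output vector to [0, 1]."""
    e = np.exp(m - m.max())
    return e / e.sum()

def attribute_hash(task_outputs):
    """Build the vehicle-attribute hash code f_A of Eq. (11) from the
    per-task fully-connected outputs m_t. The penalty factor
    alpha_t = c_t / sum(c) is an assumed reading of Eq. (12)."""
    counts = np.array([len(m) for m in task_outputs], dtype=float)
    alphas = counts / counts.sum()               # assumed Eq. (12)
    segments = []
    for alpha, m in zip(alphas, task_outputs):
        sigma = softmax(m)                       # Eq. (9)
        h = (sigma >= 0.5).astype(float)         # Eq. (10): threshold at 0.5
        segments.append(alpha * h)               # weight each segment
    return np.concatenate(segments)              # Eq. (11): [a1*H1;...;aT*HT]

# Example with five task branches (type, series, logo, color, plate).
f_A = attribute_hash([np.random.randn(c) for c in (10, 40, 25, 12, 34)])
```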
Furthermore, in the era of hand-engineered features, image pyramids were used heavily, so object detectors such as DPM required dense scale sampling (e.g., 10 scales per octave) to achieve good results; for recognition tasks, engineered features have largely been replaced by features computed by deep convolutional networks. Besides representing higher-level semantics, deep convolutional networks are more robust to scale variation, which favors recognition from features computed on a single input scale; yet even with this robustness, pyramids are still needed for the most accurate results, and all recent top entries in the ImageNet and COCO detection challenges use multi-scale testing on featurized image pyramids. The main advantage of featurizing each level of an image pyramid is that it produces a multi-level feature representation in which all levels, including the high-resolution ones, are semantically strong;
The pyramid shape of the convolutional feature hierarchy is used to create a feature pyramid with strong semantics at all scales simultaneously. To achieve this, low-resolution, semantically strong features are combined with high-resolution, semantically weak features through top-down paths and lateral connections; the pyramid can be constructed quickly from a single input image scale and can replace a featurized image pyramid without sacrificing representational power, speed, or memory. To obtain instance features of the vehicle image while adapting to convolutional feature-map inputs of any size, the last layer of each unit of the shared modules conv2_x through conv5_x is selected and combined with the output of the region-of-interest module, and a pyramid pooling layer and a vector compression layer are then added to compress the three-dimensional features into a one-dimensional feature vector; this choice enriches the feature-map information obtained by the feature pyramid, and the deepest layer of each stage has the strongest feature representation;
With the last layer of each module as the input to the feature pyramid, the last layers of the networks conv2_x through conv5_x defined above, of sizes $4^2, 8^2, 16^2, 16^2$, are selected in turn as the input feature-map sizes of the feature pyramid. The input image is denoted $I$, its height and width by $h$ and $w$, and the shared convolution module of stage $x$ by convx_x. After input, the image is activated into a three-dimensional feature volume $T$ of dimension $h' \times w' \times d$, i.e., a set of $d$ two-dimensional feature maps of size $h' \times w'$, written $S = \{S_n\}, n \in (1, d)$, where $S_n$ is the feature map of the $n$-th channel. The volume $T$ is then fed into the feature pyramid and convolved with kernels of several scales to obtain a volume $T'$ of dimension $l \times l \times d$, likewise a set of two-dimensional feature maps written $S' = \{S'_n\}, n \in (1, d)$, where $S'_n$ is the $n$-th channel feature map; each map is of size $l \times l$ and there are $d$ in total. A sliding window of size $k \times k$ with max pooling is then applied to these feature maps, yielding a set of maps of size $l/k \times l/k$; the $S'_n$ of each channel is fused into a one-dimensional vector, the same operation is applied to the $d$ channels in turn, and the individual (instance) feature vector $f_B$, of size $(1, (l/k)^2 \times d)$, is finally obtained. The final retrieval feature vector $f$ is given by formula (13):

$$f = [f_A; f_B] \quad (13)$$
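To make the compression step concrete, here is a small NumPy sketch of the $k \times k$ max pooling and flattening that yields $f_B$, assuming $l$ is divisible by $k$; the final retrieval vector is then the concatenation of Eq. (13).

```python
import numpy as np

def instance_feature(T_prime, k):
    """Max-pool each channel of the convolved volume T' (l, l, d) with a
    k x k window, then flatten to the 1-D instance vector f_B."""
    l, _, d = T_prime.shape
    g = l // k                                    # pooled side length l/k
    blocks = T_prime.reshape(g, k, g, k, d)       # k x k blocks per channel
    return blocks.max(axis=(1, 3)).reshape(-1)    # f_B, length (l/k)^2 * d

# Example: a 16 x 16 x 8 volume pooled with a 4 x 4 window gives 128 values.
f_B = instance_feature(np.random.rand(16, 16, 8), k=4)
f_A = np.zeros(121)                               # stand-in attribute code
f = np.concatenate([f_A, f_B])                    # Eq. (13): f = [f_A; f_B]
```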
the basic idea of the locality sensitive hashing algorithm is as follows: after two adjacent data points in the original data space are subjected to the same mapping or projection transformation, the probability that the two data points are still adjacent in the new data space is very high, and the probability that non-adjacent data points are mapped to the same bucket is very low. That is, if we have some hash mapping on the original data, we want two data that were originally adjacent to each other to be able to be hashed into the same bucket, having the same bucket number. After all the data in the original data set are subjected to hash mapping, a hash table is obtained, the original data sets are dispersed into buckets of the hash table, each bucket can fall into some original data, the data belonging to the same bucket are probably adjacent, and certainly, the non-adjacent data are hashed in the same bucket. Therefore, if some hash functions can be found, after the hash mapping transformation of the hash functions, the adjacent data in the original space fall into the same bucket, neighbor searching in the data set becomes easy, and the data adjacent to the query data can be found only by performing hash mapping on the query data to obtain the bucket number of the query data, then taking out all the data in the bucket corresponding to the bucket number, and then performing linear matching. In other words, the original data set is divided into a plurality of subsets through the mapping transformation operation of the hash function, the data in each subset are adjacent, and the number of elements in each subset is small, so that the problem of searching for adjacent elements in a super-large set is converted into the problem of searching for adjacent elements in a small set, and the searching calculation amount can be greatly reduced by the algorithm;
A hash function under which two originally adjacent data points fall into the same bucket after hashing must satisfy the following two conditions:

if d(x, y) ≤ d1, the probability that h(x) = h(y) is at least p1;

if d(x, y) ≥ d2, the probability that h(x) = h(y) is at most p2;

where d(x, y) denotes the distance between x and y, d1 < d2, and h(x) and h(y) denote the hash values of x and y, respectively.

Hash functions satisfying the above two conditions are called (d1, d2, p1, p2)-sensitive, and the process of hashing the raw data set through one or more (d1, d2, p1, p2)-sensitive hash functions to generate one or more hash tables is called locality-sensitive hashing.
The process of indexing mass data with locality-sensitive hashing (i.e., building hash tables) and performing approximate nearest-neighbor lookup through the index is as follows:

Off-line index building

(1) select hash functions satisfying (d1, d2, p1, p2)-sensitive locality-sensitive hashing;

(2) determine the number L of hash tables, the number K of hash functions per table, and the parameters of the locality-sensitive hash functions, according to the required accuracy of the search results, i.e., the probability that adjacent data are found;

(3) hash all data into the corresponding buckets through the locality-sensitive hash functions, forming one or more hash tables;

On-line lookup

(1) hash the query data through the locality-sensitive hash functions to obtain the corresponding bucket numbers;

(2) take out the data stored under those bucket numbers; to guarantee lookup speed, only the first 2L data items are taken;

(3) compute the similarity or distance between the query data and those 2L data items and return the nearest-neighbor data;

The on-line lookup time of locality-sensitive hashing consists of two parts: first, the time to compute the hash values, i.e., the bucket numbers, with the locality-sensitive hash functions; second, the time to compare the query data with the data in the buckets. The lookup time of locality-sensitive hashing is therefore sublinear: indexing into buckets speeds up matching, changing the second part of the time cost from O(n) to O(log n) or O(1), which greatly reduces the amount of computation. A sketch of both phases follows.
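Both phases can be sketched with random-projection hash functions of the form of Eq. (16) below; the table and bit counts here are illustrative assumptions, not the patent's parameters.

```python
import numpy as np
from collections import defaultdict

class LSHIndex:
    """Minimal sketch of the off-line/on-line procedure above: L hash
    tables, each keyed by K random-projection bits h(x) = sign(Wx + b)."""

    def __init__(self, dim, L=4, K=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(L, K, dim))     # L*K random hyperplanes
        self.b = rng.normal(size=(L, K))          # random intercepts
        self.tables = [defaultdict(list) for _ in range(L)]

    def _keys(self, x):
        bits = (np.einsum('lkd,d->lk', self.W, x) + self.b) > 0
        return [tuple(row) for row in bits]       # one bucket key per table

    def add(self, idx, x):
        """Off-line index building: store item `idx` in one bucket per table."""
        for table, key in zip(self.tables, self._keys(x)):
            table[key].append(idx)

    def query(self, q, limit):
        """On-line lookup: collect bucket contents; the text caps the
        candidates at the first 2L items, so pass limit = 2 * L."""
        cand = []
        for table, key in zip(self.tables, self._keys(q)):
            cand.extend(table[key])
        return list(dict.fromkeys(cand))[:limit]  # de-duplicate, then cap
```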
A key property of locality-sensitive hashing is that similar samples are mapped to the same bucket with high probability; the hash function h(·) of a locality-sensitive hash satisfies the condition:

$$P\{h(f_{Aq}) = h(f_A)\} = \operatorname{sim}(f_{Aq}, f_A) \quad (14)$$

where $\operatorname{sim}(f_{Aq}, f_A)$ denotes the similarity of $f_{Aq}$ and $f_A$, and $h(f_A)$ and $h(f_{Aq})$ denote their hash values; the similarity measure is directly associated with a distance function $\sigma$, for example:

$$\operatorname{sim}(f_{Aq}, f_A) = 1 - \sigma(f_{Aq}, f_A) \quad (15)$$

A typical family of locality-sensitive hash functions is given by random projection and thresholding, as in formula (16):

$$h(f_A) = \operatorname{sign}(W f_A + b) \quad (16)$$

where $W$ is a random hyperplane vector and $b$ is a random intercept.
The locality-sensitive hashing consists of a preprocessing algorithm and a nearest-neighbor search algorithm; through these two algorithms, the retrieval image features are represented as a fixed-length string of binary codes.

The preprocessing algorithm proceeds as follows: input the set of extracted image features $p$ and the number of hash tables $l_1$; map the image features with random hash functions $g(\cdot)$, storing each point $p_j$ into the bucket numbered $g_i(p_j)$ of hash table $T_i$; output the hash tables $T_i$, $i = 1, \ldots, l_1$.

The nearest-neighbor search algorithm proceeds as follows: input a retrieval image feature $q$, access the hash tables $T_i$, $i = 1, \ldots, l_1$, generated by the preprocessing algorithm, together with the number $K$ of nearest neighbors; return the $K$ nearest-neighbor data of the retrieval point $q$ in the data set $S$.

If $I = \{I_1, I_2, \ldots, I_n\}$ is a data set composed of $n$ images, the binary code corresponding to each image is $H = \{H_1, H_2, \ldots, H_n\}$, $H_i \in \{0, 1\}^h$. Given a retrieval image $I_q$ with binary code $H_q$, the images whose Hamming distance between $H_q$ and $H_i$ is less than the threshold $T_H$ are put into the candidate pool $P$ as candidate images.
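A short NumPy reading of this candidate-pool construction, assuming the binary codes are stored as 0/1 arrays:

```python
import numpy as np

def candidate_pool(H_q, H, t_h):
    """Indices of database images whose codes lie within Hamming distance
    t_h of the query code H_q. H_q: (h,) 0/1 array; H: (n, h) 0/1 array."""
    dists = (H != H_q).sum(axis=1)   # Hamming distance to each image code
    return np.flatnonzero(dists < t_h)
```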
A locality-sensitive reordering algorithm is constructed with the instance features. The traditional locality-sensitive hashing algorithm mainly returns images that are close in distance, i.e., whose similarity to the retrieval image is close to 1. Mapping with the vehicle-attribute hash codes retrieves vehicles of the same model, but vehicles of the same model remain hard to tell apart: differences that are obvious to human judgment cannot be distinguished effectively through the vehicle-attribute hash codes alone. To find the vehicles in the candidate pool that share individual characteristics with the retrieval picture, after the retrieval image has been mapped into a bucket by its vehicle-attribute hash code, the images in the bucket are re-sorted with the extracted image instance features to reduce the intra-class error. The re-sorting formula takes the form:

$$S_k = y_k \, \beta_k \cos\left(f_{Bq}, f_B^k\right) \quad (17)$$

In formula (17), $k$ indexes the $k$-th image in the bucket selected by the vehicle-attribute hash-code mapping, $\beta_k$ denotes a penalty factor, and $\cos$ denotes the cosine-distance formula measuring the image instance features. To exclude erroneous mappings of the vehicle-attribute hash code, $y$ indicates whether the pre-mapping retrieval code $f_{Aq}$ and that of the image in the bucket, $f_A^k$, are equal: $y$ is 1 if they are equal and 0 otherwise;
In the further ranking, the images whose Hamming distance between $H_q$ and $H_i$ is below the threshold $T_H$ have already been put into the candidate pool $P$; to obtain more precise retrieval results, a re-ordering method is further adopted on the basis of the candidate pool.

Re-ordering method: given a retrieval image $I_q$ and the candidate pool $P$, the instance features determine the top-$k$ ranked images among the images in $P$; the similarity between them is calculated with formula (17).

Further, for the re-ranking evaluation, a ranking-based criterion is used: for a given retrieval image $I_q$ and a similarity measure, each data-set image receives one rank. The retrieval accuracy of the retrieval image $I_q$ is expressed by evaluating the top $k$ ranked images, formula (18):

$$\text{Precision@}k = \frac{\sum_{i=1}^{k} \operatorname{Rel}(i)}{k} \quad (18)$$

where $\operatorname{Rel}(i)$ denotes the true relevance between the retrieval image $I_q$ and the $i$-th ranked image, $k$ denotes the number of ranked images, and Precision@k is the retrieval precision. When the true relevance is calculated, only the classification label is considered, $\operatorname{Rel}(i) \in \{0, 1\}$: $\operatorname{Rel}(i) = 1$ is set if the retrieval image and the $i$-th ranked image have the same label, otherwise $\operatorname{Rel}(i) = 0$; the retrieval precision is obtained by traversing the first $k$ ranked images in the candidate pool $P$;
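A sketch of the re-sorting score of Eq. (17) and the Precision@k of Eq. (18); since the exact form of the penalty factor in Eq. (17) is not recoverable, a scalar β is assumed here.

```python
import numpy as np

def rerank_scores(f_Bq, bucket_feats, y, beta=1.0):
    """Assumed reading of Eq. (17): score the k-th in-bucket image by the
    cosine similarity of instance features, gated by the attribute-match
    flag y_k; the penalty factor is taken as a scalar beta."""
    scores = []
    for f_k, y_k in zip(bucket_feats, y):
        cos = np.dot(f_Bq, f_k) / (np.linalg.norm(f_Bq) * np.linalg.norm(f_k))
        scores.append(y_k * beta * cos)
    return np.argsort(-np.asarray(scores))        # best match first

def precision_at_k(rel, k):
    """Eq. (18): fraction of the top-k ranked images with Rel(i) = 1;
    `rel` is the 0/1 relevance sequence in ranked order."""
    return sum(rel[:k]) / k
```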
In step 5), when the retrieval image cannot be obtained, a text retrieval mode is adopted for auxiliary retrieval; the retrieval features obtained from text and the features obtained from the convolutional network share one retrieval pipeline without additional training. The method of acquiring features from text is:

Initialization: parse the text file into a term vector; remove stop words and repeated words; check the terms to ensure the correctness of the parse;

5.1) extract the minimum randomly-combined word-segmentation vector $R = (r_1, r_2, \ldots, r_n)$ from the input text $O$;

5.2) integrate $R$ with the ordering of $f_A$ and the vehicle-attribute hash codes to obtain the text attribute feature $f_A^{Txt}$, whose dimension is smaller than that of $R$;

5.3) retrieve with the locality-sensitive reordering hash algorithm;

5.4) return the group $I$ of similar images.
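Steps 5.1)-5.4) leave the term-to-feature mapping unspecified; the sketch below assumes, purely for illustration, a dictionary that maps query words to (task, class) pairs so that the text fills the corresponding segments of an f_A-shaped attribute code, which is then retrieved through the same locality-sensitive reordering pipeline as an image query.

```python
def text_to_attribute_code(terms, vocab, segment_slices, code_len):
    """Hypothetical mapping from parsed query terms to a binary attribute
    code shaped like f_A (Eq. 11). `vocab` maps a word such as "red" or
    "sedan" to a (task index, class index) pair, and `segment_slices`
    gives each task's slice of the code; both are assumed structures."""
    f_txt = [0.0] * code_len
    for word in terms:
        if word in vocab:
            task, cls = vocab[word]
            f_txt[segment_slices[task].start + cls] = 1.0
    return f_txt

# Example: two tasks (color with 3 classes, type with 2 classes).
vocab = {"red": (0, 1), "sedan": (1, 0)}
slices = [slice(0, 3), slice(3, 5)]
print(text_to_attribute_code(["red", "sedan"], vocab, slices, 5))
# -> [0.0, 1.0, 0.0, 1.0, 0.0]
```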
To implement the above summary, several core problems must be solved: 1) for the difficulty of image feature extraction, the strong feature-characterization capability of the deep self-coding convolutional neural network is used to realize adaptive feature extraction; 2) for the low retrieval speed on large-scale images, a multi-task layering method is designed so that a query image can be compared rapidly with the images in the database; 3) a method is designed for extracting instance features from a vehicle image to search for vehicles of similar type and model; 4) a modified locality-sensitive hash reordering code is designed to increase the separation between vehicle images within a class; 5) exploiting the advantages of end-to-end deep networks, an end-to-end deep self-coding convolutional neural network is designed that fuses detection, recognition, and feature extraction into one network.
The modified locality-sensitive hash reordering vehicle retrieval method based on multitask deep learning disclosed by the invention comprises the following process: 1) feed the image into the deep self-coding convolutional neural network, perform logistic regression on the feature map, and segment and predict the position and category of the region of interest in the retrieval image; 2) extract the vehicle-attribute hash codes obtained by segmented parallel learning with the multi-task deep self-coding convolutional neural network; 3) extract the instance features of each vehicle using the pyramid shape of the convolutional feature hierarchy; 4) retrieve with the extracted features using the modified locality-sensitive hashing method; 5) adopt cross-modal retrieval when the vehicle image cannot be obtained;
the invention has the following beneficial effects:
1) a multi-task end-to-end convolutional neural network is provided for identifying the vehicle type, the vehicle series, the vehicle logo, the color and the license plate of the vehicle;
2) the strong characteristic representation capability of the deep convolutional neural network is utilized to realize the self-adaptive extraction of the characteristics;
3) a modified locality-sensitive hash reordering code is constructed to retrieve the features extracted by the convolutional network;
4) the design balances universality and specificity: in universality, it meets the requirements of various users for retrieval speed, precision, and practicality; in specificity, a user can build a dedicated data set for a specific requirement and fine-tune the network parameters to realize a vehicle retrieval system oriented to the specific application.
Drawings
Fig. 1 is a flowchart of the overall search.
FIG. 2 is a flow chart of an overall training network.
Fig. 3 is an RPN network development diagram.
Fig. 4 is a schematic diagram of the vehicle attribute hash code being unable to distinguish the vehicle.
FIG. 5 is a diagram of text feature vector generation.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
Referring to fig. 1 to 5, in the modified locality-sensitive hash vehicle retrieval method based on multitask deep learning (overall flow chart in fig. 1), the pictures in the database are first fed into the multitask end-to-end convolutional neural network for deep learning and training recognition, which deeply learns the attribute information of the vehicle, including vehicle type, vehicle series, vehicle logo, color, and license plate, through massive training data and a layer-by-layer progressive network structure; then the convolutional network extracts the vehicle-attribute hash codes obtained by segmented parallel learning on the vehicle images, and the constructed feature pyramid module extracts the instance features of the vehicle; finally the retrieved vehicle image is compared with the images in the database by the modified locality-sensitive hash reordering method.
The multi-task end-to-end convolutional neural network for deep learning and training recognition comprises a shared convolution module, a region-of-interest coordinate regression and recognition module, a multi-task learning module, and an instance feature extraction module; the overall flow chart is shown in FIG. 2 and comprises 4 shared convolution modules and a 4-level feature pyramid module. The double-dot-dash line in FIG. 2 marks the vehicle instance features extracted by the compression layer; the dashed box in FIG. 2 marks the proposed segmented learning and encoding module, which learns compact vehicle features through the different tasks; finally the two extracted feature vectors are fused;
the invention comprises the following steps:
1) A shared convolution module: the shared network consists of 5 convolution modules, where the last layers of conv2_x through conv5_x have output feature-map sizes of $4^2, 8^2, 16^2, 16^2$, respectively; conv1 contains only a single convolutional layer and serves as the input layer;

A region-of-interest coordinate regression and recognition module follows the shared convolution module. This module takes an image of any size as input and outputs a set of rectangular prediction boxes for the target region, comprising the position coordinates of each prediction box and the probability scores of the categories in the data set. To generate region proposal boxes, the input image first passes through the shared convolution layers to produce a feature map, on which multi-scale convolution is then performed, implemented as follows: at each sliding-window position, 3 scales and 3 aspect ratios are used, each centered on the current sliding window and corresponding to one scale and aspect ratio, which are then mapped back to 9 candidate regions of different scales on the original image; for a shared convolution feature map of size w × h, there are w × h × 9 candidate regions in total; finally, the classification layer outputs scores for w × h × 9 × 2 candidate regions, i.e., the estimated probability that each region is target/non-target, and the regression layer outputs w × h × 9 × 4 parameters, i.e., the coordinate parameters of the candidate regions; the specific form is shown in fig. 3;

When the RPN network is trained, each candidate region is assigned a binary label marking whether it is an object target, as follows: 1) a positive label is assigned to the candidate region with the highest IoU (Intersection-over-Union) overlap with a ground-truth (GT) target region; 2) a positive label is also assigned to any candidate region whose IoU overlap with some GT bounding box exceeds 0.7; 3) a negative label is assigned to candidate regions whose IoU with all GT bounding boxes is below 0.3, and regions falling between the two thresholds do not contribute to training.
With these definitions, the objective function is minimized. The loss function for an image is defined as:

$$L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*) \quad (1)$$

where $i$ is the index of a candidate region and $p_i$ is the predicted probability that candidate region $i$ is a target. The ground-truth label $p_i^*$ is 1 if the candidate region is labeled positive and 0 if it is labeled negative. $t_i$ is a vector representing the 4 parameterized coordinates of the predicted bounding box, and $t_i^*$ is the coordinate vector of the corresponding GT bounding box. $N_{cls}$ and $N_{reg}$ are the normalization coefficients of the classification loss and the position-regression loss, respectively, and $\lambda$ is the weight parameter between the two. The classification loss $L_{cls}$ is the log loss over the two classes (target vs. non-target):

$$L_{cls}(p_i, p_i^*) = -\log\left[p_i^* p_i + (1 - p_i^*)(1 - p_i)\right] \quad (2)$$

The position-regression loss $L_{reg}$ is defined by the following function:

$$L_{reg}(t_i, t_i^*) = R(t_i - t_i^*) \quad (3)$$

where $R$ is the robust smooth-$L_1$ loss function:

$$\text{smooth}_{L_1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases} \quad (4)$$
However, training a multitask deep-learning network is not an easy process to implement, because the information at different task levels has different learning difficulty and convergence speed; designing a good multitask objective function is therefore crucial. The multitask joint training process is as follows: assume the total number of tasks is $T$ and record the training data of the $t$-th task as $\{(x_i^t, y_i^t)\}$, where $t \in (1, T)$, $i \in (1, N)$, $N$ is the total number of training samples, and $x_i^t$ and $y_i^t$ are the feature vector and the label of the $i$-th sample, respectively. The multitask objective function can then be expressed as:

$$\min_{\{w_t\}} \sum_{t=1}^{T} \sum_{i=1}^{N} L\left(y_i^t, f(x_i^t; w_t)\right) + \Phi(w_t) \quad (5)$$

where $f(x_i^t; w_t)$ is the network prediction computed from the input feature vector $x_i^t$ and the weight parameter $w_t$, $L(\cdot)$ is a loss function, and $\Phi(w_t)$ is the regularization value of the weight parameter.

For the loss function, the features of the last layer are trained with softmax together with a log-likelihood cost function to realize image classification. The softmax loss function is defined as follows:

$$L_s = -\sum_{i=1}^{m} \log \frac{e^{W_{y_i}^T x_i + b_{y_i}}}{\sum_{j=1}^{n} e^{W_j^T x_i + b_j}} \quad (6)$$

where $x_i$ is the $i$-th depth feature, $W_j$ is the $j$-th column of the weights in the last fully-connected layer, $b$ is the bias term, and $m$ and $n$ are the number of processed samples and the number of classes, respectively;
Convolutional neural network training is a back-propagation process similar to the BP algorithm: the error function is propagated backwards, and the convolution parameters and biases are optimized by stochastic gradient descent until the network converges or the maximum number of iterations is reached;
Back propagation compares the network outputs with the sample labels: for multi-class recognition over $c$ classes with $N$ training samples, a squared-error cost function is adopted, and the final output error of the network is computed by formula (7):

$$E^N = \frac{1}{2} \sum_{n=1}^{N} \sum_{k=1}^{c} \left(t_k^n - y_k^n\right)^2 \quad (7)$$

where $E^N$ is the squared-error cost function, $t_k^n$ is the $k$-th dimension of the label of the $n$-th sample, and $y_k^n$ is the corresponding $k$-th output of the network prediction for the $n$-th sample;

When the error function is propagated backwards, a computation similar to the traditional BP algorithm is adopted, in the specific form of formula (8):

$$\delta^l = (W^{l+1})^T \delta^{l+1} \circ f'(u^l), \qquad u^l = W^l x^{l-1} + b^l \quad (8)$$

where $\delta^l$ denotes the error term of the current layer and $\delta^{l+1}$ that of the following layer, $W^{l+1}$ is the mapping matrix of the following layer, $f'$ denotes the derivative of the activation function (with error maps upsampled through pooling layers), $u^l$ denotes the pre-activation output of the current layer, $x^{l-1}$ denotes the input from the preceding layer, and $W^l$ is the mapping weight matrix of this layer;
2) The tasks in the multi-task learning process are related, i.e., information is shared among the tasks; when several tasks are trained simultaneously, the network uses the shared information among tasks to strengthen the inductive-bias capability of the system and the generalization capability of the classifier. The multitask network is divided into five subtasks by adding five fully-connected layers after the region-of-interest module; each fully-connected layer is followed by a softmax activation function that normalizes its output to the interval [0, 1], and the normalized value is then fed into a segmentation function to produce the binary-code output. This segmented learning and coding strategy reduces the redundancy among the hash codes and thereby enhances the robustness of the learned features;

The multi-task learning network is divided into $T$ tasks, each containing $c_t$ classes; the one-dimensional fully-connected output of each task is denoted $m_t$. First, the output of the fully-connected layer is normalized to $[0, 1]$ with the softmax activation function:

$$\sigma_t^{(j)} = \frac{e^{m_t^{(j)}}}{\sum_{k=1}^{c_t} e^{m_t^{(k)}}} \quad (9)$$

The normalized value is then fed into a threshold segmentation function for binarization, giving the binary output of the fully-connected layer:

$$H_t^{(j)} = \begin{cases} 1, & \sigma_t^{(j)} \ge 0.5 \\ 0, & \sigma_t^{(j)} < 0.5 \end{cases} \quad (10)$$

Finally, to obtain the vehicle-attribute hash code learned in segments in parallel by the multitask convolutional network, the $H_t$ obtained from formula (10) are fused again in proportion into the vector $f_A$:

$$f_A = [\alpha_1 H_1; \alpha_2 H_2; \ldots; \alpha_t H_t] \quad (11)$$

where $\alpha_t$ in formula (11) is a penalty factor determined from the per-task class counts by formula (12); multiplying each $H_t$ by the penalty factor $\alpha_t$ compensates the error caused by the different numbers of classes among the tasks;
3) simultaneously creating a characteristic pyramid with strong semantics on all scales by utilizing the pyramid shape of the convolution characteristic hierarchical structure; to achieve this goal, low resolution, semantically strong features are combined with high resolution, semantically weak features by top-down paths and transverse connections, and can be constructed quickly from a single input image scale, which can be used to replace a characterized image pyramid without sacrificing representative features, speed or memory; in order to obtain example features of the vehicle image and adapt to the input of a convolution feature map with any size, the last layer of each unit of the sharing modules conv2_ x to conv5_ x is selected and combined with the output of the region-of-interest module, and then a pyramid pooling layer and a vector compression layer are added to compress three-dimensional features into a one-dimensional feature vector, so that the selection is that the feature map information obtained by a feature pyramid can be enriched, and the deepest layer of each stage has the strongest feature representation function;
With the last layer of each module as input to the feature pyramid, the last layers of the networks conv2_x through conv5_x defined above are assigned, in turn, the sizes {4², 8², 16², 16²} as the input feature map sizes of the feature pyramid. The input image is denoted I, its height and width by the letters h and w, and the shared convolution module of the x-th stage by convx_x. After input, the image is activated into a three-dimensional feature vector T of dimension h′ × w′ × d, i.e. a set of two-dimensional feature maps of height and width h′ × w′; T contains d such maps, written as the set S = {S_n}, n ∈ (1, d), where S_n corresponds to the feature map of the n-th channel. T is then fed into the feature pyramid and convolved with kernels of several scales to obtain a three-dimensional feature vector T′ of dimension l × l × d, which likewise comprises a set of two-dimensional feature maps S′ = {S′_n}, n ∈ (1, d), where S′_n corresponds to the n-th channel feature map; each map is of size l × l and there are d in total. Then a sliding window of size k × k with max pooling is selected to regress the feature maps into a set of maps of size l/k × l/k; the S′_n of each channel is fused into a one-dimensional vector, the same operation is performed on the d channels in turn, and the instance feature vector f_B of size (1, l/k × d) is finally obtained. The final retrieval feature vector f is shown in equation (13):

f = [f_A; f_B]    (13)
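The pooling-and-compression step that produces f_B can be sketched as follows; this is a minimal illustration assuming square l × l feature maps and a window size k that divides l, with the per-channel fusion taken as a row-wise max because the text does not spell it out:

```python
import numpy as np

def instance_feature(T_prime, k):
    """Compress the l x l x d tensor T' into the 1-D instance vector f_B.

    T_prime: array of shape (l, l, d) from the multi-scale convolutions
    k:       pooling window size, assumed here to divide l exactly
    """
    l, _, d = T_prime.shape
    m = l // k
    # k x k sliding-window max pooling on every channel -> (m, m, d)
    pooled = T_prime.reshape(m, k, m, k, d).max(axis=(1, 3))
    # fuse each channel's m x m map into a length-m vector (row-wise max
    # is an assumption; the source only states that a fusion happens)
    per_channel = pooled.max(axis=0)             # shape (m, d)
    return per_channel.T.reshape(-1)             # length m*d = (l/k)*d

f_B = instance_feature(np.random.rand(16, 16, 256), k=4)
# final retrieval feature per equation (13): f = np.concatenate([f_A, f_B])
```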
The basic idea of the locality sensitive hashing algorithm is as follows: after two adjacent data points in the original data space undergo the same mapping or projection transformation, the probability that they remain adjacent in the new data space is very high, while the probability that non-adjacent data points are mapped into the same bucket is very low. That is, if we perform some hash mapping on the original data, we want two originally adjacent data points to be hashed into the same bucket, with the same bucket number. After all the data in the original data set have been hash-mapped, we obtain a hash table; the original data are scattered into the buckets of this hash table, with some original data falling into each bucket. Data belonging to the same bucket are very probably adjacent, although non-adjacent data may also be hashed into the same bucket. Therefore, if hash functions can be found such that, after their hash mapping transformation, adjacent data in the original space fall into the same bucket, nearest neighbor search in the data set becomes easy: one only needs to hash-map the query data to obtain its bucket number, take out all the data in the bucket corresponding to that number, and then perform linear matching to find the data adjacent to the query. In other words, the mapping transformation of the hash function divides the original data set into a number of subsets; the data in each subset are adjacent and the number of elements per subset is small, so the problem of finding adjacent elements in a very large set is converted into the problem of finding adjacent elements in a small set, and the algorithm greatly reduces the amount of search computation;
A hash function under which two originally adjacent data points fall into the same bucket after hashing must satisfy the following two conditions:
if d(x, y) ≤ d1, then the probability that h(x) = h(y) is at least p1;
if d(x, y) ≥ d2, then the probability that h(x) = h(y) is at most p2;
where d(x, y) denotes the distance between x and y, d1 < d2, and h(x) and h(y) denote the hashes of x and y, respectively.
Hash functions satisfying the above two conditions are called (d1, d2, p1, p2)-sensitive, and the process of hashing the original data set through one or more (d1, d2, p1, p2)-sensitive hash functions to generate one or more hash tables is referred to as locality sensitive hashing.
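To make the (d1, d2, p1, p2) definition concrete, the short experiment below estimates the collision probability of a random-projection hash for point pairs at two distances; all numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def collision_rate(dist, dim=32, trials=20000):
    """Estimate P[h(x) = h(y)] for h(x) = sign(w . x) when |x - y| = dist."""
    x = rng.normal(size=(trials, dim))
    step = rng.normal(size=(trials, dim))
    y = x + dist * step / np.linalg.norm(step, axis=1, keepdims=True)
    w = rng.normal(size=(trials, dim))     # one random hyperplane per trial
    same = np.sign((w * x).sum(1)) == np.sign((w * y).sum(1))
    return same.mean()

# nearby pairs collide with high probability (p1), distant pairs rarely (p2)
print(collision_rate(dist=0.5), collision_rate(dist=5.0))
```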
The process of using locality sensitive hashing to index massive data (i.e. build hash tables) and to perform approximate nearest neighbor lookup through the index is as follows:
off-line index building
(1) selecting hash functions that satisfy the (d1, d2, p1, p2)-sensitive property;
(2) determining the number L of hash tables, the number K of hash functions per table, and the parameters of the locality sensitive hash functions according to the required accuracy of the search results, i.e. the probability that adjacent data are found;
(3) hashing all data into the corresponding buckets through the locality sensitive hash functions to form one or more hash tables;
on-line lookup
(1) hashing the query data through the locality sensitive hash functions to obtain the corresponding bucket number;
(2) taking out the data in the corresponding bucket; to guarantee lookup speed, only the first 2L data points are taken;
(3) computing the similarity or distance between the query data and these 2L data points and returning the nearest neighbor data (a sketch of the whole build-and-lookup procedure follows this list);
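A minimal sketch of this offline build and online lookup, assuming random-projection hash functions of the sign(W·x + b) form introduced in equation (16) below; the class name and the table/bit counts are illustrative, not the patent's implementation:

```python
import numpy as np

class LSHIndex:
    """Toy locality sensitive hash index: L tables, K projections each."""

    def __init__(self, dim, n_tables=4, n_bits=12, seed=0):
        rng = np.random.default_rng(seed)
        # one (K, dim) hyperplane matrix and one intercept vector per table
        self.W = rng.normal(size=(n_tables, n_bits, dim))
        self.b = rng.normal(size=(n_tables, n_bits))
        self.tables = [{} for _ in range(n_tables)]

    def _key(self, i, x):
        bits = (self.W[i] @ x + self.b[i]) > 0   # h(x) = sign(Wx + b)
        return bits.tobytes()                    # serves as the bucket number

    def add(self, idx, x):
        for i, table in enumerate(self.tables):
            table.setdefault(self._key(i, x), []).append(idx)

    def query(self, x, data, max_candidates):
        cand = []
        for i, table in enumerate(self.tables):  # union of the L buckets
            cand.extend(table.get(self._key(i, x), []))
        cand = list(dict.fromkeys(cand))[:max_candidates]   # e.g. first 2L
        # linear matching only within the retrieved candidates
        return sorted(cand, key=lambda j: np.linalg.norm(data[j] - x))

data = np.random.rand(1000, 64)
index = LSHIndex(dim=64)
for j, v in enumerate(data):
    index.add(j, v)
neighbors = index.query(data[0], data, max_candidates=8)
```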
The online lookup time of locality sensitive hashing consists of two parts: first, the time to compute the hash value (i.e. the bucket number) with the locality sensitive hash functions; second, the time to compare the query data with the data in the bucket. The lookup time of locality sensitive hashing is therefore sub-linear: by restricting matching to the data indexed in the bucket, the time consumption of the second part drops from O(N) to O(logN) or O(1), greatly reducing the amount of computation;
One key of locality sensitive hashing is mapping similar samples to the same bucket with high probability; the hash function h(·) of the locality sensitive hash satisfies the condition:

s{ h(f_Aq) = h(f_A) } = sim(f_Aq, f_A)    (14)

where sim(f_Aq, f_A) denotes the similarity between f_Aq and f_A, h(f_A) denotes the hash of f_A, and h(f_Aq) the hash of f_Aq; the similarity measure is directly associated with a distance function σ, for example:

[Equation (15), relating sim(·,·) to the distance function σ: rendered as an image in the original and not reproduced here]
A typical family of locality sensitive hash functions is given by random projection and thresholding, as shown in equation (16):

h(f_A) = sign(W·f_A + b)    (16)
where W is a random hyperplane vector and b is a random intercept.
The locality sensitive hashing consists of a preprocessing algorithm and a nearest neighbor search algorithm; through these two algorithms, the retrieval image features are represented as a fixed-length string of binary codes;
the preprocessing algorithm comprises the following processes:
Input: a set of extracted image features p and the number of hash tables l_1; each point p_j is mapped by a random hash function g(·) and stored in hash table T_i under the corresponding bucket number g_i(p_j); output: the hash tables T_i, i = 1, …, l_1;
The nearest neighbor search algorithm comprises the following processes:
Input: a retrieval image feature q; the hash tables T_i, i = 1, …, l_1, generated by the preprocessing algorithm are accessed and, given the number K of nearest neighbors, the K nearest neighbors of the retrieval point q in the data set S are returned;
If the data set is composed of n images {I_1, I_2, …, I_n}, the binary code corresponding to each image is H = {H_1, H_2, …, H_n}, H_i ∈ {0,1}^h; given a retrieval image I_q with binary code H_q, the images whose Hamming distance between H_q and H_i is less than the threshold T_H are put into the candidate pool P; these are the candidate images.
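For concreteness, the candidate-pool construction can be sketched as follows, assuming the binary codes are stored as rows of a NumPy array; the code length and threshold are illustrative:

```python
import numpy as np

def candidate_pool(H_q, H, T_H):
    """Indices of images whose binary code lies within Hamming distance
    T_H of the query code H_q -- the candidate pool P."""
    hamming = (H != H_q).sum(axis=1)       # per-image Hamming distance
    return np.flatnonzero(hamming < T_H)

H = np.random.randint(0, 2, size=(1000, 48))   # codes of n = 1000 images
H_q = np.random.randint(0, 2, size=48)         # code of the query image
P = candidate_pool(H_q, H, T_H=10)
```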
4) Constructing a locality sensitive re-ranking algorithm using the instance features. The traditional locality sensitive hashing algorithm mainly returns images that are close in distance, i.e. whose similarity to the retrieval image is close to 1. Mapping through the low-dimensional vehicle attribute hash codes retrieves vehicles of the same model, but vehicles of the same model remain hard to tell apart: differences that are obvious to human judgment cannot be effectively distinguished by the vehicle attribute hash code alone, as shown in fig. 4. In order to find the vehicles in the candidate pool that share individual characteristics with the retrieval picture, after the retrieval image is mapped into a bucket through its vehicle attribute hash code, the images in the bucket are re-ranked using the acquired image instance features so as to reduce the intra-class error; the re-ranking formula takes the following form:
[Equation (17), the re-ranking score: rendered as an image in the original and not reproduced here]

In equation (17), k denotes the k-th image in the bucket selected by the vehicle attribute hash code mapping; a penalty factor (shown only as an image in the original) weights the score, and cos denotes the cosine distance formula used to measure the image instance features. To exclude cases where the vehicle attribute hash code is mapped incorrectly, y indicates whether the hash codes of the pre-mapping retrieval image f_Aq and of the image in the bucket are equal: y = 1 if they are equal and 0 otherwise;
In the further ranking, the images whose Hamming distance between H_q and H_i is less than the threshold T_H have already been put into the candidate pool P; to obtain a more accurate retrieval result, a re-ranking method is further adopted on the basis of the candidate pool;
Re-ranking method: given the retrieval image I_q and the candidate pool P, the instance features are used to determine the top-k ranked images from the images in the candidate pool P, the degree of similarity between them being calculated with equation (17).
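A minimal sketch of this re-ranking step under stated assumptions: because equation (17) survives only as an image, the score is approximated here as the hash-agreement indicator y gating a cosine similarity over the instance features:

```python
import numpy as np

def rerank(f_B_query, f_B_pool, y, top_k):
    """Re-rank candidate-pool images by instance-feature similarity.

    f_B_query: (d,) instance feature of the retrieval image
    f_B_pool:  (n, d) instance features of the images in the pool
    y:         (n,) 0/1 flags, 1 where an image's attribute hash code
               equals the query's (guards against wrong mappings)
    """
    q = f_B_query / np.linalg.norm(f_B_query)
    X = f_B_pool / np.linalg.norm(f_B_pool, axis=1, keepdims=True)
    scores = y * (X @ q)                  # cosine similarity, gated by y
    return np.argsort(-scores)[:top_k]    # indices of the top-k images

order = rerank(np.random.rand(64), np.random.rand(20, 64),
               y=np.ones(20), top_k=5)
```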
Further, the re-ranking is evaluated with a ranking-based criterion: for a given retrieval image I_q and a similarity measure, a ranking is produced for each dataset image. The retrieval accuracy of a retrieval image I_q is expressed here by evaluating the top k ranked images, as given by equation (18):
Precision@k = ( Σ_{i=1..k} Rel(i) ) / k    (18)
where Rel(i) denotes the true relevance between the retrieval image I_q and the i-th ranked image, k denotes the number of ranked images, and Precision@k the retrieval precision; in computing the true relevance, only the class-label part is considered, Rel(i) ∈ {0,1}: Rel(i) = 1 if the retrieval image and the i-th ranked image share the same label, otherwise Rel(i) = 0; traversing the first k ranked images in the candidate pool P yields the retrieval precision;
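The evaluation in equation (18) can be computed as below; the labels and the value of k are illustrative:

```python
def precision_at_k(query_label, ranked_labels, k):
    """Precision@k per equation (18): the fraction of the top-k ranked
    images that share the query image's label (Rel(i) in {0, 1})."""
    rel = [1 if lab == query_label else 0 for lab in ranked_labels[:k]]
    return sum(rel) / k

print(precision_at_k("sedan", ["sedan", "suv", "sedan", "sedan"], k=4))  # 0.75
```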
5) When the retrieval image cannot be obtained, text retrieval is adopted as an auxiliary mode, so that retrieval features obtained from text and features obtained from the convolutional network can share one retrieval pipeline without additional training; assuming a text contains a vehicle description information marker, as shown in fig. 5, the method of acquiring features from text is as follows (an illustrative sketch follows step 5.4):
The initialization process is as follows: the text file is parsed into a term vector; stop words and repeated words are removed; the terms are checked to ensure the correctness of the parse;
5.1) extracting the randomly combined minimum word-segmentation vector R(r_1, r_2, …, r_n) from the input text O;
5.2) integrating R with the ordering of f_A and the vehicle attribute hash codes to obtain the text attribute feature f_ATxt (its defining expression is rendered as an image in the original); the dimension of f_ATxt is less than that of R;
5.3) using the locality sensitive re-ranking hash algorithm for retrieval;
5.4) returning the similar image group I;
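As a loose illustration of steps 5.1)–5.4), the sketch below maps descriptive text tokens onto attribute-code bits so that a text query can reuse the same locality sensitive re-ranking index; the token vocabulary and bit assignments are entirely hypothetical, since the patent does not spell out how R is integrated with f_A:

```python
# hypothetical vocabulary mapping descriptive tokens to attribute-code bits
ATTRIBUTE_BITS = {"sedan": [1, 0], "suv": [0, 1], "red": [1], "blue": [0]}

def text_to_attribute_code(text):
    """Build a pseudo attribute code f_ATxt from a vehicle description."""
    bits = []
    for token in text.lower().split():
        if token in ATTRIBUTE_BITS:               # keep only known terms
            bits.extend(ATTRIBUTE_BITS[token])
    return bits   # fed to the same locality sensitive re-ranking index

print(text_to_attribute_code("red sedan parked outside"))  # [1, 1, 0]
```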
the above description is only exemplary of the preferred embodiments of the present invention, and is not intended to limit the present invention, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (5)

1. A modified locality sensitive Hash vehicle retrieval method based on multitask deep learning is characterized by comprising the following steps:
1) constructing a multitask end-to-end convolutional neural network for deep learning, training and recognition, and deeply learning various attribute information of the vehicle, including vehicle type, vehicle series, vehicle logo, color and license plate, through the training data and a layer-by-layer progressive network structure;
2) constructing the vehicle attribute hash code by using the multitask convolutional neural network of step 1) with a segmented parallel learning and coding strategy;
3) constructing a feature pyramid module with a pyramid pooling layer and a vector compression layer, so as to accept convolution feature maps of different sizes as input and extract instance features of the vehicle;
4) constructing a locality sensitive re-ranking algorithm using the instance features obtained in step 3);
5) constructing a cross-modal retrieval method for the case where a retrieval vehicle image cannot be obtained, realizing vehicle retrieval;
The multitask end-to-end convolutional neural network for deep learning, training and recognition comprises a shared convolution module, a region-of-interest coordinate regression and identification module, and a multitask learning module;
A shared convolution module: the shared network consists of 5 convolution modules, where the last layers of conv2_x through conv5_x take {4², 8², 16², 16²} respectively as the output sizes of the feature maps, and conv1 contains only a single convolutional layer serving as the input layer;
A region-of-interest coordinate regression and identification module is connected after the shared convolution module; this module takes an image of arbitrary size as input and outputs a set of rectangular prediction boxes of the target region, comprising the position coordinates of each prediction box and the probability scores of the categories in the data set. In order to generate region proposal boxes, the input image first passes through the shared convolution layers to produce a feature map, on which multi-scale convolution operations are then performed, implemented as follows: at each sliding-window position, 3 scales and 3 aspect ratios are used, centered on the current sliding window, each corresponding to one scale and aspect ratio; mapping back onto the original image yields 9 candidate regions of different scales. With a shared convolution feature map of size w × h, w × h × 9 candidate regions are obtained in total; finally, the classification layer outputs the scores of the w × h × 9 × 2 candidate regions, i.e. the estimated probability that each region is target/non-target, and the regression layer outputs w × h × 9 × 4 parameters, i.e. the coordinate parameters of the candidate regions;
When the RPN is trained, each candidate region is assigned a binary label to mark whether the region is an object target, as follows: 1) a positive label is assigned to the candidate region with the highest IoU overlap with some real target region (GT); 2) positive labels are assigned to candidate regions whose IoU overlap with any GT bounding box is greater than 0.7, and negative labels to candidate regions whose IoU ratio with all GT bounding boxes is below 0.3; 3) the regions in between are discarded;
With these definitions, the objective function is minimized; the loss function for an image is defined as:

L({p_i}, {t_i}) = (1/N_cls)·Σ_i L_cls(p_i, p_i*) + λ·(1/N_reg)·Σ_i p_i*·L_reg(t_i, t_i*)    (1)

where i is the index of the i-th candidate region and p_i is the probability that the candidate region belongs to the i-th class; the ground-truth label p_i* is 1 if the label of the candidate region is positive and 0 if the label is 0; t_i is a vector representing the 4 parameterized coordinates of the predicted bounding box and t_i* is the coordinate vector of the corresponding GT bounding box; N_cls and N_reg are respectively the normalization coefficients of the classification loss function and the position regression loss function, and λ is the weight parameter between the two. The classification loss function L_cls is the log loss over the two classes, target and non-target:

L_cls(p_i, p_i*) = −log[ p_i*·p_i + (1 − p_i*)·(1 − p_i) ]    (2)

The position regression loss function L_reg is defined by:

L_reg(t_i, t_i*) = R(t_i − t_i*)    (3)

where R is the robust loss function smooth_L1:

smooth_L1(x) = 0.5·x², if |x| < 1;  |x| − 0.5, otherwise    (4)
However, training a multitask deep learning network is not easy to implement, because information at different task levels has different learning difficulties and convergence rates; the multitask joint training process is as follows: assuming the total number of tasks is T, the training data of the t-th task is recorded as {(x_i^t, y_i^t)}, where t ∈ (1, T), i ∈ (1, N), N is the total number of training samples, and x_i^t and y_i^t are respectively the feature vector and the label of the i-th sample; the multitask objective function is expressed as:

arg min_{w_t} Σ_{t=1..T} Σ_{i=1..N} L( f(x_i^t, w_t), y_i^t ) + φ(w_t)    (5)

where f(x_i^t, w_t) is the prediction from the input feature vector x_i^t and the weight parameter w_t, L(·) is a loss function, and φ(w_t) is the regularization value of the weight parameter;
For the loss function, the features of the last layer are trained using softmax paired with a log-likelihood cost function to realize image classification; the softmax loss function is defined as:

L_S = −Σ_{i=1..m} log( e^(W_{y_i}^T·x_i + b_{y_i}) / Σ_{j=1..n} e^(W_j^T·x_i + b_j) )    (6)

where x_i is the i-th depth feature, W_j is the j-th column of the weight matrix, b is a bias term, and m and n are respectively the number of processed samples and the number of categories;
The convolutional neural network training is a back-propagation process: through back propagation of an error function, the convolution parameters and biases are optimized and adjusted by stochastic gradient descent until the network converges or the maximum number of iterations is reached;
Back propagation compares the training samples with their labels; adopting a squared-error cost function for the multi-class recognition problem with c classes and N training samples, the final output error of the network is calculated by equation (7):

E_N = (1/2)·Σ_{n=1..N} Σ_{k=1..c} ( t_k^n − y_k^n )²    (7)

where E_N is the squared-error cost function, t_k^n is the k-th dimension of the label of the n-th sample, and y_k^n is the corresponding k-th output of the network prediction for the n-th sample;
When the error function is back-propagated, a computation similar to the traditional BP algorithm is adopted, in the concrete form of equation (8):

δ^l = (W^{l+1})^T·δ^{l+1} × f′(u^l),  where u^l = W^l·x^{l−1} + b^l    (8)

where δ^l denotes the error term of the current layer, δ^{l+1} the error term of the layer above it, W^{l+1} the mapping matrix of that layer, f′ the derivative of the activation function (involving upsampling where the layer above is a pooling layer), u^l the output of the layer before the activation function is applied, x^{l−1} the input to this layer (the output of the previous layer), and W^l the mapping matrix of this layer.
2. The revised locality-sensitive hashing vehicle retrieval method based on multitask deep learning as claimed in claim 1, wherein: the tasks are related to one another in the multitask learning process, i.e. information is shared between tasks;
when a plurality of tasks are trained simultaneously, the network uses the information shared among the tasks to strengthen the inductive bias of the system and the generalization capability of the classifier; the multitask network is divided into five subtasks by adding five fully connected layers behind the region-of-interest module; a softmax activation function connected after each fully connected layer normalizes the output to a value in [0, 1], which is then sent into a segmentation function to produce the binary codes; the segmented learning and coding strategy reduces the redundancy among the hash codes and thereby enhances the robustness of the learned features;
The multitask learning network is divided into T tasks, each task containing c_t classes, and the one-dimensional vector output of each task's fully connected layer is denoted m_t; the output of the fully connected layer is first normalized to [0, 1] using the softmax activation function, in the concrete form:

[Equation (9), the softmax normalization: rendered as an image in the original and not reproduced here]

where θ represents a random hyperplane; the normalized value is then sent into a threshold segmentation function for binarization to obtain the binary output of the fully connected layer, in the concrete form:

[Equation (10), the threshold segmentation function: rendered as an image in the original and not reproduced here]
Finally, in order to obtain the vehicle attribute hash code learned in segments and in parallel by the multitask convolutional network, the vectors H_t obtained from equation (10) are fused again in a certain proportion into the vector f_A, in the concrete form:

f_A = [α_1·H_1; α_2·H_2; …; α_t·H_t]    (11)

where α_t in formula (11) takes the concrete form:

[Equation (12), defining α_t: rendered as an image in the original and not reproduced here]

Multiplying each H_t in advance by the penalty factor α_t compensates for the error caused by the different numbers of classes across tasks.
3. The revised locality-sensitive hashing vehicle retrieval method based on multitask deep learning as claimed in claim 2, wherein: a feature pyramid with strong semantics at all scales is created by exploiting the pyramidal shape of the convolutional feature hierarchy; to achieve this, low-resolution, semantically strong features are combined with high-resolution, semantically weak features through a structure of top-down pathways and lateral connections; the feature pyramid has rich semantics at all levels, can be constructed quickly from a single input image scale, and can replace a featurized image pyramid without sacrificing representational power, speed, or memory; in order to obtain instance features of the vehicle image and accommodate convolutional feature maps of arbitrary size as input, the last layer of each unit of the shared modules conv2_x through conv5_x is selected and combined with the output of the region-of-interest module, and a pyramid pooling layer and a vector compression layer are then added to compress the three-dimensional features into a one-dimensional feature vector, which enriches the feature information obtained by the feature pyramid, the deepest layer of each stage having the strongest feature representation;
With the last layer of each module as input to the feature pyramid, the last layers of the networks conv2_x through conv5_x defined above are assigned, in turn, the sizes {4², 8², 16², 16²} as the input feature map sizes of the feature pyramid; the input image is denoted I, its height and width by the letters h and w, and the shared convolution module of the x-th stage by convx_x; after input, the image is activated into a three-dimensional feature vector T of dimension h′ × w′ × d, i.e. a set of two-dimensional feature maps of height and width h′ × w′; T contains d such maps, written as the set S = {S_n}, n ∈ (1, d), where S_n corresponds to the feature map of the n-th channel; T is then fed into the feature pyramid and convolved with kernels of several scales to obtain a three-dimensional feature vector T′ of dimension l × l × d, likewise comprising a set of two-dimensional feature maps S′ = {S′_n}, n ∈ (1, d), where S′_n corresponds to the n-th channel feature map, each map being of size l × l with d maps in total; then a k × k sliding window with max pooling is used to regress the feature maps into a set of maps of size l/k × l/k, the S′_n of each channel is fused into a one-dimensional vector, the same operation is performed on the d channels in turn, and the instance feature vector f_B of size (1, l/k × d) is finally obtained; the final retrieval feature vector f is shown in equation (13):

f = [f_A; f_B]    (13).
4. The revised locality-sensitive hashing vehicle retrieval method based on multitask deep learning as claimed in claim 3, wherein: fast image comparison is performed on the compact binary codes using a hashing method and the Hamming distance; the hashing method adopts a locality sensitive hashing algorithm, i.e. the hash bits are constructed by random projection transformation;
One key of locality sensitive hashing is mapping similar samples to the same bucket with high probability; the hash function h(·) of the locality sensitive hash satisfies the condition:

s{ h(f_Aq) = h(f_A) } = sim(f_Aq, f_A)    (14)

where f_Aq denotes the pre-mapping retrieval image feature, sim(f_Aq, f_A) denotes the similarity between f_Aq and f_A, h(f_A) denotes the hash of f_A, and h(f_Aq) the hash of f_Aq; the similarity measure is directly associated with a distance function σ, for example:

[Equation (15), relating sim(·,·) to the distance function σ: rendered as an image in the original and not reproduced here]
A typical family of locality sensitive hash functions is given by random projection and thresholding, as shown in equation (16):

h(f_A) = sign(W·f_A + b)    (16)
where W is a random hyperplane vector and b is a random intercept.
5. The revised locality-sensitive hashing vehicle retrieval method based on multitask deep learning as claimed in claim 4, wherein: the locality sensitive hashing consists of a preprocessing algorithm and a nearest neighbor search algorithm; through these two algorithms, the retrieval image features are represented as a fixed-length string of binary codes;
the preprocessing algorithm comprises the following processes:
Input: a set of extracted image features p and the number of hash tables l_1; each point p_j is mapped by a random hash function g(·) and stored in hash table T_i under the corresponding bucket number g_i(p_j); output: the hash tables T_i, i = 1, …, l_1;
The nearest neighbor search algorithm comprises the following processes:
Input: a retrieval image feature q; the hash tables T_i, i = 1, …, l_1, generated by the preprocessing algorithm are accessed and, given the number K of nearest neighbors, the K nearest neighbors of the retrieval point q in the data set S are returned;
If the data set is composed of n images {I_1, I_2, …, I_n}, the binary code corresponding to each image is H = {H_1, H_2, …, H_n}, H_i ∈ {0,1}^h; given a retrieval image I_q with binary code H_q, the images whose Hamming distance between H_q and H_i is less than the threshold T_H are put into the candidate pool P; these are the candidate images.
CN201711135951.XA 2017-11-16 2017-11-16 Method for correcting locality sensitive Hash vehicle retrieval based on multitask deep learning Active CN108108657B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711135951.XA CN108108657B (en) 2017-11-16 2017-11-16 Method for correcting locality sensitive Hash vehicle retrieval based on multitask deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711135951.XA CN108108657B (en) 2017-11-16 2017-11-16 Method for correcting locality sensitive Hash vehicle retrieval based on multitask deep learning

Publications (2)

Publication Number Publication Date
CN108108657A CN108108657A (en) 2018-06-01
CN108108657B true CN108108657B (en) 2020-10-30

Family

ID=62206830

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711135951.XA Active CN108108657B (en) 2017-11-16 2017-11-16 Method for correcting locality sensitive Hash vehicle retrieval based on multitask deep learning

Country Status (1)

Country Link
CN (1) CN108108657B (en)

Families Citing this family (68)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11341631B2 (en) 2017-08-09 2022-05-24 Shenzhen Keya Medical Technology Corporation System and method for automatically detecting a physiological condition from a medical image of a patient
CN110443266B (en) * 2018-05-04 2022-06-24 上海商汤智能科技有限公司 Object prediction method and device, electronic equipment and storage medium
CN108791302B (en) * 2018-06-25 2020-05-19 大连大学 Driver behavior modeling system
CN108791308B (en) * 2018-06-25 2020-05-19 大连大学 System for constructing driving strategy based on driving environment
CN108819948B (en) * 2018-06-25 2020-05-19 大连大学 Driver behavior modeling method based on reverse reinforcement learning
CN108891421B (en) * 2018-06-25 2020-05-19 大连大学 Method for constructing driving strategy
CN108944940B (en) * 2018-06-25 2020-05-19 大连大学 Driver behavior modeling method based on neural network
CN109086866B (en) * 2018-07-02 2021-07-30 重庆大学 Partial binary convolution method suitable for embedded equipment
CN109308495B (en) * 2018-07-05 2021-07-02 科亚医疗科技股份有限公司 Apparatus and system for automatically predicting physiological condition from medical image of patient
US10846554B2 (en) 2018-07-17 2020-11-24 Avigilon Corporation Hash-based appearance search
EP3807782A4 (en) * 2018-07-17 2022-03-23 Avigilon Corporation Hash-based appearance search
CN109165306B (en) * 2018-08-09 2021-11-23 长沙理工大学 Image retrieval method based on multitask Hash learning
CN109144648B (en) * 2018-08-21 2020-06-23 第四范式(北京)技术有限公司 Method and system for uniformly performing feature extraction
CN109241322B (en) * 2018-08-28 2020-09-11 北京地平线机器人技术研发有限公司 Code generation method, code generation device and electronic equipment
CN109242019B (en) * 2018-09-01 2022-05-17 哈尔滨工程大学 Rapid detection and tracking method for optical small target on water surface
CN110879846A (en) * 2018-09-05 2020-03-13 深圳云天励飞技术有限公司 Image retrieval method and device, electronic equipment and computer-readable storage medium
CN109299097B (en) * 2018-09-27 2022-06-21 宁波大学 Online high-dimensional data nearest neighbor query method based on Hash learning
CN109583305B (en) * 2018-10-30 2022-05-20 南昌大学 Advanced vehicle re-identification method based on key component identification and fine-grained classification
CN109614512B (en) * 2018-11-29 2022-02-22 亿嘉和科技股份有限公司 Deep learning-based power equipment retrieval method
CN111325061B (en) * 2018-12-14 2023-05-23 顺丰科技有限公司 Vehicle detection algorithm, device and storage medium based on deep learning
CN110019652B (en) * 2019-03-14 2022-06-03 九江学院 Cross-modal Hash retrieval method based on deep learning
CN110059634B (en) * 2019-04-19 2023-04-18 山东博昂信息科技有限公司 Large-scene face snapshot method
CN110110325B (en) * 2019-04-22 2022-12-20 北京明智和术科技有限公司 Repeated case searching method and device and computer readable storage medium
CN110059967B (en) * 2019-04-23 2021-02-23 北京相数科技有限公司 Data processing method and device applied to city aid decision analysis
CN110135470A (en) * 2019-04-24 2019-08-16 电子科技大学 A kind of vehicle characteristics emerging system based on multi-modal vehicle feature recognition
CN110135419B (en) * 2019-05-06 2023-04-28 南京大学 Method for recognizing end-to-end text in natural scene
CN110059771B (en) * 2019-05-10 2021-01-15 合肥工业大学 Interactive vehicle data classification method under ordering support
CN110189394B (en) * 2019-05-14 2020-12-29 北京字节跳动网络技术有限公司 Mouth shape generation method and device and electronic equipment
CN110211109B (en) * 2019-05-30 2022-12-06 西安电子科技大学 Image change detection method based on deep neural network structure optimization
CN110309888A (en) * 2019-07-11 2019-10-08 南京邮电大学 A kind of image classification method and system based on layering multi-task learning
CN110362543B (en) * 2019-07-22 2022-03-29 武汉上善仿真科技有限责任公司 File name specifying system of automobile body technical solution and using method
CN110427509A (en) * 2019-08-05 2019-11-08 山东浪潮人工智能研究院有限公司 A kind of multi-scale feature fusion image Hash search method and system based on deep learning
CN110532904B (en) * 2019-08-13 2022-08-05 桂林电子科技大学 Vehicle identification method
CN110580503A (en) * 2019-08-22 2019-12-17 江苏和正特种装备有限公司 AI-based double-spectrum target automatic identification method
CN110516640B (en) * 2019-08-30 2022-09-30 华侨大学 Vehicle re-identification method based on feature pyramid joint representation
CN110543600A (en) * 2019-09-11 2019-12-06 上海携程国际旅行社有限公司 Search ranking method, system, device and storage medium based on neural network
CN110738248B (en) * 2019-09-30 2022-09-27 朔黄铁路发展有限责任公司 State perception data feature extraction method and device and system performance evaluation method
CN110751122A (en) * 2019-10-28 2020-02-04 中国电子科技集团公司第二十八研究所 License plate classification and identification method based on Gabor characteristic self-encoder
CN111046166B (en) * 2019-12-10 2022-10-11 中山大学 Semi-implicit multi-modal recommendation method based on similarity correction
CN111460200B (en) * 2020-03-04 2023-07-04 西北大学 Image retrieval method and model based on multitask deep learning and construction method thereof
CN111523403B (en) * 2020-04-03 2023-10-20 咪咕文化科技有限公司 Method and device for acquiring target area in picture and computer readable storage medium
CN111581471B (en) * 2020-05-09 2023-11-10 北京京东振世信息技术有限公司 Regional vehicle checking method, device, server and medium
CN111666898B (en) * 2020-06-09 2021-10-26 北京字节跳动网络技术有限公司 Method and device for identifying class to which vehicle belongs
CN111881312B (en) * 2020-07-24 2022-07-05 成都成信高科信息技术有限公司 Image data set classification and division method
CN111814023B (en) * 2020-07-30 2021-06-15 广州威尔森信息科技有限公司 Automobile model network price monitoring system
CN111814751A (en) * 2020-08-14 2020-10-23 深延科技(北京)有限公司 Vehicle attribute analysis method and system based on deep learning target detection and image recognition
CN112446431B (en) * 2020-11-27 2024-08-27 鹏城实验室 Feature point extraction and matching method, network, equipment and computer storage medium
CN112507862B (en) * 2020-12-04 2023-05-26 东风汽车集团有限公司 Vehicle orientation detection method and system based on multitasking convolutional neural network
CN112686125A (en) * 2020-12-25 2021-04-20 浙江大华技术股份有限公司 Vehicle type determination method and device, storage medium and electronic device
CN112699402B (en) * 2020-12-28 2022-06-17 广西师范大学 Wearable device activity prediction method based on federal personalized random forest
CN112699953B (en) * 2021-01-07 2024-03-19 北京大学 Feature pyramid neural network architecture searching method based on multi-information path aggregation
CN112906804B (en) * 2021-03-02 2023-12-19 华南理工大学 Hash sample balance cancer labeling method for histopathological image
CN113076962B (en) * 2021-05-14 2022-10-21 电子科技大学 Multi-scale target detection method based on micro neural network search technology
CN113378972B (en) * 2021-06-28 2024-03-22 成都恒创新星科技有限公司 License plate recognition method and system under complex scene
CN113377981B (en) * 2021-06-29 2022-05-27 山东建筑大学 Large-scale logistics commodity image retrieval method based on multitask deep hash learning
CN113470001B (en) * 2021-07-22 2024-01-09 西北工业大学 Target searching method for infrared image
CN115102982B (en) * 2021-11-19 2023-06-23 北京邮电大学 Semantic communication method for intelligent task
CN114297582A (en) * 2021-12-28 2022-04-08 浙江大学 Modeling method of discrete counting data based on multi-probe locality sensitive Hash negative binomial regression model
CN114912629A (en) * 2022-03-08 2022-08-16 北京百度网讯科技有限公司 Joint perception model training method, joint perception device, joint perception equipment and medium
CN114648107A (en) * 2022-03-10 2022-06-21 北京宏景智驾科技有限公司 Method and circuit for improving efficiency of calculation of neural network input image point cloud convolution layer
CN114911965A (en) * 2022-04-19 2022-08-16 超级视线科技有限公司 Vehicle information query method and system
CN114972761B (en) * 2022-06-20 2024-05-07 平安科技(深圳)有限公司 Vehicle part segmentation method based on artificial intelligence and related equipment
CN115357747B (en) * 2022-10-18 2024-03-26 山东建筑大学 Image retrieval method and system based on ordinal hash
CN116108217B (en) * 2022-10-27 2023-12-19 浙江大学 Fee evasion vehicle similar picture retrieval method based on depth hash coding and multitask prediction
CN115994537B (en) * 2023-01-09 2023-06-20 杭州实在智能科技有限公司 Multitask learning method and system for solving entity overlapping and entity nesting
CN117171382B (en) * 2023-07-28 2024-05-03 宁波善德电子集团有限公司 Vehicle video retrieval method based on comprehensive features and natural language
CN116994073B (en) * 2023-09-27 2024-01-26 江西师范大学 Graph contrast learning method and device for self-adaptive positive and negative sample generation
CN118585721A (en) * 2024-08-03 2024-09-03 凯泰铭科技(北京)有限公司 HTTP request and response monitoring method and system based on browser extension

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105808732A (en) * 2016-03-10 2016-07-27 北京大学 Integration target attribute identification and precise retrieval method based on depth measurement learning
CN106227851A (en) * 2016-07-29 2016-12-14 汤平 Based on the image search method searched for by depth of seam division that degree of depth convolutional neural networks is end-to-end
CN106776856A (en) * 2016-11-29 2017-05-31 江南大学 A kind of vehicle image search method of Fusion of Color feature and words tree
CN106886573A (en) * 2017-01-19 2017-06-23 博康智能信息技术有限公司 A kind of image search method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107025642B (en) * 2016-01-27 2018-06-22 百度在线网络技术(北京)有限公司 Vehicle's contour detection method and device based on point cloud data


Also Published As

Publication number Publication date
CN108108657A (en) 2018-06-01

Similar Documents

Publication Publication Date Title
CN108108657B (en) Method for correcting locality sensitive Hash vehicle retrieval based on multitask deep learning
CN107679250B (en) Multi-task layered image retrieval method based on deep self-coding convolutional neural network
CN111259786B (en) Pedestrian re-identification method based on synchronous enhancement of appearance and motion information of video
CN106227851B (en) The image search method of depth of seam division search based on depth convolutional neural networks
CN110717534B (en) Target classification and positioning method based on network supervision
CN106250812B (en) A kind of model recognizing method based on quick R-CNN deep neural network
CN106096561B (en) Infrared pedestrian detection method based on image block deep learning features
CN107330451B (en) Clothing attribute retrieval method based on deep convolutional neural network
CN111709311B (en) Pedestrian re-identification method based on multi-scale convolution feature fusion
CN109558823B (en) Vehicle identification method and system for searching images by images
CN108595636A (en) The image search method of cartographical sketching based on depth cross-module state correlation study
CN112633382B (en) Method and system for classifying few sample images based on mutual neighbor
CN112418117A (en) Small target detection method based on unmanned aerial vehicle image
CN110689081A (en) Weak supervision target classification and positioning method based on bifurcation learning
CN111783831A (en) Complex image accurate classification method based on multi-source multi-label shared subspace learning
Tian et al. Small object detection via dual inspection mechanism for UAV visual images
CN107688830B (en) Generation method of vision information correlation layer for case serial-parallel
CN103714148B (en) SAR image search method based on sparse coding classification
CN114255403A (en) Optical remote sensing image data processing method and system based on deep learning
CN111325237B (en) Image recognition method based on attention interaction mechanism
CN114694178A (en) Method and system for monitoring safety helmet in power operation based on fast-RCNN algorithm
Wang et al. A Convolutional Neural Network‐Based Classification and Decision‐Making Model for Visible Defect Identification of High‐Speed Train Images
CN116469020A (en) Unmanned aerial vehicle image target detection method based on multiscale and Gaussian Wasserstein distance
CN114038007A (en) Pedestrian re-recognition method combining style transformation and attitude generation
CN114627424A (en) Gait recognition method and system based on visual angle transformation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant