CN107085607B - Image feature point matching method - Google Patents


Info

Publication number
CN107085607B
CN107085607B (application CN201710258205.3A)
Authority
CN
China
Prior art keywords
vector
clustering
matched
feature vector
feature
Prior art date
Legal status
Active
Application number
CN201710258205.3A
Other languages
Chinese (zh)
Other versions
CN107085607A (en)
Inventor
段翰聪
赵子天
谭春强
文慧
闵革勇
陈超
李博洋
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201710258205.3A
Publication of CN107085607A
Application granted
Publication of CN107085607B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval of still image data
    • G06F 16/51 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval of still image data
    • G06F 16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/583 Retrieval characterised by using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G06F 18/232 Non-hierarchical techniques
    • G06F 18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 Non-hierarchical techniques with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Library & Information Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an image feature point matching method comprising the following steps. Extracting feature points of the warehousing picture: extract features from the picture to be warehoused, form a warehousing feature vector, and reduce its dimensionality. Vector storage: divide the dimension-reduced warehousing feature vector into parts, perform product quantization on each part and then vector quantization to form a product quantizer and a vector quantizer, and build a retrieval tree and hash tables. Extracting feature points of the picture to be matched: extract features from the picture to be matched, form a feature vector to be matched, and reduce its dimensionality. Vector matching: divide the dimension-reduced feature vector to be matched; using the product quantizer and the vector quantizer, find the several cluster centers closest to the feature vector to be matched; find the pictures corresponding to those cluster centers through the retrieval tree and hash tables to form a candidate set; and compute, using floating-point vectors, the picture in the candidate set closest to the feature vector to be matched. The method is fast and highly precise.

Description

Image feature point matching method
Technical Field
The invention relates to the technical field of picture searching, in particular to an image feature point matching method.
Background
In the field of image search, feature matching is a critical link: the matching efficiency and accuracy of the features determine the final search speed and precision. Existing picture search proceeds in the following steps. First, a conversion matrix is trained from a large amount of sample data; vectors are converted to binary codes through a hash function, the binary codes are segmented to generate several hash tables, and each segmented binary code is used directly as a hash table entry. Second, when a query vector arrives, it is converted to a binary code in the same way and mapped to the corresponding hash table entry and the other entries within distance r; all pictures in those entries form the candidate set. Third, a full Hamming distance computation is performed between the query vector and all picture feature vectors in the candidate set, and the results are re-ranked by distance. Converting a floating-point feature vector into a binary code loses precision because of the hash function, and the final re-ranking still uses Hamming distance on binary codes: although this is fast, the recall rate drops to some extent because binary codes represent vectors less precisely than floating-point vectors.
With the rapid development of the internet, the number of pictures online has reached the billion level or even higher. Existing feature point matching methods can no longer keep up with such fast-growing picture libraries. How to search a massive picture library with both high efficiency and high precision has become a hot topic.
Disclosure of Invention
To solve the above technical problems, the present invention provides an image feature point matching method with high matching precision and high speed.
The invention is realized by the following technical scheme:
an image feature point matching method comprises the following steps,
extracting characteristic points of the warehousing picture: extracting the features of the warehousing image, forming a warehousing feature vector, and reducing the dimension of the warehousing feature vector;
vector storage: dividing the dimension-reduced warehouse-in characteristic vector, performing product quantization on each divided part, and then performing vector quantization to form a product quantizer and a vector quantizer, and establishing a retrieval tree and a hash table;
extracting characteristic points of the picture to be matched: extracting the features of the image to be matched, forming a feature vector to be matched, and reducing the dimension of the feature vector;
vector matching: dividing the feature vector to be matched after dimension reduction; using the product quantizer and the vector quantizer, finding the several cluster centers closest to the feature vector to be matched; finding the pictures corresponding to those cluster centers through the retrieval tree and hash tables to form a candidate set; and computing, using floating-point vectors, the picture in the candidate set closest to the feature vector to be matched.
The method of this scheme does not use an iterative quantization algorithm to compute binary codes. It builds the retrieval tree and hash tables by dimension-reduced clustering, and in the first-layer clustering the data are partitioned rather than clustered as a whole, so the work can be accelerated with multi-threaded parallel processing, greatly reducing quantizer training time. Matching and retrieval proceed in two stages: the first stage selects a candidate set, and the second stage computes distances over complete floating-point vectors. Provided the candidate set is sufficiently large, floating-point distance computation keeps the recall rate within 1 percentage point of brute-force matching, and the ordering is more accurate than Hamming distance.
If there are N records in the database, brute-force matching requires N distance computations, whereas with this scheme the candidate set contains N/100 to N/10 records depending on the chosen parameters, greatly reducing computation and improving matching speed. When the retrieval tree is built, the data for first-layer clustering are divided into several parts whose clustering processes are completely independent, so multi-threading can be used to speed up clustering.
Preferably, the vector storage method comprises the following steps:
partitioning the warehousing feature vector into P disjoint sections;
performing k-means clustering inside each part with k1 cluster centers;
for each cluster center, performing vector quantization on all data assigned to that cluster center, with k2 cluster centers;
And respectively recording the IDs of all the characteristics mapped to the corresponding clustering centers or the names of the corresponding pictures by using the P hash tables.
Further, the specific method of vector matching is as follows:
dividing the feature vector to be matched into P disjoint parts;
within each part, computing the distance between the feature vector to be matched and each of the k1 cluster centers, and selecting the W cluster centers with the smallest distance;
for each of the selected W cluster centers, computing distances one by one between the feature vector to be matched and the k2 second-layer cluster centers corresponding to that cluster center, obtaining k2 distances;
sorting the W × k2 distances and taking the m smallest distances, where m is a natural number greater than 1;
taking out the cluster centers corresponding to the m distances, finding the corresponding hash table entries, and forming a candidate set from the picture names or IDs in those entries;
and (4) calculating the distances between the picture characteristic vectors corresponding to the picture IDs in the candidate set and the characteristic vectors to be matched one by adopting floating point vectors, and finally obtaining the target with the minimum distance.
Further, the k-means clustering adopts a parallel processing mode.
Preferably, the dimensionality of the features is reduced using a principal component analysis method.
Compared with the prior art, the invention has the following advantages and beneficial effects:
the invention adopts dimension-reducing clustering to construct a retrieval tree and a hash table, and when the first-level clustering is performed, the retrieval tree and the hash table are segmented, data of each part are completely independent, and the method can be accelerated by adopting a multi-thread parallel processing mode, so that the training time of a quantizer is greatly reduced; in the process of matching retrieval, a candidate set is selected, then the whole floating point vector is used for distance calculation, under the premise that the range of the candidate set is large, the floating point distance calculation is carried out, the difference between the recall rate of a retrieval result and the violence matching is small and does not exceed 1 percentage point, and the retrieval precision is high and efficient.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not used as limitations of the present invention.
Example 1
An image feature point matching method comprises the following steps,
extracting characteristic points of the warehousing picture: extracting the features of the warehousing image, forming a warehousing feature vector, and reducing the dimension of the warehousing feature vector;
vector storage: dividing the dimension-reduced warehouse-in characteristic vector, performing product quantization on each divided part, and then performing vector quantization to form a product quantizer and a vector quantizer, and establishing a retrieval tree and a hash table;
extracting characteristic points of the picture to be matched: extracting the features of the image to be matched, forming a feature vector to be matched, and reducing the dimension of the feature vector;
vector matching: dividing the feature vector to be matched after dimension reduction; using the product quantizer and the vector quantizer, finding the several cluster centers closest to the feature vector to be matched; finding the pictures corresponding to those cluster centers through the retrieval tree and hash tables to form a candidate set; and computing, using floating-point vectors, the picture in the candidate set closest to the feature vector to be matched.
With this method, the retrieval process considers not only the exactly-hit hash entry but also several nearby entries, which increases the probability that the picture most similar to the query is in the candidate set and thus improves the recall rate. During re-ranking, distances are computed with the original floating-point feature vectors, which retain all of the original information; quantizing them into binary feature codes would lose some precision.
Example 2
Based on the idea of the above embodiment, the present embodiment refines each step.
For both the pictures stored into the retrieval tree and the pictures to be matched, feature points must be extracted. There are many extraction methods, such as convolutional neural networks. The dimension n of the output feature vector is a relatively large value; it may be 128, 256, 512, and so on. A large feature dimension increases the computation of the matching process, so the feature vector must be reduced in dimension. Principal component analysis can be used to reduce the output feature vector to d dimensions, where d ≤ n; d may be 128 or 64. Dimension reduction not only removes the influence of noise but also reduces the amount and time of computation.
The specific process of reducing the dimension is as follows:
assuming S pieces of data, in the original space, S pieces of n-dimensional eigenvectors can be represented by a matrix M ═ D1,D2,…DnDenotes wherein DnIs the column vector of S x 1. First, the covariance of the matrix M is determined to obtain a matrix var (M), where var (M) is MTM is 1/n, then the prior method is utilized to solve n eigenvalues and corresponding eigenvectors of the covariance matrix Var (M), the largest d eigenvalues and the corresponding vectors are selected as the matrix R for dimension reduction, the MR is calculated to obtain an L matrix for d, and the dimension reduction process is completed.
Extracting the features of the pictures to be warehoused and reducing their dimension prepares for building the retrieval tree during vector storage. The specific method of vector storage is as follows:
the method comprises the steps of dividing a warehousing feature vector into P disjoint parts, for example, dividing data with 128 dimensions of d into 4 parts, taking the 1 st floating point number to the 32 th floating point number of the feature vector as a first part, the 33 th floating point number to the 64 th floating point number as a second part, the 65 th floating point number to the 96 th floating point number as a third part, and the 97 th floating point number to the 128 th floating point number as a fourth part.
K-means clustering with k1 cluster centers is performed inside each part; this is the first-layer quantization. The specific process is illustrated using the first part as an example:
1-1. For the S features of the first part (each 32 floating-point numbers), first select k1 of the S features as cluster centers;
1-2. Compute the Euclidean distance between each of the S feature data and each of the k1 cluster centers, and assign each feature to the cluster center it is closest to;
1-3. For each cluster center, average the corresponding floating-point values of the feature data assigned to it in the previous step, and take the resulting mean vectors as the new k1 cluster centers;
1-4. If the number of clustering iterations is reached or the clustering error falls within a certain range, terminate clustering; the k1 cluster centers obtained in step 1-3 are the result. Otherwise, return to step 1-2;
1-5. Record the values of the k1 cluster centers; they are needed for retrieval.
On the basis of the first-layer quantization, vector quantization is performed for each cluster center on all data assigned to it, with k2 cluster centers; this is the second-layer quantization, which proceeds as follows:
2-1. For each cluster center of the first-layer quantization, let Si be the number of features assigned to it (the Si sum to S). From these Si features, select k2 as cluster centers;
2-2. Compute the Euclidean distance between each of the Si feature data and each of the k2 cluster centers, and assign each feature to the cluster center it is closest to;
2-3. For each cluster center, average the corresponding floating-point values of the feature data assigned to it in the previous step, and take the resulting mean vectors as the new k2 cluster centers;
2-4. If the number of clustering iterations is reached or the clustering error falls within a certain range, terminate clustering; the k2 cluster centers obtained in step 2-3 are the result. Otherwise, return to step 2-2;
2-5. Record the values of the k2 cluster centers; they are needed for retrieval.
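The two-layer quantization above (first-layer k-means over one part, then k-means again inside each resulting cluster) can be sketched as follows. The `kmeans` helper is a plain Lloyd's-iteration sketch, and the sizes (S = 500 features of 32 floats, k1 = k2 = 4) are illustrative, not the patent's parameters:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means, as in steps 1-1 to 1-5 / 2-1 to 2-5."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]  # pick k features as centers
    for _ in range(iters):
        # assign each feature to its nearest center (Euclidean distance)
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        # recompute each center as the mean of its assigned features
        for i in range(k):
            if (labels == i).any():
                centers[i] = X[labels == i].mean(axis=0)
    return centers, labels

rng = np.random.default_rng(1)
part = rng.standard_normal((500, 32))           # one 32-dim part of S = 500 features
k1, k2 = 4, 4
c1, lab1 = kmeans(part, k1)                     # first-layer quantization (PQ)
second_layer = [kmeans(part[lab1 == i], k2)[0]  # second-layer quantization (VQ)
                for i in range(k1)]
print(len(second_layer), second_layer[0].shape)
```

Because each of the P parts is clustered independently, in practice each call over a part could run in its own thread or process, as the text notes.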
Use P hash tables to respectively record the IDs of all features mapped to each cluster center, or the names of the corresponding pictures; the hash code of each hash table is log2(k1) + log2(k2) bits long. For example:
1. Suppose k1 = 16 and k2 = 16. The hash code is then 4 + 4 = 8 bits: the first 4 bits identify one of the 16 first-layer cluster centers, and the last 4 bits identify one of the 16 second-layer cluster centers under that first-layer center.
2. For each feature data mapped to a second-layer cluster center, add the corresponding picture name or ID to the hash entry of that cluster center's hash code. The hash here is merely an encoding of the cluster centers; the hash table is a data structure for storage.
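The 4 + 4-bit hash code in the example above can be sketched as follows; the helper name `hash_code`, the index values, and the picture name are hypothetical:

```python
import math
from collections import defaultdict

k1, k2 = 16, 16
BITS2 = int(math.log2(k2))           # 4 bits for the second-layer index

def hash_code(i1, i2):
    """First-layer center index i1 in the high bits, second-layer index i2 in the low bits."""
    return (i1 << BITS2) | i2        # log2(k1) + log2(k2) = 8-bit code

table = defaultdict(list)            # one of the P hash tables: code -> picture IDs
table[hash_code(3, 5)].append("img_0042.jpg")

print(hash_code(3, 5))               # 3*16 + 5 = 53
print(table[53])                     # ['img_0042.jpg']
```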
The essence of vector storage is to lay a foundation for vector matching, and after the retrieval tree and the hash table are built, the vector matching can be carried out. If the picture to be matched is input, the vector matching step is started after the characteristic points of the picture to be matched are extracted according to the method, and the specific method for vector matching comprises the following steps:
dividing the feature vector to be matched into P disjoint parts; the dividing method is the same as the dividing method in warehousing.
Within each part, compute the distances between the feature vector to be matched and the k1 cluster centers, sort them, and select the W cluster centers with the smallest distance. Taking P = 4, k1 = 16, k2 = 16, and W = 4, the first part of the query vector is used as an example:
and (3) calculating Euclidean distances between the 16 cluster centers of the first part in storage by using the first part of the vector to be inquired, namely 1-32 floating point vectors and the 16 cluster centers. And sorting the 16 distances, and selecting the smallest W-4 clustering centers.
For each of the selected W cluster centers, compute distances one by one between the feature vector to be matched and the k2 second-layer cluster centers corresponding to that cluster center, obtaining k2 distances. Specifically, compute Euclidean distances between the 32 floating-point values of the query vector part and the 16 second-layer cluster centers within each selected cluster.
Place the W × k2 distances one by one into a max-heap ("large top heap") that retains only the m smallest distances, where m is a natural number greater than 1. Taking the first m distances rather than just one safeguards the recall rate.
Take out the m cluster centers corresponding to the distances in the max-heap and find the corresponding hash table entries; the picture names or IDs in those entries form the candidate set. Each of these m cluster centers was encoded with a hash code at storage time, and each hash code corresponds to a unique hash table entry, so the union of the IDs in the m entries yields the candidate set.
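The max-heap that keeps only the m smallest of the W × k2 distances can be sketched with Python's `heapq`, which provides a min-heap, by negating the distances; the values below are illustrative:

```python
import heapq

def m_smallest(distances, m):
    """Stream distances through a max-heap of size m; the heap root is always
    the largest of the m smallest seen so far, so bigger values are rejected in O(log m)."""
    heap = []  # stores negated distances, so heap[0] is -(current largest)
    for d in distances:
        if len(heap) < m:
            heapq.heappush(heap, -d)
        elif -heap[0] > d:            # d beats the worst of the current m
            heapq.heapreplace(heap, -d)
    return sorted(-x for x in heap)

print(m_smallest([9.0, 1.5, 7.2, 0.3, 4.4, 8.8], 3))  # [0.3, 1.5, 4.4]
```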
Carrying out distance calculation on the picture characteristic vectors corresponding to the picture IDs in the candidate set and the characteristic vectors to be matched one by one, and finally obtaining the target with the minimum distance, wherein the method specifically comprises the following steps:
for each ID in the candidate set, acquiring a complete 128-dimensional floating point feature vector of each ID, and performing distance calculation by using the vector to be queried and the floating point feature vectors one by one;
and (3) selecting the minimum K results in the step (2), wherein the corresponding ID is the most similar picture. When K is 1, the search is accurate, and when K >1, the search is K neighbor.
Example 3
With respect to example 2, a detailed implementation is now disclosed.
The steps of extracting the feature points of the image to be stored and the steps of extracting the feature points of the image to be matched are not described in detail in this embodiment.
Vector storage: training of the product quantizer and the vector quantizer is performed using a large number of feature vectors in the database. The method comprises the following steps:
dividing the D-dimensional warehousing feature vector subjected to dimension reduction into P disjoint parts, taking D as 128 and P as 4 as examples, taking the 1 st to 32 th bits of the feature vector as a first section, the 33 th to 64 th bits as a second section, the 65 th to 96 th bits as a third section, and the 97 th to 128 th bits as a fourth section.
Perform k-means clustering with k1 cluster centers inside each part, giving P × k1 cluster centers in total, all cluster centers being [C1]p = {[c1_i]p, i = 1, 2, …, k1; p = 1, 2, …, P}. Store the cluster centers; since the features were segmented in the previous step, the storage consumption is only (D/P) × k1 per part. This step is called PQ, i.e. product quantization, also referred to as first-layer quantization, and yields the corresponding PQ quantizer. Because the parts are independent, this process can use multi-threading, multi-processing, or even multiple nodes for parallel processing, increasing clustering speed.
On the basis of the first-layer quantization, cluster again all data assigned to each cluster center [c1_j]p, generating k2 sub-centers, for P × k1 × k2 cluster centers in total, all second-layer centers being [C2]p = {[c2_ij]p, i = 1, 2, …, k2; j = 1, 2, …, k1; p = 1, 2, …, P}. Store these cluster centers. This step is called VQ, i.e. vector quantization, also referred to as second-layer quantization, and yields the corresponding VQ quantizer.
Establish P hash tables corresponding to the P vector sets separated in the first step. Each hash table has k1 × k2 entries, and the corresponding hash code has length log2(k1) + log2(k2) bits.
For the k1 × k2 cluster centers in each part, encode a hash code, and store the IDs of the feature vectors in the sample data mapped to each cluster center (or the names of the corresponding pictures) in the hash table entry corresponding to that cluster center, obtaining the inverted index based on multiple hash tables.
After the four steps above are completed, the PQ quantizer, the VQ quantizer, the retrieval tree, and the multi-hash-table inverted index built from a large amount of sample data are obtained.
After the search tree and the hash table are established, if the picture to be matched is input, vector matching search is carried out.
When a feature vector y to be matched is given, a vector which is most adjacent to the feature vector y is retrieved from the retrieval tree and the inverted index, and the steps are as follows:
dividing the vector y into intersecting P portions, y ═ y1,y2,y3,…,yp]。
For yp, compute the distances to the k1 cluster centers of the first-layer quantization of the p-th part. Define dist(yp, [c1_i]p) = ||yp − [c1_i]p||2 as the distance between the p-th part of the query vector y and the i-th cluster center of the p-th part of the sample data.
Because of the uncertainty of the clustering process, the feature in sample space closest to y may well belong to another nearby cluster center, so cluster centers around the one nearest to yp are also considered. Sort the distances between yp and the cluster centers determined in the previous step, and select the w cluster centers with the smallest distances as the range of the next query.
For each of the w first-layer cluster centers with the smallest distances, there are k2 second-layer cluster centers beneath it. Define dist(yp, [c2_ij]p) = ||yp − [c2_ij]p||2 as the distance between the p-th part of the query vector y and the j-th second-layer cluster center under the i-th first-layer cluster center in sample space.
Sort the w × k2 distances obtained in the previous step. Because these are only distances for the p-th part of sample space, taking just the nearest one is not enough: take the cluster centers corresponding to the m smallest distances, find the corresponding hash table entries by those cluster centers, and take the union of the picture IDs or names stored in the m entries, finally obtaining the candidate set of most-similar picture IDs.
Take out the feature vectors corresponding to the picture IDs and compute their distances to the complete query vector y one by one; a min-heap ("small top heap") data structure works well regardless of the final data set size and occupies little memory. The final result is the topK, the K items ranked first by distance; when K = 1, the top result is the most similar picture.
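The final re-ranking, taking the topK candidates by full floating-point distance with a min-heap, can be sketched like this; the picture IDs, vectors, and candidate-set size are placeholders:

```python
import heapq
import numpy as np

rng = np.random.default_rng(3)
query = rng.standard_normal(128)                       # the complete query vector y
candidates = {f"img_{i}": rng.standard_normal(128)     # candidate-set ID -> stored
              for i in range(50)}                      # 128-dim float feature vector

# distance of each candidate's full floating-point vector to y
scored = ((float(np.linalg.norm(vec - query)), pid)
          for pid, vec in candidates.items())

K = 5
topK = heapq.nsmallest(K, scored)                      # K closest (distance, ID) pairs
print(len(topK), topK[0][0] <= topK[-1][0])            # 5 True
```

`heapq.nsmallest` keeps only K items in memory at a time, matching the text's point that the heap's footprint is independent of the final data set size.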
This embodiment uses product quantization, vector quantization, and multi-hash indexing to solve the nearest-neighbor search problem, parallelizes the clustering process, and improves the retrieval recall rate through the two-stage division of the retrieval process.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (4)

1. An image feature point matching method is characterized by comprising the following steps,
extracting characteristic points of the warehousing picture: extracting the features of the warehousing image, forming a warehousing feature vector, and reducing the dimension of the warehousing feature vector;
vector storage: dividing the dimension-reduced warehouse-in characteristic vector, performing product quantization on each divided part, and then performing vector quantization to form a product quantizer and a vector quantizer, and establishing a retrieval tree and a hash table;
extracting characteristic points of the picture to be matched: extracting the features of the image to be matched, forming a feature vector to be matched, and reducing the dimension of the feature vector;
vector matching: dividing the feature vector to be matched after dimension reduction; using the product quantizer and the vector quantizer, finding the several cluster centers closest to the feature vector to be matched; finding the pictures corresponding to those cluster centers through the retrieval tree and hash tables to form a candidate set; and computing, using floating-point vectors, the picture in the candidate set closest to the feature vector to be matched;
the specific method for vector storage comprises the following steps:
a step of dividing the warehousing feature vector: dividing the dimension-reduced warehousing feature vector into disjoint P parts;
a product quantization step: within each part, performing k-means clustering with k1 clustering centers, obtaining P × k1 clustering centers in total, and storing all P × k1 clustering centers to form a product quantizer;
a vector quantization step: for each clustering center obtained in the product quantization step, performing k-means clustering again on all data assigned to that clustering center, with k2 clustering centers, to obtain P × k1 × k2 second-layer clustering centers; storing the P × k1 × k2 second-layer clustering centers to form a vector quantizer;
establishing a retrieval tree and hash tables: using P hash tables to record, for each clustering center, the IDs of all features mapped to it or the names of the corresponding pictures;
the specific method for vector matching is as follows:
dividing the feature vector to be matched into P disjoint parts;
within each part, computing the distances between the feature vector to be matched and the k1 clustering centers obtained in the product quantization step, and selecting the W clustering centers with the minimum distance;
for each of the selected W clustering centers, computing one by one the distances between the feature vector to be matched and the k2 second-layer clustering centers corresponding to that clustering center, obtaining k2 distances;
sorting the W × k2 distances and taking the m shortest distances, where m is a natural number greater than 1;
taking out the clustering centers corresponding to the m distances, finding the corresponding hash table entries, and forming a candidate set from the picture names or IDs in those entries;
computing one by one, with floating-point vectors, the distances between the picture feature vectors corresponding to the picture IDs in the candidate set and the feature vector to be matched, and finally obtaining the target with the minimum distance.
2. The image feature point matching method according to claim 1, characterized in that: the k-means clustering adopts a parallel processing mode.
3. The image feature point matching method according to claim 1, characterized in that: the dimension reduction of the features adopts principal component analysis.
4. The image feature point matching method according to claim 1, characterized in that: the dimension reduction comprises the following specific steps:
forming a matrix M from L pieces of n-dimensional feature vector data, where M = {D1, D2, …, Dn}, n is the dimension of the feature vector, and L is a natural number greater than 1; computing the covariance of the matrix M to obtain the matrix Var(M);
solving the n eigenvalues and corresponding eigenvectors of the covariance matrix Var(M), and selecting the d largest eigenvalues and their corresponding eigenvectors to form a dimension-reduction matrix R;
computing M × R to obtain an L × d matrix, thereby achieving the dimension reduction.
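The dimension-reduction steps of claim 4 can be sketched as follows; `pca_reduce` is a hypothetical name, and the rows of M are centered before taking the covariance (a standard PCA detail the claim leaves implicit).

```python
import numpy as np

def pca_reduce(M, d):
    """Claim-4 style dimension reduction: top-d covariance eigenvectors form R.
    M is L x n; returns the L x d projection M_c @ R."""
    M = np.asarray(M, float)
    Mc = M - M.mean(axis=0)            # center rows (implicit in Var(M))
    cov = np.cov(Mc, rowvar=False)     # n x n covariance matrix Var(M)
    w, V = np.linalg.eigh(cov)         # eigh returns ascending eigenvalues
    R = V[:, np.argsort(w)[::-1][:d]]  # n x d: eigenvectors of the d largest eigenvalues
    return Mc @ R                      # L x d reduced matrix
```

The columns of the output are ordered by decreasing variance, so the first retained dimension captures the most spread in the data.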
CN201710258205.3A 2017-04-19 2017-04-19 Image feature point matching method Active CN107085607B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710258205.3A CN107085607B (en) 2017-04-19 2017-04-19 Image feature point matching method


Publications (2)

Publication Number Publication Date
CN107085607A CN107085607A (en) 2017-08-22
CN107085607B true CN107085607B (en) 2020-06-30

Family

ID=59611717

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710258205.3A Active CN107085607B (en) 2017-04-19 2017-04-19 Image feature point matching method

Country Status (1)

Country Link
CN (1) CN107085607B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110019905B (en) * 2017-10-13 2022-02-01 北京京东尚科信息技术有限公司 Information output method and device
CN110019907B (en) * 2017-12-01 2021-07-16 北京搜狗科技发展有限公司 Image retrieval method and device
CN107944046B (en) * 2017-12-15 2019-02-05 清华大学 Extensive high dimensional data method for quickly retrieving and system
CN110019096A (en) * 2017-12-29 2019-07-16 上海全土豆文化传播有限公司 The generation method and device of index file
CN110019875A (en) * 2017-12-29 2019-07-16 上海全土豆文化传播有限公司 The generation method and device of index file
CN108763481B (en) * 2018-05-29 2020-09-01 清华大学深圳研究生院 Picture geographical positioning method and system based on large-scale street view data
CN110889424B (en) * 2018-09-11 2023-06-30 阿里巴巴集团控股有限公司 Vector index establishing method and device and vector retrieving method and device
CN110175546B (en) * 2019-05-15 2022-02-25 深圳市商汤科技有限公司 Image processing method and device, electronic equipment and storage medium
CN110795582A (en) * 2019-10-31 2020-02-14 京东方科技集团股份有限公司 Image recommendation method, system, terminal device and server
CN111324760B (en) * 2020-02-19 2023-09-26 创优数字科技(广东)有限公司 Image retrieval method and device
CN112418298B (en) * 2020-11-19 2021-12-03 北京云从科技有限公司 Data retrieval method, device and computer readable storage medium
CN112988747A (en) * 2021-03-12 2021-06-18 山东英信计算机技术有限公司 Data retrieval method and system
CN117392415A (en) * 2023-10-12 2024-01-12 南京邮电大学 Image quick matching method based on deep learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102842032A (en) * 2012-07-18 2012-12-26 郑州金惠计算机系统工程有限公司 Method for recognizing pornography images on mobile Internet based on multi-mode combinational strategy
US9436758B1 (en) * 2011-12-27 2016-09-06 Google Inc. Methods and systems for partitioning documents having customer feedback and support content


Also Published As

Publication number Publication date
CN107085607A (en) 2017-08-22

Similar Documents

Publication Publication Date Title
CN107085607B (en) Image feature point matching method
Yu et al. Spatial pyramid-enhanced NetVLAD with weighted triplet loss for place recognition
CN111198959B (en) Two-stage image retrieval method based on convolutional neural network
CN108920720B (en) Large-scale image retrieval method based on depth hash and GPU acceleration
Jegou et al. Product quantization for nearest neighbor search
CN105912611B (en) A kind of fast image retrieval method based on CNN
Norouzi et al. Fast exact search in hamming space with multi-index hashing
CN103336795B (en) Video index method based on multiple features
CN113918753B (en) Image retrieval method based on artificial intelligence and related equipment
WO2013129580A1 (en) Approximate nearest neighbor search device, approximate nearest neighbor search method, and program
Wei et al. Projected residual vector quantization for ANN search
CN112948601B (en) Cross-modal hash retrieval method based on controlled semantic embedding
JP7006966B2 (en) Coding method based on mixed vector quantization and nearest neighbor search (NNS) method using this
CN108491430A (en) It is a kind of based on the unsupervised Hash search method clustered to characteristic direction
CN111177435A (en) CBIR method based on improved PQ algorithm
CN107527058B (en) Image retrieval method based on weighted local feature aggregation descriptor
CN115129949A (en) Vector range retrieval method, device, equipment, medium and program product
Cao et al. Scalable distributed hashing for approximate nearest neighbor search
Heo et al. Shortlist selection with residual-aware distance estimator for k-nearest neighbor search
CN108536772B (en) Image retrieval method based on multi-feature fusion and diffusion process reordering
Romberg et al. Robust feature bundling
CN112418298B (en) Data retrieval method, device and computer readable storage medium
Jégou et al. Searching with quantization: approximate nearest neighbor search using short codes and distance estimators
Chiu et al. Effective product quantization-based indexing for nearest neighbor search
Yuan et al. A novel index structure for large scale image descriptor search

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant