CN102004786A - Acceleration method in image retrieval system - Google Patents


Info

Publication number
CN102004786A
Authority
CN
China
Prior art keywords
image
standard picture
vector
index
retrieved
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 201010573237
Other languages
Chinese (zh)
Other versions
CN102004786B (en)
Inventor
冯德瀛
杨杰
杨程
刘从新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN2010105732370A priority Critical patent/CN102004786B/en
Publication of CN102004786A publication Critical patent/CN102004786A/en
Application granted granted Critical
Publication of CN102004786B publication Critical patent/CN102004786B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an acceleration method in an image retrieval system in the technical field of computer information processing. The acceleration method comprises the following steps of: respectively extracting characteristic descriptors from a standard image and an image to be retrieved, and establishing a visual code book; establishing a random kd tree according to a seed point set and classifying the characteristic descriptors; performing vectorization processing and optimizing inverted indexes; and performing similarity search on the vector of the image to be retrieved in the optimized inverted indexes to accelerate the image retrieval system. The acceleration method can solve the problems of large calculation amount and long calculation time in the clustering process in the prior art, and improves the real-time property of the similarity search by optimizing the inverted indexes under the condition of ensuring the retrieval accuracy.

Description

Acceleration method in an image retrieval system
Technical field
The present invention relates to a method in the technical field of computer information processing, and specifically to an acceleration method in an image retrieval system.
Background art
With the widespread adoption of the Internet and of digital capture devices, image data has come into wide use in daily life. A growing share of business activities, transactions, and information presentation involves large amounts of image data. How to organize and search such data effectively and on demand in a large-scale image database has become a topic of broad interest.
Image retrieval technology searches a standard image library for the images that satisfy a query condition, based on either the content of a query image or a specified query criterion. It is generally divided into text-based image retrieval and content-based image retrieval (CBIR). Text-based image retrieval is currently the more widely used. It carries over traditional text retrieval techniques and avoids analyzing low-level image features: images are described by attributes such as image name, image size, compression type, author, and date, and are then queried by keyword or browsed under a particular category of a hierarchical directory. Content-based image retrieval, given a query image, describes images by global features such as color, shape, and texture and by local invariant features, vectorizes these image features, and then performs a similarity search in the standard image library to find images with similar content.
Early content-based image retrieval mostly used global features such as color, texture, and shape for similarity search. Because these features are not robust to illumination changes, occlusion, and geometric deformation, they have gradually been replaced by local invariant feature detectors such as DoG, MSER, and Harris. Current content-based image retrieval generally extracts image features by feature detection, creates feature descriptors, clusters the descriptors to create a visual codebook, vectorizes the images, and finally performs a similarity search over the image vectors in a high-dimensional index structure to return the relevant search results.
A search of the prior-art literature found the following technologies related to an acceleration method in an image retrieval system. Andrew Zisserman et al., in the patent "Object Retrieval" (U.S. Patent No. US 2005/0225678A1, published December 13, 2005), provide a method for retrieving a user-defined object in images. In that method, K-Means clustering is used to classify the feature descriptors, and a traditional inverted index is used for the similarity query between the vector of the image to be retrieved and the standard image vectors. In a large-scale image library, classifying all feature descriptors with K-Means clustering is slow and computationally expensive: the standard image library contains a massive number of feature descriptors, the number of cluster centers is large, and the clustering only completes after many iterations. Performing the similarity query over the standard image vectors with a traditional inverted index likewise gives poor real-time performance, because the standard image vectors are high-dimensional, with more than 100,000 dimensions.
Further searching found that David Nister et al., in the patent "Scalable Object Recognition Using Hierarchical Quantization with a Vocabulary Tree" (U.S. Patent No. US7725484B2, published May 25, 2010), provide a vocabulary tree that introduces a hierarchy on top of K-Means clustering. Compared with traditional K-Means clustering, the computation time of the clustering process is somewhat shorter, but because the standard image library contains a massive number of descriptors, the computation load of the clustering remains very large and the clustering time remains too long. Moreover, because of the hierarchical approach, different descriptors that belong to the same class tend to be assigned to different classes, which degrades quantization performance. The similarity query between the vector of the image to be retrieved and the standard image vectors also uses a traditional inverted index; because the dimensionality of the image vectors is not reduced and quantization performance is poor, retrieval accuracy is low and real-time performance is poor.
Summary of the invention
To address the above shortcomings of the prior art, the present invention provides an acceleration method in an image retrieval system. The method creates the visual codebook by random sampling and creates an optimized inverted index from the standard image vectors. It thereby overcomes the large computation load and long computation time of the clustering process in the prior art and, by optimizing the inverted index while maintaining retrieval accuracy, improves the real-time performance of similarity search.
The present invention is achieved by the following technical solution: feature descriptors are extracted from the standard images and from the image to be retrieved, and a visual codebook is generated; randomized kd-trees are then created from the seed point set and the feature descriptors are classified; the inverted index is then optimized through vectorization; finally, the vector of the image to be retrieved is searched for similarity in the optimized inverted index, thereby accelerating the image retrieval system.
Extracting feature descriptors from the standard images and from the image to be retrieved means: feature points are first detected in the standard images and in the image to be retrieved with the Difference of Gaussian (DoG) operator, and each feature point detected by the DoG operator is then described by a Scale Invariant Feature Transform (SIFT) descriptor.
The description by scale-invariant descriptors comprises two steps, offline processing and real-time processing, wherein:
In offline processing, each image $I_i$ ($i = 1, 2, \ldots, N$) in the standard image library $C = (I_1, I_2, \ldots, I_N)$ is represented by its SIFT descriptors as
$$X_i = (x^i_1, x^i_2, \ldots, x^i_{n_i})$$
where each element $x^i_1, \ldots, x^i_{n_i}$ is a single descriptor in image $I_i$ with 128 dimensions, and $n_i$ is the number of SIFT descriptors in image $I_i$. The set of all SIFT descriptors in the standard image library is $S = (X_1, X_2, \ldots, X_N)$, and the total number of SIFT descriptors in $S$ is $n = \sum_{i=1}^{N} n_i$.
In real-time processing, the image to be retrieved $Q$ is represented by its SIFT descriptors as $T = (q_1, q_2, \ldots, q_m)$, where $q_k$ ($k = 1, 2, \ldots, m$) is a single descriptor in image $Q$ with 128 dimensions, and $m$ is the number of SIFT descriptors in image $Q$.
Creating the visual codebook means: the feature descriptors in the standard image library are randomly sampled to create the visual codebook. The specific steps are: the SIFT descriptor set $S$ is randomly sampled, and a portion of the SIFT descriptors is extracted as the seed point set $D = (y_1, y_2, \ldots, y_z)$, where $z$ is the number of seed points in $D$ and each seed point is $y_j$ ($j = 1, 2, \ldots, z$). The SIFT descriptor set $S$ is then classified: each seed point $y_j$ defines a class, and the SIFT descriptors similar to $y_j$ are assigned to the class corresponding to $y_j$. The number of seed points $z$ is therefore the number of classes, and the seed point set $D$ is the visual codebook needed to quantize the standard images $I_i$.
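The random-sampling step above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the 20% sampling ratio (taken from the embodiment), the synthetic 128-dimensional data, and the brute-force nearest-seed assignment (a stand-in for the randomized kd-tree search described later) are all assumptions.

```python
import numpy as np

def sample_codebook(descriptors, ratio, rng):
    """Randomly pick a fraction of the descriptor rows as seed points (no clustering)."""
    n = descriptors.shape[0]
    z = max(1, int(round(ratio * n)))
    idx = rng.choice(n, size=z, replace=False)  # sample without replacement
    return descriptors[idx]

def assign_to_seeds(descriptors, seeds):
    """Assign each descriptor to its nearest seed by exhaustive comparison.
    Hypothetical stand-in for the kd-tree search; only practical for small inputs."""
    d2 = ((descriptors[:, None, :] - seeds[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

rng = np.random.default_rng(0)
S = rng.random((500, 128))          # synthetic stand-in for the SIFT descriptor set S
D = sample_codebook(S, 0.2, rng)    # seed point set D, z = 20% of n
labels = assign_to_seeds(S, D)      # class (seed index) of every descriptor
```

Because the seed points are drawn directly from the data, creating the codebook costs one sampling pass and needs no iteration, which is the source of the time saving claimed over K-Means.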
Creating a randomized kd-tree means: the tree is built by a top-down iterative process in which, at each iteration, the split dimension of a node is chosen at random from among several dimensions with the largest variance, and the split threshold of the node is chosen at random from among the elements close to the median in that dimension.
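A minimal sketch of this construction rule is given below. The dictionary node layout, the parameter names `top_dims` and `near_median`, and their default values are illustrative assumptions; the source does not state how many high-variance dimensions or how wide a band around the median is used.

```python
import numpy as np

def build_rkd_tree(points, ids, rng, top_dims=5, near_median=0.1, leaf_size=1):
    """Recursively build one randomized kd-tree over the rows points[ids]."""
    if len(ids) <= leaf_size:
        return {"leaf": True, "ids": ids}
    sub = points[ids]
    # Split dimension: random choice among the top_dims largest-variance dimensions.
    cand = np.argsort(sub.var(axis=0))[-top_dims:]
    dim = int(rng.choice(cand))
    # Split threshold: random choice among values whose rank is close to the median.
    order = np.sort(sub[:, dim])
    mid = len(order) // 2
    w = max(1, int(near_median * len(order)))
    lo, hi = max(1, mid - w), min(len(order) - 1, mid + w)
    thr = float(order[rng.integers(lo, hi + 1)])
    left = ids[sub[:, dim] < thr]
    right = ids[sub[:, dim] >= thr]
    if len(left) == 0:  # degenerate split caused by duplicate values
        return {"leaf": True, "ids": ids}
    return {"leaf": False, "dim": dim, "thr": thr,
            "left": build_rkd_tree(points, left, rng, top_dims, near_median, leaf_size),
            "right": build_rkd_tree(points, right, rng, top_dims, near_median, leaf_size)}

def leaf_ids(node):
    """Collect the point ids stored in all leaves of a tree."""
    if node["leaf"]:
        return list(node["ids"])
    return leaf_ids(node["left"]) + leaf_ids(node["right"])

rng = np.random.default_rng(0)
seeds = rng.random((200, 128))                       # synthetic seed points
tree = build_rkd_tree(seeds, np.arange(200), rng)    # one randomized kd-tree
```

Calling `build_rkd_tree` several times with the same generator yields different trees over the same seed points, which is what lets multiple trees be searched together.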
Classifying the feature descriptors means: each seed point $y_j$ in the seed point set $D$ is partitioned into a different region of space according to the node thresholds. The specific steps are: a single best-first query is used to search for each SIFT descriptor in multiple randomized kd-trees to find the corresponding seed point $y_j$; the most similar seed points found are kept in a single priority queue; the search stops when the number of query paths reaches a set limit; and the class of the seed point found is the class to which the SIFT descriptor is assigned.
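The search across several trees can be sketched as a best-first traversal with one priority queue shared by all trees. This is an interpretation under stated assumptions: the tuple node layout, the compact `build` helper (which uses the exact median as threshold for brevity), and the lower-bound rule for unexplored branches are illustrative and are not taken from the source. The limit `max_paths=100` mirrors the embodiment.

```python
import heapq
import numpy as np

def build(points, ids, rng, top_dims=5):
    """Compact randomized kd-tree: random split dimension among the
    highest-variance dimensions, median threshold (simplification)."""
    if len(ids) == 1:
        return ("leaf", ids)
    sub = points[ids]
    dim = int(rng.choice(np.argsort(sub.var(axis=0))[-top_dims:]))
    thr = float(np.median(sub[:, dim]))
    mask = sub[:, dim] < thr
    if not mask.any():  # degenerate split caused by duplicate values
        return ("leaf", ids)
    return ("node", dim, thr,
            build(points, ids[mask], rng, top_dims),
            build(points, ids[~mask], rng, top_dims))

def nearest_seed(query, points, trees, max_paths=100):
    """Best-first search over all trees using a single shared priority queue.
    Heap entries are (lower bound on squared distance, tie-breaker, node)."""
    heap = [(0.0, k, t) for k, t in enumerate(trees)]
    heapq.heapify(heap)
    counter = len(trees)
    best_id, best_d2, paths = -1, float("inf"), 0
    while heap and paths < max_paths:
        bound, _, node = heapq.heappop(heap)
        if bound >= best_d2:
            break  # no remaining branch can contain a closer seed
        while node[0] == "node":
            _, dim, thr, left, right = node
            diff = float(query[dim] - thr)
            near, far = (left, right) if diff < 0 else (right, left)
            heapq.heappush(heap, (max(bound, diff * diff), counter, far))
            counter += 1
            node = near
        for i in node[1]:
            d2 = float(((points[i] - query) ** 2).sum())
            if d2 < best_d2:
                best_id, best_d2 = int(i), d2
        paths += 1  # one root-to-leaf descent counts as one query path
    return best_id

rng = np.random.default_rng(0)
seeds = rng.random((200, 16))                                   # synthetic seeds
trees = [build(seeds, np.arange(200), rng) for _ in range(8)]   # 8 trees
label = nearest_seed(seeds[7], seeds, trees)                    # class of a descriptor
```

With a small `max_paths` the result is approximate; with an unbounded number of paths the pruning rule makes the search exact, which is convenient for checking the sketch against brute force.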
Vectorization means: the standard images and the image to be retrieved are each vectorized with the seed frequency-inverse image frequency (term frequency-inverse document frequency, tf-idf) method, and the position index sum of the nonzero elements is then computed for each standard image vector and for the vector of the image to be retrieved.
Image vectorization comprises offline processing and real-time processing, wherein:
The offline processing steps of image vectorization comprise:
1) For standard image $I_i$, count the number of occurrences $n_{ij}$ of seed point $y_j$ and the total number of SIFT descriptors $n_i$; the seed frequency in standard image $I_i$ is then
$$tf_{ij} = \frac{n_{ij}}{n_i}$$
Also count the number $M_j$ of standard images in the standard image library $C$ that contain seed point $y_j$;
2) Apply the stop-word method commonly used in text retrieval to $M_j$ with decision threshold $T$: when $M_j > T$, delete the corresponding seed point $y_j$; when $M_j \le T$, keep $M_j$ and let $M_r = M_j$. After all $M_j$ have been tested, the number of seed points is reduced from $z$ to $z'$, and the inverse image frequency is
$$idf_r = \log \frac{N}{M_r}$$
The seed frequency changes from $tf_{ij}$ to $tf_{ir}$,
$$tf_{ir} = \frac{n_{ir}}{n_i}$$
3) Let $V_i$ be the image vector of standard image $I_i$; then $V_i = (c_1, c_2, \ldots, c_{z'})$, where
$$c_r = tf_{ir} \times idf_r, \quad r = 1, 2, \ldots, z'$$
This completes the vectorization of the standard images in offline processing.
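The offline steps above can be sketched as follows. The function name, the natural logarithm in the inverse image frequency, and the extra `M > 0` guard (which drops seed points that occur in no image, to avoid dividing by zero) are assumptions; the threshold ratio 0.6 is the value given in the embodiment.

```python
import numpy as np

def offline_vectors(labels_per_image, z, stop_ratio=0.6):
    """tf-idf vectors for the standard images, with stop-word filtering of seeds."""
    N = len(labels_per_image)
    counts = np.zeros((N, z))
    for i, labels in enumerate(labels_per_image):
        counts[i] = np.bincount(labels, minlength=z)   # n_ij
    tf = counts / counts.sum(axis=1, keepdims=True)     # n_ij / n_i
    M = (counts > 0).sum(axis=0)                        # images containing seed j
    T = stop_ratio * M.max()                            # stop-word threshold
    keep = np.flatnonzero((M > 0) & (M <= T))           # surviving seeds, z' of them
    idf = np.log(N / M[keep])                           # inverse image frequency
    return tf[:, keep] * idf, keep, idf

# Four toy images, each given as the seed index of every descriptor (z = 5 seeds).
labels_per_image = [[0, 1, 1], [0, 2], [0, 3, 3, 3], [0, 1]]
V, keep, idf = offline_vectors(labels_per_image, z=5)
```

In this toy example seed 0 occurs in every image and is removed as a stop word, and seed 4 never occurs, so three seeds remain.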
The real-time processing steps of image vectorization comprise:
a) For the image to be retrieved $Q$, count the number of occurrences $m_r$ of seed point $y_r$ and the number of SIFT descriptors $m$; the seed frequency in the image to be retrieved $Q$ is then
$$tf_{Qr} = \frac{m_r}{m}$$
b) For the inverse image frequency $idf_{Qr}$ of the image to be retrieved $Q$, the inverse image frequency $idf_r$ from offline processing is used, that is,
$$idf_{Qr} = idf_r$$
The vector of the image to be retrieved is $V_q = (d_1, d_2, \ldots, d_{z'})$, where
$$d_r = tf_{Qr} \times idf_{Qr}, \quad r = 1, 2, \ldots, z'$$
This completes the vectorization of the image to be retrieved in real-time processing.
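A short sketch of the real-time step, reusing quantities produced offline. The function name and the toy values of `keep` (indices of the seed points that survived the stop-word filter) and `idf` are hypothetical.

```python
import numpy as np

def query_vector(query_labels, keep, idf):
    """tf-idf vector of the image to be retrieved, using the offline idf values."""
    counts = np.bincount(query_labels, minlength=int(keep.max()) + 1)
    tf_q = counts[keep] / len(query_labels)   # m_r / m over the surviving seeds
    return tf_q * idf

keep = np.array([1, 2, 3])                     # surviving seed indices (toy values)
idf = np.log(4 / np.array([2, 1, 1]))          # offline idf for those seeds
V_q = query_vector([0, 1, 1, 3], keep, idf)    # query with four descriptors
```

The denominator is the total descriptor count $m$ of the query, so descriptors assigned to removed seed points still count toward $m$ even though they contribute no vector element.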
Computing the position index sum of the nonzero elements of the standard image vectors and of the vector of the image to be retrieved means: in offline processing, vector $V_i$ is binarized; let the binarized image vector be $V'_i = (p_1, p_2, \ldots, p_{z'})$, where
$$p_r = \begin{cases} 1, & c_r > 0 \\ 0, & c_r = 0 \end{cases}$$
The position index sum $s_i$ of the nonzero elements of standard image vector $V_i$ is then
$$s_i = \sum_{r=1}^{z'} r \cdot p_r$$
In real-time processing, vector $V_q$ is binarized; let the binarized image vector be $V'_q = (w_1, w_2, \ldots, w_{z'})$, where
$$w_r = \begin{cases} 1, & d_r > 0 \\ 0, & d_r = 0 \end{cases}$$
The position index sum $s_q$ of the nonzero elements of the vector $V_q$ of the image to be retrieved is then
$$s_q = \sum_{r=1}^{z'} r \cdot w_r$$
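As a small worked example of this quantity, the sketch below binarizes a vector and sums the positions of its nonzero elements. It assumes that positions are counted from 1 and that the position index sum is exactly the sum of those positions; the function name is hypothetical.

```python
import numpy as np

def position_index_sum(v):
    """Binarize v and add up the 1-based positions of its nonzero elements."""
    p = (np.asarray(v) > 0).astype(int)        # binarized vector
    positions = np.arange(1, p.size + 1)       # r = 1, 2, ..., z'
    return int((positions * p).sum())

s_example = position_index_sum([0.3, 0.0, 0.0, 0.1])   # nonzero at positions 1 and 4
```

The value is a single integer per image, so it can be stored next to the image name in every inverted index list at negligible cost.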
Creating the optimized inverted index means: in offline processing, the seed points $y_r$ serve as the index keys and the standard image vectors $V_i$ as the indexed targets, so that each seed point $y_r$ has a corresponding inverted index list $L_r$. For element $u$ of standard image vector $V_i$, when $c_u > 0$, the name $I_i$ of this image vector and the position index sum $s_i$ of its nonzero elements are recorded in list $L_u$, denoted $L_u = \{ y_u \mid (I_i, s_i) \}$. Each standard image vector $V_i$ is processed in turn and recorded in the corresponding lists $L_r$ according to the positions of its nonzero elements, creating the inverted index $L = \{L_1, L_2, \ldots, L_{z'}\}$. The inverted index lists $L_r$ and the standard images $I_i$ within each list are then sorted, because the lists record different numbers of standard images and the standard image vectors have different position index sums. First, the lists $L_r$ are sorted in descending order of the number of standard images they record; then, within each list $L_r$, the standard images $I_i$ are sorted in descending order of the position index sum $s_i$. After both sorts, the optimized inverted index $L'$ is obtained and is used for similarity search in real-time processing.
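The two-level ordering described above can be sketched with plain Python containers. The function name, the use of 0-based seed positions as dictionary keys, and the toy vectors are illustrative assumptions; the position index sum is computed with 1-based positions.

```python
def build_optimized_index(vectors, names):
    """Inverted index whose lists are ordered by size and whose entries are
    ordered by position index sum, both in descending order."""
    index = {}
    for name, v in zip(names, vectors):
        s_i = sum(r + 1 for r, c in enumerate(v) if c > 0)   # position index sum
        for r, c in enumerate(v):
            if c > 0:
                index.setdefault(r, []).append((name, s_i))
    for postings in index.values():
        postings.sort(key=lambda e: e[1], reverse=True)       # by s_i, descending
    # Lists that record more standard images come first.
    return sorted(index.items(), key=lambda kv: len(kv[1]), reverse=True)

vectors = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0], [0.0, 0.0, 4.0]]
names = ["I1", "I2", "I3"]
optimized = build_optimized_index(vectors, names)
```

Here seed position 2 occurs in all three images, so its list is placed first and reads I2 (sum 5), I1 (sum 4), I3 (sum 3).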
The similarity search specifically comprises the following steps:
i) The lists containing more standard images are queried first. In each such list, the position index sum $s_q$ of the nonzero elements of the image to be retrieved is used as a threshold and compared with the position index sums $s_i$ of the standard images in the list; a standard image whose $s_i$ is less than the threshold $s_q$ is excluded, together with all subsequent standard images in the list, whose position index sums are smaller still;
ii) During the similarity search in the optimized inverted index $L'$, an accumulator $A$ records the number of times $a_i$ that each standard image $I_i$ occurs. Each standard image has its own accumulator $a_i$, so $A = (a_1, a_2, \ldots, a_N)$. Each time standard image $I_i$ is found in an inverted index list, its accumulator is incremented by 1, that is, $a_i = a_i + 1$. Finally, the accumulators $A$ are sorted, and the standard images with the larger accumulator values are the candidate query results for the vector $V_q$ of the image to be retrieved, which completes the search of the optimized inverted index;
iii) The similarity between the vector $V_q$ of the image to be retrieved and each candidate standard image vector $V_i$ is measured by the cosine of the angle between the two vectors,
$$\cos(V_q, V_i) = \frac{V_q \cdot V_i}{\|V_q\| \, \|V_i\|}$$
where $V_q \cdot V_i = \sum_{r=1}^{z'} d_r c_r$ and
$$\|V_q\| = \sqrt{\sum_{r=1}^{z'} d_r^2}$$
$$\|V_i\| = \sqrt{\sum_{r=1}^{z'} c_r^2}$$
After the cosine values $\cos(V_q, V_i)$ are computed, they are sorted in descending order, and the standard image $I_i$ with the largest cosine value is the final query result for the image to be retrieved $Q$.
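Steps i) to iii) can be sketched together as follows. The sketch follows the exclusion rule as stated in step i) (stop scanning a list once an entry's position index sum falls below $s_q$); the function names, the dictionary layout of the index, the toy data, and `top_k=5` (the candidate count used in the embodiment) are assumptions.

```python
import math

def position_index_sum(v):
    return sum(r + 1 for r, x in enumerate(v) if x > 0)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def search(index, vectors, q, top_k=5):
    """Vote with accumulators over the pruned lists, then re-rank by cosine."""
    s_q = position_index_sum(q)
    lists = [index[r] for r, d in enumerate(q) if d > 0 and r in index]
    lists.sort(key=len, reverse=True)            # larger lists are queried first
    acc = {}
    for postings in lists:                       # entries sorted by s_i, descending
        for name, s_i in postings:
            if s_i < s_q:
                break                            # this and all later entries excluded
            acc[name] = acc.get(name, 0) + 1     # a_i = a_i + 1
    candidates = sorted(acc, key=acc.get, reverse=True)[:top_k]
    return sorted(candidates, key=lambda n: cosine(q, vectors[n]), reverse=True)

vectors = {"I1": [1.0, 0.0, 2.0], "I2": [0.0, 3.0, 1.0], "I3": [0.0, 0.0, 4.0]}
index = {
    0: [("I1", 4)],
    1: [("I2", 5)],
    2: [("I2", 5), ("I1", 4), ("I3", 3)],
}
result = search(index, vectors, [1.0, 0.0, 1.0])   # s_q = 1 + 3 = 4
```

In this toy run I3 is pruned because its position index sum (3) is below the query's (4), I1 collects two votes and I2 one, and the cosine re-ranking places I1 first.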
The beneficial effects of the invention are as follows. Compared with creating the visual codebook by traditional K-Means clustering, the random visual codebook provided by the invention only requires random sampling of the SIFT descriptor set and no repeated iteration, so the computation load is small and the computation time is short. Compared with a traditional inverted index, the optimized inverted index proposed by the invention can quickly exclude irrelevant standard images according to the position index sum of the nonzero elements of the vector of the image to be retrieved, which speeds up similarity search in a large-scale image library. Compared with the prior art, the invention improves the real-time performance of retrieval while reducing the computation load.
Description of drawings
Fig. 1 is a flowchart of the method.
Fig. 2 shows the overall retrieval time and the time spent in the related steps in real-time processing.
Fig. 3 compares the query time of the traditional inverted index with that of the optimized inverted index.
Embodiment
An embodiment of the invention is described in detail below. The embodiment is implemented on the premise of the technical solution of the invention, and a detailed implementation and a concrete operating procedure are given, but the scope of protection of the invention is not limited to the following embodiment.
As shown in Fig. 1, this embodiment applies the acceleration method of the image retrieval system to the retrieval of images taken with a mobile phone. The specific implementation steps are as follows:
1. Extract feature descriptors from the standard images and from the image to be retrieved.
In offline processing, SIFT descriptors are extracted from the images in the standard image library $C = (I_1, I_2, \ldots, I_N)$. The number of SIFT descriptors in image $I_i$ is $n_i$, so the total number of SIFT descriptors in the standard image library is
$$n = \sum_{i=1}^{N} n_i$$
In real-time processing, SIFT descriptors are extracted from the image to be retrieved $Q$; the number of SIFT descriptors in $Q$ is $m$.
2. Randomly sample the feature descriptors in the standard image library to create the visual codebook.
In offline processing, the $n$ SIFT descriptors of the standard image library are randomly sampled, and $z$ of them are extracted as seed points to create the visual codebook, where $z = 20\% \times n$.
3. Create randomized kd-trees from the seed point set, and classify the feature descriptors of the standard images and of the image to be retrieved.
In offline processing, 8 independent randomized kd-trees are created from the $z$ seed points. Each SIFT descriptor in the standard images is searched for its approximate nearest neighbor across the 8 randomized kd-trees, with the maximum number of query paths set to 100; the SIFT descriptor is then assigned to the class of the seed point found, and the seed point class of every SIFT descriptor is recorded.
In real-time processing, the 8 randomized kd-trees created in offline processing are used to search for the approximate nearest neighbor of each SIFT descriptor in the image to be retrieved $Q$, again with the maximum number of query paths set to 100; the SIFT descriptor is then assigned to the class of the seed point found, and the seed point class of every SIFT descriptor is recorded.
4. Vectorize the standard images and the image to be retrieved with the seed frequency-inverse image frequency method.
In offline processing, the stop-word method is applied to the number $M_j$ of standard images in the standard image library $C$ that contain seed point $y_j$, with stop-word threshold $T = 0.6 \times \max(M_j)$.
In real-time processing, only the seed points that survive the stop-word screening in offline processing are considered, and the inverse image frequency from offline processing is used.
5. Compute the position index sum of the nonzero elements of the standard image vectors and of the vector of the image to be retrieved.
In offline processing, the standard image vector $V_i = (c_1, c_2, \ldots, c_{z'})$ is binarized into the vector $V'_i = (p_1, p_2, \ldots, p_{z'})$, where
$$p_r = \begin{cases} 1, & c_r > 0 \\ 0, & c_r = 0 \end{cases}$$
The position index sum of the nonzero elements of standard image vector $V_i$ is then
$$s_i = \sum_{r=1}^{z'} r \cdot p_r$$
In real-time processing, the vector $V_q = (d_1, d_2, \ldots, d_{z'})$ of the image to be retrieved is binarized into the vector $V'_q = (w_1, w_2, \ldots, w_{z'})$, where
$$w_r = \begin{cases} 1, & d_r > 0 \\ 0, & d_r = 0 \end{cases}$$
The position index sum of the nonzero elements of the vector $V_q$ of the image to be retrieved is then
$$s_q = \sum_{r=1}^{z'} r \cdot w_r$$
6. Create the optimized inverted index.
In offline processing, the seed points $y_r$ serve as index keys and the standard image vectors $V_i$ as indexed targets, and the nonzero elements of each standard image vector $V_i$ are tallied. When element $u$ of vector $V_i$ is nonzero, the standard image name $I_i$ and the position index sum $s_i$ of its nonzero elements are recorded in the index list of seed point $y_u$. After all nonzero elements of the standard image vectors have been tallied, the index lists are sorted in descending order of the number of standard images they record, and the standard images within each list are sorted in descending order of position index sum $s_i$, creating the optimized inverted index.
7. Search for the vector of the image to be retrieved in the optimized inverted index, and measure similarity by cosine value.
In real-time processing, among all index lists corresponding to the nonzero elements of the vector $V_q$ of the image to be retrieved, the lists containing the most standard images are queried first. Within each list, the position index sum $s_q$ of the nonzero elements of $V_q$ is compared with the position index sum $s_i$ of the nonzero elements of each standard image vector $V_i$: when $s_q \ge s_i$, the accumulator of standard image $I_i$ is incremented, $a_i = a_i + 1$; when $s_q < s_i$, standard image $I_i$ and the subsequent standard images with smaller position index sums are excluded. After all nonzero elements of $V_q$ have been queried in turn, the accumulators $A$ of the standard images are sorted and the 5 largest accumulator values are taken. The corresponding standard image vectors $V_i$ are compared with $V_q$ by cosine similarity, the 5 cosine values are sorted in descending order, and the standard image with the largest cosine value is the query result for the image to be retrieved.
The method was tested in a simulation experiment as follows: 284 images to be retrieved were retrieved against 7,655 standard images. Fig. 2 shows the overall retrieval time and the time spent in the related steps when the 284 images to be retrieved are processed in real time. In Fig. 2, the cross-marked curve represents the query time of the optimized inverted index plus the time of the similarity measurement between the vector of the image to be retrieved and the standard image vectors; the curve is relatively stable, with an average time of 0.0055 s. The diamond-marked curve represents the sum of three times for the image to be retrieved: the SIFT descriptor assignment time, the image vectorization time, and the computation time of the nonzero-element position index sum. It is longer than the cross-marked curve but varies little, with an average time of 0.2047 s. The square-marked curve represents the SIFT descriptor extraction time of the image to be retrieved; it varies widely, mainly with the size of the image to be retrieved, and averages 0.2686 s. The black curve corresponds to the overall retrieval time, which averages 0.4788 s and satisfies the real-time requirement.
For visual codebook creation time, the method of the invention was compared with the AKM (Approximate K-Means) and HKM (Hierarchical K-Means) algorithms. From the 7,655 standard images, 1,999,620 SIFT descriptors were extracted. The number of cluster centers for AKM and HKM was set to 19,962 with 40 iterations, and the number of seed points of the random visual codebook was likewise 19,962. Table 1 gives the times taken by the three algorithms to create the visual codebook. As Table 1 shows, the creation time of the random visual codebook is far shorter than that of AKM and HKM.
For inverted index query time, the method of the invention was compared with the traditional inverted index, using the same 284 images to be retrieved. Fig. 3 shows the query times of the traditional inverted index and of the optimized inverted index. In Fig. 3, the diamond-marked curve represents the query time of the traditional inverted index, which averages 0.0205 s. The square-marked curve represents the query time of the optimized inverted index; compared with the diamond-marked curve, the query time is shorter and fluctuates less, averaging 0.0028 s. The optimized inverted index therefore speeds up the query of the image to be retrieved in the standard image library.
All of the above algorithms were run in Matlab 7.6.
          AKM      HKM      Random visual codebook
Time      2.5 h    2 h      183 s
Table 1. Comparison of visual codebook creation time for the random visual codebook, AKM, and HKM
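The ratios implied by the reported figures can be checked with a few lines of arithmetic. All inputs below are the values quoted in the experiment above; the variable names are illustrative, and the derived ratios are computed here rather than stated in the source.

```python
# Codebook creation times from Table 1, converted to seconds.
akm_s, hkm_s, random_s = 2.5 * 3600, 2 * 3600, 183
speedup_vs_akm = akm_s / random_s        # roughly 49 times faster than AKM
speedup_vs_hkm = hkm_s / random_s        # roughly 39 times faster than HKM

# Fraction of the 1,999,620 descriptors used as seed points in the experiment.
sampling_ratio = 19962 / 1999620

# Average query time: traditional versus optimized inverted index (Fig. 3).
query_speedup = 0.0205 / 0.0028          # roughly 7 times faster

# The three per-stage averages from Fig. 2 add up to the overall average.
stage_sum = 0.0055 + 0.2047 + 0.2686     # compare with the reported 0.4788 s
```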

Claims (10)

1. An acceleration method in an image retrieval system, characterized in that: feature points are detected in the standard images and in the image to be retrieved with the Difference of Gaussian operator; each feature point detected by the Difference of Gaussian operator is then described by a scale-invariant descriptor and a visual codebook is created; randomized kd-trees are then created from the seed point set and the feature descriptors are classified; vectorization is then performed and the inverted index is optimized; finally, the vector of the image to be retrieved is searched for similarity in the optimized inverted index, thereby accelerating the image retrieval system.
2. The acceleration method in an image retrieval system according to claim 1, characterized in that the description by scale-invariant descriptors comprises two steps, offline processing and real-time processing, wherein:
In offline processing, each image $I_i$ in the standard image library $C = (I_1, I_2, \ldots, I_N)$ is represented by its SIFT descriptors as
$$X_i = (x^i_1, x^i_2, \ldots, x^i_{n_i})$$
where each element $x^i_1, \ldots, x^i_{n_i}$ is a single descriptor in image $I_i$ with 128 dimensions, and $n_i$ is the number of SIFT descriptors in image $I_i$; the set of all SIFT descriptors in the standard image library is $S = (X_1, X_2, \ldots, X_N)$, and the total number of SIFT descriptors in $S$ is
$$n = \sum_{i=1}^{N} n_i$$
In real-time processing, the image to be retrieved $Q$ is represented by its SIFT descriptors as $T = (q_1, q_2, \ldots, q_m)$, where $q_k$ ($k = 1, 2, \ldots, m$) is a single descriptor in image $Q$ with 128 dimensions, and $m$ is the number of SIFT descriptors in image $Q$.
3. The acceleration method in an image retrieval system according to claim 1, characterized in that creating the visual codebook means: the feature descriptors in the standard image library are randomly sampled to create the visual codebook; the specific steps are: the SIFT descriptor set $S$ is randomly sampled, and a portion of the SIFT descriptors is extracted as the seed point set $D = (y_1, y_2, \ldots, y_z)$, where $z$ is the number of seed points in $D$ and each seed point is $y_j$ ($j = 1, 2, \ldots, z$); the SIFT descriptor set $S$ is then classified: each seed point $y_j$ defines a class, and the SIFT descriptors similar to $y_j$ are assigned to the class corresponding to $y_j$; the number of seed points $z$ is the number of classes, and the seed point set $D$ is the visual codebook needed to quantize the standard images $I_i$.
4. The acceleration method in an image retrieval system according to claim 1, characterized in that creating a randomized kd-tree means: the tree is built by a top-down iterative process in which, at each iteration, the split dimension of a node is chosen at random from among several dimensions with the largest variance, and the split threshold of the node is chosen at random from among the elements close to the median in that dimension.
5. The acceleration method in an image retrieval system according to claim 1, characterized in that classifying the feature descriptors means: each seed point $y_j$ in the seed point set $D$ is partitioned into a different region of space according to the node thresholds; the specific steps are: a single best-first query is used to search for each SIFT descriptor in multiple randomized kd-trees to find the corresponding seed point $y_j$; the most similar seed points found are kept in a single priority queue; the search stops when the number of query paths reaches a set limit; and the class of the seed point found is the class to which the SIFT descriptor is assigned.
6. The acceleration method in an image retrieval system according to claim 1, characterized in that vectorization means: the standard images and the image to be retrieved are each vectorized with the seed frequency-inverse image frequency method, and the position index sum of the nonzero elements is then computed for each standard image vector and for the vector of the image to be retrieved.
7. the accelerated method in the image indexing system according to claim 1 is characterized in that, described image vector comprises processed offline and processing in real time, wherein:
The processed offline step of image vector comprises:
1) to standard picture I iMiddle seed points y jThe frequency n that occurs IjAnd SIFT descriptor sum n iAdd up as seed frequency, then standard picture I iMiddle seed frequency To including seed points y among the C of standard picture storehouse jStandard picture quantity M jAdd up;
2) using the stop-word method common in text retrieval, the size of $M_j$ is tested against a decision threshold $T$: when $M_j > T$, the corresponding seed point $y_j$ is deleted; when $M_j \le T$, $M_j$ is retained and denoted $M_r$. After all $M_j$ have been tested, the number of seed points is reduced from $z$ to $z'$, and the inverse image frequency is
$$idf_r = \log\frac{N}{M_r}, \quad r = 1, 2, \ldots, z',$$
where $N$ is the number of standard images in the standard image library $C$.
The seed frequency changes from $f_{ij}$ to $f_{ir}$,
$$f_{ir} = \frac{n_{ir}}{n_i};$$
3) the image vector corresponding to standard image $I_i$ is $V_i$, expressed as $V_i = (c_1, c_2, \ldots, c_{z'})$, where
$$c_r = f_{ir} \cdot idf_r,$$
which completes the vectorization of the standard images in offline processing;
The real-time processing steps of image vectorization comprise:
a) for the image to be retrieved $Q$, the number of occurrences $m_r$ of seed point $y_r$ and the number of SIFT descriptors $m$ are counted; the seed frequency in the image to be retrieved $Q$ is then
$$f_{Qr} = \frac{m_r}{m};$$
b) for the inverse image frequency $idf_{Qr}$ of the image to be retrieved $Q$, the inverse image frequency $idf_r$ from offline processing is adopted, i.e. $idf_{Qr} = idf_r$. The image vector to be retrieved $V_q$ is expressed as $V_q = (d_1, d_2, \ldots, d_{z'})$, where
$$d_r = f_{Qr} \cdot idf_r,$$
which completes the vectorization of the image to be retrieved in real-time processing.
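The offline and real-time vectorization steps of claim 7 can be sketched together, since both reuse the same inverse image frequencies. This is a minimal sketch under stated assumptions: the inverse image frequency takes the usual logarithmic form log(N / M) with N the number of standard images, the descriptor total in the seed frequency still counts descriptors of deleted seed points, and the names `build_vocabulary` and `vectorize` are hypothetical. Vectors are stored sparsely as dictionaries.

```python
import math
from collections import Counter

def build_vocabulary(image_seeds, T):
    """image_seeds: one list of seed ids per standard image.
    Returns {seed: idf} for seeds that occur in at most T images."""
    N = len(image_seeds)
    # M[s] = number of standard images that contain seed s at least once.
    M = Counter(s for seeds in image_seeds for s in set(seeds))
    # Stop-word rule: seeds seen in more than T images are dropped.
    return {s: math.log(N / m) for s, m in sorted(M.items()) if m <= T}

def vectorize(seeds, idf):
    """Sparse image vector {seed: seed_frequency * idf}, nonzero entries only."""
    counts = Counter(seeds)
    total = len(seeds)  # assumption: all descriptors of the image are counted
    return {s: counts[s] / total * w
            for s, w in idf.items() if counts[s] > 0 and w > 0}

# Offline: standard images; real time: the query reuses the same idf table.
library = [[0, 0, 1], [0, 2], [0, 3, 3]]
idf = build_vocabulary(library, T=2)
standard_vectors = [vectorize(seeds, idf) for seeds in library]
query_vector = vectorize([3, 3, 1, 0], idf)
```

In this toy library seed 0 appears in all three images, exceeds the threshold T = 2, and is therefore removed before any vector is built.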
8. The acceleration method in an image retrieval system according to claim 1, characterized in that computing the sum of the position indices of the nonzero elements of the standard image vector and of the image vector to be retrieved means: in offline processing, the vector $V_i$ is binarized; let the binarized image vector be $V'_i = (p_1, p_2, \ldots, p_{z'})$, where
$$p_r = \begin{cases} 1, & c_r > 0 \\ 0, & c_r = 0, \end{cases}$$
so that the sum $s_i$ of the position indices of the nonzero elements of the standard image vector $V_i$ is expressed as
$$s_i = \sum_{r=1}^{z'} r \cdot p_r.$$
In real-time processing, the vector $V_q$ is binarized; let the binarized image vector be $V'_q = (w_1, w_2, \ldots, w_{z'})$, where
$$w_r = \begin{cases} 1, & d_r > 0 \\ 0, & d_r = 0, \end{cases}$$
so that the sum $s_q$ of the position indices of the nonzero elements of the image vector to be retrieved $V_q$ is expressed as
$$s_q = \sum_{r=1}^{z'} r \cdot w_r.$$
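As a small worked example of claim 8, the sketch below binarizes a dense vector and sums the positions of its nonzero elements. It assumes positions are counted from 1 and that the sum is taken over the position indices themselves (not merely the count of nonzero elements); the function names are hypothetical.

```python
def binarize(vector):
    """Map each element to 1 if it is positive, otherwise 0."""
    return [1 if c > 0 else 0 for c in vector]

def position_index_sum(vector):
    """Sum of the 1-based positions whose binarized element is 1."""
    return sum(r * p for r, p in enumerate(binarize(vector), start=1))

# Nonzero elements sit at positions 2 and 4, so the sum is 2 + 4.
example = [0.0, 0.3, 0.0, 0.1]
s = position_index_sum(example)
```

The same function serves both the standard image vectors (offline) and the image vector to be retrieved (real time).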
9. The acceleration method in an image retrieval system according to claim 1, characterized in that creating the optimized inverted index means: in offline processing, the seed points $y_r$ serve as the index and the standard image vectors $V_i$ as the index targets; each seed point $y_r$ has a corresponding inverted index list $L_r$; for element $u$ of standard image vector $V_i$, when $c_u > 0$, the name $I_i$ of this image vector $V_i$ and the sum $s_i$ of the position indices of its nonzero elements are recorded in list $L_u$, denoted $L_u = \{y_u \mid (I_i, s_i)\}$; the standard image vectors $V_i$ are then processed in turn and recorded in the corresponding lists $L_r$ according to the positions of their nonzero elements, creating the inverted index $L = \{L_1, L_2, \ldots, L_{z'}\}$; next, the inverted index lists $L_r$ and the standard images $I_i$ within each list $L_r$ are sorted: the lists $L_r$ record different numbers of standard images, and the sums of the position indices of the nonzero elements of the standard image vectors also differ, so the lists $L_r$ are first sorted in descending order of the number of standard images they record, and within each list $L_r$ the standard images $I_i$ are then sorted in descending order of the sum $s_i$ of the position indices of their nonzero elements; after the lists $L_r$ and the standard images $I_i$ within them have been sorted, the optimized inverted index $L'$ is created, for use in the similarity search during real-time processing.
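The construction and two-level sorting in claim 9 can be sketched as follows. This is an illustrative sketch only: it assumes dense vectors with 1-based positions, takes the position-index sum as the sum of the positions of nonzero elements, and uses a hypothetical function name and data layout (a list of `(position, postings)` pairs).

```python
def build_optimized_index(vectors):
    """vectors: {image_name: dense vector}. Returns a list of
    (seed_position, postings) pairs; postings are (image_name, s_i) tuples."""
    lists = {}
    for name, vec in vectors.items():
        nonzero = [r for r, c in enumerate(vec, start=1) if c > 0]
        s_i = sum(nonzero)  # position-index sum of this image vector
        for r in nonzero:
            lists.setdefault(r, []).append((name, s_i))
    # Within each list: standard images in descending order of s_i.
    for postings in lists.values():
        postings.sort(key=lambda entry: entry[1], reverse=True)
    # Across lists: lists that record more standard images come first.
    return sorted(lists.items(), key=lambda kv: len(kv[1]), reverse=True)

vectors = {"A": [0.2, 0.0, 0.1], "B": [0.5, 0.4, 0.0], "C": [0.0, 0.0, 0.3]}
index = build_optimized_index(vectors)
```

Because each list is sorted by descending sum, a search can stop scanning a list as soon as it meets an entry below its threshold.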
10. The acceleration method in an image retrieval system according to claim 1, characterized in that the similarity search means:
i) the index lists that contain larger numbers of standard images are queried; within such a list, the sum $s_q$ of the position indices of the nonzero elements of the image to be retrieved serves as a threshold and is compared with the sum $s_i$ of the position indices of the nonzero elements of each standard image in the list; a standard image whose sum is less than the threshold $s_q$, together with the subsequent standard images whose sums are smaller still, is excluded;
ii) when the similarity search is carried out in the optimized inverted index $L'$, an accumulator $A$ records the number of times $a_i$ that each standard image $I_i$ occurs; every standard image has its own accumulator $a_i$, so that $A = (a_1, a_2, \ldots, a_N)$; each time standard image $I_i$ is found in an inverted index list, its accumulator $a_i$ is incremented by 1, i.e. $a_i = a_i + 1$; finally the accumulators $A$ of the standard images are sorted, and the standard images with the larger accumulator values are the candidate query results for the image vector to be retrieved $V_q$, which completes the search of the optimized inverted index;
iii) the similarity between the image vector to be retrieved $V_q$ and each candidate standard image vector $V_i$ is measured by the cosine of the angle between the two vectors,
$$\cos(V_q, V_i) = \frac{V_q \cdot V_i}{|V_q|\,|V_i|} = \frac{\sum_{r=1}^{z'} d_r c_r}{|V_q|\,|V_i|},$$
where
$$|V_q| = \sqrt{\sum_{r=1}^{z'} d_r^2},$$
$$|V_i| = \sqrt{\sum_{r=1}^{z'} c_r^2}.$$
After the cosine values $\cos(V_q, V_i)$ are computed, they are sorted in descending order; the standard image $I_i$ corresponding to the largest cosine value $\cos(V_q, V_i)$ is the final query result for the image to be retrieved $Q$.
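The three search steps of claim 10 can be sketched end to end. The sketch makes simplifying assumptions that are not in the claim: it scans the lists of every seed point that is nonzero in the query (rather than only the largest lists), keeps a hypothetical `top_k` candidates, stores the index as a dictionary of postings already sorted by descending position-index sum, and uses dense vectors with 1-based positions.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine of the angle between two dense vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def search(query, index, vectors, top_k=2):
    """query: dense vector; index: {position: [(name, s_i), ...]} with each
    list sorted by s_i descending; vectors: {name: dense vector}."""
    nonzero = [r for r, d in enumerate(query, start=1) if d > 0]
    s_q = sum(nonzero)  # threshold: position-index sum of the query
    acc = Counter()     # one accumulator per standard image
    for r in nonzero:
        for name, s_i in index.get(r, []):
            if s_i < s_q:
                break   # sorted list: all later entries are smaller too
            acc[name] += 1
    candidates = [name for name, _ in acc.most_common(top_k)]
    # Re-rank the candidates by cosine similarity, best first.
    return sorted(candidates, key=lambda n: cosine(query, vectors[n]),
                  reverse=True)

vectors = {"A": [0.2, 0.0, 0.1], "B": [0.5, 0.4, 0.0], "C": [0.0, 0.0, 0.3]}
index = {1: [("A", 4), ("B", 3)], 3: [("A", 4), ("C", 3)], 2: [("B", 3)]}
result = search([0.5, 0.4, 0.0], index, vectors)
```

The first element of the returned list corresponds to the final query result; the threshold test is what lets the scan skip the tail of each sorted list.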
CN2010105732370A 2010-12-02 2010-12-02 Acceleration method in image retrieval system Expired - Fee Related CN102004786B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010105732370A CN102004786B (en) 2010-12-02 2010-12-02 Acceleration method in image retrieval system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2010105732370A CN102004786B (en) 2010-12-02 2010-12-02 Acceleration method in image retrieval system

Publications (2)

Publication Number Publication Date
CN102004786A true CN102004786A (en) 2011-04-06
CN102004786B CN102004786B (en) 2012-11-28

Family

ID=43812148

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010105732370A Expired - Fee Related CN102004786B (en) 2010-12-02 2010-12-02 Acceleration method in image retrieval system

Country Status (1)

Country Link
CN (1) CN102004786B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6522782B2 (en) * 2000-12-15 2003-02-18 America Online, Inc. Image and text searching techniques
CN101567051A (en) * 2009-06-03 2009-10-28 复旦大学 Image matching method based on characteristic points

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Mohamed Aly et al., "Scaling Object Recognition: Benchmark of Current State of the Art Techniques", IEEE ICCV2009 Workshops, 2009-10-04, pp. 2117-2124, 1, 2 *
Mei Mei et al., "Rapid Search Scheme for Video Copy Detection in Large Databases", IEEE ICIS2009, 2009-11-22, pp. 448-452, 1, 2 *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102201001B (en) * 2011-04-29 2012-11-28 西安交通大学 Fast retrieval method based on inverted technology
CN102201001A (en) * 2011-04-29 2011-09-28 西安交通大学 Fast retrieval method based on inverted technology
CN102254015A (en) * 2011-07-21 2011-11-23 上海交通大学 Image retrieval method based on visual phrases
CN102254015B (en) * 2011-07-21 2013-11-20 上海交通大学 Image retrieval method based on visual phrases
CN102902826B (en) * 2012-11-08 2016-07-06 公安部第三研究所 A kind of image method for quickly retrieving based on reference picture index
CN102902826A (en) * 2012-11-08 2013-01-30 公安部第三研究所 Quick image retrieval method based on reference image indexes
CN103092935A (en) * 2013-01-08 2013-05-08 杭州电子科技大学 Approximate copy image detection method based on scale invariant feature transform (SIFT) quantization
CN103390063A (en) * 2013-07-31 2013-11-13 南京大学 Search method for relevance feedback images based on ant colony algorithm and probability hypergraph
CN103390063B (en) * 2013-07-31 2016-08-10 南京大学 A kind of based on ant group algorithm with the search method of related feedback images of probability hypergraph
CN104424226B (en) * 2013-08-26 2018-08-24 阿里巴巴集团控股有限公司 A kind of method and device obtaining visual word dictionary, image retrieval
CN104424226A (en) * 2013-08-26 2015-03-18 阿里巴巴集团控股有限公司 Method and device for acquiring visual word dictionary and retrieving image
CN104199842A (en) * 2014-08-07 2014-12-10 同济大学 Similar image retrieval method based on local feature neighborhood information
CN104199842B (en) * 2014-08-07 2017-10-24 同济大学 A kind of similar pictures search method based on local feature neighborhood information
CN104217006A (en) * 2014-09-15 2014-12-17 无锡天脉聚源传媒科技有限公司 Method and device for searching image
CN105760503A (en) * 2016-02-23 2016-07-13 清华大学 Method for quickly calculating graph node similarity
CN105760503B (en) * 2016-02-23 2019-02-05 清华大学 A kind of method of quick calculating node of graph similarity
CN110019879A (en) * 2017-07-31 2019-07-16 清华大学 Satellite remote-sensing image searching method and device
CN110019907A (en) * 2017-12-01 2019-07-16 北京搜狗科技发展有限公司 A kind of image search method and device
CN110019907B (en) * 2017-12-01 2021-07-16 北京搜狗科技发展有限公司 Image retrieval method and device
CN108959650A (en) * 2018-08-02 2018-12-07 聊城大学 Image search method based on symbiosis SURF feature
CN111190893A (en) * 2018-11-15 2020-05-22 华为技术有限公司 Method and device for establishing feature index
CN111190893B (en) * 2018-11-15 2023-05-16 华为技术有限公司 Method and device for establishing feature index
CN111797260A (en) * 2020-07-10 2020-10-20 宁夏中科启创知识产权咨询有限公司 Trademark retrieval method and system based on image recognition

Also Published As

Publication number Publication date
CN102004786B (en) 2012-11-28

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20121128

Termination date: 20171202