CN107451200A - Retrieval method using a randomized quantization vocabulary tree and image retrieval method based on it - Google Patents

Info

Publication number
CN107451200A
Authority
CN
China
Prior art keywords
image
cluster
characteristic vector
characteristic
search method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710545225.9A
Other languages
Chinese (zh)
Other versions
CN107451200B (en)
Inventor
王晓春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University
Priority to CN201710545225.9A
Publication of CN107451200A
Application granted
Publication of CN107451200B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5838 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a retrieval method using a randomized quantization vocabulary tree and an image retrieval method based on it, comprising the following steps: (1) generate a nearest-neighbor search tree whose root node, at the first level, contains all feature vectors of the whole database, and branch downward; (2) at the second level, randomly select k points from the whole database as cluster centers, then assign each feature vector to its nearest cluster center according to the chosen similarity measure, dividing the whole database into k subsets, and continue branching downward; (3) at the third level, for each of the k clusters obtained at the second level, randomly select k feature points from its feature-vector pool as the cluster centers of the next level; (4) repeat. The image retrieval method of the invention overcomes the prior-art problem that building a vocabulary tree takes a large amount of time; it can build the tree in a short time and meets real-time requirements.

Description

Retrieval method using a randomized quantization vocabulary tree and image retrieval method based on it
Technical Field
The present invention belongs to the field of image retrieval technology, and in particular relates to a retrieval method using a randomized quantization vocabulary tree and an image retrieval method based on it.
Background
In recent years, with the development and spread of digital technology, especially network technology, the Internet of Things, and the hardware and software used for computerized information collection, more and more data is being collected and stored. Data is being gathered far faster than traditional methods can process it, and the trend is becoming more pronounced. Facebook is among the world's leading photo-sharing sites: as of November 2013 about 350 million photos were uploaded every day, and the photos on Facebook alone had reached 250 PB. For digital video, YouTube statistics for 2013 showed that more than 72 hours of video content was uploaded every minute and that there were 4 billion video playback requests per day, and these figures were still growing substantially. Given such huge data resources and equally massive access demands, how to organize, manage, and retrieve large-scale databases effectively has become a problem in urgent need of a solution.
Traditional text-based image retrieval annotates images with keywords, turning image retrieval into keyword lookup. Its obvious drawback is that neither computer vision nor artificial intelligence techniques can annotate images with text automatically, so manual annotation is required. As data volumes keep expanding, manual annotation cannot keep pace with the growth of image data. Manual annotation is also subjective and imprecise: different people understand an image differently, so there is no unified standard for image annotation. To overcome the limitations of text-based image retrieval, content-based image retrieval (CBIR) emerged in the 1990s. Unlike traditional retrieval methods, it incorporates image understanding techniques and provides a way to retrieve effectively from a large-capacity image database according to the requirements a user poses.
The basic idea of a CBIR system is to analyze the visual features of images and retrieve in combination with context. It is implemented by storing and managing image data in an image database and embedding content-based image retrieval technology into that database as its engine, thereby providing the CBIR function. Existing CBIR systems usually use low-level image information, including color, texture, shape, and the spatial relationships among them, to compute the similarity between the query image and a target image, and then retrieve according to that similarity, that is, the degree of match between image features. Feature extraction therefore first converts each image in the library into a point in the image feature space, namely its feature vector; images are then retrieved according to their feature vectors, so that CBIR becomes retrieval of feature points in the image feature space.
When the image database is small, the most commonly used feature retrieval method is sequential scan. However, as the means of acquiring information develop and the demand for information grows, image databases keep getting larger, and sequential scan can no longer meet users' requirements on retrieval time. Organizing the data effectively to build an efficient index that quickly narrows the search range, and so raises retrieval speed, is therefore the key to content-based retrieval.
In earlier work, researchers proposed many data indexing methods for specific application fields. When handling high-dimensional data, however, these methods all suffer from the "curse of dimensionality" of high-dimensional space: as the dimensionality increases, their retrieval performance degrades to that of sequential scan, or even worse. In CBIR research, the feature vectors extracted from the original images are usually high-dimensional, so indexing image feature data is inevitably affected by the curse of dimensionality. Nister and Stewenius proposed a retrieval method based on the vocabulary tree that shows good retrieval results in high-dimensional space, but building the tree for high-dimensional data takes a very long time, which makes it hard to meet the timeliness that modern databases require of retrieval. Establishing an efficient high-dimensional indexing mechanism suited to the high dimensionality of image feature data is therefore a major challenge facing current image retrieval research.
Summary of the Invention
The invention provides a retrieval method using a randomized quantization vocabulary tree and an image retrieval method based on it, with the aim of overcoming the above drawbacks of the prior art.
To achieve this technical purpose, the invention adopts the following technical solution:
A retrieval method using a randomized quantization vocabulary tree comprises the following steps:
(1) generate a nearest-neighbor search tree whose root node, at the first level, contains all feature vectors of the whole database, and branch downward;
(2) at the second level, randomly select k points from the whole database as cluster centers, then assign each feature vector to its nearest cluster center according to the chosen similarity measure, dividing the whole database into k subsets, and continue branching downward;
(3) at the third level, for each of the k clusters obtained at the second level, randomly select k feature points from its feature-vector pool as the cluster centers of the next level, then assign each feature vector to its nearest cluster center using the similarity measure, forming k² clusters at the third level;
(4) repeat steps (2) and (3) until the feature vectors in every leaf node belong to the same class of object or the number of feature vectors in the leaf node falls below a set limit; each feature vector has a class label associated with it.
In step (2), if a feature vector is equidistant from two or more cluster centers, one of those clusters is chosen at random.
In step (3), a new feature vector selected from the feature-vector pool is assigned to its nearest cluster center. When it reaches the leaf node of the assigned cluster center, if all feature vectors in that leaf node have the same class label, that label is assigned to the new feature vector and the computation stops. Otherwise, a further search is performed within the assigned cluster: the feature vector at the shortest distance from the new feature vector is selected, its class label is assigned to the new feature vector, and the computation stops.
An image retrieval method based on the retrieval method using a randomized quantization vocabulary tree comprises the following steps:
(1) first divide the image into a number of overlapping sub-regions using the overlapping block method;
(2) combine the feature information blocks of the image with their semantic features. Because each extracted feature vector corresponds to a point in the feature space, unsupervised learning (that is, clustering in data mining) is applied to the feature points in the feature space to divide all feature vectors in the feature-vector library into multiple classes of patterns, so that patterns within the same class are more similar and patterns in different classes differ more. The feature points are marked with class labels, each of which carries specific semantic information; combining the image feature information blocks with semantic features annotates the different regions of the image and thereby builds the image knowledge base;
(3) generate a nearest-neighbor search tree whose root node, at the first level, contains all image feature vectors in the image knowledge base, and branch downward;
(4) at the second level, randomly select k points from the image knowledge base as cluster centers, then assign each image feature vector to its nearest cluster center according to the chosen similarity measure, dividing the whole database into k subsets, and continue branching downward;
(5) at the third level, for each of the k clusters obtained at the second level, randomly select k feature points from its feature-vector pool as the cluster centers of the next level, then assign each image feature vector to its nearest cluster center using the similarity measure, forming k² clusters at the third level;
(6) repeat steps (4) and (5) until the image feature vectors in every leaf node belong to the same class of object or the number of image feature vectors in the leaf node falls below a set limit; each image feature vector has a class label associated with it.
In step (1), the overlapping block method divides an image of size height × width using an N × N window that is shifted by Nhop pixels in both the row and column directions, producing a number of overlapping sub-regions. So that sufficiently small objects in the image can be detected, the square window is made smaller, which increases the number of patches. The overlapping sub-regions combine the color histogram with the spatial distribution of color.
In step (2), when building the image knowledge base, the Chameleon clustering algorithm and an MST-based clustering algorithm are used to cluster the library of color histogram feature vectors; class labels are assigned to the clustering results, and a knowledge base based on color histogram features is built.
The above technical solution has the following beneficial effects:
(1) the image retrieval method of the invention overcomes the prior-art problem that building a vocabulary tree takes a large amount of time; it can build the tree in a short time and meets real-time requirements;
(2) by using the overlapping block method, the invention refines an image into multiple blocks and extracts their color histograms as the feature-vector library, effectively combining the image's color histogram with color spatial information and overcoming the prior-art problem that the spatial properties of color are ignored during image feature extraction;
(3) the invention can extract image features more quickly, meeting real-time requirements, and at the same time applies unsupervised learning to the feature database, marking the different regions of a scene image with region labels to form a knowledge base.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of the invention;
Fig. 2 is a schematic diagram of the overlapping block method of the invention;
Fig. 3 shows the clustering result for the 10648-dimensional RGB histograms of the 21-14 group;
Fig. 4 shows the clustering result for the 5000-dimensional HSV histograms of the 24-16 group;
Fig. 5 shows the clustering result for the 5832-dimensional Opponent histograms of the 24-16 group;
Fig. 6 shows the clustering result for the 10648-dimensional Transformed histograms of the 21-14 group;
Fig. 7 compares accuracy on RGB histograms for the 21-14 group;
Fig. 8 compares accuracy on RGB histograms for the 24-16 group;
Fig. 9 compares accuracy on RGB histograms for the 27-18 group.
Detailed Description
The solution is further described below with reference to the drawings and embodiments.
As shown in Fig. 1, a retrieval method using a randomized quantization vocabulary tree comprises the following steps:
(1) generate a nearest-neighbor search tree whose root node, at the first level, contains all feature vectors of the whole database, and branch downward;
(2) at the second level, randomly select k points from the whole database as cluster centers, then assign each feature vector to its nearest cluster center according to the chosen similarity measure, dividing the whole database into k subsets, and continue branching downward;
(3) at the third level, for each of the k clusters obtained at the second level, randomly select k feature points from its feature-vector pool as the cluster centers of the next level, then assign each feature vector to its nearest cluster center using the similarity measure, forming k² clusters at the third level;
(4) repeat steps (2) and (3) until the feature vectors in every leaf node belong to the same class of object or the number of feature vectors in the leaf node falls below a set limit; each feature vector has a class label associated with it.
In step (2), if a feature vector is equidistant from two or more cluster centers, one of those clusters is chosen at random.
In step (3), a new feature vector selected from the feature-vector pool is assigned to its nearest cluster center. When it reaches the leaf node of the assigned cluster center, if all feature vectors in that leaf node have the same class label, that label is assigned to the new feature vector and the computation stops. Otherwise, a further search is performed within the assigned cluster: the feature vector at the shortest distance from the new feature vector is selected, its class label is assigned to the new feature vector, and the computation stops.
For large databases, the randomized quantization tree follows the idea of the vocabulary tree but makes one important improvement to it, producing a nearest-neighbor search tree. As shown in Fig. 1, given a database, the first level has only one node, the root node, which contains all feature vectors. At the second level, K points are randomly selected from the whole database as cluster centers; each feature vector is then assigned to its closest cluster center according to the chosen similarity measure, dividing the whole database into K subsets. At the third level, for each of the K clusters obtained at the second level, K feature points are randomly selected from its feature-vector pool as the cluster centers of the next level, and each feature vector is assigned to its nearest cluster center using the similarity measure, forming K² clusters at the third level. This process continues until the feature vectors in every leaf node belong to the same class of object (that is, the node is pure) or the number of feature vectors in the leaf node falls below a set limit (for example, 50). Each feature vector has a class label associated with it.
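The construction described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `Node`, `build_tree`, `max_leaf`, and the choice of Euclidean distance as the similarity measure are assumptions made for the example, and the guard against a split that makes no progress is an addition that the text does not mention.

```python
import math
import random
from dataclasses import dataclass, field


def euclidean(a, b):
    # Assumed similarity measure; the text leaves the measure open.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


@dataclass
class Node:
    center: tuple = None                           # randomly chosen centre (None for the root)
    children: list = field(default_factory=list)
    points: list = field(default_factory=list)     # (vector, label) pairs, kept at leaves only


def build_tree(points, k, max_leaf=50, dist=euclidean, rng=random, center=None):
    """Recursively partition (vector, label) pairs around k randomly chosen centres."""
    node = Node(center=center)
    labels = {label for _, label in points}
    # Stop when the node is pure or holds fewer vectors than the limit.
    if len(labels) <= 1 or len(points) < max_leaf:
        node.points = list(points)
        return node
    # Random centres replace the k-means step of a conventional vocabulary tree.
    centers = [vec for vec, _ in rng.sample(points, min(k, len(points)))]
    buckets = [[] for _ in centers]
    for vec, label in points:
        dists = [dist(vec, c) for c in centers]
        best = min(dists)
        ties = [i for i, d in enumerate(dists) if d == best]
        buckets[rng.choice(ties)].append((vec, label))   # equidistant centres: pick one at random
    # Guard (an addition for this sketch): stop if the split made no progress.
    if max(len(b) for b in buckets) == len(points):
        node.points = list(points)
        return node
    for c, bucket in zip(centers, buckets):
        if bucket:
            node.children.append(build_tree(bucket, k, max_leaf, dist, rng, c))
    return node
```

Because each level only samples centres and performs one assignment pass, no iterative refinement of the centres is needed, which is the source of the shorter build time claimed for the method.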
When the tree is branched, the nearest-neighbor search tree allows the distance from each data item to the other data items to be updated to a smaller value. This strategy makes it more likely that the data points closest to one another in space are assigned to the same partition. However, a data point in a partition is only guaranteed to be closer to the center of its own cluster than to the centers of the other partitions (that center is not necessarily its nearest neighbor). If a data point is equidistant from the centers of two or more clusters, it is assigned to one of them at random.
A given new feature vector is looked up through the randomized quantization tree. The new feature vector follows a particular path down the tree: at each level, its distances to the K cluster centers are computed, and the closest of the K centers is chosen. When a leaf node is reached, if the leaf is pure (that is, all feature vectors in the leaf node have the same class label), the associated class label is assigned to the new feature vector and the computation stops. Otherwise, a nearest-neighbor search is performed among the vectors of that cluster; the result is the feature vector at the shortest distance under the chosen similarity measure, and the class label associated with that feature vector is assigned to the new feature vector, after which the computation stops.
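The lookup just described can be sketched as follows. The `Node` class and the `classify` function are hypothetical names for this illustration, and Euclidean distance is assumed as the similarity measure; leaves are assumed to be non-empty.

```python
import math


class Node:
    def __init__(self, center=None, children=None, points=None):
        self.center = center              # cluster centre of this node (None for the root)
        self.children = children or []    # child nodes; empty for a leaf
        self.points = points or []        # (vector, label) pairs stored at a leaf


def euclidean(a, b):
    # Assumed similarity measure for this sketch.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(root, query, dist=euclidean):
    """Return the class label assigned to `query` by descending the tree."""
    node = root
    # At each level, follow the child whose centre is closest to the query.
    while node.children:
        node = min(node.children, key=lambda child: dist(query, child.center))
    labels = {label for _, label in node.points}
    if len(labels) == 1:
        # Pure leaf: every stored vector carries the same label.
        return next(iter(labels))
    # Mixed leaf: fall back to an exact nearest-neighbour search inside the leaf.
    _, label = min(node.points, key=lambda item: dist(query, item[0]))
    return label
```

A lookup therefore costs K distance computations per level plus, at worst, one scan of a single leaf, whose size is bounded by the leaf limit.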
An image retrieval method based on the retrieval method using a randomized quantization vocabulary tree comprises the following steps:
(1) first divide the image into a number of overlapping sub-regions using the overlapping block method;
(2) combine the feature information blocks of the image with their semantic features. Because each extracted feature vector corresponds to a point in the feature space, unsupervised learning (that is, clustering in data mining) is applied to the feature points in the feature space to divide all feature vectors in the feature-vector library into multiple classes of patterns, so that patterns within the same class are more similar and patterns in different classes differ more. The feature points are marked with class labels, each of which carries specific semantic information; combining the image feature information blocks with semantic features annotates the different regions of the image and thereby builds the image knowledge base;
(3) generate a nearest-neighbor search tree whose root node, at the first level, contains all image feature vectors in the image knowledge base, and branch downward;
(4) at the second level, randomly select k points from the image knowledge base as cluster centers, then assign each image feature vector to its nearest cluster center according to the chosen similarity measure, dividing the whole database into k subsets, and continue branching downward;
(5) at the third level, for each of the k clusters obtained at the second level, randomly select k feature points from its feature-vector pool as the cluster centers of the next level, then assign each image feature vector to its nearest cluster center using the similarity measure, forming k² clusters at the third level;
(6) repeat steps (4) and (5) until the image feature vectors in every leaf node belong to the same class of object or the number of image feature vectors in the leaf node falls below a set limit; each image feature vector has a class label associated with it.
As shown in Fig. 2 in step (1), image N × N of the overlap partition method by size for height × weight Window divided, Nhop pixel shift is pressed in row and column direction, is divided into some overlapped subregions, in order that figure The sufficiently small object included as in can be detected, and reduce square window size, increase patch quantity;Pass through phase mutual respect Folded subinterval, color histogram is combined with the spatial distribution of color.The image of size is divided into overlapping subinterval more Individual square:
Line number:Blockrows=(height-N)/Nhop+1
Columns:Blockcols=(weidth-N)/Nhop+1
Square patch number:NumofSamples=blockrows × blockcols
Image size is height × weight pixel, and the size of the image after caused processing is by patch window Size determine.Changed by the patch window size of picture, the quantity of caused square patch can also change.Work as patch When window reduces, square patch quantity caused by processing image can increase, and vice versa.Now, it is each in the image after processing Individual pixel is represented with the color histogram of a square patch.The size of square window directly affects point of the image after processing Resolution.Therefore, can not be excessive which has limited the size of patch square window so that need to have been represented with greater number of patch Whole image information.Mean that the patch number of each image increases.
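The row, column, and patch counts above can be turned into a small routine that enumerates the patch windows. This is a sketch under one assumption that the formulas do not state explicitly: the division is rounded down, so windows that would extend past the image border are dropped. The function names are hypothetical.

```python
def grid_shape(height, width, n, nhop):
    """Rows and columns of N x N patches when the window moves by nhop pixels."""
    if n > height or n > width:
        raise ValueError("patch window is larger than the image")
    rows = (height - n) // nhop + 1   # blockrows, rounded down (assumption)
    cols = (width - n) // nhop + 1    # blockcols, rounded down (assumption)
    return rows, cols


def patch_origins(height, width, n, nhop):
    """Yield the (top, left) pixel coordinate of every overlapping patch."""
    rows, cols = grid_shape(height, width, n, nhop)
    for r in range(rows):
        for c in range(cols):
            yield r * nhop, c * nhop
```

For example, a 10 × 12 image with N = 4 and Nhop = 2 gives a 4 × 5 grid, that is, 20 patches, each sharing half of its rows and columns with its neighbours.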
Simply dividing an image into blocks and extracting their color histograms only splits the image into blocks that carry no semantic information. Combining the feature information blocks of the image with their semantic features is the purpose of building the image knowledge base. Because each extracted feature vector corresponds to a point in the feature space, unsupervised learning (that is, clustering in data mining) is applied to the feature points in the feature space to divide all feature vectors in the feature-vector library into multiple classes of patterns, so that patterns within the same class are more similar and patterns in different classes differ more. The feature points are marked with class labels, each of which carries specific semantic information; combining the image feature information blocks with semantic features annotates the different regions of the image.
In step (2), when building the image knowledge base, the Chameleon clustering algorithm and an MST-based clustering algorithm are used to cluster the library of color histogram feature vectors; class labels are assigned to the clustering results, and a knowledge base based on color histogram features is built.
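The text names an MST-based clustering algorithm without giving its details. As a rough illustration of the general idea only, the sketch below builds a minimum spanning tree with Prim's algorithm and removes the longest edges until the requested number of clusters remains. The function name `mst_clusters`, the use of Euclidean distance, and the edge-cutting rule are all assumptions and may differ from the variant used in the invention.

```python
import math


def mst_clusters(points, num_clusters):
    """Cluster points by cutting the longest edges of their minimum spanning tree."""
    n = len(points)
    if n == 0:
        return []
    # Prim's algorithm on the complete graph, O(n^2) time.
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known edge linking each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    # Keep only the shortest n - num_clusters edges of the tree.
    edges.sort()
    kept = edges[: max(0, n - num_clusters)]
    # Union-find over the kept edges gives the connected components.
    root = list(range(n))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for _, a, b in kept:
        root[find(a)] = find(b)
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]
```

Each returned integer is a cluster index that could then be mapped to a semantic class label when the knowledge base is built.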
The advantages of the invention are further demonstrated below by experiment.
Clustering of image feature data
The feature-vector database is clustered with the Chameleon clustering algorithm and with the MST-based clustering algorithm respectively; the objects in the images are divided into multiple clusters, and the clusters are labeled to form the knowledge base. The clustered database is then displayed as color images. The original image set, the MST-based clustering result images, and the Chameleon clustering result images are compared to verify the feasibility of the image feature extraction method used here.
Because the image data set contains a relatively large number of images, only part of the clustering results in the database is shown. Because there are also multiple groups of feature databases, one group of clustering results is shown for each of the RGB, HSV, Opponent, and Transformed spaces.
Table 1: Main scenes and their semantic relations in the RGB knowledge base built from RGB color histograms
Fig. 3 shows the clustering result for the 10648-dimensional RGB histograms of the 21-14 group.
Table 2: Main scenes and their semantic relations in the HSV knowledge base built from HSV color histograms
Fig. 4 shows the clustering result for the 5000-dimensional HSV histograms of the 24-16 group.
Table 3: Main scenes and their semantic relations in the Opponent knowledge base built from Opponent color histograms
Fig. 5 shows the clustering result for the 5832-dimensional Opponent histograms of the 24-16 group.
Table 4: Main scenes and their semantic relations in the Transformed knowledge base built from Transformed color histograms
Fig. 6 shows the clustering result for the 10648-dimensional Transformed histograms of the 21-14 group.
Feature extraction speed
The images are 720 × 1280 pixels. With a 21 × 21 patch window and a shift of 14 pixels (denoted 21-14), the processed image is 50 × 90 pixels; with a 24 × 24 patch window and a shift of 16 pixels (denoted 24-16), the processed image is 44 × 79 pixels; with a 27 × 27 patch window and a shift of 18 pixels (denoted 27-18), the processed image is 39 × 70 pixels.
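As a quick check of these sizes, the row and column formulas can be evaluated for the three window settings. The snippet assumes that the division is rounded down, which the formulas do not state; the helper name `grid_shape` is hypothetical.

```python
def grid_shape(height, width, n, nhop):
    # blockrows and blockcols, with the division rounded down (assumption).
    return (height - n) // nhop + 1, (width - n) // nhop + 1


for n, nhop in [(21, 14), (24, 16), (27, 18)]:
    rows, cols = grid_shape(720, 1280, n, nhop)
    print(f"{n}-{nhop}: {rows} x {cols} = {rows * cols} patches")

# Prints:
# 21-14: 50 x 90 = 4500 patches
# 24-16: 44 x 79 = 3476 patches
# 27-18: 39 x 70 = 2730 patches
```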
With this finer overlapping block method, each image is divided into 4500, 3476, or 2730 blocks respectively, and the color histogram of each block is extracted. Conventional image blocking mainly extracts the grayscale histogram or the HSV color histogram of the image; here, the RGB, HSV, Opponent, and Transformed color histograms of the image are extracted to build the image feature-vector library.
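A per-block color histogram can be sketched as below. The text does not describe the bin layout, so this example assumes a joint histogram with the same number of bins on each of the three channels, which gives b³ dimensions for b bins per channel (for instance 4³ = 64 or 22³ = 10648). The function name and the normalization step are also assumptions.

```python
def color_histogram(pixels, bins_per_channel):
    """Joint 3-channel histogram of a patch given as a list of (r, g, b) values in 0..255."""
    b = bins_per_channel
    hist = [0.0] * (b ** 3)
    for r, g, bl in pixels:
        # Map each 0..255 channel value to a bin index in 0..b-1.
        ri, gi, bi = r * b // 256, g * b // 256, bl * b // 256
        hist[(ri * b + gi) * b + bi] += 1.0
    total = len(pixels)
    # Normalise so that patches of different sizes are comparable (assumption).
    return [h / total for h in hist] if total else hist
```

Applying this function to every patch produced by the overlapping block method yields one feature vector per patch, which together form the feature-vector library.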
In the experiment, for 35 images of 720 × 1280 pixels, the three overlapping block settings 21-14, 24-16, and 27-18 were each used to extract the 10000-dimensional RGB color histogram, the 10648-dimensional HSV color histogram, the 10000-dimensional Opponent color histogram, the 10000-dimensional Transformed color histogram, and the Gabor texture features; the SIFT feature points of the same images were also extracted.
The extraction time of each feature type was recorded for the 35 images; the average extraction time per image is shown in the table below.
Table 5
Table 5 shows that, with the same image blocking method, extracting even a high-dimensional color histogram is clearly faster than extracting Gabor texture features. Extracting the SIFT features of the whole image also takes much longer than extracting the color histograms after blocking. The feature extraction method used here can therefore extract image features quickly.
Accuracy
Nearest-neighbor retrieval returns the nearest neighbor of the query object, which is taken to belong to the same class as the query, so its retrieval accuracy is the highest. We therefore use the nearest-neighbor results as the reference result set. The retrieval results of the vocabulary tree and of the randomized quantization tree are compared with the nearest-neighbor results to analyze the retrieval accuracy of the two trees.
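The comparison against the nearest-neighbor reference can be sketched as follows. The function names are hypothetical, Euclidean distance is assumed, and the exact accuracy definition used in the experiments is not given in the text; this version simply counts how often a tree returns the same label as an exhaustive nearest-neighbor scan.

```python
import math


def nearest_neighbor_label(query, database):
    """Label of the database vector closest to `query` (exhaustive scan)."""
    _, label = min(database, key=lambda item: math.dist(query, item[0]))
    return label


def agreement_rate(queries, database, tree_labels):
    """Fraction of queries for which the tree's label matches the exact nearest neighbour's."""
    reference = [nearest_neighbor_label(q, database) for q in queries]
    hits = sum(1 for ref, got in zip(reference, tree_labels) if ref == got)
    return hits / len(queries)
```

Here `tree_labels` would hold the labels returned by either tree for the same queries, so the same function can score both the vocabulary tree and the randomized quantization tree.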
Accuracy comparison on RGB histograms
First, for the RGB color space, retrieval was run on 24 training sets in total: the different groups (21-14, 24-16, 27-18) at multiple dimensions (64, 125, 216, 512, 1000, 2744, 5832, and 10648 dimensions). The comparison results are shown in Table 2, Table 3, and Table 4, where KQtree is the randomized quantization vocabulary tree and VTree is the traditional vocabulary tree.
Accuracy comparison results for RGB histograms: Fig. 7, Fig. 8, and Fig. 9 show that the accuracy of the randomized quantization tree is clearly higher than that of the vocabulary tree.
In Fig. 8, the accuracy of the randomized quantization tree is highest at 1000 dimensions, reaching 83.03%, and high-dimensional retrieval results are better than low-dimensional ones. In Fig. 9, the accuracy of the randomized quantization tree is highest at 5832 dimensions, reaching 86.73%, again showing that high-dimensional retrieval results are better than low-dimensional ones. In Fig. 9, the accuracy of the randomized quantization tree is highest at 512 dimensions, reaching 85.49%; high-dimensional retrieval is better than low-dimensional retrieval, but not markedly so, and the medium dimensions give the best retrieval results.
Comparing Fig. 7, Fig. 8, and Fig. 9 together, we find that for RGB color histograms retrieval in high-dimensional data spaces is generally better than in low-dimensional ones. As for window size, a smaller window gives more square patches and richer image information, but more patches do not necessarily mean higher retrieval accuracy; different dimensions behave differently, and there is no uniform rule.
Vocabulary tree construction time
The running times of the random quantization tree and the vocabulary tree on the RGB histograms of the three groups (21-14, 24-16, 27-18) at eight dimensionalities (64, 125, 216, 512, 1000, 2744, 5832 and 10648 dimensions) are shown in Table 6, Table 7 and Table 8, in seconds, where KQtree denotes the random quantization vocabulary tree and VTree denotes the traditional vocabulary tree.
Table 6. Window 21 × 21, shift 14: running time of the random quantization vocabulary tree and the vocabulary tree for RGB histograms of different dimensions
Table 7. Window 24 × 24, shift 16: running time of the random quantization vocabulary tree and the vocabulary tree for RGB histograms of different dimensions
Table 8. Window 27 × 27, shift 18: running time of the random quantization vocabulary tree and the vocabulary tree for RGB histograms of different dimensions
For RGB histograms, the random quantization tree runs clearly faster than the vocabulary tree. As the RGB histogram dimension increases, the running time of the vocabulary tree grows geometrically relative to that of the random quantization tree.
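One plausible contributor to this gap can be illustrated by counting assignment passes for a single node split. The sketch assumes that the conventional vocabulary tree is built with iterative k-means (as in the hierarchical k-means vocabulary tree of the cited prior art), whereas random quantization picks its centers at random and assigns the vectors once. The function names and the synthetic data are hypothetical, and the sketch does not reproduce the measurements in the tables.

```python
import numpy as np

def assign(x, centers):
    """Index of the nearest center (squared Euclidean distance) for each row of x."""
    d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def random_split(x, k, rng):
    """Random quantization: pick k data points as centers and assign once."""
    centers = x[rng.choice(len(x), size=k, replace=False)]
    return assign(x, centers), 1  # exactly one assignment pass

def kmeans_split(x, k, rng, max_iter=20):
    """Lloyd's k-means: repeat assignment and center update until the centers are stable."""
    centers = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    passes = 0
    for _ in range(max_iter):
        labels = assign(x, centers)
        passes += 1
        new_centers = centers.copy()
        for j in range(k):
            members = x[labels == j]
            if len(members):
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, passes

rng = np.random.default_rng(0)
x = rng.random((200, 64))  # 200 synthetic 64-dimensional feature vectors
rq_labels, rq_passes = random_split(x, k=4, rng=rng)
km_labels, km_passes = kmeans_split(x, k=4, rng=rng)
print(rq_passes, km_passes)
```

Each assignment pass costs on the order of n · k · d distance operations for n vectors of dimension d, so every extra k-means iteration repeats that cost at every node of the tree.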
The running times of the random quantization vocabulary tree and the vocabulary tree on the Opponent histograms of the three groups (21-14, 24-16, 27-18) at eight dimensionalities (64, 125, 216, 512, 1000, 2744, 5832 and 10648 dimensions) are shown in Tables 9, 10 and 11, in seconds.
Table 9. Window 21 × 21, shift 14: running time of the random quantization vocabulary tree and the vocabulary tree for Opponent histograms of different dimensions
Table 10. Window 24 × 24, shift 16: running time of the random quantization vocabulary tree and the vocabulary tree for Opponent histograms of different dimensions
Table 11. Window 27 × 27, shift 18: running time of the random quantization vocabulary tree and the vocabulary tree for Opponent histograms of different dimensions
For Opponent histograms, the random quantization tree runs clearly faster than the vocabulary tree. As the Opponent histogram dimension increases, the running time of the vocabulary tree grows geometrically relative to that of the random quantization tree.
In summary, the retrieval method using the random quantization vocabulary tree is clearly superior to the vocabulary tree in time efficiency.
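For orientation, the tree whose construction time is compared above can be sketched as follows, following the steps recited in the claims: random centers at each level, assignment to the nearest center, and splitting until a node is pure or small. This is a minimal sketch under stated assumptions, not the patented implementation: squared Euclidean distance stands in for the unspecified similarity measure, ties go to the lowest-index center instead of a random one, and `Node`, `build`, `query`, `k` and `min_size` are hypothetical names.

```python
import numpy as np
from dataclasses import dataclass, field

@dataclass
class Node:
    centers: np.ndarray = None                    # cluster centers (internal nodes)
    children: list = field(default_factory=list)  # one child per center
    vectors: np.ndarray = None                    # stored vectors (leaves only)
    labels: np.ndarray = None                     # their class labels (leaves only)

def nearest_center(x, centers):
    d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)  # simplification: ties go to the lowest index

def build(vectors, labels, k, min_size, rng):
    # Stop when all vectors share one class label or the node is small enough.
    if len(np.unique(labels)) == 1 or len(vectors) < max(min_size, k):
        return Node(vectors=vectors, labels=labels)
    centers = vectors[rng.choice(len(vectors), size=k, replace=False)]
    assign = nearest_center(vectors, centers)
    counts = np.bincount(assign, minlength=k)
    if counts.max() == len(vectors):  # no progress, e.g. duplicate vectors
        return Node(vectors=vectors, labels=labels)
    keep = np.flatnonzero(counts)     # drop empty clusters
    node = Node(centers=centers[keep])
    for j in keep:
        mask = assign == j
        node.children.append(build(vectors[mask], labels[mask], k, min_size, rng))
    return node

def query(node, x):
    # Descend to a leaf by following the nearest center at each level.
    while node.children:
        node = node.children[nearest_center(x[None, :], node.centers)[0]]
    if len(np.unique(node.labels)) == 1:
        return node.labels[0]         # pure leaf: return its class label
    d = ((node.vectors - x) ** 2).sum(axis=1)
    return node.labels[d.argmin()]    # mixed leaf: nearest vector inside the leaf

# Toy data: 40 distinct 2-D vectors in 4 classes.
vectors = np.array([[i, (7 * i) % 13] for i in range(40)], dtype=float)
labels = np.arange(40) // 10
tree = build(vectors, labels, k=3, min_size=4, rng=np.random.default_rng(0))
predicted = [query(tree, v) for v in vectors]
```

Because the toy vectors are all distinct, querying with a stored vector follows the same path taken during construction and returns that vector's own label; for unseen queries the result is only an approximation of the exhaustive nearest neighbor.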
The above is only the preferred embodiment of the present invention. It should be noted that a person of ordinary skill in the art may make improvements and additions without departing from the principles of the invention, and such improvements and additions shall also be regarded as falling within the protection scope of the present invention.

Claims (6)

1. A retrieval method using a random quantization vocabulary tree, characterized by comprising the following steps:
(1) generating a nearest-neighbor search tree, with all feature vectors of the whole database as the root node at the first level, and splitting downward;
(2) at the second level, randomly selecting k points from the whole database as cluster centers, then assigning each feature vector to its nearest cluster center according to the selected similarity measure, so that the whole database is divided into k subsets, and continuing to split downward;
(3) at the third level, for each of the k clusters obtained at the second level, randomly selecting k feature points from its feature vector pool as the cluster centers of the next level, then assigning each feature vector to its nearest cluster center using the similarity measure, thereby forming k² clusters at the third level;
(4) repeating steps (2) and (3) until the feature vectors contained in every leaf node belong to the same class of object or the number of feature vectors contained in a leaf node is below a certain limit; wherein each feature vector has a class label associated with it.
2. The retrieval method using a random quantization vocabulary tree according to claim 1, characterized in that: in step (2), if a feature vector is equidistant from two or more cluster centers, one of these clusters is selected at random.
3. The retrieval method using a random quantization vocabulary tree according to claim 1, characterized in that: in step (3), a new feature vector selected from the feature vector pool is assigned to its nearest cluster center; when the leaf node of the assigned cluster center is reached, if all feature vectors in the leaf node have the same class label, that class label is assigned to the new feature vector and the operation stops; otherwise, a further search is performed within the assigned cluster, the feature vector in the cluster with the shortest distance to the new feature vector is selected, the class label associated with that feature vector is assigned to the new feature vector, and the operation then stops.
4. An image retrieval method based on the retrieval method using a random quantization vocabulary tree, characterized by comprising the following steps:
(1) first dividing the image into a number of mutually overlapping subregions by an overlapping block partition method;
(2) combining the feature information blocks of the image with their semantic features: since each extracted feature vector corresponds to a point in feature space, unsupervised learning (i.e. clustering in data mining) is applied to the feature points in feature space to divide all feature vectors in the feature vector library into multiple pattern classes, such that patterns within the same class are more similar and patterns in different classes are more dissimilar; the feature points are marked with class labels, each class label carrying specific semantic information; the image feature information blocks are combined with the semantic features and the different regions of the image are annotated, so as to establish an image knowledge base;
(3) generating a nearest-neighbor search tree, with all image feature vectors in the image knowledge base as the root node at the first level, and splitting downward;
(4) at the second level, randomly selecting k points from the image knowledge base as cluster centers, then assigning each image feature vector to its nearest cluster center according to the selected similarity measure, so that the whole database is divided into k subsets, and continuing to split downward;
(5) at the third level, for each of the k clusters obtained at the second level, randomly selecting k feature points from its feature vector pool as the cluster centers of the next level, then assigning each image feature vector to its nearest cluster center using the similarity measure, thereby forming k² clusters at the third level;
(6) repeating steps (2) and (3) until the image feature vectors contained in every leaf node belong to the same class of object or the number of image feature vectors contained in a leaf node is below a certain limit; wherein each image feature vector has a class label associated with it.
5. The image retrieval method based on the retrieval method using a random quantization vocabulary tree according to claim 4, characterized in that: in step (1), the overlapping block partition method divides an image of size height × width with an N × N window, shifted by Nhop pixels in the row and column directions, into a number of mutually overlapping subregions; so that sufficiently small objects contained in the image can be detected, the square window size is reduced and the number of patches is increased; by means of the mutually overlapping subregions, the color histogram is combined with the spatial distribution of color.
6. The image retrieval method based on the retrieval method using a random quantization vocabulary tree according to claim 4, characterized in that: in step (2), in establishing the image knowledge base, the Chameleon clustering algorithm and an MST-based clustering algorithm are used to cluster the color histogram feature vector library of the images, class labels are assigned to the clustering results, and a knowledge base based on color histogram features is established.
CN201710545225.9A 2017-07-06 2017-07-06 Retrieval method using random quantization vocabulary tree and image retrieval method based on same Active CN107451200B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710545225.9A CN107451200B (en) 2017-07-06 2017-07-06 Retrieval method using random quantization vocabulary tree and image retrieval method based on same

Publications (2)

Publication Number Publication Date
CN107451200A true CN107451200A (en) 2017-12-08
CN107451200B CN107451200B (en) 2020-07-28

Family

ID=60488400

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710545225.9A Active CN107451200B (en) 2017-07-06 2017-07-06 Retrieval method using random quantization vocabulary tree and image retrieval method based on same

Country Status (1)

Country Link
CN (1) CN107451200B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108241745A (en) * 2018-01-08 2018-07-03 阿里巴巴集团控股有限公司 The processing method and processing device of sample set, the querying method of sample and device
CN108536769A (en) * 2018-03-22 2018-09-14 深圳市安软慧视科技有限公司 Image analysis method, searching method and device, computer installation and storage medium
CN109992690A (en) * 2019-03-11 2019-07-09 中国华戎科技集团有限公司 A kind of image search method and system
CN111274367A (en) * 2018-11-20 2020-06-12 财团法人资讯工业策进会 Semantic analysis method, semantic analysis system and non-transitory computer readable medium
CN112966718A (en) * 2021-02-05 2021-06-15 深圳市优必选科技股份有限公司 Image identification method and device and communication equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070214172A1 (en) * 2005-11-18 2007-09-13 University Of Kentucky Research Foundation Scalable object recognition using hierarchical quantization with a vocabulary tree
CN103678504A (en) * 2013-11-19 2014-03-26 西安华海盈泰医疗信息技术有限公司 Similarity-based breast image matching image searching method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
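Yang Shuji (杨树极): "An image retrieval method combining semantic features and visual features", Computer Development & Applications (《电脑开发与应用》) *
Lin Kezheng et al. (林克正等): "Image retrieval based on block dominant color matching", Computer Engineering (《计算机工程》) *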

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108241745A (en) * 2018-01-08 2018-07-03 阿里巴巴集团控股有限公司 The processing method and processing device of sample set, the querying method of sample and device
WO2019134567A1 (en) * 2018-01-08 2019-07-11 阿里巴巴集团控股有限公司 Sample set processing method and apparatus, and sample querying method and apparatus
CN108241745B (en) * 2018-01-08 2020-04-28 阿里巴巴集团控股有限公司 Sample set processing method and device and sample query method and device
TWI696081B (en) * 2018-01-08 2020-06-11 香港商阿里巴巴集團服務有限公司 Sample set processing method and device, sample query method and device
US10896164B2 (en) 2018-01-08 2021-01-19 Advanced New Technologies Co., Ltd. Sample set processing method and apparatus, and sample querying method and apparatus
CN108536769A (en) * 2018-03-22 2018-09-14 深圳市安软慧视科技有限公司 Image analysis method, searching method and device, computer installation and storage medium
CN108536769B (en) * 2018-03-22 2023-01-03 深圳市安软慧视科技有限公司 Image analysis method, search method and device, computer device and storage medium
CN111274367A (en) * 2018-11-20 2020-06-12 财团法人资讯工业策进会 Semantic analysis method, semantic analysis system and non-transitory computer readable medium
CN109992690A (en) * 2019-03-11 2019-07-09 中国华戎科技集团有限公司 A kind of image search method and system
CN109992690B (en) * 2019-03-11 2021-04-13 中国华戎科技集团有限公司 Image retrieval method and system
CN112966718A (en) * 2021-02-05 2021-06-15 深圳市优必选科技股份有限公司 Image identification method and device and communication equipment
CN112966718B (en) * 2021-02-05 2023-12-19 深圳市优必选科技股份有限公司 Image recognition method and device and communication equipment

Also Published As

Publication number Publication date
CN107451200B (en) 2020-07-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant