CN107451200B

CN107451200B - Retrieval method using random quantized lexical tree and image retrieval method based thereon

Info

Publication number: CN107451200B
Application number: CN201710545225.9A
Authority: CN
Inventors: 王晓春
Original assignee: Xian Jiaotong University
Current assignee: Xian Jiaotong University
Priority date: 2017-07-06
Filing date: 2017-07-06
Publication date: 2020-07-28
Anticipated expiration: 2037-07-06
Also published as: CN107451200A

Abstract

The invention discloses a retrieval method using a random quantized vocabulary tree and an image retrieval method based thereon. The retrieval method comprises the following steps: (1) generating a nearest neighbor search tree, taking all feature vectors of the entire database as the root node at the first level, and splitting downwards; (2) at the second level, randomly selecting k points from the entire database as cluster centers, assigning each feature vector to its nearest cluster center according to the selected similarity measure so that the entire database is divided into k subsets, and continuing to split downwards; (3) at the third level, for each of the k clusters obtained at the second level, randomly selecting k feature points from its feature vector pool as the cluster centers of the next level; (4) repeating steps (2) and (3). The image retrieval method of the invention overcomes the problem in the prior art that building a vocabulary tree takes a large amount of time; it can build the vocabulary tree in a short time and meets real-time requirements.

Description

Retrieval method using random quantized lexical tree and image retrieval method based thereon

Technical field

The present invention relates to the technical field of image retrieval, and in particular to a retrieval method using a random quantized vocabulary tree and an image retrieval method based thereon.

Background

In recent years, with the development and spread of digital technology, especially network technology, and with advances in the Internet of Things and in hardware and software for information acquisition, more and more data is being collected and stored. The rate of data collection already far exceeds the rate at which traditional methods can process it, and this trend is becoming more pronounced. Facebook is the world's leading photo-sharing site: as of November 2013, about 350 million photos were uploaded to it every day, and the photos stored on Facebook alone had reached 250 PB. In digital video, YouTube's 2013 statistics show that more than 72 hours of video were uploaded every minute and that there were 4 billion video playback requests per day, with these figures still rising sharply. Given such huge data resources and equally massive access demand, how to organize, manage, and retrieve large-scale databases effectively has become an urgent problem.

Traditional text-based image retrieval methods annotate images with keywords, turning image retrieval into keyword search. Their obvious drawback is that neither computer vision nor artificial intelligence techniques can annotate images with text automatically, so manual annotation is required. As data volumes keep growing, manual annotation falls far behind the growth of image data. Moreover, manual annotation is subjective and imprecise: different people understand the same image differently, so there is no unified standard for image annotation. To overcome the limitations of text-based image retrieval, content-based image retrieval (CBIR) emerged in the 1990s. Unlike traditional retrieval methods, it incorporates image understanding techniques to provide a way of retrieving images effectively from large image databases according to the user's requirements.

The basic idea of a content-based image retrieval system is to analyze the visual features of images and retrieve them in context. It is implemented by using an image database to store and manage image data and embedding content-based image retrieval technology into the database as its engine, so as to provide content-based retrieval. Existing content-based image retrieval systems generally use low-level image information, including color, texture, shape, and the spatial relationships among them, to compute the similarity between the query image and a target image, and then retrieve according to that similarity, that is, the degree of matching between image features. Feature extraction is therefore first used to convert each image in the image library into a point in the image feature space, namely the corresponding feature vector; images are then retrieved according to their feature vectors. Content-based image retrieval is thereby transformed into the retrieval of feature points in the image feature space.

When the image database is small, the most commonly used image feature retrieval method is sequential scanning. However, as the means of acquiring information develop and the demand for information grows, image databases are becoming ever larger, and traditional sequential scanning can no longer meet users' requirements for retrieval time. The key to content-based retrieval is therefore to build an efficient indexing mechanism that organizes the data effectively, so that the search range can be narrowed quickly and retrieval speed improved.

In previous related work, researchers have proposed many data indexing methods for specific application fields. When processing high-dimensional data, however, these methods all suffer from the "curse of dimensionality": as the data dimension increases, their retrieval performance degrades to that of a sequential scan, or even worse. Because the feature vectors extracted from original images in CBIR research are usually high-dimensional, indexing image feature data is inevitably affected by the curse of dimensionality. The vocabulary-tree-based retrieval method proposed by Nister and Stewenius retrieves well in high-dimensional spaces, but building the tree in a high-dimensional space takes a long time, which makes it difficult to meet the timeliness requirements of modern databases. Establishing an efficient high-dimensional indexing mechanism suited to the high-dimensional nature of image feature data is therefore an important challenge in current image retrieval research.

Summary of the invention

The present invention provides a retrieval method using a random quantized vocabulary tree and an image retrieval method based thereon, aiming to overcome the above-mentioned defects of the prior art.

To achieve the above technical purpose, the present invention adopts the following technical solution:

A retrieval method using a random quantized vocabulary tree comprises the following steps:

(1) Generate a nearest neighbor search tree: take all the feature vectors of the entire database as the root node at the first level, and split downwards;

(2) At the second level, randomly select k points from the entire database as cluster centers, then assign each feature vector to its nearest cluster center according to the selected similarity measure, so that the entire database is divided into k subsets, and continue to split downwards;

(3) At the third level, for each of the k clusters obtained at the second level, randomly select k feature points from its feature vector pool as the cluster centers of the next level, then use the similarity measure to assign each feature vector to its nearest cluster center, thereby forming k² clusters at the third level;

(4) Repeat steps (2) and (3) until the feature vectors contained in every leaf node all belong to the same class of object, or the number of feature vectors contained in a leaf node falls below a certain limit; each feature vector has a class label associated with it.

In step (2), if a feature vector is equidistant from two or more cluster centers, one of those clusters is selected at random.

In step (3), a new feature vector selected from the feature vector pool is assigned to its nearest cluster center. When the leaf node under the assigned cluster center is reached, if all feature vectors in that leaf node have the same class label, that class label is assigned to the new feature vector and the operation stops; otherwise, a search is performed within the assigned cluster, the feature vector in the cluster with the shortest distance to the new feature vector is selected, its class label is assigned to the new feature vector, and the operation stops.

An image retrieval method based on the retrieval method using a random quantized vocabulary tree comprises the following steps:

(1) First, divide the image into several mutually overlapping sub-regions using the overlapping block method;

(2) Combine the feature information blocks of the image with their semantic features. Each extracted feature vector corresponds to a point in the feature space; by applying unsupervised learning (that is, clustering in data mining) to the feature points in the feature space, all feature vectors in the feature vector library are partitioned into multiple patterns, such that patterns within the same class are more similar to one another and patterns in different classes are more dissimilar. The feature points are marked with class labels, each of which carries specific semantic information; the image feature information blocks are combined with the semantic features to interpret the different regions of the image, so as to establish an image knowledge base;

(3) Generate a nearest neighbor search tree: take all image feature vectors in the image knowledge base as the root node at the first level, and split downwards;

(4) At the second level, randomly select k points from the image knowledge base as cluster centers, then assign each image feature vector to its nearest cluster center according to the selected similarity measure, so that the entire database is divided into k subsets, and continue to split downwards;

(5) At the third level, for each of the k clusters obtained at the second level, randomly select k feature points from its feature vector pool as the cluster centers of the next level, then use the similarity measure to assign each image feature vector to its nearest cluster center, thereby forming k² clusters at the third level;

(6) Repeat steps (4) and (5) until the image feature vectors contained in every leaf node all belong to the same class of object, or the number of image feature vectors contained in a leaf node falls below a certain limit; each image feature vector has a class label associated with it.

In step (1), the overlapping block method divides an image of size height×width using an N×N window that is shifted by Nhop pixels in the row and column directions, producing several mutually overlapping sub-regions. So that sufficiently small objects in the image can be detected, the window size is reduced and the number of patches increased; overlapping the sub-regions combines the color histogram with the spatial distribution of color.

In step (2), when establishing the image knowledge base, the Chameleon clustering algorithm and an MST-based clustering algorithm are used to cluster the color histogram feature vector library of the images; class labels are assigned to the clustering results, and a knowledge base based on color histogram features is established.

The above technical solution has the following beneficial effects:

(1) The image retrieval method of the present invention overcomes the problem in the prior art that building a vocabulary tree takes a large amount of time; it can build the vocabulary tree in a very short time and meets real-time requirements;

(2) By using the overlapping block method, the present invention subdivides the image into multiple blocks and extracts their color histograms as the feature vector library, effectively combining the color histogram of the image with color spatial information and overcoming the prior-art problem that the spatial characteristics of color are ignored during image feature extraction;

(3) The present invention can extract image features more quickly and meets real-time requirements; at the same time, it applies unsupervised learning to the feature database and marks the different regions of a scene image through region labeling to form a knowledge base.

Description of the drawings

Fig. 1 is a schematic diagram of the present invention;

Fig. 2 is a schematic diagram of the overlapping block method of the present invention;

Fig. 3 shows the clustering result for the 10648-dimensional RGB histogram of group 21-14;

Fig. 4 shows the clustering result for the 5000-dimensional HSV histogram of group 24-16;

Fig. 5 shows the clustering result for the 5832-dimensional Opponent histogram of group 24-16;

Fig. 6 shows the clustering result for the 10648-dimensional Transformed histogram of group 21-14;

Fig. 7 compares accuracy on the RGB histograms of group 21-14;

Fig. 8 compares accuracy on the RGB histograms of group 24-16;

Fig. 9 compares accuracy on the RGB histograms of group 27-18.

Detailed description of the embodiments

The solution is further described below with reference to the drawings and embodiments.

As shown in Fig. 1, the retrieval method using a random quantized vocabulary tree comprises the following steps:

For large databases, the random quantization tree retrieval method follows the idea of the vocabulary tree but makes one important improvement on it, producing a nearest neighbor search tree. As shown in Fig. 1, given a database, the first level has only one node, the root node, which contains all the feature vectors. At the second level, K points are randomly selected from the entire database as cluster centers; each feature vector is then assigned to its nearest cluster center according to the selected similarity measure, dividing the entire database into K subsets. At the third level, for each of the K clusters obtained at the second level, K feature points are randomly selected from its feature vector pool as the cluster centers of the next level, and the similarity measure is used to assign each feature vector to its nearest cluster center, forming K² clusters at the third level. This process continues until the feature vectors contained in every leaf node all belong to the same class of object (that is, the node is pure) or the number of feature vectors contained in a leaf node falls below a certain limit (for example, 50). Each feature vector has a class label associated with it.
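The construction described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the `Node` class, the function name `build_tree`, and the use of Euclidean distance as the similarity measure are assumptions made for the example. For brevity, ties between equally distant centers go to the lowest-numbered center here, and a split that fails to separate the data is abandoned so that recursion always terminates.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    vectors: list                                  # feature vectors held by this node
    labels: list                                   # class label of each vector
    centers: list = field(default_factory=list)    # randomly chosen cluster centers
    children: list = field(default_factory=list)   # one child node per center


def build_tree(vectors, labels, k, leaf_limit=50, rng=random):
    """Recursively split the data around k randomly selected centers."""
    node = Node(vectors, labels)
    # Stop when the node is pure or holds too few vectors to split further.
    if len(set(labels)) <= 1 or len(vectors) < leaf_limit or len(vectors) <= k:
        return node
    centers = rng.sample(vectors, k)               # no k-means refinement
    buckets = [([], []) for _ in range(k)]
    for v, y in zip(vectors, labels):
        # Assumption: Euclidean distance; ties go to the lowest index.
        i = min(range(k), key=lambda j: math.dist(v, centers[j]))
        buckets[i][0].append(v)
        buckets[i][1].append(y)
    # Guard: if every vector fell into one bucket, keep this node as a leaf.
    if any(len(vs) == len(vectors) for vs, _ in buckets):
        return node
    node.centers = centers
    node.children = [build_tree(vs, ys, k, leaf_limit, rng) for vs, ys in buckets]
    return node
```

Because the centers are simply sampled from the data, each node costs one pass of distance computations over its vectors.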

When branching the tree, the distance from each data item to the other data items can be updated to a smaller value through the nearest neighbor search tree. This strategy ensures that the data points closest to one another in space are more likely to be assigned to the same partition. Every data point in a partition is closer to its own cluster center (which is not necessarily its nearest neighbor) than to the center of any other partition; if a data point is equidistant from the centers of two or more clusters, one of those clusters is chosen for it at random.
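The random tie-breaking rule can be illustrated with a short Python sketch. The function name `assign_to_center` and the choice of Euclidean distance are assumptions for the example; exact floating-point equality is used to detect ties, which a practical implementation might relax to a tolerance.

```python
import math
import random


def assign_to_center(vector, centers, rng=random):
    """Return the index of the nearest center, breaking ties at random."""
    dists = [math.dist(vector, c) for c in centers]
    best = min(dists)
    # Collect every center that is exactly as close as the best one.
    tied = [i for i, d in enumerate(dists) if d == best]
    return rng.choice(tied)
```

A point at the origin with centers at (1, 0) and (-1, 0) would be sent to either cluster with equal probability, while a point clearly closer to one center always goes to that center.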

A given new feature vector is looked up through the random quantization tree. The new feature vector follows a specific path down the tree: at each level, its distance to each of the K cluster centers is computed, and the branch of the nearest cluster center is taken. When a leaf node is reached, if the leaf node is pure (that is, all feature vectors in the leaf node have the same class label), that class label is assigned to the new feature vector and the operation stops. Otherwise, a nearest neighbor search is performed among the vectors of the cluster concerned; the result is the feature vector at the shortest distance under the selected similarity measure, and the class label associated with that feature vector is assigned to the new feature vector, after which the operation stops.
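The lookup can be sketched as below. The `Node` layout, the function name `classify`, and the Euclidean distance are illustrative assumptions; the sketch also assumes that every leaf reached holds at least one vector.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    vectors: list                                  # feature vectors held by this node
    labels: list                                   # class label of each vector
    centers: list = field(default_factory=list)    # cluster centers of the children
    children: list = field(default_factory=list)   # empty for a leaf node


def classify(root, query):
    """Descend to a leaf, then label the query from that leaf."""
    node = root
    while node.children:
        # Follow the branch whose center is nearest to the query.
        i = min(range(len(node.centers)),
                key=lambda j: math.dist(query, node.centers[j]))
        node = node.children[i]
    if len(set(node.labels)) == 1:
        return node.labels[0]                      # pure leaf: shared label
    # Impure leaf: exact nearest neighbor search inside the leaf only.
    j = min(range(len(node.vectors)),
            key=lambda i: math.dist(query, node.vectors[i]))
    return node.labels[j]


# Hand-built two-leaf tree with hypothetical labels, for illustration only.
left = Node([(0.0, 0.0), (0.1, 0.2)], ["sky", "sky"])
right = Node([(1.0, 1.0), (0.8, 0.9)], ["grass", "road"])
root = Node(left.vectors + right.vectors, left.labels + right.labels,
            centers=[(0.0, 0.0), (1.0, 1.0)], children=[left, right])
```

Only the impure leaf triggers a linear scan, and that scan is limited to the vectors stored in the leaf.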

As shown in Fig. 2, in step (1) the overlapping block method divides an image of size height×width using an N×N window shifted by Nhop pixels in the row and column directions, producing several mutually overlapping sub-regions. So that sufficiently small objects in the image can be detected, the window size is reduced and the number of patches increased; overlapping the sub-regions combines the color histogram with the spatial distribution of color. Dividing an image of size height×width into square patches with overlapping sub-regions gives:

Number of rows: blockrows = (height - N)/Nhop + 1

Number of columns: blockcols = (width - N)/Nhop + 1

Number of square patches: numofSamples = blockrows × blockcols
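As a quick check of these formulas, the following Python sketch computes the patch grid. The function name `block_grid` is hypothetical, and the division is assumed to round down; that assumption reproduces the block counts of 4500, 3476, and 2730 reported in the experiments for a 720×1280 image.

```python
def block_grid(height, width, n, n_hop):
    """Return (blockrows, blockcols, numofSamples) for an n×n window."""
    # Integer (floor) division is an assumption; the text writes a plain "/".
    block_rows = (height - n) // n_hop + 1
    block_cols = (width - n) // n_hop + 1
    return block_rows, block_cols, block_rows * block_cols


for n, n_hop in [(21, 14), (24, 16), (27, 18)]:
    print(n, n_hop, block_grid(720, 1280, n, n_hop))
```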

The image is height×width pixels in size, and the size of the resulting processed image is determined by the size of the patch window. As the patch window size changes, the number of square patches produced also changes: when the patch window shrinks, the number of square patches produced from the image increases, and vice versa. In the processed image, each pixel is represented by the color histogram of one square patch, so the window size directly determines the resolution of the processed image. This means the patch window cannot be too large, so a larger number of patches is needed to represent the complete image information; that is, the number of patches per image increases.

Simply dividing an image into blocks and extracting their color histograms only splits the image into blocks that carry no semantic information. The purpose of establishing the image knowledge base is to combine the feature information blocks of an image with their semantic features. Each extracted feature vector corresponds to a point in the feature space; by applying unsupervised learning (that is, clustering in data mining) to the feature points, all feature vectors in the feature vector library are partitioned into multiple patterns, such that patterns within the same class are more similar and patterns in different classes are more dissimilar. The feature points are marked with class labels, each of which carries specific semantic information, and the image feature information blocks are combined with the semantic features to interpret the different regions of the image.

The advantages of the present invention are further demonstrated below by experiment.

Clustering of image feature data

The feature vector database was clustered with the Chameleon clustering algorithm and with an MST-based clustering algorithm, dividing the objects in the images into multiple clusters, and the clusters were labeled to form a knowledge base. The clustered database is presented as color images. The feasibility of the image feature extraction method used here is verified by comparing the original image set, the MST-based clustering result images, and the Chameleon clustering result images.

Because the image dataset contains many images, only the clustering results for some of the images in the database are shown. Likewise, because there are multiple groups of feature databases, one group of clustering results is shown for each of the RGB, HSV, Opponent, and Transformed spaces.

Table 1 Main scenes in the RGB knowledge base, based on the RGB color histogram, and their semantic relationships

Fig. 3 shows the clustering result for the 10648-dimensional RGB histogram of group 21-14.

Table 2 Main scenes in the HSV knowledge base, based on the HSV color histogram, and their semantic relationships

Fig. 4 shows the clustering result for the 5000-dimensional HSV histogram of group 24-16.

Table 3 Main scenes in the Opponent knowledge base, based on the Opponent color histogram, and their semantic relationships

Fig. 5 shows the clustering result for the 5832-dimensional Opponent histogram of group 24-16.

Table 4 Main scenes in the Transformed knowledge base, based on the Transformed color histogram, and their semantic relationships

Fig. 6 shows the clustering result for the 10648-dimensional Transformed histogram of group 21-14.

Feature extraction speed

With an image size of 720×1280 pixels, a 21×21 patch window, and a shift of 14 pixels (denoted 21-14), the processed image is 50×90 pixels; with a 24×24 patch window and a shift of 16 pixels (denoted 24-16), the processed image is 44×79; with a 27×27 patch window and a shift of 18 pixels (denoted 27-18), the processed image is 39×70.

With this finer overlapping block method, the image is divided into 4500, 3476, and 2730 blocks respectively, and the color histogram of each block is extracted. Earlier block-based approaches mainly extracted the grayscale histogram or the HSV color histogram of the image; here, the RGB, HSV, Opponent, and Transformed color histograms of the image are extracted to build the image feature vector library.
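Per-block histogram extraction might look like the sketch below. The source does not state how the colors are binned, so this is an assumption: each 8-bit RGB channel is quantized into the same number of bins, which gives a joint histogram of bins³ entries (for example, 22 bins per channel gives 10648 dimensions). The function names and the list-of-rows image layout are likewise hypothetical.

```python
def overlapping_patches(image, n, n_hop):
    """Yield the pixels of every n×n window, shifted by n_hop pixels.

    `image` is assumed to be a list of rows, each a list of (r, g, b) tuples.
    """
    height, width = len(image), len(image[0])
    for top in range(0, height - n + 1, n_hop):
        for left in range(0, width - n + 1, n_hop):
            yield [image[y][x]
                   for y in range(top, top + n)
                   for x in range(left, left + n)]


def rgb_histogram(pixels, bins):
    """Normalized joint RGB histogram with bins**3 entries (8-bit channels)."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index in 0..bins-1.
        ri, gi, bi = r * bins // 256, g * bins // 256, b * bins // 256
        hist[(ri * bins + gi) * bins + bi] += 1.0
    total = len(pixels)
    return [h / total for h in hist] if total else hist


# One feature vector per patch of a small synthetic image.
image = [[(255, 0, 0)] * 12 for _ in range(10)]
features = [rgb_histogram(p, 4) for p in overlapping_patches(image, 4, 2)]
```

Each patch contributes one feature vector, so the feature library for an image has as many rows as there are patches.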

In the experiment, for 35 images of size 720×1280, the three overlapping block settings 21-14, 24-16, and 27-18 were each used to extract the 10000-dimensional RGB color histogram, the 10648-dimensional HSV color histogram, the 10000-dimensional Opponent color histogram, the 10000-dimensional Transformed color histogram, and Gabor texture features; SIFT feature points were also extracted from the same images.

The extraction time of each feature was recorded for the 35 images; the average feature extraction time per image is shown in the table below.

Table 5

Table 5 shows that, with the same image block method, extracting even a high-dimensional color histogram is significantly faster than extracting Gabor texture features; extracting the SIFT feature points of the whole image also takes much longer than extracting the color histograms after dividing the image into blocks. The feature extraction approach used here can therefore extract image features quickly.

Accuracy

Because nearest neighbor retrieval assigns the query object to the same class as its nearest neighbor, its retrieval accuracy is the highest. The nearest neighbor result is therefore used as the reference result set. The retrieval accuracy of the two trees is analyzed by comparing the retrieval results of the vocabulary tree and of the random quantization tree with the nearest neighbor retrieval results.
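One plausible reading of this evaluation is sketched below: the label returned by a tree for each query is compared with the label of the query's exact nearest neighbor, and accuracy is the fraction of queries on which they agree. The function names and the Euclidean distance are assumptions; the source does not give the exact scoring formula.

```python
import math


def nearest_neighbor_label(query, vectors, labels):
    """Label of the exact nearest neighbor, found by sequential scan."""
    j = min(range(len(vectors)), key=lambda i: math.dist(query, vectors[i]))
    return labels[j]


def agreement_rate(tree_labels, reference_labels):
    """Fraction of queries where the tree matches the reference label."""
    if len(tree_labels) != len(reference_labels):
        raise ValueError("label lists must have the same length")
    if not tree_labels:
        return 0.0
    matches = sum(a == b for a, b in zip(tree_labels, reference_labels))
    return matches / len(tree_labels)
```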

RGB histogram accuracy comparison

First, retrieval was performed on 24 training sets in total, covering the different groups (21-14, 24-16, 27-18) in the RGB color space and multiple dimensions (64, 125, 216, 512, 1000, 2744, 5832, and 10648). The comparison results are shown in Table 2, Table 3, and Table 4, where KQtree is the random quantized vocabulary tree and VTree is the traditional vocabulary tree.

RGB histogram accuracy comparison results: Figs. 7, 8, and 9 show that the accuracy of the random quantization tree is significantly higher than that of the vocabulary tree.

In Fig. 7, the accuracy of the random quantization tree peaks at 1000 dimensions, reaching 83.03%, and high-dimensional retrieval results are better than low-dimensional ones. In Fig. 8, the random quantization tree is most accurate at 5832 dimensions, reaching 86.73%, again showing that high-dimensional retrieval results are better than low-dimensional ones. In Fig. 9, the random quantization tree is most accurate at 512 dimensions, reaching 85.49%; high-dimensional retrieval is better than low-dimensional retrieval, but not markedly so, and the intermediate dimensions perform best.

Comparing Figs. 7, 8, and 9 together, RGB color histograms generally retrieve better in high-dimensional data spaces than in low-dimensional ones. As for window size, a smaller window yields more square patches and richer image information, but more patches do not necessarily mean higher retrieval accuracy: different dimensions behave differently, and there is no uniform rule.

Vocabulary tree construction time

The running times of the random quantization tree and the vocabulary tree for the different groups (21-14, 24-16, 27-18) and multiple dimensions (64, 125, 216, 512, 1000, 2744, 5832, and 10648) of the RGB histogram are shown in Tables 6, 7, and 8, in seconds, where KQtree is the random quantized vocabulary tree and VTree is the traditional vocabulary tree.

Table 6. Window 21×21, shift 14: running times of the random quantized vocabulary tree and the vocabulary tree for RGB histograms of different dimensions

Table 7. Window 24×24, shift 16: running times of the random quantized vocabulary tree and the vocabulary tree for RGB histograms of different dimensions

Table 8. Window 27×27, shift 18: running times of the random quantized vocabulary tree and the vocabulary tree for RGB histograms of different dimensions

On RGB histograms, the random quantization tree runs significantly faster than the vocabulary tree. As the dimension of the RGB histogram increases, the running time of the vocabulary tree grows to a geometric multiple of the running time of the random quantization tree.

Tables 9, 10, and 11 list the running times, in seconds, of the random quantized vocabulary tree and the vocabulary tree on Opponent histograms for the different window-shift groups (21-14, 24-16, 27-18) and dimensions (64, 125, 216, 512, 1000, 2744, 5832, and 10648).

Table 9. Window 21×21, shift 14: running times of the random quantized vocabulary tree and the vocabulary tree for Opponent histograms of different dimensions

Table 10. Window 24×24, shift 16: running times of the random quantized vocabulary tree and the vocabulary tree for Opponent histograms of different dimensions

Table 11. Window 27×27, shift 18: running times of the random quantized vocabulary tree and the vocabulary tree for Opponent histograms of different dimensions

On Opponent histograms, the random quantization tree likewise runs significantly faster than the vocabulary tree. As the dimension of the Opponent histogram increases, the running time of the vocabulary tree grows to a geometric multiple of the running time of the random quantization tree.

In summary, the retrieval method using the random quantized vocabulary tree is clearly superior to the vocabulary tree in terms of time efficiency.

The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art can make several improvements and additions without departing from the principles of the present invention, and such improvements and additions shall also be regarded as falling within the protection scope of the present invention.

Claims

1. An image retrieval method based on a retrieval method using a random quantized vocabulary tree, characterized in that the retrieval method using the random quantized vocabulary tree comprises the following steps:

(1) Generate a nearest neighbor search tree, take all the feature vectors of the entire database as the root node of the first level, and split it downwards;

(2) At the second level, randomly select k points from the entire database as cluster centers, then assign each feature vector to its nearest cluster center according to the selected similarity measure, so that the entire database is divided into k subsets, and continue to split downwards; if a feature vector is equidistant from two or more cluster centers, one of those clusters is selected at random;

(3) At the third level, for each of the k clusters obtained at the second level, randomly select k feature points from its feature vector pool as the cluster centers of the next level, and then use the similarity measure to assign each feature vector to its nearest cluster center, thereby forming k² clusters at the third level; when a new feature vector reaches the leaf node of its assigned cluster center, if all feature vector points in the leaf node have the same class label, assign that class label to the new feature vector and stop the operation; otherwise, search again within the assigned cluster, select the feature vector in the cluster with the shortest distance to the new feature vector, assign the class label associated with that feature vector to the new feature vector, and stop the operation;

(4) Repeat steps (2) and (3) until the feature vectors contained in every leaf node all belong to the same class of objects or the number of feature vectors contained in a leaf node falls below a certain limit; each feature vector has a class label associated with it;
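Steps (1) to (4) above can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `build_tree`, `classify`, `sq_dist`, `limit`, and `rng` are hypothetical, squared Euclidean distance is assumed as the similarity measure, and the database is assumed to be non-empty.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance (an assumed similarity measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_tree(points, labels, k, limit, rng):
    """Split recursively around k randomly chosen centers."""
    node = {"points": points, "labels": labels, "centers": [], "children": []}
    # Stop when the node holds a single class or is too small to split.
    if len(set(labels)) <= 1 or len(points) < max(limit, k):
        return node
    centers = rng.sample(points, k)
    buckets = [([], []) for _ in centers]
    for p, lab in zip(points, labels):
        dists = [sq_dist(p, c) for c in centers]
        best = min(dists)
        # Ties between equally close centers are broken at random.
        i = rng.choice([j for j, d in enumerate(dists) if d == best])
        buckets[i][0].append(p)
        buckets[i][1].append(lab)
    kept = [(c, b) for c, b in zip(centers, buckets) if b[0]]
    if len(kept) < 2:
        return node  # no real split (e.g. duplicate points): keep as a leaf
    node["centers"] = [c for c, _ in kept]
    node["children"] = [build_tree(bp, bl, k, limit, rng) for _, (bp, bl) in kept]
    return node

def classify(node, query):
    """Descend to a leaf, then label the query as described in step (3)."""
    while node["children"]:
        centers = node["centers"]
        i = min(range(len(centers)), key=lambda j: sq_dist(query, centers[j]))
        node = node["children"][i]
    if len(set(node["labels"])) == 1:
        return node["labels"][0]  # pure leaf: reuse its single class label
    # Mixed leaf: fall back to the nearest feature vector inside the leaf.
    pts = node["points"]
    j = min(range(len(pts)), key=lambda m: sq_dist(query, pts[m]))
    return node["labels"][j]
```

A tree would be built once with, for example, `build_tree(points, labels, k=2, limit=3, rng=random.Random(0))` and then queried with `classify(tree, query)`. Because the centers are sampled directly from the data and never refined, each node is split in a single assignment pass.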

The image retrieval method includes the following steps:

(1) First, divide the image into several overlapping sub-regions by the overlapping block method; specifically, the overlapping block method covers an image of size height × width with N × N windows that are shifted by Nhop pixels in the row and column directions, dividing the image into several overlapping sub-regions; so that sufficiently small objects contained in the image can be detected, the size of the square window is reduced and the number of patches is increased; by overlapping the sub-regions, the color histogram is combined with the spatial distribution of colors;

(2) Combine the feature information blocks of the image with their semantic features; since each extracted feature vector corresponds to a point in the feature space, unsupervised learning (that is, clustering in data mining) is performed on the feature points in the feature space to divide all feature vectors in the feature vector library into multiple pattern classes, so that patterns in the same class are more similar to one another and patterns in different classes are more dissimilar; the feature points are marked with class labels, each class label carries specific semantic information, and the image feature information blocks are combined with semantic features to interpret different regions of the image, thereby establishing an image knowledge base; specifically, in establishing the image knowledge base, the Chameleon clustering algorithm and an MST-based clustering algorithm are used to cluster the color histogram feature vector library of the image, class labels are assigned to the clustering results, and a knowledge base based on color histogram features is established;

(3) Generate a nearest neighbor search tree, take all image feature vectors in the image knowledge base as the root node of the first level, and split it downwards;

(4) At the second level, randomly select k points from the image knowledge base as cluster centers, then assign each image feature vector to its nearest cluster center according to the selected similarity measure, so that the entire database is divided into k subsets, and continue to split downwards;

(5) At the third level, for each of the k clusters obtained at the second level, randomly select k feature points from its feature vector pool as the cluster centers of the next level, and then use the similarity measure to assign each image feature vector to its nearest cluster center, thereby forming k² clusters at the third level;

(6) Repeat steps (2) and (3) until the image feature vectors contained in every leaf node all belong to the same class of objects or the number of image feature vectors contained in a leaf node falls below a certain limit; each image feature vector has a class label associated with it.
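The overlapping block method of step (1) of the image retrieval method can be illustrated with a short Python sketch. The function names, the `bins` parameter, and the uniform per-channel quantization are assumptions made for illustration; the claim itself only fixes the N × N window and the Nhop shift. Quantizing each of three color channels into `bins` levels gives a histogram with `bins ** 3` entries.

```python
def window_origins(height, width, n, hop):
    """Top-left corners of all n x n windows shifted by hop pixels."""
    rows = range(0, height - n + 1, hop)
    cols = range(0, width - n + 1, hop)
    return [(r, c) for r in rows for c in cols]

def patch_histogram(image, r, c, n, bins):
    """Color histogram of one n x n window with bins ** 3 entries.

    image is a list of rows, each row a list of (red, green, blue)
    tuples with channel values in 0..255 (an assumed layout).
    """
    hist = [0] * (bins ** 3)
    for y in range(r, r + n):
        for x in range(c, c + n):
            red, green, blue = image[y][x]
            # Map each channel to one of `bins` equal-width levels.
            i = red * bins // 256
            j = green * bins // 256
            m = blue * bins // 256
            hist[(i * bins + j) * bins + m] += 1
    return hist

def image_features(image, n, hop, bins):
    """One histogram feature vector per overlapping window."""
    height, width = len(image), len(image[0])
    return [patch_histogram(image, r, c, n, bins)
            for r, c in window_origins(height, width, n, hop)]
```

For a hypothetical 63 × 63 image, a 21 × 21 window with shift 14 produces more patches than a 27 × 27 window with shift 18, which matches the observation that smaller windows yield more patches.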