CN112214623A - Image-text sample-oriented efficient supervised image embedding cross-media Hash retrieval method - Google Patents
- Publication number
- CN112214623A CN112214623A CN202010943065.5A CN202010943065A CN112214623A CN 112214623 A CN112214623 A CN 112214623A CN 202010943065 A CN202010943065 A CN 202010943065A CN 112214623 A CN112214623 A CN 112214623A
- Authority
- CN
- China
- Prior art keywords
- sample
- image
- matrix
- text
- samples
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/55—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
- G06F16/325—Hash tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
- G06F16/41—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/53—Querying
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biomedical Technology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Evolutionary Biology (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Multimedia (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Technical Field
The present invention relates to the technical field of multimedia, and in particular to an efficient supervised graph embedding cross-media hash retrieval method for image-text samples.
Background Art
With the rapid development of network technology and portable mobile devices, more and more people share everyday moments online: on a birthday, for example, a user may post birthday photos (images) and describe their mood (text) through social applications such as WeChat or Facebook. As a result, data on the Internet is growing explosively, and finding the desired information within this massive data has become a challenge for users. On the one hand, the volume of data on the network is large, and the dimensionality of sample features is usually very high, sometimes reaching tens of thousands of dimensions. Traditional retrieval methods must compute the distance (e.g., Euclidean or cosine distance) between the query sample and every sample to be retrieved, which incurs excessive computational complexity and memory overhead. On the other hand, data on the network spans many modalities with heterogeneous representations, so measuring the similarity of heterogeneous samples is itself a challenge. Cross-media hashing methods can address both problems well. Supervised cross-media hashing methods exploit category labels carrying high-level semantics to learn hash codes, improving the discriminative power of the codes and achieving satisfactory retrieval performance.

However, most existing methods still have the following problems: 1) most methods cannot make full use of category labels to improve hash-code quality; existing methods mainly learn hash codes by preserving similarity based on a pairwise similarity matrix, which not only loses category information but also leads to high computational complexity and memory overhead; 2) most existing discrete hashing methods solve for the hash-code matrix bit by bit during optimization, which leads to high computational complexity. The present invention proposes an efficient supervised graph embedding hash retrieval method for image-text samples that effectively solves these problems. First, to better preserve the semantic similarity of samples, the invention simultaneously preserves the inter-modal and intra-modal semantic similarity of samples together with category-label-based similarity while learning the hash codes and linear mapping matrices, and additionally learns an orthogonal rotation matrix to reduce the quantization error and further improve the discriminative power of the hash codes. Second, an iterative optimization algorithm is proposed that not only yields a closed-form discrete solution for the hash codes directly, but also reduces the computational complexity of the algorithm.
Summary of the Invention
The object of the present invention is to overcome the deficiencies of the prior art by providing an efficient supervised graph embedding cross-media hash retrieval method for image-text samples, characterized in that the following steps are implemented by means of a computer device:
Step 1: Collect image and text samples from the network, treat image and text samples belonging to the same web page as image-text sample pairs to form an image-text sample set, label the category of each image-text sample pair, and divide the sample pairs into a training set and a test set;
Step 2: Extract the features of all image and text samples in the training set and the test set, and normalize and zero-center the features;
Step 3: Denote the features of the image-text sample pairs in the training set by X^(1) ∈ R^{d_1×n} (all image samples) and X^(2) ∈ R^{d_2×n} (all text samples), where R denotes the set of real numbers, d_1 and d_2 denote the feature dimensions, and n denotes the number of image-text sample pairs in the training set; denote the class labels of the sample pairs by Y ∈ {0,1}^{c×n}, where c is the total number of categories. Randomly select m sample pairs (m ≪ n) as anchors a_1^(t), …, a_m^(t), t = 1, 2, and map the features of all image and text samples to a nonlinear space with the Gaussian radial basis function:

φ(x_i^(t)) = [exp(−‖x_i^(t) − a_1^(t)‖² / (2σ²)), …, exp(−‖x_i^(t) − a_m^(t)‖² / (2σ²))]^T,

where σ is the scale parameter, ‖·‖ denotes the ℓ2 norm, and (·)^T denotes the transpose of a matrix or vector;
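The anchor-based Gaussian RBF mapping of Step 3 can be sketched as follows; the function name `rbf_features`, the row-major array layout, and the 2σ² scaling are illustrative assumptions rather than the patent's exact formulation:

```python
import numpy as np

def rbf_features(X, anchors, sigma):
    """Map raw features X (n x d) into an m-dimensional nonlinear space
    via Gaussian RBF kernels against m anchor points (m x d)."""
    # pairwise squared Euclidean distances between samples and anchors
    sq = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * sigma ** 2))

# toy usage: 10 samples of dimension 4, first 3 samples reused as anchors
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 4))
phi = rbf_features(X, X[:3], sigma=1.0)
```

Each sample's kernel response to an anchor lies in (0, 1], with value 1 exactly when the sample coincides with the anchor.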
Step 4: Use the class labels of the image-text sample pairs to construct the graph adjacency matrix A ∈ R^{n×n} of the sample pairs, defined as

A_ij = (y_i^T y_j) / (‖y_i‖ ‖y_j‖),

where A_ij denotes the entry in row i and column j of the matrix A, y_i denotes the label vector of the i-th sample pair, and ‖·‖ denotes the ℓ2 norm;
Step 5: Further obtain the Laplacian matrix L = D − A of the graph adjacency matrix A, where D is the n×n diagonal degree matrix of A with diagonal elements D_ii = Σ_j A_ij;
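Steps 4 and 5 can be sketched together; cosine similarity of the label vectors is assumed here as the concrete pairwise similarity, which is one common choice and not necessarily the patent's exact normalization:

```python
import numpy as np

def label_graph_laplacian(Y):
    """From a binary label matrix Y (n x c), build the pairwise adjacency
    A via cosine similarity of label vectors, and return A together with
    the graph Laplacian Lap = D - A."""
    norms = np.linalg.norm(Y, axis=1, keepdims=True)
    norms[norms == 0] = 1.0          # guard against unlabeled rows
    Yn = Y / norms                   # row-normalized label vectors
    A = Yn @ Yn.T                    # A_ij = cos(y_i, y_j)
    D = np.diag(A.sum(axis=1))       # diagonal degree matrix
    return A, D - A
```

A useful sanity check is that every row of the Laplacian sums to zero, which is what makes tr(VᵀLV) a smoothness penalty over the label graph.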
Step 6: Based on the variables of Steps 1-5, construct the objective function of the method by preserving the inter-modal and intra-modal semantic similarity of the sample features and minimizing the quantization error. In the objective function, α, β, γ, λ, μ and η denote weight parameters; P_1 and P_2 denote the linear projection matrices learned for the image and text sample modalities, respectively; r denotes the length of the hash code; tr(·) denotes the trace of a matrix; W is the linear mapping matrix; B ∈ {−1, 1}^{r×n} is the hash code learned for the image-text sample pairs; R is an orthogonal rotation matrix; I_r denotes the r×r identity matrix; and a regularization term is included;
Step 7: Solve the objective function with an iterative optimization algorithm, which comprises the following steps:
Step 71: Fix P_2, W, R and B and solve for P_1: remove the terms irrelevant to P_1; taking the derivative of the resulting objective with respect to P_1 and setting it equal to zero yields a closed-form solution. Since the Laplacian matrix L is of size n×n, evaluating this solution directly has computational complexity and memory overhead of O(n²), which would limit the application of the present invention to large-scale sample sets. The expression can, however, be rewritten so that L = D − A never has to be formed explicitly: because A = Ȳ^T Ȳ, where Ȳ is the column-normalized label matrix, the invention predefines constants built from Ȳ and the degree vector, so that every product with A is computed as two thin matrix products and every product with D as a row-wise scaling. With these precomputed constants, the computational complexity and memory overhead of solving for P_1 are reduced from quadratic to linear in n;
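The complexity reduction of Step 71 — evaluating products with the Laplacian through a low-rank factorization of the adjacency matrix instead of forming the n×n matrices — can be sketched as follows; the factor `S` (e.g. row-normalized labels, so that A = S Sᵀ) stands in for the patent's predefined constants:

```python
import numpy as np

def laplacian_times(S, V):
    """Compute (D - A) @ V without materializing the n x n matrices,
    assuming the adjacency factors as A = S @ S.T with S an n x c matrix.
    Cost drops from O(n^2 k) to O(n c k) for a right factor V (n x k)."""
    deg = S @ (S.T @ np.ones(S.shape[0]))   # row sums of A via the factors
    AV = S @ (S.T @ V)                      # A @ V as two thin products
    return deg[:, None] * V - AV
```

Since c (the number of categories) is tiny compared with n, both products are linear in the number of training samples.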
Step 72: Fix P_1, W, R and B and solve for P_2: analogously to the solution for P_1, a closed-form solution is obtained, and with the same precomputed constants the computational complexity and memory overhead of computing P_2 are likewise reduced to linear in n;
Step 73: Fix P_1, P_2, R and B and solve for W: remove the terms irrelevant to W; taking the derivative of the resulting objective with respect to W and setting it equal to zero yields a closed-form solution for W;
Step 74: Fix P_1, P_2, W and B and solve for R: removing the terms irrelevant to R leaves an orthogonal Procrustes subproblem, which is solved by the singular value decomposition (SVD): writing the SVD of the relevant matrix as M = USV^T, where U is the left singular matrix, V^T is the right singular matrix and S is the singular value matrix, the update is R = UV^T;
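The SVD update of Step 74 is the standard orthogonal Procrustes solution; `rotation_from_svd` and the specific matrix `M` it acts on are illustrative, the patent defining M through its own objective:

```python
import numpy as np

def rotation_from_svd(M):
    """Orthogonal Procrustes update: the orthogonal R maximizing
    tr(R.T @ M) is R = U @ V.T, from the SVD M = U diag(s) V.T."""
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ Vt
```

A quick check of optimality: with R = UVᵀ, tr(RᵀM) equals the sum of M's singular values, which is the maximum attainable over orthogonal matrices.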
Step 75: Fix P_1, P_2, W and R and solve for B: removing the terms irrelevant to B, the subproblem admits the closed-form discrete solution B = sign(Z), where Z collects the remaining real-valued terms of the objective and sign(·) denotes the element-wise sign function;
Step 76: Repeat Steps 71-75 until the algorithm converges or the maximum number of iterations is reached;
Step 8: The user inputs a query sample, which may be an image or a text; extract its features, normalize and zero-center them, and map the features of the sample to the nonlinear space with the Gaussian radial basis function to obtain the representation φ(x_q) of the query sample;
Step 9: Use the learned linear mapping matrix of the corresponding modality together with the rotation matrix to generate the hash code of the query sample, b_q = sign(R^T P_t^T φ(x_q)), where t indexes the modality of the query;
Step 10: Compute the Hamming distances between the hash code of the query sample and the hash codes of the heterogeneous samples in the sample set, sort them in ascending order of Hamming distance, and return the top-ranked samples as the retrieval result.
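Steps 9-10 (query hashing and Hamming ranking) can be sketched as follows; the composition `Phi @ P @ R` and the function names are assumptions consistent with, but not copied from, the patent's formulas:

```python
import numpy as np

def hash_codes(Phi, P, R):
    """Generate r-bit (+-1) codes from kernelized features Phi (n x m)
    with a projection P (m x r) and an orthogonal rotation R (r x r)."""
    return np.sign(Phi @ P @ R)

def hamming_rank(query_code, db_codes, top_k):
    """Rank +-1-valued database codes by Hamming distance to the query
    and return the indices of the top_k nearest."""
    r = db_codes.shape[1]
    dists = (r - db_codes @ query_code) / 2   # Hamming via inner product
    return np.argsort(dists, kind="stable")[:top_k]
```

For ±1 codes the Hamming distance reduces to an inner product, so ranking a whole database is a single matrix-vector multiply followed by a sort.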
Compared with the prior art, the beneficial effects of the present invention are as follows:
1. By introducing predefined constants, the computational complexity and memory overhead of the spectral-embedding-based solution are reduced from quadratic to linear in the number of training samples.
2. Hash codes are learned by preserving intra-modal and inter-modal semantic similarity together with label-based similarity, which improves the quality of the hash codes.
3. An orthogonal rotation matrix is learned in a supervised manner to reduce the quantization error, which further enhances the discriminative power of the hash codes and improves the performance of the algorithm.
Brief Description of the Drawings
FIG. 1 is a flow chart of the steps of the efficient supervised graph embedding cross-media hash retrieval method for image-text samples according to the present invention.
Detailed Description
In order to describe the technical solutions of the present invention more completely and clearly, the present invention is described in further detail below with reference to specific embodiments. It should be understood that the embodiments described herein only illustrate and explain the present invention and are not intended to limit its scope of protection.
In the efficient supervised graph embedding cross-media hash retrieval method for image-text samples of the present invention, image and text samples are collected on the Internet, and image and text samples originating from the same web page form sample pairs, establishing an image-text sample set; the category of each sample pair is labeled, and the set is divided into a training set and a test set. The features of all image and text samples in the training and test sets are extracted and mapped to a nonlinear space with the radial basis Gaussian kernel function. The class labels of the sample pairs are used to construct the graph adjacency matrix of the sample pairs, from which the Laplacian matrix of the graph is further obtained. A linear mapping projects the class labels into a latent semantic space, and in this space linear mapping matrices are learned for the image and text modalities separately by preserving the inter-modal and intra-modal semantic similarity of the image and text samples; the quantization error is minimized by learning an orthogonal rotation matrix. An efficient discrete iterative optimization algorithm is proposed which, by predefining several constants, avoids solving with the Laplacian matrix directly, improving the efficiency of the algorithm, and which directly yields the discrete solution of the hash codes. By learning the hash codes while preserving the intra-modal and inter-modal semantic similarity of the image and text samples together with category-label-based similarity and minimizing the quantization error, the present invention improves the retrieval performance of the algorithm.
Referring to FIG. 1, an efficient supervised graph embedding cross-media hash retrieval method for image-text samples is characterized in that the following steps are implemented by means of a computer device:
Step 1: Collect image and text samples from the Internet, treat image and text samples belonging to the same web page as image-text sample pairs to form an image-text sample set, label the category of each pair, and randomly select 75% of the image-text sample pairs as the training set, the remainder forming the test set;
Step 2: Extract 150-dimensional texture features for all image samples and 500-dimensional BOW (Bag of Words) features for all text samples, and normalize and zero-center the features;
Step 3: Denote the features of the image-text sample pairs in the training set by X^(1) and X^(2), the features of all image and text samples respectively, with n the number of sample pairs and Y their class labels over c categories; randomly select m = 500 sample pairs (m ≪ n) as anchors, and map the features of the samples to a nonlinear space with the Gaussian radial basis function, where σ is the scale parameter and ‖·‖ denotes the ℓ2 norm;
Step 4: Use the class labels of the image-text sample pairs to construct the graph adjacency matrix A of the sample pairs, defined as A_ij = (y_i^T y_j) / (‖y_i‖ ‖y_j‖), where A_ij denotes the entry in row i and column j of the matrix A and ‖·‖ denotes the ℓ2 norm;
Step 5: Further obtain the Laplacian matrix L = D − A of the graph adjacency matrix A, where D is the diagonal degree matrix with diagonal elements D_ii = Σ_j A_ij;
Step 6: Based on the above variables, construct the objective function of the method by preserving the inter-modal and intra-modal semantic similarity of the sample features and minimizing the quantization error; here α, β, γ, λ, μ and η are weight parameters, P_1 and P_2 are the linear projection matrices learned for the image and text sample modalities, r is the length of the hash code, tr(·) is the trace of a matrix, W is the linear mapping matrix, B is the hash code learned for the image-text sample pairs, R is an orthogonal rotation matrix, I_r is the r×r identity matrix, and a regularization term is included;
Step 7: Solve the objective function with the iterative optimization algorithm: first initialize the iteration counter, the maximum number of iterations, the value of the objective function (a sufficiently large number) and the threshold 0.001; the algorithm comprises the following steps:
(1) Fix P_2, W, R and B and solve for P_1: remove the terms irrelevant to P_1; taking the derivative of the resulting objective with respect to P_1 and setting it equal to zero yields a closed-form solution. Since the Laplacian matrix L is of size n×n, evaluating this solution directly has computational complexity and memory overhead of O(n²), which would limit the application of the present invention to large-scale sample sets; the expression is therefore rewritten, using the constants predefined from the normalized label matrix and the degree vector, so that products with A and D are computed without forming the n×n matrices, reducing the computational complexity and memory overhead of solving for P_1 from quadratic to linear in n;
(2) Fix P_1, W, R and B and solve for P_2: analogously to the solution for P_1, a closed-form solution is obtained, and with the same precomputed constants the complexity and memory overhead of computing P_2 are likewise reduced to linear in n;
(3) Fix P_1, P_2, R and B and solve for W: remove the terms irrelevant to W; taking the derivative of the resulting objective with respect to W and setting it equal to zero yields a closed-form solution for W;
(4) Fix P_1, P_2, W and B and solve for R: removing the terms irrelevant to R leaves an orthogonal Procrustes subproblem, solved by the singular value decomposition (SVD): writing the SVD of the relevant matrix as M = USV^T, where U is the left singular matrix, V^T is the right singular matrix and S is the singular value matrix, the update is R = UV^T;
(5) Fix P_1, P_2, W and R and solve for B: removing the terms irrelevant to B, the subproblem admits the closed-form discrete solution B = sign(Z), where Z collects the remaining real-valued terms of the objective and sign(·) denotes the element-wise sign function;
(6) Calculate the value of the objective function and check whether the change between iterations falls below the threshold or the maximum number of iterations has been reached; if so, stop iterating; otherwise increment the iteration counter and repeat steps (1)-(5);
Step 8: The user inputs a query sample, which may be an image or a text; if an image is input, its 150-dimensional texture features are extracted, and if a text is input, its 500-dimensional BOW features are extracted; the features are normalized and zero-centered and mapped to the nonlinear space with the Gaussian radial basis function to obtain the representation φ(x_q) of the query sample;
Step 9: Use the learned linear mapping matrix of the corresponding modality together with the rotation matrix to generate the hash code of the query sample, b_q = sign(R^T P_t^T φ(x_q));
Step 10: Compute the Hamming distances between the hash code of the query sample and the hash codes of the heterogeneous samples in the sample set, sort them in ascending order of Hamming distance, and return the top-ranked samples as the retrieval result.
This embodiment verifies the effectiveness of the method of the present invention on the public sample set Mirflickr25K, which contains 20015 image-text pairs collected from the social networking site Flickr, covering 24 semantic categories. 75% of the image-text sample pairs are randomly selected as the training set and the remaining 25% as the test set; each image is represented by a 150-dimensional Gist (texture) feature and each text by a 500-dimensional BOW (Bag of Words) feature, and the features are normalized and zero-centered. To evaluate the retrieval performance of the method, mean average precision (MAP@100) is used as the evaluation metric, i.e., MAP is computed over the first 100 returned samples. Table 1 shows the MAP@100 results of the present invention on the Mirflickr25K sample set for the two tasks of image-query-text and text-query-image retrieval at different hash code lengths; the results show that the average precision of the method of the present invention is significantly higher than that of the prior art.
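The MAP@100 metric used in the evaluation can be sketched as follows; this is one common definition of AP@k (precision averaged over the relevant hits within the top k), assumed rather than taken from the patent:

```python
import numpy as np

def average_precision_at_k(relevance, k):
    """AP@k for a binary relevance list in ranked order."""
    rel = np.asarray(relevance[:k], dtype=float)
    if rel.sum() == 0:
        return 0.0
    # precision at each rank, counted only where a relevant item appears
    prec = np.cumsum(rel) / (np.arange(rel.size) + 1)
    return float((prec * rel).sum() / rel.sum())

def map_at_k(relevance_lists, k=100):
    """MAP@k: the mean of AP@k over all queries."""
    return float(np.mean([average_precision_at_k(r, k) for r in relevance_lists]))
```

For example, a ranking whose top two results are both relevant scores AP@4 = 1.0, while a single relevant hit at rank 2 scores 0.5.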
Table 1
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010943065.5A CN112214623A (en) | 2020-09-09 | 2020-09-09 | Image-text sample-oriented efficient supervised image embedding cross-media Hash retrieval method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010943065.5A CN112214623A (en) | 2020-09-09 | 2020-09-09 | Image-text sample-oriented efficient supervised image embedding cross-media Hash retrieval method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112214623A true CN112214623A (en) | 2021-01-12 |
Family
ID=74049225
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010943065.5A Withdrawn CN112214623A (en) | 2020-09-09 | 2020-09-09 | Image-text sample-oriented efficient supervised image embedding cross-media Hash retrieval method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112214623A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113191445A (en) * | 2021-05-16 | 2021-07-30 | 中国海洋大学 | Large-scale image retrieval method based on self-supervision countermeasure Hash algorithm |
CN113407661A (en) * | 2021-08-18 | 2021-09-17 | 鲁东大学 | Discrete hash retrieval method based on robust matrix decomposition |
CN113868366A (en) * | 2021-12-06 | 2021-12-31 | 山东大学 | Streaming data-oriented online cross-modal retrieval method and system |
CN117315687A (en) * | 2023-11-10 | 2023-12-29 | 哈尔滨理工大学 | Image-text matching method for single-class low-information-content data |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107256271A (en) * | 2017-06-27 | 2017-10-17 | 鲁东大学 | Cross-module state Hash search method based on mapping dictionary learning |
CN107729513A (en) * | 2017-10-25 | 2018-02-23 | 鲁东大学 | Discrete supervision cross-module state Hash search method based on semanteme alignment |
CN108595688A (en) * | 2018-05-08 | 2018-09-28 | 鲁东大学 | Across the media Hash search methods of potential applications based on on-line study |
CN109871454A (en) * | 2019-01-31 | 2019-06-11 | 鲁东大学 | A Robust Discretely Supervised Cross-Media Hash Retrieval Method |
CN110110100A (en) * | 2019-05-07 | 2019-08-09 | 鲁东大学 | Across the media Hash search methods of discrete supervision decomposed based on Harmonious Matrix |
- 2020
- 2020-09-09: application CN202010943065.5A filed in China, published as CN112214623A; status: withdrawn (not active)
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107256271A (en) * | 2017-06-27 | 2017-10-17 | 鲁东大学 | Cross-module state Hash search method based on mapping dictionary learning |
CN107729513A (en) * | 2017-10-25 | 2018-02-23 | 鲁东大学 | Discrete supervision cross-module state Hash search method based on semanteme alignment |
CN108595688A (en) * | 2018-05-08 | 2018-09-28 | 鲁东大学 | Across the media Hash search methods of potential applications based on on-line study |
CN109871454A (en) * | 2019-01-31 | 2019-06-11 | 鲁东大学 | A Robust Discretely Supervised Cross-Media Hash Retrieval Method |
CN110110100A (en) * | 2019-05-07 | 2019-08-09 | 鲁东大学 | Across the media Hash search methods of discrete supervision decomposed based on Harmonious Matrix |
Non-Patent Citations (2)
Title |
---|
TAO YAO, LIANSHAN YAN, YILAN MA, HONG YU, QINGTANG SU: "Fast discrete cross-modal hashing with semantic consistency", Neural Networks *
TAO YAO: "Research on Cross-media Retrieval Based on Hashing Methods", China Doctoral Dissertations Full-text Database, Information Science and Technology *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113191445A (en) * | 2021-05-16 | 2021-07-30 | 中国海洋大学 | Large-scale image retrieval method based on self-supervision countermeasure Hash algorithm |
CN113191445B (en) * | 2021-05-16 | 2022-07-19 | 中国海洋大学 | Large-scale Image Retrieval Method Based on Self-Supervised Adversarial Hash Algorithm |
CN113407661A (en) * | 2021-08-18 | 2021-09-17 | 鲁东大学 | Discrete hash retrieval method based on robust matrix decomposition |
CN113868366A (en) * | 2021-12-06 | 2021-12-31 | 山东大学 | Streaming data-oriented online cross-modal retrieval method and system |
CN117315687A (en) * | 2023-11-10 | 2023-12-29 | 哈尔滨理工大学 | Image-text matching method for single-class low-information-content data |
CN117315687B (en) * | 2023-11-10 | 2024-10-08 | 泓柯垚利(北京)劳务派遣有限公司 | Image-text matching method for single-class low-information-content data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107256271B (en) | Cross-modal hash retrieval method based on mapping dictionary learning | |
Kulis et al. | Fast similarity search for learned metrics | |
CN112214623A (en) | Image-text sample-oriented efficient supervised image embedding cross-media Hash retrieval method | |
CN106202256B (en) | Web Image Retrieval Method Based on Semantic Propagation and Hybrid Multi-Instance Learning | |
CN104820696B (en) | A kind of large-scale image search method based on multi-tag least square hash algorithm | |
CN108334574A (en) | A kind of cross-module state search method decomposed based on Harmonious Matrix | |
CN103729428B (en) | Big data classification method and system | |
CN105469096A (en) | Feature bag image retrieval method based on Hash binary code | |
CN109871454B (en) | A Robust Discrete Supervised Cross-media Hashing Retrieval Method | |
CN108510559A (en) | It is a kind of based on have supervision various visual angles discretization multimedia binary-coding method | |
CN114169442B (en) | Remote sensing image small sample scene classification method based on double prototype network | |
CN114329109B (en) | Multimodal retrieval method and system based on weakly supervised hash learning | |
Liu et al. | An indoor scene classification method for service robot Based on CNN feature | |
CN114896434B (en) | Hash code generation method and device based on center similarity learning | |
CN111523586B (en) | Noise-aware-based full-network supervision target detection method | |
Song et al. | Semi-supervised manifold-embedded hashing with joint feature representation and classifier learning | |
CN114357200A (en) | A Cross-modal Hash Retrieval Method Based on Supervised Graph Embedding | |
CN112257716A (en) | A scene text recognition method based on scale adaptation and directional attention network | |
CN112883216B (en) | Semi-supervised image retrieval method and device based on disturbance consistency self-integration | |
CN103279581A (en) | Method for performing video retrieval by compact video theme descriptors | |
CN103605653B (en) | Big data retrieval method based on sparse hash | |
Ding et al. | Weakly-supervised online hashing with refined pseudo tags | |
CN111984800B (en) | Hash cross-modal information retrieval method based on dictionary pair learning | |
CN105808723B (en) | The picture retrieval method hashed based on picture semantic and vision | |
CN112307248A (en) | An image retrieval method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | ||
Application publication date: 20210112 |