CN112214623A

CN112214623A - Image-text sample-oriented efficient supervised image embedding cross-media Hash retrieval method

Info

Publication number: CN112214623A
Application number: CN202010943065.5A
Authority: CN
Inventors: 姚涛; 刘莉; 闫连山; 贺文伟; 崔光海
Original assignee: Yantai Aidian Information Technology Co ltd; Ludong University
Current assignee: Yantai Aidian Information Technology Co ltd; Ludong University
Priority date: 2020-09-09
Filing date: 2020-09-09
Publication date: 2021-01-12

Abstract

The invention relates to the technical field of multimedia, in particular to an image-text sample-oriented efficient supervised image embedding cross-media Hash retrieval method, which comprises the following steps: constructing a picture-text sample pair sample set, and labeling semantic categories of sample pairs; extracting the characteristics of the images and the text samples in the sample set, and mapping the characteristics to a nonlinear space by using a radial basis Gaussian kernel function; constructing a graph adjacency matrix of the sample pairs by using the class labels of the sample pairs to obtain a Laplace matrix; mapping the category labels to a potential semantic space by utilizing linear mapping, and keeping the semantic similarity between the modalities and in the modalities of the image and text samples to respectively learn linear mapping matrixes for the image and text modalities; learning an orthogonal rotation matrix minimization quantization error; a discrete iteration optimization algorithm is provided to obtain a discrete solution of the hash code; the invention utilizes the semantic similarity between the modes and the semantic similarity between the modes of the images and the text samples, the similarity based on the class labels and the minimized quantization error to learn the Hash codes, thereby improving the algorithm retrieval performance.

Description

An efficient supervised graph embedding cross-media hash retrieval method for graphic samples

技术领域technical field

本发明涉及多媒体技术领域，具体为一种面向图文样本的高效监督图嵌入跨媒体哈希检索方法。The invention relates to the technical field of multimedia, in particular to an efficient supervised graph embedding cross-media hash retrieval method oriented to graphic and text samples.

背景技术Background technique

随着网络技术和便携式移动设备的快速发展，越来越多的人习惯于通过网络分享生活中的点滴，例如某人过生日的时候，通过微信、脸谱等社交软件发布生日照片（图像）并描述自己的心情（文本）等，使得网络上的数据爆炸式增长，用户如何在海量数据中查找所需的信息成为一个挑战。一方面，网络上数据量大，而且样本特征的维度通常非常高，甚至可达上万维。传统的检索方法需要计算查询样本与所有待检索样本的距离，例如欧式距离、余弦距离等，这会造成过高的计算复杂度和内存开销。另一方面，网络上数据的模态多，而且各模态表示异构，如何衡量异构样本的相似度成为一个挑战。跨媒体哈希方法可以很好地解决上面两个问题。监督跨媒体哈希方法，可以利用含有高层语义的类别标签学习哈希码，提升了哈希码的区分能力，取得了令人满意的检索性能。但大部分方法存在以下问题，需要更进一步解决：1）大多数方法不能充分利用类别标签提升哈希码的性能，现有方法主要通过保持基于两两相似矩阵的相似度学习哈希码，然而两两相似矩阵不仅会造成类别信息的丢失，而且会导致较高的计算复杂度和内存开销；2）大部分现有的离散哈希方法在优化过程中对哈希码矩阵进行逐位求解，会导致较高的计算复杂度。本发明提出一种面向图文样本的监督图嵌入的高效哈希检索方法，可以有效的解决上面的问题。首先为了更好保持样本的语义相似性，本发明提出同时保持样本的模态间、模态内的语义相似性和基于类别标签的相似性，学习哈希码和线性映射矩阵，并学习一个正交旋转矩阵减少量化误差，进一步提升哈希码的区分能力。然后，提出一种迭代优化算法，不仅可直接得到样本的哈希码闭合的离散解，而且降低了算法的计算复杂度。With the rapid development of network technology and portable mobile devices, more and more people are accustomed to sharing moments in life through the network. Describing one's own mood (text), etc., makes the data on the Internet explode, and it becomes a challenge for users to find the required information in the massive data. On the one hand, the amount of data on the network is large, and the dimension of sample features is usually very high, even reaching tens of thousands of dimensions. The traditional retrieval method needs to calculate the distance between the query sample and all the samples to be retrieved, such as Euclidean distance, cosine distance, etc., which will cause high computational complexity and memory overhead. On the other hand, there are many modalities of data on the network, and each modal represents heterogeneity. How to measure the similarity of heterogeneous samples becomes a challenge. The cross-media hashing method can solve the above two problems well. The supervised cross-media hashing method can learn hash codes by using category labels with high-level semantics, which improves the distinguishing ability of hash codes and achieves satisfactory retrieval performance. However, most methods have the following problems, which need to be further solved: 1) Most methods cannot make full use of category labels to improve the performance of hash codes. Existing methods mainly learn hash codes by maintaining the similarity based on pairwise similarity matrices. The pairwise similarity matrix will not only cause the loss of category information, but also lead to high computational complexity and memory overhead; 2) Most of the existing discrete hashing methods solve the hash code matrix bit by bit in the optimization process, will lead to higher computational complexity. The present invention proposes an efficient hash retrieval method for image-text sample-oriented supervised graph embedding, which can effectively solve the above problems. First of all, in order to better maintain the semantic similarity of samples, the present invention proposes to simultaneously maintain the semantic similarity between modalities and within modalities of samples and the similarity based on class labels, learn hash codes and linear mapping matrices, and learn a positive The cross-rotation matrix reduces the quantization error and further improves the distinguishing ability of the hash code. Then, an iterative optimization algorithm is proposed, which not only can directly obtain the discrete solution of the hash code closure of the sample, but also reduces the computational complexity of the algorithm.

发明内容SUMMARY OF THE INVENTION

本发明的目的是克服现有技术缺陷，提供了一种面向图文样本的高效监督图嵌入跨媒体哈希检索方法，其特征在于其利用计算机装置实现如下步骤：The purpose of the present invention is to overcome the deficiencies in the prior art, and provides a highly efficient supervised graph embedding cross-media hash retrieval method oriented to graphic and text samples, which is characterized in that it utilizes a computer device to implement the following steps:

步骤1、从网络收集图像和文本样本，并将属于同一网页的图像和文本样本作为图文样本对构成图文样本集，标注图文样本对的类别，并将图文样本对划分为训练集和测试集；Step 1. Collect image and text samples from the network, and use image and text samples belonging to the same web page as image-text sample pairs to form a graphic-text sample set, label the categories of graphic-text sample pairs, and divide graphic-text sample pairs into training sets. and test set;

步骤2、提取训练集和测试集中所有图像和文本样本的特征，并对特征进行归一化和去均值；Step 2. Extract the features of all images and text samples in the training set and the test set, and normalize and de-average the features;

步骤3、训练集中的图文样本对的特征用

表示，其中

，

分别为训练集中所有图像样本、文本样本的特征，

，

表示实数，

表示特征的维度，

表示训练集中图文样本对的数量，

表示样本对的类别标签，其中

表示总类别的数量，

表示图文样本对的数量；随机选择

个样本对

作为锚点，其中

，

，利用高斯径向基函数将所有图像样本、文本样本的特征映射到非线性空间：Step 3. The features of the image-text sample pairs in the training set are used

said, of which

,

are the features of all image samples and text samples in the training set, respectively,

,

represents a real number,

represents the dimension of the feature,

represents the number of image-text sample pairs in the training set,

class labels representing pairs of samples, where

represents the number of total categories,

Indicates the number of image-text sample pairs; randomly selected

sample pairs

as an anchor, where

,

, using the Gaussian radial basis function to map the features of all image samples and text samples to the nonlinear space:

其中

为尺度参数，

表示

范数，

表示矩阵或向量的转置；in

is the scale parameter,

express

norm,

represents the transpose of a matrix or vector;

步骤4、利用图文样本对的类别标签构造样本对的图邻接矩阵

，

表示实数，其定义如下：Step 4. Construct the graph adjacency matrix of the sample pair by using the class label of the image-text sample pair

,

represents a real number, which is defined as follows:

其中，

表示矩阵

的第

行第

列的数值，

表示

范数；in,

representation matrix

First

row

the value of the column,

express

norm;

步骤5、进一步得到图邻接矩阵

的Laplace矩阵

，其中

是

的对角阵，其对角元素

；Step 5. Further obtain the graph adjacency matrix

Laplace matrix

,in

Yes

the diagonal matrix of , whose diagonal elements

;

步骤6、基于上述步骤1-步骤5的变量，利用保持样本特征的模态间、模态内语义相似性和最小化量化误差构造本方法的目标函数，其定义如下：Step 6. Based on the variables of the above steps 1 to 5, the objective function of the method is constructed by maintaining the inter-modal and intra-modal semantic similarity of the sample features and minimizing the quantization error, which is defined as follows:

其中

、

、

、

和

为权重参数，

和

分别表示为图像和文本样本模态学习的线性投影矩阵，

表示哈希码的长度，

表示矩阵的迹，

为线性映射矩阵，

为图文样本对学习的哈希码，

为正交旋转矩阵，

表示尺寸为

的单位阵，

表示正则化项；in

,

and

is the weight parameter,

and

are denoted as the linear projection matrices learned by the image and text sample modalities, respectively,

represents the length of the hash code,

represents the trace of the matrix,

is a linear mapping matrix,

is the learned hash code for the image-text sample pair,

is an orthogonal rotation matrix,

Indicates the size of

the unit array,

represents the regularization term;

步骤7、利用迭代优化算法求解目标函数，具体包括以下步骤：Step 7. Use an iterative optimization algorithm to solve the objective function, which specifically includes the following steps:

步骤71、固定

，

，

和

求解

：去除与

无关的项，则目标函数变为：Step 71, fix

,

and

solve

: remove with

irrelevant terms, the objective function becomes:

对上式求关于

的导数，并令其等于0，则可得：Find about the above formula

the derivative of , and make it equal to 0, then we get:

由于Laplace矩阵

的尺寸是

的，所以计算

的计算复杂度和内存开销都为

，限制了本发明在大规模样本集的应用，进一步可将上式改写为：Due to the Laplace matrix

The size is

, so calculate

The computational complexity and memory overhead are both

, which limits the application of the present invention in large-scale sample sets, and the above formula can be further rewritten as:

然而计算

和

的计算复杂度和内存开销仍为

，本发明提出预定义常量

，则

、

，进一步预定义常量

，则

可写为

，而计算

的计算复杂度和内存开销为

；对于

可改写为

，而计算此项的计算复杂度和内存开销为

，因此计算

的复杂度和内存开销都减少为

；However, calculating

and

The computational complexity and memory overhead are still

, the present invention proposes predefined constants

,but

,

, further predefined constants

,but

can be written as

, while computing

The computational complexity and memory overhead of

;for

can be rewritten as

, and the computational complexity and memory overhead of computing this item are

, so calculate

The complexity and memory overhead are reduced to

;

步骤72、固定

，

，

和

求解

：与求解

类似，可得：Step 72, fix

,

and

solve

: and solve

Similarly, you can get:

进一步利用与求解

类似的方法，可将计算

的复杂度和内存开销都减少为

；Further use and solution

A similar method can be used to calculate

The complexity and memory overhead are reduced to

;

步骤73、固定

，

，

和

求解

：去除与

无关的项，则目标函数变为：Step 73, fix

,

and

solve

: remove with

irrelevant terms, the objective function becomes:

对上式求关于

的导数，并令其等于0，则可得：Find about the above formula

the derivative of , and make it equal to 0, then we get:

步骤74、固定

，

，

和

求解

：去除与

无关的项，则目标函数变为：Step 74, fix

,

and

solve

: remove with

irrelevant terms, the objective function becomes:

上式可由奇异值分解（SVD）算法求解，即

，其中

为左奇异矩阵，

为右奇异矩阵，

为奇异值矩阵，则

；The above formula can be solved by the singular value decomposition (SVD) algorithm, namely

,in

is a left singular matrix,

is a right singular matrix,

is a singular value matrix, then

;

步骤75、固定

，

，

和

求解

：去除与

无关的项，则目标函数变为：Step 75, fix

,

and

solve

: remove with

irrelevant terms, the objective function becomes:

可得：Available:

其中

表示符号函数；in

represents a symbolic function;

步骤76、重复步骤71-步骤75，直到算法收敛或达到最大迭代次数；Step 76: Repeat steps 71 to 75 until the algorithm converges or the maximum number of iterations is reached;

步骤8、用户输入查询样本，样本可以为图像也可为文本，提取其特征，将特征进行归一化和去均值，并利用高斯径向基函数将样本的特征映射到非线性空间，得到查询样本的表示

；Step 8. The user inputs a query sample, the sample can be an image or a text, extract its features, normalize and de-average the features, and use the Gaussian radial basis function to map the features of the sample to a nonlinear space to obtain the query representation of the sample

;

步骤9、利用已学习的线性映射函数和旋转矩阵，生成查询样本的哈希码：Step 9. Use the learned linear mapping function and rotation matrix to generate the hash code of the query sample:

；

;

步骤10、计算查询样本与样本集中异构样本哈希码的汉明距离，并按汉明距离从小到大排列，返回前

个样本即为检索结果。Step 10. Calculate the Hamming distance between the query sample and the hash codes of the heterogeneous samples in the sample set, and arrange them according to the Hamming distance from small to large.

Each sample is the search result.

本发明与现有技术相比，其有益效果在于：Compared with the prior art, the present invention has the following beneficial effects:

1、通过引入的常量将基于谱嵌入算法的计算复杂度和内存开销由

降低到

。1. By introducing constants, the computational complexity and memory overhead of the spectral embedding algorithm are reduced by

Reduce to

.

2、利用保持模态内、模态间的语义相似性和基于标签的相似性学习哈希哈希码，提升了哈希码的性能。2. Hash codes are learned by maintaining intra-modal and inter-modal semantic similarity and label-based similarity, which improves the performance of hash codes.

3、利用监督的方式学习一个正交旋转矩阵减少量化误差，进一步增强了哈希码的区分能力，提升了算法的性能。3. Learning an orthogonal rotation matrix in a supervised way reduces quantization error, further enhances the distinguishing ability of hash codes, and improves the performance of the algorithm.

附图说明Description of drawings

图1为本发明面向图文样本的高效监督图嵌入跨媒体哈希检索方法的步骤流程图。FIG. 1 is a flow chart of the steps of the cross-media hash retrieval method for efficient supervised graph embedding for image and text samples according to the present invention.

具体实施方式Detailed ways

为对本发明的技术方案进行更完整、清楚地描述，以下结合具体实施方式对本发明进一步详细描述，应当理解，此处所描述的实施例仅是说明和解释本发明，并不是用于限定本发明的保护范围。In order to describe the technical solutions of the present invention more completely and clearly, the present invention will be described in further detail below with reference to specific embodiments. It should be understood that the embodiments described herein are only for illustrating and explaining the present invention, not for limiting the present invention. protected range.

本发明面向图文样本的高效监督图嵌入跨媒体哈希检索方法，在互联网上收集图像和文本样本，并将来源于同一网页的图像和文本样本构成样本对，建立图文样对本集，标注样本对的类别，并将图文样本集划分为训练集和测试集；提取训练集和测试集中所有图像和文本样本的特征，并利用径向基高斯核函数将其特征映射到非线性空间；利用样本对的类别标签构建样本对的图邻接矩阵，并进一步得到图的Laplace矩阵；利用线性映射将类别标签映射到一个潜在语义空间，并在此空间通过保持图像和文本样本的模态间和模态内的语义相似性为图像和文本模态分别学习线性映射矩阵；通过学习一个正交旋转矩阵最小化量化误差；提出一种高效的离散迭代优化算法，通过预定义几个常量避免直接利用Laplace矩阵求解，提升了算法的高效性，并可直接得到哈希码的离散解；本发明利用保持图像和文本样本的模态内、模态间语义相似性、基于类别标签的相似性和最小化量化误差学习哈希码，提升了算法的检索性能。The present invention is an efficient supervised graph embedding cross-media hash retrieval method for image and text samples, collects images and text samples on the Internet, forms sample pairs from images and text samples from the same web page, establishes a set of image and text sample pairs, and labels them. Classify the sample pair, and divide the image and text sample set into training set and test set; extract the features of all image and text samples in the training set and test set, and use the radial basis Gaussian kernel function to map their features to the nonlinear space; The graph adjacency matrix of the sample pair is constructed by the class label of the sample pair, and the Laplace matrix of the graph is further obtained; the class label is mapped to a latent semantic space using linear mapping, and in this space, the modalities of the image and text samples are maintained by maintaining the sum Semantic similarity within modalities learns linear mapping matrices for image and text modalities separately; minimizes quantization error by learning an orthogonal rotation matrix; proposes an efficient discrete iterative optimization algorithm that avoids direct use by predefining several constants The Laplace matrix solution improves the efficiency of the algorithm, and can directly obtain the discrete solution of the hash code; the present invention maintains the intra-modal and inter-modal semantic similarity of the image and text samples, the similarity based on the category label and the minimum The quantization error is used to learn the hash code, which improves the retrieval performance of the algorithm.

参见图1，一种面向图文样本的高效监督图嵌入跨媒体哈希检索方法，其特征在于其利用计算机装置实现如下步骤：Referring to Fig. 1, an efficient supervised graph embedding cross-media hash retrieval method oriented to graphic and text samples is characterized in that it utilizes a computer device to realize the following steps:

第一步：从网络上收集图像和文本样本，并将属于同一网页的图像和文本样本作为图文样本对构成图文样本集，标注图文样本对的类别，并随机选择75%的图文样本对构成训练集，剩余的构成测试集；Step 1: Collect image and text samples from the Internet, and use image and text samples belonging to the same web page as image-text sample pairs to form a graphic-text sample set, mark the category of graphic-text sample pairs, and randomly select 75% of the graphic-text samples The sample pairs constitute the training set, and the rest constitute the test set;

第二步：提取所有图像样本的150维纹理特征、所有文本样本的500维BOW（Bag OfWords）特征，并对特征进行归一化和去均值；Step 2: Extract 150-dimensional texture features of all image samples and 500-dimensional BOW (Bag OfWords) features of all text samples, and normalize and de-average the features;

第三步：训练集中图文样本对的特征用

表示，其中

，

分别表示训练集中所有图像、文本样本的特征，

，

，

表示样本对的数量，

表示样本对的类别标签，其中

表示样本类别的数量；随机选择500个样本

（其中

）作为锚点，利用高斯径向基函数将样本的特征映射到非线性空间：Step 3: The features of the image-text sample pair in the training set are used

said, of which

,

represent the features of all images and text samples in the training set, respectively,

,

represents the number of sample pairs,

class labels representing pairs of samples, where

Indicates the number of sample categories; 500 samples are randomly selected

(in

) as an anchor point, and use the Gaussian radial basis function to map the features of the sample to the nonlinear space:

其中

，

，

表示

范数；in

,

express

norm;

第四步：利用图文样本对的类别标签构造样本对的图邻接矩阵

，其定义如下：Step 4: Construct the graph adjacency matrix of the sample pair using the class label of the image-text sample pair

, which is defined as follows:

其中，

表示矩阵

的第

行第

列的数值，

表示

范数；in,

representation matrix

First

row

the value of the column,

express

norm;

第五步：进一步得到图邻接矩阵

的Laplace矩阵

，其中

是对角阵，其对角元素

；Step 5: Further get the graph adjacency matrix

Laplace matrix

,in

is a diagonal matrix whose diagonal elements

;

第六步：基于上述的变量，利用保持样本特征的模态间、模态内语义相似性和最小化量化误差构造本方法的目标函数，其定义如下：Step 6: Based on the above variables, the objective function of this method is constructed by maintaining the inter-modal and intra-modal semantic similarity of the sample features and minimizing the quantization error, which is defined as follows:

其中

，

，

，

，

，

，

和

分别表示为图像和文本样本模态学习的线性投影矩阵，

表示哈希码的长度，

表示矩阵的迹，

为线性映射矩阵，

为图文样本对学习的哈希码，

为正交旋转矩阵，

表示尺寸为

的单位阵，

表示正则化项；in

,

and

represents the length of the hash code,

represents the trace of the matrix,

is a linear mapping matrix,

is the learned hash code for the image-text sample pair,

is an orthogonal rotation matrix,

Indicates the size of

the unit array,

represents the regularization term;

第七步：利用迭代优化算法求解目标函数，首先初始化迭代次数

、最大迭代次数

、目标函数的值

（足够大的数）和阈值0.001，具体包括以下步骤：Step 7: Use the iterative optimization algorithm to solve the objective function, first initialize the number of iterations

,The maximum number of iterations

, the value of the objective function

(a large enough number) and a threshold of 0.001, including the following steps:

（1）固定

，

，

和

求解

：去除与

无关的项，则目标函数变为：(1) Fixed

,

and

solve

: remove with

irrelevant terms, the objective function becomes:

对上式求关于

的导数，并令其等于0，则可得：Find about the above formula

the derivative of , and make it equal to 0, then we get:

由于Laplace矩阵

的尺寸是

的，所以计算

的复杂度和内存开销都为

The size is

, so calculate

The complexity and memory overhead of

然而计算

和

的复杂度和内存开销仍为

，本发明提出预定义常量

，则

、

，进一步预定义常量

，则

可写为

，而计算

的复杂度和内存开销为

；对于

可改写为

，而计算此项的复杂度和内存开销为

，因此计算

的复杂度和内存开销都减少为

；However, calculating

and

The complexity and memory overhead are still

, the present invention proposes predefined constants

,but

,

, further predefined constants

,but

can be written as

, while computing

The complexity and memory overhead of

;for

can be rewritten as

, and the complexity and memory overhead of computing this item is

, so calculate

The complexity and memory overhead are reduced to

;

（2）固定

，

和

求解

：与求解

类似，可得：(2) Fixed

,

and

solve

: and solve

Similarly, you can get:

进一步利用与求解

类似的方法，可将计算

的复杂度和内存开销都减少为

；Further use and solution

A similar method can be used to calculate

The complexity and memory overhead are reduced to

;

（3）固定

，

，

和

求解

：去除与

无关的项，则目标函数变为：(3) Fixed

,

and

solve

: remove with

irrelevant terms, the objective function becomes:

对上式求关于

的导数，并令其等于0，则可得：Find about the above formula

the derivative of , and make it equal to 0, then we get:

（4）固定

，

，

和

求解

：去除与

无关的项，则目标函数变为：(4) Fixed

,

and

solve

: remove with

irrelevant terms, the objective function becomes:

上式可由奇异值分解（SVD）算法求解，即

，其中

为左奇异矩阵，

为右奇异矩阵，

为奇异值矩阵，则

,in

is a left singular matrix,

is a right singular matrix,

is a singular value matrix, then

;

（5）固定

，

，

和

求解

：去除与

无关的项，则目标函数变为：(5) Fixed

,

and

solve

: remove with

irrelevant terms, the objective function becomes:

可得：Available:

其中

表示符号函数；in

represents a symbolic function;

（6）计算目标函数的值

，并判断

或

是否成立，如果成立则停止迭代；如果不成立则

、

，并重复执行步骤（1）—（5）；(6) Calculate the value of the objective function

, and judge

or

Whether it is established, if so, stop the iteration; if not, then

,

, and repeat steps (1)-(5);

第八步：用户输入查询样本，也可以为图像也可为文本，如果输入图像则提取其150维的纹理特征，如果输入文本则提取其500维的BOW特征，将特征进行归一化和去均值，并利用高斯径向基函数将样本的特征映射到非线性空间，得到查询样本的表示

；Step 8: The user inputs the query sample, which can be either an image or a text. If an image is input, its 150-dimensional texture features are extracted, and if a text is input, its 500-dimensional BOW features are extracted, and the features are normalized and removed. mean, and use the Gaussian radial basis function to map the features of the sample to the nonlinear space to obtain the representation of the query sample

;

第九步：利用已学习的线性映射函数和旋转矩阵，生成查询样本的哈希码：Step 9: Use the learned linear mapping function and rotation matrix to generate the hash code of the query sample:

；

;

第十步：计算查询样本与样本集中异构样本哈希码的汉明距离，并按汉明距离从小到大排列，返回前

个样本即为检索结果。Step 10: Calculate the Hamming distance between the query sample and the hash codes of the heterogeneous samples in the sample set, and arrange them according to the Hamming distance from small to large.

Each sample is the search result.

本实施例在公开样本集Mirflickr25K上验证本发明方法的有效性，此样本集包含20015个从社交网站Flickr上收集的图像文本对，这些样本对包含24个语义类别；本实施例随机选取75%的图文样本对作为训练集，剩余的25%作为测试集；每张图像表示为150维的Gist特征（纹理特征），文本表示为500维的BOW（Bag Of Words）特征，并对特征做归一化、去均值处理；为评价本发明方法的检索性能，在此用平均准确率（Mean AveragePrecision，MAP@100）作为评价标准，即MAP由前100个返回的样本计算，不同哈希码长在图像检索文本和文本检索图像两个任务上的MAP@100结果，如表1所示为本发明在Mirflickr25K样本集上的MAP@100结果，结果显示本发明方法的检索性能其平均准确率明显高于已有技术。This embodiment verifies the effectiveness of the method of the present invention on the public sample set Mirflickr25K. This sample set includes 20015 image-text pairs collected from the social networking site Flickr, and these sample pairs include 24 semantic categories; 75% of the sample pairs are randomly selected in this embodiment. The image and text sample pairs are used as the training set, and the remaining 25% are used as the test set; each image is represented as a 150-dimensional Gist feature (texture feature), and the text is represented as a 500-dimensional BOW (Bag Of Words) feature. Normalization and de-average processing; in order to evaluate the retrieval performance of the method of the present invention, the average accuracy rate (Mean AveragePrecision, MAP@100) is used as the evaluation standard, that is, the MAP is calculated from the first 100 returned samples, and different hash codes are used. The MAP@100 results on the two tasks of image retrieval text and text retrieval images are shown in Table 1. The MAP@100 results on the Mirflickr25K sample set of the present invention are shown in Table 1. The results show the retrieval performance of the method of the present invention and its average accuracy rate significantly higher than the existing technology.

表 1Table 1

Claims

1. An image-text sample-oriented efficient supervised graph embedding cross-media Hash retrieval method is characterized by comprising the following steps:

step 1, collecting images and text samples from a network, taking the images and the text samples belonging to the same webpage as image-text sample pairs to form an image-text sample set, labeling the types of the image-text sample pairs, and dividing the image-text sample pairs into a training set and a test set;

step 2, extracting the characteristics of all images and text samples in the training set and the test set, and normalizing and removing the mean value of the characteristics;

step 3, feature use of image-text sample pair in training set

Is shown in which

、

Respectively representing the characteristics of all image samples and text samples in the training set,

，

which represents a real number of the digital signal,

the dimensions of the features are represented in a graph,

representing the number of pairs of teletext samples in the training set,

class labels representing pairs of samples, wherein

The total number of categories is represented as,

representing the number of pairs of teletext samples; random selection

A sample pair

As anchor points, wherein

，

Mapping the characteristics of all image samples and text samples to a nonlinear space by using a Gaussian radial basis function:

wherein

In order to be a scale parameter,

to represent

The norm of the number of the first-order-of-arrival,

represents a transpose of a matrix or vector;

step 4, constructing a graph adjacency matrix of the sample pairs by using the class labels of the image-text sample pairs

，

Represents a real number, which is defined as follows:

wherein,

representation matrix

To (1) a

Go to the first

The values of the columns are such that,

to represent

A norm;

step 5, constructing a graph adjacency matrix

Is the Laplace matrix

Wherein

Is that

Diagonal matrix of (2), diagonal elements thereof

；

Step 6, combining the steps 1 to 5, constructing a target function of the method by using the inter-modal and intra-modal semantic similarity and the minimized quantization error which keep the characteristics of the samples;

7, solving an objective function by using an iterative optimization algorithm;

step 8, inputting a query sample by a user, extracting the characteristics of the query sample, normalizing and removing the mean value of the characteristics, and mapping the characteristics of the sample to a nonlinear space by using a Gaussian radial basis function to obtain the representation of the query sample

；

Step 9, generating a hash code of the query sample by utilizing the learned linear mapping function and the rotation matrix;

step 10, calculating Hamming distances of the hash codes of the query samples and the heterogeneous samples in the sample set, arranging the Hamming distances from small to large, and returning to the previous step

And obtaining the retrieval result by the sample.

2. The method for retrieving the cross-media hash of the embedded efficient supervision map for the image-text sample as claimed in claim 1, wherein the objective function in step 6 is defined as follows:

wherein

、

、

、

And

in order to be a weight parameter, the weight parameter,

and

respectively expressed as linear projection matrices learned for image sample and text sample modalities,

which indicates the length of the hash code and,

the traces of the matrix are represented by,

in order to be a linear mapping matrix, the mapping matrix is,

for the learned hash code of the image-text sample pair,

is an orthogonal rotation matrix and is characterized in that,

expressed in size of

The unit matrix of (a) is obtained,

a regularization term is represented.

3. The method for retrieving the cross-media hash embedded in the efficient supervision map facing the image-text sample as claimed in claim 1 or 2, wherein the step 7 of solving the objective function specifically comprises the following steps:

step 71, fixing