CN112214623A - Image-text sample-oriented efficient supervised image embedding cross-media Hash retrieval method - Google Patents

Image-text sample-oriented efficient supervised image embedding cross-media Hash retrieval method Download PDF

Info

Publication number
CN112214623A
Authority
CN
China
Prior art keywords
sample
image
matrix
text
samples
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010943065.5A
Other languages
Chinese (zh)
Inventor
姚涛
刘莉
闫连山
贺文伟
崔光海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yantai Aidian Information Technology Co ltd
Ludong University
Original Assignee
Yantai Aidian Information Technology Co ltd
Ludong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yantai Aidian Information Technology Co ltd, Ludong University filed Critical Yantai Aidian Information Technology Co ltd
Priority to CN202010943065.5A priority Critical patent/CN112214623A/en
Publication of CN112214623A publication Critical patent/CN112214623A/en
Withdrawn legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31Indexing; Data structures therefor; Storage structures
    • G06F16/316Indexing structures
    • G06F16/325Hash tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/41Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53Querying
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the field of multimedia technology, and in particular to an efficient supervised graph embedding cross-media hash retrieval method for image-text samples, comprising the following steps: constructing a set of image-text sample pairs and labeling the semantic categories of the pairs; extracting the features of the image and text samples in the set and mapping the features to a nonlinear space with a radial basis Gaussian kernel function; constructing a graph adjacency matrix of the sample pairs from their class labels and deriving the corresponding Laplacian matrix; mapping the class labels into a latent semantic space by a linear mapping, and learning a linear projection matrix for the image modality and one for the text modality while preserving the inter-modal and intra-modal semantic similarity of the image and text samples; learning an orthogonal rotation matrix to minimize the quantization error; and applying a discrete iterative optimization algorithm to obtain a discrete solution for the hash codes. The invention learns hash codes by preserving the inter-modal and intra-modal semantic similarity of the image and text samples together with class-label-based similarity while minimizing the quantization error, which improves the retrieval performance of the algorithm.

Description

An efficient supervised graph embedding cross-media hash retrieval method for image-text samples

Technical Field

The invention relates to the field of multimedia technology, and in particular to an efficient supervised graph embedding cross-media hash retrieval method for image-text samples.

Background

With the rapid development of network technology and portable mobile devices, more and more people are accustomed to sharing moments of their lives online; for example, on a birthday a user may post birthday photos (images) on social software such as WeChat or Facebook and describe their mood (text). Data on the Internet therefore grows explosively, and finding the required information in such massive data becomes a challenge for users. On the one hand, the amount of data on the network is large, and the dimensionality of sample features is usually very high, even reaching tens of thousands of dimensions. Traditional retrieval methods need to compute the distance between the query sample and all samples to be retrieved, for example the Euclidean or cosine distance, which causes excessive computational complexity and memory overhead. On the other hand, data on the network come in many modalities with heterogeneous representations, and measuring the similarity of heterogeneous samples becomes a challenge. Cross-media hashing can address both problems well. Supervised cross-media hashing methods can learn hash codes from category labels that carry high-level semantics, which improves the discriminative power of the hash codes and achieves satisfactory retrieval performance. However, most methods still have the following problems: 1) most methods cannot fully exploit category labels to improve the performance of hash codes; existing methods mainly learn hash codes by preserving similarity based on pairwise similarity matrices, yet a pairwise similarity matrix not only loses category information but also leads to high computational complexity and memory overhead; 2) most existing discrete hashing methods solve the hash-code matrix bit by bit during optimization, which leads to high computational complexity. The present invention proposes an efficient supervised graph embedding hash retrieval method for image-text samples that effectively solves the above problems. First, to better preserve the semantic similarity of the samples, the invention simultaneously preserves the inter-modal and intra-modal semantic similarity of the samples and the class-label-based similarity, learns the hash codes and the linear projection matrices, and learns an orthogonal rotation matrix to reduce the quantization error and further improve the discriminative power of the hash codes. Then, an iterative optimization algorithm is proposed that not only directly yields a closed-form discrete solution for the hash codes of the samples but also reduces the computational complexity of the algorithm.

Summary of the Invention

The purpose of the present invention is to overcome the deficiencies of the prior art by providing an efficient supervised graph embedding cross-media hash retrieval method for image-text samples, characterized in that the following steps are implemented on a computer device:

Step 1: Collect image and text samples from the network, take the image and text samples belonging to the same web page as image-text sample pairs to form an image-text sample set, label the categories of the image-text sample pairs, and divide the image-text sample pairs into a training set and a test set;

Step 2: Extract the features of all image and text samples in the training set and the test set, and normalize and de-mean the features;

Step 3: Denote the features of the image-text sample pairs in the training set by X = {X^(1), X^(2)}, where X^(1) and X^(2) are the features of all image samples and of all text samples in the training set, respectively, X^(t) ∈ ℝ^(d_t × n) (t = 1, 2), ℝ denotes the real numbers, d_t denotes the feature dimension, and n denotes the number of image-text sample pairs in the training set; Y ∈ ℝ^(c × n) denotes the class labels of the sample pairs, where c denotes the total number of categories and n denotes the number of image-text sample pairs. Randomly select m sample pairs as anchors {a_1^(t), …, a_m^(t)} (t = 1, 2), and map the features of all image samples and text samples to a nonlinear space with the Gaussian radial basis function:

φ(x^(t)) = [exp(−‖x^(t) − a_1^(t)‖₂² / (2σ²)), …, exp(−‖x^(t) − a_m^(t)‖₂² / (2σ²))]ᵀ

where σ is a scale parameter, ‖·‖₂ denotes the ℓ2 norm, and ᵀ denotes the transpose of a matrix or vector;
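
The anchor-based kernel mapping of step 3 can be illustrated with a short sketch. This is a minimal illustration assuming NumPy arrays with samples stored as columns; the anchor count, scale parameter and array names are chosen for illustration only and are not taken from the patent.

```python
import numpy as np

def rbf_anchor_features(X, anchors, sigma):
    """Map d x n features to an m x n nonlinear representation using
    Gaussian RBF similarities to m anchor samples (columns of `anchors`)."""
    # Squared Euclidean distances between every sample and every anchor.
    sq_dists = (
        np.sum(anchors ** 2, axis=0)[:, None]      # m x 1
        + np.sum(X ** 2, axis=0)[None, :]          # 1 x n
        - 2.0 * anchors.T @ X                      # m x n
    )
    return np.exp(-sq_dists / (2.0 * sigma ** 2))  # m x n kernelized features

# Toy usage: 5000 training pairs, 150-dim image features, 500 random anchors.
rng = np.random.default_rng(0)
X_img = rng.standard_normal((150, 5000))
anchor_idx = rng.choice(5000, size=500, replace=False)
phi_img = rbf_anchor_features(X_img, X_img[:, anchor_idx], sigma=1.0)
print(phi_img.shape)  # (500, 5000)
```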

Step 4: Construct the graph adjacency matrix S ∈ ℝ^(n × n) of the sample pairs from the class labels of the image-text sample pairs, defined entry-wise as

S_ij = (y_iᵀ y_j) / (‖y_i‖₂ ‖y_j‖₂)

where S_ij denotes the value in row i, column j of the matrix S, y_i denotes the class-label vector of the i-th sample pair, and ‖·‖₂ denotes the ℓ2 norm;

Step 5: Further obtain the Laplacian matrix L = D − S of the graph adjacency matrix S, where D is a diagonal matrix whose diagonal elements are D_ii = Σ_j S_ij;
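
As an illustration of steps 4 and 5, the following sketch builds a label-based adjacency matrix and its Laplacian. The cosine-of-labels form of S follows the reconstruction above and is treated here as an assumption; all names are illustrative.

```python
import numpy as np

def label_graph_laplacian(Y):
    """Y: c x n multi-label matrix (one column per image-text pair).
    Returns the adjacency S and Laplacian L = D - S used in steps 4-5,
    with S taken as the cosine similarity of label vectors."""
    norms = np.linalg.norm(Y, axis=0, keepdims=True)       # 1 x n
    Y_hat = Y / np.maximum(norms, 1e-12)                    # l2-normalized labels
    S = Y_hat.T @ Y_hat                                     # n x n adjacency
    D = np.diag(S.sum(axis=1))                              # degree matrix
    return S, D - S                                         # adjacency, Laplacian

# Toy usage: 6 pairs, 3 categories.
Y = np.array([[1, 0, 1, 0, 1, 0],
              [0, 1, 1, 0, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
S, L = label_graph_laplacian(Y)
print(S.round(2))
```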

Step 6: Based on the variables of steps 1 to 5 above, construct the objective function of the method by preserving the inter-modal and intra-modal semantic similarity of the sample features and minimizing the quantization error.

[Objective function formula: rendered as an image in the source document.]

Here α, β, γ, λ, μ and η are weight parameters, W_1 and W_2 denote the linear projection matrices learned for the image modality and the text modality respectively, r denotes the length of the hash code, tr(·) denotes the trace of a matrix, P is a linear mapping matrix, B denotes the hash codes learned for the image-text sample pairs, R is an orthogonal rotation matrix, I_r denotes the identity matrix of size r × r, and a regularization term is included;

Step 7: Solve the objective function with an iterative optimization algorithm, which specifically comprises the following steps:

Step 71: Fix W_2, P, B and R and solve for W_1. Removing the terms unrelated to W_1, the objective function reduces to a sub-problem in W_1 alone (the formula is rendered as an image in the source); taking the derivative of this sub-problem with respect to W_1 and setting it equal to 0 yields a closed-form solution for W_1 (the expression is rendered as an image in the source). Since the Laplacian matrix L is of size n × n, the computational complexity and memory overhead of evaluating the term φ(X^(1)) L φ(X^(1))ᵀ that appears in this solution both scale with n², which limits the application of the invention to large-scale sample sets; the solution is therefore rewritten using L = D − S. However, computing the resulting terms involving D and S directly still scales with n², so the invention predefines a constant matrix that pre-aggregates the contribution of the diagonal matrix D, and a further predefined constant matrix that pre-aggregates the contribution of the adjacency matrix S through the class-label matrix, so that each term becomes a product of small precomputed matrices; with these predefined constants, the computational complexity and memory overhead of computing φ(X^(1)) L φ(X^(1))ᵀ are both reduced to scale linearly with n;
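
The complexity reduction in step 71 can be illustrated as follows. Because S is built from the class labels, the expensive n × n Laplacian never needs to be formed explicitly: the quadratic form can be accumulated from the degree vector and the low-rank label factorization. This sketch assumes the cosine-of-labels adjacency used above; the precomputed matrices stand in for the patent's predefined constants, and their exact definitions are an assumption.

```python
import numpy as np

def graph_embedding_term(phi, Y):
    """Compute phi @ (D - S) @ phi.T without materializing the n x n matrices,
    assuming S = Y_hat.T @ Y_hat (cosine of l2-normalized label vectors)."""
    Y_hat = Y / np.maximum(np.linalg.norm(Y, axis=0, keepdims=True), 1e-12)
    deg = Y_hat.T @ (Y_hat @ np.ones(Y.shape[1]))    # row sums of S, length n
    phi_D_phiT = (phi * deg) @ phi.T                 # phi @ D @ phi.T
    C = phi @ Y_hat.T                                # m x c precomputed constant
    return phi_D_phiT - C @ C.T                      # phi @ (D - S) @ phi.T

# Cost is linear in n: only m x n and c x n products appear.
rng = np.random.default_rng(1)
phi = rng.standard_normal((500, 5000))
Y = (rng.random((24, 5000)) < 0.2).astype(float)
M = graph_embedding_term(phi, Y)
print(M.shape)  # (500, 500)
```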

Step 72: Fix W_1, P, B and R and solve for W_2. Analogously to the solution for W_1, a closed-form solution for W_2 is obtained (the expression is rendered as an image in the source), and with the same technique as used for W_1 the computational complexity and memory overhead of computing φ(X^(2)) L φ(X^(2))ᵀ are likewise reduced to scale linearly with n;

Step 73: Fix W_1, W_2, B and R and solve for P. Removing the terms unrelated to P, the objective function reduces to a sub-problem in P alone (the formula is rendered as an image in the source); taking its derivative with respect to P and setting it equal to 0 yields a closed-form solution for P (the expression is rendered as an image in the source);

Step 74: Fix W_1, W_2, P and B and solve for R. Removing the terms unrelated to R, the objective function reduces to a sub-problem in R alone (the formula is rendered as an image in the source), which can be solved by the singular value decomposition (SVD) algorithm: the relevant matrix is decomposed into a left singular matrix, a singular value matrix and a right singular matrix, and R is then obtained from the left and right singular matrices (the exact expressions are rendered as images in the source);
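
The SVD update in step 74 has the familiar orthogonal Procrustes form. The sketch below shows that form under the assumption that the sub-problem amounts to finding the orthogonal R that best aligns a real-valued projection Z with the current hash codes B; the patent's exact matrices are rendered as images in the source, so the inputs here are illustrative.

```python
import numpy as np

def update_rotation(B, Z):
    """Orthogonal Procrustes-style update: find orthogonal R minimizing ||B - R Z||_F,
    assuming B (r x n) are current hash codes and Z (r x n) a real-valued projection."""
    U, _, Vt = np.linalg.svd(B @ Z.T)   # SVD of the r x r cross-covariance
    return U @ Vt                        # orthogonal rotation matrix

rng = np.random.default_rng(2)
Z = rng.standard_normal((32, 1000))
B = np.sign(rng.standard_normal((32, 1000)))
R = update_rotation(B, Z)
print(np.allclose(R @ R.T, np.eye(32)))  # True: R is orthogonal
```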

Step 75: Fix W_1, W_2, P and R and solve for B. Removing the terms unrelated to B, the objective function reduces to a sub-problem in B alone (the formula is rendered as an image in the source), whose solution is obtained directly with the element-wise sign function (the argument is rendered as an image in the source), where sign(·) denotes the sign function;

Step 76: Repeat steps 71 to 75 until the algorithm converges or the maximum number of iterations is reached;

Step 8: The user inputs a query sample, which may be an image or a text; extract its features, normalize and de-mean the features, and map the features of the sample to the nonlinear space with the Gaussian radial basis function to obtain the representation of the query sample;

Step 9: Generate the hash code of the query sample from the learned linear projection matrix and the rotation matrix (the generating expression is rendered as an image in the source);

Step 10: Compute the Hamming distances between the hash code of the query sample and the hash codes of the heterogeneous samples in the sample set, sort them in ascending order of Hamming distance, and return the top K samples as the retrieval result.
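
Steps 9 and 10 amount to binarizing the query's projected feature and ranking database codes by Hamming distance. The sketch below assumes the query code is obtained as sign(R @ W @ phi_query), which is one natural reading of step 9 (the patent's exact expression is rendered as an image in the source); names and shapes are illustrative.

```python
import numpy as np

def query_hash_code(phi_q, W, R):
    """phi_q: m-dim kernelized query feature; W: r x m projection; R: r x r rotation.
    Assumed form of the query code: sign(R @ W @ phi_q)."""
    return np.sign(R @ (W @ phi_q))

def hamming_rank(b_query, B_db, top_k=100):
    """Return indices of the top_k database codes closest in Hamming distance.
    Codes are in {-1, +1}, so distance = (r - <b_q, b_i>) / 2."""
    r = B_db.shape[0]
    dists = (r - b_query @ B_db) / 2.0
    return np.argsort(dists, kind="stable")[:top_k]

rng = np.random.default_rng(3)
B_db = np.sign(rng.standard_normal((32, 10000)))   # database hash codes
b_q = np.sign(rng.standard_normal(32))             # a query hash code
print(hamming_rank(b_q, B_db, top_k=5))
```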

Compared with the prior art, the present invention has the following beneficial effects:

1. By introducing predefined constants, the computational complexity and memory overhead of the graph (spectral) embedding based solution are reduced from scaling quadratically with the number of training pairs to scaling linearly with it.

2. Hash codes are learned by preserving intra-modal and inter-modal semantic similarity together with label-based similarity, which improves the performance of the hash codes.

3. An orthogonal rotation matrix is learned in a supervised manner to reduce the quantization error, which further enhances the discriminative power of the hash codes and improves the performance of the algorithm.

Description of the Drawings

FIG. 1 is a flow chart of the steps of the efficient supervised graph embedding cross-media hash retrieval method for image-text samples according to the present invention.

Detailed Description of the Embodiments

In order to describe the technical solutions of the present invention more completely and clearly, the invention is described in further detail below with reference to specific embodiments. It should be understood that the embodiments described herein are only intended to illustrate and explain the present invention and are not intended to limit its scope of protection.

The efficient supervised graph embedding cross-media hash retrieval method for image-text samples of the present invention collects image and text samples on the Internet, forms sample pairs from the image and text samples originating from the same web page to establish an image-text sample-pair set, labels the categories of the sample pairs, and divides the image-text sample set into a training set and a test set; extracts the features of all image and text samples in the training set and the test set and maps the features to a nonlinear space with a radial basis Gaussian kernel function; constructs the graph adjacency matrix of the sample pairs from the class labels of the sample pairs and further obtains the Laplacian matrix of the graph; maps the class labels to a latent semantic space with a linear mapping and, in this space, learns a linear projection matrix for the image modality and one for the text modality by preserving the inter-modal and intra-modal semantic similarity of the image and text samples; minimizes the quantization error by learning an orthogonal rotation matrix; and proposes an efficient discrete iterative optimization algorithm that avoids solving with the Laplacian matrix directly by predefining several constants, which improves the efficiency of the algorithm and directly yields a discrete solution for the hash codes. The invention learns the hash codes by preserving the intra-modal and inter-modal semantic similarity of the image and text samples and the class-label-based similarity while minimizing the quantization error, which improves the retrieval performance of the algorithm.

Referring to FIG. 1, an efficient supervised graph embedding cross-media hash retrieval method for image-text samples is characterized in that the following steps are implemented on a computer device:

Step 1: Collect image and text samples from the Internet, take the image and text samples belonging to the same web page as image-text sample pairs to form an image-text sample set, label the categories of the image-text sample pairs, randomly select 75% of the image-text sample pairs to constitute the training set, and use the remaining pairs as the test set;

Step 2: Extract 150-dimensional texture features for all image samples and 500-dimensional BOW (Bag of Words) features for all text samples, and normalize and de-mean the features;
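
The normalization and mean-removal mentioned in Step 2 can be sketched as follows; this is a minimal illustration under the assumption that each feature vector is ℓ2-normalized and the training-set mean is then subtracted, with array names chosen for illustration.

```python
import numpy as np

def normalize_and_demean(X_train, X_test):
    """X_*: d x n feature matrices (one column per sample).
    l2-normalize each column, then subtract the training-set mean."""
    def l2norm(X):
        return X / np.maximum(np.linalg.norm(X, axis=0, keepdims=True), 1e-12)
    X_train, X_test = l2norm(X_train), l2norm(X_test)
    mean = X_train.mean(axis=1, keepdims=True)
    return X_train - mean, X_test - mean

rng = np.random.default_rng(4)
tr, te = normalize_and_demean(rng.random((500, 8000)), rng.random((500, 2000)))
print(tr.mean(axis=1)[:3])  # approximately zero per dimension
```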

Step 3: The features of the image-text sample pairs in the training set are denoted X = {X^(1), X^(2)}, where X^(1) and X^(2) denote the features of all image samples and of all text samples in the training set, respectively, n denotes the number of sample pairs, and Y denotes the class labels of the sample pairs, where c denotes the number of sample categories; 500 sample pairs are randomly selected as anchors, and the features of the samples are mapped to a nonlinear space with the Gaussian radial basis function defined in Step 3 above, where σ is the scale parameter and ‖·‖₂ denotes the ℓ2 norm;

Step 4: Construct the graph adjacency matrix S of the sample pairs from the class labels of the image-text sample pairs as defined in Step 4 above, where S_ij denotes the value in row i, column j of the matrix S and ‖·‖₂ denotes the ℓ2 norm;

Step 5: Further obtain the Laplacian matrix L = D − S of the graph adjacency matrix S, where D is a diagonal matrix whose diagonal elements are D_ii = Σ_j S_ij;

Step 6: Based on the above variables, construct the objective function of the method by preserving the inter-modal and intra-modal semantic similarity of the sample features and minimizing the quantization error.

[Objective function formula: rendered as an image in the source document.]

Here α, β, γ, λ, μ and η are weight parameters, W_1 and W_2 denote the linear projection matrices learned for the image modality and the text modality respectively, r denotes the length of the hash code, tr(·) denotes the trace of a matrix, P is a linear mapping matrix, B denotes the hash codes learned for the image-text sample pairs, R is an orthogonal rotation matrix, I_r denotes the identity matrix of size r × r, and a regularization term is included;

Step 7: Solve the objective function with an iterative optimization algorithm. First initialize the iteration counter, the maximum number of iterations, the value of the objective function (to a sufficiently large number) and the convergence threshold 0.001; the solution then comprises the following steps:

(1) Fix W_2, P, B and R and solve for W_1: removing the terms unrelated to W_1, the objective function reduces to a sub-problem in W_1 alone (the formula is rendered as an image in the source); taking the derivative of this sub-problem with respect to W_1 and setting it equal to 0 gives a closed-form solution. Since the Laplacian matrix L is of size n × n, the complexity and memory overhead of computing the term φ(X^(1)) L φ(X^(1))ᵀ in this solution both scale with n², which limits the application of the invention to large-scale sample sets; the solution is therefore rewritten with L = D − S. Computing the resulting terms involving D and S directly still scales with n², so constant matrices are predefined that pre-aggregate the contribution of the diagonal matrix D and, through the class-label matrix, the contribution of the adjacency matrix S; with these predefined constants each term becomes a product of small precomputed matrices, and the complexity and memory overhead of computing φ(X^(1)) L φ(X^(1))ᵀ are reduced to scale linearly with n;

(2) Fix W_1, P, B and R and solve for W_2: analogously to the solution for W_1, a closed-form solution is obtained (the expression is rendered as an image in the source), and with the same technique as used for W_1 the complexity and memory overhead of computing φ(X^(2)) L φ(X^(2))ᵀ are likewise reduced to scale linearly with n;

(3) Fix W_1, W_2, B and R and solve for P: removing the terms unrelated to P, the objective function reduces to a sub-problem in P alone (the formula is rendered as an image in the source); taking its derivative with respect to P and setting it equal to 0 gives a closed-form solution for P (the expression is rendered as an image in the source);

(4) Fix W_1, W_2, P and B and solve for R: removing the terms unrelated to R, the objective function reduces to a sub-problem in R alone (the formula is rendered as an image in the source), which can be solved by the singular value decomposition (SVD) algorithm: the relevant matrix is decomposed into a left singular matrix, a singular value matrix and a right singular matrix, and R is then obtained from the left and right singular matrices;

(5) Fix W_1, W_2, P and R and solve for B: removing the terms unrelated to B, the objective function reduces to a sub-problem in B alone (the formula is rendered as an image in the source), whose solution is obtained with the element-wise sign function, where sign(·) denotes the sign function;

(6) Compute the value of the objective function and check whether the change from the previous objective value is below the threshold or the maximum number of iterations has been reached; if so, stop the iteration; otherwise store the new objective value, increment the iteration counter, and repeat steps (1)-(5);
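
The stopping rule of step (6) can be written as a simple loop skeleton; this sketch assumes the updates of steps (1)-(5) are wrapped in a placeholder function and only illustrates the convergence test with the 0.001 threshold mentioned above.

```python
def optimize(objective, update_steps, max_iter=50, tol=1e-3):
    """Run the alternating updates until the objective changes by less than
    `tol` or `max_iter` iterations are reached.
    `update_steps` performs steps (1)-(5) in place; `objective` returns the
    current objective value."""
    prev = float("inf")            # "a sufficiently large number"
    for it in range(max_iter):
        update_steps()             # steps (1)-(5): W1, W2, P, R, B updates
        cur = objective()
        if abs(prev - cur) < tol:  # converged
            break
        prev = cur
    return it + 1

# Toy usage with a dummy objective that halves each iteration.
state = {"v": 1.0}
def _step(): state["v"] *= 0.5
def _obj(): return state["v"]
print(optimize(_obj, _step))
```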

Step 8: The user inputs a query sample, which may be an image or a text; if an image is input its 150-dimensional texture feature is extracted, and if a text is input its 500-dimensional BOW feature is extracted; the features are normalized and de-meaned, and the features of the sample are mapped to the nonlinear space with the Gaussian radial basis function to obtain the representation of the query sample;

Step 9: Generate the hash code of the query sample from the learned linear projection matrix and the rotation matrix (the generating expression is rendered as an image in the source);

Step 10: Compute the Hamming distances between the hash code of the query sample and the hash codes of the heterogeneous samples in the sample set, sort them in ascending order of Hamming distance, and return the top K samples as the retrieval result.

This embodiment verifies the effectiveness of the method of the present invention on the public dataset MIRFlickr25K, which contains 20015 image-text pairs collected from the social networking site Flickr, annotated with 24 semantic categories. 75% of the image-text sample pairs are randomly selected as the training set and the remaining 25% as the test set; each image is represented by a 150-dimensional GIST (texture) feature and each text by a 500-dimensional BOW (Bag of Words) feature, and the features are normalized and de-meaned. To evaluate the retrieval performance of the method, mean average precision computed over the top 100 returned samples (MAP@100) is used as the evaluation metric. The MAP@100 results for different hash code lengths on the two tasks of retrieving text with an image query and retrieving images with a text query are shown in Table 1; the results show that the retrieval performance of the method of the present invention is clearly higher than that of the prior art.

Table 1

[Table 1 is rendered as an image in the source: MAP@100 results of the proposed method on the MIRFlickr25K dataset for different hash code lengths.]
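
The MAP@100 metric used above can be computed as in the following sketch, using one common convention that normalizes by the number of relevant items retrieved in the top k; treating a retrieved sample as relevant when it shares at least one semantic category with the query is an assumption, as the patent does not spell it out.

```python
import numpy as np

def average_precision_at_k(relevance, k=100):
    """relevance: 1/0 array for the retrieved list, ordered by Hamming distance."""
    rel = np.asarray(relevance[:k], dtype=float)
    if rel.sum() == 0:
        return 0.0
    precision_at_i = np.cumsum(rel) / (np.arange(len(rel)) + 1)
    return float((precision_at_i * rel).sum() / rel.sum())

def mean_average_precision(all_relevance_lists, k=100):
    return float(np.mean([average_precision_at_k(r, k) for r in all_relevance_lists]))

# Toy usage: two queries with their top-ranked relevance judgements.
print(mean_average_precision([[1, 0, 1, 1, 0], [0, 0, 1, 0, 0]], k=5))
```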

Claims (4)

1. An image-text sample-oriented efficient supervised graph embedding cross-media hash retrieval method, characterized by comprising the following steps:
step 1, collecting image and text samples from a network, taking the image and text samples belonging to the same web page as image-text sample pairs to form an image-text sample set, labeling the categories of the image-text sample pairs, and dividing the image-text sample pairs into a training set and a test set;
step 2, extracting the features of all image and text samples in the training set and the test set, and normalizing and de-meaning the features;
step 3, denoting the features of the image-text sample pairs in the training set by X = {X^(1), X^(2)}, wherein X^(1) and X^(2) respectively represent the features of all image samples and all text samples in the training set, X^(t) ∈ ℝ^(d_t × n), ℝ represents the real numbers, d_t represents the feature dimension, n represents the number of image-text sample pairs in the training set, Y represents the class labels of the sample pairs, and c represents the total number of categories; randomly selecting m sample pairs as anchors, and mapping the features of all image samples and text samples to a nonlinear space with a Gaussian radial basis function:
φ(x^(t)) = [exp(−‖x^(t) − a_1^(t)‖₂² / (2σ²)), …, exp(−‖x^(t) − a_m^(t)‖₂² / (2σ²))]ᵀ
wherein σ is a scale parameter, ‖·‖₂ represents the ℓ2 norm, and ᵀ represents the transpose of a matrix or vector;
step 4, constructing a graph adjacency matrix S ∈ ℝ^(n × n) of the sample pairs from the class labels of the image-text sample pairs, wherein S_ij represents the value in row i, column j of the matrix S and is computed from the ℓ2-normalized class-label vectors of sample pairs i and j;
step 5, obtaining the Laplacian matrix L = D − S of the graph adjacency matrix S, wherein D is a diagonal matrix whose diagonal elements are D_ii = Σ_j S_ij;
step 6, combining steps 1 to 5, constructing the objective function of the method by preserving the inter-modal and intra-modal semantic similarity of the sample features and minimizing the quantization error;
step 7, solving the objective function with an iterative optimization algorithm;
step 8, inputting a query sample by a user, extracting its features, normalizing and de-meaning the features, and mapping the features of the sample to the nonlinear space with the Gaussian radial basis function to obtain the representation of the query sample;
step 9, generating a hash code of the query sample from the learned linear projection matrix and the rotation matrix;
step 10, computing the Hamming distances between the hash code of the query sample and the hash codes of the heterogeneous samples in the sample set, sorting them in ascending order of Hamming distance, and returning the top K samples as the retrieval result.
2. The image-text sample-oriented efficient supervised graph embedding cross-media hash retrieval method as claimed in claim 1, wherein the objective function in step 6 is constructed by preserving the inter-modal and intra-modal semantic similarity of the sample features and minimizing the quantization error (the objective function is rendered as an image in the source), wherein α, β, γ, λ, μ and η are weight parameters, W_1 and W_2 respectively represent the linear projection matrices learned for the image sample modality and the text sample modality, r represents the length of the hash code, tr(·) represents the trace of a matrix, P is a linear mapping matrix, B is the hash code learned for the image-text sample pairs, R is an orthogonal rotation matrix, I_r represents the identity matrix of size r × r, and a regularization term is included.
3. The image-text sample-oriented efficient supervised graph embedding cross-media hash retrieval method as claimed in claim 1 or 2, wherein solving the objective function in step 7 specifically comprises the following steps:
step 71, fixing W_2, P, B and R and solving for W_1: removing the terms unrelated to W_1, taking the derivative of the resulting sub-problem with respect to W_1 and setting it equal to 0 to obtain a closed-form solution; since the Laplacian matrix L is an n × n matrix, the computational complexity and memory overhead of evaluating the term involving L both scale with n²; rewriting the solution with L = D − S and predefining constant matrices that pre-aggregate the contributions of D and S, the computational complexity and memory overhead of computing this term are reduced to scale linearly with n;
step 72, fixing W_1, P, B and R and solving for W_2: analogously to the solution for W_1, a closed-form solution is obtained, and with the same technique the computational complexity and memory overhead of computing the corresponding term are likewise reduced to scale linearly with n;
step 73, fixing W_1, W_2, B and R and solving for P: removing the terms unrelated to P, taking the derivative of the resulting sub-problem with respect to P and setting it equal to 0 to obtain a closed-form solution for P;
step 74, fixing W_1, W_2, P and B and solving for R: removing the terms unrelated to R, the resulting sub-problem is solved by the singular value decomposition (SVD) algorithm, wherein the relevant matrix is decomposed into a left singular matrix, a singular value matrix and a right singular matrix, and R is obtained from the left and right singular matrices;
step 75, fixing W_1, W_2, P and R and solving for B: removing the terms unrelated to B, the solution is obtained with the element-wise sign function, wherein sign(·) represents the sign function;
and step 76, repeating steps 71 to 75 until the algorithm converges or the maximum number of iterations is reached.
4. The method as claimed in claim 3, wherein in step 9 the hash code of the query sample is generated from the learned linear projection matrix and the rotation matrix (the exact expression is rendered as an image in the source).
CN202010943065.5A 2020-09-09 2020-09-09 Image-text sample-oriented efficient supervised image embedding cross-media Hash retrieval method Withdrawn CN112214623A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010943065.5A CN112214623A (en) 2020-09-09 2020-09-09 Image-text sample-oriented efficient supervised image embedding cross-media Hash retrieval method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010943065.5A CN112214623A (en) 2020-09-09 2020-09-09 Image-text sample-oriented efficient supervised image embedding cross-media Hash retrieval method

Publications (1)

Publication Number Publication Date
CN112214623A true CN112214623A (en) 2021-01-12

Family

ID=74049225

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010943065.5A Withdrawn CN112214623A (en) 2020-09-09 2020-09-09 Image-text sample-oriented efficient supervised image embedding cross-media Hash retrieval method

Country Status (1)

Country Link
CN (1) CN112214623A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113191445A (en) * 2021-05-16 2021-07-30 中国海洋大学 Large-scale image retrieval method based on self-supervision countermeasure Hash algorithm
CN113407661A (en) * 2021-08-18 2021-09-17 鲁东大学 Discrete hash retrieval method based on robust matrix decomposition
CN113868366A (en) * 2021-12-06 2021-12-31 山东大学 Streaming data-oriented online cross-modal retrieval method and system
CN117315687A (en) * 2023-11-10 2023-12-29 哈尔滨理工大学 Image-text matching method for single-class low-information-content data

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107256271A (en) * 2017-06-27 2017-10-17 鲁东大学 Cross-module state Hash search method based on mapping dictionary learning
CN107729513A (en) * 2017-10-25 2018-02-23 鲁东大学 Discrete supervision cross-module state Hash search method based on semanteme alignment
CN108595688A (en) * 2018-05-08 2018-09-28 鲁东大学 Across the media Hash search methods of potential applications based on on-line study
CN109871454A (en) * 2019-01-31 2019-06-11 鲁东大学 A Robust Discretely Supervised Cross-Media Hash Retrieval Method
CN110110100A (en) * 2019-05-07 2019-08-09 鲁东大学 Across the media Hash search methods of discrete supervision decomposed based on Harmonious Matrix

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107256271A (en) * 2017-06-27 2017-10-17 鲁东大学 Cross-module state Hash search method based on mapping dictionary learning
CN107729513A (en) * 2017-10-25 2018-02-23 鲁东大学 Discrete supervision cross-module state Hash search method based on semanteme alignment
CN108595688A (en) * 2018-05-08 2018-09-28 鲁东大学 Across the media Hash search methods of potential applications based on on-line study
CN109871454A (en) * 2019-01-31 2019-06-11 鲁东大学 A Robust Discretely Supervised Cross-Media Hash Retrieval Method
CN110110100A (en) * 2019-05-07 2019-08-09 鲁东大学 Across the media Hash search methods of discrete supervision decomposed based on Harmonious Matrix

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TAO YAO, LIANSHAN YAN, YILAN MA, HONG YU, QINGTANG SU: "Fast discrete cross-modal hashing with semantic consistency", NEURAL NETWORKS *
TAO YAO: "Research on Cross-media Retrieval Based on Hashing Methods", China Doctoral Dissertations Full-text Database, Information Science and Technology *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113191445A (en) * 2021-05-16 2021-07-30 中国海洋大学 Large-scale image retrieval method based on self-supervision countermeasure Hash algorithm
CN113191445B (en) * 2021-05-16 2022-07-19 中国海洋大学 Large-scale Image Retrieval Method Based on Self-Supervised Adversarial Hash Algorithm
CN113407661A (en) * 2021-08-18 2021-09-17 鲁东大学 Discrete hash retrieval method based on robust matrix decomposition
CN113868366A (en) * 2021-12-06 2021-12-31 山东大学 Streaming data-oriented online cross-modal retrieval method and system
CN117315687A (en) * 2023-11-10 2023-12-29 哈尔滨理工大学 Image-text matching method for single-class low-information-content data
CN117315687B (en) * 2023-11-10 2024-10-08 泓柯垚利(北京)劳务派遣有限公司 Image-text matching method for single-class low-information-content data

Similar Documents

Publication Publication Date Title
CN107256271B (en) Cross-modal hash retrieval method based on mapping dictionary learning
Kulis et al. Fast similarity search for learned metrics
CN112214623A (en) Image-text sample-oriented efficient supervised image embedding cross-media Hash retrieval method
CN106202256B (en) Web Image Retrieval Method Based on Semantic Propagation and Hybrid Multi-Instance Learning
CN104820696B (en) A kind of large-scale image search method based on multi-tag least square hash algorithm
CN108334574A (en) A kind of cross-module state search method decomposed based on Harmonious Matrix
CN103729428B (en) Big data classification method and system
CN105469096A (en) Feature bag image retrieval method based on Hash binary code
CN109871454B (en) A Robust Discrete Supervised Cross-media Hashing Retrieval Method
CN108510559A (en) It is a kind of based on have supervision various visual angles discretization multimedia binary-coding method
CN114169442B (en) Remote sensing image small sample scene classification method based on double prototype network
CN114329109B (en) Multimodal retrieval method and system based on weakly supervised hash learning
Liu et al. An indoor scene classification method for service robot Based on CNN feature
CN114896434B (en) Hash code generation method and device based on center similarity learning
CN111523586B (en) Noise-aware-based full-network supervision target detection method
Song et al. Semi-supervised manifold-embedded hashing with joint feature representation and classifier learning
CN114357200A (en) A Cross-modal Hash Retrieval Method Based on Supervised Graph Embedding
CN112257716A (en) A scene text recognition method based on scale adaptation and directional attention network
CN112883216B (en) Semi-supervised image retrieval method and device based on disturbance consistency self-integration
CN103279581A (en) Method for performing video retrieval by compact video theme descriptors
CN103605653B (en) Big data retrieval method based on sparse hash
Ding et al. Weakly-supervised online hashing with refined pseudo tags
CN111984800B (en) Hash cross-modal information retrieval method based on dictionary pair learning
CN105808723B (en) The picture retrieval method hashed based on picture semantic and vision
CN112307248A (en) An image retrieval method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication
WW01 Invention patent application withdrawn after publication

Application publication date: 20210112