CN109271486B - Similarity-preserving cross-modal Hash retrieval method - Google Patents


Info

Publication number
CN109271486B
Authority
CN
China
Prior art keywords: sample, hash, text, retrieval, similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811097048.3A
Other languages
Chinese (zh)
Other versions
CN109271486A (en)
Inventor
董西伟
杨茂保
孙丽
董小刚
尧时茂
王玉伟
邓安远
邓长寿
Current Assignee
Jiujiang University
Original Assignee
Jiujiang University
Priority date
Filing date
Publication date
Application filed by Jiujiang University filed Critical Jiujiang University
Priority to CN201811097048.3A priority Critical patent/CN109271486B/en
Publication of CN109271486A publication Critical patent/CN109271486A/en
Application granted granted Critical
Publication of CN109271486B publication Critical patent/CN109271486B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

A similarity-preserving cross-modal hash retrieval method, comprising the steps of: (1) constructing an objective function based on a similarity-preservation strategy; (2) solving the objective function; (3) generating binary hash codes for the query sample and the samples in the retrieval sample set; (4) calculating the Hamming distance from the query sample to each sample in the retrieval sample set; (5) completing the retrieval of the query sample using a cross-modal retriever. During hash learning, the method fully preserves the similarity of samples between modalities as well as the similarity of samples within each modality, so that the learned Hamming space has stronger discriminative power, which benefits cross-modal retrieval.

Description

Similarity-preserving cross-modal Hash retrieval method
Technical Field
The invention relates to a similarity-preserving cross-modal Hash retrieval method.
Background
In every industry of modern society, vast amounts of user data have accumulated (for example, the data volume held by the search engine Chrome exceeds 100 PB), and the volume is still growing exponentially: the big-data era has arrived. Big data plays a very important role in industries such as internet finance, medicine, education, the military and transportation; for example, combining big data with machine learning can provide a reliable basis for financial investment and market decisions. Today's big data has the following characteristics: (1) large volume: data volumes reach the petabyte (PB) scale; (2) high dimensionality: data features can have thousands of dimensions; (3) many modalities: data is diverse in type and form, including images, text, audio and video. These characteristics pose serious challenges to machine learning. In the face of this situation, how to use big data reasonably, extract valuable information from it, and provide a basis for practical work is an urgent problem to be solved.
Information retrieval technology can retrieve valuable information for users. Similarity search is a research hotspot in information retrieval, and approximate nearest neighbor (ANN) search has attracted attention for its high search speed. ANN search methods fall mainly into tree-based methods and hash learning methods, each with its own characteristics. Tree-based methods: (1) recursively partition the data in divide-and-conquer fashion; (2) have O(log n) query time complexity; (3) degrade gradually in search performance as data dimensionality increases; (4) must store the tree structure, incurring large storage overhead; (5) must keep the original data in memory at run time, increasing memory cost. Hash learning methods have attractive properties: (1) each item in the database is represented by a short binary string, greatly reducing data storage and memory usage; (2) query time complexity is constant O(1) or sub-linear. Hash learning methods are therefore widely used in practice.
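The storage and query-cost advantages of binary codes can be made concrete with a small illustration (not part of the patent; all names and data are made up): 64-bit codes packed into 8 bytes each and compared by XOR plus popcount.

```python
import numpy as np

def pack_codes(bits):
    """Pack a (n, c) array of +/-1 bits into uint8 words: c=64 bits -> 8 bytes per item."""
    return np.packbits((bits > 0).astype(np.uint8), axis=1)

def hamming_distances(query, database):
    """Hamming distance from one packed query code to all packed database codes."""
    xor = np.bitwise_xor(query[None, :], database)   # bit pattern of disagreements
    return np.unpackbits(xor, axis=1).sum(axis=1)    # popcount per row

rng = np.random.default_rng(0)
db_bits = rng.choice([-1, 1], size=(1000, 64))       # 1000 codes of c = 64 bits
q_bits = db_bits[42]                                 # query identical to item 42

db = pack_codes(db_bits)
q = pack_codes(q_bits[None, :])[0]
d = hamming_distances(q, db)                         # item 42 gets distance 0
```

Compared with storing 64 floating-point features per item, each code occupies only 8 bytes, and ranking reduces to integer bit operations.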
Cross-modal hashing mainly addresses mutual retrieval between multi-modal data, such as searching text with an image or searching images with text. A cross-modal hashing algorithm must hash-encode data of different modalities into compact binary strings and then retrieve data of one modality with queries from another; it must consider not only the correlation between data within the same modality but also the correlation between data of different modalities. In recent years, quite a few cross-modal retrieval hashing methods have been proposed. For example, Bronstein et al. proposed the Cross-Modality Similarity-Sensitive Hashing (CMSSH) method, which regards the hash function corresponding to each code bit as a weak classifier and learns the hash functions with the AdaBoost boosting algorithm. Kumar et al. proposed the Cross-View Hashing (CVH) method, which learns a hash function for each modality by minimizing the difference between semantic similarity and Hamming distance. Song et al. proposed the Inter-Media Hashing (IMH) method, which finds a common Hamming space by maintaining inter-media and intra-media consistency and then learns a hash function for each modality using linear regression. Ding et al. proposed the Collective Matrix Factorization Hashing (CMFH) method, which learns a common semantic representation for the different modalities through collective matrix factorization and then generates a unified binary hash code by quantization.
Zhu et al. proposed the Linear Cross-Modal Hashing (LCMH) method, which applies k-means clustering to the data of each modality to generate k cluster centers, reconstructs the feature space of the data from the distances between in-modality data points and the k centers, and obtains the hash function for each modality by solving for eigenvectors through eigenvalue decomposition. Zhou et al. proposed the Latent Semantic Sparse Hashing (LSSH) method, which combines sparse coding and matrix factorization to learn a common latent semantic representation for the features of the different modalities and then solves the objective function with an iterative optimization algorithm. Zhang et al., with the Semantic Correlation Maximization (SCM) method, learn the hash functions by maximizing semantic correlation and propose an eigendecomposition variant SCM_orth and a sequential learning variant SCM_seq. Lin et al. proposed the Semantics-Preserving Hashing (SePH) method, which transforms the similarity matrix into a probability distribution by minimizing the K-L divergence, performs probability estimation of the binary hash code string of each sample, and then learns the hash function for each modality by kernel-function regression.
When data of the image and text modalities is mapped from the original feature space to another feature space, some characteristics of the original data are inevitably lost. For cross-modal retrieval based on hash learning, effectively retaining and mining the discriminative information of the original data when mapping image-modality and text-modality data from the original feature space to the Hamming space is vital to completing the cross-modal retrieval task. For samples of the image and text modalities, the similarity relations between samples of different modalities and between samples of the same modality are the key factors influencing cross-modal retrieval. Many existing cross-modal hash learning methods do not handle the inter-modal and intra-modal similarity relations well: some attend only to preserving inter-modal similarity, others only to preserving intra-modal similarity. This harms the discriminative power of the learned Hamming space. In addition, many methods do not fully consider the redundancy of the information carried by the individual bits of the hash code, so the learned hash codes are both redundant and insufficiently discriminative. Therefore, simultaneously preserving the inter-modal and intra-modal similarity relations during cross-modal hash learning, while making the information redundancy across the bits of the hash code as small as possible, is very important for improving cross-modal retrieval performance.
Disclosure of Invention
The aim of the invention is to provide a similarity-preserving cross-modal hash retrieval method that addresses two shortcomings of existing methods: the similarity of intra-modal and inter-modal samples is not fully preserved, and the redundant information across the bits of the hash codes is not sufficiently reduced. As a result, the hash codes learned by the proposed method have good discriminative power.
The technical scheme adopted to achieve this aim is a similarity-preserving cross-modal hash retrieval method. Suppose there are n objects whose features in the image modality and the text modality are, respectively,

$$X^{(1)}=\left[x_1^{(1)},x_2^{(1)},\ldots,x_n^{(1)}\right]\in\mathbb{R}^{d_1\times n}\quad\text{and}\quad X^{(2)}=\left[x_1^{(2)},x_2^{(2)},\ldots,x_n^{(2)}\right]\in\mathbb{R}^{d_2\times n},$$

where $d_1$ and $d_2$ denote the dimensions of the image-modality and text-modality feature vectors, and $x_i^{(1)}$ and $x_i^{(2)}$ denote the features of the i-th object in the image modality and the text modality respectively. The feature vectors of both modalities are assumed to have been zero-centered, i.e.

$$\sum_{i=1}^{n}x_i^{(1)}=0,\qquad \sum_{i=1}^{n}x_i^{(2)}=0.$$

Let $L=[l_1,l_2,\ldots,l_n]\in\{0,1\}^{m\times n}$ be the label matrix formed by the class labels of the n objects, where $l_i$ ($i=1,2,\ldots,n$) denotes the category label information of the i-th object and m is the number of categories. Let S be the cross-modal similarity matrix, whose element $S_{ij}$ represents the similarity between the i-th sample in the image modality and the j-th sample in the text modality: $S_{ij}=1$ if the i-th sample in the image modality and the j-th sample in the text modality are similar (i.e. they share at least one category), and $S_{ij}=0$ otherwise. The method comprises the following steps:
(1) Construct an objective function based on a similarity-preservation strategy: using an objective function designed around an inter-modal similarity-preservation strategy and an intra-modal similarity-preservation strategy, obtain the binary hash codes U and V of the image-modality and text-modality feature data of the n objects in the Hamming space, the hash projection matrices P1 and P2 corresponding to the image and text modalities, and two coefficient matrices W1 and W2;
(2) Solve the objective function: in view of the non-convexity of the objective function, obtain the solution U, V, P1, P2, W1 and W2 by alternating updates, i.e. solve the following four subproblems alternately: fix U, V, W1 and W2 and solve for P1 and P2; fix U, V, P1 and P2 and solve for W1 and W2; fix V, P1, P2, W1 and W2 and solve for U; fix U, P1, P2, W1 and W2 and solve for V;
(3) Generate binary hash codes for the query sample and the samples in the retrieval sample set, based on the solved hash projection matrices P1 and P2 of the image and text modalities;
(4) Calculate the Hamming distance from the query sample to each sample in the retrieval sample set, based on the generated binary hash codes;
(5) Complete the retrieval of the query sample using a cross-modal retriever based on approximate nearest neighbor search.
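The alternating structure of steps (1) and (2) can be sketched as a loop. The following is a heavily simplified illustration on synthetic data, not the patent's actual algorithm: it keeps only the closed-form ridge-style updates for the projection and coefficient matrices and replaces the discrete code updates with a naive sign-quantization step, omitting the likelihood, balance and orthogonality terms.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d1, d2, m, c = 200, 20, 15, 5, 16        # illustrative sizes
gamma, alpha = 1e-2, 1.0                    # balance factors (made-up values)

X1 = rng.standard_normal((d1, n)); X1 -= X1.mean(axis=1, keepdims=True)  # zero-centered
X2 = rng.standard_normal((d2, n)); X2 -= X2.mean(axis=1, keepdims=True)
L = np.eye(m)[:, rng.integers(0, m, n)]     # one-hot label matrix, m x n

U = np.sign(rng.standard_normal((c, n)))    # image-modality codes, random init
V = np.sign(rng.standard_normal((c, n)))    # text-modality codes, random init

for _ in range(10):
    # fix U, V, W -> closed-form solves for the projection matrices
    P1 = np.linalg.solve(X1 @ X1.T + gamma * np.eye(d1), X1 @ U.T)
    P2 = np.linalg.solve(X2 @ X2.T + gamma * np.eye(d2), X2 @ V.T)
    # fix U, V, P -> closed-form solves for the coefficient matrices
    W1 = np.linalg.solve(U @ U.T + (gamma / alpha) * np.eye(c), U @ L.T)
    W2 = np.linalg.solve(V @ V.T + (gamma / alpha) * np.eye(c), V @ L.T)
    # fix everything else -> naive re-quantization of the codes (a stand-in
    # for the patent's gradient-based discrete update)
    U = np.sign(P1.T @ X1 + alpha * W1 @ L)
    V = np.sign(P2.T @ X2 + alpha * W2 @ L)
```

The point is only the control flow: each pass solves four easy subproblems in turn while holding the other variables fixed.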
The objective function designed from the inter-modal and intra-modal similarity-preservation strategies in step (1) has the form

$$\min_{U,V,P_1,P_2,W_1,W_2}\;\beta\sum_{i=1}^{n}\sum_{j=1}^{n}\left(\log\left(1+e^{\Theta_{ij}}\right)-S_{ij}\Theta_{ij}\right)+\alpha\left(\|L-W_1^{T}U\|_F^2+\|L-W_2^{T}V\|_F^2\right)+\|U-P_1^{T}X^{(1)}\|_F^2+\|V-P_2^{T}X^{(2)}\|_F^2+\gamma\left(\|P_1\|_F^2+\|P_2\|_F^2+\|W_1\|_F^2+\|W_2\|_F^2\right)+\eta\left(\|U1_{n\times 1}\|_F^2+\|V1_{n\times 1}\|_F^2\right)$$
$$\text{s.t. } UU^{T}=nI,\;VV^{T}=nI,\;U\in\{-1,+1\}^{c\times n},\;V\in\{-1,+1\}^{c\times n},\tag{1}$$

where α, β, γ and η are non-negative balance factors, c is the length of the binary hash codes, I is the identity matrix, $1_{n\times 1}$ denotes a column vector whose elements are all 1, $\Theta_{ij}=\frac{\lambda}{c}\langle u_i,v_j\rangle$, λ > 0 is an adjustable scale factor, $u_i$ is the binary hash code of the i-th sample in the image modality, $v_j$ is the binary hash code of the j-th sample in the text modality, $\|\cdot\|_F$ denotes the Frobenius norm of a matrix, and $(\cdot)^{T}$ denotes matrix transposition.
The solution U, V, P1, P2, W1 and W2 of the objective function is obtained in step (2) by alternately solving the following four subproblems:
(1) Fix U, V, W1 and W2 and solve for P1 and P2. With the binary hash codes U and V and the coefficient matrices W1 and W2 fixed, the objective function in equation (1) reduces to a subproblem in the hash projection matrices P1 and P2:

$$\min_{P_1,P_2}\;\|U-P_1^{T}X^{(1)}\|_F^2+\|V-P_2^{T}X^{(2)}\|_F^2+\gamma\left(\|P_1\|_F^2+\|P_2\|_F^2\right).$$

(2) Fix U, V, P1 and P2 and solve for W1 and W2. With the binary hash codes U and V and the hash projection matrices P1 and P2 fixed, the objective function in equation (1) reduces to a subproblem in the coefficient matrices W1 and W2:

$$\min_{W_1,W_2}\;\alpha\left(\|L-W_1^{T}U\|_F^2+\|L-W_2^{T}V\|_F^2\right)+\gamma\left(\|W_1\|_F^2+\|W_2\|_F^2\right).$$

(3) Fix V, P1, P2, W1 and W2 and solve for U. With the text-modality binary hash code V, the hash projection matrices and the coefficient matrices fixed, the objective function in equation (1) reduces to a subproblem in the image-modality binary hash code U:

$$\min_{U}\;\beta\sum_{i=1}^{n}\sum_{j=1}^{n}\left(\log\left(1+e^{\Theta_{ij}}\right)-S_{ij}\Theta_{ij}\right)+\alpha\|L-W_1^{T}U\|_F^2+\|U-P_1^{T}X^{(1)}\|_F^2+\eta\|U1_{n\times 1}\|_F^2,\quad U\in\{-1,+1\}^{c\times n}.$$

(4) Fix U, P1, P2, W1 and W2 and solve for V. Analogously, the objective function in equation (1) reduces to a subproblem in the text-modality binary hash code V:

$$\min_{V}\;\beta\sum_{i=1}^{n}\sum_{j=1}^{n}\left(\log\left(1+e^{\Theta_{ij}}\right)-S_{ij}\Theta_{ij}\right)+\alpha\|L-W_2^{T}V\|_F^2+\|V-P_2^{T}X^{(2)}\|_F^2+\eta\|V1_{n\times 1}\|_F^2,\quad V\in\{-1,+1\}^{c\times n}.$$
the Hash projection matrix P based on the image mode and the text mode obtained by solving in the step (3)1And P2Generating a binary hash code for the query sample and the samples in the search sample set, in particular, assuming that a feature vector of a query sample of the image modality is
Figure GDA0003292667660000073
The feature vector of a query sample of the text modality is
Figure GDA0003292667660000074
The image mode searches the characteristics of the samples in the sample set as
Figure GDA0003292667660000075
The text modal search sample set is characterized by
Figure GDA0003292667660000076
Wherein the content of the first and second substances,
Figure GDA0003292667660000077
representing the number of samples in the search sample set; the binary hash codes of the query samples in the image mode and the text mode and the binary hash codes of the samples in the retrieval sample set in the image mode and the text mode are respectively as follows:
Figure GDA0003292667660000081
and
Figure GDA0003292667660000082
wherein the content of the first and second substances,
Figure GDA0003292667660000083
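Encoding by projecting and taking signs can be sketched as follows (an illustration with made-up dimensions; the function and variable names are not from the patent):

```python
import numpy as np

def encode(P, X):
    """Binarize projected features: B = sign(P^T X), with sign(0) mapped to +1."""
    B = P.T @ X
    return np.where(B >= 0, 1, -1)

rng = np.random.default_rng(2)
d1, c, nq = 10, 8, 3
P1 = rng.standard_normal((d1, c))          # a learned image-modality projection
Xq = rng.standard_normal((d1, nq))         # three image-modality query features
codes = encode(P1, Xq)                     # c x nq matrix of +/-1 codes
```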
in the step (4), the hamming distance from the query sample to each sample in the retrieval sample set is calculated based on the generated binary hash code, specifically, a formula is used
Figure GDA0003292667660000084
Calculating the Hamming distance from the query sample of the image mode to each sample in the text mode retrieval sample set, using a formula
Figure GDA0003292667660000085
The hamming distance from a query sample of a text modality to each sample in a set of image modality retrieval samples is calculated.
In step (5), a cross-modal retriever based on approximate nearest neighbor search completes the retrieval of the query sample. Specifically, when retrieving text with an image, the Hamming distances from the image-modality query sample to the samples in the text-modality retrieval sample set are sorted in ascending order; when retrieving images with text, the Hamming distances from the text-modality query sample to the samples in the image-modality retrieval sample set are sorted in ascending order. The samples corresponding to the first K smallest distances in the retrieval sample set are then taken as the retrieval result.
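Steps (4) and (5) together amount to ranking retrieval-set codes by Hamming distance and keeping the top K. A sketch (illustrative names and data, not from the patent) using the identity d_H(u, v) = (c − uᵀv)/2 for codes with entries ±1:

```python
import numpy as np

def topk_cross_modal(query_code, db_codes, k):
    """Rank retrieval-set codes by Hamming distance d_H = (c - u^T v) / 2."""
    c = query_code.shape[0]
    d = (c - db_codes.T @ query_code) / 2      # one distance per retrieval sample
    order = np.argsort(d, kind="stable")       # ascending: most similar first
    return order[:k], d

rng = np.random.default_rng(3)
c, nr = 16, 50
V = np.where(rng.standard_normal((c, nr)) >= 0, 1, -1)  # text-modality retrieval codes
u = V[:, 7].copy()                                       # image query matching sample 7
idx, d = topk_cross_modal(u, V, k=5)                     # sample 7 gets distance 0
```

Because the codes are ±1, the inner product counts agreements minus disagreements, so the conversion to Hamming distance needs no bit unpacking.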
Advantageous effects
Compared with the prior art, the invention has the following advantages.
1. During hash learning, the method fully preserves the similarity of samples between modalities as well as the similarity of samples within each modality, so the learned Hamming space has stronger discriminative power, which benefits cross-modal retrieval.
2. The method fully accounts for the redundancy of the hash code and minimizes the redundancy of each bit by imposing an orthogonality constraint, so the learned hash codes contain more discriminative information, effectively improving cross-modal retrieval performance.
Drawings
The present invention will be described in further detail with reference to the accompanying drawings.
Fig. 1 is a flowchart of a similarity preserving cross-modal hash retrieval method according to the present invention.
Detailed Description
A similarity-preserving cross-modal hash retrieval method follows, as shown in fig. 1, the problem setting and the five steps (1)-(5) set out above in the Disclosure of Invention.
The specific implementation process mainly comprises the following steps:
(1) Objective function construction based on the similarity-preservation strategy
In this method, the purpose of cross-modal hash learning is to use the image-modality and text-modality feature data $X^{(1)}$ and $X^{(2)}$ and the class label information of the objects to learn hash functions $f^{(1)}(x^{(1)})\in\{-1,+1\}^{c\times 1}$ and $f^{(2)}(x^{(2)})\in\{-1,+1\}^{c\times 1}$ for the image and text modalities, where c is the adjustable length of the binary hash codes. Suppose $U=[u_1,u_2,\ldots,u_n]\in\{-1,+1\}^{c\times n}$ and $V=[v_1,v_2,\ldots,v_n]\in\{-1,+1\}^{c\times n}$ are the binary hash codes in the Hamming space generated from the image-modality and text-modality feature data of the n objects by the corresponding hash functions, where $u_i$ and $v_i$ ($i=1,2,\ldots,n$) denote the hash codes of the i-th object in the image and text modalities respectively. For the binary hash codes U and V to have good discriminative power in the Hamming space, they should preserve the similarity information in S: if $S_{ij}=1$, the Hamming distances between $u_i$ and $v_j$ and between $u_j$ and $v_i$ should be as small as possible; otherwise, they should be as large as possible.
For ease of presentation, only the relation between $u_i$ and $v_j$ is formulated; the relation between $u_j$ and $v_i$ can be formulated similarly. For a pair of binary hash codes $u_i, v_j$, define their similarity by their inner product, as shown in equation (1):

$$\Theta_{ij}=\frac{\lambda}{c}\langle u_i,v_j\rangle,\tag{1}$$

where λ > 0 is an adjustable scale factor, c is the predetermined hash code length, and ⟨·,·⟩ denotes the vector inner product. Using the sigmoid function to project $\Theta_{ij}$ from its original interval into the range (0,1) gives

$$A_{ij}=\frac{1}{1+e^{-\Theta_{ij}}}.\tag{2}$$

Based on $A_{ij}$, the posterior probability of the cross-modal similarity matrix S is defined as

$$p\left(S_{ij}\mid u_i,v_j\right)=A_{ij}^{S_{ij}}\left(1-A_{ij}\right)^{1-S_{ij}}.\tag{3}$$

Following the likelihood estimation method of probability theory, the negative logarithm of equation (3), summed over all pairs, is expressed as

$$\mathcal{L}=\sum_{i=1}^{n}\sum_{j=1}^{n}\left(\log\left(1+e^{\Theta_{ij}}\right)-S_{ij}\Theta_{ij}\right)+\mathrm{const},\tag{4}$$

where const denotes a constant.
Minimizing equation (4) preserves the cross-modal similarity in the hash codes U and V of the image and text modalities. Specifically, equation (4) shows that if $S_{ij}=1$, then $\Theta_{ij}$ needs to be as large as possible, i.e. the inner product of $u_i$ and $v_j$ needs to be as large as possible, i.e. the Hamming distance between the binary hash codes $u_i$ and $v_j$ needs to be as small as possible; conversely, if $S_{ij}=0$, then $\Theta_{ij}$ needs to be as small as possible, i.e. the Hamming distance between $u_i$ and $v_j$ needs to be as large as possible.
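The likelihood loss of equation (4) can be evaluated directly. A short sketch (illustrative names and synthetic data, not from the patent), using `np.logaddexp` so that log(1 + exp(Θ)) does not overflow for large Θ:

```python
import numpy as np

def neg_log_likelihood(U, V, S, lam=1.0):
    """Loss of equation (4): sum_ij log(1 + exp(Theta_ij)) - S_ij * Theta_ij,
    with Theta_ij = (lam / c) <u_i, v_j>."""
    c = U.shape[0]
    Theta = (lam / c) * (U.T @ V)              # n x n matrix of scaled inner products
    return float(np.sum(np.logaddexp(0.0, Theta) - S * Theta))

rng = np.random.default_rng(4)
c, n = 8, 30
labels = rng.integers(0, 3, n)
S = (labels[:, None] == labels[None, :]).astype(float)   # S_ij = 1 for same-class pairs
U = np.where(rng.standard_normal((c, n)) >= 0, 1, -1)
V = np.where(rng.standard_normal((c, n)) >= 0, 1, -1)
loss = neg_log_likelihood(U, V, S)
```

Each term is log(1 + e^{−Θ}) when S_ij = 1 and log(1 + e^{Θ}) when S_ij = 0, so the loss is always positive and is driven down exactly when similar pairs have large inner products and dissimilar pairs have small ones.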
An effective cross-modal retrieval method must consider not only inter-modal similarity but also the intra-modal neighbor structure, so intra-modal similarity must be preserved as well. For a single modality, the hash code of that modality is a transformation of the modality's original feature vectors. The problem of preserving intra-modal similarity in the hash codes can be treated as a classification problem: an optimal hash code should also be well suited to classification. Suppose the projection matrices mapping the image-modality features $X^{(1)}$ and the text-modality features $X^{(2)}$ to the hash codes U and V are $P_1\in\mathbb{R}^{d_1\times c}$ and $P_2\in\mathbb{R}^{d_2\times c}$ respectively, and the coefficient matrices for classifying the hash codes U and V are $W_1\in\mathbb{R}^{c\times m}$ and $W_2\in\mathbb{R}^{c\times m}$ respectively. Based on the $\ell_2$ loss, preserving intra-modal similarity can be achieved by minimizing

$$\|L-W_1^{T}U\|_F^2+\|L-W_2^{T}V\|_F^2+\|U-P_1^{T}X^{(1)}\|_F^2+\|V-P_2^{T}X^{(2)}\|_F^2.\tag{5}$$
the hash code applied to the cross-modal retrieval task is expected to have the following characteristics while keeping similarity between and within the modalities:
(1) independence. If each bit of the hash code is considered as an attribute, it is desirable that the redundancy between the attributes is as small as possible, that is, it is desirable that the bits are independent of each other. The formulation of this characteristic is described by equation (6):
UUT=nI,VVT=nI, (6)
wherein I is an identity matrix.
(2) And (4) balance. That is, it is desirable that the probability of each bit hash code being +1 and-1 be equal, each 50%. This constraint may maximize the information provided by each bit. The formulation of this characteristic is described by equation (7):
U1n×1=0,V1n×1=0, (7)
wherein 1 isn×1Representing a column vector with elements all being 1.
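The independence and balance ideals of equations (6) and (7) are easy to check numerically. In this illustrative sketch (not from the patent), rows 2 to 4 of a 4x4 Hadamard matrix give a small c = 3, n = 4 code that satisfies both exactly:

```python
import numpy as np

def constraint_violations(B):
    """Deviation of codes B (c x n, entries +/-1) from independence (B B^T = n I)
    and balance (B 1 = 0)."""
    c, n = B.shape
    indep = np.abs(B @ B.T - n * np.eye(c)).max()   # cross-bit correlations
    balance = np.abs(B.sum(axis=1)).max()           # per-bit +1 / -1 imbalance
    return indep, balance

# Hadamard rows (excluding the all-ones first row) are mutually orthogonal
# and each contains as many +1s as -1s.
H2 = np.array([[1, 1], [1, -1]])
H4 = np.kron(H2, H2)          # 4 x 4 Hadamard matrix
B = H4[1:, :]                 # c = 3 bits, n = 4 samples
indep, balance = constraint_violations(B)
```

Real learned codes will not satisfy the constraints exactly; a check like this measures how far a solution drifts from the two ideals.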
Combining the above analysis, the overall objective function of the similarity-preserving cross-modal hash retrieval method is designed as

$$\min_{U,V,P_1,P_2,W_1,W_2}\;J=\beta\sum_{i=1}^{n}\sum_{j=1}^{n}\left(\log\left(1+e^{\Theta_{ij}}\right)-S_{ij}\Theta_{ij}\right)+\alpha\left(\|L-W_1^{T}U\|_F^2+\|L-W_2^{T}V\|_F^2\right)+\|U-P_1^{T}X^{(1)}\|_F^2+\|V-P_2^{T}X^{(2)}\|_F^2+\gamma\left(\|P_1\|_F^2+\|P_2\|_F^2+\|W_1\|_F^2+\|W_2\|_F^2\right)+\eta\left(\|U1_{n\times 1}\|_F^2+\|V1_{n\times 1}\|_F^2\right)$$
$$\text{s.t. } UU^{T}=nI,\;VV^{T}=nI,\;U\in\{-1,+1\}^{c\times n},\;V\in\{-1,+1\}^{c\times n},\tag{8}$$

where α, β, γ and η are non-negative balance factors.
(2) Solving of an objective function
The objective function equation (8) contains six variables to be solved, namely: hash codes U and V of image modality and text modality, and Hash projection matrix P of image modality and text modality1And P2Coefficient matrix W1And W2. The objective function in equation (8) is non-convex for the six variables to be solved, and therefore, the analytical solutions of the six variables to be solved cannot be obtained simultaneously. The unknown variable to be solved in equation (8) can be solved by solving the following four subproblems alternately, namely: fixed U, V, W1And W2Solving for P1And P2(ii) a Fixed U, V, P1And P2Solving for W1And W2(ii) a Fixed V, P1、P2、W1And W2Solving U; fixed U, P1、P2、W1And W2And solving for V.
(a) Fix U, V, W1 and W2, solve for P1 and P2

When the binary hash codes U and V and the coefficient matrices W1 and W2 are fixed, the objective function in equation (8) reduces to the following sub-problem in the hash projection matrices P1 and P2:

min_{P1,P2} J = ||U − P1^T X^(1)||_F^2 + ||V − P2^T X^(2)||_F^2 + γ(||P1||_F^2 + ||P2||_F^2). (9)

The problem in equation (9) is a standard regularized least-squares (ridge) regression problem. Taking the partial derivatives of J with respect to P1 and P2 and setting them equal to 0 gives:

∂J/∂P1 = −2X^(1)(U − P1^T X^(1))^T + 2γP1 = 0, (10)

∂J/∂P2 = −2X^(2)(V − P2^T X^(2))^T + 2γP2 = 0. (11)

A simple derivation from equations (10) and (11) yields:

P1 = (X^(1)X^(1)^T + γI)^{−1} X^(1)U^T, (12)

P2 = (X^(2)X^(2)^T + γI)^{−1} X^(2)V^T, (13)

wherein (·)^{−1} represents the inverse of a matrix.
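A hedged numerical sketch of step (a): the closed-form update of equation (12) is the ridge-regression solution of the sub-problem, which can be verified by checking that the gradient of equation (10) vanishes at it. The variable shapes and random data below are illustrative assumptions, not the patent's settings:

```python
import numpy as np

rng = np.random.default_rng(0)
d1, n, c, gamma = 5, 20, 4, 0.1

X1 = rng.standard_normal((d1, n))         # image-modality features X^(1)
U = np.sign(rng.standard_normal((c, n)))  # fixed binary hash codes

# Equation (12): P1 = (X1 X1^T + gamma I)^{-1} X1 U^T
# (solve the linear system instead of forming the explicit inverse)
P1 = np.linalg.solve(X1 @ X1.T + gamma * np.eye(d1), X1 @ U.T)

# Gradient of ||U - P1^T X1||_F^2 + gamma ||P1||_F^2 w.r.t. P1 (equation (10))
grad = -2 * X1 @ (U - P1.T @ X1).T + 2 * gamma * P1
print(np.abs(grad).max())  # numerically ~0
```

The gradient being numerically zero confirms that the closed form is a stationary point of the convex sub-problem.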
(b) Fix U, V, P1 and P2, solve for W1 and W2

When the binary hash codes U and V and the hash projection matrices P1 and P2 are fixed, the objective function in equation (8) reduces to the following sub-problem in the coefficient matrices W1 and W2:

min_{W1,W2} J = α(||L − W1^T U||_F^2 + ||L − W2^T V||_F^2) + γ(||W1||_F^2 + ||W2||_F^2). (14)

The problem in equation (14) is likewise a regularized least-squares regression problem. Taking the partial derivatives of J with respect to W1 and W2 and setting them equal to 0 gives:

∂J/∂W1 = −2αU(L − W1^T U)^T + 2γW1 = 0, (15)

∂J/∂W2 = −2αV(L − W2^T V)^T + 2γW2 = 0. (16)

A simple derivation from equations (15) and (16) yields:

W1 = (UU^T + (γ/α)I)^{−1} UL^T, (17)

W2 = (VV^T + (γ/α)I)^{−1} VL^T. (18)
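Step (b) admits the same kind of check: equation (17) is the ridge solution of the label-regression sub-problem, so the gradient of equation (15) vanishes at it. Shapes and data below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, c = 3, 15, 4          # number of classes, samples, code length
alpha, gamma = 2.0, 0.1

U = np.sign(rng.standard_normal((c, n)))            # fixed image-modality codes
L = rng.integers(0, 2, size=(m, n)).astype(float)   # 0/1 label matrix

# Equation (17): W1 = (U U^T + (gamma/alpha) I)^{-1} U L^T
W1 = np.linalg.solve(U @ U.T + (gamma / alpha) * np.eye(c), U @ L.T)

# Gradient of alpha ||L - W1^T U||_F^2 + gamma ||W1||_F^2 w.r.t. W1 (equation (15))
grad = -2 * alpha * U @ (L - W1.T @ U).T + 2 * gamma * W1
print(np.abs(grad).max())  # numerically ~0
```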
(c) Fix V, P1, P2, W1 and W2, solve for U

When the binary hash code V of the text modality, the hash projection matrices P1 and P2, and the coefficient matrices W1 and W2 are fixed, the objective function in equation (8) reduces to a sub-problem in the image-modality binary hash code U:

[Sub-problem in U, equation (19); rendered as an image in the original document.]

For convenience of solution, the method relaxes the discrete hash variable U into a continuous variable, so that the objective function in equation (19) becomes:

[Relaxed objective, equation (20); rendered as an image in the original document.]

Taking the derivative of the relaxed objective with respect to U gives:

[∂J/∂U, equation (21); rendered as an image in the original document.]

To find the optimal solution for U, U is iteratively updated by gradient descent, i.e., U(t+1) = U(t) + ΔU(t). Specifically, according to the first-order Taylor expansion:

J(U + ΔU) ≈ J(U) + tr((∂J/∂U)^T ΔU), (22)

so to satisfy J(U + ΔU) < J(U), one may select:

ΔU(t) = −ω1 ∂J/∂U(t), (23)

wherein the step size ω1 is a predefined relatively small constant. After the continuous variable U is found, the discrete hash variable U is obtained using the formula U = sign(U), where sign(·) is the element-wise sign function, that is: sign(x) = 1 when x ≥ 0, and sign(x) = −1 when x < 0.
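Step (c) can be sketched generically. The exact gradient of equation (21) appears only as an image in the original, so a toy quadratic objective stands in for it here; only the update rule U(t+1) = U(t) − ω1·∂J/∂U and the final sign(·) binarization follow the description above:

```python
import numpy as np

def relax_and_binarize(grad_fn, U0, omega1=0.01, iters=200):
    """Continuous relaxation solved by gradient descent,
    Delta U(t) = -omega1 * dJ/dU, followed by sign(.) binarization
    with the convention sign(0) = 1 (step (c) of the solver)."""
    U = U0.astype(float)
    for _ in range(iters):
        U = U - omega1 * grad_fn(U)
    return np.where(U >= 0, 1, -1)

# Hypothetical stand-in objective J(U) = ||U - T||_F^2, so dJ/dU = 2(U - T);
# gradient descent pulls U toward T, and binarization returns sign(T).
T = np.array([[0.9, -0.4], [-0.2, 0.7]])
U_bin = relax_and_binarize(lambda U: 2 * (U - T), np.zeros_like(T))
print(U_bin)
```

With the true gradient of equation (21) substituted for the toy one, the same loop implements the patent's update.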
(d) Fix U, P1, P2, W1 and W2, solve for V

When the binary hash code U of the image modality, the hash projection matrices P1 and P2, and the coefficient matrices W1 and W2 are fixed, the objective function in equation (8) reduces to a sub-problem in the text-modality binary hash code V:

[Sub-problem in V, equation (24); rendered as an image in the original document.]

Similar to the solution of the discrete variable U, the discrete hash variable V is also relaxed into a continuous variable for solution. After the continuous variable V is found, the discrete hash variable V is obtained using the formula V = sign(V).
(3) Generating binary hash codes for the query sample and the samples in the retrieval sample set

Assume the feature vector of a query sample of the image modality is x_q^(1) ∈ R^(d1×1), the feature vector of a query sample of the text modality is x_q^(2) ∈ R^(d2×1), the features of the samples in the image-modality retrieval sample set are X_r^(1) ∈ R^(d1×n_r), and the features of the samples in the text-modality retrieval sample set are X_r^(2) ∈ R^(d2×n_r), wherein n_r represents the number of samples in the retrieval sample set. Based on the learned hash projection matrices P1 and P2, the binary hash codes of the query samples in the image and text modalities and of the samples in the retrieval sample sets in the image and text modalities are, respectively:

u_q = sign(P1^T x_q^(1)), v_q = sign(P2^T x_q^(2)), U_r = sign(P1^T X_r^(1)), V_r = sign(P2^T X_r^(2)).
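A minimal sketch of step (3), assuming the codes are obtained by binarizing the linear projections with sign(·) (the exact generation formulas are shown only as images in the original); all shapes and random data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
d1, d2, c, n_r = 6, 8, 4, 10

P1 = rng.standard_normal((d1, c))     # learned image-modality projection
P2 = rng.standard_normal((d2, c))     # learned text-modality projection

xq1 = rng.standard_normal((d1, 1))    # image-modality query feature vector
Xr2 = rng.standard_normal((d2, n_r))  # text-modality retrieval-set features

# Assumed encoding rule: binarize the linear projection with sign(.)
uq = np.sign(P1.T @ xq1)   # (c, 1) query code
Vr = np.sign(P2.T @ Xr2)   # (c, n_r) retrieval-set codes
print(uq.shape, Vr.shape)
```

The same two matrices encode both the query and the retrieval set, so no retraining is needed at query time.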
(4) Calculating the Hamming distance from the query sample to each sample in the retrieval sample set

For a query sample of the image modality with hash code u_q, the Hamming distance from u_q to the hash code v_i of the i-th sample in the text-modality retrieval sample set is computed as d_H(u_q, v_i) = (c − u_q^T v_i)/2, where c is the code length. For a query sample of the text modality with hash code v_q, the Hamming distance from v_q to the hash code u_i of the i-th sample in the image-modality retrieval sample set is computed as d_H(v_q, u_i) = (c − v_q^T u_i)/2.
(5) Completing retrieval of the query sample using a cross-modal retriever

When retrieving text with an image, the Hamming distances from the query sample's hash code to the hash codes of the samples in the text-modality retrieval sample set are sorted in ascending order; when retrieving images with text, the Hamming distances from the query sample's hash code to the hash codes of the samples in the image-modality retrieval sample set are sorted in ascending order. The samples corresponding to the first K smallest distances in the retrieval sample set are then taken as the retrieval results.
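Steps (4) and (5) combine into a short ranking routine. For ±1 codes of length c, the Hamming distance can be computed from the inner product via the standard identity d_H(u, v) = (c − uᵀv)/2, assumed here to match the patent's image-only formulas; the example codes are hypothetical:

```python
import numpy as np

def hamming_topk(uq, Vr, K):
    """Rank retrieval-set codes Vr (c x n_r, entries +/-1) by Hamming
    distance to the query code uq (c x 1) and return the indices of
    the K nearest, using d_H(u, v) = (c - u^T v) / 2 for +/-1 codes."""
    c = uq.shape[0]
    dists = (c - (uq.T @ Vr).ravel()) / 2
    order = np.argsort(dists, kind="stable")  # ascending: nearest first
    return order[:K], dists

uq = np.array([[1], [-1], [1]])
Vr = np.array([[1, -1, 1],
               [-1, 1, -1],
               [1, 1, -1]])
topk, dists = hamming_topk(uq, Vr, K=2)
print(topk, dists)  # sample 0 matches exactly (distance 0), sample 2 differs in 1 bit
```

A single matrix product ranks the whole retrieval set, which is why Hamming-space retrieval scales well.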
The following describes the advantageous effects of the present invention with reference to specific experiments.
The MIRFLICKR-25K data set contains 25,000 images collected from the Flickr website, each annotated with several of 24 text labels, so MIRFLICKR-25K can be regarded as a multi-label data set. Only samples with at least 20 text labels were used in the experiments, yielding 20,015 image-text pairs. For each pair, the image is represented by a 512-dimensional GIST feature vector and the text by a 1386-dimensional bag-of-words vector. In the experiments, 1,000 image-text pairs were randomly selected to construct the query sample set, and 5,000 image-text pairs were randomly selected to train the cross-modal retrieval model.
In the experiments, the performance of cross-modal retrieval was measured by the mean average precision (MAP). Computing the MAP first requires computing the average precision (AP). Assuming a query sample returns R retrieved samples in a cross-modal retrieval, the average precision AP of this query sample is defined as:

AP = (1/N) Σ_{r=1}^{R} P(r) δ(r), (25)

wherein N is the number of retrieved samples truly relevant to the query, P(r) represents the precision of the first r retrieved samples, i.e., the fraction of the first r retrieved samples that are truly relevant to the query sample, and δ(r) = 1 when the r-th retrieved sample is truly relevant to the query sample, δ(r) = 0 otherwise. After the average precision AP of every query sample is obtained, the mean of these AP values is computed to obtain the mean average precision MAP.
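The AP/MAP computation described above can be sketched as follows. Equation (25) is reconstructed here in its standard form, an assumption since the original shows it only as an image; the relevance lists are made-up examples:

```python
import numpy as np

def average_precision(relevant, R=None):
    """AP over the top-R retrieved samples:
    AP = (1/N) * sum_r P(r) * delta(r), where delta(r) flags whether
    the r-th retrieved sample is truly relevant, P(r) is precision at
    rank r, and N is the number of relevant samples among the R."""
    delta = np.asarray(relevant[:R], dtype=float)
    if delta.sum() == 0:
        return 0.0
    ranks = np.arange(1, len(delta) + 1)
    precision_at_r = np.cumsum(delta) / ranks   # P(r) for every rank r
    return float((precision_at_r * delta).sum() / delta.sum())

# One query: relevance flags of the returned list, nearest first
ap = average_precision([1, 0, 1, 1, 0])
print(ap)  # (1/3) * (1/1 + 2/3 + 3/4)

# MAP: mean of the per-query AP values
mAP = np.mean([ap, average_precision([0, 1, 0, 0, 0])])
print(mAP)
```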
In the experiments, the optimal values of the parameters α, β, γ and η in the method of the invention were determined by 5-fold cross-validation. The parameters of the other methods were set according to the parameter-setting procedures described in the corresponding papers. All reported results are averages over 10 random trials.
The methods compared with the method of the invention are: the Canonical Correlation Analysis (CCA) method, the Cross-View Hashing (CVH) method, the Inter-Media Hashing (IMH) method, and the Latent Semantic Sparse Hashing (LSSH) method. Table 1 summarizes the mean average precision (MAP) of the proposed method and the compared methods for cross-modal retrieval on the MIRFLICKR-25K data set. In Table 1, Img2Txt and Txt2Img denote the task of retrieving text with an image and the task of retrieving images with text, respectively. As Table 1 shows, for both retrieval tasks the method of the invention outperforms the compared methods at all four hash code lengths. Specifically, for the Img2Txt task, the method of the invention improves the MAP at 16 bits, 32 bits, 64 bits and 128 bits by at least 0.0152 (= 0.3121 − 0.2969), 0.0220 (= 0.3285 − 0.3065), 0.0253 (= 0.3371 − 0.3118) and 0.0196 (= 0.3442 − 0.3246), respectively, over the other compared methods; for the Txt2Img task, the MAP at 16 bits, 32 bits, 64 bits and 128 bits is improved by at least 0.0242 (= 0.3925 − 0.3683), 0.0278 (= 0.4257 − 0.3979), 0.0273 (= 0.4618 − 0.4345) and 0.0351 (= 0.4969 − 0.4618). This shows that the similarity-preserving cross-modal hash retrieval method proposed by the present invention is effective.
TABLE 1 MAP of each method on the MIRFLICKR-25K data set

[Table 1; rendered as an image in the original document.]
The invention also includes: an inter-modality sample similarity preservation strategy, an intra-modality sample similarity preservation strategy, and a hash coding redundancy minimization scheme.
The inter-modality sample similarity preservation strategy is as follows: a cross-modal retrieval task must confront heterogeneous data whose modal properties differ greatly. Effectively eliminating the heterogeneity of the different modalities and fully mining the essential relations among them from their complex relationships promotes improved cross-modal retrieval performance. To fully mine discriminative information from data of different modalities, the method defines the similarity relation between the hash codes of samples of different modalities based on the inner product, models this relation as a probability with the sigmoid function, and then completes inter-modality similarity preservation based on the posterior probability of the cross-modal similarity matrix, thereby effectively mining discriminative information from cross-modal heterogeneous data.
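A hedged sketch of this probability model: the inner product of two codes is scaled by a factor λ and passed through the sigmoid to give a posterior probability of similarity. The parameterization below (function name, λ value, example codes) is illustrative, not the patent's exact formula:

```python
import numpy as np

def similarity_probability(u_i, v_j, lam=0.5):
    """Model P(S_ij = 1 | u_i, v_j) as sigmoid(lam * u_i^T v_j):
    codes that agree on more bits get a larger inner product and
    hence a posterior probability closer to 1."""
    theta = lam * float(u_i @ v_j)
    return 1.0 / (1.0 + np.exp(-theta))

u = np.array([1, -1, 1, 1])
v_similar = np.array([1, -1, 1, -1])  # agrees on 3 of 4 bits
v_dissimilar = -u                     # disagrees on every bit
p_hi = similarity_probability(u, v_similar)
p_lo = similarity_probability(u, v_dissimilar)
print(p_hi, p_lo)  # p_hi > 0.5 > p_lo
```

Maximizing the likelihood of the observed similarity matrix S under such a model pushes similar cross-modal pairs toward large inner products and dissimilar pairs toward small ones.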
The intra-modality sample similarity preservation strategy is as follows: for the samples inside a modality, their label information effectively reflects the neighborhood structure and similarity relations between samples. For a single modality, the hash code of that modality is a transformation of that modality's original feature data from the original feature space to the Hamming space. In order for the hash codes to preserve the similarity between samples within a modality, the method uses the samples' label information together with a linear regression model, of the kind used for classification tasks, to complete intra-modality similarity preservation.
The hash coding redundancy minimization scheme is as follows: if each bit of the hash code is regarded as an attribute, the redundancy between different attributes should be as small as possible, that is, the bits should be independent of each other. The method of the present invention achieves this by imposing orthogonality constraints on the different bits of the hash code.

Claims (6)

1. A similarity-preserving cross-modal hash retrieval method, wherein it is assumed that the features of n objects in the image modality and the text modality are X^(1) = [x_1^(1), x_2^(1), …, x_n^(1)] ∈ R^(d1×n) and X^(2) = [x_1^(2), x_2^(2), …, x_n^(2)] ∈ R^(d2×n), respectively, wherein d1 and d2 represent the dimensions of the image-modality and text-modality feature vectors, respectively, and x_i^(1) and x_i^(2) represent the features of the i-th object in the image modality and the text modality, respectively; meanwhile, the feature vectors of the image modality and the text modality are assumed to have been zero-centered, i.e., Σ_{i=1}^{n} x_i^(1) = 0 and Σ_{i=1}^{n} x_i^(2) = 0; it is assumed that the label matrix formed by the category labels of the n objects is L = [l_1, l_2, …, l_n] ∈ {0,1}^(m×n), wherein l_i (i = 1, 2, …, n) denotes the category label information of the i-th object and m is the number of categories; it is assumed that the cross-modal similarity matrix is S, whose element S_ij represents the similarity of the i-th sample in the image modality and the j-th sample in the text modality: if the i-th sample in the image modality is similar to the j-th sample in the text modality, S_ij = 1, otherwise S_ij = 0; the method is characterized by comprising the following steps:
(1) Constructing an objective function based on similarity preservation strategies: using an objective function designed based on an inter-modality similarity preservation strategy and an intra-modality similarity preservation strategy, obtain the binary hash codes U and V of the feature data of the n objects in the image and text modalities in the Hamming space, the hash projection matrices P1 and P2 corresponding to the image and text modalities, and two coefficient matrices W1 and W2;

(2) Solving the objective function: in view of the non-convexity of the objective function, obtain the solutions U, V, P1, P2, W1 and W2 by alternate solving, namely by alternately solving the following four sub-problems: fix U, V, W1 and W2, solve for P1 and P2; fix U, V, P1 and P2, solve for W1 and W2; fix V, P1, P2, W1 and W2, solve for U; fix U, P1, P2, W1 and W2, solve for V;

(3) Generating binary hash codes for the query sample and the samples in the retrieval sample set: based on the solved hash projection matrices P1 and P2 of the image and text modalities, generate binary hash codes for the query sample and the samples in the retrieval sample set;

(4) Calculating the Hamming distance from the query sample to each sample in the retrieval sample set: based on the generated binary hash codes, calculate the Hamming distance from the query sample to each sample in the retrieval sample set;

(5) Completing retrieval of the query sample using a cross-modal retriever: complete the retrieval of the query sample using a cross-modal retriever based on approximate nearest neighbor search.
2. The similarity-preserving cross-modal hash retrieval method according to claim 1, wherein the objective function designed based on the inter-modality similarity preservation strategy and the intra-modality similarity preservation strategy in step (1) is of the form:

[Objective function, equation (1); rendered as an image in the original document.]

wherein α, β, γ and η are non-negative balance factors, c is the length of the binary hash code, I is an identity matrix, 1_{n×1} represents a column vector whose elements are all 1, [a further quantity in equation (1) is defined in an image in the original document], λ > 0 is an adjustable scaling factor, u_i is the binary hash code of the i-th sample in the image modality, v_j is the binary hash code of the j-th sample in the text modality, ||·||_F represents the Frobenius norm of a matrix, and (·)^T represents the transpose operation of a matrix.
3. The similarity-preserving cross-modal hash retrieval method according to claim 2, wherein in step (2) the solutions U, V, P1, P2, W1 and W2 of the objective function are obtained by alternate solving, specifically by alternately solving the following four sub-problems:

(1) Fix U, V, W1 and W2, solve for P1 and P2: when the binary hash codes U and V and the coefficient matrices W1 and W2 are fixed, the objective function in equation (1) reduces to a sub-problem in the hash projection matrices P1 and P2:

[Equation (2); rendered as an image in the original document.]

(2) Fix U, V, P1 and P2, solve for W1 and W2: when the binary hash codes U and V and the hash projection matrices P1 and P2 are fixed, the objective function in equation (1) reduces to a sub-problem in the coefficient matrices W1 and W2:

[Equation (3); rendered as an image in the original document.]

(3) Fix V, P1, P2, W1 and W2, solve for U: when the binary hash code V of the text modality, the hash projection matrices P1 and P2, and the coefficient matrices W1 and W2 are fixed, the objective function in equation (1) reduces to a sub-problem in the image-modality binary hash code U:

[Equation (4); rendered as an image in the original document.]

(4) Fix U, P1, P2, W1 and W2, solve for V: when the binary hash code U of the image modality, the hash projection matrices P1 and P2, and the coefficient matrices W1 and W2 are fixed, the objective function in equation (1) reduces to a sub-problem in the text-modality binary hash code V:

[Equation (5); rendered as an image in the original document.]
4. The similarity-preserving cross-modal hash retrieval method according to claim 1, wherein in step (3) binary hash codes are generated for the query sample and the samples in the retrieval sample set based on the solved hash projection matrices P1 and P2 of the image and text modalities; specifically, assuming that the feature vector of a query sample of the image modality is x_q^(1) ∈ R^(d1×1), the feature vector of a query sample of the text modality is x_q^(2) ∈ R^(d2×1), the features of the samples in the image-modality retrieval sample set are X_r^(1) ∈ R^(d1×n_r), and the features of the samples in the text-modality retrieval sample set are X_r^(2) ∈ R^(d2×n_r), wherein n_r represents the number of samples in the retrieval sample set, the binary hash codes of the query samples in the image and text modalities and of the samples in the retrieval sample sets in the image and text modalities are, respectively, u_q = sign(P1^T x_q^(1)), v_q = sign(P2^T x_q^(2)), U_r = sign(P1^T X_r^(1)) and V_r = sign(P2^T X_r^(2)).
5. The similarity-preserving cross-modal hash retrieval method according to claim 4, wherein in step (4) the Hamming distance from the query sample to each sample in the retrieval sample set is calculated based on the generated binary hash codes; specifically, the formula d_H(u_q, v_i) = (c − u_q^T v_i)/2 is used to calculate the Hamming distance from the query sample of the image modality to each sample in the text-modality retrieval sample set, and the formula d_H(v_q, u_i) = (c − v_q^T u_i)/2 is used to calculate the Hamming distance from the query sample of the text modality to each sample in the image-modality retrieval sample set.
6. The similarity-preserving cross-modal hash retrieval method according to claim 5, wherein in step (5) the retrieval of the query sample is completed using a cross-modal retriever based on approximate nearest neighbor search; specifically, when retrieving text with an image, the Hamming distances from the query sample's hash code to the hash codes of the samples in the text-modality retrieval sample set are sorted in ascending order; when retrieving images with text, the Hamming distances from the query sample's hash code to the hash codes of the samples in the image-modality retrieval sample set are sorted in ascending order; the samples corresponding to the first K smallest distances in the retrieval sample set are then taken as the retrieval results.
CN201811097048.3A 2018-09-19 2018-09-19 Similarity-preserving cross-modal Hash retrieval method Active CN109271486B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811097048.3A CN109271486B (en) 2018-09-19 2018-09-19 Similarity-preserving cross-modal Hash retrieval method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811097048.3A CN109271486B (en) 2018-09-19 2018-09-19 Similarity-preserving cross-modal Hash retrieval method

Publications (2)

Publication Number Publication Date
CN109271486A CN109271486A (en) 2019-01-25
CN109271486B true CN109271486B (en) 2021-11-26

Family

ID=65197120

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811097048.3A Active CN109271486B (en) 2018-09-19 2018-09-19 Similarity-preserving cross-modal Hash retrieval method

Country Status (1)

Country Link
CN (1) CN109271486B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110019652B (en) * 2019-03-14 2022-06-03 九江学院 Cross-modal Hash retrieval method based on deep learning
CN109960732B (en) * 2019-03-29 2023-04-18 广东石油化工学院 Deep discrete hash cross-modal retrieval method and system based on robust supervision
CN110059154B (en) * 2019-04-10 2022-04-15 山东师范大学 Cross-modal migration hash retrieval method based on inheritance mapping
CN111914108A (en) * 2019-05-07 2020-11-10 鲁东大学 Discrete supervision cross-modal Hash retrieval method based on semantic preservation
CN111914950B (en) * 2020-08-20 2021-04-16 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Unsupervised cross-modal retrieval model training method based on depth dual variational hash
CN112559810B (en) * 2020-12-23 2022-04-08 上海大学 Method and device for generating hash code by utilizing multi-layer feature fusion
CN113326287B (en) * 2021-08-04 2021-11-02 山东大学 Online cross-modal retrieval method and system using three-step strategy
CN114218259B (en) * 2022-02-21 2022-05-24 深圳市云初信息科技有限公司 Multi-dimensional scientific information search method and system based on big data SaaS

Citations (5)

Publication number Priority date Publication date Assignee Title
CN106777318A (en) * 2017-01-05 2017-05-31 西安电子科技大学 Matrix decomposition cross-module state Hash search method based on coorinated training
CN106844518A (en) * 2016-12-29 2017-06-13 天津中科智能识别产业技术研究院有限公司 A kind of imperfect cross-module state search method based on sub-space learning
CN107256271A (en) * 2017-06-27 2017-10-17 鲁东大学 Cross-module state Hash search method based on mapping dictionary learning
CN107346328A (en) * 2017-05-25 2017-11-14 北京大学 A kind of cross-module state association learning method based on more granularity hierarchical networks
CN108334574A (en) * 2018-01-23 2018-07-27 南京邮电大学 A kind of cross-module state search method decomposed based on Harmonious Matrix

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9734436B2 (en) * 2015-06-05 2017-08-15 At&T Intellectual Property I, L.P. Hash codes for images


Non-Patent Citations (2)

Title
Separation of temperature effects in bridge dynamic strain monitoring data based on the analytical mode decomposition method; Li Miao et al.; Journal of Vibration and Shock; 2013-02-01; Vol. 31, No. 21; pp. 6-10, 29 *
Research on the application of semantic enhancement and matrix factorization in cross-modal hash retrieval; Wang Ke; China Master's Theses Full-text Database, Information Science and Technology; 2016-10-15; No. 10; I138-439 *

Also Published As

Publication number Publication date
CN109271486A (en) 2019-01-25

Similar Documents

Publication Publication Date Title
CN109271486B (en) Similarity-preserving cross-modal Hash retrieval method
CN108334574B (en) Cross-modal retrieval method based on collaborative matrix decomposition
Kulis et al. Fast similarity search for learned metrics
Lin et al. Spec hashing: Similarity preserving algorithm for entropy-based coding
US11651037B2 (en) Efficient cross-modal retrieval via deep binary hashing and quantization
Jain et al. Fast image search for learned metrics
CN106202256B (en) Web image retrieval method based on semantic propagation and mixed multi-instance learning
US8891908B2 (en) Semantic-aware co-indexing for near-duplicate image retrieval
US8428397B1 (en) Systems and methods for large scale, high-dimensional searches
CN109657112B (en) Cross-modal Hash learning method based on anchor point diagram
WO2013129580A1 (en) Approximate nearest neighbor search device, approximate nearest neighbor search method, and program
CN111460077A (en) Cross-modal Hash retrieval method based on class semantic guidance
Cheng et al. Semi-supervised multi-graph hashing for scalable similarity search
Yu et al. Binary set embedding for cross-modal retrieval
CN109766455B (en) Identified full-similarity preserved Hash cross-modal retrieval method
Lin et al. Optimizing ranking measures for compact binary code learning
CN112163114B (en) Image retrieval method based on feature fusion
Zhang et al. Autoencoder-based unsupervised clustering and hashing
CN115795065A (en) Multimedia data cross-modal retrieval method and system based on weighted hash code
JP5833499B2 (en) Retrieval device and program for retrieving content expressed by high-dimensional feature vector set with high accuracy
Wang et al. A multi-label least-squares hashing for scalable image search
Liu et al. Multiview Cross-Media Hashing with Semantic Consistency
CN112307248B (en) Image retrieval method and device
CN111984800B (en) Hash cross-modal information retrieval method based on dictionary pair learning
Zhong et al. Deep multi-label hashing for large-scale visual search based on semantic graph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant