CN107402993B - Cross-modal retrieval method based on discriminative correlation maximization hashing - Google Patents
Cross-modal retrieval method based on discriminative correlation maximization hashing
- Publication number
- CN107402993B (application CN201710581083A)
- Authority
- CN
- China
- Prior art keywords
- hash
- text
- data
- image
- objective function
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Abstract
The present invention proposes a cross-modal retrieval method based on discriminative correlation maximization hashing, comprising: performing multi-modal feature extraction on a training dataset to obtain a multi-modal training set; constructing, on this dataset, an objective function for discriminative correlation maximization hashing; solving the objective function to obtain the projection matrices that map image and text features into a common Hamming space and the joint hash codes of the image-text pairs; projecting a test dataset into the common Hamming space and quantizing it into hash codes with the learned hash functions; and performing cross-modal retrieval based on the hash codes. The invention improves both the efficiency and the accuracy of cross-media retrieval.
Description
Technical field
The present invention relates to the field of data retrieval, and in particular to a cross-modal retrieval method based on discriminative correlation maximization hashing.
Background technology
With the development of science and technology, massive multi-modal data has poured into the Internet. In order to retrieve useful information from it, a range of information retrieval techniques has arisen. Traditional information retrieval is single-modal: the input query and the retrieved results belong to the same modality. This severely limits information retrieval, so it is desirable to extend single-modal retrieval to cross-modal retrieval, i.e., given a picture, retrieve the text that describes it, and vice versa.
Because data from different modalities have different characteristics, it is hard to measure their similarity directly; this is the main challenge of cross-modal methods. The most common remedy is subspace learning. Canonical correlation analysis (CCA) is a classic unsupervised subspace learning method: it projects the data of different modalities into the same space while maximizing the correlation between the two modalities. Whereas CCA aims to maximize the correlation between the data of two different modalities, partial least squares (PLS) approaches cross-media retrieval from the angle of covariance. Generalized multiview analysis (GMA) uses category labels as supervision information and is the extension of CCA to the supervised setting.
The cross-media retrieval methods above generally consume large amounts of time and storage space when handling large-scale data. Hashing methods arose to solve this problem. In a hashing method, data are represented by binary hash codes, and measuring the similarity between two items only requires bitwise XOR operations on their hash codes in Hamming space. Hashing thus reduces computational complexity and uses less storage space. Hashing-based cross-modal methods usually project the data of different modalities into a common Hamming space and obtain the hash codes of each modality there, so that similarity between data of different modalities can be measured directly. Hashing-based cross-modal retrieval has already found effective applications: collective matrix factorization hashing (CMFH) learns a common hash code for multi-modal data and uses it for similarity measurement in a common semantic space; latent semantic sparse hashing (LSSH) obtains the high-level semantics of the two modalities with sparse coding and matrix factorization respectively, and then performs cross-media retrieval with hashing.
Although many hashing-based cross-media retrieval methods exist, existing methods do not take the discriminative distribution of the data features into account. A discriminative feature distribution (features of the same class as close as possible, features of different classes as far apart as possible) makes cross-media retrieval more accurate. How to preserve the discriminative distribution of images and text while projecting them into a semantic space, so as to improve retrieval precision, is therefore a technical problem that those skilled in the art still need to solve.
Summary of the invention
To solve the above problems, the present invention proposes a cross-modal retrieval method based on discriminative correlation maximization hashing. After the data features of the text and image modalities are projected into a common Hamming space, each modality still maintains its discriminative distribution and the association between paired multi-modal data is maximized, thereby improving the accuracy of cross-modal retrieval.
The specific technical solution of the present invention is as follows:
A cross-modal retrieval method based on discriminative correlation maximization hashing comprises the following steps:
Step 1: obtain a training dataset in which each sample contains a pair of image and text modal data;
Step 2: perform multi-modal feature extraction on the training dataset to obtain the multi-modal training set Otrain;
Step 3: for the training set Otrain, construct on this dataset an objective function for discriminative correlation maximization hashing;
Step 4: solve the objective function to obtain the projection matrices W1 and W2 that project image and text into the common Hamming space and the joint hash codes B of the image-text pairs, using B as the hash codes of both image and text;
Step 5: obtain a test dataset and perform multi-modal feature extraction on it to obtain the multi-modal test set Otest;
Step 6: for the test set Otest, project the image or text of each test sample into the common Hamming space with the projection matrices W1 and W2 obtained in step 4, and quantize it into hash codes by the hash functions;
Step 7: perform cross-modal retrieval: based on the hash codes, retrieve from the training set the objects of the other modality that are relevant to a query sample in the test set.
The objective function in step 3 is:
$\min_{W_1,W_2,B,Q}\ \lambda\|W_1V-W_2T\|_F^2+\beta\,\mathrm{tr}\big(W_2(S_w-S_b)W_2^T\big)+\mu_1\|B-W_1V\|_F^2+\mu_2\|B-W_2T\|_F^2+\alpha\|Y-QB\|_F^2+\gamma\big(\|W_1\|_F^2+\|W_2\|_F^2+\|Q\|_F^2\big)$
$\mathrm{s.t.}\ B\in\{-1,1\}^{L\times N},\ W_1W_1^T=I_k,$
where $V\in\mathbb{R}^{d_v\times N}$ and $T\in\mathbb{R}^{d_t\times N}$ are the feature matrices of image and text respectively and $Y\in\{0,1\}^{c\times N}$ is the label matrix; λ, μ1, μ2, β, α are balance parameters and γ is a regularization parameter.
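For concreteness, the objective of step 3 can be evaluated numerically. The following NumPy sketch assumes the term-by-term form implied by the closed-form update rules given in the embodiment; the function name, argument order, and default parameter values are illustrative, not taken from the patent:

```python
import numpy as np

def objective(V, T, B, W1, W2, Q, Y, Sw, Sb,
              lam=1.0, mu1=1.0, mu2=1.0, beta=1.0, alpha=1.0, gamma=0.01):
    """Evaluate the discriminative correlation-maximization hashing objective.

    V (dv x N), T (dt x N): image/text feature matrices
    B (L x N): joint hash codes; W1 (L x dv), W2 (L x dt): projections
    Q (c x L): classifier matrix; Y (c x N): label matrix
    Sw, Sb (dt x dt): within-/between-class scatter of the text modality
    """
    corr = lam * np.linalg.norm(W1 @ V - W2 @ T, 'fro') ** 2    # pairwise association
    lda = beta * np.trace(W2 @ (Sw - Sb) @ W2.T)                # discriminative term
    quant = mu1 * np.linalg.norm(B - W1 @ V, 'fro') ** 2 \
          + mu2 * np.linalg.norm(B - W2 @ T, 'fro') ** 2        # quantization loss
    cls = alpha * np.linalg.norm(Y - Q @ B, 'fro') ** 2         # label supervision
    reg = gamma * (np.linalg.norm(W1, 'fro') ** 2
                   + np.linalg.norm(W2, 'fro') ** 2
                   + np.linalg.norm(Q, 'fro') ** 2)             # regularization
    return corr + lda + quant + cls + reg
```

Lower values indicate projections whose codes agree across the two modalities while remaining classifiable by Q.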
Further, step 3 comprises:
Step 3-1: let each data sample of the training set Otrain be $o_i=(v_i,t_i,y_i)$, where $v_i\in\mathbb{R}^{d_v}$ is the image feature vector, $t_i\in\mathbb{R}^{d_t}$ is the text feature vector, $y_i\in\{0,1\}^c$ is the category label and N is the number of samples; project the data of the two modalities from their original heterogeneous spaces into a common Hamming space and maximize the association between the paired image and text within each sample:
$\min_{W_1,W_2}\ \|W_1V-W_2T\|_F^2\quad \mathrm{s.t.}\ W_1W_1^T=I_k$
Step 3-2: perform linear discriminant analysis on the text modality data and transfer its discriminative property to the image modality data:
$\min_{W_2}\ \mathrm{tr}\big(W_2(S_w-S_b)W_2^T\big)$
Step 3-3: convert the features of the two modalities into hash codes and minimize the quantization loss of obtaining the hash codes through the hash functions:
$\min_{W_1,W_2,B}\ \mu_1\|B-W_1V\|_F^2+\mu_2\|B-W_2T\|_F^2\quad \mathrm{s.t.}\ B\in\{-1,1\}^{L\times N}$
Step 3-4: add category labels as supervision information to classify the hash codes:
$\min_{Q,B}\ \|Y-QB\|_F^2$
Step 3-5: add regularization terms to prevent over-fitting, defined as:
$\gamma\big(\|W_1\|_F^2+\|W_2\|_F^2+\|Q\|_F^2\big)$
Step 3-6: integrate steps 3-1 to 3-5 to obtain the objective function.
Further, the method for solving the objective function in step 4 is:
Step 4-1: fix the other variables in the objective function and solve for the projection matrix W1 of the image modality;
Step 4-2: fix the other variables in the objective function and solve for the projection matrix W2 of the text modality;
Step 4-3: fix the other variables in the objective function and solve for the joint hash codes B;
Step 4-4: fix the other variables in the objective function and solve for the classifier matrix Q.
Further, the retrieval method also comprises: judging the retrieval accuracy according to the category labels carried by the multi-modal dataset.
According to another aspect of the present invention, there is also provided an objective function construction method for cross-modal retrieval, comprising:
Step 1: obtain a training dataset in which each sample contains a pair of image and text modal data; perform multi-modal feature extraction on the training dataset to obtain the multi-modal training set Otrain;
Step 2: project the data of the two modalities from their original heterogeneous spaces into a common Hamming space and maximize the association between the paired image and text within each sample;
Step 3: perform linear discriminant analysis on the text modality data and transfer its discriminative property to the image modality data;
Step 4: convert the features of the two modalities into hash codes and minimize the quantization loss of obtaining the hash codes through the hash functions;
Step 5: add category labels as supervision information;
Step 6: add regularization terms to prevent over-fitting;
Step 7: integrate steps 2 to 6 to obtain the objective function for discriminative correlation maximization hashing.
Further, maximizing the association between the paired image and text within a sample in step 2 is defined as:
$\min_{W_1,W_2}\ \|W_1V-W_2T\|_F^2$
where V and T are the feature matrices of image and text respectively, and W1 and W2 are the projection matrices that project image and text into the common Hamming space; minimizing the discrepancy between the two projections maximizes the pairwise association.
Further, step 3 comprises: performing linear discriminant analysis on the text modality data to obtain the within-class similarity matrix Sw and the between-class similarity matrix Sb, and transferring this discriminative property to the image modality data, defined as:
$\min_{W_2}\ \mathrm{tr}\big(W_2(S_w-S_b)W_2^T\big)$
Further, step 4 defines the quantization loss of obtaining the hash codes through the hash functions as:
$\min_{W_1,W_2,B}\ \mu_1\|B-W_1V\|_F^2+\mu_2\|B-W_2T\|_F^2$
where B is the joint hash code matrix.
Further, the category label supervision of step 5 is defined as:
$\min_{Q,B}\ \|Y-QB\|_F^2$
where Q is the classifier matrix.
Further, the regularization term of step 6 is defined as:
$\gamma\big(\|W_1\|_F^2+\|W_2\|_F^2+\|Q\|_F^2\big)$
Beneficial effects of the present invention are:
In hashing-based cross-media retrieval, the present invention makes full use of the discriminative distribution of data features by performing linear discriminant analysis on the text modality and passing its discriminative property to the image modality. It also keeps the association of the multi-modal data of the same sample maximized after projection into the common Hamming space. Both make the feature distribution in the Hamming space more discriminative, so that the hash codes quantized from the features are easier to classify, which improves the performance of cross-media retrieval; meanwhile, the use of hashing reduces the time and space consumption of cross-modal retrieval.
Description of the drawings
The accompanying drawings, which form a part of this application, are provided for further understanding of the application; the illustrative embodiments of the application and their explanations are used to explain the application and do not constitute an improper limitation of it.
Fig. 1 is the general flow chart of cross-modal retrieval based on discriminative correlation maximization hashing;
Fig. 2 is the construction schematic of the objective function based on discriminative correlation maximization hashing;
Fig. 3 is the schematic diagram of solving the objective function.
Specific embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below in conjunction with the drawings and embodiments.
It should be noted that the following detailed description is illustrative and intended to provide further explanation of the application. Unless otherwise indicated, all technical and scientific terms used herein have the same meanings as commonly understood by a person of ordinary skill in the technical field to which the application belongs.
It should also be noted that the terms used herein are merely for describing specific embodiments and are not intended to limit the illustrative embodiments of the application. As used herein, unless the context clearly indicates otherwise, the singular forms are also intended to include the plural forms; additionally, it should be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of the stated features, steps, operations, devices, components and/or combinations thereof.
Embodiment one
This embodiment provides a cross-modal retrieval method based on discriminative correlation maximization hashing, which, as shown in Fig. 1, comprises the following steps:
Step 1: obtain a training dataset in which each sample contains a pair of image and text modal data;
Step 2: perform multi-modal feature extraction on the training dataset to obtain the multi-modal training set Otrain;
Step 3: for the training set Otrain, construct on this dataset an objective function for discriminative correlation maximization hashing;
Step 4: solve the objective function to obtain the projection matrices W1 and W2 that project image and text into the common Hamming space, the joint hash codes B of the image-text pairs and the classifier matrix Q, using B as the hash codes of both image and text;
Step 5: obtain a test dataset and perform multi-modal feature extraction on it to obtain the multi-modal test set Otest;
Step 6: for the test set Otest, project the image or text of each test sample into the common Hamming space with the projection matrices W1 and W2 obtained in step 4, and directly obtain the hash codes of the test images and texts with the learned hash functions f(V) = sgn(W1V) and g(T) = sgn(W2T);
Step 7: perform cross-modal retrieval: based on the hash codes, retrieve from the training set the objects of the other modality that are relevant to a query sample in the test set.
The objective function of discriminative correlation maximization hashing is (as shown in Fig. 2):
$\min_{W_1,W_2,B,Q}\ \lambda\|W_1V-W_2T\|_F^2+\beta\,\mathrm{tr}\big(W_2(S_w-S_b)W_2^T\big)+\mu_1\|B-W_1V\|_F^2+\mu_2\|B-W_2T\|_F^2+\alpha\|Y-QB\|_F^2+\gamma\big(\|W_1\|_F^2+\|W_2\|_F^2+\|Q\|_F^2\big)$
$\mathrm{s.t.}\ B\in\{-1,1\}^{L\times N},\ W_1W_1^T=I_k,$
where $V\in\mathbb{R}^{d_v\times N}$ and $T\in\mathbb{R}^{d_t\times N}$ are the feature matrices of image and text respectively and $Y\in\{0,1\}^{c\times N}$ is the label matrix; λ, μ1, μ2, β, α are balance parameters and γ is a regularization parameter.
The construction process of the objective function of discriminative correlation maximization hashing is:
Step 1: obtain a multi-modal dataset O, which contains a multi-modal training subset Otrain and a multi-modal test subset Otest.
Assume each data sample is $o_i=(v_i,t_i,y_i)$, where $v_i\in\mathbb{R}^{d_v}$ is the image feature vector, $t_i\in\mathbb{R}^{d_t}$ is the text feature vector, $y_i\in\{0,1\}^c$ is the category label, and N is the number of samples. Each data sample contains an image-text pair; their physical characteristics are different, but they have the same semantic meaning and belong to the same class.
We assume here that each sample belongs to one of c classes. Then $V=[v_1,\dots,v_N]\in\mathbb{R}^{d_v\times N}$ and $T=[t_1,\dots,t_N]\in\mathbb{R}^{d_t\times N}$ are the feature matrices of image and text respectively, and $Y=[y_1,\dots,y_N]\in\{0,1\}^{c\times N}$ is the label matrix: if the image and text features $v_i$ and $t_i$ of a sample $o_i$ belong to the j-th class, then the j-th element of $y_i$ is 1 and the rest are 0.
Step 2: project the data features from their original heterogeneous spaces into a common Hamming space.
Step 2-1: for each sample $o_i=(v_i,t_i,y_i)$ in Otrain, set the hash functions of the image and text modalities as f(V) = sgn(W1V) and g(T) = sgn(W2T), which project the data of the two modalities from their original heterogeneous spaces into a common Hamming space. Here sgn(·) is the sign function, which discretizes continuous data into binary hash codes, and W1 and W2 are the projection matrices of the two modalities.
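The hash functions are sign-thresholded linear projections. A minimal NumPy sketch follows; the convention of mapping zero-valued projections to +1 is an assumption, as the patent does not specify the behaviour of sgn at zero:

```python
import numpy as np

def hash_codes(W, X):
    """f(X) = sgn(W X): project features into Hamming space, binarize to +/-1.

    W: L x d projection matrix; X: d x N feature matrix.
    Zero projections are mapped to +1 so every bit is a valid code.
    """
    H = W @ X
    return np.where(H >= 0, 1, -1)
```

The same function serves both modalities: f(V) = hash_codes(W1, V) and g(T) = hash_codes(W2, T).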
Step 2-2: because the image and text within a sample are paired in the original state, the association between the paired image and text within a sample should be maximized in the Hamming space after projection, defined as follows:
$\min_{W_1,W_2}\ \|W_1V-W_2T\|_F^2$
where W1 and W2 are the projection matrices of image and text.
Step 2-3: in order to preserve the discriminative characteristics of the data, we introduce linear discriminant analysis (LDA) to process the text modality data and transfer its discriminative property to the image modality.
Linear discriminant analysis projects data from a high-dimensional space into an optimal discriminative space in which the distance between data of different classes is as large as possible and the distance between data of the same class is as small as possible. Performing linear discriminant analysis on the text modality data, we define Sw as the within-class similarity matrix and Sb as the between-class similarity matrix. LDA on the text modality makes the distribution of the text data projected into the common Hamming space discriminative, and this property is transferred to the image modality through Sw and Sb, defined as:
$\min_{W_2}\ \mathrm{tr}\big(W_2S_wW_2^T\big)-\mathrm{tr}\big(W_2S_bW_2^T\big)$
where tr(·) is the trace of a matrix. The formula is equivalent to:
$\min_{W_2}\ \mathrm{tr}\big(W_2(S_w-S_b)W_2^T\big)$
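The patent does not give explicit formulas for Sw and Sb; assuming they are the standard LDA within-class and between-class scatter matrices, they can be computed as in the following sketch (function and variable names are illustrative):

```python
import numpy as np

def scatter_matrices(T, labels):
    """Within-class (Sw) and between-class (Sb) scatter of text features.

    T: dt x N feature matrix; labels: length-N array of class indices.
    """
    mean = T.mean(axis=1, keepdims=True)
    dt = T.shape[0]
    Sw = np.zeros((dt, dt))
    Sb = np.zeros((dt, dt))
    for c in np.unique(labels):
        Tc = T[:, labels == c]
        mc = Tc.mean(axis=1, keepdims=True)
        Sw += (Tc - mc) @ (Tc - mc).T                       # scatter around class mean
        Sb += Tc.shape[1] * (mc - mean) @ (mc - mean).T     # class-mean scatter
    return Sw, Sb
```

Minimizing tr(W2 (Sw − Sb) W2ᵀ) then compresses within-class scatter while spreading class means apart in the projected space.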
Step 3: using the hash functions defined in step 2-1, quantize the image and text features projected into the common Hamming space into hash codes.
Because a sample consists of a paired image and text with the same semantic meaning, we introduce an auxiliary variable, the joint hash codes $B\in\{-1,1\}^{L\times N}$ of the two modalities: the paired image and text in a sample share the same hash code. The quantization loss of generating the hash codes should be as small as possible, defined as follows:
$\min_{W_1,W_2,B}\ \mu_1\|B-W_1V\|_F^2+\mu_2\|B-W_2T\|_F^2\quad \mathrm{s.t.}\ B\in\{-1,1\}^{L\times N}$
Step 4: add category labels as supervision information so that the learned joint hash codes can easily be used for classification. Specifically, the learned hash codes are B and their semantic information is Y; since the matrix dimensions of Y and B are inconsistent, a classifier matrix Q is introduced for the conversion, defined as:
$\min_{Q,B}\ \|Y-QB\|_F^2$
Step 5: to prevent over-fitting, apply regularization constraints to the projection matrices, defined as:
$\gamma\big(\|W_1\|_F^2+\|W_2\|_F^2+\|Q\|_F^2\big)$
Integrating the five steps above, we obtain the complete objective function:
$\min_{W_1,W_2,B,Q}\ \lambda\|W_1V-W_2T\|_F^2+\beta\,\mathrm{tr}\big(W_2(S_w-S_b)W_2^T\big)+\mu_1\|B-W_1V\|_F^2+\mu_2\|B-W_2T\|_F^2+\alpha\|Y-QB\|_F^2+\gamma\big(\|W_1\|_F^2+\|W_2\|_F^2+\|Q\|_F^2\big)$
where λ, μ1, μ2, β, α are balance parameters and γ is a regularization parameter (for preventing over-fitting).
Our goal is to obtain the projection matrices W1 and W2 and the joint hash codes B by solving the objective function above. Since the objective function contains multiple unknown variables, it cannot be solved directly. The present invention therefore proposes an iterative solving algorithm: fix the other variables and solve for one variable, which finally yields the optimal solution. In addition, for simplicity of computation, we relax the discrete constraint $B\in\{-1,1\}^{L\times N}$ of the joint hash codes B into the continuous constraint 0 ≤ B ≤ 1.
According to the objective function of discriminative correlation maximization hashing, we propose an iterative solving algorithm (as shown in Fig. 3) for the required projection matrices W1 and W2, joint hash codes B and classifier matrix Q.
Step 1: fix the other variables W2, Q and B in the objective function and solve for the projection matrix W1. The objective function becomes:
$\min_{W_1}\ \lambda\|W_1V-W_2T\|_F^2+\mu_1\|B-W_1V\|_F^2+\gamma\|W_1\|_F^2$
Setting the partial derivative with respect to W1 to zero gives the solution of W1:
$W_1=(\mu_1BV^T+\lambda W_2TV^T)(\mu_1VV^T+\lambda VV^T+\gamma I)^{-1}$
Step 2: fix the other variables W1, Q and B and solve for the projection matrix W2. The objective function becomes:
$\min_{W_2}\ \lambda\|W_1V-W_2T\|_F^2+\beta\,\mathrm{tr}\big(W_2(S_w-S_b)W_2^T\big)+\mu_2\|B-W_2T\|_F^2+\gamma\|W_2\|_F^2$
Setting the partial derivative with respect to W2 to zero gives the solution of W2:
$W_2=(\mu_2BT^T+\lambda W_1VT^T)\big(\mu_2TT^T+\lambda TT^T+\beta(S_w-S_b)+\gamma I\big)^{-1}$
Step 3: fix the other variables W1, W2 and Q and solve for the joint hash codes B. The objective function becomes:
$\min_{B}\ \mu_1\|B-W_1V\|_F^2+\mu_2\|B-W_2T\|_F^2+\alpha\|Y-QB\|_F^2$
Setting the partial derivative with respect to B to zero gives the solution of B:
$B=(\alpha Q^TQ+(\mu_1+\mu_2)I)^{-1}(\alpha Q^TY+\mu_1W_1V+\mu_2W_2T)$
Step 4: fix the other variables W1, W2 and B and solve for the classifier matrix Q. The objective function becomes:
$\min_{Q}\ \alpha\|Y-QB\|_F^2+\gamma\|Q\|_F^2$
Setting the partial derivative with respect to Q to zero gives the solution of Q:
$Q=(\alpha YB^T)(\alpha BB^T+\gamma I)^{-1}$
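The four closed-form updates can be alternated as sketched below. This is a hedged reading of the algorithm, not a faithful implementation: the LDA term (β, Sw, Sb) is omitted from the W2 update to keep the sketch short, B is kept continuous during the iterations and binarized only at the end, and all hyper-parameter defaults and names are illustrative:

```python
import numpy as np

def solve(V, T, Y, L=16, iters=20, lam=1.0, mu1=1.0, mu2=1.0,
          alpha=1.0, gamma=0.01, seed=0):
    """Alternating closed-form updates for W1, W2, B, Q (relaxed B).

    Each step fixes the other variables and solves a regularized
    least-squares subproblem, following the update rules above.
    """
    rng = np.random.default_rng(seed)
    dv, N = V.shape
    dt = T.shape[0]
    c = Y.shape[0]
    B = rng.standard_normal((L, N))
    W2 = rng.standard_normal((L, dt))
    Q = rng.standard_normal((c, L))
    I = np.eye
    for _ in range(iters):
        # W1 = (mu1 B V^T + lam W2 T V^T)(mu1 V V^T + lam V V^T + gamma I)^-1
        W1 = (mu1 * B @ V.T + lam * W2 @ T @ V.T) @ np.linalg.inv(
            (mu1 + lam) * V @ V.T + gamma * I(dv))
        # Symmetric update for W2 (LDA term dropped in this sketch)
        W2 = (mu2 * B @ T.T + lam * W1 @ V @ T.T) @ np.linalg.inv(
            (mu2 + lam) * T @ T.T + gamma * I(dt))
        # B = (alpha Q^T Q + (mu1+mu2) I)^-1 (alpha Q^T Y + mu1 W1 V + mu2 W2 T)
        B = np.linalg.inv(alpha * Q.T @ Q + (mu1 + mu2) * I(L)) @ (
            alpha * Q.T @ Y + mu1 * W1 @ V + mu2 * W2 @ T)
        # Q = (alpha Y B^T)(alpha B B^T + gamma I)^-1
        Q = (alpha * Y @ B.T) @ np.linalg.inv(alpha * B @ B.T + gamma * I(L))
    return W1, W2, np.where(B >= 0, 1, -1), Q
```

Every subproblem is a ridge-regularized least squares, so each inverse exists and each update has a unique minimizer.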
Finally, we use the joint hash codes B as the hash codes of the training samples; for a new test sample, we obtain its hash codes by quantizing through the hash functions. Cross-media retrieval is then performed by comparing the similarity between hash codes.
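As the background section notes, Hamming-space similarity search reduces to XOR plus a bit count. A small pure-Python sketch follows; packing each ±1 code into an integer is one possible implementation choice, not prescribed by the patent:

```python
def hamming_distance(a, b):
    """Hamming distance between two +/-1 code vectors via XOR popcount.

    Each code is packed into an int whose bits encode the sign
    (+1 -> bit 1, -1 -> bit 0); the distance is the popcount of the XOR.
    """
    pa = sum(1 << i for i, v in enumerate(a) if v > 0)
    pb = sum(1 << i for i, v in enumerate(b) if v > 0)
    return bin(pa ^ pb).count("1")

def retrieve(query_code, database_codes):
    """Rank database items by ascending Hamming distance to the query."""
    return sorted(range(len(database_codes)),
                  key=lambda i: hamming_distance(query_code, database_codes[i]))
```

Because image and text share the common Hamming space, the same routine serves both retrieval directions (image query against text codes and vice versa).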
The retrieval method also comprises: judging the retrieval accuracy according to the category labels carried by the multi-modal dataset. Here we assess the retrieval accuracy of the method with the commonly used mean average precision (MAP). Given a retrieval set, the average precision (AP) of each query is defined as
$AP=\frac{1}{T}\sum_{r=1}^{n}P(r)\,\delta(r)$
where T is the number of relevant samples in the retrieval set, n is the total number of retrieved samples, P(r) is the ratio of the number of relevant samples among the top r retrieved samples to r, and δ(r) = 1 if the r-th retrieved sample is relevant to the query sample and δ(r) = 0 otherwise. MAP is the mean of the AP values over all queries.
Embodiment two
According to the cross-modal retrieval method based on discriminative correlation maximization hashing above, this embodiment provides the corresponding objective function construction method, which, as shown in Fig. 2, comprises:
Step 1: obtain a training dataset in which each sample contains a pair of image and text modal data; perform multi-modal feature extraction on the training dataset to obtain the multi-modal training set Otrain;
Step 2: project the data of the two modalities from their original heterogeneous spaces into a common Hamming space and maximize the association between the paired image and text within each sample;
Step 3: perform linear discriminant analysis on the text modality data and transfer its discriminative property to the image modality data;
Step 4: convert the features of the two modalities into hash codes and minimize the quantization loss of obtaining the hash codes through the hash functions;
Step 5: add category labels as supervision information;
Step 6: add regularization terms to prevent over-fitting;
Step 7: integrate steps 2 to 6 to obtain the objective function for discriminative correlation maximization hashing.
Maximizing the association between the paired image and text within a sample in step 2 is defined as:
$\min_{W_1,W_2}\ \|W_1V-W_2T\|_F^2$
where V and T are the feature matrices of image and text respectively, and W1 and W2 are the projection matrices that project image and text into the common Hamming space.
Step 3 comprises: performing linear discriminant analysis on the text modality data to obtain the within-class similarity matrix Sw and the between-class similarity matrix Sb, and transferring this discriminative property to the image modality data, defined as:
$\min_{W_2}\ \mathrm{tr}\big(W_2(S_w-S_b)W_2^T\big)$
Step 4 defines the quantization loss of obtaining the hash codes through the hash functions as:
$\min_{W_1,W_2,B}\ \mu_1\|B-W_1V\|_F^2+\mu_2\|B-W_2T\|_F^2$
where B is the joint hash code matrix.
The category label supervision of step 5 is defined as:
$\min_{Q,B}\ \|Y-QB\|_F^2$
where Q is the classifier matrix.
The regularization term of step 6 is defined as:
$\gamma\big(\|W_1\|_F^2+\|W_2\|_F^2+\|Q\|_F^2\big)$
Experimental effect:
The method was verified with the image and text data of the Wiki image-text dataset; the retrieval accuracies are shown in Table 1.
Table 1: comparison of the retrieval accuracy (MAP) of six cross-media retrieval methods (image-to-text and text-to-image) on the Wiki dataset.
It can be seen that the method of the present invention learns a hash function for each of the text and image modalities, projects the original data features into a common Hamming space, and performs linear discriminant analysis (LDA) on the text modality data so that the projected text features remain discriminative, a property that is passed on to the image modality. In the common Hamming space, the data features can be converted into hash codes, and with the class label information the hash codes can easily be classified. These operations yield a good cross-media retrieval effect, while the use of hashing reduces the time and space consumption of cross-modal retrieval.
The above are merely preferred embodiments of the application and are not intended to limit it; for those skilled in the art, the application may have various modifications and variations. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the application shall be included within its protection scope.
Although the specific embodiments of the present invention have been described above with reference to the accompanying drawings, they do not limit the protection scope of the present invention; those skilled in the art should understand that, based on the technical solutions of the present invention, various modifications or changes that can be made without creative labor still fall within the protection scope of the present invention.
Claims (10)
1. A cross-modal retrieval method based on discriminative correlation maximization hashing, characterized by comprising the following steps:
Step 1: obtaining a training data set, wherein each sample comprises paired data of two modalities, image and text;
Step 2: performing multi-modal feature extraction on the training data set to obtain a training multi-modal data set Otrain;
Step 3: for the training multi-modal data set Otrain, constructing on this data set the objective function of discriminative correlation maximization hashing;
Step 4: solving the objective function to obtain the projection matrices W1 and W2 that project images and texts into a common Hamming space, and the hash codes B of the image-text pairs;
Step 5: obtaining a test data set and performing multi-modal feature extraction on it to obtain a test multi-modal data set Otest;
Step 6: for the test multi-modal data set Otest, projecting the image or text of each sample in the test data set into the common Hamming space according to the projection matrices W1 and W2 obtained in Step 4, and quantizing it into hash codes by a hash function;
Step 7: performing cross-modal retrieval: based on the hash codes, retrieving from the training data set the objects of the other modality that are relevant to a sample to be retrieved in the test set;
wherein the objective function in Step 3 is:
s.t. B ∈ {-1, 1}^(L×N), W1W1^T = Ik,
wherein V and T are the data feature matrices of images and texts, respectively, and Y is the label matrix; λ, μ1, μ2, β, and α are balance parameters and γ is a regularization parameter; Sw is the intra-class similarity matrix and Sb is the inter-class similarity matrix; Q is the classifier matrix, N is the number of samples, and c denotes the number of classes.
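As a non-limiting sketch of Steps 6 and 7 (projecting test samples into the common Hamming space, quantizing them into hash codes, and retrieving by code similarity), the Python fragment below uses sign hashing and Hamming-distance ranking. All dimensions, the random stand-ins for the learned projection matrices W1 and W2, and the helper `hash_codes` are illustrative assumptions, not the patent's learned values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the patent): k-bit codes,
# d1-dim image features, d2-dim text features, N database samples.
k, d1, d2, N = 16, 32, 24, 100

# Stand-ins for the learned projection matrices W1 (image) and W2 (text).
W1 = rng.standard_normal((k, d1))
W2 = rng.standard_normal((k, d2))

V_db = rng.standard_normal((d1, N))   # database image features
t_query = rng.standard_normal(d2)     # one text query

def hash_codes(W, X):
    """Quantize projected features into {-1, 1} hash codes (sign hashing)."""
    return np.where(W @ X >= 0, 1, -1)

B_db = hash_codes(W1, V_db)           # k x N database codes
b_q = hash_codes(W2, t_query)         # k-dim query code

# Hamming distance between a {-1,1} query code and every database code:
# for +/-1 vectors, dot = k - 2 * (#disagreeing bits).
hamming = (k - b_q @ B_db) // 2
ranking = np.argsort(hamming)         # most similar database items first

print(ranking[:5])
```

Because both modalities share one Hamming space, the same ranking step serves image-to-text and text-to-image queries alike.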
2. The cross-modal retrieval method based on discriminative correlation maximization hashing according to claim 1, characterized in that Step 3 comprises:
Step 3-1: letting each data sample of the training multi-modal data set Otrain consist of the feature vector of an image, the feature vector of a text, and a category label yi ∈ {0,1}^c, with N the number of samples; projecting the data of the two modalities from their original heterogeneous spaces into a common Hamming space, and maximizing the correlation between the paired image and text within each sample:
s.t. W1W1^T = Ik,
Step 3-2: performing linear discriminant analysis on the text-modality data, and transmitting its discriminative property to the image-modality data;
Step 3-3: converting the features of the two modalities into hash codes, and minimizing the quantization loss of obtaining the hash codes through the hash function:
s.t. B ∈ {-1, 1}^L, W1W1^T = Ik,
Step 3-4: adding category labels as supervision information to classify the hash codes:
s.t. B ∈ {-1, 1}^L
Step 3-5: adding a regularization term to prevent over-fitting;
Step 3-6: integrating Steps 3-1 to 3-5 to obtain the objective function.
3. The cross-modal retrieval method based on discriminative correlation maximization hashing according to claim 2, characterized in that the method for solving the objective function in Step 4 is:
Step 4-1: fixing the other variables in the objective function, and solving for the projection matrix W1 of the image modality;
Step 4-2: fixing the other variables in the objective function, and solving for the projection matrix W2 of the text modality;
Step 4-3: fixing the other variables in the objective function, and solving for the joint hash codes B;
Step 4-4: fixing the other variables in the objective function, and solving for the classifier matrix Q.
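Steps 4-1 to 4-4 describe an alternating (fix-the-rest, solve-one) optimization. The sketch below mimics that loop structure in Python; the concrete update rules (orthogonal Procrustes for W1, ridge least squares for W2 and Q, a sign step for B) are generic stand-ins chosen only to respect the stated constraints, not the patent's actual closed-form solutions:

```python
import numpy as np

rng = np.random.default_rng(1)
k, d1, d2, c, N = 8, 20, 15, 4, 60    # illustrative dimensions

V = rng.standard_normal((d1, N))                      # image features
T = rng.standard_normal((d2, N))                      # text features
Y = np.eye(c)[rng.integers(0, c, N)].T                # c x N one-hot labels

W1 = np.linalg.qr(rng.standard_normal((d1, k)))[0].T  # k x d1, rows orthonormal
W2 = rng.standard_normal((k, d2))
B = np.sign(rng.standard_normal((k, N)))
Q = rng.standard_normal((c, k))
gamma = 1e-2                                          # ridge regularizer

for _ in range(5):
    # Step 4-1: fix the rest, update W1 under the constraint W1 W1^T = I_k
    # (orthogonal Procrustes: align projected images with the current codes).
    U, _, Vt = np.linalg.svd(B @ V.T, full_matrices=False)
    W1 = U @ Vt
    # Step 4-2: fix the rest, update W2 (ridge least squares onto the codes).
    W2 = B @ T.T @ np.linalg.inv(T @ T.T + gamma * np.eye(d2))
    # Step 4-3: fix the rest, update the joint hash codes B by a sign step.
    B = np.sign(W1 @ V + W2 @ T)
    B[B == 0] = 1
    # Step 4-4: fix the rest, update the classifier matrix Q (ridge regression).
    Q = Y @ B.T @ np.linalg.inv(B @ B.T + gamma * np.eye(k))

print(np.allclose(W1 @ W1.T, np.eye(k)))
```

Each sub-step solves a simpler problem exactly while the others are held fixed, which is the standard way such constrained multi-variable hashing objectives are optimized.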
4. The cross-modal retrieval method based on discriminative correlation maximization hashing according to claim 1, characterized in that the retrieval method further comprises: judging the retrieval accuracy according to the category labels carried by the multi-modal data set.
5. An objective function construction method for cross-modal retrieval, characterized by comprising:
Step 1: obtaining a training data set, wherein each sample comprises paired data of two modalities, image and text; performing multi-modal feature extraction on the training data set to obtain a training multi-modal data set Otrain;
Step 2: projecting the data of the two modalities from their original heterogeneous spaces into a common Hamming space, and maximizing the correlation between the paired image and text within each sample;
Step 3: performing linear discriminant analysis on the text-modality data, and transmitting its discriminative property to the image-modality data;
Step 4: converting the features of the two modalities into hash codes, and minimizing the quantization loss of obtaining the hash codes through the hash function;
Step 5: adding category labels as supervision information;
Step 6: adding a regularization term to prevent over-fitting;
Step 7: integrating Steps 2 to 6 to obtain the objective function of discriminative correlation maximization hashing.
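Step 7 integrates the terms of Steps 2 to 6 into one scalar objective. The fragment below evaluates one plausible such combination; the functional form of every term, the assignment of the balance parameters λ, μ1, μ2, β, α and the regularization parameter γ to terms, and all matrices are illustrative assumptions rather than the claimed formula:

```python
import numpy as np

rng = np.random.default_rng(2)
k, d1, d2, c, N = 8, 20, 15, 4, 60        # illustrative dimensions
V = rng.standard_normal((d1, N))          # image feature matrix
T = rng.standard_normal((d2, N))          # text feature matrix
Y = np.eye(c)[np.arange(N) % c].T         # c x N one-hot label matrix
W1 = rng.standard_normal((k, d1))         # image projection (Step 2)
W2 = rng.standard_normal((k, d2))         # text projection (Step 2)
B = np.sign(rng.standard_normal((k, N)))  # joint hash codes (Step 4)
Q = rng.standard_normal((c, k))           # classifier matrix (Step 5)

# PSD stand-ins for the intra/inter-class similarity matrices of Step 3.
A1 = rng.standard_normal((d2, d2)); Sw = A1 @ A1.T
A2 = rng.standard_normal((d2, d2)); Sb = A2 @ A2.T

# Parameters named as in the patent; the values here are arbitrary.
lam, mu1, mu2, beta, alpha, gamma = 1.0, 0.5, 0.5, 0.1, 0.1, 0.01
sq = lambda M: np.sum(M ** 2)             # squared Frobenius norm

J = (
    lam * sq(W1 @ V - W2 @ T)                   # Step 2: cross-modal correlation gap
    + alpha * np.trace(W2 @ (Sw - Sb) @ W2.T)   # Step 3: LDA-style discriminant term
    + mu1 * sq(B - W1 @ V)                      # Step 4: image quantization loss
    + mu2 * sq(B - W2 @ T)                      # Step 4: text quantization loss
    + beta * sq(Y - Q @ B)                      # Step 5: supervised classification loss
    + gamma * (sq(W2) + sq(Q))                  # Step 6: regularization
)
print(float(J))
```

The point of the sketch is only the shape of the integration: one scalar value in which each construction step contributes a weighted term.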
6. The objective function construction method for cross-modal retrieval according to claim 5, characterized in that maximizing the correlation between the paired image and text within each sample in Step 2 is defined as:
s.t. W1W1^T = Ik,
wherein V and T are the data feature matrices of images and texts, respectively, and W1 and W2 are the projection matrices that project images and texts, respectively, into the common Hamming space.
7. The objective function construction method for cross-modal retrieval according to claim 6, characterized in that Step 3 comprises: performing linear discriminant analysis on the text-modality data to obtain the intra-class similarity matrix Sw and the inter-class similarity matrix Sb, and transmitting this discriminative property to the image-modality data, defined as:
wherein Sw is the intra-class similarity matrix and Sb is the inter-class similarity matrix.
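The intra-class matrix Sw and inter-class matrix Sb of this claim correspond to the classic linear-discriminant-analysis scatter matrices. A small sketch of how such matrices are typically computed from labelled text features (dimensions and data here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
d2, c, N = 6, 3, 30
T = rng.standard_normal((d2, N))         # text feature matrix
labels = np.arange(N) % c                # class label per sample (10 per class)

# Classic LDA scatter matrices, playing the roles of the intra-class (Sw)
# and inter-class (Sb) similarity matrices named in the claim.
mean = T.mean(axis=1, keepdims=True)     # global mean of the text features
Sw = np.zeros((d2, d2))
Sb = np.zeros((d2, d2))
for j in range(c):
    Tj = T[:, labels == j]               # samples of class j
    mj = Tj.mean(axis=1, keepdims=True)  # class mean
    Sw += (Tj - mj) @ (Tj - mj).T        # within-class scatter
    Sb += Tj.shape[1] * (mj - mean) @ (mj - mean).T  # between-class scatter

# Both matrices are symmetric positive semi-definite by construction,
# and they decompose the total scatter: Sw + Sb = (T - mean)(T - mean)^T.
print(np.allclose(Sw + Sb, (T - mean) @ (T - mean).T))
```

A projection that keeps tr(W Sw Wᵀ) small and tr(W Sb Wᵀ) large pulls same-class samples together and pushes different classes apart, which is the discriminative property the claim transmits to the image modality.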
8. The objective function construction method for cross-modal retrieval according to claim 6 or 7, characterized in that Step 4 defines minimizing the quantization loss of obtaining the hash codes through the hash function as:
s.t. B ∈ {-1, 1}^L, W1W1^T = Ik,
wherein B is the joint hash code matrix.
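A useful property behind minimizing such a quantization loss is that, element-wise, sign(·) is the closest point of {-1, 1} to any real value, so B = sign(W1 V) minimizes ||B - W1 V||²_F over binary B for a fixed projection. A small illustration (random stand-ins for W1 and V, not the patent's data):

```python
import numpy as np

rng = np.random.default_rng(4)
k, d1, N = 8, 20, 50
W1 = rng.standard_normal((k, d1))   # stand-in projection matrix
V = rng.standard_normal((d1, N))    # stand-in image features

P = W1 @ V                   # real-valued projections in the Hamming space
B = np.sign(P)               # hash function: element-wise sign
B[B == 0] = 1                # break ties toward +1 (none expected here)

quant_loss = np.sum((B - P) ** 2)   # quantization loss ||B - W1 V||_F^2

# Sanity check: flipping any single bit away from sign(p) can only
# increase the loss, since (-b - p)^2 - (b - p)^2 = 4*b*p = 4*|p| >= 0.
B_flip = B.copy()
B_flip[0, 0] *= -1
print(quant_loss <= np.sum((B_flip - P) ** 2))  # True
```

This is why sign quantization appears both inside the objective and at test time when unseen samples are encoded.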
9. The objective function construction method for cross-modal retrieval according to claim 8, characterized in that adding category labels in Step 5 is defined as:
s.t. B ∈ {-1, 1}^L
wherein Q is the classifier matrix and Y denotes the label matrix.
10. The objective function construction method for cross-modal retrieval according to claim 9, characterized in that the regularization term in Step 6 is defined in terms of the regularization parameter γ.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710581083.1A CN107402993B (en) | 2017-07-17 | 2017-07-17 | Cross-modal retrieval method based on discriminative correlation maximization hashing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107402993A CN107402993A (en) | 2017-11-28 |
CN107402993B true CN107402993B (en) | 2018-09-11 |
Family
ID=60400727
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710581083.1A Expired - Fee Related CN107402993B (en) | 2017-07-17 | 2017-07-17 | Cross-modal retrieval method based on discriminative correlation maximization hashing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107402993B (en) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108170755B (en) * | 2017-12-22 | 2020-04-07 | 西安电子科技大学 | Cross-modal Hash retrieval method based on triple deep network |
CN109376261B (en) * | 2018-10-29 | 2019-09-24 | 山东师范大学 | Mode independent retrieval method and system based on intermediate text semantic enhancing space |
CN109299216B (en) * | 2018-10-29 | 2019-07-23 | 山东师范大学 | A kind of cross-module state Hash search method and system merging supervision message |
CN109522946A (en) * | 2018-10-31 | 2019-03-26 | 咪咕文化科技有限公司 | A kind of image classification model treatment method, apparatus and storage medium |
CN109766455B (en) * | 2018-11-15 | 2021-09-24 | 南京邮电大学 | Identified full-similarity preserved Hash cross-modal retrieval method |
CN109766481B (en) * | 2019-01-11 | 2021-06-08 | 西安电子科技大学 | Online Hash cross-modal information retrieval method based on collaborative matrix decomposition |
CN111460077B (en) * | 2019-01-22 | 2021-03-26 | 大连理工大学 | Cross-modal Hash retrieval method based on class semantic guidance |
CN110019652B (en) * | 2019-03-14 | 2022-06-03 | 九江学院 | Cross-modal Hash retrieval method based on deep learning |
CN110059198B (en) * | 2019-04-08 | 2021-04-13 | 浙江大学 | Discrete hash retrieval method of cross-modal data based on similarity maintenance |
CN110059154B (en) * | 2019-04-10 | 2022-04-15 | 山东师范大学 | Cross-modal migration hash retrieval method based on inheritance mapping |
CN110188210B (en) * | 2019-05-10 | 2021-09-24 | 山东师范大学 | Cross-modal data retrieval method and system based on graph regularization and modal independence |
CN110674323B (en) * | 2019-09-02 | 2020-06-30 | 山东师范大学 | Unsupervised cross-modal Hash retrieval method and system based on virtual label regression |
CN111259176B (en) * | 2020-01-16 | 2021-08-17 | 合肥工业大学 | Cross-modal Hash retrieval method based on matrix decomposition and integrated with supervision information |
CN111368176B (en) * | 2020-03-02 | 2023-08-18 | 南京财经大学 | Cross-modal hash retrieval method and system based on supervision semantic coupling consistency |
CN111651577B (en) * | 2020-06-01 | 2023-04-21 | 全球能源互联网研究院有限公司 | Cross-media data association analysis model training and data association analysis method and system |
CN113343014A (en) * | 2021-05-25 | 2021-09-03 | 武汉理工大学 | Cross-modal image audio retrieval method based on deep heterogeneous correlation learning |
CN117033724B (en) * | 2023-08-24 | 2024-05-03 | 广州市景心科技股份有限公司 | Multi-mode data retrieval method based on semantic association |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101996191A (en) * | 2009-08-14 | 2011-03-30 | 北京大学 | Method and system for searching for two-dimensional cross-media element |
CN102629275A (en) * | 2012-03-21 | 2012-08-08 | 复旦大学 | Face and name aligning method and system facing to cross media news retrieval |
CN105205096A (en) * | 2015-08-18 | 2015-12-30 | 天津中科智能识别产业技术研究院有限公司 | Text modal and image modal crossing type data retrieval method |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9280587B2 (en) * | 2013-03-15 | 2016-03-08 | Xerox Corporation | Mailbox search engine using query multi-modal expansion and community-based smoothing |
US9830506B2 (en) * | 2015-11-09 | 2017-11-28 | The United States Of America As Represented By The Secretary Of The Army | Method of apparatus for cross-modal face matching using polarimetric image data |
CN106777318B (en) * | 2017-01-05 | 2019-12-10 | 西安电子科技大学 | Matrix decomposition cross-modal Hash retrieval method based on collaborative training |
Non-Patent Citations (1)
Title |
---|
Linear Subspace Ranking Hashing for Cross-Modal Retrieval; Kai Li et al.; IEEE Transactions on Pattern Analysis and Machine Intelligence; 2016-09-19; Vol. 39, No. 9; pp. 1825-1838 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107402993B (en) | Cross-modal retrieval method based on discriminative correlation maximization hashing | |
CN108897989B (en) | Biological event extraction method based on candidate event element attention mechanism | |
Mandal et al. | Generalized semantic preserving hashing for n-label cross-modal retrieval | |
CN106777318B (en) | Matrix decomposition cross-modal Hash retrieval method based on collaborative training | |
CN107256271B (en) | Cross-modal Hash retrieval method based on mapping dictionary learning | |
CN107729513B (en) | Discrete supervision cross-modal Hash retrieval method based on semantic alignment | |
CN106202256B (en) | Web image retrieval method based on semantic propagation and mixed multi-instance learning | |
US11176462B1 (en) | System and method for prediction of protein-ligand interactions and their bioactivity | |
CN109784405B (en) | Cross-modal retrieval method and system based on pseudo-tag learning and semantic consistency | |
Cheng et al. | Semi-supervised multi-graph hashing for scalable similarity search | |
Niu et al. | Knowledge-based topic model for unsupervised object discovery and localization | |
Ji et al. | Image-attribute reciprocally guided attention network for pedestrian attribute recognition | |
CN111126563B (en) | Target identification method and system based on space-time data of twin network | |
Li et al. | Hashing with dual complementary projection learning for fast image retrieval | |
CN112101029B (en) | Bert model-based university teacher recommendation management method | |
Xu et al. | Transductive visual-semantic embedding for zero-shot learning | |
Wang et al. | Asymmetric correlation quantization hashing for cross-modal retrieval | |
Sitaula et al. | Unsupervised deep features for privacy image classification | |
CN109857892B (en) | Semi-supervised cross-modal Hash retrieval method based on class label transfer | |
Shen et al. | Semi-paired hashing for cross-view retrieval | |
Tang et al. | Efficient dictionary learning for visual categorization | |
Yazici et al. | Color naming for multi-color fashion items | |
Wang et al. | Deep hashing with active pairwise supervision | |
CN107885854A (en) | Semi-supervised cross-media retrieval method based on feature selection and virtual data generation | |
Xu et al. | Interaction content aware network embedding via co-embedding of nodes and edges |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20180911 |