CN103488713B

CN103488713B - A cross-modal retrieval method capable of directly measuring the similarity between data of different modalities

Info

Publication number: CN103488713B
Application number: CN201310410553.XA
Authority: CN
Inventors: 庄越挺; 吴飞; 王彦斐; 汤斯亮; 邵健
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2013-09-10
Filing date: 2013-09-10
Publication date: 2016-09-28
Anticipated expiration: 2033-09-10
Also published as: CN103488713A

Abstract

The invention discloses a cross-modal retrieval method capable of directly measuring the similarity between data of different modalities. The method comprises the following steps: 1) feature extraction; 2) model construction and learning; 3) cross-media data retrieval; 4) evaluation of results. The invention compares data of different modalities directly. In a cross-modal retrieval task, a user can submit data of any modality, such as text, image, or sound, and retrieve results in whichever modality is required. The invention differs from traditional cross-media retrieval methods in that similarity is compared directly between data of different modalities, which meets the needs of cross-media retrieval and realizes the user's retrieval intent more directly. Compared with other cross-media retrieval algorithms that can directly measure similarity between modalities, the method is more robust to noise and better at representing loosely related cross-modal data, so retrieval performance is better.

Description

A cross-modal retrieval method capable of directly measuring the similarity between data of different modalities

Technical field

The present invention relates to cross-modal retrieval, and in particular to a cross-modal retrieval method capable of directly measuring the similarity between data of different modalities.

Background art

Electronic data is becoming increasingly diverse: text, images, sound, maps, and other types of data are widely available on the Internet. The same semantic content can be described by data of one modality or by data of other modalities. Sometimes, given a description of certain semantics in one type of data, we want to find the corresponding descriptions in other types of data, for example retrieving pictures whose meaning is close to a given text, or searching for news reports related to a given picture. However, existing retrieval methods generally target single-modality data, such as retrieving text with text or images with images. Some multi-modal or multimedia retrieval methods exist, but most of them measure similarity within a single modality and then compute the similarity between media data through a mapping function; few retrieval methods compare similarity across modalities directly. The drawback of cross-media retrieval methods that measure similarity within a single modality is that they cannot learn the relationships among cross-modal data, must rely on matching relationships specified in advance in the database, and perform poorly on queries when the correspondence between multimedia data is loose. It is therefore necessary to propose a cross-media retrieval method that measures similarity across modalities directly. The difficulty of comparing data of different modalities directly is that the features of different modalities differ greatly and are generally high-dimensional, which gives rise to the "semantic gap" problem.

Several methods that differ from the traditional ones have been proposed to overcome the "semantic gap" and to compare and retrieve data across modalities. They fall into two classes. The first class treats the data of different modalities as random variables and seeks the mappings into a latent space in which the correlation between these variables is maximized; query data are projected into the same latent space to complete cross-modal retrieval. The second class assumes that the data contain latent topics and models the association between modalities through those topics. Both classes compare data of different modalities directly. However, when the modalities are only loosely related at the semantic level, inferred semantic notions such as "maximum correlation" and "topics" are less reliable than known, definite information such as the categories of the data and the pairing relationships between modalities. The present invention introduces dictionary learning into cross-modal retrieval, learns directly from the explicit pairing relationships, and exploits label information. It can therefore model the loose semantic correspondence between text and images well, which improves robustness to noise and raises the accuracy of cross-modal retrieval.

Summary of the invention

The object of the invention is to provide a cross-modal retrieval method capable of directly measuring the similarity between data of different modalities, so that data of one modality can be used to retrieve data of one or more other modalities directly.

The cross-modal retrieval method capable of directly measuring the similarity between data of different modalities comprises the following steps:

1) performing feature extraction and label recording for the data of each modality in the database;

2) according to the correspondence information and the label information of the data of different modalities in the database, expressing the differences and similarities between paired data of different modalities from the perspective of reconstruction, using the label information, building the overall cross-modal retrieval model, and learning the model parameters;

3) for known-modality data submitted by a user, performing feature extraction and then using the cross-modal retrieval model to return the corresponding data in the other (target) modalities that the user requires;

4) using the true correspondence information and the label information of the cross-modal data to evaluate the cross-modal retrieval model in two respects simultaneously: correspondence information and discriminative information.

Said step 1) comprises:

1) extracting SIFT features from all image-modality data in the database, clustering them with the k-means method to form visual words, and then normalizing the features so that the feature vector representing each image is a unit vector;

2) performing part-of-speech tagging on all text-modality data in the database, removing non-noun words and keeping the nouns, building a dictionary from all the words that occur in the database, counting for each text the number of occurrences of each dictionary word, vectorizing each text by its own word frequencies, and then normalizing the feature vectors so that the feature vector representing each text is a unit vector;

3) for data of other modalities in the database, extracting the features commonly used as industry standards and normalizing them so that the feature vector representing each item is a unit vector;

4) for corresponding data of different modalities in the database, recording their label information, i.e. the category to which they belong.
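The text branch of step 1) can be illustrated with a minimal sketch. It assumes that part-of-speech tagging has already reduced each document to a list of nouns; the sample documents and the function names `build_vocabulary` and `tf_unit_vector` are hypothetical and are not taken from the patent.

```python
import math
from collections import Counter

def build_vocabulary(noun_docs):
    """Collect every distinct word across all documents, in sorted order."""
    return sorted({word for doc in noun_docs for word in doc})

def tf_unit_vector(doc, vocabulary):
    """Count each vocabulary word in one document, then scale to unit L2 norm."""
    counts = Counter(doc)
    vec = [float(counts[word]) for word in vocabulary]
    norm = math.sqrt(sum(v * v for v in vec))
    # A document with no vocabulary words stays all-zero (assumed convention).
    return vec if norm == 0.0 else [v / norm for v in vec]

# Hypothetical documents, already reduced to their nouns by a POS tagger.
noun_docs = [["park", "trail", "park"], ["team", "match"]]
vocabulary = build_vocabulary(noun_docs)
features = [tf_unit_vector(doc, vocabulary) for doc in noun_docs]
```

Each row of `features` has unit Euclidean length, which is the property the normalization in step 1) requires.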

Said step 2) comprises:

1) introducing the concept of dictionary learning into cross-media retrieval to form a cross-modal retrieval algorithm based on dictionary learning: the data of each modality are reconstructed from a dictionary and sparse coefficients, the separate dictionaries encode what is distinctive about each modality, the similarity between data of different modalities is modeled by association matrices between the sparse coefficients, and the dictionaries, sparse coefficients, and association matrices are all learned from the data of each modality;

2) using label information in cross-modal retrieval: during dictionary learning, data of the same modality that share a label share the same dictionary atoms, i.e. the dictionary columns with non-zero coefficients, so that the label information is encoded during dictionary learning and the learned dictionaries carry discriminative information;

3) unifying the dictionaries, sparse coefficients, association matrices, and label information in the dictionary-learning-based multi-modal retrieval framework of expression (1), in which the corresponding data of different modalities are expressed and learned as a whole;

$$\begin{aligned} \min\ & \sum_{m=1}^{M} \|X^{(m)} - D^{(m)} A^{(m)}\|_F^2 + \sum_{m=1}^{M} \sum_{l=1}^{J} \lambda_m \|A^{(m)}_{:,\Omega_l}\|_{1,2} \\ & + \beta \sum_{m=1}^{M} \sum_{n \neq m} \|A^{(n)} - W^{(m)} A^{(m)}\|_F^2 + \gamma \sum_{m=1}^{M} \|W^{(m)}\|_F^2 \\ \text{s.t.}\ & \|d_k^{(m)}\| \leq 1,\ \forall k,\ \forall m \end{aligned} \tag{1}$$

where $M$ is the number of modalities; $J$ is the number of categories, i.e. labels; $X^{(m)}$ is the feature data of the $m$-th modality; $D^{(m)}$ is the dictionary of the $m$-th modality; $A^{(m)}$ is the sparse coefficient matrix of the $m$-th modality; $A^{(m)}_{:,\Omega_l}$ is the block of sparse coefficients corresponding to those data of the $m$-th modality that carry label $l$; for any $k \times n$ matrix $A$, $\|A\|_{1,2}$ denotes its mixed $\ell_{1,2}$ norm; $W^{(m)}$ is the association matrix of the $m$-th modality; $\lambda_m$ ($m = 1, \ldots, M$), $\beta$, and $\gamma$ are tunable parameters that balance the terms of the expression; and $d_k^{(m)}$ is a dictionary atom, i.e. one column, of $D^{(m)}$, with $k$ indexing the columns;

4) cyclically updating the sparse coefficients, dictionaries, and association matrices: first fix the dictionaries and association matrices and update the sparse coefficients; then, with the resulting sparse coefficients and the association matrices fixed, update the dictionaries; then use the updated sparse coefficients and dictionaries to update the association matrices; repeat until the convergence condition is met. The specific steps are as follows:

(1) first, fix the dictionaries and association matrices and update the sparse coefficients as follows:

$$\begin{aligned} \min\ & \sum_{m=1}^{M} \|X^{(m)} - D^{(m)} A^{(m)}\|_F^2 + \sum_{m=1}^{M} \sum_{l=1}^{J} \lambda_m \|A^{(m)}_{:,\Omega_l}\|_{1,2} \\ & + \beta \sum_{m=1}^{M} \sum_{n \neq m} \|A^{(n)} - W^{(m)} A^{(m)}\|_F^2 \end{aligned} \tag{2}$$

(2) after obtaining the sparse coefficients, update the dictionary of each modality according to:

$$\min_{D} \sum_{m=1}^{M} \|X^{(m)} - D^{(m)} A^{(m)}\|_F^2 \quad \text{s.t.}\ \|d_k^{(m)}\| \leq 1,\ \forall k,\ \forall m \tag{3}$$

(3) finally, update the association matrices as follows:

$$\min_{W} \sum_{m=1}^{M} \sum_{n \neq m} \|A^{(n)} - W^{(m)} A^{(m)}\|_F^2 + (\gamma / \beta) \sum_{m=1}^{M} \|W^{(m)}\|_F^2 \tag{4}$$
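The objective and the association-matrix update can be sketched numerically for the two-modality case. This is an illustration under stated assumptions, not the patent's implementation: the $\ell_{1,2}$ norm is assumed to be the sum of the Euclidean norms of the rows of a matrix, the update of formula (4) is solved in closed form as a ridge regression (valid when $M = 2$), and the function names and toy data are hypothetical.

```python
import numpy as np

def l12_norm(A):
    """Sum of the Euclidean norms of the rows of A (assumed definition of the 1,2 norm)."""
    return float(np.sum(np.linalg.norm(A, axis=1)))

def objective(X, D, A, W, groups, lam, beta, gamma):
    """Value of expression (1) for M = 2; X, D, A, W are lists indexed by modality."""
    total = 0.0
    for m in range(2):
        n = 1 - m  # index of the other modality
        total += np.linalg.norm(X[m] - D[m] @ A[m], "fro") ** 2
        total += lam[m] * sum(l12_norm(A[m][:, idx]) for idx in groups)
        total += beta * np.linalg.norm(A[n] - W[m] @ A[m], "fro") ** 2
        total += gamma * np.linalg.norm(W[m], "fro") ** 2
    return total

def update_W(A_src, A_dst, gamma, beta):
    """Minimizer of ||A_dst - W A_src||_F^2 + (gamma/beta) ||W||_F^2 (ridge regression)."""
    k = A_src.shape[0]
    gram = A_src @ A_src.T + (gamma / beta) * np.eye(k)
    # Solve W @ gram = A_dst @ A_src.T; gram is symmetric positive definite.
    return np.linalg.solve(gram, A_src @ A_dst.T).T

# Hypothetical toy problem: 2 modalities, 6 paired samples, 2 labels, 4 atoms each.
rng = np.random.default_rng(0)
X = [rng.standard_normal((5, 6)), rng.standard_normal((7, 6))]
D = [rng.standard_normal((5, 4)), rng.standard_normal((7, 4))]
A = [rng.standard_normal((4, 6)), rng.standard_normal((4, 6))]
W = [rng.standard_normal((4, 4)), rng.standard_normal((4, 4))]
groups = [np.arange(0, 3), np.arange(3, 6)]  # column indices of the samples per label
lam, beta, gamma = [0.1, 0.1], 1.0, 0.5

before = objective(X, D, A, W, groups, lam, beta, gamma)
W = [update_W(A[0], A[1], gamma, beta), update_W(A[1], A[0], gamma, beta)]
after = objective(X, D, A, W, groups, lam, beta, gamma)
```

Because only the terms that contain $W$ change in this step, `after` cannot exceed `before`; the same reasoning applies to each of the three alternating updates.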

Said step 3) comprises:

1) from the known-modality data $x_q^{(m)}$ of modality $m$ submitted by the user and the learned dictionary $D^{(m)}$ of the known modality, initializing the sparse coefficient $\alpha_q^{(m)}$ of the known-modality data as follows:

$$\alpha_q^{(m)} = \arg\min_{\alpha_q} \frac{1}{2} \|x_q^{(m)} - D^{(m)} \alpha_q^{(m)}\|_F^2 + \lambda \|\alpha_q^{(m)}\|_1 \tag{5}$$

where $\lambda$ is a parameter that controls the sparsity of the coefficient;

2) from the initialized sparse coefficient $\alpha_q^{(m)}$ of the known-modality data and the learned association matrix $W^{(m)}$, initializing the sparse coefficient $\alpha_r^{(n)}$ of the target-modality data as follows:

$$\alpha_r^{(n)} = W^{(m)} \alpha_q^{(m)} \tag{6}$$

3) from the sparse coefficient $\alpha_r^{(n)}$ of the target-modality data and the learned dictionary $D^{(n)}$ of the target modality, initializing the target-modality data $x_r^{(n)}$ as follows:

$$x_r^{(n)} = D^{(n)} \alpha_r^{(n)} \tag{7}$$

4) from the known-modality data, the learned information, and the above initialization, updating the sparse coefficients of the known modality and of the target modality as follows:

$$\begin{aligned} \min\ & \|x_q^{(m)} - D^{(m)} \alpha_q^{(m)}\|_F^2 + \|x_r^{(n)} - D^{(n)} \alpha_r^{(n)}\|_F^2 \\ & + \beta \|\alpha_r^{(n)} - W^{(m)} \alpha_q^{(m)}\|_F^2 + \lambda_m \|\alpha_q^{(m)}\|_1 + \lambda_n \|\alpha_r^{(n)}\|_1 \end{aligned} \tag{8}$$

where $\beta$, $\lambda_m$, and $\lambda_n$ are tunable parameters corresponding to those in formula (1);

5) from the updated sparse coefficient of the target modality and the dictionary of the target modality, finally determining the target-modality data as follows:

$$x_r^{(n)} = D^{(n)} \alpha_r^{(n)} \tag{9}$$
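The initialization in formulas (5) to (7) can be sketched as follows. The patent does not name a solver for the $\ell_1$-regularized problem in formula (5); this sketch assumes ISTA (iterative soft-thresholding), and the function names, dimensions, and random data are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise shrinkage toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code_ista(x, D, lam, n_iter=200):
    """Minimize 0.5*||x - D a||^2 + lam*||a||_1 by ISTA (an assumed solver)."""
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)
        a = soft_threshold(a - step * grad, step * lam)
    return a

def initialize_target(x_q, D_m, W_m, D_n, lam):
    alpha_q = sparse_code_ista(x_q, D_m, lam)  # formula (5)
    alpha_r = W_m @ alpha_q                    # formula (6)
    x_r = D_n @ alpha_r                        # formula (7)
    return alpha_q, alpha_r, x_r

def lasso_value(x, D, a, lam):
    return 0.5 * np.sum((x - D @ a) ** 2) + lam * np.sum(np.abs(a))

# Hypothetical learned quantities: 10 atoms, query dimension 6, target dimension 8.
rng = np.random.default_rng(0)
D_m = rng.standard_normal((6, 10))
D_n = rng.standard_normal((8, 10))
W_m = rng.standard_normal((10, 10))
x_q = rng.standard_normal(6)
alpha_q, alpha_r, x_r = initialize_target(x_q, D_m, W_m, D_n, lam=0.1)
```

The joint refinement of formula (8) would start from `alpha_q` and `alpha_r`; it is omitted here to keep the sketch short.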

Said step 4) comprises:

1) evaluating cross-modal retrieval with correspondence information: this considers the known-modality data and the data of other modalities that uniquely correspond to them, and judges the result by the position in the result list of the target-modality item that corresponds to the query. For a given threshold t%, retrieval is considered correct if the corresponding target-modality item ranks within the top t% of the list, and incorrect otherwise;

2) evaluating cross-modal retrieval with discriminative information: this considers the known-modality data and the target-modality data that carry the same label, and measures the result over the retrieved list. A returned item is relevant if it has the same label as the query and irrelevant otherwise. Specifically, MAP (mean average precision) from information retrieval is used as the measure. For one cross-modal query and the returned list, of length R, MAP is defined in terms of AP, and AP is defined as:

$$AP = \frac{1}{L} \sum_{r=1}^{R} \mathrm{prec}(r)\, \delta(r) \tag{10}$$

where $L$ is the number of items in the returned list that are relevant to the query, $\mathrm{prec}(r)$ is the proportion of relevant items among the items ranked 1 to $r$, $\delta(r) = 1$ if the $r$-th item is relevant to the query and $\delta(r) = 0$ otherwise, and MAP is the mean of the AP values over all queries.
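Both evaluation criteria are simple to compute. The sketch below follows formula (10) directly; the function names are hypothetical, and returning 0 when the list contains no relevant item is an assumed convention that the patent does not state.

```python
def average_precision(relevance):
    """AP per formula (10); relevance[r-1] is 1 if the r-th returned item is relevant."""
    num_relevant = sum(relevance)  # L in formula (10)
    if num_relevant == 0:
        return 0.0  # assumed convention when nothing in the list is relevant
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank  # prec(r) * delta(r)
    return total / num_relevant

def mean_average_precision(relevance_lists):
    """Mean of the AP values over all queries."""
    return sum(average_precision(r) for r in relevance_lists) / len(relevance_lists)

def correct_within_top_percent(rank_of_match, list_length, t_percent):
    """True if the paired item is ranked within the top t% of the returned list."""
    return rank_of_match <= list_length * t_percent / 100.0

ap_example = average_precision([1, 0, 1, 0])
map_example = mean_average_precision([[1, 0, 1, 0], [0, 1]])
```

For the list `[1, 0, 1, 0]`, $L = 2$, $\mathrm{prec}(1) = 1$, and $\mathrm{prec}(3) = 2/3$, so $AP = (1 + 2/3)/2 = 5/6$.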

The invention has the following beneficial effects. Compared with traditional cross-media retrieval methods, it compares similarity across modalities directly, instead of comparing similarity within one modality and propagating it to other modalities through correspondence relationships. The benefit of comparing across modalities directly is that the implicit associations between cross-media data can genuinely be mined and the user's retrieval intent realized directly. Compared with other cross-media retrieval algorithms that can directly measure similarity across modalities, the invention makes the similarity measure more robust to noise and better at representing loosely related cross-media data, so that retrieval performance is better and the results are semantically more relevant to the query.

Brief description of the drawings

Fig. 1 is a schematic diagram of the cross-modal retrieval method capable of directly measuring the similarity between data of different modalities;

Fig. 2 shows examples of corresponding pictures and texts in the database of the embodiment;

Fig. 3 shows examples of picture-to-text and text-to-picture retrieval with the invention. The top four returned results are listed for each query. The upper part is an example of retrieving text with a picture; to show the results more clearly, each retrieved text is represented by the real picture that corresponds to it. The lower part is an example of retrieving pictures with text. Each example compares the retrieval results of the invention (named SliM²) with those of another cross-media retrieval method that directly measures similarity across modalities (GMA).

Detailed description of the invention

Embodiment

Suppose we have 2173 pairs of text and image data with known correspondence, and 693 texts and 693 images with unknown correspondence; examples of the pictures and texts are shown in Fig. 2. First, SIFT features are extracted from all image-modality data in the database and clustered with the k-means method to form visual words, and the features are then normalized so that the feature vector representing each image is a unit vector. At the same time, part-of-speech tagging is performed on all text-modality data in the database, non-noun words are removed and the nouns kept, a dictionary is built from all the words that occur in the database, the occurrences of each dictionary word are counted for each text, each text is vectorized by its own word frequencies, and the feature vectors are normalized so that the feature vector representing each text is a unit vector.
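The image branch of this feature extraction can be sketched as a bag-of-visual-words histogram. The sketch assumes the local descriptors have already been extracted and the k-means centroids already computed; it uses small two-dimensional vectors in place of real SIFT descriptors, and the function name `bovw_unit_histogram` and the data are hypothetical.

```python
import numpy as np

def bovw_unit_histogram(descriptors, centroids):
    """Assign each local descriptor to its nearest visual word and L2-normalize the counts."""
    # Squared Euclidean distances, shape (num_descriptors, num_words).
    diffs = descriptors[:, None, :] - centroids[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    words = np.argmin(dists, axis=1)
    hist = np.bincount(words, minlength=centroids.shape[0]).astype(float)
    norm = np.linalg.norm(hist)
    return hist if norm == 0.0 else hist / norm

# Hypothetical 2-D stand-ins for descriptors and 3 visual words (cluster centers).
centroids = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
descriptors = np.array([[0.5, 0.2], [9.0, 1.0], [0.1, -0.3], [1.0, 9.5]])
image_feature = bovw_unit_histogram(descriptors, centroids)
```

Here two descriptors fall on the first visual word and one on each of the others, so the unnormalized histogram is `[2, 1, 1]` before scaling to unit length.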

The 2173 pairs of matched data (features) are expressed in matrix form. As before, $M$ is the number of modalities; $J$ is the number of categories, i.e. labels; $X^{(m)}$ is the feature data of the $m$-th modality; $D^{(m)}$ is the dictionary of the $m$-th modality; $A^{(m)}$ is the sparse coefficient matrix of the $m$-th modality; $A^{(m)}_{:,\Omega_l}$ is the block of sparse coefficients corresponding to those data of the $m$-th modality that carry label $l$; for any $k \times n$ matrix $A$, $\|A\|_{1,2}$ denotes its mixed $\ell_{1,2}$ norm; $W^{(m)}$ is the association matrix of the $m$-th modality; $\lambda_m$ ($m = 1, \ldots, M$), $\beta$, and $\gamma$ are tunable parameters that balance the terms of the expression; and $d_k^{(m)}$ is a dictionary atom, i.e. one column, of $D^{(m)}$, with $k$ indexing the columns. Here there are two modalities, text and image, so $M = 2$, and the text and image data are taken as $X^{(1)}$ and $X^{(2)}$ respectively.

The following steps are then performed:

1) first, fix the dictionaries and association matrices and update the sparse coefficients as follows:

$$\begin{aligned} \min\ & \sum_{m=1}^{M} \|X^{(m)} - D^{(m)} A^{(m)}\|_F^2 + \sum_{m=1}^{M} \sum_{l=1}^{J} \lambda_m \|A^{(m)}_{:,\Omega_l}\|_{1,2} \\ & + \beta \sum_{m=1}^{M} \sum_{n \neq m} \|A^{(n)} - W^{(m)} A^{(m)}\|_F^2 \end{aligned} \tag{2}$$

2) after obtaining the sparse coefficients, update the dictionary of each modality according to:

$$\min_{D} \sum_{m=1}^{M} \|X^{(m)} - D^{(m)} A^{(m)}\|_F^2 \quad \text{s.t.}\ \|d_k^{(m)}\| \leq 1,\ \forall k,\ \forall m \tag{3}$$

3) finally, update the association matrices as follows:

$$\min_{W} \sum_{m=1}^{M} \sum_{n \neq m} \|A^{(n)} - W^{(m)} A^{(m)}\|_F^2 + (\gamma / \beta) \sum_{m=1}^{M} \|W^{(m)}\|_F^2 \tag{4}$$
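The dictionary update of formula (3) is a least-squares problem with a norm bound on each column. The patent does not say which solver is used; the sketch below assumes a single projected-gradient step for one modality, with step size equal to the reciprocal of the gradient's Lipschitz constant. The function names and random data are hypothetical.

```python
import numpy as np

def project_columns(D):
    """Scale any column whose norm exceeds 1 back onto the unit ball."""
    norms = np.linalg.norm(D, axis=0)
    return D / np.maximum(norms, 1.0)

def dictionary_step(X, D, A):
    """One projected-gradient step on ||X - D A||_F^2 subject to ||d_k|| <= 1."""
    lipschitz = 2.0 * np.linalg.norm(A @ A.T, 2)
    grad = 2.0 * (D @ A - X) @ A.T
    return project_columns(D - grad / lipschitz)

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 20))  # hypothetical features: 8 dimensions, 20 samples
A = rng.standard_normal((5, 20))  # hypothetical sparse coefficients for 5 atoms
D = project_columns(rng.standard_normal((8, 5)))  # feasible starting dictionary
loss_before = np.linalg.norm(X - D @ A, "fro") ** 2
D = dictionary_step(X, D, A)
loss_after = np.linalg.norm(X - D @ A, "fro") ** 2
```

Starting from a feasible dictionary, a step of this size does not increase the reconstruction error, and the projection keeps every atom inside the unit ball; repeating the step until the error stops changing gives one way to carry out the dictionary update.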

Learning thus yields $D = \{D^{(1)}, D^{(2)}, \ldots, D^{(M)}\}$ and $W = \{W^{(1)}, W^{(2)}, \ldots, W^{(M)}\}$. The retrieval stage follows: any one of the 693 × 2 texts and images with unknown correspondence can be used as a query, and the text or image corresponding to it is returned. The specific steps are as follows:

Suppose the user submits query data $x_q^{(m)}$ of the known image or text modality, where $m = 1$ or $2$.

1) Initialize the sparse coefficient of the known-modality data:

$$\alpha_q^{(m)} = \arg\min_{\alpha_q} \frac{1}{2} \|x_q^{(m)} - D^{(m)} \alpha_q^{(m)}\|_F^2 + \lambda \|\alpha_q^{(m)}\|_1 \tag{5}$$

where $\lambda$ is a parameter that controls the sparsity of the coefficient;

2) Initialize the sparse coefficient of the target-modality data:

$$\alpha_r^{(n)} = W^{(m)} \alpha_q^{(m)} \tag{6}$$

3) Initialize the target-modality data:

$$x_r^{(n)} = D^{(n)} \alpha_r^{(n)} \tag{7}$$

4) Update the sparse coefficients of the known modality and of the target modality:

$$\begin{aligned} \min\ & \|x_q^{(m)} - D^{(m)} \alpha_q^{(m)}\|_F^2 + \|x_r^{(n)} - D^{(n)} \alpha_r^{(n)}\|_F^2 \\ & + \beta \|\alpha_r^{(n)} - W^{(m)} \alpha_q^{(m)}\|_F^2 + \lambda_m \|\alpha_q^{(m)}\|_1 + \lambda_n \|\alpha_r^{(n)}\|_1 \end{aligned} \tag{8}$$

5) Finally determine the target-modality data:

$$x_r^{(n)} = D^{(n)} \alpha_r^{(n)} \tag{9}$$

6) According to the target-modality data, rank the candidate list of the target modality and return the ranked result.
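Step 6) ranks the stored target-modality items against the reconstructed vector $x_r^{(n)}$. The patent does not state which similarity measure is used for this ranking; the sketch below assumes cosine similarity between non-zero feature vectors, and the function name and sample data are hypothetical.

```python
import numpy as np

def rank_candidates(x_r, candidates):
    """Return candidate indices sorted by decreasing cosine similarity to x_r."""
    x_unit = x_r / np.linalg.norm(x_r)
    cand_unit = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    scores = cand_unit @ x_unit
    order = np.argsort(-scores, kind="stable")
    return order, scores[order]

# Hypothetical reconstructed target vector and three stored candidates (one per row).
x_r = np.array([1.0, 0.0, 1.0])
candidates = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.9], [1.0, 1.0, 1.0]])
order, scores = rank_candidates(x_r, candidates)
```

Because the stored features are already unit vectors after step 1), the candidate normalization is redundant in that setting and the ranking reduces to sorting inner products.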

Fig. 3 shows concrete examples of cross-media retrieval, including an example of retrieving text with a picture (top) and an example of retrieving pictures with text (bottom), and compares the retrieval results of the invention (named SliM²) with those of another cross-media retrieval method that directly measures similarity across modalities (GMA). For the picture-to-text example (top), each retrieved text is represented by the real picture that corresponds to it so that the results can be seen more intuitively. The query picture comes from the sports category, and the results of both methods also come from the sports category; however, the invention ranks the text that corresponds to the query picture first, and its remaining retrieved texts are also more relevant to the query picture in semantics and content. For the text-to-picture example, only an excerpt of the text is shown; the text is mainly about parks and trails and belongs to the geography category. The results retrieved by the invention belong to the same category as the query text and are closely related to it in content. In the results of the other method, the pictures ranked first and fourth come from the history category and are less relevant in content to the query text than the results of the invention.

The above examples show that, unlike traditional methods, the invention measures similarity across modalities directly and thereby achieves cross-modal retrieval; and even compared with another method that also compares similarity across modalities directly, the method of the invention achieves better retrieval results.

Claims

1. A cross-modal retrieval method capable of directly measuring the similarity between data of different modalities, characterized by comprising the following steps:

1) performing feature extraction and label recording for the data of each modality in the database;

2) according to the correspondence information and the label information of the data of different modalities in the database, expressing the differences and similarities between paired data of different modalities from the perspective of reconstruction, using the label information, building the overall cross-modal retrieval model, and learning the model parameters;

3) for known-modality data submitted by a user, performing feature extraction and then using the overall cross-modal retrieval model to return the corresponding data in the other (target) modalities that the user requires;

4) using the true correspondence information and the label information of the cross-modal data to evaluate the overall cross-modal retrieval model in two respects simultaneously: correspondence information and discriminative information;

wherein said step 1) specifically comprises:

1) extracting SIFT features from all image-modality data in the database, clustering them with the k-means method to form visual words, and then normalizing the features so that the feature vector representing each image is a unit vector;

2) performing part-of-speech tagging on all text-modality data in the database, removing non-noun words and keeping the nouns, building a dictionary from all the words that occur in the database, counting for each text the number of occurrences of each dictionary word, vectorizing each text by its own word frequencies, and then normalizing the feature vectors so that the feature vector representing each text is a unit vector;

3) for data of other modalities in the database, extracting the features commonly used as industry standards and normalizing them so that the feature vector representing each item is a unit vector;

4) for corresponding data of different modalities in the database, recording their label information, i.e. the category to which they belong.

2. The cross-modal retrieval method capable of directly measuring the similarity between data of different modalities according to claim 1, characterized in that said step 2) comprises:

1) introducing the concept of dictionary learning into cross-modal retrieval to form a cross-modal retrieval algorithm based on dictionary learning: the data of each modality are reconstructed from a dictionary and sparse coefficients, the separate dictionaries encode what is distinctive about each modality, the similarity between data of different modalities is modeled by association matrices between the sparse coefficients, and the dictionaries, sparse coefficients, and association matrices are all learned from the data of each modality;

2) using label information in cross-modal retrieval: during dictionary learning, data of the same modality that share a label share the same dictionary atoms, i.e. the dictionary columns with non-zero coefficients, so that the label information is encoded during dictionary learning and the learned dictionaries carry discriminative information;

3) unifying the dictionaries, sparse coefficients, association matrices, and label information in the dictionary-learning-based multi-modal retrieval framework of formula (1), in which the corresponding data of different modalities are expressed and learned as a whole;

$$\begin{aligned} \min\ & \sum_{m=1}^{M} \|X^{(m)} - D^{(m)} A^{(m)}\|_F^2 + \sum_{m=1}^{M} \sum_{l=1}^{J} \lambda_m \|A^{(m)}_{:,\Omega_l}\|_{1,2} \\ & + \beta \sum_{m=1}^{M} \sum_{n \neq m} \|A^{(n)} - W^{(m)} A^{(m)}\|_F^2 + \gamma \sum_{m=1}^{M} \|W^{(m)}\|_F^2 \\ \text{s.t.}\ & \|d_k^{(m)}\| \leq 1,\ \forall k,\ \forall m \end{aligned} \tag{1}$$

where $M$ is the number of modalities; $J$ is the number of categories, i.e. labels; $X^{(m)}$ is the feature data of the $m$-th modality; $D^{(m)}$ is the dictionary of the $m$-th modality; $A^{(m)}$ is the sparse coefficient matrix of the $m$-th modality; $A^{(m)}_{:,\Omega_l}$ is the block of sparse coefficients corresponding to those data of the $m$-th modality that carry label $l$; for any $k \times n$ matrix $A$, $\|A\|_{1,2}$ denotes its mixed $\ell_{1,2}$ norm; $W^{(m)}$ is the association matrix of the $m$-th modality; $\lambda_m$, $\beta$, and $\gamma$ are tunable parameters, with $m = 1, \ldots, M$, that balance the terms of the expression; and $d_k^{(m)}$ is a dictionary atom, i.e. one column, of $D^{(m)}$, with $k$ indexing the columns;

4) cyclically updating the sparse coefficients, dictionaries, and association matrices: first fixing the dictionaries and association matrices and updating the sparse coefficients; then, with the resulting sparse coefficients and the association matrices fixed, updating the dictionaries; then using the updated sparse coefficients and dictionaries to update the association matrices; and repeating until the convergence condition is met, the specific steps being as follows:

(1) first, fixing the dictionaries and association matrices and updating the sparse coefficients as follows:

$$\begin{aligned} \min\ & \sum_{m=1}^{M} \|X^{(m)} - D^{(m)} A^{(m)}\|_F^2 + \sum_{m=1}^{M} \sum_{l=1}^{J} \lambda_m \|A^{(m)}_{:,\Omega_l}\|_{1,2} \\ & + \beta \sum_{m=1}^{M} \sum_{n \neq m} \|A^{(n)} - W^{(m)} A^{(m)}\|_F^2 \end{aligned} \tag{2}$$

(2) after obtaining the sparse coefficients, updating the dictionary of each modality according to:

$$\min_{D} \sum_{m=1}^{M} \|X^{(m)} - D^{(m)} A^{(m)}\|_F^2 \quad \text{s.t.}\ \|d_k^{(m)}\| \leq 1,\ \forall k,\ \forall m \tag{3}$$

(3) finally, updating the association matrices as follows:

$$\min_{W} \sum_{m=1}^{M} \sum_{n \neq m} \|A^{(n)} - W^{(m)} A^{(m)}\|_F^2 + (\gamma / \beta) \sum_{m=1}^{M} \|W^{(m)}\|_F^2 \tag{4}$$

3. The cross-modal retrieval method capable of directly measuring the similarity between data of different modalities according to claim 2, characterized in that said step 3) comprises:

1) from the known-modality data $x_q^{(m)}$ of modality $m$ submitted by the user and the learned dictionary $D^{(m)}$ of the known modality, initializing the sparse coefficient $\alpha_q^{(m)}$ of the known-modality data as follows:

$$\alpha_q^{(m)} = \arg\min_{\alpha_q} \frac{1}{2} \|x_q^{(m)} - D^{(m)} \alpha_q^{(m)}\|_F^2 + \lambda \|\alpha_q^{(m)}\|_1 \tag{5}$$

where $\lambda$ is a parameter that controls the sparsity of the coefficient;

2) from the initialized sparse coefficient $\alpha_q^{(m)}$ of the known-modality data and the learned association matrix $W^{(m)}$, initializing the sparse coefficient $\alpha_r^{(n)}$ of the target-modality data as follows:

$$\alpha_r^{(n)} = W^{(m)} \alpha_q^{(m)} \tag{6}$$

3) from the sparse coefficient $\alpha_r^{(n)}$ of the target-modality data and the learned dictionary $D^{(n)}$ of the target modality, initializing the target-modality data $x_r^{(n)}$ as follows:

$$x_r^{(n)} = D^{(n)} \alpha_r^{(n)} \tag{7}$$

4) from the known-modality data, the learned information, and the above initialization, updating the sparse coefficients of the known modality and of the target modality as follows:

$$\begin{aligned} \min\ & \|x_q^{(m)} - D^{(m)} \alpha_q^{(m)}\|_F^2 + \|x_r^{(n)} - D^{(n)} \alpha_r^{(n)}\|_F^2 \\ & + \beta \|\alpha_r^{(n)} - W^{(m)} \alpha_q^{(m)}\|_F^2 + \lambda_m \|\alpha_q^{(m)}\|_1 + \lambda_n \|\alpha_r^{(n)}\|_1 \end{aligned} \tag{8}$$

where $\beta$, $\lambda_m$, and $\lambda_n$ are tunable parameters corresponding to those in formula (1);

5) from the updated sparse coefficient of the target modality and the dictionary of the target modality, finally determining the target-modality data as follows:

$$x_r^{(n)} = D^{(n)} \alpha_r^{(n)} \tag{9}$$

4. The cross-modal retrieval method capable of directly measuring the similarity between data of different modalities according to claim 1, characterized in that said step 4) comprises:

1) evaluating cross-modal retrieval with correspondence information, which considers the known-modality data and the data of other modalities that uniquely correspond to them, and judges the result by the position in the result list of the target-modality item that corresponds to the query: for a given threshold t%, retrieval is considered correct if the corresponding target-modality item ranks within the top t% of the list, and incorrect otherwise;

2) evaluating cross-modal retrieval with discriminative information, which considers the known-modality data and the target-modality data that carry the same label, and measures the result over the retrieved list: a returned item is relevant if it has the same label as the query and irrelevant otherwise; specifically, MAP from information retrieval is used as the measure; for one cross-modal query and the returned list, of length R, MAP is defined in terms of AP, and AP is defined as:

$$AP = \frac{1}{L} \sum_{r=1}^{R} \mathrm{prec}(r)\, \delta(r) \tag{10}$$

where $L$ is the number of items in the returned list that are relevant to the query; $\mathrm{prec}(r)$ is the proportion of relevant items among the items ranked 1 to $r$; $\delta(r) = 1$ if the $r$-th item is relevant to the query and $\delta(r) = 0$ otherwise; and MAP is the mean of the AP values over all queries.