CN103488713A - Cross-modal search method capable of directly measuring similarity of different modal data - Google Patents

Cross-modal search method capable of directly measuring similarity of different modal data

Info

Publication number
CN103488713A
CN103488713A (application CN201310410553.XA)
Authority
CN
China
Prior art keywords
data
dictionary
modal
cross
sigma
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310410553.XA
Other languages
Chinese (zh)
Other versions
CN103488713B (en)
Inventor
庄越挺
吴飞
王彦斐
汤斯亮
邵健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201310410553.XA priority Critical patent/CN103488713B/en
Publication of CN103488713A publication Critical patent/CN103488713A/en
Application granted granted Critical
Publication of CN103488713B publication Critical patent/CN103488713B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • G06F16/9032Query formulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a cross-modal retrieval method that can directly measure the similarity between data of different modalities. The method comprises four steps: first, feature extraction; second, model construction and learning; third, cross-media data retrieval; fourth, result evaluation. Compared with traditional cross-media retrieval methods, the method can compare the similarity of data from different modalities directly: in a cross-modal retrieval task the user may submit a query of any modality, such as text, image or sound, and retrieve the corresponding results in the desired modality, which satisfies the requirements of cross-media retrieval and realizes the user's retrieval intention more directly. Compared with other cross-media retrieval algorithms that can also directly measure cross-modal similarity, the method is more resistant to noise interference and better able to express loosely related cross-modal data, and therefore achieves better retrieval results.

Description

A cross-modal retrieval method that can directly measure the similarity between data of different modalities
Technical field
The present invention relates to cross-modal retrieval, and in particular to a cross-modal retrieval method that can directly measure the similarity between data of different modalities.
Background technology
Electronic data are becoming increasingly rich and varied, and data of many types, such as text, images, sound and maps, are ubiquitous on the Internet. The same semantic content can often be described by data of one modality as well as by data of other modalities. Given a description of some semantic content in one type of data, we sometimes wish to find the corresponding description in another type of data, for example retrieving pictures whose content is close to the meaning of a given text, or retrieving news reports related to a given picture. Existing retrieval methods, however, are generally designed for a single modality, such as retrieving text with text or images with images. There are also some multi-modal or multimedia retrieval methods, but most of them measure similarity within the same modality and then compute cross-media similarity by propagating it through function mappings; methods that directly compare the similarity of different modalities are rare. Cross-media retrieval methods that measure similarity within the same modality have the shortcoming that they cannot learn the relation between cross-modal data; they depend on matching relations specified in advance in the database, and when the correspondence between multimedia data is loose, the retrieval results are unsatisfactory. It is therefore necessary to propose a cross-media retrieval method that directly measures similarity between different modalities. The difficulty of directly comparing the similarity of data from different modalities is that the features of different modalities differ greatly, their dimensionality is generally high, and there is a "semantic gap" between them.
To overcome the "semantic gap" and compare and retrieve data across modalities, several methods that differ from the classical approach have been proposed; they generally fall into two classes. One class regards the data of different modalities as random variables and seeks mappings into a common latent space in which the correlation of these random variables is maximized; query data are also projected into the latent space, which completes the cross-modal retrieval. The other class assumes that the data contain implicit topics and models the correlation between the different modalities through these topics. Both classes of methods compare data similarity directly across modalities. However, when the modalities are only loosely related at the semantic level, the reliability of the implicit semantic notions of "maximal correlation" and "topics" is lower than that of known, definite information such as the categories and correspondences of the different modalities. The present invention introduces dictionary learning into cross-modal retrieval, learns the explicit correspondences directly, and exploits label information; it can therefore model well the loose correspondence between text and images at the semantic level, improving robustness to noise and the accuracy of cross-modal retrieval.
Summary of the invention
The object of the present invention is to provide a cross-modal retrieval method that can directly measure the similarity between data of different modalities, so that data of one modality can be used directly to retrieve data of one or more other modalities.
The cross-modal retrieval method that can directly measure the similarity between data of different modalities comprises the following steps:
1) perform feature extraction and label recording for the data of each modality in the database;
2) according to the correspondence information and label information between the data of different modalities in the database, express the differences and the similarities between paired data of different modalities from the perspective of reconstruction, and use the label information to build an overall cross-modal retrieval model and learn the model parameters;
3) after feature extraction of the known-modality data submitted by the user, use the cross-media retrieval model to return the data of the other modalities that correspond to the user's request;
4) using the true correspondence information and the label information of the cross-modal data, evaluate the cross-media retrieval model simultaneously from the two aspects of correspondence information and discriminative information.
Step 1) comprises:
1) extract SIFT features from all image-modality data in the database, cluster them with the k-means method to form visual words, and then normalize the features so that the feature vector representing each image is a unit vector;
2) perform part-of-speech tagging on all text-modality data in the database, remove the non-noun words and keep the nouns, form a vocabulary from the words occurring in the whole database, count for each text the occurrences of the vocabulary words, vectorize each text by its term frequencies, and then normalize the feature vectors so that the feature vector representing each text is a unit vector (a minimal sketch of this feature pipeline is given after this list);
3) for data of other modalities in the database, extract commonly used industry-standard features and normalize them so that the feature vector representing each datum is a unit vector;
4) for the corresponding data of the different modalities in the database, record their label information, i.e. the category each datum comes from.
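The sketch below illustrates this feature pipeline. It is a minimal example, not the patented implementation: it assumes SIFT descriptors have already been extracted for each image (for instance with OpenCV), and the vocabulary size and the helper names (`image_descriptors`, `noun_tokens`) are assumptions made for the illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import normalize

def bag_of_visual_words(image_descriptors, n_words=500):
    """image_descriptors: list of (n_i x 128) SIFT descriptor arrays, one per image."""
    all_desc = np.vstack(image_descriptors)
    kmeans = KMeans(n_clusters=n_words, n_init=10).fit(all_desc)   # visual vocabulary
    feats = np.zeros((len(image_descriptors), n_words))
    for i, desc in enumerate(image_descriptors):
        words = kmeans.predict(desc)                               # quantize descriptors
        feats[i] = np.bincount(words, minlength=n_words)           # word histogram
    return normalize(feats)                                        # unit-length rows

def bag_of_nouns(noun_tokens):
    """noun_tokens: list of strings, each the nouns of one text joined by spaces."""
    counts = CountVectorizer().fit_transform(noun_tokens)          # term frequencies
    return normalize(counts.toarray())                             # unit-length rows
```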
Step 2) comprises:
1) introduce the concept of dictionary learning into cross-media retrieval to form a cross-modal retrieval algorithm based on dictionary learning: the data of each modality are reconstructed from a dictionary and sparse coefficients, the distinctiveness of the different modalities is encoded by different dictionaries, the similarity between data of different modalities is modeled by an association matrix between the sparse coefficients, and the dictionaries, sparse coefficients and association matrices are all learned from the data of each modality;
2) use the label information in the cross-modal retrieval: in dictionary learning, data of the same modality that share the same label share the same dictionary atoms, i.e. the same non-zero rows, so that the label information is encoded into the dictionary-learning process and dictionaries carrying discriminative information are learned;
3) unify the dictionaries, sparse coefficients, association matrices and label information in the dictionary-learning-based multi-modal retrieval framework of expression (1), so that the corresponding data of the different modalities are expressed and learned as a whole;
\min_{D,A,W} \sum_{m=1}^{M}\|X^{(m)}-D^{(m)}A^{(m)}\|_F^2+\sum_{m=1}^{M}\sum_{l=1}^{J}\lambda_m\|A^{(m)}_{:,\Omega_l}\|_{1,2}+\beta\sum_{m=1}^{M}\sum_{n\neq m}\|A^{(n)}-W^{(m)}A^{(m)}\|_F^2+\gamma\sum_{m=1}^{M}\|W^{(m)}\|_F^2, \quad \text{s.t. } \|d_k^{(m)}\|\le 1,\ \forall k,\ \forall m, \qquad (1)
where M is the number of modalities, J is the number of labels (categories), X^(m) is the feature matrix of the m-th modality, D^(m) is the dictionary of the m-th modality, A^(m) is the sparse coefficient matrix of the m-th modality, A^{(m)}_{:,\Omega_l} denotes the sparse coefficients of those data of the m-th modality that carry label l, \|A\|_{1,2}=\sum_{i=1}^{k}\|A_{i,:}\|_2 for any k x n matrix A, W^(m) is the association matrix of the m-th modality, \lambda_m (m=1,...,M), \beta and \gamma are adjustable parameters that regulate the proportion of each term in the expression, and d_k^(m) denotes one dictionary atom, i.e. the k-th column, of D^(m);
4) update the sparse coefficients, dictionaries and association matrices cyclically: first fix the dictionaries and association matrices and update the sparse coefficients, then update the dictionaries using the obtained sparse coefficients with the association matrices fixed, then update the association matrices using the updated sparse coefficients and dictionaries, and repeat this cycle until the convergence condition is met (a simplified sketch of these updates is given after this list). The concrete steps are as follows:
(1) first fix the dictionaries and the association matrices and update the sparse coefficients:
\min_{A} \sum_{m=1}^{M}\|X^{(m)}-D^{(m)}A^{(m)}\|_F^2+\sum_{m=1}^{M}\sum_{l=1}^{J}\lambda_m\|A^{(m)}_{:,\Omega_l}\|_{1,2}+\beta\sum_{m=1}^{M}\sum_{n\neq m}\|A^{(n)}-W^{(m)}A^{(m)}\|_F^2 \qquad (2)
(2) after the sparse coefficients are obtained, update the dictionary of each modality according to:
\min_{D} \sum_{m=1}^{M}\|X^{(m)}-D^{(m)}A^{(m)}\|_F^2, \quad \text{s.t. } \|d_k^{(m)}\|\le 1,\ \forall k,\ \forall m, \qquad (3)
(3) finally, update the association matrices:
\min_{W} \sum_{m=1}^{M}\sum_{n\neq m}\|A^{(n)}-W^{(m)}A^{(m)}\|_F^2+(\gamma/\beta)\sum_{m=1}^{M}\|W^{(m)}\|_F^2. \qquad (4)
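The alternating scheme can be illustrated by the simplified numpy sketch below. It is not the solver of the patent but one plausible realization under stated assumptions: the coefficient update (2) is approximated by a single proximal-gradient (ISTA-style) pass using the group soft-thresholding operator of the l_{1,2} penalty, the dictionary update (3) by one pass of block coordinate descent with projection of each atom onto the unit ball, and the association update (4) by its ridge-regression closed form. The helper names, the fixed step size and the use of the previous iterates are assumptions of the example.

```python
import numpy as np

def group_soft_threshold(A, groups, thresh):
    """Row-wise shrinkage of each label block A[:, g]: proximal operator of
    thresh * sum_l ||A[:, Omega_l]||_{1,2}, with ||B||_{1,2} = sum_i ||B[i, :]||_2."""
    A = A.copy()
    for g in groups:                                   # g: column indices of one label
        block = A[:, g]
        norms = np.linalg.norm(block, axis=1, keepdims=True)
        block *= np.maximum(0.0, 1.0 - thresh / np.maximum(norms, 1e-12))
        A[:, g] = block
    return A

def update_sparse_codes(Xs, Ds, As, Ws, groups, lams, beta, step=1e-3):
    """One ISTA-style pass on (2): gradient step on the smooth terms (constant
    factor 2 absorbed into `step`), then the group-sparsity proximal step."""
    M, new_As = len(Xs), []
    for m in range(M):
        A, D, W, X = As[m], Ds[m], Ws[m], Xs[m]
        grad = D.T @ (D @ A - X)                       # reconstruction term
        for n in range(M):
            if n == m:
                continue
            grad += beta * W.T @ (W @ A - As[n])       # A^(m) as the mapped modality
            grad += beta * (A - Ws[n] @ As[n])         # A^(m) as the coupling target
        new_As.append(group_soft_threshold(A - step * grad, groups, step * lams[m]))
    return new_As

def update_dictionary(X, D, A):
    """Block coordinate descent on (3): min_D ||X - DA||_F^2, s.t. ||d_k|| <= 1."""
    B, C = X @ A.T, A @ A.T
    for k in range(D.shape[1]):
        dk = D[:, k] + (B[:, k] - D @ C[:, k]) / max(C[k, k], 1e-12)
        D[:, k] = dk / max(1.0, np.linalg.norm(dk))    # project each atom onto the unit ball
    return D

def update_association(As, m, gamma_over_beta):
    """Closed-form ridge solution of (4) for W^(m)."""
    M, Am = len(As), As[m]
    lhs = sum(As[n] @ Am.T for n in range(M) if n != m)
    rhs = (M - 1) * (Am @ Am.T) + gamma_over_beta * np.eye(Am.shape[0])
    return lhs @ np.linalg.inv(rhs)

# One outer iteration of the cycle (repeated until convergence):
# As = update_sparse_codes(Xs, Ds, As, Ws, groups, lams, beta)
# Ds = [update_dictionary(Xs[m], Ds[m], As[m]) for m in range(len(Xs))]
# Ws = [update_association(As, m, gamma / beta) for m in range(len(Xs))]
```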
Step 3) comprises:
1) from the known m-th-modality datum x_q^(m) submitted by the user and the learned known-modality dictionary D^(m), initialize the sparse coefficient \alpha_q^{(m)} of the known-modality datum by
\alpha_q^{(m)} = \arg\min_{\alpha_q} \tfrac{1}{2}\|x_q^{(m)}-D^{(m)}\alpha_q^{(m)}\|_F^2+\lambda\|\alpha_q^{(m)}\|_1 \qquad (5)
where \lambda is a parameter adjusting the degree of sparsity;
2) from the initialized sparse coefficient \alpha_q^{(m)} of the known-modality datum and the learned association matrix W^(m), initialize the sparse coefficient \alpha_r^{(n)} of the target-modality datum by
\alpha_r^{(n)} = W^{(m)}\alpha_q^{(m)} \qquad (6)
3) from the sparse coefficient \alpha_r^{(n)} of the target-modality datum and the learned target-modality dictionary D^(n), initialize the target-modality datum x_r^{(n)} by
x_r^{(n)} = D^{(n)}\alpha_r^{(n)} \qquad (7)
4) using the known-modality datum, the information obtained by learning and the above initialization, update the sparse coefficients of the known modality and the target modality by
\min \|x_q^{(m)}-D^{(m)}\alpha_q^{(m)}\|_F^2+\|x_r^{(n)}-D^{(n)}\alpha_r^{(n)}\|_F^2+\beta\|\alpha_r^{(n)}-W^{(m)}\alpha_q^{(m)}\|_F^2+\lambda_m\|\alpha_q^{(m)}\|_1+\lambda_n\|\alpha_r^{(n)}\|_1 \qquad (8)
where \beta, \lambda_m, \lambda_n are adjustable parameters corresponding to those in formula (1);
5) from the updated sparse coefficient of the target modality and the target-modality dictionary, finally determine the target-modality datum by
x_r^{(n)} = D^{(n)}\alpha_r^{(n)} \qquad (9)
(a minimal sketch of this retrieval stage is given after this list).
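The sketch below walks through equations (5)-(9). It is an illustration under stated assumptions, not the inventors' solver: the lasso problem (5) is solved with scikit-learn's coordinate-descent `Lasso` (whose scaling of the data-fit term differs slightly from (5)), the joint refinement (8) is approximated by a few ISTA-style steps that hold the initialized x_r fixed, and the function name, step size and iteration count are assumptions of the example.

```python
import numpy as np
from sklearn.linear_model import Lasso

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def retrieve_target_feature(x_q, D_q, D_r, W, lam=0.1, lam_q=0.1, lam_r=0.1,
                            beta=1.0, n_iter=100, step=1e-3):
    """Reconstruct a target-modality feature from a query, following (5)-(9)."""
    # (5): lasso initialization of the query coefficient
    alpha_q = Lasso(alpha=lam, fit_intercept=False, max_iter=5000).fit(D_q, x_q).coef_
    alpha_r = W @ alpha_q                  # (6): map the coefficient across modalities
    x_r = D_r @ alpha_r                    # (7): initialize the target feature
    # (8): a few ISTA-style steps jointly refining both coefficients
    for _ in range(n_iter):
        g_q = D_q.T @ (D_q @ alpha_q - x_q) + beta * W.T @ (W @ alpha_q - alpha_r)
        g_r = D_r.T @ (D_r @ alpha_r - x_r) + beta * (alpha_r - W @ alpha_q)
        alpha_q = soft_threshold(alpha_q - step * g_q, step * lam_q)
        alpha_r = soft_threshold(alpha_r - step * g_r, step * lam_r)
    return D_r @ alpha_r                   # (9): final target-modality feature
```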
Step 4) comprises:
1) evaluate the cross-modal retrieval with the correspondence information: consider a known-modality datum and the other-modality data that directly correspond to it, and measure the quality of the result by the position of the corresponding target-modality datum in the result list returned for the known-modality query; for a given index t%, the retrieval is considered correct if the target-modality datum corresponding to the known-modality datum is ranked within the top t%, and incorrect otherwise;
2) evaluate the cross-modal retrieval with the discriminative information: consider a known-modality datum and the target-modality data that belong to the same label, and measure the cross-modal retrieval result by the retrieval list, where an item is relevant to the known-modality datum if it has the same label and irrelevant otherwise; concretely, MAP from information retrieval is adopted as the measure of this index. For one cross-modal query and its returned list of length R, MAP is defined on the basis of AP, and AP is defined as
AP = \frac{1}{L}\sum_{r=1}^{R} \mathrm{prec}(r)\,\delta(r) \qquad (10)
where L is the number of items in the returned list that are relevant to the query, prec(r) is the proportion of items among the first r that are relevant to the query, \delta(r)=1 if the r-th item is relevant to the query and \delta(r)=0 otherwise; MAP is defined as the mean of the AP values over all queries (a sketch of both metrics is given after this definition).
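Both evaluation criteria can be computed as in the short sketch below. The AP and MAP functions follow equation (10) directly; the function names and the 0-based ranking convention of the t% metric are assumptions made for the example.

```python
import numpy as np

def precision_at_t_percent(ranks, n_candidates, t=20):
    """Correspondence metric of step 4.1: a query counts as correct if its true
    counterpart is ranked within the top t% of the candidate list (0-based ranks)."""
    cutoff = int(np.ceil(t / 100.0 * n_candidates))
    return float(np.mean(np.asarray(ranks) < cutoff))

def average_precision(relevant, R=None):
    """AP of eq. (10): `relevant` is a boolean sequence over the ranked return list."""
    relevant = np.asarray(relevant, dtype=bool)[:R]
    L = relevant.sum()
    if L == 0:
        return 0.0
    prec = np.cumsum(relevant) / np.arange(1, len(relevant) + 1)   # prec(r)
    return float((prec * relevant).sum() / L)                      # delta(r) picks relevant ranks

def mean_average_precision(relevance_lists, R=None):
    """MAP of step 4.2: mean of AP over all queries."""
    return float(np.mean([average_precision(rel, R) for rel in relevance_lists]))
```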
The beneficial effects of the present invention are as follows. Compared with traditional cross-media retrieval methods, the invention can compare similarity between different modalities directly, instead of relying on within-modality similarity comparison and propagating similarity between modalities through correspondence relations; the benefit of directly comparing similarity between modalities is that the implicit associations among cross-media data can be truly mined and the user's retrieval intention can be realized directly. Compared with other cross-media retrieval algorithms that can directly measure cross-modal similarity, the invention improves the resistance of the measure to noise interference and the ability to express loosely related cross-media data, so that the retrieval effect is better and the results are more semantically relevant to the query.
Brief description of the drawings
Fig. 1 is a schematic diagram of the cross-modal retrieval method that can directly measure the similarity between data of different modalities;
Fig. 2 shows examples of corresponding images and texts in the database of the embodiment;
Fig. 3 shows examples of retrieving texts with an image and retrieving images with a text using the present invention. For each query the first four returned results are listed. The upper part is an example of retrieving texts with an image; to show the retrieval results better, each retrieved text is represented here by its true corresponding image. The lower part is an example of retrieving images with a text. Each example contrasts the retrieval effect of the present invention (named SliM2) with that of another cross-media retrieval method that directly measures similarity between different modalities (GMA).
Detailed description of the embodiments
The cross-modal retrieval method that can directly measure the similarity between data of different modalities comprises the four steps described above in the Summary of the invention: 1) feature extraction and label recording for each modality in the database; 2) construction and learning of the overall cross-modal retrieval model from the correspondence and label information, formulas (1)-(4); 3) cross-media retrieval of the target-modality data for a submitted query, formulas (5)-(9); and 4) evaluation of the model from the two aspects of correspondence information and discriminative information, formula (10). A concrete embodiment follows.
Embodiment
Suppose we have 2173 pairs of text and image data with known correspondence, and 693 text data and 693 image data with unknown correspondence; examples of the image-text pairs are shown in Fig. 2. First, SIFT features are extracted from all image-modality data in the database, clustered with the k-means method into visual words, and normalized so that the feature vector representing each image is a unit vector. At the same time, all text-modality data in the database are part-of-speech tagged, the non-noun words are removed and the nouns are kept, a vocabulary is formed from the words occurring in the whole database, the occurrences of the vocabulary words are counted for each text, each text is vectorized by its term frequencies, and the feature vectors are normalized so that the feature vector representing each text is a unit vector.
The 2173 paired data (features) are expressed in matrix form (one possible layout is sketched below). As before, M denotes the number of modalities, J the number of labels (categories), X^(m) the feature matrix of the m-th modality, D^(m) the dictionary of the m-th modality, A^(m) the sparse coefficient matrix of the m-th modality, A^{(m)}_{:,\Omega_l} the sparse coefficients of the data of the m-th modality with label l, \|A\|_{1,2}=\sum_{i=1}^{k}\|A_{i,:}\|_2 for any k x n matrix A, W^(m) the association matrix of the m-th modality, \lambda_m (m=1,...,M), \beta, \gamma the adjustable parameters that regulate the proportion of each term, and d_k^(m) one dictionary atom, i.e. the k-th column, of D^(m). Here there are two modalities, text and image, so M=2, with the text features as X^(1) and the image features as X^(2).
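For concreteness, the sketch below shows one way the paired features might be arranged into the matrices used by formula (1); the shapes and helper names are assumptions of the example, not values given in the patent. Columns are paired samples, so column i of X^(1) (a text) corresponds to column i of X^(2) (an image), and Omega_l collects the column indices carrying label l.

```python
import numpy as np

def build_modal_matrices(text_feats, image_feats, labels):
    """text_feats: (N, d_text), image_feats: (N, d_image), labels: (N,) with N = 2173."""
    X1 = np.asarray(text_feats).T           # X^(1): d_text x N, one column per text
    X2 = np.asarray(image_feats).T          # X^(2): d_image x N, paired column-wise
    groups = [np.flatnonzero(labels == l)   # Omega_l: column indices of label l
              for l in np.unique(labels)]
    return [X1, X2], groups                 # M = 2 modalities
```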
The following steps are then carried out:
1) first fix the dictionaries and the association matrices and update the sparse coefficients:
\min_{A} \sum_{m=1}^{M}\|X^{(m)}-D^{(m)}A^{(m)}\|_F^2+\sum_{m=1}^{M}\sum_{l=1}^{J}\lambda_m\|A^{(m)}_{:,\Omega_l}\|_{1,2}+\beta\sum_{m=1}^{M}\sum_{n\neq m}\|A^{(n)}-W^{(m)}A^{(m)}\|_F^2 \qquad (2)
2) after the sparse coefficients are obtained, update the dictionary of each modality according to:
\min_{D} \sum_{m=1}^{M}\|X^{(m)}-D^{(m)}A^{(m)}\|_F^2, \quad \text{s.t. } \|d_k^{(m)}\|\le 1,\ \forall k,\ \forall m, \qquad (3)
3) finally, update the association matrices:
\min_{W} \sum_{m=1}^{M}\sum_{n\neq m}\|A^{(n)}-W^{(m)}A^{(m)}\|_F^2+(\gamma/\beta)\sum_{m=1}^{M}\|W^{(m)}\|_F^2. \qquad (4)
Learning thus yields D={D^(1), D^(2), ..., D^(M)} and W={W^(1), W^(2), ..., W^(M)}. The retrieval phase then begins: any one of the 693*2 texts and images with unknown correspondence is used as a query, and its corresponding text or image is returned. The concrete steps are as follows.
Suppose the user submits a query of the known image or text modality, i.e. m = 1 or 2.
Steps 1)-5) of the retrieval stage described above are then carried out: the sparse coefficient \alpha_q^{(m)} of the query is initialized by formula (5), the sparse coefficient \alpha_r^{(n)} and the feature vector x_r^{(n)} of the target modality are initialized by formulas (6) and (7), the two sparse coefficients are jointly updated by formula (8) with the adjustable parameters \beta, \lambda_m, \lambda_n corresponding to formula (1), and the target-modality datum is finally determined by formula (9).
6) according to the obtained target-modality datum, rank the candidate return list of the target modality and return the ranked results (one plausible ranking rule is sketched below).
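The patent does not state which distance is used to rank the candidates against the reconstructed feature x_r^(n); the sketch below assumes Euclidean distance between the unit-normalized feature vectors of step 1) as one plausible choice, and the function name is an assumption of the example.

```python
import numpy as np

def rank_candidates(x_r, candidate_feats):
    """Return candidate indices sorted by closeness to the reconstructed feature.

    candidate_feats: (n_candidates x d) target-modality feature matrix (unit rows);
    x_r: the d-dimensional reconstruction from formula (9).
    """
    d = np.linalg.norm(candidate_feats - x_r.reshape(1, -1), axis=1)
    return np.argsort(d)          # smallest distance (best match) first
```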
Fig. 3 shows concrete examples of cross-media retrieval, including an example of retrieving texts with an image (upper part) and an example of retrieving images with a text (lower part), and contrasts the retrieval effect of the present invention (named SliM2) with that of another cross-media retrieval method that directly measures similarity between different modalities (GMA). For the image-to-text example (upper part), in order to show the retrieval effect more intuitively, each retrieved text is represented by its true corresponding image. It can be seen that the query image comes from the sports category and the results retrieved by both methods also come from the sports category, but the result retrieved with the present invention ranks the text corresponding to the query image (shown by its corresponding picture) first, and the remaining retrieved texts (also shown by their corresponding pictures) are more relevant to the query image in semantics and content. For the text-to-image example, only an excerpt of the text is displayed; its content mainly concerns parks and trails and belongs to the geography category. The images retrieved with the present invention belong to the same category as the query text and are highly relevant in content, whereas the other method ranks pictures from the history category first and fourth, and its results are not as strongly related to the query text in content as those of the method of the present invention.
The above examples show that, unlike classical methods, the present invention can measure similarity directly between different modalities and thus realize cross-modal retrieval, and that even compared with methods that can likewise directly compare cross-modal similarity, the method of the present invention achieves a better retrieval effect.

Claims (5)

1. A cross-modal retrieval method that can directly measure the similarity between data of different modalities, characterized by comprising the steps of:
1) performing feature extraction and label recording for the data of each modality in the database;
2) according to the correspondence information and label information between the data of different modalities in the database, expressing the differences and the similarities between paired data of different modalities from the perspective of reconstruction, and using the label information to build an overall cross-modal retrieval model and learn the model parameters;
3) after feature extraction of the known-modality data submitted by the user, using the cross-media retrieval model to return the data of the other modalities that correspond to the user's request;
4) using the true correspondence information and the label information of the cross-modal data, evaluating the cross-media retrieval model simultaneously from the two aspects of correspondence information and discriminative information.
2. The cross-modal retrieval method that can directly measure the similarity between data of different modalities according to claim 1, characterized in that said step 1) comprises:
1) extracting SIFT features from all image-modality data in the database, clustering them with the k-means method to form visual words, and then normalizing the features so that the feature vector representing each image is a unit vector;
2) performing part-of-speech tagging on all text-modality data in the database, removing the non-noun words and keeping the nouns, forming a vocabulary from the words occurring in the whole database, counting for each text the occurrences of the vocabulary words, vectorizing each text by its term frequencies, and then normalizing the feature vectors so that the feature vector representing each text is a unit vector;
3) for data of other modalities in the database, extracting commonly used industry-standard features and normalizing them so that the feature vector representing each datum is a unit vector;
4) for the corresponding data of the different modalities in the database, recording their label information, i.e. the category each datum comes from.
3. The cross-modal retrieval method that can directly measure the similarity between data of different modalities according to claim 1, characterized in that said step 2) comprises:
1) introducing the concept of dictionary learning into cross-media retrieval to form a cross-modal retrieval algorithm based on dictionary learning, in which the data of each modality are reconstructed from a dictionary and sparse coefficients, the distinctiveness of the different modalities is encoded by different dictionaries, the similarity between data of different modalities is modeled by an association matrix between the sparse coefficients, and the dictionaries, sparse coefficients and association matrices are all learned from the data of each modality;
2) using the label information in the cross-modal retrieval: in dictionary learning, data of the same modality that share the same label share the same dictionary atoms, i.e. the same non-zero rows, so that the label information is encoded into the dictionary-learning process and dictionaries carrying discriminative information are learned;
3) unifying the dictionaries, sparse coefficients, association matrices and label information in the dictionary-learning-based multi-modal retrieval framework of expression (1), so that the corresponding data of the different modalities are expressed and learned as a whole:
\min_{D,A,W} \sum_{m=1}^{M}\|X^{(m)}-D^{(m)}A^{(m)}\|_F^2+\sum_{m=1}^{M}\sum_{l=1}^{J}\lambda_m\|A^{(m)}_{:,\Omega_l}\|_{1,2}+\beta\sum_{m=1}^{M}\sum_{n\neq m}\|A^{(n)}-W^{(m)}A^{(m)}\|_F^2+\gamma\sum_{m=1}^{M}\|W^{(m)}\|_F^2, \quad \text{s.t. } \|d_k^{(m)}\|\le 1,\ \forall k,\ \forall m, \qquad (1)
where M is the number of modalities, J is the number of labels (categories), X^(m) is the feature matrix of the m-th modality, D^(m) is the dictionary of the m-th modality, A^(m) is the sparse coefficient matrix of the m-th modality, A^{(m)}_{:,\Omega_l} denotes the sparse coefficients of those data of the m-th modality that carry label l, \|A\|_{1,2}=\sum_{i=1}^{k}\|A_{i,:}\|_2 for any k x n matrix A, W^(m) is the association matrix of the m-th modality, \lambda_m (m=1,...,M), \beta and \gamma are adjustable parameters that regulate the proportion of each term in the expression, and d_k^(m) denotes one dictionary atom, i.e. the k-th column, of D^(m);
4) updating the sparse coefficients, dictionaries and association matrices cyclically: first fixing the dictionaries and association matrices and updating the sparse coefficients, then updating the dictionaries using the obtained sparse coefficients with the association matrices fixed, then updating the association matrices using the updated sparse coefficients and dictionaries, and repeating this cycle until the convergence condition is met, the concrete steps being as follows:
(1) first fix the dictionaries and the association matrices and update the sparse coefficients:
\min_{A} \sum_{m=1}^{M}\|X^{(m)}-D^{(m)}A^{(m)}\|_F^2+\sum_{m=1}^{M}\sum_{l=1}^{J}\lambda_m\|A^{(m)}_{:,\Omega_l}\|_{1,2}+\beta\sum_{m=1}^{M}\sum_{n\neq m}\|A^{(n)}-W^{(m)}A^{(m)}\|_F^2 \qquad (2)
(2) after the sparse coefficients are obtained, update the dictionary of each modality according to:
\min_{D} \sum_{m=1}^{M}\|X^{(m)}-D^{(m)}A^{(m)}\|_F^2, \quad \text{s.t. } \|d_k^{(m)}\|\le 1,\ \forall k,\ \forall m, \qquad (3)
(3) finally, update the association matrices:
\min_{W} \sum_{m=1}^{M}\sum_{n\neq m}\|A^{(n)}-W^{(m)}A^{(m)}\|_F^2+(\gamma/\beta)\sum_{m=1}^{M}\|W^{(m)}\|_F^2. \qquad (4)
4. The cross-modal retrieval method that can directly measure the similarity between data of different modalities according to claim 1, characterized in that said step 3) comprises:
1) from the known m-th-modality datum x_q^(m) submitted by the user and the learned known-modality dictionary D^(m), initializing the sparse coefficient \alpha_q^{(m)} of the known-modality datum by
\alpha_q^{(m)} = \arg\min_{\alpha_q} \tfrac{1}{2}\|x_q^{(m)}-D^{(m)}\alpha_q^{(m)}\|_F^2+\lambda\|\alpha_q^{(m)}\|_1 \qquad (5)
where \lambda is a parameter adjusting the degree of sparsity;
2) from the initialized sparse coefficient \alpha_q^{(m)} of the known-modality datum and the learned association matrix W^(m), initializing the sparse coefficient \alpha_r^{(n)} of the target-modality datum by
\alpha_r^{(n)} = W^{(m)}\alpha_q^{(m)} \qquad (6)
3) from the sparse coefficient \alpha_r^{(n)} of the target-modality datum and the learned target-modality dictionary D^(n), initializing the target-modality datum x_r^{(n)} by
x_r^{(n)} = D^{(n)}\alpha_r^{(n)} \qquad (7)
4) using the known-modality datum, the information obtained by learning and the above initialization, updating the sparse coefficients of the known modality and the target modality by
\min \|x_q^{(m)}-D^{(m)}\alpha_q^{(m)}\|_F^2+\|x_r^{(n)}-D^{(n)}\alpha_r^{(n)}\|_F^2+\beta\|\alpha_r^{(n)}-W^{(m)}\alpha_q^{(m)}\|_F^2+\lambda_m\|\alpha_q^{(m)}\|_1+\lambda_n\|\alpha_r^{(n)}\|_1 \qquad (8)
where \beta, \lambda_m, \lambda_n are adjustable parameters corresponding to those in formula (1);
5) from the updated sparse coefficient of the target modality and the target-modality dictionary, finally determining the target-modality datum by
x_r^{(n)} = D^{(n)}\alpha_r^{(n)} \qquad (9)
5. The cross-modal retrieval method that can directly measure the similarity between data of different modalities according to claim 1, characterized in that said step 4) comprises:
1) evaluating the cross-modal retrieval with the correspondence information: considering a known-modality datum and the other-modality data that directly correspond to it, and measuring the quality of the result by the position of the corresponding target-modality datum in the result list returned for the known-modality query, where, for a given index t%, the retrieval is considered correct if the target-modality datum corresponding to the known-modality datum is ranked within the top t% and incorrect otherwise;
2) evaluating the cross-modal retrieval with the discriminative information: considering a known-modality datum and the target-modality data that belong to the same label, and measuring the cross-modal retrieval result by the retrieval list, where an item is relevant to the known-modality datum if it has the same label and irrelevant otherwise; concretely, MAP from information retrieval is adopted as the measure of this index, where, for one cross-modal query and its returned list of length R, MAP is defined on the basis of AP, and AP is defined as
AP = \frac{1}{L}\sum_{r=1}^{R} \mathrm{prec}(r)\,\delta(r) \qquad (10)
where L is the number of items in the returned list that are relevant to the query, prec(r) is the proportion of items among the first r that are relevant to the query, \delta(r)=1 if the r-th item is relevant to the query and \delta(r)=0 otherwise, and MAP is defined as the mean of the AP values over all queries.
CN201310410553.XA 2013-09-10 2013-09-10 A kind of cross-module state search method that can directly measure similarity between different modalities data Active CN103488713B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310410553.XA CN103488713B (en) 2013-09-10 2013-09-10 A kind of cross-module state search method that can directly measure similarity between different modalities data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310410553.XA CN103488713B (en) 2013-09-10 2013-09-10 A kind of cross-module state search method that can directly measure similarity between different modalities data

Publications (2)

Publication Number Publication Date
CN103488713A true CN103488713A (en) 2014-01-01
CN103488713B CN103488713B (en) 2016-09-28

Family

ID=49828939

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310410553.XA Active CN103488713B (en) 2013-09-10 2013-09-10 A kind of cross-module state search method that can directly measure similarity between different modalities data

Country Status (1)

Country Link
CN (1) CN103488713B (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103268346A (en) * 2013-05-27 2013-08-28 翁时锋 Semi-supervised classification method and semi-supervised classification system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
吴飞 et al.: "Cross-media analysis and retrieval" (跨媒体分析与检索), Communications of the China Computer Federation (《中国计算机学会通讯》) *
尹湘舟: "Research on text and image information fusion for Web image retrieval" (面向web图片检索的文本和图片信息融合技术研究), China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》) *

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104199826A (en) * 2014-07-24 2014-12-10 北京大学 Heterogeneous media similarity calculation method and retrieval method based on correlation analysis
CN104166684A (en) * 2014-07-24 2014-11-26 北京大学 Cross-media retrieval method based on uniform sparse representation
CN104199826B (en) * 2014-07-24 2017-06-30 北京大学 A kind of dissimilar medium similarity calculation method and search method based on association analysis
CN104317837A (en) * 2014-10-10 2015-01-28 浙江大学 Cross-modal searching method based on topic model
CN104317838A (en) * 2014-10-10 2015-01-28 浙江大学 Cross-media Hash index method based on coupling differential dictionary
CN104317838B (en) * 2014-10-10 2017-05-17 浙江大学 Cross-media Hash index method based on coupling differential dictionary
CN104317837B (en) * 2014-10-10 2017-06-23 浙江大学 A kind of cross-module state search method based on topic model
CN104346450A (en) * 2014-10-29 2015-02-11 浙江大学 Cross-media ordering method based on multi-modal implicit coupling expression
CN104346450B (en) * 2014-10-29 2017-06-23 浙江大学 A kind of across media sort methods based on multi-modal recessive coupling expression
CN104462489B (en) * 2014-12-18 2018-02-23 北京邮电大学 A kind of cross-module state search method based on Deep model
CN104462489A (en) * 2014-12-18 2015-03-25 北京邮电大学 Cross-modal retrieval method based on deep-layer models
CN105550190A (en) * 2015-06-26 2016-05-04 许昌学院 Knowledge graph-oriented cross-media retrieval system
CN105550190B (en) * 2015-06-26 2019-03-29 许昌学院 Cross-media retrieval system towards knowledge mapping
CN108121750B (en) * 2016-11-30 2022-07-08 西门子公司 Model processing method and device and machine readable medium
CN108121750A (en) * 2016-11-30 2018-06-05 西门子公司 A kind of model treatment method, apparatus and machine readable media
CN107633259B (en) * 2017-08-21 2020-03-31 天津大学 Cross-modal learning method based on sparse dictionary representation
CN107633259A (en) * 2017-08-21 2018-01-26 天津大学 A kind of cross-module state learning method represented based on sparse dictionary
CN108038080A (en) * 2017-11-29 2018-05-15 浙江大学 A kind of method that local multi-modal sparse coding completion is carried out using the similar tactical ruleization of adaptability
CN110059217A (en) * 2019-04-29 2019-07-26 广西师范大学 A kind of image text cross-media retrieval method of two-level network
CN110059217B (en) * 2019-04-29 2022-11-04 广西师范大学 Image text cross-media retrieval method for two-stage network
CN110704708A (en) * 2019-09-27 2020-01-17 深圳市商汤科技有限公司 Data processing method, device, equipment and storage medium
CN111930972A (en) * 2020-08-04 2020-11-13 山东大学 Cross-modal retrieval method and system for multimedia data by using label level information
CN111930972B (en) * 2020-08-04 2021-04-27 山东大学 Cross-modal retrieval method and system for multimedia data by using label level information
CN112364197A (en) * 2020-11-12 2021-02-12 四川省人工智能研究院(宜宾) Pedestrian image retrieval method based on text description
CN112364197B (en) * 2020-11-12 2021-06-01 四川省人工智能研究院(宜宾) Pedestrian image retrieval method based on text description
CN113656660A (en) * 2021-10-14 2021-11-16 北京中科闻歌科技股份有限公司 Cross-modal data matching method, device, equipment and medium

Also Published As

Publication number Publication date
CN103488713B (en) 2016-09-28

Similar Documents

Publication Publication Date Title
CN103488713A (en) Cross-modal search method capable of directly measuring similarity of different modal data
CN107346328B (en) Cross-modal association learning method based on multi-granularity hierarchical network
CN109376222B (en) Question-answer matching degree calculation method, question-answer automatic matching method and device
CN108280114B (en) Deep learning-based user literature reading interest analysis method
CN106886601B (en) A kind of Cross-modality search method based on the study of subspace vehicle mixing
US20120323968A1 (en) Learning Discriminative Projections for Text Similarity Measures
CN109858015B (en) Semantic similarity calculation method and device based on CTW (computational cost) and KM (K-value) algorithm
Sarawagi et al. Open-domain quantity queries on web tables: annotation, response, and consensus models
CN106708929B (en) Video program searching method and device
US20150356199A1 (en) Click-through-based cross-view learning for internet searches
Guo et al. Sparse deep nonnegative matrix factorization
Wang et al. Image tag refinement by regularized latent Dirichlet allocation
CN103455487A (en) Extracting method and device for search term
CN102693321A (en) Cross-media information analysis and retrieval method
CN104317837B (en) A kind of cross-module state search method based on topic model
US11709858B2 (en) Mapping of unlabeled data onto a target schema via semantic type detection
CN113239159B (en) Cross-modal retrieval method for video and text based on relational inference network
CN105701225A (en) Cross-media search method based on unification association supergraph protocol
CN106570196B (en) Video program searching method and device
CN113792594B (en) Method and device for locating language fragments in video based on contrast learning
CN103279581B (en) A kind of method utilizing compact video subject description to carry out video frequency searching
CN113326392B (en) Remote sensing image audio retrieval method based on quadruple hash
CN116151258A (en) Text disambiguation method, electronic device and storage medium
CN116561594A (en) Legal document similarity analysis method based on Word2vec
CN105677830A (en) Heterogeneous media similarity computing method and retrieving method based on entity mapping

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20140101

Assignee: TONGDUN HOLDINGS Co.,Ltd.

Assignor: ZHEJIANG University

Contract record no.: X2021990000612

Denomination of invention: A cross modal retrieval method that can directly measure the similarity between different modal data

Granted publication date: 20160928

License type: Common License

Record date: 20211012

EE01 Entry into force of recordation of patent licensing contract