CN103488713A - Cross-modal search method capable of directly measuring similarity of different modal data - Google Patents

Cross-modal search method capable of directly measuring similarity of different modal data

Info

Publication number
CN103488713A
CN103488713A (application CN201310410553.XA)
Authority
CN
China
Prior art keywords
data
dictionary
modal
cross
sigma
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310410553.XA
Other languages
Chinese (zh)
Other versions
CN103488713B (en)
Inventor
庄越挺
吴飞
王彦斐
汤斯亮
邵健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201310410553.XA priority Critical patent/CN103488713B/en
Publication of CN103488713A publication Critical patent/CN103488713A/en
Application granted granted Critical
Publication of CN103488713B publication Critical patent/CN103488713B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • G06F16/9032Query formulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a cross-modal retrieval method that can directly measure the similarity between data of different modalities. The method comprises four steps: first, feature extraction; second, model construction and learning; third, cross-media data retrieval; fourth, result evaluation. Compared with traditional cross-media retrieval methods, the method can compare the similarity of data from different modalities directly: in a cross-modal retrieval task the user may submit a query of any modality, such as text, image or sound, and retrieve the corresponding results in the desired modality, which satisfies the requirements of cross-media retrieval and realizes the user's retrieval intention more directly. Compared with other cross-media retrieval algorithms that can also directly measure cross-modal similarity, the method is more resistant to noise interference and better able to express loosely related cross-modal data, and therefore achieves better retrieval results.

Description

A cross-modal retrieval method that can directly measure the similarity between data of different modalities
Technical field
The present invention relates to cross-modal retrieval, and in particular to a cross-modal retrieval method that can directly measure the similarity between data of different modalities.
Background technology
Electronic data are becoming increasingly rich and varied, and data of many types, such as text, images, sound and maps, are ubiquitous on the Internet. The same semantic content can often be described by data of one modality as well as by data of other modalities. Given a description of some semantic content in one type of data, we sometimes wish to find the corresponding description in another type of data, for example retrieving pictures whose content is close to the meaning of a given text, or retrieving news reports related to a given picture. Existing retrieval methods, however, are generally designed for a single modality, such as retrieving text with text or images with images. There are also some multi-modal or multimedia retrieval methods, but most of them measure similarity within the same modality and then compute cross-media similarity by propagating it through function mappings; methods that directly compare the similarity of different modalities are rare. Cross-media retrieval methods that measure similarity within the same modality have the shortcoming that they cannot learn the relation between cross-modal data; they depend on matching relations specified in advance in the database, and when the correspondence between multimedia data is loose, the retrieval results are unsatisfactory. It is therefore necessary to propose a cross-media retrieval method that directly measures similarity between different modalities. The difficulty of directly comparing the similarity of data from different modalities is that the features of different modalities differ greatly, their dimensionality is generally high, and there is a "semantic gap" between them.
To overcome the "semantic gap" and compare and retrieve data across modalities, several methods that differ from the classical approach have been proposed; they generally fall into two classes. One class regards the data of different modalities as random variables and seeks mappings into a common latent space in which the correlation of these random variables is maximized; query data are also projected into the latent space, which completes the cross-modal retrieval. The other class assumes that the data contain implicit topics and models the correlation between the different modalities through these topics. Both classes of methods compare data similarity directly across modalities. However, when the modalities are only loosely related at the semantic level, the reliability of the implicit semantic notions of "maximal correlation" and "topics" is lower than that of known, definite information such as the categories and correspondences of the different modalities. The present invention introduces dictionary learning into cross-modal retrieval, learns the explicit correspondences directly, and exploits label information; it can therefore model well the loose correspondence between text and images at the semantic level, improving robustness to noise and the accuracy of cross-modal retrieval.
Summary of the invention
The object of the present invention is to provide a cross-modal retrieval method that can directly measure the similarity between data of different modalities, so that data of one modality can be used directly to retrieve data of one or more other modalities.
The cross-modal retrieval method that can directly measure the similarity between data of different modalities comprises the following steps:
1) perform feature extraction and label recording for the data of each modality in the database;
2) according to the correspondence information and label information between the data of different modalities in the database, express the differences and the similarities between paired data of different modalities from the perspective of reconstruction, and use the label information to build an overall cross-modal retrieval model and learn the model parameters;
3) after feature extraction of the known-modality data submitted by the user, use the cross-media retrieval model to return the data of the other modalities that correspond to the user's request;
4) using the true correspondence information and the label information of the cross-modal data, evaluate the cross-media retrieval model simultaneously from the two aspects of correspondence information and discriminative information.
Step 1) comprises:
1) extract SIFT features from all image-modality data in the database, cluster them with the k-means method to form visual words, and then normalize the features so that the feature vector representing each image is a unit vector;
2) perform part-of-speech tagging on all text-modality data in the database, remove the non-noun words and keep the nouns, form a vocabulary from the words occurring in the whole database, count for each text the occurrences of the vocabulary words, vectorize each text by its term frequencies, and then normalize the feature vectors so that the feature vector representing each text is a unit vector (a minimal sketch of this feature pipeline is given after this list);
3) for data of other modalities in the database, extract commonly used industry-standard features and normalize them so that the feature vector representing each datum is a unit vector;
4) for the corresponding data of the different modalities in the database, record their label information, i.e. the category each datum comes from.
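The sketch below illustrates this feature pipeline. It is a minimal example, not the patented implementation: it assumes SIFT descriptors have already been extracted for each image (for instance with OpenCV), and the vocabulary size and the helper names (`image_descriptors`, `noun_tokens`) are assumptions made for the illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import normalize

def bag_of_visual_words(image_descriptors, n_words=500):
    """image_descriptors: list of (n_i x 128) SIFT descriptor arrays, one per image."""
    all_desc = np.vstack(image_descriptors)
    kmeans = KMeans(n_clusters=n_words, n_init=10).fit(all_desc)   # visual vocabulary
    feats = np.zeros((len(image_descriptors), n_words))
    for i, desc in enumerate(image_descriptors):
        words = kmeans.predict(desc)                               # quantize descriptors
        feats[i] = np.bincount(words, minlength=n_words)           # word histogram
    return normalize(feats)                                        # unit-length rows

def bag_of_nouns(noun_tokens):
    """noun_tokens: list of strings, each the nouns of one text joined by spaces."""
    counts = CountVectorizer().fit_transform(noun_tokens)          # term frequencies
    return normalize(counts.toarray())                             # unit-length rows
```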
Step 2) comprises:
1) introduce the concept of dictionary learning into cross-media retrieval to form a cross-modal retrieval algorithm based on dictionary learning: the data of each modality are reconstructed from a dictionary and sparse coefficients, the distinctiveness of the different modalities is encoded by different dictionaries, the similarity between data of different modalities is modeled by an association matrix between the sparse coefficients, and the dictionaries, sparse coefficients and association matrices are all learned from the data of each modality;
2) use the label information in the cross-modal retrieval: in dictionary learning, data of the same modality that share the same label share the same dictionary atoms, i.e. the same non-zero rows, so that the label information is encoded into the dictionary-learning process and dictionaries carrying discriminative information are learned;
3) unify the dictionaries, sparse coefficients, association matrices and label information in the dictionary-learning-based multi-modal retrieval framework of expression (1), so that the corresponding data of the different modalities are expressed and learned as a whole;
\min_{D,A,W} \sum_{m=1}^{M}\|X^{(m)}-D^{(m)}A^{(m)}\|_F^2+\sum_{m=1}^{M}\sum_{l=1}^{J}\lambda_m\|A^{(m)}_{:,\Omega_l}\|_{1,2}+\beta\sum_{m=1}^{M}\sum_{n\neq m}\|A^{(n)}-W^{(m)}A^{(m)}\|_F^2+\gamma\sum_{m=1}^{M}\|W^{(m)}\|_F^2, \quad \text{s.t. } \|d_k^{(m)}\|\le 1,\ \forall k,\ \forall m, \qquad (1)
where M is the number of modalities, J is the number of labels (categories), X^(m) is the feature matrix of the m-th modality, D^(m) is the dictionary of the m-th modality, A^(m) is the sparse coefficient matrix of the m-th modality, A^{(m)}_{:,\Omega_l} denotes the sparse coefficients of those data of the m-th modality that carry label l, \|A\|_{1,2}=\sum_{i=1}^{k}\|A_{i,:}\|_2 for any k x n matrix A, W^(m) is the association matrix of the m-th modality, \lambda_m (m=1,...,M), \beta and \gamma are adjustable parameters that regulate the proportion of each term in the expression, and d_k^(m) denotes one dictionary atom, i.e. the k-th column, of D^(m);
4) update the sparse coefficients, dictionaries and association matrices cyclically: first fix the dictionaries and association matrices and update the sparse coefficients, then update the dictionaries using the obtained sparse coefficients with the association matrices fixed, then update the association matrices using the updated sparse coefficients and dictionaries, and repeat this cycle until the convergence condition is met (a simplified sketch of these updates is given after this list). The concrete steps are as follows:
(1) first fix the dictionaries and the association matrices and update the sparse coefficients:
\min_{A} \sum_{m=1}^{M}\|X^{(m)}-D^{(m)}A^{(m)}\|_F^2+\sum_{m=1}^{M}\sum_{l=1}^{J}\lambda_m\|A^{(m)}_{:,\Omega_l}\|_{1,2}+\beta\sum_{m=1}^{M}\sum_{n\neq m}\|A^{(n)}-W^{(m)}A^{(m)}\|_F^2 \qquad (2)
(2) after the sparse coefficients are obtained, update the dictionary of each modality according to:
\min_{D} \sum_{m=1}^{M}\|X^{(m)}-D^{(m)}A^{(m)}\|_F^2, \quad \text{s.t. } \|d_k^{(m)}\|\le 1,\ \forall k,\ \forall m, \qquad (3)
(3) finally, update the association matrices:
\min_{W} \sum_{m=1}^{M}\sum_{n\neq m}\|A^{(n)}-W^{(m)}A^{(m)}\|_F^2+(\gamma/\beta)\sum_{m=1}^{M}\|W^{(m)}\|_F^2. \qquad (4)
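The alternating scheme can be illustrated by the simplified numpy sketch below. It is not the solver of the patent but one plausible realization under stated assumptions: the coefficient update (2) is approximated by a single proximal-gradient (ISTA-style) pass using the group soft-thresholding operator of the l_{1,2} penalty, the dictionary update (3) by one pass of block coordinate descent with projection of each atom onto the unit ball, and the association update (4) by its ridge-regression closed form. The helper names, the fixed step size and the use of the previous iterates are assumptions of the example.

```python
import numpy as np

def group_soft_threshold(A, groups, thresh):
    """Row-wise shrinkage of each label block A[:, g]: proximal operator of
    thresh * sum_l ||A[:, Omega_l]||_{1,2}, with ||B||_{1,2} = sum_i ||B[i, :]||_2."""
    A = A.copy()
    for g in groups:                                   # g: column indices of one label
        block = A[:, g]
        norms = np.linalg.norm(block, axis=1, keepdims=True)
        block *= np.maximum(0.0, 1.0 - thresh / np.maximum(norms, 1e-12))
        A[:, g] = block
    return A

def update_sparse_codes(Xs, Ds, As, Ws, groups, lams, beta, step=1e-3):
    """One ISTA-style pass on (2): gradient step on the smooth terms (constant
    factor 2 absorbed into `step`), then the group-sparsity proximal step."""
    M, new_As = len(Xs), []
    for m in range(M):
        A, D, W, X = As[m], Ds[m], Ws[m], Xs[m]
        grad = D.T @ (D @ A - X)                       # reconstruction term
        for n in range(M):
            if n == m:
                continue
            grad += beta * W.T @ (W @ A - As[n])       # A^(m) as the mapped modality
            grad += beta * (A - Ws[n] @ As[n])         # A^(m) as the coupling target
        new_As.append(group_soft_threshold(A - step * grad, groups, step * lams[m]))
    return new_As

def update_dictionary(X, D, A):
    """Block coordinate descent on (3): min_D ||X - DA||_F^2, s.t. ||d_k|| <= 1."""
    B, C = X @ A.T, A @ A.T
    for k in range(D.shape[1]):
        dk = D[:, k] + (B[:, k] - D @ C[:, k]) / max(C[k, k], 1e-12)
        D[:, k] = dk / max(1.0, np.linalg.norm(dk))    # project each atom onto the unit ball
    return D

def update_association(As, m, gamma_over_beta):
    """Closed-form ridge solution of (4) for W^(m)."""
    M, Am = len(As), As[m]
    lhs = sum(As[n] @ Am.T for n in range(M) if n != m)
    rhs = (M - 1) * (Am @ Am.T) + gamma_over_beta * np.eye(Am.shape[0])
    return lhs @ np.linalg.inv(rhs)

# One outer iteration of the cycle (repeated until convergence):
# As = update_sparse_codes(Xs, Ds, As, Ws, groups, lams, beta)
# Ds = [update_dictionary(Xs[m], Ds[m], As[m]) for m in range(len(Xs))]
# Ws = [update_association(As, m, gamma / beta) for m in range(len(Xs))]
```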
Step 3) comprises:
1) from the known m-th-modality datum x_q^(m) submitted by the user and the learned known-modality dictionary D^(m), initialize the sparse coefficient \alpha_q^{(m)} of the known-modality datum by
\alpha_q^{(m)} = \arg\min_{\alpha_q} \tfrac{1}{2}\|x_q^{(m)}-D^{(m)}\alpha_q^{(m)}\|_F^2+\lambda\|\alpha_q^{(m)}\|_1 \qquad (5)
where \lambda is a parameter adjusting the degree of sparsity;
2) from the initialized sparse coefficient \alpha_q^{(m)} of the known-modality datum and the learned association matrix W^(m), initialize the sparse coefficient \alpha_r^{(n)} of the target-modality datum by
\alpha_r^{(n)} = W^{(m)}\alpha_q^{(m)} \qquad (6)
3) from the sparse coefficient \alpha_r^{(n)} of the target-modality datum and the learned target-modality dictionary D^(n), initialize the target-modality datum x_r^{(n)} by
x_r^{(n)} = D^{(n)}\alpha_r^{(n)} \qquad (7)
4) using the known-modality datum, the information obtained by learning and the above initialization, update the sparse coefficients of the known modality and the target modality by
\min \|x_q^{(m)}-D^{(m)}\alpha_q^{(m)}\|_F^2+\|x_r^{(n)}-D^{(n)}\alpha_r^{(n)}\|_F^2+\beta\|\alpha_r^{(n)}-W^{(m)}\alpha_q^{(m)}\|_F^2+\lambda_m\|\alpha_q^{(m)}\|_1+\lambda_n\|\alpha_r^{(n)}\|_1 \qquad (8)
where \beta, \lambda_m, \lambda_n are adjustable parameters corresponding to those in formula (1);
5) from the updated sparse coefficient of the target modality and the target-modality dictionary, finally determine the target-modality datum by
x_r^{(n)} = D^{(n)}\alpha_r^{(n)} \qquad (9)
(a minimal sketch of this retrieval stage is given after this list).
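The sketch below walks through equations (5)-(9). It is an illustration under stated assumptions, not the inventors' solver: the lasso problem (5) is solved with scikit-learn's coordinate-descent `Lasso` (whose scaling of the data-fit term differs slightly from (5)), the joint refinement (8) is approximated by a few ISTA-style steps that hold the initialized x_r fixed, and the function name, step size and iteration count are assumptions of the example.

```python
import numpy as np
from sklearn.linear_model import Lasso

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def retrieve_target_feature(x_q, D_q, D_r, W, lam=0.1, lam_q=0.1, lam_r=0.1,
                            beta=1.0, n_iter=100, step=1e-3):
    """Reconstruct a target-modality feature from a query, following (5)-(9)."""
    # (5): lasso initialization of the query coefficient
    alpha_q = Lasso(alpha=lam, fit_intercept=False, max_iter=5000).fit(D_q, x_q).coef_
    alpha_r = W @ alpha_q                  # (6): map the coefficient across modalities
    x_r = D_r @ alpha_r                    # (7): initialize the target feature
    # (8): a few ISTA-style steps jointly refining both coefficients
    for _ in range(n_iter):
        g_q = D_q.T @ (D_q @ alpha_q - x_q) + beta * W.T @ (W @ alpha_q - alpha_r)
        g_r = D_r.T @ (D_r @ alpha_r - x_r) + beta * (alpha_r - W @ alpha_q)
        alpha_q = soft_threshold(alpha_q - step * g_q, step * lam_q)
        alpha_r = soft_threshold(alpha_r - step * g_r, step * lam_r)
    return D_r @ alpha_r                   # (9): final target-modality feature
```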
Step 4) comprises:
1) evaluate the cross-modal retrieval with the correspondence information: consider a known-modality datum and the other-modality data that directly correspond to it, and measure the quality of the result by the position of the corresponding target-modality datum in the result list returned for the known-modality query; for a given index t%, the retrieval is considered correct if the target-modality datum corresponding to the known-modality datum is ranked within the top t%, and incorrect otherwise;
2) evaluate the cross-modal retrieval with the discriminative information: consider a known-modality datum and the target-modality data that belong to the same label, and measure the cross-modal retrieval result by the retrieval list, where an item is relevant to the known-modality datum if it has the same label and irrelevant otherwise; concretely, MAP from information retrieval is adopted as the measure of this index. For one cross-modal query and its returned list of length R, MAP is defined on the basis of AP, and AP is defined as
AP = \frac{1}{L}\sum_{r=1}^{R} \mathrm{prec}(r)\,\delta(r) \qquad (10)
where L is the number of items in the returned list that are relevant to the query, prec(r) is the proportion of items among the first r that are relevant to the query, \delta(r)=1 if the r-th item is relevant to the query and \delta(r)=0 otherwise; MAP is defined as the mean of the AP values over all queries (a sketch of both metrics is given after this definition).
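Both evaluation criteria can be computed as in the short sketch below. The AP and MAP functions follow equation (10) directly; the function names and the 0-based ranking convention of the t% metric are assumptions made for the example.

```python
import numpy as np

def precision_at_t_percent(ranks, n_candidates, t=20):
    """Correspondence metric of step 4.1: a query counts as correct if its true
    counterpart is ranked within the top t% of the candidate list (0-based ranks)."""
    cutoff = int(np.ceil(t / 100.0 * n_candidates))
    return float(np.mean(np.asarray(ranks) < cutoff))

def average_precision(relevant, R=None):
    """AP of eq. (10): `relevant` is a boolean sequence over the ranked return list."""
    relevant = np.asarray(relevant, dtype=bool)[:R]
    L = relevant.sum()
    if L == 0:
        return 0.0
    prec = np.cumsum(relevant) / np.arange(1, len(relevant) + 1)   # prec(r)
    return float((prec * relevant).sum() / L)                      # delta(r) picks relevant ranks

def mean_average_precision(relevance_lists, R=None):
    """MAP of step 4.2: mean of AP over all queries."""
    return float(np.mean([average_precision(rel, R) for rel in relevance_lists]))
```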
The beneficial effects of the present invention are as follows. Compared with traditional cross-media retrieval methods, the invention can compare similarity between different modalities directly, instead of relying on within-modality similarity comparison and propagating similarity between modalities through correspondence relations; the benefit of directly comparing similarity between modalities is that the implicit associations among cross-media data can be truly mined and the user's retrieval intention can be realized directly. Compared with other cross-media retrieval algorithms that can directly measure cross-modal similarity, the invention improves the resistance of the measure to noise interference and the ability to express loosely related cross-media data, so that the retrieval effect is better and the results are more semantically relevant to the query.
Brief description of the drawings
Fig. 1 is a schematic diagram of the cross-modal retrieval method that can directly measure the similarity between data of different modalities;
Fig. 2 shows examples of corresponding images and texts in the database of the embodiment;
Fig. 3 shows examples of retrieving texts with an image and retrieving images with a text using the present invention. For each query the first four returned results are listed. The upper part is an example of retrieving texts with an image; to show the retrieval results better, each retrieved text is represented here by its true corresponding image. The lower part is an example of retrieving images with a text. Each example contrasts the retrieval effect of the present invention (named SliM2) with that of another cross-media retrieval method that directly measures similarity between different modalities (GMA).
Detailed description of the embodiments
The cross-modal retrieval method that can directly measure the similarity between data of different modalities comprises the four steps described above in the Summary of the invention: 1) feature extraction and label recording for each modality in the database; 2) construction and learning of the overall cross-modal retrieval model from the correspondence and label information, formulas (1)-(4); 3) cross-media retrieval of the target-modality data for a submitted query, formulas (5)-(9); and 4) evaluation of the model from the two aspects of correspondence information and discriminative information, formula (10). A concrete embodiment follows.
Embodiment
Suppose we have 2173 pairs of text and image data with known correspondence, and 693 text data and 693 image data with unknown correspondence; examples of the image-text pairs are shown in Fig. 2. First, SIFT features are extracted from all image-modality data in the database, clustered with the k-means method into visual words, and normalized so that the feature vector representing each image is a unit vector. At the same time, all text-modality data in the database are part-of-speech tagged, the non-noun words are removed and the nouns are kept, a vocabulary is formed from the words occurring in the whole database, the occurrences of the vocabulary words are counted for each text, each text is vectorized by its term frequencies, and the feature vectors are normalized so that the feature vector representing each text is a unit vector.
The 2173 paired data (features) are expressed in matrix form (one possible layout is sketched below). As before, M denotes the number of modalities, J the number of labels (categories), X^(m) the feature matrix of the m-th modality, D^(m) the dictionary of the m-th modality, A^(m) the sparse coefficient matrix of the m-th modality, A^{(m)}_{:,\Omega_l} the sparse coefficients of the data of the m-th modality with label l, \|A\|_{1,2}=\sum_{i=1}^{k}\|A_{i,:}\|_2 for any k x n matrix A, W^(m) the association matrix of the m-th modality, \lambda_m (m=1,...,M), \beta, \gamma the adjustable parameters that regulate the proportion of each term, and d_k^(m) one dictionary atom, i.e. the k-th column, of D^(m). Here there are two modalities, text and image, so M=2, with the text features as X^(1) and the image features as X^(2).
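For concreteness, the sketch below shows one way the paired features might be arranged into the matrices used by formula (1); the shapes and helper names are assumptions of the example, not values given in the patent. Columns are paired samples, so column i of X^(1) (a text) corresponds to column i of X^(2) (an image), and Omega_l collects the column indices carrying label l.

```python
import numpy as np

def build_modal_matrices(text_feats, image_feats, labels):
    """text_feats: (N, d_text), image_feats: (N, d_image), labels: (N,) with N = 2173."""
    X1 = np.asarray(text_feats).T           # X^(1): d_text x N, one column per text
    X2 = np.asarray(image_feats).T          # X^(2): d_image x N, paired column-wise
    groups = [np.flatnonzero(labels == l)   # Omega_l: column indices of label l
              for l in np.unique(labels)]
    return [X1, X2], groups                 # M = 2 modalities
```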
The following steps are then carried out:
1) first fix the dictionaries and the association matrices and update the sparse coefficients:
\min_{A} \sum_{m=1}^{M}\|X^{(m)}-D^{(m)}A^{(m)}\|_F^2+\sum_{m=1}^{M}\sum_{l=1}^{J}\lambda_m\|A^{(m)}_{:,\Omega_l}\|_{1,2}+\beta\sum_{m=1}^{M}\sum_{n\neq m}\|A^{(n)}-W^{(m)}A^{(m)}\|_F^2 \qquad (2)
2) after the sparse coefficients are obtained, update the dictionary of each modality according to:
\min_{D} \sum_{m=1}^{M}\|X^{(m)}-D^{(m)}A^{(m)}\|_F^2, \quad \text{s.t. } \|d_k^{(m)}\|\le 1,\ \forall k,\ \forall m, \qquad (3)
3) finally, update the association matrices:
\min_{W} \sum_{m=1}^{M}\sum_{n\neq m}\|A^{(n)}-W^{(m)}A^{(m)}\|_F^2+(\gamma/\beta)\sum_{m=1}^{M}\|W^{(m)}\|_F^2. \qquad (4)
Learning thus yields D={D^(1), D^(2), ..., D^(M)} and W={W^(1), W^(2), ..., W^(M)}. The retrieval phase then begins: any one of the 693*2 texts and images with unknown correspondence is used as a query, and its corresponding text or image is returned. The concrete steps are as follows.
Suppose the user submits a query of the known image or text modality, i.e. m = 1 or 2.
Steps 1)-5) of the retrieval stage described above are then carried out: the sparse coefficient \alpha_q^{(m)} of the query is initialized by formula (5), the sparse coefficient \alpha_r^{(n)} and the feature vector x_r^{(n)} of the target modality are initialized by formulas (6) and (7), the two sparse coefficients are jointly updated by formula (8) with the adjustable parameters \beta, \lambda_m, \lambda_n corresponding to formula (1), and the target-modality datum is finally determined by formula (9).
6) according to the obtained target-modality datum, rank the candidate return list of the target modality and return the ranked results (one plausible ranking rule is sketched below).
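The patent does not state which distance is used to rank the candidates against the reconstructed feature x_r^(n); the sketch below assumes Euclidean distance between the unit-normalized feature vectors of step 1) as one plausible choice, and the function name is an assumption of the example.

```python
import numpy as np

def rank_candidates(x_r, candidate_feats):
    """Return candidate indices sorted by closeness to the reconstructed feature.

    candidate_feats: (n_candidates x d) target-modality feature matrix (unit rows);
    x_r: the d-dimensional reconstruction from formula (9).
    """
    d = np.linalg.norm(candidate_feats - x_r.reshape(1, -1), axis=1)
    return np.argsort(d)          # smallest distance (best match) first
```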
Fig. 3 shows concrete examples of cross-media retrieval, including an example of retrieving texts with an image (upper part) and an example of retrieving images with a text (lower part), and contrasts the retrieval effect of the present invention (named SliM2) with that of another cross-media retrieval method that directly measures similarity between different modalities (GMA). For the image-to-text example (upper part), in order to show the retrieval effect more intuitively, each retrieved text is represented by its true corresponding image. It can be seen that the query image comes from the sports category and the results retrieved by both methods also come from the sports category, but the result retrieved with the present invention ranks the text corresponding to the query image (shown by its corresponding picture) first, and the remaining retrieved texts (also shown by their corresponding pictures) are more relevant to the query image in semantics and content. For the text-to-image example, only an excerpt of the text is displayed; its content mainly concerns parks and trails and belongs to the geography category. The images retrieved with the present invention belong to the same category as the query text and are highly relevant in content, whereas the other method ranks pictures from the history category first and fourth, and its results are not as strongly related to the query text in content as those of the method of the present invention.
The above examples show that, unlike classical methods, the present invention can measure similarity directly between different modalities and thus realize cross-modal retrieval, and that even compared with methods that can likewise directly compare cross-modal similarity, the method of the present invention achieves a better retrieval effect.

Claims (5)

1. A cross-modal retrieval method that can directly measure the similarity between data of different modalities, characterized by comprising the steps of:
1) performing feature extraction and label recording for the data of each modality in the database;
2) according to the correspondence information and label information between the data of different modalities in the database, expressing the differences and the similarities between paired data of different modalities from the perspective of reconstruction, and using the label information to build an overall cross-modal retrieval model and learn the model parameters;
3) after feature extraction of the known-modality data submitted by the user, using the cross-media retrieval model to return the data of the other modalities that correspond to the user's request;
4) using the true correspondence information and the label information of the cross-modal data, evaluating the cross-media retrieval model simultaneously from the two aspects of correspondence information and discriminative information.
2. The cross-modal retrieval method that can directly measure the similarity between data of different modalities according to claim 1, characterized in that said step 1) comprises:
1) extracting SIFT features from all image-modality data in the database, clustering them with the k-means method to form visual words, and then normalizing the features so that the feature vector representing each image is a unit vector;
2) performing part-of-speech tagging on all text-modality data in the database, removing the non-noun words and keeping the nouns, forming a vocabulary from the words occurring in the whole database, counting for each text the occurrences of the vocabulary words, vectorizing each text by its term frequencies, and then normalizing the feature vectors so that the feature vector representing each text is a unit vector;
3) for data of other modalities in the database, extracting commonly used industry-standard features and normalizing them so that the feature vector representing each datum is a unit vector;
4) for the corresponding data of the different modalities in the database, recording their label information, i.e. the category each datum comes from.
3. The cross-modal retrieval method that can directly measure the similarity between data of different modalities according to claim 1, characterized in that said step 2) comprises:
1) introducing the concept of dictionary learning into cross-media retrieval to form a cross-modal retrieval algorithm based on dictionary learning, in which the data of each modality are reconstructed from a dictionary and sparse coefficients, the distinctiveness of the different modalities is encoded by different dictionaries, the similarity between data of different modalities is modeled by an association matrix between the sparse coefficients, and the dictionaries, sparse coefficients and association matrices are all learned from the data of each modality;
2) using the label information in the cross-modal retrieval: in dictionary learning, data of the same modality that share the same label share the same dictionary atoms, i.e. the same non-zero rows, so that the label information is encoded into the dictionary-learning process and dictionaries carrying discriminative information are learned;
3) unifying the dictionaries, sparse coefficients, association matrices and label information in the dictionary-learning-based multi-modal retrieval framework of expression (1), so that the corresponding data of the different modalities are expressed and learned as a whole:
\min_{D,A,W} \sum_{m=1}^{M}\|X^{(m)}-D^{(m)}A^{(m)}\|_F^2+\sum_{m=1}^{M}\sum_{l=1}^{J}\lambda_m\|A^{(m)}_{:,\Omega_l}\|_{1,2}+\beta\sum_{m=1}^{M}\sum_{n\neq m}\|A^{(n)}-W^{(m)}A^{(m)}\|_F^2+\gamma\sum_{m=1}^{M}\|W^{(m)}\|_F^2, \quad \text{s.t. } \|d_k^{(m)}\|\le 1,\ \forall k,\ \forall m, \qquad (1)
where M is the number of modalities, J is the number of labels (categories), X^(m) is the feature matrix of the m-th modality, D^(m) is the dictionary of the m-th modality, A^(m) is the sparse coefficient matrix of the m-th modality, A^{(m)}_{:,\Omega_l} denotes the sparse coefficients of those data of the m-th modality that carry label l, \|A\|_{1,2}=\sum_{i=1}^{k}\|A_{i,:}\|_2 for any k x n matrix A, W^(m) is the association matrix of the m-th modality, \lambda_m (m=1,...,M), \beta and \gamma are adjustable parameters that regulate the proportion of each term in the expression, and d_k^(m) denotes one dictionary atom, i.e. the k-th column, of D^(m);
4) updating the sparse coefficients, dictionaries and association matrices cyclically: first fixing the dictionaries and association matrices and updating the sparse coefficients, then updating the dictionaries using the obtained sparse coefficients with the association matrices fixed, then updating the association matrices using the updated sparse coefficients and dictionaries, and repeating this cycle until the convergence condition is met, the concrete steps being as follows:
(1) first fix the dictionaries and the association matrices and update the sparse coefficients:
\min_{A} \sum_{m=1}^{M}\|X^{(m)}-D^{(m)}A^{(m)}\|_F^2+\sum_{m=1}^{M}\sum_{l=1}^{J}\lambda_m\|A^{(m)}_{:,\Omega_l}\|_{1,2}+\beta\sum_{m=1}^{M}\sum_{n\neq m}\|A^{(n)}-W^{(m)}A^{(m)}\|_F^2 \qquad (2)
(2) after the sparse coefficients are obtained, update the dictionary of each modality according to:
\min_{D} \sum_{m=1}^{M}\|X^{(m)}-D^{(m)}A^{(m)}\|_F^2, \quad \text{s.t. } \|d_k^{(m)}\|\le 1,\ \forall k,\ \forall m, \qquad (3)
(3) finally, update the association matrices:
\min_{W} \sum_{m=1}^{M}\sum_{n\neq m}\|A^{(n)}-W^{(m)}A^{(m)}\|_F^2+(\gamma/\beta)\sum_{m=1}^{M}\|W^{(m)}\|_F^2. \qquad (4)
4. The cross-modal retrieval method that can directly measure the similarity between data of different modalities according to claim 1, characterized in that said step 3) comprises:
1) from the known m-th-modality datum x_q^(m) submitted by the user and the learned known-modality dictionary D^(m), initializing the sparse coefficient \alpha_q^{(m)} of the known-modality datum by
\alpha_q^{(m)} = \arg\min_{\alpha_q} \tfrac{1}{2}\|x_q^{(m)}-D^{(m)}\alpha_q^{(m)}\|_F^2+\lambda\|\alpha_q^{(m)}\|_1 \qquad (5)
where \lambda is a parameter adjusting the degree of sparsity;
2) from the initialized sparse coefficient \alpha_q^{(m)} of the known-modality datum and the learned association matrix W^(m), initializing the sparse coefficient \alpha_r^{(n)} of the target-modality datum by
\alpha_r^{(n)} = W^{(m)}\alpha_q^{(m)} \qquad (6)
3) from the sparse coefficient \alpha_r^{(n)} of the target-modality datum and the learned target-modality dictionary D^(n), initializing the target-modality datum x_r^{(n)} by
x_r^{(n)} = D^{(n)}\alpha_r^{(n)} \qquad (7)
4) using the known-modality datum, the information obtained by learning and the above initialization, updating the sparse coefficients of the known modality and the target modality by
\min \|x_q^{(m)}-D^{(m)}\alpha_q^{(m)}\|_F^2+\|x_r^{(n)}-D^{(n)}\alpha_r^{(n)}\|_F^2+\beta\|\alpha_r^{(n)}-W^{(m)}\alpha_q^{(m)}\|_F^2+\lambda_m\|\alpha_q^{(m)}\|_1+\lambda_n\|\alpha_r^{(n)}\|_1 \qquad (8)
where \beta, \lambda_m, \lambda_n are adjustable parameters corresponding to those in formula (1);
5) from the updated sparse coefficient of the target modality and the target-modality dictionary, finally determining the target-modality datum by
x_r^{(n)} = D^{(n)}\alpha_r^{(n)} \qquad (9)
5. The cross-modal retrieval method that can directly measure the similarity between data of different modalities according to claim 1, characterized in that said step 4) comprises:
1) evaluating the cross-modal retrieval with the correspondence information: considering a known-modality datum and the other-modality data that directly correspond to it, and measuring the quality of the result by the position of the corresponding target-modality datum in the result list returned for the known-modality query, where, for a given index t%, the retrieval is considered correct if the target-modality datum corresponding to the known-modality datum is ranked within the top t% and incorrect otherwise;
2) evaluating the cross-modal retrieval with the discriminative information: considering a known-modality datum and the target-modality data that belong to the same label, and measuring the cross-modal retrieval result by the retrieval list, where an item is relevant to the known-modality datum if it has the same label and irrelevant otherwise; concretely, MAP from information retrieval is adopted as the measure of this index, where, for one cross-modal query and its returned list of length R, MAP is defined on the basis of AP, and AP is defined as
AP = \frac{1}{L}\sum_{r=1}^{R} \mathrm{prec}(r)\,\delta(r) \qquad (10)
where L is the number of items in the returned list that are relevant to the query, prec(r) is the proportion of items among the first r that are relevant to the query, \delta(r)=1 if the r-th item is relevant to the query and \delta(r)=0 otherwise, and MAP is defined as the mean of the AP values over all queries.
CN201310410553.XA 2013-09-10 2013-09-10 A kind of cross-module state search method that can directly measure similarity between different modalities data Active CN103488713B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310410553.XA CN103488713B (en) 2013-09-10 2013-09-10 A kind of cross-module state search method that can directly measure similarity between different modalities data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310410553.XA CN103488713B (en) 2013-09-10 2013-09-10 A kind of cross-module state search method that can directly measure similarity between different modalities data

Publications (2)

Publication Number Publication Date
CN103488713A true CN103488713A (en) 2014-01-01
CN103488713B CN103488713B (en) 2016-09-28

Family

ID=49828939

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310410553.XA Active CN103488713B (en) 2013-09-10 2013-09-10 A kind of cross-module state search method that can directly measure similarity between different modalities data

Country Status (1)

Country Link
CN (1) CN103488713B (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103268346A (en) * 2013-05-27 2013-08-28 翁时锋 Semi-supervised classification method and semi-supervised classification system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
吴飞 et al.: "Cross-media analysis and retrieval" (跨媒体分析与检索), Communications of the China Computer Federation (《中国计算机学会通讯》) *
尹湘舟: "Research on text and image information fusion for Web image retrieval" (面向web图片检索的文本和图片信息融合技术研究), China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》) *

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104199826A (en) * 2014-07-24 2014-12-10 北京大学 Heterogeneous media similarity calculation method and retrieval method based on correlation analysis
CN104166684A (en) * 2014-07-24 2014-11-26 北京大学 Cross-media retrieval method based on uniform sparse representation
CN104199826B (en) * 2014-07-24 2017-06-30 北京大学 A kind of dissimilar medium similarity calculation method and search method based on association analysis
CN104317837A (en) * 2014-10-10 2015-01-28 浙江大学 Cross-modal searching method based on topic model
CN104317838A (en) * 2014-10-10 2015-01-28 浙江大学 Cross-media Hash index method based on coupling differential dictionary
CN104317838B (en) * 2014-10-10 2017-05-17 浙江大学 Cross-media Hash index method based on coupling differential dictionary
CN104317837B (en) * 2014-10-10 2017-06-23 浙江大学 A kind of cross-module state search method based on topic model
CN104346450A (en) * 2014-10-29 2015-02-11 浙江大学 Cross-media ordering method based on multi-modal implicit coupling expression
CN104346450B (en) * 2014-10-29 2017-06-23 浙江大学 A kind of across media sort methods based on multi-modal recessive coupling expression
CN104462489B (en) * 2014-12-18 2018-02-23 北京邮电大学 A kind of cross-module state search method based on Deep model
CN104462489A (en) * 2014-12-18 2015-03-25 北京邮电大学 Cross-modal retrieval method based on deep-layer models
CN105550190A (en) * 2015-06-26 2016-05-04 许昌学院 Knowledge graph-oriented cross-media retrieval system
CN105550190B (en) * 2015-06-26 2019-03-29 许昌学院 Cross-media retrieval system towards knowledge mapping
CN108121750B (en) * 2016-11-30 2022-07-08 西门子公司 Model processing method and device and machine readable medium
CN108121750A (en) * 2016-11-30 2018-06-05 西门子公司 A kind of model treatment method, apparatus and machine readable media
CN107633259B (en) * 2017-08-21 2020-03-31 天津大学 Cross-modal learning method based on sparse dictionary representation
CN107633259A (en) * 2017-08-21 2018-01-26 天津大学 A kind of cross-module state learning method represented based on sparse dictionary
CN108038080A (en) * 2017-11-29 2018-05-15 浙江大学 A kind of method that local multi-modal sparse coding completion is carried out using the similar tactical ruleization of adaptability
CN110059217A (en) * 2019-04-29 2019-07-26 广西师范大学 A kind of image text cross-media retrieval method of two-level network
CN110059217B (en) * 2019-04-29 2022-11-04 广西师范大学 Image text cross-media retrieval method for two-stage network
CN110704708A (en) * 2019-09-27 2020-01-17 深圳市商汤科技有限公司 Data processing method, device, equipment and storage medium
CN111930972A (en) * 2020-08-04 2020-11-13 山东大学 Cross-modal retrieval method and system for multimedia data by using label level information
CN111930972B (en) * 2020-08-04 2021-04-27 山东大学 Cross-modal retrieval method and system for multimedia data by using label level information
CN112364197A (en) * 2020-11-12 2021-02-12 四川省人工智能研究院(宜宾) Pedestrian image retrieval method based on text description
CN112364197B (en) * 2020-11-12 2021-06-01 四川省人工智能研究院(宜宾) Pedestrian image retrieval method based on text description
CN113656660A (en) * 2021-10-14 2021-11-16 北京中科闻歌科技股份有限公司 Cross-modal data matching method, device, equipment and medium

Also Published As

Publication number Publication date
CN103488713B (en) 2016-09-28

Similar Documents

Publication Publication Date Title
CN103488713A (en) Cross-modal search method capable of directly measuring similarity of different modal data
CN107346328B (en) Cross-modal association learning method based on multi-granularity hierarchical network
CN109376222B (en) Question-answer matching degree calculation method, question-answer automatic matching method and device
CN108280114B (en) Deep learning-based user literature reading interest analysis method
CN106886601B (en) A kind of Cross-modality search method based on the study of subspace vehicle mixing
US20120323968A1 (en) Learning Discriminative Projections for Text Similarity Measures
CN109858015B (en) Semantic similarity calculation method and device based on CTW (computational cost) and KM (K-value) algorithm
Sarawagi et al. Open-domain quantity queries on web tables: annotation, response, and consensus models
CN106708929B (en) Video program searching method and device
US20150356199A1 (en) Click-through-based cross-view learning for internet searches
Guo et al. Sparse deep nonnegative matrix factorization
Wang et al. Image tag refinement by regularized latent Dirichlet allocation
CN103455487A (en) Extracting method and device for search term
CN102693321A (en) Cross-media information analysis and retrieval method
CN104317837B (en) A kind of cross-module state search method based on topic model
US11709858B2 (en) Mapping of unlabeled data onto a target schema via semantic type detection
CN113239159B (en) Cross-modal retrieval method for video and text based on relational inference network
CN105701225A (en) Cross-media search method based on unification association supergraph protocol
CN106570196B (en) Video program searching method and device
CN113792594B (en) Method and device for locating language fragments in video based on contrast learning
CN103279581B (en) A kind of method utilizing compact video subject description to carry out video frequency searching
CN113326392B (en) Remote sensing image audio retrieval method based on quadruple hash
CN116151258A (en) Text disambiguation method, electronic device and storage medium
CN116561594A (en) Legal document similarity analysis method based on Word2vec
CN105677830A (en) Heterogeneous media similarity computing method and retrieving method based on entity mapping

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20140101

Assignee: TONGDUN HOLDINGS Co.,Ltd.

Assignor: ZHEJIANG University

Contract record no.: X2021990000612

Denomination of invention: A cross modal retrieval method that can directly measure the similarity between different modal data

Granted publication date: 20160928

License type: Common License

Record date: 20211012

EE01 Entry into force of recordation of patent licensing contract