CN103488713B - Cross-modal retrieval method capable of directly measuring the similarity between data of different modalities - Google Patents


Info

Publication number
CN103488713B
Authority
CN
China
Legal status
Active
Application number
CN201310410553.XA
Other languages
Chinese (zh)
Other versions
CN103488713A
Inventor
庄越挺
吴飞
王彦斐
汤斯亮
邵健
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Application filed by Zhejiang University ZJU
Priority to CN201310410553.XA
Publication of CN103488713A
Application granted
Publication of CN103488713B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9032 Query formulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a cross-modal retrieval method that can directly measure the similarity between data of different modalities. It comprises the following steps: 1) feature extraction; 2) model construction and learning; 3) cross-media data retrieval; 4) result evaluation. The invention compares the similarity between data of different modalities directly: in a cross-modal retrieval task, the user can submit text, an image, a sound or data of any other modality and retrieve results in whichever modality they need. Unlike traditional cross-media retrieval methods, the invention compares data of different modalities directly, which meets the needs of cross-media retrieval and serves the user's retrieval intent more directly. Compared with other cross-media retrieval algorithms that also measure cross-modal similarity directly, this method is more robust to noise and better able to represent cross-modal data that are only loosely related, so that retrieval results are better.

Description

Cross-modal retrieval method capable of directly measuring the similarity between data of different modalities
Technical field
The present invention relates to cross-modal retrieval, and in particular to a cross-modal retrieval method that can directly measure the similarity between data of different modalities.
Background art
Today, electronic data are increasingly rich and varied: text, images, sound, maps and other types of data are widespread on the Internet. The same semantic content can be described by data of one modality or by data of another. Given a description of some semantics in one data type, we sometimes want to find the corresponding description in another data type, for example retrieving pictures whose meaning is close to a piece of text, or searching for news reports related to a picture. However, existing retrieval methods are generally designed for single-modality data, such as text-to-text or image-to-image retrieval. Some multi-modal or multimedia retrieval methods exist, but most of them measure similarity within a single modality and then derive the similarity between media data through a function mapping; few methods compare similarity across modalities directly. The drawback of cross-media retrieval methods that measure similarity within a single modality is that they cannot learn the relationships between cross-modal data: they depend on correspondences specified in advance in the database, and they perform poorly when the correspondence between multimedia data is loose. It is therefore necessary to propose a cross-media retrieval method that measures similarity between different modalities directly. The difficulty of comparing data of different modalities directly is that their features differ greatly and are generally high-dimensional, which gives rise to the "semantic gap" problem.
To overcome the "semantic gap" and to compare and retrieve data across modalities, several methods that differ from the traditional approach have been proposed. They fall broadly into two classes. The first class treats the data of each modality as random variables and seeks projections into a latent space in which the correlation between these variables is maximal; the query data are also projected into this latent space, which completes cross-modal retrieval. The second class assumes that the data contain latent topics and models the correlation between modalities through these topics. Both classes compare data similarity across modalities directly. However, when different modalities are only loosely related at the semantic level, semantic-level constructs such as "maximal correlation" and "topics" are less reliable than known, definite information such as the categories of the data in each modality and the correspondences between them. The present invention introduces dictionary learning into cross-modal retrieval, learns the explicit correspondence relationships directly and exploits label information. It can therefore model the loose semantic-level correspondence between text and images well, which improves robustness to noise and the accuracy of cross-modal retrieval.
Summary of the invention
The object of the present invention is to provide a cross-modal retrieval method that can directly measure the similarity between data of different modalities, so that data of one or more other modalities can be retrieved directly from data of one modality.
The cross-modal retrieval method capable of directly measuring the similarity between data of different modalities comprises the following steps:
1) Extract features from, and record labels for, the data of each modality in the database;
2) Using the correspondence information and label information between data of different modalities in the database, express the differences and similarities between paired data of different modalities from a reconstruction perspective, use the label information to build the overall cross-modal retrieval model, and learn the model parameters;
3) For query data of a known modality submitted by the user, extract features and use the cross-media retrieval model to return the corresponding data of the other modalities requested by the user;
4) Using the true correspondence information and the label information of the cross-modal data, evaluate the cross-media retrieval model from two aspects: correspondence information and discriminative information.
Step 1) comprises:
1) Extract SIFT features from all image data in the database, cluster them with the k-means method to form visual words, and then normalize the features so that the feature vector representing each image is a unit vector;
2) Perform part-of-speech tagging on all text data in the database, remove non-noun words and keep the nouns in each text, build a vocabulary from the words that occur in the whole database, count for each text the number of occurrences of each vocabulary word, vectorize each text by its term frequencies, and then normalize the feature vector so that the vector representing each text is a unit vector;
3) For data of other modalities in the database, extract commonly used standard features and normalize them so that the feature vector representing each item is a unit vector;
4) For the corresponding data of different modalities in the database, collect their label information, i.e., record which category each item belongs to.
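The vectorization in items 1) and 2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: SIFT extraction, k-means clustering and part-of-speech tagging are assumed to have been done by external tools, so the functions (all hypothetical names) receive precomputed descriptors, cluster centroids and noun tokens.

```python
import math
from collections import Counter

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def build_vocabulary(noun_docs):
    """Vocabulary of every noun occurring anywhere in the database (sorted for a stable order)."""
    return sorted({word for doc in noun_docs for word in doc})

def text_feature(nouns, vocabulary):
    """Term-frequency vector of one text over the vocabulary, then L2-normalized.
    `nouns` is assumed to be the tokens left after removing non-noun words."""
    counts = Counter(nouns)
    return l2_normalize([float(counts[word]) for word in vocabulary])

def image_feature(descriptors, centroids):
    """Bag of visual words: assign each local descriptor (e.g. a SIFT vector) to its nearest
    k-means centroid, count the assignments per centroid, then L2-normalize the histogram."""
    hist = [0.0] * len(centroids)
    for d in descriptors:
        nearest = min(range(len(centroids)),
                      key=lambda j: sum((a - b) ** 2 for a, b in zip(d, centroids[j])))
        hist[nearest] += 1.0
    return l2_normalize(hist)
```

Because every vector is scaled to unit length, texts of different lengths and images with different numbers of keypoints end up on a comparable scale.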
Step 2) comprises:
1) Introduce dictionary learning into cross-media retrieval to obtain a cross-modal retrieval algorithm based on dictionary learning: the data of each modality are reconstructed from a dictionary and sparse coefficients, the per-modality dictionaries encode the differences between modalities, and the similarity between data of different modalities is modeled by association matrices between the sparse coefficients; the dictionaries, sparse coefficients and association matrices are all learned from the data of each modality;
2) Use label information in cross-modal retrieval: during dictionary learning, data of the same modality that belong to the same label share the same dictionary atoms, i.e., the same dictionary columns receive non-zero coefficients, so that label information is encoded in the dictionary learning and the learned dictionaries carry discriminative information;
3) Unify the dictionaries, sparse coefficients, association matrices and label information in the dictionary-learning-based multi-modal retrieval framework of expression (1), in which corresponding data of different modalities are represented and learned as a whole:
$$\min_{D,A,W}\ \sum_{m=1}^{M}\big\|X^{(m)}-D^{(m)}A^{(m)}\big\|_F^2+\sum_{m=1}^{M}\sum_{l=1}^{J}\lambda_m\big\|A^{(m)}_{:,\Omega_l}\big\|_{1,2}+\beta\sum_{m=1}^{M}\sum_{n\neq m}\big\|A^{(n)}-W^{(m)}A^{(m)}\big\|_F^2+\gamma\sum_{m=1}^{M}\big\|W^{(m)}\big\|_F^2\quad\text{s.t. }\ \big\|d_k^{(m)}\big\|\le 1,\ \forall k,\ \forall m \tag{1}$$
where $M$ is the number of modalities; $J$ is the number of categories, i.e., of labels; $X^{(m)}$ is the feature data of the $m$-th modality; $D^{(m)}$ is the dictionary of the $m$-th modality; $A^{(m)}$ is the sparse coefficient matrix of the $m$-th modality; $A^{(m)}_{:,\Omega_l}$ denotes the sparse coefficients of those data of the $m$-th modality that carry label $l$; for any $k \times n$ matrix $A$, $\|A\|_{1,2}=\sum_{i=1}^{k}\|A_{i,:}\|_2$; $W^{(m)}$ is the association matrix of the $m$-th modality, which maps $A^{(m)}$ onto the coefficients $A^{(n)}$ of the other modalities; $\lambda_m$ ($m=1,\dots,M$), $\beta$ and $\gamma$ are adjustable parameters that set the weight of each term in the expression; and $d_k^{(m)}$ is a dictionary element of $D^{(m)}$, i.e., its $k$-th column;
4) Update the sparse coefficients, dictionaries and association matrices alternately: first fix the dictionaries and association matrices and update the sparse coefficients; then update the dictionaries using the new sparse coefficients, with the association matrices fixed; then update the association matrices using the updated sparse coefficients and dictionaries; repeat this cycle until the convergence condition is met. The specific steps are as follows:
(1) First, fix the dictionaries and association matrices and update the sparse coefficients:
$$\min_{A}\ \sum_{m=1}^{M}\big\|X^{(m)}-D^{(m)}A^{(m)}\big\|_F^2+\sum_{m=1}^{M}\sum_{l=1}^{J}\lambda_m\big\|A^{(m)}_{:,\Omega_l}\big\|_{1,2}+\beta\sum_{m=1}^{M}\sum_{n\neq m}\big\|A^{(n)}-W^{(m)}A^{(m)}\big\|_F^2 \tag{2}$$
(2) After the sparse coefficients are obtained, update the dictionary of each modality:
$$\min_{D}\ \sum_{m=1}^{M}\big\|X^{(m)}-D^{(m)}A^{(m)}\big\|_F^2\quad\text{s.t. }\ \big\|d_k^{(m)}\big\|\le 1,\ \forall k,\ \forall m \tag{3}$$
(3) Finally, update the association matrices:
$$\min_{W}\ \sum_{m=1}^{M}\sum_{n\neq m}\big\|A^{(n)}-W^{(m)}A^{(m)}\big\|_F^2+\frac{\gamma}{\beta}\sum_{m=1}^{M}\big\|W^{(m)}\big\|_F^2 \tag{4}$$
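The label term of expression (1) and the constraint on the dictionary atoms can be made concrete with a few small helpers. This sketch assumes the row-wise definition $\|A\|_{1,2}=\sum_i\|A_{i,:}\|_2$, and it assumes that (2) and (3) are solved with proximal or projected gradient methods, for which the group shrinkage and the column projection below are the usual building blocks; the patent itself does not name a solver, and all function names are hypothetical.

```python
import math

def l12_norm(A):
    """||A||_{1,2} for a k x n matrix stored as k rows: the sum of the rows' Euclidean norms."""
    return sum(math.sqrt(sum(x * x for x in row)) for row in A)

def columns(A, cols):
    """Submatrix A[:, cols]."""
    return [[row[j] for j in cols] for row in A]

def label_group_penalty(A, labels, lam):
    """Second term of (1) for one modality: lam * sum_l ||A[:, Omega_l]||_{1,2},
    where Omega_l holds the column indices (samples) whose label is l."""
    total = 0.0
    for label in set(labels):
        omega = [j for j, y in enumerate(labels) if y == label]
        total += l12_norm(columns(A, omega))
    return lam * total

def group_soft_threshold(rows, t):
    """Proximal operator of t * ||.||_{1,2}: shrink each row's norm by t and zero rows
    whose norm is at most t, so one label's samples drop or keep an atom together."""
    out = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row))
        scale = max(0.0, 1.0 - t / norm) if norm > 0 else 0.0
        out.append([scale * x for x in row])
    return out

def project_atoms(D):
    """Enforce ||d_k|| <= 1 from (1) and (3): rescale every column whose norm exceeds 1."""
    n_rows, n_cols = len(D), len(D[0])
    norms = [math.sqrt(sum(D[i][j] ** 2 for i in range(n_rows))) for j in range(n_cols)]
    return [[D[i][j] / max(1.0, norms[j]) for j in range(n_cols)] for i in range(n_rows)]
```

For example, `label_group_penalty([[3, 4], [0, 0]], [0, 0], 1.0)` gives 5.0, while the same coefficients with labels `[0, 1]` give 7.0: two samples that use the same atom cost less when they share a label, which is how this term pushes same-label data onto a common set of atoms.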
Step 3) comprises:
1) From the query data $x_q^{(m)}$ of known modality $m$ submitted by the user and the learned dictionary $D^{(m)}$ of that modality, initialize the sparse coefficients $\alpha_q^{(m)}$ of the query data:
$$\alpha_q^{(m)}=\arg\min_{\alpha}\ \frac{1}{2}\big\|x_q^{(m)}-D^{(m)}\alpha\big\|_F^2+\lambda\|\alpha\|_1 \tag{5}$$
where $\lambda$ is a parameter that controls the sparsity of the coefficients;
2) From the initialized coefficients $\alpha_q^{(m)}$ of the query data and the learned association matrix $W^{(m)}$, initialize the sparse coefficients $\alpha_r^{(n)}$ of the data in the target modality $n$ requested by the user:
$$\alpha_r^{(n)}=W^{(m)}\alpha_q^{(m)} \tag{6}$$
3) From the target-modality coefficients $\alpha_r^{(n)}$ and the learned target-modality dictionary $D^{(n)}$, initialize the target-modality data $x_r^{(n)}$:
$$x_r^{(n)}=D^{(n)}\alpha_r^{(n)} \tag{7}$$
4) Using the query data, the learned model and the initialization above, update the sparse coefficients of the query modality and of the target modality:
$$\min_{\alpha_q^{(m)},\,\alpha_r^{(n)}}\ \big\|x_q^{(m)}-D^{(m)}\alpha_q^{(m)}\big\|_F^2+\big\|x_r^{(n)}-D^{(n)}\alpha_r^{(n)}\big\|_F^2+\beta\big\|\alpha_r^{(n)}-W^{(m)}\alpha_q^{(m)}\big\|_F^2+\lambda_m\big\|\alpha_q^{(m)}\big\|_1+\lambda_n\big\|\alpha_r^{(n)}\big\|_1 \tag{8}$$
where $\beta$, $\lambda_m$ and $\lambda_n$ are adjustable parameters corresponding to those in formula (1).
5) From the updated target-modality coefficients and the target-modality dictionary, determine the final target-modality data:
$$x_r^{(n)}=D^{(n)}\alpha_r^{(n)} \tag{9}$$
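Steps 1) to 3) amount to sparse coding followed by two matrix-vector products. The sketch below assumes ISTA (iterative shrinkage-thresholding) as the solver for (5), which the patent does not specify, uses the conservative step size $1/\|D^{(m)}\|_F^2$, and stops at the initialization (7), omitting the joint refinement (8); matrices are plain lists of rows and the function names are hypothetical.

```python
import math

def transpose(M):
    return [list(col) for col in zip(*M)]

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def soft_threshold(v, t):
    """Elementwise proximal operator of t * ||.||_1."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]

def sparse_code(x, D, lam, iters=500):
    """ISTA for formula (5): argmin_a 0.5 * ||x - D a||^2 + lam * ||a||_1.
    D is stored as rows (feature_dim x n_atoms)."""
    Dt = transpose(D)
    step = 1.0 / sum(d * d for row in D for d in row)  # 1/||D||_F^2 never exceeds 1/L
    a = [0.0] * len(D[0])
    for _ in range(iters):
        residual = [p - xi for p, xi in zip(matvec(D, a), x)]  # D a - x
        grad = matvec(Dt, residual)                             # D^T (D a - x)
        a = soft_threshold([ai - step * gi for ai, gi in zip(a, grad)], step * lam)
    return a

def translate_query(x_q, D_m, W_m, D_n, lam):
    """Code the query in modality m, map the code with W^(m), reconstruct in modality n."""
    alpha_q = sparse_code(x_q, D_m, lam)  # formula (5)
    alpha_r = matvec(W_m, alpha_q)        # formula (6)
    x_r = matvec(D_n, alpha_r)            # formula (7)
    return alpha_q, alpha_r, x_r
```

With an identity dictionary the solution of (5) is plain soft-thresholding of the query, so `sparse_code([1.0, 0.2], [[1.0, 0.0], [0.0, 1.0]], 0.5)` converges to `[0.5, 0.0]`.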
Step 4) comprises:
1) Evaluate cross-modal retrieval with correspondence information. This focuses on a query item and the item of the other modality that truly corresponds to it: the quality of the result is judged by the position of that corresponding target-modality item in the result list. For a given t% threshold, the retrieval is considered correct if the corresponding item ranks within the top t% of the list, and incorrect otherwise;
2) Evaluate cross-modal retrieval with discriminative information. This focuses on a query item and the target-modality items that share its label: a returned item is counted as relevant if it has the same label as the query and as irrelevant otherwise, and the retrieved list is scored with the mean average precision (MAP) used in information retrieval. For one cross-modal query and a returned list of length $R$, MAP is defined on the basis of the average precision (AP), which is defined as:
$$\mathrm{AP}=\frac{1}{L}\sum_{r=1}^{R}\mathrm{prec}(r)\,\delta(r) \tag{10}$$
where $L$ is the number of items in the returned list that are relevant to the query, $\mathrm{prec}(r)$ is the fraction of relevant items among the top $r$ returned items, and $\delta(r)=1$ if the $r$-th item is relevant to the query and $\delta(r)=0$ otherwise. MAP is the mean of the AP values over all queries.
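Formula (10) and the MAP average can be computed directly from the 0/1 relevance flags $\delta(r)$ of each returned list. In this sketch the convention of scoring a query with no relevant item ($L=0$) as 0 is an assumption, and the function names are hypothetical.

```python
def relevance_from_labels(query_label, ranked_labels):
    """delta(r) for r = 1..R: 1 if the r-th returned item shares the query's label, else 0."""
    return [1 if label == query_label else 0 for label in ranked_labels]

def average_precision(relevance):
    """Formula (10): AP = (1/L) * sum_r prec(r) * delta(r) over a list of 0/1 flags."""
    L = sum(relevance)
    if L == 0:
        return 0.0  # assumed convention when the list contains no relevant item
    hits, total = 0, 0.0
    for r, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / r  # prec(r) at a relevant position; terms with delta(r) = 0 vanish
    return total / L

def mean_average_precision(relevance_lists):
    """MAP: the mean AP over all queries."""
    return sum(average_precision(rel) for rel in relevance_lists) / len(relevance_lists)
```

For instance, `relevance_from_labels("sport", ["sport", "history", "sport", "geography"])` yields `[1, 0, 1, 0]`, whose AP is (1 + 2/3) / 2, about 0.833.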
The invention has the following beneficial effects. Compared with traditional cross-media retrieval methods, it compares the similarity between different modalities directly, instead of computing similarity within one modality and propagating it to data of other modalities through correspondence relations. Comparing similarity across modalities directly makes it possible to mine the latent associations between cross-media data and to serve the user's retrieval intent directly. Compared with other cross-media retrieval algorithms that also measure cross-modal similarity directly, the invention makes the measurement more robust to noise interference and better able to represent loosely related cross-media data, so that retrieval is more effective and the results are semantically more relevant to the query.
Description of drawings
Fig. 1 is a schematic diagram of the cross-modal retrieval method capable of directly measuring the similarity between data of different modalities;
Fig. 2 shows examples of corresponding pictures and texts in the database of the embodiment;
Fig. 3 shows examples of image-to-text and text-to-image retrieval with the present invention; the top four results are listed for each query. The upper part shows image-to-text retrieval; to present the results more clearly, each retrieved text is represented by the real picture corresponding to it. The lower part shows text-to-image retrieval. Each example compares the retrieval results of the present invention (named SliM²) with those of GMA, another cross-media retrieval method that directly measures the similarity between different modalities.
Detailed description of the invention
The cross-modal retrieval method capable of directly measuring the similarity between data of different modalities comprises the following steps:
1) Extract features from, and record labels for, the data of each modality in the database;
2) Using the correspondence information and label information between data of different modalities in the database, express the differences and similarities between paired data of different modalities from a reconstruction perspective, use the label information to build the overall cross-modal retrieval model, and learn the model parameters;
3) For query data of a known modality submitted by the user, extract features and use the cross-media retrieval model to return the corresponding data of the other modalities requested by the user;
4) Using the true correspondence information and the label information of the cross-modal data, evaluate the cross-media retrieval model from two aspects: correspondence information and discriminative information.
Step 1) comprises:
1) Extract SIFT features from all image data in the database, cluster them with the k-means method to form visual words, and then normalize the features so that the feature vector representing each image is a unit vector;
2) Perform part-of-speech tagging on all text data in the database, remove non-noun words and keep the nouns in each text, build a vocabulary from the words that occur in the whole database, count for each text the number of occurrences of each vocabulary word, vectorize each text by its term frequencies, and then normalize the feature vector so that the vector representing each text is a unit vector;
3) For data of other modalities in the database, extract commonly used standard features and normalize them so that the feature vector representing each item is a unit vector;
4) For the corresponding data of different modalities in the database, collect their label information, i.e., record which category each item belongs to.
Step 2) comprises:
1) Introduce dictionary learning into cross-media retrieval to obtain a cross-modal retrieval algorithm based on dictionary learning: the data of each modality are reconstructed from a dictionary and sparse coefficients, the per-modality dictionaries encode the differences between modalities, and the similarity between data of different modalities is modeled by association matrices between the sparse coefficients; the dictionaries, sparse coefficients and association matrices are all learned from the data of each modality;
2) Use label information in cross-modal retrieval: during dictionary learning, data of the same modality that belong to the same label share the same dictionary atoms, i.e., the same dictionary columns receive non-zero coefficients, so that label information is encoded in the dictionary learning and the learned dictionaries carry discriminative information;
3) Unify the dictionaries, sparse coefficients, association matrices and label information in the dictionary-learning-based multi-modal retrieval framework of expression (1), in which corresponding data of different modalities are represented and learned as a whole:
$$\min_{D,A,W}\ \sum_{m=1}^{M}\big\|X^{(m)}-D^{(m)}A^{(m)}\big\|_F^2+\sum_{m=1}^{M}\sum_{l=1}^{J}\lambda_m\big\|A^{(m)}_{:,\Omega_l}\big\|_{1,2}+\beta\sum_{m=1}^{M}\sum_{n\neq m}\big\|A^{(n)}-W^{(m)}A^{(m)}\big\|_F^2+\gamma\sum_{m=1}^{M}\big\|W^{(m)}\big\|_F^2\quad\text{s.t. }\ \big\|d_k^{(m)}\big\|\le 1,\ \forall k,\ \forall m \tag{1}$$
where $M$ is the number of modalities; $J$ is the number of categories, i.e., of labels; $X^{(m)}$ is the feature data of the $m$-th modality; $D^{(m)}$ is the dictionary of the $m$-th modality; $A^{(m)}$ is the sparse coefficient matrix of the $m$-th modality; $A^{(m)}_{:,\Omega_l}$ denotes the sparse coefficients of those data of the $m$-th modality that carry label $l$; for any $k \times n$ matrix $A$, $\|A\|_{1,2}=\sum_{i=1}^{k}\|A_{i,:}\|_2$; $W^{(m)}$ is the association matrix of the $m$-th modality, which maps $A^{(m)}$ onto the coefficients $A^{(n)}$ of the other modalities; $\lambda_m$ ($m=1,\dots,M$), $\beta$ and $\gamma$ are adjustable parameters that set the weight of each term in the expression; and $d_k^{(m)}$ is a dictionary element of $D^{(m)}$, i.e., its $k$-th column;
4) Update the sparse coefficients, dictionaries and association matrices alternately: first fix the dictionaries and association matrices and update the sparse coefficients; then update the dictionaries using the new sparse coefficients, with the association matrices fixed; then update the association matrices using the updated sparse coefficients and dictionaries; repeat this cycle until the convergence condition is met. The specific steps are as follows:
(1) First, fix the dictionaries and association matrices and update the sparse coefficients:
$$\min_{A}\ \sum_{m=1}^{M}\big\|X^{(m)}-D^{(m)}A^{(m)}\big\|_F^2+\sum_{m=1}^{M}\sum_{l=1}^{J}\lambda_m\big\|A^{(m)}_{:,\Omega_l}\big\|_{1,2}+\beta\sum_{m=1}^{M}\sum_{n\neq m}\big\|A^{(n)}-W^{(m)}A^{(m)}\big\|_F^2 \tag{2}$$
(2) After the sparse coefficients are obtained, update the dictionary of each modality:
$$\min_{D}\ \sum_{m=1}^{M}\big\|X^{(m)}-D^{(m)}A^{(m)}\big\|_F^2\quad\text{s.t. }\ \big\|d_k^{(m)}\big\|\le 1,\ \forall k,\ \forall m \tag{3}$$
(3) Finally, update the association matrices:
$$\min_{W}\ \sum_{m=1}^{M}\sum_{n\neq m}\big\|A^{(n)}-W^{(m)}A^{(m)}\big\|_F^2+\frac{\gamma}{\beta}\sum_{m=1}^{M}\big\|W^{(m)}\big\|_F^2 \tag{4}$$
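With two modalities ($M=2$, as in the embodiment), each $W^{(m)}$ appears in only one data term of (4), so the update splits into two independent ridge regressions. Setting the gradient to zero gives the closed form $W^{(m)}=A^{(n)}A^{(m)\top}\big(A^{(m)}A^{(m)\top}+(\gamma/\beta)I\big)^{-1}$. The patent does not state how (4) is solved, so this closed form is a derivation offered for illustration rather than the authors' procedure; the pure-Python helpers (hypothetical names) avoid dependencies and would normally be replaced by a linear-algebra library.

```python
def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    """Matrix product X Y for list-of-rows matrices."""
    Y_cols = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in Y_cols] for row in X]

def inverse(S):
    """Gauss-Jordan inverse with partial pivoting (S must be non-singular)."""
    n = len(S)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(S)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def update_W(A_m, A_n, gamma_over_beta):
    """Closed-form minimizer of ||A_n - W A_m||_F^2 + (gamma/beta) ||W||_F^2 (formula (4), M = 2)."""
    S = matmul(A_m, transpose(A_m))  # A^(m) A^(m)T, size k_m x k_m
    for i in range(len(S)):
        S[i][i] += gamma_over_beta   # + (gamma/beta) I; positive definite when gamma/beta > 0
    return matmul(matmul(A_n, transpose(A_m)), inverse(S))
```

As a check, `update_W([[1, 0], [0, 1]], [[2, 0], [0, 4]], 1.0)` returns `[[1.0, 0.0], [0.0, 2.0]]`: with identity source coefficients the map is the target coefficients shrunk by the factor 1 / (1 + γ/β).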
Step 3) comprises:
1) From the query data $x_q^{(m)}$ of known modality $m$ submitted by the user and the learned dictionary $D^{(m)}$ of that modality, initialize the sparse coefficients $\alpha_q^{(m)}$ of the query data:
$$\alpha_q^{(m)}=\arg\min_{\alpha}\ \frac{1}{2}\big\|x_q^{(m)}-D^{(m)}\alpha\big\|_F^2+\lambda\|\alpha\|_1 \tag{5}$$
where $\lambda$ is a parameter that controls the sparsity of the coefficients;
2) From the initialized coefficients $\alpha_q^{(m)}$ of the query data and the learned association matrix $W^{(m)}$, initialize the sparse coefficients $\alpha_r^{(n)}$ of the data in the target modality $n$ requested by the user:
$$\alpha_r^{(n)}=W^{(m)}\alpha_q^{(m)} \tag{6}$$
3) From the target-modality coefficients $\alpha_r^{(n)}$ and the learned target-modality dictionary $D^{(n)}$, initialize the target-modality data $x_r^{(n)}$:
$$x_r^{(n)}=D^{(n)}\alpha_r^{(n)} \tag{7}$$
4) Using the query data, the learned model and the initialization above, update the sparse coefficients of the query modality and of the target modality:
$$\min_{\alpha_q^{(m)},\,\alpha_r^{(n)}}\ \big\|x_q^{(m)}-D^{(m)}\alpha_q^{(m)}\big\|_F^2+\big\|x_r^{(n)}-D^{(n)}\alpha_r^{(n)}\big\|_F^2+\beta\big\|\alpha_r^{(n)}-W^{(m)}\alpha_q^{(m)}\big\|_F^2+\lambda_m\big\|\alpha_q^{(m)}\big\|_1+\lambda_n\big\|\alpha_r^{(n)}\big\|_1 \tag{8}$$
where $\beta$, $\lambda_m$ and $\lambda_n$ are adjustable parameters corresponding to those in formula (1).
5) From the updated target-modality coefficients and the target-modality dictionary, determine the final target-modality data:
$$x_r^{(n)}=D^{(n)}\alpha_r^{(n)} \tag{9}$$
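The joint update (8) can be approached with proximal gradient steps on the pair $(\alpha_q^{(m)}, \alpha_r^{(n)})$: a gradient step on the three quadratic terms followed by soft-thresholding for the two $\ell_1$ terms. This is an assumed solver, not one stated in the patent. The sketch also assumes that $x_r^{(n)}$ stays fixed at its initialization (7) during the update, and that the step size is at most the reciprocal of the Lipschitz constant of the quadratic part, which guarantees that a step does not increase the objective. Function names are hypothetical.

```python
import math

def transpose(M):
    return [list(col) for col in zip(*M)]

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def soft_threshold(v, t):
    """Elementwise proximal operator of t * ||.||_1."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]

def _diff(u, v):
    return [a - b for a, b in zip(u, v)]

def objective_8(x_q, x_r, a_q, a_r, D_m, D_n, W_m, beta, lam_m, lam_n):
    """Value of the objective in formula (8)."""
    sq = lambda v: sum(x * x for x in v)
    return (sq(_diff(x_q, matvec(D_m, a_q))) + sq(_diff(x_r, matvec(D_n, a_r)))
            + beta * sq(_diff(a_r, matvec(W_m, a_q)))
            + lam_m * sum(abs(x) for x in a_q) + lam_n * sum(abs(x) for x in a_r))

def refine_step(x_q, x_r, a_q, a_r, D_m, D_n, W_m, beta, lam_m, lam_n, step):
    """One proximal-gradient step on formula (8), updating alpha_q and alpha_r jointly."""
    r_q = _diff(matvec(D_m, a_q), x_q)  # D^(m) a_q - x_q
    r_r = _diff(matvec(D_n, a_r), x_r)  # D^(n) a_r - x_r
    r_c = _diff(a_r, matvec(W_m, a_q))  # a_r - W^(m) a_q
    g_q = [2 * u - 2 * beta * w
           for u, w in zip(matvec(transpose(D_m), r_q), matvec(transpose(W_m), r_c))]
    g_r = [2 * u + 2 * beta * c for u, c in zip(matvec(transpose(D_n), r_r), r_c)]
    new_q = soft_threshold([a - step * g for a, g in zip(a_q, g_q)], step * lam_m)
    new_r = soft_threshold([a - step * g for a, g in zip(a_r, g_r)], step * lam_n)
    return new_q, new_r
```

Repeating `refine_step` until `objective_8` stops decreasing, then applying (9) to the final coefficients, would give the returned target-modality data under these assumptions.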
Step 4) comprises:
1) Evaluate cross-modal retrieval with correspondence information. This focuses on a query item and the item of the other modality that truly corresponds to it: the quality of the result is judged by the position of that corresponding target-modality item in the result list. For a given t% threshold, the retrieval is considered correct if the corresponding item ranks within the top t% of the list, and incorrect otherwise;
2) Evaluate cross-modal retrieval with discriminative information. This focuses on a query item and the target-modality items that share its label: a returned item is counted as relevant if it has the same label as the query and as irrelevant otherwise, and the retrieved list is scored with the mean average precision (MAP) used in information retrieval. For one cross-modal query and a returned list of length $R$, MAP is defined on the basis of the average precision (AP), which is defined as:
$$\mathrm{AP}=\frac{1}{L}\sum_{r=1}^{R}\mathrm{prec}(r)\,\delta(r) \tag{10}$$
where $L$ is the number of items in the returned list that are relevant to the query, $\mathrm{prec}(r)$ is the fraction of relevant items among the top $r$ returned items, and $\delta(r)=1$ if the $r$-th item is relevant to the query and $\delta(r)=0$ otherwise. MAP is the mean of the AP values over all queries.
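The correspondence criterion in item 1) reduces to checking whether the true counterpart of each query falls inside the top t% of its ranked list. Rounding the cutoff up and always keeping at least one position are assumptions of this sketch, as are the hypothetical function names.

```python
import math

def top_t_percent_hit(ranked_ids, true_id, t):
    """True if the truly corresponding item lies within the top t% of the ranked list."""
    cutoff = max(1, math.ceil(len(ranked_ids) * t / 100.0))
    return true_id in ranked_ids[:cutoff]

def top_t_percent_accuracy(results, t):
    """Fraction of queries judged correct; results holds (ranked_ids, true_id) pairs."""
    hits = sum(1 for ranked, truth in results if top_t_percent_hit(ranked, truth, t))
    return hits / len(results)
```

With a list of ten candidates and t = 20, only the first two positions count, so a true counterpart ranked second is a hit and one ranked third is a miss.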
Embodiment
Assume we have 2173 pairs of text and image data with known correspondence, plus 693 texts and 693 images with unknown correspondence; examples of pictures and texts are shown in Fig. 2. First, SIFT features are extracted from all image data in the database and clustered with the k-means method into visual words, and the features are normalized so that the feature vector representing each image is a unit vector. At the same time, all text data in the database are part-of-speech tagged, non-noun words are removed and the nouns in each text are kept, a vocabulary is built from the words occurring in the whole database, the occurrences of each vocabulary word are counted for each text, each text is vectorized by its term frequencies, and the feature vectors are normalized so that the vector representing each text is a unit vector.
The 2173 pairs of matched data (features) are arranged in matrix form, with the notation of expression (1): $M$ is the number of modalities; $J$ is the number of categories, i.e., of labels; $X^{(m)}$, $D^{(m)}$ and $A^{(m)}$ are the feature data, dictionary and sparse coefficients of the $m$-th modality; $A^{(m)}_{:,\Omega_l}$ denotes the sparse coefficients of the $m$-th-modality data that carry label $l$; for any $k \times n$ matrix $A$, $\|A\|_{1,2}=\sum_{i=1}^{k}\|A_{i,:}\|_2$; $W^{(m)}$ is the association matrix of the $m$-th modality; $\lambda_m$ ($m=1,\dots,M$), $\beta$ and $\gamma$ are adjustable parameters that set the weight of each term in the expression; and $d_k^{(m)}$ is the $k$-th column (dictionary element) of $D^{(m)}$. Here there are two modalities, text and image, so $M=2$, and the text and image features form $X^{(1)}$ and $X^{(2)}$ respectively.
The following steps are then performed:
1) First, fix the dictionaries and association matrices and update the sparse coefficients:
$$\min_{A}\ \sum_{m=1}^{M}\big\|X^{(m)}-D^{(m)}A^{(m)}\big\|_F^2+\sum_{m=1}^{M}\sum_{l=1}^{J}\lambda_m\big\|A^{(m)}_{:,\Omega_l}\big\|_{1,2}+\beta\sum_{m=1}^{M}\sum_{n\neq m}\big\|A^{(n)}-W^{(m)}A^{(m)}\big\|_F^2 \tag{2}$$
2) After the sparse coefficients are obtained, update the dictionary of each modality:
$$\min_{D}\ \sum_{m=1}^{M}\big\|X^{(m)}-D^{(m)}A^{(m)}\big\|_F^2\quad\text{s.t. }\ \big\|d_k^{(m)}\big\|\le 1,\ \forall k,\ \forall m \tag{3}$$
3) Finally, update the association matrices:
$$\min_{W}\ \sum_{m=1}^{M}\sum_{n\neq m}\big\|A^{(n)}-W^{(m)}A^{(m)}\big\|_F^2+\frac{\gamma}{\beta}\sum_{m=1}^{M}\big\|W^{(m)}\big\|_F^2 \tag{4}$$
Learning thus yields $D=\{D^{(1)},D^{(2)},\dots,D^{(M)}\}$ and $W=\{W^{(1)},W^{(2)},\dots,W^{(M)}\}$. Next comes the retrieval stage, in which any one of the 693 × 2 texts and images with unknown correspondence can be used as a query, and the corresponding text or image is returned. The specific steps are as follows:
Assume the user submits a query $x_q^{(m)}$ of the known image or text modality, where $m=1$ or $2$.
1) From the query data $x_q^{(m)}$ of known modality $m$ submitted by the user and the learned dictionary $D^{(m)}$ of that modality, initialize the sparse coefficients $\alpha_q^{(m)}$ of the query data:
$$\alpha_q^{(m)}=\arg\min_{\alpha}\ \frac{1}{2}\big\|x_q^{(m)}-D^{(m)}\alpha\big\|_F^2+\lambda\|\alpha\|_1 \tag{5}$$
where $\lambda$ is a parameter that controls the sparsity of the coefficients;
2) From the initialized coefficients $\alpha_q^{(m)}$ of the query data and the learned association matrix $W^{(m)}$, initialize the sparse coefficients $\alpha_r^{(n)}$ of the data in the target modality $n$ requested by the user:
$$\alpha_r^{(n)}=W^{(m)}\alpha_q^{(m)} \tag{6}$$
3) From the target-modality coefficients $\alpha_r^{(n)}$ and the learned target-modality dictionary $D^{(n)}$, initialize the target-modality data $x_r^{(n)}$:
$$x_r^{(n)}=D^{(n)}\alpha_r^{(n)} \tag{7}$$
4) Using the query data, the learned model and the initialization above, update the sparse coefficients of the query modality and of the target modality:
$$\min_{\alpha_q^{(m)},\,\alpha_r^{(n)}}\ \big\|x_q^{(m)}-D^{(m)}\alpha_q^{(m)}\big\|_F^2+\big\|x_r^{(n)}-D^{(n)}\alpha_r^{(n)}\big\|_F^2+\beta\big\|\alpha_r^{(n)}-W^{(m)}\alpha_q^{(m)}\big\|_F^2+\lambda_m\big\|\alpha_q^{(m)}\big\|_1+\lambda_n\big\|\alpha_r^{(n)}\big\|_1 \tag{8}$$
where $\beta$, $\lambda_m$ and $\lambda_n$ are adjustable parameters corresponding to those in formula (1).
5) From the updated target-modality coefficients and the target-modality dictionary, determine the final target-modality data:
$$x_r^{(n)}=D^{(n)}\alpha_r^{(n)} \tag{9}$$
6) Rank the list of target-modality candidates according to the target-modality data and return the sorted results.
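Step 6) leaves the ranking criterion open. A minimal sketch, assuming Euclidean distance between the reconstructed $x_r^{(n)}$ and each candidate's feature vector (cosine similarity would be an equally plausible choice given the unit-normalized features); the function name is hypothetical.

```python
import math

def rank_candidates(x_r, candidates):
    """Order candidate ids by the Euclidean distance between their feature vectors and x_r.
    candidates maps each target-modality item id to its feature vector."""
    def distance(vec):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_r, vec)))
    return sorted(candidates, key=lambda cid: distance(candidates[cid]))
```

For example, `rank_candidates([1.0, 0.0], {"a": [0.0, 1.0], "b": [0.9, 0.1], "c": [1.0, 1.0]})` returns `["b", "c", "a"]`.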
Fig. 3 shows concrete cross-media retrieval examples, image-to-text retrieval (top) and text-to-image retrieval (bottom), and compares the results of the present invention (named SliM²) with those of GMA, another cross-media retrieval method that directly measures the similarity between different modalities. For the image-to-text example (top), to present the results more intuitively, each retrieved text is represented by its corresponding real picture. The query picture belongs to the sports category, and the results of both methods also come from the sports category; however, the present invention ranks the text that truly corresponds to the query picture (shown by that text's picture) first, and its remaining results (also shown by their corresponding pictures) are more relevant to the query picture in both semantics and content. For the text-to-image example, an excerpt of the query text is shown; the text is mainly about parks and trails and belongs to the geography category. The results retrieved by the present invention belong to the same category as the query text and are also closely related in content, whereas in the results of the other method the first- and fourth-ranked pictures come from the history category, and its results as a whole are less relevant to the query text than those of the present invention.
From the example above it can be seen that be different from traditional method, the present invention can directly carry out between different modalities similar Property tolerance, thus realize cross-module state retrieval, even and if compared with the method the most directly comparing different modalities similarity, The method of the present invention has more preferable retrieval effectiveness.

Claims (4)

1. A cross-modal retrieval method capable of directly measuring the similarity between data of different modalities, characterized by comprising the following steps:
1) extracting features from the data of each modality in the database and recording their labels;
2) using the correspondence between data of different modalities in the database and the label information, expressing the differences and similarities between data of different modalities from a reconstruction perspective, and, making use of the label information, building an overall cross-modal retrieval model and learning its parameters;
3) for known-modality data submitted by a user, extracting its features and using the overall cross-modal retrieval model to return the corresponding data of the other modalities requested by the user;
4) using the ground-truth correspondence between cross-modal data and the label information, evaluating the overall cross-modal retrieval model from two aspects: correspondence information and discriminative information;
wherein step 1) specifically comprises:
1) extracting SIFT features from all image-modality data in the database, clustering them with k-means to form visual words, and then normalizing the features so that the feature vector representing each image is a unit vector;
2) performing part-of-speech tagging on all text-modality data in the database, removing non-noun words and keeping the nouns of each text, building a dictionary from the words that occur across the whole database, counting for each text the occurrences of each dictionary word, vectorizing the text by these word frequencies, and then normalizing the feature vector so that the vector representing each text is a unit vector;
3) for data of other modalities in the database, extracting commonly used industry-standard features and normalizing them so that the feature vector representing each data item is a unit vector;
4) for corresponding data of different modalities in the database, collecting their label information, i.e., recording which category each item belongs to.
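The feature-extraction steps for images and texts can be sketched in NumPy. This is an illustration under stated assumptions, not the patent's implementation:

- SIFT descriptors are assumed to have been extracted already by some SIFT implementation.
- Nouns are assumed to come from a separate part-of-speech tagger.
- The visual-word codebook is built with plain Lloyd's k-means.

All function names are hypothetical.

```python
from collections import Counter

import numpy as np

def l2_normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def kmeans(descriptors, k, iters=20, seed=0):
    """Plain Lloyd's k-means over local descriptors; returns a (k, dim) visual-word codebook."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), size=k, replace=False)].astype(float)
    for _ in range(iters):
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # keep the old center if a cluster becomes empty
                centers[j] = members.mean(axis=0)
    return centers

def bovw_vector(image_descriptors, codebook):
    """Assign each descriptor of one image to its nearest visual word; return a unit-length histogram."""
    d2 = ((image_descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    hist = np.bincount(d2.argmin(axis=1), minlength=len(codebook)).astype(float)
    return l2_normalize(hist)

def tf_vector(nouns, vocabulary):
    """Count each vocabulary noun in one text; return a unit-length term-frequency vector."""
    counts = Counter(nouns)
    return l2_normalize(np.array([counts[w] for w in vocabulary], dtype=float))
```

For instance, `tf_vector(["park", "trail", "park"], ["park", "trail", "game"])` is `[2, 1, 0] / sqrt(5)`.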
2. The cross-modal retrieval method capable of directly measuring the similarity between data of different modalities according to claim 1, characterized in that step 2) comprises:
1) introducing dictionary learning into cross-modal retrieval to form a dictionary-learning-based cross-modal retrieval algorithm. The data of each modality are reconstructed from a dictionary and sparse coefficients, and the dictionaries encode the distinct characteristics of the different modalities. The similarity between data of different modalities is modeled by association matrices between the sparse coefficients. The dictionaries, sparse coefficients and association matrices are all learned from the data of the modalities;
2) using label information in cross-modal retrieval: during dictionary learning, data of the same modality that carry the same label share the same dictionary atoms, i.e., the same dictionary columns with nonzero coefficients, so that label information is encoded in the learning process and the learned dictionaries carry discriminative information;
3) unifying the dictionaries, sparse coefficients, association matrices and label information in the dictionary-learning-based multi-modal retrieval framework of formula (1), in which the corresponding data of the different modalities are represented and learned as a whole:
$$\min \sum_{m=1}^{M}\|X^{(m)}-D^{(m)}A^{(m)}\|_F^2+\sum_{m=1}^{M}\sum_{l=1}^{J}\lambda_m\|A^{(m)}_{:,\Omega_l}\|_{1,2}+\beta\sum_{m=1}^{M}\sum_{n\neq m}\|A^{(n)}-W^{(m)}A^{(m)}\|_F^2+\gamma\sum_{m=1}^{M}\|W^{(m)}\|_F^2 \quad \text{s.t. } \|d_k^{(m)}\|\le 1,\ \forall k,\ \forall m \tag{1}$$
Here:

- $M$ is the number of modalities, and $J$ is the number of categories, i.e., labels.
- $X^{(m)}$ is the feature matrix of modality $m$, $D^{(m)}$ is its dictionary, and $A^{(m)}$ is its sparse coefficient matrix.
- $A^{(m)}_{:,\Omega_l}$ is the submatrix of sparse coefficients for the data of modality $m$ that carry label $l$.
- For any $k \times n$ matrix $A$, $\|A\|_{1,2}=\sum_{i=1}^{k}\|A_{i,:}\|_2$, the sum of the $\ell_2$ norms of its rows.
- $W^{(m)}$ is the association matrix of modality $m$.
- $\lambda_m$ ($m=1,\dots,M$), $\beta$ and $\gamma$ are adjustable parameters that set the weight of each term in the objective.
- $d_k^{(m)}$ is a dictionary atom of $D^{(m)}$, i.e., its $k$-th column.
4) alternately updating the sparse coefficients, the dictionaries and the association matrices until a convergence condition is met. The dictionaries and association matrices are first fixed while the sparse coefficients are updated. The dictionaries are then updated from the obtained sparse coefficients with the association matrices fixed. Finally, the association matrices are updated from the updated sparse coefficients and dictionaries. The specific steps are as follows:
(1) First, with the dictionaries and association matrices fixed, update the sparse coefficients by:
$$\min_{A}\sum_{m=1}^{M}\|X^{(m)}-D^{(m)}A^{(m)}\|_F^2+\sum_{m=1}^{M}\sum_{l=1}^{J}\lambda_m\|A^{(m)}_{:,\Omega_l}\|_{1,2}+\beta\sum_{m=1}^{M}\sum_{n\neq m}\|A^{(n)}-W^{(m)}A^{(m)}\|_F^2 \tag{2}$$
(2) After the sparse coefficients are obtained, update the dictionary of each modality by:
$$\min_{D}\sum_{m=1}^{M}\|X^{(m)}-D^{(m)}A^{(m)}\|_F^2 \quad \text{s.t. } \|d_k^{(m)}\|\le 1,\ \forall k,\ \forall m \tag{3}$$
(3) Finally, update the association matrices by:
$$\min_{W}\sum_{m=1}^{M}\sum_{n\neq m}\|A^{(n)}-W^{(m)}A^{(m)}\|_F^2+\frac{\gamma}{\beta}\sum_{m=1}^{M}\|W^{(m)}\|_F^2 \tag{4}$$
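The alternating scheme of step 4) can be sketched in NumPy. The text above does not state the following choices, so they are assumptions:

- All modalities use the same number of atoms $k$, so that one $W^{(m)}$ can map $A^{(m)}$ to every other $A^{(n)}$.
- Subproblem (2) is solved by block proximal-gradient steps. The proximal operator of the $\ell_{1,2}$ term is row-wise group soft-thresholding within each label block.
- Subproblem (3) is solved by projected gradient descent.
- Subproblem (4) is solved in closed form, $W^{(m)}=\big(\sum_{n\neq m}A^{(n)}\big)A^{(m)\top}\big((M-1)A^{(m)}A^{(m)\top}+(\gamma/\beta)I\big)^{-1}$, which comes from setting the gradient of (4) to zero.

Function names, step counts and the random initialization are hypothetical.

```python
import numpy as np

def group_prox(A, labels, thresh):
    """Prox of thresh * sum_l ||A[:, Omega_l]||_{1,2}: shrink each row of each label block toward 0."""
    out = A.copy()
    for l in np.unique(labels):
        cols = labels == l
        block = A[:, cols]
        norms = np.linalg.norm(block, axis=1, keepdims=True)
        out[:, cols] = np.maximum(0.0, 1.0 - thresh / np.maximum(norms, 1e-12)) * block
    return out

def objective(X, D, A, W, labels, lam, beta, gamma):
    """Value of objective (1), with gamma weighting the ||W||_F^2 term as implied by (4)."""
    M, val = len(X), 0.0
    for m in range(M):
        val += np.linalg.norm(X[m] - D[m] @ A[m]) ** 2
        val += lam[m] * sum(np.linalg.norm(A[m][:, labels == l], axis=1).sum() for l in np.unique(labels))
        val += beta * sum(np.linalg.norm(A[n] - W[m] @ A[m]) ** 2 for n in range(M) if n != m)
        val += gamma * np.linalg.norm(W[m]) ** 2
    return val

def update_A(X, D, A, W, labels, lam, beta, steps=50):
    """Subproblem (2): block proximal-gradient steps on each A[m] in turn."""
    M = len(X)
    for _ in range(steps):
        for m in range(M):
            g = -2 * D[m].T @ (X[m] - D[m] @ A[m])
            for n in range(M):
                if n != m:
                    g -= 2 * beta * W[m].T @ (A[n] - W[m] @ A[m])  # A[m] is mapped by W[m]
                    g += 2 * beta * (A[m] - W[n] @ A[n])           # A[m] is the target of W[n]
            L = 2 * (np.linalg.norm(D[m], 2) ** 2 + (M - 1) * beta * (np.linalg.norm(W[m], 2) ** 2 + 1))
            A[m] = group_prox(A[m] - g / L, labels, lam[m] / L)
    return A

def update_D(X, D, A, steps=50):
    """Subproblem (3): projected gradient; columns with norm > 1 are scaled back to norm 1."""
    for m in range(len(X)):
        L = 2 * np.linalg.norm(A[m], 2) ** 2 + 1e-12
        for _ in range(steps):
            Dm = D[m] + 2 * (X[m] - D[m] @ A[m]) @ A[m].T / L
            D[m] = Dm / np.maximum(1.0, np.linalg.norm(Dm, axis=0, keepdims=True))
    return D

def update_W(A, beta, gamma):
    """Subproblem (4) in closed form: W[m] ((M-1) A_m A_m^T + (gamma/beta) I) = (sum_{n!=m} A_n) A_m^T."""
    M, W = len(A), []
    for m in range(M):
        target = sum(A[n] for n in range(M) if n != m)
        lhs = (M - 1) * A[m] @ A[m].T + (gamma / beta) * np.eye(A[m].shape[0])
        W.append(np.linalg.solve(lhs, A[m] @ target.T).T)  # lhs is symmetric
    return W

def fit(X, labels, k, lam, beta, gamma, outer=10, seed=0):
    """Alternate (2) -> (3) -> (4); X[m] is (d_m, N) with paired columns across modalities."""
    rng = np.random.default_rng(seed)
    N = X[0].shape[1]
    D = [rng.standard_normal((x.shape[0], k)) for x in X]
    D = [d / np.linalg.norm(d, axis=0, keepdims=True) for d in D]
    A = [np.zeros((k, N)) for _ in X]
    W = [np.eye(k) for _ in X]
    history = []
    for _ in range(outer):
        A = update_A(X, D, A, W, labels, lam, beta)
        D = update_D(X, D, A)
        W = update_W(A, beta, gamma)
        history.append(objective(X, D, A, W, labels, lam, beta, gamma))
    return D, A, W, history
```

No block update increases the objective, so `history` should be non-increasing. This is a useful sanity check when tuning $\lambda_m$, $\beta$ and $\gamma$.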
3. The cross-modal retrieval method capable of directly measuring the similarity between data of different modalities according to claim 2, characterized in that step 3) comprises:
1) from the modality-$m$ query data $x_q^{(m)}$ submitted by the user and the learned dictionary $D^{(m)}$ of the known modality, initializing the sparse coefficient $\alpha_q^{(m)}$ of the query as follows:
$$\alpha_q^{(m)}=\arg\min_{\alpha_q}\ \frac{1}{2}\|x_q^{(m)}-D^{(m)}\alpha_q\|_F^2+\lambda\|\alpha_q\|_1 \tag{5}$$
where $\lambda$ is a parameter that controls the sparsity of the coefficient;
2) from the initialized sparse coefficient $\alpha_q^{(m)}$ of the query and the learned association matrix $W^{(m)}$, initializing the sparse coefficient $\alpha_r^{(n)}$ of the requested-modality data as follows:
$$\alpha_r^{(n)}=W^{(m)}\alpha_q^{(m)} \tag{6}$$
3) from the sparse coefficient $\alpha_r^{(n)}$ of the requested modality and the learned requested-modality dictionary $D^{(n)}$, initializing the requested-modality data $x_r^{(n)}$ as follows:
$$x_r^{(n)}=D^{(n)}\alpha_r^{(n)} \tag{7}$$
4) using the known-modality data, the learned quantities and the initializations above, updating the sparse coefficients of the known modality and of the requested modality as follows:
$$\min_{\alpha_q^{(m)},\,\alpha_r^{(n)}}\ \|x_q^{(m)}-D^{(m)}\alpha_q^{(m)}\|_F^2+\|x_r^{(n)}-D^{(n)}\alpha_r^{(n)}\|_F^2+\beta\|\alpha_r^{(n)}-W^{(m)}\alpha_q^{(m)}\|_F^2+\lambda_m\|\alpha_q^{(m)}\|_1+\lambda_n\|\alpha_r^{(n)}\|_1 \tag{8}$$
where $\beta$, $\lambda_m$ and $\lambda_n$ are adjustable parameters corresponding to those in formula (1);
5) from the updated sparse coefficient of the requested modality and the requested-modality dictionary, finally determining the requested-modality data as follows:
$$x_r^{(n)}=D^{(n)}\alpha_r^{(n)} \tag{9}$$
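Steps 1) to 5) can be sketched with ISTA (iterative soft-thresholding) as the solver for the $\ell_1$ problems (5) and (8). The patent text does not name a solver, so ISTA, the fixed iteration counts and the function names are assumptions. In step 4), $x_r^{(n)}$ from (7) is held fixed while $\alpha_q^{(m)}$ and $\alpha_r^{(n)}$ are updated jointly.

```python
import numpy as np

def soft(v, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(x, D, lam, iters=200):
    """Formula (5): argmin_a 0.5 * ||x - D a||^2 + lam * ||a||_1, solved by ISTA."""
    L = np.linalg.norm(D, 2) ** 2 + 1e-12  # Lipschitz constant of the gradient D^T (D a - x)
    a = np.zeros(D.shape[1])
    for _ in range(iters):
        a = soft(a - D.T @ (D @ a - x) / L, lam / L)
    return a

def cross_modal_query(x_q, D_m, D_n, W_m, lam, lam_m, lam_n, beta, iters=200):
    """Hypothetical query path: modality-m query -> reconstructed modality-n feature, formulas (5)-(9)."""
    a_q = lasso_ista(x_q, D_m, lam)  # step 1), formula (5)
    a_r = W_m @ a_q                   # step 2), formula (6)
    x_r = D_n @ a_r                   # step 3), formula (7); held fixed during step 4)
    # Step 4), formula (8): joint ISTA on (a_q, a_r); L bounds the Lipschitz constant of the smooth part.
    L = (2 * max(np.linalg.norm(D_m, 2), np.linalg.norm(D_n, 2)) ** 2
         + 2 * beta * (np.linalg.norm(W_m, 2) ** 2 + 1) + 1e-12)
    for _ in range(iters):
        c = a_r - W_m @ a_q  # coupling residual
        g_q = -2 * D_m.T @ (x_q - D_m @ a_q) - 2 * beta * W_m.T @ c
        g_r = -2 * D_n.T @ (x_r - D_n @ a_r) + 2 * beta * c
        a_q, a_r = soft(a_q - g_q / L, lam_m / L), soft(a_r - g_r / L, lam_n / L)
    return D_n @ a_r, a_q, a_r  # step 5), formula (9)
```

Consider identity dictionaries and association matrix, with $x_q=[1,0,0]$, $\lambda=\lambda_m=\lambda_n=0.1$ and $\beta=0.5$. Formula (5) gives $\alpha_q=[0.9,0,0]$, and the refinement (8) moves the first entry of the returned $x_r$ from 0.9 to 0.875.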
4. The cross-modal retrieval method capable of directly measuring the similarity between data of different modalities according to claim 1, characterized in that step 4) comprises:
1) evaluating cross-modal retrieval with correspondence information. This criterion considers a known-modality item and its truly corresponding items in other modalities. The result is judged by the position, in the returned list, of the requested-modality item that corresponds to the query. For a given t% criterion, retrieval is considered correct if the corresponding requested-modality item ranks within the top t% of the list, and incorrect otherwise;
2) evaluating cross-modal retrieval with discriminative information. This criterion considers a known-modality item and the requested-modality items that share its label, and judges the returned list accordingly: an item with the same label as the query is relevant, and any other item is irrelevant. Specifically, mean average precision (MAP) from information retrieval is used as the measure. For a cross-modal query and a returned list of length R, MAP is defined in terms of AP, which is given by:
$$\mathrm{AP}=\frac{1}{L}\sum_{r=1}^{R}\mathrm{prec}(r)\,\delta(r) \tag{10}$$
where $L$ is the number of items in the returned list that are relevant to the query, and $\mathrm{prec}(r)$ is the fraction of relevant items among the first $r$ items. $\delta(r)=1$ if the $r$-th item is relevant to the query, and $\delta(r)=0$ otherwise. MAP is the mean of the AP values over all queries.
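Both evaluation criteria of claim 4 can be computed directly. The following is a minimal sketch. Two details are assumptions: the t% cutoff is rounded up, and AP is taken as 0 when a list contains no relevant items. The function names are hypothetical.

```python
import numpy as np

def top_t_percent_correct(ranked_ids, true_id, t):
    """Correspondence criterion: correct if the query's paired item ranks within the top t% of the list."""
    cutoff = int(np.ceil(len(ranked_ids) * t / 100.0))
    return true_id in list(ranked_ids[:cutoff])

def average_precision(relevant):
    """Formula (10): relevant[r-1] is delta(r) for a returned list of length R."""
    rel = np.asarray(relevant, dtype=float)
    L = rel.sum()  # number of relevant items in the list
    if L == 0:
        return 0.0
    prec = np.cumsum(rel) / np.arange(1, len(rel) + 1)  # prec(r) for r = 1..R
    return float((prec * rel).sum() / L)

def mean_average_precision(relevance_lists):
    """MAP: mean of the AP values over all queries."""
    return float(np.mean([average_precision(r) for r in relevance_lists]))
```

For example, the relevance pattern `[1, 0, 1, 0]` gives AP = (1/2)(1 + 2/3) = 5/6.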

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310410553.XA CN103488713B (en) 2013-09-10 2013-09-10 A kind of cross-module state search method that can directly measure similarity between different modalities data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310410553.XA CN103488713B (en) 2013-09-10 2013-09-10 A kind of cross-module state search method that can directly measure similarity between different modalities data

Publications (2)

Publication Number Publication Date
CN103488713A (en) 2014-01-01
CN103488713B (en) 2016-09-28

Family

ID=49828939

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310410553.XA Active CN103488713B (en) 2013-09-10 2013-09-10 A kind of cross-module state search method that can directly measure similarity between different modalities data

Country Status (1)

Country Link
CN (1) CN103488713B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104199826B (en) * 2014-07-24 2017-06-30 北京大学 A kind of dissimilar medium similarity calculation method and search method based on association analysis
CN104166684A (en) * 2014-07-24 2014-11-26 北京大学 Cross-media retrieval method based on uniform sparse representation
CN104317838B (en) * 2014-10-10 2017-05-17 浙江大学 Cross-media Hash index method based on coupling differential dictionary
CN104317837B (en) * 2014-10-10 2017-06-23 浙江大学 A kind of cross-module state search method based on topic model
CN104346450B (en) * 2014-10-29 2017-06-23 浙江大学 A kind of across media sort methods based on multi-modal recessive coupling expression
CN104462489B (en) * 2014-12-18 2018-02-23 北京邮电大学 A kind of cross-module state search method based on Deep model
CN105550190B (en) * 2015-06-26 2019-03-29 许昌学院 Cross-media retrieval system towards knowledge mapping
CN108121750B (en) * 2016-11-30 2022-07-08 西门子公司 Model processing method and device and machine readable medium
CN107633259B (en) * 2017-08-21 2020-03-31 天津大学 Cross-modal learning method based on sparse dictionary representation
CN108038080A (en) * 2017-11-29 2018-05-15 浙江大学 A kind of method that local multi-modal sparse coding completion is carried out using the similar tactical ruleization of adaptability
CN110059217B (en) * 2019-04-29 2022-11-04 广西师范大学 Image text cross-media retrieval method for two-stage network
CN110704708B (en) * 2019-09-27 2023-04-07 深圳市商汤科技有限公司 Data processing method, device, equipment and storage medium
CN111930972B (en) * 2020-08-04 2021-04-27 山东大学 Cross-modal retrieval method and system for multimedia data by using label level information
CN112364197B (en) * 2020-11-12 2021-06-01 四川省人工智能研究院(宜宾) Pedestrian image retrieval method based on text description
CN113656660B (en) * 2021-10-14 2022-06-28 北京中科闻歌科技股份有限公司 Cross-modal data matching method, device, equipment and medium

Patent Citations (1)

Publication number Priority date Publication date Assignee Title
CN103268346A (en) * 2013-05-27 2013-08-28 翁时锋 Semi-supervised classification method and semi-supervised classification system

Non-Patent Citations (2)

Title
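Cross-media analysis and retrieval; Wu Fei et al.; Communications of the CCF; 2011-02-28; Vol. 7, No. 2; pp. 23-27 *
Research on text and image information fusion techniques for web image retrieval; Yin Xiangzhou; China Master's Theses Full-text Database, Information Science and Technology; 2011-12-15 (Issue S2); pp. 23-44 *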

Also Published As

Publication number Publication date
CN103488713A (en) 2014-01-01

Similar Documents

Publication Publication Date Title
CN103488713B (en) A kind of cross-module state search method that can directly measure similarity between different modalities data
CN106295796B (en) entity link method based on deep learning
Hendricks et al. Deep compositional captioning: Describing novel object categories without paired training data
Min et al. Question answering through transfer learning from large fine-grained supervision data
Peng et al. Semi-supervised cross-media feature learning with unified patch graph regularization
CN106202256B (en) Web image retrieval method based on semantic propagation and mixed multi-instance learning
CN102197393B (en) Image-based semantic distance
US20210366025A1 (en) Item recommendation method based on user intention in session and system thereof
CN109635083A (en) It is a kind of for search for TED speech in topic formula inquiry document retrieval method
CN106886601A (en) A kind of Cross-modality searching algorithm based on the study of subspace vehicle mixing
CN106156333A (en) A kind of improvement list class collaborative filtering method of mosaic society information
CN106844738B (en) The classification method of Junker relationship between food materials neural network based
CN110647904A (en) Cross-modal retrieval method and system based on unmarked data migration
CN112417306A (en) Method for optimizing performance of recommendation algorithm based on knowledge graph
CN105701514A (en) Multi-modal canonical correlation analysis method for zero sample classification
CN105701225B (en) A kind of cross-media retrieval method based on unified association hypergraph specification
CN105718940A (en) Zero-sample image classification method based on multi-group factor analysis
CN113239159B (en) Cross-modal retrieval method for video and text based on relational inference network
CN109472282B (en) Depth image hashing method based on few training samples
CN110059220A (en) A kind of film recommended method based on deep learning Yu Bayesian probability matrix decomposition
CN104317838A (en) Cross-media Hash index method based on coupling differential dictionary
CN113779219A (en) Question-answering method for embedding multiple knowledge maps by combining hyperbolic segmented knowledge of text
CN112800292A (en) Cross-modal retrieval method based on modal specificity and shared feature learning
CN105893573A (en) Site-based multi-modal media data subject extraction model
CN102693321A (en) Cross-media information analysis and retrieval method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20140101

Assignee: TONGDUN HOLDINGS Co.,Ltd.

Assignor: ZHEJIANG University

Contract record no.: X2021990000612

Denomination of invention: A cross modal retrieval method that can directly measure the similarity between different modal data

Granted publication date: 20160928

License type: Common License

Record date: 20211012