CN103488713A - Cross-modal search method capable of directly measuring similarity of different modal data - Google Patents
- Publication number
- CN103488713A (application CN201310410553.XA)
- Authority
- CN
- China
- Legal status: Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/9032—Query formulation
Abstract
The invention discloses a cross-modal search method that can directly measure the similarity between data of different modalities. The method comprises the steps of: first, feature extraction; second, model construction and learning; third, cross-media data retrieval; fourth, result evaluation. Compared with traditional cross-media search methods, this method can compare the similarity of data from different modalities directly: in a cross-modal search task, the user may submit a query in any modality, such as text, image, or sound, to retrieve the required results in the corresponding modality. This satisfies the requirements of cross-media search and realizes the user's search intention more directly. Compared with other cross-media search algorithms that can directly measure cross-modal similarity, the method is more robust to noise interference and better able to express loosely correlated cross-modal data, so it achieves better search results.
Description
Technical field
The present invention relates to cross-modal retrieval, and in particular to a cross-modal search method that can directly measure the similarity between data of different modalities.
Background art
Nowadays, electronic data is increasingly rich and varied, and data of many types, such as text, images, sound, and maps, is widely available on the Internet. The same semantic content can often be described either by data of one modality or by data of another. Sometimes, given a description of certain semantics in one type of data, we wish to find the corresponding description in another type: for example, retrieving pictures whose content matches a given text, or retrieving news reports relevant to a given picture. Existing search methods, however, generally target a single modality, such as retrieving text with text or images with images. Some multi-modal or multimedia retrieval methods do exist, but most of them measure similarity within the same modality and then compute cross-media similarity through a functional mapping; search methods that directly compare the similarity of different modalities are rare. Cross-media search methods that measure similarity within the same modality have the shortcoming that they cannot learn the relationship between cross-modal data: they depend on matching relationships pre-specified in the database, and for loose correspondences between multimedia data the query results are unsatisfactory. It is therefore necessary to propose a cross-media search method that directly measures similarity across modalities. The difficulty of directly comparing the similarity of data in different modalities is that the features of different modalities differ greatly, are generally high-dimensional, and suffer from the "semantic gap" problem.
To overcome the "semantic gap" and compare and retrieve across modalities, several methods differing from the classical approach have been proposed. They generally fall into two classes. The first class regards the data of different modalities as random variables and seeks a mapping into a common latent space that maximizes the correlation of these random variables; query data is also projected into this latent space, which completes the cross-modal retrieval. The second class assumes that the data contains latent topics and models the correlation between modalities through these topics. Both classes directly compare data similarity across modalities. However, when the modalities are only loosely correlated at the semantic level, the reliability of semantic-level notions such as "maximum correlation" and "topics" is lower than that of known, definite information such as the class labels and correspondence relations of the cross-modal data. The present invention introduces dictionary learning into cross-modal retrieval, learns the explicit correspondence relation directly, and exploits label information. It can thus model the loose semantic-level correspondence between text and images well, improving robustness to noise and the accuracy of cross-modal retrieval.
Summary of the invention
The object of the present invention is to provide a cross-modal search method that can directly measure the similarity between data of different modalities, so that data of one modality can directly retrieve data of one or more other modalities.
The cross-modal search method that can directly measure the similarity between data of different modalities comprises the following steps:
1) perform feature extraction and label recording for the data of each modality in the database;
2) according to the correspondence information and label information between data of different modalities in the database, express the differences and similarities between paired data of different modalities from the perspective of reconstruction, and use the label information to build an overall cross-modal retrieval model and learn its parameters;
3) perform feature extraction on the known-modality data submitted by the user, then use the cross-media retrieval model to return the data of the other modalities required by the user;
4) use the true correspondence information and label information of the cross-modal data to evaluate the cross-media retrieval model in terms of both correspondence information and discriminative information.
Said step 1) comprises:
1) extract SIFT features from all image-modality data in the database, cluster them with the k-means method to form visual words, and then normalize the features so that the feature vector representing each image is a unit vector;
2) perform part-of-speech tagging on all text-modality data in the database, remove non-noun words and retain the nouns, form a dictionary from the words occurring in the whole database, count for each text the occurrences of the dictionary words, vectorize each text by its word frequencies, and then normalize the feature vectors so that the feature vector representing each text is a unit vector;
3) for data of other modalities in the database, extract commonly used industry-standard features and normalize them so that the feature vector representing each datum is a unit vector;
4) for corresponding data of different modalities in the database, collect their label information and record which class they come from.
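As a hedged illustration of sub-step 2) above, the following sketch keeps only nouns, counts dictionary-word frequencies, and L2-normalizes so each text becomes a unit vector; the tiny `NOUNS` set and the sample dictionary are hypothetical stand-ins for a real part-of-speech tagger and corpus.

```python
import math

NOUNS = {"park", "trail", "river", "game", "team"}  # hypothetical noun list

def text_feature(text, dictionary):
    # keep only nouns, count occurrences of each dictionary word
    words = [w for w in text.lower().split() if w in NOUNS]
    counts = [words.count(term) for term in dictionary]
    # L2-normalize so the feature vector is a unit vector
    norm = math.sqrt(sum(c * c for c in counts))
    return [c / norm for c in counts] if norm > 0 else counts

dictionary = ["park", "trail", "team"]
vec = text_feature("a quiet park with a long trail near the park", dictionary)
```

The same normalize-to-unit-vector convention is applied to the image and other-modality features, which keeps the reconstruction terms of the later model comparable across modalities.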
Said step 2) comprises:
1) introduce the concept of dictionary learning into cross-media retrieval and form a cross-modal search algorithm based on dictionary learning: the data of each modality is reconstructed from a dictionary and sparse coefficients, the differences between modalities are encoded by different dictionaries, and the similarity between data of different modalities is modeled by an association matrix between the sparse coefficients; the dictionaries, sparse coefficients, and association matrices are all learned from the data of each modality;
2) use label information in the cross-modal retrieval: in dictionary learning, data of the same modality carrying the same label shares the same dictionary atoms, i.e., the same non-zero dictionary columns, so that the label information is encoded into the dictionary-learning process and a dictionary carrying discriminative information is learned;
3) unify the dictionaries, sparse coefficients, association matrices, and label information in a dictionary-learning-based multi-modal search framework as expression (1), so that the corresponding data of different modalities is expressed and learned as a whole;
where M denotes the number of modalities, J denotes the number of classes (labels), X^(m) denotes the feature data of the m-th modality, D^(m) denotes the dictionary of the m-th modality, A^(m) denotes the sparse coefficients of the m-th modality, A_l^(m) denotes the sparse coefficients of those data of the m-th modality that carry label l (a group-wise norm over any k × n matrix A is applied to these blocks), W^(m) is the association matrix of the m-th modality, λ_m (m = 1, …, M), β, and γ are adjustable parameters used to regulate the proportion of each term in the expression, and d_i^(m) denotes a dictionary atom, i.e., a column of D^(m), with k the number of columns;
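Objective (1) can be sketched from the components just described: a per-modality reconstruction term, a label-wise group-sparsity term, and association matrices linking the sparse codes. The exact form and weighting below are an assumption, not the patent's verbatim formula:

```latex
\min_{\{D^{(m)},\,A^{(m)},\,W^{(m)}\}}
\sum_{m=1}^{M}\Big(\big\|X^{(m)}-D^{(m)}A^{(m)}\big\|_F^2
 + \lambda_m \sum_{l=1}^{J}\big\|A_l^{(m)}\big\|_{1,2}\Big)
 + \beta \sum_{m\neq m'}\big\|A^{(m')}-W^{(m)}A^{(m)}\big\|_F^2
 + \gamma \sum_{m=1}^{M}\big\|W^{(m)}\big\|_F^2
\quad \text{s.t. } \big\|d_i^{(m)}\big\|_2 \le 1,
```

where \(\|A\|_{1,2}=\sum_i \|a^i\|_2\) sums the L2 norms of the rows \(a^i\) of a k × n matrix A; it is this group norm that makes data sharing a label also share non-zero dictionary columns.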
4) update the sparse coefficients, dictionaries, and association matrices cyclically: first fix the dictionaries and association matrices and update the sparse coefficients; then update the dictionaries using the obtained sparse coefficients and the fixed association matrices; then update the association matrices using the updated sparse coefficients and dictionaries; and repeat this cycle until the convergence condition is met. The concrete steps are:
(1) first fix the dictionaries and association matrices, and update the sparse coefficients as follows:
(2) after the sparse coefficients are obtained, update the dictionary of each modality according to the following formula:
(3) finally, update the association matrices as follows:
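The alternating structure of steps (1)–(3) can be illustrated with a toy shrunk to scalars, so the closed-form updates stay readable: each modality m has data x_m ≈ d_m · a_m, and the codes are linked by a_2 ≈ w · a_1. A ridge penalty stands in for the patent's sparse/group penalties, and the real updates operate on matrices; both substitutions are assumptions for the sketch.

```python
def alternate(x1, x2, iters=20, lam=0.1):
    d1, d2, w = 1.0, 1.0, 1.0                      # initial dictionaries / association
    for _ in range(iters):
        # (1) fix dictionaries and w, update the codes (ridge least squares)
        a1 = [d1 * x / (d1 * d1 + lam) for x in x1]
        a2 = [d2 * x / (d2 * d2 + lam) for x in x2]
        # (2) fix the codes, update each modality's dictionary
        d1 = sum(a * x for a, x in zip(a1, x1)) / sum(a * a for a in a1)
        d2 = sum(a * x for a, x in zip(a2, x2)) / sum(a * a for a in a2)
        # (3) fix the codes, update the association scalar (least squares)
        w = sum(p * q for p, q in zip(a1, a2)) / sum(p * p for p in a1)
    return d1, d2, w

x1 = [1.0, 2.0, 3.0]           # modality-1 features (e.g. images)
x2 = [2.0, 4.0, 6.0]           # paired modality-2 features (e.g. texts)
d1, d2, w = alternate(x1, x2)
```

With perfectly paired toy data (x2 = 2·x1), the learned association converges to w = 2, mirroring how the association matrix captures the cross-modal correspondence.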
Said step 3) comprises:
1) according to the known m-th-modality data submitted by the user and the learned known-modality dictionary D^(m), initialize the sparse coefficients of the known-modality data as follows:
where λ is a parameter adjusting the degree of sparsity;
2) according to the initialized sparse coefficients of the known-modality data and the learned association matrix W^(m), initialize the sparse coefficients of the required-modality data as follows:
3) according to the sparse coefficients of the required-modality data and the learned required-modality dictionary D^(n), initialize the required-modality data as follows:
4) according to the known-modality data, the learned information, and the above initialization, update the sparse coefficients of the known modality and the required modality as follows:
where β, λ_m, and λ_n are adjustable parameters corresponding to formula (1);
5) according to the updated sparse coefficients of the required modality and the required-modality dictionary, finally determine the required-modality data as follows:
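The query phase above can be sketched in the same scalar toy setting: encode the known-modality query, map its code through the association term, decode with the required modality's dictionary, and rank database items by distance to the reconstruction. The values of d1, d2, w would come from the learning phase; here they are assumed.

```python
def retrieve(query_x1, db_x2, d1, d2, w, lam=0.1):
    a1 = d1 * query_x1 / (d1 * d1 + lam)    # 1) init code of the known modality
    a2 = w * a1                              # 2) map to the required modality's code
    x2_hat = d2 * a2                         # 3)+5) reconstruct required-modality data
    # rank candidate items by closeness to the reconstruction
    return sorted(range(len(db_x2)), key=lambda j: abs(db_x2[j] - x2_hat))

d1, d2, w = 1.0, 1.0, 2.0                   # assumed learned parameters
db_x2 = [0.5, 1.9, 7.0]                     # candidate required-modality data
ranking = retrieve(1.0, db_x2, d1, d2, w)
```

The reconstruction lands near 1.82, so the candidate 1.9 is ranked first; this is the scalar analogue of sorting the candidate return list by similarity to the determined required-modality data.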
Said step 4) comprises:
1) evaluate the cross-modal retrieval with correspondence information, focusing on the known-modality data and the other-modality data directly corresponding to it: the quality of the result is evaluated by the position of the corresponding required-modality datum in the result list; for a given index t%, the retrieval is considered correct if the required-modality datum corresponding to the known-modality datum ranks within the top t%, and an error otherwise;
2) evaluate the cross-modal retrieval with discriminative information, focusing on the required-modality data that carries the same label as the known-modality data: the cross-modal retrieval result is measured over the ranked list, where an item is relevant to the known-modality datum if it has the same label and irrelevant otherwise; concretely, MAP from information retrieval is adopted as the measure of this index. For one cross-modal query and a returned list of length R, MAP is defined on the basis of AP, where AP is defined as
AP = (1/L) Σ_{r=1}^{R} Prec(r) · δ(r),
where L is the number of items in the returned list relevant to the query, Prec(r) denotes the proportion of items among the top r that are relevant to the query, δ(r) = 1 if the r-th item is relevant to the query and δ(r) = 0 otherwise, and MAP is defined as the mean of the AP values over all queries.
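The AP/MAP evaluation just described can be sketched directly from its definition: AP averages the precision values at the relevant positions of one ranked list, and MAP averages AP over all queries.

```python
def average_precision(relevant_flags):
    # relevant_flags[r-1] plays the role of delta(r) for the ranked list
    hits, precisions = 0, []
    for r, rel in enumerate(relevant_flags, start=1):
        if rel:
            hits += 1
            precisions.append(hits / r)   # Prec(r) at a relevant position
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(ranked_lists):
    return sum(average_precision(l) for l in ranked_lists) / len(ranked_lists)

ap = average_precision([1, 0, 1, 0])      # relevant items at ranks 1 and 3
```

For the list above, AP = (1/2)(1/1 + 2/3) = 5/6, matching the formula with L = 2 relevant items.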
The beneficial effects of the present invention are: compared with traditional cross-media search methods, the present invention can directly compare similarity between different modalities, rather than relying on intra-modality similarity comparisons propagated between modalities through correspondence relations. The benefit of directly comparing cross-modal similarity is that the implicit associations between cross-media data can truly be mined, realizing the user's retrieval intention directly. Compared with other cross-media search algorithms that can directly measure cross-modal similarity, the present invention improves robustness to noise and the ability to express loosely correlated cross-media data, so the retrieval results are better and more semantically relevant to the query.
Description of the drawings
Fig. 1 is a schematic diagram of the cross-modal search method that can directly measure the similarity between data of different modalities;
Fig. 2 shows examples of corresponding pictures and texts in the database of the embodiment;
Fig. 3 shows examples of retrieving text by picture and retrieving pictures by text with the present invention. The first four returned results are listed for each query. The upper part shows retrieval of text by a picture; to display the retrieval results better, each retrieved similar text is represented by its corresponding true picture. The lower part shows retrieval of pictures by text. Each example contrasts the present invention (denoted SliM² in the figure) with another cross-media search method that directly measures cross-modal similarity (GMA).
Embodiment
Suppose we have 2173 pairs of text and image data with known correspondence, and 693 text and 693 image data with unknown correspondence; examples of the pictures and texts are shown in Fig. 2. First, SIFT features are extracted from all image-modality data in the database and clustered with the k-means method to form visual words; the features are then normalized so that the feature vector representing each image is a unit vector. At the same time, part-of-speech tagging is applied to all text-modality data in the database; non-noun words are removed and the nouns retained; a dictionary is formed from the words occurring in the whole database; the occurrences of the dictionary words are counted for each text and vectorized by single-text word frequency; the feature vectors are then normalized so that the feature vector representing each text is a unit vector.
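The visual-word step can be sketched as follows: a minimal one-dimensional k-means stands in for clustering SIFT descriptors, and each image becomes a normalized histogram of nearest-centroid assignments (a bag of visual words). Real SIFT descriptors are 128-dimensional; the scalar values here are an illustrative assumption.

```python
import math

def kmeans_1d(values, centroids, iters=10):
    # plain Lloyd iterations on scalars: assign, then recompute means
    for _ in range(iters):
        groups = {i: [] for i in range(len(centroids))}
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            groups[nearest].append(v)
        centroids = [sum(g) / len(g) if g else centroids[i]
                     for i, g in groups.items()]
    return centroids

def bow_histogram(descriptors, centroids):
    counts = [0] * len(centroids)
    for v in descriptors:
        counts[min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))] += 1
    norm = math.sqrt(sum(c * c for c in counts))   # unit-vector normalization
    return [c / norm for c in counts]

all_desc = [0.1, 0.2, 0.15, 5.0, 5.2, 4.9]         # descriptors pooled over images
words = kmeans_1d(all_desc, centroids=[0.0, 6.0])   # two visual words
hist = bow_histogram([0.12, 5.1, 5.0], words)       # one image's feature vector
```

The resulting histogram is a unit vector, matching the normalization convention used for the text features.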
The 2173 pairs of data (features) are expressed in matrix form. As stipulated, M denotes the number of modalities, J the number of classes (labels), X^(m) the feature data of the m-th modality, D^(m) the dictionary of the m-th modality, A^(m) the sparse coefficients of the m-th modality, A_l^(m) the sparse coefficients of those data of the m-th modality that carry label l (a group-wise norm over any k × n matrix A is applied to these blocks), W^(m) the association matrix of the m-th modality, and λ_m (m = 1, …, M), β, γ the adjustable parameters regulating the proportion of each term in the expression; d_i^(m) denotes a dictionary atom, a column of D^(m), with k the number of columns. There are two modalities here, text and image, so M = 2, with the text and image features as X^(1) and X^(2) respectively.
Then the following steps are performed:
1) first fix the dictionaries and association matrices, and update the sparse coefficients as follows:
2) after the sparse coefficients are obtained, update the dictionary of each modality according to the following formula:
3) finally, update the association matrices as follows:
Learning thus yields D = {D^(1), D^(2), …, D^(M)} and W = {W^(1), W^(2), …, W^(M)}. The retrieval phase then begins: any one of the 693 × 2 texts and images with unknown correspondence is used as a query, and its corresponding text or image is returned. The concrete steps are as follows:
Suppose the user submits query data of a known image or text modality, where m = 1 or 2.
1) according to the known m-th-modality data submitted by the user and the learned known-modality dictionary D^(m), initialize the sparse coefficients of the known-modality data as follows:
where λ is a parameter adjusting the degree of sparsity;
2) according to the initialized sparse coefficients of the known-modality data and the learned association matrix W^(m), initialize the sparse coefficients of the required-modality data as follows:
3) according to the sparse coefficients of the required-modality data and the learned required-modality dictionary D^(n), initialize the required-modality data as follows:
4) according to the known-modality data, the learned information, and the above initialization, update the sparse coefficients of the known modality and the required modality as follows:
where β, λ_m, and λ_n are adjustable parameters corresponding to formula (1);
5) according to the updated sparse coefficients of the required modality and the required-modality dictionary, finally determine the required-modality data as follows:
6) according to the required-modality data, rank the candidate return list of the required modality and return the sorted results.
Fig. 3 shows concrete examples of cross-media retrieval, including an example of retrieving text by picture (top) and an example of retrieving pictures by text (bottom), contrasting the retrieval performance of the present invention (SliM²) with another cross-media search method that directly measures cross-modal similarity (GMA). For the picture-to-text example (top), to show the retrieval results more intuitively, each retrieved text is represented by its corresponding true picture. It can be seen that the query image comes from the sports class, and the results retrieved by both methods also come from the sports class; however, the result retrieved by the present invention ranks the text corresponding to the query image (represented by its corresponding picture) first, and the remaining retrieved texts (also represented by their corresponding pictures) are more relevant to the query image in semantics and content. For the text-to-picture example, only a fragment of the text is shown; its content mainly concerns parks and trails and belongs to the geography class. The results retrieved by the present invention belong to the same class as the query text and are highly relevant in content, whereas the other method places pictures from the history class first and fourth, and its results are not as strongly relevant in content to the query text as those of the method of the present invention.
These examples show that, unlike classical methods, the present invention can directly measure similarity between different modalities and thereby realize cross-modal retrieval; moreover, even compared with methods that can likewise directly compare cross-modal similarity, the method of the present invention achieves better retrieval performance.
Claims (5)
1. A cross-modal search method capable of directly measuring the similarity between data of different modalities, characterized by comprising the following steps:
1) performing feature extraction and label recording for the data of each modality in the database;
2) according to the correspondence information and label information between data of different modalities in the database, expressing the differences and similarities between paired data of different modalities from the perspective of reconstruction, and using the label information to build an overall cross-modal retrieval model and learn its parameters;
3) performing feature extraction on the known-modality data submitted by the user, and then using the cross-media retrieval model to return the data of the other modalities required by the user;
4) using the true correspondence information and label information of the cross-modal data to evaluate the cross-media retrieval model in terms of both correspondence information and discriminative information.
2. The cross-modal search method capable of directly measuring the similarity between data of different modalities according to claim 1, characterized in that said step 1) comprises:
1) extracting SIFT features from all image-modality data in the database, clustering them with the k-means method to form visual words, and then normalizing the features so that the feature vector representing each image is a unit vector;
2) performing part-of-speech tagging on all text-modality data in the database, removing non-noun words and retaining the nouns, forming a dictionary from the words occurring in the whole database, counting for each text the occurrences of the dictionary words, vectorizing each text by its word frequencies, and then normalizing the feature vectors so that the feature vector representing each text is a unit vector;
3) for data of other modalities in the database, extracting commonly used industry-standard features and normalizing them so that the feature vector representing each datum is a unit vector;
4) for corresponding data of different modalities in the database, collecting their label information and recording which class they come from.
3. The cross-modal search method capable of directly measuring the similarity between data of different modalities according to claim 1, characterized in that said step 2) comprises:
1) introducing the concept of dictionary learning into cross-media retrieval and forming a cross-modal search algorithm based on dictionary learning: the data of each modality is reconstructed from a dictionary and sparse coefficients, the differences between modalities are encoded by different dictionaries, and the similarity between data of different modalities is modeled by an association matrix between the sparse coefficients; the dictionaries, sparse coefficients, and association matrices are all learned from the data of each modality;
2) using label information in the cross-modal retrieval: in dictionary learning, data of the same modality carrying the same label shares the same dictionary atoms, i.e., the same non-zero dictionary columns, so that the label information is encoded into the dictionary-learning process and a dictionary carrying discriminative information is learned;
3) unifying the dictionaries, sparse coefficients, association matrices, and label information in a dictionary-learning-based multi-modal search framework as expression (1), so that the corresponding data of different modalities is expressed and learned as a whole;
Wherein M denotes the number of modalities; J denotes the number of labels, i.e., categories; X^(m) denotes the feature data of the m-th modality; D^(m) denotes the dictionary of the m-th modality; A^(m) denotes the sparse coefficients of the m-th modality; A_l^(m) denotes the sparse coefficients of those data of the m-th modality that carry label l; for any k × n matrix A, ||A||_{1,2} = Σ_{i=1}^{k} ||a^i||_2, where a^i is the i-th row of A; W^(m) is the association matrix of the m-th modality; λ_m (m = 1, …, M), β, and γ are adjustable parameters used to balance the weight of each term in the expression; d_k^(m) denotes a dictionary atom, i.e., the k-th column of D^(m);
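Expression (1) itself appears only as a figure in the source. Assembling the components enumerated in the claim (per-modality reconstruction error, label-grouped ||·||_{1,2} sparsity weighted by λ_m, a β-weighted association term through W^(m), a γ-weighted regularizer, and unit-norm atoms), one plausible form is sketched below; coupling each modality's coefficients to a reference modality's, and the exact placement of β and γ, are assumptions rather than the patent's confirmed formula:

```latex
\min_{\{D^{(m)},A^{(m)},W^{(m)}\}}\;
\sum_{m=1}^{M}\Big(\big\|X^{(m)}-D^{(m)}A^{(m)}\big\|_F^{2}
   +\lambda_m\sum_{l=1}^{J}\big\|A_l^{(m)}\big\|_{1,2}\Big)
 +\beta\sum_{m=2}^{M}\big\|A^{(1)}-W^{(m)}A^{(m)}\big\|_F^{2}
 +\gamma\sum_{m=2}^{M}\big\|W^{(m)}\big\|_F^{2}
\quad\text{s.t.}\;\;\big\|d_k^{(m)}\big\|_2\le 1\;\;\forall\,k,m
```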
4) updating the sparse coefficients, dictionaries, and association matrices in a loop: first fix the dictionaries and association matrices and update the sparse coefficients; then use the obtained sparse coefficients and the fixed association matrices to update the dictionaries; then use the updated sparse coefficients and dictionaries to update the association matrices; repeat until the convergence condition is met. The concrete steps are as follows:
(1) first, with the dictionaries and association matrices fixed, update the sparse coefficients as follows:
(2) after the sparse coefficients are obtained, update the dictionary of each modality according to the following formula:
(3) finally, update the association matrices as follows:
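The three-step update loop can be sketched as follows for two modalities. This is a simplified illustration, not the patent's exact update formulas (which appear only as figures): the sparse-code step is a single proximal-gradient (ISTA-style) step followed by an averaging pull toward the coupling A1 ≈ W·A2, and the dictionary and association updates are plain least squares:

```python
import numpy as np

def soft_threshold(Z, t):
    # elementwise shrinkage implementing the l1 part of the code update
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)

def alternating_update(X1, X2, n_atoms=8, lam=0.1, beta=0.5, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    d1, n = X1.shape
    d2 = X2.shape[0]
    D1 = rng.standard_normal((d1, n_atoms)); D1 /= np.linalg.norm(D1, axis=0)
    D2 = rng.standard_normal((d2, n_atoms)); D2 /= np.linalg.norm(D2, axis=0)
    A1 = np.zeros((n_atoms, n)); A2 = np.zeros((n_atoms, n))
    W = np.eye(n_atoms)
    for _ in range(n_iter):
        # (1) fix D, W: one proximal-gradient step on each modality's codes
        for D, X, A in ((D1, X1, A1), (D2, X2, A2)):
            step = 1.0 / (np.linalg.norm(D, 2) ** 2 + 1e-8)
            A -= step * (D.T @ (D @ A - X))
            A[:] = soft_threshold(A, step * lam)
        # averaging pull toward the association constraint A1 ~ W @ A2
        A1[:] = (A1 + beta * (W @ A2)) / (1.0 + beta)
        # (2) fix A, W: least-squares dictionaries, columns renormalized
        D1 = X1 @ np.linalg.pinv(A1)
        D1 /= np.maximum(np.linalg.norm(D1, axis=0), 1e-8)
        D2 = X2 @ np.linalg.pinv(A2)
        D2 /= np.maximum(np.linalg.norm(D2, axis=0), 1e-8)
        # (3) fix A, D: W = argmin ||A1 - W A2||_F^2
        W = A1 @ np.linalg.pinv(A2)
    return D1, D2, A1, A2, W
```

A fixed iteration count stands in for the claim's convergence condition; in practice the loop would stop when the objective change falls below a tolerance.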
4. The cross-modal search method capable of directly measuring the similarity between data of different modalities according to claim 1, characterized in that said step 3) comprises:
1) according to the known data of modality m submitted by the user and the learned known-modality dictionary D^(m), initialize the sparse coefficients of the known-modality data as follows:
Wherein λ is an adjustable parameter controlling the degree of sparsity;
2) according to the initialized sparse coefficients of the known-modality data and the learned association matrix W^(m), initialize the sparse coefficients of the demanded-modality data as follows:
3) according to the sparse coefficients of the demanded-modality data and the learned demanded-modality dictionary D^(n), initialize the demanded-modality data as follows:
4) according to the known-modality data, the learned information, and the above initialization, update the sparse coefficients of the known modality and of the demanded modality as follows:
Wherein β, λ_m, and λ_n are adjustable parameters corresponding to those in expression (1).
5) according to the updated sparse coefficients of the demanded modality and the demanded-modality dictionary, finally determine the demanded-modality data as follows:
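The retrieval stage of steps 1)–5) can be sketched as: sparse-code the known-modality query against its dictionary, map the coefficients through the association matrix, reconstruct the demanded-modality data, and rank database items by distance to the reconstruction. The ISTA coder below is a generic stand-in for the initialization and update formulas, which appear only as figures in the source:

```python
import numpy as np

def ista_code(D, x, lam=0.1, n_iter=50):
    """Sparse-code x against dictionary D with ISTA (a simplified
    stand-in for the claim's initialization formula)."""
    L = np.linalg.norm(D, 2) ** 2 + 1e-8   # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a = a - (D.T @ (D @ a - x)) / L    # gradient step on ||x - D a||^2
        a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)  # shrinkage
    return a

def cross_modal_query(x_m, D_m, D_n, W, database_n):
    """Simplified pipeline for steps 1)-5) of the claim."""
    a_m = ista_code(D_m, x_m)      # 1) sparse coefficients of the known modality
    a_n = W @ a_m                  # 2) map to demanded-modality coefficients
    x_n = D_n @ a_n                # 3)+5) reconstruct the demanded-modality data
    dists = np.linalg.norm(database_n - x_n, axis=1)
    return np.argsort(dists)       # ranked list of demanded-modality items
```

The joint refinement of step 4) is omitted here; the sketch keeps only the initialization-map-reconstruct skeleton.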
5. The cross-modal search method capable of directly measuring the similarity between data of different modalities according to claim 1, characterized in that said step 4) comprises:
1) evaluating cross-modal retrieval with correspondence information: focusing on the known-modality data and the data of other modalities directly corresponding to it, the quality of the result is evaluated by the position, in the result list, of the demanded-modality data corresponding to the known-modality data; for a given index t%, if the demanded-modality data corresponding to the known-modality data ranks within the top t%, the retrieval is considered correct, otherwise it is considered incorrect;
2) evaluating cross-modal retrieval with discriminative information: focusing on the known-modality data and the demanded-modality data that share its label, the cross-modal retrieval result is measured on the retrieval list, where items having the same label as the known-modality data are regarded as relevant and the others as irrelevant; concretely, the MAP (mean average precision) measure from information retrieval is adopted for this index; for one cross-modal query and its returned search list of length R, MAP is defined on the basis of AP, and AP is defined as follows:
AP = (1/L) Σ_{r=1}^{R} Prec(r)·δ(r). Wherein L is the number of items in the returned list that are relevant to the query; Prec(r) denotes the proportion of items relevant to the query among the top r items; δ(r) = 1 if the r-th item is relevant to the query and δ(r) = 0 otherwise; MAP is defined as the mean of the AP values over all queries.
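Both evaluation criteria are simple to state in code. The sketch below implements the top-t% correspondence criterion and the AP/MAP definitions in terms of Prec(r), δ(r), and L as described; the function names and the 0-based rank convention are assumptions for illustration:

```python
def top_t_accuracy(positions, list_length, t_percent):
    """Correspondence criterion: a query counts as correct iff its
    corresponding item's 0-based rank falls within the top t% of the list."""
    cutoff = list_length * t_percent / 100.0
    return sum(1 for p in positions if p < cutoff) / len(positions)

def average_precision(relevant_flags):
    """AP for one query; relevant_flags[r-1] == 1 iff the r-th item is relevant."""
    L = sum(relevant_flags)          # relevant items in the returned list
    if L == 0:
        return 0.0
    hits, total = 0, 0.0
    for r, rel in enumerate(relevant_flags, start=1):
        if rel:                      # delta(r) == 1
            hits += 1
            total += hits / r        # Prec(r) at a relevant position
    return total / L

def mean_average_precision(flags_per_query):
    """MAP: mean of the AP values over all queries."""
    return sum(average_precision(f) for f in flags_per_query) / len(flags_per_query)
```

For a returned list with relevance flags [1, 0, 1], AP = (1/1 + 2/3) / 2 = 5/6.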
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310410553.XA CN103488713B (en) | 2013-09-10 | 2013-09-10 | A kind of cross-module state search method that can directly measure similarity between different modalities data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103488713A true CN103488713A (en) | 2014-01-01 |
CN103488713B CN103488713B (en) | 2016-09-28 |
Family
ID=49828939
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310410553.XA Active CN103488713B (en) | 2013-09-10 | 2013-09-10 | A kind of cross-module state search method that can directly measure similarity between different modalities data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103488713B (en) |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104166684A (en) * | 2014-07-24 | 2014-11-26 | 北京大学 | Cross-media retrieval method based on uniform sparse representation |
CN104199826A (en) * | 2014-07-24 | 2014-12-10 | 北京大学 | Heterogeneous media similarity calculation method and retrieval method based on correlation analysis |
CN104317837A (en) * | 2014-10-10 | 2015-01-28 | 浙江大学 | Cross-modal searching method based on topic model |
CN104317838A (en) * | 2014-10-10 | 2015-01-28 | 浙江大学 | Cross-media Hash index method based on coupling differential dictionary |
CN104346450A (en) * | 2014-10-29 | 2015-02-11 | 浙江大学 | Cross-media ordering method based on multi-modal implicit coupling expression |
CN104462489A (en) * | 2014-12-18 | 2015-03-25 | 北京邮电大学 | Cross-modal retrieval method based on deep-layer models |
CN105550190A (en) * | 2015-06-26 | 2016-05-04 | 许昌学院 | Knowledge graph-oriented cross-media retrieval system |
CN107633259A (en) * | 2017-08-21 | 2018-01-26 | 天津大学 | A kind of cross-module state learning method represented based on sparse dictionary |
CN108038080A (en) * | 2017-11-29 | 2018-05-15 | 浙江大学 | A kind of method that local multi-modal sparse coding completion is carried out using the similar tactical ruleization of adaptability |
CN108121750A (en) * | 2016-11-30 | 2018-06-05 | 西门子公司 | A kind of model treatment method, apparatus and machine readable media |
CN110059217A (en) * | 2019-04-29 | 2019-07-26 | 广西师范大学 | A kind of image text cross-media retrieval method of two-level network |
CN110704708A (en) * | 2019-09-27 | 2020-01-17 | 深圳市商汤科技有限公司 | Data processing method, device, equipment and storage medium |
CN111930972A (en) * | 2020-08-04 | 2020-11-13 | 山东大学 | Cross-modal retrieval method and system for multimedia data by using label level information |
CN112364197A (en) * | 2020-11-12 | 2021-02-12 | 四川省人工智能研究院(宜宾) | Pedestrian image retrieval method based on text description |
CN113656660A (en) * | 2021-10-14 | 2021-11-16 | 北京中科闻歌科技股份有限公司 | Cross-modal data matching method, device, equipment and medium |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103268346A (en) * | 2013-05-27 | 2013-08-28 | 翁时锋 | Semi-supervised classification method and semi-supervised classification system |
Non-Patent Citations (2)
Title |
---|
Wu Fei et al.: "Cross-media Analysis and Retrieval", Communications of the China Computer Federation *
Yin Xiangzhou: "Research on Text and Image Information Fusion for Web Image Retrieval", China Master's Theses Full-text Database, Information Science and Technology *
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104199826A (en) * | 2014-07-24 | 2014-12-10 | 北京大学 | Heterogeneous media similarity calculation method and retrieval method based on correlation analysis |
CN104166684A (en) * | 2014-07-24 | 2014-11-26 | 北京大学 | Cross-media retrieval method based on uniform sparse representation |
CN104199826B (en) * | 2014-07-24 | 2017-06-30 | 北京大学 | A kind of dissimilar medium similarity calculation method and search method based on association analysis |
CN104317837A (en) * | 2014-10-10 | 2015-01-28 | 浙江大学 | Cross-modal searching method based on topic model |
CN104317838A (en) * | 2014-10-10 | 2015-01-28 | 浙江大学 | Cross-media Hash index method based on coupling differential dictionary |
CN104317838B (en) * | 2014-10-10 | 2017-05-17 | 浙江大学 | Cross-media Hash index method based on coupling differential dictionary |
CN104317837B (en) * | 2014-10-10 | 2017-06-23 | 浙江大学 | A kind of cross-module state search method based on topic model |
CN104346450A (en) * | 2014-10-29 | 2015-02-11 | 浙江大学 | Cross-media ordering method based on multi-modal implicit coupling expression |
CN104346450B (en) * | 2014-10-29 | 2017-06-23 | 浙江大学 | A kind of across media sort methods based on multi-modal recessive coupling expression |
CN104462489B (en) * | 2014-12-18 | 2018-02-23 | 北京邮电大学 | A kind of cross-module state search method based on Deep model |
CN104462489A (en) * | 2014-12-18 | 2015-03-25 | 北京邮电大学 | Cross-modal retrieval method based on deep-layer models |
CN105550190A (en) * | 2015-06-26 | 2016-05-04 | 许昌学院 | Knowledge graph-oriented cross-media retrieval system |
CN105550190B (en) * | 2015-06-26 | 2019-03-29 | 许昌学院 | Cross-media retrieval system towards knowledge mapping |
CN108121750B (en) * | 2016-11-30 | 2022-07-08 | 西门子公司 | Model processing method and device and machine readable medium |
CN108121750A (en) * | 2016-11-30 | 2018-06-05 | 西门子公司 | A kind of model treatment method, apparatus and machine readable media |
CN107633259B (en) * | 2017-08-21 | 2020-03-31 | 天津大学 | Cross-modal learning method based on sparse dictionary representation |
CN107633259A (en) * | 2017-08-21 | 2018-01-26 | 天津大学 | A kind of cross-module state learning method represented based on sparse dictionary |
CN108038080A (en) * | 2017-11-29 | 2018-05-15 | 浙江大学 | A kind of method that local multi-modal sparse coding completion is carried out using the similar tactical ruleization of adaptability |
CN110059217A (en) * | 2019-04-29 | 2019-07-26 | 广西师范大学 | A kind of image text cross-media retrieval method of two-level network |
CN110059217B (en) * | 2019-04-29 | 2022-11-04 | 广西师范大学 | Image text cross-media retrieval method for two-stage network |
CN110704708A (en) * | 2019-09-27 | 2020-01-17 | 深圳市商汤科技有限公司 | Data processing method, device, equipment and storage medium |
CN111930972A (en) * | 2020-08-04 | 2020-11-13 | 山东大学 | Cross-modal retrieval method and system for multimedia data by using label level information |
CN111930972B (en) * | 2020-08-04 | 2021-04-27 | 山东大学 | Cross-modal retrieval method and system for multimedia data by using label level information |
CN112364197A (en) * | 2020-11-12 | 2021-02-12 | 四川省人工智能研究院(宜宾) | Pedestrian image retrieval method based on text description |
CN112364197B (en) * | 2020-11-12 | 2021-06-01 | 四川省人工智能研究院(宜宾) | Pedestrian image retrieval method based on text description |
CN113656660A (en) * | 2021-10-14 | 2021-11-16 | 北京中科闻歌科技股份有限公司 | Cross-modal data matching method, device, equipment and medium |
Also Published As
Publication number | Publication date |
---|---|
CN103488713B (en) | 2016-09-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103488713A (en) | Cross-modal search method capable of directly measuring similarity of different modal data | |
CN107346328B (en) | Cross-modal association learning method based on multi-granularity hierarchical network | |
CN109376222B (en) | Question-answer matching degree calculation method, question-answer automatic matching method and device | |
CN108280114B (en) | Deep learning-based user literature reading interest analysis method | |
CN106886601B (en) | A kind of Cross-modality search method based on the study of subspace vehicle mixing | |
US20120323968A1 (en) | Learning Discriminative Projections for Text Similarity Measures | |
CN109858015B (en) | Semantic similarity calculation method and device based on CTW (computational cost) and KM (K-value) algorithm | |
Sarawagi et al. | Open-domain quantity queries on web tables: annotation, response, and consensus models | |
CN106708929B (en) | Video program searching method and device | |
US20150356199A1 (en) | Click-through-based cross-view learning for internet searches | |
Guo et al. | Sparse deep nonnegative matrix factorization | |
Wang et al. | Image tag refinement by regularized latent Dirichlet allocation | |
CN103455487A (en) | Extracting method and device for search term | |
CN102693321A (en) | Cross-media information analysis and retrieval method | |
CN104317837B (en) | A kind of cross-module state search method based on topic model | |
US11709858B2 (en) | Mapping of unlabeled data onto a target schema via semantic type detection | |
CN113239159B (en) | Cross-modal retrieval method for video and text based on relational inference network | |
CN105701225A (en) | Cross-media search method based on unification association supergraph protocol | |
CN106570196B (en) | Video program searching method and device | |
CN113792594B (en) | Method and device for locating language fragments in video based on contrast learning | |
CN103279581B (en) | A kind of method utilizing compact video subject description to carry out video frequency searching | |
CN113326392B (en) | Remote sensing image audio retrieval method based on quadruple hash | |
CN116151258A (en) | Text disambiguation method, electronic device and storage medium | |
CN116561594A (en) | Legal document similarity analysis method based on Word2vec | |
CN105677830A (en) | Heterogeneous media similarity computing method and retrieving method based on entity mapping |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
EE01 | Entry into force of recordation of patent licensing contract |
Application publication date: 20140101 Assignee: TONGDUN HOLDINGS Co.,Ltd. Assignor: ZHEJIANG University Contract record no.: X2021990000612 Denomination of invention: A cross modal retrieval method that can directly measure the similarity between different modal data Granted publication date: 20160928 License type: Common License Record date: 20211012 |