CN106951509B - Multi-label kernel canonical correlation analysis retrieval method - Google Patents
- Publication number: CN106951509B (application number CN201710158859.9A)
- Authority: CN (China)
- Prior art keywords: sample, text, subspace, label, visual image
- Legal status: Active (as listed by Google Patents; not a legal conclusion)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
- G06F16/44—Browsing; Visualisation therefor
- G06F16/444—Spatial browsing, e.g. 2D maps, 3D or virtual spaces
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
- G06F16/43—Querying
- G06F16/435—Filtering based on additional data, e.g. user or group profiles
Abstract
The invention belongs to the field of computer retrieval technology and relates in particular to a multi-label kernel canonical correlation analysis retrieval method. The main steps are: (S1) select text and visual images, construct paired data of text, visual images and labels, and select samples of the paired data; (S2) compute the semantic similarity matrix of the labels; (S3) apply the semantic similarity matrix to kernel canonical correlation analysis to obtain a multi-modal shared subspace; (S4) obtain the projected representations of visual images and text in the multi-modal shared subspace; (S5) perform retrieval to obtain cross-modal subspace retrieval results. The invention exploits the high-level semantic information that multimedia documents carry in the form of multiple labels. At the same time, it uses KCCA to mine the nonlinear correlation between different modalities. In this way it learns a more discriminative common subspace for the different modalities that is better suited to cross-modal retrieval tasks, and achieves good retrieval performance.
Description
Technical field
The invention belongs to the field of computer retrieval technology and relates in particular to a multi-label kernel canonical correlation analysis retrieval method.
Background art
Cross-modal information retrieval is a challenging research topic in which the query and the results belong to different modalities. It covers mutual retrieval between multi-modal information, for example retrieving text with an image or images with text. Because of the "semantic gap", items from different modalities cannot be compared directly. The key problem in this task is therefore how to measure the distance or similarity between modalities. The prior art learns a shared subspace that aligns the two feature spaces so that different modalities become comparable. Among conventional methods, canonical correlation analysis (CCA) [1] has shown its simplicity and efficiency. It learns the shared subspace by maximizing the correlation between the projections of the two modalities. CCA has become the workhorse of many cross-modal retrieval methods, and many extensions of CCA for cross-modal retrieval have been proposed in recent years.
Although CCA is popular for its simplicity and efficiency, it has several shortcomings. CCA relies on the one-to-one pairing between modalities and does not use the high-level semantic label information present in multimedia documents. As a result, it cannot obtain a subspace that is better adapted to cross-modal retrieval. Some CCA extensions that use label information have recently been proposed. However, most of this work applies only to multimedia documents annotated with a single label. In general an image may belong to several classes. Assuming single-label annotation is therefore unreasonable and prevents the label information from being fully exploited. It is closer to reality to consider multi-label information when mining the correlation between data from different modalities.
Canonical correlation analysis, first proposed by Hotelling, is a data analysis method for finding subspaces of multiple data spaces. Classical CCA, however, ignores additional high-level semantic information, which limits its performance. Rasiwasia et al. proposed cluster-CCA for single-label datasets [1]. Viresh Ranjan et al. [2] proposed multi-label canonical correlation analysis (ml-CCA), which takes multi-label information into account. Benefiting from this information, ml-CCA outperforms most other CCA extensions. However, ml-CCA is a linear method, whereas the correlation between data of different modalities is often nonlinear, and this limits the performance of ml-CCA.
In general an image may belong to several classes, i.e., one image usually corresponds to several labels, so considering multi-label information better matches reality. At the same time, the correlation between modalities is not simply linear. Existing methods, however, either do not use the high-level semantic information carried by multiple labels or cannot mine the nonlinear relations between modalities. The references are as follows:
[1] V. Mahadevan et al., "Cluster canonical correlation analysis," in AISTATS, 2014.
[2] V. Ranjan, N. Rasiwasia, and C. V. Jawahar, "Multi-label cross-modal retrieval," in 2015 IEEE International Conference on Computer Vision (ICCV), Dec. 2015, pp. 4094-4102.
Summary of the invention
The present invention builds on kernel canonical correlation analysis and multi-label information to propose a new cross-modal retrieval framework: the multi-label kernel canonical correlation analysis (ml-KCCA) retrieval method. Kernel canonical correlation analysis is introduced first, before the technical solution is described.
KCCA is the kernelized version of CCA. Given two views of the data, kernel CCA can construct a common representation of them. Formally, we are given samples of paired text and visual-image data $\{(t_1, p_1), \ldots, (t_N, p_N)\}$, where $t \in \mathbb{R}^t$ and $p \in \mathbb{R}^p$ denote the feature vectors of the text and visual modalities of a sample. The kernel functions of the two feature spaces (Gaussian kernels in this method) are $k_t(t_i, t_j) = \phi_t(t_i)^T \phi_t(t_j)$ and $k_p(p_i, p_j) = \phi_p(p_i)^T \phi_p(p_j)$. Here $t_i, t_j, p_i, p_j$ are sample points in the data space, $\phi_t$ and $\phi_p$ are the mappings from the data spaces to the feature spaces, $i = 1, 2, \ldots, N$, $j = 1, 2, \ldots, N$, and $T$ denotes transposition.
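The text says only that Gaussian kernels are used, so the minimal numpy sketch below builds an $N \times N$ kernel matrix such as $K_t$ under stated assumptions. Samples are stored column-wise, as in $T_w$ and $P$. The bandwidth name `width` and the $2\,\mathrm{width}^2$ scaling are assumptions, as is the function name `gaussian_kernel_matrix`; none of them is taken from the patent.

```python
import numpy as np


def gaussian_kernel_matrix(X, Y, width):
    """Gaussian kernel between the columns of X (d x n) and of Y (d x m).

    Returns the n x m matrix exp(-||x_i - y_j||^2 / (2 * width^2)).
    The 2 * width^2 scaling is an assumed convention, not from the patent.
    """
    sq_x = np.sum(X * X, axis=0)[:, None]   # ||x_i||^2 as a column
    sq_y = np.sum(Y * Y, axis=0)[None, :]   # ||y_j||^2 as a row
    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x^T y, clipped to avoid tiny negatives
    sq_dist = np.maximum(sq_x + sq_y - 2.0 * X.T @ Y, 0.0)
    return np.exp(-sq_dist / (2.0 * width ** 2))


# N = 4 text samples of dimension d_t = 3, stored column-wise like T_w
rng = np.random.default_rng(0)
T_w = rng.normal(size=(3, 4))
K_t = gaussian_kernel_matrix(T_w, T_w, width=1.0)
print(K_t.shape)  # (4, 4)
```

The same function applied to the image matrix $P$ would give $K_p$.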
The kernelized objective function can be written in the form of formula (1). It determines the projection vectors $\alpha, \beta \in \mathbb{R}^N$ that maximize the canonical correlation:

$$\rho^* = \max_{\alpha,\beta} \frac{\alpha^T K_t K_p \beta}{\sqrt{\alpha^T K_t^2 \alpha \cdot \beta^T K_p^2 \beta}} \qquad (1)$$

Here $\rho^*$ is the correlation coefficient, max denotes maximization, and $K_t = (k_t(t_i,t_j))_{N\times N}$ and $K_p = (k_p(p_i,p_j))_{N\times N}$ are the $N \times N$ kernel matrices of the N sample pairs. The problem can be converted into an eigenvalue problem whose eigenvectors are $[\alpha\ \beta]^T$. The D largest eigenvalues yield the corresponding series $(\alpha_1,\beta_1), (\alpha_2,\beta_2), \ldots, (\alpha_D,\beta_D)$, which are used to compute the D-dimensional projection of a new input text t or visual image p. In its original form KCCA cannot use label information. Consequently, α and β do not reflect the labels and are not sufficient to solve the cross-modal retrieval task well.
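The ratio inside formula (1) can be evaluated directly for given α and β. Because $K_t$ and $K_p$ are symmetric, it equals the cosine between $u = K_t\alpha$ and $v = K_p\beta$, which are the projections of the N training samples. The minimal sketch below assumes the kernel matrices are used as given (centred beforehand if centring is wanted). The name `kcca_correlation` is hypothetical.

```python
import numpy as np


def kcca_correlation(K_t, K_p, alpha, beta):
    """Value of the ratio in formula (1) for given coefficient vectors.

    u = K_t @ alpha and v = K_p @ beta are the projections of the N
    training samples, so the ratio equals u.v / (||u|| ||v||).
    """
    u = K_t @ alpha
    v = K_p @ beta
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))
K = X @ X.T                      # a 5 x 5 positive semi-definite kernel matrix
a = rng.normal(size=5)
print(round(kcca_correlation(K, K, a, a), 6))  # 1.0
```

If $K_t$ is invertible, choosing $\alpha = K_t^{-1}K_p\beta$ makes $u = v$ and hence $\rho = 1$ for any β. This is why practical KCCA implementations usually add a regularization term; the text here does not mention one.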
ml-KCCA uses the high-level semantic information carried by multiple labels to learn a more discriminative shared subspace of the different modalities, one better suited to cross-modal retrieval. It also mines the nonlinear correlation between the modalities with kernel methods. In addition, ml-KCCA uses incomplete Cholesky decomposition to speed up the solution of the KCCA eigenvalue problem. The specific technical solution is set out below.
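The text states only that incomplete Cholesky decomposition accelerates the eigenvalue solution. It does not say how the factor is used inside ml-KCCA. The sketch below shows the standard greedy, pivoted variant, which returns a low-rank factor $G$ with $K \approx GG^T$. The function name and the `tol` threshold are assumptions.

```python
import numpy as np


def incomplete_cholesky(K, max_rank, tol=1e-10):
    """Pivoted incomplete Cholesky: returns G (N x r) with K ~= G @ G.T.

    At each step the sample with the largest residual diagonal is chosen
    as pivot; the loop stops at max_rank columns or when every residual
    diagonal entry is below tol.
    """
    n = K.shape[0]
    G = np.zeros((n, max_rank))
    residual = np.array(np.diag(K), dtype=float)   # diag(K - G G^T)
    for j in range(max_rank):
        i = int(np.argmax(residual))
        if residual[i] <= tol:
            return G[:, :j]
        pivot = np.sqrt(residual[i])
        # New column: remove contributions of earlier columns, then scale
        G[:, j] = (K[:, i] - G[:, :j] @ G[i, :j]) / pivot
        residual -= G[:, j] ** 2
    return G


rng = np.random.default_rng(2)
X = rng.normal(size=(6, 3))
K = X @ X.T                      # rank-3 kernel matrix of 6 samples
G = incomplete_cholesky(K, max_rank=6)
print(G.shape)
```

For a kernel matrix whose numerical rank r is much smaller than N, working with the N x r factor instead of the full N x N matrix is what makes the later eigenvalue computation cheaper.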
A multi-label kernel canonical correlation analysis retrieval method, comprising the following steps:
(S1) Select text and visual images, construct paired data of text, visual images and labels, and select samples of the paired data.
The samples of paired data are written as $\{(t_1,p_1,z_1),\ldots,(t_i,p_i,z_i),\ldots,(t_N,p_N,z_N)\}$, where $z_i$ is the label vector of the i-th sample of the paired data, $i = 1, 2, \ldots, N$. The text samples form the matrix $T_w = [t_1, t_2, \ldots, t_N] \in \mathbb{R}^{d_t\times N}$, where $d_t$ is the dimension of a text sample. The visual-image samples form the matrix $P = [p_1, p_2, \ldots, p_N] \in \mathbb{R}^{d_p\times N}$, where $d_p$ is the dimension of a visual-image sample. The labels form the matrix $Z = [z_1, z_2, \ldots, z_N] \in \mathbb{R}^{C\times N}$. Several elements in each column of Z may be non-zero, i.e., a sample may carry several labels at once. C is the label dimension, and N, an integer, is the number of paired-data samples.
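The label matrix Z described in step (S1) can be built from per-sample label lists. In the minimal sketch below, each active label is assumed to be encoded as 1 (a multi-hot vector). The text only says that several entries may be non-zero, so this encoding and the helper name `label_matrix` are illustrative assumptions.

```python
import numpy as np


def label_matrix(label_lists, num_classes):
    """Build the C x N label matrix Z; column i is the label vector z_i.

    Each sample may carry several labels, so a column may hold several ones.
    """
    Z = np.zeros((num_classes, len(label_lists)))
    for i, labels in enumerate(label_lists):
        Z[list(labels), i] = 1.0
    return Z


# Three samples over C = 3 classes; the last one carries all three labels
Z = label_matrix([[0, 2], [1], [0, 1, 2]], num_classes=3)
```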
(S2) Compute the semantic similarity matrix of the labels. Let f(·) be a function that computes the similarity between any two label vectors. The semantic similarity matrix S is then

$$S = \big(f(z_i, z_j)\big)_{N\times N}$$
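The matrix S can be filled by evaluating f on every pair of label columns. The minimal sketch below uses, as one choice of f, the dot-product measure that the description gives further on. It assumes that measure is the dot product normalized by both vector norms (cosine similarity). The names `similarity_matrix` and `cosine_similarity` are hypothetical.

```python
import numpy as np


def similarity_matrix(Z, f):
    """S[i, j] = f(z_i, z_j) over the columns of the C x N label matrix Z."""
    N = Z.shape[1]
    S = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            S[i, j] = f(Z[:, i], Z[:, j])
    return S


def cosine_similarity(z_i, z_j):
    """Dot product normalized by the vector norms; 0 for an all-zero vector."""
    denom = np.linalg.norm(z_i) * np.linalg.norm(z_j)
    return float(z_i @ z_j / denom) if denom > 0 else 0.0


Z = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 0, 1]], dtype=float)
S = similarity_matrix(Z, cosine_similarity)
```

Samples that share no label get similarity 0, and samples with identical label sets get 1.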
(S3) Apply the semantic similarity matrix to kernel canonical correlation analysis to obtain the multi-modal shared subspace.
To learn the common multi-modal shared subspace, ml-KCCA is formulated as an objective (equation (3)) with the following terms:
- ρ is the correlation coefficient.
- Two further N × N kernel matrices are obtained by processing the N paired samples with the multi-label information.
- $K_t$ and $K_p$ are the N × N kernel matrices of the N sample pairs.
- η is a coefficient that controls the influence of the semantic similarity matrix.

Using the matrices A and B, solving for α and β is converted, as in the KCCA case, into the eigenvalue problem

$$B^{-1} A\, w = \lambda w \qquad (4)$$

where λ is the eigenvalue and $w = [\alpha\ \beta]^T$. The D largest eigenvalues yield the corresponding series of vectors $(\alpha_1,\beta_1),\ldots,(\alpha_D,\beta_D)$.
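The explicit forms of A and B for ml-KCCA are not reproduced in this text. The minimal sketch below therefore takes them as inputs and solves formula (4) literally: it forms $B^{-1}A$, keeps the D largest eigenvalues, and splits each eigenvector into its α half and its β half. The toy A and B are placeholders, not the patent's matrices. The name `top_d_projections` is hypothetical.

```python
import numpy as np


def top_d_projections(A, B, D):
    """Solve B^{-1} A w = lambda w (formula (4)) and keep the D largest eigenvalues.

    A and B are 2N x 2N; each eigenvector w = [alpha; beta] is split into
    its first N entries (alpha, text side) and last N entries (beta, image side).
    """
    vals, vecs = np.linalg.eig(np.linalg.solve(B, A))
    order = np.argsort(-vals.real)[:D]          # indices of the D largest eigenvalues
    N = A.shape[0] // 2
    pairs = [(vecs[:N, k].real, vecs[N:, k].real) for k in order]
    return vals.real[order], pairs


# Toy check with N = 2, B = I and a diagonal A (placeholder values)
A = np.diag([3.0, 1.0, 2.0, 0.5])
B = np.eye(4)
vals, pairs = top_d_projections(A, B, D=2)
```

When A and B are symmetric and B is positive definite, a symmetric generalized eigensolver is the more stable choice. The direct form above simply mirrors formula (4).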
Once $(\alpha_1,\beta_1),\ldots,(\alpha_D,\beta_D)$ have been found, the text and visual-image features can be represented in the multi-modal shared subspace. A new text input $t_x$ is projected onto the direction specified by α by evaluating the weighted kernel function between the input and the N sample points:

$$\phi_t(t_x)^T \sum_{i=1}^{N} \alpha_i\, \phi_t(t_i) = \sum_{i=1}^{N} \alpha_i\, k_t(t_i, t_x)$$

where $\alpha_i$ is the i-th element of vector α and $t_i$ is the i-th of the N sample data.
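(S4) Obtain the projected representations of visual images and text in the multi-modal shared subspace.
The final projection M of a new text $t_x$ into the D-dimensional common subspace is

$$M = \Big[\sum_{i=1}^{N} \alpha_i^{1} k_t(t_i,t_x),\ \ldots,\ \sum_{i=1}^{N} \alpha_i^{D} k_t(t_i,t_x)\Big]^T$$

where $\alpha_i^{1}$ denotes the i-th element of vector $\alpha_1$, and likewise for $\alpha_2, \ldots, \alpha_D$.
Similarly, the final projection Q of a new visual image $p_x$ into the D-dimensional common subspace is

$$Q = \Big[\sum_{i=1}^{N} \beta_i^{1} k_p(p_i,p_x),\ \ldots,\ \sum_{i=1}^{N} \beta_i^{D} k_p(p_i,p_x)\Big]^T$$

where $\beta_i^{1}$ denotes the i-th element of vector $\beta_1$, and likewise for $\beta_2, \ldots, \beta_D$.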
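Projections M and Q are both a coefficient matrix applied to a vector of kernel values between the new sample and the N training samples. The minimal sketch below assumes a Gaussian kernel with a $2\,\mathrm{width}^2$ scaling (an assumption). The coefficient values are random placeholders. The function name `project` is hypothetical.

```python
import numpy as np


def project(coeffs, X_train, x_new, width):
    """D-dimensional projection [sum_i c^d_i k(x_i, x_new)]_{d=1..D}.

    coeffs:  N x D matrix whose d-th column is alpha_d (text) or beta_d (image)
    X_train: dim x N matrix of training samples (T_w or P)
    x_new:   new sample of length dim (t_x or p_x)
    A Gaussian kernel with an assumed 2 * width^2 scaling is used.
    """
    sq_dist = np.sum((X_train - x_new[:, None]) ** 2, axis=0)
    k = np.exp(-sq_dist / (2.0 * width ** 2))   # k(x_i, x_new), i = 1..N
    return coeffs.T @ k                          # M (text) or Q (image)


rng = np.random.default_rng(3)
T_w = rng.normal(size=(5, 4))        # d_t = 5, N = 4
alphas = rng.normal(size=(4, 2))     # D = 2 hypothetical coefficient vectors
M = project(alphas, T_w, T_w[:, 0], width=1.0)
```

The same function gives Q when called with the β coefficients, the image matrix P and a new image $p_x$.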
(S5) Perform retrieval to obtain cross-modal subspace retrieval results. For image-to-text retrieval, a new visual image is mapped into the subspace by Q, and similarity retrieval is then performed. For text-to-image retrieval, a new text is mapped into the subspace by M, and similarity retrieval is then performed.
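Step (S5) does not name the similarity measure used in the subspace. The minimal sketch below assumes cosine similarity between the query projection and the projections of the items of the other modality, and returns the indices of the best matches. The function name `retrieve` is hypothetical.

```python
import numpy as np


def retrieve(query, gallery, top_k):
    """Indices of the top_k gallery rows most similar to query (cosine similarity).

    query:   D-dimensional projection of the query (M for text, Q for an image)
    gallery: n x D projections of the items of the other modality
    """
    q = query / np.linalg.norm(query)
    G = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    scores = G @ q
    return np.argsort(-scores)[:top_k]


gallery = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(retrieve(np.array([1.0, 0.1]), gallery, top_k=3))  # [0 2 1]
```

Euclidean distance in the subspace would be an equally plausible choice; the text does not settle it.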
Further, the function f(·) for computing similarity is a dot-product-based similarity measure:

$$f(z_i, z_j) = \frac{\langle z_i, z_j\rangle}{\|z_i\|\,\|z_j\|}$$

where ⟨·,·⟩ denotes the dot product, ‖·‖ denotes the norm (modulus), $i = 1, 2, \ldots, N$, $j = 1, 2, \ldots, N$, and $z_j$ is the label vector of the j-th sample of the paired data.
Further, the function f(·) for computing similarity can instead be a similarity measure based on the exponential of the squared 2-norm distance between $z_i$ and $z_j$. Here σ is a constant and $\|\cdot\|_2$ denotes the 2-norm.
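The exact formula for this measure is not reproduced in the text. The minimal sketch below assumes the common form $f(z_i, z_j) = \exp(-\|z_i - z_j\|_2^2 / \sigma^2)$. Whether σ enters as $\sigma^2$ or $2\sigma^2$ is an assumption, and the function name is hypothetical.

```python
import numpy as np


def exp_square_similarity_matrix(Z, sigma):
    """S[i, j] = exp(-||z_i - z_j||_2^2 / sigma^2) over the columns of Z (C x N).

    The exact placement of sigma (sigma^2 vs 2 * sigma^2) is an assumption.
    """
    diff = Z[:, :, None] - Z[:, None, :]         # C x N x N pairwise differences
    sq_dist = np.sum(diff ** 2, axis=0)          # N x N squared 2-norms
    return np.exp(-sq_dist / sigma ** 2)


Z = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 0, 1]], dtype=float)
S = exp_square_similarity_matrix(Z, sigma=1.0)
```

Unlike the dot-product measure, this one never reaches exactly 0, and σ controls how fast similarity decays as label sets differ. Fig. 3 studies the effect of σ.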
Beneficial effects of the invention: the invention uses the high-level semantic information that multimedia documents carry in the form of multiple labels, while also mining the nonlinear correlation between modalities with KCCA. It thereby learns a more discriminative common subspace for the different modalities that is better suited to cross-modal retrieval tasks. Retrieval in the learned subspace performs well and improves significantly on existing methods.
Description of the drawings
Fig. 1 is a flow chart of the method of the invention;
Fig. 2 is a schematic diagram of the ml-KCCA retrieval method of the invention;
Fig. 3 shows the influence of the parameters η and σ on the ml-KCCA model. The evaluation metric is Precision@10 (shown as P@10 in the figure), the proportion of the top ten returned results that are relevant to the query.
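Precision@10 as defined above can be computed from a ranked list of relevance flags. For multi-label data, relevance is often taken to mean sharing at least one label with the query. The text does not define relevance, so the `shares_label` rule and both function names are hypothetical.

```python
def precision_at_k(relevant_flags, k=10):
    """P@k: fraction of the first k returned results that are relevant.

    relevant_flags: results in ranked order, True if relevant to the query.
    Divides by k even if fewer than k results are returned (an assumed convention).
    """
    return sum(bool(r) for r in relevant_flags[:k]) / k


def shares_label(z_query, z_item):
    """Hypothetical relevance rule: relevant if at least one label is shared."""
    return any(a and b for a, b in zip(z_query, z_item))


ranked = [True, False, True, True, False, False, False, True, False, False, True]
print(precision_at_k(ranked))  # 0.4
```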
Detailed description of the embodiments
The invention is further explained below with reference to the drawings and embodiments.
Fig. 1 shows the flow chart of the invention. The main steps are:
(S1) select text and visual images, construct paired data of text, visual images and labels, and select samples of the paired data;
(S2) compute the semantic similarity matrix of the labels;
(S3) apply the semantic similarity matrix to kernel canonical correlation analysis to obtain the multi-modal shared subspace;
(S4) obtain the projected representations of visual images and text in the multi-modal shared subspace;
(S5) perform retrieval to obtain cross-modal subspace retrieval results: for image-to-text retrieval, map a new visual image into the subspace by Q and then perform similarity retrieval; for text-to-image retrieval, map the text into the subspace by M and then perform similarity retrieval.
Fig. 2 is a schematic diagram of the ml-KCCA retrieval method of the invention. Triangles and squares denote data points of the visual-image and text modalities, and the symbols "+", "-", "x" and "÷" denote different class labels. Panel (a) shows text and visual-image instances mapped from their respective feature spaces into the common subspace learned with ml-KCCA. Panel (b) shows that instances with similar labels lie closer together in the common subspace learned by ml-KCCA. Panel (c) is an example of bidirectional cross-modal retrieval: after text and images are mapped into the learned subspace, a text query retrieves images more accurately, and vice versa.
Fig. 3 shows experimental results on how the parameters η and σ influence the model. The results show that ml-KCCA outperforms KCCA except in two extreme cases.
Table 1 compares ml-KCCA with other CCA-based retrieval methods on the Pascal dataset using the MAP (mean average precision) metric. The method of the invention performs best in most cases. In Table 1, "Image annotation" denotes image-to-text retrieval and "Image retrieval" denotes text-to-image retrieval.
Table 1. Retrieval performance (MAP) of the method of the invention compared with prior-art methods
Method | Image annotation | Image retrieval |
---|---|---|
KGMMFA | 42.1 | 32.8 |
KGMLDA | 42.7 | 33.9 |
LCFS | 34.4 | 26.7 |
LGCFL | 37.8 | 32.9 |
ml-CCA | 48.4 | 38.0 |
ml-KCCA | 50.91 | 41.17 |
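MAP, the metric in Table 1, averages the per-query average precision (AP) over all queries. The text does not state the exact evaluation protocol, such as a rank cutoff or how multi-label relevance is decided. The minimal sketch below therefore uses one common convention: AP is averaged over the relevant positions found in the ranked list. The function names are hypothetical.

```python
def average_precision(relevant_flags):
    """AP of one ranked list: mean of precision@rank over the relevant positions."""
    hits, total = 0, 0.0
    for rank, relevant in enumerate(relevant_flags, start=1):
        if relevant:
            hits += 1
            total += hits / rank      # precision at this relevant position
    return total / hits if hits else 0.0


def mean_average_precision(rankings):
    """MAP: average of AP over all queries."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Two toy queries; AP = (1/1 + 2/3) / 2 and AP = (1/2) / 1
print(round(mean_average_precision([[1, 0, 1], [0, 1]]), 4))  # 0.6667
```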
The foregoing is merely one embodiment of the invention, and the invention is not limited to it. Minor local changes may be made during implementation. Any change or modification of the invention that does not depart from its spirit and scope falls within the scope of the claims of the invention and their technical equivalents.
Claims (3)
1. A multi-label kernel canonical correlation analysis retrieval method, characterized by comprising the following steps:
(S1) selecting text and visual images, constructing paired data of text, visual images and labels, and selecting samples of the paired data;
the samples of paired data are written as $\{(t_1,p_1,z_1),\ldots,(t_i,p_i,z_i),\ldots,(t_N,p_N,z_N)\}$, where $z_i$ is the label vector of the i-th sample of the paired data; $T_w = [t_1, t_2, \ldots, t_N] \in \mathbb{R}^{d_t\times N}$ is the matrix of text samples and $d_t$ is the dimension of a text sample; $P = [p_1, p_2, \ldots, p_N] \in \mathbb{R}^{d_p\times N}$ is the matrix of visual-image samples and $d_p$ is the dimension of a visual-image sample; $Z = [z_1, z_2, \ldots, z_N] \in \mathbb{R}^{C\times N}$ is the label matrix, C is the label dimension, and N is the number of paired-data samples;
(S2) computing the semantic similarity matrix of the labels: letting f(·) be a function that computes the similarity between any two label vectors, the semantic similarity matrix S is

$$S = \big(f(z_i, z_j)\big)_{N\times N}$$
(S3) applying the semantic similarity matrix to kernel canonical correlation analysis to obtain the multi-modal shared subspace;
to learn the common multi-modal shared subspace, ml-KCCA is formulated as an objective (equation (3)) in which ρ is the correlation coefficient, $K_t$ and $K_p$ are the N × N kernel matrices of the N sample pairs, η is a coefficient controlling the influence of the semantic similarity matrix, and α, β are the projection vectors;
using the matrices A and B, solving for α and β is converted into the eigenvalue problem

$$B^{-1} A\, w = \lambda w \qquad (4)$$

where λ is the eigenvalue and $w = [\alpha\ \beta]^T$; the D largest eigenvalues yield the corresponding series of vectors $(\alpha_1,\beta_1),\ldots,(\alpha_D,\beta_D)$;
from $(\alpha_1,\beta_1),\ldots,(\alpha_D,\beta_D)$, a new text input $t_x$ is projected onto the direction specified by α:

$$\phi_t(t_x)^T \sum_{i=1}^{N} \alpha_i\, \phi_t(t_i) = \sum_{i=1}^{N} \alpha_i\, k_t(t_i, t_x)$$

where $\alpha_i$ is the i-th element of vector α, $t_i$ is the i-th of the N sample data, $\phi_t$ is the mapping from the data space to the feature space, and $k_t$ is the kernel function of the feature space;
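(S4) obtaining the projected representations of visual images and text in the multi-modal shared subspace;
the final projection M of a new text $t_x$ into the D-dimensional common subspace is

$$M = \Big[\sum_{i=1}^{N} \alpha_i^{1} k_t(t_i,t_x),\ \ldots,\ \sum_{i=1}^{N} \alpha_i^{D} k_t(t_i,t_x)\Big]^T$$

and the final projection Q of a new visual image $p_x$ into the D-dimensional common subspace is

$$Q = \Big[\sum_{i=1}^{N} \beta_i^{1} k_p(p_i,p_x),\ \ldots,\ \sum_{i=1}^{N} \beta_i^{D} k_p(p_i,p_x)\Big]^T$$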
(S5) performing retrieval to obtain cross-modal subspace retrieval results, specifically: for image-to-text retrieval, mapping a new visual image into the subspace by Q and then performing similarity retrieval; for text-to-image retrieval, mapping the text into the subspace by M and then performing similarity retrieval.
2. The multi-label kernel canonical correlation analysis retrieval method according to claim 1, characterized in that the function f(·) for computing similarity is a dot-product-based similarity measure:

$$f(z_i, z_j) = \frac{\langle z_i, z_j\rangle}{\|z_i\|\,\|z_j\|}$$

where ⟨·,·⟩ denotes the dot product, ‖·‖ denotes the norm (modulus), $i = 1, 2, \ldots, N$, $j = 1, 2, \ldots, N$, $z_i$ is the label vector of the i-th sample of the paired data, and $z_j$ is the label vector of the j-th sample of the paired data.
3. The multi-label kernel canonical correlation analysis retrieval method according to claim 1, characterized in that the function f(·) for computing similarity is a similarity measure based on the exponential of the squared 2-norm distance between label vectors, where σ is a constant, $i = 1, 2, \ldots, N$, $j = 1, 2, \ldots, N$, $\|\cdot\|_2$ denotes the 2-norm, $z_i$ is the label vector of the i-th sample of the paired data, and $z_j$ is the label vector of the j-th sample of the paired data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710158859.9A CN106951509B (en) | 2017-03-17 | 2017-03-17 | Multi-tag coring canonical correlation analysis search method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106951509A CN106951509A (en) | 2017-07-14 |
CN106951509B true CN106951509B (en) | 2019-08-09 |
Family ID: 59472070
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110209145B (en) * | 2019-05-16 | 2020-09-11 | 浙江大学 | Carbon dioxide absorption tower fault diagnosis method based on nuclear matrix approximation |
CN111930972B (en) * | 2020-08-04 | 2021-04-27 | 山东大学 | Cross-modal retrieval method and system for multimedia data by using label level information |
CN113361198B (en) * | 2021-06-09 | 2023-11-03 | 南京大学 | Crowd-sourced test report fusion method based on public and private information mining |
CN115599984B (en) * | 2022-09-09 | 2023-06-09 | 北京理工大学 | Retrieval method |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102521368A (en) * | 2011-12-16 | 2012-06-27 | 武汉科技大学 | Similarity matrix iteration based cross-media semantic digesting and optimizing method |
CN103995903A (en) * | 2014-06-12 | 2014-08-20 | 武汉科技大学 | Cross-media search method based on isomorphic subspace mapping and optimization |
CN104166982A (en) * | 2014-06-30 | 2014-11-26 | 复旦大学 | Image optimization clustering method based on typical correlation analysis |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9613118B2 (en) * | 2013-03-18 | 2017-04-04 | Spotify Ab | Cross media recommendation |
Non-Patent Citations (1)
Title |
---|
Semantically-enhanced kernel canonical correlation analysis: a multi-label cross-modal retrieval; Yuhua Jia et al.; Multimedia Tools and Applications; 2018-02-27; pp. 1-20 * |
Also Published As
Publication number | Publication date |
---|---|
CN106951509A (en) | 2017-07-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Yao et al. | Sensing spatial distribution of urban land use by integrating points-of-interest and Google Word2Vec model | |
CN106649715B (en) | A kind of cross-media retrieval method based on local sensitivity hash algorithm and neural network | |
CN106951509B (en) | Multi-tag coring canonical correlation analysis search method | |
CN111506714A (en) | Knowledge graph embedding based question answering | |
CN103559191B (en) | Based on latent space study and Bidirectional sort study across media sort method | |
CN103617157A (en) | Text similarity calculation method based on semantics | |
CN111325243B (en) | Visual relationship detection method based on regional attention learning mechanism | |
CN105930873B (en) | A kind of walking across mode matching method certainly based on subspace | |
CN113705570B (en) | Deep learning-based few-sample target detection method | |
CN103995903A (en) | Cross-media search method based on isomorphic subspace mapping and optimization | |
CN105320764A (en) | 3D model retrieval method and 3D model retrieval apparatus based on slow increment features | |
CN113705218A (en) | Event element gridding extraction method based on character embedding, storage medium and electronic device | |
CN111666766A (en) | Data processing method, device and equipment | |
Maher et al. | Effectiveness of different similarity measures for text classification and clustering | |
KR20120047622A (en) | System and method for managing digital contents | |
CN112131453A (en) | Method, device and storage medium for detecting network bad short text based on BERT | |
CN105701227B (en) | A kind of across media method for measuring similarity and search method based on local association figure | |
Jia et al. | Semantically-enhanced kernel canonical correlation analysis: a multi-label cross-modal retrieval | |
CN113821702A (en) | Urban multidimensional space multivariate heterogeneous information data processing method | |
CN114491071A (en) | Food safety knowledge graph construction method and system based on cross-media data | |
CN107633259B (en) | Cross-modal learning method based on sparse dictionary representation | |
Mandal et al. | Query specific re-ranking for improved cross-modal retrieval | |
CN111177492A (en) | Cross-modal information retrieval method based on multi-view symmetric nonnegative matrix factorization | |
Li et al. | A novel relevance feedback method in content-based image retrieval | |
Patel et al. | A survey on context based similarity techniques for image retrieval |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |