CN106951509B

CN106951509B - Multi-label kernel canonical correlation analysis retrieval method

Info

Publication number: CN106951509B
Application number: CN201710158859.9A
Authority: CN
Inventors: 白亮; 贾玉华; 王昊冉; 郭金林; 谢毓湘; 于天元
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2017-03-17
Filing date: 2017-03-17
Publication date: 2019-08-09
Anticipated expiration: 2037-03-17
Also published as: CN106951509A

Abstract

The invention belongs to the technical field of computer retrieval, and in particular relates to a multi-label kernel canonical correlation analysis retrieval method. The main steps are: (S1) select text and visual images, construct paired data of text, visual image and label, and select samples of the paired data; (S2) compute the semantic similarity matrix of the labels; (S3) apply the semantic similarity matrix to kernel canonical correlation analysis to find the multi-modal shared subspace; (S4) find the projection representations of the visual image and of the text in the multi-modal shared subspace; (S5) perform retrieval to obtain the cross-modal retrieval results in the subspace. By using the high-level semantic information that multimedia documents carry in multi-label form, while using KCCA to mine the nonlinear correlation between different modalities, the invention learns a more discriminative common subspace of the different modalities that is better suited to cross-modal retrieval tasks, and achieves good retrieval performance.

Description

Multi-label kernel canonical correlation analysis retrieval method

Technical field

The invention belongs to the technical field of computer retrieval, and in particular relates to a multi-label kernel canonical correlation analysis retrieval method.

Background art

Cross-modal information retrieval is a challenging research topic in which the query and the results belong to different modalities; it is the mutual retrieval between pieces of multi-modal information, such as retrieving text with an image or retrieving images with text. Because of the "semantic gap", data from different modalities cannot be compared directly. The key problem in this task is therefore how to measure the distance or similarity between multiple modalities. The prior art aligns the two feature spaces by learning a shared subspace, so that different modalities can be compared. Among conventional methods, canonical correlation analysis (CCA) [1] has shown its simplicity and efficiency; it learns the shared subspace by maximizing the correlation between the projections of the two modalities. CCA has become the workhorse of many cross-modal retrieval methods, and many extensions of CCA have been proposed for cross-modal retrieval in recent years.

Although CCA is popular because of its simplicity and efficiency, it has several shortcomings. CCA depends on a one-to-one pairing between the modalities and does not use the high-level semantic label information present in multimedia documents, so it cannot obtain a subspace that is well suited to cross-modal retrieval tasks. Recently, some extensions of CCA that use label information have been proposed. However, most of this work applies only to multimedia documents annotated with a single label. In general, an image may belong to several classes, so assuming that the data are annotated with a single label is unreasonable and means that the label information is not exploited to the greatest extent. It is therefore closer to reality to take multi-label information into account when mining the correlation between data from different modalities.

Canonical correlation analysis was first proposed by Hotelling and is a data analysis method for finding a subspace of multiple data spaces. However, classical CCA ignores the additional high-level semantic information, which limits its performance. Rasiwasia et al. proposed cluster-CCA [1] for single-label data sets. Viresh Ranjan et al. proposed, in document [2], multi-label canonical correlation analysis (ml-CCA), which takes multi-label information into account. As an extension of CCA, ml-CCA outperforms most other CCA extensions because it considers multi-label information. However, ml-CCA is a linear method, whereas the correlation between data of different modalities is often nonlinear, which limits the performance of ml-CCA.

In general, an image may belong to several classes, that is, one image usually corresponds to several labels, so taking multi-label information into account is more realistic. At the same time, the correlation between modalities is not simply a linear relationship. Existing methods, however, either do not use the high-level semantic information available in multi-label form or cannot mine the nonlinear relationship between modalities. The references are as follows:

[1] Vijay Mahadevan, "Cluster canonical correlation analysis," AISTATS, 2014.

[2] V. Ranjan, N. Rasiwasia, and C. V. Jawahar, "Multilabel cross-modal retrieval," in 2015 IEEE International Conference on Computer Vision (ICCV), Dec 2015, pp. 4094–4102.

Summary of the invention

Based on kernel canonical correlation analysis and multi-label information, the present invention proposes a new framework for cross-modal retrieval: the multi-label kernel canonical correlation analysis (ml-KCCA) retrieval method. Before the technical solution of the invention is described, kernel canonical correlation analysis (KCCA) is introduced. KCCA is the kernelized version of CCA. Given two views of the data, a common representation of them can be constructed by kernel CCA. Formally, given N samples of paired text and visual image data {(t₁,p₁),...,(t_N,p_N)}, where t ∈ R^t and p ∈ R^p denote the feature vectors of the text modality and the visual modality of a sample, respectively, kernel functions are given for the two feature spaces (Gaussian kernel functions in this method): k_t(t_i,t_j)=φ_t(t_i)^Tφ_t(t_j) and k_p(p_i,p_j)=φ_p(p_i)^Tφ_p(p_j), where t_i, t_j, p_i, p_j are sample points in the data space, φ_t and φ_p are the mapping functions from the data space to the feature space, i=1,2,...,N, j=1,2,...,N, and T denotes transposition.

The objective function in kernelized form can be shown to take the form of formula (1), which determines the projection vectors α, β ∈ R^N that maximize the canonical correlation:

ρ^* = max_{α,β} (α^T K_t K_p β) / sqrt(α^T K_t² α · β^T K_p² β) (1)

where ρ^* is the correlation coefficient, K_t=(k_t(t_i,t_j))_N×N and K_p=(k_p(p_i,p_j))_N×N denote the N × N kernel matrices of the N sample pairs, and max denotes maximization. The problem can be converted into an eigenvalue problem and solved, with [α β]^T as the eigenvector. From the D largest eigenvalues, the corresponding series (α¹,β¹),(α²,β²),...,(α^D,β^D) can be found and used to compute the D-dimensional projection of a new input text t or visual image p. In its original form, KCCA cannot exploit label information. Consequently, α and β cannot use label information either, and they are not sufficient to solve the cross-modal retrieval task well.
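As an illustration only, the N × N kernel matrices K_t and K_p described above could be computed with a Gaussian kernel as in the following sketch. The bandwidth parameter `gamma`, the function name and the random stand-in features are assumptions made for the example; the samples are stored as rows here, whereas the text stores them as columns.

```python
import numpy as np

def gaussian_kernel_matrix(X, Y=None, gamma=1.0):
    """Return K with K[i, j] = exp(-gamma * ||x_i - y_j||^2).

    X has shape (N, d) and Y has shape (M, d); rows are samples.
    `gamma` is a hypothetical bandwidth parameter, not a value from the text.
    """
    X = np.asarray(X, dtype=float)
    Y = X if Y is None else np.asarray(Y, dtype=float)
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives caused by round-off
    return np.exp(-gamma * sq_dists)

# Random stand-ins for N = 5 paired text and image feature vectors.
rng = np.random.default_rng(0)
texts = rng.normal(size=(5, 3))
images = rng.normal(size=(5, 4))
K_t = gaussian_kernel_matrix(texts)
K_p = gaussian_kernel_matrix(images)
```

Both matrices are symmetric with a unit diagonal, since every sample has zero distance to itself.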

ml-KCCA uses the high-level semantic information available in multi-label form to learn a more discriminative shared subspace of the different modalities that is better suited to cross-modal retrieval tasks, and it mines the nonlinear relationship between the modalities through kernel-based methods. In addition, ml-KCCA uses incomplete Cholesky decomposition to speed up the solution of the KCCA eigenvalue problem. The specific technical solution is as follows:
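The text does not spell out how the incomplete Cholesky decomposition is carried out. The following is a minimal sketch of one common variant, a pivoted incomplete Cholesky factorization that approximates a kernel matrix K by a low-rank product G G^T. The stopping rule, the tolerance `tol` and the function name are assumptions for illustration.

```python
import numpy as np

def incomplete_cholesky(K, tol=1e-6, max_rank=None):
    """Pivoted incomplete Cholesky: return G (N x r) such that K ≈ G @ G.T.

    Stops when the largest remaining diagonal residual is at most `tol`
    (a hypothetical stopping rule chosen for this sketch).
    """
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    max_rank = n if max_rank is None else min(max_rank, n)
    d = np.diag(K).copy()            # diagonal of the residual K - G G^T
    G = np.zeros((n, max_rank))
    rank = 0
    for j in range(max_rank):
        pivot = int(np.argmax(d))
        if d[pivot] <= tol:
            break
        # New column: residual of the pivot column, scaled by the pivot.
        G[:, j] = (K[:, pivot] - G[:, :j] @ G[pivot, :j]) / np.sqrt(d[pivot])
        d = np.maximum(d - G[:, j] ** 2, 0.0)
        rank = j + 1
    return G[:, :rank]

# Demo on a rank-3 positive semi-definite matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
K_demo = X @ X.T
G = incomplete_cholesky(K_demo)
```

Working with the N × r factor G instead of the full N × N matrix is what reduces the cost of the subsequent eigenvalue problem when r is much smaller than N.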

A multi-label kernel canonical correlation analysis retrieval method, comprising the following steps:

(S1) Select text and visual images, construct paired data of text, visual image and label, and select samples of the paired data;

The samples of the paired data are expressed as {(t₁,p₁,z₁),...,(t_i,p_i,z_i),...,(t_N,p_N,z_N)}, where z_i is the label vector of the i-th sample of the paired data, i=1,2,...,N. T_w=[t₁,t₂,...,t_N]∈R^dt×N, where T_w is the matrix representation of the text samples and dt is the dimension of a text sample. P=[p₁,p₂,...,p_N]∈R^dp×N, where P is the matrix representation of the visual image samples and dp is the dimension of a visual image sample. Z=[z₁,z₂,...,z_N]∈R^C×N, where Z is the label matrix; several elements in each column of Z may be non-zero, that is, several labels may be present at the same time. C is the dimension of the labels, and N is the number of samples of the paired data, N being an integer.
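To make the layout of the label matrix concrete, the short sketch below builds a C × N matrix Z in which each column may contain several non-zero entries. The label index lists are hypothetical values invented for the example.

```python
import numpy as np

C = 4                                    # number of distinct labels
label_sets = [[0, 2], [1], [0, 1, 3]]    # hypothetical label indices of each sample
N = len(label_sets)

# Column i of Z is the label vector z_i; several entries per column may be 1.
Z = np.zeros((C, N))
for i, labels in enumerate(label_sets):
    Z[labels, i] = 1.0
```

Here the first sample carries labels 0 and 2 at the same time, which a single-label annotation could not express.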

(S2) Compute the semantic similarity matrix of the labels. Let f(·) be the function that computes the similarity between any two label vectors; the semantic similarity matrix S is then S=(f(z_i,z_j))_N×N, that is, the entry in row i and column j of S is f(z_i,z_j).

(S3) Apply the semantic similarity matrix to kernel canonical correlation analysis to find the multi-modal shared subspace;

To learn the common multi-modal shared subspace, ml-KCCA is formulated as:

where ρ is the correlation coefficient; K_t and K_p denote the N × N kernel matrices of the N sample pairs, and the modified N × N kernel matrices appearing in the formulation are obtained by processing the N paired samples with the multi-label information; η is the coefficient that controls the influence of the semantic similarity matrix;

Following the same derivation as in the KCCA case, the process of solving for α and β is converted into an eigenvalue problem:

B^-1Aw=λ w (4)

where λ is an eigenvalue and w=[α β]^T. The series of vectors (α¹,β¹),...,(α^D,β^D) corresponding to the D largest eigenvalues is then found.
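The eigenvalue problem B^-1Aw=λw and the selection of the D largest eigenvalues can be sketched as follows. The sketch assumes that A is symmetric and B is symmetric positive definite, so that the problem can be reduced to a symmetric one. The matrices A and B built in the demo are those of plain regularized KCCA with a hypothetical regularizer `kappa`; they are stand-ins and not the ml-KCCA matrices of the invention.

```python
import numpy as np

def top_generalized_eigenvectors(A, B, D):
    """Return the D largest eigenvalues of B^-1 A w = lambda w and their eigenvectors.

    Assumes A is symmetric and B is symmetric positive definite, so the
    problem reduces to a symmetric one through the factorization B = L L^T.
    """
    L = np.linalg.cholesky(B)
    C = np.linalg.solve(L, np.linalg.solve(L, A).T)  # L^-1 A L^-T
    C = 0.5 * (C + C.T)                              # remove round-off asymmetry
    eigvals, V = np.linalg.eigh(C)                   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:D]            # indices of the D largest
    W = np.linalg.solve(L.T, V[:, order])            # back-transform: w = L^-T v
    return eigvals[order], W

# Stand-in matrices: plain regularized KCCA (an assumption for this demo).
rng = np.random.default_rng(0)
N, D, kappa = 6, 2, 0.1
Xt, Xp = rng.normal(size=(N, 3)), rng.normal(size=(N, 4))
K_t, K_p = Xt @ Xt.T, Xp @ Xp.T
zeros, eye = np.zeros((N, N)), np.eye(N)
A = np.block([[zeros, K_t @ K_p], [K_p @ K_t, zeros]])
R_t, R_p = K_t + kappa * eye, K_p + kappa * eye
B = np.block([[R_t @ R_t, zeros], [zeros, R_p @ R_p]])

lams, W = top_generalized_eigenvectors(A, B, D)
alphas, betas = W[:N, :], W[N:, :]   # column d holds (alpha^d, beta^d)
```

Splitting each eigenvector w into its first N and last N entries yields the pair (α, β), matching w=[α β]^T.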

Once (α¹,β¹),...,(α^D,β^D) have been found, the representations of the text and visual image features in the multi-modal shared subspace can be obtained. By evaluating the weighted kernel function between the input and the N sample points, a new text input t_x is projected onto the single direction specified by α:

Σ_{i=1}^{N} α_i k_t(t_i,t_x)

where α_i denotes the i-th element of the vector α, and t_i denotes the i-th of the N samples.

(S4) Find the projection representations of the visual image and of the text in the multi-modal shared subspace;

The final projection M of a new text t_x onto the D-dimensional common subspace is:

M = [Σ_{i=1}^{N} α_i¹ k_t(t_i,t_x), ..., Σ_{i=1}^{N} α_i^D k_t(t_i,t_x)]

where α_i¹ denotes the i-th element of the vector α¹;

Similarly, the final projection Q of a new visual image p_x onto the D-dimensional common subspace is:

Q = [Σ_{i=1}^{N} β_i¹ k_p(p_i,p_x), ..., Σ_{i=1}^{N} β_i^D k_p(p_i,p_x)]

where β_i¹ denotes the i-th element of the vector β¹;
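The projection of a new input onto the D-dimensional common subspace can be sketched as below, assuming that entry d of the projection is the kernel-weighted sum Σ_i α_i^d k_t(t_i,t_x) for text (and the analogous sum with β and k_p for images). The Gaussian bandwidth `gamma`, the function names and the random coefficient matrix are placeholders for illustration.

```python
import numpy as np

def gaussian_kernel_vector(samples, x, gamma=1.0):
    """Return the vector [k(sample_i, x)] for i = 1..N (rows of `samples` are samples)."""
    diffs = samples - x
    return np.exp(-gamma * np.sum(diffs**2, axis=1))

def project(samples, x, coeffs, gamma=1.0):
    """Project x to D dimensions: entry d is sum_i coeffs[i, d] * k(sample_i, x)."""
    return coeffs.T @ gaussian_kernel_vector(samples, x, gamma)

rng = np.random.default_rng(0)
N, D = 5, 2
train_texts = rng.normal(size=(N, 3))   # stand-in for the N training text samples
alphas = rng.normal(size=(N, D))        # placeholder for (alpha^1, ..., alpha^D)
t_x = rng.normal(size=3)                # a new text input
M = project(train_texts, t_x, alphas)
```

The same `project` function gives Q when it is called with the training images, a new image p_x and the β coefficients.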

(S5) Perform retrieval to obtain the cross-modal retrieval results in the subspace: when retrieving text with an image, the new visual image is mapped to the subspace by Q, and similarity retrieval is then carried out; when retrieving visual images with text, the new text is mapped to the subspace by M, and similarity retrieval is then carried out.
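The text does not fix the similarity measure used in the subspace. As a sketch under the assumption that cosine similarity is used, retrieval amounts to ranking the projected items of the other modality by their similarity to the projected query. The function name and the toy vectors are hypothetical.

```python
import numpy as np

def retrieve(query_vec, database, top_k=3):
    """Rank database rows by cosine similarity to the query (assumed measure).

    `query_vec` is a D-dimensional projection of the query; each row of
    `database` is the D-dimensional projection of an item of the other modality.
    """
    q = query_vec / np.linalg.norm(query_vec)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    scores = db @ q
    order = np.argsort(-scores)[:top_k]   # indices of the best matches first
    return order, scores[order]

# Toy projections: a text query against three projected images.
projected_images = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
projected_query = np.array([2.0, 0.1])
ranking, scores = retrieve(projected_query, projected_images)
```

Because both modalities live in the same subspace after projection, the same function serves image-to-text and text-to-image retrieval.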

Further, the function f(·) for computing the similarity is a similarity measure function based on the dot product:

f(z_i,z_j) = <z_i,z_j> / (||z_i|| ||z_j||)

where <·,·> denotes the dot product, ||·|| denotes the norm, i=1,2,...,N, j=1,2,...,N, and z_j is the label vector of the j-th sample of the paired data.
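A minimal sketch of the dot-product-based similarity, assuming it is the normalized dot product f(z_i,z_j) = <z_i,z_j> / (||z_i|| ||z_j||) applied to every pair of columns of the label matrix Z. The treatment of samples without any label is an assumption added for numerical safety.

```python
import numpy as np

def dot_product_similarity(Z):
    """S[i, j] = <z_i, z_j> / (||z_i|| * ||z_j||) for the columns z_i of Z (C x N)."""
    Z = np.asarray(Z, dtype=float)
    norms = np.linalg.norm(Z, axis=0)
    # Assumption: a sample with no labels gets similarity 0 to everything.
    norms = np.where(norms == 0.0, 1.0, norms)
    Zn = Z / norms
    return Zn.T @ Zn

# Columns: z_1 = (1,0,0), z_2 = (1,1,0), z_3 = (0,0,1).
Z = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
S = dot_product_similarity(Z)
```

Samples 1 and 2 share one label and obtain a similarity of 1/√2, while samples 1 and 3 share none and obtain 0.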

Further, the function f(·) for computing the similarity is a similarity measure function based on the squared exponential:

f(z_i,z_j) = exp(-||z_i - z_j||₂² / σ)

where σ is a constant and ||·||₂ denotes the 2-norm.
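A minimal sketch of the squared-exponential similarity, under the assumption that it has the form f(z_i,z_j) = exp(-||z_i - z_j||₂² / σ); the exact placement of the constant σ is an assumption, as are the default value and the function name.

```python
import numpy as np

def squared_exponential_similarity(Z, sigma=1.0):
    """S[i, j] = exp(-||z_i - z_j||_2^2 / sigma) for the columns z_i of Z (C x N).

    The role of `sigma` as a divisor of the squared distance is assumed.
    """
    Z = np.asarray(Z, dtype=float)
    sq_norms = np.sum(Z**2, axis=0)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * Z.T @ Z
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives caused by round-off
    return np.exp(-sq_dists / sigma)

# Columns: z_1 = (1,0,0), z_2 = (1,1,0), z_3 = (0,0,1).
Z = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
S = squared_exponential_similarity(Z, sigma=1.0)
```

Under this form a larger σ makes the similarity decay more slowly as label vectors differ, which is one way the parameter can influence the learned subspace.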

Beneficial effects obtained with the present invention: by using the high-level semantic information that multimedia documents carry in multi-label form, while using KCCA to mine the nonlinear correlation between different modalities, the invention learns a more discriminative common subspace of the different modalities that is better suited to cross-modal retrieval tasks. Good retrieval performance is obtained in the learned subspace, a significant improvement over existing methods.

Brief description of the drawings

Fig. 1 is a flow chart of the method of the present invention;

Fig. 2 is a schematic diagram of the ml-KCCA retrieval method of the present invention;

Fig. 3 shows the influence of the parameters η and σ on the ml-KCCA model. The evaluation index used is Precision@10 (denoted P@10 in the figure), which is the proportion of files relevant to the query among the first ten returned results.
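The Precision@10 index described above can be computed as in the following sketch, where each returned result is represented by a flag that states whether it is relevant to the query. How relevance is decided for multi-label data is not specified here, and the flags in the demo are invented.

```python
def precision_at_k(relevant_flags, k=10):
    """Proportion of relevant items among the first k returned results."""
    top = list(relevant_flags)[:k]
    return sum(1 for flag in top if flag) / k

# Hypothetical relevance flags of a ranked result list (1 = relevant).
flags = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1]
p_at_10 = precision_at_k(flags, k=10)
```

Only the first ten flags count, so the two relevant results at ranks 11 and 12 do not change the value of 0.5.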

Specific embodiment

The present invention is further explained below with reference to the drawings and embodiments.

Fig. 1 shows the flow chart of the present invention. The main steps are:

(S1) Select text and visual images, construct paired data of text, visual image and label, and select samples of the paired data;

(S2) Compute the semantic similarity matrix of the labels;

(S3) Apply the semantic similarity matrix to kernel canonical correlation analysis to find the multi-modal shared subspace;

(S4) Find the projection representations of the visual image and of the text in the multi-modal shared subspace;

(S5) Perform retrieval to obtain the cross-modal retrieval results in the subspace: when retrieving text with an image, the new visual image is mapped to the subspace by Q, and similarity retrieval is then carried out; when retrieving visual images with text, the text is mapped to the subspace by M, and similarity retrieval is then carried out.

Fig. 2 is a schematic diagram of the ml-KCCA retrieval method of the present invention. The triangles and squares in the figure denote data points in the visual image and text modalities, and the symbols "+", "-", "×", "÷" denote different class labels. Panel (a) shows text and visual image instances being mapped from their respective feature spaces to the common subspace learned with ml-KCCA. Panel (b) shows that pairs of instances with similar labels lie closer together in the common subspace learned by ml-KCCA. Panel (c) is an example of bidirectional cross-modal retrieval: after text and images are mapped to the learned subspace, a text query can retrieve images more accurately, and vice versa.

Fig. 3 shows the experimental results for the influence of the parameters η and σ on the model. The results show that, except in two extreme cases, ml-KCCA outperforms KCCA.

The performance of ml-KCCA and other CCA-based methods on the Pascal data set is shown in Table 1; the method of the invention performs best in most cases. Table 1 compares the performance of CCA-based and other retrieval methods on the Pascal data set, using MAP (mean average precision) as the evaluation index. In Table 1, image annotation means retrieving text with an image, and image retrieval means retrieving images with text.

Table 1. Comparison of retrieval performance between the method of the present invention and prior-art methods

Method	Image annotation	Image retrieval
KGMMFA	42.1	32.8
KGMLDA	42.7	33.9
LCFS	34.4	26.7
LGCFL	37.8	32.9
ml-CCA	48.4	38.0
ml-KCCA	50.91	41.17
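For reference, the MAP index used in Table 1 can be computed as in the sketch below. It follows the usual definition, in which the average precision of one query is the mean of the precision values at the ranks of its relevant results, and MAP is the mean over all queries. The relevance flags in the demo are invented and unrelated to the reported numbers.

```python
def average_precision(relevant_flags):
    """Mean of the precision values taken at each rank that holds a relevant result."""
    hits = 0
    precisions = []
    for rank, relevant in enumerate(relevant_flags, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(ranked_lists):
    """Mean of the average precision over all queries."""
    return sum(average_precision(flags) for flags in ranked_lists) / len(ranked_lists)

# Two hypothetical queries with ranked relevance flags (1 = relevant).
queries = [[1, 0, 1], [0, 1]]
map_score = mean_average_precision(queries)
```

The first query has an average precision of (1/1 + 2/3)/2 and the second of 1/2, so the resulting MAP is 2/3.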

The foregoing is merely one embodiment of the present invention, and the invention is not limited to the above embodiment. Minor local changes may be made during implementation; provided that such changes or modifications do not depart from the spirit and scope of the invention and fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to cover them as well.

Claims

1. A multi-label kernel canonical correlation analysis retrieval method, characterized by comprising the following steps:

(S1) select text and visual images, construct paired data of text, visual image and label, and select samples of the paired data;

the samples of the paired data are expressed as {(t₁,p₁,z₁),...,(t_i,p_i,z_i),...,(t_N,p_N,z_N)}, where z_i is the label vector of the i-th sample of the paired data; T_w=[t₁,t₂,...,t_N]∈R^dt×N, where T_w is the matrix representation of the text samples and dt is the dimension of a text sample; P=[p₁,p₂,...,p_N]∈R^dp×N, where P is the matrix representation of the visual image samples and dp is the dimension of a visual image sample; Z=[z₁,z₂,...,z_N]∈R^C×N, where Z is the label matrix, C is the dimension of the labels, and N is the number of samples of the paired data;

(S2) compute the semantic similarity matrix of the labels; let f(·) be the function that computes the similarity between any two label vectors; the semantic similarity matrix S is then S=(f(z_i,z_j))_N×N;

to learn the common multi-modal shared subspace, ml-KCCA is formulated as:

where ρ is the correlation coefficient, K_t and K_p denote the N × N kernel matrices of the N sample pairs, η is the coefficient that controls the influence of the semantic similarity matrix, and α and β denote the projection vectors;

the process of solving for α and β is converted into the following eigenvalue problem:

B^-1Aw=λ w (4)

where λ is an eigenvalue and w=[α β]^T; the series of vectors (α¹,β¹),...,(α^D,β^D) corresponding to the D largest eigenvalues is found;

according to (α¹,β¹),...,(α^D,β^D), a new text input t_x is projected onto the single direction specified by α:

where α_i denotes the i-th element of the vector α, t_i denotes the i-th of the N samples, φ_t denotes the mapping function from the data space to the feature space, and k_t denotes the kernel function of the feature space;

the final projection Q of a new visual image p_x onto the D-dimensional common subspace is:

(S5) perform retrieval to obtain the cross-modal retrieval results in the subspace, specifically: when retrieving text with an image, the new visual image is mapped to the subspace by Q, and similarity retrieval is then carried out; when retrieving visual images with text, the text is mapped to the subspace by M, and similarity retrieval is then carried out.

2. The multi-label kernel canonical correlation analysis retrieval method according to claim 1, characterized in that the function f(·) for computing the similarity is a similarity measure function based on the dot product:

where <·,·> denotes the dot product, ||·|| denotes the norm, i=1,2,...,N, j=1,2,...,N, z_i is the label vector of the i-th sample of the paired data, and z_j is the label vector of the j-th sample of the paired data.

3. The multi-label kernel canonical correlation analysis retrieval method according to claim 1, characterized in that the function f(·) for computing the similarity is a similarity measure function based on the squared exponential:

where σ denotes a constant, i=1,2,...,N, j=1,2,...,N, ||·||₂ denotes the 2-norm, z_i is the label vector of the i-th sample of the paired data, and z_j is the label vector of the j-th sample of the paired data.