CN106951509B - Multi-label kernel canonical correlation analysis retrieval method - Google Patents

Multi-label kernel canonical correlation analysis retrieval method

Info

Publication number
CN106951509B
Authority
CN (China)
Prior art keywords
sample
text
subspace
label
visual image
Legal status
Active
Application number
CN201710158859.9A
Other languages
Chinese (zh)
Other versions
CN106951509A (en)
Inventors
白亮
贾玉华
王昊冉
郭金林
谢毓湘
于天元
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date
2017-03-17
Filing date
2017-03-17
Publication date
2019-08-09
Application filed by National University of Defense Technology
Priority to CN201710158859.9A
Publication of CN106951509A
Application granted
Publication of CN106951509B
Status: Active

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/44 Browsing; Visualisation therefor
    • G06F16/444 Spatial browsing, e.g. 2D maps, 3D or virtual spaces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • G06F16/435 Filtering based on additional data, e.g. user or group profiles

Abstract

The invention belongs to the field of computer retrieval technology and specifically relates to a multi-label kernel canonical correlation analysis retrieval method. The main steps are: (S1) select text and visual images, construct paired text-image-label data, and select samples of the paired data; (S2) compute the semantic similarity matrix of the labels; (S3) apply the semantic similarity matrix to kernel canonical correlation analysis (KCCA) to obtain a multi-modal shared subspace; (S4) obtain the projected representations of visual images and texts in the multi-modal shared subspace; (S5) perform retrieval to obtain cross-modal results in the subspace. By exploiting the high-level semantic information carried by multi-label annotations in multimedia documents while using KCCA to mine the nonlinear correlations between modalities, the invention learns a more discriminative common subspace of the different modalities that is better suited to cross-modal retrieval, and achieves good retrieval performance.

Description

Multi-label kernel canonical correlation analysis retrieval method
Technical field
The invention belongs to the field of computer retrieval technology, and specifically relates to a multi-label kernel canonical correlation analysis retrieval method.
Background art
Cross-modal information retrieval is a challenging research topic in which the query and the results belong to different modalities: it is mutual retrieval among multi-modal information, for example retrieving text with an image or retrieving images with text. Because of the "semantic gap", items from different modalities cannot be compared directly, so the key problem in this task is how to measure the distance or similarity between modalities. Prior work aligns the two feature spaces by learning a shared subspace in which the modalities become comparable. Among conventional methods, canonical correlation analysis (CCA) [1] has shown its simplicity and efficiency: it learns the shared subspace by maximizing the correlation between the projections of the two modalities. CCA has become the workhorse of many cross-modal retrieval methods, and many extensions of CCA for cross-modal retrieval have been proposed in recent years.
Although CCA is popular for its simplicity and efficiency, it has several shortcomings. CCA relies on the one-to-one pairing between modalities and does not use the high-level semantic label information present in multimedia documents, so the subspace it learns is not well adapted to cross-modal retrieval. Recently, several CCA extensions that use label information have been proposed; however, most of them apply only to multimedia documents annotated with a single label. In general an image may belong to several classes, so assuming single-label annotation is unreasonable and leaves the label information underused. It is therefore closer to reality to consider multi-label information when mining the correlation between data from different modalities.
Canonical correlation analysis was first proposed by Hotelling as a data analysis method for finding subspaces of multiple data spaces. However, classical CCA ignores additional high-level semantic information, which limits its performance. Rasiwasia et al. proposed cluster-CCA [1] for single-label datasets. Viresh Ranjan et al. [2] proposed multi-label canonical correlation analysis (ml-CCA), which takes multi-label information into account. Benefiting from this multi-label information, ml-CCA outperforms most other CCA extensions. However, ml-CCA is a linear method, while the correlation between data from different modalities is often nonlinear, which limits the performance of ml-CCA.
In general an image may belong to multiple classes, i.e., one image usually corresponds to several labels, so considering multi-label information better matches reality; at the same time, the correlation between modalities is not simply linear. Existing methods, however, either do not use high-level semantic information in multi-label form or cannot mine the nonlinear relations between modalities. The references are as follows:
[1] V. Mahadevan, "Cluster canonical correlation analysis," in AISTATS, 2014.
[2] V. Ranjan, N. Rasiwasia, and C. V. Jawahar, "Multilabel cross-modal retrieval," in 2015 IEEE International Conference on Computer Vision (ICCV), Dec. 2015, pp. 4094-4102.
Summary of the invention
Based on kernel canonical correlation analysis and multi-label information, the present invention proposes a new cross-modal retrieval framework: the multi-label kernel canonical correlation analysis (ml-KCCA) retrieval method. Before the technical solution is described, kernel canonical correlation analysis is introduced. KCCA is the kernelized version of CCA: given two views of the data, KCCA constructs a common representation for them. Formally, consider samples of paired text and visual-image data {(t_1, p_1), ..., (t_N, p_N)}, where t ∈ R^{dt} and p ∈ R^{dp} are the text and visual feature vectors of a sample. The kernel functions of the two feature spaces (Gaussian kernels in this method) are k_t(t_i, t_j) = φ_t(t_i)^T φ_t(t_j) and k_p(p_i, p_j) = φ_p(p_i)^T φ_p(p_j), where t_i, t_j, p_i, p_j are sample points in the data spaces, φ_t and φ_p are the mappings from the data spaces to the feature spaces, i = 1, 2, ..., N, j = 1, 2, ..., N, and T denotes transposition;
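The kernel matrices K_t and K_p used below can be sketched as follows. This is a minimal illustration assuming the Gaussian kernel is parameterized as exp(-γ‖x - y‖²); the patent states only that a Gaussian kernel is used, so the bandwidth parameter `gamma` and the helper names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Matrix = List[List[float]]


def gaussian_kernel(x: Vector, y: Vector, gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2).

    The parameterization through `gamma` is an assumption.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def kernel_matrix(samples: List[Vector], gamma: float = 1.0) -> Matrix:
    """N x N Gram matrix whose entry (i, j) is k(x_i, x_j)."""
    return [[gaussian_kernel(xi, xj, gamma) for xj in samples] for xi in samples]


# Toy text features t_1..t_3 (made-up values for illustration only).
text_feats = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
K_t = kernel_matrix(text_feats, gamma=0.5)
```

The same function applied to the image features gives K_p; the diagonal is always 1 and the matrix is symmetric.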
The kernelized objective function can be written in the form of formula (1), which determines projection vectors α, β ∈ R^N that maximize the canonical correlation:
where ρ* is the correlation coefficient, K_t = (k_t(t_i, t_j))_{N×N} and K_p = (k_p(p_i, p_j))_{N×N} are the N × N kernel matrices of the N sample pairs, and max denotes maximization. The problem can be converted into an eigenvalue problem whose eigenvectors are [α β]^T; the eigenvectors of the D largest eigenvalues give a series of pairs (α_1, β_1), (α_2, β_2), ..., (α_D, β_D), which are used to compute the D-dimensional projection of a new input text t or visual image p. In its original form KCCA cannot use label information: α and β are learned without labels, which is not sufficient for the cross-modal retrieval task.
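The equation for formula (1) did not survive in this text. For reference, the textbook unregularized KCCA objective, which we assume formula (1) corresponds to, is shown below. Practical implementations usually add a ridge term κI to K_t² and K_p², because without it the solution becomes trivial (correlation 1) whenever the kernel matrices are invertible.

```latex
\rho^{*} = \max_{\alpha,\beta \in \mathbb{R}^{N}}
  \frac{\alpha^{\top} K_t K_p \beta}
       {\sqrt{\left(\alpha^{\top} K_t^{2} \alpha\right)\left(\beta^{\top} K_p^{2} \beta\right)}}
```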
ml-KCCA uses the high-level semantic information in multi-label form to learn a more discriminative shared subspace of the different modalities that is better suited to cross-modal retrieval, and uses kernel methods to mine the nonlinear relations between modalities. In addition, ml-KCCA uses incomplete Cholesky decomposition to speed up solving the KCCA eigenvalue problem. The specific technical solution is as follows:
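Incomplete Cholesky decomposition approximates an N × N kernel matrix K by a low-rank product G Gᵀ, with G of size N × m and m much smaller than N, so that the eigenvalue problem can typically be solved on the smaller factors. The sketch below is a generic greedy pivoted version, assuming a symmetric positive semidefinite K; the patent does not give its variant, tolerance or rank, so `tol`, `max_rank` and the function names are hypothetical.

```python
from typing import List, Optional

Matrix = List[List[float]]


def incomplete_cholesky(K: Matrix, tol: float = 1e-10,
                        max_rank: Optional[int] = None) -> Matrix:
    """Greedy pivoted incomplete Cholesky: returns G (N x m) with K ~= G G^T."""
    n = len(K)
    rank_cap = n if max_rank is None else min(max_rank, n)
    residual = [K[i][i] for i in range(n)]  # diagonal of K - G G^T
    G: Matrix = [[] for _ in range(n)]      # row i holds G[i][0..m-1]
    for j in range(rank_cap):
        # Pick the sample whose diagonal entry is worst approximated so far.
        pivot = max(range(n), key=lambda i: residual[i])
        if residual[pivot] <= tol:
            break  # remaining approximation error is negligible
        root = residual[pivot] ** 0.5
        for i in range(n):
            # Residual of column `pivot` after the first j columns, scaled.
            s = K[i][pivot] - sum(G[i][k] * G[pivot][k] for k in range(j))
            G[i].append(s / root)
        for i in range(n):
            residual[i] -= G[i][j] ** 2
    return G


def reconstruct(G: Matrix) -> Matrix:
    """Compute G G^T to check the approximation."""
    return [[sum(a * b for a, b in zip(gi, gk)) for gk in G] for gi in G]


# A small made-up positive definite matrix standing in for a kernel matrix.
K = [[4.0, 2.0, 0.0], [2.0, 5.0, 1.0], [0.0, 1.0, 3.0]]
G = incomplete_cholesky(K)
```

With the default tolerance a full-rank matrix is reproduced up to rounding; passing `max_rank` yields the low-rank approximation that saves computation.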
A multi-label kernel canonical correlation analysis retrieval method, comprising the following steps:
(S1) Select text and visual images, construct paired data of text, visual images and labels, and select samples of the paired data;
The samples of the paired data are expressed as {(t_1, p_1, z_1), ..., (t_i, p_i, z_i), ..., (t_N, p_N, z_N)}, where z_i is the label vector of the i-th sample of the paired data, i = 1, 2, ..., N. T_w = [t_1, t_2, ..., t_N] ∈ R^{dt×N} is the matrix representation of the text samples, where dt is the dimension of a text sample; P = [p_1, p_2, ..., p_N] ∈ R^{dp×N} is the matrix representation of the visual-image samples, where dp is the dimension of a visual-image sample; Z = [z_1, z_2, ..., z_N] ∈ R^{C×N} is the label matrix, in which several elements of each column may be non-zero, i.e., a sample may carry several labels at once. C is the label dimension and N, an integer, is the number of paired samples.
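The layout of Z can be illustrated with multi-hot label vectors, one column per sample. Encoding each z_i as a 0/1 indicator vector is an assumption (the patent only requires that several entries of a column may be non-zero), and the helper names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def multi_hot(labels: Sequence[int], num_classes: int) -> List[float]:
    """Label vector z_i with a 1 for every class the sample carries."""
    z = [0.0] * num_classes
    for c in labels:
        z[c] = 1.0
    return z


def label_matrix(label_sets: Sequence[Sequence[int]], num_classes: int) -> Matrix:
    """C x N matrix Z whose i-th column is z_i."""
    columns = [multi_hot(s, num_classes) for s in label_sets]
    return [[col[c] for col in columns] for c in range(num_classes)]


# Three samples, C = 3 classes; the second sample carries two labels at once.
Z = label_matrix([[0], [0, 2], [1]], num_classes=3)
```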
(S2) Compute the semantic similarity matrix of the labels. Let f(·) be a function that computes the similarity between any two label vectors; the semantic similarity matrix is then S = (f(z_i, z_j))_{N×N}, i, j = 1, 2, ..., N.
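Building S amounts to evaluating f on every pair of label vectors. The sketch assumes S is the N × N matrix with entries f(z_i, z_j); `semantic_similarity_matrix` is a hypothetical name, and the plain dot product passed in the example is only a placeholder for the two measures of f defined later in this section.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Matrix = List[List[float]]


def semantic_similarity_matrix(labels: Sequence[Vector],
                               f: Callable[[Vector, Vector], float]) -> Matrix:
    """N x N matrix S with S[i][j] = f(z_i, z_j) for label vectors z_1..z_N."""
    return [[f(zi, zj) for zj in labels] for zi in labels]


def dot(a: Vector, b: Vector) -> float:
    """Placeholder similarity: number of shared labels for 0/1 vectors."""
    return sum(x * y for x, y in zip(a, b))


z = [[1.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
S = semantic_similarity_matrix(z, dot)
```

Because any symmetric f gives a symmetric S, only the upper triangle actually needs to be computed for large N.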
(S3) Apply the semantic similarity matrix to kernel canonical correlation analysis to obtain the multi-modal shared subspace;
To learn the common multi-modal shared subspace, ml-KCCA is formulated as:
where ρ is the correlation coefficient; two further N × N kernel matrices, obtained by processing the N paired samples with the multi-label information, also appear in the formula; K_t and K_p are the N × N kernel matrices of the N sample pairs; and η is the coefficient that controls the influence of the semantic similarity matrix;
With the matrices A and B defined accordingly, solving for α and β is converted, as in the KCCA case, into the eigenvalue problem:
B^{-1} A w = λ w    (4)
where λ is an eigenvalue and w = [α β]^T; the eigenvectors of the D largest eigenvalues give the corresponding series of vectors (α_1, β_1), ..., (α_D, β_D);
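The definitions of A and B are not preserved here. As an assumption, the block below shows the generic structure such problems take in KCCA-style methods: A couples the two modalities through a cross term C, and B is block-diagonal with within-modality terms B_t and B_p. In plain KCCA, C = K_t K_p, B_t = K_t² and B_p = K_p² (usually plus κI); in ml-KCCA the label-similarity information weighted by η would presumably enter through the label-processed kernel matrices.

```latex
\begin{gathered}
A = \begin{pmatrix} 0 & C \\ C^{\top} & 0 \end{pmatrix}, \qquad
B = \begin{pmatrix} B_t & 0 \\ 0 & B_p \end{pmatrix}, \qquad
w = \begin{pmatrix} \alpha \\ \beta \end{pmatrix}, \\[4pt]
B^{-1} A w = \lambda w
\iff
C\beta = \lambda B_t \alpha \quad\text{and}\quad C^{\top}\alpha = \lambda B_p \beta .
\end{gathered}
```

With this structure, if (α, β) is an eigenvector for λ then (α, -β) is one for -λ, so eigenvalues come in ± pairs; taking the D largest values keeps the D positive correlations and their pairs (α_d, β_d).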
Once (α_1, β_1), ..., (α_D, β_D) have been found, texts and visual images can be represented in the multi-modal shared subspace. By evaluating the weighted kernel function between the input and the N sample points, a new text input t_x is projected onto the direction specified by α as Σ_{i=1}^{N} α_i k_t(t_i, t_x),
where α_i denotes the i-th element of vector α and t_i denotes the i-th of the N samples.
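This kernel expansion can be sketched directly: the projection of t_x onto one direction is the α-weighted sum of kernel evaluations against the N training texts. The linear kernel in the example is used only to make the arithmetic easy to check (the method itself uses a Gaussian kernel), and the function names are hypothetical.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def project_one(alpha: Sequence[float], train: Sequence[Vector],
                x: Vector, kernel: Kernel) -> float:
    """Projection of a new input x onto the direction given by alpha:
    sum_i alpha_i * k(t_i, x) over the N training samples t_i."""
    return sum(a_i * kernel(t_i, x) for a_i, t_i in zip(alpha, train))


def linear_kernel(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


score = project_one([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], [3.0, 4.0], linear_kernel)
# score == 11.0, i.e. 1*3 + 2*4
```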
(S4) Obtain the projected representations of visual images and texts in the multi-modal shared subspace;
The final projection M of a new text t_x onto the D-dimensional common subspace is M = (Σ_{i=1}^{N} α_1^i k_t(t_i, t_x), ..., Σ_{i=1}^{N} α_D^i k_t(t_i, t_x)),
where α_1^i denotes the i-th element of vector α_1;
Similarly, the final projection Q of a new visual image p_x onto the D-dimensional common subspace is Q = (Σ_{i=1}^{N} β_1^i k_p(p_i, p_x), ..., Σ_{i=1}^{N} β_D^i k_p(p_i, p_x)),
where β_1^i denotes the i-th element of vector β_1;
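Stacking D such projections gives M (or Q for images, using β_d, k_p and the training images). One design point: the N kernel values k(x_i, x) do not depend on d, so the sketch computes them once and reuses them for all D directions. The function names and toy values are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def project_to_subspace(directions: Sequence[Sequence[float]],
                        train: Sequence[Vector], x: Vector,
                        kernel: Kernel) -> List[float]:
    """D-dimensional embedding of x: entry d is sum_i directions[d][i] * k(x_i, x).

    `directions` holds alpha_1..alpha_D (texts) or beta_1..beta_D (images).
    """
    k_vec = [kernel(x_i, x) for x_i in train]  # N kernel values, computed once
    return [sum(c * k for c, k in zip(coeffs, k_vec)) for coeffs in directions]


def linear_kernel(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


# D = 2 directions over N = 2 training samples (made-up values).
M = project_to_subspace([[1.0, 0.0], [0.5, 0.5]], [[1.0, 0.0], [0.0, 1.0]],
                        [3.0, 4.0], linear_kernel)
```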
(S5) Perform retrieval to obtain cross-modal results in the subspace: for image-to-text retrieval, the new visual image is mapped into the subspace through Q and similarity retrieval is then performed; for text-to-image retrieval, the new text is mapped into the subspace through M and similarity retrieval is then performed.
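Similarity retrieval in the subspace can be sketched as ranking the gallery embeddings of the other modality by their similarity to the query embedding (Q for an image query, M for a text query). The patent does not name the similarity measure; cosine similarity is an assumption here, and the gallery contents are made up.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def rank_gallery(query: Vector, gallery: Dict[str, Vector]) -> List[str]:
    """Return gallery item ids sorted from most to least similar to the query."""
    return sorted(gallery, key=lambda key: cosine(query, gallery[key]), reverse=True)


# Query: an image already mapped through Q; gallery: texts mapped through M.
ranking = rank_gallery([1.0, 0.0], {"a": [0.9, 0.1], "b": [0.0, 1.0], "c": [-1.0, 0.0]})
```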
Further, the function f(·) for computing similarity is a dot-product-based similarity measure: f(z_i, z_j) = ⟨z_i, z_j⟩ / (‖z_i‖ ‖z_j‖),
where ⟨·,·⟩ denotes the dot product, ‖·‖ denotes the norm (modulus), i = 1, 2, ..., N, j = 1, 2, ..., N, and z_j is the label vector of the j-th sample of the paired data.
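Assuming f is the normalized dot product and the label vectors are 0/1 multi-hot vectors, the measure reduces to a set computation: the dot product counts the shared labels and each norm is the square root of the sample's label count. The set-based form and the function name are illustrative assumptions.

```python
import math
from typing import Set


def label_set_cosine(labels_i: Set[int], labels_j: Set[int]) -> float:
    """Cosine of two 0/1 multi-hot label vectors, computed from label sets:
    <z_i, z_j> = |L_i & L_j| and ||z|| = sqrt(|L|)."""
    if not labels_i or not labels_j:
        return 0.0
    return len(labels_i & labels_j) / math.sqrt(len(labels_i) * len(labels_j))
```

For example, a sample labeled {0, 2} and a sample labeled {0} share one label, giving 1/√2.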
Further, the function f(·) for computing similarity is a squared-exponential similarity measure:
where σ is a constant and ‖·‖_2 denotes the 2-norm.
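A sketch of the second measure, assuming the form f(z_i, z_j) = exp(-‖z_i - z_j‖₂² / σ²). This text does not preserve whether σ, σ² or 2σ² appears in the denominator, so that scaling, like the function name, is an assumption. σ is one of the two parameters whose influence is examined in Fig. 3.

```python
import math
from typing import Sequence


def exp_sq_similarity(zi: Sequence[float], zj: Sequence[float], sigma: float) -> float:
    """exp(-||z_i - z_j||_2^2 / sigma^2): 1 for identical label vectors,
    decaying towards 0 as they differ; a larger sigma decays more slowly."""
    sq = sum((a - b) ** 2 for a, b in zip(zi, zj))
    return math.exp(-sq / sigma ** 2)
```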
Beneficial effects of the invention: by using the high-level semantic information in multi-label form in multimedia documents while using KCCA to mine the nonlinear correlations between modalities, the invention learns a more discriminative common subspace of the different modalities that is better suited to cross-modal retrieval. The learned subspace yields good retrieval performance, a significant improvement over existing methods.
Description of the drawings
Fig. 1 is the flow chart of the method of the invention;
Fig. 2 is a schematic diagram of the ml-KCCA retrieval method of the invention;
Fig. 3 shows the influence of the parameters η and σ on the ml-KCCA model. The evaluation metric is Precision@10 (denoted P@10 in the figure), i.e., the proportion of query-relevant documents among the top ten returned results.
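Precision@10 can be computed as below. Treating a returned item as relevant when it shares at least one label with the query is a common convention for multi-label data and an assumption here; the patent does not state its relevance criterion. Function names are hypothetical.

```python
from typing import Sequence, Set


def is_relevant(query_labels: Set[int], item_labels: Set[int]) -> bool:
    """Assumed criterion: relevant if the two share at least one label."""
    return bool(query_labels & item_labels)


def precision_at_k(query_labels: Set[int],
                   ranked_item_labels: Sequence[Set[int]], k: int = 10) -> float:
    """Fraction of the top-k returned items that are relevant to the query."""
    top = ranked_item_labels[:k]
    if not top:
        return 0.0
    return sum(is_relevant(query_labels, labels) for labels in top) / k


# Label sets of a ranked result list for a query labeled {0} (made-up data).
ranked = [{0}, {1}, {0, 2}, {0}, {3}, {1}, {2}, {3}, {1}, {0}, {0}]
p10 = precision_at_k({0}, ranked)
```

Here four of the top ten results share label 0 with the query, so P@10 is 0.4; the eleventh result is ignored.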
Detailed description of the embodiments
The invention is further described below with reference to the drawings and an embodiment.
Fig. 1 is the flow chart of the invention; the main steps are:
(S1) Select text and visual images, construct paired data of text, visual images and labels, and select samples of the paired data;
(S2) Compute the semantic similarity matrix of the labels;
(S3) Apply the semantic similarity matrix to kernel canonical correlation analysis to obtain the multi-modal shared subspace;
(S4) Obtain the projected representations of visual images and texts in the multi-modal shared subspace;
(S5) Perform retrieval to obtain cross-modal results in the subspace: for image-to-text retrieval, the new visual image is mapped into the subspace through Q and similarity retrieval is then performed; for text-to-image retrieval, the new text is mapped into the subspace through M and similarity retrieval is then performed.
Fig. 2 is a schematic diagram of the ml-KCCA retrieval method of the invention. In the figure, triangles and squares denote data points in the visual-image and text modalities, and the symbols "+", "-", "x" and "÷" denote different class labels. Panel (a) shows text and visual-image instances being mapped from their own feature spaces into the common subspace learned by ml-KCCA. Panel (b) shows that pairs of examples with similar labels lie closer together in the common subspace learned by ml-KCCA. Panel (c) is an example of two-way cross-modal retrieval: after text and images are mapped into the learned subspace, a text query retrieves images more accurately, and vice versa.
Fig. 3 shows the experimental results on how the parameters η and σ affect the model. The results show that, except in two extreme cases, ml-KCCA performs better than KCCA.
The performance of ml-KCCA and other CCA-based methods on the Pascal dataset is shown in Table 1; the method of the invention performs best in most cases. Table 1 compares CCA-based and other retrieval methods on the Pascal dataset using the MAP (mean average precision) metric. In Table 1, "Image annotation" denotes image-to-text retrieval and "Image retrieval" denotes text-to-image retrieval.
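MAP averages, over all queries, the average precision (AP) of each ranked result list; the values in Table 1 are on a 0-100 scale. The sketch uses the common definition that normalizes AP by the number of relevant items in the evaluated list; some protocols normalize by all relevant items in the database instead, and the patent does not say which it used. Function names are hypothetical.

```python
from typing import Sequence


def average_precision(relevance: Sequence[bool]) -> float:
    """AP of one ranked list: mean of precision@k over the ranks k holding a relevant item."""
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(runs: Sequence[Sequence[bool]]) -> float:
    """MAP: mean of the per-query AP values."""
    return sum(average_precision(r) for r in runs) / len(runs) if runs else 0.0


ap = average_precision([True, False, True])
m = mean_average_precision([[True, False, True], [False, True]])
```

For the first list the relevant items sit at ranks 1 and 3, so AP = (1/1 + 2/3) / 2 = 5/6.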
Table 1. Comparison of retrieval results (MAP) between the method of the invention and prior-art methods
Method      Image annotation    Image retrieval
KGMMFA      42.1                32.8
KGMLDA      42.7                33.9
LCFS        34.4                26.7
LGCFL       37.8                32.9
ml-CCA      48.4                38.0
ml-KCCA     50.91               41.17
The above is only one embodiment of the invention, and the invention is not limited to it; minor local changes may occur during implementation. Any changes or modifications to the invention that do not depart from its spirit and scope fall within the scope of the claims of the invention and their technical equivalents.

Claims (3)

1. A multi-label kernel canonical correlation analysis retrieval method, characterized by comprising the following steps:
(S1) Select text and visual images, construct paired data of text, visual images and labels, and select samples of the paired data;
The samples of the paired data are expressed as {(t_1, p_1, z_1), ..., (t_i, p_i, z_i), ..., (t_N, p_N, z_N)}, where z_i is the label vector of the i-th sample of the paired data; T_w = [t_1, t_2, ..., t_N] ∈ R^{dt×N} is the matrix representation of the text samples and dt is the dimension of a text sample; P = [p_1, p_2, ..., p_N] ∈ R^{dp×N} is the matrix representation of the visual-image samples and dp is the dimension of a visual-image sample; Z = [z_1, z_2, ..., z_N] ∈ R^{C×N} is the label matrix, C is the label dimension, and N is the number of samples of the paired data;
(S2) Compute the semantic similarity matrix of the labels: let f(·) be the function that computes the similarity between any two label vectors; the semantic similarity matrix S is then:
(S3) Apply the semantic similarity matrix to kernel canonical correlation analysis to obtain the multi-modal shared subspace;
To learn the common multi-modal shared subspace, ml-KCCA is formulated as:
where ρ is the correlation coefficient, K_t and K_p denote the N × N kernel matrices of the N sample pairs, η is the coefficient that controls the influence of the semantic similarity matrix, and α and β denote the projection vectors;
The process of solving for α and β is converted into the following eigenvalue problem:
B^{-1} A w = λ w    (4)
where λ is an eigenvalue and w = [α β]^T; the eigenvectors of the D largest eigenvalues give the corresponding series of vectors (α_1, β_1), ..., (α_D, β_D);
Using (α_1, β_1), ..., (α_D, β_D), a new text input t_x is projected onto the direction specified by α:
where α_i denotes the i-th element of vector α, t_i denotes the i-th of the N samples, φ_t denotes the mapping from the data space to the feature space, and k_t denotes the kernel function of the feature space;
(S4) Obtain the projected representations of visual images and texts in the multi-modal shared subspace;
The final projection M of a new text t_x onto the D-dimensional common subspace is:
The final projection Q of a new visual image p_x onto the D-dimensional common subspace is:
(S5) Perform retrieval to obtain cross-modal results in the subspace, specifically: for image-to-text retrieval, the new visual image is mapped into the subspace through Q and similarity retrieval is then performed; for text-to-image retrieval, the text is mapped into the subspace through M and similarity retrieval is then performed.
2. The multi-label kernel canonical correlation analysis retrieval method according to claim 1, characterized in that the function f(·) for computing similarity is a dot-product-based similarity measure:
where ⟨·,·⟩ denotes the dot product, ‖·‖ denotes the norm (modulus), i = 1, 2, ..., N, j = 1, 2, ..., N, z_i is the label vector of the i-th sample of the paired data, and z_j is the label vector of the j-th sample of the paired data.
3. The multi-label kernel canonical correlation analysis retrieval method according to claim 1, characterized in that the function f(·) for computing similarity is a squared-exponential similarity measure:
where σ denotes a constant, i = 1, 2, ..., N, j = 1, 2, ..., N, ‖·‖_2 denotes the 2-norm, z_i is the label vector of the i-th sample of the paired data, and z_j is the label vector of the j-th sample of the paired data.
CN201710158859.9A 2017-03-17 2017-03-17 Multi-label kernel canonical correlation analysis retrieval method Active CN106951509B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710158859.9A CN106951509B (en) 2017-03-17 2017-03-17 Multi-label kernel canonical correlation analysis retrieval method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710158859.9A CN106951509B (en) 2017-03-17 2017-03-17 Multi-label kernel canonical correlation analysis retrieval method

Publications (2)

Publication Number Publication Date
CN106951509A (en) 2017-07-14
CN106951509B (en) 2019-08-09

Family

ID=59472070

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710158859.9A Active CN106951509B (en) 2017-03-17 2017-03-17 Multi-label kernel canonical correlation analysis retrieval method

Country Status (1)

Country Link
CN (1) CN106951509B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110209145B (en) * 2019-05-16 2020-09-11 Zhejiang University Carbon dioxide absorption tower fault diagnosis method based on kernel matrix approximation
CN111930972B (en) * 2020-08-04 2021-04-27 Shandong University Cross-modal retrieval method and system for multimedia data by using label level information
CN113361198B (en) * 2021-06-09 2023-11-03 Nanjing University Crowd-sourced test report fusion method based on public and private information mining
CN115599984B (en) * 2022-09-09 2023-06-09 Beijing Institute of Technology Retrieval method

Citations (3)

Publication number Priority date Publication date Assignee Title
CN102521368A (en) * 2011-12-16 2012-06-27 Wuhan University of Science and Technology Similarity matrix iteration based cross-media semantic digesting and optimizing method
CN103995903A (en) * 2014-06-12 2014-08-20 Wuhan University of Science and Technology Cross-media search method based on isomorphic subspace mapping and optimization
CN104166982A (en) * 2014-06-30 2014-11-26 Fudan University Image optimization clustering method based on canonical correlation analysis

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9613118B2 (en) * 2013-03-18 2017-04-04 Spotify Ab Cross media recommendation


Non-Patent Citations (1)

Title
Semantically-enhanced kernel canonical correlation analysis: a multi-label cross-modal retrieval; Yuhua Jia et al.; Multimedia Tools and Applications; 2018-02-27; pp. 1-20 *

Also Published As

Publication number Publication date
CN106951509A (en) 2017-07-14

Similar Documents

Publication Publication Date Title
Yao et al. Sensing spatial distribution of urban land use by integrating points-of-interest and Google Word2Vec model
CN106649715B (en) A kind of cross-media retrieval method based on local sensitivity hash algorithm and neural network
CN106951509B (en) Multi-tag coring canonical correlation analysis search method
CN111506714A (en) Knowledge graph embedding based question answering
CN103559191B (en) Based on latent space study and Bidirectional sort study across media sort method
CN103617157A (en) Text similarity calculation method based on semantics
CN111325243B (en) Visual relationship detection method based on regional attention learning mechanism
CN105930873B (en) A kind of walking across mode matching method certainly based on subspace
CN113705570B (en) Deep learning-based few-sample target detection method
CN103995903A (en) Cross-media search method based on isomorphic subspace mapping and optimization
CN105320764A (en) 3D model retrieval method and 3D model retrieval apparatus based on slow increment features
CN113705218A (en) Event element gridding extraction method based on character embedding, storage medium and electronic device
CN111666766A (en) Data processing method, device and equipment
Maher et al. Effectiveness of different similarity measures for text classification and clustering
KR20120047622A (en) System and method for managing digital contents
CN112131453A (en) Method, device and storage medium for detecting network bad short text based on BERT
CN105701227B (en) A kind of across media method for measuring similarity and search method based on local association figure
Jia et al. Semantically-enhanced kernel canonical correlation analysis: a multi-label cross-modal retrieval
CN113821702A (en) Urban multidimensional space multivariate heterogeneous information data processing method
CN114491071A (en) Food safety knowledge graph construction method and system based on cross-media data
CN107633259B (en) Cross-modal learning method based on sparse dictionary representation
Mandal et al. Query specific re-ranking for improved cross-modal retrieval
CN111177492A (en) Cross-modal information retrieval method based on multi-view symmetric nonnegative matrix factorization
Li et al. A novel relevance feedback method in content-based image retrieval
Patel et al. A survey on context based similarity techniques for image retrieval

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant