CN105808752B - Automatic image annotation method based on CCA and 2PKNN - Google Patents

Automatic image annotation method based on CCA and 2PKNN

Info

Publication number
CN105808752B
CN105808752B · CN201610144113.8A
Authority
CN
China
Prior art keywords
image
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201610144113.8A
Other languages
Chinese (zh)
Other versions
CN105808752A (en)
Inventor
孙亮
王雪莲
葛宏伟
谭国真
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology
Priority to CN201610144113.8A
Publication of CN105808752A
Application granted
Publication of CN105808752B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/5866 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns

Abstract

The invention belongs to the learning theory and application subfields of computer application technology and relates to an automatic image annotation method based on CCA and 2PKNN, aimed at the semantic gap, weak labelling and class imbalance problems present in the automatic image annotation task. First, for the semantic gap problem, the two kinds of features are mapped into a CCA subspace and the distance between them is computed in that subspace. For the weak labelling problem, a semantic space is built for each label. For the class imbalance problem, the KNN algorithm is used to find the k nearest neighbours of the test image in the semantic space of each label; these neighbours form an image subset, and using the visual distances between this subset and the test image together with the Bayesian formula, the few highest-scoring labels are assigned to the test image. Finally, the annotation result is refined using the correlation between labels. The method improves image annotation performance to a large extent.

Description

Automatic image annotation method based on CCA and 2PKNN
Technical field
The invention belongs to the learning theory and application subfields of computer application technology and focuses on the automatic image annotation problem. An automatic image annotation method based on CCA and 2PKNN is proposed to address the semantic gap, weak labelling and class imbalance problems present in the automatic image annotation task. First, 15 global and local features are extracted for each image; for each feature, CCA is used to relate the low-level features to the high-level semantics, and the per-feature distances are fused to form the final distance, addressing the semantic gap problem. From this distance the semantic space of each label is obtained and merged with the images originally annotated with the label, forming a more complete semantic space and addressing the weak labelling problem. For each label, k neighbours, the semantic neighbours, are selected from its semantic space with the KNN algorithm, forming a subset in which the number of images per label is relatively balanced, addressing the class imbalance problem. For a test image, the probability of each label is obtained from the visual distances between the test image and the images in the subset, and the few labels with the highest probabilities are taken as the annotation result of the image. Finally, the semantic similarity between labels is used to improve the annotation result. The method effectively balances recognition accuracy and running speed.
Background technology
With the rapid development of multimedia sharing sites such as Flickr and social networks such as Facebook, the number of images and videos grows daily. Effectively storing, managing and retrieving massive image collections has become both a stern challenge and an urgent need. To promote the sharing and searching of images, annotation information (describing the semantics, scene, appearance and other characteristics of an image) plays a vital role. Although manual image annotation is quite accurate, for large numbers of images it is far too time-consuming. Automatic image annotation has therefore attracted the attention of many researchers, and as the number of images grows it becomes increasingly important for related applications such as image search and pattern recognition.
Automatic image annotation means that, for an image with no labels or few labels, the computer automatically finds, from its low-level features, textual labels that effectively describe its semantic content. The problem has been studied extensively, and the research methods fall roughly into three families. Classification-based annotation algorithms learn, from an already annotated image database, the latent relation and mapping between semantic keywords and image features, and then predict the annotation of a given unknown image; representative methods include the support vector machine (SVM), Bayesian point estimation, decision trees and other machine-learning methods. Annotation algorithms based on probabilistic-relevance modelling build a probabilistic statistical model and use it to compute the joint probability between image content and annotation words; Duygulu et al. proposed the translation model (TM) to annotate the different regions of an image. Graph-learning methods, a kind of semi-supervised learning algorithm, propagate information between the visual content and the textual keywords of images in a cooperative manner, forming a complementary transmission.
As research on image annotation has deepened, researchers have found several objective problems in the annotation process, mainly the following. (a) The semantic gap: the discrepancy between the high-level semantic information understood by the user and the low-level visual features of the image itself. Its causes are roughly the following: 1. because the dimensionality of visual features is fixed, losing visual content during feature extraction is hard to avoid, so the features cannot fully express the visual meaning of the image itself; 2. because visual content in images can be deceptively similar, a gap easily arises between visual content and semantic labels; 3. because semantic content is the observer's understanding of the image, it is highly subjective, and different users understand the same image differently. (b) Weak labelling: the labels in the training set do not cover all the labels in the vocabulary that are relevant to an image. (c) Class imbalance: the number of images annotated with each label varies enormously. For example, in the Corel5k data set the number of images per label ranges over [22, 1004]; in the ESP Game data set over [172, 4553]; and in the IAPR TC-12 data set over [153, 4999]. Moreover, in all three data sets, 75% of the labels annotate fewer images than the per-label average. Severe class imbalance therefore exists in the data sets.
Many researchers have proposed solutions to the above problems. TagProp, proposed by Guillaumin, M. et al. (Guillaumin, M., Mensink, T., Verbeek, J., and Schmid, C. Tagprop: Discriminative metric learning in nearest neighbor models for image auto-annotation. In Computer Vision, 2009 IEEE 12th International Conference on, pp. 309-316. IEEE, 2009.), is a discriminatively trained nearest-neighbour model; it learns a metric space by combining multiple image descriptors and overcomes label sparsity by assigning weights to specific labels. The 2PKNN model proposed by Yashaswi et al. (Y. Verma and C. V. Jawahar. Image annotation using metric learning in semantic neighborhoods. In ECCV '12, pages 836-849, 2012.) addresses class imbalance with a two-step procedure: the first step uses image-label similarity and the second uses image-image similarity, combining the advantages of both. The weighted multi-view NMF-KNN proposed by Mahdi Md et al. (Kalayeh M M, Idrees H, Shah M. NMF-KNN: image annotation using weighted multi-view non-negative matrix factorization [C] // Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on. IEEE, 2014: 184-191.) adds consistency constraints on the coefficient matrices of different features to learn a query-specific generative model over nearest-neighbour features and labels. Yashaswi et al. proposed KSVM-VT (Verma Y, Jawahar C V. Exploring svm for image annotation in presence of confusing labels [C] // Proceedings of the 24th British Machine Vision Conference. 2013.) for the weak-labelling problem, adding a tolerance parameter to the hinge loss; the parameter is determined automatically from visual similarity and data-set statistics, which allows the new model to better tolerate misclassified samples with confusing labels. Although these methods improve image annotation in certain respects, they do not make full use of the image-image, image-label and label-label similarities. Fundamentally, images and labels are semantically related: for example, 'sea' and 'beach' are likely to appear in the same image, whereas 'sea' and 'garden' rarely appear in the same image. This correlation between labels can be applied to label prediction for unknown images.
The content of the invention
The technical problem to be solved by the invention is that automatic image annotation suffers from the semantic gap, weak labelling and class imbalance, and existing methods cannot balance efficiency and accuracy; an automatic image annotation method based on CCA and 2PKNN is therefore proposed.
The technical scheme is as follows:
To balance efficiency and accuracy, the method considers image-image, image-label and label-label similarities. Image-image similarity is obtained from the distance between image features. Image-label similarity is obtained by mapping the image features and label features into a CCA subspace and measuring the cosine distance there. Label-label similarity is obtained from the label co-occurrence matrix and the cosine distance. First, features are extracted from each image and their dimensionality is reduced to remove redundant and irrelevant data. The image features and labels are then mapped into the CCA subspace, where the similarity between features and labels is computed. From this similarity the corresponding image subset of each label is obtained and merged with the label's original image set, forming a more complete image set called the semantic neighbours of the label. For each test image, the KNN algorithm selects, within the semantic neighbours of each label, the k neighbours most relevant to the test image; these neighbours form a subset, the visual distances between the test image and the images in this subset yield the probability of each label, and the few labels with the highest probabilities are taken as the image's labels. Finally, the label-label similarity is used to improve the annotation result. The detailed steps are as follows:
The whole method comprises four parts: image preprocessing, construction of the semantic neighbours, image annotation with the 2PKNN algorithm, and annotation refinement.
Part 1: Image preprocessing
Let X = {x1, ..., xi, ..., xn} be a set of n images and Z = {l1, ..., li, ..., lc} the vocabulary of c labels. The training set T = {(x1, y1), ..., (xi, yi), ..., (xn, yn)} consists of images xi and their corresponding labels yi. Each yi is a binary vector yi ∈ {0,1}^c, where yi(k) = 1 indicates that image xi is annotated with the k-th label, and yi(k) = 0 otherwise. The corresponding label matrix is Y = [y1, y2, ..., yn]' = [y(1), y(2), ..., y(c)].
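As a small illustration of this notation, the binary label matrix Y can be built from a set of per-image annotations; the toy vocabulary and annotations below are illustrative assumptions, not data from the patent:

```python
import numpy as np

# Hypothetical toy vocabulary and annotations (illustrative only):
# c = 4 labels, n = 3 training images.
labels = ["sky", "sea", "tree", "car"]
annotations = [["sky", "sea"], ["tree"], ["sky", "car"]]

# Build the binary label matrix Y (n x c): Y[i, k] = 1 iff image i
# carries the k-th label, matching y_i in {0,1}^c above.
Y = np.zeros((len(annotations), len(labels)), dtype=int)
for i, tags in enumerate(annotations):
    for t in tags:
        Y[i, labels.index(t)] = 1

print(Y)
```

Column k of Y is then the indicator vector y(k) of the k-th label over all training images.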
To account for the diversity of images, a multi-feature fusion method is used. First, 15 global and local features are extracted for each image. The global features comprise GIST and RGB, LAB and HSV colour histograms. The local features comprise SIFT and robust hue features, both extracted over a dense multi-scale grid and at Harris-Laplacian interest points. To capture the spatial information of the image, the colour histograms and the SIFT and hue features are also computed over three equal horizontal partitions of each image (denoted V3H1). Because some features are very high-dimensional after extraction and contain much irrelevant and redundant data, dimensionality reduction is performed with the homogeneous feature mapping method. To compute the distance between two features, the KL distance is used for colour histograms, the L2 distance for GIST, and the L1 distance for SIFT and hue features. All distances are finally fused to obtain the final distance.
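A minimal sketch of the per-feature distances and their fusion, assuming a symmetrised KL divergence for histograms and equal-weight averaging of the per-feature distances (the patent states only that the distances are fused, not how):

```python
import numpy as np

def kl_distance(p, q, eps=1e-10):
    """Symmetrised KL divergence between two normalised histograms."""
    p = p / (p.sum() + eps)
    q = q / (q.sum() + eps)
    return 0.5 * float(np.sum(p * np.log((p + eps) / (q + eps)))
                       + np.sum(q * np.log((q + eps) / (p + eps))))

def l1_distance(a, b):
    return float(np.abs(a - b).sum())

def l2_distance(a, b):
    return float(np.linalg.norm(a - b))

def fused_distance(feats_a, feats_b, metrics):
    """Equal-weight average of the per-feature distances (an assumption
    of this sketch; the patent only says the distances are merged)."""
    return float(np.mean([m(fa, fb) for fa, fb, m in zip(feats_a, feats_b, metrics)]))

# Toy example: one colour histogram (KL), one SIFT-like vector (L1),
# one GIST-like vector (L2).
a = [np.array([0.2, 0.5, 0.3]), np.array([1.0, 2.0]), np.array([0.0, 1.0])]
b = [np.array([0.3, 0.4, 0.3]), np.array([1.5, 2.5]), np.array([1.0, 1.0])]
print(fused_distance(a, b, [kl_distance, l1_distance, l2_distance]))
```

In practice each metric would be applied to the matching feature type (KL for histograms, L2 for GIST, L1 for SIFT/hue), as the text specifies.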
Part 2: Construction of the semantic neighbours
Canonical correlation analysis (CCA) is a multivariate statistical method that uses the correlation between pairs of canonical variables to reflect the overall correlation between two groups of indicators. Taking the image features and the semantic features as the two groups of indicators, CCA yields the correlation between image features and semantic features.
Let the image features X be an n×d matrix and the semantic features Y an n×c matrix; the columns x(1), x(2), ..., x(d) of X form one group of variables and the columns y(1), y(2), ..., y(c) of Y another group. Both are first standardised, giving new X, Y. Arbitrary linear combinations of the two groups of variables are expressed as:
U = Xa = Σ_{i=1..d} ai·x(i),  V = Yb = Σ_{i=1..c} bi·y(i)   (1)
Canonical correlation analysis solves for the values of a and b that maximise the correlation ρ(a, b) between the arbitrary combinations U and V of the two groups of variables, where ρ(a, b) is:
ρ(a, b) = Cov(U, V) / (√D(U)·√D(V)) = E(a'X'Yb) / (√E(a'X'Xa)·√E(b'Y'Yb)) = a'S12b / (a'S11a · b'S22b)^(1/2)   (2)
where Cov(U, V) is the covariance of U and V; E denotes expectation; S11 = E(X'X), S22 = E(Y'Y), S12 = E(X'Y), S21 = S12'; D denotes variance; and a, b are the canonical variables.
The final goal of the CCA algorithm is to find a pair of vectors a, b maximising the correlation coefficient ρ between U and V. For ease of solution, the denominator of formula (2) is set to 1, i.e.:
a'S11a = b'S22b = 1   (3)
The problem then becomes maximising a'S12b under constraint (3). Using the Lagrangian method, let:
ψ = a'S12b − (λ/2)(a'S11a − 1) − (μ/2)(b'S22b − 1)   (4)
where λ and μ are Lagrange multipliers. Setting the partial derivatives of formula (4) to 0 gives the a, b that maximise ψ, and hence the projection matrices U, V.
The image features and semantic features are mapped into the CCA feature subspace by U and V respectively, and the similarity between images and labels is then computed there with the cosine distance. Using this similarity, the most relevant images are selected for each label; let Xk denote this subset. Let Tk denote the subset of training images containing label lk, k ∈ [1...c], obtained from the label matrix Y = [y(1), y(2), ..., y(c)]. Merging the two subsets gives the final subset Gk = Xk ∪ Tk, k ∈ [1...c]. A more complete image subset is thus obtained, alleviating the weak labelling problem; this subset is called the semantic neighbours.
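The CCA solution of formulas (1)-(4) can be sketched with the standard eigenvalue reformulation of the Lagrangian conditions. The NumPy sketch below adds a small ridge regulariser for numerical stability and uses toy data; both are assumptions of this sketch rather than details of the patent:

```python
import numpy as np

def cca_projections(X, Y, reg=1e-6):
    """Solve the CCA problem of formulas (1)-(4) via its eigenvalue
    reformulation. The ridge term `reg` is an assumption added for
    numerical stability."""
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    n = X.shape[0]
    S11 = X.T @ X / n + reg * np.eye(X.shape[1])
    S22 = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    S12 = X.T @ Y / n
    # Stationary points of the Lagrangian ψ give the eigenproblem
    # S11^{-1} S12 S22^{-1} S21 a = ρ² a.
    M = np.linalg.solve(S11, S12) @ np.linalg.solve(S22, S12.T)
    vals, A = np.linalg.eig(M)
    A = A[:, np.argsort(-vals.real)].real
    # b follows from b ∝ S22^{-1} S21 a.
    B = np.linalg.solve(S22, S12.T @ A)
    return A, B

# Toy data: 2-D "image features" strongly correlated with 2-D "label features".
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
Y = X @ np.array([[1.0, 0.0], [0.5, 1.0]]) + 0.01 * rng.normal(size=(50, 2))
A, B = cca_projections(X, Y)
U, V = X @ A[:, :1], Y @ B[:, :1]
corr = np.corrcoef(U.ravel(), V.ravel())[0, 1]
print(round(corr, 3))  # close to 1 for strongly correlated data
```

The columns of A and B play the role of the projection matrices U, V of the text; projecting both feature sets with them and taking cosine distance in the subspace yields the image-label similarity.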
Part 3: Image annotation with the 2PKNN algorithm
For an unknown image, the KNN algorithm first selects k semantic neighbours within the semantic neighbours of each label, and these determine the final annotation result. The conditional probability P(J | yi) models the feature distribution of image J given the semantic label. The annotation problem is thus converted into one of computing posterior probabilities, i.e.:
P(yi | J) ∝ P(J | yi)·P(yi)   (5)
where P(yi) is the prior probability of label yi, i ∈ [1...c]. Given an unknown image J, its labels are therefore obtained from:
y* = arg max_{i ∈ [1...c]} P(J | yi)·P(yi)   (6)
The several labels maximising this score are selected as the labels of image J.
Because the number of images annotated with each label is unbalanced, a two-step KNN algorithm is used to address this. From each set of semantic neighbours Gk, the KNN algorithm selects the k1 images most similar to the unknown image J, forming a subset TJ = {TJ,1, TJ,2, ..., TJ,C} related to J in which the number of images per label is relatively balanced. Within this subset, the probability of each label for J is obtained from the visual distances between J and the subset images. That is, the probability of the unknown image J given a label yk ∈ Z is:
P(J | yk) = Σ_{(Ii,Yi)∈TJ} θ_{J,Ii}·P(yk | Ii) = Σ_{(Ii,Yi)∈TJ} exp(−w·D(J, Ii))·δ(yk ∈ Yi)   (7)
where θ_{J,Ii} = exp(−w·D(J, Ii)) and D(J, Ii) is the visual distance between the test image and training image Ii. P(yk | Ii) = δ(yk ∈ Yi) indicates the presence or absence of label yk in the label set of image Ii: δ(·) = 1 if yk is present in the label set of Ii, and 0 otherwise. w is the bandwidth, with range [1, 30]. Because the number of semantic-neighbour images per label is relatively balanced, P(yi) in formula (5) is uniform, so substituting formula (7) into formula (5) gives the label probabilities of the unknown image J.
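Formula (7) reduces to a weighted vote of the subset images, which can be sketched as follows; the toy subset is an illustrative assumption:

```python
import numpy as np

def label_posteriors(dist_to_subset, subset_labels, w=10.0):
    """Formula (7): P(J|y_k) = Σ_i exp(-w·D(J, I_i)) · δ(y_k ∈ Y_i).

    dist_to_subset : (m,) visual distances D(J, I_i) to the m images
                     of the per-image subset T_J
    subset_labels  : (m, c) binary label matrix of those images
    Returns an unnormalised score per label; under the uniform prior
    P(y_i) of the text, ranking these scores ranks the posteriors.
    """
    weights = np.exp(-w * np.asarray(dist_to_subset, dtype=float))  # θ_{J,I_i}
    return weights @ np.asarray(subset_labels, dtype=float)         # sum over images

# Toy subset: 3 images, 4 labels; the nearest image carries labels 0 and 2.
d = [0.1, 0.5, 0.9]
L = [[1, 0, 1, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 1]]
scores = label_posteriors(d, L, w=1.0)
top2 = np.argsort(-scores)[:2]   # labels 2 and 0 rank highest here
print(top2)
```

The few highest-scoring labels are then kept as the initial annotation, exactly as in the text.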
Part 4: Annotation refinement
Considering the correlation between words, on large data sets the accuracy of annotation can be improved by exploiting it. First the annotation-word co-occurrence matrix C of the training set is computed, each element Cij being the frequency with which labels i and j appear in the same image. The co-occurrence matrix is then normalised by formula (8).
sim(x, y) describes the similarity between annotation words lx and ly. To describe the semantic similarity between words more accurately, the cosine distance formula is applied to the row vectors of the sim matrix (formula (9)), giving Cos, the c×c inter-word correlation matrix, where c is the total number of annotation words. The final annotation-word refinement formula is:
R=γ * Cos*P+ (1- γ) * P (10)
where P is the image posterior-probability matrix obtained in Part 3, Cos is the inter-word correlation matrix, and γ is a balance coefficient with range 0 ≤ γ ≤ 1. R is the final image posterior probability; for each image, the labels with the highest probabilities in R are selected as its labels.
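The refinement of formula (10) can be sketched as below. Since the exact normalisation of formula (8) is not reproduced in the text, the sketch builds the co-occurrence matrix from raw counts and takes the cosine similarity between its rows as Cos; both choices are assumptions:

```python
import numpy as np

def refine_with_cooccurrence(P, annotations, gamma=0.3):
    """Sketch of R = γ·Cos·P + (1−γ)·P (formula (10)).

    The co-occurrence matrix C is built from raw counts and Cos is
    the cosine similarity between its rows; the normalisation of
    formula (8) is not reproduced in the text, so these are
    assumptions of this sketch.
    """
    c = P.shape[0]
    C = np.zeros((c, c))
    for tags in annotations:        # C_ij: labels i and j in the same image
        for i in tags:
            for j in tags:
                C[i, j] += 1.0
    Cn = C / (np.linalg.norm(C, axis=1, keepdims=True) + 1e-10)
    Cos = Cn @ Cn.T                 # c×c inter-word correlation matrix
    return gamma * Cos @ P + (1 - gamma) * P

# One test image, 3 labels; labels 0 and 1 co-occur in the training
# annotations, so the refinement raises label 1's score through its
# correlation with label 0.
P = np.array([[0.8], [0.1], [0.1]])
annotations = [[0, 1], [0, 1], [2]]
R = refine_with_cooccurrence(P, annotations, gamma=0.5)
print(R.ravel().round(3))
```

Here P has one column per test image, so the same matrix product refines all test images at once.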
Brief description of the drawings
Fig. 1 Basic framework of automatic image annotation.
Fig. 2 Flow chart of the automatic image annotation method based on CCA and 2PKNN.
Fig. 3 Frequency of annotation words in the Corel5k image library.
Fig. 4 Frequency of annotation words in the ESP Game image library.
Fig. 5 Frequency of annotation words in the IAPR TC12 image library.
Fig. 6 Variation of annotation performance with K on the Corel5k data set.
Fig. 7 Variation of annotation performance with K on the ESP Game data set.
Fig. 8 Variation of annotation performance with K on the IAPR TC-12 data set.
Embodiment
The embodiments of the invention are described in detail below in conjunction with the technical scheme and the accompanying drawings.
1. Determine the data sets. The invention uses three standard image annotation data sets: Corel5k, ESP Game and IAPR TC-12. The Corel5k data set contains 4999 images and 260 labels; 4500 of its images are used as the training set and the remainder as the test set. The ESP Game data set contains 20770 images and 268 labels; 18689 of its images are used as the training set and the remainder as the test set. The IAPR TC-12 data set contains 19627 images and 291 labels; 17665 of its images are used as the training set and the remainder as the test set.
2. Feature extraction and normalisation. For each image, global and local features are extracted: the global features comprise GIST and RGB, Lab and HSV colour histograms; the local features comprise SIFT and robust hue features. Features with more than 1000 dimensions are then reduced to 1000 dimensions using the homogeneous feature map, and each feature is standardised using gamma normalisation.
3. Set the algorithm parameters: the number of nearest neighbours of each label k1 ∈ [1, 20]; the bandwidth parameter w ∈ [1, 30]; the balance coefficient γ, 0 ≤ γ ≤ 1; and the number of labels per annotated image nl ∈ [1, 23].
4. Train the CCA model by formulas (1) to (4). Then project the visual features and semantic features of the images into the CCA subspace, compute the similarity between visual and semantic features there using the cosine distance, and fuse the 15 similarity matrices to form the final image-label similarity.
5. Build the semantic neighbours. Using the image-label similarity obtained in step 4, obtain the relevant image subset of each label and combine it with the label's originally annotated image subset, forming a more complete image subset called the semantic neighbours.
6. Compute the visual distances between test and training images, using a different metric for each feature: the KL distance for Lab colour-histogram features; the L1 distance for SIFT, Hue, HSV and RGB features; and the L2 distance for GIST features. All distances are finally fused to form the final distance.
7. For each test image, use the KNN algorithm to find its k1 most relevant neighbours within the semantic neighbours of each label, and merge the k1 neighbours of every label into one image subset. Substituting the visual distances between the test image and the subset images into formula (7) gives the probability of each label for the test image; the labels most relevant to the test image are selected from these probabilities as its labels, giving the initial annotation result.
8. Compute the label-label correlation. First compute the label co-occurrence matrix, then normalise it by formula (8), and substitute the normalised matrix into formula (9) to obtain the similarity between labels.
9. Compute the final annotation probability. Substitute the probability matrix from step 7 and the similarity matrix from step 8 into formula (10) to obtain the final probabilities.

Claims (1)

1. An automatic image annotation method based on CCA and 2PKNN, comprising four parts: image preprocessing, construction of semantic neighbours, image annotation with the 2PKNN algorithm, and annotation refinement; characterised by the following steps:
(1) image preprocessing
Let X = {x1, ..., xi, ..., xn} be a set of n images and Z = {l1, ..., lx, ..., lc} the vocabulary of c labels; the training set T = {(x1, y1), ..., (xi, yi), ..., (xn, yn)} consists of images xi and their corresponding labels yi; yi is a binary vector yi ∈ {0,1}^c, with yi(k) = 1 indicating that image xi is annotated with the k-th label and yi(k) = 0 otherwise; the corresponding label matrix is Y = [y1, y2, ..., yn]' = [y(1), y(2), ..., y(c)], where xi is the d-dimensional row vector of the i-th image and yi is the c-dimensional row vector of the i-th image;
First, 15 global and local features of each image are extracted; the global features comprise GIST and RGB, LAB and HSV colour histograms; the local features comprise SIFT and robust hue features, both extracted over a dense multi-scale grid and at Harris-Laplacian interest points; to capture the spatial information of the image, the colour histograms and the SIFT and hue features are also computed over three equal horizontal partitions of each image, denoted V3H1; dimensionality reduction is performed with the homogeneous feature mapping method; to compute the distance between two features, the KL distance is used for colour histograms, the L2 distance for GIST, and the L1 distance for SIFT and hue features; all distances are finally fused to obtain the final distance;
(2) structure of semantic neighbours
Let the image features X be an n×d matrix and the semantic features Y an n×c matrix; the columns x(1), x(2), ..., x(d) of X form one group of variables and the columns y(1), y(2), ..., y(c) of Y another group; both are first standardised, giving new X, Y; arbitrary linear combinations of the two groups of variables are expressed as:
U = Xa = Σ_{i=1..d} ai·x(i),  V = Yb = Σ_{i=1..c} bi·y(i)   (1)
where ai > 0 is the coefficient of x(i) in the linear combination of x(1), x(2), ..., x(d), the vector formed by all d coefficients being denoted a; bi > 0 is the coefficient of y(i) in the linear combination of y(1), y(2), ..., y(c), the vector formed by all c coefficients being denoted b;
Canonical correlation analysis solves for the values of a and b that maximise the correlation ρ(a, b) between the arbitrary combinations U and V of the two groups of variables, where ρ(a, b) is:
ρ(a, b) = Cov(U, V) / (√D(U)·√D(V)) = E(a'X'Yb) / (√E(a'X'Xa)·√E(b'Y'Yb)) = a'S12b / (a'S11a · b'S22b)^(1/2)   (2)
where Cov(U, V) is the covariance of U and V; E denotes expectation; S11 = E(X'X), S22 = E(Y'Y), S12 = E(X'Y), S21 = S12'; D denotes variance; a, b are the canonical variables; a', b' are the transposes of the vectors a, b; and X', Y' are the transposes of the matrices X, Y;
The final goal of the CCA algorithm is to find a pair of vectors a, b maximising the correlation coefficient ρ between U and V; the denominator of formula (2) is set to 1, i.e.:
a'S11a = b'S22b = 1   (3)
The problem becomes maximising a'S12b under constraint (3); using the Lagrangian method, let:
ψ = a'S12b − (λ/2)(a'S11a − 1) − (μ/2)(b'S22b − 1)   (4)
where λ and μ are Lagrange multipliers; setting the partial derivatives of formula (4) to 0 gives the a, b that maximise ψ, and hence the projection matrices U, V;
Image features and semantic features are mapped into the CCA subspace through U and V respectively, and the similarity between images and labels is then computed with the cosine distance. Using this similarity, the several most relevant images are selected for each label, and combined with the images originally annotated with that label to obtain an image subset per label. Let T_k, k ∈ [1...c], denote the subset of training images retrieved for label l_k; from the label matrix Y = [y^(1), y^(2), ..., y^(c)] the subset X_k of images originally annotated with each label is obtained. The two subsets are merged into the final subset G_k = X_k ∪ T_k, k ∈ [1...c], giving a more complete image subset per label; these subsets are called the semantic neighbours of a test image.
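Building the per-label subsets G_k = X_k ∪ T_k from cosine similarity in the projected subspace can be sketched as follows; all function and variable names and the toy data are illustrative assumptions:

```python
import numpy as np

def label_image_subsets(img_proj, lab_proj, Ymat, n_per_label=5):
    """Sketch: for each label k, take the images most similar to it by
    cosine similarity in the CCA subspace (T_k) and union them with the
    images originally annotated with k (X_k), giving G_k = X_k | T_k.
    img_proj: n x d projected image features, lab_proj: c x d projected
    label vectors, Ymat: n x c binary annotation matrix."""
    a = img_proj / np.linalg.norm(img_proj, axis=1, keepdims=True)
    b = lab_proj / np.linalg.norm(lab_proj, axis=1, keepdims=True)
    sim = a @ b.T                                        # n x c cosine similarities
    subsets = []
    for k in range(lab_proj.shape[0]):
        T_k = set(np.argsort(-sim[:, k])[:n_per_label])  # most similar images
        X_k = set(np.flatnonzero(Ymat[:, k]))            # originally tagged images
        subsets.append(sorted(T_k | X_k))
    return subsets

# toy usage: 6 images, 2 labels, 3-dimensional subspace
rng = np.random.default_rng(1)
img = rng.normal(size=(6, 3))
lab = rng.normal(size=(2, 3))
Ymat = np.array([[1, 0], [0, 1], [1, 1], [0, 0], [1, 0], [0, 1]])
G = label_image_subsets(img, lab, Ymat, n_per_label=2)
```

By construction each G_k always contains every image originally annotated with label k.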
(3) Image annotation with the 2PKNN algorithm
For an unknown image, k semantic neighbours are first selected from the semantic neighbours of each label using the KNN algorithm, and these semantic neighbours determine the final annotation result. The conditional probability P(J|y_i) is assumed to model the feature distribution of an image J given the semantic label; the annotation problem is thus converted into one of computing posterior probabilities, i.e.:
P(y_i), i ∈ [1...c], denotes the prior probability of label y_i. Given an unknown image J, its labels are obtained from the posterior

P(y_i \mid J) \propto P(J \mid y_i)\,P(y_i)    (5)

The several labels with the highest posterior probability are selected as the labels of image J.
From each set of semantic neighbours G_k, the k1 images most similar to the unknown image J are selected with the KNN algorithm, forming a subset T_J = {T_{J,1}, T_{J,2}, ..., T_{J,c}} related to J in which the number of images per label is relatively balanced. Within this subset, the probability of the unknown image J under each label is derived from the visual distances between J and the subset images; that is, the probability of J given a label y_k ∈ Z is:
P(J \mid y_k) = \sum_{(I_i,Y_i)\in T_J} \theta_{J,I_i}\cdot P(y_k \mid I_i) = \sum_{(I_i,Y_i)\in T_J} \exp(-w\,D(J,I_i))\cdot\delta(y_k\in Y_i)    (7)
where θ_{J,I_i} = exp(−w·D(J,I_i)), and D(J,I_i) is the visual distance between the test image and a training image. P(y_k|I_i) = δ(y_k ∈ Y_i) encodes the presence or absence of label y_k in the tag set of image I_i: δ(·) = 1 indicates that label y_k is present in the tag set of image I_i, and δ(·) = 0 that it is absent. w is a bandwidth parameter with range [1, 30]. Because the number of semantic neighbour images per label is relatively balanced, P(y_i) in formula (5) is taken to be uniform; substituting formula (7) into formula (5) then gives the label probabilities of the unknown image J.
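The second 2PKNN pass of equation (7), with the uniform prior of (5), can be sketched as below; the function name, the bandwidth value, and the toy numbers are assumptions for illustration only:

```python
import numpy as np

def two_pass_knn_scores(dists, label_sets, c, w=5.0):
    """Sketch of the second 2PKNN pass (equation (7)): given the visual
    distances dists[i] = D(J, I_i) to the images in the balanced subset
    T_J and their tag sets label_sets[i], score each label y_k by
    sum_i exp(-w * D(J, I_i)) * delta(y_k in Y_i). With the uniform
    prior of (5), the posterior is proportional to these scores."""
    scores = np.zeros(c)
    for d, Yi in zip(dists, label_sets):
        theta = np.exp(-w * d)       # kernel weight of neighbour I_i
        for k in Yi:                 # delta(y_k in Y_i) selects its labels
            scores[k] += theta
    return scores / scores.sum()     # normalise to a distribution over labels

# toy usage: 3 neighbours of J, 4 labels in the vocabulary
dists = [0.1, 0.5, 0.9]
label_sets = [{0, 2}, {0}, {3}]
p = two_pass_knn_scores(dists, label_sets, c=4, w=5.0)
top = int(np.argmax(p))              # most probable label for J
```

Here label 0 wins, since it appears in the two closest neighbours and closeness is weighted exponentially.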
(4) Annotation refinement
First the co-occurrence matrix C of the annotation words in the training set is computed; each element C_ij is the frequency with which labels i and j appear in the same image. The co-occurrence matrix is then normalized:
\mathrm{sim}(x,y) = \frac{C(x,y)}{C(x,x)}    (8)
sim(x, y) describes the similarity between annotation words l_x and l_y; to describe the semantic similarity between words more accurately, the cosine distance formula is used:
\mathrm{Cos}(i,j) = \frac{\vec{a}\cdot\vec{b}}{|\vec{a}|\,|\vec{b}|}    (9)
where \vec{a} and \vec{b} denote the i-th and j-th row vectors of the sim matrix, respectively. Cos is the c × c inter-word correlation matrix, where c is the total number of annotation words. The final refinement formula for the annotation words is:
R=γ * Cos*P+ (1- γ) * P (10)
where P is the image posterior probability distribution matrix obtained in step (3); Cos is the inter-word correlation matrix; γ is a balance coefficient with range 0 ≤ γ ≤ 1; and R is the final image posterior probability, from which the several labels with the highest probability are selected for each image as its labels.
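Steps (8)–(10) can be sketched together in a few lines of NumPy; the function name, the guards against empty labels, and the toy data are illustrative assumptions:

```python
import numpy as np

def refine_scores(P, Ymat, gamma=0.5):
    """Sketch of the refinement step (equations (8)-(10)): build the
    label co-occurrence matrix C from the training annotations Ymat
    (n x c, binary), normalise it as sim(x, y) = C(x, y)/C(x, x),
    take cosine similarities between rows of sim to get the c x c
    correlation matrix Cos, then smooth the posterior matrix P (c x m,
    labels x images) as R = gamma * Cos @ P + (1 - gamma) * P."""
    C = Ymat.T @ Ymat                                 # C[i, j]: images containing both i and j
    simM = C / np.maximum(np.diag(C), 1)[:, None]     # eq. (8), guard against empty labels
    norms = np.linalg.norm(simM, axis=1, keepdims=True)
    norms[norms == 0] = 1
    a = simM / norms
    Cos = a @ a.T                                     # eq. (9): cosine between rows of sim
    return gamma * Cos @ P + (1 - gamma) * P          # eq. (10)

# toy usage: 4 training images, 3 labels; labels 0 and 1 always co-occur
Ymat = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1], [1, 1, 1]])
P = np.array([[0.9, 0.0, 0.1]])                       # raw posterior of one test image
R = refine_scores(P.T, Ymat, gamma=0.5).T             # boosts label 1 via label 0
```

Because labels 0 and 1 always co-occur in the training annotations, the refinement lifts label 1's score well above its raw posterior of 0.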
CN201610144113.8A 2016-03-10 2016-03-10 A kind of automatic image marking method based on CCA and 2PKNN Expired - Fee Related CN105808752B (en)

Publications (2)

Publication Number Publication Date
CN105808752A CN105808752A (en) 2016-07-27
CN105808752B true CN105808752B (en) 2018-04-10
