CN105808752B - Automatic image annotation method based on CCA and 2PKNN - Google Patents

Automatic image annotation method based on CCA and 2PKNN

Info

Publication number
CN105808752B
CN105808752B · CN201610144113.8A
Authority
CN
China
Prior art keywords
image
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201610144113.8A
Other languages
Chinese (zh)
Other versions
CN105808752A (en)
Inventor
孙亮
王雪莲
葛宏伟
谭国真
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology
Priority to CN201610144113.8A
Publication of CN105808752A
Application granted
Publication of CN105808752B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/5866 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns

Abstract

The invention belongs to the learning theory and application subfields of computer application technology and relates to an automatic image annotation method based on CCA and 2PKNN, aimed at the semantic gap, weak labelling and class imbalance problems present in the automatic image annotation task. First, for the semantic gap problem, the two kinds of features are mapped into a CCA subspace and the distance between them is computed in that subspace. For the weak labelling problem, a semantic space is built for each label. For the class imbalance problem, the KNN algorithm is used to find the k nearest neighbours of the test image in the semantic space of each label; these neighbours form an image subset, and using the visual distances between this subset and the test image together with the Bayesian formula, the few highest-scoring labels are assigned to the test image. Finally, the annotation result is refined using the correlation between labels. The method improves image annotation performance to a large extent.

Description

Automatic image annotation method based on CCA and 2PKNN
Technical field
The invention belongs to the learning theory and application subfields of computer application technology and focuses on the automatic image annotation problem. An automatic image annotation method based on CCA and 2PKNN is proposed to address the semantic gap, weak labelling and class imbalance problems present in the automatic image annotation task. First, 15 global and local features are extracted for each image; for each feature, CCA is used to relate the low-level features to the high-level semantics, and the per-feature distances are fused to form the final distance, addressing the semantic gap problem. From this distance the semantic space of each label is obtained and merged with the images originally annotated with the label, forming a more complete semantic space and addressing the weak labelling problem. For each label, k neighbours, the semantic neighbours, are selected from its semantic space with the KNN algorithm, forming a subset in which the number of images per label is relatively balanced, addressing the class imbalance problem. For a test image, the probability of each label is obtained from the visual distances between the test image and the images in the subset, and the few labels with the highest probabilities are taken as the annotation result of the image. Finally, the semantic similarity between labels is used to improve the annotation result. The method effectively balances recognition accuracy and running speed.
Background technology
With the rapid development of multimedia sharing sites such as Flickr and social networks such as Facebook, the number of images and videos grows daily. Effectively storing, managing and retrieving massive image collections has become both a stern challenge and an urgent need. To promote the sharing and searching of images, annotation information (describing the semantics, scene, appearance and other characteristics of an image) plays a vital role. Although manual image annotation is quite accurate, for large numbers of images it is far too time-consuming. Automatic image annotation has therefore attracted the attention of many researchers, and as the number of images grows it becomes increasingly important for related applications such as image search and pattern recognition.
Automatic image annotation means that, for an image with no labels or few labels, the computer automatically finds, from its low-level features, textual labels that effectively describe its semantic content. The problem has been studied extensively, and the research methods fall roughly into three families. Classification-based annotation algorithms learn, from an already annotated image database, the latent relation and mapping between semantic keywords and image features, and then predict the annotation of a given unknown image; representative methods include the support vector machine (SVM), Bayesian point estimation, decision trees and other machine-learning methods. Annotation algorithms based on probabilistic-relevance modelling build a probabilistic statistical model and use it to compute the joint probability between image content and annotation words; Duygulu et al. proposed the translation model (TM) to annotate the different regions of an image. Graph-learning methods, a kind of semi-supervised learning algorithm, propagate information between the visual content and the textual keywords of images in a cooperative manner, forming a complementary transmission.
As research on image annotation has deepened, researchers have found several objective problems in the annotation process, mainly the following. (a) The semantic gap: the discrepancy between the high-level semantic information understood by the user and the low-level visual features of the image itself. Its causes are roughly the following: 1. because the dimensionality of visual features is fixed, losing visual content during feature extraction is hard to avoid, so the features cannot fully express the visual meaning of the image itself; 2. because visual content in images can be deceptively similar, a gap easily arises between visual content and semantic labels; 3. because semantic content is the observer's understanding of the image, it is highly subjective, and different users understand the same image differently. (b) Weak labelling: the labels in the training set do not cover all the labels in the vocabulary that are relevant to an image. (c) Class imbalance: the number of images annotated with each label varies enormously. For example, in the Corel5k data set the number of images per label ranges over [22, 1004]; in the ESP Game data set over [172, 4553]; and in the IAPR TC-12 data set over [153, 4999]. Moreover, in all three data sets, 75% of the labels annotate fewer images than the per-label average. Severe class imbalance therefore exists in the data sets.
Many researchers have proposed solutions to the above problems. TagProp, proposed by Guillaumin, M. et al. (Guillaumin, M., Mensink, T., Verbeek, J., and Schmid, C. Tagprop: Discriminative metric learning in nearest neighbor models for image auto-annotation. In Computer Vision, 2009 IEEE 12th International Conference on, pp. 309-316. IEEE, 2009.), is a discriminatively trained nearest-neighbour model; it learns a metric space by combining multiple image descriptors and overcomes label sparsity by assigning weights to specific labels. The 2PKNN model proposed by Yashaswi et al. (Y. Verma and C. V. Jawahar. Image annotation using metric learning in semantic neighborhoods. In ECCV '12, pages 836-849, 2012.) addresses class imbalance with a two-step procedure: the first step uses image-label similarity and the second uses image-image similarity, combining the advantages of both. The weighted multi-view NMF-KNN proposed by Mahdi Md et al. (Kalayeh M M, Idrees H, Shah M. NMF-KNN: image annotation using weighted multi-view non-negative matrix factorization [C] // Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on. IEEE, 2014: 184-191.) adds consistency constraints on the coefficient matrices of different features to learn a query-specific generative model over nearest-neighbour features and labels. Yashaswi et al. proposed KSVM-VT (Verma Y, Jawahar C V. Exploring svm for image annotation in presence of confusing labels [C] // Proceedings of the 24th British Machine Vision Conference. 2013.) for the weak-labelling problem, adding a tolerance parameter to the hinge loss; the parameter is determined automatically from visual similarity and data-set statistics, which allows the new model to better tolerate misclassified samples with confusing labels. Although these methods improve image annotation in certain respects, they do not make full use of the image-image, image-label and label-label similarities. Fundamentally, images and labels are semantically related: for example, 'sea' and 'beach' are likely to appear in the same image, whereas 'sea' and 'garden' rarely appear in the same image. This correlation between labels can be applied to label prediction for unknown images.
The content of the invention
The technical problem to be solved by the invention is that automatic image annotation suffers from the semantic gap, weak labelling and class imbalance, and existing methods cannot balance efficiency and accuracy; an automatic image annotation method based on CCA and 2PKNN is therefore proposed.
The technical scheme is as follows:
To balance efficiency and accuracy, the method considers image-image, image-label and label-label similarities. Image-image similarity is obtained from the distance between image features. Image-label similarity is obtained by mapping the image features and label features into a CCA subspace and measuring the cosine distance there. Label-label similarity is obtained from the label co-occurrence matrix and the cosine distance. First, features are extracted from each image and their dimensionality is reduced to remove redundant and irrelevant data. The image features and labels are then mapped into the CCA subspace, where the similarity between features and labels is computed. From this similarity the corresponding image subset of each label is obtained and merged with the label's original image set, forming a more complete image set called the semantic neighbours of the label. For each test image, the KNN algorithm selects, within the semantic neighbours of each label, the k neighbours most relevant to the test image; these neighbours form a subset, the visual distances between the test image and the images in this subset yield the probability of each label, and the few labels with the highest probabilities are taken as the image's labels. Finally, the label-label similarity is used to improve the annotation result. The detailed steps are as follows:
The whole method comprises four parts: image preprocessing, construction of the semantic neighbours, image annotation with the 2PKNN algorithm, and annotation refinement.
Part 1: Image preprocessing
Let X = {x1, ..., xi, ..., xn} be a set of n images and Z = {l1, ..., li, ..., lc} the vocabulary of c labels. The training set T = {(x1, y1), ..., (xi, yi), ..., (xn, yn)} consists of images xi and their corresponding labels yi. Each yi is a binary vector yi ∈ {0,1}^c, where yi(k) = 1 indicates that image xi is annotated with the k-th label, and yi(k) = 0 otherwise. The corresponding label matrix is Y = [y1, y2, ..., yn]' = [y(1), y(2), ..., y(c)].
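As a small illustration of this notation, the binary label matrix Y can be built from a set of per-image annotations; the toy vocabulary and annotations below are illustrative assumptions, not data from the patent:

```python
import numpy as np

# Hypothetical toy vocabulary and annotations (illustrative only):
# c = 4 labels, n = 3 training images.
labels = ["sky", "sea", "tree", "car"]
annotations = [["sky", "sea"], ["tree"], ["sky", "car"]]

# Build the binary label matrix Y (n x c): Y[i, k] = 1 iff image i
# carries the k-th label, matching y_i in {0,1}^c above.
Y = np.zeros((len(annotations), len(labels)), dtype=int)
for i, tags in enumerate(annotations):
    for t in tags:
        Y[i, labels.index(t)] = 1

print(Y)
```

Column k of Y is then the indicator vector y(k) of the k-th label over all training images.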
To account for the diversity of images, a multi-feature fusion method is used. First, 15 global and local features are extracted for each image. The global features comprise GIST and RGB, LAB and HSV colour histograms. The local features comprise SIFT and robust hue features, both extracted over a dense multi-scale grid and at Harris-Laplacian interest points. To capture the spatial information of the image, the colour histograms and the SIFT and hue features are also computed over three equal horizontal partitions of each image (denoted V3H1). Because some features are very high-dimensional after extraction and contain much irrelevant and redundant data, dimensionality reduction is performed with the homogeneous feature mapping method. To compute the distance between two features, the KL distance is used for colour histograms, the L2 distance for GIST, and the L1 distance for SIFT and hue features. All distances are finally fused to obtain the final distance.
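A minimal sketch of the per-feature distances and their fusion, assuming a symmetrised KL divergence for histograms and equal-weight averaging of the per-feature distances (the patent states only that the distances are fused, not how):

```python
import numpy as np

def kl_distance(p, q, eps=1e-10):
    """Symmetrised KL divergence between two normalised histograms."""
    p = p / (p.sum() + eps)
    q = q / (q.sum() + eps)
    return 0.5 * float(np.sum(p * np.log((p + eps) / (q + eps)))
                       + np.sum(q * np.log((q + eps) / (p + eps))))

def l1_distance(a, b):
    return float(np.abs(a - b).sum())

def l2_distance(a, b):
    return float(np.linalg.norm(a - b))

def fused_distance(feats_a, feats_b, metrics):
    """Equal-weight average of the per-feature distances (an assumption
    of this sketch; the patent only says the distances are merged)."""
    return float(np.mean([m(fa, fb) for fa, fb, m in zip(feats_a, feats_b, metrics)]))

# Toy example: one colour histogram (KL), one SIFT-like vector (L1),
# one GIST-like vector (L2).
a = [np.array([0.2, 0.5, 0.3]), np.array([1.0, 2.0]), np.array([0.0, 1.0])]
b = [np.array([0.3, 0.4, 0.3]), np.array([1.5, 2.5]), np.array([1.0, 1.0])]
print(fused_distance(a, b, [kl_distance, l1_distance, l2_distance]))
```

In practice each metric would be applied to the matching feature type (KL for histograms, L2 for GIST, L1 for SIFT/hue), as the text specifies.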
Part 2: Construction of the semantic neighbours
Canonical correlation analysis (CCA) is a multivariate statistical method that uses the correlation between pairs of canonical variables to reflect the overall correlation between two groups of indicators. Taking the image features and the semantic features as the two groups of indicators, CCA yields the correlation between image features and semantic features.
Let the image features X be an n×d matrix and the semantic features Y an n×c matrix; the columns x(1), x(2), ..., x(d) of X form one group of variables and the columns y(1), y(2), ..., y(c) of Y another group. Both are first standardised, giving new X, Y. Arbitrary linear combinations of the two groups of variables are expressed as:
U = Xa = Σ_{i=1..d} ai·x(i),  V = Yb = Σ_{i=1..c} bi·y(i)   (1)
Canonical correlation analysis solves for the values of a and b that maximise the correlation ρ(a, b) between the arbitrary combinations U and V of the two groups of variables, where ρ(a, b) is:
ρ(a, b) = Cov(U, V) / (√D(U)·√D(V)) = E(a'X'Yb) / (√E(a'X'Xa)·√E(b'Y'Yb)) = a'S12b / (a'S11a · b'S22b)^(1/2)   (2)
where Cov(U, V) is the covariance of U and V; E denotes expectation; S11 = E(X'X), S22 = E(Y'Y), S12 = E(X'Y), S21 = S12'; D denotes variance; and a, b are the canonical variables.
The final goal of the CCA algorithm is to find a pair of vectors a, b maximising the correlation coefficient ρ between U and V. For ease of solution, the denominator of formula (2) is set to 1, i.e.:
a'S11a = b'S22b = 1   (3)
The problem then becomes maximising a'S12b under constraint (3). Using the Lagrangian method, let:
ψ = a'S12b − (λ/2)(a'S11a − 1) − (μ/2)(b'S22b − 1)   (4)
where λ and μ are Lagrange multipliers. Setting the partial derivatives of formula (4) to 0 gives the a, b that maximise ψ, and hence the projection matrices U, V.
The image features and semantic features are mapped into the CCA feature subspace by U and V respectively, and the similarity between images and labels is then computed there with the cosine distance. Using this similarity, the most relevant images are selected for each label; let Xk denote this subset. Let Tk denote the subset of training images containing label lk, k ∈ [1...c], obtained from the label matrix Y = [y(1), y(2), ..., y(c)]. Merging the two subsets gives the final subset Gk = Xk ∪ Tk, k ∈ [1...c]. A more complete image subset is thus obtained, alleviating the weak labelling problem; this subset is called the semantic neighbours.
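The CCA solution of formulas (1)-(4) can be sketched with the standard eigenvalue reformulation of the Lagrangian conditions. The NumPy sketch below adds a small ridge regulariser for numerical stability and uses toy data; both are assumptions of this sketch rather than details of the patent:

```python
import numpy as np

def cca_projections(X, Y, reg=1e-6):
    """Solve the CCA problem of formulas (1)-(4) via its eigenvalue
    reformulation. The ridge term `reg` is an assumption added for
    numerical stability."""
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    n = X.shape[0]
    S11 = X.T @ X / n + reg * np.eye(X.shape[1])
    S22 = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    S12 = X.T @ Y / n
    # Stationary points of the Lagrangian ψ give the eigenproblem
    # S11^{-1} S12 S22^{-1} S21 a = ρ² a.
    M = np.linalg.solve(S11, S12) @ np.linalg.solve(S22, S12.T)
    vals, A = np.linalg.eig(M)
    A = A[:, np.argsort(-vals.real)].real
    # b follows from b ∝ S22^{-1} S21 a.
    B = np.linalg.solve(S22, S12.T @ A)
    return A, B

# Toy data: 2-D "image features" strongly correlated with 2-D "label features".
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
Y = X @ np.array([[1.0, 0.0], [0.5, 1.0]]) + 0.01 * rng.normal(size=(50, 2))
A, B = cca_projections(X, Y)
U, V = X @ A[:, :1], Y @ B[:, :1]
corr = np.corrcoef(U.ravel(), V.ravel())[0, 1]
print(round(corr, 3))  # close to 1 for strongly correlated data
```

The columns of A and B play the role of the projection matrices U, V of the text; projecting both feature sets with them and taking cosine distance in the subspace yields the image-label similarity.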
Part 3: Image annotation with the 2PKNN algorithm
For an unknown image, the KNN algorithm first selects k semantic neighbours within the semantic neighbours of each label, and these determine the final annotation result. The conditional probability P(J | yi) models the feature distribution of image J given the semantic label. The annotation problem is thus converted into one of computing posterior probabilities, i.e.:
P(yi | J) ∝ P(J | yi)·P(yi)   (5)
where P(yi) is the prior probability of label yi, i ∈ [1...c]. Given an unknown image J, its labels are therefore obtained from:
y* = arg max_{i ∈ [1...c]} P(J | yi)·P(yi)   (6)
The several labels maximising this score are selected as the labels of image J.
Because the number of images annotated with each label is unbalanced, a two-step KNN algorithm is used to address this. From each set of semantic neighbours Gk, the KNN algorithm selects the k1 images most similar to the unknown image J, forming a subset TJ = {TJ,1, TJ,2, ..., TJ,C} related to J in which the number of images per label is relatively balanced. Within this subset, the probability of each label for J is obtained from the visual distances between J and the subset images. That is, the probability of the unknown image J given a label yk ∈ Z is:
P(J | yk) = Σ_{(Ii,Yi)∈TJ} θ_{J,Ii}·P(yk | Ii) = Σ_{(Ii,Yi)∈TJ} exp(−w·D(J, Ii))·δ(yk ∈ Yi)   (7)
where θ_{J,Ii} = exp(−w·D(J, Ii)) and D(J, Ii) is the visual distance between the test image and training image Ii. P(yk | Ii) = δ(yk ∈ Yi) indicates the presence or absence of label yk in the label set of image Ii: δ(·) = 1 if yk is present in the label set of Ii, and 0 otherwise. w is the bandwidth, with range [1, 30]. Because the number of semantic-neighbour images per label is relatively balanced, P(yi) in formula (5) is uniform, so substituting formula (7) into formula (5) gives the label probabilities of the unknown image J.
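Formula (7) reduces to a weighted vote of the subset images, which can be sketched as follows; the toy subset is an illustrative assumption:

```python
import numpy as np

def label_posteriors(dist_to_subset, subset_labels, w=10.0):
    """Formula (7): P(J|y_k) = Σ_i exp(-w·D(J, I_i)) · δ(y_k ∈ Y_i).

    dist_to_subset : (m,) visual distances D(J, I_i) to the m images
                     of the per-image subset T_J
    subset_labels  : (m, c) binary label matrix of those images
    Returns an unnormalised score per label; under the uniform prior
    P(y_i) of the text, ranking these scores ranks the posteriors.
    """
    weights = np.exp(-w * np.asarray(dist_to_subset, dtype=float))  # θ_{J,I_i}
    return weights @ np.asarray(subset_labels, dtype=float)         # sum over images

# Toy subset: 3 images, 4 labels; the nearest image carries labels 0 and 2.
d = [0.1, 0.5, 0.9]
L = [[1, 0, 1, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 1]]
scores = label_posteriors(d, L, w=1.0)
top2 = np.argsort(-scores)[:2]   # labels 2 and 0 rank highest here
print(top2)
```

The few highest-scoring labels are then kept as the initial annotation, exactly as in the text.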
Part 4: Annotation refinement
Considering the correlation between words, on large data sets the accuracy of annotation can be improved by exploiting it. First the annotation-word co-occurrence matrix C of the training set is computed, each element Cij being the frequency with which labels i and j appear in the same image. The co-occurrence matrix is then normalised by formula (8).
sim(x, y) describes the similarity between annotation words lx and ly. To describe the semantic similarity between words more accurately, the cosine distance formula is applied to the row vectors of the sim matrix (formula (9)), giving Cos, the c×c inter-word correlation matrix, where c is the total number of annotation words. The final annotation-word refinement formula is:
R=γ * Cos*P+ (1- γ) * P (10)
where P is the image posterior-probability matrix obtained in Part 3, Cos is the inter-word correlation matrix, and γ is a balance coefficient with range 0 ≤ γ ≤ 1. R is the final image posterior probability; for each image, the labels with the highest probabilities in R are selected as its labels.
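The refinement of formula (10) can be sketched as below. Since the exact normalisation of formula (8) is not reproduced in the text, the sketch builds the co-occurrence matrix from raw counts and takes the cosine similarity between its rows as Cos; both choices are assumptions:

```python
import numpy as np

def refine_with_cooccurrence(P, annotations, gamma=0.3):
    """Sketch of R = γ·Cos·P + (1−γ)·P (formula (10)).

    The co-occurrence matrix C is built from raw counts and Cos is
    the cosine similarity between its rows; the normalisation of
    formula (8) is not reproduced in the text, so these are
    assumptions of this sketch.
    """
    c = P.shape[0]
    C = np.zeros((c, c))
    for tags in annotations:        # C_ij: labels i and j in the same image
        for i in tags:
            for j in tags:
                C[i, j] += 1.0
    Cn = C / (np.linalg.norm(C, axis=1, keepdims=True) + 1e-10)
    Cos = Cn @ Cn.T                 # c×c inter-word correlation matrix
    return gamma * Cos @ P + (1 - gamma) * P

# One test image, 3 labels; labels 0 and 1 co-occur in the training
# annotations, so the refinement raises label 1's score through its
# correlation with label 0.
P = np.array([[0.8], [0.1], [0.1]])
annotations = [[0, 1], [0, 1], [2]]
R = refine_with_cooccurrence(P, annotations, gamma=0.5)
print(R.ravel().round(3))
```

Here P has one column per test image, so the same matrix product refines all test images at once.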
Brief description of the drawings
Fig. 1 Basic framework of automatic image annotation.
Fig. 2 Flow chart of the automatic image annotation method based on CCA and 2PKNN.
Fig. 3 Frequency of annotation words in the Corel5k image library.
Fig. 4 Frequency of annotation words in the ESP Game image library.
Fig. 5 Frequency of annotation words in the IAPR TC12 image library.
Fig. 6 Variation of annotation performance with K on the Corel5k data set.
Fig. 7 Variation of annotation performance with K on the ESP Game data set.
Fig. 8 Variation of annotation performance with K on the IAPR TC-12 data set.
Embodiment
The embodiments of the invention are described in detail below in conjunction with the technical scheme and the accompanying drawings.
1. Determine the data sets. The invention uses three standard image annotation data sets: Corel5k, ESP Game and IAPR TC-12. The Corel5k data set contains 4999 images and 260 labels; 4500 of its images are used as the training set and the remainder as the test set. The ESP Game data set contains 20770 images and 268 labels; 18689 of its images are used as the training set and the remainder as the test set. The IAPR TC-12 data set contains 19627 images and 291 labels; 17665 of its images are used as the training set and the remainder as the test set.
2. Feature extraction and normalisation. For each image, global and local features are extracted: the global features comprise GIST and RGB, Lab and HSV colour histograms; the local features comprise SIFT and robust hue features. Features with more than 1000 dimensions are then reduced to 1000 dimensions using the homogeneous feature map, and each feature is standardised using gamma normalisation.
3. Set the algorithm parameters: the number of nearest neighbours of each label k1 ∈ [1, 20]; the bandwidth parameter w ∈ [1, 30]; the balance coefficient γ, 0 ≤ γ ≤ 1; and the number of labels per annotated image nl ∈ [1, 23].
4. Train the CCA model by formulas (1) to (4). Then project the visual features and semantic features of the images into the CCA subspace, compute the similarity between visual and semantic features there using the cosine distance, and fuse the 15 similarity matrices to form the final image-label similarity.
5. Build the semantic neighbours. Using the image-label similarity obtained in step 4, obtain the relevant image subset of each label and combine it with the label's originally annotated image subset, forming a more complete image subset called the semantic neighbours.
6. Compute the visual distances between test and training images, using a different metric for each feature: the KL distance for Lab colour-histogram features; the L1 distance for SIFT, Hue, HSV and RGB features; and the L2 distance for GIST features. All distances are finally fused to form the final distance.
7. For each test image, use the KNN algorithm to find its k1 most relevant neighbours within the semantic neighbours of each label, and merge the k1 neighbours of every label into one image subset. Substituting the visual distances between the test image and the subset images into formula (7) gives the probability of each label for the test image; the labels most relevant to the test image are selected from these probabilities as its labels, giving the initial annotation result.
8. Compute the label-label correlation. First compute the label co-occurrence matrix, then normalise it by formula (8), and substitute the normalised matrix into formula (9) to obtain the similarity between labels.
9. Compute the final annotation probability. Substitute the probability matrix from step 7 and the similarity matrix from step 8 into formula (10) to obtain the final probabilities.

Claims (1)

1. An automatic image annotation method based on CCA and 2PKNN, comprising four parts: image preprocessing, construction of semantic neighbours, image annotation with the 2PKNN algorithm, and annotation refinement; characterised by the following steps:
(1) image preprocessing
Let X = {x1, ..., xi, ..., xn} be a set of n images and Z = {l1, ..., lx, ..., lc} the vocabulary of c labels; the training set T = {(x1, y1), ..., (xi, yi), ..., (xn, yn)} consists of images xi and their corresponding labels yi; yi is a binary vector yi ∈ {0,1}^c, with yi(k) = 1 indicating that image xi is annotated with the k-th label and yi(k) = 0 otherwise; the corresponding label matrix is Y = [y1, y2, ..., yn]' = [y(1), y(2), ..., y(c)], where xi is the d-dimensional row vector of the i-th image and yi is the c-dimensional row vector of the i-th image;
First, 15 global and local features of each image are extracted; the global features comprise GIST and RGB, LAB and HSV colour histograms; the local features comprise SIFT and robust hue features, both extracted over a dense multi-scale grid and at Harris-Laplacian interest points; to capture the spatial information of the image, the colour histograms and the SIFT and hue features are also computed over three equal horizontal partitions of each image, denoted V3H1; dimensionality reduction is performed with the homogeneous feature mapping method; to compute the distance between two features, the KL distance is used for colour histograms, the L2 distance for GIST, and the L1 distance for SIFT and hue features; all distances are finally fused to obtain the final distance;
(2) structure of semantic neighbours
Let the image features X be an n×d matrix and the semantic features Y an n×c matrix; the columns x(1), x(2), ..., x(d) of X form one group of variables and the columns y(1), y(2), ..., y(c) of Y another group; both are first standardised, giving new X, Y; arbitrary linear combinations of the two groups of variables are expressed as:
U = Xa = Σ_{i=1..d} ai·x(i),  V = Yb = Σ_{i=1..c} bi·y(i)   (1)
where ai > 0 is the coefficient of x(i) in the linear combination of x(1), x(2), ..., x(d), the vector formed by all d coefficients being denoted a; bi > 0 is the coefficient of y(i) in the linear combination of y(1), y(2), ..., y(c), the vector formed by all c coefficients being denoted b;
Canonical correlation analysis solves for the values of a and b that maximise the correlation ρ(a, b) between the arbitrary combinations U and V of the two groups of variables, where ρ(a, b) is:
ρ(a, b) = Cov(U, V) / (√D(U)·√D(V)) = E(a'X'Yb) / (√E(a'X'Xa)·√E(b'Y'Yb)) = a'S12b / (a'S11a · b'S22b)^(1/2)   (2)
where Cov(U, V) is the covariance of U and V; E denotes expectation; S11 = E(X'X), S22 = E(Y'Y), S12 = E(X'Y), S21 = S12'; D denotes variance; a, b are the canonical variables; a', b' are the transposes of the vectors a, b; and X', Y' are the transposes of the matrices X, Y;
The final goal of the CCA algorithm is to find a pair of vectors a, b maximising the correlation coefficient ρ between U and V; the denominator of formula (2) is set to 1, i.e.:
a'S11a = b'S22b = 1   (3)
The problem becomes maximising a'S12b under constraint (3); using the Lagrangian method, let:
ψ = a'S12b − (λ/2)(a'S11a − 1) − (μ/2)(b'S22b − 1)   (4)
where λ and μ are Lagrange multipliers; setting the partial derivatives of formula (4) to 0 gives the a, b that maximise ψ, and hence the projection matrices U, V;
Image features and semantic features are mapped into the CCA subspace through U and V respectively, and the similarity between images and labels is then computed with the cosine distance. Using this similarity, the several most relevant images are selected for each label, and combined with the images originally annotated with that label to obtain an image subset per label. Let T_k, k ∈ [1...c], denote the subset of training images retrieved for label l_k; from the label matrix Y = [y^(1), y^(2), ..., y^(c)] the subset X_k of images originally annotated with each label is obtained. The two subsets are merged into the final subset G_k = X_k ∪ T_k, k ∈ [1...c], giving a more complete image subset per label; these subsets are called the semantic neighbours of a test image.
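Building the per-label subsets G_k = X_k ∪ T_k from cosine similarity in the projected subspace can be sketched as follows; all function and variable names and the toy data are illustrative assumptions:

```python
import numpy as np

def label_image_subsets(img_proj, lab_proj, Ymat, n_per_label=5):
    """Sketch: for each label k, take the images most similar to it by
    cosine similarity in the CCA subspace (T_k) and union them with the
    images originally annotated with k (X_k), giving G_k = X_k | T_k.
    img_proj: n x d projected image features, lab_proj: c x d projected
    label vectors, Ymat: n x c binary annotation matrix."""
    a = img_proj / np.linalg.norm(img_proj, axis=1, keepdims=True)
    b = lab_proj / np.linalg.norm(lab_proj, axis=1, keepdims=True)
    sim = a @ b.T                                        # n x c cosine similarities
    subsets = []
    for k in range(lab_proj.shape[0]):
        T_k = set(np.argsort(-sim[:, k])[:n_per_label])  # most similar images
        X_k = set(np.flatnonzero(Ymat[:, k]))            # originally tagged images
        subsets.append(sorted(T_k | X_k))
    return subsets

# toy usage: 6 images, 2 labels, 3-dimensional subspace
rng = np.random.default_rng(1)
img = rng.normal(size=(6, 3))
lab = rng.normal(size=(2, 3))
Ymat = np.array([[1, 0], [0, 1], [1, 1], [0, 0], [1, 0], [0, 1]])
G = label_image_subsets(img, lab, Ymat, n_per_label=2)
```

By construction each G_k always contains every image originally annotated with label k.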
(3) Image annotation with the 2PKNN algorithm
For an unknown image, k semantic neighbours are first selected from the semantic neighbours of each label using the KNN algorithm, and these semantic neighbours determine the final annotation result. The conditional probability P(J|y_i) is assumed to model the feature distribution of an image J given the semantic label; the annotation problem is thus converted into one of computing posterior probabilities, i.e.:
P(y_i), i ∈ [1...c], denotes the prior probability of label y_i. Given an unknown image J, its labels are obtained from the posterior

P(y_i \mid J) \propto P(J \mid y_i)\,P(y_i)    (5)

The several labels with the highest posterior probability are selected as the labels of image J.
From each set of semantic neighbours G_k, the k1 images most similar to the unknown image J are selected with the KNN algorithm, forming a subset T_J = {T_{J,1}, T_{J,2}, ..., T_{J,c}} related to J in which the number of images per label is relatively balanced. Within this subset, the probability of the unknown image J under each label is derived from the visual distances between J and the subset images; that is, the probability of J given a label y_k ∈ Z is:
P(J \mid y_k) = \sum_{(I_i,Y_i)\in T_J} \theta_{J,I_i}\cdot P(y_k \mid I_i) = \sum_{(I_i,Y_i)\in T_J} \exp(-w\,D(J,I_i))\cdot\delta(y_k\in Y_i)    (7)
where θ_{J,I_i} = exp(−w·D(J,I_i)), and D(J,I_i) is the visual distance between the test image and a training image. P(y_k|I_i) = δ(y_k ∈ Y_i) encodes the presence or absence of label y_k in the tag set of image I_i: δ(·) = 1 indicates that label y_k is present in the tag set of image I_i, and δ(·) = 0 that it is absent. w is a bandwidth parameter with range [1, 30]. Because the number of semantic neighbour images per label is relatively balanced, P(y_i) in formula (5) is taken to be uniform; substituting formula (7) into formula (5) then gives the label probabilities of the unknown image J.
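The second 2PKNN pass of equation (7), with the uniform prior of (5), can be sketched as below; the function name, the bandwidth value, and the toy numbers are assumptions for illustration only:

```python
import numpy as np

def two_pass_knn_scores(dists, label_sets, c, w=5.0):
    """Sketch of the second 2PKNN pass (equation (7)): given the visual
    distances dists[i] = D(J, I_i) to the images in the balanced subset
    T_J and their tag sets label_sets[i], score each label y_k by
    sum_i exp(-w * D(J, I_i)) * delta(y_k in Y_i). With the uniform
    prior of (5), the posterior is proportional to these scores."""
    scores = np.zeros(c)
    for d, Yi in zip(dists, label_sets):
        theta = np.exp(-w * d)       # kernel weight of neighbour I_i
        for k in Yi:                 # delta(y_k in Y_i) selects its labels
            scores[k] += theta
    return scores / scores.sum()     # normalise to a distribution over labels

# toy usage: 3 neighbours of J, 4 labels in the vocabulary
dists = [0.1, 0.5, 0.9]
label_sets = [{0, 2}, {0}, {3}]
p = two_pass_knn_scores(dists, label_sets, c=4, w=5.0)
top = int(np.argmax(p))              # most probable label for J
```

Here label 0 wins, since it appears in the two closest neighbours and closeness is weighted exponentially.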
(4) Annotation refinement
First the co-occurrence matrix C of the annotation words in the training set is computed; each element C_ij is the frequency with which labels i and j appear in the same image. The co-occurrence matrix is then normalized:
\mathrm{sim}(x,y) = \frac{C(x,y)}{C(x,x)}    (8)
sim(x, y) describes the similarity between annotation words l_x and l_y; to describe the semantic similarity between words more accurately, the cosine distance formula is used:
\mathrm{Cos}(i,j) = \frac{\vec{a}\cdot\vec{b}}{|\vec{a}|\,|\vec{b}|}    (9)
where \vec{a} and \vec{b} denote the i-th and j-th row vectors of the sim matrix, respectively. Cos is the c × c inter-word correlation matrix, where c is the total number of annotation words. The final refinement formula for the annotation words is:
R=γ * Cos*P+ (1- γ) * P (10)
where P is the image posterior probability distribution matrix obtained in step (3); Cos is the inter-word correlation matrix; γ is a balance coefficient with range 0 ≤ γ ≤ 1; and R is the final image posterior probability, from which the several labels with the highest probability are selected for each image as its labels.
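Steps (8)–(10) can be sketched together in a few lines of NumPy; the function name, the guards against empty labels, and the toy data are illustrative assumptions:

```python
import numpy as np

def refine_scores(P, Ymat, gamma=0.5):
    """Sketch of the refinement step (equations (8)-(10)): build the
    label co-occurrence matrix C from the training annotations Ymat
    (n x c, binary), normalise it as sim(x, y) = C(x, y)/C(x, x),
    take cosine similarities between rows of sim to get the c x c
    correlation matrix Cos, then smooth the posterior matrix P (c x m,
    labels x images) as R = gamma * Cos @ P + (1 - gamma) * P."""
    C = Ymat.T @ Ymat                                 # C[i, j]: images containing both i and j
    simM = C / np.maximum(np.diag(C), 1)[:, None]     # eq. (8), guard against empty labels
    norms = np.linalg.norm(simM, axis=1, keepdims=True)
    norms[norms == 0] = 1
    a = simM / norms
    Cos = a @ a.T                                     # eq. (9): cosine between rows of sim
    return gamma * Cos @ P + (1 - gamma) * P          # eq. (10)

# toy usage: 4 training images, 3 labels; labels 0 and 1 always co-occur
Ymat = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1], [1, 1, 1]])
P = np.array([[0.9, 0.0, 0.1]])                       # raw posterior of one test image
R = refine_scores(P.T, Ymat, gamma=0.5).T             # boosts label 1 via label 0
```

Because labels 0 and 1 always co-occur in the training annotations, the refinement lifts label 1's score well above its raw posterior of 0.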
CN201610144113.8A 2016-03-10 2016-03-10 A kind of automatic image marking method based on CCA and 2PKNN Expired - Fee Related CN105808752B (en)

Publications (2)

Publication Number Publication Date
CN105808752A CN105808752A (en) 2016-07-27
CN105808752B true CN105808752B (en) 2018-04-10
