CN104346456B - Digital-image multi-semantic annotation method based on spatial-dependence measurement - Google Patents
Digital-image multi-semantic annotation method based on spatial-dependence measurement
- Publication number
- CN104346456B CN104346456B CN201410599268.1A CN201410599268A CN104346456B CN 104346456 B CN104346456 B CN 104346456B CN 201410599268 A CN201410599268 A CN 201410599268A CN 104346456 B CN104346456 B CN 104346456B
- Authority
- CN
- China
- Prior art keywords
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/5862—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using texture
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/26—Techniques for post-processing, e.g. correcting the recognition result
- G06V30/262—Techniques for post-processing, e.g. correcting the recognition result using context analysis, e.g. lexical, syntactic or semantic context
- G06V30/274—Syntactic or semantic context, e.g. balancing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Library & Information Science (AREA)
- Computational Linguistics (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Other Investigation Or Analysis Of Materials By Electrical Means (AREA)
- Image Analysis (AREA)
Abstract
The invention belongs to the field of digital-image multi-semantic annotation methods, characterised by the following steps in order: (1) input some digital images with known semantics and all digital images to be annotated into a computer; (2) obtain the feature-vector set of all images by feature extraction; (3) construct the label vectors of the annotated images and the final label vectors of all images; (4) compute the Gram matrix of the feature-vector set; (5) use a spatial-dependence measure to obtain a measurement of the degree of dependence between the feature-vector set and the label-vector set; (6) step the dependence measure up to its maximum in an iterative process, obtaining the confidence that each image to be annotated belongs to each semantic class; (7) set a threshold and decide the semantics of the images to be annotated. The invention has the following advantages: (1) annotation quality is improved by exploiting large numbers of semantically unannotated images; (2) the method applies to the multi-semantic annotation case; (3) it runs fast.
Description
Technical field
The present invention relates to a semi-supervised multi-semantic annotation method for digital images based on spatial-dependence measurement, and belongs to the field of electronic information technology.
Background technology
Image semantic annotation aims to represent the semantic content of an image with semantic keywords; it is of great importance for image analysis, image understanding and image retrieval. Early image annotation required professionals to assign keywords manually according to the semantics of each image, which is time-consuming and subjective. To overcome these drawbacks of manual annotation, researchers have in recent years proposed many methods for automatically annotating image semantics, including generative-model approaches such as the translation model and the cross-media relevance model, and discriminative-model approaches such as asymmetric support vector machines and hierarchical classification. In essence, these methods can be regarded as a machine-learning process: a statistical classification model is learned and constructed on a data set of annotated images, and that model is then used to obtain the semantic classes of the images to be annotated.
Although the many automatic semantic-annotation techniques proposed so far provide a useful basis and premise for analysing and understanding massive image data, the technology still faces many bottleneck problems urgently awaiting solution. Among them, two classes of problems, image multi-semantics and the scarcity of annotated images, have attracted increasing attention from researchers. Image multi-semantics refers to the fact that a single image usually carries several different semantics: in landscape photography, one image may simultaneously contain themes such as "sky", "white clouds" and "grassland"; in medical imaging, one image may simultaneously contain information related to diseases such as "tumour" and "calculus". Conventional machine-learning methods, including nearest neighbours, decision trees, neural networks and support vector machines, are single-label learning methods and cannot be used directly for image semantic annotation in the multi-semantic case. The situation in which a single sample belongs to many classes is known in the machine-learning field as multi-label learning. At present, solutions to the multi-label learning problem include Binary Relevance, Classifier Chains, MLKNN and Rank-SVM. These methods are obtained from single-label methods by problem transformation or algorithm adaptation, and each has its strengths and weaknesses in practice.
Besides the multi-semantics problem, automatic semantic annotation also suffers from the scarcity of annotated images. The main cause of this problem is that acquiring annotated images generally consumes a great deal of manpower and material resources. In the multi-semantic case in particular, the number of annotated images per class decreases relatively as the number of semantic classes grows, so the problem becomes especially acute. Scarcity of annotated samples degrades the generalisation ability of the classification model and thereby the accuracy of semantic annotation. An effective way to solve this problem is to develop semi-supervised semantic-annotation methods. Although semi-supervised learning has developed considerably, with a variety of methods proposed including TSVM and graph-based semi-supervised learning, semi-supervised learning methods applicable to the multi-semantic (multi-label) learning problem are still rare.
Addressing the two problems above, the invention discloses a semi-supervised multi-semantic annotation method for digital images based on spatial-dependence measurement. Its theoretical foundation is the spatial-dependence measure: the dependence between the feature set and the semantic-class set is estimated using all samples, annotated and unannotated, with the annotated image samples serving as boundary constraints; the estimate is then stepped up to its maximum by an iterative technique, yielding all semantic classes of each image to be annotated. The invention achieves good technical effects. First, because the dependence measure is grounded in statistical theory, the accuracy of its estimation can be improved by increasing the number of samples, including unannotated samples; the invention is therefore a semi-supervised annotation method that uses unannotated images to improve annotation accuracy. Second, no matter how many semantic classes an image carries simultaneously, the invention treats the image's combination of semantics as a single point in the semantic set and maps it into a reproducing-kernel Hilbert space, so it is also a multi-semantic image-annotation method. Finally, the invention completes image annotation by iteration on the basis of the feasible-direction method, achieving computing speed comparable to the prior art.
The content of the invention
The object of the invention is to provide an accurate and efficient semi-supervised multi-semantic annotation method for digital images.
The technical scheme is as follows: receive some digital images with known semantics and all digital images to be annotated, extract image features to obtain a feature-vector set, construct the label vectors of the annotated images and the final label-vector set of all images, compute the Gram matrix of the feature-vector set, and obtain from the spatial-dependence measure the confidence that each image belongs to each semantic class together with the final semantics of each image. The method comprises the following steps:
Step 1: input some digital images with known semantics and all digital images requiring semantic annotation into a computer; convert all images to RGB format and normalise their sizes;
Step 2: extract the global texture features of the images with the Gist descriptor, converting all the digital images above into vectors, one column vector per image, and assemble these feature vectors into the set X = [x_1, x_2, ..., x_v, x_{v+1}, ..., x_{v+u}], where x_i (1 ≤ i ≤ v) corresponds to an annotated image and the remaining columns correspond to images to be annotated;
Step 3: let m be the total number of possible semantic classes; construct the initial label vector y_i^0 of each annotated image x_i (1 ≤ i ≤ v), an m-dimensional vector whose entries indicate which semantic classes x_i possesses. Let the m-dimensional vector y_i represent the final label vector of image x_i (1 ≤ i ≤ u+v), and construct the label-vector set Y = [y_1, y_2, ..., y_v, y_{v+1}, ..., y_{v+u}];
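As an illustration of step 3, the label vectors can be assembled into a matrix with one m-dimensional column per image. The sketch below is a non-authoritative aid: it assumes a binary 0/1 encoding of the initial label vectors and a hypothetical input format (a list of class-index lists for the v annotated images), neither of which is fixed by the text above.

```python
import numpy as np

def build_label_matrix(labels, m, u):
    """Assemble Y = [Y_V, Y_U], one m-dimensional column per image.

    labels : list of class-index lists for the v annotated images
             (hypothetical input format).
    m      : total number of semantic classes.
    u      : number of images to be annotated; their columns start at 0.
    """
    v = len(labels)
    Y = np.zeros((m, v + u))
    for i, classes in enumerate(labels):
        Y[classes, i] = 1.0  # assumed binary 0/1 encoding
    return Y

# Two annotated images (classes {0, 2} and {1}) plus two to annotate.
Y = build_label_matrix([[0, 2], [1]], m=3, u=2)
```

The zero columns for the images to be annotated are the part Y_U that step 6 later fills with confidence values.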
Step 4: select a kernel function k(x_i, x_j) on the feature-vector set X and compute the Gram matrix of X through the kernel function, denoted K;
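Step 4 can be sketched with the radial-basis kernel, one of the kernels the specification allows; `gamma` is a hypothetical bandwidth parameter, since the text names the kernel family without fixing its parameters.

```python
import numpy as np

def rbf_gram(X, gamma=1.0):
    """Gram matrix K with K[i, j] = exp(-gamma * ||x_i - x_j||^2).

    X holds one column vector per image, as in step 2.
    gamma is an assumed bandwidth parameter, not specified by the patent.
    """
    sq = np.sum(X ** 2, axis=0)                      # ||x_i||^2 per column
    d2 = sq[:, None] + sq[None, :] - 2.0 * X.T @ X   # pairwise squared distances
    return np.exp(-gamma * np.maximum(d2, 0.0))      # clamp tiny negatives

# Three images with 2-dimensional feature vectors (columns).
X = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
K = rbf_gram(X)
```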
Step 5: use the spatial-dependence measure to obtain the measurement of the degree of dependence between the feature-vector set and the label-vector set as follows:
Q(Y) = Tr[Y H K H Yᵀ] / Tr[Y H Yᵀ]
where Tr[·] denotes the trace, H = I − (1/n)eeᵀ, I is the identity matrix, e is the n-dimensional vector whose elements are all 1, and n = v + u is the total number of image samples;
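A minimal sketch of the step-5 measure, transcribing the ratio Q(Y) = Tr[YHKHYᵀ]/Tr[YHYᵀ] that appears in the claims, assuming the usual centring matrix H = I − (1/n)eeᵀ; the toy K and Y are illustrative only.

```python
import numpy as np

def dependence_measure(K, Y):
    """Spatial-dependence ratio Q(Y) = Tr[Y H K H Y^T] / Tr[Y H Y^T].

    K : n x n Gram matrix of the feature vectors (step 4).
    Y : m x n label matrix (step 3).
    H is the centring matrix, assumed to be I - (1/n) e e^T.
    """
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    num = np.trace(Y @ H @ K @ H @ Y.T)
    den = np.trace(Y @ H @ Y.T)
    return num / den

K = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
Y = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
q = dependence_measure(K, Y)
```

Both traces are invariant to the constant component of the labels, which is what the centring by H removes.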
Step 6: ensuring that Y satisfies the conditions that Y_V remains equal to the known labels and ||Y_U||_F² ≤ τ, update the value of Y_U by an iterative technique, stepping Q(Y) up to its maximum, so as to obtain the confidence values Y_U with which the images to be annotated belong to each semantic class. Here Y = [Y_V, Y_U], where Y_V and Y_U are the first v and the last u columns of Y respectively, corresponding to the semantically known and unknown parts; ||·||_F is the Frobenius norm, and τ > 0 is a small constant given in advance, used to prevent an oversized Y_U from diminishing the contribution of Y_V to the dependence degree;
Step 7: for each image x_j to be annotated (v+1 ≤ j ≤ v+u), set the confidence threshold ε_j of that image to the average of all its semantic confidence values, i.e.:
ε_j = (1/m) Σ_i Y_U(i, j)
For any image x_j to be annotated (v+1 ≤ j ≤ v+u) and any given semantic class i (1 ≤ i ≤ m), if Y_U(i, j) > ε_j, the sample is judged to have the i-th semantic class; otherwise it is judged not to have the i-th semantic class.
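The per-image mean threshold of step 7 is a one-liner over the confidence matrix; the sketch below uses a toy 3-class, 2-image Y_U.

```python
import numpy as np

def label_by_mean_threshold(Y_U):
    """Step 7: image j receives class i iff Y_U[i, j] > eps_j,
    where eps_j is the mean of column j's confidence values."""
    eps = Y_U.mean(axis=0)            # one threshold per image to annotate
    return Y_U > eps[None, :]         # boolean m x u label matrix

Y_U = np.array([[0.9, 0.2],
                [0.5, 0.3],
                [0.1, 0.4]])
labels = label_by_mean_threshold(Y_U)
```

Because the threshold is a strict inequality against the column mean, at least one class is always rejected and at least one accepted, so every image receives a non-trivial subset of the m semantics.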
The kernel function in step 4 includes the radial-basis kernel, the linear kernel, the polynomial kernel and the sigmoid kernel.
The specific steps of step 6 include:
Step 6.1: write A = HKH and partition A and H into four blocks according to the annotated and to-be-annotated parts:
A = [ A_V  A_VU ; A_UV  A_U ],   H = [ H_V  H_VU ; H_UV  H_U ]
where A_V and H_V correspond to the annotated part of the images, and A_U and H_U to the part to be annotated, with A_UV = A_VUᵀ and H_UV = H_VUᵀ. Let f(Y_U) = Tr[Y A Yᵀ] and g(Y_U) = Tr[Y H Yᵀ], so that the trace ratio Q(Y) is converted into the function f(Y_U)/g(Y_U) of Y_U.
Step 6.2: set a threshold κ > 0, a very small number; randomly initialise Y_U^0 so that ||Y_U^0||_F² ≤ τ; let λ_b = f(Y_U^0)/g(Y_U^0).
Step 6.3: let F(Y_U) = f(Y_U) − λ_b g(Y_U) and solve for a new Y_U.
Step 6.4: let λ_a = λ_b and λ_b = f(Y_U)/g(Y_U).
Step 6.5: when λ_b − λ_a < κ, output Y_U; in each column Y_U(:, j) (j = 1, ..., u), the i-th number Y_U(i, j) represents the confidence that the j-th sample belongs to the i-th class. Otherwise jump to step 6.3, continuing to cycle through steps 6.3 to 6.5.
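Steps 6.2 to 6.5 have the shape of a Dinkelbach-style fractional-programming loop: fix λ, maximise f − λg, reset λ to the new ratio, and stop when λ grows by less than κ. The generic sketch below substitutes a toy one-dimensional grid search for the feasible-direction subproblem of step 6.3, purely to show the control flow; it is not the patent's solver.

```python
def dinkelbach_maximize_ratio(f, g, solve_parametric, y0, kappa=1e-3, max_iter=100):
    """Maximise f(y)/g(y) by repeatedly maximising F(y) = f(y) - lam*g(y)
    and updating lam to the new ratio, mirroring steps 6.2-6.5.

    solve_parametric(lam, y) returns a maximiser of f - lam*g; here it
    stands in for the feasible-direction subproblem of step 6.3.
    """
    y = y0
    lam_b = f(y) / g(y)                  # step 6.2
    for _ in range(max_iter):
        lam_a = lam_b
        y = solve_parametric(lam_b, y)   # step 6.3
        lam_b = f(y) / g(y)              # step 6.4
        if lam_b - lam_a < kappa:        # step 6.5: ratio has stopped growing
            break
    return y, lam_b

# Toy 1-D ratio (x^2 + 1)/(x + 2) on [0, 3]; a grid search plays the subproblem.
f = lambda x: x * x + 1.0
g = lambda x: x + 2.0
grid = [i * 0.01 for i in range(301)]
solve = lambda lam, _y: max(grid, key=lambda x: f(x) - lam * g(x))
x_star, ratio = dinkelbach_maximize_ratio(f, g, solve, y0=0.0)
```

On this toy problem the ratio is maximised at the right endpoint x = 3, where it equals 2, and the loop reaches it in two passes.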
The specific steps of step 6.3 include:
Step 6.3.1: specify a threshold δ > 0, a very small number, and let
M = A_U − λ_b H_U
N = 2 Y_V (A_VU − λ_b H_VU)
Construct F(Y_U) = Tr[Y_U M Y_Uᵀ] + Tr[N Y_Uᵀ]; the equivalent optimisation problem to solve is:
max_{Y_U} F(Y_U) = Tr[Y_U M Y_Uᵀ] + Tr[N Y_Uᵀ]   s.t.  ||Y_U||_F² ≤ τ
Step 6.3.2: let Y_U* = −(1/2) N M⁻¹ be the stationary point of F; by the KKT conditions, if ||Y_U*||_F² ≤ τ and Y_U* maximises F, then Y_U* is the optimal solution and is output as the new Y_U; otherwise the optimal solution lies on the boundary, and the method proceeds to the next step;
Step 6.3.3: initialise a point Y_U^c as the new starting point: if ||Y_U^0||_F² = τ, let Y_U^c = Y_U^0; otherwise randomly initialise Y_U^c so that it satisfies ||Y_U^c||_F² = τ;
Step 6.3.4: initialise w_2 as a constant and let w_1 = −w_2/2, where w_2 is used to represent the Frobenius norm of the next feasible direction;
Step 6.3.5: compute the feasible direction d at the current point Y_U^c; the direction d must keep the Frobenius norm of the next iterate consistent with that of Y_U^c while increasing the objective value fastest in that direction, and a feasible direction d meeting these two conditions can be computed as:
d = ( [∇F(Y_U^c)]ᵀ − ξ Y_U ) / η
where
∇F(Y_U^c) = 2 M (Y_U^c)ᵀ + Nᵀ
ξ = ( Tr[∇F(Y_U^c) Y_U^c] − w_1 η ) / Tr[Y_U^c (Y_U^c)ᵀ]
η = sqrt( ( ||Y_U^c||_F² ||∇F(Y_U^c)||_F² − (Tr[∇F(Y_U^c) Y_U^c])² ) / ( w_2 ||Y_U^c||_F² − w_1² ) )
Step 6.3.6: let w_1 = α w_1 and w_2 = α w_2, where α < 1 is a given positive constant;
Step 6.3.7: when F(Y_U^c + d) > F(Y_U^c), let Y_U^c = Y_U^c + d; otherwise jump to step 6.3.5, continuing to cycle through steps 6.3.5 to 6.3.7;
Step 6.3.8: when the increase of F(Y_U^c) is smaller than δ, output Y_U^c as the new Y_U; otherwise jump to step 6.3.4, continuing to cycle through steps 6.3.4 to 6.3.8.
The basic principle of the invention is that a strong dependence exists between the feature space and the semantic space of images; on the basis of a quantitative estimate of this dependence, and with the semantic classes of the annotated images as constraints, the estimate is stepped up to its maximum by an iterative technique, thereby yielding all semantic classes of the images to be annotated.
Compared with the prior art, the invention has the following clear advantages and beneficial effects:
First, because it adopts spatial dependence as its theoretical foundation, it is a new technique for solving the image multi-semantic annotation problem. Second, it is a semi-supervised annotation method: it can learn from large numbers of cheaply obtained unannotated images, and therefore often achieves higher annotation accuracy than the prior art, with especially clear gains when annotated images are scarce. Finally, the invention completes image annotation by an iterative technique based on the feasible-direction method, achieving computing speed comparable to the prior art.
Brief description of the drawings
Fig. 1 is the structural block diagram of the embodiment of the invention.
Fig. 2 is the flow chart by which the embodiment of the invention obtains the confidence value of each sample through iteration.
Fig. 3 is the flow chart by which the embodiment of the invention solves the sub-optimisation problem within the iterative process.
Fig. 4 is the ROC-curve comparison chart for the embodiment of the invention.
Embodiment
An embodiment of the invention is deployed according to Fig. 1 and comprises the following concrete steps:
Step 1: input 200 digital images with known semantics and the remaining 1800 digital images requiring semantic annotation into a computer, covering the 5 classes desert, mountain, sea, sunset and trees; convert all images to RGB format and normalise all images to 512 × 512. All the images here come from the image database published by the machine learning and data mining research institute of Nanjing University and can be downloaded from http://lamda.nju.edu.cn/data_MIMLimage.ashx;
Step 2: extract the global texture features of the images with the Gist descriptor: convert each image to grayscale, apply Gabor filtering at 4 scales and 8 orientations, and partition each filtered image into 4 × 4 blocks, obtaining a 512-dimensional Gist feature column vector for each image. Assemble these feature vectors into the set X = [x_1, x_2, ..., x_v, x_{v+1}, ..., x_{v+u}], where v = 200, u = 1800, x_i (1 ≤ i ≤ v) corresponds to an annotated image and the remaining columns correspond to images to be annotated;
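A simplified, non-authoritative sketch of such a Gist-style descriptor follows; the log-Gabor-like transfer functions, centre frequencies and bandwidths are assumptions, but the 4 scales × 8 orientations × 4 × 4 blocks layout reproduces the 512-dimensional output described above.

```python
import numpy as np

def gist_descriptor(img, n_scales=4, n_orients=8, n_blocks=4):
    """Gist-style descriptor sketch: filter a grayscale image with a bank
    of Gabor-like filters in the frequency domain, then average each
    response magnitude over an n_blocks x n_blocks grid.
    Defaults give 4 * 8 * 16 = 512 numbers, matching step 2.
    Filter parameters below are hypothetical, not the patent's."""
    h, w = img.shape
    fy, fx = np.meshgrid(np.fft.fftfreq(h), np.fft.fftfreq(w), indexing="ij")
    rad = np.hypot(fx, fy)            # radial frequency
    ang = np.arctan2(fy, fx)          # orientation of each frequency bin
    F = np.fft.fft2(img)
    feats = []
    for s in range(n_scales):
        f0 = 0.25 / (2 ** s)          # assumed centre frequency per scale
        for o in range(n_orients):
            th = np.pi * o / n_orients
            # Radial and angular Gaussian windows (assumed bandwidths).
            gauss = np.exp(-((rad - f0) ** 2) / (2 * (0.5 * f0) ** 2))
            diff = (ang - th + np.pi) % (2 * np.pi) - np.pi
            orient = np.exp(-(diff ** 2) / 0.5)
            resp = np.abs(np.fft.ifft2(F * gauss * orient))
            # Average the response over the block grid.
            for rows in np.array_split(resp, n_blocks, axis=0):
                for blk in np.array_split(rows, n_blocks, axis=1):
                    feats.append(blk.mean())
    return np.asarray(feats)

vec = gist_descriptor(np.random.default_rng(0).random((32, 32)))
```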
Step 3: let m = 5 be the total number of semantic classes; construct the initial label vector y_i^0 of each annotated image x_i (1 ≤ i ≤ v), an m-dimensional vector whose entries indicate which semantic classes x_i possesses. Let the m-dimensional vector y_i represent the final label vector of image x_i (1 ≤ i ≤ u+v), and construct the label-vector set Y = [y_1, y_2, ..., y_v, y_{v+1}, ..., y_{v+u}];
Step 4: select the kernel function k(x_i, x_j) on the feature-vector set X to be the radial-basis kernel, and compute the Gram matrix of X through this kernel function, denoted K;
Step 5: use the spatial-dependence measure to obtain the measurement of the degree of dependence between the feature-vector set and the label-vector set as follows:
Q(Y) = Tr[Y H K H Yᵀ] / Tr[Y H Yᵀ]
where Tr[·] denotes the trace, H = I − (1/n)eeᵀ, I is the identity matrix, e is the n-dimensional vector whose elements are all 1, and n = v + u is the total number of image samples;
Step 6: ensuring that Y satisfies the conditions that Y_V remains equal to the known labels and ||Y_U||_F² ≤ τ, update the value of Y_U by an iterative technique, stepping Q(Y) up to its maximum, so as to obtain the confidence values Y_U with which the images to be annotated belong to each semantic class. Here Y = [Y_V, Y_U], where Y_V and Y_U are the first v and the last u columns of Y respectively, corresponding to the semantically known and unknown parts; ||·||_F is the Frobenius norm, and τ is set in advance to 0.1, to prevent an oversized Y_U from diminishing the contribution of Y_V to the dependence degree. Fig. 2 is the flow chart of step 6, which comprises the following steps:
Step 6.1: write A = HKH and partition A and H into four blocks according to the annotated and to-be-annotated parts, where A_V and H_V correspond to the annotated part of the images and A_U and H_U to the part to be annotated; let f(Y_U) = Tr[Y A Yᵀ] and g(Y_U) = Tr[Y H Yᵀ], so that the trace ratio Q(Y) is converted into the function f(Y_U)/g(Y_U) of Y_U.
Step 6.2: set the threshold κ = 0.001; randomly initialise Y_U^0 so that ||Y_U^0||_F² ≤ τ; let λ_b = f(Y_U^0)/g(Y_U^0).
Step 6.3: let F(Y_U) = f(Y_U) − λ_b g(Y_U) and solve the optimisation subproblem; the flow chart is shown in Fig. 3, and the concrete steps are as follows:
Step 6.3.1: specify the threshold δ = 0.001, a very small number, and let
M = A_U − λ_b H_U
N = 2 Y_V (A_VU − λ_b H_VU)
Construct F(Y_U) = Tr[Y_U M Y_Uᵀ] + Tr[N Y_Uᵀ]; the equivalent problem to solve is to maximise F(Y_U) subject to ||Y_U||_F² ≤ τ;
Step 6.3.2: let Y_U* = −(1/2) N M⁻¹ be the stationary point of F; if ||Y_U*||_F² ≤ τ and Y_U* maximises F, then Y_U* is the optimal solution and is output as the new Y_U; otherwise the optimal solution lies on the boundary, and the method proceeds to the next step;
Step 6.3.3: initialise a point Y_U^c as the new starting point: if ||Y_U^0||_F² = τ, let Y_U^c = Y_U^0; otherwise randomly initialise Y_U^c so that it satisfies ||Y_U^c||_F² = τ;
Step 6.3.4: initialise the constant w_2 = 1 and let w_1 = −w_2/2, where w_2 is used to represent the Frobenius norm of the next feasible direction;
Step 6.3.5: compute the feasible direction d at the current point Y_U^c, namely d = ( [∇F(Y_U^c)]ᵀ − ξ Y_U ) / η with ∇F(Y_U^c) = 2 M (Y_U^c)ᵀ + Nᵀ;
Step 6.3.6: let w_1 = α w_1 and w_2 = α w_2, with the given constant α = 0.5;
Step 6.3.7: if F(Y_U^c + d) > F(Y_U^c), let Y_U^c = Y_U^c + d; otherwise jump to step 6.3.5, continuing to cycle through steps 6.3.5 to 6.3.7;
Step 6.3.8: when the increase of F(Y_U^c) is smaller than δ, output Y_U^c as the new Y_U; otherwise jump to step 6.3.4, continuing to cycle through steps 6.3.4 to 6.3.8;
Step 6.4: let λ_a = λ_b and λ_b = f(Y_U)/g(Y_U).
Step 6.5: when λ_b − λ_a < κ, output Y_U; in each column Y_U(:, j) (j = 1, ..., u), the i-th number Y_U(i, j) represents the confidence that the j-th sample belongs to the i-th class. Otherwise jump to step 6.3, continuing to cycle through steps 6.3 to 6.5;
Step 7: for each image x_j to be annotated (v+1 ≤ j ≤ v+u), set the confidence threshold ε_j of that image to the average of all its semantic confidence values, i.e. ε_j = (1/m) Σ_i Y_U(i, j). For any image x_j to be annotated (v+1 ≤ j ≤ v+u) and any given semantic class i (1 ≤ i ≤ m), if Y_U(i, j) > ε_j, the sample is judged to have the i-th semantic class; otherwise it is judged not to have the i-th semantic class.
The ROC curves (receiver operating characteristic curves) of the embodiment of the invention and of two classic annotation methods, MLKNN (Zhang M. L. et al., "A k-nearest neighbor based algorithm for multi-label classification") and Binary Relevance (Boutell M. R. et al., "Learning multi-label scene classification"), all with only 200 semantically annotated images, are shown in Fig. 4. In Fig. 4, the embodiment of the invention achieves the best AUC (area under the ROC curve) in all five categories, desert, mountain, sea, sunset and trees, which fully demonstrates that the invention has a good multi-semantic annotation effect.
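The AUC figure of merit used in this comparison can be computed without plotting the curve, via the rank-sum identity: AUC equals the probability that a randomly chosen positive sample outscores a randomly chosen negative one. A small sketch with toy scores:

```python
import numpy as np

def roc_auc(scores, truth):
    """Area under the ROC curve via the rank-sum identity:
    the fraction of (positive, negative) pairs in which the
    positive outscores the negative, with ties counted as half."""
    scores = np.asarray(scores, dtype=float)
    truth = np.asarray(truth, dtype=bool)
    pos, neg = scores[truth], scores[~truth]
    wins = (pos[:, None] > neg[None, :]).sum() \
        + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

# Perfectly ranked toy scores for one semantic class.
auc = roc_auc([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 0])
```

Applied per class to one row of Y_U against the ground-truth labels, this yields the per-category AUC values compared in Fig. 4.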
Finally, it should be noted that the above embodiment merely illustrates, and does not limit, the technical scheme described in the invention. Therefore, although this specification has described the invention in detail with reference to the above embodiment, those skilled in the art should appreciate that the invention may still be modified or equivalently substituted; all technical schemes and improvements that do not depart from the spirit and scope of the invention shall be covered by the scope of the claims of the invention.
Claims (3)
1. A digital-image multi-semantic annotation method based on spatial-dependence measurement, characterised by comprising the following steps in order:
Step 1: input some digital images with known semantics and all digital images requiring semantic annotation into a computer; convert all images to RGB format and normalise their sizes;
Step 2: extract the global texture features of the images with the Gist descriptor, converting all the digital images above into vectors, one column vector per image, and assemble these feature vectors into the set X = [x_1, x_2, ..., x_v, x_{v+1}, ..., x_{v+u}], where x_i (1 ≤ i ≤ v) corresponds to an annotated image and the remaining columns correspond to images to be annotated;
Step 3: let m be the total number of possible semantic classes; construct the initial label vector y_i^0 of each annotated image x_i (1 ≤ i ≤ v), an m-dimensional vector whose entries indicate which semantic classes x_i possesses; let the m-dimensional vector y_i represent the final label vector of image x_i (1 ≤ i ≤ u+v), and construct the label-vector set Y = [y_1, y_2, ..., y_v, y_{v+1}, ..., y_{v+u}];
Step 4: select a kernel function k(x_i, x_j) on the feature-vector set X and compute the Gram matrix of X through the kernel function, denoted K;
Step 5: use the spatial-dependence measure to obtain the measurement of the degree of dependence between the feature-vector set and the label-vector set as follows:
Q(Y) = Tr[Y H K H Yᵀ] / Tr[Y H Yᵀ]
where Tr[·] denotes the trace, H = I − (1/n)eeᵀ, I is the identity matrix, e is the n-dimensional vector whose elements are all 1, and n = v + u is the total number of image samples;
Step 6: ensuring that Y satisfies the conditions that Y_V remains equal to the known labels and ||Y_U||_F² ≤ τ, update the value of Y_U by an iterative technique, stepping Q(Y) up to its maximum, so as to obtain the confidence values Y_U with which the images to be annotated belong to each semantic class; here Y = [Y_V, Y_U], where Y_V and Y_U are the first v and the last u columns of Y respectively, corresponding to the semantically known and unknown parts; ||·||_F is the Frobenius norm, and τ > 0 is a constant given in advance, used to prevent an oversized Y_U from diminishing the contribution of Y_V to the dependence degree; the specific steps include:
Step 6.1: write A = HKH and partition A and H into four blocks according to the annotated and to-be-annotated parts:
A = [ A_V  A_VU ; A_UV  A_U ],   H = [ H_V  H_VU ; H_UV  H_U ]
where A_V and H_V correspond to the annotated part of the images, and A_U and H_U to the part to be annotated, with A_UV = A_VUᵀ and H_UV = H_VUᵀ; let
f(Y_U) = Tr[Y A Yᵀ] = Tr[ Y_V A_V Y_Vᵀ + 2 Y_V A_VU Y_Uᵀ + Y_U A_U Y_Uᵀ ]
g(Y_U) = Tr[Y H Yᵀ] = Tr[ Y_V H_V Y_Vᵀ + 2 Y_V H_VU Y_Uᵀ + Y_U H_U Y_Uᵀ ]
so that the trace ratio Q(Y) is converted into the function f(Y_U)/g(Y_U) of Y_U;
Step 6.2: set a constant threshold κ > 0; randomly initialise Y_U^0 so that ||Y_U^0||_F² ≤ τ; let λ_b = f(Y_U^0)/g(Y_U^0);
Step 6.3: let F(Y_U) = f(Y_U) − λ_b g(Y_U) and solve for a new Y_U;
Step 6.4: let λ_a = λ_b and λ_b = f(Y_U)/g(Y_U);
Step 6.5: when λ_b − λ_a < κ, output Y_U; in each column Y_U(:, j) (j = 1, ..., u), the i-th number Y_U(i, j) represents the confidence that the j-th sample belongs to the i-th class; otherwise jump to step 6.3, continuing to cycle through steps 6.3 to 6.5;
Step 7: for each image x_j to be annotated (v+1 ≤ j ≤ v+u), set the confidence threshold ε_j of that image to the average of all its semantic confidence values, i.e.:
ε_j = ( Σ_i Y_U(i, j) ) / m
for any image x_j to be annotated (v+1 ≤ j ≤ v+u) and any given semantic class i (1 ≤ i ≤ m), if Y_U(i, j) > ε_j, the sample is judged to have the i-th semantic class; otherwise it is judged not to have the i-th semantic class.
2. The digital-image multi-semantic annotation method based on spatial-dependence measurement according to claim 1, characterised in that the kernel function in step 4 includes the radial-basis kernel, the linear kernel, the polynomial kernel and the sigmoid kernel.
3. The digital-image multi-semantic annotation method based on spatial-dependence measurement according to claim 1, characterised in that the specific steps of step 6.3 include:
Step 6.3.1: specify a constant threshold δ > 0, and let
M = A_U − λ_b H_U
N = 2 Y_V (A_VU − λ_b H_VU)
Construct F(Y_U); the equivalent optimisation problem to solve is as follows:
max_{Y_U} F(Y_U) = Tr[ Y_U M Y_Uᵀ ] + Tr[ N Y_Uᵀ ]   s.t.  ||Y_U||_F² ≤ τ
Step 6.3.2: let Y_U* = −(1/2) N M⁻¹ be the stationary point of F; if ||Y_U*||_F² ≤ τ and Y_U* maximises F, then Y_U* is the optimal solution and is output as the new Y_U; otherwise the optimal solution lies on the boundary, and the method proceeds to the next step;
Step 6.3.3: initialise a point Y_U^c as the new starting point: if ||Y_U^0||_F² = τ, let Y_U^c = Y_U^0; otherwise randomly initialise Y_U^c so that it satisfies ||Y_U^c||_F² = τ;
Step 6.3.4: initialise w_2 as a given constant and let w_1 = −w_2/2, where w_2 is used to represent the Frobenius norm of the next feasible direction;
Step 6.3.5: compute the feasible direction d at the current point Y_U^c as follows:
d = ( [∇F(Y_U^c)]ᵀ − ξ Y_U ) / η
where
∇F(Y_U^c) = 2 M (Y_U^c)ᵀ + Nᵀ
ξ = ( Tr[ ∇F(Y_U^c) Y_U^c ] − w_1 η ) / Tr[ Y_U^c (Y_U^c)ᵀ ]
η = sqrt( ( ||Y_U^c||_F² ||∇F(Y_U^c)||_F² − ( Tr[ ∇F(Y_U^c) Y_U^c ] )² ) / ( w_2 ||Y_U^c||_F² − w_1² ) )
Step 6.3.6: set w1 = αw1 and w2 = αw2, where α < 1 is a given positive constant;
Step 6.3.7: when … holds, let …; otherwise jump back to Step 6.3.5 and repeat Steps 6.3.5 through 6.3.7;
Step 6.3.8: when … holds, output … as the new …; otherwise jump back to Step 6.3.4 and repeat Steps 6.3.4 through 6.3.8.
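The feasible-direction computation of Steps 6.3.4–6.3.5 can be sketched numerically as follows. This is a minimal illustration, not the patented implementation: the function name `feasible_direction` and the matrix shapes of `Y`, `M`, and `N` are assumptions, since the inline symbols are only partially recoverable from the source. It evaluates the gradient, η, ξ, and d from the formulas above.

```python
import numpy as np

def feasible_direction(Y, M, N, w1, w2):
    """Compute the feasible direction d of Step 6.3.5.

    Assumed shapes: Y is k x n, M is n x n, N is k x n, so that
    the gradient 2*M*Y^T + N^T is n x k and d has the shape of Y.
    """
    G = 2.0 * M @ Y.T + N.T                # gradient: 2 M (Y_U^c)^T + N^T
    t = np.trace(G @ Y)                    # Tr[grad(Y) Y]
    a = np.linalg.norm(Y, 'fro') ** 2      # ||Y||_F^2  (equals Tr[Y Y^T])
    g = np.linalg.norm(G, 'fro') ** 2      # ||grad(Y)||_F^2
    # eta from Step 6.3.5: Cauchy-Schwarz guarantees a*g - t^2 >= 0;
    # w2*a - w1^2 must be positive for the square root to exist.
    eta = np.sqrt((a * g - t ** 2) / (w2 * a - w1 ** 2))
    xi = (t - w1 * eta) / a                # (Tr[grad(Y) Y] - w1*eta) / Tr[Y Y^T]
    d = (G.T - xi * Y) / eta               # ([grad(Y)]^T - xi*Y) / eta
    return d
```

With w1 = -w2/2 as in Step 6.3.4, one can check algebraically that this construction yields Tr[d Yᵀ] = w1 and ‖d‖_F² = w2, i.e. the step length is fixed while its component along Y is controlled, which is consistent with w2 governing the Frobenius norm of the next feasible direction.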
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410599268.1A CN104346456B (en) | 2014-10-31 | 2014-10-31 | The digital picture multi-semantic meaning mask method measured based on spatial dependence |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104346456A CN104346456A (en) | 2015-02-11 |
CN104346456B true CN104346456B (en) | 2017-09-08 |
Family
ID=52502047
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410599268.1A Expired - Fee Related CN104346456B (en) | 2014-10-31 | 2014-10-31 | The digital picture multi-semantic meaning mask method measured based on spatial dependence |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104346456B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105701509B (en) * | 2016-01-13 | 2019-03-12 | 清华大学 | A kind of image classification method based on across classification migration Active Learning |
CN107391599B (en) * | 2017-06-30 | 2021-01-12 | 中原智慧城市设计研究院有限公司 | Image retrieval method based on style characteristics |
CN109190060B (en) * | 2018-07-10 | 2021-05-14 | 天津大学 | Service annotation quality optimization method based on effective human-computer interaction |
CN111428733B (en) * | 2020-03-12 | 2023-05-23 | 山东大学 | Zero sample target detection method and system based on semantic feature space conversion |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7814040B1 (en) * | 2006-01-31 | 2010-10-12 | The Research Foundation Of State University Of New York | System and method for image annotation and multi-modal image retrieval using probabilistic semantic models |
CN103336969A (en) * | 2013-05-31 | 2013-10-02 | 中国科学院自动化研究所 | Image meaning parsing method based on soft glance learning |
CN103605667A (en) * | 2013-10-28 | 2014-02-26 | 中国计量学院 | Automatic image annotation algorithm |
CN103955462A (en) * | 2014-03-21 | 2014-07-30 | 南京邮电大学 | Image marking method based on multi-view and semi-supervised learning mechanism |
Non-Patent Citations (2)
Title |
---|
Trace Ratio vs. Ratio Trace for Dimensionality Reduction; Huan Wang et al.; Computer Vision & Pattern Recognition; 2007-07-16; pp. 1-8 * |
Multi-label semi-supervised learning from Hilbert-Schmidt independence; Zhang Chenguang et al.; China Sciencepaper; 2013-10; Vol. 8, No. 10; pp. 998-1002 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105184303B (en) | A kind of image labeling method based on multi-modal deep learning | |
CN101916376B (en) | Local spline embedding-based orthogonal semi-monitoring subspace image classification method | |
Wang et al. | Remote sensing image retrieval by scene semantic matching | |
US20210089827A1 (en) | Feature representation device, feature representation method, and program | |
CN105117429A (en) | Scenario image annotation method based on active learning and multi-label multi-instance learning | |
CN105808752B (en) | A kind of automatic image marking method based on CCA and 2PKNN | |
CN109063112B (en) | Rapid image retrieval method, model and model construction method based on multitask learning deep semantic hash | |
Bui et al. | Scalable sketch-based image retrieval using color gradient features | |
CN111125411B (en) | Large-scale image retrieval method for deep strong correlation hash learning | |
CN104112018B (en) | A kind of large-scale image search method | |
CN107943856A (en) | A kind of file classification method and system based on expansion marker samples | |
EP3166020A1 (en) | Method and apparatus for image classification based on dictionary learning | |
CN101710334A (en) | Large-scale image library retrieving method based on image Hash | |
CN104346456B (en) | The digital picture multi-semantic meaning mask method measured based on spatial dependence | |
CN104834693A (en) | Depth-search-based visual image searching method and system thereof | |
CN114299362A (en) | Small sample image classification method based on k-means clustering | |
CN104036021A (en) | Method for semantically annotating images on basis of hybrid generative and discriminative learning models | |
CN115439715A (en) | Semi-supervised few-sample image classification learning method and system based on anti-label learning | |
CN114510594A (en) | Traditional pattern subgraph retrieval method based on self-attention mechanism | |
Lu et al. | Image categorization via robust pLSA | |
Dimitrovski et al. | Detection of visual concepts and annotation of images using ensembles of trees for hierarchical multi-label classification | |
CN108549915A (en) | Image hash code training pattern algorithm based on two-value weight and classification learning method | |
CN111914108A (en) | Discrete supervision cross-modal Hash retrieval method based on semantic preservation | |
Xiong et al. | A confounder-free fusion network for aerial image scene feature representation | |
CN114219047B (en) | Heterogeneous domain self-adaption method, device and equipment based on pseudo label screening |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20170908 Termination date: 20211031 |