CN103942214A - Natural image classification method and device on basis of multi-modal matrix filling - Google Patents

Natural image classification method and device on basis of multi-modal matrix filling

Info

Publication number
CN103942214A
Authority
CN
China
Prior art keywords
label
natural image
theta
data
image data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310021734.3A
Other languages
Chinese (zh)
Other versions
CN103942214B (en)
Inventor
罗勇
许超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University
Priority to CN201310021734.3A
Publication of CN103942214A
Application granted
Publication of CN103942214B
Expired - Fee Related (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Library & Information Science (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a natural image classification method and device based on multi-modal matrix filling. The method comprises the steps of: carrying out feature extraction on labeled natural image data, unlabeled natural image data, and test natural image data to obtain different feature representations; using a matrix filling algorithm to generate estimated labels of the labeled data for each feature; linearly combining the estimated labels so that they approximate the corresponding known true labels, which yields combination coefficients; for each feature, using the labeled natural image data and the matrix filling algorithm to predict the labels of the unlabeled natural image data and of the test natural image data; using the combination coefficients to combine the labels predicted for all features, which yields labels that fuse multiple features; and classifying the natural image data based on the fused labels. The method and device are easy to implement and achieve high classification accuracy. They also inherit the advantages of image classification based on matrix filling, and are suitable for fields such as summarizing and classifying network pictures and image retrieval.

Description

Natural image classification method and device based on multi-modal matrix completion
Technical field
The invention belongs to the technical field of image classification and multi-modal data analysis (multi-feature fusion). It relates to multi-label classification based on matrix completion (matrix filling), and specifically to an image classification method and device that use multi-modal matrix completion.
Background art
Unlike images such as faces or fingerprints, whose content is homogeneous and whose form is consistent, a natural image usually contains multiple objects, each presenting a different form. Natural image classification therefore often needs to assign several class labels to one image. As shown in Fig. 1, (a) a "person" is riding a "bicycle", (b) "sky" and "ocean" often appear together, and (c) a "dog" is a kind of "animal". Most traditional single-label classification algorithms (in which a sample has only one class label) cannot be applied directly to multi-label classification. A more feasible approach is the "one-vs-rest" strategy from multi-class classification: a binary classifier is built for each class, samples belonging to that class are treated as positive examples, and all remaining samples are treated as negative examples. An obvious defect of this approach is that it easily causes a severe class imbalance problem; it also ignores the correlations between classes (for example, the co-occurrence relation between "sky" and "ocean", and the subordinate relation between "dog" and "animal"). Many new algorithms have therefore been proposed in recent years to solve the multi-label problem. Among them, multi-label classification algorithms based on matrix completion allow the input data (features and labels) to be partially missing and are highly robust to noise and outliers.
Matrix completion, as the name suggests, fills in a matrix M that has missing values. Without any assumption or prior knowledge about the matrix, the missing entries cannot be filled in. The matrix to be recovered is therefore usually assumed to be low-rank (E. Candes and B. Recht, Exact matrix completion via convex optimization, Found. Comput. Math., 9:717-772, 2009). The goal of matrix completion is to find a matrix X whose error against M on the known entries is as small as possible while the rank of X is as low as possible. This rank minimization problem is NP-hard and therefore of little practical use. Fortunately, rank(X) can be replaced by its convex envelope, the nuclear norm $\|X\|_*$ (M. Fazel, Matrix rank minimization with applications, Ph.D. thesis, Stanford University, 2002). Many matrix completion algorithms have been developed on this basis. For example, Candes and Recht (E. Candes and B. Recht, Exact matrix completion via convex optimization, Found. Comput. Math., 9:717-772, 2009) showed that, under suitable conditions, minimizing the nuclear norm $\|X\|_*$ has the same unique solution as minimizing rank(X), and proved that only a limited number of samples are needed to recover a matrix. The authors of that work also proposed a semidefinite programming algorithm for the nuclear norm minimization problem. To handle large matrices and matrices whose rank is not very low, researchers proposed the singular value thresholding (SVT) algorithm (J. Cai, E. Candes and Z. Shen, A singular value thresholding algorithm for matrix completion, SIAM, 20(4):1956-1982, 2010) and the fixed point continuation algorithm (S. Ma, D. Goldfarb and L. Chen, Fixed point and Bregman iterative methods for matrix rank minimization, Math. Program., 128(1):321-353, 2009), respectively. Recently, matrix completion has been introduced into transductive learning (A. Goldberg, X. Zhu, B. Recht, J. Xu and R. Nowak, Transduction with matrix completion: three birds with one stone, NIPS, pp. 757-765, 2010) and multi-label image classification (R. Cabral, F. Torre, J. Costeira and A. Bernardino, Matrix completion for multi-label image classification, NIPS, pp. 190-198, 2011). The basic idea is to concatenate the sample feature matrix and the sample label matrix, and then estimate the unknown feature and label values with a matrix completion algorithm.
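As a rough illustration of nuclear-norm-based completion, the sketch below repeatedly fills the unknown entries with the current estimate and then soft-thresholds the singular values. This is a soft-impute-style iteration built on the singular value shrinkage operator; it is not the exact SVT algorithm of Cai et al. nor the solver used in the patent, and the names `svt_shrink`, `complete_matrix`, `tau`, and `n_iter` are assumptions made for this example.

```python
import numpy as np

def svt_shrink(M, tau):
    """Soft-threshold the singular values of M by tau.

    This is the proximal operator of tau * (nuclear norm).
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def complete_matrix(M_obs, mask, tau=1.0, n_iter=300):
    """Estimate a low-rank matrix from the entries of M_obs where mask is True.

    Illustrative soft-impute-style loop (an assumption, not the patent's solver).
    """
    X = np.zeros_like(M_obs, dtype=float)
    for _ in range(n_iter):
        # Keep the observed entries, use the current estimate elsewhere,
        # then shrink the singular values to favour a low-rank result.
        filled = np.where(mask, M_obs, X)
        X = svt_shrink(filled, tau)
    return X
```

A larger `tau` pushes the estimate toward lower rank at the cost of more shrinkage bias on the recovered values.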
Image classification algorithms of this kind, based on matrix completion, can only process data with a single kind of feature. In practice, no single feature so far describes all the classes of natural images well. Multiple features are therefore usually required, such as SIFT (D. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis., 60(2):91-110, 2004), GIST (A. Torralba, K. Murphy and W. Freeman, Modeling the shape of the scene: A holistic representation of the spatial envelope, Int. J. Comput. Vis., 42(3):145-175, 2001), and RGB. The most direct way to fuse multiple features is to concatenate them into one long vector. This approach greatly reduces computational efficiency, can cause the curse of dimensionality, and lacks a physical interpretation, all of which hurt classification accuracy.
Summary of the invention
The object of the invention is to address the above problems by proposing an image classification method and device based on multi-modal matrix completion. A matrix completion algorithm with multi-feature fusion exploits the complementarity among the different features to achieve efficient, fast multi-label image classification.
Classification algorithms with multi-feature fusion fall broadly into three kinds: feature-level fusion (M. White, Y. Yu, X. Zhang and D. Schuurmans, Convex multi-view subspace learning, NIPS, pp. 1682-1690, 2012), interactive fusion (A. Blum and T. Mitchell, Combining labeled and unlabeled data with co-training, COLT, pp. 92-100, 1998), and classifier-level fusion (C. Snoek, M. Worring and A. Smeulders, Early versus late fusion in semantic video analysis, Multimedia, pp. 399-402, 2005). The invention adopts the classifier-level fusion strategy.
Specifically, the image classification method based on multi-modal matrix completion of the invention comprises the following steps:
1) performing feature extraction on the labeled, unlabeled, and test natural image data to obtain different feature representations;
2) using a matrix completion algorithm to generate estimated labels of the labeled data for each feature;
3) linearly combining the estimated labels so that they approximate the corresponding known true labels, thereby obtaining combination coefficients;
4) for each feature, using the labeled natural image data and the matrix completion algorithm to predict the labels of the unlabeled and test natural image data;
5) using the combination coefficients to combine the labels predicted for all features in step 4), obtaining labels that fuse multiple features;
6) classifying the natural image data based on the multi-feature fused labels.
Further, feature extraction algorithms such as SIFT and GIST are used for the feature extraction.
Further, each feature representation obtained in step 1) is preprocessed before step 2) is carried out. Kernel principal component analysis (KPCA) is preferred for the preprocessing; other methods such as random projection can also be used. Preprocessing is not a required step in the invention, but it greatly improves the execution efficiency of the algorithm and can also improve classification accuracy to some extent.
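The KPCA preprocessing mentioned above can be sketched as follows. The source does not state which kernel or parameters are used, so the RBF kernel, the `gamma` value, and the function name `rbf_kernel_pca` are assumptions for illustration only. Note that this sketch takes samples as rows (n x d), whereas the matrices in the description store samples as columns.

```python
import numpy as np

def rbf_kernel_pca(X, n_components, gamma=1.0):
    """Project the rows of X (n samples x d dims) onto the top kernel principal components.

    Assumed RBF kernel: k(a, b) = exp(-gamma * ||a - b||^2).
    """
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    Kc = J @ K @ J                        # kernel matrix centred in feature space
    vals, vecs = np.linalg.eigh(Kc)       # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]
    vals = np.maximum(vals[idx], 0.0)     # guard against tiny negative round-off
    # Coordinates of the training samples along each component.
    return vecs[:, idx] * np.sqrt(vals)
```

The returned array has one row per image and `n_components` columns, and would be transposed before being used as a feature matrix with samples as columns.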
Further, step 2) is implemented as follows. Let $X^{0(v)}$, $v = 1, \dots, V$, be the data obtained after preprocessing, where $V$ is the number of feature kinds and $X^{0(v)}$ is the original data matrix of the $v$-th feature. The labeled natural image data are divided into two parts; the labels of the first part are treated as unknown and the labels of the second part as known. The matrix completion algorithm uses the second part to estimate the labels of the first part, and in the same way the first part is used to obtain estimated labels for the second part. The two sets of estimated labels are concatenated to give the estimated labels for the $v$-th feature representation. Applying this process to every kind of feature gives the estimated labels for $v = 1, \dots, V$.
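The two-part cross-estimation described above can be sketched as below for one feature. The helper `cross_estimate_labels` and its callback signature are hypothetical. To keep the example short and runnable, `ridge_stand_in` substitutes plain ridge regression for the matrix completion step, so it only illustrates the data flow, not the patent's estimator.

```python
import numpy as np

def cross_estimate_labels(X, Y, complete_labels, seed=0):
    """Estimate labels of the labeled data by swapping two halves.

    X: d x n feature matrix of the labeled images (one column per image).
    Y: m x n matrix of their true labels.
    complete_labels(X_known, Y_known, X_unknown) returns m x n_unknown estimates.
    """
    n = X.shape[1]
    perm = np.random.default_rng(seed).permutation(n)
    part1, part2 = perm[: n // 2], perm[n // 2:]
    P = np.empty(Y.shape, dtype=float)
    # Pretend part 1 is unlabeled and estimate it from part 2, then swap roles.
    P[:, part1] = complete_labels(X[:, part2], Y[:, part2], X[:, part1])
    P[:, part2] = complete_labels(X[:, part1], Y[:, part1], X[:, part2])
    return P

def ridge_stand_in(X_known, Y_known, X_unknown, reg=1e-6):
    """Stand-in estimator (ridge regression), NOT matrix completion."""
    A = X_known @ X_known.T + reg * np.eye(X_known.shape[0])
    W = np.linalg.solve(A, X_known @ Y_known.T).T
    return W @ X_unknown
```

Calling `cross_estimate_labels` once per feature kind and stacking the flattened results would give a V x N matrix of estimated labels for learning the combination coefficients.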
Further, let the combination coefficients be $\{\theta_v\}$. They are found by solving the following optimization problem, so that $f(p_k) = \sum_{v=1}^{V} \theta_v p_k^{(v)}$ approximates the true value $y_k^0$ as closely as possible:
$$\arg\min_{\theta} \; \frac{1}{2N} \sum_{k=1}^{N} L\big(f(p_k), y_k^0\big) + \frac{\eta}{2} \|\theta\|_2^2,$$
$$\text{s.t.} \;\; \sum_{v=1}^{V} \theta_v = 1, \quad \theta_v \ge 0,$$
where $N = |\Omega_{Y_l}|$ is the number of elements in $\Omega_{Y_l}$, the index set of the known labels; $p_k = [p_k^{(1)}, \dots, p_k^{(V)}]^T$; $\theta = [\theta_1, \dots, \theta_V]^T$; the regularization term $\|\theta\|_2^2$ prevents the weights $\{\theta_v\}$ from concentrating on the single best feature; $\eta \ge 0$ is a trade-off parameter; and $L$ can be any convex loss function, such as the squared error function or the hinge function used by support vector machines (SVM).
The invention also provides an image classification device that uses the above method, comprising:
a feature extraction unit, for performing feature extraction on labeled natural image data and natural image data with unknown labels to obtain different feature representations;
a training data generation unit, connected to the feature extraction unit, for using a matrix completion algorithm to generate estimated labels of the labeled data for each feature;
a combination coefficient computing unit, connected to the training data generation unit, for linearly combining the estimated labels so that they approximate the corresponding known true labels, thereby obtaining combination coefficients;
a label prediction unit, connected to the feature extraction unit, for using the labeled natural image data and the matrix completion algorithm to predict the labels of the natural image data with unknown labels;
an output fusion unit, connected to the combination coefficient computing unit and the label prediction unit, for using the combination coefficients to combine the labels obtained by the label prediction unit, obtaining multi-feature fused labels;
an image classification unit, connected to the output fusion unit, for classifying the natural image data based on the multi-feature fused labels.
Further, the image classification device may also include a preprocessing unit, for preprocessing each feature representation with kernel principal component analysis (KPCA). In that case the preprocessing unit is connected to the feature extraction unit, and the training data generation unit and the label prediction unit are each connected to the preprocessing unit to receive the preprocessed data.
Compared with the prior art, the invention has the following advantages and positive effects: it is simple and easy to implement, has low computational complexity, and achieves high classification accuracy. It also inherits the advantages of image classification based on matrix completion, for example strong robustness to noise and missing data and the ability to perform multi-label classification (handling the case where one image belongs to several classes at once). The main applications of the invention are: 1) image retrieval is currently still mainly keyword-based; an unlabeled image uploaded to the network can be automatically annotated with keywords using the multi-label image classification of the invention, and can then be found by a keyword-based image retrieval system; 2) summarizing and classifying network pictures so that users can browse them conveniently.
Brief description of the drawings
Fig. 1 shows some natural image samples with multiple class labels.
Fig. 2 is a block diagram of the steps of the image classification method based on multi-modal matrix completion according to an embodiment of the invention.
Fig. 3 is a structural diagram of the image classification device according to an embodiment of the invention.
Fig. 4 shows the experimental comparison between the invention and other methods based on matrix completion.
Embodiment
The invention is described in detail below through specific embodiments and with reference to the accompanying drawings.
The flow of the image classification method based on multi-modal matrix completion of this embodiment is shown in Fig. 2. The concrete steps are:
1) Different feature extraction algorithms (SIFT, GIST, etc.) are applied to all natural images (labeled data, unlabeled data, and test data) to obtain different feature representations.
In classification, data are usually divided into training data and test data; the training data are used to train the classifier, and the test data are used to test its performance. Unlabeled data belong to the training data but, unlike labeled training data, have no labels; these unlabeled training data can be used to improve the performance of the classifier. In the invention, a large amount of unlabeled data helps to uncover the structure of the data and thus gives better test results.
2) To reduce the noise in the input data, to account for the nonlinearity of image features, and to improve the efficiency of matrix completion, the data (the feature representations of the images) are first preprocessed with kernel principal component analysis (KPCA), giving $X^{0(v)}$, $v = 1, \dots, V$, where $V$ is the number of feature kinds, $X^0$ denotes an original data matrix, and $X^{0(v)}$ denotes the input (original) data matrix of the $v$-th feature. During the execution of the matrix completion algorithm the values of this data matrix can change, and the final result is a new data matrix $X^{(v)}$ that stays as close as possible to $X^{0(v)}$ on its known entries while having the lowest possible rank.
3) Generating training data: the training data here are used to learn the combination coefficients among the features. For the $v$-th feature representation, the labeled data are divided into two parts. The labels of the first part are first assumed to be unknown, and the second part (with known labels) is used with the matrix completion algorithm to estimate the labels of the first part. In the same way, estimated labels are obtained for the second part. The true labels of both parts are known. The two sets of estimated labels are concatenated to give the estimated labels for the $v$-th feature.
4) Carrying out step 3) for every kind of feature gives the estimated labels for $v = 1, \dots, V$, whose corresponding true labels are known. The estimated labels can therefore be linearly combined to approximate the true labels. The resulting combination coefficients $\{\theta_v\}$ also reflect the importance of each feature.
5) Then, for the $v$-th feature, $v \in \{1, \dots, V\}$, the labeled data are used with the matrix completion algorithm to predict the labels of the unlabeled data and the test data.
6) Finally, all outputs are combined using $\{\theta_v\}$ to obtain the final multi-feature fused predicted labels $Y_u$ and $Y_t$, which completes the classification of the unlabeled data and the test data.
The invention focuses on the "multi-modal" aspect, that is, on how to combine multiple features to perform image classification based on matrix completion. The basic idea is to first obtain classification results from each single feature and then learn a set of coefficients $\{\theta_v\}$ to combine these results. The classification results obtained are "soft labels", i.e. real values that represent the degree to which a test image belongs to a class. To decide concretely which classes an image belongs to (to give the image "hard labels"), a threshold can be set: if the predicted value of an image for a class is greater than the threshold, that class label is assigned to the image.
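The fusion and thresholding just described amount to a weighted sum followed by a comparison, as in the minimal sketch below. The array layout (V features, m classes, n images), the function name `fuse_and_threshold`, and the default threshold of 0 for labels in {-1, 1} are assumptions chosen for this example.

```python
import numpy as np

def fuse_and_threshold(per_feature_scores, theta, threshold=0.0):
    """Combine per-feature soft labels and convert them to hard labels.

    per_feature_scores: array of shape (V, m, n), soft labels from each feature.
    theta: length-V combination coefficients.
    Returns (fused soft labels, hard labels in {-1, +1}), both of shape (m, n).
    """
    theta = np.asarray(theta, dtype=float)
    fused = np.tensordot(theta, per_feature_scores, axes=1)  # weighted sum over features
    hard = np.where(fused > threshold, 1, -1)
    return fused, hard

scores = np.array([[[1.0, -1.0]],     # feature 1: one class, two images
                   [[-0.5, 0.5]]])    # feature 2
fused, hard = fuse_and_threshold(scores, [0.75, 0.25])
print(fused)  # [[ 0.625 -0.625]]
print(hard)   # [[ 1 -1]]
```

Here the more heavily weighted first feature decides both images, even though the second feature disagrees.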
Steps 3) and 5) above use a matrix completion algorithm to predict labels. The following explains how matrix completion is used for multi-label classification.
Consider $n$ given samples $D_n = \{(x_j^0, y_j^0)\}_{j=1}^{n}$, where the feature vector $x_j^0 \in \mathbb{R}^d$ has the corresponding label $y_j^0 \in \{-1, 1\}^m$ (unknown for unlabeled and test data) and $m$ is the number of class labels. To predict the unknown labels, first construct the feature matrix $X^0 = [x_1^0, \dots, x_n^0]$ and the corresponding label matrix $Y^0 = [y_1^0, \dots, y_n^0]$; the problem then becomes filling in the unknown entries of $Y^0$. Consider the linear classification model $y_j = W x_j + b$, and let the predicted feature matrix be $X = [x_1, \dots, x_n]$ and the predicted label matrix be $Y = [y_1, \dots, y_n]$. Then the matrix $Z = [Y; X; \mathbf{1}^T]$ formed by stacking $Y$ and $X$ is low-rank. The first reason is that every row of $Y$ can be obtained as a linear combination of the rows of $[X; \mathbf{1}^T]$. The second is that $X$ itself is usually low-rank, which is also the basis on which linear dimensionality reduction methods (such as PCA) work. Let $Z^0 = [Y^0; X^0; \mathbf{1}^T]$, and let $\Omega_X$ and $\Omega_Y$ be the index sets of its known entries. The unknown entries of $Y^0$ can then be estimated by minimizing the nuclear norm of $Z$ together with the difference between $Z$ and $Z^0$ on $\Omega_X$ and $\Omega_Y$. The optimization problem can be expressed as:
$$\arg\min_{Z} \; \mu \|Z\|_* + \frac{1}{|\Omega_X|} \sum_{(i,j) \in \Omega_X} c_x\big(Z_{ij}, Z_{ij}^0\big) + \frac{\lambda}{|\Omega_Y|} \sum_{(i,j) \in \Omega_Y} c_y\big(Z_{ij}, Z_{ij}^0\big), \qquad (1)$$
$$\text{s.t.} \;\; Z_{(m+d+1)} = \mathbf{1}^T,$$
where $Z_{(m+d+1)}$ is the last row of $Z$, $c_x$ and $c_y$ are loss functions, and $\mu$ and $\lambda$ are trade-off parameters greater than zero. The detailed solution procedure is given in (A. Goldberg, X. Zhu, B. Recht, J. Xu and R. Nowak, Transduction with matrix completion: three birds with one stone, NIPS, pp. 757-765, 2010).
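To make the stacked matrix concrete, the sketch below builds $Z^0 = [Y^0; X^0; \mathbf{1}^T]$ and a boolean mask of its known entries. It assumes that all feature entries are observed and that only the labels of unlabeled or test images are missing; the function name `stack_z0` and the choice to initialise unknown labels to 0 are assumptions for illustration.

```python
import numpy as np

def stack_z0(X0, Y0, labeled):
    """Build Z0 = [Y0; X0; 1^T] and the mask of known entries.

    X0: d x n feature matrix (one column per image).
    Y0: m x n label matrix in {-1, +1}; columns of unlabeled images are ignored.
    labeled: boolean vector of length n, True where the labels are known.
    """
    d, n = X0.shape
    m = Y0.shape[0]
    Y_part = np.where(labeled[None, :], Y0, 0.0)   # unknown labels start at 0
    Z0 = np.vstack([Y_part, X0, np.ones((1, n))])  # shape (m + d + 1) x n
    mask = np.ones_like(Z0, dtype=bool)
    mask[:m, ~labeled] = False                     # label rows of unlabeled images
    return Z0, mask
```

The first m rows of the mask correspond to $\Omega_Y$ and the next d rows to $\Omega_X$; a completion routine would then fill in the entries where the mask is False.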
The following explains how the combination coefficients $\{\theta_v\}$ are learned in step 4). The invention refers to learning to combine multiple features as "MVMC".
For ease of derivation, this embodiment converts the two-dimensional index $(i, j)$ over samples into a one-dimensional index $k$ by vectorization. $\Omega_{Y_l}$ then denotes the index set of the known-label data, and $p^{(v)}$ denotes the training data (the predicted label values of the labeled data) generated from the $v$-th feature. The goal is to find a set of coefficients $\{\theta_v\}$ such that $f(p_k) = \sum_{v=1}^{V} \theta_v p_k^{(v)}$ approximates the true value $y_k^0$ as closely as possible. The optimization problem can be expressed as:
$$\arg\min_{\theta} \; \frac{1}{2N} \sum_{k=1}^{N} L\big(f(p_k), y_k^0\big) + \frac{\eta}{2} \|\theta\|_2^2, \qquad (2)$$
$$\text{s.t.} \;\; \sum_{v=1}^{V} \theta_v = 1, \quad \theta_v \ge 0,$$
where $N = |\Omega_{Y_l}|$ is the number of elements in $\Omega_{Y_l}$, $p_k = [p_k^{(1)}, \dots, p_k^{(V)}]^T$, and $\theta = [\theta_1, \dots, \theta_V]^T$. The regularization term $\|\theta\|_2^2$ prevents the weights $\{\theta_v\}$ from concentrating on the single best feature (one $\theta_v$ equal to 1 and all others equal to 0), and $\eta \ge 0$ is a trade-off parameter. $L$ can be any convex loss function, such as the squared error function or the hinge function used by support vector machines (SVM).
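When the squared error function is selected, i.e. $L(f(x), y) = (f(x) - y)^2$, the optimization problem takes the following form:
$$\arg\min_{\theta} \; \frac{1}{2N} \big\| P^T \theta - y^0 \big\|_F^2 + \frac{\eta}{2} \|\theta\|_2^2 \qquad (3)$$
$$\text{s.t.} \;\; \sum_{v=1}^{V} \theta_v = 1, \quad \theta_v \ge 0,$$
where $P = [p_1, \dots, p_N]$. This problem can be solved with a coordinate descent algorithm. Specifically, only two variables $\theta_i$ and $\theta_j$ are selected and updated at a time, until the change in the objective function between two successive iterations is less than a threshold. Handling the sum constraint $\sum_{v} \theta_v = 1$ with the method of Lagrange multipliers in each iteration gives the following update rule:
$$\theta_i^* = \frac{\eta(\theta_i + \theta_j) + (h_i - h_j) + \epsilon_{ij}}{(H_{ii} - H_{ij} - H_{ji} + H_{jj}) + 2\eta}, \qquad \theta_j^* = \theta_i + \theta_j - \theta_i^*, \qquad (4)$$
where, consistent with (3), $h = \frac{1}{N} P y^0$ with entries $h_i = \frac{1}{N} p^{(i)} y^0$, $H = \frac{1}{N} P P^T$ is a positive semidefinite matrix, $p^{(i)}$ denotes the $i$-th row of the matrix $P$, and $\epsilon_{ij} = (H_{ii} - H_{ij} - H_{ji} + H_{jj})\theta_i - \sum_{k} (H_{ik} - H_{jk})\theta_k$. The constraint $\theta_v \ge 0$ is then taken into account as well: if the update would make $\theta_i^*$ negative, set $\theta_i^* = 0$ and $\theta_j^* = \theta_i + \theta_j$; if it would make $\theta_j^*$ negative, set $\theta_j^* = 0$ and $\theta_i^* = \theta_i + \theta_j$. (5)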
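The pairwise update can be sketched in code as follows. This is a minimal illustration under stated assumptions rather than the patent's implementation: `H` and `h` are taken as $PP^T/N$ and $Py^0/N$ so that the update minimizes problem (3), pairs are visited in a fixed cyclic order, and the names `fit_theta_ls`, `tol`, and `max_sweeps` are hypothetical.

```python
import numpy as np

def fit_theta_ls(P, y, eta=0.1, tol=1e-10, max_sweeps=200):
    """Pairwise coordinate descent for the squared-loss problem.

    P: V x N matrix whose rows are the estimated labels from each feature.
    y: length-N vector of true labels.
    Returns theta with theta >= 0 and sum(theta) == 1.
    """
    V, N = P.shape
    H = P @ P.T / N          # assumed scaling, consistent with the objective
    h = P @ y / N
    theta = np.full(V, 1.0 / V)

    def objective(t):
        r = P.T @ t - y
        return r @ r / (2 * N) + 0.5 * eta * (t @ t)

    prev = objective(theta)
    for _ in range(max_sweeps):
        for i in range(V):
            for j in range(i + 1, V):
                s = theta[i] + theta[j]          # kept fixed by the update
                curv = H[i, i] - H[i, j] - H[j, i] + H[j, j]
                denom = curv + 2.0 * eta
                if denom <= 0.0:
                    continue                     # flat direction, nothing to update
                eps = curv * theta[i] - (H[i] - H[j]) @ theta
                ti = (eta * s + (h[i] - h[j]) + eps) / denom
                ti = min(max(ti, 0.0), s)        # keep both coefficients non-negative
                theta[i], theta[j] = ti, s - ti
        cur = objective(theta)
        if prev - cur < tol:                     # stop when the objective stalls
            break
        prev = cur
    return theta
```

Because each update moves weight only between two coefficients, the sum constraint holds throughout without an explicit projection step.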
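When the hinge loss function is adopted, i.e. $L(f(x), y) = (1 - y f(x))_+$, the following optimization problem is obtained:
$$\arg\min_{\theta, \xi} \; \frac{1}{2} \|\theta\|_2^2 + C \sum_{k=1}^{N} \xi_k,$$
$$\text{s.t.} \;\; y_k^0 (\theta^T p_k) \ge 1 - \xi_k, \quad \xi_k \ge 0, \quad k = 1, \dots, N, \qquad (6)$$
$$\sum_{v=1}^{V} \theta_v = 1, \quad \theta_v \ge 0, \quad v = 1, \dots, V,$$
where $C \ge 0$ is a trade-off parameter and the $\xi_k$ are slack variables. This problem can be converted into its dual problem with the method of Lagrange multipliers and then solved by alternately optimizing two subproblems. Subproblem 1:
$$\arg\max_{\alpha} \; \alpha^T \big( \mathbf{1}_N - Y_d^0 P^T \delta \big) - \frac{1}{2} \alpha^T H \alpha, \qquad (7)$$
$$\text{s.t.} \;\; 0 \le \alpha_k \le C,$$
where $\alpha$ and $\delta$ are the dual variables to be solved, $H$ is an $N \times N$ matrix with entries $H_{ij} = y_i^0 y_j^0 p_i^T p_j$, $Y_d^0 = \mathrm{diag}(y_1^0, \dots, y_N^0)$, and $\mathbf{1}_N$ is the length-$N$ column vector whose elements are all 1. Subproblem 2:
$$\arg\max_{\delta} \; \big( -\alpha^T Y_d^0 P^T \big) \delta - \frac{1}{2} \delta^T \delta, \qquad (8)$$
$$\text{s.t.} \;\; \delta_v \ge 0.$$
Alternately optimizing (7) and (8) gives $\alpha^*$ and $\delta^*$. The combination coefficients are $\theta = \sum_{k=1}^{N} \alpha_k^* y_k^0 p_k + \delta^*$, and normalizing them gives the final solution.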
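One possible way to carry out the alternation is sketched below. The source does not say how each subproblem is solved, so the choices here are assumptions: subproblem (7) is handled by projected gradient ascent on the box $0 \le \alpha_k \le C$, and subproblem (8) by its closed-form solution $\delta = \max(0, -P Y_d^0 \alpha)$. The name `fit_theta_hinge` and the iteration counts are hypothetical.

```python
import numpy as np

def fit_theta_hinge(P, y, C=1.0, n_outer=20, n_inner=200):
    """Alternate between the two dual subproblems of the hinge-loss formulation.

    P: V x N matrix with columns p_k; y: length-N labels in {-1, +1}.
    Returns normalized, non-negative combination coefficients.
    """
    V, N = P.shape
    Q = P * y                      # column k is y_k * p_k, i.e. P @ Y_d
    H = Q.T @ Q                    # H_ij = y_i y_j p_i^T p_j
    step = 1.0 / max(np.linalg.eigvalsh(H)[-1], 1e-12)
    alpha = np.zeros(N)
    delta = np.zeros(V)
    for _ in range(n_outer):
        # Subproblem 1: projected gradient ascent in alpha with delta fixed.
        lin = 1.0 - Q.T @ delta
        for _ in range(n_inner):
            alpha = np.clip(alpha + step * (lin - H @ alpha), 0.0, C)
        # Subproblem 2: separable concave quadratic in delta, solved in closed form.
        delta = np.maximum(0.0, -(Q @ alpha))
    theta = Q @ alpha + delta      # theta = sum_k alpha_k y_k p_k + delta
    total = theta.sum()
    # Fall back to uniform weights in the degenerate all-zero case.
    return theta / total if total > 0 else np.full(V, 1.0 / V)
```

With the closed-form $\delta$, the recovered $\theta$ equals the element-wise positive part of $\sum_k \alpha_k y_k^0 p_k$, so the non-negativity constraint holds before normalization.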
Fig. 3 is a structural diagram of the image classification device that uses the above method. The device comprises: a feature extraction unit, for performing feature extraction on labeled natural image data and natural image data with unknown labels to obtain different feature representations; a preprocessing unit, connected to the feature extraction unit, for preprocessing each feature representation with kernel principal component analysis (KPCA); a training data generation unit, connected to the preprocessing unit, for using a matrix completion algorithm to generate estimated labels of the labeled data for each feature; a combination coefficient computing unit, connected to the training data generation unit, for linearly combining the estimated labels so that they approximate the corresponding known true labels, thereby obtaining combination coefficients; a label prediction unit, connected to the preprocessing unit, for using the labeled natural image data and the matrix completion algorithm to predict the labels of the natural image data with unknown labels; an output fusion unit, connected to the combination coefficient computing unit and the label prediction unit, for using the combination coefficients to combine the labels obtained by the label prediction unit, obtaining multi-feature fused labels; and an image classification unit, connected to the output fusion unit, for classifying the natural image data based on the multi-feature fused labels. Note that the preprocessing unit is not required. In other embodiments it can be omitted, in which case the training data generation unit and the label prediction unit are connected directly to the feature extraction unit.
Fig. 4 compares several algorithms based on matrix completion: MVMC-LS (using squared error loss), MVMC-SVM (using hinge loss), and three other algorithms, namely 1) BMC: uses the single best-performing feature; 2) CMC: concatenates the features into one long vector; 3) AMC: uses average combination coefficients when combining the features. The invention was tested on the PASCAL VOC'07 data set (M. Everingham, L. Van Gool, C. Williams, J. Winn and A. Zisserman, The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results), using mAP (M. Zhu, Recall, precision and average precision, University of Waterloo, Tech. Rep., 2004) as the evaluation criterion, with 100, 150, and 200 labeled samples. The experimental results show that using multiple features together, whether by concatenating the features into a long vector (CMC) or by averaging the outputs (AMC), is better than using a single kind of feature. Learning the combination coefficients of the outputs with the MVMC algorithm proposed by the invention improves performance further, and the result based on hinge loss is slightly better than the result based on squared error (the latter has lower algorithmic complexity).
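For readers unfamiliar with the mAP criterion, the sketch below computes a non-interpolated average precision per class and averages it over classes. This is one common definition and is an assumption here; the exact variant used in the experiments is not stated in the text and may differ (for example, an interpolated version). The function names are hypothetical.

```python
import numpy as np

def average_precision(scores, labels):
    """Non-interpolated average precision for one class.

    scores: length-n soft labels; labels: length-n true labels in {-1, +1}.
    """
    order = np.argsort(-np.asarray(scores), kind="stable")  # rank by descending score
    hits = np.asarray(labels)[order] > 0
    if not hits.any():
        return 0.0
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    # Average the precision at the rank of every positive example.
    return float(precision_at_k[hits].mean())

def mean_average_precision(score_matrix, label_matrix):
    """Mean of the per-class average precision; both inputs have shape (m, n)."""
    return float(np.mean([average_precision(s, l)
                          for s, l in zip(score_matrix, label_matrix)]))

ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, -1, 1, -1])
print(round(ap, 4))  # 0.8333
```

In the example, the positives sit at ranks 1 and 3, giving precisions 1 and 2/3, whose mean is about 0.8333.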
Table 1 compares MVMC-SVM with several other multi-feature fusion algorithms: 1) HierSVM (J. Kludas, E. Bruno and S. Marchand-Maillet, Information fusion in multimedia information retrieval, Adaptive Multimedia Retrieval, pp. 147-159, 2009): trains an SVM classifier for each feature and then fuses the multiple outputs with another SVM; 2) SimpleMKL (A. Rakotomamonjy, F. Bach, S. Canu and Y. Grandvalet, SimpleMKL, JMLR, 9:2491-2521, 2008): a popular multiple kernel learning (MKL) algorithm that builds a kernel for each feature and then uses a linear combination of all the kernels for classification; 3) LpMKL (M. Kloft, U. Brefeld, S. Sonnenburg and A. Zien, Lp-norm multiple kernel learning, JMLR, 12:953-997, 2011): a recently proposed multiple kernel learning algorithm that generalizes MKL to the $l_p$-norm, $p \ge 1$. The experimental results show that MVMC-SVM outperforms the other algorithms. In particular, compared with LpMKL, the invention obtains mAP (mean average precision) improvements of 7.2%, 9.7%, and 11.4% with 100, 150, and 200 labeled samples, respectively.
Table 1. Experimental comparison between the method of the invention and other multi-modal methods
The above embodiments are only intended to illustrate the technical solution of the invention, not to limit it. Those of ordinary skill in the art can modify the technical solution or replace it with equivalents without departing from the spirit and scope of the invention; the scope of protection of the invention is defined by the claims.

Claims (10)

1. A natural image classification method based on multi-modal matrix completion, comprising the following steps:
1) performing feature extraction on the labeled, unlabeled, and test natural image data to obtain different feature representations;
2) using a matrix completion algorithm to generate estimated labels of the labeled data for each feature;
3) linearly combining the estimated labels so that they approximate the corresponding known true labels, thereby obtaining combination coefficients;
4) for each feature, using the labeled natural image data and the matrix completion algorithm to predict the labels of the unlabeled and test natural image data;
5) using the combination coefficients to combine the labels predicted for all features in step 4), obtaining labels that fuse multiple features;
6) classifying the natural image data based on the multi-feature fused labels.
2. the method for claim 1, its spy is: the each character representation of step 1) gained is carried out to pre-service, then carry out step 2).
3. method as claimed in claim 2, its spy is: use coring principal component analytical method or Random Maps method to carry out described pre-service.
4. method as claimed in claim 2, its spy is, step 2) implementation method be:
If obtain X after pre-service 0 (v), v=1 ..., V, wherein V is feature kind number, X 0what represent is that raw data matrix is for v kind feature; The natural image data of tape label are divided into two parts, suppose the label of Part I data unknown, the label of Part II data known;
Adopt matrix fill-in algorithm to use Part II data to estimate the label of Part I data, obtain estimating label in like manner obtain the estimation label of Part II data will with be stitched together, obtain the estimation label of v kind character representation the feature of all kinds is implemented to said process, obtain v=1 ..., V.
5. The method of claim 4, characterized in that: letting the combination coefficients be $\{\theta_v\}$, the coefficients are found by solving the following optimization problem, so that $f(p_k) = \sum_{v=1}^{V} \theta_v p_k^{(v)}$ approximates the true value $y_k^0$ as closely as possible:
$$\arg\min_{\theta} \; \frac{1}{2N} \sum_{k=1}^{N} L\big(f(p_k), y_k^0\big) + \frac{\eta}{2} \|\theta\|_2^2,$$
$$\text{s.t.} \quad \sum_{v=1}^{V} \theta_v = 1, \quad \theta_v \ge 0,$$
where $N = |\Omega_Y^l|$ is the number of elements in $\Omega_Y^l$, $p_k = [p_k^{(1)}, \ldots, p_k^{(V)}]^T$, $\theta = [\theta_1, \ldots, \theta_V]^T$, the regularization term $\frac{\eta}{2}\|\theta\|_2^2$ prevents the weights $\{\theta_v\}$ from concentrating on a single best feature, $\eta \ge 0$ is a trade-off parameter, and $L$ is any convex loss function.
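The objective of claim 5 can be evaluated directly for a candidate set of coefficients. The following is a minimal sketch, not the patented implementation: the function names (`fused_label`, `objective`, `is_feasible`) and the layout of `P` as a list of per-sample vectors $p_k$ are illustrative assumptions, and the squared error is used only as one example of a convex loss $L$.

```python
from typing import Callable, Sequence


def fused_label(theta: Sequence[float], p_k: Sequence[float]) -> float:
    """f(p_k) = sum_v theta_v * p_k^(v): weighted sum of per-feature estimates."""
    return sum(t * p for t, p in zip(theta, p_k))


def squared_loss(f: float, y: float) -> float:
    """One possible convex loss L (assumption: any convex loss may be used)."""
    return (f - y) ** 2


def objective(theta: Sequence[float],
              P: Sequence[Sequence[float]],
              y: Sequence[float],
              eta: float,
              loss: Callable[[float, float], float] = squared_loss) -> float:
    """(1 / 2N) * sum_k L(f(p_k), y_k) + (eta / 2) * ||theta||_2^2.

    P is assumed to hold N vectors p_k, each of length V.
    """
    n = len(P)
    data_term = sum(loss(fused_label(theta, p_k), y_k)
                    for p_k, y_k in zip(P, y)) / (2 * n)
    reg_term = 0.5 * eta * sum(t * t for t in theta)
    return data_term + reg_term


def is_feasible(theta: Sequence[float], tol: float = 1e-9) -> bool:
    """Check the constraints: theta_v >= 0 and sum_v theta_v = 1."""
    return all(t >= -tol for t in theta) and abs(sum(theta) - 1.0) <= tol
```

A larger `eta` penalizes coefficient vectors that put all weight on one feature, since $\|\theta\|_2^2$ is smallest on the simplex when the weights are uniform.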
6. The method of claim 5, characterized in that: $L$ is the squared error function, i.e. $L(f(x), y) = (f(x) - y)^2$, and the problem is solved with a coordinate gradient descent algorithm in which each iteration updates only two variables, with the update rule:
$$\theta_i^* = \frac{\eta(\theta_i + \theta_j) + (h_i - h_j) + \epsilon_{ij}}{(H_{ii} - H_{ij} - H_{ji} + H_{jj}) + 2\eta}, \qquad \theta_j^* = \theta_i + \theta_j - \theta_i^*$$
And
where $H$ is a positive semidefinite matrix, $p^{(i)}$ denotes the $i$-th row of the matrix $P$, and $\epsilon_{ij} = (H_{ii} - H_{ij} - H_{ji} + H_{jj})\,\theta_i - \sum_{k} (H_{ik} - H_{jk})\,\theta_k$.
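The two-variable update of claim 6 can be sketched in a few lines. This is an illustration under stated assumptions rather than the patented implementation: the definitions $H = PP^T/N$ and $h = Py^0/N$ are assumed here because they make the update rule the exact minimizer of the squared-error objective along the pair $(\theta_i, \theta_j)$; the clipping to $[0, \theta_i + \theta_j]$ is an assumed way of keeping $\theta_v \ge 0$; and the function name `pairwise_descent` and the random choice of pairs are hypothetical.

```python
import random


def pairwise_descent(P, y, eta, n_iter=2000, seed=0):
    """Two-variable coordinate descent for the squared-loss objective.

    P: list of V rows (row i is p^(i)), each of length N; requires V >= 2.
    y: list of N true labels.
    """
    V, N = len(P), len(y)
    # Assumed definitions (not stated in the surviving claim text).
    H = [[sum(a * b for a, b in zip(P[i], P[j])) / N for j in range(V)]
         for i in range(V)]
    h = [sum(a * b for a, b in zip(P[i], y)) / N for i in range(V)]

    theta = [1.0 / V] * V              # feasible start: uniform weights
    rng = random.Random(seed)
    for _ in range(n_iter):
        i, j = rng.sample(range(V), 2)  # assumption: pairs picked at random
        s = theta[i] + theta[j]         # the pair sum is kept fixed
        curv = H[i][i] - H[i][j] - H[j][i] + H[j][j]
        eps = curv * theta[i] - sum((H[i][k] - H[j][k]) * theta[k]
                                    for k in range(V))
        denom = curv + 2.0 * eta
        if denom <= 0.0:                # flat direction, nothing to update
            continue
        new_i = (eta * s + (h[i] - h[j]) + eps) / denom
        new_i = min(max(new_i, 0.0), s)  # assumed clipping to keep theta >= 0
        theta[i], theta[j] = new_i, s - new_i
    return theta
```

Because each step only moves weight between two coordinates, the sum constraint $\sum_v \theta_v = 1$ holds after every iteration without a separate projection step.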
7. The method of claim 5, characterized in that: $L$ is the hinge loss function, i.e. $L(f(x), y) = (1 - y f(x))_+$, and the problem is solved by alternately optimizing two subproblems.
Subproblem 1:
$$\arg\max_{\alpha} \; \alpha^T \big(1_N - Y_d^0 P^T \delta\big) - \frac{1}{2} \alpha^T H \alpha,$$
$$\text{s.t.} \quad 0 \le \alpha_k \le C,$$
where $H$ is an $N \times N$ matrix with entries $H_{ij} = y_i^0 y_j^0 p_i^T p_j$, $Y_d^0 = \mathrm{diag}(y_1^0, \ldots, y_N^0)$, and $1_N$ is the column vector of length $N$ whose elements are all 1;
Subproblem 2:
$$\arg\max_{\delta} \; \big(-\alpha^T Y_d^0 P^T\big)\, \delta - \frac{1}{2} \delta^T \delta,$$
$$\text{s.t.} \quad \delta_v \ge 0,$$
The combination coefficients are $\theta = \sum_{k=1}^{N} \alpha_k^* y_k^0 p_k + \delta^*$, which are normalized to give the final solution.
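The alternation between the two subproblems of claim 7 can be sketched as follows. The claim does not say how each subproblem is solved, so the choices here are assumptions: subproblem 1 is handled by clipped coordinate ascent over the box $0 \le \alpha_k \le C$, and subproblem 2 uses the closed form $\delta_v = \max(0, -\sum_k \alpha_k y_k^0 p_k^{(v)})$, which follows from the objective being separable in $\delta$. The function name `solve_hinge` and the iteration counts are hypothetical.

```python
def solve_hinge(P, y, C=1.0, outer=20, inner=100):
    """Alternating optimization for the hinge-loss combination coefficients.

    P: list of N vectors p_k, each of length V; y: list of N labels in {-1, +1}.
    Returns the normalized coefficients theta.
    """
    N, V = len(P), len(P[0])
    # H_ij = y_i * y_j * p_i^T p_j
    H = [[y[i] * y[j] * sum(a * b for a, b in zip(P[i], P[j]))
          for j in range(N)] for i in range(N)]
    alpha = [0.0] * N
    delta = [0.0] * V

    for _ in range(outer):
        # Subproblem 1: maximize alpha^T c - 0.5 alpha^T H alpha on the box,
        # with c = 1_N - Y P^T delta (assumed solver: clipped coordinate ascent).
        c = [1.0 - y[k] * sum(p * d for p, d in zip(P[k], delta))
             for k in range(N)]
        for _ in range(inner):
            for k in range(N):
                r = c[k] - sum(H[k][j] * alpha[j] for j in range(N) if j != k)
                if H[k][k] > 0.0:
                    alpha[k] = min(max(r / H[k][k], 0.0), C)
                else:
                    alpha[k] = C if r > 0.0 else 0.0
        # Subproblem 2: separable in delta, so the maximizer is a clipped value.
        w = [sum(alpha[k] * y[k] * P[k][v] for k in range(N)) for v in range(V)]
        delta = [max(0.0, -wv) for wv in w]

    # theta = sum_k alpha_k y_k p_k + delta, then normalize to sum to 1.
    w = [sum(alpha[k] * y[k] * P[k][v] for k in range(N)) for v in range(V)]
    theta = [wv + dv for wv, dv in zip(w, delta)]
    total = sum(theta)
    return [t / total for t in theta] if total > 0.0 else theta
```

With the closed-form $\delta$, the unnormalized $\theta$ equals the element-wise positive part of $\sum_k \alpha_k y_k^0 p_k$, so the non-negativity constraint on $\theta$ is satisfied by construction.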
8. The method of claim 1 or 2, characterized in that the feature extraction algorithm used for said feature extraction is SIFT or GIST.
9. An image classification device employing the method of claim 1, characterized by comprising:
A feature extraction unit, for performing feature extraction on labeled and unlabeled natural image data to obtain different feature representations;
A training data generation unit, for using a matrix completion algorithm to generate estimated labels of the labeled data for each feature;
A combination coefficient calculation unit, connected to said training data generation unit, for linearly combining the estimated labels so as to approximate the corresponding known true labels, thereby obtaining combination coefficients;
A label prediction unit, for using the labeled natural image data and the matrix completion algorithm to predict the labels of the unlabeled natural image data;
An output fusion unit, connected to said combination coefficient calculation unit and said label prediction unit, for combining the labels obtained by the label prediction unit using said combination coefficients to obtain multi-feature fused labels;
An image classification unit, connected to said output fusion unit, for classifying the natural image data based on said multi-feature fused labels.
10. The device of claim 9, characterized by further comprising a preprocessing unit, connected to said feature extraction unit, for preprocessing each feature representation; said training data generation unit and said label prediction unit are each connected to the preprocessing unit to receive the preprocessed data.
CN201310021734.3A 2013-01-21 2013-01-21 Natural image classification method and device on basis of multi-modal matrix filling Expired - Fee Related CN103942214B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310021734.3A CN103942214B (en) 2013-01-21 2013-01-21 Natural image classification method and device on basis of multi-modal matrix filling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310021734.3A CN103942214B (en) 2013-01-21 2013-01-21 Natural image classification method and device on basis of multi-modal matrix filling

Publications (2)

Publication Number Publication Date
CN103942214A true CN103942214A (en) 2014-07-23
CN103942214B CN103942214B (en) 2017-05-17

Family

ID=51189884

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310021734.3A Expired - Fee Related CN103942214B (en) 2013-01-21 2013-01-21 Natural image classification method and device on basis of multi-modal matrix filling

Country Status (1)

Country Link
CN (1) CN103942214B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108197663A (en) * 2018-01-23 2018-06-22 南京大学 Based on the calligraphy work image classification method to pairing set Multi-label learning
CN108537137A (en) * 2018-03-19 2018-09-14 安徽大学 Differentiate the multi-modal biological characteristic fusion identification method of correlation analysis based on label
CN109190676A (en) * 2018-08-06 2019-01-11 百度在线网络技术(北京)有限公司 model training method, device, equipment and storage medium
CN111177388A (en) * 2019-12-30 2020-05-19 联想(北京)有限公司 Processing method and computer equipment
CN111476285A (en) * 2020-04-01 2020-07-31 深圳力维智联技术有限公司 Training method of image classification model, image classification method and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107894971B (en) * 2017-10-27 2019-11-26 北京大学 A kind of expansible sequence labelling method neural network based

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090144033A1 (en) * 2007-11-30 2009-06-04 Xerox Corporation Object comparison, retrieval, and categorization methods and apparatuses
CN101482926A (en) * 2009-02-19 2009-07-15 北京大学 Extensible self-adapting multi-core classification method
CN101807254A (en) * 2009-02-13 2010-08-18 烟台海岸带可持续发展研究所 Implementation method for data characteristic-oriented synthetic kernel support vector machine

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108197663A (en) * 2018-01-23 2018-06-22 南京大学 Based on the calligraphy work image classification method to pairing set Multi-label learning
CN108197663B (en) * 2018-01-23 2022-03-25 南京大学 Calligraphy work image classification method based on dual set multi-label learning
CN108537137A (en) * 2018-03-19 2018-09-14 安徽大学 Differentiate the multi-modal biological characteristic fusion identification method of correlation analysis based on label
CN108537137B (en) * 2018-03-19 2021-08-20 安徽大学 Multi-modal biological characteristic fusion recognition method based on label identification correlation analysis
CN109190676A (en) * 2018-08-06 2019-01-11 百度在线网络技术(北京)有限公司 model training method, device, equipment and storage medium
CN109190676B (en) * 2018-08-06 2022-11-08 百度在线网络技术(北京)有限公司 Model training method, device, equipment and storage medium for image recognition
CN111177388A (en) * 2019-12-30 2020-05-19 联想(北京)有限公司 Processing method and computer equipment
CN111177388B (en) * 2019-12-30 2023-07-21 联想(北京)有限公司 Processing method and computer equipment
CN111476285A (en) * 2020-04-01 2020-07-31 深圳力维智联技术有限公司 Training method of image classification model, image classification method and storage medium
CN111476285B (en) * 2020-04-01 2023-07-28 深圳力维智联技术有限公司 Training method of image classification model, image classification method and storage medium

Also Published As

Publication number Publication date
CN103942214B (en) 2017-05-17

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170517

Termination date: 20200121
