CN104809468A

CN104809468A - Multi-view classification method based on indefinite kernels

Info

Publication number: CN104809468A
Application number: CN201510188546.9A
Authority: CN
Inventors: 薛晖
Original assignee: Southeast University
Current assignee: Southeast University
Priority date: 2015-04-20
Filing date: 2015-04-20
Publication date: 2015-07-29

Abstract

The invention discloses a multi-view classification method based on indefinite kernels. The method comprises the following steps: first, obtaining a multi-view image set for training; second, generating projection matrices from the multi-view data and projecting the data of the different views into a unified low-dimensional space; third, training on the samples in the low-dimensional projection space with an indefinite kernel technique to obtain a classifier; fourth, standardizing a new multi-view data set, projecting it into the low-dimensional space obtained in training, and feeding the projected data into the trained classifier to obtain the classification result. The method converts the multi-view classification problem with incomplete labels into a single-view semi-supervised classification problem in the unified low-dimensional space, so that the labels are complete on the single view; it makes full use of the discriminative information of the labeled data and the structural information of the unlabeled data, which improves classifier performance; in addition, new multi-view data can be tested and classified directly.

Description

A multi-view classification method based on indefinite kernels

Technical field

The present invention relates to the field of pattern recognition and machine learning, and in particular to a multi-view classification method based on indefinite kernels.

Background technology

Kernel methods are one of the core techniques of machine learning and an important class of methods for the nonlinear learning problems that arise in practice. The core idea is to embed the raw data into a high-dimensional feature space through a nonlinear mapping and then to analyze and process the data in the new space with a linear learner. Their advantages are chiefly the following. The concrete form and parameters of the nonlinear mapping need not be known in advance; instead a kernel function is introduced, and changing its form and parameters implicitly realizes the mapping from the low-dimensional input space to the high-dimensional feature space. With a kernel function, the complicated inner product in the high-dimensional feature space is converted into a kernel evaluation in the low-dimensional input space, which neatly avoids problems such as the "curse of dimensionality" that can arise when computing in the high-dimensional feature space and greatly reduces the amount of computation. Finally, the kernel function can be chosen flexibly for the problem at hand, so that more prior knowledge about the learning problem can be embedded.
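The conversion of a feature-space inner product into a kernel evaluation in the input space can be illustrated with a small sketch. The homogeneous degree-2 polynomial kernel and the explicit feature map below are chosen only for illustration and are not taken from this document; the function names are hypothetical.

```python
import math

def poly2_kernel(x, y):
    """Kernel k(x, y) = (x . y)^2, evaluated directly in the input space."""
    return sum(a * b for a, b in zip(x, y)) ** 2

def poly2_feature_map(x):
    """Explicit feature map for 2-D inputs whose inner product equals (x . y)^2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def dot(u, v):
    """Plain inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

x, y = (1.0, 2.0), (3.0, -1.0)
k_implicit = poly2_kernel(x, y)                                # no feature map needed
k_explicit = dot(poly2_feature_map(x), poly2_feature_map(y))   # inner product in feature space
print(k_implicit, k_explicit)
```

Both values agree up to rounding, although the kernel evaluation never constructs the three-dimensional feature vectors.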

Although kernel methods have these key advantages, they are constrained by traditional statistical learning theory: most existing kernel methods require the kernel to be positive definite, that is, to satisfy the Mercer condition. In many practical applications, however, positive definite kernels sometimes fail to generalize well or are even difficult to use. Indefinite kernels, by contrast, can show better empirical classification performance than positive definite kernels; they are gradually becoming a research focus in machine learning and are attracting the attention of more and more researchers. For example, in face recognition, Liu used an indefinite fractional power polynomial kernel in kernel principal component analysis (KPCA) and achieved better recognition than KPCA with a positive definite polynomial kernel. In video tracking, Liwicki et al. further used an indefinite robust gradient kernel to address the inexact solutions that can result when positive definite kernel KPCA uses a reduced sample set representation to keep up the update speed. Experiments show that KPCA with an indefinite kernel clearly outperforms KPCA with a positive definite Gaussian kernel. Existing indefinite kernel methods, however, are all confined to traditional single-view problems. With the development of data acquisition technology, researchers have found that more and more real-world problems involve large numbers of multi-view samples: for example, a person can be described by a face image and a voice, and each web page can be represented by its document text and its hyperlinks. It is therefore necessary to extend indefinite kernel methods to multi-view learning so as to better meet the needs of practical applications.
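What it means for a kernel to violate the Mercer condition can be checked numerically: the Gram matrix of a positive definite kernel has no negative eigenvalues, whereas that of an indefinite kernel can. The sketch below uses a sigmoid (tanh) kernel as the indefinite example; this kernel, its parameters and the sample points are assumptions made for illustration and are not the fractional power polynomial or robust gradient kernels cited above.

```python
import numpy as np

def gaussian_gram(X, sigma=1.0):
    """Gram matrix of the Gaussian kernel (positive definite)."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def sigmoid_gram(X, a=1.0, b=-1.0):
    """Gram matrix of k(x, y) = tanh(a * x.y + b); indefinite for these parameters."""
    return np.tanh(a * (X @ X.T) + b)

# Hypothetical sample points; the origin gives k(0, 0) = tanh(-1) < 0,
# so the sigmoid Gram matrix cannot be positive semidefinite.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [1.5, 1.5]])

min_eig_gauss = np.linalg.eigvalsh(gaussian_gram(X)).min()
min_eig_sigmoid = np.linalg.eigvalsh(sigmoid_gram(X)).min()
print(min_eig_gauss, min_eig_sigmoid)
```

A negative smallest eigenvalue means the kernel does not correspond to an inner product in a Hilbert space, which is why indefinite kernels are treated in a reproducing kernel Kreĭn space instead.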

Multi-view learning is another active topic in machine learning. A large body of research shows that multi-view data often contain more prior information about the learning problem. The aim of multi-view learning is to mine the latent information shared among multiple views and, by letting the views reinforce one another, to obtain better learning performance. Traditional multi-view learning, however, usually requires the class labels of the data to be complete on every view, a requirement that is often hard to meet in real problems. For example, a video recording shows many frames per second, and labeling every person in every frame is almost impossible; likewise, if several people speak in the recording, labeling each person's audio is also difficult. Consequently, when only a small amount of the data on the audio and video views is labeled, it is even harder to guarantee that the data on each view are labeled for all persons. Incomplete labels across multiple views are therefore one of the difficult problems in multi-view learning, and existing multi-view learning methods either cannot solve it or have difficulty solving it effectively.

In existing multi-view learning, almost all learning algorithms are designed for two views. When there are more than two views, a one-versus-one strategy can be adopted: the views are combined in pairs, an existing algorithm is applied to classify each pair, and the final classification result is chosen by voting. Learning on two views is therefore the key to solving the multi-view classification problem.

Summary of the invention

Object of the invention: in view of the problems in the prior art, the present invention proposes a multi-view classification method based on indefinite kernels. By mapping multiple view spaces into a new unified low-dimensional space and introducing an indefinite kernel method, the method effectively avoids the "curse of dimensionality" and improves the flexibility and effectiveness of the subsequent classification.

Technical solution: before the concrete steps of the method are described, the relevant definitions and notation are given:

(a) sample: a set of image data from the Amazon database;

(b) class label: the class to which a sample belongs;

(c) indefinite kernel: an indefinite kernel function derived from the inner product of data in a reproducing kernel Kreĭn space;

(d) CCA: Canonical Correlation Analysis;

(e) PCA: Principal Component Analysis;

(f) SemiCCA: Semi-supervised Canonical Correlation Analysis;

(g) S²GCA: Semi-paired and Semi-supervised Generalized Correlation Analysis;

(h) ARC-t, a domain adaptation method based on asymmetric kernel transforms: Asymmetric Regularized Cross-domain Transformation Problem with similarity and dissimilarity constraints;

(i) Semi-IKRLSC: Semi-supervised discriminatively Indefinite Kernel Regularized Least-Square Classifier;

(j) space synchronization: mapping multiple view spaces to a single new view space.

The invention provides a multi-view classification method based on indefinite kernels, directed mainly at classification with two views. The method comprises two stages, training and application, and specifically comprises the following steps:

1) obtain a multi-view image set for training;

2) generate projection matrices from the multi-view data and project the data of the different views into a unified low-dimensional space;

3) train on the samples in the low-dimensional projection space with an indefinite kernel technique to obtain a classifier;

4) standardize a new multi-view data set, project it into the low-dimensional space obtained in training, and input the projected data into the trained classifier to obtain the classification result.

Step 2) uses the SemiCCA algorithm for dimensionality reduction and specifically comprises the following steps:

1) solve for the projection matrices from the data of the two views:

Let X = (X^{(L)}, X^{(U)}) and Y = (Y^{(L)}, Y^{(U)}) denote the data in the two views View-X and View-Y respectively, where X^{(L)} and Y^{(L)} denote the paired samples in View-X and View-Y, and X^{(U)} and Y^{(U)} denote the unpaired samples in View-X and View-Y. The projection matrices are obtained from:

M \begin{pmatrix} Q_x \\ Q_y \end{pmatrix} = \lambda N \begin{pmatrix} Q_x \\ Q_y \end{pmatrix} \quad (1)

where Q_x is the projection matrix corresponding to View-X, Q_y is the projection matrix corresponding to View-Y,

M = \mu \begin{pmatrix} 0 & R_{xy}^{(L)} \\ R_{yx}^{(L)} & 0 \end{pmatrix} + (1 - \mu) \begin{pmatrix} R_{xx} & 0 \\ 0 & R_{yy} \end{pmatrix},

N = \mu \begin{pmatrix} R_{xx}^{(L)} & 0 \\ 0 & R_{yy}^{(L)} \end{pmatrix} + (1 - \mu) \begin{pmatrix} I_{D_x} & 0 \\ 0 & I_{D_y} \end{pmatrix},

R_{xx}^{(L)} is the covariance matrix of the paired samples in View-X, R_{yy}^{(L)} is the covariance matrix of the paired samples in View-Y, R_{xy}^{(L)} is the covariance matrix of the paired samples between View-X and View-Y, R_{yx}^{(L)} is the covariance matrix of the paired samples between View-Y and View-X, R_{xx} is the covariance matrix of all data in View-X, R_{yy} is the covariance matrix of all data in View-Y, \mu is a regularization parameter with range [0, 1], \lambda is the generalized eigenvalue, D_x and D_y are the dimensions of View-X and View-Y respectively, I_{D_x} is the D_x × D_x identity matrix, and I_{D_y} is the D_y × D_y identity matrix;

2) using the projection matrices obtained, map the multi-view data X and Y into the unified low-dimensional space.

Step 3) uses the Semi-IKRLSC algorithm to train the classifier. Given the sample set

S = \{x_1, \ldots, x_l, x_{l+1}, \ldots, x_n\} \subset R^m,

where

S_L = \{(x_i, y_i)\}_{i=1}^{l}

are the labeled samples from c classes, y_i is the class label, and the remaining samples x_{l+1}, \ldots, x_n are unlabeled, the training procedure is:

1) obtain the local within-class compactness T_{lw} and between-class separability T_{lb} of the labeled samples:

\begin{aligned} T_{lw} &= \sum_{k=1}^{c} \sum_{i=1}^{l_k} \| f(x_i^{(k)}) - \frac{1}{l_k} \sum_{j=1}^{l_k} f(x_j^{(k)}) \|^2 \\ &= \frac{1}{2} Q^T \sum_{i=1}^{l} \sum_{j=1}^{l} \Gamma_{i,j}^{lw} (x_i - x_j)(x_i - x_j)^T Q = Q^T M_{lw} Q \end{aligned} \quad (5)

\begin{aligned} T_{lb} &= \sum_{k=1}^{c} l_k \| \frac{1}{l_k} \sum_{j=1}^{l_k} f(x_j^{(k)}) - \frac{1}{l} \sum_{j=1}^{l} f(x_j) \|^2 \\ &= \frac{1}{2} Q^T \sum_{i=1}^{l} \sum_{j=1}^{l} \Gamma_{i,j}^{lb} (x_i - x_j)(x_i - x_j)^T Q = Q^T M_{lb} Q \end{aligned} \quad (6)

where f(x) = Q^T x is the classifier function to be found, x_i^{(k)} is the i-th labeled sample of the k-th class, l_k is the number of labeled samples in the k-th class, Q is the discriminant vector,

M_{lw} = \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \Gamma_{i,j}^{lw} (x_i - x_j)(x_i - x_j)^T,

M_{lb} = \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \Gamma_{i,j}^{lb} (x_i - x_j)(x_i - x_j)^T,

the weights \Gamma_{i,j} measure the relative position of samples of the same class, and \delta is the variance;

2) compute the global structural information T_p of the data:

\begin{aligned} T_p &= \sum_{i=1}^{n} \| f(x_i) - \frac{1}{n} \sum_{j=1}^{n} f(x_j) \|^2 \\ &= \frac{1}{2} Q^T \sum_{i=1}^{n} \sum_{j=1}^{n} \Gamma_{i,j}^{p} (x_i - x_j)(x_i - x_j)^T Q = Q^T M_p Q \end{aligned} \quad (7)

where f is the classifier function, Q is the discriminant vector, and

M_p = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \Gamma_{i,j}^{p} (x_i - x_j)(x_i - x_j)^T;

3) introduce the global structural information of the data into the local within-class compactness and between-class separability:

G^{(SSL)}(f) = (1 - \gamma) T_{lw} + \gamma \|Q\|^2 = Q^T [(1 - \gamma) M_{lw} + \gamma I] Q = Q^T T_{rlw}^{(SSL)} Q \quad (8)

J^{(SSL)}(f) = (1 - \gamma) T_{lb} + \gamma T_p = Q^T [(1 - \gamma) M_{lb} + \gamma M_p] Q = Q^T T_{rlb}^{(SSL)} Q \quad (9)

where G^{(SSL)}(f) and J^{(SSL)}(f) are the within-class compactness and between-class separability after the global structural information has been introduced, \gamma is a regularization parameter, T_{lw} is the local within-class compactness of the labeled data, Q is the discriminant vector, I is the identity matrix, T_{lb} is the local between-class separability of the labeled data, T_p is the global structural information of the data, and

T_{rlb}^{(SSL)} = (1 - \gamma) M_{lb} + \gamma M_p;

4) using the indefinite kernel classification method, train on all samples to obtain the classifier function f^*:

\begin{aligned} f^* &= \arg\min_{f \in \tilde{K}} \{ \frac{1}{l} \sum_{i=1}^{l} (y_i - f(x_i))^2 + [\eta G^{(SSL)}(f) - (1 - \eta) J^{(SSL)}(f)] \} \\ &= \arg\min_{f \in \tilde{K}} \{ \frac{1}{l} \sum_{i=1}^{l} (y_i - Q^T x_i)^2 + Q^T [\eta T_{rlw}^{(SSL)} - (1 - \eta) T_{rlb}^{(SSL)}] Q \} \end{aligned} \quad (12)

where \tilde{K} denotes the reproducing kernel Kreĭn space, l is the number of labeled samples, y_i is the class label of sample x_i, f is the classifier function, 0 ≤ \eta ≤ 1 is a regularization parameter, G^{(SSL)}(f) is the within-class compactness with the global structural information introduced, J^{(SSL)}(f) is the between-class separability with the global structural information introduced, Q is the discriminant vector,

T_{rlw}^{(SSL)} = (1 - \gamma) M_{lw} + \gamma I,

T_{rlb}^{(SSL)} = (1 - \gamma) M_{lb} + \gamma M_p.

Beneficial effects: the advantages of the present invention lie mainly in three aspects:

1. The SemiCCA algorithm is used to unify the multiple views, converting the multi-view classification problem with incomplete labels into a single-view semi-supervised classification problem in the unified low-dimensional space, so that the labels on the single view are complete.

2. The Semi-IKRLSC algorithm based on indefinite kernels is used to train a better-performing classifier; it makes full use of the discriminative information of the labeled data and the structural information of the unlabeled data to improve classifier performance.

3. New multi-view data can be tested and classified directly.

Brief description of the drawings

Fig. 1 is a flow chart of the multi-view classification method based on indefinite kernels according to the present invention;

Fig. 2 is a schematic diagram of space synchronization in the dimensionality reduction step of the training stage of the present invention.

Detailed description of the embodiments

The present invention is described in detail below with reference to the accompanying drawings and examples. It should be understood that these embodiments only illustrate the invention and do not limit its scope; after reading the present invention, modifications by those skilled in the art to its various equivalent forms all fall within the scope defined by the claims appended to this application.

The present invention mainly addresses the problem of incomplete labels in multi-view learning and combines it with an indefinite kernel method to obtain better classification performance. The problems faced mainly comprise the following three aspects:

1. In multi-view learning, obtaining a large number of labeled samples is very difficult. With only a few labeled samples, ensuring that the labels of each view are complete is harder still. Yet most existing multi-view learning methods assume complete labels on every view and have great difficulty handling data that lie in different view spaces and have incomplete labels.

2. Existing classification methods can neither train directly on multi-view data with incomplete labels nor make full use of the information in the data, so their classification performance is not high.

3. Existing multi-view methods cannot effectively test and classify new multi-view data with incomplete labels.

As shown in Fig. 1, the multi-view classification method based on indefinite kernels comprises two stages, training and application, and specifically comprises the following steps:

1) obtain a multi-view image set for training;

Many existing studies have demonstrated the strong learning performance of the CCA algorithm in multi-view learning, but CCA is built on fully paired data sets and is prone to overfitting. To avoid this, the SemiCCA algorithm improves on CCA by adding a penalty term in PCA form, so that both paired and unpaired data are used. Its basic idea is to find a pair of projection matrices Q_x and Q_y such that the correlation between similar features in the sample set is maximized and the correlation between different features is minimized. We use the SemiCCA algorithm to project the data of the different views into a unified low-dimensional space so that the class labels of the data become complete. This both reduces the dimensionality and solves the problem of incomplete labels. Fig. 2 illustrates the space synchronization idea used in the dimensionality reduction step.

Let X = (X^{(L)}, X^{(U)}) and Y = (Y^{(L)}, Y^{(U)}) denote the data in the two views View-X and View-Y respectively, where X^{(L)} and Y^{(L)} denote the paired samples in View-X and View-Y, and X^{(U)} and Y^{(U)} denote the unpaired samples in View-X and View-Y. Solving SemiCCA can then be expressed as solving a generalized eigenvalue problem:

M \begin{pmatrix} Q_x \\ Q_y \end{pmatrix} = \lambda N \begin{pmatrix} Q_x \\ Q_y \end{pmatrix} \quad (1)

where

M = \mu \begin{pmatrix} 0 & R_{xy}^{(L)} \\ R_{yx}^{(L)} & 0 \end{pmatrix} + (1 - \mu) \begin{pmatrix} R_{xx} & 0 \\ 0 & R_{yy} \end{pmatrix} \quad (2)

N = \mu \begin{pmatrix} R_{xx}^{(L)} & 0 \\ 0 & R_{yy}^{(L)} \end{pmatrix} + (1 - \mu) \begin{pmatrix} I_{D_x} & 0 \\ 0 & I_{D_y} \end{pmatrix} \quad (3)

R_{xx}^{(L)} is the covariance matrix of the paired samples in View-X, R_{yy}^{(L)} is the covariance matrix of the paired samples in View-Y, R_{xy}^{(L)} is the covariance matrix of the paired samples between View-X and View-Y, R_{yx}^{(L)} is the covariance matrix of the paired samples between View-Y and View-X, R_{xx} is the covariance matrix of all data in View-X, R_{yy} is the covariance matrix of all data in View-Y, and \mu is a regularization parameter with range [0, 1] that sets the trade-off between CCA and PCA. Solving this generalized eigenvalue problem yields the projection matrices Q_x and Q_y corresponding to View-X and View-Y respectively. The multi-view data X and Y are then mapped into a new unified low-dimensional space Z, which achieves space synchronization; the dimension of View-Z is smaller than the dimensions of View-X and View-Y. The multi-view classification problem with incomplete labels is thereby converted into a single-view semi-supervised classification problem in View-Z.
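The generalized eigenvalue problem (1) with the matrices (2) and (3) can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: samples are stored as rows, covariances are normalized by the number of samples, paired blocks are centered with the paired means, the problem is reduced to a symmetric one by a Cholesky factorization of N (which requires N to be positive definite, as it is for \mu < 1), and the function names and random data are hypothetical.

```python
import numpy as np

def cov(A, B):
    """Cross-covariance of two row-sample matrices with the same number of rows."""
    return (A - A.mean(axis=0)).T @ (B - B.mean(axis=0)) / A.shape[0]

def semicca_matrices(X_l, Y_l, X_u, Y_u, mu):
    """Build M and N of equations (2) and (3)."""
    X_all, Y_all = np.vstack([X_l, X_u]), np.vstack([Y_l, Y_u])
    Rxx_l, Ryy_l, Rxy_l = cov(X_l, X_l), cov(Y_l, Y_l), cov(X_l, Y_l)
    Rxx, Ryy = cov(X_all, X_all), cov(Y_all, Y_all)
    dx, dy = X_l.shape[1], Y_l.shape[1]
    Zxy = np.zeros((dx, dy))
    M = (mu * np.block([[np.zeros((dx, dx)), Rxy_l], [Rxy_l.T, np.zeros((dy, dy))]])
         + (1 - mu) * np.block([[Rxx, Zxy], [Zxy.T, Ryy]]))
    N = mu * np.block([[Rxx_l, Zxy], [Zxy.T, Ryy_l]]) + (1 - mu) * np.eye(dx + dy)
    return M, N

def solve_generalized(M, N, dim):
    """Solve M q = lambda N q for the `dim` largest eigenvalues (N positive definite)."""
    L = np.linalg.cholesky(N)                  # N = L L^T
    B = np.linalg.solve(L, M)                  # L^{-1} M
    C = np.linalg.solve(L, B.T).T              # L^{-1} M L^{-T}
    C = (C + C.T) / 2.0                        # remove rounding asymmetry
    lam, V = np.linalg.eigh(C)
    order = np.argsort(lam)[::-1][:dim]
    return lam[order], np.linalg.solve(L.T, V[:, order])   # q = L^{-T} v

# Hypothetical data: 30 paired samples, plus unpaired samples in each view.
rng = np.random.default_rng(0)
X_l = rng.normal(size=(30, 5))
Y_l = X_l[:, :4] + 0.1 * rng.normal(size=(30, 4))
X_u, Y_u = rng.normal(size=(20, 5)), rng.normal(size=(10, 4))

M, N = semicca_matrices(X_l, Y_l, X_u, Y_u, mu=0.5)
lam, Q_full = solve_generalized(M, N, dim=2)
Q_x, Q_y = Q_full[:5], Q_full[5:]              # projection matrices for View-X and View-Y
Z_x = (X_l - X_l.mean(axis=0)) @ Q_x           # paired View-X samples in the unified space
```

The rows of Q_full are split by view dimension, so Q_x maps View-X data and Q_y maps View-Y data into the same 2-dimensional space.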

In semi-supervised learning, traditional classifiers based on Tikhonov regularization theory consider only the smoothness of the classification function and ignore the discriminative and structural information of the data, which limits classification performance. The Semi-IKRLSC algorithm introduces indefinite kernels into semi-supervised classification and defines a new regularization term that combines the global structural information of the data with local discriminative information. It makes full use of the prior information in the samples, so that samples of the same class are as close as possible in the output space and samples of different classes are as far apart as possible, which effectively improves classifier performance.

Given the sample set

S = \{x_1, \ldots, x_l, x_{l+1}, \ldots, x_n\} \subset R^m,

where

S_L = \{(x_i, y_i)\}_{i=1}^{l}

are the labeled samples from c classes, y_i is the class label, and the remaining samples x_{l+1}, \ldots, x_n are unlabeled. The objective function of the Semi-IKRLSC algorithm can be written in the following form:

\min_{f \in \tilde{K}} \{ \frac{1}{l} \sum_{i=1}^{l} V(y_i, f(x_i)) + \lambda \langle f, f \rangle_{\tilde{K}} \} \quad (4)

where \tilde{K} denotes the reproducing kernel Kreĭn space and V(y_i, f(x_i)) is the loss function, which measures the error between the predicted and actual class labels. \langle f, f \rangle_{\tilde{K}} is the indefinite regularization term, which fuses the global structural information and the local discriminative information of the data.

To construct the indefinite regularization term \langle f, f \rangle_{\tilde{K}}, T_{lw} and T_{lb} are first used to denote the local within-class compactness and between-class separability of the labeled samples. They are defined as follows:

\begin{aligned} T_{lw} &= \sum_{k=1}^{c} \sum_{i=1}^{l_k} \| f(x_i^{(k)}) - \frac{1}{l_k} \sum_{j=1}^{l_k} f(x_j^{(k)}) \|^2 \\ &= \frac{1}{2} Q^T \sum_{i=1}^{l} \sum_{j=1}^{l} \Gamma_{i,j}^{lw} (x_i - x_j)(x_i - x_j)^T Q = Q^T M_{lw} Q \end{aligned} \quad (5)

\begin{aligned} T_{lb} &= \sum_{k=1}^{c} l_k \| \frac{1}{l_k} \sum_{j=1}^{l_k} f(x_j^{(k)}) - \frac{1}{l} \sum_{j=1}^{l} f(x_j) \|^2 \\ &= \frac{1}{2} Q^T \sum_{i=1}^{l} \sum_{j=1}^{l} \Gamma_{i,j}^{lb} (x_i - x_j)(x_i - x_j)^T Q = Q^T M_{lb} Q \end{aligned} \quad (6)

where f(x) = Q^T x is the classifier function to be found, Q is the discriminant vector,

M_{lw} = \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \Gamma_{i,j}^{lw} (x_i - x_j)(x_i - x_j)^T,

M_{lb} = \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \Gamma_{i,j}^{lb} (x_i - x_j)(x_i - x_j)^T,

x_i^{(k)} is the i-th labeled sample of the k-th class, l_k is the total number of labeled samples in the k-th class, and \delta is the variance. \Gamma_{i,j} measures the relative position of samples of the same class: if two samples of the same class are relatively close, \Gamma_{i,j} is large, indicating a strong correlation between the samples in the output space; conversely, if two samples of the same class are relatively far apart, \Gamma_{i,j} is small, indicating a weaker correlation in the output space. Samples of different classes should be as far apart as possible in the output space.

However, when there are few labeled samples, the measures T_{lw} and T_{lb} of local within-class compactness and between-class separability defined above are biased. To avoid this, we introduce the global structural information of the data:

\begin{aligned} T_p &= \sum_{i=1}^{n} \| f(x_i) - \frac{1}{n} \sum_{j=1}^{n} f(x_j) \|^2 \\ &= \frac{1}{2} Q^T \sum_{i=1}^{n} \sum_{j=1}^{n} \Gamma_{i,j}^{p} (x_i - x_j)(x_i - x_j)^T Q = Q^T M_p Q \end{aligned} \quad (7)

where

M_p = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \Gamma_{i,j}^{p} (x_i - x_j)(x_i - x_j)^T, \quad \Gamma_{i,j}^{p} = 1 / n.
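The equality of the two forms of T_p in equation (7), with \Gamma_{i,j}^{p} = 1/n, can be checked numerically. The sketch below uses random data and an arbitrary vector Q, both hypothetical, and evaluates the double sum literally rather than efficiently.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 12, 3
X = rng.normal(size=(n, m))    # rows are the samples x_i (hypothetical data)
Q = rng.normal(size=m)         # an arbitrary discriminant vector

# First form of (7): squared deviation of f(x_i) = Q^T x_i from its mean.
f = X @ Q
T_p_direct = np.sum((f - f.mean()) ** 2)

# Second form of (7): pairwise sum with weights Gamma^p_{i,j} = 1/n.
M_p = np.zeros((m, m))
for i in range(n):
    for j in range(n):
        d = X[i] - X[j]
        M_p += 0.5 * (1.0 / n) * np.outer(d, d)
T_p_pairwise = Q @ M_p @ Q

print(T_p_direct, T_p_pairwise)
```

With these weights M_p is simply the scatter matrix of all n samples about their mean, which is why T_p captures the global structure of labeled and unlabeled data alike.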

The within-class compactness G^{(SSL)}(f) and between-class separability J^{(SSL)}(f) in the semi-supervised problem are then redefined as:

G^{(SSL)}(f) = (1 - \gamma) T_{lw} + \gamma \|Q\|^2 = Q^T [(1 - \gamma) M_{lw} + \gamma I] Q = Q^T T_{rlw}^{(SSL)} Q \quad (8)

J^{(SSL)}(f) = (1 - \gamma) T_{lb} + \gamma T_p = Q^T [(1 - \gamma) M_{lb} + \gamma M_p] Q = Q^T T_{rlb}^{(SSL)} Q \quad (9)

where \gamma is a regularization parameter, 0 ≤ \gamma ≤ 1, that sets the trade-off between global structural information and local discriminative information. T_{lw} is the local within-class compactness of the labeled data and T_{lb} is the local between-class separability of the labeled data. The identity matrix I acts as a regularization term constraining T_{lw}, which improves the stability of the within-class compactness G^{(SSL)}(f). T_p characterizes the global information of the labeled and unlabeled samples and ensures the reliability of J^{(SSL)}(f).

The final objective function can thus be expressed as:

\min_{f \in \tilde{K}} \{ \frac{1}{l} \sum_{i=1}^{l} (y_i - f(x_i))^2 + [\eta G^{(SSL)}(f) - (1 - \eta) J^{(SSL)}(f)] \} \quad (10)

that is,

\min_{f \in \tilde{K}} \{ \frac{1}{l} \sum_{i=1}^{l} (y_i - Q^T x_i)^2 + Q^T [\eta T_{rlw}^{(SSL)} - (1 - \eta) T_{rlb}^{(SSL)}] Q \} \quad (11)

where l is the number of labeled samples and y_i is the class label of sample x_i. \eta is a regularization parameter, 0 ≤ \eta ≤ 1.
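For the linear form f(x) = Q^T x of objective (11), setting the gradient with respect to Q to zero gives a linear system, which the sketch below solves. Several things here are assumptions and not statements of this document: the weights \Gamma^{lw} and \Gamma^{lb} are taken in their unweighted form (1/l_k within a class for \Gamma^{lw}; 1/l - 1/l_k within a class and 1/l across classes for \Gamma^{lb}), which reproduces the first lines of equations (5) and (6) but omits the distance-dependent weighting involving \delta; labels are coded as ±1; and the solver is only a stationary-point computation. Because the regularization matrix can be indefinite, the stationary point is not guaranteed to be a minimum. All function names are hypothetical.

```python
import numpy as np

def pairwise_scatter(X, W):
    """(1/2) * sum_ij W_ij (x_i - x_j)(x_i - x_j)^T for symmetric W; rows of X are samples."""
    return X.T @ (np.diag(W.sum(axis=1)) - W) @ X

def fit_stationary_point(X_lab, y, X_unlab, gamma=0.5, eta=0.5):
    """Stationary point Q of objective (11) under the assumptions in the lead-in."""
    l, m = X_lab.shape
    X_all = np.vstack([X_lab, X_unlab])
    n = X_all.shape[0]
    same = y[:, None] == y[None, :]
    l_k = np.array([np.sum(y == yi) for yi in y])[:, None]   # class size of each sample
    W_lw = np.where(same, 1.0 / l_k, 0.0)                    # assumed unweighted Gamma^lw
    W_lb = np.where(same, 1.0 / l - 1.0 / l_k, 1.0 / l)      # assumed unweighted Gamma^lb
    M_lw = pairwise_scatter(X_lab, W_lw)
    M_lb = pairwise_scatter(X_lab, W_lb)
    M_p = pairwise_scatter(X_all, np.full((n, n), 1.0 / n))  # Gamma^p = 1/n
    T_rlw = (1 - gamma) * M_lw + gamma * np.eye(m)           # equation (8)
    T_rlb = (1 - gamma) * M_lb + gamma * M_p                 # equation (9)
    A = eta * T_rlw - (1 - eta) * T_rlb                      # possibly indefinite
    H = X_lab.T @ X_lab / l + A                              # half the Hessian of (11)
    Q = np.linalg.solve(H, X_lab.T @ y / l)                  # zero-gradient condition
    return Q, H

# Hypothetical two-class data in the unified low-dimensional space.
rng = np.random.default_rng(0)
X_lab = np.vstack([rng.normal(-1.0, 1.0, (10, 3)), rng.normal(1.0, 1.0, (10, 3))])
y = np.array([-1.0] * 10 + [1.0] * 10)
X_unlab = rng.normal(0.0, 1.5, (30, 3))

Q, H = fit_stationary_point(X_lab, y, X_unlab)
hessian_eigs = np.linalg.eigvalsh(H)   # mixed or negative signs indicate a saddle or maximum
```

Inspecting hessian_eigs shows whether the computed Q is a minimizer for the chosen \gamma and \eta; no claim is made here about the classification accuracy of this simplified sketch.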

For new multi-view data with incomplete labels, the data are first projected so that the data of the different views are unified into the new low-dimensional space, which solves the problem of incomplete class labels. The trained semi-supervised classifier is then used to classify the data in the low-dimensional space.
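The application stage can be sketched as standardize, project, classify. The example below assumes per-feature z-score standardization with statistics kept from training, a projection matrix Q_x for the view the new samples come from, a discriminant vector q in the low-dimensional space, and a two-class decision by the sign of f; the document does not specify these details, and all names and numbers are hypothetical.

```python
import numpy as np

def project_and_classify(X_new, mean, std, Q_view, q):
    """Standardize new samples, project them into the unified space, and classify."""
    Z = ((X_new - mean) / std) @ Q_view      # low-dimensional representation
    scores = Z @ q                           # f(z) = q^T z
    return np.where(scores >= 0, 1, -1)

# Hypothetical training statistics, projection matrix and classifier.
mean, std = np.array([1.0, 1.0]), np.array([2.0, 2.0])
Q_x = np.array([[1.0], [0.0]])               # maps 2-D View-X data to a 1-D unified space
q = np.array([1.0])

X_new = np.array([[3.0, 1.0], [-1.0, 5.0]])
labels = project_and_classify(X_new, mean, std, Q_x, q)
print(labels)
```

New samples from the other view would be handled the same way with that view's own statistics and projection matrix.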

When more than two views are given, a one-versus-one strategy can be adopted: the views are combined in pairs, the method of the present invention is applied to each pair for dimensionality reduction and classification, and the final classification result is chosen by voting.
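The pairwise combination and voting can be sketched as follows. The function classify_pair stands in for the two-view method described above and is purely hypothetical, as are the view names, the toy decision rule, and the tie-breaking behaviour (the label that received its first vote earliest wins a tie).

```python
from collections import Counter
from itertools import combinations

def vote_over_view_pairs(views, classify_pair):
    """Classify one sample with every pair of views and return the majority label."""
    votes = Counter()
    for name_a, name_b in combinations(sorted(views), 2):
        votes[classify_pair(views[name_a], views[name_b])] += 1
    label, _ = votes.most_common(1)[0]
    return label, dict(votes)

# Toy stand-in for the two-view classifier: each "view" is a single number.
def classify_pair(view_a, view_b):
    return "cat" if view_a + view_b >= 2 else "dog"

views = {"image": 0, "audio": 1, "text": 1}
label, votes = vote_over_view_pairs(views, classify_pair)
print(label, votes)
```

With three views there are three pairs; here two of them vote for "dog" and one for "cat".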

We carry out experiments on the Amazon database. All images are first resized to the same size and converted to grayscale. Based on scale invariance, the Amazon database data have two kinds of features: SURF and SIFT. For the SURF features there are two encodings: (1) k-means clustering on part of the Amazon database data, with an encoding dimension of 800; (2) an encoding of dimension 600 constructed on the Dslr database. The encoding dimension of the SIFT features is 900. These two kinds of features serve as View-X and View-Y respectively. Dslr contains images of 31 classes in total, of which we choose 20 classes for training. S²GCA and ARC-t are commonly used algorithms in multi-view learning, the k-nearest-neighbor algorithm (KNN) is a basic classification method, and the Laplacian regularized least-squares algorithm (MR) is a classical positive definite kernel method for semi-supervised classification. The present invention is compared with S²GCA, ARC-t, SemiCCA+MR and SemiCCA+KNN; the experimental results are shown in Table 1. The multi-view classification method based on indefinite kernels of the present invention performs clearly better than the other algorithms and is more stable. On the Webcam and Dslr data sets, even though the data processing techniques differ, the classification accuracy of the proposed method stays between 50% and 60% on all three data sets. On the Amazon and Dslr data sets, it stays at about 40%.

Table 1: Comparative experimental results on the Amazon data set.

Claims

1. A multi-view classification method based on indefinite kernels, characterized by comprising the following steps:

1) obtain a multi-view image set for training;

2. The multi-view classification method based on indefinite kernels according to claim 1, characterized in that step 2) uses the SemiCCA algorithm for dimensionality reduction and specifically comprises the following steps:

1) solve for the projection matrices from the data of the two views;

Let X = (X^{(L)}, X^{(U)}) and Y = (Y^{(L)}, Y^{(U)}) denote the data in the two views View-X and View-Y respectively, where X^{(L)} and Y^{(L)} denote the paired samples in View-X and View-Y, and X^{(U)} and Y^{(U)} denote the unpaired samples in View-X and View-Y; the projection matrices are then obtained from:

M \begin{pmatrix} Q_x \\ Q_y \end{pmatrix} = \lambda N \begin{pmatrix} Q_x \\ Q_y \end{pmatrix} \quad (1)

M = \mu \begin{pmatrix} 0 & R_{xy}^{(L)} \\ R_{yx}^{(L)} & 0 \end{pmatrix} + (1 - \mu) \begin{pmatrix} R_{xx} & 0 \\ 0 & R_{yy} \end{pmatrix},

N = \mu \begin{pmatrix} R_{xx}^{(L)} & 0 \\ 0 & R_{yy}^{(L)} \end{pmatrix} + (1 - \mu) \begin{pmatrix} I_{D_x} & 0 \\ 0 & I_{D_y} \end{pmatrix},

3. The multi-view classification method based on indefinite kernels according to claim 1, characterized in that step 3) uses the Semi-IKRLSC algorithm to train the classifier, given the sample set

S = \{x_1, \ldots, x_l, x_{l+1}, \ldots, x_n\} \subset R^m,

where

S_L = \{(x_i, y_i)\}_{i=1}^{l}

\begin{aligned} T_{lw} &= \sum_{k=1}^{c} \sum_{i=1}^{l_k} \| f(x_i^{(k)}) - \frac{1}{l_k} \sum_{j=1}^{l_k} f(x_j^{(k)}) \|^2 \\ &= \frac{1}{2} Q^T \sum_{i=1}^{l} \sum_{j=1}^{l} \Gamma_{i,j}^{lw} (x_i - x_j)(x_i - x_j)^T Q = Q^T M_{lw} Q \end{aligned} \quad (5)

\begin{aligned} T_{lb} &= \sum_{k=1}^{c} l_k \| \frac{1}{l_k} \sum_{i=1}^{l_k} f(x_i^{(k)}) - \frac{1}{l} \sum_{j=1}^{l} f(x_j) \|^2 \\ &= \frac{1}{2} Q^T \sum_{i=1}^{l} \sum_{j=1}^{l} \Gamma_{i,j}^{lb} (x_i - x_j)(x_i - x_j)^T Q = Q^T M_{lb} Q \end{aligned} \quad (6)

M_{lw} = \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \Gamma_{i,j}^{lw} (x_i - x_j)(x_i - x_j)^T,

M_{lb} = \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \Gamma_{i,j}^{lb} (x_i - x_j)(x_i - x_j)^T,

2) compute the global structural information T_p of the data:

\begin{aligned} T_p &= \sum_{i=1}^{n} \| f(x_i) - \frac{1}{n} \sum_{j=1}^{n} f(x_j) \|^2 \\ &= \frac{1}{2} Q^T \sum_{i=1}^{n} \sum_{j=1}^{n} \Gamma_{i,j}^{p} (x_i - x_j)(x_i - x_j)^T Q = Q^T M_p Q \end{aligned} \quad (7)

M_p = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \Gamma_{i,j}^{p} (x_i - x_j)(x_i - x_j)^T;

G^{(SSL)}(f) = (1 - \gamma) T_{lw} + \gamma \|Q\|^2 = Q^T [(1 - \gamma) M_{lw} + \gamma I] Q = Q^T T_{rlw}^{(SSL)} Q \quad (8)

J^{(SSL)}(f) = (1 - \gamma) T_{lb} + \gamma T_p = Q^T [(1 - \gamma) M_{lb} + \gamma M_p] Q = Q^T T_{rlb}^{(SSL)} Q \quad (9)

where G^{(SSL)}(f) and J^{(SSL)}(f) are the within-class compactness and between-class separability after the global structural information has been introduced, \gamma is a regularization parameter, T_{lw} is the local within-class compactness of the labeled data, Q is the discriminant vector, I is the identity matrix, T_{lb} is the local between-class separability of the labeled data, T_p is the global structural information of the data,

T_{rlb}^{(SSL)} = (1 - \gamma) M_{lb} + \gamma M_p;

\begin{aligned} f^* &= \arg\min_{f \in \tilde{K}} \{ \frac{1}{l} \sum_{i=1}^{l} (y_i - f(x_i))^2 + [\eta G^{(SSL)}(f) - (1 - \eta) J^{(SSL)}(f)] \} \\ &= \arg\min_{f \in \tilde{K}} \{ \frac{1}{l} \sum_{i=1}^{l} (y_i - Q^T x_i)^2 + Q^T [\eta T_{rlw}^{(SSL)} - (1 - \eta) T_{rlb}^{(SSL)}] Q \} \end{aligned} \quad (12)

where \tilde{K} denotes the reproducing kernel Kreĭn space, l is the number of labeled samples, y_i is the class label of sample x_i, f is the classifier function, 0 ≤ \eta ≤ 1 is a regularization parameter, G^{(SSL)}(f) is the within-class compactness with the global structural information introduced, J^{(SSL)}(f) is the between-class separability with the global structural information introduced, Q is the discriminant vector,

T_{rlw}^{(SSL)} = (1 - \gamma) M_{lw} + \gamma I,

T_{rlb}^{(SSL)} = (1 - \gamma) M_{lb} + \gamma M_p.