CN106203517A

CN106203517A - Nuclear-norm-driven data classification method and system

Info

Publication number: CN106203517A
Application number: CN201610554118.8A
Authority: CN
Inventors: 张召; 贾磊; 李凡长; 张莉; 王邦军
Original assignee: Suzhou University
Current assignee: Suzhou University
Priority date: 2016-07-14
Filing date: 2016-07-14
Publication date: 2016-12-07

Abstract

The invention discloses a nuclear-norm-driven data classification method and system. The method first constructs a weight coefficient matrix from the samples in the training set to characterize the similarity between samples, and initializes an initial class matrix. Second, to measure the neighborhood reconstruction error accurately and reliably, the nuclear norm is used to measure the manifold smoothness term; during optimization, the nuclear-norm-based neighborhood reconstruction error minimization problem can be converted into a series of Frobenius-norm optimization problems. At the same time, to improve the model's robustness to noise and the accuracy of the measurement when measuring the difference between the predicted labels and the manually assigned initial labels, a label fitting term based on the weighted L2,1 norm is proposed. Finally, the position of the largest probability in the soft class label vector is used for class assignment, yielding the most accurate classification result. In addition, using the nuclear norm as the distance metric is more reliable than using the L1 or L2 norm, and effectively improves the prediction accuracy of the model.

Description

Nuclear-norm-driven data classification method and system

Technical field

The present invention relates to the technical fields of machine learning and pattern recognition, and in particular to a nuclear-norm-driven data classification method and system.

Background

At present, the spread of the Internet and the rapid development of computer technology have ushered in the era of big data. Every industry produces enormous amounts of fragmented data every day, so that data of all kinds, such as text, multimedia information, and images, are growing explosively. At the same time, people have fallen into a vicious circle of "abundant information, scarce knowledge": because the volume of data is so large, they cannot find the data they need. This predicament has driven researchers to seek new techniques for fast data retrieval that meet this demand. Data classification is an effective way to solve the problem, since it can automatically build data semantic information consistent with human cognition.

Label propagation was proposed by Zhu et al. in 2002. It is a graph-based semi-supervised learning method whose basic idea is to use the label information of labeled nodes to predict the label information of unlabeled nodes. In recent years it has also performed well in image data classification. The relations between samples are used to build a complete graph model in which the nodes comprise labeled and unlabeled data, each edge represents the similarity of two nodes, and the labels of nodes are passed to other nodes according to similarity. However, most current transductive label propagation models focus on the sparse structure of the weights and share several shortcomings: the neighborhood reconstruction error and the label fitting error, both based on the Frobenius norm, are not measured accurately or reliably, and the label estimation process is easily affected by noise, which reduces the accuracy of the label estimation results.

Therefore, providing a new data classification method that improves the reliability and accuracy of classification results is a problem that those skilled in the art urgently need to solve.

Summary of the invention

In view of this, the invention provides a nuclear-norm-driven data classification method and system in order to improve the reliability and accuracy of classification results.

To solve the above technical problem, the present invention provides a nuclear-norm-driven data classification method based on the idea of using the nuclear norm as the distance metric. The method includes:

Determining an initial class matrix Y according to the original label information of the training samples in a training set, constructing a similarity measure matrix by performing a neighbor search on all of the training samples, and symmetrizing and normalizing the similarity measure matrix to obtain a weight coefficient matrix W; wherein the training set includes labeled training samples and unlabeled training samples, and the weight coefficient matrix W characterizes the neighborhood properties between samples;

Based on the initial class matrix Y and the weight coefficient matrix W, establishing a transductive label propagation model by balancing a neighborhood reconstruction term and a label fitting term, and iteratively optimizing the transductive label propagation model to obtain a soft class label prediction matrix F of the training set; wherein the neighborhood reconstruction term is a reconstruction error term measured by the reliable nuclear norm and is used to define the manifold smoothness term, and the label fitting term is based on weighted L2,1-norm regularization;

Using the soft class label prediction matrix F to compute the soft class label vector of a test sample, and determining the class of the test sample as the class corresponding to the largest probability in the soft class label vector, thereby obtaining the most accurate classification result; wherein the elements of the soft class label vector are the probabilities that the test sample belongs to each class.

Preferably, constructing the similarity measure matrix by performing a neighbor search on all of the training samples, and symmetrizing and normalizing the similarity measure matrix to obtain the weight coefficient matrix W, includes:

Performing a K-nearest-neighbor search for each sample in the training set to find the K samples in the training set closest to that sample;

Constructing the similarity measure matrix of the neighborhood graph using the LLE reconstruction-weight construction algorithm;

Symmetrizing and normalizing the similarity measure matrix to obtain the weight coefficient matrix W.

Preferably, the neighborhood reconstruction term and the label fitting term are balanced by solving the following objective function minimization problem:

\min_{F} \hat{J} = \| F^{T} - W F^{T} \|_{*} + \mathrm{tr}\left( (F^{T} - Y^{T})^{T} U V (F^{T} - Y^{T}) \right)

Wherein, ||F^T - WF^T||_* is the neighborhood reconstruction term and tr((F^T - Y^T)^T UV (F^T - Y^T)) is the label fitting term; U is the balance parameter matrix, with elements μ_i, that balances the neighborhood reconstruction term against the label fitting term; V is the diagonal matrix with elements V_{i,i} = 1/(2||h^i||_2) (i = 1, 2, ..., l+u), where h^i is the i-th row vector of the matrix F^T - Y^T.

The present invention also provides a nuclear-norm-driven data classification system based on the idea of using the nuclear norm as the distance metric. The system includes:

A training preprocessing module, configured to determine an initial class matrix Y according to the original label information of the training samples in a training set, to construct a similarity measure matrix by performing a neighbor search on all of the training samples, and to symmetrize and normalize the similarity measure matrix to obtain a weight coefficient matrix W; wherein the training set includes labeled training samples and unlabeled training samples, and the weight coefficient matrix W characterizes the neighborhood properties between samples;

A training module, configured to establish, based on the initial class matrix Y and the weight coefficient matrix W, a transductive label propagation model by balancing a neighborhood reconstruction term and a label fitting term, and to iteratively optimize the transductive label propagation model to obtain a soft class label prediction matrix F of the training set; wherein the neighborhood reconstruction term is a reconstruction error term measured by the reliable nuclear norm and is used to define the manifold smoothness term, and the label fitting term is based on weighted L2,1-norm regularization;

A class determination module, configured to use the soft class label prediction matrix F to compute the soft class label vector of a test sample, and to determine the class of the test sample as the class corresponding to the largest probability in the soft class label vector, thereby obtaining the most accurate classification result; wherein the elements of the soft class label vector are the probabilities that the test sample belongs to each class.

The nuclear-norm-driven data classification method and system provided by the present invention introduce the currently popular idea of nuclear-norm-regularized measurement and use label propagation to classify data transductively, estimating the class labels of test samples quickly and accurately. Specifically, when the transductive label propagation model is built, a neighborhood reconstruction error measured by the reliable nuclear norm is used, which improves the reliability and accuracy of the model and, in turn, the reliability and accuracy of the classification results.

In addition, a label fitting term based on the weighted L2,1 norm is used, which ensures robustness to noise and accuracy of measurement when measuring the difference between the predicted labels and the initial labels. This improves the robustness of the model to noise and heterogeneous data and further improves the reliability and accuracy of the classification results. Furthermore, using the nuclear norm as the distance metric is more reliable than using the L1 or L2 norm and effectively improves the prediction accuracy of the model.

Brief description of the drawings

To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below are merely embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from the drawings provided without creative effort.

Fig. 1 is a flowchart of a nuclear-norm-driven data classification method provided by an embodiment of the present invention;

Fig. 2 is a structural block diagram of a nuclear-norm-driven data classification system provided by an embodiment of the present invention;

Fig. 3 is a schematic diagram of an actual application scenario of a nuclear-norm-driven data classification method provided by an embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

The core of the present invention is to provide a nuclear-norm-driven data classification method and system in order to improve the reliability and accuracy of classification results.

To help those skilled in the art better understand the solution of the present invention, the invention is described in further detail below with reference to the drawings and specific embodiments.

The embodiment of the present invention was tested on three different databases: the SCCTS machine learning dataset, the COIL20 object image dataset, and the GTF face image dataset. The SCCTS machine learning dataset contains 1000 samples divided into 10 classes, each class containing 100 samples. The COIL20 object image dataset contains 1440 object image recognition samples. The GTF face database contains 750 pictures of the test subjects, each picture with a different pose, illumination intensity, and expression. Because these databases were collected from several different sources, the test results are broadly representative.

Referring to Fig. 1, which shows the flowchart of a nuclear-norm-driven data classification method provided by an embodiment of the present invention, the method is based on the idea of using the nuclear norm as the distance metric and may specifically include the following steps:

Step S100: determine the initial class matrix Y according to the original label information of the training samples in the training set, construct a similarity measure matrix by performing a neighbor search on all training samples, and symmetrize and normalize the similarity measure matrix to obtain the weight coefficient matrix W.

Here, the training set includes labeled training samples and unlabeled training samples; a labeled training sample is a sample of known class, and an unlabeled training sample is a sample of unknown class. For a given training set, n is the dimension of a sample, N = l + u is the total number of samples, l is the number of training-set samples, and u is the number of test-set samples; that is, the set comprises l training-set samples with class labels (C classes in total, C > 2) and u test-set samples without any label.

The initial class matrix Y records the initially known supervision information and, in practical applications, is determined by manual annotation. First, a matrix Y is defined whose numbers of rows and columns are the total number of classes C and the total number of training samples N, respectively, with all elements initialized to 0; it records the initial label information of all training samples. Second, for each labeled training sample x_j that belongs to the i-th class, set Y_{i,j} = 1, where the class label i belongs to the set {1, 2, ..., C}; for every unlabeled sample x_j, set Y_{i,j} = 0 for all i. This ensures that every column of Y corresponding to a labeled sample sums to 1, meaning that each labeled training sample has one and only one known label.
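As a minimal sketch of how the initial class matrix Y described above could be assembled, the following NumPy snippet uses 0-based class indices and marks unlabeled samples with -1. The function name `build_initial_label_matrix` and the -1 encoding are assumptions made for illustration; they are not part of the original disclosure.

```python
import numpy as np

def build_initial_label_matrix(labels, num_classes):
    """Build the C x N initial class matrix Y.

    labels: length-N sequence; a class index in 0..C-1 for a labeled
            sample, or -1 for an unlabeled sample (assumed encoding).
    """
    labels = np.asarray(labels)
    Y = np.zeros((num_classes, labels.shape[0]))
    labeled = np.flatnonzero(labels >= 0)   # column indices of labeled samples
    Y[labels[labeled], labeled] = 1.0       # one entry of 1 per labeled column
    return Y

# Five samples, three classes; samples 2 and 4 are unlabeled.
Y = build_initial_label_matrix([0, 2, -1, 1, -1], num_classes=3)
column_sums = Y.sum(axis=0)
```

In this sketch the columns of labeled samples sum to 1 and the columns of unlabeled samples stay all zero, matching the description above.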

The weight coefficient matrix W characterizes the neighborhood properties between samples. In a specific implementation, it is determined by the following steps:

Perform a K-nearest-neighbor search for each sample in the training set to find the K samples in the training set closest to that sample; construct the similarity measure matrix of the neighborhood graph using the LLE reconstruction-weight construction algorithm; symmetrize and normalize the similarity measure matrix to obtain the weight coefficient matrix W.

A weight matrix is computed from the LLE reconstruction weights, that is, by solving the minimization problem in which each sample x_i is reconstructed from the neighbor points in its neighborhood.

The reconstruction is subject to the constraint that each row of the weight matrix sums to 1 and to the non-negativity constraint W_{i,j} ≥ 0; the weights therefore satisfy the definition of a probability and are sparse.
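The weight construction described above can be sketched roughly as follows. This is an illustrative approximation, not the patented procedure: it assumes the standard LLE local least-squares solve with a small ridge term `reg`, enforces non-negativity by simply clipping negative weights before renormalizing, and symmetrizes by averaging W with its transpose. The function name and all parameter defaults other than K = 7 are hypothetical.

```python
import numpy as np

def lle_weight_matrix(X, k=7, reg=1e-3):
    """X: n x N data matrix (one sample per column). Returns an N x N matrix W."""
    X = np.asarray(X, dtype=float)
    N = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    D = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)   # pairwise squared distances
    np.fill_diagonal(D, np.inf)                       # a sample is not its own neighbor
    W = np.zeros((N, N))
    for i in range(N):
        nbrs = np.argsort(D[i])[:k]                   # K nearest neighbors of x_i
        Z = X[:, nbrs] - X[:, [i]]                    # neighbors centered on x_i
        gram = Z.T @ Z
        gram += reg * (np.trace(gram) + 1e-12) * np.eye(k)   # ridge for stability
        w = np.linalg.solve(gram, np.ones(k))         # LLE reconstruction weights
        w = np.clip(w, 0.0, None)                     # crude non-negativity (assumption)
        W[i, nbrs] = w / w.sum()                      # each row sums to 1
    W = (W + W.T) / 2.0                               # symmetrization
    W /= W.sum(axis=1, keepdims=True)                 # row normalization
    return W

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(3, 20))
W_demo = lle_weight_matrix(X_demo, k=5)
```

Note that the final row normalization generally breaks exact symmetry; the source does not spell out how the two operations are combined, so the order used here is one possible reading.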

Optionally, this part also includes necessary operations such as data preprocessing and parameter setting; for details, reference may be made to the prior art, and they are not described here.

At this point, the weight coefficient matrix W and the initial class label matrix Y have been obtained.

Step S101: based on the initial class matrix Y and the weight coefficient matrix W, establish a transductive label propagation model by balancing the neighborhood reconstruction term and the label fitting term, and iteratively optimize the model to obtain the soft class label prediction matrix F of the training set.

Here, the neighborhood reconstruction term is a reconstruction error term measured by the reliable nuclear norm and is used to define the manifold smoothness term, and the label fitting term is based on weighted L2,1-norm regularization. Specifically, in establishing the transductive label propagation model, the neighborhood reconstruction term and the label fitting term are balanced by solving the following objective function minimization problem:

\min_{F} \hat{J} = \| F^{T} - W F^{T} \|_{*} + \mathrm{tr}\left( (F^{T} - Y^{T})^{T} U V (F^{T} - Y^{T}) \right)

Wherein, ||F^T - WF^T||_* is the neighborhood reconstruction term and tr((F^T - Y^T)^T UV (F^T - Y^T)) is the label fitting term. U is the balance parameter matrix that balances the neighborhood reconstruction term against the label fitting term; it is a diagonal matrix with elements μ_i, where μ_i is a tuning parameter: when the label of sample x_i in the training set is known, μ_i = 10^10; otherwise, μ_i = 0. V is the diagonal matrix with elements V_{i,i} = 1/(2||h^i||_2) (i = 1, 2, ..., l+u), where h^i is the i-th row vector of the matrix F^T - Y^T.

\mathrm{tr}(X) = \sum_{i=1}^{N} X_{i,i}, \quad \| E \|_{2,1} = \sum_{i=1}^{N} \sqrt{\sum_{j=1}^{N} ([E]_{i,j})^{2}}, \quad \| L \|_{*} = \sum_{i} \sigma_{i}(L),

where ∑_i σ_i(L) denotes the sum of the singular values of the matrix L.
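To make the notation concrete, the building blocks of the objective can be sketched in NumPy as below. The helper names are hypothetical, F and Y are taken to be C x N and W, U, V to be N x N, and the small floor `eps` on the row norms is an added safeguard against division by zero that the source does not mention.

```python
import numpy as np

def l21_norm(E):
    """Sum of the Euclidean norms of the rows of E."""
    return float(np.sum(np.sqrt(np.sum(E ** 2, axis=1))))

def nuclear_norm(L):
    """Sum of the singular values of L."""
    return float(np.sum(np.linalg.svd(L, compute_uv=False)))

def balance_matrix(is_labeled, mu_labeled=1e10):
    """Diagonal U: mu_i = 1e10 for labeled samples, 0 otherwise."""
    return np.diag(np.where(is_labeled, mu_labeled, 0.0))

def l21_weight_matrix(F, Y, eps=1e-8):
    """Diagonal V with V_ii = 1 / (2 * ||h^i||_2), h^i the i-th row of F^T - Y^T."""
    row_norms = np.linalg.norm(F.T - Y.T, axis=1)
    return np.diag(1.0 / (2.0 * np.maximum(row_norms, eps)))

def objective(F, Y, W, U, V):
    """Nuclear-norm reconstruction term plus the weighted label fitting term."""
    R = F.T - Y.T
    return nuclear_norm(F.T - W @ F.T) + float(np.trace(R.T @ U @ V @ R))

E_demo = np.array([[3.0, 4.0], [0.0, 0.0], [5.0, 12.0]])
l21_demo = l21_norm(E_demo)                       # row norms are 5, 0 and 13
nuc_demo = nuclear_norm(np.diag([2.0, 3.0]))      # singular values are 2 and 3
V_demo = l21_weight_matrix(np.array([[3.0, 0.0], [4.0, 0.0]]), np.zeros((2, 2)))
J_demo = objective(np.eye(2), np.eye(2), np.zeros((2, 2)),
                   balance_matrix([True, False]), np.eye(2))
```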

In the computation, note that the objective function is convex, so its partial derivative with respect to F can be taken, and the point where the derivative is 0 is the extreme point of the objective function. First, to compute ||F^T - WF^T||_*, this example introduces a theorem for the optimization: for a matrix X, we have

\| X \|_{*} = \| (X X^{T})^{-\frac{1}{4}} X \|_{F}^{2},

The theorem replaces the nuclear norm with a Frobenius norm and provides a basis for the model. For a matrix X of rank r, the α-th power is defined as:

X^{\alpha} = U \Sigma^{\alpha} V^{T}, \quad \Sigma^{\alpha} = \mathrm{diag}(\sigma_{1}^{\alpha}, \ldots, \sigma_{r}^{\alpha}),

where UΣV^T is the singular value decomposition of X and Σ = diag(σ_1, ..., σ_r). Based on the above theorem, the objective function term ||F^T - WF^T||_* of the inventive method can be transformed as follows:

\begin{aligned} \| F^{T} - W F^{T} \|_{*} &= \| G (F^{T} - W F^{T}) \|_{F}^{2} \\ &= \mathrm{tr}\left( G (F^{T} - W F^{T}) (F^{T} - W F^{T})^{T} G^{T} \right) \\ &= \mathrm{tr}\left( G (I - W) F^{T} F (I - W^{T}) G^{T} \right) \\ &= \mathrm{tr}\left( F (I - W^{T}) G^{T} G (I - W) F^{T} \right) \end{aligned},

where G is a weight matrix defined as:

G = \left( (F^{T} - W F^{T}) (F^{T} - W F^{T})^{T} \right)^{-\frac{1}{4}},
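The identity behind this transformation can be checked numerically. The sketch below assumes a random matrix X with full row rank, so that X X^T is invertible, and computes the fractional matrix power through a symmetric eigendecomposition; the helper name `sym_matrix_power` is hypothetical.

```python
import numpy as np

def sym_matrix_power(S, alpha):
    """Power of a symmetric positive definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh((S + S.T) / 2.0)
    return (vecs * vals ** alpha) @ vecs.T

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 6))                       # full row rank with probability 1
G = sym_matrix_power(X @ X.T, -0.25)              # G = (X X^T)^(-1/4)

nuclear = np.sum(np.linalg.svd(X, compute_uv=False))   # ||X||_*
frob_sq = np.sum((G @ X) ** 2)                          # ||G X||_F^2
identity_holds = bool(np.isclose(nuclear, frob_sq))
```

With X = UΣV^T, the product G X equals U Σ^{1/2} V^T, whose squared Frobenius norm is the sum of the singular values, which is why the two quantities agree.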

Note that when some singular values of (F^T - WF^T)(F^T - WF^T)^T are very small, errors can occur when computing G. To improve the stability of the algorithm, (F^T - WF^T)_ε can be used in place of (F^T - WF^T) in the computation:

X_{\epsilon} = U \Sigma_{\epsilon} V^{T}, \quad \Sigma_{\epsilon} = \mathrm{diag}(\max\{\sigma_{i}, \epsilon\},\ i = 1, \ldots, r),

where ε^k = min{ε^{k-1}, σ_K(F^T - WF^T)}. Next, for the label fitting term we have:

(F^{T} - Y^{T})^{T} U V (F^{T} - Y^{T}) = F U V F^{T} - F U V Y^{T} - Y U V F^{T} + Y U V Y^{T},

Minimizing the objective function is equivalent to finding the extreme point at which its derivative with respect to F is zero, which gives:

\begin{aligned} \frac{\partial \hat{J}}{\partial F} &= F (I - W^{T}) G^{T} G (I - W) + F U V - Y U V = 0 \\ &\Rightarrow F \left( (I - W^{T}) G^{T} G (I - W) + U V \right) = Y U V \\ &\Rightarrow F = Y U V \left( (I - W^{T}) G^{T} G (I - W) + U V \right)^{-1} \end{aligned},

where U is a diagonal matrix whose diagonal elements are infinite (in practice a very large value, such as the 10^10 given above) for labeled samples and 0 for unlabeled samples.
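The closed-form update for F can be sketched as a linear solve rather than an explicit inverse, which is usually the more stable choice numerically. The function name `update_F` and the toy data are assumptions for illustration; the toy example uses μ = 10^6 instead of the 10^10 in the text only to keep the small system comfortably well conditioned.

```python
import numpy as np

def update_F(Y, W, G, U, V):
    """Solve F ((I - W^T) G^T G (I - W) + U V) = Y U V for F (C x N)."""
    N = W.shape[0]
    I = np.eye(N)
    M = (I - W.T) @ G.T @ G @ (I - W) + U @ V
    # F M = B is equivalent to M^T F^T = B^T.
    return np.linalg.solve(M.T, (Y @ U @ V).T).T

# Three samples, two classes: sample 0 is class 0, sample 1 is class 1,
# and sample 2 is unlabeled and equally similar to both.
W_toy = np.array([[0.0, 0.5, 0.5],
                  [0.5, 0.0, 0.5],
                  [0.5, 0.5, 0.0]])
Y_toy = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
U_toy = np.diag([1e6, 1e6, 0.0])
F_toy = update_F(Y_toy, W_toy, np.eye(3), U_toy, np.eye(3))
```

In this toy setting the labeled columns of F stay essentially equal to those of Y, while the unlabeled sample receives equal scores for both classes.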

Finally, because V and G are both functions of F, the three variables are updated alternately so that the objective function reaches an effective solution, and the soft label prediction F is finally obtained. The nuclear-norm-driven label prediction algorithm is as follows:

Input: the raw data matrix; the test set label matrix.

Output: the soft label matrix (F^* ← F^{k+1}); the prediction matrix (predicted_labels).

Initialize:

Gama=1, kesi=1, tol=1e-2, K=7, V=G=I, F=Y, maxIter=10, converged=0

While not converged do

Fix G and V, and update F^{k+1}:

F^{k+1} = Y U V^k ((I - W^T)(G^k)^T G^k (I - W) + U V^k)^{-1}

Fix F, and update V^{k+1}:

V^{k+1}_{i,i} = 1/(2||h^i||_2), where h^i is the i-th row vector of (F^k)^T - Y^T

Fix F, and update G^{k+1}:

G^{k+1} = (((F^k)^T - W(F^k)^T)((F^k)^T - W(F^k)^T)^T)^{-1/4}

Check for convergence:

If sqrt(sum(tmp(:).^2)) < tol or iter >= maxIter, where tmp = F^{k+1} - F^k, then stop. (The iteration stops when the preset maximum number of iterations maxIter is exceeded, or when the distance between the matrices F obtained in two successive iterations, that is, the square root of the sum of squares of all elements of F^{k+1} - F^k, falls below the preset value tol.)

Otherwise k = k+1

End while

The computational cost of this algorithm lies mainly in computing G^{k+1}, which requires a singular value decomposition of a matrix; the computational complexity of the algorithm is therefore the same as that of the RPCA-based Inexact ALM method.
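The alternating scheme above can be sketched end to end as follows. This is a simplified illustration under several assumptions that are not in the source: a fixed floor `eps` replaces the adaptive ε^k schedule, the same floor guards the row norms in V, the V and G updates use the most recently computed F, and G is obtained from a symmetric eigendecomposition of (F^T - WF^T)(F^T - WF^T)^T. The function names and the toy two-cluster data are hypothetical.

```python
import numpy as np

def inv_fourth_root(S, eps):
    """S^(-1/4) for symmetric PSD S, with eigenvalues floored at eps**2."""
    vals, vecs = np.linalg.eigh((S + S.T) / 2.0)
    vals = np.maximum(vals, eps ** 2)   # eigenvalues of E E^T are squared singular values of E
    return (vecs * vals ** -0.25) @ vecs.T

def propagate_labels(Y, W, is_labeled, mu=1e10, tol=1e-2, max_iter=10, eps=1e-3):
    """Alternately update F, V and G; returns the C x N soft label matrix F."""
    N = W.shape[0]
    I = np.eye(N)
    U = np.diag(np.where(is_labeled, mu, 0.0))
    V, G, F = np.eye(N), np.eye(N), Y.astype(float).copy()
    for _ in range(max_iter):
        # Fix G and V, update F by solving F M = Y U V.
        M = (I - W.T) @ G.T @ G @ (I - W) + U @ V
        F_new = np.linalg.solve(M.T, (Y @ U @ V).T).T
        # Fix F, update the diagonal reweighting matrix V.
        row_norms = np.linalg.norm(F_new.T - Y.T, axis=1)
        V = np.diag(1.0 / (2.0 * np.maximum(row_norms, eps)))
        # Fix F, update the nuclear-norm weight matrix G.
        E = (I - W) @ F_new.T
        G = inv_fourth_root(E @ E.T, eps)
        # Stop when successive iterates are close in Frobenius distance.
        change = np.sqrt(np.sum((F_new - F) ** 2))
        F = F_new
        if change < tol:
            break
    return F

# Toy data: two well-separated 1-D clusters, one labeled sample in each.
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
S = np.exp(-(x[:, None] - x[None, :]) ** 2)
np.fill_diagonal(S, 0.0)
W = S / S.sum(axis=1, keepdims=True)
is_labeled = np.array([True, False, False, True, False, False])
Y = np.zeros((2, 6))
Y[0, 0] = 1.0
Y[1, 3] = 1.0
F = propagate_labels(Y, W, is_labeled)
predicted_labels = F.argmax(axis=0)
```

On this toy graph each unlabeled point takes the class of the labeled point in its own cluster. Real data would use the LLE-based weight matrix W described earlier rather than the Gaussian similarities used here for brevity.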

Step S102: use the soft class label prediction matrix F to compute the soft class label vector of the test sample, and determine the class of the test sample as the class corresponding to the largest probability in the soft class label vector, obtaining the most accurate classification result.

In the present invention, the elements of the soft class label vector are the probabilities that the test sample belongs to each class.

Through the model, alternately iterating the soft class label matrix F and the other variables yields the soft class label matrix F of all sample data. The class label of each sample x_new whose class label information is unknown is then given by argmax_{i≤C} (f_new)_i, where f_new is the soft label vector of x_new; that is, the class of the sample is estimated as the position of the largest class membership probability in the soft label f_new, which completes the classification process. Likewise, for the iteratively generated soft label matrix F, the hard label of each unlabeled training sample is given by argmax_{i≤C} (f_i), the position of the largest element of the predicted soft label vector f_i. The predicted class of each unlabeled training sample is thus obtained from the maximum of its soft label.
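The decision rule amounts to a column-wise argmax over the soft label matrix. A minimal sketch, assuming F is stored as C x N with one column per sample and 0-based class indices (the function name and the example values are illustrative only):

```python
import numpy as np

def predict_classes(F):
    """Return, for each column of F, the index of its largest soft label."""
    return np.argmax(F, axis=0)

F_example = np.array([[0.7, 0.2, 0.1],
                      [0.2, 0.5, 0.3],
                      [0.1, 0.3, 0.6]])
hard_labels = predict_classes(F_example)
```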

Based on the nuclear-norm-driven data classification method provided by the above embodiment, an embodiment of the present invention also provides a nuclear-norm-driven data classification system. Referring to Fig. 2, the system 200 is based on the idea of using the nuclear norm as the distance metric and may include the following:

A training preprocessing module 201, configured to determine the initial class matrix Y according to the original label information of the training samples in the training set, to construct a similarity measure matrix by performing a neighbor search on all training samples, and to symmetrize and normalize the similarity measure matrix to obtain the weight coefficient matrix W; wherein the training set includes labeled training samples and unlabeled training samples, and the weight coefficient matrix W characterizes the neighborhood properties between samples;

A training module 202, configured to establish, based on the initial class matrix Y and the weight coefficient matrix W, a transductive label propagation model by balancing the neighborhood reconstruction term and the label fitting term, and to iteratively optimize the transductive label propagation model to obtain the soft class label prediction matrix F of the training set; wherein the neighborhood reconstruction term is a reconstruction error term measured by the reliable nuclear norm and is used to define the manifold smoothness term, and the label fitting term is based on weighted L2,1-norm regularization;

A class determination module 203, configured to use the soft class label prediction matrix F to compute the soft class label vector of the test sample, and to determine the class of the test sample as the class corresponding to the largest probability in the soft class label vector, obtaining the most accurate classification result; wherein the elements of the soft class label vector are the probabilities that the test sample belongs to each class.

Refer to Table 1, which compares the classification results of the inventive method with those of SparseNP (Sparse Neighborhood Propagation), SLP (Special Label Propagation), LNP (Label Neighborhood Propagation), LLGC (Learning with Local and Global Consistency), LapLDA (Laplacian Linear Discriminant Analysis), GFHF (Gaussian Fields and Harmonic Functions), and CD-LNP (Prior Class Dissimilarity based LNP), giving the mean and highest recognition rates of each method in the experiments. Refer to Fig. 3 for a schematic diagram of an actual application scenario of the nuclear-norm-driven data classification method disclosed in the embodiment of the present invention.

In this example, the SparseNP, LNP, and LapLDA methods included in the comparison use the default parameters given for each algorithm in its reference, and all classification uses a K-nearest-neighbor (K=7) classifier. From the COIL20 object image dataset and the HP0 machine error experimental training sample data, 15 and 2 samples per class, respectively, are randomly selected as labeled data, and the remaining unlabeled data serve as the test set. From the two groups of GTF face image experimental training sample data, 5 and 7 samples per class, respectively, are randomly selected as labeled data, and the remaining unlabeled data serve as the test set.

Table 1

The experimental results show that the data classification performance of the present invention is clearly better than that of the other related methods and that it is more stable, which gives it a certain advantage.

It should be noted that the embodiments in this specification are described progressively: each embodiment focuses on its differences from the other embodiments, and the parts that are the same or similar across embodiments may be cross-referenced. Because the system embodiment is essentially similar to the method embodiment, its description is relatively brief; for the relevant parts, refer to the description of the method embodiment.

The nuclear-norm-driven data classification method and system provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and embodiments of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. It should be pointed out that those of ordinary skill in the art may make improvements and modifications to the present invention without departing from its principles, and such improvements and modifications also fall within the scope of protection of the claims of the present invention.

Claims

1. A nuclear-norm-driven data classification method, characterized in that it is based on the idea of using the nuclear norm as the distance metric, the method including:

Determining an initial class matrix Y according to the original label information of the training samples in a training set, constructing a similarity measure matrix by performing a neighbor search on all of the training samples, and symmetrizing and normalizing the similarity measure matrix to obtain a weight coefficient matrix W; wherein the training set includes labeled training samples and unlabeled training samples, and the weight coefficient matrix W characterizes the neighborhood properties between samples;

Based on the initial class matrix Y and the weight coefficient matrix W, establishing a transductive label propagation model by balancing a neighborhood reconstruction term and a label fitting term, and iteratively optimizing the transductive label propagation model to obtain a soft class label prediction matrix F of the training set; wherein the neighborhood reconstruction term is a reconstruction error term measured by the reliable nuclear norm and is used to define the manifold smoothness term, and the label fitting term is based on weighted L2,1-norm regularization;

Using the soft class label prediction matrix F to compute the soft class label vector of a test sample, and determining the class of the test sample as the class corresponding to the largest probability in the soft class label vector, thereby obtaining the most accurate classification result; wherein the elements of the soft class label vector are the probabilities that the test sample belongs to each class.

2. The method of claim 1, characterized in that constructing the similarity measure matrix by performing a neighbor search on all of the training samples, and symmetrizing and normalizing the similarity measure matrix to obtain the weight coefficient matrix W, includes:

Performing a K-nearest-neighbor search for each sample in the training set to find the K samples in the training set closest to that sample;

3. The method of claim 1 or 2, characterized in that the neighborhood reconstruction term and the label fitting term are balanced by solving the following objective function minimization problem:

\min_{F} \hat{J} = \| F^{T} - W F^{T} \|_{*} + \mathrm{tr}\left( (F^{T} - Y^{T})^{T} U V (F^{T} - Y^{T}) \right)

Wherein, ||F^T - WF^T||_* is the neighborhood reconstruction term and tr((F^T - Y^T)^T UV (F^T - Y^T)) is the label fitting term; U is the balance parameter matrix, with elements μ_i, that balances the neighborhood reconstruction term against the label fitting term; V is the diagonal matrix with elements V_{i,i} = 1/(2||h^i||_2) (i = 1, 2, ..., l+u), where h^i is the i-th row vector of the matrix F^T - Y^T.

4. A nuclear-norm-driven data classification system, characterized in that it is based on the idea of using the nuclear norm as the distance metric, the system including:

A training preprocessing module, configured to determine an initial class matrix Y according to the original label information of the training samples in a training set, to construct a similarity measure matrix by performing a neighbor search on all of the training samples, and to symmetrize and normalize the similarity measure matrix to obtain a weight coefficient matrix W; wherein the training set includes labeled training samples and unlabeled training samples, and the weight coefficient matrix W characterizes the neighborhood properties between samples;

A training module, configured to establish, based on the initial class matrix Y and the weight coefficient matrix W, a transductive label propagation model by balancing a neighborhood reconstruction term and a label fitting term, and to iteratively optimize the transductive label propagation model to obtain a soft class label prediction matrix F of the training set; wherein the neighborhood reconstruction term is a reconstruction error term measured by the reliable nuclear norm and is used to define the manifold smoothness term, and the label fitting term is based on weighted L2,1-norm regularization;

A class determination module, configured to use the soft class label prediction matrix F to compute the soft class label vector of a test sample, and to determine the class of the test sample as the class corresponding to the largest probability in the soft class label vector, thereby obtaining the most accurate classification result; wherein the elements of the soft class label vector are the probabilities that the test sample belongs to each class.