CN106203517A - Nuclear-norm-driven data classification method and system - Google Patents

Nuclear-norm-driven data classification method and system

Info

Publication number
CN106203517A
Authority
CN
China
Prior art keywords
label
matrix
sample
item
neighbour
Prior art date
Legal status
Pending
Application number
CN201610554118.8A
Other languages
Chinese (zh)
Inventor
张召
贾磊
李凡长
张莉
王邦军
Current Assignee
Suzhou University
Original Assignee
Suzhou University
Priority date
Filing date
Publication date
Application filed by Suzhou University
Priority to CN201610554118.8A
Publication of CN106203517A
Current legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes

Abstract

The invention discloses a nuclear-norm-driven data classification method and system. The method works as follows.

First, a weight coefficient matrix is constructed from the samples in the training set to characterize the similarity between samples, and an initial class label matrix is initialized.

Second, the nuclear norm is used to measure the neighborhood reconstruction error accurately and reliably, and this term serves as the manifold smoothness term. During optimization, the nuclear-norm-based minimization of the neighborhood reconstruction error is converted into a series of Frobenius-norm optimization problems. To measure the difference between the predicted labels and the manually assigned initial labels, a label fitting term based on the weighted L2,1 norm is proposed. This term improves the robustness of the model to noise and the accuracy of the measurement.

Finally, the class with the largest probability in each soft class label vector is taken as the class assignment, which yields accurate classification results.

In addition, the nuclear norm is a more reliable distance metric than the L1 or L2 norm, which effectively improves the prediction accuracy of the model.

Description

Nuclear-norm-driven data classification method and system
Technical field
The present invention relates to the fields of machine learning and pattern recognition, and in particular to a nuclear-norm-driven data classification method and system.
Background
The spread of the Internet and the rapid development of computer technology have ushered in the era of big data. Every industry now generates an enormous amount of fragmented data each day, so text, multimedia, images and other types of data are growing explosively. At the same time, people have fallen into the vicious circle of being "information-rich but knowledge-poor": because the data volume is so large, they cannot find the data they need. This situation has driven researchers to seek new techniques for fast data retrieval. Data classification is an effective way to address the problem, because it can automatically build semantic information about the data that is consistent with human cognition.
Label propagation, proposed by Zhu et al. in 2002, is a graph-based semi-supervised learning method. Its basic idea is to predict the labels of unlabeled nodes from the labels of labeled nodes, and in recent years it has also performed well in image data classification. The method uses the relationships between samples to build a complete graph model. The nodes of this graph include both labeled and unlabeled data, each edge represents the similarity between two nodes, and labels are passed from node to node according to that similarity. However, most current transductive label propagation models focus mainly on the sparse structure of the weights and have some obvious shortcomings. First, the neighborhood reconstruction error and the label fitting mismatch, both measured with the Frobenius norm, are not measured accurately or reliably. Second, the label estimation process is easily affected by noise, which reduces the accuracy of the estimated labels.
A new data classification method that improves the reliability and accuracy of classification results is therefore a problem that those skilled in the art urgently need to solve.
Summary of the invention
In view of this, the present invention provides a nuclear-norm-driven data classification method and system that improve the reliability and accuracy of classification results.
To solve the above technical problem, the present invention provides a nuclear-norm-driven data classification method. The method is based on the idea of using the nuclear norm as the distance metric and comprises the following steps.
First, an initial class matrix Y is determined from the original label information of the training samples in a training set. A similarity measure matrix is constructed by performing a neighbor search on all training samples, and this matrix is symmetrized and normalized to obtain a weight coefficient matrix W. The training set comprises labeled training samples and unlabeled training samples, and W characterizes the neighborhood relations between samples.
Second, based on Y and W, a transductive label propagation model is built that balances a neighborhood reconstruction term against a label fitting term. This model is iteratively optimized to obtain the soft class label prediction matrix F of the training set. The neighborhood reconstruction term is a reconstruction error term measured by the reliable nuclear norm and defines the manifold smoothness term. The label fitting term is regularized by the weighted L2,1 norm.
Third, F is used to compute the soft class label vector of a test sample. The test sample is assigned to the class with the largest probability in that vector, which yields an accurate classification result. Each element of the soft class label vector is the probability that the test sample belongs to the corresponding class.
Preferably, constructing the similarity measure matrix and then symmetrizing and normalizing it to obtain W comprises:
performing a K-nearest-neighbor search for each sample in the training set to find the K samples in the training set that are closest to it;
constructing the similarity measure matrix of the neighborhood graph with the LLE reconstruction weight algorithm;
symmetrizing and normalizing the similarity measure matrix to obtain the weight coefficient matrix W.
Preferably, the balance between the neighborhood reconstruction term and the label fitting term is achieved by solving the following objective function minimization problem:
$$\min_{F} \hat{J} = \left\|F^T - WF^T\right\|_* + \operatorname{tr}\left((F^T - Y^T)^T U V (F^T - Y^T)\right)$$
In this objective:
- $\|F^T - WF^T\|_*$ is the neighborhood reconstruction term.
- $\operatorname{tr}\left((F^T - Y^T)^T U V (F^T - Y^T)\right)$ is the label fitting term.
- $U$ is the diagonal balance parameter matrix with elements $\mu_i$. It trades off the neighborhood reconstruction term against the label fitting term.
- $V$ is the diagonal matrix with elements $V_{i,i} = 1/(2\|h_i\|_2)$, $i = 1, 2, \ldots, l+u$, where $h_i$ is the $i$-th row vector of $F^T - Y^T$.
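The following NumPy sketch evaluates this objective for given matrices and may help make the dimensions concrete. It assumes that F and Y are C x N (one column per sample, as Y is defined in the detailed description) and that W is N x N. U and V are passed as their diagonals. The function name objective is hypothetical.

```python
import numpy as np

def objective(F, Y, W, mu, v):
    """Evaluate ||F^T - W F^T||_* + tr((F^T - Y^T)^T U V (F^T - Y^T)).

    F, Y : (C, N) soft and initial label matrices (one column per sample).
    W    : (N, N) weight coefficient matrix.
    mu, v: length-N diagonals of U and V.
    """
    R = F.T - W @ F.T                                   # (N, C) reconstruction residual
    recon = np.linalg.svd(R, compute_uv=False).sum()    # nuclear norm = sum of singular values
    H = F.T - Y.T                                       # (N, C) label residual, rows h_i
    fit = np.trace(H.T @ np.diag(mu * v) @ H)           # U V is diagonal
    return float(recon + fit)
```

If F already equals Y and W = I, both terms vanish and the objective is 0.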
The present invention further provides a nuclear-norm-driven data classification system. The system is based on the idea of using the nuclear norm as the distance metric and comprises:
a training preprocessing module, configured to:
- determine an initial class matrix Y from the original label information of the training samples in a training set;
- construct a similarity measure matrix by performing a neighbor search on all training samples;
- symmetrize and normalize the similarity measure matrix to obtain a weight coefficient matrix W.
The training set comprises labeled training samples and unlabeled training samples, and W characterizes the neighborhood relations between samples;
a training module, configured to build, based on Y and W, a transductive label propagation model that balances a neighborhood reconstruction term against a label fitting term, and to iteratively optimize this model to obtain the soft class label prediction matrix F of the training set. The neighborhood reconstruction term is a reconstruction error term measured by the reliable nuclear norm and defines the manifold smoothness term. The label fitting term is regularized by the weighted L2,1 norm;
a class determination module, configured to compute the soft class label vector of a test sample from F and to assign the test sample to the class with the largest probability in that vector, which yields an accurate classification result. Each element of the soft class label vector is the probability that the test sample belongs to the corresponding class.
The nuclear-norm-driven data classification method and system provided by the present invention introduce the currently popular idea of nuclear-norm regularized measurement. They classify data transductively by label propagation and estimate the class labels of test samples accurately and quickly. Specifically, when the transductive label propagation model is built, the neighborhood reconstruction error is measured with the reliable nuclear norm. This improves the reliability and accuracy of the model, and in turn those of the classification results.
In addition, the label fitting term based on the weighted L2,1 norm keeps the measurement of the difference between the predicted labels and the initial labels accurate and robust to noise. This improves the robustness of the model to noise and heterogeneous data and further improves the reliability and accuracy of the classification results. Moreover, the nuclear norm is a more reliable distance metric than the L1 or L2 norm and effectively improves the prediction accuracy of the model.
Brief description of the drawings
To explain the embodiments of the present invention or the prior-art technical solutions more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention. Those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the nuclear-norm-driven data classification method provided by an embodiment of the present invention;
Fig. 2 is a schematic structural block diagram of the nuclear-norm-driven data classification system provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of a practical application scenario of the nuclear-norm-driven data classification method provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments that those of ordinary skill in the art obtain from these embodiments without creative effort fall within the protection scope of the present invention.
The core of the present invention is a nuclear-norm-driven data classification method and system that improve the reliability and accuracy of classification results.
To help those skilled in the art better understand the solution of the present invention, the invention is described in further detail below with reference to the drawings and specific embodiments.
The embodiment of the present invention is evaluated on three different databases:
- the SCCTS machine learning dataset, which contains 1000 samples in 10 classes of 100 samples each;
- the COIL20 object image dataset, which contains 1440 object images;
- the GTF face image database, which contains 750 images of the test subjects, each taken with a different pose, illumination intensity and expression.
Because these databases come from different domains, the test results are reasonably representative.
Referring to Fig. 1, which shows the flowchart of the nuclear-norm-driven data classification method provided by an embodiment of the present invention, the method is based on the idea of using the nuclear norm as the distance metric and may specifically include the following steps.
Step S100: determine the initial class matrix Y from the original label information of the training samples in the training set. Construct the similarity measure matrix by performing a neighbor search on all training samples, then symmetrize and normalize this matrix to obtain the weight coefficient matrix W.
The training set includes labeled training samples, whose classes are known, and unlabeled training samples, whose classes are unknown. Consider a given data set $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{n \times N}$, where:
- $n$ is the sample dimension;
- $N = l + u$ is the total number of samples;
- $l$ is the number of labeled training samples;
- $u$ is the number of test samples.
$X$ therefore contains the labeled training set $X_L = [x_1, \ldots, x_l]$, whose samples carry class labels (C classes in total, C > 2), and the test set $X_U = [x_{l+1}, \ldots, x_{l+u}]$, which has no labels.
The initial class matrix Y records the known supervision information. In practice it is determined by manual labeling, as follows.
1. Initialize a matrix Y with C rows (the total number of classes) and N columns (the total number of training samples). Set all elements to 0; Y will record the initial label information of all training samples.
2. For each labeled training sample $x_j$ that belongs to the $i$-th class, set $Y_{i,j} = 1$, where the class label $i \in \{1, 2, \ldots, C\}$.
3. For each unlabeled sample $x_j$, set $Y_{i,j} = 0$ for all $i$.
This guarantees that every column of Y corresponding to a labeled sample sums to 1, i.e., each labeled training sample has exactly one known label.
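The following NumPy sketch builds Y. It assumes 0-based class indices (the text numbers classes 1 to C) and marks unlabeled samples with -1. Both conventions and the name initial_label_matrix are illustrative and do not come from the patent.

```python
import numpy as np

def initial_label_matrix(labels, num_classes):
    """Return the C x N initial class matrix Y.

    labels[j] is the 0-based class index of sample x_j, or -1 if x_j is unlabeled.
    """
    Y = np.zeros((num_classes, len(labels)))    # every element starts at 0
    for j, c in enumerate(labels):
        if c >= 0:
            Y[c, j] = 1.0                       # labeled sample: a single 1 in its column
    return Y

Y = initial_label_matrix([0, 2, -1, 1], num_classes=3)
print(Y.sum(axis=0))   # [1. 1. 0. 1.]  (column sums: 1 if labeled, 0 if unlabeled)
```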
The weight coefficient matrix W characterizes the neighborhood relations between samples. In a specific implementation, it is determined as follows:
1. Perform a K-nearest-neighbor search for each sample in the training set to find its K closest samples in the training set.
2. Construct the similarity measure matrix of the neighborhood graph with the LLE reconstruction weight algorithm.
3. Symmetrize and normalize the similarity measure matrix to obtain W.
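The neighbor search and the final symmetrization and normalization step can be sketched as follows. The text does not give the exact formulas, so this sketch makes three assumptions:
- distances are Euclidean;
- symmetrization is (S + S^T)/2;
- normalization scales each row to sum to 1.
The function names are hypothetical.

```python
import numpy as np

def knn_indices(X, k):
    """Indices of the k nearest neighbours of every column of X (n x N), self excluded."""
    sq = np.sum(X ** 2, axis=0)
    D = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)   # squared Euclidean distances
    np.fill_diagonal(D, np.inf)                       # a sample is not its own neighbour
    return np.argsort(D, axis=1, kind="stable")[:, :k]

def symmetrize_normalize(S):
    """Symmetrize a nonnegative similarity matrix, then scale rows to sum to 1."""
    S_sym = 0.5 * (S + S.T)
    row_sums = S_sym.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0                     # isolated samples keep a zero row
    return S_sym / row_sums

X = np.array([[0.0, 1.0, 5.0]])                       # three 1-D samples as columns
print(knn_indices(X, 1).ravel())                      # [1 0 1]
```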
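A weight matrix $W \in \mathbb{R}^{N \times N}$ is obtained from the LLE reconstruction weights by solving the following minimization problem:
$$\min_{W} \sum_{i=1}^{N} \Big\| x_i - \sum_{j:\, x_j \in \mathcal{N}(x_i)} W_{i,j}\, x_j \Big\|_2^2 \quad \text{s.t.} \quad \sum_{j} W_{i,j} = 1,\; W_{i,j} \ge 0$$
Here:
- $\mathcal{N}(x_i)$ is the set of neighbor points in the neighborhood of $x_i$;
- $\sum_j W_{i,j} = 1$ is the row-sum-to-one constraint;
- $W_{i,j} \ge 0$ is the nonnegativity constraint.
The weights are therefore sparse and satisfy the definition of a probability.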
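The LLE weight computation can be sketched as follows. The sketch uses the standard closed-form LLE solution for the sum-to-one constraint: the local Gram matrix plus a small ridge term reg. It then enforces nonnegativity by clipping negative weights and renormalizing. The clipping step is a simple heuristic, not an exact solver for the constrained problem above, which would need a quadratic program. The name lle_weights and the value of reg are assumptions.

```python
import numpy as np

def lle_weights(X, k, reg=1e-3):
    """Nonnegative, row-stochastic LLE weights over each sample's k nearest neighbours.

    X is n x N with one sample per column.
    """
    N = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    D = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    np.fill_diagonal(D, np.inf)
    nbrs = np.argsort(D, axis=1, kind="stable")[:, :k]
    W = np.zeros((N, N))
    for i in range(N):
        Z = X[:, nbrs[i]] - X[:, [i]]                       # neighbours relative to x_i (n x k)
        C = Z.T @ Z                                         # local Gram matrix (k x k)
        C = C + reg * max(np.trace(C), 1e-12) * np.eye(k)   # ridge: C is singular when k > n
        w = np.linalg.solve(C, np.ones(k))
        w = w / w.sum()                                     # sum-to-one LLE weights
        w = np.clip(w, 0.0, None)                           # heuristic: drop negative weights
        W[i, nbrs[i]] = w / w.sum()                         # renormalize so the row sums to 1
    return W

X = np.array([[0.0, 1.0, 2.0, 3.0, 4.0],
              [0.0, 0.0, 0.0, 0.0, 0.0]])                  # five collinear 2-D samples
W = lle_weights(X, k=2)
print(W[2])                                                # x_2 = (x_1 + x_3) / 2
```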
Optionally, this step also includes necessary operations such as data preprocessing and parameter setting. These follow the prior art and are not detailed here.
At this point, the weight coefficient matrix W and the initial class label matrix Y have been obtained.
Step S101: based on Y and W, build a transductive label propagation model that balances the neighborhood reconstruction term against the label fitting term, and iteratively optimize this model to obtain the soft class label prediction matrix F of the training set.
The neighborhood reconstruction term is a reconstruction error term measured by the reliable nuclear norm and defines the manifold smoothness term. The label fitting term is regularized by the weighted L2,1 norm. Specifically, when the transductive label propagation model is built, the two terms are balanced by solving the following objective function minimization problem:
$$\min_{F} \hat{J} = \left\|F^T - WF^T\right\|_* + \operatorname{tr}\left((F^T - Y^T)^T U V (F^T - Y^T)\right)$$
In this objective:
- $\|F^T - WF^T\|_*$ is the neighborhood reconstruction term.
- $\operatorname{tr}\left((F^T - Y^T)^T U V (F^T - Y^T)\right)$ is the label fitting term.
- $U$ is the diagonal balance parameter matrix with elements $\mu_i$, which trades off the two terms. Each $\mu_i$ is an adjustment parameter: $\mu_i = 10^{10}$ when the label of sample $x_i$ in the training set is known, and $\mu_i = 0$ otherwise.
- $V$ is the diagonal matrix with elements $V_{i,i} = 1/(2\|h_i\|_2)$, $i = 1, 2, \ldots, l+u$, where $h_i$ is the $i$-th row vector of $F^T - Y^T$.
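The diagonal matrix V is what turns the quadratic trace into a weighted L2,1 norm. Writing $H = F^T - Y^T$, we have $\operatorname{tr}(H^T U V H) = \sum_i \mu_i V_{i,i} \|h_i\|_2^2$. The choice $V_{i,i} = 1/(2\|h_i\|_2)$ therefore gives $\tfrac{1}{2}\sum_i \mu_i \|h_i\|_2$. The NumPy check below confirms this numerically. The floor eps, which prevents division by zero on rows with $h_i = 0$, is an added safeguard that the text does not mention.

```python
import numpy as np

def weighted_l21(H, mu):
    """sum_i mu_i * ||h_i||_2 over the rows h_i of H."""
    return float(np.sum(mu * np.linalg.norm(H, axis=1)))

def reweighted_fit(H, mu, eps=1e-12):
    """tr(H^T U V H) with U = diag(mu) and V_ii = 1 / (2 ||h_i||_2)."""
    v = 1.0 / (2.0 * np.maximum(np.linalg.norm(H, axis=1), eps))  # eps avoids 1/0 on zero rows
    return float(np.trace(H.T @ np.diag(mu * v) @ H))

H = np.array([[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]])   # rows h_i of F^T - Y^T
mu = np.array([1.0, 1e10, 2.0])
print(weighted_l21(H, mu), reweighted_fit(H, mu))    # the second is half the first
```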
For $X \in \mathbb{R}^{N \times N}$, the sum of all elements on the main diagonal of $X = (X_{i,j})$ is called the trace of $X$ and is denoted $\operatorname{tr}(X)$. $\|E\|_{2,1}$ is the L2,1 norm and $\|L\|_*$ is the nuclear norm. The three quantities are defined as:
$$\operatorname{tr}(X) = \sum_{i=1}^{N} X_{i,i}, \qquad \|E\|_{2,1} = \sum_{i=1}^{N} \sqrt{\sum_{j} [E]_{i,j}^2}, \qquad \|L\|_* = \sum_i \sigma_i(L),$$
where $\sum_i \sigma_i(L)$ is the sum of the singular values of $L$.
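The three definitions translate directly into NumPy. The helper names are illustrative; NumPy's own np.linalg.norm(L, "nuc") returns the same nuclear norm.

```python
import numpy as np

def trace(X):
    return float(np.sum(np.diag(X)))                          # sum of main-diagonal elements

def l21_norm(E):
    return float(np.sum(np.sqrt(np.sum(E ** 2, axis=1))))     # sum of row-wise L2 norms

def nuclear_norm(L):
    return float(np.sum(np.linalg.svd(L, compute_uv=False)))  # sum of singular values
```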
The objective function is convex, so its extreme point can be found by taking the partial derivative with respect to F and setting it to zero. To handle $\|F^T - WF^T\|_*$, this example uses the following theorem. For a matrix $X$ such that $XX^T$ is invertible,
$$\|X\|_* = \left\|(XX^T)^{-\frac{1}{4}} X\right\|_F^2$$
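The identity follows from the singular value decomposition. If $X = U\Sigma V^T$, then $(XX^T)^{-1/4}X = U\Sigma^{1/2}V^T$, whose squared Frobenius norm is $\sum_i \sigma_i$. The sketch below checks this numerically. It assumes X has full row rank and forms the inverse fourth root from an eigendecomposition of the symmetric matrix $XX^T$.

```python
import numpy as np

def nuclear_via_frobenius(X):
    """Compute ||X||_* as ||(X X^T)^(-1/4) X||_F^2 (X must have full row rank)."""
    lam, Q = np.linalg.eigh(X @ X.T)          # X X^T = Q diag(lam) Q^T with lam > 0
    P = Q @ np.diag(lam ** -0.25) @ Q.T       # matrix power (X X^T)^(-1/4)
    return float(np.linalg.norm(P @ X, "fro") ** 2)

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))
print(nuclear_via_frobenius(X), np.linalg.norm(X, "nuc"))   # agree up to rounding
```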
The theorem replaces the nuclear norm with a Frobenius norm. For use in the model, the $\alpha$-th power of a matrix $X$ of rank $r$ is defined as:
$$X^{\alpha} = U \Sigma^{\alpha} V^T, \qquad \Sigma^{\alpha} = \operatorname{diag}(\sigma_1^{\alpha}, \ldots, \sigma_r^{\alpha}),$$
where $U \Sigma V^T$ is the singular value decomposition of $X$ and $\Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_r)$. By the theorem above, the objective term $\|F^T - WF^T\|_*$ can be transformed as follows:
$$\begin{aligned}\|F^T - WF^T\|_* &= \|G(F^T - WF^T)\|_F^2 = \operatorname{tr}\left(G(F^T - WF^T)(F^T - WF^T)^T G^T\right)\\ &= \operatorname{tr}\left(G(I - W)F^T F (I - W^T) G^T\right) = \operatorname{tr}\left(F(I - W^T) G^T G (I - W) F^T\right),\end{aligned}$$
where G is a weight matrix defined as:
$$G = \left((F^T - WF^T)(F^T - WF^T)^T\right)^{-1/4}.$$
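$F^T - WF^T$ is N x C, so the N x N matrix inside G has rank at most C. Its inverse fourth root therefore exists only after the small singular values are floored, as the next paragraph of the text notes.

The sketch below follows that reading and uses a fixed floor eps. The fixed floor is an assumption; the text updates it adaptively as $\varepsilon_k$. The sketch checks that G still reproduces the nuclear norm through $\|G(F^T - WF^T)\|_F^2$ and through the final trace form above. The name reweighting_matrix is hypothetical.

```python
import numpy as np

def reweighting_matrix(F, W, eps=1e-3):
    """G = ((F^T - W F^T)(F^T - W F^T)^T)^(-1/4), singular values floored at eps."""
    M = F.T - W @ F.T                               # (N, C) residual, rank <= C
    lam, Q = np.linalg.eigh(M @ M.T)                # eigenvalues = squared singular values
    sigma = np.sqrt(np.clip(lam, 0.0, None))        # clip tiny negative round-off
    return Q @ np.diag(np.maximum(sigma, eps) ** -0.5) @ Q.T

rng = np.random.default_rng(1)
C, N = 2, 6
F = rng.standard_normal((C, N))
W = 0.1 * rng.standard_normal((N, N))
G = reweighting_matrix(F, W)
M = F.T - W @ F.T
I = np.eye(N)
frob = np.linalg.norm(G @ M, "fro") ** 2
trace_form = np.trace(F @ (I - W.T) @ G.T @ G @ (I - W) @ F.T)
print(frob, trace_form, np.linalg.norm(M, "nuc"))   # all three agree
```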
Because $F^T - WF^T$ has only C columns, the N x N matrix $(F^T - WF^T)(F^T - WF^T)^T$ has rank at most C, so some of its singular values are zero or very small. When that happens, computing G produces errors. To improve the stability of the algorithm, $(F^T - WF^T)_{\varepsilon}$ can be used in place of $F^T - WF^T$:
$$X_{\varepsilon} = U \Sigma_{\varepsilon} V^T, \qquad \Sigma_{\varepsilon} = \operatorname{diag}\left(\max\{\sigma_i, \varepsilon\}_{i=1:r}\right),$$
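where $\varepsilon_k = \min\{\varepsilon_{k-1}, \sigma_K(F^T - WF^T)\}$. For the label fitting term, we have:
$$(F^T - Y^T)^T U V (F^T - Y^T) = F U V F^T - F U V Y^T - Y U V F^T + Y U V Y^T.$$
With G and V held fixed, minimizing the objective function amounts to finding the point at which its derivative with respect to F is zero. Since $(I - W^T)G^T G(I - W)$ and $UV$ are symmetric, a common factor of 2 can be dropped, giving:
$$\frac{1}{2}\frac{\partial \hat{J}}{\partial F} = F(I - W^T)G^T G(I - W) + FUV - YUV = 0 \;\Rightarrow\; F\left((I - W^T)G^T G(I - W) + UV\right) = YUV \;\Rightarrow\; F = YUV\left((I - W^T)G^T G(I - W) + UV\right)^{-1},$$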
where U is a diagonal matrix whose diagonal elements are effectively infinite for labeled samples and 0 for unlabeled samples.
Because V and G are both functions of F, the method obtains an effective solution of the objective function by alternately updating the three variables, and finally produces the soft label prediction F. The nuclear-norm-driven label estimation algorithm is as follows:
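With G and V held fixed, the closed-form update is a single linear solve. The sketch below solves $(A + UV)^T F^T = (YUV)^T$ rather than forming the inverse explicitly, which is the usual numerically safer choice. The function name and the path-graph example are hypothetical. In the example, four samples lie on a path, the two end samples are labeled with classes 0 and 1, and $\mu_i = 10^{10}$ stands in for the effectively infinite weight on labeled samples.

```python
import numpy as np

def update_F(Y, W, G, mu, v):
    """F = Y U V ((I - W^T) G^T G (I - W) + U V)^(-1) for fixed G and V."""
    N = W.shape[0]
    I = np.eye(N)
    A = (I - W.T) @ G.T @ G @ (I - W)      # reweighted nuclear-norm smoothness part
    D = np.diag(mu * v)                    # U V, diagonal
    # F (A + D) = Y D  <=>  (A + D)^T F^T = (Y D)^T
    return np.linalg.solve((A + D).T, (Y @ D).T).T

# Hypothetical example: path graph 0-1-2-3 with row-normalized weights
W = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0, 0.0]])
Y = np.array([[1.0, 0.0, 0.0, 0.0],      # sample 0 labeled as class 0
              [0.0, 0.0, 0.0, 1.0]])     # sample 3 labeled as class 1
mu = np.array([1e10, 0.0, 0.0, 1e10])    # U: large weight on labeled samples only
F = update_F(Y, W, np.eye(4), mu, np.ones(4))
print(F.argmax(axis=0))                  # [0 0 1 1]
```

With G = I, the unlabeled sample next to the class-0 end receives a class-0 score of 10/13, so the labels spread smoothly from the two labeled ends.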
Input: raw data matrix $X$, label matrix $Y$.
Output: soft label matrix ($F^* \leftarrow F_{k+1}$), predicted labels (predicted_labels).
Initialize:
Gama = 1, kesi = 1, tol = 1e-2, K = 7, V = G = I, F = Y, maxIter = 10, converged = 0
while not converged do
Fix G and V, and update $F_{k+1}$:
$$F_{k+1} = YUV_k\left((I - W^T)(G_k)^T G_k (I - W) + UV_k\right)^{-1}$$
Fix F, and update $V_{k+1}$ (row-wise, as defined above):
$$(V_{k+1})_{i,i} = \frac{1}{2\left\|h_i^{k}\right\|_2}, \quad h_i^{k} \text{ the } i\text{-th row of } F_k^T - Y^T$$
Fix F, and update $G_{k+1}$:
$$G_{k+1} = \left(\left((F_k)^T - W(F_k)^T\right)\left((F_k)^T - W(F_k)^T\right)^T\right)^{-1/4}$$
Check convergence:
if sqrt(sum(tmp(:).^2)) < tol || iter >= maxIter, then stop.
Iteration stops in either of two cases: the preset maximum number of iterations maxIter is reached, or the distance between the matrices F of two successive iterations falls below the preset tol. That distance is the square root of the sum of squares of all elements of $F_{k+1} - F_k$.
otherwise k = k + 1
end while
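The three updates combine into the NumPy sketch of the alternating loop below. It is an illustration under stated assumptions, not the authors' code:
- W is supplied directly rather than built from X;
- the singular-value floor eps is fixed instead of following the adaptive $\varepsilon_k$ rule;
- delta, a hypothetical parameter, keeps $V_{i,i}$ finite when $h_i = 0$;
- V and G are refreshed from the newest F.
Gama and kesi from the initialization list are omitted because the text does not spell out their role in this loop.

```python
import numpy as np

def nuclear_norm_lp(Y, W, labeled, mu_labeled=1e10, eps=1e-3, delta=1e-6,
                    tol=1e-2, max_iter=10):
    """Alternately update F, V and G; return soft labels F (C x N) and hard labels."""
    N = W.shape[0]
    I = np.eye(N)
    mu = np.where(labeled, mu_labeled, 0.0)   # diagonal of U
    v = np.ones(N)                            # V = I
    G = np.eye(N)                             # G = I
    F = Y.copy()                              # F = Y
    for _ in range(max_iter):
        # Fix G and V, update F.
        A = (I - W.T) @ G.T @ G @ (I - W)
        D = np.diag(mu * v)
        F_new = np.linalg.solve((A + D).T, (Y @ D).T).T
        # Fix F, update V_ii = 1 / (2 ||h_i||_2), h_i = i-th row of F^T - Y^T.
        v = 1.0 / (2.0 * np.maximum(np.linalg.norm(F_new.T - Y.T, axis=1), delta))
        # Fix F, update G = (M M^T)^(-1/4), M = F^T - W F^T, singular values floored.
        M = F_new.T - W @ F_new.T
        lam, Q = np.linalg.eigh(M @ M.T)
        sigma = np.sqrt(np.clip(lam, 0.0, None))
        G = Q @ np.diag(np.maximum(sigma, eps) ** -0.5) @ Q.T
        # Stop when ||F_{k+1} - F_k||_F < tol (or after max_iter iterations).
        converged = np.linalg.norm(F_new - F) < tol
        F = F_new
        if converged:
            break
    return F, F.argmax(axis=0)

# Hypothetical example: two separate 5-sample chains, one labeled sample in each.
P = np.zeros((5, 5))
for i in range(4):
    P[i, i + 1] = P[i + 1, i] = 1.0
P /= P.sum(axis=1, keepdims=True)             # row-normalized chain weights
W = np.zeros((10, 10))
W[:5, :5] = P
W[5:, 5:] = P
Y = np.zeros((2, 10))
Y[0, 0] = 1.0                                 # sample 0 -> class 0
Y[1, 5] = 1.0                                 # sample 5 -> class 1
labeled = np.zeros(10, dtype=bool)
labeled[[0, 5]] = True
F, predicted_labels = nuclear_norm_lp(Y, W, labeled)
print(predicted_labels)                       # [0 0 0 0 0 1 1 1 1 1]
```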
The main cost of this algorithm is computing $G_{k+1}$, which requires a singular value decomposition. Its computational complexity is therefore the same as that of the RPCA-based Inexact ALM method.
Step S102: use F to compute the soft class label vector of each test sample, and assign the test sample to the class with the largest probability in that vector, which yields an accurate classification result.
In the present invention, each element of the soft class label vector is the probability that the test sample belongs to the corresponding class.
Alternately updating F and the other variables of the model yields the soft class label matrix F of all samples.

The class label of each sample $x_{new}$ whose label is unknown is then $\arg\max_{i \le c} (f_{new})_i$, where $f_{new}$ is the soft label vector of $x_{new}$. In other words, the class of a sample with an unknown label is the position of the largest class-membership probability in $f_{new}$, and this completes the classification.

Likewise, for the soft label matrix F produced by the iterations, the hard label of each unlabeled training sample $x_j$ is $\arg\max_{i \le c} (f_j)_i$, the index of the largest element of its predicted soft label vector $f_j$. The predicted classes of the unlabeled training samples are thus read off from the largest entries of their soft labels.
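The final step is a column-wise argmax over F. A minimal sketch follows, with a hypothetical helper name and toy soft labels chosen only for illustration. Indices are 0-based here, whereas the text numbers classes 1 to c.

```python
import numpy as np

def hard_labels(F, columns=None):
    """Class of each sample = index of the largest entry of its soft label column.

    F is C x N; columns optionally selects samples (e.g. the unlabeled ones).
    """
    cols = F if columns is None else F[:, columns]
    return cols.argmax(axis=0)

F = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.8, 0.9]])
print(hard_labels(F), hard_labels(F, [1, 2]))   # [0 1 1] [1 1]
```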
Corresponding to the nuclear-norm-driven data classification method of the above embodiment, an embodiment of the present invention further provides a nuclear-norm-driven data classification system. Referring to Fig. 2, the system 200 is based on the idea of using the nuclear norm as the distance metric and may include the following modules.
Training preprocessing module 201, configured to:
- determine the initial class matrix Y from the original label information of the training samples in the training set;
- construct the similarity measure matrix by performing a neighbor search on all training samples;
- symmetrize and normalize the similarity measure matrix to obtain the weight coefficient matrix W.
The training set includes labeled training samples and unlabeled training samples, and W characterizes the neighborhood relations between samples.
Training module 202, configured to build, based on Y and W, a transductive label propagation model that balances the neighborhood reconstruction term against the label fitting term, and to iteratively optimize the model to obtain the soft class label prediction matrix F of the training set. The neighborhood reconstruction term is a reconstruction error term measured by the reliable nuclear norm and defines the manifold smoothness term. The label fitting term is regularized by the weighted L2,1 norm.
Class determination module 203, configured to compute the soft class label vector of a test sample from F and to assign the test sample to the class with the largest probability in that vector, which yields an accurate classification result. Each element of the soft class label vector is the probability that the test sample belongs to the corresponding class.
Table 1 compares the classification results of the method of the invention with those of the following methods, giving the mean and the highest recognition rate of each:
- SparseNP (Sparse Neighborhood Propagation);
- SLP (Special Label Propagation);
- LNP (Linear Neighborhood Propagation);
- LLGC (Learning with Local and Global Consistency);
- LapLDA (Laplacian Linear Discriminant Analysis);
- GFHF (Gaussian Fields and Harmonic Functions);
- CD-LNP (Prior Class Dissimilarity based LNP).
Fig. 3 is a schematic diagram of a practical application scenario of the nuclear-norm-driven data classification method disclosed in the embodiment of the present invention.
In this example, the compared SparseNP, LNP and LapLDA methods use the default parameters given for their algorithms in the respective papers, and all classification uses a K-nearest-neighbor (K = 7) classifier. The labeled and test data are chosen as follows:
- From the COIL20 object image dataset and the HP0 machine error experiment training data, 15 and 2 samples per class, respectively, are randomly selected as labeled data. The remaining unlabeled data form the test set.
- From the GTF face image data, 5 and 7 samples per class are randomly selected as labeled data in two groups of experiments, respectively. The remaining unlabeled data form the test set.
Table 1
The experimental results show that the classification performance of the present invention is clearly better than that of the other related methods compared. The method is also more stable, which gives it a certain advantage.
Note that the embodiments in this specification are described progressively. Each embodiment focuses on its differences from the others, and identical or similar parts can be found by referring between embodiments. The system embodiment is basically similar to the method embodiment, so it is described relatively briefly; for related details, see the description of the method embodiment.
The nuclear-norm-driven data classification method and system provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and embodiments of the present invention; they are intended only to help readers understand the method and its core idea. Those of ordinary skill in the art can make several improvements and modifications to the present invention without departing from its principles, and such improvements and modifications also fall within the protection scope of the claims of the present invention.

Claims (4)

1. A nuclear-norm-driven data classification method, characterized in that it is based on the idea of using the nuclear norm as the distance metric and comprises the following steps:
determining an initial class matrix Y from the original label information of the training samples in a training set; constructing a similarity measure matrix by performing a neighbor search on all the training samples; and symmetrizing and normalizing the similarity measure matrix to obtain a weight coefficient matrix W, wherein the training set comprises labeled training samples and unlabeled training samples, and W characterizes the neighborhood relations between samples;
based on Y and W, building a transductive label propagation model that balances a neighborhood reconstruction term against a label fitting term, and iteratively optimizing the model to obtain the soft class label prediction matrix F of the training set, wherein the neighborhood reconstruction term is a reconstruction error term measured by the reliable nuclear norm and defines the manifold smoothness term, and the label fitting term is regularized by the weighted L2,1 norm;
computing the soft class label vector of a test sample from F, and assigning the test sample to the class with the largest probability in that vector, thereby obtaining an accurate classification result, wherein each element of the soft class label vector is the probability that the test sample belongs to the corresponding class.
2. the method for claim 1, it is characterised in that described by all described training samples are performed neighbor search Operative configuration obtains similarity measure matrix, and described similarity measure matrix is carried out symmetrization, normalized obtains weight system Matrix number W, including:
Sample each in described training set is carried out k nearest neighbor search, finds out each sample in described training set K distance Near sample;
Use the construction algorithm of LLE-reconstruct power, the similarity measure matrix of structure similar neighborhoods figure;
Described similarity measure matrix is carried out symmetrization, normalized, obtains described weight coefficient matrix W.
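The graph construction of claim 2 can be sketched in pure Python as follows. It follows the standard LLE recipe for reconstruction weights: minimize ‖x_i − Σ_j w_ij x_j‖² over the k nearest neighbors subject to Σ_j w_ij = 1, solved from the local Gram system C w = 1 and then rescaled. The regularization constant `reg`, the symmetrization (S + Sᵀ)/2 and the symmetric normalization D^(-1/2) W D^(-1/2) with d_i = Σ_j |W_ij| are assumptions, since the claim names the symmetrization and normalization steps without specifying them; all function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def k_nearest(X: Sequence[Sequence[float]], i: int, k: int) -> List[int]:
    """Indices of the k samples closest to X[i] in Euclidean distance (X[i] excluded)."""
    dists = sorted((math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
    return [j for _, j in dists[:k]]


def solve(A: Matrix, b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(A)
    M = [list(A[r]) + [b[r]] for r in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def lle_similarity(X: Sequence[Sequence[float]], k: int, reg: float = 1e-3) -> Matrix:
    """Row i holds the LLE weights reconstructing x_i from its k nearest neighbors;
    each row sums to 1. `reg` is an assumed Tikhonov term for a singular local Gram matrix."""
    n = len(X)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nbrs = k_nearest(X, i, k)
        Z = [[a - b for a, b in zip(X[j], X[i])] for j in nbrs]  # neighbors shifted to x_i
        C = [[sum(p * q for p, q in zip(Z[a], Z[b])) for b in range(k)] for a in range(k)]
        trace = sum(C[a][a] for a in range(k))
        for a in range(k):
            C[a][a] += reg * (trace if trace > 0 else 1.0)
        w = solve(C, [1.0] * k)
        total = sum(w)
        for j, wj in zip(nbrs, w):
            S[i][j] = wj / total  # enforce the sum-to-one constraint
    return S


def weight_coefficient_matrix(X: Sequence[Sequence[float]], k: int) -> Matrix:
    """Symmetrize S by averaging with its transpose, then apply the (assumed)
    symmetric normalization D^{-1/2} W D^{-1/2} with d_i = sum_j |W_ij|."""
    S = lle_similarity(X, k)
    n = len(S)
    W = [[(S[i][j] + S[j][i]) / 2.0 for j in range(n)] for i in range(n)]
    d = [sum(abs(v) for v in row) or 1.0 for row in W]
    return [[W[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]
```

For example, `weight_coefficient_matrix([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]], k=2)` returns a symmetric 5 × 5 matrix with a zero diagonal. LLE weights may be negative, which is why the degrees d_i use absolute values.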
3. The method of claim 1 or 2, characterized in that the balanced neighbor reconstruction term and label fitting term are realized by solving the following objective-function minimization problem:
$$\min_{F} \hat{J} = \left\| F^{T} - W F^{T} \right\|_{*} + \operatorname{tr}\left( \left(F^{T} - Y^{T}\right)^{T} U V \left(F^{T} - Y^{T}\right) \right)$$
where $\|F^{T} - W F^{T}\|_{*}$ is the neighbor reconstruction term and $\operatorname{tr}\left((F^{T} - Y^{T})^{T} U V (F^{T} - Y^{T})\right)$ is the label fitting term; $U$ is the matrix of positive balance parameters, with elements $\mu_i$, that balances the neighbor reconstruction term against the label fitting term; $V$ is the diagonal matrix with elements $V_{i,i} = 1/(2\|h_i\|_2)$ $(i = 1, 2, \ldots, l+u)$, where $h_i$ is the $i$-th row vector of the matrix $F^{T} - Y^{T}$.
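With U and V diagonal, the label fitting term reduces to a per-sample sum. When V is computed from the current F, each summand is μ_i‖h_i‖²/(2‖h_i‖) = μ_i‖h_i‖/2, so the term equals one half of a μ-weighted L2,1 norm of F^T − Y^T, which is consistent with the weighted L2,1-norm regularization named in claim 1 and suggests why V would be refreshed at each iteration. The sketch below evaluates the term under the assumption U = diag(μ_1, …, μ_{l+u}); the small floor `eps` on ‖h_i‖ and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def reweighting_diagonal(H: List[List[float]], eps: float = 1e-8) -> List[float]:
    """Diagonal of V: V_ii = 1 / (2 * ||h_i||_2), with h_i the i-th row of H.
    `eps` is an assumed floor that avoids division by zero when h_i = 0."""
    return [1.0 / (2.0 * max(math.sqrt(sum(x * x for x in row)), eps)) for row in H]


def label_fitting_term(F: List[List[float]], Y: List[List[float]], mu: Sequence[float]) -> float:
    """tr((F^T - Y^T)^T U V (F^T - Y^T)), assuming U = diag(mu).

    F and Y are c x n (one row per class); mu has one entry per training sample.
    With U and V diagonal the trace reduces to sum_i mu_i * V_ii * ||h_i||^2.
    """
    c, n = len(F), len(F[0])
    H = [[F[r][i] - Y[r][i] for r in range(c)] for i in range(n)]  # H = F^T - Y^T (n x c)
    v = reweighting_diagonal(H)
    return sum(mu[i] * v[i] * sum(x * x for x in H[i]) for i in range(n))


F = [[0.7, 0.6],
     [0.4, 0.8]]
Y = [[1.0, 0.0],
     [0.0, 0.0]]  # sample 0 labeled as class 0, sample 1 unlabeled
# ||h_0|| = 0.5 and ||h_1|| = 1.0, so the term is 0.5 * (2.0 * 0.5 + 0.5 * 1.0).
print(label_fitting_term(F, Y, mu=[2.0, 0.5]))  # approximately 0.75
```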
4. A nuclear-norm-driven data classification system, characterized in that it is based on the idea of using the nuclear norm as the distance metric, and the system comprises:
a training preprocessing module, configured to determine an initial class matrix Y from the original label information of the training samples in a training set, to construct a similarity measure matrix by performing a neighbor search on all the training samples, and to symmetrize and normalize the similarity measure matrix to obtain a weight coefficient matrix W; wherein the training set comprises labeled training samples and unlabeled training samples, and the weight coefficient matrix W characterizes the neighborhood relationships between samples;
a training module, configured to establish a transductive label propagation model from the initial class matrix Y and the weight coefficient matrix W by balancing a neighbor reconstruction term and a label fitting term, and to iteratively optimize the transductive label propagation model to obtain the soft class label prediction matrix F of the training set; wherein the neighbor reconstruction term is a reconstruction error term measured by the reliable nuclear norm and is used to define a manifold smoothness term, and the label fitting term is based on weighted L2,1-norm regularization;
a class determination module, configured to compute the soft class label vector of a test sample using the soft class label prediction matrix F, and to determine the class of the test sample as the class corresponding to the largest probability in the soft class label vector, so as to obtain an accurate classification result; wherein the elements of the soft class label vector are the probabilities that the test sample belongs to each class.
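The three modules of claim 4 correspond one-to-one to the three steps of claim 1. The sketch below only shows how data could flow between them and makes assumptions the claim does not: W is supplied precomputed, the training module runs the classic iterative label-propagation update F ← αFWᵀ + (1 − α)Y as a stand-in for the nuclear-norm and weighted L2,1 model, and the class determination module uses a neighbor-weighted soft label for the test sample. The class names, `alpha` and `iterations` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]


class TrainingPreprocessingModule:
    """Builds Y from the labels; W is passed in already symmetrized and normalized (assumption)."""

    def run(self, labels: Sequence[Optional[int]], num_classes: int, W: Matrix) -> Tuple[Matrix, Matrix]:
        n = len(labels)
        Y = [[1.0 if labels[j] == r else 0.0 for j in range(n)] for r in range(num_classes)]
        return Y, W


class TrainingModule:
    """STAND-IN solver: classic iterative label propagation
    F <- alpha * F W^T + (1 - alpha) * Y, NOT the nuclear-norm / weighted L2,1 model."""

    def __init__(self, alpha: float = 0.9, iterations: int = 200):
        self.alpha, self.iterations = alpha, iterations

    def run(self, Y: Matrix, W: Matrix) -> Matrix:
        c, n = len(Y), len(Y[0])
        F = [row[:] for row in Y]
        for _ in range(self.iterations):
            # Each sample j averages its neighbors' scores and is pulled back toward Y.
            F = [[self.alpha * sum(F[r][k] * W[j][k] for k in range(n)) + (1 - self.alpha) * Y[r][j]
                  for j in range(n)] for r in range(c)]
        return F


class ClassDeterminationModule:
    """Soft label vector of a test sample from its (assumed) training-set neighbors, then argmax."""

    def run(self, F: Matrix, neighbor_ids: Sequence[int], weights: Sequence[float]) -> Tuple[int, List[float]]:
        f = [max(0.0, sum(w * F[r][j] for j, w in zip(neighbor_ids, weights))) for r in range(len(F))]
        total = sum(f) or 1.0
        probs = [v / total for v in f]
        return max(range(len(probs)), key=probs.__getitem__), probs


# Four samples on a chain 0-1-2-3 (row-normalized W); only the two ends are labeled.
W = [[0.0, 1.0, 0.0, 0.0], [0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5], [0.0, 0.0, 1.0, 0.0]]
Y, W = TrainingPreprocessingModule().run([0, None, None, 1], 2, W)
F = TrainingModule().run(Y, W)
label, probs = ClassDeterminationModule().run(F, neighbor_ids=[1], weights=[1.0])
print(label)  # 0: the test sample's only neighbor is sample 1, nearest the class-0 end
```

Replacing the stand-in `TrainingModule.run` with a solver for the objective in claim 3 would leave the other two modules unchanged, which is one practical benefit of splitting the system this way.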
CN201610554118.8A 2016-07-14 2016-07-14 The data classification method of a kind of nuclear norm driving and system Pending CN106203517A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610554118.8A CN106203517A (en) 2016-07-14 2016-07-14 The data classification method of a kind of nuclear norm driving and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610554118.8A CN106203517A (en) 2016-07-14 2016-07-14 The data classification method of a kind of nuclear norm driving and system

Publications (1)

Publication Number Publication Date
CN106203517A (en) 2016-12-07

Family

ID=57475253

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610554118.8A Pending CN106203517A (en) 2016-07-14 2016-07-14 The data classification method of a kind of nuclear norm driving and system

Country Status (1)

Country Link
CN (1) CN106203517A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108198268A (en) * 2017-12-19 2018-06-22 江苏极熵物联科技有限公司 A kind of production equipment data scaling method

Similar Documents

Publication Publication Date Title
CN104463202A (en) Multi-class image semi-supervised classifying method and system
CN103810288B (en) Method for carrying out community detection on heterogeneous social network on basis of clustering algorithm
CN104966105A (en) Robust machine error retrieving method and system
CN105608471A (en) Robust transductive label estimation and data classification method and system
CN104794489A (en) Deep label prediction based inducing type image classification method and system
CN103942749B (en) A kind of based on revising cluster hypothesis and the EO-1 hyperion terrain classification method of semi-supervised very fast learning machine
CN112685504B (en) Production process-oriented distributed migration chart learning method
CN107943856A (en) A kind of file classification method and system based on expansion marker samples
CN112733866A (en) Network construction method for improving text description correctness of controllable image
CN111666406A (en) Short text classification prediction method based on word and label combination of self-attention
CN108830301A (en) The semi-supervised data classification method of double Laplace regularizations based on anchor graph structure
CN116644755B (en) Multi-task learning-based few-sample named entity recognition method, device and medium
CN115131613B (en) Small sample image classification method based on multidirectional knowledge migration
CN106156805A (en) A kind of classifier training method of sample label missing data
CN105335619A (en) Collaborative optimization method applicable to parameter back analysis of high calculation cost numerical calculation model
CN109919236A (en) A kind of BP neural network multi-tag classification method based on label correlation
CN106529604A (en) Adaptive image tag robust prediction method and system
CN109993188B (en) Data tag identification method, behavior identification method and device
CN112861626B (en) Fine granularity expression classification method based on small sample learning
Naved et al. IoT-Enabled Convolutional Neural Networks: Techniques and Applications
CN103093239B (en) A kind of merged point to neighborhood information build drawing method
CN112529057A (en) Graph similarity calculation method and device based on graph convolution network
Song An Evaluation Method of English Teaching Ability Based on Deep Learning
CN106203517A (en) The data classification method of a kind of nuclear norm driving and system
CN104573727A (en) Dimension reduction method of handwritten digital image

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20161207
