CN108805155A - Semi-supervised classification method for simultaneously learning an affinity matrix and Laplacian regularized least squares - Google Patents

Semi-supervised classification method for simultaneously learning an affinity matrix and Laplacian regularized least squares

Info

Publication number
CN108805155A
CN108805155A CN201810233453.7A CN201810233453A CN 108805155 A CN 201810233453 A
Authority
CN
China
Prior art keywords
sample
least square
Laplacian regularization
classifier
affinity matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810233453.7A
Other languages
Chinese (zh)
Inventor
王迪
张磊
张笑钦
古楠楠
叶修梓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cangnan Institute Of Cangnan
Original Assignee
Cangnan Institute Of Cangnan
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cangnan Institute Of Cangnan filed Critical Cangnan Institute Of Cangnan
Priority to CN201810233453.7A priority Critical patent/CN108805155A/en
Publication of CN108805155A publication Critical patent/CN108805155A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/2431Multiple classes

Abstract

The invention discloses a semi-supervised classification method that simultaneously learns an affinity matrix and a Laplacian regularized least squares classifier, mainly including the following steps: first, a joint model that simultaneously learns the affinity matrix and the Laplacian regularized least squares classifier is established from the training samples; second, each block of variables in the model is iteratively optimized by a block coordinate descent algorithm; finally, the Laplacian regularized least squares classifier is used to obtain the soft label of a sample, and the dimension with the largest element in the label vector is chosen as the class of the sample. The invention effectively fuses the sparse self-representation problem of the samples with the Laplacian regularized least squares classifier, so that the affinity matrix and the classifier are optimized simultaneously and mutually improved during learning. The invention has an explicit classifier function and can therefore effectively handle out-of-sample data. Compared with other semi-supervised classification methods, this method achieves higher classification accuracy and has good application prospects.

Description

Semi-supervised classification method for simultaneously learning an affinity matrix and Laplacian regularized least squares
Technical field
The present invention relates to the field of pattern recognition technology, and in particular to a semi-supervised classification method that simultaneously learns an affinity matrix (Affinity Matrix) and a Laplacian regularized least squares (Laplacian Regularized Least Square, Lap-RLS) classifier.
Background technology
In practical applications, the performance of a classification method depends on the number of labeled samples in the training set. However, obtaining labeled samples in real life is extremely difficult, expensive and time-consuming, and requires a great deal of effort from domain experts. On the other hand, thanks to the rapid development of data acquisition techniques and computer hardware, large quantities of unlabeled samples can be obtained very easily. Consequently, semi-supervised learning (Semi-Supervised Learning, SSL), which trains with a small number of labeled samples and a large number of unlabeled samples, has become a research hotspot in pattern recognition and machine learning, and has been widely applied in fields such as image classification and face recognition.
Under the assumption that neighboring samples in the same cluster or on the same data manifold are highly likely to share the same label, graph-based semi-supervised learning (Graph based Semi-Supervised Learning, G-SSL) has attracted wide attention from researchers. Its core idea is to predict the labels of unlabeled samples from the given partial labels and the consistency of pairwise affinities between samples, i.e., to propagate labels through pairwise-associated data. In general, graph-based semi-supervised learning algorithms mainly solve two key problems: (1) constructing the affinity matrix between samples; (2) predicting the labels of the unlabeled samples.
The concept of affinity was first proposed in the weight matrix used to define a data graph; the weight matrix represents the similarity between samples. Zhou et al. construct a neighborhood graph based on Euclidean distance to choose k nearest neighbors, and then compute the edge weight matrix with a heat kernel. Wang et al. approximate the entire graph with a series of overlapping linear neighborhood patches, computing the edge weights of each patch by neighborhood linear projection. Liu et al. propose the efficient anchor graph, which constructs affinities by expressing each sample as a linear combination of its neighboring anchor points. To avoid the parameter-selection problem of neighborhood graphs and to obtain adaptive graphs, researchers have proposed global self-representation methods such as sparse representation (Sparse Representation, SR) and low-rank representation (Low-Rank Representation, LRR); their main idea is to compute the sparse or low-rank representation coefficients of each sample under the other samples, and then construct the affinity matrix from the representation coefficients. Global self-representation methods capture the global structure of the data well and are an effective tool for semi-supervised learning.
After the affinity matrix is obtained, the labels of the unlabeled samples can be estimated from it by different prediction mechanisms, such as Gaussian fields and harmonic functions (Gaussian Fields and Harmonic Functions, GFHF), learning with local and global consistency (Learning with Local and Global Consistency, LLGC), manifold regularization (Manifold Regularization, MR), Markov random walks (Markov Random Walks, MRW), special label propagation (Special Label Propagation, SLP), and the spectral graph transducer (Spectral Graph Transducer, SGT).
Although the above graph-based semi-supervised learning methods show excellent performance in classification tasks, they usually first construct the affinity matrix and then estimate the sample labels from it by some prediction mechanism; that is, the construction of the affinity matrix and the label prediction are carried out separately in two steps, which cannot fully exploit the latent connection between the affinity matrix and the sample labels.
To overcome this drawback, Li et al. merged the two independent processes into one joint optimization model, called self-taught semi-supervised learning (Self-Taught Semi-Supervised Learning, STSSL), which learns the affinity matrix and the unknown sample labels simultaneously. Its advantage is that both the given labels and the predicted labels can be fully used to continually improve the affinity matrix, thereby further raising the accuracy of label propagation (Label Propagation, LP). However, this method is transductive, i.e., it has no explicit decision function for predicting the labels of unlabeled samples, so it cannot effectively handle out-of-sample data. In addition, the label propagation method used in its optimization is also unsuitable for data sets with complex structure.
Invention content
The purpose of the invention is to overcome the above shortcomings and defects of the prior art by providing a semi-supervised classification method that simultaneously learns an affinity matrix and a Laplacian regularized least squares classifier. The method effectively fuses the sparse self-representation problem of the samples with the Laplacian regularized least squares classifier, establishes a self-taught Laplacian regularized least squares (Self-taught Laplacian Regularized Least Square, ST-LapRLS) model, and during learning realizes simultaneous optimization and mutual improvement of the sample affinity matrix and the Laplacian regularized least squares classifier. More importantly, the invention has an explicit classifier function and can therefore effectively handle the out-of-sample problem.
To achieve the above object, the technical scheme of the invention is that the method includes:
S1: building, from the training samples, a joint model that simultaneously learns the affinity matrix and the Laplacian regularized least squares classifier;
S2: iteratively optimizing each block of variables in the joint model with a block coordinate descent algorithm until convergence;
S3: computing the soft label of a sample to be classified with the Laplacian regularized least squares classifier, and choosing the dimension with the largest element in the soft label vector as the class of the sample to be classified.
Further, the step S1 includes the following steps:
S11: building the affinity matrix between training samples by sparse self-representation; that is, if a sample can be sparsely represented by several other samples, the sample has strong affinities with those samples;
S12: building the Laplacian graph relation between sample labels, i.e., samples with strong affinity should have similar labels;
S13: effectively embedding the sparse self-representation problem of the samples into the Laplacian regularized least squares classifier, and establishing the joint model that simultaneously learns the affinity matrix and the Laplacian regularized least squares classifier.
Further, the step S2 includes the following sub-steps:
S21: decomposing the established joint model into a Laplacian regularized least squares classifier subproblem and a sparse self-representation subproblem, and solving the two subproblems alternately by iteration;
S22: for the Laplacian regularized least squares classifier subproblem, solving for the classifier coefficient matrix analytically by setting the gradient to zero;
S23: for the sparse self-representation subproblem, introducing an auxiliary variable to decouple the correlations of the sparse self-representation coefficients during optimization, and solving with the alternating direction method of multipliers.
Further, the step S3 includes the following sub-steps:
S31: computing the soft label of the sample to be classified with the Laplacian regularized least squares classifier;
S32: finding the dimension with the largest element in the soft label vector, and taking that dimension as the class of the sample to be classified.
The beneficial effects of the invention are as follows:
1. The invention proposes a completely new and general semi-supervised classification method that is suitable for arbitrary classification data types (e.g., face recognition, object classification). Specifically, the estimated labels are fully used to generate a good data affinity matrix, and a good affinity matrix in turn further improves the performance of the classifier; that is, the classifier and the affinity matrix mutually promote each other during learning.
2. The classifier proposed by the invention is explicit, so it can easily handle the out-of-sample problem.
3. Compared with the hard labels obtained by label propagation methods, the soft labels obtained by the Laplacian regularized least squares classifier better fit the manifold assumption in semi-supervised learning, so the invention can handle classification tasks on complex data.
4. The invention solves the built semi-supervised classification model quickly and effectively by a block coordinate descent algorithm. In particular, the coefficient matrix of the Laplacian regularized least squares classifier is solved analytically by setting the gradient to zero, and the variables of the sparse self-representation problem are solved iteratively by the alternating direction method of multipliers.
Description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention; for those of ordinary skill in the art, other drawings obtained from these drawings without creative effort still fall within the scope of the invention.
Fig. 1 is a flow chart of the method of the present invention;
Fig. 2 is an overall operational flow chart of the present invention.
Specific implementation mode
To make the object, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings.
As shown in Fig. 1 and Fig. 2, in an embodiment of the present invention, the invention is a semi-supervised classification method that simultaneously learns an affinity matrix and Laplacian regularized least squares. The hardware and programming language used to run the method of the invention are not restricted; it can be implemented in any language, so other operating modes are not repeated here.
The embodiment of the present invention uses a computer with an Intel Xeon E5 central processing unit and 16 GB of memory, and uses the Matlab language to program the semi-supervised classification working procedure that simultaneously learns the affinity matrix and Laplacian regularized least squares, thereby realizing the method of the present invention.
The semi-supervised classification method of the invention that simultaneously learns the affinity matrix and Laplacian regularized least squares mainly includes three steps: establishing the joint model, the block coordinate descent optimization algorithm, and the classification strategy for samples.
Before introducing the specific steps, the notation used below is introduced.
Given a data set S = {x_1, x_2, …, x_{l+u}} ⊂ R^d containing C classes of samples, assume without loss of generality that the first l elements of S are labeled samples and the remaining u elements are unlabeled samples. The matrices formed by the labeled and unlabeled samples are denoted X_l and X_u respectively, i.e., X_l = [x_1, …, x_l] and X_u = [x_{l+1}, …, x_{l+u}]. Correspondingly, Y_l = [y_1, …, y_l] and Y_u = [y_{l+1}, …, y_{l+u}] are the known and unknown label matrices respectively, where each label y_i is a C-dimensional 0-1 vector: if x_i belongs to the j-th class, the element in the j-th position of y_i is 1 and all other elements are 0. The matrix formed by all samples is denoted X, i.e., X = [X_l, X_u], and the label matrix is denoted Y, i.e., Y = [Y_l, Y_u]. ||A||_1 denotes the ℓ1 norm of matrix A, i.e., ||A||_1 = Σ_i Σ_j |a_ij|.
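As a concrete illustration of the 0-1 label convention above, a minimal sketch follows (not part of the patent; the helper name `build_label_matrix` is our own):

```python
import numpy as np

def build_label_matrix(labels, C):
    """Build the C x l 0-1 label matrix Y_l: column i is the one-hot
    encoding of the class of sample x_i (classes numbered 0..C-1)."""
    l = len(labels)
    Y = np.zeros((C, l))
    Y[labels, np.arange(l)] = 1.0  # set the j-th position of each column
    return Y

# Example: 4 labeled samples drawn from C = 3 classes
Yl = build_label_matrix([0, 2, 1, 2], C=3)
print(Yl)
```

Each column sums to one, matching the definition that exactly one position of y_i equals 1.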
For a classification problem containing C classes of samples, the specific steps are as follows:
S1: Build the joint model that simultaneously learns the affinity matrix and the Laplacian regularized least squares classifier. This mainly includes:
a) Building the affinity matrix between training samples. The classical sparse self-representation problem is:
min_{A,E} ||A||_1 + λ||E||_1, s.t. X = XA + E, diag(A) = 0, ①
where E is the error matrix, A is the sparse self-representation matrix, and diag(A) = 0 means that all diagonal elements of A are zero, which avoids the trivial solution in which a sample is linearly expressed by itself.
Denote the optimal solution of problem ① by A* = [a*_1, …, a*_{l+u}], where a*_i contains the sparse representation coefficients of sample x_i under the other samples. If x_i can be sparsely represented by some other samples, there are strong affinities between x_i and those samples, so the affinity matrix between samples can be constructed from the sparse representation coefficients, i.e.,
W = (|A| + |A^T|)/2. ②
b) Building the Laplacian graph relation between sample labels. Samples with strong affinity should have similar labels, so the Laplacian graph relation of the sample labels is constructed as
(1/2) Σ_{i,j} W_ij ||y_i − y_j||_2^2 = tr(Y L Y^T), ③
where L is the Laplacian matrix, L = D − W, and D is the diagonal matrix whose diagonal elements are D_ii = Σ_j W_ij.
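The identity in ③ can be verified numerically (an illustrative sketch, not part of the patent):

```python
import numpy as np

def graph_laplacian(W):
    """L = D - W, where D is the diagonal degree matrix D_ii = sum_j W_ij."""
    D = np.diag(W.sum(axis=1))
    return D - W

# Check tr(Y L Y^T) = 1/2 * sum_ij W_ij * ||y_i - y_j||^2 on random data
rng = np.random.default_rng(0)
W = rng.random((5, 5)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
Y = rng.random((3, 5))  # columns are label vectors y_i
L = graph_laplacian(W)
lhs = np.trace(Y @ L @ Y.T)
rhs = 0.5 * sum(W[i, j] * np.linalg.norm(Y[:, i] - Y[:, j]) ** 2
                for i in range(5) for j in range(5))
print(np.isclose(lhs, rhs))  # True
```

This smoothness term is what penalizes dissimilar labels on strongly affine samples.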
c) Combining the sparse self-representation problem of the samples with the Laplacian regularized least squares classifier, the self-taught Laplacian regularized least squares model is established:
min_{B,A,E} ||(Y − BK)S||_F^2 + γ tr(BKB^T) + β tr(BKLKB^T) + α||A||_1 + λ||E||_1,
s.t. X = XA + E, diag(A) = 0, ④
where L is the Laplacian matrix induced by A through ② and ③. Here α, β, γ and λ are positive trade-off parameters, K = [k(x_i, x_j)] ∈ R^{(l+u)×(l+u)} is the kernel matrix, B is the coefficient matrix of the Laplacian regularized least squares classifier, and S ∈ R^{(l+u)×(l+u)} is the diagonal matrix whose first l diagonal elements are 1 and whose remaining u diagonal elements are 0, i.e., S = diag(1, …, 1, 0, …, 0).
S2: Iteratively optimize each block of variables in the proposed joint model with the block coordinate descent algorithm until convergence. This mainly includes:
a) The optimization problem is decomposed into the Laplacian regularized least squares classifier subproblem and the sparse self-representation subproblem, and the following two subproblems are solved alternately by iteration:
B^{(t+1)} = argmin_B ||(Y − BK)S||_F^2 + γ tr(BKB^T) + β tr(BK L^{(t)} K B^T), ⑤
(A^{(t+1)}, E^{(t+1)}) = argmin_{A,E} α||A||_1 + λ||E||_1 + β tr(Y^{(t+1)} L Y^{(t+1)T}), s.t. X = XA + E, diag(A) = 0. ⑥
In ⑤, L^{(t)} is the Laplacian matrix induced by A^{(t)}; in ⑥, Y^{(t+1)} = B^{(t+1)}K, i.e., the soft label matrix of the samples predicted by the classifier, and L is the Laplacian matrix induced by A.
b) Fix the sparse self-representation matrix A^{(t)} and the error matrix E^{(t)}, and solve for the Laplacian regularized least squares classifier coefficient matrix B^{(t+1)}. Since subproblem ⑤ is convex and continuously differentiable in the variable B, setting the gradient with respect to B to zero yields the analytic solution
B^{(t+1)} = YS(KS + γI + βKL^{(t)})^{-1}, ⑦
where I is the identity matrix.
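The closed form can be sketched as follows. This is an illustrative reconstruction under the assumptions stated in the code comments (the parameter names `beta`, `gamma` and all numerical values are our own, and the linear kernel is chosen only for simplicity):

```python
import numpy as np

def solve_classifier(Y, K, L, S, beta, gamma):
    """One plausible closed form for the LapRLS subproblem
       min_B ||(Y - BK)S||_F^2 + gamma*tr(BKB^T) + beta*tr(BKLKB^T):
       B = Y S (K S + gamma*I + beta*K L)^(-1).
    Y: C x n labels, K: n x n kernel, L: n x n Laplacian,
    S: n x n 0/1 diagonal selector of the labeled samples."""
    n = K.shape[0]
    M = K @ S + gamma * np.eye(n) + beta * (K @ L)
    return Y @ S @ np.linalg.inv(M)

# Tiny example: 6 samples in R^2, first 3 labeled, linear kernel
rng = np.random.default_rng(1)
X = rng.random((2, 6))
K = X.T @ X
W = rng.random((6, 6)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
L = np.diag(W.sum(axis=1)) - W
S = np.diag([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
Y = np.zeros((2, 6)); Y[0, :2] = 1; Y[1, 2] = 1
B = solve_classifier(Y, K, L, S, beta=0.1, gamma=0.01)
print(B.shape)  # (2, 6)
```

By construction, B satisfies B(KS + γI + βKL) = YS, which is the stationarity condition of the subproblem even when K itself is singular.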
c) Given the Laplacian regularized least squares classifier coefficient matrix B^{(t+1)}, solve for the sparse self-representation matrix A^{(t+1)} and the error matrix E^{(t+1)}. Since
tr(Y^{(t+1)} L Y^{(t+1)T}) = (1/2) Σ_{i,j} W_ij ||y_i^{(t+1)} − y_j^{(t+1)}||_2^2 = ||M ⊙ A||_1,
where the matrix M is defined by M_ij = (1/2)||y_i^{(t+1)} − y_j^{(t+1)}||_2^2, subproblem ⑥ can be rewritten as
min_{A,E} ||Θ ⊙ A||_1 + λ||E||_1, s.t. X = XA + E, diag(A) = 0, ⑧
where Θ_ij = α + βM_ij and ⊙ is the Hadamard product of matrices. To decouple the correlations of the sparse self-representation coefficients during optimization, an auxiliary variable Z is introduced, and for notational simplicity the superscripts of the variables are dropped; then problem ⑧ is equivalent to:
min_{A,E,Z} ||Θ ⊙ A||_1 + λ||E||_1, s.t. X = XZ + E, Z = A − diag(A). ⑨
Its augmented Lagrangian function is
L(A, E, Z, Λ^{(1)}, Λ^{(2)}) = ||Θ ⊙ A||_1 + λ||E||_1 + <Λ^{(1)}, X − XZ − E> + <Λ^{(2)}, Z − A + diag(A)> + (μ/2)(||X − XZ − E||_F^2 + ||Z − A + diag(A)||_F^2), ⑩
where Λ^{(1)}, Λ^{(2)} are the Lagrange multiplier matrices and <·,·> denotes the matrix inner product. Each variable of problem ⑨ is solved iteratively by the alternating direction method of multipliers, as follows:
The sub-optimization problem with respect to A is solved by
A_{κ+1} = S_{Θ/μ}(Z_κ + Λ^{(2)}_κ/μ), with the diagonal elements of A_{κ+1} then set to zero,
where S_η(·) is the entry-wise soft-thresholding operator, S_η(x) = sgn(x)·max(|x| − η, 0).
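The soft-thresholding operator is a one-liner; the sketch below (not part of the patent) also works entry-wise when the threshold η is a matrix such as Θ/μ:

```python
import numpy as np

def soft_threshold(x, eta):
    """Entry-wise soft-thresholding: S_eta(x) = sgn(x) * max(|x| - eta, 0).
    `eta` may be a scalar or an array of the same shape as `x`."""
    return np.sign(x) * np.maximum(np.abs(x) - eta, 0.0)

out = soft_threshold(np.array([3.0, -0.4, -2.5]), 1.0)
print(out)  # entries shrink toward zero; |x| <= eta maps to 0
```

Shrinking each entry toward zero by η is exactly the proximal operator of the weighted ℓ1 term in ⑩, which is why it appears in the A-update (and, with threshold λ/μ, in the E-update).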
The sub-optimization problem with respect to Z is solved by setting the gradient of the above function with respect to Z to zero, which gives the analytic solution
Z_{κ+1} = (X^TX + I)^{-1}(X^T(X − E_κ) + A_{κ+1} − diag(A_{κ+1}) + (X^TΛ^{(1)}_κ − Λ^{(2)}_κ)/μ).
The sub-optimization problem with respect to E is solved by
E_{κ+1} = S_{λ/μ}(X − XZ_{κ+1} + Λ^{(1)}_κ/μ).
Update the Lagrange multiplier matrices:
Λ^{(1)}_{κ+1} = Λ^{(1)}_κ + μ(X − XZ_{κ+1} − E_{κ+1}),
Λ^{(2)}_{κ+1} = Λ^{(2)}_κ + μ(Z_{κ+1} − A_{κ+1} + diag(A_{κ+1})).
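The alternating updates above can be collected into one inner-loop sketch. This is our own illustrative code, not the patent's implementation: the weight matrix `Theta`, the fixed penalty `mu`, the parameter values and the fixed iteration count are all assumptions made for the example.

```python
import numpy as np

def admm_sparse_self_rep(X, Theta, lam=1.0, mu=1.0, n_iter=100):
    """ADMM sketch for: min ||Theta ⊙ A||_1 + lam*||E||_1
       s.t. X = XZ + E, Z = A - diag(A).
    X: d x n data matrix, Theta: n x n positive weight matrix.
    Returns the sparse self-representation matrix A and error E."""
    d, n = X.shape
    A = np.zeros((n, n)); Z = np.zeros((n, n)); E = np.zeros((d, n))
    L1 = np.zeros((d, n)); L2 = np.zeros((n, n))
    inv = np.linalg.inv(X.T @ X + np.eye(n))          # factor once, reuse
    st = lambda V, eta: np.sign(V) * np.maximum(np.abs(V) - eta, 0.0)
    for _ in range(n_iter):
        A = st(Z + L2 / mu, Theta / mu)               # A-update (soft-threshold)
        np.fill_diagonal(A, 0.0)                      # enforce diag(A) = 0
        # Z-update; A - diag(A) = A since the diagonal was just zeroed
        Z = inv @ (X.T @ (X - E) + A + (X.T @ L1 - L2) / mu)
        E = st(X - X @ Z + L1 / mu, lam / mu)         # E-update
        L1 += mu * (X - X @ Z - E)                    # multiplier updates
        L2 += mu * (Z - A)
    return A, E

rng = np.random.default_rng(2)
X = rng.random((4, 8))
Theta = np.ones((8, 8))
A, E = admm_sparse_self_rep(X, Theta, lam=10.0, mu=1.0, n_iter=200)
print(np.abs(np.diag(A)).max())  # diagonal of A stays zero
```

A practical implementation would additionally stop on the constraint residuals ||X − XZ − E||_F and ||Z − A||_F and may increase μ over the iterations; those refinements are omitted here for brevity.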
S3: A method for determining the class label of a sample is proposed, which mainly includes:
a) For a sample z whose class is to be determined, the soft label of z is computed by the Laplacian regularized least squares classifier, i.e.,
y_z = F(z) = B k_z,
where k_z = [k(x_1, z), k(x_2, z), …, k(x_{l+u}, z)]^T.
b) Find the dimension with the largest element in the soft label vector, and take that dimension as the class of the sample, i.e.,
label(z) = argmax_{1≤j≤C} (y_z)_j.
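The out-of-sample rule y_z = B k_z followed by an argmax can be sketched as follows (illustrative only; the linear kernel and the toy coefficient matrix B are our own choices, not the patent's):

```python
import numpy as np

def classify(z, B, X_train, kernel):
    """Out-of-sample classification: soft label y_z = B k_z, then argmax.
    B: C x n classifier coefficients, X_train: d x n training samples."""
    kz = np.array([kernel(X_train[:, i], z) for i in range(X_train.shape[1])])
    y_soft = B @ kz
    return int(np.argmax(y_soft)), y_soft

lin = lambda a, b: float(a @ b)        # linear kernel k(a, b) = a^T b
X_train = np.array([[1.0, 0.0],
                    [0.0, 1.0]])       # two training samples in R^2
B = np.eye(2)                          # toy coefficient matrix, C = 2 classes
label, y = classify(np.array([0.9, 0.1]), B, X_train, lin)
print(label)  # 0
```

Because the classifier is an explicit function of z through k_z, any new sample can be classified without re-running the training optimization, which is the out-of-sample property emphasized above.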
Those of ordinary skill in the art will appreciate that all or part of the steps of the above embodiments may be completed by a program instructing the relevant hardware, and the program may be stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk or an optical disc.
The above disclosure is only a preferred embodiment of the present invention and certainly cannot limit the scope of the rights of the present invention; therefore, equivalent changes made according to the claims of the present invention still fall within the scope of the present invention.

Claims (4)

1. A semi-supervised classification method for simultaneously learning an affinity matrix and Laplacian regularized least squares, characterized in that the method includes:
S1: building, from the training samples, a joint model that simultaneously learns the affinity matrix and the Laplacian regularized least squares classifier;
S2: iteratively optimizing each block of variables in the joint model with a block coordinate descent algorithm until convergence;
S3: computing the soft label of a sample to be classified with the Laplacian regularized least squares classifier, and choosing the dimension with the largest element in the soft label vector as the class of the sample to be classified.
2. The semi-supervised classification method according to claim 1, characterized in that the step S1 includes the following steps:
S11: building the affinity matrix between training samples by sparse self-representation; that is, if a sample can be sparsely represented by several other samples, the sample has strong affinities with those samples;
S12: building the Laplacian graph relation between sample labels, i.e., samples with strong affinity should have similar labels;
S13: effectively embedding the sparse self-representation problem of the samples into the Laplacian regularized least squares classifier, and establishing the joint model that simultaneously learns the affinity matrix and Laplacian regularized least squares.
3. The semi-supervised classification method according to claim 1, characterized in that the step S2 includes the following sub-steps:
S21: decomposing the established joint model into a Laplacian regularized least squares classifier subproblem and a sparse self-representation subproblem, and solving the two subproblems alternately by iteration;
S22: for the Laplacian regularized least squares classifier subproblem, solving for the classifier coefficient matrix analytically by setting the gradient to zero;
S23: for the sparse self-representation subproblem, introducing an auxiliary variable to decouple the correlations of the sparse self-representation coefficients during optimization, and solving with the alternating direction method of multipliers.
4. The semi-supervised classification method according to claim 1, characterized in that the step S3 includes the following sub-steps:
S31: computing the soft label of the sample to be classified with the Laplacian regularized least squares classifier;
S32: finding the dimension with the largest element in the soft label vector, and taking that dimension as the class of the sample to be classified.
CN201810233453.7A 2018-03-21 2018-03-21 Semi-supervised classification method for simultaneously learning an affinity matrix and Laplacian regularized least squares Pending CN108805155A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810233453.7A CN108805155A (en) 2018-03-21 2018-03-21 Semi-supervised classification method for simultaneously learning an affinity matrix and Laplacian regularized least squares

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810233453.7A CN108805155A (en) 2018-03-21 2018-03-21 Semi-supervised classification method for simultaneously learning an affinity matrix and Laplacian regularized least squares

Publications (1)

Publication Number Publication Date
CN108805155A true CN108805155A (en) 2018-11-13

Family

ID=64095254

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810233453.7A Pending CN108805155A (en) 2018-03-21 2018-03-21 Semi-supervised classification method for simultaneously learning an affinity matrix and Laplacian regularized least squares

Country Status (1)

Country Link
CN (1) CN108805155A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110363178A (en) * 2019-07-23 2019-10-22 上海黑塞智能科技有限公司 The airborne laser point cloud classification method being embedded in based on part and global depth feature
CN110363178B (en) * 2019-07-23 2021-10-15 上海黑塞智能科技有限公司 Airborne laser point cloud classification method based on local and global depth feature embedding
CN111160398A (en) * 2019-12-06 2020-05-15 重庆邮电大学 Missing label multi-label classification method based on example level and label level association
CN111160398B (en) * 2019-12-06 2022-08-23 重庆邮电大学 Missing label multi-label classification method based on example level and label level association
CN112801162A (en) * 2021-01-22 2021-05-14 之江实验室 Adaptive soft label regularization method based on image attribute prior
CN115240863A (en) * 2022-08-11 2022-10-25 合肥工业大学 Alzheimer disease classification method and system for data loss scene
CN115240863B (en) * 2022-08-11 2023-05-09 合肥工业大学 Alzheimer's disease classification method and system for data loss scene

Similar Documents

Publication Publication Date Title
Yunpeng et al. Multi-step ahead time series forecasting for different data patterns based on LSTM recurrent neural network
Shen et al. Wind speed prediction of unmanned sailboat based on CNN and LSTM hybrid neural network
Qian et al. Stock prediction based on LSTM under different stability
Xie et al. Graph neural network approach for anomaly detection
CN108805155A (en) Semi-supervised classification method for simultaneously learning an affinity matrix and Laplacian regularized least squares
Wang et al. Correlation aware multi-step ahead wind speed forecasting with heteroscedastic multi-kernel learning
CN108830301A (en) Semi-supervised data classification method with double Laplacian regularization based on an anchor graph structure
CN114219181A (en) Wind power probability prediction method based on transfer learning
Hu et al. Ensemble echo network with deep architecture for time-series modeling
Mengcan et al. Constrained voting extreme learning machine and its application
Ai et al. A machine learning approach for cost prediction analysis in environmental governance engineering
CN114328663A (en) High-dimensional theater data dimension reduction visualization processing method based on data mining
Fu et al. MCA-DTCN: A novel dual-task temporal convolutional network with multi-channel attention for first prediction time detection and remaining useful life prediction
Wu et al. TWC-EL: A multivariate prediction model by the fusion of three-way clustering and ensemble learning
Guo The microscopic visual forms in architectural art design following deep learning
CN110555530A (en) Distributed large-scale gene regulation and control network construction method
Copiaco et al. Exploring deep time-series imaging for anomaly detection of building energy consumption
CN115080795A (en) Multi-charging-station cooperative load prediction method and device
Jingbo Big Data Classification Model and Algorithm Based on Double Quantum Particle Swarm Optimization
Zhang et al. Lstfcfedlear: A LSTM-FC with vertical federated learning network for fault prediction
CN110502784A (en) A kind of product simulation optimization method
Zhang et al. A hierarchical network embedding method based on network partitioning
Ma et al. Special issue on deep learning and neural computing for intelligent sensing and control
Dong et al. Retrosynthesis prediction based on graph relation network
Qi et al. Research on Carbon Emission Prediction Method Based on Deep Learning: A Case Study of Shandong Province

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181113

RJ01 Rejection of invention patent application after publication