CN111507387A - Paired vector projection data classification method and system based on semi-supervised learning - Google Patents


Info

Publication number
CN111507387A
Authority
CN
China
Prior art keywords
data
class
matrix
laplace
label
Prior art date
Legal status
Pending
Application number
CN202010274957.0A
Other languages
Chinese (zh)
Inventor
张莉
薛杨涛
屈蕴茜
章晓芳
王邦军
周伟达
Current Assignee
Suzhou University
Original Assignee
Suzhou University
Priority date
Filing date
Publication date
Application filed by Suzhou University
Priority to CN202010274957.0A
Publication of CN111507387A
Legal status: Pending


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 — Pattern recognition
    • G06F18/20 — Analysing
    • G06F18/21 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 — Classification techniques
    • G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a method and a system for classifying paired vector projection data based on semi-supervised learning. The method comprises: constructing an adjacency graph from the two classes of training data, solving the Laplacian matrix, and substituting it into a Laplacian manifold regularization term; calculating the positive-class and negative-class Laplacian manifold regularization terms, the within-class divergence matrices of the positive-class and negative-class data, and the between-class divergence matrices of the positive and negative classes; forming an optimization problem from these quantities and solving it to obtain two optimal projection vectors; and projecting the unlabeled data into a high-dimensional space through a kernel function, projecting it with the two optimal projection vectors onto two different subspaces, and calculating its distance to the center of each subspace to obtain the label of the unlabeled data. The invention helps to improve classification accuracy.

Description

Paired vector projection data classification method and system based on semi-supervised learning
Technical Field
The invention relates to the technical field of data classification, in particular to a method and a system for classifying paired vector projection data based on semi-supervised learning.
Background
The Laplacian support vector machine is optimized in the same way as the traditional support vector machine: an optimal hyperplane is obtained through quadratic programming.
The multi-weight vector projection support vector machine (MVSVM) uses the labeled data to find two weight vectors, instead of a hyperplane, to realize classification. Each optimal weight vector keeps homogeneous data as close as possible and heterogeneous data as far apart as possible. Its solution method differs from the quadratic programming of the support vector machine (SVM): the weight vector of each class is obtained by eigendecomposition of two optimization functions, the data to be classified are projected onto each weight vector, and a sample is assigned to the class whose projection center is nearest. However, the multi-weight vector projection support vector machine is a supervised classification method; it cannot exploit large amounts of unlabeled data and relies only on a small amount of labeled data, which reduces classification accuracy and limits its applicable scenarios.
Disclosure of Invention
Therefore, the technical problem to be solved by the invention is to overcome the low classification accuracy and limited applicable scenarios of the prior art, and to provide a paired vector projection data classification method and system based on semi-supervised learning that have high classification accuracy and wide applicable scenarios.
In order to solve the above technical problem, the invention provides a paired vector projection data classification method based on semi-supervised learning, which comprises the following steps: constructing an adjacency graph from the two classes of training data, solving the Laplacian matrix, and substituting the Laplacian matrix into a Laplacian manifold regularization term; calculating the positive-class and negative-class Laplacian manifold regularization terms, the within-class divergence matrices of the positive-class and negative-class data, and the between-class divergence matrices of the positive and negative classes; forming an optimization problem from these quantities, and solving it to obtain two optimal projection vectors; and projecting the unlabeled data into a high-dimensional space through a kernel function, projecting it with the two optimal projection vectors onto two different subspaces, and calculating its distance to the center of each subspace to obtain the label of the unlabeled data.
In an embodiment of the present invention, the Laplacian matrix is solved as follows: two classes of training data are obtained, a graph is constructed from the training data to obtain an adjacency matrix, and the Laplacian matrix is obtained from the adjacency matrix.
In one embodiment of the present invention, the two classes of training data form a sample set X = Xₗ ∪ Xᵤ of size n, where xᵢ ∈ Rᵐ represents a training sample, yᵢ ∈ {−1, +1} represents the label information, m is the dimension, n is the total number of training samples, Xₗ is the data set containing the l labeled training samples, and Xᵤ is the data set containing the n − l unlabeled training samples.
In one embodiment of the present invention, the adjacency matrix A is built from the neighbourhood graph of the training data: Aᵢⱼ = 1 if xᵢ and xⱼ are neighbours on the graph, and Aᵢⱼ = 0 otherwise. The Laplacian matrix is L = D − A, where D is the diagonal degree matrix with Dᵢᵢ = Σⱼ Aᵢⱼ.
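As an illustration of this step, a minimal sketch of building the graph Laplacian is given below. The binary k-nearest-neighbour weighting is an assumption consistent with the adjacency rule used in the worked example; the function names `knn_adjacency` and `graph_laplacian` are illustrative, not from the patent.

```python
import numpy as np

def knn_adjacency(X, k=5):
    """X: (n, m) data matrix. Returns a symmetric 0/1 adjacency matrix A
    with A_ij = 1 if x_i is among the k nearest neighbours of x_j or
    vice versa."""
    n = X.shape[0]
    # pairwise squared Euclidean distances
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)           # exclude self-neighbours
    A = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[:k]
        A[i, nbrs] = 1.0
    return np.maximum(A, A.T)              # symmetrise: either-direction rule

def graph_laplacian(A):
    """L = D - A with D_ii = sum_j A_ij."""
    D = np.diag(A.sum(axis=1))
    return D - A
```

By construction L is symmetric and its rows sum to zero, which is what the manifold regularization term relies on.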
In one embodiment of the invention, after the Laplacian matrix is substituted into the Laplacian manifold regularization term, the training data are mapped from the low-dimensional input space to a high-dimensional Hilbert space by the mapping function φ(x) = [k(x, x₁), …, k(x, xₙ)]ᵀ, where k(·, ·) is a kernel function. The mapped training data become K = [φ(x₁), …, φ(xₙ)]; the positive-class data with label y = +1 become K₁ and the negative-class data with label y = −1 become K₂.
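To make the mapping concrete, here is a small sketch of the empirical kernel map φ(x) = [k(x, x₁), …, k(x, xₙ)]ᵀ with the Gaussian kernel that the worked example selects; the bandwidth value is an arbitrary placeholder, and the function names are illustrative.

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    """k(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    return float(np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2)))

def empirical_kernel_map(X, x, sigma=1.0):
    """phi(x) = [k(x, x_1), ..., k(x, x_n)]^T for training matrix X (n, m)."""
    return np.array([gaussian_kernel(x, xi, sigma) for xi in X])

def kernel_matrix(X, sigma=1.0):
    """K = [phi(x_1), ..., phi(x_n)]; symmetric with unit diagonal for a
    Gaussian kernel."""
    return np.array([empirical_kernel_map(X, xi, sigma) for xi in X])
```

The class submatrices K₁ and K₂ are then just the columns of K belonging to the positive and negative labeled samples.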
In one embodiment of the invention, the positive-class Laplacian manifold regularization term is R₁ = w₁ᵀKᵀLKw₁ and the negative-class Laplacian manifold regularization term is R₂ = w₂ᵀKᵀLKw₂, where w₁ and w₂ are the two projection vectors. The within-class divergence matrix of the positive-class data is S₁ = (K₁ − m₁e₁ᵀ)(K₁ − m₁e₁ᵀ)ᵀ, and the within-class divergence matrix of the negative-class data is S₂ = (K₂ − m₂e₂ᵀ)(K₂ − m₂e₂ᵀ)ᵀ, where m₁ and m₂ are the mean vectors of the mapped positive-class and negative-class data and e₁ and e₂ are all-1 vectors. The positive between-class divergence matrix is G and the negative between-class divergence matrix is H, where e is an n-dimensional all-1 vector.
In one embodiment of the present invention, the optimization problems are:
max over w₁ of β₁w₁ᵀGw₁ − (1 − β₁)w₁ᵀS₁w₁ − ρ₁w₁ᵀKᵀLKw₁, subject to ‖w₁‖ = 1,
and
max over w₂ of β₂w₂ᵀHw₂ − (1 − β₂)w₂ᵀS₂w₂ − ρ₂w₂ᵀKᵀLKw₂, subject to ‖w₂‖ = 1,
where β₁, β₂ ∈ [0, 1] are parameters balancing the within-class and between-class divergence matrices and ρ₁, ρ₂ ≥ 0 are the regularization-term parameters.
In an embodiment of the present invention, the two optimal projection vectors are obtained as follows. The optimization problem is converted into a Lagrange-multiplier problem, L(w₁, λ₁) = β₁w₁ᵀGw₁ − (1 − β₁)w₁ᵀS₁w₁ − ρ₁w₁ᵀKᵀLKw₁ − λ₁(w₁ᵀw₁ − 1). Setting the partial derivative of the Lagrangian function L(w₁, λ₁) with respect to w₁ equal to 0 gives β₁Gw₁ − (1 − β₁)S₁w₁ − ρ₁KᵀLKw₁ = λ₁w₁; similarly, setting the partial derivative of L(w₂, λ₂) with respect to w₂ equal to 0 gives β₂Hw₂ − (1 − β₂)S₂w₂ − ρ₂KᵀLKw₂ = λ₂w₂. The optimization problem is thus converted into these two eigenproblems, which are solved to obtain the two optimal projection vectors w₁ and w₂.
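The eigendecomposition step can be sketched as follows: under the unit-norm constraint, the maximiser of the objective is the eigenvector belonging to the largest eigenvalue of M = βG − (1 − β)S − ρKᵀLK. The symmetrisation guard and the function name are illustrative choices, not from the patent.

```python
import numpy as np

def optimal_projection(G, S, KLK, beta, rho):
    """Solve (beta*G - (1-beta)*S - rho*K^T L K) w = lambda w and return the
    unit-norm eigenvector belonging to the largest eigenvalue."""
    M = beta * G - (1.0 - beta) * S - rho * KLK
    M = 0.5 * (M + M.T)                 # guard against round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(M)
    w = eigvecs[:, np.argmax(eigvals)]  # eigh returns unit-norm columns
    return w
```

Because the top eigenvector maximises the Rayleigh quotient wᵀMw over unit vectors, no quadratic programming is needed, matching the contrast drawn with SVM in the background section.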
In an embodiment of the present invention, the label of the unlabeled data is obtained as follows. An unlabeled sample x ∈ Rᵐ (m is the dimension) is passed through the kernel function to obtain the projected data φ(x) = [k(x, x₁), …, k(x, xₙ)]ᵀ. The newly obtained data φ(x) is projected by the w₁ and w₂ obtained in training onto two different subspaces as w₁ᵀφ(x) and w₂ᵀφ(x), and the distances d₁ and d₂ from these projections to the centers of the respective subspaces are calculated. The label of the unlabeled sample x is the label of the class whose subspace center is nearer, i.e. y = +1 if d₁ < d₂ and y = −1 otherwise.
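A sketch of this decision rule follows, under the assumption (consistent with the nearest-projection-centre rule described for MVSVM in the background section) that each subspace centre is the mean of the projected labelled data of that class; the names `subspace_centers` and `predict_label` are illustrative.

```python
import numpy as np

def subspace_centers(w1, w2, K1, K2):
    """Centers of the two 1-D subspaces: means of the projected class data.
    K1, K2 hold the kernel-mapped positive / negative samples as columns."""
    return float(w1 @ K1.mean(axis=1)), float(w2 @ K2.mean(axis=1))

def predict_label(phi_x, w1, w2, c1, c2):
    """Assign the class whose projected center is nearer to the projection
    of the kernel-mapped sample phi_x."""
    d1 = abs(float(w1 @ phi_x) - c1)
    d2 = abs(float(w2 @ phi_x) - c2)
    return 1 if d1 <= d2 else -1
```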
The invention also provides a paired vector projection data classification system based on semi-supervised learning, which comprises: a projection module, used for constructing an adjacency graph from the two classes of training data, solving the Laplacian matrix, substituting the Laplacian matrix into the Laplacian manifold regularization term, calculating the positive-class and negative-class Laplacian manifold regularization terms, the within-class divergence matrices of the positive-class and negative-class data, and the positive and negative between-class divergence matrices, forming an optimization problem from these quantities, and solving it to obtain two optimal projection vectors; and a classification module, used for projecting the unlabeled data into a high-dimensional space through a kernel function, projecting it with the two optimal projection vectors onto two different subspaces, and calculating its distance to the center of each subspace to obtain the label of the unlabeled data.
Compared with the prior art, the technical scheme of the invention has the following advantages:
For the labeled data, the principle of maximizing the between-class divergence matrix and minimizing the within-class divergence matrix is used, so that heterogeneous data are kept as far apart as possible and homogeneous data as close as possible. For the unlabeled data, a Laplacian manifold regularization term is constructed; this term captures the geometric information of both the labeled and the unlabeled data through the Laplacian matrix. The advantage is that the between-class and within-class divergence matrices make full use of the class information of the labeled data while the geometric information of the unlabeled data is supplemented, so the method has stronger discrimination capability and wider applicable scenarios and helps to improve accuracy.
Drawings
In order that the present disclosure may be more readily and clearly understood, reference is now made to the following detailed description of the embodiments of the present disclosure taken in conjunction with the accompanying drawings, in which
FIG. 1 is a flow chart of a method for classifying paired vector projection data based on semi-supervised learning according to the present invention;
FIG. 2 is a graph of the classification error rates of SVM, MVSVM, LapSVM and the LapPVP of the present invention on different datasets.
Detailed Description
Example one
As shown in fig. 1, the present embodiment provides a method for classifying paired vector projection data based on semi-supervised learning, including the following steps: step S1: constructing an adjacency graph from the two classes of training data, solving the Laplacian matrix, and substituting the Laplacian matrix into a Laplacian manifold regularization term; step S2: calculating the positive-class and negative-class Laplacian manifold regularization terms, the within-class divergence matrices of the positive-class and negative-class data, and the between-class divergence matrices of the positive and negative classes; step S3: forming an optimization problem from these quantities, and solving it to obtain two optimal projection vectors; step S4: projecting the unlabeled data into a high-dimensional space through a kernel function, projecting it with the two optimal projection vectors onto two different subspaces, and calculating its distance to the center of each subspace to obtain the label of the unlabeled data.
In the above method, step S1 constructs an adjacency graph from the two classes of training data and solves the Laplacian matrix; the Laplacian matrix captures the geometric information of both the labeled and the unlabeled data, and substituting it into the Laplacian manifold regularization term facilitates the computation of the optimization problem. Step S2 calculates the positive-class and negative-class Laplacian manifold regularization terms, thereby obtaining the geometric information of the labeled and unlabeled data, and calculates the within-class and between-class divergence matrices of the positive-class and negative-class data; combining these divergence matrices with the supplementary geometric information of the unlabeled data gives the method stronger discrimination capability and wider applicable scenarios. Step S3 forms an optimization problem from these quantities and solves it to obtain two optimal projection vectors, which keep homogeneous data as close as possible and heterogeneous data as far apart as possible. Step S4 projects the unlabeled data into a high-dimensional space through a kernel function, projects it with the two optimal projection vectors onto two different subspaces, and calculates its distance to the center of each subspace to obtain the label of the unlabeled data.
The Laplacian matrix is solved as follows: two classes of training data are obtained, a graph is constructed from the training data to obtain an adjacency matrix, and the Laplacian matrix is obtained from the adjacency matrix. Specifically, the two classes of training data form a sample set X = Xₗ ∪ Xᵤ of size n, where xᵢ ∈ Rᵐ represents a training sample, yᵢ ∈ {−1, +1} represents the label information, m is the dimension, n is the total number of training samples, Xₗ is the data set containing the l labeled training samples, and Xᵤ is the data set containing the n − l unlabeled training samples.
A graph is constructed from the training data to obtain the adjacency matrix A, with Aᵢⱼ = 1 if xᵢ and xⱼ are neighbours on the graph and Aᵢⱼ = 0 otherwise. The Laplacian matrix L is then derived from the adjacency matrix A as L = D − A, where D is the diagonal degree matrix with Dᵢᵢ = Σⱼ Aᵢⱼ.
After the Laplacian matrix is substituted into the Laplacian manifold regularization term, the training data are mapped from the low-dimensional input space to a high-dimensional Hilbert space by the mapping function φ(x) = [k(x, x₁), …, k(x, xₙ)]ᵀ, where k(·, ·) is a kernel function. The mapped training data become K = [φ(x₁), …, φ(xₙ)]; the positive-class data with label y = +1 become K₁ and the negative-class data with label y = −1 become K₂.
It is assumed that the boundary functions of the two subspaces in the present invention are f₁(x) = w₁ᵀφ(x) and f₂(x) = w₂ᵀφ(x), respectively. The positive-class Laplacian manifold regularization term is then R₁ = w₁ᵀKᵀLKw₁, and the negative-class term is R₂ = w₂ᵀKᵀLKw₂.
The within-class divergence matrix of the positive-class data is then calculated as S₁ = (K₁ − m₁e₁ᵀ)(K₁ − m₁e₁ᵀ)ᵀ, and that of the negative-class data as S₂ = (K₂ − m₂e₂ᵀ)(K₂ − m₂e₂ᵀ)ᵀ, where m₁ and m₂ are the mean vectors of the mapped positive-class and negative-class data and e₁ and e₂ are all-1 vectors.
Then the positive between-class divergence matrix G and the negative between-class divergence matrix H are calculated, where e is an n-dimensional all-1 vector.
On the basis of the principle of maximizing the between-class divergence matrix while minimizing the within-class divergence matrix, the invention adds a minimized Laplacian manifold regularization term that takes the geometric information of the unlabeled samples into account, so that the optimization problem of LapPVP takes the following form:
max over w₁ of β₁w₁ᵀGw₁ − (1 − β₁)w₁ᵀS₁w₁ − ρ₁w₁ᵀKᵀLKw₁, subject to ‖w₁‖ = 1,
and
max over w₂ of β₂w₂ᵀHw₂ − (1 − β₂)w₂ᵀS₂w₂ − ρ₂w₂ᵀKᵀLKw₂, subject to ‖w₂‖ = 1,
where β₁, β₂ ∈ [0, 1] are parameters balancing the within-class and between-class divergence matrices and ρ₁, ρ₂ ≥ 0 are the regularization-term parameters.
The two optimal projection vectors are obtained as follows. According to the Rayleigh quotient theorem, the optimization problem is converted into the Lagrange-multiplier problems
L(w₁, λ₁) = β₁w₁ᵀGw₁ − (1 − β₁)w₁ᵀS₁w₁ − ρ₁w₁ᵀKᵀLKw₁ − λ₁(w₁ᵀw₁ − 1),
L(w₂, λ₂) = β₂w₂ᵀHw₂ − (1 − β₂)w₂ᵀS₂w₂ − ρ₂w₂ᵀKᵀLKw₂ − λ₂(w₂ᵀw₂ − 1).
Setting the partial derivative of the Lagrangian function L(w₁, λ₁) with respect to w₁ equal to 0 gives β₁Gw₁ − (1 − β₁)S₁w₁ − ρ₁KᵀLKw₁ = λ₁w₁; similarly, setting the partial derivative of L(w₂, λ₂) with respect to w₂ equal to 0 gives β₂Hw₂ − (1 − β₂)S₂w₂ − ρ₂KᵀLKw₂ = λ₂w₂. The solution of the LapPVP optimization problem is thereby converted into these two eigendecomposition problems, whose solution yields the two optimal projection vectors w₁ and w₂.
The label of the unlabeled data is obtained as follows. An unlabeled sample x ∈ Rᵐ (m is the dimension) is passed through the kernel function to obtain the projected data φ(x) = [k(x, x₁), …, k(x, xₙ)]ᵀ. The newly obtained data φ(x) is projected by the w₁ and w₂ obtained in training onto two different subspaces as w₁ᵀφ(x) and w₂ᵀφ(x), and the distances d₁ and d₂ from these projections to the centers of the respective subspaces are calculated. The label of the unlabeled sample x is the label of the class whose subspace center is nearer, i.e. y = +1 if d₁ < d₂ and y = −1 otherwise.
The present invention was tested on the public WPBC (Breast Cancer Wisconsin Prognostic) data set, in which each record contains the follow-up data of one cancer case; the 198 samples are divided into two classes according to the breast cancer outcome, and each sample has 33 dimensions. The present invention randomly selects 70% of the data of each class as training data, and the remaining 30% as test data. The data classification method is described in detail below:
Two classes of data are obtained, forming a sample set X = Xₗ ∪ Xᵤ of size n, where xᵢ ∈ Rᵐ represents a training sample, yᵢ ∈ {−1, +1} represents the label information, m is the dimension, n is the total number of training samples, Xₗ is the data set containing the l labeled training samples, and Xᵤ is the data set containing the n − l unlabeled training samples. Here n = 83, l = 14, n − l = 69, and m = 33. First, a graph is constructed from the training data to obtain the adjacency matrix A: Aᵢⱼ = 1 if xᵢ ∈ Nₖ(xⱼ) or xⱼ ∈ Nₖ(xᵢ), and Aᵢⱼ = 0 otherwise, where Nₖ(x) denotes the set of k nearest neighbours of x; the optimal parameter k ∈ {1, 3, 5, 7, 9} is obtained by the 10-fold cross-validation method.
Then the Laplacian matrix L = D − A is determined, where Dᵢᵢ = Σⱼ Aᵢⱼ.
The training data are mapped from the low-dimensional input space to a high-dimensional Hilbert space by the mapping function φ(·): φ(x) = [k(x, x₁), …, k(x, xₙ)]ᵀ, where k(·, ·) is a kernel function. The invention selects the Gaussian kernel k(x, x′) = exp(−‖x − x′‖²/2σ²), whose parameter is the median of the sample data. The mapped training data become K = [φ(x₁), …, φ(xₙ)]; the positive-class data with label y = +1 become K₁ and the negative-class data with label y = −1 become K₂.
Suppose the decision functions of the two subspaces in the present invention are f₁(x) = w₁ᵀφ(x) and f₂(x) = w₂ᵀφ(x), respectively. Then the positive-class Laplacian regularization term R₁ and the negative-class Laplacian regularization term R₂ are expressed as R₁ = w₁ᵀKᵀLKw₁ and R₂ = w₂ᵀKᵀLKw₂, respectively.
then, the intra-class divergence matrixes S of the positive class data and the negative class data are respectively calculated1And S2
Figure BDA00024444309600000811
Figure BDA00024444309600000812
Wherein the content of the first and second substances,
Figure BDA00024444309600000813
is a vector of the mean value of the vectors,
Figure BDA00024444309600000814
and
Figure BDA00024444309600000815
is a full 1 vector.
The positive between-class divergence matrix G and the negative between-class divergence matrix H are calculated respectively, where e is an n-dimensional all-1 vector.
On the basis of the principle of maximizing the between-class divergence matrix while minimizing the within-class divergence matrix, the invention adds a minimized Laplacian manifold regularization term that takes the geometric information of the unlabeled samples into account, so that the optimization problem of LapPVP takes the following form:
max over w₁ of β₁w₁ᵀGw₁ − (1 − β₁)w₁ᵀS₁w₁ − ρ₁w₁ᵀKᵀLKw₁, s.t. ‖w₁‖ = 1,
max over w₂ of β₂w₂ᵀHw₂ − (1 − β₂)w₂ᵀS₂w₂ − ρ₂w₂ᵀKᵀLKw₂, s.t. ‖w₂‖ = 1,
where β₁, β₂ ∈ [0, 1] are parameters balancing the within-class and between-class divergence matrices and ρ₁, ρ₂ ≥ 0 are the regularization-term parameters; all four are determined by the 10-fold cross-validation method over the range {2⁻¹², 2⁻¹¹, …, 2¹²}.
According to the Rayleigh quotient theorem, the optimization problem is converted into the Lagrange-multiplier problems
L(w₁, λ₁) = β₁w₁ᵀGw₁ − (1 − β₁)w₁ᵀS₁w₁ − ρ₁w₁ᵀKᵀLKw₁ − λ₁(w₁ᵀw₁ − 1),
L(w₂, λ₂) = β₂w₂ᵀHw₂ − (1 − β₂)w₂ᵀS₂w₂ − ρ₂w₂ᵀKᵀLKw₂ − λ₂(w₂ᵀw₂ − 1).
Setting the partial derivative of the Lagrangian function L(w₁, λ₁) with respect to w₁ equal to 0, and likewise that of L(w₂, λ₂) with respect to w₂, the solution of the LapPVP optimization problem can finally be transformed into the following two eigendecomposition problems:
β₁Gw₁ − (1 − β₁)S₁w₁ − ρ₁KᵀLKw₁ = λ₁w₁
β₂Hw₂ − (1 − β₂)S₂w₂ − ρ₂KᵀLKw₂ = λ₂w₂
Two optimal projection vectors w₁ and w₂ are thus obtained.
The unlabeled data x ∈ Rᵐ (m = 33 is the dimension) is passed through the kernel function to obtain the projected data φ(x) = [k(x, x₁), …, k(x, xₙ)]ᵀ. The newly obtained data φ(x) is projected by the w₁ and w₂ obtained in training onto two different subspaces as w₁ᵀφ(x) and w₂ᵀφ(x), and the distances d₁ and d₂ from these projections to the centers of the respective subspaces are calculated. The label of the unlabeled sample x is the label of the class whose subspace center is nearer, i.e. y = +1 if d₁ < d₂ and y = −1 otherwise.
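Putting the steps of this worked example together, the following is a hedged end-to-end sketch on synthetic two-class data. The binary kNN graph, the choice of G and H as the scatter of each class around the opposite class mean (the published formula images for G and H are not reproduced in the text), the fixed β and ρ values, and the projected-class-mean centres are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# two well-separated Gaussian blobs: positive near (0,0), negative near (4,4)
Xp = rng.normal(0.0, 0.5, size=(20, 2))
Xn = rng.normal(4.0, 0.5, size=(20, 2))
X = np.vstack([Xp, Xn])
n = X.shape[0]

def kernel(U, V, sigma=1.0):
    d2 = ((U[:, None, :] - V[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

K = kernel(X, X)                          # columns are phi(x_i)

# step S1: kNN adjacency graph and Laplacian L = D - A
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
np.fill_diagonal(d2, np.inf)
A = np.zeros((n, n))
for i in range(n):
    A[i, np.argsort(d2[i])[:5]] = 1.0
A = np.maximum(A, A.T)
L = np.diag(A.sum(axis=1)) - A
KLK = K.T @ L @ K

# step S2: within-class scatter S1, S2 and (assumed) between-class G, H
K1, K2 = K[:, :20], K[:, 20:]
m1 = K1.mean(axis=1, keepdims=True)
m2 = K2.mean(axis=1, keepdims=True)
S1 = (K1 - m1) @ (K1 - m1).T
S2 = (K2 - m2) @ (K2 - m2).T
G = (K2 - m1) @ (K2 - m1).T               # negatives scattered around m1
H = (K1 - m2) @ (K1 - m2).T               # positives scattered around m2

# step S3: two eigenproblems for the optimal projections
def top_eigvec(M):
    M = 0.5 * (M + M.T)
    _, vecs = np.linalg.eigh(M)           # ascending eigenvalues
    return vecs[:, -1]

beta, rho = 0.9, 0.01                     # illustrative fixed values
w1 = top_eigvec(beta * G - (1.0 - beta) * S1 - rho * KLK)
w2 = top_eigvec(beta * H - (1.0 - beta) * S2 - rho * KLK)

# step S4: project a new point and label it by the nearer subspace centre
c1, c2 = float(w1 @ m1), float(w2 @ m2)

def predict(x):
    phi = kernel(X, x[None, :])[:, 0]
    d1, d2_ = abs(float(w1 @ phi) - c1), abs(float(w2 @ phi) - c2)
    return 1 if d1 <= d2_ else -1
```

On data this cleanly separated the sketch assigns test points near each blob to that blob's class; with the patent's cross-validated k, β and ρ the same skeleton would apply to the WPBC split described above.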
The invention is verified by the following tests: experiments show that the paired vector projection method (LapPVP) of the invention uses the geometric information of the unlabeled data to improve on the classification results of MVSVM, and that its classification performance is superior to that of SVM, MVSVM and the semi-supervised LapSVM. The classification error rates on the data sets Haberman, Heart, Pima and Wpbc, obtained with a Gaussian kernel, are shown in FIG. 2, where the lower error rate of the invention can be seen.
Example two
Based on the same inventive concept, the present embodiment provides a paired vector projection data classification system based on semi-supervised learning; its principle of solving the problem is the same as that of the paired vector projection data classification method based on semi-supervised learning, and repeated details are not described again.
The paired vector projection data classification system based on semi-supervised learning in this embodiment includes:
the projection module is used for constructing an adjacency graph according to two types of training data, solving a Laplace matrix, substituting the Laplace matrix into a Laplace manifold regular term, respectively calculating a positive Laplace manifold regular term and a negative Laplace manifold regular term, an intra-class divergence matrix of the positive data and an intra-class divergence matrix of the negative data, and a positive inter-class divergence matrix and a negative inter-class divergence matrix, obtaining an optimal problem according to the data, and solving to obtain two optimal projection vectors;
and the classification module is used for projecting the label-free data to a high-dimensional space through a kernel function, projecting the two optimal projection vectors to two different subspaces, and respectively calculating the distance from the two optimal projection vectors to the center of each subspace to obtain the label of the label-free data.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be understood that the above examples are only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. And obvious variations or modifications therefrom are within the scope of the invention.

Claims (10)

1. A paired vector projection data classification method based on semi-supervised learning is characterized by comprising the following steps:
step S1: constructing an adjacency graph from the two classes of training data, solving the Laplacian matrix, and substituting the Laplacian matrix into a Laplacian manifold regularization term;
step S2: calculating the positive-class and negative-class Laplacian manifold regularization terms, the within-class divergence matrices of the positive-class and negative-class data, and the between-class divergence matrices of the positive and negative classes;
step S3: forming an optimization problem from these quantities, and solving it to obtain two optimal projection vectors;
step S4: projecting the unlabeled data into a high-dimensional space through a kernel function, projecting it with the two optimal projection vectors onto two different subspaces, and calculating its distance to the center of each subspace to obtain the label of the unlabeled data.
2. The paired vector projection data classification method based on semi-supervised learning according to claim 1, wherein the Laplacian matrix is solved as follows: obtaining two classes of training data, constructing a graph from the training data to obtain an adjacency matrix, and obtaining the Laplacian matrix from the adjacency matrix.
3. The paired vector projection data classification method based on semi-supervised learning according to claim 1 or 2, characterized in that: the two types of training data are X = X_l ∪ X_u, where x_i ∈ R^m represents a training sample, y_i ∈ {−1, +1} represents the label information, m is the dimension, n is the total number of training samples, X_l is the data set containing the l labeled training samples, and X_u is the data set containing the n − l unlabeled training samples.
4. The paired vector projection data classification method based on semi-supervised learning according to claim 2, characterized in that: the adjacency matrix is A: [formula image]; the Laplace matrix is L = D − A, where D is the diagonal degree matrix with D_ii = Σ_j A_ij.
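As an illustrative (non-claimed) sketch, the graph construction of claims 2 and 4 can be written in Python; the k-nearest-neighbour rule and binary edge weights are assumptions, since the claim only fixes L = D − A with D_ii = Σ_j A_ij:

```python
import numpy as np

def laplacian_from_data(X, k=3):
    # Pairwise squared Euclidean distances between all samples.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    n = X.shape[0]
    A = np.zeros((n, n))
    for i in range(n):
        neighbours = np.argsort(d2[i])[1:k + 1]  # skip self (distance 0)
        A[i, neighbours] = 1.0
    A = np.maximum(A, A.T)          # symmetrize the adjacency matrix
    D = np.diag(A.sum(axis=1))      # degree matrix, D_ii = sum_j A_ij
    return D - A                    # Laplace matrix L = D - A

L = laplacian_from_data(np.random.RandomState(0).randn(10, 2))
print(abs(L.sum()) < 1e-9)  # True: every row of L = D - A sums to zero
```

Because each row of D − A sums to zero, L is symmetric positive semidefinite, which is what makes w^T K^T L K w usable as a manifold regularization term.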
5. The paired vector projection data classification method based on semi-supervised learning according to claim 4, wherein: after the Laplace matrix is substituted into the Laplace manifold regularization term, the training data are mapped from the low-dimensional input space to a high-dimensional Hilbert space through the mapping function φ(x) = [k(x, x_1), …, k(x, x_n)]^T, where k(·,·) is a kernel function; the mapped training data become {φ(x_i)}, i = 1, …, n, the positive-class data being those with label y_i = +1 and the negative-class data being those with label y_i = −1.
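The mapping function of claim 5 can be sketched as follows; the RBF kernel and the `gamma` parameter are assumptions for illustration only, since the claim leaves the choice of k(·,·) open:

```python
import numpy as np

def phi(x, X_train, gamma=1.0):
    # Empirical kernel map: phi(x) = [k(x, x_1), ..., k(x, x_n)]^T
    d2 = ((X_train - x) ** 2).sum(axis=1)
    return np.exp(-gamma * d2)      # RBF kernel k(x, z) = exp(-gamma ||x - z||^2)

X_train = np.random.RandomState(1).randn(5, 3)
v = phi(X_train[0], X_train)
print(v.shape)  # (5,)
print(v[0])     # 1.0, since k(x, x) = 1 for the RBF kernel
```

The mapped sample lives in R^n (one coordinate per training point), which is the "high-dimensional Hilbert space" representation the later claims project with w_1 and w_2.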
6. The paired vector projection data classification method based on semi-supervised learning as recited in claim 5, wherein: the positive-class Laplace manifold regularization term is w_1^T K^T L K w_1 and the negative-class Laplace manifold regularization term is w_2^T K^T L K w_2, where K is the kernel matrix of the training data; the intra-class divergence matrix of the positive-class data is S_1: [formula image]; the intra-class divergence matrix of the negative-class data is S_2: [formula image]; wherein [formula image] are the class mean vectors and [formula image] are all-ones vectors; the inter-class divergence matrix of the positive class is G: [formula image]; the inter-class divergence matrix of the negative class is H: [formula image]; where e is an n-dimensional all-ones vector.
7. The paired vector projection data classification method based on semi-supervised learning as recited in claim 6, wherein the optimization problems are:
max_{w_1} β_1 w_1^T G w_1 − (1 − β_1) w_1^T S_1 w_1 − ρ_1 w_1^T K^T L K w_1, s.t. w_1^T w_1 = 1,
and
max_{w_2} β_2 w_2^T H w_2 − (1 − β_2) w_2^T S_2 w_2 − ρ_2 w_2^T K^T L K w_2, s.t. w_2^T w_2 = 1,
where β_1, β_2 ∈ [0, 1] are parameters balancing the intra-class and inter-class divergence matrices, and ρ_1, ρ_2 ≥ 0 are regularization-term parameters.
8. The paired vector projection data classification method based on semi-supervised learning as recited in claim 7, wherein the method for solving the two optimal projection vectors comprises: converting each optimization problem into a Lagrange multiplier problem with Lagrangian functions L(w_1, λ_1) and L(w_2, λ_2). Setting the partial derivative of L(w_1, λ_1) with respect to w_1 equal to 0 gives
β_1 G w_1 − (1 − β_1) S_1 w_1 − ρ_1 K^T L K w_1 = λ_1 w_1;
similarly, setting the partial derivative of L(w_2, λ_2) with respect to w_2 equal to 0 gives
β_2 H w_2 − (1 − β_2) S_2 w_2 − ρ_2 K^T L K w_2 = λ_2 w_2.
Each optimization problem is thereby converted into an eigenvalue problem, from which the two optimal projection vectors w_1 and w_2 are obtained.
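Given the stationarity conditions of claim 8, each projection vector comes from an ordinary eigenvalue problem. A minimal numpy sketch, with random symmetric matrices standing in for the actual G, S_1 and K^T L K of the method:

```python
import numpy as np

rng = np.random.RandomState(0)
n = 6

def sym_psd(n):
    M = rng.randn(n, n)
    return M @ M.T                  # symmetric positive semidefinite stand-in

# Stand-ins for G, S1 and K^T L K (assumptions for illustration).
G, S1, KLK = sym_psd(n), sym_psd(n), sym_psd(n)
beta1, rho1 = 0.7, 0.1

# Stationarity condition: (beta1*G - (1-beta1)*S1 - rho1*K^T L K) w1 = lambda1 * w1
M = beta1 * G - (1 - beta1) * S1 - rho1 * KLK
vals, vecs = np.linalg.eigh(M)      # eigh: M is symmetric, eigenvalues ascending
w1 = vecs[:, -1]                    # eigenvector of the largest eigenvalue

residual = M @ w1 - vals[-1] * w1
print(np.linalg.norm(residual) < 1e-8)  # True: w1 satisfies the eigen-equation
```

The same call with H, S_2 and ρ_2 yields w_2; `eigh` also returns unit-norm eigenvectors, matching the w^T w = 1 constraint.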
9. The paired vector projection data classification method based on semi-supervised learning as recited in claim 8, wherein the method for obtaining the label of the unlabeled data comprises: mapping an unlabeled sample x ∈ R^m (m being the dimension) through the kernel function to obtain the projected data φ(x) = [k(x, x_1), …, k(x, x_n)]^T; projecting the newly obtained data φ(x), using the w_1 and w_2 obtained by the training module, onto two different subspaces w_1^T φ(x) and w_2^T φ(x); respectively calculating the distances d_1 and d_2 from the projected points to the centers of the two subspaces: [formula image]; and assigning the label of the unlabeled sample x according to the smaller of d_1 and d_2: [formula image].
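The labeling rule of claim 9 can be sketched as follows; the projected centers `c1`, `c2` and the absolute-distance form are assumptions standing in for the formula images of the original filing:

```python
import numpy as np

def classify(phi_x, w1, w2, c1, c2):
    # Distance of the projected sample to each subspace centre
    # (c1 = w1^T mu1 and c2 = w2^T mu2 under our assumptions).
    d1 = abs(w1 @ phi_x - c1)   # positive-class subspace
    d2 = abs(w2 @ phi_x - c2)   # negative-class subspace
    return 1 if d1 <= d2 else -1

w1 = np.array([1.0, 0.0])
w2 = np.array([0.0, 1.0])
print(classify(np.array([0.9, 5.0]), w1, w2, c1=1.0, c2=-1.0))   # 1
print(classify(np.array([5.0, -0.9]), w1, w2, c1=1.0, c2=-1.0))  # -1
```

In the first call the sample projects close to the positive-class center (d_1 = 0.1 vs d_2 = 6.0), so it receives label +1; in the second the situation is reversed.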
10. A system for classifying paired vector projection data based on semi-supervised learning, comprising:
the projection module is used for constructing an adjacency graph from two classes of training data, computing the Laplace matrix, substituting it into a Laplace manifold regularization term, respectively calculating the positive-class and negative-class Laplace manifold regularization terms, the intra-class divergence matrices of the positive-class and negative-class data, and the inter-class divergence matrices of the positive and negative classes, formulating an optimization problem from these quantities, and solving it to obtain two optimal projection vectors;
and the classification module is used for mapping the unlabeled data into a high-dimensional space through a kernel function, projecting the mapped data onto two different subspaces with the two optimal projection vectors, respectively calculating the distance to the center of each subspace, and thereby obtaining the label of the unlabeled data.
CN202010274957.0A 2020-04-09 2020-04-09 Paired vector projection data classification method and system based on semi-supervised learning Pending CN111507387A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010274957.0A CN111507387A (en) 2020-04-09 2020-04-09 Paired vector projection data classification method and system based on semi-supervised learning

Publications (1)

Publication Number Publication Date
CN111507387A true CN111507387A (en) 2020-08-07

Family

ID=71864068

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010274957.0A Pending CN111507387A (en) 2020-04-09 2020-04-09 Paired vector projection data classification method and system based on semi-supervised learning

Country Status (1)

Country Link
CN (1) CN111507387A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022063216A1 (en) * 2020-09-28 2022-03-31 International Business Machines Corporation Determination and use of spectral embeddings of large-scale systems by substructuring
GB2613994A (en) * 2020-09-28 2023-06-21 Ibm Determination and use of spectral embeddings of large-scale systems by substructuring
US11734384B2 (en) 2020-09-28 2023-08-22 International Business Machines Corporation Determination and use of spectral embeddings of large-scale systems by substructuring

Similar Documents

Publication Publication Date Title
Feng et al. Adaptive unsupervised multi-view feature selection for visual concept recognition
Wang et al. Unsupervised feature selection via unified trace ratio formulation and k-means clustering (track)
EP3467723A1 (en) Machine learning based network model construction method and apparatus
Shi et al. Robust structured graph clustering
JP5565190B2 (en) Learning model creation program, image identification information addition program, learning model creation device, and image identification information addition device
Qin et al. Compressive sequential learning for action similarity labeling
JP2015506026A (en) Image classification
Fan et al. Multi-view subspace learning via bidirectional sparsity
Xu et al. A feasible density peaks clustering algorithm with a merging strategy
El Hajjar et al. Consensus graph and spectral representation for one-step multi-view kernel based clustering
Hao et al. Multi-view spectral clustering via common structure maximization of local and global representations
Cortés et al. Learning edit cost estimation models for graph edit distance
Nie et al. Implicit weight learning for multi-view clustering
Wang et al. High-dimensional Data Clustering Using K-means Subspace Feature Selection.
Wang et al. Robust semi-supervised nonnegative matrix factorization
CN113920382A (en) Cross-domain image classification method based on class consistency structured learning and related device
CN111507387A (en) Paired vector projection data classification method and system based on semi-supervised learning
Chen et al. Stability-based preference selection in affinity propagation
Li et al. Development of a global batch clustering with gradient descent and initial parameters in colour image classification
Liu et al. A weight-incorporated similarity-based clustering ensemble method
CN108229552B (en) Model processing method and device and storage medium
CN112800138B (en) Big data classification method and system
CN110717547A (en) Learning algorithm based on regression hypergraph
CN115601571A (en) Multi-pattern constraint typical correlation analysis method and system for multi-modal data
CN111428741B (en) Network community discovery method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200807)