CN108596272A - A new graph-based semi-supervised classification machine learning method
- Publication number
- CN108596272A (application CN201810437033.0A)
- Authority
- CN
- China
- Prior art keywords
- sample
- new method
- classification
- method based
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/2155—Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention proposes a new graph-based semi-supervised classification method. It provides a sample classification method comprising steps S001 to S005, and an active sample labeling method comprising steps D000, D001, and D002 that preprocesses the training sample set. The method combines the advantages of supervised and unsupervised learning: training samples are simple to prepare and training precision is high. Compared with general semi-supervised learning methods, the proposed active sample labeling preprocessing of the training set, together with the graph-based classification technique, yields higher training precision and reduces the difficulty of labeling data samples.
Description
Technical field
The present invention relates to a machine learning method, and in particular to a graph-based semi-supervised classification machine learning method.
Background
Learning problems in machine learning fall into three classes: supervised learning, semi-supervised learning, and unsupervised learning. Obtaining class labels for samples in real-world data is time-consuming and laborious, yet labeled samples improve the classification capability of a classifier.
Graph-based methods are an important and effective family of semi-supervised learning methods, but existing semi-supervised learning methods perform poorly on two evaluation indices: classification accuracy and standard error.
Summary of the invention
The purpose of the present invention is to provide a new graph-based semi-supervised classification machine learning method that improves classifier performance on two indices: classification accuracy and standard error.
To achieve this purpose, the present invention provides a new graph-based semi-supervised classification machine learning method that mainly comprises the following steps:
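Step 1: Divide the training set. The training set is $X = L \cup U = \{x_1, \ldots, x_l, x_{l+1}, \ldots, x_{l+u}\}$, where $L = \{x_1, \ldots, x_l\}$ is a small number of labeled samples and $U = \{x_{l+1}, \ldots, x_{l+u}\}$ is a large number of unlabeled samples. The labeled-sample ratio, that is, the fraction of labeled samples in the training set, is $l/(l+u)$.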
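The division in Step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `split_training_set`, the random shuffling, and the toy data are assumptions introduced here.

```python
import random

def split_training_set(samples, labels, labeled_ratio=0.1, seed=0):
    """Split samples into a small labeled set L and a large unlabeled set U."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)  # assumption: labeled samples are picked at random
    l = max(1, round(labeled_ratio * len(samples)))  # number of labeled samples
    labeled_idx, unlabeled_idx = indices[:l], indices[l:]
    L = [(samples[i], labels[i]) for i in labeled_idx]  # labels are kept
    U = [samples[i] for i in unlabeled_idx]             # labels are hidden
    ratio = len(L) / (len(L) + len(U))                  # l / (l + u)
    return L, U, ratio

# Toy data (hypothetical): 50 two-dimensional samples with binary labels.
samples = [[float(i), float(i % 3)] for i in range(50)]
labels = [i % 2 for i in range(50)]
L, U, ratio = split_training_set(samples, labels, labeled_ratio=0.1)
```

With 50 samples and a requested ratio of 0.1, the split holds 5 labeled and 45 unlabeled samples.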
Step 2: Construct the asymmetric sparse weight matrix. Construct a graph $G = (V, E)$, where $V$ denotes the sample points and $E$ denotes the edges between sample points. Each edge $e \in E$ is determined by its weight $w(e) = w_{ij}$, where $w_{ij}$ denotes the similarity between sample points $x_i$ and $x_j$.
By linearity and sparsity, given a sample $x_i$, a sparse representation of the sample can be obtained when all basis vectors come from the same class as $x_i$. $T_k$ denotes the sample matrix with $x_k$ removed, $T_k = [x_1, \ldots, x_{k-1}, x_{k+1}, \ldots, x_n]$, and the sparse decomposition coefficients express $x_k$ as a linear combination of the remaining samples.
The asymmetric weight matrix is built from these sparse decomposition coefficients.
Step 3: Solve the asymmetric weight matrix. Given a sample $x_k$, the weight coefficients are obtained by solving an optimization problem, which is converted into the following problem:
$$\min \|q\|_1 \quad \text{s.t.} \quad Pq = x_k, \quad q_i \ge 0, \quad i = 1, 2, \ldots, k-1, k+1, \ldots, n$$
where $P = [G_k \; I_d] \in \mathbb{R}^{d \times (d+n-1)}$ and $q = [a \; e]^T$. The problem can be solved as a linear programming problem, and in the resulting weights $w_{ii} = 0$.
The weight matrix is an asymmetric matrix.
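One way to solve the problem in Step 3 as a linear program is sketched below with `scipy.optimize.linprog`. Several points are assumptions rather than statements from the patent: $G_k$ is taken to be the sample matrix with column $x_k$ removed, the unconstrained part $e$ of $q$ is split into two nonnegative parts so that the L1 objective becomes linear, and row $k$ of the weight matrix is filled with the coefficients of $x_k$. The function names are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def sparse_coefficients(X, k):
    """Solve min ||q||_1 s.t. P q = x_k with nonnegative coefficients a.

    X is d x n with one sample per column. The part e of q = [a, e] is
    written as e = e_pos - e_neg so that every LP variable is nonnegative.
    """
    d, n = X.shape
    x_k = X[:, k]
    G_k = np.delete(X, k, axis=1)                    # d x (n-1), x_k removed
    A_eq = np.hstack([G_k, np.eye(d), -np.eye(d)])   # [G_k, I_d, -I_d]
    c = np.ones(A_eq.shape[1])                       # sum(a) + sum(e_pos) + sum(e_neg)
    res = linprog(c, A_eq=A_eq, b_eq=x_k, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    a = res.x[: n - 1]
    e = res.x[n - 1 : n - 1 + d] - res.x[n - 1 + d :]
    return a, e

def asymmetric_weights(X):
    """Assemble W row by row; the diagonal stays zero (w_ii = 0)."""
    n = X.shape[1]
    W = np.zeros((n, n))
    for k in range(n):
        a, _ = sparse_coefficients(X, k)
        W[k, np.arange(n) != k] = a   # assumed placement of the coefficients
    return W

# Toy data (hypothetical): 6 samples in 3 dimensions, one sample per column.
rng = np.random.default_rng(0)
X = rng.random((3, 6))
W = asymmetric_weights(X)
```

Because each sample is decomposed independently, nothing forces $w_{ij} = w_{ji}$, which is consistent with the asymmetry the document describes.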
Step 4: Define the objective function. The objective function is formed from a loss function and regularization terms, where $C$ is the loss function, $\|f\|^2$ is the penalty term in the reproducing kernel Hilbert space (RKHS), and a further regularization term captures the intrinsic manifold structure of the data, with $F = [f(x_1), f(x_2), \ldots, f(x_{l+u})]$. Optimizing this problem yields an expression in which $L$ is the graph Laplacian computed from the sparse weights, $L = D - W \in \mathbb{R}^{n \times n}$, and $W$ is the sparse weight matrix of the samples.
The regularization term $A$ can be computed from the input samples.
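The Laplacian $L = D - W$ in Step 4 can be illustrated with a short sketch. The text does not define $D$, so it is assumed here to be the diagonal matrix of row sums of $W$. The quadratic form $F^T L F$ is the usual graph smoothness penalty in Laplacian regularization and is shown as an assumption, since the patent's own objective formula is not reproduced in this text.

```python
import numpy as np

def graph_laplacian(W):
    """Return L = D - W, with D the diagonal matrix of row sums of W."""
    D = np.diag(W.sum(axis=1))
    return D - W

def smoothness(F, L):
    """Quadratic form F^T L F, a common graph regularization term."""
    F = np.asarray(F, dtype=float)
    return float(F @ L @ F)

# Toy asymmetric weight matrix (hypothetical values, zero diagonal).
W = np.array([[0.0, 0.8, 0.2],
              [0.5, 0.0, 0.5],
              [0.1, 0.9, 0.0]])
L = graph_laplacian(W)
F = np.array([1.0, 1.0, -1.0])   # predicted values f(x_1), f(x_2), f(x_3)
penalty = smoothness(F, L)
```

With this choice of $D$, every row of $L$ sums to zero, so a constant prediction vector incurs no penalty.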
Step 5: Determine the sample class. A sign function is defined to decide the class of each sample.
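The decision rule in Step 5 can be sketched as below. The binary labels $+1$ and $-1$ and the choice to map a decision value of exactly zero to $+1$ are assumptions; the text only states that a sign function is used.

```python
def sign(value):
    """Return +1 for non-negative decision values and -1 otherwise."""
    return 1 if value >= 0 else -1

def classify(decision_values):
    """Map each decision value f(x) to a class label via the sign function."""
    return [sign(v) for v in decision_values]

# Hypothetical decision values for four samples.
predictions = classify([0.73, -0.05, 0.0, -1.2])
```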
In the above semi-supervised classification learning method, the labeled-sample ratio of the training set determines classifier performance: the larger the labeled-sample ratio, the higher the classification precision. However, sample labels are not easy to obtain, which reduces the efficiency and performance of semi-supervised learning. To improve both, the present invention further proposes an active sample labeling method, implemented as follows:
First, a symmetric distance is defined between general samples $M = \{m_i\}$ and $N = \{n_i\}$, $i = 1, 2, \ldots, t$. The smaller the symmetric distance, the more similar the two sample distributions. The symmetric distance is then rewritten using Bayesian probability in terms of $p(c_i \mid x)$, the posterior probability that sample $x$ belongs to class $c_i$. The posterior tests and corrects the prior probability, so it is closer to the true situation; $p(c_i \mid x)$ is computed by Bayes' formula, with $i, j \le s$, where $s$ is the number of classes. The rewritten formula represents the difference between the probability that sample $x$ belongs to class $i$ and the probability that it belongs to class $j$, abbreviated $L_{i,j}(x)$. The larger the difference, the easier it is to determine the class of the sample; conversely, a small difference means the sample lies on a relatively fuzzy boundary.
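The posterior and the class probability difference can be sketched as follows. Bayes' formula is standard; taking $L_{i,j}(x)$ as the absolute gap between two posteriors is an assumption, as are the function names and the toy likelihoods and priors.

```python
def posteriors(likelihoods, priors):
    """Bayes' formula: p(c_i|x) = p(x|c_i) p(c_i) / sum_j p(x|c_j) p(c_j)."""
    joint = [lk * pr for lk, pr in zip(likelihoods, priors)]
    evidence = sum(joint)
    return [j / evidence for j in joint]

def probability_difference(post, i, j):
    """L_{i,j}(x): gap between the posteriors of classes i and j (assumed absolute)."""
    return abs(post[i] - post[j])

# Hypothetical class-conditional likelihoods p(x|c_i) and priors p(c_i), s = 3.
post = posteriors(likelihoods=[0.30, 0.28, 0.05], priors=[0.4, 0.4, 0.2])
gap = probability_difference(post, 0, 1)
```

Here the first two classes have nearly equal posteriors, so the gap is small and the sample would count as lying near the boundary between them.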
Unlabeled samples whose class probability difference $L_{i,j}(x)$ is less than a preset threshold $\delta$ are selected. Such a sample is deemed to lie on the relatively fuzzy boundary between classes $i$ and $j$ and to carry the maximum boundary information.
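The selection rule can be sketched as a simple filter. Comparing the two most probable classes of each sample is an assumption made here for the multi-class case; the default threshold is the $\delta = 2.5 \times 10^{-2}$ that the document reports as optimal. The function name and posterior values are hypothetical.

```python
def select_boundary_samples(posterior_rows, delta=2.5e-2):
    """Return indices of samples whose top-two posterior gap is below delta."""
    selected = []
    for idx, post in enumerate(posterior_rows):
        top, second = sorted(post, reverse=True)[:2]
        if top - second < delta:
            selected.append(idx)
    return selected

# Hypothetical posteriors p(c_i|x) for four unlabeled samples.
posterior_rows = [
    [0.90, 0.08, 0.02],   # confident prediction, gap 0.82
    [0.49, 0.48, 0.03],   # gap 0.01, near the boundary
    [0.40, 0.35, 0.25],   # gap 0.05, above the threshold
    [0.34, 0.33, 0.33],   # gap 0.01, near the boundary
]
selected = select_boundary_samples(posterior_rows)
```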
The selected samples with maximum boundary information are then labeled with classes by 3 trained initial classifiers. Each of the 3 initial classifiers is obtained by training the naive Bayes algorithm on a random selection of two-thirds of the total samples.
Further, the preset threshold $\delta = 2.5 \times 10^{-2}$ is optimal.
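A rough sketch of the three initial classifiers follows. The text specifies naive Bayes and random two-thirds subsets but not the likelihood model or how the three outputs are combined, so the Gaussian likelihood, sampling without replacement, and majority voting used here are all assumptions, and every name in the block is hypothetical.

```python
import numpy as np

class TinyGaussianNB:
    """Minimal Gaussian naive Bayes, for illustration only."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.mean = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.var = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
        self.log_prior = np.log(np.array([(y == c).mean() for c in self.classes]))
        return self

    def predict(self, X):
        # log p(c) + sum over features of log N(x_d | mean, var), per class
        diff = X[:, None, :] - self.mean[None, :, :]
        log_lik = -0.5 * (np.log(2 * np.pi * self.var)[None] + diff**2 / self.var[None]).sum(axis=2)
        return self.classes[np.argmax(log_lik + self.log_prior[None], axis=1)]

def train_initial_classifiers(X, y, n_classifiers=3, fraction=2 / 3, seed=0):
    """Train each classifier on a random two-thirds subset of the labeled data."""
    rng = np.random.default_rng(seed)
    size = int(round(fraction * len(X)))
    models = []
    for _ in range(n_classifiers):
        idx = rng.choice(len(X), size=size, replace=False)
        models.append(TinyGaussianNB().fit(X[idx], y[idx]))
    return models

def vote(models, X):
    """Combine predictions by majority vote (assumed combination rule)."""
    votes = np.stack([m.predict(X) for m in models])   # (n_classifiers, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])

# Toy data (hypothetical): two well-separated clusters with labels 0 and 1.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, (30, 2)), rng.normal(3.0, 0.5, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
models = train_initial_classifiers(X, y)
labels = vote(models, np.array([[0.1, -0.2], [2.9, 3.1]]))
```

Samples labeled this way would then be moved from the unlabeled set into the labeled set, which raises the labeled-sample ratio.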
Compared with existing semi-supervised classification machine learning methods, the new graph-based semi-supervised classification machine learning method provided by the present invention has the following advantages:
The present invention proposes a new graph-based semi-supervised classification method that combines the advantages of supervised and unsupervised learning: training samples are simple to prepare and training precision is high. Compared with general semi-supervised learning methods, the proposed active sample labeling preprocessing of the training set and the graph-based classification technique yield higher training precision and reduce the difficulty of labeling data samples.
Description of the drawings
Fig. 1 is a flow chart of the new graph-based semi-supervised classification method.
Fig. 2 is a flow diagram of the active sample labeling method.
Detailed description of embodiments
Embodiment 1
Following the flow shown in Fig. 1, the new graph-based semi-supervised classification method is implemented as follows. First, the training set is divided: the samples in the training set are split into two groups, labeled and unlabeled, and the labeled-sample ratio is computed. A training set with a labeled-sample ratio of 10% is prepared.
To reduce the difficulty of sample labeling and improve classification precision, the divided training set is preprocessed with the active sample labeling method. The class probability difference $L_{i,j}(x)$ is computed for all unlabeled samples, where $p(c_i \mid x)$ is the posterior probability that sample $x$ belongs to class $c_i$; it tests and corrects the prior probability, so it is closer to the true situation, and it is computed by Bayes' formula.
All samples whose class probability difference is less than the preset threshold $\delta$ are then chosen, and the 3 trained initial classifiers label the chosen samples. The labeled samples are moved into the labeled sample set, which raises the labeled-sample ratio of the training set.
The preset threshold used for the comparison is $\delta = 2.5 \times 10^{-2}$.
The asymmetric sparse weight matrix is constructed from the training set samples, and the asymmetric weight matrix is solved by solving the optimization problem.
The objective function value is computed from the input sample to be judged. The regularization term $A$ in the formula can be computed from the input samples, specifically as:
$A = P - PQ + Q$
$P = X^T X (\lambda I + X^T X)^{-1}$
The output of the sign function is used to predict the class of the sample to be judged.
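The matrix $P = X^T X (\lambda I + X^T X)^{-1}$ from the embodiment can be computed as sketched below. The orientation of $X$ (one sample per column, so that $X^T X$ is $n \times n$) and the value of $\lambda$ are assumptions. The matrix $Q$ is not defined in this text, so the sketch stops at $P$ and does not compute $A$.

```python
import numpy as np

def projection_matrix(X, lam=1e-2):
    """P = X^T X (lam I + X^T X)^{-1} for a d x n sample matrix X."""
    G = X.T @ X                      # n x n Gram matrix
    n = G.shape[0]
    M = lam * np.eye(n) + G
    # P = G M^{-1}; solve M^T P^T = G^T rather than forming the inverse.
    return np.linalg.solve(M.T, G.T).T

# Toy data (hypothetical): 5 samples in 3 dimensions, one sample per column.
rng = np.random.default_rng(0)
X = rng.random((3, 5))
P = projection_matrix(X, lam=1e-2)
```

Solving a linear system is generally preferable to an explicit inverse here, because $X^T X$ is rank deficient whenever there are more samples than dimensions and only the $\lambda I$ term keeps the system well posed.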
Claims (3)
1. A new graph-based semi-supervised classification method, comprising the following steps:
Step S001: Divide the training set. The training set is $X = L \cup U = \{x_1, \ldots, x_l, x_{l+1}, \ldots, x_{l+u}\}$, where $L = \{x_1, \ldots, x_l\}$ is a small number of labeled samples and $U = \{x_{l+1}, \ldots, x_{l+u}\}$ is a large number of unlabeled samples; the labeled-sample ratio is $l/(l+u)$.
Step S002: Construct the asymmetric sparse weight matrix, in which the sparse decomposition coefficients with index $j = 1, 2, \ldots, k-1, k+1, \ldots, n$ represent sample $x_k$ as a linear combination of the remaining samples.
Step S003: Solve the asymmetric weight matrix by solving the optimization problem.
Step S004: Compute the objective function.
Step S005: Determine the sample class using the sign function.
2. The new graph-based semi-supervised classification method according to claim 1, characterized in that: in dividing the training set in step S001, an active sample labeling method is used to label the samples with maximum boundary information, preprocessing the samples through the following steps:
Step D000: Compute the class probability difference $L_{i,j}(x)$ of all unlabeled samples using the class probability difference formula.
Step D001: Choose all unlabeled samples whose class probability difference $L_{i,j}(x)$ is less than the preset threshold $\delta$ as the samples with maximum boundary information.
Step D002: Label the chosen samples with maximum boundary information with classes using the 3 trained initial classifiers, and move the labeled samples into the labeled sample set.
3. The new graph-based semi-supervised classification method according to claim 2, characterized in that the preset threshold is $\delta = 2.5 \times 10^{-2}$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810437033.0A CN108596272A (en) | 2018-05-09 | 2018-05-09 | A new graph-based semi-supervised classification machine learning method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108596272A (en) | 2018-09-28 |
Family
ID=63635946
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810437033.0A Pending CN108596272A (en) | A new graph-based semi-supervised classification machine learning method | 2018-05-09 | 2018-05-09 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108596272A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110309871A (en) * | 2019-06-27 | 2019-10-08 | 西北工业大学深圳研究院 | A semi-supervised learning image classification method based on random resampling |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130223727A1 (en) * | 2012-02-29 | 2013-08-29 | Canon Kabushiki Kaisha | Method and device for learning of a classifier, and processing apparatus |
CN104318242A (en) * | 2014-10-08 | 2015-01-28 | 中国人民解放军空军工程大学 | High-efficiency SVM active semi-supervised learning algorithm |
CN104463203A (en) * | 2014-12-03 | 2015-03-25 | 复旦大学 | Semi-supervised hyperspectral remote sensing image classification method based on ground-object class membership grading |
CN104992184A (en) * | 2015-07-02 | 2015-10-21 | 东南大学 | Multi-class image classification method based on semi-supervised extreme learning machine |
CN107766895A (en) * | 2017-11-16 | 2018-03-06 | 苏州大学 | An inductive non-negative projection semi-supervised data classification method and system |
Non-Patent Citations (2)
Title |
---|
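Liu Jianfeng et al.: "Research on an improved Bayesian semi-supervised classification algorithm incorporating active learning", Computer Measurement & Control * |
Liu Jianfeng et al.: "Research on semi-supervised learning with asymmetric sparse graphs", Journal of Chongqing Normal University (Natural Science) * |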
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20180928 |