CN108596272A - A new graph-based semi-supervised classification machine learning method - Google Patents

Publication number
CN108596272A
Authority
CN
China
Prior art keywords
sample
new method
classification
method based
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810437033.0A
Other languages
Chinese (zh)
Inventor
刘建峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Three Gorges University
Original Assignee
Chongqing Three Gorges University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Three Gorges University
Priority to CN201810437033.0A
Publication of CN108596272A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155: Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention proposes a new graph-based semi-supervised classification method. It provides a sample classification method comprising steps S001 to S005, and an active sample labeling method comprising steps D000, D001 and D002 that preprocesses the training sample set. The method combines the advantages of supervised and unsupervised learning, with simple training-sample preparation and high training accuracy. Compared with general semi-supervised learning methods, preprocessing the training set with the proposed active sample labeling method and classifying with the graph-based method yields higher training accuracy and reduces the difficulty of labeling data samples.

Description

A new graph-based semi-supervised classification machine learning method
Technical Field
The present invention relates to a machine learning method, and in particular to a graph-based semi-supervised classification machine learning method.
Background
Learning problems in machine learning fall into three classes: supervised learning, semi-supervised learning, and unsupervised learning. Obtaining class labels for samples in real-world data is time-consuming and laborious, yet labeled samples improve the classification ability of a classifier. Graph-based methods are an important and effective family of semi-supervised learning methods, but existing semi-supervised learning methods perform poorly on two evaluation metrics: classification accuracy and standard error.
Summary of the Invention
The purpose of the present invention is to provide a new graph-based semi-supervised classification machine learning method, so that the classifier improves on both classification accuracy and standard error.
To achieve the above object, the present invention provides a new graph-based semi-supervised classification machine learning method, which mainly comprises the following steps:
Step 1: Partition the training set. The training set is $X = L \cup U = \{x_1, \dots, x_l, x_{l+1}, \dots, x_{l+u}\}$, where $L = \{x_1, \dots, x_l\}$ is a small number of labeled samples and $U = \{x_{l+1}, \dots, x_{l+u}\}$ is a large number of unlabeled samples. The labeled-sample ratio of the training set is then computed.
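The partition in Step 1 can be illustrated with a minimal Python sketch. The formula for the labeled-sample ratio does not survive in the text, so the sketch assumes the ratio is $l / (l + u)$; the function name `split_training_set` and the convention of marking unlabeled samples with `None` are hypothetical choices made for illustration.

```python
def split_training_set(samples, labels):
    """Split samples into the labeled set L and the unlabeled set U.

    labels[i] is None for an unlabeled sample (a convention assumed here).
    """
    labeled = [(x, y) for x, y in zip(samples, labels) if y is not None]
    unlabeled = [x for x, y in zip(samples, labels) if y is None]
    # Assumed definition of the labeled-sample ratio: l / (l + u).
    ratio = len(labeled) / len(samples) if samples else 0.0
    return labeled, unlabeled, ratio


samples = [[0.1, 0.2], [0.9, 0.8], [0.4, 0.5], [0.6, 0.7], [0.2, 0.1]]
labels = [1, None, None, -1, None]
labeled, unlabeled, ratio = split_training_set(samples, labels)
print(len(labeled), len(unlabeled), ratio)
```

Here two of the five samples carry labels, so the assumed ratio is 2/5.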
Step 2: Construct the asymmetric sparse weight matrix. Construct a graph $G = (V, E)$, where $V$ is the set of sample points and $E$ is the set of edges between sample points. Each edge $e \in E$ is determined by its weight $w(e) = w_{ij}$, where $w_{ij}$ denotes the similarity between sample points $x_i$ and $x_j$.
By linearity and sparsity, given a sample $x_i$, a sparse representation of the sample can be obtained when all basis vectors come from the same class as $x_i$. $T_k$ denotes the sample matrix with the column $x_k$ removed, $T_k = [x_1, \dots, x_{k-1}, x_{k+1}, \dots, x_n]$, and the sparse decomposition coefficients express $x_k$ as a linear combination of the remaining samples.
The asymmetric weight matrix is:
Step 3: Solve for the asymmetric weight matrix. Given a sample $x_i$, the weight coefficients are obtained by solving the following optimization problem:
which is converted into the following optimization problem:
$$\min \|q\|_1 \quad \text{s.t.} \quad Pq = x_k,\; q_i \ge 0,\; i = 1, 2, \dots, k-1, k+1, \dots, n$$
where $P = [G_k \; I_d] \in \mathbb{R}^{d \times (d+n-1)}$ and $q = [a \; e]^T$. The problem can be solved as a linear programming problem, with the result:
$w_{ii} = 0$,
The weight matrix is asymmetric.
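Steps 2 and 3 can be sketched as one linear program per sample. This is a minimal sketch under stated assumptions, not the patented implementation: it uses SciPy's `linprog`, takes the columns of `X` as samples, treats $G_k$ as the matrix $T_k$ of remaining samples, splits the unconstrained error part $e$ into two non-negative parts so that $\|q\|_1$ becomes a linear objective, and stores the coefficients $a$ in row $k$ of $W$. The row layout is an assumption because the weight-matrix formula is missing from the text.

```python
import numpy as np
from scipy.optimize import linprog


def sparse_weights(X):
    """X has shape (d, n), one sample per column. Returns an (n, n) matrix W."""
    d, n = X.shape
    W = np.zeros((n, n))
    for k in range(n):
        others = [j for j in range(n) if j != k]
        T_k = X[:, others]  # d x (n - 1): all samples except x_k
        # Variables: a (n - 1 values), e_plus (d), e_minus (d), all >= 0,
        # with e = e_plus - e_minus, so T_k a + e = x_k.
        A_eq = np.hstack([T_k, np.eye(d), -np.eye(d)])
        cost = np.ones(A_eq.shape[1])  # sum of variables equals the L1 norm
        res = linprog(cost, A_eq=A_eq, b_eq=X[:, k],
                      bounds=(0, None), method="highs")
        if not res.success:
            raise RuntimeError(res.message)
        W[k, others] = res.x[: n - 1]  # diagonal entry w_kk stays 0
    return W


rng = np.random.default_rng(0)
X = rng.random((4, 6))  # 6 samples of dimension 4
W = sparse_weights(X)
print(np.round(W, 3))
```

Because each row is solved independently, nothing forces $w_{ij} = w_{ji}$, which is why the resulting matrix is in general asymmetric.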
Step 4: Define the objective function. The objective function is formed from a loss function and regularization terms, and is expressed as follows:
where $C$ is the loss function, $\|f\|^2$ is the penalty term in the RKHS (reproducing kernel Hilbert space), and a further term represents the intrinsic manifold structure of the data, with $f = [f(x_1), f(x_2), \dots, f(x_{l+u})]$. Optimizing the above problem gives:
where $L$ is the graph Laplacian, expressed using the sparse weights, $L = D - W \in \mathbb{R}^{n \times n}$, and $W$ is the sparse weight matrix of the samples. The above expression can be written as:
The regularization term $A$ can be computed from the input samples.
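The Laplacian $L = D - W$ can be built directly from the weight matrix. The sketch below assumes $D$ is the diagonal matrix of row sums of $W$ (with an asymmetric $W$, row and column sums differ, and the text does not say which is used), and it evaluates the quadratic form $f^T L f$, the usual graph smoothness penalty, purely as an illustration since the objective formula itself is missing from the text.

```python
import numpy as np


def graph_laplacian(W):
    """Return L = D - W, with D the diagonal matrix of row sums (assumed)."""
    D = np.diag(W.sum(axis=1))
    return D - W


# A small asymmetric weight matrix with zero diagonal (illustrative values).
W = np.array([[0.0, 0.7, 0.3],
              [0.2, 0.0, 0.8],
              [0.5, 0.5, 0.0]])
L = graph_laplacian(W)

# Outputs of a candidate classifier on the three samples.
f = np.array([1.0, -1.0, 1.0])
smoothness = f @ L @ f  # larger when strongly connected samples disagree
print(L)
print(smoothness)
```

With row-sum degrees, every row of $L$ sums to zero, so a constant $f$ incurs no penalty.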
Step 5: Determine the sample class. A sign function is defined to decide the class of the sample:
In the semi-supervised classification learning method above, the proportion of labeled samples in the training set determines classifier performance: the larger the labeled-sample ratio, the higher the classification accuracy. Labels, however, are hard to obtain, which reduces the efficiency and performance of semi-supervised learning. To improve both, the present invention further proposes an active sample labeling method.
It is implemented as follows:
First, define the symmetric distance between general samples $M = \{m_i\}$ and $N = \{n_i\}$, $i = 1, 2, \dots, t$:
In this formula, the smaller the symmetric distance, the more similar the two sample distributions. Combining it with Bayesian probability, the symmetric distance is rewritten as:
where $p(c_i \mid x)$ is the posterior probability that sample $x$ belongs to class $c_i$; the test probability and the prior probability are corrected so that they are closer to the true situation. $p(c_i \mid x)$ is computed by Bayes' formula, in detail as follows:
with $i, j \le s$, where $s$ is the number of classes. This formula represents the difference between the probability that sample $x$ belongs to class $i$ and the probability that it belongs to class $j$, abbreviated $L_{i,j}(x)$. The larger the difference, the easier it is to determine the class of the sample; conversely, a small difference means the sample lies on a relatively fuzzy boundary.
Select the unlabeled samples whose class probability difference $L_{i,j}(x)$ is below a preset threshold $\delta$. Such a sample is deemed to lie on the relatively fuzzy boundary between classes $i$ and $j$ and to carry the maximum boundary information.
The selected samples with maximum boundary information are assigned class labels by three trained initial classifiers. Each of the three initial classifiers is obtained by training a naive Bayes algorithm on a random two-thirds of the total samples.
Further, the preset threshold $\delta = 2.5 \times 10^{-2}$ is optimal.
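The selection of boundary samples can be sketched in a few lines of Python. The exact formula for $L_{i,j}(x)$ is missing from the text, so the sketch assumes it is the gap between the two largest posterior probabilities; the posterior itself follows Bayes' rule, $p(c_i \mid x) = p(x \mid c_i)\,p(c_i) / \sum_j p(x \mid c_j)\,p(c_j)$. All function names and the numeric likelihoods are hypothetical.

```python
def posteriors(likelihoods, priors):
    """Bayes' rule: p(c_i | x) is proportional to p(x | c_i) * p(c_i)."""
    joint = [lk * pr for lk, pr in zip(likelihoods, priors)]
    total = sum(joint)
    return [j / total for j in joint]


def probability_gap(post):
    """Gap between the two most probable classes (assumed form of L_ij(x))."""
    top, second = sorted(post, reverse=True)[:2]
    return top - second


def select_boundary_samples(all_posteriors, delta=2.5e-2):
    """Indices of samples whose gap is below the threshold delta."""
    return [idx for idx, post in enumerate(all_posteriors)
            if probability_gap(post) < delta]


priors = [0.5, 0.5]
# p(x | c_1), p(x | c_2) for three unlabeled samples (illustrative numbers).
likelihoods = [[0.30, 0.31], [0.90, 0.10], [0.40, 0.41]]
post = [posteriors(lk, priors) for lk in likelihoods]
selected = select_boundary_samples(post)
print(selected)
```

The first and third samples have nearly equal posteriors for the two classes, so they fall below $\delta$ and would be passed on for labeling; the second sample is confidently in one class and is left alone.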
Compared with existing semi-supervised classification machine learning methods, the new graph-based semi-supervised classification machine learning method provided by the present invention has the following advantages:
The present invention proposes a new graph-based semi-supervised classification method that combines the advantages of supervised and unsupervised learning, with simple training-sample preparation and high training accuracy. Compared with general semi-supervised learning methods, preprocessing the training set with the proposed active sample labeling method and classifying with the graph-based method yields higher training accuracy and reduces the difficulty of labeling data samples.
Brief Description of the Drawings
Fig. 1: Flow chart of the new graph-based semi-supervised classification method.
Fig. 2: Flow diagram of the active sample labeling method.
Detailed Description
Embodiment 1
The new graph-based semi-supervised classification method is implemented following the flow shown in Fig. 1. First, the training set is partitioned: the samples in the training set are divided into labeled and unlabeled samples, and the labeled-sample ratio is computed. A training set with a labeled-sample ratio of 10% is prepared.
To reduce the difficulty of labeling samples and to improve classification accuracy, the partitioned training set is preprocessed with the active sample labeling method. The class probability difference $L_{i,j}(x)$ of every unlabeled sample is computed:
where $p(c_i \mid x)$ is the posterior probability that sample $x$ belongs to class $c_i$; the test probability and the prior probability are corrected so that they are closer to the true situation. $p(c_i \mid x)$ is computed by Bayes' formula, in detail as follows:
All samples whose class probability difference is below the preset threshold $\delta$ are selected. The three trained initial classifiers then assign labels to the selected samples, and the newly labeled samples are moved into the labeled sample set, which raises the proportion of labeled samples in the training set.
The preset threshold used for the comparison is $\delta = 2.5 \times 10^{-2}$.
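The labeling step with three initial classifiers can be sketched as follows. The text specifies naive Bayes classifiers trained on a random two-thirds of the samples, but not the naive Bayes variant nor how the three outputs are combined; this sketch assumes a Gaussian naive Bayes model and a majority vote, and the names `GaussianNB`, `train_committee` and `committee_label` are hypothetical.

```python
import numpy as np


class GaussianNB:
    """Minimal Gaussian naive Bayes (the variant is an assumption)."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.mean = np.array([X[y == c].mean(axis=0) for c in self.classes])
        # Small constant avoids division by zero for constant features.
        self.var = np.array([X[y == c].var(axis=0) for c in self.classes]) + 1e-9
        self.log_prior = np.log(np.array([(y == c).mean() for c in self.classes]))
        return self

    def predict(self, X):
        diff = X[:, None, :] - self.mean[None, :, :]
        # Sum of per-feature Gaussian log-densities for every class.
        log_lik = -0.5 * (np.log(2 * np.pi * self.var)[None]
                          + diff ** 2 / self.var[None]).sum(axis=2)
        return self.classes[np.argmax(log_lik + self.log_prior[None], axis=1)]


def train_committee(X, y, n_models=3, fraction=2 / 3, seed=0):
    """Train each model on a random subset holding `fraction` of the samples."""
    rng = np.random.default_rng(seed)
    size = int(round(fraction * len(X)))
    models = []
    for _ in range(n_models):
        idx = rng.choice(len(X), size=size, replace=False)
        models.append(GaussianNB().fit(X[idx], y[idx]))
    return models


def committee_label(models, X):
    """Majority vote over the committee (the voting rule is an assumption)."""
    votes = np.stack([m.predict(X) for m in models])  # (n_models, n_samples)
    labels = []
    for column in votes.T:
        values, counts = np.unique(column, return_counts=True)
        labels.append(values[np.argmax(counts)])
    return np.array(labels)


X_lab = np.array([[0.0, 0.0], [0.5, 0.2], [0.1, 0.6], [0.4, 0.4],
                  [0.2, 0.1], [0.6, 0.5], [5.0, 5.0], [5.5, 5.2],
                  [5.1, 5.6], [5.4, 5.4], [5.2, 5.1], [5.6, 5.5]])
y_lab = np.array([0] * 6 + [1] * 6)
models = train_committee(X_lab, y_lab)
boundary = np.array([[0.3, 0.3], [5.3, 5.3]])
new_labels = committee_label(models, boundary)
print(new_labels)
```

Samples labeled this way would then be appended to the labeled set before the weight matrix is constructed.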
The asymmetric sparse weight matrix is constructed from the training set samples, and the asymmetric weight matrix is obtained by solving the optimization problem.
The objective function value is computed for the input sample to be classified. The regularization term $A$ in the formula can be computed from the input samples, specifically:
$$A = P - PQ + Q$$
$$P = X^T X (\lambda I + X^T X)^{-1}$$
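The two expressions above translate directly into NumPy. This is a minimal sketch: the text does not define $Q$ or the orientation of $X$, so `Q` is passed in as an input and `X` is assumed to hold one sample per column; the function names and the value of `lam` are hypothetical.

```python
import numpy as np


def regularizer_P(X, lam):
    """P = X^T X (lam I + X^T X)^{-1}, computed without an explicit inverse."""
    G = X.T @ X
    M = lam * np.eye(G.shape[0]) + G
    # P = G M^{-1}  is equivalent to  P^T = M^{-T} G^T, a linear solve.
    return np.linalg.solve(M.T, G.T).T


def regularizer_A(P, Q):
    """A = P - P Q + Q; Q is not defined in the text, so it is supplied."""
    return P - P @ Q + Q


rng = np.random.default_rng(1)
X = rng.random((4, 5))  # assumed layout: 5 samples of dimension 4
P = regularizer_P(X, lam=0.1)
Q = np.zeros((5, 5))    # placeholder only, to show the shapes involved
A = regularizer_A(P, Q)
print(A.shape)
```

Solving the linear system rather than forming the inverse is a numerical-stability choice made for the sketch, not something the text prescribes.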
The sign function is used to produce the output, which predicts the class of the sample to be classified.

Claims (3)

1. A new graph-based semi-supervised classification method, comprising the following steps:
Step S001: Partition the training set;
the training set is $X = L \cup U = \{x_1, \dots, x_l, x_{l+1}, \dots, x_{l+u}\}$, where $L = \{x_1, \dots, x_l\}$ is a small number of labeled samples and $U = \{x_{l+1}, \dots, x_{l+u}\}$ is a large number of unlabeled samples; the labeled-sample ratio is computed;
Step S002: Construct the asymmetric sparse weight matrix;
in the formula, the coefficients indexed by $j = 1, 2, \dots, k-1, k+1, \dots, n$ are the sparse decomposition coefficients with which sample $x_k$ is linearly represented by the remaining samples;
Step S003: Solve for the asymmetric weight matrix by solving the optimization problem;
Step S004: Compute the objective function;
Step S005: Use the sign function to determine the sample class.
2. The new graph-based semi-supervised classification method according to claim 1, characterized in that, when the training set is partitioned in step S001, an active sample labeling method is used to label the samples with maximum boundary information and thereby preprocess the samples, comprising the following steps:
Step D000: Compute the class probability difference $L_{i,j}(x)$ of all unlabeled samples; the class probability difference is computed by the formula:
Step D001: Select all unlabeled samples whose class probability difference $L_{i,j}(x)$ is below a preset threshold $\delta$ as the samples with maximum boundary information;
Step D002: Assign class labels to the selected samples with maximum boundary information using three trained initial classifiers, and move the labeled samples into the labeled sample set.
3. The new graph-based semi-supervised classification method according to claim 2, characterized in that the preset threshold is $\delta = 2.5 \times 10^{-2}$.
CN201810437033.0A 2018-05-09 2018-05-09 A new graph-based semi-supervised classification machine learning method Pending CN108596272A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810437033.0A CN108596272A (en) 2018-05-09 2018-05-09 A new graph-based semi-supervised classification machine learning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810437033.0A CN108596272A (en) 2018-05-09 2018-05-09 A new graph-based semi-supervised classification machine learning method

Publications (1)

Publication Number Publication Date
CN108596272A true CN108596272A (en) 2018-09-28

Family

ID=63635946

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810437033.0A Pending CN108596272A (en) 2018-05-09 2018-05-09 A new graph-based semi-supervised classification machine learning method

Country Status (1)

Country Link
CN (1) CN108596272A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110309871A (en) * 2019-06-27 2019-10-08 西北工业大学深圳研究院 A kind of semi-supervised learning image classification method based on random resampling

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130223727A1 (en) * 2012-02-29 2013-08-29 Canon Kabushiki Kaisha Method and device for learning of a classifier, and processing apparatus
CN104318242A (en) * 2014-10-08 2015-01-28 中国人民解放军空军工程大学 High-efficiency SVM active half-supervision learning algorithm
CN104463203A (en) * 2014-12-03 2015-03-25 复旦大学 Hyper-spectral remote sensing image semi-supervised classification method based on ground object class membership grading
CN104992184A (en) * 2015-07-02 2015-10-21 东南大学 Multiclass image classification method based on semi-supervised extreme learning machine
CN107766895A (en) * 2017-11-16 2018-03-06 苏州大学 A kind of induction type is non-negative to project semi-supervised data classification method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
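Liu Jianfeng (刘建峰) et al.: "Research on an improved Bayesian semi-supervised classification algorithm incorporating active learning" (融合主动学习的改进贝叶斯半监督分类算法研究), Computer Measurement & Control (《计算机测量与控制》) *
Liu Jianfeng (刘建峰) et al.: "Research on semi-supervised learning with asymmetric sparse graphs" (非对称稀疏图的半监督学习研究), Journal of Chongqing Normal University (Natural Science Edition) (《重庆师范大学学报(自然科学版)》) *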

Similar Documents

Publication Publication Date Title
CN111126488B (en) Dual-attention-based image recognition method
Gu et al. Stack-captioning: Coarse-to-fine learning for image captioning
CN109934293B (en) Image recognition method, device, medium and confusion perception convolutional neural network
CN109492099B (en) Cross-domain text emotion classification method based on domain impedance self-adaption
Chong et al. Simultaneous image classification and annotation
CN111814871A (en) Image classification method based on reliable weight optimal transmission
Yu et al. Mixture of GANs for Clustering.
CN104966105A (en) Robust machine error retrieving method and system
CN112784031B (en) Method and system for classifying customer service conversation texts based on small sample learning
CN111816255A (en) RNA-binding protein recognition by fusing multi-view and optimal multi-tag chain learning
CN109919236A (en) A kind of BP neural network multi-tag classification method based on label correlation
Chu et al. Co-training based on semi-supervised ensemble classification approach for multi-label data stream
CN112668633B (en) Adaptive graph migration learning method based on fine granularity field
CN110263804A (en) A kind of medical image dividing method based on safe semi-supervised clustering
CN108596272A (en) A kind of semisupervised classification machine learning new method based on figure
Liu et al. A high-performing comprehensive learning algorithm for text classification without pre-labeled training set
WO2003073381A1 (en) Pattern feature selection method, classification method, judgment method, program, and device
CN116523877A (en) Brain MRI image tumor block segmentation method based on convolutional neural network
CN114495114B (en) Text sequence recognition model calibration method based on CTC decoder
CN114357221B (en) Self-supervision active learning method based on image classification
CN112801163B (en) Multi-target feature selection method of mouse model hippocampal biomarker based on dynamic graph structure
Liang et al. Large-scale image classification using fast svm with deep quasi-linear kernel
CN115063374A (en) Model training method, face image quality scoring method, electronic device and storage medium
Zhang et al. An optimized dimensionality reduction model for high-dimensional data based on restricted Boltzmann machines
Yang et al. A two-stage training framework with feature-label matching mechanism for learning from label proportions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180928