CN103390171A - Safe semi-supervised learning method - Google Patents

Safe semi-supervised learning method

Info

Publication number
CN103390171A
Authority
CN
China
Prior art keywords
semi
supervised
learning method
classifier
outcome
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2013103155014A
Other languages
Chinese (zh)
Inventor
周志华
李宇峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN2013103155014A priority Critical patent/CN103390171A/en
Publication of CN103390171A publication Critical patent/CN103390171A/en
Pending legal-status Critical Current

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a safe semi-supervised learning method. The method comprises a step of building multiple semi-supervised classifiers and a step of building a final safe semi-supervised classifier: first, multiple semi-supervised classifiers that differ substantially from one another are built for a given training dataset; then the final safe semi-supervised classifier is built by maximizing the performance improvement in the worst case. In application, the method rarely degrades performance, while achieving performance comparable to the existing state of the art.

Description

A safe semi-supervised learning method
Technical field
The present invention relates to a semi-supervised learning method, in particular to a semi-supervised learning method that exploits unlabeled data safely, and belongs to the field of machine learning.
Background art
Machine learning methods attempt to use the historical data of a task to improve performance on that task. To achieve good learning performance, machine learning methods such as supervised learning usually require the historical data to carry explicit concept labels (such data are called labeled data) and require a large amount of labeled data. In many real-world tasks, obtaining concept labels consumes considerable human and material resources, so labeled data are usually scarce, whereas large amounts of historical data without concept labels (called unlabeled data) are easy to obtain. How to use a large amount of unlabeled data to improve on the performance obtained with only a small amount of labeled data has become an important topic in machine learning, and semi-supervised learning is one of the two mainstream techniques in this area.
Semi-supervised learning methods have been widely applied in many areas. In many cases, however, existing semi-supervised learning methods suffer performance degradation when they use unlabeled data: the performance of the semi-supervised method can be significantly worse than that of a supervised learning method trained directly on the small amount of labeled data. This phenomenon seriously hampers the application of semi-supervised learning in practical tasks, because users generally expect that using semi-supervised learning will not degrade performance. A safe semi-supervised learning method is therefore needed that, on the one hand, usually improves performance and, on the other hand, rarely causes significant performance degradation. Because semi-supervised learning problems are ubiquitous in practical tasks, results in this direction are useful in many of them.
Summary of the invention
Object of the invention: in view of the problem that existing semi-supervised learning methods can suffer significant performance degradation in many cases when they use unlabeled data, the invention provides a safe semi-supervised learning method. Specifically, multiple semi-supervised classifiers with large diversity are first built for a given training dataset; a final safe semi-supervised classifier is then built by maximizing the performance improvement in the worst case.
Technical solution: a safe semi-supervised learning method, mainly comprising a step of building multiple semi-supervised classifiers and a step of building a final safe semi-supervised classifier;
The step of building multiple semi-supervised classifiers is specifically:
Step 100: given a small amount of labeled data and a large amount of unlabeled data, randomly initialize multiple semi-supervised classifiers;
Step 101: for each initialized semi-supervised classifier, optimize its predictions with an optimization method according to the objective function of the semi-supervised classifier;
Step 102: partition the predictions of the semi-supervised classifiers optimized in step 101 into multiple clusters using a machine learning clustering method;
Step 103: for each cluster of the clustering result, output the semi-supervised classifier in it with the best objective value;
Step 104: collect the semi-supervised classifiers output for the clusters, obtaining multiple semi-supervised classifiers;
The step of building the final safe semi-supervised classifier is specifically:
Step 200: train a supervised learning method on the small amount of labeled data and obtain its predictions on the unlabeled data;
Step 201: assuming in turn that each semi-supervised classifier built in step 104 is the true classifier, define, relative to the predictions of the supervised learning method, a performance improvement function for any prediction on the unlabeled data;
Step 202: for any prediction on the unlabeled data, take the minimum over the multiple performance improvement functions obtained in step 201 and define it as the worst-case performance improvement function;
Step 203: taking the worst-case performance improvement as the objective function, optimize the predictions on the unlabeled data with an optimization method so that the worst-case performance improvement is maximized;
Step 204: output the optimization result as the predictions of the final safe semi-supervised classifier.
The semi-supervised classifiers include generative semi-supervised classifiers, graph-based semi-supervised classifiers, disagreement-based semi-supervised classifiers, semi-supervised classifiers based on support vector machines, and the like.
Initializing a semi-supervised classifier means initializing its predictions on the unlabeled data.
The objective function of a semi-supervised classifier includes the margin between data of different classes, the probability likelihood, and the like.
The supervised learning method in step 200 includes generative model methods, nearest-neighbor supervised learning methods, support vector machine learning methods, and the like.
The performance evaluation metrics used by the performance improvement function include accuracy, precision, recall, the F1 measure, and the like.
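As an illustration of how one improvement function can be instantiated with different evaluation metrics, the following minimal sketch assumes binary labels in {-1, +1} with +1 as the positive class. The helper names (`confusion`, `metric_gain`, and so on) are hypothetical and are not taken from the patent.

```python
def confusion(y_true, y_pred):
    """Count true positives, false positives and false negatives (+1 is positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    return tp, fp, fn

def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def precision(y_true, y_pred):
    tp, fp, _ = confusion(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0

def recall(y_true, y_pred):
    tp, _, fn = confusion(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0

def f1(y_true, y_pred):
    p, r = precision(y_true, y_pred), recall(y_true, y_pred)
    return 2 * p * r / (p + r) if p + r else 0.0

def metric_gain(metric, y_t, y, y0):
    """Improvement of prediction y over the supervised prediction y0,
    measured with `metric` while treating y_t as the true labels."""
    return metric(y_t, y) - metric(y_t, y0)

y_t = [1, 1, -1, -1]   # candidate assumed to be the truth
y = [1, 1, 1, -1]      # prediction being evaluated
y0 = [1, -1, 1, -1]    # supervised baseline prediction
print(metric_gain(accuracy, y_t, y, y0))
print(metric_gain(f1, y_t, y, y0))
```

Passing the metric as a parameter keeps the worst-case construction unchanged whichever of the listed metrics is chosen.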
Beneficial effects: compared with the prior art, the safe semi-supervised learning method provided by the invention rarely causes significant performance degradation in practice, while achieving performance highly comparable to the prior art.
Brief description of the drawings
Fig. 1 is a flowchart of building multiple semi-supervised classifiers according to an embodiment of the invention;
Fig. 2 is a flowchart of building the final safe semi-supervised classifier according to an embodiment of the invention;
Fig. 3 shows the accuracy results of the comparative experiments of an embodiment of the invention on multiple real-world datasets.
Detailed description
The invention is further illustrated below with reference to specific embodiments. It should be understood that these embodiments serve only to illustrate the invention and not to limit its scope; after reading the invention, modifications of its various equivalent forms by those skilled in the art all fall within the scope defined by the claims of this application.
The workflow for building multiple semi-supervised classifiers is shown in Fig. 1. Specifically, given a small amount of labeled data and a large amount of unlabeled data, multiple semi-supervised classifiers, say N of them, are first randomly initialized; their predictions on the unlabeled data are denoted $\{y_1, y_2, \ldots, y_N\}$ (step 10). The predictions of the semi-supervised classifiers are then optimized until convergence (step 11), for example by alternating optimization: first fix the predictions $\{y_1, y_2, \ldots, y_N\}$ on the unlabeled data and update the classifier model parameters $\{\phi_1, \phi_2, \ldots, \phi_N\}$ (step 12a); then fix the classifier model parameters $\{\phi_1, \phi_2, \ldots, \phi_N\}$ and update the predictions on the unlabeled data, obtaining $\{z_1, z_2, \ldots, z_N\}$ (step 12b); if $\{z_1, z_2, \ldots, z_N\} = \{y_1, y_2, \ldots, y_N\}$, proceed to the next step, otherwise iterate steps 12a and 12b until convergence. Next, the predictions $\{y_1, y_2, \ldots, y_N\}$ of the optimized semi-supervised classifiers are clustered, for example with the k-means technique; the number of clusters is denoted T (step 13). For each cluster of the clustering result, the classifier in it with the best objective value is output; without loss of generality, the resulting semi-supervised classifiers are denoted $\{y_1, y_2, \ldots, y_T\}$ (step 14). This yields multiple semi-supervised classifiers $\{y_1, y_2, \ldots, y_T\}$.
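The workflow of steps 10 to 14 can be sketched as follows. This is a toy illustration under stated assumptions, not the embodiment's implementation: a nearest-centroid classifier stands in for the semi-supervised support vector machine, the objective is the total squared distance to the assigned centroid (lower is better), labels are in {-1, +1}, and all function names are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def fit(X, y):
    """Step 12a: labels fixed, update the model parameters (class centroids)."""
    return {c: centroid([x for x, lab in zip(X, y) if lab == c]) for c in set(y)}

def optimize(X_lab, y_lab, X_unl, y_init, max_iter=100):
    """Alternate steps 12a and 12b until the unlabeled predictions stop changing."""
    X_all, y = X_lab + X_unl, list(y_init)
    for _ in range(max_iter):
        model = fit(X_all, y_lab + y)
        # Step 12b: parameters fixed, update predictions on the unlabeled data.
        z = [min(model, key=lambda c: sq_dist(x, model[c])) for x in X_unl]
        if z == y:
            break
        y = z
    model = fit(X_all, y_lab + y)
    objective = sum(sq_dist(x, model[c]) for x, c in zip(X_all, y_lab + y))
    return y, objective

def kmeans(vectors, k, rng, n_iter=20):
    """Step 13: plain k-means on the prediction vectors; returns cluster indices."""
    centers = [list(v) for v in rng.sample(vectors, k)]
    assign = [0] * len(vectors)
    for _ in range(n_iter):
        assign = [min(range(k), key=lambda j: sq_dist(v, centers[j]))
                  for v in vectors]
        for j in range(k):
            members = [v for v, a in zip(vectors, assign) if a == j]
            if members:
                centers[j] = centroid(members)
    return assign

def build_candidates(X_lab, y_lab, X_unl, n_init=8, n_clusters=3, seed=0):
    rng = random.Random(seed)
    # Step 10: random initial predictions; step 11: optimize each of them.
    results = [optimize(X_lab, y_lab, X_unl,
                        [rng.choice((-1, 1)) for _ in X_unl])
               for _ in range(n_init)]
    k = min(n_clusters, len(results))
    assign = kmeans([y for y, _ in results], k, rng)
    candidates = []
    for j in range(k):
        members = [r for r, a in zip(results, assign) if a == j]
        if members:
            # Step 14: keep the member with the best (lowest) objective value.
            candidates.append(min(members, key=lambda r: r[1])[0])
    return candidates

X_lab, y_lab = [[0.0, 0.0], [4.0, 4.0]], [-1, 1]
X_unl = [[0.2, 0.1], [0.1, -0.3], [3.9, 4.2], [4.1, 3.8], [2.2, 2.1]]
print(build_candidates(X_lab, y_lab, X_unl))
```

On well-separated toy data most random initializations converge to the same labeling, so fewer than `n_clusters` candidates may come back; clustering matters when the classifier's objective has several distinct local optima.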
The workflow for building the final safe semi-supervised classifier is shown in Fig. 2. First, a supervised learning method is trained on the small amount of labeled data, giving predictions $y_0$ on the unlabeled data (step 20). Assuming that a previously built semi-supervised classifier $y_t$ is the true classifier, a performance improvement function $F(y_t, y, y_0)$ is defined for any prediction $y$ on the unlabeled data, relative to the predictions of the supervised learning method; for accuracy, for example, the improvement function is defined as $F(y_t, y, y_0) = y_t' y - y_t' y_0$, where $'$ denotes vector transposition (step 21). Considering the minimum performance improvement, the worst-case performance improvement function is defined (step 22) as
$$\min_{t = 1, \ldots, T} F(y_t, y, y_0).$$
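A minimal sketch of the accuracy-based improvement function and its worst case over the candidates, assuming labels in {-1, +1} and reading the improvement as $y_t' y - y_t' y_0$ (agreement of $y$ with $y_t$ minus agreement of $y_0$ with $y_t$). The names `gain` and `worst_case_gain` are illustrative only.

```python
def gain(y_t, y, y0):
    """F(y_t, y, y0) = y_t'y - y_t'y0 for label vectors in {-1, +1}."""
    agree_y = sum(a * b for a, b in zip(y_t, y))
    agree_y0 = sum(a * b for a, b in zip(y_t, y0))
    return agree_y - agree_y0

def worst_case_gain(candidates, y, y0):
    """Minimum of F over all candidate classifiers y_1, ..., y_T."""
    return min(gain(y_t, y, y0) for y_t in candidates)

candidates = [[1, 1, -1, -1], [1, -1, -1, -1]]
y0 = [1, -1, 1, -1]
print(worst_case_gain(candidates, [1, 1, -1, -1], y0))  # 0
print(worst_case_gain(candidates, y0, y0))              # 0
```

Choosing $y = y_0$ always gives a worst-case improvement of exactly zero, which is why maximizing this quantity cannot do worse than the supervised baseline as long as one of the candidates is the true labeling.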
Taking the worst-case performance improvement as the objective function, the predictions on the unlabeled data are optimized with an optimization method so that the worst-case performance improvement is maximized (step 23):
$$\max_{y} \; \min_{t = 1, \ldots, T} F(y_t, y, y_0).$$
Any of the numerical optimization techniques described in numerical optimization textbooks can be used as the optimization method. The optimization result $y^*$ is output as the predictions of the final safe semi-supervised classifier (step 24).
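For a very small number of unlabeled examples, the max-min problem of step 23 can be solved exactly by enumeration, as in the hedged sketch below. Exhaustive search is a stand-in chosen here for clarity and is not the optimizer of the embodiment; it scales as 2^u and a real problem would need one of the numerical optimization techniques mentioned above. Labels are assumed to be in {-1, +1}, the improvement is read as $y_t' y - y_t' y_0$, and the function names are hypothetical.

```python
from itertools import product

def gain(y_t, y, y0):
    """Accuracy-based improvement y_t'y - y_t'y0 for labels in {-1, +1}."""
    return (sum(a * b for a, b in zip(y_t, y))
            - sum(a * b for a, b in zip(y_t, y0)))

def safe_prediction(candidates, y0):
    """Return (y*, value) maximizing the worst-case gain over all labelings."""
    # y = y0 is always feasible and has worst-case gain 0, so start from it.
    best_y, best_val = list(y0), 0
    for y in product((-1, 1), repeat=len(y0)):
        val = min(gain(y_t, y, y0) for y_t in candidates)
        if val > best_val:
            best_y, best_val = list(y), val
    return best_y, best_val

candidates = [[1, 1, -1, -1], [1, -1, -1, -1]]  # y_1, ..., y_T
y0 = [1, -1, 1, -1]                             # supervised predictions
print(safe_prediction(candidates, y0))          # ([1, -1, -1, -1], 2)
```

Because the search starts from $y_0$ and only accepts strict improvements, the returned worst-case value is never negative.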
The accuracy results of the comparative experiments of the embodiment on multiple real-world datasets are shown in Fig. 3. The experimental datasets are real-world datasets provided by the University of California, Irvine. For each dataset, 10 examples are drawn at random as labeled data and the remainder serve as unlabeled data. The experiment is repeated 30 times and the mean accuracy on the unlabeled data is reported. The embodiment is compared with two existing methods: a supervised learning method that uses only the labeled data, and a classical semi-supervised learning method. Here the supervised learning method is the classical support vector machine, and the classical semi-supervised learning method is the classical semi-supervised support vector machine. In the embodiment, semi-supervised support vector machines are used as the semi-supervised classifiers, and accuracy is used as the performance evaluation metric. The embodiment and both comparison methods use a Gaussian kernel in the support vector machine. In Fig. 3, bold entries indicate that a method is significantly better than the classical supervised support vector machine (t-test, 95% confidence level), and underlined entries indicate that it is significantly worse than the classical supervised support vector machine (t-test, 95% confidence level). As can be seen, whereas the existing semi-supervised learning technique significantly degrades performance several times, the safe semi-supervised learning method provided by the invention rarely causes significant performance degradation in practice, while achieving performance highly comparable to the existing semi-supervised learning technique.
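The significance marking described above can be sketched as follows. The text states only that a t-test at the 95% confidence level is used; this sketch assumes a paired two-sided t-test over the 30 repetitions, for which the critical value with 29 degrees of freedom is about 2.045. The accuracy lists are made-up placeholders, not results from Fig. 3, and the function names are hypothetical.

```python
import math
import statistics

def paired_t(acc_a, acc_b):
    """Paired t statistic for per-repetition accuracies of two methods."""
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    sd = statistics.stdev(diffs)
    if sd == 0.0:
        mean = statistics.mean(diffs)
        return 0.0 if mean == 0.0 else math.copysign(math.inf, mean)
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))

def verdict(acc_method, acc_baseline, t_crit=2.045):
    """Classify a method against the supervised baseline (bold / underline / plain)."""
    t = paired_t(acc_method, acc_baseline)
    if t > t_crit:
        return "significantly better"
    if t < -t_crit:
        return "significantly worse"
    return "no significant difference"

# Placeholder accuracies for 30 repetitions (illustrative values only).
baseline = [0.70 + 0.01 * (i % 3) for i in range(30)]
method = [b + 0.05 + 0.002 * (i % 2) for i, b in enumerate(baseline)]
print(verdict(method, baseline))
```

If a different number of repetitions is used, `t_crit` has to be replaced by the critical value for the corresponding degrees of freedom.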

Claims (6)

1. A safe semi-supervised learning method, characterized in that it comprises a step of building multiple semi-supervised classifiers and a step of building a final safe semi-supervised classifier;
The step of building multiple semi-supervised classifiers is specifically:
Step 100: given a small amount of labeled data and a large amount of unlabeled data, randomly initialize multiple semi-supervised classifiers;
Step 101: for each initialized semi-supervised classifier, optimize its predictions with an optimization method according to the objective function of the semi-supervised classifier;
Step 102: partition the predictions of the optimized semi-supervised classifiers into multiple clusters using a machine learning clustering method;
Step 103: for each cluster of the clustering result, output the semi-supervised classifier in it with the best objective value;
Step 104: collect the semi-supervised classifiers output for the clusters, obtaining multiple semi-supervised classifiers;
The step of building the final safe semi-supervised classifier is specifically:
Step 200: train a supervised learning method on the small amount of labeled data and obtain its predictions on the unlabeled data;
Step 201: assuming in turn that each semi-supervised classifier built in step 104 is the true classifier, define, relative to the predictions of the supervised learning method, a performance improvement function for any prediction on the unlabeled data;
Step 202: for any prediction on the unlabeled data, take the minimum over the multiple performance improvement functions obtained in step 201 and define it as the worst-case performance improvement function;
Step 203: taking the worst-case performance improvement as the objective function, optimize the predictions on the unlabeled data with an optimization method so that the worst-case performance improvement is maximized;
Step 204: output the optimization result as the predictions of the final safe semi-supervised classifier.
2. The safe semi-supervised learning method according to claim 1, characterized in that the semi-supervised classifiers include generative semi-supervised classifiers, graph-based semi-supervised classifiers, disagreement-based semi-supervised classifiers, and semi-supervised classifiers based on support vector machines.
3. The safe semi-supervised learning method according to claim 1, characterized in that initializing a semi-supervised classifier means initializing its predictions on the unlabeled data.
4. The safe semi-supervised learning method according to claim 1, characterized in that the objective function of the semi-supervised classifier includes the margin between data of different classes and the probability likelihood.
5. The safe semi-supervised learning method according to claim 1, characterized in that the supervised learning method in step 200 includes generative model methods, nearest-neighbor supervised learning methods, and support vector machine learning methods.
6. The safe semi-supervised learning method according to claim 1, characterized in that the performance evaluation metrics used by the performance improvement function include accuracy, precision, recall, and the F1 measure.
CN2013103155014A 2013-07-24 2013-07-24 Safe semi-supervised learning method Pending CN103390171A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2013103155014A CN103390171A (en) 2013-07-24 2013-07-24 Safe semi-supervised learning method

Publications (1)

Publication Number Publication Date
CN103390171A true CN103390171A (en) 2013-11-13

Family

ID=49534438

Country Status (1)

Country Link
CN (1) CN103390171A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107255772A (en) * 2017-06-08 2017-10-17 南京工程学院 A kind of semi-supervised voltage dip accident source discrimination
CN107590262A (en) * 2017-09-21 2018-01-16 黄国华 The semi-supervised learning method of big data analysis
CN107895168A (en) * 2017-10-13 2018-04-10 平安科技(深圳)有限公司 The method of data processing, the device of data processing and computer-readable recording medium
CN108885700A (en) * 2015-10-02 2018-11-23 川科德博有限公司 Data set semi-automatic labelling
CN108881196A (en) * 2018-06-07 2018-11-23 中国民航大学 The semi-supervised intrusion detection method of model is generated based on depth
CN109977094A (en) * 2019-01-30 2019-07-05 中南大学 A method of the semi-supervised learning for structural data
CN111476300A (en) * 2020-04-07 2020-07-31 屈璠 Throat reflux recognition model establishing method, index obtaining method and electronic system

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101980202A (en) * 2010-11-04 2011-02-23 西安电子科技大学 Semi-supervised classification method of unbalance data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
周志华,李宇峰: "Towards Making Unlabeled Data Never Hurt", 《THE 28TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING》, 31 December 2011 (2011-12-31) *
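李宇峰,黄胜君,周志华: "A regularization-based semi-supervised multi-label learning method" (一种基于正则化的半监督多标记学习方法), Journal of Computer Research and Development (《计算机研究与发展》), 31 December 2012 (2012-12-31) *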

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108885700A (en) * 2015-10-02 2018-11-23 川科德博有限公司 Data set semi-automatic labelling
CN107255772A (en) * 2017-06-08 2017-10-17 南京工程学院 A kind of semi-supervised voltage dip accident source discrimination
CN107255772B (en) * 2017-06-08 2020-07-03 南京工程学院 Semi-supervised voltage sag accident source identification method
CN107590262A (en) * 2017-09-21 2018-01-16 黄国华 The semi-supervised learning method of big data analysis
CN107895168A (en) * 2017-10-13 2018-04-10 平安科技(深圳)有限公司 The method of data processing, the device of data processing and computer-readable recording medium
WO2019071965A1 (en) * 2017-10-13 2019-04-18 平安科技(深圳)有限公司 Data processing method, data processing device, and computer readable storage medium
CN108881196A (en) * 2018-06-07 2018-11-23 中国民航大学 The semi-supervised intrusion detection method of model is generated based on depth
CN109977094A (en) * 2019-01-30 2019-07-05 中南大学 A method of the semi-supervised learning for structural data
CN109977094B (en) * 2019-01-30 2021-02-19 中南大学 Semi-supervised learning method for structured data
CN111476300A (en) * 2020-04-07 2020-07-31 屈璠 Throat reflux recognition model establishing method, index obtaining method and electronic system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20131113