CN101980202A - Semi-supervised classification method of unbalanced data - Google Patents


Info

Publication number
CN101980202A
CN101980202A (application CN2010105309121A / CN201010530912A)
Authority
CN
China
Prior art keywords
sample
samples
sample set
data
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2010105309121A
Other languages
Chinese (zh)
Inventor
王爽
焦李成
冯吭雨
钟桦
侯彪
缑水平
马文萍
张青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN2010105309121A
Publication of CN101980202A
Legal status: Pending


Abstract

The invention discloses a semi-supervised classification method for imbalanced data, aimed mainly at the low classification precision that prior-art methods achieve on minority-class data when labeled samples are few and the degree of imbalance is high. The method is implemented by the following steps: (1) initialize a labeled sample set and an unlabeled sample set; (2) initialize the cluster centers; (3) perform fuzzy clustering; (4) update the labeled and unlabeled sample sets according to the clustering result; (5) perform self-training based on a support vector machine (SVM) classifier; (6) update the labeled and unlabeled sample sets according to the self-training result; (7) classify with the support vector machine with differing penalty parameters, Biased-SVM; (8) evaluate the classification result and output it. For imbalanced data with few labeled samples, the method improves the classification precision of the minority class, and it can be used to classify and recognize imbalanced data with few training samples.

Description

Semi-supervised classification method for imbalanced data
Technical field
The invention belongs to the field of data processing and relates to the classification of imbalanced data. It is an application of pattern recognition and machine learning in data mining: specifically, an imbalanced-data classification method based on fuzzy clustering and semi-supervised learning, which can be used for the classification and recognition of imbalanced data with few training samples.
Background art
With the rapid global development of information technology, powerful computers, data-collection facilities and storage devices provide people with large amounts of data for transaction management, information retrieval and data analysis. Although the volume of data obtained is very large, the data useful to people often account for only a small fraction of the total. A data set in which the number of samples of one class is clearly smaller than that of the other classes is called an imbalanced data set, and classification problems over imbalanced data sets abound in real life. For example, in detecting whether a citizen's credit application is fraudulent, fraudulent applications are generally far fewer than legitimate ones; in diagnosing a patient's disease from clinical data, heart-disease patients are far fewer than healthy people. In these practical applications, what people care more about is the minority class of the data set, i.e., the class whose number of samples is far smaller than that of the other classes, and the cost of misclassifying these minority-class samples is often very large; the classification precision of the minority class therefore needs to be improved effectively.
At the same time, with the development of data-acquisition technology, obtaining large numbers of unlabeled samples has become very easy, while obtaining labeled samples remains relatively difficult because it requires substantial manpower and material resources. It is therefore necessary to study how to make effective use of the large number of available unlabeled samples to assist a small number of labeled samples in improving the learning performance of a classifier. Introducing the idea of semi-supervised learning makes it possible to use labeled and unlabeled samples simultaneously to train on the data set and make predictions. The transductive support vector machine (TSVM), based on the SVM classifier, is a representative semi-supervised classification method; it requires the ratios of the class sample counts among the unlabeled samples to be set in advance, which is usually estimated from the data distribution of the labeled sample set. In practice, if the data distribution of the unlabeled samples deviates strongly from that of the labeled samples, the classification and prediction results of the TSVM method on the data set are seriously affected.
In recent years, the classification of imbalanced data sets has received more and more attention in data mining and machine learning. Research on imbalanced data, at home and abroad, falls mainly into two strands. One is based on data sampling, whose main purpose is to reduce the degree of imbalance by preprocessing the data, such as the synthetic minority oversampling technique SMOTE, which increases the number of minority-class samples by synthesizing new ones. The other is based on the classification algorithm: the support vector machine with differing penalty parameters, Biased-SVM, proposed by Veropoulos et al., assigns a different penalty parameter to each class and thereby offsets, to some extent, the influence of the data imbalance on the SVM classifier.
The difficulty of learning from imbalanced data sets stems mainly from the characteristics of the data themselves: the minority-class samples are insufficient, so their distribution cannot well reflect the actual distribution of the whole class, and the majority class is usually mixed with noisy data, so the two classes tend to overlap to varying degrees. Moreover, when classification methods from traditional machine learning are applied directly to an imbalanced data set without considering the imbalance, minority-class samples are easily misclassified into the majority class; the overall classification precision is then high, but the precision on the minority class is very low. Conversely, if the imbalance is over-compensated, overfitting easily occurs: the classification precision on the training set is very high, but when the data set is updated or changed the classification effect is unsatisfactory.
Summary of the invention
The purpose of the invention is to overcome the above shortcomings of the prior art and, for imbalanced data with few labeled samples, to propose an imbalanced-data classification method based on fuzzy clustering and semi-supervised learning that takes the data imbalance into account while introducing the idea of semi-supervised learning, avoids overfitting, and improves the classifier's precision on the minority class of the data set.
The technical idea for achieving the purpose of the invention is as follows: by performing fuzzy clustering, combined with a self-training process based on an SVM classifier, unlabeled samples are continually labeled and exploited, expanding the minority class in the labeled sample set; while balancing the numbers of samples of the classes, this provides the classifier with more effective information on the sample distribution and thereby improves its classification performance on imbalanced data. The technical scheme comprises the following steps:
(1) Read an imbalanced data set containing two classes, denote the two classes as the minority class and the majority class according to their numbers of samples, randomly select a part of the two-class imbalanced samples as the initial labeled sample set {x_i}, and take the remaining samples as the initial unlabeled sample set {x_j};
(2) initialize the cluster centers of the imbalanced data set:
(2a) take the mean of the minority-class samples and of the majority-class samples in the current labeled sample set {x_i} respectively, obtaining the mean-center set M = {m_+, m_-}, where m_+ is the mean center of the minority-class samples and m_- is the mean center of the majority-class samples;
(2b) apply the mean shift algorithm to each center in the mean-center set M, finding the initial cluster centers M* = {m*_+, m*_-}, where m*_+ is the initial cluster center of the minority-class samples and m*_- is the initial cluster center of the majority-class samples;
(3) based on the initial cluster centers M*, apply fuzzy C-means clustering to the current labeled and unlabeled samples, obtaining the cluster centers V = {v_+, v_-}, where v_+ is the cluster center of the minority-class samples and v_- is the cluster center of the majority-class samples, and denote the set of memberships of all current unlabeled samples to each cluster center as U = {u_cj | j ∈ (1, 2, …, u), c ∈ (+, −)}, where u_cj is the membership of the j-th unlabeled sample to the cluster center labeled c, and u is the number of samples in the current unlabeled sample set;
(4) after the above fuzzy clustering step, according to the membership set U, select from the current unlabeled sample set {x_j} the H samples whose cluster label is positive and whose corresponding membership is largest and label them, with H = p × N_+, so that the current labeled and unlabeled sample sets are updated to {x_i^(1)} and {x_j^(1)} respectively, where N_+ is the number of minority-class samples in the current labeled sample set and p is the proportion of samples selected and labeled from the unlabeled set;
(5) perform self-training based on an SVM classifier on the cluster-updated data sets {x_i^(1)} and {x_j^(1)};
(6) after the above self-training step, select from the cluster-updated unlabeled sample set {x_j^(1)} the H* samples with the largest discriminant scores and label them, with H* = p × N_+^(1), so that the current labeled and unlabeled sample sets are updated once more, to {x_i^(2)} and {x_j^(2)} respectively, where N_+^(1) is the number of minority-class samples in the cluster-updated labeled sample set {x_i^(1)} and p is the proportion of samples selected and labeled from the unlabeled set;
(7) classify the self-training-updated data sets {x_i^(2)} and {x_j^(2)} with the support vector machine with differing penalty parameters, Biased-SVM;
(8) evaluate the above Biased-SVM classification result on the imbalanced data using the geometric mean Gm;
(9) take whether the obtained geometric mean has reached its optimum as the termination condition: if it is satisfied, stop iterating and return to step (8) to output the classification result; otherwise return to step (2) until the termination condition is satisfied.
Compared with the prior art, the present invention has the following advantages:
(1) Because the invention introduces an unsupervised fuzzy clustering algorithm to mine the data-distribution information implicit in the unlabeled samples, the labels of training samples need not be predetermined manually, avoiding tedious and time-consuming labeling work in practice; at the same time, because the invention uses labeled samples to guide the clustering process and does not depend on the initial distribution of the labeled samples, it is not affected by updates and changes of the data set, which improves the generalization ability of the classifier on imbalanced data.
(2) The invention takes into account the frequently encountered practical situation in which labeled samples are few or hard to obtain while the degree of imbalance of the data is very high. By performing fuzzy clustering combined with SVM-based self-training, unlabeled samples are continually labeled and exploited, expanding the minority class in the labeled sample set; while balancing the numbers of samples of the classes, this provides the classifier with more effective sample-distribution information, avoids overfitting, and improves the classification performance of the classifier on imbalanced data.
Description of drawings
Fig. 1 is the flow chart of the present invention;
Fig. 2 is a schematic diagram of initializing the cluster centers with the mean shift algorithm in the present invention;
Fig. 3 is an analysis chart of the influence of the setting of parameter p on classifier performance in the present invention;
Fig. 4 is a comparison chart of the geometric mean Gm obtained by the present invention and the prior art on the imbalanced data sets.
Detailed description of the embodiments
With reference to Fig. 1, the specific implementation steps of the present invention are as follows:
Step 1: select the initial labeled sample set and the initial unlabeled sample set.
Given an imbalanced data set whose samples are divided into two classes according to their features and attributes, denote the two classes as the minority class and the majority class according to their numbers of samples, randomly select a part of the two-class imbalanced data as the initial labeled sample set {x_i}, and take the remaining samples as the initial unlabeled sample set {x_j}.
Step 2: initialize the cluster centers of the imbalanced data set.
(2a) Take the mean of the minority-class samples and of the majority-class samples in the current labeled sample set {x_i} respectively, obtaining the mean-center set M = {m_+, m_-}, where m_+ is the mean center of the minority-class samples and m_- is the mean center of the majority-class samples;
(2b) Using the labeled and unlabeled samples {x_k | k = 1, …, n}, apply the mean shift algorithm to each center point in M = {m_+, m_-}, finding the initial cluster centers M* = {m*_+, m*_-}, where m*_+ is the initial cluster center of the minority-class samples and m*_- is the initial cluster center of the majority-class samples.
When applying the mean shift algorithm to each center point in M = {m_+, m_-}, the mean shift vector is first defined by

M_h(x) = [ Σ_{k=1}^n G((x_k − x)/h) · x_k ] / [ Σ_{k=1}^n G((x_k − x)/h) ] − x,    (1)

where x is the corresponding center point, G(·) is a Gaussian kernel, the kernel bandwidth h is taken as the standard deviation of the data set, and n is the number of samples. The first term on the right-hand side of (1) is denoted m_h(x). Given an allowable error ε, the following three steps are carried out until the termination condition is satisfied:
(a) compute m_h(x);
(b) assign m_h(x) to x;
(c) if ||m_h(x) − x|| < ε, end the loop; otherwise return to (a).
In the above mean shift algorithm, since m_h(x) = x + M_h(x) and M_h(x) points in the direction of the probability-density gradient, i.e., the direction in which the probability density increases fastest, carrying out the above steps makes the center point under search move continually along the gradient direction of the probability density, finally finding the center point of the densest region of the sample distribution.
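The mean shift iteration of steps (a) to (c) can be sketched as follows. This is a minimal illustration under the stated Gaussian-kernel assumption; the function name `mean_shift` and the default tolerance are choices of this sketch, not part of the patent.

```python
import numpy as np

def mean_shift(x0, samples, h, eps=1e-3, max_iter=100):
    """Move a start point x0 toward the densest region of `samples`
    via the mean shift update m_h(x), the first term on the right of (1)."""
    x = np.asarray(x0, dtype=float)
    samples = np.asarray(samples, dtype=float)
    for _ in range(max_iter):
        # Gaussian kernel weights G((x_k - x)/h)
        d2 = np.sum((samples - x) ** 2, axis=1)
        w = np.exp(-0.5 * d2 / h ** 2)
        m = (w[:, None] * samples).sum(axis=0) / w.sum()  # m_h(x)
        if np.linalg.norm(m - x) < eps:                   # ||m_h(x) - x|| < eps
            return m
        x = m                                             # assign m_h(x) to x
    return x
```

Applied to each mean center m_+ and m_-, with h set to the standard deviation of the data set, such a routine would yield the initial cluster centers M*.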
Fig. 2 shows the effectiveness of initializing the cluster centers with the mean shift algorithm. Two classes are taken arbitrarily from the classic four-class square data set, with a ratio of 1:5 between the class sample counts; 6% of the samples of each class are randomly selected as labeled samples and the rest serve as unlabeled samples. The data distribution is shown in Fig. 2(a), where "+" and "*" represent the labeled samples of the two classes. In Fig. 2(b), the diamonds "◇" represent the center points of the mean-center set M = {m_+, m_-}, and the stars "☆" represent the initial cluster centers M* = {m*_+, m*_-} obtained by the mean shift algorithm. As can be seen from Fig. 2, the initial cluster center points obtained with the mean shift algorithm used by the invention are closer to the distribution centers of the classes of the data set.
Step 3: based on the initial cluster centers M* obtained in step 2, apply fuzzy C-means clustering to the current labeled and unlabeled samples, obtaining the cluster centers V = {v_+, v_-}, where v_+ is the cluster center of the minority-class samples and v_- is the cluster center of the majority-class samples, and denote the set of memberships of all current unlabeled samples to each cluster center as U = {u_cj | j ∈ (1, 2, …, u), c ∈ (+, −)}, where u_cj is the membership of the j-th unlabeled sample to the cluster center labeled c, and u is the number of samples in the current unlabeled sample set.
The steps of the fuzzy C-means algorithm are as follows:
(a) given the initial cluster centers;
(b) repeat the following computations until the membership values of the labeled and unlabeled samples stabilize:
(b1) compute the memberships:

u_ck = (1/||x_k − v_c||²)^{1/(b−1)} / Σ_c (1/||x_k − v_c||²)^{1/(b−1)},  k = 1, …, n,  c ∈ (+, −)    (2)

(b2) using the memberships computed in (b1), compute the cluster centers:

v_c = Σ_{k=1}^n [u_ck]^b · x_k / Σ_{k=1}^n [u_ck]^b,  c ∈ (+, −)    (3)

where v_c is the corresponding cluster center point, u_ck is the membership of the k-th sample to the cluster center labeled c, {x_k} is the set of labeled and unlabeled samples, n is the number of samples, and b is the fuzziness coefficient.
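The alternation of equations (2) and (3) can be sketched as follows; a minimal illustration in which the two clusters (+, −) are indexed 0 and 1, and where the function name and default values are choices of this sketch rather than of the patent.

```python
import numpy as np

def fuzzy_c_means(X, centers, b=2.0, tol=1e-4, max_iter=100):
    """Alternate the membership update (2) and the center update (3)
    until the memberships stabilise; returns centers V and memberships U."""
    X = np.asarray(X, dtype=float)
    V = np.asarray(centers, dtype=float)
    U_prev = None
    for _ in range(max_iter):
        # (2): u_ck proportional to (1/||x_k - v_c||^2)^(1/(b-1))
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2) + 1e-12
        inv = (1.0 / d2) ** (1.0 / (b - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)   # n x c membership matrix
        # (3): v_c = sum_k u_ck^b x_k / sum_k u_ck^b
        Ub = U ** b
        V = (Ub.T @ X) / Ub.sum(axis=0)[:, None]
        if U_prev is not None and np.abs(U - U_prev).max() < tol:
            break
        U_prev = U
    return V, U
```

Seeding `centers` with the mean-shift result M* corresponds to step (a) of the algorithm above.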
Step 4: after the above fuzzy clustering step, according to the obtained membership set U, select from the current unlabeled sample set {x_j} the H samples whose cluster label is positive and whose corresponding membership is largest and label them, with H = p × N_+, so that the current labeled and unlabeled sample sets are updated to {x_i^(1)} and {x_j^(1)} respectively, where N_+ is the number of minority-class samples in the current labeled sample set and p is the proportion of samples selected and labeled from the unlabeled set.
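The selection rule of step 4 amounts to sorting the minority-class memberships of the unlabeled samples and taking the top H = p × N_+. The helper below, `pick_by_membership`, is a hypothetical illustration of that rule, not code from the patent.

```python
import numpy as np

def pick_by_membership(u_plus, n_minority_labeled, p=0.1):
    """Return the indices of the H = p * N_+ unlabeled samples whose
    membership u_{+j} to the minority cluster is largest (step 4)."""
    H = max(1, int(round(p * n_minority_labeled)))
    order = np.argsort(u_plus)[::-1]   # indices by descending membership
    return order[:H]
```

The selected samples would then be moved from the unlabeled set {x_j} to the labeled set with a minority-class label.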
Step 5: perform self-training based on an SVM classifier on the cluster-updated data sets {x_i^(1)} and {x_j^(1)}.
(5a) Train an SVM classifier with the cluster-updated labeled sample set {x_i^(1)}. The SVM classifier maps the data to a high-dimensional space by a nonlinear transformation and seeks the optimal linear separating hyperplane in that space, separating the two classes of samples to the greatest extent. The objective of training the SVM classifier can be expressed as

min ( (1/2)||w||² + C Σ_i ξ_i )    (4)
s.t.  y_i (w · x_i^(1) + b) ≥ 1 − ξ_i,  ξ_i ≥ 0,

where C is the penalty parameter, w is the weight vector of the optimal separating hyperplane obtained by training the SVM classifier, b is its bias, ξ_i are the slack variables, and x_i^(1) are the labeled samples used for training;
(5b) Using the discriminant function of the SVM classifier, f(x_j^(1)) = w · x_j^(1) + b, obtain the test label of each sample in the cluster-updated unlabeled sample set {x_j^(1)} as label(x_j^(1)) = sgn(w · x_j^(1) + b), where sgn(·) is the sign function and x_j^(1) are the unlabeled samples used for testing.
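Steps (5a) and (5b) can be sketched as follows. The patent's experiments use the SVMlight toolbox; this stand-in instead minimises the linear form of objective (4) by simple subgradient descent and then labels the unlabeled samples with sgn(w·x + b). Function names and optimiser settings are choices of this sketch.

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Minimise (1/2)||w||^2 + C * sum_i max(0, 1 - y_i(w.x_i + b))
    by subgradient descent: the soft-margin objective (4), linear kernel."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    bias = 0.0
    for _ in range(epochs):
        viol = y * (X @ w + bias) < 1.0          # margin violators
        grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        bias -= lr * grad_b
    return w, bias

def self_train_labels(w, bias, X_unlabeled):
    """Step (5b): test labels sgn(w.x_j + b) plus the raw discriminant
    scores, from which step 6 picks the H* highest-scoring samples."""
    scores = np.asarray(X_unlabeled, dtype=float) @ w + bias
    return np.sign(scores), scores
```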
Step 6: after the above self-training step, select from the cluster-updated unlabeled sample set {x_j^(1)} the H* samples with the largest discriminant scores and label them, with H* = p × N_+^(1), so that the current labeled and unlabeled sample sets are updated once more, to {x_i^(2)} and {x_j^(2)} respectively, where N_+^(1) is the number of minority-class samples in the cluster-updated labeled sample set {x_i^(1)}.
Step 7: classify the self-training-updated data sets {x_i^(2)} and {x_j^(2)} with the support vector machine with differing penalty parameters, Biased-SVM.
(7a) Train the Biased-SVM with differing penalty parameters using the self-training-updated labeled sample set {x_i^(2)}. Aimed at the classification of imbalanced data sets, this training assigns a different penalty parameter to each class, changing the objective of training the SVM classifier described in formula (4) to

min ( (1/2)||w||² + C_+ Σ_{i | y_i = +1} ξ_i + C_− Σ_{i | y_i = −1} ξ_i )    (5)

where ξ_i are the slack variables, y_i is the label of the labeled training sample x_i, C_+ is the penalty parameter assigned to the minority class, and C_− is the penalty parameter assigned to the majority class;
(7b) Using the Biased-SVM with the penalty parameters C_+ and C_− assigned to the two classes, its discriminant function f(x_j) = w · x_j + b gives the test label of each sample in the initial unlabeled sample set {x_j} as label(x_j) = sgn(w · x_j + b), where x_j are the unlabeled samples used for testing.
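Objective (5) differs from (4) only in that each sample's slack is weighted by its class penalty, C_+ for the minority class and C_− for the majority class. A rough subgradient sketch under the same linear-kernel assumption (the function name and optimiser settings are illustrative, not the solver used in the patent's experiments):

```python
import numpy as np

def train_biased_svm(X, y, C_plus, C_minus, lr=0.01, epochs=200):
    """Minimise (1/2)||w||^2 + C_+ sum_{y_i=+1} xi_i + C_- sum_{y_i=-1} xi_i
    by subgradient descent, i.e. objective (5) with a linear kernel."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    C = np.where(y > 0, C_plus, C_minus)   # per-sample class penalty
    w = np.zeros(X.shape[1])
    bias = 0.0
    for _ in range(epochs):
        viol = y * (X @ w + bias) < 1.0    # margin violators
        grad_w = w - ((C * y)[viol, None] * X[viol]).sum(axis=0)
        grad_b = -(C * y)[viol].sum()
        w -= lr * grad_w
        bias -= lr * grad_b
    return w, bias
```

In the experiments described below, the ratio C_+/C_− = N_−/N_+ is used, so the smaller the minority class, the heavier its penalty.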
Step 8: evaluate the above Biased-SVM classification result on the imbalanced data using the geometric mean Gm.
(8a) Compute the classification precision of the minority class, Se = TP / (TP + FN), and that of the majority class, Sp = TN / (TN + FP), where, with respect to the prediction on the data, TP is the number of samples predicted as the minority class that actually belong to the minority class, FP is the number predicted as the minority class that actually belong to the majority class, FN is the number predicted as the majority class that actually belong to the minority class, and TN is the number predicted as the majority class that actually belong to the majority class;
(8b) from the Se and Sp values computed above, compute the geometric mean Gm = √(Se × Sp).
Step 9: taking whether the obtained geometric mean has reached its optimum as the termination condition, stop iterating if it is satisfied and return to step 8 to output the classification result; otherwise return to step 2 until the termination condition is satisfied.
The effect of the present invention can be further illustrated by the following simulation experiments:
1. Experimental conditions and parameter settings
In the MATLAB simulation environment, based on the SVMlight support vector machine toolbox, the method of the invention and the prior art were compared in classification experiments on the imbalanced data sets; the parameters used by each method are set as follows:
a) t = 30% of the samples of each class of the imbalanced data set are randomly selected as the initial labeled sample set, and the rest serve as the initial unlabeled sample set;
b) in the support vector machine SVM method, a linear kernel is adopted, with penalty parameter C = 100;
c) in the Biased-SVM with differing penalty parameters, the penalty parameter of the majority class is taken as C_− = 100, and the penalty parameter C_+ of the minority class is computed from C_+ / C_− = N_− / N_+, where N_+ and N_− are the numbers of minority-class and majority-class samples in the current labeled sample set;
d) in the synthetic minority oversampling technique SMOTE, the parameter N = 100 is taken, i.e., the resulting number of minority-class samples is twice the original;
e) in the method of the invention, the allowable error in the mean shift algorithm is ε = 0.1, and the proportion of samples selected and labeled from the unlabeled set in each iteration is p = 0.1.
2. Experimental content and analysis of results
To verify the advantage of the method of the invention over the prior art on the imbalanced-data classification problem, a group of biological data sets with a high degree of imbalance is used in the experiments to compare the classification of each method; the biological data sets are described in Table 1.
Table 1: description of the imbalanced data sets
(The contents of Table 1 are provided as an image in the original publication.)
The degree of imbalance in Table 1 refers to the ratio of the number of minority-class samples to the number of majority-class samples in the imbalanced data set. The comparison methods used in the experiments include: the method of the invention and, from the prior art, the support vector machine SVM method, the Biased-SVM method with differing penalty parameters, the synthetic minority oversampling technique SMOTE method, and the transductive support vector machine TSVM method.
a) The experiments carried out on each method with the imbalanced data of Table 1 are as follows:
a1) Analysis of the influence of the setting of parameter p on classification performance in the present invention.
The method of the invention is used to classify the imbalanced data set Data1.2 with the parameter p taking the values {0.01, 0.03, 0.1, 0.3, 0.5} in turn; the results are shown in Fig. 3, where each curve shows, for one value of p, how the classification performance of the method changes with the number of iterations. As can be seen from Fig. 3, the larger the value of p, i.e., the larger the proportion of unlabeled samples selected and labeled in each iteration, the fewer iterations the method needs to reach its optimal classification performance; although the time complexity is thereby reduced, the probability of mislabeling unlabeled samples in each iteration becomes larger, so the classification performance of the method also decreases somewhat. This shows that the choice of p is a trade-off between classification performance and time complexity; following a large number of experimental results, the empirical value p = 0.1 is used uniformly in the experiments.
a2) Classification comparison of the method of the invention and the prior art on the imbalanced data sets.
The classification precision Se of the minority class and Sp of the majority class for the imbalanced data sets under the various classification methods are shown in Table 2. To better assess the overall classification performance of the methods, Table 3 gives the geometric mean Gm of the imbalanced data sets under the various methods, where the last row of the table is the average classification result of each method on the imbalanced data. To show the advantage of the method of the invention on imbalanced-data classification more clearly, the experimental results of Table 3 are drawn as a histogram in Fig. 4.
Table 2: comparison of Se and Sp values in the experiments
(The contents of Table 2 are provided as an image in the original publication.)
Table 3: comparison of Gm values in the experiments
(The contents of Table 3 are provided as an image in the original publication.)
b) Analysis of the results.
As can be seen from Table 2, the classification precision Se of the minority class is low for the prior-art methods while their classification precision Sp of the majority class is comparatively very high; this is because, when handling the imbalanced classification problem, the prior art misclassifies nearly all unlabeled samples into the majority class.
The current key issue in imbalanced-data classification research is how to improve the classification precision of the minority class as much as possible while maintaining that of the majority class, thereby improving the classification precision on the imbalanced data.
As can be seen from Table 3 and Fig. 4, the method of the invention obtains a higher geometric mean Gm than the prior art, and thus a better classification precision on the imbalanced data.
In summary, for the classification of imbalanced data with few labeled samples, the invention proposes an imbalanced-data classification method based on fuzzy clustering and semi-supervised learning; by comparing the method of the invention with the prior art in classification experiments on a group of biological data, it is verified that the method obtains a better classification precision on imbalanced data than the prior art.

Claims (4)

1. A semi-supervised classification method for unbalanced data, comprising the steps of:
(1) reading an unbalanced data set containing two classes; according to their respective sample counts, denoting the two classes as the minority class and the majority class; randomly selecting a portion of the samples of both classes as the initial labeled sample set {x_i}, and taking the remaining data samples as the initial unlabeled sample set {x_j};
(2) initializing the cluster centers of the unbalanced data set:
(2a) averaging the minority-class samples and the majority-class samples in the current labeled sample set {x_i} separately, obtaining the mean-center set M = {m+, m-}, where m+ is the mean center of the minority-class samples and m- is the mean center of the majority-class samples;
(2b) applying the mean-shift algorithm to each center in M to find the initial cluster centers M* = {m+*, m-*}, where m+* is the initial cluster center of the minority class and m-* is the initial cluster center of the majority class;
(3) starting from the initial cluster centers M*, performing fuzzy C-means clustering on the current labeled and unlabeled samples to obtain the cluster centers M' = {m+', m-'}, where m+' is the cluster center of the minority class and m-' is the cluster center of the majority class, and recording the memberships of all current unlabeled samples to each cluster center as the set U = {u_cj | j ∈ {1, 2, ..., u}, c ∈ {+, -}}, where u_cj is the membership of the j-th unlabeled sample to the cluster center labeled c, and u is the number of samples in the current unlabeled sample set;
(4) after the above fuzzy clustering step, according to the membership set U, selecting from the current unlabeled sample set {x_j} the H samples with the largest membership to the positively labeled cluster, where H = p × N+, and labeling them, thereby updating the current labeled and unlabeled sample sets to {x_i}^(1) and {x_j}^(1) respectively, where N+ is the number of minority-class samples in the current labeled sample set and p is the proportion of samples selected and labeled from the unlabeled set;
(5) performing SVM-classifier-based self-training on the cluster-updated data sets {x_i}^(1) and {x_j}^(1);
(6) after the above self-training step, selecting from the cluster-updated unlabeled sample set {x_j}^(1) the H* samples with the largest discriminant scores, where H* = p × N+^(1), and labeling them, thereby updating the current labeled and unlabeled sample sets once more to {x_i}^(2) and {x_j}^(2) respectively, where N+^(1) is the number of minority-class samples in the cluster-updated labeled sample set {x_i}^(1) and p is the proportion of samples selected and labeled from the unlabeled set;
(7) classifying the self-training-updated data sets {x_i}^(2) and {x_j}^(2) with the support vector machine Biased-SVM, which uses different penalty parameters for the two classes;
(8) evaluating the above Biased-SVM classification result of the unbalanced data using the geometric mean Gm;
(9) checking whether the obtained geometric mean has reached its optimum, this being the termination condition: if so, stopping the iteration and outputting the classification result of step (8); otherwise returning to step (2), until the termination condition is satisfied.
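Steps (3)-(4) can be sketched in a few lines of numpy. The following is a minimal illustration, not the patented implementation: it computes standard fuzzy C-means memberships (with the common fuzzifier m = 2, an assumption; the claim does not state a value) for the unlabeled samples against two fixed cluster centers, then selects the H = p × N+ samples with the largest membership to the minority-class cluster. The function names and the convention that index 0 is the minority center are hypothetical.

```python
import numpy as np

def fcm_memberships(X, centers, m=2.0):
    """Fuzzy C-means membership of each sample in X to each center:
    u_cj = 1 / sum_k (d(x_j, v_c) / d(x_j, v_k))^(2/(m-1))."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.fmax(d, 1e-12)                        # avoid division by zero
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)  # each row sums to 1

def select_minority_candidates(X_unlabeled, centers, n_pos_labeled, p=0.1):
    """Pick the H = p * N+ unlabeled samples with the largest membership
    to the minority-class cluster (assumed here to be centers[0])."""
    U = fcm_memberships(X_unlabeled, centers)
    H = max(1, int(p * n_pos_labeled))
    return np.argsort(-U[:, 0])[:H], U
```

The returned indices would then be labeled as minority class and moved from the unlabeled to the labeled set, as in step (4).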
2. The semi-supervised classification method for unbalanced data according to claim 1, wherein the SVM-classifier-based self-training on the cluster-updated data sets {x_i}^(1) and {x_j}^(1) described in step (5) is carried out as follows:
(5a) training the SVM classifier with the cluster-updated labeled sample set {x_i}^(1);
(5b) using the discriminant function f(x_j^(1)) = w·x_j^(1) + b of the SVM classifier to obtain the test label label(x_j^(1)) = sgn(w·x_j^(1) + b) of each sample in the cluster-updated unlabeled sample set {x_j}^(1), where w is the weight vector of the optimal separating hyperplane obtained by training the SVM classifier, b is its bias, sgn(·) is the sign function, and x_j^(1) is an unlabeled sample used for testing.
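The decision rule in (5b), together with the confidence-based selection of step (6), can be sketched as follows; `w` and `b` would come from the trained SVM, and the helper names are hypothetical:

```python
import numpy as np

def svm_label(X, w, b):
    """label(x) = sgn(w.x + b); a score of exactly 0 is mapped to +1."""
    return np.where(X @ w + b >= 0.0, 1, -1)

def self_training_select(X_unlabeled, w, b, H):
    """Step (6): take the H unlabeled samples with the largest
    discriminant score f(x) = w.x + b and return their indices together
    with the labels the classifier assigns them."""
    scores = X_unlabeled @ w + b
    idx = np.argsort(-scores)[:H]
    return idx, svm_label(X_unlabeled[idx], w, b)
```

Selecting by largest score biases the newly labeled pool toward confident positive (minority-class) predictions, which is consistent with the method's goal of enlarging the minority class in the labeled set.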
3. The semi-supervised classification method for unbalanced data according to claim 1, wherein the classification of the self-training-updated data sets {x_i}^(2) and {x_j}^(2) by the support vector machine Biased-SVM with different penalty parameters described in step (7) is carried out as follows:
(7a) training the Biased-SVM, a support vector machine with different penalty parameters for the two classes, on the self-training-updated labeled sample set {x_i}^(2);
(7b) using the discriminant function f(x_j) = w·x_j + b of the Biased-SVM to obtain the test label label(x_j) = sgn(w·x_j + b) of each sample in the initial unlabeled sample set {x_j}, where w is the weight vector of the optimal separating hyperplane obtained by training the Biased-SVM with different penalty parameters, b is its bias, sgn(·) is the sign function, and x_j is an unlabeled sample used for testing.
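Biased-SVM replaces the single penalty C with class-specific penalties C+ and C-, so that errors on the minority class cost more than errors on the majority class. Below is a minimal numpy sketch of a linear Biased-SVM trained by subgradient descent on the weighted hinge loss; the learning rate, epoch count, and penalty values are illustrative assumptions, not the patent's settings:

```python
import numpy as np

def biased_svm_train(X, y, C_pos=10.0, C_neg=1.0, lr=0.01, epochs=300):
    """Minimize 1/2 ||w||^2 + C+ * sum_{y=+1} hinge + C- * sum_{y=-1} hinge
    by plain subgradient descent (linear kernel, labels in {+1, -1})."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    C = np.where(y > 0, C_pos, C_neg)        # per-sample penalty
    for _ in range(epochs):
        margins = y * (X @ w + b)
        v = margins < 1.0                    # margin violators
        w -= lr * (w - (C[v, None] * y[v, None] * X[v]).sum(axis=0))
        b -= lr * (-(C[v] * y[v]).sum())
    return w, b
```

With C+ > C-, a minority-class margin violation contributes more to the subgradient, pushing the hyperplane away from the minority class and counteracting the usual bias of SVMs toward the majority class.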
4. The semi-supervised classification method for unbalanced data according to claim 1, wherein the evaluation of the Biased-SVM classification result of the unbalanced data using the geometric mean Gm described in step (8) is carried out as follows:
(8a) computing the classification accuracy of the minority class, Se = TP / (TP + FN), and the classification accuracy of the majority class, Sp = TN / (TN + FP), where, with respect to the prediction results on the data, TP is the number of samples predicted as minority class that are actually minority class, FP is the number of samples predicted as minority class that are actually majority class, FN is the number of samples predicted as majority class that are actually minority class, and TN is the number of samples predicted as majority class that are actually majority class;
(8b) from the Se and Sp values computed above, computing the geometric mean Gm = √(Se × Sp).
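The evaluation measure of step (8) follows directly from the confusion-matrix counts; this small sketch mirrors the standard definitions used in the claim:

```python
import math

def g_mean(tp, fn, tn, fp):
    """Gm = sqrt(Se * Sp), with Se = TP/(TP+FN) (minority-class accuracy)
    and Sp = TN/(TN+FP) (majority-class accuracy)."""
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    return math.sqrt(se * sp)
```

Because Gm is the geometric mean of the two per-class accuracies, it stays low whenever either class is classified poorly, which is why it is preferred over overall accuracy as a stopping and evaluation criterion for unbalanced data.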
CN2010105309121A 2010-11-04 2010-11-04 Semi-supervised classification method of unbalance data Pending CN101980202A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010105309121A CN101980202A (en) 2010-11-04 2010-11-04 Semi-supervised classification method of unbalance data

Publications (1)

Publication Number Publication Date
CN101980202A true CN101980202A (en) 2011-02-23

Family

ID=43600704

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010105309121A Pending CN101980202A (en) 2010-11-04 2010-11-04 Semi-supervised classification method of unbalance data

Country Status (1)

Country Link
CN (1) CN101980202A (en)

Cited By (52)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102208037A (en) * 2011-06-10 2011-10-05 西安电子科技大学 Hyper-spectral image classification method based on Gaussian process classifier collaborative training algorithm
CN102254177A (en) * 2011-04-22 2011-11-23 哈尔滨工程大学 Bearing fault detection method for unbalanced data SVM (support vector machine)
CN102402690A (en) * 2011-09-28 2012-04-04 南京师范大学 Data classification method based on intuitive fuzzy integration and system
CN102495901A (en) * 2011-12-16 2012-06-13 山东师范大学 Method for keeping balance of implementation class data through local mean
CN103020122A (en) * 2012-11-16 2013-04-03 哈尔滨工程大学 Transfer learning method based on semi-supervised clustering
CN103390171A (en) * 2013-07-24 2013-11-13 南京大学 Safe semi-supervised learning method
CN103605990A (en) * 2013-10-23 2014-02-26 江苏大学 Integrated multi-classifier fusion classification method and integrated multi-classifier fusion classification system based on graph clustering label propagation
CN103886330A (en) * 2014-03-27 2014-06-25 西安电子科技大学 Classification method based on semi-supervised SVM ensemble learning
CN103914704A (en) * 2014-03-04 2014-07-09 西安电子科技大学 Polarimetric SAR image classification method based on semi-supervised SVM and mean shift
CN104063520A (en) * 2014-07-17 2014-09-24 哈尔滨理工大学 Unbalance data classifying method based on cluster sampling kernel transformation
CN104573708A (en) * 2014-12-19 2015-04-29 天津大学 Ensemble-of-under-sampled extreme learning machine
CN104598930A (en) * 2015-02-05 2015-05-06 清华大学无锡应用技术研究院 Quick measurement method of characteristic resolutions
CN104809476A (en) * 2015-05-12 2015-07-29 西安电子科技大学 Multi-target evolutionary fuzzy rule classification method based on decomposition
CN104933053A (en) * 2014-03-18 2015-09-23 中国银联股份有限公司 Classification of class-imbalanced data
CN104951809A (en) * 2015-07-14 2015-09-30 西安电子科技大学 Unbalanced data classification method based on unbalanced classification indexes and integrated learning
CN105243394A (en) * 2015-11-03 2016-01-13 中国矿业大学 Evaluation method for performance influence degree of classification models by class imbalance
CN105320677A (en) * 2014-07-10 2016-02-10 香港中文大学深圳研究院 Method and device for training streamed unbalance data
CN105354583A (en) * 2015-08-24 2016-02-24 西安电子科技大学 Local mean based imbalance data classification method
CN106127225A (en) * 2016-06-13 2016-11-16 西安电子科技大学 Semi-supervised hyperspectral image classification method based on rarefaction representation
CN106156789A (en) * 2016-05-09 2016-11-23 浙江师范大学 Towards the validity feature sample identification techniques strengthening grader popularization performance
CN106201897A (en) * 2016-07-26 2016-12-07 南京航空航天大学 Software defect based on main constituent distribution function prediction unbalanced data processing method
CN106294593A (en) * 2016-07-28 2017-01-04 浙江大学 In conjunction with subordinate clause level remote supervisory and the Relation extraction method of semi-supervised integrated study
CN106599618A (en) * 2016-12-23 2017-04-26 吉林大学 Non-supervision classification method for metagenome contigs
CN106596900A (en) * 2016-12-13 2017-04-26 贵州电网有限责任公司电力科学研究院 Transformer fault diagnosis method based on improved semi-supervised classification of graph
CN103902706B (en) * 2014-03-31 2017-05-03 东华大学 Method for classifying and predicting big data on basis of SVM (support vector machine)
CN107038330A (en) * 2016-10-27 2017-08-11 北京郁金香伙伴科技有限公司 A kind of compensation method of shortage of data and device
CN107239789A (en) * 2017-05-09 2017-10-10 浙江大学 A kind of industrial Fault Classification of the unbalanced data based on k means
CN107391569A (en) * 2017-06-16 2017-11-24 阿里巴巴集团控股有限公司 Identification, model training, Risk Identification Method, device and the equipment of data type
CN107657282A (en) * 2017-09-29 2018-02-02 中国石油大学(华东) Peptide identification from step-length learning method
CN107958216A (en) * 2017-11-27 2018-04-24 沈阳航空航天大学 Based on semi-supervised multi-modal deep learning sorting technique
CN108304427A (en) * 2017-04-28 2018-07-20 腾讯科技(深圳)有限公司 A kind of user visitor's heap sort method and apparatus
CN108509973A (en) * 2018-01-19 2018-09-07 南京航空航天大学 Based on the Cholesky least square method supporting vector machine learning algorithms decomposed and its application
CN108647728A (en) * 2018-05-10 2018-10-12 广州大学 Unbalanced data classification oversampler method, device, equipment and medium
CN108960561A (en) * 2018-05-04 2018-12-07 阿里巴巴集团控股有限公司 A kind of air control model treatment method, device and equipment based on unbalanced data
CN109165694A (en) * 2018-09-12 2019-01-08 太原理工大学 The classification method and system of a kind of pair of non-equilibrium data collection
CN109508350A (en) * 2018-11-05 2019-03-22 北京邮电大学 The method and apparatus that a kind of pair of data are sampled
CN109726821A (en) * 2018-11-27 2019-05-07 东软集团股份有限公司 Data balancing method, device, computer readable storage medium and electronic equipment
CN109829487A (en) * 2019-01-16 2019-05-31 上海上塔软件开发有限公司 A kind of clustering method based on segmentation statistical nature distance
CN110138784A (en) * 2019-05-15 2019-08-16 重庆大学 A kind of Network Intrusion Detection System based on feature selecting
CN110263166A (en) * 2019-06-18 2019-09-20 北京海致星图科技有限公司 Public sentiment file classification method based on deep learning
CN110377587A (en) * 2019-07-15 2019-10-25 腾讯科技(深圳)有限公司 Method, apparatus, equipment and medium are determined based on the migrating data of machine learning
CN110442722A (en) * 2019-08-13 2019-11-12 北京金山数字娱乐科技有限公司 Method and device for training classification model and method and device for data classification
CN110569876A (en) * 2019-08-07 2019-12-13 武汉中原电子信息有限公司 Non-invasive load identification method and device and computing equipment
CN110579709A (en) * 2019-08-30 2019-12-17 西南交通大学 fault diagnosis method for proton exchange membrane fuel cell for tramcar
CN110795732A (en) * 2019-10-10 2020-02-14 南京航空航天大学 SVM-based dynamic and static combination detection method for malicious codes of Android mobile network terminal
CN110796262A (en) * 2019-09-26 2020-02-14 北京淇瑀信息科技有限公司 Test data optimization method and device of machine learning model and electronic equipment
CN110889445A (en) * 2019-11-22 2020-03-17 咪咕文化科技有限公司 Video CDN hotlinking detection method and device, electronic equipment and storage medium
CN110933102A (en) * 2019-12-11 2020-03-27 支付宝(杭州)信息技术有限公司 Abnormal flow detection model training method and device based on semi-supervised learning
CN111241360A (en) * 2020-01-09 2020-06-05 腾讯科技(深圳)有限公司 Information recommendation method, device, equipment and storage medium
CN111291818A (en) * 2020-02-18 2020-06-16 浙江工业大学 Non-uniform class sample equalization method for cloud mask
CN111814851A (en) * 2020-06-24 2020-10-23 重庆邮电大学 Coal mine gas data marking method based on single-class support vector machine
CN111832627A (en) * 2020-06-19 2020-10-27 华中科技大学 Image classification model training method, classification method and system for suppressing label noise

Similar Documents

Publication Publication Date Title
CN101980202A (en) Semi-supervised classification method of unbalance data
Bansal et al. Improved k-mean clustering algorithm for prediction analysis using classification technique in data mining
Guillaumin et al. Large-scale knowledge transfer for object localization in imagenet
CN105389583A (en) Image classifier generation method, and image classification method and device
CN102722713B (en) Handwritten numeral recognition method based on lie group structure data and system thereof
CN103617429A (en) Sorting method and system for active learning
CN103745233B (en) The hyperspectral image classification method migrated based on spatial information
CN106156805A (en) A kind of classifier training method of sample label missing data
CN103839078A (en) Hyperspectral image classifying method based on active learning
CN102819688A (en) Two-dimensional seismic data full-layer tracking method based on semi-supervised classification
CN106228027A (en) A kind of semi-supervised feature selection approach of various visual angles data
CN103020167A (en) Chinese text classification method for computer
CN104750875A (en) Machine error data classification method and system
CN106600046A (en) Multi-classifier fusion-based land unused condition prediction method and device
CN111859983A (en) Natural language labeling method based on artificial intelligence and related equipment
CN106326938A (en) SAR image target discrimination method based on weakly supervised learning
CN103473308B (en) High-dimensional multimedia data classifying method based on maximum margin tensor study
CN109933619A (en) A kind of semisupervised classification prediction technique
CN110175657A (en) A kind of image multi-tag labeling method, device, equipment and readable storage medium storing program for executing
Malakar et al. Offline music symbol recognition using Daisy feature and quantum Grey wolf optimization based feature selection
CN106250913A (en) A kind of combining classifiers licence plate recognition method based on local canonical correlation analysis
CN104268557A (en) Polarization SAR classification method based on cooperative training and depth SVM
CN105894035B (en) SAR image classification method based on SAR-SIFT and DBN
CN109615002A (en) Decision tree SVM university student's consumer behavior evaluation method based on PSO
CN104268555A (en) Polarization SAR image classification method based on fuzzy sparse LSSVM

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20110223