CN101025729A - Pattern classification recognition method based on rough support vector machine - Google Patents

Pattern classification recognition method based on rough support vector machine

Info

Publication number: CN101025729A
Application number: CNA2007100386365A
Authority: CN (China)
Prior art keywords: alpha, support vector, vector machine, sigma, interval
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 汪源源, 张俊华
Current Assignee: Fudan University
Original Assignee: Fudan University
Priority/filing date: 2007-03-29
Publication date: 2007-08-29
Application filed by Fudan University; priority to CNA2007100386365A; published as CN101025729A (en)

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract

The invention belongs to the technical field of pattern classification and recognition, and specifically provides a support vector machine (SVM)-based pattern classification and recognition method. The invention introduces rough set theory into the SVM, defining the SVM classification margin as a rough-set classification margin; when training the SVM, the rough margin is maximized to determine the optimal separating boundary between the two classes. The invention solves the over-learning problem of the traditional SVM in the presence of noise or outliers, thereby improving the SVM's generalization ability, while the required computation time is comparable to that of the traditional SVM.

Description

Pattern classification recognition method based on rough support vector machine
Technical field
The invention belongs to the technical field of pattern classification and recognition, and specifically relates to a pattern classification and recognition method based on the support vector machine (SVM).
Technical background
The purpose of pattern recognition is to classify objects, and its applications are very extensive, for example computer-aided diagnosis, character recognition, speech recognition, and so on. The support vector machine [1][2] is a pattern recognition method based on statistical learning theory. Traditional statistical pattern recognition methods, such as classifiers based on Bayesian decision theory, train the classifier by minimizing the empirical risk. However, the empirical risk converges to the expected risk only as the number of samples approaches infinity; a classifier based on empirical risk minimization therefore cannot guarantee high generalization ability, and when the number of training samples is small its performance may be poor. The support vector machine is based on the structural risk minimization principle: it seeks an optimal separating hyperplane in feature space that separates the two classes of data points as correctly as possible while keeping the two classes as far from the hyperplane as possible, i.e., it finds the equilibrium between minimizing the empirical risk and maximizing the generalization ability. Existing studies show that, for small training sets, the support vector machine performs best among the various classifiers.
Because the optimal hyperplane obtained by the traditional support vector machine depends only on a small number of support vectors, the traditional SVM may still suffer from over-learning when the training samples contain noise or outliers [3]. Aiming at this problem, many improved support vector machines have appeared in recent years, such as the fuzzy support vector machine [4], the total-margin SVM [5], and the scaled SVM [6]. The present invention introduces rough set theory [7] into the support vector machine to overcome the over-learning problem in the presence of noise or outliers.
Summary of the invention
The objective of the invention is to propose a pattern classification and recognition method based on a rough support vector machine, so as to solve the over-learning problem that the traditional support vector machine exhibits on samples containing noise or outliers.
The steps of the pattern recognition method based on the rough support vector machine proposed by the invention are: first, train the rough support vector machine on samples of known class, i.e., seek an optimal separating hyperplane in feature space that maximizes the rough classification margin between the two classes; then, for a sample to be recognized, determine its class with this optimal hyperplane. The content of the invention is described further below:
Related concepts: the support vector machine classifier

Let $\{(x_i, y_i),\ i = 1, 2, \dots, l\}$ be a training set of $l$ samples, where the $i$-th sample $x_i \in R^d$ is a $d$-dimensional feature vector and $y_i \in \{+1, -1\}$ is the class of $x_i$. The support vector machine seeks the optimal separating hyperplane between the two classes, the one that maximizes the classification margin. When the training samples are not linearly separable, the support vector machine maps the input feature space by a nonlinear map $\phi$ into a higher-dimensional feature space $Z$ in which the two classes become separable, and seeks the optimal hyperplane that separates the two classes linearly in that space. In the high-dimensional feature space, the sample points $\phi(x)$ lying on the hyperplane satisfy $w \cdot \phi(x) + b = 0$, where $w \in Z$ and $b \in R$; the weight vector $w$ and the offset $b$ together define the hyperplane. A sample $x_i$ is assigned to one of the two classes by the decision function $\mathrm{sgn}(w \cdot \phi(x_i) + b)$ (sgn is the sign function). In the traditional $\nu$-support vector machine, the $w$ and $b$ of the optimal hyperplane are the solution of the following optimization problem (the primal problem):

$$\min_{w, b, \xi, \rho}\ \frac{1}{2}\|w\|^2 - \nu\rho + \frac{1}{l}\sum_{i=1}^{l}\xi_i$$

$$\text{subject to}\quad y_i(w \cdot \phi(x_i) + b) \ge \rho - \xi_i,\quad \xi_i \ge 0,\ i = 1, \dots, l,\quad \rho \ge 0, \qquad (1)$$

where $w, b, \xi, \rho$ are the optimization variables; the width of the trained classification margin is determined by $w$ and $\rho$ (it equals $2\rho/\|w\|$); the $\xi_i$ are slack variables. A training point with $\xi_i > 0$ is either misclassified by $\mathrm{sgn}(w \cdot \phi(x_i) + b)$ or lies inside the margin formed by the two hyperplanes $w \cdot \phi(x) + b = \rho$ and $w \cdot \phi(x) + b = -\rho$; such points are called margin-error points. By introducing Lagrange multipliers and applying the Karush-Kuhn-Tucker (KKT) conditions [8], the primal problem can be converted into its dual problem:

$$\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l} y_i y_j \alpha_i \alpha_j\, \phi(x_i) \cdot \phi(x_j) = \min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l} y_i y_j \alpha_i \alpha_j K(x_i, x_j)$$

$$\text{subject to}\quad \sum_{i=1}^{l} y_i \alpha_i = 0,\quad 0 \le \alpha_i \le \frac{1}{l},\quad \sum_{i=1}^{l} \alpha_i \ge \nu, \qquad (2)$$

where $\alpha_i$ is the Lagrange multiplier associated with the constraint $y_i(w \cdot \phi(x_i) + b) \ge \rho - \xi_i$, and $K(x_i, x_j)$ denotes the kernel function, which directly gives the dot product $\phi(x_i) \cdot \phi(x_j)$ in the high-dimensional space. The optimal solution $(\alpha_1^*, \dots, \alpha_l^*)^T$ of the dual problem reveals the position of each training sample in the high-dimensional space: points with $\alpha_i^* = 0$ lie outside the classification margin and satisfy $y_i(w \cdot \phi(x_i) + b) > \rho$; points with $\alpha_i^* > 0$ are called support vectors, among which points with $0 < \alpha_i^* < 1/l$ lie exactly on the margin boundary and satisfy $y_i(w \cdot \phi(x_i) + b) = \rho$, while points with $\alpha_i^* = 1/l$ lie inside the margin and satisfy $y_i(w \cdot \phi(x_i) + b) = \rho - \xi_i$ with $\xi_i > 0$.
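For concreteness, a minimal sketch of the kernel evaluation (the experiments below use a Gaussian kernel with parameter σ = 1.0; the exponent normalization 2σ² and the function name gaussian_kernel are our assumptions, since the patent does not spell out the kernel formula):

    import numpy as np

    def gaussian_kernel(X1, X2, sigma=1.0):
        # Gram matrix with K[i, j] = exp(-||x1_i - x2_j||^2 / (2 * sigma^2))
        sq = (np.sum(X1 ** 2, axis=1)[:, None]
              + np.sum(X2 ** 2, axis=1)[None, :]
              - 2.0 * X1 @ X2.T)
        return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))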
After training is complete, in the classification and recognition stage the class of an unknown sample $\tilde{x}$ is given by

$$\tilde{y} = \mathrm{sgn}\big(w^* \cdot \phi(\tilde{x}) + b^*\big) = \mathrm{sgn}\Big(\sum_{i=1}^{l} \alpha_i^* y_i K(\tilde{x}, x_i) + b^*\Big), \qquad (3)$$

where

$$b^* = -\frac{1}{2}\sum_{i=1}^{l} \alpha_i^* y_i \big(K(x_i, x_j) + K(x_i, x_k)\big), \qquad (4)$$

with $j \in \{i \mid \alpha_i^* \in (0, 1/l),\ y_i = 1\}$ and $k \in \{i \mid \alpha_i^* \in (0, 1/l),\ y_i = -1\}$.
The support vector machine maximizes the classification margin between the two classes while minimizing the number of misclassified samples; these two conflicting objectives are balanced by the parameter $\nu$. $\nu$ is an upper bound on the fraction of margin-error samples among all samples and a lower bound on the fraction of support vectors among all samples.
1. The rough classification margin
Rough set theory describes uncertain objects by upper and lower approximations. For a set $X$ in a universe $U$, let $R$ be an equivalence relation on $U$ and let $U/R$ be the set of equivalence classes of $R$ on $U$. When $X$ can be expressed as a union of $R$-equivalence classes, $X$ is $R$-definable; otherwise $X$ is $R$-undefinable, i.e., $X$ is an $R$-rough set. A rough set can be described by its upper and lower approximations:

the $R$-upper approximation of the rough set $X$: $\overline{R}X = \cup\{Y \in U/R \mid Y \cap X \ne \emptyset\}$

the $R$-lower approximation of the rough set $X$: $\underline{R}X = \cup\{Y \in U/R \mid Y \subseteq X\}$

the $R$-boundary of the rough set $X$: $\overline{R}X - \underline{R}X$
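To make the two approximations concrete, a small sketch on hypothetical data (the function names and the example universe are ours, not part of the patent):

    def lower_approximation(classes, X):
        # R-lower approximation: union of the equivalence classes Y with Y a subset of X
        out = set()
        for Y in classes:
            if Y <= X:
                out |= Y
        return out

    def upper_approximation(classes, X):
        # R-upper approximation: union of the equivalence classes Y that intersect X
        out = set()
        for Y in classes:
            if Y & X:
                out |= Y
        return out

    # Hypothetical universe U = {1, ..., 5} partitioned by the equivalence relation R:
    classes = [{1, 2}, {3, 4}, {5}]
    X = {1, 2, 3}
    lower = lower_approximation(classes, X)   # {1, 2}; X is not a union of classes, so X is R-rough
    upper = upper_approximation(classes, X)   # {1, 2, 3, 4}
    boundary = upper - lower                  # {3, 4}: the R-boundary of X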
According to rough set theory, the present invention defines a rough classification margin, represented by an upper rough margin and a lower rough margin whose widths are determined by the parameters $\rho_u$, $\rho_l$ and $w$: the width of the upper rough margin is $2\rho_u/\|w\|$ and the width of the lower rough margin is $2\rho_l/\|w\|$ ($\rho_u > \rho_l$). In the search for the optimal hyperplane, sample points lying inside the lower rough margin, as well as misclassified training points, are regarded as outliers and are assigned a larger risk value; sample points lying outside the rough margin can be classified completely correctly; and sample points lying in the rough boundary (the region that belongs to the upper margin but not to the lower margin) may be either misclassified points or correctly classified points, and are assigned a smaller risk value. Analogously to the traditional support vector machine, the rough support vector machine seeks the optimal separating hyperplane between the two classes in the high-dimensional feature space, the one that maximizes the rough classification margin between the two classes. In this way, more sample-point information is taken into account in determining the optimal hyperplane, rather than only a few support vectors.
2. Training stage of the rough support vector machine
The primal problem of the rough support vector machine is defined as follows:
$$\min_{w, b, \xi, \xi', \rho_l, \rho_u}\ \frac{1}{2}\|w\|^2 - \nu\rho_l - \nu\rho_u + \frac{1}{l}\sum_{i=1}^{l}\xi_i + \frac{\delta}{l}\sum_{i=1}^{l}\xi_i'$$

$$\text{subject to}\quad y_i(w \cdot \phi(x_i) + b) \ge \rho_u - \xi_i - \xi_i',$$

$$0 \le \xi_i \le \rho_u - \rho_l,\quad \xi_i' \ge 0,\quad \rho_l \ge 0,\quad \rho_u \ge 0, \qquad (5)$$

where $\delta > 1$.
To solve this optimization problem, construct the Lagrangian function:
$$L_p = \frac{1}{2}\|w\|^2 - \nu\rho_l - \nu\rho_u + \frac{1}{l}\sum_{i=1}^{l}\xi_i + \frac{\delta}{l}\sum_{i=1}^{l}\xi_i' - \sum_{i=1}^{l}\alpha_i\big[y_i(w \cdot \phi(x_i) + b) - \rho_u + \xi_i + \xi_i'\big] - \sum_{i=1}^{l}\beta_i\xi_i - \sum_{i=1}^{l}\lambda_i(\rho_u - \rho_l - \xi_i) - \sum_{i=1}^{l}\eta_i\xi_i' - \mu_1\rho_l - \mu_2\rho_u \qquad (6)$$
where $\alpha_i \ge 0$, $\beta_i \ge 0$, $\lambda_i \ge 0$, $\eta_i \ge 0$, $\mu_1 \ge 0$, $\mu_2 \ge 0$ are Lagrange multipliers. According to the KKT conditions, the optimal parameters satisfy:

$$\frac{\partial L_p}{\partial w} = w - \sum_{i=1}^{l}\alpha_i y_i \phi(x_i) = 0,$$

$$\frac{\partial L_p}{\partial b} = \sum_{i=1}^{l}\alpha_i y_i = 0,$$

$$\frac{\partial L_p}{\partial \xi_i} = \frac{1}{l} - \alpha_i - \beta_i + \lambda_i = 0,$$

$$\frac{\partial L_p}{\partial \xi_i'} = \frac{\delta}{l} - \alpha_i - \eta_i = 0,$$

$$\frac{\partial L_p}{\partial \rho_l} = -\nu + \sum_{i=1}^{l}\lambda_i - \mu_1 = 0,$$

$$\frac{\partial L_p}{\partial \rho_u} = -\nu + \sum_{i=1}^{l}\alpha_i - \sum_{i=1}^{l}\lambda_i - \mu_2 = 0,$$

$$\alpha_i\big[y_i(w \cdot \phi(x_i) + b) - \rho_u + \xi_i + \xi_i'\big] = 0,$$

$$\beta_i\xi_i = 0,\quad \lambda_i(\rho_u - \rho_l - \xi_i) = 0,\quad \eta_i\xi_i' = 0,$$

$$\mu_1\rho_l = 0,\quad \mu_2\rho_u = 0. \qquad (7)$$
Substituting the above expressions into formula (6), the dual problem (the counterpart of dual (2)) can be written as
$$\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_i\alpha_j y_i y_j K(x_i, x_j)$$

$$\text{subject to}\quad \sum_{i=1}^{l}\alpha_i y_i = 0,\quad 0 \le \alpha_i \le \frac{\delta}{l},\quad \sum_{i=1}^{l}\alpha_i \ge 2\nu. \qquad (8)$$
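The patent does not prescribe a solver for dual (8). As one minimal sketch under that caveat, (8) can be posed as a standard quadratic program, here with the cvxopt package; the function name, the parameter defaults, and the tiny ridge added to the diagonal for numerical stability are our assumptions:

    import numpy as np
    from cvxopt import matrix, solvers

    def train_rough_svm(K, y, v=0.1, delta=2.0):
        # Dual (8): min 0.5 a'Qa  s.t.  sum(y_i a_i) = 0, 0 <= a_i <= delta/l, sum(a_i) >= 2v
        l = len(y)
        Q = (y[:, None] * y[None, :]) * K            # Q_ij = y_i y_j K(x_i, x_j)
        P = matrix(Q + 1e-8 * np.eye(l))             # tiny ridge for numerical stability (assumption)
        q = matrix(np.zeros(l))
        # Inequality constraints G a <= h:  -a <= 0,  a <= delta/l,  -sum(a) <= -2v
        G = matrix(np.vstack([-np.eye(l), np.eye(l), -np.ones((1, l))]))
        h = matrix(np.hstack([np.zeros(l), np.full(l, delta / l), [-2.0 * v]]))
        # Equality constraint A a = b:  sum(y_i a_i) = 0
        A = matrix(y.astype(float)[None, :])
        b = matrix([0.0])
        solvers.options['show_progress'] = False
        sol = solvers.qp(P, q, G, h, A, b)
        return np.array(sol['x']).ravel()            # (alpha_1*, ..., alpha_l*)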
Once the optimal solution $(\alpha_1^*, \dots, \alpha_l^*)^T$ of the dual problem is obtained, the position of each training point relative to the rough classification margin is determined by the corresponding value of $\alpha_i^*$ (a sketch for reading these cases off a numerical solution follows the list). A sample point with:

1. $\alpha_i^* = 0$ lies outside the rough classification margin and satisfies $y_i(w \cdot \phi(x_i) + b) > \rho_u$;

2. $0 < \alpha_i^* < \frac{1}{l}$ lies on the boundary of the upper rough margin and satisfies $y_i(w \cdot \phi(x_i) + b) = \rho_u$;

3. $\alpha_i^* = \frac{1}{l}$ lies in the rough boundary and satisfies $y_i(w \cdot \phi(x_i) + b) = \rho_u - \xi_i$, with $\xi_i > 0$;

4. $\frac{1}{l} < \alpha_i^* < \frac{\delta}{l}$ lies on the boundary of the lower rough margin and satisfies $y_i(w \cdot \phi(x_i) + b) = \rho_l$;

5. $\alpha_i^* = \frac{\delta}{l}$ lies inside the lower rough margin and is called a margin-error point; it satisfies $y_i(w \cdot \phi(x_i) + b) = \rho_l - \xi_i'$, with $\xi_i' > 0$.
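A sketch for reading the five cases off a numerical solution (the comparison tolerance tol is our assumption; a QP solver reaches the bounds only approximately):

    def categorize(alpha, l, delta, tol=1e-6):
        # Map each alpha_i* to one of the five positions listed above
        labels = []
        for a in alpha:
            if a < tol:
                labels.append('1: outside the rough margin')
            elif a < 1.0 / l - tol:
                labels.append('2: on the upper-margin boundary')
            elif a <= 1.0 / l + tol:
                labels.append('3: in the rough boundary')
            elif a < delta / l - tol:
                labels.append('4: on the lower-margin boundary')
            else:
                labels.append('5: inside the lower margin (margin error)')
        return labels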
From $\sum_{i=1}^{l}\alpha_i = 2\nu$ (when $\rho_l > 0$) it follows that the number $m$ of margin-error points (i.e., the number of support vectors inside the lower rough margin) satisfies $m < \frac{2\nu l}{\delta}$; that is, $\frac{2\nu}{\delta}$ is an upper bound on the fraction of samples lying inside the lower rough margin. When $\delta = 1$, $m < 2\nu l$ bounds the number of points satisfying $y_i(w \cdot \phi(x_i) + b) < \rho_u$, i.e., $2\nu$ is an upper bound on the fraction of samples inside the upper margin. The parameters $\nu$ and $\delta$ therefore jointly control the numbers of samples in the upper and lower rough margins and the width of the rough boundary. Typically $\nu$ is chosen in the range 0-1 and $\delta$ in the range 2-10. The influence of the rough classification margin on the resulting optimal hyperplane is shown in Fig. 2: Fig. 2(a) and (b) show the hyperplane obtained by the traditional support vector machine ($\nu = 0.1$ in (a), $\nu = 0.5$ in (b)); the outliers clearly perturb the hyperplane, especially when $\nu$ is small (Fig. 2(a)). When the rough support vector machine is used (Fig. 2(c), (d)), the influence of the outliers is weakened.
3. Classification and recognition stage

The class of an unknown sample $\tilde{x}$ is again computed by formula (3). In this case, $b^*$ in formula (3) is

$$b^* = -\frac{1}{2}\sum_{i=1}^{l}\alpha_i^* y_i\big(K(x_i, x_j) + K(x_i, x_k)\big), \qquad (9)$$

where $j \in \{i \mid \alpha_i^* \in (0, \frac{1}{l}),\ y_i = 1\}$, $k \in \{i \mid \alpha_i^* \in (0, \frac{1}{l}),\ y_i = -1\}$, or alternatively $j \in \{i \mid \alpha_i^* \in (\frac{1}{l}, \frac{\delta}{l}),\ y_i = 1\}$, $k \in \{i \mid \alpha_i^* \in (\frac{1}{l}, \frac{\delta}{l}),\ y_i = -1\}$.
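A sketch of this stage under the same assumptions as the earlier code (gaussian_kernel is reused from the sketch above; picking one representative pair (j, k) from the stated index sets is our reading, as the patent leaves the choice within those sets open):

    import numpy as np

    def offset_bstar(alpha, y, K, l, delta, tol=1e-6):
        # b* from formula (9); j, k are chosen with alpha* strictly inside (0, 1/l),
        # or inside (1/l, delta/l) if the first interval holds no point of that class
        # (for brevity, assumes such indices exist)
        def pick(lo, hi, label):
            idx = [i for i in range(len(y))
                   if lo + tol < alpha[i] < hi - tol and y[i] == label]
            return idx[0] if idx else None
        j = pick(0.0, 1.0 / l, +1)
        if j is None:
            j = pick(1.0 / l, delta / l, +1)
        k = pick(0.0, 1.0 / l, -1)
        if k is None:
            k = pick(1.0 / l, delta / l, -1)
        return -0.5 * np.sum(alpha * y * (K[:, j] + K[:, k]))

    def predict(alpha, y, b_star, X_train, x_new, sigma=1.0):
        # Formula (3): sign of the kernel expansion plus b*
        k_vec = gaussian_kernel(x_new[None, :], X_train, sigma).ravel()
        return int(np.sign(np.sum(alpha * y * k_vec) + b_star))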
The concrete steps of the method of the invention are summarized as follows (an end-to-end sketch follows this list):

(1) Define the rough classification margin, consisting of an upper rough margin and a lower rough margin, where the width of the upper rough margin is $2\rho_u/\|w\|$ and the width of the lower rough margin is $2\rho_l/\|w\|$, with $\rho_u > \rho_l$;

(2) Maximize the rough margin with the rough support vector machine to determine the optimal classification surface; this optimization problem is expressed as formula (5);

(3) To solve the optimization problem defined in step (2), convert it into the dual problem expressed by formula (8) and solve that instead;

(4) Solve the dual problem (8) to obtain its optimal solution $(\alpha_1^*, \dots, \alpha_l^*)^T$; the remaining unknown of the original optimization problem (formula (5)), namely $b^*$, is then obtained from formula (9). This completes the training of the rough-margin support vector machine;

(5) Classification and recognition: the class of an unknown sample is computed by formula (3), with $b^*$ in formula (3) obtained in step (4).
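Chaining the earlier sketches, a hypothetical run of steps (1)-(5) on toy data might look as follows (the data, random seed, and parameter values are illustrative only):

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-1.0, 0.5, (40, 2)),      # 40 toy samples of class +1
                   rng.normal(+1.0, 0.5, (40, 2))])     # 40 toy samples of class -1
    y = np.hstack([np.ones(40), -np.ones(40)])

    K = gaussian_kernel(X, X, sigma=1.0)                # kernel matrix for the training set
    alpha = train_rough_svm(K, y, v=0.1, delta=2.0)     # steps (2)-(4): solve dual (8)
    b = offset_bstar(alpha, y, K, l=len(y), delta=2.0)  # step (4): offset from formula (9)
    label = predict(alpha, y, b, X, np.array([-0.8, -1.1]))  # step (5): formula (3)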
Description of drawings
Fig. 1: schematic diagram of the optimal separating hyperplane.
Fig. 2: effect of the rough classification margin on the trained hyperplane, where (a) and (b) show the hyperplane obtained by the traditional support vector machine ($\nu = 0.1$ in (a), $\nu = 0.5$ in (b)), and (c) and (d) show the hyperplane obtained by the rough support vector machine ($\nu = 0.1$ in (c), $\nu = 0.5$ in (d)).
Embodiment
The classification and recognition process is illustrated below on three benchmark medical databases: a liver-disease database, a heart-disease database, and a breast-cancer database. These three databases are available from [9].
The liver-disease database contains 345 samples (200 negative, 145 positive), each described by 6 features. The heart-disease database contains 270 samples (150 negative, 120 positive), each described by 13 features. The breast-cancer database contains 683 samples (444 benign, 239 malignant), each described by 10 features. All samples are normalized to [-1, 1].
For each experimental database, the method is evaluated by 5-fold cross-validation: the data set is split into 5 equal parts, keeping the ratio of the two classes consistent in every part; each time, 4 parts are used as the training set and the remaining part as the test set; each of the 5 parts serves as the test set in turn, and the mean of the 5 results is taken as the final experimental result. Because the final classification result of the support vector machine depends on the settings of the parameters $\nu$ and $\delta$, suitable values are selected by 3-fold cross-validation on the training samples (2/3 of the training data are used for training and the remaining 1/3 for validation). The search range of $\nu$ is 0.05 to 1.0 in steps of 0.05, and that of $\delta$ is 2.0 to 15.0 in steps of 1.0. After the optimal values of $\nu$ and $\delta$ are obtained by 3-fold cross-validation, the rough support vector machine is trained with these values to obtain the optimal hyperplane, which is finally used to predict the classes of the unknown data. A Gaussian kernel with parameter $\sigma = 1.0$ is used in the experiments. A sketch of the parameter-selection loop follows.
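A minimal sketch of that selection loop, reusing the routines sketched earlier (gaussian_kernel, train_rough_svm, offset_bstar, predict); stratified splitting via scikit-learn's StratifiedKFold and the helper names are our assumptions:

    import numpy as np
    from sklearn.model_selection import StratifiedKFold

    def test_accuracy(alpha, y_tr, b, X_tr, X_te, y_te, sigma=1.0):
        # Fraction of held-out samples whose predicted class matches the true class
        preds = np.array([predict(alpha, y_tr, b, X_tr, x, sigma) for x in X_te])
        return float(np.mean(preds == y_te))

    def select_params(X, y, v_grid, d_grid, sigma=1.0):
        # 3-fold cross-validation on the training set: pick the (v, delta) pair
        # with the highest mean validation accuracy, as described above
        inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
        best, best_acc = None, -1.0
        for v in v_grid:
            for d in d_grid:
                accs = []
                for tr, va in inner.split(X, y):
                    K = gaussian_kernel(X[tr], X[tr], sigma)
                    alpha = train_rough_svm(K, y[tr], v, d)
                    b = offset_bstar(alpha, y[tr], K, len(tr), d)
                    accs.append(test_accuracy(alpha, y[tr], b, X[tr], X[va], y[va], sigma))
                if np.mean(accs) > best_acc:
                    best, best_acc = (v, d), float(np.mean(accs))
        return best

    v_grid = np.arange(0.05, 1.0 + 1e-9, 0.05)   # nu: 0.05 to 1.0, step 0.05
    d_grid = np.arange(2.0, 15.0 + 1e-9, 1.0)    # delta: 2.0 to 15.0, step 1.0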
As an example, the classification and recognition process on the liver-disease database is as follows. The database contains 345 samples, denoted $\{(x_i, y_i)\}$, of which the 200 negative samples have $y_i = 1$ and the 145 positive samples have $y_i = -1$; each sample is described by 6 features: $x_i = [f_{i1}, f_{i2}, \dots, f_{i6}]$. In the training stage, the dual problem is constructed from the training samples according to formula (8) and solved (because 5-fold cross-validation is used, each training set contains 276 samples, i.e., $l = 276$ in formula (8)), yielding the optimal solution $(\alpha_1^*, \dots, \alpha_l^*)^T$; the offset $b^*$ is then obtained from formula (9). This completes the training of the rough-margin support vector machine. Then, for each sample taken from the test set, its class is determined according to formula (3).
Interpretation of results:
For the liver-disease database, the correct recognition rate of the traditional support vector machine is 66.96%, while that of the rough support vector machine is 68.41%. For the heart-disease database, the corresponding rates are 83.70% and 84.81%. For the breast-cancer database, they are 96.74% and 96.88% (see the first column of the experimental results in Tables 1-3). Because these three databases contain few outliers, the recognition rates of the rough support vector machine and the traditional support vector machine are close.
To study the rough support vector machine's ability to suppress interference from outliers, this implementation adds outliers to the three benchmark databases artificially: a given proportion of the -1-class samples is randomly relabeled as +1-class samples. A sketch of this corruption step follows.
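A minimal sketch of that corruption step (the function name and the choice to measure the fraction against the -1 class are our assumptions):

    import numpy as np

    def add_outliers(y, fraction, seed=0):
        # Randomly relabel a fraction of the -1-class samples as +1 (artificial outliers)
        rng = np.random.default_rng(seed)
        y = y.copy()
        neg = np.flatnonzero(y == -1)
        flip = rng.choice(neg, size=int(round(fraction * len(neg))), replace=False)
        y[flip] = +1
        return y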
The experimental results obtained after adding different proportions of outliers to the three databases are listed in Tables 1-3. As the results show, when the proportion of outliers rises from 10% to 30%, the correct recognition rate of the traditional support vector machine drops markedly, especially for the liver-disease database, whereas the performance of the rough support vector machine is more stable. The results on all three databases show that, when the samples contain outliers, the correct recognition rate of the rough support vector machine is clearly higher than that of the traditional support vector machine. This demonstrates that when the training samples contain outliers or noise, the anti-interference capability of the rough support vector machine is better than that of the traditional support vector machine, i.e., its generalization ability is superior.
Table 1. Liver-disease database results (F denotes the proportion of added outliers)

                                                  F=0%      F=10%     F=20%     F=30%
Traditional SVM   training-set recognition rate   70.92%    70.26%    63.81%    55.97%
                  test-set recognition rate       66.96%    66.09%    54.78%    51.88%
Rough SVM         training-set recognition rate   71.06%    71.58%    68.06%    64.98%
                  test-set recognition rate       68.41%    67.83%    64.06%    59.71%

Table 2. Heart-disease database results

                                                  F=0%      F=10%     F=20%     F=30%
Traditional SVM   training-set recognition rate   84.81%    83.61%    82.78%    79.17%
                  test-set recognition rate       83.70%    81.11%    78.15%    73.70%
Rough SVM         training-set recognition rate   84.72%    84.17%    83.24%    80.09%
                  test-set recognition rate       84.81%    83.33%    80.00%    77.78%

Table 3. Breast-cancer database results

                                                  F=0%      F=10%     F=20%     F=30%
Traditional SVM   training-set recognition rate   97.18%    96.56%    95.75%    94.73%
                  test-set recognition rate       96.74%    95.56%    94.67%    92.15%
Rough SVM         training-set recognition rate   97.22%    96.85%    96.01%    95.02%
                  test-set recognition rate       96.88%    96.00%    95.56%    94.07%
In summary, by introducing rough set theory into the support vector machine classifier, more sample-point information is adaptively taken into account when seeking the optimal separating hyperplane, rather than only a few support vectors. The user-defined parameters $\nu$ and $\delta$ jointly control the width of the rough boundary region in the high-dimensional feature space and the numbers of support vectors in the upper and lower rough margins. Moreover, the computation required by the rough support vector machine is essentially the same as that of the traditional support vector machine.
List of references
1. C. Cortes and V. Vapnik, "Support-vector networks," Mach. Learn. 20 (1995) 273-297.
2. B. Schölkopf, A. J. Smola, R. C. Williamson and P. L. Bartlett, "New support vector algorithms," Neural Computation 12 (2000) 1207-1245.
3. X. G. Zhang, "Using class-center vectors to build support vector machines," Proc. IEEE NNSP IX, Madison, WI, USA, Aug. 1999, pp. 3-11.
4. C. F. Lin and S. D. Wang, "Fuzzy support vector machines," IEEE Trans. Neural Netw. 13 (2002) 464-471.
5. M. Yoon, Y. Yun and H. Nakayama, "A role of total margin in support vector machines," Proc. Int. Joint Conf. Neural Networks, Piscataway, NJ, USA, July 2003, pp. 2049-2053.
6. J. Feng and P. Williams, "The generalization error of the symmetric and scaled support vector machines," IEEE Trans. Neural Netw. 12 (2001) 1255-1260.
7. Z. Pawlak, "Rough sets," Int. J. Comput. Inform. Sci. 11 (1982) 341-356.
8. W. Karush, "Minima of functions of several variables with inequalities as side constraints," Master's thesis, Department of Mathematics, University of Chicago, 1939.
9. D. J. Newman, S. Hettich, C. L. Blake and C. J. Merz, UCI Repository of Machine Learning Databases, Irvine, CA: University of California, Department of Information and Computer Science (1998). [http://www.ics.uci.edu/~mlearn/MLRepository.html]

Claims (1)

1. A pattern classification and recognition method based on a rough support vector machine. Let $\{(x_i, y_i),\ i = 1, 2, \dots, l\}$ be a training set of $l$ samples, where the $i$-th sample $x_i \in R^d$ is a $d$-dimensional feature vector and $y_i \in \{+1, -1\}$ is the class of $x_i$; the support vector machine seeks the optimal separating hyperplane between the two classes, the one that maximizes the classification margin; when the training samples are not linearly separable, the support vector machine maps the input feature space by a nonlinear map $\phi$ into a higher-dimensional feature space $Z$ in which the two classes become separable, and seeks the optimal hyperplane that separates the two classes linearly in that space; in the high-dimensional feature space, the sample points $\phi(x)$ lying on the hyperplane satisfy $w \cdot \phi(x) + b = 0$, where $w \in Z$ and $b \in R$, $w$ and $b$ being the weight vector and the offset respectively, which together define the hyperplane; a sample $x_i$ is assigned to one of the two classes by the decision function $\mathrm{sgn}(w \cdot \phi(x_i) + b)$; the method being characterized by the following concrete steps:

(1) define the rough classification margin, consisting of an upper rough margin and a lower rough margin, where the width of the upper rough margin is $2\rho_u/\|w\|$ and the width of the lower rough margin is $2\rho_l/\|w\|$, with $\rho_u > \rho_l$;

(2) maximize the rough margin with the rough support vector machine to determine the optimal classification surface, this optimization problem being expressed as formula (5):

$$\min_{w, b, \xi, \xi', \rho_l, \rho_u}\ \frac{1}{2}\|w\|^2 - \nu\rho_l - \nu\rho_u + \frac{1}{l}\sum_{i=1}^{l}\xi_i + \frac{\delta}{l}\sum_{i=1}^{l}\xi_i'$$

$$\text{subject to}\quad y_i(w \cdot \phi(x_i) + b) \ge \rho_u - \xi_i - \xi_i',$$

$$0 \le \xi_i \le \rho_u - \rho_l,\quad \xi_i' \ge 0,\quad \rho_l \ge 0,\quad \rho_u \ge 0, \qquad (5)$$

where $\delta > 1$;

(3) to solve the optimization problem defined in step (2), convert it into the dual problem expressed by formula (8) and solve that instead:

$$\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_i\alpha_j y_i y_j K(x_i, x_j)$$

$$\text{subject to}\quad \sum_{i=1}^{l}\alpha_i y_i = 0,\quad 0 \le \alpha_i \le \frac{\delta}{l},\quad \sum_{i=1}^{l}\alpha_i \ge 2\nu, \qquad (8)$$

where $\nu$ is chosen in the range 0-1 and $\delta$ in the range 2-10;

(4) solve the dual problem (8) to obtain its optimal solution $(\alpha_1^*, \dots, \alpha_l^*)^T$, and compute $b^*$ by formula (9):

$$b^* = -\frac{1}{2}\sum_{i=1}^{l}\alpha_i^* y_i\big(K(x_i, x_j) + K(x_i, x_k)\big), \qquad (9)$$

where $j \in \{i \mid \alpha_i^* \in (0, \frac{1}{l}),\ y_i = 1\}$, $k \in \{i \mid \alpha_i^* \in (0, \frac{1}{l}),\ y_i = -1\}$, or $j \in \{i \mid \alpha_i^* \in (\frac{1}{l}, \frac{\delta}{l}),\ y_i = 1\}$, $k \in \{i \mid \alpha_i^* \in (\frac{1}{l}, \frac{\delta}{l}),\ y_i = -1\}$; this completes the training of the rough-margin support vector machine;

(5) classification and recognition: the class of an unknown sample $\tilde{x}$ is computed by formula (3):

$$\tilde{y} = \mathrm{sgn}\big(w^* \cdot \phi(\tilde{x}) + b^*\big) = \mathrm{sgn}\Big(\sum_{i=1}^{l}\alpha_i^* y_i K(\tilde{x}, x_i) + b^*\Big), \qquad (3)$$

where $(\alpha_1^*, \dots, \alpha_l^*)^T$ and $b^*$ are obtained in step (4), and $K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$.
CNA2007100386365A 2007-03-29 2007-03-29 Pattern classification recognition method based on rough support vector machine Pending CN101025729A (en)

Priority Applications (1)

Application Number   Priority Date  Filing Date  Title
CNA2007100386365A    2007-03-29     2007-03-29   Pattern classification recognition method based on rough support vector machine

Publications (1)

Publication Number  Publication Date
CN101025729A        2007-08-29

Family

ID=38744040

Country Status (1)

Country  Link
CN (1)   CN101025729A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102262682A (en) * 2011-08-19 2011-11-30 上海应用技术学院 Rapid attribute reduction method based on rough classification knowledge discovery
CN102262682B (en) * 2011-08-19 2016-01-20 上海应用技术学院 Based on the rapid attribute reduction of rough classification knowledge discovery
CN102799902A (en) * 2012-08-13 2012-11-28 南京师范大学 Enhanced relationship classifier based on representative samples
CN103577690A (en) * 2013-10-29 2014-02-12 西安电子科技大学 Sparse nonparametric body area channel probability representation method
CN107786514A (en) * 2016-08-29 2018-03-09 中国电信股份有限公司 Network attack method for early warning and device
CN107786514B (en) * 2016-08-29 2020-04-28 中国电信股份有限公司 Network attack early warning method and device
CN106874900A (en) * 2017-04-26 2017-06-20 桂林电子科技大学 A kind of tired driver detection method and detection means based on steering wheel image
CN108414228A (en) * 2018-03-20 2018-08-17 哈尔滨理工大学 Based on averagely more granularity decision rough sets and NNBC Method for Bearing Fault Diagnosis

Similar Documents

Publication Publication Date Title
CN105446484B (en) A kind of electromyography signal gesture identification method based on Hidden Markov Model
Bai et al. Integrating Fuzzy C-Means and TOPSIS for performance evaluation: An application and comparative analysis
Harahap et al. Implementation of Naïve Bayes classification method for predicting purchase
Wu et al. A patent quality analysis and classification system using self-organizing maps with support vector machine
Cadenas et al. Feature subset selection filter–wrapper based on low quality data
CN101025729A (en) Pattern classification recognition method based on rough support vector machine
Boulesteix et al. Optimal classifier selection and negative bias in error rate estimation: an empirical study on high-dimensional prediction
González et al. Monotonic random forest with an ensemble pruning mechanism based on the degree of monotonicity
CN106295694A (en) A kind of face identification method of iteration weight set of constraints rarefaction representation classification
Jumutc et al. Ranking-based kernels in applied biomedical diagnostics using a support vector machine
KR20090060359A (en) Two-class classifying/predicting model making method, classifying/predicting model making program, and two-class classifying/predicting model making device
CN106529576A (en) Piano score difficulty recognition algorithm based on improved measure learning support vector machine
CN101681448A (en) Information processing device, information processing method, and program
Steinley et al. K-means clustering and mixture model clustering: Reply to McLachlan (2011) and Vermunt (2011).
Li et al. Feature selection via least squares support feature machine
CN102945238A (en) Fuzzy ISODATA (interactive self-organizing data) based feature selection method
Pai et al. Analyzing foreign exchange rates by rough set theory and directed acyclic graph support vector machines
CN102609733B (en) Fast face recognition method in application environment of massive face database
Songsiri et al. Universum selection for boosting the performance of multiclass support vector machines based on one-versus-one strategy
Razavi Hajiagha et al. Fuzzy C-means based data envelopment analysis for mitigating the impact of units’ heterogeneity
Wagner Latent representations of transaction network graphs in continuous vector spaces as features for money laundering detection
Liu A framework of data mining application process for credit scoring
CN106355198A (en) Method for acquiring fuzzy support vector machine membership function
Lu et al. Cancer classification through filtering progressive transductive support vector machine based on gene expression data
Danganan et al. eHMCOKE: an enhanced overlapping clustering algorithm for data analysis

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Open date: 20070829