CN101025729A - Pattern classification recognition method based on rough support vector machine - Google Patents

Pattern classification recognition method based on rough support vector machine

Info

Publication number: CN101025729A
Application number: CNA2007100386365A
Authority: CN (China)
Prior art keywords: alpha, support vector, vector machine, sigma, interval
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 汪源源, 张俊华
Current Assignee: Fudan University
Original Assignee: Fudan University
Priority/filing date: 2007-03-29
Publication date: 2007-08-29
Application filed by Fudan University; priority to CNA2007100386365A; published as CN101025729A (en)

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract

The invention belongs to the technical field of pattern classification and recognition, and specifically provides a support vector machine (SVM)-based pattern classification and recognition method. The invention introduces rough set theory into the SVM, defining the SVM classification margin as a rough-set classification margin; when training the SVM, the rough margin is maximized to determine the optimal separating boundary between the two classes. The invention solves the over-learning problem of the traditional SVM in the presence of noise or outliers, thereby improving the SVM's generalization ability, while the required computation time is comparable to that of the traditional SVM.

Description

Pattern classification recognition method based on rough support vector machine
Technical field
The invention belongs to the technical field of pattern classification and recognition, and specifically relates to a pattern classification and recognition method based on the support vector machine (SVM).
Technical background
The purpose of pattern recognition is to classify objects, and its applications are very extensive, for example computer-aided diagnosis, character recognition, speech recognition, and so on. The support vector machine [1][2] is a pattern recognition method based on statistical learning theory. Traditional statistical pattern recognition methods, such as classifiers based on Bayesian decision theory, train the classifier by minimizing the empirical risk. However, the empirical risk converges to the expected risk only as the number of samples approaches infinity; a classifier based on empirical risk minimization therefore cannot guarantee high generalization ability, and when the number of training samples is small its performance may be poor. The support vector machine is based on the structural risk minimization principle: it seeks an optimal separating hyperplane in feature space that separates the two classes of data points as correctly as possible while keeping the two classes as far from the hyperplane as possible, i.e., it finds the equilibrium between minimizing the empirical risk and maximizing the generalization ability. Existing studies show that, for small training sets, the support vector machine performs best among the various classifiers.
Because the optimal hyperplane obtained by the traditional support vector machine depends only on a small number of support vectors, the traditional SVM may still suffer from over-learning when the training samples contain noise or outliers [3]. Aiming at this problem, many improved support vector machines have appeared in recent years, such as the fuzzy support vector machine [4], the total-margin SVM [5], and the scaled SVM [6]. The present invention introduces rough set theory [7] into the support vector machine to overcome the over-learning problem in the presence of noise or outliers.
Summary of the invention
The objective of the invention is to propose a pattern classification and recognition method based on a rough support vector machine, so as to solve the over-learning problem that the traditional support vector machine exhibits on samples containing noise or outliers.
The steps of the pattern recognition method based on the rough support vector machine proposed by the invention are: first, train the rough support vector machine on samples of known class, i.e., seek an optimal separating hyperplane in feature space that maximizes the rough classification margin between the two classes; then, for a sample to be recognized, determine its class with this optimal hyperplane. The content of the invention is described further below:
Related concepts: the support vector machine classifier

Let $\{(x_i, y_i),\ i = 1, 2, \dots, l\}$ be a training set of $l$ samples, where the $i$-th sample $x_i \in R^d$ is a $d$-dimensional feature vector and $y_i \in \{+1, -1\}$ is the class of $x_i$. The support vector machine seeks the optimal separating hyperplane between the two classes, the one that maximizes the classification margin. When the training samples are not linearly separable, the support vector machine maps the input feature space by a nonlinear map $\phi$ into a higher-dimensional feature space $Z$ in which the two classes become separable, and seeks the optimal hyperplane that separates the two classes linearly in that space. In the high-dimensional feature space, the sample points $\phi(x)$ lying on the hyperplane satisfy $w \cdot \phi(x) + b = 0$, where $w \in Z$ and $b \in R$; the weight vector $w$ and the offset $b$ together define the hyperplane. A sample $x_i$ is assigned to one of the two classes by the decision function $\mathrm{sgn}(w \cdot \phi(x_i) + b)$ (sgn is the sign function). In the traditional $\nu$-support vector machine, the $w$ and $b$ of the optimal hyperplane are the solution of the following optimization problem (the primal problem):

$$\min_{w, b, \xi, \rho}\ \frac{1}{2}\|w\|^2 - \nu\rho + \frac{1}{l}\sum_{i=1}^{l}\xi_i$$

$$\text{subject to}\quad y_i(w \cdot \phi(x_i) + b) \ge \rho - \xi_i,\quad \xi_i \ge 0,\ i = 1, \dots, l,\quad \rho \ge 0, \qquad (1)$$

where $w, b, \xi, \rho$ are the optimization variables; the width of the trained classification margin is determined by $w$ and $\rho$ (it equals $2\rho/\|w\|$); the $\xi_i$ are slack variables. A training point with $\xi_i > 0$ is either misclassified by $\mathrm{sgn}(w \cdot \phi(x_i) + b)$ or lies inside the margin formed by the two hyperplanes $w \cdot \phi(x) + b = \rho$ and $w \cdot \phi(x) + b = -\rho$; such points are called margin-error points. By introducing Lagrange multipliers and applying the Karush-Kuhn-Tucker (KKT) conditions [8], the primal problem can be converted into its dual problem:

$$\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l} y_i y_j \alpha_i \alpha_j\, \phi(x_i) \cdot \phi(x_j) = \min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l} y_i y_j \alpha_i \alpha_j K(x_i, x_j)$$

$$\text{subject to}\quad \sum_{i=1}^{l} y_i \alpha_i = 0,\quad 0 \le \alpha_i \le \frac{1}{l},\quad \sum_{i=1}^{l} \alpha_i \ge \nu, \qquad (2)$$

where $\alpha_i$ is the Lagrange multiplier associated with the constraint $y_i(w \cdot \phi(x_i) + b) \ge \rho - \xi_i$, and $K(x_i, x_j)$ denotes the kernel function, which directly gives the dot product $\phi(x_i) \cdot \phi(x_j)$ in the high-dimensional space. The optimal solution $(\alpha_1^*, \dots, \alpha_l^*)^T$ of the dual problem reveals the position of each training sample in the high-dimensional space: points with $\alpha_i^* = 0$ lie outside the classification margin and satisfy $y_i(w \cdot \phi(x_i) + b) > \rho$; points with $\alpha_i^* > 0$ are called support vectors, among which points with $0 < \alpha_i^* < 1/l$ lie exactly on the margin boundary and satisfy $y_i(w \cdot \phi(x_i) + b) = \rho$, while points with $\alpha_i^* = 1/l$ lie inside the margin and satisfy $y_i(w \cdot \phi(x_i) + b) = \rho - \xi_i$ with $\xi_i > 0$.
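For concreteness, a minimal sketch of the kernel evaluation (the experiments below use a Gaussian kernel with parameter σ = 1.0; the exponent normalization 2σ² and the function name gaussian_kernel are our assumptions, since the patent does not spell out the kernel formula):

    import numpy as np

    def gaussian_kernel(X1, X2, sigma=1.0):
        # Gram matrix with K[i, j] = exp(-||x1_i - x2_j||^2 / (2 * sigma^2))
        sq = (np.sum(X1 ** 2, axis=1)[:, None]
              + np.sum(X2 ** 2, axis=1)[None, :]
              - 2.0 * X1 @ X2.T)
        return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))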
After training is complete, in the classification and recognition stage the class of an unknown sample $\tilde{x}$ is given by

$$\tilde{y} = \mathrm{sgn}\big(w^* \cdot \phi(\tilde{x}) + b^*\big) = \mathrm{sgn}\Big(\sum_{i=1}^{l} \alpha_i^* y_i K(\tilde{x}, x_i) + b^*\Big), \qquad (3)$$

where

$$b^* = -\frac{1}{2}\sum_{i=1}^{l} \alpha_i^* y_i \big(K(x_i, x_j) + K(x_i, x_k)\big), \qquad (4)$$

with $j \in \{i \mid \alpha_i^* \in (0, 1/l),\ y_i = 1\}$ and $k \in \{i \mid \alpha_i^* \in (0, 1/l),\ y_i = -1\}$.
The support vector machine maximizes the classification margin between the two classes while minimizing the number of misclassified samples; these two conflicting objectives are balanced by the parameter $\nu$. $\nu$ is an upper bound on the fraction of margin-error samples among all samples and a lower bound on the fraction of support vectors among all samples.
1. The rough classification margin
Rough set theory describes uncertain objects by upper and lower approximations. For a set $X$ in a universe $U$, let $R$ be an equivalence relation on $U$ and let $U/R$ be the set of equivalence classes of $R$ on $U$. When $X$ can be expressed as a union of $R$-equivalence classes, $X$ is $R$-definable; otherwise $X$ is $R$-undefinable, i.e., $X$ is an $R$-rough set. A rough set can be described by its upper and lower approximations:

the $R$-upper approximation of the rough set $X$: $\overline{R}X = \cup\{Y \in U/R \mid Y \cap X \ne \emptyset\}$

the $R$-lower approximation of the rough set $X$: $\underline{R}X = \cup\{Y \in U/R \mid Y \subseteq X\}$

the $R$-boundary of the rough set $X$: $\overline{R}X - \underline{R}X$
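To make the two approximations concrete, a small sketch on hypothetical data (the function names and the example universe are ours, not part of the patent):

    def lower_approximation(classes, X):
        # R-lower approximation: union of the equivalence classes Y with Y a subset of X
        out = set()
        for Y in classes:
            if Y <= X:
                out |= Y
        return out

    def upper_approximation(classes, X):
        # R-upper approximation: union of the equivalence classes Y that intersect X
        out = set()
        for Y in classes:
            if Y & X:
                out |= Y
        return out

    # Hypothetical universe U = {1, ..., 5} partitioned by the equivalence relation R:
    classes = [{1, 2}, {3, 4}, {5}]
    X = {1, 2, 3}
    lower = lower_approximation(classes, X)   # {1, 2}; X is not a union of classes, so X is R-rough
    upper = upper_approximation(classes, X)   # {1, 2, 3, 4}
    boundary = upper - lower                  # {3, 4}: the R-boundary of X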
According to rough set theory, the present invention defines a rough classification margin, represented by an upper rough margin and a lower rough margin whose widths are determined by the parameters $\rho_u$, $\rho_l$ and $w$: the width of the upper rough margin is $2\rho_u/\|w\|$ and the width of the lower rough margin is $2\rho_l/\|w\|$ ($\rho_u > \rho_l$). In the search for the optimal hyperplane, sample points lying inside the lower rough margin, as well as misclassified training points, are regarded as outliers and are assigned a larger risk value; sample points lying outside the rough margin can be classified completely correctly; and sample points lying in the rough boundary (the region that belongs to the upper margin but not to the lower margin) may be either misclassified points or correctly classified points, and are assigned a smaller risk value. Analogously to the traditional support vector machine, the rough support vector machine seeks the optimal separating hyperplane between the two classes in the high-dimensional feature space, the one that maximizes the rough classification margin between the two classes. In this way, more sample-point information is taken into account in determining the optimal hyperplane, rather than only a few support vectors.
2. Training stage of the rough support vector machine
The primal problem of the rough support vector machine is defined as follows:
$$\min_{w, b, \xi, \xi', \rho_l, \rho_u}\ \frac{1}{2}\|w\|^2 - \nu\rho_l - \nu\rho_u + \frac{1}{l}\sum_{i=1}^{l}\xi_i + \frac{\delta}{l}\sum_{i=1}^{l}\xi_i'$$

$$\text{subject to}\quad y_i(w \cdot \phi(x_i) + b) \ge \rho_u - \xi_i - \xi_i',$$

$$0 \le \xi_i \le \rho_u - \rho_l,\quad \xi_i' \ge 0,\quad \rho_l \ge 0,\quad \rho_u \ge 0, \qquad (5)$$

where $\delta > 1$.
To solve this optimization problem, construct the Lagrangian function:
$$L_p = \frac{1}{2}\|w\|^2 - \nu\rho_l - \nu\rho_u + \frac{1}{l}\sum_{i=1}^{l}\xi_i + \frac{\delta}{l}\sum_{i=1}^{l}\xi_i' - \sum_{i=1}^{l}\alpha_i\big[y_i(w \cdot \phi(x_i) + b) - \rho_u + \xi_i + \xi_i'\big] - \sum_{i=1}^{l}\beta_i\xi_i - \sum_{i=1}^{l}\lambda_i(\rho_u - \rho_l - \xi_i) - \sum_{i=1}^{l}\eta_i\xi_i' - \mu_1\rho_l - \mu_2\rho_u \qquad (6)$$
where $\alpha_i \ge 0$, $\beta_i \ge 0$, $\lambda_i \ge 0$, $\eta_i \ge 0$, $\mu_1 \ge 0$, $\mu_2 \ge 0$ are Lagrange multipliers. According to the KKT conditions, the optimal parameters satisfy:

$$\frac{\partial L_p}{\partial w} = w - \sum_{i=1}^{l}\alpha_i y_i \phi(x_i) = 0,$$

$$\frac{\partial L_p}{\partial b} = \sum_{i=1}^{l}\alpha_i y_i = 0,$$

$$\frac{\partial L_p}{\partial \xi_i} = \frac{1}{l} - \alpha_i - \beta_i + \lambda_i = 0,$$

$$\frac{\partial L_p}{\partial \xi_i'} = \frac{\delta}{l} - \alpha_i - \eta_i = 0,$$

$$\frac{\partial L_p}{\partial \rho_l} = -\nu + \sum_{i=1}^{l}\lambda_i - \mu_1 = 0,$$

$$\frac{\partial L_p}{\partial \rho_u} = -\nu + \sum_{i=1}^{l}\alpha_i - \sum_{i=1}^{l}\lambda_i - \mu_2 = 0,$$

$$\alpha_i\big[y_i(w \cdot \phi(x_i) + b) - \rho_u + \xi_i + \xi_i'\big] = 0,$$

$$\beta_i\xi_i = 0,\quad \lambda_i(\rho_u - \rho_l - \xi_i) = 0,\quad \eta_i\xi_i' = 0,$$

$$\mu_1\rho_l = 0,\quad \mu_2\rho_u = 0. \qquad (7)$$
Substituting the above expressions into formula (6), the dual problem (the counterpart of dual (2)) can be written as
$$\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_i\alpha_j y_i y_j K(x_i, x_j)$$

$$\text{subject to}\quad \sum_{i=1}^{l}\alpha_i y_i = 0,\quad 0 \le \alpha_i \le \frac{\delta}{l},\quad \sum_{i=1}^{l}\alpha_i \ge 2\nu. \qquad (8)$$
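The patent does not prescribe a solver for dual (8). As one minimal sketch under that caveat, (8) can be posed as a standard quadratic program, here with the cvxopt package; the function name, the parameter defaults, and the tiny ridge added to the diagonal for numerical stability are our assumptions:

    import numpy as np
    from cvxopt import matrix, solvers

    def train_rough_svm(K, y, v=0.1, delta=2.0):
        # Dual (8): min 0.5 a'Qa  s.t.  sum(y_i a_i) = 0, 0 <= a_i <= delta/l, sum(a_i) >= 2v
        l = len(y)
        Q = (y[:, None] * y[None, :]) * K            # Q_ij = y_i y_j K(x_i, x_j)
        P = matrix(Q + 1e-8 * np.eye(l))             # tiny ridge for numerical stability (assumption)
        q = matrix(np.zeros(l))
        # Inequality constraints G a <= h:  -a <= 0,  a <= delta/l,  -sum(a) <= -2v
        G = matrix(np.vstack([-np.eye(l), np.eye(l), -np.ones((1, l))]))
        h = matrix(np.hstack([np.zeros(l), np.full(l, delta / l), [-2.0 * v]]))
        # Equality constraint A a = b:  sum(y_i a_i) = 0
        A = matrix(y.astype(float)[None, :])
        b = matrix([0.0])
        solvers.options['show_progress'] = False
        sol = solvers.qp(P, q, G, h, A, b)
        return np.array(sol['x']).ravel()            # (alpha_1*, ..., alpha_l*)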
Once the optimal solution $(\alpha_1^*, \dots, \alpha_l^*)^T$ of the dual problem is obtained, the position of each training point relative to the rough classification margin is determined by the corresponding value of $\alpha_i^*$ (a sketch for reading these cases off a numerical solution follows the list). A sample point with:

1. $\alpha_i^* = 0$ lies outside the rough classification margin and satisfies $y_i(w \cdot \phi(x_i) + b) > \rho_u$;

2. $0 < \alpha_i^* < \frac{1}{l}$ lies on the boundary of the upper rough margin and satisfies $y_i(w \cdot \phi(x_i) + b) = \rho_u$;

3. $\alpha_i^* = \frac{1}{l}$ lies in the rough boundary and satisfies $y_i(w \cdot \phi(x_i) + b) = \rho_u - \xi_i$, with $\xi_i > 0$;

4. $\frac{1}{l} < \alpha_i^* < \frac{\delta}{l}$ lies on the boundary of the lower rough margin and satisfies $y_i(w \cdot \phi(x_i) + b) = \rho_l$;

5. $\alpha_i^* = \frac{\delta}{l}$ lies inside the lower rough margin and is called a margin-error point; it satisfies $y_i(w \cdot \phi(x_i) + b) = \rho_l - \xi_i'$, with $\xi_i' > 0$.
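A sketch for reading the five cases off a numerical solution (the comparison tolerance tol is our assumption; a QP solver reaches the bounds only approximately):

    def categorize(alpha, l, delta, tol=1e-6):
        # Map each alpha_i* to one of the five positions listed above
        labels = []
        for a in alpha:
            if a < tol:
                labels.append('1: outside the rough margin')
            elif a < 1.0 / l - tol:
                labels.append('2: on the upper-margin boundary')
            elif a <= 1.0 / l + tol:
                labels.append('3: in the rough boundary')
            elif a < delta / l - tol:
                labels.append('4: on the lower-margin boundary')
            else:
                labels.append('5: inside the lower margin (margin error)')
        return labels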
From $\sum_{i=1}^{l}\alpha_i = 2\nu$ (when $\rho_l > 0$) it follows that the number $m$ of margin-error points (i.e., the number of support vectors inside the lower rough margin) satisfies $m < \frac{2\nu l}{\delta}$; that is, $\frac{2\nu}{\delta}$ is an upper bound on the fraction of samples lying inside the lower rough margin. When $\delta = 1$, $m < 2\nu l$ bounds the number of points satisfying $y_i(w \cdot \phi(x_i) + b) < \rho_u$, i.e., $2\nu$ is an upper bound on the fraction of samples inside the upper margin. The parameters $\nu$ and $\delta$ therefore jointly control the numbers of samples in the upper and lower rough margins and the width of the rough boundary. Typically $\nu$ is chosen in the range 0-1 and $\delta$ in the range 2-10. The influence of the rough classification margin on the resulting optimal hyperplane is shown in Fig. 2: Fig. 2(a) and (b) show the hyperplane obtained by the traditional support vector machine ($\nu = 0.1$ in (a), $\nu = 0.5$ in (b)); the outliers clearly perturb the hyperplane, especially when $\nu$ is small (Fig. 2(a)). When the rough support vector machine is used (Fig. 2(c), (d)), the influence of the outliers is weakened.
3. Classification and recognition stage

The class of an unknown sample $\tilde{x}$ is again computed by formula (3). In this case, $b^*$ in formula (3) is

$$b^* = -\frac{1}{2}\sum_{i=1}^{l}\alpha_i^* y_i\big(K(x_i, x_j) + K(x_i, x_k)\big), \qquad (9)$$

where $j \in \{i \mid \alpha_i^* \in (0, \frac{1}{l}),\ y_i = 1\}$, $k \in \{i \mid \alpha_i^* \in (0, \frac{1}{l}),\ y_i = -1\}$, or alternatively $j \in \{i \mid \alpha_i^* \in (\frac{1}{l}, \frac{\delta}{l}),\ y_i = 1\}$, $k \in \{i \mid \alpha_i^* \in (\frac{1}{l}, \frac{\delta}{l}),\ y_i = -1\}$.
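A sketch of this stage under the same assumptions as the earlier code (gaussian_kernel is reused from the sketch above; picking one representative pair (j, k) from the stated index sets is our reading, as the patent leaves the choice within those sets open):

    import numpy as np

    def offset_bstar(alpha, y, K, l, delta, tol=1e-6):
        # b* from formula (9); j, k are chosen with alpha* strictly inside (0, 1/l),
        # or inside (1/l, delta/l) if the first interval holds no point of that class
        # (for brevity, assumes such indices exist)
        def pick(lo, hi, label):
            idx = [i for i in range(len(y))
                   if lo + tol < alpha[i] < hi - tol and y[i] == label]
            return idx[0] if idx else None
        j = pick(0.0, 1.0 / l, +1)
        if j is None:
            j = pick(1.0 / l, delta / l, +1)
        k = pick(0.0, 1.0 / l, -1)
        if k is None:
            k = pick(1.0 / l, delta / l, -1)
        return -0.5 * np.sum(alpha * y * (K[:, j] + K[:, k]))

    def predict(alpha, y, b_star, X_train, x_new, sigma=1.0):
        # Formula (3): sign of the kernel expansion plus b*
        k_vec = gaussian_kernel(x_new[None, :], X_train, sigma).ravel()
        return int(np.sign(np.sum(alpha * y * k_vec) + b_star))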
The concrete steps of the method of the invention are summarized as follows (an end-to-end sketch follows this list):

(1) Define the rough classification margin, consisting of an upper rough margin and a lower rough margin, where the width of the upper rough margin is $2\rho_u/\|w\|$ and the width of the lower rough margin is $2\rho_l/\|w\|$, with $\rho_u > \rho_l$;

(2) Maximize the rough margin with the rough support vector machine to determine the optimal classification surface; this optimization problem is expressed as formula (5);

(3) To solve the optimization problem defined in step (2), convert it into the dual problem expressed by formula (8) and solve that instead;

(4) Solve the dual problem (8) to obtain its optimal solution $(\alpha_1^*, \dots, \alpha_l^*)^T$; the remaining unknown of the original optimization problem (formula (5)), namely $b^*$, is then obtained from formula (9). This completes the training of the rough-margin support vector machine;

(5) Classification and recognition: the class of an unknown sample is computed by formula (3), with $b^*$ in formula (3) obtained in step (4).
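Chaining the earlier sketches, a hypothetical run of steps (1)-(5) on toy data might look as follows (the data, random seed, and parameter values are illustrative only):

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-1.0, 0.5, (40, 2)),      # 40 toy samples of class +1
                   rng.normal(+1.0, 0.5, (40, 2))])     # 40 toy samples of class -1
    y = np.hstack([np.ones(40), -np.ones(40)])

    K = gaussian_kernel(X, X, sigma=1.0)                # kernel matrix for the training set
    alpha = train_rough_svm(K, y, v=0.1, delta=2.0)     # steps (2)-(4): solve dual (8)
    b = offset_bstar(alpha, y, K, l=len(y), delta=2.0)  # step (4): offset from formula (9)
    label = predict(alpha, y, b, X, np.array([-0.8, -1.1]))  # step (5): formula (3)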
Description of drawings
Fig. 1: schematic diagram of the optimal separating hyperplane.
Fig. 2: effect of the rough classification margin on the trained hyperplane, where (a) and (b) show the hyperplane obtained by the traditional support vector machine ($\nu = 0.1$ in (a), $\nu = 0.5$ in (b)), and (c) and (d) show the hyperplane obtained by the rough support vector machine ($\nu = 0.1$ in (c), $\nu = 0.5$ in (d)).
Embodiment
The classification and recognition process is illustrated below on three benchmark medical databases: a liver-disease database, a heart-disease database, and a breast-cancer database. These three databases are available from [9].
The liver-disease database contains 345 samples (200 negative, 145 positive), each described by 6 features. The heart-disease database contains 270 samples (150 negative, 120 positive), each described by 13 features. The breast-cancer database contains 683 samples (444 benign, 239 malignant), each described by 10 features. All samples are normalized to [-1, 1].
For each experimental database, the method is evaluated by 5-fold cross-validation: the data set is split into 5 equal parts, keeping the ratio of the two classes consistent in every part; each time, 4 parts are used as the training set and the remaining part as the test set; each of the 5 parts serves as the test set in turn, and the mean of the 5 results is taken as the final experimental result. Because the final classification result of the support vector machine depends on the settings of the parameters $\nu$ and $\delta$, suitable values are selected by 3-fold cross-validation on the training samples (2/3 of the training data are used for training and the remaining 1/3 for validation). The search range of $\nu$ is 0.05 to 1.0 in steps of 0.05, and that of $\delta$ is 2.0 to 15.0 in steps of 1.0. After the optimal values of $\nu$ and $\delta$ are obtained by 3-fold cross-validation, the rough support vector machine is trained with these values to obtain the optimal hyperplane, which is finally used to predict the classes of the unknown data. A Gaussian kernel with parameter $\sigma = 1.0$ is used in the experiments. A sketch of the parameter-selection loop follows.
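A minimal sketch of that selection loop, reusing the routines sketched earlier (gaussian_kernel, train_rough_svm, offset_bstar, predict); stratified splitting via scikit-learn's StratifiedKFold and the helper names are our assumptions:

    import numpy as np
    from sklearn.model_selection import StratifiedKFold

    def test_accuracy(alpha, y_tr, b, X_tr, X_te, y_te, sigma=1.0):
        # Fraction of held-out samples whose predicted class matches the true class
        preds = np.array([predict(alpha, y_tr, b, X_tr, x, sigma) for x in X_te])
        return float(np.mean(preds == y_te))

    def select_params(X, y, v_grid, d_grid, sigma=1.0):
        # 3-fold cross-validation on the training set: pick the (v, delta) pair
        # with the highest mean validation accuracy, as described above
        inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
        best, best_acc = None, -1.0
        for v in v_grid:
            for d in d_grid:
                accs = []
                for tr, va in inner.split(X, y):
                    K = gaussian_kernel(X[tr], X[tr], sigma)
                    alpha = train_rough_svm(K, y[tr], v, d)
                    b = offset_bstar(alpha, y[tr], K, len(tr), d)
                    accs.append(test_accuracy(alpha, y[tr], b, X[tr], X[va], y[va], sigma))
                if np.mean(accs) > best_acc:
                    best, best_acc = (v, d), float(np.mean(accs))
        return best

    v_grid = np.arange(0.05, 1.0 + 1e-9, 0.05)   # nu: 0.05 to 1.0, step 0.05
    d_grid = np.arange(2.0, 15.0 + 1e-9, 1.0)    # delta: 2.0 to 15.0, step 1.0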
As an example, the classification and recognition process on the liver-disease database is as follows. The database contains 345 samples, denoted $\{(x_i, y_i)\}$, of which the 200 negative samples have $y_i = 1$ and the 145 positive samples have $y_i = -1$; each sample is described by 6 features: $x_i = [f_{i1}, f_{i2}, \dots, f_{i6}]$. In the training stage, the dual problem is constructed from the training samples according to formula (8) and solved (because 5-fold cross-validation is used, each training set contains 276 samples, i.e., $l = 276$ in formula (8)), yielding the optimal solution $(\alpha_1^*, \dots, \alpha_l^*)^T$; the offset $b^*$ is then obtained from formula (9). This completes the training of the rough-margin support vector machine. Then, for each sample taken from the test set, its class is determined according to formula (3).
Interpretation of results:
For the liver-disease database, the correct recognition rate of the traditional support vector machine is 66.96%, while that of the rough support vector machine is 68.41%. For the heart-disease database, the corresponding rates are 83.70% and 84.81%. For the breast-cancer database, they are 96.74% and 96.88% (see the first column of the experimental results in Tables 1-3). Because these three databases contain few outliers, the recognition rates of the rough support vector machine and the traditional support vector machine are close.
To study the rough support vector machine's ability to suppress interference from outliers, this implementation adds outliers to the three benchmark databases artificially: a given proportion of the -1-class samples is randomly relabeled as +1-class samples. A sketch of this corruption step follows.
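A minimal sketch of that corruption step (the function name and the choice to measure the fraction against the -1 class are our assumptions):

    import numpy as np

    def add_outliers(y, fraction, seed=0):
        # Randomly relabel a fraction of the -1-class samples as +1 (artificial outliers)
        rng = np.random.default_rng(seed)
        y = y.copy()
        neg = np.flatnonzero(y == -1)
        flip = rng.choice(neg, size=int(round(fraction * len(neg))), replace=False)
        y[flip] = +1
        return y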
The experimental results obtained after adding different proportions of outliers to the three databases are listed in Tables 1-3. As the results show, when the proportion of outliers rises from 10% to 30%, the correct recognition rate of the traditional support vector machine drops markedly, especially for the liver-disease database, whereas the performance of the rough support vector machine is more stable. The results on all three databases show that, when the samples contain outliers, the correct recognition rate of the rough support vector machine is clearly higher than that of the traditional support vector machine. This demonstrates that when the training samples contain outliers or noise, the anti-interference capability of the rough support vector machine is better than that of the traditional support vector machine, i.e., its generalization ability is superior.
Table 1. Liver-disease database results (F denotes the proportion of added outliers)

                                                  F=0%      F=10%     F=20%     F=30%
Traditional SVM   training-set recognition rate   70.92%    70.26%    63.81%    55.97%
                  test-set recognition rate       66.96%    66.09%    54.78%    51.88%
Rough SVM         training-set recognition rate   71.06%    71.58%    68.06%    64.98%
                  test-set recognition rate       68.41%    67.83%    64.06%    59.71%

Table 2. Heart-disease database results

                                                  F=0%      F=10%     F=20%     F=30%
Traditional SVM   training-set recognition rate   84.81%    83.61%    82.78%    79.17%
                  test-set recognition rate       83.70%    81.11%    78.15%    73.70%
Rough SVM         training-set recognition rate   84.72%    84.17%    83.24%    80.09%
                  test-set recognition rate       84.81%    83.33%    80.00%    77.78%

Table 3. Breast-cancer database results

                                                  F=0%      F=10%     F=20%     F=30%
Traditional SVM   training-set recognition rate   97.18%    96.56%    95.75%    94.73%
                  test-set recognition rate       96.74%    95.56%    94.67%    92.15%
Rough SVM         training-set recognition rate   97.22%    96.85%    96.01%    95.02%
                  test-set recognition rate       96.88%    96.00%    95.56%    94.07%
In summary, by introducing rough set theory into the support vector machine classifier, more sample-point information is adaptively taken into account when seeking the optimal separating hyperplane, rather than only a few support vectors. The user-defined parameters $\nu$ and $\delta$ jointly control the width of the rough boundary region in the high-dimensional feature space and the numbers of support vectors in the upper and lower rough margins. Moreover, the computation required by the rough support vector machine is essentially the same as that of the traditional support vector machine.
List of references
1. C. Cortes and V. Vapnik, "Support-vector networks," Mach. Learn. 20 (1995) 273-297.
2. B. Schölkopf, A. J. Smola, R. C. Williamson and P. L. Bartlett, "New support vector algorithms," Neural Computation 12 (2000) 1207-1245.
3. X. G. Zhang, "Using class-center vectors to build support vector machines," Proc. IEEE NNSP IX, Madison, WI, USA, Aug. 1999, pp. 3-11.
4. C. F. Lin and S. D. Wang, "Fuzzy support vector machines," IEEE Trans. Neural Netw. 13 (2002) 464-471.
5. M. Yoon, Y. Yun and H. Nakayama, "A role of total margin in support vector machines," Proc. Int. Joint Conf. Neural Networks, Piscataway, NJ, USA, July 2003, pp. 2049-2053.
6. J. Feng and P. Williams, "The generalization error of the symmetric and scaled support vector machines," IEEE Trans. Neural Netw. 12 (2001) 1255-1260.
7. Z. Pawlak, "Rough sets," Int. J. Comput. Inform. Sci. 11 (1982) 341-356.
8. W. Karush, "Minima of functions of several variables with inequalities as side constraints," Master's thesis, Department of Mathematics, University of Chicago, 1939.
9. D. J. Newman, S. Hettich, C. L. Blake and C. J. Merz, UCI Repository of Machine Learning Databases, Irvine, CA: University of California, Department of Information and Computer Science (1998). [http://www.ics.uci.edu/~mlearn/MLRepository.html]

Claims (1)

1. A pattern classification and recognition method based on a rough support vector machine. Let $\{(x_i, y_i),\ i = 1, 2, \dots, l\}$ be a training set of $l$ samples, where the $i$-th sample $x_i \in R^d$ is a $d$-dimensional feature vector and $y_i \in \{+1, -1\}$ is the class of $x_i$; the support vector machine seeks the optimal separating hyperplane between the two classes, the one that maximizes the classification margin; when the training samples are not linearly separable, the support vector machine maps the input feature space by a nonlinear map $\phi$ into a higher-dimensional feature space $Z$ in which the two classes become separable, and seeks the optimal hyperplane that separates the two classes linearly in that space; in the high-dimensional feature space, the sample points $\phi(x)$ lying on the hyperplane satisfy $w \cdot \phi(x) + b = 0$, where $w \in Z$ and $b \in R$, $w$ and $b$ being the weight vector and the offset respectively, which together define the hyperplane; a sample $x_i$ is assigned to one of the two classes by the decision function $\mathrm{sgn}(w \cdot \phi(x_i) + b)$; the method being characterized by the following concrete steps:

(1) define the rough classification margin, consisting of an upper rough margin and a lower rough margin, where the width of the upper rough margin is $2\rho_u/\|w\|$ and the width of the lower rough margin is $2\rho_l/\|w\|$, with $\rho_u > \rho_l$;

(2) maximize the rough margin with the rough support vector machine to determine the optimal classification surface, this optimization problem being expressed as formula (5):

$$\min_{w, b, \xi, \xi', \rho_l, \rho_u}\ \frac{1}{2}\|w\|^2 - \nu\rho_l - \nu\rho_u + \frac{1}{l}\sum_{i=1}^{l}\xi_i + \frac{\delta}{l}\sum_{i=1}^{l}\xi_i'$$

$$\text{subject to}\quad y_i(w \cdot \phi(x_i) + b) \ge \rho_u - \xi_i - \xi_i',$$

$$0 \le \xi_i \le \rho_u - \rho_l,\quad \xi_i' \ge 0,\quad \rho_l \ge 0,\quad \rho_u \ge 0, \qquad (5)$$

where $\delta > 1$;

(3) to solve the optimization problem defined in step (2), convert it into the dual problem expressed by formula (8) and solve that instead:

$$\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_i\alpha_j y_i y_j K(x_i, x_j)$$

$$\text{subject to}\quad \sum_{i=1}^{l}\alpha_i y_i = 0,\quad 0 \le \alpha_i \le \frac{\delta}{l},\quad \sum_{i=1}^{l}\alpha_i \ge 2\nu, \qquad (8)$$

where $\nu$ is chosen in the range 0-1 and $\delta$ in the range 2-10;

(4) solve the dual problem (8) to obtain its optimal solution $(\alpha_1^*, \dots, \alpha_l^*)^T$, and compute $b^*$ by formula (9):

$$b^* = -\frac{1}{2}\sum_{i=1}^{l}\alpha_i^* y_i\big(K(x_i, x_j) + K(x_i, x_k)\big), \qquad (9)$$

where $j \in \{i \mid \alpha_i^* \in (0, \frac{1}{l}),\ y_i = 1\}$, $k \in \{i \mid \alpha_i^* \in (0, \frac{1}{l}),\ y_i = -1\}$, or $j \in \{i \mid \alpha_i^* \in (\frac{1}{l}, \frac{\delta}{l}),\ y_i = 1\}$, $k \in \{i \mid \alpha_i^* \in (\frac{1}{l}, \frac{\delta}{l}),\ y_i = -1\}$; this completes the training of the rough-margin support vector machine;

(5) classification and recognition: the class of an unknown sample $\tilde{x}$ is computed by formula (3):

$$\tilde{y} = \mathrm{sgn}\big(w^* \cdot \phi(\tilde{x}) + b^*\big) = \mathrm{sgn}\Big(\sum_{i=1}^{l}\alpha_i^* y_i K(\tilde{x}, x_i) + b^*\Big), \qquad (3)$$

where $(\alpha_1^*, \dots, \alpha_l^*)^T$ and $b^*$ are obtained in step (4), and $K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$.
CNA2007100386365A 2007-03-29 2007-03-29 Pattern classification recognition method based on rough support vector machine Pending CN101025729A (en)

Priority Applications (1)

Application Number   Priority Date  Filing Date  Title
CNA2007100386365A    2007-03-29     2007-03-29   Pattern classification recognition method based on rough support vector machine

Publications (1)

Publication Number  Publication Date
CN101025729A        2007-08-29

Family

ID=38744040

Country Status (1)

Country  Link
CN (1)   CN101025729A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102262682A (en) * 2011-08-19 2011-11-30 上海应用技术学院 Rapid attribute reduction method based on rough classification knowledge discovery
CN102262682B (en) * 2011-08-19 2016-01-20 上海应用技术学院 Based on the rapid attribute reduction of rough classification knowledge discovery
CN102799902A (en) * 2012-08-13 2012-11-28 南京师范大学 Enhanced relationship classifier based on representative samples
CN103577690A (en) * 2013-10-29 2014-02-12 西安电子科技大学 Sparse nonparametric body area channel probability representation method
CN107786514A (en) * 2016-08-29 2018-03-09 中国电信股份有限公司 Network attack method for early warning and device
CN107786514B (en) * 2016-08-29 2020-04-28 中国电信股份有限公司 Network attack early warning method and device
CN106874900A (en) * 2017-04-26 2017-06-20 桂林电子科技大学 A kind of tired driver detection method and detection means based on steering wheel image
CN108414228A (en) * 2018-03-20 2018-08-17 哈尔滨理工大学 Based on averagely more granularity decision rough sets and NNBC Method for Bearing Fault Diagnosis

Similar Documents

Publication Publication Date Title
CN105446484B (en) A kind of electromyography signal gesture identification method based on Hidden Markov Model
Bai et al. Integrating Fuzzy C-Means and TOPSIS for performance evaluation: An application and comparative analysis
Harahap et al. Implementation of Naïve Bayes classification method for predicting purchase
Wu et al. A patent quality analysis and classification system using self-organizing maps with support vector machine
Cadenas et al. Feature subset selection filter–wrapper based on low quality data
CN101025729A (en) Pattern classification recognition method based on rough support vector machine
Boulesteix et al. Optimal classifier selection and negative bias in error rate estimation: an empirical study on high-dimensional prediction
González et al. Monotonic random forest with an ensemble pruning mechanism based on the degree of monotonicity
CN106295694A (en) A kind of face identification method of iteration weight set of constraints rarefaction representation classification
Jumutc et al. Ranking-based kernels in applied biomedical diagnostics using a support vector machine
KR20090060359A (en) Two-class classifying/predicting model making method, classifying/predicting model making program, and two-class classifying/predicting model making device
CN106529576A (en) Piano score difficulty recognition algorithm based on improved measure learning support vector machine
CN101681448A (en) Information processing device, information processing method, and program
Steinley et al. K-means clustering and mixture model clustering: Reply to McLachlan (2011) and Vermunt (2011).
Li et al. Feature selection via least squares support feature machine
CN102945238A (en) Fuzzy ISODATA (interactive self-organizing data) based feature selection method
Pai et al. Analyzing foreign exchange rates by rough set theory and directed acyclic graph support vector machines
CN102609733B (en) Fast face recognition method in application environment of massive face database
Songsiri et al. Universum selection for boosting the performance of multiclass support vector machines based on one-versus-one strategy
Razavi Hajiagha et al. Fuzzy C-means based data envelopment analysis for mitigating the impact of units’ heterogeneity
Wagner Latent representations of transaction network graphs in continuous vector spaces as features for money laundering detection
Liu A framework of data mining application process for credit scoring
CN106355198A (en) Method for acquiring fuzzy support vector machine membership function
Lu et al. Cancer classification through filtering progressive transductive support vector machine based on gene expression data
Danganan et al. eHMCOKE: an enhanced overlapping clustering algorithm for data analysis

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Open date: 20070829