CN108846437A - Method for improving the robustness of the TWSVM algorithm based on the capped-l1 norm - Google Patents

Method for improving the robustness of the TWSVM algorithm based on the capped-l1 norm

Info

Publication number
CN108846437A
Authority
CN
China
Prior art keywords
matrix
data
twsvm
capped
norm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201810622213.6A
Other languages
Chinese (zh)
Inventor
业巧林 (Ye Qiaolin)
王春燕 (Wang Chunyan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Forestry University
Original Assignee
Nanjing Forestry University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Forestry University
Priority to CN201810622213.6A
Publication of CN108846437A
Legal status: Withdrawn (current)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 - Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 - Complex mathematical operations
    • G06F17/16 - Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to the field of data processing and discloses a method for improving the robustness of the TWSVM algorithm based on the capped-l1 norm, comprising: inputting a raw data matrix M and parameters C1, C2, ε1 and ε2; splitting M into a positive data matrix H and a negative data matrix G; initializing two diagonal matrices F and D as identity matrices; computing the classification-plane parameters w and b from H, G, C1, C2, ε1, ε2, F and D; computing, from w and b, the distances of all data points in H and G to the corresponding classification planes; if a data point in H or G lies farther from its classification plane than ε1 or ε2 respectively, the point is judged to be an outlier and the corresponding diagonal entry of F or D is set to smallval, a value close to zero; the diagonal matrices F and D are updated in this way and the objective value is computed. The present invention greatly improves the robustness of the TWSVM algorithm while retaining good accuracy on the original data set.

Description

Method for improving the robustness of the TWSVM algorithm based on the capped-l1 norm
Technical field
The present invention relates to the fields of algorithm improvement and data processing, and in particular to a method for improving the robustness of the TWSVM algorithm based on the capped-l1 norm.
Background art
As an effective tool, the support vector machine (SVM) has been widely applied to data classification and regression problems in fields such as bioinformatics, text classification and image processing. In recent years, Mangasarian and Wild proposed the multi-plane proximal support vector machine based on generalized eigenvalues (Proximal SVM based on Generalized Eigenvalues, GEPSVM), which obtains two non-parallel hyperplanes by solving two generalized eigenvalue problems. Compared with SVM, GEPSVM guarantees better computational efficiency while also achieving good classification performance. Inspired by GEPSVM, Jayadeva et al. proposed the twin support vector machine (Twin SVM, TSVM) in 2007. TSVM seeks two non-parallel optimal classification planes such that each plane is close to the samples of one class and far from the samples of the other class. TSVM is well suited to the classification of cross-plane data sets and solves two relatively small quadratic programming problems (Quadratic Programming Problem, QPP), which makes TSVM significantly faster than the standard SVM. Since then, many researchers have proposed algorithms built on TWSVM.
However, many existing TWSVM models do not take noisy data into account when training the classification planes. In practice, many data sets contain noise; if these noisy points are ignored, the trained classification planes are easily biased, the classification accuracy drops, and the robustness of the algorithm suffers. The reason noisy data affects the decision of the classification plane is that many TWSVM-based models use the L2 norm, whose squaring operation amplifies the influence of noisy points. If the data contain no noise, these algorithms perform very well; in real life, however, noise-free data are practically non-existent, so the noise problem must be considered when designing an algorithm. Clearly, when the data contain many noisy outliers, the L2 norm is ill-suited, and so are the TWSVM models based on it. To improve the robustness of the algorithm against outliers, we propose a capped-l1-norm TWSVM, which abandons the drawback of the L2 norm and greatly improves the robustness of the algorithm.
Summary of the invention
Object of the invention: Aiming at the problems existing in the prior art, the present invention provides a method for improving the robustness of the TWSVM algorithm based on the capped-l1 norm, which can greatly improve the robustness of the TWSVM algorithm while retaining good accuracy on the original data set.
Technical solution: The present invention provides a method for improving the robustness of the TWSVM algorithm based on the capped-l1 norm, comprising the following steps. Step 1: input the raw data matrix M and the parameters C1, C2, ε1 and ε2; split the data matrix M into a positive data matrix H and a negative data matrix G, where H ∈ R^(m1×(n+1)) and G ∈ R^(m2×(n+1)), m1 and m2 are the numbers of data points of the two classes in the data matrix M, and n is the data dimension of M. Step 2: initialize two diagonal matrices F and D as identity matrices. Step 3: from the positive data matrix H, the negative data matrix G, the parameters C1, C2, ε1, ε2 and the diagonal matrices F and D, compute the classification-plane parameters w and b. Step 4: from w and b, compute the distance from every data point in the positive data matrix H to the classification plane; if a data point's distance is greater than ε1, the point is judged to be an outlier and the corresponding diagonal entry of F is set to smallval, a value close to zero. Likewise, from w and b, compute the distance from every data point in the negative data matrix G to the classification plane; if a data point's distance is greater than ε2, the point is judged to be an outlier and the corresponding diagonal entry of D is set to smallval. The diagonal matrices F and D are updated in this way. Step 5: compute the objective value obj.
Preferably, in step 4, the diagonal entry of F corresponding to an outlier is set to smallval as follows: and the diagonal entry of D corresponding to an outlier is set to smallval as follows:
Preferably, in step 5, the objective value obj is given by the objective function, in which z = (w, b)^T and e is an m1 × 1 column vector whose elements are all 1.
Further, the following steps are included after step 5. Step 6: iterate steps 3 to 5 until the objective value obj converges. Step 7: determine the parameters w and b of the optimal classification plane. This iterative updating allows the present invention to obtain the optimal classification-plane parameters w and b, further improving the robustness and accuracy of the TWSVM algorithm.
Preferably, smallval = 1e-5.
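For illustration, the following is a minimal sketch of the step-4 re-weighting described above, under the assumption that H is the bias-augmented positive data matrix (so that H z = Hw + eb with z = (w, b)^T) and that the point-to-plane distance is computed as |x·w + b| / ||w||; the function name update_weight_diag and this exact distance formula are illustrative assumptions, not the patent's own update formulas, which were published as images.

```python
import numpy as np

def update_weight_diag(H_aug, w, b, eps, smallval=1e-5):
    """Step-4 sketch: rows of H_aug (each row ends with a 1, so H_aug @ z = Hw + eb)
    whose distance to the classification plane exceeds eps are treated as outliers,
    and their diagonal weights are set to smallval (a value close to zero)."""
    z = np.append(w, b)                              # z = (w, b)^T
    dist = np.abs(H_aug @ z) / np.linalg.norm(w)     # point-to-plane distances
    diag = np.where(dist > eps, smallval, 1.0)       # suppress detected outliers
    return np.diag(diag)
```

The diagonal matrix D for the negative data matrix G would be updated in the same fashion, using the threshold ε2.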
Beneficial effect:
Unlike the traditional TWSVM algorithm, this algorithm applies the capped-l1 norm in the objective function, which greatly improves the robustness of the TWSVM algorithm. The principle is to remove the influence of noisy outliers on the decision of the classification plane and to make the decision again after these outliers have been removed. In addition, because the loss part of the objective function uses the capped-l1 norm, the loss contributed by a misclassified point does not grow much no matter how badly it is misclassified, which further improves the robustness of the algorithm. Finally, the present invention also provides a simple and effective iterative solution method that finds a local optimum.
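To make the capping effect concrete, the snippet below contrasts the squared (L2) loss with the generic capped-l1 loss min(|r|, ε) used in the capped-norm literature; it is only an illustration of the principle described above, not the patent's exact loss term, whose formula is published as an image.

```python
import numpy as np

def squared_loss(r):
    return r ** 2                          # unbounded: large residuals dominate

def capped_l1_loss(r, eps):
    return np.minimum(np.abs(r), eps)      # capped at eps: outliers cannot dominate

residuals = np.array([0.1, 0.5, 1.0, 10.0, 100.0])   # the last two act like outliers
print(squared_loss(residuals))             # 0.01, 0.25, 1.0, 100.0, 10000.0
print(capped_l1_loss(residuals, eps=1.0))  # 0.1, 0.5, 1.0, 1.0, 1.0
```

However badly a point is misclassified, its contribution to the capped loss never exceeds ε, which is exactly why outliers no longer dominate the decision of the classification plane.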
Compared with the traditional TWSVM algorithm, the TWSVM method of the present invention based on the capped-l1-norm distance metric has better robustness while retaining good accuracy on the original data set. The present invention selected 12 UCI data sets and compared the accuracy of this method with the five algorithms TWSVM, WLTSVM, L1-GEPSVM, L1-NPSVM and pTWSVM on the same data sets. The comparison shows that the method of the invention outperforms the other algorithms on 8 of the 12 data sets, and when the same noise is added it still outperforms the other algorithms on 8 data sets.
Description of the drawings
Fig. 1 is a schematic diagram of the iterative convergence of the objective value of the capped-l1-norm-based TWSVM algorithm on the Haberman data set;
Fig. 2 is a schematic diagram of the iterative convergence of the objective value of the capped-l1-norm-based TWSVM algorithm on the Sonar data set;
Fig. 3 is a comparison of the accuracy of the capped-l1-norm-based TWSVM algorithm of the present invention and five other algorithms when noise points are artificially added to the cross-plane data;
Fig. 4 is a comparison of the accuracy of the capped-l1-norm-based TWSVM algorithm of the present invention and five other algorithms under different noise factors.
Specific embodiment
The present invention is described in detail below with reference to the accompanying drawings.
This embodiment provides a method for improving the robustness of the TWSVM algorithm based on the capped-l1 norm, comprising the following steps:
Step 1: input the raw data matrix M and the parameters C1, C2, ε1 and ε2; split the data matrix M into a positive data matrix H and a negative data matrix G, where H ∈ R^(m1×(n+1)) and G ∈ R^(m2×(n+1)), m1 and m2 are the numbers of data points of the two classes in M, and n is the data dimension of M;
Step 2: initialize two diagonal matrices F and D as identity matrices;
Step 3: from the positive data matrix H, the negative data matrix G, the parameters C1, C2, ε1, ε2 and the diagonal matrices F and D, compute the classification-plane parameters w and b;
Step 4: from w and b, compute the distance from every data point in the positive data matrix H to the classification plane; if a data point's distance is greater than ε1, the point is judged to be an outlier and, through the update formula, the corresponding diagonal entry of F is set to 1e-5 (a code sketch of this procedure is given after step 7 below);
From w and b, compute the distance from every data point in the negative data matrix G to the classification plane; if a data point's distance is greater than ε2, the point is judged to be an outlier and, through the update formula, the corresponding diagonal entry of D is set to 1e-5;
The diagonal matrices F and D are updated in this way;
Step 5: compute the objective value obj, where z = (w, b)^T and e is an m1 × 1 column vector whose elements are all 1;
Step 6: iterate steps 3 to 5 until the objective value obj converges;
Step 7: determine the parameters w and b of the optimal classification plane.
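As referenced in step 4 above, here is a minimal end-to-end sketch of steps 1 to 7 for one of the two non-parallel planes. The solver solve_weighted_plane below is a weighted least-squares stand-in, not the patent's quadratic programming formulation; the objective evaluated is that of the stand-in rather than the patent's image-based formula; and all helper names are illustrative assumptions rather than part of the original disclosure.

```python
import numpy as np

def solve_weighted_plane(H, G, C1, F, D, reg=1e-8):
    """Stand-in for step 3 (NOT the patent's QPP): a weighted least-squares
    TWSVM-style plane that stays close to the F-weighted positive points while
    pushing the D-weighted negative points to the other side."""
    n1 = H.shape[1]
    A = H.T @ F @ H + C1 * (G.T @ D @ G) + reg * np.eye(n1)
    rhs = -C1 * (G.T @ D @ np.ones(G.shape[0]))
    z = np.linalg.solve(A, rhs)
    return z[:-1], z[-1]                     # w, b

def objective_value(H, G, z, C1, F, D):
    """Objective of the stand-in solver above (the patent defines its own obj)."""
    e = np.ones(G.shape[0])
    return 0.5 * (H @ z) @ F @ (H @ z) + 0.5 * C1 * (G @ z + e) @ D @ (G @ z + e)

def robust_twsvm_plane(M_pos, M_neg, C1, eps1, eps2,
                       smallval=1e-5, max_iter=50, tol=1e-6):
    """Sketch of steps 1-7: alternate between solving the plane (step 3) and
    re-weighting detected outliers via F and D (step 4) until obj converges."""
    # Step 1: augment each class matrix with a column of ones (H, G in R^{m x (n+1)}).
    H = np.hstack([M_pos, np.ones((M_pos.shape[0], 1))])
    G = np.hstack([M_neg, np.ones((M_neg.shape[0], 1))])
    # Step 2: initialize the diagonal weight matrices F and D as identity matrices.
    F, D = np.eye(H.shape[0]), np.eye(G.shape[0])
    prev_obj = np.inf
    for _ in range(max_iter):
        # Step 3: compute the classification-plane parameters w and b.
        w, b = solve_weighted_plane(H, G, C1, F, D)
        z = np.append(w, b)
        # Step 4: points farther from the plane than eps1 / eps2 are outliers.
        dist_H = np.abs(H @ z) / np.linalg.norm(w)
        dist_G = np.abs(G @ z) / np.linalg.norm(w)
        F = np.diag(np.where(dist_H > eps1, smallval, 1.0))
        D = np.diag(np.where(dist_G > eps2, smallval, 1.0))
        # Step 5: evaluate the objective for the current plane and weights.
        obj = objective_value(H, G, z, C1, F, D)
        # Step 6: stop once the objective value has converged.
        if abs(prev_obj - obj) < tol:
            break
        prev_obj = obj
    # Step 7: return the parameters of the (locally) optimal classification plane.
    return w, b
```

The second non-parallel plane would be obtained symmetrically by swapping the roles of H and G, F and D, C1 and C2, and ε1 and ε2.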
To show intuitively that the method converges quickly, this embodiment reports the convergence behaviour of the capped-l1-norm-based TWSVM algorithm on two UCI data sets. As can be seen from Figs. 1 and 2, the method converges within a few iterations to a stable value, which shows that the capped-l1-norm-based TWSVM algorithm is feasible in terms of both computation and time complexity.
In addition, in this embodiment obvious noise points were artificially added to the cross-plane data, as shown in Fig. 3.
On these data, the accuracies of TWSVM, WLTSVM, L1-GEPSVM, L1-NPSVM, pTWSVM and the capped-l1-norm-based TWSVM algorithm were compared and are 55.26%, 95.12%, 97.60%, 54.64%, 55.36% and 98.07%, respectively. Clearly, the accuracy of this method (the capped-l1-norm-based TWSVM algorithm) is higher than that of the other five algorithms, and its robustness is better.
This embodiment also compared, under different noise proportions, the accuracy of the capped-l1-norm-based TWSVM algorithm with that of the other five algorithms. As can be seen from Fig. 4, under different noise conditions the capped-l1-norm-based TWSVM algorithm is not only more accurate but also more stable than the other algorithms. Although at a noise factor of 0.25 the accuracy of TWSVM is comparable to that of the capped-l1-norm-based TWSVM algorithm, TWSVM behaves very unstably. Moreover, although WLTSVM, L1-GEPSVM, L1-NPSVM and pTWSVM remain stable under different noise levels, their mean accuracies are 78.90%, 83.66%, 80.23% and 77.67%, while that of the capped-l1-norm-based TWSVM algorithm is 86.00%. Overall, this method (the capped-l1-norm-based TWSVM algorithm) is more advantageous.
The above embodiment only illustrates the technical concept and features of the present invention; its purpose is to enable those skilled in the art to understand the content of the present invention and implement it accordingly, and it is not intended to limit the scope of protection of the present invention. Any equivalent transformation or modification made according to the spirit and essence of the present invention shall fall within the scope of protection of the present invention.

Claims (5)

1. A method for improving the robustness of the TWSVM algorithm based on the capped-l1 norm, characterized by comprising the following steps:
Step 1: input the raw data matrix M and the parameters C1, C2, ε1 and ε2; split the data matrix M into a positive data matrix H and a negative data matrix G, where H ∈ R^(m1×(n+1)) and G ∈ R^(m2×(n+1)), m1 and m2 are the numbers of data points of the two classes in the data matrix M, and n is the data dimension of the data matrix M;
Step 2: initialize two diagonal matrices F and D as identity matrices;
Step 3: from the positive data matrix H, the negative data matrix G, the parameters C1, C2, ε1, ε2 and the diagonal matrices F and D, compute the classification-plane parameters w and b;
Step 4: from the w and b, compute the distance from every data point in the positive data matrix H to the classification plane; if a data point's distance is greater than ε1, the point is judged to be an outlier and the corresponding diagonal entry of F is set to smallval, smallval being a value close to zero;
From the w and b, compute the distance from every data point in the negative data matrix G to the classification plane; if a data point's distance is greater than ε2, the point is judged to be an outlier and the corresponding diagonal entry of D is set to said smallval;
Update the diagonal matrices F and D in this way;
Step 5: compute the objective value obj.
2. The method for improving the robustness of the TWSVM algorithm based on the capped-l1 norm according to claim 1, characterized in that, in step 4, the diagonal entry of F corresponding to an outlier is set to smallval as follows:
and the diagonal entry of D corresponding to an outlier is set to smallval as follows:
3. The method for improving the robustness of the TWSVM algorithm based on the capped-l1 norm according to claim 1, characterized in that, in step 5,
the objective value obj is defined by the objective function,
wherein z = (w, b)^T and e is an m1 × 1 column vector whose elements are all 1.
4. The method for improving the robustness of the TWSVM algorithm based on the capped-l1 norm according to any one of claims 1 to 3, characterized in that the following steps are further included after step 5:
Step 6: iterate steps 3 to 5 until the objective value obj converges;
Step 7: determine the parameters w and b of the optimal classification plane.
5. The method for improving the robustness of the TWSVM algorithm based on the capped-l1 norm according to any one of claims 1 to 3, characterized in that smallval = 1e-5.
CN201810622213.6A 2018-06-15 2018-06-15 Method for improving the robustness of the TWSVM algorithm based on the capped-l1 norm Withdrawn CN108846437A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810622213.6A CN108846437A (en) 2018-06-15 2018-06-15 Method for improving the robustness of the TWSVM algorithm based on the capped-l1 norm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810622213.6A CN108846437A (en) 2018-06-15 2018-06-15 Method for improving the robustness of the TWSVM algorithm based on the capped-l1 norm

Publications (1)

Publication Number Publication Date
CN108846437A 2018-11-20

Family

ID=64202083

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810622213.6A Withdrawn CN108846437A (en) 2018-06-15 2018-06-15 The method of raising TWSVM algorithm robustness based on capped-l1 norm

Country Status (1)

Country Link
CN (1) CN108846437A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105335615A (en) * 2015-10-31 2016-02-17 电子科技大学 Low-complexity two-dimensional angle and polarization parameter joint estimation method
CN106847248A (en) * 2017-01-05 2017-06-13 天津大学 Chord recognition methods based on robustness scale contour feature and vector machine

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105335615A (en) * 2015-10-31 2016-02-17 电子科技大学 Low-complexity two-dimensional angle and polarization parameter joint estimation method
CN106847248A (en) * 2017-01-05 2017-06-13 天津大学 Chord recognition methods based on robustness scale contour feature and vector machine

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Wenhao Jiang: "Robust Dictionary Learning with Capped l1-Norm", Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence *
Zhouyuan Huo: "Video Recovery via Learning Variation and Consistency of Images", Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence *

Similar Documents

Publication Publication Date Title
JP6922387B2 (en) Recognition devices, training devices and methods based on deep neural networks
JP2021077377A (en) Method and apparatus for learning object recognition model
CN105210115B (en) Performing gesture recognition using 2D image data
CN107292902B (en) Two-dimensional Otsu image segmentation method combined with drosophila optimization algorithm
CN105551015A (en) Scattered-point cloud image registering method
CN105046694A (en) Quick point cloud registration method based on curved surface fitting coefficient features
JP2010266983A (en) Information processing apparatus and method, learning device and method, program, and information processing system
JP2005242808A (en) Reference data optimization learning method and pattern recognition system
Riaz et al. Fouriernet: Compact mask representation for instance segmentation using differentiable shape decoders
WO2015025472A1 (en) Feature conversion learning device, feature conversion learning method, and program storage medium
CN112633413B (en) Underwater target identification method based on improved PSO-TSNE feature selection
CN114936518A (en) Method for solving design parameters of tension/compression spring
JP6942203B2 (en) Data processing system and data processing method
CN108846437A (en) Method for improving the robustness of the TWSVM algorithm based on the capped-l1 norm
CN106022212A (en) Gyroscope temperature drift modeling method
LI et al. Training restricted boltzmann machine using gradient fixing based algorithm
JP6121187B2 (en) Acoustic model correction parameter estimation apparatus, method and program thereof
CN114818203A (en) Reducer design method based on SWA algorithm
JP7364047B2 (en) Learning devices, learning methods, and programs
CN112070127A (en) Intelligent analysis-based mass data sample increment analysis method
JP5130934B2 (en) Recognition system, information processing apparatus, design apparatus, and program
WO2020087254A1 (en) Optimization method for convolutional neural network, and related product
Lu et al. Improved SVM classifier incorporating adaptive condensed instances based on hybrid continuous-discrete particle swarm optimization
CN116152316B (en) Image registration method based on self-adaptive parameter particle swarm algorithm
Wang et al. A novel visual tracking system with adaptive incremental extreme learning machine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20181120