CN1588342A - Cross merge method for reducing support vector and training time - Google Patents


Info

Publication number
CN1588342A
Authority
CN
China
Prior art keywords
training
support vector
sets
training set
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 200410053659
Other languages
Chinese (zh)
Other versions
CN100353355C (en)
Inventor
文益民 (Yimin Wen)
吕宝粮 (Bao-Liang Lu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CNB200410053659XA priority Critical patent/CN100353355C/en
Publication of CN1588342A publication Critical patent/CN1588342A/en
Application granted granted Critical
Publication of CN100353355C publication Critical patent/CN100353355C/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The cross-merge method for reducing support vectors and training time, in the field of intelligent information processing technology, comprises three steps. The training-set decomposition step classifies the training sample set, extracts the samples, decomposes each class's sample set into two subsets, and combines the subsets to obtain four training sets. The hierarchical data-screening step based on support vectors processes the four training sets with the support vector machine method to obtain four support-vector sets, merges these four sets according to the cross-merge rule into two groups serving as two training sets, processes in parallel the two classification problems represented by these two training sets with the support vector machine method to obtain two support-vector sets, and merges these two sets to obtain the final training set. Training a support vector machine with the final training set yields the final classifier.

Description

Cross-merge method for reducing support vectors and training time
Technical field
The present invention relates to a hierarchical parallel machine-learning method based on the nature of support vectors, and specifically to a cross-merge method for reducing support vectors and training time. It belongs to the field of intelligent information processing technology.
Background technology
With the development of science and technology, mankind has accumulated massive amounts of data in every field, and these data keep growing at ever higher speed. The analysis and understanding of these data are of great significance to the further development of human society, and may even lead to important discoveries about nature. Meanwhile, with statistical learning theory as its solid theoretical foundation, the support vector machine method has become a widely used pattern classification method. There are two kinds of methods for solving large-scale pattern classification problems with support vector machines. Incremental learning methods divide a large-scale problem into several subproblems and process the subproblems serially; the working-set methods for training support vector machines belong to this class. A major advantage of these methods is that their memory demand is only linear, i.e., the required memory is proportional to the number of training samples. When handling large-scale pattern classification problems, however, incremental learning methods suffer from too many iterations and long training times; the training time complexity of such methods is normally around O(N^2). Parallel learning methods decompose the original problem into several subproblems according to the divide-and-conquer principle, process the subproblems in parallel, and integrate the results afterwards. Their advantage is that, being built on parallel computation, they can shorten the training time and have good modifiability and scalability; however, the results of all submodules must be kept after training finishes, which increases the number of support vectors.
The support vector is the key concept of the support vector machine method. A search of the prior art literature shows that, concerning the nature of support vectors, Syed, N.A. demonstrated through extensive numerical simulations in the document (Incremental Learning with Support Vector Machines. In: Proceedings of the Workshop on Support Vector Machines at the International Joint Conference on Artificial Intelligence. Stockholm, Sweden, 1999) that the support vector set contains the classification information of the training set and is essential, in the sense that the number of support vectors cannot be reduced by more than about 10% of their total; however, how far the number of support vectors can actually be reduced was not discussed further. So far no document identical with the present invention has been reported.
Summary of the invention
The objective of the invention is to address the long training time of existing support-vector-machine methods on large-scale problems by providing a cross-merge method for reducing support vectors and training time, which shortens learning time while reducing the number of support vectors. In the training-sample screening process the invention adopts a cross-merge combination method to guarantee the consistency of the finally obtained training sample set with the original training sample set.
The invention is achieved by the following technical solution. The inventive method comprises three steps: training-set decomposition, hierarchical data screening based on support vectors, and generation of the final classifier.
1) Training-set decomposition: after the training sample set containing two classes of samples is separated by class, each class's sample set is decomposed into two subsets according to a predefined decomposition rate r; subsets from different classes are then combined, yielding four training sets. The two-class classification problems represented by these four training sets are all smaller in scale than the original training sample set.
2) Hierarchical data screening based on support vectors: the four two-class problems are processed in parallel with the support vector machine method, yielding four support-vector sets. According to the cross-merge rule, the four support-vector sets are merged pairwise into two combinations, giving two training sets. The two classification problems represented by these two training sets are processed in parallel with the support vector machine method, yielding two support-vector sets, which are merged to produce one training set: the final training set. Because the support-vector set of a training set contains the classification information of that training set, this process progressively screens out non-support vectors, reducing the number of training samples and hence the training time. Through two layers of data screening the invention finally obtains a training set that is equivalent to the original training set but contains fewer samples.
3) Generation of the final classifier: the final training set obtained by hierarchical screening is used to train a support vector machine, producing the final classifier.
The inventive method is further described below:
1. Training-set decomposition
Suppose that in the original two-class classification problem the samples belonging to class C1 are P = {Xi : i = 1, ..., Lm} and the samples belonging to class C2 are N = {Xi : i = 1, ..., Ln}, where Xi denotes a sample and Lm and Ln denote the numbers of samples in the two classes. The whole training sample set can then be written T = P ∪ N. According to a predetermined decomposition rate r (0 < r ≤ 0.5), the original sets P and N are each decomposed into two subsets:

P1 = {Xi : i = 1, ..., LP1},  P2 = {Xi : i = LP1+1, ..., Lm},  N1 = {Xi : i = 1, ..., LN1},  N2 = {Xi : i = LN1+1, ..., Ln}    (1)

where LP1 and LN1 denote the numbers of samples in P1 and N1 respectively (their defining expressions in terms of r appear only as an image in the original document and are not reproduced here). The original two-class classification problem T can thus be decomposed into the following four smaller two-class classification problems:

T1 = P1 ∪ N1,  T2 = P2 ∪ N2,  T3 = P1 ∪ N2,  T4 = P2 ∪ N1    (2)
If these two-class classification problems are still too large, each of them can be further decomposed by the above method into four two-class classification problems of smaller scale.
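The decomposition of step 1 can be sketched as follows. This is a minimal illustration, assuming the decomposition rate r simply takes the first ceil(r · L) samples of each class as the first subset; the patent defines LP1 and LN1 through an expression not reproduced in this text, so that split rule is an assumption, and the sample names below are purely illustrative.

```python
# Sketch of training-set decomposition (eq. (1)-(2)); split rule is assumed.
import math

def decompose(P, N, r):
    """Split class samples P and N by rate r and cross-combine them into
    four smaller two-class training sets T1..T4."""
    assert 0 < r <= 0.5
    lp1 = math.ceil(r * len(P))   # assumed definition of LP1
    ln1 = math.ceil(r * len(N))   # assumed definition of LN1
    P1, P2 = P[:lp1], P[lp1:]
    N1, N2 = N[:ln1], N[ln1:]
    # Combine subsets from *different* classes, as in eq. (2):
    T1 = (P1, N1)
    T2 = (P2, N2)
    T3 = (P1, N2)
    T4 = (P2, N1)
    return T1, T2, T3, T4

# Example: 6 positive and 4 negative samples, r = 0.5.
P = [f"p{i}" for i in range(6)]
N = [f"n{i}" for i in range(4)]
T1, T2, T3, T4 = decompose(P, N, 0.5)
print(T1)  # (['p0', 'p1', 'p2'], ['n0', 'n1'])
```

Note that every original sample appears in exactly two of the four subproblems, which is what lets the later merges recover the full classification information.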
2. Hierarchical data screening based on support vectors
Using the standard support vector machine method, four support vector machines are obtained by parallel training on these four smaller two-class classification problems. Their support-vector sets are SV1, SV2, SV3 and SV4 respectively. Following the cross-merge rule, the support-vector sets SV1 and SV2 of T1 and T2 are merged into T12, and the support-vector sets SV3 and SV4 of T3 and T4 are merged into T34. The so-called cross-merge rule avoids repeating subsets of the same class within T1 and T2, or within T3 and T4, and thus avoids artificially causing an imbalance of the training data in T12 and T34 and a loss of classification information:

T12 = SV1 ∪ SV2,  T34 = SV3 ∪ SV4    (3)

Because a support-vector set contains the classification information of its training set, T12 and T34 preserve the information of the original training sample set from two different angles, avoiding the loss of classification information that the data division could otherwise cause. At the same time, in passing from T1 and T2 to T12, and from T3 and T4 to T34, the non-support-vector samples are screened out. Taking T12 and T34 as training sets, two support vector machines are obtained by parallel processing. Their support-vector sets are SV12 and SV34 respectively, and the two are merged:

T_final = SV12 ∪ SV34    (4)

This yields the final training set. T_final therefore contains all the classification information of the training set T. Since only support vectors are retained while non-support vectors are progressively screened out in the above process, T_final keeps relatively few training data compared with the original training set T.
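The two-layer screening just described can be sketched end to end as follows. The patent does not prescribe a particular SVM implementation; this sketch uses scikit-learn's SVC as the base trainer, and the toy data, parameter values, and function names are illustrative assumptions. As the description requires, the same kernel parameters are shared by every machine.

```python
# Sketch of the two-layer support-vector screening (step 2); base trainer,
# toy data, and parameters are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC

def support_vector_indices(X, y, **svm_params):
    """Train an SVM and return the local indices of its support vectors."""
    clf = SVC(**svm_params)
    clf.fit(X, y)
    return set(clf.support_.tolist())

rng = np.random.default_rng(0)
# Toy two-class problem: two Gaussian blobs, 40 samples per class.
Xp = rng.normal(loc=+2.0, size=(40, 2))
Xn = rng.normal(loc=-2.0, size=(40, 2))
X = np.vstack([Xp, Xn])
y = np.array([1] * 40 + [0] * 40)

# Global indices of the class subsets P1, P2, N1, N2 (r = 0.5).
P1, P2 = list(range(0, 20)), list(range(20, 40))
N1, N2 = list(range(40, 60)), list(range(60, 80))
params = dict(kernel="rbf", C=1000.0, gamma=0.001)  # shared parameters

# First layer: T1..T4 per the cross-combination of eq. (2).
subsets = [P1 + N1, P2 + N2, P1 + N2, P2 + N1]
svs = []
for idx in subsets:
    idx = np.array(idx)
    local = support_vector_indices(X[idx], y[idx], **params)
    svs.append({int(idx[i]) for i in local})  # map back to global indices
SV1, SV2, SV3, SV4 = svs

# Cross-merge rule, eq. (3).
T12 = SV1 | SV2
T34 = SV3 | SV4

# Second layer: screen T12 and T34 again, then merge, eq. (4).
finals = []
for idx in (sorted(T12), sorted(T34)):
    idx = np.array(idx)
    local = support_vector_indices(X[idx], y[idx], **params)
    finals.append({int(idx[i]) for i in local})
T_final = finals[0] | finals[1]
print(len(T_final), "samples survive screening out of", len(X))
```

In a real deployment the four first-layer trainings (and the two second-layer trainings) would run in parallel; here they run serially for simplicity.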
3. Generation of the final classifier
Using T_final as the new training set, a support vector machine SVM_final is obtained. This support vector machine serves as the final pattern classifier; it uses fewer support vectors, which shortens the recognition time.
The above process can be described as an algorithm:
Given:
training sample set T = P ∪ N and decomposition rate r.
Algorithm:
(1) Decompose P and N according to r, then combine the subsets into four smaller classification problems T1, T2, T3 and T4;
(2) if the problem scales of T1, T2, T3 and T4 satisfy the memory restriction, go to (3); otherwise go to (1);
(3) process T1, T2, T3 and T4 in parallel with the support vector machine method, obtaining the four corresponding support-vector sets SV1, SV2, SV3 and SV4;
(4) combine them into two classification problems T12 and T34 according to the cross-merge principle, and process these in parallel with the support vector machine method to obtain the two support-vector sets SV12 and SV34;
(5) let T_final = SV12 ∪ SV34;
(6) use T_final as the new training set to obtain the final support vector machine, which serves as the pattern classifier in the recognition phase.
The invention ensures that the classification information contained in the final training set obtained after hierarchical screening is consistent with that of the original training set, so that the recognition accuracy of the classifier obtained from the hierarchically screened training samples is consistent with that of the classifier obtained from the original whole training set. Multiple tests of the invention show that the proposed method reduces both the training time and the number of support vectors. A further effect of the invention is that the decomposition method reduces the problem scale without reducing the recognition accuracy of the classifier.
Description of drawings
Fig. 1 is a flow diagram of the inventive method.
Fig. 2 shows the data distribution and decomposition of experiment one of the embodiment of the invention.
Embodiment
The invention is further described below by way of example and with reference to the accompanying drawings:
As shown in Fig. 1, if the problem is a multi-class problem, it must first be converted into two-class problems. The inventive method then comprises the following steps:
First, the training samples are preprocessed by extracting them class by class, so that the samples belonging to each class form one set. This preprocessing can be carried out while the training samples are being collected, which reduces its time complexity. In the two-class case, the training samples are preprocessed into T = P ∪ N, where P and N denote the training sample sets belonging to the two classes respectively.
Second, P and N are decomposed according to the predefined decomposition rate r into P1, P2 and N1, N2 respectively. In Fig. 2, a checkerboard over [0,200] × [0,200] is divided into four quadrants, with all sample points evenly distributed over them. The samples located in [0,100] × [0,100] and [100,200] × [100,200] are positive samples, and the samples in the remaining space are negative samples. Taking the decomposition rate r = 0.5 gives the division shown in Fig. 2. Hierarchical screening is then carried out according to the method shown in Fig. 1 to obtain the final training set T_final. The process of merging SV12 and SV34 into T_final is a de-duplication and union process. To reduce its time complexity, when merging SV12 and SV34 one can take the sequence numbers of the training samples of SV12 and SV34 in the original training set T to form two index sets, take their de-duplicated union, and then fetch the corresponding training samples according to the result, finally forming T_final.
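The index-based de-duplicating merge described above can be sketched as follows; the index values and toy samples are illustrative, not from the patent's experiments.

```python
# Sketch of the de-duplicating merge: each support vector is represented by
# its sequence number in the original training set T, the two index sets are
# unioned (which removes duplicates), and the samples are fetched back.
T = [[0.1 * i, 0.2 * i] for i in range(10)]   # toy original training set
sv12_idx = {0, 3, 5, 7}                        # indices of SV12 in T (toy)
sv34_idx = {3, 4, 7, 9}                        # indices of SV34 in T (toy)

final_idx = sorted(sv12_idx | sv34_idx)        # set union removes overlap
T_final = [T[i] for i in final_idx]            # fetch corresponding samples
print(final_idx)  # [0, 3, 4, 5, 7, 9]
```

Comparing integer indices rather than full sample vectors makes the de-duplication cheap even when the samples are high-dimensional.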
Third, with T_final as the training set, the general support vector machine training method yields the final classifier SVM_final. Note that each support-vector set in Fig. 1 is obtained using identical parameters; for example, when a Gaussian kernel is adopted, the same C and σ must be used throughout.
The classifier SVM_final is then used to recognize the samples to be identified.
The two experimental data sets in this embodiment come from an artificial problem and a practical problem respectively. The experimental platform is a Pentium 4 PC with a 2.4 GHz CPU and 512 MB RAM.
In experiment one, in order to test the robustness of the invention, four different training sets and one common test set were generated at random, forming four two-class problems A1, A2, A3 and A4. Each training set contains 5000 positive and 5000 negative samples, and the test set contains 10000 positive and 10000 negative samples. A Gaussian kernel is adopted, with parameters C = 1000 and σ = 31.62.
Table 1. Experimental data sets of experiment one

         Training                Testing (common)
         Positive   Negative     Positive   Negative
  A1     5000       5000         10000      10000
  A2     5000       5000
  A3     5000       5000
  A4     5000       5000
In experiment two, the text classification data come from the text classification database provided by the Japanese Yomiuri Shimbun. After feature extraction, the dimension of the feature space is 5000. Three classes of data, as shown in Table 2, were extracted from this database. Any two of these classes form one two-class classification problem, giving three two-class problems A5, A6 and A7. The parameters are chosen as σ = 2, C = 64 and r = 0.5.
Table 2. Experimental data sets of experiment two

  Category     Training   Test
  Accidents    34044      8483
  Health       35932      7004
  By-time      33590      7702
To verify the actual effect of the proposed method, the hierarchical-screening support vector machine method proposed by the invention was compared experimentally with the support vector machine method that learns the whole training sample set at once. For convenience, the proposed method is denoted C-SVM (Cascade SVM) and the latter S-SVM (Standard SVM). The experimental results are given in Tables 3 and 4:
Table 3. Experimental results of experiment one

        Method   Train accuracy(%)   Test accuracy(%)   Training time(s)   Number of SVs
  A1    S-SVM    99.84               99.81              46.39              93
        C-SVM    99.78               99.72              13.08              81
  A2    S-SVM    99.89               99.72              38.00              96
        C-SVM    99.85               99.70              15.34              83
  A3    S-SVM    99.93               99.84              32.44              88
        C-SVM    99.86               99.75              13.45              79
  A4    S-SVM    99.89               99.81              35.50              94
        C-SVM    99.92               99.83              19.87              84
  avg   S-SVM    99.89               99.80              38.08              93
        C-SVM    99.85               99.75              15.44              82
Table 4. Experimental results of experiment two

                          Method   A5      A6      A7
  Training accuracy(%)    S-SVM    97.74   97.93   96.67
                          C-SVM    97.73   97.75   96.67
  Test accuracy(%)        S-SVM    95.81   96.01   93.62
                          C-SVM    95.83   96.02   93.62
  Training time(s)        S-SVM    12664   7458    18566
                          C-SVM    9519    4491    15060
  Number of SVs           S-SVM    10933   9445    12750
                          C-SVM    10553   9222    12387
From the above data it can be seen that:
1. the invention reduces the training time while preserving the recognition accuracy of the classifier, and the method is robust with respect to the training samples; 2. the invention reduces the number of support vectors; this does not contradict the 1999 result of Syed N.A., but illustrates to what degree the number of support vectors can actually be reduced, which is of great significance for improving the recognition speed of classifiers and for using classifiers in real-time monitoring.

Claims (3)

1. A cross-merge method for reducing support vectors and training time, characterized in that it comprises three steps: training-set decomposition, hierarchical data screening based on support vectors, and generation of the final classifier:
1) training-set decomposition: after the training sample set containing two classes of samples is separated by class, each class's sample set is decomposed into two subsets according to a predefined decomposition rate r; subsets from different classes are then combined, yielding four training sets, the two-class classification problems represented by these four training sets all being smaller in scale than the original training sample set;
2) hierarchical data screening based on support vectors: the four two-class classification problems are processed in parallel with the support vector machine method to obtain four support-vector sets; according to the cross-merge rule, the four support-vector sets are merged pairwise into two combinations, giving two training sets; the two classification problems represented by these two training sets are processed in parallel with the support vector machine method to obtain two support-vector sets, which are merged to produce one training set, this training set being the final training set;
3) generation of the final classifier: the final training set obtained by hierarchical screening is used to train a support vector machine, producing the final classifier.
2. The cross-merge method for reducing support vectors and training time of claim 1, characterized in that, in step 1), after the training samples are extracted by class, the sample sets of each class in the training set are decomposed according to the predefined decomposition rate r and combined into four two-class classification problems; if each problem is still too large, it is further decomposed by the same decomposition method, the decomposition rate r determining how the computational burden within a layer is apportioned.
3. The cross-merge method for reducing support vectors and training time of claim 1, characterized in that, in step 2), after the support vectors of the four classification problems are extracted, the four support-vector sets are integrated into two classification problems according to the cross-merge rule, each of which embodies the classification information of the original training set from a certain angle; support vectors are then extracted from the two resulting classification problems in parallel, and the two support-vector sets so obtained are merged, integrating the classification information from the two angles, so that the classification information contained in SV12 ∪ SV34 is consistent with that of the original whole training set, and the resulting classifier therefore has consistent recognition accuracy.
CNB200410053659XA 2004-08-12 2004-08-12 Cross merge method for reducing support vector and training time Expired - Fee Related CN100353355C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB200410053659XA CN100353355C (en) 2004-08-12 2004-08-12 Cross merge method for reducing support vector and training time


Publications (2)

Publication Number Publication Date
CN1588342A true CN1588342A (en) 2005-03-02
CN100353355C CN100353355C (en) 2007-12-05

Family

ID=34602950

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB200410053659XA Expired - Fee Related CN100353355C (en) 2004-08-12 2004-08-12 Cross merge method for reducing support vector and training time

Country Status (1)

Country Link
CN (1) CN100353355C (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101206667B (en) * 2007-12-06 2010-06-02 上海交通大学 Method for reducing training time and supporting vector
CN107194411A (en) * 2017-04-13 2017-09-22 哈尔滨工程大学 A kind of SVMs parallel method of improved layering cascade

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6760715B1 (en) * 1998-05-01 2004-07-06 Barnhill Technologies Llc Enhancing biological knowledge discovery using multiples support vector machines
US6192360B1 (en) * 1998-06-23 2001-02-20 Microsoft Corporation Methods and apparatus for classifying text and for building a text classifier
AU780050B2 (en) * 1999-05-25 2005-02-24 Health Discovery Corporation Enhancing knowledge discovery from multiple data sets using multiple support vector machines
CN1245696C (en) * 2003-06-13 2006-03-15 北京大学计算机科学技术研究所 Text classification incremental training learning method supporting vector machine by compromising key words


Also Published As

Publication number Publication date
CN100353355C (en) 2007-12-05


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20071205

Termination date: 20100812