CN103744978A - Parameter optimization method for support vector machine based on grid search technology - Google Patents

Parameter optimization method for support vector machine based on grid search technology

Info

Publication number
CN103744978A
Authority
CN
China
Prior art keywords
parameter
subset
parameter optimization
optimization
calculation task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410016619.1A
Other languages
Chinese (zh)
Inventor
Yang Guangwen
Ji Yingsheng
Wang Xiaoge
Chen Yushu
Xue Zhihui
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN201410016619.1A
Publication of CN103744978A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches, based on the proximity to a decision surface, e.g. support vector machines

Abstract

The invention relates to a parameter optimization method for a support vector machine (SVM) based on grid search technology, and belongs to the field of parameter optimization in machine learning. The method consists of three stages: sampling, search and election. Sampling generates multiple training sets: P subsets (P a positive integer) are randomly drawn from a given complete sample set to serve as training sets, with the proportion of positive to negative samples in each subset kept consistent with the proportion in the complete set; the size of each subset is preset according to the size of the complete set, and the number of subsets P is made large enough for the subsets to reflect the probability distribution of the complete set. Parameter optimization is then carried out on each subset: using grid search technology, the P sampled subsets are searched in parallel, traversing the whole parameter space. Finally, the performance results are gathered and a parameter combination is selected by election and output as the final result. The method aims to improve the computational efficiency of parameter optimization.

Description

A parameter optimization method for support vector machines based on grid search technology
Technical field
The invention belongs to the field of parameter optimization in machine learning, and in particular concerns a grid-search-based parameter optimization method for support vector machines.
Background art
The support vector machine (SVM) is a widely used machine learning algorithm. It performs well on pattern recognition problems with small-scale, nonlinear and high-dimensional sample data, and is mainly applied to statistical classification and regression analysis. Owing to its good generalization ability, SVM is applied in many fields, such as text classification, pattern recognition and fault diagnosis. SVM is a learning algorithm developed from statistical learning theory. The SVM algorithm is introduced here using the two-class classification problem as an example; for other problems the algorithm differs in some details, but the basic ideas are the same.
First the problem is defined. Suppose a sample set $\{(x_i, y_i) \mid x_i \in \mathbb{R}^d,\ i = 1, 2, \dots, n\}$ is given, where $x_i$ is a d-dimensional feature vector and $y_i$ denotes the sample class (a two-class problem has the two class labels $\{+1, -1\}$: +1 for the positive class, -1 for the negative class). In general the sample data are not linearly separable. SVM maps the sample data from the original, inseparable space into a higher-dimensional separable space, turning the originally inseparable data into linearly separable data, and then builds a maximum-margin hyperplane. This hyperplane is represented by a decision function, and that decision function is exactly the model obtained by SVM training (what the training of any machine learning algorithm produces is called a model); it maximizes the distance from the sample data on both sides to the hyperplane. As shown in Fig. 1(a), the middle dashed line is the hyperplane, the parallel solid lines on either side pass through the sample points nearest to the hyperplane (the small circles and small triangles in the figure), and what SVM seeks is the hyperplane, represented by that dashed line, which maximizes the distance between the two solid lines. Building and using an SVM model comprises the following two stages:
Stage 1: the training stage. From the training data, solve for the maximum-margin hyperplane (obtaining the model; in essence, the algorithm solves the following quadratic programming problem):
$$\min_{\omega,\,b,\,\xi}\;\; \tfrac{1}{2}\,\omega^{T}\omega + C\sum_{i=1}^{n}\xi_i$$
$$\text{s.t.}\quad y_i\bigl(\omega^{T}\Phi(x_i)+b\bigr)\ \ge\ 1-\xi_i,\qquad \xi_i \ge 0,\quad i=1,2,\dots,n$$
Here $\omega$ is the normal vector perpendicular to the hyperplane, $b$ is the offset, and the slack variables $\xi_i$ together with the penalty factor $C$ handle the hard-margin problem: hard-margin classification is easily influenced by a few samples, which shift the maximum-margin hyperplane and increase the error, as shown in Fig. 1(b). By introducing slack variables and a penalty factor, a soft margin is established that tolerates a certain amount of classification error, so the maximum-margin hyperplane no longer changes on account of a few samples. Moreover, in actual computation SVM need not really map the sample data from the original inseparable space into the higher-dimensional separable space; instead, a kernel function $K$ supplies the dot product of two samples in the higher-dimensional separable space, $K(x_i, x_j) = \Phi(x_i)^{T}\Phi(x_j)$.
Stage 2: the testing stage. The maximum-margin hyperplane obtained by solving the quadratic programming problem in the training stage is used to construct the following decision function, which predicts the class of unknown sample data:
$$f(x)=\operatorname{sign}\Bigl(\sum_{i=1}^{n}\alpha_i\,y_i\,K(x_i,x)+b\Bigr)$$
Here $\operatorname{sign}(\cdot)$ is the indicator function: when the result computed inside the parentheses is greater than or equal to 0, it outputs +1 (the positive class); otherwise it outputs -1 (the negative class).
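As a concrete illustration of the two stages, the sketch below trains and then applies an RBF-kernel SVM. The use of scikit-learn is an assumption for illustration only; the patent names no implementation, and any SVM library exposing the penalty factor C and the RBF kernel parameter would serve.

```python
# Minimal two-stage sketch (scikit-learn assumed; not part of the patent).
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Stage 1: training -- solve the quadratic program above for the
# maximum-margin hyperplane, with penalty factor C and RBF kernel parameter.
model = SVC(C=1.0, kernel="rbf", gamma=2**-4)
model.fit(X[:150], y[:150])

# Stage 2: testing -- the decision function sign(...) assigns the class
# of each unknown sample (scikit-learn reports the original labels).
predictions = model.predict(X[150:])
```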
To avoid the overfitting problem, the SVM training stage described above usually adopts the precision-testing method of cross-validation to obtain a reliable and stable model. Cross-validation is a looping estimation process. Taking 10-fold cross-validation as an example, its substance is as follows: the sample set is divided into 10 subsets of identical size; in each round, 9 of the subsets are used as the training set to generate an SVM model and the 1 remaining subset is used as the test set, on which the performance of the trained SVM model is verified; 10 rounds are carried out in all, each round taking a different subset for the test, and finally the overall performance is determined.
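This looping estimate can be written down directly. A sketch of the N-fold procedure follows, again assuming scikit-learn; StratifiedKFold (an assumption beyond the plain description above) keeps the class proportions of each fold consistent, in line with the sampling concerns of the method below.

```python
# N-fold cross-validation: N-1 folds train, 1 fold tests, N rounds in all.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

def cross_validate(X, y, C, gamma, n_folds=10):
    scores = []
    for train_idx, test_idx in StratifiedKFold(n_splits=n_folds).split(X, y):
        model = SVC(C=C, kernel="rbf", gamma=gamma)
        model.fit(X[train_idx], y[train_idx])
        scores.append(model.score(X[test_idx], y[test_idx]))  # per-round accuracy
    return float(np.mean(scores))  # overall performance over the N rounds
```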
The performance of an SVM model depends mainly on its parameter configuration; models generated with different parameter combinations often differ greatly in performance, so parameter optimization is essential for generating an SVM model of good performance. The goal of parameter optimization is to find, within the parameter space, the parameter combination that gives the best SVM model performance on the sample set. Because each parameter combination must be verified by building the corresponding SVM model, the cost of parameter optimization is very large, and its efficiency directly determines the efficiency of generating an SVM model.
Grid search is the most basic parameter optimization technique; its basic steps are introduced here with the radial basis function (RBF) as the SVM kernel, $K(x_i, x_j) = \exp(-\lambda\lVert x_i - x_j\rVert^2)$. With the RBF kernel, SVM has two main performance-affecting parameters: the penalty factor $C$ and the kernel parameter $\lambda$.
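With this kernel, the dot products in the higher-dimensional space are obtained directly from the original feature vectors, without ever constructing the mapping $\Phi$. A small NumPy sketch of the kernel matrix:

```python
import numpy as np

def rbf_kernel_matrix(X, lam):
    """K[i, j] = exp(-lam * ||x_i - x_j||^2), i.e. the dot product
    Phi(x_i)^T Phi(x_j) in the separable space, computed without
    ever forming Phi explicitly."""
    sq_norms = (X ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    return np.exp(-lam * np.maximum(sq_dists, 0.0))  # clamp tiny negatives
```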
Step 1: set up a parameter space. The penalty factor $C$ takes the values $2^{-10}, 2^{-8}, \dots, 2^{4}$ and the kernel parameter $\lambda$ takes the values $2^{-16}, 2^{-14}, \dots, 2^{-4}$ (a conventional way of setting SVM parameters). These two parameters form a two-dimensional planar parameter space. As shown in Fig. 2, where the parameters are presented on a log scale, the planar parameter space is divided into a grid, each lattice point representing one parameter combination $(C, \lambda)$; the black dot in the figure, for instance, is the parameter combination $(2^{0}, 2^{-10})$;
Step 2: train an SVM on each parameter combination in the parameter space, generate the corresponding model and verify its performance. Model performance can be evaluated with existing machine learning performance indices, such as accuracy, precision and recall, or with self-defined performance indices serving as a unified measurement criterion, according to need;
Step 3: traverse the whole parameter space, attempting all parameter combinations, and finally output the parameter combination that yields the best SVM model performance, which is the optimal result.
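Steps 1 to 3 amount to two nested loops over the exponentially spaced values. A sketch, using the grid of Step 1 and the cross_validate helper from the sketch above:

```python
import numpy as np

# Step 1: the exponentially spaced lattice of Fig. 2.
C_values = [2.0 ** e for e in range(-10, 5, 2)]       # 2^-10, 2^-8, ..., 2^4
gamma_values = [2.0 ** e for e in range(-16, -3, 2)]  # 2^-16, 2^-14, ..., 2^-4

# Steps 2-3: evaluate every lattice point (X, y is the sample set and
# cross_validate the helper sketched earlier) and keep the best point.
best_score, best_params = -np.inf, None
for C in C_values:
    for gamma in gamma_values:
        score = cross_validate(X, y, C, gamma)
        if score > best_score:
            best_score, best_params = score, (C, gamma)
print("optimal (C, lambda):", best_params)
```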
Compared with other parameter optimization techniques, grid search has the advantages of being simple to implement, general-purpose, and guaranteed to find the global optimum within the given parameter space. Its drawback is the large computational cost. The main reason is that grid search adopts an exhaustive search strategy: every parameter combination in the parameter space must be tested and a corresponding SVM model trained, which is very expensive. The amount of computation grows exponentially with the number of parameters: for example, if each parameter takes 10 values, 2 parameters give 100 combinations and 100 models, while 3 parameters give 1000 combinations and 1000 models. Compared with other machine learning techniques, SVM has few parameters to tune, so the number of parameters is not the bottleneck limiting the application of grid search. In addition, the grid granularity (the setting of the parameter step) matters: the finer the grid, the more accurate the optimum found, but the more parameter combinations there are and the more SVM models must be generated. For example, with 10 values per parameter, 2 parameters yield 100 parameter combinations; with 20 values per parameter, 2 parameters already yield 400 combinations. The setting of the parameter granularity can therefore greatly inflate the computational scale of parameter optimization.
Because the basic principle of grid search makes the computational cost of its parameter optimization very large, an effective optimization method that improves the efficiency of grid search, so that SVM parameter optimization can be carried out faster and better, is very necessary.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art by proposing a grid-search-based optimization method for SVM parameter optimization, intended to improve the computational efficiency of the parameter optimization process.
The parameter optimization method for SVM based on grid search technology proposed by the present invention is characterized in that, for a given sample set, the optimized grid search technique is adopted to carry out parameter optimization, while N-fold cross-validation is adopted to guarantee that a reliable and stable SVM model is obtained;
The method comprises the three stages of sampling, search and election; as shown in Fig. 3, it specifically comprises the following steps:
Step 1) Sampling generates multiple training sets: P subsets, each formed by randomly drawing samples from the given complete sample set, serve as training sets, P being a positive integer; the proportion of positive to negative samples in each subset is kept consistent with the proportion in the complete set; the size of each subset is given in advance according to the size of the complete sample set, and the number of subsets P is made large enough to reflect the probability distribution of the complete set (the smaller the subsets, the better the performance, but the harder it is for them to reflect the sample distribution of the complete set; if the subsets are too large, SVM training becomes slow; in general, sample as many subsets as possible, each subset doing parameter optimization with as few samples as possible);
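A minimal sketch of this sampling stage, assuming the data are NumPy arrays; the function name and signature are illustrative, not taken from the patent:

```python
import numpy as np

def sample_subsets(X, y, n_subsets, subset_fraction, seed=0):
    """Draw n_subsets random subsets, each subset_fraction of the complete
    set, keeping the positive/negative proportions of the complete set."""
    rng = np.random.default_rng(seed)
    subsets = []
    for _ in range(n_subsets):
        idx = []
        for label in np.unique(y):              # sample each class separately
            members = np.flatnonzero(y == label)
            k = max(1, int(round(subset_fraction * members.size)))
            idx.extend(rng.choice(members, size=k, replace=False))
        idx = np.asarray(idx)
        subsets.append((X[idx], y[idx]))
    return subsets
```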
Step 2) Parameter optimization on each subset: using grid search technology, parameter optimization is carried out on each of the P sampled subsets in parallel, completely traversing the whole parameter space;
Specifically, the following two parallelization modes are included (a sketch of the parallel execution is given after this list):
Mode 1: each subset carries out parameter optimization as a single independent calculation task; each subset is assigned to one core of the computer cluster for the parameter search, and the calculation tasks execute in parallel; as shown in Fig. 3(a), within the dashed box the P subsets carry out parameter optimization on P cores simultaneously;
Mode 2: the N-fold cross-validation of each subset is itself computed in parallel; each subset is divided into N secondary subsets, of which N-1 serve as the training set and 1 as the test set, and N rounds of cross-validation are computed in all (that is, each subset performs N independent computations and therefore comprises N calculation tasks); each calculation task is assigned to one core of the computer cluster and all calculation tasks execute in parallel; as in Fig. 3(b), for the P subsets within the dashed box, each subset carries out N-fold cross-validation, the N calculation tasks of each subset are distributed to N cores, and in all N × P calculation tasks carry out parameter optimization on N × P cores simultaneously, N being a positive integer (the larger N, the higher the precision but the longer the computation time; N is generally 5 or 10);
(In actual use, which parallelization mode to adopt is decided according to the computational resources.)
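As an illustration of Mode 1, the sketch below maps one grid-search task per subset with Python's multiprocessing; the local pool merely stands in for the cluster cores (on the computer cluster the patent targets, MPI or a batch scheduler would play this role), and cross_validate, C_values, gamma_values and sample_subsets are the helpers from the sketches above:

```python
from multiprocessing import Pool

def grid_search_subset(subset):
    """One independent calculation task: a full grid search over one subset,
    returning the task's best (C, lambda) for the candidate set."""
    X_sub, y_sub = subset
    scored = ((cross_validate(X_sub, y_sub, C, g), (C, g))
              for C in C_values for g in gamma_values)
    return max(scored)[1]          # parameter combination with the best score

if __name__ == "__main__":
    # X, y: the complete sample set, as in the earlier sketches.
    subsets = sample_subsets(X, y, n_subsets=36, subset_fraction=0.1)
    with Pool(processes=len(subsets)) as pool:   # Mode 1: one core per subset
        candidate_set = pool.map(grid_search_subset, subsets)
```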
Step 3) Gather the performance results and select a parameter combination by election as the final output; this specifically comprises the following two sub-steps:
Step 3-1) Collect the best parameter combination of each calculation task as a candidate parameter combination;
Each calculation task adopts the same index to measure the performance of the SVM models generated by all parameter combinations on the data set for which the task is responsible, selects from them the best-performing parameter combination, and outputs it to a file; after the parameter optimization finishes, the output files of the calculation tasks are collected and the best parameter combinations of the tasks gathered together to form the candidate set;
(If Mode 1 is adopted for the parallel computation, P parameter combinations are obtained and form the candidate set; if Mode 2 is adopted, N × P parameter combinations are obtained and form the candidate set. Although the candidate set of Mode 1 is small in scale, the results of the N-fold cross-validation have already undergone a first screening during performance evaluation, so the quality of the candidate set is high; the candidate set of Mode 2 is ample in scale; the results finally obtained by the two modes are consistent.)
Step 3-2) Elect from the candidate set the parameter combination with the best performance:
Each parameter combination in the candidate set is represented by a point; the spatial distribution of all the parameter combinations in the candidate set forms a probability cloud; among all the parameter points, the point nearest to the centroid of the cloud is a convergence point, and this convergence point is the sought optimal combination; if there are multiple convergence points, the convergence points are taken out and their detailed vote counts examined, from which the optimal parameter combination is drawn.
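A sketch of this election step. Distances are computed on the log2 scale of the lattice (an assumption, consistent with the log-scale presentation of Fig. 2), and a tie between convergence points is broken by vote count:

```python
from collections import Counter
import numpy as np

def elect(candidate_set):
    """Return the candidate (C, lambda) whose point lies nearest the
    centroid of the probability cloud; break ties by detailed votes."""
    points = np.log2(np.asarray(candidate_set))     # cloud on the log-scale grid
    dists = np.linalg.norm(points - points.mean(axis=0), axis=1)
    nearest = np.flatnonzero(dists == dists.min())  # possible convergence points
    if len(nearest) == 1:
        return candidate_set[nearest[0]]
    votes = Counter(tuple(candidate_set[i]) for i in nearest)
    return max(votes, key=votes.get)                # most-voted convergence point
```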
Features and beneficial effects of the present invention: SVM is one of the most widely applied machine learning techniques, and grid search is one of the most commonly used parameter optimization techniques, being simple to implement and guaranteed to find the global optimum. However, the exhaustive search strategy of this algorithm incurs a high computational cost. The present invention proposes an SVM parameter optimization method based on grid search technology. The method rests on a sampling-election mechanism that reduces the volume of data in the SVM training process. The method establishes a parallel framework aimed mainly at ordinary cluster systems, in which the interconnection network between nodes performs poorly, for example grid computing systems; the parallel framework therefore does not involve optimizing the SVM kernel itself, and is mainly used to exploit the task-level parallelism of the SVM parameter optimization process. Such computer systems are characteristically composed of a large number of heterogeneous computers, the computational resources of individual nodes differ and the computing power is generally not high, and the nodes are usually connected over a local area network without high-performance interconnect equipment for transmitting data; the method is, of course, equally applicable to high-performance computing clusters.
The method provides an optimized grid search technique that reduces the training time of SVM models by reducing the sample set, while adopting task-level parallelism to further accelerate the computational efficiency of parameter optimization. Specifically: first, the method extracts multiple subsets from the complete sample set by sampling, and carrying out parameter optimization on small-scale subsets greatly reduces the time to train the SVM model for each parameter combination; second, the method establishes a parallel framework to exploit the task-level parallelism of the SVM parameter optimization process, so the parameter optimization of the subsets can proceed simultaneously and, if computational resources permit, the computation of the cross-validation can also be parallelized; finally, the method gathers the parameter optimization results of the calculation tasks to form a candidate set and then, by voting, elects the parameter combination that gives the best performance on most subsets, which guarantees the correctness of the optimization method. The method is aimed mainly at computer clusters with poor interconnect equipment, such as grid computing systems, while being equally applicable to high-performance computing clusters. Applied to SVM parameter optimization, it is intended to improve the computational efficiency of the parameter optimization process.
Brief description of the drawings
Fig. 1 shows examples of the SVM two-class classification problem: (a) the linearly separable case, (b) the soft-margin case;
Fig. 2 is an example of the two-dimensional parameter space of grid search;
Fig. 3 is the flow chart of the parameter optimization method of the present invention: (a) parallel computation over the subset parameters, (b) parallel computation over the cross-validation.
Detailed description
The present invention proposes an SVM parameter optimization method based on grid search technology; specific embodiments of the present invention are set forth below in conjunction with the accompanying drawings and by way of an example.
The method is as summarized above: for a given sample set, the optimized grid search technique carries out parameter optimization while N-fold cross-validation guarantees a reliable and stable SVM model, in the three stages of sampling, search and election of steps 1) to 3) shown in Fig. 3.
Embodiment
The present embodiment takes two-class SVM as an example and adopts the RBF kernel function, tuning parameters to generate the best-performing SVM model. Two parameters need tuning, the penalty factor $C$ and the kernel parameter $\lambda$, which form a two-dimensional parameter space; the granularity is set by exponentially increasing steps, defining a variable parameter step for the search; N-fold cross-validation is adopted to guarantee that a reliable and stable SVM model is obtained;
In the present embodiment the penalty factor $C$ takes the values $2^{-10}, 2^{-8}, \dots, 2^{10}$ and the kernel parameter $\lambda$ takes $2^{-16}, 2^{-14}, \dots, 2^{-4}$. In the sampling stage, 36 subsets are drawn, each 1/10 the size of the complete set; in the search stage, 10-fold cross-validation guarantees that a reliable and stable SVM model is generated; in the election stage, accuracy is the index used to measure model performance.
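Written out as a configuration sketch, the embodiment's grid and sampling settings are (all values taken from the text above):

```python
# Parameter grid of the embodiment: exponential steps of 2^2.
C_values = [2.0 ** e for e in range(-10, 11, 2)]      # 2^-10, ..., 2^10 (11 values)
gamma_values = [2.0 ** e for e in range(-16, -3, 2)]  # 2^-16, ..., 2^-4 (7 values)

N_SUBSETS = 36         # subsets drawn in the sampling stage
SUBSET_FRACTION = 0.1  # each subset is 1/10 of the complete set
N_FOLDS = 10           # 10-fold cross-validation in the search stage
# Mode 1 then occupies 36 cores (one per subset);
# Mode 2 occupies 36 x 10 = 360 cores (one per cross-validation round).
```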
The present embodiment comprises the three stages of sampling, search and election, and specifically the following steps (see Fig. 3):
Step 1) Sampling generates multiple training sets: 36 subsets formed by randomly drawing samples from the given complete sample set serve as training sets; each subset is 1/10 the size of the complete set, while the proportion of positive to negative samples in each subset is kept consistent with the proportion in the complete set;
Step 2) For the 36 subsets obtained in step 1), grid search technology is used to carry out parameter optimization on each subset, completely traversing the whole parameter space; specifically, the following two parallelization modes are included:
Mode 1: each subset carries out parameter optimization as a single independent calculation task; the 36 subsets are assigned to 36 cores for the parameter search, and the 36 tasks compute simultaneously;
Mode 2: the 10-fold cross-validation of each subset is computed in parallel; each subset is divided into 10 secondary subsets, of which 9 serve as the training set and 1 as the test set, and 10 rounds of cross-validation are computed in all; with 36 subsets, each comprising 10 calculation tasks (each subset performs 10 rounds of computation), every calculation task is assigned to one core of the computer cluster, and in all 360 tasks are assigned to 360 cores to carry out the parameter optimization computation simultaneously;
(In actual use, which parallelization mode to adopt is decided according to the computational resources that can be allocated: in this use case, if 360 cores can be allocated, Mode 2 can be chosen; if not, Mode 1 is chosen.)
Step 3) Gather the performance results and select a parameter combination by election as the final output; this specifically comprises the following two sub-steps:
Step 3-1) Collect the best parameter combination of each task as a candidate parameter combination.
Each calculation task adopts accuracy to measure and compare the performance of the SVM models that all the parameter combinations generate, selects from them the best-performing parameter combination, and outputs it to a file; after the parameter optimization finishes, the files output by the calculation tasks are collected and the best parameter combinations of the tasks gathered together to form the candidate set;
(Note that if Mode 1 is adopted for the parallel computation, 36 parameter combinations are obtained and form the candidate set; if Mode 2 is adopted, 360 parameter combinations are obtained and form the candidate set. Although the candidate set of Mode 1 is small in scale, the results of the 10-fold cross-validation have already undergone a first screening during performance evaluation, so the quality of the candidate set is high; the candidate set of Mode 2 is ample in scale; the results finally obtained by the two modes are consistent.)
Step 3-2) Elect from the candidate set the parameter combination with the best performance:
Adopting 36 subsets and 10-fold cross-validation yields 360 candidate parameter combinations. Each parameter combination is represented by a point, and the spatial distribution of these points forms a probability cloud; the optimal parameter combinations of the subsets converge toward the optimal parameters of the complete set. Among all the parameter points, the point nearest to the centroid of the cloud is the sought optimal combination; if there are multiple convergence points, the convergence points are taken out and their detailed vote counts examined, from which the optimal parameter combination is drawn.
The distinguishing technical features of the efficient SVM parameter optimization method of the present embodiment are: first, the method uses random sampling to generate 36 data subsets with identical positive-to-negative sample ratios; second, the parallel framework adopts a targeted parallel mode according to the allocatable computational resources, either treating each sampled subset as an independent task, with 36 tasks computing in parallel, or further parallelizing the cross-validation process of each subset, which with 10-fold cross-validation yields 10 × 36 tasks computing in parallel, each task adopting grid search technology for the parameter optimization and traversing the whole parameter space; finally, the best parameter combinations output by the calculation tasks are gathered to form a candidate set, and the optimal parameter combination is elected from the candidate set by vote and output as the final result of the method.
Features and beneficial effects of the present invention: first, the method extracts multiple subsets from the complete sample set by sampling, and carrying out parameter optimization on small-scale subsets greatly reduces the time to train the SVM model for each parameter combination; second, the method establishes a parallel framework to exploit the task-level parallelism of the SVM parameter optimization process, so the parameter optimization of the subsets can proceed simultaneously and, if computational resources permit, the computation of the cross-validation can also be parallelized, further improving the computational efficiency of grid search; finally, the method gathers the parameter optimization results of the calculation tasks to form a candidate set and then, by voting, elects the parameter combination that gives the best performance on most subsets, which guarantees the correctness of the optimization method.
The method is aimed mainly at computer clusters with poor interconnect equipment, such as grid computing systems, while being equally applicable to high-performance computing clusters. Because the grid search algorithm is universal, this optimization method is equally applicable to the parameter optimization of other machine learning algorithms; any variation or replacement readily conceivable by anyone familiar with this technology within the technical scope disclosed by the present invention shall be covered by the protection of the present invention.

Claims (2)

1. A parameter optimization method for SVM based on grid search technology, characterized in that, for a given sample set, the optimized grid search technique is adopted to carry out parameter optimization, while N-fold cross-validation is adopted to guarantee that a reliable and stable SVM model is obtained;
the method comprising the three stages of sampling, search and election, and specifically the following steps:
Step 1) sampling generates multiple training sets: P subsets, each formed by randomly drawing samples from the complete sample set, serve as training sets, P being a positive integer; the proportion of positive to negative samples in each subset is kept consistent with the proportion in the complete set; the size of each subset is given in advance according to the size of the complete sample set, and the number of subsets P is made large enough to reflect the probability distribution of the complete set;
Step 2) parameter optimization on each subset: using grid search technology, parameter optimization is carried out on each of the P sampled subsets in parallel, completely traversing the whole parameter space;
Step 3) the performance results are gathered and a parameter combination is selected by election as the final output, specifically in the following two sub-steps:
Step 3-1) the best parameter combination of each calculation task is collected as a candidate parameter combination;
each calculation task adopts the same index to measure the performance of the SVM models generated by all parameter combinations on the data set for which the task is responsible, selects from them the best-performing parameter combination, and outputs it to a file; after the parameter optimization finishes, the files output by the calculation tasks are collected and the best parameter combinations of the tasks gathered together to form the candidate set;
Step 3-2) the parameter combination with the best performance is elected from the candidate set:
each parameter combination in the candidate set is represented by a point; the spatial distribution of all the parameter combinations in the candidate set forms a probability cloud; among all the parameter points, the point nearest to the centroid of the cloud is a convergence point, and this convergence point is the sought optimal combination; if there are multiple convergence points, the convergence points are taken out and their detailed vote counts examined, from which the optimal parameter combination is drawn.
2. The method as claimed in claim 1, characterized in that said step 2) specifically comprises the following two parallelization modes:
Mode 1: each subset carries out the parameter optimization computation as a single independent calculation task; each subset is assigned to one core of the computer cluster for the parameter search, the calculation tasks execute in parallel, and parameter optimization is carried out on P cores simultaneously;
Mode 2: the N-fold cross-validation of each subset is computed in parallel; each subset is divided into N secondary subsets, of which N-1 serve as the training set and 1 as the test set, N rounds of cross-validation being computed in all; each calculation task is assigned to one core of the computer cluster and all calculation tasks execute in parallel; each subset carries out N-fold cross-validation, the N calculation tasks of each subset are distributed to N cores, and in all N × P calculation tasks carry out parameter optimization on N × P cores simultaneously, N being a positive integer.
CN201410016619.1A 2014-01-14 2014-01-14 Parameter optimization method for support vector machine based on grid search technology Pending CN103744978A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410016619.1A CN103744978A (en) 2014-01-14 2014-01-14 Parameter optimization method for support vector machine based on grid search technology


Publications (1)

Publication Number Publication Date
CN103744978A true CN103744978A (en) 2014-04-23

Family

ID=50501996

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410016619.1A Pending CN103744978A (en) 2014-01-14 2014-01-14 Parameter optimization method for support vector machine based on grid search technology

Country Status (1)

Country Link
CN (1) CN103744978A (en)


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C53 Correction of patent of invention or patent application
CB03 Change of inventor or designer information

Inventor after: Yang Guangwen

Inventor after: Ji Yingsheng

Inventor after: Han Baoling

Inventor after: Wang Xiaoge

Inventor after: Chen Yushu

Inventor after: Xue Zhihui

Inventor before: Yang Guangwen

Inventor before: Ji Yingsheng

Inventor before: Wang Xiaoge

Inventor before: Chen Yushu

Inventor before: Xue Zhihui

COR Change of bibliographic data

Free format text: CORRECT: INVENTOR; FROM: YANG GUANGWEN JI YINGSHENG WANG XIAOGE CHEN YUSHU XUE ZHIHUI TO: YANG GUANGWEN JI YINGSHENG HAN BAOLING WANG XIAOGE CHEN YUSHU XUE ZHIHUI

WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20140423

WD01 Invention patent application deemed withdrawn after publication