CN102707955A - Method for realizing support vector machine by MPI programming and OpenMP programming - Google Patents

Method for realizing support vector machine by MPI programming and OpenMP programming

Info

Publication number
CN102707955A
CN102707955A CN2012101566351A CN201210156635A
Authority
CN
China
Prior art keywords
mpi
algorithm
openmp
parallel
programming
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012101566351A
Other languages
Chinese (zh)
Inventor
廖士中
卢玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN2012101566351A priority Critical patent/CN102707955A/en
Publication of CN102707955A publication Critical patent/CN102707955A/en
Pending legal-status Critical Current

Landscapes

  • Devices For Executing Special Programs (AREA)

Abstract

The invention relates to a machine learning method based on statistical learning theory. To solve the large-scale classification and optimization problems that arise in the practical realization of an SVM (support vector machine), and to control the time and space cost of the computation, the technical scheme adopted by the invention is a method for realizing a support vector machine with MPI and OpenMP programming. Following the idea of the SVM classification algorithm, the method is realized as follows: serial program code is written in C++, and the relevant statements and functions of OpenMP and MPI are added on top of the serial code to achieve parallelization. The method comprises the following detailed steps: 1) determine the function of each part of the algorithm; use MPI programming to communicate between the algorithm modules, transferring data and achieving synchronization; and 2) add compiler directives with OpenMP in the submodules of the algorithm, whereby the compiler automatically performs thread-level parallelization of the code in the parallel regions enclosed by the directives. The method provided by the invention is mainly applied to machine learning.

Description

Method for realizing a support vector machine with MPI and OpenMP programming
Technical field
The present invention relates to machine learning methods based on statistical learning theory; specifically, it realizes a parallelized support vector classifier based on hybrid MPI and OpenMP programming.
Background technology
1. The support vector machine (SVM)
The support vector machine (Support Vector Machine, SVM) is a machine learning method based on statistical learning theory, proposed by Vapnik et al. It improves the generalization ability of a classifier by constructing the optimal separating hyperplane that maximizes the margin between classes, and it handles nonlinearity, high dimensionality, and local minima well. Compared with traditional neural-network learning methods, SVM minimizes the structural risk, can approximate arbitrary functions with a guarantee of global optimality, and is well suited to small-sample, nonlinear, high-dimensional kernel modeling. At present, SVM has been widely applied to handwriting recognition, text classification, speech recognition, and other fields, with good results.
For the two-class support vector machine, a training set is given:

$$T = \{(x_1, y_1), (x_2, y_2), \dots, (x_l, y_l)\} \in (X \times Y)^l$$

where $x_i \in X = \mathbb{R}^n$, $y_i \in Y = \{1, -1\}$, $i = 1, \dots, l$.

We seek a real-valued function $g(x)$ on $X = \mathbb{R}^n$ such that the decision function $f(x) = \operatorname{sgn}(g(x))$ infers the class label $y$ corresponding to any pattern $x$.
The primal model of the linear two-class SVM is:

$$\min_{\omega,\, b,\, \xi}\ \frac{1}{2}\|\omega\|^2 + C\sum_{i=1}^{n}\xi_i$$
$$\text{subject to: } y_i(\omega^T x_i + b) \ge 1 - \xi_i,\quad i = 1, \dots, n, \tag{1}$$
$$\xi_i \ge 0,\quad i = 1, \dots, n$$
Set up the original Lagrangian function:

$$L(\omega, b, \xi; \alpha, r) = \frac{1}{2}\|\omega\|^2 + C\sum_{i=1}^{n}\xi_i - \sum_{i=1}^{n}\alpha_i\big[y_i(\omega^T x_i + b) - 1 + \xi_i\big] - \sum_{i=1}^{n} r_i \xi_i \tag{2}$$
Setting the partial derivatives to zero, the KKT stationarity conditions are:

$$\frac{\partial L}{\partial \omega} = 0 \Rightarrow \omega = \sum_{i=1}^{n}\alpha_i y_i x_i, \tag{3}$$
$$\frac{\partial L}{\partial b} = 0 \Rightarrow \sum_{i=1}^{n}\alpha_i y_i = 0, \tag{4}$$
$$\frac{\partial L}{\partial \xi_i} = 0 \Rightarrow C - \alpha_i - r_i = 0,\quad i = 1, \dots, n, \tag{5}$$
Substituting these expressions back into $L$ and simplifying yields the objective function:

$$\max_\alpha\ \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i,j=1}^{n}\alpha_i\alpha_j y_i y_j \langle x_i, x_j\rangle, \tag{6}$$
The original problem can thus be written as:

$$\max_\alpha\ \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i,j=1}^{n}\alpha_i\alpha_j y_i y_j \langle x_i, x_j\rangle$$
$$\text{subject to: } 0 \le \alpha_i \le C,\ i = 1, \dots, n, \qquad \sum_{i=1}^{n}\alpha_i y_i = 0 \tag{7}$$

Solving under these conditions gives the decision function $g(x) = \omega^T x + b$.
The kernelized SVM model, for a general kernel $K$:

$$\max_\alpha\ \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i,j=1}^{n}\alpha_i\alpha_j y_i y_j K(x_i, x_j)$$
$$\text{subject to: } 0 \le \alpha_i \le C,\ i = 1, \dots, n, \qquad \sum_{i=1}^{n}\alpha_i y_i = 0 \tag{8}$$
2. Parallel programming
High-performance computers (HPC) can be divided by storage architecture into two broad classes: shared-memory and distributed-memory machines. Popular parallel computer architectures at home and abroad include symmetric multiprocessing shared-memory machines (SMP), distributed shared-memory machines (DSM), massively parallel processors (MPP), and SMP clusters. The SMP architecture has the advantage of low communication latency and can be programmed with multithreading mechanisms (such as Pthreads) or compiler directives (such as OpenMP); it is relatively simple to program, but its shortcoming is poor scalability. The MPP architecture, based on distributed memory, scales well and mainly uses message passing, with implementation standards such as MPI and PVM; however, its inter-processor communication overhead is large and programming is comparatively difficult. At present, hierarchical parallel architectures represented by the SMP cluster are developing rapidly and have become the trend of parallel machine development at home and abroad.
Distributed-memory programming model MPI:
MPI is an implementation standard of the message-passing programming model, developed jointly by academia, government, and industry, and is the mainstream programming model on today's distributed-memory systems. It is not an independent programming language but a library that provides bindings for Fortran and C/C++. MPI is suitable for both shared-memory and distributed-memory parallel computing environments, and programs written with it can run directly on SMP clusters. MPI has the advantages of good portability, powerful functionality, and high efficiency; it is particularly suited to coarse-grained parallelism, is supported by almost all multitasking operating systems (including UNIX and Windows NT), and is the most reliable platform for very large-scale parallel computation today.
Shared-memory programming model OpenMP:
OpenMP is an industrial standard for shared-memory programming whose goal is to provide a portable, scalable development interface for SMP systems. The OpenMP standard describes the parallel mechanisms of shared-memory architectures through a set of compiler directives, a runtime library, and environment variables. The compiler directives extend the programming language, providing support for parallel regions, work sharing, and synchronization constructs, as well as for the sharing and privatization of data. The runtime library and environment variables allow the user to tune the execution environment of a parallel program. OpenMP realizes thread-level parallelism, and inter-thread communication is achieved by reading and writing shared variables.
Hybrid programming model MPI/OpenMP:
To fully exploit the hierarchical memory structure of SMP clusters, the two programming models above can be combined into a hybrid MPI/OpenMP programming model. This model has the same hierarchical structure: the upper-level MPI expresses parallelism between nodes, while the lower-level OpenMP expresses parallelism within a node. It is based on the following decomposition model: first the problem is decomposed with MPI, dividing the task into several communication-sparse parts, each assigned to an SMP node (i.e., a process); nodes communicate by message passing. OpenMP compiler directives are then added to decompose the part on each node once more, distributing it across the processors of the SMP node for execution by multiple threads in parallel; within a node, communication goes through shared memory. The hybrid MPI/OpenMP model thus provides two levels of parallelism, between and within nodes; its contribution is to combine the coarse-grained parallelism of the process level with the fine-grained parallelism of the loop level. The hybrid model suits large applications with an inherently multilevel structure, and practice shows that in many cases it executes more efficiently than pure MPI or pure OpenMP programs.
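For orientation, a minimal hybrid skeleton is sketched below. It is an illustration only, with an invented toy workload, and is not taken from the patented implementation:

// Minimal hybrid MPI/OpenMP skeleton (illustrative sketch; toy workload).
// Compile, e.g.: mpic++ -fopenmp hybrid.cpp
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);              // upper level: one MPI process per SMP node
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long n = 1000000;              // assumed global problem size
    long chunk = n / size;               // each node receives a communication-sparse part
    long begin = rank * chunk;
    long end = (rank == size - 1) ? n : begin + chunk;

    double local_sum = 0.0;
    // lower level: OpenMP threads within the node share the node's part
    #pragma omp parallel for reduction(+:local_sum)
    for (long i = begin; i < end; ++i)
        local_sum += 1.0 / (1.0 + (double)i);   // placeholder per-element work

    double global_sum = 0.0;             // nodes communicate by message passing
    MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0) std::printf("sum = %f\n", global_sum);
    MPI_Finalize();
    return 0;
}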
Summary of the invention
The present invention aims to overcome the deficiencies of the prior art: to solve the large-scale classification and optimization problems in the concrete realization of SVM, and to control the time and space cost of the computation. To achieve this, the technical scheme adopted by the invention is a method for realizing a support vector machine with MPI and OpenMP programming. Following the idea of the SVM classification algorithm, serial program code is written in C++, and the relevant statements and functions of OpenMP and MPI are added on top of the serial code to achieve parallelization. The concrete steps are:
1) determine the function of each part of the algorithm; use MPI programming to communicate between the modules of the algorithm, transferring data and achieving synchronization, so that MPI assigns the modules of the algorithm to different processes and process-level parallelism across the modules is achieved;
2) in the submodules of the algorithm, add OpenMP compiler directives around the code that needs parallelization; at compile time the compiler automatically performs thread-level parallelization of the code inside the parallel regions enclosed by the directives.
Using OpenMP statements, the working-set initialization and gradient initialization submodules of the SVM algorithm are executed in parallel; the update of G(i) according to the states of α_i and α_j is also executed in parallel, and the submodules executed in parallel can communicate rapidly with one another.
Technical features and effects of the present invention:
The parallelized support vector machine effectively solves large-scale classification problems; it can handle not only linear SVMs but also the commonly used kernel functions.
Description of drawings
Fig. 1: flowchart of modeling on the SVM training data.
Fig. 2: flowchart of SVM prediction on data.
Fig. 3: inheritance and composition relations of the classes.
Fig. 4: flowchart of the serial program.
Fig. 5: flowchart of the MPI and OpenMP parallel program.
Embodiment
The technical scheme adopted by the present invention is as follows:
1. Classification with the SVM algorithm
The concrete steps of SVM classification are:
1) Count the total number of classes, record the class labels, and count the number of samples in each class.
2) Group the samples belonging to the same class and store them contiguously.
3) Compute the weight C.
4) Train n(n-1)/2 models:
Initialize the nozero array, to make counting the SVs convenient;
During training the sub-dataset must be rebuilt; the features of the samples are unchanged, but the class labels are changed to +1/-1;
Train on the sub-dataset with svm_train_one;
Update nozero: if an entry is already true it stays unchanged; if it is false, change it to true.
5) Output the model, which mainly means filling in svm_model.
6) Free the memory.
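To make step 4 concrete, the one-vs-one training loop can be sketched as follows. The types and the svm_train_one stub are assumptions for illustration and do not reproduce the interfaces of parallel_svm.cpp:

// Sketch of step 4: train n(n-1)/2 pairwise models (one-vs-one),
// rebuilding each sub-dataset and relabeling its classes to +1/-1.
// All types and the svm_train_one stub are illustrative assumptions.
#include <vector>

struct Sample { std::vector<double> x; };   // features only; the class is implied by grouping
struct Model {};                            // placeholder for one pairwise decision function

// Stub standing in for the real per-pair trainer.
Model svm_train_one(const std::vector<Sample>& subset, const std::vector<int>& y) {
    (void)subset; (void)y;
    return Model{};
}

std::vector<Model> train_one_vs_one(const std::vector<std::vector<Sample>>& byClass) {
    const int n = static_cast<int>(byClass.size());
    std::vector<Model> models;
    models.reserve(n * (n - 1) / 2);
    for (int a = 0; a < n; ++a) {
        for (int b = a + 1; b < n; ++b) {
            std::vector<Sample> subset;     // rebuilt sub-dataset: features unchanged
            std::vector<int> y;             // labels changed to +1 / -1
            for (const Sample& s : byClass[a]) { subset.push_back(s); y.push_back(+1); }
            for (const Sample& s : byClass[b]) { subset.push_back(s); y.push_back(-1); }
            models.push_back(svm_train_one(subset, y));
        }
    }
    return models;
}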
Key problems in the SVM algorithm:
In SVM, maximizing the margin requires minimizing the norm of ω (the margin is 2/‖ω‖, so the smaller the norm, the larger the margin), where:
ω: the weight vector of the separating hyperplane;
C: the penalty coefficient; if some x belongs to one class but strays across the boundary into the territory of the other class, a larger C means we are less willing to give up that point, and the margin shrinks;
ξ_i: the slack variables.
Because SVM is a convex quadratic programming problem, and a convex quadratic program has an optimal solution, the problem converts to the following form (the KKT conditions):

$$\alpha_i = 0 \Leftrightarrow y_i u_i \ge 1 \quad (a)$$
$$0 < \alpha_i < C \Leftrightarrow y_i u_i = 1 \quad (b)$$
$$\alpha_i = C \Leftrightarrow y_i u_i \le 1 \quad (c)$$
Here the α_i are the Lagrange multipliers (the problem is solved by the method of Lagrange multipliers).
Case (a) shows that x_i is a normally classified point, inside the boundary (we know that a correctly classified point satisfies y_i · f(x_i) ≥ 0);
Case (b) shows that x_i is a support vector, on the boundary;
Case (c) shows that x_i lies between the two boundaries.
The optimal solution must satisfy the KKT conditions, i.e., conditions (a), (b), and (c) must all hold.
The KKT conditions fail to hold in the following situations:
y_i u_i ≤ 1 but α_i < C: violated, since α_i should be C;
y_i u_i ≥ 1 but α_i > 0: violated, since α_i should be 0;
y_i u_i = 1 but α_i = 0 or α_i = C: violated, since 0 < α_i < C should hold.
So we find the α_i that violate the KKT conditions and update them. But the α_i are also subject to another constraint, namely

$$\sum_{i=1}^{l}\alpha_i y_i = 0$$

Therefore, by another method, we update two multipliers α_i and α_j simultaneously so that the equality y_iα_i + y_jα_j = constant holds, which guarantees that the zero-sum constraint is preserved.
Using y_iα_i + y_jα_j = constant to eliminate α_i, we obtain a convex quadratic program in the single variable α_j; ignoring its constraint 0 ≤ α_j ≤ C, its solution is:
$$\alpha_j^{new} = \alpha_j + \frac{y_j(E_i - E_j)}{\eta}$$

Here $E_i = u_i - y_i$ and $\eta = K(x_i, x_i) + K(x_j, x_j) - 2K(x_i, x_j)$.
Then, taking the constraint 0 ≤ α_j ≤ C into account, the analytic solution for α_j is:

$$\alpha_j^{new,clipped} = \begin{cases} H & \text{if } \alpha_j^{new} \ge H \\ \alpha_j^{new} & \text{if } L < \alpha_j^{new} < H \\ L & \text{if } \alpha_j^{new} \le L \end{cases}$$

$$\begin{aligned} L = \max(0,\ \alpha_j - \alpha_i),\quad H = \min(C,\ C + \alpha_j - \alpha_i) \quad & \text{if } y_i \ne y_j \\ L = \max(0,\ \alpha_j + \alpha_i - C),\quad H = \min(C,\ \alpha_j + \alpha_i) \quad & \text{if } y_i = y_j \end{aligned}$$

For α_i we have $\alpha_i^{new} = \alpha_i + y_i y_j(\alpha_j - \alpha_j^{new,clipped})$.
The second multiplier α_j is chosen to satisfy the condition: max |E_i - E_j|.
The update of b:

$$b_1 = b - E_i - y_i(\alpha_i^{new} - \alpha_i^{old})K(x_i, x_i) - y_j(\alpha_j^{new} - \alpha_j^{old})K(x_i, x_j)$$
$$b_2 = b - E_j - y_i(\alpha_i^{new} - \alpha_i^{old})K(x_i, x_j) - y_j(\alpha_j^{new} - \alpha_j^{old})K(x_j, x_j)$$

b is then updated according to the condition:

$$b := \begin{cases} b_1 & \text{if } 0 < \alpha_i < C \\ b_2 & \text{if } 0 < \alpha_j < C \\ (b_1 + b_2)/2 & \text{otherwise} \end{cases}$$

Finally, after all the α_i and b have been updated, the model is obtained.
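For illustration, the pair update described above can be sketched in serial C++ as follows. The SmoState container, the kernel callback K, and the omission of the error-cache refresh are assumptions of this sketch, not the patented code:

// Serial sketch of one SMO pair update (i, j), following the equations above.
#include <vector>
#include <algorithm>

struct SmoState {
    std::vector<double> alpha, E;   // multipliers and cached errors E_i = u_i - y_i
    std::vector<int> y;             // labels in {+1, -1}
    double b = 0.0, C = 1.0;        // threshold and penalty coefficient
};

bool update_pair(SmoState& s, int i, int j, double (*K)(int, int)) {
    const double eta = K(i, i) + K(j, j) - 2.0 * K(i, j);
    if (eta <= 0.0) return false;                    // skip non-positive curvature

    const double ai_old = s.alpha[i], aj_old = s.alpha[j];
    double aj = aj_old + s.y[j] * (s.E[i] - s.E[j]) / eta;

    // Clip alpha_j to the feasible segment [L, H].
    double L, H;
    if (s.y[i] != s.y[j]) { L = std::max(0.0, aj_old - ai_old);       H = std::min(s.C, s.C + aj_old - ai_old); }
    else                  { L = std::max(0.0, aj_old + ai_old - s.C); H = std::min(s.C, aj_old + ai_old); }
    aj = std::min(std::max(aj, L), H);

    const double ai = ai_old + s.y[i] * s.y[j] * (aj_old - aj);

    // Update the threshold b from b1 / b2 as in the text.
    const double b1 = s.b - s.E[i] - s.y[i] * (ai - ai_old) * K(i, i) - s.y[j] * (aj - aj_old) * K(i, j);
    const double b2 = s.b - s.E[j] - s.y[i] * (ai - ai_old) * K(i, j) - s.y[j] * (aj - aj_old) * K(j, j);
    if      (0.0 < ai && ai < s.C) s.b = b1;
    else if (0.0 < aj && aj < s.C) s.b = b2;
    else                           s.b = 0.5 * (b1 + b2);

    s.alpha[i] = ai;
    s.alpha[j] = aj;
    return true;                                     // the caller refreshes the error cache
}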
2. OpenMP and MPI
1) OpenMP programming:
OpenMP adopts the fork-join programming model: when execution begins, only the main thread exists. When the main thread encounters a point where parallel computation is needed, it forks worker threads to carry out the parallel task. During parallel execution the main thread and the forked threads cooperate. When the parallel code finishes executing, the forked threads exit or are suspended and stop working, and control flow joins back into the single main thread. Parallelism is realized through compiler directives and runtime library functions.
Loop parallelization:
[Figure: loop-parallelization directive example; the listing is an image in the original. A sketch is given after the parallel-region syntax below.]
Parallel-region compiler directive:
#pragma omp parallel [clause[ clause]…]
block
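Since the loop example above survives only as an image, a minimal sketch of loop parallelization is given here; the loop body is an invented illustration:

// Minimal loop-parallelization sketch (the original example is lost to an image).
// The compiler splits the loop iterations across the threads of the team.
#include <vector>

void scale_vector(std::vector<double>& G, double factor) {
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(G.size()); ++i)
        G[i] *= factor;                  // each thread handles a disjoint chunk of G
}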
2) MPI design:
MPI is based on a message-passing mechanism. The parts executing in parallel exchange information, coordinate their pace, and control execution by passing messages among one another; messages are sent and received in parallel within a communication domain.
The parallelization is realized mainly through the following functions:
MPI_Recv;
MPI_Send;
MPI_Allgather;
MPI_Allreduce;
MPI_Bsend;
MPI_Bcast.
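As an illustration of how these calls combine, the following sketch lets each process compute a partial result and then makes the combined result consistent everywhere. The buffer contents and sizes are invented for illustration, and only calls named above are used:

// Sketch: every process computes a partial vector, then all processes
// obtain the combined result. Compile, e.g.: mpic++ allreduce_sketch.cpp
#include <mpi.h>
#include <vector>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 8;                                  // assumed vector length
    std::vector<double> local(n), global(n, 0.0);
    for (int i = 0; i < n; ++i)
        local[i] = 0.1 * rank + i;                    // placeholder partial results

    // Element-wise sum across all processes, delivered to every process,
    // keeping replicated state (e.g., a gradient vector) consistent.
    MPI_Allreduce(local.data(), global.data(), n, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    // Root broadcasts a scalar decision (e.g., a stop flag) to all processes.
    int keep_going = 1;
    MPI_Bcast(&keep_going, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) std::printf("global[0] = %f, keep_going = %d\n", global[0], keep_going);
    MPI_Finalize();
    return 0;
}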
3. OpenMP and MPI parallelization of the SVM algorithm
Following the idea of the SVM classification algorithm, the serial program code is written in C++. On top of the serial code, the relevant statements and functions of OpenMP and MPI (the functions listed above) are added to achieve parallelization.
1) Determine the function of each part of the algorithm; use MPI programming to communicate between the modules of the algorithm, transferring data and achieving synchronization, so that MPI assigns the modules of the algorithm to different processes and process-level parallelism across the modules is achieved.
2) In the submodules of the algorithm, add OpenMP compiler directives around the code that needs parallelization; at compile time the compiler automatically performs thread-level parallelization of the code inside the parallel regions enclosed by the directives.
The parallel programming process is introduced in detail below. In this software, the core ideas of the algorithm and the core programming are embodied in the file parallel_svm.cpp. parallel_svm_train.c and parallel_svm_predict.c implement data training and prediction, respectively, by calling the related functions in parallel_svm.cpp.
Below are the inheritance and composition relations of the classes in parallel_svm.cpp; solid lines denote inheritance and dotted lines denote composition:
As shown in Figure 3.
Class Cache: mainly responsible for managing the memory involved in the computation, including allocation, release, etc.
Class Kernel: implements the different kernel function types and carries out the computation for each type of kernel.
Class c-svc: handles the external parameters specified by the user at training time, such as the kernel function type.
Class Solver: contains the core functions of the algorithm, implemented in parallelized form; it trains the model, obtaining g(x) = ω^T x + b, and carries out data prediction.
Class Solver_NU: derived from Solver.
The concrete processing modules of class Solver are: initialize the α states, initialize the working set, initialize the gradient, update G(i) according to the states of α_i and α_j, update the α states, compute the objective function value, and return the result. (For the detailed process, see Sequential Minimal Optimization for SVM.)
Without parallelization, the serial processing flow is as follows:
As shown in Figure 4.
In this software, MPI and OpenMP are used for two-level parallelization. The process is as follows, with the MPI level of parallelism inside the dashed box, as shown in Figure 5.
The above is the detailed process by which MPI and OpenMP parallelization realizes the SVM.
The present invention is further explained below through a concrete example.
1. Introduction to the parallelized support vector machine (parallel_svm)
Parallel_svm is provided as C++ source files, which users can compile themselves to generate executables. Executables for the Windows operating system are also provided, including: svmtrain.exe, which performs the SVM training; svmpredict.exe, which predicts on a data set according to a trained support vector machine model; and svmscale.exe, which performs simple scaling operations on training and test data. They can be used directly in the DOS environment.
The general steps for using parallel_svm are:
1) prepare the data set in the format required by the parallel_svm software package;
2) perform simple scaling operations on the data;
3) consider choosing the RBF kernel function K(x, y) = exp(−γ‖x − y‖²);
4) train on the whole training set with parameters C and γ to obtain a support vector machine model;
5) use the obtained model for testing and prediction.
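Under these steps, an end-to-end session in the DOS environment might look as follows (a sketch: the train3/test3 file names are taken from the use-cases later in this description, and the output file name test3.predict is an illustrative assumption):

parallel_svm_scale -s train3.range train3 > train3.scale
parallel_svm_scale -r train3.range test3 > test3.scale
parallel_svm_train train3.scale train3.model
parallel_svm_predict test3.scale train3.model test3.predict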
2. The data format used by parallel_svm
The training and test data files used by parallel_svm have the following format:
<label> <index1>:<value1> <index2>:<value2> …
Here <label> is the target value of the training data; for classification it is the integer identifying a class (multiple classes are supported). <index> is an integer starting from 1, representing the feature number; <value> is a real number, the feature value (the independent variable we usually speak of). When a feature value is 0, the feature number and feature value may be omitted together, i.e., the indices may be non-consecutive natural numbers. <label> and the first feature number, and each preceding feature value and following feature number, are separated by spaces. The labels in the test data file are only used to compute accuracy or error; if they are unknown, the column can be filled with any number, or left empty. For example:
+1 1:0.708 2:1 3:1 4:-0.320 5:-0.105 6:-1 8:1.21
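A minimal C++ sketch of reading one line in this format is given below; it is an illustration only and does not reproduce parallel_svm's actual reader:

// Sketch: parse one line of the sparse <label> <index>:<value> format.
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct Record {
    double label;                                    // target value
    std::vector<std::pair<int, double>> features;    // (index, value) pairs; omitted pairs are 0
};

Record parse_line(const std::string& line) {
    Record r{};
    std::istringstream in(line);
    in >> r.label;                                   // leading <label>
    std::string tok;
    while (in >> tok) {                              // remaining tokens are <index>:<value>
        const auto colon = tok.find(':');
        if (colon == std::string::npos) continue;    // skip malformed tokens
        r.features.emplace_back(std::stoi(tok.substr(0, colon)),
                                std::stod(tok.substr(colon + 1)));
    }
    return r;
}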
For convenience, one can write a small routine that converts one's own customary data format into this format, following the format requirements above, for direct use by parallel_svm. For example, the format-conversion function write4parallel_svm used in MATLAB is as follows:
[Figure: body of the MATLAB function write4parallel_svm; the listing is an image in the original.]
3. Usage of parallel_svm_scale
The purposes of scaling the data set are: 1) to avoid feature values in some ranges being too large while those in other ranges are too small; 2) to avoid numerical difficulties when inner products are computed during training in order to evaluate kernel functions. Data are therefore usually scaled to [−1, 1] or [0, 1].
Usage: parallel_svm_scale [-l lower] [-u upper] [-y y_lower y_upper] [-s save_filename] [-r restore_filename] filename
(default values: lower = −1, upper = 1, no scaling of y)
Wherein,
-l: data lower-limit flag; lower: the lower limit of the data after scaling;
-u: data upper-limit flag; upper: the upper limit of the data after scaling;
-y: whether to scale the target values as well; y_lower is the lower limit and y_upper the upper limit;
-s save_filename: save the scaling rule to the file save_filename;
-r restore_filename: load the scaling-rule file restore_filename and then scale accordingly;
filename: the data file to be scaled (it must satisfy the format described above).
The scaling-rule file can be opened with a text editor; its format is:
lower upper
<index1> lval1 uval1
<index2> lval2 uval2
Here lower and upper have the same meanings as the lower and upper set in the usage above; index is the feature number; lval is the feature value of this feature that maps to the lower limit lower after conversion, and uval is the feature value that maps to the upper limit upper after conversion.
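The linear mapping implied by such a rule entry can be sketched as follows (an illustration; the actual code of parallel_svm_scale is not part of this description):

// Sketch: apply one rule entry, mapping lval -> lower and uval -> upper linearly.
double scale_value(double x, double lval, double uval, double lower, double upper) {
    if (uval == lval) return lower;                  // constant feature: degenerate range
    return lower + (upper - lower) * (x - lval) / (uval - lval);
}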
The scaled data set is in this case written through the DOS window output; of course, the DOS file-redirection operator ">" can be used to save the result to a specified file.
Examples:
1) parallel_svm_scale -s train3.range train3 > train3.scale
With the default values (i.e., feature values are scaled to the range [−1, 1] and target values are not scaled), the data set train3 is scaled; the resulting scaling-rule file is saved as train3.range, and the scaled data set is saved as train3.scale.
2) parallel_svm_scale -r train3.range test3 > test3.scale
After loading the scaling rule train3.range, the data set test3 is scaled linearly according to each feature's corresponding bound values and the limit values; the result is saved as test3.scale.
4. Usage of parallel_svm_train
Parallel_svm_train trains on the training data set and produces the SVM model.
Usage: parallel_svm_train [options] training_set_file [model_file]
Wherein,
options (operating parameters): the available options and the meanings they represent are as follows:
-t kernel_type: sets the kernel function type; the default is 2. The available types are:
0 -- linear kernel: u'*v;
1 -- polynomial kernel: (γ*u'*v + coef0)^degree;
2 -- RBF kernel: exp(−γ*‖u − v‖²);
3 -- sigmoid kernel: tanh(γ*u'*v + coef0);
-d degree: sets the degree in the kernel function; the default is 3;
-g γ: sets γ in the kernel function; the default is 1/k;
-r coef0: sets coef0 in the kernel function; the default is 0;
-c cost: sets the penalty coefficient C in C-SVC; the default is 1;
-m cachesize: sets the cache size in MB; the default is 40;
-e ε: sets the tolerance of the stopping criterion; the default is 0.001;
-h shrinking: whether to use the shrinking heuristic; the value is 0 or 1, and the default is 1;
-wi weight: weights the penalty coefficient C for each class of samples; the default is 1.
These parameters can be combined arbitrarily according to the SVM type and the parameters supported by the kernel function; if a parameter has no effect for the chosen kernel function or SVM type, the program will not accept it, and if a required parameter is set incorrectly, the default value is used. training_set_file is the data set to be trained on; model_file is the model file produced when training finishes. If this parameter is not given, a default file name is used; it can also be set to a file name of one's own habit.
Example:
1) parallel_svm_train train3.scale train3.model
Train on train3.scale, store the model in the file train3.model, and output the following results in the DOS window:
optimization finished, #iter = 1756
nu = 0.464223
obj = -551.002342, rho = -0.337784
nSV = 604, nBSV = 557
Total nSV = 604
Here #iter is the number of iterations; nu is the parameter of the selected kernel function type; obj is the minimum value obtained by solving the quadratic programming problem into which the SVM is converted; rho is the constant term b of the decision function; nSV is the number of support vectors; nBSV is the number of support vectors on the boundary; and Total nSV is the total number of support vectors.
The trained model is saved as the file train3.model; opening it with a text editor such as Notepad shows the following contents:
svm_type c_svc          % the SVM type used in training, here C-SVC
kernel_type rbf         % the kernel function type used in training, here the RBF kernel
gamma 0.047619          % same meaning as g in the operating parameters
nr_class 2              % the number of classes, here a two-class problem
total_sv 604            % the total number of support vectors
rho -0.337784           % the constant term b in the decision function
label 0 1               % the class labels
nr_sv 314 290           % the number of support vectors for each class label
SV                      % the support vectors follow below
1 1:-0.963808 2:0.906788...19:-0.197706 20:-0.928853 21:-1
1 1:-0.885128 2:0.768219...19:-0.452573 20:-0.980591 21:-1
... ... ...
1 1:-0.847359 2:0.485921...19:-0.541457 20:-0.989077 21:-1
5. Usage of parallel_svm_predict
Parallel_svm_predict predicts on a data set according to the model obtained from training.
Usage: parallel_svm_predict test_file model_file output_file
Here model_file is the model file produced by parallel_svm_train; test_file is the data file on which prediction is to be carried out; output_file is the output file of parallel_svm_predict, containing the predicted result values.

Claims (2)

1. A method for realizing a support vector machine with MPI and OpenMP programming, characterized in that, following the idea of the SVM classification algorithm, serial program code is written in C++ and the relevant statements and functions of OpenMP and MPI are added on top of the serial code to achieve parallelization, the concrete steps being:
1) determine the function of each part of the algorithm; use MPI programming to communicate between the modules of the algorithm, transferring data and achieving synchronization, so that MPI assigns the modules of the algorithm to different processes and process-level parallelism across the modules is achieved;
2) in the submodules of the algorithm, add OpenMP compiler directives around the code that needs parallelization; at compile time the compiler automatically performs thread-level parallelization of the code inside the parallel regions enclosed by the directives.
2. The method for realizing a support vector machine with MPI and OpenMP programming according to claim 1, characterized in that OpenMP statements are used to execute the working-set initialization and gradient initialization submodules of the SVM algorithm in parallel; the update of G(i) according to the states of α_i and α_j is also executed in parallel, and the submodules executed in parallel can communicate rapidly with one another.
CN2012101566351A 2012-05-18 2012-05-18 Method for realizing support vector machine by MPI programming and OpenMP programming Pending CN102707955A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012101566351A CN102707955A (en) 2012-05-18 2012-05-18 Method for realizing support vector machine by MPI programming and OpenMP programming

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2012101566351A CN102707955A (en) 2012-05-18 2012-05-18 Method for realizing support vector machine by MPI programming and OpenMP programming

Publications (1)

Publication Number Publication Date
CN102707955A true CN102707955A (en) 2012-10-03

Family

ID=46900779

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012101566351A Pending CN102707955A (en) 2012-05-18 2012-05-18 Method for realizing support vector machine by MPI programming and OpenMP programming

Country Status (1)

Country Link
CN (1) CN102707955A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103019843A (en) * 2012-12-05 2013-04-03 北京奇虎科技有限公司 MPI (message passing interface) function calling method and device
CN104360941A (en) * 2014-11-06 2015-02-18 浪潮电子信息产业股份有限公司 Method for improving stream benchmark test performance of computing cluster by adopting MPI and OpenMP compiling
CN104375838A (en) * 2014-11-27 2015-02-25 浪潮电子信息产业股份有限公司 OpenMP (open mesh point protocol) -based astronomy software Griding optimization method
CN105260175A (en) * 2015-09-16 2016-01-20 浪潮(北京)电子信息产业有限公司 Method for processing Gridding in astronomy software based on OpenMP
CN106886569A (en) * 2017-01-13 2017-06-23 重庆邮电大学 A kind of ML KNN multi-tag Chinese Text Categorizations based on MPI
CN107092644A (en) * 2017-03-07 2017-08-25 重庆邮电大学 A kind of Chinese Text Categorization based on MPI and Adaboost.MH
CN107301081A (en) * 2017-07-04 2017-10-27 广东工业大学 A kind of VaR metering methods and system
CN110070138A (en) * 2019-04-26 2019-07-30 河南萱闱堂医疗信息科技有限公司 The method that excreta picture carries out automatic scoring before surveying to colon microscopy
US11099790B2 (en) 2019-01-10 2021-08-24 Samsung Electronics Co., Ltd. Parallel key value based multithread machine learning leveraging KV-SSDS
CN113553266A (en) * 2021-07-23 2021-10-26 湖南大学 Parallelism detection method, system, terminal and readable storage medium of serial program based on parallelism detection model

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20080066402A (en) * 2007-01-12 2008-07-16 재단법인서울대학교산학협력재단 The method of designing parallel embedded software with common intermediate code
CN101819651A (en) * 2010-04-16 2010-09-01 浙江大学 Method for parallel execution of particle swarm optimization algorithm on multiple computers

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20080066402A (en) * 2007-01-12 2008-07-16 재단법인서울대학교산학협력재단 The method of designing parallel embedded software with common intermediate code
CN101819651A (en) * 2010-04-16 2010-09-01 浙江大学 Method for parallel execution of particle swarm optimization algorithm on multiple computers

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
KRISTIAN WOODSEND et al.: "Hybrid MPI/OpenMP parallel support vector machine training", Technical Report ERGO *
YUN HE et al.: "Using Hybrid MPI-OpenMP UPC and CAF at NERSC", http://www.nersc.gov/assets/NUG-Meetings/2012/Hybrid-MPI-OpenMP-UPC-CAF-NUG2012.pdf *
CHEN Yong et al.: "Research on Hybrid Programming Model on SMP Cluster" (SMP机群混合编程模型研究), Mini-Micro Systems (小型微型计算机系统) *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103019843A (en) * 2012-12-05 2013-04-03 北京奇虎科技有限公司 MPI (message passing interface) function calling method and device
CN103019843B (en) * 2012-12-05 2016-05-11 北京奇虎科技有限公司 MPI function calling method and device
CN104360941A (en) * 2014-11-06 2015-02-18 浪潮电子信息产业股份有限公司 Method for improving stream benchmark test performance of computing cluster by adopting MPI and OpenMP compiling
CN104375838B (en) * 2014-11-27 2017-06-06 浪潮电子信息产业股份有限公司 OpenMP (open mesh point protocol) -based astronomy software Griding optimization method
CN104375838A (en) * 2014-11-27 2015-02-25 浪潮电子信息产业股份有限公司 OpenMP (open mesh point protocol) -based astronomy software Griding optimization method
CN105260175B (en) * 2015-09-16 2019-01-25 浪潮(北京)电子信息产业有限公司 The processing method of astronomy software Gridding based on OpenMP
CN105260175A (en) * 2015-09-16 2016-01-20 浪潮(北京)电子信息产业有限公司 Method for processing Gridding in astronomy software based on OpenMP
CN106886569A (en) * 2017-01-13 2017-06-23 重庆邮电大学 A kind of ML KNN multi-tag Chinese Text Categorizations based on MPI
CN106886569B (en) * 2017-01-13 2020-05-12 重庆邮电大学 ML-KNN multi-tag Chinese text classification method based on MPI
CN107092644A (en) * 2017-03-07 2017-08-25 重庆邮电大学 A kind of Chinese Text Categorization based on MPI and Adaboost.MH
CN107301081A (en) * 2017-07-04 2017-10-27 广东工业大学 A kind of VaR metering methods and system
US11099790B2 (en) 2019-01-10 2021-08-24 Samsung Electronics Co., Ltd. Parallel key value based multithread machine learning leveraging KV-SSDS
CN110070138A (en) * 2019-04-26 2019-07-30 河南萱闱堂医疗信息科技有限公司 The method that excreta picture carries out automatic scoring before surveying to colon microscopy
CN113553266A (en) * 2021-07-23 2021-10-26 湖南大学 Parallelism detection method, system, terminal and readable storage medium of serial program based on parallelism detection model

Similar Documents

Publication Publication Date Title
CN102707955A (en) Method for realizing support vector machine by MPI programming and OpenMP programming
Guo et al. Learning dynamics and heterogeneity of spatial-temporal graph data for traffic forecasting
Reyes-Ortiz et al. Big data analytics in the cloud: Spark on hadoop vs mpi/openmp on beowulf
Grewe et al. A static task partitioning approach for heterogeneous systems using OpenCL
Schneider et al. Elastic scaling of data parallel operators in stream processing
WO2017132603A1 (en) Integration of quantum processing devices with distributed computers
Kimovski et al. Parallel alternatives for evolutionary multi-objective optimization in unsupervised feature selection
Wu et al. Autotuning polybench benchmarks with llvm clang/polly loop optimization pragmas using bayesian optimization
Drebes et al. Topology-aware and dependence-aware scheduling and memory allocation for task-parallel languages
Houtekamer et al. Parallel implementation of an ensemble Kalman filter
Liu et al. Multi-Level Partitioning and Distribution of the Assignment Problem for Large-Scale Multi-Robot Task Allocation.
Zhang et al. Optimizing streaming parallelism on heterogeneous many-core architectures
González-Domínguez et al. Parallel feature selection for distributed-memory clusters
CN110705716A (en) Multi-model parallel training method
Gong et al. Evolutionary computation in China: A literature survey
Senagi et al. Parallel construction of Random Forest on GPU
Zeng et al. A note on learning rare events in molecular dynamics using lstm and transformer
Hajewski et al. Distributed SmSVM ensemble learning
Zhang et al. Building energy consumption prediction based on temporal-aware attention and energy consumption states
Backenköhler et al. Control variates for stochastic simulation of chemical reaction networks
Singh et al. Adaptive load balancing in cluster computing environment
Dong et al. Auto-HPCnet: An Automatic Framework to Build Neural Network-based Surrogate for High-Performance Computing Applications
Sivan et al. Incremental sensitivity analysis for kernelized models
Park et al. Adaptive scheduling on heterogeneous systems using support vector machine
Cohen Parallelization via constrained storage mapping optimization

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20121003