CN106407561A - A division method of the parallel GPDT algorithm on a multi-core SOC - Google Patents

A division method of the parallel GPDT algorithm on a multi-core SOC Download PDF

Info

Publication number
CN106407561A
CN106407561A (application CN201610832540.5A; granted as CN106407561B)
Authority
CN
China
Prior art keywords
core
matrix
row
algorithm
main core
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610832540.5A
Other languages
Chinese (zh)
Other versions
CN106407561B (en)
Inventor
韩军
轩四中
袁腾跃
曾晓洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University
Priority to CN201610832540.5A
Publication of CN106407561A
Application granted
Publication of CN106407561B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/30 Circuit design
    • G06F30/39 Circuit design at the physical level
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model based on the proximity to a decision surface, e.g. support vector machines

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Hardware Design (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Geometry (AREA)
  • Complex Calculations (AREA)

Abstract

The invention relates to the technical field of integrated-circuit design and provides a method for partitioning the parallel GPDT algorithm across a multi-core SoC. The parallel GPDT algorithm consists of two nested iteration loops: the inner iteration solves the QP subproblem on the working set, and the outer iteration updates the working set. Regarding the critical paths for computing speed, the critical path of the outer loop is the gradient update, and the critical path of the inner loop is the matrix-vector computation that follows each projection; both of these matrix operations require multi-core parallel processing. The remaining operations can only be carried out serially on the main core, including the gradient-projection operation, realized with the Dai-Fletcher algorithm, and the working-set update, realized by introducing the quicksort algorithm. The vector obtained when the computation finishes gives the support vectors of the training data of the GPDT algorithm.

Description

A method for partitioning the parallel GPDT algorithm on a multi-core SoC
Technical field
The invention belongs to the technical field of integrated-circuit design, and specifically relates to a method for partitioning the parallel GPDT algorithm on a multi-core SoC.
Background technology
The GPDT algorithm is a decomposition method for the original QP problem, proposed by Zanni et al. The number of working-set variables in each of its iterations is on the order of 10^2 to 10^3, so the algorithm reaches convergence after only a few iterations. Although the amount of computation per iteration is relatively large, the complex calculations can be distributed to multiple processors by means of parallelization, thereby obtaining a faster training speed.
The original expression of the support-vector-machine problem is the quadratic program

min f(α) = (1/2) αᵀGα − αᵀ1,  subject to yᵀα = 0, 0 ≤ α_i ≤ C, i = 1, …, l,

where G is an l × l matrix, referred to as the kernel matrix, with entries G_ij = y_i y_j K(x_i, x_j), and K(·, ·) is the kernel function.
The decomposition of the problem splits the vector to be solved, α, into two parts: one part is the working set, denoted B, and the other part is the non-working set, denoted N. Correspondingly, the vector to be solved, the sample-label vector and the kernel matrix are all decomposed into the following form:

α = [α_B, α_N],  y = [y_B, y_N],  G = [[G_BB, G_BN], [G_NB, G_NN]].
After simplification, the decomposed QP subproblem is converted into the following form:

min f_B(α_B) = (1/2) α_Bᵀ G_BB α_B + (G_BN α_N − 1)ᵀ α_B,  subject to y_Bᵀ α_B = −y_Nᵀ α_N, 0 ≤ α_i ≤ C, i ∈ B.
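The B/N block decomposition above can be sketched with NumPy index arithmetic; the matrix and the index sets below are illustrative stand-ins, not data from the patent:

```python
import numpy as np

l = 6
G = np.arange(l * l, dtype=float).reshape(l, l)   # stand-in for the kernel matrix
alpha = np.ones(l)
B = [0, 2, 5]                                     # working-set indices
N = [i for i in range(l) if i not in B]           # non-working-set indices

G_BB = G[np.ix_(B, B)]   # n_B x n_B block used by the QP subproblem
G_BN = G[np.ix_(B, N)]   # couples the working set to the fixed variables
alpha_B, alpha_N = alpha[B], alpha[N]

# The subproblem only optimizes alpha_B; alpha_N stays fixed during the solve,
# entering only through the linear term G_BN @ alpha_N.
linear_term = G_BN @ alpha_N
```

`np.ix_` builds the cross-product index mesh, so `G[np.ix_(B, B)]` extracts exactly the G_BB block of the decomposition.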
The solution procedure of the QP subproblem is divided into four main steps; the final result is obtained by loop iteration, and the termination condition of the iteration is the KKT (Karush-Kuhn-Tucker) condition.
The concrete steps of the algorithm are as follows:
Step 1: Initialization.
Initialize the vector α to 0, then select two integers n_B and n_C such that 0 < n_C ≤ n_B ≤ l and n_C is even. Randomly select n_B elements from α to form the working set B; the remaining elements form the non-working set N. Set the outer iteration counter k = 1.
Step 2: Solve the QP subproblem.
Let u denote the variable of the QP subproblem; its solution becomes the new working-set part α_B^(k), while the non-working-set part is kept unchanged.
Step 2.1: Initialization.
Let g^(0) denote the initial gradient, and set u^(0) to the current working-set part of α. Choose a descent step length ρ_0 ∈ [ρ_min, ρ_max], where ρ_min and ρ_max are preset values satisfying 0 < ρ_min < ρ_max. Set the inner iteration counter k′ = 0.
Step 2.2: Projection.
Let P_Ω(·) denote projection onto the feasible region Ω. First judge whether the vector u^(k′) meets the termination condition; if so, terminate the iteration, otherwise compute the gradient-descent direction with the following formula:

d^(k′) = P_Ω(u^(k′) − ρ_k′ g^(k′)) − u^(k′).
Step 2.3: Matrix multiplication.
Compute the matrix-vector product

z^(k′) = A d^(k′),

where A = G_BB is the n_B × n_B working-set block of the kernel matrix.
Step 2.4: Line search.
Compute the coefficient λ_k′ with a line-search method and update the vector to be solved:

u^(k′+1) = u^(k′) + λ_k′ d^(k′).
Step 2.5: Update.
Compute the quantities of iteration k′ + 1; in particular, the gradient can be updated incrementally as

g^(k′+1) = g^(k′) + λ_k′ z^(k′).

Then compute the new gradient-descent step length ρ_{k′+1}, set the inner iteration counter k′ = k′ + 1, and return to step 2.2.
Step 3: Gradient update.
After the k-th iteration, update the gradient of the objective function with respect to the vector α:

g^(k) = g^(k−1) + G_LB (α_B^(k) − α_B^(k−1)),

where G_LB is the l × n_B block of G formed by the columns corresponding to the working set B. After the update, if α^(k) meets the KKT condition, terminate the iteration; otherwise proceed to the next step.
Step 4: Working-set update.
First solve the selection problem that identifies a sparse steepest-descent direction with at most n_C nonzero components. Then take out the α_i corresponding to the nonzero terms of the result to form a new working set B̄; the number of nonzero terms is at most n_C. Next, take elements out of the old working set B and fill them into B̄ until B̄ reaches n_B elements. Finally set B = B̄ and k = k + 1, then return to step 2.
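A minimal sketch of this refill procedure is given below. Python's built-in `sorted` stands in for the quicksort that the description later mentions, and scoring candidates by |α_i| is a simplifying assumption in place of the actual selection subproblem:

```python
def update_working_set(alpha, old_B, n_B, n_C):
    # Pick at most n_C indices with nonzero alpha_i, preferring large |alpha_i|
    # (a simplified stand-in for the patent's selection subproblem) ...
    nonzero = [i for i, a in enumerate(alpha) if a != 0.0]
    new_B = sorted(nonzero, key=lambda i: -abs(alpha[i]))[:n_C]
    # ... then top the set up from the old working set until it has n_B elements.
    for i in old_B:
        if len(new_B) >= n_B:
            break
        if i not in new_B:
            new_B.append(i)
    return new_B

alpha = [0.0, 0.9, 0.0, 0.4, 0.7, 0.0]
B = update_working_set(alpha, old_B=[0, 2, 3, 5], n_B=4, n_C=2)
```

Sorting makes the "take the largest nonzero terms first" rule explicit; any O(n log n) sort serves the same role the quicksort plays in the serial main-core step.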
The advantage of the GPDT algorithm is that the number of working-set elements in each iteration can reach the order of 10^3, so the algorithm converges rapidly; however, since a single iteration contains a large number of matrix operations, the amount of computation is very large.
Content of the invention
It is an object of the present invention to provide a method for partitioning the parallel GPDT algorithm on a multi-core SoC, so as to greatly shorten the computation time of a single iteration and thereby improve the running efficiency of the whole training algorithm.
The overall idea of the partitioning method provided by the present invention is to distribute the n_B elements of the working set B evenly over N processors, with each processor holding a local backup of the training data, so that the matrix operations can easily be assigned to the N processors for execution. From the basic principle of the algorithm it can be seen that the parallelism of the algorithm is concentrated mainly in step 2 and step 3, the two steps in which the matrix operations are relatively concentrated.
The partitioning method of the parallel GPDT algorithm on the multi-core SoC comprises two parts, row decomposition and column decomposition, introduced in detail below.
Row decomposition method, comprising three steps: decompose the matrix by rows, compute in parallel, splice the results.
In the initialization of step 2.1, the initial gradient requires the product A u, where A is an n_B × n_B matrix and u is an n_B × 1 column vector, so the result A u is also an n_B × 1 column vector. First, matrix A is decomposed by rows into blocks A_n1, …, A_nN, where each A_ni is an n_p × n_B matrix. Then each core computes the value of A_ni u. Finally, the main core splices the results of the cores together; the concatenation [A_n1 u; …; A_nN u] is the result of A u.
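The row decomposition can be sketched as follows; `np.array_split` stands in for the distribution of row blocks A_ni to the cores, and the final concatenation plays the role of the splice performed by the main core:

```python
import numpy as np

def matvec_by_rows(A, u, n_cores):
    # Split A into row blocks A_ni, one per core ...
    row_blocks = np.array_split(A, n_cores, axis=0)
    # ... each "core" computes its fragment A_ni @ u independently ...
    fragments = [block @ u for block in row_blocks]
    # ... and the main core splices the fragments back together.
    return np.concatenate(fragments)

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
u = rng.standard_normal(8)
assert np.allclose(matvec_by_rows(A, u, n_cores=4), A @ u)
```

`array_split` tolerates block counts that do not divide n_B evenly, matching the requirement that the n_p per core differ only slightly.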
Column decomposition method, comprising three steps: decompose the matrix by columns, compute in parallel, accumulate the results.
In the gradient update of step 3, the product G_LB Δα_B is computed, where G_LB is an l × n_B matrix and Δα_B is an n_B × 1 column vector, so their product is an l × 1 column vector. Since G_LB has l rows and n_B columns, the matrix is first decomposed by columns into G_1, …, G_N, and the vector Δα_B is decomposed by rows into Δα_1, …, Δα_N. Then each core computes G_p Δα_p. Finally, the main core accumulates the results of the cores; the sum Σ_p G_p Δα_p is the value of G_LB Δα_B.
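A corresponding sketch of the column decomposition, in which each "core" receives a column block of G_LB together with the matching row fragment of the vector, and the main core accumulates the l × 1 partial products:

```python
import numpy as np

def matvec_by_cols(G_LB, d_alpha, n_cores):
    # Split G_LB into column blocks G_p and d_alpha into matching row fragments.
    col_blocks = np.array_split(G_LB, n_cores, axis=1)
    vec_frags = np.array_split(d_alpha, n_cores)
    # Each "core" computes an l x 1 partial product G_p @ d_alpha_p ...
    partials = [Gp @ dp for Gp, dp in zip(col_blocks, vec_frags)]
    # ... and the main core accumulates (sums) them into the final result.
    return np.sum(partials, axis=0)

rng = np.random.default_rng(1)
G_LB = rng.standard_normal((10, 6))   # l x n_B
d_alpha = rng.standard_normal(6)      # n_B x 1
assert np.allclose(matvec_by_cols(G_LB, d_alpha, n_cores=3), G_LB @ d_alpha)
```

Unlike the row decomposition, every partial result here is already full length l, which is why the combining step is an accumulation rather than a splice.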
According to the above partitioning method, the concrete steps of the improved parallel GPDT algorithm (i.e. the parallel GPDT algorithm based on partitioning over the multi-core SoC) are as follows:
Step 1: First, on the main core, initialize the vector α to 0; select two integers n_B and n_C such that 0 < n_C ≤ n_B ≤ l and n_C is even; randomly select n_B elements from α to form the working set B; set the outer iteration counter k = 1.
Step 2: Solve the QP subproblem.
2.1 On the main core, set the initial gradient g^(0) and the descent step length ρ_0 ∈ [ρ_min, ρ_max], where ρ_min and ρ_max are preset values satisfying 0 < ρ_min < ρ_max; set the inner iteration counter k′ = 0.
2.2 Then compute the row fragments A_ni u of the initial gradient in parallel on the cores, and splice the results on the main core. Here A is an n_B × n_B matrix and u is an n_B × 1 column vector, so A u is also an n_B × 1 column vector. First, matrix A is decomposed by rows into blocks A_ni, each an n_p × n_B matrix; then each core computes the value of A_ni u; finally, the main core splices the partial results together, which gives the result of A u.
2.3 Complete the projection onto the feasible region Ω on the main core, and judge whether the vector u^(k′) meets the termination condition; if so, terminate the iteration, otherwise compute the gradient-descent direction d^(k′).
2.4 Then compute the row fragments of the matrix z^(k′) = A d^(k′) in parallel on the cores, using the same row partition of matrix A as in step 2.2; the main core then splices the partial results together, which gives the result of A d^(k′).
2.5 Then, on the main core, first compute the line-search coefficient λ_k′, and then the new step length ρ_{k′+1}, the vector u^(k′+1), etc.; set the inner iteration counter k′ = k′ + 1. Judge whether u^(k′+1) meets the KKT termination condition; if satisfied, proceed to the next step; otherwise return to step 2.2 and compute a new gradient-descent direction.
Step 3: After the solution α_B^(k) of the QP subproblem is obtained, the gradient needs to be updated. Compute the column fragments G_p Δα_p of the gradient increment in parallel on the cores, then accumulate the results on the main core to obtain the new gradient.
That is, compute G_LB Δα_B, where G_LB is an l × n_B matrix and Δα_B is an n_B × 1 column vector, so their product is an l × 1 column vector. Since G_LB has l rows and n_B columns, the matrix is first decomposed by columns into G_1, …, G_N and the vector by rows into Δα_1, …, Δα_N; then each core computes G_p Δα_p; finally, the main core accumulates the partial results, which gives the value of G_LB Δα_B.
Step 4: The main core judges whether α^(k) meets the KKT condition. If satisfied, the computation ends; otherwise the working set is updated on the main core (the concrete update procedure is introduced in the background-art section), k = k + 1 is set, and the algorithm returns to step 2.
This parallel GPDT algorithm mainly comprises two nested iteration loops: the inner iteration solves the QP subproblem on the working set B, and the outer iteration updates B. In terms of the critical paths for computing speed, the critical path of the inner loop is the computation of the vector z^(k′) after each projection, and the critical path of the outer loop is the gradient update; these two matrix operations need to be distributed over the cores for parallel processing, by "row decomposition" and "column decomposition" respectively. The remaining operations are executed serially on the main core and consist mainly of two parts: first, the projection of the gradient, implemented with the Dai-Fletcher algorithm; second, the update of the working set B, which fills the elements of the new working set efficiently by introducing the quicksort algorithm.
Brief description
Fig. 1: flow of the parallel GPDT algorithm.
Fig. 2: matrix multiplication decomposed by rows.
Fig. 3: matrix multiplication decomposed by columns.
Specific embodiment
The invention is further described below in conjunction with the accompanying drawings.
As shown in Fig. 1, the present invention parallelizes the computation of the initial gradient A u in step 2, the matrix-vector product z^(k′) in the inner loop, and the gradient increment G_LB Δα_B in the outer loop, distributing these operations over multiple processors, which greatly reduces the time spent on matrix operations in each iteration. The other parts of the algorithm, including the projection of the gradient and the update of the working set, remain serial. According to Amdahl's law, the speed-up of a parallelized algorithm depends not only on the speed-up of the parallelizable part but also on the proportion of the runtime that is parallelizable; therefore, as the training data grows, the runtime proportion of the parallelizable part increases, and the overall speed-up of the algorithm gradually approaches the speed-up of the parallelized part.
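Amdahl's law as used here can be checked numerically; the parallel fractions and core count below are illustrative values, not measurements from the patent:

```python
def amdahl_speedup(parallel_fraction, n_cores):
    # S = 1 / ((1 - p) + p / N): the serial part is unaffected,
    # while the parallelizable part is divided across N cores.
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_cores)

# As the parallelizable share of the runtime grows, the overall speed-up
# approaches the N-fold speed-up of the parallelized part.
s_small = amdahl_speedup(0.50, 8)   # half the runtime parallelizable
s_large = amdahl_speedup(0.95, 8)   # matrix operations dominate the runtime
```

This mirrors the claim in the text: larger training data increases the matrix-operation (parallelizable) share, pulling the overall speed-up toward that of the parallel part.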
1. The overall idea of the parallel partition is to distribute the n_B elements of the working set B evenly over N processors. The working-set indices assigned to processor p are defined as the set I_p, p = 1, 2, …, N; the sets I_p obtained after the distribution satisfy I_p ∩ I_q = ∅ for p ≠ q,
i.e. the sets assigned to the processors are mutually disjoint. Assume that processor p is assigned n_p working-set elements, with n_1 + n_2 + … + n_N = n_B. Each processor holds a local backup of the training data, so the matrix operations can easily be assigned to the N processors for execution; the parallelism of the algorithm is concentrated mainly in step 2 and step 3.
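The even, disjoint distribution of working-set indices into the sets I_p can be sketched as follows; a strided slice is one simple way to make the set sizes n_p differ by at most one:

```python
def partition_working_set(B, n_processors):
    # Distribute the n_B indices of working set B over N processors as evenly
    # as possible; the resulting sets I_p are pairwise disjoint and cover B.
    return [B[p::n_processors] for p in range(n_processors)]

B = list(range(10))          # stand-in working-set indices, n_B = 10
I = partition_working_set(B, 4)
sizes = [len(s) for s in I]  # the sizes n_p differ by at most one
```

Disjointness and full coverage are exactly the conditions the text imposes on the sets I_p, and the test below checks both.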
2. Parallelization of the initial-gradient computation of the Dai-Fletcher algorithm. The initial gradient requires the product A u, where A denotes an n_B × n_B matrix and u denotes an n_B × 1 column vector, so the result A u is also an n_B × 1 column vector. As divided in Fig. 2, matrix A is decomposed by rows; each processor is assigned a fragment of n_p rows of A and multiplies it with the column vector u; the partial results are finally spliced together to obtain the final result.
In the same manner, the computation of the matrix-vector product z^(k′) = A d^(k′) in step 2.3 is decomposed with the identical method.
3. Parallelization of the gradient update. The formula of the gradient update is:
g^(k) = g^(k−1) + G_LB Δα_B, where Δα_B = α_B^(k) − α_B^(k−1).
Here G_LB is an l × n_B matrix and Δα_B denotes the difference of the vector α_B between two adjacent iterations, so their product G_LB Δα_B is an l × 1 column vector. Since G_LB has l rows and n_B columns, the dividing mode here is to decompose the matrix G_LB by columns, as shown in Fig. 3. The column fragment G_np assigned to each processor is a matrix of l rows and n_p columns; it is multiplied with the row fragment Δα_np of the column vector, and the result G_np Δα_np is an l × 1 column vector. The results computed by the processors therefore need to be accumulated to obtain the final result: Σ_p G_np Δα_np = G_LB Δα_B.
4. The other parts of the algorithm, including the projection of the gradient and the update of the working set, are still executed serially on the main core; the overall flow of the improved parallel GPDT algorithm is shown in Fig. 1.

Claims (1)

1. A method for partitioning the parallel GPDT algorithm on a multi-core SoC, characterized in that it comprises the following concrete steps:
Step 1: First, on the main core, initialize the vector α to 0; select two integers n_B and n_C such that 0 < n_C ≤ n_B ≤ l and n_C is even; randomly select n_B elements from α to form the working set B; set the outer iteration counter k = 1;
Step 2: Solve the QP subproblem:
2.1 On the main core, set the initial gradient g^(0) and the descent step length ρ_0 ∈ [ρ_min, ρ_max], where ρ_min and ρ_max are preset values satisfying 0 < ρ_min < ρ_max; set the inner iteration counter k′ = 0;
2.2 Then compute the row fragments A_ni u of the initial gradient in parallel on the cores and splice the results on the main core, where A is an n_B × n_B matrix and u is an n_B × 1 column vector, so that A u is also an n_B × 1 column vector: first, matrix A is decomposed by rows into blocks A_ni, each an n_p × n_B matrix; then each core computes the value of A_ni u; finally, the main core splices the partial results together, which gives the result of A u;
2.3 Complete the projection onto the feasible region Ω on the main core and judge whether the vector u^(k′) meets the termination condition; if so, terminate the iteration; otherwise compute the gradient-descent direction d^(k′);
2.4 Then compute the row fragments of the matrix z^(k′) = A d^(k′) in parallel on the cores, using the same row partition of matrix A as in step 2.2; the main core then splices the partial results together, which gives the result of A d^(k′);
2.5 Then, on the main core, first compute the line-search coefficient λ_k′, and then the new step length ρ_{k′+1}, the vector u^(k′+1), etc.; set the inner iteration counter k′ = k′ + 1; judge whether u^(k′+1) meets the KKT termination condition; if satisfied, proceed to the next step; otherwise return to step 2.2 and compute a new gradient-descent direction;
Step 3: After the solution α_B^(k) of the QP subproblem is obtained, the gradient needs to be updated: compute the column fragments G_p Δα_p of the gradient increment in parallel on the cores, then accumulate the results on the main core to obtain the new gradient; that is, compute G_LB Δα_B, where G_LB is an l × n_B matrix and Δα_B is an n_B × 1 column vector, so that their product is an l × 1 column vector; since G_LB has l rows and n_B columns, the matrix is first decomposed by columns into G_1, …, G_N and the vector by rows into Δα_1, …, Δα_N; then each core computes G_p Δα_p; finally, the main core accumulates the partial results, which gives the value of G_LB Δα_B;
Step 4: The main core judges whether α^(k) meets the KKT condition; if satisfied, the computation ends; otherwise the working set is updated on the main core, k = k + 1 is set, and the method returns to step 2.
CN201610832540.5A 2016-09-19 2016-09-19 Method for dividing parallel GPDT algorithm on multi-core SOC Active CN106407561B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610832540.5A CN106407561B (en) 2016-09-19 2016-09-19 Method for dividing parallel GPDT algorithm on multi-core SOC

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610832540.5A CN106407561B (en) 2016-09-19 2016-09-19 Method for dividing parallel GPDT algorithm on multi-core SOC

Publications (2)

Publication Number Publication Date
CN106407561A 2017-02-15
CN106407561B 2020-07-03

Family

ID=57997635

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610832540.5A Active CN106407561B (en) 2016-09-19 2016-09-19 Method for dividing parallel GPDT algorithm on multi-core SOC

Country Status (1)

Country Link
CN (1) CN106407561B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106897163A (en) * 2017-03-08 2017-06-27 郑州云海信息技术有限公司 A kind of algebra system method for solving and system based on KNL platforms
CN110231958A (en) * 2017-08-31 2019-09-13 北京中科寒武纪科技有限公司 A kind of Matrix Multiplication vector operation method and device
CN115619890A (en) * 2022-12-05 2023-01-17 山东省计算中心(国家超级计算济南中心) Tomography method and system for solving linear equation set based on parallel random iteration

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080215848A1 (en) * 2005-05-13 2008-09-04 John Te-Jui Sheu Method and System For Caching Address Translations From Multiple Address Spaces In Virtual Machines
CN102844762A (en) * 2010-01-22 2012-12-26 意法爱立信有限公司 Secure environment management during switches between different modes of multicore systems
CN104461467A (en) * 2013-09-25 2015-03-25 广州中国科学院软件应用技术研究所 Method for increasing calculation speed of SMP cluster system through MPI and OpenMP in hybrid parallel mode
CN104820657A (en) * 2015-05-14 2015-08-05 西安电子科技大学 Inter-core communication method and parallel programming model based on embedded heterogeneous multi-core processor
US20150323975A1 (en) * 2014-05-12 2015-11-12 Qualcomm Innovation Center, Inc. SYNCHRONIZATION OF ACTIVITY OF MULTIPLE SUBSYSTEMS IN A SoC TO SAVE STATIC POWER
CN105550161A (en) * 2015-12-16 2016-05-04 浪潮(北京)电子信息产业有限公司 Parallel logic regression method and system for heterogeneous systems


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
G. Zanghirati et al., "A parallel solver for large quadratic programs in training support vector machines", Parallel Computing
Thomas Serafini et al., "Gradient projection methods for quadratic programs and applications in training support vector machines", Optimization Methods and Software
Wen Yimin et al., "A survey of algorithms for support vector machines handling large-scale problems", Computer Science
Cao Dan, "Research on key technologies of a multi-core SoC platform for wireless security", China Master's Theses Full-text Database, Information Science and Technology


Also Published As

Publication number Publication date
CN106407561B (en) 2020-07-03


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant