CN106407561B - Method for dividing parallel GPDT algorithm on multi-core SOC - Google Patents


Publication number
CN106407561B
CN106407561B
Authority
CN
China
Prior art keywords
matrix
core
algorithm
parallel
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610832540.5A
Other languages
Chinese (zh)
Other versions
CN106407561A (en)
Inventor
Han Jun (韩军)
Xuan Sizhong (轩四中)
Yuan Tengyue (袁腾跃)
Zeng Xiaoyang (曾晓洋)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University filed Critical Fudan University
Priority to CN201610832540.5A
Publication of CN106407561A
Application granted
Publication of CN106407561B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 - Computer-aided design [CAD]
    • G06F30/30 - Circuit design
    • G06F30/39 - Circuit design at the physical level
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Abstract

The invention belongs to the technical field of integrated circuit design, and particularly relates to a method for dividing a parallel GPDT algorithm on a multi-core SoC. The parallel GPDT algorithm comprises two layers of iteration: the inner iteration solves the QP subproblem over the working set, and the outer iteration updates the working set. In terms of computation speed, the critical path of the outer loop is the gradient update, and the critical path of the inner loop is the vector computed after each projection; these two matrix operations are processed in parallel on multiple cores. The remaining operations are realized serially on the primary core, including the gradient projection realized with the Dai-Fletcher algorithm and the working-set update realized by introducing a quicksort algorithm. The vector obtained after the calculation gives the support vectors of the GPDT algorithm's training data.

Description

Method for dividing parallel GPDT algorithm on multi-core SOC
Technical Field
The invention belongs to the technical field of integrated circuit design, and particularly relates to a method for dividing a parallel GPDT algorithm on a multi-core SoC.
Background
The GPDT algorithm is a decomposition method for the original QP problem, proposed by Zanni et al. The working set of each iteration holds on the order of $10^2$ to $10^3$ variables, so the algorithm reaches convergence after only a few iterations. Although the amount of calculation per iteration is larger, the complex calculation can be distributed to a plurality of processors in a parallelized manner, obtaining a faster training speed.
The original expression of the support vector machine problem is:
$$\min_{\alpha}\;\; \frac{1}{2}\,\alpha^{T} G\,\alpha - \alpha^{T}\mathbf{1}$$
$$\text{s.t.}\quad y^{T}\alpha = 0,\qquad 0 \le \alpha_i \le C,\;\; i = 1, \dots, l,$$
where $G$ is an $l \times l$ matrix, called the kernel matrix, with entries
$$G_{ij} = y_i\, y_j\, K(x_i, x_j),$$
and $K(\cdot,\cdot)$ is a kernel function.
The decomposition of the problem splits the vector to be solved, $\alpha$, into two parts: a working set, denoted $B$, and a non-working set, denoted $N$. The vector to be solved, the sample-category vector and the kernel matrix in the formula are decomposed into the following forms:
$$\alpha = \begin{bmatrix} \alpha_B \\ \alpha_N \end{bmatrix},\qquad y = \begin{bmatrix} y_B \\ y_N \end{bmatrix},\qquad G = \begin{bmatrix} G_{BB} & G_{BN} \\ G_{NB} & G_{NN} \end{bmatrix}.$$
Through simplification, the decomposed QP subproblem is converted into the following form:
$$\min_{\alpha_B}\;\; \frac{1}{2}\,\alpha_B^{T} G_{BB}\,\alpha_B + \left(G_{BN}\,\alpha_N - \mathbf{1}_B\right)^{T}\alpha_B$$
$$\text{s.t.}\quad y_B^{T}\alpha_B = -\,y_N^{T}\alpha_N,\qquad 0 \le \alpha_B \le C.$$
the solving process of the QP subproblem mainly comprises four steps, the final result is solved through loop iteration, and the judgment condition of the iteration end is a KKT (Karush-Kuhn-Tucker) condition.
The specific steps of the algorithm are as follows:

Step 1: Initialization.
Initialize the vector $\alpha$ to 0, then select two integers $n_B$ and $n_C$ such that $0 \le n_C \le n_B \le l$, with $n_C$ even. Randomly select $n_B$ elements from $\alpha$ to form the working set $B$; the remaining elements form the non-working set $N$. Set the outer iteration counter $k = 1$.

Step 2: Solve the QP subproblem
$$\min_{\alpha_B}\;\; \frac{1}{2}\,\alpha_B^{T} G_{BB}\,\alpha_B + \left(G_{BN}\,\alpha_N - \mathbf{1}_B\right)^{T}\alpha_B$$
$$\text{s.t.}\quad y_B^{T}\alpha_B = -\,y_N^{T}\alpha_N,\qquad 0 \le \alpha_B \le C.$$
Let $\alpha_B^{(k)}$ denote the solution of the QP subproblem; then set $\alpha^{(k+1)} = [\alpha_B^{(k)};\ \alpha_N^{(k)}]$. The subproblem itself is solved by the inner iteration of steps 2.1-2.5, which operates on the iterate $u^{(k')}$.

Step 2.1: Initialization.
Let $u^{(0)} = \alpha_B^{(k)}$ and let $g^{(0)} = A u^{(0)} + b$ denote the initial gradient, where $A = G_{BB}$ and $b$ is the linear term of the subproblem. Choose the descent step size $\rho_0 \in [\rho_{\min}, \rho_{\max}]$, where $\rho_{\min}$ and $\rho_{\max}$ are preset values satisfying $0 < \rho_{\min} < \rho_{\max}$, and set the inner iteration counter $k' = 0$.

Step 2.2: Projection.
Let $P_{\Omega}(\cdot)$ denote the projection onto the feasible region $\Omega$. First judge whether the vector $u^{(k')}$ meets the termination condition; if so, end the iteration; otherwise compute the direction of gradient descent using:
$$d^{(k')} = P_{\Omega}\!\left(u^{(k')} - \rho_{k'}\, g^{(k')}\right) - u^{(k')}.$$

Step 2.3: Matrix multiplication.
Compute the matrix-vector product
$$z^{(k')} = A\, d^{(k')}.$$

Step 2.4: Line search.
Compute the coefficient $\lambda_{k'}$ by a line-search method and update the vector to be solved:
$$u^{(k'+1)} = u^{(k')} + \lambda_{k'}\, d^{(k')}.$$

Step 2.5: Update.
Compute the gradient for iteration $k'+1$:
$$g^{(k'+1)} = g^{(k')} + \lambda_{k'}\, z^{(k')}.$$
Then compute a new gradient-descent step size $\rho_{k'+1}$, set $k' = k' + 1$, and return to step 2.2.
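For concreteness, the following is a minimal serial sketch of this inner gradient-projection loop (steps 2.1-2.5) in Python/NumPy. It is a sketch under assumptions, not the patent's implementation: the patent performs the projection with the Dai-Fletcher algorithm, whereas the bisection-based projection, the exact quadratic line search and the BB-type step-size rule below are illustrative stand-ins, and all names (project, gpm_inner, A, b) are ours.

```python
import numpy as np

def project(u, y, c, C, tol=1e-10):
    # Projection of u onto Omega = {v : y^T v = c, 0 <= v <= C}, with
    # y a vector of +-1 labels. The patent uses the Dai-Fletcher secant
    # method here; plain bisection on the multiplier t keeps the sketch short.
    lo, hi = -1e6, 1e6                       # assumed bracket for t
    t = 0.0
    for _ in range(200):
        t = 0.5 * (lo + hi)
        v = np.clip(u - t * y, 0.0, C)
        r = y @ v - c                        # y @ v is non-increasing in t
        if abs(r) < tol:
            break
        lo, hi = (t, hi) if r > 0 else (lo, t)
    return np.clip(u - t * y, 0.0, C)

def gpm_inner(A, b, y, c, C, u0, rho0=1.0,
              rho_min=1e-8, rho_max=1e8, tol=1e-6, max_iter=1000):
    # Steps 2.1-2.5: minimize 0.5*u^T A u + b^T u over Omega,
    # with A = G_BB and c = -y_N^T alpha_N.
    u = u0.copy()
    g = A @ u + b                                # 2.1: initial gradient
    rho = rho0
    for _ in range(max_iter):
        d = project(u - rho * g, y, c, C) - u    # 2.2: projection, direction
        if np.linalg.norm(d) < tol:              # termination test
            break
        z = A @ d                                # 2.3: matrix-vector product
        dAd = d @ z
        lam = 1.0 if dAd <= 0 else min(1.0, max(0.0, -(g @ d) / dAd))
        u = u + lam * d                          # 2.4: line search and update
        g = g + lam * z                          # 2.5: gradient update
        if dAd > 0:                              # BB-type step, clamped
            rho = min(rho_max, max(rho_min, (d @ d) / dAd))
    return u
```

On the SoC, the two matrix-vector products (`A @ u` and `A @ d`) are the parts the invention distributes across cores; everything else stays serial on the primary core.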
Step 3: Gradient update.
After the k-th outer iteration, update the gradient of the objective function with respect to the new vector $\alpha^{(k+1)}$:
$$\nabla f(\alpha^{(k+1)}) = \nabla f(\alpha^{(k)}) + G_{LB}\left(\alpha_B^{(k+1)} - \alpha_B^{(k)}\right).$$
After the update, if $\alpha^{(k+1)}$ satisfies the KKT condition, the iteration is ended; otherwise the next step is entered.
Step 4: Update the working set.
The following selection problem is solved first:
$$\min_{d}\;\; \nabla f(\alpha^{(k+1)})^{T} d \quad \text{s.t.}\;\; y^{T}d = 0,\;\; d_i \ge 0 \text{ if } \alpha_i = 0,\;\; d_i \le 0 \text{ if } \alpha_i = C,\;\; \left|\{\, i : d_i \ne 0 \,\}\right| \le n_C.$$
Then the $\alpha_i$ corresponding to the non-zero components of the solution are taken out to form a new working set $\bar{B}$; the number of non-zero components is at most $n_C$. Elements are then taken out of the old working set $B$ and filled into $\bar{B}$ until $\bar{B}$ contains $n_B$ elements. Finally, set $B = \bar{B}$ and $k = k + 1$, and return to step 2.
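As a small illustration of the refill logic in this step, the sketch below assembles the new working set $\bar{B}$ in Python/NumPy. It is a sketch under assumptions: the candidate selection (solving the problem above, plus the quicksort the patent introduces) is taken as already done upstream, and all names are ours.

```python
import numpy as np

def refill_working_set(candidates, B_old, n_B):
    # candidates: indices of the non-zero components chosen upstream
    # (at most n_C of them). Top up from the old working set B until
    # the new set holds n_B elements.
    new_B = list(dict.fromkeys(candidates))[:n_B]   # keep order, drop duplicates
    for i in B_old:
        if len(new_B) == n_B:
            break
        if i not in new_B:
            new_B.append(i)
    return np.asarray(new_B)

# e.g. refill_working_set([4, 9, 1], B_old=[0, 1, 2, 3], n_B=6) -> [4 9 1 0 2 3]
```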
The advantage of the GPDT algorithm is that the working set solved in each iteration can contain on the order of $10^3$ elements, enabling the algorithm to converge quickly; in a single iteration, however, the amount of computation is very large owing to the many matrix operations.
Disclosure of Invention
The invention aims to provide a method for dividing a parallel GPDT algorithm on a multi-core SoC, so as to greatly shorten the computation time of a single iteration and improve the operating efficiency of the whole training algorithm.
The invention provides a method for dividing a parallel GPDT algorithm on a multi-core SoC. Its general idea is to distribute the $n_B$ elements of the working set $B$ evenly over $N$ processors, each of which holds a local backup of the training data, so that the matrix operations can conveniently be distributed to the $N$ processors for execution. As the basic principle of the algorithm shows, the parallelism of the algorithm is concentrated mainly in steps 2 and 3, which are the steps where the matrix operations are concentrated.
The method for dividing the parallel GPDT algorithm on the multi-core SoC comprises two parts: row decomposition and column decomposition; the details are as follows.
Row decomposition method. The method comprises the following steps: the matrix is decomposed by rows, computed in parallel, and the results are spliced.

In the initialization procedure of step 2.1, the initial gradient
$$g^{(0)} = A u^{(0)} + b$$
is computed, where $A$ represents an $n_B \times n_B$ matrix and $u^{(0)}$ an $n_B \times 1$ column vector, so the result $A u^{(0)}$ is also an $n_B \times 1$ column vector. First, the matrix $A$ is decomposed by rows,
$$A = \begin{bmatrix} A_{n_1} \\ A_{n_2} \\ \vdots \\ A_{n_N} \end{bmatrix},$$
where each $A_{n_i}$ represents an $n_p \times n_B$ matrix; then each core computes the value of $A_{n_i} u^{(0)}$; finally, the operation results of the cores are spliced on the primary core,
$$\begin{bmatrix} A_{n_1} u^{(0)} \\ \vdots \\ A_{n_N} u^{(0)} \end{bmatrix}$$
being the result of $A u^{(0)}$.
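To make the row decomposition concrete, here is a minimal NumPy sketch; the per-core computation is simulated by a plain list comprehension (on the SoC each block would run on its own core), and the function name, core count and test sizes are our assumptions:

```python
import numpy as np

def matvec_row_parallel(A, u, n_cores=4):
    # Row decomposition: split the n_B x n_B matrix A into row blocks
    # A_n1 ... A_nN, form each block's product with u "on its own core",
    # then splice the partial results on the primary core.
    row_blocks = np.array_split(A, n_cores, axis=0)
    partials = [blk @ u for blk in row_blocks]       # per-core mat-vec
    return np.concatenate(partials)                  # splice on the primary core

# Quick check against the serial product:
A = np.random.rand(8, 8)
u = np.random.rand(8)
assert np.allclose(matvec_row_parallel(A, u), A @ u)
```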
Column decomposition method. The method comprises the following steps: the matrix is decomposed by columns, computed in parallel, and the results are accumulated.

In the gradient update of step 3, the increment
$$\Delta g = G_{LB}\,\Delta\alpha_B$$
is computed, where $G_{LB}$ is an $l \times n_B$ matrix and $\Delta\alpha_B = \alpha_B^{(k+1)} - \alpha_B^{(k)}$ is an $n_B \times 1$ column vector, so the result of multiplying the two is an $l \times 1$ column vector. Because the matrix $G_{LB}$ has $l$ rows and $n_B$ columns, it is first decomposed by columns,
$$G_{LB} = \begin{bmatrix} G_{n_1} & G_{n_2} & \cdots & G_{n_N} \end{bmatrix},$$
and $\Delta\alpha_B$ is decomposed into matching row segments
$$\Delta\alpha_B = \begin{bmatrix} \Delta\alpha_{n_1} \\ \vdots \\ \Delta\alpha_{n_N} \end{bmatrix};$$
then each core computes $G_{n_i}\,\Delta\alpha_{n_i}$; finally, the results of the individual core calculations are accumulated on the primary core,
$$\sum_{i=1}^{N} G_{n_i}\,\Delta\alpha_{n_i}$$
being the value of $G_{LB}\,\Delta\alpha_B$.
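The column decomposition admits an equally short sketch under the same assumptions; the final accumulation replaces the splice of the row-decomposed case:

```python
import numpy as np

def matvec_col_parallel(G_LB, dalpha, n_cores=4):
    # Column decomposition: split the l x n_B matrix G_LB into column
    # blocks and dalpha into matching row segments; each core forms an
    # l x 1 partial product, and the primary core sums them.
    col_blocks = np.array_split(G_LB, n_cores, axis=1)
    segments = np.array_split(dalpha, n_cores)       # same split points
    partials = [blk @ seg for blk, seg in zip(col_blocks, segments)]
    return np.sum(partials, axis=0)                  # accumulate on the primary core

# Quick check against the serial product:
G = np.random.rand(10, 8)
d = np.random.rand(8)
assert np.allclose(matvec_col_parallel(G, d), G @ d)
```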
According to the above partitioning methods, the improved parallel GPDT algorithm (i.e., the parallel GPDT algorithm partitioned on the multi-core SoC) specifically comprises the following steps:

Step 1: First, on the primary core, initialize the vector $\alpha$ to 0; select two integers $n_B$ and $n_C$ such that $0 \le n_C \le n_B \le l$, with $n_C$ even; randomly select $n_B$ elements from $\alpha$ to form the working set $B$; and set the outer iteration counter $k = 1$.

Step 2: QP subproblem solving.

2.1 On the primary core, set the starting point $u^{(0)} = \alpha_B^{(k)}$ and the descent step size $\rho_0 \in [\rho_{\min}, \rho_{\max}]$, where $\rho_{\min}$ and $\rho_{\max}$ are preset values satisfying $0 < \rho_{\min} < \rho_{\max}$; set the inner iteration counter $k' = 0$.

2.2 Then compute the row segments of the initial gradient $g^{(0)} = A u^{(0)} + b$ in parallel on the individual cores and splice the calculation results on the primary core. Here $A$ is an $n_B \times n_B$ matrix and $u^{(0)}$ is an $n_B \times 1$ column vector, so $A u^{(0)}$ is also an $n_B \times 1$ column vector. First the matrix $A$ is decomposed by rows,
$$A = \begin{bmatrix} A_{n_1} \\ A_{n_2} \\ \vdots \\ A_{n_N} \end{bmatrix},$$
where each $A_{n_i}$ is an $n_p \times n_B$ matrix; then each core computes the value of $A_{n_i} u^{(0)}$; finally, the operation results of the cores are spliced on the primary core:
$$A u^{(0)} = \begin{bmatrix} A_{n_1} u^{(0)} \\ \vdots \\ A_{n_N} u^{(0)} \end{bmatrix}.$$

2.3 Complete the projection onto the feasible region $\Omega$ on the primary core, and judge whether the vector $u^{(k')}$ meets the termination condition; if so, end the iteration; otherwise compute the gradient-descent direction $d^{(k')}$.

2.4 Then compute the row segments of the vector $z^{(k')} = A d^{(k')}$ in parallel on the individual cores, the row decomposition of the matrix $A$ being the same as in step 2.2; then splice the operation results of the cores on the primary core to obtain $z^{(k')}$.

2.5 Then, on the primary core, first compute the coefficient $\lambda_{k'}$ by line search, compute the new step size $\rho_{k'+1}$ and the iterate $u^{(k'+1)}$, and set the inner iteration counter $k' = k' + 1$. Judge whether $u^{(k'+1)}$ meets the KKT termination condition; if so, proceed to the next step; otherwise, return to step 2.2 and compute a new gradient-descent direction.
Step 3: After the solution $\alpha_B^{(k+1)}$ of the QP subproblem is obtained, the gradient needs to be updated. The column segments of the gradient increment $\Delta g = G_{LB}\,\Delta\alpha_B$ are computed in parallel on the individual cores, and the results are then accumulated on the primary core to yield the new gradient.

Here $G_{LB}$ is an $l \times n_B$ matrix and $\Delta\alpha_B = \alpha_B^{(k+1)} - \alpha_B^{(k)}$ is an $n_B \times 1$ column vector, so their product $\Delta g$ is an $l \times 1$ column vector. Because the matrix $G_{LB}$ has $l$ rows and $n_B$ columns, it is first decomposed by columns, and $\Delta\alpha_B$ is decomposed into matching row segments:
$$G_{LB} = \begin{bmatrix} G_{n_1} & G_{n_2} & \cdots & G_{n_N} \end{bmatrix},\qquad \Delta\alpha_B = \begin{bmatrix} \Delta\alpha_{n_1} \\ \vdots \\ \Delta\alpha_{n_N} \end{bmatrix}.$$
Then each core computes $G_{n_i}\,\Delta\alpha_{n_i}$; finally, the results of the individual core calculations are accumulated on the primary core:
$$G_{LB}\,\Delta\alpha_B = \sum_{i=1}^{N} G_{n_i}\,\Delta\alpha_{n_i}.$$
Step 4: Judge on the primary core whether $\alpha^{(k+1)}$ satisfies the KKT condition. If so, the calculation is finished; otherwise, update the working set on the primary core (see the Background for the specific procedure), set $k = k + 1$, and return to step 2.
The parallel GPDT algorithm mainly comprises two layers of iteration: the inner iteration is responsible for solving the QP subproblem over the working set B, and the outer iteration is responsible for updating the working set B. In terms of computation speed, the critical path of the outer loop is the gradient update, and the critical path of the inner loop is the vector $z^{(k')}$ computed after each projection; these two matrix operations need to be distributed over the cores for parallelized processing, by row decomposition and by column decomposition respectively. The remaining operations are realized serially on the primary core and mainly comprise two parts: the projection operation of the gradient, realized with the Dai-Fletcher algorithm, and the update of the working set B, a step that fills the new working set efficiently by introducing a quicksort algorithm.
Drawings
Fig. 1: flow of the parallel GPDT algorithm.
Fig. 2: matrix multiplication by row decomposition.
Fig. 3: matrix multiplication by column decomposition.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
As shown in Fig. 1, the present invention distributes the computation of $A u^{(0)}$ within the initial gradient $g^{(0)}$, the inner-loop computation of the vector $z^{(k')}$, and the outer-loop computation of the gradient increment $G_{LB}\,\Delta\alpha_B$ across a plurality of processors for parallelized processing, which greatly reduces the time spent on matrix operations in each iteration. The other parts of the algorithm remain serialized operations, including the gradient projection, the working-set update and the like. According to Amdahl's law, the speedup of a parallelized algorithm is related not only to the speedup of the parallelizable part but also to the proportion of that part; therefore, as the training data grows, the share of the runtime occupied by the parallelizable part increases, and the speedup of the whole algorithm gradually approaches the speedup of the parallelizable part.
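Spelled out, the Amdahl's-law relation invoked above is, for a parallelizable fraction $p$ of the runtime and $N$ cores:
$$S(N) = \frac{1}{(1 - p) + p/N},\qquad \lim_{N \to \infty} S(N) = \frac{1}{1 - p},$$
so the attainable speedup is capped by the serial fraction $1 - p$.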
1. Parallel partitioning. The general idea is to distribute the $n_B$ elements of the working set $B$ evenly over $N$ processors. Define the set of working-set indices allocated to processor $p$ as $I_p$, $p = 1, 2, \dots, N$; after allocation the sets $I_p$ satisfy
$$I_p \cap I_q = \varnothing \;\; (p \ne q),\qquad \bigcup_{p=1}^{N} I_p = B,$$
i.e., the sets assigned to the processors do not intersect each other. Assume each processor is assigned $n_p$ working-set elements, satisfying
$$\sum_{p=1}^{N} n_p = n_B.$$
Each processor stores a local backup of the training data, so that the matrix operations can conveniently be distributed to the $N$ processors for execution; the parallelism of the algorithm is concentrated mainly in steps 2 and 3. A minimal sketch of this allocation follows.
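The sketch below (names ours, NumPy assumed) produces pairwise-disjoint index sets $I_p$ of near-equal sizes $n_p$ summing to $n_B$:

```python
import numpy as np

def partition_working_set(B, n_processors):
    # Split the working-set indices B into pairwise-disjoint sets I_p;
    # np.array_split keeps the sizes n_p as equal as possible, and the
    # sizes sum to n_B by construction.
    return np.array_split(np.asarray(B), n_processors)

I = partition_working_set(range(10), 4)            # sizes 3, 3, 2, 2
assert sum(len(I_p) for I_p in I) == 10            # the n_p sum to n_B
```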
2. Parallelizing the initial gradient of the Dai-Fletcher algorithm. The calculation formula for the initial gradient is
$$g^{(0)} = A u^{(0)} + b,$$
where $A$ represents an $n_B \times n_B$ matrix and $u^{(0)}$ an $n_B \times 1$ column vector, so the result $A u^{(0)}$ is also an $n_B \times 1$ column vector. According to Fig. 2, the matrix $A$ is decomposed by rows; each processor is assigned an $n_p$-row segment of the matrix $A$, multiplies it by the column vector $u^{(0)}$, and the partial results are finally spliced to obtain the final result:
$$A u^{(0)} = \begin{bmatrix} A_{n_1} u^{(0)} \\ \vdots \\ A_{n_N} u^{(0)} \end{bmatrix}.$$
Similarly, the matrix-vector product $z^{(k')} = A d^{(k')}$ in step 2.3 is decomposed in the same way, i.e., by rows.
3. Parallel computation of the gradient update. The formula for the gradient update is:
$$\nabla f(\alpha^{(k+1)}) = \nabla f(\alpha^{(k)}) + G_{LB}\left(\alpha_B^{(k+1)} - \alpha_B^{(k)}\right).$$
Let
$$\Delta\alpha_B = \alpha_B^{(k+1)} - \alpha_B^{(k)};$$
the increment to be computed is then $G_{LB}\,\Delta\alpha_B$, where $G_{LB}$ is an $l \times n_B$ matrix and $\Delta\alpha_B$, the difference of the vectors of two adjacent iterations, is an $n_B \times 1$ column vector; the result of their multiplication is an $l \times 1$ column vector. Because the matrix $G_{LB}$ has $l$ rows and $n_B$ columns, the division here is by column decomposition of the matrix $G_{LB}$, as shown in Fig. 3. The column segment $G_{n_p}$ of $G_{LB}$ assigned to each processor is an $l \times n_p$ matrix, which is multiplied by the corresponding row segment $\Delta\alpha_{n_p}$ of the column vector $\Delta\alpha_B$; each result $G_{n_p}\,\Delta\alpha_{n_p}$ is a column vector of $l$ rows, so the computation results of the processors need to be accumulated to obtain the final result:
$$G_{LB}\,\Delta\alpha_B = \sum_{p=1}^{N} G_{n_p}\,\Delta\alpha_{n_p}.$$
4. The other parts of the algorithm, including the projection operation of the gradient and the update of the working set, are still executed serially on the primary core. The overall flow of the improved parallel GPDT algorithm is shown in Fig. 1.

Claims (1)

1. A method for dividing a parallel GPDT algorithm on a multi-core SoC, characterized by comprising the following specific steps:

step 1: first, on the primary core, initialize the vector $\alpha$ to 0; select two integers $n_B$ and $n_C$ such that $0 \le n_C \le n_B \le l$, with $n_C$ even; randomly select $n_B$ elements from $\alpha$ to form the working set $B$; and set the outer iteration counter $k = 1$;

step 2: QP subproblem solving;

2.1 on the primary core, set the starting point $u^{(0)} = \alpha_B^{(k)}$ and the descent step size $\rho_0 \in [\rho_{\min}, \rho_{\max}]$, where $\rho_{\min}$ and $\rho_{\max}$ are preset values satisfying $0 < \rho_{\min} < \rho_{\max}$, and set the inner iteration counter $k' = 0$;

2.2 then compute the row segments of the initial gradient $g^{(0)} = A u^{(0)} + b$ in parallel on the individual cores and splice the calculation results on the primary core; here $A$ is an $n_B \times n_B$ matrix and $u^{(0)}$ is an $n_B \times 1$ column vector, so $A u^{(0)}$ is also an $n_B \times 1$ column vector; first the matrix $A$ is decomposed by rows into blocks $A_{n_1}, A_{n_2}, \dots, A_{n_N}$, where each $A_{n_i}$ is an $n_p \times n_B$ matrix; then each core computes the value of $A_{n_i} u^{(0)}$; finally, the operation results of the cores are spliced on the primary core, the concatenation $[A_{n_1} u^{(0)}; \dots; A_{n_N} u^{(0)}]$ being the result of $A u^{(0)}$;

2.3 complete the projection onto the feasible region $\Omega$ on the primary core, and judge whether the vector $u^{(k')}$ meets the termination condition; if so, end the iteration; otherwise compute the gradient-descent direction $d^{(k')}$;

2.4 then compute the row segments of the vector $z^{(k')} = A d^{(k')}$ in parallel on the individual cores, the row decomposition of the matrix $A$ being the same as in step 2.2; then splice the operation results of the cores on the primary core to obtain $z^{(k')}$;

2.5 then, on the primary core, first compute the coefficient $\lambda_{k'}$ by line search, compute the new step size $\rho_{k'+1}$ and the iterate $u^{(k'+1)}$, and set the inner iteration counter $k' = k' + 1$; judge whether $u^{(k'+1)}$ meets the KKT termination condition; if so, proceed to the next step; otherwise, return to step 2.2 and compute a new gradient-descent direction;

step 3: after the solution $\alpha_B^{(k+1)}$ of the QP subproblem is obtained, the gradient needs to be updated; the column segments of the gradient increment $\Delta g = G_{LB}\,\Delta\alpha_B$ are computed in parallel on the individual cores, and the results are then accumulated on the primary core to yield the new gradient; here $G_{LB}$ is an $l \times n_B$ matrix and $\Delta\alpha_B = \alpha_B^{(k+1)} - \alpha_B^{(k)}$ is an $n_B \times 1$ column vector, so the result of their multiplication is an $l \times 1$ column vector; because the matrix $G_{LB}$ has $l$ rows and $n_B$ columns, it is first decomposed by columns into blocks $G_{n_1}, \dots, G_{n_N}$, and $\Delta\alpha_B$ into matching row segments $\Delta\alpha_{n_1}, \dots, \Delta\alpha_{n_N}$; then each core computes $G_{n_i}\,\Delta\alpha_{n_i}$; finally, the results of the individual core calculations are accumulated on the primary core, the sum $\sum_{i=1}^{N} G_{n_i}\,\Delta\alpha_{n_i}$ being the value of $G_{LB}\,\Delta\alpha_B$;

step 4: judge on the primary core whether $\alpha^{(k+1)}$ satisfies the KKT condition; if so, the calculation is finished; otherwise, update the working set on the primary core, set $k = k + 1$, and return to step 2.
CN201610832540.5A 2016-09-19 2016-09-19 Method for dividing parallel GPDT algorithm on multi-core SOC Active CN106407561B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610832540.5A CN106407561B (en) 2016-09-19 2016-09-19 Method for dividing parallel GPDT algorithm on multi-core SOC

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610832540.5A CN106407561B (en) 2016-09-19 2016-09-19 Method for dividing parallel GPDT algorithm on multi-core SOC

Publications (2)

Publication Number Publication Date
CN106407561A CN106407561A (en) 2017-02-15
CN106407561B true CN106407561B (en) 2020-07-03

Family

ID=57997635

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610832540.5A Active CN106407561B (en) 2016-09-19 2016-09-19 Method for dividing parallel GPDT algorithm on multi-core SOC

Country Status (1)

Country Link
CN (1) CN106407561B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106897163A (en) * 2017-03-08 2017-06-27 郑州云海信息技术有限公司 A kind of algebra system method for solving and system based on KNL platforms
EP3654208A1 (en) * 2017-08-31 2020-05-20 Cambricon Technologies Corporation Limited Chip device and related products
CN115619890B (en) * 2022-12-05 2023-04-07 山东省计算中心(国家超级计算济南中心) Tomography method and system for solving linear equation set based on parallel random iteration

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102844762A (en) * 2010-01-22 2012-12-26 意法爱立信有限公司 Secure environment management during switches between different modes of multicore systems
CN104461467A (en) * 2013-09-25 2015-03-25 广州中国科学院软件应用技术研究所 Method for increasing calculation speed of SMP cluster system through MPI and OpenMP in hybrid parallel mode
CN104820657A (en) * 2015-05-14 2015-08-05 西安电子科技大学 Inter-core communication method and parallel programming model based on embedded heterogeneous multi-core processor
CN105550161A (en) * 2015-12-16 2016-05-04 浪潮(北京)电子信息产业有限公司 Parallel logic regression method and system for heterogeneous systems

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7363463B2 (en) * 2005-05-13 2008-04-22 Microsoft Corporation Method and system for caching address translations from multiple address spaces in virtual machines
US20150323975A1 (en) * 2014-05-12 2015-11-12 Qualcomm Innovation Center, Inc. SYNCHRONIZATION OF ACTIVITY OF MULTIPLE SUBSYSTEMS IN A SoC TO SAVE STATIC POWER

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102844762A (en) * 2010-01-22 2012-12-26 意法爱立信有限公司 Secure environment management during switches between different modes of multicore systems
CN104461467A (en) * 2013-09-25 2015-03-25 广州中国科学院软件应用技术研究所 Method for increasing calculation speed of SMP cluster system through MPI and OpenMP in hybrid parallel mode
CN104820657A (en) * 2015-05-14 2015-08-05 西安电子科技大学 Inter-core communication method and parallel programming model based on embedded heterogeneous multi-core processor
CN105550161A (en) * 2015-12-16 2016-05-04 浪潮(北京)电子信息产业有限公司 Parallel logic regression method and system for heterogeneous systems

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
A parallel solver for large quadratic programs in training support vector machines; G. Zanghirati et al.; Parallel Computing; 2003-12-31; full text *
Gradient projection methods for quadratic programs and applications in training support vector machines; Thomas Serafini et al.; Optimization Methods and Software; 2014-05-31; full text *
A survey of support vector machine algorithms for large-scale problems (支持向量机处理大规模问题算法综述); Wen Yimin et al.; Computer Science (计算机科学); 2009-07-31; Vol. 36, No. 7; full text *
Research on key technologies of a multi-core SoC platform for wireless security (面向无线安全的多核SoC平台关键技术研究); Cao Dan; China Masters' Theses Full-text Database, Information Science and Technology (中国优秀硕士学位论文全文数据库 信息科技辑); 2015-08-15; No. 8; full text *

Also Published As

Publication number Publication date
CN106407561A (en) 2017-02-15

Similar Documents

Publication Publication Date Title
CN106407561B (en) Method for dividing parallel GPDT algorithm on multi-core SOC
Nurvitadhi et al. GraphGen: An FPGA framework for vertex-centric graph computation
Peeters et al. Stacking sequence optimisation of variable stiffness laminates with manufacturing constraints
EP3816824A1 (en) High throughput matrix processor with support for concurrently processing multiple matrices
CN102521854B (en) Parallel flow line placing method applicable to two-dimensional flow field
WO2021057465A1 (en) Method and apparatus for performing parallel processing on deep learning model
CN106959937A (en) A kind of vectorization implementation method of warp product matrix towards GPDSP
CN104835168A (en) Fast multi-phase image segmentation method based on global convex variational model
JP2020520519A5 (en)
Lai et al. Accelerating Strassen-Winograd's matrix multiplication algorithm on GPUs
CN103065015B (en) A kind of bearing structure low-carbon (LC) material-saving method for designing based on internal force path geometry form
CN110188424B (en) Local area grid reconstruction parallel method for dynamic boundary flow field numerical simulation
CN104615790B (en) Feature recommends method and apparatus
Cools et al. On rounding error resilience, maximal attainable accuracy and parallel performance of the pipelined Conjugate Gradients method for large-scale linear systems in PETSc
CN104317244A (en) Reconfigurable manufacturing system part family construction method
CN104049612A (en) Processing workshop scheduling method based on distribution estimation
CN111125620B (en) Parallel random gradient descent method based on matrix decomposition in recommendation system
Li et al. Optimized deep belief networks on CUDA GPUs
CN106227982A (en) A kind of electromagnetic relay static characteristic computational methods and device
JP2016224801A (en) Parallel computer system, parallel calculation method and program
Harlap et al. PipeDream: Pipeline parallelism for DNN training
CN108599173B (en) Method and device for solving batch power flows
JP6573583B2 (en) System development support apparatus and system development support method
Herrero et al. An implementation of level set based topology optimization using GPU
Zhang et al. A Barzilai and Borwein regularization feasible direction algorithm for convex nonlinear SOC programming with linear constraints

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant