CN102096744A - Irregular iteration parallelization method - Google Patents

Publication number
CN102096744A
Authority
CN
China
Prior art keywords
block
sub
order
data block
tile
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011100537959A
Other languages
Chinese (zh)
Inventor
张纪林
徐向华
万健
蒋从锋
张伟
任永坚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University
Priority to CN2011100537959A
Publication of CN102096744A
Legal status: Pending

Landscapes

  • Complex Calculations (AREA)

Abstract

The invention relates to an irregular iteration parallelization method. In the initialization stage, the method analyzes the data access pattern and, following a strategy for generating data blocks and sub-blocks, improves the data locality and parallelism of the irregular iterative computation. In the execution stage, it runs the newly generated schedule and the transformed code, which further improves data locality and parallelism. During actual execution, an automatic performance tuner for the irregular iterative computation is built; it finds the parameter combination that gives the best efficiency by exhaustive search and fixes those parameter values, so that the method runs at its best efficiency on the given system architecture. The method offers good parallel efficiency and scalability.

Description

An irregular iteration parallelization method
Technical field
The invention belongs to the field of numerical reservoir simulation and relates to an irregular iteration parallelization method.
Background
In numerical reservoir simulation, finite element analysis discretizes the physical region into an irregular grid, and solving for variables such as the pressure in each grid cell ultimately reduces to efficiently solving a large sparse system of linear equations with an iterative algorithm such as successive over-relaxation (SOR) or Gauss-Seidel (GS). The pressure equation in numerical reservoir simulation is written as $Ax = b$, where $x$ is the vector of pressure unknowns, $A$ is the large sparse coefficient matrix, and $b$ is the constant vector. In this field the nonzero elements of $A$ usually occupy only a small fraction of the full matrix, so compressed row storage effectively reduces both storage space and computing time. In practice, however, reading the value of a nonzero element then requires indirect references between arrays. Such indirect references are one form of irregular iterative computation: the compiler can hardly analyze the program's behavior, cannot identify the parallelism of the irregular iterative computation, and has difficulty optimizing its locality. Improving the data locality and parallelism of irregular iterative computation is therefore the key to improving its performance.
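To make the indirect referencing concrete, the following is a minimal Python sketch of a Gauss-Seidel sweep over a matrix held in compressed row storage. The array names `row_ptr`, `col_idx` and `values` are illustrative assumptions, not names from the patent, and the sketch assumes every row stores a nonzero diagonal entry.

```python
def gauss_seidel_csr(row_ptr, col_idx, values, b, x, sweeps=1):
    """Run in-place Gauss-Seidel sweeps on a matrix stored in compressed row form."""
    n = len(b)
    for _ in range(sweeps):
        for i in range(n):
            diag = 0.0
            acc = b[i]
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]              # indirect reference: x is indexed through col_idx
                if j == i:
                    diag = values[k]
                else:
                    acc -= values[k] * x[j]
            x[i] = acc / diag               # assumes a nonzero diagonal entry is stored
    return x


# Example system: A = [[4,1,0],[1,4,1],[0,1,4]], b = A * [1,1,1]
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
values = [4.0, 1.0, 1.0, 4.0, 1.0, 1.0, 4.0]
b = [5.0, 6.0, 5.0]
x = gauss_seidel_csr(row_ptr, col_idx, values, b, [0.0, 0.0, 0.0], sweeps=50)
```

The access `x[col_idx[k]]` is the kind of subscript a compiler cannot resolve at compile time, which is why the locality and parallelism have to be recovered at run time.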
Parallel models for irregular iterative computation include the PW (Post/Wait) model, the SE (Speculative Executor) model, the IE (Inspector/Executor) model and the EIE (Extend Inspector/Executor) model; the IE model is the core model of irregular iterative computation. The irregular data access pattern caused by the compressed storage of the large sparse coefficient matrix $A$ reduces both the spatial and the temporal locality of the data. Temporal locality can be improved with data reuse techniques, and spatial locality with vertex reordering. In addition, traditional loop tiling is an important optimization for raising the parallelism and data locality of iterative techniques: it improves data locality considerably, and reordering the tiles can raise parallelism and reduce communication overhead. Traditional tiling research mostly targets data traversal over regular tiles. In irregular iterative computation, however, the array subscripts of the sparse coefficient matrix $A$ cannot be determined at compile time, so earlier methods do not apply to this kind of problem.
Summary of the invention
The invention proposes a parallelization method for irregular iterative computation. It targets distributed clusters, tunes itself automatically, and improves the parallel execution performance of irregular iterative computation by using a run-time parallel interleaved tiling strategy.
The technical solution adopted by the invention is as follows:
In the initialization stage, the invention analyzes the data access pattern and, following the strategy for generating data blocks and sub-blocks, improves the data locality and parallelism of the irregular iterative computation. In the execution stage, it runs the newly generated schedule and the transformed code, which further improves data locality and parallelism. During actual execution, an automatic performance tuner for the irregular iterative computation is built; it finds the parameter combination that gives the best efficiency by exhaustive search and fixes those parameter values, so that the technique runs at its best efficiency on the given architecture. The concrete steps are:
1. Define the initialization matrix: the coefficient matrix $A$, which has a symmetric structure, is described by an adjacency graph $G = (V, E)$, where $v_i \in V$ denotes a vertex, $a_{ij}$ denotes an element of the coefficient matrix $A$, and $\langle v_i, v_j \rangle$ denotes an edge of the adjacency graph $G$.
2. Partition the coefficient matrix $A$ a first time: the graph is partitioned with the K-way method of the graph partitioning library Metis, so that closely associated vertices fall into the same subgraph. The number of data blocks is determined by the number of compute nodes in the distributed cluster. This first graph partition establishes a mapping from each vertex $v_i$ to the data block that contains it, with one data block per compute node.
The partial-order constraints between vertices are stored in the NodeDepence data structure, described as follows:
$$\mathrm{NodeDepence} \mathrel{+}= \{\langle v_i, v_j \rangle \mid \mathrm{Tile}(0, v_i) < \mathrm{Tile}(0, v_j) \,\wedge\, (\langle v_i, v_j \rangle \in E \,\vee\, \langle v_j, v_i \rangle \in E)\}$$
where $\mathrm{Tile}(0, v_i)$ is the number of the data block that contains vertex $v_i$ at iteration 0, and $\mathrm{Tile}(0, v_j)$ is the number of the data block that contains vertex $v_j$ at iteration 0.
3. Partition the coefficient matrix $A$ a second time: the first partition produces one data block per compute node, and the second partition applies the K-way method of the Metis graph partitioning library again to each of those data blocks. The block size parameter is determined by the capacity of the system's storage unit, and each data block is divided again into sub-blocks (sub-block-k). The partial-order constraints between vertices are stored in the Sub_NodeDepence data structure, described as follows:
$$\mathrm{Sub\_NodeDepence}[m] \mathrel{+}= \{\langle v_i, v_j \rangle \mid \mathrm{Tile}(0, v_i) < \mathrm{Tile}(0, v_j) \,\wedge\, (\langle v_i, v_j \rangle \in E \,\vee\, \langle v_j, v_i \rangle \in E)\}$$
4. Correct the inner-iteration boundaries of the sub-blocks, as follows:
For each vertex, the number of the data block that contains vertex $v_i$ at iteration $\mathit{iter}$ is updated with the formulas below. The inner iteration count here is the number of boundary corrections.
$$\mathrm{Tile}(\mathit{iter}, v_i) = \max(\mathrm{Tile}(\mathit{iter}, v_i),\ \mathrm{Tile}(\mathit{iter}-1, v_j))$$
$$\mathrm{Tile}(\mathit{iter}, v_j) = \max(\mathrm{Tile}(\mathit{iter}, v_i),\ \mathrm{Tile}(\mathit{iter}, v_j))$$
5. Reorder the vertices: the unknowns $x$ of the linear system $Ax = b$ are reordered and mapped to a new vector of unknowns. The order of $x$ is rearranged subject to the partial order of the vertices: within one data block the vertices are arranged according to the partial order, and across different data blocks the vertices follow the order of the data blocks. This produces the new unknown vector.
6. Reorder the sub-blocks: the sub-block reordering strategy rearranges the order of the sub-blocks on each compute node of the distributed cluster. The strategy is as follows:
(1) On compute node 0, the sub-blocks that have no dependence on other nodes are moved to the front of the sub-block order; the remaining sub-blocks keep their original order in the execution sequence.
(2) On a compute node q other than node 0, the sub-blocks that have a dependence on a compute node x (x < q) are moved to the front of the sub-block order; the remaining sub-blocks keep their original order in the execution sequence.
7. Take the iterative computation of one sub-block as a sample for evaluation. By exhaustive search, set the block size parameter to multiples of the L1 cache size of the compute nodes in the distributed cluster and the inner iteration count to values from 3 to 10, run the iterative computation repeatedly, and choose the best combination, that is, the block size parameter and inner iteration count with the shortest iteration time.
8. Execute the irregular iteration parallelization method: following the reordered vertex order, the row order and column order of the matrix $A$ are changed to match the vertex order one to one. The iterative computation based on data blocks and sub-blocks is then carried out as follows:
The data blocks produced by graph partitioning are distributed to the compute nodes, and a three-level loop is then executed. The outer loop is the convergence iteration and traverses each data block. The middle loop traverses the sub-blocks of each data block in turn; after the middle-loop traversal, boundary data are obtained and sent, and the inner loop is then executed. The inner loop performs the iterative computation on the vertices contained in the sub-block.
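The loop nesting of step 8 can be pictured with a single-process Python sketch. This is only an illustration under stated assumptions: the boundary exchange between compute nodes is reduced to a comment, the boundary correction of step 4 is ignored, the update is a plain Gauss-Seidel relaxation, and the names `rows`, `sub_blocks`, `tol` and `max_outer` are hypothetical.

```python
def blocked_gauss_seidel(rows, b, sub_blocks, inner_iters, tol=1e-10, max_outer=200):
    """rows[i] maps column j to a_ij; sub_blocks lists vertex lists in execution order."""
    x = [0.0] * len(b)
    for outer in range(1, max_outer + 1):        # outer loop: convergence iterations
        for sb in sub_blocks:                    # middle loop: sub-blocks in execution order
            # A distributed version would obtain/send boundary values here (omitted).
            for _ in range(inner_iters):         # inner loop: repeated updates inside one sub-block
                for i in sb:
                    s = sum(a * x[j] for j, a in rows[i].items() if j != i)
                    x[i] = (b[i] - s) / rows[i][i]
        residual = max(
            abs(b[i] - sum(a * x[j] for j, a in rows[i].items()))
            for i in range(len(b))
        )
        if residual < tol:
            return x, outer
    return x, max_outer


# 4x4 tridiagonal example whose exact solution is all ones
rows = [
    {0: 4.0, 1: 1.0},
    {0: 1.0, 1: 4.0, 2: 1.0},
    {1: 1.0, 2: 4.0, 3: 1.0},
    {2: 1.0, 3: 4.0},
]
b = [5.0, 6.0, 6.0, 5.0]
x, outer_sweeps = blocked_gauss_seidel(rows, b, [[0, 1], [2, 3]], inner_iters=3)
```

Repeating the inner updates on one sub-block before moving on is what keeps that sub-block's data in cache; the price is that values near the sub-block boundary are reused before their neighbours have been refreshed.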
The characteristics of the invention are as follows. The irregular tiled iteration parallelization method described here targets distributed cluster systems, so it has to optimize communication as well as locality. Graph partitioning is applied twice; the initial partition balances the load across the processor nodes, so that the locality of the irregular iterative computation is optimized while communication and synchronization overhead are reduced. In addition, the method has an automatic tuner that finds the parameter combination giving the best efficiency.
The proposed irregular iteration parallelization method has good parallel efficiency and scalability. Furthermore, an automatic tuner is designed for parameters such as the block size and the inner iteration count: on each architecture it selects and fixes the best parameters, so that later invocations of the method run at their best efficiency.
Description of drawings
Fig. 1 is the flowchart of the irregular iteration parallelization method with automatic performance tuning.
Fig. 2 is a schematic of the initialization of a structurally symmetric matrix.
Fig. 3 shows the graph partition of the matrix graph in Fig. 2.
Fig. 4 is a schematic of the second partition of the matrix graph.
Fig. 5 is a schematic of the sub-block boundary correction process.
Fig. 6 shows the serial execution of the irregular iterative computation method.
Fig. 7 is a top view of the sub-blocks in the irregular iteration space after the sub-block order has been updated.
Embodiment:
The embodiment of the method is described in further detail below with reference to the drawings.
In the initialization stage, the method analyzes the data access pattern and, following the strategy for generating data blocks and sub-blocks, improves the data locality and parallelism of the irregular iterative computation. In the execution stage, it runs the newly generated schedule and the transformed code, which further improves data locality and parallelism. During actual execution, an automatic performance tuner for the irregular iterative computation is built; it finds the parameter combination that gives the best efficiency by exhaustive search and fixes those parameter values, so that the technique runs at its best efficiency on the given architecture. Fig. 1 is the flowchart of the method.
The matrix is initialized as shown in Fig. 2: the coefficient matrix $A$, which has a symmetric structure, is described by an adjacency graph $G = (V, E)$, where $v_i \in V$ denotes a vertex, $a_{ij}$ denotes an element of the coefficient matrix $A$, and $\langle v_i, v_j \rangle$ denotes an edge of the adjacency graph $G$.
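As an illustration of this initialization, the sketch below derives an edge set from the sparsity structure of a matrix held in compressed row storage. It assumes that every stored off-diagonal entry $a_{ij}$ contributes one undirected edge, which is the usual convention for structurally symmetric matrices; the function and array names are hypothetical.

```python
def adjacency_from_csr(row_ptr, col_idx):
    """Return the edge set E of G = (V, E) as pairs (i, j) with i < j."""
    edges = set()
    for i in range(len(row_ptr) - 1):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if i != j:                              # diagonal entries do not form edges
                edges.add((min(i, j), max(i, j)))   # symmetric structure: store each edge once
    return edges


# Structure of the 3x3 tridiagonal matrix [[*,*,0],[*,*,*],[0,*,*]]
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
edges = adjacency_from_csr(row_ptr, col_idx)
```

Only the structure is needed at this stage; the numerical values of $A$ play no part in building the graph.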
The graph is partitioned with the K-way method of the graph partitioning library Metis, so that closely associated vertices fall into the same subgraph; the number of data blocks is determined by the number of compute nodes in the distributed cluster. The partial-order constraints between vertices are stored in the NodeDepence data structure. This first graph partition establishes a mapping from each vertex $v_i$ to the data block that contains it, with one data block per compute node. Fig. 3 shows the result of the first partition of the matrix graph from step 1.
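A minimal sketch of how the NodeDepence set could be filled once the first partition is known. The partition itself would come from the Metis K-way routine; here it is simply given as a hypothetical list `tile0`, where `tile0[v]` plays the role of Tile(0, v).

```python
def node_dependence(edges, tile0):
    """Collect ordered pairs (vi, vj) joined by an edge with Tile(0, vi) < Tile(0, vj)."""
    deps = set()
    for vi, vj in edges:
        for a, c in ((vi, vj), (vj, vi)):   # the edge may be stored in either direction
            if tile0[a] < tile0[c]:
                deps.add((a, c))
    return deps


edges = {(0, 1), (1, 2), (2, 3)}   # a path graph 0-1-2-3
tile0 = [0, 0, 1, 1]               # hypothetical result of the first partition
deps = node_dependence(edges, tile0)
```

Only edges that cross a block boundary produce an entry, so the set stays small when the partition cuts few edges.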
The first graph partition produces the data blocks (block). The graph partitioning library Metis is then called again to partition each data block a second time, dividing each data block into sub-blocks (sub-block). The second partition is thus simply a further graph partition of each data block. The block size is determined by the capacity of the system's storage unit (for example the L1 cache of a compute node). Fig. 4 shows the result of the second partition of the adjacency graph of the sparse coefficient matrix: block1 and block2 are each partitioned a second time into two sub-blocks, namely sub-block1-1, sub-block1-2, sub-block2-1 and sub-block2-2.
In a traditional iterative algorithm, a compute node must traverse and update all vertices to complete one iteration; when the data volume grows and indirect references are present, data locality is poor. For this reason, the sub-block boundaries are corrected in each inner iteration so that the serial iteration can still be completed. By dividing the data block into sub-blocks along the time axis in each iteration, the same data block is updated recursively over several iteration steps, which improves data locality inside the sub-block without changing the properties of the serial iterative algorithm.
The sub-block boundaries are corrected with the time-skewing technique: the boundary of the sub-block is corrected in each iteration, and the curved boundaries in Fig. 5 are the corrected boundaries. A directed graph is defined to store the relation between adjacent sub-blocks. If a vertex $v_i$ in one sub-block and a vertex $v_j$ in another sub-block are connected across the boundary and $v_i < v_j$, then $\langle v_i, v_j \rangle \in E$. For a triple $(v_i, v_j, k)$, the boundary data are defined as the data that, in the $k$-th iteration, belong to data block $v_i$ and are adjacent to data block $v_j$.
The data block boundary correction algorithm of the irregular iteration technique corrects the inner-iteration boundaries of the sub-blocks. For each vertex, the numbers of the data blocks that contain vertices $v_i$ and $v_j$ at iteration $\mathit{iter}$ are updated with the formulas below. The inner iteration count here is the number of boundary corrections.
$$\mathrm{Tile}(\mathit{iter}, v_i) = \max(\mathrm{Tile}(\mathit{iter}, v_i),\ \mathrm{Tile}(\mathit{iter}-1, v_j))$$
$$\mathrm{Tile}(\mathit{iter}, v_j) = \max(\mathrm{Tile}(\mathit{iter}, v_i),\ \mathrm{Tile}(\mathit{iter}, v_j))$$
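The two update formulas can be traced with a small Python sketch. The source does not say how Tile(iter, ·) is initialized or in which order the edges are visited, so the sketch assumes that each iteration starts from the previous iteration's tile numbers and that edges are processed once, in list order; both choices are assumptions for illustration.

```python
def correct_boundaries(edges, tile0, inner_iters):
    """Return tile[it][v]: tile number of vertex v at inner iteration it."""
    tile = [list(tile0)]
    for it in range(1, inner_iters + 1):
        prev = tile[it - 1]
        cur = list(prev)                       # assumption: start from the previous iteration
        for vi, vj in edges:
            cur[vi] = max(cur[vi], prev[vj])   # Tile(iter, vi) = MAX(Tile(iter, vi), Tile(iter-1, vj))
            cur[vj] = max(cur[vi], cur[vj])    # Tile(iter, vj) = MAX(Tile(iter, vi), Tile(iter, vj))
        tile.append(cur)
    return tile


edges = [(0, 1), (1, 2), (2, 3)]   # path graph 0-1-2-3, processed in this order
tile0 = [0, 0, 1, 1]               # hypothetical initial partition
tile = correct_boundaries(edges, tile0, inner_iters=2)
# tile == [[0, 0, 1, 1], [0, 1, 1, 1], [1, 1, 1, 1]]
```

In this example the boundary between tile 0 and tile 1 moves by one vertex per inner iteration, which is the skewed boundary shape that time skewing produces.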
The unknowns $x$ of the linear system $Ax = b$ are reordered and mapped to a new vector of unknowns. The order of $x$ is rearranged subject to the partial order of the vertices, according to the following rule: within one data block the vertices are arranged according to the partial order, and across different data blocks the vertices follow the order of the data blocks. This produces the new unknown vector.
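The reordering rule can be sketched as a sort by data block number. As a simplifying assumption, the original vertex index stands in for the partial order inside a block; `block_of` is a hypothetical list giving the data block of each vertex.

```python
def reorder_vertices(block_of):
    """Return new_order, where new_order[k] is the original vertex placed at position k."""
    # Sort by data block first; inside a block keep the original index order (assumption).
    return sorted(range(len(block_of)), key=lambda v: (block_of[v], v))


block_of = [1, 0, 1, 0, 0]                 # hypothetical block assignment
new_order = reorder_vertices(block_of)     # [1, 3, 4, 0, 2]
x = [10.0, 11.0, 12.0, 13.0, 14.0]
x_new = [x[v] for v in new_order]          # unknowns in the new order
```

The same permutation would be applied to the rows and columns of $A$, so that vertices of one data block become contiguous in memory.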
The serial execution of the irregular iterative computation method is shown in Fig. 6: the shape and number of the vertices contained in each sub-block differ, and the lines between blocks represent data dependences. Sub-blocks 1 to 5 come from the first data block and execute on the first compute node; sub-blocks 6 to 10 come from the second data block and execute on the second compute node. As Fig. 6 shows, every sub-block depends on its predecessor, so the sub-block iterations have to run serially. The data in sub-block 1 are computed first; after the vertices of sub-block 1 have gone through the T inner iterations, sub-block 2 reads the values of the vertices on the boundary it shares with sub-block 1 and performs its own iterative computation, and so on until sub-block 10 has finished.
To execute the irregular iterative computation of the sub-blocks in parallel while preserving their partial order, the sub-block order has to be rearranged. The sub-block reordering strategy rearranges the order of the sub-blocks on each compute node of the distributed cluster as follows. On compute node 0, the sub-blocks that have no dependence on other nodes are moved to the front of the sub-block order, and the remaining sub-blocks keep their original order in the execution sequence. On a compute node q other than node 0, the sub-blocks that have a dependence on a compute node x (x < q) are moved to the front of the sub-block order, and the remaining sub-blocks keep their original order in the execution sequence.
Fig. 7 shows the order of the sub-blocks after rearrangement: once sub-block 2 and sub-block 9 have obtained and sent their boundary data, sub-blocks 1, 3, 5, 7, 9 and sub-blocks 2, 4, 6, 8, 10 can execute the iterative computation in parallel on the two compute nodes.
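The reordering policy can be sketched in a few lines of Python. The sketch assumes that, for every sub-block, the set of other compute nodes it depends on is already known; `depends_on_nodes` is a hypothetical mapping introduced only for this illustration, and the example does not attempt to reproduce the numbering of Fig. 7.

```python
def reorder_sub_blocks(node, sub_blocks, depends_on_nodes):
    """sub_blocks: ids in original order; depends_on_nodes[s]: other nodes that s depends on."""
    if node == 0:
        # Node 0: sub-blocks without any dependence on other nodes go first.
        first = [s for s in sub_blocks if not depends_on_nodes[s]]
    else:
        # Node q != 0: sub-blocks that depend on some node x with x < q go first.
        first = [s for s in sub_blocks if any(x < node for x in depends_on_nodes[s])]
    rest = [s for s in sub_blocks if s not in first]   # the others keep their original order
    return first + rest


order_node0 = reorder_sub_blocks(0, [1, 2, 3], {1: set(), 2: {1}, 3: set()})
order_node1 = reorder_sub_blocks(1, [6, 7, 8], {6: set(), 7: {0}, 8: set()})
```

With this policy node 0 starts on independent work immediately, while node 1 handles its dependent sub-blocks first, so both nodes can then proceed concurrently.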
The irregular iteration parallelization method proposed by the invention is a self-tuning parallel method for distributed clusters. It improves the parallel execution performance of irregular iteration by applying a run-time data block boundary correction strategy. An automatic performance tuner for the irregular iterative computation is built; it finds the parameter combination that gives the best efficiency by exhaustive search and fixes those parameter values, so that the irregular iteration parallelization method runs at its best efficiency.
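The exhaustive search performed by the tuner can be sketched as follows. The candidate block sizes are multiples of the L1 cache size and the inner iteration count runs from 3 to 10, as in the text; the particular multiples `(1, 2, 4, 8)`, the `measure` callback and the helper `timed` are assumptions made for this sketch.

```python
import itertools
import time


def autotune(measure, l1_bytes, multiples=(1, 2, 4, 8), inner_iters=range(3, 11)):
    """Return the (block_bytes, inner_iters) pair with the smallest measured cost."""
    candidates = itertools.product((m * l1_bytes for m in multiples), inner_iters)
    return min(candidates, key=lambda c: measure(*c))


def timed(run_sample):
    """Wrap a sample sub-block computation so that it reports its wall-clock time."""
    def measure(block_bytes, iters):
        start = time.perf_counter()
        run_sample(block_bytes, iters)
        return time.perf_counter() - start
    return measure


# Deterministic stand-in for a real measurement, used only to show the call shape.
def fake_cost(block_bytes, iters):
    return abs(block_bytes - 65536) + abs(iters - 5)


best = autotune(fake_cost, l1_bytes=32768)
```

In real use, `measure` would be `timed(run_sample)` with `run_sample` executing the iterative computation of one sample sub-block; the winning pair is then fixed for later runs on the same architecture.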

Claims (1)

1. An irregular iteration parallelization method, characterized in that the method comprises the following steps:
Step 1, defining the initialization matrix: the coefficient matrix $A$, which has a symmetric structure, is described by an adjacency graph $G = (V, E)$, where $v_i \in V$ denotes a vertex, $a_{ij}$ denotes an element of the coefficient matrix $A$, and $\langle v_i, v_j \rangle$ denotes an edge of the adjacency graph $G$;
Step 2, partitioning the coefficient matrix $A$ a first time: the graph is partitioned with the K-way method of the graph partitioning library Metis, so that closely associated vertices fall into the same subgraph, the number of data blocks being determined by the number of compute nodes in the distributed cluster; this first graph partition establishes a mapping from each vertex $v_i$ to the data block that contains it, with one data block per compute node;
the partial-order constraints between vertices are stored in the NodeDepence data structure, described as follows:
$$\mathrm{NodeDepence} \mathrel{+}= \{\langle v_i, v_j \rangle \mid \mathrm{Tile}(0, v_i) < \mathrm{Tile}(0, v_j) \,\wedge\, (\langle v_i, v_j \rangle \in E \,\vee\, \langle v_j, v_i \rangle \in E)\}$$
where $\mathrm{Tile}(0, v_i)$ is the number of the data block that contains vertex $v_i$ at iteration 0, and $\mathrm{Tile}(0, v_j)$ is the number of the data block that contains vertex $v_j$ at iteration 0;
Step 3, partitioning the coefficient matrix $A$ a second time: each data block produced by the first partition is partitioned again with the K-way method of the Metis graph partitioning library; the block size parameter is determined by the capacity of the system's storage unit, and each data block is divided again into sub-blocks; the partial-order constraints between vertices are stored in the Sub_NodeDepence data structure, described as follows:
$$\mathrm{Sub\_NodeDepence}[m] \mathrel{+}= \{\langle v_i, v_j \rangle \mid \mathrm{Tile}(0, v_i) < \mathrm{Tile}(0, v_j) \,\wedge\, (\langle v_i, v_j \rangle \in E \,\vee\, \langle v_j, v_i \rangle \in E)\}$$
Step 4, correcting the inner-iteration boundaries of the sub-blocks, as follows:
for each vertex, the number of the data block that contains vertex $v_i$ at iteration $\mathit{iter}$ is updated with the formulas below, the inner iteration count being the number of boundary corrections;
$$\mathrm{Tile}(\mathit{iter}, v_i) = \max(\mathrm{Tile}(\mathit{iter}, v_i),\ \mathrm{Tile}(\mathit{iter}-1, v_j))$$
$$\mathrm{Tile}(\mathit{iter}, v_j) = \max(\mathrm{Tile}(\mathit{iter}, v_i),\ \mathrm{Tile}(\mathit{iter}, v_j))$$
Step 5, reordering the vertices: the unknowns $x$ of the linear system $Ax = b$ are reordered and mapped to a new vector of unknowns; the order of $x$ is rearranged subject to the partial order of the vertices: within one data block the vertices are arranged according to the partial order, and across different data blocks the vertices follow the order of the data blocks, which produces the new unknown vector;
Step 6, reordering the sub-blocks: the sub-block reordering strategy rearranges the order of the sub-blocks on each compute node of the distributed cluster; the strategy is as follows:
a. on compute node 0, the sub-blocks that have no dependence on other nodes are moved to the front of the sub-block order, and the remaining sub-blocks keep their original order in the execution sequence;
b. on a compute node q other than node 0, the sub-blocks that have a dependence on a compute node x, where x < q, are moved to the front of the sub-block order, and the remaining sub-blocks keep their original order in the execution sequence;
Step 7, taking the iterative computation of one sub-block as a sample for evaluation: by exhaustive search, the block size parameter is set to multiples of the L1 cache size of the compute nodes in the distributed cluster and the inner iteration count to values from 3 to 10, the iterative computation is run, and the block size parameter and inner iteration count with the shortest iteration time are chosen;
Step 8, executing the irregular iteration parallelization method: following the reordered vertex order, the row order and column order of the matrix $A$ are changed to match the vertex order one to one; the iterative computation based on data blocks and sub-blocks is then carried out as follows:
the data blocks produced by graph partitioning are distributed to the compute nodes, and a three-level loop is then executed; the outer loop is the convergence iteration and traverses each data block; the middle loop traverses the sub-blocks of each data block in turn; after the middle-loop traversal, boundary data are obtained and sent, and the inner loop is then executed; the inner loop performs the iterative computation on the vertices contained in the sub-block.
CN2011100537959A 2011-03-07 2011-03-07 Irregular iteration parallelization method Pending CN102096744A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011100537959A CN102096744A (en) 2011-03-07 2011-03-07 Irregular iteration parallelization method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2011100537959A CN102096744A (en) 2011-03-07 2011-03-07 Irregular iteration parallelization method

Publications (1)

Publication Number Publication Date
CN102096744A true CN102096744A (en) 2011-06-15

Family

ID=44129839

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011100537959A Pending CN102096744A (en) 2011-03-07 2011-03-07 Irregular iteration parallelization method

Country Status (1)

Country Link
CN (1) CN102096744A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102520917A (en) * 2011-12-15 2012-06-27 杭州电子科技大学 Parallelization method for three-dimensional incompressible pipe flows
CN102521463A (en) * 2011-12-26 2012-06-27 杭州电子科技大学 Method for improving numerical reservoir simulation efficiency by optimizing behaviors of Cache
CN103150290A (en) * 2013-02-28 2013-06-12 杭州电子科技大学 Novel numerical simulation method for three-dimensional incompressible pipe flow
CN105701291A (en) * 2016-01-13 2016-06-22 中国航空动力机械研究所 Finite element analysis device, information acquisition method and method for parallel generation of system matrix
CN107797852A (en) * 2016-09-06 2018-03-13 阿里巴巴集团控股有限公司 The processing unit and processing method of data iteration
CN109478145A (en) * 2016-06-10 2019-03-15 华为技术有限公司 The parallel optimization of homogeneous system
CN109636709A (en) * 2018-11-28 2019-04-16 华中科技大学 A kind of figure calculation method suitable for heterogeneous platform
CN111381886A (en) * 2020-03-02 2020-07-07 西安交通大学 Rhombic block parallel optimization method for template calculation
CN111830361A (en) * 2019-04-18 2020-10-27 中国石油化工股份有限公司 Oil field tank field grounding grid fault detection device
CN111830362A (en) * 2019-04-18 2020-10-27 中国石油化工股份有限公司 Non-excavation detection method suitable for grounding grid of oil field tank field
CN113553288A (en) * 2021-09-18 2021-10-26 北京大学 Two-layer blocking multicolor parallel optimization method for HPCG benchmark test
WO2024007652A1 (en) * 2022-07-06 2024-01-11 芯和半导体科技(上海)股份有限公司 Accelerated solving method for large sparse matrix, system, and storage medium

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102520917A (en) * 2011-12-15 2012-06-27 杭州电子科技大学 Parallelization method for three-dimensional incompressible pipe flows
CN102521463A (en) * 2011-12-26 2012-06-27 杭州电子科技大学 Method for improving numerical reservoir simulation efficiency by optimizing behaviors of Cache
CN103150290A (en) * 2013-02-28 2013-06-12 杭州电子科技大学 Novel numerical simulation method for three-dimensional incompressible pipe flow
CN105701291B (en) * 2016-01-13 2019-04-23 中国航空动力机械研究所 Finite element analysis apparatus and information acquisition method, system matrix parallel generation method
CN105701291A (en) * 2016-01-13 2016-06-22 中国航空动力机械研究所 Finite element analysis device, information acquisition method and method for parallel generation of system matrix
CN109478145A (en) * 2016-06-10 2019-03-15 华为技术有限公司 The parallel optimization of homogeneous system
CN109478145B (en) * 2016-06-10 2021-04-09 华为技术有限公司 Parallel optimization of homogeneous systems
CN107797852A (en) * 2016-09-06 2018-03-13 阿里巴巴集团控股有限公司 The processing unit and processing method of data iteration
CN109636709B (en) * 2018-11-28 2020-12-08 华中科技大学 Graph calculation method suitable for heterogeneous platform
CN109636709A (en) * 2018-11-28 2019-04-16 华中科技大学 Graph calculation method suitable for heterogeneous platform
CN111830361A (en) * 2019-04-18 2020-10-27 中国石油化工股份有限公司 Oil field tank field grounding grid fault detection device
CN111830362A (en) * 2019-04-18 2020-10-27 中国石油化工股份有限公司 Non-excavation detection method suitable for grounding grid of oil field tank field
CN111830362B (en) * 2019-04-18 2021-10-29 中国石油化工股份有限公司 Non-excavation detection method suitable for grounding grid of oil field tank field
CN111830361B (en) * 2019-04-18 2022-04-22 中国石油化工股份有限公司 Method for detecting corrosion fault of grounding grid of oil field tank field
CN111381886A (en) * 2020-03-02 2020-07-07 西安交通大学 Rhombic block parallel optimization method for template calculation
CN113553288A (en) * 2021-09-18 2021-10-26 北京大学 Two-layer blocking multicolor parallel optimization method for HPCG benchmark test
CN113553288B (en) * 2021-09-18 2022-01-11 北京大学 Two-layer blocking multicolor parallel optimization method for HPCG benchmark test
WO2024007652A1 (en) * 2022-07-06 2024-01-11 芯和半导体科技(上海)股份有限公司 Accelerated solving method for large sparse matrix, system, and storage medium

Similar Documents

Publication Publication Date Title
CN102096744A (en) Irregular iteration parallelization method
Mukkara et al. Exploiting locality in graph analytics through hardware-accelerated traversal scheduling
Xiao et al. A load balancing inspired optimization framework for exascale multicore systems: A complex networks approach
Yao et al. An efficient graph accelerator with parallel data conflict management
Harish et al. Large graph algorithms for massively multithreaded architectures
US6374403B1 (en) Programmatic method for reducing cost of control in parallel processes
Ravishankar et al. Code generation for parallel execution of a class of irregular loops on distributed memory systems
Jeffrey et al. Data-centric execution of speculative parallel programs
Karantasis et al. Parallelization of reordering algorithms for bandwidth and wavefront reduction
Hong et al. MultiGraph: Efficient graph processing on GPUs
Tabuchi et al. A source-to-source OpenACC compiler for CUDA
Bøgh et al. Work-efficient parallel skyline computation for the GPU
CN105224452A (en) Prediction cost optimization method for static performance analysis of scientific programs
Palermo et al. Automatic selection of dynamic data partitioning schemes for distributed-memory multicomputers
Boehm et al. Declarative machine learning-a classification of basic properties and types
Liu et al. On-chip cache hierarchy-aware tile scheduling for multicore machines
Yin et al. Conflict-free loop mapping for coarse-grained reconfigurable architecture with multi-bank memory
Wang et al. Pencil: A pipelined algorithm for distributed stencils
Liu et al. OBFS: OpenCL based BFS optimizations on software programmable FPGAs
Agnesina et al. Improving FPGA-based logic emulation systems through machine learning
Lashgar et al. IPMACC: open source openacc to cuda/opencl translator
CN105260222A (en) Optimization method for the initiation interval between loop pipeline iterations in a reconfigurable compiler
CN112306500A (en) Compiling method for reducing multi-class access conflict aiming at coarse-grained reconfigurable structure
CN109522127A (en) GPU-based heterogeneous acceleration method for fluid machinery simulation programs
CN109388876A (en) Parallel acceleration method for numerical simulation of groundwater solute transport

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20110615