CN102096744A - Irregular iteration parallelization method - Google Patents
- Publication number: CN102096744A
- Authority: CN (China)
- Prior art keywords: block, sub, order, data block, tile
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Landscapes: Complex Calculations (AREA)
Abstract
The invention relates to an irregular iteration parallelization method. In the initialization stage, the data access pattern is analyzed and, following a generation strategy for data blocks and sub-blocks, the data locality and data parallelism of irregular iterative computation are enhanced. In the execution stage, the data locality and data parallelism of the irregular iteration technique are enhanced by executing a newly generated scheduling strategy and the transformed code. In actual execution, an automatic performance tuner for the irregular iterative computation method is constructed: an exhaustive search finds the combination of parameter values that gives the best efficiency, and those values are fixed, so that optimal running efficiency is achieved on the given system architecture. The method has good parallel efficiency and scalability.
Description
Technical field
The invention belongs to the field of numerical reservoir simulation and relates to an irregular iteration parallelization method.
Background art
In numerical reservoir simulation, finite element analysis discretizes the physical domain into an irregular grid, and solving for variables such as the pressure in each grid cell ultimately reduces to efficiently solving a large sparse linear system with iterative algorithms such as successive over-relaxation (SOR) or Gauss-Seidel (GS). The pressure equation in numerical reservoir simulation is written $Ax = b$, where $x$ is the vector of pressure unknowns, $A$ is the large sparse coefficient matrix, and $b$ is the constant vector. In this field, the nonzero elements of $A$ usually account for less than … of the whole matrix, so compressed row storage can effectively reduce storage space and computation time. In practice, however, accessing the actual values of the nonzero elements requires indirect referencing between arrays. Such indirect referencing is one form of irregular iterative computation: it makes program behavior hard for a compiler to analyze, prevents the compiler from recognizing the parallelism of the irregular iteration, and makes the locality of the irregular computation difficult to optimize. Improving the data locality and parallelism of irregular iterative computation is therefore the key to improving its performance.
Parallel models for irregular iterative computation include the PW (Post/Wait) model, the SE (Speculative Executor) model, the IE (Inspector/Executor) model and the EIE (Extended Inspector/Executor) model, of which the IE model is the core model of irregular iterative computation. The irregular data access pattern caused by the compressed storage of the large sparse coefficient matrix $A$ reduces both the spatial and the temporal locality of the data. Temporal locality can be improved with data reuse techniques, and spatial locality with vertex reordering. In addition, traditional loop tiling is an important optimization for improving the parallelism and data locality of iterative methods: it can greatly improve data locality, and reordering the tiles can increase parallelism and reduce communication overhead. Traditional tiling research, however, mostly addresses data traversal with regular tiles. In irregular iterative computation the array subscripts into the sparse coefficient matrix $A$ cannot be determined at compile time, so earlier methods do not apply to this class of problems.
Summary of the invention
The present invention proposes an irregular iterative computation parallelization method that targets distributed clusters and provides automatic tuning. By applying a runtime parallel interleaved tiling strategy, it improves the parallel execution performance of irregular iterative computation.
The technical solution adopted in the present invention is:
In the initialization stage, the present invention analyzes the data access pattern and, following the generation strategy for data blocks and sub-blocks, improves the data locality and parallelism of irregular iterative computation. In the execution stage, it improves the data locality and parallelism of the irregular iteration technique by executing a newly generated scheduling strategy and the transformed code. In practical implementation, an automatic performance tuner for the irregular iterative computation method is constructed: an exhaustive search finds the parameter-value combination with the best efficiency, and those values are fixed, achieving optimal running efficiency on the given architecture. The specific steps are as follows:
1. Define the initialization matrix: the structurally symmetric coefficient matrix $A$ is described by an adjacency graph $G = (V, E)$, where $V$ is the set of vertices, $E$ represents the nonzero elements of the coefficient matrix $A$, and $\langle v_i, v_j\rangle \in E$ is an edge of the adjacency graph $G$.
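To make step 1 concrete, the sketch below builds the adjacency graph $G = (V, E)$ from a structurally symmetric sparse matrix stored in compressed sparse row (CSR) form, the compressed row storage mentioned in the background. The CSR array names `indptr` and `indices` are assumptions, not taken from the source; diagonal entries are skipped, so each off-diagonal nonzero $a_{ij}$ yields an undirected edge between $v_i$ and $v_j$. It is an illustration only, not the patented implementation.

```python
from collections import defaultdict


def adjacency_graph(n, indptr, indices):
    """Build the adjacency graph G = (V, E) of an n x n structurally symmetric CSR matrix.

    Vertices are 0..n-1 (one per unknown). Every off-diagonal nonzero a_ij adds
    the undirected edge {v_i, v_j}; diagonal entries are skipped.
    Returns a dict mapping each vertex to its sorted neighbour list.
    """
    adj = defaultdict(set)
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j != i:
                adj[i].add(j)
                adj[j].add(i)  # record both directions: the matrix is structurally symmetric
    return {v: sorted(adj[v]) for v in range(n)}


# 4 x 4 tridiagonal sparsity pattern in CSR form
indptr = [0, 2, 5, 8, 10]
indices = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
print(adjacency_graph(4, indptr, indices))  # {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```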
2. First-level partition of the coefficient matrix $A$: the graph is partitioned with the k-way method of the graph-partitioning library Metis so that closely connected vertices fall into the same subgraph. The number of data blocks $p$ is determined by the number of compute nodes in the distributed cluster. The first-level partition establishes the following mapping between the vertices and the resulting data blocks: $V \rightarrow \{B_1, B_2, \dots, B_p\}$, where $B_k$ denotes the $k$-th data block after partitioning and $p$ equals the number of compute nodes in the distributed cluster.
The partial-order constraints between vertices are stored in the NodeDepence data structure, described as follows:

$$\mathrm{NodeDepence} \mathrel{+}= \{\langle v_i, v_j\rangle \mid \mathrm{Tile}(0, v_i) < \mathrm{Tile}(0, v_j) \wedge (\langle v_i, v_j\rangle \in E \vee \langle v_j, v_i\rangle \in E)\}$$

where $\mathrm{Tile}(0, v_i)$ and $\mathrm{Tile}(0, v_j)$ denote the data-block numbers of vertices $v_i$ and $v_j$ at iteration 0.
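The sketch below illustrates step 2 under stated assumptions. Metis is not part of the Python standard library, so `bfs_partition` is a hypothetical stand-in that grows $p$ contiguous regions by breadth-first search; it is not the Metis k-way algorithm and makes no attempt to minimize the edge cut. `node_dependences` then builds the NodeDepence set literally from the definition above, with `tile0[v]` playing the role of $\mathrm{Tile}(0, v)$.

```python
from collections import deque


def bfs_partition(adj, p):
    """Hypothetical stand-in for Metis k-way: grow p regions of ~n/p vertices by BFS.

    Returns tile0, where tile0[v] is the data-block number Tile(0, v) in 0..p-1.
    """
    n = len(adj)
    target = -(-n // p)  # ceil(n / p) vertices per data block
    tile0 = [-1] * n
    block = filled = 0
    for seed in range(n):
        if tile0[seed] != -1:
            continue
        queue = deque([seed])
        while queue:
            v = queue.popleft()
            if tile0[v] != -1:
                continue  # already assigned via another path
            tile0[v] = block
            filled += 1
            if filled == target and block < p - 1:
                block, filled = block + 1, 0  # current block is full; start the next one
            queue.extend(u for u in adj[v] if tile0[u] == -1)
    return tile0


def node_dependences(adj, tile0):
    """NodeDepence: ordered pairs <v_i, v_j> of adjacent vertices with Tile(0, v_i) < Tile(0, v_j)."""
    return {(vi, vj) for vi, nbrs in adj.items() for vj in nbrs if tile0[vi] < tile0[vj]}


# Path graph 0-1-2-3-4-5 split across p = 2 compute nodes
path = {v: [u for u in (v - 1, v + 1) if 0 <= u < 6] for v in range(6)}
tile0 = bfs_partition(path, 2)
print(tile0)                          # [0, 0, 0, 1, 1, 1]
print(node_dependences(path, tile0))  # {(2, 3)}
```

Only the edge that crosses the block boundary produces a constraint, which is what lets vertices inside a data block be updated without waiting on another node.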
3. Second-level partition of the coefficient matrix $A$: the first-level partition produces $p$ data blocks, and the second-level partition repartitions each of these data blocks with the k-way method of the graph-partitioning library Metis. The sub-block size parameter is determined by the capacity of the system memory unit, and each data block is further divided into several sub-blocks (sub-block-k). The partial-order constraints between vertices at the sub-block level are stored in the Sub_NodeDepence data structure, described as follows:

$$\mathrm{Sub\_NodeDepence}[m] \mathrel{+}= \{\langle v_i, v_j\rangle \mid \mathrm{Tile}(0, v_i) < \mathrm{Tile}(0, v_j) \wedge (\langle v_i, v_j\rangle \in E \vee \langle v_j, v_i\rangle \in E)\}$$
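A hedged sketch of step 3 follows. Metis is again replaced, here by a hypothetical chunking rule: each data block's vertices are cut, in index order, into sub-blocks of at most `cache_bytes // bytes_per_vertex` vertices, mirroring the idea that the sub-block size is set by the memory-unit capacity. `bytes_per_vertex` is an assumed parameter. The code also assumes that the index `m` of Sub_NodeDepence ranges over data blocks and that `sub_tile[v]` plays the role of $\mathrm{Tile}(0, v)$ at the sub-block level.

```python
from collections import defaultdict


def sub_partition(tile0, p, cache_bytes, bytes_per_vertex):
    """Split each data block into sub-blocks whose vertex data fit in cache_bytes.

    Returns sub_tile, a global sub-block number for every vertex; sub-blocks of
    data block 0 are numbered first, then those of data block 1, and so on.
    """
    cap = max(1, cache_bytes // bytes_per_vertex)  # vertices per sub-block (assumed sizing rule)
    sub_tile = [-1] * len(tile0)
    next_id = 0
    for blk in range(p):
        members = [v for v in range(len(tile0)) if tile0[v] == blk]
        for start in range(0, len(members), cap):
            for v in members[start:start + cap]:
                sub_tile[v] = next_id
            next_id += 1
    return sub_tile


def sub_node_dependences(adj, tile0, sub_tile):
    """Sub_NodeDepence[m]: adjacent pairs inside data block m ordered by sub-block number."""
    deps = defaultdict(set)
    for vi, nbrs in adj.items():
        for vj in nbrs:
            if tile0[vi] == tile0[vj] and sub_tile[vi] < sub_tile[vj]:
                deps[tile0[vi]].add((vi, vj))
    return dict(deps)


path = {v: [u for u in (v - 1, v + 1) if 0 <= u < 6] for v in range(6)}
tile0 = [0, 0, 0, 1, 1, 1]
sub_tile = sub_partition(tile0, 2, cache_bytes=64, bytes_per_vertex=32)
print(sub_tile)                                     # [0, 0, 1, 2, 2, 3]
print(sub_node_dependences(path, tile0, sub_tile))  # {0: {(1, 2)}, 1: {(4, 5)}}
```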
4. Inner-iteration boundary correction of the sub-blocks, specified as follows.
For each vertex $v_i$ and each vertex $v_j$ paired with it by a partial-order constraint, the data-block values at inner iteration $iter$ are updated with the formulas below. The number of inner iterations here is the number of boundary corrections.

$$\mathrm{Tile}(iter, v_i) = \max\big(\mathrm{Tile}(iter, v_i),\ \mathrm{Tile}(iter-1, v_j)\big)$$

$$\mathrm{Tile}(iter, v_j) = \max\big(\mathrm{Tile}(iter, v_i),\ \mathrm{Tile}(iter, v_j)\big)$$
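One way to read the two formulas is as Gauss-Seidel ordering constraints: if $v_i$ precedes $v_j$, then the update of $v_i$ at inner iteration $iter$ needs the value of $v_j$ from iteration $iter-1$, and the update of $v_j$ at iteration $iter$ needs the new value of $v_i$; taking the maximum keeps each update in a tile that runs no earlier than the tile producing its input. The sketch below applies the formulas literally. Because the source does not say how $\mathrm{Tile}(iter, \cdot)$ is initialized, it assumes the values start from $\mathrm{Tile}(iter-1, \cdot)$ and that one pass over the constrained pairs, in sorted order, is made per inner iteration; `deps` can be a NodeDepence set.

```python
def correct_boundaries(deps, tile0, inner_iters):
    """Apply the boundary-correction formulas for inner iterations 1..inner_iters.

    deps: ordered pairs (v_i, v_j) with v_i constrained to precede v_j.
    Returns tiles, where tiles[it][v] is Tile(it, v); tiles[0] is tile0.
    """
    tiles = [list(tile0)]
    for it in range(1, inner_iters + 1):
        prev = tiles[it - 1]
        cur = list(prev)  # assumption: Tile(it, .) starts from Tile(it - 1, .)
        for vi, vj in sorted(deps):
            # Tile(it, v_i) = max(Tile(it, v_i), Tile(it - 1, v_j))
            cur[vi] = max(cur[vi], prev[vj])
            # Tile(it, v_j) = max(Tile(it, v_i), Tile(it, v_j))
            cur[vj] = max(cur[vi], cur[vj])
        tiles.append(cur)
    return tiles


# Three vertices in tiles 0, 1, 2 linked in a chain v0 - v1 - v2
print(correct_boundaries({(0, 1), (1, 2)}, [0, 1, 2], 2))
# [[0, 1, 2], [1, 2, 2], [2, 2, 2]]
```

In this small example the tile boundaries shift by one vertex per inner iteration, which is the kind of skewed boundary that time-skewing produces.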
5. Vertex reordering: the unknowns $x$ of the linear system $Ax = b$ are reordered and mapped to new unknowns $x'$. Subject to the vertex partial order, the order of $x$ is rearranged: within a data block, vertices are arranged according to the partial order; across data blocks, vertices are arranged in data-block order. This produces the new unknown vector $x'$.
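A minimal sketch of step 5, assuming the block numbers `tile0` and sub-block numbers `sub_tile` from the previous steps: sorting the vertices by (data block, sub-block, original index) puts them in data-block order across blocks and respects every NodeDepence and Sub_NodeDepence pair, because each such pair has a strictly smaller block or sub-block number on the left. Using the original index as the final tie-breaker is an assumption. The reordered vector satisfies $x'_k = x_{\mathrm{perm}[k]}$.

```python
def vertex_permutation(tile0, sub_tile):
    """New position k holds old vertex perm[k]: order by data block, then sub-block, then index."""
    return sorted(range(len(tile0)), key=lambda v: (tile0[v], sub_tile[v], v))


def permute_vector(x, perm):
    """x'[k] = x[perm[k]]: reorder the unknown vector to match the new vertex order."""
    return [x[v] for v in perm]


tile0 = [1, 0, 1, 0]
sub_tile = [2, 0, 3, 1]
perm = vertex_permutation(tile0, sub_tile)
print(perm)                                    # [1, 3, 0, 2]
print(permute_vector([10, 11, 12, 13], perm))  # [11, 13, 10, 12]
```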
6. Sub-block reordering: the sub-block reordering strategy rearranges the order of the sub-blocks assigned to each compute node in the distributed cluster. The strategy is as follows:
(1) Inside compute node 0, sub-blocks that have no dependence on other nodes are moved to the front of the sub-block order; the remaining sub-blocks keep their original order in the execution sequence.
(2) Inside a non-zero compute node q, sub-blocks that depend on a compute node x (x < q) are moved to the front of the sub-block order; the remaining sub-blocks keep their original order in the execution sequence.
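Rules (1) and (2) of step 6 amount to a stable partition of each node's sub-block list. The sketch below applies them literally. The input format (a map from node number to its sub-block list, plus a set of dependent sub-block pairs) is an assumption made for illustration.

```python
from collections import defaultdict


def reorder_subblocks(node_subblocks, sub_deps):
    """Reorder each compute node's sub-blocks by rules (1) and (2) of step 6.

    node_subblocks: {node q: [sub-block ids in original order]}
    sub_deps: pairs (a, b) of sub-blocks that depend on each other.
    """
    owner = {s: q for q, subs in node_subblocks.items() for s in subs}
    partner_nodes = defaultdict(set)  # sub-block -> nodes owning the sub-blocks it depends on
    for a, b in sub_deps:
        partner_nodes[a].add(owner[b])
        partner_nodes[b].add(owner[a])
    result = {}
    for q, subs in node_subblocks.items():
        if q == 0:
            # Rule (1): on node 0, sub-blocks with no dependence on any other node go first
            front = [s for s in subs if not partner_nodes[s] - {0}]
        else:
            # Rule (2): on node q > 0, sub-blocks depending on some node x < q go first
            front = [s for s in subs if any(x < q for x in partner_nodes[s])]
        chosen = set(front)
        result[q] = front + [s for s in subs if s not in chosen]  # the rest keep their original order
    return result


# Node 0 owns sub-blocks 0-2, node 1 owns 3-5; sub-block 0 depends on sub-block 5
print(reorder_subblocks({0: [0, 1, 2], 1: [3, 4, 5]}, {(0, 1), (0, 5), (4, 5)}))
# {0: [1, 2, 0], 1: [5, 3, 4]}
```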
7. Taking the iterative computation of one sub-block as a sample for evaluation, the exhaustive search sets the sub-block size parameter to multiples of the L1 cache size of a compute node in the distributed cluster and the number of inner iterations to 3 to 10, runs the iterative computation repeatedly, and selects the optimum, i.e. the sub-block size parameter and number of inner iterations that give the shortest iteration time.
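Step 7 is a plain exhaustive grid search. In the sketch below, `cost(block_bytes, inner_iters)` is a hypothetical callable that runs the sample sub-block iteration and returns its run time; the helper `wall_time` shows one way to measure it with `time.perf_counter`. The candidate multiples of the L1 cache size are an assumption; the inner-iteration range 3 to 10 follows the text.

```python
import itertools
import time


def wall_time(run_sample, block_bytes, inner_iters):
    """Measure one sample run of a (hypothetical) sub-block kernel, in seconds."""
    start = time.perf_counter()
    run_sample(block_bytes, inner_iters)
    return time.perf_counter() - start


def exhaustive_tune(cost, l1_bytes, multiples=(1, 2, 4, 8), inner_range=range(3, 11)):
    """Try every (sub-block size, inner iterations) pair and return the cheapest one.

    Sub-block sizes are multiples of the L1 cache size; inner iterations span 3..10.
    Ties keep the first pair found.
    """
    best = None
    for m, t in itertools.product(multiples, inner_range):
        block_bytes = m * l1_bytes
        c = cost(block_bytes, t)
        if best is None or c < best[0]:
            best = (c, block_bytes, t)
    return best[1], best[2]


# Synthetic cost with a known minimum at 64 KiB and 5 inner iterations
def fake_cost(b, t):
    return abs(b - 64 * 1024) + abs(t - 5)


print(exhaustive_tune(fake_cost, l1_bytes=32 * 1024))  # (65536, 5)
```

On a real cluster one would pass something like `lambda b, t: wall_time(kernel, b, t)` for a hypothetical `kernel`, ideally repeating each measurement and keeping the minimum because single timings are noisy; the winning pair is then fixed for later calls, as the text describes.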
8. Execution of the irregular iteration parallel method: following the reordered vertex order, the rows and columns of matrix $A$ are permuted in one-to-one correspondence with the vertices. Iterative computation based on data blocks and sub-blocks is then performed as follows:
The partitioned data blocks are distributed to the compute nodes, and a three-level loop is executed. The outer loop is the convergence iteration and traverses each data block; the middle loop traverses the sub-blocks of each data block in turn; after traversing the middle loop, boundary data are received and sent, and the inner loop is then executed, performing the iterative computation on the vertices contained in the sub-block.
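A single-process sketch of the three-level loop in step 8 follows. It assumes Gauss-Seidel as the inner sweep (the background names SOR and GS), CSR arrays for the already reordered matrix, and a fixed number of inner iterations `inner_iters` ($T$). The boundary exchange between compute nodes is only marked by a comment, because a real implementation would need a message-passing layer that the source does not specify. The outer loop stops when the max-norm residual falls below `tol`.

```python
def tridiagonal_csr(n, diag=4.0, off=-1.0):
    """CSR arrays of a diagonally dominant tridiagonal test matrix (illustrative only)."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        for j in (i - 1, i, i + 1):
            if 0 <= j < n:
                indices.append(j)
                data.append(diag if j == i else off)
        indptr.append(len(indices))
    return indptr, indices, data


def tiled_sweeps(indptr, indices, data, b, blocks, inner_iters=3, tol=1e-10, max_outer=500):
    """blocks: list of data blocks, each a list of sub-blocks, each a list of vertex ids."""
    n = len(b)
    x = [0.0] * n

    def relax(i):
        # Gauss-Seidel update of unknown i using the latest neighbour values
        s, diag = b[i], 0.0
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j == i:
                diag = data[k]
            else:
                s -= data[k] * x[j]
        x[i] = s / diag

    def residual():
        return max(abs(b[i] - sum(data[k] * x[indices[k]] for k in range(indptr[i], indptr[i + 1])))
                   for i in range(n))

    for outer in range(1, max_outer + 1):  # outer loop: convergence iterations
        for block in blocks:               # each data block (one per compute node)
            for sub in block:              # middle loop: sub-blocks in execution order
                # a distributed run would receive/send boundary data here (omitted)
                for _ in range(inner_iters):  # inner loop: T sweeps over the sub-block
                    for i in sub:
                        relax(i)
        if residual() < tol:
            return x, outer
    return x, max_outer


indptr, indices, data = tridiagonal_csr(8)
b = [3.0] + [2.0] * 6 + [3.0]                   # b = A @ ones, so the exact solution is all ones
blocks = [[[0, 1], [2, 3]], [[4, 5], [6, 7]]]   # 2 data blocks x 2 sub-blocks
x, outer = tiled_sweeps(indptr, indices, data, b, blocks)
print([round(v, 6) for v in x])                 # expected to be close to 1.0 everywhere
```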
Characteristics of the present invention: the irregular tiled iteration parallel method described here targets distributed cluster systems, so the parallel method must consider communication optimization in addition to locality optimization. Graph partitioning is applied twice; the initial partition ensures load balance across processor nodes, so that the locality of the irregular iterative method is optimized while communication and synchronization overhead is reduced. In addition, the method includes an automatic tuner that finds the parameter combination with the best efficiency.
The irregular iteration parallel method proposed by the invention has good parallel efficiency and scalability. In addition, an automatic tuner is designed for parameters such as the sub-block size and the number of inner iterations; it selects and fixes the best parameters on different architectures, so that later calls to the irregular iteration parallel method run at optimal efficiency.
Description of the drawings
Fig. 1 is the flow chart of the irregular iteration parallel method with automatic performance tuning.
Fig. 2 is a schematic diagram of the initialization of a structurally symmetric matrix.
Fig. 3 shows the graph partition of the matrix graph in Fig. 2.
Fig. 4 is a schematic diagram of the second-level partition of the matrix graph.
Fig. 5 is a schematic diagram of the sub-block boundary correction process.
Fig. 6 shows the serial execution of the irregular iterative computation method.
Fig. 7 is a top view of the sub-blocks in the irregular iteration space after the sub-block order is updated.
Embodiment:
The method is described in further detail below with reference to the accompanying drawings.
In the initialization stage, the method analyzes the data access pattern and, following the generation strategy for data blocks and sub-blocks, improves the data locality and parallelism of irregular iterative computation. In the execution stage, it improves the data locality and parallelism of the irregular iteration technique by executing a newly generated scheduling strategy and the transformed code. In practical implementation, an automatic performance tuner for the irregular iterative computation method is constructed: an exhaustive search finds the parameter-value combination with the best efficiency, and those values are fixed, achieving optimal running efficiency on the given architecture. Fig. 1 is the flow chart of the method.
As shown in Fig. 2, the initialization matrix is defined: the structurally symmetric coefficient matrix $A$ is described by an adjacency graph $G = (V, E)$, where $V$ is the set of vertices, $E$ represents the nonzero elements of the coefficient matrix $A$, and $\langle v_i, v_j\rangle \in E$ is an edge of the adjacency graph $G$.
The graph is partitioned with the k-way method of the graph-partitioning library Metis so that closely connected vertices fall into the same subgraph; the number of data blocks $p$ is determined by the number of compute nodes in the distributed cluster. The partial-order constraints of the vertices are stored in the NodeDepence data structure. The first-level partition establishes the mapping $V \rightarrow \{B_1, B_2, \dots, B_p\}$, where $B_k$ denotes the $k$-th data block after partitioning and $p$ equals the number of compute nodes in the distributed cluster. Fig. 3 shows the result of the first partition of the matrix graph in step 1.
The first-level graph partition produces $p$ data blocks (block). The graph-partitioning library Metis is then called again to partition each data block a second time, dividing each data block into several sub-blocks; that is, the second-level partition repartitions each data block. The sub-block size is determined by the capacity of the memory unit (for example, the L1 cache of a compute node). Fig. 4 shows the result of the second-level partition of the adjacency graph of the sparse coefficient matrix: block1 and block2 are each divided into two sub-blocks, namely sub-block1-1, sub-block1-2, sub-block2-1 and sub-block2-2.
In traditional iterative algorithms, a compute node completes one iteration by traversing and updating all vertices; as the data volume grows and indirect referencing is involved, data locality becomes poor. We therefore correct the sub-block boundaries at each inner iteration while preserving the serial iteration. By tiling the data blocks into sub-blocks along the time axis within each iteration, several iteration steps are applied in succession to the same data block, which improves data locality within a sub-block without changing the properties of the serial iterative algorithm.
We correct the sub-block boundaries with the time-skewing technique: the boundary of each sub-block is corrected at every iteration, and the curved boundaries in Fig. 5 represent the corrected boundaries. A directed graph is defined to store the relations between adjacent sub-blocks: if a vertex in sub-block $v_i$ is connected across the boundary to a vertex in sub-block $v_j$ and $v_i < v_j$, then $\langle v_i, v_j\rangle \in E$. The boundary data $(v_i, v_j, k)$ are defined as the data that, at the $k$-th iteration, belong to data block $v_i$ and are adjacent to data block $v_j$.
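The directed graph of adjacent sub-blocks can be derived from the vertex-level adjacency graph and a sub-block numbering. The sketch assumes that sub-blocks are ordered by their global numbers (as a second-level partition might produce) and adds an edge $\langle a, b\rangle$ whenever some vertex of sub-block $a$ is adjacent to a vertex of sub-block $b$ and $a < b$.

```python
def subblock_digraph(adj, sub_tile):
    """Edges <a, b> between sub-blocks a < b that share at least one vertex-level edge."""
    edges = set()
    for v, nbrs in adj.items():
        for u in nbrs:
            a, b = sub_tile[v], sub_tile[u]
            if a < b:
                edges.add((a, b))
    return edges


path = {v: [u for u in (v - 1, v + 1) if 0 <= u < 6] for v in range(6)}
print(sorted(subblock_digraph(path, [0, 0, 1, 2, 2, 3])))  # [(0, 1), (1, 2), (2, 3)]
```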
The data-block boundary correction algorithm of the irregular iteration technique applies inner-iteration boundary correction to the sub-blocks: for each vertex, the data-block values of vertices $v_i$ and $v_j$ at inner iteration $iter$ are updated with the following formulas. The number of inner iterations here is the number of boundary corrections.

$$\mathrm{Tile}(iter, v_i) = \max\big(\mathrm{Tile}(iter, v_i),\ \mathrm{Tile}(iter-1, v_j)\big)$$

$$\mathrm{Tile}(iter, v_j) = \max\big(\mathrm{Tile}(iter, v_i),\ \mathrm{Tile}(iter, v_j)\big)$$
The unknowns $x$ of the linear system $Ax = b$ are reordered and mapped to new unknowns $x'$. Subject to the vertex partial order, the order of $x$ is rearranged by the following rule: within a data block, vertices are arranged according to the partial order; across data blocks, vertices are arranged in data-block order. This produces the new unknown vector $x'$, i.e. $x' = Px$, where $P$ is the corresponding permutation matrix.
The sub-block reordering strategy rearranges the order of the sub-blocks assigned to each compute node in the distributed cluster. The strategy is as follows: inside compute node 0, sub-blocks that have no dependence on other nodes are moved to the front of the sub-block order, and the remaining sub-blocks keep their original order in the execution sequence.
Inside a non-zero compute node q, sub-blocks that depend on a compute node x (x < q) are moved to the front of the sub-block order, and the remaining sub-blocks keep their original order in the execution sequence.
The serial execution of the irregular iterative computation method is shown in Fig. 6. The number and arrangement of vertices differ from sub-block to sub-block, and the lines between blocks represent data dependences. Sub-blocks 1 to 5 come from the first data block and execute on the first compute node; sub-blocks 6 to 10 come from the second data block and execute on the second compute node. As Fig. 6 shows, every sub-block depends on the one before it, so the sub-block iterations must run serially: sub-block 1 is computed first; after the vertices in sub-block 1 have performed $T$ inner iterations, sub-block 2 reads the values of the vertices in sub-block 1 that are connected to it across the boundary and performs its iterations, and so on until sub-block 10 finishes.
To execute the irregular iterative computation of the sub-blocks in parallel while preserving the sub-block partial order, the sub-block order must be rearranged with the sub-block reordering strategy described above, applied to the sub-blocks assigned to each compute node in the distributed cluster.
Fig. 7 shows the order of the sub-blocks in the irregular iterative computation after rearrangement: after the reordered sub-block 2 and sub-block 9 receive and send their boundary data, sub-blocks 1, 3, 5, 7, 9 and sub-blocks 2, 4, 6, 8, 10 can perform their iterative computation in parallel on the two compute nodes.
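Why the order in Fig. 6 is serial while a reordered one is not can be checked with level scheduling, a standard technique shown here only as an illustration and not as the method of the source. Each sub-block is assigned the length of the longest dependence chain leading to it; sub-blocks on the same level have no dependence path between them, so they may run concurrently. The sketch assumes the sub-block dependence graph is acyclic.

```python
from collections import defaultdict, deque


def wavefronts(num_sub, edges):
    """Group sub-blocks 0..num_sub-1 into levels; edge (a, b) means b must wait for a."""
    succ = defaultdict(list)
    indeg = [0] * num_sub
    for a, b in edges:
        succ[a].append(b)
        indeg[b] += 1
    level = [0] * num_sub
    ready = deque(s for s in range(num_sub) if indeg[s] == 0)
    while ready:  # Kahn's topological sort, tracking the longest-path depth
        a = ready.popleft()
        for b in succ[a]:
            level[b] = max(level[b], level[a] + 1)
            indeg[b] -= 1
            if indeg[b] == 0:
                ready.append(b)
    waves = defaultdict(list)
    for s in range(num_sub):
        waves[level[s]].append(s)
    return [waves[k] for k in sorted(waves)]


# A single chain, as in Fig. 6 (numbered 0..9 here): ten levels, fully serial
print(len(wavefronts(10, {(s, s + 1) for s in range(9)})))  # 10
# Two independent chains: pairs of sub-blocks can run side by side
print(wavefronts(6, {(0, 2), (2, 4), (1, 3), (3, 5)}))       # [[0, 1], [2, 3], [4, 5]]
```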
The irregular iteration parallel method proposed by the present invention is a parallel method with automatic tuning for distributed clusters. By executing a runtime data-block boundary correction strategy, it improves the parallel execution performance of irregular iteration. An automatic performance tuner for the irregular iterative computation method is constructed; it finds the parameter-value combination with the best efficiency by exhaustive search and fixes those values, so that the irregular iteration parallel method runs at optimal efficiency.
Claims (1)
1. An irregular iteration parallelization method, characterized in that the method comprises the following steps:
Step 1: define the initialization matrix: the structurally symmetric coefficient matrix $A$ is described by an adjacency graph $G = (V, E)$, where $V$ is the set of vertices, $E$ represents the nonzero elements of the coefficient matrix $A$, and $\langle v_i, v_j\rangle \in E$ is an edge of the adjacency graph $G$;
Step 2: first-level partition of the coefficient matrix $A$: the graph is partitioned with the k-way method of the graph-partitioning library Metis so that closely connected vertices fall into the same subgraph, and the number of data blocks $p$ is determined by the number of compute nodes in the distributed cluster; the first-level partition establishes the mapping $V \rightarrow \{B_1, B_2, \dots, B_p\}$, where $B_k$ denotes the $k$-th data block after partitioning and $p$ equals the number of compute nodes in the distributed cluster;
the partial-order constraints between vertices are stored in the NodeDepence data structure, described as follows:

$$\mathrm{NodeDepence} \mathrel{+}= \{\langle v_i, v_j\rangle \mid \mathrm{Tile}(0, v_i) < \mathrm{Tile}(0, v_j) \wedge (\langle v_i, v_j\rangle \in E \vee \langle v_j, v_i\rangle \in E)\}$$

where $\mathrm{Tile}(0, v_i)$ and $\mathrm{Tile}(0, v_j)$ denote the data-block numbers of vertices $v_i$ and $v_j$ at iteration 0;
Step 3: second-level partition of the coefficient matrix $A$: each data block produced by the first-level partition is repartitioned with the k-way method of the graph-partitioning library Metis; the sub-block size parameter is determined by the capacity of the system memory unit, and each data block is further divided into several sub-blocks; the partial-order constraints between vertices are stored in the Sub_NodeDepence data structure, described as follows:

$$\mathrm{Sub\_NodeDepence}[m] \mathrel{+}= \{\langle v_i, v_j\rangle \mid \mathrm{Tile}(0, v_i) < \mathrm{Tile}(0, v_j) \wedge (\langle v_i, v_j\rangle \in E \vee \langle v_j, v_i\rangle \in E)\}$$
Step 4: inner-iteration boundary correction of the sub-blocks, specified as follows:
for each vertex, the data-block value of vertex $v_i$ at inner iteration $iter$ is updated with the following formulas, where the number of inner iterations is the number of boundary corrections:

$$\mathrm{Tile}(iter, v_i) = \max\big(\mathrm{Tile}(iter, v_i),\ \mathrm{Tile}(iter-1, v_j)\big)$$

$$\mathrm{Tile}(iter, v_j) = \max\big(\mathrm{Tile}(iter, v_i),\ \mathrm{Tile}(iter, v_j)\big)$$
Step 5: vertex reordering: the unknowns $x$ of the linear system $Ax = b$ are reordered and mapped to new unknowns $x'$; subject to the vertex partial order, the order of $x$ is rearranged: within a data block, vertices are arranged according to the partial order, and across data blocks, vertices are arranged in data-block order, producing the new unknown vector $x'$;
Step 6: sub-block reordering: the sub-block reordering strategy rearranges the order of the sub-blocks assigned to each compute node in the distributed cluster; the sub-block reordering strategy is as follows:
A. inside compute node 0, sub-blocks that have no dependence on other nodes are moved to the front of the sub-block order, and the remaining sub-blocks keep their original order in the execution sequence;
B. inside a non-zero compute node q, sub-blocks that depend on a compute node x are moved to the front of the sub-block order, and the remaining sub-blocks keep their original order in the execution sequence, where x < q;
Step 7: taking the iterative computation of one sub-block as a sample for evaluation, the exhaustive search sets the sub-block size parameter to multiples of the L1 cache size of a compute node in the distributed cluster and the number of inner iterations to 3 to 10, runs the iterative computation, and selects the sub-block size parameter and number of inner iterations that give the shortest iteration time;
Step 8: execution of the irregular iteration parallel method: following the reordered vertex order, the rows and columns of matrix $A$ are permuted in one-to-one correspondence with the vertices; iterative computation based on data blocks and sub-blocks is then performed as follows:
the partitioned data blocks are distributed to the compute nodes and a three-level loop is executed: the outer loop is the convergence iteration and traverses each data block; the middle loop traverses the sub-blocks of each data block in turn; after traversing the middle loop, boundary data are received and sent, and the inner loop is then executed, performing the iterative computation on the vertices contained in the sub-block.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011100537959A CN102096744A (en) | 2011-03-07 | 2011-03-07 | Irregular iteration parallelization method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011100537959A CN102096744A (en) | 2011-03-07 | 2011-03-07 | Irregular iteration parallelization method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102096744A true CN102096744A (en) | 2011-06-15 |
Family ID: 44129839
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011100537959A Pending CN102096744A (en) | 2011-03-07 | 2011-03-07 | Irregular iteration parallelization method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102096744A (en) |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102520917A (en) * | 2011-12-15 | 2012-06-27 | 杭州电子科技大学 | Parallelization method for three-dimensional incompressible pipe flows |
CN102521463A (en) * | 2011-12-26 | 2012-06-27 | 杭州电子科技大学 | Method for improving numerical reservoir simulation efficiency by optimizing behaviors of Cache |
CN103150290A (en) * | 2013-02-28 | 2013-06-12 | 杭州电子科技大学 | Novel numerical simulation method for three-dimensional incompressible pipe flow |
CN105701291B (en) * | 2016-01-13 | 2019-04-23 | 中国航空动力机械研究所 | Finite element fraction analysis apparatus and information acquisition method, sytem matrix parallel generation method |
CN105701291A (en) * | 2016-01-13 | 2016-06-22 | 中国航空动力机械研究所 | Finite element analysis device, information acquisition method and method for parallel generation of system matrix |
CN109478145A (en) * | 2016-06-10 | 2019-03-15 | 华为技术有限公司 | The parallel optimization of homogeneous system |
CN109478145B (en) * | 2016-06-10 | 2021-04-09 | 华为技术有限公司 | Parallel optimization of homogeneous systems |
CN107797852A (en) * | 2016-09-06 | 2018-03-13 | 阿里巴巴集团控股有限公司 | The processing unit and processing method of data iteration |
CN109636709B (en) * | 2018-11-28 | 2020-12-08 | 华中科技大学 | Graph calculation method suitable for heterogeneous platform |
CN109636709A (en) * | 2018-11-28 | 2019-04-16 | 华中科技大学 | A kind of figure calculation method suitable for heterogeneous platform |
CN111830361A (en) * | 2019-04-18 | 2020-10-27 | 中国石油化工股份有限公司 | Oil field tank field grounding grid fault detection device |
CN111830362A (en) * | 2019-04-18 | 2020-10-27 | 中国石油化工股份有限公司 | Non-excavation detection method suitable for grounding grid of oil field tank field |
CN111830362B (en) * | 2019-04-18 | 2021-10-29 | 中国石油化工股份有限公司 | Non-excavation detection method suitable for grounding grid of oil field tank field |
CN111830361B (en) * | 2019-04-18 | 2022-04-22 | 中国石油化工股份有限公司 | Method for detecting corrosion fault of grounding grid of oil field tank field |
CN111381886A (en) * | 2020-03-02 | 2020-07-07 | 西安交通大学 | Rhombic block parallel optimization method for template calculation |
CN113553288A (en) * | 2021-09-18 | 2021-10-26 | 北京大学 | Two-layer blocking multicolor parallel optimization method for HPCG benchmark test |
CN113553288B (en) * | 2021-09-18 | 2022-01-11 | 北京大学 | Two-layer blocking multicolor parallel optimization method for HPCG benchmark test |
WO2024007652A1 (en) * | 2022-07-06 | 2024-01-11 | 芯和半导体科技(上海)股份有限公司 | Accelerated solving method for large sparse matrix, system, and storage medium |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C12 | Rejection of a patent application after its publication | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20110615 |