CN104765589B - Grid parallel computation preprocess method based on MPI - Google Patents
- Publication number: CN104765589B
- Application number: CN201410004273.3A
- Authority: CN (China)
- Prior art keywords: grid, array, grid cell, file, MPI
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Landscapes
- Multi Processors (AREA)
Abstract
The invention discloses an MPI-based preprocessing method for parallel grid computation, comprising: specifying the number of partitions for the grid of the computational domain; starting multiple MPI processes and setting the number of processes; checking whether the number of processes equals the number of partitions and, if it does, opening the grid file, having the master process read the grid cell information file and initially distribute the grid cells evenly among the processes, and having each process create the adjacency arrays, or otherwise restarting the MPI processes; having each process call ParMETIS to partition the grid; having each process create a block-read array for the grid cells and set the index position of the array; having each process loop over the grid cell information file and check whether the array length minus the array index position is less than the length of one grid cell record, reading data from the grid cell information file into the array if it is, and otherwise assigning array elements to the grid cell; and checking whether the partition number of the grid cell equals the process number, storing the grid cell information in the process's file if it does, and otherwise continuing the loop.
Description
Technical field
The present invention relates to parallel preprocessing and, specifically, to a preprocessing method for parallel grid computation.
Background art
In scientific and engineering computing, grids are essential to the numerical solution of all kinds of differential equations: the grid distribution is the basic setting in which the solution is computed. Solving a differential equation mainly involves two steps, numerical discretization and solution of the resulting system of algebraic equations. Once the discretization method is fixed, the grid distribution directly reflects the logical structure of the solution vector and coefficient matrix of the algebraic system. With the widespread use of parallel computing, grids play a very important role in the parallel solution of differential equations. For distributed parallel computing, the main route to the parallel solution of differential equations is grid partitioning based on domain decomposition, executed in parallel, together with distributed storage of the grid data.
Grid partitioning divides one large, complex grid into multiple subgrids by establishing a correspondence between grid cells and the processors of a parallel computer. The quality of the partition directly affects the efficiency of the parallel computation and the accuracy of the solver. The key question for a partitioning method is how to divide the large grid so that the subgrids are easier to solve in parallel while the computational load is balanced across processors and the inter-processor communication overhead is minimized.
Data splitting and grid information management are the most time-consuming stages of preprocessing for parallel grid computation, and in the prior art these stages are slow and inefficient. For grid partitioning, existing techniques such as multilevel recursive bisection and row/column partitioning are slow and yield partitions of unsatisfactory quality. Existing grid preprocessing schemes are generally executed serially: they run on a single CPU core and mostly traverse the grid file serially, which is slow. Moreover, existing schemes mostly store the grid data in one file or a few files, so when the data set is large, file I/O becomes congested, which limits the scale and speed of grid data processing.
Summary of the invention
The object of the invention is to provide an MPI-based preprocessing method for parallel grid computation that uses ParMETIS to partition the grid quickly and efficiently and stores the grid data in a distributed manner, increasing the scale and speed of data processing.
To achieve this object, the invention adopts the following technical solution:
An MPI-based preprocessing method for parallel grid computation, comprising the following steps: specifying the number of partitions for the grid of the computational domain; starting multiple MPI processes and setting the number of processes; checking whether the number of processes equals the number of partitions and, if it does, opening the grid file, having the master process read the grid cell information file and initially distribute the grid cells evenly among the processes, and having each process create the adjacency arrays of the grid cells, or otherwise restarting the MPI processes; having each process call ParMETIS to partition the grid cells; having each process create a block-read array for the grid cells and set the index position of the array; having each process loop over the grid cell information file and check whether the array length minus the array index position is less than the length of one grid cell record, reading data from the grid cell information file into the array if it is, and otherwise assigning array elements to the grid cell and updating the index position of the array; and checking whether the partition number of the grid cell equals the process number, storing the grid cell information in the process's file if it does, and otherwise updating the array index position and continuing the loop.
Further, the number of partitions of the grid of the computational domain is less than or equal to the number of processors of the parallel computer.
Further, the adjacency arrays are stored in CSR format.
Further, each process calls the ParMETIS subroutine ParMETIS_V3_Mesh2Dual to convert the grid cells into a graph.
Further, each process calls the ParMETIS subroutine ParMETIS_V3_AdaptiveRepart to repartition the graph.
Further, each process calls the ParMETIS subroutine ParMETIS_V3_RefineKway to further refine the quality of the grid partition.
Compared with the prior art, the invention uses ParMETIS to partition the grid quickly and efficiently and stores the grid data in a distributed manner, increasing the scale and speed of data processing.
Brief description of the drawings
Fig. 1 is a flowchart of the parallel grid preprocessing of the invention;
Fig. 2 is a flowchart of the distributed storage of the grid in the invention.
Detailed description
The MPI-based preprocessing method for parallel grid computation of the invention is further described below with reference to the drawings and a specific embodiment.
The invention uses a distributed parallel execution model based on MPI. Relying on the parallel grid partitioning and repartitioning functions of ParMETIS, it partitions three-dimensional grids with high quality using the multilevel k-way graph partitioning method. Following the partition result, multiple processes loop over the grid file, which achieves fast parallel preprocessing of large-scale grids. The method can substantially reduce the communication time of the parallel grid computation and improve its parallel efficiency.
ParMETIS (Parallel Graph Partitioning and Fill-reducing Matrix Ordering) is an MPI-based parallel library that is particularly suited to parallel numerical simulation on large unstructured grids. It implements a number of algorithms for partitioning unstructured graphs, partitioning meshes, and computing fill-reducing orderings of sparse matrices. ParMETIS extends the functionality provided by METIS and includes subroutines especially suited to parallel computation and numerical simulation. The algorithms implemented in ParMETIS include a parallel multilevel k-way graph partitioning algorithm. Multilevel k-way graph partitioning is a graph-theoretic partitioning method that generally consists of a graph coarsening algorithm, an initial partitioning algorithm, and an uncoarsening (refinement) algorithm. It makes the vertex weights of the subgraphs roughly equal while minimizing the weight of the edges cut by the partition. Compared with other methods such as row/column partitioning, the resulting partition incurs much less communication time, which effectively reduces the execution time of the whole parallel program, and the reduction in communication overhead becomes more pronounced as the data size and the number of processors grow.
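The two quantities that multilevel k-way partitioning trades off, the weight of cut edges and the balance of vertex weights, can be measured directly on a graph stored in CSR form. The following is a minimal sketch, assuming unit vertex and edge weights; the names CsrGraph, edge_cut and load_imbalance are hypothetical and are not part of ParMETIS.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Graph in CSR form: the neighbors of vertex v are
// adjncy[xadj[v]] .. adjncy[xadj[v + 1] - 1].
struct CsrGraph {
    std::vector<int> xadj;
    std::vector<int> adjncy;
};

// Number of edges whose endpoints lie in different parts (unit edge weights).
int edge_cut(const CsrGraph& g, const std::vector<int>& part) {
    int cut = 0;
    const int n = static_cast<int>(g.xadj.size()) - 1;
    for (int v = 0; v < n; ++v)
        for (int k = g.xadj[v]; k < g.xadj[v + 1]; ++k)
            if (part[v] != part[g.adjncy[k]]) ++cut;
    return cut / 2;  // every undirected edge is stored twice in CSR
}

// Size of the largest part divided by the ideal size n / nparts
// (1.0 means perfectly balanced).
double load_imbalance(const std::vector<int>& part, int nparts) {
    std::vector<int> count(nparts, 0);
    for (int p : part) ++count[p];
    const int largest = *std::max_element(count.begin(), count.end());
    return static_cast<double>(largest) * nparts / static_cast<double>(part.size());
}
```

For a path graph 0-1-2-3, the partition {0, 0, 1, 1} cuts one edge and is perfectly balanced, whereas {0, 1, 0, 1} is equally balanced but cuts all three edges, so a partitioner would prefer the former.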
Once grid partitioning is complete, the assignment of every grid node or cell to a processor is known. The grid data must then be split according to this partition so that the grid information can be stored in a distributed manner. Data splitting is mainly implemented as follows: every processor loops over all nodes or cells by number and, for each node or cell assigned to the current processor, moves the corresponding array elements and the cell's node list into local storage. Each processor thus ends up with local coordinate arrays and a local adjacency matrix, which realizes the distributed storage of the grid information.
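The data-splitting pass described above can be sketched as follows. This is only an illustration under simplifying assumptions: every cell has the same number of nodes, the whole connectivity is held in memory, and local node numbers are obtained by sorting and deduplicating. LocalMesh and extract_local_mesh are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Result of splitting the global mesh for one process.
struct LocalMesh {
    std::vector<int> elements;      // global ids of the cells owned by this rank
    std::vector<int> local_nodes;   // sorted global ids of the nodes they use
    std::vector<int> connectivity;  // cell nodes renumbered to local indices
};

// conn holds nodes_per_elem global node ids per cell; part[e] is the
// partition number assigned to cell e.
LocalMesh extract_local_mesh(const std::vector<int>& conn, int nodes_per_elem,
                             const std::vector<int>& part, int rank) {
    LocalMesh m;
    // Pass 1: traverse all cells and keep those assigned to this rank.
    for (int e = 0; e < static_cast<int>(part.size()); ++e) {
        if (part[e] != rank) continue;
        m.elements.push_back(e);
        for (int k = 0; k < nodes_per_elem; ++k)
            m.local_nodes.push_back(conn[e * nodes_per_elem + k]);
    }
    // Sort and deduplicate to obtain the local node list.
    std::sort(m.local_nodes.begin(), m.local_nodes.end());
    m.local_nodes.erase(std::unique(m.local_nodes.begin(), m.local_nodes.end()),
                        m.local_nodes.end());
    // Pass 2: translate global node ids into positions in local_nodes.
    for (int e : m.elements)
        for (int k = 0; k < nodes_per_elem; ++k) {
            const int g = conn[e * nodes_per_elem + k];
            const auto it = std::lower_bound(m.local_nodes.begin(),
                                             m.local_nodes.end(), g);
            m.connectivity.push_back(static_cast<int>(it - m.local_nodes.begin()));
        }
    return m;
}
```

With three triangles {0,1,2}, {1,2,3}, {2,3,4} and part = {0, 1, 1}, rank 1 keeps cells 1 and 2, uses nodes 1 to 4, and sees them under the local indices 0 to 3.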
MPI is a parallel programming model based on message passing and is now widely used for parallel computing on distributed-memory architectures. An MPI program initializes the MPI execution environment with the MPI_Init function, runs as multiple processes, and communicates through a communication domain (communicator) that spans those processes. The MPI-based distributed parallel execution strategy is a coarse-grained parallel algorithm: the finite element grid of the computational domain is divided into as many subdomains as there are processes, and the grid data of each subdomain is mapped to one process for parallel preprocessing. Because each process handles only its own subdomain, communication arises only on the boundary faces between subdomains. The communication volume is therefore small, and parallel preprocessing performs well.
Referring to Fig. 1, in the parallel grid preprocessing scheme provided by the invention, multiple MPI processes are started on the computer and the number of processes that execute the ParMETIS partitioning task is set, that is, the ParMETIS communication domain is created. Using the ParMETIS parallel domain decomposition tool, the adjacency arrays xadj and adjncy of the grid cells are created and passed as input parameters to the ParMETIS functions; the grid is converted into a graph and the graph is then repartitioned. The resulting partition balances the load of the parallel computation and has few partition boundaries, which reduces communication time and significantly improves the efficiency of the parallel grid computation. The ParMETIS partition establishes a one-to-one correspondence between grid cells or nodes and processes. Following the partition, each process loops over the grid information file and reads the grid data block by block by positioning the file pointer and the array index. For each node or cell assigned to the current processor, the corresponding array elements and the cell's node list are moved into local storage, and each processor finally builds its local coordinate arrays and adjacency matrix, which quickly realizes the distributed storage of the grid information. The block-read method based on positioning the file pointer and the array index greatly reduces the number of file read operations and effectively avoids the waiting time caused by contention when multiple processes read the file at the same time.
Referring to Fig. 1 and Fig. 2, after the grid data has been stored in a distributed manner, each process uses a linked list to insertion-sort all grid nodes that make up its local grid cells. The resulting index list represents the local indices of the local grid nodes, which creates the local node index. Each process then reorders its grid cells to improve the quality of the sparse matrix of the linear system to be solved, and sets up the inter-process communication relations indexed by the reordered cells. Finally, each process saves its local sparse matrix and other data for use in the parallel solution of the equations. The high-quality partition and its fast, efficient implementation help ensure the accuracy of the parallel solution of differential equations and make it easier to use large-scale grids in numerical simulation.
Referring to Fig. 2, the invention discloses an MPI-based preprocessing method for parallel grid computation, comprising the following steps: specifying the number of partitions for the grid of the computational domain; starting multiple MPI processes and setting the number of processes; checking whether the number of processes equals the number of partitions and, if it does, opening the grid file, having the master process read the grid cell information file and initially distribute the grid cells evenly among the processes, and having each process create the adjacency arrays of the grid cells, or otherwise restarting the MPI processes; having each process call ParMETIS to partition the grid cells; having each process create a block-read array for the grid cells and set the index position of the array; having each process loop over the grid cell information file and check whether the array length minus the array index position is less than the length of one grid cell record, reading data from the grid cell information file into the array if it is, and otherwise assigning array elements to the grid cell and updating the index position of the array; and checking whether the partition number of the grid cell equals the process number, storing the grid cell information in the process's file if it does, and otherwise updating the array index position and continuing the loop.
In the grid preprocessing scheme of the invention, the number of partitions of the grid of the computational domain is less than or equal to the number of processors of the parallel computer. The adjacency arrays are stored in CSR (Compressed Sparse Row) format. Each process calls the ParMETIS subroutine ParMETIS_V3_Mesh2Dual to convert the grid cells into a graph. Each process calls the ParMETIS subroutine ParMETIS_V3_AdaptiveRepart to repartition the graph. Each process calls the ParMETIS subroutine ParMETIS_V3_RefineKway to further refine the quality of the grid partition. By calling the ParMETIS refinement function ParMETIS_V3_RefineKway repeatedly, the invention keeps improving the quality of the grid partition, further reducing the size of the partition boundaries and the communication time of the parallel computation.
The invention uses ParMETIS to partition the grid quickly and efficiently. The multilevel k-way graph partitioning method of ParMETIS makes the vertex weights of the subgraphs roughly equal and minimizes the weight of the edges cut by the partition. The resulting partition incurs little communication time, which effectively reduces the execution time of the whole parallel program, and the reduction in communication overhead becomes more pronounced as the data size and the number of processors grow.
The invention computes with multiple MPI processes and carries out grid partitioning and preprocessing in a distributed manner, which allows large-scale grids to be partitioned quickly. At the same time, the large-scale grid is partitioned with the ParMETIS parallel domain decomposition tool: the grid to be partitioned is first distributed evenly among multiple processes, and the processes then complete the partitioning in parallel, which speeds up grid partitioning.
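The initial even distribution can be sketched with the array names used in the embodiment (elmdist, global_eptr, global_eind, eptr, eind). The sketch below is an illustration, not the patented implementation: it uses int where ParMETIS uses idx_t, spreads the remainder of an uneven division over the first processes (the text only says the cells are divided evenly), and rebases each local eptr to start at 0, which is how the distributed mesh format is understood here. make_elmdist and slice_csr are hypothetical helper names.

```cpp
#include <cassert>
#include <vector>

// elmdist[p] .. elmdist[p + 1] - 1 are the cells initially owned by process p.
std::vector<int> make_elmdist(int global_nel, int num_run) {
    std::vector<int> elmdist(num_run + 1, 0);
    const int base = global_nel / num_run;
    const int extra = global_nel % num_run;  // assumption: first ranks get one more
    for (int p = 0; p < num_run; ++p)
        elmdist[p + 1] = elmdist[p] + base + (p < extra ? 1 : 0);
    return elmdist;
}

// Cut the slice owned by process mpi_id out of the global CSR arrays.
// global_eptr[e] .. global_eptr[e + 1] - 1 index the nodes of cell e in global_eind.
void slice_csr(const std::vector<int>& global_eptr,
               const std::vector<int>& global_eind,
               const std::vector<int>& elmdist, int mpi_id,
               std::vector<int>& eptr, std::vector<int>& eind) {
    const int first = elmdist[mpi_id];
    const int last = elmdist[mpi_id + 1];
    const int shift = global_eptr[first];
    eptr.clear();
    for (int e = first; e <= last; ++e)
        eptr.push_back(global_eptr[e] - shift);  // local offsets start at 0
    eind.assign(global_eind.begin() + shift,
                global_eind.begin() + global_eptr[last]);
}
```

For five triangles on two processes this gives elmdist = {0, 3, 5}; process 1 then receives eptr = {0, 3, 6} together with the six node ids of cells 3 and 4. In an MPI program the master process would send each slice to its owner instead of returning it.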
Following the partition result, each process of the invention loops over the grid file and reads it block by block by positioning the file pointer and the array index. This removes a large number of read operations on the grid information and effectively avoids the waiting time caused by contention when multiple processes read the file at the same time. It speeds up the splitting of the grid data by each process and allows the distributed storage of the grid to be realized quickly.
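The block-read idea can be sketched as follows. The record length 3 + n_max and the names file_arr, arr_offset, file_arr_size and ele_part follow the embodiment; everything else is an assumption made for illustration. In particular, the sketch reads from a std::istream straight into the tail of the buffer rather than using fseek and a temporary read array, and it returns the local records instead of writing them to a per-process file. read_local_records is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Reads fixed-length integer records through a block buffer and returns the
// records of the cells whose partition number equals rank.
// Assumed record layout: 3 header values followed by n_max node ids.
std::vector<std::vector<int>> read_local_records(
        std::istream& in, int n_max, const std::vector<int>& ele_part,
        int rank, std::size_t file_arr_size) {
    const std::size_t rec_len = 3 + static_cast<std::size_t>(n_max);
    assert(file_arr_size >= rec_len);  // the buffer must hold one full record
    std::vector<int> file_arr(file_arr_size);
    std::size_t filled = 0;            // number of valid entries in file_arr
    std::size_t arr_offset = 0;        // index of the next unread entry
    std::vector<std::vector<int>> local;

    for (std::size_t e = 0; e < ele_part.size(); ++e) {
        if (filled - arr_offset < rec_len) {
            // Move the unread tail to the front, then refill the rest in one read.
            std::copy(file_arr.begin() + arr_offset, file_arr.begin() + filled,
                      file_arr.begin());
            filled -= arr_offset;
            arr_offset = 0;
            in.read(reinterpret_cast<char*>(file_arr.data() + filled),
                    static_cast<std::streamsize>(
                        (file_arr_size - filled) * sizeof(int)));
            filled += static_cast<std::size_t>(in.gcount()) / sizeof(int);
            if (filled < rec_len) break;  // the file ended before cell e
        }
        if (ele_part[e] == rank)          // keep only the cells of this process
            local.emplace_back(file_arr.begin() + arr_offset,
                               file_arr.begin() + arr_offset + rec_len);
        arr_offset += rec_len;
    }
    return local;
}
```

The function can be exercised without a file by writing the records into a binary std::stringstream. Choosing a buffer size that is not a multiple of the record length forces records to straddle two blocks, which is exactly the case the carry-over step handles.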
Embodiment one
The basic steps of the MPI-based preprocessing method for parallel grid computation of the invention are as follows. The MPI processes are started. The grid file data is read first, and the node information and the grid cell information are written from the grid file into two separate files. A new communication domain is created and the ParMETIS partitioning functions are called to produce a high-quality partition of the grid. Following the partition result, multiple processes loop over the grid file simultaneously and, by positioning the file pointer and reading block by block, realize the distributed storage of the grid.
Referring to Fig. 1 and Fig. 2, the specific steps of this embodiment are as follows:
1. The user specifies the number of partitions num_domains required for the grid. The number of partitions must not exceed the number of processors of the parallel computer.
2. Multiple MPI processes are started and the number of processes num_processors is set. The number of processes must equal the number of partitions.
3. Check whether the number of MPI processes equals the number of partitions of the grid. If it does, the program continues; otherwise it exits, the MPI processes are restarted, and the number of processes num_processors is set again.
4. Read the global number of geometric nodes and the global number of cells from the grid file. The original grid file channnel.msh in the specified directory is opened, and the global number of geometric nodes global_nde and the global number of grid cells global_nel are read from it.
5. Write the node numbers and node coordinates to the file grid.bin in the specified directory. The node information is read from the grid file channnel.msh with the fscanf function, and the node numbers and node coordinates are written to grid.bin in the specified directory.
6. Write the cell types, the node lists of the cells, and related information to the file nenn.bin in the specified directory. The grid cell information is read with the fscanf function, and the cell types, the node lists of the cells, and related information are written to nenn.bin in the specified directory.
7. Create the communication domain for the ParMETIS partitioning tool. The number of processes num_run used by ParMETIS is specified, and the communication domain required by the ParMETIS partitioning functions is created from the specified processes.
8. Partition the grid. This comprises the following steps:
1) The master process reads the grid cell information file nenn.bin and initially distributes the grid cells evenly among the processes; each process is responsible for partitioning global_nel/num_processors grid cells.
2) The master process creates the array elmdist, elmdist = new idx_t[num_run+1], which gives the range of grid cells handled by each process. Process mpi_id is responsible for partitioning grid cells elmdist[mpi_id] through elmdist[mpi_id+1] - 1.
3) The master process creates the adjacency array structure of the global grid cells in CSR format, that is, two arrays global_eptr and global_eind represent the adjacency relations of the global grid cells.
4) From the global adjacency arrays global_eptr and global_eind and the array elmdist, the master process derives the parallel CSR form of the grid cells, that is, the adjacency structure of the grid cells elmdist[mpi_id] through elmdist[mpi_id+1] - 1 of each process, represented by the arrays eptr and eind.
5) Using MPI communication, the master process loops over the processes and sends the adjacency structure arrays eptr and eind of grid cells elmdist[mpi_id] through elmdist[mpi_id+1] - 1 to the process whose process number is mpi_id.
6) The other processes receive the eptr and eind arrays sent by the master process with the MPI_Recv function.
7) Each process prepares the other input and output parameters of the ParMETIS functions, such as the output parameter part, the array that holds the partition result.
8) Each process calls the ParMETIS function ParMETIS_V3_Mesh2Dual to convert the grid into a graph and obtain the adjacency structure of the graph, represented by the arrays xadj and adjncy.
9) Each process calls the ParMETIS repartitioning function ParMETIS_V3_AdaptiveRepart with the adjacency structure arrays xadj and adjncy of the graph as input parameters. The graph is repartitioned, yielding the result array part, which gives the correspondence between grid cells and process numbers.
10) Each process calls the ParMETIS refinement function ParMETIS_V3_RefineKway to further refine the quality of the grid partition on the basis of the partition above.
9. All processes simultaneously write the output array part of the ParMETIS functions to the file partition.bin with the MPI I/O function MPI_File_write_at. The starting offset of each process's write is elmdist[mpi_id], and the number of array entries written is elmdist[mpi_id+1] - elmdist[mpi_id].
10. Store the grid in a distributed manner. The invention uses distributed storage: each partition produces its own data file, which is kept in the storage space of the corresponding process. This reduces the I/O bottleneck and increases the scale and speed of data processing. Referring to Fig. 2, this comprises the following steps:
1) Each process reads the partition result from partition.bin into the array ele_part, which holds the partition number of every global grid cell.
2) Each process creates a block-read array file_arr of size file_arr_size to hold the grid cell data read in each block. file_arr_size can be chosen freely. If it is too small, the algorithm yields little benefit; the larger it is, the fewer file read requests are issued and the better the algorithm performs. When enough memory is available, file_arr_size can therefore be set to a larger value.
3) Each process reads file_arr_size elements of the data type from the file nenn.bin into the array file_arr and sets the file pointer offset to file_arr_size*sizeof(data type). It sets the current array index position arr_offset to 0 and the amount of the file already read, read_file_size, to file_arr_size. It also computes the size file_size of the whole file nenn.bin.
4) Each process loops over the grid cell information file nenn.bin according to the global number of grid cells and checks whether the size of the array file_arr minus the array offset is less than the length of one complete grid cell record. If it is, the nenn.bin file pointer is positioned and data is read from the current position of the file pointer into the array file_arr. If it is not, the index of file_arr is positioned, and the elements at the current array index are assigned to the grid cell type, the grid physical entity, and the grid cell node array. After the assignment, the value of the array index position arr_offset is updated.
if ((file_arr_size - arr_offset) < (3 + n_max))  // fewer elements remain in the array than one complete cell record
{
    Copy the remaining elements of the array, from index arr_offset to file_arr_size - 1, to elements 0 through file_arr_size - arr_offset - 1, and use the fseek function to position the nenn.bin file pointer at the current value of offset.
    if ((file_size - read_file_size) >= arr_offset)  // the unread part of the file is at least as large as the part of the array to be filled
    {
        Create a read array read_arr of size arr_offset to hold the data read in this block.
        Read arr_offset elements of the data type into read_arr and update the file pointer offset: offset = offset + arr_offset*sizeof(data type)
        Assign the elements of read_arr, in order, to the last arr_offset elements of file_arr.
        Update the amount of the file already read: read_file_size = read_file_size + arr_offset
    }
    else
    {
        Create an array read_arr of size file_size - read_file_size, read file_size - read_file_size elements of the data type into read_arr, and update the file pointer offset: offset = offset + (file_size - read_file_size)*sizeof(data type)
        Assign the elements of read_arr, in order, to the next file_size - read_file_size elements of file_arr.
        Update the amount of the file already read: read_file_size = file_size
    }
    Delete the array read_arr to release the memory.
    Set the index position arr_offset of file_arr to 0.
}
else
{
    Assign the elements at the current array index position to the grid cell type ele_type and the node array.
    Update the array index: arr_offset = arr_offset + 3 + n_max, where 3 + n_max is the length of one grid cell record.
}
if (the partition number equals the process number)
{
    Assign the elements at the current index position to the node array nenn[n_max] of the grid cell, where n_max is the number of nodes that make up the grid cell. Write the cell type ele_type and the node array nenn to the grid cell file named after the process number, nenn_mpi_id_physical_entity.bin.
}
Check whether the partition number of the current grid cell equals the local process number. If it does, the type and node array of the grid cell are written to the file nenn_mpi_id_physical_entity.bin, which is named after both the local process number and the physical entity number. If it does not, the array index position is updated and the loop continues.
5) Each process repeats step 4) in a loop over the global number of grid cells global_nel until the whole loop has finished, which realizes the distributed storage of the grid.
11. The parallel grid preprocessing ends.
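Step 9 above has every process write its slice of part into partition.bin at the offset given by elmdist[mpi_id]. The offset arithmetic can be sketched with an in-memory vector standing in for the shared file. WriteSlot, write_slot and write_at are hypothetical names, and the scaling by the entry size assumes the default MPI file view, in which offsets passed to MPI_File_write_at are counted in bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Where one rank's slice of the partition array lands in the shared file.
struct WriteSlot {
    std::size_t byte_offset;  // offset to pass to a positioned write
    int count;                // number of entries written by this rank
};

WriteSlot write_slot(const std::vector<int>& elmdist, int mpi_id,
                     std::size_t elem_size) {
    return {static_cast<std::size_t>(elmdist[mpi_id]) * elem_size,
            elmdist[mpi_id + 1] - elmdist[mpi_id]};
}

// Stand-in for the positioned write: copy a rank's local result into a shared
// in-memory "file". Slots of different ranks never overlap, so the order in
// which ranks write does not matter.
void write_at(std::vector<int>& file, const WriteSlot& slot,
              const std::vector<int>& local_part) {
    const std::size_t first = slot.byte_offset / sizeof(int);
    for (int i = 0; i < slot.count; ++i) file[first + i] = local_part[i];
}
```

With elmdist = {0, 3, 5}, process 1 writes two entries starting at byte 3 * sizeof(int) and process 0 writes three entries starting at byte 0; the file then holds the partition number of every global cell in cell order, which is what step 10 reads back into ele_part.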
The above is a detailed description of preferred feasible embodiments of the invention, but the embodiments do not limit the scope of the patent claims of the invention. All equivalent changes or modifications made within the technical spirit disclosed by the invention shall fall within the scope of the claims of the invention.
Claims (6)
1. a kind of grid parallel computation preprocess method based on MPI, it is characterised in that comprise the following steps:
The number of partitions of the grid of given computational fields;
Start MPI multi-process, set into number of passes;
Judge whether be equal to the number of partitions into number of passes, if opening grid file equal to if, host process reads grid cell message file,
Grid cell is averagely initially allocated to each process, otherwise the adjoining array of each process creation grid cell restarts
MPI multi-process;
Each process calls ParMETIS to carry out mesh generation to grid cell;
Grid cell block sort is read array by each process, sets the index position of array;
Each course cycles traversal grid cell message file, judges that array length subtracts the index position number of array and whether is less than
Grid cell message length, is filled into array if the data that grid cell message file is read less than if, otherwise by array member
Element is assigned to grid cell, and changes the index position of array;
Judge whether the partition number of grid cell is equal to process number, process file is arrived if storing grid cell information equal to if
In, array indexing position is otherwise changed, judgement is continued cycling through.
2. the grid parallel computation preprocess method as claimed in claim 1 based on MPI, it is characterised in that:The grid of computational fields
The number of partitions is less than or equal to the processor number of parallel computer.
3. the grid parallel computation preprocess method as claimed in claim 1 based on MPI, it is characterised in that:The storage of adjacent array
Form is CSR.
4. the grid parallel computation preprocess method as claimed in claim 1 based on MPI, it is characterised in that:Each process is called
ParMETIS subprogram ParMETIS_V3_Mesh2Dual, figure is changed into by grid cell.
5. The grid parallel computation preprocessing method based on MPI as claimed in claim 4, characterized in that: each process calls the ParMETIS subroutine ParMETIS_V3_AdaptiveRepart to repartition the graph.
6. The grid parallel computation preprocessing method based on MPI as claimed in claim 5, characterized in that: each process calls the ParMETIS subroutine ParMETIS_V3_RefineKway to further refine the quality of the grid partition.
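The data flow in the claims above can be illustrated with a minimal serial sketch in Python. It is not the patented implementation and it does not call MPI or ParMETIS. The function names (`even_element_distribution`, `mesh_to_dual_csr`, `elements_for_rank`) and the tiny triangle mesh are hypothetical, chosen only for illustration. The sketch assumes that two grid cells are adjacent when they share at least `ncommon` nodes, and it stores the resulting adjacency in CSR (compressed sparse row) form as two arrays, here called `xadj` and `adjncy` after the usual METIS-family convention.

```python
from collections import defaultdict
from itertools import combinations


def even_element_distribution(num_elements, num_procs):
    """Return prefix offsets so that rank r initially owns cells
    dist[r] .. dist[r + 1] - 1 (an even split, remainder to low ranks)."""
    base, extra = divmod(num_elements, num_procs)
    dist = [0]
    for rank in range(num_procs):
        dist.append(dist[-1] + base + (1 if rank < extra else 0))
    return dist


def mesh_to_dual_csr(elements, ncommon):
    """Serial stand-in for a mesh-to-dual-graph conversion.

    elements: list of node-id tuples, one per grid cell.
    Two cells are adjacent if they share at least `ncommon` nodes.
    Returns (xadj, adjncy): the neighbours of cell e are
    adjncy[xadj[e]:xadj[e + 1]].
    """
    # Map each node to the cells that contain it.
    node_to_elems = defaultdict(list)
    for e, nodes in enumerate(elements):
        for n in set(nodes):
            node_to_elems[n].append(e)

    # Count shared nodes for every pair of cells that touch at a node.
    shared = defaultdict(int)
    for elems in node_to_elems.values():
        for a, b in combinations(elems, 2):
            shared[(a, b)] += 1

    neighbours = [[] for _ in elements]
    for (a, b), count in shared.items():
        if count >= ncommon:
            neighbours[a].append(b)
            neighbours[b].append(a)

    # Flatten the neighbour lists into CSR arrays.
    xadj, adjncy = [0], []
    for nbrs in neighbours:
        adjncy.extend(sorted(nbrs))
        xadj.append(len(adjncy))
    return xadj, adjncy


def elements_for_rank(part, rank):
    """Select the cells whose partition number equals the process number,
    i.e. the cells this process would store in its own process file."""
    return [e for e, p in enumerate(part) if p == rank]
```

For a strip of four triangles such as `[(0, 1, 3), (1, 4, 3), (1, 2, 4), (2, 5, 4)]` with `ncommon=2`, the dual graph is a simple chain of four cells. Given a partition array such as `[0, 0, 1, 1]`, which in the claimed method would come from ParMETIS, `elements_for_rank` returns the cells that rank 1 would keep.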
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410004273.3A CN104765589B (en) | 2014-01-02 | 2014-01-02 | Grid parallel computation preprocess method based on MPI |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410004273.3A CN104765589B (en) | 2014-01-02 | 2014-01-02 | Grid parallel computation preprocess method based on MPI |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104765589A (en) | 2015-07-08 |
CN104765589B (en) | 2017-10-31 |
Family ID: 53647447
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410004273.3A Active CN104765589B (en) | 2014-01-02 | 2014-01-02 | Grid parallel computation preprocess method based on MPI |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104765589B (en) |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106548512B (en) * | 2015-09-22 | 2019-10-29 | 中国石油化工股份有限公司 | The generation method of grid model data |
CN105701291B (en) * | 2016-01-13 | 2019-04-23 | 中国航空动力机械研究所 | Finite element fraction analysis apparatus and information acquisition method, sytem matrix parallel generation method |
CN107688680A (en) * | 2016-08-05 | 2018-02-13 | 南京理工大学 | A kind of efficient time-Domain FEM domain decomposition parallel method |
CN107391871A (en) * | 2017-08-03 | 2017-11-24 | 中国空气动力研究与发展中心计算空气动力研究所 | A kind of space lattice deformation method based on parallelization RBF |
CN107391892A (en) * | 2017-09-11 | 2017-11-24 | 元计算(天津)科技发展有限公司 | A kind of parallel encoding method and system based on finite element language |
CN109271344B (en) * | 2018-08-07 | 2020-08-04 | 浙江大学 | Data preprocessing method based on parallel file reading of Shenwei chip architecture |
CN110532093B (en) * | 2019-08-23 | 2022-05-13 | 中国原子能科学研究院 | Parallel task division method for multi-geometric-shape full core sub-channels of numerical nuclear reactor |
CN111125949A (en) * | 2019-12-06 | 2020-05-08 | 北京科技大学 | Large-scale parallel meshing system and method for finite element analysis |
CN111914455B (en) * | 2020-07-31 | 2024-03-15 | 英特工程仿真技术(大连)有限公司 | Finite element parallel computing method based on node overlap type regional decomposition Schwarz alternation-free |
CN113177329B (en) * | 2021-05-24 | 2022-05-27 | 清华大学 | Data processing system for numerical program |
CN113900808A (en) * | 2021-10-09 | 2022-01-07 | 合肥工业大学 | MPI parallel data structure based on arbitrary polyhedron unstructured grid |
CN114004176B (en) * | 2021-10-29 | 2023-08-25 | 中船奥蓝托无锡软件技术有限公司 | Uniform structured grid parallel partitioning method |
CN114490648A (en) * | 2022-01-17 | 2022-05-13 | 三亚海兰寰宇海洋信息科技有限公司 | Data processing method, device and equipment for offshore target object |
Non-Patent Citations (3)
Title |
---|
Formal Verification of Practical MPI Programs; Anh Vo, et al.; ACM SIGPLAN Notices; 2009-02-18; Vol. 44, No. 4; pp. 261-270 * |
Parallel Programming Environment for; Andrey Chernikov, et al.; Proceedings of ICNGG. 2002; 2002-04-30; pp. 1-10 * |
PARMETIS Parallel Graph Partitioning and Sparse Matrix Ordering Library Version 3.1; George Karypis, et al.; https://dev.ece.ubc.ca/projects/gpgpu-sim/export/96f6ad00d6d3e9a58b1d51edaac76d061c02fa82/ispass2009-benchmarks/DG/3rdParty/ParMetis-3.1/Manual/manual.pdf; 2003-08-15; pp. 5-6, Section 3.1 Unstructured Graph Partitioning; p. 7, Section 3.2 Partitioning Meshes Directly; p. 9, lines 8-9; p. 11, lines 10-12; pp. 12-13, Section 4.1 Format of the Input Graph; p. 14, Section 4.4 Format of the Computed Partitionings and Orderings: Format of the Partitioning Array; p. 5, Figure 1 * |
Also Published As
Publication number | Publication date |
---|---|
CN104765589A (en) | 2015-07-08 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN104765589B (en) | Grid parallel computation preprocess method based on MPI | |
Chen et al. | A bi-layered parallel training architecture for large-scale convolutional neural networks | |
WO2021057713A1 (en) | Method for splitting neural network model by using multi-core processor, and related product | |
KR101959376B1 (en) | Systems and methods for a multi-core optimized recurrent neural network | |
US8463820B2 (en) | System and method for memory bandwidth friendly sorting on multi-core architectures | |
EP3979143A1 (en) | Method of performing splitting in neural network model by means of multi-core processor, and related product | |
US8676874B2 (en) | Data structure for tiling and packetizing a sparse matrix | |
Yeralan et al. | Algorithm 980: Sparse QR factorization on the GPU | |
CN110826708B (en) | Method for realizing neural network model splitting by using multi-core processor and related product | |
Dimond et al. | Accelerating large-scale HPC Applications using FPGAs | |
KR20130090147A (en) | Neural network computing apparatus and system, and method thereof | |
Holst et al. | High-throughput logic timing simulation on GPGPUs | |
Wang et al. | Towards memory-efficient allocation of CNNs on processing-in-memory architecture | |
US20200090051A1 (en) | Optimization problem operation method and apparatus | |
CN108710943B (en) | Multilayer feedforward neural network parallel accelerator | |
CN105468439A (en) | Adaptive parallel algorithm for traversing neighbors in fixed radius under CPU-GPU (Central Processing Unit-Graphic Processing Unit) heterogeneous framework | |
Liu | Parallel and scalable sparse basic linear algebra subprograms | |
CN116384312B (en) | Circuit yield analysis method based on parallel heterogeneous computation | |
JP2021179937A (en) | Neural network accelerator hardware-specific division of inference | |
Gong et al. | Improving hw/sw adaptability for accelerating cnns on fpgas through a dynamic/static co-reconfiguration approach | |
CN110211234A (en) | A kind of grid model sewing system and method | |
Dhar et al. | GDP: GPU accelerated detailed placement | |
US11409836B2 (en) | Optimization problem arithmetic method and optimization problem arithmetic apparatus | |
Martínez del Amor et al. | Sparse-matrix representation of spiking neural P systems for GPUs | |
Li et al. | FSimGP^ 2: An efficient fault simulator with GPGPU |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
EXSB | Decision made by sipo to initiate substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||