CN104765589B - Grid parallel computation preprocess method based on MPI - Google Patents


Info

Publication number
CN104765589B
CN104765589B CN201410004273.3A CN201410004273A CN104765589B
Authority
CN
China
Prior art keywords
grid
array
grid cell
file
mpi
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410004273.3A
Other languages
Chinese (zh)
Other versions
CN104765589A (en
Inventor
陈春艳
罗海飙
廖俊豪
王婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Institute of Software Application Technology Guangzhou GZIS
Original Assignee
Guangzhou Institute of Software Application Technology Guangzhou GZIS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Institute of Software Application Technology Guangzhou GZIS filed Critical Guangzhou Institute of Software Application Technology Guangzhou GZIS
Priority to CN201410004273.3A priority Critical patent/CN104765589B/en
Publication of CN104765589A publication Critical patent/CN104765589A/en
Application granted granted Critical
Publication of CN104765589B publication Critical patent/CN104765589B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Multi Processors (AREA)

Abstract

The invention discloses an MPI-based parallel mesh preprocessing method, comprising: specifying the number of partitions for the mesh of the computational domain; starting MPI multi-processing and setting the number of processes; judging whether the number of processes equals the number of partitions — if equal, opening the mesh file, having the master process read the mesh element information file, initially assigning the mesh elements evenly to the processes, and creating an adjacency array in each process; otherwise restarting the MPI processes. Each process then calls ParMETIS to partition the mesh; each process reads the element partitioning result into an array in blocks and sets the index position of the array. Each process loops over the mesh element information file and judges whether the array length minus the index position of the array is less than the length of one element record: if less, it reads data from the element information file into the array; otherwise it assigns the array entries to a mesh element. It then judges whether the partition number of the element equals the process rank: if equal, the element information is stored in that process's file; otherwise the loop continues.

Description

Grid parallel computation preprocess method based on MPI
Technical field
The present invention relates to parallel preprocessing, and in particular to a parallel mesh preprocessing method based on MPI.
Background art
In the field of scientific and engineering computing, meshes are essential to the numerical solution of all kinds of differential equations; the mesh distribution is the basic setting in which the solution is computed. Solving a differential equation mainly involves two steps: numerical discretization and solution of the resulting algebraic system. Once the discretization method is fixed, the mesh distribution information directly determines the logical structure of the solution vector and the coefficient matrix of the algebraic system. With the wide adoption of parallel computing, meshes play a very important role in the parallel solution of differential equations. For distributed parallel computing, parallel mesh partitioning based on domain decomposition and distributed storage of the mesh data are the main route to the parallel solution of differential equations.
Mesh partitioning establishes a correspondence between mesh elements and the processors of a parallel computer, dividing one large, complex mesh into multiple sub-meshes. The quality of the partitioning directly affects the efficiency of the parallel computation and the accuracy of the solution algorithm. The key question of mesh partitioning is how to divide a large mesh so that the sub-meshes are easier to solve in parallel, while balancing the computational load across the processors and minimizing the inter-processor communication overhead.
Data splitting and mesh information management are the most time-consuming stages of parallel mesh preprocessing, and the prior art is slow and inefficient in these stages. For the partitioning itself, existing methods such as multilevel recursive bisection and row-column partitioning are slow and yield partitions of unsatisfactory quality. Existing mesh preprocessing schemes are generally serial: they run on a single CPU core and mostly traverse the mesh file serially, which is slow. Moreover, existing schemes mostly store the mesh data centrally in one or a few files; when the data scale is large, this causes blocking on file I/O and limits the scale and speed of mesh data processing.
Summary of the invention
The object of the present invention is to provide an MPI-based parallel mesh preprocessing method that uses ParMETIS to achieve efficient, fast mesh partitioning and stores the mesh data in a distributed fashion, improving the scale and speed of data processing.
To achieve this object, the technical solution adopted by the present invention is as follows:
An MPI-based parallel mesh preprocessing method comprises the following steps: specifying the number of partitions for the mesh of the computational domain; starting MPI multi-processing and setting the number of processes; judging whether the number of processes equals the number of partitions — if equal, opening the mesh file, the master process reading the mesh element information file, initially assigning the mesh elements evenly to each process, and each process creating the adjacency array of the mesh elements, otherwise restarting the MPI processes; each process calling ParMETIS to partition the mesh elements; each process reading the element partitioning result into an array in blocks and setting the index position of the array; each process looping over the mesh element information file, judging whether the array length minus the index position of the array is less than the length of one mesh element record — if less, reading data from the element information file into the array, otherwise assigning the array entries to a mesh element and modifying the index position of the array; judging whether the partition number of the mesh element equals the process rank — if equal, storing the element information in that process's file, otherwise modifying the array index position and continuing the loop.
Further, the number of mesh partitions of the computational domain is less than or equal to the number of processors of the parallel computer.
Further, the adjacency array is stored in CSR format.
Further, each process calls the ParMETIS subroutine ParMETIS_V3_Mesh2Dual to convert the mesh elements into a graph.
Further, each process calls the ParMETIS subroutine ParMETIS_V3_AdaptiveRepart to repartition the graph.
Further, each process calls the ParMETIS subroutine ParMETIS_V3_RefineKway to further refine the quality of the mesh partitioning.
Compared with the prior art, the present invention uses ParMETIS to achieve efficient, fast mesh partitioning and stores the mesh data in a distributed fashion, improving the scale and speed of data processing.
Brief description of the drawings
Fig. 1 is a flow diagram of the parallel mesh preprocessing of the present invention;
Fig. 2 is a flow diagram of the distributed mesh storage of the present invention.
Detailed description of the embodiments
The MPI-based parallel mesh preprocessing method of the present invention is further described below with reference to the accompanying drawings and a specific embodiment.
The present invention uses an MPI-based distributed parallel execution model and the parallel partitioning and repartitioning functions of ParMETIS, applying the multilevel k-way graph partitioning method to obtain a high-quality partition of a three-dimensional mesh. Based on the partitioning result, multiple processes are started to loop over the mesh file, achieving fast parallel preprocessing of large meshes. With the MPI-based parallel mesh preprocessing method of the present invention, the communication time of parallel mesh computations can be substantially reduced and the parallel efficiency improved.
ParMETIS (Parallel Graph Partitioning and Fill-reducing Matrix Ordering) is an MPI-based parallel library implementing many algorithms for unstructured graph partitioning, mesh partitioning, and the computation of fill-reducing orderings of sparse matrices; it is particularly suitable for parallel numerical simulation on large unstructured meshes. ParMETIS extends the functionality provided by METIS and contains subroutines especially suited to parallel computation and numerical simulation. The algorithms implemented in ParMETIS are based on parallel multilevel k-way graph partitioning, a graph-theoretic partitioning method generally composed of a graph coarsening algorithm, an initial partitioning algorithm, and a refinement (uncoarsening) algorithm. The multilevel k-way method makes the vertex weights of the subgraphs essentially equal while minimizing the edge-cut weight of the partition, so the communication produced by the partitioning result is substantially lower than with other methods such as row-column partitioning. The execution time of the whole parallel program is thereby effectively reduced, and as the data scale and the number of processors keep growing, the reduction in communication overhead becomes even more pronounced.
Once the mesh has been partitioned, the assignment of each mesh node or element to a processor is immediately available, and the data is then split according to this partitioning result for the distributed storage of the mesh information. Data splitting mainly loops over all processors by node or element number; for each node or element assigned to the current processor, the corresponding array entries and element node lists are moved into local memory. Finally a local coordinate array and adjacency matrix are generated inside each processor, realizing the distributed storage of the mesh information.
MPI is a parallel programming model based on message passing, now widely used in parallel computing on distributed-memory architectures. MPI initializes its execution environment with the MPI_Init function, starts multiple processes, and creates a communicator among the MPI processes. The MPI-based distributed parallel strategy is a coarse-grained parallel algorithm: the finite element mesh of the computational domain is divided into as many partitions as there are processes, and the mesh data of each partition is then mapped to a process for parallel preprocessing. Since each process is responsible only for the preprocessing of its own partition, communication occurs only across partition boundary faces and the data traffic is small, so good parallel preprocessing performance can be achieved.
Referring to Fig. 1, in the parallel mesh preprocessing scheme provided by the present invention, MPI processes are started on the computer and the number of processes for the ParMETIS partitioning task is set, i.e. the ParMETIS communicator is created. Using the ParMETIS parallel domain decomposition tool, the adjacency arrays xadj and adjncy of the mesh elements are created as input parameters of the ParMETIS functions; the mesh is converted into a graph, and the graph is then repartitioned. The repartitioning result achieves load balance of the parallel computation and a smaller number of partition boundaries, reducing the communication time and significantly improving the efficiency of parallel mesh computation. The ParMETIS partitioning result establishes a one-to-one correspondence between mesh elements or nodes and processes. According to the partitioning result, each process loops over the mesh information file and reads the mesh data in blocks by positioning a file pointer and an array index. For each node or element assigned to the current processor, the corresponding array entries and element node lists are moved into local memory; finally a local coordinate array and adjacency matrix are generated inside each processor, quickly realizing the distributed storage of the mesh information. The block-reading method of the present invention, based on positioning a file pointer and an array index, greatly reduces the number of file read operations and effectively avoids the contention and waiting caused by multiple processes reading the same file simultaneously.
Referring to Figs. 1 and 2, after the distributed storage of the mesh data is complete, each process uses a linked-list data structure to insertion-sort all the mesh nodes that make up its local elements, using the index list to represent the local index of the local mesh nodes and thereby creating the local node index. Each process reorders its mesh elements, improving the quality of the sparse matrix used to solve the linear system, and establishes the inter-process communication relations to index the reordered elements. Finally each process saves its local sparse matrix and related data for the parallel solution of the equations. The high-quality partitioning result and its fast, efficient realization in the present invention guarantee the accuracy of the parallel solution of differential equations and facilitate the application of large meshes in numerical simulation.
Referring to Fig. 2, the invention discloses an MPI-based parallel mesh preprocessing method comprising the following steps: specifying the number of partitions for the mesh of the computational domain; starting MPI multi-processing and setting the number of processes; judging whether the number of processes equals the number of partitions — if equal, opening the mesh file, the master process reading the mesh element information file, initially assigning the mesh elements evenly to each process, and each process creating the adjacency array of the mesh elements, otherwise restarting the MPI processes; each process calling ParMETIS to partition the mesh elements; each process reading the element partitioning result into an array in blocks and setting the index position of the array; each process looping over the mesh element information file, judging whether the array length minus the index position of the array is less than the length of one mesh element record — if less, reading data from the element information file into the array, otherwise assigning the array entries to a mesh element and modifying the index position of the array; judging whether the partition number of the mesh element equals the process rank — if equal, storing the element information in that process's file, otherwise modifying the array index position and continuing the loop.
In the mesh preprocessing scheme of the present invention, the number of mesh partitions of the computational domain is less than or equal to the number of processors of the parallel computer. The adjacency arrays are stored in CSR (Compressed Sparse Row) format. Each process calls the ParMETIS subroutine ParMETIS_V3_Mesh2Dual to convert the mesh elements into a graph, the subroutine ParMETIS_V3_AdaptiveRepart to repartition the graph, and the subroutine ParMETIS_V3_RefineKway to further refine the quality of the partitioning. By repeatedly calling the ParMETIS refinement function ParMETIS_V3_RefineKway, the present invention continuously optimizes the partitioning quality, further reduces the size of the partition boundaries, reduces the communication time of the parallel computation, and improves the quality of the mesh partitioning.
The present invention uses ParMETIS to achieve efficient, fast mesh partitioning. The multilevel k-way graph partitioning method of ParMETIS makes the vertex weights of the subgraphs essentially equal while minimizing the edge-cut weight of the partition, so the communication time produced by the partitioning result is reduced, the execution time of the whole parallel program is effectively shortened, and as the data scale and the number of processors keep growing, the reduction in communication overhead becomes even more pronounced.
The present invention performs the partitioning and preprocessing of the mesh in a distributed manner using MPI multi-processing, enabling the fast partitioning of large meshes. The ParMETIS parallel domain decomposition tool partitions the large mesh: the mesh to be partitioned is first distributed evenly to multiple processes, and each task completes its part of the partitioning in parallel, increasing the partitioning speed.
According to the partitioning result, each process of the present invention loops over the mesh file and reads it in blocks by positioning a file pointer and an array index. This greatly reduces the number of read operations on the mesh information, effectively avoids the contention and waiting caused by multiple processes reading the same file simultaneously, and accelerates each process's splitting of the mesh data, so the distributed storage of the mesh can be realized quickly.
Embodiment 1
The basic steps of the MPI-based parallel mesh preprocessing method of the present invention are: the MPI processes start and first read the mesh file data, writing the node information and the element information from the mesh file into two separate files. A new communicator is created and the ParMETIS partitioning functions are called to obtain a high-quality partition of the mesh. According to the partitioning result, multiple processes simultaneously loop over the mesh file and, by positioning a file pointer and reading in blocks, realize the distributed storage of the mesh.
Referring to Figs. 1 and 2, the specific steps of this embodiment are as follows:
1. The user specifies the required number of mesh partitions num_domains; the number of partitions must not exceed the number of processors of the parallel computer.
2. The MPI processes are started and the number of processes num_processors is set; the number of processes must equal the number of partitions.
3. It is judged whether the number of MPI processes equals the number of mesh partitions. If so, the program continues; otherwise it exits, the MPI processes are restarted, and the number of processes num_processors is set again.
4. The global number of geometry nodes and the number of elements are read from the mesh file. The original mesh file channnel.msh in the specified directory is opened, and the global number of geometry nodes global_nde and the global number of elements global_nel are read from it.
5. The mesh node numbers and node coordinates are written to the file grid.bin in the specified directory. The mesh node information is read from the mesh file channnel.msh with the fscanf function, and the node numbers and coordinates are written to grid.bin in the specified directory.
6. The element information, such as the element type and the node list of the constituent nodes, is written to the file nenn.bin in the specified directory. The element information is read with the fscanf function, and the element type, node list, and related information are written to nenn.bin in the specified directory.
7. The communicator for the ParMETIS partitioning tool is created. The number of ParMETIS processes num_run is specified, and the communicator required by the ParMETIS partitioning functions is created from the specified processes.
8. Mesh partitioning, comprising the following steps:
1) The master process reads the element information file nenn.bin and initially assigns the elements evenly to the processes; each process is responsible for partitioning global_nel/num_processors elements.
2) The master process creates the array elmdist (elmdist = new idx_t[num_run+1]), which describes the range of elements handled by each process: process mpi_id is responsible for partitioning elements elmdist[mpi_id] through elmdist[mpi_id+1].
3) The master process creates the adjacency structure of the global elements in CSR form, i.e. the two arrays global_eptr and global_eind represent the adjacency of the global elements.
4) From the global adjacency arrays global_eptr, global_eind and the elmdist array, the master process obtains the parallel CSR form of the elements, i.e. the adjacency structure of elements elmdist[mpi_id] through elmdist[mpi_id+1] of each process, represented by the arrays eptr and eind.
5) Looping over the number of processes, the master process uses MPI communication to send the adjacency arrays eptr, eind of elements elmdist[mpi_id] through elmdist[mpi_id+1] to the process with rank mpi_id.
6) Each child process receives the eptr, eind arrays sent by the master process using the MPI_Recv function.
7) Each process prepares the remaining input/output arguments of the ParMETIS functions, such as the output argument part that holds the partitioning result.
8) Each process calls the ParMETIS function ParMETIS_V3_Mesh2Dual to convert the mesh into a graph, obtaining the adjacency structure of the graph, represented by the arrays xadj and adjncy.
9) Each process calls the ParMETIS repartitioning function ParMETIS_V3_AdaptiveRepart with the graph adjacency arrays xadj, adjncy as input parameters, repartitioning the graph and obtaining the result array part, which records the correspondence between elements and process ranks.
10) Each process calls the ParMETIS refinement function ParMETIS_V3_RefineKway to further refine the quality of the partitioning on the basis of the above partition.
9. Each process writes the output array part of the ParMETIS functions to the file partition.bin simultaneously with the MPI I/O function MPI_File_write_at, where the start offset of each process's write is elmdist[mpi_id] and the number of array entries written is elmdist[mpi_id+1]-elmdist[mpi_id].
10. Distributed storage of the mesh. The present invention uses distributed storage: each partition produces its own data file, stored in the storage space of the corresponding process, reducing I/O bottlenecks and improving the scale and speed of data processing.
Referring to Fig. 2, this comprises the following steps:
1) Each process reads the partitioning result from partition.bin into the array ele_part, which records the partition number of each global element.
2) Each process creates a block-reading array file_arr of size file_arr_size for storing the element data read in each block. file_arr_size can be chosen freely: if it is too small the algorithm is less effective, while a larger value means fewer file read requests and better performance, so when memory is large enough, file_arr_size can be set to a larger value.
3) Each process reads file_arr_size items of data from nenn.bin into the array file_arr and advances the file pointer offset by file_arr_size*sizeof(data type); it sets the current array index arr_offset to 0 and the amount of the file already read, read_file_size, to file_arr_size. The total size of the file nenn.bin is computed as file_size.
4) Each process loops over the element information file nenn.bin according to the global number of elements, judging whether the size of the array file_arr minus the current array offset is less than the length of one complete element record. If it is less, the nenn.bin file pointer is positioned and data is read from the current file pointer position into file_arr. If it is not less, the index into file_arr is positioned and the entries at the current index are assigned to the element type, the physical entity, and the element node array; after assignment, the array index arr_offset is updated.
if ((file_arr_size - arr_offset) < (3 + n_max))  // fewer entries left in the array than one full element record?
{
    // Move the remaining entries arr_offset .. file_arr_size-1 of the array
    // to positions 0 .. file_arr_size-arr_offset-1, and position the nenn.bin
    // file pointer to the current offset value with fseek.
    if ((file_size - read_file_size) >= arr_offset)  // can the unread part of the file fill the gap?
    {
        // Create a read buffer read_arr of size arr_offset for the block being read.
        // Read arr_offset items of data into read_arr and advance the file pointer:
        // offset = offset + arr_offset * sizeof(data type).
        // Copy the entries of read_arr into the last arr_offset entries of file_arr.
        // Update the amount of the file already read:
        // read_file_size = read_file_size + arr_offset.
    }
    else
    {
        // Create a read buffer read_arr of size file_size - read_file_size and read
        // that many items into it; advance the file pointer:
        // offset = offset + (file_size - read_file_size) * sizeof(data type).
        // Copy the entries of read_arr into the last file_size - read_file_size
        // entries of file_arr.
        // Update read_file_size = file_size.
    }
    // Delete the array read_arr and release its memory.
    // Reset the array index position: arr_offset = 0.
}
else
{
    // Assign the entries at the current array index position to the element type
    // ele_type and the node array.
    // Advance the array index: arr_offset = arr_offset + 3 + n_max, where
    // 3 + n_max is the length of one element record.
}
if (partition number of the element == process rank)
{
    // Assign the entries at the current index position to the element node array
    // nenn[n_max], where n_max is the number of nodes making up an element. Write
    // the element type ele_type and the node array nenn to the element file named
    // by the process rank: nenn_mpi_id_physical_entity.bin.
}
That is, it is judged whether the partition number of the current element equals the local process rank. If the partition number equals the process rank, the type and node array of the element are written to the file nenn_mpi_id_physical_entity.bin, named jointly by the local process rank and the physical entity number; if not, the array index position is modified and the loop continues.
5) Each process repeats step 4) according to the global number of elements global_nel until the whole loop ends, realizing the distributed storage of the mesh.
11. The parallel mesh preprocessing ends.
The above is a detailed description of a preferred possible embodiment of the present invention, but the embodiment does not limit the scope of the patent of the present invention; any equivalent change or modification completed under the disclosed technical spirit shall fall within the scope of the claims of the present invention.

Claims (6)

1. An MPI-based parallel mesh preprocessing method, characterized by comprising the following steps:
specifying the number of partitions for the mesh of a computational domain;
starting MPI multi-processing and setting the number of processes;
judging whether the number of processes equals the number of partitions; if equal, opening the mesh file, the master process reading the mesh element information file, initially assigning the mesh elements evenly to each process, and each process creating the adjacency array of the mesh elements; otherwise restarting the MPI processes;
each process calling ParMETIS to partition the mesh elements;
each process reading the element partitioning result into an array in blocks and setting the index position of the array;
each process looping over the mesh element information file, judging whether the array length minus the index position of the array is less than the length of one mesh element record; if less, reading data from the mesh element information file into the array; otherwise assigning the array entries to a mesh element and modifying the index position of the array;
judging whether the partition number of the mesh element equals the process rank; if equal, storing the mesh element information in the file of that process; otherwise modifying the array index position and continuing the loop.
2. The MPI-based parallel mesh preprocessing method of claim 1, characterized in that the number of mesh partitions of the computational domain is less than or equal to the number of processors of the parallel computer.
3. The MPI-based parallel mesh preprocessing method of claim 1, characterized in that the adjacency array is stored in CSR format.
4. The MPI-based parallel mesh preprocessing method of claim 1, characterized in that each process calls the ParMETIS subroutine ParMETIS_V3_Mesh2Dual to convert the mesh elements into a graph.
5. The MPI-based parallel mesh preprocessing method of claim 4, characterized in that each process calls the ParMETIS subroutine ParMETIS_V3_AdaptiveRepart to repartition the graph.
6. The MPI-based parallel mesh preprocessing method of claim 5, characterized in that each process calls the ParMETIS subroutine ParMETIS_V3_RefineKway to further refine the quality of the mesh partitioning.
CN201410004273.3A 2014-01-02 2014-01-02 Grid parallel computation preprocess method based on MPI Active CN104765589B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410004273.3A CN104765589B (en) 2014-01-02 2014-01-02 Grid parallel computation preprocess method based on MPI

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410004273.3A CN104765589B (en) 2014-01-02 2014-01-02 Grid parallel computation preprocess method based on MPI

Publications (2)

Publication Number Publication Date
CN104765589A CN104765589A (en) 2015-07-08
CN104765589B true CN104765589B (en) 2017-10-31

Family

ID=53647447

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410004273.3A Active CN104765589B (en) 2014-01-02 2014-01-02 Grid parallel computation preprocess method based on MPI

Country Status (1)

Country Link
CN (1) CN104765589B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106548512B (en) * 2015-09-22 2019-10-29 中国石油化工股份有限公司 The generation method of grid model data
CN105701291B (en) * 2016-01-13 2019-04-23 中国航空动力机械研究所 Finite element fraction analysis apparatus and information acquisition method, sytem matrix parallel generation method
CN107688680A (en) * 2016-08-05 2018-02-13 南京理工大学 A kind of efficient time-Domain FEM domain decomposition parallel method
CN107391871A (en) * 2017-08-03 2017-11-24 中国空气动力研究与发展中心计算空气动力研究所 A kind of space lattice deformation method based on parallelization RBF
CN107391892A (en) * 2017-09-11 2017-11-24 元计算(天津)科技发展有限公司 A kind of parallel encoding method and system based on finite element language
CN109271344B (en) * 2018-08-07 2020-08-04 浙江大学 Data preprocessing method based on parallel file reading of Shenwei chip architecture
CN110532093B (en) * 2019-08-23 2022-05-13 中国原子能科学研究院 Parallel task division method for multi-geometric-shape full core sub-channels of numerical nuclear reactor
CN111125949A (en) * 2019-12-06 2020-05-08 北京科技大学 Large-scale parallel meshing system and method for finite element analysis
CN111914455B (en) * 2020-07-31 2024-03-15 英特工程仿真技术(大连)有限公司 Finite element parallel computing method based on node overlap type regional decomposition Schwarz alternation-free
CN113177329B (en) * 2021-05-24 2022-05-27 清华大学 Data processing system for numerical program
CN113900808A (en) * 2021-10-09 2022-01-07 合肥工业大学 MPI parallel data structure based on arbitrary polyhedron unstructured grid
CN114004176B (en) * 2021-10-29 2023-08-25 中船奥蓝托无锡软件技术有限公司 Uniform structured grid parallel partitioning method
CN114490648A (en) * 2022-01-17 2022-05-13 三亚海兰寰宇海洋信息科技有限公司 Data processing method, device and equipment for offshore target object

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Formal Verification of Practical MPI Programs; Anh Vo, et al.; ACM SIGPLAN Notices; 2009-02-18; vol. 4 (no. 44); pp. 261-270 *
Parallel Programming Environment for; Andrey Chernikov, et al.; Proceedings of ICNGG 2002; 2002-04-30; pp. 1-10 *
PARMETIS Parallel Graph Partitioning and Sparse Matrix Ordering Library Version 3.1; George Karypis, et al.; https://dev.ece.ubc.ca/projects/gpgpu-sim/export/96f6ad00d6d3e9a58b1d51edaac76d061c02fa82/ispass2009-benchmarks/DG/3rdParty/ParMetis-3.1/Manual/manual.pdf; 2003-08-15; pp. 5-6, Sect. 3.1 Unstructured Graph Partitioning; p. 7, Sect. 3.2 Partitioning Meshes Directly; p. 9, lines 8-9; p. 11, lines 10-12; pp. 12-13, Sect. 4.1 Format of the Input Graph; p. 14, Sect. 4.4 Format of the Computed Partitionings and Orderings: Format of the Partitioning Array; p. 5, Fig. 1 *

Also Published As

Publication number Publication date
CN104765589A (en) 2015-07-08

Similar Documents

Publication Publication Date Title
CN104765589B (en) Grid parallel computation preprocess method based on MPI
Chen et al. A bi-layered parallel training architecture for large-scale convolutional neural networks
WO2021057713A1 (en) Method for splitting neural network model by using multi-core processor, and related product
KR101959376B1 (en) Systems and methods for a multi-core optimized recurrent neural network
US8463820B2 (en) System and method for memory bandwidth friendly sorting on multi-core architectures
EP3979143A1 (en) Method of performing splitting in neural network model by means of multi-core processor, and related product
US8676874B2 (en) Data structure for tiling and packetizing a sparse matrix
Yeralan et al. Algorithm 980: Sparse QR factorization on the GPU
CN110826708B (en) Method for realizing neural network model splitting by using multi-core processor and related product
Dimond et al. Accelerating large-scale HPC Applications using FPGAs
KR20130090147A (en) Neural network computing apparatus and system, and method thereof
Holst et al. High-throughput logic timing simulation on GPGPUs
Wang et al. Towards memory-efficient allocation of CNNs on processing-in-memory architecture
US20200090051A1 (en) Optimization problem operation method and apparatus
CN108710943B (en) Multilayer feedforward neural network parallel accelerator
CN105468439A (en) Adaptive parallel algorithm for traversing neighbors in fixed radius under CPU-GPU (Central Processing Unit-Graphic Processing Unit) heterogeneous framework
Liu Parallel and scalable sparse basic linear algebra subprograms
CN116384312B (en) Circuit yield analysis method based on parallel heterogeneous computation
JP2021179937A (en) Neural network accelerator hardware-specific division of inference
Gong et al. Improving hw/sw adaptability for accelerating cnns on fpgas through a dynamic/static co-reconfiguration approach
CN110211234A (en) A kind of grid model sewing system and method
Dhar et al. GDP: GPU accelerated detailed placement
US11409836B2 (en) Optimization problem arithmetic method and optimization problem arithmetic apparatus
Martínez del Amor et al. Sparse-matrix representation of spiking neural P systems for GPUs
Li et al. FSimGP^ 2: An efficient fault simulator with GPGPU

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant