CN116755636B - Parallel reading method, device and equipment for grid files and storage medium - Google Patents

Parallel reading method, device and equipment for grid files and storage medium

Info

Publication number
CN116755636B
Authority
CN
China
Prior art keywords
grid
file
sub
parallel
reading
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311029868.XA
Other languages
Chinese (zh)
Other versions
CN116755636A (en)
Inventor
陈呈
何舟桥
杨超
赵丹
郭宁波
邢德
王岳青
杨文祥
喻杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Computational Aerodynamics Institute of China Aerodynamics Research and Development Center
Original Assignee
Computational Aerodynamics Institute of China Aerodynamics Research and Development Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Computational Aerodynamics Institute of China Aerodynamics Research and Development Center filed Critical Computational Aerodynamics Institute of China Aerodynamics Research and Development Center
Priority to CN202311029868.XA priority Critical patent/CN116755636B/en
Publication of CN116755636A publication Critical patent/CN116755636A/en
Application granted granted Critical
Publication of CN116755636B publication Critical patent/CN116755636B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/16File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/164File meta data generation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/0643Management of files
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • G06F30/23Design optimisation, verification or simulation using finite element methods [FEM] or finite difference methods [FDM]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20Finite element generation, e.g. wire-frame surface description, tesselation
    • G06T17/205Re-meshing

Abstract

The invention relates to the field of grid file processing, and discloses a parallel reading method, a device, equipment and a storage medium of grid files, wherein the method comprises the following steps: preprocessing the grid file and constructing metadata; grid segmentation is carried out by utilizing the information of the metadata, and each sub-grid data set obtained by segmentation is distributed to different processes; reading in the sub-grid data set in parallel, constructing grid topology and carrying out grid mapping to obtain a mapping table; and analyzing the mapping table, and reading corresponding attribute data in parallel to obtain an output result. The method can read the large-scale data file in parallel by utilizing the multi-core characteristic of the cluster system, greatly improves the reading speed of the large-scale file and solves the problem of memory limitation during serial reading.

Description

Parallel reading method, device and equipment for grid files and storage medium
Technical Field
The present invention relates to the field of grid file processing, and in particular, to a method, an apparatus, a device, and a storage medium for parallel reading of grid files.
Background
With the rapid development of computer technology and numerical simulation methods, the computing and data processing capabilities of computational fluid dynamics (CFD) have greatly increased. High-precision, high-fidelity CFD solvers often generate ultra-large-scale data files containing grids, flow fields, attributes and the like. This volume of data poses challenges for visualization in the post-processing stage, and reading the data is the first problem that must be solved.
Different simulation and post-processing tools typically produce their own data formats. For example, the EnSight Gold data file format shown in FIG. 1 contains three-dimensional position and size information, scalar fields, vector fields, and other three-dimensional data, and can describe vector data directly. The geometric model is stored in a file with the .geo suffix, and the attribute data associated with the model, such as scalars, vectors and tensors, is stored in files with user-defined suffixes. The EnSight Gold format is typically used to store unsteady flow field data; a case file maintains the file associations between the geometric model and its attributes and describes how the time steps of the unsteady data are organized.
Current methods for reading data in the EnSight Gold format still rely mainly on traditional serial reading, which has several problems. First, processor speed and disk speed are mismatched, so the hardware limits the reading speed: when a very large file of tens to hundreds of GB is read on a single CPU, the disk is accessed constantly, the processing speed of the CPU far exceeds the I/O speed, and the CPU wastes a great deal of time waiting for I/O. Second, large-scale flow field data is huge and is usually stored in binary form; when it is read into memory, operations such as memory alignment and data type conversion are performed, so the amount of memory occupied increases sharply. For example, a 3GB EnSight Gold file stored in binary form can occupy 10GB of memory when read serially. Serial reading therefore puts great pressure on memory, and insufficient memory may even crash the program so that the read fails.
Disclosure of Invention
Accordingly, the present invention is directed to a method, apparatus, device and storage medium for parallel reading of grid files, which can increase the reading speed of large-scale files and solve the problem of memory limitation during serial reading. The specific scheme is as follows:
a parallel reading method of grid files comprises the following steps:
preprocessing the grid file and constructing metadata;
grid segmentation is carried out by utilizing the information of the metadata, and each sub-grid data set obtained by segmentation is distributed to different processes;
reading in the sub-grid data set in parallel, constructing grid topology and carrying out grid mapping to obtain a mapping table;
and analyzing the mapping table, and reading corresponding attribute data in parallel to obtain an output result.
Preferably, in the parallel reading method of a grid file provided by the embodiment of the present invention, preprocessing and metadata construction are performed on the grid file, including:
pre-scanning the grid file to obtain scanning data;
and extracting key information of the grid file from the scanning data to construct metadata.
Preferably, in the parallel reading method of a grid file provided by the embodiment of the present invention, grid segmentation is performed by using information of the metadata, and each sub-grid data set obtained by segmentation is allocated to a different process, including:
dividing the grids of each component according to the number of processes according to the information of the metadata to obtain a plurality of sub-grid data sets;
calculating the start-stop positions of each sub-grid data set in the grid file;
and distributing each sub-grid data set to different processes according to the start-stop positions of each sub-grid data set in the grid file so as to create file view ports for each process.
Preferably, in the parallel reading method of a grid file provided by the embodiment of the present invention, calculating a start-stop position of each sub-grid data set in the grid file includes:
calculating the total number of the sub-grid data sets allocated by the processes before each process;
calculating the initial byte position of the sub-grid data set in the grid file according to the first byte position of the sub-grid data set, the number of points forming the sub-grid data set and the total number of the sub-grid data sets distributed by the processes before each process;
acquiring the number of the sub-grid data sets distributed by the current process;
and calculating the ending byte position of the sub-grid data set in the grid file according to the starting byte position of the sub-grid data set in the grid file, the number of points forming the sub-grid data set and the number of the sub-grid data sets distributed by the current process.
Preferably, in the parallel reading method of a grid file provided by the embodiment of the present invention, reading the sub-grid dataset in parallel, constructing a grid topology and performing grid mapping to obtain a mapping table, including:
mapping the global point set shared by all the components to the corresponding process local point set in each file viewport to generate a mapping result;
and reading in the sub-grid data set in parallel, constructing grid topology by using the mapping result, and mapping the global grid unit into a process to obtain a mapping table.
Preferably, in the parallel reading method of a grid file provided by the embodiment of the present invention, after obtaining the mapping table, the method further includes:
counting the points used by the current process;
copying the memory space corresponding to the used points into the mapping table; and the mapping table stores the mapping relation between the ID of the coordinate point used by the current process in the grid file and the ID in the process.
Preferably, in the parallel reading method of a grid file provided by the embodiment of the present invention, reading corresponding attribute data in parallel to obtain an output result includes:
when the grid attribute is read in parallel, locating the corresponding grid through the ID, the type name and the type ID of the component; the length of the mapping table represents the number of grids;
when the coordinate point attributes are read in parallel, positioning the coordinate point used in the current process through the ID of the component; the length of the mapping table represents the number of real points used in the current process.
The embodiment of the invention also provides a parallel reading device of the grid file, which comprises:
the metadata construction module is used for preprocessing the grid file and constructing metadata;
the grid segmentation and distribution module is used for carrying out grid segmentation by utilizing the information of the metadata and distributing each sub-grid data set obtained by segmentation to different processes;
the grid construction and mapping module is used for reading the sub-grid data sets in parallel, constructing grid topology and carrying out grid mapping to obtain a mapping table;
and the data reading module is used for analyzing the mapping table and reading corresponding attribute data in parallel to obtain an output result.
The embodiment of the invention also provides parallel reading equipment for the grid file, which comprises a processor and a memory, wherein the processor implements the parallel reading method of the grid file provided by the embodiment of the invention when executing the computer program stored in the memory.
The embodiment of the invention also provides a computer readable storage medium for storing a computer program, wherein the computer program realizes the parallel reading method of the grid file provided by the embodiment of the invention when being executed by a processor.
From the above technical solution, the parallel reading method of grid files provided by the present invention includes: preprocessing the grid file and constructing metadata; grid segmentation is carried out by utilizing the information of the metadata, and each sub-grid data set obtained by segmentation is distributed to different processes; reading in the sub-grid data set in parallel, constructing grid topology and carrying out grid mapping to obtain a mapping table; and analyzing the mapping table, and reading corresponding attribute data in parallel to obtain an output result.
According to the parallel reading method for the grid files, the grid files are preprocessed and metadata are constructed, grid segmentation is carried out by utilizing the metadata, segmented sub-grid data sets are distributed to different processes, construction and mapping of grid topology are further achieved, and corresponding attribute data are read in parallel by analyzing a mapping table, so that the files are read in parallel by utilizing the multi-core characteristic of the cluster system, the reading speed of large-scale files is greatly improved, the I/O performance is improved, and the memory restriction problem in serial reading is solved.
In addition, the invention also provides a corresponding device, equipment and a computer readable storage medium for the parallel reading method of the grid file, so that the method has more practicability, and the device, the equipment and the computer readable storage medium have corresponding advantages.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the related art, the drawings that are required to be used in the embodiments or the related technical descriptions will be briefly described, and it is apparent that the drawings in the following description are only embodiments of the present invention, and other drawings may be obtained according to the provided drawings without inventive effort for those skilled in the art.
FIG. 1 is a conventional EnSight Gold format file structure;
FIG. 2 is a flow chart of a parallel reading method of grid files provided by an embodiment of the invention;
FIG. 3 is a schematic diagram of a parallel reading method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a metadata data structure according to an embodiment of the present invention;
FIG. 5 is a schematic view of a view port of a process reading file according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of grid construction and mapping according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of coordinate two-time mapping provided in an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a parallel reading device for grid files according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention provides a parallel reading method of grid files, which is shown in fig. 2 and comprises the following steps:
S201, preprocessing a grid file and constructing metadata;
It should be noted that, before data is formally read in to construct the grid, the grid file must be preprocessed and metadata constructed; the constructed metadata then assists the parallel read.
S202, grid segmentation is carried out by utilizing the information of the metadata, and each sub-grid data set obtained by segmentation is distributed to different processes;
In the invention, the grid data is divided by using the information of the metadata to obtain a plurality of sub-grid data sets. The sub-grid data sets are then distributed to different computing nodes, and each computing node loads its own sub-grid data set into memory, which breaks through the memory limit of serial reading.
The result of step S202 is that a file viewport is created for each process, conditions are created for parallel reading, each process maintains a file pointer, and operates in the respective file viewport.
S203, parallelly reading in the sub-grid data set, constructing grid topology and carrying out grid mapping to obtain a mapping table;
It should be noted that the invention uses the process-level parallel technique MPI (Message Passing Interface) for file processing in a cluster environment: the grid is partitioned and distributed to different computing nodes for parallel reading, which improves the reading speed and the I/O performance and also improves the efficiency of subsequent visual rendering.
S204, analyzing the mapping table, and reading corresponding attribute data in parallel to obtain an output result.
In the parallel reading method of the grid file provided by the embodiment of the invention, the grid file is preprocessed and metadata is constructed, the metadata is utilized to carry out grid segmentation, the segmented sub-grid data sets are distributed to different processes, further, construction and mapping of grid topology are realized, and the corresponding attribute data are read in parallel by analyzing the mapping table, so that the file is read in parallel by utilizing the multi-core characteristic of the cluster system, the reading speed of a large-scale file is greatly improved, the I/O performance is improved, and the memory limitation problem during serial reading is solved.
In practical application, the parallel reading method provided by the invention is independent of a file storage mode, and can be applied to centralized storage and distributed storage. The invention mainly aims to realize parallel reading of large-scale data files in EnSight Gold format.
Further, in the implementation, in the parallel reading method of the grid file provided by the embodiment of the present invention, step S201 performs preprocessing and metadata construction on the grid file, as shown in fig. 3, may include: firstly, pre-scanning a grid file to obtain scanning data; then, key information of the grid file is extracted from the scan data, and metadata is constructed.
It should be noted that data files in the EnSight Gold format differ greatly from other data formats in how grids and grid point coordinates are stored. Each part uses its own global set of point coordinates (for both two-dimensional and three-dimensional data, X, Y and Z coordinate values are stored); the different types of grids in a part share the IDs of this coordinate set, and the IDs of the points and grids are incremented by default, starting from 0, in the order in which they are stored. The number of parts is not positively correlated with the file size, so parts cannot be allocated directly to the processes; the number of grids in the parts is positively correlated with the file size, so the strategy adopted by the invention is to divide the grids within each part into sub-grids and allocate the sub-grids to different processes. Because the geo file that stores the geometric model information contains no overview of each part, the invention designs a data structure (partsintype), as shown in fig. 4. Before the geometric model file is read in for grid construction, the geo file is scanned once: each part is scanned in a loop, and its basic information is stored in the data structure as the metadata of the geometric model file.
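As an illustration only, the per-part metadata described above might be held in a structure like the following Python sketch. The class names `PartInfo` and `CellTypeInfo`, all field names, and the sample offsets are hypothetical; the actual layout of the structure in fig. 4 is not reproduced here. The nodes-per-cell table lists a few standard EnSight Gold element types.

```python
from dataclasses import dataclass, field
from typing import List

# Number of points per cell for a few EnSight Gold element types.
NODES_PER_CELL = {"point": 1, "bar2": 2, "tria3": 3, "quad4": 4,
                  "tetra4": 4, "pyramid5": 5, "penta6": 6, "hexa8": 8}


@dataclass
class CellTypeInfo:
    """Hypothetical record for one cell type inside a part."""
    name: str        # element type name, e.g. "tetra4"
    num_cells: int   # number of cells of this type in the part
    start_pos: int   # byte offset of the first cell's topology in the geo file

    @property
    def num_points_of_cell(self) -> int:
        return NODES_PER_CELL[self.name]


@dataclass
class PartInfo:
    """Hypothetical per-part metadata gathered by the pre-scan."""
    part_id: int
    num_points: int
    cell_types: List[CellTypeInfo] = field(default_factory=list)

    def total_cells(self) -> int:
        return sum(ct.num_cells for ct in self.cell_types)


# Illustrative values only; real offsets come from scanning the geo file.
part = PartInfo(part_id=1, num_points=500)
part.cell_types.append(CellTypeInfo("tetra4", 1200, start_pos=6400))
part.cell_types.append(CellTypeInfo("hexa8", 300, start_pos=25684))
```

With such a record available, every process can work out its own share of the file without reading the geometry line by line.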
Further, in the embodiment of the present invention, in the parallel reading method of the grid file, step S202 performs grid segmentation using metadata information, and distributes each sub-grid data set obtained by segmentation to different processes, as shown in fig. 3, may include: firstly, dividing grids of each part according to the number of processes according to the information of metadata to obtain a plurality of sub-grid data sets; then, calculating the start and stop positions of each sub-grid data set in the grid file; and then, distributing each sub-grid data set to different processes according to the start-stop positions of each sub-grid data set in the grid file so as to create a file viewport for each process.
It should be noted that each part is generally composed of several types of cells; the number and topology of the cells of each type are given explicitly, while the ID of each cell is implicit. During metadata construction, the grid is further divided according to the number of processes. The invention designs a data structure (CellTypes) to represent how the grids are divided. Optionally, dividing the grid of each part according to the number of processes may include: first, for each cell type, dividing the cells evenly among all processes; if n cells remain, assigning one each to processes 0 through n-1; and storing the number of cells allocated to each process in vBlock (a vector whose subscript is the process number and whose value at that subscript is the number of cells allocated to that process).
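The division rule above can be sketched in a few lines of Python. The function name `split_cells` is an assumption made for illustration; the returned list plays the role of vBlock for one cell type.

```python
from typing import List


def split_cells(num_cells: int, num_procs: int) -> List[int]:
    """Return a vBlock-style list: entry i is the cell count for process i.

    Cells are divided evenly; the n leftover cells go one each to
    processes 0 .. n-1.
    """
    base, remainder = divmod(num_cells, num_procs)
    return [base + 1 if rank < remainder else base
            for rank in range(num_procs)]


v_block = split_cells(10, 4)  # 10 cells over 4 processes, 2 left over
print(v_block)
```

Here processes 0 and 1 each receive one of the two leftover cells, so no process holds more than one cell above the average.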
In addition, since the calculated split case cannot be directly used by the file pointer, it is necessary to further calculate the start-stop positions of the split grid in the file.
Optionally, in a specific implementation, calculating the start-stop position of each sub-grid dataset in the grid file in the step above may include: calculating the total number of sub-grid data sets allocated by the processes before each process; calculating the initial byte position of the sub-grid data set in the grid file according to the first byte position of the sub-grid data set, the number of points forming the sub-grid data set and the total number of the sub-grid data sets allocated by the processes before each process; acquiring the number of sub-grid data sets distributed by the current process; and calculating the ending byte position of the sub-grid data set in the grid file according to the starting byte position of the sub-grid data set in the grid file, the number of points forming the sub-grid data set and the number of sub-grid data sets distributed by the current process.
Specifically, the total number of cells allocated to the processes preceding each process (PREBLOCK) is first calculated from vBlock, and the number of cells allocated to the current process (CURBLOCK) is obtained from the corresponding value in vBlock. Each number in EnSight Gold is represented by one INT type; therefore, the start position of the cells is calculated by (1) and the end position by (2):
$$\mathrm{REALSTARTPOS} = \mathrm{CELLSTARTPOS} + \mathrm{PREBLOCK} \times \mathrm{sizeof}(\mathrm{INT}) \times \mathrm{NUMPOINTSOFCELL} \tag{1}$$
$$\mathrm{REALENDPOS} = \mathrm{REALSTARTPOS} + \mathrm{CURBLOCK} \times \mathrm{sizeof}(\mathrm{INT}) \times \mathrm{NUMPOINTSOFCELL} - 1 \tag{2}$$
where REALSTARTPOS is the starting byte position of the cells that the current process should handle, CELLSTARTPOS is the first byte position of the first cell, NUMPOINTSOFCELL is the number of points constituting a cell, and REALENDPOS is the ending byte position of the cells that the current process should handle.
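The start and end positions can be computed as in the following Python sketch. It assumes 4-byte integers for sizeof(INT), and the function name `byte_range` and the sample offset 1000 are illustrative choices, not taken from the patent.

```python
from typing import List, Tuple

SIZEOF_INT = 4  # assumption: integers in the binary file are 4 bytes wide


def byte_range(cell_start_pos: int, v_block: List[int], rank: int,
               num_points_of_cell: int,
               sizeof_int: int = SIZEOF_INT) -> Tuple[int, int]:
    """Return (REALSTARTPOS, REALENDPOS) for one process and one cell type."""
    pre_block = sum(v_block[:rank])  # PREBLOCK: cells held by earlier ranks
    cur_block = v_block[rank]        # CURBLOCK: cells held by this rank
    real_start = cell_start_pos + pre_block * sizeof_int * num_points_of_cell
    real_end = real_start + cur_block * sizeof_int * num_points_of_cell - 1
    return real_start, real_end


# Ten tetra4 cells (4 points each) starting at byte 1000, split over 4 ranks.
v_block = [3, 3, 2, 2]
ranges = [byte_range(1000, v_block, r, 4) for r in range(4)]
print(ranges)
```

Because the end position is inclusive, each range begins exactly one byte after the previous one ends, so the ranges tile the topology block without gaps or overlap.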
Further, in the implementation, in the parallel reading method of the grid file provided by the embodiment of the present invention, step S203 reads the sub-grid dataset in parallel, constructs a grid topology and performs grid mapping to obtain a mapping table, which may include: mapping the global point set shared by all the components to the corresponding process local point set in each file viewport to generate a mapping result; and reading in the sub-grid data set in parallel, constructing grid topology by using the mapping result, and mapping the global grid unit into a process to obtain a mapping table. After performing step S203 to obtain the mapping table, it may further include: counting the points used by the current process; copying the memory space corresponding to the used points into a mapping table; the mapping table stores the mapping relation between the ID of the coordinate point used by the current process in the grid file and the ID in the process.
Fig. 5 shows the read viewport situation for each process. Each process opens the file and maintains the file pointer in each process, and each process only reads the data belonging to the own viewport to construct the grid topology and make the grid mapping with the help of the metadata, so that the contents processed by each process are not interfered with each other.
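A single process reading only its own viewport might look like the sketch below. It uses plain file seeks on an in-memory buffer rather than MPI file views, and it assumes little-endian 4-byte integers; the function name `read_viewport` and the 8-byte dummy header are hypothetical.

```python
import io
import struct
from typing import BinaryIO, List


def read_viewport(f: BinaryIO, real_start_pos: int,
                  real_end_pos: int) -> List[int]:
    """Read the inclusive byte range [real_start_pos, real_end_pos] as ints."""
    num_bytes = real_end_pos - real_start_pos + 1
    f.seek(real_start_pos)
    raw = f.read(num_bytes)
    if len(raw) != num_bytes:
        raise EOFError("viewport extends past the end of the file")
    count = num_bytes // 4  # assumption: 4-byte little-endian integers
    return list(struct.unpack(f"<{count}i", raw))


# Fake file: an 8-byte header followed by three tetra4 cells (12 point IDs).
data = b"\x00" * 8 + struct.pack("<12i", *range(1, 13))
# Viewport covering the second and third cell: bytes 24 .. 55.
node_id_list = read_viewport(io.BytesIO(data), 8 + 16, 8 + 48 - 1)
print(node_id_list)
```

Each process would call such a routine with its own start and end positions, so no two processes touch the same bytes.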
In the invention, the coordinate point set is used to construct the grid topology. In EnSight Gold, each part shares a single coordinate point set; after the grid is divided into sub-grid data sets that are distributed to different processes, each process must independently maintain its own point set. The invention provides a point set mapping method in which the global point set shared by a part is mapped to a process-local point set; when the data is formally read in to construct grids, a new grid topology is built from the local point set and the global grid cells are mapped into the process. Through the two mappings, the point set of the geometric model part is mapped into each process, which ensures that grids are constructed and mapped correctly and that attribute files are read correctly.
Fig. 6 shows the two mapping processes. The point set of the first mapping is used to construct the grid. The specific scheme is as follows: each process maintains a mapping table pointsIdReflectTable (whose length is the total number of points in the part). When constructing the grid, the first coordinate ID is taken from NODEIDLIST and used as a subscript, and the value at that subscript in the array is set to pointsCnt (initial value 1, incremented as the loop proceeds). For each coordinate ID obtained, the process first checks whether a mapping already exists in pointsIdReflectTable: if not, the mapping is created and the mapping result is then used to construct the grid; if so, the existing mapping result is used directly to construct the new cell topology. After all grid topologies have been built, the mapping table pointsIdReflectTable is complete: a value of 0 indicates no mapping, and a non-zero value together with its subscript forms a mapping.
The point set of the second mapping is stored in the object and is used for reading attribute data. The specific scheme is as follows: each process maintains a mapping table realPointsIdReflectTable (this array cannot be constructed directly, because the number of points used by the current process is known only after all grids have been constructed; an intermediate array is therefore needed to hold the point mapping temporarily, after which its memory is copied into this array). The first coordinate ID is taken from NODEIDLIST and stored in the temporary mapping table as the value, with pointsCnt - 1 as the subscript. When all grids have been constructed, the portion of the temporary mapping table whose length equals pointsCnt is copied into the mapping table realPointsIdReflectTable, so that this mapping table stores only the mapping between the ID of each coordinate point used by the current process in the geometric model file and its ID in the process.
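The two point mappings can be sketched together in Python as follows. This is one reading of the scheme under stated assumptions: point IDs in the file start from 0, the process-local ID of a point is its table value minus 1, and only the entries of the temporary table that were actually filled are copied at the end. The function name `build_point_maps` is hypothetical.

```python
from typing import List, Tuple


def build_point_maps(node_id_list: List[int], num_part_points: int,
                     points_per_cell: int
                     ) -> Tuple[List[List[int]], List[int], List[int]]:
    """Remap global point IDs to process-local IDs while building cells."""
    points_id_reflect_table = [0] * num_part_points  # 0 means "not mapped"
    temp_table = [0] * num_part_points               # local index -> global ID
    points_cnt = 1
    local_cells = []
    for i in range(0, len(node_id_list), points_per_cell):
        cell = []
        for gid in node_id_list[i:i + points_per_cell]:
            if points_id_reflect_table[gid] == 0:         # first time seen
                points_id_reflect_table[gid] = points_cnt
                temp_table[points_cnt - 1] = gid
                points_cnt += 1
            cell.append(points_id_reflect_table[gid] - 1)  # local point ID
        local_cells.append(cell)
    # Keep only the entries that were filled (points used by this process).
    real_points_id_reflect_table = temp_table[:points_cnt - 1]
    return local_cells, points_id_reflect_table, real_points_id_reflect_table


# Two tetra4 cells that share three points.
cells, first_map, second_map = build_point_maps(
    [198, 21, 33, 18, 21, 33, 18, 7], num_part_points=200, points_per_cell=4)
print(cells, second_map)
```

The first table answers "which local point does this file point correspond to", and the second answers the reverse question, which is what attribute reading needs.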
In addition, it should be noted that the parallel construction and mapping of the grid is realized by parsing the metadata of the geometric model file, instead of reading line by line as in the serial I/O scheme. Each process parses partsinftype, obtains the information of the part it is to process, and moves its file pointer within that part. First, using REALSTARTPOS and REALENDPOS, the topologies of all cells to be processed by the current process are read directly into an array NODEIDLIST (this array stores the topology of all cells; that portion of the file consists of consecutive bytes and can therefore be read in directly). The IDs of the cells in the geometric model file are incremented from 0 by default, and the IDs of the cells are likewise incremented from 0 when the serial method reads in the file and constructs the grid. PREBLOCK and CURBLOCK are used to calculate the starting ID and the ending ID of the cells to be processed in the process, and these serve as the loop bounds for grid construction and mapping. The IDs of the coordinate points that make up a single cell are taken out of NODEIDLIST in sequence. Because different processes use different coordinate point sets, the point set in the geometric model file must be mapped in each process separately: each process maintains its own point set and uses it for grid construction and point set mapping. The loop variable K represents the ID of the cell in the geometric model file. After each grid is built, K is inserted into a list whose subscripts start from 0 by default, which maps the ID of the cell in the geometric model into the process; fig. 7 shows the mapping method.
One specific example: suppose process 2 handles a cell of type TETRA4 whose ID in the geometric model file is 1000 and whose original topology is 198, 21, 33, 18 (that is, the four points with these IDs in the geometric model file are connected in order to form a TETRA4 grid). After mapping by the method provided by the invention, the coordinate IDs actually inserted when the cell is built in the process are 0, 1, 2 and 3; the value of K is 1000, and K is inserted into the list, so that the cell with ID 1000 in the geometric model file is mapped to the cell with ID 0 in process 2.
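The cell ID list from this example can be reproduced with a short Python sketch. The vBlock values below are hypothetical, chosen so that process 2 starts at cell 1000; the function name `local_cell_ids` is likewise an assumption.

```python
from typing import List


def local_cell_ids(v_block: List[int], rank: int) -> List[int]:
    """Return the list whose subscript is the process-local cell ID and
    whose value is the cell ID K in the geometric model file."""
    pre_block = sum(v_block[:rank])  # PREBLOCK gives the starting cell ID
    cur_block = v_block[rank]        # CURBLOCK gives the number of cells
    cell_id_list = []
    for k in range(pre_block, pre_block + cur_block):
        # The cell with file ID k would be constructed here.
        cell_id_list.append(k)
    return cell_id_list


v_block = [500, 500, 400]  # hypothetical split; process 2 starts at cell 1000
cell_id_list = local_cell_ids(v_block, 2)
print(cell_id_list[0], len(cell_id_list))
```

The subscript 0 holds the value 1000, which is the mapping described in the example: cell 1000 of the file becomes cell 0 of process 2.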
Further, in the parallel reading method of the grid file provided by the embodiment of the present invention, step S204 reads corresponding attribute data in parallel to obtain an output result, which may include: when the grid attribute is read in parallel, locating the corresponding grid through the ID, the type name and the type ID of the component; the length of the mapping table represents the number of grids; when the coordinate point attributes are read in parallel, positioning the coordinate point used in the current process through the ID of the component; the length of the mapping table represents the number of real points used in the current process.
In the invention, when the grid is constructed, mapping tables between the IDs of the grids and coordinate points in the file and their IDs in the process are built and stored in the Reader object. When attribute data is read in, these mapping tables can be called directly to obtain the mapping relation, so that the attribute data is associated with the corresponding grids and coordinate points. The real IDs of the grids and coordinate points in the process are obtained by parsing the mapping table. When a grid attribute is read in, the corresponding grids can be quickly located through the part ID, the type name and the type ID, and the length of the mapping table indicates the number of grids, so that the attribute data is associated with them. When a coordinate point attribute is read in, the corresponding coordinate points can be quickly located through the part ID, and the length of the mapping table indicates the number of real points used in the process, so that the attribute data is associated with them.
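As a rough illustration of how stored mapping tables can associate attribute data with process-local grids and points, consider the following Python sketch. The class layout, the method names, and the use of an in-memory container `global_values` as a stand-in for attribute data in the file are hypothetical. Only the lookup keys (part ID, type name and type ID for grids; part ID for points) come from the description.

```python
class Reader:
    """Toy stand-in for a Reader object holding the mapping tables."""

    def __init__(self):
        # (part_id, type_name, type_id) -> list of global cell IDs,
        # where the list index is the process-local cell ID.
        self.cell_tables = {}
        # part_id -> list of global point IDs,
        # where the list index is the process-local point ID.
        self.point_tables = {}

    def cell_attribute(self, part_id, type_name, type_id, global_values):
        """Return one attribute value per local cell, in local-ID order."""
        table = self.cell_tables[(part_id, type_name, type_id)]
        return [global_values[g] for g in table]

    def point_attribute(self, part_id, global_values):
        """Return one attribute value per local point, in local-ID order."""
        table = self.point_tables[part_id]
        return [global_values[g] for g in table]


reader = Reader()
reader.cell_tables[(1, "TETRA4", 0)] = [1000, 1001]
reader.point_tables[1] = [198, 21, 33, 18]

pressure = {1000: 0.5, 1001: 0.7}  # hypothetical per-cell attribute
print(reader.cell_attribute(1, "TETRA4", 0, pressure))
```

The length of each returned list equals the length of the corresponding mapping table, which is the number of grids or real points held by the process.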
Based on the same inventive concept, the embodiment of the invention also provides a parallel reading device for grid files. Since the principle by which the device solves the problem is similar to that of the parallel reading method of the grid file, the implementation of the device can refer to the implementation of the method, and repeated description is omitted.
In a specific implementation, the parallel reading device for a grid file provided by the embodiment of the present invention, as shown in fig. 8, specifically includes:
the metadata construction module 11 is used for preprocessing the grid file and constructing metadata;
the grid dividing and distributing module 12 is configured to divide the grid by using information of metadata, and distribute each sub-grid data set obtained by dividing to different processes;
the grid construction and mapping module 13 is used for reading in the sub-grid data set in parallel, constructing grid topology and performing grid mapping to obtain a mapping table;
the data reading module 14 is configured to parse the mapping table, and read in parallel the corresponding attribute data to obtain an output result.
In the parallel reading device for the grid files provided by the embodiment of the invention, the parallel reading of the files can be realized through the interaction of the four modules, the reading speed of large-scale files is greatly improved, the I/O performance is improved, and the memory limitation problem during serial reading is solved.
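The division step performed by the grid dividing and distributing module can be sketched as a simple block partition. The balancing rule below (spreading the remainder over the lowest-ranked processes) and the function name `partition_cells` are assumptions for illustration; the description only requires that the grids of each part be divided according to the number of processes.

```python
def partition_cells(total_cells, num_procs):
    """Split `total_cells` cells of one part into `num_procs` contiguous blocks.

    Returns a list of (pre_block, cur_block) pairs, one per process rank:
      pre_block: cells assigned to all preceding processes,
      cur_block: cells assigned to this process.
    """
    base, extra = divmod(total_cells, num_procs)
    blocks = []
    pre_block = 0
    for rank in range(num_procs):
        # The first `extra` ranks take one additional cell each.
        cur_block = base + (1 if rank < extra else 0)
        blocks.append((pre_block, cur_block))
        pre_block += cur_block
    return blocks


print(partition_cells(10, 4))
```

Each pair gives a process the count of cells before it and the count it owns, which is the information needed to derive its start and end positions in the file.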
For more specific working procedures of the above modules, reference may be made to the corresponding contents disclosed in the foregoing embodiments, and no further description is given here.
Correspondingly, the embodiment of the invention also discloses parallel reading equipment of the grid file, which comprises a processor and a memory; the parallel reading method of the grid file disclosed in the foregoing embodiment is implemented when the processor executes the computer program stored in the memory. For more specific procedures of the above method, reference may be made to the corresponding contents disclosed in the foregoing embodiments, and no further description is given here.
Further, the invention also discloses a computer readable storage medium for storing a computer program; the computer program, when executed by the processor, implements the parallel reading method of the grid file disclosed above. For more specific procedures of the above method, reference may be made to the corresponding contents disclosed in the foregoing embodiments, and no further description is given here.
In this specification, the embodiments are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. The apparatus, device, and storage medium disclosed in the embodiments are described relatively briefly because they correspond to the methods disclosed in the embodiments; for the relevant parts, refer to the description of the method section.
Those of skill in the art would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The parallel reading method, device, equipment and storage medium for grid files provided by the invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the above embodiments is only intended to help in understanding the method and core idea of the invention. Meanwhile, those skilled in the art may, in accordance with the idea of the invention, make changes to the specific embodiments and the scope of application. In view of the above, the content of this description should not be construed as limiting the invention.

Claims (8)

1. A parallel reading method of grid files, comprising:
preprocessing the grid file and constructing metadata;
dividing the grids of each component according to the number of processes according to the information of the metadata to obtain a plurality of sub-grid data sets;
calculating the start-stop positions of each sub-grid data set in the grid file;
distributing each sub-grid data set to different processes according to the start-stop positions of each sub-grid data set in the grid file so as to create a file viewport for each process;
mapping the global point set shared by all the components to the corresponding process local point set in each file viewport to generate a mapping result;
reading in the sub-grid data set in parallel, constructing grid topology by using the mapping result, and mapping the global grid unit into a process to obtain a mapping table;
and analyzing the mapping table, and reading corresponding attribute data in parallel to obtain an output result.
2. The parallel reading method of grid files according to claim 1, wherein preprocessing and metadata construction are performed on the grid files, comprising:
pre-scanning the grid file to obtain scanning data;
and extracting key information of the grid file from the scanning data to construct metadata.
3. The parallel reading-in method of a grid file according to claim 2, wherein calculating start-stop positions of each of the sub-grid data sets in the grid file comprises:
calculating the total number of the sub-grid data sets allocated by the processes before each process;
calculating the initial byte position of the sub-grid data set in the grid file according to the first byte position of the sub-grid data set, the number of points forming the sub-grid data set and the total number of the sub-grid data sets distributed by the processes before each process;
acquiring the number of the sub-grid data sets distributed by the current process;
and calculating the ending byte position of the sub-grid data set in the grid file according to the starting byte position of the sub-grid data set in the grid file, the number of points forming the sub-grid data set and the number of the sub-grid data sets distributed by the current process.
4. A parallel reading method of grid files according to claim 3, further comprising, after obtaining the mapping table:
counting the points used by the current process;
copying the memory space corresponding to the used points into the mapping table; and the mapping table stores the mapping relation between the ID of the coordinate point used by the current process in the grid file and the ID in the process.
5. The parallel reading-in method of grid files according to claim 4, wherein reading-in corresponding attribute data in parallel, obtaining an output result, comprises:
when the grid attribute is read in parallel, locating the corresponding grid through the ID, the type name and the type ID of the component; the length of the mapping table represents the number of grids;
when the coordinate point attributes are read in parallel, positioning the coordinate point used in the current process through the ID of the component; the length of the mapping table represents the number of real points used in the current process.
6. A parallel reading device for a grid file, comprising:
the metadata construction module is used for preprocessing the grid file and constructing metadata;
the grid dividing and distributing module is used for dividing grids of all the components according to the process number according to the information of the metadata to obtain a plurality of sub-grid data sets; calculating the start-stop positions of each sub-grid data set in the grid file; distributing each sub-grid data set to different processes according to the start-stop positions of each sub-grid data set in the grid file so as to create a file viewport for each process;
the grid construction and mapping module is used for mapping the global point set shared by all the components to the corresponding process local point set in each file viewport to generate a mapping result; reading in the sub-grid data set in parallel, constructing grid topology by using the mapping result, and mapping the global grid unit into a process to obtain a mapping table;
and the data reading module is used for analyzing the mapping table and reading corresponding attribute data in parallel to obtain an output result.
7. A parallel reading-in device of a grid file, characterized by comprising a processor and a memory, wherein the processor implements the parallel reading-in method of a grid file according to any one of claims 1 to 5 when executing a computer program stored in the memory.
8. A computer readable storage medium for storing a computer program, wherein the computer program when executed by a processor implements a parallel read-in method of a grid file according to any one of claims 1 to 5.
CN202311029868.XA 2023-08-16 2023-08-16 Parallel reading method, device and equipment for grid files and storage medium Active CN116755636B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311029868.XA CN116755636B (en) 2023-08-16 2023-08-16 Parallel reading method, device and equipment for grid files and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311029868.XA CN116755636B (en) 2023-08-16 2023-08-16 Parallel reading method, device and equipment for grid files and storage medium

Publications (2)

Publication Number Publication Date
CN116755636A CN116755636A (en) 2023-09-15
CN116755636B true CN116755636B (en) 2023-10-27

Family

ID=87948153

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311029868.XA Active CN116755636B (en) 2023-08-16 2023-08-16 Parallel reading method, device and equipment for grid files and storage medium

Country Status (1)

Country Link
CN (1) CN116755636B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103235974A (en) * 2013-04-25 2013-08-07 中国科学院地理科学与资源研究所 Method for improving processing efficiency of massive spatial data
CN103605852A (en) * 2013-11-25 2014-02-26 国家电网公司 Parallel topology method for electromechanical transient real-time simulation for large-scale power network
CN104537125A (en) * 2015-01-28 2015-04-22 中国人民解放军国防科学技术大学 Remote-sensing image pyramid parallel building method based on message passing interface
CN105677488A (en) * 2016-01-12 2016-06-15 中国人民解放军国防科学技术大学 Method for constructing raster image pyramid in hybrid parallel mode
EP3131060A1 (en) * 2015-08-14 2017-02-15 Samsung Electronics Co., Ltd. Method and apparatus for constructing three dimensional model of object
CN106897131A (en) * 2017-02-22 2017-06-27 郑州云海信息技术有限公司 A kind of parallel calculating method and its device for astronomical software Gridding
CN110211234A (en) * 2019-05-08 2019-09-06 上海索辰信息科技有限公司 A kind of grid model sewing system and method
CN111680456A (en) * 2020-04-28 2020-09-18 中国科学院深圳先进技术研究院 Fluid mechanics simulation method, device and storage medium
CN112463360A (en) * 2020-10-29 2021-03-09 空气动力学国家重点实验室 Parallel read-in method for billion-hundred-GB-level grid data file
CN115344383A (en) * 2022-08-16 2022-11-15 浙江工商大学 Streamline visualization parallel acceleration method based on process parallel
CN116303219A (en) * 2023-03-22 2023-06-23 苏州浪潮智能科技有限公司 Grid file acquisition method and device and electronic equipment

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7594075B2 (en) * 2004-10-20 2009-09-22 Seagate Technology Llc Metadata for a grid based data storage system
US8386227B2 (en) * 2010-09-07 2013-02-26 Saudi Arabian Oil Company Machine, computer program product and method to generate unstructured grids and carry out parallel reservoir simulation
KR101358037B1 (en) * 2012-11-28 2014-02-05 한국과학기술정보연구원 Record medium recorded in a structure of file format and directory for massive cfd(computational fuid dynamics) data visualization in parallel, and method for transforming structure of data file format thereof
US9959306B2 (en) * 2015-06-12 2018-05-01 International Business Machines Corporation Partition-based index management in hadoop-like data stores
US10311023B1 (en) * 2015-07-27 2019-06-04 Sas Institute Inc. Distributed data storage grouping
US20210200717A1 (en) * 2019-12-26 2021-07-01 Oath Inc. Generating full metadata from partial distributed metadata

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
基于HDF5的结构网格计算流体动力学程序并行I/O技术;杨丽鹏;车永刚;;计算机应用(第09期);2423-2427 *

Also Published As

Publication number Publication date
CN116755636A (en) 2023-09-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant