CN114003385A - Parallelization method for improving post-processing performance - Google Patents
Parallelization method for improving post-processing performance
- Publication number
- CN114003385A (application CN202111287291.3A)
- Authority
- CN
- China
- Prior art keywords
- file
- mpi
- calling
- function
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/061—Interfaces specially adapted for storage systems: improving I/O performance
- G06F3/0643—Interfaces specially adapted for storage systems: management of files
- G06F9/5044—Allocation of resources, e.g. of the central processing unit [CPU], to service a request, the resource being a machine, considering hardware capabilities
- G06F9/5055—Allocation of resources, e.g. of the central processing unit [CPU], to service a request, the resource being a machine, considering software capabilities
- G06F9/544—Interprogram communication: buffers; shared memory; pipes
- G06F9/547—Interprogram communication: remote procedure calls [RPC]; web services
- G06F16/176—File systems: support for shared access to files; file sharing support
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Devices For Executing Special Programs (AREA)
Abstract
The invention discloses a parallelization method for improving post-processing performance, comprising the following steps: each process opens the file by calling the MPI file-open function; each process moves the globally shared file pointer to its position in the file, determined by its read-address offset, by calling the MPI file-seek function, all processes performing I/O through a single shared file pointer; each process reads its portion of the data from the file into a memory buffer by calling the MPI file-read function; each process closes the file and ends the I/O operation by calling the MPI file-close function; a data partitioning algorithm divides the data into several mutually independent sub-datasets; and the sub-datasets are written into the visualization object collectively in parallel, and the visualization object is sent to the master process for efficient parallel display.
Description
Technical Field
The method is applied to post-processing software in the field of simulation software, and is particularly intended for parallelizing the visualization and data-I/O stages of large-scale data.
Background
Physical and engineering simulation software performs simulation and numerical calculation in many fields using methods such as the finite element method, covering mechanics, fluid dynamics, electromagnetics, optics, acoustics, electrochemistry, chemical engineering, semiconductors, and other domains. The data produced by simulation software is often on the order of gigabytes to hundreds of gigabytes, and when such large results undergo visualization post-processing, reading, processing, and inspecting the data becomes inconvenient and the user often has to wait for a long time.
Current simulation-analysis software reads and displays results on the order of tens of gigabytes slowly, consuming considerable manpower and time. To improve post-processing efficiency for simulation users, a method is therefore needed that completes reading and display quickly and saves the user's time.
Disclosure of Invention
Aiming at the problem that post-processing of large-scale simulation result data takes too long, the invention provides a method for parallelizing post-processing on the multi-core CPUs of a high-performance computer, thereby greatly improving post-processing speed and efficiency.
The invention solves the technical problems through the following technical scheme:
the invention provides a parallelization method for improving post-processing performance, which is characterized by comprising the following steps of:
S1, in the I/O of parallel MPI (Message Passing Interface), each process opens the file by calling the MPI file-open function;
S2, each process moves the globally shared file pointer to its position in the file, determined by its read-address offset, by calling the MPI file-seek function, all processes performing I/O through this single shared file pointer;
S3, each process reads its portion of the data from the file into a memory buffer by calling the MPI file-read function;
S4, each process closes the file and ends its I/O operation by calling the MPI file-close function;
S5, dividing the data into a plurality of mutually independent sub-datasets by using a data partitioning algorithm;
S6, writing the sub-datasets into the visualization object collectively in parallel, and sending the visualization object to the master process for efficient parallel display.
Preferably, in step S1, the first parameter of the file-open function is the communicator, the second parameter is the path and name of the file to open, the third parameter is the file-access mode, the fourth parameter passes information to the I/O implementation as key-value pairs attached to an MPI info object, and the fifth parameter is the globally shared file handle returned by the operation and used by subsequent I/O operations; through the info object of the fourth parameter, hints such as file striping and internal buffer sizes can be passed to optimize MPI I/O.
On the basis of the common knowledge in the field, the above preferred conditions can be combined randomly to obtain the preferred embodiments of the invention.
The positive progress effects of the invention are as follows:
the method provided by the invention can improve the performance of the simulation calculation software from the parallelization point of view aiming at a large number of calculation results of the simulation calculation software, accelerate the post-processing speed of the simulation data from the aspect of combining software and hardware, and avoid that a software user consumes too much time when facing huge data in the post-processing stage, thereby improving the post-processing efficiency while ensuring the accuracy.
Drawings
FIG. 1 is a flow chart of a parallelization method for improving post-processing performance according to a preferred embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
As shown in fig. 1, the present embodiment provides a parallelization method for improving post-processing performance, which is characterized in that it includes the following steps:
in step 101, in the I/O of the parallel MPI (information transfer interface), each process opens a file by calling a file operation function of the MPI.
In step 101, a first parameter of the file operation function represents a communication domain, a second parameter represents a file path and a file name that need to be opened, a third parameter represents a file opening manner, a fourth parameter transmits information to an I/O implementation by attaching a pair of key values to an information object that declares MPI, a fifth parameter represents a global shared file pointer, is a handle returned by the I/O operation, and can be used for a subsequent I/O operation, wherein in the object of the fourth parameter, information such as file fragments and an internal buffer size can be transmitted for optimizing the I/O operation of the MPI.
In step 102, each process moves the globally shared file pointer to its position in the file, determined by its read-address offset, by calling the MPI file-seek function; all processes perform I/O through this single shared file pointer.
In step 103, each process reads its portion of the data from the file into a memory buffer by calling the MPI file-read function.
In step 104, each process closes the file and ends its I/O operation by calling the MPI file-close function.
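The shared-file-pointer routines of the MPI standard map directly onto steps 101 to 104. The following C sketch is illustrative only: the file name, datatype, element count, and the `cb_buffer_size` hint are assumptions rather than details from the patent, and `MPI_File_read_ordered` is used so that the single shared pointer hands each rank its block in rank order.

```c
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Step 101: every process opens the same file collectively,
       passing optimization hints through an MPI info object. */
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "cb_buffer_size", "16777216");  /* collective-buffering hint (assumed value) */
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "result.dat", MPI_MODE_RDONLY, info, &fh);

    /* Step 102: collectively position the single shared file pointer. */
    MPI_File_seek_shared(fh, 0, MPI_SEEK_SET);

    /* Step 103: each rank reads its block; read_ordered advances the
       shared pointer so rank 0's block precedes rank 1's, and so on. */
    const int count = 1024;  /* elements per rank (placeholder) */
    double *buf = malloc(count * sizeof(double));
    MPI_Status status;
    MPI_File_read_ordered(fh, buf, count, MPI_DOUBLE, &status);

    /* Step 104: collective close ends the I/O operation. */
    MPI_File_close(&fh);
    MPI_Info_free(&info);
    free(buf);
    MPI_Finalize();
    return 0;
}
```

Run under an MPI launcher, e.g. `mpirun -np 4 ./reader`, against an existing `result.dat`.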
In step 105, a data partitioning algorithm divides the data into several mutually independent sub-datasets.
During partitioning, the load is balanced across processes so that delay caused by waiting is reduced. Each processor handles a different data subset with its own processing pipeline; in the visualization stage, once each process has obtained its result, the sub-results are sent to the master process for display.
In step 106, the sub-datasets are written into the visualization object collectively in parallel, and the visualization object is sent to the master process for efficient parallel display.
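The gather of per-process sub-results to the master process in step 106 corresponds to an MPI collective. This C sketch is an assumption about the mechanism, since the patent does not name the collective used; it handles sub-results of unequal length with `MPI_Gatherv`, and the sub-result sizes are placeholders:

```c
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Each process produced a sub-result of (possibly) different length. */
    int my_count = 100 + rank;                 /* placeholder sub-result size */
    float *sub = malloc(my_count * sizeof(float));
    for (int i = 0; i < my_count; i++) sub[i] = (float)rank;

    /* Master learns every sub-result size, then gathers the
       variable-length blocks into one buffer for display. */
    int *counts = NULL, *displs = NULL;
    float *all = NULL;
    if (rank == 0) counts = malloc(nprocs * sizeof(int));
    MPI_Gather(&my_count, 1, MPI_INT, counts, 1, MPI_INT, 0, MPI_COMM_WORLD);
    if (rank == 0) {
        displs = malloc(nprocs * sizeof(int));
        int total = 0;
        for (int r = 0; r < nprocs; r++) { displs[r] = total; total += counts[r]; }
        all = malloc(total * sizeof(float));
    }
    MPI_Gatherv(sub, my_count, MPI_FLOAT,
                all, counts, displs, MPI_FLOAT, 0, MPI_COMM_WORLD);
    /* Rank 0 now holds the assembled result and can render it. */

    free(sub);
    if (rank == 0) { free(counts); free(displs); free(all); }
    MPI_Finalize();
    return 0;
}
```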
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that these are by way of example only, and that the scope of the invention is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the spirit and scope of the invention, and these changes and modifications are within the scope of the invention.
Claims (2)
1. A parallelization method for improving post-processing performance, comprising the following steps:
S1, in parallel MPI I/O, each process opens the file by calling the MPI file-open function;
S2, each process moves the globally shared file pointer to its position in the file, determined by its read-address offset, by calling the MPI file-seek function, all processes performing I/O through this single shared file pointer;
S3, each process reads its portion of the data from the file into a memory buffer by calling the MPI file-read function;
S4, each process closes the file and ends its I/O operation by calling the MPI file-close function;
S5, dividing the data into a plurality of mutually independent sub-datasets by using a data partitioning algorithm;
S6, writing the sub-datasets into the visualization object collectively in parallel, and sending the visualization object to the master process for efficient parallel display.
2. The parallelization method for improving post-processing performance according to claim 1, wherein in step S1, the first parameter of the file-open function is the communicator, the second parameter is the path and name of the file to open, the third parameter is the file-access mode, the fourth parameter passes information to the I/O implementation as key-value pairs attached to an MPI info object, and the fifth parameter is the globally shared file handle returned by the operation and used by subsequent I/O operations; through the info object of the fourth parameter, hints such as file striping and internal buffer sizes can be passed to optimize MPI I/O.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111287291.3A CN114003385A (en) | 2021-11-02 | 2021-11-02 | Parallelization method for improving post-processing performance |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111287291.3A CN114003385A (en) | 2021-11-02 | 2021-11-02 | Parallelization method for improving post-processing performance |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114003385A true CN114003385A (en) | 2022-02-01 |
Family
ID=79926309
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111287291.3A Withdrawn CN114003385A (en) | 2021-11-02 | 2021-11-02 | Parallelization method for improving post-processing performance |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114003385A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116185662A (en) * | 2023-02-14 | 2023-05-30 | 国家海洋环境预报中心 | Asynchronous parallel I/O method based on NetCDF and non-blocking communication |
CN116185662B (en) * | 2023-02-14 | 2023-11-17 | 国家海洋环境预报中心 | Asynchronous parallel I/O method based on NetCDF and non-blocking communication |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11023206B2 (en) | Dot product calculators and methods of operating the same | |
EP3144805B1 (en) | Method and processing apparatus for performing arithmetic operation | |
Wang et al. | Workload analysis and efficient OpenCL-based implementation of SIFT algorithm on a smartphone | |
WO2017185393A1 (en) | Apparatus and method for executing inner product operation of vectors | |
KR102371844B1 (en) | Computing method applied to artificial intelligence chip, and artificial intelligence chip | |
WO2019019926A1 (en) | System parameter optimization method, apparatus and device, and readable medium | |
CN115880132A (en) | Graphics processor, matrix multiplication task processing method, device and storage medium | |
CN111651206A (en) | Device and method for executing vector outer product operation | |
CN114003385A (en) | Parallelization method for improving post-processing performance | |
CN114048816B (en) | Method, device, equipment and storage medium for sampling data of graph neural network | |
Xiao et al. | Image Sobel edge extraction algorithm accelerated by OpenCL | |
CN113538687B (en) | Finite element visualization method, system, device and storage medium based on VTK | |
US9032405B2 (en) | Systems and method for assigning executable functions to available processors in a multiprocessing environment | |
Tan et al. | Parallel particle swarm optimization algorithm based on graphic processing units | |
Sun et al. | Efficient knowledge graph embedding training framework with multiple gpus | |
CN106708499B (en) | Analysis method and analysis system of drawing processing program | |
He et al. | An optimal parallel implementation of Markov Clustering based on the coordination of CPU and GPU | |
CN116579914B (en) | Execution method and device of graphic processor engine, electronic equipment and storage medium | |
Kłopotek et al. | Solving systems of polynomial equations on a GPU | |
US11630667B2 (en) | Dedicated vector sub-processor system | |
Koprawi | Parallel Computation in Uncompressed Digital Images Using Computer Unified Device Architecture and Open Computing Language | |
CN112328960B (en) | Optimization method and device for data operation, electronic equipment and storage medium | |
CN117349190A (en) | Memory allocation method, accelerator, and storage medium | |
CN117556273A (en) | Method and device for calculating contrast loss through multiple graphic processors | |
CN118051264A (en) | Matrix processing method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | ||
Application publication date: 20220201 |