CN114995754A - High-performance read-write method for single scientific big data HDF5 file - Google Patents

High-performance read-write method for single scientific big data HDF5 file

Info

Publication number
CN114995754A
Authority
CN
China
Prior art keywords
dataset
buf
read
data
size
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210585275.0A
Other languages
Chinese (zh)
Other versions
CN114995754B (en)
Inventor
张承龙
张一�
何旭
李想
朱中柱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of High Energy Physics of CAS
Original Assignee
Institute of High Energy Physics of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of High Energy Physics of CAS
Priority to CN202210585275.0A
Publication of CN114995754A
Application granted
Publication of CN114995754B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0643 Management of files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0656 Data buffering arrangements

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of computers, in particular to a high-performance read-write method for a single scientific big data HDF5 file. The method parallelizes at fine granularity over the datasets inside a single HDF5 file: each process handles a portion of the datasets in a batch and moves on to its next portion only after finishing the current one, and all processes advance to the next batch only after every process has finished the same batch.

Description

High-performance read-write method for single scientific big data HDF5 file
Technical Field
The invention relates to the technical field of computers, in particular to a high-performance read-write method for a single scientific big data HDF5 file.
Background
The Hierarchical Data Format (HDF) is a file format developed by the National Center for Supercomputing Applications (NCSA) in the United States for storing and organizing large amounts of data. It is mainly used to store the various types of scientific data generated on different computing platforms, and it offers parallel I/O capability, cross-platform support, and easy extensibility. It has become the standard format of the Earth Observing System (EOS) Data and Information System and, because it is self-describing, general, flexible, and extensible, it is widely used in scientific big data fields such as physics, biology, chemistry, environmental science, materials, earth science, aviation, and oceanography to store and process complex scientific data.
To simplify data acquisition, transmission, processing, and preliminary experiments, scientific experiments generally aggregate their data into a few files, which makes a single HDF5 file very large. The HDF5 file is stored in a disk file system. Because the disk is a slow storage medium and the file is large, a scientific big data software system spends a great deal of time waiting for read and write operations on the HDF5 file.
In the prior art, a single HDF5 file is mainly read and written serially. Consistency is difficult to maintain under parallel I/O, and no method is available for improving the read-write performance of a single HDF5 file through parallel I/O, so a scientific big data software system spends a large amount of time reading and writing a single HDF5 file.
Therefore, a high-performance read-write method for a single scientific big data HDF5 file is needed. Such a method should address the problems of coarse-grained parallel read-write that takes the whole file as the unit, low parallel efficiency of the multi-core processor and the disk file system, and complex synchronization among processes, and thereby indirectly improve the efficiency of scientific experiments. Parallelizing at the finer granularity of the datasets inside a single HDF5 file addresses these problems, improves the parallel efficiency of the multi-core processor and the disk file system, and significantly improves the read-write performance of a single HDF5 file.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a high-performance read-write method for a single scientific big data HDF5 file. It addresses three problems that seriously restrict the efficiency of scientific experiments: coarse-grained parallel read-write that takes the whole file as the unit, low parallel efficiency of the multi-core processor and the disk file system, and complex synchronization among processes. By parallelizing at the finer granularity of the datasets inside a single HDF5 file, the method improves the parallel efficiency of the multi-core processor and the disk file system and significantly improves the read-write performance of a single HDF5 file.
To achieve this purpose, the invention provides a high-performance read-write method for a single scientific big data HDF5 file. The read and write methods are similar in principle; the read method comprises the following steps:
S1: set the number of processes in the communication domain as size, and record the number of the current process as ID;
S2: open the HDF5 file using parallel I/O;
S3: obtain the total number of datasets in the HDF5 file, recorded as count; obtain the byte size of a single dataset in the HDF5 file, recorded as M;
S4: set the number of datasets held in each process's send buffer as N; allocate the send buffer send_buf of each process with size nbytes = N × M; the total number of datasets processed by all processes in each batch is NN = N × size;
S5: determine whether the process is process 0; if so, allocate a memory space array of size count × M for saving the read data, and allocate a receive buffer recv_buf of size NN × M; if not, take no action;
S6: set the iteration start start1 = ID × N, the iteration end end1 = int(count/NN) × NN, and the iteration variable x1 = start1; determine whether x1 is less than end1;
when the result of S6 is yes, the following steps are performed:
S6-1: set the iteration start start2 = 0, the iteration end end2 = N, and the iteration variable x2 = start2;
S6-2: determine whether x2 is less than end2;
S6-3: when the result of S6-2 is yes, read the (x1 + x2)-th dataset from storage, copy it to the x2-th dataset position of the send buffer send_buf, execute x2 = x2 + 1, and return to S6-2;
S6-4: when the result of S6-2 is no, perform a gather operation on the buffered data of all processes, namely Gather(send_buf, recv_buf);
S7: determine whether the process is process 0;
S7-1: when the result of S7 is yes, store the data in the receive buffer recv_buf at the address in array where the x1-th dataset begins, execute x1 = x1 + NN, and return to the judgment in S6;
S7-2: when the result of S7 is no, return directly to S6;
when the result of S6 is no, the following steps are performed:
S8: determine whether the process is process 0;
when the result of S8 is yes, serially read the remaining datasets and store them in array;
when the result of S8 is no, do nothing and continue;
S9: close the HDF5 file;
S10: determine whether the process is a non-0 process;
when the result of S10 is yes, execute exit(0);
when the result of S10 is no, the entire algorithm ends.
According to the invention, each process handles a portion of the datasets in a batch and moves on to its next portion only after finishing the current one, and the next batch is processed only after all processes have finished the same batch.
Compared with the prior art, the invention parallelizes at the finer granularity of the datasets inside a single HDF5 file, which addresses the problems in the prior art, improves the parallel efficiency of the multi-core processor and the disk file system, and significantly improves the read-write performance of a single HDF5 file.
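The batching described above can be illustrated with a small index calculation. The following is a minimal sketch, not the patented implementation: the function name batch_plan and the sample values are hypothetical, and it assumes the iteration end is int(count/NN) × NN so that every process runs the same number of batches.

```python
def batch_plan(count, size, N):
    """Return which dataset indices each process reads in each batch,
    plus the leftover indices that process 0 reads serially.

    count: total number of datasets; size: number of processes;
    N: datasets held in one send buffer (all names follow the steps above).
    """
    NN = N * size                  # datasets handled by all processes per batch
    end1 = (count // NN) * NN      # assumed iteration end: int(count/NN) x NN
    plan = {}
    for rank in range(size):
        batches = []
        x1 = rank * N              # start1 = ID x N
        while x1 < end1:
            batches.append(list(range(x1, x1 + N)))
            x1 += NN               # advance to the same slot in the next batch
        plan[rank] = batches
    remainder = list(range(end1, count))
    return plan, remainder


# Hypothetical sample: 10 datasets, 2 processes, 2 datasets per send buffer.
plan, remainder = batch_plan(count=10, size=2, N=2)
```

With these sample values, process 0 reads datasets 0 to 1 and then 4 to 5, process 1 reads 2 to 3 and then 6 to 7, and datasets 8 and 9 are left for the serial tail.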
Drawings
FIG. 1 is a schematic diagram of the logical structure of the HDF5 file according to the present invention;
FIG. 2 is a first schematic diagram of the algorithm of the present invention;
FIG. 3 is a schematic diagram of the algorithm flow of the present invention;
Detailed Description
The invention will now be further described with reference to the accompanying drawings. Referring to FIG. 1 to FIG. 3, the invention provides a high-performance read-write method for a single scientific big data HDF5 file. The read and write methods are similar in principle; the read method comprises the following steps:
S1: set the number of processes in the communication domain as size, and record the number of the current process as ID;
S2: open the HDF5 file using parallel I/O;
S3: obtain the total number of datasets in the HDF5 file, recorded as count; obtain the byte size of a single dataset in the HDF5 file, recorded as M;
S4: set the number of datasets held in each process's send buffer as N; allocate the send buffer send_buf of each process with size nbytes = N × M; the total number of datasets processed by all processes in each batch is NN = N × size;
S5: determine whether the process is process 0; if so, allocate a memory space array of size count × M for storing the read data, and allocate a receive buffer recv_buf of size NN × M; if not, take no action in this step;
S6: set the iteration start start1 = ID × N, the iteration end end1 = int(count/NN) × NN, and the iteration variable x1 = start1; determine whether x1 is less than end1;
when the result of S6 is yes, the following steps are performed:
S6-1: set the iteration start start2 = 0, the iteration end end2 = N, and the iteration variable x2 = start2;
S6-2: determine whether x2 is less than end2;
S6-3: when the result of S6-2 is yes, read the (x1 + x2)-th dataset from storage, copy it to the x2-th dataset position of the send buffer send_buf, execute x2 = x2 + 1, and return to S6-2;
S6-4: when the result of S6-2 is no, perform a gather operation on the buffered data of all processes, namely Gather(send_buf, recv_buf);
S7: determine whether the process is process 0;
S7-1: when the result of S7 is yes, store the data in the receive buffer recv_buf at the address in array where the x1-th dataset begins, execute x1 = x1 + NN, and return to the judgment in S6;
S7-2: when the result of S7 is no, return directly to S6;
when the result of S6 is no, the following steps are performed:
S8: determine whether the process is process 0;
when the result of S8 is yes, serially read the remaining datasets and store them in array;
when the result of S8 is no, do nothing and continue;
S9: close the HDF5 file;
S10: determine whether the process is a non-0 process;
when the result of S10 is yes, execute exit(0);
when the result of S10 is no, the entire algorithm ends.
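Step S7-1 can copy recv_buf into array at the offset of the x1-th dataset because a gather concatenates the send buffers in process order, and process ID fills its buffer starting at dataset ID × N within the batch. The sketch below checks this layout with plain byte strings. It is an illustration under assumptions: the gather helper only imitates the ordering of a collective gather, and the sizes and the dataset_bytes helper are hypothetical.

```python
M = 4          # bytes per dataset (hypothetical)
N = 2          # datasets per send buffer
size = 3       # number of processes
NN = N * size  # datasets per batch


def dataset_bytes(i):
    """Stand-in for reading dataset i: M bytes, each equal to i."""
    return bytes([i]) * M


def gather(send_bufs):
    """Imitate a collective gather: concatenate buffers in process order."""
    return b"".join(send_bufs)


batch_start = NN  # x1 of process 0 in the second batch
send_bufs = []
for rank in range(size):
    x1 = batch_start + rank * N  # this process's x1 in the same batch
    send_buf = b"".join(dataset_bytes(x1 + x2) for x2 in range(N))
    send_bufs.append(send_buf)

recv_buf = gather(send_bufs)

# The gathered buffer holds datasets batch_start .. batch_start + NN - 1 in order.
expected = b"".join(dataset_bytes(i) for i in range(batch_start, batch_start + NN))
```

Because recv_buf is already in dataset order, process 0 needs only one contiguous copy of NN × M bytes per batch.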
Example:
The method runs mainly on platforms such as servers, desktops, and notebooks. Assuming the HDF5 file is named file.h5, the principle of the invention is described below using the example of storing each picture as a dataset and reading the HDF5 file (FIG. 1 shows the logical structure of an HDF5 file).
All pictures are stored as a 3D array (each index along the third dimension of the 3D array is regarded as a dataset). The procedure for writing a single HDF5 file is analogous. As shown in FIG. 2 and FIG. 3, the specific algorithm steps are as follows:
1. The number of processes in the communication domain is size.
2. The number of the current process is ID.
3. Open file.h5 using MPI-IO.
4. Get the total number of datasets in file.h5, recorded as count.
5. Get the byte size of a single dataset in file.h5, recorded as M.
6. Set the number of datasets held in each process's send buffer, recorded as N.
7. Allocate the send buffer (send_buf) of each process, with size nbytes = N × M.
8. The total number of datasets processed by all processes in each batch is NN = N × size.
9. If the process is process 0, allocate a memory space array of size count × M for storing the read data.
10. If the process is process 0, allocate the receive buffer recv_buf with size NN × M.
11. Iteration start: start1 = ID × N
12. Iteration end: end1 = int(count/NN) × NN
13. Iteration variable: x1 = start1
14. If x1 < end1, all processes perform the following in parallel:
14.1. Iteration start: start2 = 0
14.2. Iteration end: end2 = N
14.3. Iteration variable: x2 = start2
14.4. If x2 < end2, the following is performed:
14.4.1. Read the (x1 + x2)-th dataset from storage and copy it to the x2-th dataset position of the send buffer send_buf.
14.4.2. x2 = x2 + 1
14.4.3. Jump to step 14.4.
14.5. Perform a gather operation on the buffered data of all processes: Gather(send_buf, recv_buf).
14.6. If the process is process 0, store the data in the receive buffer recv_buf at the address in array where the x1-th dataset begins.
14.7. x1 = x1 + NN
14.8. Jump to step 14.
15. If the process is process 0, serially read the remaining datasets and store them in array.
16. Close file.h5.
17. If the process is not process 0, execute exit(0).
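The steps above can be traced end to end with a single-process simulation. This is a sketch under stated assumptions rather than the patented implementation: a list of byte strings stands in for file.h5, the processes are simulated in a loop instead of running under MPI, a byte-string join stands in for Gather, and the iteration end is taken as int(count/NN) × NN. The function name simulated_parallel_read and the sample data are hypothetical.

```python
def simulated_parallel_read(datasets, size, N):
    """Simulate steps 1 to 17 for `size` processes in one Python process."""
    count = len(datasets)            # step 4
    M = len(datasets[0])             # step 5: bytes per dataset
    NN = N * size                    # step 8
    array = bytearray(count * M)     # step 9 (held by process 0)
    end1 = (count // NN) * NN        # step 12 (assumed form)
    x1 = [rank * N for rank in range(size)]  # steps 11 and 13, one per process

    while x1[0] < end1:              # step 14: same trip count for every process
        send_bufs = []
        for rank in range(size):
            send_buf = bytearray(N * M)              # step 7
            for x2 in range(N):                      # steps 14.1 to 14.4
                send_buf[x2 * M:(x2 + 1) * M] = datasets[x1[rank] + x2]
            send_bufs.append(bytes(send_buf))
        recv_buf = b"".join(send_bufs)               # step 14.5: stand-in for Gather
        start = x1[0] * M                            # step 14.6 on process 0
        array[start:start + NN * M] = recv_buf
        for rank in range(size):                     # step 14.7 on every process
            x1[rank] += NN

    for i in range(end1, count):     # step 15: serial tail on process 0
        array[i * M:(i + 1) * M] = datasets[i]
    return bytes(array)


# Hypothetical sample: 11 datasets of 8 bytes each, 3 processes, N = 2.
datasets = [bytes([i]) * 8 for i in range(11)]
result = simulated_parallel_read(datasets, size=3, N=2)
```

In this sample one full batch of six datasets is gathered, and the remaining five datasets are read by the serial tail, so a larger N or more processes shifts work from the tail into the parallel batches only when count is large relative to NN.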
The above are only preferred embodiments of the present invention and serve only to help in understanding the method and its core idea. The scope of the present invention is not limited to the above embodiments; all technical solutions that follow the idea of the present invention fall within its scope. It should be noted that modifications and adaptations made by those skilled in the art without departing from the principles of the present invention should also be considered within the scope of the present invention.
Overall, the invention addresses the low efficiency and complex operation that arise in the prior art when experimental data stored in a single HDF5 file is read and written with coarse-grained parallelism that takes the whole file as the unit. Parallelizing at the finer granularity of the datasets inside a single HDF5 file improves the parallel efficiency of the multi-core processor and the disk file system, significantly improves the read-write performance of a single HDF5 file, and thereby improves the efficiency of scientific experiments.

Claims (1)

1. A high-performance read-write method for a single scientific big data HDF5 file, characterized in that the read and write methods are similar in principle, wherein the read method comprises the following steps:
S1: setting the number of processes in the communication domain as size, and recording the number of the current process as ID;
S2: opening the HDF5 file using parallel I/O;
S3: obtaining the total number of datasets in the HDF5 file, recorded as count; obtaining the byte size of a single dataset in the HDF5 file, recorded as M;
S4: setting the number of datasets held in each process's send buffer as N; allocating the send buffer send_buf of each process with size nbytes = N × M; the total number of datasets processed by all processes in each batch being NN = N × size;
S5: determining whether the process is process 0; if so, allocating a memory space array of size count × M for storing the read data, and allocating a receive buffer recv_buf of size NN × M; if not, taking no action in this step;
S6: setting the iteration start start1 = ID × N, the iteration end end1 = int(count/NN) × NN, and the iteration variable x1 = start1, and determining whether x1 is less than end1;
when the result of S6 is yes, the following steps are executed:
S6-1: setting the iteration start start2 = 0, the iteration end end2 = N, and the iteration variable x2 = start2;
S6-2: determining whether x2 is less than end2;
S6-3: when the result of S6-2 is yes, reading the (x1 + x2)-th dataset from storage, copying it to the x2-th dataset position of the send buffer send_buf, executing x2 = x2 + 1, and returning to S6-2;
S6-4: when the result of S6-2 is no, performing a gather operation on the buffered data of all processes, namely Gather(send_buf, recv_buf);
S7: determining whether the process is process 0;
S7-1: when the result of S7 is yes, storing the data in the receive buffer recv_buf at the address in array where the x1-th dataset begins, executing x1 = x1 + NN, and returning to the judgment in S6;
S7-2: when the result of S7 is no, returning directly to S6;
when the result of S6 is no, the following steps are executed:
S8: determining whether the process is process 0;
when the result of S8 is yes, serially reading the remaining datasets and storing them in array;
when the result of S8 is no, doing nothing and continuing;
S9: closing the HDF5 file;
S10: determining whether the process is a non-0 process;
when the result of S10 is yes, executing exit(0);
when the result of S10 is no, the entire algorithm ends.
CN202210585275.0A 2022-05-26 2022-05-26 High-performance read-write method for single scientific big data HDF5 file Active CN114995754B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210585275.0A CN114995754B (en) 2022-05-26 2022-05-26 High-performance read-write method for single scientific big data HDF5 file

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210585275.0A CN114995754B (en) 2022-05-26 2022-05-26 High-performance read-write method for single scientific big data HDF5 file

Publications (2)

Publication Number Publication Date
CN114995754A 2022-09-02
CN114995754B 2022-12-16

Family

ID=83028584

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210585275.0A Active CN114995754B (en) 2022-05-26 2022-05-26 High-performance read-write method for single scientific big data HDF5 file

Country Status (1)

Country Link
CN (1) CN114995754B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9286261B1 (en) * 2011-11-14 2016-03-15 Emc Corporation Architecture and method for a burst buffer using flash technology
CN106708848A (en) * 2015-11-12 2017-05-24 国核(北京)科学技术研究院有限公司 Request processing service-based HDF5 file multi-thread access method
CN110275732A (en) * 2019-05-28 2019-09-24 上海交通大学 The Parallel Implementation method of particle in cell method on ARMv8 processor

Also Published As

Publication number Publication date
CN114995754B (en) 2022-12-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant