CN101957840B - Storage and optimization method of MPI (Message Passing Interface) parallel data - Google Patents


Info

Publication number
CN101957840B
CN101957840B · Application CN2010102818399A
Authority
CN
China
Prior art keywords
file
mpi
function
fsblksize
offset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN2010102818399A
Other languages
Chinese (zh)
Other versions
CN101957840A (en)
Inventor
张伟涛
王道邦
韩双牛
李焰
肖建国
方仑
周泽湘
谭毓安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING TOYOU FEIJI ELECTRONICS Co Ltd
Original Assignee
BEIJING TOYOU FEIJI ELECTRONICS Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING TOYOU FEIJI ELECTRONICS Co Ltd filed Critical BEIJING TOYOU FEIJI ELECTRONICS Co Ltd
Priority to CN2010102818399A priority Critical patent/CN101957840B/en
Publication of CN101957840A publication Critical patent/CN101957840A/en
Application granted granted Critical
Publication of CN101957840B publication Critical patent/CN101957840B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention relates to a storage optimization method for MPI (Message Passing Interface) parallel data. The method comprises the following steps: (1) wrapping the MPI functions called on a target file to be written, namely file open, file write and file close, as same-parameter, differently named functions; (2) in the file-write function wrapped in step (1), using the file system block size fsBlkSize, the offset of the current operation data and the number of bytes to write (count), obtaining the distance X from offset to the next integral multiple of fsBlkSize and the distance Y from the tail offset formed by offset+count back to the previous integral multiple of fsBlkSize, the wrapped file-write function then selecting different processing modes according to the obtained X and Y values; and (3) replacing all MPI calls on target files to be written in the application with the functions wrapped in step (1). Without any modification to the file system, the invention guarantees the correctness of write results at acceptable efficiency through only minor changes at the application layer.

Description

An MPI parallel data storage optimization method
Technical field
The present invention relates to a storage optimization method, and in particular to an MPI parallel data storage optimization method, belonging to the field of computer parallel processing.
Background technology
The parallel processing capability of computers has important application value in many fields. MPI (Message Passing Interface) provides the most widely used parallel programming environment for programmers today, with good support for scalable parallel computers, networks of workstations with distributed storage, and cluster systems. However, in practical applications it has been found that when some parallel file systems execute MPI_File_write_at calls for multi-process concurrent writes to the same target file across multiple nodes, the head and tail offsets of the data operated on by some processes may not fall on integral multiples of the file system block size, which can make the final written result inconsistent.
Solving this problem within the parallel file system itself requires reworking the overall architecture of the file system, demanding a large investment of manpower and material at high cost. Alternatively, before each write operation at the application layer, the programmer can ensure in advance that the head and tail offsets of the data to be written all fall on integral multiples of the file system block size; this permits fairly efficient concurrent writes, but requires programmers to keep the issue constantly in mind, which invisibly increases the programming workload and weakens the flexibility of the program.
Summary of the invention
The objective of the present invention is, in view of the above problem in the prior art, to provide an optimization method at the application layer that guarantees the consistency of the final written result when some parallel file systems execute MPI multi-process concurrent write operations on the same target file across multiple nodes.
In some parallel file systems, when multiple concurrent processes perform write operations on the same target file, each process has, as shown in Fig. 1, two main parameters: the data offset (offset) and the number of bytes to write (count). Suppose the block size of the current file system is fsBlkSize (bytes). For a process A, the head and tail offsets of the data it will write are distributed as shown in Fig. 1: the distance from offset to the next integral multiple of fsBlkSize is X (bytes), and the distance from the tail offset formed by offset+count back to the previous integral multiple of fsBlkSize is Y (bytes). The length of the remaining middle data segment is therefore a nonnegative integer multiple of fsBlkSize. Every process in the concurrent group can be represented by process A in the same way; only their individual X and Y values differ, each lying between 0 and fsBlkSize. In view of this analysis, the solution adopted by the present invention is to wrap the MPI functions involved in writing the target file (MPI_File_open, MPI_File_write_at and MPI_File_close) as same-parameter, differently named functions, and to replace all MPI operations on target files in the application (file open, file write and file close) with the wrapped functions (for example, MPI_File_open_Ex, MPI_File_write_at_Ex and MPI_File_close_Ex respectively). During a concurrent write, the wrapped functions adopt different processing modes according to the values of X and Y determined by the head and tail offsets.
The invention provides an MPI parallel data storage optimization method, comprising the following steps:
Step 1: wrap the MPI function calls involved in writing the target file, namely file open MPI_File_open, file write MPI_File_write_at and file close MPI_File_close, as same-parameter, differently named functions;
Step 2: in the file-write function wrapped in step 1, let the block size read from the current file system be fsBlkSize; from the offset of the current operation data, offset, and the number of bytes to write, count, obtain, by the quantitative relation between offset and fsBlkSize, the distance X from offset to the next integral multiple of fsBlkSize, and, by the quantitative relation between offset+count and fsBlkSize, the distance Y from the tail offset formed by offset+count back to the previous integral multiple of fsBlkSize, where fsBlkSize, X and Y are all in bytes; the wrapped file-write function adopts different processing modes according to the resulting X and Y values:
(1) if the head and tail offsets of the data operated on by the current process both fall on integral multiples of the file system block size, that is, X and Y are both zero, the original MPI_File_write_at function is still called to perform the concurrent write operation;
(2) if the offset value of the current process does not fall on an integral multiple of the file system block size, that is, X is nonzero, the data of length X at the head is temporarily copied into a staging area; before the process performs the close operation on the target file, this staged content is written serially to its physical location in the file using the file-lock mechanism;
(3) if the offset+count value of the current process does not fall on an integral multiple of the file system block size, that is, Y is nonzero, the data of length Y at the tail is temporarily copied into a staging area; before the process performs the close operation on the target file, this staged content is written serially to its physical location in the file using the file-lock mechanism;
(4) whatever the values of X and Y, as long as the middle data segment, whose length is an integral multiple of fsBlkSize, has nonzero length, that is, at least fsBlkSize, it must be processed; the processing mode is to still call the original MPI_File_write_at function to perform the concurrent write operation;
Step 3: replace all MPI function calls involved in target files to be written in the application, namely MPI_File_open, MPI_File_write_at and MPI_File_close, with the functions wrapped in step 1; after the replacement, all operations on target files being written are performed by calling the wrapped functions.
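As a sketch of the arithmetic in step 2: the patent defines only the quantities X and Y, not code, so the helper below and its name are illustrative assumptions.

```c
#include <assert.h>

/* Illustrative sketch: distance X from offset up to the next multiple of
 * fsBlkSize (0 if already aligned), and distance Y from offset+count back
 * down to the previous multiple (0 if aligned). Not code from the patent. */
static void head_tail_distances(long long offset, long long count,
                                long long fsBlkSize,
                                long long *X, long long *Y)
{
    long long head = offset % fsBlkSize;          /* bytes past the boundary */
    *X = head ? fsBlkSize - head : 0;             /* bytes up to next boundary */
    *Y = (offset + count) % fsBlkSize;            /* bytes past last boundary  */
}
```

With fsBlkSize = 4096, offset = 1000 and count = 10000, this gives X = 3096 and Y = 2808, leaving a middle segment of exactly one block, consistent with the requirement that the middle length be a multiple of fsBlkSize.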
Beneficial effect
The beneficial effects of the invention are:
(1) Simple modification. No modification to the parallel file system is required; with only minor changes at the application layer, the correctness of the write result can be guaranteed at acceptable efficiency.
(2) Easy application. The solution provided is easy to implement and can readily be packaged into library functions of different styles, meeting the programming needs of different users.
(3) Good compatibility. All parameters involved in this scheme come from the original MPI functions, so the wrapped library functions are well compatible with the original MPI functions and convenient to substitute.
Description of drawings
Fig. 1---schematic diagram of the data distribution of a process write operation.
Fig. 2---processing flowchart of the wrapped MPI_File_open_Ex function.
Fig. 3---processing flowchart of the wrapped MPI_File_write_at_Ex function.
Fig. 4---processing flowchart of the wrapped MPI_File_close_Ex function.
Embodiment
The present invention is described in further detail below with reference to the accompanying drawings and an embodiment.
A region is allocated in memory to stage the small blocks of data whose head or tail offsets do not fall on integral multiples of the file system block size (that is, the X and Y parts). Two structures are defined here: rest_data_t and rest_data_group_t. rest_data_t stores a pointer to the staging location of the unaligned data, together with its offset and length in the target file. rest_data_group_t, built on the rest_data_t structure, encapsulates cnt, lockHandle and rest_data_t: the first two respectively record the number of valid rest_data_t entries the current process must handle under the lock and the file handle used for locking, while rest_data_t can be defined as an array whose elements record the information needed for the staged head and tail. On this basis, a global variable gRestData of type rest_data_group_t is defined; this is the variable actually used when the current process stages data.
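A possible C rendering of these two structures, as a sketch: only cnt, lockHandle, rest_data_t, rest_data_group_t and gRestData are named in the description; the field names buf, offset, len and the two-element array are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Staging record for one unaligned head or tail fragment (sketch). */
typedef struct {
    void      *buf;     /* staging-area pointer for the unaligned bytes   */
    long long  offset;  /* where the fragment belongs in the target file  */
    long long  len;     /* fragment length in bytes (X or Y)              */
} rest_data_t;

/* Per-process staging state: at most a head and a tail fragment. */
typedef struct {
    int         cnt;        /* number of valid rest_data_t entries        */
    int         lockHandle; /* handle of the lock file                    */
    rest_data_t rest[2];    /* rest[0] = head part, rest[1] = tail part   */
} rest_data_group_t;

/* Global staging variable, initialized empty with no lock handle yet. */
static rest_data_group_t gRestData = { 0, -1, { { NULL, 0, 0 }, { NULL, 0, 0 } } };
```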
The three MPI functions involved in writing the target file, "MPI_File_open", "MPI_File_write_at" and "MPI_File_close", are wrapped as same-parameter, differently named functions: the function parameters are unchanged and only the function names are slightly modified. As in the foregoing, the three functions are wrapped here as MPI_File_open_Ex, MPI_File_write_at_Ex and MPI_File_close_Ex in turn. After wrapping, in a real application, the MPI function calls on target files, namely MPI_File_open, MPI_File_write_at and MPI_File_close (file open, file write and file close respectively), are simply replaced in turn by the wrapped functions MPI_File_open_Ex, MPI_File_write_at_Ex and MPI_File_close_Ex. Using the wrapped functions for multi-process concurrent write operations across multiple nodes, the concurrent write function works correctly. The three wrapped functions are briefly described as follows:
(1) Wrapping of the MPI_File_open function, as shown in Fig. 2. The main purpose is, on top of the original function, to create a lock file for each target file to be written. The wrapped function prototype is:
int MPI_File_open_Ex(MPI_Comm comm, char *filename, int amode, MPI_Info info, MPI_File *fh);
Function description: this function opens the target file to be written. In the function body, according to the directory and name of the target file, a temporary file of the same name but with an extra ".lock" suffix is generated in the directory of the target file to serve as the lock file, and gRestData is initialized. All remaining operations are performed entirely by calling the original MPI_File_open function of MPI.
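A minimal sketch of the lock-file naming step: the ".lock" suffix comes from the description, while the helper name make_lock_name and its buffer handling are illustrative assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "<filename>.lock" next to the target file (illustrative helper).
 * Returns 0 on success, -1 if the result would not fit in the buffer. */
static int make_lock_name(const char *filename, char *out, size_t outsz)
{
    if (strlen(filename) + strlen(".lock") + 1 > outsz)
        return -1;                       /* refuse rather than truncate */
    snprintf(out, outsz, "%s.lock", filename);
    return 0;
}
```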
(2) Wrapping of the MPI_File_write_at function, as shown in Fig. 3. This mainly applies differentiated processing to the content to be written according to the head and tail offsets of the current process's data. The wrapped function prototype is:
int MPI_File_write_at_Ex(MPI_File fh, MPI_Offset offset, void *buf, int count, MPI_Datatype datatype, MPI_Status *status);
Function description: the write-processing function. From the offset and the quantity count (in bytes) of the data to be handled, it computes the head and tail offsets of the data to write and processes them according to the foregoing scheme. If the head and tail offsets of the operation data are both integral multiples of the file system block size, the original MPI_File_write_at function is still called to perform the concurrent write, realizing concurrency between processes; to a large extent this achieves parallel processing for the bulk of the data. If the head and tail offsets of the operated data are not on integral-multiple positions of the file system block size, then within the offset range of the current operation the head and tail are adjusted inward to the nearest integral-multiple positions of the file system block size; that part is still handled as in the aligned case, while the remaining head and tail blocks, each naturally smaller than a file system block, are copied into a staging area, and their staging location, offset, quantity and other information are recorded in gRestData for later use.
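The differentiated processing above amounts to splitting one write into three segments. The arithmetic can be sketched as follows (illustrative names; it assumes the write spans at least one full block, i.e. count >= X + Y, as in the patent's model):

```c
#include <assert.h>

/* How one write is split (sketch): an unaligned head of X bytes and an
 * unaligned tail of Y bytes, both staged for serial write at close time,
 * and an aligned middle written concurrently with MPI_File_write_at. */
typedef struct {
    long long head_off, head_len;   /* staged head fragment            */
    long long mid_off,  mid_len;    /* aligned middle, concurrent write */
    long long tail_off, tail_len;   /* staged tail fragment            */
} write_split_t;

static write_split_t split_write(long long offset, long long count,
                                 long long fsBlkSize)
{
    write_split_t s;
    long long head = offset % fsBlkSize;
    long long X = head ? fsBlkSize - head : 0;
    long long Y = (offset + count) % fsBlkSize;

    s.head_off = offset;             s.head_len = X;
    s.mid_off  = offset + X;         s.mid_len  = count - X - Y;
    s.tail_off = offset + count - Y; s.tail_len = Y;
    return s;
}
```

The three lengths always sum to count, and the middle length is by construction a multiple of fsBlkSize.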
(3) Wrapping of the MPI_File_close function, as shown in Fig. 4. Its main function is to handle the data that may have been staged during write operations. The wrapped function prototype is:
int MPI_File_close_Ex(MPI_File *fh);
Function description: this function closes the target file. In the function body, before the close operation is performed on the target file whose writes are complete, the process checks, under the guarantee of the file-lock mechanism, whether it has staged information to handle; if so, the staged content is written to its physical location in the file. The file lock is then released and the MPI_File_close function is called to close the file.
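The close-time flush might be sketched with POSIX primitives. Here flock and pwrite stand in for whatever lock and serial-write mechanism the target platform provides; this is an assumption for illustration, not the patent's actual code.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Serially write one staged fragment to its physical location in the data
 * file, holding an exclusive lock on the lock file for the duration so
 * that fragments from different processes are written one at a time. */
static int flush_fragment(int datafd, int lockfd,
                          const void *buf, long long off, long long len)
{
    if (flock(lockfd, LOCK_EX) != 0)          /* serialize across processes */
        return -1;
    ssize_t n = pwrite(datafd, buf, (size_t)len, (off_t)off);
    flock(lockfd, LOCK_UN);                   /* release before returning   */
    return n == (ssize_t)len ? 0 : -1;
}
```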
In summary, after the original MPI functions are wrapped, the MPI function calls involving target files to be written in user-space applications, namely MPI_File_open, MPI_File_write_at and MPI_File_close in turn (file open, file write and file close respectively), are all replaced by the wrapped functions MPI_File_open_Ex, MPI_File_write_at_Ex and MPI_File_close_Ex. The wrapped functions realize multi-process concurrent writes to the same target file across multiple nodes on some parallel file systems: when the file is opened, the lock file is generated; write operations are performed automatically by the wrapped function according to the foregoing principles; and when the file is closed, the data staged during writes is handled under the file-lock mechanism. Because these operations are carried out by the wrapped functions, the user need not be concerned with the concrete implementation details, and the correctness and consistency of the final written result is guaranteed. The wrapped library functions require exactly the same parameters as the original MPI functions, making substitution convenient for users accustomed to the original MPI functions.
The present invention is not limited to the above embodiment; any design making simple changes using the design ideas of the present invention shall fall within the protection scope of the present invention.

Claims (1)

1. An MPI parallel data storage optimization method, comprising the following steps:
Step 1: wrap the MPI function calls involved in writing the target file, namely file open MPI_File_open, file write MPI_File_write_at and file close MPI_File_close, as same-parameter, differently named functions;
Step 2: in the file-write function wrapped in step 1, let the block size read from the current file system be fsBlkSize; from the offset of the current operation data, offset, and the number of bytes to write, count, obtain, by the quantitative relation between offset and fsBlkSize, the distance X from offset to the next integral multiple of fsBlkSize, and, by the quantitative relation between offset+count and fsBlkSize, the distance Y from the tail offset formed by offset+count back to the previous integral multiple of fsBlkSize, where fsBlkSize, X and Y are all in bytes; the wrapped file-write function adopts different processing modes according to the resulting X and Y values:
(1) if the head and tail offsets of the data operated on by the current process both fall on integral multiples of the file system block size, that is, X and Y are both zero, the original MPI_File_write_at function is still called to perform the concurrent write operation;
(2) if the offset value of the current process does not fall on an integral multiple of the file system block size, that is, X is nonzero, the data of length X at the head is temporarily copied into a staging area; before the process performs the close operation on the target file, this staged content is written serially to its physical location in the file using the file-lock mechanism;
(3) if the offset+count value of the current process does not fall on an integral multiple of the file system block size, that is, Y is nonzero, the data of length Y at the tail is temporarily copied into a staging area; before the process performs the close operation on the target file, this staged content is written serially to its physical location in the file using the file-lock mechanism;
(4) whatever the values of X and Y, as long as the middle data segment left unprocessed by steps (2) and (3), whose length is an integral multiple of fsBlkSize, has nonzero length, that is, at least fsBlkSize, it must be processed; the processing mode is to still call the original MPI_File_write_at function to perform the concurrent write operation;
Step 3: replace all MPI function calls involved in target files to be written in the application, namely MPI_File_open, MPI_File_write_at and MPI_File_close, with the functions wrapped in step 1; after the replacement, all operations on target files being written are performed by calling the wrapped functions.
CN2010102818399A 2010-09-14 2010-09-14 Storage and optimization method of MPI (Message Passing Interface) parallel data Active CN101957840B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010102818399A CN101957840B (en) 2010-09-14 2010-09-14 Storage and optimization method of MPI (Message Passing Interface) parallel data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2010102818399A CN101957840B (en) 2010-09-14 2010-09-14 Storage and optimization method of MPI (Message Passing Interface) parallel data

Publications (2)

Publication Number Publication Date
CN101957840A CN101957840A (en) 2011-01-26
CN101957840B true CN101957840B (en) 2012-06-27

Family

ID=43485170

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010102818399A Active CN101957840B (en) 2010-09-14 2010-09-14 Storage and optimization method of MPI (Message Passing Interface) parallel data

Country Status (1)

Country Link
CN (1) CN101957840B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102650956B (en) * 2011-02-23 2014-08-27 蓝盾信息安全技术股份有限公司 Program concurrent method and system
CN111061652B (en) * 2019-12-18 2021-12-31 中山大学 Nonvolatile memory management method and system based on MPI-IO middleware
CN112783476B (en) * 2021-01-15 2022-02-22 中国核动力研究设计院 Easily-extensible software system, calling method and terminal for reactor core numerical solver

Citations (1)

Publication number Priority date Publication date Assignee Title
CN101187906A (en) * 2006-11-22 2008-05-28 国际商业机器公司 System and method for providing high performance scalable file I/O

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US8312464B2 (en) * 2007-08-28 2012-11-13 International Business Machines Corporation Hardware based dynamic load balancing of message passing interface tasks by modifying tasks
US20100037214A1 (en) * 2008-08-11 2010-02-11 International Business Machines Corporation Method and system for mpi_wait sinking for better computation-communication overlap in mpi applications
US8161127B2 (en) * 2009-02-23 2012-04-17 International Business Machines Corporation Process mapping in parallel computing

Patent Citations (1)

Publication number Priority date Publication date Assignee Title
CN101187906A (en) * 2006-11-22 2008-05-28 国际商业机器公司 System and method for providing high performance scalable file I/O

Also Published As

Publication number Publication date
CN101957840A (en) 2011-01-26

Similar Documents

Publication Publication Date Title
Zheng et al. PreDatA–preparatory data analytics on peta-scale machines
CN103098014B (en) Storage system
Zheng et al. FlexIO: I/O middleware for location-flexible scientific data analytics
Bhatotia et al. Incoop: MapReduce for incremental computations
Chen et al. Computation and communication efficient graph processing with distributed immutable view
Son et al. Enabling active storage on parallel I/O software stacks
CN104965689A (en) Hybrid parallel computing method and device for CPUs/GPUs
CN103930875A (en) Software virtual machine for acceleration of transactional data processing
CN105051695B (en) It is immutable to share zero replicate data and spread defeated
CN105103136B (en) Shared and managed memory is unified to be accessed
CN101957863A (en) Data parallel processing method, device and system
Latham et al. A case study for scientific I/O: improving the FLASH astrophysics code
US20180039422A1 (en) Solid state storage capacity management systems and methods
CN103218176A (en) Data processing method and device
CN101957840B (en) Storage and optimization method of MPI (Message Passing Interface) parallel data
CN105378673A (en) Zero-copy caching
CN101789944A (en) Development system of communication protocol stack of multifunctional energy meter
Lee et al. High performance communication between parallel programs
CN103927215A (en) kvm virtual machine scheduling optimization method and system based on memory disk and SSD disk
Yuan et al. Cloud data management for scientific workflows: Research issues, methodologies, and state-of-the-art
Beynon et al. Performance optimization for data intensive grid applications
Berriman et al. The application of cloud computing to the creation of image mosaics and management of their provenance
CN102750353B (en) Method for analyzing distributed data in key value library
Poyraz et al. Application-specific I/O optimizations on petascale supercomputers
Watson et al. Operational experiences with the TI advanced scientific computer

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant