CN101957840B - Storage and optimization method of MPI (Message Passing Interface) parallel data - Google Patents


Info

Publication number
CN101957840B
CN101957840B · Application CN2010102818399A
Authority
CN
China
Prior art keywords
file
mpi
function
fsblksize
offset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN2010102818399A
Other languages
Chinese (zh)
Other versions
CN101957840A (en)
Inventor
张伟涛
王道邦
韩双牛
李焰
肖建国
方仑
周泽湘
谭毓安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING TOYOU FEIJI ELECTRONICS Co Ltd
Original Assignee
BEIJING TOYOU FEIJI ELECTRONICS Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING TOYOU FEIJI ELECTRONICS Co Ltd filed Critical BEIJING TOYOU FEIJI ELECTRONICS Co Ltd
Priority to CN2010102818399A priority Critical patent/CN101957840B/en
Publication of CN101957840A publication Critical patent/CN101957840A/en
Application granted granted Critical
Publication of CN101957840B publication Critical patent/CN101957840B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention relates to a storage optimization method for MPI (Message Passing Interface) parallel data. The method comprises the following steps: (1) wrapping the MPI functions called on a target file to be written, namely file open, file write and file close, as same-parameter, differently named functions; (2) in the file-write function wrapped in step (1), using the file system block size fsBlkSize, the offset of the current operation data and the number of bytes to write (count), obtaining the distance X from offset to the next integral multiple of fsBlkSize and the distance Y from the tail offset formed by offset+count back to the previous integral multiple of fsBlkSize, the wrapped file-write function then selecting different processing modes according to the obtained X and Y values; and (3) replacing all MPI calls on target files to be written in the application with the functions wrapped in step (1). Without any modification to the file system, the invention guarantees the correctness of write results at acceptable efficiency through only minor changes at the application layer.

Description

An MPI parallel data storage optimization method
Technical field
The present invention relates to a storage optimization method, and in particular to an MPI parallel data storage optimization method, belonging to the field of computer parallel processing.
Background technology
The parallel processing capability of computers has important application value in many fields. MPI (Message Passing Interface) provides the most widely used parallel programming environment for programmers today, with good support for scalable parallel computers, networks of workstations with distributed storage, and cluster systems. However, in practical applications it has been found that when some parallel file systems execute MPI_File_write_at calls for multi-process concurrent writes to the same target file across multiple nodes, the head and tail offsets of the data operated on by some processes may not fall on integral multiples of the file system block size, which can make the final written result inconsistent.
Solving this problem within the parallel file system itself requires reworking the overall architecture of the file system, demanding a large investment of manpower and material at high cost. Alternatively, before each write operation at the application layer, the programmer can ensure in advance that the head and tail offsets of the data to be written all fall on integral multiples of the file system block size; this permits fairly efficient concurrent writes, but requires programmers to keep the issue constantly in mind, which invisibly increases the programming workload and weakens the flexibility of the program.
Summary of the invention
The objective of the present invention is, in view of the above problem in the prior art, to provide an optimization method at the application layer that guarantees the consistency of the final written result when some parallel file systems execute MPI multi-process concurrent write operations on the same target file across multiple nodes.
In some parallel file systems, when multiple concurrent processes perform write operations on the same target file, each process has, as shown in Fig. 1, two main parameters: the data offset (offset) and the number of bytes to write (count). Suppose the block size of the current file system is fsBlkSize (bytes). For a process A, the head and tail offsets of the data it will write are distributed as shown in Fig. 1: the distance from offset to the next integral multiple of fsBlkSize is X (bytes), and the distance from the tail offset formed by offset+count back to the previous integral multiple of fsBlkSize is Y (bytes). The length of the remaining middle data segment is therefore a nonnegative integer multiple of fsBlkSize. Every process in the concurrent group can be represented by process A in the same way; only their individual X and Y values differ, each lying between 0 and fsBlkSize. In view of this analysis, the solution adopted by the present invention is to wrap the MPI functions involved in writing the target file (MPI_File_open, MPI_File_write_at and MPI_File_close) as same-parameter, differently named functions, and to replace all MPI operations on target files in the application (file open, file write and file close) with the wrapped functions (for example, MPI_File_open_Ex, MPI_File_write_at_Ex and MPI_File_close_Ex respectively). During a concurrent write, the wrapped functions adopt different processing modes according to the values of X and Y determined by the head and tail offsets.
The invention provides an MPI parallel data storage optimization method, comprising the following steps:
Step 1: wrap the MPI function calls involved in writing the target file, namely file open MPI_File_open, file write MPI_File_write_at and file close MPI_File_close, as same-parameter, differently named functions;
Step 2: in the file-write function wrapped in step 1, let the block size read from the current file system be fsBlkSize; from the offset of the current operation data, offset, and the number of bytes to write, count, obtain, by the quantitative relation between offset and fsBlkSize, the distance X from offset to the next integral multiple of fsBlkSize, and, by the quantitative relation between offset+count and fsBlkSize, the distance Y from the tail offset formed by offset+count back to the previous integral multiple of fsBlkSize, where fsBlkSize, X and Y are all in bytes; the wrapped file-write function adopts different processing modes according to the resulting X and Y values:
(1) if the head and tail offsets of the data operated on by the current process both fall on integral multiples of the file system block size, that is, X and Y are both zero, the original MPI_File_write_at function is still called to perform the concurrent write operation;
(2) if the offset value of the current process does not fall on an integral multiple of the file system block size, that is, X is nonzero, the data of length X at the head is temporarily copied into a staging area; before the process performs the close operation on the target file, this staged content is written serially to its physical location in the file using the file-lock mechanism;
(3) if the offset+count value of the current process does not fall on an integral multiple of the file system block size, that is, Y is nonzero, the data of length Y at the tail is temporarily copied into a staging area; before the process performs the close operation on the target file, this staged content is written serially to its physical location in the file using the file-lock mechanism;
(4) whatever the values of X and Y, as long as the middle data segment, whose length is an integral multiple of fsBlkSize, has nonzero length, that is, at least fsBlkSize, it must be processed; the processing mode is to still call the original MPI_File_write_at function to perform the concurrent write operation;
Step 3: replace all MPI function calls involved in target files to be written in the application, namely MPI_File_open, MPI_File_write_at and MPI_File_close, with the functions wrapped in step 1; after the replacement, all operations on target files being written are performed by calling the wrapped functions.
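As a sketch of the arithmetic in step 2: the patent defines only the quantities X and Y, not code, so the helper below and its name are illustrative assumptions.

```c
#include <assert.h>

/* Illustrative sketch: distance X from offset up to the next multiple of
 * fsBlkSize (0 if already aligned), and distance Y from offset+count back
 * down to the previous multiple (0 if aligned). Not code from the patent. */
static void head_tail_distances(long long offset, long long count,
                                long long fsBlkSize,
                                long long *X, long long *Y)
{
    long long head = offset % fsBlkSize;          /* bytes past the boundary */
    *X = head ? fsBlkSize - head : 0;             /* bytes up to next boundary */
    *Y = (offset + count) % fsBlkSize;            /* bytes past last boundary  */
}
```

With fsBlkSize = 4096, offset = 1000 and count = 10000, this gives X = 3096 and Y = 2808, leaving a middle segment of exactly one block, consistent with the requirement that the middle length be a multiple of fsBlkSize.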
Beneficial effect
The beneficial effects of the invention are:
(1) Simple modification. No modification to the parallel file system is required; with only minor changes at the application layer, the correctness of the write result can be guaranteed at acceptable efficiency.
(2) Easy application. The solution provided is easy to implement and can readily be packaged into library functions of different styles, meeting the programming needs of different users.
(3) Good compatibility. All parameters involved in this scheme come from the original MPI functions, so the wrapped library functions are well compatible with the original MPI functions and convenient to substitute.
Description of drawings
Fig. 1---schematic diagram of the data distribution of a process write operation.
Fig. 2---processing flowchart of the wrapped MPI_File_open_Ex function.
Fig. 3---processing flowchart of the wrapped MPI_File_write_at_Ex function.
Fig. 4---processing flowchart of the wrapped MPI_File_close_Ex function.
Embodiment
The present invention is described in further detail below with reference to the accompanying drawings and an embodiment.
A region is allocated in memory to stage the small blocks of data whose head or tail offsets do not fall on integral multiples of the file system block size (that is, the X and Y parts). Two structures are defined here: rest_data_t and rest_data_group_t. rest_data_t stores a pointer to the staging location of the unaligned data, together with its offset and length in the target file. rest_data_group_t, built on the rest_data_t structure, encapsulates cnt, lockHandle and rest_data_t: the first two respectively record the number of valid rest_data_t entries the current process must handle under the lock and the file handle used for locking, while rest_data_t can be defined as an array whose elements record the information needed for the staged head and tail. On this basis, a global variable gRestData of type rest_data_group_t is defined; this is the variable actually used when the current process stages data.
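A possible C rendering of these two structures, as a sketch: only cnt, lockHandle, rest_data_t, rest_data_group_t and gRestData are named in the description; the field names buf, offset, len and the two-element array are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Staging record for one unaligned head or tail fragment (sketch). */
typedef struct {
    void      *buf;     /* staging-area pointer for the unaligned bytes   */
    long long  offset;  /* where the fragment belongs in the target file  */
    long long  len;     /* fragment length in bytes (X or Y)              */
} rest_data_t;

/* Per-process staging state: at most a head and a tail fragment. */
typedef struct {
    int         cnt;        /* number of valid rest_data_t entries        */
    int         lockHandle; /* handle of the lock file                    */
    rest_data_t rest[2];    /* rest[0] = head part, rest[1] = tail part   */
} rest_data_group_t;

/* Global staging variable, initialized empty with no lock handle yet. */
static rest_data_group_t gRestData = { 0, -1, { { NULL, 0, 0 }, { NULL, 0, 0 } } };
```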
The three MPI functions involved in writing the target file, "MPI_File_open", "MPI_File_write_at" and "MPI_File_close", are wrapped as same-parameter, differently named functions: the function parameters are unchanged and only the function names are slightly modified. As in the foregoing, the three functions are wrapped here as MPI_File_open_Ex, MPI_File_write_at_Ex and MPI_File_close_Ex in turn. After wrapping, in a real application, the MPI function calls on target files, namely MPI_File_open, MPI_File_write_at and MPI_File_close (file open, file write and file close respectively), are simply replaced in turn by the wrapped functions MPI_File_open_Ex, MPI_File_write_at_Ex and MPI_File_close_Ex. Using the wrapped functions for multi-process concurrent write operations across multiple nodes, the concurrent write function works correctly. The three wrapped functions are briefly described as follows:
(1) Wrapping of the MPI_File_open function, as shown in Fig. 2. The main purpose is, on top of the original function, to create a lock file for each target file to be written. The wrapped function prototype is:
int MPI_File_open_Ex(MPI_Comm comm, char *filename, int amode, MPI_Info info, MPI_File *fh);
Function description: this function opens the target file to be written. In the function body, according to the directory and name of the target file, a temporary file of the same name but with an extra ".lock" suffix is generated in the directory of the target file to serve as the lock file, and gRestData is initialized. All remaining operations are performed entirely by calling the original MPI_File_open function of MPI.
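A minimal sketch of the lock-file naming step: the ".lock" suffix comes from the description, while the helper name make_lock_name and its buffer handling are illustrative assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "<filename>.lock" next to the target file (illustrative helper).
 * Returns 0 on success, -1 if the result would not fit in the buffer. */
static int make_lock_name(const char *filename, char *out, size_t outsz)
{
    if (strlen(filename) + strlen(".lock") + 1 > outsz)
        return -1;                       /* refuse rather than truncate */
    snprintf(out, outsz, "%s.lock", filename);
    return 0;
}
```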
(2) Wrapping of the MPI_File_write_at function, as shown in Fig. 3. This mainly applies differentiated processing to the content to be written according to the head and tail offsets of the current process's data. The wrapped function prototype is:
int MPI_File_write_at_Ex(MPI_File fh, MPI_Offset offset, void *buf, int count, MPI_Datatype datatype, MPI_Status *status);
Function description: the write-processing function. From the offset and the quantity count (in bytes) of the data to be handled, it computes the head and tail offsets of the data to write and processes them according to the foregoing scheme. If the head and tail offsets of the operation data are both integral multiples of the file system block size, the original MPI_File_write_at function is still called to perform the concurrent write, realizing concurrency between processes; to a large extent this achieves parallel processing for the bulk of the data. If the head and tail offsets of the operated data are not on integral-multiple positions of the file system block size, then within the offset range of the current operation the head and tail are adjusted inward to the nearest integral-multiple positions of the file system block size; that part is still handled as in the aligned case, while the remaining head and tail blocks, each naturally smaller than a file system block, are copied into a staging area, and their staging location, offset, quantity and other information are recorded in gRestData for later use.
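The differentiated processing above amounts to splitting one write into three segments. The arithmetic can be sketched as follows (illustrative names; it assumes the write spans at least one full block, i.e. count >= X + Y, as in the patent's model):

```c
#include <assert.h>

/* How one write is split (sketch): an unaligned head of X bytes and an
 * unaligned tail of Y bytes, both staged for serial write at close time,
 * and an aligned middle written concurrently with MPI_File_write_at. */
typedef struct {
    long long head_off, head_len;   /* staged head fragment            */
    long long mid_off,  mid_len;    /* aligned middle, concurrent write */
    long long tail_off, tail_len;   /* staged tail fragment            */
} write_split_t;

static write_split_t split_write(long long offset, long long count,
                                 long long fsBlkSize)
{
    write_split_t s;
    long long head = offset % fsBlkSize;
    long long X = head ? fsBlkSize - head : 0;
    long long Y = (offset + count) % fsBlkSize;

    s.head_off = offset;             s.head_len = X;
    s.mid_off  = offset + X;         s.mid_len  = count - X - Y;
    s.tail_off = offset + count - Y; s.tail_len = Y;
    return s;
}
```

The three lengths always sum to count, and the middle length is by construction a multiple of fsBlkSize.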
(3) Wrapping of the MPI_File_close function, as shown in Fig. 4. Its main function is to handle the data that may have been staged during write operations. The wrapped function prototype is:
int MPI_File_close_Ex(MPI_File *fh);
Function description: this function closes the target file. In the function body, before the close operation is performed on the target file whose writes are complete, the process checks, under the guarantee of the file-lock mechanism, whether it has staged information to handle; if so, the staged content is written to its physical location in the file. The file lock is then released and the MPI_File_close function is called to close the file.
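The close-time flush might be sketched with POSIX primitives. Here flock and pwrite stand in for whatever lock and serial-write mechanism the target platform provides; this is an assumption for illustration, not the patent's actual code.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Serially write one staged fragment to its physical location in the data
 * file, holding an exclusive lock on the lock file for the duration so
 * that fragments from different processes are written one at a time. */
static int flush_fragment(int datafd, int lockfd,
                          const void *buf, long long off, long long len)
{
    if (flock(lockfd, LOCK_EX) != 0)          /* serialize across processes */
        return -1;
    ssize_t n = pwrite(datafd, buf, (size_t)len, (off_t)off);
    flock(lockfd, LOCK_UN);                   /* release before returning   */
    return n == (ssize_t)len ? 0 : -1;
}
```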
In summary, after the original MPI functions are wrapped, the MPI function calls involving target files to be written in user-space applications, namely MPI_File_open, MPI_File_write_at and MPI_File_close in turn (file open, file write and file close respectively), are all replaced by the wrapped functions MPI_File_open_Ex, MPI_File_write_at_Ex and MPI_File_close_Ex. The wrapped functions realize multi-process concurrent writes to the same target file across multiple nodes on some parallel file systems: when the file is opened, the lock file is generated; write operations are performed automatically by the wrapped function according to the foregoing principles; and when the file is closed, the data staged during writes is handled under the file-lock mechanism. Because these operations are carried out by the wrapped functions, the user need not be concerned with the concrete implementation details, and the correctness and consistency of the final written result is guaranteed. The wrapped library functions require exactly the same parameters as the original MPI functions, making substitution convenient for users accustomed to the original MPI functions.
The present invention is not limited to the above embodiment; any design making simple changes using the design ideas of the present invention shall fall within the protection scope of the present invention.

Claims (1)

1. An MPI parallel data storage optimization method, comprising the following steps:
Step 1: wrap the MPI function calls involved in writing the target file, namely file open MPI_File_open, file write MPI_File_write_at and file close MPI_File_close, as same-parameter, differently named functions;
Step 2: in the file-write function wrapped in step 1, let the block size read from the current file system be fsBlkSize; from the offset of the current operation data, offset, and the number of bytes to write, count, obtain, by the quantitative relation between offset and fsBlkSize, the distance X from offset to the next integral multiple of fsBlkSize, and, by the quantitative relation between offset+count and fsBlkSize, the distance Y from the tail offset formed by offset+count back to the previous integral multiple of fsBlkSize, where fsBlkSize, X and Y are all in bytes; the wrapped file-write function adopts different processing modes according to the resulting X and Y values:
(1) if the head and tail offsets of the data operated on by the current process both fall on integral multiples of the file system block size, that is, X and Y are both zero, the original MPI_File_write_at function is still called to perform the concurrent write operation;
(2) if the offset value of the current process does not fall on an integral multiple of the file system block size, that is, X is nonzero, the data of length X at the head is temporarily copied into a staging area; before the process performs the close operation on the target file, this staged content is written serially to its physical location in the file using the file-lock mechanism;
(3) if the offset+count value of the current process does not fall on an integral multiple of the file system block size, that is, Y is nonzero, the data of length Y at the tail is temporarily copied into a staging area; before the process performs the close operation on the target file, this staged content is written serially to its physical location in the file using the file-lock mechanism;
(4) whatever the values of X and Y, as long as the middle data segment left unprocessed by steps (2) and (3), whose length is an integral multiple of fsBlkSize, has nonzero length, that is, at least fsBlkSize, it must be processed; the processing mode is to still call the original MPI_File_write_at function to perform the concurrent write operation;
Step 3: replace all MPI function calls involved in target files to be written in the application, namely MPI_File_open, MPI_File_write_at and MPI_File_close, with the functions wrapped in step 1; after the replacement, all operations on target files being written are performed by calling the wrapped functions.
CN2010102818399A 2010-09-14 2010-09-14 Storage and optimization method of MPI (Message Passing Interface) parallel data Active CN101957840B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010102818399A CN101957840B (en) 2010-09-14 2010-09-14 Storage and optimization method of MPI (Message Passing Interface) parallel data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2010102818399A CN101957840B (en) 2010-09-14 2010-09-14 Storage and optimization method of MPI (Message Passing Interface) parallel data

Publications (2)

Publication Number Publication Date
CN101957840A CN101957840A (en) 2011-01-26
CN101957840B true CN101957840B (en) 2012-06-27

Family

ID=43485170

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010102818399A Active CN101957840B (en) 2010-09-14 2010-09-14 Storage and optimization method of MPI (Message Passing Interface) parallel data

Country Status (1)

Country Link
CN (1) CN101957840B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102650956B (en) * 2011-02-23 2014-08-27 蓝盾信息安全技术股份有限公司 Program concurrent method and system
CN111061652B (en) * 2019-12-18 2021-12-31 中山大学 Nonvolatile memory management method and system based on MPI-IO middleware
CN112783476B (en) * 2021-01-15 2022-02-22 中国核动力研究设计院 Easily-extensible software system, calling method and terminal for reactor core numerical solver

Citations (1)

Publication number Priority date Publication date Assignee Title
CN101187906A (en) * 2006-11-22 2008-05-28 国际商业机器公司 System and method for providing high performance scalable file I/O

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US8312464B2 (en) * 2007-08-28 2012-11-13 International Business Machines Corporation Hardware based dynamic load balancing of message passing interface tasks by modifying tasks
US20100037214A1 (en) * 2008-08-11 2010-02-11 International Business Machines Corporation Method and system for mpi_wait sinking for better computation-communication overlap in mpi applications
US8161127B2 (en) * 2009-02-23 2012-04-17 International Business Machines Corporation Process mapping in parallel computing

Patent Citations (1)

Publication number Priority date Publication date Assignee Title
CN101187906A (en) * 2006-11-22 2008-05-28 国际商业机器公司 System and method for providing high performance scalable file I/O

Also Published As

Publication number Publication date
CN101957840A (en) 2011-01-26

Similar Documents

Publication Publication Date Title
Zheng et al. PreDatA–preparatory data analytics on peta-scale machines
CN103098014B (en) Storage system
Zheng et al. FlexIO: I/O middleware for location-flexible scientific data analytics
Bhatotia et al. Incoop: MapReduce for incremental computations
Chen et al. Computation and communication efficient graph processing with distributed immutable view
Son et al. Enabling active storage on parallel I/O software stacks
CN104965689A (en) Hybrid parallel computing method and device for CPUs/GPUs
CN103930875A (en) Software virtual machine for acceleration of transactional data processing
CN105051695B (en) It is immutable to share zero replicate data and spread defeated
CN105103136B (en) Shared and managed memory is unified to be accessed
CN101957863A (en) Data parallel processing method, device and system
Latham et al. A case study for scientific I/O: improving the FLASH astrophysics code
US20180039422A1 (en) Solid state storage capacity management systems and methods
CN103218176A (en) Data processing method and device
CN101957840B (en) Storage and optimization method of MPI (Message Passing Interface) parallel data
CN105378673A (en) Zero-copy caching
CN101789944A (en) Development system of communication protocol stack of multifunctional energy meter
Lee et al. High performance communication between parallel programs
CN103927215A (en) kvm virtual machine scheduling optimization method and system based on memory disk and SSD disk
Yuan et al. Cloud data management for scientific workflows: Research issues, methodologies, and state-of-the-art
Beynon et al. Performance optimization for data intensive grid applications
Berriman et al. The application of cloud computing to the creation of image mosaics and management of their provenance
CN102750353B (en) Method for analyzing distributed data in key value library
Poyraz et al. Application-specific I/O optimizations on petascale supercomputers
Watson et al. Operational experiences with the TI advanced scientific computer

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant