CN101382955A

CN101382955A - Method and system for reading files in a cluster file system

Info

Publication number: CN101382955A
Application number: CNA2008102234889A
Authority: CN
Inventors: Liu Yue (刘岳); Xiong Jin (熊劲)
Original assignee: Institute of Computing Technology of CAS
Current assignee: Institute of Computing Technology of CAS
Priority date: 2008-09-28
Filing date: 2008-09-28
Publication date: 2009-03-11
Anticipated expiration: 2028-09-28
Also published as: CN101382955B

Abstract

The invention relates to a method and system for reading files in a cluster file system. The method includes the following steps. Step 1: a client splits a single prefetch request from the VFS layer, or a single file read request from the VFS layer whose access granularity exceeds a preset value, into at least two split read requests. Step 2: the client encapsulates each split read request into a read request message and sends the messages to a storage server. Step 3: the storage server receives all the read request messages and processes them one at a time, in order. For each message it obtains the location information of the data, reads the data specified by that location information, and sends a response message to the client; this is repeated until all the data accessed by the read request messages has been read. Step 4: the client receives the response messages and returns the data they contain to the VFS layer. As a result, the storage server's disk I/O and network data transmission can proceed concurrently, the overall processing time of a request is shortened, and throughput is improved.

Description

Method and system for reading files in a cluster file system

Technical field

The present invention relates to the field of computer storage, and in particular to a method and system for reading files in a cluster file system.

Background art

A cluster system consists of multiple interconnected stand-alone computers. Each computer may be a uniprocessor or multiprocessor system, such as a PC (personal computer), a workstation, or an SMP (symmetric multiprocessing) system. Each one has its own memory, I/O (input/output) devices, and operating system. To users and applications, a cluster system appears as a single system that provides a high-performance environment and fast, reliable service at low cost. Because of their high price/performance ratio, cluster systems have become the mainstream architecture for high-performance computers.

In a cluster system, the storage servers are usually equipped with high-capacity storage devices, and these devices must be managed while the cluster is running. The cluster system must also provide file-sharing services to users on different clients. A cluster file system provides both services. It integrates all the storage devices in the cluster and establishes a unified namespace (the organization of files and directories). Every client sees the same directory structure, and users on different nodes (clients) can access the same files transparently. Data in a cluster file system is usually stored on the storage servers rather than on the client's local disk, so dedicated storage servers are normally provided. Consider a write. When an application process writes data through a client of the cluster file system, the client first sends the data over the network to the storage server. The storage server then writes the received data to its storage devices.

The I/O (input/output) path of a cluster file system is long, and executing an operation involves many key components. These include the client-side cache of the cluster file system, the server-side cache, I/O scheduling on the storage server, and the controllers, processors, and network resources of the storage server. All of these components must cooperate to complete the I/O requests issued by applications. Disk access and network transmission are currently relatively slow and lag behind the development of the other components. Consequently, for I/O-intensive applications on a cluster file system, disk access time and network latency account for the vast majority of total request processing time.

Processing a request passes through several stages. Borrowing the idea of instruction pipelining, request processing can therefore be pipelined so that multiple physical devices work in parallel. The key precondition for pipelining is that multiple requests can be issued at the same time, because several processing components can only work in parallel when several requests are in flight. Moving from issuing one request at a time to issuing multiple requests is called asynchronous request processing.

Write operations are inherently asynchronous: the next write request can be issued without waiting for the previous write to complete. Even under a single load, therefore, multiple write requests are being sent to and processed by the storage server at any given moment, and write requests are pipelined naturally. Read requests, in contrast, are synchronous. An application can continue its computation and issue further reads from the file only after the requested data has been read. Without special handling, read requests are therefore processed strictly serially. As shown in Fig. 1, the storage server first reads the data for the first request from disk into memory and then sends it back to the client over the network. The client issues the next read request only after receiving the result of the first one. Throughout this process, the physical devices cannot work in parallel.

To parallelize disk access with the other stages of request processing, the prior art uses data prefetching. When the application's access pattern is detected to be sequential, the file system reads the requested data from disk into memory and then continues reading a further portion of the following data. When the application later issues read requests for that data, it is served directly from memory.
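The prior-art read-ahead described above can be illustrated with a minimal in-memory sketch. It makes several assumptions that are not in the source. A read counts as "sequential" when it starts exactly where the previous read ended. The prefetch window is a fixed number of bytes. A `bytes` object stands in for the disk. The class name `ReadAheadCache` and the `window` parameter are hypothetical.

```python
class ReadAheadCache:
    """Hypothetical prior-art read-ahead: on sequential access, read extra data into memory."""

    def __init__(self, backing: bytes, window: int):
        self.backing = backing    # stands in for the disk contents
        self.window = window      # hypothetical number of extra bytes read ahead
        self.next_offset = None   # offset where the previous read ended
        self.cache_start = 0      # file offset of the first cached byte
        self.cache = b""          # data currently held in memory
        self.disk_reads = 0       # counts simulated disk accesses

    def _cached(self, offset: int, length: int) -> bool:
        return (self.cache_start <= offset
                and offset + length <= self.cache_start + len(self.cache))

    def read(self, offset: int, length: int) -> bytes:
        sequential = offset == self.next_offset
        self.next_offset = offset + length
        if self._cached(offset, length):
            start = offset - self.cache_start
            return self.cache[start:start + length]   # served from memory, no disk access
        # Read the requested range; on sequential access also read ahead by `window` bytes.
        extra = self.window if sequential else 0
        self.disk_reads += 1
        self.cache_start = offset
        self.cache = self.backing[offset:offset + length + extra]
        return self.cache[:length]
```

In this sketch, three consecutive 100-byte reads from offset 0 cost only two simulated disk accesses, because the second read pulls the third read's data into memory. A random read gets no read-ahead. This is the first restriction discussed in the next paragraph.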

Using read-ahead on the storage server of a cluster file system can overlap disk access time with network latency to some extent. It is, however, subject to two restrictions. First, the application's accesses must be sequential; otherwise read-ahead has no effect. Second, the server-side read-ahead granularity must be fairly large to parallelize processing sufficiently, and under multiple concurrent loads an excessively large read-ahead granularity consumes more memory. These two restrictions limit how effectively read-ahead can parallelize request processing.

Summary of the invention

To solve the above problems, the present invention provides a method and system for reading files in a cluster file system. They allow the storage server's disk I/O and network data transmission to work concurrently, which shortens the total processing time of requests and improves system throughput.

The invention discloses a method for reading files in a cluster file system, comprising:

Step 1: the client splits a single prefetch request from the virtual file system (VFS) layer, or a single file read request from the VFS layer whose access granularity is greater than a preset value, into at least two split read requests;

Step 2: the client encapsulates each split read request into a read request message and sends all the read request messages to the storage server;

Step 3: the storage server receives all the read request messages and processes the first read request message in order. It obtains the location information of the data in the file, reads the data specified by the location information, and sends the data to the client in a response message;

Step 4: the storage server processes the next read request message in order. It obtains the location information of the data in the file, reads the data specified by the location information, and sends the data to the client in a response message. Step 4 is repeated until all the data accessed by the read requests has been read;

Step 5: the client receives the response messages and returns the data in them to the VFS layer.
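Steps 1 to 5 can be traced end to end with a single-threaded model. The sketch below assumes the following: a `bytes` object stands in for the file on the storage server, split read requests have a fixed maximum size `chunk`, and the location information is simply a byte offset and length. The dataclasses `ReadRequestMessage` and `ResponseMessage` and all function names are hypothetical. The model checks only the data flow. It does not show the concurrency between disk and network that the method aims for.

```python
from dataclasses import dataclass


@dataclass
class ReadRequestMessage:
    offset: int  # location of the requested data in the file (hypothetical field)
    length: int


@dataclass
class ResponseMessage:
    offset: int
    data: bytes


def client_split(offset, length, chunk):
    """Step 1: split one large request into pieces of at most `chunk` bytes."""
    end = offset + length
    return [(o, min(chunk, end - o)) for o in range(offset, end, chunk)]


def client_send(splits):
    """Step 2: wrap each split read request in its own read request message."""
    return [ReadRequestMessage(o, n) for o, n in splits]


def server_process(messages, file_data):
    """Steps 3-4: handle messages one at a time, in order, one response each."""
    for msg in messages:
        data = file_data[msg.offset:msg.offset + msg.length]  # stands in for the disk read
        yield ResponseMessage(msg.offset, data)               # stands in for the network send


def client_receive(responses, offset, length):
    """Step 5: place each response's data at its position and return the full range."""
    buf = bytearray(length)
    for r in responses:
        start = r.offset - offset
        buf[start:start + len(r.data)] = r.data
    return bytes(buf)
```

For example, a 4000-byte read at offset 1000 with `chunk=1024` becomes four read request messages. The last one carries 928 bytes. The reassembled result equals the original 4000-byte range.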

The access granularity of each split read request is smaller than that of the read request and smaller than that of the prefetch request.

Before step 1, the method further comprises:

Step 31: the client determines whether the access pattern is sequential. If so, step 32 is executed; otherwise, step 33 is executed;

Step 32: a sequential prefetch operation is performed. The location information of the sequentially prefetched data is packaged into a prefetch request, and the prefetch request is prepared for sending to the storage server;

Step 33: the read request is prepared for sending to the storage server.
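The branch in steps 31 to 33 can be sketched as a small decision function. The source does not say how sequential access is detected or how large the prefetched range is. This sketch assumes that a request is sequential when it starts where the previous one ended (`last_end`). It also assumes that the prefetch request covers the requested range plus a fixed `prefetch_window`. The `Request` type and its `prefetch` flag are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Request:
    offset: int
    length: int
    prefetch: bool  # True if built by the sequential prefetch path (step 32)


def prepare_request(offset, length, last_end, prefetch_window):
    """Steps 31-33: build either a prefetch request or a plain read request."""
    if offset == last_end:  # step 31: hypothetical sequential-access test
        # Step 32: package the location of the requested data plus the read-ahead window.
        return Request(offset, length + prefetch_window, prefetch=True)
    # Step 33: prepare the read request as issued.
    return Request(offset, length, prefetch=False)
```

The `prefetch` flag matters in the next stage. As described below, prefetch requests are always split, while plain read requests are split only when they exceed the preset size.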

The preset value for the access granularity of the read request is 512 KB.

The invention also discloses a system for reading files in a cluster file system, comprising a client and a storage server.

The client comprises a splitting module, an encapsulation module, and a return module.

The splitting module is configured to split a single prefetch request from the VFS layer, or a single file read request from the VFS layer whose access granularity is greater than a preset value, into at least two split read requests;

The encapsulation module is configured to encapsulate each split read request into a read request message and send all the read request messages to the storage server;

The return module is configured to receive the response messages returned by the storage server and return the data in the response messages to the VFS layer;

The storage server is configured to receive all the read request messages and to process the first read request message in order. It obtains the location information of the data in the file, reads the data specified by the location information, and sends the data to the client in a response message. It then processes the next read request message in the same way, repeating until all the data accessed by the read request messages has been read.

The splitting module is further configured as follows. When the access pattern is sequential, it performs a sequential prefetch operation, packages the location information of the sequentially prefetched data into a prefetch request, and prepares the prefetch request for sending to the storage server. When the access pattern is not sequential, it prepares the read request for sending to the storage server.

The beneficial effect of the present invention is as follows. A single read request or prefetch request is split into several small read requests, which are encapsulated into read request messages and sent in parallel. This ensures that multiple read request messages are outstanding at the storage server at any given time. After the storage server reads the data for one read request message from disk into memory, it transmits that data over the network. Meanwhile, the disk reads the data for the next read request message. The network transfer time of the previous message's data therefore overlaps the disk read time of the next message's data, as shown in Fig. 2.

Let the size of a single read request or prefetch request be $S$, the time to read the data it accesses from disk be $T_d$, and the network transfer time of that data be $T_n$. The total processing time of the unsplit request is then $T_d + T_n$. Assume that, within a certain range of access granularities, both disk access time and network transfer time are proportional to the amount of data, and that disk access is slower than the network ($T_d \ge T_n$). After the original request is split into four split read requests, the total processing time is

$$\frac{T_d}{4}+\frac{T_d}{4}+\frac{T_d}{4}+\frac{T_d}{4}+\frac{T_n}{4} = T_d + \frac{T_n}{4}.$$

The processing time of the original read request is thus reduced by $\frac{3}{4}T_n$.
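The timing argument above can be checked with a small two-stage pipeline model. Its assumptions are those stated in the text: times scale linearly with data size, so each of the $k$ chunks takes $T_d/k$ on disk and $T_n/k$ on the network. Two further assumptions are added here. The disk reads chunks back to back, and a chunk's network transfer starts once its disk read has finished and the previous transfer has completed. The function name `pipelined_time` is hypothetical.

```python
def pipelined_time(t_disk, t_net, k):
    """Finish time of one request of disk time t_disk and network time t_net split into k chunks."""
    d, n = t_disk / k, t_net / k  # per-chunk disk and network time (linear scaling)
    disk_done = 0.0               # time at which the disk finishes the current chunk
    net_done = 0.0                # time at which the network finishes the current chunk
    for _ in range(k):
        disk_done += d                            # disk reads chunks back to back
        net_done = max(net_done, disk_done) + n   # send once data is in memory and link is free
    return net_done
```

With $k = 1$ the model gives the serial time $T_d + T_n$ of Fig. 1. With $T_d = 8$, $T_n = 4$ and $k = 4$ it gives $9 = T_d + T_n/4$, a saving of $3 = \frac{3}{4}T_n$, which matches the formula above. If the network were the slower stage ($T_d < T_n$), the same model gives $T_d/k + T_n$ instead.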

Brief description of the drawings

Fig. 1 is a schematic diagram of serial read request processing in the prior art;

Fig. 2 is a schematic diagram of the beneficial effect of the present invention;

Fig. 3 is a system architecture diagram of the present invention;

Fig. 4 is a flowchart of the method of the present invention.

Detailed description of the embodiments

The present invention is described in further detail below with reference to the accompanying drawings.

As shown in Fig. 3, the system of the present invention comprises a client 301 and a storage server 302.

The client 301 comprises a splitting module 311, an encapsulation module 312, and a return module 313.

The splitting module 311 is configured to split a single prefetch request from the VFS (Virtual File System) layer, or a single file read request from the VFS layer whose access granularity is greater than a preset value, into at least two split read requests. In this embodiment, the preset value is 512 KB.

A split read request is itself a read request. Its access granularity is smaller than that of the prefetch request from the VFS layer, and smaller than that of a single file read request from the VFS layer whose access granularity exceeds the preset value.

The splitting module 311 is further configured as follows. When the access pattern is sequential, it performs a sequential prefetch operation, packages the location information of the requested prefetch data into a prefetch request, and prepares the prefetch request for sending to the storage server 302. When the access pattern is not sequential, it prepares the single file read request from the VFS layer for sending to the storage server 302.

The encapsulation module 312 is configured to encapsulate each split read request into a read request message and send all the read request messages to the storage server 302. Each read request message carries exactly one split read request.
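The source does not define a wire format for read request messages. The sketch below invents one purely to illustrate the one-split-per-message rule of the encapsulation module 312. The hypothetical layout is a file identifier, a byte offset, and a length, each packed as an unsigned 64-bit big-endian integer with Python's standard `struct` module. The names `encode_read_request`, `decode_read_request`, and `encapsulate` are also hypothetical.

```python
import struct

# Hypothetical message layout: file id, byte offset, length (3 x unsigned 64-bit, big-endian).
_READ_REQ = struct.Struct(">QQQ")


def encode_read_request(file_id, offset, length):
    """Pack one split read request into a fixed-size 24-byte message."""
    return _READ_REQ.pack(file_id, offset, length)


def decode_read_request(payload):
    """Server side: recover (file_id, offset, length) from a message."""
    return _READ_REQ.unpack(payload)


def encapsulate(file_id, splits):
    """One read request message per split read request, in split order."""
    return [encode_read_request(file_id, off, n) for off, n in splits]
```

A fixed-size header keeps parsing on the server trivial. A real cluster file system would carry more fields (for example request identifiers or credentials), which are omitted here.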

The return module 313 is configured to receive the response messages returned by the storage server 302 and return the data in these response messages to the VFS layer.

The storage server 302 is configured to receive all the read request messages sent by the client 301 and process them in order. For the first read request message, it obtains the location information of the data in the file, reads the data specified by that location information, and sends the data to the client 301 in a response message. It then processes the next read request message in the same way: it obtains the location information, reads the specified data, and sends it to the client 301 in a response message. This is repeated until all the data accessed by the read requests has been read.
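The overlap that the storage server 302 relies on can be sketched as a two-stage producer/consumer pipeline. One thread reads the requested data in message order, and another sends each completed block as a response. The sketch assumes the caller supplies `read_block` (standing in for the disk read) and `send` (standing in for the network transmission). These callbacks, the function name `serve`, and the queue depth of 2 are hypothetical. With real blocking disk and socket I/O, which release Python's GIL, the two threads can run at the same time.

```python
import queue
import threading


def serve(messages, read_block, send):
    """Process (offset, length) messages in order, overlapping disk reads with sends.

    read_block(offset, length) -> bytes   hypothetical stand-in for the disk read
    send(offset, data)                     hypothetical stand-in for the network send
    """
    ready = queue.Queue(maxsize=2)  # bounded buffer limits memory held between stages
    done = object()                 # sentinel marking the end of the stream

    def disk_stage():
        try:
            for offset, length in messages:          # strict arrival order
                ready.put((offset, read_block(offset, length)))
        finally:
            ready.put(done)                          # always unblock the network stage

    def network_stage():
        while (item := ready.get()) is not done:
            send(*item)                              # one response message per request

    threads = [threading.Thread(target=disk_stage),
               threading.Thread(target=network_stage)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Because there is a single consumer reading from a FIFO queue, responses leave in the same order as the read request messages arrived, which matches the in-order processing described above.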

The flow of the method of the present invention is shown in Fig. 4.

Step S401: the client determines whether the access pattern is sequential. If so, step S402 is executed; otherwise, step S403 is executed.

Step S402: a sequential prefetch operation is performed. The location information of the requested prefetch data is packaged into a prefetch request, the prefetch request is prepared for sending to the storage server, and step S404 is executed.

The access granularity of the resulting prefetch request is not the granularity of the file access requests issued by the application. It is the granularity after aggregation by the read-ahead mechanism.

Step S403: the single file read request from the VFS layer is prepared for sending to the storage server.

Step S404: the client splits the single prefetch request from the VFS layer, or the single file read request from the VFS layer whose access granularity is greater than the preset value, into at least two split read requests.

In this embodiment, the preset value is 512 KB.

The access granularity of each split read request is smaller than that of the single file read request and smaller than that of the prefetch request.
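The splitting rule of step S404 can be sketched as follows. A prefetch request is always split. An ordinary read request is split only when it is larger than the 512 KB preset value, taken here as 512 × 1024 bytes. The source fixes neither the number of pieces nor how sizes are balanced. This sketch assumes `parts=4` (echoing the four-way example in the beneficial-effect analysis) and spreads any remainder over the first pieces. The names `PRESET_BYTES` and `split_request` are hypothetical.

```python
PRESET_BYTES = 512 * 1024  # preset value of this embodiment, assumed to mean 512 x 1024 bytes


def split_request(offset, length, is_prefetch, parts=4):
    """Step S404: return (offset, length) pieces, each sent as its own read request message."""
    if not is_prefetch and length <= PRESET_BYTES:
        return [(offset, length)]            # small ordinary read: sent unsplit
    parts = min(parts, length)               # never create empty pieces
    if parts < 2:
        return [(offset, length)]            # a 0- or 1-byte request cannot be split
    base, extra = divmod(length, parts)
    pieces, pos = [], offset
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # first `extra` pieces take one extra byte
        pieces.append((pos, size))
        pos += size
    return pieces
```

Because at least two pieces are produced whenever a split happens, every piece is smaller than the original request, as the paragraph above requires. For example, a 1 MB ordinary read becomes four contiguous 256 KB pieces, while a request of exactly 512 KB is sent unchanged.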

Step S405: the client encapsulates each split read request into a read request message and sends all the read request messages to the storage server. Each read request message carries one split read request.

Step S406: the storage server receives all the read request messages and processes the first one in order. It obtains the location information, in the file, of the data requested by this read request message, reads the data specified by that location information, and sends the data to the client in a response message.

Step S407: the storage server processes the next read request message in order. It obtains the location information, in the file, of the data requested by this read request message, reads the data specified by that location information, and sends the data to the client in a response message. Step S407 is repeated until all the data accessed by the read requests has been read.

The client receives the response messages sent in steps S406 and S407 and returns the data in them to the application layer.

Those skilled in the art may make various modifications to the above without departing from the spirit and scope of the invention as defined by the claims. The scope of the invention is therefore not limited to the above description but is determined by the scope of the claims.

Claims

1. A method for reading files in a cluster file system, characterized by comprising:

Step 3: the storage server receives all the read request messages and processes the first read request message in order. It obtains the location information of the data in the file, reads the data specified by the location information, and sends the data to the client in a response message;

Step 5: the client receives the response messages and returns the data in them to the virtual file system layer.

2. The method for reading files in a cluster file system according to claim 1, characterized in that the access granularity of each split read request is smaller than that of the read request and smaller than that of the prefetch request.

3. The method for reading files in a cluster file system according to claim 2, characterized by further comprising, before step 1:

Step 33: the read request is prepared for sending to the storage server.

4. The method for reading files in a cluster file system according to claim 3, characterized in that

5. A system for reading files in a cluster file system, comprising a client and a storage server, characterized in that

the client comprises a splitting module, an encapsulation module, and a return module,

6. The system for reading files in a cluster file system according to claim 5, characterized in that

7. The system for reading files in a cluster file system according to claim 6, characterized in that the splitting module is further configured as follows. When the access pattern is sequential, it performs a sequential prefetch operation, packages the location information of the sequentially prefetched data into a prefetch request, and prepares the prefetch request for sending to the storage server. When the access pattern is not sequential, it prepares the read request for sending to the storage server.

8. The system for reading files in a cluster file system according to claim 7, characterized in that