CN101382955A - File reading method in cluster file system and system - Google Patents
- Publication number
- CN101382955A
- Authority
- CN
- China
- Prior art keywords
- read request
- read
- data
- storage server
- client
- Legal status
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention relates to a method and system for reading files in a cluster file system. The method includes: step 1, a client splits a single pre-read request from the VFS layer, or a single file read request from the VFS layer whose access granularity exceeds a preset value, into at least two split read requests; step 2, the client packages each split read request into a read request message and sends the messages to a storage server; step 3, the storage server receives all the read request messages and processes them one at a time in order: for each message it obtains the location information, reads the data specified by that location information, and sends a response message to the client, and step 3 is repeated until all the data accessed by the read request messages has been read; and step 4, the client receives the response messages and passes the data in them to the VFS layer. In this way, the storage server's disk I/O and network data transmission can work concurrently, the overall processing time of a request is shortened, and throughput is improved.
Description
Technical field
The present invention relates to the field of computer storage, and in particular to a method and system for reading files in a cluster file system.
Background art
A cluster system consists of multiple interconnected stand-alone computers. Each computer may be a uniprocessor or multiprocessor system, such as a PC (personal computer), a workstation, or an SMP (symmetric multiprocessing) system, and each has its own memory, I/O (input/output) devices, and operating system. To users and applications, a cluster system appears as a single system, and it can provide a high-performance environment and fast, reliable service at low cost. Because of their high price/performance ratio, cluster systems have become the mainstream architecture for high-performance computers.
In a cluster system, storage servers are usually equipped with large-capacity storage devices, which must be managed while the cluster is running. The cluster must also provide file-sharing services to users on different clients. A cluster file system provides these services: it integrates all the storage devices in the cluster and builds a unified namespace (the organizational structure of files and directories). Every client sees the same directory structure, and users on different nodes (clients) can access the same files transparently. Data in a cluster file system is usually not stored on the client's local disk but on storage servers, so dedicated storage servers are typically deployed. Taking writes as an example: when an application process writes data through a cluster file system client, the client first sends the data over the network to the storage server, and the storage server then writes the received data to its storage devices.
The I/O (input/output) path of a cluster file system is long, and executing a complete operation involves several key components, such as the cluster file system client's cache, the storage server's cache and I/O scheduling, and the storage server's controllers, processors, and network resources. These components must cooperate to complete the various I/O requests issued by applications. At present, disk access and network transmission performance are relatively low and lag behind the development of the other components. Therefore, for I/O-intensive applications on a cluster file system, disk access time and network transfer latency account for the vast majority of the total request processing time.
Because processing a request goes through several stages, request processing can be pipelined, by analogy with instruction pipelining, so that multiple physical devices work concurrently. The key precondition for pipelining is that multiple requests can be issued at the same time: only when several requests are being processed simultaneously can several processing components work in parallel. Changing from issuing one request at a time to issuing many is called asynchronous request processing. Write operations are inherently asynchronous: the next write request can be issued without waiting for the current one to finish. Therefore, even under a single load, several write requests reach the storage server for processing at the same time, and write requests pipeline naturally. Unlike writes, reads are synchronous: the application can only continue its computation and issue further reads after the data has been read from the file. So without special handling, read requests are processed strictly serially. As shown in Fig. 1, the storage server first reads the data for the first request from disk into memory and then sends it back to the client over the network; only after receiving the result of the first request does the client issue the next read request. Throughout this process, the physical devices cannot work concurrently.
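The serial behavior of Fig. 1 can be made concrete with a small timing model. This is a hypothetical Python sketch, not part of the patent: it assumes each request has a known disk-read time and network-transfer time, and it ignores client-side processing and the latency of the request message itself. The function name serial_read_timeline is illustrative.

```python
def serial_read_timeline(disk_times, net_times):
    """Hypothetical model of Fig. 1: synchronous reads are served strictly one
    after another (disk read, then network transfer), and the client issues the
    next request only after the previous reply has arrived."""
    t = 0.0
    timeline = []  # (disk_start, disk_end, net_start, net_end) for each request
    for d, n in zip(disk_times, net_times):
        disk_start, disk_end = t, t + d               # server reads request i into memory
        net_start, net_end = disk_end, disk_end + n   # then sends it back over the network
        timeline.append((disk_start, disk_end, net_start, net_end))
        t = net_end                                   # next request starts only after the reply
    return t, timeline
```

With two requests that each need 4 time units of disk work and 1 unit of network work, the total is 10 units, and the disk sits idle during every network transfer while the network sits idle during every disk read.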
To parallelize disk access with the other stages of request processing, the prior art uses data prefetching. When the application's access pattern is detected to be sequential, the file system, after reading the requested data from disk into memory, continues to read a following portion of data from disk. When the application then issues a read request for that following data, the data is read directly from memory.
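A minimal sketch of the prior-art readahead described above, under stated assumptions: the class ReadaheadServer, the single contiguous in-memory cache, and the rule of prefetching a fixed window of bytes once a sequential pattern has been seen and the cache has been used up are hypothetical simplifications, not a description of any particular file system.

```python
class ReadaheadServer:
    """Hypothetical server-side readahead with one contiguous prefetch buffer."""

    def __init__(self, data: bytes, window: int):
        self.data = data            # stands in for the file on disk
        self.window = window        # readahead size in bytes (assumed parameter)
        self.cache_off = 0          # file offset of the prefetched range
        self.cache = b""            # prefetched bytes held in memory
        self.next_expected = None   # offset a sequential reader would request next
        self.disk_reads = 0         # number of disk accesses performed

    def _disk_read(self, off: int, size: int) -> bytes:
        self.disk_reads += 1
        return self.data[off:off + size]

    def read(self, off: int, size: int) -> bytes:
        end = off + size
        if self.cache_off <= off and end <= self.cache_off + len(self.cache):
            # Hit: serve the request from memory without touching the disk.
            buf = self.cache[off - self.cache_off:end - self.cache_off]
        else:
            buf = self._disk_read(off, size)
        sequential = off == self.next_expected
        if sequential and end >= self.cache_off + len(self.cache):
            # Sequential pattern and prefetched data used up: read the next window.
            self.cache_off = end
            self.cache = self._disk_read(end, self.window)
        self.next_expected = end
        return buf
```

Reading the first 28 bytes in 4-byte sequential requests with a 16-byte window costs only four disk reads. A random-access pattern never triggers prefetching, which is exactly the sequential-access restriction of readahead.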
Using readahead on the storage server side of a cluster file system can overlap disk access time and network transfer latency to some extent, achieving a degree of parallelism. However, there are two restrictions. First, the application's access must be sequential; otherwise readahead has no effect. Second, the server-side readahead granularity must be fairly large for processing to be sufficiently parallelized, and under multiple concurrent loads an overly large readahead granularity consumes more memory. These two restrictions limit how effectively readahead can parallelize request processing.
Summary of the invention
To solve the above problems, the invention provides a method and system for reading files in a cluster file system that allow the storage server's disk I/O and network data transmission to work concurrently, shortening the total processing time of requests and improving system throughput.
The invention discloses a method for reading files in a cluster file system, comprising:
Step 1: the client splits a single pre-read request from the virtual file system (VFS) layer, or a single file read request from the VFS layer whose access granularity exceeds a preset value, into at least two split read requests;
Step 2: the client encapsulates each split read request into a read request message and sends all the read request messages to the storage server;
Step 3: the storage server receives all the read request messages, processes the first read request message in order to obtain the location of the data in the file, reads the data specified by the location information, and sends the data to the client in a response message;
Step 4: the storage server processes the next read request message in order, obtains the location of the data in the file, reads the data specified by the location information, and sends the data to the client in a response message; step 4 is repeated until all the data accessed by the read requests has been read;
Step 5: the client receives the response messages and returns the data in them to the VFS layer.
The access granularity of a split read request is smaller than that of the read request and smaller than that of the pre-read request.
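Steps 1 to 5 can be sketched end to end in Python. This is an illustrative model under loud assumptions: in-process queue.Queue objects stand in for the network, a bytes object stands in for the file on the storage server's disk, and the message classes ReadRequestMessage and ResponseMessage and the functions storage_server and client_read are hypothetical names. The sketch shows the message flow (all split requests sent up front, one split request per message, the server handling messages strictly in order); it does not model the disk/network overlap itself, which depends on the server's send path being asynchronous.

```python
import queue
import threading
from dataclasses import dataclass

@dataclass
class ReadRequestMessage:
    """Hypothetical message carrying exactly one split read request."""
    seq: int
    offset: int
    length: int

@dataclass
class ResponseMessage:
    seq: int
    payload: bytes

def storage_server(file_data, inbox, outbox):
    """Steps 3-4: handle read request messages one at a time, in arrival order."""
    while True:
        msg = inbox.get()
        if msg is None:  # sentinel: the client has sent everything
            return
        # Resolve the location and read it (the "disk" is an in-memory bytes object here).
        data = file_data[msg.offset:msg.offset + msg.length]
        # Hand the reply to the "network" and move straight on to the next message.
        outbox.put(ResponseMessage(msg.seq, data))

def client_read(file_data, offset, length, piece_size):
    inbox, outbox = queue.Queue(), queue.Queue()
    server = threading.Thread(target=storage_server, args=(file_data, inbox, outbox))
    server.start()
    # Step 1: split the request into contiguous pieces of at most piece_size bytes.
    end = offset + length
    pieces = [(o, min(piece_size, end - o)) for o in range(offset, end, piece_size)]
    # Step 2: one message per piece, all sent without waiting for any reply.
    for seq, (o, n) in enumerate(pieces):
        inbox.put(ReadRequestMessage(seq, o, n))
    inbox.put(None)
    # Step 5: collect every response and return the data in file order.
    replies = [outbox.get() for _ in pieces]
    server.join()
    replies.sort(key=lambda r: r.seq)
    return b"".join(r.payload for r in replies)
```

Tagging each message with a sequence number lets the client reassemble the data in file order even if a real transport delivered replies out of order; in the method as described, the server replies in order anyway.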
Before step 1, the method further comprises:
Step 31: the client determines whether the access pattern is sequential; if so, step 32 is executed; otherwise, step 33 is executed;
Step 32: perform a sequential prefetch operation, package the location information of the prefetched data into a pre-read request, and prepare to send the pre-read request to the storage server;
Step 33: prepare to send the read request to the storage server.
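Steps 31 to 33 can be sketched as a small client-side helper. Assumptions: sequential access is detected simply by checking whether a read starts where the previous one ended, the prefetch size is a fixed hypothetical parameter prefetch_size, and any client-side cache that would absorb later reads is ignored. The names AccessPatternTracker and PendingRequest are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PendingRequest:
    kind: str    # "prefetch" for a pre-read request, "read" for a plain file read request
    offset: int
    length: int

class AccessPatternTracker:
    """Hypothetical helper for steps 31-33 on the client."""

    def __init__(self, prefetch_size: int):
        self.prefetch_size = prefetch_size   # aggregated readahead granularity (assumed)
        self.last_end: Optional[int] = None  # where the application's previous read ended

    def prepare(self, offset: int, length: int) -> PendingRequest:
        # Step 31: treat the access as sequential if it continues the previous read.
        sequential = self.last_end is not None and offset == self.last_end
        self.last_end = offset + length
        if sequential:
            # Step 32: package the prefetch range's location into a pre-read request.
            return PendingRequest("prefetch", offset, max(length, self.prefetch_size))
        # Step 33: pass the read request through unchanged.
        return PendingRequest("read", offset, length)
```

The pre-read request's length comes from the readahead size rather than from the application's request, which matches the later remark that its granularity is the one produced by readahead aggregation.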
The preset value of the access granularity of the read request is 512 KB.
The invention also discloses a system for reading files in a cluster file system, comprising a client and a storage server.
The client comprises a splitting module, a packaging module, and a return module.
The splitting module is configured to split a single pre-read request from the VFS layer, or a single file read request from the VFS layer whose access granularity exceeds a preset value, into at least two split read requests;
The packaging module is configured to encapsulate each split read request into a read request message and send all the read request messages to the storage server;
The return module is configured to receive the response messages returned by the storage server and return the data in them to the VFS layer;
The storage server is configured to receive all the read request messages; process the first read request message in order to obtain the location of the data in the file, read the data specified by the location information, and send the data to the client in a response message; then process the next read request message in order, obtain the location of the data in the file, read the data specified by the location information, and send the data to the client in a response message, repeating the processing of the next read request message until all the data accessed by the read request messages has been read.
The access granularity of a split read request is smaller than that of the read request and smaller than that of the pre-read request.
The splitting module is further configured to, when the access pattern is sequential, perform a sequential prefetch operation, package the location information of the prefetched data into a pre-read request, and prepare to send the pre-read request to the storage server; and, when the access pattern is not sequential, prepare to send the read request to the storage server.
The preset value of the access granularity of the read request is 512 KB.
The beneficial effect of the present invention is that, by splitting a single read request or pre-read request into several small read requests and encapsulating them into read request messages that are sent in parallel, multiple read request messages are guaranteed to be at the storage server at the same time. After the storage server has read the data accessed by one read request message from disk into memory, it transmits that data over the network while the disk reads the data accessed by the next read request message. The network transfer time of the data for one read request message therefore overlaps with the disk read time of the data for the following one, as shown in Fig. 2. Let the size of a single read request or pre-read request be $S$, let the time to read the data it accesses from disk be $T_d$, and let the network transfer time of that data be $T_n$; the total processing time of the request is then $T_d + T_n$. Assume that, within a certain range of access granularity, both disk access time and network transfer time are proportional to the amount of data, and that disk access is slower than the network. After the original request is split into four split read requests, the total processing time is

$$\frac{T_d}{4}+\frac{T_d}{4}+\frac{T_d}{4}+\frac{T_d}{4}+\frac{T_n}{4} = T_d + \frac{T_n}{4},$$

that is, the processing time of the original read request is reduced by $\frac{3}{4}T_n$.
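The timing argument can be checked with a two-stage pipeline simulation. This hypothetical Python sketch assumes the request is split into k equal pieces, the disk reads pieces back to back, and piece i is transmitted once it has been read and piece i-1 has finished transmitting; time for request messages and client processing is ignored. Under those assumptions the total works out to $\max(T_d, T_n) + \min(T_d, T_n)/k$, which reduces to $T_d + T_n/k$ when the disk is the slower stage.

```python
def pipelined_time(t_disk, t_net, k):
    """Total time to serve one request split into k equal pieces (two-stage pipeline)."""
    d, n = t_disk / k, t_net / k
    disk_done = 0.0  # time the disk finishes reading the current piece
    net_done = 0.0   # time the network finishes sending the current piece
    for _ in range(k):
        disk_done += d                           # disk reads pieces back to back
        net_done = max(disk_done, net_done) + n  # send once read and the link is free
    return net_done
```

With $T_d = 8$ and $T_n = 4$, one unsplit request takes 12 units and four pieces take 9, a saving of $\frac{3}{4}T_n = 3$. If the network were the slower stage instead, the saving would be bounded by the disk time rather than the network time.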
Brief description of the drawings
Fig. 1 is a schematic diagram of serial processing of read requests in the prior art;
Fig. 2 is a schematic diagram of the beneficial effect of the present invention;
Fig. 3 is a system architecture diagram of the present invention;
Fig. 4 is a flowchart of the method of the present invention.
Embodiment
The present invention is described in further detail below with reference to the accompanying drawings.
As shown in Fig. 3, the system of the present invention comprises a client 301 and a storage server 302.
A split read request is itself a read request; its access granularity is smaller than that of the pre-read request from the VFS layer and smaller than that of the single file read request from the VFS layer whose access granularity exceeds the preset value.
The flow of the method of the present invention is shown in Fig. 4.
Step S401: the client determines whether the access pattern is sequential; if so, step S402 is executed; otherwise, step S403 is executed.
Step S402: perform a sequential prefetch operation, package the location information of the data to be prefetched into a pre-read request, prepare to send the pre-read request to the storage server, and proceed to step S404.
The access granularity of the resulting pre-read request is not the granularity of the file access request issued by the application, but the granularity after aggregation by the readahead mechanism.
Step S403: prepare to send the single file read request from the VFS layer to the storage server.
Step S404: the client splits the single pre-read request from the VFS layer, or the single file read request from the VFS layer whose access granularity exceeds the preset value, into at least two split read requests.
In this embodiment, the preset value is 512 KB.
The access granularity of a split read request is smaller than that of the single file read request and smaller than that of the pre-read request.
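The splitting rule of step S404 might look like the following Python sketch. The 512 KB threshold comes from this embodiment; the maximum piece size PIECE_SIZE of 128 KB is a hypothetical choice, since the text only requires each split read request to be smaller than the original request. Pieces are also capped at half the request length so that at least two split read requests are always produced, as the step requires. The function name split_request is illustrative.

```python
SPLIT_THRESHOLD = 512 * 1024  # preset value from the embodiment: 512 KB
PIECE_SIZE = 128 * 1024       # hypothetical upper bound on one split read request

def split_request(offset, length, is_prefetch,
                  threshold=SPLIT_THRESHOLD, piece_size=PIECE_SIZE):
    """Return a list of (offset, length) split read requests for one VFS-level request."""
    # Ordinary reads at or below the threshold, and requests too small to split, pass through.
    if length < 2 or (not is_prefetch and length <= threshold):
        return [(offset, length)]
    # Cap each piece at half the request so there are always at least two pieces.
    size = min(piece_size, (length + 1) // 2)
    pieces = []
    pos, end = offset, offset + length
    while pos < end:
        n = min(size, end - pos)
        pieces.append((pos, n))
        pos += n
    return pieces
```

A 1 MB ordinary read becomes eight 128 KB split read requests, a 4 KB ordinary read is sent unchanged, and a pre-read request is split regardless of its size.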
Step S405: the client encapsulates each split read request into its own read request message (one split read request per message) and sends all the read request messages to the storage server.
Step S406: the storage server receives all the read request messages and processes the first read request message in order: it obtains the location in the file of the data referenced by that message, reads the data specified by the location information, and sends the data to the client in a response message.
Step S407: the storage server processes the next read request message in order: it obtains the location in the file of the data referenced by that message, reads the data specified by the location information, and sends the data to the client in a response message. Step S407 is repeated until all the data accessed by the read requests has been read.
The client receives the response messages sent in steps S406 and S407 and returns the data in them to the application layer.
Those skilled in the art may make various modifications to the above without departing from the spirit and scope of the invention as defined by the claims. Therefore, the scope of the invention is not limited to the above description but is defined by the scope of the claims.
Claims (8)
1. A method for reading files in a cluster file system, characterized by comprising:
Step 1: a client splits a single pre-read request from the virtual file system (VFS) layer, or a single file read request from the VFS layer whose access granularity exceeds a preset value, into at least two split read requests;
Step 2: the client encapsulates each split read request into a read request message and sends all the read request messages to a storage server;
Step 3: the storage server receives all the read request messages, processes the first read request message in order to obtain the location of the data in the file, reads the data specified by the location information, and sends the data to the client in a response message;
Step 4: the storage server processes the next read request message in order, obtains the location of the data in the file, reads the data specified by the location information, and sends the data to the client in a response message; step 4 is repeated until all the data accessed by the read requests has been read;
Step 5: the client receives the response messages and returns the data in them to the VFS layer.
2. The method for reading files in a cluster file system according to claim 1, characterized in that the access granularity of the split read request is smaller than that of the read request and smaller than that of the pre-read request.
3. The method for reading files in a cluster file system according to claim 2, characterized by further comprising, before step 1:
Step 31: the client determines whether the access pattern is sequential; if so, step 32 is executed; otherwise, step 33 is executed;
Step 32: perform a sequential prefetch operation, package the location information of the prefetched data into a pre-read request, and prepare to send the pre-read request to the storage server;
Step 33: prepare to send the read request to the storage server.
4. The method for reading files in a cluster file system according to claim 3, characterized in that
the preset value of the access granularity of the read request is 512 KB.
5. A system for reading files in a cluster file system, comprising a client and a storage server, characterized in that
the client comprises a splitting module, a packaging module, and a return module;
the splitting module is configured to split a single pre-read request from the VFS layer, or a single file read request from the VFS layer whose access granularity exceeds a preset value, into at least two split read requests;
the packaging module is configured to encapsulate each split read request into a read request message and send all the read request messages to the storage server;
the return module is configured to receive the response messages returned by the storage server and return the data in them to the VFS layer;
the storage server is configured to receive all the read request messages; process the first read request message in order to obtain the location of the data in the file, read the data specified by the location information, and send the data to the client in a response message; then process the next read request message in order, obtain the location of the data in the file, read the data specified by the location information, and send the data to the client in a response message, repeating the processing of the next read request message until all the data accessed by the read request messages has been read.
6. The system for reading files in a cluster file system according to claim 5, characterized in that
the access granularity of the split read request is smaller than that of the read request and smaller than that of the pre-read request.
7. The system for reading files in a cluster file system according to claim 6, characterized in that the splitting module is further configured to, when the access pattern is sequential, perform a sequential prefetch operation, package the location information of the prefetched data into a pre-read request, and prepare to send the pre-read request to the storage server; and, when the access pattern is not sequential, prepare to send the read request to the storage server.
8. The system for reading files in a cluster file system according to claim 7, characterized in that
the preset value of the access granularity of the read request is 512 KB.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2008102234889A CN101382955B (en) | 2008-09-28 | 2008-09-28 | File reading method in cluster file system and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101382955A true CN101382955A (en) | 2009-03-11 |
CN101382955B CN101382955B (en) | 2011-01-12 |
Family ID: 40462793
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2008102234889A Expired - Fee Related CN101382955B (en) | 2008-09-28 | 2008-09-28 | File reading method in cluster file system and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101382955B (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102014111A (en) * | 2009-09-04 | 2011-04-13 | 无锡江南计算技术研究所 | Data transmission method, message engine, communication node and network system |
CN102014111B (en) * | 2009-09-04 | 2013-09-18 | 无锡江南计算技术研究所 | Data transmission method, message engine, communication node and network system |
CN102567548A (en) * | 2012-02-21 | 2012-07-11 | 上海交通大学 | Streaming data pre-reading method for network file system |
WO2018133414A1 (en) * | 2017-01-20 | 2018-07-26 | 深圳市中兴微电子技术有限公司 | Packet cutting method, request processing method and apparatus, and computer storage medium |
CN109542361A (en) * | 2018-12-04 | 2019-03-29 | 郑州云海信息技术有限公司 | A kind of distributed memory system file reading, system and relevant apparatus |
CN109542361B (en) * | 2018-12-04 | 2022-06-07 | 郑州云海信息技术有限公司 | Distributed storage system file reading method, system and related device |
CN110636341A (en) * | 2019-10-25 | 2019-12-31 | 四川虹魔方网络科技有限公司 | Large-concurrency supporting multi-level fine-grained caching mechanism launcher interface optimization method |
CN110636341B (en) * | 2019-10-25 | 2021-11-09 | 四川虹魔方网络科技有限公司 | Large-concurrency supporting multi-level fine-grained caching mechanism launcher interface optimization method |
Also Published As
Publication number | Publication date |
---|---|
CN101382955B (en) | 2011-01-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Seo et al. | HPMR: Prefetching and pre-shuffling in shared MapReduce computation environment | |
Islam et al. | High performance RDMA-based design of HDFS over InfiniBand | |
Lagar-Cavilla et al. | Snowflock: Virtual machine cloning as a first-class cloud primitive | |
Huang et al. | High-performance design of hbase with rdma over infiniband | |
Ohta et al. | Optimization techniques at the I/O forwarding layer | |
Abbasi et al. | Extending i/o through high performance data services | |
Yu et al. | Design and evaluation of network-levitated merge for hadoop acceleration | |
CN101382955B (en) | File reading method in cluster file system and system | |
Sehgal et al. | Understanding application-level interoperability: Scaling-out mapreduce over high-performance grids and clouds | |
Peng et al. | Implementation issues of a cloud computing platform. | |
Wasi-ur-Rahman et al. | A comprehensive study of MapReduce over lustre for intermediate data placement and shuffle strategies on HPC clusters | |
CN103605630A (en) | Virtual server system and data reading-writing method thereof | |
US8869155B2 (en) | Increasing parallel program performance for irregular memory access problems with virtual data partitioning and hierarchical collectives | |
Shen et al. | Magnet: push-based shuffle service for large-scale data processing | |
Xu et al. | Analysis and optimization of data import with hadoop | |
Liu et al. | The research and analysis of efficiency of hardware usage base on HDFS | |
Krevat et al. | Applying performance models to understand data-intensive computing efficiency | |
Uta et al. | Overcoming data locality: An in-memory runtime file system with symmetrical data distribution | |
Chen et al. | A fast RPC system for virtual machines | |
Li et al. | Improving spark performance with zero-copy buffer management and RDMA | |
Que et al. | Hierarchical merge for scalable mapreduce | |
Sutherland et al. | Cooperative Concurrency Control for Write-Intensive Key-Value Workloads | |
Singh et al. | GePSeA: a general-purpose software acceleration framework for lightweight task offloading | |
Ross et al. | Server-side scheduling in cluster parallel I/O systems | |
Kunkel | Towards automatic load balancing of a parallel file system with subfile based migration |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | C06 | Publication | |
 | PB01 | Publication | |
 | C10 | Entry into substantive examination | |
 | SE01 | Entry into force of request for substantive examination | |
 | C14 | Grant of patent or utility model | |
 | GR01 | Patent grant | |
 | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20110112; termination date: 20190928 |