CN101382955A - File reading method in cluster file system and system - Google Patents


Info

Publication number
CN101382955A
Authority
CN
China
Prior art keywords
read request
read
data
storage server
client
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2008102234889A
Other languages
Chinese (zh)
Other versions
CN101382955B (en)
Inventor
刘岳
熊劲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN2008102234889A priority Critical patent/CN101382955B/en
Publication of CN101382955A publication Critical patent/CN101382955A/en
Application granted granted Critical
Publication of CN101382955B publication Critical patent/CN101382955B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to a method and system for reading files in a cluster file system. The method comprises: step 1, a client splits a single prefetch request from the VFS layer, or a single file read request from the VFS layer whose access granularity exceeds a preset value, into at least two split read requests; step 2, the client encapsulates each split read request in a read request message and sends the messages to a storage server; step 3, the storage server receives all the read request messages and processes them in order, for each one obtaining the location information, reading the data specified by that location information, and sending the data to the client in a response message, repeating until all data accessed by the read request messages has been read; and step 4, the client receives the response messages and returns the data they carry to the VFS layer. Disk I/O on the storage server and network data transmission can thus proceed in parallel, which shortens the overall processing time of a request and improves throughput.

Description

Method and system for reading files in a cluster file system
Technical field
The present invention relates to the field of computer storage, and in particular to a method and system for reading files in a cluster file system.
Background
A cluster system consists of multiple interconnected independent computers. Each computer may be a uniprocessor or multiprocessor system, such as a PC (personal computer), a workstation, or an SMP (symmetric multiprocessing) system, and each has its own memory, I/O (input/output) devices, and operating system. To users and applications a cluster appears as a single system that provides a low-cost, high-performance environment and fast, reliable service. Because of their high performance-to-price ratio, clusters have become the mainstream architecture for high-performance computers.
In a cluster system, storage servers are usually equipped with large-capacity storage devices, which must be managed while the cluster is running. The cluster must also provide file-sharing services to users on different clients. A cluster file system provides these services: it integrates all storage devices in the cluster and establishes a unified namespace (the organizational structure of files and directories). Every client sees a file system with the same directory structure, and users on different nodes (clients) can access the same file transparently. Data in a cluster file system is usually stored not on the client's local disk but on dedicated storage servers. Taking a write as an example: when an application process writes data through the cluster file system client, the client first sends the data to the storage server over the network, and the storage server then writes the received data to its storage devices.
The I/O (input/output) path of a cluster file system is long, and executing an operation involves several key components, such as the cache of the cluster file system client and the cache, I/O scheduler, controller, processor, and network resources of the storage server. These components must cooperate to complete an application's I/O requests. At present, disk access and network transmission performance are relatively low and lag behind the development of the other components. For I/O-intensive applications on a cluster file system, disk access time and network transmission time therefore account for most of the total request processing time.
Because processing a request passes through several stages, request processing can be pipelined, by analogy with instruction pipelining, so that multiple physical devices work in parallel. The primary condition for pipelining is that multiple requests can be issued at the same time: only when multiple requests are being processed simultaneously can multiple processing components work in parallel. Changing from issuing one request at a time to issuing several at once is called making the requests asynchronous. Write operations are inherently asynchronous: the next write request can be issued without waiting for the current one to finish. So even under a single workload, multiple write requests are being sent to the storage server at any moment, and write requests are pipelined naturally. Read requests, by contrast, are synchronous: an application can continue computing and issue subsequent reads only after the data has been read from the file. Without special handling, read requests are therefore processed strictly serially. As shown in Fig. 1, the storage server first reads the data of the first request from disk into memory and then sends it back to the client over the network; only after receiving the result of the first request does the client issue the next read request. Throughout this process the physical devices cannot work in parallel.
To overlap disk access with the other stages of request processing, the prior art uses data prefetching (read-ahead). When the file system detects that an application's access pattern is sequential, then after reading the requested data from disk into memory it continues to read a further portion of the subsequent data from disk. When the application later issues a read request for that data, the data is served directly from memory.
Using a read-ahead mechanism on the storage server side of a cluster file system can overlap disk access time with network transmission time to some extent, but it has two limitations. First, the application's accesses must be sequential; otherwise read-ahead has no effect. Second, the server-side read-ahead granularity must be fairly large before processing is sufficiently parallelized, and under multiple concurrent workloads an excessively large read-ahead granularity consumes more memory. These two points limit how well read-ahead can parallelize request processing.
Summary of the invention
To solve the above problems, the invention provides a method and system for reading files in a cluster file system that allow the disk I/O of the storage server and network data transmission to work in parallel, shortening the total processing time of a request and increasing system throughput.
The invention discloses a method for reading files in a cluster file system, comprising:
Step 1: a client splits a single prefetch request from the virtual file system layer, or a single file read request from the virtual file system layer whose access granularity is greater than a preset value, into at least two split read requests;
Step 2: the client encapsulates each split read request in a read request message and sends all the read request messages to a storage server;
Step 3: the storage server receives all the read request messages, processes the first read request message in order, obtains the location of the data in the file, reads the data specified by that location information, and sends the data to the client in a response message;
Step 4: the storage server processes the next read request message in order, obtains the location of the data in the file, reads the data specified by that location information, and sends the data to the client in a response message; step 4 is repeated until the data accessed by all the read requests has been read;
Step 5: the client receives the response messages and returns the data in them to the virtual file system layer.
The access granularity of each split read request is smaller than that of the read request and smaller than that of the prefetch request.
Before step 1, the method further comprises:
Step 31: the client determines whether the access pattern is sequential; if so, step 32 is performed; otherwise, step 33 is performed;
Step 32: a sequential prefetch operation is performed, the location information of the data to be prefetched sequentially is packaged into a prefetch request, and the prefetch request is prepared for sending to the storage server;
Step 33: the read request is prepared for sending to the storage server.
The preset value of the access granularity of the read request is 512 KB.
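The splitting in step 1 can be sketched as follows. This is only an illustration under stated assumptions: the function name `split_read_request`, the `(offset, length)` representation, the even split, and the default of four pieces are hypothetical choices; the source fixes only the 512 KB preset value and the requirement of at least two split read requests.

```python
from typing import List, Tuple

PRESET_VALUE = 512 * 1024  # 512 KB preset value given in the text


def split_read_request(offset: int, length: int, is_prefetch: bool,
                       parts: int = 4,
                       preset: int = PRESET_VALUE) -> List[Tuple[int, int]]:
    """Return (offset, length) pieces covering [offset, offset + length).

    `parts` is an assumed tuning knob; the source does not say how many
    split read requests to create, only that there are at least two.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    # Only prefetch requests, or reads larger than the preset value, are split.
    if length < 2 or (not is_prefetch and length <= preset):
        return [(offset, length)]
    parts = max(2, min(parts, length))  # at least two, and no empty piece
    base, extra = divmod(length, parts)
    pieces = []
    pos = offset
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # spread the remainder evenly
        pieces.append((pos, size))
        pos += size
    return pieces
```

A 1 MB read is split into four 256 KB pieces, while a 4 KB read that is not a prefetch request is passed through unchanged.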
The invention also discloses a system for reading files in a cluster file system, comprising a client and a storage server.
The client comprises a splitting module, an encapsulation module, and a return module.
The splitting module is configured to split a single prefetch request from the virtual file system layer, or a single file read request from the virtual file system layer whose access granularity is greater than a preset value, into at least two split read requests.
The encapsulation module is configured to encapsulate each split read request in a read request message and to send all the read request messages to the storage server.
The return module is configured to receive the response messages returned by the storage server and to return the data in them to the virtual file system layer.
The storage server is configured to receive all the read request messages; to process the first read request message in order, obtain the location of the data in the file, read the data specified by that location information, and send the data to the client in a response message; and to process the next read request message in the same way, repeating until the data accessed by all the read request messages has been read.
The access granularity of each split read request is smaller than that of the read request and smaller than that of the prefetch request.
The splitting module is further configured, when the access pattern is sequential, to perform a sequential prefetch operation, package the location information of the data to be prefetched sequentially into a prefetch request, and prepare the prefetch request for sending to the storage server; and, when the access pattern is not sequential, to prepare the read request for sending to the storage server.
The preset value of the access granularity of the read request is 512 KB.
The beneficial effect of the invention is as follows. By splitting a single read request or prefetch request into several smaller read requests, encapsulating each in a read request message, and sending the messages in parallel, multiple read request messages are guaranteed to be outstanding at the storage server at the same time. After the storage server has read the data of one read request message from disk into memory and begins sending it over the network, the disk reads the data of the next read request message. The network transmission time of the data accessed by one message thus overlaps with the disk read time of the data accessed by the next, as shown in Fig. 2. Suppose a single read request or prefetch request has size S, reading its data from disk takes T_d, and transmitting that data over the network takes T_n, so that processing the request takes T_d + T_n in total. If, within a certain range of access granularity, disk access time and network transmission time are both proportional to the amount of data, and disk access is slower than the network, then after the original request is split into four split read requests the total processing time is (T_d/4 + T_d/4 + T_d/4 + T_d/4 + T_n/4) = T_d + T_n/4. That is, the processing time of the original read request is reduced by 3T_n/4.
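The timing argument can be checked with a small two-stage pipeline model. This is a sketch under the same assumptions as the text (times proportional to data volume, one disk and one network link, pieces of equal size); the function names are illustrative and not part of the patent.

```python
def serial_time(t_disk: float, t_net: float) -> float:
    """Unsplit request: the network send starts only after the disk read ends."""
    return t_disk + t_net


def pipelined_time(t_disk: float, t_net: float, parts: int) -> float:
    """Split request: the disk reads piece i+1 while the network sends piece i."""
    d = t_disk / parts  # disk time per piece
    n = t_net / parts   # network time per piece
    disk_done = 0.0
    net_done = 0.0
    for _ in range(parts):
        disk_done += d  # the disk reads the pieces back to back
        # A send starts once its data is in memory and the link is free.
        net_done = max(net_done, disk_done) + n
    return net_done
```

With T_d = 8 and T_n = 4 (arbitrary time units) and four pieces, the model gives 12 for the serial case and 9 for the pipelined case, which matches T_d + T_n/4 and a saving of 3T_n/4 = 3.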
Brief description of the drawings
Fig. 1 is a schematic diagram of serial read request processing in the prior art;
Fig. 2 is a schematic diagram of the beneficial effect of the invention;
Fig. 3 is a diagram of the system of the invention;
Fig. 4 is a flowchart of the method of the invention.
Detailed description of the embodiments
The invention is described in further detail below with reference to the accompanying drawings.
The system architecture of the invention is shown in Fig. 3 and comprises a client 301 and a storage server 302.
The client 301 comprises a splitting module 311, an encapsulation module 312, and a return module 313.
The splitting module 311 is configured to split a single prefetch request from the VFS (Virtual File System) layer, or a single file read request from the VFS layer whose access granularity is greater than a preset value, into at least two split read requests. In this embodiment the preset value is 512 KB.
A split read request is itself a read request. Its access granularity is smaller than that of the prefetch request from the VFS layer and smaller than that of the single file read request from the VFS layer whose access granularity is greater than the preset value.
The splitting module 311 is further configured, when the access pattern is sequential, to perform a sequential prefetch operation, package the location information of the data to be prefetched into a prefetch request, and prepare the prefetch request for sending to the storage server 302; and, when the access pattern is not sequential, to prepare the single file read request from the VFS layer for sending to the storage server 302.
The encapsulation module 312 is configured to encapsulate each split read request in a read request message and to send all the read request messages to the storage server 302. Each read request message carries exactly one split read request.
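One way the encapsulation module might encode each split read request is sketched below. The wire layout is entirely hypothetical: the source does not define message fields, so `request_id`, `index`, `count`, and `file_id` are assumptions introduced for illustration.

```python
import struct

# Hypothetical header layout (not specified in the source), network byte order:
# request_id (u32), index (u16), count (u16), file_id (u64), offset (u64), length (u32)
HEADER = struct.Struct("!IHHQQI")


def pack_read_message(request_id, index, count, file_id, offset, length):
    """Encode one split read request as one read request message."""
    return HEADER.pack(request_id, index, count, file_id, offset, length)


def unpack_read_message(message):
    """Decode a read request message back into its header fields."""
    return HEADER.unpack(message)


def encapsulate(request_id, file_id, pieces):
    """One message per split read request, in file order."""
    return [pack_read_message(request_id, i, len(pieces), file_id, off, size)
            for i, (off, size) in enumerate(pieces)]
```

Carrying the piece index and the piece count in every message lets the receiver tell which part of the original request a message belongs to; this is a design choice of the sketch, not something the patent prescribes.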
The return module 313 is configured to receive the response messages returned by the storage server 302 and to return the data in them to the VFS layer.
The storage server 302 is configured to receive all the read request messages sent by the client 301; to process the first read request message in order, obtain the location of the data in the file, read the data specified by that location information, and send the data to the client 301 in a response message; and to process the next read request message in the same way, repeating until the data accessed by all the read requests has been read.
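The server-side behaviour can be sketched as a loop over the received messages. This is a simplified, single-threaded illustration: messages are assumed to be already decoded into hypothetical `(index, offset, length)` tuples, and an in-memory file object stands in for the disk. In a real server the network send of one response would run concurrently with the disk read for the next message.

```python
import io
from typing import BinaryIO, Iterable, Iterator, Tuple


def serve_read_messages(storage: BinaryIO,
                        messages: Iterable[Tuple[int, int, int]]
                        ) -> Iterator[Tuple[int, bytes]]:
    """Process messages in arrival order and yield one response per message."""
    for index, offset, length in messages:
        storage.seek(offset)         # location of the data in the file
        data = storage.read(length)  # read the data named by that location
        yield index, data            # hand the response to the sender


if __name__ == "__main__":
    disk = io.BytesIO(b"abcdefghij")
    for index, data in serve_read_messages(disk, [(0, 0, 4), (1, 4, 4), (2, 8, 2)]):
        print(index, data)
```

Writing the loop as a generator means each response is available as soon as its data has been read, rather than after the whole original request has been served.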
The flow of the method of the invention is shown in Fig. 4.
Step S401: the client determines whether the access pattern is sequential; if so, step S402 is performed; otherwise, step S403 is performed.
Step S402: a sequential prefetch operation is performed, the location information of the data to be prefetched is packaged into a prefetch request, and the prefetch request is prepared for sending to the storage server; step S404 is then performed.
The access granularity of the resulting prefetch request is not the access granularity of the file access request issued by the application, but the granularity after aggregation by the read-ahead mechanism.
Step S403: the single file read request from the VFS layer is prepared for sending to the storage server.
Step S404: the client splits the single prefetch request from the VFS layer, or the single file read request from the VFS layer whose access granularity is greater than the preset value, into at least two split read requests.
In this embodiment the preset value is 512 KB.
The access granularity of each split read request is smaller than that of the single file read request and smaller than that of the prefetch request.
Step S405: the client encapsulates each split read request in a read request message and sends all the read request messages to the storage server. Each read request message carries exactly one split read request.
Step S406: the storage server receives all the read request messages, processes the first read request message in order, obtains from it the location of the data in the file, reads the data specified by that location information, and sends the data to the client in a response message.
Step S407: the storage server processes the next read request message in order, obtains from it the location of the data in the file, reads the data specified by that location information, and sends the data to the client in a response message; step S407 is repeated until the data accessed by all the read requests has been read.
The client receives the response messages sent in steps S406 and S407 and returns the data in them to the application layer.
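On the client side, the responses have to be put back in file order before the data is handed upward. The sketch below assumes each response carries a hypothetical piece `index`, which the source does not specify, and tolerates responses arriving in any order.

```python
from typing import Iterable, List, Optional, Tuple


def assemble_responses(count: int,
                       responses: Iterable[Tuple[int, bytes]]) -> bytes:
    """Combine (index, data) responses for `count` split read requests."""
    buffers: List[Optional[bytes]] = [None] * count
    for index, data in responses:
        if not 0 <= index < count:
            raise ValueError("unexpected piece index %d" % index)
        buffers[index] = data
    if any(b is None for b in buffers):
        raise ValueError("a response message is missing")
    return b"".join(buffers)  # data in original file order
```

For example, two responses arriving as piece 1 then piece 0 are still returned as one contiguous buffer in the order of the original request.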
Those skilled in the art may make various modifications to the above without departing from the spirit and scope of the invention as defined by the claims. The scope of the invention is therefore not limited to the above description but is determined by the scope of the claims.

Claims (8)

1. A method for reading files in a cluster file system, characterized by comprising:
Step 1: a client splits a single prefetch request from the virtual file system layer, or a single file read request from the virtual file system layer whose access granularity is greater than a preset value, into at least two split read requests;
Step 2: the client encapsulates each split read request in a read request message and sends all the read request messages to a storage server;
Step 3: the storage server receives all the read request messages, processes the first read request message in order, obtains the location of the data in the file, reads the data specified by that location information, and sends the data to the client in a response message;
Step 4: the storage server processes the next read request message in order, obtains the location of the data in the file, reads the data specified by that location information, and sends the data to the client in a response message; step 4 is repeated until the data accessed by all the read requests has been read;
Step 5: the client receives the response messages and returns the data in them to the virtual file system layer.
2. The method for reading files in a cluster file system according to claim 1, characterized in that the access granularity of each split read request is smaller than that of the read request and smaller than that of the prefetch request.
3. The method for reading files in a cluster file system according to claim 2, characterized in that, before step 1, the method further comprises:
Step 31: the client determines whether the access pattern is sequential; if so, step 32 is performed; otherwise, step 33 is performed;
Step 32: a sequential prefetch operation is performed, the location information of the data to be prefetched sequentially is packaged into a prefetch request, and the prefetch request is prepared for sending to the storage server;
Step 33: the read request is prepared for sending to the storage server.
4. The method for reading files in a cluster file system according to claim 3, characterized in that
the preset value of the access granularity of the read request is 512 KB.
5. A system for reading files in a cluster file system, comprising a client and a storage server, characterized in that
the client comprises a splitting module, an encapsulation module, and a return module;
the splitting module is configured to split a single prefetch request from the virtual file system layer, or a single file read request from the virtual file system layer whose access granularity is greater than a preset value, into at least two split read requests;
the encapsulation module is configured to encapsulate each split read request in a read request message and to send all the read request messages to the storage server;
the return module is configured to receive the response messages returned by the storage server and to return the data in them to the virtual file system layer;
the storage server is configured to receive all the read request messages; to process the first read request message in order, obtain the location of the data in the file, read the data specified by that location information, and send the data to the client in a response message; and to process the next read request message in the same way, repeating until the data accessed by all the read request messages has been read.
6. The system for reading files in a cluster file system according to claim 5, characterized in that
the access granularity of each split read request is smaller than that of the read request and smaller than that of the prefetch request.
7. The system for reading files in a cluster file system according to claim 6, characterized in that the splitting module is further configured, when the access pattern is sequential, to perform a sequential prefetch operation, package the location information of the data to be prefetched sequentially into a prefetch request, and prepare the prefetch request for sending to the storage server; and, when the access pattern is not sequential, to prepare the read request for sending to the storage server.
8. The system for reading files in a cluster file system according to claim 7, characterized in that
the preset value of the access granularity of the read request is 512 KB.
CN2008102234889A 2008-09-28 2008-09-28 File reading method in cluster file system and system Expired - Fee Related CN101382955B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008102234889A CN101382955B (en) 2008-09-28 2008-09-28 File reading method in cluster file system and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2008102234889A CN101382955B (en) 2008-09-28 2008-09-28 File reading method in cluster file system and system

Publications (2)

Publication Number Publication Date
CN101382955A true CN101382955A (en) 2009-03-11
CN101382955B CN101382955B (en) 2011-01-12

Family

ID=40462793

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008102234889A Expired - Fee Related CN101382955B (en) 2008-09-28 2008-09-28 File reading method in cluster file system and system

Country Status (1)

Country Link
CN (1) CN101382955B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102014111A (en) * 2009-09-04 2011-04-13 无锡江南计算技术研究所 Data transmission method, message engine, communication node and network system
CN102014111B (en) * 2009-09-04 2013-09-18 无锡江南计算技术研究所 Data transmission method, message engine, communication node and network system
CN102567548A (en) * 2012-02-21 2012-07-11 上海交通大学 Streaming data pre-reading method for network file system
WO2018133414A1 (en) * 2017-01-20 2018-07-26 深圳市中兴微电子技术有限公司 Packet cutting method, request processing method and apparatus, and computer storage medium
CN109542361A (en) * 2018-12-04 2019-03-29 郑州云海信息技术有限公司 A kind of distributed memory system file reading, system and relevant apparatus
CN109542361B (en) * 2018-12-04 2022-06-07 郑州云海信息技术有限公司 Distributed storage system file reading method, system and related device
CN110636341A (en) * 2019-10-25 2019-12-31 四川虹魔方网络科技有限公司 Large-concurrency supporting multi-level fine-grained caching mechanism launcher interface optimization method
CN110636341B (en) * 2019-10-25 2021-11-09 四川虹魔方网络科技有限公司 Large-concurrency supporting multi-level fine-grained caching mechanism launcher interface optimization method

Also Published As

Publication number Publication date
CN101382955B (en) 2011-01-12

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110112

Termination date: 20190928
