CN103902660A - System and method for prefetching file layout through readdir++ in cluster file system - Google Patents

Info

Publication number
CN103902660A
Authority
CN
China
Prior art keywords
file
client
catalogue
directory
file layout
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410076739.0A
Other languages
Chinese (zh)
Other versions
CN103902660B (en
Inventor
杨洪章
张军伟
刘振军
许鲁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin Zhongke Bluewhale Information Technology Co ltd
Institute of Computing Technology of CAS
Original Assignee
Tianjin Zhongke Bluewhale Information Technology Co ltd
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin Zhongke Bluewhale Information Technology Co ltd, Institute of Computing Technology of CAS filed Critical Tianjin Zhongke Bluewhale Information Technology Co ltd
Priority to CN201410076739.0A priority Critical patent/CN103902660B/en
Publication of CN103902660A publication Critical patent/CN103902660A/en
Application granted granted Critical
Publication of CN103902660B publication Critical patent/CN103902660B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/172Caching, prefetching or hoarding of files
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/182Distributed file systems
    • G06F16/1824Distributed file systems implemented using Network-attached Storage [NAS] architecture
    • G06F16/183Provision of network file services by network file servers, e.g. by using NFS, CIFS
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/50Network services
    • H04L67/56Provisioning of proxy services
    • H04L67/568Storing data temporarily at an intermediate stage, e.g. caching
    • H04L67/5681Pre-fetching or pre-delivering data based on network characteristics

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention discloses a system and method for prefetching file layouts through readdir++ in a cluster file system. The system comprises a client module (1) and a server module (2). The client module (1) obtains a read-directory delegation from the server module (2) or returns it; once the delegation is obtained, it sends a read-directory request to the server module (2); it stores the pages sent by the server module (2), which contain the file layouts, in a local cache, and when it reads a file in the directory it uses the file layout stored in the local cache directly. The server module (2) grants the read-directory delegation to the client module (1) or recalls it; when it receives a read-directory request, it encapsulates the metadata, including the file layout, in pages and sends the pages to the client module (1). This reduces the network interaction overhead of obtaining file layouts when files are read and greatly improves read performance for massive numbers of small files.

Description

System and method for prefetching file layout through readdir++ in a cluster file system
Technical field
The present invention relates to metadata prefetching mechanisms in cluster file systems, and in particular to a system and method for prefetching file layouts through readdir++ in a cluster file system.
Background
With the arrival of the big data era, the volume of data worldwide is growing rapidly. In fields such as e-commerce, social networking, and scientific computing, more and more small files are appearing. Managing massive numbers of small files efficiently and providing low-latency access to them is therefore a new problem facing distributed file systems.
In recent years, separating metadata services from data services has become the mainstream trend in distributed file systems. The benefit of this separated architecture is that clients access the storage devices directly, out of band, which yields higher performance for large-file access.
Small-file access is a completely different situation. For small files, data access accounts for a small share of the work and metadata access for a large share. Before a client can access file data, it must first synchronously obtain the file layout through a network interaction (RPC), which makes the latency of each small-file operation excessive. In particular, when a client continuously reads a large number of small files under the same directory, it must issue a separate synchronous file layout request for each small file, which severely degrades system performance.
Read directory (readdir) is the file system operation that reads a directory; its purpose is to obtain basic information such as the name, type, and inode number (ino) of every directory entry in the directory.
A directory delegation (DELEGATION) is a recallable guarantee that the server gives to a client. From the time the delegation is granted until it is recalled, the server guarantees that operations by other clients on the directory cannot conflict with the file system's consistency semantics. In essence, the server gives the client the ability to handle read directory, lookup, open, close, read, and write locally, without interacting with the server. Without directory delegation, all of these operations require interaction with the server, and the time overhead is very large.
The current parallel network file system (pNFS) uses the readdir+ technique, an improvement built on read directory and directory delegation. Besides the name, type, and inode number, it also prefetches the file handle (fh) and file attributes (fattr) of every directory entry in the directory, both of which are vital metadata. Afterwards, when the client needs the metadata of a file in that directory, it does not have to send getattr or handle-fetching requests over the network; it obtains the metadata directly from the local cache, which reduces network interaction with the metadata server.
However, in a distributed file system based on a block interface, the metadata needed by a client's subsequent read operations is not only the file attributes and file handle but also the volume physical block numbers that correspond to the file's logical positions, that is, the file layout. The readdir+ technique used by pNFS therefore cannot reduce the network interaction overhead of obtaining file layouts.
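The gap described above can be made concrete with a rough count of metadata round trips. The following Python sketch is illustrative only: it assumes a single directory whose listing fits in one request, and it assumes that without any prefetching each file costs one RPC for its handle and attributes plus one RPC for its layout. The function name, the table, and the per-file costs are assumptions for illustration, not figures from the patent; "readdir++" stands for a scheme that also prefetches the layout.

```python
# Simplified model (assumption): metadata RPCs needed to read every file
# in one directory, ignoring data I/O and multi-request directory listings.
PER_FILE_RPCS = {
    "readdir": 2,    # handle/attributes and layout both fetched per file
    "readdir+": 1,   # handle/attributes prefetched; layout still fetched
    "readdir++": 0,  # layout prefetched as well
}


def metadata_rpcs(num_files: int, scheme: str) -> int:
    """Return one directory-read RPC plus the per-file metadata RPCs."""
    return 1 + PER_FILE_RPCS[scheme] * num_files
```

Under these assumptions, the per-file term is what dominates for directories holding many small files, and it is the only term that a layout-prefetching scheme removes.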
Summary of the invention
To address the problem above, the object of the present invention is to provide a system and method for prefetching file layouts through readdir++ in a cluster file system, which reduce the network interaction overhead of obtaining file layouts when files are read and significantly improve read performance for massive numbers of small files.
The readdir++ technique of the present invention is an improvement on readdir+: when a directory is read, it prefetches not only the file attributes, file handles, and so on, but also the file layouts, which avoids the separate network interaction otherwise needed to obtain a file layout.
To achieve the above object, the present invention proposes a system for prefetching file layouts through readdir++ in a cluster file system, which prefetches the metadata of massive numbers of small files, including their file layouts, so that the small files can be read quickly, characterized in that the system comprises:
A client module (1), for obtaining a read-directory delegation from the server module (2) or returning it; for sending a read-directory request to the server module (2) once the delegation has been obtained; and for storing the pages sent by the server module (2), which contain file layouts, in a local cache, so that when the client module (1) reads a file under the directory it directly uses the file layout of that file stored in the local cache;
The server module (2), which stores the metadata of the small files, for granting the read-directory delegation to the client module (1) or recalling it; and, on receiving a read-directory request, for encapsulating the metadata, including the file layout, in directory entries, encapsulating the directory entries in pages in order, and sending the pages to the client module (1).
The system for prefetching file layouts through readdir++ in a cluster file system according to the present invention is characterized in that the client module (1) specifically comprises:
A client RPC sending submodule (11), for sending network interaction (RPC) messages to the server module;
A client RPC receiving submodule (12), for receiving the network interaction messages sent by the server module;
A client page cache submodule (13), for storing the pages sent by the server module;
A client dentry and inode cache submodule (14), for storing directory cache entries (dentries) and inodes;
A client directory entry parsing submodule (15), for traversing all pages in the client page cache submodule (13) and parsing out the metadata;
A client readdir information submission submodule (16), for submitting the read-directory information;
A client operation trigger submodule (17), for triggering read directory, lookup, open, and close operations;
A client directory delegation handling submodule (18), for checking whether a directory delegation has been obtained and for removing a delegation being recalled from the list of delegated directories;
A client file layout management module (19), for incrementing and decrementing file layout reference counts, merging a file layout with the layout already held in its corresponding inode, and reading the corresponding data from the data disk according to the file layout.
The system for prefetching file layouts through readdir++ in a cluster file system according to the present invention is characterized in that the server module (2) specifically comprises:
A server RPC receiving submodule (21), for receiving the network interaction requests sent by the client module;
A server file layout retrieval submodule (22), for obtaining file layouts and encoding them;
A server RPC sending submodule (23), for responding to the client module according to the type of network interaction message received;
A server directory delegation submodule (24), for granting and recalling directory delegations.
The system for prefetching file layouts through readdir++ in a cluster file system according to the present invention is characterized in that the client page cache submodule (13) contains multiple pages, organized as follows:
Each page stores directory entries in order, and pages are indexed by page index number. The number of directory entries per page varies. If the remaining space of the current page is not enough to hold the next directory entry, a new page is used, until all directory entries of the directory have been stored in pages. The end of each directory entry carries an end mark that records whether the entry is the last directory entry of its page and whether it is the last directory entry of the directory.
The system for prefetching file layouts through readdir++ in a cluster file system according to the present invention is characterized in that,
The directory entry comprises an inode number, name, file handle, file attributes, file layout, and end mark; the inode number is used to submit the read-directory information; the name is used to build the dentry; the file handle and file attributes are used to build the inode.
A method for the system for prefetching file layouts through readdir++ in a cluster file system as described above, characterized in that the method comprises the following steps:
Step 1, the client module obtains the read-directory delegation for a directory from the server module;
Step 2, the client module triggers a read-directory operation and obtains from the server module the information of all directory entries in the directory, including inode number, name, file handle, file attributes, and file layout;
Step 3, the client module triggers a file open operation and determines whether the dentry of the file exists in the client dentry and inode cache submodule (14); if it exists, go to step 4; if not, go to step 5;
Step 4, the client module triggers a lookup of the file's dentry; during the lookup it obtains the dentry and its file layout and increments the layout's reference count by 1, then reads the data from the location on the data disk given by that file layout; go to step 6;
Step 5, the client module parses the corresponding directory entry in the page cache, uses its metadata to build the dentry, and increments the file layout's reference count by 1, then reads the data from the location on the data disk given by that file layout;
Step 6, the client module triggers a file close operation and decrements the file layout's reference count by 1;
Step 7, the client module checks whether the file layout reference counts of all files in the directory are all 0; if not, some file has not been closed and the client must wait for it to close; if they are all 0, the read-directory delegation can be returned to the server module.
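The reference counting in steps 4 to 7 can be sketched as follows. This is a minimal Python illustration under assumptions: the class name `LayoutRefs` and its methods are hypothetical, and a plain dictionary stands in for whatever per-inode bookkeeping a real client would use.

```python
class LayoutRefs:
    """Tracks per-file layout reference counts for one delegated directory."""

    def __init__(self, names):
        # Every prefetched layout starts with no users.
        self.count = {name: 0 for name in names}

    def open(self, name):
        # Steps 4/5: using a cached layout takes one reference.
        self.count[name] += 1

    def close(self, name):
        # Step 6: closing the file drops that reference.
        if self.count[name] == 0:
            raise ValueError(f"{name} is not open")
        self.count[name] -= 1

    def can_return_delegation(self):
        # Step 7: the delegation may go back only when nothing is open.
        return all(c == 0 for c in self.count.values())
```

The point of the count is that a layout in use must not be invalidated underneath an open file, so returning the delegation is deferred until every count has dropped back to zero.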
The method for the system for prefetching file layouts through readdir++ in a cluster file system according to the present invention is characterized in that step 1 further comprises the following steps:
Step 11, the client RPC sending submodule (11) sends a network interaction message to the server module to apply for the read-directory delegation on the directory;
Step 12, the server RPC receiving submodule (21) receives the message and determines that the request is a directory delegation request;
Step 13, the server directory delegation submodule (24) processes the request and decides whether to grant the delegation;
Step 14, the server RPC sending submodule (23) sends the result to the client;
Step 15, the client RPC receiving submodule (12) receives the server module's result on whether the read-directory delegation was granted; if granted, it notifies the client directory delegation handling submodule (18) and goes to step 16; if not granted, return to step 11;
Step 16, the client directory delegation handling submodule (18) records the directory in the list of read-directory delegations obtained.
The method for the system for prefetching file layouts through readdir++ in a cluster file system according to the present invention is characterized in that step 2 further comprises the following steps:
Step 21, the client operation trigger submodule (17) triggers a read-directory operation, and the client page cache submodule (13) looks in the local page cache for the pages holding the directory's entries; if they are not found, go to step 22; if found, go to step 27;
Step 22, the client RPC sending submodule (11) sends a network interaction message to the server module requesting the read-directory information;
Step 23, the server RPC receiving submodule (21) receives the message;
Step 24, the server file layout retrieval submodule (22) obtains the file layouts of the specified directory entries;
Step 25, the server RPC sending submodule (23) encapsulates the metadata, including the file layout, in directory entries, encapsulates the directory entries in pages in order, and sends the pages to the client module;
Step 26, the client RPC receiving submodule (12) receives the pages sent by the server and hands them to the client page cache submodule (13) for storage;
Step 27, the client directory entry parsing submodule (15) parses the directory entries one by one in the locally cached pages; the name is used to build the dentry, and the file handle and file attributes are used to build the inode; the newly built dentry and inode are handed to the client dentry and inode cache submodule (14) for storage;
Step 28, the client file layout management module (19) merges the file layout with the layout already held in the inode;
Step 29, the client readdir information submission submodule (16) submits the name, inode number, and type to the read-directory operation, and sets to 1 the flag in the parent directory's inode that records that a read-directory operation has been done.
The method for the system for prefetching file layouts through readdir++ in a cluster file system according to the present invention is characterized in that step 3 further comprises the following steps:
Step 31, the client operation trigger submodule (17) triggers a file open operation;
Step 32, the client directory delegation handling submodule (18) checks whether the read-directory delegation has been obtained; if so, go to step 33; if not, go to step 34;
Step 33, the client readdir information submission submodule (16) checks whether the flag is 1; if it is 1, a read-directory operation has been done, go to step 35; if it is 0, no read-directory operation has been done, go to step 34;
Step 34, the client RPC sending submodule (11) sends a network interaction message, and the open operation completes according to the flow without directory delegation;
Step 35, the client dentry and inode cache submodule (14) checks whether the dentry of the file to be opened exists in the cache; if it exists, go to step 4; if not, go to step 5.
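The branching in steps 32 to 35 reduces to three checks made in a fixed order. The Python sketch below restates that decision; the function name `open_path` and the returned labels are hypothetical, chosen only to name the three outcomes.

```python
def open_path(has_delegation: bool, readdir_done: bool, dentry_cached: bool) -> str:
    """Pick the branch of steps 32-35 taken by an open operation."""
    if not has_delegation:      # step 32: no read-directory delegation
        return "rpc"            # step 34: open through the server
    if not readdir_done:        # step 33: flag is 0, nothing prefetched yet
        return "rpc"            # step 34
    if dentry_cached:           # step 35: dentry still in the cache
        return "dentry-cache"   # continue with step 4
    return "page-cache"         # continue with step 5
```

Only the first outcome involves a network interaction; the other two are served entirely from the client's local caches.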
The method for the system for prefetching file layouts through readdir++ in a cluster file system according to the present invention is characterized in that step 4 further comprises the following steps:
Step 41, the client operation trigger submodule (17) triggers a lookup of the file's dentry;
Step 42, the client dentry and inode cache submodule (14) returns the cached dentry directly to the open operation;
Step 43, the client file layout management module (19) increments the file layout reference count by 1, then reads the data from the location on the data disk given by the file layout already obtained.
The method for the system for prefetching file layouts through readdir++ in a cluster file system according to the present invention is characterized in that step 5 further comprises the following steps:
Step 51, the client directory entry parsing submodule (15) traverses the page cache looking for the corresponding directory entry; the match condition is that the name in the directory entry is identical to the name of the file to be opened; if it is not found, go to step 52; if found, go to step 53;
Step 52, the client operation trigger submodule (17) assigns an error message to the directory entry; go to step 56;
Step 53, the client directory entry parsing submodule (15) parses all remaining information in the corresponding directory entry, including the file handle, file attributes, and file layout; the file handle and file attributes are used to build the inode, and the inode is associated with the dentry;
Step 54, the client dentry and inode cache submodule (14) stores the dentry and inode;
Step 55, the dentry is returned to the open operation;
Step 56, the client file layout management module (19) merges the file layout with the layout already held in the inode and increments the file layout reference count by 1, then reads the data from the location on the data disk given by the file layout already obtained.
The method for the system for prefetching file layouts through readdir++ in a cluster file system according to the present invention is characterized in that step 6 further comprises the following steps:
Step 61, the client operation trigger submodule (17) triggers a file close operation;
Step 62, the client file layout management module (19) decrements the file layout's reference count by 1.
The method for the system for prefetching file layouts through readdir++ in a cluster file system according to the present invention is characterized in that step 7 further comprises the following steps:
Step 71, the server directory delegation submodule (24) decides to recall the directory delegation;
Step 72, the server RPC sending submodule (23) sends a network interaction message to the client module, notifying it to release the directory delegation;
Step 73, the client RPC receiving submodule (12) receives the recall notice;
Step 74, the client file layout management module (19) checks whether the file layout reference counts of all files in the directory are all 0; if so, go to step 75; if not, wait;
Step 75, the client directory delegation handling submodule (18) removes the directory being recalled from the list of delegated directories;
Step 76, the client page cache submodule (13) and the client dentry and inode cache submodule (14) clear their local caches, and the file layouts are cleared at the same time;
Step 77, the client RPC sending submodule (11) sends a network interaction message notifying the server that the read-directory delegation has been returned.
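The client side of the recall sequence (steps 74 to 77) can be sketched in Python as below. This is an illustration under assumptions: the client state is modeled as a plain dictionary with hypothetical keys, `send_rpc` is a hypothetical callback standing in for the sending submodule, and waiting is modeled by returning `False` so the caller can retry later.

```python
def handle_recall(client: dict, directory: str, send_rpc) -> bool:
    """Try to give back the delegation on `directory`; return True on success."""
    refs = client["layout_refs"].get(directory, {})
    if any(count != 0 for count in refs.values()):   # step 74: files still open
        return False                                 # caller waits and retries
    client["delegated_dirs"].remove(directory)       # step 75: drop from the list
    client["page_cache"].pop(directory, None)        # step 76: clear cached pages,
    client["dentry_cache"].pop(directory, None)      # dentries and inodes,
    client["layout_refs"].pop(directory, None)       # and the layouts with them
    send_rpc("delegation_returned", directory)       # step 77: tell the server
    return True
```

The ordering matters: the caches are cleared before the server is notified, so the client never serves an operation from cached metadata after the server believes the delegation is gone.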
The beneficial effects of the present invention are:
In the system and method for prefetching file layouts through readdir++ in a cluster file system proposed by the invention, once the client module has completed the read-directory operation, the network interaction overhead of obtaining file layouts in subsequent file read operations is reduced; at the same time, other operations (including open, close, lookup, and repeated read directory) no longer need metadata access and can be completed directly on the client, eliminating the metadata network interaction overhead entirely. Comparative performance tests show that, in environments with massive numbers of small files, this method gains a large improvement in read performance at the cost of a very small amount of client cache.
Brief description of the drawings
Fig. 1 is a structural diagram of the system for prefetching file layouts through readdir++ according to the present invention;
Fig. 2 is a diagram of the organization of the page cache according to the present invention.
Detailed description of the embodiments
To make the object, technical solution, and advantages of the present invention clearer, the system and method for prefetching file layouts through readdir++ in a cluster file system are described in further detail below with reference to the drawings. It should be understood that the embodiments described here serve only to explain the present invention and are not intended to limit it.
First, the system for prefetching file layouts through readdir++ in a cluster file system is described; its architecture is shown in Fig. 1.
The system comprises a client module 1 and a server module 2, where the server module 2 stores the file metadata. The client module comprises the following 9 submodules:
Client RPC sending submodule 11:
Used by the client module to send network interaction messages (RPC) to the server module. The scenarios involved in the present invention are: applying for a read-directory delegation, requesting read-directory information, the open operation, and returning a read-directory delegation.
Client RPC receiving submodule 12:
Used by the client module to receive network interaction messages sent by the server module. The scenarios involved in the present invention are: applying for a read-directory delegation and requesting read-directory information. The reply to a delegation application is an integer, 1 or 0, indicating whether the delegation was granted; the reply to a read-directory request consists of pages, which the client module stores directly in the local page cache.
Client page cache submodule 13:
The client's local cache consists of two parts: one is the page cache of the client page cache submodule 13, which stores pages; the other is the dentry and inode cache of the client dentry and inode cache submodule 14, which stores only dentries and inodes.
The organization of the page cache is shown in Fig. 2:
Each page stores directory entries in order, and pages are indexed by page index number. Because names differ in length and file attributes differ in the kinds of attributes they contain, directory entries have no fixed size, so the number of directory entries a page can hold varies. If the remaining space of the current page is not enough to hold the next directory entry, a new page is used, until all directory entries of the directory have been stored in pages. The end of each directory entry uses an end mark (eof) to record whether the entry is the last directory entry of its page and whether it is the last directory entry of the directory.
Note that while the client holds the directory delegation, the pages obtained by the read-directory operation are guaranteed to be valid. Once the directory delegation is recalled, the information in the page cache is no longer valid, so the page cache must be cleared.
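The page organization described above amounts to packing variable-size entries into fixed-size pages and tagging the last entry of each page and of the directory. The Python sketch below illustrates this under assumptions: the 4096-byte page size, the `DirEntry` type, and the precomputed `size` field are all hypothetical, and the end mark is modeled as two booleans rather than an encoded field.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size in bytes


@dataclass
class DirEntry:
    ino: int
    name: str
    size: int                   # encoded size in bytes (assumed known)
    last_in_page: bool = False  # end mark: last entry of its page
    last_in_dir: bool = False   # end mark: last entry of the directory


def pack_entries(entries, page_size=PAGE_SIZE):
    """Pack entries into pages in order; the list index is the page index."""
    pages = [[]]
    free = page_size
    for entry in entries:
        if entry.size > page_size:
            raise ValueError("entry larger than a page")
        if entry.size > free:   # not enough room left: start a new page
            pages.append([])
            free = page_size
        pages[-1].append(entry)
        free -= entry.size
    for page in pages:
        if page:
            page[-1].last_in_page = True
    if pages[-1]:
        pages[-1][-1].last_in_dir = True
    return pages
```

With the two marks in place, a reader can walk the pages sequentially and knows when to move to the next page and when the directory is exhausted, without a separate entry count.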
Client dentry and inode cache submodule 14:
When a directory is read, the prefetched name is used to build the directory cache entry (dentry), the file handle (fh) and file attributes (fattr) are used to build the inode, and the file layout (layout) is merged with the layout already present in the inode. After the dentry and inode have been associated, they are stored in the cache.
Note that holding the directory delegation does not necessarily mean that the cached dentries and inodes remain valid, because the client periodically evicts dentries and inodes that have not been used for a long time. When the dentry is valid, a lookup operation returns it directly; when the dentry is invalid, the dentry and inode can be rebuilt from the information in the still-valid page cache. Neither case requires network interaction with the metadata server.
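The two lookup cases described above can be sketched as a cache hit followed by a fallback that rebuilds the dentry from the page cache. In this Python illustration the dictionaries, the key names, and the returned source label are all assumptions; real dentries and inodes are kernel structures, not dictionaries.

```python
def lookup(name, dentry_cache, page_cache):
    """Return (dentry, source) without contacting the metadata server."""
    dentry = dentry_cache.get(name)
    if dentry is not None:
        return dentry, "dentry-cache"          # dentry still valid
    for page in page_cache:                    # dentry evicted: scan the pages
        for entry in page:
            if entry["name"] == name:          # exact name match
                inode = {
                    "fh": entry["fh"],
                    "fattr": entry["fattr"],
                    "layout": entry["layout"],
                }
                dentry = {"name": name, "inode": inode}
                dentry_cache[name] = dentry    # re-populate the cache
                return dentry, "page-cache"
    return None, "miss"
```

Keeping the raw pages alongside the dentry cache is what makes eviction cheap: an evicted dentry costs a local scan to rebuild rather than a round trip to the server.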
Client directory entry parsing submodule 15:
This module traverses all pages in the page cache, traverses the directory entries in each page one by one, and parses out their full content: inode number, name, file handle, file attributes, file layout, and so on. The submodule is used in two places: read directory (readdir) and lookup.
Client readdir information submission submodule 16:
This module is the core of the original readdir+ flow: it returns to the client basic information such as the name, type, and inode number of every directory entry in the directory, which accomplishes the read of the directory. The client maintains a flag, plusplus_done, in the parent directory's inode to indicate whether a read-directory operation has been done.
Client operation trigger submodule 17:
In the present invention, the client operations involved are read directory (readdir), lookup, open, and close.
Client directory delegation handling submodule 18:
An important foundation of the present invention is that the client holds a read-directory delegation; only then is the information in the local cache guaranteed to be valid, so that operations can run entirely locally and RPC interactions are saved. The client maintains two local lists of directories for which it holds delegations: a read-delegation list and a read-write-delegation list. When a delegation is obtained, the directory is appended to the tail of the corresponding list; when a delegation is recalled, the directory is removed from the corresponding list.
Client file layout management module 19:
This module is mainly responsible for incrementing and decrementing file layout reference counts, merging a file layout with the layout already present in the inode, and reading the corresponding data from the data disk according to the file layout. A file layout consists mainly of the following parts: start, length, type, mode, and number.
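The text lists the parts of a file layout but does not spell out how merging works. The Python sketch below shows one possible reading, stated as an assumption: a layout is a list of extents carrying the five named parts, and merging keeps one extent per start offset, with a newly prefetched extent replacing an older one at the same offset. The type name, field meanings, and the merge rule itself are illustrative, not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayoutExtent:
    start: int    # logical start offset
    length: int   # extent length
    type: int     # layout type (meaning not specified in the text)
    mode: int     # access mode (meaning not specified in the text)
    number: int   # numbering field (meaning not specified in the text)


def merge_layouts(existing, prefetched):
    """Merge two extent lists; prefetched extents win on equal start offsets."""
    by_start = {extent.start: extent for extent in existing}
    by_start.update({extent.start: extent for extent in prefetched})
    return [by_start[start] for start in sorted(by_start)]
```

A real implementation would also need to handle partially overlapping extents, which this sketch deliberately leaves out.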
Further, the server module comprises the following four submodules:
Server network-interaction receiving submodule 21:
The server receives the network request sent by the client and determines the request type.
Server file layout acquisition submodule 22:
This module obtains file layouts in the same way as the readdir+ technique described in the background section.
Server network-interaction sending submodule 23:
The server responds according to the type of request received. If the request is readdir++, the server packs the obtained file layout, together with information such as the file handle, file attributes, and name, into a directory entry. It then packs the directory entries into pages in order and sends the pages to the client in a network message. If the request is for a directory authorization, the server returns whether the authorization is granted.
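The in-order packing of directory entries into pages can be sketched as follows. The 4096-byte page size and the representation of an encoded entry as a bytes object are assumptions; the rule of starting a new page when the next entry does not fit follows the page organization stated in the claims.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def pack_entries(encoded_entries, page_size=PAGE_SIZE):
    """Pack already-encoded directory entries (bytes) into pages, in order."""
    pages, current, used = [], [], 0
    for entry in encoded_entries:
        if len(entry) > page_size:
            raise ValueError("directory entry is larger than a page")
        # Not enough room left in the current page: start a new one.
        if current and used + len(entry) > page_size:
            pages.append(current)
            current, used = [], 0
        current.append(entry)
        used += len(entry)
    if current:
        pages.append(current)
    return pages


pages = pack_entries([b"a" * 3000, b"b" * 2000, b"c" * 1000])
```

Here the second entry does not fit after the first, so it opens a second page that it then shares with the third entry; the number of entries per page therefore varies.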
Server directory authorization submodule 24:
The server grants directory authorizations to clients; this is the basis on which a client can keep the file layouts in its cache valid. This module has two functions: granting directory authorizations and recalling them. For each directory, the server maintains a linked list of granted authorizations that records which clients hold an authorization for that directory. Note that at any given moment at most one client holds the read-write directory authorization for a given directory, whereas several clients may hold read directory authorizations for it.
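A minimal sketch of the server-side bookkeeping is given below. The single-holder rule for read-write authorizations comes from the text; treating read and read-write authorizations held by different clients as mutually exclusive is an assumption made here for illustration, as are all class and method names.

```python
class ServerDirAuthorizations:
    def __init__(self):
        self.readers = {}  # directory -> clients holding a read authorization
        self.writer = {}   # directory -> the one client holding read-write

    def grant(self, directory, client, kind):
        readers = self.readers.setdefault(directory, [])
        writer = self.writer.get(directory)
        if kind == "read":
            # Assumption: refuse while another client holds read-write.
            if writer is not None and writer != client:
                return False
            if client not in readers:
                readers.append(client)
            return True
        if kind == "read_write":
            # From the text: at most one read-write holder per directory.
            if writer is not None and writer != client:
                return False
            # Assumption: refuse while other clients hold read authorizations.
            if any(other != client for other in readers):
                return False
            self.writer[directory] = client
            return True
        raise ValueError(f"unknown authorization kind: {kind}")

    def recall(self, directory, client):
        # Drop whatever authorization the client holds on this directory.
        if client in self.readers.get(directory, []):
            self.readers[directory].remove(client)
        if self.writer.get(directory) == client:
            del self.writer[directory]
```

The return value of grant corresponds to the "granted or not" result that the sending submodule reports back to the client.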
The method by which the system of the present invention prefetches file layouts through readdir++ in a cluster file system is described below.
The method comprises the following steps:
Step 1, the client module obtains a read directory authorization for the directory from the server module;
Step 2, the client module triggers a read-directory operation and obtains from the server module the information of all directory entries in the directory, including inode number, name, file handle, file attributes, and file layout;
Step 3, the client module triggers a file open operation and determines whether the directory cache entry of the file exists locally in the client directory cache entry and inode cache submodule 14; if it exists, go to step 4; if not, go to step 5;
Step 4, the client module triggers a lookup of the file's directory cache entry; during the lookup it obtains the directory cache entry and its file layout and increments the reference count of the file layout by 1; it then reads the data from the position on the data disk indicated by the obtained file layout; go to step 6;
Step 5, the client module parses the corresponding directory entry in the page cache, builds a directory cache entry from the metadata it contains, and increments the reference count of the file layout by 1; it then reads the data from the position on the data disk indicated by the obtained file layout;
Step 6, the client module triggers a file close operation and decrements the reference count of the file layout by 1;
Step 7, the client module checks whether the file layout reference counts of all files in the directory are 0; if they are not all 0, some file is still open and the client must wait for it to be closed; if they are all 0, the read directory authorization can be returned to the server module.
Step 1 further comprises the following steps:
Step 11, client network-interaction sending submodule 11 sends a network message to the server module, requesting a read directory authorization for the directory;
Step 12, server network-interaction receiving submodule 21 receives the network message and determines that the request is a directory authorization request;
Step 13, server directory authorization submodule 24 processes the request and decides whether to grant the authorization;
Step 14, server network-interaction sending submodule 23 sends the result to the client;
Step 15, client network-interaction receiving submodule 12 receives the result sent by the server module indicating whether the read directory authorization for the directory has been granted; if it has been granted, client directory authorization handling submodule 18 is notified and the flow goes to step 16; if it has not been granted, the flow returns to step 11;
Step 16, client directory authorization handling submodule 18 records the directory in the list of obtained read directory authorizations.
Step 2 further comprises the following steps:
Step 21, client operation trigger submodule 17 triggers a read-directory operation, and client page cache submodule 13 looks in the local page cache for the pages holding the directory entries of the directory; if they are not found, go to step 22; if they are found, go to step 27;
Step 22, client network-interaction sending submodule 11 sends a network message to the server module, requesting the directory information;
Step 23, server network-interaction receiving submodule 21 receives the network message;
Step 24, server file layout acquisition submodule 22 obtains the file layout of the specified directory entry;
Step 25, server network-interaction sending submodule 23 packs the metadata, including the file layout, into directory entries, packs the directory entries into pages in order, and sends the pages to the client module;
Step 26, client network-interaction receiving submodule 12 receives the pages sent by the server and hands them to client page cache submodule 13 for storage;
Step 27, client directory entry parsing submodule 15 parses the directory entries one by one in the locally cached pages; the name is used to build the directory cache entry, and the file handle and file attributes are used to build the inode; the newly built directory cache entry and inode are handed to the client directory cache entry and inode cache submodule 14 for storage;
Step 28, client file layout management module 19 merges the file layout with the file layout already stored in the inode;
Step 29, client read-directory information submission submodule 16 submits the name, inode number, and type to the read-directory operation, and sets to 1 the flag (plusplus_done) in the parent directory's inode that marks the read-directory operation as done.
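Steps 21 to 29 can be sketched on the client side as follows. Directory entries are modelled as already-decoded dictionaries, the caches as plain dictionaries, and the server round trip as a caller-supplied fetch_pages function; all of these names and representations are hypothetical, and step 28's merge is reduced to storing the layout next to the inode fields.

```python
class ParentInode:
    def __init__(self):
        self.plusplus_done = False  # flag from the text, modelled as a bool


def readdir_plusplus(directory, page_cache, fetch_pages, dentry_cache, parent_inode):
    pages = page_cache.get(directory)            # step 21: look in the page cache
    if pages is None:
        pages = fetch_pages(directory)           # steps 22-25: ask the server
        page_cache[directory] = pages            # step 26: keep the pages locally
    listing = []
    for page in pages:                           # step 27: parse entry by entry
        for entry in page:
            dentry_cache[(directory, entry["name"])] = {
                "handle": entry["handle"],
                "attrs": entry["attrs"],
                "layout": entry["layout"],       # step 28, simplified
            }
            listing.append((entry["name"], entry["ino"], entry["type"]))
    parent_inode.plusplus_done = True            # step 29: mark readdir as done
    return listing
```

A second call for the same directory is served from the page cache, so the fetch function is invoked only once.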
Step 3 further comprises the following steps:
Step 31, client operation trigger submodule 17 triggers a file open operation;
Step 32, client directory authorization handling submodule 18 checks whether a read directory authorization has been obtained; if it has, go to step 33; if not, go to step 34;
Step 33, client read-directory information submission submodule 16 checks whether the flag is 1; if it is 1, the read-directory operation has been performed, go to step 35; if it is 0, the read-directory operation has not been performed, go to step 34;
Step 34, client network-interaction sending submodule 11 sends a network message, and the open operation is completed according to the flow used without a directory authorization;
Step 35, the client directory cache entry and inode cache submodule 14 checks whether the directory cache entry of the file to be opened exists in the cache; if it exists, go to step 4; if not, go to step 5.
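The branching in steps 32 to 35 amounts to a small decision function. The sketch below uses hypothetical names and returns a label for the path to take rather than performing the open itself.

```python
def choose_open_path(has_read_authorization, plusplus_done, dentry_cached):
    """Return which flow the open operation should follow."""
    if not has_read_authorization:
        return "remote_open"        # step 32 -> step 34
    if not plusplus_done:
        return "remote_open"        # step 33 -> step 34
    if dentry_cached:
        return "use_cached_dentry"  # step 35 -> step 4
    return "parse_page_cache"       # step 35 -> step 5
```

Only when both the authorization and the plusplus_done flag are present does the open stay local; otherwise it falls back to the ordinary network flow.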
Step 4 further comprises the following steps:
Step 41, client operation trigger submodule 17 triggers a lookup of the file's directory cache entry;
Step 42, the client directory cache entry and inode cache submodule 14 returns the directory cache entry in the cache directly to the open operation;
Step 43, client file layout management module 19 increments the file layout reference count by 1, and then reads the data from the position on the data disk indicated by the obtained file layout.
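Step 43 can be illustrated with a short sketch in which the data disk is any seekable binary file object and the layout is a dictionary holding start, length, and a reference count. These representations, and the reading of "start" as a byte offset, are assumptions for illustration only.

```python
import io


def read_by_layout(disk, start, length):
    """Read `length` bytes beginning at offset `start` of the data disk."""
    disk.seek(start)
    data = disk.read(length)
    if len(data) != length:
        raise EOFError("layout points past the end of the data disk")
    return data


def open_with_cached_layout(dentry, disk):
    layout = dentry["layout"]
    layout["refcount"] += 1  # one more open file using this layout
    return read_by_layout(disk, layout["start"], layout["length"])


disk = io.BytesIO(b"0123456789")  # stand-in for the data disk
dentry = {"layout": {"start": 2, "length": 4, "refcount": 0}}
data = open_with_cached_layout(dentry, disk)
```

No message to the metadata server is needed on this path: the layout comes from the local cache and the data is read directly from the disk.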
Step 5 further comprises the following steps:
Step 51, client directory entry parsing submodule 15 traverses the page cache to find the corresponding directory entry; the matching condition is that the name in the directory entry is exactly identical to the name of the file to be opened; if it is not found, go to step 52; if it is found, go to step 53;
Step 52, client operation trigger submodule 17 assigns the error information to the directory entry; go to step 56;
Step 53, client directory entry parsing submodule 15 parses all remaining information of the corresponding directory entry, including file handle, file attributes, and file layout; the file handle and file attributes are used to build the inode, and the inode is associated with the directory cache entry;
Step 54, the client directory cache entry and inode cache submodule 14 stores the directory cache entry and the inode;
Step 55, the directory cache entry is returned to the open operation;
Step 56, client file layout management module 19 merges the file layout with the file layout already stored in the inode, increments the file layout reference count by 1, and then reads the data from the position on the data disk indicated by the obtained file layout.
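The page-cache path of step 5 can be sketched as below. Pages are lists of already-decoded entry dictionaries, and a missing name raises FileNotFoundError as a stand-in for the error information of step 52; both choices, like the function names, are assumptions rather than the patent's data structures.

```python
def find_entry(pages, name):
    """Step 51: scan every page for an entry whose name matches exactly."""
    for page in pages:
        for entry in page:
            if entry["name"] == name:
                return entry
    return None


def build_dentry_from_page_cache(pages, name, dentry_cache):
    entry = find_entry(pages, name)
    if entry is None:
        raise FileNotFoundError(name)        # stand-in for step 52
    inode = {                                # step 53: build the inode
        "handle": entry["handle"],
        "attrs": entry["attrs"],
        "layout": entry["layout"],
        "layout_refcount": 0,
    }
    dentry = {"name": name, "inode": inode}  # inode tied to the dentry
    dentry_cache[name] = dentry              # step 54: store both
    inode["layout_refcount"] += 1            # step 56: count this open
    return dentry                            # step 55: hand back to open
```

Because the match is on the full name, an entry called "a.txt.bak" is never returned for a request for "a.txt".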
Step 6 further comprises the following steps:
Step 61, client operation trigger submodule 17 triggers a file close operation;
Step 62, client file layout management module 19 decrements the reference count of the file layout by 1.
Step 7 further comprises the following steps:
Step 71, server directory authorization submodule 24 decides to recall the directory authorization;
Step 72, server network-interaction sending submodule 23 sends a network message to the client module, notifying it to release the directory authorization;
Step 73, client network-interaction receiving submodule 12 receives the directory authorization recall notice;
Step 74, client file layout management module 19 checks whether the file layout reference counts of all files in the directory are 0; if they are all 0, go to step 75; if they are not all 0, wait;
Step 75, client directory authorization handling submodule 18 removes the directory to be recalled from the list of authorized directories;
Step 76, client page cache submodule 13 and the client directory cache entry and inode cache submodule 14 clear the local cache, and the file layouts are cleared at the same time;
Step 77, client network-interaction sending submodule 11 sends a network message notifying the server that the read directory authorization has been returned.
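The client side of the recall (steps 74 to 77) can be sketched as a function that either completes the recall or reports that it must be retried. Returning False in place of blocking until the files are closed is a simplification made here; the dictionary-based caches and all names are hypothetical.

```python
def try_handle_recall(directory, refcounts, auth_list, page_cache,
                      dentry_cache, notify_server):
    """Return True if the authorization was returned, False if files are open."""
    # Step 74: every file layout reference count in the directory must be 0.
    if any(refcounts.get(directory, {}).values()):
        return False
    # Step 75: drop the directory from the list of authorized directories.
    auth_list.remove(directory)
    # Step 76: clear cached pages, directory cache entries, and their layouts.
    page_cache.pop(directory, None)
    for key in [k for k in dentry_cache if k[0] == directory]:
        del dentry_cache[key]
    # Step 77: tell the server the read directory authorization is returned.
    notify_server(directory)
    return True
```

Clearing the caches before notifying the server ensures the client never serves a layout from the cache after it has given up the authorization that kept it valid.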

Claims (13)

  1. A system for prefetching file layouts through readdir++ in a cluster file system, for fast reading of massive numbers of small files, characterized in that the system comprises:
    a client module (1), for obtaining a read directory authorization from a server module (2) or returning it; after obtaining the read directory authorization, sending a read-directory request to the server module (2); and storing in a local cache the pages containing file layouts sent by the server module (2), so that when the client module (1) reads a file under the directory, it directly uses the file layout of that file stored in the local cache;
    the server module (2), which stores the metadata of the small files, for granting the read directory authorization to the client module (1) or recalling it; and, on receiving the read-directory request, packing the metadata including the file layout into directory entries, packing the directory entries into pages in order, and sending the pages to the client module (1).
  2. The system for prefetching file layouts through readdir++ in a cluster file system as claimed in claim 1, characterized in that the client module (1) specifically comprises:
    a client network-interaction sending submodule (11), for sending network messages to the server module;
    a client network-interaction receiving submodule (12), for receiving the network messages sent by the server module;
    a client page cache submodule (13), for storing the pages sent by the server module;
    a client directory cache entry and inode cache submodule (14), for storing directory cache entries and inodes;
    a client directory entry parsing submodule (15), for traversing all pages in the client page cache submodule (13) and parsing out the metadata;
    a client read-directory information submission submodule (16), for submitting the read-directory information;
    a client operation trigger submodule (17), for triggering read-directory, lookup, open, and close operations;
    a client directory authorization handling submodule (18), for checking whether a directory authorization has been obtained, and for removing a directory whose authorization is to be recalled from the list of authorized directories;
    a client file layout management module (19), for incrementing and decrementing the file layout reference count, merging a file layout with the file layout already held in its corresponding inode, and reading the corresponding data from the data disk according to the file layout.
  3. The system for prefetching file layouts through readdir++ in a cluster file system as claimed in claim 1, characterized in that the server module (2) specifically comprises:
    a server network-interaction receiving submodule (21), for receiving the network requests sent by the client module;
    a server file layout acquisition submodule (22), for obtaining the file layout and encoding it;
    a server network-interaction sending submodule (23), for responding to the client module according to the type of network message received;
    a server directory authorization submodule (24), for granting directory authorizations or recalling them.
  4. The system for prefetching file layouts through readdir++ in a cluster file system as claimed in claim 2, characterized in that the client page cache submodule (13) contains multiple pages, and the page cache is organized as follows:
    directory entries are stored in order within each page, and pages are indexed by page index number; the number of directory entries per page varies; if the remaining space of the current page is insufficient to hold the next directory entry, a new page is used, until all directory entries in the directory have been stored in pages; each directory entry ends with an end marker that records whether the entry is the last directory entry of its page and whether it is the last directory entry of the directory.
  5. The system for prefetching file layouts through readdir++ in a cluster file system as claimed in claim 1 or 4, characterized in that
    the directory entry comprises an inode number, name, file handle, file attributes, file layout, and end marker; wherein the inode number is used for submitting the read-directory information; the name is used for building the directory cache entry; and the file handle and file attributes are used for building the inode.
  6. A method for prefetching file layouts through readdir++ in a cluster file system using the system of any one of claims 1 to 5, characterized in that the method comprises the following steps:
    Step 1, the client module obtains a read directory authorization for the directory from the server module;
    Step 2, the client module triggers a read-directory operation and obtains from the server module the information of all directory entries in the directory, including inode number, name, file handle, file attributes, and file layout;
    Step 3, the client module triggers a file open operation and determines whether the directory cache entry of the file exists locally in the client directory cache entry and inode cache submodule (14); if it exists, go to step 4; if not, go to step 5;
    Step 4, the client module triggers a lookup of the file's directory cache entry; during the lookup it obtains the directory cache entry and its file layout and increments the reference count of the file layout by 1; it then reads the data from the position on the data disk indicated by the obtained file layout; go to step 6;
    Step 5, the client module parses the corresponding directory entry in the page cache, builds a directory cache entry from the metadata it contains, and increments the reference count of the file layout by 1; it then reads the data from the position on the data disk indicated by the obtained file layout;
    Step 6, the client module triggers a file close operation and decrements the reference count of the file layout by 1;
    Step 7, the client module checks whether the file layout reference counts of all files in the directory are 0; if they are not all 0, some file is still open and the client must wait for it to be closed; if they are all 0, the read directory authorization can be returned to the server module.
  7. The method for prefetching file layouts through readdir++ in a cluster file system as claimed in claim 6, characterized in that step 1 further comprises the following steps:
    Step 11, the client network-interaction sending submodule (11) sends a network message to the server module, requesting a read directory authorization for the directory;
    Step 12, the server network-interaction receiving submodule (21) receives the network message and determines that the request is a directory authorization request;
    Step 13, the server directory authorization submodule (24) processes the request and decides whether to grant the authorization;
    Step 14, the server network-interaction sending submodule (23) sends the result to the client;
    Step 15, the client network-interaction receiving submodule (12) receives the result sent by the server module indicating whether the read directory authorization for the directory has been granted; if it has been granted, the client directory authorization handling submodule (18) is notified and the flow goes to step 16; if it has not been granted, the flow returns to step 11;
    Step 16, the client directory authorization handling submodule (18) records the directory in the list of obtained read directory authorizations.
  8. The method for prefetching file layouts through readdir++ in a cluster file system as claimed in claim 6, characterized in that step 2 further comprises the following steps:
    Step 21, the client operation trigger submodule (17) triggers a read-directory operation, and the client page cache submodule (13) looks in the local page cache for the pages holding the directory entries of the directory; if they are not found, go to step 22; if they are found, go to step 27;
    Step 22, the client network-interaction sending submodule (11) sends a network message to the server module, requesting the directory information;
    Step 23, the server network-interaction receiving submodule (21) receives the network message;
    Step 24, the server file layout acquisition submodule (22) obtains the file layout of the specified directory entry;
    Step 25, the server network-interaction sending submodule (23) packs the metadata, including the file layout, into directory entries, packs the directory entries into pages in order, and sends the pages to the client module;
    Step 26, the client network-interaction receiving submodule (12) receives the pages sent by the server and hands them to the client page cache submodule (13) for storage;
    Step 27, the client directory entry parsing submodule (15) parses the directory entries one by one in the locally cached pages; the name is used to build the directory cache entry, and the file handle and file attributes are used to build the inode; the newly built directory cache entry and inode are handed to the client directory cache entry and inode cache submodule (14) for storage;
    Step 28, the client file layout management module (19) merges the file layout with the file layout already stored in the inode;
    Step 29, the client read-directory information submission submodule (16) submits the name, inode number, and type to the read-directory operation, and sets to 1 the flag in the parent directory's inode that marks the read-directory operation as done.
  9. The method for prefetching file layouts through readdir++ in a cluster file system as claimed in claim 6, characterized in that step 3 further comprises the following steps:
    Step 31, the client operation trigger submodule (17) triggers a file open operation;
    Step 32, the client directory authorization handling submodule (18) checks whether a read directory authorization has been obtained; if it has, go to step 33; if not, go to step 34;
    Step 33, the client read-directory information submission submodule (16) checks whether the flag is 1; if it is 1, the read-directory operation has been performed, go to step 35; if it is 0, the read-directory operation has not been performed, go to step 34;
    Step 34, the client network-interaction sending submodule (11) sends a network message, and the open operation is completed according to the flow used without a directory authorization;
    Step 35, the client directory cache entry and inode cache submodule (14) checks whether the directory cache entry of the file to be opened exists in the cache; if it exists, go to step 4; if not, go to step 5.
  10. The method for prefetching file layouts through readdir++ in a cluster file system as claimed in claim 6, characterized in that step 4 further comprises the following steps:
    Step 41, the client operation trigger submodule (17) triggers a lookup of the file's directory cache entry;
    Step 42, the client directory cache entry and inode cache submodule (14) returns the directory cache entry in the cache directly to the open operation;
    Step 43, the client file layout management module (19) increments the file layout reference count by 1, and then reads the data from the position on the data disk indicated by the obtained file layout.
  11. The method for prefetching file layouts through readdir++ in a cluster file system as claimed in claim 6, characterized in that step 5 further comprises the following steps:
    Step 51, the client directory entry parsing submodule (15) traverses the page cache to find the corresponding directory entry; the matching condition is that the name in the directory entry is exactly identical to the name of the file to be opened; if it is not found, go to step 52; if it is found, go to step 53;
    Step 52, the client operation trigger submodule (17) assigns the error information to the directory entry; go to step 56;
    Step 53, the client directory entry parsing submodule (15) parses all remaining information of the corresponding directory entry, including file handle, file attributes, and file layout; the file handle and file attributes are used to build the inode, and the inode is associated with the directory cache entry;
    Step 54, the client directory cache entry and inode cache submodule (14) stores the directory cache entry and the inode;
    Step 55, the directory cache entry is returned to the open operation;
    Step 56, the client file layout management module (19) merges the file layout with the file layout already stored in the inode, increments the file layout reference count by 1, and then reads the data from the position on the data disk indicated by the obtained file layout.
  12. The method for prefetching file layouts through readdir++ in a cluster file system as claimed in claim 6, characterized in that step 6 further comprises the following steps:
    Step 61, the client operation trigger submodule (17) triggers a file close operation;
    Step 62, the client file layout management module (19) decrements the reference count of the file layout by 1.
  13. The method for prefetching file layouts through readdir++ in a cluster file system as claimed in claim 6, characterized in that step 7 further comprises the following steps:
    Step 71, the server directory authorization submodule (24) decides to recall the directory authorization;
    Step 72, the server network-interaction sending submodule (23) sends a network message to the client module, notifying it to release the directory authorization;
    Step 73, the client network-interaction receiving submodule (12) receives the directory authorization recall notice;
    Step 74, the client file layout management module (19) checks whether the file layout reference counts of all files in the directory are 0; if they are all 0, go to step 75; if they are not all 0, wait;
    Step 75, the client directory authorization handling submodule (18) removes the directory to be recalled from the list of authorized directories;
    Step 76, the client page cache submodule (13) and the client directory cache entry and inode cache submodule (14) clear the local cache, and the file layouts are cleared at the same time;
    Step 77, the client network-interaction sending submodule (11) sends a network message notifying the server that the read directory authorization has been returned.
CN201410076739.0A 2014-03-04 2014-03-04 System and method for prefetching file layout through readdir++ in cluster file system Expired - Fee Related CN103902660B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410076739.0A CN103902660B (en) 2014-03-04 2014-03-04 System and method for prefetching file layout through readdir++ in cluster file system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410076739.0A CN103902660B (en) 2014-03-04 2014-03-04 System and method for prefetching file layout through readdir++ in cluster file system

Publications (2)

Publication Number Publication Date
CN103902660A true CN103902660A (en) 2014-07-02
CN103902660B CN103902660B (en) 2017-04-12

Family

ID=50993982

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410076739.0A Expired - Fee Related CN103902660B (en) 2014-03-04 2014-03-04 System and method for prefetching file layout through readdir++ in cluster file system

Country Status (1)

Country Link
CN (1) CN103902660B (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095353A (en) * 2015-06-19 2015-11-25 中国科学院计算技术研究所 System and method of busy-wait after pre-reading small file in parallel network file system
CN104933144B (en) * 2015-06-19 2018-03-30 中国科学院计算技术研究所 Ensure the system and method for data validity in a kind of parallel network file system
CN105095353B (en) * 2015-06-19 2018-12-04 中国科学院计算技术研究所 The equal system and method that does after small documents is pre-read in a kind of parallel network file system
CN104933144A (en) * 2015-06-19 2015-09-23 中国科学院计算技术研究所 System and method thereof for guaranteeing data effectiveness in parallel network file system
CN105119955A (en) * 2015-07-09 2015-12-02 中国科学院计算技术研究所 Method and system for supporting reading of multi-page directory in distributed file system
CN109947719B (en) * 2019-03-21 2022-10-11 昆山九华电子设备厂 Method for improving efficiency of cluster reading directory entries under directory
CN109947719A (en) * 2019-03-21 2019-06-28 昆山九华电子设备厂 A method of it improving cluster and reads directory entry efficiency under catalogue
CN112286897A (en) * 2020-10-10 2021-01-29 苏州浪潮智能科技有限公司 Method for communication between PNFS server and client
CN112286897B (en) * 2020-10-10 2023-01-10 苏州浪潮智能科技有限公司 Method for communication between PNFS server and client
CN113485639A (en) * 2021-06-18 2021-10-08 济南浪潮数据技术有限公司 Distributed storage IO speed optimization method, system, terminal and storage medium
CN113485639B (en) * 2021-06-18 2024-02-20 济南浪潮数据技术有限公司 IO speed optimization method, system, terminal and storage medium for distributed storage
CN113608694A (en) * 2021-07-27 2021-11-05 北京达佳互联信息技术有限公司 Data migration method, information processing method, device, server and medium
CN113608694B (en) * 2021-07-27 2024-03-19 北京达佳互联信息技术有限公司 Data migration method, information processing method, device, server and medium
CN114003562A (en) * 2021-12-29 2022-02-01 苏州浪潮智能科技有限公司 Directory traversal method, device and equipment and readable storage medium

Also Published As

Publication number Publication date
CN103902660B (en) 2017-04-12

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170412