CN102541985A - Organization method of client directory cache in distributed file system - Google Patents
- Publication number
- CN102541985A (application number CN2011103264489A)
- Authority
- CN
- China
- Prior art keywords
- client
- catalogue
- read
- directory
- file
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention discloses a method for organizing the client directory cache in a distributed file system. The distributed file system adopts a multi-metadata-server architecture, in which the contents of a single directory are distributed across a plurality of metadata servers, mainly to spread the load of metadata access and improve concurrency. Because network applications write rarely and read often, the method keeps directory entries and their corresponding index nodes (inodes) in the client cache, so that the client does not need to communicate with the servers repeatedly when the same content is read again. In addition, when a directory is accessed for the first time, the directory entries of that directory held on the different metadata servers are pre-read, and file index nodes and file contents are pre-read according to a default pre-reading strategy or a pre-reading strategy issued by the application program. Consequently, when the application program needs to access a file under the directory, the metadata and data of that file may already have been pre-read into the client's local cache, which can greatly speed up the application.
Description
Technical field
The present invention relates to directory entry management in distributed file systems, and specifically to a method for organizing the client directory cache in a distributed file system.
Background technology
With the rapid development of computer technology, applications demand more and more storage, and network applications are a typical case. The storage demands of network applications fall roughly into two kinds. One is dominated by large files, as in audio and video services: the number of files is small, but a single file is typically gigabytes or even terabytes in size. The other is dominated by small files, as in online shopping malls and portal websites: each file is small, but the number of files is huge, with tens of millions of files commonly stored under a single directory, and such files are usually written once and mostly read afterwards.
To meet these storage demands, most network applications have introduced distributed file systems, of which NFS, Lustre, and GPFS are representative. Such distributed file systems perform reasonably well on large files, but when a single directory holds an enormous number of small files, the efficiency of their directory entry operations is unsatisfactory. Many network companies, such as Taobao, NetEase, and Tencent, have therefore designed storage architectures for small files that suit their own needs.
Among the parallel file systems currently known to optimize small-file storage, the vast majority adopt a single-group metadata architecture, and the client usually reads from the metadata server only at the moment it needs to access metadata. Network latency therefore has a large impact on the client's response time. Moreover, if the data the client needs is not in the metadata server's memory, the disk must also be accessed, so a large share of the application's access time is spent on I/O, which hurts the application's real-time behavior.
Summary of the invention
The present invention aims to disclose a method for organizing the client directory entry cache in a distributed file system. The method can effectively address the low access efficiency for massive numbers of small files under a single directory in network applications.
A method for organizing the client directory cache in a distributed file system:
Directory subsets are divided as required; a hash operation is performed on the directory entries in a single directory, and the entries are stored into the directory subsets; each directory subset is placed on a metadata server; and the directory entry cache structure on the client is organized according to the directory subsets.
Preferably, when an application needs to traverse said directory, the client first checks whether the directory is present in the local cache. If it is, the cached content is returned directly to the application. If it is not, the client reads from the metadata servers and, after the read completes, stores the result in the local cache and then returns it to the application.
Preferably, said reading is performed in parallel.
Preferably, after reading the directory for the first time, said client prefetches the files under that directory.
Preferably, the prefetch strategy is: for all directory entries under the directory, read the corresponding index nodes from the metadata servers on which the entries reside.
Preferably, the prefetch command may be issued by the application; on receiving a prefetch request, the client reads back the index nodes of the requested batch of files from the metadata servers and then prefetches the data from the data servers.
In the present invention, the distributed file system adopts a multi-metadata-server architecture; that is, the contents of a single directory are distributed across a plurality of metadata servers. This architecture is chosen mainly to spread the load of metadata access and to improve concurrency. Because network applications write rarely and read often, the present invention keeps the directory entries and their corresponding index nodes (inodes) in the client cache, so that the client does not need to communicate with the servers repeatedly when the same content is read again. In addition, when a directory is accessed for the first time, the directory entries of that directory held on the different metadata servers are pre-read in parallel, and file index nodes and file contents are pre-read according to the default pre-reading strategy or a pre-reading strategy issued by the application program. In this way, when the application program needs to access a file under the directory, the metadata and data of that file may already have been pre-read into the client's local cache, which can greatly speed up the application.
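The subset organization described above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the text only says that directory entries are hashed, so the use of CRC32 with a modulo, and the names `subset_index`, `partition_entries`, and `ClientDirectoryCache`, are assumptions made for illustration.

```python
import zlib


def subset_index(name, num_subsets):
    """Map an entry name to a subset index (CRC32 + modulo is an assumed hash)."""
    return zlib.crc32(name.encode("utf-8")) % num_subsets


def partition_entries(names, num_subsets):
    """Split the entries of one directory into subsets, one per metadata server."""
    subsets = [[] for _ in range(num_subsets)]
    for name in names:
        subsets[subset_index(name, num_subsets)].append(name)
    return subsets


class ClientDirectoryCache:
    """Hypothetical client cache for one directory, kept per subset."""

    def __init__(self, num_subsets):
        self.num_subsets = num_subsets
        self._subsets = {}  # subset index -> set of cached entry names

    def fill_subset(self, index, names):
        """Store the entries read from the metadata server holding this subset."""
        self._subsets[index] = set(names)

    def is_complete(self):
        """True once every subset of the directory has been cached."""
        return len(self._subsets) == self.num_subsets

    def contains(self, name):
        """True or False if the owning subset is cached, None if it is not."""
        subset = self._subsets.get(subset_index(name, self.num_subsets))
        if subset is None:
            return None
        return name in subset
```

Because each subset is cached independently, a lookup only needs the one subset that the name hashes to, which mirrors the statement that the client cache is organized according to the directory subsets.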
Embodiment
The invention is described in detail below with reference to an embodiment:
(1) In the present invention, the directory entries in a single directory are first hashed by name and divided into several subsets, and each subset is placed on a metadata server.
(2) The directory entry cache structure on the client is organized by directory entry subset; that is, the directory entries held on each metadata server are managed separately and kept independent of one another.
(3) When an application needs to traverse a directory, the client first checks whether it is present in the local cache; if so, the cached result is returned directly to the user. If not, the client must read from the metadata servers. Because the directory entries of a single directory are stored on different metadata servers by subset, the invention reads them in parallel, which speeds up directory entry reading. After the directory entries have been read, the client first stores them in the local cache and then returns them to the application program.
(4) An application usually accesses a directory in order to access the files under it, so after the directory entries are traversed, requests to access each file under the directory are typically issued to the file system client one after another. To make full use of the time the application spends on its own business processing, in the present invention the file system client prefetches the files under a directory after reading the directory for the first time. The default prefetch strategy is to read, for all directory entries under the directory, the corresponding index node information from the metadata servers on which they reside. The application may also issue its own pre-reading strategy to the client according to its own characteristics. For example, if a certain batch of files needs to be pre-read, the client, on receiving the pre-read request, reads the index nodes of that batch of files back from the metadata servers and then prefetches the data from the data servers. In this way, when the application needs to access a specific file, the data it needs may already have been pre-read into the client's local cache, which can significantly reduce the application's response time.
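Steps (3) and (4) can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the patented implementation: `FakeMetadataServer`, `Client`, and `data_reader` are hypothetical names, the servers are in-memory stand-ins, a thread pool stands in for the parallel read, and the prefetch runs synchronously here, whereas the text implies it would overlap with the application's own processing.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeMetadataServer:
    """In-memory stand-in for a metadata server holding one directory subset."""

    def __init__(self, inodes):
        self.inodes = dict(inodes)  # entry name -> inode number
        self.readdir_calls = 0

    def read_entries(self):
        self.readdir_calls += 1
        return sorted(self.inodes)

    def read_inodes(self, names):
        return {n: self.inodes[n] for n in names if n in self.inodes}


class Client:
    """Hypothetical file system client for a single directory."""

    def __init__(self, servers, data_reader):
        self.servers = servers          # one server per subset, at least one
        self.data_reader = data_reader  # inode -> bytes; stands in for data servers
        self.entry_cache = None         # per-subset entry lists once cached
        self.inode_cache = {}
        self.data_cache = {}

    def readdir(self):
        """Return all entries: from the cache if present, else via a parallel read."""
        if self.entry_cache is None:
            with ThreadPoolExecutor(max_workers=len(self.servers)) as pool:
                self.entry_cache = list(
                    pool.map(lambda s: s.read_entries(), self.servers))
            self.prefetch()  # default policy after the first directory read
        return [n for entries in self.entry_cache for n in entries]

    def prefetch(self, names=None):
        """Default policy (names=None): fetch inodes of every entry.
        Application policy: fetch inodes, then data, for the given batch."""
        if self.entry_cache is None:
            self.readdir()
        for server, entries in zip(self.servers, self.entry_cache):
            wanted = entries if names is None else [n for n in entries if n in names]
            self.inode_cache.update(server.read_inodes(wanted))
        if names is not None:
            for n in names:
                if n in self.inode_cache:
                    self.data_cache[n] = self.data_reader(self.inode_cache[n])
```

A second call to `readdir()` is served from `entry_cache` without contacting any server, and calling `prefetch({"a", "c"})` models an application-issued policy for a specific batch of files.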
Claims (6)
1. A method for organizing the client directory cache in a distributed file system, characterized in that:
Directory subsets are divided as required; a hash operation is performed on the directory entries in a single directory, and the entries are stored into the directory subsets; each directory subset is placed on a metadata server; and the directory entry cache structure on the client is organized according to the directory subsets.
2. The method as claimed in claim 1, characterized in that: when an application needs to traverse said directory, the client first checks whether the directory is present in the local cache; if it is, the cached content is returned directly to the application; if it is not, the client reads from the metadata servers and, after the read completes, stores the result in the local cache and then returns it to the application.
3. The method as claimed in claim 2, characterized in that: said reading is performed in parallel.
4. The method as claimed in claim 1, characterized in that: after reading the directory for the first time, said client prefetches the files under that directory.
5. The method as claimed in claim 4, characterized in that: the prefetch strategy is: for all directory entries under the directory, read the corresponding index nodes from the metadata servers on which the entries reside.
6. The method as claimed in claim 4, characterized in that: the prefetch command may be issued by the application; on receiving a prefetch request, the client reads back the index nodes of the requested batch of files from the metadata servers and then prefetches the data from the data servers.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011103264489A CN102541985A (en) | 2011-10-25 | 2011-10-25 | Organization method of client directory cache in distributed file system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011103264489A CN102541985A (en) | 2011-10-25 | 2011-10-25 | Organization method of client directory cache in distributed file system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102541985A true CN102541985A (en) | 2012-07-04 |
Family
ID=46348888
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2011103264489A Pending CN102541985A (en) | 2011-10-25 | 2011-10-25 | Organization method of client directory cache in distributed file system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102541985A (en) |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102819599A (en) * | 2012-08-15 | 2012-12-12 | 华数传媒网络有限公司 | Method for constructing hierarchical catalogue based on consistent hashing data distribution |
CN103150394A (en) * | 2013-03-25 | 2013-06-12 | 中国人民解放军国防科学技术大学 | Distributed file system metadata management method facing to high-performance calculation |
CN103685453A (en) * | 2013-09-11 | 2014-03-26 | 华中科技大学 | A method for obtaining metadata in a cloud storage system |
CN104125253A (en) * | 2013-04-27 | 2014-10-29 | 博雅网络游戏开发(深圳)有限公司 | Network application realization method and system |
CN104239435A (en) * | 2014-08-29 | 2014-12-24 | 四川长虹电器股份有限公司 | Distributed picture caching method based on picture thumbnail processing |
CN104580437A (en) * | 2014-12-30 | 2015-04-29 | 创新科存储技术(深圳)有限公司 | Cloud storage client and high-efficiency data access method thereof |
WO2015176659A1 (en) * | 2014-05-22 | 2015-11-26 | Huawei Technologies Co., Ltd. | System and method for pre-fetching |
CN105138545A (en) * | 2015-07-09 | 2015-12-09 | 中国科学院计算技术研究所 | Method and system for asynchronously pre-reading directory entries in distributed file system |
CN105677892A (en) * | 2016-01-29 | 2016-06-15 | 华为技术有限公司 | Method and device for reading catalog subitem metadata |
CN106570113A (en) * | 2016-10-25 | 2017-04-19 | 中国电力科学研究院 | Cloud storage method and system for mass vector slice data |
CN106775994A (en) * | 2017-02-28 | 2017-05-31 | 郑州云海信息技术有限公司 | The method and device of a kind of metadata cluster catalogue scheduling |
CN107066503A (en) * | 2017-01-05 | 2017-08-18 | 郑州云海信息技术有限公司 | The method and device of magnanimity metadata burst distribution |
CN107291870A (en) * | 2017-06-15 | 2017-10-24 | 郑州云海信息技术有限公司 | Files in batch read method in a kind of distributed storage |
CN107491545A (en) * | 2017-08-25 | 2017-12-19 | 郑州云海信息技术有限公司 | The catalogue read method and client of a kind of distributed memory system |
CN108319634A (en) * | 2017-12-15 | 2018-07-24 | 创新科存储技术(深圳)有限公司 | The directory access method and apparatus of distributed file system |
CN110321080A (en) * | 2019-07-02 | 2019-10-11 | 北京计算机技术及应用研究所 | A kind of warm data pool pre-head method of cross-node |
CN110334073A (en) * | 2019-06-13 | 2019-10-15 | 腾讯科技(深圳)有限公司 | A kind of metadata forecasting method, device, terminal, server and storage medium |
CN111258956A (en) * | 2019-03-22 | 2020-06-09 | 深圳市远行科技股份有限公司 | Method and equipment for pre-reading mass data files facing far end |
CN112559574A (en) * | 2020-12-25 | 2021-03-26 | 北京百度网讯科技有限公司 | Data processing method and device, electronic equipment and readable storage medium |
CN112799589A (en) * | 2021-01-14 | 2021-05-14 | 新华三大数据技术有限公司 | Data reading method and device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1614591A (en) * | 2004-12-02 | 2005-05-11 | 中国科学院计算技术研究所 | Method for organizing and accessing distributive catalogue of document system |
CN1692356A (en) * | 2002-11-14 | 2005-11-02 | 易斯龙系统公司 | Systems and methods for restriping files in a distributed file system |
CN102024016A (en) * | 2010-11-04 | 2011-04-20 | 天津曙光计算机产业有限公司 | Rapid data restoration method for distributed file system (DFS) |
CN102024017A (en) * | 2010-11-04 | 2011-04-20 | 天津曙光计算机产业有限公司 | Method for traversing directory entries of distribution type file system in repetition-free and omission-free way |
CN102024019A (en) * | 2010-11-04 | 2011-04-20 | 曙光信息产业(北京)有限公司 | Suffix tree based catalog organizing method in distributed file system |
- 2011-10-25: CN application CN2011103264489A filed (publication CN102541985A); status: Pending
Cited By (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102819599B (en) * | 2012-08-15 | 2016-06-01 | 华数传媒网络有限公司 | Method for constructing hierarchical catalogue based on consistent hashing data distribution |
CN102819599A (en) * | 2012-08-15 | 2012-12-12 | 华数传媒网络有限公司 | Method for constructing hierarchical catalogue based on consistent hashing data distribution |
CN103150394B (en) * | 2013-03-25 | 2014-07-23 | 中国人民解放军国防科学技术大学 | Distributed file system metadata management method facing to high-performance calculation |
CN103150394A (en) * | 2013-03-25 | 2013-06-12 | 中国人民解放军国防科学技术大学 | Distributed file system metadata management method facing to high-performance calculation |
CN104125253B (en) * | 2013-04-27 | 2017-10-24 | 博雅网络游戏开发(深圳)有限公司 | The method and system of network application |
CN104125253A (en) * | 2013-04-27 | 2014-10-29 | 博雅网络游戏开发(深圳)有限公司 | Network application realization method and system |
CN103685453A (en) * | 2013-09-11 | 2014-03-26 | 华中科技大学 | A method for obtaining metadata in a cloud storage system |
CN103685453B (en) * | 2013-09-11 | 2016-08-03 | 华中科技大学 | The acquisition methods of metadata in a kind of cloud storage system |
WO2015176659A1 (en) * | 2014-05-22 | 2015-11-26 | Huawei Technologies Co., Ltd. | System and method for pre-fetching |
CN106462610A (en) * | 2014-05-22 | 2017-02-22 | 华为技术有限公司 | System and method for pre-fetching |
CN104239435A (en) * | 2014-08-29 | 2014-12-24 | 四川长虹电器股份有限公司 | Distributed picture caching method based on picture thumbnail processing |
CN104580437A (en) * | 2014-12-30 | 2015-04-29 | 创新科存储技术(深圳)有限公司 | Cloud storage client and high-efficiency data access method thereof |
CN105138545A (en) * | 2015-07-09 | 2015-12-09 | 中国科学院计算技术研究所 | Method and system for asynchronously pre-reading directory entries in distributed file system |
CN105138545B (en) * | 2015-07-09 | 2018-10-09 | 中国科学院计算技术研究所 | The asynchronous method and system pre-read of directory entry in a kind of distributed file system |
CN105677892B (en) * | 2016-01-29 | 2018-12-25 | 华为技术有限公司 | A kind of method and device reading catalogue subitem metadata |
CN105677892A (en) * | 2016-01-29 | 2016-06-15 | 华为技术有限公司 | Method and device for reading catalog subitem metadata |
CN106570113B (en) * | 2016-10-25 | 2022-04-01 | 中国电力科学研究院 | Mass vector slice data cloud storage method and system |
CN106570113A (en) * | 2016-10-25 | 2017-04-19 | 中国电力科学研究院 | Cloud storage method and system for mass vector slice data |
CN107066503A (en) * | 2017-01-05 | 2017-08-18 | 郑州云海信息技术有限公司 | The method and device of magnanimity metadata burst distribution |
CN106775994A (en) * | 2017-02-28 | 2017-05-31 | 郑州云海信息技术有限公司 | The method and device of a kind of metadata cluster catalogue scheduling |
CN107291870A (en) * | 2017-06-15 | 2017-10-24 | 郑州云海信息技术有限公司 | Files in batch read method in a kind of distributed storage |
CN107491545A (en) * | 2017-08-25 | 2017-12-19 | 郑州云海信息技术有限公司 | The catalogue read method and client of a kind of distributed memory system |
CN108319634B (en) * | 2017-12-15 | 2021-08-06 | 深圳创新科技术有限公司 | Directory access method and device for distributed file system |
CN108319634A (en) * | 2017-12-15 | 2018-07-24 | 创新科存储技术(深圳)有限公司 | The directory access method and apparatus of distributed file system |
CN111258956A (en) * | 2019-03-22 | 2020-06-09 | 深圳市远行科技股份有限公司 | Method and equipment for pre-reading mass data files facing far end |
CN111258956B (en) * | 2019-03-22 | 2023-11-24 | 深圳市远行科技股份有限公司 | Method and device for prereading far-end mass data files |
CN110334073A (en) * | 2019-06-13 | 2019-10-15 | 腾讯科技(深圳)有限公司 | A kind of metadata forecasting method, device, terminal, server and storage medium |
CN110321080A (en) * | 2019-07-02 | 2019-10-11 | 北京计算机技术及应用研究所 | A kind of warm data pool pre-head method of cross-node |
CN112559574A (en) * | 2020-12-25 | 2021-03-26 | 北京百度网讯科技有限公司 | Data processing method and device, electronic equipment and readable storage medium |
CN112559574B (en) * | 2020-12-25 | 2023-10-13 | 北京百度网讯科技有限公司 | Data processing method, device, electronic equipment and readable storage medium |
CN112799589A (en) * | 2021-01-14 | 2021-05-14 | 新华三大数据技术有限公司 | Data reading method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102541985A (en) | Organization method of client directory cache in distributed file system | |
CN103020315B (en) | A kind of mass small documents storage means based on master-salve distributed file system | |
US9710535B2 (en) | Object storage system with local transaction logs, a distributed namespace, and optimized support for user directories | |
CN102158546B (en) | Cluster file system and file service method thereof | |
TWI472935B (en) | Scalable segment-based data de-duplication system and method for incremental backups | |
CN103177027B (en) | Obtain the method and system of dynamic Feed index | |
CN103530387A (en) | Improved method aimed at small files of HDFS | |
CN105868286B (en) | The parallel method of adding and system merged based on distributed file system small documents | |
CN103856567A (en) | Small file storage method based on Hadoop distributed file system | |
CN102332027A (en) | Mass non-independent small file associated storage method based on Hadoop | |
CN102385623B (en) | Catalogue access method in DFS (distributed file system) | |
US20120290595A1 (en) | Super-records | |
CN104850572A (en) | HBase non-primary key index building and inquiring method and system | |
CN105117417A (en) | Read-optimized memory database Trie tree index method | |
CN100424699C (en) | Attribute extensible object file system | |
CN104408111A (en) | Method and device for deleting duplicate data | |
CN102024019B (en) | Suffix tree based catalog organizing method in distributed file system | |
CN103577470A (en) | File system and method for improving performances of web server | |
CN102521419A (en) | Hierarchical storage realization method and system | |
CN103559229A (en) | Small file management service (SFMS) system based on MapFile and use method thereof | |
CN104111898A (en) | Hybrid storage system based on multidimensional data similarity and data management method | |
CN105354250A (en) | Data storage method and device for cloud storage | |
CN106446099A (en) | Distributed cloud storage method and system and uploading and downloading method thereof | |
CN103942301B (en) | Distributed file system oriented to access and application of multiple data types | |
CN102915340A (en) | Expanded B+ tree-based object file system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C12 | Rejection of a patent application after its publication | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20120704 |