CN102385623B - Catalogue access method in DFS (distributed file system) - Google Patents

Publication number
CN102385623B
Authority
CN
China
Prior art keywords
directory entry
band
idle
catalogue
directory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN 201110328295
Other languages
Chinese (zh)
Other versions
CN102385623A (en)
Inventor
杨浩
马照云
马振杰
邵宗有
刘新春
苗艳超
王勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dawning Information Industry Beijing Co Ltd
Dawning Information Industry Co Ltd
Original Assignee
Dawning Information Industry Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dawning Information Industry Beijing Co Ltd
Priority to CN 201110328295
Publication of CN102385623A
Application granted
Publication of CN102385623B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

In the invention, the contents of a directory are stored in a file, and the directory entries in the directory are divided into a number of subsets by hashing the entry names. The subsets are stored in the directory file in a striped manner, and the stripes are relatively large so that the read-ahead function of the underlying file system can be fully exploited when reading from disk. The directory entries in each stripe of a subset are stored in binary-tree form, so that no binary tree has to be built when the stripe is first read. All stripes are accessed through memory mapping (mmap), which avoids the overhead of memory allocation and of file read/write system calls on every access to on-disk data.

Description

A directory access method in a distributed file system
Technical field
The present invention relates to the organization of directory entries in a distributed file system, and specifically to a directory access method in a distributed file system.
Background art
With the rapid development of network technology, network applications place ever greater demands on storage. These storage demands fall roughly into two kinds. One is dominated by large files, as in audio and video network applications: the number of files is small, but a single file is typically gigabytes or even terabytes in size. The other is dominated by small files, as in online shopping sites: each file is small, but the number of files is huge.
To meet these storage demands, distributed file systems have been introduced into various network applications; representative examples are Google's Google FS and Hadoop's HDFS. Such distributed file systems perform reasonably well on large files, but when a single directory holds a huge number of small files, the efficiency of directory entry retrieval is unsatisfactory. The delay is especially noticeable for non-hot directories, that is, directories that are not currently cached in memory. There are two main reasons. First, when a directory holds many entries, the directory file is relatively large, and the entire directory content has to be read from disk for a retrieval, which causes excessive disk operations and wastes a great deal of time. Second, with many entries, in-memory retrieval is usually done by building a binary tree, and the time needed to build the tree grows with the number of directory entries.
Summary of the invention
The present invention aims to disclose an efficient method for organizing directory entries in a distributed file system, solving the problem of low directory entry retrieval efficiency when a single directory in a distributed file system stores a massive number of directory entries.
A directory access method in a distributed file system, wherein:
The directory entries of a single directory are stored in the same file, which is divided into directory subsets; a hash operation is performed on the name of each directory entry, and the entry is assigned to a directory subset accordingly.
Preferably, the number of directory subsets is 1024 or an integral multiple thereof.
Preferably, each directory subset is stored in blocks in a striped manner, and each stripe is 256 KB in size.
Preferably, when a stripe of a directory subset is accessed, it is mapped directly into a virtual memory region by memory mapping.
Preferably, the directory entries in a stripe are stored in binary-tree form.
Preferably, the head of the stripe stores index information for the directory entries in the stripe; the index information stores only the relative offset of the actual directory entry within the stripe, and on access the directory entry is read at the corresponding offset.
Preferably, the index is sorted by directory entry name; idle space in the stripe is represented as virtual free directory entries, which are sorted by the size of the free space.
Preferably, when a stripe is initialized, the maximum number of directory entries the stripe can hold is calculated, and space for that number of index records is reserved at the head of the stripe.
Preferably, at initialization the stripe contains only one free directory entry, which covers the whole stripe.
Preferably, when a directory entry is to be added, the directory subset to which it belongs is first computed by hashing its name string, and all stripes in that subset are then traversed; a binary search over the free directory entries of a stripe finds the smallest free entry that can hold the target directory entry; the size of the original free directory entry is then modified and, after the target directory entry has been initialized, its offset is inserted into the index set of valid directory entries.
Preferably, if the free directory entry is much larger than the directory entry to be added, the free entry is split into a new free directory entry and an entry used to store the target directory entry.
Preferably, when a directory entry is looked up, the directory subset is computed first, and each stripe block of that subset is then traversed, with a binary search over the index of valid directory entries, until the target directory entry is found.
Preferably, when a directory entry is to be deleted, the corresponding directory entry is first obtained through the index and its content is modified; whether it can be merged with the neighbouring free directory entries is then determined from the nature of the preceding and following directory entries; if it cannot be merged, a new free directory entry is generated, and otherwise it is merged with the neighbouring free directory entries; afterwards, the index of the target directory entry is deleted from the index set of valid directory entries, and the newly generated free directory entry is added to the index set of free directory entries.
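The free-space handling described above (smallest free entry that fits, splitting on add, merging with neighbours on delete) can be sketched as follows. This is only an illustrative sketch under stated assumptions: the class name `FreeSpace`, the tuple layout `(size, offset)`, and the choice to split whenever any space is left over are not taken from the patent, which only says that a free entry is split when it is much larger than needed. The neighbour search on release is a simple linear scan rather than an inspection of the physically adjacent entries.

```python
import bisect


class FreeSpace:
    """Free regions of one stripe (band), kept sorted by (size, offset).

    Hypothetical helper for illustration; not the patent's data layout.
    """

    def __init__(self, body_start: int, stripe_size: int):
        # At initialization a single free entry covers the whole usable area.
        self.free = [(stripe_size - body_start, body_start)]

    def allocate(self, need: int):
        """Best fit: take the smallest free region with size >= need."""
        i = bisect.bisect_left(self.free, (need, 0))
        if i == len(self.free):
            return None                      # no region in this stripe fits
        size, off = self.free.pop(i)
        if size > need:                      # split: the remainder stays free
            bisect.insort(self.free, (size - need, off + need))
        return off

    def release(self, off: int, size: int):
        """Free a region, merging it with adjacent free regions if present."""
        for fsize, foff in list(self.free):
            if foff + fsize == off:          # free neighbour directly before
                self.free.remove((fsize, foff))
                off, size = foff, size + fsize
            elif off + size == foff:         # free neighbour directly after
                self.free.remove((fsize, foff))
                size += fsize
        bisect.insort(self.free, (size, off))
```

Keeping the list ordered by size is what allows the binary search for the smallest adequate free entry; the cost is that finding physical neighbours on delete needs a separate scan or a second ordering by offset.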
In the present invention, the contents of a directory are stored in a file, and the directory entries in the directory are divided into a number of subsets by hashing the entry names. The different subsets are stored in the directory file in a striped manner, and the stripes are relatively large so that the read-ahead function of the underlying file system can be fully exploited when reading from disk. All directory entries in a stripe block of a subset are stored in binary-tree form, which avoids having to build a binary tree on the first read. All stripe blocks are accessed through memory mapping (mmap), which avoids the overhead of memory allocation and of file read/write system calls on every access to on-disk data.
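As a rough illustration of how name hashing, striping, and mmap access fit together, the sketch below maps a name to a subset, computes where a stripe (band) of that subset would sit in the directory file, and maps just that stripe. Several points are assumptions rather than statements from the patent: CRC32 stands in for the unspecified hash function, the interleaved layout in `stripe_offset` is a guess at what Fig. 1 depicts, and all function names are hypothetical.

```python
import mmap
import os
import tempfile
import zlib

NUM_SUBSETS = 1024          # N in the description
STRIPE_SIZE = 256 * 1024    # 256 KB per stripe


def subset_of(name: str) -> int:
    """Map a directory entry name to a subset (CRC32 is an assumed hash)."""
    return zlib.crc32(name.encode("utf-8")) % NUM_SUBSETS


def stripe_offset(subset: int, round_no: int,
                  num_subsets: int = NUM_SUBSETS) -> int:
    """Assumed layout: each round holds one stripe per subset, in subset order."""
    return (round_no * num_subsets + subset) * STRIPE_SIZE


def map_stripe(fd: int, subset: int, round_no: int,
               num_subsets: int = NUM_SUBSETS) -> mmap.mmap:
    """Map exactly one stripe of the directory file into memory."""
    return mmap.mmap(fd, STRIPE_SIZE, access=mmap.ACCESS_WRITE,
                     offset=stripe_offset(subset, round_no, num_subsets))


def demo_roundtrip() -> bytes:
    """Write through a mapped stripe, then read the same bytes with read()."""
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, 4 * STRIPE_SIZE)      # toy file: 4 subsets, 1 round
        stripe = map_stripe(fd, subset=2, round_no=0, num_subsets=4)
        stripe[:5] = b"hello"                  # plain memory write, no write() call
        stripe.flush()
        stripe.close()
        os.lseek(fd, stripe_offset(2, 0, 4), os.SEEK_SET)
        return os.read(fd, 5)
    finally:
        os.close(fd)
        os.remove(path)
```

Because 256 KB is a multiple of common page sizes, each stripe offset is a valid mmap offset, so a stripe can be mapped on its own without touching the rest of the directory file.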
Description of drawings
Fig. 1 is a structural diagram of the stripe blocks in the present invention.
Fig. 2 is a diagram of the internal structure of a stripe in the present invention.
Embodiment
A detailed description is given below with reference to the accompanying drawings:
(1) For a single directory, all of its directory entries are stored in the same file. The name of each directory entry is hashed, and the entries are thereby divided into N subsets (in the present invention, N is 1024).
(2) Each subset is stored in blocks in a striped manner, as shown in Fig. 1. In this scheme each stripe is 256 KB, consistent with the read-ahead window size of the Linux virtual file system. Each directory stripe block is mapped into a region of virtual memory by memory mapping, so that it can be accessed directly, without cumbersome management structures or read-and-build operations.
(3) The directory entries inside each stripe block are stored in binary-tree form; the structure is shown in Fig. 2. As shown in Fig. 2, the head of each stripe block stores the index information of all directory entries in the block. Each index record stores only the relative offset of the actual directory entry inside the block; on access, the corresponding directory entry is read at the offset given by the index. All index records are sorted by directory entry name. Idle space in the stripe block is represented as special virtual directory entries, and these are sorted by the size of the free space.
(4) When a stripe block is initialized, the maximum number of directory entries the block can hold is calculated, and space for that number of index records is reserved at the head of the stripe block. Initially, however, the whole stripe block contains only one free directory entry, which covers the entire block.
(5) When a directory entry is to be added to a stripe block, the directory entry subset to which it belongs is first computed by hashing its name string, and all stripe blocks in that subset are then traversed: a binary search over the free directory entries of a stripe block finds the smallest free entry that can hold the target directory entry. If the free directory entry is much larger than the directory entry to be added, then, to avoid wasting space, the free entry is split into a new free directory entry and an entry used to store the target directory entry. The size of the original free directory entry is then modified and, after the target directory entry has been initialized, its offset is inserted into the index set of valid directory entries. Because the directory block is operated on through mmap, the modification is eventually reflected on disk.
(6) When a directory entry is to be looked up, the directory entry subset is first computed, as for an addition, and each stripe block of that subset is then traversed, with a binary search over the index of valid directory entries, until the target directory entry is found.
(7) When a directory entry is deleted, the corresponding directory entry is first obtained through the index and its content is modified; whether it can be merged with the neighbouring free directory entries is then determined from the nature of the preceding and following directory entries. If it cannot be merged, a new free directory entry is generated; otherwise it is merged with the neighbouring free directory entries. Afterwards, the index of the target directory entry is deleted from the index set of valid directory entries, and the newly generated free directory entry is added to the index set of free directory entries.
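The stripe header described in steps (3), (4) and (6) can be pictured with the following sketch: a reserved array of offsets at the head of the stripe, kept sorted by entry name, so that a lookup is a binary search with no tree construction after the stripe is mapped. The concrete byte layout is an assumption made for illustration (a 4-byte entry count, 4-byte offset slots, and entries made of a 2-byte name length, an 8-byte inode number, and the name); the patent does not specify these fields, and the free-entry index is left out here.

```python
import struct

STRIPE_SIZE = 256 * 1024
HEAD = struct.Struct("<I")          # assumed header prefix: number of valid entries
SLOT = struct.Struct("<I")          # one index slot: relative offset of an entry
ENTRY_HEAD = struct.Struct("<HQ")   # assumed entry prefix: name length, inode number
MIN_ENTRY = ENTRY_HEAD.size + 1     # smallest entry has a one-byte name

# Step (4): reserve one slot for every entry the stripe could possibly hold.
MAX_ENTRIES = (STRIPE_SIZE - HEAD.size) // (MIN_ENTRY + SLOT.size)
BODY_START = HEAD.size + MAX_ENTRIES * SLOT.size


def slot(stripe, i: int) -> int:
    """Offset stored in index slot i."""
    return SLOT.unpack_from(stripe, HEAD.size + i * SLOT.size)[0]


def read_name(stripe, off: int) -> bytes:
    """Name of the entry stored at relative offset off."""
    name_len, _ = ENTRY_HEAD.unpack_from(stripe, off)
    start = off + ENTRY_HEAD.size
    return bytes(stripe[start:start + name_len])


def find_slot(stripe, name: bytes):
    """Binary search over the name-sorted index; returns (position, found)."""
    count = HEAD.unpack_from(stripe, 0)[0]
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        if read_name(stripe, slot(stripe, mid)) < name:
            lo = mid + 1
        else:
            hi = mid
    return lo, lo < count and read_name(stripe, slot(stripe, lo)) == name


def insert(stripe, name: bytes, inode: int, off: int) -> None:
    """Write an entry at off (chosen by a free-space allocator) and index it."""
    count = HEAD.unpack_from(stripe, 0)[0]
    pos, found = find_slot(stripe, name)
    if found or count == MAX_ENTRIES:
        raise ValueError("duplicate name or index full")
    ENTRY_HEAD.pack_into(stripe, off, len(name), inode)
    stripe[off + ENTRY_HEAD.size:off + ENTRY_HEAD.size + len(name)] = name
    a = HEAD.size + pos * SLOT.size
    b = HEAD.size + count * SLOT.size
    stripe[a + SLOT.size:b + SLOT.size] = stripe[a:b]   # shift later slots right
    SLOT.pack_into(stripe, a, off)
    HEAD.pack_into(stripe, 0, count + 1)


def lookup(stripe, name: bytes):
    """Return the inode number for name, or None if it is not in this stripe."""
    pos, found = find_slot(stripe, name)
    if not found:
        return None
    return ENTRY_HEAD.unpack_from(stripe, slot(stripe, pos))[1]
```

A `bytearray(STRIPE_SIZE)` works as the `stripe` argument for experimenting; a writable memory-mapped region of the same size should behave the same way, since only equal-length slice assignments and `struct` buffer operations are used.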

Claims (9)

1. A directory access method in a distributed file system, characterized in that:
The directory entries of a single directory are stored in the same file, which is divided into directory subsets; a hash operation is performed on the name of each directory entry, and the entry is assigned to a directory subset accordingly;
When a directory entry is to be added, the directory subset to which it belongs is first computed by hashing its name string, and all stripes in that subset are then traversed; a binary search over the free directory entries of a stripe finds the smallest free entry that can hold the target directory entry; the size of the original free directory entry is then modified and, after the target directory entry has been initialized, its offset is inserted into the index set of valid directory entries;
If the free directory entry is much larger than the directory entry to be added, the free entry is split into a new free directory entry and an entry used to store the target directory entry;
When a directory entry is looked up, the directory entry subset is computed first, and each stripe block of that subset is then traversed, with a binary search over the index of valid directory entries, until the target directory entry is found;
When a directory entry is to be deleted, the corresponding directory entry is first obtained through the index and its content is modified; whether it can be merged with the neighbouring free directory entries is then determined from the nature of the preceding and following directory entries; if it cannot be merged, a new free directory entry is generated, and otherwise it is merged with the neighbouring free directory entries; afterwards, the index of the target directory entry is deleted from the index set of valid directory entries, and the newly generated free directory entry is added to the index set of free directory entries.
2. The method according to claim 1, characterized in that: the number of directory subsets is 1024 or an integral multiple thereof.
3. The method according to claim 1, characterized in that: each directory subset is stored in blocks in a striped manner, and each stripe is 256 KB in size.
4. The method according to claim 3, characterized in that: when a stripe of a directory subset is accessed, it is mapped directly into a virtual memory region by memory mapping.
5. The method according to claim 3, characterized in that: the directory entries in a stripe are stored in binary-tree form.
6. The method according to claim 5, characterized in that: the head of the stripe stores index information for the directory entries in the stripe; the index information stores only the relative offset of the actual directory entry within the stripe, and on access the directory entry is read at the corresponding offset.
7. The method according to claim 6, characterized in that: the index is sorted by directory entry name; idle space in the stripe is represented as virtual free directory entries, which are sorted by the size of the free space.
8. The method according to claim 3, characterized in that: when a stripe is initialized, the maximum number of directory entries the stripe can hold is calculated, and space for that number of index records is reserved at the head of the stripe.
9. The method according to claim 8, characterized in that: at initialization the stripe contains only one free directory entry, which covers the whole stripe.
CN 201110328295 2011-10-25 2011-10-25 Catalogue access method in DFS (distributed file system) Active CN102385623B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110328295 CN102385623B (en) 2011-10-25 2011-10-25 Catalogue access method in DFS (distributed file system)

Publications (2)

Publication Number Publication Date
CN102385623A CN102385623A (en) 2012-03-21
CN102385623B true CN102385623B (en) 2013-08-28

Family

ID=45825039

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110328295 Active CN102385623B (en) 2011-10-25 2011-10-25 Catalogue access method in DFS (distributed file system)

Country Status (1)

Country Link
CN (1) CN102385623B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103473337A (en) * 2013-09-22 2013-12-25 北京航空航天大学 Massive catalogs and files oriented processing method in distributed type storage system
CN103744882B (en) * 2013-12-20 2018-05-25 浪潮(北京)电子信息产业有限公司 A kind of browse film segment table based on key-value pair shows method and device
CN103823865A (en) * 2014-02-25 2014-05-28 南京航空航天大学 Database primary memory indexing method
CN105988873B (en) * 2015-02-04 2019-10-08 深圳神州数码云科数据技术有限公司 A kind of method and device of optimization processing resource
CN105138545B (en) * 2015-07-09 2018-10-09 中国科学院计算技术研究所 The asynchronous method and system pre-read of directory entry in a kind of distributed file system
CN105159616A (en) * 2015-09-11 2015-12-16 浪潮(北京)电子信息产业有限公司 Disk space management method and device
CN105677789A (en) * 2015-12-31 2016-06-15 浪潮(北京)电子信息产业有限公司 Method and system for managing directory capacity of distributed file system
CN106021462A (en) * 2016-05-17 2016-10-12 深圳市中博科创信息技术有限公司 File storage method of cluster file system and cluster file system
CN106570113B (en) * 2016-10-25 2022-04-01 中国电力科学研究院 Mass vector slice data cloud storage method and system
US11507533B2 (en) 2018-02-05 2022-11-22 Huawei Technologies Co., Ltd. Data query method and apparatus
CN108763589B (en) * 2018-06-20 2021-12-07 程慧泉 Directory system of distributed file system and implementation method thereof
CN110245122B (en) * 2019-05-08 2022-08-09 华为技术有限公司 Data processing method and KV storage system
CN111966733B (en) * 2020-08-18 2024-05-28 中国银行股份有限公司 Hot spot knowledge generation method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1614591A (en) * 2004-12-02 2005-05-11 中国科学院计算技术研究所 Method for organizing and accessing distributive catalogue of document system
CN1692356A (en) * 2002-11-14 2005-11-02 易斯龙系统公司 Systems and methods for restriping files in a distributed file system

Also Published As

Publication number Publication date
CN102385623A (en) 2012-03-21

Similar Documents

Publication Publication Date Title
CN102385623B (en) Catalogue access method in DFS (distributed file system)
CN100468402C (en) Sort data storage and split catalog inquiry method based on catalog tree
CN103019953B (en) Construction system and construction method for metadata
CN104850358B (en) A kind of magneto-optic electricity mixing storage system and its data acquisition and storage method
CN101996217B (en) Method for storing data and memory device thereof
CN102541985A (en) Organization method of client directory cache in distributed file system
CN101464901B (en) Object search method in object storage device
CN103164490B (en) A kind of efficient storage implementation method of not fixed-length data and device
CN104408111A (en) Method and device for deleting duplicate data
CN102332027A (en) Mass non-independent small file associated storage method based on Hadoop
CN106446001B (en) A kind of method and system of the storage file in computer storage medium
CN110347852A (en) It is embedded in the file system and file management method of key assignments storage system extending transversely
CN102779180A (en) Operation processing method of data storage system and data storage system
CN106066896A (en) A kind of big Data duplication applying perception deletes storage system and method
CN103020174A (en) Similarity analysis method, device and system
CN103885887B (en) User data storage method, read method and system
CN111367469B (en) Method and system for migrating layered storage data
CN101571869B (en) File memory and read method of smart card and device thereof
CN102479189B (en) A kind of magnanimity timestamp type data high-speed uniform index of reference method in internal memory
CN102779138B (en) The hard disk access method of real time data
CN103139300A (en) Virtual machine image management optimization method based on data de-duplication
CN104281717B (en) A kind of method for setting up magnanimity ID mapping relations
CN102024019B (en) Suffix tree based catalog organizing method in distributed file system
CN113821171B (en) Key value storage method based on hash table and LSM tree
CN103019887A (en) Data backup method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220729

Address after: 100193 No. 36 Building, No. 8 Hospital, Wangxi Road, Haidian District, Beijing

Patentee after: Dawning Information Industry (Beijing) Co.,Ltd.

Patentee after: DAWNING INFORMATION INDUSTRY Co.,Ltd.

Address before: 100084 Beijing Haidian District City Mill Street No. 64

Patentee before: Dawning Information Industry (Beijing) Co.,Ltd.