CN100483420C

CN100483420C - Fine-grained file and directory version management method based on snapshots

Info

Publication number: CN100483420C
Application number: CNB2007101770653A
Authority: CN
Inventors: Shu Jiwu (舒继武); Xue Wei (薛巍); Xiang Xiaojia (向小佳)
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2007-11-09
Filing date: 2007-11-09
Publication date: 2009-04-29
Anticipated expiration: 2027-11-09
Also published as: CN101162469A

Abstract

The present invention relates to a snapshot-based method for fine-grained management of file and directory versions, in the field of multi-version file systems. The invention separates the name space, formed by the names of all files and directories in the file system, from the version space, which represents the periods in which different versions were generated, and manages the two with relatively independent strategies, forming a hierarchical two-dimensional structure: in the name space, a hierarchy runs from the root directory down to the files; in the version space, the versions of files and directories are organized chronologically by an index structure, forming a hierarchy within the version space. Retrieval in the name space uses an index strategy based on dynamic hashing. Retrieval in the version space uses an index strategy based on red-black trees, with directory versions and file versions each using a red-black tree variant tailored to their characteristics. The invention greatly improves the usability and performance of the system while keeping the time and space overhead of maintaining historical versions within an acceptable range.

Description

Fine-grained file and directory version management method based on snapshots

Technical field

The snapshot-based fine-grained file and directory version management method belongs to the field of multi-version file systems, and relates in particular to the generation of file and directory versions and to the organization and retrieval of file and directory data.

Background art

A multi-version file system is a highly reliable file system that saves file data, together with historical data, as distinct versions. When a user error or a system failure causes data loss, the file system can automatically look up the correct version and recover the lost data; it can also provide change records of file data, which users can use to analyze file access patterns and trace suspicious data changes. Traditional multi-version file systems retain versions mainly through logging or through real-time copying of historical data. The former records every change to a file as a log entry; it consumes a great deal of space, and recovering historical data requires replaying the log, so performance is poor. The latter copies the historical data of the entire file system at a given moment into a dedicated historical-data storage area; it performs poorly, makes online copying hard to implement, consumes a great deal of space, and operates at too coarse a granularity, since it always keeps an image of the whole file system's history. It therefore cannot let users flexibly retain only selected directories and file contents, which hampers management. In addition, traditional multi-version file systems built on these techniques often must traverse all earlier versions linearly to retrieve a specified old version of a file, which creates a performance bottleneck.

The snapshot-based fine-grained file and directory version management method proposes a complete new set of techniques for version generation, organization, and retrieval that effectively solves the problems above.

Summary of the invention

The object of the present invention is to provide a highly reliable, high-performance multi-version file system that comprehensively meets the needs of network services and scientific computing services and protects data online in real time. The invention focuses on the design of a lightweight, fine-grained version generation mechanism and an efficient version retrieval module.

The invention is characterized as follows. The snapshot technique is lightweight: when a snapshot is taken, only basic time information is recorded, and all copy operations (both data copies and metadata copies) are deferred until a modification actually requires them.

The name space, formed by the names of all files and directories in the file system, and the version space, formed by the versions generated at different times, are separated and managed with relatively independent strategies. In the name space, related files and subdirectories are stored under the same parent directory, forming a hierarchy from the root directory through the levels of subdirectories down to the files, and each file and directory corresponds to a series of versions. In the version space, the versions of files and directories are organized by an index structure according to the time each version was generated; file and subdirectory versions generated at close times are stored under the same parent directory version, forming a hierarchy within the version space.

Retrieval in the name space uses an index strategy based on dynamic hashing in place of the linear directory strategy of traditional file systems. This not only prevents retrieval time from growing linearly as directories grow, but also keeps extra space consumption low. Retrieval in the version space uses an index strategy based on red-black trees: directory versions and file versions each use a red-black tree variant suited to their characteristics, and the corresponding metadata is stored in the inode of each version (the data structure that represents a file or directory in Unix-like systems).

The snapshot-based fine-grained file and directory version management method comprises the following steps, in order:

Step 1: Fine-grained version generation

Step 1.1: Perform the snapshot operation as follows:

When a snapshot is taken, its position and time are recorded in the corresponding system data structures. The snapshot time is recorded in two places: in a global variable of the file system, called the global snapshot timestamp (global snapepoch); and in the metadata of the current version of the file or directory on which the user performs the snapshot, called the local snapshot timestamp of that file or directory (local snapepoch).

Step 1.2: Modify a file or directory as follows:

In the hierarchical tree of the file system, perform a top-down path lookup from the current version of the root directory to the current version of the directory or file to be modified. For the current version of each file or directory traversed along the lookup path, find its parent directory in the name space and adjust the version's local snapepoch as follows:

local snapepoch = MAX(snapepoch of the current version of the parent directory of the directory or file, local snapepoch)

Step 1.3: After the snapshot time has been adjusted, determine whether the file or directory to be modified in Step 1.2 should generate a new version by comparing the values of epoch and snapepoch in the inode of its current version. If snapepoch is greater than epoch, the version has not been modified since the last snapshot operation, so the current version is expired and its old data must be preserved: the inode metadata of the current version of the directory or file is copied and retained as an old version, which is added to the index structure, and the data sharing between the current version and the old version is recorded in a bitmap. Otherwise, the current version is not expired and neither the old data nor the metadata needs to be preserved; the data of the current version is modified directly, and the data-sharing bitmap of the current version is updated at the same time.
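Steps 1.1 to 1.3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Inode` and `VersionedFS` classes are hypothetical, a logical counter stands in for system time, and retaining an old version is reduced to recording its epoch instead of copying the inode and inserting it into the version index. It also assumes the current version receives a fresh epoch after an old version is split off, so later edits before the next snapshot do not create further versions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Inode:
    """Hypothetical in-memory view of a directory's or file's current version."""
    name: str
    epoch: int                          # generation time of this version
    parent: Optional["Inode"] = None    # current version of the parent directory
    snapepoch: int = 0                  # local snapshot timestamp (local snapepoch)
    old_versions: List[int] = field(default_factory=list)  # epochs of retained old versions


class VersionedFS:
    def __init__(self) -> None:
        self.superepoch = 0   # global snapshot timestamp (global snapepoch)
        self.clock = 0        # logical clock standing in for system time (assumption)

    def tick(self) -> int:
        self.clock += 1
        return self.clock

    def snapshot(self, target: Inode) -> None:
        """Step 1.1: record the time only; nothing is copied."""
        now = self.tick()
        self.superepoch = now
        target.snapepoch = now

    def modify(self, target: Inode) -> bool:
        """Steps 1.2 and 1.3; returns True if an old version had to be retained."""
        path: List[Inode] = []
        node: Optional[Inode] = target
        while node is not None:          # collect target .. root
            path.append(node)
            node = node.parent
        path.reverse()                   # root .. target (top-down)
        for parent, child in zip(path, path[1:]):
            child.snapepoch = max(parent.snapepoch, child.snapepoch)
        if target.snapepoch > target.epoch:
            # Expired: keep the old version (here only its epoch, standing in for
            # copying the inode and inserting it into the version index).
            target.old_versions.append(target.epoch)
            target.epoch = self.tick()   # the current version is now a fresh version
            return True
        return False
```

In this sketch a snapshot on `d` forces a new version of `f` below it but leaves `g`, a sibling of `d`, untouched, which is the fine-grained behavior the steps describe.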

Step 2: Build a fast index in the name space

Within each directory of the name space, the directory entries are organized in a dynamic hash table. Compared with conventional linear indexing, a dynamic hash table retrieves entries quickly and in constant time; and compared with an ordinary hash table, its space consumption adapts to directory size, giving good scalability and high space utilization. The dynamic hash table uses a family of hash functions whose address-space sizes grow exponentially. When a directory is small, a hash function with a small address space is used; when growth in directory contents means the current hash function's address space can no longer accommodate a directory entry, the hash function is upgraded to enlarge the address space, and in the opposite case it is downgraded.
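The hash-function family can be illustrated in a few lines. This sketch assumes MD5 truncated to 64 bits as the base function h (the text names MD5 or SHA as examples); the helper names are hypothetical. It shows why upgrading and downgrading are cheap: an entry at address a under h_i can only move to a or a + 2^i under h_(i+1), and reducing an h_i address modulo 2^(i-1) gives its h_(i-1) address.

```python
import hashlib


def base_hash(name: str) -> int:
    """Traditional uniformly distributed hash h (MD5 here; SHA would also work)."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def h_level(i: int, name: str) -> int:
    """Level-i member of the family, h_i = h mod 2**i, with 2**i addresses."""
    return base_hash(name) % (1 << i)


def upgrade_targets(i: int, addr: int) -> tuple:
    """Addresses under h_(i+1) that an entry at `addr` under h_i can move to."""
    return (addr, addr + (1 << i))


def downgrade_target(i: int, addr: int) -> int:
    """Address under h_(i-1) of an entry at `addr` under h_i (requires i >= 1)."""
    return addr % (1 << (i - 1))
```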

Build the fast index in the name space as follows:

Step 2.1: Build a dynamic hash table in each directory of the name space. Each table entry represents a subdirectory or file in the directory; the table's address is stored in the i_block field (the block mapping table) of the directory version's inode, and the hash function the directory currently uses is recorded in the hash field of the inode. Each entry of the dynamic hash table has two parts: (a) a pointer to a basic storage unit called a bucket, which is one physical block in size and holds the directory entries of subdirectories or files as variable-length records, each containing the name of a subdirectory or file and the inode number of the subdirectory or file it represents; and (b) the level (Level) of the entry, i.e. the number of times the bucket it points to has been split. When a newly inserted directory entry maps to an entry whose bucket does not have enough space for it, the bucket must be split. Each entry of the dynamic hash table has a distinct address, and a subdirectory or file is mapped to the entry whose address equals the hash value of its name. The dynamic hash table uses a family of hash functions $h_0, h_1, \ldots, h_k$, where the highest level $k$ depends on $n$, the maximum number of directory entries a directory can hold in this system:

$h_i = h \bmod 2^i, \quad i \in \{0, 1, 2, \ldots, k\}$

Here $i$ is the level of the hash function and $h$ is a traditional hash function with a uniform distribution, such as the MD5 message-digest algorithm or the Secure Hash Algorithm (SHA).

Step 2.2: Using the subdirectory name or file name as input, the result of the hash function gives the address, in the dynamic hash index table, of the entry corresponding to that subdirectory or file.
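Steps 2.1 and 2.2 closely resemble extendible hashing on the low-order bits of h. A minimal in-memory sketch follows, assuming a bucket holds a fixed number of entries (standing in for one physical block) and `max_level` stands in for the largest level k; the class names are hypothetical and downgrading is omitted. Each table slot points to a bucket with its own level; when a full bucket's level already equals the directory's hash level, the table is doubled (the h_i to h_(i+1) upgrade) before the bucket is split.

```python
import hashlib
from typing import Dict, List, Optional


def base_hash(name: str) -> int:
    return int.from_bytes(hashlib.md5(name.encode("utf-8")).digest()[:8], "big")


class Bucket:
    def __init__(self, level: int) -> None:
        self.level = level                   # Level: how many times this bucket was split
        self.entries: Dict[str, int] = {}    # name -> inode number


class DynamicHashDir:
    def __init__(self, bucket_capacity: int = 4, max_level: int = 20) -> None:
        self.capacity = bucket_capacity      # stand-in for "one physical block"
        self.max_level = max_level           # stand-in for k, the highest hash level
        self.hash_level = 0                  # current h_i (the inode's hash field)
        self.table: List[Bucket] = [Bucket(0)]   # 2**hash_level slots (i_block)

    def _index(self, name: str) -> int:
        return base_hash(name) % (1 << self.hash_level)   # h_i(name)

    def lookup(self, name: str) -> Optional[int]:
        return self.table[self._index(name)].entries.get(name)

    def insert(self, name: str, ino: int) -> None:
        while True:
            bucket = self.table[self._index(name)]
            if name in bucket.entries or len(bucket.entries) < self.capacity:
                bucket.entries[name] = ino
                return
            self._split(bucket)              # bucket full: split, then retry

    def _split(self, bucket: Bucket) -> None:
        if bucket.level == self.hash_level:
            if self.hash_level >= self.max_level:
                raise OverflowError("directory exceeds the largest hash level")
            self.table = self.table + self.table   # upgrade h_i -> h_(i+1)
            self.hash_level += 1
        bit = 1 << bucket.level              # hash bit that separates the two halves
        new = Bucket(bucket.level + 1)
        bucket.level += 1
        for name in list(bucket.entries):
            if base_hash(name) & bit:
                new.entries[name] = bucket.entries.pop(name)
        for j, b in enumerate(self.table):
            if b is bucket and j & bit:
                self.table[j] = new
```

Doubling the table only copies slot pointers; buckets that did not overflow are shared by two slots until they themselves split, which is what keeps space proportional to directory size.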

Step 3: Build a fast index in the version space

Step 3.1: For the different versions of the same directory in the file system, build a fast index based on an inode-embedded red-black tree for retrieving directory version metadata, as follows:

Step 3.1.1: The leaf nodes of this red-black tree serve only as external nodes; they have no corresponding entity and exist solely to maintain the red-black tree. The non-leaf nodes serve as internal nodes, each corresponding one-to-one with a specific version of the directory.

Step 3.1.2: The data of an internal node is stored entirely in the inode representing that directory version, on external storage, and comprises: the node key, which is the version's generation time epoch; a pointer to the node's parent; a pointer to its left subtree; a pointer to its right subtree; and the node's color. External nodes have no corresponding entity and exist only in memory, storing a pointer to the node's parent and the node's color (black by default). Internal nodes are ordered by key relative to the root: internal nodes with a larger key than the root represent directory versions created later and lie in the root's left subtree; the others lie in its right subtree. Following the red-black tree rebalancing algorithm, the position of every node (including the root) adapts as nodes are inserted and deleted.

Step 3.1.3: After a new directory version is generated, initialize the inode representing it: set the node key to the version's generation time, set every pointer to null, and set the color to red.

Step 3.1.4: Search the red-black tree index by key for the position where this directory version should be inserted. The search proceeds as follows: compare the key of the node being inserted with that of the root; if the former is greater, continue the search in the root's left subtree, otherwise in its right subtree, and so on until a leaf node is reached, caching the leaf's parent. Then replace the leaf with the new node: update the corresponding subtree pointer in the parent to point to the new node, set the new node's parent pointer to the parent, and automatically generate two external child nodes (left and right) for the new node, colored black, as the new leaves. Because version generation times are unique, if the search finds an existing node whose key equals the new node's key, an error is reported and the operation exits.
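The search-and-attach part of Step 3.1.4 is an ordinary binary-search-tree insertion with the comparison reversed, so that later versions (larger epochs) go to the left. The sketch below is a hypothetical in-memory model: `None` stands in for the black external nodes, and rebalancing (Step 3.1.5) is left out.

```python
from typing import List, Optional


class DirVersionNode:
    """Internal node of the inode-embedded tree: one directory version."""

    def __init__(self, epoch: int) -> None:
        self.key = epoch                      # node key = generation time
        self.parent: Optional["DirVersionNode"] = None
        self.left: Optional["DirVersionNode"] = None    # None = black external node
        self.right: Optional["DirVersionNode"] = None
        self.color = "red"                    # Step 3.1.3: new nodes start red


class DirVersionTree:
    def __init__(self) -> None:
        self.root: Optional[DirVersionNode] = None

    def attach(self, node: DirVersionNode) -> Optional[DirVersionNode]:
        """Descend by key and hang `node` in place of an external node.

        Returns the cached parent (None if the tree was empty)."""
        parent, cur = None, self.root
        while cur is not None:
            if node.key == cur.key:
                raise ValueError(f"version with epoch {node.key} already exists")
            parent = cur
            # Larger key = later version = left subtree.
            cur = cur.left if node.key > cur.key else cur.right
        node.parent = parent
        if parent is None:
            self.root = node
        elif node.key > parent.key:
            parent.left = node
        else:
            parent.right = node
        return parent

    def keys_in_order(self) -> List[int]:
        out: List[int] = []

        def walk(n: Optional[DirVersionNode]) -> None:
            if n is not None:
                walk(n.left)
                out.append(n.key)
                walk(n.right)

        walk(self.root)
        return out
```

With this ordering an in-order walk lists versions from newest to oldest.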

Step 3.1.5: Check the tree structure and rebalance it if necessary. The principle is that every simple path from a node to its descendant leaves contains the same number of black nodes, and no two consecutive nodes on a path may both be red. The method is: check the colors of the new version node and of its parent; if they are not both red, the operation ends. Otherwise, following the rebalancing algorithm for the symmetric binary B-tree proposed by Bayer in 1972, perform a left or right rotation on the subtree containing the node and recolor nodes within the subtree so that it satisfies the principle and the tree stays balanced; then repeat the check with the root of the rotated subtree as the target, and continue in this way until no violation remains.
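For the rebalancing in Step 3.1.5, the sketch below uses the standard red-black insertion fix-up (recolor when the uncle is red, otherwise rotate once or twice), which is the usual formulation of the balancing rules stated above; it is an illustration, not the authors' code. `None` again represents black external nodes, keys are epochs, and the larger-key-to-the-left ordering of Step 3.1.4 is kept. The hypothetical `check` helper verifies the two rules (equal black counts on every path, no two consecutive red nodes).

```python
from typing import List, Optional

RED, BLACK = "red", "black"


class Node:
    """Internal node = one directory version; None children are black external nodes."""

    def __init__(self, key: int) -> None:
        self.key = key                  # version generation time (epoch)
        self.color = RED                # new nodes start red (Step 3.1.3)
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None
        self.parent: Optional["Node"] = None


def is_red(n: Optional[Node]) -> bool:
    return n is not None and n.color == RED


class VersionRBTree:
    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def _rotate(self, x: Node, left: bool) -> None:
        """Rotate left around x if `left`, otherwise rotate right."""
        y = x.right if left else x.left
        inner = y.left if left else y.right     # subtree that changes parent
        if left:
            x.right = inner
        else:
            x.left = inner
        if inner is not None:
            inner.parent = x
        y.parent = x.parent
        if x.parent is None:
            self.root = y
        elif x is x.parent.left:
            x.parent.left = y
        else:
            x.parent.right = y
        if left:
            y.left = x
        else:
            y.right = x
        x.parent = y

    def insert(self, key: int) -> Node:
        z, parent, cur = Node(key), None, self.root
        while cur is not None:
            if key == cur.key:
                raise ValueError(f"version with epoch {key} already exists")
            parent = cur
            cur = cur.left if key > cur.key else cur.right   # later versions go left
        z.parent = parent
        if parent is None:
            self.root = z
        elif key > parent.key:
            parent.left = z
        else:
            parent.right = z
        self._fix(z)
        return z

    def _fix(self, z: Node) -> None:
        while is_red(z.parent):            # two consecutive red nodes
            p = z.parent
            g = p.parent                   # a red parent is never the root, so g exists
            p_is_left = p is g.left
            uncle = g.right if p_is_left else g.left
            if is_red(uncle):              # recolor and re-check two levels up
                p.color = uncle.color = BLACK
                g.color = RED
                z = g
                continue
            if z is (p.right if p_is_left else p.left):
                self._rotate(p, left=p_is_left)   # inner grandchild: rotate into line
                z, p = p, z
            p.color, g.color = BLACK, RED
            self._rotate(g, left=not p_is_left)
        self.root.color = BLACK


def check(n: Optional[Node]) -> int:
    """Return the black height below n; raise if a red-black rule is broken."""
    if n is None:
        return 1                           # external nodes are black
    if is_red(n) and (is_red(n.left) or is_red(n.right)):
        raise AssertionError("two consecutive red nodes")
    lh, rh = check(n.left), check(n.right)
    if lh != rh:
        raise AssertionError("paths with different numbers of black nodes")
    return lh + (0 if is_red(n) else 1)


def keys_in_order(n: Optional[Node]) -> List[int]:
    return [] if n is None else keys_in_order(n.left) + [n.key] + keys_in_order(n.right)


def height(n: Optional[Node]) -> int:
    return 0 if n is None else 1 + max(height(n.left), height(n.right))
```

Inserting epochs in increasing order, which is how versions actually arrive, would degenerate an unbalanced tree into a list; with the fix-up, the height stays within about 2·log2(n + 1).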

Step 3.2: For the different versions of the same file in the file system, build a fast index based on a weighted threaded red-black tree for retrieving file version metadata, as follows:

Step 3.2.1: The non-leaf nodes of this red-black tree are internal nodes, used only as an index structure with no corresponding entity. The leaf nodes are external nodes, each corresponding one-to-one with a specific version of the file.

Step 3.2.2: The data of an external node is stored entirely in the inode representing that file version, on external storage, and comprises: the node key, which is the version's generation time epoch; a pointer to the node's parent; the node weight; pointers to its predecessor and successor in the weight linked list; and the node's color. Internal nodes exist only in memory and store: a thread, i.e. a pointer to the internal node's predecessor in an in-order traversal of the red-black tree (this predecessor is always an external node), used to obtain the predecessor's key; a pointer to the node's parent; a pointer to its left subtree; a pointer to its right subtree; and the node's color. Internal nodes are ordered by the key of the predecessor their thread points to. The ordering is otherwise like that of the inode-embedded red-black tree; the only difference is that internal nodes have no key of their own and must use the key of the external node their thread points to instead.

Step 3.2.3: After a new file version is generated, initialize the inode representing it: set the key of the corresponding external node to the version's generation time, set its parent pointer and weight-list pointers to null, and set its color to black.

Step 3.2.4: Search the red-black tree index by key for the position where this file version should be inserted. The search proceeds as follows: compare the key of the node being inserted with the key of the predecessor indicated by the root's thread; if the former is greater, continue the search in the root's left subtree, otherwise in its right subtree, and so on until a leaf node is reached, caching this leaf (hereafter called the sibling) and its parent. Then create a new internal node, color it red, set its parent pointer to the cached parent, make the cached leaf and the external node being inserted its two children, and initialize its thread to point to its left child. Because version generation times are unique, if the search finds an existing node whose key equals the new version's key, an error is reported and the operation exits.
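Step 3.2.4 can be sketched with two hypothetical classes: `Ext` for external nodes (file versions) and `Int` for keyless internal nodes whose `thread` points at their in-order predecessor. This sketch omits rebalancing and the weight list. Because a new leaf can only reach the left subtree of an internal node by being larger than that node's threaded key, it never becomes the rightmost leaf of that subtree, so existing threads stay valid without updates; the `find` method shows how a lookup uses the borrowed keys.

```python
from typing import List, Optional, Tuple


class Ext:
    """External node: one file version (these fields would live in its inode)."""

    def __init__(self, epoch: int) -> None:
        self.key = epoch        # version generation time
        self.parent = None
        self.color = "black"    # Step 3.2.3


class Int:
    """Internal node: in-memory only, no key of its own."""

    def __init__(self, left: Ext, right: Ext, parent: Optional["Int"]) -> None:
        self.left, self.right, self.parent = left, right, parent
        self.thread = left      # in-order predecessor: the left leaf at creation time
        self.color = "red"
        left.parent = right.parent = self


class FileVersionTree:
    def __init__(self) -> None:
        self.root = None

    def insert(self, epoch: int) -> Tuple[Ext, Optional[Ext]]:
        """Step 3.2.4 without rebalancing; returns (new version node, sibling)."""
        new = Ext(epoch)
        if self.root is None:
            self.root = new
            return new, None
        cur = self.root
        while isinstance(cur, Int):
            pivot = cur.thread.key          # key borrowed from the threaded predecessor
            if epoch == pivot:
                raise ValueError(f"version with epoch {epoch} already exists")
            cur = cur.left if epoch > pivot else cur.right
        sibling, parent = cur, cur.parent
        if epoch == sibling.key:
            raise ValueError(f"version with epoch {epoch} already exists")
        # Later versions sit to the left, so the larger key becomes the left child.
        if epoch > sibling.key:
            node = Int(new, sibling, parent)
        else:
            node = Int(sibling, new, parent)
        if parent is None:
            self.root = node
        elif parent.left is sibling:
            parent.left = node
        else:
            parent.right = node
        return new, sibling

    def find(self, epoch: int) -> Optional[Ext]:
        cur = self.root
        while isinstance(cur, Int):
            # The left subtree holds keys >= the threaded key, the right subtree smaller ones.
            cur = cur.left if epoch >= cur.thread.key else cur.right
        return cur if cur is not None and cur.key == epoch else None

    def leaves(self) -> List[int]:
        out: List[int] = []

        def walk(n) -> None:
            if isinstance(n, Int):
                walk(n.left)
                walk(n.right)
            elif n is not None:
                out.append(n.key)

        walk(self.root)
        return out

    def threads_valid(self) -> bool:
        """Every internal node's thread is the rightmost leaf of its left subtree."""
        def rightmost(n):
            while isinstance(n, Int):
                n = n.right
            return n

        def ok(n) -> bool:
            if not isinstance(n, Int):
                return True
            return n.thread is rightmost(n.left) and ok(n.left) and ok(n.right)

        return ok(self.root)
```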

Step 3.2.5: Check the tree structure and rebalance it if necessary, following the same principle: every simple path from a node to its descendant leaves contains the same number of black nodes, and no two consecutive nodes may both be red. The method is: starting from the parent (an internal node) of the newly inserted version node (an external node), check the colors of that node and of its parent; if they are not both red, the operation ends. Otherwise, following Bayer's 1972 rebalancing algorithm for symmetric binary B-trees, perform a left or right rotation on the subtree containing the node and recolor nodes within the subtree so that it satisfies the principle and the tree stays balanced; then repeat the check with the root of the rotated subtree as the target, and continue in this way until no violation remains.

Step 3.2.6: Link the new version node into the weight linked list, as follows. The inodes of all existing versions of the file are linked into a weight linked list in order of weight; in this design, the weight is the node key. If the sibling described in Step 3.2.4 is the left sibling of the new version node, insert the new node into the weight list as the sibling's successor, setting the weight-list pointers of the sibling, the new version node, and the sibling's original successor so that the three are linked in that order. If the sibling is the right sibling of the new version node, insert the new node into the weight list as the sibling's predecessor, setting the weight-list pointers of the sibling's original predecessor, the new version node, and the sibling so that the three are linked in that order.
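The splice in Step 3.2.6 is a doubly linked list insertion next to the sibling found in Step 3.2.4. In the sketch below, `VersionInode` and `link_version` are hypothetical names, and `sibling_is_left` is assumed to be supplied by the tree insertion; under the larger-key-to-the-left ordering of Step 3.2.4, the sibling is on the left exactly when its epoch is larger than the new version's.

```python
from typing import List, Optional


class VersionInode:
    """Hypothetical file-version inode carrying only the weight-list fields."""

    def __init__(self, epoch: int) -> None:
        self.key = epoch                     # weight = node key (generation time)
        self.prev: Optional["VersionInode"] = None
        self.next: Optional["VersionInode"] = None


def link_version(new: VersionInode, sibling: VersionInode, sibling_is_left: bool) -> None:
    """Splice `new` next to its tree sibling in the weight list."""
    if sibling_is_left:
        # Resulting order: sibling, new, old successor of sibling.
        succ = sibling.next
        new.prev, new.next = sibling, succ
        sibling.next = new
        if succ is not None:
            succ.prev = new
    else:
        # Resulting order: old predecessor of sibling, new, sibling.
        pred = sibling.prev
        new.prev, new.next = pred, sibling
        sibling.prev = new
        if pred is not None:
            pred.next = new


def forward(head: VersionInode) -> List[int]:
    out: List[int] = []
    node: Optional[VersionInode] = head
    while node is not None:
        out.append(node.key)
        node = node.next
    return out
```

With the tree ordering above, walking `next` pointers visits versions from newest to oldest, so no tree traversal is needed to step to an adjacent version.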

The advantages of the present invention are as follows:

(1) The lightweight snapshot technique records only the snapshot time and spreads out and defers the copying of data and metadata, so snapshot operations execute quickly and have negligible impact on other system operations.

(2) The fine-grained snapshot technique overcomes the inability of existing multi-version file systems to retain versions for only some directories and files, letting users retain versions of selected directories and files and configure version generation policies flexibly.

(3) Separating the version organization and index structures of the name space and the version space makes full use of the tight coupling between versions of the same directory or file: those versions are organized together and stored in nearby physical locations, which makes them easy to manage as a whole and also improves retrieval speed.

(4) A hierarchy is established in the version space among parent directory versions, subdirectory versions, and file versions, so that subdirectory and file versions generated at close times are stored together. The closer in time two versions are, the more strongly they are correlated and the more likely they are to be stored under the same parent directory version; this property speeds up retrieval from the root to a specified directory or file version.

(5) The name space organizes the directory entries within a directory with a dynamic hash table, which retrieves entries quickly and in constant time while its space consumption adapts to directory size, giving good scalability and high space utilization.

(6) The version space uses tree structures to index the many versions of the same directory or file, overcoming the drawback of existing multi-version file systems whose linear indexes make retrieval time grow linearly with the number of versions. The index data structures are kept in the inodes of the corresponding versions and do not change the file system's on-disk layout, giving good compatibility. In addition, because directories and files have different access patterns, slightly different red-black tree structures are used to index them, balancing time and space consumption.

The present invention was tested at the Institute of High Performance Computing, Department of Computer Science, Tsinghua University. The results show that the snapshot-based fine-grained file and directory version management method supports flexible configuration of version generation policies and effectively improves the performance of accessing historical versions of directories and files, while the time and space overhead of maintaining versions is small.

The method was evaluated on three aspects: old-version attribute access time, average read/write response time, and system space usage. The test server was configured as follows: Intel Xeon 2 GHz processor; 512 MB of memory; Adaptec aic7902 Ultra320 SCSI adapter; Seagate ST336607LW hard disk with a capacity of 34 GB. The experiments ran on Linux with kernel version 2.4.22. We implemented a prototype system, thvfs, based on the snapshot-based fine-grained file and directory management method, and compared thvfs with the Linux file system ext3 and the multi-version file systems ext3cow (version 0.1.4) and wayback (version 1.0.1). A single partition of the test disk was used, and each file system was installed and tested on it in turn. The experiments used the file trace player developed by the Institute of High Performance Computing, Department of Computer Science, Tsinghua University, as the test tool, and the file trace "Research" collected in 1997 by Roselli et al. at the University of California, Berkeley, as the test data. The test results are shown in Fig. 5, Fig. 6, Fig. 7, and Fig. 8.

The test results show the following. As shown in Fig. 5, for old-version access, our prototype thvfs improves on the well-known multi-version file system ext3cow by 34.4%. During trace playback, as shown in Fig. 6, thvfs improves read performance by 12% relative to ext3; as shown in Fig. 7, compared with the multi-version file systems wayback and ext3cow, thvfs has the smallest additional cost in write performance. Finally, as shown in Fig. 8, at a high snapshot frequency of one snapshot every 72 minutes, thvfs needs only 70% extra space to maintain all old versions.

Description of drawings

Fig. 1. Hierarchical structure of the name space and the version space.

Fig. 2. Schematic diagram of the dynamic hash index.

Fig. 3. Inode-embedded red-black tree used to index directory versions (legend: black internal node; red internal node; dummy external node).

Fig. 4. Weighted threaded red-black tree used to index file versions (legend: linked-list pointer; thread; black internal node; red internal node; external node).

Fig. 5. Comparison of version access time between ext3cow and thvfs.

Fig. 6. Comparison of average read response time in the trace experiment.

Fig. 7. Comparison of average write response time in the trace experiment (ext3, ext3cow, wayback, thvfs).

Fig. 8. Comparison of space usage.

Fig. 9. Flow chart of the present invention.

Embodiment

The snapshot-based fine-grained file and directory version management method requires certain extensions to the key data structures of the file system, as follows:

Extension of the inode: the inode is the data structure that represents a file or directory in the file system. In a traditional file system, each directory or file corresponds to exactly one inode; in a multi-version file system, a directory or file corresponds to several inodes, because directories and files have multiple versions and each version has its own inode. The fields added to the inode data structure are: (1) epoch, the operating system time when the version was generated; different versions of a file or directory have different epochs, and the earlier a version was generated, the smaller its epoch; (2) snapepoch, which stores the time of the most recent snapshot operation on the directory or file and supports fine-grained snapshots; (3) share bitmap, which stores the data-sharing relationships among different versions of the same file or directory; (4) index structure, which stores the pointers and related state used to index the other versions. In addition, the inodes of directory versions change as follows: (1) the data-block index pointer i_block, which used to point to a linear directory table, now points to the dynamic hash index table; (2) a pointer hash to the hash function is added, indicating which member of the hash-function family the system currently uses.
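The interplay of epoch, the share bitmap, and copy-on-write can be sketched as follows. This is a hypothetical model: blocks live in an in-memory `BlockStore`, writes replace a whole block, only the current version's bitmap is tracked, and the function names are illustrative. Retaining an old version copies metadata only; the first write to a still-shared block then gives the current version a private block, leaving the old version's data intact.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VersionInode:
    """Hypothetical version inode with the added epoch / snapepoch / share-bitmap fields."""
    epoch: int
    snapepoch: int = 0
    blocks: List[int] = field(default_factory=list)         # physical block numbers
    share_bitmap: List[bool] = field(default_factory=list)  # True: still shared with the old version


class BlockStore:
    def __init__(self) -> None:
        self.data: Dict[int, bytes] = {}
        self.next_block = 0

    def alloc(self, payload: bytes) -> int:
        block = self.next_block
        self.next_block += 1
        self.data[block] = payload
        return block


def retain_old_version(cur: VersionInode, now: int) -> VersionInode:
    """Copy metadata only: both versions point at the same data blocks afterwards."""
    old = VersionInode(epoch=cur.epoch, snapepoch=cur.snapepoch, blocks=list(cur.blocks))
    cur.share_bitmap = [True] * len(cur.blocks)
    cur.epoch = now
    return old


def write_block(store: BlockStore, cur: VersionInode, i: int, payload: bytes) -> None:
    if cur.share_bitmap[i]:
        # Shared with the old version: give the current version its own block.
        cur.blocks[i] = store.alloc(payload)
        cur.share_bitmap[i] = False
    else:
        store.data[cur.blocks[i]] = payload  # private block: overwrite in place
```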

Extension of the dentry: a dentry represents a directory entry in the file system; if a directory is regarded as a table, each record in it is a directory entry. The field added to the dentry data structure is a life-cycle tuple of the form <death_epoch, birth_epoch>, where birth_epoch is the system time when the dentry was created (its birth time) and death_epoch is the system time when the dentry was deleted (its death time).
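The life-cycle tuple lets a directory listing be reconstructed for any past time. The sketch below assumes that an entry is visible from its birth_epoch up to, but not including, its death_epoch, with `None` meaning not yet deleted; the source does not state the exact boundary convention, and the `Dentry` and `list_at` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Dentry:
    name: str
    ino: int
    birth_epoch: int                    # system time when the entry was created
    death_epoch: Optional[int] = None   # system time when it was deleted; None = still live

    def alive_at(self, t: int) -> bool:
        # Assumed convention: live on [birth_epoch, death_epoch).
        return self.birth_epoch <= t and (self.death_epoch is None or t < self.death_epoch)


def list_at(entries: Iterable[Dentry], t: int) -> List[str]:
    """Names visible in a directory as of time t."""
    return sorted(d.name for d in entries if d.alive_at(t))
```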

The snapshot-based fine-grained file and directory version management method supports generating and managing versions of specified directories or files. Its basic idea consists of two main points:

First, when a snapshot is taken, its position and time are recorded in the corresponding system metadata structures. The snapshot time is recorded in the metadata of the current version of the directory or file on which the snapshot is performed, namely in the snapepoch field of that current version's inode. For example, the time of a global snapshot of the whole file system is recorded in the snapepoch of the inode of the root directory's current version.

Second, when the current version of a directory or file is modified, whether to generate a new version is decided from the directory's or file's position in the hierarchy and the time information of its current version; if a new version is needed, the corresponding metadata and data are copied. This is the difficult part of fine-grained version generation, because a directory or file is also a descendant of its ancestors (its parent directory, grandparent directory, and so on up to the root), and a snapshot taken on any ancestor affects the versions of that directory or file. Therefore, deciding whether a directory or file should generate a new version requires traversing the current versions of all its ancestors in order (root directory, ..., grandparent directory, parent directory) and adjusting and comparing their snapshot times. The basic rule of adjustment is that the snapshot time of the current version of a subdirectory or file must be later than or equal to that of its parent directory's current version. The basic rule of comparison is that if the generation time of the modified directory's or file's current version (recorded in the epoch field of its inode) is earlier than the snapshot time of that version, the version has not been modified since the last snapshot operation and a new version must be generated; otherwise no new version is generated.

An example of the fine-grained version generation algorithm follows. When a fine-grained snapshot is taken on a directory or file, the current time is first recorded as the snapshot time in the global variable superepoch and in the snapepoch of the inode of the directory's or file's current version. Later, when the directory or file needs to be modified, a top-down path lookup is performed in the file system hierarchy from the root directory's current version to the current version of that directory or file, adjusting snapshot times along the way (except for the root directory). Taking the current version B* of directory B as an example, let its snapshot time be snapepochB, and let the current version of its parent directory be A* with snapshot time snapepochA; snapepochB is then adjusted by snapepochB = MAX(snapepochA, snapepochB). Finally, once the snapshot time of the current version of the directory or file has been adjusted, the epoch and snapepoch values in its inode are compared. If snapepoch > epoch, the version has not been modified since the last snapshot operation, so the old data must be preserved: a new version is generated by copying the metadata of the current version of the directory or file and the data being modified.

During this traversal, besides adjusting snapshot times, the current system superepoch is also cached in the inode of the current version of each directory and file along the path. This value is kept only in memory and is used, before the next traversal, to decide whether the current-version inode of the target directory or file is stale. If it is not stale, that is, the cached superepoch equals the system's current superepoch, no new snapshot operation has been triggered in the system since the last traversal, so the traversal can be skipped and the comparison and modification performed directly.
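The superepoch cache described above can be sketched as follows. `CachedInode` and `PathWalker` are hypothetical names, and the cached value is only an in-memory attribute. A walk from the root is performed only when the target's cached superepoch differs from the system's current one; otherwise the previously adjusted snapepoch is reused.

```python
from typing import List, Optional


class CachedInode:
    """Hypothetical current-version inode with an in-memory superepoch cache."""

    def __init__(self, name: str, parent: Optional["CachedInode"] = None) -> None:
        self.name = name
        self.parent = parent
        self.snapepoch = 0
        self.cached_superepoch: Optional[int] = None   # memory only, never persisted


class PathWalker:
    def __init__(self) -> None:
        self.superepoch = 0   # system-wide snapshot timestamp
        self.walks = 0        # number of full root-to-target traversals performed

    def snapshot(self, node: CachedInode, now: int) -> None:
        self.superepoch = now
        node.snapepoch = now

    def effective_snapepoch(self, target: CachedInode) -> int:
        if target.cached_superepoch == self.superepoch:
            return target.snapepoch   # no snapshot since the last walk: skip the traversal
        self.walks += 1
        path: List[CachedInode] = []
        node: Optional[CachedInode] = target
        while node is not None:
            path.append(node)
            node = node.parent
        path.reverse()                # root .. target
        for parent, child in zip(path, path[1:]):
            child.snapepoch = max(parent.snapepoch, child.snapepoch)
        for n in path:
            n.cached_superepoch = self.superepoch
        return target.snapepoch
```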

To exploit the tight coupling between different versions of the same file or directory, the versions are organized together in chronological order. They use an index structure that is independent of the name-lookup index. This separates the name space, formed by the file and directory names of the whole file system, from the version space, which represents the generation times of the different versions, and the two are managed with relatively independent strategies. The name space works as in a traditional file system: related files and subdirectories are kept under the same parent directory, forming a hierarchy from the root directory to the files. We call this the one-dimensional structure, as shown in Fig. 1a. The version space has its own hierarchy: file and subdirectory versions with similar generation times are kept under the same version of their parent directory. Combined with the hierarchy of the name space, this gives what we call the two-dimensional structure, shown in Fig. 1b. For the name space we designed a dynamic hashing lookup strategy. For the version space we designed an index strategy based on red-black trees, in which directory versions and file versions each use a red-black tree variant suited to their own characteristics. Directories use an inode-embedded red-black tree, and files use a weighted threaded red-black tree.

In the multi-version file system, the dynamic hashing strategy for the name space speeds up lookups. It also reduces the extra management burden that retaining historical versions of directories and files places on the system. A lookup through the hash index works in two steps. First, a dynamic hash index table is built for the directory, with each table entry representing a subdirectory or file in that directory. Then the subdirectory name or file name is fed to the hash function, and the result is the address, in the dynamic hash index table, of the entry for that subdirectory or file.

The core of the dynamic hash index strategy is a group of hash functions $h_0, h_1, \ldots, h_k$. Let n be the maximum number of directory entries that a directory in the system can hold at initialization. The group of functions then satisfies $h_i = h \bmod 2^i$, $i \in \{0, 1, 2, \ldots, k\}$, where i is called the level of the hash function. Here h is a conventional hash function that should map file names uniformly onto the address space. The user can specify a function such as SHA or MD5. Fig. 2 is a schematic diagram of the dynamic hash index.
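The level-i functions are cheap to derive from a single base hash, since $h \bmod 2^i$ is just the low i bits of h. The sketch below uses 32-bit FNV-1a as a stand-in for h; the text leaves h to the user and suggests SHA or MD5. The names base_hash and hash_level are hypothetical. The sketch also shows why raising the level works. Since $h_{i+1}(x) \bmod 2^i = h_i(x)$, each address at level i splits into exactly two addresses at level i+1.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the base hash h: 32-bit FNV-1a (an assumption; the text
   lets the user choose h, e.g. SHA or MD5). */
uint32_t base_hash(const char *name)
{
    uint32_t h = 2166136261u;
    for (const unsigned char *p = (const unsigned char *)name; *p; p++) {
        h ^= *p;
        h *= 16777619u;
    }
    return h;
}

/* Level-i hash: h_i = h mod 2^i, i.e. the low i bits of h. */
uint32_t hash_level(const char *name, unsigned i)
{
    assert(i < 32);  /* shifting a 32-bit value by 32 is undefined */
    return base_hash(name) & ((1u << i) - 1u);
}
```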

The middle of Fig. 2 shows the dynamic hash index table. Its address is stored in the i_block field of the directory version's inode, and a pointer to the hash function the directory currently uses is stored in the hash field of the inode. Each entry of the dynamic hash index table has two parts:

(1) a pointer to a basic storage unit called a bucket, in which the directory entries of subdirectories or files are stored. A directory entry generally contains information such as the name of the subdirectory or file and the physical address of its data;

(2) the level of the entry.

In the figure, the number at the left end of each entry is the entry's storage address. A subdirectory or file whose name hashes to that number is mapped to that entry. For example, hashing the name of file foobar with $h_3$ gives 7, so the entry address for foobar is 7. Following the pointer field of that entry then leads to the file's directory entry and, from there, its data.

The pointer in a table entry points to a bucket. A bucket is the basic storage unit and is set to the size of one physical block, for two reasons. First, external storage is read in units of blocks, so reading a bucket smaller than a physical block still costs one disk I/O operation. Second, in the ext2 file system, for example, one physical block can hold at least 3 directory entries. Directory entries with the same hash address can be stored in the same bucket, so making a bucket one physical block in size also reduces the probability of address conflicts. The right part of Fig. 2 shows example buckets. The shaded variable-length data blocks represent directory entries stored in the bucket. The number in each directory entry is the storage address of the hash index table entry corresponding to that directory entry. For example, the first bucket contains three directory entries: two correspond to the entry with storage address 0, and the third to the entry with storage address 4. The blank parts of the buckets represent free space.

The level in a table entry records how many times the bucket the entry points to has been split. A bucket must be split when a newly inserted directory entry is mapped to a table entry whose bucket does not have enough space to hold it. Splitting a bucket requires adjusting the mapping between the affected index table entries and the buckets.

The main algorithms of the dynamic hash index strategy are lookup, insertion, deletion and entry reclamation. In the following discussion, let the hash function currently in use be $h_k$ and the dynamic hash index table be IdxTbl[].

Lookup:

A lookup with the dynamic hash index strategy needs to read only 2 physical blocks. Take the directory entry foobar as an example. First, compute $h_k$(foobar) from the entry name and read the corresponding dynamic hash index table entry IdxTbl[$h_k$(foobar)]. Second, follow the pointer in that entry, IdxTbl[$h_k$(foobar)]→pointer, to read the corresponding directory entry and from there find the file data.
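The following C sketch shows the two-step lookup. It assumes an in-memory table and fixed-size buckets of BUCKET_SLOTS entries, whereas a real bucket holds variable-length entries in one physical block. The structures, base_hash (FNV-1a as a stand-in for h) and idx_lookup are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUCKET_SLOTS 4  /* hypothetical: entries that fit in one block-sized bucket */

struct dentry { char name[32]; unsigned long ino; };
struct bucket { int count; struct dentry slot[BUCKET_SLOTS]; };
struct idx_entry { struct bucket *pointer; int level; };

/* Stand-in base hash (32-bit FNV-1a); the text leaves the choice of h to the user. */
static uint32_t base_hash(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) { h ^= (unsigned char)*s++; h *= 16777619u; }
    return h;
}

/* Step 1: read IdxTbl[h_k(name)]. Step 2: follow its pointer to the bucket
   and scan it for the name. On disk each step is one block read. */
const struct dentry *idx_lookup(const struct idx_entry *tbl, unsigned k, const char *name)
{
    assert(k < 32);
    const struct bucket *b = tbl[base_hash(name) & ((1u << k) - 1u)].pointer;
    for (int j = 0; j < b->count; j++)
        if (strcmp(b->slot[j].name, name) == 0)
            return &b->slot[j];
    return NULL;
}
```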

Insertion and deletion:

In the insertion algorithm, if the bucket has enough space the entry is inserted directly; otherwise a new bucket must be created. If the current hash index table has enough entries, the mapping between an entry and the newly created bucket is set up directly. Otherwise the hash function must be upgraded to widen the address range and increase the number of entries in the index table.
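Insertion as described closely resembles classic extendible hashing, with the entry level playing the role of a local depth. The sketch below is a minimal in-memory version under these assumptions: buckets hold at most SLOTS names, FNV-1a stands in for h, and deletion and merging are omitted. All names (dir_index, upgrade_hash, split_bucket, idx_insert, idx_contains) are hypothetical. If the full bucket's entry already has level k, no spare entry is available to point at a new bucket, so the sketch doubles the table first.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SLOTS 2            /* tiny hypothetical bucket so that splits happen quickly */
#define ENTRY_NAME_LEN 16

struct bucket { int count; char name[SLOTS][ENTRY_NAME_LEN]; };
struct idx_entry { struct bucket *pointer; int level; };
struct dir_index { unsigned k; struct idx_entry *tbl; };  /* 2^k entries, hash h_k */

static uint32_t base_hash(const char *s)  /* stand-in h: 32-bit FNV-1a */
{
    uint32_t h = 2166136261u;
    while (*s) { h ^= (unsigned char)*s++; h *= 16777619u; }
    return h;
}

static uint32_t slot_of(const struct dir_index *d, const char *name)
{
    return base_hash(name) & ((1u << d->k) - 1u);  /* h_k(name) */
}

/* Upgrade h_k to h_{k+1} by doubling the table. Entry j + 2^k starts as a
   copy of entry j, because h_{k+1}(x) mod 2^k == h_k(x). */
static void upgrade_hash(struct dir_index *d)
{
    size_t n = (size_t)1 << d->k;
    assert(d->k < 31);
    struct idx_entry *t = realloc(d->tbl, 2 * n * sizeof *t);
    assert(t != NULL);
    memcpy(t + n, t, n * sizeof *t);
    d->tbl = t;
    d->k++;
}

/* Split the bucket behind entry idx (its level must be < k) on one more hash bit. */
static void split_bucket(struct dir_index *d, uint32_t idx)
{
    struct bucket *old = d->tbl[idx].pointer;
    struct bucket *fresh = calloc(1, sizeof *fresh);
    assert(fresh != NULL);
    int lvl = d->tbl[idx].level;
    uint32_t bit = 1u << lvl;
    for (size_t j = 0; j < ((size_t)1 << d->k); j++) {
        if (d->tbl[j].pointer != old) continue;
        d->tbl[j].level = lvl + 1;               /* the bucket has been split once more */
        if (j & bit) d->tbl[j].pointer = fresh;
    }
    struct bucket tmp = *old;                    /* redistribute the stored names */
    old->count = 0;
    for (int i = 0; i < tmp.count; i++) {
        struct bucket *dst = (base_hash(tmp.name[i]) & bit) ? fresh : old;
        strcpy(dst->name[dst->count++], tmp.name[i]);
    }
}

void idx_insert(struct dir_index *d, const char *name)
{
    assert(strlen(name) < ENTRY_NAME_LEN);
    for (;;) {
        uint32_t idx = slot_of(d, name);
        struct bucket *b = d->tbl[idx].pointer;
        if (b->count < SLOTS) { strcpy(b->name[b->count++], name); return; }
        if ((unsigned)d->tbl[idx].level == d->k)
            upgrade_hash(d);                     /* no spare entries: widen the address range */
        split_bucket(d, idx);                    /* then retry the insertion */
    }
}

int idx_contains(const struct dir_index *d, const char *name)
{
    const struct bucket *b = d->tbl[slot_of(d, name)].pointer;
    for (int i = 0; i < b->count; i++)
        if (strcmp(b->name[i], name) == 0) return 1;
    return 0;
}
```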

Besides performing the necessary directory entry deletion, the deletion algorithm also merges buckets when necessary to release redundant space.

Entry reclamation:

Frequent insertions and deletions can leave the dynamic hash index table with unnecessary entries. The entry reclamation algorithm compresses the space the table occupies. It first traverses all entries of the index table. If the level of every entry is less than the level k of the current hash function, half of the entries are reclaimed and the hash function is downgraded. Because entry reclamation is time-consuming, it is triggered periodically and runs in the background.
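The following sketches one reclamation pass under a hypothetical table layout, with bucket pointers kept opaque. The check relies on the fact that an entry of level l < k is selected by the low l bits only. So when every level is below k, entries j and j + 2^(k-1) are identical and the upper half of the table is redundant. The name idx_reclaim is illustrative.

```c
#include <assert.h>
#include <stdlib.h>

struct idx_entry { void *pointer; int level; };             /* bucket pointer kept opaque */
struct dir_index { unsigned k; struct idx_entry *tbl; };  /* 2^k entries */

/* One reclamation pass: if every entry's level is below k, drop the upper
   half of the table and downgrade h_k to h_{k-1}. Returns 1 if it shrank. */
int idx_reclaim(struct dir_index *d)
{
    if (d->k == 0) return 0;
    size_t n = (size_t)1 << d->k;
    for (size_t j = 0; j < n; j++)
        if ((unsigned)d->tbl[j].level >= d->k) return 0;
    struct idx_entry *t = realloc(d->tbl, (n / 2) * sizeof *t);
    if (t != NULL) d->tbl = t;  /* on failure the larger block is simply kept */
    d->k--;
    return 1;
}
```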

In the multi-version file system, the version space uses two red-black-tree-based index structures: the inode-embedded red-black tree and the weighted threaded red-black tree. The former stores all of its data in the on-disk inode structures and occupies no memory. Its drawback is that lookups require more accesses to external storage, so it is relatively slow. The latter reduces accesses to external storage and is therefore fast, but it occupies extra memory. Directories and files in a multi-version file system are accessed differently. Directories are mostly read and rarely modified. Files, by contrast, are modified frequently, their versions are updated quickly, and lookup performance matters. Weighing these time and space requirements, we use the inode-embedded red-black tree to index directory version metadata and the weighted threaded red-black tree to index file version metadata.

The red-black-tree-based index structures provide three operations: lookup, insertion and deletion. Here, leaf nodes of the tree are called external nodes and non-leaf nodes are called internal nodes.

The inode-embedded red-black tree is the index structure used to retrieve directory version metadata. It is a typical application of the ordinary red-black tree in the multi-version file system. Each internal node corresponds one-to-one to a specific version of the directory. Its data structure is contained in the inode representing that directory version and is stored in external storage. It mainly stores the node's key value, a pointer to the parent node, pointers to the left and right subtrees, and the node's state (color and similar attributes). Lookups are based on the key value, which is set to the generation time (epoch) of the directory version the node represents. Its structure is as follows:

typedef struct embeddedininode_rbt_node {
    int key;                                  // key value, set to the version generation time (epoch)
    int color;                                // color of the node
    struct embeddedininode_rbt_node *parent;  // points to the node's parent
    struct embeddedininode_rbt_node *left;    // points to the left subtree
    struct embeddedininode_rbt_node *right;   // points to the right subtree
} *pei_rbt_node;

External nodes are dummy nodes that exist only to maintain the red-black tree properties and have no corresponding entity.

Fig. 3 shows an example of an inode-embedded red-black tree. The circular internal nodes represent directory versions, and the number in each node is its key value. Internal nodes are ordered by key value. Taking the root node as an example, internal nodes whose key values are larger than the root's represent directory versions created later and lie in the root's left subtree. Internal nodes whose key values are smaller than the root's represent directory versions created earlier and lie in the root's right subtree. The squares in the figure are external nodes, which have no corresponding directory version.
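Lookup in the inode-embedded tree is an ordinary binary-search-tree descent with the comparison reversed, since later versions sit on the left. The following is a minimal sketch. It assumes that NULL children stand for the dummy external nodes and that ei_node mirrors embeddedininode_rbt_node; the name ei_search is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Same fields as embeddedininode_rbt_node; NULL stands for a dummy external node. */
struct ei_node {
    int key;      /* directory version generation time (epoch) */
    int color;
    struct ei_node *parent, *left, *right;
};

/* Later versions (larger epochs) are in the left subtree, earlier ones on the right. */
struct ei_node *ei_search(struct ei_node *n, int epoch)
{
    while (n != NULL && n->key != epoch)
        n = (epoch > n->key) ? n->left : n->right;
    return n;
}
```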

The weighted threaded red-black tree is the index structure used to retrieve file version metadata. Each internal node is an index node built in memory. Its data structure contains the threads, a pointer to the parent node, pointers to the left and right subtrees, and the node's state (color and similar attributes), but no key value. Each external node corresponds one-to-one to a specific version of the file. Its data structure is contained in the inode representing that file version and is stored in external storage. It mainly stores the node key value, a pointer to the parent node, and the weight. The node key value is set to the epoch at which the file version represented by the node was generated. The structures of internal and external nodes are as follows:

typedef struct weightlink_rbt_node {
    union {
        int key;                                     // external node: key value (and weight), basis for ordering external nodes
        struct {
            struct weightlink_rbt_node *ll;          // internal node: left thread
            struct weightlink_rbt_node *rl;          // internal node: right thread
        } link;                                      // internal node: the node's threads
    } kl;
    union {
        int weight;                                  // external node: node weight
        int color;                                   // internal node: node color; external nodes default to black
    } wc;
    union {
        struct weightlink_rbt_node *root;            // external node: points to the root of the red-black tree
        struct weightlink_rbt_node *parent;          // internal node: points to this node's parent
    } rp;
    union {
        struct {
            struct weightlink_rbt_node *forerunner;  // external node: predecessor in the external-node list
            struct weightlink_rbt_node *successor;   // external node: successor in the external-node list
        } chain;                                     // external node: links of the list ordered by weight
        struct {
            struct weightlink_rbt_node *left;        // internal node: points to the left subtree
            struct weightlink_rbt_node *right;       // internal node: points to the right subtree
        } child;                                     // internal node: left and right subtrees
    } cc;
    int flag;                                        // flag bits storing node attributes, e.g. the external/internal marker
} *pwl_rbt_node;

The weight indicates the importance of an external node. It is used to build the external-node weight list of the red-black tree, with the largest-weight node at the head of the list and the smallest-weight node at the tail. To increase the reliability and flexibility of the system, every creation or deletion of a file version updates both structures: the version is inserted into or removed from the red-black tree index, and the corresponding operation is performed on the external-node weight list. The weight list thus complements the red-black tree index. If the red-black tree index fails unexpectedly, users can still reach the corresponding file versions through the weight list. Users can define the exact meaning of the weight themselves; for example, the number of times an external node has been accessed can serve as its weight.
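Keeping the list ordered is a sorted insertion into a doubly linked list. The sketch below assumes a hypothetical struct wnode holding only the weight and the two list links. The names wlist_insert and wlist_remove are illustrative, and a new node is placed after any existing nodes of equal weight.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the weight-list fields of an external node. */
struct wnode {
    int weight;
    struct wnode *forerunner, *successor;
};

/* Insert n so that weights are non-increasing from head to tail. */
void wlist_insert(struct wnode **head, struct wnode *n)
{
    struct wnode *prev = NULL, *cur = *head;
    while (cur != NULL && cur->weight >= n->weight) {
        prev = cur;
        cur = cur->successor;
    }
    n->forerunner = prev;
    n->successor = cur;
    if (prev != NULL) prev->successor = n; else *head = n;
    if (cur != NULL) cur->forerunner = n;
}

/* Unlink n, e.g. when its file version is deleted. */
void wlist_remove(struct wnode **head, struct wnode *n)
{
    if (n->forerunner != NULL) n->forerunner->successor = n->successor;
    else *head = n->successor;
    if (n->successor != NULL) n->successor->forerunner = n->forerunner;
    n->forerunner = n->successor = NULL;
}
```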

A thread is a special pointer in an internal node of the red-black tree. The threads of an internal node point to its predecessor and successor in an in-order traversal of the tree, and both of these are always external nodes. Threads exist to give internal nodes access to a key value. Only external nodes correspond to file versions, and the system sets the key value to the epoch at which a file version was generated. Hence only external nodes carry key values, and internal nodes have none. A red-black tree lookup, however, must read the key value of each internal node it visits and compare it against the key being searched for. Threads solve this problem by giving access to the key value of the internal node's predecessor or successor.

Fig. 4 shows an example of a weighted threaded red-black tree, with circles for internal nodes and squares for external nodes. The doubly linked list starting at head is the external-node weight list, sorted by weight, with the largest-weight external node at its head. The number in each external node is its key value, i.e. the epoch at which the file version it represents was generated. In this example the weight equals the key value. The dashed arrows drawn from the root internal node are the root's threads. The one labeled "ll" is the left thread and points to the root's predecessor in in-order traversal. The one labeled "rl" is the right thread and points to the root's successor. Consider a lookup for key value 2. The root has no key value of its own to compare with, so the lookup reads in and compares instead the key value 4 (or 3) of the external node that the root's left (or right) thread points to. If the key being searched for equals the key value read in, the external node can be read in directly. Otherwise the lookup moves into the appropriate subtree and continues.
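The lookup rule can be sketched in C as follows. For simplicity, the unions of weightlink_rbt_node are flattened into plain fields of a hypothetical struct wl_node. Following the claims, each internal node borrows the key of its in-order predecessor through the left thread. Larger epochs lie to the left, as in the embedded tree. The name wl_search is illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical layout: the unions are flattened into plain fields. */
struct wl_node {
    int is_external;
    int key;                       /* external nodes only: version epoch */
    struct wl_node *ll, *rl;       /* internal nodes: threads to in-order predecessor/successor */
    struct wl_node *left, *right;  /* internal nodes: children */
};

/* Internal nodes compare against the key borrowed through the left thread. */
struct wl_node *wl_search(struct wl_node *n, int key)
{
    while (n != NULL && !n->is_external) {
        int borrowed = n->ll->key;
        if (key == borrowed) return n->ll;   /* read the external node directly */
        n = (key > borrowed) ? n->left : n->right;
    }
    return (n != NULL && n->key == key) ? n : NULL;
}
```

In the test tree below, the in-order sequence of external keys is 5, 4, 3, 2. A lookup for 2 therefore compares against 4 at the root and 3 at the next internal node before it reaches the leaf.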

Claims

1. A snapshot-based fine-grained file and directory version management method, characterized in that the method is carried out on a file and directory version management server through the following steps in sequence:

Step (1). Initialization: extend the data structures of the file and directory system as follows:

Add the following contents to the system data structure of a file or directory, i.e. to the file system index node (inode), so as to support the mechanism for generating file and directory versions:

epoch, the operating system time at which the version is generated. Different versions of a file or directory correspond to different epoch values: the earlier a version was generated, the smaller its epoch value, and vice versa;

snapepoch, the time at which a snapshot operation was last performed on the file or directory;

a sharing bitmap, which records the data-sharing relationships between different versions of the same file or directory. The bitmap resides in the root index structure and the indirect index structures of the inode and takes the form of a set of bits, which correspond one-to-one, in order, to the pointers in the index structure. A bit of '1' means the data block pointed to by the corresponding pointer is managed by this inode: this inode owns the data block and may share it with earlier versions. A bit of '0' means the data block pointed to by the corresponding pointer does not belong to this inode but is managed by the inode of some later version; this inode only has the right to use the data block;
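The bit semantics can be sketched as follows for a hypothetical index block of 64 pointers. The text defines what the bits mean but not how they change. The transitions in preserve_old_version and on_block_rewritten are therefore one consistent reading of those rules, an assumption rather than the method as specified. All three function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: one index block with 64 block pointers, one bit per pointer. */
struct share_bitmap { uint64_t bits; };

/* Bit 1: this inode owns the block (and may share it with earlier versions).
   Bit 0: the inode of some later version owns it; this inode may only use it. */
bool owns_block(const struct share_bitmap *m, unsigned i)
{
    assert(i < 64);
    return (m->bits >> i) & 1u;
}

/* Assumption: a freshly preserved old version owns nothing yet, because
   every block it points to is still owned by the current (later) version. */
void preserve_old_version(struct share_bitmap *old)
{
    old->bits = 0;
}

/* Assumption: when the current version redirects pointer i to a newly written
   block, only older versions still reference the original block, so the
   preserved old version takes ownership of it. */
void on_block_rewritten(struct share_bitmap *old, unsigned i)
{
    assert(i < 64);
    old->bits |= (uint64_t)1 << i;
}
```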

a version index structure, which stores the pointers to other versions and their associated state, used to index those versions;

For a directory version, its inode additionally contains:

i_block, the pointer to the index data block, which points to the dynamic hash index table;

hash, a pointer to a hash function, indicating the hash function currently in use;

each directory entry (dentry) in the directory table of the file system holds a life-cycle 2-tuple consisting of a birth time and a death time. The birth time, birth_epoch, is the system time at which the directory version represented by the directory entry was created. The death time, death_epoch, is the system time at which the directory version represented by the directory entry was deleted;
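The life-cycle tuple lends itself to a simple visibility test. Two conventions below are assumptions made for illustration. First, an entry counts as visible at time t if birth_epoch <= t < death_epoch. Second, LONG_MAX marks an entry that has not yet been deleted. The names dentry_life, EPOCH_ALIVE and dentry_visible_at are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Life-cycle 2-tuple of a directory entry. */
struct dentry_life {
    long birth_epoch;  /* system time when the entry's version was created */
    long death_epoch;  /* system time when it was deleted */
};

#define EPOCH_ALIVE LONG_MAX  /* hypothetical marker for "not deleted yet" */

/* Assumed rule: the entry belongs to the directory as seen at time t
   iff birth_epoch <= t < death_epoch. */
bool dentry_visible_at(const struct dentry_life *d, long t)
{
    return d->birth_epoch <= t && t < d->death_epoch;
}
```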

Meanwhile, the name space is kept independent of the version space. The name space is formed by the distinct names of files and directories in the whole file system. The version space is formed by versions that share the same name but were generated at different times. In the name space, related files and subdirectories are all kept under the same parent directory. Through this logical containment relation, files and directories with different names form a hierarchy rooted at the root directory of the name space. Each named file and directory corresponds to a series of versions, which form a version space. Within it, the versions of files and directories are organized by an index structure according to their generation times. File and subdirectory versions with similar generation times are thus kept, in order of generation time, under the same version of their parent directory, forming a hierarchy within the version space;

The name space uses an index strategy based on dynamic hashing, and the version space uses an index strategy based on red-black trees. Directory versions and file versions each use a red-black tree variant suited to their own characteristics, and the corresponding metadata is stored in the inode structures of the directory versions and file versions;

Step (2). Generate versions as follows:

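Step (2.1). Perform the snapshot operation as follows: record the current time, as the snapshot time, in the global variable superepoch and in the snapepoch field of the inode of the current version of the directory or file being snapshotted;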

Step (2.2). Modify a file or directory as follows:

In the file system hierarchy tree, traverse the lookup path top-down from the current version of the root directory to the current version of the directory or file to be modified. For the current version of each file or directory passed along the path, other than the root directory, find its parent directory in the name space and adjust the version's own snapepoch as follows: snapepoch = MAX(snapepoch of the parent directory's current version, snapepoch);

Step (2.3). Decide whether a new version should be generated for the file or directory reached by the path walk of step (2.2), once its snapshot time has been adjusted. To do so, compare the epoch and snapepoch values in the inode of the current version of the directory or file being modified in step (2.2).

If snapepoch is greater than epoch, the version has not been modified since the last snapshot operation, so the current version is stale. The old data must then be preserved: the inode metadata of the current version of the directory or file is copied, kept as the old version, and added to the index structure. At the same time, the data-sharing information between the current version and the old version is recorded in bitmap form.

Otherwise the current version is not stale, and the old data and metadata need not be preserved. The relevant data of the current version is modified directly, together with the current version's data-sharing bitmap;

Step (3). Build a fast index in the name space as follows:

Step (3.1). Build a dynamic hash table in each directory of the name space, in which each entry represents a subdirectory or file in the directory. The table's address is stored in the i_block field (the record mapping table) of the directory version's inode. The hash function currently used by the directory is stored in the hash field of the inode. Each entry of the dynamic hash table has two parts:

(a) a pointer to a basic storage unit, the bucket. A bucket is one physical block in size and holds the directory entries of subdirectories or files. The variable-length data blocks in a bucket are the directory entries stored in it. Each directory entry contains the name of the subdirectory or file and the inode number of the subdirectory or file it represents;

(b) the level of the entry, i.e. the number of times the bucket the entry points to has been split. A bucket must be split when a newly inserted directory entry is mapped to an entry whose bucket does not have enough space to hold it.

Each entry of the dynamic hash table has a distinct storage address, and a subdirectory or file whose name hashes to that address is mapped to that entry. The dynamic hash table uses a group of hash functions $h_0, h_1, \ldots, h_k$, defined as follows:

- n is the maximum number of directory entries a directory in the system can hold;
- $i \in \{0, 1, 2, \ldots, k\}$ is the level of the hash function, and $h_i = h \bmod 2^i$;
- h is a conventional hash function with a uniform distribution;

Step (3.2). With the subdirectory name or file name as input, the result of the hash function is the address, in the dynamic hash index table, of the entry for that subdirectory or file;

Step (4). Build a fast index in the version space:

Step (4.1). For the different versions of the same directory in the file system, build a fast index based on the inode-embedded red-black tree to retrieve directory version metadata, as follows:

Step (4.1.1). The leaf nodes of this red-black tree are external nodes. They have no corresponding entity and exist only to maintain the red-black tree. The non-leaf nodes are internal nodes, each corresponding one-to-one to a specific version of the directory;

Step (4.1.2). The data of an internal node is stored entirely in the inode representing that directory version, in external storage. It includes the node key value (the generation time epoch of the version), a pointer to the node's parent, pointers to the left and right subtrees, and the node's color. External nodes have no corresponding entity and exist only in memory. They store a pointer to the node's parent and the node's color, which defaults to black. Internal nodes are ordered by key value. Relative to the root node, internal nodes with larger key values represent directory versions created later and lie in the root's left subtree; the others lie in the right subtree. As nodes are inserted and deleted, the red-black tree rebalancing algorithm adjusts the positions of the nodes accordingly;

Step (4.1.3). After a new directory version is generated, initialize the inode representing that directory version as follows. Set the node key value to the version's generation time, set all pointers to null, and set the color to red;

Step (4.1.4). Search the red-black tree index by version key value for the position at which the directory version should be inserted, as follows:

- Compare the key value of the node being inserted with that of the root node. If the former is greater, continue the search in the root's left subtree; otherwise continue in its right subtree. Repeat until a leaf node is reached, caching the leaf node's parent.
- Replace the leaf node with the new node. Set the corresponding subtree pointer in the parent to point to the new node, and set the new node's parent pointer to the parent.
- Automatically generate two external child nodes, left and right, for the new node as the new leaves, and color them black.

Because version generation times are unique, if the search finds an existing node whose key value equals that of the new node, an error is reported and the operation exits;

Step (4.1.5). Check the tree structure and rebalance it if it is unbalanced. The principle is that every simple path from a node to its descendant leaves contains the same number of black nodes, and a red node may only be adjacent to black nodes. The procedure is as follows. Check the color of the new version node and of its parent. If they are not both red, the operation ends. Otherwise, apply the tree adjustment algorithm of the symmetric binary B-tree proposed by Bayer in 1972. Perform a left or right rotation on the subtree containing the node and reset the node colors within the subtree, so that it satisfies the principle above and the tree remains balanced. Then take the root of the rotated subtree as the target of the next round of checks, and repeat;
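Steps (4.1.4) and (4.1.5) together amount to a binary-search-tree insertion with the comparison reversed, followed by the standard red-black insertion fix-up (recoloring plus at most two rotations). The following is a self-contained C sketch. It assumes that NULL children stand for the black dummy external nodes. The names ei_insert and ei_black_height (a checker used only for testing) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { RED, BLACK };

/* Same fields as embeddedininode_rbt_node; NULL children are the black dummy leaves. */
struct ei_node {
    int key;    /* version generation time (epoch) */
    int color;
    struct ei_node *parent, *left, *right;
};

static int is_red(const struct ei_node *n) { return n != NULL && n->color == RED; }

static void replace_child(struct ei_node **root, struct ei_node *old, struct ei_node *neu)
{
    if (old->parent == NULL) *root = neu;
    else if (old->parent->left == old) old->parent->left = neu;
    else old->parent->right = neu;
    neu->parent = old->parent;
}

static void rotate_left(struct ei_node **root, struct ei_node *x)
{
    struct ei_node *y = x->right;
    x->right = y->left;
    if (y->left != NULL) y->left->parent = x;
    replace_child(root, x, y);
    y->left = x;
    x->parent = y;
}

static void rotate_right(struct ei_node **root, struct ei_node *x)
{
    struct ei_node *y = x->left;
    x->left = y->right;
    if (y->right != NULL) y->right->parent = x;
    replace_child(root, x, y);
    y->right = x;
    x->parent = y;
}

/* Returns 0 on success, -1 if a version with the same epoch already exists. */
int ei_insert(struct ei_node **root, struct ei_node *z)
{
    /* Step (4.1.4): descend from the root; larger (later) epochs go left. */
    struct ei_node *parent = NULL, **link = root;
    while (*link != NULL) {
        parent = *link;
        if (z->key == parent->key)
            return -1;                                  /* generation times are unique */
        link = (z->key > parent->key) ? &parent->left : &parent->right;
    }
    z->parent = parent;
    z->left = z->right = NULL;                          /* two black dummy leaves */
    z->color = RED;
    *link = z;

    /* Step (4.1.5): restore the red-black properties bottom-up. */
    while (is_red(z->parent)) {
        struct ei_node *par = z->parent, *g = par->parent;  /* g exists: the root is black */
        struct ei_node *uncle = (par == g->left) ? g->right : g->left;
        if (is_red(uncle)) {                            /* recolor and continue from g */
            par->color = uncle->color = BLACK;
            g->color = RED;
            z = g;
        } else if (par == g->left) {
            if (z == par->right) { z = par; rotate_left(root, z); par = z->parent; }
            par->color = BLACK;
            g->color = RED;
            rotate_right(root, g);
        } else {
            if (z == par->left) { z = par; rotate_right(root, z); par = z->parent; }
            par->color = BLACK;
            g->color = RED;
            rotate_left(root, g);
        }
    }
    (*root)->color = BLACK;
    return 0;
}

/* Test helper: black height of the subtree, or -1 if a red-black rule is broken. */
int ei_black_height(const struct ei_node *n)
{
    if (n == NULL) return 1;
    if (is_red(n) && (is_red(n->left) || is_red(n->right))) return -1;
    int hl = ei_black_height(n->left), hr = ei_black_height(n->right);
    if (hl < 0 || hr < 0 || hl != hr) return -1;
    return hl + (n->color == BLACK);
}
```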

Step (4.2). For the different versions of the same file in the file system, build a fast index based on the weighted threaded red-black tree to retrieve file version metadata, as follows:

Step (4.2.1). The non-leaf nodes of this red-black tree are internal nodes, used only as index structures and without corresponding entities. The leaf nodes are external nodes, each corresponding one-to-one to a specific version of the file;

Step (4.2.2). The data of an external node is stored entirely in the inode representing that file version, in external storage. It includes the node key value (the generation time epoch of the version), a pointer to the node's parent, the node weight, pointers to the node's predecessor and successor in the weight list, and the node's color. Internal nodes exist only in memory and store the following:

- the thread, i.e. a pointer to the internal node's predecessor in an in-order traversal of the red-black tree. Since this predecessor is always an external node, the thread is used to fetch the predecessor's key value;
- a pointer to the node's parent;
- pointers to the left and right subtrees;
- the node's color.

Internal nodes are ordered by the key value of the predecessor their thread points to, in the same way as in the embedded red-black tree. The only difference is that internal nodes have no key value of their own, so the key value of the external node their thread points to is used instead;

Step (4.2.3). After a new file version is generated, initialize the inode representing that file version as follows. Set the key value of the corresponding external node to the version's generation time, set its parent pointer and weight-list pointers to null, and set its color to black;

Step (4.2.4). Search the red-black tree index by version key value for the position at which the file version should be inserted, as follows:

- Compare the key value of the node being inserted with the key value of the predecessor indicated by the root's thread. If the former is greater, continue the search in the root's left subtree; otherwise continue in its right subtree. Repeat until a leaf node is reached, and cache this leaf node (hereafter called the sibling) and its parent.
- Create a new internal node, initialize its color to red, and set its parent pointer to the parent above.
- Make the leaf node above and the external node being inserted the two children of this internal node, and initialize the internal node's thread to point to its left child.

Because version generation times are unique, if the search finds an existing node whose key value equals the version's key value, an error is reported and the operation exits;
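The following C sketch of this insertion runs on a simplified, hypothetical flattening of weightlink_rbt_node: unions are replaced by plain fields, the caller supplies the new internal node, and rebalancing is omitted. One piece of bookkeeping goes beyond the text. The sketch keeps the right threads (used in the Fig. 4 description) accurate by redirecting any ancestor whose successor thread pointed at the sibling once the new node becomes that ancestor's successor. The left threads that the lookup relies on never need repair. A key is sent into a left subtree only when it exceeds the borrowed key, so it can never land after that subtree's last in-order leaf. The names wl_search and wl_insert are illustrative.

```c
#include <assert.h>
#include <stddef.h>

enum { RED, BLACK };

/* Simplified, hypothetical flattening of weightlink_rbt_node. */
struct wl_node {
    int is_external;
    int key;                       /* external: version epoch */
    int color;
    struct wl_node *parent;
    struct wl_node *ll, *rl;       /* internal: threads to in-order predecessor/successor */
    struct wl_node *left, *right;  /* internal: children */
};

struct wl_node *wl_search(struct wl_node *n, int key)
{
    while (n != NULL && !n->is_external) {
        int borrowed = n->ll->key;
        if (key == borrowed) return n->ll;
        n = (key > borrowed) ? n->left : n->right;
    }
    return (n != NULL && n->key == key) ? n : NULL;
}

/* Insert external node e using the caller-supplied internal node in.
   Returns -1 if a version with the same epoch already exists. */
int wl_insert(struct wl_node **root, struct wl_node *e, struct wl_node *in)
{
    e->is_external = 1;
    e->color = BLACK;
    if (*root == NULL) { e->parent = NULL; *root = e; return 0; }

    struct wl_node *n = *root;
    while (!n->is_external) {                   /* descend, comparing with borrowed keys */
        int borrowed = n->ll->key;
        if (e->key == borrowed) return -1;
        n = (e->key > borrowed) ? n->left : n->right;
    }
    if (e->key == n->key) return -1;

    struct wl_node *sib = n, *p = sib->parent;
    int e_left = e->key > sib->key;             /* the later version goes left */
    in->is_external = 0;
    in->color = RED;
    in->parent = p;
    in->left = e_left ? e : sib;
    in->right = e_left ? sib : e;
    in->ll = in->left;                          /* thread to the left child, as in step (4.2.4) */
    in->rl = in->right;
    if (p == NULL) *root = in;
    else if (p->left == sib) p->left = in;
    else p->right = in;
    e->parent = sib->parent = in;

    /* Extra bookkeeping: e now precedes sib in order, so an ancestor whose
       successor thread was sib has e as its successor instead. */
    if (e_left)
        for (struct wl_node *a = p; a != NULL; a = a->parent)
            if (a->rl == sib) a->rl = e;
    return 0;
}
```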

Step (4.2.5). Check the tree structure and rebalance it if it is unbalanced. The principle is that every simple path from a node to its descendant leaves contains the same number of black nodes, and no two consecutive nodes may be red. The procedure is as follows. Starting from the parent of the newly inserted version node, check the color of the node and of its parent. If they are not both red, the operation ends. Otherwise, apply the tree adjustment algorithm of the symmetric binary B-tree proposed by Bayer in 1972. Perform a left or right rotation on the subtree containing the node and reset the node colors within the subtree, so that it satisfies the principle above and the tree remains balanced. Then take the root of the rotated subtree as the target of the next round of checks, and repeat;

Step (4.2.6). Link the version node into the weight list, as follows. The inodes representing all existing versions of the file are linked into a weight list ordered by weight; in our design the weight is taken to be the node key value.

- If the sibling described in step (4.2.4) is the left brother of this version node, insert the version node into the weight list as the sibling's successor. Set the weight-list pointers of the sibling, this version node and the sibling's original successor so that the three nodes are linked in order.
- If the sibling is the right brother of this version node, insert the version node into the weight list as the sibling's predecessor. Set the weight-list pointers of the sibling's original predecessor, this version node and the sibling so that the three nodes are linked in order.
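Here the weight equals the key value, and the tree keeps larger keys to the left. A new leaf is therefore always the in-order neighbour of its sibling, which is also its neighbour in the descending weight list, so the linking takes constant time. The following is a minimal sketch with a hypothetical struct wnode and an illustrative function name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the weight-list fields of an external node. */
struct wnode {
    int weight;  /* equal to the key value in this design */
    struct wnode *forerunner, *successor;
};

/* Link version node v next to its tree sibling s.
   sibling_is_left != 0: s is v's left brother, so v becomes s's successor.
   Otherwise s is v's right brother, so v becomes s's predecessor. */
void wlist_link_by_sibling(struct wnode **head, struct wnode *v,
                           struct wnode *s, int sibling_is_left)
{
    if (sibling_is_left) {
        v->forerunner = s;
        v->successor = s->successor;
        if (s->successor != NULL) s->successor->forerunner = v;
        s->successor = v;
    } else {
        v->successor = s;
        v->forerunner = s->forerunner;
        if (s->forerunner != NULL) s->forerunner->successor = v;
        else *head = v;               /* v becomes the new largest-weight node */
        s->forerunner = v;
    }
}
```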