CN113010486B - Metadata layered caching method and device for centerless distributed file system - Google Patents


Info

Publication number
CN113010486B
CN113010486B CN202110160110.4A CN202110160110A CN113010486B CN 113010486 B CN113010486 B CN 113010486B CN 202110160110 A CN202110160110 A CN 202110160110A CN 113010486 B CN113010486 B CN 113010486B
Authority
CN
China
Prior art keywords
directory entry
directory
metadata
cache layer
file attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110160110.4A
Other languages
Chinese (zh)
Other versions
CN113010486A (en)
Inventor
张国华
张恽
李庆节
彭琰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Guoke Yicun Information Technology Co ltd
Original Assignee
Hunan Guoke Yicun Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Guoke Yicun Information Technology Co ltd
Priority to CN202110160110.4A
Publication of CN113010486A
Application granted
Publication of CN113010486B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/185 Hierarchical storage management [HSM] systems, e.g. file migration or policies thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/188 Virtual file systems
    • G06F16/196 Specific adaptations of the file system to access devices and non-file objects via standard file system access operations, e.g. pseudo file systems

Abstract

The invention discloses a metadata layered caching method and device for a centerless distributed file system. The method comprises the following steps: S1, hierarchically setting a directory entry cache layer and a file attribute cache layer for each storage node in the centerless distributed file system; S2, querying the directory entry metadata, directory entry relations, directory entry identification information and file attribute metadata corresponding to each directory entry in the virtual file system, storing the queried directory entry metadata and directory entry relations in the directory entry cache layer, and storing the queried directory entry identification information and file attribute metadata in the file attribute cache layer according to their correspondence. The invention can increase the speed at which a storage node processes metadata requests and improve metadata operation performance without affecting the centerless distributed file system.

Description

Metadata layered caching method and device for centerless distributed file system
Technical Field
The invention relates to the technical field of distributed file systems, in particular to a metadata layered caching method and a metadata layered caching system for a centerless distributed file system.
Background
A centerless distributed file system is a file system without a master control node: all nodes are peers, the data blocks in the file system are maintained jointly by all storage nodes in the system, and each node is responsible for maintaining its own local data. This avoids the single point of failure and performance bottleneck that are unavoidable in a centralized file system, and improves the reliability, availability and scalability of the system.
Metadata is information describing data attributes, including directory attributes, file attributes, and the like. A centerless distributed file system has no metadata server; metadata is distributed across the storage nodes, and a client issues various metadata operations to a storage node, such as directory operations (e.g. create, mkdir, mknod, rmdir, unlink), file attribute operations (e.g. lookup, stat, setr) and file content read-write operations (e.g. readv, writev). At present, all metadata in a centerless distributed file system is stored uniformly on disk. With this approach, a single storage node must simultaneously handle a large number of directory operations, file attribute operations and file content read-write operations sent by clients, and the different types of operations compete for disk IO. This congests metadata requests and greatly reduces both the processing speed of metadata requests and the performance of metadata operations.
For data caching, a typical way to reduce the amount of data cached centrally is classified caching, i.e. caching different types of data in different cache spaces. However, classified caching only separates data by type, and the different caches are usually independent of one another. Metadata on each storage node in a centerless distributed file system, by contrast, is hierarchical and coupled: for example, directory entries are coupled to one another, files are coupled to one another, and directory entries are hierarchically related to file attributes. If independent classified caching is applied directly to metadata, the lack of association information between the data means that the required metadata must still be fetched separately from different caches, which takes a long time, especially under a large volume of data requests. As a result, when a client sends a metadata request it is difficult to obtain the required target data from the cache quickly and accurately, and the problems of slow metadata request processing and poor data operation performance remain.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: aiming at the technical problems in the prior art, the invention provides a metadata layered caching method and a metadata layered caching system for a centerless distributed file system, which can improve the processing speed of a storage node on a metadata request and improve the metadata operation performance on the premise of not influencing the centerless distributed file system.
In order to solve the technical problems, the technical scheme provided by the invention is as follows:
A metadata hierarchical caching method in a centerless distributed file system comprises the following steps:
S1, hierarchical cache construction: a directory entry cache layer and a file attribute cache layer are hierarchically arranged for each storage node in the centerless distributed file system, wherein the directory entry cache layer is used for caching metadata related to directory entries, and the file attribute cache layer is used for caching metadata related to file attributes;
S2, data layered storage: the directory entry metadata, directory entry relations, directory entry identification information and file attribute metadata corresponding to each directory entry in the virtual file system are queried; the queried directory entry metadata and directory entry relations are stored in the directory entry cache layer, wherein the directory entry relations are relations between the directory entry metadata and the directory entries; and the queried directory entry identification information and file attribute metadata are stored in the file attribute cache layer according to their correspondence.
Further, in step S1, the directory entry caching layer is set in the memory, and the file attribute caching layer is set in the fast storage medium.
Further, the step of step S2 includes:
S21, defining a directory entry structure for storing directory entry relations; defining a directory entry array for storing the directory entry structures; and defining a file attribute database for storing the file attributes of the directory entries;
S22, traversing all directory entries of a shared directory in the virtual file system; during traversal, querying the directory entry attributes corresponding to each directory entry and storing them in the directory entry array, and acquiring the directory entry relations and writing them into the directory entry structure, thereby constructing the directory entry cache layer; and acquiring the directory entry identification information during traversal and storing the directory entry identification information and the file attribute metadata of the directory entries in the file attribute database according to their correspondence, thereby constructing the file attribute cache layer.
Further, in step S22, each time a directory entry is traversed, the directory entry structure is built from the information of that directory entry and stored in the directory entry array; after it is stored in the directory entry array, the parent directory value of the current directory entry, the head value of the child directory entry linked list and the position value in the child directory entry linked list acquired during traversal are written into the corresponding data fields of the directory entry structure, thereby constructing the directory entry cache layer; and a k-v key-value pair is formed with the directory entry identification information as the key and the file attribute metadata of the directory entry as the value, and is inserted into the file attribute database to complete construction of the file attribute cache layer.
Further, when the directory entry is stored in the directory entry array, the stored position is calculated by a hash function, the input of the hash function is directory entry identification information in the current directory entry, and the output is an array subscript.
Further, the step S2 is followed by a data retrieving step S3, which includes: when receiving a metadata request operation for a storage node, retrieving the directory entry cache layer and/or the file attribute cache layer according to the received request type and directory identification information; when a request operation related to a directory entry is received, acquiring a memory address of target directory entry metadata in a directory entry cache layer according to the received directory entry identification information, and returning the target directory entry metadata according to the acquired address; if the information of a parent directory entry or a sub-directory entry related to the current directory entry needs to be inquired at the same time, traversing a sub-directory linked list in the inquired directory entry metadata to obtain all sub-directory entry metadata; and if the data of the file attribute cache layer needs to be inquired at the same time, acquiring the directory entry identification which is retrieved at the directory entry cache layer currently and taking the directory entry identification as an index, and indexing the file attribute cache layer to inquire the corresponding target data.
Further, when traversing the sub-directory linked list in the searched directory entry metadata, the sub-directory linked list is traversed by specifically using the parent directory value in the directory entry structure, the header value of the sub-directory entry linked list, and the position value of the sub-directory entry linked list, so as to obtain all the sub-directory entry metadata.
Further, in step S3, a hash value is calculated by using a preset hash function with the received directory entry identification information as an input, and a memory address of the directory entry metadata in the directory entry cache layer is obtained through a hash table.
A metadata hierarchical caching apparatus for a centerless distributed file system, comprising:
the hierarchical cache structure comprises a directory entry cache layer and a file attribute cache layer which are hierarchically arranged for each storage node in the centerless distributed file system, wherein the directory entry cache layer is used for caching metadata related to directory entries, and the file attribute cache layer is used for caching metadata related to file attributes;
the data hierarchical storage module is used for inquiring directory entry metadata, directory entry relations, directory entry identification information and file attribute metadata corresponding to each directory entry in the virtual file system, storing the inquired directory entry metadata and directory entry relations into the directory entry cache layer, wherein the directory entry relations are relations between the directory entry metadata and the directory entries, and storing the inquired directory entry identification information and the file attribute metadata into the file attribute cache layer according to the corresponding relations.
The data retrieval module is connected with the data layering storage module and is used for retrieving the directory entry cache layer and/or the file attribute cache layer according to the request type and the directory identification information when receiving metadata request operation on the storage node; when a request operation related to a directory entry is received, acquiring a memory address of target directory entry metadata in a directory entry cache layer according to the received directory entry identification information, and returning the target directory entry metadata according to the acquired address; if the information of a parent directory entry or a sub-directory entry related to the current directory entry needs to be inquired at the same time, traversing a sub-directory linked list in the inquired directory entry metadata to obtain all sub-directory entry metadata; and if the data of the file attribute cache layer needs to be inquired at the same time, acquiring the directory entry identification which is retrieved at the directory entry cache layer currently and taking the directory entry identification as an index, and indexing the file attribute cache layer to inquire the corresponding target data.
Compared with the prior art, the invention has the advantages that:
1. According to the invention, a local metadata cache is established on each storage node and a hierarchical cache structure is built according to metadata type. Metadata is stored hierarchically: according to its type and hierarchical relationship, directory entry metadata and directory entry relations are stored in the directory entry cache layer, and directory entry identification information and file attribute metadata are stored in the file attribute cache layer according to their correspondence. This hierarchical storage avoids the request blocking caused by storing all metadata uniformly on the same hard disk, and when a client queries metadata, the required metadata can be found quickly through the correspondence. Metadata request operations from the client to the storage node are thus converted into fast lookups in the hierarchical cache, which effectively improves the performance of metadata operations.
2. According to the invention, the metadata is hierarchically cached by combining the hierarchical cache structure and the corresponding relation between the metadata, so that the processing speed of the storage node on the metadata request of the client can be increased and the metadata request path can be reasonably optimized under the premise of not influencing the architecture of the centerless distributed file system, thereby effectively increasing the metadata request speed and improving the performance of the centerless distributed file system.
3. According to the invention, the directory entry related information is stored in the memory, and the file attribute related information is stored in the fast hard disk device, so that not only can metadata operation not access slow disk IO any more, but also the interaction between the directory entry metadata request and the file attribute metadata request can be avoided, and the overall performance of metadata retrieval is improved.
4. When data is cached hierarchically, the parent directory value of each directory entry, the head value of its child directory entry linked list and its position value in the child directory entry linked list are written into the directory entry structure as the directory entry is traversed, thereby constructing the directory entry cache layer. When a client queries metadata, all related directories can be returned quickly from the directory entry information in the directory entry cache layer, without repeated one-by-one retrieval through hash calculation, which further improves data request efficiency and metadata operation performance.
Drawings
Fig. 1 is a schematic flow chart of an implementation of a metadata hierarchical caching method in a centerless distributed file system according to this embodiment.
Fig. 2 is a schematic diagram illustrating a location principle of the hierarchical caching in the centerless distributed file system according to the embodiment.
Fig. 3 is a schematic diagram of the construction of a hierarchical cache in the present embodiment.
Fig. 4 is a schematic diagram illustrating the principle of object transformation between directory entries and file attributes in a virtual file system.
Fig. 5 is a schematic flow chart of an implementation of processing a metadata request based on a hierarchical cache structure in this embodiment.
Detailed Description
The invention is further described below with reference to the drawings and specific preferred embodiments of the description, without thereby limiting the scope of protection of the invention.
As shown in fig. 1, the metadata hierarchical caching method in the centerless distributed file system of this embodiment includes the steps of:
S1, hierarchical cache construction: a directory entry cache layer and a file attribute cache layer are hierarchically arranged for each storage node in the centerless distributed file system, wherein the directory entry cache layer is used for caching metadata related to directory entries, and the file attribute cache layer is used for caching metadata related to file attributes;
S2, data layered storage: the directory entry metadata, directory entry relations, directory entry identification information and file attribute metadata corresponding to each directory entry in the virtual file system are queried; the queried directory entry metadata and directory entry relations are stored in the directory entry cache layer, wherein the directory entry relations are relations between the directory entry metadata and the directory entries; and the queried directory entry identification information and file attribute metadata are stored in the file attribute cache layer according to their correspondence.
According to the embodiment, a local metadata cache is established on a storage node, a hierarchical cache structure is established according to metadata types to store metadata related to directory entries and metadata related to file attributes in a hierarchical mode, meanwhile, a hierarchical storage mode is adopted during metadata storage, the metadata of the directory entries and the relationship of the directory entries are stored in a directory entry cache layer according to the types and the hierarchical relationship of the metadata, and directory entry identification information and file attribute metadata are stored in a file attribute cache layer according to the corresponding relationship. Based on the hierarchical storage mode, request blocking caused by the fact that all metadata are stored in the same hard disk in a unified mode can be avoided, and when the client needs to inquire the metadata, the needed metadata can be retrieved quickly according to the corresponding relation, so that metadata request operation of the client to the storage nodes is converted into quick retrieval of the hierarchical cache, and performance of metadata operation is improved effectively.
In this embodiment, the metadata hierarchical cache is built on top of the Virtual File System (VFS). The position of the hierarchical cache in the centerless distributed file system is shown in fig. 2, where the MetadataCache layer marks that position.
In the distributed storage system, each storage node has a plurality of local file systems, and the local file systems need to construct corresponding virtual file systems before being used. The virtual file system is specifically composed of a file system type, a superblock, an inode, a dentry and a vfsmount, wherein the dentry stores position information of a certain file system object in a file system tree, the inode stores general metadata of the file system object, and mapping between object relationships is completed through each interface.
In this embodiment, file metadata is divided by type into directory entry metadata (dentry) and file attribute metadata (inode), and a directory entry cache layer and a file attribute cache layer are established correspondingly. Because directory entries (dentry) are deeply coupled (parent-child directories and peer directories) and small in data volume, this embodiment establishes the directory entry cache layer in memory. Because files are only shallowly coupled to one another and their attributes are large in data volume, this embodiment establishes the file attribute cache layer as k-v key-value pairs on a fast storage medium such as an SSD.
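As a rough illustration of the two layers described above, the following minimal Python sketch keeps directory entries in an in-memory mapping and file attributes in a separate key-value mapping. The names `DirEntry` and `LayeredMetadataCache` are hypothetical, and the plain dict used for the attribute layer is only a stand-in for a k-v database on a fast medium such as an SSD; none of this is taken from the patent's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DirEntry:
    """Directory entry metadata kept in the in-memory layer (hypothetical)."""
    ident: str                     # directory entry identification information
    name: str
    parent: Optional[str] = None   # identifier of the parent directory entry


@dataclass
class LayeredMetadataCache:
    """Two cache layers held by one storage node (illustrative only)."""
    # Layer 1: directory entries and their relations, held in memory.
    dentry_layer: Dict[str, DirEntry] = field(default_factory=dict)
    # Layer 2: file attributes as key-value pairs. The dict is an assumed
    # stand-in for a k-v database stored on a fast medium such as an SSD.
    attr_layer: Dict[str, bytes] = field(default_factory=dict)

    def put(self, entry: DirEntry, attrs: bytes) -> None:
        # Both layers are keyed by the same identifier, which is what
        # links a directory entry to its file attributes.
        self.dentry_layer[entry.ident] = entry
        self.attr_layer[entry.ident] = attrs

    def get_entry(self, ident: str) -> Optional[DirEntry]:
        return self.dentry_layer.get(ident)

    def get_attrs(self, ident: str) -> Optional[bytes]:
        return self.attr_layer.get(ident)
```

Keying both layers by the same identifier is the design point worth noting: a lookup in the directory entry layer yields the key needed for the attribute layer, so the two caches are not independent of each other.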
The hierarchical cache structure constructed in this embodiment is shown in fig. 3, and based on the hierarchical cache structure, metadata request operation of a client on a storage node can be converted into retrieval of a hierarchical cache, directory entry related information is stored in a memory, file attribute related information is stored in a fast hard disk device such as an SSD, so that not only can metadata operation be prevented from accessing slow disk IO, but also mutual influence between a directory entry metadata request and a file attribute metadata request can be avoided, and overall performance of metadata retrieval is improved.
It can be understood that the storage form of the hierarchical cache structure may also adopt other manners according to actual requirements, such as storing the directory entry related information and the file attribute related information into different addresses in the memory at the same time, or storing the directory entry related information and the file attribute related information into different spaces in the hard disk device at the same time, and the like.
The step S2 in this embodiment includes:
S21, defining a directory entry structure for storing directory entry relations; defining a directory entry array for storing the directory entry structures; and defining a file attribute database for storing the file attributes of the directory entries;
S22, traversing all directory entries of a shared directory in the virtual file system; during traversal, querying the directory entry attributes corresponding to each directory entry and storing them in the directory entry array, and acquiring the directory entry relations and writing them into the directory entry structure, thereby constructing the directory entry cache layer; and acquiring the directory entry identification information during traversal and storing the directory entry identification information and the file attribute metadata of the directory entries in the file attribute database according to their correspondence, thereby constructing the file attribute cache layer.
Directory entry relations in the virtual file system are divided into parent-child relations and peer relations, and are stored through the d_parent, d_child and d_subdirs values in the directory entry attributes. In this embodiment, a directory entry structure is defined first to store directory entry metadata and the relations (parent-child and peer) between directory entries, and a memory space is allocated in advance as an array for storing the directory entry structures, thereby constructing the directory entry cache layer; at the same time, the lightweight database LevelDB is selected to store the file attributes of the directory entries, thereby constructing the file attribute cache layer.
In this embodiment, the step S22 may specifically adopt a Depth First Search (DFS) algorithm to traverse all directory entries from the shared directory root directory. Of course, other ways of traversing the directory entries may be used according to actual needs. In the traversal process, a file system interface API provided by an operating system is used for inquiring corresponding directory entry attributes and file attribute metadata for each directory entry, and the directory entry attributes and the file attribute metadata are stored in a directory entry cache layer and a file attribute cache layer after processing.
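The traversal described above could be sketched roughly as follows. This is a minimal Python illustration under stated assumptions: `build_layers` is a hypothetical name, the identifier of each entry is taken to be its path relative to the shared root (the patent does not say what the identification information is), and two plain dicts stand in for the directory entry cache layer and the file attribute cache layer. The operating system interface is reached through the standard `os.scandir` and `os.lstat` calls.

```python
import os
from typing import Dict, List, Optional, Tuple


def build_layers(root: str) -> Tuple[Dict[str, dict], Dict[str, os.stat_result]]:
    """Depth-first walk from `root`, filling two illustrative layers.

    dentry_layer: identifier -> {"name", "parent"}
    attr_layer:   identifier -> os.stat_result
    The identifier is the path relative to `root` (an assumption).
    """
    dentry_layer: Dict[str, dict] = {}
    attr_layer: Dict[str, os.stat_result] = {}
    # Explicit stack of (path, identifier, parent identifier) gives DFS order.
    stack: List[Tuple[str, str, Optional[str]]] = [(root, ".", None)]
    while stack:
        path, ident, parent = stack.pop()
        # Query the attributes through the OS file system interface.
        st = os.lstat(path)
        dentry_layer[ident] = {"name": os.path.basename(path) or path,
                               "parent": parent}
        attr_layer[ident] = st
        # Descend into real directories only; symlinks are not followed.
        if os.path.isdir(path) and not os.path.islink(path):
            with os.scandir(path) as it:
                for child in it:
                    child_ident = (child.name if ident == "."
                                   else f"{ident}/{child.name}")
                    stack.append((child.path, child_ident, ident))
    return dentry_layer, attr_layer
```

An explicit stack is used instead of recursion so that deep directory trees do not hit the interpreter's recursion limit; the visiting order is still depth-first.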
In this embodiment, each time a directory entry is traversed in step S22, a directory entry structure is constructed from the information of that directory entry and stored in the directory entry array. After it is stored, the parent directory value d_parent of the current directory entry, the head value d_subdirs of its child directory entry linked list and the position value d_child in the child directory entry linked list, all acquired during traversal, are written into the corresponding data fields of the directory entry structure. The d_child value links the entry into the child linked list belonging to its parent directory d_parent, and the directory relationship is constructed from the d_parent, d_subdirs and d_child values together, thereby constructing the directory entry cache layer. A k-v key-value pair is formed with the directory entry identification information as the key and the file attribute metadata of the directory entry as the value, and is inserted into the file attribute database to complete construction of the file attribute cache layer. With the directory relationship constructed in this way, if the child directory entries of a directory entry need to be returned, the head of the child linked list is found through the d_subdirs value in the directory entry structure, and all child directory entries are then returned quickly by traversing the linked list through d_child.
In practical applications, it may be necessary to obtain all related directory entries rather than a single directory entry. In this embodiment, on the basis of the hierarchical cache structure, the directory relationship is constructed from the d_parent, d_subdirs and d_child values together when metadata is cached hierarchically. When a metadata request is processed, all related directories can be returned quickly from the directory entry information in the directory entry cache layer, without repeated one-by-one retrieval through hash calculation, which effectively increases metadata processing speed and improves the metadata operation performance of the system.
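The way the three values cooperate can be shown with a small Python sketch. It is a simplification under stated assumptions: `DentryNode`, `add_child` and `list_children` are hypothetical names, entries refer to each other by identifier rather than by memory address, and d_child is modelled as a single "next sibling" reference instead of the doubly linked list node used in the Linux kernel.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DentryNode:
    ident: str
    d_parent: Optional[str] = None    # parent directory entry
    d_subdirs: Optional[str] = None   # head of this entry's child list
    d_child: Optional[str] = None     # next sibling in the parent's child list


def add_child(table: Dict[str, DentryNode], parent: str, child: str) -> None:
    """Link `child` at the head of `parent`'s child list."""
    node = DentryNode(child, d_parent=parent,
                      d_child=table[parent].d_subdirs)
    table[child] = node
    table[parent].d_subdirs = child


def list_children(table: Dict[str, DentryNode], parent: str) -> List[str]:
    """Return all children by following d_subdirs, then d_child links."""
    out: List[str] = []
    cur = table[parent].d_subdirs
    while cur is not None:
        out.append(cur)
        cur = table[cur].d_child
    return out
```

For example, after `add_child(t, "root", "a")` and `add_child(t, "root", "b")`, `list_children(t, "root")` follows one d_subdirs reference and then the d_child references. In the patent's design these are stored references, so no hash is recomputed per child; this sketch substitutes a dict lookup by identifier for each stored memory address.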
When the directory entry is stored in the directory entry array, the stored position is obtained by calculating a hash function, the input of the hash function is directory entry identification information in the current directory entry, and the output is an array subscript.
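The placement rule above, combined with the chaining ("zipper") collision handling that the specific application embodiment names, might look like the following Python sketch. `DentryArray` is a hypothetical name, and CRC32 from the standard `zlib` module is only an assumed stand-in, since the patent does not specify which hash function is used.

```python
import zlib
from typing import List, Optional, Tuple


class DentryArray:
    """Fixed-size array of buckets indexed by a hash of the identifier."""

    def __init__(self, size: int = 1024) -> None:
        self.size = size
        # Each slot holds a chain of (identifier, entry) pairs.
        self.slots: List[List[Tuple[str, dict]]] = [[] for _ in range(size)]

    def _index(self, ident: str) -> int:
        # Hash input: identifier; output: array subscript.
        # CRC32 is an assumed stand-in for the unspecified hash function.
        return zlib.crc32(ident.encode("utf-8")) % self.size

    def insert(self, ident: str, entry: dict) -> None:
        bucket = self.slots[self._index(ident)]
        for i, (key, _) in enumerate(bucket):
            if key == ident:           # same identifier: overwrite in place
                bucket[i] = (ident, entry)
                return
        bucket.append((ident, entry))  # collision or empty slot: extend chain

    def lookup(self, ident: str) -> Optional[dict]:
        for key, entry in self.slots[self._index(ident)]:
            if key == ident:
                return entry
        return None
```

Pre-allocating the array keeps lookups to one hash computation plus a short chain scan; the array size trades memory for chain length.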
In the virtual file system, a directory entry tree is constructed from directory entries. The position of a directory entry and of its adjacent directory entries in the file system tree can be found through the directory entry interfaces (d_parent, d_child and d_subdirs operations), and bidirectional lookup between a directory entry (dentry) and a file attribute (inode) can be performed through the file attribute interfaces (d_inode, i_dentry). The interrelation between VFS directory entries and file attributes in the Linux system is shown in fig. 4, where dentry and inode are hierarchically organized objects corresponding to directory entry values and file attribute values respectively.
In a specific application embodiment, the detailed step of step S2 is:
Step S21:
(1) Defining the directory entry structure
The directory entry structure is defined to store the directory entry metadata and the relationship (parent-child relationship and peer relationship) between directory entries.
(2) Defining directory entry arrays and file attribute databases
A memory space is allocated in advance as an array for storing the directory entry structures, thereby constructing the directory entry cache layer; at the same time, the lightweight database LevelDB is selected to store the file attributes of the directory entries, thereby constructing the file attribute cache layer.
(3) Constructing a directory entry cache layer and a file attribute cache layer
Traversing the whole directory from the shared directory root directory by adopting a depth-first algorithm; in the traversal process, a file system interface API provided by an operating system is used for inquiring corresponding directory entry attributes and file attribute metadata for each directory entry, and the directory entry attributes and the file attribute metadata are stored in a directory entry cache layer and a file attribute cache layer after processing.
Step S22:
(1) Directory entry cache layer
a. Inserting directory entries
When a directory entry is traversed, a directory entry structure is first constructed from the information of the current directory entry and stored in the array. The storage location (i.e., the array index) is calculated by a pre-constructed hash function whose input is the identifier of the directory entry and whose output is the array index. If two directory entries hash to the same index, the collision is resolved by chaining (the "zipper method").
b. Storing directory entry relationships
For the directory entry processed in step a, after it has been inserted into the array, the parent directory (d_parent) value and the head (d_subdirs) value of its sub-directory entry linked list, both obtained during the traversal, are written into the corresponding fields of the directory entry structure.
(2) File attribute cache layer
The identification information of the directory entry is used as the key, and the file attribute metadata of the directory entry, serialized into a character string, is used as the value; the resulting key-value (k-v) pair is inserted into the database, which completes the construction of the file attribute cache layer.
And after the traversal is finished, based on the hierarchical cache structure, carrying out metadata request redirection on the metadata request sent by the client.
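Steps S21 and S22 can be summarized in a small sketch under several stated assumptions: a plain `dict` stands in for the LevelDB database, JSON stands in for the unspecified serialization, CRC32 stands in for the unspecified hash function, and `d_child` is simplified to "identifier of the next sibling" rather than a kernel-style list node. All class and function names are hypothetical.

```python
import json
import zlib
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Dentry:
    ident: str
    name: str
    d_parent: Optional[str] = None   # identifier of the parent entry
    d_subdirs: Optional[str] = None  # head of the child list
    d_child: Optional[str] = None    # next sibling in the parent's child list


class DentryCache:
    """Directory entry array indexed by hash, with chaining on collisions."""

    def __init__(self, size: int = 64) -> None:
        self.buckets: List[List[Dentry]] = [[] for _ in range(size)]

    def _slot(self, ident: str) -> int:
        return zlib.crc32(ident.encode("utf-8")) % len(self.buckets)

    def insert(self, entry: Dentry) -> None:
        self.buckets[self._slot(entry.ident)].append(entry)

    def lookup(self, ident: str) -> Optional[Dentry]:
        for entry in self.buckets[self._slot(ident)]:
            if entry.ident == ident:
                return entry
        return None


Record = Tuple[str, str, Optional[str], dict]  # (ident, name, parent, attrs)


def build_layers(records: Iterable[Record], cache: DentryCache,
                 attr_db: Dict[str, str]) -> None:
    """Fill both layers from records given in depth-first (parent-first) order."""
    last_child: Dict[str, Dentry] = {}
    for ident, name, parent, attrs in records:
        entry = Dentry(ident, name, d_parent=parent)
        cache.insert(entry)
        parent_entry = cache.lookup(parent) if parent is not None else None
        if parent_entry is not None:
            previous = last_child.get(parent)
            if previous is None:
                parent_entry.d_subdirs = ident  # first child becomes list head
            else:
                previous.d_child = ident        # append to the sibling chain
            last_child[parent] = entry
        # k-v pair: identifier as key, serialized file attributes as value.
        attr_db[ident] = json.dumps(attrs, sort_keys=True)
```

Because both layers are keyed by the same identifier, a directory entry found in the first layer can be used directly as the index into the second.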
Step S2 of this embodiment is followed by a data retrieval step S3, which includes: when a metadata request operation for a storage node is received, retrieving the directory entry cache layer and/or the file attribute cache layer according to the received request type and directory identification information; when a request operation related to a directory entry is received, obtaining the memory address of the target directory entry metadata in the directory entry cache layer according to the received directory entry identification information, and returning the target directory entry metadata from that address; if information about a parent directory entry or sub-directory entries related to the current directory entry must be queried at the same time, traversing the sub-directory linked list in the retrieved directory entry metadata to obtain all sub-directory entry metadata; and if data in the file attribute cache layer must be queried at the same time, taking the directory entry identification just retrieved from the directory entry cache layer as an index into the file attribute cache layer to query the corresponding target data.
In this embodiment, the sub-directory linked list in the retrieved directory entry metadata is traversed by means of the parent directory value in the directory entry structure, the head value of the sub-directory entry linked list, and the position value in the sub-directory entry linked list, so as to obtain all sub-directory entry metadata. Specifically, when the hierarchical cache is constructed, the directory relationships are built from the d_parent, d_subdirs and d_child values. When a request operation related to a directory entry is received and, after the target directory entry metadata has been returned, all entries in the sub-directory entry linked list of the target directory entry must also be returned, the head of the child list is found through the d_subdirs value in the directory entry structure and the list is then traversed through d_child, so that all required sub-directory entries can be returned quickly.
After the metadata has been stored hierarchically according to the correspondence, when a metadata request operation is received, the retrieval path within the cache is determined from the request type and the identifier. The client's metadata request operation on the storage node is thereby converted into a lookup in the hierarchical cache, from which the requested metadata can be retrieved quickly, improving the efficiency of metadata requests and hence the performance of the request operation.
In step S3 of this embodiment, the received directory entry identification information is used as the input, a hash value is calculated with the preset hash function, and the memory address of the directory entry metadata in the directory entry cache layer is obtained through the hash table.
As shown in fig. 5, after the directory entry cache layer and the file attribute cache layer have been constructed in this embodiment, when a metadata request is received, the directory entry cache layer is queried if directory entry information is needed, the file attribute cache layer is queried if file attribute metadata is needed, and the final data retrieval result is returned.
In the specific application embodiment, after the directory entry cache layer and the file attribute cache layer have been constructed, the metadata requests sent by the client are classified, for example, into three types: related only to directory entries, related only to file attributes, and related to both directory entries and file attributes. For a retrieval request sent by a server, the type of the target metadata is determined first. If the request is related only to directory entries, a hash value is calculated by the preset hash function with the transmitted directory entry identification information as input, and the memory address of the directory entry information in the directory entry cache layer is obtained from the hash table, so that the requested directory entry information can be returned. If the sub-directory entries of the target directory entry must also be queried, the sub-directory linked list in the retrieved directory entry metadata is traversed to obtain the information of all sub-directory entries. If data in the file attribute cache layer must be queried at the same time, the directory entry identification retrieved from the directory entry cache layer is used as an index into the file attribute cache layer to find the corresponding target data, so that the required directory attribute and file attribute metadata are retrieved quickly.
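The child-list walk described here can be sketched as follows. The sketch assumes a simplified model in which `d_subdirs` holds the identifier of the first child and `d_child` holds the identifier of the next sibling; in the Linux kernel these are list heads and list nodes, so this is an illustration of the idea rather than of the kernel layout. The `Dentry` tuple and `list_children` are hypothetical names.

```python
from typing import Dict, List, NamedTuple, Optional


class Dentry(NamedTuple):
    name: str
    d_parent: Optional[str]   # identifier of the parent entry
    d_subdirs: Optional[str]  # identifier of the first child (list head)
    d_child: Optional[str]    # identifier of the next sibling


def list_children(cache: Dict[str, Dentry], ident: str) -> List[str]:
    """Return the identifiers of all children of `ident`, in list order."""
    children: List[str] = []
    current = cache[ident].d_subdirs      # start at the head of the child list
    while current is not None:
        children.append(current)
        current = cache[current].d_child  # follow the sibling link
    return children
```

For example, with a root entry whose child list is `a -> b`, `list_children(cache, "root")` returns the two child identifiers without touching any other part of the tree.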
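The three-way classification of requests can be illustrated with a small dispatcher. This is only a sketch: the two layers are modeled as plain dictionaries keyed by the directory entry identifier, file attributes are assumed to be stored as JSON strings, and `RequestKind` and `handle_request` are hypothetical names.

```python
import json
from enum import Enum
from typing import Any, Dict


class RequestKind(Enum):
    DENTRY_ONLY = "dentry"  # related only to directory entries
    ATTR_ONLY = "attr"      # related only to file attributes
    BOTH = "both"           # related to both


def handle_request(kind: RequestKind, ident: str,
                   dentry_layer: Dict[str, dict],
                   attr_layer: Dict[str, str]) -> Dict[str, Any]:
    """Route a metadata request to the cache layer(s) its type requires."""
    result: Dict[str, Any] = {}
    if kind in (RequestKind.DENTRY_ONLY, RequestKind.BOTH):
        result["dentry"] = dentry_layer[ident]
    if kind in (RequestKind.ATTR_ONLY, RequestKind.BOTH):
        # The same identifier indexes the file attribute layer.
        result["attrs"] = json.loads(attr_layer[ident])
    return result
```

A request that needs only directory entry information never touches the file attribute layer, which is the point of separating the two layers.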
The metadata hierarchical caching device for the centerless distributed file system in the embodiment comprises:
the hierarchical cache structure comprises a directory entry cache layer and a file attribute cache layer which are hierarchically arranged for each storage node in the centerless distributed file system, wherein the directory entry cache layer is used for caching metadata related to directory entries, and the file attribute cache layer is used for caching metadata related to file attributes;
and the data hierarchical storage module is used for inquiring directory entry metadata, directory entry relations, directory entry identification information and file attribute metadata corresponding to each directory entry in the virtual file system, storing the inquired directory entry metadata and directory entry relations into a directory entry cache layer, wherein the directory entry relations are relations between the directory entry metadata and the directory entries, and storing the inquired directory entry identification information and the file attribute metadata into a file attribute cache layer according to the corresponding relations.
The directory entry cache layer is specifically arranged in a memory, and the file attribute cache layer is arranged in a fast storage medium.
In this embodiment, the data layering storage module includes:
the defining unit is used for defining a directory entry structure body to be used for storing the directory entry relation; defining a directory entry array for storing the array of the directory entry structure, and defining a file attribute database for storing the file attributes of the directory entries;
and the traversal unit is used for traversing all directory items of a shared directory in the virtual file system, inquiring the directory item attribute corresponding to each directory item during traversal, storing the directory item attribute into the directory item array, acquiring the directory item relationship, writing the directory item relationship into the directory item structure, constructing and forming the directory item cache layer, acquiring the directory item identification information during traversal, storing the directory item identification information and the file attribute metadata of the directory item into the file attribute database according to the corresponding relationship, and constructing and forming the file attribute cache layer.
In this embodiment, the system further includes a data retrieval module connected to the data hierarchical storage module, and configured to retrieve the directory entry cache layer and/or the file attribute cache layer according to the request type and the directory identification information when receiving a metadata request operation for the storage node; when a request operation related to a directory entry is received, acquiring a memory address of target directory entry metadata in a directory entry cache layer according to the received directory entry identification information, and returning the target directory entry metadata according to the acquired address; traversing a subdirectory linked list in the searched directory entry metadata to obtain all subdirectory entry metadata if the subdirectory entry of the target directory entry needs to be searched; and if the data of the file attribute cache layer needs to be inquired at the same time, acquiring the directory entry identification which is retrieved at the directory entry cache layer at present and taking the directory entry identification as an index, and indexing the file attribute cache layer to inquire the corresponding target data.
The metadata hierarchical caching apparatus for the centerless distributed file system in this embodiment corresponds to the above metadata hierarchical caching method for the centerless distributed file system, and the specific implementation principle is as described above, and is not described in detail here.
According to the hierarchical caching method, the metadata is hierarchically cached by combining the corresponding relation and the hierarchical caching structure, the processing speed of the storage node on the metadata request of the client can be increased on the premise of not influencing the architecture of the centerless distributed file system, and the metadata request path is reasonably optimized, so that the metadata request speed can be effectively increased, and the performance of the centerless distributed file system can be improved.
The foregoing is merely a description of the preferred embodiments of the invention and is not to be construed as limiting the invention in any way. Although the present invention has been described with reference to the preferred embodiments, it is not limited thereto. Therefore, any simple modification, equivalent change, or variation made to the above embodiments according to the technical spirit of the present invention, without departing from the content of the technical scheme of the present invention, should fall within the protection scope of the technical scheme of the present invention.

Claims (10)

1. A metadata hierarchical caching method for a centerless distributed file system is characterized by comprising the following steps:
S1, hierarchical cache construction: the method comprises the steps that a directory entry cache layer and a file attribute cache layer are arranged for each storage node in a centerless distributed file system in a layering mode, wherein the directory entry cache layer is used for caching metadata related to directory entries, and the file attribute cache layer is used for caching metadata related to file attributes;
S2, data layered storage: the method comprises the steps of inquiring directory entry metadata, directory entry relations, directory entry identification information and file attribute metadata corresponding to each directory entry in a virtual file system, storing the directory entry metadata and the directory entry relations obtained through inquiry into a directory entry cache layer, wherein the directory entry relations are relations between the directory entry metadata and the directory entries, and storing the directory entry identification information and the file attribute metadata obtained through inquiry into the file attribute cache layer according to the corresponding relations.
2. The method for hierarchical caching of metadata for a centerless distributed file system of claim 1, wherein: in step S1, the directory entry caching layer is set in the memory, and the file attribute caching layer is set in the fast storage medium.
3. The metadata hierarchical caching method for the centerless distributed file system according to claim 1, wherein the step of step S2 comprises:
S21, defining a directory entry structure body for storing a directory entry relation; defining a directory entry array for storing the array of the directory entry structure, and defining a file attribute database for storing the file attributes of the directory entries;
S22, traversing all directory items of a shared directory in the virtual file system, inquiring the directory item attribute corresponding to each directory item during traversal, storing the directory item attribute into the directory item array, acquiring the directory item relationship, writing the directory item relationship into the directory item structure, constructing and forming the directory item cache layer, acquiring the directory item identification information during traversal, storing the directory item identification information and the file attribute metadata of the directory items into the file attribute database according to the corresponding relationship, and constructing and forming the file attribute cache layer.
4. The metadata hierarchical caching method for the centerless distributed file system according to claim 3, wherein: in step S22, when traversing a directory entry each time, first constructing the directory entry structure based on the information of the directory entry, and storing the directory entry structure in the directory entry array; after the directory entry array is stored, writing a father directory value of a current directory entry, a head value of a sub-directory entry linked list and a position value of the sub-directory entry linked list acquired in traversal operation into a data parameter corresponding to the directory entry structure, and constructing and forming a directory entry cache layer; and forming a k-v key value pair form by using the directory entry identification information as a key and using the file attribute value metadata of the directory entry as a value, and inserting the k-v key value pair form into the file attribute database to complete the construction of the file attribute cache layer.
5. The metadata hierarchical caching method for the centerless distributed file system according to claim 4, wherein when storing into the directory entry array, the storage location is calculated by a hash function, the input of the hash function is directory entry identification information in the current directory entry, and the output is an array index.
6. The metadata hierarchical caching method for the centerless distributed file system according to any one of claims 1 to 5, wherein the step S2 is followed by a data retrieving step S3, the steps comprising: when receiving a metadata request operation for a storage node, retrieving the directory entry cache layer and/or the file attribute cache layer according to the received request type and directory identification information; when a request operation related to a directory entry is received, acquiring a memory address of target directory entry metadata in a directory entry cache layer according to the received directory entry identification information, and returning the target directory entry metadata according to the acquired address; if the information of the parent directory entry or the sub-directory entry related to the directory entry needs to be inquired at the same time, traversing a sub-directory linked list in the inquired directory entry metadata to obtain all sub-directory entry metadata; and if the data of the file attribute cache layer needs to be inquired at the same time, acquiring the directory entry identification which is retrieved at the directory entry cache layer currently and taking the directory entry identification as an index, and indexing the file attribute cache layer to inquire the corresponding target data.
7. The metadata hierarchical caching method for the centerless distributed file system according to claim 6, wherein: and when traversing the sub-directory linked list in the searched directory entry metadata, specifically traversing the sub-directory linked list by a parent directory value in the directory entry structure, a header value of the sub-directory entry linked list and a position value of the sub-directory entry linked list to obtain all the sub-directory entry metadata.
8. The metadata hierarchical caching method for the centerless distributed file system according to claim 6, wherein in step S3, a hash value is calculated by using a preset hash function with the received directory entry identification information as an input, and a memory address of the directory entry metadata in the directory entry caching layer is obtained through a hash table.
9. A metadata hierarchical caching apparatus for a centerless distributed file system, comprising:
the hierarchical cache structure comprises a directory entry cache layer and a file attribute cache layer which are hierarchically arranged for each storage node in the centerless distributed file system, wherein the directory entry cache layer is used for caching metadata related to directory entries, and the file attribute cache layer is used for caching metadata related to file attributes;
the data hierarchical storage module is used for inquiring directory entry metadata, directory entry relations, directory entry identification information and file attribute metadata corresponding to each directory entry in the virtual file system, storing the inquired directory entry metadata and directory entry relations into the directory entry cache layer, wherein the directory entry relations are relations between the directory entry metadata and the directory entries, and storing the inquired directory entry identification information and the file attribute metadata into the file attribute cache layer according to the corresponding relations.
10. The metadata hierarchical caching apparatus for the centerless distributed file system according to claim 9, further comprising a data retrieval module connected to the data hierarchical storage module, configured to retrieve the directory entry caching layer and/or the file attribute caching layer according to a request type and directory identification information when a metadata request operation for a storage node is received; when a request operation related to a directory entry is received, acquiring a memory address of target directory entry metadata in a directory entry cache layer according to the received directory entry identification information, and returning the target directory entry metadata according to the acquired address; if the information of a parent directory entry or a sub-directory entry related to the current directory entry needs to be inquired at the same time, traversing a sub-directory linked list in the inquired directory entry metadata to obtain all sub-directory entry metadata; and if the data of the file attribute cache layer needs to be inquired at the same time, acquiring the directory entry identification which is retrieved at the directory entry cache layer currently and taking the directory entry identification as an index, and indexing the file attribute cache layer to inquire the corresponding target data.
CN202110160110.4A 2021-02-05 2021-02-05 Metadata layered caching method and device for centerless distributed file system Active CN113010486B (en)


Publications (2)

Publication Number Publication Date
CN113010486A CN113010486A (en) 2021-06-22
CN113010486B true CN113010486B (en) 2021-11-26

Family

ID=76383808





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant