CN114610687A

CN114610687A - Metadata storage method and distributed file system

Info

Publication number: CN114610687A
Application number: CN202210158235.8A
Authority: CN
Inventors: 舒继武; 陆游游; 吕文豪
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2022-02-21
Filing date: 2022-02-21
Publication date: 2022-06-10

Abstract

The application provides a metadata storage method and a distributed file system. The method is applied to a Distributed File System (DFS) that comprises a plurality of metadata servers (MDSs). According to the method, each directory metadata is disassembled (namely decoupled) into two parts, the corresponding access metadata and content metadata. The access metadata, the content metadata and the file metadata obtained for the directory tree are then recombined and stored on the premise that the mapping relation is not changed, so that a directory list operation of the DFS is performed on one MDS. The method and the device solve the problem in the prior art that frequent directory list operations lower the performance of the whole distributed file system.

Description

Metadata storage method and distributed file system

Technical Field

The present application relates to the field of data storage technologies, and in particular, to a metadata storage method and a distributed file system.

Background

The Distributed File System (DFS) organizes and manages all of its files through a directory tree, a hierarchical tree structure composed of directory metadata and file metadata. The global directory tree view corresponding to the directory tree is shown in fig. 1; it displays only the directory name of each directory metadata, the file name of each file metadata, and the hierarchical mapping relationship between the directory metadata and the file metadata. The DFS architecture is shown in fig. 2. The directory metadata and file metadata in the DFS are stored in a distributed manner on a plurality of metadata servers (MDSs) 221 in the MDS cluster 22, and each MDS 221 manages and processes the directory metadata or file metadata stored on it. The file content data of the files in the DFS is stored on a plurality of data servers (DSs) 231 in a data server (DS) cluster 23. To perform an access operation (e.g., a read) on the file content data of a target file, the client 211 accesses the MDS cluster 22, obtains the file metadata of the target file from the MDS cluster 22, and then uses the file metadata to locate the DS 231 storing the file content data of the target file.
The directory metadata comprises a directory name, a globally unique identifier (ID) of the directory, authority information of the directory, a timestamp of the directory, and directory entries (namely, a list of the subfile names and subdirectory names under the directory). For example, the directory metadata of a specified directory name comprises the specified directory name, the ID of the specified directory name, the authority information of the specified directory name, the timestamp of the specified directory name, and the directory entry of the specified directory name. The file metadata includes a file name, a globally unique identifier of the file, rights information of the file, and file content data storage information (e.g., an identifier of a DS storing the file content data).

In the process that the client 211 obtains the file metadata of the target file from the MDS cluster 22, the MDS 221 associated with the target file in the MDS cluster 22 is required to analyze the stored metadata (i.e., the directory metadata and/or the file metadata) of the target file, obtain the file metadata of the target file, and return the file metadata to the client 211. Currently, the metadata of the DFS is commonly stored on the MDSs 221 in the MDS cluster 22 as follows: the directory tree of the DFS is partitioned hierarchically (as shown in figure 1) and metadata at different levels is stored on different MDSs 221 of MDS cluster 22. Each MDS is responsible for managing metadata stored therein, and all MDSs 221 in the MDS cluster 22 cooperatively maintain metadata of one DFS directory tree and a hierarchical mapping relationship between the metadata, and perform metadata processing based on an access request of the client 211. The metadata storage mode is simple in design and low in implementation difficulty, the whole DFS metadata processing throughput is the sum of the throughputs of all MDS 221 in the MDS cluster 22, and the metadata processing performance of the DFS is effectively guaranteed.

Directory listing operations (such as creating or deleting a file, or running the directory command (ls)) are frequent in the DFS. When the DFS adopts the existing metadata storage method, these frequent directory listing operations lower the performance of the whole DFS.

Disclosure of Invention

The application provides a metadata storage method and a distributed file system, which aim to solve the problem that the performance of the whole DFS is low due to frequent directory list operation in the prior art.

In a first aspect, the present application provides a metadata storage method, which is applied to a distributed file system DFS, where the distributed file system DFS includes a plurality of metadata servers;

the method comprises the following steps:

decoupling each directory metadata in the DFS directory tree as follows to obtain access metadata and content metadata corresponding to each directory metadata: decoupling the directory metadata of a specified directory name into access metadata of the specified directory name and content metadata of the specified directory name; the access metadata of the specified directory name comprises the specified directory name and the globally unique identifier of the specified directory name; the content metadata of the specified directory name comprises the directory entry of the specified directory name;

forming metadata groups from the access metadata and the content metadata corresponding to each directory metadata in the directory tree and from each file metadata as follows: the access metadata of the specified directory name, the content metadata of the parent directory of the specified directory name and the globally unique identifier of the parent directory of the specified directory name form a metadata group; and the content metadata of the specified directory name, the globally unique identifier of the specified directory name, and the access metadata of the subdirectories of the specified directory name and/or the file metadata of its subfiles form a metadata group;

and storing a plurality of metadata groups of the directory tree in a balanced distribution mode in the plurality of metadata servers.

Optionally, the storing, in a balanced distribution, the multiple metadata groups of the directory tree in the multiple metadata servers includes:

respectively carrying out consistent hash calculation on the metadata group and the metadata server to obtain hash values corresponding to the metadata group and the metadata server;

and correspondingly storing the metadata groups in the metadata server based on the hash values.

Optionally, the performing consistent hash calculation on the metadata group to obtain a hash value corresponding to the metadata group includes:

performing consistent hash calculation on the metadata group in the following way to obtain a hash value corresponding to the metadata group:

performing consistent hash calculation on the globally unique identifier of the parent directory of the specified directory name to obtain the hash value of the access metadata of the specified directory name;

performing consistent hash calculation on the globally unique identifier of the specified directory name to obtain the hash value of the content metadata of the specified directory name;

and performing consistent hash calculation on the globally unique identifier of the parent directory to which the file metadata belongs to obtain the hash value of the file metadata.

Optionally, the correspondingly storing the metadata groups in the metadata server includes:

and correspondingly storing the metadata groups in the metadata server in a key value pair mode.

Optionally, the access metadata of the specified directory name further includes authority information of the specified directory name; the content metadata specifying the directory name further includes a timestamp specifying the directory name;

the correspondingly storing the metadata groups in the metadata server in the form of key-value pairs comprises:

correspondingly storing the metadata groups in the metadata server by adopting key value pairs in the following forms:

a key-value pair whose value consists of the globally unique identifier of the specified directory name and the authority information of the specified directory name, and whose key consists of the globally unique identifier of the parent directory of the specified directory name and the specified directory name;

a key-value pair whose value consists of the directory entry of the specified directory name and the timestamp of the specified directory name, and whose key is the globally unique identifier of the specified directory name;

and a key-value pair whose value is the file metadata and whose key consists of the globally unique identifier of the parent directory to which the file metadata belongs and the file name.

In a second aspect, the present application provides a distributed file system, comprising: a plurality of clients, a plurality of metadata servers, a plurality of data servers; wherein,

the client is configured to send an access request to the metadata server and to obtain the request result returned by the metadata server; the client is further configured to access, based on the request result, the file content data stored by the data server corresponding to the request result;

the metadata server is configured to store the directory metadata and the file metadata of the distributed file system by adopting the method described above; the metadata server is further configured to process the metadata of the metadata groups stored in the metadata server based on the access request of the client, to obtain a request result, and to return the request result to the client;

the metadata server is further configured to maintain consistency of the client cache item with metadata of the distributed file system by:

after receiving the access request of the client, the metadata server checks the validity of a client cache item corresponding to the access request to obtain a check result, and performs the following consistency maintenance operation based on the check result:

if the check result is that the client cache item corresponding to the access request is valid, returning to the client the request result corresponding to the access request together with an invalidation instruction for invalidating the invalid cache items in the client cache;

and if the check result is that the client cache item corresponding to the access request is invalid, returning to the client a termination instruction indicating termination of the access request together with an invalidation instruction for invalidating the invalid cache items in the client cache.
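
The consistency maintenance described above can be illustrated with a minimal Python sketch. The application does not specify how validity is checked, so the per-item version counters, the names `MetadataServer` and `Response`, and the idea that the client sends the versions of its cached items with each request are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Response:
    result: object      # request result, or None when the request is terminated
    terminated: bool    # True corresponds to the termination instruction
    invalidate: list    # cache keys the client should drop (invalidation instruction)


class MetadataServer:
    def __init__(self):
        # Hypothetical versioning scheme: cache key -> current version.
        self.versions = {}

    def update(self, key):
        """Record a metadata change by bumping the version of the item."""
        self.versions[key] = self.versions.get(key, 0) + 1

    def handle(self, client_cache, used_keys, operation):
        """client_cache: key -> version held by the client.
        used_keys: the cache items this access request relied on."""
        stale = sorted(k for k, v in client_cache.items()
                       if self.versions.get(k, 0) != v)
        if any(k in stale for k in used_keys):
            # The cache item for this request is invalid: terminate and invalidate.
            return Response(None, True, stale)
        # The cache item is valid: return the result, still invalidating stale items.
        return Response(operation(), False, stale)
```

In this sketch both branches carry the list of stale cache items back to the client, which mirrors the two cases above: a valid request receives its result plus invalidations, and an invalid request receives a termination plus invalidations.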

Optionally, the system further includes: a coordination server;

the coordination server is used for identifying a directory orphan loop in a directory tree of the distributed file system when a plurality of clients perform directory renaming operation concurrently, and sending a termination instruction for terminating the directory renaming operation to the clients when the directory orphan loop is identified;

the directory orphan loop is an isolated mapping loop formed by mapping relations among a plurality of directory metadata or an isolated mapping loop formed by mapping relations among a plurality of directory metadata and file metadata.
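
As an illustration of how such a loop might be identified, the following Python sketch walks the ancestors of the destination directory before a rename is applied. The application does not state the detection algorithm; the ancestor walk, the `parent_of` map, and the `CoordinationServer` class with its return values are hypothetical, and the sketch covers only loops among directories.

```python
def would_create_orphan_loop(parent_of, moved_dir, new_parent):
    """Return True if moving moved_dir under new_parent would detach a cycle.

    parent_of maps each directory ID to its parent ID (None for the root).
    A loop appears when new_parent is moved_dir itself or one of its descendants.
    """
    node = new_parent
    while node is not None:
        if node == moved_dir:
            return True
        node = parent_of.get(node)
    return False


class CoordinationServer:
    """Hypothetical coordinator that checks directory renames one at a time."""

    def __init__(self, parent_of):
        self.parent_of = dict(parent_of)

    def rename(self, moved_dir, new_parent):
        if would_create_orphan_loop(self.parent_of, moved_dir, new_parent):
            return "terminate"   # stands in for the termination instruction
        self.parent_of[moved_dir] = new_parent
        return "ok"
```

For example, if one client moves directory `a` under `b` while another moves `b` under `a`, the first rename to arrive is applied and the second is terminated, because `a` has by then become a descendant of `b`.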

In a third aspect, the present application provides a server, comprising:

a processor and a memory;

the memory stores executable instructions executable by the processor;

wherein execution of the executable instructions stored by the memory by the processor causes the processor to perform the method as described above.

In a fourth aspect, the present application provides a storage medium having stored therein computer-executable instructions for implementing the method as described above when executed by a processor.

In a fifth aspect, the present application provides a program product comprising a computer program which, when executed by a processor, implements the method as described above.

According to the metadata storage method and the distributed file system, the directory metadata are disassembled into two parts, namely the directory access metadata and the directory content metadata, and then the directory access metadata, the directory content metadata and the file metadata of each disassembled directory corresponding to the directory tree are recombined and stored on the premise that the mapping relation is not changed, so that the directory list operation of the DFS is ensured to be performed on one MDS. The method and the device solve the problem that the performance of the whole DFS is low due to frequent directory listing operation in the prior art.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.

FIG. 1 is a diagram of a prior art global directory tree view;

FIG. 2 is a diagram of a prior art distributed file system architecture;

FIG. 3 is a flowchart of a metadata storage method provided in an embodiment of the present application;

FIG. 4 is a diagram of a distributed file system architecture provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of a directory orphan loop provided in an embodiment of the present application;

FIG. 6 is a diagram of a server structure provided in an embodiment of the present application.

The above drawings show specific embodiments of the present application, which are described in more detail below. These drawings and the written description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the inventive concepts to those skilled in the art by reference to specific embodiments.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Distributed File Systems (DFSs) have the advantages of cost saving, convenient management, good extensibility, high reliability and high usability, and are now widely used. The distributed file system organizes and manages all of its files through the directory tree corresponding to the global directory tree view shown in fig. 1. When the distributed file system adopts the existing metadata storage mode, the directory tree is divided according to the hierarchy shown in fig. 1, metadata of different hierarchies is stored on different MDSs 221 of the MDS cluster 22 shown in fig. 2, and the metadata of the DFS directory tree and the hierarchical mapping relationship between the metadata are cooperatively maintained by the MDSs 221 in the MDS cluster 22. When a directory list operation (such as creating or deleting a file) is performed on the distributed file system, the file metadata of the created or deleted file must be created or deleted accordingly. Creating or deleting file metadata necessarily creates or modifies the mapping relationship between the file metadata and the directory metadata of its parent directory. In the prior art, however, the file metadata and the directory metadata of its parent directory belong to different hierarchies and are therefore stored on different MDSs 221. When the mapping relationship between them is created or modified, the two MDSs 221 holding the file metadata and the directory metadata of the parent directory must coordinate through a distributed protocol in order to complete the processing of the related file metadata and directory metadata on the two MDSs 221.
For example, when file metadata is created or deleted, the two MDSs 221 on which the file metadata and the directory metadata of its parent directory are located must run a distributed atomic commit protocol (e.g., two-phase commit) to ensure that the data on the multiple metadata servers is updated atomically, and a distributed lock protocol (e.g., two-phase locking) to ensure isolation between different operations. These distributed protocols require cumbersome network communication with high delays, and contention among the locks reduces concurrency and throughput, which in turn affects the throughput performance of the distributed file system. Moreover, with the continuous development of network technology, the number of files carried by the distributed file system keeps increasing, and the clients 211 in the distributed file system frequently create or delete files, modify directories, and perform other directory list operations. For example, the distributed file systems of businesses such as Taobao and Jingdong serve extremely large business groups, so files of product information are frequently uploaded or deleted. Therefore, frequent directory listing operations degrade the performance (e.g., throughput performance) of the whole distributed file system.

The directory metadata includes a directory name, a globally unique identifier (ID) of the directory, authority information of the directory, a timestamp of the directory, and a directory entry; the file metadata includes a file name, a globally unique identifier of the file, rights information of the file, and file content data storage information (e.g., an identifier of the DS storing the file). The hierarchical mapping relationship between the directory metadata of the previous hierarchy and the directory metadata of the next hierarchy is realized through the correspondence between the directory entry in the directory metadata of the previous hierarchy and the directory name in the directory metadata of the next hierarchy; similarly, the hierarchical mapping relationship between the directory metadata of the previous hierarchy and the file metadata of the next hierarchy is realized through the correspondence between the directory entry in the directory metadata of the previous hierarchy and the file name in the file metadata of the next hierarchy. For example, for the directory tree corresponding to the global directory tree view shown in fig. 1, the directory metadata, the file metadata, and the mapping relationships before the file f3 is created are shown in table 1:

TABLE 1 directory metadata, file metadata, and mapping relationships

The directory metadata and file metadata examples in table 1 are illustrated as follows:

d1 denotes the name of directory d1; IDd1 represents the globally unique identifier of directory d1, i.e., the ID of directory d1; the d1 rights information indicates the rights information of the directory d1; the d1 timestamp represents the timestamp of directory d1; the d1 directory entry represents a directory entry of directory d1.

f1 represents the name of file f1; IDf1 represents the globally unique identifier of file f1, namely the ID of file f1; the f1 rights information indicates the rights information of the file f1; the f1 timestamp represents the timestamp of file f1; the f1 content storage information indicates the file content data storage information of the file f1, such as a storage address or an identifier of a data server storing the file content data of the file f1.

In the present application, the representation manner of other directory metadata and file metadata is similar to the above description of the directory metadata and the description of the file metadata, and is not repeated.

As shown in table 1, in the prior art a directory list operation is an operation (such as creation, modification or deletion) on the hierarchical mapping relationship between two directory metadata or between directory metadata and file metadata. For example, with reference to table 1 above, when a file f3 is to be created in the DFS under the directory d3 (i.e., the directory whose name is d3), the hierarchical mapping relationship between the directory metadata of the directory d3 and the file metadata of the file f3 (i.e., the file whose name is f3) must be established: the directory entry in the directory metadata of the directory d3 is modified (e.g., f3 is added to the directory entry), and the file metadata of the new file f3 is created. Thus, creating the file f3 requires multiple MDSs (at least two, namely the MDS storing the directory metadata of the directory d3 and the MDS storing the file metadata of the file f3) to coordinate through a distributed protocol.

That is, in the metadata storage method of the prior art, the directory tree is split along the hierarchical mapping relationships between two directory metadata, or between directory metadata and the file metadata associated with it, and the resulting parts are then stored in a distributed manner.
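
The cost of this prior-art layout can be made concrete with a small Python sketch. It assumes a simple placement rule in which the metadata at tree level `l` is stored on server `l mod n`; the application only states that different levels are stored on different MDSs, so this rule and both function names are hypothetical.

```python
def server_for_level(level, num_servers):
    """Assumed prior-art placement: one tree level maps to one server."""
    return level % num_servers


def servers_touched_by_create(parent_level, num_servers):
    """Servers involved when a file is created in a directory at parent_level."""
    # The directory entry of the parent directory is modified on this server.
    parent_server = server_for_level(parent_level, num_servers)
    # The new file metadata lives one level below, on another server.
    child_server = server_for_level(parent_level + 1, num_servers)
    return {parent_server, child_server}
```

With at least two servers the returned set always has two members, which is why every file creation in this layout needs cross-server coordination.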

In the prior art, a directory list operation therefore requires a plurality of MDSs to coordinate through a distributed protocol. To solve this problem, the present application provides a metadata storage method in which each directory metadata is disassembled (namely decoupled) into two parts, the corresponding access metadata and content metadata. The access metadata, the content metadata and the file metadata obtained for the directory tree are then recombined and stored on the premise that the mapping relation is not changed, so that a directory list operation of the DFS is performed on one MDS. This solves the problem in the prior art that frequent directory list operations lower the performance of the whole DFS.

The metadata storage method provided by the present application is described below with reference to some embodiments.

The distributed file system architecture of the embodiment of the present application may also be the architecture shown in fig. 2. As shown in fig. 2, the distributed file system includes: a client cluster 21, a metadata server (MDS) cluster 22, and a data server (DS) cluster 23. The MDS cluster 22 includes a plurality of metadata servers (MDSs) 221, such as metadata server 1, metadata server 2, metadata server 3, …, and metadata server n shown in fig. 2; the DS cluster 23 includes a plurality of data servers (DSs) 231, such as data server 1, data server 2, data server 3, …, and data server m shown in fig. 2; the client cluster 21 includes a plurality of clients 211, such as client 1, client 2, client 3, …, and client p shown in fig. 2; m, n and p are all natural numbers. The distributed file system organizes and manages all of its files through a directory tree. The directory metadata and the file metadata in the DFS are stored evenly and in a distributed manner on the multiple MDSs 221, and each MDS 221 manages and processes the directory metadata or file metadata stored on it. The file content data of the files in the DFS is stored on the plurality of DSs 231. To perform an access operation (e.g., a read) on the file content data of a target file stored on a DS 231, the client 211 accesses at least one MDS 221, obtains the file metadata of the target file from the MDS 221, and then communicates with the DS 231 corresponding to the file metadata. The method for storing the directory metadata and file metadata corresponding to the directory tree of the distributed file system on the MDS cluster 22 is shown in fig. 3.

Fig. 3 is a flowchart of a metadata storage method according to an embodiment of the present application. The execution subject of the embodiment shown in fig. 3 may be any one of the client 211, MDS 221, DS 231 shown in fig. 2. The metadata storage method provided by the application is applied to a distributed file system DFS, and the distributed file system DFS comprises a plurality of metadata servers. As shown in fig. 3, the method includes:

S301, decoupling each directory metadata in the DFS directory tree as follows to obtain access metadata and content metadata corresponding to each directory metadata: decoupling the directory metadata of a specified directory name into access metadata of the specified directory name and content metadata of the specified directory name; the access metadata of the specified directory name comprises the specified directory name and the globally unique identifier of the specified directory name; the content metadata of the specified directory name comprises the directory entry of the specified directory name.

Illustratively, the client 211 decouples each directory metadata in the DFS directory tree as follows, and obtains access metadata and content metadata corresponding to each directory metadata:

decoupling the directory metadata of a specified directory name into access metadata of the specified directory name and content metadata of the specified directory name; the access metadata of the specified directory name comprises the specified directory name and the globally unique identifier of the specified directory name; the content metadata of the specified directory name comprises the directory entry of the specified directory name.

For example, assuming that the directory name is designated as K, the client 211 decouples each directory metadata in the DFS directory tree as follows, and obtains access metadata and content metadata corresponding to each directory metadata:

decoupling the directory metadata of the directory K into access metadata of the directory K and content metadata of the directory K; the access metadata of the directory K comprises the name of the directory K and the global unique identifier of the directory K; the content metadata of directory K includes directory entries of directory K.
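
The decoupling of step S301 can be sketched in Python as follows. The class and field names are hypothetical. The authority information and timestamp fields follow the optional variant of this application, in which the access metadata also carries the authority information and the content metadata also carries the timestamp.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DirectoryMetadata:
    name: str
    dir_id: str                 # globally unique identifier of the directory
    permissions: str            # authority information
    timestamp: float
    entries: Tuple[str, ...]    # names of subfiles and subdirectories


@dataclass(frozen=True)
class AccessMetadata:
    name: str
    dir_id: str
    permissions: str


@dataclass(frozen=True)
class ContentMetadata:
    entries: Tuple[str, ...]
    timestamp: float


def decouple(d: DirectoryMetadata):
    """Split one directory metadata into its access part and its content part."""
    access = AccessMetadata(d.name, d.dir_id, d.permissions)
    content = ContentMetadata(d.entries, d.timestamp)
    return access, content
```

No field is dropped by the split, so the original directory metadata can always be rebuilt from the two parts.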

S302, forming metadata groups from the access metadata and the content metadata corresponding to each directory metadata in the directory tree and from each file metadata as follows: the access metadata of a specified directory name, the content metadata of the parent directory of the specified directory name and the globally unique identifier of the parent directory of the specified directory name form a metadata group; and the content metadata of the specified directory name, the globally unique identifier of the specified directory name, and the access metadata of the subdirectories of the specified directory name and/or the file metadata of its subfiles form a metadata group.

Illustratively, the client 211 groups the access metadata and the content metadata corresponding to each directory metadata in the directory tree, and each file metadata into metadata groups as follows:

the client 211 combines the access metadata of the specified directory name, the content metadata of the parent directory of the specified directory name and the globally unique identifier of the parent directory of the specified directory name into a metadata group; the client 211 also groups the content metadata of the specified directory name, the globally unique identifier of the specified directory name, and the access metadata of the subdirectories of the specified directory name and/or the file metadata of its subfiles into a metadata group.

For example, assuming that a directory name K is specified, the client 211 groups the access metadata and content metadata corresponding to each directory metadata in the directory tree, and each file metadata into metadata groups as follows:

the client 211 combines the access metadata of the directory K, the content metadata of the parent directory of the directory K, and the globally unique identifier of the parent directory of the directory K into a metadata group; the client 211 also groups the content metadata of directory K, the globally unique identifier of directory K, and the access metadata of the subdirectories of directory K and/or the file metadata of the subfiles into a metadata group.
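
The grouping of step S302 can be sketched in Python as below: every group is identified by one directory ID and collects that directory's content metadata together with the access metadata of its subdirectories and the file metadata of its subfiles. The dictionary layout and the function name `build_groups` are assumptions for illustration, and the access metadata of the root directory, which has no parent, is simply left out of this sketch.

```python
from collections import defaultdict


def build_groups(directories, files):
    """Group metadata by directory ID.

    directories: dicts with keys name, dir_id, parent_id (None for root), entries
    files: dicts with keys name, file_id, parent_id
    """
    groups = defaultdict(
        lambda: {"content": None, "child_access": [], "child_files": []}
    )
    for d in directories:
        # Content metadata stays in the group of the directory itself.
        groups[d["dir_id"]]["content"] = {"entries": list(d["entries"])}
        # Access metadata joins the group of the parent directory.
        if d["parent_id"] is not None:
            groups[d["parent_id"]]["child_access"].append(
                {"name": d["name"], "dir_id": d["dir_id"]}
            )
    for f in files:
        # File metadata joins the group of its parent directory.
        groups[f["parent_id"]]["child_files"].append(f)
    return dict(groups)
```

Because the directory entries of a directory and the metadata of everything named by those entries end up in the same group, an operation on one directory's listing only needs that one group.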

S303, storing the plurality of metadata groups of the directory tree in a plurality of metadata servers in a balanced distribution manner.

Illustratively, the client 211 stores a plurality of metadata groups of the directory tree in a balanced distribution in a plurality of metadata servers 221.

Optionally, the client 211 performs consistent hash calculation on the metadata groups and on the metadata servers to obtain the hash value corresponding to each metadata group and each metadata server. Based on these hash values, the client 211 stores each metadata group in the corresponding metadata server 221.

Further, the client 211 performs consistent hash calculation on the metadata group in the following manner to obtain a hash value corresponding to the metadata group:

carrying out consistent hash calculation on a globally unique identifier of a parent directory of the specified directory name (namely, the parent directory ID of the specified directory name) to obtain a hash value of access metadata of the specified directory name;

carrying out consistent hash calculation on the globally unique identifier of the specified directory name (namely the directory ID of the specified directory name) to obtain a hash value of the content metadata of the specified directory name;

and performing consistent hash calculation on the globally unique identifier of the parent directory to which the file metadata belongs (namely the parent directory ID of the file) to obtain the hash value of the file metadata.

For example, assuming that a directory name K is specified, the client 211 performs consistent hash calculation on the metadata group in the following manner to obtain a hash value corresponding to the metadata group:

carrying out consistent hash calculation on the global unique identifier of the parent directory of the directory K (namely the parent directory ID of the directory K) to obtain a hash value of the access metadata of the directory K;

carrying out consistent hash calculation on the global unique identifier of the directory K (namely the directory ID of the directory K) to obtain a hash value of the content metadata of the directory K;

and performing consistent hash calculation on the globally unique identifier of the parent directory to which the file metadata belongs (namely the parent directory ID of the file) to obtain a hash value of the file metadata.
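
A minimal Python sketch of this placement rule follows. The application does not name a hash function or a ring layout, so the use of SHA-256, the virtual nodes, and the names `HashRing` and `placement_id_for_*` are assumptions. What the sketch takes from the text is only the choice of which identifier is hashed for each kind of metadata.

```python
import bisect
import hashlib


def _h(value: str) -> int:
    """Map a string to a point on the ring (SHA-256 is an assumed choice)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    def __init__(self, servers, vnodes=64):
        # Each server is hashed onto the ring at several virtual points.
        self._points = sorted(
            (_h(f"{s}#{i}"), s) for s in servers for i in range(vnodes)
        )
        self._keys = [p for p, _ in self._points]

    def server_for(self, placement_id: str) -> str:
        """Return the first server clockwise from the hash of placement_id."""
        idx = bisect.bisect_right(self._keys, _h(placement_id)) % len(self._keys)
        return self._points[idx][1]


def placement_id_for_access(parent_dir_id):   # access metadata: parent directory ID
    return parent_dir_id


def placement_id_for_content(dir_id):         # content metadata: own directory ID
    return dir_id


def placement_id_for_file(parent_dir_id):     # file metadata: parent directory ID
    return parent_dir_id
```

Since the content metadata of directory K, the access metadata of its subdirectories and the file metadata of its subfiles are all hashed on the ID of directory K, they resolve to the same metadata server.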

Optionally, the client 211 correspondingly stores the metadata groups in the form of key-value pairs in the metadata server 221.

Optionally, the access metadata specifying the directory name further includes permission information specifying the directory name; the content metadata specifying the directory name also includes a timestamp specifying the directory name. The client 211 correspondingly stores the metadata groups in the metadata server 221 by using key value pairs in the following form:

a key value pair consisting of a global unique identifier for specifying a directory name and authority information for specifying the directory name as values and a global unique identifier for specifying a parent directory of the directory name and a specified directory name as keys;

a key value pair consisting of a directory entry specifying a directory name and a timestamp specifying the directory name as values and a globally unique identifier specifying the directory name as a key;

For example, assuming that a directory name K is specified, the access metadata of the directory K further includes authority information of the directory K; the content metadata of catalog K also includes a timestamp of catalog K. The client 211 correspondingly stores the metadata groups in the metadata server 221 by using key value pairs in the following form:

the global unique identifier of the directory K and the authority information of the directory K are used as values, and the global unique identifier of the parent directory of the directory K and the name of the directory K are used as keys to form a key value pair;

the key value pair is formed by taking the directory item of the directory K and the timestamp of the directory K as values and taking the global unique identifier of the directory K as a key;
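The two key-value forms listed above can be sketched as follows. This is a minimal illustration under assumptions: the class names, field names, and the integer permission encoding are hypothetical, and only the two forms stated above are modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessValue:
    """Value of the access-metadata pair: directory ID plus permission information."""
    dir_id: str
    permissions: int  # assumed encoding, e.g. POSIX-style mode bits


@dataclass
class ContentValue:
    """Value of the content-metadata pair: directory entries plus a timestamp."""
    entries: list
    timestamp: float


def decouple(parent_id, name, dir_id, permissions, entries, timestamp):
    """Split one directory's metadata into its two key-value pairs."""
    # Key: (parent directory ID, directory name) -> Value: (directory ID, permissions)
    access_kv = ((parent_id, name), AccessValue(dir_id, permissions))
    # Key: directory ID -> Value: (directory entries, timestamp)
    content_kv = (dir_id, ContentValue(list(entries), timestamp))
    return access_kv, content_kv
```

For a directory K under the root, `decouple("id-root", "K", "id-K", 0o755, ["f1"], 1.0)` yields one pair keyed by `("id-root", "K")` and one pair keyed by `"id-K"`.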

Exemplarily, assuming that a global directory tree view corresponding to a directory tree of a distributed file system R is shown in fig. 1, and directory metadata and file metadata corresponding to the directory tree are shown in table 1, the distributed file system R includes four metadata servers: metadata server 1, metadata server 2, metadata server 3, metadata server 4. When the metadata storage methods provided by the prior art and the present application are respectively used to store the directory metadata and the file metadata of the distributed file system R in the metadata server 1, the metadata server 2, the metadata server 3, and the metadata server 4, the respective storage statuses are respectively shown in table 2 and table 3.

Table 2 metadata storage case using prior art

Table 3 metadata storage using the method provided in the present application

The directory tree corresponding to the directory metadata and the file metadata before the new file f3 is created is the directory tree corresponding to the global directory tree view shown in fig. 1.

After the metadata of the distributed file system R is stored using the prior art, a directory list operation performed on the distributed file system R, such as creating the new file f3 shown in table 2 above, involves distributed protocol communication and coordination between at least the metadata server 2 and the metadata server 3.

After the metadata of the distributed file system R is stored using the metadata storage method provided by the present invention, a directory list operation performed on the distributed file system R, such as creating the new file f3 shown in table 3 above, is carried out on only one metadata server, namely the metadata server 4, because the file metadata of the file f3 and the hierarchical mapping relationship between that file metadata and the directory d3 only need to be handled on that one metadata server. No distributed protocol communication or coordination among multiple metadata servers is involved.

The following describes, with reference to a specific example, an operation effect of the distributed file system after the metadata storage is performed by using the metadata storage method provided by the present invention. Assuming that a system architecture of a distributed file system Q is shown in fig. 2, the distributed file system Q stores metadata by using the metadata storage method provided by the present invention. The metadata storage of the distributed file system Q is shown in table 3. The directory tree of the distributed file system Q is the directory tree corresponding to the global directory tree view shown in fig. 1. When the client 1 wants to access the file f2, the operation flow of the distributed file system Q is as follows:

the client 1 performs a consistent hash operation on the root directory ID to obtain the code (such as MDS-1) of the metadata server where the access metadata of the directory d1 is located, namely the metadata server 1; based on MDS-1, the client 1 sends access request 1 (such as root directory ID, d1, user ID) to the metadata server 1;

the metadata server 1 finds the access metadata of the directory d1 based on the root directory ID and d1 in the access request 1, and checks whether the user corresponding to the user ID owns the rights of the directory d1 based on the user ID in the access request 1. If the check result is authorized, the metadata server 1 returns the IDd1 and an acknowledgement signal (such as an ACK signal) to the client 1; if the check result is no authority, the metadata server 1 returns a non-acknowledgement signal (e.g., NACK signal) to the client 1 to terminate the access request 1 of the client 1. It is assumed in this example that the metadata server 1 returns an IDd1 and an ACK signal to the client 1.

Next, the client 1 performs a consistent hash operation on the IDd1 to obtain the code (such as MDS-2) of the metadata server where the access metadata of the directory d3 is located, namely the metadata server 2; the client 1 sends an access request 2 (e.g., IDd1, d3, user ID) to the metadata server 2 based on MDS-2;

next, the metadata server 2 finds the access metadata of the directory d3 based on the IDd1 and d3 in the access request 2, and checks whether the user corresponding to the user ID has the authority for the directory d3 based on the user ID in the access request 2. If the check result is that the user has the authority, the metadata server 2 returns the IDd3 and the ACK signal to the client 1;

next, the client 1 performs a consistent hash operation on the IDd3 to obtain the code (such as MDS-4) of the metadata server where the file metadata of the file f2 is located, namely the metadata server 4; the client 1 sends an access request 3 (e.g., IDd3, f2, user ID) to the metadata server 4 based on MDS-4;

next, the metadata server 4 finds the file metadata of the file f2 based on the IDd3 and f2 in the access request 3, and checks whether the user corresponding to the user ID has the authority for the file f2 based on the user ID in the access request 3. If the check result is that the user has the authority, the metadata server 4 returns the file metadata of the file f2 and an ACK signal to the client 1;

finally, the client 1 can communicate with the data server storing the file content data of the file f2 based on the f2 content storage information in the file metadata of the file f2, and read the file content data of the file f2 stored on the data server.
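The access flow above can be sketched as a client-side loop that issues one request per path component. This is a simplified illustration under assumptions: `locate` uses a plain hash-modulo mapping as a stand-in for the consistent hash, servers are modelled as in-memory dictionaries, and the `Cluster` class, the `allowed` field, and the `location` string are hypothetical.

```python
import hashlib

SERVERS = ["MDS-1", "MDS-2", "MDS-3", "MDS-4"]


def locate(id_: str) -> str:
    """Stand-in for the consistent hash: maps an ID to a metadata server code."""
    digest = hashlib.sha256(id_.encode()).digest()
    return SERVERS[digest[0] % len(SERVERS)]


class Cluster:
    """In-memory model of the MDS cluster; each server is a key-value store."""

    def __init__(self):
        self.stores = {s: {} for s in SERVERS}
        self.requests = []  # log of (server, key) pairs, one per client request

    def put(self, parent_id, name, value):
        self.stores[locate(parent_id)][(parent_id, name)] = value

    def get(self, parent_id, name, user):
        server = locate(parent_id)
        self.requests.append((server, (parent_id, name)))
        value = self.stores[server].get((parent_id, name))
        if value is None or user not in value["allowed"]:
            return None  # corresponds to a NACK
        return value  # corresponds to an ACK plus the requested metadata


def resolve(cluster, root_id, path, user):
    """Walk the path one component at a time, one targeted request per component."""
    current, value = root_id, None
    for name in path.strip("/").split("/"):
        value = cluster.get(current, name, user)
        if value is None:
            return None  # missing entry or no authority: the request is terminated
        current = value.get("id", current)  # directories return their ID
    return value


cluster = Cluster()
cluster.put("id-root", "d1", {"id": "id-d1", "allowed": {"alice"}})
cluster.put("id-d1", "d3", {"id": "id-d3", "allowed": {"alice"}})
cluster.put("id-d3", "f2", {"location": "DS-7:/blocks/42", "allowed": {"alice"}})
```

Calling `resolve(cluster, "id-root", "/d1/d3/f2", "alice")` issues three requests, each sent directly to the server selected by hashing the current directory ID, with no server-to-server communication in the model.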

Therefore, after the metadata storage method provided by the application is adopted to store the metadata of the distributed file system, frequent directory list operations no longer adversely affect the performance (such as throughput performance) of the distributed file system. The metadata storage method provided by the application thus solves the prior-art problem that the performance (such as throughput performance) of the whole DFS is low due to frequent directory list operations. In addition, in a distributed file system adopting the metadata storage method provided by the application, when a client accesses the file content data of a file, the client only needs to perform a consistent hash operation and then communicate directly with the relevant metadata server; no information interaction between metadata servers is needed, which greatly improves the performance of the whole distributed file system when multiple clients concurrently access file content data.

The embodiment of the application also provides a distributed file system. Fig. 4 is a diagram of a distributed file system architecture provided by an embodiment of the present application. Fig. 5 is a schematic diagram of a directory orphan loop provided in an embodiment of the present application. The distributed file system provided by the present application is described below with reference to fig. 4 and 5. As shown in fig. 4, the distributed file system includes: a plurality of clients 211, a plurality of metadata servers 221, and a plurality of data servers 231.

The client 211 is configured to send an access request to the metadata server 221 to access the metadata server 221, and obtain a request result returned by the metadata server 221; and is further configured to access, based on the request result, the file content data stored by the data server 231 corresponding to the request result;

a metadata server 221, configured to store directory metadata and file metadata of the distributed file system by using the method provided by the present application (e.g., the method provided in the embodiment shown in fig. 3); and further configured to perform metadata processing on a metadata group stored by the metadata server 221 based on an access request of the client 211, obtain a request result, and return the request result to the client 211;

metadata server 221, further configured to maintain consistency of client 211 cached items with metadata of the distributed file system by:

after receiving the access request from the client 211, the metadata server 221 checks the validity of the cache entry of the client 211 corresponding to the access request, obtains a check result, and performs the following consistency maintenance operation based on the check result:

if the check result is that the cache item of the client 211 corresponding to the access request is valid, returning a request result corresponding to the access request and an invalidation instruction for invalidating the invalidated cache item in the cache of the client 211 to the client;

if the check result is that the cache entry of the client 211 corresponding to the access request is invalid, a termination instruction for characterizing termination of the access request and an invalidation instruction for invalidating the invalid cache entry in the client cache are returned to the client 211.

Illustratively, the technical effect of the manner for maintaining consistency of the client 211 cache items and the metadata of the distributed file system provided by the embodiment of the present application is described as follows:

in the prior art, the metadata server 221 maintains the consistency between the cache entries of the client 211 and the metadata of the distributed file system as follows: whenever the directory tree in the distributed file system changes, that is, whenever at least one metadata server 221 performs a directory change operation that changes the directory tree (such as a directory renaming operation, a directory list operation, or a directory authority modification operation), the metadata server 221 that performed the directory change operation sends, after the operation is completed, an invalidation instruction to every client 211 to invalidate the invalidated cache entries in that client 211. In other words, each time a metadata server 221 performs a directory change operation, it sends an invalidation instruction to every client 211, and each time a client 211 receives the invalidation instruction, it performs an invalidation operation on the corresponding cache entry. When directory change operations are frequent, the overhead incurred across the whole distributed file system by the metadata servers 221 in maintaining this consistency is large.

In practical applications, not every cache entry in the cache of a client 211 is used frequently, and in most client 211 caches some cache entries are never used again after being cached. For such unused cache entries, consistency maintenance is unnecessary. Therefore, in the manner provided in the embodiment of the present application, validity checking and consistency maintenance are performed on a cache entry only when the client 211 actually uses it. This avoids the overhead of the metadata server 221 maintaining consistency for cache entries that are never used, without affecting the usability of the client 211.
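The check-on-use approach described above can be sketched as follows. This is a minimal illustration under an assumption the application does not state: validity is tracked here with a per-entry version counter that the client sends along with each request. The class, method, and field names are hypothetical.

```python
class MetadataServer:
    """Validates client cache entries lazily, only when a request arrives."""

    def __init__(self):
        self.data = {}             # key -> metadata value
        self.current_version = {}  # key -> latest version of that entry
        self.pushes_sent = 0       # stays 0: no broadcast on directory changes

    def update(self, key, value):
        # A directory change bumps the version; clients are NOT notified here.
        self.data[key] = value
        self.current_version[key] = self.current_version.get(key, 0) + 1

    def handle(self, key, cached):
        """cached maps each cache entry the request relied on to its cached version."""
        stale = [k for k, v in cached.items() if self.current_version.get(k, 0) != v]
        if stale:
            # Invalid cache entry: terminate the request and tell the client what to drop.
            return {"status": "ABORT", "invalidate": stale}
        # Valid cache entry: serve the request.
        return {"status": "OK", "result": self.data.get(key), "invalidate": []}
```

In this sketch a client whose cached entry for a directory is out of date learns about it only when it next uses that entry, so entries that are never used again cost the server nothing.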

Optionally, the distributed file system provided by the present application further includes: a coordination server 41.

The coordination server 41 is configured to identify a directory orphan loop in a directory tree of the distributed file system when the multiple clients 211 concurrently perform a directory renaming operation, and send an abort instruction to abort the directory renaming operation to the clients 211 when the directory orphan loop is identified.

An example of the coordination server 41 identifying a directory orphan loop, and sending an abort instruction to the client 211 to abort the directory renaming operation when the directory orphan loop is identified, is described as follows:

illustratively, as shown in fig. 5, fig. 5a is a directory tree path diagram (i.e., a hierarchical mapping relationship walk diagram) of the distributed file system S before a directory renaming operation, where the directory tree path diagram includes the following two paths: root directory (i.e., directory denoted by "/" in fig. 5) → directory a → directory B → directory C, and root directory → directory D → directory E → directory F; fig. 5b and 5c are path diagrams of the client M and the client N performing directory renaming operation on the distributed file system S at the same time respectively; fig. 5d is a global directory tree view formed after the client M and the client N perform directory renaming operations on the distributed file system S concurrently when the system architecture of the distributed file system S is the system architecture shown in fig. 2.

As shown in fig. 5, if the system architecture of the distributed file system S is the system architecture shown in fig. 2, when the client M and the client N perform directory renaming operations on the distributed file system S at the same time, each of the client M and the client N can only check whether its own directory renaming operation causes a directory orphan loop, and cannot know whether the concurrent directory renaming operations of different clients together cause one. When the client M and the client N simultaneously perform the directory renaming operations shown in fig. 5b and 5c, respectively, the directory orphan loop B → C → E → F → B shown in fig. 5d occurs in the distributed file system S. As a result, the directory metadata related to the directory orphan loop, or the directory metadata and file metadata related to it, are lost, and the directory metadata of the directory B, the directory C, the directory E, and the directory F can no longer be reached by path resolution when the distributed file system S operates.

If the system architecture of the distributed file system S is the system architecture shown in fig. 4, then when multiple clients 211 concurrently perform directory renaming operations, for example when the client M and the client N simultaneously perform the directory renaming operations shown in fig. 5b and 5c on the distributed file system S, the coordination server 41 identifies the directory orphan loop and, when the directory orphan loop is identified, sends an abort instruction for aborting the directory renaming operation to the clients 211. Illustratively, before the results of the directory renaming operations of the client M and the client N take effect, the client M and the client N verify through the coordination server 41 whether their simultaneous directory renaming operations together cause a directory orphan loop. If the coordination server 41 verifies that they do, the coordination server 41 sends an abort instruction to the client M and the client N, respectively, to abort their directory renaming operations and prevent the results from taking effect. If the coordination server 41 verifies that they do not, the coordination server 41 sends a confirmation instruction to the client M and the client N, respectively, to allow the results of their directory renaming operations to take effect.

The coordination server 41 identifies a directory orphan loop in a directory tree formed when the directory renaming operation is concurrently performed on the multiple clients 211 in the distributed file system by the following method: the coordination server 41 locally maintains a directory renaming directed graph, where the directory renaming directed graph records a source path and a destination path of a currently executed directory renaming operation, and the coordination server 41 identifies a directory orphan loop formed when the multiple clients 211 concurrently perform the directory renaming operation by identifying the directory orphan loop in the directory renaming directed graph.

Optionally, during the directory renaming operation performed by the client 211, the source path and the destination path of the directory renaming operation are saved in the directory renaming directed graph by the coordination server 41, and are deleted by the coordination server 41 after the directory renaming operation is completed (e.g., after the coordination server 41 sends an abort instruction or a confirm instruction to the client 211).
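One way to picture the directory renaming directed graph is the sketch below. It rests on assumptions not spelled out in the application: an edge is drawn from rename a to rename b when the destination path of a lies inside the source subtree of b, a cycle of such edges is treated as a directory orphan loop, and only the later-arriving rename is aborted (a simplification of aborting all involved clients). The class name, method names, and concrete paths are hypothetical.

```python
def is_within(path: str, ancestor: str) -> bool:
    """True if path equals ancestor or lies inside ancestor's subtree."""
    return path == ancestor or path.startswith(ancestor.rstrip("/") + "/")


class RenameCoordinator:
    """Tracks in-flight renames and rejects one that would close a loop."""

    def __init__(self):
        self.in_flight = {}  # op_id -> (source path, destination path)

    def _creates_loop(self, op_id) -> bool:
        # Depth-first search over edges "destination of a is inside source of b".
        stack, seen = [op_id], set()
        while stack:
            a = stack.pop()
            dst_a = self.in_flight[a][1]
            for b, (src_b, _) in self.in_flight.items():
                if is_within(dst_a, src_b):
                    if b == op_id:
                        return True  # walked back to the starting rename: a loop
                    if b not in seen:
                        seen.add(b)
                        stack.append(b)
        return False

    def request(self, op_id, src, dst) -> str:
        self.in_flight[op_id] = (src, dst)
        if self._creates_loop(op_id):
            del self.in_flight[op_id]  # paths are removed once the verdict is sent
            return "ABORT"
        return "CONFIRM"

    def complete(self, op_id):
        # Called after a confirmed rename has taken effect.
        self.in_flight.pop(op_id, None)
```

With hypothetical paths matching the two trees of fig. 5a, a first request moving `/A/B` to `/D/E/F/B` is confirmed, and a concurrent request moving `/D/E` to `/A/B/C/E` is aborted because together they would form the loop B → C → E → F → B.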

Optionally, if no directory orphan loop is formed when the multiple clients 211 perform directory renaming operations concurrently, then after the coordination server 41 sends a confirmation instruction to the clients 211 that perform the directory renaming operations, the coordination server 41 stores each of the directory renaming operations performed concurrently by those clients 211 and generates an incrementing version number for each directory renaming operation. The coordination server 41 then sends the operation version number and the operation information of each directory renaming operation to the invalidation list of each metadata server 221. The operation information of a directory renaming operation comprises an operation type, a source path, and a destination path. After receiving the operation version number and the operation information sent from the coordination server 41, the metadata server 221 associated with the directory renaming operation processes the target metadata according to them. For example, the metadata server corresponding to the source path of the directory renaming operation reads and deletes the access metadata key-value pair of the renamed directory, and the metadata server corresponding to the destination path creates the access metadata key-value pair of the renamed directory.

Optionally, when the client 211 accesses the metadata server 221, the metadata server 221 may use the operation version numbers and operation information of the directory renaming operations on its stored invalidation list to check whether the cache entry of the client 211 corresponding to the access request is valid. For example, the metadata server 221 may compare the operation version number and the operation information on the invalidation list with the cache entry of the client 211 corresponding to the access request. If the comparison result is consistent, the cache entry is valid; if the comparison result is inconsistent, the cache entry has become invalid.
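The invalidation list mechanism can be sketched as follows. The comparison rule is an assumption made for illustration, since the application does not define it in detail: a cache entry is treated as invalid if a rename newer than the version the client cached under has a source path that covers the cached path. All class, field, and method names are hypothetical.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class RenameOp:
    """Operation version number plus operation information."""
    version: int
    op_type: str
    src: str
    dst: str


class MetadataServer:
    def __init__(self):
        self.invalidation_list = []  # RenameOp records received from the coordinator

    def cache_entry_valid(self, cached_path: str, cached_version: int) -> bool:
        # Assumed rule: any newer rename whose source covers the path invalidates it.
        for op in self.invalidation_list:
            covers = cached_path == op.src or cached_path.startswith(op.src + "/")
            if op.version > cached_version and covers:
                return False
        return True


class Coordinator:
    def __init__(self, servers):
        self.servers = servers
        self._versions = count(1)  # incrementing operation version numbers

    def publish(self, src: str, dst: str) -> RenameOp:
        # Assign the next version and append the record to every server's list.
        op = RenameOp(next(self._versions), "rename", src, dst)
        for server in self.servers:
            server.invalidation_list.append(op)
        return op
```

Here a client that cached `/A/B/C` before the rename of `/A/B` was published would be told its entry is invalid on its next access, while an entry cached at or after that version, or for an unrelated path, still passes the check.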

The distributed file system provided by the embodiment of the application not only has the technical principle and technical effect of the embodiment shown in fig. 3, but also reduces the network overhead of the metadata server for maintaining the consistency of the cache items of the client. The distributed file system provided by the embodiment of the application also solves the problem that directory orphan loops are formed by multiple clients concurrently performing directory renaming operation in the prior art, so that directory metadata related to the directory orphan loops or directory metadata and file metadata related to the directory orphan loops are lost.

The embodiment of the application also provides a server. Fig. 6 is a diagram of a server structure provided in the embodiment of the present application. As shown in fig. 6, the server includes a processor 61 and a memory 62, and the memory 62 stores executable instructions of the processor 61, so that the processor 61 can be used to execute the technical solution of the foregoing method embodiment, and the implementation principle and the technical effect are similar, which are not described herein again. It should be understood that the Processor 61 may be a Central Processing Unit (CPU), other general-purpose processors, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of the hardware and software modules within the processor. The Memory 62 may include a high-speed Random Access Memory (RAM), a Non-volatile Memory (NVM), at least one disk Memory, a usb disk, a removable hard disk, a read-only Memory, a magnetic disk, or an optical disk.

The embodiment of the present application also provides a storage medium in which computer-executable instructions are stored; when the computer-executable instructions are executed by a processor, the metadata storage method provided by the present application is implemented. The storage medium may be implemented by any type of volatile or non-volatile storage device or combination thereof, such as a Static Random Access Memory (SRAM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), a magnetic memory, a flash memory, a magnetic disk, or an optical disk. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.

An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an Application Specific Integrated Circuit (ASIC). Of course, the processor and the storage medium may also reside as discrete components in an electronic device or host device.

Embodiments of the present application further provide a program product, such as a computer program, which when executed by a processor, implements the metadata storage method covered by the present application.

Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the embodiments of the present invention.

Claims

1. A metadata storage method is applied to a Distributed File System (DFS), and the DFS comprises a plurality of metadata servers;

the method comprises the following steps:

decoupling each directory metadata in the DFS directory tree as follows to obtain access metadata and content metadata corresponding to each directory metadata: decoupling the directory metadata specifying the directory name into access metadata specifying the directory name and content metadata specifying the directory name; wherein the access metadata for the specified directory name comprises the specified directory name, a globally unique identifier for the specified directory name; the content metadata of the specified directory name comprises a directory entry of the specified directory name;

2. The method of claim 1, wherein storing the plurality of metadata groups of the directory tree in a distributed manner in the plurality of metadata servers comprises:

3. The method of claim 2, wherein performing the consistent hash calculation on the metadata set to obtain the hash value corresponding to the metadata set comprises:

4. The method of claim 2, wherein storing the metadata set correspondence in the metadata server comprises:

5. The method of claim 4, wherein the access metadata specifying a directory name further comprises permission information specifying a directory name; the content metadata specifying the directory name further includes a timestamp specifying the directory name;

a key value pair consisting of a global unique identifier of the specified directory name and authority information of the specified directory name as values and a global unique identifier of a parent directory of the specified directory name and the specified directory name as keys;

6. A distributed file system, comprising: a plurality of clients, a plurality of metadata servers, a plurality of data servers; wherein,

the metadata server is used for storing directory metadata and file metadata of the distributed file system using the method of any of claims 1-5; and is further used for performing metadata processing on the metadata group stored in the metadata server based on the access request of the client to obtain a request result, and returning the request result to the client;

the metadata server is further configured to maintain consistency of the client-side cached items with metadata of the distributed filesystem by:

7. The distributed file system in accordance with claim 6, the system further comprising: a coordination server;

the coordination server is used for identifying a directory orphan loop in a directory tree of the distributed file system when a plurality of clients perform directory renaming operation concurrently, and sending an abort instruction for aborting the directory renaming operation to the clients when the directory orphan loop is identified;

wherein the directory orphan loop is an isolated mapping loop formed by mapping relations between a plurality of directory metadata or an isolated mapping loop formed by mapping relations between a plurality of directory metadata and file metadata.

8. A server, comprising:

a processor and a memory;

the memory stores executable instructions executable by the processor;

wherein execution of the executable instructions stored by the memory by the processor causes the processor to perform the method of any of claims 1-5.

9. A storage medium having stored therein computer executable instructions for performing the method of any one of claims 1-5 when executed by a processor.

10. A program product comprising a computer program which, when executed by a processor, carries out the method of any one of claims 1 to 5.