CN113515487B

CN113515487B - Directory query method, computing device and distributed file system

Info

Publication number: CN113515487B
Application number: CN202111040865.7A
Authority: CN
Inventors: 李立帅; 李红; 张天旭; 郝志敏; 汪权; 韦新伟; 蒋维
Original assignee: Lenovo Netapp Technology Ltd
Current assignee: Lenovo Netapp Technology Ltd
Priority date: 2021-09-07
Filing date: 2021-09-07
Publication date: 2021-11-19
Anticipated expiration: 2041-09-07
Also published as: CN113515487A

Abstract

A method, computing device, and distributed file system for querying metadata are provided. The method comprises: in response to a snapshot creation indication sent by a metadata server based on a query request, creating, by a metadata database, a snapshot for a directory to be queried, and returning partition identification information and snapshot identification information to a client through the metadata server, wherein the partition identification information identifies at least one storage partition used for storing metadata of files of the directory to be queried, the snapshot identification information identifies the created snapshot, and the snapshot comprises a metadata storage file set used for storing the metadata of the files under the directory to be queried in the at least one storage partition; sending, by the client, a snapshot analysis request and the two pieces of identification information to the metadata database; and determining, by the metadata database, the snapshot to be analyzed based on the two pieces of identification information, and analyzing the determined snapshot in response to the snapshot analysis request, so as to obtain the metadata of the files under the directory to be queried from the metadata storage file set.

Description

Directory query method, computing device and distributed file system

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a method for querying a directory of a distributed file system, a computing device, and a distributed file system.

Background

In a distributed file system, metadata is an important unit for describing the file system, and includes basic attribute information and directory structure information of files. In most typical applications of distributed file systems, such as the internet and scientific computing, query requests for metadata account for more than half of all requests. In some business scenarios, such as big data processing or machine learning, a huge number of small files may be generated, and these small files may be associated with one or more directories. There is correspondingly a huge amount of metadata for these small files. Therefore, for access operations on a distributed file system, it is important to be able to quickly query and process a directory containing a large number of small files in order to obtain the metadata of the small files under the queried directory.

Therefore, the efficiency of querying the directory in the distributed file system has an extremely important influence on the performance of the entire file system, and a method for improving the performance of querying the directory in the distributed file system is needed.

Disclosure of Invention

According to an aspect of the present disclosure, there is provided a method of querying metadata in a distributed file system, the distributed file system including a metadata repository, comprising: in response to a snapshot creation indication from a first entity, creating a snapshot for a directory to be queried, wherein the directory to be queried corresponds to at least one storage partition in the metadata base, the snapshot includes a metadata storage file set corresponding to the directory to be queried, and the metadata storage file set includes a plurality of metadata storage files in the at least one storage partition for storing metadata of files under the directory to be queried; sending, by a first entity, partition identification information and snapshot identification information identifying a storage partition in the metadata base corresponding to the directory to be queried to a second entity serving as a query requester, where the storage partition corresponding to the directory to be queried is a storage partition in the metadata base for storing metadata of files of the directory to be queried; obtaining a snapshot analysis request, the partition identification information, and the snapshot identification information from the second entity; determining a snapshot to be analyzed based on the partition identification information and the snapshot identification information, and analyzing the snapshot to be analyzed in response to the snapshot analysis request, so as to obtain metadata of files in the directory to be queried from a metadata storage file set included in the snapshot to be analyzed.

According to an aspect of the present disclosure, there is provided a method of querying metadata in a distributed file system, the distributed file system including a metadata database, comprising: sending a query request for a directory to be queried to a metadata server; obtaining partition identification information and snapshot identification information from the metadata server, wherein the partition identification information identifies at least one storage partition in the metadata database for storing metadata of files of the directory to be queried, the snapshot identification information identifies a snapshot created for the directory to be queried, and the snapshot includes a metadata storage file set composed of a plurality of metadata storage files in the at least one storage partition for storing metadata of files under the directory to be queried; sending the partition identification information, the snapshot identification information, and a snapshot analysis request to the metadata database, wherein the snapshot analysis request is used for triggering the metadata database to analyze a snapshot determined based on the partition identification information and the snapshot identification information so as to obtain metadata of the files under the directory to be queried; and acquiring the metadata of the files under the directory to be queried from the metadata database.

According to an aspect of the present disclosure, there is provided a method of querying metadata in a distributed file system, the distributed file system including a metadata repository, comprising: acquiring a query request aiming at a directory to be queried; sending a snapshot creating instruction aiming at creating a snapshot for the directory to be queried to the metadata base in response to the query request; obtaining partition identification information and snapshot identification information from the metadata base, wherein the partition identification information identifies at least one storage partition used for storing metadata of files of the directory to be queried in the metadata base, the snapshot identification information identifies a snapshot created for the directory to be queried, and the snapshot includes a metadata storage file set composed of a plurality of metadata storage files used for storing metadata of files under the directory to be queried in the at least one storage partition; and sending the partition identification information and the snapshot identification information to a requester sending the query request, wherein the partition identification information and the snapshot identification information are sent to a metadata base by the requester for determining a snapshot to be analyzed to obtain metadata of the file under the directory to be queried.

According to an aspect of the present disclosure, there is provided a method of querying metadata in a distributed file system, the distributed file system including a metadata repository, comprising: sending, by a metadata server, a snapshot creation indication to the metadata repository in response to the query request from a client for a directory to be queried; creating, by the metadata base, a snapshot for the directory to be queried in response to the snapshot creation indication, and returning partition identification information and snapshot identification information to the client via the metadata server, where the partition identification information identifies at least one storage partition in the metadata base for storing metadata of files of the directory to be queried, and the snapshot identification information identifies the created snapshot, where the snapshot includes a metadata storage file set composed of a plurality of metadata storage files in the at least one storage partition for storing metadata of files under the directory to be queried; sending, by the client, a snapshot analysis request, the partition identification information, and the snapshot identification information to the metadata repository; determining, by the metadata base, a snapshot to be analyzed based on the partition identification information and the snapshot identification information, and analyzing the snapshot to be analyzed in response to the snapshot analysis request to obtain metadata of the file under the directory to be queried from the metadata storage file set.

According to another aspect of the present disclosure, there is provided a distributed file system including: a client; a metadata server; and a metadata database, wherein the client, the metadata server and the metadata database are respectively configured to execute the above methods for querying metadata in the distributed file system.

According to another aspect of the present disclosure, there is provided a computing device comprising: a processor, a memory including one or more computer program modules; wherein the one or more computer program modules are configured to be executed by the processor, the one or more computer program modules comprising instructions for implementing the method for querying metadata in a distributed file system as described above.

Drawings

For the purpose of illustrating the principles of the present disclosure, embodiments of the present disclosure will be described in conjunction with the appended drawings. It should be understood that the elements shown in the figures may be implemented in various forms of hardware, software or combinations thereof. Alternatively, these elements are implemented in a combination of hardware and software on one or more appropriately programmed general-purpose computer devices.

Fig. 1 shows a schematic diagram of a process of metadata writing of a Key-Value database.

Fig. 2 shows the format of an SST file.

FIG. 3A illustrates a process diagram for querying metadata of files of a directory.

FIG. 3B illustrates a process diagram for querying metadata of files of a directory according to an embodiment of the disclosure.

Fig. 4 shows a flowchart of a method of querying metadata of a file of a directory according to an embodiment of the present disclosure.

FIG. 5 illustrates a flow diagram of another method of querying metadata of files under a directory in a distributed file system in accordance with an embodiment of the present disclosure.

FIG. 6 illustrates a flow diagram of yet another method of querying metadata of files under a directory in a distributed file system in accordance with an embodiment of the present disclosure.

Fig. 7 shows two snapshots created at two different points in time for a directory to be queried.

Fig. 8 shows a schematic diagram of partitioning by column family.

FIG. 9 shows a schematic diagram of a client-side interaction process with a key-value store, according to an embodiment of the disclosure.

Fig. 10 is a block diagram illustrating a structure of an apparatus for querying metadata of a file of a directory according to an embodiment of the present disclosure.

Fig. 11 is a block diagram illustrating a structure of another apparatus for querying metadata of a file of a directory according to an embodiment of the present disclosure.

Fig. 12 is a block diagram illustrating a structure of still another apparatus for querying metadata of a file of a directory according to an embodiment of the present disclosure.

FIG. 13 shows a block diagram of a computing device.

Detailed Description

Embodiments of the present disclosure will now be described more fully hereinafter with reference to the accompanying drawings, in which embodiments of the disclosure are shown. This disclosure may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Like numbers refer to like elements throughout.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

Unless otherwise defined, terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. The terms used herein should be interpreted as having a meaning that is consistent with their meaning in the context of this specification and the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

The present disclosure is described below with reference to block diagrams and/or flowchart illustrations of methods, apparatus (systems) and/or computer program products according to embodiments of the disclosure. It will be understood that one block of the block diagrams and/or flowchart illustrations, and combinations of blocks, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computing device, special purpose computing device, and/or other programmable data processing apparatus, such that the instructions, which execute via the processor of the computing device and/or other programmable data processing apparatus, create means for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks.

The distributed file system is composed of a plurality of servers and a plurality of clients. The servers may be logically divided by function into metadata servers and data servers. The data server manages the data of the files, while the metadata server manages the metadata of the files. The metadata server need not store the metadata locally; it may instead store the metadata in a database bound to the metadata server. A distributed file system typically includes multiple directories, which may include sub-directories, sub-sub-directories, and so on, and each such directory may contain one or more files. In the present disclosure, the directory to be queried, or the designated directory referred to hereinafter, may be any directory in the distributed file system.

One method for storing metadata of files in a distributed file system uses a tree structure, in which the metadata of the files under a directory is stored in the nodes of, for example, a binary tree. This method must rebalance the binary tree whenever files are added or deleted, and must visit every node on the path from the root toward the leaves when the directory is accessed. As the directory grows and the tree becomes deeper, the search path needed for a query becomes longer, which often causes larger access delays.

Another method for storing metadata of files in a distributed file system uses a Key-Value (KV) database. Although the present disclosure uses the example of a key-value database storing metadata for ease of description, those skilled in the art will appreciate that other metadata databases (including processing devices or servers) that store metadata in an unstructured, ordered, index-based storage format may also be used, and may perform the operations related to the key-value database described above and below in the present disclosure.

The key-value database stores the metadata of each file in a directory as unstructured key-value pair data: a key is used as an index, and operations such as reading, writing, and deleting are performed on the metadata record (the value) corresponding to that key. Compared with a tree structure, storing metadata in a key-value database makes programming and execution simpler and saves storage space, because unstructured, ordered key-value pairs are used.

A key-value database is mainly composed of three parts: the Memtable, WAL files, and persisted SST files (stored on the hard disk). In the present disclosure, a key-value database is employed to store the metadata of files under a directory. In this case the metadata is organized in order in a key-value format and may take the form of character strings. The writing process may specifically be as shown in fig. 1.

In FIG. 1, the Memtable is an in-memory data structure, implemented as a skip list, to which newly written metadata is written first. Once the written metadata reaches the capacity limit set for the Memtable, the Memtable becomes an Immutable Memtable, in preparation for being merged into SST files later.

The WAL (write-ahead log) file is written before the Memtable: metadata is first written to the WAL file and only then to the Memtable, and the WAL is written sequentially in append mode. The WAL makes it possible to recover in-memory data that would otherwise be lost when a machine goes down.

The SST file is an on-disk data storage file with a predetermined format. SST files are organized hierarchically into levels 0 to N, and each level contains a plurality of SST files. The total size of the SST files in a level grows by a multiple from one level to the next. Data within an SST file is ordered. The SST files of Level 0 are generated by directly dumping the Immutable Memtable, and the SST files of the other levels are generated by merging files of the level above with files of the current level. SST files are written sequentially during merging; once generated, an SST file is never modified and can only be deleted in a subsequent merge.
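The write path described above (WAL first, then Memtable, then a dump to a Level 0 SST file) can be illustrated with a minimal sketch. This is a toy model, not the database's actual implementation: the class `TinyKV`, the entry-count capacity `memtable_limit`, and the use of in-memory lists in place of real files are all assumptions made for illustration, and merging between levels is omitted.

```python
import json


class TinyKV:
    """Toy model of the write path: WAL append, Memtable write, flush to an SST."""

    def __init__(self, memtable_limit=3):
        self.memtable_limit = memtable_limit  # hypothetical capacity, counted in entries
        self.wal = []       # stands in for the append-only WAL file
        self.memtable = {}  # stands in for the in-memory skip list
        self.level0 = []    # each element is one immutable, sorted "SST file"

    def put(self, key, value):
        self.wal.append(json.dumps([key, value]))  # 1. sequential append to the WAL
        self.memtable[key] = value                 # 2. write to the Memtable
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # The full Memtable is frozen and dumped as one sorted Level 0 SST.
        sst = tuple(sorted(self.memtable.items()))
        self.level0.append(sst)
        self.memtable = {}
        self.wal = []  # simplification: flushed entries no longer need the WAL

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for sst in reversed(self.level0):  # newest SST first
            for k, v in sst:
                if k == key:
                    return v
        return None


kv = TinyKV(memtable_limit=3)
for name in ["c.txt", "a.txt", "b.txt", "d.txt"]:
    kv.put(name, {"size": len(name)})
```

After the four writes, the first three entries sit sorted in one Level 0 SST, while `d.txt` is still only in the Memtable and the WAL.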

For example, for each node (corresponding to each file), key-value (KV) pair data is created for its metadata. The key represents information about the corresponding node/file, which may include the parent node and the name of the node/file, for example Key: INODE/parent_INODE/name. The value is the metadata record of the file at that node, stored as serialized data. The key-value pair data of each piece of metadata can then be written to the Memtable and the WAL by a predetermined operation, for example by calling the function Put(), and is finally saved on disk in the form of an SST file.
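As an illustration of the key layout, the sketch below builds keys of the form INODE/parent_INODE/name and serialized values. It assumes one possible reading of that layout (a literal prefix, the parent directory's inode number, then the file name), uses JSON as a stand-in serialization, and uses a plain dictionary with a hypothetical `put` helper in place of the database's Put() operation.

```python
import json


def make_key(parent_inode, name):
    # Assumed reading of the key layout: literal prefix / parent inode / file name.
    return f"INODE/{parent_inode}/{name}"


def make_value(attrs):
    # JSON is only a stand-in; the source says the value is "serialized data".
    return json.dumps(attrs, sort_keys=True).encode("utf-8")


store = {}  # stands in for the key-value database


def put(key, value):
    # Hypothetical helper playing the role of the database's Put() call.
    store[key] = value


put(make_key(42, "b.log"), make_value({"size": 10}))
put(make_key(42, "a.log"), make_value({"size": 7}))
put(make_key(7, "z.log"), make_value({"size": 1}))

# Keys of files with the same parent share a prefix, so they sort together.
prefix = "INODE/42/"
children = sorted(k for k in store if k.startswith(prefix))
```

Under this assumed layout, all entries of one directory are adjacent in key order, which is what makes an ordered store convenient for listing a directory.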

In addition, the key-value database may further include a Manifest file and a Current file. The Manifest file records the distribution of the SST files across the levels, the maximum and minimum keys of each SST file, and other meta information required by the key-value database. The first task when the key-value database starts is to find the current Manifest. Because there may be multiple Manifest files, the Current file simply records the file name of the current Manifest, which makes this step very simple.

Of course, the above is merely an example, and different files may be included according to the type of key-value store selected.

To facilitate an understanding of the present disclosure, a predetermined format of the SST file holding the metadata will first be described, as may be shown in fig. 2.

The SST file can be regarded as a table made up of different blocks, which are stored in the same physical manner but logically hold different contents: Data Blocks and Meta Blocks for storing data, an Index Block and a Meta Index Block for storing index information, and a Footer located at the end of the SST file that points to the position and size of each block.

Specifically, each Data Block stores key-value pairs of metadata arranged in order. The pairs are allocated to consecutive Data Blocks of a predetermined size, for example 4 KB (4096 bytes).

Each Meta Block stores information about the metadata held in the Data Blocks and is currently generally used only for storing a bloom filter (if a Filter Block exists). Writing data to a Data Block simultaneously updates the filter in the corresponding Meta Block. Reads are likewise first filtered by the bloom filter.

The Index Block is an index to the Data Blocks, and there is only one per SST file. The Index Block contains as many key-value pairs as there are Data Blocks in the SST file. For each key-value pair in the Index Block, the key is greater than or equal to the key of the last record in the corresponding Data Block and smaller than the key of the first record in the following Data Block; the value is location information (e.g., offset and length) pointing to the corresponding Data Block.
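The ordering rule for Index Block keys means a lookup can binary-search the index to find the single Data Block that may hold a key. The sketch below illustrates this under simplifying assumptions: the blocks are in-memory lists, and a block number stands in for the offset-and-length location information. The names `data_blocks`, `index_block`, and `lookup` are hypothetical.

```python
from bisect import bisect_left

# Three sorted Data Blocks, each a list of (key, value) records.
data_blocks = [
    [("a", 1), ("c", 2)],
    [("f", 3), ("h", 4)],
    [("m", 5), ("p", 6)],
]

# One index entry per Data Block: (separator key, block number).
# Each separator is >= the last key of its block and < the first key of the next.
index_block = [("c", 0), ("h", 1), ("p", 2)]


def lookup(key):
    separators = [sep for sep, _ in index_block]
    i = bisect_left(separators, key)  # first index entry whose key is >= the search key
    if i == len(index_block):
        return None                   # key is larger than everything in the file
    _, block_no = index_block[i]
    for k, v in data_blocks[block_no]:  # scan only the one candidate block
        if k == key:
            return v
    return None
```

For instance, `lookup("f")` consults only the second block, and a key such as `"d"` that falls between blocks is rejected after scanning a single block.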

The Meta Index Block is an index to the Meta Block, and there is only one. Its key is the name of the meta index (i.e., the name of the filter), and its value is location information (e.g., offset and length) pointing to the Meta Block.

Each block, of any type, may contain multiple records. When the total length of the added records exceeds the predetermined block size (e.g., 4 KB), a new block is started.
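The block-splitting rule can be sketched as follows. This is one literal reading of the rule, assumed for illustration: records are appended to the current block, and once the accumulated length exceeds the block size, the next record goes into a new block. The function name `pack_records` is hypothetical, and a real SST writer also accounts for per-record and per-block overhead, which is ignored here.

```python
BLOCK_SIZE = 4096  # the 4 KB example size from the text


def pack_records(records, block_size=BLOCK_SIZE):
    """Group byte-string records into blocks, closing a block once it exceeds block_size."""
    blocks, current, used = [], [], 0
    for rec in records:
        current.append(rec)
        used += len(rec)
        if used > block_size:   # threshold crossed: the next record starts a new block
            blocks.append(current)
            current, used = [], 0
    if current:
        blocks.append(current)  # keep the final, partially filled block
    return blocks


records = [b"x" * 1500] * 5
blocks = pack_records(records)
```

With five 1500-byte records, the third record pushes the first block past 4096 bytes, so the records are split into blocks of three and two.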

Further details of each Block and the Footer are well known to those skilled in the art and therefore a description thereof is omitted here.

In the above, a way of storing metadata of files under a directory of a distributed file system based on a key-value database is mentioned, which, compared to a tree structure, can make programming, execution, etc. simpler and save storage space due to the adoption of unstructured ordered key-value pairs. When a query needs to be made for metadata of a file of a directory stored in the key-value store (i.e., metadata of a file under the directory is obtained, also referred to as a query for the directory), a method as shown in fig. 3A may be employed.

As shown in fig. 3A, the Client (Client) first executes a query command (readdir) for the specified directory, which indicates to query the specified directory. A query request, which may carry information (e.g., a directory or file name) for the specified directory, is then sent over the network to a metadata server (MDS) according to a communication protocol (e.g., RPC). After receiving the query request, the metadata server reads metadata of files in a specified directory based on information of the specified directory carried by the query request, but generally, the metadata cached locally at the metadata server is limited, and if the metadata server does not locally cache the metadata of the required files in the directory, the metadata server needs to query in a Key Value (KV) database interfaced with the server, and then returns a query response to the client, where the query response includes at least a part of the requested metadata.

Querying a directory that includes a large number of files usually requires repeating the query process many times, because for each query request the metadata server can only return a fixed number of metadata key-value pairs to the client. For example, if there are 1,000,000 files under the specified directory and each query request can only return the metadata of 1000 files, 1000 query processes are required for the directory. After parsing the fixed number of metadata key-value pairs received each time, the client sends another query request to the server, carrying an offset value parameter, to continue with the next query process at the key-value database. These steps are repeated until the directory has been traversed. The whole query process may involve many rounds of network message encoding and decoding, database queries, and disk reads/writes (IO), and disk IO is usually slow. If the directory contains many files, the query is therefore slow, the upper-layer service at the client perceives directory queries on the file system as slow, and the user experience suffers.
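The round-trip count in this example follows from dividing the number of files by the page size. The sketch below simulates the offset-based loop; the function `readdir_paged` is a hypothetical stand-in in which slicing a list plays the role of one request/response exchange with the metadata server.

```python
import math


def readdir_paged(entries, page_size):
    """Simulate offset-based paging; returns (all entries, number of round trips)."""
    offset, round_trips, result = 0, 0, []
    while offset < len(entries):
        page = entries[offset:offset + page_size]  # one request/response exchange
        round_trips += 1
        result.extend(page)
        offset += len(page)  # the offset carried by the next request
    return result, round_trips


entries = list(range(1_000_000))  # stands in for 1,000,000 metadata records
result, round_trips = readdir_paged(entries, page_size=1000)
expected = math.ceil(len(entries) / 1000)
```

Here `round_trips` equals `expected`, i.e. 1000 exchanges, each of which in the real system incurs message encoding and decoding, a database query, and possibly disk IO.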

Therefore, there is a need for a method that can efficiently query the metadata of files in a distributed file system, in particular for directories containing a large number of files.

In this regard, the present disclosure proposes a method for reading or querying metadata of a file under a directory comprising a large number of files in a distributed file system based on a snapshot, and the general flow of the method is shown in fig. 3B.

FIG. 3B illustrates a schematic diagram of the main process of querying directories in a distributed file system according to an embodiment of the present disclosure. The distributed file system comprises at least a client (Client), a metadata server (MDS) and a key-value database (DB). It should be noted that only the client, the metadata server and the key-value database are shown in the figure, but the distributed file system may include other entities, such as other clients, other metadata servers, etc. Optionally, the key-value database may have servers for controlling its various processes (e.g., querying, deleting, backing up, etc.) and may include various interfaces, so that it can be bound to the metadata server and communicate with it to perform the various operations involved in the query process described below.

First, the client sends a query request for a directory to be queried to the metadata server, requesting the metadata of the files under that directory (1). The metadata server then sends a snapshot creation indication to the key-value database in response to the query request (2). In response to the snapshot creation indication, the key-value database creates a snapshot for the directory to be queried and returns partition identification information and snapshot identification information (snapshot ID) of the created snapshot to the metadata server, for example in a response message (3). Here the directory to be queried corresponds to at least one storage partition in the key-value database, the snapshot includes a metadata storage file set corresponding to the directory to be queried, and the metadata storage file set includes a plurality of metadata storage files in the at least one storage partition for storing metadata of files under the directory to be queried. The metadata server then returns the partition identification information and the snapshot identification information to the client, e.g., in a query response message (4). Next, the client sends a snapshot analysis request, the partition identification information, and the snapshot identification information to the key-value database (5); both pieces of identification information may also be carried in the snapshot analysis request. Finally, the key-value database determines the snapshot to be analyzed based on the partition identification information and the snapshot identification information, analyzes it in response to the snapshot analysis request so as to obtain the metadata of the files under the directory to be queried from the metadata storage file set, and returns the requested metadata to the client (6).
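The six-step exchange can be sketched as three cooperating objects. This is a toy model under stated assumptions, not the disclosed implementation: the classes `KeyValueDB`, `MetadataServer`, and `Client`, their method names, and the string partition identifiers are hypothetical; each directory maps to a single partition; and a frozen dictionary copy stands in for the set of SST files captured by a snapshot.

```python
import itertools


class KeyValueDB:
    def __init__(self):
        self.partitions = {}        # partition id -> {key: metadata}
        self.dir_to_partition = {}  # simplification: one partition per directory
        self.snapshots = {}         # (partition id, snapshot id) -> frozen entries
        self._ids = itertools.count(1)

    def write(self, directory, partition_id, name, meta):
        self.dir_to_partition[directory] = partition_id
        self.partitions.setdefault(partition_id, {})[f"{directory}/{name}"] = meta

    def create_snapshot(self, directory):
        # Steps (2)-(3): freeze the directory's entries and return both identifiers.
        partition_id = self.dir_to_partition[directory]
        snapshot_id = next(self._ids)
        prefix = directory + "/"
        frozen = {k: v for k, v in self.partitions[partition_id].items()
                  if k.startswith(prefix)}
        self.snapshots[(partition_id, snapshot_id)] = frozen
        return partition_id, snapshot_id

    def analyze_snapshot(self, partition_id, snapshot_id):
        # Steps (5)-(6): locate the snapshot by its two identifiers and read it.
        return dict(sorted(self.snapshots[(partition_id, snapshot_id)].items()))


class MetadataServer:
    def __init__(self, db):
        self.db = db

    def handle_query(self, directory):
        # Steps (1) and (4): a single exchange with the client per directory query.
        return self.db.create_snapshot(directory)


class Client:
    def __init__(self, mds, db):
        self.mds, self.db = mds, db

    def query_directory(self, directory):
        partition_id, snapshot_id = self.mds.handle_query(directory)
        return self.db.analyze_snapshot(partition_id, snapshot_id)


db = KeyValueDB()
db.write("/data", "cf1", "a.txt", {"size": 1})
db.write("/data", "cf1", "b.txt", {"size": 2})
client = Client(MetadataServer(db), db)
result = client.query_directory("/data")
db.write("/data", "cf1", "c.txt", {"size": 3})  # written after the snapshot
```

In this sketch the metadata server is contacted once per directory query, and a file written after the snapshot was taken does not appear in the result of analyzing that snapshot.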

The specific details involved in each process will be described in conjunction with the description of fig. 4-9.

Therefore, in this query process the metadata server receives the query request from the client only once and sends a single snapshot creation indication to the key-value database, and the key-value database obtains the requested metadata by analyzing the snapshot and returns it to the client. This reduces the number of queries the server issues to the key-value database and improves query efficiency. It also reduces the load on the metadata server, leaving it more control and processing capacity for other operations and thereby improving its control performance.

FIG. 4 illustrates a flowchart of a method of querying metadata of a file under a directory in a distributed file system, according to an embodiment of the disclosure.

The method 400 shown in fig. 4 may be performed by the client (as a requester). The client refers to a program that corresponds to the server and provides local services to the user; it is generally installed on an ordinary client device (e.g., a computer device, a mobile device, etc.) and operates in cooperation with the server. The client may also interact with the database for simple data or information exchange.

As shown in fig. 4, in step S410, a query request for a directory to be queried is sent to the metadata server.

Optionally, the query request may carry information of the directory to be queried, such as name, number, and the like, and may be sent to the metadata server.

In step S420, partition identification information and snapshot identification information are acquired from the metadata server. The partition identification information identifies at least one storage partition in the metadata database (key-value database) that stores metadata of files of the directory to be queried (also referred to simply as the storage partition "corresponding to the directory to be queried"). The snapshot identification information identifies a snapshot created for the directory to be queried. The snapshot is saved in the key-value database and includes a metadata storage file set composed of a plurality of metadata storage files in the at least one storage partition for storing metadata of files under the directory to be queried.

Optionally, the partition identification information and the snapshot identification information are obtained from the key-value database via the metadata server. The snapshot is created by the key-value database and may be regarded as a backup, at the current time point, of the metadata storage file set (for example, the SST files, possibly together with the corresponding WAL file, Manifest file, and the like) holding the metadata of the files in the directory to be queried. As described above, the metadata storage file set (a plurality of SST files) is the final saved state of the metadata key-value pairs, that is, the state saved on disk. If, at the current time point, metadata key-value pairs of files in the directory to be queried have been written to the in-memory structure but not yet to disk, they may first be stored into SST files. The snapshot therefore includes the SST file(s), so the SST files can be obtained when the snapshot is parsed. Optionally, the SST files included in the snapshot may be copies of, or hard links to, the SST files on disk.

In step S430, the partition identification information, the snapshot identification information, and a snapshot analysis request are sent to the key-value database, where the snapshot analysis request is used to trigger the key-value database to perform analysis on a snapshot determined based on the partition identification information and the snapshot identification information, so as to obtain metadata of a file in the directory to be queried (also referred to as requested metadata in the following text).

For example, after receiving the partition identification information and the snapshot identification information from the metadata server, the client knows that the snapshot including the metadata storage file set has been saved in the key-value database, and may therefore send a snapshot analysis request to the key-value database to trigger the analysis of the snapshot. A key-value database may include multiple storage partitions, each with its own partition identification information, and the storage partition associated with the metadata of the files under each directory is determined when that metadata is written to the key-value database. A directory may correspond to at least one storage partition, i.e., metadata of files under a directory may be stored in at least one storage partition. For example, a first storage partition stores metadata of files under a first directory, a second storage partition stores metadata of files under a second directory, and so on. As new files are continuously saved under the first directory, new metadata key-value pairs are continuously written for it; once a certain number is reached, the metadata of further new files under the first directory is stored in a new storage partition. The storage partition may be a column family.

A specific process of determining the snapshot to be analyzed based on the partition identification information and the snapshot identification information will be described in detail later.

Optionally, the snapshot analysis request includes a parse sub-request and a traverse sub-request. For example, the client may first send the parse sub-request to the key-value database. The parse sub-request triggers the key-value database to parse the snapshot determined based on the partition identification information and the snapshot identification information and to open the metadata storage file set (such as SST files) included in the determined snapshot, after which a parsing success response may be returned to the client. The client then sends the traverse sub-request to the key-value database. The traverse sub-request triggers the key-value database to traverse the metadata storage file set so as to obtain the metadata of the files under the directory to be queried. The traverse sub-request may be sent in response to obtaining the parsing success response (i.e., once the SST files have been opened), or at a certain timing after the parse sub-request is sent, and so on, which is not limited by the present disclosure. The specific snapshot analysis process will be described later.

In step S440, metadata of the file in the directory to be queried is obtained from the key-value database.

Because the key-value database can obtain the metadata of the file in the directory to be queried through analyzing the snapshot, the client can obtain the metadata from the key-value database.

The above describes the query process on the client (requester) side. In this process the client sends a query request to the metadata server only once and has only simple interaction with the key-value database: for example, one or two request-response exchanges suffice to obtain the requested query result (the metadata of the files). The number of query requests sent by the client is thus reduced, and query efficiency is improved.
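The client-side flow described above can be sketched as follows. This is a minimal illustration only: the classes `FakeMetadataServer` and `FakeKeyValueDatabase`, their method names, and the sample metadata strings are hypothetical stand-ins and are not part of the disclosure.

```python
from dataclasses import dataclass


@dataclass
class QueryResponse:
    partition_ids: tuple  # partition identification information (e.g. column family ids)
    snapshot_id: int      # snapshot identification information


class FakeMetadataServer:
    """Hypothetical stand-in: answers one directory query with the two identifiers."""

    def query_directory(self, directory):
        return QueryResponse(partition_ids=("CF_1",), snapshot_id=1)


class FakeKeyValueDatabase:
    """Hypothetical stand-in holding one snapshot whose metadata is already resolved."""

    def __init__(self):
        self._snapshots = {(("CF_1",), 1): {"dir/a.txt": "size=10", "dir/b.txt": "size=20"}}
        self._opened = None

    def parse(self, partition_ids, snapshot_id):
        # Parse sub-request: locate the snapshot and "open" its file set.
        self._opened = self._snapshots.get((tuple(partition_ids), snapshot_id))
        return self._opened is not None

    def traverse(self):
        # Traverse sub-request: return the metadata found in the opened file set.
        if self._opened is None:
            raise RuntimeError("no snapshot has been parsed")
        return dict(self._opened)


def query_directory(directory, mds, kvdb):
    resp = mds.query_directory(directory)  # single query request, then S420
    if not kvdb.parse(resp.partition_ids, resp.snapshot_id):  # S430: parse sub-request
        raise LookupError("snapshot not found")
    return kvdb.traverse()  # S430: traverse sub-request, S440: metadata obtained


metadata = query_directory("dir", FakeMetadataServer(), FakeKeyValueDatabase())
print(metadata)
```

In this sketch the metadata server is contacted exactly once; the remaining exchanges take place directly between the client and the key-value database.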

The method illustrated in fig. 5 may be performed by a metadata server.

In step S510, a query request for the directory to be queried is obtained.

For example, the query request may be received from a client, which may include information (e.g., name, number, etc.) of the directory to be queried.

In step S520, in response to the query request, a snapshot creation indication for creating a snapshot for the directory to be queried is sent to the key-value database.

Optionally, after receiving the query request, the metadata server may first determine whether the requested metadata is cached locally, and if not, the metadata server sends a snapshot creation indication.

Alternatively, the metadata server generally has a file counting function and can determine the number of files in the directory requested to be queried. When the number is greater than a threshold (for example, one million), the directory contains many files, and the query process needs to be performed based on the snapshot. If the number does not exceed the threshold, the query still follows the general query pattern described above with reference to FIG. 3A, i.e., the metadata server receives at least one query request from the requester and returns a preset amount of metadata to the requester for each query request.
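The decision made by the metadata server can be sketched as below. Combining the local-cache check and the file-count threshold in one function is an assumption made for illustration; the text presents them as separate options. The function name and the returned mode labels are hypothetical.

```python
# Example threshold from the text: one million files (100 x 10,000).
SNAPSHOT_THRESHOLD = 1_000_000


def choose_query_mode(file_count, cached, threshold=SNAPSHOT_THRESHOLD):
    """Return which query path the metadata server would take (labels are illustrative)."""
    if cached:
        return "cache"     # requested metadata is already cached locally
    if file_count > threshold:
        return "snapshot"  # many files: send a snapshot creation indication
    return "paged"         # few files: general pattern, preset amount per request


print(choose_query_mode(2_500_000, cached=False))
print(choose_query_mode(1_200, cached=False))
```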

Optionally, the snapshot creation indication is used to trigger the key-value store to perform snapshot creation.

In step S530, partition identification information identifying a storage partition corresponding to the directory to be queried in the key-value database and snapshot identification information of the created snapshot are obtained from the key-value database. The storage partition corresponding to the directory to be queried is at least one storage partition used for storing metadata of files in the directory to be queried, and the snapshot comprises a metadata storage file set used for storing the metadata of the files in the directory to be queried in the at least one storage partition.

For example, a snapshot creation success response may be received from the key-value database, which may include the partition identification information and the snapshot identification information.

In step S540, the partition identification information and the snapshot identification information are sent to the requester that sent the query request. The requester in turn sends them to the key-value database, which uses them to determine the snapshot to be analyzed in order to obtain the metadata of the files under the directory to be queried.

For example, the metadata server may return a query response to the requestor (client), which may include the partition identification information and snapshot identification information.

As described with reference to fig. 4, after the client receives the two pieces of identification information, the client knows that the snapshot is already saved in the key-value database. The client can therefore send a snapshot analysis request to the key-value database independently of the metadata server to trigger the analysis of the snapshot determined based on the partition identification information and the snapshot identification information. The key-value database obtains the requested metadata by analyzing the determined snapshot and can return it directly to the client.

The above describes the query process at the metadata server side with reference to fig. 5. In this process, the metadata server receives the query request from the client only once, sends a snapshot creation indication to the key-value database once, and then receives the identification information of the snapshot from the key-value database and provides it to the client. Thereafter, the metadata server need not participate in the query process any further, so the load on the metadata server is reduced and the metadata server has more control and processing capacity for other operations, thereby improving control efficiency.

The method shown in fig. 6 may be performed by a key-value store, actually under control of a processing device or server of the key-value store.

As shown in fig. 6, in step S610, in response to a snapshot creation indication from the first entity, a snapshot is created for a directory to be queried, where the directory to be queried corresponds to at least one storage partition in the metadata base, and the snapshot includes a metadata storage file set corresponding to the directory to be queried, where the metadata storage file set includes a plurality of metadata storage files in the at least one storage partition for storing metadata of files in the directory to be queried.

Alternatively, the first entity may be a metadata server, and the snapshot creation indication is sent by the metadata server to a processing device or server of the key-value store based on a query request from a client for the specified directory.

Optionally, snapshot creation may be performed for any portion or all of the key-value database. When the processing device or the server of the key-value database receives the snapshot creation indication, a snapshot is created for the part of the running key-value database that relates to the metadata of the files in the directory to be queried at the current time point. Before the snapshot is taken, the key-value pairs held in the in-memory structures (the Memtable or Immutable Memtable) are first stored into SST files, so that a snapshot created by backing up or hard-linking the SST files on disk can include all key-value pairs of the metadata of the files in the directory to be queried. The snapshot may be regarded as a read-only database of the portion of the key-value database that relates to the directory to be queried, and typically takes the form of a full backup (everything is backed up at each time point) or an incremental backup (only the portion updated relative to the snapshot of the previous time point is backed up).

The portion of the key-value database corresponding to the directory to be queried may include a set of database files (e.g., SST files, Current files, Manifest files, etc.), where the metadata storage file set (multiple SST files) is the final saved state of the key-value pairs of the metadata, that is, the state saved on disk. A snapshot created for the files in the key-value database related to the directory to be queried therefore includes the SST files corresponding to the directory to be queried, so that the SST files can be obtained when the snapshot is parsed. Optionally, the SST files included in the snapshot are backup copies of, or hard links to, the SST files on disk. Since the set of database files may also include other files, such as Current files and Manifest files, these files may also be included in the snapshot.

Optionally, when the processing device or the server of the key-value database creates a snapshot, snapshot identification information, i.e., a snapshot ID, is also allocated to the snapshot. Snapshots created at different points in time have different identification information; e.g., sequence numbers are assigned incrementally, such as snapshot 1, snapshot 2, and so on. Snapshots corresponding to different storage partitions may share the same snapshot ID. For example, the snapshots that include the metadata storage file set of the files of the directory corresponding to a first storage partition may be numbered in increments of 1 over time, and the snapshots corresponding to a second storage partition may likewise be numbered in increments of 1.
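The incremental, per-partition numbering described above can be illustrated with a small allocator. The class name `SnapshotIdAllocator` and the partition labels are hypothetical; starting the numbering at 1 follows the "snapshot 1, snapshot 2" example in the text.

```python
from collections import defaultdict
from itertools import count


class SnapshotIdAllocator:
    """Hands out snapshot IDs that increase by 1 independently per storage partition."""

    def __init__(self):
        # One independent counter per storage partition, each starting at 1.
        self._counters = defaultdict(lambda: count(1))

    def allocate(self, partition_id):
        return next(self._counters[partition_id])


alloc = SnapshotIdAllocator()
ids = [alloc.allocate("CF_1"), alloc.allocate("CF_1"), alloc.allocate("CF_2")]
print(ids)  # [1, 2, 1]
```

Because the first snapshot of each partition receives ID 1, a snapshot ID alone is ambiguous, which is why the partition identification information is needed alongside it.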

As a specific example of the process of obtaining the set of database files, after receiving a snapshot creation indication, (a processing device or a server of) the key-value database may first call the DisableFileDeletions interface to prohibit deletion of files (e.g., SST files), and then incrementally allocate a snapshot ID to the snapshot to be created. It may then call the GetLiveFiles interface to obtain the currently valid files, such as the SST files, Current file, Options file, and Manifest file, call the GetSortedWalFiles interface to obtain the currently valid WAL files, and determine the portion of the SST files associated with the directory to be queried. After the files are obtained, the valid files may be copied to a backup location, such as a special directory for snapshots, with the SST files hard-linked to the SST files on disk. After snapshot creation is complete, the EnableFileDeletions interface may be called to again allow compaction to delete obsolete files. The directory tree of snapshots may be as shown in fig. 7.
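The file-level part of this procedure (hard-linking SST files and copying the other live files into a per-snapshot directory) can be sketched with ordinary file-system calls. This is an illustration under stated assumptions: the function `create_snapshot`, the `meta` and `private` directory names, and the sample file names are hypothetical, and the database-engine calls that disable and re-enable file deletion or list live files are outside the sketch (the list of live files is simply passed in).

```python
import os
import shutil
import tempfile


def create_snapshot(db_dir, backup_dir, snapshot_id, live_files):
    """Hard-link SST files and copy the other live files into a per-snapshot directory."""
    target = os.path.join(backup_dir, "private", str(snapshot_id))
    os.makedirs(target, exist_ok=True)
    for name in live_files:
        src = os.path.join(db_dir, name)
        dst = os.path.join(target, name)
        if name.endswith(".sst"):
            os.link(src, dst)       # hard link: no data is copied
        else:
            shutil.copy2(src, dst)  # small bookkeeping files are copied
    # One descriptor file per snapshot, named after the snapshot ID.
    meta_dir = os.path.join(backup_dir, "meta")
    os.makedirs(meta_dir, exist_ok=True)
    with open(os.path.join(meta_dir, str(snapshot_id)), "w") as f:
        f.write("\n".join(sorted(live_files)))
    return target


# Build a tiny stand-in database directory (file names are illustrative).
root = tempfile.mkdtemp()
db_dir = os.path.join(root, "db")
os.makedirs(db_dir)
for name, body in [("000012.sst", "k1=v1"), ("CURRENT", "MANIFEST-000005"), ("MANIFEST-000005", "m")]:
    with open(os.path.join(db_dir, name), "w") as f:
        f.write(body)

snap_dir = create_snapshot(db_dir, os.path.join(root, "backup"), 1, os.listdir(db_dir))
linked = os.path.samefile(os.path.join(db_dir, "000012.sst"), os.path.join(snap_dir, "000012.sst"))
print(sorted(os.listdir(snap_dir)), linked)
```

Hard links require the backup location to be on the same file system as the database directory, which is why the sketch places both under one temporary root.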

Fig. 7 shows two snapshots created for a directory to be queried at two different points in time. The meta directory contains, for each snapshot, a file that describes the metadata of the snapshot and whose name is the snapshot ID. The private directory contains the SST files (one is shown; there may actually be several), Current files, Options files, and Manifest files, grouped by snapshot ID. The SST files can therefore be obtained by parsing the snapshot.

Returning to fig. 6, in step S620, partition identification information and snapshot identification information are transmitted via the first entity to a second entity, which is the query requester. The partition identification information identifies the storage partition corresponding to the directory to be queried in the metadata base, i.e., the storage partition in the metadata base that stores metadata of the files of the directory to be queried.

Optionally, as described earlier, the partition identification information may be column family (CF) identification information (CF_id). The second entity may be a client.

A column family may be considered a data set composed of a series of key-value pairs. Every read or write operation on these key-value pairs must first specify the column family, which implements a storage partition (logical partition) of the key-value database; that is, when data is written, the column family to which it is written can be determined. As mentioned before, the storage partition indicated by at least one column family may correspond to (the metadata of) a directory, i.e., the key-value pairs of the metadata of the files under that directory are stored in the storage partition (storage is implemented with SST files). For example, a first column family is bound to the metadata of files under a first directory, a second column family is bound to the metadata of files under a second directory, and so on. As new files are continuously saved under the first directory, new metadata key-value pairs are continuously written to the first column family; once a certain number is reached, the metadata of further new files under the first directory is stored in a new column family.
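The binding of directories to column families, including the switch to a new column family once a certain number of key-value pairs has been written, can be sketched as follows. The class `DirectoryPartitioner`, the `CF_n` naming, and the limit of two keys per partition are hypothetical choices for illustration; the text does not specify the limit or how new column families are named.

```python
class DirectoryPartitioner:
    """Tracks which column families hold the metadata of each directory."""

    def __init__(self, max_keys_per_partition):
        self.max_keys = max_keys_per_partition
        self.partitions = {}  # directory -> list of column family names
        self.key_counts = {}  # column family name -> number of keys written
        self._next_cf = 1

    def _new_partition(self, directory):
        name = f"CF_{self._next_cf}"
        self._next_cf += 1
        self.partitions.setdefault(directory, []).append(name)
        self.key_counts[name] = 0
        return name

    def partition_for_write(self, directory):
        """Return the column family that receives the next metadata key-value pair."""
        cfs = self.partitions.get(directory)
        current = cfs[-1] if cfs else self._new_partition(directory)
        if self.key_counts[current] >= self.max_keys:
            # The current column family is full: further metadata goes to a new one.
            current = self._new_partition(directory)
        self.key_counts[current] += 1
        return current


p = DirectoryPartitioner(max_keys_per_partition=2)
writes = [p.partition_for_write("/dir1") for _ in range(3)]
writes.append(p.partition_for_write("/dir2"))
print(writes)                # ['CF_1', 'CF_1', 'CF_2', 'CF_3']
print(p.partitions["/dir1"])  # ['CF_1', 'CF_2']
```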

Fig. 8 shows a schematic diagram of column families. Each column family has its own Memtable and SST files, but all column families share the WAL, Current, and Manifest files.

In addition, snapshots of the metadata of files under directories corresponding to different column families may have the same snapshot ID. For example, the snapshots of a directory corresponding to a first column family may be numbered in increments of 1 over time, and the snapshots of a directory corresponding to a second column family may likewise be numbered in increments of 1. The column family identification information therefore needs to be determined first in order to find the snapshot created for the directory to be queried.

Thus, the key-value store (processing device or server) returns the partition identification information and the snapshot identification information to the client after creating the snapshot, so that the client later interacts with the key-value store with both identifications to obtain data.

In step S630, a snapshot analysis request, partition identification information, and snapshot identification information are acquired from the second entity.

Optionally, the partition identification information and the snapshot identification information may be carried in the snapshot analysis request.

In step S640, a snapshot to be analyzed is determined based on the partition identification information and the snapshot identification information, and the snapshot to be analyzed is analyzed in response to the snapshot analysis request, so as to obtain metadata of the file in the directory to be queried from a metadata storage file set included in the snapshot to be analyzed.

Snapshots in the key-value database are cleaned up periodically, so at any time there exists, for the directory to be queried, at least one snapshot created at at least one point in time. In addition, as described above, snapshots of the metadata of files under directories corresponding to different storage partitions may have the same snapshot ID, so the partition identification information needs to be determined first in order to find the snapshot created for the directory to be queried.

Therefore, the at least one storage partition of metadata of the file under the directory to be queried in the metadata base may be determined based on the partition identification information, and a snapshot set corresponding to the at least one storage partition may be determined, where each snapshot in the snapshot set corresponds to a metadata storage file set of the metadata in the at least one storage partition at its corresponding time point; and determining a snapshot corresponding to the snapshot identification information from the snapshot set based on the snapshot identification information. Thus, the determined snapshot is the snapshot to be analyzed.

For example, when the directory corresponds to only a first storage partition, the partition identification information (column family identification information) is a single identification. Based on it, the key-value database determines the first storage partition and thereby acquires the corresponding n snapshots (created for the first storage partition at n time points, where n is an integer greater than or equal to 1), each snapshot corresponding to the metadata storage file set (SST files) of the metadata in the first storage partition at the corresponding time point. The key-value database then determines the corresponding snapshot from the n snapshots based on the snapshot identification information.

When a directory corresponds to two or more storage partitions, taking the 1st and 11th storage partitions as an example, the partition identification information (column family identification information) may have two identifications, CF_1 and CF_11 (alternatively, since the association between CF_1 and CF_11 is already determined when the key-value pairs are saved in SST files, a single identification may suffice to determine both partitions). The key-value database determines the two storage partitions based on the two identifications and thereby acquires the corresponding n snapshots (created for the 1st and 11th storage partitions at n time points, where n is an integer greater than or equal to 1), each corresponding to the metadata storage file set (a plurality of SST files) of the metadata in the two storage partitions at the corresponding time point. It then determines the corresponding snapshot from the n snapshots based on the snapshot identification information. That is, in this case, one snapshot may include SST files in two or more storage partitions.
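The two-step lookup (first narrow down to the snapshot set of the storage partitions, then pick the snapshot by its ID) can be sketched as a nested mapping. The class `SnapshotCatalog`, its methods, and the SST file names are hypothetical; the sketch only assumes that the set of partition identifications selects a snapshot set, as the text describes.

```python
class SnapshotCatalog:
    """Maps a set of storage partitions to its snapshots, then a snapshot ID to files."""

    def __init__(self):
        # frozenset of partition ids -> {snapshot id -> list of SST file names}
        self._by_partition = {}

    def register(self, partition_ids, snapshot_id, sst_files):
        key = frozenset(partition_ids)
        self._by_partition.setdefault(key, {})[snapshot_id] = list(sst_files)

    def find(self, partition_ids, snapshot_id):
        # Step 1: the partition identification selects the snapshot set.
        snapshot_set = self._by_partition.get(frozenset(partition_ids))
        if snapshot_set is None:
            raise KeyError(f"no snapshots for partitions {sorted(partition_ids)}")
        # Step 2: the snapshot identification selects one snapshot from that set.
        if snapshot_id not in snapshot_set:
            raise KeyError(f"snapshot {snapshot_id} not found")
        return snapshot_set[snapshot_id]


catalog = SnapshotCatalog()
catalog.register(["CF_1", "CF_11"], 1, ["000012.sst", "000031.sst"])
catalog.register(["CF_2"], 1, ["000020.sst"])  # same snapshot ID, different partition
found = catalog.find(["CF_11", "CF_1"], 1)
print(found)  # ['000012.sst', '000031.sst']
```

Both registered snapshots carry ID 1, yet the lookup is unambiguous because the partition identification is resolved first.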

Further, the snapshot analysis request includes a parse sub-request and a traverse sub-request, and thus the analysis process for the determined snapshot may include the following sub-steps.

Parsing the snapshot in response to the parse sub-request and opening a metadata storage file set (SST file) included in the snapshot; and traversing the opened metadata storage file set in response to the traversing sub-request to obtain metadata of the file in the directory to be queried.

The key-value pairs of the metadata of the files may be stored in the SST files according to an internal key. For example, each key-value pair stored in an SST file may include an internal key composed of the original key of the key-value pair before storage, version information, and an operation type, while the value of each stored key-value pair is the original value. For example, the internal key is represented as InternalKey = userkey + seq + type, where userkey, seq, and type represent the original key, the version information, and the operation type, respectively. The operation type is, for example, write (put) or delete (delete).
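The internal key composition can be modeled with a small record type. This is a sketch only: the field name `op_type` stands in for `type`, and the ordering used here (ascending original key, newest sequence number first) is an assumption for illustration rather than something the text specifies.

```python
from typing import NamedTuple

PUT, DELETE = "put", "delete"


class InternalKey(NamedTuple):
    userkey: str  # original key before storage
    seq: int      # version information (sequence number)
    op_type: str  # operation type: put or delete

    def sort_key(self):
        # Assumed ordering: ascending original key, newest version first.
        return (self.userkey, -self.seq)


entries = [
    (InternalKey("key1", 3, PUT), "v1-old"),
    (InternalKey("key2", 5, PUT), "v2"),
    (InternalKey("key1", 8, PUT), "v1-new"),
]
entries.sort(key=lambda kv: kv[0].sort_key())
order = [(k.userkey, k.seq) for k, _ in entries]
print(order)  # [('key1', 8), ('key1', 3), ('key2', 5)]
```

Two entries with the same `userkey` but different `seq` are distinct internal keys, which is how old and new versions of one metadata key-value pair coexist in the stored files.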

Thus, since the old and new versions of each metadata key-value pair are included in the SST file, and it is often desirable to directly obtain the latest version of the key-value pair, the SST file may be traversed to obtain the metadata of the file under the directory to be queried by the following operations.

First, a first level traversal pattern and a second level traversal pattern are determined in response to a traversal sub-request.

For example, the first-level traversal pattern may operate on the internal key of each stored key-value pair, and the second-level traversal pattern may operate on the original key included in the internal key of each stored key-value pair. Illustratively, the first-level traversal pattern uses the InternalKey (userkey, seq, type) as the traversal granularity: if any constituent element of the InternalKey differs between two stored key-value pairs, they are considered different key-value pairs. The second-level traversal pattern uses the original key included in the internal key of each stored key-value pair as the traversal granularity: if the userkeys of two stored key-value pairs are the same, they are considered different versions of one key-value pair.

Then, the metadata storage file set is traversed using the first-level traversal pattern to obtain a first group of stored key-value pairs, where any two stored key-value pairs in the first group differ in at least one of the original key, the version information, and the operation type.

For example, a first level of traversal is performed for the internal key, with an InternalKey (userkey, seq, type) as the granularity of traversal, and the resulting first set of stored key-value pairs is shown in table 1:

[TABLE 1] First-level traversal result: first group of stored key-value pairs

Optionally, an iterator (SSTIterator) may be created in the key-value database for the traversal process described above. Further, the SST files belong to different levels (e.g., L0 SST files, L1 SST files, L2 SST files, etc.), and since the L0 SST files of the lowest level have not been subjected to the merge operation, they can be traversed separately from the SST files of the other levels.

For example, the iterator (SSTIterator) is created first, which may be viewed as creating one iterator for the lowest-level L0 SST files and one iterator for the SST files of the other levels (denoted as L1+ SST files). Since the Data Blocks of an SST file store the key-value pairs of the metadata and the file includes an Index Block for the Data Blocks, a TwoLevelIterator may be used to logically implement the traversal of data with a hierarchical relationship by combining a BlockIter (Index Block) for the Index Block with a BlockIter (Data Block) for the Data Block. When traversing L1+ SST files, a level file number iterator may also be included to traverse the SST files of each level in turn.
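The combination of an index-block iterator with a data-block iterator can be illustrated with nested generators. This is a simplified model, not the database engine's implementation: an SST file is represented as a pair of Python lists, and the function names `two_level_iter` and `iterate_levels` are hypothetical.

```python
def two_level_iter(index_block, data_blocks):
    """Yield key-value pairs by walking the index first, then each referenced data block."""
    for _last_key, block_no in index_block:       # first level: index entries
        for key, value in data_blocks[block_no]:  # second level: entries of one data block
            yield key, value


def iterate_levels(level_files):
    """Visit the SST files of each level in turn (L0 first), yielding all entries."""
    for level in sorted(level_files):
        for index_block, data_blocks in level_files[level]:
            yield from two_level_iter(index_block, data_blocks)


# Each SST file is modeled as (index block, list of data blocks).
# An index entry is (last key in the block, block number).
sst_l0 = ([("key2", 0)], [[("key1", "a"), ("key2", "b")]])
sst_l1 = ([("key1", 0), ("key3", 1)], [[("key1", "old")], [("key3", "c")]])

pairs = list(iterate_levels({0: [sst_l0], 1: [sst_l1]}))
print(pairs)
```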

Finally, the first group of stored key-value pairs is traversed using the second-level traversal pattern, and a second group of key-value pairs is determined based on the version information included in the stored key-value pairs of the first group, where the key of each key-value pair in the second group is an original key and the keys are all different from one another.

For example, to obtain the latest version of each key-value pair, screening the second group of key-value pairs may include the following process: (i) determining, in the first group of stored key-value pairs, at least one subgroup of stored key-value pairs whose internal keys include the same original key; for example, the original keys (user_key) of the stored key-value pairs in the first two rows are both key1, so these two stored key-value pairs constitute a first subgroup; (ii) for each subgroup, selecting, based on the version information, the stored key-value pair with the newest version as the representative key-value pair of the subgroup (e.g., the pair with the largest operation sequence number, since a larger sequence number indicates a newer version); for example, the stored key-value pair in the first row has a larger sequence number than that in the second row, so it is the newest and serves as the representative key-value pair of the first subgroup, and the representative key-value pairs of the other subgroups (e.g., those corresponding to key2 and key3) are obtained similarly; and (iii) using the representative key-value pairs of all subgroups as the second group of key-value pairs, as shown in Table 2:

[TABLE 2] Second-level traversal result: second group of key-value pairs
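The two-stage traversal described above can be sketched as follows. The stored entries are made-up illustrative values, not the contents of the tables, and the function names are hypothetical. How a newest entry whose operation type is delete should be treated is not stated in the text, so the sketch uses only put entries.

```python
from itertools import groupby

# Stored entries: ((userkey, seq, type), value). Values are illustrative only.
stored = [
    (("key1", 7, "put"), "v1b"),
    (("key1", 4, "put"), "v1a"),
    (("key2", 5, "put"), "v2"),
    (("key3", 9, "put"), "v3b"),
    (("key3", 2, "put"), "v3a"),
]


def first_level(entries):
    """First level: every distinct internal key (userkey, seq, type) is its own entry."""
    seen = set()
    result = []
    for ikey, value in entries:
        if ikey not in seen:
            seen.add(ikey)
            result.append((ikey, value))
    return result


def second_level(first_group):
    """Second level: group by original key and keep the entry with the largest seq."""
    by_userkey = sorted(first_group, key=lambda e: e[0][0])
    latest = {}
    for userkey, subgroup in groupby(by_userkey, key=lambda e: e[0][0]):
        _ikey, value = max(subgroup, key=lambda e: e[0][1])
        latest[userkey] = value
    return latest


first_group = first_level(stored)
second_group = second_level(first_group)
print(second_group)  # {'key1': 'v1b', 'key2': 'v2', 'key3': 'v3b'}
```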

In this way, the obtained second group of key-value pairs consists of the character-string representations of the metadata key-value pairs of the files under the directory to be queried. Since these key-value pairs were all generated by serialization based on the data structure of the actual directory, the second group of key-value pairs can be deserialized to fill a preset data structure (as required by the client), and the requested metadata is returned to the user in that form. Because the requested metadata has the preset data structure, it can be parsed at the client.
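Deserializing the second group of key-value pairs into a preset data structure might look like the sketch below. The serialization format (JSON), the `FileMetadata` fields, and the sample values are all assumptions for illustration; the text does not specify the format or the structure required by the client.

```python
import json
from dataclasses import dataclass


@dataclass
class FileMetadata:
    """Hypothetical preset data structure expected by the client."""
    name: str
    size: int
    mtime: int


def deserialize(second_group):
    """Turn serialized metadata strings into FileMetadata records, ordered by key."""
    records = []
    for key in sorted(second_group):
        fields = json.loads(second_group[key])
        records.append(FileMetadata(name=key, size=fields["size"], mtime=fields["mtime"]))
    return records


second_group = {
    "b.txt": '{"size": 20, "mtime": 1700000100}',
    "a.txt": '{"size": 10, "mtime": 1700000000}',
}
records = deserialize(second_group)
print(records[0])
```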

In order to better understand the above process of obtaining metadata of a file under a directory to be queried based on snapshot analysis, the following describes an interaction process between a client and a key-value database in conjunction with fig. 9. FIG. 9 illustrates an example interaction process of a client with a key-value store.

Firstly, the client sends a parse sub-request to the key-value database; the parse sub-request can carry the snapshot identification information and the column family identification information. After receiving the parse sub-request, the key-value database uses the snapshot identification information and the column family identification information to find the snapshot to be analyzed (the snapshot most recently created for the directory to be queried), opens the SST files included in the snapshot, and may then return a parsing success response to the client. The client may then send a traverse sub-request to the key-value database, which creates an iterator to traverse the opened SST files (this may include iterators for SST files of different levels) and may also filter the traversed key-value pairs to obtain the latest version of each key-value pair. These latest-version key-value pairs can be returned to the client as the metadata of the files under the directory to be queried.

Alternatively, the parse sub-request and the traverse sub-request may be combined into one request, and the traversal may be performed automatically after the key-value database determines that the SST files have been opened.
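The handling of the two sub-requests on the key-value database side, including the combined variant, can be sketched as follows. The class `SnapshotAnalyzer`, its method names, the response strings, and the in-memory snapshot contents are hypothetical; a real implementation would open and iterate SST files instead of Python lists.

```python
class SnapshotAnalyzer:
    """Hypothetical key-value database front end for parse and traverse sub-requests."""

    def __init__(self, snapshots):
        # (column family id, snapshot id) -> list of (userkey, seq, value) entries
        self._snapshots = snapshots
        self._opened = None

    def handle_parse(self, cf_id, snapshot_id):
        """Parse sub-request: find the snapshot and open its file set."""
        self._opened = self._snapshots.get((cf_id, snapshot_id))
        return "parse_ok" if self._opened is not None else "parse_failed"

    def handle_traverse(self):
        """Traverse sub-request: walk the opened entries and keep the newest versions."""
        if self._opened is None:
            raise RuntimeError("traverse requested before a successful parse")
        latest = {}
        for userkey, seq, value in self._opened:
            if userkey not in latest or seq > latest[userkey][0]:
                latest[userkey] = (seq, value)
        return {k: v for k, (_, v) in latest.items()}

    def handle_analyze(self, cf_id, snapshot_id):
        """Combined variant: parse, then traverse automatically once files are open."""
        if self.handle_parse(cf_id, snapshot_id) != "parse_ok":
            raise LookupError("snapshot not found")
        return self.handle_traverse()


db = SnapshotAnalyzer({("CF_1", 2): [("key1", 4, "old"), ("key1", 7, "new"), ("key2", 5, "x")]})
status = db.handle_parse("CF_1", 2)
result = db.handle_traverse()
combined = db.handle_analyze("CF_1", 2)
print(status, result)
```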

The query process on the key-value database side is described above with reference to figs. 6 to 9. In this process, the key-value database creates a snapshot based on a snapshot creation indication from the metadata server, and the creation process is relatively lightweight. Traversal of the SST files included in the snapshot can then be triggered according to the identification information carried in the request from the client, so as to obtain the requested metadata. It is therefore no longer necessary to return a preset amount of metadata to the metadata server multiple times; the metadata is returned directly to the client only once at the end. Signaling overhead, resource usage, and power consumption can thus be reduced.

According to another aspect of the disclosure, correspondingly, an apparatus for querying a directory of a distributed file system is also disclosed.

FIG. 10 depicts a block diagram of an apparatus for querying a directory of a distributed file system. The apparatus may be a client.

As shown in fig. 10, the apparatus 1000 includes a transmitting module 1010, a receiving module 1020, and a processing module 1030.

The sending module 1010 is configured to send a query request for a directory to be queried to a metadata server, and may send partition identification information, snapshot identification information, and a snapshot analysis request to a key value database, where the partition identification information identifies at least one storage partition in the metadata database for storing metadata of a file of the directory to be queried, the snapshot identification information identifies a snapshot created for the directory to be queried, and the snapshot includes a metadata storage file set in the at least one storage partition for storing metadata of the file under the directory to be queried; wherein the snapshot analysis request is used to trigger the key-value database: and analyzing the snapshot determined based on the partition identification information and the snapshot identification information to obtain the metadata of the file in the directory to be queried.

The receiving module 1020 may receive partition identification information and snapshot identification information from a metadata server, wherein a snapshot is saved in a key-value store.

The processing module 1030 may process the two pieces of identification information received via the receiving module 1020 so that they are sent to the key-value database via the sending module 1010 together with the snapshot analysis request (e.g., carried in the request) or separately from it.

More details of the operations performed by each module are similar to those described with reference to fig. 4. The apparatus may include more or fewer modules according to different functional divisions, and each module may further include sub-modules and the like; the description thereof will not be repeated.

The apparatus described above with reference to fig. 10 sends a query request to the metadata server only once and has only simple interaction with the key-value database: for example, one or two request-response exchanges suffice to obtain the requested query result (the metadata of the files). The number of query requests sent by the client is thus reduced, and query efficiency is improved.

According to another aspect of the disclosure, another apparatus for querying a directory of a distributed file system is also disclosed accordingly.

FIG. 11 depicts a block diagram of an apparatus for querying a directory of a distributed file system. The apparatus may be a metadata server.

As shown in fig. 11, the apparatus 1100 includes a transmitting module 1110, a receiving module 1120, and a processing module 1130.

The receiving module 1120 is configured to obtain a query request for a directory to be queried, and obtain partition identification information and snapshot identification information from a key value database, where the partition identification information identifies at least one storage partition in the metadata database, where the storage partition is used to store metadata of a file of the directory to be queried, and the snapshot identification information identifies a snapshot created for the directory to be queried, where the snapshot includes a metadata storage file set in the at least one storage partition, where the metadata storage file set is used to store metadata of a file in the directory to be queried.

The sending module 1110 is configured to send, in response to the query request, a snapshot creation indication for creating a snapshot for the directory to be queried to the key-value database, and after receiving the partition identification information and the snapshot identification information from the key-value database, send the partition identification information and the snapshot identification information to the requester that sends the query request.

The processing module 1130 may process and analyze the query request received via the receiving module 1120 to generate a snapshot creation indication based on the query request. Optionally, the snapshot creation indication may further include information of the directory to be queried.

Further details of the operations performed by each module are similar to those described with reference to fig. 5 and are not repeated here. The apparatus may include more or fewer modules depending on how its functions are divided, and each module may further include sub-modules.

During a query, the apparatus described above with reference to fig. 11 receives a query request from the client only once, sends a snapshot creation indication to the key-value database once, receives the identification information of the snapshot from the key-value database, and provides that identification information to the client. After that, the apparatus need not participate in the query any further. The load on the apparatus is therefore reduced, leaving it more control and processing capacity for other operations and improving control efficiency.
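The division of work among the receiving, processing, and sending modules can be sketched as follows. This is a minimal illustration under stated assumptions: `SnapshotCreationIndication`, `StubKeyValueStore`, `MetadataServer`, and the partition label `"partition-0"` are hypothetical names, and the stub assigns snapshot identifiers from a simple counter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnapshotCreationIndication:
    """Hypothetical message; optionally carries the directory to be queried."""
    directory: str


class StubKeyValueStore:
    """Stand-in for the key-value database; records the indications it receives."""

    def __init__(self):
        self.indications = []
        self._next_snapshot_id = 1

    def create_snapshot(self, indication):
        self.indications.append(indication)
        snapshot_id = self._next_snapshot_id
        self._next_snapshot_id += 1
        # Assumption: the directory's metadata lives in a single partition.
        return ("partition-0",), snapshot_id


class MetadataServer:
    def __init__(self, kv):
        self._kv = kv

    def handle_query(self, directory):
        # Processing module: derive the indication from the query request.
        indication = SnapshotCreationIndication(directory=directory)
        # Sending and receiving modules: one indication out, identifiers back.
        partition_ids, snapshot_id = self._kv.create_snapshot(indication)
        # Forward the identifiers to the requester; no metadata is relayed.
        return {"partition_ids": partition_ids, "snapshot_id": snapshot_id}
```

The handler returns only the two identifiers, so the amount of work done at the metadata server in this sketch does not grow with the number of files in the directory.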

According to another aspect of the disclosure, a further apparatus for querying a directory of a distributed file system is also disclosed accordingly.

As shown in fig. 12, the apparatus 1200 includes a snapshot creation module 1210, a sending module 1220, a receiving module 1230, and a selection and analysis module 1240. The device may be a key-value store (with a processing device or server).

The snapshot creating module 1210 creates a snapshot for a directory to be queried in response to a snapshot creation indication from the first entity, where the directory to be queried corresponds to at least one storage partition in the metadata base, and the snapshot includes a metadata storage file set corresponding to the directory to be queried, and the metadata storage file set includes a plurality of metadata storage files in the at least one storage partition for storing metadata of files in the directory to be queried.

The sending module 1220 sends, via the first entity, the partition identification information and the snapshot identification information identifying the storage partition corresponding to the directory to be queried in the metadata base to the second entity serving as the query requester, where the storage partition corresponding to the directory to be queried is a storage partition used for storing metadata of files of the directory to be queried in the metadata base.

The receiving module 1230 obtains the snapshot analysis request, the partition identification information, and the snapshot identification information from the second entity.

The selection and analysis module 1240 determines a snapshot to be analyzed based on the partition identification information and the snapshot identification information, and analyzes the snapshot to be analyzed in response to the snapshot analysis request to obtain metadata of the file under the directory to be queried from the metadata storage file set.

Further details of the operations performed by each module are similar to those described with reference to fig. 6-9 and are not repeated here. The apparatus may include more or fewer modules depending on how its functions are divided, and each module may further include sub-modules.

During a query, the apparatus described above with reference to fig. 12 creates a snapshot based on a snapshot creation indication from the metadata server, and this creation process is relatively lightweight. Traversal of the SST files included in the snapshot may then be triggered according to the identification information included in the request from the client, so as to obtain the requested metadata. It is therefore no longer necessary to return a preset amount of metadata to the metadata server multiple times; the metadata only needs to be returned directly to the client at the end, which may reduce signaling overhead, resource usage, and power consumption.
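The traversal of the metadata storage files can be sketched in two levels, following the internal key layout of original key, version information, and operation type used in this disclosure. The sketch below is illustrative only: `StoredPair`, `first_level`, `second_level`, and the operation labels `PUT` and `DELETE` are hypothetical names, each file is modeled as a plain list, and dropping keys whose latest entry is a deletion is an assumption rather than something stated in the text.

```python
from typing import NamedTuple

PUT, DELETE = "put", "delete"  # hypothetical operation-type labels


class StoredPair(NamedTuple):
    original_key: str  # key as written by the user of the store
    version: int       # version information; larger means newer
    op_type: str       # operation type recorded with the entry
    value: object


def first_level(files):
    """Walk every metadata storage file, yielding each distinct internal key once."""
    seen = set()
    for stored_file in files:
        for pair in stored_file:
            internal_key = (pair.original_key, pair.version, pair.op_type)
            if internal_key not in seen:
                seen.add(internal_key)
                yield pair


def second_level(pairs, drop_deleted=True):
    """Keep the latest version for each original key."""
    latest = {}
    for pair in pairs:
        current = latest.get(pair.original_key)
        if current is None or pair.version > current.version:
            latest[pair.original_key] = pair
    return {
        key: pair.value
        for key, pair in sorted(latest.items())
        # Assumption: a key whose newest entry is a deletion is not reported.
        if not (drop_deleted and pair.op_type == DELETE)
    }
```

A usage example with two hypothetical files: given `[[StoredPair("a", 1, PUT, "v1"), StoredPair("b", 2, PUT, "b2")], [StoredPair("a", 3, PUT, "v3"), StoredPair("b", 4, DELETE, None)]]`, calling `second_level(first_level(files))` keeps only `"a"` with its newest value, because the newest entry for `"b"` is a deletion.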

According to another aspect of the present disclosure, a computing device is also disclosed. Fig. 13 shows a block diagram of a computing device 1300, according to an embodiment of the disclosure. The computing device includes: a processor; and a memory having instructions stored thereon which, when executed by the processor, cause the processor to perform the various steps of the query method as described with reference to fig. 4-6.

The computing device may be a computer terminal, mobile terminal, or other device having computing and processing capabilities. The computing device may be a server (e.g., a metadata server or a server of a database system).

The processor may be an integrated circuit chip having signal processing capabilities. The processor may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components, and may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present disclosure. The general purpose processor may be a microprocessor or any conventional processor, and may be of the X86 or ARM architecture.

The memory may be a non-volatile memory such as a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. It should be noted that the memories described in this disclosure are intended to include, without being limited to, these and any other suitable types of memory.

The display screen of the computing device can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computing device can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on a terminal shell, an external keyboard, a touch pad or a mouse and the like.

According to another aspect of the present disclosure, there is also provided a computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the query method as described with reference to fig. 4-6.

According to yet another aspect of the present disclosure, there is also provided a computer program product comprising a computer program which, when executed by a processor, performs the steps of the query method as described with reference to fig. 4-6. The computer program may be stored in a computer readable storage medium.

The storage medium mentioned above may be a non-volatile storage medium such as a read-only memory, a magnetic or optical disk, or the like.

It is to be noted that the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises at least one executable instruction for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In general, the various example embodiments of this disclosure may be implemented in hardware or special purpose circuits, software, firmware, logic or any combination thereof. Certain aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device. While aspects of embodiments of the disclosure have been illustrated or described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that the blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.

The exemplary embodiments of the present disclosure described in detail above are merely illustrative, and not restrictive. It will be appreciated by those skilled in the art that various modifications and combinations of these embodiments or features thereof may be made without departing from the principles and spirit of the disclosure, and that such modifications are intended to be within the scope of the disclosure.

Claims

1. A method of querying metadata in a distributed file system, the distributed file system including a metadata repository, comprising:

in response to a snapshot creation indication from a first entity, creating a snapshot for a directory to be queried, wherein the directory to be queried corresponds to at least one storage partition in the metadata base, the snapshot includes a metadata storage file set corresponding to the directory to be queried, and the metadata storage file set includes a plurality of metadata storage files in the at least one storage partition for storing metadata of files under the directory to be queried;

sending, via the first entity, partition identification information and snapshot identification information to a second entity serving as a query requester, wherein the partition identification information identifies a storage partition in the metadata base corresponding to the directory to be queried, and the storage partition corresponding to the directory to be queried is a storage partition in the metadata base for storing metadata of files of the directory to be queried;

obtaining a snapshot analysis request, the partition identification information, and the snapshot identification information from the second entity;

determining a snapshot to be analyzed based on the partition identification information and the snapshot identification information, and analyzing the snapshot to be analyzed in response to the snapshot analysis request, so as to obtain metadata of files in the directory to be queried from a metadata storage file set included in the snapshot to be analyzed.

2. The method according to claim 1, wherein there is at least one snapshot created at at least one point in time for the directory to be queried,

determining a snapshot to be analyzed based on the partition identification information and the snapshot identification information, including:

determining the at least one storage partition of metadata of the file under the directory to be queried in the metadata base based on the partition identification information, and determining a snapshot set corresponding to the at least one storage partition, wherein each snapshot in the snapshot set corresponds to a metadata storage file set of the metadata in the at least one storage partition at its corresponding point in time; and

determining a snapshot corresponding to the snapshot identification information from the snapshot set based on the snapshot identification information.

3. The method of claim 1 or 2, wherein the snapshot analysis request comprises a parse sub-request and a traverse sub-request,

analyzing the snapshot in response to the snapshot analysis request, and obtaining metadata of the file under the directory to be queried based on the metadata storage file set, including:

parsing the snapshot in response to the parse sub-request, and opening a metadata storage file set included in the determined snapshot; and

traversing the opened metadata storage file set in response to the traverse sub-request to obtain metadata of the file in the directory to be queried.

4. The method of claim 3, wherein the metadata is organized in a key-value pair format, wherein the metadata repository is a key-value repository storing metadata in key-value pairs, and wherein the internal key-code of each stored key-value pair in the set of metadata storage files comprises: the original key code, version information and operation type before the storage key-value pair is stored,

wherein traversing the opened set of metadata storage files in response to the traverse sub-request comprises:

determining a first level traversal pattern and a second level traversal pattern in response to the traversal sub-request;

traversing the metadata storage file set by utilizing a first-level traversal mode to obtain a first group of storage key-value pairs, wherein the storage key-value pairs of the first group of storage key-value pairs are different in at least one of original key codes, version information and operation types;

and traversing the first group of storage key value pairs by utilizing a second-level traversal mode, and determining a second group of key value pairs based on version information included in the storage key value pairs in the first group of storage key value pairs, wherein the key codes of the key value pairs in the second group of key value pairs are original key codes and are different.

5. The method of claim 4, wherein determining a second set of key-value pairs based on version information included in the stored key-value pairs of the first set of stored key-value pairs comprises:

determining at least one subgroup of storage key value pairs of the first group of storage key value pairs, wherein the internal key codes comprise the same original key codes;

for the storage key value pair of each subgroup, screening out one storage key value pair with the latest version from the storage key value pairs of the subgroups based on version information to serve as a representative key value pair of the subgroup;

the representative key-value pairs of all subgroups are taken as the second group of key-value pairs.

6. The method of claim 4 or 5, wherein the set of metadata storage files comprises a plurality of metadata storage files having different hierarchical levels;

wherein traversing the metadata storage file set by using the first-level traversal mode to obtain the first group of storage key-value pairs comprises:

traversing the storage file of the lowest level and a plurality of metadata storage files of other levels separately based on the first-level traversal mode to obtain the first group of storage key-value pairs.

7. A method according to claim 4 or 5, wherein the second set of key-value pairs is used to populate a pre-set data structure, and the pre-set data structure is returned to the second entity.

8. The method of claim 1, wherein the first entity is a metadata server and the second entity is a client.

9. A method of querying metadata in a distributed file system, the distributed file system including a metadata repository, the method comprising:

sending a query request aiming at a directory to be queried to a metadata server;

obtaining partition identification information and snapshot identification information from a metadata server, wherein the partition identification information identifies at least one storage partition in the metadata base for storing metadata of files of the directory to be queried, the snapshot identification information identifies a snapshot created for the directory to be queried, the snapshot includes a metadata storage file set composed of a plurality of metadata storage files in the at least one storage partition for storing metadata of files under the directory to be queried,

sending the partition identification information, the snapshot identification information and a snapshot analysis request to the metadata base, wherein the snapshot analysis request is used for triggering the metadata base to perform analysis on a snapshot determined based on the partition identification information and the snapshot identification information so as to obtain metadata of a file in the directory to be queried; and

acquiring the metadata of the file under the directory to be queried from the metadata base.

10. The method of claim 9, wherein the snapshot analysis request comprises a parse sub-request and a traverse sub-request, wherein:

the parse sub-request is used for triggering, at the metadata base, parsing of the determined snapshot and opening of a metadata storage file set included in the determined snapshot;

the traversal sub-request is used for triggering traversal of the opened metadata storage file set at the metadata base to obtain metadata of the file under the directory to be queried.

11. A method of querying metadata in a distributed file system, the distributed file system including a metadata repository, the method comprising:

acquiring a query request aiming at a directory to be queried;

sending, in response to the query request, a snapshot creation indication for creating a snapshot for the directory to be queried to the metadata base;

obtaining partition identification information and snapshot identification information from the metadata base, wherein the partition identification information identifies at least one storage partition used for storing metadata of files of the directory to be queried in the metadata base, the snapshot identification information identifies a snapshot created for the directory to be queried, and the snapshot includes a metadata storage file set composed of a plurality of metadata storage files used for storing metadata of files under the directory to be queried in the at least one storage partition;

and sending the partition identification information and the snapshot identification information to a requester sending the query request, wherein the partition identification information and the snapshot identification information are sent to a metadata base by the requester for determining a snapshot to be analyzed to obtain metadata of the file under the directory to be queried.

12. The method of claim 11, further comprising:

determining the number of files under the directory to be queried based on the query request;

in the case that the number is greater than or equal to a threshold value, sending the snapshot creation indication to the metadata base; and

executing a common query mode in the case that the number is smaller than the threshold value, wherein in the common query mode, at least one query request is received from the requester, and metadata of a preset number of the files in the directory to be queried is returned to the requester for each query request.

13. A method of querying metadata in a distributed file system, the distributed file system including a metadata repository, the method comprising:

sending, by the metadata server, a snapshot creation indication to the metadata base in response to a query request from a client for a directory to be queried;

creating, by the metadata base, a snapshot for the directory to be queried in response to the snapshot creation indication, and returning partition identification information and snapshot identification information to the client via the metadata server, where the partition identification information identifies at least one storage partition in the metadata base for storing metadata of files of the directory to be queried, and the snapshot identification information identifies the created snapshot, where the snapshot includes a metadata storage file set composed of a plurality of metadata storage files in the at least one storage partition for storing metadata of files under the directory to be queried;

sending, by the client, a snapshot analysis request, the partition identification information, and the snapshot identification information to the metadata repository;

determining, by the metadata base, a snapshot to be analyzed based on the partition identification information and the snapshot identification information, and analyzing the snapshot to be analyzed in response to the snapshot analysis request to obtain metadata of the file under the directory to be queried from the metadata storage file set.

14. A distributed file system, comprising: a client; a metadata server; and a metadata repository characterized by:

the metadata server, in response to a query request from the client for a directory to be queried, sends a snapshot creation indication to the metadata base;

the metadata base responds to the snapshot creation indication, creates a snapshot for the directory to be queried, and returns partition identification information and snapshot identification information to the client through the metadata server, wherein the partition identification information identifies at least one storage partition used for storing metadata of files of the directory to be queried in the metadata base, the snapshot identification information identifies the created snapshot, and the snapshot comprises a metadata storage file set used for storing metadata of files under the directory to be queried in the at least one storage partition;

the client sends a snapshot analysis request, the partition identification information and the snapshot identification information to the metadata base;

and the metadata base determines a snapshot to be analyzed based on the partition identification information and the snapshot identification information, and analyzes the snapshot to be analyzed in response to the snapshot analysis request so as to obtain metadata of the file in the directory to be queried from the metadata storage file set.

15. A computing device, comprising:

a processor; and

a memory including one or more computer program modules;

wherein the one or more computer program modules are configured to be executed by the processor, the one or more computer program modules comprising instructions for implementing the method of querying metadata in a distributed file system of any of claims 1-12.