CN115098466A

CN115098466A - Metadata management method and device, storage node and readable storage medium

Info

Publication number: CN115098466A
Application number: CN202210843863.XA
Authority: CN
Inventors: 林杰
Original assignee: Chongqing Unisinsight Technology Co Ltd
Current assignee: Chongqing Unisinsight Technology Co Ltd
Priority date: 2022-07-18
Filing date: 2022-07-18
Publication date: 2022-09-23

Abstract

The invention relates to the technical field of distributed storage and provides a metadata management method and device, a storage node and a readable storage medium. The method comprises: receiving, from a client, an operation request for operating a target object, the target object being a target directory or a target file; determining target metadata among the metadata of the target object based on the operation request, and operating on the target object by accessing the target metadata to obtain an operation result, wherein the metadata of the target directory comprises access control metadata, content metadata and directory list metadata, and the metadata of the target file comprises access metadata; and returning the operation result to the client. Embodiments of the invention simplify metadata management and effectively improve metadata access performance.

Description

Metadata management method and device, storage node and readable storage medium

Technical Field

The invention relates to the technical field of distributed file storage, in particular to a metadata management method, a metadata management device, a storage node and a readable storage medium.

Background

Applications that store massive numbers of small files generally require high throughput and low latency, and must still provide stable storage and access performance at the scale of billions of small files. Existing distributed file systems, such as the Ceph file system, store directory/file metadata in independent objects, which are complex to design and manage.

Disclosure of Invention

The invention aims to provide a metadata management method and device, a storage node and a readable storage medium that decouple directory metadata from file metadata by dividing each of them in advance. This simplifies metadata management, allows only part of the metadata to be accessed when operating on directories and files, and effectively improves metadata access performance.

In order to achieve the above purpose, the embodiment of the present invention adopts the following technical solutions:

In a first aspect, an embodiment of the present invention provides a metadata management method, which is applied to any one of a plurality of storage nodes in a distributed file system, where the storage node is communicatively connected to a client, and the method includes:

receiving an operation request, sent by the client, for operating a target object, wherein the target object comprises a target directory or a target file;

determining target metadata among the metadata of the target object based on the operation request, and operating on the target object by accessing the target metadata to obtain an operation result, wherein the metadata of the target directory comprises access control metadata, content metadata and directory list metadata, and the metadata of the target file comprises access metadata; the access control metadata is used to control access to the target directory; the content metadata represents management information of the target directory related to its subdirectories and files; the directory list metadata represents list information of the subdirectories and files included in the target directory; and the access metadata represents the access information required to access the target file;

and returning the operation result to the client.

Optionally, the step of determining target metadata in the metadata of the target object based on the operation request, and operating the target object by accessing the target metadata includes:

if the operation request is used for creating the target directory, determining the access control metadata and the content metadata of the target directory as the target metadata;

storing the access control metadata and the content metadata of the target directory to create the target directory.

Optionally, the storage node includes a first storage tier and a second storage tier in decreasing order of access performance, and the step of storing the access control metadata and the content metadata of the target directory includes:

acquiring a directory identifier, a directory name and a parent directory identifier of the target directory;

determining a first storage node from the plurality of storage nodes according to the parent directory identifier and the directory name;

storing the access control metadata to a first storage tier and a second storage tier of the first storage node;

determining a second storage node from the plurality of storage nodes according to the directory identifier of the target directory;

storing the content metadata to a first storage tier and a second storage tier of the second storage node.
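The directory-creation steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `StorageNode` class, the `partition` helper, CRC32 as the hash, and routing the access control record by the parent id alone (the detailed description later partitions `<pid, name>` keys by pid) are all assumptions made for the example.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    node_id: int
    tier1: dict = field(default_factory=dict)  # first tier: in-memory cache
    tier2: dict = field(default_factory=dict)  # second tier: SSD key-value store


def partition(dir_id: int, num_nodes: int) -> int:
    """Stable hash partition on a 4-byte directory id (CRC32 is an assumed choice)."""
    return zlib.crc32(dir_id.to_bytes(4, "big")) % num_nodes


def put_both_tiers(node: StorageNode, key, value) -> None:
    # Each record is stored in both the first and the second storage tier.
    node.tier1[key] = value
    node.tier2[key] = value


def create_directory(nodes, dir_id, name, parent_id, access_meta):
    """Create a directory by storing its access control and content metadata."""
    # Access control metadata: key <parent id, name>, routed by the parent id.
    first = nodes[partition(parent_id, len(nodes))]
    put_both_tiers(first, (parent_id, name), {"id": dir_id, **access_meta})
    # Content metadata: key <id>, routed by the new directory's own id.
    # logidx 0 is a hypothetical "no archived EntryList yet" value.
    second = nodes[partition(dir_id, len(nodes))]
    put_both_tiers(second, dir_id, {"logidx": 0, "entries": 0})
    return first.node_id, second.node_id


nodes = [StorageNode(i) for i in range(4)]
first_id, second_id = create_directory(
    nodes, dir_id=7, name="photos", parent_id=0,
    access_meta={"permission": 0o755, "user": "alice"})
print(nodes[first_id].tier2[(0, "photos")])  # {'id': 7, 'permission': 493, 'user': 'alice'}
```

Because the two records are routed by different ids, they generally land on different nodes; only the content record will later share a node with the directory's children.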

Optionally, the storage node includes a first storage tier, a second storage tier and a third storage tier in decreasing order of access performance, and the step of determining target metadata among the metadata of the target object based on the operation request and operating on the target object by accessing the target metadata further includes:

if the operation request is used for obtaining a directory list of the target directory, determining the directory list metadata as the target metadata;

determining, according to the directory identifier of the target directory, a third storage node that stores the log index of the target directory, and reading the log index from the first storage tier of the third storage node;

if the log index is a preset value, reading the directory list metadata from the second storage tier of the third storage node to obtain the directory list of the target directory;

if the log index is not the preset value, determining a fourth storage node from the plurality of storage nodes according to a preset erasure coding policy;

reading the directory list metadata from the third storage tier of the fourth storage node to obtain the directory list of the target directory.
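A sketch of this directory-listing read path, under stated assumptions: nodes are plain dicts with `tier1`/`tier2`/`tier3`, the preset value is taken to be `0`, and `ec_node_for` is a hypothetical stand-in for the unspecified erasure coding placement policy.

```python
import zlib

PRESET_LOGIDX = 0  # hypothetical "no archived EntryList" value


def partition(dir_id, num_nodes):
    # Stable hash on the 4-byte directory id (CRC32 is an assumed choice).
    return zlib.crc32(dir_id.to_bytes(4, "big")) % num_nodes


def ec_node_for(logidx, num_nodes):
    # Placeholder for the preset erasure coding placement policy (not specified).
    return logidx % num_nodes


def list_directory(nodes, dir_id):
    """Return the sorted names in a directory, following the claimed read path."""
    third = nodes[partition(dir_id, len(nodes))]
    logidx = third["tier1"][dir_id]["logidx"]   # log index read from tier 1
    if logidx == PRESET_LOGIDX:
        # No archive yet: prefix-iterate <dir_id, name> keys in tier 2.
        # (In this sketch tier 2 holds only (pid, name) -> location entries.)
        return sorted(name for (pid, name) in third["tier2"] if pid == dir_id)
    fourth = nodes[ec_node_for(logidx, len(nodes))]
    return sorted(fourth["tier3"][logidx])      # archived EntryList in tier 3


nodes = [{"tier1": {}, "tier2": {}, "tier3": {}} for _ in range(3)]
small, big = 5, 6
nodes[partition(small, 3)]["tier1"][small] = {"logidx": PRESET_LOGIDX}
nodes[partition(small, 3)]["tier2"].update({(small, "b.txt"): "loc1", (small, "a.txt"): "loc2"})
nodes[partition(big, 3)]["tier1"][big] = {"logidx": 42}
nodes[ec_node_for(42, 3)]["tier3"][42] = {"x.jpg": "loc3", "y.jpg": "loc4"}
print(list_directory(nodes, small))  # ['a.txt', 'b.txt']
print(list_directory(nodes, big))    # ['x.jpg', 'y.jpg']
```

The point of the branch is that small directories never touch the slow tier, while large directories avoid an expensive prefix scan.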

Optionally, the storage node includes a first storage tier and a second storage tier in decreasing order of access performance, and the step of determining target metadata among the metadata of the target object based on the operation request and operating on the target object by accessing the target metadata further includes:

if the operation request is used for creating a target file, determining the access metadata as the target metadata;

acquiring a directory identifier of a directory to which the target file belongs;

determining a fifth storage node for storing the access metadata according to the directory identifier of the directory and the file name of the target file;

storing the access metadata to the first storage tier and the second storage tier of the fifth storage node to create the target file.
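File creation can be sketched in the same style. As assumptions: the fifth node is chosen by hashing the directory id (so the `<dir_id, file_name>` key lands with the directory's own content metadata), nodes are plain dicts, and the `entries` counter is a hypothetical field of the parent's content record.

```python
import zlib


def partition(dir_id, num_nodes):
    # Stable hash on the 4-byte directory id (CRC32 is an assumed choice).
    return zlib.crc32(dir_id.to_bytes(4, "big")) % num_nodes


def create_file(nodes, dir_id, file_name, location):
    """Store a file's access metadata, keyed <dir_id, file_name>, on two tiers."""
    fifth = nodes[partition(dir_id, len(nodes))]
    key = (dir_id, file_name)
    if key in fifth["tier2"]:
        raise FileExistsError(file_name)
    fifth["tier1"][key] = location     # first tier: memory cache
    fifth["tier2"][key] = location     # second tier: SSD key-value store
    # The parent's content metadata is routed by the same id, so it lives on
    # this node and can be updated locally, without a distributed transaction.
    fifth["tier2"][("content", dir_id)]["entries"] += 1
    return fifth


nodes = [{"tier1": {}, "tier2": {}} for _ in range(4)]
home = nodes[partition(7, 4)]
home["tier2"][("content", 7)] = {"entries": 0}
node = create_file(nodes, 7, "a.jpg", "fid12-len3000-off0")
print(node is home, home["tier2"][("content", 7)]["entries"])  # True 1
```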

Optionally, the storage node further includes a third storage tier whose access performance is lower than that of the second storage tier, and the method further includes:

acquiring the total number of subdirectories and files included in the directory;

if the total number is greater than a preset number, determining a sixth storage node from the plurality of storage nodes according to a preset erasure coding policy;

and storing the directory list metadata of the directory to the third storage tier of the sixth storage node.

Optionally, the step of determining target metadata in the metadata of the target object based on the operation request, and operating the target object by accessing the target metadata further includes:

if the operation request is used for acquiring the file state of a target file, determining the access metadata as the target metadata;

acquiring the access metadata of the target file according to the directory identifier of the directory to which the target file belongs and the file name of the target file;

acquiring the access control metadata of the directory to which the target file belongs;

and taking the access metadata of the target file together with the access control metadata of that directory as the file state of the target file.
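A minimal sketch of building the file state from the two records. Assumptions: the file's access metadata is a location keyed by `(dir_id, file_name)`, the directory's access control record is already at hand, and the copied fields (`permission`, `user`, `timestamp`) are illustrative choices rather than a field list from the source.

```python
def file_stat(file_access_store, dir_id, file_name, dir_access):
    """Combine file access metadata with the parent directory's access control metadata."""
    try:
        location = file_access_store[(dir_id, file_name)]
    except KeyError:
        raise FileNotFoundError(file_name) from None
    # The directory's access control metadata stands in for the file's own attributes.
    stat = {k: dir_access[k] for k in ("permission", "user", "timestamp")}
    stat.update(name=file_name, location=location)
    return stat


files = {(7, "a.jpg"): (12, 3000, 4096)}
photos = {"id": 7, "permission": 0o755, "user": "alice", "timestamp": 1658102400}
print(file_stat(files, 7, "a.jpg", photos))
```

No per-file attribute record is read, which is what keeps a stat down to two key-value lookups.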

In a second aspect, an embodiment of the present invention provides a metadata management apparatus, which is applied to any storage node in a plurality of storage nodes in a distributed file system, where the storage node is communicatively connected to a client, and the apparatus includes:

a receiving module, configured to receive an operation request for operating a target object sent by the client, where the target object includes a target directory or a target file;

the processing module is used for determining target metadata among the metadata of the target object based on the operation request and operating on the target object by accessing the target metadata to obtain an operation result, wherein the metadata of the target directory comprises access control metadata, content metadata and directory list metadata, and the metadata of the target file comprises access metadata; the access control metadata is used to control access to the target directory; the content metadata represents management information of the target directory related to its subdirectories and files; the directory list metadata represents list information of the subdirectories and files included in the target directory; and the access metadata represents the access information required to access the target file;

and the return module is used for returning the operation result to the client.

In a third aspect, an embodiment of the present invention provides a storage node, including a processor and a memory, where the memory is used to store a program, and the processor is configured to implement the metadata management method according to the first aspect when executing the program.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the metadata management method as described in the first aspect.

The beneficial effects are as follows. When an operation request for operating a target object is received from a client, the target object being a target directory or a target file, target metadata is determined among the metadata of the target object based on the operation request, and the target object is operated on by accessing that target metadata to obtain the operation result. Here the metadata of the target directory comprises access control metadata (used to control access to the target directory), content metadata (management information related to the subdirectories and files of the target directory) and directory list metadata (list information of the subdirectories and files included in the target directory), and the metadata of the target file comprises access metadata (the access information required to access the target file). Dividing the metadata in this way simplifies metadata management, and because only part of the metadata is accessed per operation, metadata access performance is effectively improved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.

Fig. 1 is an exemplary diagram of an application scenario provided in an embodiment of the present invention.

Fig. 2 is a schematic block diagram of a storage node according to an embodiment of the present invention.

Fig. 3 is an exemplary diagram of a distributed file logic architecture according to an embodiment of the present invention.

Fig. 4 is an exemplary diagram of a hierarchical storage according to an embodiment of the present invention.

Fig. 5 is a first flowchart illustrating a metadata management method according to an embodiment of the present invention.

Fig. 6 is a first interaction diagram of a metadata management method according to an embodiment of the present invention.

Fig. 7 is a second interaction diagram of a metadata management method according to an embodiment of the present invention.

Fig. 8 is a third interaction diagram of a metadata management method according to an embodiment of the present invention.

Fig. 9 is a fourth interaction diagram of the metadata management method according to the embodiment of the present invention.

Fig. 10 is a schematic block diagram of a metadata management apparatus according to an embodiment of the present invention.

Reference numerals: 10 - storage node; 11 - processor; 12 - memory; 13 - bus; 14 - communication interface; 20 - client; 100 - metadata management apparatus; 110 - receiving module; 120 - processing module; 130 - return module.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.

Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.

In the description of the present invention, it should be noted that if the terms "upper", "lower", "inside", "outside", etc. indicate an orientation or a positional relationship based on that shown in the drawings or that the product of the present invention is used as it is, this is only for convenience of description and simplification of the description, and it does not indicate or imply that the device or the element referred to must have a specific orientation, be constructed in a specific orientation, and be operated, and thus should not be construed as limiting the present invention.

Furthermore, the appearances of the terms "first," "second," and the like, if any, are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.

It should be noted that the features of the embodiments of the present invention may be combined with each other without conflict.

Referring to fig. 1, fig. 1 is an exemplary diagram of an application scenario provided by an embodiment of the present invention. In fig. 1, a distributed file system includes a plurality of storage nodes 10 that communicate with each other. The distributed file system exposes a public IP to provide storage services externally, and a client 20 accesses directories and files in the distributed file system through this public IP. The distributed file system allows the client 20 to create directories and files; to modify, delete or list created directories; and to read, write, delete or view the state of created files.

In the application scenario of fig. 1, a distributed file system that stores and manages billions of small files or more faces the following major challenges:

1) the volume of metadata is large, its management is complex and access efficiency is low, and complex strategies such as partitioning or table sharding are needed to balance metadata management;

2) metadata objects are complex and occupy a lot of space, storage space utilization is low, and large amounts of high-performance storage media are required, such as solid-state drives (SSDs) and NVMe SSDs (SSDs that use the Non-Volatile Memory Express host controller interface standard);

3) the directory tree is deep in hierarchy, the file access path is long, and the access time delay is high;

In view of the above problems, existing distributed file systems such as CephFS use a dynamic subtree partitioning strategy to balance partitions of the whole namespace across different metadata servers, organize directory/file metadata hierarchically as a directory tree, and persistently store directory/file metadata in an SSD pool as objects. Hot directory/file metadata is cached using a Least Recently Used (LRU) warm-up algorithm. However, storing directory/file metadata in separate objects complicates the design and occupies a large amount of storage space. Managing hot directory/file metadata with a simple LRU warm-up algorithm requires a large cache and yields a low cache hit rate, and a single directory/file operation involves interactions with multiple metadata servers/storage nodes, so the access path is long and latency is high. Moreover, the coarse-grained dynamic subtree partitioning strategy is prone to load imbalance in scenarios with multiple types of service load.

To solve the problems of existing distributed file systems, this embodiment provides a metadata management method and apparatus, a storage node and a readable storage medium, which simplify metadata management, optimize metadata access performance and effectively alleviate load imbalance.

On the basis of fig. 1, an embodiment of the present invention further provides a block schematic diagram of the storage node in fig. 1, please refer to fig. 2, fig. 2 is a block schematic diagram of the storage node provided in the embodiment of the present invention, and the storage node 10 may be a storage server, or a storage server group, a storage array, or a host group providing a storage function, which is formed by a plurality of storage servers. The storage node 10 comprises a memory 11, a processor 12, a bus 13, a communication interface 14. The memory 11 and the processor 12 are connected by a bus 13, and the processor 12 is communicatively connected to other storage nodes 10 or clients 20 by a communication interface 14.

The memory 11 is used for storing a program, such as a metadata management apparatus in the embodiment, the metadata management apparatus includes at least one software functional module which can be stored in the memory 11 in a form of software or firmware (firmware), and the processor 12 executes the program after receiving an execution instruction to implement the metadata management method disclosed in the above embodiment.

The memory 11 may include high-speed random access memory (RAM) and may further include non-volatile memory, such as at least one disk storage device. Alternatively, the memory 11 may be a storage device built into the processor 12, or a storage device independent of the processor 12.

The bus 13 may be an ISA bus, a PCI bus, an EISA bus, or the like. Fig. 2 is represented by only one double-headed arrow, but does not represent only one bus or one type of bus.

The processor 12 may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be performed by integrated hardware logic circuits or by software instructions in the processor 12. The processor 12 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP) and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components.

The communication connections between the storage node 10 and other storage nodes 10, and the clients 20 are realized by at least one communication interface 14 (which may be wired or wireless).

First, the logical architecture of the distributed file system used in the application scenario of fig. 1 is described; please refer to fig. 3, which is an exemplary diagram of the logical architecture of a distributed file system according to an embodiment of the present invention. In fig. 3, each storage node in the distributed file system (referred to as MinFS in this embodiment) includes three storage media, namely memory, SSD and mechanical hard disk, whose access performance decreases in that order. The distributed file system provides the client 20 with two access modes, Network File System (NFS) and Filesystem in Userspace (FUSE). Each storage node provides a Cache Server service, a Metadata Server service and a Data Server service: metadata is held in the memory of the storage node 10 by the Cache Server service and on the SSD of the storage node 10 by the Metadata Server service, while the mechanical hard disk of the storage node 10 is managed by the Data Server service. The Cache Server stores metadata in hashed form, the Metadata Server stores metadata as key-value pairs, and the Data Server stores its contents with erasure coding. Specifically, the logical architecture of fig. 3 includes the following main components:

1) Clients (client 20): MinFS provides a globally unified namespace and supports client access through LibMinFS, NFS and FUSE. LibMinFS performs far better than NFS and FUSE; it supports direct access and has no path-resolution overhead.

2) Metadata Servers: maintain a flattened namespace. The back end of each storage node's Metadata Server uses a KV Stor persistence engine, such as RocksDB, which stores and manages directory and file metadata as key-value pairs. High-performance storage media such as SSD/NVMe SSD are typically used to provide efficient metadata access.

3) Cache Servers: use a hotspot-aware classification algorithm, such as LRU, to identify and cache hot directory and file metadata. The back end of each storage node's Cache Server uses an in-memory KV retrieval engine, such as Redis, which caches hot directory and file metadata as key-value pairs, providing efficient metadata retrieval and effectively reducing path-resolution latency.

4) Data Servers: store files in a uniformly distributed manner, usually use mechanical hard disks (HDDs) as the back-end storage medium, and provide highly reliable data services with an erasure coding redundancy policy. Massive numbers of small files are stored in aggregated form to improve storage space utilization. Directory entry lists are saved using write-ahead logging (WAL), which reduces the latency of reading the directory list of a large directory.
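The WAL-style directory entry list can be sketched as an append-only log, using the record format this embodiment gives later: `<name, location>` for a create, `<name>` for a delete, built once in the background when a directory exceeds a threshold such as 1000 entries. The class and function names are hypothetical, and the log is an in-memory list rather than an erasure-coded file on the Data Servers.

```python
THRESHOLD = 1000  # example threshold from the description


class EntryListWal:
    """Append-only directory EntryList: (name, location) adds, (name, None) deletes."""

    def __init__(self, snapshot):
        # One-time background build from a prefix iteration of the directory.
        self.records = [(name, loc) for name, loc in sorted(snapshot.items())]

    def on_create(self, name, location):
        self.records.append((name, location))

    def on_delete(self, name):
        self.records.append((name, None))   # stands for the <name> delete record

    def read_list(self):
        # Replay the log in order; later records override earlier ones.
        live = {}
        for name, loc in self.records:
            if loc is None:
                live.pop(name, None)
            else:
                live[name] = loc
        return live


def maybe_archive(entries):
    """Build a WAL only for directories whose entry count exceeds the threshold."""
    return EntryListWal(entries) if len(entries) > THRESHOLD else None


entries = {f"f{i:04d}": f"loc{i}" for i in range(1001)}
wal = maybe_archive(entries)
wal.on_create("new.txt", "locX")
wal.on_delete("f0000")
listing = wal.read_list()
print(len(listing), "f0000" in listing, listing["new.txt"])  # 1001 False locX
```

Appends are sequential writes, which suits HDDs; the cost moves to replay at read time, which a real system might bound by periodically rewriting the log.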

To simplify metadata management, this embodiment manages directory metadata and file metadata separately and, based on their access characteristics, decomposes each of them. Directory metadata is decomposed into access control metadata (referred to as Access Metadata in this embodiment), content metadata (referred to as Content Metadata) and directory list metadata (referred to as EntryList). The Access Metadata controls access to the target directory and mainly describes attributes of the directory itself; the Content Metadata represents management information about the subdirectories and files of the target directory; and the EntryList represents list information of the subdirectories and files included in the target directory.

This embodiment decomposes file metadata into access metadata and extended attribute metadata. Access metadata relates to the user's file data, for example where the file data is stored; extended attribute metadata (referred to as Extended Attributes in this embodiment) describes extended attributes of the file, for example its access control information. Because a file's Extended Attributes are generally the same as the Access Metadata of the directory to which it belongs, this embodiment, to simplify processing further, does not maintain separate Extended Attributes for each file but uses the Access Metadata of the file's directory as the file's Extended Attributes, which also greatly reduces metadata storage and cache space.

Standard file systems such as EXT4, XFS and CephFS typically use hundreds of bytes per directory/file inode, including name (directory/file name), id (directory/file identifier), permission (access rights), user (owning user), timestamp, entry list (directory list), file location (storage location of the file), and so on. In fact, directory/file inode information can be split into multiple independent parts; Table 1 is an example of metadata classification in MinFS.

TABLE 1

In Table 1, the Extended Attributes of file metadata are merged with the Access Metadata of the file's parent directory. In a massive-small-file storage scenario, users usually care more about file data and access latency and rarely set Extended Attributes on an individual file, so this merging has little effect on file information while greatly reducing the volume of metadata. This simplification enables MinFS to support storage management of billions of small files with high OPS and low latency.

In Table 1, directory metadata and file metadata also use different storage formats, specifically:

1) Directory Access Metadata is stored in the MetaServers as key-value pairs. The Key is <pid, name> and the Value is {id, permission, user, timestamp, etc.}, where pid is the id of the directory's parent directory and occupies 4 bytes; the id of the root directory "/" is 0. Metadata objects are balanced across different MetaServers by hash partitioning on pid.

2) Directory Content Metadata is stored in the MetaServers as key-value pairs. The Key is <id> and the Value is {logidx, filiidx, capacity, etc.}, where logidx and filiidx give the storage location information of the directory EntryList and of the aggregated file in the DataServers, respectively; filiidx supports deleting all file data of a directory in a single operation. Entries are partitioned across MetaServers by hash of the directory id, which keeps metadata access local, and the directory EntryList can be obtained by prefix iteration on the directory id. Renaming a directory does not change its id; only the Key of the directory Access Metadata needs to be updated, so descendant metadata is unaffected.

3) The directory EntryList is stored in the DataServers in WAL form, with the format {<name1, location>, <name2, location>, <name>, …}. For a large directory, prefix iteration on the id consumes considerable CPU and IO resources and takes a long time. When the number of files exceeds a threshold (e.g., 1000), a WAL file of the directory EntryList can be generated once by background prefix iteration and then updated by appending: a file-create operation appends <name, location>; a file-delete operation appends <name>; and a get-directory-list operation reads the directory EntryList directly from the DataServers via logidx.

4) File Access Metadata is stored in the MetaServers as key-value pairs, where the Key is [pid, name], the Value is <location>, and pid is the id of the parent directory to which the file belongs. Small-file data is stored in the DataServers aggregated into large files, and location represents its storage position in the format Fileid-Len-Offset, where Fileid is the unique index identifier of the aggregated file in the DataServers, Len is the data length of the small file, Fileid and Len together occupy 8 bytes, and Offset is the offset of the small file within the aggregated file and occupies 4 bytes. Entries are partitioned across MetaServers by hash of pid, which ensures access locality: for example, file create/delete operations involve the parent directory Content Metadata and the file Access Metadata, and can be completed as a local transaction on one Metadata Server, avoiding distributed transaction overhead.

5) The file's Extended Attributes are merged with the directory Access Metadata for storage. A file stat operation generates the file state attributes by combining the parent directory Access Metadata and the file Access Metadata.
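The key and value layouts in items 1) to 4) can be sketched in a few helpers. Several details are assumptions, not given in the source: big-endian byte order (chosen so that all children of one parent sort contiguously, which suits prefix iteration in an ordered store such as RocksDB), CRC32 as the partition hash, and a 40-bit/24-bit split of the 8-byte Fileid+Len field.

```python
import struct
import zlib

ROOT_ID = 0        # id of the root directory "/"
LEN_BITS = 24      # assumption: the split of the 8-byte Fileid+Len field is not given


def access_key(pid: int, name: str) -> bytes:
    """Directory/file Access Metadata key <pid, name>: 4-byte pid + UTF-8 name."""
    return struct.pack(">I", pid) + name.encode("utf-8")


def content_key(dir_id: int) -> bytes:
    """Directory Content Metadata key <id>."""
    return struct.pack(">I", dir_id)


def metaserver_for(dir_id: int, num_servers: int) -> int:
    # The same hash serves children's Access Metadata (routed by pid) and the
    # directory's own Content Metadata (routed by id), so they share a MetaServer.
    return zlib.crc32(struct.pack(">I", dir_id)) % num_servers


def pack_location(fileid: int, length: int, offset: int) -> bytes:
    """Fileid-Len-Offset: Fileid and Len share 8 bytes, Offset takes 4 bytes."""
    if not (0 <= fileid < 1 << (64 - LEN_BITS) and 0 <= length < 1 << LEN_BITS
            and 0 <= offset < 1 << 32):
        raise ValueError("field out of range")
    return struct.pack(">QI", (fileid << LEN_BITS) | length, offset)


def unpack_location(raw: bytes):
    combined, offset = struct.unpack(">QI", raw)
    return combined >> LEN_BITS, combined & ((1 << LEN_BITS) - 1), offset


print(access_key(ROOT_ID, "home"))         # b'\x00\x00\x00\x00home'
loc = pack_location(fileid=12, length=3000, offset=4096)
print(len(loc), unpack_location(loc))      # 12 (12, 3000, 4096)
```

Under the assumed 24-bit length, a single small file is capped just below 16 MiB; a different split trades that cap against the number of addressable aggregated files.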

In this embodiment, KV Stor persistence engines based on the LSM-tree principle, such as RocksDB, generally suffer from read/write amplification. RocksDB provides good sequential write performance by first writing to MemTables and then merging them into SSTables in the background, but a read may need to search multiple MemTables and SSTables, which consumes CPU and IO resources. In particular, read/write performance drops sharply during compaction.
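The read amplification described above can be illustrated with a toy LSM read path that counts how many tables a point lookup probes. This is a deliberately simplified model, not RocksDB's implementation: it has no Bloom filters, block cache or compaction, all of which real engines use to cut the number of probes.

```python
class ToyLsm:
    """Toy LSM read path: active MemTable first, then SSTables from newest to oldest."""

    def __init__(self):
        self.memtables = [{}]   # index 0 is the active MemTable
        self.sstables = []      # index 0 is the newest SSTable

    def put(self, key, value):
        self.memtables[0][key] = value      # writes touch only the active MemTable

    def flush(self):
        # Freeze the active MemTable into a new SSTable (stands in for a background flush).
        self.sstables.insert(0, dict(self.memtables[0]))
        self.memtables[0] = {}

    def get(self, key):
        """Return (value, number of tables probed)."""
        probes = 0
        for table in self.memtables + self.sstables:
            probes += 1
            if key in table:
                return table[key], probes
        return None, probes


db = ToyLsm()
db.put("a", 1)
db.flush()
for i in range(3):
    db.put(f"k{i}", i)
    db.flush()
print(db.get("a"))        # (1, 5)
print(db.get("missing"))  # (None, 5)
```

Old keys and missing keys cost the most probes, which is why serving hot metadata reads from a separate cache tier keeps them off the persistence engine's read path.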

In addition, the three types of directory metadata obtained by decomposition differ in how hot they are. For example, the client 20 may perform get-directory-list operations relatively frequently, each of which accesses a directory EntryList, but rarely accesses a directory's Access Metadata on its own; and file create/delete operations involve only the Content Metadata and EntryList of the file's parent directory.

For the above defects and access characteristics, the MinFS provided in this embodiment adopts a metadata retrieval engine and a persistence engine to classify, hierarchically store and cache the metadata retrieval engine and the persistence engine, so as to solve the problem that the metadata hot spot reading affects the real-time writing, and improve the utilization efficiency of the SSD and the memory space.

Based on these access characteristics, this embodiment provides tiered storage. A hot-warm-cold three-tier storage architecture is adopted to separate the retrieval engine from the persistence engine.

First, the DataServers "cold" archive layer: the directory EntryList is "cold" and is persistently archived to HDD. Obtaining a directory list accounts for 5% of all metadata operations, while a directory EntryList generally occupies a large amount of space. To reduce the latency of listing a large directory, the directory EntryList is persistently archived into the DataServers in the format described above once the number of entries in the directory exceeds a threshold (e.g., 1000). An EntryList obtained by a directory-list operation is not used to warm the CacheServers, which keeps the cache free of entries that are unlikely to be reused.

Second, the MetaServers "warm" layer: directory Access Metadata, directory Content Metadata and file Access Metadata are "warm" and are persisted to SSD. File-class operations such as open/close/create/delete/stat (opening, closing, creating or deleting a file, or getting its file state) account for 95% of all metadata operations. They involve directory Access and Content Metadata and file Access Metadata. These are simplified metadata that occupy little space. They are grouped and partitioned by directory id in the format described above and stored in the MetaServers, whose back end is usually a high-performance storage medium such as SSD/NVMe, to provide efficient metadata access.

Third, the CacheServers "hot" cache layer: hot directory Access Metadata, Content Metadata and file Access Metadata are "hot" and are cached in memory, which greatly optimizes metadata storage space and access performance.

Based on the hot-warm-cold three-tier storage architecture, this embodiment provides a concrete example of tiered storage. Please refer to fig. 4, which is an example diagram of tiered storage provided by an embodiment of the present invention. In fig. 4, memory corresponds to the hot tier, the SSD storage pool to the warm tier, and the HDD storage pool to the cold tier. In capacity, memory < SSD storage pool < HDD storage pool, at the GB, TB and PB level respectively. In access latency, memory < SSD storage pool < HDD storage pool, at the ns, μs and ms level respectively.
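The placement rules of the three tiers can be summarized as a small lookup function. This is a sketch under stated assumptions. The metadata kind labels and the `is_hot` flag are hypothetical, and the 1000-entry archive threshold is the example value given in the text.

```python
from enum import Enum


class Tier(Enum):
    HOT = "memory (CacheServers)"
    WARM = "SSD/NVMe (MetaServers)"
    COLD = "HDD (DataServers)"


SIMPLIFIED_KINDS = {"dir_access", "dir_content", "file_access"}  # hypothetical labels


def tiers_for(kind: str, is_hot: bool = False, entry_count: int = 0,
              archive_threshold: int = 1000) -> list:
    """Return the tiers that hold a piece of metadata, fastest first."""
    if kind in SIMPLIFIED_KINDS:
        # Always persisted on SSD; additionally cached in memory while hot.
        return [Tier.HOT, Tier.WARM] if is_hot else [Tier.WARM]
    if kind == "dir_entrylist":
        # Large EntryLists are archived to HDD; small ones stay on the warm tier.
        base = [Tier.COLD] if entry_count > archive_threshold else [Tier.WARM]
        # An EntryList warmed after a detected hot spot is also cached in memory.
        return [Tier.HOT] + base if is_hot else base
    raise ValueError(f"unknown metadata kind: {kind}")
```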

As a specific implementation, some application scenarios have high real-time requirements and relatively low reliability requirements. In these scenarios, operations such as creating a directory/file may return as soon as the directory/file metadata has been written to the CacheServers. A background task then asynchronously commits the metadata to the MetaServers for persistent storage.
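The asynchronous mode amounts to a write-back cache. The sketch below uses a hypothetical `persist` callable that stands in for the MetaServer write. It acknowledges a create as soon as the cache is updated and lets a background thread handle persistence. Any metadata still queued when a node fails would be lost, which is why this mode is limited to scenarios with relatively low reliability requirements.

```python
import queue
import threading


class WriteBackMetadata:
    """Acknowledge creates after the cache write; persist in the background."""

    def __init__(self, persist):
        self.cache = {}                # stands in for the CacheServers
        self.failed = []               # (key, exception) pairs that failed to persist
        self._persist = persist        # hypothetical callable(key, metadata) -> None
        self._pending = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def create(self, key, metadata):
        self.cache[key] = metadata
        self._pending.put((key, metadata))
        return "OK"                    # reply before persistence has completed

    def _drain(self):
        while True:
            key, metadata = self._pending.get()
            try:
                self._persist(key, metadata)
            except Exception as exc:   # keep the worker alive and record the failure
                self.failed.append((key, exc))
            finally:
                self._pending.task_done()

    def flush(self):
        """Block until every queued create has been handed to `persist`."""
        self._pending.join()
```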

For the tiered storage above, different caching strategies can be applied to directory and file metadata to reduce path resolution latency. For directory metadata, an "as much as possible" caching strategy is adopted, because path resolution accounts for 70% of the total latency. On average, 1 million directory metadata records occupy only 250 MB of cache space, so a directory is warmed into the CacheServers on its first access. For file metadata, an "as little as possible" caching strategy is adopted. In security-surveillance scenarios that store massive numbers of small files, the probability that the same small file is accessed twice in a row within a short time is close to zero, so the CacheServers are not warmed when a file is first accessed. If consecutive accesses to the metadata of different files in the same directory are detected, the files in that directory are judged to be about to become hot spots. A directory-list operation is then performed and the directory EntryList is warmed into the CacheServers.
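The two warm-up policies can be sketched as a small state machine. This is an illustration, not the MinFS code. The source does not say how many consecutive distinct-file accesses count as a hot spot, so `consecutive_threshold` is a hypothetical parameter. A "run" here means consecutive file accesses within one directory, and the run resets when another directory is touched.

```python
class WarmupPolicy:
    """Directories warm on first access; files never warm individually, but the
    parent EntryList warms after N distinct files in one directory are hit in a row."""

    def __init__(self, consecutive_threshold: int = 3):
        self.threshold = consecutive_threshold  # hypothetical; not given in the source
        self.warm_dirs = set()
        self.warm_entrylists = set()
        self._last_dir = None
        self._run = set()  # distinct file names in the current same-directory run

    def on_directory_access(self, dir_id) -> bool:
        """Return True if this access should warm the directory metadata."""
        if dir_id in self.warm_dirs:
            return False
        self.warm_dirs.add(dir_id)
        return True

    def on_file_access(self, pid, name) -> bool:
        """Return True if this access should warm the parent directory's EntryList."""
        if pid != self._last_dir:
            self._last_dir, self._run = pid, set()
        self._run.add(name)
        if len(self._run) >= self.threshold and pid not in self.warm_entrylists:
            self.warm_entrylists.add(pid)
            return True
        return False
```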

The metadata management method uses simplified metadata, tiered storage and a tiered warm-up strategy to split and decouple directory/file metadata. It merges the Extended Attributes into the directory Access Metadata and persists the directory EntryList to HDD in WAL (write-ahead log) fashion. Together these measures greatly improve the utilization of SSD and memory space. On average, every 1 billion metadata records need no more than 40 GB of SSD space. Taking hot-spot metadata to be 1% of the records, they need no more than 400 MB of memory.
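In decimal units, the stated figures imply an average footprint of roughly 40 bytes per metadata record on SSD. The 400 MB memory figure corresponds to the same 40 bytes per hot record. This is arithmetic on the numbers above, not a per-record size stated in the source.

```latex
\begin{aligned}
\text{SSD:}\quad & \frac{40\ \text{GB}}{10^{9}\ \text{records}} = \frac{40 \times 10^{9}\ \text{B}}{10^{9}} = 40\ \text{B/record} \\
\text{Memory:}\quad & 1\% \times 10^{9} = 10^{7}\ \text{hot records}, \qquad 10^{7} \times 40\ \text{B} = 4 \times 10^{8}\ \text{B} = 400\ \text{MB}
\end{aligned}
```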

This embodiment builds on the MinFS logical architecture described in fig. 3 and on the metadata decomposition and tiered storage scheme based on it. On that basis, it further provides the flow of the metadata management method for directory and file operations. Please refer to fig. 5, which is a first schematic flowchart of the metadata management method provided by the embodiment of the present invention. The method includes the following steps:

Step S100: receive, from a client, an operation request for operating a target object, where the target object includes a target directory or a target file.

In this embodiment, the operation request includes, but is not limited to, at least one of creating a directory, obtaining a directory list, creating a file and obtaining a file state. The target object is the object on which the operation request acts. For example, if the operation request is to create a directory, the target object is the target directory to be created.

Step S101: based on the operation request, determine target metadata among the metadata of the target object, and operate on the target object by accessing the target metadata to obtain an operation result. The metadata of a target directory includes access control metadata, content metadata and directory list metadata, and the metadata of a target file includes access metadata. The access control metadata is used to control access to the target directory. The content metadata represents management information of the target directory related to its subdirectories and files. The directory list metadata represents list information of the subdirectories and files included in the target directory. The access metadata represents the access information required to access the target file.

In this embodiment, when the target object is a target directory, the target metadata is at least one of the access control metadata, content metadata and directory list metadata. When the target object is a target file, the target metadata is the access metadata. A file's extended attributes are merged with the access control metadata of the directory to which it belongs, so for a target file the target metadata may further include the access control metadata of its parent directory.
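The mapping from operation to target metadata in step S101 can be written down as a table. The operation names and enum members below are hypothetical labels for the four operations detailed later in this embodiment.

```python
from enum import Enum, auto


class Meta(Enum):
    ACCESS_CONTROL = auto()  # directory: id, access rights, owner (plus merged file xattrs)
    CONTENT = auto()         # directory: management info about its subdirectories and files
    DIRECTORY_LIST = auto()  # directory: EntryList of subdirectories and files
    FILE_ACCESS = auto()     # file: location information needed to access the data


# Hypothetical operation names for the four operations described in this embodiment.
TARGET_METADATA = {
    "create_directory": frozenset({Meta.ACCESS_CONTROL, Meta.CONTENT}),
    "list_directory": frozenset({Meta.DIRECTORY_LIST}),
    "create_file": frozenset({Meta.FILE_ACCESS}),
    # File state also needs the parent directory's access control metadata.
    "stat_file": frozenset({Meta.FILE_ACCESS, Meta.ACCESS_CONTROL}),
}


def target_metadata(operation: str) -> frozenset:
    """Return the metadata that the operation targets."""
    try:
        return TARGET_METADATA[operation]
    except KeyError:
        raise ValueError(f"unsupported operation: {operation}") from None
```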

In this embodiment, the operation result includes, but is not limited to, success and failure. As a specific implementation, different failure cases may be distinguished. For example, creating a target directory may fail because the target directory already exists.

Step S102: return the operation result to the client.

The method provided by this embodiment splits directory metadata from file metadata, which simplifies metadata management. Only part of the metadata needs to be accessed when operating on a directory or file, which effectively improves metadata access performance.

To show more clearly how metadata is managed for each operation request, the following sections describe creating a directory, obtaining a directory list, creating a file and obtaining a file state, one by one. Operations on directories and files may of course also include deleting a directory, modifying a directory, deleting a file, accessing a file and so on. Those skilled in the art can derive these from this embodiment without creative effort.

When the operation request is to create a target directory, it is processed as follows:

First, if the operation request is to create a target directory, the access control metadata and content metadata of the target directory are determined as the target metadata.

In this embodiment, the access control metadata includes, but is not limited to, the identifier of the target directory, its access rights and the user it belongs to. The access rights (read-only or read-write) can be specified when the target directory is created. The user who creates the target directory can be taken as the user it belongs to.

Second, the access control metadata and the content metadata of the target directory are stored to create the target directory.

In this embodiment, creating the target directory includes storing the metadata related to it.

Specifically, the storage may proceed as follows:

(1) Acquire the directory identifier, directory name and parent directory identifier of the target directory;

In this embodiment, the directory identifier uniquely identifies the target directory and is also called the directory id. The parent directory of the target directory is the directory to which it belongs. For example, if a target directory A is created under the root directory, the root directory is the parent directory of A.

(2) Determine a first storage node from the plurality of storage nodes according to the parent directory identifier and the directory name;

In this embodiment, the parent directory identifier and the directory name of the target directory are used as the key to determine the first storage node.

(3) Storing the access control metadata to a first storage tier and a second storage tier of the first storage node;

In this embodiment, the first storage tier and the second storage tier may be memory and SSD, respectively. The access control metadata is "hot" data, so it can be cached in the first storage tier to improve access performance. The first storage tier is a volatile medium whose data is lost on power failure, so the access control metadata is also persisted in the second storage tier.

(4) Determining a second storage node from the plurality of storage nodes according to the directory identifier of the target directory;

In this embodiment, the directory identifier of the target directory is used as the key to determine the second storage node.

(5) The content metadata is stored to a first storage tier and a second storage tier of a second storage node.

In this embodiment, the content metadata is handled in the same way as the access control metadata: it is cached in the first storage tier and persisted in the second storage tier.
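Steps (1) to (5) can be sketched with in-memory stand-ins for the two tiers. This is a sketch under stated assumptions. The source says only that each node is chosen by hashing the key. Hashing just the leading id (pid or directory id) follows the pid-based partitioning described for file metadata, and both that choice and SHA-1 are hypothetical. The content-metadata fields are also hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    memory: dict = field(default_factory=dict)  # first tier (volatile cache)
    ssd: dict = field(default_factory=dict)     # second tier (persistent)

    def put_hot_and_warm(self, key, value):
        self.memory[key] = value  # cached for fast access
        self.ssd[key] = value     # persisted so it survives power loss


def node_index(key: tuple, num_nodes: int) -> int:
    """Hypothetical placement: hash only the leading id of the key."""
    digest = hashlib.sha1(str(key[0]).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def create_directory(nodes, parent_id, name, dir_id, owner, mode="rw"):
    access_control = {"id": dir_id, "owner": owner, "mode": mode}
    content = {"children": 0}  # hypothetical content-metadata fields
    first = nodes[node_index((parent_id, name), len(nodes))]  # steps (1)-(3)
    first.put_hot_and_warm((parent_id, name), access_control)
    second = nodes[node_index((dir_id,), len(nodes))]         # steps (4)-(5)
    second.put_hot_and_warm((dir_id,), content)
    return first, second
```

For directory A under the root, the keys are (0, "A") and (1,), matching Key <0, A> and Key <1> in the interaction below.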

To more clearly illustrate the process of creating the target directory, this embodiment provides a process of creating the target directory a under the root directory, please refer to fig. 6, where fig. 6 is a first interaction diagram of the metadata management method provided by the embodiment of the present invention, and includes the following steps:

S11: the NFS-Client requests the NFS-Server to look up A;

S12: the NFS-Server queries the CacheServers with Key <0, A> to check whether A exists; if it does, it returns that directory A already exists;

S13: if A does not exist, the NFS-Server returns OK to the NFS-Client.

S14: the NFS-Client requests the NFS-Server to create directory A.

S15: the NFS-Server requests a directory id (1) for A from a MetaServer (responsible for managing cluster metadata). It then caches the Access Metadata and Content Metadata of directory A in the CacheServers (i.e., the first storage tier; how the first storage node is determined is described above) under Key <0, A> and Key <1>, respectively.

S16: the NFS-Server computes the MetaServers responsible for Key <0, A> and Key <1> by hashing (for example, MetaServer A and MetaServer B) and persists the Access Metadata and Content Metadata to the corresponding MetaServers.

S17: the NFS-Server returns to the NFS-Client that A was created successfully.

When the operation request is to obtain the directory list of a target directory, it is processed as follows:

First, if the operation request is to obtain the directory list of a target directory, the directory list metadata is determined as the target metadata.

Second, a third storage node that stores the log index of the target directory is determined according to the directory identifier of the target directory, and the log index is read from the first storage tier of the third storage node.

In this embodiment, if the target directory contains only a few subdirectories or files, the list information of each subdirectory and file can be read directly from the second storage tier, merged and returned to the client 20. If the target directory contains too many subdirectories or files, their list information is read directly from the third storage tier and returned to the client 20. As a specific implementation, the log index may take a special preset value to indicate that the directory list metadata is to be read from the second storage tier. Otherwise, the log index gives the position of the directory list metadata in the third storage tier.

Third, if the log index is the preset value, the directory list metadata is read from the second storage tier of the third storage node to obtain the directory list of the target directory.

In this embodiment, the preset value can be set as required. For example, it may be all Fs (every hexadecimal digit set to F).

Fourth, if the log index is not the preset value, a fourth storage node is determined from the plurality of storage nodes according to a preset erasure coding strategy.

Fifth, the directory list metadata is read from the third storage tier of the fourth storage node to obtain the directory list of the target directory.
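The log-index branch in the steps above can be sketched as follows. Several parts are assumptions. The 64-bit width of the "all F" sentinel is hypothetical. Placing the third storage node by `dir_id % len(meta_nodes)` is a simple stand-in for the unspecified hash. `ec_node_for` is a hypothetical stand-in for the preset erasure coding strategy, which in a real system would locate and decode several fragments rather than read one node.

```python
from dataclasses import dataclass, field

ALL_F = 0xFFFF_FFFF_FFFF_FFFF  # hypothetical width for the "all F" preset value


@dataclass
class MetaNode:
    memory: dict = field(default_factory=dict)  # first tier: holds the log index
    ssd: dict = field(default_factory=dict)     # second tier: small EntryLists


def ec_node_for(log_idx: int, num_data_nodes: int) -> int:
    """Stand-in for the unspecified erasure coding placement."""
    return log_idx % num_data_nodes


def list_directory(dir_id: int, meta_nodes: list, data_nodes: list) -> list:
    third = meta_nodes[dir_id % len(meta_nodes)]  # hypothetical placement
    log_idx = third.memory[dir_id]                # log index, read from the first tier
    if log_idx == ALL_F:
        return third.ssd[dir_id]                  # small directory: second tier
    fourth = data_nodes[ec_node_for(log_idx, len(data_nodes))]
    return fourth[log_idx]                        # archived EntryList: third tier
```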

To explain more clearly how the directory list of a target directory is obtained, this embodiment provides the process of obtaining the directory list of target directory A. Please refer to fig. 7, which is a second interaction diagram of the metadata management method provided by the embodiment of the present invention. The process includes the following steps:

S21: the NFS-Client requests the NFS-Server to look up A;

S22: the NFS-Server queries the CacheServers with Key <0, A> to check whether A exists; if it does not, it returns that directory A does not exist;

S23: if A exists, the NFS-Server returns OK to the NFS-Client.

S24: the NFS-Client requests the directory list of directory A from the NFS-Server.

S25: the NFS-Server queries the CacheServers with Key <1> to obtain the log index (logidx).

S26: the NFS-Server acquires the directory list from the DataServers according to logidx.

S27: the NFS-Server returns the directory list of directory A to the NFS-Client.

When the operation request is to create a target file, it is processed as follows:

First, if the operation request is to create a target file, the access metadata is determined as the target metadata;

Second, the directory identifier of the directory to which the target file belongs is acquired;

In this embodiment, if the target file a.txt is created in directory A, the directory to which a.txt belongs is A, which is also called the parent directory of a.txt.

Third, a fifth storage node for storing the access metadata is determined according to the identifier of that directory and the file name of the target file;

In this embodiment, the directory identifier of the directory to which the target file belongs and the file name of the target file are used as the key to determine the fifth storage node.

Fourth, the access metadata is stored to the first storage tier and the second storage tier of the fifth storage node to create the target file.

In this embodiment, the directory to which the target file belongs may contain a very large number of files and subdirectories. To keep obtaining its directory list fast in that case, the metadata of the parent directory is also updated after a target file or target directory is created. The update proceeds as follows:

(1) Acquire the total number of subdirectories and files included in the directory;

(2) If the total number is greater than a preset number, determine a sixth storage node from the plurality of storage nodes according to a preset erasure coding strategy;

In this embodiment, the preset number may be set as needed; for example, it may be 1000.

(3) Store the directory list metadata of the directory to the third storage tier of the sixth storage node.

When a directory contains too many subdirectories or files, the list information of all of them can then be located directly through the log index. This avoids the long processing time of merging the entries, so the directory list can still be obtained quickly.
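File creation followed by the parent-directory update can be sketched as below. This is a deliberately simplified model. Both tiers of the fifth node are collapsed into one dictionary. The erasure-coded third tier is modeled as a single append-only list whose position serves as the log index. A full snapshot is re-archived on every create past the threshold, whereas a WAL-style implementation would append only the new entries. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DirectoryState:
    entries: list = field(default_factory=list)  # EntryList while kept on the second tier
    log_idx: Optional[int] = None               # set once archived to the third tier


def create_file(dirs, meta_store, archive_log, pid, name, location, threshold=1000):
    """Store file Access Metadata under Key (pid, name), then update the parent directory."""
    meta_store[(pid, name)] = location           # both tiers of the fifth node, collapsed
    state = dirs.setdefault(pid, DirectoryState())
    state.entries.append(name)
    if len(state.entries) > threshold:
        # Simplification: archive a full snapshot on every create past the threshold.
        archive_log.append(list(state.entries))
        state.log_idx = len(archive_log) - 1     # log index now points into the archive
    return state
```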

To illustrate more clearly how a target file is created, this embodiment provides the process of creating a target file a.txt under directory A. Please refer to fig. 8, which is a third interaction diagram of the metadata management method provided by the embodiment of the present invention. The process includes the following steps:

S30: the NFS-Client requests the NFS-Server to look up A;

S31: the NFS-Server queries the CacheServers with Key <0, A> to check whether A exists; if it does not, it returns that directory A does not exist;

S32: if A exists, the NFS-Server returns OK to the NFS-Client.

S33: the NFS-Client asks the NFS-Server whether file a.txt exists.

S34: the NFS-Server queries the CacheServers with Key <1, a.txt> to check whether a.txt exists. If it does, it returns that file a.txt already exists.

S35: if a.txt does not exist, the NFS-Server returns OK to the NFS-Client.

S36: the NFS-Client requests the NFS-Server to create file a.txt.

S37: the NFS-Server caches the access metadata of file a.txt in the CacheServers under Key <1, a.txt>. If the asynchronous persistence strategy is configured, it responds to the NFS-Client immediately.

S38-1: the NFS-Server computes the responsible MetaServer from the hash of Key <1, a.txt> (for example, MetaServer A) and persists the access metadata of file a.txt to MetaServer A.

S38-2: MetaServer A detects that the total number of subdirectories and files in directory A has reached the threshold (e.g., 1000) and dumps the directory EntryList to the DataServers.

S39: the NFS-Server returns to the NFS-Client that file a.txt was created successfully.

When the operation request is to obtain the file state of a target file, it is processed as follows:

First, if the operation request is to obtain the file state of a target file, the access metadata is determined as the target metadata;

Second, the access metadata of the target file is acquired according to the identifier of the directory to which the target file belongs and the file name of the target file;

Third, the access control metadata of that directory is acquired;

Fourth, the access metadata of the target file and the access control metadata of the directory to which it belongs are combined to form the file state of the target file.

To explain more clearly how a file state is obtained, this embodiment provides the process of obtaining the file state of the target file /A/a.txt. Please refer to fig. 9, which is a fourth interaction diagram of the metadata management method provided by the embodiment of the present invention. The process includes the following steps:

S40: the NFS-Client requests the NFS-Server to look up A;

S41: the NFS-Server queries the CacheServers with Key <0, A> to check whether A exists; if it does not, it returns that directory A does not exist;

S42: if A exists, the NFS-Server returns OK to the NFS-Client.

S43: the NFS-Client asks the NFS-Server whether file a.txt exists.

S44: the NFS-Server queries the CacheServers with Key <1, a.txt> to check whether a.txt exists. If it does not, it returns that file a.txt does not exist.

S45: if a.txt exists, the NFS-Server returns OK to the NFS-Client.

S46: the NFS-Client requests the file state of file a.txt from the NFS-Server.

S47: the NFS-Server queries the CacheServers with directory A's Key <0, A>, obtains the Access Metadata of directory A, and checks whether the cluster cache flag is set for the files in this directory.

S48-1: if the cluster cache flag is set, the NFS-Server queries the CacheServers with Key <1, a.txt> to obtain the Access Metadata of file a.txt. It combines the Access Metadata of directory A with that of file a.txt and responds to the NFS-Client.

S48-2: if the cluster cache flag is not set, the NFS-Server computes the responsible MetaServer A from the hash of Key <1, a.txt> and queries it for the Access Metadata of file a.txt. It combines the Access Metadata of directory A with that of file a.txt and responds to the NFS-Client.

S49: the NFS-Server returns the file state of file a.txt to the NFS-Client.
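The stat path, from the four generic steps through S47 and S48, can be sketched with dictionaries standing in for the CacheServers and MetaServer A. The `cluster_cache` field name is a hypothetical encoding of the cluster cache flag. The merge rule (directory attributes first, per-file fields added last) is an assumption; the source only says the two are combined.

```python
def file_stat(parent_key, pid, name, cache, metaserver):
    """Combine parent directory Access Metadata with file Access Metadata (S47-S48)."""
    dir_access = cache[parent_key]                 # e.g. Key (0, "A") for /A
    # S48-1 reads file metadata from the cache; S48-2 falls back to the MetaServer.
    source = cache if dir_access.get("cluster_cache") else metaserver
    file_access = source[(pid, name)]
    stat = {k: v for k, v in dir_access.items() if k != "cluster_cache"}
    stat.update(file_access)                       # per-file fields added last
    return stat
```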

To perform the corresponding steps in the above embodiments and their possible implementations, an implementation of the metadata management apparatus is given below. Refer to fig. 10, which is a functional module diagram of a metadata management apparatus 100 according to an embodiment of the present invention. The basic principle and technical effect of the metadata management apparatus 100 are the same as those of the foregoing method embodiment. For brevity, anything not mentioned here can be found in the corresponding content of the foregoing method embodiment. The metadata management apparatus 100 is applied to any one of the storage nodes 10 and includes a receiving module 110, a processing module 120 and a returning module 130.

The receiving module 110 is configured to receive, from a client, an operation request for operating a target object, where the target object includes a target directory or a target file.

The processing module 120 is configured to determine, based on the operation request, target metadata among the metadata of the target object, and to operate on the target object by accessing the target metadata to obtain an operation result. The metadata of the target directory includes access control metadata, content metadata and directory list metadata, and the metadata of the target file includes access metadata. The access control metadata is used to control access to the target directory. The content metadata represents management information of the target directory related to its subdirectories and files. The directory list metadata represents list information of the subdirectories and files included in the target directory. The access metadata represents the access information required to access the target file.

Optionally, the processing module 120 is specifically configured to: if the operation request is to create a target directory, determine the access control metadata and the content metadata of the target directory as the target metadata, and store the access control metadata and the content metadata of the target directory to create the target directory.

Optionally, the storage node includes a first storage tier and a second storage tier in decreasing order of access performance. When storing the access control metadata and the content metadata of the target directory, the processing module 120 is specifically configured to: acquire the directory identifier, directory name and parent directory identifier of the target directory; determine a first storage node from the plurality of storage nodes according to the parent directory identifier and the directory name; store the access control metadata to the first storage tier and the second storage tier of the first storage node; determine a second storage node from the plurality of storage nodes according to the directory identifier of the target directory; and store the content metadata to the first storage tier and the second storage tier of the second storage node.

Optionally, the storage node includes a first storage tier, a second storage tier and a third storage tier in decreasing order of access performance, and the processing module 120 is further configured to: if the operation request is to obtain the directory list of the target directory, determine the directory list metadata as the target metadata; determine, according to the directory identifier of the target directory, a third storage node that stores the log index of the target directory, and read the log index from the first storage tier of the third storage node; if the log index is the preset value, read the directory list metadata from the second storage tier of the third storage node to obtain the directory list of the target directory; and if the log index is not the preset value, determine a fourth storage node from the plurality of storage nodes according to a preset erasure coding strategy, and read the directory list metadata from the third storage tier of the fourth storage node to obtain the directory list of the target directory.

Optionally, the storage node includes a first storage tier and a second storage tier in decreasing order of access performance, and the processing module 120 is further configured to: if the operation request is to create a target file, determine the access metadata as the target metadata; acquire the directory identifier of the directory to which the target file belongs; determine a fifth storage node for storing the access metadata according to the identifier of that directory and the file name of the target file; and store the access metadata to the first storage tier and the second storage tier of the fifth storage node to create the target file.

Optionally, the storage node further includes a third storage tier whose access performance is lower than that of the second storage tier, and the processing module 120 is further configured to: acquire the total number of subdirectories and files included in the directory; if the total number is greater than the preset number, determine a sixth storage node from the plurality of storage nodes according to a preset erasure coding strategy; and store the directory list metadata of the directory to the third storage tier of the sixth storage node.

Optionally, the processing module 120 is further configured to: if the operation request is to obtain the file state of the target file, determine the access metadata as the target metadata; acquire the access metadata of the target file according to the identifier of the directory to which the target file belongs and the file name of the target file; acquire the access control metadata of that directory; and combine the access metadata of the target file with the access control metadata of its parent directory to obtain the file state of the target file.

The returning module 130 is configured to return the operation result to the client.

Embodiments of the present invention provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the metadata management method as described above.

In summary, embodiments of the present invention provide a metadata management method and apparatus, a storage node and a readable storage medium. They are applied to any storage node among a plurality of storage nodes in a distributed file system, where the storage node is communicatively connected to a client. The method includes: receiving, from a client, an operation request for operating a target object, where the target object includes a target directory or a target file; determining target metadata among the metadata of the target object based on the operation request, and operating on the target object by accessing the target metadata to obtain an operation result; and returning the operation result to the client. The metadata of the target directory includes access control metadata, content metadata and directory list metadata, and the metadata of the target file includes access metadata. The access control metadata is used to control access to the target directory. The content metadata represents management information of the target directory related to its subdirectories and files. The directory list metadata represents list information of the subdirectories and files included in the target directory. The access metadata represents the access information required to access the target file. By splitting directory metadata from file metadata, the embodiments simplify metadata management. Only part of the metadata needs to be accessed when operating on a directory or file, which effectively improves metadata access performance.

The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims

1. A metadata management method applied to any storage node in a plurality of storage nodes in a distributed file system, wherein the storage node is in communication connection with a client, and the method comprises the following steps:

and returning the operation result to the client.

2. The metadata management method according to claim 1, wherein the step of determining target metadata among metadata of the target object based on the operation request, and operating the target object by accessing the target metadata comprises:

3. The metadata management method according to claim 2, wherein the storage node includes a first storage tier and a second storage tier whose access performance is sequentially lowered, and the step of storing the access control metadata and the content metadata of the target directory includes:

4. The metadata management method according to claim 1, wherein the storage node includes a first storage tier, a second storage tier and a third storage tier in order of decreasing access performance, and the step of determining target metadata among the metadata of the target object based on the operation request and operating on the target object by accessing the target metadata further includes:

5. The metadata management method according to claim 1, wherein the storage node includes a first storage tier and a second storage tier in order of decreasing access performance, and the step of determining target metadata among the metadata of the target object based on the operation request and operating on the target object by accessing the target metadata further includes:

6. The metadata management method of claim 5, wherein the storage node further comprises a third storage tier having lower access performance than the second storage tier, the method further comprising:

7. The metadata management method according to claim 1, wherein the step of determining target metadata among the metadata of the target object based on the operation request and operating on the target object by accessing the target metadata further comprises:

8. A metadata management apparatus, applied to any storage node in a plurality of storage nodes in a distributed file system, the storage node being communicatively connected to a client, the apparatus comprising:

the processing module is used for determining target metadata among the metadata of the target object based on the operation request and operating on the target object by accessing the target metadata to obtain an operation result of operating on the target object, wherein the metadata of the target directory comprises access control metadata, content metadata and directory list metadata, the metadata of the target file comprises access metadata, the access control metadata is used for controlling access to the target directory, and the content metadata represents management information of the target directory related to subdirectories and files; the directory list metadata represents list information of the subdirectories and files included in the target directory, and the access metadata represents access information required for accessing the target file;

and the return module is used for returning the operation result to the client.
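The module structure of the apparatus can be pictured as a simple request pipeline. This is a hedged illustration only: the class and method names, the dictionary message format, and the injected `process_fn` and `send_fn` callables are hypothetical stand-ins for metadata access and client transport, which the claims do not detail. The receiving step is modelled on the method's receiving step.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class OperationRequest:
    client_id: str
    op: str      # hypothetical operation name, e.g. "list_dir"
    target: str  # path of the target directory or target file


@dataclass
class OperationResult:
    client_id: str
    ok: bool
    value: Any


class MetadataManagementApparatus:
    """Hypothetical receiving / processing / return pipeline on one storage node."""

    def __init__(self, process_fn: Callable[[str, str], Any],
                 send_fn: Callable[[OperationResult], None]):
        self._process_fn = process_fn  # stand-in for accessing the target metadata
        self._send_fn = send_fn        # stand-in for the connection back to the client

    def receive(self, message: dict) -> OperationRequest:
        # Receiving step: parse the client's operation request.
        return OperationRequest(message["client_id"], message["op"], message["target"])

    def process(self, request: OperationRequest) -> OperationResult:
        # Processing module: operate on the target object via its target metadata.
        try:
            value = self._process_fn(request.op, request.target)
            return OperationResult(request.client_id, True, value)
        except (KeyError, ValueError) as exc:
            # Failures are still reported to the client as an operation result.
            return OperationResult(request.client_id, False, str(exc))

    def return_result(self, result: OperationResult) -> None:
        # Return module: send the operation result back to the client.
        self._send_fn(result)

    def serve(self, message: dict) -> None:
        self.return_result(self.process(self.receive(message)))
```

Injecting `process_fn` keeps the pipeline independent of how the metadata parts are laid out, so the same receive/process/return flow works whichever metadata part a given operation reads.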

9. A storage node comprising a processor and a memory, the memory for storing a program, the processor being configured to implement the metadata management method of any one of claims 1 to 7 when executing the program.

10. A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, implements a metadata management method according to any one of claims 1 to 7.