CN114281765A - Metadata processing method and equipment in distributed file system - Google Patents

Info

Publication number: CN114281765A
Application number: CN202011045589.9A
Authority: CN (China)
Prior art keywords: metadata, client, version, server, memory
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventor: 朴君
Current Assignee: Huawei Cloud Computing Technologies Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Original Assignee: Huawei Cloud Computing Technologies Co Ltd
Application filed by Huawei Cloud Computing Technologies Co Ltd
Priority to CN202011045589.9A
Publication of CN114281765A

Abstract

The application provides a metadata processing method and device in a distributed file system, and belongs to the field of computer technologies. The application provides a way of obtaining metadata that combines software and hardware. By deeply combining the metadata access operations of the distributed file system with the hardware capabilities of an intelligent network card, the client reads metadata from the metadata server through a remote one-sided read operation, and the copy-on-write (COW) technique is used to implement a lock-free remote metadata operation method. The network service module and the distributed lock module are thereby offloaded from the CPU to the network card, eliminating the resource overhead they cause, avoiding their repeated processing of metadata, saving the resources of the metadata server, improving its performance, and helping to resolve the single-point performance bottleneck of the metadata server.

Description

Metadata processing method and equipment in distributed file system
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for processing metadata in a distributed file system.
Background
Mainstream distributed file systems generally adopt an architecture that separates metadata from data in order to improve input/output (I/O) processing capability. In this architecture, data I/O is processed in parallel by multiple data servers, and metadata I/O is processed by a metadata server.
Currently, when the distributed file system processes metadata, the client sends a metadata processing request to the metadata server. After receiving the request, the metadata server processes it cooperatively through a network service module, a distributed lock module (also called a network lock module), and a metadata management module, reads the stored metadata, and returns the metadata to the client.
With this method, the metadata is processed repeatedly, which wastes the resources and degrades the performance of the metadata server.
Disclosure of Invention
The embodiment of the application provides a metadata processing method and equipment in a distributed file system, which can save resources of a metadata server and improve the performance of the metadata server. The technical scheme is as follows:
In a first aspect, a metadata processing method in a distributed file system is provided. In the method, the client performs a remote one-sided read operation on the memory of the metadata server through a network card to obtain a target version of the metadata, where the target version is the latest version in a copy set of the metadata, and the copy set includes at least one version of the metadata generated by the metadata server based on copy-on-write (COW); and the client stores the target version of the metadata in the memory of the client.
The above provides a way of obtaining metadata that combines software and hardware. By deeply combining the metadata access operations of the distributed file system with the hardware capabilities of the intelligent network card, the client reads metadata from the metadata server through a remote one-sided read operation, and the COW technique is used to implement a lock-free remote metadata operation method. The network service module and the distributed lock module are thereby offloaded from the central processing unit (CPU) to the network card, eliminating the resource overhead they cause, avoiding their repeated processing of metadata, saving the resources of the metadata server, improving its performance, and helping to resolve the single-point performance bottleneck of the metadata server.
Optionally, the performing, by the client, a remote one-sided read operation on the memory of the metadata server through a network card includes: mapping, by the client through the network card, a remote memory address to a local memory address, where the remote memory address indicates the address of the copy set in the memory of the metadata server, and the local memory address indicates an address in the memory of the client; and performing, by the client, a read operation on the local memory address.
Optionally, the performing, by the client, a remote one-sided read operation on the memory of the metadata server through a network card to obtain a target version of the metadata includes: searching, by the client, the copy set for the target version of the metadata according to a pointer field in each version of the metadata, where the pointer field indicates the next version of the metadata, and the value of the pointer field in the target version of the metadata is null.
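The pointer-field traversal described above can be sketched as follows. This is an illustrative simulation only; the names (`InodeVersion`, `find_target_version`) are assumptions, not identifiers from the patent, and a real client would perform the traversal over remotely read memory pages:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class InodeVersion:
    data: dict                              # metadata payload of this version
    next: Optional["InodeVersion"] = None   # pointer field: next (newer) version

def find_target_version(head: InodeVersion) -> InodeVersion:
    """Walk the version chain until the pointer field is null (the latest copy)."""
    v = head
    while v.next is not None:
        v = v.next
    return v

# Build a three-version chain, mirroring the Inode1 replica-set example.
v1 = InodeVersion({"size": 100})
v2 = InodeVersion({"size": 200})
v3 = InodeVersion({"size": 300})
v1.next, v2.next = v2, v3

latest = find_target_version(v1)            # v3: its pointer field is null
```

The null pointer field is what lets the client identify the target version without asking the server which copy is newest.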
Optionally, after the client performs the remote one-sided read operation on the memory of the metadata server through the network card, the method further includes: performing, by the client, a remote atomic operation on the memory of the metadata server through the network card to update the target version of the metadata.
In a second aspect, there is provided a method of metadata processing in a distributed file system comprising a client and a metadata server, in which method,
the metadata server receiving an update request from the client, the update request including a first version of the client's updated metadata;
if the first version is the latest version in the copy set of the metadata, the metadata server performs copy-on-write (COW) on the metadata to obtain a target version of the metadata, where the target version of the metadata is a copy of the first version of the metadata;
the metadata server adds a target version of the metadata to the replica pool.
Optionally, after the metadata server receives an update request from the client, the method further includes: and if the first version is not the latest version in the copy set of the metadata, the metadata server sends a failure message to the client, wherein the failure message represents that the metadata update fails.
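The server-side update flow of the second aspect — accept only updates based on the latest copy, append a COW copy on success, return a failure message otherwise — can be sketched as a small simulation. All class and field names here are assumptions for illustration:

```python
class MetadataServer:
    """Toy model of the copy-set update handling (not the patented implementation)."""
    def __init__(self):
        self.copy_set = [{"version": 1, "size": 100}]   # oldest ... latest

    def handle_update(self, base_version: int, new_data: dict) -> str:
        latest = self.copy_set[-1]
        if base_version != latest["version"]:
            return "failure"        # client did not modify the latest version
        # COW: append a new copy rather than overwriting the latest version,
        # so concurrent readers can keep reading old copies without locks.
        target = dict(latest, **new_data, version=latest["version"] + 1)
        self.copy_set.append(target)
        return "ok"

srv = MetadataServer()
r1 = srv.handle_update(1, {"size": 200})   # based on the latest copy: succeeds
r2 = srv.handle_update(1, {"size": 300})   # stale base version: fails
```

Appending rather than overwriting is what makes the client's lock-free one-sided reads safe while updates are in flight.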
Optionally, after the metadata server adds the target version of the metadata to the copy set, the method further includes: if the state of the target version of the metadata meets a condition, releasing, by the metadata server, the memory space occupied by the target version of the metadata, where the state indicates the frequency at which clients access the target version of the metadata.
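A hedged sketch of that reclamation step: the patent leaves the exact condition open, so this example assumes a reader count of zero as the condition and always keeps the latest copy. The `readers` field is an invented stand-in for the access-frequency state:

```python
def reclaim(copy_set):
    """Release old copies no client is reading; the latest copy is always kept."""
    latest = copy_set[-1]
    survivors = [c for c in copy_set[:-1] if c["readers"] > 0]
    return survivors + [latest]

copies = [
    {"version": 1, "readers": 0},   # old, idle: memory can be released
    {"version": 2, "readers": 1},   # old, but a client still reads it
    {"version": 3, "readers": 0},   # latest: kept regardless of state
]
copies = reclaim(copies)
```

This mirrors the metadata recycling device described later, which saves storage resources by freeing the memory occupied by stale copies.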
Optionally, before the metadata server receives an update request from the client, the method further includes: the metadata server allocates a memory space from a memory, wherein the memory space is used for storing the copy set; the metadata server registers the memory address corresponding to the memory space to a network card of the metadata server; and the metadata server sends the memory address to the client.
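The three-step setup above (allocate memory for the copy set, register its address with the network card, send the address to the client) can be sketched like this. The `NetworkCard` class and the address value are hypothetical; a real registration would go through the NIC's verbs interface:

```python
class NetworkCard:
    """Toy stand-in for the server's network card registration table."""
    def __init__(self):
        self.registered = {}

    def register(self, addr: int, region: bytearray) -> None:
        # Registration makes the region reachable by remote one-sided reads.
        self.registered[addr] = region

server_nic = NetworkCard()

# Step 1: allocate memory space for the copy set (one 4 KB page, as in the
# global Inode table described later).
copy_set_region = bytearray(4096)

# Step 2: register the corresponding memory address with the network card.
remote_addr = 0x1000                       # assumed virtual address
server_nic.register(remote_addr, copy_set_region)

# Step 3: "send" the memory address to the client.
client_known_addr = remote_addr            # client can now target this address
```

After this handshake the client needs no further server CPU involvement to read the copy set.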
In a third aspect, a client is provided, where the client includes a processor, a network card, and a memory. The client is configured to implement the functionality provided by the first aspect or any of the alternatives of the first aspect.
In a fourth aspect, a metadata server is provided, which includes a processor, a network card and a memory. The metadata server is arranged to implement the functionality provided by the second aspect or any of the alternatives of the second aspect.
In a fifth aspect, a computer-readable storage medium is provided, where at least one instruction is stored in the storage medium, and the instruction is read by a processor to enable a client to execute the metadata processing method in the distributed file system provided in the first aspect or any one of the alternatives of the first aspect.
In a sixth aspect, there is provided a computer-readable storage medium, wherein at least one instruction is stored in the storage medium, and the instruction is read by a processor to cause a metadata server to execute the metadata processing method in the distributed file system provided in the second aspect or any one of the alternatives of the second aspect.
In a seventh aspect, a computer program product is provided that includes computer instructions stored in a computer readable storage medium. The processor of the client reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the client performs the metadata processing method in the distributed file system provided in the first aspect or any one of the alternatives of the first aspect.
In an eighth aspect, a computer program product is provided that includes computer instructions stored in a computer readable storage medium. The processor of the metadata server reads the computer instructions from the computer-readable storage medium, and executes the computer instructions, so that the metadata server executes the metadata processing method in the distributed file system provided by the second aspect or any alternative manner of the second aspect.
In a ninth aspect, a chip is provided, which when running on a client, causes the client to execute the metadata processing method in the distributed file system provided in the first aspect or any one of the alternatives of the first aspect.
In a tenth aspect, a chip is provided, which, when running on a metadata server, causes the metadata server to execute the metadata processing method in the distributed file system provided in the second aspect or any one of the alternatives of the second aspect.
In an eleventh aspect, there is provided a distributed file system comprising a client configured to perform the method of the first aspect or any of the alternatives of the first aspect, and a metadata server configured to perform the method of the second aspect or any of the alternatives of the second aspect.
Drawings
Fig. 1 is a schematic diagram of a system architecture of a distributed file system according to an embodiment of the present application;
FIG. 2 is a diagram illustrating metadata processing in a distributed file system according to an embodiment of the present disclosure;
FIG. 3 is a flowchart of a metadata processing method in a distributed file system according to an embodiment of the present application;
FIG. 4 is a flowchart of a metadata processing method in a distributed file system according to an embodiment of the present application;
FIG. 5 is a diagram illustrating metadata processing in a distributed file system according to an embodiment of the present application;
FIG. 6 is a diagram illustrating metadata processing in a distributed file system according to an embodiment of the present application;
FIG. 7 is a flowchart of a metadata processing method in a distributed file system according to an embodiment of the present application;
FIG. 8 is a flowchart of a metadata processing method in a distributed file system according to an embodiment of the present application;
FIG. 9 is a flowchart of a metadata processing method in a distributed file system according to an embodiment of the present application;
FIG. 10 is a flowchart of a metadata processing method in a distributed file system according to an embodiment of the present application;
FIG. 11 is a diagram illustrating a data structure of metadata provided by an embodiment of the present application;
fig. 12 is a schematic structural diagram of a client or a metadata server according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The following description will first be made of the concept related to terms related to the embodiments of the present application.
Distributed File System (DFS)
Computers manage and store data through file systems. In the era of information explosion, the data generated by people grows exponentially. To store more data, a conventional stand-alone file system expands capacity by increasing the number of hard disks and stores data on more hard disks. However, a stand-alone file system cannot meet requirements in terms of capacity, input/output (I/O) performance, growth speed, data reliability, data security, and the like.
A distributed file system can effectively solve these data storage and management problems. A distributed file system is a system that manages files on multiple node devices: it extends a single file system fixed at one location to multiple file systems at multiple locations, with many nodes forming a file system network. A distributed file system allows files to be shared over the network across multiple node devices, and different node devices are optionally distributed at different locations. Because node devices communicate and transfer data over the network, storage capacity, I/O performance, growth speed, and data reliability are greatly improved. An upper-layer application does not need to pay attention to the underlying distributed architecture; it accesses files as if using a local file system.
Two, distributed lock (distributed lock)
A distributed lock is a way to control the synchronous access of a distributed system to shared resources (files). In general, the distributed lock has the following features (1) to (5).
Feature (1) in a distributed system environment, a method can only be executed by one thread of one machine at a time.
Feature (2) high availability of acquire and release locks.
Feature (3) high performance acquire and release locks.
The feature (4) has reentrant characteristics.
Feature (5) provides a lock failure mechanism to prevent deadlock. It has the property of a non-blocking lock: if the lock cannot be acquired, failure is returned directly instead of blocking.
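Features (1), (4), and (5) can be illustrated with a minimal in-process sketch: mutual exclusion, reentrancy for the current holder, and a non-blocking try-acquire with an expiry so a crashed holder cannot deadlock others. A real distributed lock spans machines; this only models the semantics, and all names are invented:

```python
import time

class TryLock:
    """In-process model of a non-blocking, reentrant lock with expiry."""
    def __init__(self, ttl: float = 10.0):
        self.owner = None
        self.expires = 0.0
        self.ttl = ttl

    def try_acquire(self, owner: str) -> bool:
        now = time.monotonic()
        if self.owner is None or now >= self.expires:   # free, or lease expired
            self.owner, self.expires = owner, now + self.ttl
            return True
        return self.owner == owner      # reentrant: holder may re-acquire

    def release(self, owner: str) -> None:
        if self.owner == owner:
            self.owner = None

lock = TryLock()
ok_a = lock.try_acquire("client-A")     # acquired
ok_b = lock.try_acquire("client-B")     # returns failure immediately, no blocking
re_a = lock.try_acquire("client-A")     # reentrant acquisition by the holder
```

The expiry (`ttl`) is the lock failure mechanism of feature (5): a lock held past its lease can be taken over rather than deadlocking the system.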
Copy On Write (COW)
If multiple callers (e.g., multiple clients) request the same resource (e.g., data or a file on disk), the callers all obtain the same pointer, which points to the same resource. When a caller attempts to modify the content of the resource (e.g., to update a file), the system actually copies a private copy for that caller. This copying process is transparent to the other callers.
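The COW behavior just described can be sketched as follows: callers share one object until one of them writes, at which point that caller transparently receives a private copy while the others keep reading the original. The `CowRef` class is an illustrative model, not an API from the patent:

```python
import copy

class CowRef:
    """A caller's handle to a shared resource with copy-on-write semantics."""
    def __init__(self, shared: dict):
        self._shared = shared
        self._private = None            # created lazily on first write

    def read(self) -> dict:
        return self._private if self._private is not None else self._shared

    def write(self, key, value) -> None:
        if self._private is None:
            self._private = copy.deepcopy(self._shared)   # the copy-on-write
        self._private[key] = value

resource = {"size": 100}
a, b = CowRef(resource), CowRef(resource)
assert a.read() is b.read()             # both callers hold the same pointer
a.write("size", 200)                    # a transparently gets a private copy
```

After the write, `b` still sees the untouched original, which is exactly why COW lets other clients continue reading old metadata without locks.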
Four, zero copy (zero copy)
Zero copy refers to performing a data copy from one storage area to another without involving the central processing unit (CPU). With zero copy, when a network card or another peripheral performs an I/O operation, it interacts with main memory directly rather than through the CPU, which reduces CPU consumption and memory-bandwidth usage. Zero copy can be implemented based on direct memory access (DMA); the DMA controller is, for example, integrated on the network card.
Index node (Inode) information
An Inode stores the metadata of a file. The file system locates each file through its Inode. Inode information includes, for example, the Inode number, the file size, and the file creation date. The Inode number uniquely identifies the corresponding file in the distributed file system.
Sixthly, Inode table (Inode table)
The Inode table stores the Inode information of all files in the file system. When a user searches for or accesses a file, the system finds the correct Inode number through the Inode table.
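A minimal sketch of an Inode and an Inode table, using only the fields the text names as examples (Inode number, file size, creation date); the path index and helper names are assumptions added for a runnable lookup:

```python
from dataclasses import dataclass

@dataclass
class Inode:
    ino: int        # Inode number: uniquely identifies the file
    size: int       # file size in bytes
    ctime: str      # file creation date

inode_table = {}    # ino -> Inode: Inode info of all files in the system
path_index = {}     # path -> ino (hypothetical directory lookup, for the demo)

def create(path: str, ino: int, size: int, ctime: str) -> None:
    inode_table[ino] = Inode(ino, size, ctime)
    path_index[path] = ino

def lookup(path: str) -> Inode:
    """Resolve a path to its Inode number, then fetch it from the Inode table."""
    return inode_table[path_index[path]]

create("/a.txt", 7, 4096, "2020-09-28")
node = lookup("/a.txt")
```

The two-step lookup (name to Inode number, number to Inode information) is the pattern the global and client Inode tables later in this document follow.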
Seventhly, Remote Direct Memory Access (RDMA)
RDMA is a technique that allows one computer to transfer data over a network directly from its memory into the memory of another computer. In an RDMA transfer, data is copied to the network interface controller (NIC) in DMA mode, transmitted over the network to the remote NIC, and written directly into remote memory.
Eight, RDMA read operations
An RDMA read operation is a memory-semantic data read operation. During an RDMA read, the NIC of the source device reads data from the data buffer of the destination device according to the virtual address and permission information of that buffer, and stores the read data in the data buffer of the source device. Specifically, the CPU of the source device generates an RDMA read instruction from the virtual address of the source memory region, the virtual address of the destination memory region, and the permission information of the destination memory region, and enqueues the instruction in the send queue. The source device then notifies the NIC through the doorbell mechanism to execute the instructions in the send queue. The NIC dequeues the RDMA read instruction and performs the RDMA read operation according to that instruction. Before the NIC of the source device performs the RDMA read operation, the CPU of the destination device stores the data to be read in the data buffer in advance, and sends the virtual address and permission information of its memory region to the CPU of the source device, so that the source device knows the destination address and has the permissions needed to read it.
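The enqueue/doorbell/execute flow above can be modeled as a toy simulation. No real NIC is involved; the send queue, doorbell, and permission key are plain Python stand-ins, and all names are invented:

```python
from collections import deque

class ToyNIC:
    """Toy model of a source NIC executing RDMA read work requests."""
    def __init__(self, local_buf: bytearray):
        self.send_queue = deque()
        self.local_buf = local_buf      # the source device's data buffer

    def post(self, instr) -> None:
        self.send_queue.append(instr)   # CPU enqueues the RDMA read instruction

    def ring_doorbell(self, remote_memory) -> None:
        instr = self.send_queue.popleft()       # NIC dequeues and executes
        addr, length, key = instr
        region = remote_memory[addr]
        assert region["key"] == key             # permission-information check
        self.local_buf[:length] = region["data"][:length]   # one-sided read

# Destination pre-stages the data and shares (virtual address, permission key).
remote_memory = {0x2000: {"data": bytearray(b"inode-v3"), "key": 42}}

nic = ToyNIC(bytearray(8))
nic.post((0x2000, 8, 42))       # source CPU: build and enqueue the instruction
nic.ring_doorbell(remote_memory)   # doorbell: NIC performs the read itself
```

Note that after `post`, the destination CPU never runs: the read completes entirely on the NIC side, which is the property the patent exploits to offload metadata reads.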
The following describes the use of the distributed file system in a particular application.
Mainstream distributed file systems generally adopt a metadata-data separation architecture and improve I/O processing capability on that basis. Data I/O is processed in parallel by multiple data servers, while metadata I/O is handled by a single metadata server. The client sends a file operation request to the metadata server through the network; after the client receives the data address returned by the metadata server, the client initiates read and write requests directly to the data servers.
In the related art, the metadata server mainly includes a network module, a distributed lock module, and a metadata management module.
The network module is used for processing message requests of a remote end (such as a client) and providing services for other modules in an RPC mode.
The distributed lock module is mainly used for controlling a plurality of clients to access the same shared resource simultaneously, and cluster consistency is guaranteed.
The metadata management module is mainly used for providing a global file system view for all the clients.
The software stacks of the network module and the metadata management module are heavyweight and process the metadata repeatedly. In the distributed lock module, lock resource information occupies a large amount of memory, and lock request processing (locking, unlocking, lock conflict, lock recovery) consumes a large amount of CPU resources. In the related art, distributed file systems were designed around mechanical disks, and overall I/O performance was limited by disk performance, so the overhead of the software stack was not obvious. However, as storage media evolve toward flash memory and non-volatile memory, media latency drops to the microsecond (us) or even nanosecond (ns) level, and the overhead of the software stack can no longer be ignored.
Mainstream distributed file systems in the industry adopt a centralized metadata architecture in which a single host is responsible for managing global metadata information, for example the Google File System (GFS), the Hadoop Distributed File System (HDFS), and MooseFS (a distributed network file system with redundant fault tolerance). For example, a metadata server in the GFS architecture includes a Remote Procedure Call bind (RPCbind) module, a Network Lock Management (NLM) module, and a metadata management module.
Wherein, the RPCbind module is a network service module. The RPCbind module provides a remote procedure call function for the client, and is used for receiving a file operation request of the analysis client.
The NLM module is a distributed lock module. The NLM module is used for coordinating the concurrent access of a plurality of clients to the same file and solving the access conflict.
The metadata management module is used for managing the global metadata information and providing a global file system view for all the clients.
However, the above architecture has several problems as follows.
Problem one, duplicate metadata processing.
The metadata server processes a metadata request cooperatively through the network service module and the metadata management module, so the metadata is processed repeatedly, which wastes the resources and performance of the metadata server.
Problem two, metadata server performance bottleneck.
The metadata server is designed by adopting a single server architecture. However, the metadata server needs to process metadata requests of a large number of clients, which easily causes a single-point performance bottleneck and causes I/O performance degradation.
In view of this, some embodiments of the present application provide a method for implementing a lock-free distributed file system, which optimizes the metadata update operations of the distributed file system both by combining hardware and software and by eliminating locks. On one hand, the metadata operations of the distributed file system are deeply combined with the hardware technology of the intelligent network card, and most metadata operations are offloaded to the intelligent network card hardware through remote direct memory access, so that the hardware performance of the intelligent network card is fully utilized. On the other hand, the COW technique supports multiple clients operating on the metadata in parallel without locks, which eliminates the resource overhead of the network service module and the distributed lock module and solves the problem of repeated metadata processing. Therefore, the embodiments of the present application greatly reduce the performance overhead of the metadata server and resolve the single-point performance bottleneck.
The system architecture provided by the embodiments of the present application is described below.
Referring to fig. 1, a system architecture 100 is provided in an embodiment of the present application. The system architecture 100 is an illustration of a distributed file system. System architecture 100 includes client 101, metadata server 102, chunk server 103, chunk server 113, and chunk server 123. The client 101, the metadata server 102, the block server 103, the block server 113, and the block server 123 are connected to each other via a wireless network or a wired network.
The client 101 includes at least one application 1011, a file system client (FS client) 1012, and an intelligent Network Card device (SNIC) 1013. The client 101 provides a file system interface for the application 1011 through the FS client 1012. The client 101 breaks file operations into metadata requests and data requests. Where metadata is pulled from metadata server 102 and data is pulled from chunk server 103, chunk server 113, or chunk server 123.
The intelligent network card device 1013 is configured to perform remote one-sided read operations and remote atomic operations. In addition, the intelligent network card device 1013 is used by the client 101 to push metadata (such as Inode information) to the metadata server 102.
By executing a remote one-sided read operation through the intelligent network card device 1013, the client 101 can directly access the memory data of the metadata server 102. Through zero copy and CPU bypass, data transmission performance is improved and the CPU overhead of the metadata server 102 is reduced.
By executing a remote atomic operation through the intelligent network card device 1013, the client 101 directly updates memory data (4-8 bytes) of the metadata server 102. The intelligent network card device 1013 guarantees the atomicity of the operation, which suits scenarios in which a large number of clients concurrently modify the same memory data.
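A common form of such a remote atomic operation is a compare-and-swap on a small word. The sketch below models that semantics in-process; the `threading.Lock` only stands in for the atomicity the network card would guarantee in hardware, and the class name is invented:

```python
import threading

class AtomicWord:
    """Model of a small (e.g. 8-byte) word updated by remote atomic CAS."""
    def __init__(self, value: int = 0):
        self._value = value
        self._guard = threading.Lock()   # stand-in for NIC-guaranteed atomicity

    def compare_and_swap(self, expected: int, new: int) -> bool:
        with self._guard:
            if self._value == expected:
                self._value = new
                return True
            return False

    def load(self) -> int:
        return self._value

word = AtomicWord(3)                     # e.g. the latest metadata version number
cas1 = word.compare_and_swap(3, 4)       # succeeds: this client saw the latest value
cas2 = word.compare_and_swap(3, 5)       # fails: another update got there first
```

Because exactly one of many concurrent CAS attempts on the same word can succeed, high-concurrency modification of the same memory data needs no distributed lock.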
The FS client 1012 includes a metadata management device 10121 and a data management device 10122.
The metadata management means 10121 is used for acquiring the latest metadata from the metadata server 102. The metadata management means 10121 is also used for submitting metadata modified by the application 1011 to the metadata server 102.
The data management means 10122 is configured to acquire the latest data from the block server 103, the block server 113, or the block server 123, and submit the data modified by the application 1011 to the block server 103, the block server 113, or the block server 123.
The metadata server 102 includes a metadata management apparatus 1021, a metadata recovery apparatus 1022, and an intelligent network card device 1023.
The metadata management apparatus 1021 is used to manage global metadata information. For the metadata update operation, the metadata management apparatus 1021 does not directly overwrite metadata, but creates one copy using the COW technique. Other clients can continue to read old metadata, and the lock-free updating method is realized.
The metadata recycling device 1022 is used for recycling the old copy generated after the metadata update. The metadata recycling apparatus 1022 saves the storage resources of the metadata server by releasing the memory space occupied by the old copy.
The chunk server 103, the chunk server 113, and the chunk server 123 are configured to manage local storage resources and process file data read-write requests of the client 101. Chunk server 103, chunk server 113, and chunk server 123 are optionally distributed at different locations. Block server 103, block server 113, and block server 123 can cooperate in a distributed parallel manner, thereby greatly improving I/O performance.
Optionally, different blocks in system architecture 100 are stored on different block servers. For example, referring to FIG. 1, chunk server 103 stores chunk 1, chunk server 113 stores chunk 2, and chunk server 123 stores chunk 3. In some embodiments, the different chunk servers each store chunks via their respective local disks. For example, block server 103 accesses local disk 104, saves block 1 via local disk 104; the block server 113 accesses the local disk 114, and saves the block 2 via the local disk 114; chunk server 123 accesses local disk 124, and saves chunk 3 via local disk 124.
The overall architecture of the embodiments of the present application is described above in conjunction with the system architecture 100 shown in fig. 1. The system architecture 100 offloads the network module and the distributed lock module in the metadata server to the intelligent network card device, and implements a lock-free metadata operation method by the COW technology, which can support multiple clients to operate the metadata without lock in parallel. How the system architecture 100 implements lockless metadata operations is described in detail below with reference to fig. 2.
Referring to fig. 2, a Global Inode table (Global Inode table) is stored in the memory of the metadata server 102. The global Inode table is used to manage all Inode information. Each Inode information occupies one memory page, which is 4 Kilobytes (KB) in size. Each Inode information maintains a set of replicas. Each time the Inode information is updated, a new copy is generated and added to the set of copies.
For example, metadata server 102 maintains corresponding sets of replicas for Inode1, Inode2, and Inode3, respectively. The set of replicas for Inode1 includes Inode1 replica 1, Inode1 replica 2, and Inode1 replica 3. Inode1 copy 1, Inode1 copy 2, and Inode1 copy 3 are 3 different versions of Inode1, respectively. The set of replicas for Inode2 includes Inode2 replica 1, Inode2 replica 2, and Inode2 replica 3. Inode2 copy 1, Inode2 copy 2, and Inode2 copy 3 are 3 different versions of Inode2, respectively. The set of replicas for Inode3 includes Inode3 replica 1, Inode3 replica 2, and Inode3 replica 3. Inode3 copy 1, Inode3 copy 2, and Inode3 copy 3 are 3 different versions of Inode3, respectively.
The metadata server 102 registers the memory address of the global Inode table in the intelligent network card device 1023. Then, the intelligent network card device 1023 presents the memory address of the global Inode table as a segment of local memory address space for the client 101, so that the client 101 can directly pull the Inode information.
The memory of the client 101 stores a client Inode table. The client Inode table is used to manage Inode information that the host (client 101) needs to operate. The client 101 does not need to maintain the Inode replica set, and the client 101 only needs to update to the latest Inode.
When the client 101 reads metadata, it pulls the remote Inode information by polling through the remote one-sided read function of the intelligent network card device 1013, ensuring that the Inode information is always up to date.
When the client 101 updates the metadata, the client 101 transmits the updated Inode information to the metadata server 102. The metadata management device 1021 in the metadata server 102 is responsible for creating a copy of the updated Inode information and adding the copy to the copy set.
The metadata management apparatus 1021 is used to process Inode information submitted by the client 101. Specifically, the metadata management apparatus 1021 is configured to record update operations on metadata in the global Inode table in the form of copies, without affecting reads of the original Inode information.
The metadata management apparatus 1021 is also used to resolve update conflicts of Inode information. When multiple clients concurrently submit updated Inode information, the metadata management apparatus 1021 needs to ensure that updates to the same Inode information are serialized. For example, suppose client A and client B submit update requests for the same Inode information at the same time. The metadata management apparatus 1021 receives the update request of client A first and, in response, creates a copy and adds it to the copy set. Next, the metadata management apparatus 1021 processes the update request of client B. If the metadata management apparatus 1021 finds that client B's modification is not based on the latest copy, it directly returns a failure to client B, triggering client B to refresh its Inode information and resubmit the update request.
The system architecture provided by the embodiment of the present application is introduced above, and a method flow for processing metadata based on the system architecture is introduced below.
Referring to fig. 3, fig. 3 is a flowchart of a metadata processing method 200 in a distributed file system according to an embodiment of the present application. Optionally, the method 200 is performed by the client 101 in the system architecture shown in fig. 1, and the method 200 includes S210 to S220.
S210, the client performs a remote one-sided read operation on the memory of the metadata server through the network card to obtain a target version of the metadata.
The metadata is, for example, Inode information. The target version is the latest version in the copy set of the metadata; in other words, the target version of the metadata is the most recent version in the copy set. The copy set includes at least one version of the metadata, generated by the metadata server by performing COW on the metadata. For example, if the copy set includes multiple copies of Inode information, the target version of the metadata is the latest version of the Inode information.
Further, if the metadata has never been updated, the copy set of the metadata is empty. In that case the metadata itself is stored in the memory of the metadata server, and the client obtains the metadata itself by performing the remote one-sided read operation through the network card.
S220, the client stores the target version of the metadata to a memory of the client.
The network card of the client is, for example, an intelligent network card device (SNIC). The network card of the client is, for example, an RDMA-capable network card.
The network card of the client provides a remote one-sided read function: it accesses the memory of the metadata server by performing a remote one-sided read operation and pulls the latest metadata from that memory.
In some embodiments, the remote one-sided read operation is implemented via an RDMA read. The network card of the client executes RDMA read operation, thereby obtaining the metadata in the memory of the metadata server.
In some embodiments, the remote one-sided read operation is implemented by performing memory address mapping. Specifically, the client maps a remote memory address to a local memory address through its network card, and then performs a read operation on the local memory address. The remote memory address is the memory address of the metadata server; specifically, it indicates the address of the copy set in the memory of the metadata server, while the local memory address indicates an address in the memory of the client. In this way, the network card supports the client in directly accessing the memory of the server, so that the server can bypass its CPU when providing the metadata, which saves server CPU overhead.
Combined with the memory address mapping, a specific process of the remote one-sided read operation is, for example, as follows. The SNIC of the server sends the remote memory address to the SNIC of the client. When the client needs to read the metadata, the client encapsulates the remote memory address in a read request and sends the read request to the SNIC of the metadata server. The SNIC of the metadata server locates the target version of the metadata according to the remote memory address in the read request and sends the target version of the metadata to the SNIC of the client.
In some embodiments, the remote memory address is pre-registered with the network card by the metadata server. After the client sends a mount request to the metadata server, the metadata server sends the remote memory address to the client in response. Specifically, the metadata server allocates a memory space from its memory for storing the copy set, registers the memory address corresponding to that memory space with its own network card, and sends the memory address to the client. The network card of the client receives the memory address, and the client uses it as the remote memory address and establishes a mapping to a local memory address.
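The mount-time registration and address mapping above can be sketched as a simple base-plus-offset translation. This is a minimal model, not an implementation: the class names are hypothetical, and in a real system the registration would use the NIC's RDMA memory-registration mechanism rather than plain Python objects.

```python
class ServerSide:
    """Models the metadata server allocating and registering the copy-set memory."""
    def __init__(self, base_addr, size):
        self.base_addr = base_addr      # start of the registered memory region
        self.size = size

    def register(self):
        # What the server sends to the client in response to a mount request
        return {"remote_base": self.base_addr, "length": self.size}

class ClientSide:
    """Models the client mapping the remote region into its local address space."""
    def __init__(self, local_base):
        self.local_base = local_base
        self.remote_base = None

    def mount(self, reply):
        self.remote_base = reply["remote_base"]

    def to_remote(self, local_addr):
        # Translate a local address in the mapped window back to server memory
        return self.remote_base + (local_addr - self.local_base)

server = ServerSide(base_addr=0x7F0000000000, size=1 << 20)
client = ClientSide(local_base=0x10000000)
client.mount(server.register())
print(hex(client.to_remote(0x10000040)))  # offset 0x40 into the remote region
```

Once this mapping is established, a local read at `local_base + offset` is served by the NIC as a one-sided read of `remote_base + offset`, without involving the server CPU.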
In some embodiments, the recency of the metadata version is indicated by a pointer field. In particular, the metadata contains a pointer field. The pointer field is, for example, referred to as a next field. The values of the pointer fields in different versions of metadata are different. The pointer field of the metadata is used to indicate the next version of the metadata. For example, the value of the pointer field is the logical or physical address of the next copy. Wherein the value of the pointer field in the target version of the metadata is null (null). Specifically, the client searches for a target version of the metadata in the copy set according to the pointer field in each version of the metadata. For example, the client first finds a first version of the metadata in the copy set, then finds a second version of the metadata according to the pointer field of the first version of the metadata, then finds a third version of the metadata according to the pointer field of the second version of the metadata, and so on until the value of the pointer field is found to be null.
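The pointer-field traversal described above can be sketched as a linked-list walk: follow each copy's pointer field until it is null, and that copy is the target version. The `Copy` class and `find_latest` function are illustrative names, not from the patent.

```python
class Copy:
    def __init__(self, version):
        self.version = version
        self.next = None  # pointer field; None (null) marks the latest copy

def find_latest(first_copy):
    """Walk the chain of copies until the pointer field is null."""
    copy = first_copy
    while copy.next is not None:
        copy = copy.next
    return copy

# Build a three-copy chain: version 1 -> version 2 -> version 3
c1, c2, c3 = Copy(1), Copy(2), Copy(3)
c1.next, c2.next = c2, c3
print(find_latest(c1).version)  # 3
```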
In some embodiments, the network card of the client updates the metadata maintained by the metadata server by performing a remote atomic operation. Specifically, after the client performs the remote one-sided read operation on the memory of the metadata server through the network card, the client performs a remote atomic operation on the memory of the metadata server through the network card to update the target version of the metadata.
The remote atomic operation is a function provided by the intelligent network card: the client directly operates on the memory mapped to it through the intelligent network card, and the specific implementation is provided cooperatively by the intelligent network card of the client and the intelligent network card of the metadata server. In some embodiments, the remote atomic operation is implemented as an RDMA atomic operation; the network card of the client performs the RDMA atomic operation, thereby writing the updated target version of the metadata to the memory of the metadata server.
In some embodiments, the client's network card updates the state of the metadata by performing a remote atomic operation. Wherein the state of the metadata is used to indicate the frequency of access to the metadata by the client. Optionally, the metadata includes a count field. The network card of the client updates the value of the count field in the metadata by performing a remote atomic operation. The count field is used to carry the state of the metadata. The state of the metadata is used for the metadata server to recycle the metadata. How the metadata server reclaims the metadata according to the state of the metadata will be described in detail in the embodiment of fig. 3 below.
Optionally, the value of the count field is the number of clients in the distributed file system accessing the metadata. For example, if a total of n clients in the distributed file system access the metadata, the value of the count field in the metadata is n. When this implementation is adopted, optionally, the network card of the client increases the value of the count field in the metadata by one by performing a remote atomic operation.
Or, optionally, the value of the count field indicates whether a client in the distributed file system has access to the metadata. For example, if there is a client accessing the metadata in the distributed file system, the value of the count field in the metadata is 1, and if there is no client accessing the metadata in the distributed file system, the value of the count field in the metadata is 0. When this implementation is adopted, optionally, the network card of the client sets the value of the count field in the metadata from 0 to 1 by performing a remote atomic operation.
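The two count-field conventions above can be sketched with the two atomic primitives an RDMA-capable NIC typically offers, fetch-and-add and compare-and-swap. This is a software model only: a `threading.Lock` stands in for the atomicity that the NIC provides in hardware, and the class name is hypothetical.

```python
import threading

class CountField:
    """Models an atomically updatable count field in a metadata copy."""
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()  # stand-in for NIC-provided atomicity

    def fetch_and_add(self, delta):
        with self._lock:
            old = self.value
            self.value = old + delta
            return old

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self.value == expected:
                self.value = new
                return True
            return False

# Convention 1: the count is the number of clients accessing the metadata;
# each new accessor atomically increments it by one.
count = CountField()
count.fetch_and_add(1)

# Convention 2: the count is a 0/1 flag for "any client is accessing";
# a new accessor sets it from 0 to 1 with compare-and-swap.
flag = CountField()
flag.compare_and_swap(0, 1)
print(count.value, flag.value)  # 1 1
```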
In the method provided above, the client implements a distributed metadata reclamation technique by using the remote atomic operation function provided by the intelligent network card device, which greatly improves the efficiency of reclaiming metadata copies and further reduces the performance overhead of the metadata server.
The above provides a way to obtain metadata through a combination of software and hardware. By deeply combining metadata access operations in the distributed file system with the hardware capabilities of the intelligent network card, the client reads metadata on the metadata server through remote one-sided read operations, and COW is used to implement a lock-free remote metadata operation method. The network service module and the distributed lock module are thereby offloaded from the CPU to the network card, which eliminates the resource overhead those modules incur and the problem of their repeatedly processing the metadata. This saves resources on the metadata server, improves its performance, and helps resolve the single-point performance bottleneck of the metadata server.
Referring to fig. 4, fig. 4 is a flowchart of a metadata processing method 300 in a distributed file system according to an embodiment of the present application. Optionally, the method 300 is performed by the metadata server 102 in the system architecture shown in fig. 1, and the method 300 includes S310 to S330.
S310, the metadata server receives an updating request from the client.
Wherein the update request includes a first version of the client updated metadata. The first version is a version of the metadata. For example, the first version of the metadata is the client updated Inode information.
S320, if the first version is the latest version in the copy set of the metadata, the metadata server performs COW based on the metadata to obtain a target version of the metadata.
The target version of the metadata is a copy of the first version of the metadata and records the client's update to the metadata. For example, the metadata server duplicates the first version of the metadata, updates the version number in the duplicate (for example, increments it by one), and takes the duplicate with the updated version number as the target version of the metadata.
In some embodiments, the metadata server obtains a first version of the metadata from the update request. The metadata server determines whether the first version of the metadata is the latest version in the set of copies of the metadata. If the first version of the metadata is the latest version of the set of copies of the metadata and the metadata server determines that the client modified the metadata based on the latest version of the metadata, then the metadata server performs operations to create and add a copy to the set of copies.
In some embodiments, the metadata server determines whether the metadata sent by the client is the latest version according to the version number. Specifically, if the version number of the first version of the metadata is the largest in the set of replicas, the metadata server determines that the first version of the metadata is the latest version in the set of replicas. For example, the copy set includes 3 copies of metadata, the version numbers of the 3 copies of metadata are 1, 2, and 3, respectively, and when the version number of the metadata sent by the client is 3, the metadata server determines that the first version of the metadata is the latest version in the copy set, that is, the client modifies based on the latest copy.
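The version check and COW append of S320 can be sketched as a single server-side function: accept the update only if the submitted version is the latest in the copy set, otherwise report a failure so the client re-pulls and retries. The function and field names here are illustrative, not taken from the patent.

```python
def handle_update(copy_set, submitted_version, new_attrs):
    """Model of the server handling an update request under COW."""
    latest = copy_set[-1]["version"] if copy_set else 0
    if submitted_version != latest:
        # The client did not modify based on the latest copy: reject it
        return {"ok": False, "reason": "stale version, re-pull and retry"}
    # COW: record the update as a new copy instead of overwriting in place,
    # so clients reading older copies are unaffected
    copy_set.append({"version": latest + 1, "attrs": new_attrs})
    return {"ok": True, "version": latest + 1}

copies = [{"version": 1, "attrs": {"size": 0}}]
print(handle_update(copies, 1, {"size": 4096}))  # succeeds, creates version 2
print(handle_update(copies, 1, {"size": 8192}))  # fails: 1 is no longer latest
```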
S330, the metadata server adds the target version of the metadata into the copy set.
In the method provided above, the metadata server maintains a copy set for the metadata. When a client requests an update, the metadata server does not modify the metadata in place; instead, using COW, it records the client's update as a new copy in the copy set, so that other clients can continue to read older versions in the copy set. This implements a lock-free remote metadata operation method that coordinates concurrent access by multiple clients to the same metadata and, while resolving access conflicts, greatly improves the concurrency and efficiency of metadata updates.
In some embodiments, after the metadata server receives the update request from the client, if the first version is not the latest version in the set of copies of the metadata, the metadata server generates a failure message, which the metadata server sends to the client. Wherein the failure message indicates that the metadata update failed.
In the method provided above, when multiple clients modify the metadata concurrently and the metadata server finds that a client's modification is not based on the latest copy, the server returns a failure message that triggers the client to refresh its local metadata. This helps ensure that multiple clients update the same metadata serially, avoids the large amount of host computing resources that a network lock module on the metadata server would consume to coordinate consistency, and saves the resource overhead of the metadata server.
In some embodiments, the metadata server also reclaims metadata according to the state of the metadata. Specifically, the metadata server determines whether the state of the target version of the metadata satisfies a condition; if it does, the metadata server releases the memory space occupied by that version. The state indicates how often clients access the target version of the metadata. For example, the metadata server determines whether the target version of the metadata is being accessed by any client; if it is not, the condition is satisfied.
How the metadata server reclaims metadata is described in detail below with reference to figs. 5 and 6. For example, the metadata being reclaimed is an Inode copy: if no client is currently accessing the Inode copy, the Inode copy can be reclaimed.
As shown in FIG. 5, an Inode copy 1 is accessed by client 1 and client 3. The Inode copy 2 is accessed by clients 4 and 6. The Inode copy 3 is accessed by clients 2 and 5. All of Inode copy 1, Inode copy 2, and Inode copy 3 have client access, and Inode copy 1, Inode copy 2, and Inode copy 3 are not recycled.
As shown in FIG. 6, Inode copy 1 and Inode copy 2 have no client access and can be reclaimed, so the metadata recycling device 1022 in the metadata server releases the memory space occupied by Inode copy 1 and Inode copy 2. Clients update the state of an Inode copy atomically by using the remote atomic operation function provided by the intelligent network card device. The state of an Inode copy is, for example, the value of its count field, and indicates whether the copy is being accessed by a client: either one or more clients are accessing the copy, or no client is. The state of the Inode copy is stored in the memory of the metadata server, for example in the global Inode table (Global Inode table).
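The reclamation rule illustrated in figs. 5 and 6 can be sketched as a sweep over the copy set: a copy whose count field shows no client access is released, while copies with accessors are kept. The `reclaim` function and dictionary layout are illustrative assumptions, not the patent's implementation.

```python
def reclaim(copies):
    """Partition copies into those still accessed and those to release."""
    kept, released = [], []
    for c in copies:
        (kept if c["count"] > 0 else released).append(c)
    return kept, released

copies = [
    {"name": "Inode copy 1", "count": 0},  # no client access -> reclaimable
    {"name": "Inode copy 2", "count": 0},  # no client access -> reclaimable
    {"name": "Inode copy 3", "count": 2},  # e.g. accessed by clients 2 and 5
]
kept, released = reclaim(copies)
print([c["name"] for c in released])  # ['Inode copy 1', 'Inode copy 2']
```

Because the count fields are maintained by remote atomic operations on the NIC, this sweep can trust them without taking any distributed lock.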
The process of updating the state of the Inode copy is realized in hardware, and does not occupy CPU resources of the metadata server.
The following describes an exemplary method flow for processing metadata in a read-write concurrency scenario in conjunction with the method 400 shown in fig. 7. Referring to fig. 7, the method 400 includes S401 to S404.
S401, the client 2 sends the updated metadata to the metadata server.
S402, the metadata server receives the updated metadata. The metadata server creates a replica, which the metadata server adds to the set of replicas.
S403, the client 1 pulls the metadata from the metadata server.
S404, the client 1 updates the metadata.
The following describes an exemplary method flow for processing metadata in a write-concurrent scenario, in conjunction with the method 500 shown in fig. 8. Referring to fig. 8, the method 500 includes S501 to S506.
S501, the client 2 sends the updated metadata to the metadata server.
S502, the metadata server creates a copy and adds the copy into a copy set.
S503, the client 1 sends the updated metadata to the metadata server. The metadata server receives the updated metadata sent by the client 1, and if the metadata server finds that client 1's modification is not based on the latest copy, the metadata server returns a failure message to the client 1.
S504, the client 1 pulls the metadata from the metadata server again.
S505, the client 1 updates the metadata.
S506, the client 1 sends the updated metadata to the metadata server.
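The S501-S506 write-concurrent flow amounts to a client-side retry loop around the server's version check. The sketch below models both sides in one process under stated assumptions; `submit` and `client_update` are hypothetical names, not the patent's interfaces.

```python
def submit(copies, based_on_version, attrs):
    """Server side: append a new copy only if based on the latest version."""
    latest = copies[-1]["version"]
    if based_on_version != latest:
        return False                      # S503: failure returned to the client
    copies.append({"version": latest + 1, "attrs": attrs})
    return True

def client_update(copies, modify):
    """Client side: pull, modify, submit; retry after a failure (S504-S506)."""
    while True:
        latest = copies[-1]                              # S504: pull latest copy
        if submit(copies, latest["version"],
                  modify(dict(latest["attrs"]))):        # S505 then S506
            return copies[-1]["version"]

copies = [{"version": 1, "attrs": {"size": 0}}]
# Client 2 wins the race first (S501-S502) ...
submit(copies, 1, {"size": 100})
# ... so client 1's stale submission fails (S503) and its retry succeeds
assert submit(copies, 1, {"size": 200}) is False
print(client_update(copies, lambda a: {**a, "size": 200}))  # 3
```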
The following describes an exemplary deployment flow of the distributed file system shown in fig. 1 in conjunction with the method 600 shown in fig. 9.
Referring to fig. 9, the method 600 includes steps S601 to S606.
S601, deploying a metadata server.
Specifically, S601 includes the following steps (1) to (3).
And (1) initializing a global Inode table by the metadata server.
The metadata server applies for a section of memory space, and the memory space is used for storing a global Inode table. And the metadata server registers the memory address space corresponding to the memory space into the intelligent network card equipment of the metadata server for management.
And (2) initializing a metadata log module by the metadata server. And, the metadata server starts listening for connection requests of the remote client.
And (3) initializing a metadata recovery device by the metadata server.
And S602, deploying the client.
Specifically, S602 includes the following steps (1) to (2).
And (1) initializing a client Inode table by the client.
The client applies for a section of memory space, which is used to store local metadata information (client Inode table).
And (2) initializing the metadata management device by the client.
And S603, mounting the file system.
Specifically, S603 includes the following steps (1) to (3).
And (1) the client sends a mounting request to the metadata server.
In some embodiments, after receiving the mount request, the metadata server sends the memory address space of the global Inode table to the client.
And (2), the client maps the memory address space of the remote global Inode table to a local memory address space through the intelligent network card device, so that remote memory data can subsequently be accessed directly.
And (3) the client creates a mount point and provides a file access interface for the upper application.
And S604, unloading the file system.
Specifically, S604 includes the following steps (1) to (4).
And (1) synchronizing local metadata modification and data modification to a metadata server and a block server by the client side respectively.
And (2) the client clears the mounting points.
And (3) the client releases the memory occupied by the client Inode table.
And (4) stopping the metadata management device by the client.
S605, stopping the metadata server.
Step (1), the metadata server stops the metadata management device and the metadata recycling device.
And (2) writing the global Inode table into a local disk by the metadata server. Then, the metadata server releases the memory.
S606, stopping the service of the block server.
The file operation flow of the distributed file system shown in fig. 1 is described below with reference to the method 700 shown in fig. 10.
Referring to fig. 10, the method 700 includes S701 to S704.
S701, the client side initiates a file request.
The file request comprises two processes of metadata updating and metadata obtaining.
For metadata update, when the file request of the client is a request to update Inode information, the client sends the updated Inode information to the metadata server. If the Inode information is updated successfully, the client returns success to the upper-layer application. If the update fails, the client synchronizes the latest Inode information in the remote global Inode table to the local memory through the intelligent network card device and then initiates the update request again. The process by which the client synchronizes Inode information to the local memory is the same as the metadata acquisition process.
For metadata acquisition, the client synchronizes the latest Inode information in the remote global Inode table to the local memory through the intelligent network card device. Specifically, the client follows the next pointer of the Inode information backwards from copy to copy until it finds a copy whose next pointer is null, which indicates that that copy is the latest version. Referring to FIG. 11, FIG. 11 is a diagram illustrating the data structure of an Inode copy. The copy set is managed as an Inode linked list. Each Inode copy includes a version number field, a next pointer, and a count field. The version number of each Inode copy is one greater than that of the previous copy, and each copy points to the next copy through its next pointer.
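The FIG. 11 data structure can be sketched directly: each copy carries a version number, a next pointer, and a count field, and appending a copy increments the version by one. The field names follow the figure; the class and function names are illustrative.

```python
class InodeCopy:
    """One node of the Inode linked list in FIG. 11."""
    def __init__(self, version_number):
        self.version_number = version_number
        self.next = None    # points to the next (newer) copy; None if latest
        self.count = 0      # number of clients currently accessing this copy

def append_copy(tail):
    """Add a new copy after the current tail; its version is tail's plus 1."""
    new = InodeCopy(tail.version_number + 1)
    tail.next = new
    return new

head = InodeCopy(1)
tail = append_copy(append_copy(head))
print(tail.version_number, head.next.version_number)  # 3 2
```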
S702, the metadata server updates the file metadata.
The metadata server receives the Inode information through the metadata management device and adds the Inode information to the copy set in a copy mode.
For concurrent update requests for the same Inode information, the metadata server is responsible for serializing them. When the metadata server processes a subsequent concurrent update request and finds that the version number has already been incremented, this indicates that another client has updated the Inode copy. In this case, the metadata server directly returns a failure to the client, notifying the client to refresh the version number of its local Inode and resubmit the Inode update information.
And S703, updating the file metadata by other clients.
The other clients synchronize the latest Inode information in the remote global Inode table to the local memory through the intelligent network card device. For the specific process of S703, refer to the metadata acquisition process in S701 above.
And S704, other clients initiate file requests.
The specific process of S704 refers to the metadata acquisition process in S701 above.
Referring to fig. 12, fig. 12 is a schematic diagram illustrating a computer device 800 configured as a client or a metadata server in various method embodiments, according to an exemplary embodiment of the present application. The computer device 800 may be a host computer, a server, a personal computer, or the like. The computer device 800 may be implemented by a general bus architecture.
The computer device 800 includes at least one processor 801, a communication bus 802, a memory 803, and at least one network card 804.
The processor 801 is, for example, a Central Processing Unit (CPU), a Network Processor (NP), a Graphics Processing Unit (GPU), a neural-Network Processing Unit (NPU), a Data Processing Unit (DPU), a microprocessor, or one or more integrated circuits for implementing the present disclosure. For example, the processor 801 includes an application-specific integrated circuit (ASIC), a Programmable Logic Device (PLD), or a combination thereof. PLDs are, for example, Complex Programmable Logic Devices (CPLDs), field-programmable gate arrays (FPGAs), General Array Logic (GAL), or any combination thereof.
A communication bus 802 is used to transfer information between the above components. The communication bus 802 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 12, but this is not intended to represent only one bus or type of bus.
The memory 803 is used to provide storage. The memory 803 is, for example, but not limited to, a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), optical disk storage (including a compact disc read-only memory (CD-ROM), laser disc, digital versatile disc, Blu-ray disc, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can carry or store desired program code in the form of instructions or data structures and can be accessed by a computer. The memory 803 is, for example, separate from and coupled to the processor 801 via the communication bus 802. The memory 803 may also be integrated with the processor 801.
The network card 804 uses any transceiver-type apparatus to communicate with other devices or a communication network. The network card 804 includes a wired communication interface and may also include a wireless communication interface. The wired communication interface may be, for example, an Ethernet interface, which may be an optical interface, an electrical interface, or a combination thereof. The wireless communication interface may be a wireless local area network (WLAN) interface, a cellular network communication interface, or a combination thereof.
In particular implementations, processor 801 may include one or more CPUs such as CPU0 and CPU1 shown in fig. 12 as one example.
In particular implementations, computer device 800 may include multiple processors, such as processor 801 and processor 805 shown in FIG. 12, as one embodiment. Each of these processors may be a single-Core Processor (CPU) or a multi-Core Processor (CPU). A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
In particular implementations, computer device 800 may also include an output device and an input device, as one embodiment. The output device communicates with the processor 801 and may display information in a variety of ways. For example, the output device may be a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display device, a Cathode Ray Tube (CRT) display device, a projector (projector), or the like. The input device is in communication with the processor 801 and may receive user input in a variety of ways. For example, the input device may be a mouse, a keyboard, a touch screen device, or a sensing device, among others.
In some embodiments, the memory 803 is used to store program code 810 for performing aspects of the present application, and the processor 801 may execute the program code 810 stored in the memory 803. That is, the computer apparatus 800 may implement the methods provided by the method embodiments through the processor 801 and the program code 810 in the memory 803.
In some embodiments, a computer program product is provided that includes computer instructions stored in a computer readable storage medium. The processor of the client reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions, so that the client executes the metadata processing method in the distributed file system.
In some embodiments, a computer program product is provided that includes computer instructions stored in a computer readable storage medium. The processor of the metadata server reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions, so that the metadata server executes the metadata processing method in the distributed file system.
Those of ordinary skill in the art will appreciate that the method steps and units described in connection with the embodiments disclosed herein can be implemented as electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the steps and components of the embodiments have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints of the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the unit is only one logical functional division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electric, mechanical or other form of connection.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiments of the present application.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such an understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the methods in the embodiments of the present application. The aforementioned storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The terms "first," "second," and the like in this application are used to distinguish between items that are identical or similar in function; "first" and "second" imply no logical or temporal dependency and define neither a quantity nor an order of execution. It should further be understood that although the terms first, second, and so on may be used below to describe various elements, these elements should not be limited by those terms, which serve only to distinguish one element from another.
The term "at least one" in this application means one or more.
It should also be understood that the term "if" may be interpreted to mean "when," "upon," "in response to determining," or "in response to detecting." Similarly, the phrase "if it is determined..." or "if [a stated condition or event] is detected" may be interpreted to mean "upon determining..." or "in response to determining..." or "upon detecting [the stated condition or event]" or "in response to detecting [the stated condition or event]," depending on the context.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present application, and these modifications or substitutions should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
In the above embodiments, the implementation may be realized wholly or partly by software, hardware, firmware, or any combination thereof. When software is used, the implementation may take the form, in whole or in part, of a computer program product. The computer program product includes one or more computer program instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device.
The computer program instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium; for example, the computer program instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire or wirelessly. The computer readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, tapes), optical media (e.g., digital video discs (DVDs)), or semiconductor media (e.g., solid-state drives), among others.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (19)

1. A metadata processing method in a distributed file system, wherein the distributed file system comprises a client and a metadata server, the method comprising:
the client performs a remote one-sided read operation on a memory of the metadata server through a network card to obtain a target version of the metadata, wherein the target version is the latest version in a copy set of the metadata, and the copy set comprises at least one version of the metadata generated by copy-on-write (COW) performed by the metadata server when writing the metadata;
and the client stores the target version of the metadata in a memory of the client.
2. The method according to claim 1, wherein the client performing a remote one-sided read operation on the memory of the metadata server through the network card comprises:
the client maps a remote memory address into a local memory address through the network card, the remote memory address is used for indicating an address of the copy set in the memory of the metadata server, and the local memory address is used for indicating an address in the memory of the client;
and the client executes read operation on the local memory address.
3. The method according to claim 1 or 2, wherein the client performing a remote one-sided read operation on the memory of the metadata server through the network card to obtain a target version of the metadata comprises:
and the client searches a target version of the metadata in the copy set according to a pointer field in each version of the metadata, wherein the pointer field is used for indicating the next version of the metadata, and the value of the pointer field in the target version of the metadata is null.
4. The method according to any one of claims 1 to 3, wherein after the client performs a remote one-sided read operation on the memory of the metadata server through the network card, the method further comprises:
and the client performs a remote atomic operation on the memory of the metadata server through the network card to update the target version of the metadata.
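Claims 1 to 4 above describe the client-side read path: chase the per-version pointer field until it is null to locate the target (latest) version. The sketch below simulates that logic in ordinary Python; the `MetadataVersion` class, field names, and the `remote_read` helper are illustrative assumptions, and a real client would fetch each version through RDMA one-sided READ operations issued by the network card against registered remote memory rather than local calls.

```python
# Illustrative simulation of the client-side read path in claims 1-4.
# Each metadata version carries a pointer field indicating the next
# version; the target (latest) version is the one whose pointer is null.
# remote_read is a stand-in for an RDMA one-sided READ through the NIC.

class MetadataVersion:
    def __init__(self, payload, next_version=None):
        self.payload = payload       # the metadata content itself
        self.next = next_version     # pointer field: next version, or None

def remote_read(version):
    """Stand-in for a one-sided read: fetches the version without
    involving the remote CPU (here, trivially returns it)."""
    return version

def read_latest(copy_set_head):
    """Walk the copy set from its head until the pointer field is null."""
    current = remote_read(copy_set_head)
    while current.next is not None:  # null pointer marks the target version
        current = remote_read(current.next)
    return current

# Build a three-version chain and locate the target (latest) version.
v1 = MetadataVersion("inode v1")
v2 = MetadataVersion("inode v2")
v3 = MetadataVersion("inode v3")
v1.next, v2.next = v2, v3

target = read_latest(v1)
print(target.payload)                # -> inode v3
```

Because the walk touches only immutable, COW-generated versions, it needs no distributed lock: a concurrent writer appends a new version rather than modifying one in place.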
5. A metadata processing method in a distributed file system, wherein the distributed file system comprises a client and a metadata server, the method comprising:
the metadata server receives an update request from the client, wherein the update request comprises a first version of the metadata updated by the client;
if the first version is the latest version in the copy set of the metadata, the metadata server performs copy-on-write (COW) based on the first version of the metadata to obtain a target version of the metadata, wherein the target version of the metadata is a copy of the first version of the metadata;
and the metadata server adds the target version of the metadata to the copy set.
6. The method of claim 5, wherein after the metadata server receives an update request from the client, the method further comprises:
and if the first version is not the latest version in the copy set of the metadata, the metadata server sends a failure message to the client, wherein the failure message represents that the metadata update fails.
7. The method according to claim 5 or 6, wherein after the metadata server adds the target version of the metadata to the copy set, the method further comprises:
and if the state of the target version of the metadata satisfies a condition, the metadata server releases the memory space occupied by the target version of the metadata, wherein the state indicates a frequency of access by the client to the target version of the metadata.
8. The method of any of claims 5 to 7, wherein prior to the metadata server receiving an update request from the client, the method further comprises:
the metadata server allocates a memory space from a memory, wherein the memory space is used for storing the copy set;
the metadata server registers the memory address corresponding to the memory space to a network card of the metadata server;
and the metadata server sends the memory address to the client.
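Claims 5 to 8 describe the server-side update path. The sketch below simulates that flow under stated assumptions (the `CopySet` class and its method names are invented for illustration): an update succeeds only when the submitted first version is the current latest, in which case the server produces a copy-on-write duplicate and appends it to the copy set; a stale first version yields a failure indication instead of blocking on a lock.

```python
# Illustrative simulation of the server-side update path in claims 5-8.
# The copy set keeps every COW-generated version in order; an update
# request carries the first version (the client's view of the latest).
# A stale view fails immediately - no distributed lock is taken.

import copy

class CopySet:
    def __init__(self, initial_metadata):
        self.versions = [initial_metadata]   # oldest .. latest

    def latest(self):
        return self.versions[-1]

    def handle_update(self, first_version, new_payload):
        # Claim 6: first version is not the latest -> failure message.
        if first_version is not self.latest():
            return None
        # Claim 5: COW - duplicate the first version, then apply the write.
        target_version = copy.deepcopy(first_version)
        target_version["payload"] = new_payload
        self.versions.append(target_version)  # claim 5: add to the copy set
        return target_version

cs = CopySet({"payload": "size=4096"})
ok = cs.handle_update(cs.latest(), "size=8192")     # succeeds
stale = cs.handle_update(cs.versions[0], "size=0")  # stale view -> fails
print(ok["payload"], stale)                         # -> size=8192 None
```

In the scheme of claim 8, the memory region holding `versions` would be allocated up front, registered with the metadata server's network card, and its address sent to the client, so that subsequent reads bypass the server CPU entirely.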
9. A client, characterized in that the client comprises a network card and a memory, wherein:
the network card is configured to perform a remote one-sided read operation on a memory of a metadata server to obtain a target version of metadata, wherein the target version is the latest version in a copy set of the metadata, and the copy set comprises at least one version of the metadata generated by copy-on-write (COW) performed by the metadata server when writing the metadata;
the network card is further configured to store the target version of the metadata in the memory of the client.
10. The client according to claim 9, wherein the network card is configured to map a remote memory address to a local memory address, the remote memory address is used to indicate an address of the copy set in the memory of the metadata server, and the local memory address is used to indicate an address in the memory of the client; and executing read operation on the local memory address.
11. The client according to claim 9 or 10, wherein the network card is configured to search the copy set for the target version of the metadata according to a pointer field in each version of the metadata, where the pointer field is used to indicate a next version of the metadata, and a value of the pointer field in the target version of the metadata is null.
12. The client according to any one of claims 9 to 11, wherein the network card is further configured to perform a remote atomic operation on the memory of the metadata server to update the target version of the metadata.
13. A metadata server, characterized in that the metadata server comprises a processor and a network card, wherein:
the network card is configured to receive an update request from a client, wherein the update request comprises a first version of the metadata updated by the client;
the processor is configured to: if the first version is the latest version in a copy set of the metadata, perform copy-on-write (COW) based on the first version of the metadata to obtain a target version of the metadata, wherein the target version of the metadata is a copy of the first version of the metadata;
the processor is further configured to add the target version of the metadata to the copy set.
14. The metadata server according to claim 13, wherein the network card is further configured to send a failure message to the client if the first version is not the latest version in the copy set of the metadata, the failure message indicating that the metadata update failed.
15. The metadata server according to claim 13 or 14, wherein the processor is further configured to release the memory space occupied by the target version of the metadata if a status of the target version of the metadata satisfies a condition, where the status is used to indicate a frequency of access by a client to the target version of the metadata.
16. The metadata server according to any one of claims 13 to 15, wherein the processor is further configured to allocate a memory space from a memory, the memory space being used for storing the copy set; registering a memory address corresponding to the memory space to the network card;
the network card is also used for sending the memory address to the client.
17. A distributed file system, characterized in that the distributed file system comprises a client according to any of claims 9 to 12 and a metadata server according to any of claims 13 to 16.
18. A computer-readable storage medium having stored therein at least one instruction that is read by a processor to cause a client to perform the method of any one of claims 1-4.
19. A computer-readable storage medium having stored therein at least one instruction that is read by a processor to cause a metadata server to perform the method of any one of claims 5-8.
CN202011045589.9A 2020-09-28 2020-09-28 Metadata processing method and equipment in distributed file system Pending CN114281765A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011045589.9A CN114281765A (en) 2020-09-28 2020-09-28 Metadata processing method and equipment in distributed file system


Publications (1)

Publication Number Publication Date
CN114281765A 2022-04-05

Family

ID=80868048


Country Status (1)

Country Link
CN (1) CN114281765A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115277145A (en) * 2022-07-20 2022-11-01 北京志凌海纳科技有限公司 Distributed storage access authorization management method, system, device and readable medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination