WO2019228217A1 - File system data access method and file system - Google Patents

File system data access method and file system

Info

Publication number
WO2019228217A1
Authority
WO
WIPO (PCT)
Prior art keywords
log
logical volume
data
target
file system
Prior art date
Application number
PCT/CN2019/087691
Other languages
English (en)
French (fr)
Inventor
朱家稷
Original Assignee
阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Priority to EP19811764.0A (published as EP3806424A4)
Priority to JP2020567024A (published as JP7378870B2)
Publication of WO2019228217A1
Priority to US17/092,086 (published as US20210056074A1)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/13File access structures, e.g. distributed indices
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/178Techniques for file synchronisation in file systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/16File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/164File meta data generation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/182Distributed file systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614Improving the reliability of storage systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0653Monitoring storage devices or systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Definitions

  • Users often want to share a file system among multiple machines (machines which, as opposed to storage nodes, can be called compute nodes).
  • A common scenario is that one machine writes data and serves users' access requests, while other machines read the latest written data in real time through the shared file system for data analysis or backup.
  • NFS: Network File System
  • CIFS: Common Internet File System
  • The commonly used NFS/CIFS network shared file systems have bottlenecks in performance and scalability, mainly in the following respect: to support multiple clients (application clients in computing nodes) reading and writing data at the same time, an independent coordination server is usually deployed in the file system to coordinate all access requests through a complex lock contention mechanism. Therefore, before any client can read or write data, it must request a lock from the coordination server. As the amount of data access grows, the coordination server easily becomes a bottleneck, which limits the scalability of the system.
  • embodiments of the present invention provide a file system data access method and a file system to improve data access performance of the file system.
  • an embodiment of the present invention provides a file system data access method executed by a computing node.
  • the file system includes at least one computing node and multiple storage nodes, and the method includes:
  • the computing node sends a mount request triggered for a target logical volume to a target storage node, where the target storage node is any one of the plurality of storage nodes and the target logical volume corresponds to at least part of the storage resources of the plurality of storage nodes; the computing node receives the log segment and checkpoint storage location information corresponding to the target logical volume sent by the target storage node;
  • the computing node reads the log metadata of the log segment and checkpoint according to the log segment and checkpoint storage location information to restore the data state of the target logical volume;
  • the computing node performs data access processing based on a data state of the target logical volume.
  • an embodiment of the present invention provides a file system data access method, which is specifically executed by a file system access process in a computing node.
  • the file system includes at least one computing node and multiple storage nodes.
  • the method includes:
  • a file system access process in the computing node obtains, through a logical volume service process in a storage node, the log segment and checkpoint storage location information corresponding to the target logical volume;
  • the file system access process reads log metadata of the log segment and checkpoint according to the log segment and checkpoint storage location information to restore the data state of the target logical volume;
  • the file system access process performs data access processing based on the data state of the target logical volume.
  • an embodiment of the present invention provides a computing node, including a processor and a memory, where the memory is configured to store one or more computer instructions, and when the one or more computer instructions are executed by the processor, the file system data access method in the first aspect or the second aspect is implemented.
  • The computing node may further include a communication interface for communicating with other devices or a communication network.
  • An embodiment of the present invention provides a computer storage medium for storing a computer program that, when executed by a computer, causes the computer to implement the file system data access method in the first aspect or the second aspect.
  • an embodiment of the present invention provides a file system data access method, which is executed by a storage node.
  • the file system includes at least one computing node and multiple storage nodes.
  • the method includes:
  • the target storage node receives a mount request corresponding to the target logical volume sent by the computing node, where the target storage node is any one of the plurality of storage nodes;
  • the target storage node obtains the log segment and checkpoint storage location information corresponding to the target logical volume and sends it to the computing node, so that the computing node restores the data state of the target logical volume according to that information, and the data state of the target logical volume is used for data access processing;
  • the target logical volume corresponds to at least part of the storage resources of the plurality of storage nodes.
  • an embodiment of the present invention provides a file system data access method, which is executed by a logical volume service process in a storage node.
  • the file system includes at least one computing node and multiple storage nodes.
  • the method includes:
  • the logical volume service process in the storage node receives the mount request corresponding to the target logical volume sent by the file system access process in the compute node;
  • the logical volume service process sends the first log segment and checkpoint storage location information corresponding to the target logical volume to the file system access process, so that the file system access process restores the data state of the target logical volume according to the first log segment and checkpoint storage location information for data access processing;
  • the target logical volume corresponds to at least part of the storage resources in the plurality of storage nodes, and the storage node is any one of the plurality of storage nodes.
  • an embodiment of the present invention provides a storage node, including a processor and a memory, where the memory is configured to store one or more computer instructions, and when the one or more computer instructions are executed by the processor, the file system data access method in the fourth aspect or the fifth aspect is implemented.
  • The storage node may further include a communication interface for communicating with other devices or a communication network.
  • An embodiment of the present invention provides a computer storage medium for storing a computer program that, when executed by a computer, causes the computer to implement the file system data access method in the fourth aspect or the fifth aspect.
  • an embodiment of the present invention provides a file system, including:
  • at least one computing node, multiple storage nodes, and multiple root servers for managing the multiple storage nodes;
  • any one of the at least one computing node is configured to: send a mount request triggered for a target logical volume to any one of the plurality of storage nodes; receive the log segment and checkpoint storage location information corresponding to the target logical volume sent by that storage node; read the log metadata of the log segments and checkpoint according to the storage location information to restore the data state of the target logical volume; and perform data access processing based on the data state of the target logical volume;
  • that storage node is configured to obtain the log segment and checkpoint storage location information corresponding to the target logical volume, and send the log segment and checkpoint storage location information to the computing node.
  • at least one computing node, multiple storage nodes, and multiple root servers for managing the data block service processes in the multiple storage nodes;
  • each computing node has a data access process and a file system access process
  • Each storage node has a logical volume service process and a data block service process; the data block service process is used for reading and writing management of each data block stored in the corresponding storage node;
  • The file system access process, in response to a mount operation triggered by the data access process in the corresponding computing node, sends a mount request for a target logical volume to the logical volume service process in a target storage node, receives the first log segment and checkpoint storage location information corresponding to the target logical volume sent by that logical volume service process, reads the log metadata of the log segments and checkpoint according to the first log segment and checkpoint storage location information to restore the data state of the target logical volume, and performs data access processing based on the data state of the target logical volume; the target logical volume corresponds to at least part of the storage resources of the plurality of storage nodes;
  • the logical volume service process is configured to receive the mount request sent by the file system access process, and send the first log segment and checkpoint storage location information corresponding to the target logical volume to the file system access process.
  • the file system includes at least one computing node, multiple storage nodes, and multiple root servers for managing multiple storage nodes.
  • Data is stored in units of data blocks (chunks), and each storage node performs read-write management on the data blocks stored on it.
  • Different business applications in multiple computing nodes, or different business applications in the same computing node, can share the resources of the multiple storage nodes; that is, different business applications can build corresponding logical volumes on the basis of the multiple storage nodes, and each logical volume corresponds to at least part of the resources of the multiple storage nodes.
  • each storage node maintains log information of multiple logical volumes that share the file system.
  • a logical volume service process can be deployed in each storage node to maintain the log information.
  • When a computing node needs to access a certain logical volume (called the target logical volume), a mount request for that logical volume is triggered to any storage node (called the target storage node).
  • The target storage node obtains the log segment list and checkpoint list corresponding to the target logical volume; the log segments store the metadata of each log of the target logical volume, so that the computing node can restore the latest data state of the target logical volume in memory based on the obtained log segment list and checkpoint list, and can then perform data access processing, such as data writing or data reading, based on that latest data state.
  • The log information of the target logical volume maintained in the storage nodes enables multiple computing nodes to maintain good read-write consistency and to read and write concurrently while avoiding read-write conflicts.
  • FIG. 1 is an architecture diagram of a file system according to an embodiment of the present invention
  • FIG. 2 is an architecture diagram of another file system according to an embodiment of the present invention.
  • FIG. 3 is a logical hierarchical architecture diagram corresponding to the file system shown in FIG. 2;
  • FIG. 4a is a flowchart of a file system data access method according to an embodiment of the present invention.
  • FIG. 4b is a flowchart of another file system data access method according to an embodiment of the present invention;
  • FIG. 5a is a flowchart of an implementation process of step 404a in the embodiment shown in FIG. 4a;
  • FIG. 5b is a flowchart of an implementation process of step 404b in the embodiment shown in FIG. 4b;
  • FIG. 6a is a flowchart of another file system data access method according to an embodiment of the present invention;
  • FIG. 6b is a flowchart of still another file system data access method according to an embodiment of the present invention;
  • FIG. 7a is a flowchart of an implementation process of step 604a in the embodiment shown in FIG. 6a;
  • FIG. 7b is a flowchart of an implementation process of step 604b in the embodiment shown in FIG. 6b;
  • FIG. 8a is a flowchart of still another file system data access method according to an embodiment of the present invention.
  • FIG. 8b is a flowchart of another file system data access method according to an embodiment of the present invention.
  • FIG. 9 is a schematic structural diagram of a file system data access device according to an embodiment of the present invention.
  • FIG. 11 is a schematic structural diagram of another file system data access device according to an embodiment of the present invention.
  • Although the terms first, second, third, etc. may be used to describe XXX in the embodiments of the present invention, the XXX should not be limited by these terms. These terms are only used to distinguish one XXX from another.
  • the first XXX may also be referred to as the second XXX, and similarly, the second XXX may also be referred to as the first XXX.
  • Depending on the context, the word “if” as used herein can be interpreted as “when”, “upon”, “in response to determining”, or “in response to detecting”.
  • Similarly, the phrases “if it is determined” or “if (the stated condition or event) is detected” can be interpreted as “when it is determined”, “in response to determining”, “when (the stated condition or event) is detected”, or “in response to detecting (the stated condition or event)”.
  • Multiple storage nodes often correspond to storage servers deployed in different geographic regions, and multiple computing nodes can share the storage resources of the multiple storage nodes.
  • data is read and written in chunks, and each chunk is preset to a certain capacity.
  • The sharing of the file system by multiple computing nodes is mainly reflected in the fact that different users can create logical volumes for the business applications in their corresponding computing nodes on the basis of the storage resources provided by the multiple storage nodes of the file system.
  • A logical volume can be considered a logical file system created on the basis of the physical file system.
  • Different logical volumes correspond to different storage resources in multiple storage nodes, that is, different logical volumes occupy different storage resources.
  • each logical volume corresponds to some storage resources in multiple storage nodes.
  • If the multiple storage nodes are used by only one business application, the logical volume corresponding to that business application may correspond to all of the multiple storage nodes. Therefore, each logical volume can be considered to correspond to at least part of the storage resources of the multiple storage nodes.
  • The content stored by the multiple root servers is the same.
  • the stored content is mainly the correspondence between the data block identifier and the storage node identifier. That is, each RS knows which data blocks are stored on each storage node.
  • The data block service process running on a storage node is what actually reads and writes data on that storage node, and the data block service processes correspond to the storage nodes one to one. Therefore, the root server can also be considered to maintain the correspondence between data block identifiers and data block service process identifiers.
  • In addition, any root server can also record information about all the directories and file space of the file system, as well as the list of data blocks of each file.
  • storage node allocation and scheduling can be performed based on the storage load of each storage node.
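  • As an illustrative sketch only (not part of the disclosed embodiment), the following Python fragment models the kind of mapping a root server could keep between data block (chunk) identifiers and storage node identifiers, with allocation driven by storage load as described above. The class and method names (RootServer, allocate_chunk, lookup) are assumptions introduced here for illustration.

      # Illustrative sketch: a minimal in-memory model of the root server (RS) mappings.
      class RootServer:
          def __init__(self):
              self.chunk_to_node = {}   # data block (chunk) id -> storage node id
              self.node_load = {}       # storage node id -> number of chunks it holds

          def register_node(self, node_id):
              self.node_load.setdefault(node_id, 0)

          def allocate_chunk(self, chunk_id):
              # pick the least-loaded storage node for a new data block
              node_id = min(self.node_load, key=self.node_load.get)
              self.chunk_to_node[chunk_id] = node_id
              self.node_load[node_id] += 1
              return node_id

          def lookup(self, chunk_id):
              # which storage node (and hence which CS process) serves this chunk
              return self.chunk_to_node[chunk_id]

      rs = RootServer()
      rs.register_node("storage-node-A")
      rs.register_node("storage-node-B")
      rs.allocate_chunk("chunk5")
      print(rs.lookup("chunk5"))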
  • When a business application in a computing node needs to access data, it triggers a data access process for its corresponding logical volume (referred to as the target logical volume). The data access process can be a data writing process or a data reading process.
  • Any computing node is configured to: send a mount request triggered for a target logical volume to any one of the plurality of storage nodes (hereinafter referred to as the target storage node); receive the log segment and checkpoint storage location information corresponding to the target logical volume sent by the target storage node; read the log metadata of the log segments and checkpoint according to that storage location information to restore the data state of the target logical volume; and perform data access processing based on the data state of the target logical volume.
  • the target storage node is configured to obtain log segment and checkpoint storage location information corresponding to the target logical volume, and send the log segment and checkpoint storage location information to any one of the computing nodes.
  • The target logical volume corresponds to multiple log segments and checkpoints, and the log segments and checkpoints have a corresponding relationship; for example, a checkpoint (1000) is generated after the log segment [1-1000]. Each log segment contains multiple logs.
  • the log consists of data and metadata, but some logs may contain only metadata.
  • the metadata records information about data access to each file corresponding to the target logical volume, such as who accessed the data in which file at what time.
  • the metadata also includes a log sequence number. Based on the log sequence number, consistency of reading and writing of data can be guaranteed, which will be described in subsequent embodiments.
  • Each log segment corresponds to a checkpoint, so that when an exception occurs in the file system, data can be recovered based on the checkpoint. Organizing the logs in the form of log segments likewise serves to recover the file system quickly from exceptions based on the logs.
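  • To make the log/checkpoint relationship above concrete, here is a minimal sketch, under assumed field names, of what a log record (metadata with a log sequence number, plus optional write data) and a checkpoint (a snapshot of the volume state up to a given sequence number) could look like; it is not the patent's own definition.

      # Illustrative sketch: possible shapes of a log record and a checkpoint.
      from dataclasses import dataclass, field
      from typing import Optional, Dict

      @dataclass
      class LogRecord:
          sequence: int                 # monotonically increasing log sequence number
          file_info: Dict[str, str]     # e.g. file name, FID, offset/length for writes
          data: Optional[bytes] = None  # present for data-write logs, absent for
                                        # metadata-only logs such as "prepare write"

      @dataclass
      class Checkpoint:
          up_to_sequence: int           # volume data state summarized up to this log
          state: Dict[str, dict] = field(default_factory=dict)  # per-file metadata snapshot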
  • If the storage node currently receiving the mount request cannot respond to it due to heavy load or other reasons, that storage node can forward the mount request to another storage node for processing.
  • When the target storage node receives the mount request sent by the computing node, it knows, from the identifier of the target logical volume carried in the mount request, which logical volume is requested to be mounted, so that it can obtain the log segment and checkpoint storage location information corresponding to the target logical volume from the log segment and checkpoint storage location information of each logical volume maintained locally and feed it back to the computing node.
  • the computing node then reads the corresponding log metadata from the corresponding storage location according to the obtained log segment and checkpoint storage location information to restore the data state of the target logical volume.
  • Specifically, the computing node replays the log metadata of the log segments and checkpoint in memory to restore the data state of the target logical volume. This data state reflects the latest data state of the target logical volume, that is, the latest state of the data in each file corresponding to the target logical volume as of the current moment. Based on this, the computing node can perform subsequent data access processing, such as data writing and data reading.
  • The process of this data access processing will be described in detail in the subsequent method embodiments and will not be expanded here.
  • the target storage node may also obtain metadata corresponding to the log segment and checkpoint from the corresponding data block of each storage node based on the corresponding relationship and feed it back to the computing node.
  • Optionally, the log segment and checkpoint storage location information corresponding to the target logical volume that the target storage node sends to the computing node may be, but is not limited to, the storage location information of the current last log segment and last checkpoint of the target logical volume, because in general the last checkpoint together with the log segments after it is sufficient to restore the latest data state. For example, the checkpoint (2000) records the data state of the target logical volume before the log segment [2001-2200].
  • the target logical volume must first be mounted locally for subsequent access.
  • In this way the latest data state of the target logical volume is restored, so that when multiple computing nodes access the target logical volume, each computing node can obtain the latest data state of the target logical volume, ensuring read-write consistency.
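  • The following sketch, assuming the LogRecord/Checkpoint shapes given earlier and a hypothetical apply_metadata helper, illustrates the replay step described above: start from the last checkpoint and replay the metadata of the log segments after it, in sequence-number order, to restore the latest data state of the target logical volume.

      # Illustrative sketch: restore the volume state from a checkpoint plus later log segments.
      def restore_volume_state(checkpoint, log_segments):
          # start from the state snapshot carried by the checkpoint
          state = dict(checkpoint.state)
          last_seq = checkpoint.up_to_sequence
          for segment in log_segments:               # segments after the checkpoint
              for record in segment:
                  if record.sequence <= last_seq:
                      continue                       # already covered by the checkpoint
                  apply_metadata(state, record)      # replay only the log metadata
                  last_seq = record.sequence
          return state, last_seq

      def apply_metadata(state, record):
          # hypothetical replay step: update the per-file view from the log metadata
          fid = record.file_info.get("fid")
          entry = state.setdefault(fid, {"length": 0})
          if "offset" in record.file_info and "length" in record.file_info:
              end = int(record.file_info["offset"]) + int(record.file_info["length"])
              entry["length"] = max(entry["length"], end)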
  • The above-mentioned architecture shown in FIG. 1 is introduced from the perspective of hardware entity nodes. In practice, related software modules, which can be embodied as various service processes, will be deployed in each computing node and each storage node. Therefore, the detailed internal structure of the file system provided by the embodiment of the present invention is further described below with reference to the embodiment shown in FIG. 2.
  • As shown in FIG. 2, the file system includes: at least one computing node (host1 and host2 illustrated in FIG. 2), multiple storage nodes, and multiple root servers (RS) for managing the data block service processes (Chunk Server, CS for short) in the multiple storage nodes.
  • CS: data block service process (Chunk Server)
  • RS: root server
  • each computing node has a data access process and a file system access process.
  • Each storage node has a logical volume service process (Volume Server, referred to as VS) and a CS.
  • VS: logical volume service process (Volume Server)
  • the CS is used to read, write, and manage each data chunk stored in the corresponding storage node.
  • the data access process in each computing node may be a data writing process (Writer) or a data reading process (Reader) illustrated in FIG. 2.
  • the data access process is often initiated by a business application in a compute node.
  • the file system access process in each computing node is an access port for accessing the corresponding target logical volume.
  • The file system access process may be part of a user-space file system (Fuse in FIG. 2).
  • Within Fuse, the user-mode file system access process is the fuse worker.
  • VFS, /dev/fuse, and the fuse worker are the modules that make up Fuse. It is worth noting that the file system access process does not have to be implemented on the Fuse architecture; the related functions played by the fuse worker can also be implemented in the operating system kernel, as in file systems such as Lustre/GPFS.
  • each storage node further includes a garbage collection service process (Garbage Collector, GC for short).
  • GC: Garbage Collector
  • each storage node can be deployed with VS, GC, and CS service processes.
  • the file system provided by the embodiment of the present invention can be regarded as having a logically hierarchical structure.
  • FIG. 3 illustrates this layered structure more intuitively. The numbers of VS, GC, and CS shown in FIG. 3 are only meant to illustrate the layered structure and do not reflect the correspondence with storage nodes, because these service processes generally correspond to storage nodes one to one.
  • Logically, the file system can be regarded as consisting of three layers: an upper layer, a middle layer, and a bottom layer.
  • The bottom layer is the chunk service layer, which provides a distributed append-only read and write service for chunks. Users can create chunks, specify the number of replicas, and write data in an append-only manner; they can consistently read data that has been successfully written in real time, and can specify that some chunk data be cached in memory or on high-speed media to speed up data reads.
  • The chunk service layer is mainly composed of two roles: one is the CS, which manages all the chunk information in the corresponding storage node and provides read and write services for those chunks; the other is the RS, which manages all the CSs, that is, the CSs in the storage nodes, and maintains the correspondence between CS identifiers and chunk identifiers.
  • The middle layer is the volume service layer, which provides services to create, delete, mount, and unmount logical volumes. It consists of two roles: one is the VS, which manages the metadata of each logical volume, including information about each logical volume's log segments and the storage locations of the checkpoints that speed up log recovery; the other is the GC, which reclaims garbage space in the file system and regularly produces checkpoints to speed up file system recovery.
  • The upper layer is the file system access layer, which consists of two parts: one is the data access process (such as Writer/Reader), which accesses file data by calling the standard Posix interface; the other is the user-mode file system Fuse, which contains the file system access process (fuse worker) and, based on the Fuse framework, provides the fuse worker to manage and to read and write the user-mode file system.
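  • As a rough sketch of the append-only chunk service the bottom layer is described as providing (replica placement, caching, and the actual CS/RS network protocol are omitted, and all names are assumptions):

      # Illustrative sketch: an in-memory append-only chunk service.
      class ChunkService:
          def __init__(self):
              self.chunks = {}            # chunk id -> list of appended records

          def create_chunk(self, chunk_id, replicas=3):
              self.chunks[chunk_id] = []  # a real CS would also place `replicas` copies
              return chunk_id

          def append(self, chunk_id, payload: bytes):
              # append-only: data is never overwritten in place
              self.chunks[chunk_id].append(payload)
              return len(self.chunks[chunk_id]) - 1   # position of the appended record

          def read(self, chunk_id, index):
              # data that has been successfully appended can be read back consistently
              return self.chunks[chunk_id][index]

      cs = ChunkService()
      cs.create_chunk("chunk1")
      pos = cs.append("chunk1", b"log-record-bytes")
      assert cs.read("chunk1", pos) == b"log-record-bytes"

  • The append call never overwrites existing data in place, which is what makes it safe for readers to consistently read, in real time, data that has already been successfully written.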
  • The file system access process in a computing node responds to a mount operation triggered by the data access process in that computing node, sends a mount request for a target logical volume to the logical volume service process in any storage node, and receives the first log segment and checkpoint storage location information corresponding to the target logical volume sent by that logical volume service process; it then reads the log metadata of the first log segment and checkpoint according to the first log segment and checkpoint storage location information to restore the data state of the target logical volume, and performs data access processing based on the data state of the target logical volume;
  • the logical volume service process is configured to receive a mount request sent by a file system access process, and send the first log segment and checkpoint storage location information corresponding to the target logical volume to the file system access process.
  • the first log segment and the checkpoint may be the current last log segment and the last checkpoint of the target logical volume.
  • a logical volume service process is deployed in each storage node, and the logical volume service process maintains the log segment and checkpoint storage location information corresponding to each logical volume.
  • the logical volume service process can be started when the storage node is started.
  • The logical volume service processes of different storage nodes back up each other, so that the logical volume service process of each storage node maintains the complete log segment and checkpoint storage location information of all logical volumes.
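  • A minimal sketch, with assumed names, of the per-volume metadata a logical volume service process could maintain, namely which chunk holds each log segment and each checkpoint, and which entries a mount request would be answered from:

      # Illustrative sketch: per-volume log segment / checkpoint location metadata kept by a VS.
      class VolumeService:
          def __init__(self):
              # volume id -> {"segments": [(segment_range, chunk_id), ...],
              #               "checkpoints": [(up_to_sequence, chunk_id), ...]}
              self.volumes = {}

          def record_segment(self, volume_id, segment_range, chunk_id):
              vol = self.volumes.setdefault(volume_id, {"segments": [], "checkpoints": []})
              vol["segments"].append((segment_range, chunk_id))

          def record_checkpoint(self, volume_id, up_to_sequence, chunk_id):
              self.volumes[volume_id]["checkpoints"].append((up_to_sequence, chunk_id))

          def last_locations(self, volume_id):
              # what a mount request needs: the last log segment and the last checkpoint
              vol = self.volumes[volume_id]
              last_segment = vol["segments"][-1]
              last_checkpoint = vol["checkpoints"][-1] if vol["checkpoints"] else None
              return last_segment, last_checkpoint

      vs = VolumeService()
      vs.record_segment("volume-1", (0, 2000), "chunk1")
      vs.record_checkpoint("volume-1", 2000, "chunk7")
      vs.record_segment("volume-1", (2001, 2200), "chunk2")
      print(vs.last_locations("volume-1"))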
  • FIG. 4a is a flowchart of a file system data access method according to an embodiment of the present invention.
  • The method provided by this embodiment of the present invention may be executed by a computing node in the architecture shown in FIG. 1. As shown in FIG. 4a, the method includes the following steps:
  • In response to the mount operation triggered by the data writing process, the computing node sends a first mount request to the target storage node.
  • The first mount request includes the target logical volume identifier, user information, and a read-write mount mode identifier.
  • the target storage node is any one of a plurality of storage nodes.
  • This embodiment describes a case where a business application in a computing node needs to perform data writing processing, so the business application starts a data writing process in the computing node.
  • When the target logical volume corresponding to the business application needs to be accessed, the target logical volume must first be mounted on the local computing node. Therefore, this embodiment first introduces the mount process of the target logical volume in the data writing scenario.
  • The mount operation may be that the data writing process sends a mount notification to the processor of the computing node. Based on the mount operation, the computing node sends a first mount request, which includes the target logical volume identifier, user information, and a read-write mount mode identifier, to any storage node (called the target storage node).
  • The user information may be information such as a user account and a user name. The read-write mount mode identifier corresponds to the data writing process; that is, the embodiment of the present invention allows the data writing process to mount a logical volume in a read-write manner.
  • This read-write mount mode means that for the same target logical volume, only one data write process can be allowed to write data at a time.
  • The computing node receives the log segment and checkpoint storage location information corresponding to the target logical volume, which the target storage node sends after determining that the user information has passed the user permission verification of the target logical volume and that the target logical volume is not currently mounted in read-write mount mode.
  • After receiving the first mount request from the computing node, the target storage node knows from the read-write mount mode identifier that a data writing process needs to access the target logical volume. At this time, the target storage node, on the one hand, verifies whether the user has permission to access the target logical volume based on the user information in the first mount request and, on the other hand, determines whether the target logical volume is currently occupied by another data writing process, that is, whether it is currently mounted in read-write mount mode.
  • each storage node may maintain user authority information of each logical volume, such as a user whitelist, for implementing user authority verification.
  • The storage nodes synchronize with each other the mount status information of each logical volume, that is, information about whether and in which mode each logical volume is mounted. Based on this, it can be determined whether the target logical volume is currently mounted in read-write mount mode.
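  • A hedged sketch of the two checks described above, assuming a VolumeService-like object (vs) as in the earlier sketch; the acl and mount_state structures and the MountError type are assumptions for illustration:

      # Illustrative sketch: mount handling with permission and exclusive-writer checks.
      class MountError(Exception):
          pass

      def handle_mount(request, acl, mount_state, vs):
          """request: {"volume", "user", "mode"}; mode is "rw" or "ro"."""
          volume, user, mode = request["volume"], request["user"], request["mode"]
          # 1) user permission verification against the volume's access list
          if user not in acl.get(volume, set()):
              raise MountError("user %r has no permission on %r" % (user, volume))
          # 2) for read-write mounts, at most one writer may hold the volume at a time
          if mode == "rw" and mount_state.get(volume) == "rw":
              raise MountError("volume %r is already mounted read-write" % volume)
          if mode == "rw":
              mount_state[volume] = "rw"   # synchronized to the other storage nodes
          # on success, return the last log segment and checkpoint locations
          return vs.last_locations(volume)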
  • If both checks pass, the target storage node may feed back to the computing node the log segment and checkpoint storage location information corresponding to the target logical volume.
  • the computing node reads the log metadata of the log segment and the checkpoint according to the storage location information of the log segment and the checkpoint to restore the data state of the target logical volume.
  • The computing node reads each log from the corresponding storage location according to the above storage location information, specifically the metadata in each log, and plays the metadata back in sequence in memory up to the log with the last log sequence number i, so as to restore the latest data state of the target logical volume.
  • the computing node performs data access processing based on the data state of the target logical volume.
  • This data writing process often includes a file preparation write stage and a data write stage, which will be explained in the embodiment shown in FIG. 5a.
  • The foregoing embodiment shown in FIG. 4a is described from the perspective of the computing node and storage node devices. The following describes, based on the logical system architecture shown in FIG. 2 or FIG. 3 and the embodiment shown in FIG. 4b, the implementation process of the file system data access method provided by this embodiment of the present invention.
  • FIG. 4b is a flowchart of another file system data access method according to an embodiment of the present invention.
  • The method provided by this embodiment of the present invention may be performed by the file system access process (fuse worker) in a computing node in the architecture shown in FIG. 2 or FIG. 3. As shown in FIG. 4b, the method includes the following steps:
  • The file system access process in the computing node sends a first mount request to the logical volume service process in a storage node; the first mount request includes the identifier of the target logical volume, user information, and a read-write mount mode identifier.
  • the computing node is any one of multiple computing nodes in the file system
  • the storage node is any one of multiple storage nodes.
  • A file system access process, such as, but not limited to, the fuse worker shown in FIG. 2 or FIG. 3, is deployed in the computing node.
  • a logical volume service process and a data block service process are deployed in each storage node.
  • The data block service process is used for read-write management of the data blocks in the corresponding storage node, and the logical volume service process maintains the correspondence between the log segment identifiers/checkpoint identifiers and the data block identifiers of each logical volume, assisting the file system access process in reading and writing data.
  • The data writing process triggers a mount operation, for example by sending the first mount request to the file system access process in the computing node, and the file system access process further sends the first mount request to the logical volume service process.
  • the file system access process and the logical volume service process may interact based on an inter-process communication mechanism.
  • the first mount request is used to request the logical volume service process to mount the target logical volume in a write-read mount mode.
  • The read-write mount mode means that, for the same target logical volume, only one data writing process is allowed to write data at a time.
  • The file system access process receives the log segment and checkpoint storage location information corresponding to the target logical volume, which the logical volume service process sends after determining that the user information has passed the user permission verification of the target logical volume and that the target logical volume is not currently mounted in read-write mount mode.
  • Since the logical volume service processes in the storage nodes back up each other, each logical volume service process maintains the correspondence between the log segment identifiers/checkpoint identifiers and the data block identifiers of every logical volume.
  • The logical volume service process that receives the first mount request obtains the log segment and checkpoint storage location information corresponding to the target logical volume, that is, the data block identifiers corresponding to the log segments and checkpoint, based on the target logical volume identifier and the correspondence it maintains.
  • Optionally, the logical volume service process may feed back to the file system access process the data block identifiers corresponding to the current last log segment and last checkpoint of the target logical volume, so that the file system access process restores the latest data state of the target logical volume based on the last log segment and the last checkpoint.
  • the file system access process reads the log metadata of the log segment and the checkpoint according to the log segment and checkpoint storage location information to restore the data state of the target logical volume.
  • Specifically, the file system access process may query any root server with the data block identifier obtained from the logical volume service process to obtain the data block service process identifier corresponding to that data block identifier, and then read, through the data block service process corresponding to that identifier, the metadata of the logs stored in the data block corresponding to the data block identifier.
  • Optionally, the logical volume service process may itself query the root server with the determined data block identifier to obtain the corresponding data block service process identifier, and feed both the data block identifier and the data block service process identifier back to the file system access process, so that the file system access process reads, through the data block service process corresponding to that identifier, the metadata of the logs stored in the corresponding data block.
  • the file system access process performs data access processing based on the data state of the target logical volume.
  • the file system access process can perform subsequent data writing processes.
  • This data writing process often includes a file preparation write stage and a data write stage, which will be described in the embodiment shown in FIG. 5b below.
  • FIG. 5a is a flowchart of an implementation process of step 404a in the embodiment shown in FIG. 4a. As shown in FIG. 5a, step 404a may specifically include the following specific steps:
  • In response to a file preparation write operation triggered by the data writing process, the computing node generates a first log containing only metadata; the first log includes the log sequence number i+1, obtained by incrementing the current last log sequence number i, and the file information corresponding to the file preparation write operation.
  • This preparation operation is called a file preparation write operation.
  • The file preparation write operation can be, for example, opening a file, or creating a file and then opening it.
  • the process of preparing the file for writing is introduced first.
  • the data writing process may trigger the above file preparation writing operation by calling a file preparation writing corresponding interface.
  • The computing node will generate a log record corresponding to the file preparation write operation, called the first log.
  • the first log only includes metadata.
  • the metadata includes a log sequence number i + 1 after incrementing the current last log sequence number i and file information corresponding to a file preparation write operation.
  • The file information includes information such as the file name of the file that is created or opened and a file identifier descriptor (FID). Assuming that the sequence number of the last log in the log segments of the target logical volume in the embodiment shown in FIG. 4a is i, the log sequence number of the first log is i incremented by one, that is, i+1.
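  • As a small illustration (the helper name, field names, and the FID format are placeholders, not the patent's), generating such a metadata-only "first log" with the incremented sequence number could look like this:

      # Illustrative sketch: build a metadata-only log for a file preparation write.
      def make_prepare_write_log(last_sequence: int, file_name: str, fid: str):
          # metadata-only record: the sequence number is the previous one incremented by 1
          return {
              "sequence": last_sequence + 1,                  # i -> i + 1
              "file_info": {"op": "prepare_write", "name": file_name, "fid": fid},
              "data": None,                                   # no data payload
          }

      first_log = make_prepare_write_log(1000, "a.txt", "fid-0001")
      assert first_log["sequence"] == 1001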
  • the computing node determines, by using the target storage node and the root server, the first data block where the last log segment corresponding to the target logical volume is located and the first storage node corresponding to the first data block.
  • In the storage nodes, data is stored in chunks and written in an append-only manner, and the data actually written is contained in the logs. Therefore, a log generated by the computing node is appended to a data block of some storage node, and it is necessary to determine the storage location of the last log segment of the target logical volume, that is, the data block where the last log segment is located and the storage node corresponding to that data block.
  • As mentioned above, each storage node maintains the storage location information of the log segments and checkpoints of each logical volume, and this storage location information is mainly the correspondence between log segment identifiers/checkpoint identifiers and data block identifiers. Therefore, the computing node can obtain, through the target storage node, the identifier of the data block where the last log segment of the target logical volume is located. Based on that data block identifier, the computing node can further query the root server to obtain the storage node identifier corresponding to the data block identifier, since the root server maintains the correspondence between storage node identifiers and data block identifiers.
  • the computing node can obtain the first data block where the last log segment is located and the first storage node corresponding to the first data block through the following process:
  • the computing node sends a first query request to the target storage node for querying the data block where the last log segment corresponding to the target logical volume is located;
  • the computing node receives the identifier of the first data block sent by the target storage node, and the identifier of the first data block is determined by the target storage node according to the correspondence between the log segment identifier and the data block identifier of the target logical volume that is maintained;
  • the computing node sends a second query request to the root server for querying the storage node corresponding to the identifier of the first data block;
  • the computing node receives the identifier of the first storage node sent by the root server, and the identifier of the first storage node is determined by the root server according to the correspondence between the maintained data block identifier and the storage node identifier.
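  • The two queries just listed could be sketched as follows; query_storage_node and query_root_server stand in for the actual request/response exchange, and the objects follow the earlier VolumeService/RootServer sketches (all names are assumptions):

      # Illustrative sketch: locate the chunk holding the last log segment, then the node serving it.
      def locate_last_log_segment(volume_id, target_storage_node, root_server):
          # first query: target storage node -> data block id of the last log segment
          chunk_id = query_storage_node(target_storage_node, volume_id)
          # second query: root server -> storage node id serving that data block
          node_id = query_root_server(root_server, chunk_id)
          return chunk_id, node_id

      def query_storage_node(node, volume_id):
          # hypothetical: look up the last segment's chunk in the node's volume metadata
          return node.last_locations(volume_id)[0][1]

      def query_root_server(rs, chunk_id):
          # hypothetical: use the chunk id -> storage node id correspondence the RS keeps
          return rs.lookup(chunk_id)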
  • the computing node sends a log storage request to the first storage node to request to add the first log to the first data block and requests the first storage node to cache the first log.
  • Specifically, the computing node may send the first log to the first storage node, so that the first storage node appends the first log to the first data block and caches the first log in memory, allowing subsequent data reading processes to read the first log quickly.
  • After the first storage node has stored the first log, it may feed back a confirmation notification to the computing node to inform it that the storage succeeded.
  • In response to the data write request sent by the data writing process, the computing node generates a second log. The data write request includes the write data and the file information corresponding to the write data; the second log includes metadata and the write data, and the metadata includes the log sequence number i+2, obtained by incrementing the current last log sequence number i+1, and the file information.
  • the computing node determines, through the target storage node and the root server, the second data block where the last log segment corresponding to the target logical volume is located and the second storage node corresponding to the second data block.
  • the computing node sends a log storage request to the second storage node to request to add the second log to the second data block and request the second storage node to cache the metadata in the second log.
  • the data writing process can trigger the above data writing request by calling a standard data writing interface.
  • the data writing request includes both the specific data to be written and the file information corresponding to the written data.
  • The file information includes, for example, the file FID, the write position (offset) and data length (length) of the write data in the file, a data position pointer, and so on.
  • the computing node generates a second log for recording the data writing operation of the computing node.
  • the second log is composed of metadata and data, where the data is the write data of the computing node, and the metadata includes the log sequence number i + 2 and the above file information after incrementing the current last log sequence number i + 1.
  • the computing node needs to store the generated second log in the storage node. Specifically, the second log needs to be written into the data block where the current last log segment of the target logical volume is located.
  • The computing node needs to determine, through the target storage node and the root server, the second data block where the last log segment is located and the second storage node corresponding to the second data block, and send the second log to the second storage node, so that the second storage node appends the second log to the second data block and caches the metadata of the second log in memory.
  • For the determination process of the second data block and the second storage node, reference may be made to the determination process of the first data block and the first storage node; details are not repeated here.
  • The second data block and the first data block are likely to be the same data block, and therefore the second storage node and the first storage node may also be the same storage node. However, if the last log segment has already satisfied the closing condition before the second log is stored, the second data block will be different from the first data block, although the second storage node may still be the same as the first storage node.
  • If the current last log segment satisfies the closing condition, the computing node closes the last log segment, generates a new checkpoint, and applies to the target storage node for a new log segment as the last log segment.
  • the closing condition is that the number of logs contained in the last log segment reaches a preset number, or that the amount of data corresponding to the last log segment reaches a preset capacity value.
  • Each storage node records which log segments and checkpoints are stored on each chunk, for example log segment [0-2000] on chunk1, checkpoint (2000) on chunk7, and log segment [2001-2200] on chunk2. Assume that the file system becomes abnormal at some time after the log segment [2001-2200]; at this point it is only necessary to read the checkpoint (2000) and replay the log segment [2001-2200] to restore the latest state of the corresponding logical volume.
  • For example, the computing node closes the log segment [1500-2500], generates a checkpoint (2500), and applies to the target storage node for a new log segment as the last log segment.
  • the new log segment starts from the log sequence number 2501.
  • The newly applied log segment corresponds to the logs generated subsequently, and the logs belonging to the new log segment need to be stored in a data block. Therefore, after receiving the application, the target storage node needs to allocate a corresponding data block for the new log segment.
  • the target storage node may request the root server to allocate a data block for storing subsequent corresponding logs of the new log segment.
  • The root server can, according to the storage load of each storage node, designate a storage node A to allocate the data block. Assuming that storage node A allocates chunk5 to store the subsequent logs of the new log segment, the root server maintains the correspondence between storage node A and chunk5 and, at the same time, feeds chunk5 back to the target storage node.
  • the target storage node maintains the corresponding relationship between the new log segment and chunk5.
  • the target storage node can synchronize the corresponding relationship to other storage nodes.
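  • Putting the closing condition, checkpoint generation, and new-segment allocation together, a non-authoritative sketch could look like the following; the threshold, chunk names, and helper objects are assumptions, and vs, rs, and Checkpoint follow the earlier sketches.

      # Illustrative sketch: roll over the last log segment once the closing condition is reached.
      MAX_LOGS_PER_SEGMENT = 1000          # assumed "preset number" closing condition

      def maybe_roll_segment(volume_id, segment_logs, last_sequence, state, vs, rs):
          if len(segment_logs) < MAX_LOGS_PER_SEGMENT:
              return None                  # closing condition not reached
          # close the segment and record a checkpoint summarizing state up to here
          checkpoint = Checkpoint(up_to_sequence=last_sequence, state=dict(state))
          ckpt_chunk = "chunk-ckpt-%d" % last_sequence        # placeholder chunk name
          rs.allocate_chunk(ckpt_chunk)
          vs.record_checkpoint(volume_id, last_sequence, ckpt_chunk)
          # apply for a new last log segment and a chunk to hold its logs
          new_chunk = "chunk-seg-%d" % (last_sequence + 1)
          rs.allocate_chunk(new_chunk)
          vs.record_segment(volume_id, (last_sequence + 1, None), new_chunk)
          return checkpoint, new_chunk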
  • The foregoing embodiment shown in FIG. 5a is described from the perspective of the computing node and storage node devices. The following describes, based on the logical system architecture shown in FIG. 2 or FIG. 3, how the internal service processes shown in FIG. 2 or FIG. 3 implement the data writing process shown in FIG. 5a.
  • FIG. 5b is a flowchart of an implementation process of step 404b in the embodiment shown in FIG. 4b. As shown in FIG. 5b, step 404b may specifically include the following specific steps:
  • In response to a file preparation write operation triggered by the data writing process, the file system access process generates a first log containing only metadata; the first log includes the log sequence number i+1, obtained by incrementing the current last log sequence number i, and the file information corresponding to the file preparation write operation.
  • The file system access process determines, through the logical volume service process and the root server, the first data block where the last log segment corresponding to the target logical volume is located and the first data block service process corresponding to the first data block.
  • the root server is used to manage the data block service process in each storage node, and specifically, maintains the correspondence between the data block identifier and the data block service process identifier in each storage node.
  • the file system access process may determine the first data block and the first data block service process corresponding to the first data block through the following steps:
  • the file system access process sends a first query request to the logical volume service process to query for the data block where the last log segment corresponding to the target logical volume is located;
  • the file system access process receives the identifier of the first data block sent by the logical volume service process, and sends a second query request to the root server for querying the data block service process corresponding to the identifier of the first data block;
  • the file system access process receives the identifier of the first data block service process sent by the root server, and the identifier of the first data block service process is determined by the root server according to the correspondence between the maintained data block identifier and the data block service process identifier.
  • the file system access process sends a log storage request to the first data block service process to request that the first log be added to the first data block and that the first data block service process cache the first log.
  • the data writing process can further perform the data writing process of the following steps.
  • In response to the data write request sent by the data writing process, the file system access process generates a second log. The data write request includes the write data and the file information corresponding to the write data; the second log includes metadata and the write data, and the metadata includes the log sequence number i+2, obtained by incrementing the current last log sequence number i+1, and the file information.
  • The file system access process then determines, through the logical volume service process and the root server, the second data block where the last log segment corresponding to the target logical volume is located and the second data block service process corresponding to the second data block.
  • the file system access process sends a log storage request to the second data block service process, so as to request that the second log be added to the second data block, and requests the second data block service process to cache metadata in the second log.
  • The logs generated during the above file preparation write and data write processes are appended to the last log segment. When the last log segment reaches the closing condition, the current last log segment needs to be closed and a new log segment applied for as the last log segment. Therefore, if the current last log segment reaches the closing condition, the file system access process closes the last log segment, generates a new checkpoint, and requests a new log segment from the logical volume service process as the last log segment.
  • the closing condition is that the number of logs contained in the last log segment reaches a preset number, or that the amount of data corresponding to the last log segment reaches a preset capacity value.
  • When the logical volume service process receives such a request from the file system access process, it can, on the one hand, generate a new log segment and, on the other hand, apply to the root server to allocate a corresponding data block and data block service process for the new log segment.
  • In the above scheme, metadata and data are carried in the same log, the log is stored in a data block of the storage node, and the metadata is additionally cached in memory. This enables subsequent data reading processes to quickly pull the metadata of the latest logs in batches to update their local file system mirror state, that is, to restore the latest data state of the target logical volume, thereby reducing read latency.
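  • A minimal sketch of the log segment rollover described above is given below, assuming a preset maximum log count and a preset capacity value as the closing conditions; the volume_service interface and the segment bookkeeping fields are hypothetical.

        # Illustrative check of the closing condition and rollover to a new last log segment.
        MAX_LOGS_PER_SEGMENT = 1000            # preset number of logs (assumed value)
        MAX_SEGMENT_BYTES = 64 * 1024 * 1024   # preset capacity value (assumed value)

        def maybe_roll_segment(volume_service, volume_id, segment):
            reached_close_condition = (
                segment["log_count"] >= MAX_LOGS_PER_SEGMENT
                or segment["byte_size"] >= MAX_SEGMENT_BYTES
            )
            if reached_close_condition:
                volume_service.close_segment(volume_id, segment["segment_id"])
                volume_service.write_checkpoint(volume_id, segment["last_lsn"])  # new checkpoint
                # The logical volume service process creates the new segment and asks the root
                # server to allocate a data block and data block service process for it.
                return volume_service.allocate_segment(volume_id)
            return segment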
  • FIG. 6a is a flowchart of another file system data access method according to an embodiment of the present invention.
  • The method provided by this embodiment of the present invention may be executed by a computing node under the architecture shown in FIG. 1. As shown in FIG. 6a, the method may include the following steps:
  • In response to the mount operation triggered by the data reading process, the computing node sends a second mount request to the target storage node, where the second mount request includes the identifier of the target logical volume, user information, and a read-only mount mode identifier.
  • the target storage node is any one of a plurality of storage nodes.
  • This embodiment describes a case where a business application in a computing node needs to perform data reading processing. Therefore, the business application starts a data reading process in the computing node.
  • When the target logical volume corresponding to the business application needs to be accessed, the target logical volume must first be mounted on the local computing node. Therefore, this embodiment first introduces the mounting process of the target logical volume in the data reading scenario.
  • The mount operation may be the data reading process sending a mount notification to the processor of the computing node. Based on the mount operation, the computing node sends a second mount request, which includes the target logical volume identifier, user information, and read-only mount mode identifier, to any storage node, referred to as the target storage node.
  • the user information may be information such as a user account and a user name;
  • The read-only mount mode identifier corresponds to the data reading process; that is, in this embodiment of the present invention, a data reading process is allowed to mount a logical volume in a read-only manner. This read-only mount mode means that, for the same target logical volume, multiple data reading processes are allowed to read data at the same time, but no data modification or write operations can be performed.
  • the computing node receives the log segment and checkpoint storage location information of the target logical volume sent by the target storage node after determining that the user information passes the user permission verification of the target logical volume.
  • the mounting of the target logical volume only needs to satisfy the verification of user permissions, because the read-only mounting method allows multiple data reading processes to read data at the same time.
  • the computing node reads the log metadata of the log segment and the checkpoint according to the storage location information of the log segment and the checkpoint to restore the data state of the target logical volume.
  • the computing node performs data access processing based on the data state of the target logical volume.
  • the compute node can access all the data that the target logical volume currently has.
  • This data reading process often includes a file preparation read stage and a data read stage, which will be explained in the embodiment shown in FIG. 7a.
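  • The step of restoring the data state of the target logical volume described above amounts to replaying checkpoint and log metadata in memory. The sketch below shows one plausible shape of such a replay, assuming each metadata record carries an operation type and file information; the record fields and operation names are illustrative assumptions rather than a format prescribed by the embodiment.

        # Illustrative in-memory replay of checkpoint and log metadata to rebuild the volume state.
        def restore_volume_state(checkpoint_state, log_metadata_records):
            # Start from the state captured by the checkpoint, then apply newer log metadata
            # in log sequence number order.
            state = dict(checkpoint_state)   # e.g. file name -> {"fid": ..., "size": ..., "extents": [...]}
            for record in sorted(log_metadata_records, key=lambda r: r["lsn"]):
                op = record["op"]
                info = record["file_info"]
                if op == "create":
                    state[info["name"]] = {"fid": info["fid"], "size": 0, "extents": []}
                elif op == "write":
                    f = state[info["name"]]
                    f["extents"].append((info["offset"], info["length"], record["lsn"]))
                    f["size"] = max(f["size"], info["offset"] + info["length"])
                elif op == "delete":
                    state.pop(info["name"], None)
            return state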
  • FIG. 6a describes the implementation process of the file system data access method provided by this embodiment from the perspective of the computing node and the storage node as nodes. Based on the logical system architecture shown in FIG. 2 or FIG. 3, the embodiment shown in FIG. 6b below introduces another specific implementation process of the file system data access method.
  • FIG. 6b is a flowchart of another file system data access method according to an embodiment of the present invention.
  • The method provided by this embodiment of the present invention may be executed by the file system access process (fuse worker) in a computing node under the architecture shown in FIG. 2 or FIG. 3. As shown in FIG. 6b, the method includes the following steps:
  • the file system access process in the computing node sends a second mount request to the logical volume service process in the storage node, where the second mount request includes the target logical volume identifier, user information, and a read-only mount mode identifier.
  • This embodiment describes a case where a business application in a computing node needs to perform data reading processing. Therefore, the business application starts a data reading process in the computing node.
  • the data reading process first triggers a mount operation for the target logical volume, such as sending the above-mentioned second mount request to the file system access process to mount the target logical volume locally for subsequent data access.
  • the file system access process receives the log segment and checkpoint storage location information corresponding to the target logical volume sent by the logical volume service process after determining that the user information passes the user permission verification of the target logical volume.
  • the logical volume service process can allow the user to mount the target logical volume as a read-only mount when verifying that the user has permission to use the target logical volume.
  • the logical volume service process feeds back the log segment and checkpoint storage location information of the target logical volume to the file system access process.
  • The storage location information may be the identifiers of the data blocks where the log segments and checkpoints of the target logical volume are located, determined by the logical volume service process according to the correspondence it maintains between the log segment and checkpoint identifiers of each logical volume and data block identifiers.
  • the storage location information may further include a data block service process identifier corresponding to the data block identifier, and the data block service process identifier may be obtained by the logical volume service process by querying any root server with the data block identifier.
  • the file system access process reads the log metadata of the log segment and the checkpoint according to the log segment and checkpoint storage location information to restore the data state of the target logical volume.
  • the file system access process performs data access processing based on the data state of the target logical volume.
  • the file system access process can access all the data that the target logical volume currently has.
  • The data reading process often includes a file preparation read stage and a data read stage, which will be described in the subsequent embodiment shown in FIG. 7b.
  • FIG. 7a is a flowchart of an implementation process of step 604a in the embodiment shown in FIG. 6a. As shown in FIG. 7a, step 604a may specifically include the following specific steps:
  • In response to the file preparation read operation triggered by the data reading process, the computing node sends a data synchronization request to the third storage node according to the log sequence number j last read by the data reading process, the third data block corresponding to the log sequence number j, and the third storage node corresponding to the third data block, so as to obtain the metadata of the logs cached after the log sequence number j.
  • The computing node then receives, from the third storage node, the metadata of the cached logs after the log sequence number j together with indication information; the log sequence numbers of the cached logs are j+1 to j+m, and the indication information indicates whether the log segment corresponding to the log sequence number j+m has been closed.
  • the computing node updates the data status of the target logical volume according to the metadata and the indication information of the cached log after the log sequence number j.
  • Before the data reading process actually reads data, some preparation operations often need to be performed first; such a preparation operation is called a file preparation read operation, and may be, for example, opening a certain file. This embodiment first introduces the processing of the file preparation read stage. The data reading process can trigger the file preparation read operation by calling the corresponding file-open interface.
  • this embodiment first introduces the actual data read and write scenario: a computing node is constantly writing data to the target logical volume, and multiple computing nodes are continuously reading data from the target logical volume.
  • the purpose of reading is for big data analysis, data backup, etc.
  • Because the multiple computing nodes that read data do not know when the computing node that writes data will write new data, they often start reading based on a certain mechanism, such as periodic reading.
  • For any computing node that reads data, suppose it last read the data in a certain log and currently needs to continue reading the data written after that log. From the time the target logical volume was mounted to the current moment, however, new data may have been written. Therefore, the computing node needs to know the location of each log segment of the target logical volume from the last read log up to the current last log segment, that is, in which data block of which storage node each of these log segments is stored.
  • As in the foregoing embodiment, the computing node may obtain the data block and storage node where the last log segment is located directly by querying the target storage node and the root server. In addition, since the computing node knows which log it read last time, it can also obtain, by the method provided in this embodiment, the location of each log segment from the last read log to the current last log segment.
  • Specifically, the computing node may locally store the log sequence number j that the data reading process last read, the third data block corresponding to the log sequence number j, and the third storage node corresponding to the third data block. The computing node may therefore send a data synchronization request to the third storage node to obtain the metadata of the logs cached after the log sequence number j.
  • The third data block contains at least all the logs in the log segment to which the log sequence number j belongs, and one or more log segments after that log segment may also be stored in the third data block or in other data blocks of the third storage node that follow the third data block. Therefore, the third storage node can feed back to the computing node the metadata of the logs it has locally cached after the log sequence number j; suppose the log sequence numbers of these cached logs are j+1 to j+m.
  • The third storage node can also determine whether the log segment to which the log sequence number j+m belongs has been closed. If it has been closed, this means that data blocks in other storage nodes may contain further logs after the log sequence number j+m; in this case, the third storage node also feeds back to the computing node indication information indicating that the log segment to which the log sequence number j+m belongs has been closed. Otherwise, the third storage node feeds back indication information indicating that the log segment to which the log sequence number j+m belongs has not been closed.
  • The computing node then updates the data state of the target logical volume according to the metadata of the cached logs j+1 to j+m and the indication information sent by the third storage node.
  • If the indication information indicates that the log segment to which the log sequence number j+m belongs has not been closed, the computing node directly updates the data state of the target logical volume according to the metadata of the cached logs with log sequence numbers j+1 to j+m, that is, it replays the metadata of the cached logs j+1 to j+m in memory.
  • If the indication information indicates that the log segment has been closed, the computing node determines, through the target storage node and the root server, each fourth data block in which logs after the log sequence number j+m are stored and the fourth storage node corresponding to each fourth data block, obtains from them the metadata of the cached logs after j+m, and updates the data state of the target logical volume according to the metadata of the cached logs j+1 to j+m together with the metadata of the cached logs after j+m.
  • The process of determining, through the target storage node and the root server, each fourth data block in which logs after the log sequence number j+m are stored and the fourth storage node corresponding to each fourth data block is similar to the process of obtaining the first data block where the last log segment is located and the first storage node in the foregoing embodiment: it is implemented based on the correspondence between log segment identifiers and data block identifiers maintained in the target storage node and the correspondence between data block identifiers and storage node identifiers maintained in the root server, and details are not described here again.
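  • The catch-up logic described above can be summarized with the following sketch. It is a simplified illustration: the storage-node and root-server interfaces (sync_after, query_segments_after, query_chunk_server, read_cached_metadata) are hypothetical names, and only the closed/not-closed branching described above is modeled.

        # Illustrative catch-up from the last read log sequence number j to the newest logs.
        def sync_from_lsn(third_node, target_node, root_server, nodes, volume_id, last_lsn_j):
            # Ask the third storage node for the metadata it has cached after LSN j, plus an
            # indication of whether the segment holding the newest returned log is closed.
            cached_meta, segment_closed = third_node.sync_after(volume_id, last_lsn_j)
            new_records = list(cached_meta)                 # LSNs j+1 .. j+m
            if segment_closed and new_records:
                lsn_j_m = new_records[-1]["lsn"]
                # Later logs live in other data blocks: ask the target storage node which data
                # blocks hold segments after j+m, then ask the root server for their nodes.
                for chunk_id in target_node.query_segments_after(volume_id, lsn_j_m):
                    node_id = root_server.query_chunk_server(chunk_id)
                    new_records.extend(nodes[node_id].read_cached_metadata(chunk_id, lsn_j_m))
            return new_records                              # replayed in memory by the caller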
  • Once the computing node has restored the target logical volume to its latest state as of the current moment, the file preparation read stage that precedes the actual data reading is complete. At this point, based on the restored latest data state of the target logical volume, the data reading process can read any of the data, not necessarily continuing from the last read log. The data reading process can then proceed to the actual data reading stage, as follows:
  • The data read request includes the file information to be read, and the computing node determines, through the target storage node and the root server, the fifth storage node and the fifth data block corresponding to the file information to be read, so as to read data from the fifth data block through the fifth storage node.
  • the file information to be read includes, for example, a file name, an offset of the data to be read in the file, and the like.
  • After the foregoing data state recovery process of the target logical volume, the computing node already knows the metadata of each log, and the metadata of each log records file information such as the file name, offset, and file FID.
  • Therefore, the computing node can locate a log based on the file information in the data read request, query the target storage node with the log sequence number of that log to obtain the fifth data block where the log segment to which that log sequence number belongs is located, and query the root server with the identifier of the fifth data block to obtain the corresponding fifth storage node, so that a read request for the corresponding data in the fifth data block can be sent to the fifth storage node to read the required data.
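  • A compact sketch of this read path is given below; it reuses the restored state structure from the replay sketch above, and query_segment_chunk, query_chunk_server and read_range are hypothetical interface names standing in for the lookups just described.

        # Illustrative data read: file info -> log -> data block -> storage node -> data.
        def read_file_range(state, target_node, root_server, nodes, volume_id, name, offset, length):
            # The restored state records, for each file, which log (by LSN) wrote each extent.
            extents = state[name]["extents"]                             # [(offset, length, lsn), ...]
            lsn = next(l for (o, ln, l) in extents if o <= offset < o + ln)
            chunk_id = target_node.query_segment_chunk(volume_id, lsn)   # fifth data block
            node_id = root_server.query_chunk_server(chunk_id)           # fifth storage node
            return nodes[node_id].read_range(chunk_id, name, offset, length)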
  • In addition, in response to the data read request sent by the data reading process, the computing node may optionally perform steps 701a to 703a again to obtain the metadata of the logs between the last read log sequence number and the current latest log sequence number, because new logs may have been written to the target logical volume between the time the data reading process triggered the file preparation read operation and the time it triggered the data read request.
  • FIG. 7a describes the data reading process from the perspective of the computing node and the storage node as nodes. Based on the logical system architecture shown in FIG. 2 or FIG. 3, the following further describes, with reference to the embodiment shown in FIG. 7b, the specific implementation process of the data reading process shown in FIG. 7a when the computing node and the storage node contain the internal service processes shown in FIG. 2 or FIG. 3.
  • FIG. 7b is a flowchart of an implementation process of step 604b in the embodiment shown in FIG. 6b. As shown in FIG. 7b, step 604b may specifically include the following specific steps:
  • In response to the file preparation read operation triggered by the data reading process, the file system access process sends a data synchronization request to the third data block service process according to the log sequence number j last read by the data reading process, the third data block corresponding to the log sequence number j, and the third data block service process corresponding to the third data block, so as to obtain the metadata of the logs cached after the log sequence number j.
  • The file system access process then receives, from the third data block service process, the metadata of the cached logs after the log sequence number j and indication information; the log sequence numbers of the cached logs are j+1 to j+m, and the indication information indicates whether the log segment corresponding to the log sequence number j+m has been closed.
  • The file system access process updates the data state of the target logical volume according to the metadata of the cached logs after the log sequence number j and the indication information.
  • Specifically, if the indication information indicates that the log segment to which the log sequence number j+m belongs has not been closed, the file system access process updates the data state of the target logical volume according to the metadata of the cached logs with log sequence numbers j+1 to j+m.
  • If the indication information indicates that the log segment has been closed, the file system access process determines, through the logical volume service process and the root server, each fourth data block in which logs after the log sequence number j+m are stored and the fourth data block service process corresponding to each fourth data block, sends a data synchronization request to the fourth data block service process to obtain the metadata of the logs cached after the log sequence number j+m, and updates the data state of the target logical volume based on the metadata of the cached logs j+1 to j+m and the metadata of the cached logs after j+m.
  • the file system access process receives a data read request sent by the data read process, and the data read request includes file information to be read.
  • The file system access process determines, through the logical volume service process and the root server, the fifth data block service process and the fifth data block corresponding to the file information to be read, so as to read the data from the fifth data block through the fifth data block service process.
  • the embodiment of the present invention implements a log-based distributed file system sharing scheme.
  • The log sequence number is used as the standard for aligning the state of the read and write sides, which avoids read-write conflicts and ensures read-write consistency. The scheme supports a one-writer, multiple-reader data access mode in which the multiple readers do not need to wait, reducing the delay in reading data.
  • In addition, each storage node synchronously maintains the log metadata information of each logical volume, which amounts to distributed management of the log information of each logical volume; any computing node can restore the data state of the corresponding logical volume from the log metadata maintained by any storage node for subsequent data access processing, which greatly improves the scalability of the file system.
  • the target storage node receives a mount request corresponding to the target logical volume sent by the computing node.
  • the compute node is any compute node in the file system.
  • the target storage node sends the log segment and checkpoint storage location information corresponding to the target logical volume to the computing node, so that the computing node restores the data state of the target logical volume based on the log segment and checkpoint storage location information for data access processing.
  • the compute node reads the log metadata from the corresponding storage location based on the log segment and checkpoint storage location information, and plays back the read log metadata in memory to restore the data state of the target logical volume.
  • Specifically, when a data writing process needs to mount the target logical volume, the target storage node receives the first mount request sent by the computing node. The first mount request includes a readable-writable mount mode identifier, the target logical volume identifier, and user information; the readable-writable mount mode identifier corresponds to a data writing process in the computing node, meaning that the data writing process needs to mount the target logical volume in a read-write mode, in which the target logical volume can be occupied by only one data writing process at a time.
  • The target storage node sends the log segment and checkpoint storage location information corresponding to the target logical volume to the computing node after determining that the user information in the first mount request passes the user permission verification of the target logical volume and that the target logical volume is not currently mounted in read-write mode, that is, not occupied by another data writing process.
  • When a data reading process needs to mount the target logical volume, the target storage node receives the second mount request sent by the computing node. The second mount request includes a read-only mount mode identifier, the target logical volume identifier, and user information. The read-only mount mode identifier corresponds to a data reading process in the computing node, meaning that the data reading process needs to mount the target logical volume in a read-only manner; in this mode, the target logical volume can be shared by multiple data reading processes.
  • the target storage node sends the log segment and checkpoint storage location information corresponding to the target logical volume to the computing node after determining that the user information in the second mount request passes the user permission verification of the target logical volume.
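  • The mount arbitration implied by these two mount modes can be sketched as follows; the permission table, the mount-state bookkeeping, and the method names are illustrative assumptions rather than the patented implementation.

        # Illustrative mount arbitration on the storage-node side.
        class MountArbiter:
            def __init__(self, permitted_users):
                self.permitted_users = permitted_users   # volume_id -> set of allowed users
                self.rw_owner = {}                       # volume_id -> writer holding the RW mount
                self.readers = {}                        # volume_id -> set of read-only mounters

            def mount(self, volume_id, user, mode):
                if user not in self.permitted_users.get(volume_id, set()):
                    raise PermissionError("user permission verification failed")
                if mode == "read-write":
                    # Only one data writing process may hold the read-write mount at a time.
                    if volume_id in self.rw_owner:
                        raise RuntimeError("volume already mounted read-write")
                    self.rw_owner[volume_id] = user
                else:  # "read-only": any number of data reading processes may share the volume
                    self.readers.setdefault(volume_id, set()).add(user)
                return "log segment and checkpoint storage location information"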
  • In the process of data writing or data reading after the target logical volume is mounted, the target storage node also provides the following functions:
  • the target storage node receives the query request sent by the computing node, and the query request is used to query the data block where the target log segment corresponding to the target logical volume is located;
  • the target storage node determines the identifier of the data block where the target log segment is located according to the correspondence between the log segment identifier and the data block identifier of the maintained target logical volume;
  • the target storage node sends the identification of the data block to the compute node.
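  • In code form, this query handling is essentially a lookup in the mapping that each storage node maintains from log segment identifiers to data block identifiers; the sketch below is a simplified illustration with assumed field and method names.

        # Illustrative query handling on the target storage node.
        class SegmentLocationService:
            def __init__(self):
                # volume_id -> {segment_id: chunk_id}, kept in sync across storage nodes.
                self.segment_to_chunk = {}

            def handle_query(self, volume_id, segment_id):
                # Return the identifier of the data block where the target log segment is located;
                # the computing node then asks the root server which storage node holds that chunk.
                return self.segment_to_chunk[volume_id][segment_id]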
  • Specifically, after a computing node generates a log based on the trigger of the data writing process, such as the first log or the second log in the embodiment shown in FIG. 5a, the computing node needs to store the newly generated log in the data block where the last log segment of the target logical volume is located. To this end, the computing node needs to determine that data block and the storage node where it is located through the target storage node and the root server.
  • the target storage node needs to perform the following steps during the data writing process:
  • the target storage node receives a first query request sent by the computing node after generating a log, where the first query request is used to query a data block where the last log segment corresponding to the target logical volume is located;
  • the target storage node determines the identifier of the first data block where the last log segment is located according to the correspondence between the log segment identifier and the data block identifier of the maintained target logical volume;
  • the target storage node sends the identifier of the first data block to the computing node, so that the computing node can query the root server with the identifier of the first data block to obtain the first storage node corresponding to the first data block, append the log to the first data block, and request the first storage node to cache the metadata in the log.
  • the root server maintains the correspondence between the data block identifier and the storage node identifier.
  • During the data reading process, the target storage node needs to perform the following steps:
  • the target storage node receives the second query request sent by the computing node; the second query request is issued by the computing node after it has read the logs between the log sequence number j and the log sequence number j+m during the data reading process, and is used to query the data blocks where the logs cached after the log sequence number j+m are located;
  • the target storage node determines the identifier of each second data block in which the logs cached after the log sequence number j+m are located, according to the correspondence it maintains between log segment identifiers and data block identifiers;
  • the target storage node sends the identifier of each second data block to the computing node, so that the computing node can query the root server with the identifier of each second data block to obtain the second storage node corresponding to each second data block and obtain, from each second data block, the metadata of the cached logs with log sequence numbers after j+m.
  • In addition, a garbage collection processing mechanism is deployed on each storage node. Taking the target storage node as an example, garbage collection processing can be performed through the following process:
  • the target storage node selects K log segments corresponding to the target logical volume and checkpoints corresponding to the K log segments to restore the corresponding data state;
  • when the target storage node determines that the garbage ratio of M of the K log segments reaches a preset threshold, it clears the invalid logs in the M log segments, writes the M cleared log segments into a new data block, reclaims the original data blocks where the M log segments were located, and updates the correspondence between the M log segments and data blocks, where the new data block is obtained by the target storage node from the root server.
  • the target storage node may first mount the target logical volume in a garbage collection mode, and the target logical volume may be any one of many logical volumes.
  • Garbage collection mode means that log segments and checkpoints of the target logical volume can be rewritten.
  • Specifically, the target storage node can select K log segments and the checkpoints corresponding to the K log segments from all the log segments and checkpoints of the target logical volume to restore the corresponding data state, that is, it replays in memory the metadata of the K log segments and checkpoints to restore the data state of the target logical volume corresponding to the K log segments.
  • the K log segments can be K consecutive log segments, but it is not limited to this.
  • the target storage node can traverse the metadata of all the logs of the K log segments one by one to identify the validity of each log. For example, if a file was recorded in a previous log, but the file was deleted in a later log, the previous log becomes invalid. For another example, if a previous log records a piece of data written in a file, but a later log records that the data was overwritten by the new data, the previous log is invalid.
  • The target storage node can then determine the garbage ratio of each of the K log segments in turn, where the garbage ratio is defined as the ratio of the number of invalid logs in a log segment to the total number of logs contained in that log segment.
  • On this basis, the target storage node may take all log segments among the K log segments whose garbage ratio exceeds the preset threshold as the current recycling targets. For example, assuming that the garbage ratio of M log segments reaches the preset threshold, where K ≥ M ≥ 1, these M log segments are all determined to be recycling targets.
  • Alternatively, the target storage node may determine the recycling targets according to the following strategy: it traverses the metadata of all the logs contained in the K log segments in sequence, continuously counting the ratio of the number of invalid logs among the traversed logs to the total number of traversed logs, that is, the garbage ratio of the traversed logs, until it reaches the log at which the garbage ratio reaches the preset threshold (called the end log) or the last log of the log segment to which that log belongs. The target storage node may then determine each log segment from the first of the K log segments to the log segment to which the end log belongs as a recycling target, assuming M log segments in total are taken as recycling targets.
  • After the target storage node determines that the garbage ratio of the M log segments among the K log segments reaches the preset threshold, it first clears the invalid logs in the M log segments, that is, it empties the metadata and data stored in the invalid logs, for example by filling each invalid log with a null-op, to reduce the storage space occupied by the invalid logs. It then applies to the root server for a new data block, writes the cleared M log segments into the new data block, reclaims the original data blocks where the M log segments were located, and updates the correspondence between the M log segments and data blocks, that is, it updates the correspondence between the identifiers of the M log segments and the original data block identifiers to the correspondence between the identifiers of the M log segments and the new data block identifier. After the target storage node has updated this correspondence, it may also synchronize the updated correspondence to the other storage nodes.
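  • The garbage collection pass described above can be summarized with the following sketch. It assumes a helper is_invalid(log, later_logs) that applies the validity rules given earlier (deleted files, overwritten data), and assumed interfaces for allocating a new data block and updating the segment-to-chunk mapping; none of these names are prescribed by the embodiment.

        # Illustrative garbage collection over K selected log segments of the target logical volume.
        GARBAGE_THRESHOLD = 0.5   # preset threshold (assumed value)

        def collect_garbage(segments, is_invalid, root_server, volume_service, volume_id):
            recycle_targets = []
            for seg in segments:                                  # the K selected log segments
                invalid = [log for log in seg["logs"] if is_invalid(log, seg["later_logs"])]
                garbage_ratio = len(invalid) / len(seg["logs"])   # invalid logs / total logs
                if garbage_ratio >= GARBAGE_THRESHOLD:
                    recycle_targets.append((seg, invalid))

            for seg, invalid in recycle_targets:                  # the M segments to recycle
                for log in invalid:
                    log["metadata"], log["data"] = None, None     # fill invalid logs with null-op
                new_chunk = root_server.allocate_chunk()          # new data block from the root server
                new_chunk.write_segment(seg)                      # rewrite the compacted segment
                root_server.release_chunk(seg["chunk_id"])        # reclaim the original data block
                volume_service.update_segment_location(volume_id, seg["segment_id"], new_chunk.chunk_id)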
  • FIG. 8b is a flowchart of another file system data access method according to an embodiment of the present invention.
  • The execution of the method provided by this embodiment of the present invention involves at least the logical volume service process (VS) and the garbage collection service process (GC) in a storage node under the architecture shown in FIG. 2 or FIG. 3.
  • the logical volume service process in the storage node receives a mount request corresponding to the target logical volume sent by a file system access process in the computing node.
  • the logical volume service process sends the first log segment and checkpoint storage location information corresponding to the target logical volume to the file system access process, so that the file system access process restores the data state of the target logical volume based on the first log segment and checkpoint storage location information for data access processing.
  • For example, when a business application in a computing node starts a data writing process, that is, when the target logical volume needs to be mounted for data writing, the file system access process in the computing node is triggered, and the logical volume service process in the storage node receives the first mount request sent by the file system access process, where the first mount request includes a readable-writable mount mode identifier, the identifier of the target logical volume, and user information.
  • In the subsequent data writing or data reading performed by the file system access process after the target logical volume is mounted, the logical volume service process also provides the following functions:
  • the logical volume service process receives a query request sent by the file system access process for the data block where a target log segment corresponding to the target logical volume is located, determines the identifier of the data block where the target log segment is located according to the correspondence it maintains between log segment identifiers and data block identifiers, and sends the identifier of the data block to the file system access process.
  • the logical volume service process receives a first query request sent by the file system access process after generating a log.
  • the first query request is used to query a data block where the last log segment corresponding to the target logical volume is located.
  • the logical volume service process determines the identifier of the first data block where the last log segment is located according to the correspondence between the maintained log segment identifier and the data block identifier;
  • the logical volume service process sends the identifier of the first data block to the file system access process, so that the file system access process can query the root server to obtain the first data block service process corresponding to the first data block, append the log to the first data block, and request the first data block service process to cache the metadata in the log.
  • the root server maintains the correspondence between the data block identifier and the storage node identifier.
  • When the file system access process needs to determine the data blocks in which the logs cached after the log sequence number j+m are stored, the logical volume service process needs to perform the following steps during the data reading process: it receives the second query request sent by the file system access process, determines, according to the correspondence it maintains between log segment identifiers and data block identifiers, the identifier of each second data block where the logs cached after the log sequence number j+m are located, and sends the identifier of each second data block to the file system access process, so that the file system access process can query the root server with the identifier of each second data block to obtain the second data block service process corresponding to each second data block, obtain from each second data block the metadata of the cached logs with log sequence numbers after j+m, and update the data state of the target logical volume according to that metadata.
  • the embodiment of the present invention also provides a garbage collection mechanism to reorganize and reclaim the storage space of previously written data.
  • Specifically, the garbage collection service process (GC) deployed in each storage node cooperates with the other service processes to complete garbage collection. Taking any storage node as an example, garbage collection processing can be performed through the following process:
  • the logical volume service process in the storage node receives a third mount request corresponding to the target logical volume sent by the garbage collection service process in the storage node;
  • the logical volume service process sends the second log segment and checkpoint storage location information corresponding to the target logical volume to the garbage collection service process, so that the garbage collection service process selects K log segments and checkpoints from the second log segment and checkpoint to Restore the corresponding data state and recover the original data block where M log segments are located when it is determined that the garbage proportion of the M log segments of the K log segments reaches a preset threshold, K ⁇ M ⁇ 1;
  • the logical volume service process receives the log segment update notification sent by the garbage collection service process, and the log segment update notification includes the correspondence between the M log segments and the new data blocks where the M log segments are located;
  • the logical volume service process updates the original correspondence between the M log segments and data blocks to the correspondence carried in the log segment update notification.
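  • The final bookkeeping step on the logical volume service process side can be illustrated as follows; the notification payload shape and the in-memory mapping are assumptions made for the sketch.

        # Illustrative handling of a log segment update notification after garbage collection.
        def handle_segment_update(segment_to_chunk, volume_id, notification):
            # notification: {segment_id: new_chunk_id} for the M rewritten log segments.
            for segment_id, new_chunk_id in notification.items():
                # Replace the original segment-to-data-block correspondence with the new one;
                # the updated mapping can then be synchronized to the other storage nodes.
                segment_to_chunk[volume_id][segment_id] = new_chunk_id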
  • The file system data access devices of one or more embodiments of the present invention will be described in detail below. Those skilled in the art can understand that all of these file system data access devices can be constructed by configuring commercially available hardware components through the steps taught in this solution.
  • FIG. 9 is a schematic structural diagram of a file system data access device according to an embodiment of the present invention. As shown in FIG. 9, the device includes a sending module 11, a receiving module 12, a recovery module 13, and a processing module 14.
  • a sending module 11 is configured to send a mount request triggered by a target logical volume to a target storage node, where the target storage node is any one of the plurality of storage nodes, and the target logical volume corresponds to a plurality of storage nodes. At least part of the storage resources.
  • the receiving module 12 is configured to receive log segment and checkpoint storage location information corresponding to the target logical volume sent by the target storage node.
  • the recovery module 13 is configured to read the log metadata of the log segment and the checkpoint according to the log segment and checkpoint storage location information to restore the data state of the target logical volume.
  • the processing module 14 is configured to perform data access processing based on a data state of the target logical volume.
  • the device shown in FIG. 9 can execute the methods of the embodiments shown in FIG. 4a, FIG. 5a, FIG. 6a, and FIG. 7a.
  • In a possible design, the structure of the file system data access device may be implemented as a computing node, such as an application server. As shown in FIG. 10, the computing node may include a processor 21 and a memory 22.
  • The memory 22 is configured to store a program that supports the file system data access device in executing the file system data access method provided in the embodiments shown in FIG. 4a, FIG. 5a, FIG. 6a, and FIG. 7a, and the processor 21 is configured to execute the program stored in the memory 22.
  • the program includes one or more computer instructions, and when the one or more computer instructions are executed by the processor 21, the following steps can be implemented:
  • sending a mount request triggered for a target logical volume to a target storage node, where the target storage node is any one of the plurality of storage nodes included in the file system and the target logical volume corresponds to at least part of the storage resources in the plurality of storage nodes; receiving the log segment and checkpoint storage location information corresponding to the target logical volume sent by the target storage node; reading the log metadata of the log segments and checkpoints according to the log segment and checkpoint storage location information to restore the data state of the target logical volume; and performing data access processing based on the data state of the target logical volume.
  • the processor 21 is further configured to execute all or part of the steps in the foregoing embodiments shown in Figs. 4a, 5a, 6a, and 7a.
  • The structure of the file system data access device may further include a communication interface 23, which is used for the computing node to communicate with other devices or a communication network.
  • In addition, an embodiment of the present invention provides a computer storage medium for storing computer software instructions used by the file system data access device, which includes a program for executing the file system data access method in the method embodiments shown in FIG. 4a, FIG. 5a, FIG. 6a, and FIG. 7a.
  • FIG. 11 is a schematic structural diagram of another file system data access device according to an embodiment of the present invention. As shown in FIG. 11, the device includes a receiving module 31, an obtaining module 32, and a sending module 33.
  • the receiving module 31 is configured to receive a mount request corresponding to a target logical volume sent by a computing node.
  • the obtaining module 32 is configured to obtain log segment information and checkpoint storage location information corresponding to the target logical volume.
  • The sending module 33 is configured to send the log segment and checkpoint storage location information corresponding to the target logical volume to the computing node, so that the computing node restores the data state of the target logical volume based on the log segment and checkpoint storage location information for data access processing; the target logical volume corresponds to at least part of the storage resources in the plurality of storage nodes.
  • The apparatus shown in FIG. 11 can execute the method of the embodiment shown in FIG. 8a. For parts that are not described in detail in this embodiment, reference may be made to the related description of the embodiment shown in FIG. 8a.
  • In a possible design, the structure of the file system data access device may be implemented as a storage node. As shown in FIG. 12, the storage node may include a processor 41 and a memory 42.
  • The memory 42 is configured to store a program that supports the file system data access device in executing the file system data access method provided in the embodiment shown in FIG. 8a, and the processor 41 is configured to execute the program stored in the memory 42.
  • the program includes one or more computer instructions, and when the one or more computer instructions are executed by the processor 41, the following steps can be implemented:
  • receiving a mount request corresponding to a target logical volume sent by a computing node; obtaining the log segment and checkpoint storage location information corresponding to the target logical volume; and sending the log segment and checkpoint storage location information to the computing node, so that the computing node restores the data state of the target logical volume based on the log segment and checkpoint storage location information for data access processing, where the target logical volume corresponds to at least part of the storage resources of the plurality of storage nodes.
  • the processor 41 is further configured to execute all or part of the steps in the foregoing embodiment shown in FIG. 8a.
  • The structure of the file system data access device may further include a communication interface 43, which is used for the storage node to communicate with other devices or a communication network.
  • In addition, an embodiment of the present invention provides a computer storage medium for storing computer software instructions used by the file system data access device, which includes a program for executing the file system data access method in the method embodiment shown in FIG. 8a.
  • The device embodiments described above are only schematic. The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the objective of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without creative labor.
  • Each of the above embodiments can be implemented by software plus a necessary universal hardware platform, and of course can also be implemented by a combination of hardware and software. Based on this understanding, the essence of the above technical solutions, or the part that contributes over the existing technology, may be embodied in the form of a computer program product; the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a specific manner such that the instructions stored in the computer-readable memory produce a manufactured article including an instruction device, the instructions
  • the device implements the functions specified in one or more flowcharts and / or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of steps can be performed on the computer or other programmable device to produce a computer-implemented process, which can be executed on the computer or other programmable device.
  • the instructions provide steps for implementing the functions specified in one or more flowcharts and / or one or more blocks of the block diagrams.
  • a computing device includes one or more processors (CPUs), input / output interfaces, network interfaces, and memory.
  • Memory may include non-persistent memory, random access memory (RAM), and / or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
  • Computer-readable media includes permanent and non-persistent, removable and non-removable media.
  • Information storage can be accomplished by any method or technology.
  • Information may be computer-readable instructions, data structures, modules of a program, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), and read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, read-only disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, Magnetic tape cartridges, magnetic tape storage or other magnetic storage devices or any other non-transmitting medium may be used to store information that can be accessed by a computing device.
  • computer-readable media does not include temporary computer-readable media, such as modulated data signals and carrier waves.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present invention provide a file system data access method and a file system. The file system includes at least one computing node and multiple storage nodes. The method includes: a computing node sends a mount request triggered for a target logical volume to a target storage node; the computing node receives the log segment and checkpoint storage location information corresponding to the target logical volume sent by the target storage node; the computing node reads the log metadata of the log segments and checkpoints according to the log segment and checkpoint storage location information to restore the data state of the target logical volume; and the computing node performs data access processing based on the data state of the target logical volume. When multiple computing nodes access the target logical volume, the log information of the target logical volume maintained in each storage node enables the computing nodes to maintain good read-write consistency and to read and write concurrently, avoiding read-write conflicts.

Description

File system data access method and file system
This application claims priority to Chinese patent application No. 201810558071.1, filed on June 1, 2018 and entitled "File system data access method and file system", the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of Internet technologies, and in particular, to a file system data access method and a file system.
Background
Computers manage and store data through file systems. In today's era of information explosion, the amount of data generated by all kinds of network services grows exponentially. To meet the storage capacity, data backup, and data security requirements of massive data, distributed file systems have emerged. Simply put, a distributed file system extends a traditional file system fixed at a single location to multiple file systems at any number of locations, with numerous storage nodes forming a file system network. Thus, when using a distributed file system, users do not need to care about which storage node data is stored on or obtained from; they only need to manage and store data as if using a local file system.
In some cases, users want multiple machines (which, relative to the storage nodes, may be called computing nodes) to be able to share a file system. A common scenario is that one machine writes data and serves users' access requests, while other machines read the latest written data in real time through the shared file system for data analysis, backup, and similar work.
With the development of computer network technology, network applications have become more and more widespread, and network-based shared file systems have also been widely used. At present, file system sharing is implemented by technologies such as the Network File System (NFS) and the Common Internet File System (CIFS).
However, commonly used NFS/CIFS network shared file systems have performance and scalability bottlenecks, mainly as follows: to support access consistency when multiple clients (application clients in computing nodes) read and write data simultaneously, an independent coordination server is usually deployed in the file system, which coordinates all access requests by providing a complex lock contention mechanism. As a result, any client must request a lock from the coordination server before reading or writing data; when the volume of data access increases, the coordination server easily becomes a bottleneck, making the system poorly scalable. Moreover, when multiple clients need to read and write the same file data, lock contention also arises, and a client that does not obtain the lock must wait for other clients to finish their reads and writes before it can proceed, which increases read and write latency, that is, read-write conflicts are severe.
Summary
In view of this, embodiments of the present invention provide a file system data access method and a file system to improve the data access performance of the file system.
In a first aspect, an embodiment of the present invention provides a file system data access method, executed by a computing node, where the file system includes at least one computing node and multiple storage nodes, and the method includes:
the computing node sends a mount request triggered for a target logical volume to a target storage node, where the target storage node is any one of the multiple storage nodes and the target logical volume corresponds to at least part of the storage resources in the multiple storage nodes;
the computing node receives the log segment and checkpoint storage location information corresponding to the target logical volume sent by the target storage node;
the computing node reads the log metadata of the log segments and checkpoints according to the log segment and checkpoint storage location information to restore the data state of the target logical volume;
the computing node performs data access processing based on the data state of the target logical volume.
In a second aspect, an embodiment of the present invention provides a file system data access method, executed by a file system access process in a computing node, where the file system includes at least one computing node and multiple storage nodes, and the method includes:
in response to a mount request triggered by a data access process in the computing node for a target logical volume, the file system access process in the computing node obtains the log segment and checkpoint storage location information corresponding to the target logical volume through a logical volume service process in a storage node;
the file system access process reads the log metadata of the log segments and checkpoints according to the log segment and checkpoint storage location information to restore the data state of the target logical volume;
the file system access process performs data access processing based on the data state of the target logical volume.
In a third aspect, an embodiment of the present invention provides a computing node, including a processor and a memory, where the memory is configured to store one or more computer instructions which, when executed by the processor, implement the file system data access method in the first or second aspect. The electronic device may further include a communication interface for communicating with other devices or a communication network.
An embodiment of the present invention provides a computer storage medium for storing a computer program which, when executed by a computer, implements the file system data access method in the first or second aspect.
In a fourth aspect, an embodiment of the present invention provides a file system data access method, executed by a storage node, where the file system includes at least one computing node and multiple storage nodes, and the method includes:
a target storage node receives a mount request corresponding to a target logical volume sent by a computing node, where the target storage node is any one of the multiple storage nodes;
the target storage node sends the log segment and checkpoint storage location information corresponding to the target logical volume to the computing node, so that the computing node restores the data state of the target logical volume based on the log segment and checkpoint storage location information for data access processing;
the target logical volume corresponds to at least part of the storage resources in the multiple storage nodes.
In a fifth aspect, an embodiment of the present invention provides a file system data access method, executed by a logical volume service process in a storage node, where the file system includes at least one computing node and multiple storage nodes, and the method includes:
the logical volume service process in the storage node receives a mount request corresponding to a target logical volume sent by a file system access process in a computing node;
the logical volume service process sends the first log segment and checkpoint storage location information corresponding to the target logical volume to the file system access process, so that the file system access process restores the data state of the target logical volume based on the first log segment and checkpoint storage location information for data access processing;
the target logical volume corresponds to at least part of the storage resources in the multiple storage nodes, and the storage node is any one of the multiple storage nodes.
In a sixth aspect, an embodiment of the present invention provides a storage node, including a processor and a memory, where the memory is configured to store one or more computer instructions which, when executed by the processor, implement the file system data access method in the fourth or fifth aspect. The electronic device may further include a communication interface for communicating with other devices or a communication network.
An embodiment of the present invention provides a computer storage medium for storing a computer program which, when executed by a computer, implements the file system data access method in the fourth or fifth aspect.
In a seventh aspect, an embodiment of the present invention provides a file system, including:
at least one computing node, multiple storage nodes, and multiple root servers for managing the multiple storage nodes;
where any computing node of the at least one computing node is configured to send a mount request triggered for a target logical volume to any storage node of the multiple storage nodes; receive the log segment and checkpoint storage location information corresponding to the target logical volume sent by that storage node; read the log metadata of the log segments and checkpoints according to the log segment and checkpoint storage location information to restore the data state of the target logical volume; and perform data access processing based on the data state of the target logical volume; the target logical volume corresponds to at least part of the storage resources in the multiple storage nodes;
that storage node is configured to obtain the log segment and checkpoint storage location information corresponding to the target logical volume and send the log segment and checkpoint storage location information to the computing node.
In an eighth aspect, an embodiment of the present invention provides a file system, including:
at least one computing node, multiple storage nodes, and multiple root servers for managing the data block service processes in the multiple storage nodes;
where each computing node has a data access process and a file system access process;
each storage node has a logical volume service process and a data block service process, the data block service process being used to perform read-write management of the data blocks stored in the corresponding storage node;
the file system access process, in response to a mount operation triggered by the data access process in the corresponding computing node, sends a mount request for a target logical volume to the logical volume service process in a target storage node, receives the first log segment and checkpoint storage location information corresponding to the target logical volume sent by the logical volume service process, reads the log metadata of the log segments and checkpoints according to the first log segment and checkpoint storage location information to restore the data state of the target logical volume, and performs data access processing based on the data state of the target logical volume, where the target logical volume corresponds to at least part of the storage resources in the multiple storage nodes;
the logical volume service process is configured to receive the mount request sent by the file system access process and send the first log segment and checkpoint storage location information corresponding to the target logical volume to the file system access process.
In the file system data access method and file system provided by the embodiments of the present invention, the file system includes at least one computing node, multiple storage nodes, and multiple root servers for managing the multiple storage nodes; data is stored in the file system in units of data blocks, and each storage node performs read-write management of the data blocks it stores. Different business applications in multiple computing nodes, or in the same computing node, can share the resources of the multiple storage nodes; that is, different business applications can build corresponding logical volumes on top of the multiple storage nodes, each logical volume corresponding to at least part of the resources of the multiple storage nodes. In addition, each storage node maintains the log information of the multiple logical volumes sharing the file system, for example by deploying a logical volume service process in each storage node to maintain this log information. Based on this file system, when any business application in any computing node, needing to access data, triggers a mount request for a logical volume (called the target logical volume) toward any storage node (called the target storage node), the target storage node obtains the log segment list and checkpoint list corresponding to the target logical volume; the log segments store the metadata of each log of the target logical volume, so the computing node can restore the latest data state of the target logical volume in memory from the obtained log segment list and checkpoint list, enabling the computing node to perform data access processing, such as data writing or data reading, based on that latest data state. In the distributed file system composed of the above computing nodes and storage nodes, when multiple computing nodes access the target logical volume, for example when multiple computing nodes need to read and write the data corresponding to the target logical volume, the log information of the target logical volume maintained in each storage node enables the computing nodes to maintain good read-write consistency and to read and write concurrently, avoiding read-write conflicts.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is an architecture diagram of a file system according to an embodiment of the present invention;
FIG. 2 is an architecture diagram of another file system according to an embodiment of the present invention;
FIG. 3 is a logical layered architecture diagram corresponding to the file system shown in FIG. 2;
FIG. 4a is a flowchart of a file system data access method according to an embodiment of the present invention;
FIG. 4b is a flowchart of another file system data access method according to an embodiment of the present invention;
FIG. 5a is a flowchart of an implementation process of step 404a in the embodiment shown in FIG. 4a;
FIG. 5b is a flowchart of an implementation process of step 404b in the embodiment shown in FIG. 4b;
FIG. 6a is a flowchart of yet another file system data access method according to an embodiment of the present invention;
FIG. 6b is a flowchart of still another file system data access method according to an embodiment of the present invention;
FIG. 7a is a flowchart of an implementation process of step 604a in the embodiment shown in FIG. 6a;
FIG. 7b is a flowchart of an implementation process of step 604b in the embodiment shown in FIG. 6b;
FIG. 8a is a flowchart of a further file system data access method according to an embodiment of the present invention;
FIG. 8b is a flowchart of still another file system data access method according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a file system data access device according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of a computing node corresponding to the file system data access device provided in the embodiment shown in FIG. 9;
FIG. 11 is a schematic structural diagram of another file system data access device according to an embodiment of the present invention;
FIG. 12 is a schematic structural diagram of a storage node corresponding to the file system data access device provided in the embodiment shown in FIG. 11.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The terms used in the embodiments of the present invention are for the purpose of describing particular embodiments only and are not intended to limit the present invention. The singular forms "a", "the", and "said" used in the embodiments and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise; "multiple" generally means at least two, without excluding the case of at least one.
It should be understood that the term "and/or" used herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean that A exists alone, that A and B both exist, or that B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects.
It should be understood that although the terms first, second, third, and the like may be used in the embodiments of the present invention to describe XXX, the XXX should not be limited by these terms. These terms are only used to distinguish them from one another. For example, without departing from the scope of the embodiments of the present invention, a first XXX may also be called a second XXX, and similarly, a second XXX may also be called a first XXX.
Depending on the context, the word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining" or "in response to detecting". Similarly, depending on the context, the phrases "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined" or "in response to determining" or "when (the stated condition or event) is detected" or "in response to detecting (the stated condition or event)".
It should also be noted that the terms "include", "comprise", or any of their other variants are intended to cover non-exclusive inclusion, so that a product or system that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a product or system. Without further limitation, an element defined by the statement "including a ..." does not exclude the existence of other identical elements in the product or system that includes that element.
In addition, the order of steps in the following method embodiments is only an example and not a strict limitation.
FIG. 1 is an architecture diagram of a file system according to an embodiment of the present invention. The file system is a distributed file system. As shown in FIG. 1, the file system includes at least one computing node, multiple storage nodes, and multiple root servers (RS) for managing the multiple storage nodes.
A computing node often corresponds to an application server in which one or more business applications are deployed.
The multiple storage nodes often correspond to storage servers deployed in different geographical regions, and multiple computing nodes can share the storage resources of the multiple storage nodes. In each storage node, data is read and written in units of data blocks (chunks), and each data block is preset to a certain capacity.
The sharing of the file system by multiple computing nodes is mainly reflected in that different users can, on top of the storage resources provided by the multiple storage nodes of the file system, create logical volumes for the business applications in their respective computing nodes; a logical volume can be regarded as a logical file system created on top of the physical file system. Different logical volumes correspond to different storage resources in the multiple storage nodes, that is, different logical volumes occupy different storage resources. Generally, each logical volume corresponds to part of the storage resources in the multiple storage nodes; however, when the multiple storage nodes serve only one business application, the logical volume corresponding to that business application may correspond to all the storage resources of the multiple storage nodes. Therefore, each logical volume can be considered to correspond to at least part of the storage resources in the multiple storage nodes.
The multiple RSs back each other up and store the same content, mainly the correspondence between data block identifiers and storage node identifiers; that is, each RS knows which data blocks are stored on each storage node. In practice, for each storage node, the data read-write processing of the storage node is actually performed by the data block service process running in it, and data block service processes correspond to storage nodes one to one; therefore, the root server can also be considered to maintain the correspondence between data block identifiers and data block service process identifiers.
In addition to maintaining the correspondence between data block identifiers and storage node identifiers / data block service process identifiers, a root server's management of the multiple storage nodes may, in practice, also include recording information such as all the targets and file space of the file system and the data block list of each file, and performing allocation and scheduling of storage nodes according to the storage load of each storage node.
In practical applications, when a business application in any of the at least one computing node needs to perform data access, for example to write new business data or read business data, it starts a data access process to trigger a data access procedure for the corresponding logical volume (called the target logical volume); the data access process may be a data writing process or a data reading process.
At this time, in the data access procedure for the target logical volume:
the computing node is configured to send a mount request triggered for the target logical volume to any storage node of the multiple storage nodes (hereinafter called the target storage node); receive the log segment and checkpoint storage location information corresponding to the target logical volume sent by the target storage node; read the log metadata of the log segments and checkpoints according to the log segment and checkpoint storage location information to restore the data state of the target logical volume; and perform data access processing based on the data state of the target logical volume.
The target storage node is configured to obtain the log segment and checkpoint storage location information corresponding to the target logical volume and send the log segment and checkpoint storage location information to the computing node.
The target logical volume corresponds to multiple log segments and checkpoints, and log segments and checkpoints have a correspondence; for example, a checkpoint (1000) is generated after log segment [1-1000]. Each log segment contains multiple logs; a log consists of data and metadata, although some logs may contain only metadata. Simply put, the metadata records information related to data access to the files corresponding to the target logical volume, for example when and by whom the data in which file was read. Besides the data access information, the metadata also includes a log sequence number, based on which read-write consistency of the data can be guaranteed, as will be described in subsequent embodiments.
In addition, each log segment corresponds to a checkpoint, which is used for data recovery when the file system encounters an exception. It is also for fast log-based recovery of the file system from exceptions that the logs are organized in the form of log segments.
In this embodiment of the present invention, since the multiple storage nodes can be shared by multiple business applications, for any business application the multiple storage nodes are treated as a whole without specific distinction. Therefore, when any computing node, triggered by one of its business applications, needs to issue a mount request for the corresponding target logical volume, it can send the mount request to any storage node to request mounting of the target logical volume. On this basis, each storage node maintains the log segment and checkpoint storage location information corresponding to each logical volume, which describes in which data block each log segment and checkpoint is located.
It can be understood that if the storage node currently receiving the mount request cannot respond to it, for example due to heavy load, it can forward the mount request to another storage node for processing.
When the target storage node receives the mount request sent by the computing node, it knows from the identifier of the target logical volume carried in the mount request which logical volume is requested to be mounted; it therefore obtains the log segment and checkpoint storage location information corresponding to the target logical volume from the locally maintained log segment and checkpoint storage location information of each logical volume and feeds it back to the computing node. The computing node then reads the corresponding log metadata from the corresponding storage locations according to the obtained log segment and checkpoint storage location information to restore the data state of the target logical volume. Specifically, the computing node restores the data state of the target logical volume by replaying the log metadata of the log segments and checkpoints in memory; this data state reflects the latest data state of the target logical volume, that is, the latest state of the data in each file corresponding to the target logical volume as of the current moment. On this basis, the computing node can perform subsequent data access processing such as data writing and data reading; this data access processing will be described in detail in subsequent method embodiments and is not elaborated here.
It is worth noting that the target storage node's maintenance of the log segment and checkpoint storage location information may be embodied as maintaining the correspondence between the identifier of each log segment or checkpoint and the data block identifier, for example recording that log segment [0-2000] is on chunk1, log segment [2001-2200] is on chunk2, and checkpoint (2000) is on chunk7. Thus, in an optional embodiment, the target storage node may also, based on this correspondence, obtain the metadata of the corresponding log segments and checkpoints from the corresponding data blocks of the storage nodes and feed it back to the computing node.
It is also worth noting that the log segments and checkpoints corresponding to the target logical volume that the target storage node sends to the computing node may be, but are not limited to, the storage location information of the current last log segment and last checkpoint of the target logical volume, because generally, following the above example, to restore the latest data state of the target logical volume it is only necessary to read checkpoint (2000) and replay the metadata of the logs in log segment [2001-2200], since checkpoint (2000) records the data state of the target logical volume before log segment [2001-2200].
In addition, it is worth noting that for a computing node to access the target logical volume, the target logical volume must first be mounted locally before subsequent access can take place. In the embodiments of the present invention, during the mounting of the target logical volume, the latest data state of the target logical volume is restored based on the log segment and checkpoint information of each logical volume maintained in each storage node, so that when multiple computing nodes access the target logical volume, each computing node can obtain the latest data state of the target logical volume, ensuring read-write consistency.
The architecture shown in FIG. 1 above is introduced from the perspective of hardware entity nodes. In fact, to support multiple computing nodes sharing the storage resources of multiple storage nodes, related software modules are deployed in each computing node and each storage node; these software modules can be embodied as various service processes. Therefore, the internal detailed architecture of the file system provided by the embodiments of the present invention is further introduced below with reference to the embodiment shown in FIG. 2.
As shown in FIG. 2, the file system includes at least one computing node (host1 and host2 illustrated in FIG. 2), multiple storage nodes, and multiple root servers (RS) for managing the data block service processes (Chunk Server, CS) in the multiple storage nodes.
Each computing node has a data access process and a file system access process.
Each storage node has a logical volume service process (Volume Server, VS) and a CS; the CS is used to perform read-write management of the data blocks (chunks) stored in the corresponding storage node.
In practice, the data access process in each computing node may be the data writing process (Writer) or the data reading process (Reader) illustrated in FIG. 2. The data access process is usually started by a business application in the computing node.
The file system access process in each computing node is the access entry for accessing the corresponding target logical volume. In practical applications, optionally, the file system access process may be the user-space file system access process (fuse worker) implemented based on the user-space file system (File system in User space, FUSE) in FIG. 2, where VFS, /dev/fuse, and fuse worker are the modules that make up Fuse. It is worth noting that the file system access process can also be implemented without relying on the Fuse architecture; the functions performed by the fuse worker can be implemented in the operating system kernel, as in file systems such as Lustre/GPFS.
In addition, in an optional embodiment, as shown in FIG. 2, each storage node also contains a garbage collection service process (Garbage Collector, GC).
In summary, service processes of the types VS, GC, and CS can be deployed in each storage node.
It is also worth noting that, based on the above composition of computing nodes, storage nodes, and root servers, the file system provided by the embodiments of the present invention can be regarded as having a logically layered structure, which is illustrated more intuitively in FIG. 3. The numbers of VS, GC, and CS illustrated in FIG. 3 are only for illustrating the layered structure and do not reflect a correspondence with storage nodes, because these service processes generally correspond to storage nodes one to one. As can be seen from FIG. 3, the file system can be regarded as consisting of three layers: an upper layer, a middle layer, and a lower layer.
The bottom layer is the chunk service layer, which provides upward a distributed, append-only chunk read-write service. Users can create chunks, specify the number of replicas, write data in an append-only manner, read successfully written data with real-time consistency, and specify that part of the chunk data be cached in memory or high-speed media to accelerate data reading. The chunk service layer is mainly composed of two roles: one is the CS, which manages all the chunk information in the corresponding storage node and provides chunk read-write services; the other is the RS, which manages all the CSs, that is, the CSs in all storage nodes, and can maintain the correspondence between CS identifiers and chunk identifiers.
The middle layer is the volume service layer, which provides upward the services of creating/deleting/mounting/unmounting logical volumes. It consists of two roles: one is the VS, which manages the metadata of each logical volume, mainly including the information on the storage locations of each logical volume's log segments and of the checkpoints that accelerate log recovery; the other is the GC, which reclaims the garbage space of the file system and periodically produces checkpoints to speed up the recovery process of the file system.
The upper layer is the file system access layer, which consists of two parts: one is the data access process (such as Writer/Reader), which accesses file data by calling standard Posix interfaces; the other is the user-space file system Fuse containing the file system access process fuse worker, which, based on the Fuse framework, provides the fuse worker for the management and reading/writing of the user-space file system.
In summary, under the file system architecture shown in FIG. 2 and FIG. 3, when a business application in any computing node starts a data access process and needs to access the target logical volume corresponding to that business application, a mount operation is triggered. At this time:
the file system access process in the computing node, in response to the mount operation triggered by the data access process in the computing node, sends a mount request for the target logical volume to the logical volume service process in any storage node, receives the first log segment and checkpoint storage location information corresponding to the target logical volume sent by the logical volume service process, and reads the log metadata of the first log segment and checkpoint according to the first log segment and checkpoint storage location information to restore the data state of the target logical volume, so as to perform data access processing based on the data state of the target logical volume;
the logical volume service process is configured to receive the mount request sent by the file system access process and send the first log segment and checkpoint storage location information corresponding to the target logical volume to the file system access process.
Optionally, the first log segment and checkpoint may be the current last log segment and last checkpoint of the target logical volume.
It can thus be seen that a logical volume service process is deployed in each storage node, and this logical volume service process maintains the log segment and checkpoint storage location information corresponding to each logical volume. The logical volume service process can be started when the storage node starts. In addition, the logical volume service processes of different storage nodes back each other up, to ensure that the logical volume service process of each storage node maintains complete log segment and checkpoint storage location information of all logical volumes.
In the above distributed file system, when multiple computing nodes access the target logical volume, for example when multiple computing nodes need to read and write the data corresponding to the target logical volume, the log information of the target logical volume maintained in each storage node enables the computing nodes to maintain good read-write consistency and to read and write concurrently, avoiding read-write conflicts.
下面结合图1-图3所示的文件系统架构,具体介绍下该文件系统的数据访问方法。
下面先结合图4a至图5b所示实施例对数据写入的完整流程进行介绍。
图4a为本发明实施例提供的一种文件系统数据访问方法的流程图,本发明实施例提供的该方法可以由图1所示架构下的计算节点来执行。如图4a所示,该方法包括如下步骤:
401a、响应于数据写入进程触发的挂载操作,计算节点向目标存储节点发送第一挂载请求,第一挂载请求中包括目标逻辑卷的标识、用户信息和可读写挂载方式标识。
由图1所示实施例可知,该目标存储节点是多个存储节点中的任一个。
本实施例介绍的是某计算节点中的业务应用需要进行数据写入处理的情况,因此,该业务应用会启动该计算节点中的数据写入进程。由前述本实施例的介绍可知,当需要访问业务应用对应的目标逻辑卷时,首先需要挂载该目标逻辑卷到计算节点本地,因此,本实施例先介绍目标逻辑卷在数据写入场景下的挂载过程。
计算节点中的数据写入进程被启动后，该数据写入进程会触发挂载操作，该挂载操作可以是数据写入进程向计算节点的处理器发送挂载通知。基于该挂载操作，计算节点会向任一存储节点（称为目标存储节点）发送包括目标逻辑卷的标识、用户信息和可读写挂载方式标识的第一挂载请求。
其中,用户信息可以是用户账号、用户姓名等信息;可读写挂载方式标识对应于数据写入进程,即本发明实施例中允许数据写入进程以可读写(write-read)方式挂载逻辑卷,该可读写挂载方式意味着对于同一目标逻辑卷,同时只能允许一个数据写入进程进行数据写入。
402a、计算节点接收目标存储节点在确定用户信息通过目标逻辑卷的用户权限验证以及确定目标逻辑卷当前并未被以可读写挂载方式挂载后发送的目标逻辑卷对应的日志段和检查点存储位置信息。
目标存储节点在接收到计算节点的第一挂载请求后,基于其中的可读写挂载方式标识知道此时是某个数据写入进程需要访问目标逻辑卷,此时,目标存储节点一方面基于第一挂载请求中的用户信息验证用户是否具有访问目标逻辑卷的权限,另一方面要确定该目标逻辑卷当前是否被其他数据写入进程占用即当前是否被以可读写挂载方式挂载。
实际应用中,可选地,每个存储节点中可以维护各逻辑卷的用户权限信息,比如用户白名单,以用于实现用户权限验证。另外,各存储节点彼此同步有各逻辑卷的挂载状态信息,即各逻辑卷被以什么挂载方式挂载或未被挂载的相关信息,基于此,可以实现目标逻辑卷当前是否被以可读写挂载方式挂载的确定。
当目标存储节点确定用户通过权限验证并确定目标逻辑卷当前并未被以可读写方式挂载时,可以向计算节点反馈目标逻辑卷对应的日志段和检查点存储位置信息。
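下面给出一段示意性的Python代码（其中的数据结构、异常与返回值均为说明用的假设），示意目标存储节点对可读写挂载请求进行用户权限验证与挂载互斥检查的判定逻辑：

```python
# 示意代码：可读写挂载请求的判定（假设的实现，仅用于说明）
class MountError(Exception):
    pass

def handle_rw_mount(volume_id, user, acl, mount_state, volume_meta):
    """acl: 卷 -> 允许的用户集合；mount_state: 卷 -> 当前挂载方式集合（各存储节点间同步）。"""
    if user not in acl.get(volume_id, set()):
        raise MountError("用户信息未通过目标逻辑卷的用户权限验证")
    if "write-read" in mount_state.get(volume_id, set()):
        raise MountError("目标逻辑卷当前已被以可读写挂载方式挂载")
    mount_state.setdefault(volume_id, set()).add("write-read")
    # 两项检查均通过后，反馈该卷的日志段和检查点存储位置信息
    return volume_meta[volume_id]
```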
403a、计算节点根据日志段和检查点存储位置信息读取日志段和检查点的日志元数据以恢复目标逻辑卷的数据状态。
计算节点根据上述存储位置信息从相应的存储位置处读取到各日志,具体是读取日志中的元数据,从而在内存中依次回放各元数据,直到具有最后的日志序列号i的日志,以恢复目标逻辑卷的最新数据状态。
404a、计算节点基于目标逻辑卷的数据状态进行数据访问处理。
计算节点在恢复目标逻辑卷的最新数据状态后,可以进行后续的数据写入过程,该数据写入过程往往包括文件准备写阶段和数据写入阶段,将在后续图5a所示实施例中说明。
前述图4a所示实施例是以计算节点和存储节点的节点或者说设备的角色进行说明的,下面基于图2或图3所示的逻辑系统架构,结合图4b所示实施例介绍本发明实施例提供的文件系统数据访问方法的实现过程。
图4b为本发明实施例提供的另一种文件系统数据访问方法的流程图,本发明实施例提供的该方法可以由图2或图3所示架构下的计算节点中的文件系统访问进程(fuse worker)来执行。如图4b所示,该方法包括如下步骤:
401b、响应于计算节点中的数据写入进程触发的挂载操作,计算节点中的文件系统访问进程向存储节点中的逻辑卷服务进程发送第一挂载请求,第一挂载请求中包括目标逻辑卷的标识、用户信息和可读写挂载方式标识。
本实施例中,上述计算节点是文件系统中多个计算节点中的任一个,上述存储节点是多个存储节点中的任一个。为了实现计算节点对其对应的逻辑卷的访问,在计算节点中会部署有文件系统访问进程,比如图2或图3中示意的fuse worker,但不以此为限。
另外,在每个存储节点中会部署有逻辑卷服务进程以及数据块服务进程,其中,数据块服务进程用于对相应存储节点中的数据块进行读写管理,逻辑卷服务进程维护了各逻辑卷的日志段标识/检查点标识与数据块标识间的对应关系,辅助文件系统访问进程进行数据读写处理。
实际应用中,计算节点中的某业务应用触发了数据写入进程想要对目标逻辑卷进行数据写入处理时,该数据写入进程触发挂载操作,比如向该计算节点中的文件系统访问进程发送上述第一挂载请求,文件系统访问进程进而将该第一挂载请求发送至上述逻辑卷服务进程。其中,可选地,文件系统访问进程与逻辑卷服务进程之间可以基于进程间通信机制进行交互。
由前述实施例中的介绍可知,该第一挂载请求是用于请求逻辑卷服务进程以可读写(write-read)挂载方式挂载目标逻辑卷,该可读写挂载方式意味着对于同一目标逻辑卷,同时只能允许一个数据写入进程进行数据写入。
402b、文件系统访问进程接收逻辑卷服务进程在确定用户信息通过目标逻辑卷的用户权限验证以及确定目标逻辑卷当前并未被以可读写挂载方式挂载后发送的目标逻辑卷对应的日志段和检查点存储位置信息。
由于每个存储节点中的逻辑卷服务进程是互备份关系，每个逻辑卷服务进程中都维护有各逻辑卷的日志段标识和检查点标识与数据块标识间的对应关系，从而，接收到上述第一挂载请求的逻辑卷服务进程基于其中的目标逻辑卷的标识和该对应关系，获取目标逻辑卷对应的日志段和检查点的存储位置信息，即日志段和检查点所对应的数据块标识。
由前述实施例的说明可知,此时可选地,逻辑卷服务进程可以将目标逻辑卷当前的最后日志段和最后的检查点分别对应的数据块标识反馈给文件系统访问进程,以使文件系统访问进程基于该最后的日志段和最后的检查点恢复目标逻辑卷的最新数据状态。
403b、文件系统访问进程根据日志段和检查点存储位置信息读取日志段和检查点的日志元数据以恢复目标逻辑卷的数据状态。
可选地，文件系统访问进程可以以从逻辑卷服务进程获得的数据块标识查询任一根服务器，以获得该数据块标识所对应的数据块服务进程标识，从而通过该数据块服务进程标识所对应的数据块服务进程读取该数据块标识所对应的数据块中存储的日志的元数据。
或者,可选地,上述逻辑卷服务进程也可以基于确定出的上述数据块标识查询根服务器,以获得该数据块标识所对应的数据块服务进程标识,以将该数据块标识和数据块服务进程标识反馈给文件系统访问进程,从而文件系统访问进程通过该数据块服务进程标识所对应的数据块服务进程读取该数据块标识所对应的数据块中存储的日志的元数据。
404b、文件系统访问进程基于目标逻辑卷的数据状态进行数据访问处理。
文件系统访问进程在恢复目标逻辑卷的最新数据状态后,可以进行后续的数据写入过程,该数据写入过程往往包括文件准备写阶段和数据写入阶段,将在后续图5b所示实施例中说明。
图5a为图4a所示实施例中步骤404a的一种实现过程的流程图,如图5a所示,步骤404a可以具体包括如下具体步骤:
501a、响应于数据写入进程触发的文件准备写操作,计算节点生成仅包括元数据的第一日志,第一日志中包括递增当前最后的日志序列号i后的日志序列号i+1和与文件准备写操作对应的文件信息。
在数据写入进程真正进行数据写入之前,往往需要先进行一些准备操作,该准备操作称为文件准备写操作,实际应用中,该文件准备写操作比如可以是打开某个文件,或者是创建并打开某个文件。本实施例中先介绍该文件准备写操作的处理过程。
数据写入进程可以通过调用文件准备写对应的接口以触发上述文件准备写操作。此时，计算节点会生成记录该文件准备写操作的日志记录，称为第一日志。由于此时数据写入进程还未写入真正的数据，所以该第一日志中仅包括元数据，该元数据中包括递增当前最后的日志序列号i后的日志序列号i+1和与文件准备写操作对应的文件信息。其中，该文件信息比如包括创建或打开的文件的文件名以及文件标识描述符（File Identifier Descriptor，简称FID）等信息。假设在图4a所示实施例的目标逻辑卷的日志段中最后的日志序列号为i，则该第一日志的日志序列号递增加一为i+1。
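作为示意，下面用一段Python代码描述这种仅含元数据的第一日志的结构及其生成方式（字段名与函数均为说明用的假设，并非实际的日志格式）：

```python
# 示意代码：文件准备写操作对应的"仅含元数据"日志（假设的字段，非实际格式）
from dataclasses import dataclass
from typing import Optional

@dataclass
class LogRecord:
    lsn: int                      # 日志序列号：在当前最后的序列号 i 上递增为 i+1
    file_name: str
    fid: str                      # 文件标识描述符 FID
    data: Optional[bytes] = None  # 文件准备写日志不携带数据；数据写入日志才会填充此字段

def make_prepare_write_log(last_lsn: int, file_name: str, fid: str) -> LogRecord:
    return LogRecord(lsn=last_lsn + 1, file_name=file_name, fid=fid)

# 用法示意：当前最后的日志序列号为 i=2200，则第一日志的序列号为 2201
first_log = make_prepare_write_log(last_lsn=2200, file_name="a.txt", fid="fid-001")
```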
502a、计算节点通过目标存储节点和根服务器确定目标逻辑卷对应的最后日志段所在的第一数据块和第一数据块所对应的第一存储节点。
前述实施例中提过,在存储节点中,数据是以数据块(chunk)为单位进行存储的,而且,数据是以只能追加(append-only)方式写入的,而真正写入的数据是包含在日志中的,因此,计算节点所生成的日志是要被以append-only方式存入到某个存储节点的某个数据块中的,所以需要确定目标逻辑卷的最后日志段所在的存储位置,即最后日志段所在的数据块以及该数据块所对应的存储节点。
如前述所说，本发明实施例中，每个存储节点中维护有各逻辑卷的日志段和检查点的存储位置信息，该存储位置信息主要是日志段标识/检查点标识与数据块标识间的对应关系。因此，计算节点可以通过目标存储节点获得目标逻辑卷的最后日志段所在的数据块标识；基于该数据块标识，计算节点可以进一步查询根服务器，以通过根服务器获得与该数据块标识对应的存储节点标识，其中，根服务器中维护有各存储节点标识和各数据块标识间的对应关系。
因此,计算节点可以通过如下过程获得上述最后日志段所在的第一数据块和第一数据块所对应的第一存储节点:
计算节点向目标存储节点发送第一查询请求，用于查询目标逻辑卷对应的最后日志段所在的数据块；
计算节点接收目标存储节点发送的第一数据块的标识,第一数据块的标识是目标存储节点根据维护的目标逻辑卷的日志段标识和数据块标识间的对应关系确定出的;
计算节点向根服务器发送第二查询请求，用于查询第一数据块的标识对应的存储节点；
计算节点接收根服务器发送的第一存储节点的标识,第一存储节点的标识是根服务器根据维护的数据块标识和存储节点标识间的对应关系确定出的。
503a、计算节点向第一存储节点发送日志存入请求，以请求将第一日志追加到第一数据块中以及请求第一存储节点缓存所述第一日志。
计算节点在确定出上述第一数据块和第一存储节点后,可以将第一日志发送至第一存储节点,以使得第一存储节点将第一日志追加到第一数据块中,并在内存中缓存该第一日志,以便后续数据读取进程reader能够快速读取到该第一日志。
第一存储节点存储成功后,可以向计算节点反馈确认通知,以告知计算节点存储成功了。
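结合步骤502a、503a，下面给出一段示意性的Python代码（其中的查询与存入接口均为说明用的假设），示意计算节点如何通过目标存储节点和根服务器两级查询定位最后日志段所在的数据块与存储节点，并请求追加和缓存日志：

```python
# 示意代码：两级查询定位 + 日志追加（接口均为假设，非实际实现）
def locate_last_segment(volume_id, target_node, root_server):
    chunk_id = target_node.query_last_segment_chunk(volume_id)  # 第一查询请求：最后日志段 -> 数据块标识
    node_id = root_server.query_chunk_owner(chunk_id)           # 第二查询请求：数据块标识 -> 存储节点标识
    return chunk_id, node_id

def store_log(volume_id, log, target_node, root_server, cluster):
    chunk_id, node_id = locate_last_segment(volume_id, target_node, root_server)
    first_node = cluster[node_id]          # 第一存储节点
    first_node.append(chunk_id, log)       # 将日志以 append-only 方式追加到第一数据块
    first_node.cache(log)                  # 请求第一存储节点在内存中缓存该日志，便于读取方快速读取
```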
上述步骤介绍的是数据写入前的文件准备写操作阶段的处理过程,在该文件准备写过程完成后,数据写入进程可以进行真正的数据写入过程,如后续下述步骤所示:
504a、响应于数据写入进程发送的数据写入请求,计算节点生成第二日志,数据写入请求中包括写入数据和写入数据对应的文件信息,第二日志中包括元数据和写入数据,元数据包括递增当前最后的日志序列号i+1后的日志序列号i+2和文件信息。
505a、计算节点通过目标存储节点和根服务器确定目标逻辑卷对应的最后日志段所在的第二数据块和第二数据块所对应的第二存储节点。
506a、计算节点向第二存储节点发送日志存入请求,以请求将第二日志追加到第二数据块中以及请求第二存储节点缓存所述第二日志中的元数据。
数据写入进程可以通过调用标准的数据写入接口以触发上述数据写入请求,该数据写入请求中既包括需要写入的具体数据,还包括写入数据所对应的文件信息,该文件信息比如包括文件FID、写入数据在文件中的写入位置(offset)和数据长度(length)、数据位置指针等等。
此时,计算节点会生成第二日志,用于记录计算节点的该数据写入操作。该第二日志由元数据和数据组成,其中,数据即为计算节点的写入数据,元数据包括递增当前最后的日志序列号i+1后的日志序列号i+2和上述文件信息。进而,计算节点需要将生成的第二日志存储到存储节点中,具体地,需要将该第二日志写入到目标逻辑卷当前的最后日志段所在的数据块中。因此,计算节点需要通过目标存储节点和根服务器确定最后日志段所在的第二数据块以及第二数据块所对应的第二存储节点,以将第二日志发送至第二存储节点,令第二存储节点将该第二日志追加到第二数据块中并在内存中缓存第二日志的元数据。其中,第二数据块和第二存储节点的确定过程可以参考第一数据块和第一存储节点的确定过程,在此不赘述。
值得说明的是，第二数据块与第一数据块很有可能是同一数据块，从而，第二存储节点与第一存储节点也可能是同一存储节点。但是，当最后日志段在存入第二日志前已经满足封闭条件时，第二数据块与第一数据块将不同，但是第二存储节点有可能与第一存储节点相同。
具体地,若最后日志段当前已经达到封闭条件,则计算节点封闭最后日志段,生成新的检查点,并向目标存储节点申请新的日志段作为最后日志段。其中,封闭条件为最后日志段中包含的日志条数达到预设条数,或者,最后日志段对应的数据量达到预设容量值。
因为在实际应用中，检查点是用于在文件系统发生异常时加速恢复文件系统的，并且一个日志段后会生成一个检查点，因此，日志段不宜过大，具体体现为日志段包含的日志条数或日志中总数据量的大小不宜过大，否则将影响文件系统恢复的速率。举例来说，每个存储节点中记录了每个chunk上存储了哪些日志段和检查点，比如日志段[0-2000]在chunk1上，检查点(2000)在chunk7上，日志段[2001-2200]在chunk2上。假设在日志段[2001-2200]后某时刻文件系统异常，此时只需读取检查点(2000)，并回放日志段[2001-2200]即可恢复相应逻辑卷的最新状态。
本实施例中,假设追加上述第二日志后当前的最后日志段为[1500-2500],即第二日志的日志序列号为2500,并假设该最后日志段当前已经达到封闭条件,则计算节点封闭该日志段[1500-2500],生成检查点(2500),向目标存储节点申请新的日志段作为最后日志段,新的日志段为从日志序列号2501开始。
申请到的新日志段是对应于后续生成的日志的,而该新日志段中后续存入的日志是需要被存入到某个数据块中的,因此,目标存储节点在接收到计算节点的申请后,需要为新日志段分配相应的数据块。具体地,由于根服务器是用于管理所有存储节点的,包含对各存储节点进行调度与分配,因此,目标存储节点可以请求根服务器分配一个数据块用于存储新日志段后续对应的日志。根服务器可以根据各存储节点的存储负载指派某个存储节点A分配该数据块,假设该存储节点A分配chunk5来存储新日志段后续对应的日志,则根服务器中维护有该存储节点A与chunk5的对应关系,同时,根服务器将该chunk5反馈给目标存储节点,目标存储节点维护新日志段与chunk5的对应关系,目标存储节点可以将该对应关系同步给其他存储节点。
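下面给出一段示意性的Python代码（阈值与接口均为说明用的假设，并非本发明的实际实现），示意最后日志段达到封闭条件时封闭旧段、生成检查点并申请新日志段的流程：

```python
# 示意代码：日志段封闭与新段申请（阈值与接口均为假设）
MAX_LOGS_PER_SEGMENT = 1000        # 预设条数，仅为示例
MAX_BYTES_PER_SEGMENT = 64 << 20   # 预设容量值，仅为示例

def maybe_seal_segment(segment, compute_node, target_node):
    reached = (segment.log_count >= MAX_LOGS_PER_SEGMENT
               or segment.byte_size >= MAX_BYTES_PER_SEGMENT)
    if not reached:
        return segment                                # 未达到封闭条件，继续在当前最后日志段上追加
    compute_node.seal(segment)                        # 封闭当前的最后日志段，例如 [1500-2500]
    compute_node.write_checkpoint(segment.last_lsn)   # 生成新的检查点，例如 checkpoint(2500)
    # 向目标存储节点申请新的日志段；目标存储节点再请求根服务器分配承载该段日志的新数据块
    return target_node.allocate_segment(start_lsn=segment.last_lsn + 1)
```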
前述图5a所示实施例是以计算节点和存储节点的节点角度进行说明的,下面基于图2或图3所示的逻辑系统架构,结合图5b所示实施例进一步说明当计算节点和存储节点具有图2或图3所示的内部服务进程组成时,图5a所示的数据写入过程的具体实现过程。
图5b为图4b所示实施例中步骤404b的一种实现过程的流程图,如图5b所示,步骤404b可以具体包括如下具体步骤:
501b、响应于数据写入进程触发的文件准备写操作,文件系统访问进程生成仅包括元数据的第一日志,第一日志中包括递增当前最后的日志序列号i后的日志序列号i+1和与文件准备写请求对应的文件信息。
502b、文件系统访问进程通过逻辑卷服务进程和根服务器确定目标逻辑卷对应的最后日志段所在的第一数据块和第一数据块所对应的第一数据块服务进程。
其中,根服务器用于管理各存储节点中的数据块服务进程,具体是维护各存储节点中的数据块标识与数据块服务进程标识间的对应关系。
具体地,文件系统访问进程可以通过如下步骤确定上述第一数据块和第一数据块所对应的第一数据块服务进程:
文件系统访问进程向逻辑卷服务进程发送第一查询请求，用于查询目标逻辑卷对应的最后日志段所在的数据块；
文件系统访问进程接收逻辑卷服务进程发送的第一数据块的标识,第一数据块的标识是逻辑卷服务进程根据维护的目标逻辑卷的日志段标识和数据块标识间的对应关系确定出的;
文件系统访问进程向根服务器发送第二查询请求，用于查询第一数据块的标识对应的数据块服务进程；
文件系统访问进程接收根服务器发送的第一数据块服务进程的标识,第一数据块服务进程的标识是根服务器根据维护的数据块标识和数据块服务进程标识间的对应关系确定出的。
503b、文件系统访问进程向第一数据块服务进程发送日志存入请求,以请求将第一日志追加到第一数据块中以及请求第一数据块服务进程缓存第一日志。
上述三个步骤对应于文件准备写阶段,其中未尽的内容可以参考图5a所示实施例中的介绍,在此不赘述。在文件准备写阶段完成后,数据写入进程进而可以执行如下步骤的数据写入过程。
504b、响应于数据写入进程发送的数据写入请求,数据写入请求中包括写入数据和写入数据对应的文件信息,文件系统访问进程生成第二日志,第二日志中包括元数据和写入数据,元数据包括递增当前最后的日志序列号i+1后的日志序列号i+2和文件信息。
505b、文件系统访问进程通过逻辑卷服务进程和根服务器确定目标逻辑卷对应的最后日志段所在的第二数据块和第二数据块所对应的第二数据块服务进程。
第二数据块和第二数据块所对应的第二数据块服务进程的确定可以参见第一数据块和第一数据块服务进程的确定过程,在此不赘述。
506b、文件系统访问进程向第二数据块服务进程发送日志存入请求,以请求将第二日志追加到第二数据块中以及请求第二数据块服务进程缓存第二日志中的元数据。
上述数据写入过程中未尽的内容可以参考图5a所示实施例中的介绍,在此不赘述。
值得说明的是,上述文件准备写过程和数据写入过程中产生的日志都是以追加的方式追加到最后日志段中的,但是,当最后日志段达到封闭条件时,需要封闭当前的最后日志段,申请新的日志段作为最后的日志段。因此,若当前的最后日志段达到封闭条件,则文件系统访问进程封闭该最后日志段,生成新的检查点,并向逻辑卷服务进程申请新的日志段作为最后日志段。其中,封闭条件为最后日志段中包含的日志条数达到预设条数,或者,最后日志段对应的数据量达到预设容量值。
逻辑卷服务进程在接收到文件系统访问进程的申请时,可选地,可以一方面生成新的日志段,一方面向根服务器申请为新日志段分配相应的数据块和数据块服务进程。
综上,在数据写入的整个过程中,一方面,在数据写入之前,基于各存储节点维护的逻辑卷的元数据信息即日志段和检查点的存储位置信息,可以使得计算节点恢复目标逻辑卷的最新状态,另一方面,在数据写入过程中,将生成记录数据写入过程执行的任何操作和写入的数据的日志,并按照序号编号各日志,将各日志以append-only方式追加到当前的最后日志段中,以便能够基于日志序列号实现读写一致性。
此外，在数据写入过程中，将元数据和数据一次提交落盘，即在同一日志中同时包含元数据和数据，并将日志存入到存储节点的数据块中，同时还将元数据缓存在内存中，可以使得后续数据读取进程能够快速批量拉取最新日志、更新本地的文件系统镜像状态（即恢复目标逻辑卷的最新数据状态），降低读取时延。
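下面用一小段示意性的Python代码（接口为说明用的假设）概括上述元数据与数据一次落盘、元数据常驻内存缓存的提交方式：

```python
# 示意代码：一次提交落盘并缓存元数据（接口为假设，非实际实现）
def commit_write(storage_node, chunk_id, log):
    storage_node.append(chunk_id, log)              # 同一条日志携带元数据和写入数据，一次追加落盘
    storage_node.cache_metadata(log.lsn, {"fid": log.fid, "file_name": log.file_name})
    # 数据读取进程随后可按日志序列号批量拉取内存中的元数据，快速更新本地的文件系统镜像状态
```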
下面结合图6a至图7b所示实施例对数据读取的完整流程进行介绍。
图6a为本发明实施例提供的又一种文件系统数据访问方法的流程图,本发明实施例提供的该方法可以由图1所示架构下的计算节点来执行。如图6a所示,可以包括如下步骤:
601a、响应于数据读取进程触发的挂载操作,计算节点向目标存储节点发送第二挂载请求,第二挂载请求中包括目标逻辑卷的标识、用户信息和只读挂载方式标识。
由图1所示实施例可知,该目标存储节点是多个存储节点中的任一个。
本实施例介绍的是某计算节点中的业务应用需要进行数据读取处理的情况,因此,该业务应用会启动该计算节点中的数据读取进程。由前述本实施例的介绍可知,当需要访问业务应用对应的目标逻辑卷时,首先需要挂载该目标逻辑卷到计算节点本地,因此,本实施例先介绍目标逻辑卷在数据读取场景下的挂载过程。
计算节点中的数据读取进程被启动后，该数据读取进程会触发挂载操作，该挂载操作可以是数据读取进程向计算节点的处理器发送挂载通知。基于该挂载操作，计算节点会向任一存储节点（称为目标存储节点）发送包括目标逻辑卷的标识、用户信息和只读挂载方式标识的第二挂载请求。
其中,用户信息可以是用户账号、用户姓名等信息;只读挂载方式标识对应于数据读取进程,即本发明实施例中允许数据读取进程以只读(read-only)方式挂载逻辑卷,该只读挂载方式意味着对于同一目标逻辑卷,允许多个数据读取进程进行数据读取,但是不能进行数据修改、写入等操作。
602a、计算节点接收目标存储节点在确定用户信息通过目标逻辑卷的用户权限验证后发送的目标逻辑卷的日志段和检查点存储位置信息。
与数据写入过程不同,在数据读取过程中,对于目标逻辑卷的挂载仅需要满足用户权限的验证即可,因为只读挂载方式允许多个数据读取进程同时读取数据。
603a、计算节点根据日志段和检查点存储位置信息读取日志段和检查点的日志元数据以恢复目标逻辑卷的数据状态。
604a、计算节点基于目标逻辑卷的数据状态进行数据访问处理。
上述步骤中的未尽描述可以参考图4a所示实施例,在此不赘述。通过该挂载过程,计算节点可以访问目标逻辑卷当前具有的所有数据。
计算节点在恢复目标逻辑卷的最新数据状态后,可以进行后续的数据读取过程,该数据读取过程往往包括文件准备读阶段和数据读取阶段,将在后续图7a所示实施例中说明。
前述图6a所示实施例是以计算节点和存储节点的节点角度对本发明实施例提供的文件系统数据访问方法的实现过程进行说明的,下面基于图2或图3所示的逻辑系统架构,结合图6b所示实施例介绍该文件系统数据访问方法的另一种具体实现过程。
图6b为本发明实施例提供的还一种文件系统数据访问方法的流程图，本发明实施例提供的该方法可以由图2或图3所示架构下的计算节点中的文件系统访问进程（fuse worker）来执行。如图6b所示，该方法包括如下步骤：
601b、响应于计算节点中的数据读取进程触发的挂载操作,计算节点中的文件系统访问进程向存储节点中的逻辑卷服务进程发送第二挂载请求,第二挂载请求中包括目标逻辑卷的标识、用户信息和只读挂载方式标识。
本实施例介绍的是某计算节点中的业务应用需要进行数据读取处理的情况,因此,该业务应用会启动该计算节点中的数据读取进程。该数据读取进程首先会触发针对目标逻辑卷的挂载操作,比如向文件系统访问进程发送上述第二挂载请求,以将目标逻辑卷挂载到本地以进行后续的数据访问。
602b、文件系统访问进程接收逻辑卷服务进程在确定用户信息通过目标逻辑卷的用户权限验证后发送的目标逻辑卷对应的日志段和检查点存储位置信息。
本发明实施例中,允许多个数据读取进程同时对同一目标逻辑卷进行数据读取。因此,逻辑卷服务进程在验证用户具有使用目标逻辑卷的权限时即可允许其以只读挂载方式挂载目标逻辑卷。同时,逻辑卷服务进程将目标逻辑卷的日志段和检查点的存储位置信息反馈给文件系统访问进程。其中,该存储位置信息可以是逻辑卷服务进程根据维护的各逻辑卷的日志段和检查点标识与数据块标识间的对应关系确定的目标逻辑卷的日志段和检查点所在的数据块的标识,或者,该存储位置信息还可以进一步包括该数据块标识对应的数据块服务进程标识,该数据块服务进程标识可以是逻辑卷服务进程通过以上述数据块标识查询任一根服务器而获得的。
603b、文件系统访问进程根据日志段和检查点存储位置信息读取日志段和检查点的日志元数据以恢复目标逻辑卷的数据状态。
604b、文件系统访问进程基于目标逻辑卷的数据状态进行数据访问处理。
上述步骤中的未尽描述可以参考前述实施例中的介绍,在此不赘述。通过该挂载过程,文件系统访问进程可以访问目标逻辑卷当前具有的所有数据。
文件系统访问进程在恢复目标逻辑卷的最新数据状态后，可以进行后续的数据读取过程，该数据读取过程往往包括文件准备读阶段和数据读取阶段，将在后续图7b所示实施例中说明。
图7a为图6a所示实施例中步骤604a的一种实现过程的流程图,如图7a所示,步骤604a可以具体包括如下具体步骤:
701a、响应于数据读取进程触发的文件准备读操作，计算节点根据数据读取进程上次已读取到的日志序列号j、日志序列号j对应的第三数据块和第三存储节点，向第三存储节点发送数据同步请求，以获取日志序列号j后已缓存日志的元数据。
702a、计算节点接收第三存储节点发送的日志序列号j后已缓存日志的元数据和指示信息,已缓存日志所对应的日志序列号为j+1到j+m,指示信息用于指示日志序列号j+m所对应的日志段是否已经被封闭。
703a、计算节点根据日志序列号j后已缓存日志的元数据和指示信息更新目标逻辑卷的数据状态。
在数据读取进程真正进行数据读取之前,往往需要先进行一些准备操作,该准备操作称为文件准备读操作,实际应用中,该文件准备读操作比如可以是打开某个文件。本实施例中先介绍该文件准备读操作的处理过程。
数据读取进程可以通过调用文件准备读对应的接口以触发上述文件准备读操作。
为方便理解步骤701a的执行，本实施例先介绍实际的数据读写场景：某个计算节点在不断向目标逻辑卷中写入数据，另外的多个计算节点在不断读取目标逻辑卷的数据，读取的目的比如为进行大数据分析、数据备份等等。但是由于读取数据的多个计算节点不知道写入数据的计算节点何时会写入数据，因此往往是基于一定的机制开始进行读取的，比如周期性读取。对于读取数据的任一计算节点来说，假设其上次读取到某条日志中的数据，当前需要继续读取该条日志之后被写入的数据。但是，自目标逻辑卷挂载完成到当前时刻的过程中，可能又有新的数据已经被写入。因此，为了能够读取到自上次读取到的日志开始的后续已经存在的所有日志中的数据，计算节点需要知道当前时刻，目标逻辑卷自其上次读取到的日志到当前的最后日志段所包含的各日志段所在的位置，即在哪个存储节点的哪个数据块中。
此时,可选地,计算节点可以如前述实施例中获取最后日志段所在的数据块和存储节点的过程一样,直接通过查询目标存储节点和根服务器来获得。
但是,可选地,由于计算节点知道其上次读取到了哪条日志,因此,计算节点也可以通过本实施例提供的方法获取自其上次读取到的日志到当前的最后日志段所包含的各日志段所在的位置。
计算节点本地可以存储有其数据读取进程上次已读取到的日志序列号j、日志序列号j对应的第三数据块和第三数据块所对应的第三存储节点。从而,计算节点可以向第三存储节点发送数据同步请求,以获取日志序列号j后已缓存日志的元数据。
承接前述实施例中的数据写入过程的写入结果，假设目标逻辑卷当前已经存入的最后日志的日志序列号为i+2，则可以理解的是，j<(i+2)。
第三数据块中至少包含日志序列号j所属日志段中的各条日志,也有可能还会包含日志序列号j所属日志段后的一个或多个日志段,或者,第三存储节点中的其他排在第三数据块之后的数据块中也可能包含日志序列号j所属日志段后的一个或多个日志段,因此,第三存储节点可以将日志序列号j后本地已缓存日志的元数据反馈给计算节点。
另外，由前述实施例中提到的：如果某个日志段满足封闭条件，则该日志段会被封闭。因此，假设第三存储节点中存储有日志序列号j+1到j+m的日志，其中，m>1，(j+m)≤(i+2)，则第三存储节点可以确定日志序列号j+m所属的日志段是否封闭。如果封闭，说明在其他存储节点的其他数据块中还包含日志序列号位于j+m之后的其他日志，此时，第三存储节点还向计算节点反馈指示日志序列号j+m所属的日志段已经封闭的指示消息；相反地，如果没有封闭，说明目标逻辑卷最后的日志序列号即为j+m，日志序列号j+m所属的日志段并未封闭，因此此时，第三存储节点还向计算节点反馈指示日志序列号j+m所属的日志段并未封闭的指示消息。
从而,计算节点根据第三存储节点发送的日志序列号在j+1到j+m的已缓存日志的元数据和该指示信息更新目标逻辑卷的数据状态。
具体地,若该指示信息指示日志序列号j+m所对应的日志段没有被封闭,则计算节点直接根据日志序列号在j+1到j+m的已缓存日志的元数据更新目标逻辑卷的数据状态,即在内存中回放j+1到j+m的已缓存日志的元数据。
相反地,若该指示信息指示日志序列号j+m所对应的日志段已经被封闭,则计算节点通过目标存储节点和根服务器确定日志序列号j+m之后已缓存日志所存储于的各第四数据块和各第四数据块各自对应的第四存储节点;计算节点向第四存储节点发送数据同步请求,以获取日志序列号j+m后已缓存日志的元数据;计算节点根据日志序列号在j+1到j+m的已缓存日志的元数据以及j+m后的已缓存日志的元数据更新目标逻辑卷的数据状态。
其中,通过目标存储节点和根服务器确定日志序列号j+m之后已缓存日志所存储于的各第四数据块和各第四数据块各自对应的第四存储节点的过程,与前述实施例中获得最后日志段所在的第一数据块和第一存储节点的过程类似,都是基于目标存储节点中维护的日志段标识与数据块标识间的对应关系以及根服务器中维护的数据块标识与存储节点标识间的对应关系实现的,在此不赘述。
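下面给出一段示意性的Python代码（接口均为说明用的假设，并非本发明的实际实现），示意上述基于日志序列号j的同步逻辑：先从第三存储节点拉取j之后的已缓存日志元数据，若所在日志段已封闭，再经目标存储节点与根服务器定位后续数据块继续拉取，最后按序列号回放更新数据状态。

```python
# 示意代码：基于日志序列号的读侧同步（接口均为假设）
def sync_after(j, third_node, target_node, root_server, cluster, state):
    logs, sealed = third_node.fetch_cached_logs_after(j)       # 返回 j+1..j+m 的元数据及封闭指示信息
    if sealed and logs:                                         # 日志段已封闭：其他数据块中还有更新的日志
        last = logs[-1]["lsn"]                                  # 即 j+m
        for chunk_id in target_node.query_chunks_after(last):            # 后续日志段 -> 各第四数据块
            node = cluster[root_server.query_chunk_owner(chunk_id)]      # 数据块 -> 第四存储节点
            more, _ = node.fetch_cached_logs_after(last)
            logs.extend(more)
    for log in sorted(logs, key=lambda l: l["lsn"]):            # 按日志序列号依次回放元数据
        state[log["fid"]] = log["meta"]
    return state
```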
计算节点在恢复出目标逻辑卷到当前时刻的最新状态后，真正数据读取前的文件准备读阶段处理完成。此时，基于恢复的目标逻辑卷的最新数据状态，数据读取进程可以读取其中的任一数据，并不一定是接着上次读到的日志继续读。从而，数据读取进程接下来可以进行真正的数据读取过程，如下述步骤：
704a、响应于数据读取进程发送的数据读取请求,数据读取请求中包括待读取的文件信息,计算节点通过目标存储节点和根服务器确定与待读取的文件信息对应的第五存储节点和第五数据块,以通过第五存储节点从第五数据块中读取数据。
其中,待读取的文件信息比如包括文件名称、需要读取的数据在文件中的位置(offset)等。由于经过前述的目标逻辑卷的数据状态恢复过程,计算节点已经知道各日志的元数据信息,而每个日志的元数据中记录有文件信息,包括文件名、offset、文件FID等等,因此,计算节点可以基于数据读取请求中的文件信息定位到某个日志,从而基于该日志的日志序列号查询目标存储节点,获得该日志序列号所属的日志段所在的第五数据块,并以第五数据块的标识查询根服务器,获得对应的第五存储节点,从而可以向第五存储节点发送用于读取第五数据块中相应数据的读取请求,读取到所需的数据。
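作为示意，下面的Python代码（接口与索引结构均为说明用的假设）描述了根据待读取的文件信息定位第五数据块与第五存储节点并读取数据的过程：

```python
# 示意代码：按文件信息定位并读取数据（接口与索引结构均为假设）
def read_file_data(file_name, offset, state_index, target_node, root_server, cluster):
    lsn = state_index[(file_name, offset)]                    # 恢复出的数据状态中记录了该数据对应日志的序列号
    chunk_id = target_node.query_segment_chunk_by_lsn(lsn)    # 该序列号所属日志段 -> 第五数据块
    node_id = root_server.query_chunk_owner(chunk_id)         # 第五数据块 -> 第五存储节点
    return cluster[node_id].read_log_data(chunk_id, lsn)      # 从第五数据块中读出该日志携带的数据
```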
在一可选实施例中,响应于数据读取进程发送的数据读取请求,计算节点还可以再次执行步骤701a-703a中获取自上次读取到的日志序列号j到当前最新的日志序列号之间的日志元数据的过程,因为在数据读取进程触发文件准备读操作到触发数据读取请求之间,可能还有新的日志写入到目标逻辑卷中。
前述图7a所示实施例是以计算节点和存储节点的节点角度对数据读取过程进行说明的,下面基于图2或图3所示的逻辑系统架构,结合图7b所示实施例进一步说明当计算节点和存储节点具有图2或图3所示的内部服务进程组成时,图7a所示的数据读取过程的具体实现过程。
图7b为图6b所示实施例中步骤604b的一种实现过程的流程图,如图7b所示,步骤604b可以具体包括如下具体步骤:
701b、响应于数据读取进程触发的文件准备读操作,文件系统访问进程根据数据读取进程上次已读取到的日志序列号j、日志序列号j对应的第三数据块和第三数据块服务进程,向第三数据块服务进程发送数据同步请求,以获取日志序列号j后已缓存日志的元数据。
702b、文件系统访问进程接收第三数据块服务进程发送的日志序列号j后已缓存日志的元数据和指示信息，已缓存日志所对应的日志序列号为j+1到j+m，指示信息用于指示日志序列号j+m所对应的日志段是否已经被封闭。
703b、文件系统访问进程根据日志序列号j后已缓存日志的元数据和指示信息更新目标逻辑卷的数据状态。
可选地,若指示信息指示日志序列号j+m所对应的日志段没有被封闭,则文件系统访问进程根据日志序列号在j+1到j+m的已缓存日志的元数据更新目标逻辑卷的数据状态。
可选地,若指示信息指示日志序列号j+m所对应的日志段已经被封闭,则文件系统访问进程通过逻辑卷服务进程和根服务器确定日志序列号j+m之后已缓存日志所存储于的各第四数据块和各第四数据块各自对应的第四数据块服务进程;文件系统访问进程向第四数据块服务进程发送数据同步请求,以获取日志序列号j+m后已缓存日志的元数据;文件系统访问进程根据日志序列号在j+1到j+m的已缓存日志的元数据以及j+m后的已缓存日志的元数据更新目标逻辑卷的数据状态。
704b、文件系统访问进程接收数据读取进程发送的数据读取请求,数据读取请求中包括待读取的文件信息。
705b、文件系统访问进程通过逻辑卷服务进程和根服务器确定与待读取的文件信息对应的第五数据块服务进程和第五数据块,以通过第五数据块服务进程从第五数据块中读取数据。
上述数据读取过程中未尽的内容可以参考图7a等所示实施例中的相关介绍,在此不赘述。
综上,结合整个数据写入和数据读取过程,本发明实施例实现了基于日志的分布式的文件系统的共享方案。具体地,使用日志序列号作为读写双方状态对齐的标准,避免了读写冲突,保证了读写一致性,可以支持一写多读的数据访问方式,多个读取方不需要等待,降低了读取数据的延迟。另外,每个存储节点中都同步地维护有各逻辑卷的日志元数据信息,相当于实现了对各逻辑卷的日志信息的分布式管理,任一计算节点可以通过任一存储节点中维护的日志元数据恢复相应逻辑卷的数据状态以进行后续的数据访问处理,使得文件系统的扩展性得到很大的提高。
图8a为本发明实施例提供的再一种文件系统数据访问方法的流程图,本发明实施例提供的该方法可以由图1所示架构下的目标存储节点来执行,该目标存储节点为文件系统的多个存储节点中的任一个。如图8a所示,该方法包括如下步骤:
801a、目标存储节点接收计算节点发送的与目标逻辑卷对应的挂载请求。
该计算节点是文件系统中的任一计算节点。
802a、目标存储节点将目标逻辑卷对应的日志段和检查点存储位置信息发送至计算节点,以使计算节点基于日志段和检查点存储位置信息恢复目标逻辑卷的数据状态用于进行数据访问处理。
计算节点基于日志段和检查点存储位置信息从相应的存储位置读取到日志元数据,在内存中回放读取到的日志元数据以恢复目标逻辑卷的数据状态。
下面结合如下两个可选实施例对数据写入和数据读取过程中目标存储节点的具体处理过程进行说明。
在一可选实施例中,当计算节点中的某业务应用触发了数据写入过程,即由于需要进行数据写入而需要挂载目标逻辑卷时,目标存储节点接收计算节点发送的第一挂载请求,该第一挂载请求中包括可读写挂载方式标识、目标逻辑卷的标识和用户信息,其中,可读写挂载方式标识与计算节点中的数据写入进程对应,意味着数据写入进程需要以可读写(write-read)方式挂载目标逻辑卷,该方式下,在某个时候目标逻辑卷只能允许被一个数据写入进程占用。此时,目标存储节点在确定第一挂载请求中的用户信息通过目标逻辑卷的用户权限验证以及确定目标逻辑卷当前并未被以可读写方式挂载即被其他数据写入进程占用后发送目标逻辑卷对应的日志段和检查点存储位置信息至计算节点。
在另一可选实施例中,当计算节点中的某业务应用触发了数据读取过程,即由于需要进行数据读取而需要挂载目标逻辑卷时,目标存储节点接收计算节点发送的第二挂载请求,第二挂载请求中包括只读挂载方式标识、目标逻辑卷的标识和用户信息,该只读挂载方式标识与计算节点中的数据读取进程对应,意味着数据读取进程需要以只读(read-only)方式挂载目标逻辑卷,该方式下,目标逻辑卷可以被多个数据读取进程共同占用。此时,目标存储节点在确定第二挂载请求中的用户信息通过目标逻辑卷的用户权限验证后发送目标逻辑卷对应的日志段和检查点存储位置信息至计算节点。
在目标逻辑卷挂载后的数据写入或数据读取过程中,该目标存储节点还具有如下作用:
目标存储节点接收计算节点发送的查询请求,查询请求用于查询目标逻辑卷对应的目标日志段所在的数据块;
目标存储节点根据维护的目标逻辑卷的日志段标识和数据块标识间的对应关系,确定目标日志段所在的数据块的标识;
目标存储节点将数据块的标识发送至计算节点。
比如,与图5a所示实施例中介绍的数据写入过程相对应的,当计算节点基于数据写入进程的触发而生成某条日志比如图5a所示实施例中的第一日志或第二日志后,计算节点需要将当前生成的日志存入到目标逻辑卷的最后日志段所在的数据块中,为此,计算节点需要通过该目标存储节点确定该数据块以及该数据块所在的存储节点。
基于此,目标存储节点在数据写入过程中还需要执行如下步骤:
目标存储节点接收计算节点在生成日志后发送的第一查询请求,第一查询请求用于查询目标逻辑卷对应的最后日志段所在的数据块;
目标存储节点根据维护的目标逻辑卷的日志段标识和数据块标识间的对应关系,确定最后日志段所在的第一数据块的标识;
目标存储节点将第一数据块的标识发送至计算节点,以供计算节点根据第一数据块的标识查询根服务器以获得第一数据块对应的第一存储节点以及将所述日志追加到第一数据块中以及请求第一存储节点缓存该日志中的元数据。
其中,根服务器中维护有数据块标识和存储节点标识间的对应关系。
再比如,与图7a所示实施例中介绍的数据读取过程相对应的,当计算节点需要确定日志序列号j+m之后已缓存日志所存储于的数据块时,目标存储节点在数据读取过程中还需要执行如下步骤:
目标存储节点接收计算节点发送的第二查询请求,第二查询请求是在计算节点在获取到数据读取进程自上次已读取到的日志序列号j到日志序列号j+m间的日志后发出的,第二查询请求用于请求日志序列号j+m之后的已缓存日志所在的数据块;
目标存储节点根据维护的日志段标识和数据块标识间的对应关系,确定日志序列号j+m之后的已缓存日志所在的各第二数据块的标识;
目标存储节点将各第二数据块的标识发送至计算节点,以供计算节点根据各第二数据块的标识查询根服务器获得各第二数据块各自对应的第二存储节点以从各第二数据块中获取日志序列号在j+m后的已缓存日志的元数据以及根据日志序列号在j+m后的已缓存日志的元数据更新目标逻辑卷的数据状态。
上述是目标存储节点在数据写入和数据读取过程中可能需要执行的一些处理,除此之外,本发明实施例中还为每个存储节点部署了垃圾回收处理机制。
因为逻辑卷的数据写入是以日志递增的方式写入，即append-only方式写入，那么，当某条日志中的数据被修改，该条日志中的数据和元数据就变成了垃圾，即该条日志变为了无效日志。因此，提供了垃圾回收机制以重新整理和回收之前写入数据的存储空间。
具体地,以目标存储节点来说,其可以通过如下过程进行垃圾回收处理:
目标存储节点选取目标逻辑卷对应的K个日志段和所述K个日志段对应的检查点以恢复对应的数据状态;
目标存储节点在确定K个日志段中的M个日志段的垃圾比例达到预设阈值时将M个日志段中的无效日志清空,将经过清空处理后的M个日志段写入新数据块中,并回收M个日志段所在的原数据块,以及更新M个日志段与数据块的对应关系,其中,新数据块是目标存储节点从根服务器中申请获得的。
实际应用中,在进行垃圾回收处理前,目标存储节点可以先以垃圾回收模式挂载目标逻辑卷,该目标逻辑卷可以是众多逻辑卷中的任一个。垃圾回收模式意味着可以对目标逻辑卷的日志段和检查点进行重写。
此时,目标存储节点可以从目标逻辑卷的各日志段和各日志段对应的检查点中选取K个日志段和K个日志段对应的检查点以恢复对应的数据状态,即在内存中回放这K个日志段和检查点的日志元数据,以恢复目标逻辑卷与这K个日志段对应的数据状态。其中,这K个日志段可以是连续的K个日志段,但是并不以此为限。
之后,目标存储节点可以逐条遍历这K个日志段的所有日志的元数据,识别每条日志的有效性。举例来说,如果前面某条日志中记录了创建了某个文件,但是在后面某条日志中记录了该文件被删除了,那么前面的这条日志就变为无效的。再比如,如果之前的某日志中记录了在某文件中写入了一段数据,但是后面的某条日志中记录了该段数据被新数据覆盖了,那么前面的这条日志就是无效的。
从而，目标存储节点可以依次确定这K个日志段中各日志段的垃圾比例，该垃圾比例定义为一个日志段中无效日志条数与该日志段中包含的总日志条数的比值。
可选地，目标存储节点可以将这K个日志段中所有垃圾比例超过预设阈值的日志段均确定为当前的回收目标，比如假设其中有M个日志段的垃圾比例达到预设阈值，K≥M≥1，则确定这M个日志段都作为回收目标。
可选地，由于垃圾回收过程也会占用目标存储节点的处理资源，比如内存资源，因此，目标存储节点一次回收的日志段个数可以加以控制。基于此，可选地，目标存储节点也可以按照如下策略确定回收目标：目标存储节点依次遍历K个日志段包含的全部日志的元数据，并不断统计当前已遍历的日志中无效日志占已遍历总日志数的比值，即不断统计已经遍历过的日志的垃圾比例，直到遍历到使垃圾比例达到预设阈值时的日志（称为结束日志）或该日志所属日志段的最后一条日志为止。从而，目标存储节点可以确定从K个日志段的第一个日志段到该结束日志所属的日志段间的各日志段作为回收目标，假设共M个日志段作为回收目标。
从而,当目标存储节点在确定K个日志段中的M个日志段的垃圾比例达到预设阈值时,先将M个日志段中的无效日志清空,即清空无效日志中存储的元数据和数据,比如以null-op填充无效日志,以降低无效日志所占的存储空间;然后向根服务器申请新数据块,以将经过清空处理后的M个日志段写入新数据块中,并回收M个日志段所在的原数据块,以及更新M个日志段与数据块的对应关系,即将M个日志段的标识与原数据块的标识间的对应关系更新为M个日志段的标识与新数据块的标识间的对应关系。目标存储节点更新完该对应关系后,还可以将更新后的新对应关系同步给其他存储节点。
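下面给出一段示意性的Python代码（阈值、接口与有效性判定均为说明用的简化假设，实际有效性需按文件删除、数据覆盖等语义逐条判定），概括上述垃圾回收的主要步骤：

```python
# 示意代码：垃圾回收主要步骤（阈值与接口为假设；有效性判定做了简化）
GARBAGE_THRESHOLD = 0.5   # 预设阈值，仅为示例

def collect(segments, root_server, volume_meta):
    """segments: 选取的K个日志段，每段是按序列号排列的日志列表。"""
    latest = {}
    for seg in segments:                        # 先回放恢复状态：同一对象以最新日志为准
        for log in seg:
            latest[log["fid"]] = log["lsn"]
    for seg in segments:
        invalid = [log for log in seg if latest.get(log["fid"]) != log["lsn"]]
        if not seg or len(invalid) / len(seg) < GARBAGE_THRESHOLD:
            continue                            # 垃圾比例未达到预设阈值，暂不回收
        for log in invalid:                     # 清空无效日志的元数据和数据（例如以 null-op 填充）
            log["meta"], log["data"] = None, None
        seg_key = (seg[0]["lsn"], seg[-1]["lsn"])
        new_chunk = root_server.allocate_chunk()            # 向根服务器申请新数据块
        old_chunk = volume_meta.segment_to_chunk[seg_key]
        volume_meta.segment_to_chunk[seg_key] = new_chunk   # 更新日志段与数据块的对应关系
        root_server.reclaim_chunk(old_chunk)                # 回收原数据块所占空间
```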
前述图8a所示实施例是以存储节点的节点角度对存储节点在文件系统数据访问过程中涉及到的处理过程进行了说明,下面基于图2或图3所示的逻辑系统架构,结合图8b所示实施例介绍该存储节点在上述处理过程中的一种具体实现过程。
图8b为本发明实施例提供的还一种文件系统数据访问方法的流程图,本发明实施例提供的该方法的执行至少涉及图2或图3所示架构下的存储节点中的逻辑卷服务进程(VS)和垃圾回收服务进程(GC)。如图8b所示,该方法包括如下步骤:
801b、存储节点中的逻辑卷服务进程接收计算节点中的文件系统访问进程发送的与目标逻辑卷对应的挂载请求。
802b、逻辑卷服务进程将目标逻辑卷对应的第一日志段和检查点存储位置信息发送至文件系统访问进程，以使文件系统访问进程基于第一日志段和检查点存储位置信息恢复目标逻辑卷的数据状态用于进行数据访问处理。
下面结合如下两个可选实施例对数据写入和数据读取过程中存储节点中的逻辑卷服务进程的具体处理过程进行说明。
在一可选实施例中,当计算节点中的某业务应用触发了数据写入过程,即由于需要进行数据写入而需要挂载目标逻辑卷时,向该计算节点中的文件系统访问进程触发针对目标逻辑卷的挂载操作,此时,存储节点中的逻辑卷服务进程接收文件系统访问进程发送的第一挂载请求,第一挂载请求中包括可读写挂载方式标识、目标逻辑卷的标识和用户信息。从而,逻辑卷服务进程在确定用户信息通过目标逻辑卷的用户权限验证以及确定目标逻辑卷当前并未被以可读写挂载方式挂载后发送目标逻辑卷对应的第一日志段和检查点存储位置信息至文件系统访问进程。
在另一可选实施例中,当计算节点中的某业务应用触发了数据读取过程,即由于需要进行数据读取而需要挂载目标逻辑卷时,向该计算节点中的文件系统访问进程触发针对目标逻辑卷的挂载操作,此时,存储节点中的逻辑卷服务进程接收文件系统访问进程发送的第二挂载请求,第二挂载请求中包括只读挂载方式标识、目标逻辑卷的标识和用户信息。从而,逻辑卷服务进程在确定用户信息通过目标逻辑卷的用户权限验证后发送目标逻辑卷对应的第一日志段和检查点存储位置信息至文件系统访问进程。
在目标逻辑卷挂载后，文件系统访问进程后续进行数据写入或数据读取的过程中，该逻辑卷服务进程还具有如下作用：
逻辑卷服务进程接收文件系统访问进程发送的查询请求,该查询请求用于查询目标逻辑卷对应的目标日志段所在的数据块;
逻辑卷服务进程根据维护的日志段标识和数据块标识间的对应关系,确定目标日志段所在的数据块的标识;
逻辑卷服务进程将数据块的标识发送至文件系统访问进程。
比如,与图5b所示实施例中介绍的数据写入过程相对应的,当文件系统访问进程基于数据写入进程的触发而生成某条日志比如图5b所示实施例中的第一日志或第二日志后,文件系统访问进程需要将当前生成的日志存入到目标逻辑卷的最后日志段所在的数据块中,为此,文件系统访问进程需要通过该逻辑卷服务进程确定该数据块以及该数据块所对应的数据块服务进程。
基于此,逻辑卷服务进程在数据写入过程中还需要执行如下步骤:
逻辑卷服务进程接收文件系统访问进程在生成日志后发送的第一查询请求,第一查询请求用于查询目标逻辑卷对应的最后日志段所在的数据块;
逻辑卷服务进程根据维护的日志段标识和数据块标识间的对应关系,确定最后日志段所在的第一数据块的标识;
逻辑卷服务进程将第一数据块的标识发送至文件系统访问进程,以供文件系统访问进程根据第一数据块的标识查询根服务器以获得第一数据块对应的第一数据块服务进程以及将日志追加到第一数据块中以及请求第一数据块服务进程缓存该日志中的元数据。
其中,根服务器中维护有数据块标识和存储节点标识间的对应关系。
再比如,与图7b所示实施例中介绍的数据读取过程相对应的,当文件系统访问进程需要确定日志序列号j+m之后已缓存日志所存储于的数据块时,逻辑卷服务进程在数据读取过程中还需要执行如下步骤:
逻辑卷服务进程接收文件系统访问进程发送的第二查询请求,第二查询请求是在文件系统访问进程在获取到数据读取进程自上次已读取到的日志序列号j到日志序列号j+m间的日志后发出的,第二查询请求用于请求日志序列号j+m之后的已缓存日志所在的数据块;
逻辑卷服务进程根据维护的日志段标识和数据块标识间的对应关系,确定日志序列号j+m之后的已缓存日志所在的各第二数据块的标识;
逻辑卷服务进程将各第二数据块的标识发送至文件系统访问进程,以供文件系统访问进程根据各第二数据块的标识查询根服务器获得各第二数据块各自对应的第二数据块服务进程以从各第二数据块中获取日志序列号在j+m后的已缓存日志的元数据以及根据日志序列号在j+m后的已缓存日志的元数据更新目标逻辑卷的数据状态。
因为逻辑卷的数据写入是以日志递增的方式写入,即append-only方式写入,那么,当某条日志中的数据被修改,该条日志中的数据和元数据就变成了垃圾,即该条日志变为了无效日志。因此,本发明实施例还提供了垃圾回收机制以重新整理和回收之前写入数据的存储空间,具体由每个存储节点中部署的垃圾回收服务进程(GC)与其他服务进程配合来完成垃圾回收。
具体地,以任一存储节点来说,其可以通过如下过程进行垃圾回收处理:
存储节点中的逻辑卷服务进程接收存储节点中的垃圾回收服务进程发送的与目标逻辑卷对应的第三挂载请求;
逻辑卷服务进程将目标逻辑卷对应的第二日志段和检查点存储位置信息发送至垃圾回收服务进程,以使垃圾回收服务进程从第二日志段和检查点选取K个日志段和检查点以恢复对应的数据状态并在确定K个日志段中的M个日志段的垃圾比例达到预设阈值时回收M个日志段所在的原数据块,K≥M≥1;
逻辑卷服务进程接收垃圾回收服务进程发送的日志段更新通知,日志段更新通知中包括M个日志段与M个日志段所在的新数据块间的对应关系;
逻辑卷服务进程更新M个日志段与数据块的原对应关系为所述对应关系。
上述垃圾回收过程中的未尽描述可以参见图8a所示实施例中的相关介绍,在此不赘述。
以下将详细描述本发明的一个或多个实施例的文件系统数据访问装置。本领域技术人员可以理解，这些文件系统数据访问装置均可使用市售的硬件组件通过本方案所教导的步骤进行配置来构成。
图9为本发明实施例提供的文件系统数据访问装置的结构示意图,如图9所示,该装置包括:发送模块11、接收模块12、恢复模块13、处理模块14。
发送模块11,用于向目标存储节点发送针对目标逻辑卷触发的挂载请求,所述目标存储节点为所述多个存储节点中的任一个,所述目标逻辑卷对应于多个存储节点中的至少部分存储资源。
接收模块12,用于接收所述目标存储节点发送的所述目标逻辑卷对应的日志段和检查点存储位置信息。
恢复模块13,用于根据所述日志段和检查点存储位置信息读取所述日志段和检查点的日志元数据以恢复所述目标逻辑卷的数据状态。
处理模块14,用于基于所述目标逻辑卷的数据状态进行数据访问处理。
图9所示装置可以执行图4a、图5a、图6a、图7a所示实施例的方法,本实施例未详细描述的部分,可参考对图4a、图5a、图6a、图7a所示实施例的相关说明。
以上描述了文件系统数据访问装置的内部功能和结构,在一个可能的设计中,该文件系统数据访问装置的结构可实现为计算节点,该计算节点比如为应用服务器,如图10所示,该计算节点可以包括:处理器21和存储器22。其中,所述存储器22用于存储支持文件系统数据访问装置执行上述图4a、图5a、图6a、图7a所示实施例中提供的文件系统数据访问方法的程序,所述处理器21被配置为用于执行所述存储器22中存储的程序。
所述程序包括一条或多条计算机指令,其中,所述一条或多条计算机指令被所述处理器21执行时能够实现如下步骤:
向目标存储节点发送针对目标逻辑卷触发的挂载请求,所述目标存储节点为文件系统包含的多个存储节点中的任一个,所述目标逻辑卷对应于所述多个存储节点中的至少部分存储资源;
接收所述目标存储节点发送的所述目标逻辑卷对应的日志段和检查点存储位置信息;
根据所述日志段和检查点存储位置信息读取所述日志段和检查点的日志元数据以恢复所述目标逻辑卷的数据状态;
基于所述目标逻辑卷的数据状态进行数据访问处理。
可选地，所述处理器21还用于执行前述图4a、图5a、图6a、图7a所示实施例中的全部或部分步骤。
其中，所述文件系统数据访问装置的结构中还可以包括通信接口23，用于文件系统数据访问装置与其他设备或通信网络通信。
另外,本发明实施例提供了一种计算机存储介质,用于储存文件系统数据访问装置所用的计算机软件指令,其包含用于执行上述图4a、图5a、图6a、图7a所示方法实施例中文件系统数据访问方法所涉及的程序。
图11为本发明实施例提供的另一文件系统数据访问装置的结构示意图，如图11所示，该装置包括：接收模块31、获取模块32、发送模块33。
接收模块31,用于接收计算节点发送的与目标逻辑卷对应的挂载请求。
获取模块32,用于获取所述目标逻辑卷对应的日志段和检查点存储位置信息。
发送模块33,用于将所述目标逻辑卷对应的日志段和检查点存储位置信息发送至所述计算节点,以使所述计算节点基于所述日志段和检查点存储位置信息恢复所述目标逻辑卷的数据状态用于进行数据访问处理;所述目标逻辑卷对应于所述多个存储节点中的至少部分存储资源。
图11所示装置可以执行图8a所示实施例的方法,本实施例未详细描述的部分,可参考对图8a所示实施例的相关说明。
以上描述了文件系统数据访问装置的内部功能和结构,在一个可能的设计中,该文件系统数据访问装置的结构可实现为存储节点,如图12所示,该存储节点可以包括:处理器41和存储器42。其中,所述存储器42用于存储支持文件系统数据访问装置执行上述图8a所示实施例中提供的文件系统数据访问方法的程序,所述处理器41被配置为用于执行所述存储器42中存储的程序。
所述程序包括一条或多条计算机指令,其中,所述一条或多条计算机指令被所述处理器41执行时能够实现如下步骤:
接收计算节点发送的与目标逻辑卷对应的挂载请求;
将所述目标逻辑卷对应的日志段和检查点存储位置信息发送至所述计算节点,以使所述计算节点基于所述日志段和检查点存储位置信息恢复所述目标逻辑卷的数据状态用于进行数据访问处理;所述目标逻辑卷对应于所述多个存储节点中的至少部分存储资源。
可选地,所述处理器41还用于执行前述图8a所示实施例中的全部或部分步骤。
其中，所述文件系统数据访问装置的结构中还可以包括通信接口43，用于文件系统数据访问装置与其他设备或通信网络通信。
另外,本发明实施例提供了一种计算机存储介质,用于储存文件系统数据访问装置所用的计算机软件指令,其包含用于执行上述图8a所示方法实施例中文件系统数据访问方法所涉及的程序。
以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。本领域普通技术人员在不付出创造性的劳动的情况下,即可以理解并实施。
通过以上的实施方式的描述，本领域的技术人员可以清楚地了解到各实施方式可借助软件加必需的通用硬件平台的方式来实现，当然也可以通过硬件和软件结合的方式来实现。基于这样的理解，上述技术方案本质上或者说对现有技术做出贡献的部分可以以计算机产品的形式体现出来，本发明可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质（包括但不限于磁盘存储器、CD-ROM、光学存储器等）上实施的计算机程序产品的形式。
本发明是参照根据本发明实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
在一个典型的配置中,计算设备包括一个或多个处理器(CPU)、输入/输出接口、网络接口和内存。
内存可能包括计算机可读介质中的非永久性存储器,随机存取存储器(RAM)和/或非易失性内存等形式,如只读存储器(ROM)或闪存(flash RAM)。内存是计算机可读介质的示例。
计算机可读介质包括永久性和非永久性、可移动和非可移动媒体可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数据。计算机的存储介质的例子包括,但不限于相变内存(PRAM)、静态随机存取存储器(SRAM)、动态随机存取存储器(DRAM)、其他类型的随机存取存储器(RAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、快闪记忆体或其他内存技术、只读光盘只读存储器(CD-ROM)、数字多功能光盘(DVD)或其他光学存储、磁盒式磁带,磁带磁磁盘存储或其他磁性存储设备或任何其他非传输介质,可用于存储可以被计算设备访问的信息。按照本文中的界定,计算机可读介质不包括暂存电脑可读媒体(transitory media),如调制的数据信号和载波。
最后应说明的是:以上实施例仅用以说明本发明的技术方案,而非对其限制;尽管参照前述实施例对本发明进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本发明各实施例技术方案的精神和范围。

Claims (36)

  1. 一种文件系统数据访问方法,其特征在于,所述文件系统包括至少一个计算节点和多个存储节点,所述方法包括:
    计算节点向目标存储节点发送针对目标逻辑卷触发的挂载请求,所述目标存储节点为所述多个存储节点中的任一个,所述目标逻辑卷对应于所述多个存储节点中的至少部分存储资源;
    所述计算节点接收所述目标存储节点发送的所述目标逻辑卷对应的日志段和检查点存储位置信息;
    所述计算节点根据所述日志段和检查点存储位置信息读取所述日志段和检查点的日志元数据以恢复所述目标逻辑卷的数据状态;
    所述计算节点基于所述目标逻辑卷的数据状态进行数据访问处理。
  2. 根据权利要求1所述的方法,其特征在于,所述计算节点向目标存储节点发送针对目标逻辑卷触发的挂载请求,包括:
    响应于数据写入进程触发的挂载操作,所述计算节点向所述目标存储节点发送第一挂载请求,所述第一挂载请求中包括所述目标逻辑卷的标识、用户信息和可读写挂载方式标识;
    所述计算节点接收所述目标存储节点发送的所述目标逻辑卷对应的日志段和检查点存储位置信息,包括:
    所述计算节点接收所述目标存储节点在确定所述用户信息通过所述目标逻辑卷的用户权限验证以及确定所述目标逻辑卷当前并未被以可读写挂载方式挂载后发送的所述日志段和检查点存储位置信息。
  3. 根据权利要求2所述的方法,其特征在于,所述计算节点基于所述目标逻辑卷的数据状态进行数据访问处理,包括:
    响应于所述数据写入进程触发的文件准备写操作,所述计算节点生成仅包括元数据的第一日志,所述第一日志中包括递增当前最后的日志序列号i后的日志序列号i+1和与所述文件准备写操作对应的文件信息;
    所述计算节点通过所述目标存储节点和根服务器确定所述目标逻辑卷对应的最后日志段所在的第一数据块和所述第一数据块所对应的第一存储节点;
    所述计算节点向所述第一存储节点发送日志存入请求,以请求将所述第一日志追加到所述第一数据块中以及请求所述第一存储节点缓存所述第一日志。
  4. 根据权利要求3所述的方法,其特征在于,所述计算节点通过所述目标存储节点和根服务器确定所述目标逻辑卷对应的最后日志段所在的第一数据块和所述第一数据块所对应的第一存储节点,包括:
    所述计算节点向所述目标存储节点发送第一查询请求，用于查询所述目标逻辑卷对应的最后日志段所在的数据块；
    所述计算节点接收所述目标存储节点发送的所述第一数据块的标识,所述第一数据块的标识是所述目标存储节点根据维护的所述目标逻辑卷的日志段标识和数据块标识间的对应关系确定出的;
    所述计算节点向所述根服务器发送第二查询请求，用于查询所述第一数据块的标识对应的存储节点；
    所述计算节点接收所述根服务器发送的所述第一存储节点的标识,所述第一存储节点的标识是所述根服务器根据维护的数据块标识和存储节点标识间的对应关系确定出的。
  5. 根据权利要求3所述的方法,其特征在于,所述计算节点基于所述目标逻辑卷的数据状态进行数据访问处理,包括:
    响应于所述数据写入进程发送的数据写入请求,所述计算节点生成第二日志,所述数据写入请求中包括写入数据和所述写入数据对应的文件信息,所述第二日志中包括元数据和所述写入数据,所述元数据包括递增当前最后的日志序列号i+1后的日志序列号i+2和所述文件信息;
    所述计算节点通过所述目标存储节点和根服务器确定所述目标逻辑卷对应的最后日志段所在的第二数据块和所述第二数据块所对应的第二存储节点;
    所述计算节点向所述第二存储节点发送日志存入请求,以请求将所述第二日志追加到所述第二数据块中以及请求所述第二存储节点缓存所述第二日志中的元数据。
  6. 根据权利要求3所述的方法,其特征在于,所述方法还包括:
    若所述最后日志段达到封闭条件,则所述计算节点封闭所述最后日志段,生成新的检查点,并向所述目标存储节点申请新的日志段作为最后日志段;
    其中，所述封闭条件为所述最后日志段中包含的日志条数达到预设条数，或者，所述最后日志段对应的数据量达到预设容量值。
  7. 根据权利要求1所述的方法,其特征在于,所述计算节点向目标存储节点发送针对目标逻辑卷触发的挂载请求,包括:
    响应于数据读取进程触发的挂载操作,所述计算节点向所述目标存储节点发送第二挂载请求,所述第二挂载请求中包括所述目标逻辑卷的标识、用户信息和只读挂载方式标识;
    所述计算节点接收所述目标存储节点发送的所述目标逻辑卷对应的日志段和检查点存储位置信息,包括:
    所述计算节点接收所述目标存储节点在确定所述用户信息通过所述目标逻辑卷的用户权限验证后发送的所述日志段和检查点存储位置信息。
  8. 根据权利要求7所述的方法,其特征在于,所述计算节点基于所述目标逻辑卷的数据状态进行数据访问处理,包括:
    响应于所述数据读取进程触发的文件准备读操作,所述计算节点根据所述数据读取进程上次已读取到的日志序列号j、所述日志序列号j对应的第三数据块和第三存储节点,向所述第三存储节点发送数据同步请求,以获取所述日志序列号j后已缓存日志的元数据;
    所述计算节点接收所述第三存储节点发送的所述日志序列号j后已缓存日志的元数据和指示信息,所述已缓存日志所对应的日志序列号为j+1到j+m,所述指示信息用于指示所述日志序列号j+m所对应的日志段是否已经被封闭;
    所述计算节点根据所述日志序列号j后已缓存日志的元数据和所述指示信息更新所述目标逻辑卷的数据状态。
  9. 根据权利要求8所述的方法,其特征在于,所述计算节点根据所述日志序列号j后已缓存日志的元数据和所述指示信息更新所述目标逻辑卷的数据状态,包括:
    若所述指示信息指示所述日志序列号j+m所对应的日志段没有被封闭,则所述计算节点根据日志序列号在j+1到j+m的已缓存日志的元数据更新所述目标逻辑卷的数据状态。
  10. 根据权利要求8所述的方法,其特征在于,所述计算节点根据所述日志序列号j后已缓存日志的元数据和所述指示信息更新所述目标逻辑卷的数据状态,包括:
    若所述指示信息指示所述日志序列号j+m所对应的日志段已经被封闭,则所述计算节点通过所述目标存储节点和根服务器确定所述日志序列号j+m之后已缓存日志所存储于的各第四数据块和所述各第四数据块各自对应的第四存储节点;
    所述计算节点向所述第四存储节点发送数据同步请求,以获取所述日志序列号j+m后已缓存日志的元数据;
    所述计算节点根据所述日志序列号在j+1到j+m的已缓存日志的元数据以及j+m后的已缓存日志的元数据更新所述目标逻辑卷的数据状态。
  11. 根据权利要求8所述的方法,其特征在于,所述计算节点基于所述目标逻辑卷的数据状态进行数据访问处理,包括:
    响应于所述数据读取进程发送的数据读取请求,所述数据读取请求中包括待读取的文件信息,所述计算节点通过所述目标存储节点和根服务器确定与所述待读取的文件信息对应的第五存储节点和第五数据块,以通过所述第五存储节点从所述第五数据块中读取数据。
  12. 一种文件系统数据访问方法,其特征在于,所述文件系统包括至少一个计算节点和多个存储节点,所述方法包括:
    目标存储节点接收计算节点发送的与目标逻辑卷对应的挂载请求,所述目标存储节点为所述多个存储节点中的任一个;
    所述目标存储节点将所述目标逻辑卷对应的日志段和检查点存储位置信息发送至所述计算节点,以使所述计算节点基于所述日志段和检查点存储位置信息恢复所述目标逻辑卷的数据状态用于进行数据访问处理;
    所述目标逻辑卷对应于所述多个存储节点中的至少部分存储资源。
  13. 根据权利要求12所述的方法,其特征在于,所述目标存储节点接收计算节点发送的与目标逻辑卷对应的挂载请求,包括:
    所述目标存储节点接收所述计算节点发送的第一挂载请求,所述第一挂载请求中包括可读写挂载方式标识、所述目标逻辑卷的标识和用户信息,所述可读写挂载方式标识与所述计算节点中的数据写入进程对应;
    所述目标存储节点将所述目标逻辑卷对应的日志段和检查点存储位置信息发送至所述计算节点,包括:
    所述目标存储节点在确定所述用户信息通过所述目标逻辑卷的用户权限验证以及确定所述目标逻辑卷当前并未被以可读写方式挂载后发送所述目标逻辑卷对应的日志段和检查点存储位置信息至所述计算节点。
  14. 根据权利要求12所述的方法,其特征在于,所述目标存储节点接收计算节点发送的与目标逻辑卷对应的挂载请求,包括:
    所述目标存储节点接收所述计算节点发送的第二挂载请求，所述第二挂载请求中包括只读挂载方式标识、所述目标逻辑卷的标识和用户信息，所述只读挂载方式标识与所述计算节点中的数据读取进程对应；
    所述目标存储节点将所述目标逻辑卷对应的日志段和检查点存储位置信息发送至所述计算节点,包括:
    所述目标存储节点在确定所述用户信息通过所述目标逻辑卷的用户权限验证后发送所述目标逻辑卷对应的日志段和检查点存储位置信息至所述计算节点。
  15. 根据权利要求12所述的方法,其特征在于,所述方法还包括:
    所述目标存储节点接收所述计算节点发送的查询请求,所述查询请求用于查询所述目标逻辑卷对应的目标日志段所在的数据块;
    所述目标存储节点根据维护的所述目标逻辑卷的日志段标识和数据块标识间的对应关系,确定所述目标日志段所在的数据块的标识;
    所述目标存储节点将所述数据块的标识发送至所述计算节点。
  16. 根据权利要求12至15中任一项所述的方法,其特征在于,所述方法还包括:
    所述目标存储节点选取所述目标逻辑卷对应的K个日志段和所述K个日志段对应的检查点以恢复对应的数据状态;
    所述目标存储节点在确定所述K个日志段中的M个日志段的垃圾比例达到预设阈值时将所述M个日志段中的无效日志清空,从根服务器中申请新数据块,将经过所述清空处理后的M个日志段写入所述新数据块中,并回收所述M个日志段所在的原数据块,以及更新所述M个日志段与数据块的对应关系,K≥M≥1。
  17. 一种文件系统,其特征在于,包括:
    至少一个计算节点、多个存储节点以及用于管理所述多个存储节点的多个根服务器;
    其中,所述至少一个计算节点中的任一计算节点,用于向所述多个存储节点中的任一存储节点发送针对目标逻辑卷触发的挂载请求;接收所述任一存储节点发送的所述目标逻辑卷对应的日志段和检查点存储位置信息;根据所述日志段和检查点存储位置信息读取所述日志段和检查点的日志元数据以恢复所述目标逻辑卷的数据状态;基于所述目标逻辑卷的数据状态进行数据访问处理;所述目标逻辑卷对应于所述多个存储节点中的至少部分存储资源;
    所述任一存储节点,用于获取所述目标逻辑卷对应的日志段和检查点存储位置信息,将所述日志段和检查点存储位置信息发送至所述任一计算节点。
  18. 一种文件系统数据访问方法,其特征在于,所述文件系统包括至少一个计算节 点和多个存储节点,所述方法包括:
    响应于计算节点中的数据访问进程触发的挂载操作,所述计算节点中的文件系统访问进程向任一存储节点中的逻辑卷服务进程发送针对目标逻辑卷的挂载请求,所述目标逻辑卷对应于所述多个存储节点中的至少部分存储资源;
    所述文件系统访问进程接收所述逻辑卷服务进程发送的所述目标逻辑卷对应的日志段和检查点存储位置信息;
    所述文件系统访问进程根据所述日志段和检查点存储位置信息读取所述日志段和检查点的日志元数据以恢复所述目标逻辑卷的数据状态;
    所述文件系统访问进程基于所述目标逻辑卷的数据状态进行数据访问处理。
  19. 根据权利要求18所述的方法,其特征在于,所述数据访问进程为数据写入进程,所述响应于计算节点中的数据访问进程触发的挂载操作,所述计算节点中的文件系统访问进程向任一存储节点中的逻辑卷服务进程发送针对目标逻辑卷的挂载请求,包括:
    响应于所述数据写入进程触发的挂载操作,所述文件系统访问进程向所述逻辑卷服务进程发送第一挂载请求,所述第一挂载请求中包括所述目标逻辑卷的标识、用户信息和可读写挂载方式标识;
    所述文件系统访问进程接收所述逻辑卷服务进程发送的所述目标逻辑卷对应的日志段和检查点存储位置信息,包括:
    所述文件系统访问进程接收所述逻辑卷服务进程在确定所述用户信息通过所述目标逻辑卷的用户权限验证以及确定所述目标逻辑卷当前并未被以可读写挂载方式挂载后发送的所述目标逻辑卷对应的日志段和检查点存储位置信息。
  20. 根据权利要求19所述的方法,其特征在于,所述文件系统访问进程基于所述目标逻辑卷的数据状态进行数据访问处理,包括:
    响应于所述数据写入进程触发的文件准备写操作,所述文件系统访问进程生成仅包括元数据的第一日志,所述第一日志中包括递增当前最后的日志序列号i后的日志序列号i+1和与所述文件准备写请求对应的文件信息;
    所述文件系统访问进程通过所述逻辑卷服务进程和根服务器确定所述目标逻辑卷对应的最后日志段所在的第一数据块和所述第一数据块所对应的第一数据块服务进程,所述根服务器用于管理各存储节点中的数据块服务进程;
    所述文件系统访问进程向所述第一数据块服务进程发送日志存入请求，以请求将所述第一日志追加到所述第一数据块中以及请求所述第一数据块服务进程缓存所述第一日志。
  21. 根据权利要求20所述的方法,其特征在于,所述文件系统访问进程通过所述逻辑卷服务进程和根服务器确定所述目标逻辑卷对应的最后日志段所在的第一数据块和所述第一数据块所对应的第一数据块服务进程,包括:
    所述文件系统访问进程向所述逻辑卷服务进程发送第一查询请求，用于查询所述目标逻辑卷对应的最后日志段所在的数据块；
    所述文件系统访问进程接收所述逻辑卷服务进程发送的所述第一数据块的标识,所述第一数据块的标识是所述逻辑卷服务进程根据维护的所述目标逻辑卷的日志段标识和数据块标识间的对应关系确定出的;
    所述文件系统访问进程向所述根服务器发送第二查询请求，用于查询所述第一数据块的标识对应的数据块服务进程；
    所述文件系统访问进程接收所述根服务器发送的所述第一数据块服务进程的标识,所述第一数据块服务进程的标识是所述根服务器根据维护的数据块标识和数据块服务进程标识间的对应关系确定出的。
  22. 根据权利要求20所述的方法,其特征在于,所述文件系统访问进程基于所述目标逻辑卷的数据状态进行数据访问处理,包括:
    响应于所述数据写入进程发送的数据写入请求,所述数据写入请求中包括写入数据和所述写入数据对应的文件信息,所述文件系统访问进程生成第二日志,所述第二日志中包括元数据和所述写入数据,所述元数据包括递增当前最后的日志序列号i+1后的日志序列号i+2和所述文件信息;
    所述文件系统访问进程通过所述逻辑卷服务进程和根服务器确定所述目标逻辑卷对应的最后日志段所在的第二数据块和所述第二数据块所对应的第二数据块服务进程;
    所述文件系统访问进程向所述第二数据块服务进程发送日志存入请求,以请求将所述第二日志追加到所述第二数据块中以及请求所述第二数据块服务进程缓存所述第二日志中的元数据。
  23. 根据权利要求20所述的方法,其特征在于,所述方法还包括:
    若所述最后日志段达到封闭条件,则所述文件系统访问进程封闭所述最后日志段,生成新的检查点,并向所述逻辑卷服务进程申请新的日志段作为最后日志段;
    其中，所述封闭条件为所述最后日志段中包含的日志条数达到预设条数，或者，所述最后日志段对应的数据量达到预设容量值。
  24. 根据权利要求18所述的方法,其特征在于,所述数据访问进程为数据读取进程,所述响应于计算节点中的数据访问进程触发的挂载操作,所述计算节点中的文件系统访问进程向任一存储节点中的逻辑卷服务进程发送针对目标逻辑卷的挂载请求,包括:
    响应于所述数据读取进程触发的挂载操作,所述文件系统访问进程向所述逻辑卷服务进程发送第二挂载请求,所述第二挂载请求中包括所述目标逻辑卷的标识、用户信息和只读挂载方式标识;
    所述文件系统访问进程接收所述逻辑卷服务进程发送的所述目标逻辑卷对应的日志段和检查点存储位置信息,包括:
    所述文件系统访问进程接收所述逻辑卷服务进程在确定所述用户信息通过所述目标逻辑卷的用户权限验证后发送的所述目标逻辑卷对应的日志段和检查点存储位置信息。
  25. 根据权利要求24所述的方法,其特征在于,所述文件系统访问进程基于所述目标逻辑卷的数据状态进行数据访问处理,包括:
    响应于所述数据读取进程触发的文件准备读操作,所述文件系统访问进程根据所述数据读取进程上次已读取到的日志序列号j、所述日志序列号j对应的第三数据块和第三数据块服务进程,向所述第三数据块服务进程发送数据同步请求,以获取所述日志序列号j后已缓存日志的元数据;
    所述文件系统访问进程接收所述第三数据块服务进程发送的所述日志序列号j后已缓存日志的元数据和指示信息,所述已缓存日志所对应的日志序列号为j+1到j+m,所述指示信息用于指示所述日志序列号j+m所对应的日志段是否已经被封闭;
    所述文件系统访问进程根据所述日志序列号j后已缓存日志的元数据和所述指示信息更新所述目标逻辑卷的数据状态。
  26. 根据权利要求25所述的方法,其特征在于,所述文件系统访问进程根据所述日志序列号j后已缓存日志的元数据和所述指示信息更新所述目标逻辑卷的数据状态,包括:
    若所述指示信息指示所述日志序列号j+m所对应的日志段没有被封闭,则所述文件系统访问进程根据日志序列号在j+1到j+m的已缓存日志的元数据更新所述目标逻辑卷的数据状态。
  27. 根据权利要求25所述的方法,其特征在于,所述文件系统访问进程根据所述日志序列号j后已缓存日志的元数据和所述指示信息更新所述目标逻辑卷的数据状态,包括:
    若所述指示信息指示所述日志序列号j+m所对应的日志段已经被封闭,则所述文件系统访问进程通过所述逻辑卷服务进程和根服务器确定所述日志序列号j+m之后已缓存日志所存储于的各第四数据块和所述各第四数据块各自对应的第四数据块服务进程;
    所述文件系统访问进程向所述第四数据块服务进程发送数据同步请求,以获取所述日志序列号j+m后已缓存日志的元数据;
    所述文件系统访问进程根据所述日志序列号在j+1到j+m的已缓存日志的元数据以及j+m后的已缓存日志的元数据更新所述目标逻辑卷的数据状态。
  28. 根据权利要求25所述的方法,其特征在于,所述文件系统访问进程基于所述目标逻辑卷的数据状态进行数据访问处理,包括:
    所述文件系统访问进程接收所述数据读取进程发送的数据读取请求,所述数据读取请求中包括待读取的文件信息;
    所述文件系统访问进程通过所述逻辑卷服务进程和根服务器确定与所述待读取的文件信息对应的第五数据块服务进程和第五数据块,以通过所述第五数据块服务进程从所述第五数据块中读取数据。
  29. 一种文件系统数据访问方法,其特征在于,所述文件系统包括至少一个计算节点和多个存储节点,所述方法包括:
    存储节点中的逻辑卷服务进程接收计算节点中的文件系统访问进程发送的与目标逻辑卷对应的挂载请求;
    所述逻辑卷服务进程将所述目标逻辑卷对应的第一日志段和检查点存储位置信息发送至所述文件系统访问进程,以使所述文件系统访问进程基于所述第一日志段和检查点存储位置信息恢复所述目标逻辑卷的数据状态用于进行数据访问处理;
    所述目标逻辑卷对应于所述多个存储节点中的至少部分存储资源,所述存储节点是所述多个存储节点中的任一个。
  30. 根据权利要求29所述的方法,其特征在于,所述存储节点中的逻辑卷服务进程接收计算节点中的文件系统访问进程发送的与目标逻辑卷对应的挂载请求,包括:
    所述逻辑卷服务进程接收所述文件系统访问进程发送的第一挂载请求,所述第一挂载请求中包括可读写挂载方式标识、所述目标逻辑卷的标识和用户信息;
    所述逻辑卷服务进程将所述目标逻辑卷对应的第一日志段和检查点存储位置信息发送至所述文件系统访问进程,包括:
    所述逻辑卷服务进程在确定所述用户信息通过所述目标逻辑卷的用户权限验证以及确定所述目标逻辑卷当前并未被以可读写挂载方式挂载后发送所述目标逻辑卷对应的第一日志段和检查点存储位置信息至所述文件系统访问进程。
  31. 根据权利要求29所述的方法,其特征在于,所述存储节点中的逻辑卷服务进程接收计算节点中的文件系统访问进程发送的与目标逻辑卷对应的挂载请求,包括:
    所述逻辑卷服务进程接收所述文件系统访问进程发送的第二挂载请求,所述第二挂载请求中包括只读挂载方式标识、所述目标逻辑卷的标识和用户信息;
    所述逻辑卷服务进程将所述目标逻辑卷对应的第一日志段和检查点存储位置信息发送至所述文件系统访问进程,包括:
    所述逻辑卷服务进程在确定所述用户信息通过所述目标逻辑卷的用户权限验证后发送所述目标逻辑卷对应的第一日志段和检查点存储位置信息至所述文件系统访问进程。
  32. 根据权利要求29所述的方法,其特征在于,所述方法还包括:
    所述逻辑卷服务进程接收所述文件系统访问进程发送的查询请求,所述查询请求用于查询所述目标逻辑卷对应的目标日志段所在的数据块;
    所述逻辑卷服务进程根据维护的日志段标识和数据块标识间的对应关系,确定所述目标日志段所在的数据块的标识;
    所述逻辑卷服务进程将所述数据块的标识发送至所述文件系统访问进程。
  33. 根据权利要求29至32中任一项所述的方法,其特征在于,所述方法还包括:
    所述逻辑卷服务进程接收所述存储节点中的垃圾回收服务进程发送的与所述目标逻辑卷对应的第三挂载请求;
    所述逻辑卷服务进程将所述目标逻辑卷对应的第二日志段和检查点存储位置信息发送至所述垃圾回收服务进程,以使所述垃圾回收服务进程从所述第二日志段和检查点选取K个日志段和检查点以恢复对应的数据状态并在确定所述K个日志段中的M个日志段的垃圾比例达到预设阈值时回收所述M个日志段所在的原数据块,K≥M≥1。
  34. 根据权利要求33所述的方法,其特征在于,所述方法还包括:
    所述逻辑卷服务进程接收所述垃圾回收服务进程发送的日志段更新通知,所述日志段更新通知中包括所述M个日志段与所述M个日志段所在的新数据块间的对应关系;
    所述逻辑卷服务进程更新所述M个日志段与数据块的原对应关系为所述对应关系。
  35. 一种文件系统,其特征在于,包括:
    至少一个计算节点、多个存储节点以及用于管理所述多个存储节点中的数据块服务进程的多个根服务器;
    其中,每个计算节点中具有数据访问进程和文件系统访问进程;
    每个存储节点中具有逻辑卷服务进程和数据块服务进程;所述数据块服务进程用于对相应存储节点中存储的各数据块进行读写管理;
    所述文件系统访问进程,用于响应于相应计算节点中的数据访问进程触发的挂载操作,向任一存储节点中的逻辑卷服务进程发送针对目标逻辑卷的挂载请求,接收所述逻辑卷服务进程发送的所述目标逻辑卷对应的第一日志段和检查点存储位置信息,根据所述第一日志段和检查点存储位置信息读取所述第一日志段和检查点的日志元数据以恢复所述目标逻辑卷的数据状态,基于所述目标逻辑卷的数据状态进行数据访问处理,所述第一日志段列表对应于所述目标逻辑卷的元数据,所述目标逻辑卷对应于所述多个存储节点中的至少部分存储资源;
    所述逻辑卷服务进程,用于接收所述文件系统访问进程发送的所述挂载请求,将所述目标逻辑卷对应的第一日志段和检查点存储位置信息发送至所述文件系统访问进程。
  36. 根据权利要求35所述的系统,其特征在于,每个存储节点中还包括垃圾回收服务进程;
    所述逻辑卷服务进程,还用于接收对应的存储节点中的垃圾回收服务进程发送的与所述目标逻辑卷对应的挂载请求,将所述目标逻辑卷对应的第二日志段和检查点存储位置信息发送至所述垃圾回收服务进程,以及接收所述垃圾回收服务进程发送的包括M个日志段与所述M个日志段所在的新数据块间的对应关系的日志段更新通知,更新所述M个日志段与数据块的原对应关系为所述对应关系;
    所述垃圾回收服务进程,用于从所述第二日志段和检查点中选取K个日志段和检查点以恢复对应的数据状态,若确定所述K个日志段中的M个日志段的垃圾比例达到预设阈值,则将所述M个日志段中的无效日志清空,向所述根服务器申请新数据块以将经过所述清空处理后的M个日志段写入所述新数据块中,向所述M个日志段所在的原数据块所对应的数据块服务进程发送回收通知以使所述数据块服务进程回收所述原数据块,以及向所述逻辑卷服务进程发送所述日志段更新通知,K≥M≥1。
PCT/CN2019/087691 2018-06-01 2019-05-21 文件系统数据访问方法和文件系统 WO2019228217A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP19811764.0A EP3806424A4 (en) 2018-06-01 2019-05-21 METHOD OF ACCESSING DATA FROM A FILE SYSTEM AND FILE SYSTEM
JP2020567024A JP7378870B2 (ja) 2018-06-01 2019-05-21 ファイルシステムデータアクセス方法およびファイルシステム
US17/092,086 US20210056074A1 (en) 2018-06-01 2020-11-06 File System Data Access Method and File System

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810558071.1 2018-06-01
CN201810558071.1A CN110554834B (zh) 2018-06-01 2018-06-01 文件系统数据访问方法和文件系统

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/092,086 Continuation US20210056074A1 (en) 2018-06-01 2020-11-06 File System Data Access Method and File System

Publications (1)

Publication Number Publication Date
WO2019228217A1 true WO2019228217A1 (zh) 2019-12-05

Family

ID=68697428

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/087691 WO2019228217A1 (zh) 2018-06-01 2019-05-21 文件系统数据访问方法和文件系统

Country Status (5)

Country Link
US (1) US20210056074A1 (zh)
EP (1) EP3806424A4 (zh)
JP (1) JP7378870B2 (zh)
CN (1) CN110554834B (zh)
WO (1) WO2019228217A1 (zh)


Families Citing this family (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10459892B2 (en) 2014-04-23 2019-10-29 Qumulo, Inc. Filesystem hierarchical aggregate metrics
US11360936B2 (en) 2018-06-08 2022-06-14 Qumulo, Inc. Managing per object snapshot coverage in filesystems
CN113127437B (zh) * 2019-12-31 2023-12-22 阿里巴巴集团控股有限公司 文件系统管理方法、云系统、装置、电子设备及存储介质
CN113064546B (zh) * 2020-01-02 2024-02-09 阿里巴巴集团控股有限公司 一种文件系统的管理方法、装置、文件系统及存储介质
CN111241040B (zh) * 2020-01-10 2023-04-18 阿里巴巴集团控股有限公司 信息获取方法、装置、电子设备及计算机存储介质
US10795796B1 (en) 2020-01-24 2020-10-06 Qumulo, Inc. Predictive performance analysis for file systems
US11151001B2 (en) 2020-01-28 2021-10-19 Qumulo, Inc. Recovery checkpoints for distributed file systems
CN111522784B (zh) * 2020-04-20 2023-11-21 支付宝(杭州)信息技术有限公司 一种非结构化数据文件的元数据同步方法、装置及设备
CN111552437B (zh) * 2020-04-22 2024-03-15 上海天玑科技股份有限公司 一种应用于分布式存储系统的快照方法及快照装置
CN111586141B (zh) * 2020-04-30 2023-04-07 中国工商银行股份有限公司 作业处理方法、装置、系统和电子设备
CN111930828B (zh) * 2020-05-29 2024-01-19 武汉达梦数据库股份有限公司 一种基于日志解析的数据同步方法和数据同步系统
CN112231286A (zh) * 2020-08-28 2021-01-15 杭州沃趣科技股份有限公司 一种快速恢复数据库历史数据的方法
CN112035069B (zh) * 2020-09-17 2024-02-27 上海二三四五网络科技有限公司 一种判断文件是否完整落盘的控制方法及装置
US11775481B2 (en) 2020-09-30 2023-10-03 Qumulo, Inc. User interfaces for managing distributed file systems
CN112328663A (zh) * 2020-11-24 2021-02-05 深圳市鹰硕技术有限公司 一种应用于大数据的数据发现方法及系统
CN112463306A (zh) * 2020-12-03 2021-03-09 南京机敏软件科技有限公司 一种虚拟机中共享盘数据一致性的方法
CN112748881B (zh) * 2021-01-15 2024-02-20 长城超云(北京)科技有限公司 一种面向虚拟化场景的存储容错方法和集群系统
US11157458B1 (en) 2021-01-28 2021-10-26 Qumulo, Inc. Replicating files in distributed file systems using object-based data storage
US11461241B2 (en) 2021-03-03 2022-10-04 Qumulo, Inc. Storage tier management for file systems
US11567660B2 (en) 2021-03-16 2023-01-31 Qumulo, Inc. Managing cloud storage for distributed file systems
US11132126B1 (en) 2021-03-16 2021-09-28 Qumulo, Inc. Backup services for distributed file systems in cloud computing environments
US11907170B2 (en) * 2021-06-14 2024-02-20 International Business Machines Corporation Switching serialization techniques for handling concurrent write requests to a shared file
JP2023004324A (ja) * 2021-06-25 2023-01-17 キヤノン株式会社 画像形成装置、画像形成装置の制御方法、及びプログラム
US11669255B2 (en) 2021-06-30 2023-06-06 Qumulo, Inc. Distributed resource caching by reallocation of storage caching using tokens and agents with non-depleted cache allocations
CN113630450B (zh) * 2021-07-26 2024-03-15 深圳市杉岩数据技术有限公司 分布式存储系统的访问控制方法及分布式存储系统
CN113778755B (zh) * 2021-09-16 2023-07-14 浪潮商用机器有限公司 一种数据同步方法、装置、设备及计算机可读存储介质
CN115017534B (zh) * 2021-11-05 2023-08-29 荣耀终端有限公司 文件处理权限控制方法、装置及存储介质
US11354273B1 (en) * 2021-11-18 2022-06-07 Qumulo, Inc. Managing usable storage space in distributed file systems
CN114218593B (zh) * 2021-12-20 2024-01-09 南京宁铎科技有限公司 基于办公设备的信息安全检测方法
US11599508B1 (en) 2022-01-31 2023-03-07 Qumulo, Inc. Integrating distributed file systems with object stores
CN114237989B (zh) * 2022-02-25 2022-04-26 北京奥星贝斯科技有限公司 数据库服务部署、容灾方法及装置
CN114567573B (zh) * 2022-03-10 2023-12-15 贵州中融信通科技有限公司 异常数据的定位方法、装置、服务器及存储介质
CN114676166B (zh) * 2022-05-26 2022-10-11 阿里巴巴(中国)有限公司 数据处理方法及装置
US11722150B1 (en) 2022-09-28 2023-08-08 Qumulo, Inc. Error resistant write-ahead log
US11729269B1 (en) 2022-10-26 2023-08-15 Qumulo, Inc. Bandwidth management in distributed file systems
US11921677B1 (en) 2023-11-07 2024-03-05 Qumulo, Inc. Sharing namespaces across file system clusters
US11934660B1 (en) 2023-11-07 2024-03-19 Qumulo, Inc. Tiered data storage with ephemeral and persistent tiers
CN117591038A (zh) * 2024-01-18 2024-02-23 济南浪潮数据技术有限公司 一种数据访问方法、装置、分布式存储系统及设备和介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102164177A (zh) * 2011-03-11 2011-08-24 浪潮(北京)电子信息产业有限公司 一种集群共享存储池的方法、装置及系统
CN102394923A (zh) * 2011-10-27 2012-03-28 周诗琦 一种基于n×n陈列结构的云系统平台
CN102664923A (zh) * 2012-03-30 2012-09-12 浪潮电子信息产业股份有限公司 一种利用Linux全局文件系统实现共享存储池的方法
CN103561101A (zh) * 2013-11-06 2014-02-05 中国联合网络通信集团有限公司 一种网络文件系统
US20160085472A1 (en) * 2014-09-24 2016-03-24 Fujitsu Limited Storage device and storage control method
CN105718217A (zh) * 2016-01-18 2016-06-29 浪潮(北京)电子信息产业有限公司 一种精简配置存储池数据一致性维护的方法及装置

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08329102A (ja) * 1995-05-31 1996-12-13 Hitachi Ltd ファイルアクセス管理方法
JP2005050024A (ja) 2003-07-31 2005-02-24 Toshiba Corp 計算機システムおよびプログラム
US20050081099A1 (en) * 2003-10-09 2005-04-14 International Business Machines Corporation Method and apparatus for ensuring valid journaled file system metadata during a backup operation
CN100583051C (zh) * 2008-03-10 2010-01-20 清华大学 检查点容错技术中文件状态一致性维护的实现方法
JP2009251791A (ja) 2008-04-03 2009-10-29 Nec Corp 分散ファイルシステム、データ書き込み方法、及びプログラム
US8078622B2 (en) 2008-10-30 2011-12-13 Network Appliance, Inc. Remote volume access and migration via a clustered server namespace
US8572048B2 (en) * 2009-11-10 2013-10-29 TIM LaBERGE Supporting internal consistency checking with consistency coded journal file entries
US9336149B2 (en) * 2010-05-06 2016-05-10 International Business Machines Corporation Partial volume access in a physical stacked volume
JP6132980B2 (ja) 2013-06-19 2017-05-24 株式会社日立製作所 非集中的な分散型コンピューティング・システム
JP6327028B2 (ja) 2014-07-14 2018-05-23 日本電気株式会社 オブジェクトストレージシステムおよびその制御方法およびその制御プログラム
US9852026B2 (en) 2014-08-06 2017-12-26 Commvault Systems, Inc. Efficient application recovery in an information management system based on a pseudo-storage-device driver
US10089315B2 (en) * 2014-08-22 2018-10-02 AsterionDB, Inc. Systems, apparatus, and methods for accessing data from a database as a file
CN104866435B (zh) * 2015-06-06 2018-05-15 成都云祺科技有限公司 一种连续数据保护方法
US20170011054A1 (en) * 2015-07-11 2017-01-12 International Business Machines Corporation Intelligent caching in distributed clustered file systems
US20180137291A1 (en) * 2016-11-14 2018-05-17 Linkedin Corporation Securing files at rest in remote storage systems

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102164177A (zh) * 2011-03-11 2011-08-24 浪潮(北京)电子信息产业有限公司 一种集群共享存储池的方法、装置及系统
CN102394923A (zh) * 2011-10-27 2012-03-28 周诗琦 一种基于n×n陈列结构的云系统平台
CN102664923A (zh) * 2012-03-30 2012-09-12 浪潮电子信息产业股份有限公司 一种利用Linux全局文件系统实现共享存储池的方法
CN103561101A (zh) * 2013-11-06 2014-02-05 中国联合网络通信集团有限公司 一种网络文件系统
US20160085472A1 (en) * 2014-09-24 2016-03-24 Fujitsu Limited Storage device and storage control method
CN105718217A (zh) * 2016-01-18 2016-06-29 浪潮(北京)电子信息产业有限公司 一种精简配置存储池数据一致性维护的方法及装置

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111639008A (zh) * 2020-05-29 2020-09-08 杭州海康威视系统技术有限公司 基于双端口ssd的文件系统状态监测方法、装置及电子设备
CN111639008B (zh) * 2020-05-29 2023-08-25 杭州海康威视系统技术有限公司 基于双端口ssd的文件系统状态监测方法、装置及电子设备
CN111767257A (zh) * 2020-06-28 2020-10-13 星辰天合(北京)数据科技有限公司 基于fuse文件系统和nfs协议的数据传输方法及装置
CN111767257B (zh) * 2020-06-28 2024-05-31 北京星辰天合科技股份有限公司 基于fuse文件系统和nfs协议的数据传输方法及装置

Also Published As

Publication number Publication date
JP2021525926A (ja) 2021-09-27
CN110554834A (zh) 2019-12-10
CN110554834B (zh) 2022-12-02
EP3806424A4 (en) 2022-02-23
US20210056074A1 (en) 2021-02-25
JP7378870B2 (ja) 2023-11-14
EP3806424A1 (en) 2021-04-14

Similar Documents

Publication Publication Date Title
WO2019228217A1 (zh) 文件系统数据访问方法和文件系统
US11153380B2 (en) Continuous backup of data in a distributed data store
US8515911B1 (en) Methods and apparatus for managing multiple point in time copies in a file system
KR101914019B1 (ko) 분산 데이터베이스 시스템들을 위한 고속 장애 복구
JP4568115B2 (ja) ハードウェアベースのファイルシステムのための装置および方法
KR101827239B1 (ko) 분산 데이터 시스템들을 위한 전 시스템에 미치는 체크포인트 회피
US11797491B2 (en) Inofile management and access control list file handle parity
US11797213B2 (en) Freeing and utilizing unused inodes
US11816348B2 (en) Persistent hole reservation
US11151162B2 (en) Timestamp consistency for synchronous replication
US11822520B2 (en) Freeing pages within persistent memory
US20220107916A1 (en) Supporting a lookup structure for a file system implementing hierarchical reference counting
US20230259529A1 (en) Timestamp consistency for synchronous replication
US20170286442A1 (en) File system support for file-level ghosting
CN111796767B (zh) 一种分布式文件系统及数据管理方法
US11341163B1 (en) Multi-level replication filtering for a distributed database
US20210056120A1 (en) In-stream data load in a replication environment
US11914571B1 (en) Optimistic concurrency for a multi-writer database

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19811764; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2020567024; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
ENP Entry into the national phase (Ref document number: 2019811764; Country of ref document: EP; Effective date: 20210111)