WO2016202199A1 - 分布式文件系统及其文件元信息管理方法 (Distributed file system and file meta information management method thereof) - Google Patents


Info

Publication number
WO2016202199A1
Authority
WO
WIPO (PCT)
Prior art keywords
file
information
node
primary index
data node
Application number
PCT/CN2016/085208
Other languages
English (en)
French (fr)
Inventor
段兵 (Duan Bing)
Original Assignee
阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Application filed by 阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Publication of WO2016202199A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/134 Distributed indices
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/164 File meta data generation
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems

Definitions

  • the present application relates to the field of computer technology, and in particular to a distributed file system and a file meta information management method thereof.
  • in the prior art, meta information is stored centrally on the master node, which creates a single point of failure;
  • in addition, the cluster size is limited by the memory capacity of the master node.
  • the main purpose of the present application is to provide a distributed file system and a file meta information management method thereof, so as to overcome the performance bottleneck caused by centralized storage of meta information in prior-art distributed file systems.
  • a file meta information management method is provided for a distributed file system that includes a master node, data nodes, and a client.
  • the method includes: after receiving a request from the client to create file meta information, the master node generates primary index identification information for the file, where the primary index identification information is globally unique within the system; after receiving the client's request to create the file meta information, the data node allocates secondary index identification information for the file according to the primary index identification information, where the secondary index identification information is unique within that primary index;
  • the client generates a file name from the primary index identification information and the secondary index identification information;
  • the data node stores the file meta information, which includes the primary index identification information, the secondary index identification information, and the file name.
  • the method further includes: the client receives the primary index identification information of the file returned by the master node, and sends a create-file request carrying that primary index identification information to the corresponding data node.
  • the file meta information stored by the data node further includes: a file creation time, a file modification time, a file size, and a file status.
  • the method further includes: the data node forwards the file meta information to the backup data node.
  • the method further includes: the master node receives the location information of the data nodes in the system and, according to that location information, sends each data node a request to report its primary indexes; after receiving the report request, the data node sends its locally stored primary indexes to the master node, where each primary index includes the following information: primary index identification information, the size of the files managed by the primary index, the number of files managed by the primary index, and primary index version information; the master node loads the received primary indexes into its memory and, if a corresponding primary index already exists in memory, keeps the newer of the two.
  • the method further includes: when the system starts, or periodically at a predetermined interval, the master node requests the data nodes in the system to report their primary indexes and performs the step of loading the primary indexes into its memory.
  • the method further includes, during or after the step in which the master node loads the primary indexes into its memory: the client receives a request to access file meta information and parses the file name carried in the request to obtain the corresponding primary index identification information and secondary index identification information;
  • the master node receives the client's request to access file meta information, which carries the primary index identification information, and queries its memory for that primary index identification information; if it exists, the master node returns the primary index identification information to the client;
  • the data node receives the client's request to access file meta information, which carries the primary index identification information and the secondary index identification information;
  • the data node queries the corresponding file meta information according to the primary index identification information and the secondary index identification information and returns it to the client.
  • the method further includes, during or after the step in which the master node loads the primary indexes into its memory: the master node receives a request from the client to create file meta information, either generates new primary index identification information or allocates existing primary index identification information, and returns it to the client; the data node receives the client's request to create file meta information and allocates secondary index identification information according to the primary index identification information carried in the request; the data node creates the file meta information and returns it to the client.
  • the method further includes: the master node determines, by a load-balancing algorithm, a primary index that needs to be migrated together with its source data node and target data node, and sends a data migration command to the target data node; the target data node obtains all file meta information under the primary index from the source data node and stores it; the master node stores the relationship between the primary index and the target data node and sends the source data node a command to delete the primary index, so that the source data node deletes it.
  • an embodiment of the present application further provides a distributed file system including a master node, a data node, and a client; the master node is configured to generate, after receiving a request from the client to create file meta information, primary index identification information for the file, where the primary index identification information is globally unique within the system; the data node is configured to allocate, after receiving the client's request to create file meta information, secondary index identification information for the file according to the primary index identification information, where the secondary index identification information is unique within the primary index; the client is configured to generate a file name from the primary index identification information and the secondary index identification information.
  • the data node is further configured to store file meta information, including: primary index identification information, secondary index identification information, and file name.
  • the client is further configured to receive the primary index identification information of the file returned by the master node and to send a create-file request carrying that primary index identification information to the corresponding data node.
  • the file meta information stored by the data node further includes: a file creation time, a file modification time, a file size, and a file status.
  • the data node is further configured to forward the file meta information to the backup data node.
  • the master node is further configured to receive the location information of the data nodes in the system and, according to it, send each data node a request to report its primary indexes; the data node is further configured to send its locally stored primary indexes to the master node after receiving the report request, where each primary index includes the following information: primary index identification information, the size of the files managed by the primary index, the number of files managed by the primary index, and primary index version information; the master node is further configured to load the received primary indexes into its memory and, if a corresponding primary index already exists in memory, keep the newer of the two.
  • the master node is further configured to request the data nodes in the system to report their primary indexes when the system starts or periodically at a predetermined interval, and to perform the step of loading the primary indexes into its memory.
  • the client is further configured to, during or after the step in which the master node loads the primary index information, receive a request to access file meta information and obtain the corresponding primary index identification information and secondary index identification information from the file name carried in the request; the master node is further configured to receive the client's request to access file meta information, which carries the primary index identification information, query its memory for that primary index identification information and, if it exists, return the primary index identification information to the client; the data node is further configured to receive the client's request to access file meta information, which carries the primary index identification information and the secondary index identification information, query the corresponding file meta information according to them, and return it to the client.
  • the master node is further configured to, during or after the step of loading the primary index information, receive a request from the client to create file meta information, either generate new primary index identification information or allocate existing primary index identification information, and return it to the client;
  • the data node is further configured to receive a request from the client to create file meta information and allocate secondary index identification information according to the primary index identification information carried in it;
  • the data node creates file meta information and returns the created file meta information to the client.
  • the master control node is further configured to: determine, according to a load balancing algorithm, a primary index that needs to be migrated, a source data node and a target data node of the primary index, and send a data migration command to the target data node;
  • the target data node is configured to, after receiving the data migration command, obtain all file meta information under the primary index from the source data node and store it;
  • the master node is further configured to store the relationship between the primary index and the target data node and to send the source data node a command to delete the primary index, so that the source data node deletes it.
  • file meta information is managed and stored jointly by the master node and the data nodes (storage nodes), which solves the single-point-of-failure problem that arises when file meta information is stored only on the master node.
  • FIG. 1 shows a flowchart of a file meta information management method of a distributed file system according to an embodiment of the present application;
  • FIG. 2 shows a flowchart of file meta information loading according to an embodiment of the present application;
  • FIG. 3 shows a structural block diagram of a distributed file system according to an embodiment of the present application.
  • a file meta information management method is provided for a distributed file system that includes at least one master node, multiple data nodes, and at least one client.
  • FIG. 1 is a flowchart of a file meta information management method of a distributed file system according to an embodiment of the present application, where the method includes:
  • Step S102: after receiving a request from the client to create file meta information, the master node generates the primary index identification information of the file; this identification information is globally unique within the system.
  • the client provides an interface for creating files (that is, creating file meta information); the user sends a create-file request (a request to create file meta information) to the master node through the client, and the file meta information is generated when the file is created.
  • the master node generates the file's primary index and stores it locally, together with the location information of the data node responsible for managing the file.
  • the master node may determine the data node responsible for managing the file according to the load balancing algorithm, and details are not described herein again.
  • the primary index information of the file includes: the primary index identification information (ID), the size of the files managed by the primary index (Size), the number of files managed by the primary index (Count), and the primary index version information (Version); the primary index ID is globally unique within the system.
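To make the primary-index record and the master node's role in step S102 concrete, here is a minimal Python sketch. The class and method names (`PrimaryIndex`, `MasterNode`, `create_primary_index`) are hypothetical, and a single in-process counter stands in for whatever mechanism the master node actually uses to keep IDs globally unique; the data node is passed in rather than chosen by a load-balancing algorithm.

```python
import itertools
from dataclasses import dataclass


@dataclass
class PrimaryIndex:
    """Primary-index record with the four fields listed above."""
    index_id: int      # globally unique within the system
    size: int = 0      # total size of files managed by this primary index
    count: int = 0     # number of files managed by this primary index
    version: int = 0   # used later to decide which copy is newer


class MasterNode:
    """Hypothetical master node that hands out primary-index IDs."""

    def __init__(self) -> None:
        # Assumption: one master with a monotonic counter, so IDs never repeat.
        self._next_id = itertools.count(1)
        self.indexes: dict[int, PrimaryIndex] = {}
        self.location: dict[int, str] = {}  # primary index ID -> data node address

    def create_primary_index(self, data_node: str) -> PrimaryIndex:
        # data_node would normally come from a load-balancing algorithm.
        idx = PrimaryIndex(index_id=next(self._next_id))
        self.indexes[idx.index_id] = idx
        self.location[idx.index_id] = data_node
        return idx
```

For example, two calls to `create_primary_index` return records with distinct `index_id` values, and `location` remembers which data node manages each one.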
  • Step S104: after receiving the client's request to create file meta information, the data node allocates secondary index identification information for the file according to the primary index identification information, where the secondary index identification information is unique within the primary index.
  • after the client receives the primary index identification information of the file returned by the master node, it sends a create-file request to the data node responsible for managing the file, using that data node's location information.
  • the creation file request carries the primary index identification information of the file.
  • the data node allocates secondary index identification information (ID) according to the primary index identification information; the secondary index ID is unique within the same primary index.
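The per-primary-index scope of secondary IDs in step S104 can be sketched as follows, assuming the data node keeps a separate counter for each primary index. `DataNode` and `allocate_secondary_id` are illustrative names, not part of the source.

```python
from collections import defaultdict


class DataNode:
    """Hypothetical data node that allocates secondary-index IDs."""

    def __init__(self) -> None:
        # Secondary IDs only need to be unique inside one primary index,
        # so each primary index gets its own counter (starting at 0).
        self._last_secondary: dict[int, int] = defaultdict(int)

    def allocate_secondary_id(self, primary_id: int) -> int:
        self._last_secondary[primary_id] += 1
        return self._last_secondary[primary_id]
```

Because each primary index has its own counter, two different primary indexes may both hand out secondary ID 1; only the pair (primary ID, secondary ID) identifies a file.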
  • Step S106: the client generates a file name from the primary index identification information and the secondary index identification information.
  • the data node returns the secondary index identification information to the client; the client generates a file name (filename) from the returned primary index identification information and secondary index identification information using an encoding algorithm (for example, base64), and then sends the generated file name to the data node.
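Step S106 needs the file name to be reversible, since the read path later recovers both IDs from it. A minimal sketch, assuming each ID fits in an unsigned 64-bit integer and using URL-safe base64 without padding; the byte layout and the function names `make_filename` and `parse_filename` are assumptions for illustration, not the patent's format.

```python
import base64
import struct


def make_filename(primary_id: int, secondary_id: int) -> str:
    # Assumed layout: two unsigned 64-bit big-endian integers (16 bytes).
    raw = struct.pack(">QQ", primary_id, secondary_id)
    # URL-safe alphabet avoids '/' in names; trailing '=' padding is dropped.
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def parse_filename(filename: str) -> tuple[int, int]:
    # Restore the stripped padding so the length is a multiple of 4.
    padded = filename + "=" * (-len(filename) % 4)
    raw = base64.urlsafe_b64decode(padded)
    primary_id, secondary_id = struct.unpack(">QQ", raw)
    return primary_id, secondary_id
```

Base64 only encodes the IDs and provides no secrecy: anyone holding a file name can recover its primary and secondary IDs, which is exactly what the client relies on in the read path.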
  • Step S108: the data node stores the file meta information, which includes the primary index identification information, the secondary index identification information, and the file name.
  • the file meta information stored by the data node further includes: a file creation time (create_time), a file modification time (modify_time), a file size (size), a file status (status), and so on.
  • the data node (Master data node) also needs to forward the above file meta information to the backup data node (Slave data node).
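Step S108 and the backup step can be sketched as a record type plus an in-memory store that forwards each write to its backups. The field names mirror those listed above (`create_time`, `modify_time`, `size`, `status`); the `MetaStore` class, the synchronous forwarding, and the `"normal"` status value are hypothetical.

```python
import time
from dataclasses import dataclass, field


@dataclass
class FileMeta:
    primary_id: int
    secondary_id: int
    filename: str
    create_time: float = field(default_factory=time.time)
    modify_time: float = field(default_factory=time.time)
    size: int = 0
    status: str = "normal"  # hypothetical status value


class MetaStore:
    """Hypothetical per-node store keyed by (primary ID, secondary ID)."""

    def __init__(self, slaves=()):
        self.records = {}
        self.slaves = list(slaves)  # backup (Slave) data nodes

    def store(self, meta: FileMeta) -> None:
        self.records[(meta.primary_id, meta.secondary_id)] = meta
        # The Master data node forwards the same record to each backup node.
        for slave in self.slaves:
            slave.store(meta)
```

A real system would forward over the network and handle backup failures; the sketch only shows that the Master data node is the single entry point for each write.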
  • the above embodiment describes the file meta information generation process.
  • in the present application, file meta information is managed and stored jointly by the master node and the data nodes (storage nodes), which solves the single-point-of-failure problem caused in the prior art by storing meta information on the master node.
  • the loading process of the file meta information is described in detail below with reference to FIG. 2.
  • at startup, the distributed file system needs to load file meta information into the memory of the master node.
  • the file meta information according to the embodiment of the present application is distributed and stored on each data node of the distributed file system, as shown in FIG. 2 .
  • the file meta information loading process includes:
  • Step S202: when the system starts, the master node obtains information about all data nodes in the system, including the location information (IP address) of each data node and the port number (PORT) on which the data node listens for the master node;
  • Step S204: according to the location information of all data nodes, the master node sends each data node a request to report its primary indexes;
  • Step S206: after receiving the report request, the data node reads its primary indexes from the local disk and sends them to the master node, where each primary index includes the following information: primary index identification information, the size of the files managed by the primary index, the number of files managed by the primary index, and primary index version information;
  • Step S208: after receiving the returned information, the master node traverses the returned primary indexes and checks whether a corresponding primary index exists in its memory;
  • Step S210: if it does not exist, the master node creates the data structures for that primary index, that is, it creates on the master node a primary index identical to the reported one;
  • Step S212: if it exists, the master node compares the two copies of the primary index information, keeps the newer one, and deletes the other copy from the node that holds it;
  • if the primary index stored by the master node is newer, the copy stored by the data node is deleted; if the copy stored by the data node is newer, the master node stores the data node's copy. This aligns the primary index information between the master node and the data nodes.
  • Step S214: the master node periodically checks whether all data nodes have reported successfully; for any data node that has not, it returns to step S204 and resends the report command until the report succeeds.
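The merge logic of steps S208 to S214 can be sketched as below. This assumes that "newer" means a higher `version` value and that a data node's stale copies are reported back for deletion; `merge_report`, `load_all`, the `fetch_report` callback, and the `max_rounds` cap are hypothetical (the source retries until the report succeeds).

```python
from dataclasses import dataclass


@dataclass
class PrimaryIndex:
    index_id: int
    size: int
    count: int
    version: int


def merge_report(memory, location, node_addr, reported):
    """Merge one data node's report into the master's memory.

    Returns the IDs whose copy on the reporting node is older and should
    be deleted there.
    """
    stale_on_node = []
    for idx in reported:
        current = memory.get(idx.index_id)
        if current is None or idx.version > current.version:
            # S210, or S212 with the node's copy newer: master keeps it.
            memory[idx.index_id] = idx
            location[idx.index_id] = node_addr
        elif idx.version < current.version:
            # S212 with the master's copy newer: node's copy is stale.
            stale_on_node.append(idx.index_id)
    return stale_on_node


def load_all(memory, location, nodes, fetch_report, max_rounds=3):
    """Ask every node to report; retry failed nodes (S214).

    Sending delete commands for stale IDs is omitted. Returns the set of
    nodes that still have not reported after max_rounds.
    """
    pending = set(nodes)
    for _ in range(max_rounds):
        for addr in sorted(pending):
            try:
                reported = fetch_report(addr)
            except ConnectionError:
                continue  # retry this node in the next round
            merge_report(memory, location, addr, reported)
            pending.discard(addr)
        if not pending:
            break
    return pending
```

Because each merge is keyed by primary index ID, the master can start serving an index as soon as its report has been merged, without waiting for the remaining nodes.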
  • the master node also needs to periodically (for example, once a day) request the data nodes to re-report their primary index information, so that the primary indexes on the master node and the data nodes stay aligned.
  • when the master node starts, it waits for the data nodes to report primary index information, but it does not need to wait until all data nodes have finished reporting (that is, it does not need to have built all the primary index information) before providing service: once the master node has loaded a primary index and established the relationship between that primary index and its data node, it can provide read and write services externally. The read and write service processes are described in detail below.
  • the read service process includes:
  • the client receives a user's request to access file meta information, which carries the file name; the client reverse-parses the file name to obtain the primary index ID and the secondary index ID of the file;
  • the client sends the master node a request to access the file meta information, carrying the primary index ID;
  • the master node checks its memory for a primary index matching the ID; if one exists, it returns the in-memory primary index information and the location information of the data node responsible for managing the file to the client; otherwise, it returns a failure message;
  • the client sends a request for accessing the file meta information to the corresponding data node according to the location information of the returned data node, where the request includes a primary index ID and a secondary index ID;
  • the data node queries the corresponding file meta information according to the primary index ID and the secondary index ID and returns it to the client, and the process ends.
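The read path above can be sketched with plain dictionaries standing in for the master node's memory and the data nodes' stores. The function name `read_meta` and the 16-byte base64 file-name layout are assumptions for illustration, not the patent's wire format.

```python
import base64
import struct


def parse_filename(filename):
    # Assumed layout: URL-safe base64 of two unsigned 64-bit IDs, no padding.
    padded = filename + "=" * (-len(filename) % 4)
    return struct.unpack(">QQ", base64.urlsafe_b64decode(padded))


def read_meta(filename, master_locations, data_nodes):
    """Hypothetical read path.

    master_locations: primary ID -> data node address (master's memory)
    data_nodes: address -> {(primary ID, secondary ID): file meta}
    """
    # Step 1: the client reverse-parses the file name into both IDs.
    primary_id, secondary_id = parse_filename(filename)
    # Steps 2-3: ask the master which data node manages this primary index.
    addr = master_locations.get(primary_id)
    if addr is None:
        # Master has no such primary index: report failure.
        raise FileNotFoundError(f"primary index {primary_id} not loaded")
    # Steps 4-5: ask that data node for the file meta information.
    return data_nodes[addr][(primary_id, secondary_id)]
```

Only the primary index ID reaches the master; the secondary ID is resolved entirely on the data node, which is what keeps the master's memory footprint small.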
  • the writing service process includes:
  • the user sends a request to create a file (create file meta information) to the master node through the client;
  • the master node searches its memory for a primary index that meets the condition; if one is found, it returns that primary index to the client; otherwise, it allocates a brand-new primary index and returns it to the client;
  • the client sends a request to create a file (create file meta information) to the corresponding data node;
  • the data node creates a file (file meta information) and stores the file meta information to the local disk, and then returns the file meta information to the client.
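The patent does not say which condition a reusable primary index must meet in the write path. The sketch below assumes a hypothetical cap on files per primary index and skips indexes that are being migrated, since those accept reads but not writes; all names (`MAX_FILES_PER_INDEX`, `Master`, `pick_index_for_write`, `DataNode.create`) are illustrative.

```python
import itertools
from dataclasses import dataclass

MAX_FILES_PER_INDEX = 1000  # hypothetical "condition" threshold


@dataclass
class PrimaryIndex:
    index_id: int
    count: int = 0
    migrating: bool = False  # indexes under migration accept no writes


class Master:
    def __init__(self):
        self.indexes = {}
        self._ids = itertools.count(1)

    def pick_index_for_write(self):
        for idx in self.indexes.values():
            if not idx.migrating and idx.count < MAX_FILES_PER_INDEX:
                return idx  # reuse an existing primary index
        idx = PrimaryIndex(next(self._ids))  # otherwise allocate a new one
        self.indexes[idx.index_id] = idx
        return idx


class DataNode:
    def __init__(self):
        self.last_secondary = {}
        self.records = {}

    def create(self, idx):
        # Allocate the next secondary ID within this primary index.
        sid = self.last_secondary.get(idx.index_id, 0) + 1
        self.last_secondary[idx.index_id] = sid
        meta = {"primary_id": idx.index_id, "secondary_id": sid}
        self.records[(idx.index_id, sid)] = meta
        idx.count += 1  # a real node would report this back to the master
        return meta
```

Reusing existing primary indexes keeps the number of entries the master must hold in memory small, which is the property the startup argument below depends on.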
  • the master node only needs to load a small amount of primary index information, and it can start serving reads and writes for a primary index as soon as that index is loaded. Moreover, the data nodes load meta information in parallel, and each data node reads only 1/(number of data nodes) of the total meta information from its disk, so the system starts faster.
  • the process of capacity expansion is mainly controlled by the master control node.
  • the master node migrates only one primary index at a time, and repeats this until the capacity of all data nodes and the space occupied by file meta information are roughly balanced.
  • the primary index being migrated cannot provide a write service at this time, but can provide a read service; the primary index without migration is not affected at all, and both the write service and the read service can be provided.
  • the master node controls the speed of the migration.
  • the migration process has little impact on the user.
  • the expansion process is described in detail below.
  • the master control node periodically performs a load balancing algorithm in the background, determines a primary index ID to be migrated, a source data node where the primary index is located, and a target data node, and then sends a data migration command to the target data node;
  • after receiving the data migration command, the target data node actively pulls all file meta information under the primary index from the source data node to local storage and persists it;
  • the target data node sends a migration command carrying the primary index ID to the source data node, using the source data node's location information; after receiving the command, the source data node looks up all secondary indexes and file meta information (creation time, modification time, size, and so on) under that primary index ID, packages them, and returns them to the target data node; the target data node pulls the packaged data to local storage and, after the file meta information and data are stored successfully, reports the migration result to the master node.
  • the master node then rebuilds the relationship between the primary index and the data nodes (adding the relationship between the primary index and the target data node) and sends the source data node a command to delete the primary index, after which the source data node deletes it.
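The expansion flow can be sketched as one balancing decision plus one migration step. The source leaves the load-balancing algorithm unspecified; this sketch assumes a simple greedy rule that moves the smallest primary index from the most-loaded data node to the least-loaded one, and only when that narrows the gap. Data nodes are modeled as in-memory dictionaries, and all names are hypothetical.

```python
def node_loads(location, index_sizes, nodes):
    # Total size of primary indexes per data node (new nodes start at 0).
    loads = {n: 0 for n in nodes}
    for pid, addr in location.items():
        loads[addr] += index_sizes[pid]
    return loads


def pick_migration(location, index_sizes, nodes):
    """Return (primary ID, source, target), or None if nothing helps."""
    loads = node_loads(location, index_sizes, nodes)
    source = max(loads, key=loads.get)
    target = min(loads, key=loads.get)
    candidates = [pid for pid, addr in location.items() if addr == source]
    if source == target or not candidates:
        return None
    pid = min(candidates, key=index_sizes.get)
    # Moving size s narrows the gap only if s < gap.
    if index_sizes[pid] >= loads[source] - loads[target]:
        return None
    return pid, source, target


def migrate(pid, source, target, stores, location):
    """stores: address -> {(primary ID, secondary ID): file meta}."""
    # Target pulls every record under pid from the source and persists it.
    pulled = {k: v for k, v in stores[source].items() if k[0] == pid}
    stores[target].update(pulled)
    # Master records the new owner, then the source deletes its copy.
    location[pid] = target
    for key in pulled:
        del stores[source][key]
```

Calling `pick_migration` and `migrate` repeatedly, one primary index per round, mirrors the "one primary index at a time" rule; every other primary index keeps serving reads and writes throughout.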
  • a distributed file system is further provided according to an embodiment of the present application, where the system includes: at least one master node, multiple data nodes, and at least one client.
  • FIG. 3 is a structural block diagram of a distributed file system according to an embodiment of the present application. FIG. 3 shows only one master node, one data node, and one client, but this does not limit the number of master nodes, data nodes, or clients in this application.
  • the master control node 10 is configured to: after receiving the request for creating the file meta information sent by the client, generate primary index identification information of the file, where the primary index identifier information is globally unique within the system;
  • the client 30 receives the primary index identification information of the file returned by the master node and sends the corresponding data node a create-file request carrying the file's primary index identification information.
  • the data node 20 is configured to allocate, after receiving the client's request to create file meta information, secondary index identification information for the file according to the primary index identification information, where the secondary index identification information is unique within the primary index;
  • the client 30 is configured to generate a file name according to the primary index identification information and the secondary index identification information;
  • the data node 20 is further configured to store the file meta information, including the primary index identification information, the secondary index identification information, and the file name; it is also configured to forward the file meta information to the backup data node.
  • the file meta information stored by the data node includes:
  • File name (filename): generated from the primary index ID and the secondary index ID by an encoding algorithm (for example, base64);
  • Primary index ID: generated by the master node; globally unique.
  • at startup, the distributed file system needs to load file meta information into the memory of the master node.
  • because the file meta information in this embodiment is distributed across the data nodes of the distributed file system, the master node must connect to the data nodes to load it into its memory.
  • the master node receives the location information of all data nodes and, according to it, sends each data node a request to report its primary index information. After receiving the report request, each data node sends its locally stored primary index information to the master node. The master node traverses the returned primary index information and checks whether each primary index already exists in its memory: if it does, the master node keeps the newer of the two copies; otherwise, it creates the primary index on the master node. Finally, the master node periodically checks whether all data nodes have reported successfully; for any data node whose report has not succeeded, it resends the report command until the report succeeds.
  • network instability or system bugs may leave the system inconsistent:
  • some primary indexes may exist on a data node but not on the master node, so users can neither see nor access them.
  • to address this, the master node periodically (for example, once a day) requests the data nodes to re-report their primary index information, so that the primary indexes on the master node and the data nodes stay aligned.
  • the master node when the master node starts, it waits for the data node to report the primary index information, and the master node does not need to wait for all the data nodes to report completion (that is, does not need to establish all the primary index information).
  • Service the master node can load the first-level index, establish the relationship between the primary index and the data node, and provide read and write services to the outside. The process of reading and writing services is described in detail below.
  • the read service process includes:
  • the client receives the access file meta information request from the user, and the request carries the file name information; the client reversely parses the first index ID and the second level of the corresponding file according to the file name carried in the request. Index ID;
  • the client sends a request for accessing a file meta information to the main control node, where the request carries a primary index ID;
  • the master node queries in its memory whether there is information consistent with the primary index ID, and if present, returns the primary index information in the memory and the location information of the data node responsible for managing the file to the client, otherwise Return a failure message;
  • the client sends a request for accessing the file meta information to the corresponding data node according to the location information of the returned data node, where the request includes a primary index ID and a secondary index ID;
  • the data node queries the corresponding file meta information according to the primary index ID and the secondary index ID and returns it to the client, and the process ends.
  • the writing service process includes:
  • the user sends a request to create a file (create file meta information) to the master node through the client;
  • the master node searches for the first-level index that meets the condition from its memory, and if so, returns the first-level index to the client, otherwise allocates a brand-new level index and returns it to the client;
  • the client sends a request to create a file (create file meta information) to the corresponding data node;
  • the data node creates a file (file meta information) and stores the file meta information to the local disk, and then returns the file meta information to the client.
  • the master node only needs to load a small amount of primary index information and can provide read/write services as soon as a single primary index has been loaded. Moreover, because multiple data nodes load meta-information simultaneously, each data node reads only 1/N of the total meta-information from disk (N being the number of data nodes), so the system starts faster.
  • the capacity expansion process is controlled mainly by the master node.
  • the master node migrates only one primary index at a time, until data node capacity and the space occupied by file meta-information are roughly balanced across all data nodes.
  • the primary index being migrated cannot provide a write service at this time, but can provide a read service; the primary index without migration is not affected at all, and both the write service and the read service can be provided.
  • the master node controls the speed of the migration.
  • the migration process has little impact on the user.
  • the expansion process is described in detail below.
  • the master node periodically runs a load-balancing algorithm in the background, determines the primary index ID to be migrated, the source data node holding that primary index, and the target data node, and then sends a data migration command to the target data node;
  • after receiving the data migration command, the target data node actively pulls all file meta-information under the primary index from the source data node and persists it locally;
  • specifically, using the source data node's location information, the target data node sends the source data node a migration command containing the primary index ID; after receiving it, the source data node looks up all secondary indexes and file meta-information (creation time, modification time, size, etc.) under that primary index ID, packages them, and returns them to the target data node; the target data node pulls the package to local storage and, once the file meta-information and data are stored successfully, reports the migration result to the master node.
  • the master node rebuilds the mapping between primary indexes and data nodes (adding the mapping from the primary index to the target node) and sends the source data node a delete-primary-index command, upon which the source data node deletes the primary index.
  • meta-information is distributed across multiple data nodes, avoiding a single point of failure;
  • the master node stores only part of the meta-information, namely the primary index information, which reduces its storage burden;
  • read and write services can be provided while meta-information is still being loaded.
  • embodiments of the present application can be provided as a method, system, or computer program product.
  • the present application can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment in combination of software and hardware.
  • the application can employ computer programs embodied on one or more computer usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer usable program code embodied therein.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory.
  • Memory is an example of a computer readable medium.
  • Computer readable media includes both permanent and non-persistent, removable and non-removable media.
  • Information storage can be implemented by any method or technology.
  • the information can be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassette tapes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • computer-readable media do not include transitory media, such as modulated data signals and carrier waves.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

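A distributed file system and a file meta-information management method thereof. The method includes: after receiving a create-file-meta-information request sent by the client, the master node generates a primary index identifier for the file, the primary index identifier being globally unique within the system (S102); after receiving the create-file-meta-information request sent by the client, the data node allocates a secondary index identifier for the file according to the primary index identifier, the secondary index identifier being unique within the primary index (S104); the client generates a file name from the primary index identifier and the secondary index identifier (S106); and the data node stores file meta-information including the primary index identifier, the secondary index identifier, and the file name (S108). The distributed file system and its file meta-information management method solve the single-point-of-failure problem caused by storing file meta-information centrally on the master node.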

Description

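Distributed file system and file meta-information management method thereof
This application claims priority to Chinese Patent Application No. 201510342104.5, filed on June 18, 2015 and entitled "Distributed file system and file meta-information management method thereof", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of computer technology, and in particular to a distributed file system and a file meta-information management method thereof.
Background
Most current distributed file systems (for example, the Hadoop Distributed File System, HDFS) manage file meta-information by storing it on local disk. As distributed file systems grow, so does the volume of file meta-information, until it can no longer all be held in memory. The system must then load meta-information from disk, and loading takes longer and longer.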
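Storing file meta-information centrally in this way makes such distributed file systems hard to scale and creates performance bottlenecks, specifically:
(1) meta-information is stored centrally on the master node, which is a single point of failure;
(2) the cluster size is limited by the master node's memory;
(3) the system starts slowly and cannot serve requests while it is still loading;
(4) when meta-information is large, the system cannot be scaled out dynamically and smoothly.
Because centralized storage in existing distributed file systems causes performance bottlenecks such as slow startup and long periods during which the system cannot provide service, an improved technical approach is needed to solve these problems.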
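Summary
The main purpose of this application is to provide a distributed file system and a file meta-information management method thereof, to overcome the performance bottlenecks caused by centralized storage in prior-art distributed file systems.
An embodiment of this application provides a file meta-information management method for a distributed file system, the distributed file system including a master node, data nodes, and a client. The method includes: after receiving a create-file-meta-information request sent by the client, the master node generates a primary index identifier for the file, the primary index identifier being globally unique within the system; after receiving the create-file-meta-information request sent by the client, the data node allocates a secondary index identifier for the file according to the primary index identifier, the secondary index identifier being unique within the primary index; the client generates a file name from the primary index identifier and the secondary index identifier; and the data node stores the file meta-information, which includes the primary index identifier, the secondary index identifier, and the file name.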
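The method further includes: the client receives the file's primary index identifier returned by the master node; and the client sends a create-file request carrying the file's primary index identifier to the corresponding data node.
After the file is created, the file meta-information stored by the data node further includes the file creation time, file modification time, file size, and file status.
The method further includes: the data node forwards the file meta-information to a backup data node.
The method further includes: the master node receives location information of the data nodes in the system; according to that location information, the master node sends each data node a report request asking it to report its primary indexes; after receiving the report request, each data node sends its locally stored primary indexes to the master node, each primary index including: the primary index identifier, the size of the files managed by the primary index, the number of files managed by the primary index, and the primary index version; the master node loads the received primary indexes into its memory and, if a corresponding primary index already exists in memory, keeps the newer of the two.
The method further includes: at system startup or at predetermined times, the master node requires the data nodes in the system to report their primary indexes and performs the step of loading the primary indexes into master node memory.
While or after the master node performs the step of loading primary indexes into its memory, the method further includes: the client receives an access-file-meta-information request and parses the file name carried in the request to obtain the corresponding primary index identifier and secondary index identifier; the master node receives an access-file-meta-information request from the client carrying the primary index identifier; the master node checks its memory for that primary index identifier and, if it is present, returns it to the client; the data node receives an access-file-meta-information request from the client carrying the primary index identifier and the secondary index identifier; and the data node looks up the corresponding file meta-information by the primary and secondary index identifiers and returns it to the client.
While or after the master node performs the step of loading primary indexes into its memory, the method further includes: the master node receives a create-file-meta-information request from the client, and either generates a new primary index identifier or allocates an existing one and returns it to the client; the data node receives the create-file-meta-information request from the client and allocates a secondary index identifier according to the primary index identifier carried in it; and the data node creates the file meta-information and returns it to the client.
The method further includes: the master node determines, according to a load-balancing algorithm, a primary index to be migrated together with its source data node and target data node, and sends a data migration command to the target data node; after receiving the data migration command, the target data node obtains all file meta-information under that primary index from the source data node and stores it; and the master node records the relationship between the primary index and the target data node and sends the source data node a command to delete the primary index, so that the source data node deletes it.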
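An embodiment of this application further provides a distributed file system, including a master node, data nodes, and a client. The master node is configured to generate, after receiving a create-file-meta-information request sent by the client, a primary index identifier for the file, the primary index identifier being globally unique within the system; the data node is configured to allocate, after receiving the create-file-meta-information request sent by the client, a secondary index identifier for the file according to the primary index identifier, the secondary index identifier being unique within the primary index; the client is configured to generate a file name from the primary index identifier and the secondary index identifier; and the data node is further configured to store the file meta-information, which includes the primary index identifier, the secondary index identifier, and the file name.
The client is further configured to receive the file's primary index identifier returned by the master node and send a create-file request carrying that identifier to the corresponding data node.
After the file is created, the file meta-information stored by the data node further includes the file creation time, file modification time, file size, and file status.
The data node is further configured to forward the file meta-information to a backup data node.
The master node is further configured to receive location information of the data nodes in the system and, according to it, send each data node a report request for its primary indexes; the data node is further configured to send, after receiving the report request, its locally stored primary indexes to the master node, each primary index including: the primary index identifier, the size of the files managed by the primary index, the number of files managed by the primary index, and the version of the primary index; and the master node is further configured to load the received primary indexes into its memory and, if a corresponding primary index already exists in memory, keep the newer of the two.
The master node is further configured to require the data nodes in the system to report their primary indexes at system startup or at predetermined times, and to perform the step of loading the primary indexes into master node memory.
The client is further configured to receive, while or after the master node loads primary index information, an access-file-meta-information request, and to parse the file name carried in the request to obtain the corresponding primary index identifier and secondary index identifier; the master node is further configured to receive an access-file-meta-information request from the client carrying the primary index identifier, check its memory for that identifier, and if it is present return it to the client; and the data node is further configured to receive an access-file-meta-information request from the client carrying the primary index identifier and the secondary index identifier, look up the corresponding file meta-information by those identifiers, and return it to the client.
The master node is further configured to receive, while or after loading primary index information, a create-file-meta-information request from the client, and to either generate a new primary index identifier or allocate an existing one and return it to the client; the data node is further configured to receive the create-file-meta-information request from the client, allocate a secondary index identifier according to the primary index identifier carried in it, create the file meta-information, and return it to the client.
The master node is further configured to determine, according to a load-balancing algorithm, a primary index to be migrated together with its source data node and target data node, and to send a data migration command to the target data node; the target data node is configured to obtain, after receiving the data migration command, all file meta-information under that primary index from the source data node and store it; and the master node is further configured to record the relationship between the primary index and the target data node and send the source data node a command to delete the primary index, so that the source data node deletes it.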
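In summary, according to the technical solution of this application, file meta-information is managed and stored cooperatively by the master node and the data nodes (storage nodes), which solves the single-point-of-failure problem caused by storing file meta-information centrally on the master node.
Brief Description of the Drawings
The drawings described here provide a further understanding of this application and form part of it. The illustrative embodiments of this application and their description explain the application and do not unduly limit it. In the drawings:
FIG. 1 is a flowchart of a file meta-information management method for a distributed file system according to an embodiment of this application;
FIG. 2 is a flowchart of file meta-information loading according to an embodiment of this application;
FIG. 3 is a structural block diagram of a distributed file system according to an embodiment of this application.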
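Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, the technical solutions are described clearly and completely below with reference to specific embodiments and the accompanying drawings. The described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of this application.
An embodiment of this application provides a file meta-information management method for a distributed file system, the distributed file system including at least one master node, multiple data nodes, and at least one client.
Referring to FIG. 1, which is a flowchart of a file meta-information management method for a distributed file system according to an embodiment of this application, the method includes: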
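Step S102: after receiving a create-file-meta-information request sent by the client, the master node generates a primary index identifier for the file, the primary index identifier being globally unique within the system.
According to this embodiment, the client provides the interface for creating files (creating file meta-information). The user sends a create-file-meta-information request (that is, a create-file request) to the master node through the client, and file meta-information is generated when the file is created. The master node generates the file's primary index and stores it locally, together with the location information of the data node responsible for managing the file. In a specific implementation, the master node can choose the data node responsible for the file according to a load-balancing algorithm; the details are not repeated here.
The file's primary index information includes: the primary index identifier (ID), the size of the files managed by the primary index (Size), the number of files managed by the primary index (Count), and the primary index version (Version). The primary index ID is globally unique within the system.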
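As an illustration of step S102, the sketch below shows how a master node might hand out globally unique primary index IDs and record which data node manages each one. It is a hypothetical Python sketch, not the patented implementation: the class names `PrimaryIndex` and `MasterAllocator` are invented, a monotonically increasing counter stands in for whatever ID scheme the master actually uses, and "fewest primary indexes" stands in for the unspecified load-balancing algorithm.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PrimaryIndex:
    """Primary index record kept in master memory (fields follow the text)."""
    index_id: int      # ID: globally unique within the system
    size: int = 0      # Size: total size of the files this index manages
    count: int = 0     # Count: number of files this index manages
    version: int = 0   # Version: used to decide which copy is newer


class MasterAllocator:
    """Hypothetical master-side allocator for step S102."""

    def __init__(self, data_nodes: List[str]) -> None:
        self.data_nodes = list(data_nodes)
        self._ids = itertools.count(1)            # monotonically increasing IDs
        self.indexes: Dict[int, PrimaryIndex] = {}
        self.location: Dict[int, str] = {}        # primary index ID -> data node

    def _least_loaded_node(self) -> str:
        # Stand-in for the unspecified load-balancing algorithm:
        # choose the data node that hosts the fewest primary indexes.
        load = {node: 0 for node in self.data_nodes}
        for node in self.location.values():
            load[node] += 1
        return min(self.data_nodes, key=lambda node: load[node])

    def create_primary_index(self) -> Tuple[int, str]:
        index_id = next(self._ids)
        node = self._least_loaded_node()
        self.indexes[index_id] = PrimaryIndex(index_id)
        self.location[index_id] = node
        return index_id, node
```

With two data nodes `["dn1", "dn2"]`, three calls return `(1, "dn1")`, `(2, "dn2")` and `(3, "dn1")`. An in-process counter only guarantees uniqueness for a single master; a deployment with several master nodes would need a shared or persisted ID source.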
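Step S104: after receiving the create-file-meta-information request sent by the client, the data node allocates a secondary index identifier for the file according to the primary index identifier, the secondary index identifier being unique within the primary index.
According to this embodiment, after receiving the file's primary index identifier returned by the master node, the client uses the location information of the data node responsible for the file to send that data node a create-file request carrying the file's primary index identifier. The data node then allocates a secondary index identifier (ID) for the file according to the primary index identifier; the secondary index ID is unique within a given primary index.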
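Because secondary IDs only need to be unique within one primary index, a data node can keep one counter per primary index. The following minimal sketch assumes that scheme; `SecondaryIdAllocator` is a hypothetical name, and a real data node would persist the counters so that IDs are not reused after a restart.

```python
from collections import defaultdict
from typing import DefaultDict


class SecondaryIdAllocator:
    """Hypothetical per-data-node allocator of secondary index IDs (step S104)."""

    def __init__(self) -> None:
        # One counter per primary index ID; IDs only need to be unique inside it.
        self._last: DefaultDict[int, int] = defaultdict(int)

    def allocate(self, primary_id: int) -> int:
        self._last[primary_id] += 1
        return self._last[primary_id]


alloc = SecondaryIdAllocator()
print(alloc.allocate(7), alloc.allocate(7), alloc.allocate(9))  # 1 2 1
```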
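Step S106: the client generates a file name from the primary index identifier and the secondary index identifier.
After generating the file's secondary index identifier, the data node returns it to the client. The client generates the file name (filename) from the returned primary and secondary index identifiers using a reversible encoding algorithm (for example, base64), and then sends the generated file name to the data node.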
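A minimal sketch of the file-name scheme, assuming both IDs fit in unsigned 64-bit integers and that unpadded URL-safe base64 is acceptable in file names (neither detail is specified in the text). Because the encoding is reversible, the client can later recover both IDs from the name alone, which is what the read flow relies on. Note that base64 is an encoding, not encryption: anyone holding a name can decode the IDs.

```python
import base64
import struct
from typing import Tuple

# Assumed layout: two unsigned 64-bit big-endian integers (16 bytes).
_ID_PAIR = struct.Struct(">QQ")


def make_filename(primary_id: int, secondary_id: int) -> str:
    """Pack both IDs and base64-encode them into a file name (step S106)."""
    raw = _ID_PAIR.pack(primary_id, secondary_id)
    # The URL-safe alphabet avoids '/', so the name is also a valid path component.
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def parse_filename(filename: str) -> Tuple[int, int]:
    """Reverse make_filename: recover (primary_id, secondary_id)."""
    padded = filename + "=" * (-len(filename) % 4)
    primary_id, secondary_id = _ID_PAIR.unpack(base64.urlsafe_b64decode(padded))
    return primary_id, secondary_id
```

For example, `parse_filename(make_filename(3, 42))` returns `(3, 42)`, and every name produced this way is 22 characters long.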
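Step S108: the data node stores the file meta-information, which includes the primary index identifier, the secondary index identifier, and the file name.
In one embodiment of this application, after the data node creates the file, the file meta-information it stores further includes the file creation time (create_time), file modification time (modify_time), file size (size), file status (status), and so on. In addition, the data node (the Master data node) also forwards this file meta-information to the backup data nodes (Slave data nodes).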
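The forwarding step can be sketched as below, assuming synchronous, in-process forwarding to every backup; the text does not say whether forwarding is synchronous or how many backups exist. `DataNode`, `store`, and the dict-based meta-information are hypothetical stand-ins for the on-disk store and the network calls.

```python
import copy
from typing import Dict, List, Optional, Tuple

Key = Tuple[int, int]  # (first_index_id, second_index_id)


class DataNode:
    """Hypothetical data node that stores meta-information and copies it to backups."""

    def __init__(self, name: str, backups: Optional[List["DataNode"]] = None) -> None:
        self.name = name
        self.backups = backups or []
        self.meta: Dict[Key, dict] = {}  # stand-in for the local on-disk store

    def store(self, meta: dict, forward: bool = True) -> None:
        key = (meta["first_index_id"], meta["second_index_id"])
        self.meta[key] = meta
        if forward:  # the Master data node forwards; Slave data nodes only store
            for backup in self.backups:
                backup.store(copy.deepcopy(meta), forward=False)


backup = DataNode("slave-1")
primary = DataNode("master-1", backups=[backup])
primary.store({"first_index_id": 1, "second_index_id": 2, "size": 0, "status": "normal"})
```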
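The above embodiment describes how file meta-information is generated. In this application, file meta-information is managed and stored cooperatively by the master node and the data nodes (storage nodes), which solves the prior-art single-point-of-failure problem caused by storing meta-information centrally on the master node.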
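The file meta-information loading process is described in detail below with reference to FIG. 2. At startup, a distributed file system needs to load file meta-information into master node memory. According to this embodiment, file meta-information is distributed across all data nodes of the distributed file system. As shown in FIG. 2, the loading process includes:
Step S202: at system startup, the master node obtains information about all data nodes in the system, including each data node's location information (IP address) and the port (PORT) on which the data node listens for the master node;
Step S204: according to the location information of all data nodes, the master node sends each data node a report request for its primary indexes;
Step S206: after receiving the report request, the data node reads its primary indexes from local disk and sends them to the master node, each primary index including: the primary index identifier, the size of the files managed by the primary index, the number of files managed by the primary index, and the primary index version;
Step S208: after receiving the reported information, the master node iterates over the returned primary indexes and checks whether each one already exists in master node memory;
Step S210: if it does not exist, the master node creates the data structures for that primary index, that is, it creates an identical primary index on the master node;
Step S212: if it exists, the master node compares the two copies, keeps the newer one, and deletes the other from the node that holds it;
Specifically, when comparing the two primary indexes, the version number (Version) determines which is newer. If the copy on the master node is newer, the copy on the data node is deleted; if the copy on the data node is newer, the master node stores that copy. This aligns primary index information between the master node and the data nodes.
Step S214: the master node periodically checks whether all data nodes have reported successfully; for any that have not, it returns to step S204 and sends report commands until reporting succeeds.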
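Steps S208 to S212 amount to a version-based merge. The sketch below is one way to express it, assuming that a higher `version` means newer and that equal versions need no action; `merge_report` and its return value (the IDs whose copy on the reporting data node is stale and would be deleted) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PrimaryIndex:
    index_id: int
    size: int     # total size of managed files
    count: int    # number of managed files
    version: int  # higher means newer (assumption)


def merge_report(memory: Dict[int, PrimaryIndex], location: Dict[int, str],
                 node: str, reported: List[PrimaryIndex]) -> List[int]:
    """Merge one data node's report into master memory (steps S208-S212).

    Returns the IDs whose copy on the reporting node is older than the
    master's, i.e. the copies the master would ask that node to delete.
    """
    stale: List[int] = []
    for idx in reported:
        current = memory.get(idx.index_id)
        if current is None or idx.version > current.version:
            # S210 (unknown index) or the data node's copy is newer: adopt it.
            memory[idx.index_id] = idx
            location[idx.index_id] = node
        elif idx.version < current.version:
            stale.append(idx.index_id)  # the master's copy is newer
        # Equal versions: already aligned, nothing to do.
    return stale
```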
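Note that while the system runs, network instability or software bugs may leave some indexes present on a data node but absent from the master node (users can neither see nor access them). The master node therefore periodically (for example, once a day) asks the data nodes to re-report their primary index information, to keep primary indexes aligned between the master node and the data nodes.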
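The retry in step S214 and the periodic re-report described above can share one loop. In this hypothetical sketch, `request_report` stands in for the report RPC and returns `True` once a node's primary indexes have been received and merged; a real master would wait between rounds (its timer) rather than retry immediately, and `max_rounds` is an invented safety limit.

```python
from typing import Callable, Iterable, Set


def collect_reports(nodes: Iterable[str],
                    request_report: Callable[[str], bool],
                    max_rounds: int = 10) -> Set[str]:
    """Keep asking data nodes to report until all succeed (steps S204-S214).

    request_report is a hypothetical RPC wrapper that returns True once the
    node's primary indexes have been received and merged. Returns the nodes
    that still have not reported after max_rounds rounds.
    """
    pending = set(nodes)
    for _ in range(max_rounds):
        if not pending:
            break
        # A real master would sleep between rounds (its periodic timer).
        pending = {node for node in pending if not request_report(node)}
    return pending
```

The daily realignment can call the same function for every data node.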
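In one embodiment of this application, the master node waits for the data nodes to report primary index information at startup, but it does not need to wait until all data nodes have finished reporting (that is, until all primary index information has been built) before providing service; it can load primary indexes and build the mapping between primary indexes and data nodes while serving reads and writes. The read and write service processes are described in detail below.
The read service process includes:
(1) The client receives an access-file-meta-information request from the user, carrying the file name; the client reverse-parses the file name in the request to obtain the file's primary index ID and secondary index ID;
(2) The client sends the master node an access-file-meta-information request carrying the primary index ID;
(3) The master node checks its memory for information matching the primary index ID; if found, it returns the in-memory primary index information and the location of the data node managing the file to the client; otherwise it returns a failure message;
(4) Using the returned data node location, the client sends an access-file-meta-information request, containing the primary index ID and the secondary index ID, to that data node;
(5) The data node looks up the file meta-information by the primary and secondary index IDs and returns it to the client, and the process ends.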
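The read flow can be sketched end to end with in-process dictionaries in place of RPCs. The sketch assumes a file name that is the unpadded URL-safe base64 encoding of two big-endian unsigned 64-bit IDs; `Master`, `lookup`, and `read_meta` are hypothetical names. Returning `None` from the master plays the role of the failure message in step (3), which is also what a client would see for an index whose data node has not reported yet.

```python
import base64
import struct
from typing import Dict, Optional, Tuple

Key = Tuple[int, int]
_ID_PAIR = struct.Struct(">QQ")  # assumed name layout: two big-endian uint64 IDs


def parse_filename(filename: str) -> Key:
    padded = filename + "=" * (-len(filename) % 4)
    primary_id, secondary_id = _ID_PAIR.unpack(base64.urlsafe_b64decode(padded))
    return primary_id, secondary_id


class Master:
    def __init__(self) -> None:
        self.location: Dict[int, str] = {}  # primary index ID -> data node

    def lookup(self, primary_id: int) -> Optional[str]:
        # Step (3): None stands for the failure message.
        return self.location.get(primary_id)


def read_meta(filename: str, master: Master,
              data_nodes: Dict[str, Dict[Key, dict]]) -> Optional[dict]:
    primary_id, secondary_id = parse_filename(filename)      # (1) parse locally
    node = master.lookup(primary_id)                          # (2)-(3) ask the master
    if node is None:
        return None
    return data_nodes[node].get((primary_id, secondary_id))  # (4)-(5) ask the data node
```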
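The write service process includes:
(1) The user sends a create-file (create-file-meta-information) request to the master node through the client;
(2) The master node looks in its memory for a primary index that meets the conditions; if one exists, it returns it to the client; otherwise it allocates a brand-new primary index and returns that to the client;
(3) The client sends a create-file (create-file-meta-information) request to the corresponding data node;
(4) The data node creates the file (file meta-information), persists the file meta-information to local disk, and returns it to the client.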
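A matching sketch of the write flow. "A primary index that meets the conditions" is not defined in the text, so this hypothetical version assumes a per-index file-count limit (`MAX_FILES_PER_INDEX`, kept tiny for the demo) and places new primary indexes round-robin; both are stand-ins for the unspecified policy, and the `"normal"` status value is invented. As in step S106, the client builds the file name from the two IDs before the data node persists the record.

```python
import base64
import itertools
import struct
import time
from typing import Dict, List, Tuple

MAX_FILES_PER_INDEX = 2  # hypothetical "meets the conditions" rule, tiny for the demo


def make_filename(primary_id: int, secondary_id: int) -> str:
    raw = struct.pack(">QQ", primary_id, secondary_id)
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


class Master:
    def __init__(self, nodes: List[str]) -> None:
        self.nodes = list(nodes)
        self._ids = itertools.count(1)
        self.count: Dict[int, int] = {}     # primary index ID -> files created
        self.location: Dict[int, str] = {}  # primary index ID -> data node

    def pick_primary_index(self) -> Tuple[int, str]:
        # Step (2): reuse a primary index that still has room, else allocate one.
        for pid, used in self.count.items():
            if used < MAX_FILES_PER_INDEX:
                self.count[pid] = used + 1
                return pid, self.location[pid]
        pid = next(self._ids)
        node = self.nodes[(pid - 1) % len(self.nodes)]  # round-robin placement
        self.count[pid] = 1
        self.location[pid] = node
        return pid, node


class DataNode:
    def __init__(self) -> None:
        self._last_secondary: Dict[int, int] = {}
        self.meta: Dict[Tuple[int, int], dict] = {}  # stand-in for local disk

    def allocate_secondary(self, pid: int) -> int:
        sid = self._last_secondary.get(pid, 0) + 1
        self._last_secondary[pid] = sid
        return sid

    def store(self, pid: int, sid: int, filename: str) -> dict:
        now = time.time()
        meta = {"filename": filename, "first_index_id": pid, "second_index_id": sid,
                "create_time": now, "modify_time": now, "size": 0, "status": "normal"}
        self.meta[(pid, sid)] = meta  # step (4): persist, then return to the client
        return meta


def create_file(master: Master, data_nodes: Dict[str, DataNode]) -> dict:
    pid, node = master.pick_primary_index()   # steps (1)-(2)
    dn = data_nodes[node]
    sid = dn.allocate_secondary(pid)          # step (3): data node allocates the ID
    name = make_filename(pid, sid)            # client builds the name (step S106)
    return dn.store(pid, sid, name)           # step (4)
```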
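According to this embodiment, the master node needs to load only a small amount of primary index information and can begin serving reads and writes once even a single primary index has been loaded. Moreover, because the data nodes load meta-information in parallel, each data node reads only 1/N of the total meta-information from disk (N being the number of data nodes), so the system starts faster.
In one embodiment of this application, because data and file meta-information are stored together, expanding meta-information capacity or data storage capacity expands both together. Expansion is controlled mainly by the master node, which migrates one primary index at a time until the capacity of all data nodes and the space occupied by file meta-information are roughly balanced. A primary index under migration cannot serve writes but can still serve reads; primary indexes not being migrated are unaffected and serve both reads and writes. The master node controls the migration rate, so migration has little impact on users. The expansion process is described in detail below.
(1) The master node periodically runs a load-balancing algorithm in the background, determines the primary index ID to migrate, the source data node holding that primary index, and the target data node, and then sends a data migration command to the target data node;
(2) After receiving the data migration command, the target data node actively pulls all file meta-information under the primary index from the source data node and persists it locally;
Specifically, using the source data node's location information, the target data node sends the source data node a migration command containing the primary index ID. After receiving it, the source data node looks up all secondary indexes and file meta-information (creation time, modification time, size, etc.) under that primary index ID, packages them, and returns them to the target data node. The target data node pulls the package from the source data node, and once both the file meta-information and the data have been stored locally, it reports the migration result to the master node.
(3) The master node rebuilds the mapping between primary indexes and data nodes (adding the mapping from the primary index to the target node) and sends the source data node a delete-primary-index command, upon which the source data node deletes the primary index.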
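The expansion loop can be sketched as below. The balancing rule (move one primary index from the node holding the most file meta-information records to the node holding the fewest, and stop once they differ by at most one) is an assumption, as are all class and method names; in-memory dicts replace disk and network. The `read_only` set models the rule that a primary index under migration rejects writes but still serves reads.

```python
from typing import Dict, Optional, Set, Tuple

Key = Tuple[int, int]  # (primary index ID, secondary index ID)
Meta = Dict[Key, dict]


class DataNode:
    def __init__(self) -> None:
        self.meta: Meta = {}  # stand-in for locally persisted meta-information

    def export_index(self, pid: int) -> Meta:
        # Source side: package every secondary index and its meta-information.
        return {k: dict(v) for k, v in self.meta.items() if k[0] == pid}

    def import_index(self, package: Meta) -> None:
        self.meta.update(package)  # target side: store the pulled package locally

    def delete_index(self, pid: int) -> None:
        for k in [k for k in self.meta if k[0] == pid]:
            del self.meta[k]


class Master:
    def __init__(self, nodes: Dict[str, DataNode]) -> None:
        self.nodes = nodes
        self.location: Dict[int, str] = {}  # primary index ID -> data node
        self.read_only: Set[int] = set()    # indexes under migration: reads only

    def pick_migration(self) -> Optional[Tuple[int, str, str]]:
        # Hypothetical balancing rule: most-loaded node -> least-loaded node.
        load = {name: len(dn.meta) for name, dn in self.nodes.items()}
        src = max(load, key=lambda n: load[n])
        dst = min(load, key=lambda n: load[n])
        if load[src] - load[dst] <= 1:
            return None  # roughly balanced; nothing to do
        pid = next((p for p, n in self.location.items() if n == src), None)
        if pid is None:
            return None
        return pid, src, dst

    def migrate_once(self) -> bool:
        """Migrate at most one primary index per call."""
        plan = self.pick_migration()
        if plan is None:
            return False
        pid, src, dst = plan
        self.read_only.add(pid)                # writes rejected, reads still served
        package = self.nodes[src].export_index(pid)
        self.nodes[dst].import_index(package)  # step (2): target pulls from source
        self.location[pid] = dst               # step (3): remap to the target
        self.nodes[src].delete_index(pid)      # step (3): source deletes its copy
        self.read_only.discard(pid)
        return True
```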
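An embodiment of this application further provides a distributed file system including at least one master node, multiple data nodes, and at least one client.
FIG. 3 is a structural block diagram of a distributed file system according to an embodiment of this application. For brevity, FIG. 3 shows only one master node, one data node, and one client, but this does not limit the number of master nodes, data nodes, or clients in this application.
The master node 10 is configured to generate, after receiving a create-file-meta-information request sent by the client, a primary index identifier for the file, the primary index identifier being globally unique within the system;
The client 30 receives the file's primary index identifier returned by the master node and sends a create-file request carrying that identifier to the corresponding data node.
The data node 20 is configured to allocate, after receiving the create-file-meta-information request sent by the client, a secondary index identifier for the file according to the primary index identifier, the secondary index identifier being unique within the primary index;
The client 30 is configured to generate a file name from the primary index identifier and the secondary index identifier;
The data node 20 is further configured to store the file meta-information, which includes the primary index identifier, the secondary index identifier, and the file name. Further, the data node is configured to forward the file meta-information to a backup data node.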
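According to this embodiment, after a file is created, the file meta-information stored by the data node includes:
File name (filename): generated from the primary index ID and secondary index ID by a reversible encoding algorithm (for example, base64);
File creation time (create_time);
File modification time (modify_time);
File length or file size (size);
File status (status);
Primary index ID (first_index_id): generated by the master node, globally unique;
Secondary index ID (second_index_id): generated by the data node, unique within a given primary index. The primary index ID and secondary index ID together identify a unique file.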
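The record above maps naturally onto a small data class. This is a sketch under assumptions: the field types, the default `status` value, the rule that a new file's modification time equals its creation time, and the `key` property are illustrative choices, not part of the text.

```python
import time
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class FileMeta:
    """File meta-information record stored on a data node."""
    filename: str            # encoded from first_index_id and second_index_id
    first_index_id: int      # generated by the master node, globally unique
    second_index_id: int     # generated by the data node, unique within the primary index
    size: int = 0
    status: str = "normal"   # hypothetical value; statuses are not enumerated in the text
    create_time: float = field(default_factory=time.time)
    modify_time: Optional[float] = None

    def __post_init__(self) -> None:
        if self.modify_time is None:
            self.modify_time = self.create_time  # a new file has not been modified yet

    @property
    def key(self) -> Tuple[int, int]:
        # The ID pair alone identifies exactly one file.
        return (self.first_index_id, self.second_index_id)
```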
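At startup, a distributed file system needs to load file meta-information into master node memory. According to this embodiment, file meta-information is distributed across all data nodes of the distributed file system, so the master node needs to load the data nodes' file meta-information into its memory.
First, the master node receives the location information of all data nodes and, according to it, sends the data nodes report requests for primary index information. After receiving a report request, a data node sends its locally stored primary index information to the master node. After receiving the returned primary index information, the master node iterates over it and checks whether each entry exists in memory; if so, it keeps the newer of the two; otherwise, it creates that primary index information on the master node. Finally, the master node periodically checks whether all data nodes have reported successfully; if not, it continues the loading process and sends report commands to the data nodes until reporting succeeds.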
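The operating steps of the method of this application correspond to the structural features of the system and can be cross-referenced; they are not repeated one by one.
In summary, the embodiments of this application have the following advantages:
(1) meta-information is distributed across multiple data nodes, avoiding a single point of failure;
(2) the master node stores only part of the meta-information, namely the primary index information, which reduces its storage burden;
(3) meta-information capacity can be expanded smoothly without interrupting read and write services;
(4) read and write services can be provided while meta-information is still being loaded.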
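Those skilled in the art should understand that the embodiments of this application may be provided as a method, a system, or a computer program product. Accordingly, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include non-persistent storage in computer-readable media, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassette tape, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.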
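It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element introduced by "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes that element.
The above are merely embodiments of this application and are not intended to limit it. Various modifications and changes to this application are possible for those skilled in the art. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of this application shall fall within the scope of the claims of this application.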

Claims (18)

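  1. A file meta-information management method for a distributed file system, the distributed file system comprising a master node, a data node, and a client, characterized in that the method comprises:
    after receiving a create-file-meta-information request sent by the client, the master node generates a primary index identifier for a file, wherein the primary index identifier is globally unique within the system;
    after receiving the create-file-meta-information request sent by the client, the data node allocates a secondary index identifier for the file according to the primary index identifier, wherein the secondary index identifier is unique within the primary index;
    the client generates a file name from the primary index identifier and the secondary index identifier;
    the data node stores file meta-information comprising the primary index identifier, the secondary index identifier, and the file name.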
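  2. The method according to claim 1, further comprising:
    the client receives the file's primary index identifier returned by the master node;
    the client sends a create-file request carrying the file's primary index identifier to the corresponding data node.
  3. The method according to claim 1, wherein after the file is created, the file meta-information stored by the data node further comprises: a file creation time, a file modification time, a file size, and a file status.
  4. The method according to claim 3, further comprising:
    the data node forwards the file meta-information to a backup data node.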
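  5. The method according to claim 1, further comprising:
    the master node receives location information of the data nodes in the system;
    according to the location information of the data nodes, the master node sends each data node a report request for reporting primary indexes;
    after receiving the report request, the data node sends its locally stored primary indexes to the master node, wherein each primary index comprises: a primary index identifier, the size of the files managed by the primary index, the number of files managed by the primary index, and primary index version information;
    the master node loads the received primary indexes into the memory of the master node and, if a corresponding primary index already exists in the memory, stores the newer of the two primary indexes.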
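  6. The method according to claim 5, further comprising:
    at system startup or at a predetermined time, the master node requires the data nodes in the system to report their primary indexes, and performs the step of loading the primary indexes into the memory of the master node.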
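  7. The method according to claim 5, wherein, while or after the master node performs the step of loading the primary indexes into the memory of the master node, the method further comprises:
    the client receives an access-file-meta-information request and parses the file name carried in the request to obtain the corresponding primary index identifier and secondary index identifier;
    the master node receives an access-file-meta-information request sent by the client and carrying the primary index identifier; the master node checks whether the primary index identifier exists in its memory and, if so, returns the primary index identifier to the client;
    the data node receives an access-file-meta-information request sent by the client and carrying the primary index identifier and the secondary index identifier; the data node looks up the corresponding file meta-information according to the primary index identifier and the secondary index identifier and returns it to the client.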
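  8. The method according to claim 5, wherein, while or after the master node performs the step of loading the primary indexes into the memory of the master node, the method further comprises:
    the master node receives a create-file-meta-information request sent by the client, and the master node generates a new primary index identifier or allocates an existing one and returns it to the client;
    the data node receives the create-file-meta-information request sent by the client and allocates a secondary index identifier according to the primary index identifier carried in the request;
    the data node creates the file meta-information and returns the created file meta-information to the client.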
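  9. The method according to claim 5, further comprising:
    the master node determines, according to a load-balancing algorithm, a primary index to be migrated and the source data node and target data node of that primary index, and sends a data migration command to the target data node;
    after receiving the data migration command, the target data node obtains all file meta-information under the primary index through the source data node and stores it;
    the master node stores the relationship between the primary index and the target data node and sends a command to delete the primary index to the source data node, so that the source data node deletes the primary index.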
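  10. A distributed file system, characterized by comprising: a master node, a data node, and a client;
    the master node is configured to generate, after receiving a create-file-meta-information request sent by the client, a primary index identifier for a file, wherein the primary index identifier is globally unique within the system;
    the data node is configured to allocate, after receiving the create-file-meta-information request sent by the client, a secondary index identifier for the file according to the primary index identifier, wherein the secondary index identifier is unique within the primary index;
    the client is configured to generate a file name from the primary index identifier and the secondary index identifier;
    the data node is further configured to store file meta-information comprising the primary index identifier, the secondary index identifier, and the file name.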
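  11. The system according to claim 10, wherein the client is further configured to receive the file's primary index identifier returned by the master node and send a create-file request carrying the file's primary index identifier to the corresponding data node.
  12. The system according to claim 10, wherein after the file is created, the file meta-information stored by the data node further comprises: a file creation time, a file modification time, a file size, and a file status.
  13. The system according to claim 12, wherein the data node is further configured to forward the file meta-information to a backup data node.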
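  14. The system according to claim 10, wherein
    the master node is further configured to receive location information of the data nodes in the system and, according to the location information, send each data node a report request for reporting primary indexes;
    the data node is further configured to send, after receiving the report request, its locally stored primary indexes to the master node, wherein each primary index comprises: a primary index identifier, the size of the files managed by the primary index, the number of files managed by the primary index, and version information of the primary index;
    the master node is further configured to load the received primary indexes into the memory of the master node and, if a corresponding primary index already exists in the memory, store the newer of the two primary indexes.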
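  15. The system according to claim 14, wherein
    the master node is further configured to require, at system startup or at a predetermined time, the data nodes in the system to report their primary indexes, and to perform the step of loading the primary indexes into the memory of the master node.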
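  16. The system according to claim 14, wherein
    the client is further configured to receive, while or after the master node performs the step of loading primary index information, an access-file-meta-information request, and to parse the file name carried in the request to obtain the corresponding primary index identifier and secondary index identifier;
    the master node is further configured to receive an access-file-meta-information request sent by the client and carrying the primary index identifier; the master node checks whether the primary index identifier exists in its memory and, if so, returns the primary index identifier to the client;
    the data node is further configured to receive an access-file-meta-information request sent by the client and carrying the primary index identifier and the secondary index identifier; the data node looks up the corresponding file meta-information according to the primary index identifier and the secondary index identifier and returns it to the client.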
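  17. The system according to claim 14, wherein
    the master node is further configured to receive, while or after performing the step of loading primary index information, a create-file-meta-information request sent by the client; the master node generates a new primary index identifier or allocates an existing one and returns it to the client;
    the data node is further configured to receive the create-file-meta-information request sent by the client and allocate a secondary index identifier according to the primary index identifier carried in the request; the data node creates the file meta-information and returns the created file meta-information to the client.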
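  18. The system according to claim 14, wherein
    the master node is further configured to determine, according to a load-balancing algorithm, a primary index to be migrated and the source data node and target data node of that primary index, and to send a data migration command to the target data node;
    the target data node is configured to obtain, after receiving the data migration command, all file meta-information under the primary index through the source data node and store it;
    the master node is further configured to store the relationship between the primary index and the target data node and send a command to delete the primary index to the source data node, so that the source data node deletes the primary index.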
PCT/CN2016/085208 2015-06-18 2016-06-08 Distributed file system and file meta-information management method thereof WO2016202199A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510342104.5A CN106326239B (zh) 2015-06-18 2015-06-18 Distributed file system and file meta-information management method thereof
CN201510342104.5 2015-06-18

Publications (1)

Publication Number Publication Date
WO2016202199A1 true WO2016202199A1 (zh) 2016-12-22

Family

ID=57545012

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/085208 WO2016202199A1 (zh) 2015-06-18 2016-06-08 Distributed file system and file meta-information management method thereof

Country Status (2)

Country Link
CN (1) CN106326239B (zh)
WO (1) WO2016202199A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
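CN110727652B (zh) * 2018-07-17 2023-06-30 阿里巴巴集团控股有限公司 Cloud storage processing system and method for implementing data processing thereof
CN111221814B (zh) * 2018-11-27 2023-06-27 阿里巴巴集团控股有限公司 Method, apparatus and device for constructing a secondary index
CN110196851B (zh) * 2019-05-09 2024-05-10 腾讯科技(深圳)有限公司 Data storage method, apparatus, device and storage medium
CN110413407B (zh) * 2019-06-27 2022-05-17 国网浙江省电力有限公司电力科学研究院 Storage and fast indexing method for restored files in a high-traffic environment
CN113239013B (zh) * 2021-05-17 2024-04-09 北京青云科技股份有限公司 Distributed system and storage medium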

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
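CN102411637A (zh) * 2011-12-30 2012-04-11 创新科软件技术(深圳)有限公司 Metadata management method for a distributed file system
CN103150394A (zh) * 2013-03-25 2013-06-12 中国人民解放军国防科学技术大学 Distributed file system metadata management method for high-performance computing
CN103577500A (zh) * 2012-08-10 2014-02-12 腾讯科技(深圳)有限公司 Method for data processing in a distributed file system, and the distributed file system
CN104376025A (zh) * 2013-08-16 2015-02-25 华为技术有限公司 Data storage method and apparatus for a distributed database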

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103092927B (zh) * 2012-12-29 2016-01-20 华中科技大学 Fast file read/write method in a distributed environment
US10120868B2 (en) * 2013-09-04 2018-11-06 Red Hat, Inc. Outcast index in a distributed file system

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
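WO2020125630A1 (zh) * 2018-12-17 2020-06-25 新华三大数据技术有限公司 File reading
CN111666035A (zh) * 2019-03-05 2020-09-15 阿里巴巴集团控股有限公司 Management method and apparatus for a distributed storage system
CN111666035B (zh) * 2019-03-05 2023-06-20 阿里巴巴集团控股有限公司 Management method and apparatus for a distributed storage system
CN110334054A (zh) * 2019-05-17 2019-10-15 杭州亦笔科技有限公司 Sharded storage method for blockchain files
CN111125216A (zh) * 2019-12-10 2020-05-08 中盈优创资讯科技有限公司 Method and apparatus for importing data into Phoenix
CN111125216B (zh) * 2019-12-10 2024-03-12 中盈优创资讯科技有限公司 Method and apparatus for importing data into Phoenix
CN111399764A (zh) * 2019-12-25 2020-07-10 杭州海康威视系统技术有限公司 Data storage method, reading method, apparatus, device and storage medium
CN111190861A (zh) * 2019-12-27 2020-05-22 中移(杭州)信息技术有限公司 Hot file management method, server and computer-readable storage medium
CN111190861B (zh) * 2019-12-27 2023-06-30 中移(杭州)信息技术有限公司 Hot file management method, server and computer-readable storage medium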

Also Published As

Publication number Publication date
CN106326239B (zh) 2020-01-31
CN106326239A (zh) 2017-01-11

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 16810940

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 16810940

Country of ref document: EP

Kind code of ref document: A1