WO2023179787A1 - Metadata management method and apparatus for a distributed file system - Google Patents

Metadata management method and apparatus for a distributed file system

Info

Publication number
WO2023179787A1
Authority
WO
WIPO (PCT)
Prior art keywords
metadata
directory
index node
node number
level
Prior art date
Application number
PCT/CN2023/083879
Other languages
English (en)
Chinese (zh)
Inventor
苏昆辉
殳鑫鑫
杨彦斌
郑锴
王道远
孙大鹏
曹杰
孙立晟
Original Assignee
阿里巴巴(中国)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴(中国)有限公司
Publication of WO2023179787A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/134 Distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing

Definitions

  • This specification relates to the field of storage technology, and in particular, to a metadata management method and device for a distributed file system.
  • this specification provides a metadata management method and device for a distributed file system.
  • a metadata management method for a distributed file system applied to a distributed file system.
  • the distributed file system stores a mapping relationship between keywords of directories at all levels and index node numbers of directory metadata, wherein the keyword of the current-level directory is generated based on the index node number of the upper-level directory metadata and the current-level directory name.
  • the method includes:
  • generating keywords for the current-level directory based on the directory name and the index node number of the upper-level directory metadata includes:
  • a keyword for the current level directory is generated based on the directory name and preset characters.
  • generating keywords for the current-level directory based on the directory name and the index node number of the upper-level directory metadata includes:
  • the directory name and the index node number of the upper-level directory metadata are spliced in a preset order to generate the keyword for the current-level directory.
  • the metadata of the file path is obtained from the cloud database based on the index node number of the file path, including:
  • when the index node numbers of the directory metadata at all levels found based on the mapping relationship are the same as the index node numbers found in the cloud database, the metadata of the file path is obtained from the cloud database based on the index node number of the file path.
  • Optionally, the method also includes:
  • mapping relationship is updated based on the index node numbers found in the cloud database.
  • Optionally, the method also includes:
  • the index node number of the current-level directory metadata is looked up in the cloud database based on the keyword, and the mapping relationship is updated based on the index node number found.
  • a data access method for distributed file systems, applied to distributed file systems including:
  • the aforementioned metadata management method is used to perform metadata query based on the path.
  • a metadata management device for a distributed file system applied to a distributed file system.
  • the distributed file system stores a mapping relationship between keywords of directories at all levels and index node numbers of directory metadata, wherein the keyword of the current-level directory is generated based on the index node number of the upper-level directory metadata and the current-level directory name.
  • the device includes:
  • the name acquisition unit extracts the directory names of directories at all levels from the file path
  • the keyword generation unit generates keywords for the current-level directory based on the directory name and the index node number of the upper-level directory metadata for each extracted directory name in the order of the directory from upper level to lower level;
  • a number search unit that searches the index node number of the current level directory metadata in the mapping relationship based on the keyword
  • a metadata acquisition unit that acquires the metadata of the file path from the cloud database based on the index node number of the file path.
  • a data access device for a distributed file system applied to the distributed file system, including:
  • the metadata query unit responds to the data access request sent by the client and queries the corresponding metadata according to the path of the data to be accessed;
  • a data access unit returns the metadata to the client so that the client can access data based on the metadata
  • the aforementioned metadata management method is used to perform metadata query based on the path.
  • a metadata management device for a distributed file system including:
  • Memory used to store machine-executable instructions
  • the distributed file system stores a mapping relationship between the keywords of directories at each level and the index node numbers of directory metadata.
  • the keyword of the current-level directory is generated based on the index node number of the upper-level directory metadata and the name of the current-level directory.
  • a computer-readable storage medium stores a computer program, and the computer program is used to cause a processor to execute the above metadata management method.
  • the mapping relationship between directory keywords at all levels and the index node numbers of directory metadata is locally stored in the distributed file system.
  • jointly storing metadata in the distributed file system and a cloud database removes the performance bottleneck of a single-machine metadata service, improves the scalability of the system, and can support file storage at a scale of more than one billion files.
  • Figure 1 is a schematic diagram of the architecture of a distributed file system in related technologies.
  • Figure 2 is a schematic flowchart of a metadata management method for a distributed file system according to an exemplary embodiment of this specification.
  • FIG. 3 is a schematic flowchart of another metadata management method of a distributed file system according to an exemplary embodiment of this specification.
  • Figure 4 is a schematic architectural diagram of a distributed file system according to an exemplary embodiment of this specification.
  • Figure 5 is a schematic flowchart of a data access method in a distributed file system according to an exemplary embodiment of this specification.
  • FIG. 6 is a hardware structure diagram of an electronic device in which a metadata management device of a distributed file system is located according to an exemplary embodiment of this specification.
  • FIG. 7 is a block diagram of a metadata management device of a distributed file system according to an exemplary embodiment of this specification.
  • FIG. 8 is a block diagram of a data access device of a distributed file system according to an exemplary embodiment of this specification.
  • first, second, third, etc. may be used in this specification to describe various information, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from each other.
  • first information may also be called second information, and similarly, the second information may also be called first information.
  • word “if” as used herein may be interpreted as “upon”, “when”, or “in response to determining.”
  • Figure 1 is a schematic architectural diagram of a distributed file system according to an exemplary embodiment of this specification.
  • a distributed file system may include name nodes and data nodes.
  • the namenode is responsible for managing the namespace of the distributed file system, maintaining the file system tree, and the metadata of each file and folder in the file system.
  • Datanode is used to store data in the form of blocks.
  • When a distributed file system client performs data access, it can send an access request to the name node. Taking a data read request as an example, the name node searches for the corresponding metadata and returns it to the client. The client can then determine, from the metadata, the data block in which the data is located, and read the corresponding data from the data node based on that data block. Taking a data write request as an example, the name node also searches for the corresponding metadata. If the metadata is found, it can be returned to the client.
  • The client can then determine the data block from the metadata and write the corresponding data to the data node based on that data block. If the metadata is not found, a new index node number can be created, and metadata such as the location of the data block to be written is returned to the client, which can then write the data in the corresponding data block.
  • This specification provides a metadata management solution for a distributed file system.
  • the distributed file system can jointly implement metadata storage in conjunction with the cloud database, thus solving the limitation of disk capacity on metadata storage.
  • Metadata is data about data. It describes data attributes and supports functions such as indicating storage location, recording history, resource search, and file recording.
  • Metadata includes necessary descriptive information for reading and writing, such as real path, size, creation time, permissions, etc.
  • the file path can usually point to a specific file, and the file can be accessed through the file path.
  • a path usually includes multiple levels of directories, and each level of directory can have a corresponding directory name.
  • for example, the file path /user/hive/warehouse/file includes four levels of directories.
  • the names of these four levels of directories are the folder names: user, hive, warehouse and file.
  • the distributed file system can store the mapping relationship between the keywords of directories at all levels and the index node numbers of directory metadata, without the need to store the full amount of metadata.
  • the mapping relationship may be stored in the form of key-value, for example, in the Namenode.
  • the index node number is the inode (index node) number, that is, the inode ID.
  • An inode is a data structure that allows metadata to be looked up based on the inode number.
  • the keyword may be generated based on the index node number of the upper-level directory metadata and the current-level directory name.
  • the name of the current-level directory and the index node number of the metadata of the upper-level directory are spliced to obtain the keywords of the current-level directory.
  • the keyword 100hive can be generated.
  • the current-level directory name and the index node number of the upper-level directory metadata are calculated to obtain the keywords of the current-level directory.
  • for the keyword of the first-level directory, there is no upper-level directory.
  • in this case, the keyword for the first-level directory can be generated based on the first-level directory name and a preset character.
  • the keyword 0user can be generated based on the preset character 0 and the directory name user.
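  • The splicing rule described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `make_key`, the constant `ROOT_PREFIX`, and the choice to place the index node number before the directory name are assumptions taken from the 0user and 100hive examples.

```python
# Hypothetical helper; names are illustrative and do not come from the specification.
ROOT_PREFIX = "0"  # preset character used when there is no upper-level directory


def make_key(parent_inode, name):
    """Splice the upper-level inode number and the directory name into a keyword."""
    prefix = ROOT_PREFIX if parent_inode is None else str(parent_inode)
    return prefix + name


first_level_key = make_key(None, "user")   # first-level directory: preset character + name
second_level_key = make_key(100, "hive")   # second-level directory: parent inode + name
print(first_level_key, second_level_key)
```

  • One observation about plain concatenation: an inode number 10 with the name "0x" and an inode number 100 with the name "x" would both yield "100x", so a real system would presumably need a delimiter or a fixed-width number to keep keywords unique. The specification does not say how this is handled.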
  • the cloud database can store the full amount of metadata, and can also store the mapping relationship between the keywords of directories at all levels and the index node numbers of the directory metadata, so that the distributed file system can update its stored mapping relationship.
  • the distributed file system and the cloud database jointly realize the storage of metadata. There is no need to store all metadata in the distributed file system.
  • This distributed metadata storage method effectively removes the limit that the disk capacity of the distributed file system places on metadata storage, and is suitable for application scenarios with massive numbers of files, such as data lakes.
  • Figure 2 is a schematic flowchart of a metadata management method for a distributed file system according to an exemplary embodiment of this specification.
  • the metadata management method of the distributed file system can be applied to the distributed file system, for example, applied to the name node in the distributed file system, including the following steps:
  • Step 202 Extract the directory names of directories at each level from the file path.
  • the user-side client can send read and write requests to the distributed file system.
  • the distributed file system usually needs to look up metadata for the file: the metadata of the file path, and sometimes the metadata of the directory where the file is located, the metadata of the directory above it, and so on. From this metadata, information such as file type, file size, creation time, modification time, user, and executable permissions can be obtained.
  • the distributed file system when performing metadata search, can first extract the directory names of directories at all levels from the file path.
  • the directory names user, hive, warehouse, and file at all levels can be extracted.
  • Step 204 According to the order of directories from upper level to lower level, for each extracted directory name, generate a keyword for the current level directory based on the directory name and the index node number of the upper level directory metadata.
  • Step 206 Search the index node number of the current level directory metadata in the mapping relationship based on the keyword.
  • the distributed file system can find the index node numbers of the directories at each level on the file path based on the mapping relationship between the locally stored keywords of the directories at each level and the index node numbers of the directory metadata.
  • Before performing a lookup, the distributed file system can generate the keywords needed to look up the index node numbers.
  • Because the keywords in this specification are generated based on the index node number of the upper-level directory metadata, when querying index node numbers, the keyword for each level of directory can be generated in order from the upper level to the lower level, and the index node number of each level of directory queried in turn.
  • the keyword of the first-level directory can be generated first, and the index node number of the first-level directory metadata can then be found, based on that keyword, in the mapping relationship stored locally in the distributed file system. Next, the index node number of the second-level directory metadata can be found in the mapping relationship based on the directory name of the second-level directory and the index node number of the first-level directory metadata. Then, the index node number of the third-level directory metadata can be found in the mapping relationship based on the directory name of the third-level directory and the index node number of the second-level directory metadata. By analogy, the index node numbers of directory metadata at all levels on the file path can be found.
  • the distributed file system can store the mapping relationship shown in Table 2 below in key-value form.
  • Table 2 is only an illustrative description. In actual applications, there is no need to store the left directory column. Moreover, in addition to storing the index node number, the value field can also store some metadata of the directory, such as directory name, directory size, etc.
  • when querying the index node number, the keyword 0user of the first-level directory can be generated based on the first-level directory name user and the preset character 0, and the mapping relationship shown in Table 2 can be queried based on the keyword 0user to find the index node number 100 of the first-level directory metadata.
  • the second-level directory keyword 100hive can be generated based on the second-level directory name hive and the index node number 100 of the first-level directory metadata, and the mapping relationship shown in Table 2 can be queried based on the keyword 100hive to find the index node number 101 of the second-level directory metadata.
  • the third-level directory keyword 101warehouse can be generated based on the third-level directory name warehouse and the index node number 101 of the second-level directory metadata, and the mapping relationship shown in Table 2 can be queried based on the keyword 101warehouse to find the index node number 102 of the third-level directory metadata.
  • the fourth-level directory keyword 102file can be generated based on the fourth-level directory name file and the index node number 102 of the third-level directory metadata, and the mapping relationship shown in Table 2 can be queried based on the keyword 102file to find the index node number 103 of the fourth-level directory metadata.
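  • The level-by-level lookup walked through above can be sketched as a short loop. This is an illustrative sketch only: the dictionary `LOCAL_MAPPING` stands in for the locally stored key-value mapping and holds just the four keyword and index node number pairs named in the walkthrough, and the function name `resolve` is hypothetical.

```python
# Keyword -> inode number pairs taken from the walkthrough; the dict itself is a stand-in
# for whatever key-value store the name node actually uses.
LOCAL_MAPPING = {
    "0user": 100,
    "100hive": 101,
    "101warehouse": 102,
    "102file": 103,
}


def resolve(path, mapping):
    """Return the inode number of every directory level on the path, top to bottom."""
    inodes = []
    parent = "0"  # preset character for the first-level directory
    for name in path.strip("/").split("/"):
        key = f"{parent}{name}"      # splice upper-level inode number and directory name
        inode = mapping[key]         # raises KeyError if the keyword is not stored locally
        inodes.append(inode)
        parent = inode               # this level becomes the upper level of the next one
    return inodes


inode_chain = resolve("/user/hive/warehouse/file", LOCAL_MAPPING)
print(inode_chain)  # [100, 101, 102, 103]
```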
  • step 202 can be executed before step 204, that is, before generating keywords, the directory names of directories at each level are extracted from the file path.
  • Step 202 can also be executed in conjunction with the loop of steps 204-206. That is, in step 202, the first-level directory name is first extracted from the file path, and steps 204-206 are then executed to generate the keyword of the first-level directory and search for the index node number of the first-level directory metadata.
  • The process can then return to step 202 to extract the second-level directory name from the file path, and execute steps 204-206 to generate the keyword of the second-level directory and search for the index node number of the second-level directory metadata, and so on, executing steps 202-206 in a loop. This specification does not impose special restrictions on this.
  • Step 208 Obtain the metadata of the file path from the cloud database based on the index node number of the file path.
  • the distributed file system can obtain the full amount of metadata pointed to by the index node numbers from the cloud database.
  • metadata can be obtained in the cloud database based on access requirements.
  • the metadata of the file path /user/hive/warehouse/file can be obtained based on the index node number 103. If the metadata of the file's upper-level directory is needed, the metadata of the third-level directory /user/hive/warehouse can be obtained based on the index node number 102; on this basis, the metadata of the second-level directory /user/hive can be obtained based on the index node number 101, and so on.
  • the mapping relationship between directory keywords at all levels and the index node numbers of directory metadata is locally stored in the distributed file system.
  • jointly storing metadata in the distributed file system and a cloud database removes the performance bottleneck of a single-machine metadata service, improves the scalability of the system, and can support file storage at a scale of more than one billion files.
  • batch processing can be used to merge the multiple index node numbers that need to be queried, and then obtain the metadata for these index node numbers from the cloud database in a single request.
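  • A minimal sketch of such batching is shown below, assuming the cloud database can be modelled as a dictionary keyed by index node number. The name `batch_get_metadata`, the dictionary `CLOUD_METADATA`, and the metadata fields are hypothetical; a real cloud database would expose its own multi-get call.

```python
# Stand-in for the cloud database; the metadata fields are invented for illustration.
CLOUD_METADATA = {
    101: {"path": "/user/hive", "type": "dir"},
    102: {"path": "/user/hive/warehouse", "type": "dir"},
    103: {"path": "/user/hive/warehouse/file", "type": "file"},
}


def batch_get_metadata(inodes, store):
    """Merge the requested inode numbers and fetch their metadata in one pass."""
    unique = list(dict.fromkeys(inodes))  # drop duplicates while keeping request order
    return {inode: store[inode] for inode in unique if inode in store}


fetched = batch_get_metadata([103, 102, 103], CLOUD_METADATA)
print(sorted(fetched))
```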
  • if the index node number of the directory metadata cannot be found in the mapping relationship based on the keyword generated in step 206, it may mean that the distributed file system has not yet stored locally the mapping relationship, held in the cloud database, between the directory keywords at all levels and the index node numbers of the directory metadata; or that the original directory name has been modified, so that the keyword the distributed file system generates from the new directory name has no corresponding index node number.
  • When the distributed file system cannot find the index node number of the directory metadata in the local mapping relationship based on the generated keyword, it can look up the index node number in the cloud database based on that keyword, update the local mapping relationship accordingly, and obtain the directory metadata based on the index node number found in the cloud database.
  • when the distributed file system obtains the index node numbers of the directory metadata at each level on the new file path /user/hive001/warehouse/file, it first generates the first-level directory keyword 0user based on the first-level directory name user and the preset character 0, queries the locally stored mapping relationship shown in Table 2 based on the keyword 0user, and finds the index node number 100 of the first-level directory metadata.
  • the second-level directory keyword 100hive001 is generated based on the second-level directory name hive001 and the index node number 100 of the first-level directory metadata. No corresponding index node number can be found for the keyword 100hive001 in the locally stored Table 2.
  • the distributed file system can then query the index node number in the cloud database. That is, query the index node number 101 corresponding to the keyword 100hive001 in the mapping relationship shown in Table 3 stored in the cloud database.
  • the distributed file system can also update the locally stored mapping relationship based on the correspondence between the keyword 100hive001 and the index node number 101 queried in the cloud database, that is, update the locally stored mapping relationship shown in Table 2 to the mapping relationship shown in Table 3.
  • when a directory name is modified, only the keyword of the corresponding directory needs to be modified.
  • the distributed file system can also query the cloud database for the index node numbers of the metadata of each directory below the second-level directory, that is, further query the index node numbers of the third-level and fourth-level directory metadata in the cloud database, and update the locally stored mapping relationship based on the query results. This avoids local query failures or inaccurate queries caused by the names of lower-level directories also having been modified.
  • the distributed file system can also periodically obtain the latest mapping relationship from the cloud database and update the latest mapping relationship locally. This specification does not impose special restrictions on this.
  • The metadata management solution for distributed file systems provided in this specification also ensures that the distributed file system obtains accurate metadata when metadata changes, and avoids obtaining incorrect metadata because the locally stored mapping relationship was not updated in time.
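  • The fallback on a local miss can be sketched as follows, using the hive001 example. The two dictionaries are stand-ins for Table 2 (local) and Table 3 (cloud) and contain only the keyword and index node number pairs named in the text; the function name `resolve_with_fallback` is hypothetical. For brevity, this sketch consults the cloud database only for the level that misses, whereas the text also allows querying the lower levels there.

```python
local_mapping = {"0user": 100, "100hive": 101, "101warehouse": 102, "102file": 103}
cloud_mapping = {"0user": 100, "100hive001": 101, "101warehouse": 102, "102file": 103}


def resolve_with_fallback(path, local, cloud):
    """Resolve each level locally; on a miss, ask the cloud mapping and cache the answer."""
    inodes = []
    parent = "0"  # preset character for the first-level directory
    for name in path.strip("/").split("/"):
        key = f"{parent}{name}"
        inode = local.get(key)
        if inode is None:
            inode = cloud[key]   # look the keyword up in the cloud database
            local[key] = inode   # update the locally stored mapping relationship
        inodes.append(inode)
        parent = inode
    return inodes


resolved = resolve_with_fallback("/user/hive001/warehouse/file", local_mapping, cloud_mapping)
print(resolved)  # [100, 101, 102, 103]
```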
  • FIG. 3 is a schematic flowchart of another metadata management method of a distributed file system according to an exemplary embodiment of this specification.
  • the metadata management method of the distributed file system can be applied to the distributed file system and includes the following steps:
  • Step 302 Extract the directory names of directories at each level from the file path.
  • Step 304 According to the order of directories from upper level to lower level, for each extracted directory name, generate a keyword for the current level directory based on the directory name and the index node number of the upper level directory metadata.
  • Step 306 Search the index node number of the current level directory metadata in the local mapping relationship based on the keyword.
  • steps 302-306 may refer to the implementation of steps 202-206 in the embodiment shown in FIG. 2, and will not be described in detail here.
  • Step 308 Search the cloud database for the index node number of the directory metadata corresponding to the directory keywords at each level of the file path.
  • after the distributed file system finds the index node numbers of directory metadata at all levels on the file path based on the locally stored mapping relationship, it can also query the cloud database for the index node numbers of the directories at all levels based on the generated keywords of those directories.
  • batch processing is used to merge multiple index node numbers that need to be queried, and then query the cloud database.
  • after the distributed file system finds the index node numbers 100-103 of the directory metadata at all levels in the locally stored mapping relationship, it can use the keywords of the directories at all levels, 0user, 100hive, 101warehouse and 102file, to query the index node numbers of the directory metadata at all levels in the cloud database. That is, the index node numbers are queried based on the mapping relationship, stored in the cloud database, between the directory keywords at all levels and the index node numbers.
  • Step 310 Determine whether the index node number of the directory metadata at each level found based on the mapping relationship is the same as the index node number found in the cloud database.
  • the distributed file system determines whether the index node number found in the local mapping relationship is the same as the index node number found in the cloud database.
  • If they are the same, step 312 can be executed.
  • If they are different, step 314 can be executed.
  • Step 312 When the index node numbers of the directory metadata at all levels found based on the mapping relationship are the same as the index node numbers found in the cloud database, obtain the metadata of the file path from the cloud database based on the index node number of the file path.
  • if the index node number found in the local mapping relationship is the same as the index node number found in the cloud database, the locally stored mapping relationship is the latest one and is consistent with the mapping relationship stored in the cloud database.
  • the metadata has not changed and can be obtained from the cloud database based on the index node number.
  • Step 314 When the index node numbers of the directory metadata at all levels found based on the mapping relationship differ from the index node numbers found in the cloud database, update the mapping relationship based on the index node numbers found in the cloud database, and obtain metadata based on the index node numbers queried in the cloud database.
  • if the index node number found in the local mapping relationship differs from the index node number found in the cloud database, that is, the same directory keyword corresponds to different metadata index node numbers, the directory name in the cloud database may have been updated while the local mapping relationship has not been updated in time.
  • the index node number found in the local mapping relationship based on the new directory name is then not the index node number of the updated directory metadata being sought; it may be the index node number of historical directory metadata previously held in the cloud database.
  • for example, if the index node number found for the keyword 100hive001 in the locally stored mapping relationship is 200, which differs from the index node number 101 stored for 100hive001 in the cloud database, the local mapping relationship has not been updated in time.
  • 200 may be the index node number of the historical directory /user/hive001 in the cloud database, which may no longer exist or has been modified.
  • On the one hand, the distributed file system can update the locally stored mapping relationship based on the index node number queried in the cloud database; on the other hand, it can obtain metadata based on the index node number queried in the cloud database.
  • for example, the distributed file system finds the index node numbers 100-103 for the directory metadata at all levels in the locally stored mapping relationship, while the index node numbers found in the cloud database are 100, 101, 105 and 106. That is, the index node numbers of the third-level and fourth-level directory metadata differ from those stored locally.
  • based on the query results from the cloud database, the distributed file system can modify the index node number of the third-level directory metadata stored in the local mapping relationship from 102 to 105, and the index node number of the fourth-level directory metadata from 103 to 106.
  • the distributed file system can also obtain third-level directory metadata and fourth-level directory metadata based on index node numbers 105 and 106.
  • before obtaining metadata from the cloud database, the distributed file system determines whether the index node number queried in the cloud database is the same as the index node number queried in the local mapping relationship, and obtains the metadata directly if the index node numbers are the same.
  • accurate metadata can still be obtained, which can effectively avoid problems such as metadata acquisition errors caused by the local mapping relationship of the distributed file system not being updated in time in high concurrency scenarios.
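  • The comparison in steps 308-314 can be sketched as follows, using the 100-103 versus 100, 101, 105, 106 example. The dictionaries stand in for the local and cloud mapping relationships, and the function name `reconcile` is hypothetical. As in the text, the cloud database is queried with the keywords already generated during the local lookup.

```python
local_mapping = {"0user": 100, "100hive": 101, "101warehouse": 102, "102file": 103}
cloud_mapping = {"0user": 100, "100hive": 101, "101warehouse": 105, "102file": 106}


def reconcile(keys, local, cloud):
    """Compare local and cloud inode numbers per keyword; the cloud value is authoritative."""
    result = []
    for key in keys:
        cloud_inode = cloud[key]
        if local.get(key) != cloud_inode:
            local[key] = cloud_inode  # step 314: refresh the stale local entry
        result.append(cloud_inode)    # metadata is then fetched with the cloud's number
    return result


keys = ["0user", "100hive", "101warehouse", "102file"]
verified = reconcile(keys, local_mapping, cloud_mapping)
print(verified)  # [100, 101, 105, 106]
```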
  • this specification also provides a data access method for the distributed file system, which can be applied to the name node in the distributed file system. Referring to Figure 4 and Figure 5, the method includes the following steps:
  • Step 502 In response to the data access request sent by the client, query the corresponding metadata according to the path of the data to be accessed.
  • the data access request may be a data read request or a data write request.
  • the name node queries the corresponding metadata based on the data path to be read.
  • the metadata query can be implemented based on the metadata query solution described in the embodiment of FIG. 2 or FIG. 3 mentioned above in this specification. For example, the name node first queries the index node number of the path in the mapping relationship between locally stored directory keywords at all levels and metadata index node numbers, and then obtains the corresponding metadata from the cloud database.
  • Step 504: Return the metadata to the client so that the client can access data based on the metadata.
  • the metadata can be returned to the client.
  • the client can determine, based on the metadata, the data block where the data is located, and then read the corresponding data from the data node that holds that data block.
  • the name node can also implement metadata query based on the metadata query scheme recorded in the embodiment of Figure 2 or Figure 3 mentioned above in this specification.
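Steps 502 and 504 can be illustrated with a small sketch of the read case. The class and function names (`NameNode`, `BlockLocation`, `client_read`), the shape of the metadata (a list of block locations), and the in-memory dictionaries standing in for data nodes are hypothetical; the metadata query itself is stubbed as a dictionary lookup rather than the keyword-based query of Figure 2 or Figure 3.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class BlockLocation:
    """Assumed metadata shape: which data node holds which block."""
    block_id: str
    data_node: str


class NameNode:
    def __init__(self, metadata_by_path: Dict[str, List[BlockLocation]]):
        # Stand-in for the keyword-based metadata query against the cloud database.
        self._metadata_by_path = metadata_by_path

    def handle_access_request(self, path: str) -> List[BlockLocation]:
        metadata = self._metadata_by_path[path]  # Step 502: query metadata by path
        return metadata                          # Step 504: return it to the client


def client_read(name_node: NameNode,
                data_nodes: Dict[str, Dict[str, bytes]],
                path: str) -> bytes:
    """Client side: ask the name node for metadata, then read each block from its data node."""
    blocks = name_node.handle_access_request(path)
    return b"".join(data_nodes[b.data_node][b.block_id] for b in blocks)


name_node = NameNode({
    "/a/b/file": [BlockLocation("blk_1", "dn1"), BlockLocation("blk_2", "dn2")],
})
data_nodes = {"dn1": {"blk_1": b"hello "}, "dn2": {"blk_2": b"world"}}
content = client_read(name_node, data_nodes, "/a/b/file")
```

The name node only hands out metadata in this sketch; the data itself flows directly between the client and the data nodes, which matches the division of roles described above.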
  • this specification also provides embodiments of the metadata management apparatus of the distributed file system.
  • the embodiments of the metadata management apparatus of the distributed file system in this specification can be applied in electronic devices.
  • the device embodiments may be implemented by software, or may be implemented by hardware or a combination of software and hardware.
  • Taking software implementation as an example, the apparatus, as a logical device, is formed by the processor of the electronic device where it is located reading the corresponding computer program instructions from the non-volatile memory into the memory and running them.
  • At the hardware level, Figure 6 is a hardware structure diagram of the electronic device where the metadata management apparatus of the distributed file system in this specification is located.
  • the electronic device where the device in the embodiment is located may also include other hardware based on the actual functions of the electronic device, which will not be described again.
  • FIG. 7 is a block diagram of a metadata management device of a distributed file system according to an exemplary embodiment of this specification.
  • the metadata management apparatus 700 of the distributed file system can be applied to the electronic device shown in Figure 6, and the electronic device can be the name node of the distributed file system.
  • the distributed file system stores a mapping relationship between the keywords of directories at each level and the index node numbers of the directory metadata.
  • the directory keywords at this level are generated based on the index node numbers of the upper-level directory metadata and the directory name at this level.
  • the device 700 includes:
  • the name acquisition unit 701 extracts the directory names of directories at each level from the file path;
  • the keyword generation unit 702, for each extracted directory name and in order from upper-level to lower-level directories, generates a keyword for the current-level directory based on the directory name and the index node number of the upper-level directory metadata;
  • the number search unit 703 searches the mapping relationship, based on the keyword, for the index node number of the current-level directory metadata;
  • the metadata obtaining unit 704 obtains the metadata of the file path from the cloud database based on the index node number of the file path.
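The four units above can be read as a simple pipeline. The following sketch shows one way they might fit together, under stated assumptions: each unit is modeled as a plain function, the keyword format is `parent/name`, the root index node number is 1, and dictionaries stand in for the local mapping relationship and the cloud database. None of these details is specified by the text.

```python
from typing import Dict, List

ROOT_INODE = 1  # assumed index node number of the root directory


def extract_names(path: str) -> List[str]:
    """Name acquisition (unit 701): directory names from top level to bottom level."""
    return [part for part in path.split("/") if part]


def make_keyword(parent_inode: int, name: str) -> str:
    """Keyword generation (unit 702): parent's index node number plus this level's name."""
    return f"{parent_inode}/{name}"


def resolve_inode(path: str, mapping: Dict[str, int]) -> int:
    """Number search (unit 703): look up each level's keyword in the local mapping."""
    inode = ROOT_INODE
    for name in extract_names(path):
        inode = mapping[make_keyword(inode, name)]  # raises KeyError on a miss
    return inode


def get_metadata(path: str, mapping: Dict[str, int], cloud_db: Dict[int, dict]) -> dict:
    """Metadata obtaining (unit 704): fetch the record for the path's index node number."""
    return cloud_db[resolve_inode(path, mapping)]


mapping = {"1/a": 101, "101/b": 102, "102/c": 103}
cloud_db = {103: {"name": "c", "type": "directory"}}
result = get_metadata("/a/b/c", mapping, cloud_db)
```

Because each keyword embeds the parent's index node number, every level is a single exact-match lookup, and only the final index node number is needed to read from the cloud database.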
  • generating keywords for the current-level directory based on the directory name and the index node number of the upper-level directory metadata includes:
  • a keyword for the current level directory is generated based on the directory name and preset characters.
  • generating keywords for the current-level directory based on the directory name and the index node number of the upper-level directory metadata includes:
  • the directory name and the index node number of the upper-level directory metadata are spliced based on a preset order to generate a keyword for the current-level directory.
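The two keyword-generation options (a preset character, and a preset splicing order) can be sketched in a few lines. The parameter names `separator` and `inode_first`, and the choice of `/` and `#` as preset characters, are assumptions for illustration only.

```python
def make_keyword(parent_inode: int, name: str,
                 separator: str = "/", inode_first: bool = True) -> str:
    """Splice the parent's index node number and the directory name in a preset order,
    joined by a preset character."""
    parts = [str(parent_inode), name] if inode_first else [name, str(parent_inode)]
    return separator.join(parts)


default_keyword = make_keyword(101, "b")
reversed_keyword = make_keyword(101, "b", separator="#", inode_first=False)

# Without a preset character, different (parent, name) pairs can collide.
collision_a = make_keyword(10, "1a", separator="")
collision_b = make_keyword(101, "a", separator="")
```

A separator that cannot appear in a directory name (such as `/` on POSIX-style paths) keeps the spliced keyword unambiguous, which is one plausible reason for including preset characters.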
  • the metadata of the file path is obtained from the cloud database based on the index node number of the file path, including:
  • when the index node numbers of the directory metadata at each level found based on the mapping relationship are the same as the index node numbers found in the cloud database, the metadata of the file path is obtained from the cloud database based on the index node number of the file path.
  • Optionally, it also includes:
  • the mapping relationship is updated based on the index node numbers found in the cloud database.
  • Optionally, it also includes:
  • Optionally, it also includes:
  • the index node number of the directory metadata at this level is searched from the cloud database based on the keyword, and the mapping relationship is updated based on the found index node number.
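The last optional step reads like a cache fill. The sketch below assumes it applies when the keyword is absent from the local mapping relationship; the text as extracted does not state the triggering condition, so that condition, the function name `lookup_inode`, and the dictionaries standing in for both stores are assumptions.

```python
from typing import Dict


def lookup_inode(keyword: str,
                 local_mapping: Dict[str, int],
                 cloud_index: Dict[str, int]) -> int:
    """Return the index node number for a keyword, filling the local mapping on a miss."""
    inode = local_mapping.get(keyword)
    if inode is None:
        inode = cloud_index.get(keyword)
        if inode is None:
            raise KeyError(f"no directory for keyword {keyword!r}")
        local_mapping[keyword] = inode  # update the mapping with the number just found
    return inode


local_mapping = {"1/a": 101}
cloud_index = {"1/a": 101, "101/b": 102}

first = lookup_inode("101/b", local_mapping, cloud_index)   # miss, served by the cloud store
second = lookup_inode("101/b", local_mapping, cloud_index)  # hit, served locally
```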
  • this specification also provides embodiments of the data access apparatus of the distributed file system.
  • the embodiment of the data access device of the distributed file system in this specification can be applied in electronic equipment.
  • the device embodiments may be implemented by software, or may be implemented by hardware or a combination of software and hardware.
  • Taking software implementation as an example, the device, as a logical device, is formed by the processor of the electronic device where it is located reading the corresponding computer program instructions from the non-volatile storage into the memory and running them.
  • the hardware structure of the electronic device where the data access device of the distributed file system in this specification is located can be similar to the electronic device shown in Figure 6, and this specification does not place special restrictions on this.
  • FIG. 8 is a block diagram of a data access device of a distributed file system according to an exemplary embodiment of this specification.
  • the data access device 800 of the distributed file system can be applied in the name node of the distributed file system, and includes:
  • the metadata query unit 801, in response to the data access request sent by the client, queries the corresponding metadata according to the path of the data to be accessed.
  • the metadata query can be implemented using the metadata management method provided in this specification.
  • the data access unit 802 returns the metadata to the client so that the client can perform data access based on the metadata.
  • since the device embodiments basically correspond to the method embodiments, please refer to the description of the method embodiments for relevant details.
  • the device embodiments described above are only illustrative.
  • the units described as separate components may or may not be physically separated.
  • the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in this specification. Persons of ordinary skill in the art can understand and implement it without creative effort.
  • a typical implementation device is a computer, which may take the form of a personal computer, a laptop, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email transceiver, a game console, a tablet, a wearable device, or a combination of any of these devices.
  • this specification also provides a metadata management device of a distributed file system, which includes: a processor and a memory for storing machine-executable instructions.
  • the processor and the memory are usually connected to each other through an internal bus.
  • the device may also include an external interface to be able to communicate with other devices or components.
  • the distributed file system stores a mapping relationship between the keywords of directories at each level and the index node numbers of directory metadata, where the keyword of the current-level directory is generated based on the index node number of the upper-level directory metadata and the directory name of the current level.
  • generating keywords for the current-level directory based on the directory name and the index node number of the upper-level directory metadata includes:
  • a keyword for the current level directory is generated based on the directory name and preset characters.
  • generating keywords for the current-level directory based on the directory name and the index node number of the upper-level directory metadata includes:
  • the directory name and the index node number of the upper-level directory metadata are spliced based on a preset order to generate a keyword for the current-level directory.
  • the metadata of the file path is obtained from the cloud database based on the index node number of the file path, including:
  • when the index node numbers of the directory metadata at each level found based on the mapping relationship are the same as the index node numbers found in the cloud database, the metadata of the file path is obtained from the cloud database based on the index node number of the file path.
  • Optionally, it also includes:
  • the mapping relationship is updated based on the index node numbers found in the cloud database.
  • Optionally, it also includes:
  • Optionally, it also includes:
  • the index node number of the directory metadata at this level is searched from the cloud database based on the keyword, and the mapping relationship is updated based on the found index node number.
  • this specification also provides a data access device of the distributed file system, which includes: a processor and a memory for storing machine-executable instructions.
  • the processor and the memory are usually connected to each other through an internal bus.
  • the device may also include an external interface to be able to communicate with other devices or components.
  • the metadata query can be implemented using the metadata management method provided in this specification.
  • the distributed file system stores a mapping relationship between the keywords of the directories at each level and the index node numbers of the directory metadata, where the keyword of the current-level directory is generated based on the index node number of the upper-level directory metadata and the name of the current-level directory.
  • This specification also provides a computer-readable storage medium.
  • a computer program is stored on the computer-readable storage medium. When the program is executed by the processor, the following steps are implemented:
  • generating keywords for the current-level directory based on the directory name and the index node number of the upper-level directory metadata includes:
  • a keyword for the current level directory is generated based on the directory name and preset characters.
  • generating keywords for the current-level directory based on the directory name and the index node number of the upper-level directory metadata includes:
  • the directory name and the index node number of the upper-level directory metadata are spliced based on a preset order to generate a keyword for the current-level directory.
  • the metadata of the file path is obtained from the cloud database based on the index node number of the file path, including:
  • when the index node numbers of the directory metadata at each level found based on the mapping relationship are the same as the index node numbers found in the cloud database, the metadata of the file path is obtained from the cloud database based on the index node number of the file path.
  • Optionally, it also includes:
  • the mapping relationship is updated based on the index node numbers found in the cloud database.
  • Optionally, it also includes:
  • Optionally, it also includes:
  • the index node number of the directory metadata at this level is searched from the cloud database based on the keyword, and the mapping relationship is updated based on the found index node number.
  • this specification also provides a computer-readable storage medium.
  • a computer program is stored on the computer-readable storage medium.
  • when the program is executed by the processor, the following steps are implemented:
  • the metadata query can be implemented using the metadata management method provided in this specification.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The description discloses a metadata management method and apparatus for a distributed file system. The metadata management method and apparatus are applied to a distributed file system, and the distributed file system stores a mapping relationship between keywords of directories at all levels and index node numbers of directory metadata, wherein the keyword of a current-level directory is generated on the basis of the index node number of the upper-level directory metadata and the name of the current-level directory. The method comprises: extracting, from a file path, the directory names of directories at all levels; in the order of the directories from upper level to lower level, for each extracted directory name, generating a keyword of the current-level directory on the basis of the directory name and the index node number of the upper-level directory metadata; searching the mapping relationship, on the basis of the keyword, for the index node number of the current-level directory metadata; and acquiring metadata of the file path from a cloud database on the basis of the index node number of the file path.
PCT/CN2023/083879 2022-03-25 2023-03-24 Procédé et appareil de gestion de métadonnées pour un système de fichiers distribués WO2023179787A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210307777.7A CN114840487A (zh) 2022-03-25 2022-03-25 分布式文件系统的元数据管理方法和装置
CN202210307777.7 2022-03-25

Publications (1)

Publication Number Publication Date
WO2023179787A1 true WO2023179787A1 (fr) 2023-09-28

Family

ID=82564017

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/083879 WO2023179787A1 (fr) 2022-03-25 2023-03-24 Procédé et appareil de gestion de métadonnées pour un système de fichiers distribués

Country Status (2)

Country Link
CN (1) CN114840487A (fr)
WO (1) WO2023179787A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114840487A (zh) * 2022-03-25 2022-08-02 阿里巴巴(中国)有限公司 分布式文件系统的元数据管理方法和装置
CN117873967B (zh) * 2024-03-08 2024-05-17 腾讯科技(深圳)有限公司 分布式文件系统的数据管理方法、装置、设备及存储介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160147479A1 (en) * 2014-11-26 2016-05-26 International Business Machines Corporation Metadata storing technique
CN108009254A (zh) * 2017-12-05 2018-05-08 北京百度网讯科技有限公司 多索引方法及装置、云系统以及计算机可读存储介质
CN111694791A (zh) * 2020-04-01 2020-09-22 新华三大数据技术有限公司 一种分布式基础框架中的数据存取方法及装置
CN112988062A (zh) * 2021-01-28 2021-06-18 腾讯科技(深圳)有限公司 一种元数据读取限制方法、装置、电子设备及介质
CN113010476A (zh) * 2021-03-15 2021-06-22 腾讯科技(深圳)有限公司 元数据查找方法、装置、设备及计算机可读存储介质
CN114116613A (zh) * 2021-11-26 2022-03-01 北京百度网讯科技有限公司 基于分布式文件系统的元数据查询方法、设备和存储介质
CN114840487A (zh) * 2022-03-25 2022-08-02 阿里巴巴(中国)有限公司 分布式文件系统的元数据管理方法和装置

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102685148B (zh) * 2012-05-31 2014-10-15 清华大学 一种云存储环境下的安全网盘系统的实现方法
JP5843965B2 (ja) * 2012-07-13 2016-01-13 株式会社日立ソリューションズ 検索装置、検索装置の制御方法及び記録媒体
CN103634616B (zh) * 2012-08-27 2018-04-17 中兴通讯股份有限公司 一种基于云存储的流媒体点播方法和装置

Also Published As

Publication number Publication date
CN114840487A (zh) 2022-08-02

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23774030

Country of ref document: EP

Kind code of ref document: A1