CN113688113A - Metadata prefetching system and method for distributed file system - Google Patents

Info

Publication number
CN113688113A
Authority
CN
China
Prior art keywords
file
metadata
client
prefetching
cache
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110859541.XA
Other languages
Chinese (zh)
Inventor
张静逸
江波
杜欣军
张浩博
雷旸
王梦童
于楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 32 Research Institute
Original Assignee
CETC 32 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 32 Research Institute
Priority to CN202110859541.XA
Publication of CN113688113A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata

Abstract

The invention provides a metadata prefetching system and method for a distributed file system. The system comprises a functional hierarchy framework, which in turn comprises a client layer and a metadata server layer. The client layer comprises a client cache layer, which extracts, updates and synchronizes file-related features and supports operations on those features. The metadata server layer comprises a metadata server cache layer, which stores and synchronizes file-related features and executes file metadata operations. The method extracts file-related features and uses them to drive metadata prefetching, which narrows the prefetching range and improves prefetching accuracy. Because the metadata of associated files is prefetched to the client in advance, the metadata access path for associated files is shortened and the number of metadata requests in the system is reduced, which greatly improves metadata access performance in the distributed file system and thereby its overall performance.

Description

Metadata prefetching system and method for distributed file system
Technical Field
The present invention relates to the technical field of metadata prefetching, and in particular, to a metadata prefetching system and method for a distributed file system.
Background
A distributed file system is a shared file storage system with high reliability and high scalability that provides a well-developed concurrent access mechanism, and it has therefore attracted growing attention. In most currently popular distributed file systems, metadata operations usually account for the majority of the file system's workload, so efficient metadata management and fast metadata access operations are crucial.
In a distributed file system, metadata accesses can account for more than half of all accesses. As the volume of metadata grows, metadata access performance is gradually becoming the bottleneck that limits improvement of I/O performance, so improving the performance of metadata access operations can effectively relieve the system's performance problems. To that end, metadata prefetching and metadata cache optimization can raise the metadata hit rate and reduce metadata access latency. Although existing prefetching methods considerably ease the burden of the frequent metadata operations in a distributed file system, they still suffer from a low metadata hit rate, an overly large prefetching range and low prefetching accuracy.
Because distributed file systems operate on file metadata so frequently, a good deal of research on file metadata prefetching has emerged. Many of the currently popular metadata prefetching methods are based mainly on file access correlation: they mine correlation information offline from the file system's historical access records and use the sets of files that are often accessed together for subsequent prefetching. Prefetching methods based on file access correlation are strongly limited, because it is difficult for them to adjust file correlations dynamically as the characteristics of the system load change in real time. How to mine the latent correlations among file data and how to predict accurately which files a user will access next have therefore become crucial problems.
In recent years there have been many studies on metadata prefetching and caching. DiskSeen analyzes the temporal and spatial relationships of disk accesses, treats file reading and prefetching as two windows, and uses the read window to guide the prefetch window in prefetching data. QuickMine builds on this idea and introduces context information at the level of the transactional query application to predict future access sequences. Nexus is a grouping method based on a weighted graph: it constructs a metadata relationship graph in which vertices represent files and directories and weighted edges represent the strength of locality between vertices, and the metadata server maintains the graph, dynamically inserting or deleting edges and adjusting the associated weights, to implement metadata prefetching. SmartStore organizes files into related groups according to the semantics of their metadata, providing low latency for complex queries. CFFS changes the one-to-one mapping between files and their metadata into a many-to-one mapping and combines file directories, associations within files and file access frequencies to discover correlations and perform metadata prefetching. SEER records the semantic distance between each file and its several nearest related files and calculates the correlation between files from the number of neighbors they share. A group-based file cache management method groups each file with the files that have a descendant relationship to it and describes the relationships between files with a weighted probability graph. C-Miner is an efficient algorithm for finding block correlations in a file system: it uses data mining techniques to mine frequent block access sequences, finds block correlations on a storage server, and uses the discovered sequences to generate correlation rules that guide block prefetching and layout optimization.
In addition to the above-described methods of describing file relationships, there have been some studies on methods of recording file relationships and access patterns using a tree structure, which capture dependency relationships between user process files by accessing the tree structure. The paths of the whole tree structure from the root node to the leaf nodes form a group of access paths of continuous file sequences, a plurality of access trees can be maintained for programs with different access modes, the current access activity of the program is matched with the access trees, and the access trees are used for guiding file prefetching. Most of these methods work well in general file systems, but do not work well in distributed file systems with large numbers of files.
Chinese patent publication CN108920600A discloses a distributed file system metadata prefetching method based on data association, comprising the steps of designing an extraction manner and a storage structure for data associations, prefetching the metadata of associated files, dynamically feeding back data associations, and dynamically updating data associations.
With regard to the related art described above, the inventors consider that these methods have a low metadata hit rate, an overly large prefetching range and low prefetching accuracy; that they are strongly limited, since it is difficult to adjust file association relationships dynamically as the characteristics of the system load change in real time; and that they perform poorly in distributed file systems with very large numbers of files.
Disclosure of Invention
In view of the deficiencies in the prior art, it is an object of the present invention to provide a system and method for metadata prefetching for a distributed file system.
The metadata prefetching system for a distributed file system provided by the invention comprises a functional hierarchy framework, wherein the functional hierarchy framework comprises a client layer and a metadata server layer;
the client side layer comprises a client side cache layer, and the client side cache layer is responsible for extracting, updating and synchronizing relevant characteristics of the files and provides support for file relevant characteristic operation;
the metadata server layer comprises a metadata server cache layer which is responsible for storing and synchronizing file related characteristics and executing file metadata operation.
Preferably, the system further comprises a system overall module, wherein the system overall module comprises a client and a metadata server; a metadata read request initiated by an application program reaches the client through the distributed file system, and the client searches its local metadata cache space for the metadata of the target file to check whether the required metadata is present in the client's local cache; on a local cache hit, the client performs the corresponding processing on the file metadata found and responds to the request of the application program; otherwise, the client sends the metadata read request over the network to the metadata server managed by the client in order to find the required metadata; the metadata server comprises a prefetching module, which searches the metadata cache for, and collects, the metadata information of the target file and of the files associated with the target file; the metadata server then combines the metadata information of the target file and its associated files in a single response message and sends it to the client, so that the client can serve subsequent metadata requests for the target file and its associated files from its local cache.
Preferably, in the system overall module, the client comprises a library, and the library comprises a syntax analysis module; when an application program initiates a metadata write operation, the syntax analysis module in the client's library is triggered to extract the file-related features present in the file; the extracted file-related features are first cached in the metadata cache of the distributed file system client, and the newly added file-related features are then synchronized to the metadata server together with the distributed file system's original metadata synchronization I/O, yielding a new metadata version; when the metadata server receives the metadata synchronization request, it replaces the old metadata version with the new metadata version in its metadata cache.
Preferably, the client cache layer includes a real-time file feature extraction module, and the real-time file feature extraction module: firstly, file metadata information comprising a file access sequence, a directory and a file path name is obtained, and data content in a target format is searched for in a file metadata part through a pattern matching algorithm of a syntax analysis mechanism in a client syntax analysis module so as to determine file information related to the data content in the target format; file-related features are extracted from information of file access sequences, directories and file path names based on target keywords given by a user.
Preferably, the client cache layer further comprises a file feature update module, wherein the file feature update module determines at the client whether file-related features have been overwritten by checking whether the offsets of the file-related features recorded in the extended attributes of the file metadata overlap with newly written data; if they overlap, the affected file-related features are regarded as invalid and are deleted directly to complete the update.
Preferably, in the file feature update module, when a file is deleted, the distributed file system uses deferred updating: the file-related feature information is not deleted immediately; instead, when a file whose file-related features refer to the deleted file is accessed again, the feature information about the removed file is deleted from the metadata extended attributes of the accessed file.
Preferably, the client cache layer further comprises a strong consistency synchronization control module, wherein the strong consistency synchronization control module adopts a strong consistency policy based on distributed storage; strong consistency is provided by granting only one authority on a file at any one time, and the client may also clear its metadata cache periodically.
Preferably, the metadata server cache layer includes a storage and synchronization file feature module, and the storage and synchronization file feature module: when the metadata pre-fetched from the metadata server is returned to the client, the metadata is serialized in a pre-fetching queue in a specified organization form; the metadata structure of the file maintains and records metadata information of the file, after performing feature analysis on a file access sequence and performing syntactic analysis on a directory and a file path name, extracting and coding file-related features aiming at a file access sequence, a peer directory relation, an application access sequence and a user reading sequence, and storing the extracted file-related feature codes in an extended attribute of the file metadata; meanwhile, a data set is extracted in the real cluster operation, a file prefetching analysis model is built, and files obtained after the file prefetching analysis model is calculated, file related characteristic information and prefetching scores are organized into key value pairs and stored in metadata extension attributes of the files related to the key value pairs; when the client needs to access the target file, the file metadata can be directly read.
Preferably, the metadata server cache layer further comprises a prefetch association metadata module, and the prefetch association metadata module: when a client side initiates a metadata request to a metadata server, the metadata server processes the metadata request of a target file, traverses the metadata information of the associated file in the cache of the metadata server according to the file related characteristics in the metadata extended attribute of the target file, packs the metadata information of the associated file into a response message together and returns the response message to the client side in a single metadata I/O mode.
The invention provides a metadata prefetching method aiming at a distributed file system, which comprises the following steps:
client caching: the system is in charge of extracting, updating and synchronizing the relevant characteristics of the files and providing support for the operation of the relevant characteristics of the files;
caching by a metadata server: and the file storage and synchronization module is responsible for storing and synchronizing related characteristics of the file and executing metadata operation of the file.
Compared with the prior art, the invention has the following beneficial effects:
1. the hidden relationship between an extracted file and the file to be analyzed is learned from the file access sequence in order to extract file-related features, and metadata is prefetched on the basis of the extracted features, which narrows the prefetching range and improves prefetching accuracy;
2. when the metadata server responds to a metadata request for a target file, the metadata of the associated files is prefetched to the client in advance, which shortens the metadata access path for associated files, reduces the number of metadata requests in the system, greatly improves metadata access performance in the distributed file system, and improves the overall performance of the distributed file system;
3. file-related feature extraction is efficient: file-related features are discovered in real time by performing feature analysis on the file access sequence and syntactic analysis on directory and file path names, and a lightweight pattern matching method is introduced to speed up discovery and reduce the extra overhead of feature extraction;
4. the invention provides transparent support: file feature information is standardized in a defined organizational form, the extracted file-related features and the file feature information obtained from model calculation are encoded and stored in the extended attributes of the file metadata, and the existing file system interface is reused, which avoids introducing additional metadata I/O requests and synchronization operations;
5. the invention has a two-layer metadata cache management mechanism consisting of a client cache layer and a server cache layer, wherein the client supports file-related feature operations and the metadata server is responsible for storing and synchronizing file-related features and for prefetching the metadata of associated files; the two-layer structure improves the efficiency of metadata queries and provides two-stage acceleration.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a diagram of a prefetch system architecture;
FIG. 2 is a data flow diagram of a prefetching method.
Detailed Description
The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art to further understand the invention, but do not limit the invention in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the concept of the invention, all of which fall within the scope of protection of the present invention.
The embodiment of the invention discloses a metadata prefetching system for a distributed file system, which comprises a function hierarchy frame and a system overall module, wherein the function hierarchy frame further comprises a client side layer and a metadata server layer, as shown in fig. 1 and fig. 2.
The client-side layer comprises a client-side cache layer, and the client-side cache layer is responsible for extracting, updating and synchronizing the relevant characteristics of the files and provides support for the operation of the relevant characteristics of the files.
The metadata server layer comprises a metadata server cache layer which is responsible for storing and synchronizing file related characteristics and executing file metadata operation.
The system overall module comprises a client and a metadata server. A metadata read request initiated by an application program reaches the client through the distributed file system, and the client searches its local metadata cache space for the metadata of the target file to check whether the required metadata is present in the client's local cache. On a local cache hit, the client performs the corresponding processing on the file metadata found and responds to the request of the application program; otherwise, the client sends the metadata read request over the network to the metadata server managed by the client in order to find the required metadata. The metadata server comprises a prefetching module, which searches the metadata cache for, and collects, the metadata information of the target file and of the files associated with the target file. The metadata server then combines the metadata information of the target file and its associated files in a single response message and sends it to the client, so that the client can serve subsequent metadata requests for the target file and its associated files from its local cache.
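As a non-limiting illustration, the client side of this read path can be sketched in Python as follows. The class name `Client`, the `fetch_from_server` callable and the `remote_requests` counter are hypothetical names introduced only for this sketch; the embodiment does not prescribe a programming interface.

```python
class Client:
    """Hypothetical client holding a local metadata cache."""

    def __init__(self, fetch_from_server):
        # fetch_from_server(path) stands in for the network request to the
        # metadata server; it is assumed to return {path: metadata} for the
        # target file AND its associated files in one response message.
        self.fetch = fetch_from_server
        self.cache = {}
        self.remote_requests = 0

    def get_metadata(self, path):
        # Local cache hit: respond without contacting the metadata server.
        if path in self.cache:
            return self.cache[path]
        # Miss: issue one request and keep everything the response carries.
        self.remote_requests += 1
        self.cache.update(self.fetch(path))
        return self.cache[path]
```

In this sketch, a later request for an associated file is answered from the local cache, so only the first miss reaches the metadata server.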
The client comprises a library, and the library comprises a syntax analysis module. When an application program initiates a metadata write operation, the syntax analysis module in the client's library is triggered to extract the file-related features present in the file. The extracted file-related features are first cached in the metadata cache of the distributed file system client, and the newly added file-related features are then synchronized to the metadata server together with the distributed file system's original metadata synchronization I/O, yielding a new metadata version. When the metadata server receives the metadata synchronization request, it replaces the old metadata version with the new metadata version in its metadata cache. Here I/O denotes input/output.
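A minimal Python sketch of this write path is given below for illustration only. The integer version counter, the rule that the server replaces an entry only when the incoming version is newer, and the names `ClientCache`, `MetadataServerCache`, `write` and `flush` are all assumptions of the sketch; the embodiment states only that the old metadata version is replaced by the new one.

```python
class MetadataServerCache:
    """Hypothetical server-side cache: path -> (version, metadata)."""

    def __init__(self):
        self.entries = {}

    def sync(self, path, version, metadata):
        # Assumption: replace the old version only if the new one is newer.
        current = self.entries.get(path)
        if current is None or version > current[0]:
            self.entries[path] = (version, dict(metadata))


class ClientCache:
    """Hypothetical client-side cache that batches dirty metadata."""

    def __init__(self, server):
        self.server = server
        self.entries = {}
        self.dirty = set()

    def write(self, path, attrs, features):
        # Features extracted on write are cached locally first.
        version, meta = self.entries.get(path, (0, {}))
        meta = {**meta, **attrs, "features": list(features)}
        self.entries[path] = (version + 1, meta)
        self.dirty.add(path)

    def flush(self):
        # Features travel with the ordinary metadata synchronization,
        # so no separate request is issued for them.
        sent = 0
        for path in sorted(self.dirty):
            version, meta = self.entries[path]
            self.server.sync(path, version, meta)
            sent += 1
        self.dirty.clear()
        return sent
```

Two writes to the same file before a flush result in a single synchronization that carries the latest attributes and features.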
The client cache layer comprises a real-time file feature extraction module, a file feature updating module and a strong consistency synchronization control module. A real-time file feature extraction module: firstly, file metadata information comprising a file access sequence, a directory and a file path name is obtained, and data content in a target format is searched for in a file metadata part through a pattern matching algorithm of a syntax analysis mechanism in a client syntax analysis module so as to determine file information related to the data content in the target format; and extracting file related characteristics from information such as file access sequences, directories and file path names based on target keywords given by a user.
The technology for extracting the relevant characteristics of the files in real time can better adapt to the dynamic change of the system load characteristics through a real-time extraction mode of the relevant characteristics of the files. Firstly, file metadata information such as file access sequences, directories and file path names is obtained, and data content in a target format is searched in the file metadata part through a pattern matching algorithm in a syntax analysis mechanism so as to determine file information related to the file metadata. And the related characteristics of the file can be quickly extracted based on the target keywords given by the user.
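For illustration only, the following Python sketch shows one way such lightweight extraction could look. The choice of features (directory, extension, matched keywords, successor counts in the access sequence), the tokenizing regular expression and the function names are assumptions of the sketch, not the patented algorithm.

```python
import posixpath
import re


def extract_path_features(path, keywords):
    """Derive simple features from a file path (hypothetical feature set)."""
    directory, name = posixpath.split(path)
    stem, ext = posixpath.splitext(name)
    # Lightweight pattern match: split the file name into tokens and keep
    # only those that match the user-supplied target keywords.
    tokens = re.split(r"[^A-Za-z0-9]+", stem.lower())
    wanted = {k.lower() for k in keywords}
    return {
        "dir": directory,
        "ext": ext.lstrip("."),
        "keywords": sorted(set(tokens) & wanted),
    }


def successor_counts(access_sequence):
    """Count how often each file is immediately followed by another file."""
    counts = {}
    for current, following in zip(access_sequence, access_sequence[1:]):
        counts.setdefault(current, {})
        counts[current][following] = counts[current].get(following, 0) + 1
    return counts
```

Both functions make a single pass over their input, which is in keeping with the stated aim of keeping the extra overhead of feature extraction low.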
File feature update module: the client determines whether file-related features have been overwritten by checking whether the offsets of the file-related features recorded in the extended attributes of the file metadata overlap with newly written data. If they overlap, the affected features are regarded as invalid, and the invalid file-related features are deleted directly to complete the update. When a file is deleted, the distributed file system uses deferred updating: the file-related feature information is not deleted immediately; instead, when a file whose file-related features refer to the deleted file is accessed again, the feature information about the removed file is deleted from the metadata extended attributes of the accessed file.
The file feature update technique determines at the client whether file-related features have been overwritten by checking whether the offsets of the file-related features recorded in the extended attributes of the file metadata overlap with newly written data; if they overlap, the affected file-related features are regarded as invalid and are deleted directly to complete the update. When a file is deleted, the system uses deferred updating: the feature information related to that file is not deleted immediately, but when an associated file is accessed again, the feature information of the already removed file is deleted from the metadata extended attributes of the accessed file.
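The two update rules can be sketched in Python as follows, for illustration only. The sketch assumes that each feature record stores the byte range (`offset`, `length`) it was extracted from and the path of the associated file (`target`); these field names and the half-open interval convention are assumptions, not part of the disclosure.

```python
def ranges_overlap(start_a, len_a, start_b, len_b):
    """True if half-open ranges [start, start + len) intersect."""
    return start_a < start_b + len_b and start_b < start_a + len_a


def drop_overwritten(features, write_offset, write_length):
    """Delete features whose source range overlaps newly written data."""
    return [
        f for f in features
        if not ranges_overlap(f["offset"], f["length"], write_offset, write_length)
    ]


def prune_on_access(features, existing_paths):
    """Deferred update: on access, drop features that refer to deleted files."""
    return [f for f in features if f["target"] in existing_paths]
```

`drop_overwritten` would run on the write path, while `prune_on_access` runs only when the file is next accessed, so a file deletion itself triggers no immediate clean-up work.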
Strong consistency synchronization control module: a strong consistency policy based on distributed storage is adopted; strong consistency is provided by granting only one authority on a file at any one time, and the client may also clear its metadata cache periodically.
The strong consistency synchronization control technique adopts a strong consistency policy based on distributed storage to ensure that every client accesses the latest, consistent metadata. Strong consistency is provided by granting only one authority on a file at any one time, and the client may also clear its metadata cache periodically.
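A toy single-process Python sketch of these two ideas is given below for illustration. The names `AuthorityTable` and `ExpiringCache`, the client identifiers and the time-to-live rule used for periodic clearing are assumptions of the sketch; a real distributed implementation would additionally need revocation and failure handling, which the sketch does not model.

```python
class AuthorityTable:
    """Hypothetical table granting at most one authority per file."""

    def __init__(self):
        self.holder = {}  # path -> client id currently holding the authority

    def acquire(self, path, client_id):
        # setdefault grants the authority only if nobody holds it yet.
        owner = self.holder.setdefault(path, client_id)
        return owner == client_id

    def release(self, path, client_id):
        # Only the current holder can give the authority back.
        if self.holder.get(path) == client_id:
            del self.holder[path]


class ExpiringCache:
    """Hypothetical client metadata cache that is cleared periodically."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.entries = {}  # path -> (stored_at, metadata)

    def put(self, path, metadata, now):
        self.entries[path] = (now, metadata)

    def purge(self, now):
        # Assumption: "periodic clearing" is modeled as a time-to-live.
        expired = [p for p, (t, _) in self.entries.items() if now - t >= self.ttl]
        for p in expired:
            del self.entries[p]
        return len(expired)
```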
The metadata server cache layer comprises a storage and synchronization file characteristic module and a pre-fetching related metadata module. Storage and synchronization file feature module: the metadata that is pre-fetched in the metadata server is serialized in a pre-fetch queue in a specified organizational form when returned to the client. The metadata structure of the file maintains and records metadata information of the file, after performing feature analysis on a file access sequence and performing syntactic analysis on a directory and a file path name, extracting and coding file related features aiming at four related features of a file access sequence, a peer directory relationship, an application access sequence and a user reading sequence, and storing the extracted file related feature codes in an extended attribute of the file metadata. And converting characters in the extracted data features into corresponding digital ids according to a dictionary with a fixed sequence, so that splicing can be performed according to a set sequence, and finally a file feature vector for calculation is obtained. And meanwhile, extracting a data set in the operation of the real cluster, constructing a file prefetching analysis model and training, organizing files obtained by calculating the file prefetching analysis model, file related characteristic information and prefetching values into a key value pair form, and storing the key value pair form in a metadata expansion attribute of the file related to the key value pair. When the client needs to access the target file, the file metadata can be directly read.
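The dictionary-based encoding step can be illustrated with the Python sketch below. The alphabet, the fixed field width of 8, the use of 0 for padding and unknown characters, and the field names are assumptions chosen for the example; the embodiment specifies only a fixed-order dictionary and concatenation in a set order.

```python
import string

# Fixed-order dictionary: each character maps to a stable numeric id.
ALPHABET = string.ascii_lowercase + string.digits + "/._-"
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(ALPHABET)}  # 0 = padding/unknown

# Set order in which the four feature fields are concatenated (hypothetical names).
FIELD_ORDER = ("access_sequence", "peer_directory", "application", "user")
FIELD_WIDTH = 8


def encode_field(text, width=FIELD_WIDTH):
    """Map characters to ids, truncating or zero-padding to a fixed width."""
    ids = [CHAR_TO_ID.get(ch, 0) for ch in text.lower()[:width]]
    return ids + [0] * (width - len(ids))


def feature_vector(features):
    """Concatenate the encoded fields in FIELD_ORDER into one flat vector."""
    vector = []
    for name in FIELD_ORDER:
        vector.extend(encode_field(features.get(name, "")))
    return vector
```

Because the dictionary and the field order are fixed, the same features always produce the same vector, which is what allows the vector to be fed to a prefetching analysis model.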
In the technique for storing and synchronizing file features, metadata prefetched in the metadata server must be serialized in a prefetch queue in a specified organizational form when it is returned to the client. The metadata structure of a file maintains and records the file's metadata information; after feature analysis of the file access sequence and syntactic analysis of directory and file path names, the extracted file-related features are encoded and stored in the extended attributes of the file metadata. In addition, the files obtained from model calculation, together with their feature information and prefetching values, are organized as <K, V> key-value pairs and stored in the metadata extended attributes of the files they are associated with. When the client needs to access the target file, it can read the file metadata directly without sending a request to the metadata server, which shortens the metadata access path for associated files and reduces the number of metadata access requests in the system.
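As a sketch only, the <K, V> organization might be serialized as shown below in Python. The attribute name `user.prefetch.assoc`, the JSON encoding, the in-memory `xattrs` dictionary standing in for real extended attributes, and the record layout (`features`, `score`) are all assumptions of the example.

```python
import json

XATTR_NAME = "user.prefetch.assoc"  # hypothetical extended-attribute name


def store_associations(xattrs, associations):
    """Serialize {associated_path: {"features": [...], "score": float}}."""
    raw = json.dumps(associations, sort_keys=True, separators=(",", ":"))
    xattrs[XATTR_NAME] = raw.encode("utf-8")  # xattr values are byte strings


def load_associations(xattrs):
    """Read the key-value pairs back; an absent attribute means no pairs."""
    raw = xattrs.get(XATTR_NAME)
    return json.loads(raw.decode("utf-8")) if raw else {}


def top_candidates(associations, limit):
    """Rank associated files by prefetching value, highest first."""
    ranked = sorted(associations.items(), key=lambda kv: (-kv[1]["score"], kv[0]))
    return [path for path, _ in ranked[:limit]]
```

Keeping the pairs inside an existing metadata attribute means they are read and synchronized along with the rest of the file's metadata, with no additional request type.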
A prefetch associated metadata module: when a client side initiates a metadata request to a metadata server, the metadata server processes the metadata request of a target file, traverses the metadata information of the associated file in the cache of the metadata server according to the file related characteristics in the metadata extended attribute of the target file, packs the metadata information of the associated file into a response message together and returns the response message to the client side in a single metadata I/O mode.
The pre-fetching related metadata technology is characterized in that when a client side initiates a metadata request to a metadata server, the metadata server firstly processes the metadata request of a target file, then sequentially traverses the metadata information of related files in a cache according to file related characteristics in the metadata extended attribute of the target file, and finally packs the metadata information of the related files into a response message together and returns the response message to the client side in a single metadata I/O mode.
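The server-side step can be illustrated with the short Python sketch below. The function name `build_prefetch_response`, the `xattr_assoc` key holding the ordered list of associated paths, and the `max_entries` cap are hypothetical; the embodiment does not state a limit on how many associated entries one response may carry.

```python
def build_prefetch_response(server_cache, target, max_entries=16):
    """Pack the target's metadata and cached associated metadata into one reply."""
    # Process the request for the target file first.
    target_meta = server_cache[target]
    response = {target: target_meta}
    # Then walk the associated files listed in the target's extended attribute,
    # in order, and include those whose metadata is present in the server cache.
    for path in target_meta.get("xattr_assoc", []):
        if len(response) >= max_entries:
            break
        meta = server_cache.get(path)
        if meta is not None:
            response[path] = meta
    return response
```

The returned dictionary corresponds to the single response message, so the client receives the target and its associated metadata in one metadata I/O.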
The metadata prefetching method for a distributed file system is described from two aspects: the overall structural design of the system and the design of the functional hierarchy framework.
The overall structure design of the system: the read request operation of the metadata firstly reaches the client through the file system, and the client firstly searches the metadata of the target file in the local metadata cache space to check whether the required metadata exists in the local cache of the client. If the local cache is hit, corresponding processing operation is carried out according to the found file metadata and an application request is responded; otherwise the client will send a request over the network to the metadata server it manages to find the required metadata. The pre-fetch module in the metadata server starts searching and collecting metadata information of the target file and its associated files in the metadata cache, and then the metadata server integrates all metadata in one response message and sends to the client so that the client can process subsequent metadata requests of the files in its local cache. When the writing operation is initiated, a grammar analysis module in the library is triggered to extract the relevant characteristics of the files; the extracted features are firstly cached in a metadata cache in a client of the distributed file system, and the related features of the newly added files are synchronized to a metadata server along with the original metadata synchronization I/O of the system. When the metadata server receives a metadata synchronization request, the old metadata version is replaced with the new metadata version in the metadata cache.
Functional hierarchy design:
Client layer: the client cache layer provides support for file-related feature operations and is responsible for extracting, updating and synchronizing file-related features.
Metadata server layer: the metadata server cache layer is responsible for storing and synchronizing file-related features and for executing file metadata operations.
The method learns the hidden relationships between the extracted files and the files to be analyzed from the file access sequence in order to extract file-related features, and prefetches metadata based on the extracted features, which narrows the prefetching range and improves prefetching accuracy. When the metadata server responds to a metadata request for a target file, the metadata of the associated files is sent to the client in advance; this shortens the metadata access path for the associated files, reduces the number of metadata requests in the system, and thereby improves metadata access performance and the overall performance of the distributed file system.
The method and system are intended to improve metadata access performance in a distributed file system: when the metadata server responds to a metadata request for a target file, the system sends the metadata of the associated files to the client in advance, which shortens the metadata access path for the associated files and reduces the number of metadata requests in the system.
The invention extracts file-related features by learning the hidden relationships between the extracted files and the files to be analyzed from the file access sequence, and uses the extracted features for metadata prefetch caching. It has the following technical characteristics and advantages:
1. Efficient file-related feature extraction: feature analysis is performed on the file access sequence and syntactic analysis is performed on the directory and file path names, so that file-related features are found in real time; a lightweight pattern matching method is introduced to speed up the search and reduce the extra overhead of feature extraction.
2. Transparent support: file feature information is standardized in a defined organization form, and the extracted file-related features, together with the files obtained from model computation and their feature information codes, are stored in the extended attributes of the file metadata; the existing file system interface is reused, which avoids introducing additional metadata I/O requests and synchronization operations.
3. A two-level metadata cache management mechanism: the client cache layer and the server cache layer form a two-layer structure. The client provides support for file-related feature operations, while the metadata server is responsible for storing and synchronizing file-related features and for prefetching the metadata of associated files; the two-layer structure makes metadata queries more efficient by accelerating them at two stages.
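As a rough illustration of item 1, the following Python sketch derives associated files from an access sequence. It substitutes a deliberately simple heuristic (files accessed right after one another in the same directory) for the patent's syntactic analysis and pattern matching, which the text does not specify in detail; the function name and the `window` parameter are assumptions made for the example.

```python
import posixpath
from collections import defaultdict


def extract_related(access_sequence, window=1):
    """Map each path to paths accessed within `window` steps after it in the same directory."""
    related = defaultdict(list)
    for i, path in enumerate(access_sequence):
        for nxt in access_sequence[i + 1 : i + 1 + window]:
            # Directory comparison stands in for path-name syntactic analysis.
            same_dir = posixpath.dirname(nxt) == posixpath.dirname(path)
            if nxt != path and same_dir and nxt not in related[path]:
                related[path].append(nxt)
    return dict(related)
```

The resulting mapping could then be stored per file as a key-value pair in an extended attribute, in line with item 2; the concrete encoding is left open here.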
The embodiment of the invention also discloses a metadata prefetching method for a distributed file system, comprising the following steps: Client caching: extracting, updating and synchronizing file-related features, and providing support for file-related feature operations. Metadata server caching: storing and synchronizing file-related features, and executing file metadata operations.
By learning the hidden relationships between the extracted files and the files to be analyzed from the file access sequence, the metadata prefetching cache extracts file-related features and prefetches metadata based on them. This alleviates the excessive load on the metadata server cluster caused by highly concurrent metadata access from clients, and improves metadata access performance and the overall performance of the distributed file system.
Those skilled in the art will appreciate that, in addition to implementing the system and its various devices, modules, units provided by the present invention as pure computer readable program code, the system and its various devices, modules, units provided by the present invention can be fully implemented by logically programming method steps in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system and various devices, modules and units thereof provided by the invention can be regarded as a hardware component, and the devices, modules and units included in the system for realizing various functions can also be regarded as structures in the hardware component; means, modules, units for performing the various functions may also be regarded as structures within both software modules and hardware components for performing the method.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (10)

1. A metadata prefetching system for a distributed file system, comprising a functional hierarchy, the functional hierarchy comprising a client side layer and a metadata server layer;
the client side layer comprises a client cache layer, and the client cache layer is responsible for extracting, updating and synchronizing file-related features and provides support for file-related feature operations;
the metadata server layer comprises a metadata server cache layer, which is responsible for storing and synchronizing file-related features and executing file metadata operations.
2. The metadata prefetching system for a distributed file system of claim 1, further comprising a system overview module, the system overview module comprising a client and a metadata server; a metadata read request initiated by an application program reaches the client through the distributed file system, and the client looks up the metadata of a target file in its local metadata cache space to check whether the required metadata exists in the local cache of the client; if the local cache is hit, the corresponding processing operation is carried out according to the file metadata found and the request of the application program is answered; otherwise, the client sends the metadata read request over the network to a metadata server managed by the client to search for the required metadata; the metadata server comprises a prefetching module, which searches the metadata cache and collects the metadata information of the target file and of the files associated with the target file; the metadata server then combines the metadata information of the target file and of the associated files into one response message and sends the response message to the client, and the client uses the response message to process subsequent metadata requests for the target file and the associated files in its local cache.
3. The metadata prefetching system for a distributed file system of claim 2, wherein the system overview module further comprises: the client comprises a library and the library comprises a syntax analysis module; when an application program initiates a metadata write operation, the syntax analysis module in the library of the client is triggered to extract the file-related features present in the file; the extracted file-related features are first cached in the metadata cache of the distributed file system client, and the newly added file-related features are synchronized to the metadata server along with the existing metadata synchronization I/O of the distributed file system to obtain a new metadata version; when the metadata server receives a metadata synchronization request, the old metadata version in the metadata cache is replaced with the new metadata version.
4. The metadata prefetching system for a distributed file system of claim 2, wherein the client cache layer comprises a real-time file feature extraction module that: first obtains file metadata information comprising the file access sequence, directories and file path names, and searches the file metadata for data content in a target format using the pattern matching algorithm of the syntax analysis mechanism in the client's syntax analysis module, so as to determine the file information associated with that data content; and extracts file-related features from the file access sequence, directories and file path names based on target keywords given by a user.
5. The metadata prefetching system for a distributed file system of claim 2, wherein the client cache layer further comprises a file characteristic update module that: determines at the client whether file-related features have been overwritten by checking whether the offsets of the file-related features in the file metadata extended attribute overlap with the newly added data; and, if overwriting has occurred, regards the affected file-related features as invalid and deletes them directly to complete the update operation.
6. The metadata prefetching system for a distributed file system of claim 5, wherein, in the file characteristic update module: when a file deletion operation occurs, the distributed file system uses delayed updating; it does not immediately delete the file-related feature information, but deletes the feature information about the removed file from the metadata extended attribute of an accessed file when file-related features involving the removed file are next accessed.
7. The metadata prefetching system for a distributed file system of claim 2, wherein the client cache layer further comprises a strong consistency synchronization control module that: provides strong consistency by adopting a strong consistency policy based on distributed storage and, at the same time, by distributing permissions together with files; the client may also periodically clear its metadata cache.
8. The metadata prefetching system for a distributed file system of claim 2, wherein the metadata server cache layer comprises a file feature storage and synchronization module that: serializes the metadata prefetched from the metadata server into a prefetch queue in a specified organization form when that metadata is returned to the client; the metadata structure of a file maintains and records the metadata information of the file; after feature analysis of the file access sequence and syntactic analysis of the directory and file path names, file-related features are extracted and encoded with respect to the file access sequence, peer directory relations, the application access sequence and the user read sequence, and the resulting file-related feature codes are stored in the extended attributes of the file metadata; in addition, a data set is extracted from real cluster operation and a file prefetching analysis model is built, and the files computed by the model, together with their file-related feature information and prefetch scores, are organized into key-value pairs and stored in the metadata extended attributes of the files they relate to; when the client needs to access the target file, the file metadata can be read directly.
9. The metadata prefetching system for a distributed file system of claim 2, wherein the metadata server cache layer further comprises an associated-metadata prefetching module that: when a client initiates a metadata request to the metadata server, processes the metadata request for the target file, traverses the metadata information of the associated files in the cache of the metadata server according to the file-related features in the metadata extended attribute of the target file, packs the metadata information of the associated files into the response message, and returns the response message to the client in a single metadata I/O.
10. A metadata prefetching method for a distributed file system, characterized by applying the metadata prefetching system for a distributed file system according to any one of claims 1 to 9, and comprising the following steps:
client caching: extracting, updating and synchronizing file-related features, and providing support for file-related feature operations;
metadata server caching: storing and synchronizing file-related features, and executing file metadata operations.
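The update rules of claims 5 and 6 can be illustrated with the following Python sketch. It assumes that each stored feature records a byte range through hypothetical `offset` and `length` fields and that file existence can be tested through a caller-supplied `exists` function; neither detail is specified in the claims.

```python
def drop_overwritten(features, write_offset, write_length):
    """Keep only features whose range [offset, offset + length) misses the written range."""
    write_end = write_offset + write_length
    return [
        f for f in features
        if f["offset"] + f["length"] <= write_offset or f["offset"] >= write_end
    ]


def prune_on_access(related_paths, exists):
    """Delayed update: drop entries for removed files only when the list is next read."""
    live = [p for p in related_paths if exists(p)]
    related_paths[:] = live  # update the stored list in place
    return live
```

In this reading, a write invalidates exactly the features it overlaps, while a file deletion costs nothing until some other file's feature list that mentions the removed file is accessed again.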
CN202110859541.XA 2021-07-28 2021-07-28 Metadata prefetching system and method for distributed file system Pending CN113688113A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110859541.XA CN113688113A (en) 2021-07-28 2021-07-28 Metadata prefetching system and method for distributed file system

Publications (1)

Publication Number Publication Date
CN113688113A (en) 2021-11-23

Family

ID=78578163

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110859541.XA Pending CN113688113A (en) 2021-07-28 2021-07-28 Metadata prefetching system and method for distributed file system

Country Status (1)

Country Link
CN (1) CN113688113A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116048425A (en) * 2023-03-09 2023-05-02 浪潮电子信息产业股份有限公司 Hierarchical caching method, hierarchical caching system and related components

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105279240A (en) * 2015-09-28 2016-01-27 暨南大学 Client origin information associative perception based metadata pre-acquisition method and system
CN108920600A (en) * 2018-06-27 2018-11-30 中国科学技术大学 A kind of metadata of distributed type file system forecasting method based on data correlation
CN112101891A (en) * 2020-07-30 2020-12-18 杭州正策信息科技有限公司 Data processing method applied to project declaration system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116048425A (en) * 2023-03-09 2023-05-02 浪潮电子信息产业股份有限公司 Hierarchical caching method, hierarchical caching system and related components
CN116048425B (en) * 2023-03-09 2023-07-14 浪潮电子信息产业股份有限公司 Hierarchical caching method, hierarchical caching system and related components

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination