CN112579543A - Dynamic metadata management method for distributed file system and distributed file system - Google Patents


Info

Publication number
CN112579543A
Authority
CN
China
Prior art keywords
metadata
server
preposed
server cluster
request
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011586836.6A
Other languages
Chinese (zh)
Inventor
马俊杰
苏帅
苏玉娇
瞿秋薏
姜瀚
付慧慧
付长杰
刘曦冉
黄亚杰
晋晨
丛峰日
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aerospace Science And Technology Network Information Development Co ltd
Original Assignee
Aerospace Science And Technology Network Information Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aerospace Science And Technology Network Information Development Co ltd
Priority to CN202011586836.6A
Publication of CN112579543A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/164 File meta data generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1744 Redundancy elimination performed by the file system using compression, e.g. sparse files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1095 Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/2866 Architectures; Arrangements
    • H04L67/30 Profiles

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a dynamic metadata management method for a distributed file system, and to the distributed file system, and belongs to the field of distributed computing. Several servers in a metadata server cluster are selected as front-end servers and form a front-end metadata server cluster; the remaining metadata servers in the metadata server cluster form a non-front-end metadata server cluster. All metadata read-write requests initiated by the client are processed by the front-end metadata server cluster, and the metadata in write requests is stored only in the memory of the front-end metadata server cluster. The method then starts the front-end servers and processes the metadata read-write requests of the client. The invention provides high-speed access to metadata, reduces the load on the distributed system and achieves better load balancing.

Description

Dynamic metadata management method for distributed file system and distributed file system
Technical Field
The invention belongs to the technical field of distributed computers, and particularly relates to a metadata dynamic management method of a distributed file system and the distributed file system.
Background
As internet information technology develops and data volumes grow, storage systems are becoming increasingly important. With its high fault tolerance, high concurrency and high scalability, the distributed file system suits the exponential growth of information and is valued by storage vendors and practitioners.
A distributed file system mainly manages two types of data: user data, and metadata, that is, the data used to manage and index user data. Access to user data is mostly storage-intensive, whereas access to metadata is mostly compute-intensive. A distributed file system therefore generally manages and stores the two types independently: the component that stores user data is called a data server, and the component that stores metadata is called a metadata server.
To give the whole distributed file system stronger fault tolerance and higher parallel access capability, the system uses multiple nodes to build a data server cluster and a metadata server cluster respectively. Because metadata is accessed frequently, the dynamic load is often unevenly distributed, which slows the system's response and can even make the system unstable.
To solve these problems, the present invention provides a distributed file system that offers high-speed access to metadata and achieves better load balancing.
Disclosure of Invention
(I) technical problem to be solved
The technical problem to be solved by the present invention is how to provide a dynamic metadata management method for a distributed file system and a distributed file system, so as to solve the problems of slow response speed, even system instability and the like of the existing distributed file system.
(II) technical scheme
In order to solve the above technical problem, the present invention provides a dynamic metadata management method for a distributed file system, which includes the following steps:
S1, selecting a plurality of servers in a metadata server cluster as front-end servers to form a front-end metadata server cluster, wherein the remaining metadata servers in the metadata server cluster form a non-front-end metadata server cluster; all metadata read-write requests initiated by the client are processed by the front-end metadata server cluster, and the metadata in write requests is stored only in the memory of the front-end metadata server cluster;
S2, starting the front-end servers;
S3, processing the metadata read-write requests of the client.
Further, step S2 includes:
S201, preprocessing: setting a configuration file for each front-end server;
S202, initialization of the front-end servers: according to the configuration file, each front-end server communicates with the other front-end servers, and together they automatically form the front-end metadata server cluster;
the configuration file comprises two types of communication addresses and ports, one used for communication between the front-end servers and the other used for communication with clients on the user side.
Further, the method further comprises the step S203: the front-end server elects a main front-end server.
Further, the step S3 of processing the metadata read-write request of the client includes:
S301, the front-end server processes the metadata write request: the client connects to any front-end server and initiates a metadata write request; after the front-end metadata server cluster receives the request, it processes the request as follows:
if the client is connected to the main front-end server, the main front-end server performs the write operation;
if the client is not connected to the main front-end server, the front-end server that received the request automatically forwards it to the main front-end server for writing.
Further, after receiving the write request, the main front-end server writes the metadata in the write request into the memory of the server, and then writes the metadata into the memories of other front-end servers until the number of the front-end servers which are successfully written in is greater than half of the total number of the front-end servers; the main front-end server returns a result of successful writing to the client; after the write request is completed, the front metadata server cluster records a log, and the content of the log comprises the directory, the file path, the modification content and the modification time of the metadata.
Further, the step S3 of processing the metadata read-write request of the client includes:
s302, the front server processes a metadata reading request: the client is connected with any one preposed server and sends a metadata reading request, and the preposed server processes according to the following procedures when receiving the reading request of the client: the preposed server receiving the reading request determines whether all metadata to be read by the client is stored in the preposed metadata server cluster through the preposed metadata server cluster communication, and if all the metadata to be read by the client is stored in the preposed metadata server cluster, the preposed server directly reads the metadata from the preposed metadata server cluster and returns the metadata to the client; if the metadata to be read by the client is not stored in the preposed metadata server cluster at all, the preposed server receiving the reading request sends the reading request to the non-preposed metadata server cluster, the non-preposed metadata server cluster calls the metadata required by the client from the hard disk, returns the metadata to the preposed server receiving the reading request and returns the metadata to the client by the preposed server; if the metadata part to be read by the client is stored in the preposed metadata server cluster, the preposed server firstly sends a reading request to the preposed metadata server cluster and a non-preposed metadata server cluster, the preposed server in the preposed metadata server cluster returns the required part of metadata to the preposed server receiving the reading request, and the non-preposed metadata server cluster calls the part of metadata required by the client from the hard disk and returns the metadata to the preposed server receiving the reading request; and the prepositive server receiving the reading request carries out aggregation processing on the received two parts of metadata and returns the metadata to the client after the processing.
Further, the method further comprises: s4, the preposed metadata server cluster synchronizes the latest data to the non-preposed metadata server cluster: the main front-end server analyzes the log recorded by the front-end server at a preset time every natural day, and only the latest metadata log is reserved for the same directory and file, so that log compression is realized;
after the analysis is finished, the main front-end server starts a new synchronization thread, sequentially initiates write requests to the non-front-end metadata server cluster according to the compressed logs, and synchronizes the latest metadata to the hard disk of the non-front-end metadata server cluster;
each time the main front-end server finishes synchronizing an item of metadata, that item is deleted from the memory of every server in the front-end metadata server cluster, and this continues item by item until all the metadata has been deleted.
Further, when the front-end metadata server cluster processes read-write requests, it records the access load of each directory and each file. The access load is counted as follows: one read request counts as 1, one modification (write) request counts as 2, one creation request counts as 3, and one deletion request counts as 2. The load factor of each file equals: number of read requests × 1 + number of write requests × 2 + number of create requests × 3 + number of delete requests × 2. The load factor of each directory equals: number of read requests × 1 + number of write requests × 2 + number of create requests × 3 + number of delete requests × 2, plus the sum of the load factors of all directories and files under that directory.
Further, the method includes step S5, repartitioning the namespace of the metadata server cluster: the main front-end server calculates the load factor of each directory and each file, and averages the load factors with the last calculated load factor stored in the main front-end server; and identifying the directories and files with the load factors exceeding a preset load threshold according to the average number of the load factors, and re-dividing the metadata of the directories and files with the load factors exceeding the load threshold by the front-end server.
The invention also provides a distributed file system which comprises a metadata server cluster, wherein the metadata server cluster consists of a plurality of metadata servers and comprises a preposed metadata server cluster and a non-preposed metadata server cluster; all metadata read-write requests initiated by the client are uniformly processed by the preposed metadata server cluster, and metadata in the write requests are only stored in a memory of the preposed metadata server cluster, so that the system executes the metadata dynamic management method as claimed in any one of claims 1 to 9.
(III) advantageous effects
The invention provides a dynamic metadata management method for a distributed file system and the distributed file system, wherein a preposed metadata server cluster is constructed in a metadata server cluster, all metadata read-write requests initiated by a client are uniformly processed by the preposed metadata server cluster, metadata in the write requests are only stored in a memory of the preposed metadata server cluster, the preposed metadata server cluster synchronizes latest data to a non-preposed metadata server cluster at regular time, and the naming space of the metadata server cluster is divided again according to a set rule, so that high-speed access to the metadata can be provided, the load of the distributed file system is reduced, and better load balance is realized.
Drawings
FIG. 1 is a schematic diagram of the overall architecture of the distributed file system of the present invention.
Detailed Description
In order to make the objects, contents and advantages of the present invention clearer, the following detailed description of the embodiments of the present invention will be made in conjunction with the accompanying drawings and examples.
In order to realize the purpose of the invention, the following technical scheme is adopted for realizing the purpose:
a dynamic management method for metadata of a distributed file system comprises the following steps:
s1, selecting a plurality of servers as prepositive servers in a metadata server cluster to form a prepositive metadata server cluster, wherein the rest metadata servers in the metadata server cluster form a non-prepositive metadata server cluster; all metadata read-write requests initiated by the client are uniformly processed by the preposed metadata server cluster, and metadata in the write requests are only stored in a memory of the preposed metadata server cluster;
s2, starting a front server;
and S3, processing the metadata read-write request of the client.
The dynamic management method of the metadata comprises the following steps: step S2 includes:
S201, preprocessing: setting a configuration file for each front-end server;
S202, initialization of the front-end servers:
according to the configuration file, each front-end server communicates with the other front-end servers, and together they automatically form the front-end metadata server cluster.
The dynamic management method of the metadata comprises the following steps: the configuration file comprises two types of communication addresses and ports, wherein one type is used for communication between the front-end servers, and the other type is used for communication with clients on the user side.
The dynamic management method of the metadata comprises the following steps: further comprising step S203: the front-end server elects a main front-end server.
The dynamic management method of the metadata comprises the following steps: the step S3 of processing the metadata read-write request of the client includes:
s301, the prepositive server processes the metadata writing request: the method comprises the following steps that a client is connected with any one preposed server to initiate a metadata writing request, and after the preposed metadata server cluster receives the metadata writing request initiated by the client, the metadata writing request is processed according to the following process:
if the client is connected with the main front-end server, the main front-end server performs writing operation;
and if the client is not connected to the main front-end server, the front-end server that received the request automatically forwards the metadata write request to the main front-end server for writing.
The dynamic management method of the metadata comprises the following steps: after receiving the write request, the main front-end server firstly writes the metadata in the write request into the memory of the server, and then writes the metadata into the memories of other front-end servers until the number of the front-end servers which are successfully written is more than half of the total number of the front-end servers; and the main front server returns a result of successful writing to the client.
The dynamic management method of the metadata comprises the following steps: after the write request is completed, the front metadata server cluster records a log, and the content of the log comprises the directory, the file path, the modification content and the modification time of the metadata.
The dynamic management method of the metadata comprises the following steps: the step S3 of processing the metadata read-write request of the client includes:
s302, the front server processes a metadata reading request: the client is connected with any one preposed server and sends a metadata reading request, and the preposed server processes according to the following procedures when receiving the reading request of the client: and the front-end server receiving the reading request determines whether the metadata to be read by the client is all stored in the front-end metadata server cluster through front-end metadata server cluster communication, and if the metadata to be read by the client is all stored in the front-end metadata server cluster, the front-end server directly reads the metadata from the front-end metadata server cluster and returns the metadata to the client.
The dynamic management method of the metadata comprises the following steps: if the metadata to be read by the client is not stored in the preposed metadata server cluster at all, the preposed server receiving the reading request sends the reading request to the non-preposed metadata server cluster, the non-preposed metadata server cluster calls the metadata required by the client from the hard disk, returns the metadata to the preposed server receiving the reading request, and returns the metadata to the client by the preposed server.
The dynamic management method of the metadata comprises the following steps: if the metadata part to be read by the client is stored in the preposed metadata server cluster, the preposed server firstly sends a reading request to the preposed metadata server cluster and a non-preposed metadata server cluster, the preposed server in the preposed metadata server cluster returns the required part of metadata to the preposed server receiving the reading request, and the non-preposed metadata server cluster calls the part of metadata required by the client from the hard disk and returns the metadata to the preposed server receiving the reading request; the prepositive server receiving the reading request carries out aggregation processing on the received two parts of metadata, and returns the metadata to the client after the processing;
the dynamic metadata management method further includes: s4, synchronizing the latest data from the preposed metadata server cluster to the non-preposed metadata server cluster: the main front-end server analyzes the log recorded by the front-end server at a preset time every natural day, and only the latest metadata log is reserved for the same directory and file, so that log compression is realized;
after the analysis is finished, the main front-end server starts a new synchronization thread, sequentially initiates write requests to the non-front-end metadata server cluster according to the compressed logs, and synchronizes the latest metadata to the hard disk of the non-front-end metadata server cluster.
The dynamic management method of the metadata comprises the following steps: each time the main front-end server finishes synchronizing an item of metadata, that item is deleted from the memory of every server in the front-end metadata server cluster, and this continues item by item until all the metadata has been deleted.
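The compression, synchronization and memory release of step S4 can be illustrated with a minimal Python sketch. It is not the patented implementation: the journal is assumed to be a list of entries appended in time order, the keys "path", "content" and "time" are hypothetical, plain dictionaries stand in for server memory and for the hard disks of the non-front-end cluster, and deletions are not modeled.

```python
def compress_journal(journal):
    """Keep only the newest entry for each path.

    Assumes `journal` is a list of dicts appended in time order, each with
    the hypothetical keys "path", "content" and "time".
    """
    seen = set()
    kept = []
    for entry in reversed(journal):          # newest entries first
        if entry["path"] not in seen:
            seen.add(entry["path"])
            kept.append(entry)
    kept.reverse()                           # replay in original time order
    return kept


def synchronize(journal, front_memories, non_front_disk):
    """Replay the compressed journal to disk, then free front-end memory."""
    for entry in compress_journal(journal):
        # Stand-in for a write request to the non-front-end cluster.
        non_front_disk[entry["path"]] = entry["content"]
        # Once this item is on disk, drop it from every front-end memory.
        for memory in front_memories:
            memory.pop(entry["path"], None)


journal = [
    {"path": "/a", "content": "v1", "time": 1},
    {"path": "/b", "content": "v1", "time": 2},
    {"path": "/a", "content": "v2", "time": 3},
]
front_memories = [{"/a": "v2", "/b": "v1"}, {"/a": "v2"}]
non_front_disk = {}
synchronize(journal, front_memories, non_front_disk)
```

Scanning the journal from newest to oldest means the first entry seen for a path is the one to keep, so a single pass is enough to compress the log.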
The dynamic management method of the metadata comprises the following steps: when the preposed metadata server cluster processes the read-write request, the access load of each directory and each file is recorded.
The dynamic management method of the metadata comprises the following steps: the access load is as follows: one read request counts as 1, one modify request counts as 2, one create request counts as 3, and one delete request counts as 2.
The dynamic management method of the metadata comprises the following steps: the load factor of each file equals: number of read requests × 1 + number of write requests × 2 + number of create requests × 3 + number of delete requests × 2; the load factor of each directory equals: number of read requests × 1 + number of write requests × 2 + number of create requests × 3 + number of delete requests × 2, plus the sum of the load factors of all directories and files under that directory.
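As a worked illustration of the load factor, the following Python sketch applies the weights stated above (read 1, write 2, create 3, delete 2) and adds the load factors of a directory's children recursively. The `Node` class and its field names are hypothetical and are used only to hold the request counts.

```python
from dataclasses import dataclass, field

# Weights taken from the text: read = 1, write/modify = 2, create = 3, delete = 2.
WEIGHTS = {"read": 1, "write": 2, "create": 3, "delete": 2}


@dataclass
class Node:
    """Hypothetical record of request counts for one file or directory."""
    name: str
    counts: dict = field(default_factory=dict)    # e.g. {"read": 10, "write": 2}
    children: list = field(default_factory=list)  # stays empty for files


def own_load(node):
    """Weighted sum of the requests addressed to this entry itself."""
    return sum(WEIGHTS[op] * n for op, n in node.counts.items())


def load_factor(node):
    """Own load plus the load factors of everything below a directory."""
    return own_load(node) + sum(load_factor(child) for child in node.children)


file_a = Node("a.txt", {"read": 10, "write": 2})             # 10*1 + 2*2 = 14
file_b = Node("b.txt", {"create": 1, "delete": 1})           # 1*3 + 1*2 = 5
docs = Node("docs", {"read": 4}, children=[file_a, file_b])  # 4 + 14 + 5 = 23
```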
The dynamic management method of the metadata further comprises the following steps of S5, repartitioning the name space of the metadata server cluster: the main front-end server calculates the load factor of each directory and each file, and averages the load factors with the last calculated load factor stored in the main front-end server; and identifying the directories and files with the load factors exceeding a preset load threshold according to the average number of the load factors, and re-dividing the metadata of the directories and files with the load factors exceeding the load threshold by the front-end server.
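The averaging and threshold check of step S5 might look like the sketch below. The function name, the dictionary layout and the threshold value are assumptions, and the text does not say how an entry without a stored previous value is treated; here such an entry simply keeps its current load factor.

```python
def hot_entries(current, previous, threshold):
    """Return the paths whose averaged load factor exceeds the threshold.

    `current` and `previous` map a path to its load factor in this round
    and in the last round. Assumption: an entry with no previous value is
    averaged with itself, so its current load factor is used unchanged.
    """
    hot = []
    for path, load in current.items():
        averaged = (load + previous.get(path, load)) / 2
        if averaged > threshold:
            hot.append(path)
    return sorted(hot)


current = {"/a": 100, "/b": 20, "/c": 60}
previous = {"/a": 40, "/b": 200}
candidates = hot_entries(current, previous, threshold=65)
```

Averaging with the previous round smooths short bursts, so an entry is repartitioned only when its load stays high across rounds.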
The metadata dynamic management method comprises the following division modes:
1) if a plurality of subdirectories exist under a certain directory and the load factors of the subdirectories are approximate, splitting and storing the subdirectories in different metadata server cluster servers;
2) if a plurality of subdirectories exist under a certain directory and the load factors of the subdirectories are greatly different, executing the step 1 on the subdirectories with high load factors;
3) if only one subdirectory exists under a certain directory, splitting and storing each subdirectory in different metadata server cluster servers;
4) if a plurality of files exist in a directory and the load factors of the files are similar, splitting and storing the files in different metadata server cluster servers;
5) if there is only one file under a directory, the main front-end server stores the metadata of the file in the memory of the front-end metadata server cluster, and sets the maximum survival time of the metadata.
A distributed file system comprising a cluster of metadata servers, wherein: the metadata server cluster consists of a plurality of metadata servers, and comprises a preposed metadata server cluster and a non-preposed metadata server cluster; all metadata read-write requests initiated by the client are uniformly processed by the preposed metadata server cluster, and metadata in the write requests are only stored in a memory of the preposed metadata server cluster.
The distributed file system, wherein: the system performs a method for dynamic management of metadata as described in one of the above.
The distributed file system, wherein: the distributed file system also includes a cluster of data servers.
As shown in fig. 1, the distributed file system of the present invention includes a metadata server cluster, where the metadata server cluster is composed of a plurality of metadata servers, and the metadata dynamic management method includes the following steps:
S1, selecting N servers in the metadata server cluster (where N is an odd number greater than 3) as front-end servers to form a front-end metadata server cluster, the front-end servers storing metadata in memory; all metadata read-write requests initiated by the client are processed by the front-end metadata server cluster, and the metadata in write requests is stored only in the memory of the front-end metadata server cluster; the front-end servers execute the metadata read-write requests initiated by the client. The front-end servers are chosen as the N servers with the largest memory capacity, so as to improve the response speed to read-write requests. The servers in the metadata server cluster other than the front-end servers are called non-front-end servers, and the cluster they form is called the non-front-end metadata server cluster.
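The selection rule of step S1 can be sketched in Python as follows. The server names and memory sizes are made-up example data, and the function name is hypothetical.

```python
def select_front_end(servers, n):
    """Split servers into front-end and non-front-end by memory capacity.

    `servers` maps a server name to its memory capacity (any unit).
    The text requires N to be an odd number greater than 3.
    """
    if n <= 3 or n % 2 == 0:
        raise ValueError("N must be an odd number greater than 3")
    if n > len(servers):
        raise ValueError("not enough metadata servers")
    ranked = sorted(servers, key=lambda name: servers[name], reverse=True)
    return ranked[:n], ranked[n:]


servers = {"m1": 64, "m2": 256, "m3": 128, "m4": 32,
           "m5": 512, "m6": 96, "m7": 16}
front, non_front = select_front_end(servers, 5)
```

An odd N guarantees that a strict majority always exists when the front-end servers later vote or acknowledge writes.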
S2, starting a front server:
S201, preprocessing: setting a configuration file for each front-end server. The configuration file comprises two types of communication addresses and ports: one type is used for heartbeat detection, health checks and data synchronization between the front-end servers (hereinafter "peer communication"), and the other is used for connecting to clients on the user side and processing read-write requests (hereinafter "client communication");
S202, initialization of the front-end servers:
according to the configuration file, each front-end server communicates with the other front-end servers, and the front-end metadata server cluster is formed automatically through a distributed consensus protocol (the Raft protocol);
S203, the front-end servers use the Raft protocol to elect a main front-end server;
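The two kinds of addresses in the configuration file can be pictured with the Python sketch below. The text does not specify a file format, so the class names, field names, addresses and ports are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    host: str
    port: int


@dataclass(frozen=True)
class FrontEndConfig:
    server_id: str
    peer: Endpoint      # heartbeat, health check, data synchronization
    client: Endpoint    # client connections and read-write requests
    peers: tuple = ()   # peer endpoints of the other front-end servers


config = FrontEndConfig(
    server_id="front-1",
    peer=Endpoint("10.0.0.1", 7000),
    client=Endpoint("10.0.0.1", 8000),
    peers=(Endpoint("10.0.0.2", 7000), Endpoint("10.0.0.3", 7000)),
)
```

Keeping peer traffic and client traffic on separate endpoints lets heartbeats and replication continue even when client connections are saturated.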
s3, the front server processes the metadata read-write request
S301, the prepositive server processes the metadata writing request:
the client is connected with the client communication address and the port of any one prepositive server to initiate a metadata writing request, and after receiving the metadata writing request initiated by the client, the prepositive metadata server cluster processes according to the following procedures:
if the client is connected with the main front-end server, the main front-end server performs writing operation;
if the client is not connected with the main front-end server, the front-end server automatically forwards the metadata writing request to the main front-end server to perform actual writing operation;
after receiving a write request, the main front-end server first writes the metadata in the write request into its own memory, and then writes the metadata into the memories of the other front-end servers through peer communication until the number of front-end servers that have been written successfully is greater than half of the total number of front-end servers (that is, the main front-end server follows the majority principle of the Raft protocol); in this way multiple copies of the data are stored, which strengthens the disaster tolerance of the distributed system and prevents data loss caused by the failure of a single server;
the main front-end server then returns a successful write result to the client, achieving distributed consistency;
after the front-end metadata server cluster completes the write request, the main front-end server records a log (hereinafter the "journal log") that records in detail the directory, file path, modification content, modification time and so on of the metadata;
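The write flow of S301 can be sketched as below. This is a single-process illustration of forwarding and majority acknowledgement only; it is not an implementation of the Raft protocol (there are no terms, elections or log replication), and the class names, the `healthy` flag and the journal keys are hypothetical.

```python
class FrontEndServer:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy   # False simulates an unreachable server
        self.memory = {}

    def store(self, path, metadata):
        if not self.healthy:
            return False
        self.memory[path] = metadata
        return True


class FrontEndCluster:
    def __init__(self, servers, main):
        self.servers = servers
        self.main = main
        self.journal = []        # journal log kept by the main server

    def write(self, contact, path, metadata, now):
        """Handle a write request that arrived at `contact`."""
        target = contact
        if target is not self.main:
            target = self.main   # forward to the main front-end server
        majority = len(self.servers) // 2 + 1
        acks = 1 if target.store(path, metadata) else 0
        for peer in self.servers:
            if acks >= majority:
                break            # more than half now hold the metadata
            if peer is not target and peer.store(path, metadata):
                acks += 1
        if acks < majority:
            return False         # rollback of partial copies is omitted
        self.journal.append({"path": path, "content": metadata, "time": now})
        return True


servers = [FrontEndServer(f"front-{i}") for i in range(1, 6)]
servers[1].healthy = False
servers[3].healthy = False
cluster = FrontEndCluster(servers, main=servers[0])
ok = cluster.write(servers[2], "/docs/a.txt", {"size": 10}, now=1)
```

With five front-end servers and two of them unreachable, the write still succeeds because three servers, more than half, hold the metadata.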
s302, the front server processes a metadata reading request:
the client can be connected with the client communication address and the port of any one front-end server and sends a metadata reading request. When receiving a read request of a client, a front-end server processes according to the following flow:
the read request is handled by whichever front-end server receives it and does not need to be forwarded to the main front-end server;
the front-end server that receives the read request determines, through peer communication, whether all the metadata to be read by the client is stored in the memory of the front-end metadata server cluster; if it is, the front-end server reads the metadata directly from the front-end metadata server cluster and returns it to the client;
if none of the metadata to be read by the client is stored in the memory of the front-end metadata server cluster, the front-end server that receives the read request sends the read request to the non-front-end metadata server cluster; the non-front-end metadata server cluster retrieves the metadata required by the client from the hard disk and returns it to that front-end server, which stores the metadata in memory and returns it to the client;
if only part of the metadata to be read by the client is stored in the front-end metadata server cluster, the front-end server that receives the read request sends the read request first to the front-end metadata server cluster and then to the non-front-end metadata server cluster; the front-end servers in the front-end metadata server cluster return the part of the metadata they hold to the front-end server that received the read request, and the non-front-end metadata server cluster retrieves the remaining part of the metadata required by the client from the hard disk and returns it to the same front-end server; finally, the front-end server that received the read request aggregates the two parts of metadata and returns the aggregated result to the client;
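The three read cases above (all in memory, none in memory, partly in memory) can be illustrated with one small routine. This is only a sketch under stated assumptions: `handle_read` is a hypothetical name, the memory of the whole front-end cluster and the disks of the non-front-end cluster are each modeled as a plain dictionary, and caching the disk results in memory in the partial case is an assumption (the text states it only for the case where nothing is in memory).

```python
def handle_read(paths, cluster_memory, disk_store):
    """Return metadata for the requested paths, served from the front-end
    cluster memory where possible and from disk otherwise."""
    # Part held in the memory of the front-end metadata server cluster.
    from_memory = {p: cluster_memory[p] for p in paths if p in cluster_memory}

    # Part that must be fetched from the non-front-end cluster's disks.
    missing = [p for p in paths if p not in cluster_memory]
    from_disk = {p: disk_store[p] for p in missing if p in disk_store}

    # Keep the fetched metadata in memory (assumption for the partial case).
    cluster_memory.update(from_disk)

    # Aggregate both parts before answering the client.
    return {**from_memory, **from_disk}
```

When every path is already in memory, `missing` is empty and the disk store is never consulted; when no path is in memory, the whole answer comes from disk; otherwise the two dictionaries are merged, which corresponds to the aggregation step.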
S4, synchronizing the latest data from the front-end metadata server cluster to the metadata server cluster:
the front-end metadata server cluster holds the latest metadata of the current calendar day, and this metadata needs to be synchronized to the metadata server cluster once every calendar day. On the one hand, this writes the latest metadata to the disks in the metadata server cluster, so that the metadata is stored more safely; on the other hand, it reclaims memory resources for processing the metadata write requests of the next calendar day. Data synchronization is processed according to the following flow:
an independent analysis thread is provided in the main front-end server; the thread is started at a specific time every calendar day (for example, 24:00) and analyzes the journal log recorded by the main front-end server in reverse chronological order, keeping only the latest metadata log entry for each directory and file, thereby implementing journal log compression;
after the analysis is finished, the main front-end server starts a new synchronization thread and, according to the compressed journal log, sequentially initiates write requests to the non-front-end metadata server cluster to synchronize the latest metadata to it, that is, the non-front-end metadata server cluster writes the latest metadata to the hard disk;
each time the main front-end server completes the synchronization of one piece of metadata, that metadata is deleted from the memory of each server in the front-end metadata server cluster, so that memory resources are progressively reclaimed until all of the metadata has been deleted;
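The compression and synchronization steps above can be sketched as follows. The sketch assumes that the journal is an append-only list in chronological order and that each entry is a dictionary with hypothetical keys `path` and `change`; the function names are illustrative, and the disks and cluster memory are modeled as dictionaries.

```python
def compress_journal(journal):
    """Scan the journal newest-first and keep only the latest entry per path."""
    latest = {}
    for entry in reversed(journal):  # reverse chronological order
        # The first entry seen for a path is its newest one; older ones are dropped.
        latest.setdefault(entry["path"], entry)
    # Return the surviving entries oldest-first so they can be replayed in order.
    return list(reversed(list(latest.values())))


def synchronize(journal, cluster_memory, disk_store):
    """Replay the compressed journal to disk, freeing memory entry by entry."""
    for entry in compress_journal(journal):
        disk_store[entry["path"]] = entry["change"]  # write to the hard disk
        cluster_memory.pop(entry["path"], None)      # reclaim memory


# Example: two updates to /a collapse into one write.
journal = [
    {"path": "/a", "change": {"v": 1}},
    {"path": "/b", "change": {"v": 2}},
    {"path": "/a", "change": {"v": 3}},
]
memory = {"/a": {"v": 3}, "/b": {"v": 2}}
disk = {}
synchronize(journal, memory, disk)
```

Deletion requests are not treated specially here; a fuller version would need to remove the corresponding entry from disk rather than write it.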
S5, repartitioning the namespace of the metadata server cluster:
when the front-end metadata server cluster processes read-write requests, the access load of each directory and each file is recorded:
since the front-end metadata server cluster processes all metadata read-write requests, the access load of each directory and file is recorded in the memory of the front-end server that executes the read and write requests (including modification, creation and deletion requests). The load factor is calculated as follows:
Each read request is counted as: 1
Each modification request is counted as: 2
Each creation request is counted as: 3
Each deletion request is counted as: 2
The load factor of each file is equal to: number of read requests × 1 + number of modification requests × 2 + number of creation requests × 3 + number of deletion requests × 2
The load factor of each directory is equal to: number of read requests on the directory × 1 + number of modification requests on the directory × 2 + number of creation requests on the directory × 3 + number of deletion requests on the directory × 2, plus the sum of the load factors of all directories and files in that directory.
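As a worked illustration of the formula above, the following sketch computes load factors recursively over a small directory tree. The weights come from the text; the `Entry` class, the operation names used as dictionary keys, and the function names are hypothetical.

```python
from dataclasses import dataclass, field

# Weights taken from the description: read 1, modify 2, create 3, delete 2.
WEIGHTS = {"read": 1, "modify": 2, "create": 3, "delete": 2}


@dataclass
class Entry:
    """A file (no children) or a directory (with children)."""
    counts: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def own_load(counts):
    """Weighted sum of the requests made directly on one entry."""
    return sum(WEIGHTS[op] * n for op, n in counts.items())


def load_factor(entry):
    """Own weighted load plus the load factors of everything beneath it."""
    return own_load(entry.counts) + sum(load_factor(c) for c in entry.children)


# A file with 10 reads, 2 modifications and 1 creation: 10*1 + 2*2 + 1*3 = 17.
hot_file = Entry({"read": 10, "modify": 2, "create": 1})
# A subdirectory with no direct requests holding a file with 3 reads: 3.
subdir = Entry(children=[Entry({"read": 3})])
# A directory with 5 reads and 1 deletion of its own: 5*1 + 1*2 = 7.
root = Entry({"read": 5, "delete": 1}, children=[hot_file, subdir])

print(load_factor(hot_file))  # 17
print(load_factor(root))      # 27
```

Because a directory's factor includes everything beneath it, a parent directory is always at least as loaded as its busiest child.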
After step S4 is completed, the main front-end server calculates the load factor of each directory and file according to the above formula and averages it with the previously calculated load factor stored in the main front-end server. Directories and files whose averaged load factor exceeds a preset load threshold are identified, and the front-end server repartitions the metadata of those directories and files in the following manner:
1. if a plurality of subdirectories exist under a certain directory and the load factors of the subdirectories are approximate, splitting and storing the subdirectories in different metadata server cluster servers;
2. if a plurality of subdirectories exist under a certain directory and the load factors of the subdirectories are greatly different (for example, the difference is more than 10 times), executing the step 1 on the subdirectories with high load factors;
3. if only one subdirectory exists under a certain directory, splitting and storing each subdirectory in different metadata server cluster servers;
4. if a plurality of files exist in a directory and the load factors of the files are similar, splitting and storing the files in different metadata server cluster servers;
5. if only one file exists in a certain directory, the main preposed server stores the file metadata in the memory of the preposed metadata server cluster, and sets the maximum survival time of the metadata to prevent the metadata from permanently occupying the memory resources of the preposed metadata server cluster.
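The selection step that precedes these rules (averaging with the previous load factor and comparing against a threshold) and the "similar versus greatly different" test can be sketched as below. The function names are hypothetical; using the current value alone when no previous value exists is an assumption, and the factor of 10 is only the example figure given in rule 2.

```python
def hot_entries(current, previous, threshold):
    """Average each load factor with the previously stored one and return
    the paths whose average exceeds the threshold."""
    hot = {}
    for path, factor in current.items():
        if path in previous:
            average = (factor + previous[path]) / 2
        else:
            average = factor  # assumption: no history, use the current value
        if average > threshold:
            hot[path] = average
    return hot


def loads_similar(factors, max_ratio=10):
    """True if the largest load factor is at most max_ratio times the smallest."""
    lowest, highest = min(factors), max(factors)
    if lowest == 0:
        return highest == 0
    return highest / lowest <= max_ratio
```

For example, with current factors `{"/a": 100, "/b": 10}`, previous factors `{"/a": 60, "/b": 30}` and a threshold of 40, only `/a` (average 80) is selected for repartitioning.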
The invention can provide high-speed access to the metadata, reduce the load on the distributed system and realize better load balance.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (10)

1. A dynamic management method for metadata of a distributed file system is characterized by comprising the following steps:
S1, selecting a plurality of servers in a metadata server cluster as front-end servers to form a front-end metadata server cluster, wherein the remaining metadata servers in the metadata server cluster form a non-front-end metadata server cluster; all metadata read-write requests initiated by the client are uniformly processed by the front-end metadata server cluster, and metadata in the write requests is stored only in the memory of the front-end metadata server cluster;
S2, starting the front-end servers;
S3, processing the metadata read-write requests of the client.
2. The method for dynamically managing metadata according to claim 1, wherein step S2 includes:
S201, preprocessing: setting a configuration file for each front-end server;
S202, initialization of the front-end servers: according to the configuration file, each front-end server communicates with the other front-end servers to automatically form the front-end metadata server cluster;
the configuration file comprises two types of communication addresses and ports, one type being used for communication between the front-end servers and the other type being used for communication with the client on the user side.
3. The dynamic metadata management method according to claim 1 or 2, further comprising step S203: the front-end server elects a main front-end server.
4. The dynamic metadata management method according to claim 3, wherein the step S3 of processing the read-write metadata request of the client includes:
S301, the front-end server processes a metadata write request: the client connects to any one of the front-end servers and initiates a metadata write request, and after the front-end metadata server cluster receives the metadata write request initiated by the client, the request is processed according to the following flow:
if the client is connected to the main front-end server, the main front-end server performs the write operation;
if the client is connected to a front-end server other than the main front-end server, that front-end server automatically forwards the metadata write request to the main front-end server for writing.
5. The dynamic metadata management method according to claim 4, wherein after receiving the write request, the primary front-end server writes the metadata in the write request into the memory of the server, and then writes the metadata into the memories of other front-end servers until the number of front-end servers successfully written is greater than half of the total number of front-end servers; the main front-end server returns a result of successful writing to the client; after the write request is completed, the front metadata server cluster records a log, and the content of the log comprises the directory, the file path, the modification content and the modification time of the metadata.
6. The dynamic metadata management method according to claim 3, wherein the step S3 of processing the read-write metadata request of the client includes:
S302, the front-end server processes a metadata read request: the client connects to any one of the front-end servers and sends a metadata read request, and on receiving the read request of the client the front-end server processes it according to the following flow: the front-end server that receives the read request determines, through communication within the front-end metadata server cluster, whether all of the metadata to be read by the client is stored in the front-end metadata server cluster, and if all of the metadata to be read by the client is stored in the front-end metadata server cluster, the front-end server reads the metadata directly from the front-end metadata server cluster and returns it to the client; if none of the metadata to be read by the client is stored in the front-end metadata server cluster, the front-end server that receives the read request sends the read request to the non-front-end metadata server cluster, and the non-front-end metadata server cluster retrieves the metadata required by the client from the hard disk and returns it to the front-end server that received the read request, which returns it to the client; if only part of the metadata to be read by the client is stored in the front-end metadata server cluster, the front-end server sends the read request to the front-end metadata server cluster and to the non-front-end metadata server cluster, the front-end servers in the front-end metadata server cluster return the required part of the metadata to the front-end server that received the read request, and the non-front-end metadata server cluster retrieves the part of the metadata required by the client from the hard disk and returns it to the front-end server that received the read request; and the front-end server that received the read request aggregates the two received parts of metadata and returns the result to the client.
7. The dynamic metadata management method according to claim 3, further comprising: S4, the front-end metadata server cluster synchronizes the latest data to the non-front-end metadata server cluster: the main front-end server analyzes the log recorded by the front-end server at a preset time every calendar day, and only the latest metadata log entry is kept for each directory and file, so that log compression is realized;
after the analysis is finished, the main front-end server starts a new synchronization thread, sequentially initiates write requests to the non-front-end metadata server cluster according to the compressed log, and synchronizes the latest metadata to the hard disk of the non-front-end metadata server cluster;
each time the main front-end server completes the synchronization of one piece of metadata, that metadata is deleted from the memory of each server in the front-end metadata server cluster, until all of the metadata has been deleted.
8. The dynamic management method of metadata according to claim 3, characterized in that:
when the front-end metadata server cluster processes read-write requests, the access load of each directory and each file is recorded; the access load is counted as follows: each read request is counted as 1, each modification request is counted as 2, each creation request is counted as 3, and each deletion request is counted as 2; the load factor of each file is equal to: number of read requests × 1 + number of modification requests × 2 + number of creation requests × 3 + number of deletion requests × 2; the load factor of each directory is equal to: number of read requests × 1 + number of modification requests × 2 + number of creation requests × 3 + number of deletion requests × 2, plus the sum of the load factors of all directories and files under the directory.
9. The dynamic metadata management method according to claim 8, further comprising S5, repartitioning the namespace of the metadata server cluster: the main front-end server calculates the load factor of each directory and each file, and averages it with the previously calculated load factor stored in the main front-end server; directories and files whose averaged load factor exceeds a preset load threshold are identified, and the front-end server repartitions the metadata of the directories and files whose load factor exceeds the load threshold.
10. A distributed file system comprises a metadata server cluster, and is characterized in that the metadata server cluster consists of a plurality of metadata servers, and comprises a preposed metadata server cluster and a non-preposed metadata server cluster; all metadata read-write requests initiated by the client are uniformly processed by the preposed metadata server cluster, and metadata in the write requests are only stored in a memory of the preposed metadata server cluster, so that the system executes the metadata dynamic management method as claimed in any one of claims 1 to 9.
CN202011586836.6A 2020-12-29 2020-12-29 Dynamic metadata management method for distributed file system and distributed file system Pending CN112579543A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011586836.6A CN112579543A (en) 2020-12-29 2020-12-29 Dynamic metadata management method for distributed file system and distributed file system

Publications (1)

Publication Number Publication Date
CN112579543A true CN112579543A (en) 2021-03-30

Family

ID=75140383

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011586836.6A Pending CN112579543A (en) 2020-12-29 2020-12-29 Dynamic metadata management method for distributed file system and distributed file system

Country Status (1)

Country Link
CN (1) CN112579543A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102143215A (en) * 2011-01-20 2011-08-03 中国人民解放军理工大学 Network-based PB level cloud storage system and processing method thereof
CN103150394A (en) * 2013-03-25 2013-06-12 中国人民解放军国防科学技术大学 Distributed file system metadata management method facing to high-performance calculation
CN109885552A (en) * 2019-02-18 2019-06-14 天固信息安全系统(深圳)有限责任公司 The metadata dynamic management approach and distributed file system of distributed file system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109885552A (en) * 2019-02-18 2019-06-14 天固信息安全系统(深圳)有限责任公司 The metadata dynamic management approach and distributed file system of distributed file system
CN109885552B (en) * 2019-02-18 2023-08-18 天固信息安全系统(深圳)有限责任公司 Metadata dynamic management method of distributed file system and distributed file system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination