CN116501760A - Efficient distributed metadata management method combining memory and prefix tree - Google Patents


Publication number
CN116501760A
Authority
CN
China
Prior art keywords
metadata
key
prefix tree
mds
bloom filter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310349675.6A
Other languages
Chinese (zh)
Inventor
俞万刚
薛梅婷
曾艳
袁俊峰
张纪林
万健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2023-04-04
Filing date
2023-04-04
Publication date
2023-07-28
Application filed by Hangzhou Dianzi University
Priority to CN202310349675.6A
Publication of CN116501760A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2453: Query optimisation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2455: Query execution
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005: Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011: Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016: Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses an efficient distributed metadata management method that combines in-memory structures with a prefix tree. The method concerns the management of metadata in a distributed system: metadata is mapped with an improved consistent hashing algorithm and stored by means of a prefix tree in order to reduce memory usage. Lookups and insertions operate on in-memory structures and use binary search, which greatly improves their performance. A balance factor on each metadata server (MDS) node addresses the uniformity of metadata distribution, so that metadata is spread across the MDS nodes as evenly as possible and metadata management performance improves. In addition, the invention stores the actual metadata index in an efficient prefix tree and fetches the metadata directly from disk through that index, trading a small amount of space for efficiency.

Description

Efficient distributed metadata management method combining memory and prefix tree
Technical Field
The invention relates to a method for managing metadata in a distributed system, in which metadata is mapped with an improved consistent hashing algorithm and stored by means of a prefix tree in order to reduce memory usage.
Background
Metadata is a special kind of data used to describe other data. In a file system, for example, metadata describes file attributes, including directory contents, file sizes, and file pointers, as well as the mapping from file names to the locations of the recorded data.
With the development of the internet and digital transformation in many fields, large numbers of applications generate massive amounts of data, such as image data and system log data, which require substantial storage resources to store and manage. To manage this data better, image resources and log data are compressed and stored in a file system, the files are organized by the operating system, and the resource metadata is finally stored in a structured database, which serves as the mapping from files to resources. Managing massive metadata therefore becomes a major difficulty for systems that use a file system as the storage medium.
The usual methods for distributed metadata management are the following. Static subtree partitioning suits scenarios where metadata lookups are frequent, but when metadata grows dynamically the load across the distributed metadata servers (MDSs) becomes unbalanced. Dynamic subtree partitioning adjusts subtrees at run time, which greatly increases communication among MDSs and affects system performance. Hash mapping suffers from data skew when data features are similar, which leads to uneven load among MDSs.
To cope with the access pressure that massive data places on a single database, a multi-database cluster can be used to offload accesses from each database. This reduces the access pressure on each database but introduces data redundancy, and it does nothing to address the effect of the data volume itself on database performance. To solve that problem fundamentally, the amount of data held by a single database must be reduced, so sharded storage is adopted: the massive data is stored in shards across a database cluster, which reduces the data volume of each individual database. Here sharding means distributing different metadata records across shards rather than splitting an individual record. This relieves pressure on the database and further improves metadata query efficiency, but it greatly increases system complexity: sharded data raises distributed-transaction problems, and both the sharding strategy and locating data after sharding become complicated.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention provides an efficient distributed metadata management method combining memory and a prefix tree for the massive metadata generated by applications, offering an efficient storage and query scheme for that metadata.
The invention comprises the following two stages:
The first stage: the distributed metadata server (MDS) where the metadata resides is located through improved consistent hashing. The improved consistent hashing comprises an MDS object set, a hash ring, and the set of metadata-identifier buckets (Key Bucket) pointed to by each MDS.
The second stage: the corresponding metadata is obtained from the located MDS. The parts that metadata Keys have in common are shared through the prefix tree, and the address where the metadata resides is stored on the corresponding prefix tree node.
Compared with the prior art, the invention has the following advantages:
1. Because the structures are memory-based, operations are far more efficient than with traditional I/O, and lookups and insertions use binary search, which greatly improves their performance.
2. The balance factor of the MDS nodes addresses the uniformity of metadata distribution, so that metadata is spread across the MDS nodes as evenly as possible, skew caused by an overloaded MDS node is avoided, and metadata management performance improves.
3. The actual metadata index is stored in an efficient prefix tree and the metadata is fetched directly from disk through the index, trading a small amount of space for efficiency.
Drawings
FIG. 1 is a schematic diagram of the compressed Bloom filter;
FIG. 2 is a schematic diagram of the improved consistent hashing;
FIG. 3 is a schematic diagram of prefix tree storage.
Detailed Description
The invention is further described below with reference to the accompanying drawings and specific implementation steps.
The invention adopts the idea of two-level caching and divides metadata management into two stages:
The first stage: the MDS holding the metadata is located through improved consistent hashing. The original consistent hashing uses a single-ring structure; when a large amount of data has similar features, the hash function maps much of it to nearby positions on the hash ring and the distribution becomes uneven. The improved consistent hashing offers good data uniformity and efficient queries. It comprises an MDS object set, a hash ring (32-bit, with values in the range $0$ to $2^{31}-1$), and the set of metadata-identifier buckets (Key Bucket) pointed to by each MDS. Each MDS object has the following attributes: the IP address of the target MDS, the MDS serial number, the position on the hash ring, and the balance factor, as shown in FIG. 2.
The second stage: the corresponding metadata is obtained from the located MDS. To improve efficiency and reduce memory usage, a prefix tree shares the parts that metadata Keys (the identifiers of the requested metadata) have in common, and the address where the metadata resides is stored on the corresponding prefix tree node.
Based on this concept, the invention adopts the following technical means:
During initialization, the initial MDSs are distributed uniformly on the hash ring; the ring position of each MDS is then written into its MDS object, and all MDS objects are loaded into one set to form the MDS set. The Key Bucket uses an array structure. As shown in FIG. 2, each entry contains an 8-byte hash code of the metadata Key, a 1-byte check code (used to decide whether two Keys are the same when their hashes collide), and a 1-byte MDS serial number (recording the MDS to which the metadata identifier belongs). The Key Bucket locates data by binary search.
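The Key Bucket layout and its binary search can be sketched as follows. This is a minimal illustration, not the patent's implementation: the names `BucketEntry`, `insert_entry`, and `find_mds` are hypothetical, and sorting entries by (hash code, check code) is an assumption made so that binary search applies.

```python
import bisect
import struct
from typing import NamedTuple, Optional

# Assumed on-disk/in-memory layout: 8-byte hash code, 1-byte check code,
# 1-byte MDS serial number, i.e. 10 bytes per entry as in the description.
ENTRY = struct.Struct(">QBB")


class BucketEntry(NamedTuple):
    hash_code: int   # 8-byte hash of the metadata Key
    check_code: int  # 1-byte check code, compared when hash codes collide
    mds_serial: int  # 1-byte serial number of the MDS that owns the Key


def insert_entry(bucket: list, entry: BucketEntry) -> None:
    """Insert an entry while keeping the bucket sorted (assumed ordering)."""
    bisect.insort(bucket, entry)


def find_mds(bucket: list, hash_code: int, check_code: int) -> Optional[int]:
    """Binary-search the sorted bucket; return the MDS serial number or None."""
    i = bisect.bisect_left(bucket, (hash_code, check_code))
    if i < len(bucket) and bucket[i][:2] == (hash_code, check_code):
        return bucket[i].mds_serial
    return None
```

Keeping the array sorted on insertion is what lets both lookup and insertion position-finding run in logarithmic time.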
As shown in FIG. 1, the compressed Bloom filter groups two Bloom filters, BFa and BFb; the original data is hashed and inserted into BFa and BFb. The compressed Bloom filter is denoted BFc, and insertion into BFa and BFb works as in a conventional Bloom filter (BF). The construction is as follows: assuming the required compressed Bloom filter (CBF) has length m, first create BFa of length m/2 and fill it with data; when the fill exceeds a threshold (set according to the required false-positive rate), create BFb of length m/2 and fill it with data. When BFa and BFb have each reached the threshold for their respective false-positive rates, they are merged into BFc according to table rules, which achieves the memory compression.
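The two-stage filling of BFa and BFb can be sketched as below. Several things here are assumptions: the class names are hypothetical, the bit positions are derived by double hashing over a SHA-256 digest, and the threshold is expressed as an insertion count. The merge of BFa and BFb into BFc is deliberately left out, because the text does not spell out the table rules.

```python
import hashlib


class BloomFilter:
    """Conventional Bloom filter over m bits with k hash positions per key."""

    def __init__(self, m_bits: int, k: int):
        self.m, self.k = m_bits, k
        self.bits = bytearray((m_bits + 7) // 8)
        self.count = 0  # number of keys inserted so far

    def _positions(self, key: str):
        # Assumed hashing scheme: double hashing over one SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)
        self.count += 1

    def __contains__(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class TwoStageFilter:
    """BFa (m/2 bits) fills first; BFb (m/2 bits) is created past the threshold."""

    def __init__(self, m_bits: int, k: int, threshold: int):
        self.half, self.k, self.threshold = m_bits // 2, k, threshold
        self.bfa = BloomFilter(self.half, k)
        self.bfb = None

    def add(self, key: str) -> None:
        if self.bfa.count < self.threshold:
            self.bfa.add(key)
            return
        if self.bfb is None:
            self.bfb = BloomFilter(self.half, self.k)
        self.bfb.add(key)

    def might_contain(self, key: str) -> bool:
        return key in self.bfa or (self.bfb is not None and key in self.bfb)
```

A key reported as absent is definitely absent; a key reported as present may be a false positive, which is why the later stages still verify it.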
Embodiment:
The embodiment comprises the following steps:
Step (1): a consistent hashing server is created using I/O multiplexing to receive external requests. File descriptors are monitored with the NIO select mechanism, and events are dispatched using the Reactor model: requests are received by one thread and forwarded to worker threads, which process them uniformly.
When a read request arrives, the metadata Key (the identifier of the requested metadata) is first screened by the compressed Bloom filter of the consistent hashing layer. For BFc of length m, with k hash mapping functions and n stored items, the false-positive rate of the compressed Bloom filter is as follows:
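As a point of reference, and assuming BFc behaves like a standard Bloom filter with m bits, k hash functions, and n inserted Keys (the compressed variant may differ), the textbook approximation of the false-positive rate in the same symbols is:

```latex
P_{\text{fp}} \approx \left(1 - e^{-kn/m}\right)^{k}
```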
For a Key that is not present, the result is returned directly to the requester and the connection is closed.
A preliminary estimate for 10 million metadata entries is about $10^7 \times 10\,\text{B} \approx 95.37$ MB of memory, which is a considerable amount. Lookups in the Key Bucket use binary search, so the search time complexity for a single MDS is $O(\log n)$ and the overall time complexity is $O(K \log n)$.
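The estimate can be checked with a few lines of arithmetic. The 10-byte entry size comes from the Key Bucket layout (8 + 1 + 1 bytes); treating "M" as $2^{20}$ bytes is an assumption that happens to reproduce the 95.37 figure, and the function names are hypothetical.

```python
ENTRY_BYTES = 8 + 1 + 1      # hash code + check code + MDS serial number
NUM_KEYS = 10_000_000        # 10 million metadata Keys
MIB = 2 ** 20                # assumed meaning of "M" in the estimate


def key_bucket_mib(num_keys: int = NUM_KEYS, entry_bytes: int = ENTRY_BYTES) -> float:
    """Memory taken by the Key Bucket entries, in MiB."""
    return num_keys * entry_bytes / MIB


def max_probes(num_keys: int = NUM_KEYS) -> int:
    """Worst-case number of halving steps for binary search over num_keys entries."""
    return num_keys.bit_length()


print(f"{key_bucket_mib():.2f} MiB, at most {max_probes()} probes per lookup")
```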
When the compressed Bloom filter determines that the metadata Key does not exist, a not-found result is returned directly; if it determines that the Key exists, the corresponding metadata is provisionally taken to exist and processing proceeds to step (2).
In this embodiment the compressed Bloom filter efficiently screens out non-existent metadata and provides the basis for uniform hashing, while occupying very little memory, which relieves pressure on the server.
Step (2): when the compressed Bloom filter determines that the metadata Key may exist, the request looks up the MDS holding the Key through the improved consistent hashing. To solve the uneven distribution that arises when metadata Keys with similar feature values pass through consistent hashing, the improved consistent hashing hashes twice. First, the metadata Key is mapped to one of N hash functions by a first hash function, defined as:
$$H_i = a_i \cdot W + H_{i-1}$$
where $H_i$ denotes the hash value of the metadata Key, $a_i$ denotes the ASCII value of a single character contained in the metadata Key, and $W$ denotes a perturbation parameter.
Through this hash function the metadata Key is mapped to one of the N hash functions in the compressed Bloom filter, and the selected hash function then maps the metadata Key to its position on the hash ring.
Hashing twice in this way effectively resolves the uneven distribution that consistent hashing produces for metadata Keys with similar feature values; the metadata Key is then looked up by binary search.
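The two-step hashing can be sketched as follows. The recurrence is implemented exactly as written, with $H_0 = 0$ assumed. The family of N ring hash functions is not specified in the text, so `make_ring_hash` (seeded SHA-256) is a hypothetical stand-in, as are the default `w = 31` and the other names.

```python
import hashlib

RING_SIZE = 2 ** 31  # ring positions 0 .. 2^31 - 1, as in the description


def first_hash(key: str, w: int = 31) -> int:
    """H_i = a_i * W + H_{i-1}, with H_0 = 0 assumed; a_i is the character code."""
    h = 0
    for ch in key:
        h = ord(ch) * w + h
    return h


def make_ring_hash(seed: int):
    """Hypothetical member of the family of N ring hash functions."""
    def ring_hash(key: str) -> int:
        digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % RING_SIZE
    return ring_hash


def ring_position(key: str, ring_hashes: list, w: int = 31) -> int:
    """First hash selects one of the N functions; that function places the Key."""
    selected = ring_hashes[first_hash(key, w) % len(ring_hashes)]
    return selected(key)


ring_hashes = [make_ring_hash(seed) for seed in range(4)]  # N = 4 for illustration
position = ring_position("file_0001", ring_hashes)
```

Note that the recurrence as written sums the weighted character codes, so it does not depend on character order; it only selects which ring hash function is used, and that second function determines the ring position.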
For a write request, the requested metadata Key is first added to BFc to record it. The corresponding MDS set is then found through the improved consistent hashing, and the MDS set is searched by binary search; if the metadata Key is found, it is overwritten. In the case of a hash collision, the collision linked list is traversed and the check codes are compared to determine whether the metadata already exists. If the metadata does not exist and the load factor of the MDS node is too high, an MDS with a relatively low load is selected from the MDS set, the metadata Key is stored on that MDS, and the check value and the adjusted MDS serial number are written both into the node and into the selected MDS.
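The load-based redirection on the write path might look like the sketch below. The `MDS` fields follow the attributes listed for the MDS object; interpreting the balance factor as a load fraction between 0 and 1, the `max_load` threshold of 0.8, and the name `choose_mds` are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MDS:
    ip: str                # IP address of the target MDS
    serial: int            # MDS serial number
    ring_position: int     # position on the hash ring
    balance_factor: float  # assumed: fraction of capacity in use, 0.0 .. 1.0


def choose_mds(target: MDS, mds_set: List[MDS], max_load: float = 0.8) -> MDS:
    """Keep the hashed target unless it is overloaded; otherwise pick the least loaded MDS."""
    if target.balance_factor <= max_load:
        return target
    return min(mds_set, key=lambda m: m.balance_factor)
```

The serial number of the returned MDS is what would be recorded in the Key Bucket entry, so later reads find the Key on the redirected node.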
Step (3): once the target MDS has been located for the metadata Key through the improved consistent hashing, the request is forwarded to the actual metadata storage node, which manages and stores the metadata. Each storage node uses a prefix tree as the index of metadata storage locations. The nodes of the prefix tree store an index data structure which, as shown in FIG. 3, comprises a 1 B index number, a 4 B offset, and a 2 B file serial number. The prefix tree has four levels, and each node covers the 26 lowercase letters, so the whole tree occupies about $26^4 \times 8\,\text{B} \approx 3.48$ MB of memory, where 8 B is the size of a metadata index address on a 64-bit machine. The index entries for 10 million items occupy about $10^7 \times 7\,\text{B} \approx 66.76$ MB. In total, 10 million items therefore take about 70.24 MB of memory, while the addressable disk storage is $2^{32} \times 2^{16} / 2^{40} = 256$ TB. Data is also shared in a cyclic manner, which saves storage capacity.
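The 7-byte index entry and the capacity figures can be reproduced with a short sketch. The field order and big-endian packing are assumptions, the function names are hypothetical, and "M"/"T" are taken to mean $2^{20}$ and $2^{40}$ bytes.

```python
import struct

# Assumed packing: 1-byte index number, 4-byte offset, 2-byte file serial number.
# The ">" prefix disables alignment padding, so the entry is exactly 7 bytes.
INDEX_ENTRY = struct.Struct(">BIH")


def pack_index(index_no: int, offset: int, file_serial: int) -> bytes:
    return INDEX_ENTRY.pack(index_no, offset, file_serial)


def unpack_index(raw: bytes) -> tuple:
    return INDEX_ENTRY.unpack(raw)


MIB, TIB = 2 ** 20, 2 ** 40
trie_mib = 26 ** 4 * 8 / MIB                   # four levels, 8-byte addresses
entries_mib = 10_000_000 * INDEX_ENTRY.size / MIB
addressable_tib = 2 ** 32 * 2 ** 16 / TIB      # 4-byte offset x 2-byte file serial

print(f"trie {trie_mib:.4f} MiB, entries {entries_mib:.2f} MiB, disk {addressable_tib:.0f} TiB")
```

The 4-byte offset addresses up to $2^{32}$ bytes within one file and the 2-byte serial number distinguishes up to $2^{16}$ files, which is where the 256 TB bound comes from.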
When a read or write request for a metadata Key reaches the MDS storage node, the address of the prefix tree is first obtained from memory. Starting at the root node, the address of the next node is found and the position (subscript) within the metadata Key is advanced by one. Once the Key has been fully traversed, the metadata index held in the tree node for its last character is returned; otherwise the recursion continues downward from the root until the metadata index represented by the Key is found. With the metadata index, the file serial number is obtained, and once the file has been located the metadata is read at the given offset.
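The traversal described above can be sketched with a small prefix tree over the 26 lowercase letters. The class and method names are hypothetical, the index is represented as a plain tuple, and the sketch does not enforce the four-level depth mentioned in the description.

```python
from typing import Optional, Tuple

IndexEntry = Tuple[int, int, int]  # (index number, offset, file serial number)


class TrieNode:
    __slots__ = ("children", "index")

    def __init__(self) -> None:
        self.children = [None] * 26               # one slot per lowercase letter
        self.index: Optional[IndexEntry] = None   # set only where a Key ends


class PrefixTree:
    def __init__(self) -> None:
        self.root = TrieNode()

    def put(self, key: str, index: IndexEntry) -> None:
        node = self.root
        for ch in key:
            slot = ord(ch) - ord("a")
            if not 0 <= slot < 26:
                raise ValueError("key must contain only lowercase letters")
            if node.children[slot] is None:
                node.children[slot] = TrieNode()
            node = node.children[slot]
        node.index = index

    def get(self, key: str) -> Optional[IndexEntry]:
        node = self.root
        for ch in key:  # advance one character (subscript) per level
            slot = ord(ch) - ord("a")
            if not 0 <= slot < 26:
                return None
            node = node.children[slot]
            if node is None:
                return None
        return node.index
```

With the returned tuple, a caller would open the file identified by the serial number and read the metadata at the stored offset.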
Matters not described in detail in the invention are known in the art.
The above embodiments illustrate the technical concept and features of the invention. They are intended to enable those skilled in the art to understand and implement the invention, not to limit its scope. All equivalent changes or modifications made in accordance with the spirit of the invention fall within its scope.

Claims (9)

1. An efficient distributed metadata management method combining memory and a prefix tree, characterized by comprising:
the first stage: locating the distributed metadata server (MDS) where the metadata resides through improved consistent hashing, the improved consistent hashing comprising an MDS object set, a hash ring, and the set of metadata-identifier buckets (Key Bucket) pointed to by each MDS;
the second stage: obtaining the corresponding metadata from the located MDS, wherein the parts that metadata Keys have in common are shared through the prefix tree, and the address where the metadata resides is stored on the corresponding prefix tree node.
2. The efficient distributed metadata management method combining memory and a prefix tree according to claim 1, wherein: each entry of the Key Bucket comprises an 8-byte hash code of the metadata Key, a 1-byte check code, and a 1-byte MDS serial number.
3. The efficient distributed metadata management method combining memory and a prefix tree according to claim 2, wherein: the Key Bucket locates data by binary search.
4. The efficient distributed metadata management method combining memory and a prefix tree according to claim 1, wherein the method further comprises, after a read request arrives, screening the metadata Key through a compressed Bloom filter:
when the compressed Bloom filter determines that the metadata Key does not exist, a result indicating that the metadata does not exist is returned directly;
when the compressed Bloom filter determines that the metadata Key exists, the metadata corresponding to the metadata Key is provisionally determined to exist.
5. The efficient distributed metadata management method combining memory and a prefix tree according to claim 4, wherein the compressed Bloom filter is constructed as follows:
setting the required compressed Bloom filter length to m, creating a first Bloom filter BFa of length m/2 and filling BFa with data, and, when the fill exceeds a threshold, creating a second Bloom filter BFb of length m/2 and filling BFb with data;
when the first Bloom filter BFa and the second Bloom filter BFb have each reached the threshold for their respective false-positive rates, merging BFa and BFb into the compressed Bloom filter according to table rules, thereby compressing memory.
6. The efficient distributed metadata management method combining memory and a prefix tree according to claim 4, wherein: after the metadata corresponding to the metadata Key is provisionally determined to exist, the distributed metadata server (MDS) where the metadata Key resides is looked up through the improved consistent hashing.
7. The efficient distributed metadata management method combining memory and a prefix tree according to claim 1, wherein:
the nodes of the prefix tree store an index data structure comprising a 1 B index number, a 4 B offset, and a 2 B file serial number; the prefix tree has a four-level structure, with each node covering the 26 lowercase letters.
8. The efficient distributed metadata management method combining memory and a prefix tree according to claim 4, wherein:
for a write request, the requested metadata Key is added to the compressed Bloom filter to record it, the corresponding MDS set is found through the improved consistent hashing, the metadata Key is found by searching the MDS set with binary search, and the metadata Key is then overwritten.
9. The efficient distributed metadata management method combining memory and a prefix tree according to claim 7, wherein:
when a read or write request for a metadata Key reaches the MDS storage node, the address of the prefix tree is first obtained from memory, the address of the next node is found starting from the root node, and the position (subscript) within the metadata Key is advanced by one; once the metadata Key has been fully traversed, the metadata index held in the tree node for its last character is returned, and otherwise the recursion continues downward from the root until the metadata index represented by the metadata Key is found;
with the metadata index, the file serial number is obtained, and once the file has been located the metadata is read at the given offset.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310349675.6A CN116501760A (en) 2023-04-04 2023-04-04 Efficient distributed metadata management method combining memory and prefix tree

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310349675.6A CN116501760A (en) 2023-04-04 2023-04-04 Efficient distributed metadata management method combining memory and prefix tree

Publications (1)

Publication Number Publication Date
CN116501760A 2023-07-28

Family

ID=87322275

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310349675.6A Pending CN116501760A (en) 2023-04-04 2023-04-04 Efficient distributed metadata management method combining memory and prefix tree

Country Status (1)

Country Link
CN (1) CN116501760A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117573703B (en) * 2024-01-16 2024-04-09 科来网络技术股份有限公司 Universal retrieval method, system, equipment and storage medium for time sequence data

Similar Documents

Publication Publication Date Title
US4611272A (en) Key-accessed file organization
US5752243A (en) Computer method and storage structure for storing and accessing multidimensional data
CN109299113B (en) Range query method with storage-aware mixed index
Hutflesz et al. Globally order preserving multidimensional linear hashing
US10331641B2 (en) Hash database configuration method and apparatus
US20030004938A1 (en) Method of storing and retrieving multi-dimensional data using the hilbert curve
US20090125478A1 (en) Database heap management system with variable page size and fixed instruction set address resolution
CN113377868B (en) Offline storage system based on distributed KV database
CN111400306B (en) RDMA (remote direct memory Access) -and non-volatile memory-based radix tree access system
CN116501760A (en) Efficient distributed metadata management method combining memory and prefix tree
Otoo et al. A mapping function for the directory of a multidimensional extendible hashing
CN114610708A (en) Vector data processing method and device, electronic equipment and storage medium
Litwin et al. A new method for fast data searches with keys
CN114116612A (en) B + tree index-based access method for archived files
CN106168883A (en) A kind of efficient data tissue and access method
CN111274259A (en) Data updating method for storage nodes in distributed storage system
CN113157692B (en) Relational memory database system
CN113326262B (en) Data processing method, device, equipment and medium based on key value database
CN113392040B (en) Address mapping method, device and equipment
CN112380004B (en) Memory management method, memory management device, computer readable storage medium and electronic equipment
CN115203211A (en) Unique hash sequence number generation method and system
CN114416741A (en) KV data writing and reading method and device based on multi-level index and storage medium
CN110569221B (en) File system management method, device, equipment and storage medium with version function
EP0117906B1 (en) Key-accessed file organization
Ramamohanarao et al. Partial match retrieval using recursive linear hashing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination