WO2022267508A1 - Metadata compression method and device - Google Patents

Metadata compression method and device

Info

Publication number
WO2022267508A1
Authority
WO
WIPO (PCT)
Prior art keywords
metadata
data
pieces
storage
user data
Prior art date
Application number
PCT/CN2022/077759
Other languages
English (en)
Chinese (zh)
Inventor
高蒙
潘浩
宋雨恒
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202110944078.9A (published as CN115525209A)
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2022267508A1

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 — Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 — Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers

Definitions

  • The present application relates to the field of storage, and in particular to a metadata compression method and device.
  • The present application provides a metadata compression method and device, which solve the problem that metadata occupies excessive storage resources.
  • In a first aspect, the present application provides a metadata compression method, which includes: acquiring n pieces of metadata, where n is a positive integer greater than 1.
  • A piece of metadata includes a key-value pair, and the key-value pair includes a keyword and a value.
  • The keyword indicates the identifier of the data corresponding to the metadata,
  • and the value indicates the actual address at which the data is stored.
  • Then, m pieces of data corresponding to at least part of the n pieces of metadata are processed, to obtain n target values, corresponding to the n pieces of metadata, that conform to a set rule, where m is a positive integer less than or equal to n. The n target values are then compressed.
  • The n pieces of metadata in this application may specifically be n pieces of metadata stored in a storage system.
  • For example, the n pieces of metadata may include n pieces of metadata in one or more nodes of a tree structure.
  • As another example, the n pieces of metadata may include n pieces of metadata in one or more sorted string tables (SSTables) in a storage layer of an LSM tree.
  • For example, processing the m pieces of data may include migrating the m pieces of data to change their actual addresses, thereby obtaining n target values, corresponding to the n pieces of metadata, that conform to the set rule.
  • As another example, if the actual addresses of the data already exhibit a regular pattern, n target values conforming to the set rule may be obtained without migrating the data.
  • The present application does not limit the specific rule that the n target values conform to, as long as the target values can be compressed according to that rule.
  • For example, the n target values conforming to the set rule may mean that the n actual addresses indicated by the n target values are continuous; as another example, it may mean that there is a storage space of the same size between every two adjacent actual addresses among the n actual addresses; as yet another example, it may mean that the size of the storage space between every two adjacent actual addresses varies regularly, and so on.
  • After the n target values are obtained, they are compressed. This solves the problem that the value in the metadata cannot be compressed because it is irregular, thereby reducing the storage resources occupied by the metadata.
  • In a possible implementation, the n target values conforming to the set rule means that the n actual addresses indicated by the n target values are continuous.
  • Making the n actual addresses indicated by the n target values continuous facilitates compressing the target values in the metadata. For example, when compressing the n target values, the first target value can be recorded as a base, and each other target value can be recorded as its offset from the first target value, thereby compressing the n target values.
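The base-plus-offset scheme described above can be sketched as follows (an illustrative sketch only; the function names and the 4 KiB block spacing are assumptions, not taken from the application):

```python
def compress_target_values(values):
    """Record the first target value as a base and every other target
    value as its offset from that base (base-plus-offset encoding)."""
    base = values[0]
    return base, [v - base for v in values[1:]]

def decompress_target_values(base, offsets):
    """Recover the original n target values from the base and offsets."""
    return [base] + [base + off for off in offsets]

# Four target values pointing at contiguous 4 KiB blocks (hypothetical):
values = [0x10000, 0x11000, 0x12000, 0x13000]
base, offsets = compress_target_values(values)
assert offsets == [0x1000, 0x2000, 0x3000]
assert decompress_target_values(base, offsets) == values
```

When the addresses are contiguous and the blocks equally sized, the offsets themselves form an arithmetic sequence, so they can be stored even more compactly (for example, as a stride and a count).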
  • In a possible implementation, processing the m pieces of data corresponding to at least part of the metadata to obtain the n target values conforming to the set rule includes: migrating the m pieces of data, so that the data corresponding to the n pieces of metadata is stored in a storage space with continuous actual addresses; afterwards, saving the actual addresses at which the n pieces of data are stored in the continuous storage space as the n target values.
  • By migrating the m pieces of data so that the data corresponding to the n pieces of metadata is stored in a storage space with continuous actual addresses, and saving those actual addresses as the n target values, the n target values are made regular (that is, they conform to the set rule), which facilitates compressing them.
  • In a possible implementation, the method further includes: selecting the n pieces of metadata from the multiple pieces of metadata included in a metadata set according to how hot or cold the corresponding data is.
  • The data corresponding to the n pieces of metadata is cold data.
  • The metadata set may be any set that includes multiple pieces of metadata.
  • For example, the metadata set may be the set of metadata included in the storage layers above the first storage layer.
  • As another example, the metadata set may be a set of multiple pieces of metadata in the first storage layer.
  • The scope of the metadata set is not limited in practical applications.
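The application does not specify how hotness and coldness are measured; one common heuristic, used here purely as an assumption, is recency of access (least recently accessed data is treated as cold):

```python
def select_cold_metadata(keys, last_access, n):
    """Hypothetical hot/cold selection: the n least recently accessed
    pieces of data are treated as cold, and their metadata is selected
    for migration and value compression."""
    return sorted(keys, key=lambda k: last_access[k])[:n]

# last-access timestamps per data identifier (toy values):
last_access = {"k1": 50.0, "k2": 10.0, "k3": 99.0, "k4": 20.0}
cold = select_cold_metadata(last_access.keys(), last_access, n=2)
assert cold == ["k2", "k4"]  # the two least recently accessed entries
```

Selecting cold data keeps the migration cost low: cold data is rewritten rarely, so the contiguous layout (and hence the compressibility of the target values) is preserved longer.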
  • In a possible implementation, the n pieces of metadata are metadata in a first storage layer of an LSM tree.
  • The LSM tree is used to store metadata and includes multiple storage layers, among which is the above-mentioned first storage layer.
  • The metadata compression method provided in the present application can be applied to any storage layer of the LSM tree, so as to compress the metadata values in that storage layer.
  • In a possible implementation, the above keyword and value are stored in two separate data entries.
  • In this way, the data entry storing the keyword (the keyword data entry) is not affected when the data entry storing the value (the value data entry) is modified, and likewise the value data entry is not affected when the keyword data entry is modified.
  • In a possible implementation, the method further includes: detecting the data change amount of a metadata set, where the metadata set records metadata of multiple pieces of data. Acquiring the n pieces of metadata includes: after determining that the data change amount of the metadata set exceeds a change threshold, acquiring the n pieces of metadata included in the metadata set.
  • That is, the data change amount of the metadata set is monitored, and once it is determined to exceed the change threshold, the acquisition of the n pieces of metadata in the metadata set is triggered, so that their values can be compressed according to the method of the present application.
  • In this way, after new metadata is stored in the metadata set, the metadata in the set can be compressed, reducing the storage resources the set occupies.
  • Here, the metadata set refers to any set that includes multiple pieces of metadata.
  • In a possible implementation, the method further includes: acquiring the degree of dispersion of the actual addresses of the data corresponding to the n pieces of metadata.
  • The above-mentioned processing of the m pieces of data corresponding to at least part of the n pieces of metadata then includes: after determining that the degree of dispersion is greater than a dispersion threshold, processing the m pieces of data.
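The application does not define how the degree of dispersion is computed; one plausible metric, shown here purely as an assumption, compares the span the addresses actually cover with the span they would cover if stored contiguously:

```python
def address_dispersion(addresses, block_size):
    """Hypothetical dispersion metric: the span actually covered by the
    addresses divided by the span of a perfectly contiguous layout.
    1.0 means the data is already contiguous; larger means more scattered."""
    addrs = sorted(addresses)
    actual_span = addrs[-1] - addrs[0] + block_size
    ideal_span = len(addrs) * block_size
    return actual_span / ideal_span

DISPERSION_THRESHOLD = 2.0  # hypothetical threshold

addrs = [0x0000, 0x9000, 0x3000, 0xF000]
migrate = address_dispersion(addrs, block_size=0x1000) > DISPERSION_THRESHOLD
# migrate is True here: the m pieces of data would then be moved so that
# their actual addresses become contiguous.
```

Gating the migration on a dispersion threshold avoids paying the migration cost when the addresses are already regular enough to compress.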
  • In a possible implementation, the method can be applied to a centralized storage system; specifically, the method can be executed by an engine in the centralized storage system.
  • In another possible implementation, the method can be applied to a distributed storage system.
  • The distributed storage system includes multiple storage servers, and the foregoing method may be executed by one or more of those storage servers.
  • In another aspect, the present application provides a metadata compression device, which may be a hardware device for managing metadata in a storage system.
  • For example, the metadata compression device may be an engine in a centralized storage system or part of the hardware in such an engine, or it may be a storage server in a distributed storage system or part of the hardware in such a server.
  • The metadata compression device may include: an acquisition unit, configured to acquire n pieces of metadata, where one piece of metadata includes a key-value pair, the key-value pair includes a keyword and a value, the keyword indicates the identifier of the data corresponding to the metadata, the value indicates the actual address at which the data is stored, and n is a positive integer greater than 1;
  • a processing unit, configured to process m pieces of data corresponding to at least some of the n pieces of metadata, to obtain n target values, corresponding to the n pieces of metadata, that conform to a set rule, where m is a positive integer less than or equal to n;
  • and a compression unit, configured to compress the n target values.
  • In a possible implementation, the n target values conforming to the set rule means that the n actual addresses indicated by the n target values are continuous.
  • In a possible implementation, the processing unit being configured to process the m pieces of data corresponding to at least part of the metadata to obtain the n target values conforming to the set rule includes:
  • the processing unit being specifically configured to migrate the m pieces of data, so as to store the data corresponding to the n pieces of metadata in a storage space with continuous actual addresses;
  • and the processing unit being specifically configured to save the actual addresses of the n pieces of data stored in the continuous storage space as the n target values.
  • In a possible implementation, the processing unit is further configured to select the n pieces of metadata from the multiple pieces of metadata according to how hot or cold the corresponding data is, where the data corresponding to the n pieces of metadata is cold data.
  • In a possible implementation, the n pieces of metadata are metadata in a first storage layer of an LSM tree.
  • The LSM tree is used to store metadata and includes multiple storage layers, among which is the above-mentioned first storage layer.
  • In a possible implementation, the above keyword and value are stored in two separate data entries.
  • In a possible implementation, the processing unit is further configured to detect the data change amount of a metadata set, where the metadata set records metadata of multiple pieces of data.
  • The acquisition unit being configured to acquire the n pieces of metadata includes: the acquisition unit being specifically configured to acquire the n pieces of metadata included in the metadata set after determining that the data change amount of the metadata set exceeds a change threshold.
  • In a possible implementation, the acquisition unit is further configured to acquire the degree of dispersion of the actual addresses of the data corresponding to the n pieces of metadata.
  • The processing unit being configured to process the m pieces of data corresponding to at least part of the n pieces of metadata includes: the processing unit being specifically configured to process the m pieces of data after determining that the degree of dispersion is greater than a dispersion threshold.
  • the metadata compression device is located in an engine in the centralized storage system.
  • the metadata compression device is located in a storage server in a distributed storage system.
  • In another aspect, the present application provides a storage device, including a memory and a processor, where the memory is used to store computer instructions, and the processor is used to call and execute the computer instructions from the memory, so as to implement the method provided by the first aspect or any implementation of the first aspect.
  • In another aspect, the present application provides a storage system, including an engine and a plurality of hard disks, where the plurality of hard disks are used to store data, and the engine is used to execute the method provided by the first aspect or any implementation of the first aspect.
  • For example, the storage system may be a centralized storage system.
  • In another aspect, the present application provides a storage system, including a plurality of storage servers, where the plurality of storage servers are used to store data, and a first server among the plurality of storage servers is used to execute the method provided by the first aspect or any implementation of the first aspect.
  • For example, the storage system may be a distributed storage system.
  • The first server may be a storage server capable of managing metadata in the distributed storage system.
  • In another aspect, the present application provides a chip, including a memory and a processor, where the memory is used to store computer instructions, and the processor is used to call and execute the computer instructions from the memory, so as to implement the method provided by the first aspect or any implementation of the first aspect.
  • In another aspect, the present application provides a computer-readable storage medium in which a computer program is stored; when the computer program is executed by a processor, the method provided by the first aspect or any implementation of the first aspect is implemented.
  • In another aspect, the present application provides a computer program product including instructions; when the instructions are run on a processor, the method provided by the first aspect or any implementation of the first aspect is implemented.
  • FIG. 1 is a schematic structural diagram of a storage system provided by the present application.
  • FIG. 2 is a schematic flow chart of writing data to a storage system provided by the present application
  • FIG. 3 is a schematic flow diagram of a metadata compression method provided by the present application.
  • FIG. 4A is the first schematic flowchart of merging metadata from the L1 layer to the L2 layer in the LSM tree provided by the present application;
  • FIG. 4B is the second schematic flowchart of merging metadata from the L1 layer to the L2 layer in the LSM tree provided by the present application;
  • FIG. 5A is one of the schematic diagrams for data migration provided by the present application.
  • Figure 5B is a second schematic diagram of data migration provided by this application.
  • Figure 6A is the third schematic diagram of data migration provided by this application.
  • FIG. 6B is a fourth schematic diagram of data migration provided by this application.
  • FIG. 7 is one of the schematic structural diagrams of a keyword data entry and a value data entry provided by the present application.
  • Fig. 8 is the second structural diagram of a keyword data entry and a value data entry provided by the present application.
  • FIG. 9 is the third schematic diagram of the structure of a keyword data entry and a value data entry provided by this application.
  • FIG. 10 is one of the structural schematic diagrams of a metadata compression device provided by the present application.
  • FIG. 11 is the second structural schematic diagram of a metadata compression device provided by the present application.
  • FIG. 1 is a schematic diagram of a network architecture provided by an embodiment of the present application.
  • user data can be accessed by running an application program.
  • the computer running the application program may be referred to as an "application server".
  • the application server 100 may be a physical machine or a virtual machine.
  • the application server 100 includes, but is not limited to, desktop computers, servers, notebook computers, and mobile devices.
  • The application server 100 accesses the storage system 120 through the switch 110 to read and write user data.
  • the switch 110 is only an optional device, and the application server 100 can also directly communicate with the storage system 120 through the network.
  • the switch 110 can also be replaced with an Ethernet switch, an InfiniBand switch, a RoCE (RDMA over Converged Ethernet) switch, and the like.
  • the storage system 120 is a device or a device cluster for storing user data.
  • the storage system 120 may be a centralized storage system.
  • A centralized storage system is characterized by a unified entry point: all data from external devices such as application servers must pass through this entry point.
  • The entry point of the centralized storage system may specifically be the engine 121 of the centralized storage system.
  • the engine 121 may include one or more controllers, and one controller 122 is taken as an example in FIG. 1 for illustration.
  • Multiple controllers can serve as backups for one another through mirroring channels, so that if one controller fails, another controller can take over its services, preventing a hardware failure from making the entire storage system unavailable.
  • the engine 121 may further include a front-end interface 125 and a back-end interface 126 , wherein the front-end interface 125 is used to communicate with the application server 100 to provide storage services for the application server 100 .
  • the backend interface 126 is used to communicate with the hard disk 134 to expand the capacity of the storage system. Through the back-end interface 126, the engine 121 can be connected with more hard disks 134, thereby forming a very large storage resource pool.
  • the controller 122 may include a processor 123 and a memory 124 .
  • The processor 123 may be a central processing unit (CPU), used to process data access requests from outside the storage system (such as from application servers or other storage systems), and also to process requests generated inside the storage system.
  • When the CPU 123 receives, through the front-end interface 125, write requests sent by the application server 100, it temporarily stores the user data carried in these requests in the memory 124.
  • The CPU 123 then sends the user data stored in the memory 124 to the hard disk 134 through the back-end interface 126 for persistent storage.
  • The memory 124 is an internal memory that exchanges data directly with the processor. It can read and write data at any time, at high speed, and serves as temporary data storage for the operating system or other running programs.
  • The memory 124 may include various types of memory; for example, it may be a random access memory or a read-only memory (ROM).
  • For example, the random access memory may be a dynamic random access memory (DRAM) or a storage class memory (SCM).
  • DRAM is a semiconductor memory which, like most random access memory (RAM), is a volatile memory device.
  • SCM is a composite storage technology that combines characteristics of traditional storage devices and of memory.
  • DRAM and SCM are only illustrative examples in this embodiment; the memory may also include other random access memories, such as static random access memory (SRAM).
  • The read-only memory may be, for example, a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), and the like.
  • The memory 124 can also be a dual in-line memory module (DIMM), that is, a module composed of DRAM, or a solid state disk (SSD).
  • In addition, multiple memories 124, and memories 124 of different types, may be configured in the controller 122.
  • This embodiment does not limit the quantity or type of the memory 124.
  • In addition, the memory 124 can be configured with a power-protection function.
  • The power-protection function means that the data stored in the memory 124 is not lost when the system is powered off and then powered on again; memory with such a function is called non-volatile memory.
  • The storage system may include two or more engines 121, with redundancy or load balancing performed among them.
  • The engine 121 may also include hard disk slots.
  • In that case, the hard disks 134 can be deployed directly in the engine 121, and the back-end interface 126 becomes an optional configuration: when the storage space of the system is insufficient, more hard disks or hard disk enclosures can be connected through the back-end interface 126.
  • FIG. 1 only provides a schematic structural diagram of a centralized storage system as an example.
  • the storage system 120 may be composed of multiple independent storage servers, where the storage servers may communicate with each other.
  • each storage server may respectively include hardware components such as a processor, a memory, a network card, and a hard disk.
  • The processor and memory provide computing resources: the processor processes data access requests from outside the storage server, and the memory exchanges data directly with the processor; it can read and write data at any time at high speed and serves as temporary data storage for the operating system or other running programs.
  • a hard disk is used to provide storage resources, such as storing data, and it can be a magnetic disk or other types of storage media, such as solid-state hard disks or shingled magnetic recording hard disks.
  • the storage server may also include a network card for communicating with the application server.
  • the actual address of the storage space provided by the hard disk 134 is generally not directly exposed to the application server 100 for use.
  • the storage system 120 stores metadata recording actual addresses of user data.
  • When the application server 100 writes data into the hard disk 134, the metadata of the user data is added to a metadata file to record the actual address of the data.
  • When the user data needs to be read, its actual address can be determined by looking up its metadata in the metadata file that records the above metadata.
  • The data described by metadata is referred to as "user data" in this embodiment of the application.
  • The user data mentioned in the embodiments of this application can be understood as the data that the application server provides to the storage system for storage in order to deliver related services, and metadata is data used to describe this user data (data that describes other data), including but not limited to the actual address at which the user data is stored, the mapping relationship between the logical address and the actual address, the attributes of the user data, and other information.
  • User data may also be called "data" or other names, which is not limited in this embodiment of the present application.
  • When reading user data, the application server 100 may send the storage system 120 a read request carrying the identifier of the user data, where the identifier may be, for example, the logical address of the user data as used in the application server 100.
  • The CPU 123 finds the actual address of the user data from the metadata file stored in the memory 124 or on the hard disk 134 according to the identifier, where the actual address may be the underlying physical address of the user data in the storage system 120 or an intermediate-layer logical address. The CPU 123 then reads the user data at that actual address on the hard disk 134 through the back-end interface 126 and returns it to the application server 100 through the front-end interface 125.
  • The mapping relationship between the identifier of the user data and its actual address is recorded in the metadata.
  • For example, the mapping relationship may be stored in the form of a key-value pair (KV pair).
  • As shown in FIG. 1, a metadata file including metadata is stored in the memory 124.
  • The identifiers of the user data (identifiers 1-5 in the figure) serve as the keys of the key-value pairs, and the actual addresses of the data (addresses 1-5 in the figure) serve as the values, so as to establish the mapping relationship between the identifier of the user data and the actual address of the data in the form of key-value pairs.
  • The content of the key-value pairs in the metadata is shown in the form of a list in FIG. 1 only as an example; in practical applications, the key-value pairs may also be stored in other forms (such as a tree structure).
  • The storage form is not limited in this application.
  • Key-value pair storage is a representative form of non-relational database: it abandons the strict field structure of data tables in relational databases and the relationship constraints between tables.
  • Data stored as key-value pairs adopts a simplified data model, which gives key-value storage the following advantages. First, high scalability: because there is no strict table field structure and no inter-table relationships, key-value stores can easily be deployed across multiple servers in distributed applications, improving the scalability of the whole system and making it more convenient and flexible. Second, mass storage and high throughput, meeting the needs of cloud computing: key-value storage satisfies users' flexible scalability requirements in cloud computing environments. Key-value storage is therefore increasingly becoming a mainstream storage method.
  • When storing key-value pairs, structured forms such as a binary search tree, a balanced tree (B-tree), a B+ tree, or a log-structured merge tree (LSM tree) are usually used to store the data.
  • The LSM tree is one of the commonly used storage structures in log-based database systems, and it is a multi-layer framework.
  • The LSM tree is mainly stored in memory.
  • In some cases, the metadata in all or some nodes of the LSM tree can also be temporarily stored on the hard disk; when the metadata in these nodes needs to be read, it is copied into memory.
  • The hard disk 134 includes a sorted string table area 1341 for storing sorted string tables (SSTables) and a data storage area 1342 for storing user data, where the sorted string table area 1341 and the data storage area 1342 are generally logically divided storage areas in the hard disk 134.
  • When the storage system 120 receives a write request for writing user data X into the storage system, as shown in FIG. 2, the metadata of user data X (i.e., the corresponding key-value pair) is written into the memory table (memtable) 1241 in order. In addition, although not shown in the figure, in practical applications, after receiving the write request for data X, the storage system 120 can also record this write operation in a log file through write-ahead logging (WAL) for failure recovery.
  • The memory table 1241 is located in the memory 124; when it exceeds a certain threshold, it is frozen in memory and switched to an immutable memtable 1242. At this point, in order not to block write operations, a new memory table is generated in the memory of the storage system 120 to continue providing service. Afterwards, the immutable memtable 1242 is written into the sorted string table area 1341 in batches.
  • the ordered string table area is located on one or more hard disks 134 .
  • The sorted string table area 1341 includes a multi-storage-layer structure, such as the L0, L1, and L2 layers shown in FIG. 2.
  • Each storage layer may include one or more SSTables, and the SSTables may further be stored in the form of structured data.
  • When the immutable memtable 1242 is written into the sorted string table area 1341, it is first written into the top storage layer, such as the L0 layer in FIG. 2.
  • When the amount of data in the L0 layer reaches a threshold, the SSTables in the L0 layer are merged into the L1 layer; when the amount of data in the L1 layer reaches a threshold, the SSTables in the L1 layer are merged into the L2 layer, and so on, so that old metadata is continuously deleted and new data is continuously written.
  • Here the LSM tree for storing metadata is introduced with three storage layers as an example; it can be understood that in practical applications the LSM tree may consist of more or fewer storage layers, which is not limited.
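The layer-by-layer merging can be sketched roughly as follows (a toy model only: real SSTables are immutable sorted files on disk, and the thresholds, dict-based tables, and "newest table wins" tie-break here are all assumptions):

```python
LAYER_THRESHOLDS = [4, 16, 64]  # hypothetical max entries per layer

def compact(layers, level=0):
    """If a layer holds more entries than its threshold, merge all of its
    tables into the next layer, newer values overwriting older ones, then
    check whether the next layer now needs compaction too."""
    if level + 1 >= len(layers):
        return
    if sum(len(t) for t in layers[level]) <= LAYER_THRESHOLDS[level]:
        return
    merged = {}
    # Older data first (the lower layer), then the newer tables in write
    # order, so dict.update keeps the most recent value for duplicate keys.
    for table in layers[level + 1] + layers[level]:
        merged.update(table)
    layers[level + 1] = [dict(sorted(merged.items()))]
    layers[level] = []
    compact(layers, level + 1)

# L0 holds 5 entries (> 4), so it is merged down into L1:
layers = [[{"k1": "a", "k2": "b"}, {"k2": "c", "k3": "d"}, {"k4": "e"}],
          [{"k5": "f"}],
          []]
compact(layers)
assert layers[0] == []
assert layers[1][0]["k2"] == "c"  # the newer value for k2 survived the merge
```

This is how old metadata is "continuously deleted": a stale value for a key simply loses to the newer value during the merge.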
  • When the storage system 120 receives a read request for reading user data X, it first searches for the metadata of user data X in the memory table 1241; if found, it accesses the data storage area 1342 according to the actual address in the metadata.
  • If not found, it searches downward in order: first in the immutable memtable 1242; if it determines there is no metadata of user data X there, it searches the L0 layer; if there is none in the L0 layer, it searches the L1 layer, and so on, until the metadata of user data X is found, after which it accesses the data storage area 1342 according to the actual address in the metadata.
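That read path can be summarized as a simple ordered search (a sketch with hypothetical names; a real SSTable lookup would use index blocks and Bloom filters rather than Python dicts):

```python
def lookup(key, memtable, immutable_memtable, storage_layers):
    """Search the memtable, then the immutable memtable, then each
    storage layer from L0 downward; return the first hit (the actual
    address recorded in the metadata), or None if the key is absent."""
    for source in (memtable, immutable_memtable, *storage_layers):
        if key in source:
            return source[key]
    return None

layers = [{"x1": 0x100},               # L0 (newest on-disk layer)
          {"x1": 0x90, "x2": 0x200}]   # L1 (holds an older value for x1)
assert lookup("x1", {}, {}, layers) == 0x100  # L0 shadows the stale L1 entry
assert lookup("x2", {}, {}, layers) == 0x200
assert lookup("x9", {}, {}, layers) is None
```

Searching top-down is what guarantees that the newest version of a key always wins, even before compaction has removed the stale copies below.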
  • In the metadata, the keyword in the key-value pair is the identifier of the user data, and the value in the key-value pair is the actual address at which the user data is stored.
  • For example, the keyword may include one or more of a logical unit number (LUN ID), a snapshot number, and a logical block address (LBA).
  • The keyword may also be one or more of a version number, a file name and an offset within the file, or a hash value of the file name and the offset within the file.
  • In this application, the address corresponding to the user data identifier in the metadata is called the "actual address", which does not restrict the type of the address.
  • The identifier and the actual address of user data can be understood as relative concepts: compared with the actual address, the identifier is closer to the upper-layer application, while the actual address is closer to the underlying hardware.
  • the actual address of the user data can be the address in the logical chunk group (chunk group) corresponding to the user data in the LUN.
  • the actual address of the user data can include the chunk group ID of the chunk group where the user data is located and the user
  • the offset of the data in the chunk group, or the actual address of the user data can be the address of the user data in the physical block (chunk) in the hard disk.
• the actual address of the user data can include the physical chunk ID of the physical chunk where the user data is located and the offset of the user data within that physical chunk.
• the actual address of the user data can also be the physical address where the user data is stored in the hard disk; assuming the hard disk is a solid-state drive, the physical address is the block ID and page ID where the user data is located in the solid-state drive. This application may not limit the specific forms of the identifier and the actual address of the user data. For convenience of description, this embodiment is described by taking the case where the value in the key-value pair is the physical address of the user data as an example.
• the size of the key-value pair depends on the number of bytes of the LUN ID, LBA, version number, or physical address; in this case, the size of the key-value pair is generally 24 to 32 bytes. Calculated with a data block size of 8K, the storage capacity occupied by metadata is about 0.3% of the storage capacity occupied by the user data it describes.
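The overhead figure above can be checked with a short calculation (a hedged illustration; the 8K block size and 24-32-byte pair size follow the example above):

```python
# Rough metadata-overhead estimate: one 24-32 byte key-value pair
# of metadata is kept per 8 KiB (8192-byte) block of user data.
BLOCK_SIZE = 8 * 1024

def overhead_percent(kv_bytes: int, block_bytes: int = BLOCK_SIZE) -> float:
    """Metadata size as a percentage of the user data it describes."""
    return kv_bytes / block_bytes * 100

# A 24-byte pair costs ~0.29% and a 32-byte pair ~0.39% -- i.e. "about 0.3%".
print(f"{overhead_percent(24):.2f}%")
print(f"{overhead_percent(32):.2f}%")
```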
  • a prefix compression method may be used to compress the keys in the key-value pairs.
• the key parts of multiple pieces of metadata (such as the metadata in one storage layer of an LSM tree, or in one or more SSTables of a storage layer) often contain a common part (such as a LUN ID or snap ID), which can be extracted as a shared prefix.
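As an illustrative sketch of this idea (not the patent's exact on-disk format), a shared prefix such as the LUN ID/snap ID portion can be stored once, with only the per-key suffixes kept individually; the key layout below is an assumption:

```python
import os

def prefix_compress(keys):
    """Split keys into one shared prefix plus per-key suffixes.

    Keys in one SSTable often share a common part (e.g. LUN ID / snap ID),
    so storing the prefix once shrinks the key storage.
    """
    prefix = os.path.commonprefix(keys)
    return prefix, [k[len(prefix):] for k in keys]

def prefix_decompress(prefix, suffixes):
    return [prefix + s for s in suffixes]

keys = ["lun01:snap02:lba0001", "lun01:snap02:lba0007", "lun01:snap02:lba0042"]
prefix, suffixes = prefix_compress(keys)
assert prefix == "lun01:snap02:lba00"          # shared part stored once
assert prefix_decompress(prefix, suffixes) == keys
```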
• this application considers that in related technologies, only the key field in the key-value pair is compressed, because the value field is generally the actual address or the fingerprint of the user data: the actual address is related to the space allocated when the user data is written, so it usually has no regularity; and the fingerprint of the user data depends on the content of the user data itself, which is also difficult to regularize. It is therefore difficult to compress the content of the value field.
  • the embodiment of the present application provides a technical solution for metadata compression.
• although the values used to indicate the actual addresses of user data contained in multiple pieces of metadata have no regularity, the user data corresponding to the plurality of metadata can be processed, for example migrated to multiple continuous actual addresses. The values in the metadata of these user data then become regular, thereby facilitating compression of the values.
  • the embodiment of the present application provides a metadata compression method.
  • the method may be implemented by the engine 121 in the storage system 120 in FIG. 1 , specifically, by a controller in the engine 121 .
  • the central processing unit 123 invokes the program instructions in the memory for execution.
  • the memory may be the memory 124 in FIG. 1 , or a cache located in the central processing unit 123 .
  • the method may also be implemented by other hardware devices used to manage metadata in the storage system.
  • the storage system is a distributed storage system
  • the method may be implemented by a storage server capable of managing metadata in the distributed storage system or by some hardware in the storage server.
  • the metadata compression method is introduced below: as shown in FIG. 3 , in the process of writing data to the storage system, the method includes:
  • the write data request carries user data requested to be written into the storage system.
  • metadata includes key-value pairs.
• the key of the key-value pair is used to indicate the identifier of the user data, and the value is used to indicate the actual address where the user data is stored.
• for brevity, the keywords in the key-value pairs included in the above metadata are referred to as "keywords in the metadata", and the values in the key-value pairs included in the above metadata are referred to as "values in the metadata"; unless otherwise specified, these terms are understood as above and will not be repeated below.
• as the engine 121 continues to receive write data requests, more and more user data is stored in the memory 124.
• the engine 121 writes the user data in the memory 124 into the hard disk 134 for persistent storage.
  • the address where the user data is stored in the hard disk 134 is the actual address where the user data is stored in this embodiment.
  • the write data request received by the engine 121 not only carries the user data, but also includes the logical address of the user data.
  • the logical address is an address presented to the application server 100, and is used to enable the application server 100 to access the user data.
• the engine 121 may use the logical address or the hash value corresponding to the logical address as the keyword in the metadata, use the actual address where the user data is stored in the hard disk 134 as the value in the metadata, and save the correspondence between the keyword and the value as a key-value pair.
  • the engine 121 switches the content in the memory table 1241 to an unmodifiable memory table 1242 and generates a new memory table.
  • switching the content in the memory table 1241 to the non-modifiable memory table 1242 may mean that the engine 121 modifies the attributes of the memory table 1241 so that the modified memory table (that is, the non-modifiable memory table) no longer receives new data.
  • the threshold in S304 may be the same as or different from the threshold in S303.
• the engine 121 may transfer the content in the unmodifiable memory table 1242 to the top layer L0 of the ordered string table area 1341. Then, when the amount of data in the L0 layer exceeds the threshold, the metadata in the L0 layer is merged into the L1 layer.
  • the L2 layer includes two metadata files, which are respectively used to store metadata corresponding to cold data and metadata corresponding to hot data.
  • the metadata file used to store metadata corresponding to cold data is called a cold metadata file
  • the metadata file used to store metadata corresponding to hot data is called a hot metadata file.
  • the two metadata files in the L2 layer may respectively include different ordered character string tables in the L2 layer.
  • each metadata file can be stored in a tree structure to facilitate data search.
• the hotness or coldness of user data can be understood as how likely the user data is to be modified (this likelihood can be reflected by parameters such as the historical modification frequency or the historical number of modifications of the user data).
  • cold data may be understood as user data whose possibility of being modified is lower than a certain threshold
  • hot data may be understood as user data whose possibility of being modified is higher than a certain threshold.
• each ordered string table stores metadata of a different keyword range; the ordered string table 1 stores metadata whose key range is key1-key20 (for example, the metadata corresponding to key1, key3, key4, key7, key12, key15, key16, key18, key19, and key20 are currently stored in the ordered string table 1 in Figure 4A).
• the ordered string table 2 stores metadata whose key range is key21-key40
  • ordered string table 3 stores metadata whose key range is key41-key60
• the ordered string table 4 stores metadata whose key range is key61-key80, and the ordered string table 5 stores metadata whose key range is key81-key100.
• the L1 layer is illustrated by taking as an example that it includes 5 ordered string tables, each of which stores metadata for a range of 20 keywords.
• this example does not limit the number of ordered string tables included in each storage layer of the LSM tree, the keyword range corresponding to each ordered string table, or the number of metadata entries included in each ordered string table.
  • S305 may specifically include:
• the keywords on the leftmost path of the binary tree corresponding to each ordered string table can be traversed sequentially by post-order traversal and merge sorting, so as to traverse and read each keyword in the ordered string tables.
  • S3053 is executed for each keyword.
• the hotness or coldness of the user data may be judged according to the IO type corresponding to the user data. Specifically, considering that under normal circumstances user data written with sequential write IO is colder and user data written with random write IO is hotter, user data using sequential write IO can be treated as cold data, and user data using random write IO can be treated as hot data.
  • the cold metadata file can be divided into multiple sub-files, wherein each sub-file can be an ordered string table, and each ordered string table stores metadata of different key ranges respectively.
• the cold metadata file in the L2 layer includes 10 SSTables (ordered string table 6 to ordered string table 15), which are used to store metadata for the key ranges key1-key10, key11-key20, ..., key91-key100, respectively.
  • the hot metadata file can also be divided into multiple subfiles, where each subfile can be an SSTable, and each SSTable stores metadata of different keyword ranges.
• the hot metadata file in the L2 layer includes 10 SSTables (ordered string table 16 to ordered string table 25), which are used to store metadata for the key ranges key1-key10, key11-key20, ..., key91-key100, respectively.
  • the method provided in this embodiment also includes:
• the L2 layer may include 10 sub-files (ordered string table 6 to ordered string table 15), which are used to store metadata for the keyword ranges key1-key10, key11-key20, ..., key91-key100, respectively. Furthermore, subsequent steps may be performed on all 10 sub-files to compress them, or on some of the 10 sub-files to compress those sub-files.
  • the address discrete degree corresponding to the sub-file can be understood as the discrete degree of the actual address of the user data corresponding to the metadata included in the sub-file.
  • the sub-file includes metadata of n pieces of user data, and the n pieces of user data are respectively stored in n actual addresses, where n is a positive integer. Then, the higher the dispersion degree of the n actual addresses, the higher the dispersion degree of the address corresponding to the sub-file; the lower the dispersion degree of the n actual addresses, the lower the dispersion degree of the address corresponding to the sub-file.
• the degree of discreteness of the n actual addresses can be reflected by how many rules are needed to express the n actual addresses.
• in a first case, m actual addresses are continuous (where m is a positive integer less than n) and the other n-m actual addresses are continuous; that is, the n actual addresses can be expressed with two rules.
• in a second case, q actual addresses are continuous (where q is a positive integer less than n) and the n actual addresses as a whole are expressed with four rules. The address dispersion of the sub-file in the first case is then smaller than the address dispersion of the sub-file in the second case.
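One hypothetical way to quantify this "number of rules" is to count the contiguous runs among the sorted actual addresses; fewer runs means lower address dispersion (the metric and names below are assumptions for illustration, not the patent's definition):

```python
def contiguous_runs(addresses):
    """Count the contiguous runs in a list of block addresses.

    Fully contiguous addresses form 1 run (lowest dispersion), while
    fully scattered addresses form one run each (highest dispersion).
    """
    addrs = sorted(addresses)
    runs = 1 if addrs else 0
    for prev, cur in zip(addrs, addrs[1:]):
        if cur != prev + 1:   # a gap starts a new run / "rule"
            runs += 1
    return runs

assert contiguous_runs([11, 12, 13, 14, 15]) == 1   # first case: one rule
assert contiguous_runs([1, 3, 4, 8, 10]) == 4       # scattered: four rules
```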
• in one implementation, the user data corresponding to the metadata of the sub-file can be stored in one continuous storage space by migrating the user data; in another implementation, the user data corresponding to the metadata of the sub-file can also be stored in segments into multiple pieces of continuous storage space.
  • the above S308 may specifically include: migrating the user data corresponding to the metadata in the sub-file, so that the user data corresponding to the metadata in the sub-file is stored in a continuous storage space.
  • the user data corresponding to the sub-file may be migrated to a continuous free storage space.
  • the sub-file includes metadata of 5 user data, as shown in FIG. 5A , the 5 user data are respectively stored in address 1, address 3, address 4, address 8 and address 10. Then, by reading the data in these 5 addresses respectively and migrating the data in these 5 addresses to the unused address 11-address 15 respectively, the storage spaces of these 5 user data are continuous.
• part of the user data corresponding to the sub-file can be migrated to storage space continuous with other user data corresponding to the sub-file, so that the user data corresponding to the metadata in the sub-file is stored in one continuous storage space.
  • the sub-file includes metadata of 5 user data, as shown in FIG. 5B , the 5 user data are stored in address 1, address 3, address 4, address 8 and address 10 respectively. Then, by reading the data at address 8 and address 10 respectively, and migrating the data at address 8 and address 10 to address 2 and address 5 respectively, the storage spaces of these five user data are continuous.
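The migration in the example of FIG. 5A (5 pieces of user data at addresses 1, 3, 4, 8 and 10 moved to the free addresses 11-15) can be sketched as follows, modeling the block store as a simple dict; the names and structures are illustrative, not the patent's implementation:

```python
def migrate_to_contiguous(store, metadata, free_base):
    """Move each piece of user data to consecutive addresses starting at
    free_base, and update the value (actual address) in its metadata.

    `store` maps address -> data; `metadata` maps key -> actual address.
    """
    for i, (key, old_addr) in enumerate(sorted(metadata.items())):
        new_addr = free_base + i
        store[new_addr] = store.pop(old_addr)  # read old address, write new one
        metadata[key] = new_addr               # update the value in the metadata

# 5 pieces of user data scattered at addresses 1, 3, 4, 8 and 10
store = {1: "A", 3: "B", 4: "C", 8: "D", 10: "E"}
meta = {"key1": 1, "key2": 3, "key3": 4, "key4": 8, "key5": 10}
migrate_to_contiguous(store, meta, free_base=11)
assert sorted(store) == [11, 12, 13, 14, 15]   # storage is now contiguous
assert meta == {"key1": 11, "key2": 12, "key3": 13, "key4": 14, "key5": 15}
```

After the migration, the five values are consecutive, which is exactly the regularity the later compression step relies on.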
  • the above S308 may specifically include: migrating the user data corresponding to the metadata in the sub-file, so that the user data corresponding to the metadata of the sub-file is stored in segments into multiple consecutive storage spaces .
  • the user data corresponding to the sub-files may be migrated to multiple consecutive segments of storage space.
• the sub-file includes metadata of 10 pieces of user data, which are stored in 10 discrete addresses including address 16, address 17 and address 20. The data in these 10 addresses are then read respectively and migrated to the unused addresses 21-25 and addresses 32-36, so that among the 10 pieces of user data:
  • the storage space of user data 1-user data 5 is continuous
  • the storage space of user data 6-user data 10 is continuous.
• some of the user data corresponding to the sub-file can be migrated to storage space continuous with other user data corresponding to the sub-file, so that the user data corresponding to the metadata of the sub-file is stored in segments into multiple pieces of continuous storage space.
• the sub-file includes metadata of 10 pieces of user data, which are stored in 10 discrete addresses including address 16, address 17 and address 20. By migrating user data 2-user data 5 to the storage space continuous with user data 1, and user data 7-user data 10 to the storage space continuous with user data 6, among these 10 pieces of user data:
  • the storage space of data 1-user data 5 is continuous
  • the storage space of user data 6-user data 10 is continuous.
  • the migrated actual address may be used as a value in the metadata of the migrated user data, and updated in the metadata of the user data.
• when the actual address of the user data is irregular, the actual address itself is usually stored as the value in the metadata.
• after the user data is stored in consecutive actual addresses, the change law of these continuous actual addresses can be summarized to generate a compression algorithm reflecting this law; for example, the compression algorithm may be a first-order (linear) function.
• the law of the actual addresses of the user data can be summarized by using machine learning to generate the first-order function.
• only the compression algorithm and the compressed value corresponding to the actual address need to be persistently stored; the actual address of the user data itself does not need to be stored.
• when the actual address of the user data needs to be read, the compressed value corresponding to the actual address is used as an input of the compression algorithm, and the compression algorithm outputs the actual address of the user data.
• the actual addresses of the five pieces of user data can be expressed as 0x0000000100000000, 0x0000000100000001, 0x0000000100000002, 0x0000000100000003, and 0x0000000100000004.
  • "0x" in the actual address represents a hexadecimal number
  • "00000001" in the middle represents the disk ID
  • the last 8 digits represent the physical block number in the disk.
• start key: 0000000100000000
• number of data entries: 5
• compression algorithm 1 (indicating that the compression algorithm is a first-order function with a slope of 1)
• the compressed values of the values in the metadata entries: 0, 1, 2, 3, 4
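The record sketched above (start key, number of entries, a slope-1 first-order function, and per-entry compressed values 0-4) can be illustrated as follows; this is a sketch under the assumptions of this example, not the patent's exact encoding:

```python
START = 0x0000000100000000  # start key / first actual address
SLOPE = 1                   # compression algorithm 1: first-order function, slope 1

def compress(addresses, start=START, slope=SLOPE):
    """Store only the per-entry offsets under addr = start + slope * offset."""
    offsets = []
    for addr in addresses:
        offset, rem = divmod(addr - start, slope)
        assert rem == 0, "address does not fit the linear rule"
        offsets.append(offset)
    return offsets

def decompress(offsets, start=START, slope=SLOPE):
    """Recover the actual addresses from the compressed values."""
    return [start + slope * off for off in offsets]

addrs = [0x0000000100000000 + i for i in range(5)]
assert compress(addrs) == [0, 1, 2, 3, 4]       # compressed values 0..4
assert decompress([0, 1, 2, 3, 4]) == addrs     # actual addresses recovered
```

Each 8-byte actual address thus shrinks to a small offset, with the start key and the algorithm recorded once per group.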
  • a prefix compression method may also be used to compress the values in the metadata by extracting a common prefix.
  • the embodiment of the present application may not limit the compression method adopted for the value in the metadata.
  • the method further includes:
• a prefix compression method can be used to extract a common prefix (such as a volume number (LUN ID), a snapshot identifier (snap ID), etc.) from the keywords in the metadata, and the keywords are then compressed.
• a first-order function can also be used to represent the relationship between the offset and the keyword subscript, so that the offset part of a keyword can be calculated directly from the keyword subscript and the function; that is, only the coefficients and order of the function need to be recorded, thereby realizing compression of the keywords.
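A minimal sketch of this idea, assuming the offset parts of consecutive keys follow offset(i) = a·i + b over subscripts i = 0..n-1 (the function form and names are illustrative):

```python
def fit_offsets(offsets):
    """Try to fit offset(i) = a * i + b over key subscripts i = 0..n-1.

    Assumes at least two offsets. If every offset matches the line, only
    the coefficients (a, b) need to be recorded instead of each offset.
    """
    a = offsets[1] - offsets[0]
    b = offsets[0]
    if all(off == a * i + b for i, off in enumerate(offsets)):
        return a, b
    return None  # offsets are irregular; fall back to storing them directly

# LBA offsets 100, 104, 108, 112 behind a shared prefix: offset(i) = 4*i + 100
coeffs = fit_offsets([100, 104, 108, 112])
assert coeffs == (4, 100)                      # record only two numbers
assert [4 * i + 100 for i in range(4)] == [100, 104, 108, 112]
```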
• the keywords and values in the metadata in this embodiment can be stored in two different, logically or physically divided data entries, which can be referred to as the keyword data entry and the value data entry.
  • the two different data entries may be two data entries that can respectively write data through independent write operations.
  • the above key data entry and value data entry can be understood as two different physical chunks.
• when the value data entry is modified (for example, after the value in the metadata is compressed, the compressed value of the value in the metadata is updated into the value data entry), the keyword data entry is not affected; that is, writing the value data entry does not affect the keyword data entry.
• the case where the keyword data entry and the value data entry are different physical chunks is taken as an example to illustrate the two data entries to which the keyword data entry and the value data entry belong.
  • the two data entries to which the keyword data entry and the value data entry belong may also be storage space units of other granularity, which may not be limited in this embodiment.
  • the keywords and values in the metadata included in the subfile are respectively compressed by S310 and S311, as shown in FIG. 7 , the keywords in the metadata included in the subfile (Key1 -Key10) and values (Value1-Value10 in the figure), are stored in the key data entry and the value data entry respectively.
  • the keyword data entry is used to store the compressed value of the keyword in the metadata in the subfile
  • the value data entry is used to store The compressed value of the value in the metadata in the subfile.
• the keyword data entry and the value data entry each include a header part (header) and a content part (value).
  • the content part of the keyword data entry is used to record the compressed value of the keyword;
  • the header part of the keyword data entry is used to record the n kinds of compression algorithms used to compress the keyword, and which keywords in the content part each compression algorithm applies to .
• the content part in the keyword data entry can be organized in a tree structure, such as a B+ tree (balanced plus tree) or an adaptive radix tree (ART).
  • a value data entry includes a header part and a content part.
  • the content part is used to record the compressed value of the actual address in the storage space with different serial numbers;
  • the header part is used to record the m compression algorithms used to compress the actual address, and which storage spaces in the content part each compression algorithm is applicable to.
  • the content of the key data entry includes: key_1' to key_10', key_1' to key_10' are compressed values of keys key1-key10 of 10 user data.
• the header part of the keyword data entry records two compression algorithms, compression algorithm 11 and compression algorithm 12, and the ranges of keywords to which they respectively apply (that is, compression algorithm 11 corresponds to key1-key5 and compression algorithm 12 corresponds to key6-key10).
  • the content part of the value data entry includes 10 storage spaces with serial numbers V1-V10, which respectively store the compressed values of the actual addresses of 10 user data.
  • the header part of the value data entry records three compression algorithms: compression algorithm 13, compression algorithm 14, and compression algorithm 15, and the storage spaces (ie, V1-V3, V4-V7, and V8-V10) respectively applicable to the three compression algorithms.
  • the compressed values of the keywords in the content part of the keyword data item respectively point to different storage spaces of the content part of the value data item.
  • key_1' in the key data entry points to storage space V1 in the value data entry
  • key_2' points to storage space V2 in the value data entry
• by reading the header part of the keyword data entry, it can be known that the compression algorithm corresponding to key1 is compression algorithm 11; after that, the compressed value key_1' corresponding to key1 is obtained according to compression algorithm 11 and key1; then the storage space V1 in the value data entry is determined according to key_1'; then value1_offset in the storage space V1 is read from the value data entry; then, by reading the header part of the value data entry, it can be known that the compression algorithm corresponding to the storage space V1 is compression algorithm 13; after that, the actual address of the user data is obtained according to compression algorithm 13 and value1_offset, and the data at the actual address is read to complete the access to the user data.
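The lookup path through the two data entries can be sketched as follows, reusing this example's algorithm numbers (11/12 for keys, 13-15 for values); modeling each header as a range-to-algorithm table is an assumption for illustration:

```python
# Header: which compression algorithm applies to which range of entry indices.
KEY_HEADER = {range(0, 5): 11, range(5, 10): 12}     # key1-key5, key6-key10
VALUE_HEADER = {range(0, 3): 13, range(3, 7): 14, range(7, 10): 15}  # V1-V3, V4-V7, V8-V10

def algo_for(header, index):
    """Look up the compression algorithm recorded for a given entry index."""
    for rng, algo in header.items():
        if index in rng:
            return algo
    raise KeyError(index)

# Reading key1 (index 0): its keyword is decoded with algorithm 11, and the
# storage space V1 it points to in the value data entry with algorithm 13.
assert algo_for(KEY_HEADER, 0) == 11
assert algo_for(VALUE_HEADER, 0) == 13
# key8 (index 7) falls under key algorithm 12 and value algorithm 15.
assert algo_for(KEY_HEADER, 7) == 12
assert algo_for(VALUE_HEADER, 7) == 15
```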
  • keywords and values in multiple metadata are stored separately.
• when the metadata value needs to be compressed, it is only necessary to update the compressed value obtained after compression into the original storage space in the corresponding value data entry, without modifying the keyword data entry, which improves compression efficiency.
  • the keywords of the metadata do not need to be compressed
  • only the keywords in the metadata may be stored in the keyword data entry as shown in FIG. 9 ;
• the value data entry is still stored in a manner similar to the above design. In this way, when the metadata value needs to be recompressed, it is only necessary to update the recompressed value into the original storage space in the corresponding value data entry, without modifying the keyword data entry, which improves compression efficiency.
  • the method further includes:
  • the data change amount of the cold metadata file may specifically refer to the change amount of user data corresponding to the metadata in the cold metadata file within a preset time period.
  • multiple pieces of metadata in a cold metadata file are referred to as a metadata set.
• the preset time period may be the period from the last processing of the user data corresponding to the metadata in the cold metadata file to the current moment; as another example, the preset time period may also be a preset fixed duration.
  • the method for setting the length of the preset time period may be set according to actual needs, and this application may not limit it.
  • the change amount of the user data in the cold metadata file may be the number of changes of the user data in the cold metadata file, or the data volume of the changed user data in the cold metadata file. In practical applications, technicians may use appropriate parameters to reflect the variation of user data in the cold metadata file according to actual requirements, and this application may not limit this.
  • the method further includes: compressing the metadata in the hot metadata file.
  • prefix compression or slope compression is used for the metadata keywords in the hot metadata file; in addition, when the metadata value in the hot metadata file has no regularity, the metadata value may not be compressed.
• if the value in the metadata of a certain piece of user data (the user data being hot data) has just been compressed by the method of S308-S310 above, and the actual address of the user data then changes because the user data is modified, the work of compressing the value in the metadata of that user data becomes meaningless.
• the metadata is first divided into cold metadata files and hot metadata files (that is, S304 above); then, on the one hand, the values of the metadata in the cold metadata files are compressed according to the process of S308-S310, and on the other hand, the values of the metadata in the hot metadata files may not be compressed. In this way, the efficiency of metadata compression can be improved.
• cold data and hot data may also not be distinguished, that is, S304 above is not executed; instead, the metadata in the L2 layer is taken as a whole, processes such as user data migration are applied to all or part of the metadata in this whole so that the values of the metadata conform to the set rule, and then all or part of the values of the metadata are compressed.
• the above mainly introduces the metadata compression method provided by the present application by taking as an example the scenario where the metadata of the L1 layer in the LSM tree is merged into the L2 layer and the metadata compression process is performed at the L2 layer.
• this method can also be applied to compress metadata of other data structures; for example, it can be used to compress metadata in other storage layers of the LSM tree, or to compress metadata of data structures other than the LSM tree.
• the user data corresponding to the metadata is mainly migrated so that the user data is stored in a continuous storage space, and the actual addresses of the multiple pieces of user data therefore exhibit regularity, in order to compress the values in the metadata of these user data.
• the actual addresses of multiple pieces of user data may also be made to exhibit regularity without migrating the user data.
• the actual addresses of the storage spaces where the user data are stored can be modified so that the actual addresses of multiple pieces of user data exhibit regularity, that is, the values in the metadata of the multiple pieces of user data conform to the set rule, and the values in the metadata of these user data are thus compressed.
  • the mapping relationship between the actual address of the user data before modification and the underlying physical address is updated to the mapping relationship between the actual address of the user data after modification and the underlying physical address.
  • the value in the metadata refers to the chunk group ID of the storage space where the user data resides and the offset in the chunk group.
• the chunk group IDs and the offsets within the chunk groups of the storage spaces of multiple pieces of user data can be made regular, for example continuous, so that the values in the metadata of these user data can be compressed.
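A toy sketch of this no-migration alternative: the (chunk group ID, offset) actual addresses are rewritten to be consecutive while a mapping table keeps track of the unchanged underlying physical locations (all names and structures here are assumptions for illustration):

```python
def regularize_addresses(metadata, phys_map, group_id, base_offset=0):
    """Give each piece of user data a new, consecutive actual address
    (chunk group ID, offset) without moving the data itself; only the
    mapping from actual address to physical address is updated.
    """
    for i, (key, old_actual) in enumerate(sorted(metadata.items())):
        new_actual = (group_id, base_offset + i)
        phys_map[new_actual] = phys_map.pop(old_actual)  # remap, no data copy
        metadata[key] = new_actual

meta = {"key1": (7, 40), "key2": (2, 13), "key3": (9, 5)}
phys = {(7, 40): "disk0:blk88", (2, 13): "disk1:blk3", (9, 5): "disk0:blk12"}
regularize_addresses(meta, phys, group_id=3)
assert meta == {"key1": (3, 0), "key2": (3, 1), "key3": (3, 2)}  # now regular
assert phys[(3, 0)] == "disk0:blk88"   # same physical location as before
```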
  • this embodiment also provides a metadata compression device, which can be used to perform some or all of the steps in the above-mentioned metadata compression method of this embodiment.
  • the metadata compression device includes hardware structures and/or software modules corresponding to each function.
  • the technical solutions provided in this embodiment can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is executed by hardware or computer software drives the hardware depends on the specific application scenario and design constraints of the technical solution.
  • the metadata compression apparatus may be located in a hardware device used to manage metadata in the storage system.
• for example, the metadata compression device is located in the engine of a centralized storage system.
  • the metadata compression device is located in a storage server with a metadata management function in the distributed storage system.
  • FIG. 10 is a schematic structural diagram of a metadata compression device provided by the present application.
  • the metadata compression device 40 includes an acquisition unit 401 , a processing unit 402 and a compression unit 403 .
  • the metadata compression device is used to realize the functions of some or all steps in the method described above in FIG. 3 .
  • the acquiring unit 401 is configured to execute one or more items of S301 and S306 in FIG. 3 .
  • the processing unit 402 is configured to execute one or more items of S302-S305, S307-S309, and S312 in FIG. 3 .
  • the compression unit 403 is configured to execute one or more items of S310 and S311 in FIG. 3 .
  • FIG. 11 is a schematic structural diagram of a chip provided by the present application.
  • the chip 50 is used to implement the metadata compression method provided in this application.
  • the chip may be a chip used to realize the functions of the controller in the engine 121 .
  • the chip 50 includes:
  • the processor 501 is configured to execute the metadata compression method provided in this application.
• the processor 501 may include a general-purpose central processing unit (CPU) and a memory; the processor 501 may also be a microprocessor, a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), or the like.
  • the chip 50 may further include: a memory 502 .
  • Computer instructions are stored in the memory 502, and the processor 501 executes the computer instructions stored in the memory to execute the metadata compression method provided in this application.
• the memory 502 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM) or another type of dynamic storage device that can store information and instructions.
• the memory 502 may also be an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can carry or store program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
• The chip 50 may further include an interface 503.
• The interface 503 can be used to receive and send data.
• The interface 503 may be a communication interface, a transceiver, or the like.
• The chip 50 may further include a communication line 504.
• The communication line 504 may be a data bus for transferring information between the aforementioned components.
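The relationship between the components of chip 50 described above can be pictured with a toy model. The following Python sketch is purely illustrative: the class, attribute and method names, and the behavior are our assumptions for demonstration, not part of the application; only the reference numerals (501 processor, 502 memory, 503 interface, 504 communication line) mirror the text.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Chip50:
    """Toy model: processor 501 executes instructions held in memory 502;
    interface 503 receives data and places it on communication line 504."""
    memory_502: List[Callable] = field(default_factory=list)  # stored computer instructions
    bus_504: List[Any] = field(default_factory=list)          # data exchanged between components

    def interface_503_receive(self, data: Any) -> None:
        # Interface 503 receives data and puts it on communication line 504.
        self.bus_504.append(data)

    def processor_501_run(self) -> List[Any]:
        # Processor 501 executes the instructions stored in memory 502.
        return [instruction(self.bus_504) for instruction in self.memory_502]


chip = Chip50()
chip.memory_502.append(lambda bus: sum(bus))  # a stand-in "processing" step
chip.interface_503_receive(3)
chip.interface_503_receive(4)
result = chip.processor_501_run()
```

The model only shows the data path (interface → bus → processor reading memory); it makes no claim about how a real chip implements the method.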
• The method steps in the embodiments of the present application may be implemented by hardware, or by a processor executing software instructions.
• The software instructions may consist of corresponding software modules, and the software modules may be stored in a RAM, a flash memory, a ROM, a PROM, an EPROM, an EEPROM, a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art.
• An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium.
• The storage medium may also be a component of the processor.
• The processor and the storage medium may be located in an ASIC, and the ASIC may be located in a network device or a terminal device.
• The processor and the storage medium may also exist in the network device or the terminal device as discrete components.
• All or part of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof.
• When software is used for implementation, the embodiments may be implemented in whole or in part in the form of a computer program product.
• The computer program product comprises one or more computer programs or instructions; when the computer program or instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are executed in whole or in part.
• The computer may be a general-purpose computer, a special-purpose computer, a computer network, a network device, user equipment, or another programmable device.
• The computer program or instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer program or instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless manner.
• The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center that integrates one or more available media.
• The available medium may be a magnetic medium, such as a floppy disk, a hard disk, or a magnetic tape; an optical medium, such as a digital video disc (digital video disc, DVD); or a semiconductor medium, such as a solid-state drive (SSD).
• In this application, “at least one” means one or more, and “multiple” means two or more; other quantifiers are similar.
• “And/or” describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: A exists alone, both A and B exist, or B exists alone.
• The singular forms “a”, “an”, and “the” mean “one or more” rather than “one and only one”, unless the context clearly dictates otherwise; for example, “a device” refers to one or more such devices.
• “At least one of” means one of, or any combination of, the associated objects that follow; for example, “at least one of A, B and C” includes A, B, C, AB, AC, BC, and ABC.
• In running text, the character “/” generally indicates an “or” relationship between the associated objects; in a formula of this application, the character “/” indicates a “division” relationship between the associated objects.
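The definition of “at least one of A, B and C” above can be checked mechanically: it is exactly the set of non-empty combinations of the associated objects. A short Python illustration (the variable names are ours, chosen for the demonstration):

```python
from itertools import combinations

items = ["A", "B", "C"]
# "At least one of A, B and C": any single item or any combination of
# items, i.e. every non-empty subset of {A, B, C}.
at_least_one = [set(combo)
                for r in range(1, len(items) + 1)
                for combo in combinations(items, r)]
# Seven possibilities: A, B, C, AB, AC, BC, ABC.
```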

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application relates to a metadata compression method and apparatus, which belong to the field of storage and are used to reduce the storage resources occupied by metadata. The method comprises: acquiring n pieces of metadata, n being a positive integer greater than 1. A piece of metadata comprises a key-value pair, the key-value pair comprising a key and a value; the key indicates an identifier of the data corresponding to the metadata, and the value indicates the actual address at which the data is stored. Then, m pieces of data corresponding to at least some of the n pieces of metadata are processed to obtain n target values, corresponding to the n pieces of metadata, that conform to a set pattern, m being a positive integer less than or equal to n. The n target values are then compressed.
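A minimal sketch of the idea summarized in the abstract: the values of n pieces of metadata are rewritten so that they conform to a set pattern (here, an arithmetic sequence of block-aligned addresses), after which a general-purpose compressor shrinks them far more effectively than it shrinks the original scattered values. Everything specific in the code (`BLOCK`, the `compact` function, the use of `zlib`) is an illustrative assumption, not taken from the patent.

```python
import random
import struct
import zlib

BLOCK = 4096  # assumed block size of the stored user data
random.seed(0)

# n pieces of metadata: key (a data identifier) -> value (the actual
# address where the data is stored). Scattered addresses stand in for
# metadata that compresses poorly.
n = 1000
metadata = {key: random.randrange(1 << 20) * BLOCK for key in range(n)}


def compact(meta):
    # Relocate the underlying data so that, ordered by key, the n target
    # values follow a set pattern: an arithmetic sequence of block-aligned
    # addresses. Moving the data on disk is out of scope here; only the
    # resulting metadata values are modeled.
    base = min(meta.values())
    return {key: base + i * BLOCK for i, key in enumerate(sorted(meta))}


def pack(values):
    # Serialize the values as 64-bit little-endian integers.
    return b"".join(struct.pack("<Q", v) for v in values)


before = len(zlib.compress(pack(metadata.values())))
after = len(zlib.compress(pack(compact(metadata).values())))
# The patterned target values compress far better than the raw ones.
```

The design point the abstract makes is visible in `before` versus `after`: once the values follow a fixed pattern, almost all of their information content disappears, so the compressed metadata occupies far fewer storage resources.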
PCT/CN2022/077759 2021-06-25 2022-02-24 Procédé et appareil de compression de métadonnées WO2022267508A1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202110710701 2021-06-25
CN202110710701.4 2021-06-25
CN202110944078.9 2021-08-17
CN202110944078.9A CN115525209A 2021-06-25 2021-08-17 Metadata compression method and apparatus

Publications (1)

Publication Number Publication Date
WO2022267508A1 (fr)

Family

ID=84545054

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/077759 WO2022267508A1 (fr) 2021-06-25 2022-02-24 Procédé et appareil de compression de métadonnées

Country Status (1)

Country Link
WO (1) WO2022267508A1 (fr)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106202138A * 2015-06-01 2016-12-07 Samsung Electronics Co., Ltd. Storage device and method for autonomous space compaction
US20190121581A1 (en) * 2015-06-01 2019-04-25 Samsung Electronics Co., Ltd. Storage apparatus and method for autonomous space compaction
US20190188291A1 (en) * 2017-12-15 2019-06-20 Western Digital Technologies, Inc. Utilization of Optimized Ordered Metadata Structure for Container-Based Large-Scale Distributed Storage
CN111309270A * 2020-03-13 2020-06-19 Tsinghua University Persistent-memory key-value storage system
US20200333968A1 (en) * 2019-04-17 2020-10-22 Oath Inc. Method and system for key-value storage
CN112099725A * 2019-06-17 2020-12-18 Huawei Technologies Co., Ltd. Data processing method and apparatus, and computer-readable storage medium
CN112131140A * 2020-09-24 2020-12-25 Beijing Institute of Computer Technology and Application SSD-based key-value separated storage method supporting efficient storage space management

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116909490A * 2023-09-11 2023-10-20 Tencent Technology (Shenzhen) Co., Ltd. Data processing method and apparatus, storage system, and computer-readable storage medium
CN116909490B * 2023-09-11 2024-01-05 Tencent Technology (Shenzhen) Co., Ltd. Data processing method and apparatus, storage system, and computer-readable storage medium

Similar Documents

Publication Publication Date Title
US10761758B2 (en) Data aware deduplication object storage (DADOS)
US20240012714A1 (en) Indirect Dataset Replication To Cloud-Based Targets
US11016955B2 (en) Deduplication index enabling scalability
TWI778157B (zh) 固態驅動器、分散式資料儲存系統和利用鍵值儲存的方法
US10303797B1 (en) Clustering files in deduplication systems
CN106066896B Application-aware big-data deduplication storage system and method
US10210188B2 (en) Multi-tiered data storage in a deduplication system
US11625169B2 (en) Efficient token management in a storage system
CN110908589B Data file processing method, apparatus, system, and storage medium
US11334523B2 (en) Finding storage objects of a snapshot group pointing to a logical page in a logical address space of a storage system
CN113626431A LSM-tree-based key-value separated storage method and system with delayed garbage collection
WO2023246754A1 Data deduplication method and related system
WO2024021488A1 Metadata storage method and apparatus based on a distributed key-value database
US11232043B2 (en) Mapping virtual block addresses to portions of a logical address space that point to the virtual block addresses
CN113535670A Virtualized resource image storage system and implementation method therefor
WO2022262381A1 Data compression method and apparatus
CN115114294A Adaptive method and apparatus for database storage mode, and computer device
WO2022267508A1 Metadata compression method and apparatus
WO2015073712A1 Pruning of server duplication information for efficient caching
CN111274259A Data updating method for storage nodes in a distributed storage system
WO2023050856A1 Data processing method and storage system
CN115438039A Method and apparatus for adjusting a data index structure of a storage system
Liu et al. Smash: Flexible, fast, and resource-efficient placement and lookup of distributed storage
US11360691B2 (en) Garbage collection in a storage system at sub-virtual block granularity level
Klein et al. Dxram: A persistent in-memory storage for billions of small objects

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (ref document number: 22827015; country of ref document: EP; kind code of ref document: A1)
NENP Non-entry into the national phase (ref country code: DE)
122 EP: PCT application non-entry in European phase (ref document number: 22827015; country of ref document: EP; kind code of ref document: A1)