WO2022267508A1 - Metadata compression method and apparatus - Google Patents

Metadata compression method and apparatus

Info

Publication number
WO2022267508A1
Authority
WO
WIPO (PCT)
Prior art keywords
metadata
data
pieces
storage
user data
Prior art date
Application number
PCT/CN2022/077759
Other languages
French (fr)
Chinese (zh)
Inventor
高蒙
潘浩
宋雨恒
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202110944078.9A external-priority patent/CN115525209A/en
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Publication of WO2022267508A1 publication Critical patent/WO2022267508A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers

Definitions

  • the present application relates to the field of storage, in particular to a metadata compression method and device.
  • the present application provides a method and device for compressing metadata, which solves the problem that metadata occupies more storage resources.
  • the present application provides a metadata compression method, which includes: acquiring n pieces of metadata, where n is a positive integer greater than 1.
  • a piece of metadata includes a key-value pair, and the key-value pair includes a keyword and a value.
  • the keyword indicates the identifier of the data corresponding to the metadata, and the value indicates the actual address at which the data is stored.
  • m pieces of data corresponding to at least some of the n pieces of metadata are processed to obtain n target values, corresponding to the n pieces of metadata, that conform to a set rule, where m is a positive integer less than or equal to n. The n target values are then compressed.
  • the n pieces of metadata in this application may specifically be n pieces of metadata stored in the storage system.
  • the n pieces of metadata may include n pieces of metadata in one or more nodes in the tree structure.
  • the n pieces of metadata may include n pieces of metadata in one or more sorted string tables (SSTables) in a storage layer in the LSM tree.
  • for example, processing the m pieces of data may include migrating them to change their actual addresses, thereby obtaining n target values, corresponding to the n pieces of metadata, that conform to the set rule. As another example, the actual addresses of the data may be made regular without migrating the data, and n target values that conform to the set rule can then be obtained.
  • the present application does not limit the rule to which the n target values conform, as long as the target values can be compressed according to that rule.
  • for example, the n target values conforming to the set rule may mean that the n actual addresses indicated by the n target values are continuous. As another example, it may mean that, among the n actual addresses indicated by the n target values, every two adjacent actual addresses are separated by a storage space of the same size. As yet another example, it may mean that, among the n actual addresses indicated by the n target values, the size of the storage space between every two adjacent actual addresses changes regularly; and so on.
  • the n target values are compressed. Therefore, the problem that the value in the metadata cannot be compressed due to irregularity is solved, thereby achieving the effect of reducing storage resources occupied by the metadata.
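  • A minimal Python sketch of the equal-spacing case described above, assuming actual addresses can be modeled as plain integers: when every two adjacent actual addresses are separated by the same amount, the n target values can be represented by a start address, a stride, and a count. The function names and the integer address model are illustrative assumptions and are not taken from the application.

```python
from typing import List, Optional, Tuple


def encode_constant_stride(addresses: List[int]) -> Optional[Tuple[int, int, int]]:
    """Return (start, stride, count) if adjacent addresses are equally spaced.

    Returns None when the addresses follow no constant stride, in which case
    this particular encoding does not apply.
    """
    if len(addresses) < 2:
        return None
    stride = addresses[1] - addresses[0]
    for prev, cur in zip(addresses, addresses[1:]):
        if cur - prev != stride:
            return None
    return addresses[0], stride, len(addresses)


def decode_constant_stride(start: int, stride: int, count: int) -> List[int]:
    """Rebuild the original address list from its compact form."""
    return [start + i * stride for i in range(count)]
```

  • Under this model, continuous addresses are simply the special case in which the stride equals the size of one stored unit.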
  • the n actual addresses indicated by the above n target values conforming to the set rule are continuous.
  • by making the n actual addresses indicated by the n target values continuous, the target values in the metadata become easy to compress. For example, when compressing the n target values, it is possible to record only the first target value, as the start position, together with the offsets between the other target values and the first target value. This has the same effect as recording all n target values and thereby compresses them.
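  • The start-plus-offset recording described above can be sketched as follows. This is only an illustration under the assumption that target values are integers; `compress_target_values` and `decompress_target_values` are hypothetical names, and the application does not prescribe a concrete encoding.

```python
from typing import List, Tuple


def compress_target_values(values: List[int]) -> Tuple[int, List[int]]:
    """Record the first target value plus each later value's offset from it."""
    if not values:
        raise ValueError("at least one target value is required")
    base = values[0]
    return base, [v - base for v in values[1:]]


def decompress_target_values(base: int, offsets: List[int]) -> List[int]:
    """Recover the full target values from the base and the offsets."""
    return [base] + [base + off for off in offsets]
```

  • When the addresses are continuous, the offsets are small integers, so they can be stored in fewer bytes than full addresses.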
  • processing the m pieces of data corresponding to at least some of the n pieces of metadata to obtain n target values, corresponding to the n pieces of metadata, that conform to the set rule includes: migrating the m pieces of data so that the data corresponding to the n pieces of metadata is stored in a storage space with continuous actual addresses, and then saving the actual addresses at which the n pieces of data are stored in the continuous storage space as the n target values.
  • by migrating the m pieces of data, the data corresponding to the n pieces of metadata is stored in a storage space with continuous actual addresses, and the actual addresses at which the n pieces of data are stored in that space are saved as the n target values. The n target values thus become regular (that is, they conform to the set rule), which facilitates their compression.
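  • A toy Python model of this migration step, under stated assumptions: storage is modeled as a dictionary from integer address to data, each piece of data occupies one address, and the target region is free apart from data that already belongs to the n pieces of metadata. The function name and the choice to lay data out in key order are hypothetical.

```python
from typing import Dict, Tuple


def migrate_to_contiguous(
    metadata: Dict[str, int],
    storage: Dict[int, bytes],
    region_start: int,
) -> Tuple[Dict[str, int], int]:
    """Move data so that the n items occupy region_start .. region_start + n - 1.

    metadata maps key -> current actual address; storage maps address -> data.
    Returns the new key -> address mapping (the target values) and m, the
    number of items that actually had to be moved.
    """
    own_addresses = set(metadata.values())
    # Refuse to overwrite data that does not belong to these n items.
    for i in range(len(metadata)):
        addr = region_start + i
        if addr in storage and addr not in own_addresses:
            raise ValueError(f"address {addr} is occupied by other data")

    # Lift every item out first so that moves cannot clobber each other.
    payloads = {key: storage.pop(addr) for key, addr in metadata.items()}
    target_values: Dict[str, int] = {}
    moved = 0
    for i, key in enumerate(sorted(metadata)):
        new_addr = region_start + i
        storage[new_addr] = payloads[key]
        target_values[key] = new_addr
        if metadata[key] != new_addr:
            moved += 1
    return target_values, moved
```

  • In this model, an item that already sits at its target address is not counted as moved, which is why m can be smaller than n.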
  • the method further includes: selecting the n pieces of metadata from the multiple pieces of metadata included in a metadata set according to how hot or cold the data corresponding to those pieces of metadata is.
  • the data corresponding to the n pieces of metadata is cold data.
  • the metadata set may be any set including multiple metadata.
  • for example, the metadata set may be a set of metadata included in the storage layer above the first storage layer.
  • as another example, the metadata set may be a set of multiple pieces of metadata in the first storage layer.
  • the scope of the metadata set in practical applications is not limited.
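  • A minimal sketch of selecting metadata whose data is cold. The application does not say how hotness is measured; here it is assumed, purely for illustration, that an access count per key is available and that data accessed at most `cold_threshold` times counts as cold. All names are hypothetical.

```python
from typing import Dict


def select_cold_metadata(
    metadata_set: Dict[str, int],
    access_counts: Dict[str, int],
    cold_threshold: int,
) -> Dict[str, int]:
    """Keep only metadata whose data was accessed at most cold_threshold times."""
    return {
        key: address
        for key, address in metadata_set.items()
        if access_counts.get(key, 0) <= cold_threshold
    }
```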
  • the n pieces of metadata are metadata in the first storage layer in the LSM tree.
  • the LSM tree is used to store metadata, and the LSM tree includes multiple storage layers, and the multiple storage layers include the above-mentioned first storage layer.
  • the metadata compression method provided in the present application can be applied to any storage layer in the LSM tree, so as to achieve the effect of compressing the metadata value in the storage layer.
  • the above key and value are stored in two data entries respectively.
  • because the keyword and the value are stored in separate data entries, the data entry storing the keyword (called the keyword data entry) need not be affected when the data entry storing the value (called the value data entry) is modified.
  • the method further includes: detecting a data change amount of the metadata set; the metadata set is used to record metadata of multiple pieces of data. Acquiring n pieces of metadata includes: after determining that the amount of data change in the metadata set exceeds a change threshold, acquiring n pieces of metadata included in the metadata set.
  • the data change amount of the metadata set is detected, and once it is determined that the data change amount exceeds the change threshold, the acquisition of n pieces of metadata in the metadata set is triggered, so that the n pieces of metadata can be processed, and their values compressed, according to the method of the present application. In this way, the metadata in the metadata set is compressed after new metadata has been stored in it, which reduces the storage resources occupied by the metadata set.
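  • The threshold trigger can be sketched as follows. This is an assumption-laden illustration: the data change amount is modeled as the number of writes since the last compression pass, and the class and method names are hypothetical.

```python
from typing import Dict


class MetadataSet:
    """Toy metadata set that counts changes since the last compression pass."""

    def __init__(self, change_threshold: int) -> None:
        self.entries: Dict[str, int] = {}
        self.change_threshold = change_threshold
        self.changes = 0

    def put(self, key: str, address: int) -> None:
        """Store or update one piece of metadata and count it as a change."""
        self.entries[key] = address
        self.changes += 1

    def should_compress(self) -> bool:
        """True once the change amount exceeds the change threshold."""
        return self.changes > self.change_threshold

    def acquire(self) -> Dict[str, int]:
        """Hand out the current metadata for compression and reset the counter."""
        self.changes = 0
        return dict(self.entries)
```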
  • the metadata set refers to any set including multiple metadata.
  • the method further includes: acquiring a degree of dispersion of actual addresses of data corresponding to the n pieces of metadata.
  • processing the m pieces of data corresponding to at least some of the n pieces of metadata includes: after determining that the degree of dispersion is greater than a dispersion threshold, processing the m pieces of data corresponding to at least some of the n pieces of metadata.
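  • The application does not define how the degree of dispersion is computed. The sketch below assumes one possible measure, the fraction of adjacent address pairs (after sorting) that are not continuous, and compares it with a threshold. Both the measure and the names are hypothetical.

```python
from typing import List


def dispersion(addresses: List[int]) -> float:
    """Fraction of adjacent (sorted) address pairs that are not continuous."""
    if len(addresses) < 2:
        return 0.0
    ordered = sorted(addresses)
    gaps = sum(1 for a, b in zip(ordered, ordered[1:]) if b - a != 1)
    return gaps / (len(ordered) - 1)


def needs_processing(addresses: List[int], dispersion_threshold: float) -> bool:
    """Process (for example, migrate) the data only when dispersion is high."""
    return dispersion(addresses) > dispersion_threshold
```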
  • the method can be applied to a centralized storage system. Specifically, the method can be executed by an engine in the centralized storage system.
  • the method can be applied to a distributed storage system.
  • the distributed storage system includes multiple storage servers, and the foregoing method may be executed by one or more storage servers among the multiple storage servers.
  • the metadata compression device may be a hardware device for managing metadata in a storage system.
  • the metadata compression device may be an engine in a centralized storage system or a part of hardware devices in an engine, or the metadata compression device may be a storage server in a distributed storage system or a part of hardware devices in a storage server.
  • the metadata compression device may include: an acquisition unit, configured to acquire n pieces of metadata, where one piece of metadata includes a key-value pair, the key-value pair includes a keyword and a value, the keyword indicates the identifier of the data corresponding to the metadata, the value indicates the actual address at which the data is stored, and n is a positive integer greater than 1.
  • a processing unit, configured to process m pieces of data corresponding to at least some of the n pieces of metadata, to obtain n target values, corresponding to the n pieces of metadata, that conform to a set rule, where m is a positive integer less than or equal to n.
  • a compression unit configured to compress the n target values.
  • the n actual addresses indicated by the above n target values conforming to the set rule are continuous.
  • the processing unit is configured to process the m data corresponding to the at least part of the metadata, and obtain the n target values corresponding to the n metadata conforming to the set rules, including:
  • the processing unit is specifically configured to migrate the m pieces of data, so as to store the data corresponding to the n pieces of metadata in a storage space with continuous actual addresses.
  • the processing unit is specifically configured to save the actual addresses of the n data stored in the continuous storage space as the n target values.
  • the processing unit is further configured to select the n pieces of metadata from the multiple pieces of metadata according to how hot or cold the data corresponding to the multiple pieces of metadata is, where the data corresponding to the n pieces of metadata is cold data.
  • the n pieces of metadata are metadata in the first storage layer in the LSM tree.
  • the LSM tree is used to store metadata, and the LSM tree includes multiple storage layers, and the multiple storage layers include the above-mentioned first storage layer.
  • the above key and value are stored in two data entries respectively.
  • the processing unit is further configured to detect a data change amount of a metadata set; the metadata set is used to record metadata of multiple pieces of data.
  • the acquisition unit being configured to acquire n pieces of metadata includes: the acquisition unit is specifically configured to acquire the n pieces of metadata included in the metadata set after determining that the data change amount of the metadata set exceeds a change threshold.
  • the obtaining unit is further configured to obtain a degree of dispersion of actual addresses of data corresponding to the n pieces of metadata.
  • the processing unit being configured to process m pieces of data corresponding to at least some of the n pieces of metadata includes: the processing unit is specifically configured to process the m pieces of data corresponding to at least some of the n pieces of metadata after determining that the degree of dispersion is greater than a dispersion threshold.
  • the metadata compression device is located in an engine in the centralized storage system.
  • the metadata compression device is located in a storage server in a distributed storage system.
  • a storage device, including a memory and a processor, where the memory is used to store computer instructions, and the processor is used to call and execute the computer instructions from the memory, so as to implement the method provided in the first aspect or any implementation of the first aspect.
  • a storage system, including an engine and a plurality of hard disks, where the plurality of hard disks are used to store data, and the engine is used to execute the method provided in the first aspect or any implementation of the first aspect.
  • the storage system may be a centralized storage system.
  • a storage system, including a plurality of storage servers, where the plurality of storage servers are used to store data, and a first server among the plurality of storage servers is used to execute the method provided in the first aspect or any implementation of the first aspect.
  • the storage system may be a distributed storage system.
  • the first server may be a storage server capable of managing metadata in the distributed storage system.
  • a chip, including a memory and a processor, where the memory is used to store computer instructions, and the processor is used to call and execute the computer instructions from the memory, so as to implement the method provided in the first aspect or any implementation of the first aspect.
  • a computer-readable storage medium in which a computer program is stored, where, when the computer program is executed by a processor, the method provided in the first aspect or any implementation of the first aspect is implemented.
  • a computer program product including instructions, where, when the instructions are run on a processor, the method provided in the first aspect or any implementation of the first aspect is implemented.
  • FIG. 1 is a schematic structural diagram of a storage system provided by the present application;
  • FIG. 2 is a schematic flow chart of writing data to a storage system provided by the present application;
  • FIG. 3 is a schematic flow diagram of a metadata compression method provided by the present application;
  • FIG. 4A is a first schematic flow diagram of merging metadata from the L1 layer to the L2 layer in the LSM tree provided by the present application;
  • FIG. 4B is a second schematic flow diagram of merging metadata from the L1 layer to the L2 layer in the LSM tree provided by the present application;
  • FIG. 5A is a first schematic diagram of data migration provided by the present application;
  • FIG. 5B is a second schematic diagram of data migration provided by the present application;
  • FIG. 6A is a third schematic diagram of data migration provided by the present application;
  • FIG. 6B is a fourth schematic diagram of data migration provided by the present application;
  • FIG. 7 is a first schematic structural diagram of a keyword data entry and a value data entry provided by the present application;
  • FIG. 8 is a second schematic structural diagram of a keyword data entry and a value data entry provided by the present application;
  • FIG. 9 is a third schematic structural diagram of a keyword data entry and a value data entry provided by the present application;
  • FIG. 10 is a first schematic structural diagram of a metadata compression device provided by the present application;
  • FIG. 11 is a second schematic structural diagram of a metadata compression device provided by the present application.
  • FIG. 1 is a schematic diagram of a network architecture provided by an embodiment of the present application.
  • user data can be accessed by running an application program.
  • the computer running the application program may be referred to as an "application server".
  • the application server 100 may be a physical machine or a virtual machine.
  • the application server 100 includes, but is not limited to, desktop computers, servers, notebook computers, and mobile devices.
  • the application server accesses the storage system 120 through the switch 110 to access user data.
  • the switch 110 is only an optional device, and the application server 100 can also directly communicate with the storage system 120 through the network.
  • the switch 110 can also be replaced with an Ethernet switch, an InfiniBand switch, a RoCE (RDMA over Converged Ethernet) switch, and the like.
  • the storage system 120 is a device or a device cluster for storing user data.
  • the storage system 120 may be a centralized storage system.
  • a centralized storage system is characterized by a unified entrance, and all data from external devices such as application servers must pass through this entrance.
  • the entrance of the centralized storage system may specifically be the engine 121 of the centralized storage system.
  • the engine 121 may include one or more controllers, and one controller 122 is taken as an example in FIG. 1 for illustration.
  • multiple controllers can be used as backups for each other through mirroring channels.
  • when one of the controllers fails, the other controllers can take over the services of the faulty controller, which prevents a hardware failure from making the entire storage system unavailable.
  • the engine 121 may further include a front-end interface 125 and a back-end interface 126, wherein the front-end interface 125 is used to communicate with the application server 100 to provide storage services for the application server 100.
  • the back-end interface 126 is used to communicate with the hard disk 134 to expand the capacity of the storage system. Through the back-end interface 126, the engine 121 can be connected to more hard disks 134, thereby forming a very large storage resource pool.
  • the controller 122 may include a processor 123 and a memory 124.
  • the processor 123 may be a central processing unit (CPU), used to process data access requests from outside the storage system (such as from application servers or other storage systems) as well as requests generated inside the storage system.
  • for example, when the CPU 123 receives write data requests sent by the application server 100 through the front-end interface 125, it temporarily stores the user data in these write data requests in the memory 124.
  • the CPU 123 sends the user data stored in the memory 124 to the hard disk 134 through the back-end interface 126 for persistent storage.
  • the memory 124 is an internal memory for directly exchanging data with the processor. It can read and write data at any time, and the reading and writing speed is fast. It can be used as a temporary data storage for an operating system or other running programs.
  • the memory 124 may include various types of memory; for example, it may be a random access memory or a read-only memory (ROM).
  • for example, the random access memory is a dynamic random access memory (DRAM) or a storage class memory (SCM).
  • DRAM is a semiconductor memory which, like most random access memory (RAM), is a volatile memory device.
  • SCM is a composite storage technology that combines the characteristics of traditional storage devices and memory.
  • DRAM and SCM are only examples in this embodiment; the memory may also include other random access memories, such as static random access memory (SRAM).
  • the read-only memory may be, for example, a programmable read-only memory (PROM) or an erasable programmable read-only memory (EPROM).
  • the memory 124 may also be a dual in-line memory module (DIMM), that is, a module composed of DRAM, or a solid-state disk (SSD).
  • multiple memories 124, and memories 124 of different types, may be configured in the controller 0.
  • This embodiment does not limit the quantity or type of the memory 124.
  • the memory 124 can be configured to have a power-protection function.
  • the power-protection function means that the data stored in the memory 124 is not lost when the system is powered off and then powered on again. Memory with a power-protection function is called non-volatile memory.
  • the storage system may include two or more engines 121 , and redundancy or load balancing is performed among the multiple engines 121 .
  • the engine 121 may also include hard disk slots.
  • the hard disk 134 can be deployed directly in the engine 121, and the back-end interface 126 is an optional configuration. When the storage space of the system is insufficient, more hard disks or hard disk enclosures can be connected through the back-end interface 126.
  • FIG. 1 only provides a schematic structural diagram of a centralized storage system as an example.
  • the storage system 120 may be composed of multiple independent storage servers, where the storage servers may communicate with each other.
  • each storage server may respectively include hardware components such as a processor, a memory, a network card, and a hard disk.
  • the processor and the memory provide computing resources. The processor processes data access requests from outside the storage server. The memory is an internal memory that exchanges data directly with the processor; it can read and write data at any time and at high speed, and can serve as temporary data storage for the operating system or other running programs.
  • a hard disk is used to provide storage resources, such as storing data, and it can be a magnetic disk or other types of storage media, such as solid-state hard disks or shingled magnetic recording hard disks.
  • the storage server may also include a network card for communicating with the application server.
  • the actual address of the storage space provided by the hard disk 134 is generally not directly exposed to the application server 100 for use.
  • the storage system 120 stores metadata recording actual addresses of user data.
  • when the application server 100 writes user data into the hard disk 134, the metadata of the user data is added to the metadata file to record the actual address of the data.
  • afterwards, the actual address of the user data can be determined by looking up the metadata of the user data in the metadata file that records the above metadata.
  • the data described by metadata is referred to as "user data" in this embodiment of the application.
  • the user data mentioned in the embodiments of this application can be understood as the data that the application server stores in the storage system in order to provide related services, and the metadata is the data used to describe this user data (data that describes other data), including but not limited to the actual address at which the user data is stored, the mapping relationship between the logical address and the actual address, the attributes of the user data, and other information.
  • user data may also be called “data” or other names, which may not be limited in this embodiment of the present application.
  • the application server 100 may send a read data request carrying the identifier of the user data to the storage system 120, where the identifier of the user data may be, for example, the logical address of the user data as used in the application server 100.
  • the CPU 123 finds the actual address of the user data in the metadata file stored in the memory 124 or the hard disk 134 according to the identifier of the user data, where the actual address of the user data may be the underlying physical address of the user data in the storage system 120 or an intermediate-layer logical address. The CPU 123 then reads the user data at that actual address on the hard disk 134 through the back-end interface 126 and returns it to the application server 100 through the front-end interface 125.
  • the mapping relationship between the identifier of the user data and the actual address of the user data is recorded in the metadata.
  • the mapping relationship may be stored in the form of a key-value pair (KV pair).
  • as shown in FIG. 1, a metadata file including metadata is stored in the memory 124. In the metadata, the identifier of the user data (identifiers 1-5 in the figure) serves as the key of a key-value pair, and the actual address of the data (addresses 1-5 in the figure) serves as the value of the key-value pair, so as to establish a mapping relationship between the identifier of the user data and the actual address of the data in the form of a key-value pair.
  • the content of the key-value pairs in the metadata is shown as a list in FIG. 1 only as an example; in practical applications the key-value pairs may also be stored in other forms (such as a tree structure). This application does not limit the storage form.
  • key-value storage is representative of non-relational databases; it abandons the strict field structure of data tables in relational databases and the relationship restrictions between tables.
  • because data stored as key-value pairs uses a simplified data model, key-value storage has the following advantages. First, high scalability: because there is no strict field structure for data tables and no relationships between tables, key-value storage can easily be deployed as a distributed application across multiple servers, which improves the scalability of the entire system and makes it more convenient and flexible. Second, mass storage and high throughput that meet the needs of cloud computing: key-value storage can satisfy users' demand for flexible scalability in a cloud computing environment. Key-value storage is therefore increasingly becoming a mainstream storage method.
  • when key-value pairs are stored, the data is usually stored in structured forms such as a binary search tree, a balanced tree (B-tree), a B+ tree, or a log-structured merge tree (LSM tree).
  • the LSM tree is one of the commonly used storage structures in a log-based database system, and the LSM tree is a multi-layer framework.
  • the LSM tree is mainly stored in memory.
  • the metadata in all or some nodes of the LSM tree can also be temporarily stored in the hard disk. When the metadata in these nodes needs to be read, it is copied to the memory.
  • the hard disk 134 includes a sorted string table area 1341 for storing sorted string tables (SSTables) and a data storage area 1342 for storing user data, wherein the sorted string table area 1341 and the data storage area 1342 are generally logically divided storage areas in the hard disk 134.
  • when the storage system 120 receives a data write request for writing user data X into the storage system, as shown in the figure, the metadata of the user data X (that is, the corresponding key-value pair) is written into the memory table (memtable) 1241 in order. In addition, although not shown in the figure, in practical applications the storage system 120 can also, after receiving the data write request for user data X, record this write operation in a log file through write-ahead logging (WAL) for failure recovery.
  • the memory table 1241 is located in the memory 124, and when the memory table 1241 exceeds a certain threshold, it is frozen in the memory and switched to an immutable memory table (immutable memtable) 1242. At this time, in order not to block write operations, a new memory table is generated in the memory of the storage system 120 to continue to provide services. Afterwards, the immutable memory table 1242 is written into the sorted string table area 1341 in batches.
  • the ordered string table area is located on one or more hard disks 134 .
  • the sorted string table area 1341 has a structure of multiple storage layers, such as the L0 layer, L1 layer, and L2 layer shown in the figure.
  • each storage layer may include one or more SSTables, and the one or more SSTables may further be stored in the form of structured data.
  • when the immutable memory table 1242 is written into the sorted string table area 1341, it is first written into the top storage layer, such as the L0 layer in FIG. 2.
  • when the amount of data in the L0 layer reaches a threshold, the SSTables in the L0 layer are merged into the L1 layer; when the amount of data in the L1 layer reaches a threshold, the SSTables in the L1 layer are merged into the L2 layer; and so on, so that old metadata can be continuously deleted and new data can be continuously written.
  • the LSM tree for storing metadata is introduced above by taking three storage layers as an example. It can be understood that in practical applications the LSM tree can be composed of more or fewer storage layers, which is not limited in this application.
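  • The write path described above (memory table, freezing into an immutable memory table, flushing to L0, and merging downward) can be modeled with the toy Python class below. It is a sketch under simplifying assumptions: thresholds are entry and table counts rather than byte sizes, each SSTable is a sorted list of key-value tuples held in memory, and a merge collapses a level and the one below it into a single table. The class name `TinyLSM` and its parameters are hypothetical.

```python
from typing import Dict, List, Tuple

SSTable = List[Tuple[str, int]]  # sorted (key, actual address) pairs


class TinyLSM:
    def __init__(self, memtable_limit: int = 2, level_limit: int = 2,
                 num_levels: int = 3) -> None:
        self.memtable_limit = memtable_limit
        self.level_limit = level_limit
        self.memtable: Dict[str, int] = {}
        self.immutable: List[Dict[str, int]] = []  # frozen, awaiting flush
        # levels[0] is L0; within a level the oldest SSTable comes first.
        self.levels: List[List[SSTable]] = [[] for _ in range(num_levels)]

    def put(self, key: str, address: int) -> None:
        """Write metadata to the memtable; freeze it once it exceeds the limit."""
        self.memtable[key] = address
        if len(self.memtable) > self.memtable_limit:
            self.immutable.append(self.memtable)
            self.memtable = {}  # a fresh memtable keeps accepting writes

    def flush(self) -> None:
        """Write frozen memtables to L0 as sorted tables, then merge if needed."""
        for frozen in self.immutable:
            self.levels[0].append(sorted(frozen.items()))
        self.immutable = []
        self._compact()

    def _compact(self) -> None:
        """Merge a level into the next one when it holds too many SSTables."""
        for i in range(len(self.levels) - 1):
            if len(self.levels[i]) > self.level_limit:
                merged: Dict[str, int] = {}
                # Older tables first, so newer entries overwrite old metadata.
                for table in self.levels[i + 1] + self.levels[i]:
                    merged.update(table)
                self.levels[i + 1] = [sorted(merged.items())]
                self.levels[i] = []
```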
  • the storage system 120 first searches for the metadata of the user data X in the memory table 1241; if it is found, the data storage area 1342 is accessed according to the actual address in the metadata.
  • if it is not found, the search continues downward in turn. Specifically, the metadata of user data X is first looked up in the immutable memory table 1242; if it is determined that the immutable memory table 1242 does not contain the metadata of user data X, the metadata is searched for in the L0 layer; if it is determined that the L0 layer does not contain it, the metadata is searched for in the L1 layer; and so on, until the metadata of the user data X is found, after which the data storage area 1342 is accessed according to the actual address in the metadata.
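  • The lookup order described above can be sketched as a single function. For brevity, each table is modeled as a Python dictionary, and it is assumed that newer SSTables are appended last within a level; the function name and data layout are illustrative only.

```python
from typing import Dict, List, Optional


def find_actual_address(
    key: str,
    memtable: Dict[str, int],
    immutable_memtable: Dict[str, int],
    levels: List[List[Dict[str, int]]],
) -> Optional[int]:
    """Search the memtable, then the immutable memtable, then L0, L1, ..."""
    if key in memtable:
        return memtable[key]
    if key in immutable_memtable:
        return immutable_memtable[key]
    for level in levels:                 # levels[0] is L0, levels[1] is L1, ...
        for sstable in reversed(level):  # assumed: newer SSTables come last
            if key in sstable:
                return sstable[key]
    return None  # no metadata for this key anywhere in the tree
```

  • Because the search stops at the first hit, a newer value in an upper layer shadows an older value for the same key in a lower layer.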
  • the key in the key-value pair is the identifier of the user data, and the value in the key-value pair is the actual address of the user data storage.
  • for example, the key may include one or more of a logical unit number (LUN ID), a snapshot number (snap ID), and a logical block address (LBA).
  • the key may also be one or more of a version number (Version), a file name and an offset within the file, or a hash value of the file name and the offset within the file.
  • in this application, the address corresponding to the user data identifier in the metadata is called the "actual address"; this name does not restrict the type of the address.
  • the identifier and the actual address of user data can be understood as relative concepts: compared with the actual address, the identifier is closer to the upper-layer application, while the actual address is closer to the underlying hardware.
  • for example, the actual address of the user data can be an address in the logical chunk group corresponding to the user data in the LUN; in this case, the actual address can include the chunk group ID of the chunk group where the user data is located and the offset of the user data within that chunk group.
  • alternatively, the actual address can be the address of the user data in a physical chunk on the hard disk; in this case, the actual address can include the physical chunk ID of the physical chunk where the user data is located and the offset within that physical chunk.
  • the actual address can also be the physical address at which the user data is stored on the hard disk. Assuming the hard disk is a solid-state drive, the physical address is the block ID and page ID where the user data is located in the solid-state drive. This application does not limit the specific form of the identifier or the actual address of the user data. For convenience of description, this embodiment takes as an example the case in which the value in the key-value pair is the physical address of the user data.
  • the size of the key-value pair depends on the number of bytes in the LUN ID, LBA, version number, or physical address. In this case, the size of a key-value pair is generally 24 to 32 bytes. Calculated with a data block size of 8 KB, the storage capacity occupied by metadata is about 0.3% of the storage capacity occupied by user data.
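  • the ratio quoted above can be checked with a short calculation. The sketch assumes one key-value pair per 8 KB data block and takes "8K" to mean 8192 bytes; the function name is hypothetical.

```python
BLOCK_SIZE = 8 * 1024  # assumed: an 8 KB data block is 8192 bytes


def metadata_overhead(kv_bytes: int, block_bytes: int = BLOCK_SIZE) -> float:
    """Fraction of user-data capacity taken by one key-value pair per block."""
    return kv_bytes / block_bytes


low = metadata_overhead(24)   # 24 / 8192, roughly 0.29%
high = metadata_overhead(32)  # 32 / 8192, roughly 0.39%
```

Under these assumptions the overhead lies between roughly 0.29% and 0.39%, which is consistent with the figure of about 0.3%.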
  • a prefix compression method may be used to compress the keys in the key-value pairs.
  • the key part of multiple metadata such as multiple metadata in a storage layer in an LSM tree or in one or more SSTables in a storage layer
  • a common part such as LUN ID or snap ID
  • this application considers that, in related technologies, only the key field in the key-value pair is compressed. This is because the value field is generally the actual address or the fingerprint of the user data. The actual address is related to the space allocated when the user data is written, so the actual address usually has no regularity; the fingerprint of the user data depends on the content of the user data itself and is also unlikely to have regularity. It is therefore difficult to compress the content of the value.
  • the embodiment of the present application provides a technical solution for metadata compression.
  • when the values used to indicate the actual addresses of user data in multiple pieces of metadata have no regularity, the user data corresponding to the multiple pieces of metadata can be processed, for example, by migrating the user data to multiple continuous actual addresses. The values in the metadata of these user data then have regularity, which facilitates compressing the values.
  • the embodiment of the present application provides a metadata compression method.
  • the method may be implemented by the engine 121 in the storage system 120 in FIG. 1 , specifically, by a controller in the engine 121 .
  • the central processing unit 123 invokes the program instructions in the memory for execution.
  • the memory may be the memory 124 in FIG. 1 , or a cache located in the central processing unit 123 .
  • the method may also be implemented by other hardware devices used to manage metadata in the storage system.
  • the storage system is a distributed storage system
  • the method may be implemented by a storage server capable of managing metadata in the distributed storage system or by some hardware in the storage server.
  • the metadata compression method is introduced below: as shown in FIG. 3 , in the process of writing data to the storage system, the method includes:
  • the write data request carries user data requested to be written into the storage system.
  • metadata includes key-value pairs.
  • the key of the key-value pair is used to indicate the identity of the user data, and the value is used to indicate the actual address of the user data storage.
  • the keywords in the key-value pairs included in the above-mentioned metadata are referred to as "keywords in the metadata", and the values in the key-value pairs included in the above-mentioned metadata are referred to as "values in the metadata". Unless otherwise specified, "keywords in the metadata" and "values in the metadata" below can be understood as above, and details are not repeated.
  • as the engine 121 continues to receive write data requests, more and more user data is stored in the memory 124.
  • the engine 121 writes the user data stored in the memory 124 into the hard disk 134 for persistent storage.
  • the address where the user data is stored in the hard disk 134 is the actual address where the user data is stored in this embodiment.
  • the write data request received by the engine 121 not only carries the user data, but also includes the logical address of the user data.
  • the logical address is an address presented to the application server 100, and is used to enable the application server 100 to access the user data.
  • the engine 121 may use the logical address or the hash value corresponding to the logical address as the keyword in the metadata, use the actual address at which the user data is stored in the hard disk 134 as the value in the metadata, and save the correspondence between the keyword and the value as a key-value pair.
  • the engine 121 switches the content in the memory table 1241 to an unmodifiable memory table 1242 and generates a new memory table.
  • switching the content in the memory table 1241 to the non-modifiable memory table 1242 may mean that the engine 121 modifies the attributes of the memory table 1241 so that the modified memory table (that is, the non-modifiable memory table) no longer receives new data.
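  • the switch from a memory table to a non-modifiable memory table can be sketched as an attribute change followed by the creation of a new table. The class `MemTable`, the function `switch_if_full`, and the entry-count threshold are hypothetical names and choices made for this sketch; the application does not specify how the threshold is measured.

```python
class MemTable:
    """In-memory key-value table that stops accepting writes once frozen."""

    def __init__(self):
        self.entries = {}
        self.frozen = False

    def put(self, key, actual_address):
        if self.frozen:
            raise RuntimeError("non-modifiable memory table rejects new data")
        self.entries[key] = actual_address


def switch_if_full(active, threshold):
    """Freeze `active` once it holds `threshold` entries.

    Returns (new_active, frozen_table_or_None).
    """
    if len(active.entries) < threshold:
        return active, None
    active.frozen = True       # attribute change only; contents stay in place
    return MemTable(), active  # a new memory table takes over incoming writes
```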
  • the threshold in S304 may be the same as or different from the threshold in S303.
  • the engine 121 may transfer the content in the unmodifiable memory table 1242 to the top layer L0 of the ordered string table area 1341. Then, when the amount of data in the L0 layer exceeds the threshold, the metadata in the L0 layer is merged into the L1 layer.
  • the L2 layer includes two metadata files, which are respectively used to store metadata corresponding to cold data and metadata corresponding to hot data.
  • the metadata file used to store metadata corresponding to cold data is called a cold metadata file
  • the metadata file used to store metadata corresponding to hot data is called a hot metadata file.
  • the two metadata files in the L2 layer may respectively include different ordered character string tables in the L2 layer.
  • each metadata file can be stored in a tree structure to facilitate data search.
  • the hot or cold degree of user data can be understood as how likely the user data is to be modified (this likelihood can be reflected by parameters such as the historical modification frequency or the historical number of modifications of the user data).
  • cold data may be understood as user data whose possibility of being modified is lower than a certain threshold
  • hot data may be understood as user data whose possibility of being modified is higher than a certain threshold.
  • each ordered string table stores metadata of a different keyword range, wherein ordered string table 1 stores the metadata whose keyword range is key1-key20 (for example, the metadata corresponding to key1, key3, key4, key7, key12, key15, key16, key18, key19, and key20 are currently stored in ordered string table 1 in Figure 4A),
  • ordered string table 2 stores the metadata whose keyword range is key21-key40,
  • ordered string table 3 stores the metadata whose keyword range is key41-key60,
  • ordered string table 4 stores the metadata whose keyword range is key61-key80, and ordered string table 5 stores the metadata whose keyword range is key81-key100.
  • the above takes as an example that the L1 layer includes 5 ordered string tables and each ordered string table is used to store metadata for a range of 20 keywords.
  • this example does not limit the number of ordered string tables included in each storage layer in the LSM tree, the keyword range corresponding to each ordered string table, or the number of pieces of metadata included in each ordered string table.
  • S305 may specifically include:
  • by post-order traversal and merge sorting, the keywords of the leftmost path of the binary tree corresponding to each ordered string table can be traversed sequentially, so that each keyword in the ordered string tables is traversed and read.
  • S3053 is executed for each keyword.
  • the hot or cold degree of the user data may be judged according to the IO type corresponding to the user data. Specifically, under normal circumstances, user data written with sequential write IO is colder, and user data written with random write IO is hotter. Therefore, user data written with sequential write IO can be treated as cold data, and user data written with random write IO can be treated as hot data.
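  • the classification by IO type can be sketched as a simple partition. The enum `IoType`, the function `split_hot_cold`, and the per-key mapping `io_type_of` are hypothetical names introduced for this sketch; how the IO type is actually recorded is not specified in the text.

```python
from enum import Enum


class IoType(Enum):
    SEQUENTIAL_WRITE = "sequential"
    RANDOM_WRITE = "random"


def split_hot_cold(metadata, io_type_of):
    """Partition metadata into (cold, hot) dicts by the IO type of each key."""
    cold, hot = {}, {}
    for key, value in metadata.items():
        # Sequential write IO is treated as cold data, random write IO as hot.
        target = cold if io_type_of[key] is IoType.SEQUENTIAL_WRITE else hot
        target[key] = value
    return cold, hot
```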
  • the cold metadata file can be divided into multiple sub-files, wherein each sub-file can be an ordered string table, and each ordered string table stores metadata of different key ranges respectively.
  • the cold metadata file in the L2 layer includes 10 SSTables: ordered string table 6-ordered string table 15, and these 10 SSTables are respectively used to store metadata for the keyword ranges key1-key10, key11-key20, ..., key91-key100.
  • the hot metadata file can also be divided into multiple subfiles, where each subfile can be an SSTable, and each SSTable stores metadata of different keyword ranges.
  • the hot metadata file in the L2 layer includes 10 SSTables: ordered string table 16-ordered string table 25, and these 10 SSTables are respectively used to store the metadata of the hot metadata file for the keyword ranges key1-key10, key11-key20, ..., key91-key100.
  • the method provided in this embodiment also includes:
  • the L2 layer may include 10 sub-files: ordered string table 6-ordered string table 15, and these 10 sub-files are respectively used to store metadata for the keyword ranges key1-key10, key11-key20, ..., key91-key100. Furthermore, subsequent steps may be performed on all 10 sub-files to compress them; or, subsequent steps may be performed on only some of the 10 sub-files to compress those sub-files.
  • the address dispersion degree corresponding to the sub-file can be understood as the dispersion degree of the actual addresses of the user data corresponding to the metadata included in the sub-file.
  • the sub-file includes metadata of n pieces of user data, and the n pieces of user data are respectively stored in n actual addresses, where n is a positive integer. Then, the higher the dispersion degree of the n actual addresses, the higher the dispersion degree of the address corresponding to the sub-file; the lower the dispersion degree of the n actual addresses, the lower the dispersion degree of the address corresponding to the sub-file.
  • the dispersion degree of the n actual addresses can be specifically reflected by the number of rules needed to describe the n actual addresses.
  • for example, in a first case, m of the actual addresses are continuous, where m is a positive integer less than n,
  • and the other n-m actual addresses are also continuous; that is to say, the n actual addresses can be described by two rules.
  • in a second case, there are q actual addresses that are continuous, where q is a positive integer less than n,
  • and the n actual addresses are described by four rules. The address dispersion of the sub-file in the first case is then smaller than the address dispersion of the sub-file in the second case.
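  • one possible way to quantify this dispersion is to count the maximal runs of consecutive addresses, treating each run as one "rule". This interpretation, the integer address model, and the function name `count_contiguous_runs` are assumptions of this sketch rather than a definition given in the application.

```python
def count_contiguous_runs(addresses):
    """Count the maximal runs of consecutive integers among `addresses`."""
    ordered = sorted(set(addresses))
    if not ordered:
        return 0
    runs = 1
    for prev, cur in zip(ordered, ordered[1:]):
        if cur != prev + 1:   # a gap starts a new run, i.e. a new "rule"
            runs += 1
    return runs
```

A sub-file whose addresses form two runs would then be considered less dispersed than one whose addresses form four runs, and could be given lower priority for migration.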
  • the user data corresponding to the metadata of the sub-file can be stored in a continuous storage space by migrating the user data; in another implementation, the user data corresponding to the metadata of the sub-file can also be stored in segments in multiple continuous storage spaces.
  • the above S308 may specifically include: migrating the user data corresponding to the metadata in the sub-file, so that the user data corresponding to the metadata in the sub-file is stored in a continuous storage space.
  • the user data corresponding to the sub-file may be migrated to a continuous free storage space.
  • the sub-file includes metadata of 5 user data, as shown in FIG. 5A , the 5 user data are respectively stored in address 1, address 3, address 4, address 8 and address 10. Then, by reading the data in these 5 addresses respectively and migrating the data in these 5 addresses to the unused address 11-address 15 respectively, the storage spaces of these 5 user data are continuous.
  • part of the user data corresponding to the sub-file can be migrated to storage space that is continuous with that of the other user data corresponding to the sub-file, so that the user data corresponding to the metadata in the sub-file is stored in a continuous storage space.
  • the sub-file includes metadata of 5 user data, as shown in FIG. 5B , the 5 user data are stored in address 1, address 3, address 4, address 8 and address 10 respectively. Then, by reading the data at address 8 and address 10 respectively, and migrating the data at address 8 and address 10 to address 2 and address 5 respectively, the storage spaces of these five user data are continuous.
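  • the migration to a continuous free storage space (the FIG. 5A style of example) can be sketched as follows. The dict-based storage model, the function name `migrate_to_contiguous`, and the requirement that the target range be entirely unused are assumptions of this sketch.

```python
def migrate_to_contiguous(metadata, storage, free_start):
    """Move user data to consecutive addresses starting at `free_start`.

    metadata: dict mapping key -> current actual address
    storage:  dict mapping actual address -> user data (modified in place)
    Returns a new metadata dict holding the migrated actual addresses.
    """
    new_metadata = {}
    for i, key in enumerate(sorted(metadata)):
        old_addr = metadata[key]
        new_addr = free_start + i
        if new_addr in storage:
            # Assumption of this sketch: the target range must be unused.
            raise ValueError(f"address {new_addr} is already in use")
        storage[new_addr] = storage.pop(old_addr)  # write new copy, free old
        new_metadata[key] = new_addr               # value to record in metadata
    return new_metadata
```

After the migration, the values in the returned metadata increase by one per key, which is the regularity that the later value compression relies on.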
  • the above S308 may specifically include: migrating the user data corresponding to the metadata in the sub-file, so that the user data corresponding to the metadata of the sub-file is stored in segments into multiple consecutive storage spaces .
  • the user data corresponding to the sub-files may be migrated to multiple consecutive segments of storage space.
  • the sub-file includes metadata of 10 user data, as shown in FIG. 16.
  • address 17 and address 20. Then, the data in these 10 addresses is read and migrated to the unused address 21-address 25 and address 32-address 36, so that, among the 10 user data, the storage space of user data 1-user data 5 is continuous and the storage space of user data 6-user data 10 is continuous.
  • part of the user data corresponding to the subfile can be migrated to storage space that is continuous with that of other user data corresponding to the subfile, so that the user data corresponding to the metadata of the subfile is stored in segments in multiple continuous storage spaces.
  • the sub-file includes metadata of 10 user data, as shown in FIG. 16.
  • address 17 and address 20. User data 2-user data 5 are migrated to storage space continuous with that of user data 1, and user data 7-user data 10 are migrated to storage space continuous with that of user data 6, so that, among these 10 user data, the storage space of user data 1-user data 5 is continuous and the storage space of user data 6-user data 10 is continuous.
  • the migrated actual address may be used as a value in the metadata of the migrated user data, and updated in the metadata of the user data.
  • when the actual address of the user data is irregular, the actual address of the user data is usually stored directly as the value in the metadata.
  • when the user data is stored at consecutive actual addresses, a compression algorithm reflecting the change law of the consecutive actual addresses can be generated by summarizing this change law; for example, the compression algorithm may be a binary first-order function.
  • the law of the actual address of the user data can be summarized by using machine learning to generate the binary first-order function.
  • in this way, only the compressed value corresponding to the actual address of the user data needs to be persistently stored, and the actual address of the user data itself does not need to be stored.
  • the actual address of the user data needs to be read, by using the compression value corresponding to the actual address of the user data as an input of the compression algorithm, the actual address of the user data can be output by the compression algorithm.
  • the actual addresses of the five user data can be expressed as 0x0000000100000000, 0x0000000100000001, 0x0000000100000002, 0x0000000100000003, and 0x0000000100000004.
  • "0x" in the actual address represents a hexadecimal number
  • "00000001" in the middle represents the disk ID
  • the last 8 digits represent the physical block number in the disk.
  • start key: 0000000100000000
  • number of data: 5
  • compression algorithm: 1 (indicating that the compression algorithm is a first-order function with a slope of 1)
  • compressed values of the values in each piece of data: 0, 1, 2, 3, 4.
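  • the record above (start key, number of data, slope, per-element compressed values) can be reproduced with a short sketch. It assumes integer addresses that follow start + slope × index exactly and a non-empty input list; the function names are hypothetical, and this is only one possible reading of the "first-order function" compression.

```python
def compress_addresses(addresses):
    """Return (start, count, slope, offsets) for addresses on a straight line.

    Assumes a non-empty list of integer addresses.
    """
    start = addresses[0]
    slope = addresses[1] - addresses[0] if len(addresses) > 1 else 1
    for i, addr in enumerate(addresses):
        if addr != start + slope * i:
            raise ValueError("addresses do not follow a first-order function")
    # Only these small offsets need to be kept per piece of metadata.
    return start, len(addresses), slope, list(range(len(addresses)))


def decompress_address(start, slope, offset):
    """Recover an actual address from its compressed value (the offset)."""
    return start + slope * offset
```

For five addresses starting at 0x0000000100000000 with a step of 1, the sketch yields a count of 5, a slope of 1, and the compressed values 0 to 4.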
  • a prefix compression method may also be used to compress the values in the metadata by extracting a common prefix.
  • the embodiment of the present application may not limit the compression method adopted for the value in the metadata.
  • the method further includes:
  • a prefix compression method can be used to extract a common prefix (such as a volume number (LUN ID) or a snapshot identifier (snap ID)) from the keywords in the metadata, and thereby compress the keywords.
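  • prefix compression of keywords can be sketched as storing the common prefix once and keeping only the differing suffixes. The string key format used in the usage note is hypothetical; `os.path.commonprefix` from the Python standard library is used here only because it compares strings character by character.

```python
import os


def prefix_compress(keys):
    """Split keys into (common_prefix, list_of_suffixes)."""
    prefix = os.path.commonprefix(keys)  # longest character-wise common prefix
    return prefix, [k[len(prefix):] for k in keys]


def prefix_decompress(prefix, suffixes):
    """Rebuild the original keys from the shared prefix and the suffixes."""
    return [prefix + s for s in suffixes]
```

For hypothetical keys such as "LUN7/snap2/0001", "LUN7/snap2/0002", and "LUN7/snap2/0010", the shared part covering the LUN ID and snap ID is stored once and each key keeps only a two-character suffix.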
  • a first-order function can also be used to represent the relationship between the offset and the keyword subscript, so that the offset part of a keyword can be calculated directly from the keyword subscript and the function; that is, only the key coefficients and the order of the function need to be recorded, thereby compressing the keyword.
  • the keywords and values in the metadata in this embodiment can be stored in two different logically or physically divided data entries, which can be referred to as keyword data entries and Value data entry.
  • the two different data entries may be two data entries that can respectively write data through independent write operations.
  • the above key data entry and value data entry can be understood as two different physical chunks.
  • when the value data entry is modified (for example, after the value in the metadata is compressed, the compressed value of the value in the metadata is updated to the value data entry, that is, the value data entry is modified), the keyword data entry is not affected. For example, when the value data entry is written, the keyword data entry will not be affected.
  • the above takes the case in which the keyword data entry and the value data entry are different physical chunks as an example to illustrate the two data entries to which the keyword data entry and the value data entry belong.
  • the two data entries to which the keyword data entry and the value data entry belong may also be storage space units of other granularity, which may not be limited in this embodiment.
  • after the keywords and values in the metadata included in the subfile are respectively compressed by S310 and S311, as shown in FIG. 7, the keywords (Key1-Key10 in the figure) and the values (Value1-Value10 in the figure) in the metadata included in the subfile are stored in the keyword data entry and the value data entry respectively.
  • the keyword data entry is used to store the compressed value of the keyword in the metadata in the subfile
  • the value data entry is used to store The compressed value of the value in the metadata in the subfile.
  • the keyword data entry and the value data entry respectively include a header part (header) and a content part (value).
  • the content part of the keyword data entry is used to record the compressed value of the keyword;
  • the header part of the keyword data entry is used to record the n kinds of compression algorithms used to compress the keyword, and which keywords in the content part each compression algorithm applies to .
  • the content part in the keyword data entry can be organized according to a tree structure, such as a balanced+ tree (B+tree) or an adaptive radix tree (ARtree).
  • a value data entry includes a header part and a content part.
  • the content part is used to record the compressed value of the actual address in the storage space with different serial numbers;
  • the header part is used to record the m compression algorithms used to compress the actual address, and which storage spaces in the content part each compression algorithm is applicable to.
  • the content of the key data entry includes: key_1' to key_10', key_1' to key_10' are compressed values of keys key1-key10 of 10 user data.
  • the header part of the keyword data entry records two compression algorithms, compression algorithm 11 and compression algorithm 12, and the range of keywords to which each applies (that is, compression algorithm 11 corresponds to key1-key5 and compression algorithm 12 corresponds to key6-key10).
  • the content part of the value data entry includes 10 storage spaces with serial numbers V1-V10, which respectively store the compressed values of the actual addresses of 10 user data.
  • the header part of the value data entry records three compression algorithms: compression algorithm 13, compression algorithm 14, and compression algorithm 15, and the storage spaces (ie, V1-V3, V4-V7, and V8-V10) respectively applicable to the three compression algorithms.
  • the compressed values of the keywords in the content part of the keyword data entry respectively point to different storage spaces of the content part of the value data entry.
  • key_1' in the key data entry points to storage space V1 in the value data entry
  • key_2' points to storage space V2 in the value data entry
  • for example, when the user data corresponding to key1 is accessed, the header part of the keyword data entry shows that the compression algorithm corresponding to key1 is compression algorithm 11; then, according to compression algorithm 11 and key1, the compressed value key_1' corresponding to key1 is obtained; the storage space V1 in the value data entry is then determined according to key_1', and value1_offset in the storage space V1 is read from the value data entry; next, by reading the header part of the value data entry, it can be known that the compression algorithm corresponding to the storage space V1 is compression algorithm 13; finally, according to compression algorithm 13 and value1_offset, the actual address of the user data is obtained, and the data at the actual address is read to complete the access to the user data.
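  • the read path through the two data entries can be sketched as follows. Everything concrete here is hypothetical: the dict layout of the entries, the key strings, modeling the keyword compression algorithm as a prefix to strip, and modeling the value compression algorithm as a (base, slope) pair. The sketch only mirrors the order of the steps described above.

```python
def covering(header, item):
    """Return the algorithm parameters of the header range that covers `item`."""
    for low, high, params in header:
        if low <= item <= high:
            return params
    raise KeyError(item)


def read_actual_address(key, key_entry, value_entry):
    prefix = covering(key_entry["header"], key)   # keyword compression algorithm
    compressed_key = key[len(prefix):]            # plays the role of key_1'
    slot = key_entry["content"][compressed_key]   # storage space V1..V10
    offset = value_entry["content"][slot]         # plays the role of value1_offset
    base, slope = covering(value_entry["header"], slot)  # value algorithm
    return base + slope * offset                  # decompressed actual address


# Hypothetical entries: two keyword algorithms and three value algorithms.
key_entry = {
    "header": [("lun1/01", "lun1/05", "lun1/"), ("lun2/06", "lun2/10", "lun2/")],
    "content": {f"{i:02d}": i for i in range(1, 11)},
}
value_entry = {
    "header": [(1, 3, (0x100000000, 1)), (4, 7, (0x100000100, 1)),
               (8, 10, (0x200000000, 1))],
    "content": {1: 0, 2: 1, 3: 2, 4: 0, 5: 1, 6: 2, 7: 3, 8: 0, 9: 1, 10: 2},
}
```

Because the keyword data entry stores only slot numbers, recompressing the values changes the value data entry alone, which matches the separation of the two entries described in the text.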
  • keywords and values in multiple metadata are stored separately.
  • when the metadata value needs to be compressed, it is only necessary to update the compressed value of the metadata value obtained after compression to the original storage space in the corresponding value data entry, and there is no need to modify the keyword data entry, which improves compression efficiency.
  • the keywords of the metadata do not need to be compressed
  • only the keywords in the metadata may be stored in the keyword data entry as shown in FIG. 9 ;
  • the value data entry is still stored in a manner similar to the above design. In this way, when the metadata value needs to be recompressed, it is only necessary to update the compressed value of the metadata value obtained after recompression to the original storage space in the corresponding value data entry, without modifying the keyword data entry, which improves compression efficiency.
  • the method further includes:
  • the data change amount of the cold metadata file may specifically refer to the change amount of user data corresponding to the metadata in the cold metadata file within a preset time period.
  • multiple pieces of metadata in a cold metadata file are referred to as a metadata set.
  • the preset time period may be the period from the last processing of the user data corresponding to the metadata in the cold metadata file to the current moment; as another example, the preset time period may also be a preset fixed duration.
  • the method for setting the length of the preset time period may be set according to actual needs, and this application may not limit it.
  • the change amount of the user data in the cold metadata file may be the number of changes of the user data in the cold metadata file, or the data volume of the changed user data in the cold metadata file. In practical applications, technicians may use appropriate parameters to reflect the variation of user data in the cold metadata file according to actual requirements, and this application may not limit this.
  • the method further includes: compressing the metadata in the hot metadata file.
  • prefix compression or slope compression is used for the metadata keywords in the hot metadata file; in addition, when the metadata value in the hot metadata file has no regularity, the metadata value may not be compressed.
  • if the value in the metadata of a certain user data (the user data being hot data) has just been compressed by the method of S308-310 above, and the actual address of the user data then changes because the user data is modified, the process of compressing the value in the metadata of the user data becomes meaningless.
  • the metadata is first divided into cold metadata files and hot metadata files (that is, the above S304); then, on the one hand, the values of the metadata in the cold metadata files are compressed according to the process of S308-310, and on the other hand, the values of the metadata in the hot metadata files may be left uncompressed. In this way, the efficiency of metadata compression can be improved.
  • cold data and hot data may not be distinguished, that is, the content of S304 above is not executed; instead, the metadata in the L2 layer is taken as a whole, and processes such as migration of user data are applied to all or part of the metadata in this whole, so that the values of the metadata conform to the set rule, and then the values of all or part of the metadata are compressed.
  • the above mainly introduces the metadata compression method provided by the present application in the scenario where the metadata of the L1 layer in the LSM tree is merged into the L2 layer and the metadata compression process is performed at the L2 layer.
  • this method can also be applied to compress metadata of other data structures; for example, this method can also be used to perform metadata compression on other storage layers in the LSM tree, or to compress metadata of data structures other than the LSM tree.
  • the above mainly migrates the user data corresponding to the metadata so that it is stored in a continuous storage space, so that the actual addresses of the multiple user data show regularity and the values in the metadata of these user data can be compressed.
  • the user data may not be migrated, so that the actual addresses of multiple user data show regularity.
  • the actual address of the storage space where the user data is stored can be modified so that the actual addresses of multiple user data show regularity, that is, the values in the metadata of multiple user data conform to the set rule, The values in the metadata of these user data are thus compressed.
  • the mapping relationship between the actual address of the user data before modification and the underlying physical address is updated to the mapping relationship between the actual address of the user data after modification and the underlying physical address.
  • the value in the metadata refers to the chunk group ID of the storage space where the user data resides and the offset in the chunk group.
  • the chunk group IDs and the offsets in the chunk groups of the storage space of multiple user data can be made regular, for example, by making the chunk group IDs and the offsets in the chunk groups of the storage space of multiple user data continuous, so that the values in the metadata of these user data can be compressed.
  • this embodiment also provides a metadata compression device, which can be used to perform some or all of the steps in the above-mentioned metadata compression method of this embodiment.
  • the metadata compression device includes hardware structures and/or software modules corresponding to each function.
  • the technical solutions provided in this embodiment can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is executed by hardware or computer software drives the hardware depends on the specific application scenario and design constraints of the technical solution.
  • the metadata compression apparatus may be located in a hardware device used to manage metadata in the storage system.
  • metadata compressors are located in engines in centralized storage systems.
  • the metadata compression device is located in a storage server with a metadata management function in the distributed storage system.
  • FIG. 10 is a schematic structural diagram of a metadata compression device provided by the present application.
  • the metadata compression device 40 includes an acquisition unit 401 , a processing unit 402 and a compression unit 403 .
  • the metadata compression device is used to realize the functions of some or all steps in the method described above in FIG. 3 .
  • the acquiring unit 401 is configured to execute one or more items of S301 and S306 in FIG. 3 .
  • the processing unit 402 is configured to execute one or more items of S302-S305, S307-S309, and S312 in FIG. 3 .
  • the compression unit 403 is configured to execute one or more items of S310 and S311 in FIG. 3 .
  • FIG. 11 is a schematic structural diagram of a chip provided by the present application.
  • the chip 50 is used to implement the metadata compression method provided in this application.
  • the chip may be a chip used to realize the functions of the controller in the engine 121 .
  • the chip 50 includes:
  • the processor 501 is configured to execute the metadata compression method provided in this application.
  • the processor 501 may include a general-purpose central processing unit (central processing unit, CPU) and a memory, and the processor 501 may also be a microprocessor, a field programmable gate array (Field Programmable Gate Array, FPGA), an application-specific integrated circuit (application-specific integrated circuit, ASIC), or the like.
  • the chip 50 may further include: a memory 502 .
  • Computer instructions are stored in the memory 502, and the processor 501 executes the computer instructions stored in the memory to execute the metadata compression method provided in this application.
  • the memory 502 may be a read-only memory (read-only memory, ROM) or another type of static storage device that can store static information and instructions, or a random access memory (random access memory, RAM) or another type of dynamic storage device that can store information and instructions. It can also be an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the chip 50 may further include: an interface 503 .
  • Interface 503 can be used to receive and send data.
  • the interface 503 may be a communication interface, a transceiver, or the like.
  • the chip 50 may further include a communication line 504 .
  • communication line 504 may be a data bus for transferring information between the aforementioned components.
  • the method steps in the embodiments of the present application may be implemented by means of hardware, or may be implemented by means of a processor executing software instructions.
  • the software instructions may consist of corresponding software modules, and the software modules may be stored in RAM, flash memory, ROM, PROM, EPROM, EEPROM, registers, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art.
  • An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium.
  • the storage medium may also be a component of the processor.
  • the processor and storage medium can be located in the ASIC.
  • the ASIC can be located in a network device or a terminal device.
  • the processor and the storage medium may also exist in the network device or the terminal device as discrete components.
  • all or part of them may be implemented by software, hardware, firmware or any combination thereof.
  • When implemented using software, it may be implemented in whole or in part in the form of a computer program product.
  • the computer program product comprises one or more computer programs or instructions. When the computer program or instructions are loaded and executed on the computer, the processes or functions described in the embodiments of the present application are executed in whole or in part.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, network equipment, user equipment, or other programmable devices.
  • the computer program or instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer program or instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired or wireless means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrating one or more available media.
  • the available medium may be a magnetic medium, such as a floppy disk, a hard disk, or a magnetic tape; it may also be an optical medium, such as a digital video disc (digital video disc, DVD); it may also be a semiconductor medium, such as an SSD.
  • “at least one” means one or more
  • “multiple” means two or more
  • other quantifiers are similar.
  • “And/or” describes an association between associated objects and indicates that three relationships are possible; for example, A and/or B may indicate: A exists alone, A and B both exist, or B exists alone.
  • the singular forms “a”, “an” and “the” do not mean “one or only one” but “one or more”, unless the context clearly dictates otherwise.
  • for example, “a device” refers to one or more such devices.
  • “at least one of ...” means one or any combination of the subsequent associated objects; for example, “at least one of A, B and C” includes A, B, C, AB, AC, BC, or ABC.
  • in the text of this application, the character “/” generally indicates that the associated objects before and after it are in an “or” relationship; in the formulas of this application, the character “/” indicates that the associated objects before and after it are in a “division” relationship.

Abstract

The present application provides a metadata compression method and apparatus, related to the field of storage and used for reducing storage resources occupied by metadata. The method comprises: acquiring n pieces of metadata, where n is a positive integer greater than 1. One piece of metadata comprises a key-value pair, the key-value pair comprising a keyword and a value. The keyword is used for indicating an identifier of data corresponding to the metadata, and the value is used for indicating an actual address where the data is stored. Then, m pieces of data corresponding to at least some of the n pieces of metadata are processed to obtain n target values corresponding to the n pieces of metadata which conform to a set pattern, where m is a positive integer less than or equal to n. The n target values are then compressed.

Description

Metadata compression method and device
This application claims priority to the Chinese patent application No. 202110710701.4, entitled "Metadata Index Processing Method", filed with the State Intellectual Property Office on June 25, 2021, and to the Chinese patent application No. 202110944078.9, entitled "Metadata Compression Method and Device", filed with the State Intellectual Property Office on August 17, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of storage, and in particular to a metadata compression method and device.
Background
At present, a storage system needs to store metadata that records the addresses of user data, so that the user data stored in the storage system can be accessed according to that metadata. As the amount of data in the storage system grows, more and more storage resources are needed to store the metadata.
Therefore, how to reduce the storage resources occupied by metadata is a problem that currently needs to be solved.
Summary
The present application provides a metadata compression method and device, which address the problem that metadata occupies a large amount of storage resources.
To achieve the above objective, the present application adopts the following technical solutions:
In a first aspect, the present application provides a metadata compression method. The method includes: acquiring n pieces of metadata, where n is a positive integer greater than 1. Each piece of metadata includes a key-value pair, and the key-value pair includes a keyword and a value. The keyword indicates the identifier of the data corresponding to the metadata, and the value indicates the actual address at which the data is stored. Then, m pieces of data corresponding to at least some of the n pieces of metadata are processed to obtain n target values, corresponding to the n pieces of metadata, that conform to a set rule, where m is a positive integer less than or equal to n. The n target values are then compressed.
In this application, the n pieces of metadata may be n of the multiple pieces of metadata stored in the storage system. For example, where metadata is organized and stored in a tree data structure, the n pieces of metadata may include n pieces of metadata in one or more nodes of the tree. Specifically, where metadata is organized and stored in an LSM tree, the n pieces of metadata may include n pieces of metadata in one or more sorted string tables (SSTables) in one storage layer of the LSM tree. This application does not limit how the m pieces of data are processed. For example, processing the m pieces of data may include migrating them to change their actual addresses, thereby obtaining n target values, corresponding to the n pieces of metadata, that conform to the set rule; alternatively, the actual addresses of the data may be made regular without migrating the data, which likewise yields n target values that conform to the set rule. Nor does this application limit the rule that the n target values conform to, as long as the target values can be compressed according to that rule. For example, the n actual addresses indicated by the n target values may be contiguous; or every two adjacent actual addresses among the n actual addresses may be separated by storage space of the same size; or the size of the storage space between every two adjacent actual addresses may vary in a regular manner; and so on.
In the above method, the data corresponding to m of the n pieces of metadata is processed to obtain n target values, corresponding to the n pieces of metadata, that conform to a set rule, so that the n target values can be compressed according to that rule. This solves the problem that the values in metadata cannot be compressed because they lack regularity, and thereby reduces the storage resources occupied by the metadata.
In one implementation, the n actual addresses indicated by the n target values that conform to the set rule are contiguous.
In this implementation, making the n actual addresses indicated by the n target values contiguous makes the target values in the metadata easy to compress. For example, when compressing the n target values, it is sufficient to record the first target value as the starting position together with the offset of each other target value from the first one; this records all n target values and thus compresses them.
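The start-plus-offset idea described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each actual address is an integer counted in fixed-size blocks; the function names are hypothetical and do not come from the application.

```python
from typing import List, Tuple


def compress_values(values: List[int]) -> Tuple[int, List[int]]:
    """Keep the first value in full and store the others as offsets from it."""
    base = values[0]
    return base, [v - base for v in values[1:]]


def decompress_values(base: int, offsets: List[int]) -> List[int]:
    """Rebuild the original values from the base value and the offsets."""
    return [base] + [base + off for off in offsets]


def compress_contiguous(values: List[int]) -> Tuple[int, int]:
    """For strictly contiguous values the offsets are implied by position,
    so only (start, count) needs to be kept (an assumed refinement)."""
    start = values[0]
    if any(v != start + i for i, v in enumerate(values)):
        raise ValueError("values are not contiguous")
    return start, len(values)
```

With contiguous addresses the offsets are small integers, and in the strictly contiguous case they can be dropped altogether, which is why regular addresses compress well.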
In one implementation, processing the m pieces of data corresponding to at least some of the metadata to obtain n target values, corresponding to the n pieces of metadata, that conform to the set rule includes: migrating the m pieces of data so that the data corresponding to the n pieces of metadata is stored in a storage space whose actual addresses are contiguous; and then saving the actual addresses at which the n pieces of data are stored in the contiguous storage space as the n target values.
In this implementation, the m pieces of data are migrated so that the data corresponding to the n pieces of metadata is stored in a storage space whose actual addresses are contiguous, and the actual addresses of the n pieces of data in that space are then saved as the n target values. The n target values therefore become regular (that is, they conform to the set rule), which makes them easy to compress.
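The migration step can be sketched as follows. This is a simplified illustration rather than the application's implementation: storage is modelled as a dictionary from integer address to payload, the target region is assumed to hold no data other than the pieces being arranged, and the name `migrate_to_contiguous` is hypothetical.

```python
from typing import Dict, List, Tuple


def migrate_to_contiguous(
    metadata: List[Tuple[str, int]],
    storage: Dict[int, bytes],
    region_start: int,
) -> Tuple[List[Tuple[str, int]], int]:
    """Place the data of the i-th key at region_start + i.

    Returns the new (key, target value) list and m, the number of pieces
    of data that actually had to be moved.
    """
    # Read every payload first so that moves cannot overwrite one another.
    payloads = [storage[addr] for _, addr in metadata]
    new_metadata = [(key, region_start + i) for i, (key, _) in enumerate(metadata)]
    moved = [i for i, (_, old) in enumerate(metadata) if old != new_metadata[i][1]]
    for i in moved:
        del storage[metadata[i][1]]
    for i in moved:
        storage[new_metadata[i][1]] = payloads[i]
    return new_metadata, len(moved)
```

In this sketch, data that already sits at its target address is left in place, which matches the statement that m may be smaller than n.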
In one implementation, the method further includes: selecting the n pieces of metadata from the multiple pieces of metadata contained in a metadata set according to how hot or cold the data corresponding to those pieces of metadata is, where the data corresponding to the n pieces of metadata is cold data.
In this implementation, it is taken into account that hot data in a storage system is likely to be modified, so the values in the metadata of hot data are likely to change. If the metadata of hot data were also compressed by processing the user data, compression efficiency would be low. The problem is more pronounced where user data is written in append-write or redirect-on-write (ROW) mode, because every time user data is modified, the modified content is stored at a new actual address. For example, in such a scenario, the value in the metadata of some user data (which is hot data) may have just been compressed when the actual address of that user data changes because the data was modified, which makes the compression of that value pointless. Therefore, in this implementation, n pieces of metadata whose corresponding data is cold data are selected from the multiple pieces of metadata contained in the metadata set, and the n target values of those n pieces of metadata are then compressed according to the above method. Only values corresponding to cold data are processed and compressed, and values corresponding to hot data are not, which improves the efficiency of metadata compression. The metadata set may be any set that includes multiple pieces of metadata. For example, where the n pieces of metadata are metadata in a first storage layer (which may be any storage layer) of the LSM tree, the metadata set may be a set of metadata included in the storage layer above the first storage layer before the n pieces of metadata are merged into the first storage layer. As another example, where the n pieces of metadata are metadata in the first storage layer of the LSM tree, the metadata set may be a set of multiple pieces of metadata in the first storage layer. This application does not limit the scope of the metadata set in practical applications.
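The selection of cold data can be illustrated with a short sketch. The application does not specify here how hotness is measured, so the sketch assumes a hypothetical per-key access count and a hypothetical `hot_threshold`; both are illustrative only.

```python
from typing import Dict, List, Tuple


def select_cold_metadata(
    metadata: List[Tuple[str, int]],
    access_counts: Dict[str, int],
    hot_threshold: int,
) -> List[Tuple[str, int]]:
    """Return the metadata whose data was accessed fewer than hot_threshold times.

    Keys missing from access_counts are treated as never accessed, i.e. cold.
    """
    return [
        (key, addr)
        for key, addr in metadata
        if access_counts.get(key, 0) < hot_threshold
    ]
```

Only the metadata returned by such a filter would then go through the processing and compression steps, so frequently rewritten data does not trigger wasted work.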
In one implementation, the n pieces of metadata are metadata in a first storage layer of an LSM tree. The LSM tree is used to store metadata and includes multiple storage layers, and the multiple storage layers include the first storage layer.
In this implementation, the metadata compression method provided in this application can be applied to any storage layer of the LSM tree, so that the values of the metadata in that storage layer are compressed.
In one implementation, the keyword and the value are stored in two separate data entries.
In this implementation, because the keyword and the value are stored in two separate data entries, modifying the data entry that stores the value (called the value data entry) does not affect the keyword data entry. After the values in the metadata are compressed to obtain a compressed value, only the value data entry needs to be updated with the compressed value; the keyword data entry is not affected and does not need to be processed, which improves the efficiency of metadata compression.
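A minimal sketch of this separation follows. The classes `KeyEntry` and `ValueEntry` and the `(start, count)` compressed form are assumptions made for illustration; the entry layouts actually used by the application are described with reference to its figures and may differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class KeyEntry:
    keys: List[str]  # keywords, kept in a fixed order


@dataclass
class ValueEntry:
    raw: Optional[List[int]] = None               # uncompressed addresses
    compressed: Optional[Tuple[int, int]] = None  # assumed (start, count) form

    def address_at(self, index: int) -> int:
        """Return the address for the keyword at the given position."""
        if self.compressed is not None:
            start, count = self.compressed
            if not 0 <= index < count:
                raise IndexError(index)
            return start + index
        return self.raw[index]


def lookup(keys: KeyEntry, values: ValueEntry, key: str) -> int:
    """Find the keyword's position, then resolve it in the value entry."""
    return values.address_at(keys.keys.index(key))
```

Replacing `raw` with `compressed` changes only the value entry; the key entry is read but never rewritten, which is the point of storing the two separately.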
In one implementation, the method further includes: detecting the amount of data change in a metadata set, where the metadata set records the metadata of multiple pieces of data. Acquiring the n pieces of metadata includes: after determining that the amount of data change in the metadata set exceeds a change threshold, acquiring the n pieces of metadata included in the metadata set.
In this implementation, it is taken into account that as the storage system runs, new metadata keeps being stored in the metadata set even if the values in the metadata were compressed earlier. The amount of data change in the metadata set is therefore detected, and once it exceeds the change threshold, acquisition of the n pieces of metadata in the metadata set is triggered so that their values can be compressed according to the method of this application. In this way, the metadata in the metadata set is compressed after new metadata has been stored in it, which reduces the storage resources occupied by the metadata set. The metadata set refers to any set that includes multiple pieces of metadata.
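The trigger described above can be sketched as a simple change counter. How the amount of data change is measured is not specified in this passage, so the sketch assumes it is the number of writes since the last compression; the class and method names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class MetadataSet:
    """Toy metadata set that counts changes since the last compression pass."""

    def __init__(self, change_threshold: int) -> None:
        self.entries: Dict[str, int] = {}
        self.change_threshold = change_threshold
        self.changes = 0

    def put(self, key: str, address: int) -> None:
        """Insert or update one piece of metadata and count it as a change."""
        self.entries[key] = address
        self.changes += 1

    def take_batch_if_due(self) -> Optional[List[Tuple[str, int]]]:
        """Return the metadata to compress once the change count exceeds
        the threshold; otherwise return None."""
        if self.changes <= self.change_threshold:
            return None
        self.changes = 0
        return sorted(self.entries.items())
```

A caller would invoke `take_batch_if_due()` after writes and run the processing and compression steps only when it returns a batch.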
In one implementation, the method further includes: acquiring the degree of dispersion of the actual addresses of the data corresponding to the n pieces of metadata. Processing the m pieces of data corresponding to at least some of the n pieces of metadata includes: after determining that the degree of dispersion is greater than a dispersion threshold, processing the m pieces of data corresponding to at least some of the n pieces of metadata.
In this implementation, it is taken into account that when the degree of dispersion of the actual addresses of the data corresponding to the n pieces of metadata is small, the n actual addresses already have some regularity, so the values in the metadata can be compressed to some extent without processing the data. The m pieces of data corresponding to at least some of the n pieces of metadata are therefore processed only when the degree of dispersion is determined to be sufficiently large. In other words, when the degree of dispersion is small, the data need not be processed and the values of the metadata can be compressed directly. This reduces the amount of data processed during metadata compression, reduces write amplification, and improves compression efficiency.
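The passage does not define how the degree of dispersion is computed. As one assumed measure, the sketch below uses the fraction of adjacent address pairs (after sorting) that are not consecutive; the threshold value and function names are likewise hypothetical.

```python
from typing import List


def dispersion(addresses: List[int]) -> float:
    """Fraction of adjacent pairs, in sorted order, that are not consecutive.

    Returns 0.0 for fully contiguous addresses and 1.0 when no two
    neighbours are consecutive. Requires at least two addresses (n > 1).
    """
    ordered = sorted(addresses)
    breaks = sum(1 for a, b in zip(ordered, ordered[1:]) if b - a != 1)
    return breaks / (len(ordered) - 1)


def should_process_data(addresses: List[int], dispersion_threshold: float) -> bool:
    """Process (for example, migrate) the data only when dispersion is high."""
    return dispersion(addresses) > dispersion_threshold
```

When `should_process_data` returns False, the values would be compressed as they are, which avoids the extra writes caused by moving data.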
In one implementation, the method may be applied to a centralized storage system. Specifically, the method may be executed by an engine in the centralized storage system.
In one implementation, the method may be applied to a distributed storage system. Specifically, the distributed storage system includes multiple storage servers, and the method may be executed by one or more of the multiple storage servers.
In a second aspect, the present application provides a metadata compression device. The metadata compression device may be a hardware device used to manage metadata in a storage system. For example, the metadata compression device may be an engine in a centralized storage system or part of the hardware in the engine, or it may be a storage server in a distributed storage system or part of the hardware in the storage server. Specifically, the metadata compression device may include: an acquisition unit, configured to acquire n pieces of metadata, where each piece of metadata includes a key-value pair, the key-value pair includes a keyword and a value, the keyword indicates the identifier of the data corresponding to the metadata, the value indicates the actual address at which the data is stored, and n is a positive integer greater than 1; a processing unit, configured to process m pieces of data corresponding to at least some of the n pieces of metadata to obtain n target values, corresponding to the n pieces of metadata, that conform to a set rule, where m is a positive integer less than or equal to n; and a compression unit, configured to compress the n target values.
In one implementation, the n actual addresses indicated by the n target values that conform to the set rule are contiguous.
In one implementation, the processing unit being configured to process the m pieces of data corresponding to the at least some metadata to obtain the n target values, corresponding to the n pieces of metadata, that conform to the set rule includes: the processing unit is specifically configured to migrate the m pieces of data so that the data corresponding to the n pieces of metadata is stored in a storage space whose actual addresses are contiguous; and the processing unit is specifically configured to save the actual addresses at which the n pieces of data are stored in the contiguous storage space as the n target values.
In one implementation, the processing unit is further configured to select the n pieces of metadata from multiple pieces of metadata according to how hot or cold the data corresponding to the multiple pieces of metadata is, where the data corresponding to the n pieces of metadata is cold data.
In one implementation, the n pieces of metadata are metadata in a first storage layer of an LSM tree. The LSM tree is used to store metadata and includes multiple storage layers, and the multiple storage layers include the first storage layer.
In one implementation, the keyword and the value are stored in two separate data entries.
In one implementation, the processing unit is further configured to detect the amount of data change in a metadata set, where the metadata set records the metadata of multiple pieces of data. The acquisition unit being configured to acquire the n pieces of metadata includes: the acquisition unit is specifically configured to acquire the n pieces of metadata included in the metadata set after determining that the amount of data change in the metadata set exceeds a change threshold.
In one implementation, the acquisition unit is further configured to acquire the degree of dispersion of the actual addresses of the data corresponding to the n pieces of metadata. The processing unit being configured to process the m pieces of data corresponding to at least some of the n pieces of metadata includes: the processing unit is specifically configured to process the m pieces of data corresponding to at least some of the n pieces of metadata after determining that the degree of dispersion is greater than a dispersion threshold.
In one implementation, the metadata compression device is located in an engine in a centralized storage system.
In one implementation, the metadata compression device is located in a storage server in a distributed storage system.
In a third aspect, a storage device is provided, including a memory and a processor. The memory is configured to store computer instructions, and the processor is configured to call the computer instructions from the memory and run them to implement the method provided in the first aspect or in any implementation of the first aspect.
In a fourth aspect, a storage system is provided, including an engine and multiple hard disks. The hard disks are configured to store data, and the engine is configured to execute the method provided in the first aspect or in any implementation of the first aspect. Specifically, the storage system may be a centralized storage system.
In a fifth aspect, a storage system is provided, including multiple storage servers. The storage servers are configured to store data, and a first server among the storage servers is configured to execute the method provided in the first aspect or in any implementation of the first aspect. Specifically, the storage system may be a distributed storage system, and the first server may be a storage server in the distributed storage system that has the function of managing metadata.
In a sixth aspect, a chip is provided, including a memory and a processor. The memory is configured to store computer instructions, and the processor is configured to call the computer instructions from the memory and run them to implement the method provided in the first aspect or in any implementation of the first aspect.
In a seventh aspect, a computer-readable storage medium is provided. The storage medium stores a computer program, and when the computer program is executed by a processor, the method provided in the first aspect or in any implementation of the first aspect is implemented.
In an eighth aspect, a computer program product is provided. The computer program product includes instructions, and when the instructions are run on a processor, the method provided in the first aspect or in any implementation of the first aspect is implemented.
For the beneficial effects of the second to eighth aspects, refer to the beneficial effects of the first aspect and of each implementation of the first aspect; they are not repeated here.
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of a storage system provided by the present application;
FIG. 2 is a schematic flowchart of writing data to a storage system provided by the present application;
FIG. 3 is a schematic flowchart of a metadata compression method provided by the present application;
FIG. 4A is a first schematic flowchart of merging metadata from layer L1 to layer L2 of an LSM tree provided by the present application;
FIG. 4B is a second schematic flowchart of merging metadata from layer L1 to layer L2 of an LSM tree provided by the present application;
FIG. 5A is a first schematic diagram of data migration provided by the present application;
FIG. 5B is a second schematic diagram of data migration provided by the present application;
FIG. 6A is a third schematic diagram of data migration provided by the present application;
FIG. 6B is a fourth schematic diagram of data migration provided by the present application;
FIG. 7 is a first schematic structural diagram of a keyword data entry and a value data entry provided by the present application;
FIG. 8 is a second schematic structural diagram of a keyword data entry and a value data entry provided by the present application;
FIG. 9 is a third schematic structural diagram of a keyword data entry and a value data entry provided by the present application;
FIG. 10 is a first schematic structural diagram of a metadata compression device provided by the present application;
FIG. 11 is a second schematic structural diagram of a metadata compression device provided by the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below with reference to the drawings in the embodiments of the present application. To describe the technical solutions clearly, the embodiments use words such as "first" and "second" to distinguish between identical or similar items that have basically the same function and effect. Those skilled in the art will understand that words such as "first" and "second" do not limit quantity or execution order, and do not require the items to be different. In addition, in the embodiments of the present application, words such as "exemplary" or "for example" are used to indicate an example, illustration, or explanation. Any embodiment or design described as "exemplary" or "for example" should not be construed as more preferred or more advantageous than other embodiments or designs. Rather, such words are intended to present related concepts in a concrete manner for ease of understanding.
To facilitate understanding of the embodiments of the present application, the application scenarios of the technical solutions provided by the embodiments are introduced first:
Exemplarily, FIG. 1 is a schematic diagram of a network architecture provided by an embodiment of the present application. In the application scenario shown in FIG. 1, user data is accessed by running an application program. The computer running the application program may be referred to as an "application server". The application server 100 may be a physical machine or a virtual machine, and includes, but is not limited to, a desktop computer, a server, a notebook computer, or a mobile device. The application server 100 accesses the storage system 120 through the switch 110 to access user data. However, the switch 110 is optional, and the application server 100 may also communicate with the storage system 120 directly over a network. Alternatively, the switch 110 may be replaced with an Ethernet switch, an InfiniBand switch, a RoCE (RDMA over Converged Ethernet) switch, or the like.
The storage system 120 is a device or device cluster for storing user data. Specifically, in practical applications the storage system 120 may be a centralized storage system. A centralized storage system is characterized by a single unified entry point: all data from external devices, such as application servers, must pass through this entry point.
As shown in FIG. 1, the entry point of the centralized storage system may be the engine 121 of the centralized storage system. The engine 121 may include one or more controllers; FIG. 1 uses one controller 122 as an example. When the engine 121 contains multiple controllers, the controllers can back each other up through mirror channels. When one controller fails, another controller can take over its services, so that a hardware failure does not make the entire storage system unavailable.
In addition, the engine 121 may further include a front-end interface 125 and a back-end interface 126. The front-end interface 125 is used to communicate with the application server 100 so as to provide storage services for the application server 100. The back-end interface 126 is used to communicate with the hard disk 134 so as to expand the capacity of the storage system. Through the back-end interface 126, the engine 121 can connect to more hard disks 134, forming a very large storage resource pool.
In addition, the controller 122 may include a processor 123 and a memory 124. The processor 123 may be a central processing unit (CPU), which processes data access requests from outside the storage system (for example, from an application server or another storage system) as well as requests generated inside the storage system. Exemplarily, when the CPU 123 receives write data requests sent by the application server 100 through the front-end interface 125, it temporarily stores the user data carried in these write data requests in the memory 124. When the total amount of user data in the memory 124 reaches a certain threshold, the CPU 123 sends the user data stored in the memory 124 to the hard disk 134 through the back-end interface 126 for persistent storage.
The memory 124 is an internal memory that exchanges data directly with the processor. It can be read and written at any time at high speed, and serves as temporary data storage for the operating system or other running programs. The memory 124 may include multiple types of memory; for example, it may be a random access memory or a read-only memory (ROM). The random access memory may be, for example, a dynamic random access memory (DRAM) or a storage class memory (SCM). DRAM is a semiconductor memory and, like most random access memory (RAM), is a volatile memory device. SCM is a hybrid storage technology that combines characteristics of traditional storage devices and memory: it provides faster reads and writes than a hard disk, but is slower than DRAM and costs less than DRAM. DRAM and SCM are only examples in this embodiment; the memory may also include other random access memories, such as a static random access memory (SRAM). The read-only memory may be, for example, a programmable read-only memory (PROM) or an erasable programmable read-only memory (EPROM). In addition, the memory 124 may be a dual in-line memory module (DIMM), that is, a module composed of DRAM, or may be a solid state disk (SSD). In practical applications, the controller 122 may be configured with multiple memories 124 and with memories 124 of different types. This embodiment does not limit the quantity or type of the memory 124. Furthermore, the memory 124 may be configured with a power-protection function, which means that the data stored in the memory 124 is not lost when the system loses power and is powered on again. Memory with a power-protection function is referred to as non-volatile memory.
It should be noted that FIG. 1 shows only one engine 121. In practical applications, however, the storage system may include two or more engines 121, with redundancy or load balancing among them. In addition, in one implementation, the engine 121 may include hard disk slots. In this case the hard disk 134 can be deployed directly in the engine 121, and the back-end interface 126 is an optional configuration: when the storage space of the system is insufficient, more hard disks or disk enclosures can be connected through the back-end interface 126.
It should also be noted that FIG. 1 provides the structure of a centralized storage system only as an example. In other application scenarios, the storage system 120 may consist of multiple independent storage servers that can communicate with each other. Each storage server may include hardware components such as a processor, a memory, a network card, and a hard disk. The processor and the memory provide computing resources. The processor processes data access requests from outside the storage server. The memory is an internal memory that exchanges data directly with the processor; it can be read and written at any time at high speed, and serves as temporary data storage for the operating system or other running programs. The hard disk provides storage resources, for example for storing data, and may be a magnetic disk or another type of storage medium, such as a solid state disk or a shingled magnetic recording disk. The storage server may also include a network card for communicating with the application server.
While the storage system 120 is running, the actual addresses of the storage space provided by the hard disk 134 are generally not exposed directly to the application server 100. Instead, the storage system 120 stores metadata that records the actual addresses of user data. When the application server 100 writes user data to the hard disk 134, the metadata of that user data is added to a metadata file to record the actual address of the data. When the application server 100 needs to read user data stored in the hard disk 134 of the storage system 120, the actual address of the user data is determined by looking up the metadata of the user data in that metadata file.
It should be noted that, to distinguish metadata from the data it describes, the embodiments of the present application refer to the data described by metadata as "user data". User data can be understood as data that the application server stores in the storage system in order to provide its services, whereas metadata is data that describes other data, here the user data. Metadata includes, but is not limited to, the actual address at which the user data is stored, the mapping between logical addresses and actual addresses, and attributes of the user data. In a specific implementation, user data may also be called "data" or another name, which is not limited in the embodiments of the present application.
Exemplarily, when the application server 100 reads user data from the storage system 120, the application server 100 may send to the storage system 120 a read data request carrying the identifier of the user data. The identifier may be, for example, the logical address of the user data as used in the application server 100. After the storage system 120 receives the read data request, the CPU 123 uses the identifier to look up the actual address of the user data in the metadata file stored in the memory 124 or the hard disk 134. The actual address may be a bottom-layer physical address or an intermediate-layer logical address of the user data in the storage system 120. The CPU 123 then reads the user data at that actual address on the hard disk 134 through the back-end interface 126 and returns it to the application server 100 through the front-end interface 125.
Further, the metadata records the mapping between the identifier of the user data and the actual address of the user data. Specifically, the mapping may be stored as a key-value pair (KV pair). As shown in FIG. 1, the memory 124 stores a metadata file that contains the metadata. In the metadata file, the identifier of the user data (identifiers 1 to 5 in the figure) serves as the keyword (key) of a key-value pair, and the actual address of the data (addresses 1 to 5 in the figure) serves as the value of the key-value pair, so that the key-value pair establishes the mapping between the identifier of the user data and its actual address. It should be noted that FIG. 1 shows the key-value pairs in the metadata as a list only as an example. In practical applications the key-value pairs may also be stored in other forms (for example, in a tree structure), and the present application does not limit the storage form of the key-value pairs.
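The lookup described above can be illustrated with a minimal sketch. It is not the implementation of the present application: the dictionary `metadata_file`, the dictionary `hard_disk`, the function `read_user_data`, and all identifiers and addresses are hypothetical stand-ins for the metadata file, the hard disk 134, and the read path.

```python
# Hypothetical metadata file: key = identifier of user data, value = actual address.
metadata_file = {
    "identifier-1": 0x1000,
    "identifier-2": 0x5000,
    "identifier-3": 0x2000,
}

# Hypothetical stand-in for hard disk 134: actual address -> stored user data.
hard_disk = {
    0x1000: b"user data 1",
    0x5000: b"user data 2",
    0x2000: b"user data 3",
}


def read_user_data(identifier):
    """Resolve the identifier to an actual address, then read that address."""
    actual_address = metadata_file.get(identifier)
    if actual_address is None:
        return None  # no metadata recorded for this identifier
    return hard_disk[actual_address]
```

The application server only ever supplies the identifier; the actual address stays internal to the storage system.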
Key-value pair storage is representative of non-relational databases: it abandons the strict field structure of data tables and the relationship constraints between tables that characterize relational databases. Because data is stored as key-value correspondences under a simplified data model, key-value pair storage has the following advantages. First, high scalability: without a strict field structure or relationship constraints between tables, key-value pair storage makes it easy to deploy distributed applications across multiple servers, which improves the scalability of the whole system and makes it more convenient and flexible. Second, mass storage and high throughput suited to cloud computing: key-value pair storage meets users' flexible scalability requirements in cloud computing environments well. As a result, key-value pair storage is increasingly becoming a mainstream storage method.
Further, to facilitate metadata management, metadata is usually stored in a structured form such as a binary search tree, a balanced tree (B tree), a B+ tree, or a log-structured merge tree (LSM tree). The following uses the LSM tree as an example to describe the storage structure of metadata:
The LSM tree is one of the storage structures commonly used in log-based database systems, and it is a multi-layer structure. When an LSM tree is used to manage metadata, it is stored mainly in memory. In some scenarios, the metadata in all or some of the nodes of the LSM tree may also be held temporarily on the hard disk; when the metadata in such a node needs to be read, it is copied into memory.
Exemplarily, consider a scenario in which the storage system 120 uses an LSM tree structure to manage metadata. As shown in FIG. 2, the hard disk 134 includes an ordered string table area 1341 for storing ordered string tables (sorted string table, SSTable) and a data storage area 1342 for storing user data. The ordered string table area 1341 and the data storage area 1342 are generally logically divided storage areas of the hard disk 134.
When the storage system 120 receives a write data request for writing user data X into the storage system, as shown in FIG. 2, the storage system 120 may write data X into the data storage area 1342 and write the metadata of data X (including the key-value pair corresponding to data X) into the memory table (memtable) 1241 in sorted order. In addition, although not shown in the figure, in practical applications the storage system 120 may, after receiving the write data request for data X, also record the write operation in a log file through write-ahead logging (WAL) for fault recovery.
The memory table 1241 is located in the memory 124. When the memory table 1241 exceeds a certain threshold, it is frozen in memory and becomes the immutable memory table (immutable memtable) 1242. To avoid blocking write operations, a new memory table is then created in the memory of the storage system 120 to continue serving writes. The immutable memory table 1242 is subsequently written in batches into the ordered string table area 1341, which is located on one or more hard disks 134. The ordered string table area 1341 has a multi-layer storage structure, for example the L0, L1, and L2 layers in FIG. 2. Generally, the higher a storage layer is, the smaller its storage space, and the lower a storage layer is, the larger its storage space. Each storage layer may include one or more SSTables, and the one or more SSTables may further be stored as structured data. When the immutable memory table 1242 is written into the ordered string table area 1341, it is first written into the top storage layer, the L0 layer in FIG. 2. When the amount of data in the L0 layer reaches a threshold, the SSTables in the L0 layer are merged into the L1 layer; when the amount of data in the L1 layer reaches a threshold, the SSTables in the L1 layer are merged into the L2 layer, and so on. In this way old metadata is continually deleted and new data can continually be written. The above example describes an LSM tree for storing metadata with three storage layers; it can be understood that in practical applications the LSM tree may consist of more or fewer storage layers, which is not limited in the present application.
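The flow from memory table to immutable memory table to the L0, L1, and L2 layers can be sketched as a toy model. This is only an illustration under simplifying assumptions: the class name `TinyLSM`, the entry-count thresholds, and the modelling of each layer as a single dictionary (rather than a set of SSTables on the hard disk) are all hypothetical.

```python
class TinyLSM:
    """Toy model of memtable -> immutable memtable -> L0 -> L1 -> L2."""

    def __init__(self, memtable_limit=2, level_limits=(4, 8)):
        self.memtable = {}          # mutable table that receives new metadata
        self.immutable = None       # frozen memtable waiting to be flushed
        self.levels = [{}, {}, {}]  # L0, L1, L2, each modelled as one table
        self.memtable_limit = memtable_limit
        self.level_limits = level_limits  # thresholds for L0 and L1

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._freeze_and_flush()

    def _freeze_and_flush(self):
        # Freeze the current memtable and start a new one so writes are not blocked.
        self.immutable = self.memtable
        self.memtable = {}
        # Flush the frozen table into the top layer (L0) in key order.
        self.levels[0].update(sorted(self.immutable.items()))
        self.immutable = None
        self._merge_down()

    def _merge_down(self):
        # When a layer reaches its threshold, merge it into the next layer.
        # Entries from the upper layer are newer, so they overwrite older ones.
        for i, limit in enumerate(self.level_limits):
            if len(self.levels[i]) >= limit:
                self.levels[i + 1].update(self.levels[i])
                self.levels[i] = {}
```

With the default thresholds, four `put` calls trigger two flushes into L0, after which L0 reaches its threshold and is merged into L1.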
Continuing with the example in which the storage system 120 receives the write data request for user data X: to locate user data X, the storage system 120 first searches the memory table 1241 for the metadata of user data X. If the metadata is found, the storage system 120 accesses the data storage area 1342 according to the actual address in the metadata. If the memory table 1241 does not contain the metadata of user data X, the search proceeds downward in order. The storage system 120 first searches the immutable memory table 1242; if the metadata is not there, it searches the L0 layer; if the metadata is not in the L0 layer, it searches the L1 layer, and so on, until the metadata of user data X is found. The storage system 120 then accesses the data storage area 1342 according to the actual address in the metadata.
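The search order described above can be sketched as follows. The function `find_metadata` and the use of dictionaries for the tables are assumptions made for illustration only.

```python
def find_metadata(key, memtable, immutable_memtable, levels):
    """Search the memtable first, then the immutable memtable, then L0, L1, ...

    Returns the value from the first table that contains the key, which is
    the most recent one, or None if no table contains it.
    """
    for table in (memtable, immutable_memtable, *levels):
        if table is not None and key in table:
            return table[key]
    return None


# Hypothetical state: X was rewritten recently, Y exists only in the L2 layer.
memtable = {"X": "address-new"}
immutable_memtable = None
levels = [{}, {"X": "address-old"}, {"Y": "address-y"}]
```

Because newer tables are consulted first, a stale entry for the same key in a lower layer is never returned.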
It can be seen that, in the above application scenarios, accessing the data in the storage system 120 requires a complete metadata index for storing and querying the metadata of each piece of user data on the hard disk. The keyword of a key-value pair is the identifier of the user data, and the value is the actual address at which the user data is stored. When the storage system 120 does not have, or has not enabled, a deduplication function, one or more of the logical unit number (LUN) ID, the snapshot number (snap ID), and the logical block address (LBA) may be used as the keyword to uniquely identify the user data. If the storage system 120 has a file system, the keyword may also be one or more of a version number (Version), a file name and an offset within the file, or a hash value of the file name and the offset within the file.
In this embodiment, to distinguish the address indicated by the value in the metadata from other addresses, the address that corresponds to the identifier of the user data in the metadata is called the "actual address". It can be understood that this name does not limit the type of the address. The identifier and the actual address of user data can be understood as relative to each other: the identifier is closer to the upper-layer application, and the actual address is closer to the underlying hardware. For example, the actual address may be the address of the user data in the logical chunk group that corresponds to it in the LUN, in which case the actual address may include the chunk group ID of the chunk group containing the user data and the offset of the user data within that chunk group. Alternatively, the actual address may be the address of the user data in a physical chunk on the hard disk, in which case it may include the physical chunk ID of the physical chunk containing the user data and the offset within that physical chunk. The actual address may also be the physical address at which the user data is stored on the hard disk; if the hard disk is a solid state disk, the physical address is the block ID and page ID at which the user data is located in the solid state disk. The present application does not limit the specific form of the identifier or the actual address of user data. For ease of description, this embodiment uses the case in which the value of the key-value pair is the physical address of the user data as an example.
Taking a key-value pair whose value is a physical address (physical_address) as an example, the size of the key-value pair depends on the number of bytes in fields such as the LUN ID, LBA, version number, and physical address. In this case a key-value pair is generally 24 to 32 bytes. With a data block size of 8K, the storage capacity occupied by metadata is roughly 0.3% of the storage capacity occupied by user data.
As the capacity of a storage system grows, the amount of metadata in it grows as well, which means that more memory and hard disk resources are needed to hold the metadata. Moreover, because the hot data in the storage system changes more often, metadata is swapped in and out of memory more often.
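The 0.3% figure can be checked with a short calculation. The sketch below rests on assumptions that are not stated in the source: that "8K" means 8192 bytes, that each data block has exactly one key-value pair, and that the field widths in `KV_LAYOUT` (a hypothetical layout) are 4 bytes for the LUN ID, 4 bytes for the snap ID, 8 bytes for the LBA, and 8 bytes for the physical address.

```python
import struct

# Hypothetical packed layout: LUN ID (4 bytes), snap ID (4 bytes),
# LBA (8 bytes), physical address (8 bytes). "<" disables padding.
KV_LAYOUT = "<IIQQ"
BLOCK_SIZE = 8 * 1024  # assumes "8K" means 8192 bytes of user data


def metadata_ratio(kv_bytes, block_bytes=BLOCK_SIZE):
    """Metadata size as a fraction of user data size, one pair per block."""
    return kv_bytes / block_bytes


kv_size = struct.calcsize(KV_LAYOUT)              # 24 bytes for this layout
low = round(metadata_ratio(24) * 100, 2)          # lower end of the range, in percent
high = round(metadata_ratio(32) * 100, 2)         # upper end of the range, in percent
```

Under these assumptions the ratio falls between about 0.29% and 0.39%, consistent with the approximate figure given above.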
To reduce the amount of metadata, one possible design uses prefix compression to compress the keys of the key-value pairs. Specifically, within a metadata index, the keys of multiple pieces of metadata (for example, the metadata in one storage layer of the LSM tree, or in one or more SSTables of a storage layer) usually share a common part (for example, the LUN ID or snap ID). The common part of these keys can therefore be extracted, and the key of each piece of metadata then records only the part that differs from the other keys. This compresses the metadata.
However, this design compresses only the key part of the key-value pair, while the value part usually accounts for 40% to 50% of the data volume of a key-value pair. The amount of data that this design can compress is therefore limited.
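As a minimal sketch of prefix compression, the keys below are hypothetical strings that combine a LUN ID, a snap ID, and an LBA; the key format and the function names are assumptions for illustration, not the encoding used by any particular storage system.

```python
import os


def prefix_compress(keys):
    """Extract the common prefix and keep only each key's differing suffix."""
    prefix = os.path.commonprefix(keys)
    return prefix, [key[len(prefix):] for key in keys]


def prefix_decompress(prefix, suffixes):
    """Rebuild the original keys from the shared prefix and the suffixes."""
    return [prefix + suffix for suffix in suffixes]


# Hypothetical keys that share the same LUN ID and snap ID.
keys = ["lun7:snap1:lba0010", "lun7:snap1:lba0011", "lun7:snap1:lba0020"]
prefix, suffixes = prefix_compress(keys)
```

Here the shared part is stored once and each key shrinks to two characters, but the values are left untouched, which is the limitation noted in the text.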
Regarding the above problem that metadata occupies too many resources in the storage system, the present application observes the following. The related art compresses only the key field of the key-value pair because the value field generally holds the actual address or the fingerprint of the user data. The actual address depends on the space allocated when the user data is written, so actual addresses usually have no regularity. The fingerprint of user data depends on the content of the user data itself, so it is also unlikely to have regularity. It is therefore difficult to compress the value part.
Accordingly, the embodiments of the present application provide a technical solution for metadata compression. In this solution, when the values that indicate the actual addresses of user data in multiple pieces of metadata have no regularity, the user data corresponding to these pieces of metadata can be processed, for example by migrating the user data to multiple consecutive actual addresses. The values in the metadata of the user data then become regular, which makes the values easier to compress.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings:
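The idea can be illustrated with a small sketch. The (base, stride, count) encoding below is an assumption chosen for illustration and is not stated to be the encoding used in the present application; the function names, addresses, and block size are likewise hypothetical.

```python
def migrate_to_contiguous(num_blocks, base_address, block_size):
    """Return the new actual addresses after moving blocks to a contiguous range."""
    return [base_address + i * block_size for i in range(num_blocks)]


def compress_values(addresses):
    """Encode evenly spaced addresses as (base, stride, count), else return None."""
    if len(addresses) < 2:
        return None
    stride = addresses[1] - addresses[0]
    if all(b - a == stride for a, b in zip(addresses, addresses[1:])):
        return addresses[0], stride, len(addresses)
    return None


def decompress_values(base, stride, count):
    """Rebuild the individual addresses from the compact form."""
    return [base + i * stride for i in range(count)]


scattered = [0x9000, 0x1000, 0x5000]  # values before migration: no regularity
migrated = migrate_to_contiguous(3, base_address=0x20000, block_size=0x2000)
```

Before migration the three values cannot be summarized; after migration they collapse into a single triple from which every address can be recovered.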
Specifically, the embodiments of the present application provide a metadata compression method. In some scenarios, the method may be implemented by the engine 121 in the storage system 120 in FIG. 1, and more specifically by a controller in the engine 121. In the controller, the central processing unit 123 invokes program instructions in a memory to perform the method. The memory may be the memory 124 in FIG. 1 or a cache located in the central processing unit 123. In other scenarios, the method may be implemented by other hardware in the storage system that manages metadata. For example, when the storage system is a distributed storage system, the method may be implemented by a storage server that has a metadata management function, or by part of the hardware in such a storage server.
The following describes the metadata compression method using the operation of the engine 121 in the scenarios of FIG. 1 and FIG. 2 as an example. As shown in FIG. 3, in the process of writing data to the storage system, the method includes:
S301. Receive a write data request from the application server 100.
The write data request carries the user data that is requested to be written into the storage system.
S302. According to the write data request, write the user data into the memory 124, and store the metadata of the user data in the memory table 1241.
The metadata includes a key-value pair. The keyword of the key-value pair indicates the identifier of the user data, and the value indicates the actual address at which the user data is stored. It should be noted that, for brevity, the keyword of the key-value pair included in the metadata is hereinafter referred to as the "keyword in the metadata", and the value of the key-value pair included in the metadata is referred to as the "value in the metadata". Unless otherwise stated, "keyword in the metadata" and "value in the metadata" are to be understood in this way, and this is not repeated below.
It should be noted that, as the engine 121 keeps receiving write data requests, more and more user data accumulates in the memory 124. When the user data accumulated in the memory 124 reaches a certain threshold, the engine 121 writes the user data in the memory 124 to the hard disk 134 for persistent storage. The address at which the user data is stored on the hard disk 134 is what this embodiment calls the actual address of the user data. In addition to the user data, a write data request received by the engine 121 includes the logical address of the user data. The logical address is the address presented to the application server 100 and is used by the application server 100 to access the user data. After storing the user data, the engine 121 may use the logical address, or a hash value of the logical address, as the keyword in the metadata, use the actual address at which the user data is stored on the hard disk 134 as the value in the metadata, and save the correspondence between the keyword and the value as a key-value pair.
S303. When the amount of metadata in the memory table 1241 exceeds a threshold, the engine 121 switches the content of the memory table 1241 to a non-modifiable memory table 1242 and generates a new memory table.
In practice, switching the content of the memory table 1241 to the non-modifiable memory table 1242 may mean that the engine 121 modifies an attribute of the memory table 1241 so that the memory table with the modified attribute (that is, the non-modifiable memory table) no longer receives new data.
S304. When the amount of data in the non-modifiable memory table 1242 exceeds a threshold, the engine 121 transfers the content of the non-modifiable memory table 1242 to the hard disk 134.
The threshold in S304 may be the same as or different from the threshold in S303.
Specifically, the engine 121 may transfer the content of the non-modifiable memory table 1242 to the top layer, the L0 layer, of the ordered string table area 1341. Then, when the amount of data in the L0 layer exceeds a threshold, the metadata in the L0 layer is merged into the L1 layer.
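Steps S303 and S304 can be sketched roughly as follows. This is a toy model under stated assumptions: the class name `Engine`, its attributes, and the use of entry counts as thresholds are all hypothetical, and the L0 layer is modeled as a plain list of sorted tables rather than on-disk files.

```python
class Engine:
    """Toy model of S303/S304; thresholds count entries, not bytes (assumption)."""

    def __init__(self, memtable_limit: int, immutable_limit: int):
        self.memtable_limit = memtable_limit
        self.immutable_limit = immutable_limit
        self.memtable = {}    # memory table: accepts new metadata
        self.immutable = {}   # non-modifiable content waiting to be flushed
        self.l0 = []          # L0 layer: each table is a keyword-sorted list of pairs

    def put(self, keyword, value):
        self.memtable[keyword] = value
        if len(self.memtable) > self.memtable_limit:
            # S303: freeze the current content and start a new memory table.
            self.immutable.update(self.memtable)
            self.memtable = {}
            if len(self.immutable) > self.immutable_limit:
                # S304: transfer the frozen content to the L0 layer on disk.
                self.l0.append(sorted(self.immutable.items()))
                self.immutable = {}


engine = Engine(memtable_limit=2, immutable_limit=4)
for k in (6, 5, 4, 3, 2, 1):
    engine.put(k, k * 10)
```

With these limits, the third `put` freezes three entries, and the sixth `put` freezes three more and flushes all six to L0 as one sorted table.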
S305. According to how hot or cold the user data corresponding to the metadata in the L1 layer is, merge the metadata into the corresponding metadata file in the L2 layer.
The L2 layer includes two metadata files, one storing the metadata corresponding to cold data and the other storing the metadata corresponding to hot data. For ease of description, the former is referred to below as the cold metadata file and the latter as the hot metadata file. In practice, the two metadata files in the L2 layer may each include different ordered string tables of the L2 layer. Each metadata file may be stored in a tree structure to facilitate lookup.
In this embodiment, how hot or cold a piece of user data is can be understood as how likely the user data is to be modified (the likelihood may be reflected by parameters such as the historical modification frequency or the historical number of modifications of the user data). The colder the user data, the less likely it is to be accessed; the hotter the user data, the more likely it is to be accessed. In addition, in this embodiment, cold data can be understood as user data whose likelihood of being modified is below a certain threshold, and hot data as user data whose likelihood of being modified is above a certain threshold.
For example, as shown in FIG. 4A, the L1 layer may include five ordered string tables, each storing metadata for a different keyword range. Ordered string table 1 stores metadata in the keyword range key1-key20; in FIG. 4A it currently stores the metadata corresponding to key1, key3, key4, key7, key12, key15, key16, key18, key19 and key20. Ordered string table 2 stores metadata in the keyword range key21-key40, ordered string table 3 stores metadata in the keyword range key41-key60, ordered string table 4 stores metadata in the keyword range key61-key80, and ordered string table 5 stores metadata in the keyword range key81-key100.
When the metadata in the L1 layer is merged into the L2 layer, taking ordered string table 1 as an example, the metadata in ordered string table 1 is processed in turn and the hot or cold degree of the corresponding user data is determined. When the user data is determined to be cold data, the metadata of the user data is merged into the cold metadata file; for example, in FIG. 4A the metadata corresponding to key1, key3, key12 and key15 (the keys corresponding to cold data) is merged into the cold metadata file. When the user data is determined to be hot data, the metadata of the user data is merged into the hot metadata file; for example, in FIG. 4A the metadata corresponding to key4, key7, key16, key18, key19 and key20 (the keys corresponding to hot data) is merged into the hot metadata file. By analogy, the metadata in each ordered string table in the L1 layer can be merged into the L2 layer.
It should be noted that FIG. 4A is only an example in which the L1 layer includes five ordered string tables and each ordered string table stores metadata for a range of 20 keywords. In practice, this embodiment does not limit the number of ordered string tables in each storage layer of the LSM tree, the keyword range corresponding to each ordered string table, or the number of pieces of metadata in each ordered string table.
Based on the example of FIG. 4A, and as shown in FIG. 4B, S305 may specifically include:
S3051. When the amount of data in the L1 layer exceeds a threshold size or a threshold count, a merge from the L1 layer into the L2 layer is triggered.
S3052. For each ordered string table in the L1 layer, traverse and read each keyword in the ordered string table in turn.
For example, when an ordered string table uses a binary tree data structure, post-order traversal combined with merge sorting may be used to visit, in turn, the keywords on the leftmost path of the binary tree corresponding to each ordered string table, so that each keyword in the ordered string tables is traversed and read.
S3053 is performed for each keyword.
S3053. Query how hot or cold the user data corresponding to the keyword is.
If the user data is cold data, S3054 is performed; if the user data is hot data, S3055 is performed.
In one implementation, the hot or cold degree of the user data may be determined according to the IO type corresponding to the user data. Specifically, user data written with sequential-write IO is usually colder, and user data written with random-write IO is usually hotter. Therefore, user data written with sequential-write IO may be treated as cold data, and user data written with random-write IO may be treated as hot data.
S3054. Merge the metadata of the cold data into the cold metadata file.
To make metadata easier to look up, the cold metadata file may be divided into multiple subfiles. Each subfile may be an ordered string table, and each ordered string table stores metadata for a different keyword range. For example, as shown in FIG. 4A, the cold metadata file in the L2 layer includes 10 SSTables, ordered string table 6 to ordered string table 15, which respectively store the metadata of the cold metadata file in the keyword ranges key1-key10, key11-key20…key99-key100.
S3055. Merge the metadata of the hot data into the hot metadata file.
As with the cold metadata file, the hot metadata file may also be divided into multiple subfiles. Each subfile may be an SSTable, and each SSTable stores metadata for a different keyword range. For example, as shown in FIG. 4A, the hot metadata file in the L2 layer includes 10 SSTables, ordered string table 16 to ordered string table 25, which respectively store the metadata of the hot metadata file in the keyword ranges key1-key10, key11-key20…key99-key100.
Through S3051-S3055, the metadata in the L1 layer is eventually merged into the different metadata files in the L2 layer.
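The flow of S3052-S3055 can be sketched as follows, using the keys of ordered string table 1 from FIG. 4A. This is a minimal sketch assuming the IO-type rule described above (sequential writes are cold, random writes are hot); the names `is_cold`, `merge_l1_into_l2` and `io_type_of`, the dictionary representation of the metadata files, and the placeholder address strings are all assumptions for illustration.

```python
def is_cold(io_type: str) -> bool:
    # Assumed rule: sequential-write IO marks cold data, random-write IO hot data.
    return io_type == "sequential"


def merge_l1_into_l2(l1_tables, io_type_of, cold_file, hot_file):
    """Route each piece of L1 metadata to the cold or hot metadata file."""
    for table in l1_tables:                     # S3052: traverse every table
        for keyword, value in table:
            if is_cold(io_type_of[keyword]):    # S3053: query hot/cold degree
                cold_file[keyword] = value      # S3054: merge into cold file
            else:
                hot_file[keyword] = value       # S3055: merge into hot file


# Ordered string table 1 of FIG. 4A; the address strings are placeholders.
table1 = [(k, f"addr{k}") for k in (1, 3, 4, 7, 12, 15, 16, 18, 19, 20)]
io_type_of = {k: "sequential" for k in (1, 3, 12, 15)}
io_type_of.update({k: "random" for k in (4, 7, 16, 18, 19, 20)})

cold_file, hot_file = {}, {}
merge_l1_into_l2([table1], io_type_of, cold_file, hot_file)
```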
As described above, the actual addresses of user data usually follow no pattern and are hard to compress, so a large amount of storage space is needed to store the metadata of the user data. To reduce the storage space occupied by metadata, in this embodiment the user data corresponding to the metadata in the cold metadata file is processed, for example migrated so that the actual addresses at which the user data is stored change, so that the actual addresses of multiple pieces of user data show a certain regularity; that is, the values in the metadata conform to a set pattern. The values in the metadata can then be compressed according to that pattern, which reduces the storage space occupied by the metadata. Specifically, to make the values in the metadata of the cold metadata file conform to the set pattern, the method provided in this embodiment further includes:
S306. Traverse the subfiles of the cold metadata file in turn, and read the values in the metadata included in each subfile.
For the values in the metadata included in each subfile, all or some of steps S307-S311 are performed.
It should be noted that, when only some of the subfiles of the cold metadata file need to be compressed, only those subfiles may be traversed, and the subsequent steps performed on them.
Specifically, as shown for example in FIG. 4A, the L2 layer may include 10 subfiles, ordered string table 6 to ordered string table 15, which respectively store the metadata in the keyword ranges key1-key10, key11-key20…key99-key100. The subsequent steps may be performed on each of the 10 subfiles to compress all of them, or on only some of the 10 subfiles to compress those subfiles.
S307. Determine the address dispersion corresponding to the subfile according to the values in the metadata included in the subfile.
When the address dispersion corresponding to the subfile exceeds a dispersion threshold, S308 is performed; when it does not exceed the dispersion threshold, the method returns to S306 to traverse the next subfile.
The address dispersion corresponding to a subfile can be understood as the degree to which the actual addresses of the user data corresponding to the metadata in the subfile are scattered.
For example, the subfile includes the metadata of n pieces of user data, and the n pieces of user data are stored at n actual addresses, where n is a positive integer. The more scattered the n actual addresses are, the higher the address dispersion corresponding to the subfile; the less scattered they are, the lower the address dispersion.
The degree to which the n actual addresses are scattered may be reflected by the number of patterns needed to describe them. For example, in a first case, m of the n actual addresses are contiguous, where m is a positive integer less than n, and the remaining n-m actual addresses are also contiguous; that is, two patterns describe the n actual addresses. In a second case, p of the n actual addresses are contiguous, where p is a positive integer less than n; another q actual addresses are contiguous, where q is a positive integer less than n; another (n-p-q-1) actual addresses are contiguous; and the one remaining actual address is not contiguous with any other actual address; that is, four patterns describe the n actual addresses. The address dispersion of the subfile in the first case is therefore lower than in the second case.
It should be noted that contiguous addresses are used here only as an example of a pattern. Other patterns are possible in practice: for example, every two adjacent actual addresses among the n actual addresses may be separated by storage space of the same size, or the size of the storage space between every two adjacent actual addresses may change in a regular way.
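One simple way to quantify the dispersion described above is to count how many contiguous runs are needed to cover the actual addresses. The sketch below does exactly that; the function name `count_contiguous_runs` and the use of the run count as the dispersion measure are assumptions for illustration, since the embodiment does not prescribe a formula.

```python
def count_contiguous_runs(addresses):
    """Return how many runs of consecutive addresses cover the given addresses."""
    ordered = sorted(addresses)
    if not ordered:
        return 0
    runs = 1
    for previous, current in zip(ordered, ordered[1:]):
        if current != previous + 1:
            # A gap starts a new run, that is, a new pattern.
            runs += 1
    return runs


first_case = [1, 2, 3, 10, 11]            # two contiguous groups
second_case = [1, 2, 5, 6, 9, 10, 20]     # three groups plus one isolated address
dispersion_first = count_contiguous_runs(first_case)
dispersion_second = count_contiguous_runs(second_case)
```

A subfile would then be selected for migration (S308) when this count exceeds the dispersion threshold.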
S308. Migrate the user data corresponding to the metadata in the subfile, so that the user data corresponding to the metadata in the subfile is stored in contiguous storage space.
Specifically, in one implementation, the user data may be migrated so that the user data corresponding to the metadata of the subfile is stored in a single block of contiguous storage space. In another implementation, the user data corresponding to the metadata of the subfile may be stored in segments across multiple blocks of contiguous storage space. The two implementations are described below.
In the first implementation, S308 may specifically include: migrating the user data corresponding to the metadata in the subfile, so that the user data corresponding to the metadata in the subfile is stored in a single block of contiguous storage space.
In one possible design, the user data corresponding to the subfile may be migrated to a contiguous range of free storage space.
For example, suppose the subfile includes the metadata of 5 pieces of user data which, as shown in FIG. 5A, are stored at address 1, address 3, address 4, address 8 and address 10. The data at these 5 addresses is read and migrated to the unused addresses 11 to 15, so that the storage space of the 5 pieces of user data becomes contiguous.
In another possible design, part of the user data corresponding to the subfile may be migrated to storage space contiguous with the other user data corresponding to the subfile, so that the user data corresponding to the metadata in the subfile is stored in a single block of contiguous storage space.
For example, suppose the subfile includes the metadata of 5 pieces of user data which, as shown in FIG. 5B, are stored at address 1, address 3, address 4, address 8 and address 10. The data at address 8 and address 10 is read and migrated to address 2 and address 5 respectively, so that the storage space of the 5 pieces of user data becomes contiguous.
In the second implementation, S308 may specifically include: migrating the user data corresponding to the metadata in the subfile, so that the user data corresponding to the metadata of the subfile is stored in segments across multiple blocks of contiguous storage space.
As in the first implementation, the second implementation may also include two possible designs:
In the first possible design, the user data corresponding to the subfile may be migrated to multiple ranges of contiguous storage space.
For example, suppose the subfile includes the metadata of 10 pieces of user data which, as shown in FIG. 6A, are stored at address 1, address 3, address 4, address 8, address 10, address 11, address 14, address 16, address 17 and address 20. The data at these 10 addresses is read and migrated to the unused addresses 21 to 25 and addresses 32 to 36, so that the storage space of user data 1 to user data 5 is contiguous and the storage space of user data 6 to user data 10 is contiguous.
In the second possible design, part of the user data corresponding to the subfile may be migrated to storage space contiguous with the other user data corresponding to the subfile, so that the user data corresponding to the metadata of the subfile is stored in segments across multiple blocks of contiguous storage space.
For example, suppose the subfile includes the metadata of 10 pieces of user data which, as shown in FIG. 6B, are stored at address 1, address 3, address 4, address 8, address 10, address 11, address 14, address 16, address 17 and address 20. User data 2 to user data 5 is migrated to storage space contiguous with user data 1, and user data 7 to user data 10 is migrated to storage space contiguous with user data 6, so that the storage space of user data 1 to user data 5 is contiguous and the storage space of user data 6 to user data 10 is contiguous.
S309. Update the values in the metadata of the migrated user data in the subfile according to the actual addresses of the user data after migration.
For example, the actual address after migration may be written into the metadata of the migrated user data as the value in that metadata.
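The first design of the first implementation (FIG. 5A), together with the metadata update of S309, can be sketched as follows. The disk is modeled as a dictionary from address to data and the subfile as a dictionary from keyword to actual address; these representations and the name `migrate_to_free_range` are assumptions for illustration only.

```python
def migrate_to_free_range(disk, metadata, free_start):
    """Move user data to consecutive free addresses and update the metadata values."""
    for i, keyword in enumerate(sorted(metadata)):
        old_address = metadata[keyword]
        new_address = free_start + i
        if new_address in disk:
            raise ValueError(f"address {new_address} is not free")
        disk[new_address] = disk.pop(old_address)   # S308: migrate the user data
        metadata[keyword] = new_address             # S309: update the value


# FIG. 5A example: 5 pieces of user data at addresses 1, 3, 4, 8 and 10.
disk = {1: "data1", 3: "data2", 4: "data3", 8: "data4", 10: "data5"}
metadata = {1: 1, 2: 3, 3: 4, 4: 8, 5: 10}   # keyword -> actual address
migrate_to_free_range(disk, metadata, free_start=11)
```

After the call, the five values in the metadata are the consecutive addresses 11 to 15, which is the precondition for the value compression in S310.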
S310. Compress the values in the metadata included in the subfile.
In the related art, the actual addresses of user data follow no pattern, so the actual address of the user data is usually stored as-is as the value in the metadata. In this embodiment, by contrast, the user data is stored at contiguous actual addresses, so the pattern in which the contiguous actual addresses change can be summarized and a compression algorithm reflecting that pattern can be generated; for example, the compression algorithm may be a first-order function of two variables. For example, machine learning may be used to summarize the pattern of the actual addresses of the user data and generate this function.
In this way, this embodiment only needs to persistently store the compression algorithm used and the compressed values corresponding to the actual addresses of the user data; the actual addresses themselves do not need to be stored. When the actual address of a piece of user data needs to be read, the compressed value corresponding to that actual address is used as the input of the compression algorithm, and the compression algorithm outputs the actual address of the user data.
For example, take 5 pieces of user data. After the user data is migrated so that the actual addresses of the 5 pieces of user data are contiguous, the 5 actual addresses may be 0x0000000100000000, 0x0000000100000001, 0x0000000100000002, 0x0000000100000003 and 0x0000000100000004. Here, "0x" indicates a hexadecimal number, the "00000001" in the middle is the disk ID, and the last 8 digits are the physical block number within the disk. After compression, the following information needs to be recorded in the metadata: the start keyword (start key): 0000000100000000; the number of pieces of data: 5; the compression algorithm: 1 (indicating that the compression algorithm is a first-order function with a slope of 1); and the compressed values of the values in the metadata: 0, 1, 2, 3, 4.
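The example above can be reproduced with a short sketch. It assumes the simplest reading of the scheme: the actual address equals the start value plus the slope times the compressed value. The function names `compress_values` and `decompress_value` and the dictionary layout of the compressed block are hypothetical.

```python
def compress_values(addresses, slope=1):
    """Compress addresses of the form start + slope * k into the small integers k."""
    start = addresses[0]
    compressed = []
    for address in addresses:
        delta = address - start
        if delta % slope != 0:
            raise ValueError("address does not fit the first-order function")
        compressed.append(delta // slope)
    return {"start": start, "count": len(addresses), "slope": slope, "values": compressed}


def decompress_value(block, index):
    """Recover the actual address from the compressed value at the given index."""
    return block["start"] + block["slope"] * block["values"][index]


# The 5 contiguous actual addresses from the example above.
addresses = [0x0000000100000000 + i for i in range(5)]
block = compress_values(addresses)
recovered = [decompress_value(block, i) for i in range(block["count"])]
```

Each 64-bit address is thus replaced by a small integer, while the start value and slope are stored once for the whole group.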
In addition, prefix compression may also be applied to the values in the metadata included in the subfile, in which a common prefix is extracted from the values in the metadata. The embodiments of this application do not limit the compression method used for the values in the metadata.
In one implementation, the method further includes:
S311. Compress the keywords in the metadata included in the subfile.
Specifically, prefix compression may be used: a common prefix (for example, a volume number (LUN ID), a subnet access point identifier (snap ID), and so on) is extracted from the keywords in the metadata, and the keywords are compressed accordingly. As another example, when the keywords in multiple pieces of metadata in the subfile include an offset part that changes linearly or approximately linearly, a first-order function may be used to express the relationship between the offset and the keyword index, and the offset part of a keyword is then computed directly from the keyword index and the function. In this case only the key coefficients and the order of the function need to be recorded, which compresses the keywords.
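Prefix compression of keywords can be sketched as follows. The keyword strings used here (a volume part, a snapshot-like part and an offset) are an invented format for illustration only; the embodiment does not define a keyword layout. The sketch relies on the standard-library function `os.path.commonprefix`, which computes a character-wise common prefix of a list of strings.

```python
import os


def prefix_compress(keywords):
    """Split keywords into one shared prefix and the per-keyword remainders."""
    prefix = os.path.commonprefix(keywords)
    return prefix, [k[len(prefix):] for k in keywords]


def prefix_decompress(prefix, suffixes):
    """Rebuild the original keywords from the prefix and the remainders."""
    return [prefix + s for s in suffixes]


# Hypothetical keyword format: "<volume>/<snapshot>/<offset>".
keywords = ["lun7/snap2/0000", "lun7/snap2/0008", "lun7/snap2/0010"]
prefix, suffixes = prefix_compress(keywords)
restored = prefix_decompress(prefix, suffixes)
```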
In addition, in one implementation, the keywords and the values in the metadata in this embodiment may be stored in two different data entries that are divided logically or physically, referred to as the keyword data entry and the value data entry. The two data entries may be entries to which data can be written by independent write operations.
Taking physical chunks on a hard disk in the storage system as an example, the keyword data entry and the value data entry can be understood as two different physical chunks. In this way, modifying the value data entry (for example, updating the value data entry with the compressed values obtained after the values in the metadata are compressed) does not affect the keyword data entry. Likewise, writing to the value data entry does not affect the keyword data entry.
It should be noted that the above example, in which the keyword data entry and the value data entry are different physical chunks, is only one illustration of the two data entries. In practice, the keyword data entry and the value data entry may also be storage space units of other granularities, which is not limited in this embodiment.
For example, before the keywords and the values in the metadata included in the subfile are compressed through S310 and S311, as shown in FIG. 7, the keywords (Key1-Key10 in the figure) and the values (Value1-Value10 in the figure) in the metadata included in the subfile are stored in the keyword data entry and the value data entry respectively.
After the keywords and the values in the metadata included in the subfile are compressed through S310 and S311, the keyword data entry stores the compressed values of the keywords in the metadata of the subfile, and the value data entry stores the compressed values of the values in the metadata of the subfile.
Specifically, as shown in FIG. 8, the keyword data entry and the value data entry each include a header part (header) and a content part (value). The content part of the keyword data entry records the compressed values of the keywords. The header part of the keyword data entry records the n compression algorithms used to compress the keywords and which keywords in the content part each compression algorithm applies to. The content part of the keyword data entry may be organized as a tree structure, for example a B+ tree or an adaptive radix tree (ART).
The value data entry includes a header part and a content part. The content part records the compressed values of the actual addresses in storage spaces with different sequence numbers. The header part records the m compression algorithms used to compress the actual addresses and which storage spaces in the content part each compression algorithm applies to.
For example, in FIG. 8, the content part of the keyword data entry includes key_1' to key_10', which are the compressed values of the keywords key1-key10 of 10 pieces of user data. The header part of the keyword data entry records two compression algorithms, compression algorithm 1 and compression algorithm 2, and the keyword ranges to which they respectively apply (compression algorithm 1 corresponds to key1-key5 and compression algorithm 2 corresponds to key6-key10).
The content part of the value data entry includes 10 storage spaces with sequence numbers V1-V10, which respectively store the compressed values of the actual addresses of the 10 pieces of user data. The header part of the value data entry records three compression algorithms, compression algorithm 13, compression algorithm 14 and compression algorithm 15, and the storage spaces to which they respectively apply (V1-V3, V4-V7 and V8-V10).
The compressed values of the keywords in the content part of the keyword data entry point to different storage spaces in the content part of the value data entry. For example, in FIG. 8, key_1' in the keyword data entry points to storage space V1 in the value data entry, key_2' points to storage space V2, and so on.
For example, when user data is accessed according to its identifier key1: first, the compression algorithm corresponding to key1 is looked up in the header part of the keyword data entry; in FIG. 8 this is compression algorithm 11. Next, the compressed value key_1' corresponding to key1 is obtained from compression algorithm 11 and key1. Storage space V1 in the value data entry is then determined from key_1', and value1_offset is read from storage space V1 of the value data entry. Reading the header part of the value data entry shows that the compression algorithm corresponding to storage space V1 is compression algorithm 13. Finally, the actual address of the user data is obtained from compression algorithm 13 and value1_offset, and the data at that actual address is read, which completes the access to the user data.
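The lookup path just described (keyword header, keyword content, value content, value header) can be sketched as follows. Everything concrete here is an assumption for illustration: the `Rule` class, the toy compression functions, the base address `BASE`, and the use of plain integers 1 to 10 for key1-key10 and for the slots V1-V10. Only the overall two-entry layout follows the description of FIG. 8.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    first: int                    # first keyword or slot the algorithm applies to
    last: int                     # last keyword or slot (inclusive)
    apply: Callable[[int], int]   # the algorithm recorded in the header


def find_rule(header: List[Rule], x: int) -> Rule:
    """Find the header rule whose range covers x."""
    for rule in header:
        if rule.first <= x <= rule.last:
            return rule
    raise KeyError(x)


BASE = 0x0000000100000000  # hypothetical base address

# Keyword data entry: header with two toy algorithms, content maps key_i' -> slot Vi.
key_header = [Rule(1, 5, lambda k: k - 1), Rule(6, 10, lambda k: k + 94)]
key_content = {find_rule(key_header, k).apply(k): k for k in range(1, 11)}

# Value data entry: header with three toy algorithms, content maps slot -> compressed value.
value_header = [
    Rule(1, 3, lambda v: BASE + v),
    Rule(4, 7, lambda v: BASE + 0x100 + v),
    Rule(8, 10, lambda v: BASE + 0x200 + v),
]
value_content = {1: 0, 2: 1, 3: 2, 4: 0, 5: 1, 6: 2, 7: 3, 8: 0, 9: 1, 10: 2}


def lookup(keyword: int) -> int:
    """Resolve a keyword to the actual address of its user data."""
    compressed_key = find_rule(key_header, keyword).apply(keyword)
    slot = key_content[compressed_key]            # key_i' points to slot Vi
    compressed_value = value_content[slot]
    return find_rule(value_header, slot).apply(compressed_value)


address_of_key1 = lookup(1)
```

Because the two entries are separate, recompressing the values only rewrites `value_header` and `value_content`; the keyword entry is left untouched.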
上述实现方式中,通过将多个元数据(例如上述实现方式中一个子文件中的多个元数据)中关键字和值分开存储。这样一来,当需要对元数据的值进行压缩时,仅需要将压缩后得到的元数据的值的压缩值,更新至相应的值数据条目中原来的存储空间中即可,不需要对关键字数据条目进行修改,从而提高了压缩效率。In the above implementation manner, keywords and values in multiple metadata (for example, multiple metadata in a subfile in the above implementation manner) are stored separately. In this way, when the metadata value needs to be compressed, it is only necessary to update the compressed value of the metadata value obtained after compression to the original storage space in the corresponding value data entry, and there is no need to modify the key Word data entries are modified, which improves compression efficiency.
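The lookup path through the two data entries can be sketched as follows. This is only an illustrative sketch: the source does not specify compression algorithms 11 and 13, so prefix stripping stands in for the keyword algorithm and base-plus-offset stands in for the value algorithm, and the dictionary layout and the names `key_entry`, `value_entry`, `lookup`, and `recompress_values` are assumptions made for the example.

```python
# Hypothetical layout: each entry has a header (one rule per slot range)
# and a content list. Real entries in the source are not specified this way.
key_entry = {
    # Stand-in for "compression algorithm 11": strip a shared prefix.
    "header": [{"slots": range(0, 3), "prefix": "vol1/obj"}],
    # Compressed keywords; position i points to value slot i.
    "content": ["1", "2", "3"],
}
value_entry = {
    # Stand-in for "compression algorithm 13": offset from a base address.
    "header": [{"slots": range(0, 3), "base": 0x1000}],
    "content": [0, 8, 16],
}


def lookup(key: str) -> int:
    """Return the actual address for `key`, following the two-entry path."""
    for key_rule in key_entry["header"]:
        if not key.startswith(key_rule["prefix"]):
            continue
        compressed_key = key[len(key_rule["prefix"]):]
        for slot in key_rule["slots"]:
            if key_entry["content"][slot] != compressed_key:
                continue
            # The slot index selects the value rule that covers it.
            for value_rule in value_entry["header"]:
                if slot in value_rule["slots"]:
                    return value_rule["base"] + value_entry["content"][slot]
    raise KeyError(key)


def recompress_values(new_addresses):
    """Rewrite only the value entry; the keyword entry is left untouched."""
    base = min(new_addresses)
    value_entry["header"] = [{"slots": range(len(new_addresses)), "base": base}]
    value_entry["content"] = [addr - base for addr in new_addresses]
```

Because the keyword entry refers to value slots only by position, `recompress_values` can replace the value header and content without touching any keyword, which is the property the paragraph above relies on.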
In another example, when the keywords of the metadata do not need to be compressed, the keyword data entry may store only the keywords of the metadata (that is, the uncompressed keywords), as shown in FIG. 9, while the value data entry is still stored in a manner similar to the above design. This achieves the same effect: when the values of the metadata need to be recompressed, only the recompressed values need to be written back to the original storage spaces in the corresponding value data entry, and the keyword data entry does not need to be modified, which improves compression efficiency.
In addition, in one implementation, the method further includes:
S312. Periodically detect whether the data change amount of the cold metadata file exceeds a change threshold.
If the data change amount of the cold metadata file exceeds the change threshold, S306 is executed; otherwise, S312 is executed again in the next cycle.
The data change amount of the cold metadata file may specifically refer to the amount of change, within a preset time period, in the user data corresponding to the metadata in the cold metadata file. In this embodiment, the multiple pieces of metadata in the cold metadata file are referred to as a metadata set.
For example, the preset time period may be the period from the last time the user data corresponding to the metadata in the cold metadata file was processed up to the current moment. As another example, the preset time period may be a preset fixed duration. The length of the preset time period may be set according to actual needs, and this application does not limit it. In addition, the amount of change of the user data in the cold metadata file may be the number of times that user data has changed, or the volume of user data that has changed. In practice, a person skilled in the art may choose suitable parameters according to actual requirements to reflect the amount of change of the user data in the cold metadata file, and this application does not limit this.
In the above implementation, while the storage system is running it continuously receives new data write requests that write new user data or modify existing user data, so even the metadata in a cold metadata file may gradually change. Therefore, the above implementation periodically detects whether the change in the cold metadata file exceeds the change threshold, and when it does, S306 is executed so that the values in the metadata are compressed again through the corresponding technical means in S306-S311.
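The periodic check in S312 can be illustrated with a small simulation. This is a sketch under stated assumptions: the change amount is taken to be a count of changes, the preset time period is taken to run from the last processing to the current cycle, and the function name `cycles_triggering_recompression` is hypothetical.

```python
def cycles_triggering_recompression(changes_per_cycle, change_threshold):
    """Return the indices of the cycles in which S306 would be executed.

    `changes_per_cycle[i]` is the number of user-data changes observed for
    the cold metadata file during cycle i (an assumed measure of change).
    """
    triggered = []
    accumulated = 0  # changes since the last processing
    for cycle, changes in enumerate(changes_per_cycle):
        accumulated += changes
        if accumulated > change_threshold:
            triggered.append(cycle)  # execute S306 (and S307-S311)
            accumulated = 0          # the measurement window restarts
        # otherwise: wait for the next cycle and run S312 again
    return triggered
```

With a threshold of 10 and per-cycle changes of 3, 4, 5, 1, and 12, the accumulated count first exceeds the threshold in cycle 2 and again in cycle 4.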
In addition, in one implementation, the method further includes: compressing the metadata in the hot metadata file.
For example, prefix compression or slope compression is applied to the keywords of the metadata in the hot metadata file. In addition, when the values of the metadata in the hot metadata file show no regularity, the values may be left uncompressed.
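As an illustration of prefix compression applied to keywords, the sketch below stores the shared prefix once and keeps only the differing suffixes. The keyword format and the function names are assumptions made for the example; the source does not define the keyword layout, and slope compression is not shown.

```python
import os


def prefix_compress(keys):
    """Split keys into one shared prefix and a list of per-key suffixes."""
    prefix = os.path.commonprefix(keys)  # character-wise common prefix
    return prefix, [key[len(prefix):] for key in keys]


def prefix_decompress(prefix, suffixes):
    """Rebuild the original keys from the shared prefix and the suffixes."""
    return [prefix + suffix for suffix in suffixes]
```

For the hypothetical keywords `lun7/blk0001`, `lun7/blk0002`, and `lun7/blk0103`, the shared prefix is `lun7/blk0`, so each keyword shrinks to a three-character suffix.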
In the above implementation, hot data in the storage system is likely to be modified, so the values in the metadata of hot data are likely to change. If the metadata of hot data were also compressed by processing the user data, compression efficiency would be low. This is especially pronounced when user data is written in append-write or redirect-on-write (ROW) mode, because each time user data is modified, the modified content is stored at a new actual address.
For example, when user data is written in append-write or ROW mode, the following may occur: immediately after the values in the metadata of some user data (which is hot data) have been compressed through S308-S310, the actual address of the user data changes because the user data is modified, which makes the compression of the values in that metadata pointless.
Therefore, in this example, the metadata is first divided into a cold metadata file and a hot metadata file (that is, S304 above). The values of the metadata in the cold metadata file are then compressed according to S308-S310, whereas the values of the metadata in the hot metadata file may be left uncompressed. This improves the efficiency of metadata compression.
Of course, in other scenarios cold data and hot data need not be distinguished; that is, S304 is not executed. Instead, the metadata in the L2 layer is treated as a whole, and for all or part of the metadata in this whole, the user data is processed (for example, migrated) so that the values of the metadata conform to the set rule, after which the values of all or part of the metadata are compressed. This example does not limit this.
In addition, the above embodiment introduces the metadata compression method provided in this application mainly through the process of compressing metadata at the L2 layer in the scenario where the metadata of the L1 layer of the LSM tree is merged into the L2 layer. In practice, the method may also be applied to compress metadata of other data structures; for example, it may be used for metadata compression at other storage layers of the LSM tree, or to compress metadata in data structures other than the LSM tree.
In addition, in the above embodiment, the user data corresponding to the metadata is migrated so that the user data corresponding to multiple pieces of metadata is stored in a continuous storage space. The actual addresses of the multiple pieces of user data then show regularity, so that the values in the metadata of the user data can be compressed.
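The effect of migration on compressibility can be sketched as follows. The example assumes equally sized pieces of user data and uses a base, stride, and count triple as the compressed form; both the sizes and this particular encoding are illustrative assumptions, not the encoding prescribed by the source.

```python
def migrate_to_contiguous(sizes, start):
    """Return the new actual addresses when data is laid out back to back."""
    addresses = []
    current = start
    for size in sizes:
        addresses.append(current)
        current += size
    return addresses


def compress_arithmetic(addresses):
    """Return (base, stride, count) if addresses are evenly spaced, else None."""
    if len(addresses) < 2:
        return None
    stride = addresses[1] - addresses[0]
    if all(b - a == stride for a, b in zip(addresses, addresses[1:])):
        return addresses[0], stride, len(addresses)
    return None


def decompress_arithmetic(base, stride, count):
    """Rebuild the full address list from the compressed triple."""
    return [base + i * stride for i in range(count)]
```

Scattered addresses such as 0x9000, 0x1000, and 0x5400 do not fit the pattern and cannot be compressed this way, whereas after migrating three 8 KiB pieces to a region starting at 0x10000 the three addresses collapse into a single triple.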
In other embodiments, if the actual address of the user data is its address in the corresponding logical block group (chunk group) in the LUN, or its address in a physical block (chunk) on the hard disk, the actual addresses of multiple pieces of user data can be made to show regularity without migrating the user data. For example, the actual addresses of the storage spaces in which the user data is stored may be modified so that the actual addresses of multiple pieces of user data show regularity, that is, so that the values in the metadata of the multiple pieces of user data conform to the set rule, and the values can then be compressed. The mapping between the pre-modification actual address of the user data and the underlying physical address is then updated to a mapping between the modified actual address and the underlying physical address.
As a specific example, suppose the value in the metadata is the chunk group ID of the storage space where the user data resides and the offset within the chunk group. In one possible design, the mapping rules for chunk group IDs and offsets within chunk groups are modified so that the chunk group IDs and offsets of the storage spaces of multiple pieces of user data show regularity, for example so that they are continuous. The values in the metadata of the user data can then be compressed.
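One way to picture this remapping without migration is a table from actual addresses, given as (chunk group ID, offset) pairs, to underlying physical locations. The sketch below is hypothetical throughout: the table layout, the string form of the physical locations, and the function `remap_without_migration` are assumptions, and it presumes that `new_group_id` is not already in use.

```python
def remap_without_migration(actual_to_physical, old_addresses, new_group_id, stride):
    """Assign continuous (chunk group ID, offset) addresses without moving data.

    Returns the new actual addresses and an updated mapping in which each new
    address points to the same physical location as the old one did.
    """
    new_map = dict(actual_to_physical)
    new_addresses = []
    for index, old in enumerate(old_addresses):
        new = (new_group_id, index * stride)
        new_map[new] = new_map.pop(old)  # physical location is unchanged
        new_addresses.append(new)
    return new_addresses, new_map
```

After the remapping, the values in the metadata are offsets 0, `stride`, 2 × `stride`, and so on within a single chunk group, which is the kind of regular sequence that can be compressed, while the physical locations recorded in the table stay the same.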
In addition, this embodiment further provides a metadata compression apparatus, which can be used to perform some or all of the steps of the metadata compression method described above.
It can be understood that, to implement the functions of the above metadata compression method, the metadata compression apparatus includes hardware structures and/or software modules corresponding to each function. A person skilled in the art will readily appreciate that, in combination with the units and method steps of the examples described in this embodiment, the technical solution provided in this embodiment can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application scenario and design constraints of the technical solution.
In this embodiment, the metadata compression apparatus may be located in a hardware device in the storage system that manages metadata. For example, the metadata compression apparatus is located in the engine of a centralized storage system. As another example, the metadata compression apparatus is located in a storage server with a metadata management function in a distributed storage system.
FIG. 10 is a schematic structural diagram of a metadata compression apparatus provided by this application. The metadata compression apparatus 40 includes an acquisition unit 401, a processing unit 402, and a compression unit 403, and is used to implement the functions of some or all of the steps of the method shown in FIG. 3.
For example, the acquisition unit 401 is configured to perform one or more of S301 and S306 in FIG. 3. The processing unit 402 is configured to perform one or more of S302-S305, S307-S309, and S312 in FIG. 3. The compression unit 403 is configured to perform one or more of S310 and S311 in FIG. 3.
For more details about the acquisition unit 401, the processing unit 402, and the compression unit 403, refer to the relevant description of the method shown in FIG. 3; they are not repeated here.
FIG. 11 is a schematic structural diagram of a chip provided by this application. The chip 50 is used to implement the metadata compression method provided in this application. Specifically, the chip may be a chip that implements the functions of the controller in the engine 121. The chip 50 includes:
A processor 501, configured to execute the metadata compression method provided in this application.
Specifically, the processor 501 may include a general-purpose central processing unit (CPU) and a memory; the processor 501 may also be a microprocessor, a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), or the like. When the processor 501 includes a CPU and a memory, the CPU executes computer instructions stored in the memory to perform the metadata compression method provided in this application.
In addition, the chip 50 may further include a memory 502. The memory 502 stores computer instructions, and the processor 501 executes the computer instructions stored in the memory to perform the metadata compression method provided in this application.
Specifically, the memory 502 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to these.
In addition, the chip 50 may further include an interface 503. The interface 503 may be used to receive and send data. The interface 503 may be a communication interface, a transceiver, or the like.
In addition, the chip 50 may further include a communication line 504. For example, the communication line 504 may be a data bus used to transfer information between the above components.
For more details about the metadata compression apparatus 40 and the chip 50, refer to the relevant description of the metadata compression method above; they are not repeated here.
The method steps in the embodiments of this application may be implemented in hardware, or by a processor executing software instructions. The software instructions may consist of corresponding software modules, and the software modules may be stored in RAM, flash memory, ROM, PROM, EPROM, EEPROM, registers, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. The storage medium may also be a component of the processor. The processor and the storage medium may be located in an ASIC, and the ASIC may be located in a network device or a terminal device. The processor and the storage medium may also exist in the network device or the terminal device as discrete components.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer programs or instructions. When the computer programs or instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are performed in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, a network device, user equipment, or another programmable apparatus. The computer programs or instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired or wireless means. The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium, such as a floppy disk, a hard disk, or a magnetic tape; an optical medium, such as a digital video disc (DVD); or a semiconductor medium, such as an SSD.
In the embodiments of this application, unless otherwise specified or logically conflicting, the terms and/or descriptions in different embodiments are consistent and may be cross-referenced, and the technical features of different embodiments may be combined according to their inherent logical relationships to form new embodiments.
In this application, "at least one" means one or more, and "multiple" means two or more; other quantifiers are similar. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean: A alone, both A and B, or B alone. In addition, elements appearing in the singular forms "a", "an", and "the" do not mean "one or only one" unless the context clearly dictates otherwise, but rather "one or more". For example, "a device" means one or more such devices. Furthermore, "at least one of ..." means one or any combination of the subsequent associated objects; for example, "at least one of A, B, and C" includes A, B, C, AB, AC, BC, or ABC. In the text of this application, the character "/" generally indicates an "or" relationship between the objects before and after it; in the formulas of this application, the character "/" indicates a division relationship between the objects before and after it.
It can be understood that the various numerals used in the embodiments of this application are only for ease of description and do not limit the scope of the embodiments. The sequence numbers of the above processes do not imply an order of execution; the order in which the processes are executed should be determined by their functions and internal logic.

Claims (26)

  1. A metadata compression method, comprising:
    acquiring n pieces of metadata, wherein one piece of metadata comprises a key-value pair, the key-value pair comprises a keyword and a value, the keyword indicates an identifier of the data corresponding to the metadata, the value indicates the actual address at which the data is stored, and n is a positive integer greater than 1;
    processing m pieces of data corresponding to at least part of the n pieces of metadata to obtain n target values, corresponding to the n pieces of metadata, that conform to a set rule, wherein m is a positive integer less than or equal to n; and
    compressing the n target values.
  2. The method according to claim 1, wherein the n actual addresses indicated by the n target values that conform to the set rule are continuous.
  3. The method according to claim 2, wherein processing the m pieces of data corresponding to the at least part of the metadata to obtain the n target values, corresponding to the n pieces of metadata, that conform to the set rule comprises:
    migrating the m pieces of data so that the data corresponding to the n pieces of metadata is stored in a storage space whose actual addresses are continuous; and
    saving the actual addresses at which the n pieces of data are stored in the continuous storage space as the n target values.
  4. The method according to any one of claims 1-3, further comprising:
    selecting the n pieces of metadata from multiple pieces of metadata contained in a metadata set according to how hot or cold the data corresponding to the multiple pieces of metadata is, wherein the data corresponding to the n pieces of metadata is cold data.
  5. The method according to any one of claims 1-3, wherein the n pieces of metadata are metadata in a first storage layer of a log-structured merge (LSM) tree; the LSM tree is used to store metadata; and the LSM tree comprises multiple storage layers, the multiple storage layers comprising the first storage layer.
  6. The method according to any one of claims 1-3, wherein the keyword and the value are stored in two separate data entries.
  7. The method according to any one of claims 1-3, further comprising:
    detecting a data change amount of a metadata set, wherein the metadata set records the metadata of multiple pieces of data;
    wherein acquiring the n pieces of metadata comprises: after determining that the data change amount of the metadata set exceeds a change threshold, acquiring the n pieces of metadata included in the metadata set.
  8. The method according to any one of claims 1-3, further comprising:
    obtaining a degree of dispersion of the actual addresses of the data corresponding to the n pieces of metadata;
    wherein processing the m pieces of data corresponding to at least part of the n pieces of metadata comprises: after determining that the degree of dispersion is greater than a dispersion threshold, processing the m pieces of data corresponding to at least part of the n pieces of metadata.
  9. The method according to any one of claims 1-8, wherein the method is applied to a centralized storage system and is executed by an engine in the centralized storage system.
  10. The method according to any one of claims 1-8, wherein the method is applied to a distributed storage system, the distributed storage system comprises multiple storage servers, and the method is executed by one or more of the multiple storage servers.
  11. A metadata compression apparatus, comprising:
    an acquisition unit, configured to acquire n pieces of metadata, wherein one piece of metadata comprises a key-value pair, the key-value pair comprises a keyword and a value, the keyword indicates an identifier of the data corresponding to the metadata, the value indicates the actual address at which the data is stored, and n is a positive integer greater than 1;
    a processing unit, configured to process m pieces of data corresponding to at least part of the n pieces of metadata to obtain n target values, corresponding to the n pieces of metadata, that conform to a set rule, wherein m is a positive integer less than or equal to n; and
    a compression unit, configured to compress the n target values.
  12. The apparatus according to claim 11, wherein the n actual addresses indicated by the n target values that conform to the set rule are continuous.
  13. The apparatus according to claim 12, wherein the processing unit being configured to process the m pieces of data corresponding to the at least part of the metadata to obtain the n target values, corresponding to the n pieces of metadata, that conform to the set rule comprises:
    the processing unit being specifically configured to migrate the m pieces of data so that the data corresponding to the n pieces of metadata is stored in a storage space whose actual addresses are continuous; and
    the processing unit being specifically configured to save the actual addresses at which the n pieces of data are stored in the continuous storage space as the n target values.
  14. The apparatus according to any one of claims 11-13, wherein the processing unit is further configured to select the n pieces of metadata from multiple pieces of metadata contained in a metadata set according to how hot or cold the data corresponding to the multiple pieces of metadata is, wherein the data corresponding to the n pieces of metadata is cold data.
  15. The apparatus according to any one of claims 11-13, wherein the n pieces of metadata are metadata in a first storage layer of a log-structured merge (LSM) tree; the LSM tree is used to store metadata; and the LSM tree comprises multiple storage layers, the multiple storage layers comprising the first storage layer.
  16. The apparatus according to any one of claims 11-13, wherein the keyword and the value are stored in two separate data entries.
  17. The apparatus according to any one of claims 11-13, wherein the processing unit is further configured to detect a data change amount of a metadata set, wherein the metadata set records the metadata of multiple pieces of data;
    wherein the acquisition unit being configured to acquire the n pieces of metadata comprises: the acquisition unit being specifically configured to acquire the n pieces of metadata included in the metadata set after determining that the data change amount of the metadata set exceeds a change threshold.
  18. The apparatus according to any one of claims 11-13, wherein the acquisition unit is further configured to obtain a degree of dispersion of the actual addresses of the data corresponding to the n pieces of metadata;
    wherein the processing unit being configured to process the m pieces of data corresponding to at least part of the n pieces of metadata comprises: the processing unit being specifically configured to process the m pieces of data corresponding to at least part of the n pieces of metadata after determining that the degree of dispersion is greater than a dispersion threshold.
  19. The apparatus according to any one of claims 11-13, wherein the metadata compression apparatus is located in an engine in a centralized storage system.
  20. The apparatus according to any one of claims 11-13, wherein the metadata compression apparatus is located in a storage server in a distributed storage system.
  21. A storage device, comprising a memory and a processor, wherein the memory is configured to store computer instructions, and the processor is configured to invoke and run the computer instructions from the memory to implement the method according to any one of claims 1-10.
  22. A storage system, comprising an engine and multiple hard disks, wherein the multiple hard disks are configured to store data, and the engine is configured to perform the method according to any one of claims 1-10.
  23. A storage system, comprising multiple storage servers, wherein the multiple storage servers are configured to store data, and a first server among the multiple storage servers is configured to perform the method according to any one of claims 1-10.
  24. A chip, comprising a memory and a processor, wherein the memory is configured to store computer instructions, and the processor is configured to invoke and run the computer instructions from the memory to implement the method according to any one of claims 1-10.
  25. A computer-readable storage medium, wherein the storage medium stores a computer program, and when the computer program is executed by a processor, the method according to any one of claims 1-10 is implemented.
  26. A computer program product, wherein the computer program product comprises instructions, and when the instructions are run on a processor, the method according to any one of claims 1-10 is implemented.
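
The two trigger conditions recited in claims 17 and 18 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the claims do not define how the data change amount or the degree of dispersion is measured, so the class `MetadataSet`, the functions `should_acquire`, `address_dispersion` and `should_process`, the `block_size` parameter, and both measures (fraction of changed entries; fraction of non-contiguous neighbouring addresses) are assumptions chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MetadataSet:
    """Hypothetical metadata set: maps a key to the actual address of its data."""
    entries: Dict[str, int]
    changed_entries: int = 0  # assumed counter of entries changed since the last pass

    def change_amount(self) -> float:
        """Assumed measure: fraction of entries changed since the last pass."""
        if not self.entries:
            return 0.0
        return self.changed_entries / len(self.entries)


def should_acquire(meta: MetadataSet, change_threshold: float) -> bool:
    """Claim 17 style trigger: acquire the n pieces of metadata only after
    the data change amount exceeds the change threshold."""
    return meta.change_amount() > change_threshold


def address_dispersion(addresses: List[int], block_size: int) -> float:
    """Assumed measure: fraction of neighbouring sorted addresses that are
    not exactly one block apart (0.0 means fully contiguous)."""
    if len(addresses) < 2:
        return 0.0
    ordered = sorted(addresses)
    gaps = sum(1 for a, b in zip(ordered, ordered[1:]) if b - a != block_size)
    return gaps / (len(ordered) - 1)


def should_process(addresses: List[int], block_size: int,
                   dispersion_threshold: float) -> bool:
    """Claim 18 style trigger: process the m pieces of data only after
    the degree of dispersion is greater than the dispersion threshold."""
    return address_dispersion(addresses, block_size) > dispersion_threshold


if __name__ == "__main__":
    meta = MetadataSet(
        entries={"k1": 0, "k2": 4096, "k3": 40960, "k4": 81920},
        changed_entries=3,
    )
    if should_acquire(meta, change_threshold=0.5):
        addrs = list(meta.entries.values())
        print("process data:", should_process(addrs, 4096, 0.5))
```

Both checks use a strict "greater than" comparison, matching the wording "exceeds a change threshold" and "greater than a dispersion threshold"; the threshold values themselves are left to the caller because the claims do not fix them.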
PCT/CN2022/077759 2021-06-25 2022-02-24 Metadata compression method and apparatus WO2022267508A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202110710701 2021-06-25
CN202110710701.4 2021-06-25
CN202110944078.9A CN115525209A (en) 2021-06-25 2021-08-17 Metadata compression method and device
CN202110944078.9 2021-08-17

Publications (1)

Publication Number Publication Date
WO2022267508A1 (en) 2022-12-29

Family

ID=84545054

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/077759 WO2022267508A1 (en) 2021-06-25 2022-02-24 Metadata compression method and apparatus

Country Status (1)

Country Link
WO (1) WO2022267508A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106202138A (en) * 2015-06-01 2016-12-07 三星电子株式会社 Storage device and method for autonomous space compression
US20190121581A1 (en) * 2015-06-01 2019-04-25 Samsung Electronics Co., Ltd. Storage apparatus and method for autonomous space compaction
US20190188291A1 (en) * 2017-12-15 2019-06-20 Western Digital Technologies, Inc. Utilization of Optimized Ordered Metadata Structure for Container-Based Large-Scale Distributed Storage
CN111309270A (en) * 2020-03-13 2020-06-19 清华大学 Persistent memory key value storage system
US20200333968A1 (en) * 2019-04-17 2020-10-22 Oath Inc. Method and system for key-value storage
CN112099725A (en) * 2019-06-17 2020-12-18 华为技术有限公司 Data processing method and device and computer readable storage medium
CN112131140A (en) * 2020-09-24 2020-12-25 北京计算机技术及应用研究所 SSD-based key value separation storage method supporting efficient storage space management

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116909490A (en) * 2023-09-11 2023-10-20 腾讯科技(深圳)有限公司 Data processing method, device, storage system and computer readable storage medium
CN116909490B (en) * 2023-09-11 2024-01-05 腾讯科技(深圳)有限公司 Data processing method, device, storage system and computer readable storage medium

Similar Documents

Publication Publication Date Title
US10761758B2 (en) Data aware deduplication object storage (DADOS)
US20240012714A1 (en) Indirect Dataset Replication To Cloud-Based Targets
US11016955B2 (en) Deduplication index enabling scalability
US10303797B1 (en) Clustering files in deduplication systems
CN106066896B (en) Application-aware big data deduplication storage system and method
TWI778157B (en) Ssd, distributed data storage system and method for leveraging key-value storage
US9268502B2 (en) Dense tree volume metadata organization
US8712963B1 (en) Method and apparatus for content-aware resizing of data chunks for replication
US10210188B2 (en) Multi-tiered data storage in a deduplication system
CN105683898A (en) Set-associative hash table organization for efficient storage and retrieval of data in a storage system
US11625169B2 (en) Efficient token management in a storage system
CN110908589B (en) Data file processing method, device, system and storage medium
US11334523B2 (en) Finding storage objects of a snapshot group pointing to a logical page in a logical address space of a storage system
CN113626431A (en) LSM tree-based key value separation storage method and system for delaying garbage recovery
CN113535670A (en) Virtual resource mirror image storage system and implementation method thereof
WO2022267508A1 (en) Metadata compression method and apparatus
US11232043B2 (en) Mapping virtual block addresses to portions of a logical address space that point to the virtual block addresses
US20150134625A1 (en) Pruning of server duplication information for efficient caching
CN111274259A (en) Data updating method for storage nodes in distributed storage system
WO2023050856A1 (en) Data processing method and storage system
WO2022262381A1 (en) Data compression method and apparatus
WO2024021488A1 (en) Metadata storage method and apparatus based on distributed key-value database
CN115438039A (en) Method and device for adjusting data index structure of storage system
CN115114294A (en) Self-adaption method and device of database storage mode and computer equipment
Klein et al. Dxram: A persistent in-memory storage for billions of small objects

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22827015

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE