WO2023125630A1 - A data management method and related apparatus - Google Patents

A data management method and related apparatus

Info

Publication number
WO2023125630A1
Authority
WO
WIPO (PCT)
Prior art keywords
key
layer
value pair
value
tree
Prior art date
Application number
PCT/CN2022/142699
Other languages
English (en)
French (fr)
Inventor
吴沛
董如良
涂剑洪
张进毅
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2023125630A1 publication Critical patent/WO2023125630A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/22 Indexing; Data structures therefor; Storage structures
    • G06F 16/2228 Indexing structures
    • G06F 16/2246 Trees, e.g. B+trees
    • G06F 16/2291 User-Defined Types; Storage management thereof
    • G06F 16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Definitions

  • the present application relates to the field of storage technology, and in particular to a data management method, device, cluster, computer-readable storage medium, and computer program product.
  • data may be stored in a key-value pair (key-value, KV) manner.
  • key-value pair is a data pair formed by a key name (denoted as key) and a key value (denoted as value).
  • the key name is usually a constant that defines the data set, such as gender, name, color, etc.
  • the key value is usually a variable in the data set, such as male/female, Zhang xx, green, etc.
  • In the related art, such key-value pairs may be stored based on a log-structured merge (LSM) tree.
  • When the length of the key in the key-value pair (denoted as KeyLen) is not uniform, it is necessary to write an additional KeyLen every time a key is written.
  • a KeyLen generally needs to occupy 4 bytes (Byte, B) of storage space, so more storage space needs to be reserved to store the KeyLen, which increases the storage cost.
  • The present application provides a data management method. By converting the key name, the method unifies the length of the key names in the key-value pairs to be written, so that each time a key name is stored, there is no need to additionally store the length of the key name, which saves storage space and reduces storage costs.
  • the present application also provides a device, a computer cluster, a computer-readable storage medium, and a computer program product corresponding to the above method.
  • the present application provides a data management method.
  • the method can be executed by a storage system (for example, a centralized storage system or a distributed storage system).
  • Specifically, the storage system obtains the data to be written, and converts the key-value pair therein (for convenience of description, referred to as the first key-value pair, which includes a first key name and a key value): the storage system converts the first key name into a second key name whose length is a preset value, thereby generating a second key-value pair that includes the second key name and the key value, and records the mapping relationship between the second key name and the first key name. The storage system then stores the second key-value pair and the mapping relationship.
  • The storage system unifies the length of the key names in the key-value pairs to be written by converting the key names, for example, converting the first key name into a second key name whose length is a preset value. Thus, each time the storage system writes a key name, it does not need to additionally write the length of the key name, which saves storage space and reduces storage costs.
  • the storage system may determine whether the length of the first key name is greater than the preset value, and if it is greater than the preset value, convert the first key name into the second key name, and the preset value is less than the length of the first key name.
  • the length of the key name after conversion is usually much smaller than the length of the key name before conversion, and storage space can be further saved and storage cost can be reduced by storing the converted key name.
  • Specifically, the storage system may determine a hash identifier through a hash algorithm according to the first key name, and then generate the second key name according to the hash identifier. Considering that a hash collision may occur when some key names are hashed, the storage system can generate the second key name according to the hash identifier and a short identifier, where the short identifier is used to distinguish key names that have the same hash identifier, so as to ensure the uniqueness of the key names.
  • the short identifier may be, for example, a randomly generated character string or a sequentially generated character string.
  • The storage system may perform hashing on some or all characters of the first key name. The more characters that participate in the hash, the longer the hash identifier and the lower the possibility of a hash collision; the fewer characters that participate in the hash, the more characters are reserved and the longer the common prefix that can be extracted, which can further improve the compression rate and save storage space.
  • the second key name further includes a user identifier.
  • the user ID may be an ID of a user pool. For each user, a logical user pool can be maintained. The data to be written is written to the memory through the user pool, and then written from the memory to the persistent medium for persistent storage.
  • In this way, user data can be normalized without user isolation, which avoids the resource waste caused by repeatedly applying for resources for isolated user pools.
  • the data to be written is metadata of stored data.
  • Metadata is data that describes data.
  • Since metadata can be used to address data, and expressing metadata as key-value pairs facilitates data management, the metadata can be stored efficiently by using the data management method of this application, avoiding waste of storage space.
  • the stored data is an object in object storage
  • the first key name in the first key-value pair is the identifier of the object
  • the key value in the first key-value pair includes the storage address of the object.
  • the storage system may form the second key-value pair and at least one third key-value pair into a prefix tree.
  • the key name of the third key-value pair has the same length as the key name of the second key-value pair
  • the key name of the third key-value pair and the key name of the second key-value pair include a common prefix.
  • A common prefix refers to the same prefix shared by different strings. For example, the key name "objectXXX.partXXX.dataXXX.437" and the key name "objectXXX.partXYZ.dataXXX.437" have the common prefix "objectXXX.partX".
  • Specifically, the storage system may use the common prefix as the root node of the prefix tree, use the character strings other than the common prefix in the second key-value pair and the at least one third key-value pair as intermediate nodes of the prefix tree, and use the key values as leaf nodes of the prefix tree, and then store the prefix tree in a persistent medium.
  • the storage system only needs to store the common prefix once when storing the prefix tree, which reduces redundant data.
  • the compression rate is improved, the storage space is saved, and the storage cost is reduced.
  • The storage system may write the prefix tree into the first layer of a log-structured merge (LSM) tree.
  • The LSM tree includes L layers. For any layer from the first layer to the (L-1)-th layer of the L layers, when the number of trees included in the i-th layer reaches the preset threshold of the i-th layer, a preset number of trees in the i-th layer are merged, and the merged tree is written into the (i+1)-th layer, where i and L are positive integers.
  • A preset number of trees in the L-th layer are merged into a final tree. The storage system then stores the trees included in each layer of the LSM tree to the persistent medium.
  • the preset threshold and preset number of each layer can be set according to experience values, and the preset threshold and preset number of each layer can be different.
  • the number of layers of the LSM tree should not be too large to avoid write amplification caused by frequent writing.
  • the prefix trees of each layer in the LSM tree are arranged in time sequence, and the storage system can execute concurrently when merging the trees of each layer, which can speed up the merging speed.
  • The storage system may write the values of the root nodes, intermediate nodes, and leaf nodes of the trees included in each layer of the LSM tree into the persistent medium, generate and record index information describing the relationships among the root nodes, intermediate nodes, and leaf nodes in the prefix tree or the final tree, and then write the index information into the persistent medium.
  • this method can effectively compress similar parts of different key names, save storage space, and reduce storage costs.
  • the present application provides a data management device.
  • The device includes:
  • An interaction module configured to obtain data to be written, the data to be written includes a first key-value pair, and the first key-value pair includes a first key name and a key value;
  • a conversion module configured to convert the first key name into a second key name, where the length of the second key name is a preset value, generate a second key-value pair that includes the second key name and the key value, and record the mapping relationship between the second key name and the first key name;
  • a management module configured to store the second key-value pair and the mapping relationship.
  • In some possible implementations, the conversion module is specifically configured to: determine whether the length of the first key name is greater than the preset value, and if so, convert the first key name into the second key name, where the preset value is less than the length of the first key name.
  • In some possible implementations, the conversion module is specifically configured to: determine a hash identifier through a hash algorithm according to the first key name, and generate the second key name according to the hash identifier.
  • the second key name further includes a user identifier.
  • the data to be written is metadata of stored data.
  • the stored data is an object in object storage
  • the first key name in the first key-value pair is the identifier of the object
  • the key value in the first key-value pair includes the storage address of the object.
  • the management module is specifically configured to:
  • form a prefix tree with the second key-value pair and at least one third key-value pair, wherein the key name of the third key-value pair has the same length as the key name of the second key-value pair, and the key name of the third key-value pair and the key name of the second key-value pair include a common prefix, wherein the common prefix serves as the root node of the prefix tree, the character strings other than the common prefix in the second key-value pair and the at least one third key-value pair serve as intermediate nodes of the prefix tree, and the key values serve as leaf nodes of the prefix tree;
  • the management module is specifically configured to:
  • the LSM tree includes L layers; for any layer from the first layer to the (L-1)-th layer of the L layers, when the number of trees included in the i-th layer reaches the preset threshold of the i-th layer, a preset number of trees in the i-th layer are merged, and the merged tree is written into the (i+1)-th layer, where i and L are positive integers;
  • a preset number of trees in the L-th layer are merged into a final tree;
  • the tree included in each layer of the LSM tree is stored in the persistent medium.
  • In some possible implementations, the management module is specifically configured to: write the values of the root nodes, intermediate nodes, and leaf nodes of the trees included in each layer of the LSM tree into the persistent medium, generate and record index information describing the relationships among the root nodes, intermediate nodes, and leaf nodes in the prefix tree or the final tree, and then write the index information into the persistent medium.
  • the present application provides a computer cluster.
  • the computer cluster includes at least one computer, and the at least one computer includes a processor and a memory.
  • the processor and the memory communicate with each other.
  • the processor is configured to execute instructions stored in the memory, so that the computer cluster executes the data management method in the first aspect or any implementation manner of the first aspect.
  • The present application provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and the instructions instruct a computer cluster to execute the data management method in the above first aspect or any implementation manner of the first aspect.
  • The present application provides a computer program product containing instructions which, when run on a computer cluster, cause the computer cluster to execute the data management method described in the first aspect or any implementation manner of the first aspect.
  • FIG. 1 is a schematic structural diagram of a centralized storage system provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of the architecture of a distributed storage system provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a storage system provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a data management device provided in an embodiment of the present application.
  • FIG. 5 is a flow chart of a data management method provided in an embodiment of the present application.
  • FIG. 6 is a schematic flow chart of a key-value pair conversion provided by an embodiment of the present application.
  • FIG. 7 is a schematic flow chart of merging prefix trees provided by an embodiment of the present application.
  • FIG. 8 is a schematic flow chart of flushing a log-structured merge tree to disk provided by an embodiment of the present application.
  • FIG. 9 is a schematic flow diagram of a data management method provided in an embodiment of the present application.
  • FIG. 10 is a schematic flow diagram of user pool normalization provided by an embodiment of the present application.
  • The terms "first" and "second" in the embodiments of the present application are used for description purposes only, and cannot be interpreted as indicating or implying relative importance or implicitly indicating the quantity of the indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of these features.
  • a centralized storage system refers to a central node composed of one or more master devices, where data is stored centrally, and all data processing services of the entire system are centrally deployed on this central node.
  • the terminal or client is only responsible for the input and output of data, while the storage and control processing of data is completely handed over to the central node.
  • the application server 100 may be a physical machine or a virtual machine. Physical machines include, but are not limited to, desktop computers, servers, laptops, and mobile devices.
  • the application server accesses the storage system through the switch 110 (such as a fiber optic switch) to access data.
  • the switch 110 is only an optional device, and the application server 100 can also directly communicate with the storage system 120 through the network.
  • The optical fiber switch 110 may also be replaced with an Ethernet switch, an InfiniBand (IB) switch, a remote direct memory access over Converged Ethernet (RDMA over Converged Ethernet, RoCE) switch, and the like.
  • the storage system 120 shown in FIG. 1 is a centralized storage system.
  • the storage system 120 may receive data from the application server 100, and then store the data, for example, perform persistent storage, so as to implement data management.
  • the characteristic of the centralized storage system is that there is a unified entrance, and all data from external devices must pass through this entrance, and this entrance is the engine 121 of the centralized storage system.
  • the engine 121 is the most core component in the centralized storage system, where many advanced functions of the storage system are implemented.
  • There are one or more controllers in the engine 121.
  • In FIG. 1, an engine that includes two controllers is used as an example for illustration.
  • If controller 0 fails, controller 1 can take over the business of controller 0; if controller 1 fails, controller 0 can take over the business of controller 1, so as to avoid the unavailability of the entire storage system 120 caused by a hardware failure.
  • If four controllers are deployed in the engine 121, there is a mirroring channel between any two controllers, so any two controllers are mutual backups.
  • the engine 121 also includes a front-end interface 125 and a back-end interface 126 , wherein the front-end interface 125 is used to communicate with the application server 100 to provide storage services for the application server 100 .
  • the back-end interface 126 is used to communicate with the hard disk 134 to expand the capacity of the storage system. Through the back-end interface 126, the engine 121 can be connected with more hard disks 134, thereby forming a very large storage resource pool.
  • the controller 0 includes at least a processor 123 and a memory 124 .
  • Processor 123 is a central processing unit (central processing unit, CPU), used for processing data access requests from outside the storage system (server or other storage systems), and also used for processing requests generated inside the storage system.
  • the processor 123 receives the write data request sent by the application server 100 through the front-end port 125 , it will temporarily save the data in the write data request in the memory 124 .
  • the processor 123 sends the data stored in the memory 124 to the hard disk 134 for persistent storage through the back-end port.
  • the memory 124 refers to an internal memory directly exchanging data with the processor. It can read and write data at any time, and the speed is very fast. It is used as a temporary data storage for an operating system or other running programs.
  • Memory includes at least two kinds of memory, for example, memory can be either random access memory or read-only memory (Read Only Memory, ROM).
  • the random access memory is dynamic random access memory (Dynamic Random Access Memory, DRAM), or storage class memory (Storage Class Memory, SCM).
  • the read-only memory for example, it may be a programmable read-only memory (Programmable Read Only Memory, PROM), an erasable programmable read-only memory (Erasable Programmable Read Only Memory, EPROM), and the like.
  • The memory 124 may also be a dual in-line memory module (Dual In-line Memory Module, DIMM), or a solid state disk (Solid State Disk, SSD). In practical applications, multiple memories 124 and different types of memories 124 may be configured in the controller 0. This embodiment does not limit the quantity and type of the memory 124.
  • a software program is stored in the memory 124, and the processor 123 runs the software program in the memory 124 to manage the hard disk.
  • the hard disk is abstracted into a storage resource pool, and then divided into logical unit blocks (logic unit number, LUN) for the server to use.
  • the LUN here is actually the hard disk seen on the server.
  • some centralized storage systems are also file servers themselves, which can provide shared file services for servers.
  • The hardware components and software structures of the controller 1 and of other controllers not shown in FIG. 1 are similar to those of the controller 0 and will not be repeated here. It should be noted that only one engine 121 is shown in FIG. 1, but in practical applications, the storage system may include two or more engines 121, and redundancy or load balancing is performed among the multiple engines 121.
  • Figure 1 shows a centralized storage system with integrated disk control.
  • the engine 121 has a hard disk slot, and the hard disk 134 can be directly deployed in the engine 121.
  • The back-end interface 126 is an optional configuration. When the storage space of the system is insufficient, more hard disks or disk enclosures can be connected through the back-end interface 126.
  • the centralized storage system may also be a storage system with separate disk control.
  • the engine 121 may not have a hard disk slot, the hard disk 134 needs to be placed in a hard disk enclosure, and the back-end interface 126 communicates with the hard disk enclosure.
  • the back-end interface 126 exists in the engine 121 in the form of an adapter card, and one engine 121 can use two or more back-end interfaces 126 to connect multiple hard disk enclosures at the same time.
  • the adapter card can also be integrated on the motherboard, and at this time the adapter card can communicate with the processor 112 through a high-speed serial peripheral component interconnect (Peripheral Component Interconnect Express, PCI-E) bus.
  • a distributed storage system refers to a system that stores data dispersedly on multiple independent storage nodes.
  • the distributed network storage system adopts a scalable system structure and uses multiple storage nodes to share the storage load. It not only improves the reliability, availability and access efficiency of the system, but is also easy to expand.
  • FIG. 2 shows an architecture diagram of a distributed storage system, and the distributed storage system includes a storage cluster.
  • the storage cluster includes one or more servers 110 (three servers 110 are shown in FIG. 2 , but not limited to three servers 110 ), and the servers 110 can communicate with each other.
  • the servers 110 in the storage cluster may receive data, for example, data from other servers 110, and then store the data persistently, so as to realize data management.
  • The server 110 is a device having both computing capability and storage capability, such as a server or a desktop computer.
  • For example, the server 110 may be an Advanced RISC Machine (ARM) server or an X86 server.
  • the server 110 includes at least a processor 112 , a memory 113 , a network card 114 and a hard disk 105 .
  • the processor 112, the memory 113, the network card 114 and the hard disk 105 are connected through a bus.
  • the processor 112 and the memory 113 are used to provide computing resources.
  • the processor 112 is a central processing unit (central processing unit, CPU), used for processing data access requests from outside the server 110 (application server or other servers 110), and also used for processing requests generated inside the server 110.
  • the processor 112 receives the write data request, it will temporarily save the data in the write data request in the memory 113 .
  • the processor 112 sends the data stored in the memory 113 to the hard disk 105 for persistent storage.
  • the processor 112 is also used for computing or processing data, such as metadata management, deduplication, data compression, data verification, virtualized storage space, and address translation. Only one CPU 112 is shown in FIG. 2 . In practical applications, there are often multiple CPUs 112, and one CPU 112 has one or more CPU cores. This embodiment does not limit the number of CPUs and the number of CPU cores.
  • the memory 113 refers to an internal memory directly exchanging data with the processor. It can read and write data at any time, and the speed is very fast. It is used as a temporary data storage for an operating system or other running programs.
  • the hard disk 105 is used to provide storage resources, such as storing data.
  • the network card 114 is used to communicate with other servers 110 .
  • the distributed storage system shown in Figure 2 is a storage system with a storage-computing integrated architecture.
  • the distributed storage system can also adopt a storage-computing separation architecture, or a fully integrated architecture. The embodiment does not limit this.
  • the storage system may store configuration data, report data, etc., for use in subsequent data analysis.
  • configuration data and report data can usually be represented as key-value pairs.
  • the metadata of some data can also represent a key-value pair.
  • a file's size, creation time, creator, last edit time, etc. can also be represented as key-value pairs.
  • When a storage system (whether a centralized storage system or a distributed storage system) stores data or metadata represented as key-value pairs, it can store them based on a log-structured merge (LSM) tree. Specifically, the storage engine of the storage system may write the key-value pairs into a memory table (MemTable) in the memory. When the size of the memory table exceeds a limit, the memory table can be set to read-only mode, and the storage engine can start a background flush thread to write the key-value pairs in the read-only memory table to a persistent medium, for example, a high-speed medium such as a Non-Volatile Memory Express (NVMe) device.
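  • As a rough illustration of the MemTable flow described above, the following Python sketch shows a memory table that becomes read-only once full and is then flushed by a (simulated) background step; the table size limit and the flush target are simplified assumptions rather than the actual storage-engine implementation:

```python
class MemTable:
    """Toy in-memory table: mutable until it reaches its size limit."""
    def __init__(self, limit=4):
        self.data = {}
        self.read_only = False
        self.limit = limit

    def put(self, key, value):
        assert not self.read_only, "table is immutable; write to a new MemTable"
        self.data[key] = value
        return len(self.data) >= self.limit   # caller should rotate and flush

def flush_to_persistent_medium(imm_table, sst_files):
    # Stands in for the background flush thread writing the read-only table
    # to an SST-like file on a persistent (e.g. high-speed) medium.
    sst_files.append(sorted(imm_table.data.items()))

mem, ssts = MemTable(), []
for i in range(4):
    if mem.put(f"k{i}", f"v{i}"):
        mem.read_only = True                  # freeze the full table (ImmTable)
        flush_to_persistent_medium(mem, ssts)
        mem = MemTable()                      # new mutable MemTable for new writes
print(len(ssts), ssts[0][:2])
```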
  • Ceph uses the storage engine BlueStore to manage raw disks (also called block devices). Its main metadata is the Onode structure of the object.
  • The Onode structure is specifically a data structure including an object identifier (ID) and a storage address of the object, where the storage address of the object can be represented by an ExtentMap structure pointing to the logical location <offset, len> of the object.
  • the above Onode structure can form a key-value pair, the key in the key-value pair is the ID of the object, and the value in the key-value pair is the storage address of the object.
  • the ID lengths of different objects are usually not uniform.
  • When the storage engine writes the ID of the object to the persistent medium, specifically to an SST on the persistent medium, it usually needs to additionally write the length of the ID of the object (i.e., the KeyLen). It can be seen that when the length of the key in a key-value pair is not uniform, the storage system needs to store the KeyLen additionally every time a key is written, which increases the storage cost.
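  • As a rough numeric illustration of this overhead (a hypothetical on-disk layout, not the actual SST format), the following Python sketch compares serializing variable-length keys, each preceded by a 4-byte KeyLen, with serializing fixed-length keys whose length is stored only once:

```python
import struct

def serialize_variable(keys):
    # Keys have different lengths, so each key is preceded by a 4-byte KeyLen.
    out = b""
    for k in keys:
        out += struct.pack("<I", len(k)) + k
    return out

def serialize_fixed(keys, key_len):
    # All keys share one length, so KeyLen is stored once (or simply implied).
    assert all(len(k) == key_len for k in keys)
    return struct.pack("<I", key_len) + b"".join(keys)

variable = [b"objectA.part1.data1.437", b"objectB.p2", b"objectC.partXYZ.data9"]
fixed = [b"objectA" + bytes(13), b"objectB" + bytes(13), b"objectC" + bytes(13)]

print(len(serialize_variable(variable)))  # 4 bytes of KeyLen per key
print(len(serialize_fixed(fixed, 20)))    # a single 4-byte KeyLen for all keys
```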
  • an embodiment of the present application provides a data management method.
  • the method may be executed by a storage system (for example, the centralized storage system in FIG. 1 or the distributed storage system in FIG. 2 ).
  • Specifically, the storage system obtains the data to be written, and converts the key-value pair therein (for convenience of description, referred to as the first key-value pair, which includes a first key name and a key value): the storage system converts the first key name into a second key name whose length is a preset value, thereby generating a second key-value pair that includes the second key name and the key value, and records the mapping relationship between the second key name and the first key name. The storage system then stores the second key-value pair and the mapping relationship.
  • The storage system unifies the length of the key names in the key-value pairs to be written by converting the key names, for example, converting the first key name into a second key name whose length is a preset value. Thus, each time the storage system writes a key name, it does not need to additionally write the length of the key name, which saves storage space and reduces storage costs. Moreover, the length of the key name after conversion is usually much smaller than the length of the key name before conversion, and storing the converted key name can further save storage space and reduce storage costs.
  • the storage system may implement a data management method through a data management device.
  • a data management device is deployed in the storage system.
  • the data management device may specifically be a software device.
  • the storage system implements data management by running a computer program corresponding to the software device.
  • the data management device can be used not only to manage data in the form of key-value pairs, but also to manage metadata in the form of key-value pairs. That is, the data management device can be used to implement the functions of the metadata management device.
  • The storage system may also include both a data management device and a metadata management device, where the data management device is used to manage data in the form of key-value pairs, and the metadata management device is used to manage metadata in the form of key-value pairs.
  • the persistent medium of the storage system can be divided into a data area and a metadata area
  • the data area refers to an area for storing data
  • the metadata area refers to an area for storing metadata.
  • the data area may be a storage area provided by a hard disk
  • the metadata area may be a storage area provided by a high-speed medium.
  • the metadata area may also include a storage area provided by a hard disk
  • the data area may also include a storage area provided by a high-speed medium, which is not limited in this embodiment.
  • a user may perform an operation of accessing data, for example, a user A to a user N may perform an operation of storing data into a storage system or reading data from a storage system.
  • the storage system writes data into the data area through the data management device, and writes the metadata of the data into the metadata area through the metadata management device.
  • the storage system reads the metadata of the data from the metadata area, and the metadata can be used to address the data, and the storage system can read the data from the data area based on the metadata.
  • the storage system when the storage system writes data, it may first write the data, then write the metadata of the data into the memory, and then write the metadata from the memory to the persistent medium.
  • the storage system when the storage system reads data, it can first read the metadata from the persistent medium into the memory, and then read the data from the persistent medium into the memory according to the information indicated in the metadata.
  • the device 400 includes an interaction module 402 , a conversion module 404 and a management module 406 .
  • the interaction module 402 is used to obtain the data to be written, the data to be written includes a first key-value pair, and the first key-value pair includes a first key name and a key value;
  • The conversion module 404 is used to convert the first key name into a second key name, where the length of the second key name is a preset value, then generate a second key-value pair, where the second key-value pair includes the second key name and the key value, and record the mapping relationship between the second key name and the first key name.
  • the management module 406 is used for storing the above-mentioned second key-value pair and mapping relationship.
  • the management module 406 has a long key name (long key) management function and a key-value pair management function.
  • Based on the long-key management function, the management module 406 can store the mapping relationship between a long key whose KeyLen is greater than the preset value (for example, the first key name) and the converted short key (for example, the second key name), so as to realize long-key management.
  • the management module 406 may store the converted key-value pair (for example, the second key-value pair) based on the key-value pair management function, so as to implement key-value pair management.
  • the device 400 unifies the storage methods of fixed-length keys and variable-length keys through key name conversion, without additionally storing the key length KeyLen, reducing the storage cost of keys, thereby reducing the storage cost of key-value pairs.
  • The data acquired by the interaction module 402 may come from the user pools of different users, for example, from user pool 1 to user pool N (denoted as pool 1 to pool N, which are the user pools of user 1 to user N respectively; a user pool refers to a storage pool on the logical plane). The data management device 400 can also normalize the data of different user pools, so as to uniformly manage the data of different user pools and avoid repeatedly applying for management resources (for example, the resources required to create an SST).
  • the data management device 400 may further include a normalization module 408, which is configured to update the second key name according to the user identifier (for example, the identifier of the user pool from which the first key-value pair originates).
  • the normalization module 408 may add the identifier pool ID of the user pool to the second key name, thereby obtaining an updated second key name, and the updated second key name includes the pool ID.
  • The data management device 400 can also adopt a strategy of storing key names and key values separately for key-value pairs that occupy more storage space, so as to save the storage space of the high-speed medium.
  • the data management apparatus 400 may further include a separation module 409 .
  • the separation module 409 separates the second key name in the second key-value pair from the key value.
  • The management module 406 can write the key value into a log, replace the key value with the address of the key value, and store the replaced key-value pair.
  • Specifically, the management module 406 has a key-value management function, based on which it can write the key value into a log, such as a value log, and write it to the hard disk; based on the key-value-pair management function, the address of the key value is used to replace the key value.
  • the key-value pairs are written to high-speed media.
  • The data management device 400 can not only manage the key-value pairs formed by the ID of an object and the storage address of the object, but also manage the attributes of the object, such as the object's creation time, creator, and object size.
  • the management module 406 also provides an attribute management function for managing attributes.
  • the data management device 400 shown in FIG. 4 can also be used to manage metadata in the form of key-value pairs, and the management method of the metadata by the data management device 400 can refer to the management method of data.
  • metadata management may also be implemented by a metadata management device, wherein the structure of the metadata management device is similar to that of the data management device 400 , which will not be repeated here.
  • the method includes:
  • S502 The data management apparatus 400 acquires data to be written.
  • the data to be written may be data to be stored, or metadata of stored data.
  • the data to be written can be expressed as a key-value pair.
  • the data to be stored may be configuration data and report data, and the configuration data and report data may be expressed as key-value pairs.
  • the data to be stored may be metadata of stored data.
  • the metadata may be a key-value pair formed by the ID of the object and the storage address of the object.
  • the data to be written includes a first key-value pair, and the first key-value pair includes a first key name and a key value.
  • the first key name may be the ID of the object
  • the key value may be the storage address of the object.
  • this embodiment is only described by using object storage as an example. In other possible implementation manners of this embodiment of the present application, the stored data may also be stored using block storage or file storage.
  • the data management device 400 converts the first key name into a second key name, generates a second key-value pair, and records a mapping relationship between the second key name and the first key name.
  • the data management apparatus 400 may convert the key-value pairs that meet the conversion conditions.
  • the conversion condition can be that the length of the key name is greater than the preset value.
  • The preset value can be set according to empirical values; for example, it can be set to 12 bytes (B) or 20 B.
  • When receiving the first key-value pair (i.e., the original key/value) input by the user, the data management device 400 can determine whether the length of the first key name is greater than the preset value, so as to determine whether the first key-value pair meets the conversion condition.
  • the data management device 400 may use a conversion algorithm to convert the first key name of the first key-value pair into the second key name, thereby generating the second key-value pair.
  • the second key-value pair includes a second key name and a key value. Wherein, the length of the second key name is a preset value.
  • When the first key-value pair does not meet the conversion condition, the data management apparatus 400 may not perform conversion. It should also be noted that when the length of the key name of the first key-value pair is less than the preset value, the data management device 400 can pad the key name of the first key-value pair so that the length of the padded key name is equal to the preset value.
  • the data management apparatus 400 may perform key name conversion through a hash algorithm. Specifically, the data management device 400 can perform hash processing on the first key name (also referred to as the original key) through a hash algorithm to obtain a hash ID (hash id), based on which the second key name can be obtained. Considering the situation of hash collision, the data management device 400 can combine the short identifier (short id) on the basis of the hash id to obtain the second key name. Among them, the short id can be a randomly generated character string, or a sequentially generated character string.
  • the second key name may be, for example, a character string concatenated by hash id and short id.
  • When the data management device 400 performs hash processing on the first key name through a hash algorithm, it may hash some or all characters of the first key name. Considering that the key names of different key-value pairs are highly similar, the data management device 400 can hash only some characters of the first key name, so that a longer common prefix (that is, the same prefix shared by different strings) can be extracted, thereby improving the compression rate and saving storage space.
  • the first key name is the ID of the object, and the ID of the object is objectXXX.partXXX.dataXXX.437.
  • the ID of the object is divided into four fields separated by ".”.
  • the first field "objectXXX” and the second field "partXXX” are highly similar to the corresponding fields of other key names.
  • the data management device 400 can hash other fields than the first field to obtain a hash ID, and then generate a second key name according to the first field, hash ID and short ID, and the second key name can be expressed as " objectXXX"+hash ID+short ID.
  • The data management device 400 may also perform hashing on the fields other than the first field and the second field to obtain a hash ID, and then generate the second key name according to the first field, the second field, the hash ID, and the short ID; this second key name can be expressed as "objectXXX.partXXX"+hash ID+short ID.
  • Compared with the first type of second key name, the hash ID in the second type of second key name "objectXXX.partXXX"+hash ID+short ID is shorter, so the collision probability is relatively higher, but a longer common prefix can be extracted, thereby saving more storage space. The hash ID in the first type of second key name is longer and the collision probability is lower, so the uniqueness of the key name can be better guaranteed.
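  • The conversion just described can be sketched in Python as follows; the split on ".", the 8-byte hash, the 2-byte sequential short id, and the 32-byte preset length are illustrative assumptions rather than the exact algorithm of the embodiment:

```python
import hashlib
import itertools

_short_id_counter = itertools.count()   # sequentially generated short id (assumption)
PRESET_LEN = 32                         # assumed preset length of the converted key

def convert_key(original_key: str, keep_fields: int = 2) -> bytes:
    """Convert a long key name into a key name of fixed preset length.

    The first `keep_fields` '.'-separated fields are kept as a plain prefix
    (so similar keys still share a long common prefix), the remaining fields
    are hashed, and a short id is appended to resolve hash collisions."""
    fields = original_key.split(".")
    prefix = ".".join(fields[:keep_fields]).encode()
    rest = ".".join(fields[keep_fields:]).encode()
    hash_id = hashlib.sha1(rest).digest()[:8]          # 8-byte hash id (assumption)
    short_id = next(_short_id_counter).to_bytes(2, "little")
    key = prefix + hash_id + short_id
    assert len(key) <= PRESET_LEN, "prefix too long for the preset key length"
    return key.ljust(PRESET_LEN, b"\x00")              # pad to the fixed preset length

converted = convert_key("objectXXX.partXXX.dataXXX.437")
print(len(converted))   # always PRESET_LEN, so no per-key KeyLen is needed
```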
  • the data management device 400 may also record the mapping relationship between the second key name and the first key name.
  • the data management apparatus 400 may write the mapping relationship into long key management (for example, a cache for managing long keys) for subsequent persistent storage.
  • the data management apparatus 400 stores the second key-value pair and the mapping relationship.
  • The data management apparatus 400 may use an LSM tree to store the second key-value pair. Considering that directly storing the second key-value pair in the LSM tree will produce certain redundancy (for example, when the key name of the second key-value pair and the key name of a third key-value pair include a common prefix, storing the second key-value pair and the third key-value pair separately stores the common prefix multiple times), the data management device 400 can also form the second key-value pair and at least one third key-value pair into a prefix tree, and store the prefix tree on the persistent medium to reduce redundancy.
  • the length of the key name of the third key-value pair is the same as the length of the key name of the second key-value pair.
  • the third key-value pair may be a converted key-value pair (the original length of the key name is greater than the preset value), or an original key-value pair (the original length of the key name is the preset value).
  • The common prefix of the key name of the third key-value pair and the key name of the second key-value pair is used as the root node of the prefix tree, the different strings other than the common prefix are used as the intermediate nodes of the prefix tree, and the key values are used as the leaf nodes of the prefix tree.
  • the key name of the second key-value pair is "objectXXX.partXXX"+hash ID+short ID
  • the key name of the third key-value pair is "objectXXX.partXYZ”+hash ID+short ID
  • The common prefix of the key name of the second key-value pair and the key name of the third key-value pair is "objectXXX.partX", which is used as the root node of the prefix tree; the remaining strings "XX"+hash ID+short ID and "YZ"+hash ID+short ID are used as intermediate nodes of the prefix tree.
  • the intermediate node refers to the non-root non-leaf node.
  • the intermediate node usually has child nodes and parent nodes.
  • the key value of the second key-value pair and the key value of the third key-value pair are respectively stored in the leaf nodes.
  • At least one non-leaf node on the path to the leaf node is used to store the key name, and each path to the leaf node corresponds to a key name.
  • "objectXXX.partX” stored in the root node and "XX"+hash ID+short ID stored in an intermediate node correspond to the key name of the second key-value pair.
  • the data management device 400 merges the prefix trees to obtain an LSM tree. Specifically, referring to the schematic flowchart of merging prefix trees shown in FIG. 7 , the data management device 400 may write the prefix trees into the memory, for example, into the memory table MemTable. When the size of the memory table is larger than the limit, set the MemTable to read-only mode to obtain the ImmTable. The data management device 400 may write the prefix tree in the ImmTable into the first layer of the LSM tree in time sequence.
  • The LSM tree includes L layers. For any layer from the first layer to the (L-1)-th layer of the L layers, when the number of trees included in the i-th layer reaches the preset threshold of the i-th layer, a preset number of trees in the i-th layer are merged, and the merged tree is written into the (i+1)-th layer, where i is a positive integer. After the data management device 400 merges the preset number of trees in the i-th layer and writes the merged tree into the next layer, it can delete the merged trees from the i-th layer.
  • The serialization from the memory to the first layer adopts a one-to-one correspondence writing method, and a tree in the ImmTable is converted into a tree of the same size in the L0 layer. All trees from the first layer to the L-th layer (that is, layers L0 to Ln) are arranged in time order.
  • The data management device 400 merges a preset number of trees into one tree.
  • the preset threshold and preset number of each layer can be set according to experience values, and the preset threshold and preset number of each layer can be different.
  • n should not be too large, so as to avoid write amplification caused by frequent writing.
  • The data management device 400 may also set a preset threshold for the L-th layer of the LSM tree, and when the number of trees in the L-th layer reaches this threshold, merge a preset number of trees into a final tree. It should be noted that the data management device 400 can merge the trees of each layer concurrently, which can speed up the merging.
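  • The layer-merging policy can be sketched as follows; the number of layers, the per-layer threshold, and the merge count are illustrative values only, and each "tree" is simplified to a dictionary:

```python
class SimpleLSM:
    """Toy LSM of prefix trees: when layer i holds `threshold` trees, its
    `merge_count` oldest trees are merged and pushed to layer i+1; in the
    last layer the merged result is kept as a final tree."""

    def __init__(self, layers=3, threshold=4, merge_count=2):
        self.layers = [[] for _ in range(layers)]
        self.threshold = threshold
        self.merge_count = merge_count

    @staticmethod
    def merge(trees):
        merged = {}
        for t in trees:                 # later (newer) trees overwrite older ones
            merged.update(t)
        return merged

    def add(self, tree, layer=0):
        self.layers[layer].append(tree)
        if len(self.layers[layer]) < self.threshold:
            return
        oldest = self.layers[layer][: self.merge_count]
        del self.layers[layer][: self.merge_count]         # delete the merged trees
        if layer + 1 < len(self.layers):
            self.add(self.merge(oldest), layer + 1)         # write into the next layer
        else:
            self.layers[layer].append(self.merge(oldest))   # final tree in last layer

lsm = SimpleLSM()
for i in range(20):
    lsm.add({f"key{i}": f"value{i}"})
print([len(layer) for layer in lsm.layers])   # trees remaining per layer
```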
  • the data management apparatus 400 may store the trees included in each layer of the LSM tree to a persistent medium. Specifically, the data management apparatus 400 may store the prefix tree written in the LSM tree to a persistent medium. When the prefix trees of the i-th layer of the LSM tree are merged, the data management apparatus 400 may write the merged tree into a persistent medium. For example, when the number of prefix trees included in the L0 layer reaches 10, the data management apparatus 400 may merge the earliest 4 trees among the 10 trees, and write the merged trees to the persistent medium. Further, after the merged tree is written into the persistent medium, the data management apparatus 400 may also delete the plurality of prefix trees corresponding to the merged tree stored in the persistent medium. For example, the data management apparatus 400 may delete the four trees that have been stored in the persistent medium.
  • Specifically, the data management device 400 can write the values of the root nodes, intermediate nodes, and leaf nodes of the trees included in each layer of the LSM tree into the persistent medium, generate and record index information describing the relationships among the root nodes, intermediate nodes, and leaf nodes in the prefix tree or the final tree, and then write the index information into the persistent medium, so as to flush the LSM tree to disk.
  • Specifically, the data management device 400 writes the non-leaf nodes (that is, inner nodes) and leaf nodes of the trees in layers L0 to Ln (Ln is used as an example in FIG. 8) into the first type of SST on the persistent medium, also called the Inner SST.
  • the data written into the first type of SST can be written into the cache synchronously, so as to improve the search efficiency.
  • The data management device 400 can write the leaf nodes of the final tree into the second type of SST on the persistent medium, also called the Leaf SST.
  • the data in the Leaf SST is not cached in memory.
  • the Inner node in the final tree can be written to the Inner SST and synchronized to the cache.
  • node information refers to information in a node, for example, information of a root node, information of child nodes pointed to by the root node, and information of leaf nodes.
  • Index information refers to pointer information between nodes, and is used to indicate the relationship between nodes, for example, to indicate a parent node and a child node of a node.
  • the SST root information refers to the relevant information of the entire SST, such as the size of the SST, the position of the node in the SST, and so on.
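  • As a rough sketch of flushing at node granularity (the record layout, node names, and JSON encoding are assumptions for illustration, not the actual SST format), non-leaf nodes go to an Inner-SST-like file while leaf nodes of a final tree go to a Leaf-SST-like file, and the parent/child index information is recorded separately:

```python
import json

def flush_tree(tree, is_final_tree):
    """Split a simplified prefix tree into node records plus index records."""
    inner_sst, leaf_sst, index = [], [], []
    inner_sst.append({"node": "root", "value": tree["prefix"]})
    for i, (suffix, value) in enumerate(tree["children"].items()):
        inner_sst.append({"node": f"mid{i}", "value": suffix})
        leaf_record = {"node": f"leaf{i}", "value": value}
        # Leaf nodes of the final tree go to the Leaf SST (and are not cached);
        # all other nodes go to the Inner SST (and could be cached in memory).
        (leaf_sst if is_final_tree else inner_sst).append(leaf_record)
        # Index information: parent/child relationships between the nodes.
        index.append({"parent": "root", "child": f"mid{i}"})
        index.append({"parent": f"mid{i}", "child": f"leaf{i}"})
    return json.dumps(inner_sst), json.dumps(leaf_sst), json.dumps(index)

tree = {"prefix": "objectXXX.partX",
        "children": {"XX.h1.s1": "addr1", "YZ.h2.s2": "addr2"}}
inner, leaf, idx = flush_tree(tree, is_final_tree=True)
print(idx)   # the recorded index information
```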
  • The LSM tree adopts a fully time-ordered arrangement, which can improve the merging speed.
  • the data management device 400 may also use an LSM tree to store the mapping relationship. Specifically, the data management apparatus 400 may form a plurality of mapping relationships into a prefix tree, then write the prefix tree into the LSM tree, and merge the prefix trees that meet the requirements in the LSM tree.
  • the data management device 400 may also write the Leaf node of the final tree corresponding to the mapping relationship into the Inner SST, and synchronize it to the cache.
  • users 1 to N can perform data access operations, and when users 1 to N perform data storage operations, they can also access the metadata of the data.
  • the metadata of each user is represented by key-value pairs.
  • the key name is the ID of the object
  • the key value is the storage address of the object.
  • the key name of the metadata stored by user 1 is Objectx.xxxxxx.1
  • the key name of the metadata stored by user 2 is Objectx.xxxxxx.2, and so on.
  • The key name of the metadata stored by user N is Objectx.xxxxxx.n.
  • When the data management device 400 receives the data to be written, it may first check the validity of the data based on set rules. For example, if the key name is empty, the data is considered illegal and the verification fails. When the verification passes, the data management device 400 can determine the key-value pairs that meet the conversion condition, specifically the key-value pairs whose key name length is greater than the preset value, and use a conversion algorithm to convert the key name (long key) of those key-value pairs into a short key name (short key).
  • If the short key generated by the data management device 400 is generated for the first time, the short key -> long key mapping relationship can be written into the long-key management structure, and the mapping relationship will be cached.
  • the data management device 400 can compress the original long key, and finally output the converted short key and value.
  • For key-value pairs that do not meet the conversion condition, the data management device 400 can directly output the original key/value. In this way, space waste caused by variable-length keys can be avoided, and keys can be compressed to further save storage costs.
  • The data management apparatus 400 may further normalize the resource pools to solve the problem of resource waste. As shown in FIG. 10, when data of different user pools is written, the data management device 400 can identify the pool IDs of the different user pools and add the pool ID to the corresponding key, such as the converted short key. Since the keys of multiple user pools then share a common prefix, the pool ID takes up almost no additional space when the prefix tree is used for storage, which avoids a large amount of resource waste caused by multi-pool isolation.
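  • A minimal sketch of this normalization step, assuming the pool ID is simply prepended to the converted short key (the exact placement in the embodiment may differ):

```python
def normalize_key(pool_id: int, short_key: bytes) -> bytes:
    # Prepend the user-pool identifier; keys written from the same pool then
    # share this byte as part of their common prefix, so in the prefix tree it
    # is stored once per subtree and costs almost no extra space.
    return pool_id.to_bytes(1, "little") + short_key

key_pool1 = normalize_key(1, b"objectXXX" + b"\x11" * 10)
key_pool2 = normalize_key(2, b"objectXXX" + b"\x22" * 10)
print(key_pool1[:1], key_pool2[:1])   # only the leading pool byte differs
```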
  • The data management device 400 can also judge the length of the key value in the converted key-value pair. For a target key value whose length is greater than a preset length, the data management device 400 can separate the key-value pair and write the target key value into key-value management, specifically appending it to the value log. The data management device 400 can then use the address of the target key value to replace the target key value, obtain the replaced key-value pair, and write the replaced key-value pair into key/value management for persistent storage. For a key value whose length is not greater than the preset length, the data management apparatus 400 may directly write it into key/value management for persistent storage.
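  • The key/value separation described above can be sketched as follows; the length threshold and the value-log layout are assumptions for illustration:

```python
VALUE_LEN_LIMIT = 64        # assumed threshold for separating large key values
value_log = bytearray()     # stands in for the append-only value log on disk

def store_kv(key: bytes, value: bytes, kv_store: dict):
    if len(value) <= VALUE_LEN_LIMIT:
        kv_store[key] = ("inline", value)           # small value stays with the key
        return
    offset = len(value_log)
    value_log.extend(value)                         # append the large value to the log
    kv_store[key] = ("vlog", (offset, len(value)))  # key now points to the value address

def load_value(key: bytes, kv_store: dict) -> bytes:
    kind, payload = kv_store[key]
    if kind == "inline":
        return payload
    offset, length = payload
    return bytes(value_log[offset:offset + length])

store = {}
store_kv(b"small_key", b"abc", store)
store_kv(b"large_key", b"x" * 200, store)
print(load_value(b"large_key", store)[:5])   # b'xxxxx'
```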
  • the data management device 400 may also write object attributes, such as object creation time, creator, and object size, into attribute management, so as to store the object attributes persistently.
  • The mapping relationships, the attributes of objects, and the key-value pairs (for example, the key-value pairs formed by an original key or converted key and a key value whose length is not greater than the preset length, and the key-value pairs formed by an original key or converted key and the address of a target key value whose length is greater than the preset length) can respectively form corresponding prefix trees, which are merged in time order to form an LSM tree.
  • the data management apparatus 400 may store the trees included in each layer of the LSM tree to a persistent medium.
  • The data management device 400 flushes data to disk at node granularity instead of KV granularity, which also saves space on the persistent medium (such as the high-speed medium). Moreover, flushing at node granularity can speed up data reads.
  • the device 400 includes:
  • An interaction module 402 configured to obtain data to be written, the data to be written includes a first key-value pair, and the first key-value pair includes a first key name and a key value;
  • a conversion module 404 configured to convert the first key name into a second key name, where the length of the second key name is a preset value, and generate a second key-value pair, where the second key-value pair includes the second key name and the key value, and record the mapping relationship between the second key name and the first key name;
  • the management module 406 is configured to store the second key-value pair and the mapping relationship.
  • For the specific implementation of the interaction module 402 acquiring the data to be written, refer to the description of the relevant content of S502 in the embodiment shown in FIG. 5.
  • For the specific implementation of the conversion module 404 converting the first key name and recording the mapping relationship between the second key name and the first key name, refer to the description of the relevant content of S504 in the embodiment shown in FIG. 5.
  • For the specific implementation of the management module 406 storing the second key-value pair and the mapping relationship, refer to the description of the relevant content of S506 in the embodiment shown in FIG. 5.
  • In some possible implementations, the conversion module 404 is specifically configured to: determine whether the length of the first key name is greater than the preset value, and if so, convert the first key name into the second key name.
  • the specific implementation of the key name conversion by the conversion module 404 can refer to the relevant content description of S504 in the embodiment shown in FIG. 5 , which will not be repeated here.
  • In some possible implementations, the conversion module 404 is specifically configured to: determine a hash identifier through a hash algorithm according to the first key name, and generate the second key name according to the hash identifier.
  • the specific implementation of the key name conversion by the conversion module 404 can refer to the relevant content description of S504 in the embodiment shown in FIG. 5 , which will not be repeated here.
  • the second key name further includes a user identifier.
  • the conversion module 404 may add a user identifier, such as a pool ID, to the second key name, so as to normalize user data and avoid repeatedly applying for management resources.
  • the data to be written is metadata of stored data.
  • the stored data is an object in object storage, the first key name in the first key-value pair is the identifier of the object, and the key value in the first key-value pair includes the storage address of the object.
  • the management module 406 is specifically configured to:
  • form a prefix tree with the second key-value pair and at least one third key-value pair, wherein the key name of the third key-value pair has the same length as the key name of the second key-value pair, and the key name of the third key-value pair and the key name of the second key-value pair include a common prefix, wherein the common prefix serves as the root node of the prefix tree, the different character strings in the second key-value pair and the at least one third key-value pair other than the common prefix serve as intermediate nodes of the prefix tree, and the key values serve as leaf nodes of the prefix tree; and store the prefix tree in a persistent medium.
  • the specific implementation of storing the second key-value pair by the management module 406 can refer to the related content description of S506 in the embodiment shown in FIG. 5, which will not be repeated here.
  • the management module 406 is specifically configured to:
  • write the prefix tree into the first layer of the structured merge LSM tree, where the LSM tree includes L layers; for any layer from the first layer to the (L-1)-th layer among the L layers, when the number of trees included in the i-th layer reaches the preset threshold of the i-th layer, a preset number of trees in the i-th layer are merged, and the merged tree is written into the (i+1)-th layer, where i and L are positive integers;
  • for the L-th layer among the L layers, when the number of trees in the L-th layer reaches the preset threshold of the L-th layer, a preset number of trees in the L-th layer are merged into a final tree;
  • the trees included in each layer of the LSM tree are stored in the persistent medium.
  • the specific implementation of storing the second key-value pair by the management module 406 can refer to the related content description of S506 in the embodiment shown in FIG. 5, which will not be repeated here.
  • the management module 406 is specifically configured to: write the values of the root nodes, intermediate nodes, and leaf nodes of the trees included in each layer of the LSM tree into the persistent medium, generate index information recording the relationships of the root nodes, intermediate nodes, and leaf nodes in the prefix tree or the final tree, and write the index information into the persistent medium.
  • the specific implementation of storing the second key-value pair by the management module 406 can refer to the related content description of S506 in the embodiment shown in FIG. 5, which will not be repeated here.
  • the data management device 400 according to the embodiment of the present application may correspondingly perform the methods described in the embodiments of the present application, and the above and other operations and/or functions of the modules/units of the data management device 400 are respectively intended to implement the corresponding processes of the methods in the embodiments shown in FIG. 5 to FIG. 9, which are not repeated here.
  • the embodiment of the present application also provides a computer cluster.
  • the computer cluster includes at least one computer.
  • at least one computer can form a centralized storage system as shown in FIG. 1, and when the computer cluster includes multiple computers, such as servers, it can also form a distributed storage system as shown in FIG. 2.
  • the computer cluster is specifically used to implement the functions of the data management device 400 in the embodiment shown in FIG. 4.
  • the at least one computer includes at least one processor and at least one memory, computer-readable instructions are stored in the at least one memory, and the at least one processor executes the computer-readable instructions, so that the computer cluster executes the aforementioned data management method, or implements the functions of the aforementioned data management device 400.
  • Specifically, when the embodiment shown in FIG. 4 is implemented and the modules of the data management device 400 described in the embodiment of FIG. 4 are implemented by software, the software or program code required to perform the functions of the interaction module 402, the conversion module 404, and the management module 406 in FIG. 4 is stored in the memory.
  • the processor executes the software or program codes in the memory, thereby executing the aforementioned data management method, or realizing the functions of the aforementioned data management device.
  • the embodiment of the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be any available medium that a computing device can store, or a data storage device such as a data center that includes one or more available media.
  • the available media may be magnetic media (e.g., floppy disk, hard disk, magnetic tape), optical media (e.g., DVD), or semiconductor media (e.g., solid-state drive), etc.
  • the computer-readable storage medium includes instructions, and the instructions instruct a computing device to execute the above-mentioned data management method.
  • the embodiment of the present application also provides a computer program product.
  • the computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on the computing device, the processes or functions according to the embodiments of the present application will be generated in whole or in part.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, or data center to another website, computer, or data center by wire (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wirelessly (such as infrared, radio, or microwave).
  • the computer program product may be a software installation package which can be downloaded and executed on a computing device if any of the aforementioned data management methods are required.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data management method is provided, including: obtaining data to be written, where the data to be written includes a first key-value pair and the first key-value pair includes a first key name and a key value; converting the first key name into a second key name whose length is a preset value; generating a second key-value pair that includes the second key name and the key value, and recording a mapping relationship between the second key name and the first key name; and storing the second key-value pair and the mapping relationship. The method unifies the storage of fixed-length and variable-length key names, so the length of a key name no longer needs to be stored separately, which saves storage space and reduces storage cost.

Description

一种数据管理方法及相关装置
本申请要求于2021年12月31日提交中国专利局、申请号为202111676219.X、发明名称为“一种数据管理方法及相关装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及存储技术领域,尤其涉及一种数据管理方法、装置、集群以及计算机可读存储介质、计算机程序产品。
背景技术
随着信息时代的到来,互联网中产生了大量的数据。例如,互联网中产生有大量的配置数据、报表数据(如出行信息统计表)。为了便于数据管理,可以采用键值对(key-value,KV)的方式对数据进行存储。其中,键值对是键名(记作key)和键值(记作value)形成的数据对。键名通常是定义数据集的常量,例如可以包括性别、姓名、颜色等,键值通常是数据集中的变量,如男/女、张xx、绿色等。
目前,很多存储系统通常是采用结构化合并(log Structured Merge,LSM)树存储键值对。具体地,当内存中的键值对的数据量超过限值后,可以将键值对写入有序字符串表(sorted strings table,SST)进行持久化存储。
然而,键值对中key的长度(记作KeyLen)不统一时,还需要在每次写入key时,额外写入KeyLen。一个KeyLen通常需要占用4字节(Byte,B)的存储空间,如此需要预留较多的存储空间存储KeyLen,增加了存储成本。
发明内容
本申请提供了一种数据管理方法,该方法通过对键名进行转换,统一待写入的键值对中键名的长度,从而使得每次存储键名时,无需额外存储键名的长度,节省存储空间,降低存储成本。本申请还提供了上述方法对应的装置、计算机集群、计算机可读存储介质以及计算机程序产品。
第一方面,本申请提供了一种数据管理方法。该方法可以由存储系统(例如是集中式存储系统或者是分布式存储系统)执行。具体地,存储系统获取待写入的元数据,将待写入的数据中满足转换条件的键值对(为了便于描述,可以称之为第一键值对,第一键值对包括第一键名和键值)进行转换,具体是将第一键名转换为第二键名,该第二键名的长度为预设值,由此生成包括上述第二键名和键值的第二键值对,并记录第二键名和第一键名的映射关系,接着存储系统存储上述第二键值对和映射关系。
在该方法中,存储系统通过对键名进行转换,例如将第一键名转换为长度为预设值的第二键名,统一了待写入的键值对中键名的长度,如此,存储系统在每次写入键名时,无需额外写入该键名的长度,节省了存储空间,降低了存储成本。
在一些可能的实现方式中，存储系统可以判断所述第一键名的长度是否大于所述预设值，如果大于所述预设值，则将所述第一键名转换为所述第二键名，且所述预设值小于所述第一键名的长度。
在该方法中,转换后的键名的长度通常远小于转换前的键名的长度,通过存储转换后的键名可以进一步节省存储空间,降低存储成本。
在一些可能的实现方式中,存储系统可以根据所述第一键名,通过哈希算法确定哈希标识,然后根据所述哈希标识生成第二键名。考虑到一些键名进行哈希时可能存在哈希碰撞或哈希冲突,存储系统可以根据哈希标识和短标识生成第二键名。其中,短标识的作用在于对具有相同哈希标识的键名进行区分,从而保障键名的唯一性。该短标识例如可以是随机生成的字符串或者顺序生成的字符串。
在该方法中,存储系统可以根据第一键名的部分或全部字符进行哈希。其中,参与哈希的字符越多,哈希标识越长,哈希冲突的可能性越低。参与哈希字符的越少,则预留的字符越多,能够提取的公共前缀越长,可以进一步提高压缩率,节省存储空间。
在一些可能的实现方式中,所述第二键名中还包括用户标识。该用户标识可以是用户池的标识。对于每个用户,可以维护一个逻辑层面的用户池,待写入的数据经过用户池写入至内存,然后从内存写入至持久化介质进行持久化存储。
由于在第二键名中增加用户标识,可以实现用户数据归一化,无需进行用户隔离,避免了隔离用户池导致重复申请资源所产生的资源浪费。
在一些可能的实现方式中,所述待写入的数据为已存储数据的元数据。元数据为描述数据的数据,该元数据可以用于对数据进行寻址,为了便于数据管理,元数据可以表示为键值对,因而可以采用本申请的数据管理方法对元数据进行高效存储,避免浪费存储空间。
在一些可能的实现方式中,所述已存储数据为对象存储中的对象,所述第一键值对中的第一键名为所述对象的标识,所述第一键值对中的键值包括所述对象的存储地址。如此,可以实现高效管理对象存储的元数据。
在一些可能的实现方式中,存储系统可以将所述第二键值对与至少一个第三键值对形成前缀树。其中,所述第三键值对的键名与所述第二键值对的键名的长度相同,且所述第三键值对的键名与所述第二键值对的键名包括公共前缀。公共前缀是指不同字符串所具有的相同前缀。例如,键名“objectXXX.partXXX.dataXXX.437”和键名“objectXXX.partXYZ.dataXXX.437”具有如下公共前缀“objectXXX.partX”。
存储系统可以将所述公共前缀作为所述前缀树的根节点,所述第二键值对与所述至少一个第三键值对中除所述公共前缀之外的不同字符串作为所述前缀树的中间节点,所述键值作为所述前缀树的叶子节点,然后将所述前缀树存储至持久化介质。
在该方法中,存储系统存储前缀树时只需存储一次公共前缀,减少了冗余数据,通过对不同键名的相似部分进行压缩,提高了压缩率,节省了存储空间,降低了存储成本。
在一些可能的实现方式中，存储系统可以将所述前缀树写入结构化合并LSM树的第1层。所述LSM树包括L层，对于所述L层中的第1层到第L-1层中的任意一层，当所述第i层包括的树的数量达到所述第i层的预设阈值时，将所述第i层的预设数量棵树合并，并将合并后的树写入第i+1层，所述i和L为正整数。对于所述L层中的第L层，当所述第L层中的树的数量达到所述第L层的预设阈值时，将所述第L层中的预设数量棵树合并为最终树。然后存储系统将LSM树的各层包括的树存储至所述持久化介质。
其中,每层的预设阈值和预设数量可以根据经验值设置,并且每层预设阈值和预设数量可以不同。在该实施例中,LSM树层数不宜过大,避免频繁写导致写放大。需要说明的是,LSM树中各层的前缀树是按时序排列的,存储系统在合并各层的树时,可以并发执行,如此可以加速合并速度。
在一些可能的实现方式中,存储系统可以将所述LSM树的各层包括的树的根节点、中间节点及叶子节点的值写入所述持久化介质,生成记录所述根节点、中间节点及叶子节点在所述前缀树或所述最终树中的关系的索引信息,然后将所述索引信息写入所述持久化介质。相对于基于键值对粒度进行持久化存储,该方法能够有效压缩不同键名的相似部分,节省存储空间,降低存储成本。
第二方面,本申请提供了一种数据管理装置。所述装置包括:
交互模块,用于获取待写入的数据,所述待写入的数据包括第一键值对,所述第一键值对包括第一键名和键值;
转换模块,用于将所述第一键名转换为第二键名,所述第二键名的长度为预设值,生成第二键值对,所述第二键值对包括所述第二键名与所述键值,并记录所述第二键名与所述第一键名的映射关系;
管理模块,用于存储所述第二键值对与所述映射关系。
在一些可能的实现方式中,所述转换模块具体用于:
判断所述第一键名的长度是否大于所述预设值,如果大于所述预设值,则将所述第一键名转换为所述第二键名,且所述预设值小于所述第一键名的长度。
在一些可能的实现方式中,所述转换模块具体用于:
根据所述第一键名,通过哈希算法确定哈希标识;
根据所述哈希标识生成第二键名。
在一些可能的实现方式中,所述第二键名中还包括用户标识。
在一些可能的实现方式中,所述待写入的数据为已存储数据的元数据。
在一些可能的实现方式中,所述已存储数据为对象存储中的对象,所述第一键值对中的第一键名为所述对象的标识,所述第一键值对中的键值包括所述对象的存储地址。
在一些可能的实现方式中,所述管理模块具体用于:
将所述第二键值对与至少一个第三键值对形成前缀树,其中,所述第三键值对的键名与所述第二键值对的键名的长度相同,且所述第三键值对的键名与所述第二键值对的键名包括公共前缀,其中,所述公共前缀作为所述前缀树的根节点,所述第二键值对与所述至少一个第三键值对中除所述公共前缀之外的不同字符串作为所述前缀树的中间节点,所述键值作为所述前缀树的叶子节点;
将所述前缀树存储至持久化介质。
在一些可能的实现方式中,所述管理模块具体用于:
将所述前缀树写入结构化合并LSM树的第1层,所述LSM树包括L层,对于所述L层中的第1层到第L-1层中的任意一层,当所述第i层包括的树的数量达到所述第i层的预设阈值时,将所述第i层的预设数量棵树合并,并将合并后的树写入第i+1层,所述i和L为正整数;
对于所述L层中的第L层,当所述第L层中的树的数量达到所述第L层的预设阈值时,将所述第L层中的预设数量棵树合并为最终树;
将所述LSM树各层包括的树存储至所述持久化介质。
在一些可能的实现方式中,所述管理模块具体用于:
将所述LSM树的各层包括的树的根节点、中间节点及叶子节点的值写入所述持久化介质,生成记录所述根节点、中间节点及叶子节点在所述前缀树或所述最终树中的关系的索引信息,然后将索引信息写入所述持久化介质。
第三方面,本申请提供一种计算机集群。所述计算机集群包括至少一台计算机,所述至少一台计算机包括处理器和存储器。所述处理器、所述存储器进行相互的通信。所述处理器用于执行所述存储器中存储的指令,以使得所述计算机集群执行如第一方面或第一方面的任一种实现方式中的数据管理方法。
第四方面,本申请提供一种计算机可读存储介质,所述计算机可读存储介质中存储有指令,所述指令指示计算机集群执行上述第一方面或第一方面的任一种实现方式所述的数据管理方法。
第五方面,本申请提供了一种包含指令的计算机程序产品,当其在计算机集群上运行时,使得计算机集群执行上述第一方面或第一方面的任一种实现方式所述的数据管理方法。
本申请在上述各方面提供的实现方式的基础上,还可以进行进一步组合以提供更多实现方式。
附图说明
为了更清楚地说明本申请实施例的技术方法，下面将对实施例中所需使用的附图作以简单地介绍。
图1为本申请实施例提供的一种集中式存储系统的架构示意图;
图2为本申请实施例提供的一种分布式存储系统的架构示意图;
图3为本申请实施例提供的一种存储系统的架构示意图;
图4为本申请实施例提供的一种数据管理装置的结构示意图;
图5为本申请实施例提供的一种数据管理方法的流程图;
图6为本申请实施例提供的一种键值对转换的流程示意图;
图7为本申请实施例提供的一种合并前缀树的流程示意图;
图8为本申请实施例提供的一种结构化合并树下盘的流程示意图;
图9为本申请实施例提供的一种数据管理方法的流程示意图;
图10为本申请实施例提供的一种用户池归一化的流程示意图。
具体实施方式
本申请实施例中的术语“第一”、“第二”仅用于描述目的,而不能理解为指示或暗示相对重要性或者隐含指明所指示的技术特征的数量。由此,限定有“第一”、“第二”的特征可以明示或者隐含地包括一个或者更多个该特征。
首先对本申请实施例中所涉及到的一些技术术语进行介绍。
本申请可以应用于集中式存储系统的应用场景中。集中式存储系统就是指由一台或多台主设备组成中心节点,数据集中存储于这个中心节点中,并且整个系统的所有数据处理业务都集中部署在这个中心节点上。换言之,集中式存储系统中,终端或客户端仅负责数据的录入和输出,而数据的存储与控制处理完全交由中心节点来完成。
参见图1所示的集中式存储系统的架构图,用户通过应用程序来存取数据。运行这些应用程序的计算机被称为“应用服务器”。应用服务器100可以是物理机,也可以是虚拟机。物理机包括但不限于桌面电脑、服务器、笔记本电脑以及移动设备。应用服务器通过交换机110(例如是光纤交换机)访问存储系统以存取数据。然而,交换机110只是一个可选设备,应用服务器100也可以直接通过网络与存储系统120通信。或者,光纤交换机110也可以替换成以太网交换机、无线带宽(InfiniBand,IB)交换机、基于融合以太网的远程内存直接访问(RDMA over Converged Ethernet,RoCE)交换机等。
图1所示的存储系统120是一个集中式存储系统。存储系统120可以接收来自应用服务器100的数据,然后对该数据进行存储,例如是进行持久化存储,从而实现对数据的管理。
集中式存储系统的特点是有一个统一的入口,所有从外部设备来的数据都要经过这个入口,这个入口就是集中式存储系统的引擎121。引擎121是集中式存储系统中最为核心的部件,许多存储系统的高级功能都在其中实现。
如图1所示,引擎121中有一个或多个控制器,图1以引擎包含两个控制器为例予以说明。控制器0与控制器1之间具有镜像通道,那么当控制器0将一份数据写入其内存124后,可以通过所述镜像通道将所述数据的副本发送给控制器1,控制器1将所述副本存储在自己本地的内存124中。由此,控制器0和控制器1互为备份,当控制器0发生故障时,控制器1可以接管控制器0的业务,当控制器1发生故障时,控制器0可以接管控制器1的业务,从而避免硬件故障导致整个存储系统120的不可用。当引擎121中部署有4个控制器时,任意两个控制器之间都具有镜像通道,因此任意两个控制器互为备份。
引擎121还包含前端接口125和后端接口126,其中前端接口125用于与应用服务器100通信,从而为应用服务器100提供存储服务。而后端接口126用于与硬盘134通信,以扩充存储系统的容量。通过后端接口126,引擎121可以连接更多的硬盘134,从而形成一个非常大的存储资源池。
在硬件上,如图1所示,控制器0至少包括处理器123、内存124。处理器123是一个中央处理器(central processing unit,CPU),用于处理来自存储系统外部(服务器或者其他存储系统)的数据访问请求,也用于处理存储系统内部生成的请求。示例性的,处理器123通过前端端口125接收应用服务器100发送的写数据请求时,会将这些写数据请求中的数据暂时保存在内存124中。当内存124中的数据总量达到一定阈值时,处理器123通过后端端口将内存124中存储的数据发送给硬盘134进行持久化存储。
内存124是指与处理器直接交换数据的内部存储器,它可以随时读写数据,而且速度很快,作为操作系统或其他正在运行中的程序的临时数据存储器。内存包括至少两种存储器,例如内存既可以是随机存取存储器,也可以是只读存储器(Read Only Memory,ROM)。举例来说,随机存取存储器是动态随机存取存储器(Dynamic Random Access Memory,DRAM),或者存储级存储器(Storage Class Memory,SCM)。而对于只读存储器,举例来说,可以是可编程只读存储器(Programmable Read Only Memory,PROM)、可抹除可编程只读存储器(Erasable Programmable Read Only Memory,EPROM)等。另外,内存124还可以是双列直插式存储器模块或双线存储器模块(Dual In-line Memory Module,DIMM),或者是固态硬盘(Solid State Disk,SSD)。实际应用中,控制器0中可配置多个内存124,以及不同类型的内存124。本实施例不对内存113的数量和类型进行限定。
内存124中存储有软件程序,处理器123运行内存124中的软件程序可实现对硬盘的管理。例如将硬盘抽象化为存储资源池,然后划分为逻辑单元块(logic unit number,LUN)提供给服务器使用等。这里的LUN其实就是在服务器上看到的硬盘。当然,一些集中式存储系统本身也是文件服务器,可以为服务器提供共享文件服务。
控制器1,以及其他图1中未示出的控制器的硬件组件和软件结构与控制器0类似,这里不再赘述。需要说明的是,图1中只示出了一个引擎121,然而在实际应用中,存储系统中可包含两个或两个以上引擎121,多个引擎121之间做冗余或者负载均衡。
图1所示的是一种盘控一体的集中式存储系统。在该系统中,引擎121具有硬盘槽位,硬盘134可直接部署在引擎121中,后端接口126属于可选配置,当系统的存储空间不足时,可通过后端接口126连接更多的硬盘或硬盘框。
在一些可能的实现方式中,集中式存储系统也可以是盘控分离的存储系统。在盘控分离的存储系统中,引擎121可以不具有硬盘槽位,硬盘134需要放置在硬盘框中,后端接口126与硬盘框通信。后端接口126以适配卡的形态存在于引擎121中,一个引擎121上可以同时使用两个或两个以上后端接口126来连接多个硬盘框。或者,适配卡也可以集成在主板上,此时适配卡可通过高速串行外设组件互连(Peripheral Component Interconnect Express,PCI-E)总线与处理器112通信。
在一些可能的实现方式中,本申请也可以应用于分布式存储系统的应用场景中。分布式存储系统是指将数据分散存储在多台独立的存储节点上的系统。分布式网络存储系统采用可扩展的系统结构,利用多台存储节点分担存储负荷,它不但提高了系统的可靠性、可用性和存取效率,还易于扩展。
图2示出了一种分布式存储系统的架构图,分布式存储系统包括存储集群。存储集群包括一个或多个服务器110(图2中示出了三个服务器110,但不限于三个服务器110),各个服务器110之间可以相互通信。存储集群中的服务器110可以接收数据,例如是接收其他服务器110的数据,然后将数据进行持久化存储,以实现对数据的管理。
服务器110是一种既具有计算能力又具有存储能力的设备,如服务器、台式计算机等。示例性的,高级精简指令集机器(Advanced RISC Machine,ARM)服务器或者X86服务器都可以作为这里的服务器110。
在硬件上,如图2所示,服务器110至少包括处理器112、内存113、网卡114和硬盘105。处理器112、内存113、网卡114和硬盘105之间通过总线连接。其中,处理器112和内存113用于提供计算资源。具体地,处理器112是一个中央处理器(central processing unit,CPU),用于处理来自服务器110外部(应用服务器或者其他服务器110)的数据访问请求,也用于处理服务器110内部生成的请求。示例性的,处理器112接收写数据请求时,会将这些写数据请求中的数据暂时保存在内存113中。当内存113中的数据总量达到一定阈值时,处理器112将内存113中存储的数据发送给硬盘105进行持久化存储。除此之外,处理器112还用于数据进行计算或处理,例如元数据管理、重复数据删除、数据压缩、数据校验、虚拟化存储空间以及地址转换等。图2中仅示出了一个CPU 112,在实际应用中,CPU 112的数量往往有多个,其中,一个CPU 112又具有一个或多个CPU核。本实施例不对CPU的数量,以及CPU核的数量进行限定。
内存113是指与处理器直接交换数据的内部存储器,它可以随时读写数据,而且速度很快,作为操作系统或其他正在运行中的程序的临时数据存储器。硬盘105用于提供存储资源,例如存储数据。网卡114用于与其他服务器110通信。
需要说明的是,图2所示的分布式存储系统是存算一体架构的存储系统,在一些可能的实现方式中,分布式存储系统也可以采用存算分离架构,或者是全融合架构,本实施例对此不作限制。
在以上介绍的应用场景中,随着数据的爆炸式增长,数据存储需求与日俱增。例如,存储系统可以对配置数据、报表数据等进行存储,以便后续进行数据分析时使用。其中,配置数据、报表数据通常可以表示为键值对。除了数据本身为键值对,一些数据的元数据也可以表示键值对。例如,文件的大小、创建时间、创建者、最近编辑时间等也可以表示为键值对。
存储系统(无论是集中式存储系统还是分布式存储系统)在存储表示为键值对的数据或元数据时,可以基于结构化合并(log Structured Merge,LSM)树进行存储。具体地,存储系统的存储引擎可以将键值对写入内存中的内存表(MemTable)。当内存表的大小超过限值之后,该内存表可以被设置为只读模式,存储引擎可以启动后台的刷盘线程,以将只读模式的内存表中的键值对写入持久化介质,例如将只读模式的内存表中的键值对写入非易失性内存(Non-Volatile Memory Express,NVMe)等高速介质。
然而,在很多应用场景中,写入持久化介质的键值对中key的长度不是统一的。以分布式存储系统Ceph为例,Ceph使用存储引擎BlueStore对裸盘(也称作块设备)进行管理。其主要元数据为对象的Onode结构。Onode结构具体是包括对象的标识(identifer,ID)和对象的存储地址的数据结构,其中,对象的存储地址可以通过指向对象的逻辑位置<offset,len>的ExtentMap结构表示。上述Onode结构可以形成键值对,键值对中的key为对象的ID,键值对中的value为对象的存储地址。不同对象的ID的长度通常是不统一的,存储引擎在将对象 的ID写入持久化介质,具体是写入持久化介质的SST时,通常需要在每次写入对象的id时,均额外写入对象的ID的长度(即KeyLen)。由此可见,键值对中key的长度不统一时,存储系统需要每次写入key时,均额外存储KeyLen,增加了存储成本。
有鉴于此,本申请实施例提供了一种数据管理方法。该方法可以由存储系统(例如是图1中的集中式存储系统或者是图2中的分布式存储系统)执行。具体地,存储系统获取待写入的数据,将待写入的数据中满足转换条件的键值对(为了便于描述,可以称之为第一键值对,第一键值对包括第一键名和键值)进行转换,具体是将第一键名转换为第二键名,该第二键名的长度为预设值,由此生成包括上述第二键名和键值的第二键值对,并记录第二键名和第一键名的映射关系,接着存储系统存储上述第二键值对和映射关系。
在该方法中,存储系统通过对键名进行转换,例如将第一键名转换为长度为预设值的第二键名,统一了待写入的键值对中键名的长度,如此,存储系统在每次写入键名时,无需额外写入该键名的长度,节省了存储空间,降低了存储成本。而且,转换后的键名的长度通常远小于转换前的键名的长度,通过存储转换后的键名可以进一步节省存储空间,降低存储成本。
在一些可能的实现方式中,存储系统可以通过数据管理装置实现数据管理方法。参见图3所示的存储系统的架构示意图,存储系统中部署有数据管理装置,数据管理装置具体可以是软件装置,存储系统通过运行软件装置对应的计算机程序,从而实现数据管理。
需要说明的是,该数据管理装置不仅可以用于对键值对形式的数据本身进行管理,也可以用于对键值对形式的元数据进行管理。也即数据管理装置可以用于实现元数据管理装置的功能。在一些实施例中,参见图3,存储系统也可以包括数据管理装置和元数据管理装置,数据管理装置用于对键值对形式的数据本身进行管理,元数据管理装置用于对键值对形式的元数据进行管理。
其中,存储系统的持久化介质可以分为数据区和元数据区,数据区是指用于存储数据的区域,元数据区是指用于存储元数据的区域。在本实施例中,数据区可以是硬盘提供的存储区域,元数据区可以是高速介质提供的存储区域。在一些实施例中,元数据区也可以包括硬盘提供的存储区域,数据区也可以包括高速介质提供的存储区域,本实施例对此不作限制。
具体地,用户可以执行存取数据的操作,例如用户A至用户N可以执行向存储系统存入数据,或者从存储系统读取数据的操作。存储系统响应于用户的存入数据的操作,通过数据管理装置将数据写入数据区,并通过元数据管理装置将该数据的元数据写入元数据区。存储系统响应于用户的读取数据的操作,从元数据区读取该数据的元数据,该元数据可以用于对数据进行寻址,存储系统可以基于元数据从数据区读取数据。其中,存储系统在写入数据时,可以先写入数据,然后将数据的元数据写入内存,接着再将元数据从内存写入至持久化介质。类似地,存储系统在读取数据时,也可以先将元数据从持久化介质读入内存,然后根据元数据中指示的信息,将数据从持久化介质读入内存。
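为直观起见，下面给出一个概括上述写入与读取流程的Python草图（仅为便于理解的示意，其中的类名、阈值以及地址格式均为假设，并非存储系统的真实实现）：

```python
class StorageSketch:
    """示意性的存取流程：数据写入数据区，元数据先写入内存、再持久化到元数据区。"""

    def __init__(self):
        self.data_area = {}        # 数据区（例如硬盘提供的存储区域）
        self.meta_area = {}        # 元数据区（例如高速介质提供的存储区域）
        self.meta_memtable = {}    # 内存中尚未持久化的元数据

    def put(self, obj_id, payload):
        addr = f"addr-{len(self.data_area)}"      # 先写数据，得到存储地址
        self.data_area[addr] = payload
        self.meta_memtable[obj_id] = addr         # 元数据（对象ID->存储地址）先写入内存
        if len(self.meta_memtable) >= 4:          # 达到阈值后刷入元数据区（阈值为假设值）
            self.meta_area.update(self.meta_memtable)
            self.meta_memtable.clear()

    def get(self, obj_id):
        addr = self.meta_memtable.get(obj_id) or self.meta_area[obj_id]  # 先读元数据寻址
        return self.data_area[addr]                                      # 再按地址读取数据
```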
为了使得本申请的技术方案更加清楚、易于理解,下面结合附图对数据管理装置的结构进行介绍。
参见图4所示的数据管理装置的结构示意图，该装置400包括交互模块402、转换模块404和管理模块406。其中，交互模块402用于获取待写入的数据，该待写入的数据包括第一键值对，第一键值对包括第一键名和键值；转换模块404用于将所述第一键名转换为第二键名，第二键名的长度为预设值，然后生成第二键值对，该第二键值对包括第二键名和键值，并记录第二键名与第一键名的映射关系。管理模块406用于存储上述第二键值对和映射关系。
其中,管理模块406具有长键名(长key)管理功能和键值对管理功能。管理模块406可以基于长key管理功能,对KeyLen大于预设值的长key(例如是第一键名)与转换后的短key(例如是第二键名)的映射关系进行存储,以实现长key管理。并且,管理模块406可以基于键值对管理功能,对转换后的键值对(例如第二键值对)进行存储,以实现键值对管理。该装置400通过键名转换,统一定长key和变长key的存储方法,无需额外存储key的长度KeyLen,降低key的存储成本,从而降低键值对的存储成本。
在一些可能的实现方式中,考虑到交互模块402获取的数据可以来源于不同用户的用户池,例如来源于用户池1至用户N(记作pool 1至pool N,分别为用户1至用户N的用户池,该用户池是指逻辑面的存储池),数据管理装置400还可以对不同用户池的数据进行归一化,以对不同用户池的数据进行统一管理,避免重复申请管理资源(例如是创建SST所需资源)。具体地,数据管理装置400还可以包括归一化模块408,归一化模块408用于根据用户标识(例如是所述第一键值对来源的用户池的标识)更新所述第二键名,例如归一化模块408可以为第二键名添加用户池的标识pool ID,从而得到更新后的第二键名,该更新后的第二键名中包括pool ID。
进一步地,考虑到有些键值对中键值占用较多存储空间,数据管理装置400还可以对键值占用较多存储空间的键值对采用键名键值分离存储的策略,以节省高速介质的存储空间。具体地,数据管理装置400还可以包括分离模块409。分离模块409用于键值的长度大于预设长度时,将第二键值对中的第二键名与键值分离,相应地,管理模块406可以将所述键值写入日志,并利用所述键值的地址替换所述键值,存储替换后的键值对。其中,管理模块406具有键值管理功能,管理模块可以基于键值管理功能将键值以日志,例如是value log,写入硬盘,基于键值对管理功能将利用键值的地址替换键值所得的键值对写入高速介质。
在一些可能的实现方式中,数据采用对象存储时,数据管理装置400不仅可以对由对象的ID/对象的存储地址形成的键值对进行管理,还可以对上述对象的属性,例如是对象的创建时间、创建者、对象的大小等进行管理。具体地,管理模块406还提供属性管理功能,用于对属性进行管理。
需要说明的是,图4所示的数据管理装置400也可以用于对键值对形式的元数据进行管理,数据管理装置400对元数据的管理方式可以参考对数据的管理方式。在一些实施例中,对元数据的管理也可以由元数据管理装置实现,其中,元数据的管理装置的结构与数据管理装置400类似,在此不再赘述。
接下来,结合附图从数据管理装置400的角度对本申请实施例的数据管理方法进行详细说明。
参见图5所示的数据管理方法的流程图,该方法包括:
S502:数据管理装置400获取待写入的数据。
待写入的数据可以是待存储的数据,或者是已存储数据的元数据。其中,待写入的数据可以表示为键值对。例如,待存储的数据可以为配置数据、报表数据,配置数据、报表数据可以表示为键值对。又例如,待存储的数据可以为已存储数据的元数据。当已存储数据为对象存储中的对象时,该元数据可以为由对象的ID和对象的存储地址形成的键值对。
在本实施例中，待写入的数据包括第一键值对，第一键值对包括第一键名和键值。例如，第一键名可以为对象的ID，键值可以为该对象的存储地址。其中，本实施例仅以对象存储进行示例说明，在本申请实施例其他可能的实现方式中，已存储数据也可以采用块存储或文件存储。
S504:数据管理装置400将所述第一键名转换为第二键名,生成第二键值对,并记录第二键名与第一键名的映射关系。
具体地,数据管理装置400可以对符合转换条件的键值对进行转换。转换条件可以是键名长度大于预设值。该预设值可以根据经验值设置,例如可以设置为12字节B或者20B。
参见图6所示的键值对转换的流程示意图,数据管理装置400可以在接收到用户输入的第一键值对(即原key/value)时,判断第一键名的长度是否大于预设值,从而确定该第一键值对是否符合转换条件。针对符合转换条件的第一键值对,数据管理装置400可以采用转换算法,将第一键值对的第一键名转换为第二键名,从而生成第二键值对。该第二键值对包括第二键名和键值。其中,第二键名的长度为预设值。
针对不符合转换条件的第一键值对,例如是键名的长度等于预设值的键值对,数据管理装置400可以不进行转换。还需要说明的是,第一键值对的键名的长度小于预设值时,数据管理装置400可以将该第一键值对的键名补齐,使得补齐后的键名的长度等于预设值。
在一些可能的实现方式中,数据管理装置400可以通过哈希算法进行键名转换。具体地,数据管理装置400可以通过哈希算法对第一键名(也称作原key)进行哈希处理,得到哈希标识(hash id),基于该hash id可以获得第二键名。考虑到哈希碰撞的情况,数据管理装置400可以在hash id基础上结合短标识(short id),得到第二键名。其中,short id可以是随机生成的字符串,或者顺序生成的字符串。该第二键名例如可以是hash id和short id拼接的字符串。
其中,数据管理装置400在通过哈希算法对第一键名进行哈希处理时,可以是对第一键名的部分或全部字符进行哈希。考虑到不同键值对的键名存在高度相似的情况,数据管理装置400可以对第一键名的部分字符进行哈希,以便于能够对转换所得的第二键名提取更长的公共前缀(即不同字符串具有的相同前缀),从而提高压缩率,节省存储空间。
为了便于理解,下面结合具体示例进行说明。
在该示例中,第一键名为对象的ID,该对象的ID为objectXXX.partXXX.dataXXX.437。该对象的ID通过“.”分隔为四个字段,第一字段“objectXXX”、第二字段“partXXX”与其他键名的对应字段具有高度相似性。基于此,数据管理装置400可以对第一字段之外的其他字段进行哈希得到hash ID,然后根据第一字段、hash ID和short ID生成第二键名,该第二键名可以表示为“objectXXX”+hash ID+short ID。在一些实施例中,数据管理装置400也可以对第一字段、第二字段之外的其他字段进行哈希,得到hash ID,然后根据第一字段、第二字段、hash ID和short ID生成第二键名,该第二键名可以表示为“objectXXX.partXXX”+hash ID+short ID。
与第一种第二键名“objectXXX”+hash ID+short ID相比,第二种第二键名“objectXXX.partXXX”+hash ID+short ID的hash ID更短,碰撞概率相对更高,但是能够提取到更长的公共前缀,从而节省更多的存储空间,而第一种第二键名中hash ID更长,碰撞概率较低,如此可以较好地保障键名的唯一性。
此外,数据管理装置400还可以记录第二键名与第一键名的映射关系。当该映射关系为第一次生成时,数据管理装置400可以将该映射关系写入长key管理(例如是用于管理长key的缓存),以便后续进行持久化存储。
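为便于理解上述键名转换过程，下面给出一个示意性的Python草图（仅为在若干假设下的示例，并非本申请的具体实现：其中哈希算法的选择、哈希标识与短标识的长度、字段的划分方式以及pool ID的拼接格式均为假设）：

```python
import hashlib
import itertools

HASH_LEN = 8      # 哈希标识长度（假设值）
SHORT_LEN = 4     # 短标识长度（假设值）

_seq = itertools.count()   # 顺序生成短标识（也可以随机生成）
long_key_map = {}          # 长key管理：第二键名 -> 第一键名 的映射关系

def convert_key(first_key: str, preset_len: int, pool_id: str = "") -> str:
    """将变长的第一键名转换为定长的第二键名（示意性草图）。"""
    if len(first_key) <= preset_len:
        # 长度不大于预设值的键名不做转换，不足时补齐到预设值
        return first_key.ljust(preset_len, "#")
    # 假设键名以 "." 分隔，第一字段与其他键名高度相似，予以保留；其余字段参与哈希
    first_field, _, rest = first_key.partition(".")
    hash_id = hashlib.sha1(rest.encode()).hexdigest()[:HASH_LEN]
    short_id = format(next(_seq) % (16 ** SHORT_LEN), f"0{SHORT_LEN}x")  # 区分哈希碰撞
    second_key = pool_id + first_field + hash_id + short_id
    # 这里假设 pool_id 与第一字段为定长格式，因此第二键名整体为定长（即预设值）
    long_key_map.setdefault(second_key, first_key)   # 第一次生成时记录映射关系
    return second_key

key = convert_key("objectXXX.partXXX.dataXXX.437", preset_len=25, pool_id="p01.")
```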
S506:数据管理装置400存储第二键值对和映射关系。
数据管理装置400可以采用LSM树对第二键值对进行存储。考虑到将第二键值对绑定存储在LSM树会产生一定的冗余,例如第二键值对的键名和第三键值对的键名包括公共前缀,则数据管理装置400存储第二键值对和第三键值对时,多次存储上述公共前缀,由此产生冗余,数据管理装置400还可以将上述第二键值对与至少一个第三键值对形成前缀树,将该前缀树存储至持久化介质,以减少冗余。
其中,第三键值对的键名的长度与第二键值对的键名的长度相同。该第三键值对可以是经过转换的键值对(键名的原始长度大于预设值),也可以是原键值对(键名的原始长度即为预设值)。第三键值对的键名与第二键值对的键名的公共前缀作为前缀树的根节点,除公共前缀之外的不同字符串作为前缀树的中间节点,键值作为前缀树的叶子节点。通过存储上述前缀树,可以避免多次存储公共前缀,减少冗余,提高压缩率。
为了便于理解,下面结合一具体示例进行说明。
在该示例中,第二键值对的键名为“objectXXX.partXXX”+hash ID+short ID,第三键值对的键名为“objectXXX.partXYZ”+hash ID+short ID,第二键值对的键名和第三键值对的键名的公共前缀为“objectXXX.partX”,该公共前缀可以形成前缀树的根节点,每个键名中除公共前缀之外的字符串例如“XX”+hash ID+short ID作为前缀树的中间节点,该中间节点是指非根非叶节点,中间节点通常具有子节点和父节点。第二键值对的键值、第三键值对的键值分别存储在叶子节点。
其中,到达叶子节点的路径上的至少一个非叶子节点用于存储键名,到达叶子节点的每一条路径对应一个键名。例如,根节点存储的“objectXXX.partX”和一个中间节点存储的“XX”+hash ID+short ID对应第二键值对的键名。
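下面用一个简化的Python草图示意上述前缀树的构建方式（仅为帮助理解的示例：这里只提取一层公共前缀作为根节点、把剩余字符串整体作为中间节点，实际实现中节点的划分方式与存储格式可能不同）：

```python
import os

def build_prefix_tree(kv_pairs):
    """将若干定长键名的键值对组织成一棵简化的前缀树（示意）。

    返回形如 {"prefix": 公共前缀, "children": {剩余字符串: 键值}} 的结构：
    公共前缀对应根节点，剩余字符串对应中间节点，键值挂在叶子节点上。
    """
    keys = [k for k, _ in kv_pairs]
    common = os.path.commonprefix(keys)          # 提取公共前缀，仅存储一次
    tree = {"prefix": common, "children": {}}
    for key, value in kv_pairs:
        suffix = key[len(common):]               # 除公共前缀之外的不同字符串
        tree["children"][suffix] = value         # 叶子节点保存键值
    return tree

pairs = [
    ("objectXXX.partXXX" + "a1b2c3d4" + "0001", "addr#1"),
    ("objectXXX.partXYZ" + "e5f6a7b8" + "0002", "addr#2"),
]
tree = build_prefix_tree(pairs)   # 根节点为 "objectXXX.partX"
```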
进一步地，数据管理装置400将前缀树合并，得到LSM树。具体地，参见图7所示的合并前缀树的流程示意图，数据管理装置400可以将前缀树写入内存，例如是写入内存表MemTable。当内存表的大小大于限值时，将MemTable设置为只读模式，得到ImmTable。数据管理装置400可以将ImmTable中的前缀树按时序写入LSM树的第1层。所述LSM树包括L层，对于所述L层中的第1层到第L-1层中的任意一层，当所述第i层包括的树的数量达到所述第i层的预设阈值时，将所述第i层的预设数量棵树合并，并将合并后的树写入第i+1层，所述i为正整数。其中，数据管理装置400在将第i层的预设数量棵树合并，然后将合并后的树写入下一层后，可以删除第i层的树。
其中,内存到第1层(也称作L0层)的序列化,采用一一对应写入的方式,ImmTable中的一个树会转化成同等大小的L0层的一个树。第1层到第L层(也即L0->Ln层)中所有树排列均按照时间序,当某一层包括的树的数量达到该层的预设阈值时,数据管理装置400将该层中预设数量棵树合并成一棵树。其中,每层的预设阈值和预设数量可以根据经验值设置,并且每层预设阈值和预设数量可以不同。在该实施例中,n不宜过大,避免频繁写导致写放大。
进一步地,数据管理装置400还可以在第L层(即图7中的Ln层)中的树的数量达到所述第L层的预设阈值时,将所述LSM树中第L层的预设数量棵树合并为最终树(final tree)。需要说明的是,数据管理装置400在合并各层的树时,可以并发执行,如此可以加速合并速度。
为了防止数据丢失,数据管理装置400可以将LSM树各层包括的树存储至持久化介质。具体地,数据管理装置400可以将写入所述LSM树的前缀树存储至持久化介质。当LSM树的第i层的前缀树合并时,数据管理装置400可以将合并后的树写入持久化介质。例如,第L0层包括的前缀树的数量达到10棵时,数据管理装置400可以将10棵树中时间最早的4棵树合并,将合并后的树写入持久化介质。进一步地,在合并后的树写入持久化介质后,数据管理装置400还可以将持久化介质中存储的与该合并后的树对应的多个前缀树删除。例如,数据管理装置400可以将已存储至持久化介质的4棵树删除。
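上述逐层合并的策略可以用如下Python草图示意（各层的预设阈值与一次合并的预设数量均为假设值，树的合并在此简化为按时间序的字典归并）：

```python
LEVEL_THRESHOLD = [10, 10, 10]   # 每层的预设阈值（假设值，各层可以不同）
MERGE_COUNT = [4, 4, 4]          # 每层一次合并的预设数量（假设值）

levels = [[] for _ in LEVEL_THRESHOLD]   # levels[0] 即 L0 层，各层按时间序存放前缀树
final_trees = []

def merge_trees(trees):
    """示意性的树合并：把前缀树简化为 {key: value} 字典，时间更晚的覆盖更早的。"""
    merged = {}
    for t in trees:               # trees 按时间序排列
        merged.update(t)
    return merged

def append_tree(tree):
    """将一棵（由 ImmTable 序列化得到的）前缀树写入 LSM 树第 1 层，并按需逐层合并。"""
    levels[0].append(tree)
    for i in range(len(levels)):
        if len(levels[i]) < LEVEL_THRESHOLD[i]:
            break
        oldest = [levels[i].pop(0) for _ in range(MERGE_COUNT[i])]  # 取时间最早的若干棵
        merged = merge_trees(oldest)
        if i + 1 < len(levels):
            levels[i + 1].append(merged)      # 合并后的树写入第 i+1 层
        else:
            final_trees.append(merged)        # 最后一层合并为最终树（final tree）
```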
数据管理装置400可以将LSM树的各层包括的树的根节点、中间节点、叶子节点的值写入所述持久化介质,生成记录所述根节点、中间节点及叶子节点在前缀树或最终树中的关系的索引信息,然后将所述索引信息写入所述持久化介质,从而实现将LSM树下盘。
参见图8所示的LSM树下盘的示意图,数据管理装置400将L0~LN(图8以Ln进行示例说明)中的树的非叶子节点(也即内部节点Inner node)和叶子节点Leaf node写入持久化介质的第一类型SST,也即Inner SST。其中,写入第一类型SST的数据可以同步写入缓存,以便于提高查找效率。
由于键值对对应的final tree中的Leaf node比较多,最终占用空间可以超过总空间的90%,该类数据无法全部缓存到内存中,数据管理装置400可以将final tree的Leaf node写入持久化介质的第二类型SST中,也称作Leaf SST。该Leaf SST中的数据不缓存至内存。final tree中的Inner node可以写入Inner SST,并同步至缓存中。
需要说明的是,Inner SST和Leaf SST的存储格式类似。此外,数据管理装置400在存储LSM树时,可以按照节点信息->索引信息->SST根信息的顺序进行存储。其中,节点信息是指节点中的信息,例如是根节点的信息、根节点指向的子节点的信息,直至叶子节点的信息。索引信息是指节点之间的指针信息,用于指示节点之间关系,例如指示一个节点的父节点、子节点。SST根信息是指整个SST的相关信息,如SST的大小、节点在SST中的位置等。如此,不仅可以节省key与key之间的重复部分,还可以将大量的key和value分开进行存储,通过对分开后的数据进行压缩,进一步地节省了存储空间。而且LSM tree采用全时间序的方法,可以提升合并速度。
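下面的Python草图按照节点信息->索引信息->SST根信息的顺序示意一棵树的下盘过程（SST的实际二进制格式与节点编码方式均为假设，这里用JSON文本仅作说明）：

```python
import json

def flatten(node, nodes, edges, parent=-1):
    """先序遍历，收集节点信息与父子关系；node 形如 {"val": ..., "children": [...]}。"""
    idx = len(nodes)
    nodes.append(node["val"])                 # 节点信息：根节点、中间节点及叶子节点的值
    if parent >= 0:
        edges.append((parent, idx))           # 索引信息：记录节点在树中的父子关系
    for child in node.get("children", []):
        flatten(child, nodes, edges, idx)
    return idx

def dump_sst(tree_root, path):
    """按 节点信息 -> 索引信息 -> SST根信息 的顺序写入持久化介质（示意）。"""
    nodes, edges = [], []
    flatten(tree_root, nodes, edges)
    node_area = json.dumps(nodes)
    index_area = json.dumps(edges)
    root_info = json.dumps({"node_bytes": len(node_area), "index_bytes": len(index_area)})
    with open(path, "w") as f:
        f.write(node_area + "\n" + index_area + "\n" + root_info + "\n")

root = {"val": "objectXXX.partX",
        "children": [{"val": "XXa1b2c3d40001", "children": [{"val": "addr#1", "children": []}]},
                     {"val": "YZe5f6a7b80002", "children": [{"val": "addr#2", "children": []}]}]}
dump_sst(root, "inner_sst.json")
```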
在本实施例中，数据管理装置400也可以采用LSM tree对映射关系进行存储。具体地，数据管理装置400可以将多个映射关系形成前缀树，然后将该前缀树写入LSM树，并在LSM树中针对满足要求的前缀树进行合并。形成前缀树以及将前缀树合并的过程可以参考将第二键值对和第三键值对形成前缀树，并对前缀树进行合并的过程。由于映射关系的数据量较少，因此，数据管理装置400可以将映射关系对应的final tree的Leaf node也写入Inner SST，并将其同步至缓存。
接下来,结合一具体示例对本申请实施例的数据管理方法进行详细介绍。
参见图9所示的数据管理方法的流程示意图,该示例中,用户1至N可以执行数据存取操作,当用户1至N执行数据存入操作时,可以一并存取数据的元数据,该示例中,各用户的元数据通过键值对表示。其中,键名为对象的ID,键值为对象的存储地址。
如图9所示,用户1存入的元数据的键名为Objectx.xxxxxx.1,用户2存入的元数据的键名为Objectx.xxxxxx.2,以此类推,用户N存入的元数据的键名为Objectx.xxxxxx.n。多个键名之间存在公共前缀,该公共前缀为Objectx。
在该实施例中,数据管理装置400可以在接收到待写入的数据时,先基于设定规则对数据进行合法性校验。例如,键名为空,则视为不合法,校验不通过。当校验通过时,数据管理装置400可以确定满足转换条件的键值对,具体是键名的长度大于预设值的键值对,并通过转换算法将满足转换条件的键值对中的长键名(长key)转换为短键名(短key)。
若数据管理装置400生成的短key为第一次生成,则可以将短Key->长Key的映射关系写入长key管理结构中,该映射关系会进行缓存。同时,数据管理装置400可以针对原先的长key进行压缩,最终输出转换后的短Key与value。针对不满足转换条件的键值对,则数据管理装置400可以直接输出原Key/Value。如此可以避免因变长key带来的空间浪费,并且可以针对key进行压缩,进一步节省存储成本。
由于多个pool(逻辑层面的存储池)进行隔离可以导致资源被严重浪费,数据管理装置400还可以对资源池进行归一化,以解决资源浪费的问题。如图10所示,当不同用户池的数据被写入时,数据管理装置400可以识别不同用户池的标识pool ID,并将pool ID加入相应的key,例如是转换后的短key中。由于多个用户池的key存在公共前缀,使用前缀树进行存储时,pool id几乎不会额外占用空间,因而可以节省多pool隔离导致的大量资源浪费问题。
其中,在将pool id加入转换后的短key中后,数据管理装置400还可以对转换后的键值对中键值的长度进行判断。对于键值的长度大于预设长度的目标键值,数据管理装置400可以将该键值对分离,将目标键值写入键值管理,具体是通过追加写的方式写入value log。然后数据管理装置400可以利用所述目标键值的地址替换所述目标键值,获得替换后的键值对,并将所述替换后的键值对写入key/value管理,以进行持久化存储。对于键值的长度不大于预设长度的键值,数据管理装置400可以直接将其写入key/value管理,以进行持久化存储。
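键值分离的处理可以用如下Python草图示意（预设长度与value log的追加写格式均为假设值）：

```python
VALUE_LEN_LIMIT = 64          # 预设长度（假设值）
value_log = []                # value log：以追加写方式保存超长键值

def separate_value(key, value):
    """对超长键值做键值分离：值追加写入 value log，键值对中只保留值的地址（示意）。"""
    if len(value) <= VALUE_LEN_LIMIT:
        return key, value                     # 不大于预设长度的键值直接写入 key/value 管理
    offset = sum(len(v) for v in value_log)   # 追加写之前 value log 的末尾偏移
    value_log.append(value)
    addr = {"offset": offset, "len": len(value)}   # 用目标键值的地址替换目标键值
    return key, addr
```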
在一些可能的实现方式中,数据管理装置400还可以写入对象的属性,例如是对象的创建时间、创建者、对象的大小等属性至属性管理,以便于将对象的属性进行持久化存储。
其中,映射关系、对象的属性、键值对(例如为原key或转换后的key与长度不大于预设长度的键值形成的键值对,以及原key或转换后的key与长度大于预设长度的目标键值的地址形成的键值对)可以分别形成对应的前缀树,并按照时间序合并形成LSM树。数据管理装置400可以将LSM树的各层包括的树存储至持久化介质。
该方法中,数据管理装置400按照节点粒度进行下盘,而非KV粒度进行下盘,也节省了持久化介质(例如高速介质)的空间。并且,数据管理装置400以节点为粒度进行下盘,可以加快读取数据的速度。
上文结合图1至图10对本申请实施例提供的数据管理方法进行了详细介绍,下面将结合附图对本申请实施例提供的数据管理装置400进行介绍。
参见图4所示的数据管理装置400的结构示意图,该装置400包括:
交互模块402,用于获取待写入的数据,所述待写入的数据包括第一键值对,所述第一键值对包括第一键名和键值;
转换模块404,用于将所述第一键名转换为第二键名,所述第二键名的长度为预设值,生成第二键值对,所述第二键值对包括所述第二键名与所述键值,并记录所述第二键名与所述第一键名的映射关系;
管理模块406,用于存储所述第二键值对与所述映射关系。
其中,交互模块402获取待写入的数据的具体实现可以参见图5所示实施例中S502相关内容描述,转换模块404将第一键名转换为第二键名,生成第二键值对,记录第二键名和第一键名的映射关系的具体实现可以参见图5所示实施例中S504相关内容描述,管理模块406存储第二键值对与映射关系的具体实现可以参见图5所示实施例中S506相关内容描述。
在一些可能的实现方式中,所述转换模块404具体用于:
判断所述第一键名的长度是否大于所述预设值,如果大于所述预设值,则将所述第一键名转换为所述第二键名,且所述预设值小于所述第一键名的长度。
其中,转换模块404进行键名转换的具体实现可以参见图5所示实施例中S504相关内容描述,在此不再赘述。
在一些可能的实现方式中,所述转换模块404具体用于:
根据所述第一键名,通过哈希算法确定哈希标识;
根据所述哈希标识生成第二键名。
其中,转换模块404进行键名转换的具体实现可以参见图5所示实施例中S504相关内容描述,在此不再赘述。
在一些可能的实现方式中,所述第二键名中还包括用户标识。
其中,转换模块404可以在第二键名中添加用户标识,例如是pool ID,以实现用户数据归一化,避免重复申请管理资源。其具体实现可以参见图9所示实施例相关内容描述,在此不再赘述。
在一些可能的实现方式中,所述待写入的数据为已存储数据的元数据。
在一些可能的实现方式中,所述已存储数据为对象存储中的对象,所述第一键值对中的第一键名为所述对象的标识,所述第一键值对中的键值包括所述对象的存储地址。
在一些可能的实现方式中,所述管理模块406具体用于:
将所述第二键值对与至少一个第三键值对形成前缀树,其中,所述第三键值对的键名与所述第二键值对的键名的长度相同,且所述第三键值对的键名与所述第二键值对的键名包括公共前缀,其中,所述公共前缀作为所述前缀树的根节点,所述第二键值对与所述至少一个第三键值对中除所述公共前缀之外的不同字符串作为所述前缀树的中间节点,所述键值作为所述前缀树的叶子节点;
将所述前缀树存储至持久化介质。
其中,管理模块406存储第二键值对的具体实现可以参见图5所示实施例中S506相关内容描述,在此不再赘述。
在一些可能的实现方式中,所述管理模块406具体用于:
将所述前缀树写入结构化合并LSM树的第1层,所述LSM树包括L层,对于所述L层中的第1层到第L-1层中的任意一层,当所述第i层包括的树的数量达到所述第i层的预设阈值时,将所述第i层的预设数量棵树合并,并将合并后的树写入第i+1层,所述i和L为正整数;
对于所述L层中的第L层,当所述第L层中的树的数量达到所述第L层的预设阈值时,将所述第L层中的预设数量棵树合并为最终树;
将所述LSM树各层包括的树存储至所述持久化介质。
其中,管理模块406存储第二键值对的具体实现可以参见图5所示实施例中S506相关内容描述,在此不再赘述。
在一些可能的实现方式中,所述管理模块406具体用于:
将所述LSM树的各层包括的树的根节点、中间节点及叶子节点的值写入所述持久化介质,生成记录所述根节点、中间节点及叶子节点在所述前缀树或所述最终树中的关系的索引信息,将所述索引信息写入所述持久化介质。
其中,管理模块406存储第二键值对的具体实现可以参见图5所示实施例中S506相关内容描述,在此不再赘述。
根据本申请实施例的数据管理装置400可对应于执行本申请实施例中描述的方法,并且数据管理装置400的各个模块/单元的上述和其它操作和/或功能分别为了实现图5至图9所示实施例中的各个方法的相应流程,为了简洁,在此不再赘述。
本申请实施例还提供了一种计算机集群。该计算机集群包括至少一台计算机。例如,至少一台计算机可以形成如图1所示的集中式存储系统,当计算机集群包括多台计算机,如服务器时,也可以形成如图2所示的分布式存储系统。该计算机集群具体用于实现图4所示实施例中数据管理装置400的功能。
具体地,所述至少一台计算机包括至少一个处理器和至少一个存储器,所述至少一个存储器中存储有计算机可读指令,所述至少一个处理器执行所述计算机可读指令,使得所述计算机集群执行前述数据管理方法,或者实现前述数据管理装置400的功能。
具体地,在实现图4所示实施例的情况下,且图4实施例中所描述的数据管理装置400的各模块为通过软件实现的情况下,执行图4中的交互模块402、转换模块404和管理模块406功能所需的软件或程序代码存储在存储器中。处理器执行所述存储器中的软件或程序代码,从而执行前述数据管理方法,或者实现前述数据管理装置的功能。
本申请实施例还提供了一种计算机可读存储介质。所述计算机可读存储介质可以是计算设备能够存储的任何可用介质或者是包含一个或多个可用介质的数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘)等。该计算机可读存储介质包括指令,所述指令指示计算设备执行上述数据管理方法。
本申请实施例还提供了一种计算机程序产品。所述计算机程序产品包括一个或多个计算机指令。在计算设备上加载和执行所述计算机指令时,全部或部分地产生按照本申请实施例所述的流程或功能。
所述计算机指令可以存储在计算机可读存储介质中，或者从一个计算机可读存储介质向另一计算机可读存储介质传输，例如，所述计算机指令可以从一个网站站点、计算机或数据中心通过有线（例如同轴电缆、光纤、数字用户线（DSL））或无线（例如红外、无线、微波等）方式向另一个网站站点、计算机或数据中心进行传输。
所述计算机程序产品可以为一个软件安装包,在需要使用前述数据管理方法的任一方法的情况下,可以下载该计算机程序产品并在计算设备上执行该计算机程序产品。
上述各个附图对应的流程或结构的描述各有侧重,某个流程或结构中没有详述的部分,可以参见其他流程或结构的相关描述。

Claims (21)

  1. 一种数据管理方法,其特征在于,所述方法包括:
    获取待写入的数据,所述待写入的数据包括第一键值对,所述第一键值对包括第一键名和键值;
    将所述第一键名转换为第二键名,所述第二键名的长度为预设值;
    生成第二键值对,所述第二键值对包括所述第二键名与所述键值,并记录所述第二键名与所述第一键名的映射关系;
    存储所述第二键值对与所述映射关系。
  2. 根据权利要求1所述的方法,其特征在于,所述将所述第一键名转换为第二键名,包括:
    判断所述第一键名的长度是否大于所述预设值,如果大于所述预设值,则将所述第一键名转换为所述第二键名,且所述预设值小于所述第一键名的长度。
  3. 根据权利要求1或2所述的方法,其特征在于,所述将所述第一键名转换为第二键名,包括:
    根据所述第一键名,通过哈希算法确定哈希标识;
    根据所述哈希标识生成第二键名。
  4. 根据权利要求1至3任一项所述的方法,其特征在于,所述第二键名中还包括用户标识。
  5. 根据权利要求1至4任一项所述的方法,其特征在于,所述待写入的数据为已存储数据的元数据。
  6. 根据权利要求5所述的方法,其特征在于,所述已存储数据为对象存储中的对象,所述第一键值对中的第一键名为所述对象的标识,所述第一键值对中的键值包括所述对象的存储地址。
  7. 根据权利要求1至6任一项所述的方法,其特征在于,所述存储所述第二键值对,包括:
    将所述第二键值对与至少一个第三键值对形成前缀树,其中,所述第三键值对的键名与所述第二键值对的键名的长度相同,且所述第三键值对的键名与所述第二键值对的键名包括公共前缀,其中,所述公共前缀作为所述前缀树的根节点,所述第二键值对与所述至少一个第三键值对中除所述公共前缀之外的不同字符串作为所述前缀树的中间节点,所述键值作为所述前缀树的叶子节点;
    将所述前缀树存储至持久化介质。
  8. 根据权利要求7所述的方法,其特征在于,所述将所述前缀树存储至持久化介质,包括:
    将所述前缀树写入结构化合并LSM树的第1层,所述LSM树包括L层,对于所述L层中的第1层到第L-1层中的任意一层,当所述第i层包括的树的数量达到所述第i层的预设阈值时,将所述第i层的预设数量棵树合并,并将合并后的树写入第i+1层,所述i和L为正整数;
    对于所述L层中的第L层,当所述第L层中的树的数量达到所述第L层的预设阈值时,将所述第L层中的预设数量棵树合并为最终树;
    将所述LSM树各层包括的树存储至所述持久化介质。
  9. 根据权利要求8所述的方法,其特征在于,所述将所述LSM树各层包括的树存储至所述持久化介质,包括:
    将所述LSM树的各层包括的树的根节点、中间节点及叶子节点的值写入所述持久化介质,生成记录所述根节点、中间节点及叶子节点在所述前缀树或所述最终树中的关系的索引信息,将所述索引信息写入所述持久化介质。
  10. 一种数据管理装置,其特征在于,所述装置包括:
    交互模块,用于获取待写入的数据,所述待写入的数据包括第一键值对,所述第一键值对包括第一键名和键值;
    转换模块,用于将所述第一键名转换为第二键名,所述第二键名的长度为预设值,生成第二键值对,所述第二键值对包括所述第二键名与所述键值,并记录所述第二键名与所述第一键名的映射关系;
    管理模块,用于存储所述第二键值对与所述映射关系。
  11. 根据权利要求10所述的装置,其特征在于,所述转换模块具体用于:
    判断所述第一键名的长度是否大于所述预设值,如果大于所述预设值,则将所述第一键名转换为所述第二键名,且所述预设值小于所述第一键名的长度。
  12. 根据权利要求10或11所述的装置,其特征在于,所述转换模块具体用于:
    根据所述第一键名,通过哈希算法确定哈希标识;
    根据所述哈希标识生成第二键名。
  13. 根据权利要求10至12任一项所述的装置,其特征在于,所述第二键名中还包括用户标识。
  14. 根据权利要求10至13任一项所述的装置,其特征在于,所述待写入的数据为已存储数据的元数据。
  15. 根据权利要求14所述的装置,其特征在于,所述已存储数据为对象存储中的对象,所述第一键值对中的第一键名为所述对象的标识,所述第一键值对中的键值包括所述对象的存储地址。
  16. 根据权利要求10至15任一项所述的装置,其特征在于,所述管理模块具体用于:
    将所述第二键值对与至少一个第三键值对形成前缀树,其中,所述第三键值对的键名与所述第二键值对的键名的长度相同,且所述第三键值对的键名与所述第二键值对的键名包括公共前缀,其中,所述公共前缀作为所述前缀树的根节点,所述第二键值对与所述至少一个第三键值对中除所述公共前缀之外的不同字符串作为所述前缀树的中间节点,所述键值作为所述前缀树的叶子节点;
    将所述前缀树存储至持久化介质。
  17. 根据权利要求16所述的装置,其特征在于,所述管理模块具体用于:
    将所述前缀树写入结构化合并LSM树的第1层,所述LSM树包括L层,对于所述L层中的第1层到第L-1层中的任意一层,当所述第i层包括的树的数量达到所述第i层的预设阈值时,将所述第i层的预设数量棵树合并,并将合并后的树写入第i+1层,所述i和L为正整数;
    对于所述L层中的第L层,当所述第L层中的树的数量达到所述第L层的预设阈值时,将所述第L层中的预设数量棵树合并为最终树;
    将所述LSM树各层包括的树存储至所述持久化介质。
  18. 根据权利要求17所述的装置,其特征在于,所述管理模块具体用于:
    将所述LSM树的各层包括的树的根节点、中间节点及叶子节点的值写入所述持久化介质,生成记录所述根节点、中间节点及叶子节点在所述前缀树或所述最终树中的关系的索引信息,将所述索引信息写入所述持久化介质。
  19. 一种计算机集群,其特征在于,所述计算机集群包括至少一台计算机,所述至少一台计算机包括至少一个处理器和至少一个存储器,所述至少一个存储器中存储有计算机可读指令,所述至少一个处理器执行所述计算机可读指令,使得所述计算机集群执行如权利要求1至9任一项所述的方法。
  20. 一种计算机可读存储介质,其特征在于,包括计算机可读指令,当所述计算机可读指令在计算机集群上运行时,使得所述计算机集群执行如权利要求1至9任一项所述的方法。
  21. 一种计算机程序产品,其特征在于,包括计算机可读指令,当所述计算机可读指令在计算机集群上运行时,使得所述计算机集群执行如权利要求1至9任一项所述的方法。
PCT/CN2022/142699 2021-12-31 2022-12-28 一种数据管理方法及相关装置 WO2023125630A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111676219.X 2021-12-31
CN202111676219.XA CN116414828A (zh) 2021-12-31 2021-12-31 一种数据管理方法及相关装置

Publications (1)

Publication Number Publication Date
WO2023125630A1 true WO2023125630A1 (zh) 2023-07-06

Family

ID=86997969

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/142699 WO2023125630A1 (zh) 2021-12-31 2022-12-28 一种数据管理方法及相关装置

Country Status (2)

Country Link
CN (1) CN116414828A (zh)
WO (1) WO2023125630A1 (zh)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050010590A1 (en) * 2003-07-11 2005-01-13 Bmc Software, Inc. Reorganizing database objects using variable length keys
CN103412916A (zh) * 2013-08-07 2013-11-27 北京京东尚科信息技术有限公司 一种监控系统的多维度数据存储、检索方法及装置
CN103870492A (zh) * 2012-12-14 2014-06-18 腾讯科技(深圳)有限公司 一种基于键排序的数据存储方法和装置
US20150293958A1 (en) * 2014-04-10 2015-10-15 Facebook, Inc. Scalable data structures
CN109460406A (zh) * 2018-10-15 2019-03-12 咪咕文化科技有限公司 一种数据处理方法及装置
US20200192940A1 (en) * 2018-12-14 2020-06-18 Micron Technology, Inc. Key-value store tree with selective use of key portion
CN111966654A (zh) * 2020-08-18 2020-11-20 浪潮云信息技术股份公司 一种基于Trie字典树的混合过滤器

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050010590A1 (en) * 2003-07-11 2005-01-13 Bmc Software, Inc. Reorganizing database objects using variable length keys
CN103870492A (zh) * 2012-12-14 2014-06-18 腾讯科技(深圳)有限公司 一种基于键排序的数据存储方法和装置
CN103412916A (zh) * 2013-08-07 2013-11-27 北京京东尚科信息技术有限公司 一种监控系统的多维度数据存储、检索方法及装置
US20150293958A1 (en) * 2014-04-10 2015-10-15 Facebook, Inc. Scalable data structures
CN109460406A (zh) * 2018-10-15 2019-03-12 咪咕文化科技有限公司 一种数据处理方法及装置
US20200192940A1 (en) * 2018-12-14 2020-06-18 Micron Technology, Inc. Key-value store tree with selective use of key portion
CN111966654A (zh) * 2020-08-18 2020-11-20 浪潮云信息技术股份公司 一种基于Trie字典树的混合过滤器

Also Published As

Publication number Publication date
CN116414828A (zh) 2023-07-11

Similar Documents

Publication Publication Date Title
US11258796B2 (en) Data processing unit with key value store
WO2022218160A1 (zh) 一种数据访问系统、方法、设备以及网卡
WO2020199760A1 (zh) 数据存储方法、存储器和服务器
CN106570113B (zh) 一种海量矢量切片数据云存储方法及系统
WO2024041022A1 (zh) 数据库表变更方法、装置、设备和存储介质
WO2023000770A1 (zh) 一种处理访问请求的方法、装置、存储设备及存储介质
CN115270033A (zh) 一种数据访问系统、方法、设备以及网卡
KR102471966B1 (ko) 스토리지 노드 기반의 키-값 스토어를 이용하는 데이터 입출력 방법
Luo et al. {SMART}: A {High-Performance} Adaptive Radix Tree for Disaggregated Memory
US9710479B2 (en) Providing record-level alternate-index upgrade locking
US11755555B2 (en) Storing an ordered associative array of pairs using an append-only storage medium
WO2023125630A1 (zh) 一种数据管理方法及相关装置
US8825985B2 (en) Data transfer reduction in scale out architectures
CN114428764B (zh) 文件写入方法、系统、电子设备及可读存储介质
WO2023024656A1 (zh) 数据访问方法、存储系统及存储节点
WO2022218218A1 (zh) 数据处理方法、装置、归约服务器及映射服务器
WO2023273803A1 (zh) 一种认证方法、装置和存储系统
EP4321981A1 (en) Data processing method and apparatus
CN116594551A (zh) 一种数据存储方法及装置
CN113051244B (zh) 数据访问方法和装置、数据获取方法和装置
CN114942727A (zh) 微内核文件系统可扩展页面缓存系统及方法
Tchaye-Kondi et al. Hadoop perfect file: a fast access container for small files with direct in disc metadata access
CN115438039A (zh) 存储系统的数据索引结构的调整方法和装置
WO2022083267A1 (zh) 数据处理方法、装置、计算节点以及计算机可读存储介质
US11782885B2 (en) Accessing S3 objects in a multi-protocol filesystem

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22914882

Country of ref document: EP

Kind code of ref document: A1