Detailed Description
The present application describes a number of different embodiments and implementations. The following sections describe example frameworks suitable for practicing various embodiments. Next, the present application describes example systems, devices, and processes for implementing a storage engine.
Example Environment
FIG. 1 illustrates an example environment 100 that can be used to implement a storage system. The environment 100 may include a storage system 102. In an implementation, the storage system 102 may include a key-value engine 104 and one or more storage devices 106-1, …, 106-N (collectively storage devices 106), where N is an integer greater than or equal to 1. The key-value engine 104 and the plurality of storage devices 106 may communicate data with each other via one or more connections 108-1, …, 108-M (collectively, connections 108), where M is an integer greater than or equal to 1. In this example, the one or more storage devices 106 are referred to as being included in the storage system 102. In other examples, the one or more storage devices 106 may be associated with the storage system 102 and accessible by the storage system 102.
In this example, the storage system 102 is depicted as a single entity. In other examples, the storage system 102 may be located in or distributed across one or more servers 110-1, …, 110-K (collectively servers 110), where K is an integer greater than or equal to 1. In an implementation, the storage system 102 may be included in a data center or cloud computing infrastructure 112, which may include, for example, a plurality of servers (e.g., the servers 110). In an embodiment, the storage system 102 may be part of the data center or cloud computing infrastructure 112 and may be responsible for storing data and providing related storage functions, such as logging, querying data in response to user requests, and the like. Further, in an embodiment, the environment 100 may also include one or more client devices 114-1, …, 114-L (collectively client devices 114), where L is an integer greater than or equal to 1. The one or more client devices 114 may communicate data with the data center or cloud computing infrastructure 112 (including the storage system 102 and/or the servers 110) over a network 116.
In an implementation, each of the one or more servers 110 and the one or more client devices 114 may be implemented as any of a variety of computing devices, including, but not limited to, a desktop computer, a notebook or portable computer, a handheld device, a netbook, an internet appliance, a tablet computer, a mobile device (e.g., a mobile phone, a personal digital assistant, a smart phone, etc.), a server computer, etc., or a combination thereof.
In an implementation, each of the one or more storage devices 106 may be implemented as any of a variety of devices having memory or storage capabilities, including, but not limited to, a block storage device, a solid state device (SSD), a NUMA (non-uniform memory access) device, an NVMe (non-volatile memory express) device, or the like.
The one or more connections 108 may be a data communication network including one or more data communication lines or channels that connect the storage system 102 (e.g., memory of the storage system 102) and the one or more storage devices 106 through wireless and/or wired connections. Examples of wired connections may include an electrical carrier connection (e.g., a communication cable, a computer or communication bus such as a serial bus or a PCIe bus or lane, etc.) and an optical carrier connection (e.g., a fiber optic connection, etc.). Wireless connections may include, for example, a WiFi connection, other radio frequency connections, and so on.
In an embodiment, the network 116 may be a wireless or wired network, or a combination thereof. The network 116 may be a collection of individual networks interconnected with each other and functioning as a single large network (e.g., the Internet or an intranet). Examples of such individual networks include, but are not limited to, telephone networks, wired networks, local area networks (LANs), wide area networks (WANs), and metropolitan area networks (MANs). Further, each individual network may be a wireless or wired network, or a combination thereof. A wired network may include electrical carrier connections (e.g., communication cables, etc.) and/or optical carrier connections (e.g., fiber optic connections, etc.). A wireless network may include, for example, a WiFi network, other radio frequency networks (e.g., Zigbee, etc.), and the like.
Example Storage System
FIG. 2 illustrates the storage system 102 in more detail. In an embodiment, the storage system 102 may include, but is not limited to, one or more processors 202, input/output (I/O) interfaces 204 and/or network interfaces 206, and a memory 208. Additionally, the storage system 102 may also include a key-value engine 210 (e.g., the key-value engine 104), one or more storage devices 212 (e.g., the one or more storage devices 106), and one or more data communication channels 214. In an embodiment, the key-value engine 210 may include at least one processor (e.g., the processor 202) and memory (e.g., the memory 208).
In an embodiment, some of the functions of the storage system 102 may be implemented using hardware, such as an ASIC (i.e., application specific integrated circuit), an FPGA (i.e., field programmable gate array), and/or other hardware. In an implementation, the storage system 102 may include or be included in one or more computing devices.
In an embodiment, the processor 202 may be configured to execute instructions stored in the memory 208 and/or received from the I/O interface 204 and/or the network interface 206. In an embodiment, the processor 202 may be implemented as one or more hardware processors including, for example, a microprocessor, a special-purpose instruction set processor, a physical processing unit (PPU), a central processing unit (CPU), a graphics processing unit, a digital signal processor, a tensor processing unit, and the like. Additionally or alternatively, the functions described herein may be performed, at least in part, by one or more hardware logic components. By way of example and not limitation, illustrative types of hardware logic components that may be used include field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-a-chip (SOCs), complex programmable logic devices (CPLDs), and the like.
Memory 208 may include a computer-readable medium (or processor-readable medium) in the form of volatile memory, such as Random Access Memory (RAM), and/or in the form of non-volatile memory, such as Read Only Memory (ROM) or flash RAM. Memory 208 is an example of a computer-readable medium (or processor-readable medium).
Computer-readable media (or processor-readable media) may include volatile or nonvolatile types, removable or non-removable media, which may be used to implement storage of information using any method or technology. The information may include computer-readable instructions (or processor-readable instructions), data structures, program modules, or other data. Examples of computer-readable media (or processor-readable media) include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other internal storage technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, a computer-readable medium (or processor-readable medium) does not include transitory media such as modulated data signals and carrier waves.
In an embodiment, the memory 208 and the one or more storage devices 212 may include, but are not limited to, different types of memory or storage devices having different storage and/or processing capabilities, e.g., memory or storage devices having different response latencies, storage devices having different degrees of proximity to the processor of the storage system 102 and/or the key-value engine 210, memory or storage devices having different data access (i.e., data read and/or write) speeds, memory or storage devices having different program-erase cycles and/or read-disturb thresholds, and the like. In an embodiment, the memory 208 may have better performance (e.g., lower response latency, higher data access speed, etc.) and/or be closer to the processor of the storage system 102 and/or the key-value engine 210 than the one or more storage devices 212.
By way of example, and not limitation, the memory 208 may be implemented as any of a variety of different types of memory devices having and providing storage capabilities, and may include, but is not limited to, main storage and the like. In an embodiment, the main storage may include, but is not limited to, cache, main memory (e.g., random access memory (RAM), such as dynamic random access memory (DRAM), etc.), DCPMM (i.e., data center persistent memory module, such as Optane™ persistent memory), NVMe (i.e., non-volatile memory express) storage, and the like.
Further, the one or more storage devices 212 (or the one or more storage devices 106) may be implemented as any of a variety of different types of storage devices having and providing storage capabilities, and may include, but are not limited to, secondary storage, tertiary storage, and the like. In an embodiment, the secondary storage may include, but is not limited to, flash memory or solid state devices (SSDs), hybrid hard disk drives (HHDs), hard disk drives (HDDs), and the like. Flash memory or solid state devices may include, for example, SLC (i.e., single-level cell) flash memory, MLC (i.e., multi-level cell) flash memory (e.g., TLC (i.e., triple-level cell) flash memory, QLC (i.e., quad-level cell) flash memory, PLC (i.e., penta-level cell) flash memory, etc.), and the like. In an embodiment, the tertiary storage may include, but is not limited to, external memory or removable storage media, such as an external flash drive or SSD, an external HHD, an external HDD, or the like.
In an embodiment, the one or more data communication channels 214 may include one or more data communication lines or channels that enable different components of the storage system 102 (e.g., the one or more processors 202, the memory 208, the key-value engine 210, the one or more storage devices 212, etc.) to communicate data and instructions with each other via wireless and/or wired connections. Examples of wired connections may include an electrical carrier connection (e.g., a communication cable, a computer or communication bus such as a serial bus or a PCIe bus or lane, etc.) and an optical carrier connection (e.g., a fiber optic connection, etc.). Wireless connections may include, for example, a WiFi connection, other radio frequency connections, and the like.
Although only hardware components of the storage system 102 are described in this example, in other examples the storage system 102 may also include other hardware components and/or software components, such as program units 216 for executing instructions stored in the memory 208 to perform various operations, and program data 218 storing application data and data of tasks handled by different components of the storage system 102. In this example, the one or more storage devices 212 are described as being included in the storage system 102. In other examples, the one or more storage devices 212 may be associated with the storage system 102. For example, the one or more storage devices 212 may be peripheral devices accessible to one or more components of the storage system 102 (e.g., the key-value engine 210). By way of example and not limitation, the key-value engine 210 may communicate data with the one or more storage devices 212 via the one or more data communication channels 214.
Example Index and Mapping Data Structures
In an embodiment, the storage system 102 or key-value engine 104 may employ an ordered indexing system configured to provide and facilitate insertion, retrieval, and scanning operations. In an embodiment, the storage system 102 or key-value engine 104 may construct an index system based at least in part on an index data structure (e.g., a key-value data structure). By way of example and not limitation, the key-value data structure may include a tree or hierarchical data structure, which may include, but is not limited to, a B-tree, a B+ tree, a Bw tree, and the like. In an embodiment, the key-value data structure may comprise a probabilistic data structure, such as a skip list. In an embodiment, to improve the performance of the storage system 102, the storage system 102 or key-value engine 104 may further employ a mapping data structure that may map logical storage addresses stored or referenced in the key-value data structure to physical storage addresses in a storage device (e.g., storage device 212) and convert random writes caused by the key-value data structure due to user transactions or requests into a sequential pattern.
FIG. 3 illustrates an example relationship between a key-value data structure 302 and a mapping data structure 304. In this example, the key-value data structure 302 is described as comprising a tree data structure (e.g., a B+ tree) having a plurality of levels, and comprising a root node 306, a plurality of internal nodes 308-1, …, 308-M, and a plurality of leaf nodes 310-1, …, 310-N, where M and N are integers greater than zero. In an embodiment, a leaf node 310-i may store or include a corresponding key set or range (i.e., a key range ei) and have a logical address pi, where 1 ≤ i ≤ N. The mapping data structure 304, which in this example is shown as a table, may include mapping entries that define a mapping between the logical addresses of the nodes of the key-value data structure 302 (i.e., p1, …, pN) and the physical addresses of the data (i.e., d1, …, dN), where the data is logically identified by the corresponding keys and physically stored in a storage device (e.g., the storage device 212). In an embodiment, the storage system 102 or the key-value engine 104 may locate data and perform data reads and writes according to user requests based on the key-value data structure 302 and the mapping data structure 304.
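As a concrete illustration of this two-level lookup, the sketch below models the leaf nodes and the mapping table with plain dictionaries; the key ranges, logical addresses, and physical addresses are illustrative stand-ins for the ei, pi, and di of FIG. 3, not values from any actual implementation:

```python
# Minimal sketch of the index/mapping relationship in FIG. 3.
# Leaf nodes hold key ranges and logical addresses; a separate
# mapping table resolves logical addresses to physical locations.

# Leaf nodes of the key-value data structure: key range -> logical address.
leaf_nodes = {
    ("a", "f"): "p1",  # leaf 310-1 covers keys in [a, f)
    ("f", "m"): "p2",  # leaf 310-2 covers keys in [f, m)
    ("m", "z"): "p3",  # leaf 310-3 covers keys in [m, z)
}

# Mapping data structure: logical address -> physical address on a storage device.
mapping = {"p1": "d1", "p2": "d2", "p3": "d3"}

def locate(key):
    """Resolve a key to its physical address via the two structures."""
    for (low, high), logical_address in leaf_nodes.items():
        if low <= key < high:
            return mapping[logical_address]
    raise KeyError(key)
```

Because the index only ever stores logical addresses, the mapping layer is free to relocate data physically (e.g., to turn random writes into sequential ones) by updating `mapping` alone.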
In an embodiment, when performing partition splitting or merging operations, the storage system 102 (or more specifically, the key-value engine 104) may perform such partition splitting or merging operations in multiple stages. In an embodiment, the plurality of stages may include, but are not limited to, a preparation stage, a build stage, and the like.
In an embodiment, during the preparation phase, the key-value engine 104 may first hard link the data files (e.g., the sequestered data files) of the one or more parent index data structures to the one or more child index data structures, while continuing to process user requests using the one or more parent index data structures. Whether one parent produces multiple children or multiple parents produce one child depends on the operation: a partition split operation generates multiple child index data structures from one parent index data structure, whereas a partition merge operation generates one child index data structure from multiple parent index data structures.
In an embodiment, during the build phase, the key-value engine 104 may take the one or more parent index data structures offline so that they are unavailable to handle user requests, hard link any remaining data files of the one or more parent index data structures (e.g., data files generated or written after the sequestered files were hard-linked during the preparation phase), build the one or more child index data structures, and update the original mapping data structures associated with the one or more parent index data structures to complete the partition split or merge operation. Because the data files of the one or more parent index data structures are hard-linked rather than copied from one location to another, data migration is minimized, saving computational costs such as processing resources and time.
Example Methods
FIG. 4 shows a schematic diagram depicting an exemplary partition merge scenario. FIG. 5 shows a schematic diagram depicting an exemplary partition splitting scenario. FIG. 6 shows a schematic diagram depicting an exemplary partition merge method. FIG. 7 shows a schematic diagram depicting an exemplary partition splitting method. The methods of FIGS. 6 and 7 may, but need not, be implemented in the environment of FIG. 1 and using the system of FIG. 2, the relationship of FIG. 3, and the example scenarios of FIGS. 4 and 5. For ease of explanation, the methods 600 and 700 are described with reference to FIGS. 1-5. However, the methods 600 and 700 may alternatively be implemented in other environments and/or using other systems.
Methods 600 and 700 are described in the general context of computer-executable instructions. Generally, computer-executable instructions may include routines, programs, objects, components, data structures, procedures, modules, functions, and the like that perform particular functions or implement particular abstract data types. Furthermore, each of the exemplary methods is illustrated as a collection of blocks in a logic flow diagram that represents a sequence of operations that may be implemented in hardware, software, firmware, or a combination thereof. The order in which the method is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method, or an alternative method. In addition, various blocks may be omitted from the method without departing from the spirit and scope of the subject matter described herein. In the case of software, the blocks represent computer instructions that, when executed by one or more processors, perform the recited operations. In the case of hardware, some or all of the blocks may represent Application Specific Integrated Circuits (ASICs) or other physical components that perform the operations.
Returning to FIG. 6, at block 602, key value engine 104 may receive or detect a trigger event to perform a partition merge operation.
In an embodiment, the storage system 102 (or, in particular, the key-value engine 104 of the storage system 102 hereinafter) may receive or detect a trigger event to perform a partition merge operation. In an embodiment, the triggering event may include, for example, receiving an instruction from a user to merge adjacent index data structures (or referred to as adjacent parent index partitions) into a single index data structure (or referred to as a child index partition). In an embodiment, two index data structures are described as being adjacent if the key range of at least one node of one index data structure is adjacent or close to the key range of at least one node of the other index data structure.
Additionally or alternatively, the key-value engine 104 may perform a partition merge operation for load balancing purposes. For example, the key-value engine 104 may receive related user requests to process data whose logical keys are respectively located in adjacent index data structures. If the key-value engine 104 receives these related user requests continuously, or in a number greater than a predetermined threshold within a predetermined period of time, the key-value engine 104 may identify this as a trigger event to perform a partition merge operation and attempt to merge these separate index data structures together to reduce the cost of accessing and searching the index data structures.
Additionally or alternatively, the key-value engine 104 may detect that the total number of access requests to adjacent index data structures within a predetermined period of time is less than or equal to a predetermined access threshold. In response to this trigger event, the key-value engine 104 may perform a partition merge operation to merge these adjacent index data structures, thereby reducing the storage cost of keeping two separate index data structures in memory (e.g., the memory 208) and increasing the access rate of the index data structure obtained from the merge.
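The two merge triggers described above can be sketched as a simple counter-based detector; the class name, thresholds, and per-window handling here are illustrative assumptions rather than part of the described system:

```python
class MergeTriggerDetector:
    """Illustrative detector for two merge triggers: many related requests
    spanning adjacent partitions (hot trigger), or few total accesses to
    the adjacent partitions within a time window (cold trigger)."""

    def __init__(self, related_threshold, cold_threshold):
        self.related_threshold = related_threshold  # max tolerated cross-partition requests
        self.cold_threshold = cold_threshold        # min traffic to keep partitions separate
        self.related = 0                            # requests spanning both partitions
        self.total = 0                              # all requests in the current window

    def record(self, spans_adjacent_partitions):
        """Record one user request observed during the current window."""
        self.total += 1
        if spans_adjacent_partitions:
            self.related += 1

    def should_merge(self):
        # Hot trigger: many requests touch both adjacent partitions,
        # so merging reduces access and search cost.
        if self.related > self.related_threshold:
            return True
        # Cold trigger: the partitions see little traffic overall,
        # so merging reduces the cost of keeping them separate.
        return self.total <= self.cold_threshold
```

In a real engine the counters would be reset per window and maintained per pair of adjacent partitions; this sketch only shows the decision logic.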
At block 604, the key-value engine 104 may hard link the data files of the plurality of index data structures into the directory of the index data structure.
In an embodiment, after receiving or detecting the trigger event to perform the partition merge operation, the key-value engine 104 may enter a preparation phase of the partition merge operation. In an embodiment, the key-value engine 104 may hard link the data files of the plurality of index data structures that are to be merged into the directory of the index data structure. In an embodiment, the key-value engine 104 may hard link the data files of the plurality of index data structures by assigning additional filenames to the data files, the additional filenames being independently connected to the respective data segments of the data files. In an embodiment, the key-value engine 104 may add these additional filenames of the data files to the directory of the index data structure. In an embodiment, the data files of the plurality of index data structures may include, for example, the sequestered files of the plurality of index data structures (e.g., sequestered metadata files and sequestered data files in a storage device such as the storage device 212). In an embodiment, a sequestered file may comprise a file that does not accept new data and is not modifiable.
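On POSIX file systems, assigning such an additional filename without copying any data segments corresponds to creating a hard link, e.g., via Python's `os.link`; the sketch below (file names and contents are illustrative) shows that both names resolve to the same underlying inode:

```python
import os
import tempfile

def hard_link_into(data_file, target_dir):
    """Give an existing data file an additional filename inside target_dir.

    Nothing is copied: the new name is simply linked to the same data
    segments (the same inode) as the original file.
    """
    link_name = os.path.join(target_dir, os.path.basename(data_file))
    os.link(data_file, link_name)
    return link_name

# Demonstration: a sequestered data file gains a second name in the merged
# partition's directory, and the link count reflects both names.
parent_dir = tempfile.mkdtemp()
child_dir = tempfile.mkdtemp()
data_file = os.path.join(parent_dir, "000001.data")
with open(data_file, "w") as f:
    f.write("immutable key-value payload")

linked = hard_link_into(data_file, child_dir)
same_inode = os.stat(data_file).st_ino == os.stat(linked).st_ino
link_count = os.stat(data_file).st_nlink
```

Because sequestered files never change, both directories can safely share the same bytes for as long as either name exists.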
In an embodiment, the key-value engine 104 may further set a write-request threshold to limit the number of write requests to be processed in response to receiving the trigger event. In an embodiment, to reduce the amount of time that the storage system 102 or the key-value engine 104 is unavailable to process user requests using the plurality of index data structures during the subsequent build phase of the partition merge operation, the key-value engine 104 may limit the number of write requests processed after the trigger event is received or detected, so as to keep the number of files to be generated and/or written at a controllable or acceptable level (i.e., the write-request threshold). In an embodiment, this write-request threshold may be determined based on a number of factors, which may include, but are not limited to, an allowable or tolerable amount of time during which user requests cannot be processed, the sizes of the plurality of index data structures to be merged, the processing power of the key-value engine 104 to complete the merging of the plurality of index data structures in the build phase, and the like. In an embodiment, the write-request threshold may also be set by a user (e.g., an administrator) of the storage system 102.
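A minimal sketch of such a write-request threshold, assuming a simple admit-or-defer policy (the class and method names are illustrative, not from the application):

```python
class WriteLimiter:
    """Sketch of the write-request threshold set during the preparation
    phase: once a merge/split trigger fires, only a bounded number of
    further write requests are admitted, so that few new files remain
    to be hard-linked when the build phase begins."""

    def __init__(self, write_request_threshold):
        self.remaining = write_request_threshold

    def try_admit(self):
        """Return True if a write request may still be processed."""
        if self.remaining <= 0:
            return False  # defer or reject: the build phase is imminent
        self.remaining -= 1
        return True
```

The threshold bounds the work left for the build phase, which directly bounds the window during which the partitions are unavailable.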
In an embodiment, after successfully hard linking the data files of the plurality of index data structures to the directory of the index data structure, the key-value engine 104 may enter the build phase of the partition merge operation. In an embodiment, the key-value engine 104 may make the plurality of index data structures unavailable for providing services, e.g., processing user requests. In addition, the key-value engine 104 may hard link any remaining files that were not hard-linked during the preparation phase. For example, the key-value engine 104 may hard link one or more active data files of the plurality of index data structures into the directory of the index data structure. In an embodiment, the one or more active data files may include files that are different from the sequestered files hard-linked during the preparation phase, and may include files that were written and/or created after the trigger event was received or detected and before the plurality of index data structures were made unavailable for providing services (e.g., processing user requests). In an embodiment, because the key-value engine 104 previously set the write-request threshold to limit the number of write requests processed during the preparation phase, the time to process (e.g., hard link) these remaining files may be significantly shortened in accordance with the write-request threshold.
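The two-phase hard-linking sequence described above might be sketched as follows; the directory layout, the callback hooks, and the assumption that data file names are unique across parents are all illustrative:

```python
import os

def merge_partitions_two_phase(parent_dirs, child_dir, take_offline, bring_online):
    """Illustrative two-phase partition merge: hard-link the currently
    sequestered files while parents keep serving, then briefly take the
    parents offline to link only the files written in the meantime."""
    os.makedirs(child_dir, exist_ok=True)
    linked = set()

    # Preparation phase: link files present now; parents keep serving.
    for parent in parent_dirs:
        for name in os.listdir(parent):
            os.link(os.path.join(parent, name), os.path.join(child_dir, name))
            linked.add(name)

    # Build phase: parents go offline; link only files created since the
    # first pass (the remaining "active" files), then resume service.
    take_offline()
    for parent in parent_dirs:
        for name in os.listdir(parent):
            if name not in linked:
                os.link(os.path.join(parent, name), os.path.join(child_dir, name))
                linked.add(name)
    bring_online()
    return linked
```

The offline window only covers the second loop, whose size is bounded by the write-request threshold discussed above.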
At block 606, the key-value engine 104 may iteratively merge adjacent nodes of the plurality of index data structures to form the index data structure in a bottom-up manner.
In an embodiment, after hard linking the data files (e.g., the sequestered files and the active files) of the plurality of index data structures, the key-value engine 104 may begin to merge the plurality of index data structures. In an embodiment, the key-value engine 104 may iteratively merge adjacent nodes of the plurality of index data structures in a bottom-up manner to form the index data structure. In an embodiment, the key-value engine 104 may write the key-value pairs associated with adjacent nodes of the plurality of index data structures into a corresponding merge node in the index data structure, report the new key range of the keys in the key-value pairs to the level above the merge node, and repeat the writing and reporting until the root node of the index data structure is reached.
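The write-and-report step for one pair of adjacent boundary nodes might look like the following sketch, where nodes are modeled as dictionaries of key-value pairs (an assumption for illustration only):

```python
def merge_boundary_nodes(node_a, node_b):
    """Write the key-value pairs of two adjacent boundary nodes into a
    single merge node, and compute the new key range to report to the
    level above the merge node."""
    merge_node = dict(node_a)
    merge_node.update(node_b)
    keys = sorted(merge_node)
    new_key_range = (keys[0], keys[-1])  # reported to the parent level
    return merge_node, new_key_range
```

Iterating this step upward, one level at a time, yields the bottom-up merge: each reported key range becomes the entry the parent level must merge next.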
The scenario shown in FIG. 4 is used as an illustrative example. FIG. 4 illustrates an example scenario in which two index data structures 402 and 404 (shown as tree structures in this example) are to be merged. Other index data structures, such as other types of hierarchical data structures or probabilistic data structures, may also be applicable. In this example, the index data structure 402 may be associated with a key range A1-A2 and include a plurality of nodes, such as N0, N1, and N2. The index data structure 404 may be associated with a key range A3-A4 and include a plurality of nodes, such as N3 and N4. In this example, the index data structure 402 and the index data structure 404 are index data structures that are adjacent or contiguous to each other. In an embodiment, two index data structures are referred to as adjacent or contiguous to each other if the respective key ranges or sets represented by the two index data structures are adjacent or contiguous to each other, or are separated by an amount less than a predetermined separation threshold. The predetermined separation threshold may be predefined by a user (e.g., an administrator) of the storage system 102.
In an embodiment, when merging the index data structures, the key-value engine 104 may merge adjacent nodes (or referred to as adjacent boundary nodes) of the index data structures in a bottom-up manner. In an embodiment, adjacent nodes (or adjacent boundary nodes) of the index data structures may include nodes whose represented key ranges or sets are adjacent or contiguous to each other, or are located within a predetermined proximity threshold of each other. Using the example shown in FIG. 4, the key-value engine 104 may first attempt to merge the two adjacent boundary nodes (i.e., N2 and N4) at the respective lowest levels of the index data structures 402 and 404, and write or copy the key-value pairs of the two adjacent boundary nodes into a new node Nc1 (or referred to as a merge node) of the merged index data structure 406. In an embodiment, depending on the types of the index data structures 402, 404, and 406, the key-value engine 104 may report, to the higher-level parent node of the index data structure 406 (e.g., node Nc2 in this example of FIG. 4), the new key range or key set and/or the maximum key value represented by the new node Nc1.
In an embodiment, the key-value engine 104 may repeat the above operations at the next level for another pair of adjacent boundary nodes (e.g., N1 and N3 in the example shown in FIG. 4) until the root node of the index data structure (e.g., the index data structure 406 in the example of FIG. 4) is reached. The key-value engine 104 may then update the root node of the index data structure by writing information that covers the new key ranges or sets of its child nodes, so that the index data structure (i.e., the root node and its child nodes) includes all of the key-value pairs originally included in the plurality of index data structures (e.g., the index data structures 402 and 404).
At block 608, the key-value engine 104 may construct a mapping data structure for the index data structure.
In an embodiment, after merging the plurality of index data structures into the index data structure, the key-value engine 104 may build a new mapping data structure for the index data structure. In an embodiment, the key-value engine 104 may delete the mapping entries of the adjacent nodes of the plurality of index data structures from the original mapping data structure associated with the plurality of index data structures, and add new mapping entries for the merge nodes of the index data structure, to construct the new mapping data structure for the index data structure. For example, FIG. 4 shows that the corresponding mapping entries for the adjacent boundary nodes N2 and N4 are deleted from the original mapping data structure 408 of the index data structures 402 and 404, and that mapping entries for the new or merge nodes Nc1 and Nc2 are added to form a new mapping data structure 410 for the index data structure 406.
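A sketch of this mapping update, using dictionaries for the mapping tables; the node identifiers and physical addresses below merely echo the style of FIG. 4 and are illustrative, not taken from it verbatim:

```python
def rebuild_mapping(original_mapping, merged_node_ids, new_entries):
    """Derive the merged partition's mapping table: drop the entries of
    the boundary nodes that were merged away, then add entries for the
    merge nodes that replaced them."""
    new_mapping = {node: addr for node, addr in original_mapping.items()
                   if node not in merged_node_ids}
    new_mapping.update(new_entries)
    return new_mapping
```

All untouched nodes keep their existing logical-to-physical entries, so only the merged boundary nodes incur any mapping work.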
Returning to FIG. 7, at block 702, key value engine 104 may receive or detect a trigger event to perform a partition splitting operation.
In an embodiment, the key-value engine 104 of the storage system 102 may receive or detect a trigger event to perform a partition splitting operation. For example, the key-value engine 104 may receive or detect the trigger event by receiving an instruction to split an index data structure (or referred to as a parent index partition) into a plurality of index data structures (or referred to as a plurality of child index partitions), and the instruction may include a key value.
Additionally or alternatively, the key-value engine 104 may perform a partition splitting operation for load balancing purposes. For example, the key-value engine 104 may receive a number of user requests to process data whose logical keys are located in the same index data structure. If the key-value engine 104 receives these user requests within a predetermined period of time at a frequency (or rate) greater than a predetermined frequency threshold, the key-value engine 104 may identify this as a trigger event to perform a partition splitting operation, and attempt to split the index data structure into separate index data structures to enable parallel processing of user requests by multiple threads or processes of the key-value engine 104 using these separate index data structures for load balancing. In an embodiment, the key-value engine 104 may determine a key value for splitting the index data structure into the plurality of separate index data structures. For example, the key-value engine 104 may determine such a key value based at least in part on load balancing. By way of example and not limitation, the key-value engine 104 may determine the key value such that the respective numbers or frequencies of past user requests to access (e.g., write or read) data whose logical keys are covered by the separate index data structures over a past predetermined period of time are approximately the same, or differ by less than a predetermined threshold amount.
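One plausible way to pick such a key value from per-key access statistics is a cumulative-count scan, sketched below under the assumption (not stated in the application) that access counts are available as a key-sorted list:

```python
def choose_split_key(access_counts):
    """Choose a split key so that past accesses are divided roughly in
    half between the two resulting child partitions.

    access_counts: list of (key, access_count) pairs sorted by key.
    Returns the first key at which the cumulative count reaches half of
    the total; keys below it go to one child, the rest to the other.
    """
    total = sum(count for _, count in access_counts)
    cumulative = 0
    for key, count in access_counts:
        cumulative += count
        if 2 * cumulative >= total:
            return key
    return access_counts[-1][0]
```

A skewed workload therefore splits near its hot spot rather than at the midpoint of the key range, which is the load-balancing behavior the passage describes.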
At block 704, the key-value engine 104 may hard link the data files of the index data structure into respective directories of the plurality of index data structures.
In an embodiment, the key-value engine 104 may enter the preparation phase upon receiving or detecting a trigger event to perform a partition splitting operation. In an embodiment, the key-value engine 104 may hard link data files of the index data structure into respective directories of the plurality of index data structures. In an embodiment, the data files of the index data structure may comprise sequestered files of the index data structure, wherein a sequestered file comprises a file that does not accept new data and is not modifiable.
In an embodiment, after the key-value engine 104 hard links the data files of the index data structure (e.g., the sequestered metadata files and data files) to the respective directories of the plurality of index data structures during the preparation phase, the key-value engine 104 may render the index data structure unavailable for providing services, such as processing user requests, before entering the construction phase (e.g., before iteratively splitting the nodes of the index data structure). In an embodiment, the key-value engine 104 may further hard link one or more active data files of the index data structure into the respective directories of the plurality of index data structures. The one or more active data files may be different from the sequestered files and may include files that were written and/or created prior to making the index data structure unavailable for providing the services.
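The hard-linking step above can be sketched as follows, assuming a POSIX-style file system in which a hard link shares the underlying file contents with the original (so no data is copied). The function name and directory layout are illustrative assumptions, not the engine's actual API.

```python
import os

def hard_link_data_files(parent_dir, child_dirs):
    """Sketch: hard-link every data file of the parent index partition's
    directory into the directory of each child index partition.

    Because the files are sequestered (immutable), sharing them via hard
    links is safe: both directories reference the same on-disk contents."""
    linked = []
    for name in sorted(os.listdir(parent_dir)):
        src = os.path.join(parent_dir, name)
        if not os.path.isfile(src):
            continue  # skip subdirectories and other non-file entries
        for child in child_dirs:
            os.makedirs(child, exist_ok=True)
            dst = os.path.join(child, name)
            os.link(src, dst)  # both paths now reference the same inode
            linked.append(dst)
    return linked
```

Since only directory entries are created, the preparation phase completes in time proportional to the number of data files rather than their total size.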
At block 706, the key-value engine 104 may iteratively split a plurality of nodes of the index data structure in a bottom-up manner to form a plurality of index data structures, a key range associated with the plurality of nodes including the key value.
In an embodiment, after the key-value engine 104 hard links the data files (e.g., the sequestered files and the active files) of the index data structure to the respective directories of the plurality of index data structures, the key-value engine 104 may iteratively split the nodes of the index data structure in a bottom-up manner to form the plurality of index data structures. In an embodiment, the key-value engine 104 may split key-value pairs of one of the plurality of nodes of the index data structure into respective split nodes of the plurality of index data structures according to the key value, and report key ranges and/or maximum key values of the keys of the split nodes to higher-level split nodes of the plurality of index data structures according to the type of the index data structure. In an embodiment, the key-value engine 104 may repeat the above operations for the next node at the next higher level of the index data structure until the root node of the index data structure is rewritten.
A scenario as shown in FIG. 5 is used as an illustrative example. FIG. 5 shows an example scenario in which an index data structure 502, shown in this example as a tree structure, is to be split. It should be understood that other index data structures, such as other types of hierarchical data structures or probabilistic data structures, may also be applicable. In this example, splitting or partitioning the index data structure 502 along the key value A1 is described. In an embodiment, multiple index data structures may be obtained or formed independently by splitting or partitioning an index data structure (e.g., the index data structure 502 as shown in FIG. 5). In this example, one of the plurality of index data structures may hold or obtain a left portion of the index data structure 502, and another of the plurality of index data structures may hold or obtain a right portion of the index data structure 502.
In an embodiment, for each index data structure of the plurality of index data structures, the key-value engine 104 may initiate a replay process in a bottom-up manner (i.e., from the leaf nodes toward the root node). For example, the key-value engine 104 may locate or determine a boundary node (e.g., the boundary node N3 shown in FIG. 5) at which a key value (e.g., the key A1 in FIG. 5) is located or exists. In an embodiment, the key-value engine 104 may iteratively filter out keys that are not within a respective key range for each index data structure of the plurality of index data structures, and write keys belonging to the respective index data structure of the plurality of index data structures into new nodes of the respective index data structure. In an embodiment, the key-value engine 104 may further report the corresponding key range or maximum key value of each index data structure to an upper level. The key-value engine 104 may then repeatedly or iteratively perform the above operations for the parent node (e.g., the node N2 in FIG. 5) of each just-processed boundary node (e.g., the boundary node N3 in FIG. 5) until the root node of the split index data structure is rewritten. In an embodiment, the key-value engine 104 may complete such splitting or partitioning of the index data structure to obtain the plurality of index data structures after obtaining a new root node for each of the plurality of index data structures, with the respective keys being assigned to the respective key ranges. In an embodiment, the key-value engine 104 may only need to update the boundary nodes between the multiple index data structures, as only these boundary nodes are processed in the construction phase. For example, as shown in FIG. 5, only the boundary nodes (i.e., the nodes N5, N3, and N1) between the index data structures are updated.
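The bottom-up replay along the boundary path may be sketched as follows for a simple B-tree-like structure. The `Node` layout and `split_bottom_up` function are illustrative assumptions, not the engine's actual implementation; the sketch shows only the essential property that nodes off the boundary path are reused unchanged.

```python
import bisect
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    keys: List[int]                          # sorted keys (or separator keys)
    children: List["Node"] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.children

def split_bottom_up(root: Node, split_key: int):
    # Walk down to collect the boundary path from the root to the leaf
    # whose key range covers split_key; only these nodes are rewritten.
    path, node = [], root
    while not node.is_leaf:
        path.append(node)
        node = node.children[bisect.bisect_left(node.keys, split_key)]
    path.append(node)

    # Replay from the leaf upward, rewriting each boundary node into a
    # left node and a right node and reporting the split to the next level.
    left_sub = right_sub = None
    for node in reversed(path):
        i = bisect.bisect_left(node.keys, split_key)
        left, right = Node(keys=node.keys[:i]), Node(keys=node.keys[i:])
        if not node.is_leaf:
            # Children off the boundary path are attached unchanged; only
            # the boundary child is replaced by its two rewritten halves.
            left.children = node.children[:i] + [left_sub]
            right.children = [right_sub] + node.children[i + 1:]
        left_sub, right_sub = left, right
    return left_sub, right_sub  # roots of the two child index partitions
```

The loop touches only the O(height) boundary nodes, which is why, as in FIG. 5, only the nodes on the split path need to be updated.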
At block 708, the key-value engine 104 may construct a mapping data structure for the plurality of index data structures.
In an embodiment, after splitting the index data structure into the multiple index data structures, the key-value engine 104 may build a new mapping data structure for the multiple index data structures. In an embodiment, similar to the partition merge operation, the key-value engine 104 may delete mapping entries for the nodes of the index data structure from an original mapping data structure associated with the index data structure, and add new mapping entries for the split nodes of the plurality of index data structures to the original mapping data structure, to construct the new mapping data structure for the plurality of index data structures. In addition, in embodiments, since each index data structure of the plurality of index data structures does not immediately process data that is outside its key range, all mapping entries that are out of range can be removed at a controlled rate by background processing without interfering with normal operation.
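The mapping update described above may be sketched as follows. The dict-based mapping and the entry format (a node identifier mapped to a minimum key and a file location) are assumptions for illustration; the point is the two-step pattern of deleting entries for rewritten nodes, adding entries for new split nodes, and pruning out-of-range entries in bounded batches.

```python
def rebuild_mapping(original_mapping, rewritten_node_ids, new_split_entries):
    """Sketch: delete the mapping entries of the parent partition's
    rewritten (boundary) nodes and add entries for the new split nodes."""
    mapping = dict(original_mapping)            # the original stays intact
    for node_id in rewritten_node_ids:
        mapping.pop(node_id, None)              # delete obsolete entries
    mapping.update(new_split_entries)           # add the split nodes' entries
    return mapping

def prune_out_of_range(mapping, key_range, batch=2):
    """Sketch: remove out-of-range entries at a controlled rate; each pass
    deletes at most `batch` entries so foreground requests are not
    disturbed. Returns True if another background pass is still needed."""
    lo, hi = key_range
    stale = [nid for nid, (min_key, _loc) in mapping.items()
             if not (lo <= min_key < hi)]
    for node_id in stale[:batch]:
        del mapping[node_id]
    return len(stale) > batch
```

A background thread would call `prune_out_of_range` repeatedly until it returns False, spreading the cleanup cost over time.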
Any of the acts of any of the methods described herein may be implemented at least in part by a processor or other electronic device based on instructions stored on one or more computer-readable media. By way of example, and not limitation, any of the acts of any of the methods described herein may be implemented under the control of one or more processors configured with executable instructions that may be stored on one or more computer-readable media.
Conclusion
Although embodiments have been described in language specific to structural features and/or methodological acts, it is to be understood that the claims are not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as exemplary forms of implementing the claimed subject matter. Additionally or alternatively, some or all of the operations may be implemented by one or more ASICs, FPGAs, or other hardware.
The invention may be further understood using the following items.
Item 1: a method implemented by one or more processors, the method comprising: receiving a trigger event to merge a plurality of index data structures to form one index data structure; hard-linking data files of the plurality of index data structures into a directory of the index data structure; iteratively merging adjacent nodes of the plurality of index data structures in a bottom-up manner to form the index data structure; and constructing a mapping data structure of the index data structure.
Item 2: the method of item 1, wherein hard-linking the data files of the plurality of index data structures into the directory of index data structures comprises hard-linking the sequestered files of the plurality of index data structures into the directory of index data structures, the sequestered files not accepting new data and being non-modifiable.
Item 3: the method of item 2, wherein prior to iteratively merging adjacent nodes of the plurality of index data structures, the method further comprises: making the plurality of index data structures unavailable for providing services; and hard-linking one or more active data files of the plurality of index data structures into a directory of the index data structure.
Item 4: the method of item 3, wherein the one or more active data files are different from the sequestered files and include files that were written and/or created prior to making the plurality of index data structures unavailable for providing services.
Item 5: the method of item 1, wherein receiving the trigger event comprises: receiving an instruction from a user to merge the plurality of index data structures into the index data structure; or detecting that the total number of access requests to the plurality of index data structures is less than or equal to a predetermined access threshold.
Item 6: the method of item 1, further comprising setting a threshold to limit a number of write requests to be processed in response to receiving the trigger event.
Item 7: the method of item 1, wherein iteratively merging adjacent nodes of the plurality of index data structures in a bottom-up manner to form the index data structure comprises: writing key-value pairs associated with adjacent nodes of the plurality of index data structures into corresponding merge nodes in the index data structure; reporting a new key range for a key in the key-value pair to a higher level of the merge node; and repeating the writing and reporting until a root node of the index data structure is reached.
Item 8: the method of item 1, wherein constructing a mapping data structure of the index data structure comprises: deleting mapping entries of adjacent nodes of the plurality of index data structures from an original mapping data structure associated with the plurality of index data structures, and adding new mapping entries of merge nodes of the index data structure to the original mapping data structure to construct the mapping data structure of the index data structure.
Item 9: one or more processor-readable media storing executable instructions that, when executed by one or more processors, cause the one or more processors to perform acts comprising: receiving a trigger event to merge a plurality of index data structures to form one index data structure; hard-linking data files of the plurality of index data structures into a directory of the index data structure; iteratively merging adjacent nodes of the plurality of index data structures in a bottom-up manner to form the index data structure; and constructing a mapping data structure of the index data structure.
Item 10: the one or more processor-readable media of item 9, wherein hard-linking the data files of the plurality of index data structures into the directory of the index data structure comprises hard-linking the sequestered files of the plurality of index data structures into the directory of the index data structure, the sequestered files not accepting new data and being non-modifiable.
Item 11: the one or more processor-readable media of item 10, wherein prior to iteratively merging adjacent nodes of the plurality of index data structures, the acts further comprise: making the plurality of index data structures unavailable for providing services; and hard-linking one or more active data files of the plurality of index data structures into a directory of the index data structure, the one or more active data files being different from the sequestered files and including files that were written and/or created prior to making the plurality of index data structures unavailable for providing services.
Item 12: the one or more processor-readable media of item 9, wherein receiving the trigger event comprises: receiving an instruction from a user to merge the plurality of index data structures into the index data structure; or detecting that the total number of access requests to the plurality of index data structures is less than or equal to a predetermined access threshold.
Item 13: the one or more processor-readable media of item 9, the actions further comprising setting a threshold to limit a number of write requests to be processed in response to receiving the trigger event.
Item 14: the one or more processor-readable media of item 9, wherein iteratively merging adjacent nodes of the plurality of index data structures in a bottom-up manner to form the index data structure comprises: writing key-value pairs associated with adjacent nodes of the plurality of index data structures into corresponding merge nodes in the index data structure; reporting a new key range for a key in the key-value pair to a higher level of the merge node; and repeating the writing and reporting until a root node of the index data structure is reached.
Item 15: the one or more processor-readable media of item 9, wherein constructing the mapping data structure of the index data structure comprises: deleting mapping entries of adjacent nodes of the plurality of index data structures from an original mapping data structure associated with the plurality of index data structures, and adding new mapping entries of merge nodes of the index data structure to the original mapping data structure to construct the mapping data structure of the index data structure.
Item 16: a system, comprising: one or more processors; and a memory storing executable instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising: receiving an instruction to split an index data structure to form a plurality of index data structures, the instruction comprising a key value; hard-linking data files of the index data structure into respective directories of the plurality of index data structures; iteratively splitting a plurality of nodes of the index data structure in a bottom-up manner to form the plurality of index data structures, a key range associated with the plurality of nodes comprising the key value; and constructing a mapping data structure of the plurality of index data structures.
Item 17: the system of item 16, wherein hard-linking the data files of the index data structure into the respective directories of the plurality of index data structures comprises hard-linking the sequestered files of the index data structure into the respective directories of the plurality of index data structures, the sequestered files not accepting new data and being non-modifiable.
Item 18: the system of item 17, wherein prior to iteratively splitting the nodes of the index data structure, the acts further comprise: making the index data structure unavailable for providing services; and hard-linking one or more active data files of the index data structure into respective directories of the plurality of index data structures, the one or more active data files being different from the sequestered files and including files written and/or created prior to making the index data structure unavailable for providing services.
Item 19: the system of item 16, wherein iteratively splitting the nodes of the index data structure in a bottom-up manner to form the plurality of index data structures comprises: splitting key-value pairs of one node of the plurality of nodes of the index data structure into corresponding split nodes of the plurality of index data structures according to the key value; reporting key ranges of the keys of the split nodes to respective higher-level split nodes of the plurality of index data structures; and repeating the splitting and reporting until the root node of the index data structure is rewritten.
Item 20: the system of item 16, wherein constructing the mapping data structure for the plurality of index data structures comprises: deleting a mapping item of the node of the index data structure from an original mapping data structure associated with the index data structure and adding a new mapping item of a split node of the plurality of index data structures to the original mapping data structure to construct the mapping data structure of the plurality of index data structures.