Detailed Description
This application describes many different embodiments and implementations. The following sections describe example frameworks suitable for practicing various embodiments. Next, this application describes example systems, devices, and processes for implementing a storage engine.
Example Environment
FIG. 1 illustrates an example environment 100 that may be used to implement a storage system. The environment 100 may include a storage system 102. In an embodiment, the storage system 102 may include a key value engine 104 and one or more storage devices 106-1, …, 106-N (collectively, storage devices 106), where N is an integer greater than or equal to 1. The key value engine 104 and the plurality of storage devices 106 may communicate data with one another via one or more connections 108-1, …, 108-M (collectively, connections 108), where M is an integer greater than or equal to 1. In this example, the one or more storage devices 106 are referred to as being included in the storage system 102. In other examples, the one or more storage devices 106 may be associated with the storage system 102 and accessible by the storage system 102.
In this example, the storage system 102 is depicted as a single entity. In other examples, the storage system 102 may be located or included in one or more servers 110-1, …, 110-K (collectively referred to as servers 110), where K is an integer greater than or equal to 1. In an embodiment, storage system 102 may be included in a data center or cloud computing infrastructure 112, which may include, for example, a plurality of servers (e.g., server 110). In an embodiment, the storage system 102 may be part of a data center or cloud computing infrastructure 112 and may be responsible for storing data and providing related storage functions, such as logging, querying data in response to user requests, and the like. Further, in an embodiment, environment 100 may also include one or more client devices 114-1, …, 114-L (collectively client devices 114), where L is an integer greater than or equal to 1. One or more client devices 114 may communicate data with a data center or cloud computing infrastructure 112 (including storage system 102 and/or servers 110) over a network 116.
In embodiments, each of the one or more servers 110 and the one or more client devices 114 may be implemented as any of a variety of computing devices, including, but not limited to, desktop computers, notebook or portable computers, handheld devices, netbooks, internet devices, tablet or slate computers, mobile devices (e.g., mobile phones, personal digital assistants, smart phones, etc.), server computers, and the like, or a combination thereof.
In an embodiment, each of the one or more storage devices 106 may be implemented as any of a variety of devices having memory or storage capabilities, including, but not limited to, block storage devices, Solid State Devices (SSDs), NUMA (non-uniform memory access) devices, NVMe (non-volatile memory express) devices, and the like.
The one or more connections 108 may be a data communication network including one or more data communication lines or channels connecting the storage system 102 (e.g., a memory of the storage system 102) and the one or more storage devices 106 through wireless and/or wired connections. Examples of wired connections may include an electrical carrier connection (e.g., a communication cable, or a communication bus or channel such as a serial bus or a PCIe bus, etc.) and an optical carrier connection (e.g., a fiber optic connection, etc.). The wireless connection may include, for example, a WiFi connection, other radio frequency connections (e.g., a Bluetooth connection, etc.), and the like.
In an embodiment, the network 116 may be a wireless or wired network, or a combination thereof. The network 116 may be a collection of individual networks interconnected with each other and functioning as a single large network (e.g., the internet or an intranet). Examples of such individual networks include, but are not limited to, a telephone network, a wireline network, a Local Area Network (LAN), a Wide Area Network (WAN), and a Metropolitan Area Network (MAN). Furthermore, each individual network may be a wireless or wired network, or a combination thereof. A wired network may include an electrical carrier connection (e.g., a communication cable, etc.) and/or an optical carrier connection (e.g., a fiber optic connection, etc.). The wireless network may include, for example, a WiFi network, other radio frequency networks (e.g., Zigbee, etc.), and the like.
Example Storage System
FIG. 2 illustrates the storage system 102 in more detail. In an embodiment, the storage system 102 may include, but is not limited to, one or more processors 202, input/output (I/O) interfaces 204 and/or network interfaces 206, and memory 208. Additionally, the storage system 102 may also include a key value engine 210 (e.g., key value engine 104), one or more storage devices 212 (e.g., one or more storage devices 106), and one or more data communication channels 214. In an embodiment, the key value engine 210 may include at least one processor (e.g., the processor 202) and a memory (e.g., the memory 208).
In an embodiment, some of the functionality of the storage system 102 may be implemented using hardware, such as an ASIC (i.e., application specific integrated circuit), FPGA (i.e., field programmable gate array), and/or other hardware. In an embodiment, the storage system 102 may include or may be included in one or more computing devices.
In an embodiment, the processor 202 may be configured to execute instructions stored in the memory 208 and/or received from the I/O interface 204 and/or the network interface 206. In an embodiment, processor 202 may be implemented as one or more hardware processors including, for example, a microprocessor, a special purpose instruction set processor, a Physical Processing Unit (PPU), a Central Processing Unit (CPU), a graphics processing unit, a digital signal processor, a tensor processing unit, or the like. Additionally or alternatively, the functions described herein may be performed, at least in part, by one or more hardware logic components. By way of example, and not limitation, illustrative types of hardware logic components that may be used include Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), system on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
The memory 208 may include a computer-readable medium (or processor-readable medium) in the form of volatile memory, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash RAM. Memory 208 is an example of a computer-readable medium (or processor-readable medium).
Computer-readable media (or processor-readable media) may include volatile or nonvolatile types of media, removable or non-removable, which may implement storage of information using any method or technology. The information may include computer readable instructions (or processor readable instructions), data structures, program modules, or other data. Examples of a computer-readable medium (or processor-readable medium) include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electronically Erasable Programmable Read Only Memory (EEPROM), flash memory or other internal storage technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer-readable medium (or a processor-readable medium) does not include any transitory medium such as a modulated data signal and a carrier wave.
In an embodiment, the memory 208 and the one or more storage devices 212 may include, but are not limited to, different types of memory or storage devices having different storage and/or processing capabilities, e.g., memory or storage devices having different response delays, storage devices having different degrees of proximity to the processor and/or the key value engine 210 of the storage system 102, memory or storage devices having different data access (i.e., data read and/or write) speeds, memory or storage devices having different program erase cycles and/or read disturb thresholds, etc. In an embodiment, the memory 208 may have better performance (e.g., lower response latency, higher data access speed, etc.) and/or be closer to the processor and/or key value engine 210 of the storage system 102 than the one or more storage devices 212.
By way of example, and not limitation, memory 208 may be implemented as any of a variety of different types of memory devices having and providing storage capabilities, and may include, but is not limited to, primary storage, or the like. In an embodiment, the primary storage may include, but is not limited to, a cache, a main memory (e.g., Random Access Memory (RAM), such as Dynamic Random Access Memory (DRAM), etc.), a DCPMM (i.e., a data center persistent memory module, such as Optane™ persistent memory), NVMe (i.e., nonvolatile memory express), and the like.
Further, the one or more storage devices 212 (or the one or more storage devices 106) may be implemented as any of a variety of different types of storage devices having and providing storage capabilities, and may include, but are not limited to, secondary storage, tertiary storage, and the like. In embodiments, secondary storage may include, but is not limited to, flash memory or Solid State Devices (SSDs), hybrid hard disk drives (HHDs), Hard Disk Drives (HDDs), and the like. Flash or solid state devices may include, for example, SLC (i.e., single level cell) flash, MLC (i.e., multi-level cell) flash (e.g., TLC (i.e., three level cell) flash, QLC (i.e., four level cell) flash, PLC (i.e., five level cell) flash), and the like. In an embodiment, the tertiary storage may include, but is not limited to, external memory or removable storage media, such as an external flash drive or SSD, an external HHD, an external HDD, or the like.
In an embodiment, the one or more data communication channels 214 may include at least one or more data communication lines or channels that enable different components of the storage system 102 (e.g., the one or more processors 202, the memory 208, the key value engine 210, the one or more storage devices 212, etc.) to transfer data and instructions to one another over a wireless and/or wired connection. Examples of wired connections may include an electrical carrier connection (e.g., a communication cable, or a communication bus or channel such as a serial bus or a PCIe bus, etc.) and an optical carrier connection (e.g., a fiber optic connection, etc.). The wireless connection may include, for example, a WiFi connection, other radio frequency connections (e.g., a Bluetooth connection, etc.), and the like.
Although in this example only hardware components in the storage system 102 are described, in other examples, the storage system 102 may also include other hardware components and/or other software components, such as a program unit 216 for executing instructions stored in the memory 208 to perform various operations, and program data 218 that stores application data and data for tasks processed by different components of the storage system 102. In this example, one or more storage devices 212 are described as being included in storage system 102. In other examples, one or more storage devices 212 may be associated with storage system 102. For example, one or more storage devices 212 may be peripheral and accessible by one or more components of the storage system 102 (e.g., the key value engine 210). By way of example and not limitation, the key value engine 210 may communicate data with one or more storage devices 212 via one or more data communication channels 214.
Example Indexing and Mapping Data Structures
In an embodiment, the storage system 102 or the key-value engine 104 may employ an ordered indexing system configured to provide and facilitate insertion, retrieval, and scanning operations. In an embodiment, the storage system 102 or the key-value engine 104 may construct an indexing system based at least in part on an indexing data structure (e.g., a key-value data structure). By way of example and not limitation, the key-value data structure may include a tree or hierarchical data structure, which may include, but is not limited to, a B-tree, a B+ tree, a Bw-tree, and the like. In an embodiment, the key-value data structure may include a probabilistic data structure, such as a skip list. In an embodiment, to improve the performance of the storage system 102, the storage system 102 or the key-value engine 104 may further employ a mapping data structure that may map logical storage addresses stored or referenced in the key-value data structure to physical storage addresses in a storage device (e.g., the storage device 212), and convert random writes caused by the key-value data structure due to user transactions or requests into a sequential pattern.
FIG. 3 illustrates an example relationship between a key-value data structure 302 and a mapping data structure 304. In this example, the key-value data structure 302 is depicted as a tree data structure (e.g., a B+ tree) having a plurality of levels, and includes a root node 306, a plurality of internal nodes 308-1, …, 308-M, and a plurality of leaf nodes 310-1, …, 310-N, where M and N are integers greater than zero. In an embodiment, a leaf node 310-i may store or include a corresponding set or range of keys (i.e., a key range e_i) and may have a logical address p_i, where 1 ≤ i ≤ N. The mapping data structure 304, shown as a table in this example, may include mapping entries that map the logical addresses of the nodes of the key-value data structure 302 (e.g., p_1, …, p_N) to the physical addresses of the data (e.g., d_1, …, d_N), the data being logically identified by the corresponding keys and physically stored in a storage device (e.g., the storage device 212). In an embodiment, the storage system 102 or the key-value engine 104 may locate data and perform data reads and writes according to user requests based on the key-value data structure 302 and the mapping data structure 304.
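By way of illustration only, the two-step lookup implied by FIG. 3 — a leaf's key range resolving to a logical address, and the mapping table resolving that logical address to a physical address — might be sketched as follows. The class and function names are hypothetical and not part of this application, and a real engine would descend the tree from the root rather than scanning leaves linearly; the linear scan only keeps the sketch short.

```python
class Leaf:
    """A leaf node holding a key range e_i and a logical address p_i."""
    def __init__(self, key_range, logical_addr):
        self.key_range = key_range          # (low, high), inclusive
        self.logical_addr = logical_addr    # p_i

def resolve(key, leaves, mapping_table):
    """Locate the physical address of `key`: leaf range -> p_i -> d_i."""
    for leaf in leaves:
        low, high = leaf.key_range
        if low <= key <= high:
            return mapping_table[leaf.logical_addr]
    raise KeyError(key)

# Two leaves with adjacent key ranges, and the p_i -> d_i mapping table.
leaves = [Leaf((0, 99), "p1"), Leaf((100, 199), "p2")]
mapping = {"p1": "d1", "p2": "d2"}
```

Under this sketch, a read of key 150 would first find the leaf covering 100-199 (logical address p2) and then consult the mapping table to obtain physical address d2.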
In an embodiment, when performing a partition split or merge operation, the storage system 102 (or more specifically, the key-value engine 104) may perform such a partition split or merge operation in multiple stages. In an embodiment, the plurality of stages may include, but are not limited to, a preparation stage, a construction stage, and the like.
In an embodiment, depending on whether the operation is a partition split or merge operation (i.e., generating multiple child index data structures from one parent index data structure for a partition split operation, or one child index data structure from multiple parent index data structures for a partition merge operation), during the preparation phase, the key-value engine 104 may first hard-link the data files (e.g., sealed data files) of the one or more parent index data structures to the one or more child index data structures while continuing to process user requests using the one or more parent index data structures.
In an embodiment, during the construction phase, the key-value engine 104 may take the one or more parent index data structures offline so that they are unavailable to process user requests, hard-link any remaining data files of the one or more parent index data structures (e.g., actively generated or written data files other than the sealed files hard-linked during the preparation phase), construct the one or more child index data structures, and update the original mapping data structures associated with the one or more parent index data structures to complete the partition splitting or merging operation. Because the data files of the one or more parent index data structures are hard-linked, rather than being copied from one location to another, data migration is minimized, saving computational costs such as processing resources and time.
Example Methods
FIG. 4 shows a schematic diagram depicting an exemplary partition merging scenario. FIG. 5 shows a schematic diagram depicting an exemplary partition splitting scenario. FIG. 6 shows a schematic diagram depicting an exemplary partition merging method. FIG. 7 shows a schematic diagram depicting an exemplary partition splitting method. The methods of FIGS. 6 and 7 may, but need not, be implemented in the environment of FIG. 1 and using the system of FIG. 2, the relationships of FIG. 3, and the example scenarios of FIGS. 4 and 5. For ease of explanation, methods 600 and 700 are described with reference to FIGS. 1-5. However, methods 600 and 700 may alternatively be implemented in other environments and/or using other systems.
Methods 600 and 700 are described in the general context of computer-executable instructions. Generally, computer-executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, and the like that perform particular functions or implement particular abstract data types. Further, each of the exemplary methods is illustrated as a collection of blocks in a logical flow graph, which represents a sequence of operations that can be implemented in hardware, software, firmware, or a combination thereof. The order in which the methods are described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method, or an alternate method. Additionally, various blocks may be omitted from the method without departing from the spirit and scope of the subject matter described herein. In the case of software, the blocks represent computer instructions that, when executed by one or more processors, perform the recited operations. In the hardware case, some or all of the blocks may represent an Application Specific Integrated Circuit (ASIC) or other physical component that performs the described operations.
Returning to FIG. 6, at block 602, the key-value engine 104 may receive or detect a triggering event to perform a partition merge operation.
In an embodiment, the storage system 102 (or, more particularly, the key-value engine 104 of the storage system 102, hereinafter) may receive or detect a triggering event to perform a partition merge operation. In an embodiment, a triggering event may include, for example, receiving an instruction from a user to merge adjacent index data structures (also referred to as adjacent parent index partitions) into a single index data structure (also referred to as a child index partition). In an embodiment, two index data structures are described as being adjacent if the key range of at least one node of one index data structure is adjacent or close to the key range of at least one node of the other index data structure.
Additionally or alternatively, the key-value engine 104 may perform partition merge operations for load balancing purposes. For example, the key-value engine 104 may receive associated user requests to process data whose logical keys are respectively located in adjacent index data structures. If the key-value engine 104 receives these related user requests continuously, or in an amount greater than a predetermined number threshold within a predetermined period of time, the key-value engine 104 may identify them as a triggering event for performing a partition merge operation and attempt to merge or consolidate these separate index data structures to reduce the cost of accessing and searching the index data structures.
Additionally or alternatively, the key-value engine 104 can detect that a total number of access requests to adjacent index data structures within a predetermined time period is less than or equal to a predetermined access threshold. In response to this triggering event, the key-value engine 104 can perform a partition merge operation to merge these adjacent index data structures to reduce the storage cost of two separate index data structures in memory (e.g., memory 208) and increase the access rate of the index data structures resulting from the merging of the index data structures.
At block 604, the key value engine 104 may hard link data files of multiple index data structures into a directory of index data structures.
In an embodiment, the key-value engine 104 may enter a preparation phase of a partition merge operation after receiving or detecting a triggering event to perform the partition merge operation. In an embodiment, the key-value engine 104 may hard-link the data files of the multiple index data structures that are to be merged into the directory of the resulting index data structure. In an embodiment, the key-value engine 104 may hard-link the data files of the plurality of index data structures by assigning additional filenames to the data files, the additional filenames being independently linked to the respective data segments of the data files of the plurality of index data structures. In an embodiment, the key-value engine 104 may add these additional filenames of the data files to the directory of the index data structure. In an embodiment, the data files of the plurality of index data structures may include, for example, sealed files of the plurality of index data structures (e.g., sealed metadata files and sealed data files in a storage device such as the storage device 212). In an embodiment, the sealed files may include files that do not accept new data and are not modifiable.
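A minimal sketch of this preparation-phase hard-linking, assuming POSIX hard-link semantics (the directory layout and naming scheme below are hypothetical, not the patented implementation):

```python
import os
import tempfile

def hard_link_sealed_files(parent_dirs, child_dir):
    """Give each sealed file an additional filename inside child_dir.

    os.link() creates a new directory entry for the same underlying data
    segments, so no file content is copied.
    """
    os.makedirs(child_dir, exist_ok=True)
    for parent in parent_dirs:
        for name in os.listdir(parent):
            src = os.path.join(parent, name)
            dst = os.path.join(child_dir, f"{os.path.basename(parent)}-{name}")
            os.link(src, dst)   # additional name, same on-disk data

# Demo in a scratch directory: one parent partition with one sealed file.
root = tempfile.mkdtemp()
p1 = os.path.join(root, "part1")
os.makedirs(p1)
with open(os.path.join(p1, "sealed.dat"), "w") as f:
    f.write("kv-data")
child = os.path.join(root, "merged")
hard_link_sealed_files([p1], child)
```

Because a hard link is only an additional filename for the same inode, the file's link count rises to 2 while the data itself is never duplicated, which is what keeps the preparation phase cheap.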
In an embodiment, the key-value engine 104 may further set a write request threshold to limit the number of write requests to be processed in response to receiving a triggering event. In an embodiment, to reduce the amount of unavailable time for the storage system 102 or the key-value engine 104 to process user requests using multiple index data structures during a subsequent construction phase of a partition merge operation, the key-value engine 104 may further limit the number of write requests to process after receiving or detecting a triggering event to reduce the number of files to be generated and/or written to a manageable or acceptable number (i.e., a write request threshold). In embodiments, this write request threshold may be determined based on a number of factors, which may include, but are not limited to, an allowable or tolerable amount of unavailability time to process the user request, a size of the plurality of index data structures to be merged, a processing power of the key-value engine 104 to complete the merging or merging of the plurality of index data structures in the build phase, and the like. In an embodiment, the write request threshold may also be set by a user (e.g., an administrator) of the storage system 102.
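The write request threshold described above can be sketched as a simple admission gate: writes are accepted only while a counter is under the threshold, bounding how many new files the build phase must later hard-link. The class below is illustrative only and not part of this application.

```python
class WriteGate:
    """Admit writes only while under a configured write request threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.accepted = 0

    def try_accept(self):
        """Return True and count the write if under threshold, else False."""
        if self.accepted < self.threshold:
            self.accepted += 1
            return True
        return False   # caller would defer or queue the write

# With a threshold of 2, only the first two writes are admitted.
gate = WriteGate(threshold=2)
results = [gate.try_accept() for _ in range(4)]
```

In a real engine the rejected writes would be deferred rather than dropped; the gate only caps how much new data accumulates before the construction phase begins.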
In an embodiment, after successfully hard-linking the data files of the multiple index data structures to the directory of the index data structure, the key-value engine 104 may enter a build phase of the partition merge operation. In an embodiment, the key-value engine 104 may make the multiple index data structures unavailable for service provision, e.g., to process user requests. Further, the key-value engine 104 can hard-link any remaining files that were not previously hard-linked during the preparation phase. For example, the key-value engine 104 can hard-link one or more active data files of the plurality of index data structures into the directory of the index data structure. In an embodiment, the one or more active data files may include files other than the sealed files previously hard-linked in the preparation phase, and may include files written and/or created after the triggering event is received or detected and before the plurality of index data structures are made unavailable for service provision (e.g., processing user requests). In an embodiment, since the key-value engine 104 previously set the write request threshold to limit the number of write requests to be processed during the preparation phase, the time to process (e.g., hard-link) these remaining files may be significantly reduced based on the write request threshold.
At block 606, the key-value engine 104 may iteratively merge adjacent nodes of the multiple index data structures in a bottom-up manner to form an index data structure.
In an embodiment, after hard-linking the data files (e.g., the sealed and active files) of the multiple index data structures, the key-value engine 104 may begin merging the multiple index data structures. In an embodiment, the key-value engine 104 may iteratively merge adjacent nodes of the multiple index data structures in a bottom-up manner to form an index data structure. In an embodiment, the key-value engine 104 may write the key-value pairs associated with adjacent nodes of the multiple index data structures into a corresponding merge node in the index data structure; report the new key range of the keys in those key-value pairs to the next higher level above the merge node; and repeat the writing and reporting until the root node of the index data structure is reached.
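The per-level merge step can be sketched as follows, assuming (as the application states for adjacent nodes) that the two key ranges do not overlap. The function and variable names are hypothetical and only illustrate the write-then-report pattern of block 606.

```python
def merge_adjacent_nodes(left, right):
    """Write two adjacent nodes' key-value pairs into one merge node.

    Returns the merge node and its new key range, which would be
    reported to the next higher level of the index data structure.
    """
    merged = dict(left)
    merged.update(right)   # adjacent key ranges do not overlap
    new_key_range = (min(merged), max(merged))
    return merged, new_key_range

# Two adjacent boundary nodes at the lowest level of two partitions.
left_node = {10: "a", 20: "b"}
right_node = {30: "c", 40: "d"}
merged_node, key_range = merge_adjacent_nodes(left_node, right_node)
```

Repeating this at each successive level, and finally rewriting the root with the combined key range, yields a merged structure containing every key-value pair of the original partitions.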
The scenario shown in FIG. 4 serves as an illustrative example. FIG. 4 illustrates an example scenario in which two index data structures 402 and 404 (shown as tree structures in this example) are to be merged. It should be understood that other index data structures, such as other types of hierarchical data structures or probabilistic data structures, may also be applicable. In this example, the index data structure 402 may be associated with a key range A1-A2 and include a plurality of nodes, e.g., N0, N1, and N2. The index data structure 404 may be associated with a key range A3-A4 and include a plurality of nodes, e.g., N3 and N4. In this example, the index data structure 402 and the index data structure 404 are adjacent or proximate to each other. In an embodiment, two index data structures are referred to as adjacent or proximate to each other if the respective key ranges or sets represented by the two index data structures are contiguous or adjacent to each other, or are separated by an amount less than a predetermined separation threshold. The predetermined separation threshold may be predefined by a user (e.g., an administrator) of the storage system 102.
In an embodiment, when index data structures are merged or consolidated, the key-value engine 104 may merge adjacent nodes (or referred to as adjacent boundary nodes) of the index data structures in a bottom-up manner. In embodiments, adjacent nodes (or adjacent boundary nodes) of the index data structures may include nodes whose represented key ranges or sets are adjacent or contiguous to each other, or nodes located within a predetermined proximity threshold. Using the example shown in FIG. 4, the key-value engine 104 may first attempt to merge the two adjacent boundary nodes (i.e., N2 and N4) at the respective lowest levels of the index data structures 402 and 404, and write or copy the key-value pairs of the two adjacent boundary nodes into a new node Nc1 (or called a merge node) of the merged index data structure 406. In an embodiment, depending on the type of the index data structures 402, 404, and 406, the key-value engine 104 may report, to a higher-level parent node of the index data structure 406 (e.g., node Nc2 in this example of FIG. 4), the new key range or key set and/or the maximum key value represented by the new node Nc1.
In an embodiment, the key-value engine 104 may repeat the above operations at the next level on another pair of adjacent boundary nodes (e.g., N1 and N3 in the example shown in FIG. 4) until the root node of the merged index data structure (e.g., index data structure 406 in the example of FIG. 4) is reached. The key-value engine 104 may then update the root node of the merged index data structure by writing information describing a new key range or set that covers the key ranges or sets of its child nodes, so that the merged index data structure (i.e., the root node and the child nodes) includes all of the key-value pairs originally included in the plurality of index data structures (e.g., index data structures 402 and 404).
At block 608, the key-value engine 104 may construct a mapping data structure for the index data structure.
In an embodiment, after merging or consolidating the multiple index data structures into a single index data structure, the key-value engine 104 may build a new mapping data structure for the index data structure. In an embodiment, the key-value engine 104 may delete the mapping entries of the adjacent nodes of the plurality of index data structures from the original mapping data structures associated with the plurality of index data structures, and add new mapping entries for the merge nodes of the index data structure to the original mapping data structures, to construct the new mapping data structure for the index data structure. For example, FIG. 4 shows the corresponding mapping entries of the adjacent boundary nodes N2 and N4 deleted from the original mapping data structure 408 of the index data structures 402 and 404, and mapping entries for the new or merged nodes Nc1 and Nc2 added to the original mapping data structure to form the new mapping data structure 410 for the index data structure 406.
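The mapping-rebuild step of block 608 reduces to deleting the merged nodes' entries and adding an entry for the merge node. The sketch below is illustrative only; the logical/physical address labels are hypothetical placeholders, not values from FIG. 4.

```python
def rebuild_mapping(mapping, merged_logical_addrs, merge_node_addr, physical_addr):
    """Drop mapping entries of merged nodes; add the merge node's entry."""
    new_mapping = {p: d for p, d in mapping.items()
                   if p not in merged_logical_addrs}
    new_mapping[merge_node_addr] = physical_addr
    return new_mapping

# Original mapping with entries for boundary nodes N2 and N4 plus one other.
old_mapping = {"p_N2": "d_N2", "p_N4": "d_N4", "p_N0": "d_N0"}
new_mapping = rebuild_mapping(old_mapping, {"p_N2", "p_N4"}, "p_Nc1", "d_Nc1")
```

The surviving entries plus the new merge-node entry form the new mapping data structure, mirroring how entries for N2 and N4 are removed and entries for the merged nodes are added in FIG. 4.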
Returning to FIG. 7, at block 702, the key-value engine 104 may receive or detect a triggering event to perform a partition splitting operation.
In an embodiment, the key-value engine 104 of the storage system 102 may receive or detect a triggering event to perform a partition splitting operation. For example, the key-value engine 104 may receive or detect a triggering event by receiving an instruction to split an index data structure (or referred to as a parent index partition) to form multiple index data structures (or referred to as multiple child index partitions), and the instruction may include a key value.
Additionally or alternatively, the key-value engine 104 may perform partition splitting operations for load balancing purposes. For example, the key-value engine 104 may receive a large number of user requests to process data whose logical keys are located in the same index data structure. If the key-value engine 104 receives the user requests at a frequency (rate) greater than a predetermined frequency threshold within a predetermined time period, the key-value engine 104 may identify this as a triggering event to perform a partition splitting operation and attempt to split the index data structure into multiple separate index data structures to enable parallel processing of the user requests by multiple threads or processes of the key-value engine 104 using the separate index data structures for load balancing. In an embodiment, the key-value engine 104 may determine a key value for splitting the index data structure into the multiple separate index data structures. For example, the key-value engine 104 can determine such a key value based at least in part on load balancing. By way of example and not limitation, the key-value engine 104 may determine the key value such that the respective numbers or frequencies of past user requests to access (e.g., write or read) data whose logical keys are covered by the separate index data structures within a past predetermined period of time are approximately the same or differ by an amount within a predetermined threshold.
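One plausible heuristic for such a load-balancing split key — not the application's prescribed method — is to pick the key at which the cumulative count of recent requests comes closest to half of the total, so the two child partitions see roughly equal load. The function name and input shape below are illustrative assumptions.

```python
def choose_split_key(request_counts):
    """Pick a split key from a sorted list of (key, recent_request_count).

    Chooses the key whose cumulative request count is closest to half of
    the total, approximately balancing load across the two children.
    """
    total = sum(count for _, count in request_counts)
    running, best_key, best_gap = 0, None, float("inf")
    for key, count in request_counts:
        running += count
        gap = abs(running - total / 2)
        if gap < best_gap:
            best_gap, best_key = gap, key
    return best_key

# Hot keys on the left: the split lands where cumulative load nears half.
skewed = [(10, 5), (20, 5), (30, 1), (40, 1)]
uniform = [(1, 2), (2, 2), (3, 2), (4, 2)]
```

With the skewed counts, half the total (6) is reached almost immediately, so the split key sits near the hot keys rather than at the midpoint of the key space.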
At block 704, the key-value engine 104 may hard-link the data files of the index data structure into respective directories of multiple index data structures.
In an embodiment, the key-value engine 104 may enter the preparation phase upon receiving or detecting a triggering event to perform a partition splitting operation. In an embodiment, the key-value engine 104 may hard-link data files of the index data structure into respective directories of the multiple index data structures. In an embodiment, the data files of the index data structure may include a sealed file of the index data structure, wherein the sealed file is a file that does not accept new data and is not modifiable.
In an embodiment, after the key-value engine 104 hard-links the data files of the index data structure (e.g., the sealed metadata file and the sealed data file) into respective directories of the plurality of index data structures in the preparation phase, the key-value engine 104 may make the index data structure unavailable for providing services, such as processing user requests, before entering the construction phase (e.g., before iteratively splitting nodes of the index data structure). In an embodiment, the key-value engine 104 may further hard-link one or more active data files of the index data structure into the respective directories of the multiple index data structures. The one or more active data files may be different from the sealed files and may include files written and/or created prior to making the index data structure unavailable for providing the services.
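The hard-linking step of the preparation phase can be sketched as follows, assuming a one-directory-per-partition file layout (the layout and function names are assumptions, not the described implementation). Because the sealed files are immutable, each child partition can share them with the parent through hard links without copying any data.

```python
import os

def hard_link_sealed_files(parent_dir, child_dirs, sealed_files):
    """Hard-link the parent partition's sealed (immutable) files into
    each child partition's directory; no file contents are copied."""
    for child_dir in child_dirs:
        os.makedirs(child_dir, exist_ok=True)
        for name in sealed_files:
            dst = os.path.join(child_dir, name)
            if not os.path.exists(dst):
                # The link shares the inode with the parent's copy; sealed
                # files are never modified, so sharing them is safe.
                os.link(os.path.join(parent_dir, name), dst)
```

Note that hard links require the parent and child directories to reside on the same file system, which is consistent with partitions of one storage engine sharing a volume.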
At block 706, the key-value engine 104 may iteratively split a plurality of nodes of the index data structure in a bottom-up manner to form a plurality of index data structures, a key range associated with the plurality of nodes comprising the key value.
In an embodiment, after the key-value engine 104 hard-links the data files (e.g., the sealed files and the active files) of the index data structure into respective directories of the plurality of index data structures, the key-value engine 104 may iteratively split the nodes of the index data structure in a bottom-up manner to form the plurality of index data structures. In an embodiment, the key-value engine 104 may split a key-value pair of a node of the plurality of nodes of the index data structure into corresponding split nodes of the plurality of index data structures according to the key value, and report the key range and/or the maximum key value of the keys of each split node to the respective higher-level split node of the plurality of index data structures according to the type of the index data structure. In an embodiment, the key-value engine 104 may repeat the above operations for a next node at a higher level of the index data structure until the root node of the index data structure is rewritten.
The scenario shown in FIG. 5 is used as an illustrative example. FIG. 5 illustrates an example scenario in which an index data structure 502, shown in this example as a tree structure, is to be split. However, other index data structures, such as other types of hierarchical data structures or probabilistic data structures, may also be suitable. In this example, the index data structure 502 is described as being split or partitioned along the key value A1. In an embodiment, multiple index data structures may be obtained or formed by splitting or partitioning the index data structure (e.g., the index data structure 502 as shown in FIG. 5). In this example, one of the plurality of index data structures may maintain or obtain a left portion of the index data structure 502, and another of the plurality of index data structures may maintain or obtain a right portion of the index data structure 502.
In an embodiment, for each index data structure of the plurality of index data structures, the key-value engine 104 may initiate a rebuilding process in a bottom-up manner. For example, the key-value engine 104 may locate or determine a boundary node (e.g., the boundary node N3 shown in FIG. 5) at which the key value (e.g., the key value A1 in FIG. 5) is located or exists. In an embodiment, the key-value engine 104 may iteratively, for each of the plurality of index data structures, filter out keys that are not within the respective key range and write keys belonging to the respective one of the plurality of index data structures into a new node of the respective index data structure. In an embodiment, the key-value engine 104 may further report the corresponding key range or maximum key value for each index data structure to the upper level. The key-value engine 104 may then repeatedly or iteratively perform the above operations on the parent node (e.g., the node N2 in FIG. 5) of the just-processed boundary node (e.g., the boundary node N3 in FIG. 5) at each level until the root node of the split index data structure is rewritten. In an embodiment, the key-value engine 104 may complete such splitting or partitioning of the index data structure to obtain the multiple index data structures after obtaining a new root node for each of the multiple index data structures, with the keys of each index data structure falling within the corresponding key range. In an embodiment, the key-value engine 104 may only need to update the boundary nodes between the multiple index data structures, as only these boundary nodes are processed during the construction phase. For example, as shown in FIG. 5, only the boundary nodes between the index data structures (i.e., the nodes N5, N3, and N1) are updated.
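The bottom-up construction phase described above can be sketched as follows. This is a hedged sketch under assumed data structures: nodes are modeled simply as sorted lists of (key, value) entries, and only the boundary nodes on the path from the boundary leaf to the root are rewritten; all other nodes are shared via the hard-linked files.

```python
def split_node(entries, split_key):
    """Partition one boundary node's entries around the split key:
    keys below the split key go left, the rest go right."""
    left = [(k, v) for k, v in entries if k < split_key]
    right = [(k, v) for k, v in entries if k >= split_key]
    return left, right

def split_boundary_path(path_leaf_to_root, split_key):
    """Split each boundary node from the leaf up to the root. After each
    level, the child's key range (e.g., its maximum key) would be
    reported to the next level up; this sketch only rewrites the nodes."""
    left_path, right_path = [], []
    for entries in path_leaf_to_root:     # leaf first, root last
        left, right = split_node(entries, split_key)
        left_path.append(left)
        right_path.append(right)
    return left_path, right_path
```

For a boundary leaf containing keys A0, A1, A2 and a split along A1, as in the FIG. 5 scenario, the left partition keeps A0 while the right partition receives A1 and A2, and the same partitioning repeats at each ancestor boundary node.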
At block 708, the key-value engine 104 may construct a mapping data structure for the plurality of index data structures.
In an embodiment, after splitting the index data structure into the multiple index data structures, the key-value engine 104 may build a new mapping data structure for the multiple index data structures. In an embodiment, similar to the partition merge operation, the key-value engine 104 may delete the mapping entries for the nodes of the index data structure from the original mapping data structure associated with the index data structure, and add new mapping entries for the split nodes of the plurality of index data structures to the original mapping data structure, to construct the new mapping data structure for the plurality of index data structures. In addition, in embodiments, since each of the plurality of index data structures does not process data outside its own key range, all mapping entries beyond its range can be removed at a controlled rate by background processing without interfering with normal operations.
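The mapping-rebuild step can be sketched as follows, under the assumption that the mapping data structure is a plain dictionary from node identifier to file location (the representation and names are illustrative, not the described implementation). The parent's entries are replaced by the split nodes' entries, and entries outside a child's key range are garbage-collected a few at a time, mirroring the removal "at a controlled rate" by background processing.

```python
def rebuild_mapping(mapping, old_node_ids, new_entries):
    """Delete the parent's node entries and add the split nodes'."""
    for node_id in old_node_ids:
        mapping.pop(node_id, None)
    mapping.update(new_entries)
    return mapping

def gc_out_of_range(mapping, in_range, batch=2):
    """Background pass: drop up to `batch` entries that fall outside
    this child's key range, without blocking foreground requests."""
    stale = [node for node in mapping if not in_range(node)][:batch]
    for node in stale:
        del mapping[node]
    return len(stale)           # number of entries removed this pass
```

Repeated small-batch calls to `gc_out_of_range` eventually leave each child's mapping containing only in-range entries, while each individual pass stays cheap.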
Any acts of any of the methods described herein may be implemented at least in part by a processor or other electronic device based on instructions stored on one or more computer-readable media. By way of example, and not limitation, any acts of any methods described herein may be implemented under control of one or more processors configured with executable instructions that may be stored on one or more computer-readable media.
Conclusion
Although embodiments have been described in language specific to structural features and/or methodological acts, it is to be understood that the claims are not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as exemplary forms of implementing the claimed subject matter. Additionally or alternatively, some or all of the operations may be implemented by one or more ASICs, FPGAs, or other hardware.
The invention may be further understood using the following items.
Item 1: A method implemented by one or more processors, the method comprising: receiving a trigger event for merging a plurality of index data structures to form one index data structure; hard-linking data files of the plurality of index data structures into a directory of the index data structure; iteratively merging neighboring nodes of the plurality of index data structures in a bottom-up manner to form the index data structure; and constructing a mapping data structure of the index data structure.
Item 2: The method of Item 1, wherein hard-linking the data files of the plurality of index data structures into a directory of the index data structure comprises hard-linking sealed files of the plurality of index data structures into the directory of the index data structure, the sealed files not accepting new data and being non-modifiable.
Item 3: The method of Item 2, wherein prior to iteratively merging the neighboring nodes of the plurality of index data structures, the method further comprises: making the plurality of index data structures unavailable for providing a service; and hard-linking one or more active data files of the plurality of index data structures into the directory of the index data structure.
Item 4: The method of Item 3, wherein the one or more active data files are different from the sealed files and comprise files written and/or created prior to making the plurality of index data structures unavailable for providing the service.
Item 5: The method of Item 1, wherein receiving the trigger event comprises: receiving an instruction from a user to merge the plurality of index data structures into the index data structure; or detecting that a total number of access requests to the plurality of index data structures is less than or equal to a predetermined access threshold.
Item 6: The method of Item 1, further comprising setting a threshold to limit a number of write requests to be processed in response to receiving the trigger event.
Item 7: The method of Item 1, wherein iteratively merging the neighboring nodes of the plurality of index data structures in a bottom-up manner to form the index data structure comprises: writing key-value pairs associated with the neighboring nodes of the plurality of index data structures into corresponding merge nodes in the index data structure; reporting a new key range for a key in the key-value pairs to a higher level of the merge node; and repeating the writing and reporting until a root node of the index data structure is reached.
Item 8: The method of Item 1, wherein constructing the mapping data structure of the index data structure comprises: deleting mapping entries of the neighboring nodes of the plurality of index data structures from original mapping data structures associated with the plurality of index data structures, and adding new mapping entries of merge nodes of the index data structure to the original mapping data structures to construct the mapping data structure of the index data structure.
Item 9: One or more processor-readable media storing executable instructions that, when executed by one or more processors, cause the one or more processors to perform acts comprising: receiving a trigger event for merging a plurality of index data structures to form one index data structure; hard-linking data files of the plurality of index data structures into a directory of the index data structure; iteratively merging neighboring nodes of the plurality of index data structures in a bottom-up manner to form the index data structure; and constructing a mapping data structure of the index data structure.
Item 10: The one or more processor-readable media of Item 9, wherein hard-linking the data files of the plurality of index data structures into a directory of the index data structure comprises hard-linking sealed files of the plurality of index data structures into the directory of the index data structure, the sealed files not accepting new data and being non-modifiable.
Item 11: The one or more processor-readable media of Item 10, wherein prior to iteratively merging the neighboring nodes of the plurality of index data structures, the acts further comprise: making the plurality of index data structures unavailable for providing a service; and hard-linking one or more active data files of the plurality of index data structures into the directory of the index data structure, the one or more active data files being different from the sealed files and comprising files that were written and/or created prior to making the plurality of index data structures unavailable for providing the service.
Item 12: The one or more processor-readable media of Item 9, wherein receiving the trigger event comprises: receiving an instruction from a user to merge the plurality of index data structures into the index data structure; or detecting that a total number of access requests to the plurality of index data structures is less than or equal to a predetermined access threshold.
Item 13: The one or more processor-readable media of Item 9, the acts further comprising setting a threshold to limit a number of write requests to be processed in response to receiving the trigger event.
Item 14: The one or more processor-readable media of Item 9, wherein iteratively merging the neighboring nodes of the plurality of index data structures in a bottom-up manner to form the index data structure comprises: writing key-value pairs associated with the neighboring nodes of the plurality of index data structures into corresponding merge nodes in the index data structure; reporting a new key range for a key in the key-value pairs to a higher level of the merge node; and repeating the writing and reporting until a root node of the index data structure is reached.
Item 15: The one or more processor-readable media of Item 9, wherein constructing the mapping data structure of the index data structure comprises: deleting mapping entries of the neighboring nodes of the plurality of index data structures from original mapping data structures associated with the plurality of index data structures, and adding new mapping entries of merge nodes of the index data structure to the original mapping data structures to construct the mapping data structure of the index data structure.
Item 16: A system, comprising: one or more processors; and memory storing executable instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising: receiving an instruction to split an index data structure to form a plurality of index data structures, the instruction including a key value; hard-linking data files of the index data structure into respective directories of the plurality of index data structures; iteratively splitting a plurality of nodes of the index data structure in a bottom-up manner to form the plurality of index data structures, a key range associated with the plurality of nodes comprising the key value; and constructing a mapping data structure of the plurality of index data structures.
Item 17: The system of Item 16, wherein hard-linking the data files of the index data structure into the respective directories of the plurality of index data structures comprises hard-linking sealed files of the index data structure into the respective directories of the plurality of index data structures, the sealed files not accepting new data and being non-modifiable.
Item 18: The system of Item 17, wherein, prior to iteratively splitting the nodes of the index data structure, the acts further comprise: making the index data structure unavailable for providing a service; and hard-linking one or more active data files of the index data structure into the respective directories of the plurality of index data structures, the one or more active data files being different from the sealed files and comprising files written and/or created prior to making the index data structure unavailable for providing the service.
Item 19: The system of Item 16, wherein iteratively splitting the nodes of the index data structure in a bottom-up manner to form the plurality of index data structures comprises: splitting a key-value pair of one of the plurality of nodes of the index data structure into corresponding split nodes of the plurality of index data structures according to the key value; reporting key ranges of keys of the split nodes to respective higher levels of the split nodes of the plurality of index data structures; and repeating the splitting and reporting until a root node of the index data structure is rewritten.
Item 20: The system of Item 16, wherein constructing the mapping data structure for the plurality of index data structures comprises: deleting mapping entries for the nodes of the index data structure from an original mapping data structure associated with the index data structure, and adding new mapping entries for split nodes of the plurality of index data structures to the original mapping data structure to construct the mapping data structures of the plurality of index data structures.