WO2019228009A1 - LSM tree optimization method and device and computer equipment

Info

Publication number
WO2019228009A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
lsm tree
leaf node
index key
target leaf
Prior art date
Application number
PCT/CN2019/077404
Other languages
French (fr)
Chinese (zh)
Inventor
阳振坤
席华锋
韩富晟
肖金亮
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2019228009A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • the embodiments of the present specification relate to the field of data processing technologies, and in particular, to an optimization method, device, and computer device for an LSM tree.
  • the LSM tree (Log-Structured Merge Tree) is a hard disk-based data structure that includes dynamic data and static data. Dynamic data and static data store the incremental modification of data. During the process of accessing the LSM tree, its dynamic data and static data are read in turn, and the read results are combined to obtain the final read result.
  • the embodiments of the present specification provide a method, an apparatus, and a computer device for optimizing an LSM tree.
  • the technical solutions are as follows:
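  • a method for optimizing an LSM tree includes:
  • a target leaf node in the LSM tree whose pointed-to data has been deleted is determined;
  • a delete tag is added to the target leaf node in the dynamic data of the LSM tree.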
  • an apparatus for optimizing an LSM tree includes:
  • a target determining module, configured to determine a target leaf node in the LSM tree whose pointed-to data has been deleted;
  • a first adding module, configured to add a delete tag to the target leaf node in the dynamic data of the LSM tree.
  • a computer device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the LSM tree optimization method provided in the embodiments of the present specification.
  • in the technical solution provided by the embodiments of the present specification, a target leaf node in the LSM tree whose pointed-to data has been deleted is determined, and a delete mark is added to that target leaf node in the dynamic data of the LSM tree. Because an access to the LSM tree reads the dynamic data first and the static data second, marking the deleted target leaf nodes in the dynamic data lets a reader skip the leaf nodes that carry a delete mark while accessing the dynamic data, which improves data access efficiency. Furthermore, when the static data is accessed, the leaf nodes that carry a delete mark do not need to be visited again, which further improves data access efficiency.
  • Figure 1 is an example of an LSM tree
  • FIG. 2 is a flowchart of an embodiment of an LSM tree optimization method according to an exemplary embodiment of the present specification
  • FIG. 3 is an example of static data of the LSM tree
  • FIG. 4 is an example of adding a deletion marker to the dynamic data of the LSM tree
  • FIG. 5 is another example of adding a deletion marker to the dynamic data of the LSM tree
  • FIG. 6 is a block diagram of an embodiment of an apparatus for optimizing an LSM tree according to an exemplary embodiment of the present specification
  • FIG. 7 shows a more specific schematic diagram of a hardware structure of a computing device provided by an embodiment of the present specification.
  • the LSM tree is a hard disk-based data structure. Its representative databases include Hbase, NessDB, and LevelDB.
  • like the storage engine of a B+ tree, the storage engine of an LSM tree supports insert, delete, read, update, and sequential scan operations, and it uses batch storage to avoid the problem of random disk writes. Specifically, the core idea is to assume that memory is large enough, so data does not have to be written to disk on every update. Instead, the latest data is first kept in memory, and once the data in memory reaches a specified size limit, it is merge-sorted and appended to the end of the on-disk queue.
  • on this basis, the data stored in the LSM tree can be divided into static data and dynamic data.
  • dynamic data refers to the data in memory, and static data refers to the data stored on persistent media, such as disk.
  • Those skilled in the art can understand that what is stored in the LSM tree is the modification increment of the data, that is, the change information of the data. Figure 1 shows an example of an LSM tree.
  • the embodiment of the present specification provides a method for optimizing the LSM tree.
  • this method relies on the fact that an access to the LSM tree reads its dynamic data first and its static data second. In the dynamic data of the LSM tree, a delete tag indicating that the data has been deleted is added to each leaf node whose pointed-to data has been deleted. On a later access to the LSM tree, if a leaf node with a delete tag is found in the dynamic data, that leaf node is skipped directly, and the subsequent lookup in the static data no longer searches the leaf nodes with the same index key, thereby improving data query performance.
  • FIG. 2 is a flowchart of an embodiment of an LSM tree optimization method according to an exemplary embodiment of the present specification. The method includes the following steps:
  • Step 202 Determine the target leaf node in the LSM tree whose data has been deleted.
  • the leaf nodes whose pointed-to data has been deleted can be determined separately for the dynamic data and for the static data.
  • for convenience of description, the leaf nodes determined in this way, that is, the leaf nodes whose pointed-to data has been deleted, are called target leaf nodes.
  • Step 204 In the dynamic data of the LSM tree, add a delete tag to the target leaf node.
  • suppose the target leaf nodes determined in the dynamic data include the leaf nodes with index keys of 3 to 8 and the leaf nodes with index keys of 15 to 31. In this step, delete marks can then be added to these target leaf nodes, for example as shown in FIG. 3, where "deleted" indicates a delete mark. Those skilled in the art can understand that the use of "deleted" as a delete mark is merely an example. In practical applications, the delete mark may take other forms, which is not limited in the embodiments of the present specification.
  • it can further be judged whether the index key of the leaf node pointed to by the target leaf node with the largest index key is continuous with that largest index key. If it is continuous, no processing is required. If it is not continuous, a virtual leaf node can be inserted after the target leaf node with the largest index key, and the index key of the virtual leaf node is the largest index key plus 1.
  • take the target leaf nodes with index keys of 3 to 8, to which delete marks are added, as an example. The largest index key is 8. As shown in FIG. 4, the leaf node with an index key of 8 points to a next leaf node whose index key is 9, which is continuous with 8, so no processing is required.
  • take the target leaf nodes with index keys of 15 to 31, to which delete marks are added, as another example. The largest index key is 31. As shown in FIG. 4, the leaf node with an index key of 31 no longer points to any other leaf node.
  • in this case, a virtual leaf node with an index key of 32 can be inserted after that leaf node, for example as shown in FIG. 4.
  • the virtual leaf node does not have a delete mark.
  • suppose the target leaf nodes determined in the static data include the leaf nodes with index keys of 3 to 15. In this case, it can first be determined whether target leaf nodes with index keys of 3 to 15 exist in the dynamic data. If they do not exist, the two target leaf nodes with the largest index key and the smallest index key among the target leaf nodes can be inserted into the dynamic data of the LSM tree, and a delete mark is added to the inserted target leaf nodes in the dynamic data.
  • for example, target leaf nodes with index keys of 3 and 15 are inserted into the dynamic data of the LSM tree, and a delete mark is added to the inserted target leaf nodes in the dynamic data, as shown in FIG. 5.
  • it can further be judged whether the index key of the leaf node pointed to by the target leaf node with the largest index key, among the target leaf nodes to which the delete mark is added, is continuous with that largest index key. If it is continuous, no processing is required. If it is not continuous, a virtual leaf node can be inserted after the target leaf node with the largest index key, and the index key of the virtual leaf node is the largest index key plus one.
  • here the target leaf nodes to which the delete mark is added are the leaf nodes with index keys of 3 and 15, so the largest index key is 15, and no leaf node follows the leaf node whose index key is 15.
  • a virtual leaf node with an index key of 16 can therefore be inserted after that leaf node, as shown in FIG. 5.
  • in summary, a target leaf node in the LSM tree whose pointed-to data has been deleted is determined, and a delete mark is added to that target leaf node in the dynamic data of the LSM tree. Because an access to the LSM tree reads the dynamic data first and the static data second, marking the deleted target leaf nodes in the dynamic data lets a reader skip the leaf nodes that carry a delete mark while accessing the dynamic data, which improves data access efficiency. Furthermore, when the static data is accessed, the leaf nodes that carry a delete mark do not need to be visited again, which further improves data access efficiency.
  • an embodiment of the present specification further provides an apparatus for optimizing an LSM tree.
  • FIG. 6 is a block diagram of an embodiment of an apparatus for optimizing an LSM tree according to an exemplary embodiment of the present specification. It may include a target determination module 61 and a first adding module 62.
  • the target determination module 61 may be used to determine a target leaf node in the LSM tree whose pointed-to data has been deleted;
  • the first adding module 62 may be configured to add a delete tag to the target leaf node in the dynamic data of the LSM tree.
  • the apparatus may further include (not shown in FIG. 6):
  • a second adding module is configured to add a delete tag to the branch node if it is detected that all leaf nodes under any branch node are added with a delete tag in the dynamic data of the LSM tree.
  • the target determining module 61 may be specifically configured to:
  • the first adding module 62 may include (not shown in FIG. 6):
  • an inserting submodule, configured to, if the target leaf nodes are determined for the static data of the LSM tree, insert the two target leaf nodes with the largest index key and the smallest index key among the determined target leaf nodes into the dynamic data of the LSM tree;
  • a tag adding submodule, configured to add a delete tag to the inserted target leaf nodes in the dynamic data of the LSM tree.
  • the apparatus may further include (not shown in FIG. 6):
  • a judging module, configured to judge whether the index key of the leaf node pointed to by the target leaf node with the largest index key, among the target leaf nodes to which the delete mark is added, is continuous with that largest index key;
  • a virtual node inserting module, configured to insert a virtual leaf node after the target leaf node with the largest index key if it is not continuous, wherein the index key of the virtual leaf node is the largest index key plus one.
  • the target determination module 61 and the first adding module 62 are two functionally independent modules. They can be configured in the apparatus together, as shown in FIG. 6, or configured separately, so the structure shown in FIG. 6 should not be construed as limiting the solutions of the embodiments of the present specification.
  • An embodiment of the present specification further provides a computer device including at least a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the foregoing LSM tree optimization method when executing the program.
  • the method at least includes: determining a target leaf node in the LSM tree whose pointed-to data has been deleted; and adding a delete mark to the target leaf node in the dynamic data of the LSM tree.
  • FIG. 7 shows a more specific schematic diagram of a hardware structure of a computing device provided by an embodiment of the present specification.
  • the device may include a processor 710, a memory 720, an input / output interface 730, a communication interface 740, and a bus 750.
  • the processor 710, the memory 720, the input / output interface 730, and the communication interface 740 realize a communication connection within the device through the bus 750.
  • the processor 710 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is used to execute related programs to implement the technical solutions provided by the embodiments of the present specification.
  • the memory 720 may be implemented in the form of ROM (Read Only Memory, read-only memory), RAM (Random Access Memory, random access memory), static storage devices, and dynamic storage devices.
  • the memory 720 may store an operating system and other application programs. When the technical solutions provided in the embodiments of the present specification are implemented by software or firmware, related program codes are stored in the memory 720 and are called and executed by the processor 710.
  • the input / output interface 730 is used to connect an input / output module to implement information input and output.
  • the input/output module can be configured in the device as a component (not shown in FIG. 7), or can be externally connected to the device to provide corresponding functions.
  • the input device may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc.
  • the output device may include a display, a speaker, a vibrator, and an indicator light.
  • the communication interface 740 is used to connect a communication module (not shown in FIG. 7) to implement communication interaction between the device and other devices.
  • the communication module can implement communication through a wired method (for example, USB, network cable, etc.), and can also implement communication through a wireless method (for example, mobile network, WIFI, Bluetooth, etc.).
  • the bus 750 includes a path for transmitting information between various components of the device (for example, the processor 710, the memory 720, the input / output interface 730, and the communication interface 740).
  • the device may also include other components necessary for normal operation.
  • the foregoing device may also include only components necessary to implement the solutions of the embodiments of the present specification, and does not necessarily include all the components shown in the drawings.
  • An embodiment of the present specification also provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the foregoing optimization method of the LSM tree is implemented.
  • the method at least includes: determining a target leaf node in the LSM tree whose data has been deleted; and adding a deletion mark to the target leaf node in the dynamic data of the LSM tree.
  • Computer-readable media include persistent and non-persistent, removable and non-removable media.
  • Information storage can be accomplished by any method or technology.
  • Information may be computer-readable instructions, data structures, modules of a program, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cartridges, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
  • the embodiments of the present specification can be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the embodiments of the present specification, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product.
  • the computer software product may be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes a number of instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the methods described in the embodiments, or in some parts of the embodiments, of this specification.
  • the system, device, module, or unit described in the foregoing embodiments may be specifically implemented by a computer chip or entity, or a product with a certain function.
  • a typical implementation device is a computer, and the specific form of the computer may be a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email sending and receiving device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
  • each embodiment in this specification is described in a progressive manner, and the same or similar parts between the various embodiments can be referred to each other. Each embodiment focuses on the differences from other embodiments.
  • because the device embodiment is basically similar to the method embodiment, it is described relatively briefly, and for relevant parts reference may be made to the description of the method embodiment.
  • the device embodiments described above are only illustrative. The modules described as separate components may or may not be physically separate, and when the solutions of the embodiments of this specification are implemented, the functions of the modules may be implemented in one or more pieces of software and/or hardware. Some or all of the modules may also be selected according to actual needs to achieve the objective of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
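
The marking step described above (Step 204, together with the virtual leaf node rule and the handling of target leaf nodes found in static data) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation from the specification: the dynamic data is modeled as a plain dictionary keyed by index key, and the names `mark_dynamic_run`, `mark_static_run`, and `ensure_successor` are hypothetical. The case where only some of the static target keys already exist in the dynamic data is not described in the text and is left out.

```python
def new_leaf(deleted=False, virtual=False):
    # Hypothetical leaf node record used only for this sketch.
    return {"deleted": deleted, "virtual": virtual}

def ensure_successor(dynamic, largest_key):
    """If no leaf node with key largest_key + 1 follows, insert a virtual one.

    The virtual leaf node carries no delete mark.
    """
    if largest_key + 1 not in dynamic:
        dynamic[largest_key + 1] = new_leaf(deleted=False, virtual=True)

def mark_dynamic_run(dynamic, first, last):
    """Target leaf nodes found in the dynamic data: mark each one."""
    for key in range(first, last + 1):
        if key in dynamic:
            dynamic[key]["deleted"] = True
    ensure_successor(dynamic, last)

def mark_static_run(dynamic, first, last):
    """Target leaf nodes found in the static data.

    Assumption: only the case where none of the keys exist in the dynamic
    data is handled. Only the smallest and largest keys are inserted.
    """
    if not any(first <= key <= last for key in dynamic):
        dynamic[first] = new_leaf(deleted=True)
        dynamic[last] = new_leaf(deleted=True)
        ensure_successor(dynamic, last)

# Dynamic data holding leaf nodes 1..31, similar in spirit to the FIG. 4 example.
fig4 = {key: new_leaf() for key in range(1, 32)}
mark_dynamic_run(fig4, 3, 8)     # key 9 already follows key 8: nothing inserted
mark_dynamic_run(fig4, 15, 31)   # nothing follows key 31: virtual key 32 inserted

# Empty dynamic data, similar in spirit to the FIG. 5 example.
fig5 = {}
mark_static_run(fig5, 3, 15)     # inserts marked keys 3 and 15, then virtual key 16
print(sorted(fig5))              # [3, 15, 16]
```

Storing only the two end keys for a range found in static data keeps the dynamic data small, and the unmarked virtual leaf node gives a reader an explicit point at which the marked range ends.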

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed are an LSM tree optimization method and device and a computer equipment, the method comprising: determining a target leaf node for which indicated data has already been deleted in an LSM tree; and adding a deletion tag for the target leaf node in dynamic data of the LSM tree.

Description

Optimization method, apparatus, and computer device for an LSM tree
Technical field
The embodiments of the present specification relate to the field of data processing technologies, and in particular to an optimization method, apparatus, and computer device for an LSM tree.
Background
The LSM tree (Log-Structured Merge Tree) is a hard disk-based data structure that includes dynamic data and static data, both of which store modification increments of data. When the LSM tree is accessed, its dynamic data and static data are read in turn, and the results are merged to obtain the final read result.
Because the LSM tree stores modification increments of data, when it holds a large number of modification increments indicating that data has been deleted, a read must traverse a large amount of useless data before it reaches valid data, which degrades read performance.
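
The read path described in this background can be sketched as follows. This is a minimal illustration, assuming each layer is a dictionary from index key to its latest modification increment and that `None` stands for a deletion record; the function name `read_range` and the sample data are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical representation: index key -> latest modification increment.
# None means "this key was deleted".
Layer = Dict[int, Optional[str]]

def read_range(dynamic: Layer, static: Layer, lo: int, hi: int) -> Tuple[Dict[int, str], int]:
    """Read keys in [lo, hi]: dynamic data first, then static data, then merge.

    Returns (result, visited). visited counts every entry examined,
    including deletion records that contribute nothing to the result.
    """
    result: Dict[int, str] = {}
    decided = set()  # keys whose outcome was already fixed by the dynamic data
    visited = 0
    for key in sorted(dynamic):
        if lo <= key <= hi:
            visited += 1
            decided.add(key)
            if dynamic[key] is not None:
                result[key] = dynamic[key]
    for key in sorted(static):
        if lo <= key <= hi:
            visited += 1
            if key not in decided and static[key] is not None:
                result[key] = static[key]
    return result, visited

dynamic = {1: "a1", 2: None, 3: None, 4: None}
static = {2: "old2", 3: "old3", 5: "e5", 6: None}
result, visited = read_range(dynamic, static, 1, 6)
print(result)   # {1: 'a1', 5: 'e5'}
print(visited)  # 8
```

Eight entries are examined to return two values, which is the kind of overhead the background section attributes to accumulated deletion records.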
Summary of the invention
In view of the above technical problem, the embodiments of the present specification provide an optimization method, apparatus, and computer device for an LSM tree. The technical solutions are as follows:
According to a first aspect of the embodiments of the present specification, an optimization method for an LSM tree is provided. The method includes:
determining a target leaf node in the LSM tree whose pointed-to data has been deleted; and
adding a delete mark to the target leaf node in the dynamic data of the LSM tree.
According to a second aspect of the embodiments of the present specification, an optimization apparatus for an LSM tree is provided. The apparatus includes:
a target determining module, configured to determine a target leaf node in the LSM tree whose pointed-to data has been deleted; and
a first adding module, configured to add a delete mark to the target leaf node in the dynamic data of the LSM tree.
According to a third aspect of the embodiments of the present specification, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the LSM tree optimization method provided in the embodiments of the present specification.
In the technical solution provided by the embodiments of the present specification, a target leaf node in the LSM tree whose pointed-to data has been deleted is determined, and a delete mark is added to that target leaf node in the dynamic data of the LSM tree. Because an access to the LSM tree reads the dynamic data first and the static data second, marking the deleted target leaf nodes in the dynamic data lets a reader skip the leaf nodes that carry a delete mark while accessing the dynamic data, which improves data access efficiency. Furthermore, when the static data is accessed, the leaf nodes that carry a delete mark do not need to be visited again, which further improves data access efficiency.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the embodiments of the present specification.
In addition, no single embodiment of the present specification needs to achieve all of the effects described above.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present specification or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some of the embodiments recorded in the present specification, and those of ordinary skill in the art can derive other drawings from them.
FIG. 1 is an example of an LSM tree;
FIG. 2 is a flowchart of an embodiment of an LSM tree optimization method according to an exemplary embodiment of the present specification;
FIG. 3 is an example of static data of the LSM tree;
FIG. 4 is an example of adding a delete mark to the dynamic data of the LSM tree;
FIG. 5 is another example of adding a delete mark to the dynamic data of the LSM tree;
FIG. 6 is a block diagram of an embodiment of an LSM tree optimization apparatus according to an exemplary embodiment of the present specification;
FIG. 7 is a more specific schematic diagram of the hardware structure of a computing device provided by an embodiment of the present specification.
Detailed description
To help those skilled in the art better understand the technical solutions in the embodiments of the present specification, these technical solutions are described in detail below with reference to the drawings in the embodiments of the present specification. The described embodiments are only some, not all, of the embodiments of the present specification. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in the present specification shall fall within the scope of protection.
The LSM tree is a hard disk-based data structure; representative databases that use it include Hbase, NessDB, and LevelDB. Like the storage engine of a B+ tree, the storage engine of an LSM tree supports insert, delete, read, update, and sequential scan operations, and it uses batch storage to avoid the problem of random disk writes. Specifically, the core idea is to assume that memory is large enough, so data does not have to be written to disk on every update. Instead, the latest data is first kept in memory, and once the data in memory reaches a specified size limit, it is merge-sorted and appended to the end of the on-disk queue. On this basis, the data stored in the LSM tree can be divided into static data and dynamic data: dynamic data is the data in memory, and static data is the data stored on persistent media such as disk. Those skilled in the art will understand that what the LSM tree stores is the modification increment of data, that is, the change information of the data. Figure 1 shows an example of an LSM tree.
When the LSM tree is accessed, its dynamic data and static data need to be read in turn, and the results are merged to obtain the final read result.
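
The buffering idea described above (keep updates in memory, then sort and append them to disk once a size limit is reached) can be sketched as follows. This is a toy model, not any real engine's API: the class `TinyLSM`, its `limit` parameter, and the use of a Python list to stand in for the on-disk queue are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

class TinyLSM:
    """Toy LSM layout: an in-memory dict plus a list of sorted 'on-disk' runs."""

    def __init__(self, limit: int) -> None:
        self.limit = limit                                   # assumed size limit, in entries
        self.dynamic: Dict[int, Optional[str]] = {}          # data in memory
        self.static: List[List[Tuple[int, Optional[str]]]] = []  # simulated disk queue

    def put(self, key: int, value: str) -> None:
        self.dynamic[key] = value
        self._maybe_flush()

    def delete(self, key: int) -> None:
        # A deletion is stored as a modification increment, not as a removal.
        self.dynamic[key] = None
        self._maybe_flush()

    def _maybe_flush(self) -> None:
        if len(self.dynamic) >= self.limit:
            run = sorted(self.dynamic.items())  # sort by index key
            self.static.append(run)             # append to the end of the queue
            self.dynamic = {}

lsm = TinyLSM(limit=3)
lsm.put(5, "e")
lsm.put(1, "a")
lsm.delete(5)
lsm.put(3, "c")
print(lsm.static)   # [[(1, 'a'), (3, 'c'), (5, None)]]
print(lsm.dynamic)  # {}
```

Note that the deletion of key 5 survives the flush as a record in the static data, which is why deletion records accumulate in both layers.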
At present, there are two problems in accessing the LSM tree:
First, when modification increments indicating that data has been deleted are frequently inserted into the dynamic data, that is, when the dynamic data holds a large number of deletion records, accessing the dynamic data requires traversing a large amount of useless data, which degrades read performance. For example, suppose data is looked up by the index key range [1, 50]. As shown in Figure 1, a large amount of data in this range has already been deleted, such as the data pointed to by the leaf nodes with index keys 3 to 8, by the leaf nodes with index keys 15 to 31, and by the leaf nodes with index keys 32 and 49. In the prior art, the leaf nodes still have to be traversed one by one before the data pointed to by the leaf nodes with index keys 9 to 14 and 33 to 48 is finally obtained.
Second, accessing the static data still requires traversing a large amount of useless data, which degrades query performance. For example, when the static data in the Figure 1 example is accessed, the data pointed to by the leaf nodes with index keys 3 to 99 still has to be traversed one by one, and the leaf nodes whose pointed-to data has been deleted cannot be skipped.
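
The cost in the first problem can be made concrete with a small count. The sketch below assumes, for illustration only, that the dynamic data holds one leaf node for each index key from 3 to 49 and that the deleted ranges are the ones listed for Figure 1; the figure itself is not reproduced here, so this layout is an assumption.

```python
# Assumed layout loosely based on the Figure 1 example in the text.
DELETED_RANGES = [(3, 8), (15, 31), (32, 32), (49, 49)]
FIRST_KEY, LAST_KEY = 3, 49

def is_deleted(key):
    return any(lo <= key <= hi for lo, hi in DELETED_RANGES)

def naive_scan(lo, hi):
    """Visit every leaf node in [lo, hi] one by one, as in the prior art."""
    visited = 0
    live = []
    for key in range(max(lo, FIRST_KEY), min(hi, LAST_KEY) + 1):
        visited += 1
        if not is_deleted(key):
            live.append(key)
    return live, visited

live, visited = naive_scan(1, 50)
print(visited)            # 47
print(len(live))          # 22
print(live[0], live[-1])  # 9 48
```

Under these assumptions, 47 leaf nodes are visited to return 22 live keys (9 to 14 and 33 to 48), so more than half of the traversal is spent on deleted data.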
Based on this, the embodiments of the present specification provide an optimization method for an LSM tree. The method relies on the fact that an access to the LSM tree reads its dynamic data first and its static data second. In the dynamic data of the LSM tree, a delete mark indicating that the data has been deleted is added to each leaf node whose pointed-to data has been deleted. On a later access to the LSM tree, if a leaf node with a delete mark is found in the dynamic data, that leaf node is skipped directly, and the subsequent lookup in the static data no longer searches the leaf nodes with the same index key, which improves data query performance.
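
The effect of delete marks on a scan can be sketched as follows. This is an illustration under assumptions rather than the method's actual data layout: each layer is a sorted list of index keys, delete marks are modeled as `(first, last)` key ranges, and the jump past a marked range uses binary search from Python's `bisect` module. The text does not specify how an implementation locates the end of a marked range.

```python
from bisect import bisect_left, bisect_right

def scan_layer(keys, marks, lo, hi):
    """Scan one layer (dynamic or static) for index keys in [lo, hi].

    keys:  sorted index keys of the leaf nodes in this layer.
    marks: (first, last) index-key ranges that carry a delete mark.
    Returns (keys_returned, nodes_visited).
    """
    returned, visited = [], 0
    i = bisect_left(keys, lo)
    while i < len(keys) and keys[i] <= hi:
        key = keys[i]
        visited += 1
        mark = next((m for m in marks if m[0] <= key <= m[1]), None)
        if mark is None:
            returned.append(key)
            i += 1
        else:
            # Landed on a marked leaf node: jump past the whole marked range.
            i = bisect_right(keys, mark[1])
    return returned, visited

marks = [(3, 8), (15, 18)]
dyn_keys, dyn_visited = scan_layer(list(range(1, 21)), marks, 1, 20)
sta_keys, sta_visited = scan_layer([3, 4, 5, 9, 16, 30], marks, 1, 20)
print(dyn_keys, dyn_visited)  # [1, 2, 9, 10, 11, 12, 13, 14, 19, 20] 12
print(sta_keys, sta_visited)  # [9] 3
```

In this example the same marks serve both layers: the dynamic scan touches 12 of 20 leaf nodes, and the static scan touches 3 of the 5 that fall inside the queried range.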
The following embodiments describe the LSM tree optimization method in detail:
Referring to FIG. 2, which is a flowchart of an embodiment of an LSM tree optimization method provided by an exemplary embodiment of this specification, the method includes the following steps:
Step 202: Determine the target leaf nodes in the LSM tree, that is, the leaf nodes whose pointed-to data has been deleted.
In the embodiments of this specification, the leaf nodes whose pointed-to data has been deleted may be determined separately for the dynamic data and for the static data. For ease of description, a leaf node so determined, that is, one whose pointed-to data has been deleted, is called a target leaf node.
The dynamic data and the static data are described in turn below:
First, the dynamic data:
Taking the dynamic data illustrated in FIG. 1 as an example, suppose data is looked up by the index key range [1, 50]. During the query, it can be detected that the data pointed to by multiple leaf nodes with consecutive index keys has been deleted, for example the leaf nodes with index keys 3 to 8 and the leaf nodes with index keys 15 to 31. These leaf nodes can then be determined as target leaf nodes.
Second, the static data:
Taking the static data illustrated in FIG. 3 as an example, suppose data is looked up by the index key range [1, 50]. During the query, it can be detected that the data pointed to by multiple leaf nodes with consecutive index keys has been deleted, for example the leaf nodes with index keys 3 to 15. These leaf nodes can then be determined as target leaf nodes.
Step 204: In the dynamic data of the LSM tree, add a delete mark to the target leaf nodes.
This step is again described for the dynamic data and the static data in turn:
First, the dynamic data:
Based on the example in step 202 above, the target leaf nodes determined in the dynamic data include the leaf nodes with index keys 3 to 8 and the leaf nodes with index keys 15 to 31. In this step, delete marks can be added to these target leaf nodes, as shown in FIG. 3, where "deleted" denotes the delete mark. Those skilled in the art will understand that using "deleted" as the delete mark is only an example; in practice the delete mark may take other forms, and the embodiments of this specification place no limitation on this.
In addition, the embodiments of this specification propose that if, in the dynamic data of the LSM tree, delete marks have been added to all leaf nodes under a branch node, a delete mark is added to that branch node as well. For example, as shown in FIG. 3, all leaf nodes under the branch node "15" carry delete marks, so a delete mark can also be added to that branch node. With this processing, a subsequent access to the dynamic data of the LSM tree can, after visiting all leaf nodes under the branch node "3", skip the branch node "15" directly and go on to the branch node "32", which improves data access efficiency.
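The branch-level rule can be sketched as follows. This is an illustrative sketch only: the `Branch` and `Leaf` classes and the helper names are hypothetical, and the toy layout is loosely modeled on the branch nodes "3", "15", and "32" mentioned above, with the exact leaf membership assumed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Leaf:
    key: int
    deleted: bool = False  # delete mark on the leaf


@dataclass
class Branch:
    key: int
    leaves: List[Leaf] = field(default_factory=list)
    deleted: bool = False  # delete mark on the branch node


def mark_branches(branches: List[Branch]) -> None:
    """Mark a branch when every leaf under it carries a delete mark."""
    for branch in branches:
        branch.deleted = bool(branch.leaves) and all(l.deleted for l in branch.leaves)


def scan_live_keys(branches: List[Branch]) -> List[int]:
    """Collect live keys, skipping marked branches without touching their leaves."""
    keys: List[int] = []
    for branch in branches:
        if branch.deleted:
            continue  # the whole subtree is skipped in one step
        keys.extend(l.key for l in branch.leaves if not l.deleted)
    return keys


# Assumed layout: keys 3-8 and 15-31 are deleted, 9-14 and 33-35 are live.
branches = [
    Branch(3, [Leaf(k, deleted=(k <= 8)) for k in range(3, 15)]),
    Branch(15, [Leaf(k, deleted=True) for k in range(15, 32)]),
    Branch(32, [Leaf(k) for k in range(33, 36)]),
]
mark_branches(branches)
live = scan_live_keys(branches)
```

Only the middle branch ends up marked, so the scan visits the leaves under "3", skips "15" entirely, and continues with "32".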
In addition, to further improve the efficiency of later data access in the static data, the embodiments of this specification may further determine, among the target leaf nodes to which delete marks have been added, whether the index key of the leaf node pointed to by the target leaf node with the largest index key is consecutive with that largest index key. If it is consecutive, no processing is needed. If it is not, a virtual leaf node whose index key is the largest index key plus 1 may be inserted after the target leaf node with the largest index key.
For example, take the leaf nodes with index keys 3 to 8 as the target leaf nodes to which delete marks have been added. Among these target leaf nodes the largest index key is 8. As shown in FIG. 4, the next leaf node pointed to by the leaf node with index key 8 has index key 9, which is consecutive with 8, so no processing is needed.
As another example, take the leaf nodes with index keys 15 to 31 as the target leaf nodes to which delete marks have been added. Among these target leaf nodes the largest index key is 31. As shown in FIG. 4, the leaf node with index key 31 does not point to any further leaf node, so a virtual leaf node with index key 32 can be inserted after it, as shown in FIG. 4. Those skilled in the art will understand that this virtual leaf node does not carry a delete mark.
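The two examples above can be traced with a short sketch. The `Leaf` class, the `virtual` flag, and the `add_virtual_leaf` helper are hypothetical names introduced for illustration, and the leaf chain is modeled as a sorted Python list rather than as linked tree nodes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Leaf:
    key: int
    deleted: bool = False
    virtual: bool = False  # assumed flag for a placeholder leaf with no data


def add_virtual_leaf(leaves: List[Leaf], run_max_key: int) -> bool:
    """Insert a virtual leaf after the marked leaf with key run_max_key
    unless the next leaf already has the consecutive key.

    Returns True when a virtual leaf was inserted.
    """
    idx = next(i for i, leaf in enumerate(leaves) if leaf.key == run_max_key)
    nxt = leaves[idx + 1] if idx + 1 < len(leaves) else None
    if nxt is not None and nxt.key == run_max_key + 1:
        return False  # keys are consecutive: nothing to do
    # The virtual leaf carries no delete mark; it only bounds the deleted run.
    leaves.insert(idx + 1, Leaf(run_max_key + 1, deleted=False, virtual=True))
    return True


# Keys 3-8 are marked and followed by key 9: consecutive, so nothing is inserted.
first = [Leaf(k, deleted=True) for k in range(3, 9)] + [Leaf(9)]
inserted_first = add_virtual_leaf(first, 8)

# Keys 15-31 are marked and nothing follows 31: a virtual leaf 32 is inserted.
second = [Leaf(k, deleted=True) for k in range(15, 32)]
inserted_second = add_virtual_leaf(second, 31)
```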
This processing can further improve the efficiency of data access in the static data. For example, while the dynamic data is being accessed, when the traversal reaches the leaf node with index key 9 and finds that it has no delete mark, the index key range [3, 8] can be treated as the index key range of deleted data. When the static data is subsequently accessed, the leaf nodes whose index keys fall within this range need not be accessed.
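One way to derive such ranges during the dynamic pass is sketched below. The rule used here, that a maximal run of adjacent marked leaves yields a range from its first key to its last key and that the run is closed by the first unmarked leaf, is an interpretation of the example above rather than a procedure spelled out in the specification. The names `deleted_ranges` and `filter_static` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Leaf:
    key: int
    deleted: bool = False


def deleted_ranges(dynamic_leaves: Iterable[Leaf]) -> List[Tuple[int, int]]:
    """Turn runs of adjacent marked leaves into inclusive key ranges."""
    ranges: List[Tuple[int, int]] = []
    start = last = None
    for leaf in dynamic_leaves:
        if leaf.deleted:
            if start is None:
                start = leaf.key
            last = leaf.key
        elif start is not None:
            # An unmarked leaf (real or virtual) closes the current run.
            ranges.append((start, last))
            start = last = None
    if start is not None:
        ranges.append((start, last))
    return ranges


def filter_static(static_keys: Iterable[int], ranges: List[Tuple[int, int]]) -> List[int]:
    """Keep only the static keys that fall outside every deleted range."""
    return [k for k in static_keys if not any(lo <= k <= hi for lo, hi in ranges)]


leaves = (
    [Leaf(k, deleted=True) for k in range(3, 9)]
    + [Leaf(k) for k in range(9, 15)]
    + [Leaf(k, deleted=True) for k in range(15, 32)]
    + [Leaf(32)]  # unmarked leaf that closes the second run
)
ranges = deleted_ranges(leaves)
remaining = filter_static(range(1, 36), ranges)
```

The linear check in `filter_static` keeps the sketch short; a real implementation would more likely seek past each range in the sorted static data instead of testing every key.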
Second, the static data:
Based on the example in step 202 above, the target leaf nodes determined in the static data include the leaf nodes with index keys 3 to 15. It may first be determined whether target leaf nodes with index keys 3 to 15 exist in the dynamic data. If they do not, the two target leaf nodes with the largest and the smallest index keys among the target leaf nodes can be inserted into the dynamic data of the LSM tree, and delete marks can be added to the inserted target leaf nodes in the dynamic data of the LSM tree.
For example, among the leaf nodes with index keys 3 to 15, the largest index key is 15 and the smallest is 3. As described above, the target leaf nodes with index keys 3 and 15 are inserted into the dynamic data of the LSM tree, and delete marks are added to the inserted target leaf nodes in the dynamic data, as shown in FIG. 5.
In addition, it may be further determined, among the target leaf nodes to which delete marks have been added, whether the index key of the leaf node pointed to by the target leaf node with the largest index key is consecutive with that largest index key. If it is consecutive, no processing is needed. If it is not, a virtual leaf node whose index key is the largest index key plus 1 may be inserted after the target leaf node with the largest index key.
For example, in FIG. 5 the target leaf nodes to which delete marks have been added are the leaf nodes with index keys 3 and 15, of which the largest index key is 15. Since no other leaf node follows the leaf node with index key 15, a virtual leaf node with index key 16 can be inserted after it, as shown in FIG. 5.
With this processing, when a later traversal of the dynamic data reaches the virtual leaf node with index key 16 and finds that it has no delete mark, [3, 15] can be treated as the index key range of deleted data. When the static data is subsequently accessed, the leaf nodes whose index keys fall within this range need not be accessed, which improves the efficiency of data access in the static data.
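The static-data case can be sketched in the same style. Everything here is illustrative: the dynamic data is modeled as a dictionary from index key to a hypothetical `Leaf`, `mark_static_run` is an invented helper name, and the behavior when the dynamic data already holds keys in the range (the run is left untouched) is an assumption, since the specification describes only the case where no such leaf exists.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Leaf:
    key: int
    deleted: bool = False
    virtual: bool = False  # assumed flag for a placeholder leaf with no data


def mark_static_run(dynamic: Dict[int, Leaf], run_keys: Iterable[int]) -> bool:
    """Record a run of deleted static keys in the dynamic data.

    Only the smallest and largest keys of the run are inserted, both with
    delete marks, followed by an unmarked virtual leaf at largest + 1 when
    no leaf with that key exists yet.
    """
    keys = list(run_keys)
    lo, hi = min(keys), max(keys)
    if any(lo <= k <= hi for k in dynamic):
        return False  # assumption: runs already present in dynamic data are left alone
    dynamic[lo] = Leaf(lo, deleted=True)
    dynamic[hi] = Leaf(hi, deleted=True)
    if hi + 1 not in dynamic:
        dynamic[hi + 1] = Leaf(hi + 1, virtual=True)
    return True


dynamic: Dict[int, Leaf] = {}
changed = mark_static_run(dynamic, range(3, 16))  # static keys 3..15 were deleted
chain = [dynamic[k] for k in sorted(dynamic)]     # leaves 3, 15, 16 in key order
```

Storing two boundary leaves and one virtual leaf instead of thirteen marked leaves keeps the dynamic data small while still letting a reader recover the range [3, 15].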
The technical solution provided by the embodiments of this specification determines the target leaf nodes in the LSM tree whose pointed-to data has been deleted and adds delete marks to the determined target leaf nodes in the dynamic data of the LSM tree. Because a data access to an LSM tree reads the dynamic data first and the static data afterwards, adding delete marks in the dynamic data to the target leaf nodes whose data has been deleted allows leaf nodes with delete marks to be skipped directly when the dynamic data is accessed, which improves data access efficiency. Furthermore, when the static data is accessed, the leaf nodes with delete marks need not be accessed again, which also improves data access efficiency.
Corresponding to the above method embodiments, the embodiments of this specification further provide an apparatus for optimizing an LSM tree. Referring to FIG. 6, which is a block diagram of an embodiment of an LSM tree optimization apparatus provided by an exemplary embodiment of this specification, the apparatus may include a target determination module 61 and a first adding module 62.
The target determination module 61 may be configured to determine the target leaf nodes in the LSM tree whose pointed-to data has been deleted;
The first adding module 62 may be configured to add a delete mark to the target leaf nodes in the dynamic data of the LSM tree.
In an embodiment, the apparatus may further include (not shown in FIG. 6):
A second adding module, configured to add a delete mark to a branch node if it is detected, in the dynamic data of the LSM tree, that delete marks have been added to all leaf nodes under that branch node.
In an embodiment, the target determination module 61 may be specifically configured to:
For the dynamic data and the static data of the LSM tree respectively, determine two or more target leaf nodes whose pointed-to data has been deleted and whose index keys are consecutive.
In an embodiment, the first adding module 62 may include (not shown in FIG. 6):
An inserting submodule, configured to, if target leaf nodes are determined for the static data of the LSM tree, insert into the dynamic data of the LSM tree the two target leaf nodes that have the largest index key and the smallest index key among the determined target leaf nodes;
A mark adding submodule, configured to add a delete mark to the inserted target leaf nodes in the dynamic data of the LSM tree.
In an embodiment, the apparatus may further include (not shown in FIG. 6):
A judging module, configured to judge, among the target leaf nodes to which delete marks have been added, whether the index key of the leaf node pointed to by the target leaf node with the largest index key is consecutive with the largest index key;
A virtual node inserting module, configured to, if it is not consecutive, insert a virtual leaf node after the target leaf node with the largest index key, where the index key of the virtual leaf node is the largest index key plus 1.
It can be understood that the target determination module 61 and the first adding module 62, being two functionally independent modules, may both be configured in the apparatus as shown in FIG. 6 or may each be configured in the apparatus on its own. The structure shown in FIG. 6 should therefore not be understood as limiting the solutions of the embodiments of this specification.
In addition, for the implementation of the functions and roles of each module in the above apparatus, see the implementation of the corresponding steps in the above method; details are not repeated here.
The embodiments of this specification further provide a computer device that includes at least a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the foregoing LSM tree optimization method when executing the program. The method at least includes: determining the target leaf nodes in the LSM tree whose pointed-to data has been deleted; and adding a delete mark to the target leaf nodes in the dynamic data of the LSM tree.
FIG. 7 shows a more specific schematic diagram of the hardware structure of a computing device provided by the embodiments of this specification. The device may include a processor 710, a memory 720, an input/output interface 730, a communication interface 740, and a bus 750. The processor 710, the memory 720, the input/output interface 730, and the communication interface 740 are communicatively connected to one another inside the device through the bus 750.
The processor 710 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is used to execute the relevant programs so as to implement the technical solutions provided by the embodiments of this specification.
The memory 720 may be implemented as ROM (Read Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 720 may store an operating system and other application programs. When the technical solutions provided by the embodiments of this specification are implemented in software or firmware, the relevant program code is stored in the memory 720 and is called and executed by the processor 710.
The input/output interface 730 is used to connect an input/output module to enable information input and output. The input/output module may be configured in the device as a component (not shown in FIG. 7) or may be attached externally to provide the corresponding functions. Input devices may include a keyboard, a mouse, a touch screen, a microphone, and various sensors; output devices may include a display, a speaker, a vibrator, and indicator lights.
The communication interface 740 is used to connect a communication module (not shown in FIG. 7) to enable communication between this device and other devices. The communication module may communicate by wired means (for example USB or a network cable) or by wireless means (for example a mobile network, WIFI, or Bluetooth).
The bus 750 includes a path that carries information between the components of the device (for example the processor 710, the memory 720, the input/output interface 730, and the communication interface 740).
It should be noted that although only the processor 710, the memory 720, the input/output interface 730, the communication interface 740, and the bus 750 are shown for the above device, in a specific implementation the device may also include other components necessary for normal operation. In addition, those skilled in the art will understand that the above device may include only the components necessary to implement the solutions of the embodiments of this specification and need not include all the components shown in the figure.
The embodiments of this specification further provide a computer-readable storage medium storing a computer program that, when executed by a processor, implements the foregoing LSM tree optimization method. The method at least includes: determining the target leaf nodes in the LSM tree whose pointed-to data has been deleted; and adding a delete mark to the target leaf nodes in the dynamic data of the LSM tree.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
From the description of the above implementations, those skilled in the art can clearly understand that the embodiments of this specification can be implemented by software together with a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the embodiments of this specification, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a storage medium such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of this specification or in certain parts of those embodiments.
The systems, apparatuses, modules, or units described in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. A typical implementing device is a computer, which may take the form of a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
The embodiments in this specification are described in a progressive manner. For parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the apparatus embodiment is basically similar to the method embodiment, it is described relatively briefly, and the relevant parts of the method embodiment description may be consulted. The apparatus embodiments described above are merely illustrative. The modules described as separate components may or may not be physically separate, and when the solutions of the embodiments of this specification are implemented, the functions of the modules may be realized in one or more pieces of software and/or hardware. Some or all of the modules may also be selected according to actual needs to achieve the purpose of the solution of an embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
The above are only specific implementations of the embodiments of this specification. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principles of the embodiments of this specification, and these improvements and refinements should also be regarded as falling within the protection scope of the embodiments of this specification.

Claims (11)

  1. A method for optimizing an LSM tree, the method comprising:
    determining a target leaf node in the LSM tree whose pointed-to data has been deleted;
    adding a delete mark to the target leaf node in the dynamic data of the LSM tree.
  2. The method according to claim 1, further comprising:
    in the dynamic data of the LSM tree, if it is detected that delete marks have been added to all leaf nodes under any branch node, adding a delete mark to the branch node.
  3. The method according to claim 1, wherein determining the target leaf node in the LSM tree whose pointed-to data has been deleted comprises:
    for the dynamic data and the static data of the LSM tree respectively, determining two or more target leaf nodes whose pointed-to data has been deleted and whose index keys are consecutive.
  4. The method according to claim 3, wherein, if the target leaf nodes are determined for the static data of the LSM tree, adding a delete mark to the target leaf node in the dynamic data of the LSM tree comprises:
    inserting into the dynamic data of the LSM tree the two target leaf nodes that have the largest index key and the smallest index key among the determined target leaf nodes;
    adding a delete mark to the inserted target leaf nodes in the dynamic data of the LSM tree.
  5. The method according to claim 1, wherein, after the delete mark is added to the target leaf node in the dynamic data of the LSM tree, the method further comprises:
    judging, among the target leaf nodes to which delete marks have been added, whether the index key of the leaf node pointed to by the target leaf node with the largest index key is consecutive with the largest index key;
    if it is not consecutive, inserting a virtual leaf node after the target leaf node with the largest index key, wherein the index key of the virtual leaf node is the largest index key plus 1.
  6. An apparatus for optimizing an LSM tree, the apparatus comprising:
    a target determination module, configured to determine a target leaf node in the LSM tree whose pointed-to data has been deleted;
    a first adding module, configured to add a delete mark to the target leaf node in the dynamic data of the LSM tree.
  7. The apparatus according to claim 6, further comprising:
    a second adding module, configured to add a delete mark to a branch node if it is detected, in the dynamic data of the LSM tree, that delete marks have been added to all leaf nodes under the branch node.
  8. The apparatus according to claim 6, wherein the target determination module is specifically configured to:
    for the dynamic data and the static data of the LSM tree respectively, determine two or more target leaf nodes whose pointed-to data has been deleted and whose index keys are consecutive.
  9. The apparatus according to claim 8, wherein the first adding module comprises:
    an inserting submodule, configured to, if the target leaf nodes are determined for the static data of the LSM tree, insert into the dynamic data of the LSM tree the two target leaf nodes that have the largest index key and the smallest index key among the determined target leaf nodes;
    a mark adding submodule, configured to add a delete mark to the inserted target leaf nodes in the dynamic data of the LSM tree.
  10. The apparatus according to claim 6, further comprising:
    a judging module, configured to judge, among the target leaf nodes to which delete marks have been added, whether the index key of the leaf node pointed to by the target leaf node with the largest index key is consecutive with the largest index key;
    a virtual node inserting module, configured to, if it is not consecutive, insert a virtual leaf node after the target leaf node with the largest index key, wherein the index key of the virtual leaf node is the largest index key plus 1.
  11. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method according to any one of claims 1 to 5 when executing the program.
PCT/CN2019/077404 2018-05-31 2019-03-08 Lsm tree optimization method and device and computer equipment WO2019228009A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810550710.X 2018-05-31
CN201810550710.XA CN108804625B (en) 2018-05-31 2018-05-31 LSM tree optimization method and device and computer equipment

Publications (1)

Publication Number Publication Date
WO2019228009A1 true WO2019228009A1 (en) 2019-12-05

Family

ID=64089726

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/077404 WO2019228009A1 (en) 2018-05-31 2019-03-08 Lsm tree optimization method and device and computer equipment

Country Status (3)

Country Link
CN (1) CN108804625B (en)
TW (1) TWI710918B (en)
WO (1) WO2019228009A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108804625B (en) * 2018-05-31 2020-05-12 阿里巴巴集团控股有限公司 LSM tree optimization method and device and computer equipment
CN114398378B (en) * 2022-03-25 2022-11-01 北京奥星贝斯科技有限公司 Method and device for determining index cost

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103198150A (en) * 2013-04-24 2013-07-10 清华大学 Big data indexing method and system
US20140279855A1 (en) * 2013-03-15 2014-09-18 International Business Machines Corporation Differentiated secondary index maintenance in log structured nosql data stores
CN105159915A (en) * 2015-07-16 2015-12-16 中国科学院计算技术研究所 Dynamically adaptive LSM (Log-structured merge) tree combination method and system
CN105447059A (en) * 2014-09-29 2016-03-30 华为技术有限公司 Data processing method and device
US9324367B1 (en) * 2015-05-05 2016-04-26 Futurewei Technologies, Inc. SMR-aware append-only file system
CN108804625A (en) * 2018-05-31 2018-11-13 阿里巴巴集团控股有限公司 A kind of optimization method, device and the computer equipment of LSM trees

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102142015A (en) * 2011-01-30 2011-08-03 唐凌遥 Processing system and processing method for nodes in family tree
CN102542057B (en) * 2011-12-29 2013-10-16 北京大学 High dimension data index structure design method based on solid state hard disk
US9727598B2 (en) * 2012-12-19 2017-08-08 Salesforce.Com, Inc. Systems, methods, and apparatuses for fixing logical or physical corruption in databases using LSM trees
CN103744961B (en) * 2014-01-06 2016-10-19 清华大学 Method for improving non-volatile memory lifespan through a reconfigurable file system directory tree
CN105224237B (en) * 2014-05-26 2018-06-19 华为技术有限公司 Data storage method and device
US9959207B2 (en) * 2015-06-25 2018-05-01 Vmware, Inc. Log-structured B-tree for handling random writes
CN105138622B (en) * 2015-08-14 2018-05-22 中国科学院计算技术研究所 Insertion operation and load reading and merging method for LSM tree storage systems
US10795871B2 (en) * 2016-09-26 2020-10-06 Vmware, Inc. Key-value stores implemented using fragmented log-structured merge trees

Also Published As

Publication number Publication date
TW202004521A (en) 2020-01-16
CN108804625A (en) 2018-11-13
TWI710918B (en) 2020-11-21
CN108804625B (en) 2020-05-12

Similar Documents

Publication Publication Date Title
CN107704202B (en) Method and device for quickly reading and writing data
KR20160073402A (en) Callpath finder
US10884926B2 (en) Method and system for distributed storage using client-side global persistent cache
CN113535721B (en) Data writing method and device
US8903875B2 (en) Method for identifying corresponding directories in a union-mounted file system
CN111241040B (en) Information acquisition method and device, electronic equipment and computer storage medium
US11455117B2 (en) Data reading method, apparatus, and system, avoiding version rollback issues in distributed system
CN111324665A (en) Log playback method and device
US10929445B2 (en) Distributed search framework with virtual indexing
CN109522271A (en) Batch insertion and deletion method and device for B+ tree nodes
CN108875046A (en) Storage system access method, device, and electronic equipment
WO2019228009A1 (en) LSM tree optimization method and device and computer equipment
US8396858B2 (en) Adding entries to an index based on use of the index
KR20150045073A (en) Data Operating Method And System supporting the same
CN113297432A (en) Method, processor readable medium and system for partition splitting and merging
WO2024016789A1 (en) Log data query method and apparatus, and device and medium
US11340999B2 (en) Fast restoration method from inode based backup to path based structure
CN112685329B (en) Method for processing data and related device
CN109344159B (en) Lookup method and device for LDAP (Lightweight Directory Access Protocol), electronic equipment and storage medium
CN112948389A (en) MD5-based database table data comparison method and equipment
CN113342270A (en) Volume unloading method and device and electronic equipment
CN112632211A (en) Semantic information processing method and equipment for mobile robot
CN113625938A (en) Metadata storage method and equipment thereof
JP6227055B1 (en) Storage system and file writing method
CN111625500A (en) File snapshot method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 19811647; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 19811647; Country of ref document: EP; Kind code of ref document: A1)