CN117194440A - Database index compression method and device, electronic equipment and storage medium

Info

Publication number: CN117194440A
Application number: CN202311479042.3A
Authority: CN
Inventors: 胡浩; 陈宇凡; 郑启洋; 夏文; 邹翔宇; 李诗逸; 张程伟; 张皖川; 熊艳辉; 蒋兆恒
Original assignee: Primitive Data Beijing Information Technology Co ltd; Shenzhen Graduate School Harbin Institute of Technology
Current assignee: Primitive Data Beijing Information Technology Co ltd; Shenzhen Graduate School Harbin Institute of Technology
Priority date: 2023-11-08
Filing date: 2023-11-08
Publication date: 2023-12-08
Anticipated expiration: 2043-11-08
Also published as: CN117194440B

Abstract

The application discloses a database index compression method and device, an electronic device and a storage medium, and relates to the technical field of indexing. An index tree comprising a plurality of leaf nodes is obtained, and common prefixes are extracted from the leaf nodes in a leaf node list. A first boundary and a second boundary of an initial sliding window are determined in the leaf node list and a first benefit is calculated; a second benefit is then calculated according to a preset boundary obtained by moving the second boundary to the next index value. If the second benefit is greater than or equal to the first benefit, a preset condition is met and the second boundary is updated to obtain a second sliding window. The second sliding window is taken as the initial sliding window and the process is executed iteratively until the preset condition is no longer met, whereupon the common prefix of the initial sliding window is obtained. The index values of the leaf nodes in the initial sliding window are compressed using the common prefix. By setting a sliding window and updating it according to the benefit, index values in different sliding windows are compressed with different common prefixes, which effectively improves the compression rate of the database index.

Description

Database index compression method and device, electronic equipment and storage medium

Technical Field

The present application relates to the field of indexing technologies, and in particular, to a database index compression method, apparatus, electronic device, and storage medium.

Background

In a database, an index is a separate physical storage structure that orders the values of one or more columns in a data table, and using an index can effectively improve data query efficiency. Compressing database indexes is an important means of optimizing database performance. Index compression reduces the physical storage space of the index and improves storage efficiency, while also reducing disk I/O operations and speeding up queries.

The database index compression techniques of the related art are mostly aimed at online analytical processing (Online Analytical Processing, OLAP) systems and cannot be fully applied to online transaction processing (Online Transaction Processing, OLTP) systems. In an OLTP system, the ordered index contains different continuous intervals that each share a common prefix, so the related algorithms have difficulty identifying the different common prefixes of the index, and the compression rate of database indexes in OLTP systems is low.

Disclosure of Invention

The present application aims to solve at least one of the technical problems existing in the prior art. Therefore, the embodiment of the application provides a database index compression method, a device, electronic equipment and a storage medium, which can dynamically identify common prefixes of different indexes and improve the database index compression rate.

In a first aspect, an embodiment of the present application provides a method for compressing a database index, including:

acquiring an index tree of the database; wherein the index tree comprises a plurality of leaf nodes, each leaf node comprising at least one index value;

extracting a common prefix from the leaf nodes in the leaf node list; the process for extracting the common prefix comprises the following steps:

determining a first boundary and a second boundary of an initial sliding window in the leaf node list, and calculating a first benefit of the initial sliding window; the initial value of the first boundary is a first index value of a first leaf node, and the initial value of the second boundary is a next index value selected based on the first boundary;

calculating a second benefit of the initial sliding window according to a preset boundary obtained by moving the second boundary to the next index value;

if the second benefit is greater than or equal to the first benefit, the initial sliding window meets a preset condition, the second boundary of the initial sliding window is updated to obtain a second sliding window, the second sliding window is used as the initial sliding window, and the process of extracting the common prefix is executed until the initial sliding window does not meet the preset condition, whereupon the common prefix of the initial sliding window is obtained;

compressing the leaf nodes in the initial sliding window by utilizing the common prefix to obtain an index table; the index table comprises a common prefix table and compressed leaf nodes, and the common prefix table comprises the common prefix.

In some embodiments of the present application, the initial sliding window does not satisfy the preset condition, including:

if the second benefit is smaller than the first benefit, selecting the leaf node where the second boundary is located as an initial leaf node;

based on the leaf node list, starting from the initial leaf node, executing the process of extracting the common prefix to obtain the common prefix corresponding to the initial leaf node;

until reaching the last leaf node of the list of leaf nodes, obtaining at least one common prefix; the common prefix is stored in a memory space.

In some embodiments of the application, the leaf nodes store index values in order; the calculating a first benefit of the initial sliding window includes:

matching the index value of the first boundary with the index value of the second boundary to obtain the longest common prefix of the initial sliding window;

calculating a window length of the initial sliding window based on the first boundary and the second boundary;

multiplying the prefix length of the longest common prefix by the window length yields the first benefit.
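As an illustration of the benefit calculation just described, the following is a minimal Python sketch, not the applicant's implementation. The function names `longest_common_prefix` and `window_benefit` are hypothetical, index values are assumed to be strings held in a sorted list, and the window length is assumed to be the number of index values between the two boundaries inclusive (the text only states that it is derived from the two boundaries).

```python
from typing import List


def longest_common_prefix(a: str, b: str) -> str:
    """Return the longest common prefix of two strings."""
    n = 0
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            break
        n += 1
    return a[:n]


def window_benefit(keys: List[str], first: int, second: int) -> int:
    """Benefit of the window keys[first..second], both boundaries inclusive.

    Assumption: window length = number of index values in the window.
    """
    prefix = longest_common_prefix(keys[first], keys[second])
    window_length = second - first + 1
    return len(prefix) * window_length


# Hypothetical sorted index values.
keys = ["user1001", "user1002", "user1003", "user2001"]
print(window_benefit(keys, 0, 2))  # prefix "user100" (7 chars) * 3 values = 21
print(window_benefit(keys, 0, 3))  # prefix "user" (4 chars) * 4 values = 16
```

Because the index values are stored in order, matching only the two boundary values is enough in this sketch: for lexicographically sorted strings, every value between the boundaries shares at least the prefix common to the first and last value.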

In some embodiments of the present application, the compressing the leaf node in the initial sliding window with the common prefix to obtain an index table includes:

selecting an index part from the index values of the leaf nodes according to the common prefix;

storing, with the leaf node, the index portion and a pointer to a storage location of the common prefix in the memory space, and generating the index table based on the leaf node.

In some embodiments of the present application, the updating the second boundary of the initial sliding window to obtain a second sliding window includes:

taking the first boundary of the initial sliding window as a first boundary of the second sliding window;

and taking the preset boundary as a second boundary of the second sliding window.

In some embodiments of the application, the method further comprises:

responding to an index query instruction, and acquiring a target leaf node based on the index table;

acquiring the common prefix in a memory space according to the pointer in the target leaf node;

and combining the common prefix and the index part in the target leaf node to obtain the index value of the target leaf node.
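The query steps above can be sketched as follows. This is only an illustrative Python sketch under stated assumptions: `CompressedEntry`, `prefix_table` and `restore_index_value` are hypothetical names, and the pointer to the stored common prefix is modelled as a position in an in-memory list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CompressedEntry:
    prefix_id: int   # stands in for the pointer to the stored common prefix
    index_part: str  # remainder of the index value after the common prefix


# Hypothetical common prefix table held in memory.
prefix_table: List[str] = ["user10", "user20"]


def restore_index_value(entry: CompressedEntry, prefixes: List[str]) -> str:
    """Follow the pointer to the common prefix and prepend it to the index part."""
    return prefixes[entry.prefix_id] + entry.index_part


entry = CompressedEntry(prefix_id=1, index_part="07")
print(restore_index_value(entry, prefix_table))  # user2007
```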

In some embodiments of the application, the method further comprises:

obtaining an index value to be inserted, and inserting the index value to be inserted into a target leaf node;

and if the common prefix exists in the memory space and is matched with the index value of the target leaf node, compressing the target leaf node by using the common prefix.
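A possible reading of the insertion step is sketched below in Python. The helper name `compress_inserted_key` is hypothetical, and choosing the longest matching stored prefix is an assumption of this sketch; the text only requires that a stored common prefix match the index value.

```python
from typing import List, Optional, Tuple


def compress_inserted_key(key: str, prefixes: List[str]) -> Tuple[Optional[int], str]:
    """Return (prefix position, index part) for a newly inserted index value.

    If no stored common prefix matches, the value is kept uncompressed
    and the prefix position is None.
    """
    best: Optional[int] = None
    for position, prefix in enumerate(prefixes):
        if key.startswith(prefix):
            # Assumption: prefer the longest matching prefix.
            if best is None or len(prefix) > len(prefixes[best]):
                best = position
    if best is None:
        return None, key
    return best, key[len(prefixes[best]):]


# Hypothetical common prefixes already stored in the memory space.
prefixes = ["user10", "user20"]
print(compress_inserted_key("user2042", prefixes))   # (1, '42')
print(compress_inserted_key("order0001", prefixes))  # (None, 'order0001')
```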

In a second aspect, an embodiment of the present application further provides a database index compression apparatus that applies the database index compression method according to the embodiment of the first aspect of the present application, the apparatus including:

the acquisition module is used for acquiring an index tree of the database; wherein the index tree comprises a plurality of leaf nodes;

the extraction module is used for extracting the common prefix from the leaf nodes in the leaf node list; the process for extracting the common prefix comprises the following steps: determining a first boundary and a second boundary of an initial sliding window in a leaf node list, and calculating a first benefit of the initial sliding window; the initial value of the first boundary is a first index value of a first leaf node, and the initial value of the second boundary is a next index value selected based on the first boundary; calculating a second benefit of the initial sliding window according to a preset boundary obtained by moving the second boundary to the next index value; if the second benefit is greater than or equal to the first benefit, the initial sliding window meets a preset condition, the second boundary of the initial sliding window is updated to obtain a second sliding window, the second sliding window is used as the initial sliding window, and the process of extracting the common prefix is executed until the initial sliding window does not meet the preset condition, whereupon the common prefix of the initial sliding window is obtained;

the compression module is used for compressing the leaf nodes in the initial sliding window by utilizing the common prefix to obtain an index table; the index table comprises a common prefix table and compressed leaf nodes, and the common prefix table comprises the common prefix.

In some embodiments of the application, the apparatus further comprises:

the query module is used for responding to the index query instruction and acquiring a target leaf node based on the index table; acquiring a common prefix in a memory space according to the pointer in the target leaf node; and combining the common prefix and the index part in the target leaf node to obtain the index value of the target leaf node.

In some embodiments of the application, the apparatus further comprises:

the inserting module is used for obtaining an index value to be inserted and inserting the index value to be inserted into the target leaf node; and if the common prefix exists in the memory space and is matched with the index value of the target leaf node, compressing the target leaf node by using the common prefix.

In a third aspect, an embodiment of the present application further provides an electronic device, including a memory, and a processor, where the memory stores a computer program, and the processor implements the database index compression method according to the embodiment of the first aspect of the present application when executing the computer program.

In a fourth aspect, an embodiment of the present application further provides a computer readable storage medium, where a program is stored, where the program is executed by a processor to implement a database index compression method according to an embodiment of the first aspect of the present application.

The embodiment of the application at least comprises the following beneficial effects:

The embodiment of the application provides a database index compression method and device, an electronic device and a storage medium. The method obtains an index tree of a database, the index tree comprising a plurality of leaf nodes and each leaf node comprising at least one index value, and then extracts common prefixes from the leaf nodes in a leaf node list. Specifically, the process of extracting the common prefix includes: determining a first boundary and a second boundary of an initial sliding window in the leaf node list, and calculating a first benefit of the initial sliding window, where the initial value of the first boundary is the first index value of the first leaf node in the leaf node list and the initial value of the second boundary is the next index value selected based on the first boundary. The second boundary is moved to the next index value to obtain a preset boundary, and a second benefit of the initial sliding window is calculated according to the first boundary and the preset boundary. If the second benefit is greater than or equal to the first benefit, the initial sliding window meets the preset condition, and the second boundary of the initial sliding window is updated to obtain a second sliding window. The second sliding window is taken as the initial sliding window, and the process of extracting the common prefix is executed iteratively until the initial sliding window no longer meets the preset condition, whereupon the common prefix of the initial sliding window is obtained. Finally, the leaf nodes in the initial sliding window are compressed using the common prefix to obtain an index table comprising a common prefix table and the compressed leaf nodes. By setting a sliding window, common prefixes are identified dynamically among the leaf nodes of the index tree according to the benefit of the sliding window, and the index values of the leaf nodes in different sliding windows are compressed with different common prefixes. This makes effective use of the fact that an ordered index contains different continuous intervals that each share a common prefix, and improves the compression rate of the database index.

Additional aspects and advantages of the application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application.

Drawings

The foregoing and/or additional aspects and advantages of the application will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:

FIG. 1 is a flow chart of a database index compression method according to an embodiment of the present application;

FIG. 2 is a schematic diagram of index construction provided by one embodiment of the present application;

FIG. 3 is a schematic view of a leaf node list provided by one embodiment of the present application;

FIG. 4 is a schematic flow chart of step S203 in FIG. 1;

FIG. 5 is a schematic flow chart of step S201 in FIG. 1;

FIG. 6 is a schematic flow chart of step S103 in FIG. 1;

FIG. 7 is another schematic flow chart of step S203 in FIG. 1;

FIG. 8 is a flowchart of a database index compression method according to another embodiment of the present application;

FIG. 9 is a flowchart of a database index compression method according to another embodiment of the present application;

FIG. 10 is a schematic diagram of data index compression provided by one embodiment of the present application;

FIG. 11 is a schematic diagram of a database index compression device according to an embodiment of the present application;

FIG. 12 is a schematic diagram of a database index compression device according to another embodiment of the present application;

FIG. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Reference numerals: acquisition module 100, extraction module 200, compression module 300, query module 400, insertion module 500, electronic device 1000, processor 1001, memory 1002.

Detailed Description

The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.

Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the application.

In the description of the present application, it should be understood that references to orientation descriptions such as upper, lower, front, rear, left, right, etc. are based on the orientation or positional relationship shown in the drawings, are merely for convenience of description of the present application and to simplify the description, and do not indicate or imply that the apparatus or elements referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus should not be construed as limiting the present application.

In the description of the present application, "several" means one or more and "a plurality" means two or more; "greater than", "less than", "exceeding" and the like are understood as excluding the stated number, while "above", "below", "within" and the like are understood as including it. The terms "first" and "second" are used only to distinguish technical features and should not be construed as indicating or implying relative importance, the number of the indicated technical features, or their order of precedence.

In the description of the present application, unless explicitly defined otherwise, terms such as arrangement, installation, connection, etc. should be construed broadly and the specific meaning of the terms in the present application can be reasonably determined by a person skilled in the art in combination with the specific contents of the technical scheme.

In a relational database, an index is a separate physical storage structure that orders the values of one or more columns in a data table, and using an index can effectively improve data query efficiency. There are several types of index: each table in the database has one clustered index, indexes other than the clustered index are called secondary indexes, and each table may have multiple secondary indexes. To save the memory space required for storing indexes and to reduce the amount of data transferred during query processing, the related art applies different compression techniques to indexes, including variable-length encoding and fixed-length index key compression.

The database index compression techniques of the related art reduce index decompression speed while maintaining a good compression ratio, which greatly increases query time and degrades performance. They are mostly aimed at online analytical processing (Online Analytical Processing, OLAP) systems, which exploit the skew in the distribution of column values and compress by assigning shorter codes to more frequent values, similar to Huffman coding in entropy coding. For example, memory-efficient data structures have been designed, but these require major changes to existing database systems and are not fully applicable to online transaction processing (Online Transaction Processing, OLTP) systems. As another example, data with a low access frequency is compressed with entropy coding according to the access patterns in the database, but for many OLTP applications this is inaccurate and leads to reduced performance. In an OLTP system, the ordered index contains different continuous intervals that each share a common prefix, whereas the related algorithms can only find a common prefix of fixed length. It is therefore difficult to identify the best common prefix for different indexes, and the compression rate of database indexes in OLTP systems is low.

Based on the above, the embodiment of the application provides a database index compression method and device, an electronic device and a storage medium. The method obtains an index tree of a database, the index tree comprising a plurality of leaf nodes and each leaf node comprising at least one index value, and then extracts common prefixes from the leaf nodes in a leaf node list. Specifically, the process of extracting the common prefix includes: determining a first boundary and a second boundary of an initial sliding window in the leaf node list, and calculating a first benefit of the initial sliding window, where the initial value of the first boundary is the first index value of the first leaf node in the leaf node list and the initial value of the second boundary is the next index value selected based on the first boundary. The second boundary is moved to the next index value to obtain a preset boundary, and a second benefit of the initial sliding window is calculated according to the first boundary and the preset boundary. If the second benefit is greater than or equal to the first benefit, the initial sliding window meets the preset condition, and the second boundary of the initial sliding window is updated to obtain a second sliding window. The second sliding window is taken as the initial sliding window, and the process of extracting the common prefix is executed iteratively until the initial sliding window no longer meets the preset condition, whereupon the common prefix of the initial sliding window is obtained. Finally, the leaf nodes in the initial sliding window are compressed using the common prefix to obtain an index table comprising a common prefix table and the compressed leaf nodes. By setting a sliding window, common prefixes are identified dynamically among the leaf nodes of the index tree according to the benefit of the sliding window, and the index values of the leaf nodes in different sliding windows are compressed with different common prefixes. This makes effective use of the fact that an ordered index contains different continuous intervals that each share a common prefix, and improves the compression rate of the database index.

The embodiment of the application provides a database index compression method, a device, electronic equipment and a storage medium, which are described through the following embodiments, beginning with the database index compression method.

The embodiment of the application provides a database index compression method, which relates to the technical field of indexing and in particular to the technical field of database indexes. The database index compression method provided by the embodiment of the application can be applied to a terminal, to a server, or to a computer program running in the terminal or the server. For example, the computer program may be a native program or a software module in an operating system; it may be a local application, i.e., a program that needs to be installed in an operating system to run, such as a client that supports database index compression; or it may be a program that only needs to be downloaded into a browser environment to run. In general, the computer program may be any form of application, module or plug-in. The terminal communicates with the server through a network. The database index compression method may be performed by the terminal or the server, or by the terminal and the server in cooperation.

In some embodiments, the terminal may be a smart phone, tablet computer, notebook computer, desktop computer, smart watch, or the like. The server may be an independent server, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and big data and artificial intelligence platforms; or it may be a service node in a blockchain system, where the service nodes form a peer-to-peer (Peer-To-Peer, P2P) network, and the P2P protocol is an application layer protocol that runs on top of the transmission control protocol (Transmission Control Protocol, TCP). The server side of the database index compression system may be installed on the server and used to interact with the terminal; for example, corresponding software may be installed on the server, which may be an application implementing the database index compression method, but is not limited to the above form. The terminal and the server may be connected through Bluetooth, universal serial bus (Universal Serial Bus, USB), a network, or another communication connection mode, which is not limited herein.

The application is operational with numerous general purpose or special purpose computer system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

The database index compression method in the embodiment of the application is described below.

Referring to FIG. 1, an embodiment of the present application provides a database index compression method applied to an OLTP database system, which may include, but is not limited to, the following steps S101 to S103.

Step S101, obtaining an index tree of a database.

In some embodiments, before index compression, an index must be constructed for the key value list in the database to generate the corresponding index tree. Specifically, during index construction the key values in the list are processed one by one according to the designated index field, and each key value is inserted at the appropriate position in the index tree. The resulting index tree may be, for example, a B+ tree (B+-tree), a B tree (B-tree), or a Bw tree (Bw-tree), which is not limited in this embodiment.

Referring to the index construction diagram shown in FIG. 2, in the index construction stage the database system uses a constructor to build an index from the input key value list and finally outputs an index tree including a plurality of leaf nodes. Taking the B+ tree as an example, the nodes in a B+ tree are divided into leaf nodes and non-leaf nodes (internal nodes). The leaf nodes are where data is actually stored in the B+ tree: each leaf node contains several keys and their corresponding data pointers, in the form of <key, pointer> two-tuples, together with pointers to neighboring leaf nodes. The leaf nodes are linked in key order into an ordered linked list, which facilitates range queries and sequential access. The non-leaf nodes form the index part of the B+ tree; they do not store actual data and contain only a certain number of keys and corresponding subtree pointers. Each such key is the largest or smallest key in its subtree and is used to steer the search. By comparing the key being searched for with the keys of the non-leaf nodes, the subtree to be accessed can be determined, and through level-by-level comparison the leaf node holding the required data is finally reached. This embodiment is not limited in this respect.
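The leaf level of such a tree can be pictured with the following minimal Python sketch. It models only the linked list of leaf nodes, not a full B+ tree; the class name `LeafNode`, its fields and the helper `iter_keys` are hypothetical, and data pointers are represented by plain integers.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class LeafNode:
    # <key, pointer> pairs, kept in key order inside the leaf.
    entries: List[Tuple[str, int]] = field(default_factory=list)
    # Pointer to the neighboring leaf node on the right, if any.
    next_leaf: Optional["LeafNode"] = None


def iter_keys(head: Optional[LeafNode]) -> Iterator[str]:
    """Walk the linked leaf nodes from left to right and yield keys in order."""
    node = head
    while node is not None:
        for key, _pointer in node.entries:
            yield key
        node = node.next_leaf


# Two hypothetical leaf nodes linked in key order.
right = LeafNode(entries=[("cherry", 30), ("date", 40)])
left = LeafNode(entries=[("apple", 10), ("banana", 20)], next_leaf=right)
print(list(iter_keys(left)))  # ['apple', 'banana', 'cherry', 'date']
```

Walking the `next_leaf` links is what makes the range queries and sequential access mentioned above inexpensive, and it also yields the ordered sequence of index values on which prefix extraction operates.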

In some embodiments, an index tree of the database is obtained, the index tree comprising a plurality of leaf nodes, each of which stores at least one index value in order. The present application therefore compresses the values in the existing leaf nodes of the index while preserving their order and without introducing additional data structures, which is well suited to OLTP indexes.

Step S102, extracting common prefixes from the leaf nodes in the leaf node list.

The index tree comprises a plurality of leaf nodes, and the leaf nodes are linked according to the order of the stored data to form a leaf node list. Illustratively, referring to the schematic diagram of the leaf node list shown in FIG. 3, assume that each leaf node stores one index value, each column represents one leaf node, and the leftmost column represents the leftmost leaf node of the index tree. The index values are stored in order across the columns, and different leaf nodes may share the same prefix, which can be used as a common prefix. Index compression is achieved by extracting the common prefix from the leaf nodes in the leaf node list; the process of extracting the common prefix is described below.

Specifically, the process of extracting the common prefix may include, but is not limited to, the following steps S201 to S203.

Step S201, determining a first boundary and a second boundary of an initial sliding window in a leaf node list, and calculating a first benefit of the initial sliding window.

In some embodiments, the initial value of the first boundary of the initial sliding window is the first index value of the first leaf node in the leaf node list, for example the index value of the leftmost leaf node. The initial value of the second boundary of the initial sliding window is the next index value selected based on the first boundary; if each leaf node has only one index value, the initial value of the second boundary is the index value of the next leaf node after the first boundary, that is, the second leaf node in the leaf node list. This embodiment is not limited in this respect.

In some embodiments, after the first boundary and the second boundary of the initial sliding window have been determined in the leaf node list, the first benefit of the initial sliding window is calculated. It will be appreciated that the benefit of a sliding window measures the value or importance of the data within the range of that window. The benefit may be calculated in different ways, depending on the requirements of the application and the nature of the data. For example, it may be calculated by counting the number of data entries within the window, by averaging the data, or by another aggregation operation (e.g., sum, maximum or minimum), which is not limited in this embodiment.

Step S202, calculating the second benefit of the initial sliding window according to the preset boundary obtained by the second boundary moving to the next index value.

In some embodiments, the preset boundary is obtained by moving the second boundary of the initial sliding window from the index value at which it is currently located to the next index value. In the same way that the first benefit of the initial sliding window is calculated from the first boundary and the second boundary, the second benefit of the initial sliding window is calculated from the first boundary and the preset boundary using the same calculation method.

Step S203, if the second benefit is greater than or equal to the first benefit, the initial sliding window meets the preset condition, the second boundary of the initial sliding window is updated to obtain a second sliding window, the second sliding window is taken as the initial sliding window, and the process of extracting the common prefix is executed until the initial sliding window does not meet the preset condition, whereupon the common prefix of the initial sliding window is obtained.

In some embodiments, the preset condition is that the second benefit of the initial sliding window is greater than or equal to the first benefit, and if the initial sliding window meets the preset condition, updating the second boundary of the initial sliding window to obtain the second sliding window. The second sliding window contains more index values than the initial sliding window, and the process of extracting the common prefix is repeatedly performed with the second sliding window as a new initial sliding window, i.e. steps S201 to S203 are repeatedly performed. Until the second benefit of the initial sliding window is smaller than the first benefit, that is, the initial sliding window does not meet the preset condition, the common prefix of the index value in the current initial sliding window is obtained, which is not limited in this embodiment.
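The loop formed by steps S201 to S203 can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the names `grow_window` and `Benefit` are hypothetical, each leaf node is assumed to hold one index value so that positions in a Python list stand in for leaf nodes, and the benefit function is passed in as a parameter because this embodiment leaves its exact form open.

```python
from typing import Callable, Sequence

# Hypothetical signature: benefit(keys, first, boundary) -> numeric benefit.
Benefit = Callable[[Sequence[str], int, int], int]


def grow_window(keys: Sequence[str], first: int, benefit: Benefit) -> int:
    """Return the final second boundary of the window that starts at `first`.

    Assumes one index value per leaf node. The second boundary keeps moving
    to the next index value while the second benefit is greater than or
    equal to the first benefit.
    """
    second = first + 1
    if second >= len(keys):
        return first  # no neighbour left: the window holds a single key
    first_benefit = benefit(keys, first, second)          # step S201
    while second + 1 < len(keys):
        preset = second + 1                               # step S202
        second_benefit = benefit(keys, first, preset)
        if second_benefit < first_benefit:                # condition not met
            break
        second, first_benefit = preset, second_benefit    # step S203
    return second
```

Keeping the benefit function as a parameter lets the same loop be reused with any of the benefit calculations mentioned above; the second benefit of one iteration simply becomes the first benefit of the next.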

Step S103, compressing leaf nodes in the initial sliding window by using the common prefix to obtain an index table.

In some embodiments, the common prefix is the longest common prefix of the index values stored by the leaf nodes in the initial sliding window, and each leaf node in the corresponding initial sliding window is compressed by using the common prefix, so as to obtain an index table including the common prefix table and the compressed leaf nodes. Specifically, the index table may be similar to a dictionary structure: the compressed leaf nodes within one initial sliding window share the same common prefix, and the common prefixes corresponding to different initial sliding windows in the leaf node list are stored in the memory as dictionary elements, which is not limited in this embodiment.

As shown in fig. 4, in some embodiments of the present application, the step S203 may include, but is not limited to, the following steps S301 to S303.

In step S301, if the second benefit is smaller than the first benefit, the leaf node where the second boundary is located is selected as the initial leaf node.

In some embodiments, if the second benefit of the initial sliding window is smaller than the first benefit, the initial sliding window does not meet the preset condition, and the leaf node where the second boundary is located is selected as the initial leaf node correspondingly. For example, assuming that each leaf node corresponds to an index value, there are 7 leaf nodes in the list of leaf nodes, the first boundary of the initial sliding window is the first leaf node, the second boundary is the fourth leaf node, and the first benefit calculated from the first boundary and the second boundary is assumed to be 9. The preset boundary of the initial sliding window is the next leaf node of the second boundary, i.e., the fifth leaf node, assuming that the second benefit calculated from the first boundary and the preset boundary is 8. At this time, the second benefit is smaller than the first benefit, and the initial sliding window does not meet the preset condition, so that the leaf node of the second boundary, namely, the fourth leaf node is selected as the initial leaf node.

Step S302, starting from the initial leaf node, a process of extracting the common prefix is performed based on the leaf node list, resulting in a common prefix corresponding to the initial leaf node.

In some embodiments, starting from an initial leaf node, the initial leaf node is taken as a first boundary of an initial sliding window, and the next leaf node of the initial leaf node is taken as a second boundary of the initial sliding window based on the list of leaf nodes. The process of extracting the common prefix is re-performed, that is, steps S201 to S203 are re-performed, and finally the common prefix corresponding to the initial sliding window corresponding to the initial leaf node is obtained, which is not limited in this embodiment.

Step S303, until the last leaf node of the list of leaf nodes is reached, at least one common prefix is obtained.

In some embodiments, the above process is performed on the leaf nodes in the leaf node list until the last leaf node in the leaf node list is reached, resulting in at least one common prefix. The common prefixes are stored in the memory space so that they can be queried quickly, which improves the query efficiency of the database. In this way, the best common prefix of each continuous interval of leaf nodes in the leaf node list is searched for dynamically, and the index compression rate of the database can be effectively improved.

Referring to fig. 5, in some embodiments of the present application, the above step S201 may include, but is not limited to, the following steps S401 to S403.

Step S401, matching the index value of the first boundary and the index value of the second boundary to obtain the longest common prefix of the initial sliding window.

In some embodiments, the leaf nodes store index values in order, and the index values of the leaf nodes corresponding to the first boundary of the initial sliding window and the index values of the leaf nodes corresponding to the second boundary are matched to obtain the longest common prefix of the initial sliding window. Specifically, the index values stored by the leaf nodes are already ordered according to the dictionary sequence, so that the longest common prefix of the initial sliding window can be obtained only by performing character-by-character matching on the first boundary and the second boundary, and every leaf node in the initial sliding window does not need to be matched pairwise.

Illustratively, the first boundary of the initial sliding window is the first leaf node in the leaf node list, the index value is BLE1Cf, the second boundary is the fourth leaf node in the leaf node list, and the index value is BLE2GmC, so the longest common prefix corresponding to the initial sliding window is BLE.
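The observation in step S401, that only the two boundaries need to be matched when the index values are in dictionary order, can be checked with a short sketch. The helper names `lcp` and `lcp_all` are hypothetical, and the keys are sorted here under Python's default code-point ordering, which is an assumption about what "dictionary sequence" means.

```python
from typing import Sequence


def lcp(a: str, b: str) -> str:
    """Longest common prefix of two strings, matched character by character."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return a[:n]


def lcp_all(keys: Sequence[str]) -> str:
    """Reference version: fold lcp over every key in the window."""
    prefix = keys[0]
    for key in keys[1:]:
        prefix = lcp(prefix, key)
    return prefix


# Index values from the example, put into code-point order for this check.
window = sorted(["BLE1Cf", "BLE1pni3", "BLE1F", "BLE2GmC"])
# For sorted keys, matching the first and last key is sufficient.
boundary_prefix = lcp(window[0], window[-1])
```

Here `boundary_prefix` and `lcp_all(window)` both evaluate to `"BLE"`, so the boundary comparison replaces a pass over every key in the window.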

Step S402, calculating a window length of the initial sliding window based on the first boundary and the second boundary.

In some embodiments, the window length of the initial sliding window is calculated based on the first boundary and the second boundary. Specifically, the window length is obtained by subtracting the sequence number of the leaf node where the first boundary is located from the sequence number of the leaf node where the second boundary is located. Illustratively, the second boundary of the initial sliding window is the fourth leaf node and the first boundary is the first leaf node, so the window length of the initial sliding window = 4 - 1 = 3, which is not limited in this embodiment.

Step S403, multiplying the prefix length of the longest common prefix by the window length to obtain a first benefit.

In some embodiments, the first benefit may be calculated by multiplying the prefix length of the longest common prefix of the initial sliding window by the window length of the initial sliding window. Illustratively, the longest common prefix of the initial sliding window is BLE, so the prefix length is 3, and the window length of the initial sliding window is 3, so the calculated first benefit = 3 × 3 = 9. It will be appreciated by those skilled in the art that this embodiment is not limited in this regard.
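Steps S401 to S403 can be combined into one small function. This is a sketch under the assumption of one index value per leaf node, with list positions standing in for leaf node sequence numbers; the name `window_benefit` is hypothetical, and the standard-library function `os.path.commonprefix` is used only because it performs a character-by-character prefix match on strings.

```python
import os.path
from typing import Sequence


def window_benefit(keys: Sequence[str], first: int, second: int) -> int:
    """Benefit = prefix length of the boundary LCP times the window length."""
    # Step S401: match the index values at the two boundaries.
    prefix = os.path.commonprefix([keys[first], keys[second]])
    # Step S402: window length is the difference of the boundary positions.
    window_length = second - first
    # Step S403: multiply prefix length by window length.
    return len(prefix) * window_length


# Index values of the first four leaf nodes, in the order given in the text.
keys = ["BLE1Cf", "BLE1pni3", "BLE1F", "BLE2GmC"]
example_benefit = window_benefit(keys, 0, 3)  # boundaries: leaf 1 and leaf 4
```

With the first boundary at the first leaf node and the second boundary at the fourth, the prefix is BLE and the window length is 3, which gives the benefit of 9 stated above.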

Referring to fig. 6, in some embodiments of the present application, the step S103 may include, but is not limited to, the following steps S501 to S502.

In step S501, an index portion is selected from index values of leaf nodes according to a common prefix.

In some embodiments, the index portion is selected from index values of leaf nodes corresponding to the list of leaf nodes according to a common prefix of the initial sliding window. It will be appreciated that the index value of a leaf node is made up of an index portion and a common prefix, the common prefix being common to each leaf node in the initial sliding window, and the index portion being unique to each leaf node and having a different value.

Illustratively, it is assumed that the leaf nodes included in an initial sliding window are the first four leaf nodes in the leaf node list, namely: leaf node 1, storing the index value BLE1Cf; leaf node 2, storing the index value BLE1pni3; leaf node 3, storing the index value BLE1F; and leaf node 4, storing the index value BLE2GmC. The common prefix of the initial sliding window is BLE, and correspondingly, the index portion of the leaf node 1 is 1Cf, the index portion of the leaf node 2 is 1pni3, the index portion of the leaf node 3 is 1F, and the index portion of the leaf node 4 is 2GmC, which is not limited in this embodiment.

Step S502, storing the index portion and the pointer to the storage location of the common prefix in the memory space with the leaf node, and generating the index table based on the leaf node.

In some embodiments, the leaf nodes are utilized to store an index portion and pointers to storage locations in memory space for common prefixes, and an index table is generated based on the leaf nodes. The leaf nodes in the index tree are divided into continuous intervals according to the optimal common prefix based on the initial sliding window, and the common prefix is uniformly stored, so that the index compression of the database is realized, and the index compression rate of the database is effectively improved.
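Steps S501 and S502 can be illustrated with the following sketch. The names `CompressedLeaf` and `compress_window` are hypothetical, and an integer position in a Python list stands in for the pointer to the storage location of the common prefix in the memory space; a real implementation would choose its own layout.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CompressedLeaf:
    prefix_ref: int   # stands in for the pointer to the stored common prefix
    index_part: str   # index value with the common prefix removed


def compress_window(keys: Sequence[str], prefix: str,
                    prefix_table: List[str]) -> List[CompressedLeaf]:
    """Store `prefix` once and keep only the index portion in each leaf."""
    for key in keys:
        if not key.startswith(prefix):
            raise ValueError(f"{key!r} does not start with {prefix!r}")
    prefix_table.append(prefix)          # the common prefix is stored uniformly
    ref = len(prefix_table) - 1
    # Step S501: select the index portion; step S502: store it with the reference.
    return [CompressedLeaf(ref, key[len(prefix):]) for key in keys]


prefix_table: List[str] = []
leaves = compress_window(["BLE1Cf", "BLE1pni3", "BLE1F", "BLE2GmC"],
                         "BLE", prefix_table)
```

After the call, `prefix_table` holds the single entry BLE and the four leaves hold the index portions 1Cf, 1pni3, 1F and 2GmC, matching the example above.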

Referring to fig. 7, in some embodiments of the present application, the step S203 may further include, but is not limited to, the following steps S601 to S602.

In step S601, the first boundary of the initial sliding window is taken as the first boundary of the second sliding window.

In some embodiments, updating the second boundary of the initial sliding window results in the second sliding window, in particular, keeping the first boundary of the initial sliding window unchanged, i.e., taking the first boundary of the initial sliding window as the first boundary of the second sliding window.

Step S602, taking the preset boundary as a second boundary of the second sliding window.

In some embodiments, the preset boundary is taken as the second boundary of the second sliding window, that is, the next leaf node of the leaf nodes where the second boundary of the initial sliding window is located is taken as the second boundary of the second sliding window. The initial sliding window is updated to obtain a second sliding window, so that the sliding window length is increased, and more leaf nodes are contained. By setting the sliding window and controlling the size of the sliding window according to the benefits, the optimal common prefix of each leaf node in the leaf node list can be dynamically searched.

Referring to fig. 8, in some embodiments of the present application, the database index compression method may further include, but is not limited to, the following steps S701 to S703.

In step S701, in response to the index query instruction, a target leaf node is acquired based on the index table.

In some embodiments, in response to an index query instruction, the key value to be searched is compared with the keywords in the current node, starting from the root node of the index tree. Illustratively, if the key value to be searched is smaller than the smallest keyword in the current node, the left subtree is entered; if the key value to be searched is greater than or equal to the largest keyword in the current node, the right subtree is entered; and if the key value to be searched lies between two keywords in the current node, the corresponding subtree is entered. The above steps are repeated until a leaf node is reached, and finally the target leaf node is obtained according to the index table, which is not limited in this embodiment.
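The descent from the root node to the target leaf node can be sketched as follows. The `Node` class and `find_leaf` function are hypothetical, and the keywords in internal nodes are assumed to be stored uncompressed; the text does not specify the node layout.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    keys: List[str]
    children: List["Node"] = field(default_factory=list)  # empty for a leaf


def find_leaf(root: Node, key: str) -> Node:
    """Walk from the root to the leaf node whose range covers `key`."""
    node = root
    while node.children:
        # bisect_right returns 0 when key is below the smallest keyword,
        # len(keys) when key is at or above the largest, and the in-between
        # child otherwise; a key equal to a keyword goes to the right.
        node = node.children[bisect_right(node.keys, key)]
    return node


leaf_a = Node(["BLE1Cf", "BLE1F"])
leaf_b = Node(["BLE2GmC", "BLK9"])
leaf_c = Node(["Cx1"])
root = Node(["BLE2GmC", "Cx1"], [leaf_a, leaf_b, leaf_c])
```

In this toy tree, looking up BLE1F lands in `leaf_a`, while BLE2GmC, being equal to the first keyword of the root, is routed to `leaf_b`.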

Step S702, according to the pointer in the target leaf node, the common prefix is acquired in the memory space.

According to the pointers stored in the target leaf nodes, the common prefixes stored in the corresponding positions can be quickly acquired in the memory space.

In step S703, the common prefix and the index portion in the target leaf node are combined to obtain the index value of the target leaf node.

In some embodiments, the obtained common prefix is combined with the index portion in the target leaf node, so that an index value of the target leaf node, that is, a target result of the index query instruction, can be obtained. Illustratively, the target leaf node is leaf node 1, the common prefix is BLE, the index portion is 1Cf, and BLE1Cf is obtained after combination.
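Steps S702 and S703 amount to a lookup followed by a concatenation. In this sketch a compressed leaf is modelled as a hypothetical `(prefix_ref, index_part)` pair, and a list position again stands in for the pointer into the memory space.

```python
from typing import List, Tuple


def restore_index_value(prefix_table: List[str],
                        leaf: Tuple[int, str]) -> str:
    """Rebuild the full index value of a compressed leaf."""
    prefix_ref, index_part = leaf
    common_prefix = prefix_table[prefix_ref]   # step S702: follow the pointer
    return common_prefix + index_part          # step S703: combine the parts


restored = restore_index_value(["BLE", "BL"], (0, "1Cf"))
```

For leaf node 1 of the example, the prefix BLE and the index portion 1Cf combine into BLE1Cf.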

Referring to fig. 9, in some embodiments of the present application, the database index compression method may further include, but is not limited to, the following steps S801 to S802.

Step S801, obtain the index value to be inserted, and insert the index value to be inserted into the target leaf node.

In some embodiments, an index value to be inserted is obtained; starting from the root node of the index tree, the index value to be inserted is compared with the keywords in each node to find the target leaf node, and the index value is inserted into the target leaf node.

In step S802, if there is a common prefix in the memory space that matches the index value of the target leaf node, the target leaf node is compressed using the common prefix.

In some embodiments, the index value of the target leaf node changes when the index value to be inserted is inserted into the target leaf node. Further, the memory space is searched for a common prefix that matches the index value of the target leaf node, and if such a common prefix exists, the target leaf node is compressed by using that common prefix, which is not limited in this embodiment.
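The matching in step S802 can be sketched as a scan over the stored common prefixes. The function name `compress_inserted` is hypothetical, and two points are assumptions not stated in the text: the table is scanned linearly, and when several stored prefixes match, the longest one is chosen.

```python
from typing import List, Optional, Tuple


def compress_inserted(value: str,
                      prefix_table: List[str]) -> Tuple[Optional[int], str]:
    """Return (prefix_ref, index_part); prefix_ref is None if nothing matches."""
    best: Optional[int] = None
    for ref, prefix in enumerate(prefix_table):
        if prefix and value.startswith(prefix):
            # Assumption: prefer the longest matching stored prefix.
            if best is None or len(prefix) > len(prefix_table[best]):
                best = ref
    if best is None:
        return None, value                      # leave the value uncompressed
    return best, value[len(prefix_table[best]):]


table = ["BLE", "BL"]
ref, part = compress_inserted("BLE1Dx", table)
```

With the table holding BLE and BL, the value BLE1Dx is stored as a reference to BLE plus the index portion 1Dx, while a value with no matching prefix is returned unchanged.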

The application is illustrated in the following in a complete example:

Referring to the data index compression diagram of fig. 10, an initial sliding window is initialized starting from the leftmost column of the leaf node list, with an initial benefit of 0. Assuming that each leaf node has one index value, the first boundary of the initial sliding window is leaf node 1 and the second boundary is leaf node 2. As the initial sliding window slides, the first benefit of the current initial sliding window is calculated. Specifically, as shown in the figure, the longest common prefix of the leaf node 1 and the leaf node 2 is BLE1, and the corresponding prefix length is 4, so that the first benefit = 4 × (2 - 1) = 4. The second benefit of the initial sliding window is then calculated according to the preset boundary obtained by moving the second boundary to the next leaf node, that is, according to the leaf node 1 and the leaf node 3. Specifically, as shown in the figure, the longest common prefix of the leaf node 1 and the leaf node 3 is BLE1, and the corresponding prefix length is 4, so that the second benefit = 4 × (3 - 1) = 8.

At this time, the second benefit is greater than the first benefit, and the initial sliding window meets the preset condition, so that the second boundary of the initial sliding window is updated to obtain the second sliding window. And repeatedly executing the process by taking the second sliding window as a new initial sliding window. Specifically, the second benefit of the initial sliding window is used as the first benefit of the second sliding window, namely, the first benefit of the new initial sliding window; the first boundary remains unchanged, and the preset boundary, namely the leaf node 3, is taken as the second boundary of the second sliding window, namely the leaf node 3 is taken as the second boundary of the new initial sliding window. Thus, the leaf nodes contained in the current initial sliding window are leaf node 1, leaf node 2, and leaf node 3.

Then, the next leaf node after the second boundary, namely the leaf node 4, is taken as the preset boundary, and the second benefit of the current initial sliding window is calculated. Specifically, as shown in the figure, the longest common prefix of the leaf node 1 and the leaf node 4 is BLE, and the corresponding prefix length is 3, so that the second benefit = 3 × (4 - 1) = 9. The second benefit of 9 is greater than the first benefit of 8, so the current initial sliding window meets the preset condition, and the window continues to slide and update.

As shown in the figure, when the preset boundary is the leaf node 5, the longest common prefix of the leaf node 1 and the leaf node 5 is BL, and the corresponding prefix length is 2, so the second benefit = 2 × (5 - 1) = 8. That is, the second benefit of 8 is smaller than the first benefit of 9 of the current initial sliding window, and the preset condition is not met. The first four leaf nodes are therefore compressed as the compression partition of one initial sliding window: the index value of each leaf node is divided into the common prefix 'BLE' and the index portion that remains after the common prefix is removed, the common prefix is stored in the memory space, and the index portion and a pointer to the storage location of the common prefix in the memory space are stored in the leaf node.

Further, the leaf node 5 serves as the initial leaf node, and the initial benefit is reset to 0. The next initial sliding window thus starts with the leaf node 5 as its first boundary and the leaf node 6 as its second boundary. As can be seen from the figure, the first benefit is 2 and the second benefit is 0, so that the leaf node 5 and the leaf node 6 are compressed as the compression partition of one initial sliding window, and the longest common prefix corresponding to this compression partition is BL. Thus, by setting the sliding window and controlling its size according to the benefit, the best common prefix of each continuous interval in the leaf node list is searched for dynamically until the last leaf node in the leaf node list is reached.
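The complete example can be reproduced with the sketch below. It assumes one index value per leaf node and, following this example, starts each new window at the leaf node after the last compressed one. The index values of leaf nodes 1 to 4 are those given earlier in the description; the values for leaf nodes 5 to 7 (`BLK9`, `BLm3`, `Cx1`) are hypothetical, chosen only so that the prefixes match the ones described for fig. 10. The names `benefit` and `partition`, and the empty prefix assigned to a lone trailing key, are likewise assumptions.

```python
import os.path
from typing import List, Sequence, Tuple


def benefit(keys: Sequence[str], first: int, boundary: int) -> int:
    """Prefix length of the two boundary keys times the window length."""
    prefix = os.path.commonprefix([keys[first], keys[boundary]])
    return len(prefix) * (boundary - first)


def partition(keys: Sequence[str]) -> List[Tuple[int, int, str]]:
    """Split `keys` into windows of (first, second, common_prefix)."""
    parts: List[Tuple[int, int, str]] = []
    first = 0
    while first < len(keys):
        second = first + 1
        if second >= len(keys):
            parts.append((first, first, ""))   # assumption: lone key, no prefix
            break
        current = benefit(keys, first, second)
        while second + 1 < len(keys):
            candidate = benefit(keys, first, second + 1)
            if candidate < current:            # preset condition not met
                break
            second, current = second + 1, candidate
        prefix = os.path.commonprefix([keys[first], keys[second]])
        parts.append((first, second, prefix))
        first = second + 1                     # next window starts after this one
    return parts


# Leaf nodes 1-4 from the description; leaf nodes 5-7 are hypothetical values.
keys = ["BLE1Cf", "BLE1pni3", "BLE1F", "BLE2GmC", "BLK9", "BLm3", "Cx1"]
windows = partition(keys)
```

With these keys the benefits for the first window are 4, 8, 9 and then 8, so the window stops at leaf node 4 with prefix BLE; the second window covers leaf nodes 5 and 6 with prefix BL, as in the walkthrough above.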

Experimental results show that the compression ratio of the method on an actual data set can reach 1.95x, while the construction delay is increased by 11% and the query bandwidth is reduced by 3.92%. It can be understood that the database index compression method of this embodiment has no prefix coverage problem in the ordered case, that is, among the extracted common prefixes there are no two common prefixes A and B such that A is a prefix of B. Therefore, when a key is queried, the common prefix can be matched exactly by binary search, and the compression partition to which the index belongs can then be determined.

The embodiment of the application also provides a database index compression device, which can realize the database index compression method, and is shown in fig. 11, and in some embodiments of the application, the database index compression device comprises:

an obtaining module 100, configured to obtain an index tree of a database; wherein the index tree comprises a plurality of leaf nodes, each leaf node comprising at least one index value;

an extracting module 200, configured to extract a common prefix from leaf nodes in the leaf node list; the process of extracting the common prefix includes: determining a first boundary and a second boundary of an initial sliding window in a leaf node list, and calculating a first benefit of the initial sliding window; the initial value of the first boundary is a first index value of a first leaf node, and the initial value of the second boundary is a next index value selected based on the first boundary; calculating a second benefit of the initial sliding window according to a preset boundary obtained by moving the second boundary to the next index value; if the second benefit is greater than or equal to the first benefit, the initial sliding window meets a preset condition, a second boundary of the initial sliding window is updated to obtain a second sliding window, the second sliding window is used as the initial sliding window, and the process of extracting the common prefix is executed until the initial sliding window does not meet the preset condition, and the common prefix of the initial sliding window is obtained;

a compression module 300, configured to compress leaf nodes in the initial sliding window by using a common prefix to obtain an index table; the index table comprises a common prefix table and compressed leaf nodes, and the common prefix table comprises a common prefix.

In some embodiments of the present application, referring to fig. 12, the apparatus further includes:

a query module 400, configured to obtain a target leaf node based on an index table in response to an index query instruction; acquiring a common prefix in a memory space according to a pointer in a target leaf node; and combining the common prefix and the index part in the target leaf node to obtain the index value of the target leaf node;

an inserting module 500, configured to obtain an index value to be inserted, and insert the index value to be inserted into a target leaf node; and if a common prefix in the memory space matches the index value of the target leaf node, compressing the target leaf node by using the common prefix.

The specific implementation manner of the database index compression device in this embodiment is basically the same as the specific implementation manner of the database index compression method, and will not be described in detail herein.

Fig. 13 shows an electronic device 1000 provided by an embodiment of the application. The electronic device 1000 includes a processor 1001, a memory 1002, and a computer program stored on the memory 1002 and executable on the processor 1001, the computer program, when executed, being used to perform the database index compression method described above.

The processor 1001 and the memory 1002 may be connected by a bus or other means.

The memory 1002, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs and non-transitory computer executable programs, such as the program implementing the database index compression method described in the embodiments of the present application. The processor 1001 implements the database index compression method described above by running the non-transitory software programs and instructions stored in the memory 1002.

The memory 1002 may include a storage program area and a storage data area. The storage program area may store an operating system and at least one application program required for a function; the storage data area may store data used in performing the database index compression method described above. In addition, the memory 1002 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some implementations, the memory 1002 optionally includes memory remotely located relative to the processor 1001, and this remote memory can be connected to the electronic device 1000 over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The non-transitory software programs and instructions required to implement the database index compression method described above are stored in the memory 1002, which when executed by the one or more processors 1001, perform the database index compression method described above, e.g., perform method steps S101 through S103 in fig. 1, and steps S201 through S203, method steps S301 through S303 in fig. 4, method steps S401 through S403 in fig. 5, method steps S501 through S502 in fig. 6, method steps S601 through S602 in fig. 7, method steps S701 through S703 in fig. 8, and method steps S801 through S802 in fig. 9.

The embodiment of the application also provides a storage medium, which is a computer readable storage medium, and the storage medium stores a computer program, and the computer program realizes the database index compression method when being executed by a processor. The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located relative to the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

According to the database index compression method, the device, the electronic equipment and the storage medium provided by the embodiments of the application, an index tree of the database is obtained, the index tree comprising a plurality of leaf nodes and each leaf node comprising at least one index value, and common prefixes are then extracted for the leaf nodes in a leaf node list. Specifically, the process of extracting the common prefix includes: determining a first boundary and a second boundary of the initial sliding window in the leaf node list, and calculating the first benefit of the initial sliding window, wherein the initial value of the first boundary is the first index value of the first leaf node in the leaf node list, and the initial value of the second boundary is the next index value selected based on the first boundary. The second boundary is moved to the next index value to obtain a preset boundary, and the second benefit of the initial sliding window is calculated according to the first boundary and the preset boundary. If the second benefit is greater than or equal to the first benefit, the initial sliding window meets the preset condition, and the second boundary of the initial sliding window is updated to obtain the second sliding window. The second sliding window is then taken as the initial sliding window, and the process of extracting the common prefix is executed iteratively until the initial sliding window does not meet the preset condition, thereby obtaining the common prefix of the initial sliding window. Finally, the leaf nodes in the initial sliding window are compressed by using the common prefix to obtain an index table comprising the common prefix table and the compressed leaf nodes.
By setting the sliding window and deciding whether to update it according to its benefit, common prefixes are identified dynamically for the leaf nodes in the index tree, and the index values of leaf nodes in different sliding windows are compressed by using different common prefixes. This makes effective use of the fact that different continuous intervals of an ordered index have different common prefixes, and improves the index compression rate of the database.

The embodiments described above are merely illustrative, wherein the units described as separate components may or may not be physically separate, i.e. may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

Those of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is known to those skilled in the art, the term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically include computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and may include any information delivery media.

It should also be appreciated that the various embodiments provided by the embodiments of the present application may be arbitrarily combined to achieve different technical effects. While the preferred embodiments of the present application have been described in detail, the present application is not limited to the above embodiments, and various equivalent modifications and substitutions can be made by those skilled in the art without departing from the spirit of the present application.

Claims

1. A method for compressing a database index, comprising:

2. The method of claim 1, wherein the initial sliding window does not satisfy the preset condition, comprising:

3. The method of database index compression according to claim 2, wherein the leaf nodes store index values in order; the calculating a first benefit of the initial sliding window includes:

4. The method of claim 3, wherein compressing the leaf nodes in the initial sliding window with the common prefix to obtain an index table comprises:

5. The method of claim 1, wherein updating the second boundary of the initial sliding window to obtain a second sliding window comprises:

6. The method of database index compression according to claim 4, further comprising:

7. The method of database index compression according to claim 4, further comprising:

8. A database index compression apparatus, characterized by applying the database index compression method according to any one of claims 1 to 7, comprising:

The acquisition module is used for acquiring an index tree of the database; wherein the index tree comprises a plurality of leaf nodes, each leaf node comprising at least one index value;

9. The database index compression apparatus of claim 8, wherein the apparatus further comprises:

10. The database index compression apparatus of claim 8, wherein the apparatus further comprises:

11. An electronic device comprising a memory, a processor, the memory storing a computer program, the processor implementing the database index compression method of any one of claims 1 to 7 when the computer program is executed.

12. A computer-readable storage medium, characterized in that the storage medium stores a program that is executed by a processor to implement the database index compression method according to any one of claims 1 to 7.