CN114969034A - Query method and device for ordered table of LSM-Tree architecture database - Google Patents


Info

Publication number: CN114969034A
Application number: CN202210533848.5A
Authority: CN (China)
Prior art keywords: current, iterator, data block, block, scanned
Legal status: Pending
Other languages: Chinese (zh)
Inventor: 杨海涛
Current Assignee: Beijing Oceanbase Technology Co Ltd
Original Assignee: Beijing Oceanbase Technology Co Ltd
Application filed by Beijing Oceanbase Technology Co Ltd
Priority to CN202210533848.5A
Publication of CN114969034A

Classifications

    • G06F16/2282 — Tablespace storage structures; management thereof
    • G06F16/2246 — Indexing structures; trees, e.g. B+trees
    • G06F16/24552 — Query execution; database cache management
    (all under G06F16/20 — information retrieval of structured data, e.g. relational data, within G06F16/00, Information retrieval; Database structures therefor)


Abstract

The present specification provides a method for querying ordered tables of an LSM-Tree architecture database. The query includes sequentially scanning one or more ordered tables according to a query condition, and scanning a given ordered table includes: taking that ordered table as the current ordered table; when the current iterator matches the type of the current ordered table, retaining the current iterator, and otherwise acquiring an iterator whose type matches the current ordered table as the current iterator; determining, by the current iterator, the target data block of the current scan; when the target data block of the current scan and the current cache data block are the same data block, retaining the current cache data block, and otherwise reading the target data block of the current scan into the cache to generate a new current cache data block; and scanning, by the current iterator, the current cache data block according to the query condition.

Description

Query method and device for ordered table of LSM-Tree architecture database
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a method and an apparatus for querying an ordered table of an LSM-Tree (Log-Structured Merge Tree) architecture database.
Background
The LSM-Tree is a layered, ordered, disk-oriented data structure whose core idea is to optimize write performance by exploiting the fact that batched sequential writes to disk are faster than random writes. It writes insertions, deletions, and modifications of data records into a series of ordered tables in an append-only manner, which greatly improves the write capacity of the database but sacrifices some read performance.
When a database system with an LSM-Tree storage architecture is queried, multiple ordered tables in different layers are generally scanned according to the query conditions, and the scan results of these ordered tables are then merged to obtain the query result. The efficiency of scanning the ordered tables determines the query speed and has a decisive influence on the read performance of an LSM-Tree architecture database.
Disclosure of Invention
In view of this, the present specification provides a method for querying ordered tables of an LSM-Tree architecture database, where the query includes sequentially scanning one or more ordered tables according to a query condition, and scanning a given ordered table includes:
taking that ordered table as the current ordered table;
when the current iterator matches the type of the current ordered table, retaining the current iterator; otherwise, acquiring an iterator whose type matches the current ordered table as the current iterator;
determining, by the current iterator, the target data block of the current scan;
when the target data block of the current scan and the current cache data block are the same data block, retaining the current cache data block; otherwise, reading the target data block of the current scan into the cache to generate a new current cache data block;
and scanning, by the current iterator, the current cache data block according to the query condition.
The present specification further provides an apparatus for querying ordered tables of an LSM-Tree architecture database, where the query includes sequentially scanning one or more ordered tables according to a query condition, and the portion of the apparatus used to scan a given ordered table includes:
a current ordered table unit, configured to take that ordered table as the current ordered table;
a current iterator unit, configured to retain the current iterator when the current iterator matches the type of the current ordered table, and otherwise acquire an iterator whose type matches the current ordered table as the current iterator;
a target data block unit, configured to determine, by the current iterator, the target data block of the current scan;
a current cache data block unit, configured to retain the current cache data block when the target data block of the current scan and the current cache data block are the same data block, and otherwise read the target data block of the current scan into the cache to generate a new current cache data block;
and a scan execution unit, configured to scan, by the current iterator, the current cache data block according to the query condition.
This specification provides a computer device, including a memory and a processor, where the memory stores a computer program executable by the processor, and when the processor runs the computer program, it performs the above method for querying ordered tables of an LSM-Tree architecture database.
The present specification also provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, it performs the above method for querying ordered tables of an LSM-Tree architecture database.
As can be seen from the above technical solutions, in the embodiments of the present specification, the current iterator is retained when it can be used for the current scan, and the current cache data block is used for scanning when it can serve the current scan. This avoids regenerating the iterator and the cache data block on every scan, accelerates the scanning of ordered tables by reusing existing resources, and improves the read performance of an LSM-Tree architecture database.
Drawings
Fig. 1 is a flowchart of scanning an ordered table in a method for querying ordered tables of an LSM-Tree architecture database in an embodiment of the present disclosure;
FIG. 2 is a flowchart of scanning an ordered table in an application example of the present specification;
FIG. 3 is a hardware block diagram of a device for carrying out embodiments of the present description;
Fig. 4 is a logical structure diagram of an apparatus for querying ordered tables of an LSM-Tree architecture database in an embodiment of the present specification.
Detailed Description
In a database system with an LSM-Tree storage architecture, data insertions, deletions, and modifications are written into a MemTable (an in-memory data organization structure). The MemTable uses a tree structure to keep keys in order. When the size of the MemTable exceeds a threshold, it is frozen while continuing to serve reads, and a new MemTable is generated so that write operations are not blocked. The data in the frozen MemTable is then persisted to disk as an L0-layer SSTable (Sorted Strings Table) file.
When the total volume or the number of L0-layer SSTable files exceeds a certain limit, they are periodically merged: data marked as deleted is eliminated and multi-version data is consolidated to generate L1-layer SSTable files. Similarly, when the L1-layer SSTable files become too large or too numerous, they are periodically merged into L2-layer SSTable files, and so on for L3-layer SSTable files and beyond.
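The merge (compaction) step described above can be sketched as follows; this is an illustrative model, not OceanBase code. Each SSTable is modeled as a list of (key, version, value) entries sorted by key, with `value = None` standing in for a delete tombstone:

```python
import heapq

def compact(sstables):
    """Merge sorted runs into one run: keep only the newest version of
    each key, and drop keys whose newest version is a delete marker."""
    # Each input run must be sorted by (key, -version) for heapq.merge.
    merged = heapq.merge(*sstables, key=lambda e: (e[0], -e[1]))
    result, last_key = [], object()
    for key, version, value in merged:
        if key == last_key:
            continue                 # an older version of a handled key
        last_key = key
        if value is not None:        # newest version is not a tombstone
            result.append((key, version, value))
    return result
```

Because every input run is already sorted, the merge is a streaming heap merge rather than a full re-sort, which is what makes periodic compaction affordable.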
An SSTable is a persistent, ordered, immutable key-value storage structure whose data is sorted by key; keys and values can be arbitrary byte arrays, and it supports lookup by a specified key and iterative traversal over a specified key range. Both the MemTable and the SSTables of each layer store data in ordered structures, and all of them are ordered tables.
An ordered table (MemTable or SSTable) internally contains a series of data blocks, each comprising a number of data rows, and usually provides a block index to locate the data block to be read. In some database products, such as OceanBase, the SSTable is composed of several macro blocks (Macro Blocks), which are the basic units of data-file write operations; the data inside a macro block is organized into several variable-length data blocks called micro blocks (Micro Blocks), each containing a number of data rows (Row), and a micro block is the smallest unit of data-file read operations. The corresponding macro and micro blocks can be found through the index of the ordered table in OceanBase.
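The macro/micro block layout just described can be modeled as in the following sketch; all class names are illustrative and do not reflect OceanBase's actual implementation:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class MicroBlock:
    """Smallest unit of a data-file read; rows are (key, value), sorted by key."""
    rows: List[Tuple[str, str]]

    def key_range(self):
        return (self.rows[0][0], self.rows[-1][0])

@dataclass
class MacroBlock:
    """Basic unit of a data-file write; holds several micro blocks."""
    micro_blocks: List[MicroBlock]

@dataclass
class SSTable:
    macro_blocks: List[MacroBlock]

    def locate(self, key):
        """Use the micro blocks' key ranges as a block index to find the
        (macro index, micro index) pair that may contain `key`."""
        for mi, macro in enumerate(self.macro_blocks):
            for ui, micro in enumerate(macro.micro_blocks):
                lo, hi = micro.key_range()
                if lo <= key <= hi:
                    return (mi, ui)
        return None
```

A real block index would be a sorted structure searched by binary search rather than a linear walk; the linear form is kept here only for readability.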
When a query operation is performed on the ordered tables, the MemTable in memory is searched first; if all data rows meeting the query condition are found, the query ends. Otherwise, the L0-layer SSTables are searched one by one, followed by one or more SSTables of the L1 layer, the L2 layer, and so on, until all data rows meeting the query condition are found or it is confirmed that no such data rows exist.
Each ordered table is typically searched using an iterator. Specifically, an iterator is generated for each ordered table to be searched; the iterator locates the data block that may contain the data rows being sought (i.e., the target data block of that ordered table) according to the query range, reads the data rows in the target data block, generates a cache data block corresponding to the target data block in the cache, and scans the cache data block according to the query condition. If one or more data rows meeting the query condition exist in the cache data block, a scan result is generated from those rows and returned; if no such data row exists, a scan result indicating no match is returned.
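A single scan as just described can be sketched as follows, using a deliberately simplified model in which the ordered table is a flat list of sorted row blocks (all names are illustrative):

```python
def scan_table(blocks, key, cache):
    """One iterator scan: locate the target block via a min/max block
    index, read it into `cache` if absent, then filter by the query key.
    `blocks` is a list of row lists, each sorted by key."""
    for block_id, rows in enumerate(blocks):
        if rows and rows[0][0] <= key <= rows[-1][0]:   # block index hit
            if block_id not in cache:                   # generate cache block
                cache[block_id] = list(rows)
            return [v for k, v in cache[block_id] if k == key]
    return []          # no candidate block: scan result "no match"
```

Passing the same `cache` dictionary across calls mimics the cache data block surviving between scans, which is the behavior the rest of this specification optimizes.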
Ordered tables of the same type are scanned using the same iterator. For a database with an LSM-Tree storage architecture, the MemTable is one type of ordered table, the L0-layer SSTables are another type, the L1-layer SSTables yet another, and so on; that is, if memory is counted as a layer, the ordered tables of each layer form one type.
In existing implementations, if the next ordered table needs to be searched in the current query operation, the resources used for searching the previous ordered table, such as its iterator and cache data block, are all released before the next ordered table is scanned; an iterator for the next ordered table is then regenerated and the search process is repeated with the new iterator. Such a scheme has no dependency between scans and is simple to implement.
However, an ordered table by nature stores data in order, and in some application scenarios the ordered tables scanned multiple times within the same query are often of the same type and therefore use the same iterator. Moreover, if the data accessed across multiple scans is ordered, the target data block of the previous scan is often also the target data block of the next scan, so the cache data block generated by the previous scan is likely to be hit directly in the next scan. For example, in nested-loop joins (Nestloop Join), secondary-index table lookback, and queries across multiple partitions, the same target data block is often scanned multiple times in succession. Taking nested-loop joins as an example: they find associated data rows in two tables, and for each data row in the left table the right table is scanned once to find the associated rows, so the right table is scanned repeatedly. In secondary-index table lookback and cross-partition queries, the indexed primary table and the partitions, respectively, need to be scanned multiple times. In these application scenarios, regenerating the iterator and the cache data block on every scan is inefficient.
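The effect can be illustrated with a toy probe trace: when the keys probed by consecutive right-table scans are close together, most probes land in the data block that is already cached (the block index below is purely illustrative):

```python
def block_of(key, block_size=4):
    """Toy block index: keys 0-3 in block 0, 4-7 in block 1, and so on."""
    return key // block_size

def repeat_block_hits(probe_keys):
    """Count probes that land in the block cached by the previous probe,
    i.e. scans that could reuse the current cache data block."""
    hits, cached = 0, None
    for key in probe_keys:
        block = block_of(key)
        if block == cached:
            hits += 1
        cached = block
    return hits
```

With an ordered probe sequence such as `[0, 1, 2, 5, 6]`, three of the five probes hit the previously cached block, which is exactly the locality the method in this specification exploits.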
Therefore, an embodiment of this specification provides a new method for querying ordered tables of an LSM-Tree architecture database. When the current ordered table is scanned, if the current iterator matches the type of the current ordered table and can thus be used to scan it, the current iterator is retained; if the current cache data block is the same data block as the target data block of the current scan, the current cache data block is retained for the current scan. In this way, the iterator or the cache data block can be reused across multiple scans under certain conditions, so they do not have to be generated anew for each scan; this exploits the characteristics of ordered-table query operations to increase scanning speed and improves the read performance of an LSM-Tree architecture database.
Embodiments of the present description may be implemented on any device with computing and storage capabilities, such as a mobile phone, a tablet computer, a PC (Personal Computer), a notebook computer, or a server; the functions in the embodiments of the present specification may also be implemented by a logical node running on two or more devices.
In an embodiment of this specification, querying the ordered tables of an LSM-Tree architecture database includes sequentially scanning one or more ordered tables according to a query condition. The query is performed based on a query condition, which may be any conditional expression capable of filtering the data rows of an ordered table; the embodiments of this specification impose no limitation here. For example, the condition may require that the value of a certain field in a data row falls within a set range, that the last modification time is later than a certain time point, and so on. The purpose of scanning is to extract the data in the ordered table that meets the query condition, or to confirm that no such data exists in the ordered table.
In the query of the LSM-Tree architecture database ordered table, the scanning process for a certain ordered table is shown in fig. 1.
Step 110, take the ordered table as the current ordered table.
The ordered table to be scanned is taken as the current ordered table. The current ordered table is the target object of the current scan.
Step 120, when the current iterator matches the type of the current ordered table, retain the current iterator; otherwise, acquire an iterator matching the current ordered table as the current iterator.
As previously described, a query of the ordered tables includes one or more scans performed one after another, each performed by an iterator, and ordered tables of the same type use the same iterator.
When the current scan starts, the current iterator is the iterator used in the previous scan. For the first scan, no current iterator exists; in this case an iterator matching the type of the current ordered table can be generated to perform the scan, following existing implementations, which is not described further. For a non-initial scan, if the current iterator matches the type of the current ordered table, it can be used for this scan without constructing a new iterator, and the current iterator is retained to perform the scan. If the type of the current iterator does not match the current ordered table, an iterator matching the current ordered table is acquired and used as the current iterator.
How the matching iterator is acquired may be determined by the specific implementation of the actual application scenario and is not limited here. In one example, if the current iterator does not match the type of the current ordered table, the current iterator is released and a new iterator matching the current ordered table type is generated.
In another example, an iterator pool may be built from used iterators. When the type of the current iterator does not match the type of the current ordered table, the current iterator is returned to the iterator pool and an iterator matching the type of the current ordered table is requested from the pool; if the request succeeds, the obtained iterator becomes the new current iterator, and otherwise an iterator matching the current ordered table type is generated as the new current iterator. In this way, used iterators are continually placed into the iterator pool for later requests rather than being regenerated, which speeds up iterator replacement.
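The iterator pool of the second example can be sketched as follows (class and method names are illustrative, not OceanBase code):

```python
from collections import defaultdict

class IteratorPool:
    """Used iterators are parked by ordered-table type and handed back
    out on later requests instead of being regenerated."""
    def __init__(self):
        self._pool = defaultdict(list)

    def release(self, table_type, iterator):
        """Return an iterator that no longer matches the current table."""
        self._pool[table_type].append(iterator)

    def acquire(self, table_type, factory):
        """Reuse a pooled iterator of this type if one exists; otherwise
        fall back to `factory` to generate a fresh one."""
        if self._pool[table_type]:
            return self._pool[table_type].pop()
        return factory(table_type)
```

The design choice here is that the pool is keyed by table type, matching the rule that ordered tables of the same type use the same kind of iterator.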
Step 130, determine, by the current iterator, the target data block of the current scan.
After the current iterator to be used for this scan is obtained, it determines the target data block of the current scan according to the query range. How the query range and the target data block are determined depends on factors such as how the data rows of the ordered table are stored in the actual application scenario and the specific implementation of the query operation; conventional approaches can be followed and are not described further.
Step 140, when the target data block of the current scan and the current cache data block are the same data block, retain the current cache data block; otherwise, read the target data block of the current scan into the cache to generate a new current cache data block.
When searching the target data block for data rows meeting the query condition, reading the content of the target data block into the cache speeds up the search. After the current iterator determines the target data block, the current cache data block is the cached copy of the previously scanned target data block. The current iterator compares whether the current cache data block (i.e., the previously scanned target data block) and the target data block of the current scan are the same data block; if they are, the current cache data block can be used directly for this scan and is retained. If not, the current cache data block can be released, the target data block of the current scan is read from the ordered table, and a new current cache data block is generated in the cache.
In some database products, the ordered table stores data in an organization of macro blocks and micro blocks. For these products, the data blocks read into the cache during scanning are micro blocks; in other words, the target data block is a target micro block, and the macro block to which it belongs is the target macro block. Since some information about the macro block is needed to locate a micro block within it, the iterator uses a data structure to store information about the macro block to which the micro block to be read belongs; in this embodiment of the specification, this data structure is referred to as the current macro block.
After the current iterator determines the target macro block of the current scan and the target micro block inside it, the current macro block holds the information of the previously scanned target macro block, and the current cache data block is the cached copy of the previously scanned target micro block. The current iterator compares whether the target micro block of the current scan is the same micro block as the previously scanned one; if so, the current cache data block is retained. If they are not the same micro block but belong to the same macro block, the current iterator directly uses the current macro block to locate the target micro block of the current scan, reads it, and generates a new current cache data block in the cache. If they do not belong to the same macro block, the current macro block is updated with the information of the target macro block of the current scan, the target micro block of the current scan is located through the updated current macro block, read, and a new current cache data block is generated in the cache.
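The three-way comparison above can be condensed into the following decision sketch, where a target block is identified by a (macro id, micro id) pair (names are illustrative):

```python
def block_reuse(prev, cur):
    """Decide which resources survive from the previous scan.
    `prev` / `cur` are (macro_id, micro_id) pairs; `prev` is None on the
    first scan of a freshly generated iterator."""
    if prev is not None and prev == cur:
        return "reuse current cache data block"
    if prev is not None and prev[0] == cur[0]:
        return "reuse current macro block, read new micro block"
    return "update current macro block, read new micro block"
```

Each branch corresponds to skipping progressively less work: a micro-block hit skips the read entirely, and a macro-block hit at least skips relocating the macro block.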
Step 150, scan, by the current iterator, the current cache data block according to the query condition.
The current iterator scans the current cache data block using the query condition. If one or more data rows meeting the query condition exist in the current cache data block, a scan result is generated from those rows and returned; if no such data row exists, a scan result indicating that no match was found is returned.
In some embodiments, a decoder may be employed inside the current iterator to decode the data rows in the cache data block and find the rows that meet the query condition. Before decoding and searching, the decoder is initialized: its internal state is constructed according to the target data block of the current scan so that it can correctly decode the cache data block.
In these embodiments, when the target data block of the current scan is the same data block as the previously scanned one, the internal state of the decoder can already decode it, initialization is unnecessary, and the decoder may directly decode and scan the current cache data block according to the query condition. When they are not the same data block, the internal state of the decoder must be rebuilt: the decoder is initialized according to the target data block of the current scan, and the initialized decoder then decodes and scans the current cache data block.
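The decoder-state reuse can be sketched as follows; `init_count` stands in for the cost of rebuilding the internal state, and all names are illustrative:

```python
class BlockDecoder:
    """Keeps per-block internal state; re-initializes only when the
    target data block changes between scans."""
    def __init__(self):
        self.block_id = None
        self.init_count = 0     # number of (costly) re-initializations

    def scan(self, block_id, rows, predicate):
        if block_id != self.block_id:   # new target block: rebuild state
            self.block_id = block_id
            self.init_count += 1
        return [row for row in rows if predicate(row)]
```

Repeated scans of the same block pay the initialization cost once, which is the saving described in the paragraph above.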
It should be noted that in implementations employing an iterator pool, if the current iterator does not match the type of the current ordered table and no matching iterator exists in the pool, an iterator matching the current ordered table type needs to be generated as the current iterator. Since a newly generated current iterator shares no target data block with the previous scan, the target data block of the current scan can be read directly into the cache to generate the current cache data block, after which the current iterator scans it with the query condition; neither reuse of the current cache data block nor the decoder's internal state needs to be considered.
It can be seen that the embodiments of this specification reuse resources at one or more levels across multiple scans, such as the iterator, the cache data block, the current macro block, and the decoder's internal state. Thus, when query operations such as nested-loop joins, secondary-index table lookback, or cross-partition queries are performed on ordered tables, redundant loading or initialization of these resources can be avoided, query efficiency is improved, and the read performance of an LSM-Tree architecture database is enhanced.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
In one application example of this specification, query operations on an OceanBase database employing the LSM-Tree architecture include nested-loop join, secondary-index table lookback, and cross-partition query. In these query operations, the MemTable, the L0-layer SSTables, the L1-layer SSTables, and so on are scanned layer by layer in sequence until all data rows meeting the query condition are found, or until it is confirmed that no such data rows exist.
The scanning flow for an ordered list in the above three query operations is shown in fig. 2.
Step 205: the scan starts; the ordered table to be scanned is taken as the current ordered table.
Step 210: determine whether the current iterator matches the type of the current ordered table; if so, retain the current iterator and go to step 225; if not, execute step 215.
Step 215: determine whether an iterator matching the current ordered table type exists in the iterator pool; if so, execute step 220; if not, go to step 260.
Step 220: return the current iterator to the iterator pool, request an iterator matching the current ordered table type from the pool, and use the obtained iterator as the new current iterator.
Step 225: determine, by the current iterator, the target macro block and target micro block of the current scan.
Step 230: determine whether the target micro block of the current scan is the same micro block as the previously scanned one; if so, retain the current cache data block and the decoder's internal state and go to step 255; if not, execute step 235.
Step 235: determine whether the target macro block of the current scan is the same macro block as the previously scanned one; if so, retain the current macro block and go to step 245; if not, execute step 240.
Step 240: update the current macro block according to the target macro block of the current scan, i.e., update the information stored in the current macro block to that of the target macro block of the current scan.
Step 245: locate the target micro block of the current scan through the current macro block, read the data rows in the target micro block, and generate a new current cache data block in the cache.
Step 250: initialize the decoder according to the target micro block of the current scan.
Step 255: the decoder scans the current cache data block according to the query condition, and the current scan ends.
Step 260: generate an iterator matching the type of the current ordered table as the current iterator, generate a current cache data block in the cache from the target micro block of the current scan, scan the current cache data block by the current iterator using the query condition, and end the current scan.
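The Fig. 2 decisions can be condensed into the following sketch, which tracks the reusable state between scans and reports which costly steps each scan skipped (purely illustrative; not OceanBase code):

```python
def scan_step(state, table_type, macro_id, micro_id):
    """One pass through the Fig. 2 flow. `state` persists across scans;
    returns the list of costly steps this scan avoided."""
    skipped = []
    if state.get("iter_type") == table_type:        # step 210, 'yes' branch
        skipped.append("iterator rebuild")
    state["iter_type"] = table_type
    prev = state.get("block")
    if prev == (macro_id, micro_id):                # step 230, 'yes' branch
        skipped.append("block read and decoder init")
    elif prev is not None and prev[0] == macro_id:  # step 235, 'yes' branch
        skipped.append("macro block switch")
    state["block"] = (macro_id, micro_id)
    return skipped
```

A run over a repeated probe sequence shows that after the first scan, every repeated scan of the same table type and block skips both the iterator rebuild and the block read.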
Corresponding to the above process, an embodiment of this specification further provides an apparatus for querying ordered tables of an LSM-Tree architecture database. The apparatus may be implemented by software, by hardware, or by a combination of the two. Taking a software implementation as an example, the apparatus in the logical sense is formed by the Central Processing Unit (CPU) of its host device reading the corresponding computer program instructions into memory and running them. In terms of hardware, in addition to the CPU, memory, and storage shown in Fig. 3, the device hosting the apparatus typically also includes other hardware, such as chips for wireless signal transmission and reception and/or boards for network communication.
Fig. 4 shows an apparatus for querying ordered tables of an LSM-Tree architecture database according to an embodiment of the present disclosure, where the query includes sequentially scanning one or more ordered tables according to a query condition. The portion of the apparatus used to scan a given ordered table includes a current ordered table unit, a current iterator unit, a target data block unit, a current cache data block unit, and a scan execution unit. The current ordered table unit takes that ordered table as the current ordered table. The current iterator unit retains the current iterator when it matches the type of the current ordered table, and otherwise acquires an iterator whose type matches the current ordered table as the current iterator. The target data block unit determines, by the current iterator, the target data block of the current scan. The current cache data block unit retains the current cache data block when the target data block of the current scan and the current cache data block are the same data block, and otherwise reads the target data block of the current scan into the cache to generate a new current cache data block. The scan execution unit scans, by the current iterator, the current cache data block according to the query condition.
In one example, the acquiring, by the current iterator unit, of an iterator matching the type of the current ordered table as the current iterator includes: returning the current iterator to an iterator pool, and requesting from the iterator pool an iterator matching the type of the current ordered table as the new current iterator.
In the foregoing example, the apparatus may further include an iterator adding unit, configured to, when the current iterator does not match the type of the current ordered table and the iterator pool contains no iterator matching that type, generate an iterator matching the type of the current ordered table as the current iterator, read the target data block of the current scan into the cache to generate the current cache data block, and scan, by the current iterator, the current cache data block according to the query condition.
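The iterator-pool behavior (release on type change, reuse when a matching iterator is pooled, create only when none exists) could look like the sketch below. The class and method names (`IteratorPool`, `acquire`, `release`) are assumptions for illustration.

```python
class Iterator:
    """Hypothetical iterator tagged with the ordered-table type it handles."""
    def __init__(self, table_type):
        self.table_type = table_type


class IteratorPool:
    """Sketch of an iterator pool: released iterators are kept per type,
    so switching between table types reuses objects instead of allocating."""
    def __init__(self):
        self._free = {}  # table_type -> list of idle iterators

    def release(self, iterator):
        self._free.setdefault(iterator.table_type, []).append(iterator)

    def acquire(self, table_type):
        idle = self._free.get(table_type)
        if idle:
            return idle.pop()        # matching iterator found in the pool
        return Iterator(table_type)  # none pooled: generate a new one


pool = IteratorPool()
memtable_it = pool.acquire("memtable")  # pool empty: a new iterator is created
pool.release(memtable_it)               # table type changed: return it to the pool
reused = pool.acquire("memtable")       # same type requested again: object reused
fresh = pool.acquire("sstable")         # no pooled sstable iterator: new object
```

Pooling per type avoids repeated construction cost when a query alternates between tables of the same type, while the create-if-missing branch corresponds to the iterator adding unit.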
Optionally, the target data block is a micro block belonging to a certain macro block. The target data block unit is specifically configured to determine, by the current iterator, the target macro block of the current scan and the target micro block inside it. The current cache data block unit is specifically configured to: when the target micro block of the current scan and the target micro block of the previous scan are the same micro block, retain the current cache data block; when they are not the same micro block but belong to the same macro block, locate the target micro block of the current scan based on the current macro block and read it into the cache to generate a new current cache data block; and when they are neither the same micro block nor in the same macro block, update the current macro block to the macro block to which the target micro block of the current scan belongs, locate that micro block based on the updated current macro block, and read it into the cache to generate a new current cache data block.
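The three macro/micro-block cases can be made concrete with a small sketch. The names (`MicroBlockCache`, `open_macro`, `read_micro`) are illustrative assumptions, not the patent's API.

```python
class MicroBlockCache:
    """Sketch of the three micro-block cases: same micro block -> keep cache;
    same macro block -> only relocate the micro block; different macro block ->
    reopen the macro block first, then read the micro block."""
    def __init__(self, open_macro, read_micro):
        self._open_macro = open_macro  # macro_id -> macro-block handle
        self._read_micro = read_micro  # (macro_handle, micro_id) -> rows
        self._macro_id = None
        self._macro = None
        self._micro_id = None
        self._micro = None

    def get(self, macro_id, micro_id):
        # Case 1: same micro block as the previous scan -> retain the cache.
        if (macro_id, micro_id) == (self._macro_id, self._micro_id):
            return self._micro
        # Case 3: different macro block -> update the current macro block first.
        if macro_id != self._macro_id:
            self._macro = self._open_macro(macro_id)
            self._macro_id = macro_id
        # Case 2 (and the tail of case 3): locate the micro block within the
        # current macro block and read it into the cache.
        self._micro = self._read_micro(self._macro, micro_id)
        self._micro_id = micro_id
        return self._micro


macro_opens, micro_reads = [], []

def open_macro(macro_id):
    macro_opens.append(macro_id)
    return f"handle:{macro_id}"

def read_micro(macro_handle, micro_id):
    micro_reads.append((macro_handle, micro_id))
    return [micro_id]

cache = MicroBlockCache(open_macro, read_micro)
cache.get("M1", "u1")  # new macro block, new micro block
cache.get("M1", "u1")  # case 1: fully served from the cache
cache.get("M1", "u2")  # case 2: same macro block, different micro block
cache.get("M2", "u2")  # case 3: different macro block
```

The counters show that the macro block is reopened only when the macro block changes, and the micro block is re-read only when the micro block changes.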
Optionally, the current iterator includes a decoder. The scan execution unit is specifically configured to: when the target data block of the current scan and the target data block of the previous scan are the same data block, decode and scan the current cache data block by the decoder according to the query condition; and when they are not the same data block, initialize the decoder according to the target data block of the current scan, and then decode and scan the current cache data block by the initialized decoder.
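A minimal sketch of the decoder-reuse rule follows; the `Decoder` class and its methods are hypothetical stand-ins, and the real decoding work (decompression, column decoding) is abstracted away.

```python
class Decoder:
    """Hypothetical per-iterator decoder; its (possibly expensive) setup,
    e.g. loading column metadata or dictionaries, runs only on block change."""
    def __init__(self):
        self._block_id = None
        self.init_count = 0  # exposed only to make reuse observable

    def _initialize(self, block_id):
        self._block_id = block_id
        self.init_count += 1

    def decode_and_scan(self, block_id, encoded_rows, query):
        # Re-initialize only when the target block differs from the last scan.
        if block_id != self._block_id:
            self._initialize(block_id)
        # "Decoding" here is a stand-in; a real decoder would decode the rows.
        return [row for row in encoded_rows if query(row)]


decoder = Decoder()
rows = [10, 20, 30]
first = decoder.decode_and_scan("blk-1", rows, lambda r: r >= 20)
second = decoder.decode_and_scan("blk-1", rows, lambda r: r == 10)  # decoder reused
third = decoder.decode_and_scan("blk-2", rows, lambda r: True)      # re-initialized
```

Repeated scans of the same block pay the decoder setup cost once, which mirrors the same-block branch in the text.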
Optionally, the query includes: a nested-loop join (Nestloop Join), a secondary-index table lookback, or a cross-partition query.
Embodiments of the present specification provide a computer device that includes a memory and a processor. The memory stores a computer program executable by the processor; when executing the stored computer program, the processor performs the steps of the method for querying an ordered table of an LSM-Tree architecture database in the embodiments of the present specification. For a detailed description of each step of the method, please refer to the foregoing description; it is not repeated here.
Embodiments of the present specification provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the method for querying an ordered table of an LSM-Tree architecture database in the embodiments of the present specification are performed. For a detailed description of each step of the method, please refer to the foregoing description; it is not repeated here.
The above description sets forth only preferred embodiments of the present specification and is not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present specification shall fall within its scope of protection.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read-Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.

Claims (14)

1. A query method for an ordered table of an LSM-Tree architecture database, the method comprising sequentially scanning one or more ordered tables according to a query condition, wherein scanning a certain ordered table comprises:
taking the certain ordered table as a current ordered table;
when a current iterator matches the type of the current ordered table, retaining the current iterator; otherwise, acquiring an iterator whose type matches the current ordered table as the current iterator;
determining, by the current iterator, a target data block of the current scan;
when the target data block of the current scan and a current cache data block are the same data block, retaining the current cache data block; otherwise, reading the target data block of the current scan into a cache to generate a new current cache data block;
and scanning, by the current iterator, the current cache data block according to the query condition.
2. The method of claim 1, wherein the acquiring an iterator whose type matches the current ordered table as the current iterator comprises: returning the current iterator to an iterator pool, and requesting from the iterator pool an iterator matching the type of the current ordered table as a new current iterator.
3. The method of claim 2, further comprising: when the current iterator does not match the type of the current ordered table and the iterator pool contains no iterator matching that type, generating an iterator matching the type of the current ordered table as the current iterator, reading the target data block of the current scan into the cache to generate the current cache data block, and scanning, by the current iterator, the current cache data block according to the query condition.
4. The method of claim 1, wherein the target data block is a micro-block belonging to a certain macro-block;
the determining, by the current iterator, of the target data block of the current scan comprises: determining, by the current iterator, the target macro block of the current scan and the target micro block inside the target macro block;
the retaining of the current cache data block when the target data block of the current scan and the current cache data block are the same data block, and otherwise reading the target data block of the current scan into the cache to generate a new current cache data block, comprises: when the target micro block of the current scan and the target micro block of the previous scan are the same micro block, retaining the current cache data block; when they are not the same micro block but belong to the same macro block, locating the target micro block of the current scan based on the current macro block, and reading it into the cache to generate a new current cache data block; and when they are neither the same micro block nor in the same macro block, updating the current macro block to the macro block to which the target micro block of the current scan belongs, locating the target micro block based on the updated current macro block, and reading it into the cache to generate a new current cache data block.
5. The method of claim 1, wherein the current iterator comprises a decoder;
the scanning, by the current iterator, of the current cache data block according to the query condition comprises: when the target data block of the current scan and the target data block of the previous scan are the same data block, decoding and scanning the current cache data block by the decoder according to the query condition; and when they are not the same data block, initializing the decoder according to the target data block of the current scan, and decoding and scanning the current cache data block by the initialized decoder.
6. The method of claim 1, wherein the query comprises: a nested-loop join (Nestloop Join), a secondary-index table lookback, or a cross-partition query.
7. An apparatus for querying an ordered table of an LSM-Tree architecture database, wherein the query comprises sequentially scanning one or more ordered tables according to a query condition, and the portion of the apparatus for scanning a certain ordered table comprises:
a current ordered table unit, configured to take the certain ordered table as a current ordered table;
a current iterator unit, configured to retain the current iterator when the current iterator matches the type of the current ordered table; otherwise, acquiring an iterator with a type matched with the current ordered list as a current iterator;
the target data block unit is used for determining the target data block of the current scanning by the current iterator;
a current cache data block unit, configured to retain the current cache data block when the target data block of the current scan and the current cache data block are the same data block, and otherwise read the target data block of the current scan into a cache to generate a new current cache data block;
and the scanning execution unit is used for scanning the current cache data block by the current iterator according to the query condition.
8. The apparatus of claim 7, wherein the acquiring, by the current iterator unit, of an iterator whose type matches the current ordered table as the current iterator comprises: returning the current iterator to an iterator pool, and requesting from the iterator pool an iterator matching the type of the current ordered table as a new current iterator.
9. The apparatus of claim 8, further comprising: an iterator adding unit, configured to, when the current iterator does not match the type of the current ordered table and the iterator pool contains no iterator matching that type, generate an iterator matching the type of the current ordered table as the current iterator, read the target data block of the current scan into the cache to generate the current cache data block, and scan, by the current iterator, the current cache data block according to the query condition.
10. The apparatus of claim 7, wherein the target data block is a micro-block belonging to a certain macro-block;
the target data block unit is specifically configured to: determine, by the current iterator, the target macro block of the current scan and the target micro block inside the target macro block;
the current cache data block unit is specifically configured to: retain the current cache data block when the target micro block of the current scan and the target micro block of the previous scan are the same micro block; when they are not the same micro block but belong to the same macro block, locate the target micro block of the current scan based on the current macro block, and read it into the cache to generate a new current cache data block; and when they are neither the same micro block nor in the same macro block, update the current macro block to the macro block to which the target micro block of the current scan belongs, locate the target micro block based on the updated current macro block, and read it into the cache to generate a new current cache data block.
11. The apparatus of claim 7, the current iterator comprising a decoder;
the scan execution unit is specifically configured to: when the target data block of the current scan and the target data block of the previous scan are the same data block, decode and scan the current cache data block by the decoder according to the query condition; and when they are not the same data block, initialize the decoder according to the target data block of the current scan, and decode and scan the current cache data block by the initialized decoder.
12. The apparatus of claim 7, wherein the query comprises: a nested-loop join (Nestloop Join), a secondary-index table lookback, or a cross-partition query.
13. A computer device, comprising: a memory and a processor; the memory having stored thereon a computer program executable by the processor; the processor, when executing the computer program, performs the method of any of claims 1 to 6.
14. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 6.
CN202210533848.5A 2022-05-16 2022-05-16 Query method and device for ordered table of LSM-Tree architecture database Pending CN114969034A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210533848.5A CN114969034A (en) 2022-05-16 2022-05-16 Query method and device for ordered table of LSM-Tree architecture database


Publications (1)

Publication Number Publication Date
CN114969034A true CN114969034A (en) 2022-08-30

Family

ID=82982702

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210533848.5A Pending CN114969034A (en) 2022-05-16 2022-05-16 Query method and device for ordered table of LSM-Tree architecture database

Country Status (1)

Country Link
CN (1) CN114969034A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118092812A (en) * 2024-04-18 2024-05-28 华侨大学 Key value storage and read-write method based on memory table index and iterator reduction mechanism


Similar Documents

Publication Publication Date Title
US11586629B2 (en) Method and device of storing data object
CN109933570B (en) Metadata management method, system and medium
CN109683811B (en) Request processing method for hybrid memory key value pair storage system
US20140136510A1 (en) Hybrid table implementation by using buffer pool as permanent in-memory storage for memory-resident data
US9292554B2 (en) Thin database indexing
CN111324665B (en) Log playback method and device
CN110134335B (en) RDF data management method and device based on key value pair and storage medium
CN107391544B (en) Processing method, device and equipment of column type storage data and computer storage medium
CN114969034A (en) Query method and device for ordered table of LSM-Tree architecture database
US20180011897A1 (en) Data processing method having structure of cache index specified to transaction in mobile environment dbms
US8156126B2 (en) Method for the allocation of data on physical media by a file system that eliminates duplicate data
CN113392089A (en) Database index optimization method and readable storage medium
US20210067332A1 (en) Network key value indexing design
CN116048396B (en) Data storage device and storage control method based on log structured merging tree
US9292553B2 (en) Queries for thin database indexing
CN115145954A (en) Data query method, data storage method and device
CN111198660A (en) B + tree traversal method and device
CN115495462A (en) Batch data updating method and device, electronic equipment and readable storage medium
US20220197884A1 (en) Encoding method for key trie, decoding method for key trie, and electronic devices
CN113094336B (en) Cuckoo hash-based file system directory management method and system
CN114461635A (en) MySQL database data storage method and device and electronic equipment
CN114416741A (en) KV data writing and reading method and device based on multi-level index and storage medium
CN1235169C (en) Data storage and searching method of embedded system
CN109325023B (en) Data processing method and device
US9846553B2 (en) Organization and management of key-value stores

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination