CN114461675A - Partial index implementation method and system based on KV storage - Google Patents
Partial index implementation method and system based on KV storage Download PDFInfo
- Publication number
- CN114461675A CN114461675A CN202210132055.2A CN202210132055A CN114461675A CN 114461675 A CN114461675 A CN 114461675A CN 202210132055 A CN202210132055 A CN 202210132055A CN 114461675 A CN114461675 A CN 114461675A
- Authority
- CN
- China
- Prior art keywords
- index
- partial
- data
- opt
- predicate
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24534—Query rewriting; Transformation
- G06F16/2454—Optimisation of common expressions
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a partial index implementation method and a partial index implementation system based on KV storage, which belong to the technical field of query statement optimization, and aim to solve the technical problem of improving the execution efficiency of database query statements and reducing the cost of creation and storage on a cluster, the technical scheme is as follows: the method comprises the following specific steps: generating a partial index: creating a partial index statement through an index containing a predicate expression where, namely, the partial index statement is only used for satisfying the query with the filter predicate; data modification and backfilling operations on data in the partial index: when the data is subjected to the addition and deletion modification operation, the data is simultaneously operated on partial indexes; selecting a partial index and searching index data.
Description
Technical Field
The invention relates to the field of query statement optimization, in particular to a partial index implementation method and system based on KV storage.
Background
The NewSQL technology is developed rapidly in recent years, and has mature systems at home and abroad. In some open source system implementations in the industry, Rocksdb serves as a core storage and query function. Rocksdb is a high-performance embedded persistent key-value store. Many open-source databases are stored as inline Rocksdb, storing data in a key-value storage format.
The query of the database is one of the very important functions of the database, when the database is queried, data in a specific range sometimes needs to be frequently read, and if the data in the specific range only needs to be read, all the data are read every time, so that the execution efficiency of database query statements is not high.
Therefore, how to improve the execution efficiency of database query statements and reduce the cost of creating and storing the database query statements on a cluster is a technical problem to be solved urgently at present.
Disclosure of Invention
The technical task of the invention is to provide a partial index implementation method and a partial index implementation system based on KV storage, so as to solve the problems of how to improve the execution efficiency of database query statements and reduce the cost of creation and storage on a cluster.
The technical task of the invention is realized in the following way, namely a partial index realization method based on KV storage, which comprises the following steps:
generating a partial index: creating a part of index statements through indexes containing predicate expressions where, namely, the index statements are only used for satisfying the query with the filter predicate;
data modification and backfilling operations on data in the partial index: when the data is subjected to the addition and deletion modification operation, the data is simultaneously operated on partial indexes;
a partial index and lookup index data are selected.
Preferably, the partial index is generated as follows:
adding an expr type key word in the metadata IndexDescriptor of the index, and storing the condition of partial index
Judging whether the index is a partial index: judging according to whether the keyword of the expr type is null or not;
the data filtered by the filter is stored in the sorted copy of the row-value subset created by the index.
Preferably, the predicate of the partial index is selected to satisfy the following requirements:
firstly, a Boolean value is set in Zen City;
secondly, only the columns in the indexed table are referred to;
and (3) the function in the predicate using is not variable, for example, the now () function is not allowed to be used.
Preferably, when the validity of the predicate is verified, the same logic as that for verifying the checke expression is used, the SanitizeVarFreeExpr is used for verifying whether the predicate expression is valid, and then the expr expression is stored in the inderdescriptor, namely the metadata of the index is started when the data is waited to be operated (added or deleted) or backfilled.
Preferably, the data modification and backfill are specifically performed on the data in the partial index as follows:
changes made to the data create a sorted copy of the subset of row values for columns and rows in the Boolean predicate expression of the index that evaluate to true, without modifying the values in the table itself;
analyzing a filter expression through an expr expression stored in an indexdescriptor;
when building a value of buildvalue, checking whether line data needs to be filtered through filterexpr, and transmitting a corresponding mark into a next flow;
when inserting the data code, judging whether the corresponding incoming mark needs to insert the line of data code into the sequencing copy (namely, a secondary index) of the line value subset;
the remaining operations of add, delete, modify, and backfill are similar to the insert operation described above.
Preferably, the selection of the partial index and the search index data are specifically as follows:
after the projection column and the index column are matched, adding a matching predicate;
if the predicates are matched, the subinterval matching is increased;
if the predicate matching meets the rule, cbo is changed through computeScancost, the number of lines of the index is counted through statistical information, and the use cost is reduced by the number of lines;
selecting an index by cbo;
the selection of the index is completed, the data in the index is directly read without judgment, and if the total data volume of the table is large, the execution efficiency can be greatly increased if the data volume contained in the where condition is small.
Preferably, the partially indexed index statement is specified as follows:
CREATE opt_unique INDEX opt_index_name ON table_name opt_using_gin_btree'('index_params')'opt_storing opt_interleave opt_partition_by opt_idx_where opt_locate_in;
CREATE opt_unique INDEX IF NOT EXISTS index_name ON table_name opt_using_gin_btree'('index_params')'opt_storing opt_interleave opt_partition_by opt_idx_where opt_locate_in;
wherein, the expression represented by opt _ idx _ where is the filter expression of the partial index; the syntax for changing the data in the partial index and using the partial index is the same as the normal index.
A partial indexing system based on KV storage, the system comprising,
the generation module is used for creating a partial index statement through an index containing a predicate expression where, namely, the partial index statement is only used for satisfying the query with the filter predicate;
the operation module is used for simultaneously carrying out data operation on partial indexes when the data is subjected to the addition and deletion operations;
and the selection and search module is used for selecting partial indexes and searching index data.
An electronic device, comprising: a memory and at least one processor;
wherein the memory has stored thereon a computer program;
the at least one processor executes the memory-stored computer program such that the at least one processor performs the KV-stored partial-index-based implementation method described above.
A computer readable storage medium having stored therein a computer program executable by a processor to implement a KV storage based partial index implementation method as described above.
The partial index implementation method and system based on KV storage have the following advantages:
when cbo selects to go through a partial index, the query effect is greatly improved because the data in the index can be directly and orderly pulled without the judgment of a conditional filter under the condition that a plurality of rows of filters of the partial index are screened out and the data query of the partial index is frequently called; to a certain extent, the more the number of rows filtered out is, the faster the calculation speed is compared with that of a common index, and when the query operation of the data in the interval is carried out for multiple times, the calculation effect is improved more obviously;
(II) the invention contains fewer rows than full indices, and therefore the cost of creating and storing them on clusters is lower;
(III) the present invention's read query of the rows contained in the partial index scans only the rows in the partial index, in contrast to the column query in the full index, which must scan all the rows in the index column;
(IV) the present invention performs index writes for write queries to tables with partial indices only when the inserted row satisfies the partial index predicate, as opposed to write queries to tables with full indices, which would incur the overhead of a full index write when the inserted row modifies the index column.
Drawings
The invention is further described below with reference to the accompanying drawings.
FIG. 1 is a flow chart diagram of a partial index implementation method based on KV memory;
FIG. 2 is a schematic illustration of a partial index.
Detailed Description
The method and system for implementing partial indexes based on KV memory according to the present invention will be described in detail below with reference to the drawings and specific embodiments of the present invention.
Example 1:
as shown in fig. 1, the method for implementing partial index based on KV storage of the present invention specifically includes:
s1, generating a partial index: creating a partial index statement through an index containing a predicate expression where, namely, the partial index statement is only used for satisfying the query with the filter predicate;
s2, data modification and backfilling operation on the data in the partial index: when the data is subjected to the addition and deletion modification operation, the data is simultaneously operated on partial indexes;
and S3, selecting a partial index and searching index data.
The generation of the partial index in step S1 in this embodiment is specifically as follows:
s101, adding an expr type key word in the metadata IndexDescriptor of the index, and storing the condition of partial index
S102, judging whether the index is a partial index: judging according to whether the keyword of the expr type is null or not;
s103, storing the data screened by the filter into the sequencing copy of the row value subset created by the index.
In this embodiment, the predicate selection of the partial index needs to satisfy the following requirements:
firstly, a Boolean value is set in Zen City;
secondly, only the columns in the indexed table are referred to;
and (3) the function in the predicate using is not variable, for example, the now () function is not allowed to be used.
When the validity of the predicate is verified, the same logic as that for verifying the checke expression is used, the saitizevarfreeeexpr is used to verify whether the predicate expression is valid, and then the expr expression is stored in the inderdescriptor, that is, the metadata of the index is used to wait for the operation (addition and deletion) on the data or is started when the data is backfilled.
The operation of data modification and backfilling on the data in the partial index in step S2 in this embodiment is specifically as follows:
s201, changing data creates a sequencing copy of a row value subset for columns and rows evaluated to be true in the Boolean predicate expression of the index, and does not modify values in the table;
s202, analyzing a filter expression through an expr expression stored in an indexdescriptor;
s203, when building a value of buildvalue, checking whether line data needs to be filtered through filterexpr, and transmitting a corresponding mark into a next flow;
s204, when the data codes are inserted, judging whether the corresponding transmitted marks need to insert the line of data codes into the sequencing copy (namely, a secondary index) of the line value subset;
the remaining operations of add, delete, modify, and backfill are similar to the insert operation described above.
The selected partial index and the search index data in step S3 in this embodiment are specifically as follows:
s301, after the projection column and the index column are matched, adding a matching predicate;
s302, when the predicates are matched, increasing subinterval matching;
s303, matching predicates to meet rules, changing cbo through computeScancost, counting the number of lines of the index through statistical information, and reducing use cost by the number of lines;
s304, selecting an index by cbo;
the selection of the index is completed, the data in the index is directly read without judgment, and if the total data volume of the table is large, but the data volume contained in the where condition is small, the execution efficiency can be greatly increased.
As shown in fig. 2, the partially indexed index statement in this embodiment is specifically as follows:
CREATE opt_unique INDEX opt_index_name ON table_name opt_using_gin_btree'('index_params')'opt_storing opt_interleave opt_partition_by opt_idx_where opt_locate_in;
CREATE opt_unique INDEX IF NOT EXISTS index_name ON table_name opt_using_gin_btree'('index_params')'opt_storing opt_interleave opt_partition_by opt_idx_where opt_locate_in;
wherein, the expression represented by opt _ idx _ where is the filter expression of the partial index; the syntax for changing the data in the partial index and using the partial index is the same as the normal index.
When the query request is processed in the database, if irrelevant data can be filtered out as early as possible, the subsequent operators can do little useless work, and the execution efficiency of the whole SQL is improved. The most common means for filtering data is to use partial indexes, and the optimizer can process requests in an index filtering mode as much as possible and improve the query efficiency by utilizing the characteristic of ordered indexes. For example, when the condition of the partial index is a >1, and the query condition is a >1, we can directly take out the data satisfying a >1 by using the partial index without checking whether the value of a is 1 line by line. Of course whether index filtering will be selected depends on the cost estimate. When a certain part of data in the table is known to be frequently inquired, a partial index of the part of data can be established. This may elevate the query by a partial index.
Partial indexing may improve cluster performance in a number of ways:
that contains fewer rows than the full index, so it is less costly to create and store them on a cluster;
secondly, only scanning the rows in the partial index by the read query of the rows contained in the partial index; this is in contrast to a column query in the full index, which must scan all the rows in the index column;
a write query to a table with a partial index performs an index write only when the inserted row satisfies the partial index predicate, as opposed to a write query to a table with a full index, which may generate the overhead of a full index write when the inserted row modifies the index column.
Example 2:
the partial indexing system based on KV storage in the invention comprises,
the generation module is used for creating a partial index statement through an index containing a predicate expression where, namely, the partial index statement is only used for satisfying the query with the filter predicate;
the operation module is used for simultaneously carrying out data operation on partial indexes when the data is subjected to the addition and deletion operations;
and the selection and search module is used for selecting partial indexes and searching index data.
Example 3:
an embodiment of the present invention further provides an electronic device, including: a memory and a processor;
wherein the memory stores computer execution instructions;
the processor executes the computer-executable instructions stored in the memory, so that the processor executes the partial index implementation method based on KV storage in any embodiment of the invention.
Example 4:
the embodiment of the present invention further provides a computer-readable storage medium, where a plurality of instructions are stored, and the instructions are loaded by a processor, so that the processor executes the method for implementing the partial index based on the KV storage in any embodiment of the present invention. Specifically, a system or an apparatus equipped with a storage medium on which software program codes that realize the functions of any of the above-described embodiments are stored may be provided, and a computer (or a CPU or MPU) of the system or the apparatus is caused to read out and execute the program codes stored in the storage medium.
In this case, the program code itself read from the storage medium can realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code constitute a part of the present invention.
Examples of the storage medium for supplying the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RYM, DVD-RW, DVD + RW), a magnetic tape, a nonvolatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer via a communications network.
Further, it should be clear that the functions of any one of the above-described embodiments may be implemented not only by executing the program code read out by the computer, but also by causing an operating system or the like operating on the computer to perform a part or all of the actual operations based on instructions of the program code.
Further, it is to be understood that the program code read out from the storage medium is written to a memory provided in an expansion board inserted into the computer or to a memory provided in an expansion unit connected to the computer, and then causes a CPU or the like mounted on the expansion board or the expansion unit to perform part or all of the actual operations based on instructions of the program code, thereby realizing the functions of any of the above-described embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A partial index implementation method based on KV storage is characterized by comprising the following steps:
generating a partial index: creating a partial index statement through an index containing a predicate expression where, namely, the partial index statement is only used for satisfying the query with the filter predicate;
data modification and backfilling operations on data in the partial index: when the data is subjected to the addition and deletion modification operation, the data is simultaneously operated on partial indexes;
a partial index and lookup index data are selected.
2. The method for implementing the partial index based on the KV storage according to claim 1, wherein the partial index is generated as follows:
adding an expr type key word in the metadata IndexDescriptor of the index, and storing the condition of partial index
Judging whether the index is a partial index: judging according to whether the keyword of the expr type is null or not;
the data filtered by the filter is stored in the sorted copy of the row-value subset created by the index.
3. The method for implementing the partial index based on the KV storage according to claim 1 or 2, wherein the predicate selection of the partial index needs to satisfy the following requirements:
firstly, a Boolean value is set in Zen City;
secondly, only the columns in the indexed table are referred to;
and thirdly, the function in the predicate using is not variable.
4. The method for implementing partial indexes based on KV storage according to claim 3, wherein when the validity of the predicate is verified, SanitizeVarFreeExpr is used to verify whether the predicate expression is valid, and then the expr expression is stored in an indedescriptor, that is, the metadata of the index is used to wait for data to be operated or data to be backfilled.
5. The method for implementing the partial index based on the KV storage according to claim 4, wherein the operations of data modification and backfilling on the data in the partial index are specifically as follows:
changes made to the data create ordered copies of the row value subsets of columns and rows in the boolean predicate expression of the index that evaluate to true;
analyzing a filter expression through an expr expression stored in an indexdescriptor;
when building a value of buildvalue, checking whether line data needs to be filtered through filterexpr, and transmitting a corresponding mark into a next flow;
when inserting the data code, it is determined whether the corresponding incoming flag requires the insertion of the row of data codes into the ordered copy of the row value subset.
6. The method for implementing partial indexes based on KV storage according to claim 5, wherein selecting partial indexes and searching index data is as follows:
after the projection column and the index column are matched, adding a matching predicate;
if the predicates are matched, the subinterval matching is increased;
if the predicate matching meets the rule, cbo is changed through computeScancost, and the number of rows of the index is counted through statistical information;
selecting an index by cbo;
the selection of the index is completed, and the data in the index is directly read without judgment.
7. The method for implementing partial indexes based on KV storage according to claim 6, wherein the index statements of the partial indexes are specifically as follows:
CREATE opt_unique INDEX opt_index_name ON table_name opt_using_gin_btree'('index_params')'opt_storing opt_interleave opt_partition_by opt_idx_where opt_locate_in;
CREATE opt_unique INDEX IF NOT EXISTS index_name ON table_name opt_using_gin_btree'('index_params')'opt_storing opt_interleave opt_partition_by opt_idx_where opt_locate_in;
wherein, the expression represented by opt _ idx _ where is the filter expression of the partial index.
8. A partial indexing system based on KV storage, characterized in that the system comprises,
the generating module is used for creating a part of index statements through indexes containing predicate expressions where, namely, the index statements are only used for satisfying the query with the filter predicate;
the operation module is used for simultaneously carrying out data operation on partial indexes when the data is subjected to the addition and deletion operations;
and the selection and search module is used for selecting partial indexes and searching index data.
9. An electronic device, comprising: a memory and at least one processor;
wherein the memory has stored thereon a computer program;
the at least one processor executing the memory-stored computer program causes the at least one processor to perform the KV-stored partial-index-based implementation method of any of claims 1 to 7.
10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program is executable by a processor to implement the KV storage-based partial index implementation method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210132055.2A CN114461675A (en) | 2022-02-14 | 2022-02-14 | Partial index implementation method and system based on KV storage |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210132055.2A CN114461675A (en) | 2022-02-14 | 2022-02-14 | Partial index implementation method and system based on KV storage |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114461675A true CN114461675A (en) | 2022-05-10 |
Family
ID=81413176
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210132055.2A Pending CN114461675A (en) | 2022-02-14 | 2022-02-14 | Partial index implementation method and system based on KV storage |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114461675A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117131042A (en) * | 2023-08-01 | 2023-11-28 | 上海沄熹科技有限公司 | Database function index implementation method and device |
-
2022
- 2022-02-14 CN CN202210132055.2A patent/CN114461675A/en active Pending
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117131042A (en) * | 2023-08-01 | 2023-11-28 | 上海沄熹科技有限公司 | Database function index implementation method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6832227B2 (en) | Database management program, a database managing method and an apparatus therefor | |
US8396852B2 (en) | Evaluating execution plan changes after a wakeup threshold time | |
CN107818115B (en) | Method and device for processing data table | |
US8332389B2 (en) | Join order for a database query | |
US8601474B2 (en) | Resuming execution of an execution plan in a virtual machine | |
US7676453B2 (en) | Partial query caching | |
US8924373B2 (en) | Query plans with parameter markers in place of object identifiers | |
EP1234258B1 (en) | System for managing rdbm fragmentations | |
US9218394B2 (en) | Reading rows from memory prior to reading rows from secondary storage | |
US8442971B2 (en) | Execution plans with different driver sources in multiple threads | |
US9323798B2 (en) | Storing a key value to a deleted row based on key range density | |
CN108062314B (en) | Dynamic sub-table data processing method and device | |
CN106484815B (en) | A kind of automatic identification optimization method based on mass data class SQL retrieval scene | |
US8396858B2 (en) | Adding entries to an index based on use of the index | |
CN114461675A (en) | Partial index implementation method and system based on KV storage | |
CN110321388B (en) | Quick sequencing query method and system based on Greenplus | |
EP3005161A1 (en) | Datasets profiling tools, methods, and systems | |
CN111782663A (en) | Aggregation index structure and aggregation index method for improving aggregation query efficiency | |
CN115374121A (en) | Database index generation method, machine-readable storage medium and computer equipment | |
CN115391424A (en) | Database query processing method, storage medium and computer equipment | |
WO2022016532A1 (en) | Efficient scan through comprehensive bitmap-index over columnar storage format | |
CN114064729A (en) | Data retrieval method, device, equipment and storage medium | |
CN113220719A (en) | Mass dimension data association query optimization method and system | |
KR102351846B1 (en) | Query Optimization Method using Index Merge on Distributed Database | |
CN114237511B (en) | Solution method for data optimization based on SharingSphere-JDBC |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right |
Effective date of registration: 20221129 Address after: Room 305-22, Building 2, No. 1158 Zhangdong Road and No. 1059 Dangui Road, China (Shanghai) Pilot Free Trade Zone, Pudong New Area, Shanghai, 200120 Applicant after: Shanghai Yunxi Technology Co.,Ltd. Address before: Building S02, 1036 Gaoxin Langchao Road, Jinan, Shandong 250100 Applicant before: Shandong Inspur Scientific Research Institute Co.,Ltd. |
|
TA01 | Transfer of patent application right |