CN114461675A - Partial index implementation method and system based on KV storage - Google Patents

Partial index implementation method and system based on KV storage Download PDF

Info

Publication number
CN114461675A
CN114461675A CN202210132055.2A CN202210132055A CN114461675A CN 114461675 A CN114461675 A CN 114461675A CN 202210132055 A CN202210132055 A CN 202210132055A CN 114461675 A CN114461675 A CN 114461675A
Authority
CN
China
Prior art keywords
index
partial
data
opt
predicate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210132055.2A
Other languages
Chinese (zh)
Inventor
牟冠学
徐佳庆
柴毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yunxi Technology Co ltd
Original Assignee
Shandong Inspur Scientific Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Inspur Scientific Research Institute Co Ltd filed Critical Shandong Inspur Scientific Research Institute Co Ltd
Priority to CN202210132055.2A priority Critical patent/CN114461675A/en
Publication of CN114461675A publication Critical patent/CN114461675A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24534Query rewriting; Transformation
    • G06F16/2454Optimisation of common expressions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a partial index implementation method and a partial index implementation system based on KV storage, which belong to the technical field of query statement optimization, and aim to solve the technical problem of improving the execution efficiency of database query statements and reducing the cost of creation and storage on a cluster, the technical scheme is as follows: the method comprises the following specific steps: generating a partial index: creating a partial index statement through an index containing a predicate expression where, namely, the partial index statement is only used for satisfying the query with the filter predicate; data modification and backfilling operations on data in the partial index: when the data is subjected to the addition and deletion modification operation, the data is simultaneously operated on partial indexes; selecting a partial index and searching index data.

Description

Partial index implementation method and system based on KV storage
Technical Field
The invention relates to the field of query statement optimization, in particular to a partial index implementation method and system based on KV storage.
Background
The NewSQL technology is developed rapidly in recent years, and has mature systems at home and abroad. In some open source system implementations in the industry, Rocksdb serves as a core storage and query function. Rocksdb is a high-performance embedded persistent key-value store. Many open-source databases are stored as inline Rocksdb, storing data in a key-value storage format.
The query of the database is one of the very important functions of the database, when the database is queried, data in a specific range sometimes needs to be frequently read, and if the data in the specific range only needs to be read, all the data are read every time, so that the execution efficiency of database query statements is not high.
Therefore, how to improve the execution efficiency of database query statements and reduce the cost of creating and storing the database query statements on a cluster is a technical problem to be solved urgently at present.
Disclosure of Invention
The technical task of the invention is to provide a partial index implementation method and a partial index implementation system based on KV storage, so as to solve the problems of how to improve the execution efficiency of database query statements and reduce the cost of creation and storage on a cluster.
The technical task of the invention is realized in the following way, namely a partial index realization method based on KV storage, which comprises the following steps:
generating a partial index: creating a part of index statements through indexes containing predicate expressions where, namely, the index statements are only used for satisfying the query with the filter predicate;
data modification and backfilling operations on data in the partial index: when the data is subjected to the addition and deletion modification operation, the data is simultaneously operated on partial indexes;
a partial index and lookup index data are selected.
Preferably, the partial index is generated as follows:
adding an expr type key word in the metadata IndexDescriptor of the index, and storing the condition of partial index
Judging whether the index is a partial index: judging according to whether the keyword of the expr type is null or not;
the data filtered by the filter is stored in the sorted copy of the row-value subset created by the index.
Preferably, the predicate of the partial index is selected to satisfy the following requirements:
firstly, a Boolean value is set in Zen City;
secondly, only the columns in the indexed table are referred to;
and (3) the function in the predicate using is not variable, for example, the now () function is not allowed to be used.
Preferably, when the validity of the predicate is verified, the same logic as that for verifying the checke expression is used, the SanitizeVarFreeExpr is used for verifying whether the predicate expression is valid, and then the expr expression is stored in the inderdescriptor, namely the metadata of the index is started when the data is waited to be operated (added or deleted) or backfilled.
Preferably, the data modification and backfill are specifically performed on the data in the partial index as follows:
changes made to the data create a sorted copy of the subset of row values for columns and rows in the Boolean predicate expression of the index that evaluate to true, without modifying the values in the table itself;
analyzing a filter expression through an expr expression stored in an indexdescriptor;
when building a value of buildvalue, checking whether line data needs to be filtered through filterexpr, and transmitting a corresponding mark into a next flow;
when inserting the data code, judging whether the corresponding incoming mark needs to insert the line of data code into the sequencing copy (namely, a secondary index) of the line value subset;
the remaining operations of add, delete, modify, and backfill are similar to the insert operation described above.
Preferably, the selection of the partial index and the search index data are specifically as follows:
after the projection column and the index column are matched, adding a matching predicate;
if the predicates are matched, the subinterval matching is increased;
if the predicate matching meets the rule, cbo is changed through computeScancost, the number of lines of the index is counted through statistical information, and the use cost is reduced by the number of lines;
selecting an index by cbo;
the selection of the index is completed, the data in the index is directly read without judgment, and if the total data volume of the table is large, the execution efficiency can be greatly increased if the data volume contained in the where condition is small.
Preferably, the partially indexed index statement is specified as follows:
CREATE opt_unique INDEX opt_index_name ON table_name opt_using_gin_btree'('index_params')'opt_storing opt_interleave opt_partition_by opt_idx_where opt_locate_in;
CREATE opt_unique INDEX IF NOT EXISTS index_name ON table_name opt_using_gin_btree'('index_params')'opt_storing opt_interleave opt_partition_by opt_idx_where opt_locate_in;
wherein, the expression represented by opt _ idx _ where is the filter expression of the partial index; the syntax for changing the data in the partial index and using the partial index is the same as the normal index.
A partial indexing system based on KV storage, the system comprising,
the generation module is used for creating a partial index statement through an index containing a predicate expression where, namely, the partial index statement is only used for satisfying the query with the filter predicate;
the operation module is used for simultaneously carrying out data operation on partial indexes when the data is subjected to the addition and deletion operations;
and the selection and search module is used for selecting partial indexes and searching index data.
An electronic device, comprising: a memory and at least one processor;
wherein the memory has stored thereon a computer program;
the at least one processor executes the memory-stored computer program such that the at least one processor performs the KV-stored partial-index-based implementation method described above.
A computer readable storage medium having stored therein a computer program executable by a processor to implement a KV storage based partial index implementation method as described above.
The partial index implementation method and system based on KV storage have the following advantages:
when cbo selects to go through a partial index, the query effect is greatly improved because the data in the index can be directly and orderly pulled without the judgment of a conditional filter under the condition that a plurality of rows of filters of the partial index are screened out and the data query of the partial index is frequently called; to a certain extent, the more the number of rows filtered out is, the faster the calculation speed is compared with that of a common index, and when the query operation of the data in the interval is carried out for multiple times, the calculation effect is improved more obviously;
(II) the invention contains fewer rows than full indices, and therefore the cost of creating and storing them on clusters is lower;
(III) the present invention's read query of the rows contained in the partial index scans only the rows in the partial index, in contrast to the column query in the full index, which must scan all the rows in the index column;
(IV) the present invention performs index writes for write queries to tables with partial indices only when the inserted row satisfies the partial index predicate, as opposed to write queries to tables with full indices, which would incur the overhead of a full index write when the inserted row modifies the index column.
Drawings
The invention is further described below with reference to the accompanying drawings.
FIG. 1 is a flow chart diagram of a partial index implementation method based on KV memory;
FIG. 2 is a schematic illustration of a partial index.
Detailed Description
The method and system for implementing partial indexes based on KV memory according to the present invention will be described in detail below with reference to the drawings and specific embodiments of the present invention.
Example 1:
as shown in fig. 1, the method for implementing partial index based on KV storage of the present invention specifically includes:
s1, generating a partial index: creating a partial index statement through an index containing a predicate expression where, namely, the partial index statement is only used for satisfying the query with the filter predicate;
s2, data modification and backfilling operation on the data in the partial index: when the data is subjected to the addition and deletion modification operation, the data is simultaneously operated on partial indexes;
and S3, selecting a partial index and searching index data.
The generation of the partial index in step S1 in this embodiment is specifically as follows:
s101, adding an expr type key word in the metadata IndexDescriptor of the index, and storing the condition of partial index
S102, judging whether the index is a partial index: judging according to whether the keyword of the expr type is null or not;
s103, storing the data screened by the filter into the sequencing copy of the row value subset created by the index.
In this embodiment, the predicate selection of the partial index needs to satisfy the following requirements:
firstly, a Boolean value is set in Zen City;
secondly, only the columns in the indexed table are referred to;
and (3) the function in the predicate using is not variable, for example, the now () function is not allowed to be used.
When the validity of the predicate is verified, the same logic as that for verifying the checke expression is used, the saitizevarfreeeexpr is used to verify whether the predicate expression is valid, and then the expr expression is stored in the inderdescriptor, that is, the metadata of the index is used to wait for the operation (addition and deletion) on the data or is started when the data is backfilled.
The operation of data modification and backfilling on the data in the partial index in step S2 in this embodiment is specifically as follows:
s201, changing data creates a sequencing copy of a row value subset for columns and rows evaluated to be true in the Boolean predicate expression of the index, and does not modify values in the table;
s202, analyzing a filter expression through an expr expression stored in an indexdescriptor;
s203, when building a value of buildvalue, checking whether line data needs to be filtered through filterexpr, and transmitting a corresponding mark into a next flow;
s204, when the data codes are inserted, judging whether the corresponding transmitted marks need to insert the line of data codes into the sequencing copy (namely, a secondary index) of the line value subset;
the remaining operations of add, delete, modify, and backfill are similar to the insert operation described above.
The selected partial index and the search index data in step S3 in this embodiment are specifically as follows:
s301, after the projection column and the index column are matched, adding a matching predicate;
s302, when the predicates are matched, increasing subinterval matching;
s303, matching predicates to meet rules, changing cbo through computeScancost, counting the number of lines of the index through statistical information, and reducing use cost by the number of lines;
s304, selecting an index by cbo;
the selection of the index is completed, the data in the index is directly read without judgment, and if the total data volume of the table is large, but the data volume contained in the where condition is small, the execution efficiency can be greatly increased.
As shown in fig. 2, the partially indexed index statement in this embodiment is specifically as follows:
CREATE opt_unique INDEX opt_index_name ON table_name opt_using_gin_btree'('index_params')'opt_storing opt_interleave opt_partition_by opt_idx_where opt_locate_in;
CREATE opt_unique INDEX IF NOT EXISTS index_name ON table_name opt_using_gin_btree'('index_params')'opt_storing opt_interleave opt_partition_by opt_idx_where opt_locate_in;
wherein, the expression represented by opt _ idx _ where is the filter expression of the partial index; the syntax for changing the data in the partial index and using the partial index is the same as the normal index.
When the query request is processed in the database, if irrelevant data can be filtered out as early as possible, the subsequent operators can do little useless work, and the execution efficiency of the whole SQL is improved. The most common means for filtering data is to use partial indexes, and the optimizer can process requests in an index filtering mode as much as possible and improve the query efficiency by utilizing the characteristic of ordered indexes. For example, when the condition of the partial index is a >1, and the query condition is a >1, we can directly take out the data satisfying a >1 by using the partial index without checking whether the value of a is 1 line by line. Of course whether index filtering will be selected depends on the cost estimate. When a certain part of data in the table is known to be frequently inquired, a partial index of the part of data can be established. This may elevate the query by a partial index.
Partial indexing may improve cluster performance in a number of ways:
that contains fewer rows than the full index, so it is less costly to create and store them on a cluster;
secondly, only scanning the rows in the partial index by the read query of the rows contained in the partial index; this is in contrast to a column query in the full index, which must scan all the rows in the index column;
a write query to a table with a partial index performs an index write only when the inserted row satisfies the partial index predicate, as opposed to a write query to a table with a full index, which may generate the overhead of a full index write when the inserted row modifies the index column.
Example 2:
the partial indexing system based on KV storage in the invention comprises,
the generation module is used for creating a partial index statement through an index containing a predicate expression where, namely, the partial index statement is only used for satisfying the query with the filter predicate;
the operation module is used for simultaneously carrying out data operation on partial indexes when the data is subjected to the addition and deletion operations;
and the selection and search module is used for selecting partial indexes and searching index data.
Example 3:
an embodiment of the present invention further provides an electronic device, including: a memory and a processor;
wherein the memory stores computer execution instructions;
the processor executes the computer-executable instructions stored in the memory, so that the processor executes the partial index implementation method based on KV storage in any embodiment of the invention.
Example 4:
the embodiment of the present invention further provides a computer-readable storage medium, where a plurality of instructions are stored, and the instructions are loaded by a processor, so that the processor executes the method for implementing the partial index based on the KV storage in any embodiment of the present invention. Specifically, a system or an apparatus equipped with a storage medium on which software program codes that realize the functions of any of the above-described embodiments are stored may be provided, and a computer (or a CPU or MPU) of the system or the apparatus is caused to read out and execute the program codes stored in the storage medium.
In this case, the program code itself read from the storage medium can realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code constitute a part of the present invention.
Examples of the storage medium for supplying the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RYM, DVD-RW, DVD + RW), a magnetic tape, a nonvolatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer via a communications network.
Further, it should be clear that the functions of any one of the above-described embodiments may be implemented not only by executing the program code read out by the computer, but also by causing an operating system or the like operating on the computer to perform a part or all of the actual operations based on instructions of the program code.
Further, it is to be understood that the program code read out from the storage medium is written to a memory provided in an expansion board inserted into the computer or to a memory provided in an expansion unit connected to the computer, and then causes a CPU or the like mounted on the expansion board or the expansion unit to perform part or all of the actual operations based on instructions of the program code, thereby realizing the functions of any of the above-described embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A partial index implementation method based on KV storage is characterized by comprising the following steps:
generating a partial index: creating a partial index statement through an index containing a predicate expression where, namely, the partial index statement is only used for satisfying the query with the filter predicate;
data modification and backfilling operations on data in the partial index: when the data is subjected to the addition and deletion modification operation, the data is simultaneously operated on partial indexes;
a partial index and lookup index data are selected.
2. The method for implementing the partial index based on the KV storage according to claim 1, wherein the partial index is generated as follows:
adding an expr type key word in the metadata IndexDescriptor of the index, and storing the condition of partial index
Judging whether the index is a partial index: judging according to whether the keyword of the expr type is null or not;
the data filtered by the filter is stored in the sorted copy of the row-value subset created by the index.
3. The method for implementing the partial index based on the KV storage according to claim 1 or 2, wherein the predicate selection of the partial index needs to satisfy the following requirements:
firstly, a Boolean value is set in Zen City;
secondly, only the columns in the indexed table are referred to;
and thirdly, the function in the predicate using is not variable.
4. The method for implementing partial indexes based on KV storage according to claim 3, wherein when the validity of the predicate is verified, SanitizeVarFreeExpr is used to verify whether the predicate expression is valid, and then the expr expression is stored in an indedescriptor, that is, the metadata of the index is used to wait for data to be operated or data to be backfilled.
5. The method for implementing the partial index based on the KV storage according to claim 4, wherein the operations of data modification and backfilling on the data in the partial index are specifically as follows:
changes made to the data create ordered copies of the row value subsets of columns and rows in the boolean predicate expression of the index that evaluate to true;
analyzing a filter expression through an expr expression stored in an indexdescriptor;
when building a value of buildvalue, checking whether line data needs to be filtered through filterexpr, and transmitting a corresponding mark into a next flow;
when inserting the data code, it is determined whether the corresponding incoming flag requires the insertion of the row of data codes into the ordered copy of the row value subset.
6. The method for implementing partial indexes based on KV storage according to claim 5, wherein selecting partial indexes and searching index data is as follows:
after the projection column and the index column are matched, adding a matching predicate;
if the predicates are matched, the subinterval matching is increased;
if the predicate matching meets the rule, cbo is changed through computeScancost, and the number of rows of the index is counted through statistical information;
selecting an index by cbo;
the selection of the index is completed, and the data in the index is directly read without judgment.
7. The method for implementing partial indexes based on KV storage according to claim 6, wherein the index statements of the partial indexes are specifically as follows:
CREATE opt_unique INDEX opt_index_name ON table_name opt_using_gin_btree'('index_params')'opt_storing opt_interleave opt_partition_by opt_idx_where opt_locate_in;
CREATE opt_unique INDEX IF NOT EXISTS index_name ON table_name opt_using_gin_btree'('index_params')'opt_storing opt_interleave opt_partition_by opt_idx_where opt_locate_in;
wherein, the expression represented by opt _ idx _ where is the filter expression of the partial index.
8. A partial indexing system based on KV storage, characterized in that the system comprises,
the generating module is used for creating a part of index statements through indexes containing predicate expressions where, namely, the index statements are only used for satisfying the query with the filter predicate;
the operation module is used for simultaneously carrying out data operation on partial indexes when the data is subjected to the addition and deletion operations;
and the selection and search module is used for selecting partial indexes and searching index data.
9. An electronic device, comprising: a memory and at least one processor;
wherein the memory has stored thereon a computer program;
the at least one processor executing the memory-stored computer program causes the at least one processor to perform the KV-stored partial-index-based implementation method of any of claims 1 to 7.
10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program is executable by a processor to implement the KV storage-based partial index implementation method according to any one of claims 1 to 7.
CN202210132055.2A 2022-02-14 2022-02-14 Partial index implementation method and system based on KV storage Pending CN114461675A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210132055.2A CN114461675A (en) 2022-02-14 2022-02-14 Partial index implementation method and system based on KV storage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210132055.2A CN114461675A (en) 2022-02-14 2022-02-14 Partial index implementation method and system based on KV storage

Publications (1)

Publication Number Publication Date
CN114461675A true CN114461675A (en) 2022-05-10

Family

ID=81413176

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210132055.2A Pending CN114461675A (en) 2022-02-14 2022-02-14 Partial index implementation method and system based on KV storage

Country Status (1)

Country Link
CN (1) CN114461675A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117131042A (en) * 2023-08-01 2023-11-28 上海沄熹科技有限公司 Database function index implementation method and device

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117131042A (en) * 2023-08-01 2023-11-28 上海沄熹科技有限公司 Database function index implementation method and device

Similar Documents

Publication Publication Date Title
US6832227B2 (en) Database management program, a database managing method and an apparatus therefor
US8396852B2 (en) Evaluating execution plan changes after a wakeup threshold time
CN107818115B (en) Method and device for processing data table
US8332389B2 (en) Join order for a database query
US8601474B2 (en) Resuming execution of an execution plan in a virtual machine
US7676453B2 (en) Partial query caching
US8924373B2 (en) Query plans with parameter markers in place of object identifiers
EP1234258B1 (en) System for managing rdbm fragmentations
US9218394B2 (en) Reading rows from memory prior to reading rows from secondary storage
US8442971B2 (en) Execution plans with different driver sources in multiple threads
US9323798B2 (en) Storing a key value to a deleted row based on key range density
CN108062314B (en) Dynamic sub-table data processing method and device
CN106484815B (en) A kind of automatic identification optimization method based on mass data class SQL retrieval scene
US8396858B2 (en) Adding entries to an index based on use of the index
CN114461675A (en) Partial index implementation method and system based on KV storage
CN110321388B (en) Quick sequencing query method and system based on Greenplus
EP3005161A1 (en) Datasets profiling tools, methods, and systems
CN111782663A (en) Aggregation index structure and aggregation index method for improving aggregation query efficiency
CN115374121A (en) Database index generation method, machine-readable storage medium and computer equipment
CN115391424A (en) Database query processing method, storage medium and computer equipment
WO2022016532A1 (en) Efficient scan through comprehensive bitmap-index over columnar storage format
CN114064729A (en) Data retrieval method, device, equipment and storage medium
CN113220719A (en) Mass dimension data association query optimization method and system
KR102351846B1 (en) Query Optimization Method using Index Merge on Distributed Database
CN114237511B (en) Solution method for data optimization based on SharingSphere-JDBC

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20221129

Address after: Room 305-22, Building 2, No. 1158 Zhangdong Road and No. 1059 Dangui Road, China (Shanghai) Pilot Free Trade Zone, Pudong New Area, Shanghai, 200120

Applicant after: Shanghai Yunxi Technology Co.,Ltd.

Address before: Building S02, 1036 Gaoxin Langchao Road, Jinan, Shandong 250100

Applicant before: Shandong Inspur Scientific Research Institute Co.,Ltd.

TA01 Transfer of patent application right