CN117131042A - Database function index implementation method and device - Google Patents

Database function index implementation method and device Download PDF

Info

Publication number
CN117131042A
CN117131042A CN202310956793.3A CN202310956793A CN117131042A CN 117131042 A CN117131042 A CN 117131042A CN 202310956793 A CN202310956793 A CN 202310956793A CN 117131042 A CN117131042 A CN 117131042A
Authority
CN
China
Prior art keywords
index
function
data
column
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310956793.3A
Other languages
Chinese (zh)
Inventor
陈磊
牟冠学
柴毅
赵春泽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yunxi Technology Co ltd
Original Assignee
Shanghai Yunxi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Yunxi Technology Co ltd filed Critical Shanghai Yunxi Technology Co ltd
Priority to CN202310956793.3A priority Critical patent/CN117131042A/en
Publication of CN117131042A publication Critical patent/CN117131042A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2246Trees, e.g. B+trees
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to the field of query statement optimization, and particularly provides a method and a device for realizing database function indexes, wherein the method comprises the following steps: s1, generating a function index; s2, modifying and backfilling data to operate the data in the function index; s3, selecting function indexes and searching index data. Compared with the prior art, the method and the device have the advantage that the calculation effect is improved more obviously when the query operation of the data of the function is carried out for a plurality of times.

Description

Database function index implementation method and device
Technical Field
The invention relates to the field of query statement optimization, and particularly provides a method and a device for realizing database function indexes.
Background
The newSQL technology has been developed rapidly in recent years, and a mature system is available at home and abroad. In some open source system implementations in the industry, the Rocksdb functions as a core store and query function. Rocksdb is a high-performance embedded persistent key-value store. Many open-sourced databases are stored in embedded Rocksdb, with data stored in key-value storage format.
The database query is one of very important functions of the database, and when the database performs some queries, data in a specific range sometimes needs to be frequently read, but if only data in the range need to be read, all data are read every time, so that the database query statement execution efficiency is not high.
Disclosure of Invention
The invention provides a database function index realization method with strong practicability aiming at the defects of the prior art.
The invention further aims to provide a database function index realizing device which is reasonable in design, safe and applicable.
The technical scheme adopted for solving the technical problems is as follows:
a method for realizing database function index comprises the following steps:
s1, generating a function index;
s2, modifying and backfilling data to operate the data in the function index;
s3, selecting function indexes and searching index data.
Further, in step S1, a structure is added to the structure of the index tree, the index structure marks an index, the index includes the column attribute of the index, and there are several index columns of an index having several index elements;
the structural members in IndexElem have a ColmnName, namely a column name, the name of an index column is marked, a Function is added, and Function information stored in a Function index is placed in the index column information.
Further, the index descriptor is an index descriptor, marks all information of an index, and adds a function of storing the index of the function with one information;
multiplexing the existing Expr storage function expression and predicate storage string, and obtaining a function expression through deserialization, wherein the function index is a row selected through the query condition function index, and has certain requirements on the index condition, namely, the function index can only be simply queried;
after grammar analysis, AST is generated to check the inquiry grammar tree, the function is stored as a virtual column in the index, the function can not be sub-inquiry, aggregation function, window function and function with return value, if yes, the unsupported type error is reported.
Further, in step S2, when the data is subjected to the adding and deleting operation, the data is simultaneously subjected to the operation of the index, and after the function index is created, the function value of the index is changed by changing the data, and in the process, the function expression is resolved by the expr expression stored in the index descriptor before;
calculating the value of the function index column through funcexpr when the buildvalue is constructed, and transmitting the mark into the following flow;
when inserting data codes, it is determined whether an incoming flag requires a row of data codes to be inserted into the secondary index.
Furthermore, in the create index and back fill mode, an index is created by using a schema changer backfilling mode, and a backfilling function is performed, wherein when data exists in the table, new indexes are created to re-import kv of the data in the table into the indexes.
Further, in insert, first proceeding (×insert node). ProcessSourceRow, then proceeding (×tableineanserter). Row and insert row, finally proceeding encodieinedrexes;
the data to be inserted with the index is encoded, and after kv is obtained through encodieindex, the index in the storage is stuffed with kv through insert row.
Further, in update, inquiring and recording the old value, generating a new value, and carrying out expression processing on the new value;
for the rows meeting the predicate condition and not meeting the condition, the corresponding index data deletion processing is performed.
Further, in step S3, after matching the projection column and the index column, a matching function is added,
if the function match meets the rule, the change of cbo is performed through the computeScanCost, the function index is finally selected by cbo, the selection of the index is completed, and then the data in the index is directly read without calculation.
Further, column_name is a function with columns in the table, and when the left value of the sphere condition is the function index function when the table is selected, the index is triggered.
A database function index implementation apparatus, comprising: at least one memory and at least one processor;
the at least one memory for storing a machine readable program;
the at least one processor is configured to invoke the machine-readable program to perform a database function index implementation method.
Compared with the prior art, the method and the device for realizing the database function index have the following outstanding beneficial effects:
according to the method, a large number of experiments are adopted to measure, when functions of the function index array are searched, and under the condition that the data query of the part is frequently called, when the cbo selects the function index, the calculated function values in the index can be directly and orderly pulled without recalculation, so that the query effect is greatly improved. The improvement of the calculation effect is more obvious when the query operation of the data of the function is performed multiple times.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of function index generation in a database function index implementation method;
FIG. 2 is a schematic flow chart of a create index and backfill mode in a database function index implementation method;
FIG. 3 is a schematic flow chart of an insert operation in a database function index implementation method;
FIG. 4 is a schematic diagram of function indexing function syntax in a method for implementing database function indexing.
Detailed Description
In order to provide a better understanding of the aspects of the present invention, the present invention will be described in further detail with reference to specific embodiments. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
A preferred embodiment is given below:
the method for realizing the database function index in the embodiment comprises the following steps:
s1, generating a function index;
the structure of the index tree adds one structure:
type IndexElem struct{
Column Name
Function Expr
the index structure marks an index, the index contains the column attribute of the index, the index column of an index has several members with ColmnName, namely column name, in IndexElem, indexElem, the name of the index column is marked, and a Function is added, namely Function information stored in the Function index is put in the index column information.
The index IndexDescriptor adds a structure:
message IndexDescriptor{
...
optional string func_expr=20;
the IndexDescriptor is an index descriptor that identifies all information of an index, and is also a function that adds an index to the information storage function.
Multiplexing the existing Expr storage function expression, storing predicates, and obtaining the function expression through deserialization. The function index is a function index of the selected row by the query condition, and not all rows, and has certain requirements on the index condition, namely, only simple query.
After grammar parsing, the AST is generated to check the query grammar tree, and the function is stored as a virtual column in the index. The function has a certain limit, and cannot be a sub-query, an aggregation function, a window function, a function with a return value, and the like, if yes, the type error is not supported.
As shown in figure 1 of the drawings,
establishing a function index and searching;
establishing a table;
CREATE TABLE t1(a float);
CREATE TABLE
establishing a function index;
CREATE INDEX idx ON t1(abs(a));
CREATE INDEX
inserting data
INSERT INTO t1 VALUES(0),(1);
INSERT 2
Querying data
explain(opt,verbose)select*from t1 where abs(a)>-1;
scan t1@idx;
├──columns:a:1
├──constraint:/1/3/2:[/-0.9999999999999999-]
├──stats:[rows=333.333333]
└──cost:346.676667
(5rows)
The function index was successfully selected in the program.
S2, modifying and backfilling data to operate the data in the function index;
when the data is subjected to the adding and deleting operations, the data is subjected to the indexing operation at the same time. When creating a function index, changes to the data change the function value of the index.
In this process the function expression is parsed from the expr expression previously stored in the indexdescriptor. At the time of build value construction, the value of the function index column is calculated by funcexpr and this flag is passed into the next flow.
When inserting data encoding, it is determined whether the incoming flag requires insertion of the data encoding into a secondary index, i.e., the ordered copy of the row value subset described above.
As shown in fig. 2, in the create index, back fill mode, an index is created using the schema change backfill,
(*SchemaChanger).distBackfill→backfiller.Run→indexbackfiller.runChunk→BuildIndexEntriesChunk→EncodeSecondaryIndexes;
distBackfill, backfiller, run, index backfiller, runchunk, buildindexendestchunk are all functions of backfilling, which is to re-import key/value of data in the table into the index when there is data in the table.
EncodeeSecondaryIndexs creates an indexed KV (i.e., keys/values), which is then plugged into storage by buildIndexEntriesChunk.
As shown in fig. 3, in the insert, first (×insert node) ×processsourcrow, then (×tableeinserter) ×row and insert row, and finally, encodieidendexes;
this is the process of inserting data, and the same need to encode the data to be indexed. After KV (i.e. key/value) is obtained through encodielndexes, this KV is stuffed into the index in the store through insert row. Also, the function value calculation is performed one time before encodiedrexes, and the result is inserted into the index.
In update, inquiring and recording the old value, generating a new value, and carrying out expression processing on the new value;
for the rows meeting the predicate condition and not meeting the condition, corresponding index data deletion processing is carried out, and the execution flow is similar to Insert.
S3, selecting function indexes and searching index data;
as shown in fig. 4, unlike a full index, a function index is not able to satisfy all queries. After matching the projection column and the index column, a matching function is added.
If the function match meets the rule, a change is made cbo by computeScanCost, which requires a reduction in the cost of cbo. Finally, cbo selects the function index, and completes the selection of the index. And then directly reading the data in the index without calculation. If the total data size of the table is large, the execution efficiency can be greatly increased by querying the function index column for the function value.
The following is the function index function syntax:
CREATE opt_unique INDEX opt_index_name ON table_name opt_using_gin_btree'('index_params')'opt_storing opt_interleave
opt_partition_by opt_idx_where opt_locate_in
CREATE opt_unique INDEX IF NOT EXISTS index_name ON table_name opt_using_gin_btree'('index_params')'opt_storing opt_interleave opt_partition_by opt_idx_where opt_locate_in
column_name is a function with columns in the table. The index is triggered when the left value of the where condition is a function of the function index at the time of the select table.
The syntax of changing the data in the function index and when using the function index is the same as the normal index.
Based on the above method, a database function index implementation device in this embodiment includes: at least one memory and at least one processor;
the at least one memory for storing a machine readable program;
the at least one processor is configured to invoke the machine-readable program to perform a database function index implementation method.
The above-mentioned specific embodiments are merely specific examples of the present invention, and the scope of the present invention is not limited to the specific embodiments, and any suitable changes or substitutions made by those skilled in the art, which conform to the technical solutions described in the claims of the present invention, should fall within the scope of the present invention.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. A method for implementing database function indexes, comprising the steps of:
s1, generating a function index;
s2, modifying and backfilling data to operate the data in the function index;
s3, selecting function indexes and searching index data.
2. The method according to claim 1, wherein in step S1, the structure of the index tree is increased by a structure, the index structure marks an index, the index column contains the index attribute, and there are several index columns of an index having several index elems;
the structural members in IndexElem have a ColmnName, namely a column name, the name of an index column is marked, a Function is added, and Function information stored in a Function index is placed in the index column information.
3. The method according to claim 2, wherein the index descriptor is an index descriptor, which marks all information of an index, and adds a function of storing the index of the function;
multiplexing the existing Expr storage function expression and predicate storage string, and obtaining a function expression through deserialization, wherein the function index is a row selected through the query condition function index, and has certain requirements on the index condition, namely, the function index can only be simply queried;
after grammar analysis, AST is generated to check the inquiry grammar tree, the function is stored as a virtual column in the index, the function can not be sub-inquiry, aggregation function, window function and function with return value, if yes, the unsupported type error is reported.
4. A method for implementing a database function index according to claim 3, wherein in step S2, when data is subjected to a pruned operation, the index is subjected to a data operation at the same time, and when the function index is created, the function value of the index is changed by changing the data, and in this process, the function expression is parsed by an expr expression stored in an indexdescriptor;
calculating the value of the function index column through funcexpr when the buildvalue is constructed, and transmitting the mark into the following flow;
when inserting data codes, it is determined whether an incoming flag requires a row of data codes to be inserted into the secondary index.
5. The method of claim 4, wherein in the create index and back fill mode, creating an index by using a schema changer backfilling mode, and performing backfilling, namely when data exists in the table, newly creating an index to re-import kv of the data in the table into the index.
6. The method according to claim 5, wherein in the insert, first of all (×insert node) & processsourcrow, then (×tableinliner) & row and insert row, and finally encodindexes;
the data to be inserted with the index is encoded, and after kv is obtained through encodieindex, the index in the storage is stuffed with kv through insert row.
7. The method according to claim 6, wherein in update, the old value is queried and recorded, a new value is generated, and the new value is subjected to expression processing;
for the rows meeting the predicate condition and not meeting the condition, the corresponding index data deletion processing is performed.
8. The method according to claim 7, wherein in step S3, after matching the projection column and the index column, the matching function is added,
if the function match meets the rule, the change of cbo is performed through the computeScanCost, the function index is finally selected by cbo, the selection of the index is completed, and then the data in the index is directly read without calculation.
9. The method of claim 8, wherein column_name is a function with columns in a table, and wherein the index is triggered when a left value of a sphere condition is a function of the function index when a table is selected.
10. A database function index implementation apparatus, comprising: at least one memory and at least one processor;
the at least one memory for storing a machine readable program;
the at least one processor being configured to invoke the machine readable program to perform the method of any of claims 1 to 9.
CN202310956793.3A 2023-08-01 2023-08-01 Database function index implementation method and device Pending CN117131042A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310956793.3A CN117131042A (en) 2023-08-01 2023-08-01 Database function index implementation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310956793.3A CN117131042A (en) 2023-08-01 2023-08-01 Database function index implementation method and device

Publications (1)

Publication Number Publication Date
CN117131042A true CN117131042A (en) 2023-11-28

Family

ID=88861893

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310956793.3A Pending CN117131042A (en) 2023-08-01 2023-08-01 Database function index implementation method and device

Country Status (1)

Country Link
CN (1) CN117131042A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106462592A (en) * 2014-03-28 2017-02-22 华为技术有限公司 Systems and methods to optimize multi-version support in indexes
US20180300350A1 (en) * 2017-04-18 2018-10-18 Microsoft Technology Licensing, Llc File table index aggregate statistics
CN109918472A (en) * 2019-02-27 2019-06-21 北京百度网讯科技有限公司 Method, apparatus, equipment and the medium of storage and inquiry data
CN114461675A (en) * 2022-02-14 2022-05-10 山东浪潮科学研究院有限公司 Partial index implementation method and system based on KV storage

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106462592A (en) * 2014-03-28 2017-02-22 华为技术有限公司 Systems and methods to optimize multi-version support in indexes
US20180300350A1 (en) * 2017-04-18 2018-10-18 Microsoft Technology Licensing, Llc File table index aggregate statistics
CN109918472A (en) * 2019-02-27 2019-06-21 北京百度网讯科技有限公司 Method, apparatus, equipment and the medium of storage and inquiry data
CN114461675A (en) * 2022-02-14 2022-05-10 山东浪潮科学研究院有限公司 Partial index implementation method and system based on KV storage

Similar Documents

Publication Publication Date Title
Wang et al. PBiTree coding and efficient processing of containment joins
US7739251B2 (en) Incremental maintenance of an XML index on binary XML data
EP1234258B1 (en) System for managing rdbm fragmentations
US8903805B2 (en) Method and system for performing query optimization using a hybrid execution plan
US8458191B2 (en) Method and system to store RDF data in a relational store
US8495085B2 (en) Supporting efficient partial update of hierarchically structured documents based on record storage
US11055284B1 (en) Optimizing domain queries for relational databases
KR101549220B1 (en) Method and System for Managing Database, and Tree Structure for Database
US20230367781A1 (en) Systems and methods for processing timeseries data
CN111125119A (en) HBase-based spatio-temporal data storage and indexing method
Borodin et al. Improving penalty function of R-tree over generalized index search tree possible way to advance performance of PostgreSQL cube extension
CN115840589A (en) Publishing method supporting heterogeneous distributed database
CN114372174A (en) XML document distributed query method and system
CN117131042A (en) Database function index implementation method and device
de Castro Lima et al. Multidimensional cyclic graph approach: representing a data cube without common sub-graphs
Wang et al. Rencoder: A space-time efficient range filter with local encoder
CN114461675A (en) Partial index implementation method and system based on KV storage
Choi et al. Updating recursive XML views of relations
KR102351846B1 (en) Query Optimization Method using Index Merge on Distributed Database
Xiao et al. Branch code: A labeling scheme for efficient query answering on trees
US7962473B2 (en) Methods and apparatus for performing structural joins for answering containment queries
US20230367801A1 (en) Systems and methods for processing timeseries data
Hadzic et al. U3-Mning Unordered Embedded Subtrees Using TMG Candidate Generation
He Succinct indexes
US20230367752A1 (en) Systems and methods for processing timeseries data

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination