CN113704246A - Database index optimization method and system based on virtual index - Google Patents

Database index optimization method and system based on virtual index Download PDF

Info

Publication number
CN113704246A
CN113704246A CN202011146017.XA CN202011146017A CN113704246A CN 113704246 A CN113704246 A CN 113704246A CN 202011146017 A CN202011146017 A CN 202011146017A CN 113704246 A CN113704246 A CN 113704246A
Authority
CN
China
Prior art keywords
index
query
virtual
indexes
log
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011146017.XA
Other languages
Chinese (zh)
Inventor
曹靖城
张继东
王猛德
王培才
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianyi Digital Life Technology Co Ltd
Original Assignee
Tianyi Smart Family Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianyi Smart Family Technology Co Ltd filed Critical Tianyi Smart Family Technology Co Ltd
Priority to CN202011146017.XA priority Critical patent/CN113704246A/en
Publication of CN113704246A publication Critical patent/CN113704246A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/256Integrating or interfacing systems involving database management systems in federated or virtual databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a system for database index optimization based on virtual indexes. The method comprises the following steps: responding to the index updating request, and synchronizing the production database and the log information; creating a set of virtual indices based on an analysis of corresponding query statements in the log; selecting and automatically updating one or more best indexes to a production database based on the actual query execution time for each virtual index in the set of virtual indexes; and reselecting and automatically updating the one or more best indexes to the production database based on a virtual index rating value reflecting an improvement in query execution time before and after each index is created.

Description

Database index optimization method and system based on virtual index
Technical Field
The invention relates to the field of databases, in particular to a database index optimization method and system based on virtual indexes.
Background
Databases are ubiquitous in software systems, both web applications and mobile APPs, and are not isolated. MySQL is one of the most popular relational database management systems, is popular with developers, has gradually become the mainstream database solution, and is widely applied in various business fields.
The index is a structure for sequencing values of one or more columns in a database table, and provides pointers to data values stored in a designated column of the table.
The creation of the current index mainly depends on the experience of developers, and when the developers realize the function of the business system, the developers create the index simultaneously according to the past experience of the developers. Because developers do not have knowledge of mastering the indexes of the database systematically, the subjectivity is strong, the indexes cannot be matched with the SQL sentences, and the query efficiency is low; in addition, when the data volume of the database table changes or the distribution of the table data changes, the historical index is not suitable any more, and the query efficiency cannot be guaranteed. These make it unreasonable to have a lot of indexes in practical application, and finally affect the query performance of the system.
The chinese patent application "database index creation method and apparatus" (201610181731.X) proposes to perform statistical analysis on each query field combination of data query operations, and to establish an index according to the query field combination when the number of data query operations satisfies a preset automatic index creation condition. The method introduced by the patent only uses the combination of query fields with more query times as a basis to create the corresponding data table index, but does not consider the running time of each index in the SQL statement of the actual environment, and cannot determine that the created index improves the actual query speed.
In Chinese patent application, "Python-based database index optimization method, device and system" (CN201811284394.2), a device is designed, a django frame based on Python is used as a front end, a database operation statement input by a user is received, and an open source tool SQLAdvisor is adopted to process and analyze at the rear end to obtain an index optimization result. The method of the patent analyzes SQL sentences input by developers to obtain index optimization suggestions, and can help SQL create or optimize indexes before online; however, when the data volume of the database table or the distribution of the table data in the actual environment changes, the index created before online cannot match the existing environment, and the query efficiency cannot be guaranteed.
Therefore, in order to further improve the query efficiency of the database index and adapt to the automatic optimization and adjustment of the database index in different service scenarios, it is desirable to provide an improved method and system for optimizing the query efficiency of the database index.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
The invention aims to provide a database index optimization method and system based on virtual indexes, which select the optimal index from the actual operation efficiency of SQL query statements, can adapt to automatic optimization and adjustment of database indexes under different service scenes, does not depend on manual experience any more, improves the index query efficiency of a database system, and improves the query performance of the database, wherein the database comprises but is not limited to a MySQL database. The invention establishes three modules: the system comprises a production module, a replication module and a self-analysis module. The method comprises the steps of synchronizing database data and logs in a production module to a replication module, selecting SQL sentences in slow query logs to perform lexical analysis and syntactic analysis to generate a group of index sets, evaluating and obtaining effective index combinations, namely virtual indexes, calculating the running time of each virtual index, comparing and selecting the best index and automatically updating the best index to a production database to realize the first updating of the virtual index, training and establishing a virtual index evaluation value model by adopting a method of fusing three models of XBOOT, LIGHT and CATBOOST on the basis of data actually generated in the first updating through a self-analysis module, predicting the best index of a production environment with real-time change of the data, automatically updating the model to the production database, and completing the real-time monitoring and maintenance of the index efficiency.
According to an aspect of the present invention, there is provided a database index optimization method based on a virtual index, the method including:
responding to the index updating request, and synchronizing the production database and the log information;
creating a set of virtual indices based on an analysis of corresponding query statements in the log;
selecting and automatically updating one or more best indexes to a production database based on the actual query execution time for each virtual index in the set of virtual indexes; and
the one or more best indexes are again selected and automatically updated to the production database based on the virtual index rating values reflecting the improvements to query execution times before and after each index creation.
According to one embodiment of the invention, the index update request is triggered periodically or in response to production database query efficiency falling below a threshold preset in the log.
According to a further embodiment of the present invention, the creating a set of virtual indices further comprises:
selecting corresponding query sentences in the log to perform lexical analysis and syntactic analysis to generate a candidate index set; and
evaluating and acquiring a virtual index set with index evaluation parameters, wherein the index evaluation parameters comprise: column combinations of equivalent predicate associations behind WHEREs in the query; fields in the ORDER or GROUP need to be included in the combined index and are consistent in sequence; and an overlay index.
According to a further embodiment of the present invention, the reselecting and automatically updating one or more optimal indexes to the production database based on the virtual index rating values reflecting improvements for query execution time before and after creation of each index further comprises:
record query and index features F ═ (F)1,f2,…,fk) (ii) a And
generating training data including the features for training of the virtual index evaluation value model, wherein the data amount is a virtual index combination number (SQL statement number) query times.
According to a further embodiment of the invention, the virtual index evaluation value model is trained and established by adopting a method of fusing XGBOOT, LIGHT TGBM and CATBOST models.
According to a further embodiment of the present invention, the re-selecting and automatically updating one or more best indexes to the production database based on the virtual index rating value reflecting the improvement of the query execution time before and after each index creation further comprises:
obtaining the actual running time of each virtual index in the virtual index set and the characteristic F ' (F ' of query and index '1,f′2,…,f′k);
By passing
Figure BDA0002739716170000031
To train and obtain a model M, wherein the model M1、M2、M3Are respectively obtained by XGBOOT, LIGHT TGBM and CATBOOST training, and the optimal parameter Para1、Para2、Para3Are respectively obtained through five-fold cross validation;
by passing
Figure BDA0002739716170000032
Fitting best model Mbest=a*M1+b*M2+c*M3Where Opt (I, q) is the index I's optimization index to query SQL statement q;
simulating a predicted optimal index using the optimal model for different SQL statements related to the same table in the log,
Figure BDA0002739716170000033
calculating an optimal index using an index evaluation function Z, Ibest=min(Z|Ii)i∈[0,k]Wherein Z ═ Σq∈SOpt (I, q); and
and comparing the index with the minimum virtual index evaluation value with the current application index, and reserving one or more best indexes.
According to a further embodiment of the invention, the Opt (I, q) is the inverse of the difference in execution time before and after creation of the index,
Figure BDA0002739716170000041
according to another aspect of the present invention, there is provided a virtual index-based database index optimization system, the system including:
a production module configured to: recording log information, monitoring query efficiency, and sending an index updating request at regular time or when the query efficiency is lower than a preset threshold value;
a replication module configured to:
synchronizing production databases, log information, and
creating a set of virtual indices based on an analysis of corresponding query statements in the log; and a self-analysis module configured to:
selecting and automatically updating one or more best indexes to a production database based on the actual query execution time for each virtual index in the set of virtual indexes; and
the one or more best indexes are again selected and automatically updated to the production database based on the virtual index rating values reflecting the improvements to query execution times before and after each index creation.
According to an embodiment of the present invention, the log information includes a user query condition in a preset time period, including an original file of a slow query log, and a statistically generated slow query SQL statement (SQL _ text), a start execution time (start _ time), a statement execution time (query _ time), a lock table time (lock _ time), a record number returned by a query (rows _ send), a number of scanned rows (rows _ extended), and a number of times of statement execution (rows _ count).
According to a further embodiment of the invention, the self-analysis module is further configured to:
record query and askThe characteristic of the lead is F ═ F1,f2,…,fk);
Generating training data comprising the features for training of a virtual index evaluation value model, wherein the data amount is a virtual index combination number (SQL statement number) query times; and
and training and establishing the virtual index evaluation value model by adopting a method of fusing XGBOOT, LIGHT TGBM and CATBOOST models based on the training data, and performing optimal index prediction based on the virtual index evaluation value model.
Compared with the scheme in the prior art, the database index optimization system and the database index optimization method provided by the invention have the following advantages:
(1) at present, some techniques implement index creation methods based on field word frequency analysis in query logs, SQL statement execution analysis, and other methods, but the query efficiency of indexes is not strongly related to the format of query fields and SQL statements, and the characteristics, data volume, data distribution of a database table, and the like are also considered, so that these techniques cannot generate indexes with optimal query efficiency. The invention provides a simulation operation method through virtual indexing, which can finally obtain the most accurate indexing result, greatly improve the actual query speed of the indexing and reduce the expenditure of hardware resources.
(2) The invention provides a method for selecting the optimal index by analyzing the virtual index combination from the slow query log and simulating the SQL statement operation calculation, which improves the operation effect of the traditional field combination index optimization based on the query log without considering the index in the actual production environment.
(3) The invention provides a brand-new method for improving index query efficiency of a MySQL database, which comprises the steps of automatically creating training samples and collecting data characteristics from SQL query simulation operation based on virtual indexes, and effectively solving the problem of insufficient prediction accuracy of the traditional index creation method by adopting a method of fusing XGBOOT, LIGHT TGBM and CATBOOST machine learning models.
(4) Has the online learning ability and strong adaptability. The traditional index creating method is mainly based on data statistics and analysis, and the method disclosed by the patent continuously trains and automatically optimizes an index evaluation value model by adopting machine learning according to the change of data characteristics of a database, continuously adapts to the requirement of the database on indexes, and finally improves the query performance.
(5) The method has universality, although database query statements and data tables of different services are different and the indexing requirements are different, the method selects the optimal index based on virtual index to copy environment simulation operation, and simultaneously, a virtual index evaluation value model is established by utilizing real data based on machine learning training and can be used for predicting the optimal index, so that the method can be used for optimizing the database index under different scenes of different services.
These and other features and advantages will become apparent upon reading the following detailed description and upon reference to the accompanying drawings. It is to be understood that both the foregoing general description and the following detailed description are explanatory only and are not restrictive of aspects as claimed.
Drawings
So that the manner in which the above recited features of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only some typical aspects of this invention and are therefore not to be considered limiting of its scope, for the description may admit to other equally effective aspects.
FIG. 1 is an exemplary architecture diagram of a virtual index based database index query efficiency optimization system according to one embodiment of the invention.
FIG. 2 is a flowchart of a method for optimizing the efficiency of database index queries based on virtual indexes, according to an embodiment of the present invention.
FIG. 3 is a flow diagram of a method for performing a first update of a virtual index based on an actual operation best index update, according to one embodiment of the invention.
FIG. 4 is a flowchart of a method for performing a second update of a virtual index based on a virtual index rating value model, according to one embodiment of the invention.
Detailed Description
The present invention will be described in detail below with reference to the attached drawings, and the features of the present invention will be further apparent from the following detailed description.
FIG. 1 is an exemplary architecture diagram of a virtual index based database index optimization system 100, according to one embodiment of the invention. As shown in FIG. 1, the system 100 of the present invention includes a production module 101, a replication module 102, and a self-analysis module 103.
The production module 101 includes a timing unit 104, an abnormality detection unit 105, a user log unit 106, and a production database 107. The production database 107 is a database of real operating services. The user log unit 106 is responsible for recording and counting user query conditions in a preset time period, including a slow query log original file, and counting generated slow query SQL statements (SQL _ text), start execution time (start _ time), statement execution time (query _ time), lock table time (lock _ time), query returned record number (rows _ send), scanned row number (rows _ exposed), statement execution times (rows _ count), and other information. The anomaly detection unit 105 is configured to record and detect whether the execution time of the current query statement is significantly aggravated than the history, generate anomalous query data, and trigger an index update request. The timing unit 104 is used to time synchronize all data and log information of the database and trigger an index update request.
The copy module 102 includes a copy unit 108 and a virtual index unit 109. The replication unit 108 is used to replicate and synchronize the production database and the user log. The virtual index unit 109 decomposes each query statement in the copied user log into query condition information and field information, analyzes and generates an index combination, and finally generates a group of virtual index combinations with index evaluation parameters by combining with the self-analysis module 103.
The self-analysis module 103 includes a calculation unit 110, a prediction unit 111, and an optimization unit 112. The calculation unit 110 is configured to perform corresponding query statement calculations using the various virtual indexes in the copied database again to generate an effect evaluation value that measures the actual effect of the virtual indexes. The prediction unit 111 is used for optimal virtual index training based on data features to realize an index automatic creation process based on data change trend. The optimization unit 112 evaluates and adjusts the prediction results.
Those skilled in the art will appreciate that the database index optimization system of the present invention and its various modules may be implemented in hardware or software, and that the modules may be combined or combined in any suitable manner.
FIG. 2 is a flow diagram of a virtual index based database index optimization method 200 according to one embodiment of the invention. The method 200 begins at step 201 where the production module 101 records log information, monitors query efficiency, and sends an index update request at regular times or upon detection of an anomaly. The log information comprises a slow query log original file, and information such as a statistically generated slow query SQL statement (SQL _ text), a start execution time (start _ time), a statement execution time (query _ time), a lock table time (lock _ time), a record number (rows _ send) returned by the query, a number of scanned rows (rows _ exposed), a statement execution frequency (rows _ count), and the like. The query efficiency criterion is based on a threshold in the slow query log, which is set based on expert experience, for example, the threshold defaults to 1 in the present system, and if the query efficiency is lower than the threshold, an exception alarm will be triggered and recorded in the exception log. The anomaly detection unit 105 in the production module 101 triggers an index update request in response to the production database query efficiency falling below a preset threshold in the slow query log. The timing unit 104 in the production module 101 periodically synchronizes all data and log information of the database to the replication module, triggering an index update request, e.g., the timing unit synchronizes data every 3 months and triggers an index update request.
In step 202, replication module 102 creates a virtual index set based on the binlog log synchronized production database, log information. After the replication and synchronization of the database and the log information are completed by the replication unit 108, the virtual index unit 109 in the replication module 102 selects the corresponding query statements to perform one-by-one analysis according to the index update triggering condition, and the analysis process is completed by a lexical analysis tool Flex and a syntactic analysis tool Bison. As an example, the SQL statement is: SELECT a, B FROM T WHERE C ═ x and a > y ORDER BY B, which can be resolved BY the virtual index unit 109 as follows:
{SELECT:{A,B},FROM:{T},WHERE:{AND:{C:{C=x},A:{A>y}}},ORDER:{B}}
the virtual index unit 109 generates a set of candidate index sets (e.g., { [ a ], [ C, a, B ], [ C, B, a ] }) according to the parsing result after the parsing is completed. The virtual index unit 109 then acquires a set of valid index combinations (i.e., virtual indexes) based on the index evaluation criterion. The virtual index evaluation criteria are as follows: (1) column combinations of equivalent predicate associations behind WHEREs in the query; (2) fields in ORDER or GROUP need to be included in the combined index and are ordered consistently (3) to include all columns in the query statement (overlay index). Examples of virtual indices with index evaluation criteria parameters are as follows:
index combination Evaluation criteria parameters
Index_1 [A]
Index_2 [C]
Index_3 [C,A] [1]
Index_4 [C,A,B] [1,3]
Index_5 [C,B,A] [2,3]
In step 203, the self-analysis module 103 performs a first update of the virtual index based on the actual operation of the optimal index update. The specific update steps are further illustrated in fig. 3.
In step 204, the self-analysis module 103 performs a second update of the virtual index based on the virtual index evaluation value model, and completes the real-time index detection and maintenance. The specific update steps are further illustrated in fig. 4.
FIG. 3 is a flow diagram of a method 300 for performing a first update of a virtual index based on an actual operation best index update, according to one embodiment of the invention. The method 300 starts in step 301, and creates each virtual index (including the un-indexed index) acquired by the virtual indexing unit 109 from the computing unit 110 in the analysis module 103, and executes and records the actual running time of each SQL query.
In step 302, the computing unit 110 records the feature attributes of the query and index as follows:
name (R) Object Description of the invention
S_TYPE Query q Query categories (select, insert, update, etc.),delete)
S_TNUM Query q Looking up the number of involved tables
S_RNUM Query q Looking up the number of records in the related table (averaging multiple tables)
S_ROWS_EXAMINED Query q Querying the number of scanned rows
S_COST Query q Cost of query
S_ROWS_SENT Query q Querying number of records returned
S_ROWS_COUNT Query q Number of times of execution of query statement
S_OPERATION Query q Number of operators involved in a query
S_CODE Query q Coding conditions of query condition field
S_ZNUM Query q Number of types of query field
I_RNUM Index I Number of records in data table where index is located
I_ZNUM Index I Number of fields contained in index
I_CODE Index I Coding cases for fields contained in the index
In step 303, training data is generated for model training and building by the prediction unit 111, where the training data includes the actual running time and feature summary F ═ F (F) of each virtual index1,f2,…,fk) And the data quantity is the virtual index combination number SQL statement number table number query times.
At step 304, the actual run times of the virtual indices are compared, the top K best indices are selected as outputs to be fed back to the production module and the indices are automatically created. And finishing the first updating process of the virtual index.
FIG. 4 is a flow diagram of a method 400 for performing a second update of a virtual index based on a virtual index rating value model, according to one embodiment of the invention. The method 400 begins at step 401 by obtaining the index actual run time and feature summary in the first update, F ═ F (F)1,f2,…,fk) Wherein a class feature (e.g., S _ TYPE) is quantized to convert to a class value, a data feature (e.g., S _ RNUM) is binned, and finally the features are summarized as F ═ F'1,f′2,…,f′k)。
In step 402, the model M is obtained by training xgoost, lightgbm and catboost respectively1、M2、M3
In step 403, the optimal parameters Para of the 3 algorithms are respectively obtained through five-fold cross validation1、Para2、Para3
At step 404, training obtains model M
Figure BDA0002739716170000091
Where M reflects the relationship between the index and the query, M (F) → Opt (I, q), where Opt (I, q) is an index of optimization of index I to query SQL statement q. Opt (I, q) is measured by the inverse of the difference in execution time before and after creation of the index, i.e.
Figure BDA0002739716170000092
In real time, Opt (I, q) cannot be directly obtained, and the value can be predicted, for example, by machine learning.
In step 405, by
Figure BDA0002739716170000093
Fitting best model Mbest=a*M1+b*M2+c*M3
At step 406, the predicted best index is simulated for different SQL statements related to the same table in the log, i.e., the best index is simulated
Figure BDA0002739716170000094
In step 407, the index evaluation function Z of the same table is used to calculate the optimal index, i.e. the index evaluation function Z of the same table is used to calculate the optimal index
Ibest=min(Z|Ii)i∈[0,k]
Wherein Z ═ Σq∈sOpt(I,q)。
In step 408, the index with the smallest evaluation value of the virtual index is indexed with the current applicationAnd comparing the rows, and keeping the first K best indexes, namely automatically deleting the indexes which are not in the first K best indexes and automatically creating the indexes which are not created in the first K best indexes. Furthermore, in the case where the production database does not actually create an index, the virtual index may be directly based on the best model M fitted in step 405bestImplementation of evaluation value ZI=∑q∈sOpt (I, q).
It should be understood that, in an actual database environment, after an index update request is issued, the self-analysis module 103 no longer needs to spend a lot of time to create and execute each SQL query statement under each virtual index, and compares actual running time to select an optimal index, but completes an index real-time detection and maintenance task based on the virtual index evaluation value model.
What has been described above includes examples of aspects of the claimed subject matter. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the claimed subject matter, but one of ordinary skill in the art may recognize that many further combinations and permutations of the claimed subject matter are possible. Accordingly, the disclosed subject matter is intended to embrace all such alterations, modifications and variations that fall within the spirit and scope of the appended claims.

Claims (10)

1. A database index optimization method based on virtual index is characterized by comprising the following steps:
responding to the index updating request, and synchronizing the production database and the log information;
creating a set of virtual indices based on an analysis of corresponding query statements in the log;
selecting and automatically updating one or more best indexes to a production database based on the actual query execution time for each virtual index in the set of virtual indexes; and
the one or more best indexes are again selected and automatically updated to the production database based on the virtual index rating values reflecting the improvements to query execution times before and after each index creation.
2. The method of claim 1, wherein the index update request is triggered periodically or in response to production database query efficiency falling below a threshold preset in the log.
3. The method of claim 1, wherein the creating a set of virtual indices further comprises:
selecting corresponding query sentences in the log to perform lexical analysis and syntactic analysis to generate a candidate index set; and
evaluating and acquiring a virtual index set with index evaluation parameters, wherein the index evaluation parameters comprise: column combinations of equivalent predicate associations behind WHEREs in the query; fields in the ORDER or GROUP need to be included in the combined index and are consistent in sequence; and an overlay index.
4. The method of claim 1, wherein the reselecting and automatically updating one or more best indexes to a production database based on a virtual index rating value reflecting improvements to query execution time before and after each index creation further comprises:
record query and index features F ═ (F)1,f2,…,fk) (ii) a And
generating training data including the features for training of the virtual index evaluation value model, wherein the data amount is a virtual index combination number (SQL statement number) query times.
5. The method of claim 4, wherein the virtual index rating model is trained and established by a method of fusing XGBOOT, LIGHT TGBM and CATBOOST models.
6. The method of claim 5, wherein reselecting and automatically updating one or more best indexes to the production database based on a virtual index rating value reflecting improvements to query execution time before and after each index creation further comprises:
obtaining an actual query execution time for each virtual index in the set of virtual indices and quantized features F ' (F ' of queries and indices '1,f′2,…,f′k);
By passing
Figure FDA0002739716160000021
To train and obtain a model M, wherein the model M1、M2、M3Are respectively obtained by XGBOOT, LIGHT TGBM and CATBOOST training, and the optimal parameter Para1、Para2、Para3Are respectively obtained through five-fold cross validation;
by passing
Figure FDA0002739716160000022
Fitting best model Mbest=a*M1+b*M2+c*M3Where Opt (I, q) is the index I's optimization index to query SQL statement q;
simulating a predicted optimal index using the optimal model for different SQL statements related to the same table in the log,
Figure FDA0002739716160000023
calculating an optimal index using an index evaluation function Z, Ibest=min(Z|Ii)i∈[0,k]Wherein Z ═ Σq∈SOpt (I, q); and
and comparing the index with the minimum virtual index evaluation value with the current application index, and reserving one or more best indexes.
7. The method of claim 6, wherein Opt (I, q) is the inverse of the difference between the pre-creation and post-creation execution times for the index,
Figure FDA0002739716160000024
8. a virtual index based database index optimization system, the system comprising:
a production module configured to: recording log information, monitoring query efficiency, and sending an index updating request at regular time or when the query efficiency is lower than a preset threshold value;
a replication module configured to:
synchronizing production databases, log information, and
creating a set of virtual indices based on an analysis of corresponding query statements in the log; and a self-analysis module configured to:
selecting and automatically updating one or more best indexes to a production database based on the actual query execution time for each virtual index in the set of virtual indexes; and
the one or more best indexes are again selected and automatically updated to the production database based on the virtual index rating values reflecting the improvements to query execution times before and after each index creation.
9. The system of claim 8, wherein the log information includes user query conditions in a preset time period, including a slow query log original file, and statistically generated slow query SQL statements (SQL _ text), start execution time (start _ time), statement execution time (query _ time), lock table time (lock _ time), number of records returned by the query (rows _ send), number of scanned rows (rows _ affected), and number of times of statement execution (rows _ count).
10. The system of claim 8, wherein the self-analysis module is further configured to:
record query and index features F ═ (F)1,f2,…,fk);
Generating training data comprising the features for training of a virtual index evaluation value model, wherein the data amount is a virtual index combination number (SQL statement number) query times; and
and training and establishing the virtual index evaluation value model by adopting a method of fusing XGBOOT, LIGHT TGBM and CATBOOST models based on the training data, and performing optimal index prediction based on the virtual index evaluation value model.
CN202011146017.XA 2020-10-23 2020-10-23 Database index optimization method and system based on virtual index Pending CN113704246A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011146017.XA CN113704246A (en) 2020-10-23 2020-10-23 Database index optimization method and system based on virtual index

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011146017.XA CN113704246A (en) 2020-10-23 2020-10-23 Database index optimization method and system based on virtual index

Publications (1)

Publication Number Publication Date
CN113704246A true CN113704246A (en) 2021-11-26

Family

ID=78646697

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011146017.XA Pending CN113704246A (en) 2020-10-23 2020-10-23 Database index optimization method and system based on virtual index

Country Status (1)

Country Link
CN (1) CN113704246A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115017130A (en) * 2022-01-19 2022-09-06 昆明理工大学 Multi-attribute index selection based on deep reinforcement learning
CN115827646A (en) * 2023-02-22 2023-03-21 北京仁科互动网络技术有限公司 Index configuration method and device and electronic equipment
CN118132566A (en) * 2024-04-30 2024-06-04 深圳九有数据库有限公司 Database index optimization method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020010701A1 (en) * 1999-07-20 2002-01-24 Platinum Technology Ip, Inc. Method and apparatus for viewing the effect of changes to an index for a database table on an optimization plan for a database query
US20050203940A1 (en) * 2004-03-12 2005-09-15 Sybase, Inc. Database System with Methodology for Automated Determination and Selection of Optimal Indexes
SG126690A1 (en) * 2002-02-27 2006-11-29 Sin Etke Technology Co Ltd Method of transferring facility information by radio and control center

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020010701A1 (en) * 1999-07-20 2002-01-24 Platinum Technology Ip, Inc. Method and apparatus for viewing the effect of changes to an index for a database table on an optimization plan for a database query
CN1361890A (en) * 1999-07-20 2002-07-31 计算机联合思想公司 A database system for viewing effects of changes to a index for a query optimization plan
SG126690A1 (en) * 2002-02-27 2006-11-29 Sin Etke Technology Co Ltd Method of transferring facility information by radio and control center
US20050203940A1 (en) * 2004-03-12 2005-09-15 Sybase, Inc. Database System with Methodology for Automated Determination and Selection of Optimal Indexes

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115017130A (en) * 2022-01-19 2022-09-06 昆明理工大学 Multi-attribute index selection based on deep reinforcement learning
CN115827646A (en) * 2023-02-22 2023-03-21 北京仁科互动网络技术有限公司 Index configuration method and device and electronic equipment
CN118132566A (en) * 2024-04-30 2024-06-04 深圳九有数据库有限公司 Database index optimization method

Similar Documents

Publication Publication Date Title
CN113704246A (en) Database index optimization method and system based on virtual index
CN109408347B (en) A kind of index real-time analyzer and index real-time computing technique
Ghezzi et al. Mining behavior models from user-intensive web applications
Zeng et al. The analytical bootstrap: a new method for fast error estimation in approximate query processing
US8983936B2 (en) Incremental visualization for structured data in an enterprise-level data store
Wang et al. Estimating the completion time of crowdsourced tasks using survival analysis models
US20150066987A1 (en) Method and system for accessing a set of data tables in a source database
CN109523157B (en) Business process processing method and system
CN106557558B (en) Data analysis method and device
EP3674918A2 (en) Column lineage and metadata propagation
CN113779272A (en) Data processing method, device and equipment based on knowledge graph and storage medium
CN111241079A (en) Data cleaning method and device and computer readable storage medium
Pullokkaran Analysis of data virtualization & enterprise data standardization in business intelligence
CN105511869A (en) Demand tracking system and method based on user feedback
CN110750582B (en) Data processing method, device and system
Ding et al. Efficient currency determination algorithms for dynamic data
CN112634004A (en) Blood margin map analysis method and system for credit investigation data
CN115098336A (en) Method, system, equipment and storage medium for monitoring warehouse tasks
TWI695285B (en) Regression method and system based on system program infrastructure
CN107730021A (en) A kind of operational indicator optimization method and device
US20200167668A1 (en) Dynamic rule execution order
JP2017010376A (en) Mart-less verification support system and mart-less verification support method
CN113779064B (en) Intelligent optimization method and system for online sql
Hao et al. Ontology Alignment Repair Through 0-1 Programming
CN118012916B (en) Report generation method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20220210

Address after: Room 1423, No. 1256 and 1258, Wanrong Road, Jing'an District, Shanghai 200072

Applicant after: Tianyi Digital Life Technology Co.,Ltd.

Address before: 201702 3rd floor, 158 Shuanglian Road, Qingpu District, Shanghai

Applicant before: Tianyi Smart Family Technology Co.,Ltd.