CN114048216A - Index selection method, electronic device and storage medium - Google Patents

Index selection method, electronic device and storage medium Download PDF

Info

Publication number
CN114048216A
CN114048216A CN202210029432.XA CN202210029432A CN114048216A CN 114048216 A CN114048216 A CN 114048216A CN 202210029432 A CN202210029432 A CN 202210029432A CN 114048216 A CN114048216 A CN 114048216A
Authority
CN
China
Prior art keywords
index
database
workload
candidate set
selection method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210029432.XA
Other languages
Chinese (zh)
Other versions
CN114048216B (en
Inventor
杨魁
李忠良
周祥生
屠要峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp filed Critical ZTE Corp
Priority to CN202210029432.XA priority Critical patent/CN114048216B/en
Publication of CN114048216A publication Critical patent/CN114048216A/en
Application granted granted Critical
Publication of CN114048216B publication Critical patent/CN114048216B/en
Priority to PCT/CN2022/136701 priority patent/WO2023134329A1/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2433Query languages
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases

Abstract

The embodiment of the invention relates to the field of databases, and discloses an index selection method, electronic equipment and a storage medium. The index selection method of the invention comprises the following steps: acquiring database operation data input by a user; generating a database index candidate set through a preset rule according to database operation data; and calculating the workload of the database index candidate set, and selecting a database index recommendation set from the database index candidate set according to the workload of the database index candidate set and the constraint condition of the storage space. The index selection method can automatically recommend the proper index to the user.

Description

Index selection method, electronic device and storage medium
Technical Field
The embodiment of the application relates to the field of databases, in particular to an index selection method, electronic equipment and a storage medium.
Background
An index is a structure that sorts one or more columns of values in a database table, and the index can be used to quickly access specific information in the database table. Although the establishment of the database index can improve the query performance of the database management system, additional disk overhead and index maintenance cost (such as index update caused by query update) exist at the same time, and the establishment of improper indexes may not help the query speed greatly.
Therefore, it is a very important problem to select the fields or field combinations of the tables in the database to build the index, and improper selection will reduce the query performance and occupy hardware resources. How to automatically select a proper index by the device can effectively improve the query performance of the database, which becomes an urgent problem to be solved in the industry.
Disclosure of Invention
The embodiment of the application mainly aims to provide an index selection method, electronic equipment and a storage medium, so that a proper index can be automatically recommended to a user.
In order to achieve the above object, an embodiment of the present application provides an index selection method, including: acquiring database operation data input by a user; generating a database index candidate set through a preset rule according to database operation data; and calculating the workload of the database index candidate set, and selecting a database index recommendation set from the database index candidate set according to the workload of the database index candidate set and the constraint condition of the storage space.
To achieve the above object, an embodiment of the present application further provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute the index selection method.
To achieve the above object, an embodiment of the present application further provides a computer-readable storage medium storing a computer program, which when executed by a processor implements the above index selection method.
The index selection method provided by the application obtains the database operation data input by the user and obtains the index selection information according to the database operation data, generating a database index candidate set through a preset rule, calculating the workload of the database index candidate set, and selecting a database index recommendation set from the database index candidate set according to the working load and storage space constraint conditions of the database index candidate set, because the database index candidate set is generated according to the database operation data input by the user, the indexes meeting the use requirements of the user are contained in the database index candidate set, and the database index recommendation set is selected from the database index candidate set according to the working load and the storage space constraint condition, therefore, the indexes in the database index recommendation set also take the workload and storage problems of the indexes in the database into account, so that the proper indexes can be automatically recommended to the user.
Drawings
FIG. 1 is a first flowchart illustrating an index selection method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a second method for selecting an index according to an embodiment of the present invention;
FIG. 3 is a third flowchart illustrating an index selection method according to an embodiment of the present invention;
FIG. 4 is a flow chart illustrating a database data index recommended state representation phase according to an embodiment of the present invention;
FIG. 5 is a flow chart illustrating a control parameter exploration and optimization phase of a database index recommendation system according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an index selection apparatus according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present application clearer, the embodiments of the present application will be described in detail below with reference to the accompanying drawings. However, it will be appreciated by those of ordinary skill in the art that in the examples of the present application, numerous technical details are set forth in order to provide a better understanding of the present application. However, the technical solution claimed in the present application can be implemented without these technical details and various changes and modifications based on the following embodiments. The following embodiments are divided for convenience of description, and should not constitute any limitation to the specific implementation manner of the present application, and the embodiments may be mutually incorporated and referred to without contradiction.
An embodiment of the present invention relates to an index selection method, as shown in fig. 1, including the following steps:
step 101, obtaining database operation data input by a user;
102, generating a database index candidate set according to database operation data through a preset rule;
and 103, calculating the workload of the database index candidate set, and selecting a database index recommendation set from the database index candidate set according to the workload of the database index candidate set and the constraint condition of the storage space.
The index selection method is applied to electronic equipment for providing database service for users, for example, a computer for managing the database by the users. The index selection method can be realized in a mode of being integrated in database service software or in a mode of developing a database plug-in to interact with the database service software.
The database index recommendation technology can be divided into two technologies, namely a rule-based technology and a cost-based technology, the rule-based technology generally utilizes a rule to construct a field of an index, and then screening is carried out through a database optimizer so as to give a proper index, and the cost-based technology generally utilizes a machine learning method to construct a quantitative model of the index, and estimates the query optimization effect of the index according to the model so as to select the proper index. However, both of these approaches have disadvantages: the rule-based technique does not take into account the cost of index maintenance due to write queries (insert, update and delete, and query); however, the cost model technology requires offline model training and online model use, and when a complex scene model fails, the model needs to be updated continuously to adapt to a more complex scene, so that the model has hysteresis and cannot be used for timely performing appropriate index selection recommendation for a user.
The index selection method of the application generates a database index candidate set through a preset rule according to database operation data by acquiring the database operation data input by a user, selecting a database index recommendation set from the database index candidate set according to the working load and storage space constraint conditions of the database index candidate set, because the database index candidate set is generated according to the database operation data input by the user, the indexes meeting the use requirements of the user are contained in the database index candidate set, and the database index recommendation set is selected from the database index candidate set according to the working load and the storage space constraint condition, therefore, the indexes in the database index recommendation set also take the workload and storage problems of the indexes in the database into account, so that the proper indexes can be automatically recommended to the user.
The following describes the implementation details of the index selection method of the present embodiment in detail, and the following description is only provided for facilitating understanding of the implementation details and is not necessary for implementing the present embodiment.
In step 101, the electronic device obtains database operation data input by a user. The electronic equipment can read all input information of the user in the database software, so that the database operation data input by the user in the database can be acquired. For example, the collected database operation data may include, but is not limited to: the user can add, delete, change, check and the like database access or operation statements when using the database.
In one example, the electronic device obtains the database operation data input by the user by obtaining the database operation statements input by the user and frequency information of the database operation statements. The workload of the index configuration group in step 103 is calculated according to the database operation statement and the frequency information corresponding to the index configuration group.
In this embodiment, the database operation statements input by the user and the frequency information of the database operation statements are obtained as database operation data, and the workload of the index configuration group is calculated by using the database operation statements and the frequency information corresponding to the index configuration group.
In step 102, the electronic device generates a database index candidate set according to the database operation data and a preset rule. The preset rule may be to construct a single attribute index for several attributes related to the maximum number of times in the database operation data, or to construct a multi-attribute index using several attributes related to the maximum number of times in a combination including multiple attributes, or to construct an index only for the attributes related to the query operation of the database, and so on.
In one example, the preset rule includes any one or any combination of the following:
a first rule: generating all single attribute indexes by using attributes in J, EQ and RANGE;
the second rule is as follows: when the attributes in the O are from the same table, generating an index by using all the attributes in the O;
a third rule: if the join of table a and table b has multiple attributes, generating an index using the attributes of the join;
the fourth rule is that: expressing the attribute of J or the index generated by the rule 3 as J, expressing the attribute of RANGE as r, and respectively constructing an index J + EQ + r, an index J + r and an index J + EQ;
a fifth rule: if the number of the attributes in the USED is less than the maximum number of the attributes in the user-defined index, generating indexes by using the rest attributes in the USED;
where J is the attribute that appears in the JOIN operation, EQ is the attribute that appears in the EQUAL EQUAL operation, RANGE is the attribute that appears in the Range query RANGE operation, O is the attribute that appears in the GROUP GROUP BY, sort ORDER BY operation, and USED is the attribute that appears in the query result.
In this embodiment, the attribute of the index to be created may be extracted from the database operation statement input by the user through the device rule, so as to generate the database index candidate set.
In step 103, the electronic device selects a recommended set of database indexes from the candidate set of database indexes according to the workload and storage space constraints of the candidate set of database indexes. And the workload is a quantized value consumed by the query of the database after the index is newly added.
In one example, the workload of the index configuration group is calculated by the following formula:
Figure 853536DEST_PATH_IMAGE001
where W is the workload of index configuration set X, n is the number of database operation statements, QiIs a database operation statement, fiIs QiFrequency of (1), Cost (Q)iX) is to execute Q under the index configuration group XiThe cost function of (2) is calculated by a database optimizer in the database.
In one example, the electronic device selects a recommended set of database indexes from the candidate set of database indexes according to the workload and storage space constraints of the candidate set of database indexes by: and adding a candidate index in the database index candidate set to the index configuration group each time through a reinforcement learning model, iteratively calculating to obtain the index configuration group which has the minimum workload and meets the constraint condition of a storage space, and taking the finally obtained index configuration group as a database index recommendation set.
The reinforcement learning model determines and selects a new index for the index configuration group in each iteration calculation, and the index is an optimal candidate index selected from the candidate indexes by simulating and adding each candidate index in the database index candidate set to the index configuration group respectively, calculating the workload of the index configuration group and confirming the constraint condition of the storage space. The advantages and disadvantages are obtained by judging through the minimum workload of the index configuration group and the satisfaction of the constraint conditions of the storage space.
In this embodiment, because the reinforcement learning model does not need to be trained in advance, iterative computation can be performed to find an optimal solution after iterative computation parameters are set, and therefore, through the reinforcement learning model, a candidate index in the database index candidate set is added to the index configuration set each time, an index configuration set with the smallest workload and meeting the storage space constraint condition is obtained through iterative computation, and the finally obtained index configuration set is used as the database index recommendation set, so that the time required by training the model can be reduced, and thus the database index recommendation set can be obtained quickly.
In one example, the electronic device implements a reinforcement learning model by adding a candidate index in the database index candidate set to the index configuration group each time, and iteratively calculating to obtain an index configuration group with a minimum workload and satisfying a storage space constraint condition: converting the database index candidate set into a state vector, inputting the state vector into a reinforcement learning model, and selecting one state in the state vector by the reinforcement learning model to obtain the action of the state; the action comprises the steps that candidate indexes corresponding to the states are added to the index configuration group, the optimization degree of the workload of the index configuration group obtained through the action is calculated by the reinforcement learning model and is used as an incentive value, the reinforcement learning model selects the next state from the state vector, and the index configuration group which is the smallest in workload and meets the constraint condition of the storage space is obtained through iterative calculation.
In the embodiment, the database index candidate set is converted into the state vector, the state vector is input into the reinforcement learning model, and the reinforcement learning model selects one state in the state vector to acquire the action of the state; the action comprises the steps of adding the candidate indexes corresponding to the states to the index configuration group, calculating the optimization degree of the workload of the index configuration group obtained by the action of the reinforcement learning model to be used as an incentive value, selecting the next state from the state vector by the reinforcement learning model, and performing iterative computation to obtain the index configuration group which has the minimum workload and meets the constraint condition of the storage space, so that the iterative computation of the index configuration group is performed by the reinforcement learning.
Further, the reward value is calculated by the following formula:
Figure 386149DEST_PATH_IMAGE002
wherein r istIs the prize value at time t, W (X)t-1): at time t-1, the index configuration is Xt-1The value of the hourly workload W;
W(Xt): at time t, the index is configured as XtValue of time-dependent workload W, Storage (X)t): at time t, the index is configured as XtThe size of the storage space in time, M is a constraint upper limit value of storage (x), which can be preset by a user according to the configuration of the electronic device.
The index selection method can effectively and quickly find the index suitable for being built through online interactive learning with the database.
In one example, as shown in FIG. 2, the electronic device implements an index selection method through a framework of reinforcement learning.
In the framework of reinforcement learning, several concepts need to be clarified first:
the database index recommends control exploratory parameters: controlling those items in the index candidate set requires indexing, commonly referred to as action a in reinforcement learning.
Database index recommended state parameter vector: the current state of the data index is characterized and can be used as the state S in reinforcement learning.
Database workload cost: the cost of all SQL is the sum of the cost of each SQL is the frequency of the SQL multiplied by the cost estimation of the SQL in the database query optimizer.
Index recommendation control parameter rewards: the reward reflects the quality of the control parameters of the index recommendation system, and reflects the cost degree of the control parameters on one hand; on the other hand, the safety of the control parameters is reflected, the requirement of index storage space is met, and the database can safely and normally run after the index is created; commonly referred to as reward R in reinforcement learning.
A basic process of reinforcement learning: the reinforcement learning model generates a corresponding exploration action A according to the state S, then acts the action A on the interactive environment ENV, and then the ENV feeds back a corresponding inheritance state S and an action reward R to the model, and the reinforcement learning model performs self-optimization according to the reward R fed back by the environment ENV.
The index selection method of this embodiment can be implemented by four modules: the system comprises an sql collecting module, a rule analyzing module, an index recommendation parameter cost reward calculating module and an index recommendation exploration optimizing module. Wherein:
sql collection module: the method is used for collecting sql information of the database and frequency information.
A rule analysis module: the sql statement is parsed using database expert experience rules (i.e., preset rules) to generate an index candidate set.
The index recommendation parameter energy consumption reward calculation module: the method is used for calculating the degree of the current generated index recommendation control exploration parameter to the database workload cost.
The index recommendation exploration optimization module: the method is used for interacting with the database, exploring in the index candidate set, and enabling the database workload cost corresponding to the recommended index to be minimum under the condition that a certain constraint condition is met.
As shown in fig. 2 and 3, the method includes the following steps:
step 1, collecting user sql and the frequency of each sql from a database;
step 2, generating an index candidate set by using a rule analysis module according to the sql in the step 1;
step 3, converting the index candidate set into a database index recommendation state parameter vector according to the step 2;
step 4, inputting the parameter vector of the index recommendation state of the database in the step 3 into an index recommendation exploration optimization module;
step 5, an algorithm Rainbow in the index recommendation exploration optimization module receives the state parameter vector, outputs an action, and converts the action into an index needing to be created in an index candidate set;
step 6, according to the index item selected in step 5, creating an index for the item in the data index plug-in;
step 7, calling an award calculation module to calculate the award value R of the action, and storing the formed (S, A, R, S) tuple;
and 8, optimizing the Rainbow control parameter exploration optimization model according to the accumulated interactive experience.
The sql collecting module is specifically used for acquiring all sql sentences of the user in a period of time from the database and counting frequency information of each sentence. This information is typically collected from a table in the database log that stores sql, with database D containing n tables in the form { T }1,T2, ⋯,Tn},TiIs a table. The workload W (of size m) is a set of { (Q)1,f1), (Q2,f2),⋯,(Qn,fn) Of the form (i) }, in which QiIs an sql statement, and fiIs QiOne index configuration X is a set of indices. Each index in X may be a single-attribute or multi-attribute index. Storage (X) represents the storage space occupied by | X |, the constraint upper limit of storage (X) is M, | X |, represents the number of indexes in X, and | X |, the constraint upper limit is N. The constraint on the index to be built is denoted by c. And c is composed of an index maximum storage budget upper limit M and a maximum index number N.
And the rule analysis module is used for storing the database index candidate set generation rule. The electronic device groups the attributes of a table in the query into five groups:
j: attributes that appear in the JOIN condition.
EQ: attributes that occur under EQUAL conditions.
RANGE: attributes that appear in RANGE conditions.
O: attributes that appear in the GROUP BY, ORDER BY clauses.
The USED: the attributes displayed in this query.
With the above defined set, the electronic device proposes five rules to guide the generation of "index candidates":
a first rule: all single attribute indexes are constructed using attributes in J, EQ and RANGE.
The second rule is as follows: when the attributes in O are from the same table, an index is generated using all the attributes in O.
A third rule: if the join of Table a and Table b has multiple attributes, an index is constructed using all of the join attributes.
The fourth rule is that: and expressing J as an attribute of J or the index generated by the rule 3, and expressing r as an attribute of RANGE, and respectively constructing an index J + EQ + r, an index J + r and an index J + EQ. The "index j + r" represents a multi-attribute index including j and r, and the index j + EQ + r, the index EQ + r, and the index j + EQ are the same.
A fifth rule: if the number of attributes in the USED is less than the maximum number of attributes in the user-defined index, please construct the index by appending the remaining attributes (attributes in the USED, not the attributes in the index generated by the rules described above) to the created index.
And the index recommendation parameter energy consumption reward calculation module is used for giving an action or converting the action into index configuration through a reinforcement learning algorithm Rainbow, the query Cost of each sql is Cost (Q, X) which represents the Cost of querying Q under the index configuration X, wherein f is frequency, and the Cost (Q, X) value is obtained in a manner of obtaining the Cost of querying Q under the index configuration X through a database index hypothesis plug-in as shown in FIG. 1. Workload cost W (X) represents the cost of workload W under index configuration X. The calculation formula of W (X) is:
Figure DEST_PATH_IMAGE003
action index configuration X at time ttIs awardedr tThe calculation formula of (a) is as follows:
Figure 116208DEST_PATH_IMAGE004
wherein:
W(Xt-1) Is that at time t-1 the index configuration is Xt-1An estimated cost value of the temporal workload W;
W(Xt) Is that at time t the index is configured as XtAn estimated cost value of the temporal workload W;
Storage(Xt) Is that at time t the index is configured as XtIn time, index is matchedIs set to XtThe size of the storage space of (a);
m is the constraint upper limit of storage (X).
The index recommendation exploration optimization module is used for converting the index candidate set into a database index recommendation state parameter vector, for example, the candidate set has three items to be indexed, the index type of each item is two, the length of the state vector is 6, a bitmap is used for coding, the state vector is generated and is input into the index recommendation exploration optimization module; creating an index from the item in the index plug-in, invoking a reward calculation module to calculate a reward value R for the action, and forming (S)t,At,Rt,St+1) Storing the tuple experience samples; and optimizing optimization parameters of the Rainbow algorithm model according to the accumulated empirical samples, and circularly iterating the process until the optimal index configuration is found so that the cost of the workload W is minimum.
In one example, the electronic device implements index selection, which requires two stages of state representation by database index recommendation, exploration and optimization of control parameters by the database index recommendation system.
In the database data index recommendation state representation phase, as shown in fig. 4, the electronic device performs the following steps:
s401, collecting SQL sentences added, deleted, changed and checked when the user uses the database, and counting the frequency of each sentence.
S402, the invention provides four expert experience rules for generating the index candidate set, and analyzes each SQL statement collected in the step S401 according to the expert experience rules to generate the index candidate set.
S403, sorting each index item in the index candidate set generated in S402 and the type needing to be indexed, and converting the index items into a database index recommendation state parameter vector through bitmap coding.
In the database index recommendation system control parameter exploration and optimization phase, as shown in fig. 5, the electronic device performs the following steps:
s501, calculating the cost of the work load of the database before trying on the index, giving the upper limit constraint of the storage space required by the index, and giving the upper limit constraint of the number of index items.
S502 takes the initial database index recommendation state parameter vector in S403 as the input of a reinforcement learning algorithm Rainbow, and the reinforcement learning algorithm Rainbow outputs a database index recommendation system control exploratory parameter.
S503, controlling exploratory parameters according to the database index recommendation system in S502, and calculating the cost of the current work load of the database and the storage space required by the index after the exploratory parameters are implemented.
S504 calculates the reward of the exploration according to the cost of the workload of the database before the indexing in S501 and the upper limit of the storage space required by the given indexing, and the cost of the current workload of the database after the exploration parameters are implemented and the storage space required by the indexing in S503.
And S505, according to the assumption in S503 that whether the storage space required by the index subjected to the search parameter is larger than the upper limit of the storage space required by the given index in S501, if so, stopping the round, otherwise, continuing.
S506 updates the database index recommended state parameter vector in S403 according to the exploratory parameter controlled by the database index recommendation system generated in S502.
S507 combines the updated database index recommended state parameter vector in S506, the database index recommended system control exploratory parameter generated in S502, and the reward value in S504 into a triple (the database index recommended state parameter vector, the database index recommended system control exploratory parameter, and the reward value) as the empirical sample of the database intelligent index recommended system control parameter, and stores and records the triple.
S508 takes the experience sample in S507 as the input of the reinforcement learning algorithm Rainbow in S502, and the Rainbow parameters of the algorithm are updated.
S509 judges whether the number of the given index item upper limit in S501 is less than or equal to the number of the given index item upper limit in the database index recommendation state parameter vector, if the number of the given index item upper limit is less than or equal to the number of the given index item upper limit, the step S502 to the step S509 are executed circularly until the number of times of exploring the control parameter of the current database index recommendation system reaches the set requirement, then the loop is jumped out, the step S510 is jumped to, and if the number of times of exploring the control parameter of the current database index recommendation system reaches the set requirement, the round is ended.
S510, searching and selecting the index recommendation configuration recommendation of the reason for the minimum cost value of the workload to the user according to the steps.
The present application is exemplified by two specific examples, such as index selection methods, which are the case of index selection in two databases, as shown in the following table.
Table 1: index selection example
Example ID Database identification Number of database tables Upper limit of number of indexes Index store ceiling Sql Collection time Length Number of index types Number of iterations of the algorithm
Example one B 4 10 1GB 1 month 2 100
Example two C 8 20 2GB 2 months old 4 200
Example one:
the first step is as follows: and acquiring all sql sentences used by the user within 1 month in the database B by using an sql collecting module, and counting the frequency of each sentence.
The second step is that: and (3) carrying out rule analysis on the sql statement in the first step, generating an index candidate set after the analysis, setting the number of items of the index candidate set to be L, adding index type information in the index candidate set as each item has an index type of 2, increasing the number of items of the index candidate set to be 2L, sequencing the index candidate set, and generating a state vector S by using bitmap coding, wherein the dimension of S is 2L.
The third step: and initializing a rainbow algorithm in the index recommendation exploration optimization module, and setting actions as 2L types.
The fourth step: inputting S into the rainbow algorithm gives the action A value as an i integer value (0 < i < 2L + 1).
The fifth step: finding the item corresponding to the position i in the sorted index candidate set, calling an index plug-in of the database A to index the item i, and acquiring a storage space used for creating the index of the item i through an optimizer of the database B as s 1. And (5) judging whether the s1 is less than or equal to the index storage upper limit 1G, if so, continuing to the next step, and if not, newly executing the step (i) and the step (v).
And a sixth step: and calling the reward calculation module R under the condition of indexing the i items, and updating the S. And storing the experience samples forming the (S, A, R, S) tuples, and inputting the experience samples into a rainbow algorithm to update the model parameters.
The seventh step: judging whether the number of the index items is less than or equal to the upper limit of the index number by 10, if so, circularly executing the fourth step to the seventh step, otherwise, updating the S state in the second step, and ending the round.
Eighth step: and judging whether the iteration times of the algorithm are less than or equal to the upper limit of the iteration times of the algorithm of 100, if so, circularly executing the fourth step to the eighth step, and if not, ending.
The ninth step: and finding out the index configuration corresponding to the minimum workload cost of the database B in the index exploration, and recommending the index configuration to the user as recommended configuration.
Example two:
the first step is as follows: and acquiring all sql sentences used by the user within 2 months in the database C by using an sql collecting module, and counting the frequency of each sentence.
The second step is that: and (3) carrying out rule analysis on the sql statement in the first step, wherein the details of the rule are shown in section 5.4, generating an index candidate set after the analysis, setting the number of items of the index candidate set to be K, increasing index type information in the index candidate set to 4K items as the index type of each item is 4, sequencing the index candidate set, and generating a state vector S by using bitmap coding, wherein the dimension of S is 4K.
The third step: and initializing a rainbow algorithm in the index recommendation exploration optimization module, and setting actions as 4K types.
The fourth step: s is input into the rainbow algorithm, giving the action A value as a j integer value (0 < j < 4K + 1).
The fifth step: finding the item corresponding to the position i in the sorted index candidate set, calling an index plug-in of the database A, indexing the j item, and acquiring a storage space used for creating the j item index through an optimizer of the database C as s 2. And judging whether the s2 is less than or equal to the index storage upper limit 2G, if so, continuing to the next step, and if not, newly executing the step j, and if not, executing the step four and the step five.
And a sixth step: and calling the reward calculation module R under the condition of indexing the j items, and updating the S. And storing the experience samples forming the (S, A, R, S) tuples, and inputting the experience samples into a rainbow algorithm to update the model parameters.
The seventh step: judging whether the number of the index items is less than or equal to the upper limit of the index number 20, if so, circularly executing the fourth step to the seventh step, otherwise, updating the S state in the second step, and ending the round.
Eighth step: and judging whether the iteration times of the algorithm are less than or equal to the upper limit 200 of the iteration times of the algorithm, if so, circularly executing the fourth step to the eighth step, and if not, ending.
The ninth step: and finding out the index configuration corresponding to the minimum workload cost of the database A in the index exploration, and recommending the index configuration to the user as recommended configuration.
The steps of the above methods are divided for clarity, and the implementation may be combined into one step or split some steps, and the steps are divided into multiple steps, so long as the same logical relationship is included, which are all within the protection scope of the present patent; it is within the scope of the patent to add insignificant modifications to the algorithms or processes or to introduce insignificant design changes to the core design without changing the algorithms or processes.
An embodiment of the present invention further relates to an index selection apparatus, as shown in fig. 6, including:
an obtaining module 601, configured to obtain database operation data input by a user;
a generating module 602, configured to generate a database index candidate set according to the database operation data through a preset rule;
the selecting module 603 is configured to calculate a workload of the database index candidate set, and select a database index recommendation set from the database index candidate set according to the workload of the database index candidate set and a storage space constraint condition.
Since the above embodiments correspond to the present embodiment, the present embodiment can be implemented in cooperation with the above embodiments. Related technical details mentioned in the above embodiments are still valid in this embodiment, and the technical effects that can be achieved in the above embodiments can also be achieved in this embodiment, and are not described herein again in order to reduce repetition. Accordingly, the related-art details mentioned in the present embodiment can also be applied to the above-described embodiments.
Embodiments of the present invention also relate to an electronic device, as shown in fig. 7, including: at least one processor 701; a memory 702 communicatively coupled to the at least one processor; the memory 702 stores instructions executable by the at least one processor 701, and the instructions are executed by the at least one processor 701 to perform the index selection method according to any of the embodiments described above.
The memory 702 and the processor 701 are coupled by a bus, which may comprise any number of interconnecting buses and bridges that couple one or more of the various circuits of the processor 701 and the memory 702. The bus may also connect various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. A bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. Information processed by processor 701 is transmitted over a wireless medium through an antenna, which receives the information and passes the information to processor 701.
The processor 701 is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And memory 702 may be used to store information used by the processor in performing operations.
Embodiments of the present invention relate to a computer-readable storage medium storing a computer program. The computer program realizes the above-described method embodiments when executed by a processor.
That is, as can be understood by those skilled in the art, all or part of the steps in the method according to the above embodiments may be implemented by a program instructing related hardware, where the program is stored in a storage medium and includes several instructions to enable a device (which may be a single chip, a chip, or the like) or a processor (processor) to execute all or part of the steps in the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

Claims (10)

1. An index selection method, comprising:
acquiring database operation data;
generating a database index candidate set according to the database operation data through a preset rule;
and calculating the workload of the database index candidate set, and selecting a database index recommendation set from the database index candidate set according to the workload of the database index candidate set and the constraint condition of the storage space.
2. The index selection method of claim 1, wherein the calculating the workload of the candidate set of database indexes and selecting a recommended set of database indexes from the candidate set of database indexes according to the workload of the candidate set of database indexes and storage space constraints comprises:
adding one candidate index in the database index candidate set to an index configuration group every time through a preset reinforcement learning model, and performing iterative computation to obtain the index configuration group which has the minimum workload and meets the constraint condition of a storage space;
and taking the finally obtained index configuration group as a database index recommendation set.
3. The index selection method of claim 2, wherein the workload of the index configuration group is calculated according to the database operation statement and the frequency information corresponding to the index configuration group.
4. The index selection method of claim 3, wherein the obtaining database operation data comprises:
and acquiring all database operation statements in a preset time period, and counting the frequency information of each database operation statement.
5. The index selection method of claim 3, wherein the workload of the index configuration group is calculated by the following formula:
Figure 913189DEST_PATH_IMAGE001
wherein W is the workload of the index configuration group X, n is the number of database operation statements, QiIs the database operation statement, fiIs QiFrequency of (1), Cost (Q)iX) is to execute Q under the index configuration group XiThe cost function of (2) is calculated by a database optimizer in the database.
6. The index selection method according to any one of claims 2 to 5, wherein the iteratively calculating, by using a preset reinforcement learning model, an index configuration group that has a minimum workload and satisfies a storage space constraint by adding one candidate index in the database index candidate set to the index configuration group each time includes:
converting the database index candidate set into a state vector;
selecting a state from the state vector, and acquiring the action of the state; wherein the action comprises adding the candidate index corresponding to the state to the index configuration group;
calculating the optimization degree of the workload of the index configuration group obtained by the action as a reward value;
and selecting the next state from the state vector, and performing iterative computation to obtain an index configuration group which has the minimum workload and meets the constraint condition of the storage space.
7. The index selection method of claim 6, wherein the reward value is calculated by the following formula:
Figure 993141DEST_PATH_IMAGE002
wherein r istIs the prize value at time t, W (X)t-1): at time t-1, the index configuration is Xt-1The value of the hourly workload W;
W(Xt): at time t, the index is configured as XtValue of time-dependent workload W, Storage (X)t): at time t, the index is configured as XtThe size of the storage space in time, M is the constraint upper limit value of storage (X).
8. The index selection method of claim 1, wherein the preset rule comprises any one or any combination of the following:
a first rule: generating all single attribute indexes by using attributes in J, EQ and RANGE;
the second rule is as follows: when the attributes in the O are from the same table, generating an index by using all the attributes in the O;
a third rule: if the join of table a and table b has multiple attributes, generating an index using the attributes of the join;
the fourth rule is that: expressing the attribute of J or the index generated by the rule 3 as J, expressing the attribute of RANGE as r, and respectively constructing an index J + EQ + r, an index J + r and an index J + EQ;
a fifth rule: if the number of the attributes in the USED is less than the maximum number of the attributes in the user-defined index, generating indexes by using the rest attributes in the USED;
where J is the attribute that appears in the JOIN operation, EQ is the attribute that appears in the EQUAL EQUAL operation, RANGE is the attribute that appears in the Range query RANGE operation, O is the attribute that appears in the GROUP GROUP BY, sort ORDER BY operation, and USED is the attribute that appears in the query result.
9. An electronic device, comprising:
at least one processor;
a memory communicatively coupled to the at least one processor;
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the index selection method of any one of claims 1 to 8.
10. A computer-readable storage medium, storing a computer program, wherein the computer program, when executed by a processor, implements the index selection method of any one of claims 1 to 8.
CN202210029432.XA 2022-01-12 2022-01-12 Index selection method, electronic device and storage medium Active CN114048216B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210029432.XA CN114048216B (en) 2022-01-12 2022-01-12 Index selection method, electronic device and storage medium
PCT/CN2022/136701 WO2023134329A1 (en) 2022-01-12 2022-12-05 Index selection method, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210029432.XA CN114048216B (en) 2022-01-12 2022-01-12 Index selection method, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN114048216A true CN114048216A (en) 2022-02-15
CN114048216B CN114048216B (en) 2022-05-20

Family

ID=80196247

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210029432.XA Active CN114048216B (en) 2022-01-12 2022-01-12 Index selection method, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN114048216B (en)
WO (1) WO2023134329A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023134329A1 (en) * 2022-01-12 2023-07-20 中兴通讯股份有限公司 Index selection method, electronic device, and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6169983B1 (en) * 1998-05-30 2001-01-02 Microsoft Corporation Index merging for database systems
US6266658B1 (en) * 2000-04-20 2001-07-24 Microsoft Corporation Index tuner for given workload
US7272589B1 (en) * 2000-11-01 2007-09-18 Oracle International Corporation Database index validation mechanism
US20140207755A1 (en) * 2012-11-19 2014-07-24 Nec Laboratories America, Inc. System for Multi-store Analytics Execution Environments with Storage Constraints
CN107179944A (en) * 2016-03-10 2017-09-19 先智云端数据股份有限公司 The method that stocking system resource is disposed by the study to workload in execution
CN111723076A (en) * 2020-06-24 2020-09-29 苏州松鼠山人工智能科技有限公司 Method and device for generating database index
CN111752901A (en) * 2020-06-23 2020-10-09 网易(杭州)网络有限公司 Index creation method and device, electronic equipment and storage medium
US20210034588A1 (en) * 2019-08-01 2021-02-04 Teradata Us, Inc. Physical database design and tuning with deep reinforcement learning
CN113157694A (en) * 2021-03-22 2021-07-23 浙江大学 Database index generation method based on reinforcement learning
CN113590632A (en) * 2021-08-11 2021-11-02 平安普惠企业管理有限公司 Database index creating method, device, equipment and medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110807041B (en) * 2019-11-01 2022-05-20 广州华多网络科技有限公司 Index recommendation method and device, electronic equipment and storage medium
CN114048216B (en) * 2022-01-12 2022-05-20 中兴通讯股份有限公司 Index selection method, electronic device and storage medium

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6169983B1 (en) * 1998-05-30 2001-01-02 Microsoft Corporation Index merging for database systems
US6266658B1 (en) * 2000-04-20 2001-07-24 Microsoft Corporation Index tuner for given workload
US7272589B1 (en) * 2000-11-01 2007-09-18 Oracle International Corporation Database index validation mechanism
US20140207755A1 (en) * 2012-11-19 2014-07-24 Nec Laboratories America, Inc. System for Multi-store Analytics Execution Environments with Storage Constraints
CN107179944A (en) * 2016-03-10 2017-09-19 先智云端数据股份有限公司 The method that stocking system resource is disposed by the study to workload in execution
US20210034588A1 (en) * 2019-08-01 2021-02-04 Teradata Us, Inc. Physical database design and tuning with deep reinforcement learning
CN111752901A (en) * 2020-06-23 2020-10-09 网易(杭州)网络有限公司 Index creation method and device, electronic equipment and storage medium
CN111723076A (en) * 2020-06-24 2020-09-29 苏州松鼠山人工智能科技有限公司 Method and device for generating database index
CN113157694A (en) * 2021-03-22 2021-07-23 浙江大学 Database index generation method based on reinforcement learning
CN113590632A (en) * 2021-08-11 2021-11-02 平安普惠企业管理有限公司 Database index creating method, device, equipment and medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
刘彦宇: "一种面向对象数据库聚集层次的索引配置方法", 《中原工学院学报》 *
崔跃生等: "数据库物理结构优化技术", 《软件学报》 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023134329A1 (en) * 2022-01-12 2023-07-20 中兴通讯股份有限公司 Index selection method, electronic device, and storage medium

Also Published As

Publication number Publication date
CN114048216B (en) 2022-05-20
WO2023134329A1 (en) 2023-07-20

Similar Documents

Publication Publication Date Title
US10296658B2 (en) Use of context-dependent statistics to suggest next steps while exploring a dataset
EP2893468B1 (en) Automatic denormalization for analytic query processing in large-scale clusters
US5899985A (en) Inference method and inference system
US20200210524A1 (en) Analytical processing system supporting natural language analytic questions
CN105210058A (en) Graph query processing using plurality of engines
Meila et al. An Exponential Model for Infinite Rankings.
US8744994B2 (en) Data filtering and optimization for ETL (extract, transform, load) processes
CN104933100A (en) Keyword recommendation method and device
CN103136260A (en) Method and device for applying filtration factor assessment in optimization of access path in database
US20200210525A1 (en) Predicting query language statements from natural language analytic questions
CN114048216B (en) Index selection method, electronic device and storage medium
Li et al. Mining association rules based on deep pruning strategies
CN113656440A (en) Database statement optimization method, device and equipment
CN111723076A (en) Method and device for generating database index
CN113204642A (en) Text clustering method and device, storage medium and electronic equipment
Niu et al. Applying database optimization technologies to feature recognition in CAD
US9135302B2 (en) Query rewrite with a nested materialized view
Butka et al. A proposal of the information retrieval system based on the generalized one-sided concept lattices
Olteanu Factorized databases: A knowledge compilation perspective
Huang et al. Rough-set-based approach to manufacturing process document retrieval
CN114943004B (en) Attribute graph query method, attribute graph query device, and storage medium
CN112905591B (en) Data table connection sequence selection method based on machine learning
Ceruto et al. Quality measures for fuzzy predicates in conjunctive and disjunctive normal forms
CN117519702B (en) Search page design method and system based on low code collocation
CN117555950B (en) Data blood relationship construction method based on data center

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant