CN115114295B - Method and apparatus for determining a composite index - Google Patents


Publication number
CN115114295B
Authority
CN
China
Prior art keywords
index
composite
query statement
query
optimization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210793671.2A
Other languages
Chinese (zh)
Other versions
CN115114295A (en)
Inventor
徐泉清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Oceanbase Technology Co Ltd
Original Assignee
Beijing Oceanbase Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation

Abstract

Embodiments of the present specification provide a method and apparatus for determining a composite index. The method comprises: acquiring a query statement set for a target database; determining an index optimization benefit of each composite index in a candidate composite index set under each query statement in the query statement set, wherein the index optimization benefit reflects the query cost benefit brought by introducing the composite index, on the basis of the inherent index of the target database, when querying the target database based on the query statement; and determining a target composite index from the candidate composite index set according to the index optimization benefits of the composite indexes. In this way, a target composite index better suited to the current load can be determined flexibly as the workload (that is, the query statements) of the target database changes, which effectively improves the query efficiency of the target database.

Description

Method and apparatus for determining a composite index
Technical Field
Embodiments of the present disclosure relate generally to the field of computer technology, and more particularly, to a method and apparatus for determining a composite index.
Background
An index is a structure that orders the values of one or more columns (or attributes) of a database table so that specific information in the table can be accessed quickly. A user may build an index on multiple columns of a database table; such an index is called a composite index (or combined index). A composite index incurs less overhead during database operation and may replace multiple single-column indexes. Index selection is a particularly important stage of physical database design: it chooses the database indexes to create so that data can be retrieved efficiently for a given workload. How to select composite indexes so as to improve query efficiency is therefore an urgent problem to be solved.
Disclosure of Invention
In view of the foregoing, embodiments of the present specification provide a method and apparatus for determining a composite index. With the method and apparatus, a composite index can be determined for a target database, thereby improving the efficiency of querying the target database based on query statements.
According to an aspect of embodiments of the present specification, there is provided a method for determining a composite index, comprising: acquiring a query statement set for a target database; determining an index optimization benefit of each composite index in a candidate composite index set under each query statement in the query statement set, wherein the index optimization benefit reflects the query cost benefit brought by introducing the composite index, on the basis of the inherent index of the target database, when querying the target database based on the query statement; and determining a target composite index from the candidate composite index set according to the index optimization benefits of the composite indexes.
According to another aspect of embodiments of the present specification, there is provided an apparatus for determining a composite index, comprising: an acquisition unit configured to acquire a query statement set for a target database; a benefit determining unit configured to determine an index optimization benefit of each composite index in a candidate composite index set under each query statement in the query statement set, wherein the index optimization benefit reflects the query cost benefit brought by introducing the composite index, on the basis of the inherent index of the target database, when querying the target database based on the query statement; and an index determination unit configured to determine a target composite index from the candidate composite index set according to the index optimization benefits of the composite indexes.
According to another aspect of embodiments of the present specification, there is provided an apparatus for determining a composite index, comprising: at least one processor, a memory coupled with the at least one processor, and a computer program stored on the memory, the at least one processor executing the computer program to implement the method for determining a composite index as described above.
According to another aspect of embodiments of the present description, there is provided a computer-readable storage medium storing a computer program which, when executed by a processor, implements a method for determining a composite index as described above.
According to another aspect of embodiments of the present specification, there is provided a computer program product comprising a computer program for execution by a processor to implement a method for determining a composite index as described above.
Drawings
A further understanding of the nature and advantages of the present description may be realized by reference to the following drawings. In the drawings, similar components or features may have the same reference numerals.
FIG. 1 illustrates an exemplary architecture of a method and apparatus for determining a composite index according to an embodiment of the present description.
FIG. 2 illustrates a flow chart of a method for determining a composite index according to an embodiment of the present description.
Fig. 3 shows a schematic diagram of yet another example of a method for determining a composite index according to an embodiment of the present description.
FIG. 4 shows a schematic diagram of one example of a process for determining index optimization benefits according to embodiments of the present description.
Fig. 5 shows a block diagram of one example of an apparatus for determining a composite index according to an embodiment of the present description.
Fig. 6 shows a block diagram of yet another example of an apparatus for determining a composite index according to an embodiment of the present disclosure.
Fig. 7 shows a block diagram of one example of a benefit determining unit in an apparatus for determining a composite index according to an embodiment of the present specification.
Fig. 8 shows a schematic diagram of an apparatus for determining a composite index according to an embodiment of the present disclosure.
Detailed Description
The subject matter described herein will be discussed below with reference to example embodiments. It should be appreciated that these embodiments are discussed only to enable a person skilled in the art to better understand and thereby practice the subject matter described herein, and are not limiting of the scope, applicability, or examples set forth in the claims. Changes may be made in the function and arrangement of elements discussed without departing from the scope of the embodiments herein. Various examples may omit, replace, or add various procedures or components as desired. In addition, features described with respect to some examples may be combined in other examples as well.
As used herein, the term "comprising" and variations thereof are open-ended terms meaning "including, but not limited to." The term "based on" means "based at least in part on". The terms "one embodiment" and "an embodiment" mean "at least one embodiment." The term "another embodiment" means "at least one other embodiment". The terms "first," "second," and the like may refer to different objects or to the same object. Other definitions, whether explicit or implicit, may be included below. Unless the context clearly indicates otherwise, the definition of a term is consistent throughout this specification.
In this specification, the term "composite index" may refer to an index built on a plurality of attributes (or fields) of a database table. Note that composite indexes built on the same attributes in different orders are different composite indexes and may yield different query speeds.
In this specification, the term "overhead" may be used to evaluate how heavily an operation on a database table loads the system, and may include, but is not limited to, at least one of CPU usage, disk reads and writes, and query duration.
Methods and apparatuses for determining a composite index according to embodiments of the present specification will be described in detail below with reference to the accompanying drawings.
FIG. 1 illustrates an exemplary architecture 100 of a method and apparatus for determining a composite index according to an embodiment of the present disclosure.
In FIG. 1, a network 110 interconnects the terminal devices 121, 122 and a database server 130.
Network 110 may be any type of network capable of interconnecting network entities. The network 110 may be a single network or a combination of networks. In terms of coverage, network 110 may be a Local Area Network (LAN), a Wide Area Network (WAN), or the like. In terms of carrier medium, the network 110 may be a wired network, a wireless network, or the like. In terms of data switching technology, the network 110 may be a circuit-switched network, a packet-switched network, or the like.
Terminal devices 121, 122 may be any type of electronic computing device capable of connecting to network 110, accessing servers or websites on network 110, processing data or signals, and the like. For example, the terminal devices 121, 122 may be desktop computers, notebook computers, tablet computers, smart phones, etc. Although only a few terminal devices are shown in fig. 1, it should be understood that there may be a different number of terminal devices connected to the network 110.
In one embodiment, the terminal devices 121, 122 may be used by a user. Terminal devices 121, 122 may include application clients (e.g., database client 1211, database client 1221) that may provide various services to users. In some cases, database clients 1211, 1221 may interact with database server 130. For example, database clients 1211, 1221 may transmit user-entered messages to database server 130 and receive responses associated with the messages from database server 130. Herein, a "message" may refer to any input information, such as a query statement or the like.
The database server 130 may process the received query statement set 131. For example, a target composite index 1322 may be determined from the candidate composite index set 132 based on the index optimization benefit of each composite index (e.g., composite index 1321, composite index 1322, composite index 1323, etc.) in the candidate composite index set 132 under each query statement in the query statement set 131. Optionally, the database server 130 may, in response, obtain query results for the query statement set 131 from the database table 133 using the determined target composite index 1322.
It should be appreciated that all network entities shown in fig. 1 are exemplary and that any other network entity may be involved in architecture 100, depending on the particular application requirements.
FIG. 2 illustrates a flow chart of a method 200 for determining a composite index according to an embodiment of the present disclosure.
As shown in fig. 2, at 210, a set of query statements for a target database is obtained.
In this embodiment, the query statement set for the target database may be acquired in various ways. The target database may be a pre-designated database or may be a database dynamically determined according to a rule. The set of query statements described above may typically include a plurality of query statements.
In the present embodiment, as an example, the target database may include a table named "Sales". The table may include a number of attributes, for example (OrderID, Price, Discount, City, Shipdate). The query statement set for the "Sales" table in the target database may include the following query statements:
$q_1$ = SELECT SUM(Price * Discount) FROM Sales WHERE City = 'beijin' AND Shipdate BETWEEN '01-01-2021' AND '12-31-2021';
$q_2$ = SELECT SUM(Price * Discount) FROM Sales WHERE Price >= 1000 AND City = 'beijin' AND Shipdate BETWEEN '01-01-2021' AND '12-31-2021';
$q_3$ = SELECT SUM(Price * Discount) FROM Sales WHERE Discount >= 0.6 AND City = 'beijin' AND Shipdate BETWEEN '01-01-2021' AND '12-31-2021'.
It should be noted that the query statement may have different forms according to the database, which is not limited herein.
At 220, the index optimization benefit of each composite index in the candidate composite index set under each query statement in the query statement set is determined.
In this embodiment, the index optimization benefit of each composite index in the candidate composite index set may be determined under each query statement in the query statement set. The index optimization benefit reflects the query cost benefit brought by introducing the composite index, on the basis of the inherent index of the target database, when querying the target database based on the query statement. The inherent index may refer to an index used by the target database before the composite index is introduced, for example a primary key index. The query cost benefit may include, but is not limited to, at least one of: reduced query time, a reduced amount of data read from and written to disk, reduced CPU usage, and reduced memory usage.
In this embodiment, as an example, the candidate composite index set may include the indexes $I_1$ = (City, Shipdate), $I_2$ = (City, Price, Shipdate), $I_3$ = (City, Discount, Shipdate), and $I_4$ = (City, Price, Discount, Shipdate). The index optimization benefit of a composite index under a query statement may then be written, for example, as $\mathrm{benefit}(q_i, I_j)$, where $q_i$ denotes the $i$-th query statement and $I_j$ denotes the $j$-th composite index. In the above example, $i$ may take values in {1, 2, 3} and $j$ may take values in {1, 2, 3, 4}.
It should be noted that the candidate composite index set is usually logical, that is, it usually does not actually occupy physical storage space. Physical storage space is usually occupied only when a target composite index determined from the candidate composite index set is actually created.
In some implementations of this embodiment, the index optimization benefit may be determined from the query cost with only the inherent index and the query cost after the composite index is introduced on top of the inherent index. As an example, the index optimization benefit may be the difference between the query cost with the inherent index and the query cost with the composite index introduced. As yet another example, the index optimization benefit may be the ratio of the query cost with the inherent index to the query cost with the composite index introduced. As yet another example, the index optimization benefit may be the ratio of that difference to the query cost with the inherent index. The query cost may include at least one of a query time overhead and a computation overhead required for the query. The query time overhead may include the query time. The computation overhead required for the query may include memory usage, the amount of data read from and written to disk, CPU usage, and the like.
Accordingly, this scheme can determine the index optimization benefit from the query cost with only the inherent index and the query cost after a composite index is introduced on top of it, which broadens the ways in which the index optimization benefit can be determined.
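The three example definitions above can be sketched as follows. This is a minimal illustration, assuming each query cost has already been reduced to a single positive number (for instance, seconds of query time); the function names and the sample costs are hypothetical and do not come from the specification.

```python
def benefit_difference(base_cost: float, composite_cost: float) -> float:
    """Absolute saving: cost with only the inherent index minus cost with the composite index added."""
    return base_cost - composite_cost


def benefit_ratio(base_cost: float, composite_cost: float) -> float:
    """Ratio of the inherent-index cost to the composite-index cost (above 1 means the index helps)."""
    if composite_cost <= 0:
        raise ValueError("composite_cost must be positive")
    return base_cost / composite_cost


def benefit_relative(base_cost: float, composite_cost: float) -> float:
    """Fraction of the inherent-index cost that the composite index saves."""
    if base_cost <= 0:
        raise ValueError("base_cost must be positive")
    return (base_cost - composite_cost) / base_cost


if __name__ == "__main__":
    # Hypothetical costs, in seconds, for one query statement.
    base, with_index = 8.0, 2.0
    print(benefit_difference(base, with_index))
    print(benefit_ratio(base, with_index))
    print(benefit_relative(base, with_index))
```

The relative form is convenient when benefits from queries of very different cost have to be compared, since it is normalized by the inherent-index cost.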
At 230, a target composite index is determined from the candidate composite index set based on the index optimization benefits of each composite index.
In this embodiment, the target composite index may be determined from the candidate composite index set in various ways according to the index optimization benefits of the composite indexes. As an example, the composite indexes corresponding to a target number of the largest index optimization benefits may be determined as target composite indexes. As yet another example, the composite indexes whose index optimization benefit is greater than a preset index optimization benefit threshold may be determined as target composite indexes.
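Both selection rules can be sketched in a few lines. The sketch assumes the benefits are held in a dictionary keyed by (query id, index id); that layout, the function names, and the numbers in `EXAMPLE_BENEFITS` are illustrative assumptions only.

```python
from typing import Dict, Set, Tuple

# (query id, index id) -> index optimization benefit; values are hypothetical.
Benefits = Dict[Tuple[str, str], float]

EXAMPLE_BENEFITS: Benefits = {
    ("q1", "I1"): 0.70,
    ("q1", "I2"): 0.40,
    ("q2", "I2"): 0.85,
    ("q2", "I4"): 0.80,
    ("q3", "I3"): 0.75,
    ("q3", "I1"): 0.10,
}


def select_top_n(benefits: Benefits, n: int) -> Set[str]:
    """Return the indexes behind the n largest benefit values."""
    ranked = sorted(benefits.items(), key=lambda item: item[1], reverse=True)
    return {index_id for (_, index_id), _ in ranked[:n]}


def select_above_threshold(benefits: Benefits, threshold: float) -> Set[str]:
    """Return every index that has at least one benefit above the threshold."""
    return {index_id for (_, index_id), value in benefits.items() if value > threshold}


if __name__ == "__main__":
    print(sorted(select_top_n(EXAMPLE_BENEFITS, 2)))
    print(sorted(select_above_threshold(EXAMPLE_BENEFITS, 0.72)))
```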
In some optional implementations of the present embodiment, the target composite index may be determined from the candidate composite index set based on both the index optimization benefit and the storage cost of each composite index. Each composite index in the candidate composite index set may correspond to a storage cost that characterizes the storage space it requires (denoted, for example, by $s_i$). For example, the storage costs of the indexes $I_1$, $I_2$, $I_3$, and $I_4$ in the previous example may be 65 GB, 85 GB, 105 GB, and 150 GB, respectively. The target composite index can then be determined from the candidate composite index set in various ways according to the index optimization benefit and the storage cost of each composite index. As an example, the composite indexes with larger index optimization benefits (for example, those corresponding to the target number of largest index optimization benefits, or to index optimization benefits greater than the preset threshold, as in the foregoing examples) may first be selected from the candidate composite index set, and the selected composite index with the smallest storage cost may then be determined as the target composite index. As yet another example, an index optimization benefit corresponding to each composite index may first be derived from the determined index optimization benefits of that composite index under the individual query statements, for example as their maximum, median, or average. A target number of composite indexes can then be selected according to these per-index benefits, subject to the storage cost requirement being met.
Accordingly, the target composite index can be determined by jointly considering index optimization benefit and storage cost, which broadens the ways in which the composite index can be determined and improves the practical effect of the resulting target composite index.
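As a rough sketch of the two ideas above, the code below first collapses per-query benefits into one value per index (maximum, median, or mean) and then picks the cheapest index among those whose aggregated benefit clears a threshold. The storage costs are the 65/85/105/150 GB figures from the example; the benefit values, the threshold, and all function names are assumptions made for illustration.

```python
from statistics import mean, median
from typing import Callable, Dict, Iterable, List, Tuple

# Storage costs in GB, taken from the example in the text.
STORAGE_GB: Dict[str, int] = {"I1": 65, "I2": 85, "I3": 105, "I4": 150}

# Hypothetical (query id, index id) -> benefit values.
EXAMPLE_BENEFITS: Dict[Tuple[str, str], float] = {
    ("q1", "I1"): 0.5, ("q2", "I1"): 0.25,
    ("q1", "I2"): 0.75, ("q2", "I2"): 0.5,
    ("q1", "I4"): 1.0, ("q2", "I4"): 0.5,
}


def aggregate_benefits(
    benefits: Dict[Tuple[str, str], float],
    reducer: Callable[[Iterable[float]], float] = max,
) -> Dict[str, float]:
    """Collapse per-query benefits into one benefit per index (max, median, mean, ...)."""
    grouped: Dict[str, List[float]] = {}
    for (_, index_id), value in benefits.items():
        grouped.setdefault(index_id, []).append(value)
    return {index_id: reducer(values) for index_id, values in grouped.items()}


def cheapest_among_best(
    per_index: Dict[str, float], storage: Dict[str, int], threshold: float
) -> str:
    """Among indexes whose benefit exceeds the threshold, return the one needing least storage."""
    shortlisted = [index_id for index_id, b in per_index.items() if b > threshold]
    if not shortlisted:
        raise ValueError("no index exceeds the benefit threshold")
    return min(shortlisted, key=lambda index_id: storage[index_id])


if __name__ == "__main__":
    per_index = aggregate_benefits(EXAMPLE_BENEFITS, max)
    print(per_index)
    print(cheapest_among_best(per_index, STORAGE_GB, 0.6))
    print(aggregate_benefits(EXAMPLE_BENEFITS, median))
    print(aggregate_benefits(EXAMPLE_BENEFITS, mean))
```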
In some optional implementations of this embodiment, the target composite indexes may be determined from the candidate composite index set such that the sum of their index optimization benefits is maximized while an index capacity constraint is satisfied. The index capacity constraint may require that the sum of the storage costs of the target composite indexes does not exceed an index capacity limit. As an example, the index optimization benefit corresponding to each composite index (denoted, for example, by $prof_i$) may first be derived from the determined index optimization benefits of that composite index under the individual query statements, for example as their maximum, median, or average. The index capacity constraint may be, for example, that the total storage space required by the target composite indexes is not greater than a preset value (denoted, for example, by $capacity$), e.g., 200 GB. Determining the target composite index from the candidate composite index set then amounts to solving the following optimization problem:
$$\max \sum_{i=1}^{n} prof_i \, x_i$$
$$\text{s.t.} \quad \sum_{i=1}^{n} s_i \, x_i \le capacity$$
where $x_i \in \{0, 1\}$. When $x_i$ takes the value 1, the composite index $I_i$ corresponding to $x_i$ is determined to be a target composite index. In the formula, $n$ denotes the number of composite indexes in the candidate composite index set that participate in solving the optimization problem, which may be all or part of the composite indexes in the candidate composite index set.
Accordingly, the target composite index can be determined from the candidate composite index set by solving the optimization problem of maximizing the sum of the index optimization benefits of the target composite indexes subject to the index capacity constraint. This broadens the ways in which the target composite index can be determined, and the resulting target composite index helps improve the performance of querying the target database with the query statement set.
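Selecting indexes to maximize total benefit under a storage limit has the shape of a 0/1 knapsack problem, so one possible way to solve it is the textbook dynamic program sketched below. The specification does not prescribe a solver; the function name is hypothetical, storage costs are assumed to be whole gigabytes, and the profit values are invented for illustration. The sizes and the 200 GB limit are the figures from the example.

```python
from typing import List, Sequence, Tuple


def solve_index_knapsack(
    profits: Sequence[float], sizes: Sequence[int], capacity: int
) -> Tuple[float, List[int]]:
    """Maximize sum(profits[i] * x[i]) subject to sum(sizes[i] * x[i]) <= capacity, x[i] in {0, 1}."""
    n = len(profits)
    # best[i][c] = maximum profit using only the first i indexes within c units of storage.
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        size, profit = sizes[i - 1], profits[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # option 1: skip index i
            if size <= c:  # option 2: take index i if it fits
                best[i][c] = max(best[i][c], best[i - 1][c - size] + profit)
    # Walk back through the table to recover the decision variables x_i.
    x = [0] * n
    c = capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            x[i - 1] = 1
            c -= sizes[i - 1]
    return best[n][capacity], x


if __name__ == "__main__":
    profits = [30, 45, 50, 70]   # hypothetical prof_i for I1..I4
    sizes = [65, 85, 105, 150]   # s_i in GB, from the example in the text
    total, chosen = solve_index_knapsack(profits, sizes, capacity=200)
    print(total, chosen)
```

With these assumed profits, the solver picks $I_2$ and $I_3$ (190 GB in total), since every other combination that fits in 200 GB yields a smaller benefit sum.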
Fig. 3 shows a schematic diagram of yet another example of a method 300 for determining a composite index according to an embodiment of the present disclosure.
As shown in fig. 3, at 310, a set of query statements for a target database is obtained.
At 320, the index optimization benefit of each composite index in the candidate composite index set under each query statement in the query statement set is determined.
In this embodiment, the above steps 310 and 320 may refer to the corresponding descriptions of the steps 210 and 220 in the foregoing embodiments, respectively, and are not repeated here.
At 330, a preferred composite index set is determined from the candidate composite index set based on the index optimization benefits of the composite indexes.
In this embodiment, the index optimization benefit of each preferred composite index in the preferred composite index set is better than the index optimization benefits of all composite indexes in the candidate composite index set that are not in the preferred composite index set. The preferred composite index set may be determined from the candidate composite index set in various ways based on the index optimization benefits of the composite indexes. As an example, the index optimization benefits that meet a preset selection requirement may first be selected from those of the candidate composite index set. The preset selection requirement may be that the index optimization benefit is greater than a preset benefit threshold, or that its value ranks in the top 10% when sorted from high to low. The selected index optimization benefits may then be counted per composite index. For example, the selected index optimization benefits may comprise five values, one under composite index $I_1$, two under $I_2$, and two under $I_3$, so that the counts corresponding to $I_1$, $I_2$, and $I_3$ are 1, 2, and 2, respectively. The composite indexes whose counts rank in the top $K$ ($K$ being a positive integer) may then be determined as the composite indexes in the preferred composite index set (e.g., $I_2$ and $I_3$).
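The count-and-rank step can be sketched as follows, assuming the selected benefits are available as (query id, index id) pairs. The particular pairs below are hypothetical; they were chosen only so that three indexes receive counts of 1, 2, and 2. Ties are broken by index id to keep the result deterministic, which is a design choice of this sketch rather than something the specification states.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def preferred_indexes(selected_pairs: Iterable[Tuple[str, str]], k: int) -> List[str]:
    """Count how often each index appears among the selected benefits and keep the top k."""
    counts = Counter(index_id for _, index_id in selected_pairs)
    # Sort by descending count, then by index id so that ties resolve deterministically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [index_id for index_id, _ in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical pairs whose benefits passed the selection requirement.
    selected = [("q1", "I1"), ("q2", "I2"), ("q3", "I2"), ("q1", "I3"), ("q3", "I3")]
    print(preferred_indexes(selected, k=2))
```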
At 340, a target composite index is determined from the preferred composite index set based on the index optimization benefit of each preferred composite index.
In this embodiment, the target composite index may be determined from the set of preferred composite indices in various ways, depending on the index optimization benefits of each preferred composite index. Reference may be made, for example, to the corresponding description of step 230 described above, which is not repeated here.
Optionally, the target composite index may be determined by solving the optimization problem described in the optional implementation of step 230 above. In this case, $n$ is the number of preferred composite indexes in the preferred composite index set and, accordingly, $prof_i$ represents the index optimization benefit corresponding to the $i$-th preferred composite index in the preferred composite index set.
Based on the above, this scheme broadens the ways in which the target composite index can be determined by first determining a preferred composite index set from the candidate composite index set and then determining the target composite index from it. Furthermore, determining the preferred composite index set according to the index optimization benefits, and then solving the optimization problem under the index optimization benefit and storage cost constraints, balances the effectiveness of the determined target composite index against the efficiency of determining it.
FIG. 4 illustrates a schematic diagram of one example of a process 400 for determining index optimization benefits in accordance with an embodiment of the present description.
As shown in FIG. 4, at 410, query statement-composite index pairs are constructed from each composite index in the candidate composite index set and each query statement in the query statement set.
In the present embodiment, a query statement-composite index pair may be written, for example, as $\langle q_i, I_j \rangle$, where $q_i$ denotes the $i$-th query statement and $I_j$ denotes the $j$-th composite index. As in the foregoing example, $i$ may take values in {1, 2, 3} and $j$ may take values in {1, 2, 3, 4}.
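Constructing the pairs is a Cartesian product of the query statement set and the candidate composite index set. A minimal sketch, assuming queries and indexes are referred to by short ids and that index columns follow the attributes of the "Sales" table; the identifiers and the helper name are hypothetical.

```python
from itertools import product
from typing import Dict, Iterable, List, Tuple

QUERY_IDS: List[str] = ["q1", "q2", "q3"]

# Candidate composite indexes as ordered column tuples (column order matters).
CANDIDATE_INDEXES: Dict[str, Tuple[str, ...]] = {
    "I1": ("City", "Shipdate"),
    "I2": ("City", "Price", "Shipdate"),
    "I3": ("City", "Discount", "Shipdate"),
    "I4": ("City", "Price", "Discount", "Shipdate"),
}


def build_pairs(query_ids: Iterable[str], index_ids: Iterable[str]) -> List[Tuple[str, str]]:
    """Return every <query, index> combination, queries varying slowest."""
    return list(product(query_ids, index_ids))


if __name__ == "__main__":
    pairs = build_pairs(QUERY_IDS, CANDIDATE_INDEXES)
    print(len(pairs))
    print(pairs[:4])
```

Three query statements and four candidate indexes give twelve pairs, each of which is then scored by the benefit prediction model.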
At 420, the constructed query statement-composite index pairs are provided to an index optimization benefit prediction model, resulting in index optimization benefits for each composite index under each query statement in the set of query statements.
In this embodiment, the constructed query statement-composite index pairs may be provided to an index optimization benefit prediction model to obtain the index optimization benefit of each composite index under each query statement in the query statement set. The index optimization benefit prediction model characterizes the correspondence between query statement-composite index pairs and index optimization benefits. The index optimization benefit prediction model may include various pre-trained machine learning models, such as a support vector machine (SVM) or a K-nearest-neighbor (KNN) model.
In some optional implementations of this embodiment, the index optimization benefit prediction model may be trained based on a historical query statement set and a historical candidate composite index set for the target database. A training sample set for training the model may be generated from these two sets. As an example, for each historical query statement in the historical query statement set, the historical index optimization benefit of that statement under a historical candidate composite index (that is, of a "historical query statement-historical candidate composite index pair") may be determined from the base query cost of the statement when it is executed without any index from the historical candidate composite index set and the composite index query cost when it is executed using that historical candidate composite index. Because the base query cost and the composite index query cost can be measured with various performance analysis tools (for example, SHOW PROFILE in MySQL), the historical index optimization benefit corresponding to each historical query statement-historical candidate composite index pair can be obtained. The training sample set can then be used for supervised training to obtain the index optimization benefit prediction model.
Based on the above, this scheme can obtain the index optimization benefit prediction model by training on the historical query statement set and the historical candidate composite index set for the target database, which broadens the ways of determining the index optimization benefit of each composite index under each query statement in the query statement set. Because a machine learning model is used to determine the index optimization benefits, reasonably accurate predictions can be obtained without actually creating each candidate index, which greatly reduces the cost of determining the target composite index and improves the efficiency of doing so.
Returning to fig. 2, in some alternative implementations of the present embodiment, additional indexes for the target database may also be created according to the determined target composite index, so as to improve efficiency when querying the target database based on the query statements in the query statement set.
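As a non-limiting illustration, creating an index from a determined target composite index might amount to emitting a `CREATE INDEX` statement. The helper `create_index_sql`, the table name, and the column names below are hypothetical, and the sketch does not quote or validate identifiers.

```python
def create_index_sql(table, columns, name=None):
    """Build a CREATE INDEX statement for one composite index.

    Identifiers are inserted as-is; a real implementation would need
    to quote or validate them for the target database.
    """
    name = name or "idx_" + "_".join(columns)
    return f"CREATE INDEX {name} ON {table} ({', '.join(columns)});"


statement = create_index_sql("orders", ("customer_id", "created_at"))
```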
With the method for determining a composite index disclosed in fig. 1 to 4, the index optimization benefit of each composite index in the candidate composite index set is determined under each query statement in the obtained query statement set for the target database, where the index optimization benefit reflects the query cost benefit brought by introducing the composite index, on the basis of the inherent index of the target database, when the target database is queried based on the query statement. The determined index optimization benefits are then used as the main basis for determining the target composite index from the candidate composite index set. Compared with the prior art, which mainly sets composite indexes by trial based on manual experience, the method is more efficient and saves labor cost, especially when applied to large database relation tables.
In addition, since the index optimization benefits determined by the method can be different according to different query sentences, the target composite index more suitable for the current load can be flexibly determined according to the change of the workload (i.e. query sentences) of the target database, and the query efficiency of the target database can be effectively improved.
Fig. 5 shows a block diagram of one example of an apparatus 500 for determining a composite index according to an embodiment of the present disclosure. The apparatus embodiment may correspond to the method embodiments shown in fig. 2-4, and the apparatus may be specifically applied to various electronic devices.
As shown in fig. 5, the apparatus 500 for determining a composite index includes an acquisition unit 510, a benefit determination unit 520, and an index determination unit 530.
The acquisition unit 510 may be configured to acquire a set of query statements for a target database. The operation of the acquisition unit 510 may refer to the operation of step 210 described above with reference to fig. 2.
The benefit determination unit 520 may be configured to determine an index optimization benefit of each composite index in the candidate composite index set under each query statement in the query statement set. The index optimization benefit may be used to reflect the query cost benefit brought by introducing a composite index, on the basis of the inherent index of the target database, when the target database is queried based on a query statement. The operation of the benefit determination unit 520 may refer to the operation of step 220 described above with respect to fig. 2.
The index determination unit 530 may be configured to determine a target composite index from the candidate composite index set according to the index optimization benefits of the respective composite indexes. The operation of the index determination unit 530 may refer to the operation of step 230 described above with reference to fig. 2.
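As a non-limiting illustration, the cooperation of the three units might be sketched in software as follows. The class `CompositeIndexApparatus` and the lambda bodies are hypothetical placeholders; only the division into an acquisition step, a benefit determination step, and an index determination step follows the description above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Index = Tuple[str, ...]
Benefits = Dict[Tuple[str, Index], float]


@dataclass
class CompositeIndexApparatus:
    acquire: Callable[[], List[str]]  # role of acquisition unit 510
    determine_benefits: Callable[[Sequence[str], Sequence[Index]], Benefits]  # unit 520
    determine_index: Callable[[Benefits], List[Index]]  # unit 530

    def run(self, candidates: Sequence[Index]) -> List[Index]:
        queries = self.acquire()
        benefits = self.determine_benefits(queries, candidates)
        return self.determine_index(benefits)


# Placeholder behaviors, chosen only to make the sketch executable.
apparatus = CompositeIndexApparatus(
    acquire=lambda: ["q1", "q2"],
    determine_benefits=lambda qs, idxs: {
        (q, i): float(len(i)) for q in qs for i in idxs
    },
    determine_index=lambda b: [max(b, key=b.get)[1]],
)

targets = apparatus.run([("a",), ("a", "b")])
```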
In one example, the index optimization benefits described above may be determined from query costs based on the inherent index and query costs after introducing the composite index based on the inherent index. The query cost may include at least one of a query time overhead and a computation overhead required for the query.
In one example, the above-described index determination unit 530 may be further configured to determine the target composite index from the candidate composite index set according to the index optimization benefits of the respective composite indexes and the storage costs of the respective composite indexes. The operation of the index determination unit 530 described above may refer to the corresponding description of an alternative implementation in step 230 in the embodiment described above with respect to fig. 2.
In one example, the above-described index determination unit 530 may be further configured to determine the target composite index from the candidate composite index set such that the sum of the index optimization benefits of the respective target composite indexes is maximized in the case where the index capacity constraint condition including that the sum of the storage costs of the respective target composite indexes does not exceed the index capacity constraint value is satisfied. The operation of the index determination unit 530 described above may refer to the corresponding description of an alternative implementation in step 230 in the embodiment described above with respect to fig. 2.
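As a non-limiting illustration, selection under the index capacity constraint can be treated as a 0/1 knapsack problem. The sketch below assumes that the benefit of a composite index is the sum of its benefits over all query statements and that storage costs are non-negative integers in some storage unit; both are assumptions, and the function names and sample values are hypothetical.

```python
def aggregate_benefits(per_query_benefits):
    """Sum the per-query benefits of each index (assumed aggregation)."""
    totals = {}
    for (_query, index), benefit in per_query_benefits.items():
        totals[index] = totals.get(index, 0.0) + benefit
    return totals


def select_target_indexes(benefits, storage_costs, capacity):
    """0/1 knapsack over integer storage costs.

    Returns (best total benefit, chosen indexes) such that the summed
    storage cost of the chosen indexes does not exceed `capacity`.
    """
    # best[c] holds the best (benefit, chosen) pair for capacity c.
    best = [(0.0, [])] * (capacity + 1)
    for name in benefits:
        cost, gain = storage_costs[name], benefits[name]
        # Iterate capacities downward so each index is used at most once.
        for c in range(capacity, cost - 1, -1):
            candidate = best[c - cost][0] + gain
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - cost][1] + [name])
    return best[capacity]


per_query = {
    ("q1", "idx_ab"): 90.0, ("q2", "idx_ab"): 10.0,
    ("q1", "idx_bc"): 60.0, ("q2", "idx_bc"): 0.0,
    ("q1", "idx_c"): 50.0, ("q2", "idx_c"): 20.0,
}
storage = {"idx_ab": 5, "idx_bc": 3, "idx_c": 2}

totals = aggregate_benefits(per_query)
total_benefit, chosen = select_target_indexes(totals, storage, capacity=5)
```

Here the two smaller indexes together fit within the capacity and yield a larger total benefit than the single largest index, which is why a greedy choice by benefit alone would not suffice.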
In one example, fig. 6 shows a block diagram of yet another example of an apparatus 600 for determining a composite index according to an embodiment of the present disclosure. The above-described apparatus 600 for determining a composite index may include an acquisition unit 610, a benefit determination unit 620, a preferred index determination unit 630, and an index determination unit 640. The operations of the above-described acquisition unit 610 and the benefit determination unit 620 may refer to the operations of step 210 and step 220, respectively, described above with reference to fig. 2. The preferred index determination unit 630 may be configured to determine a preferred composite index set from the candidate composite index set according to the index optimization benefits of the respective composite indexes. The index optimization benefit of each preferred composite index in the preferred composite index set may be superior to the index optimization benefits of all other composite indexes in the candidate composite index set that are not in the preferred composite index set. The above-described index determination unit 640 may be further configured to determine a target composite index from the preferred composite index set according to the index optimization benefits of the respective preferred composite indexes. The operations of the above-described preferred index determination unit 630 and index determination unit 640 may refer to the corresponding descriptions of steps 330 and 340 in the embodiment described above with reference to fig. 3.
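As a non-limiting illustration, the preferred composite index set might be taken as the k composite indexes with the largest benefits, so that every preferred index is at least as beneficial as every index left out. The parameter `k`, the function name, and the sample values are hypothetical.

```python
import heapq


def preferred_indexes(total_benefits, k):
    """Return the k indexes with the largest benefits, best first."""
    return heapq.nlargest(k, total_benefits, key=total_benefits.get)


total_benefits = {"idx_ab": 100.0, "idx_bc": 60.0, "idx_c": 70.0, "idx_d": 5.0}
preferred = preferred_indexes(total_benefits, k=2)
```

The target composite index would then be determined from `preferred` rather than from the full candidate set, which reduces the search space of the later selection step.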
In one example, fig. 7 shows a block diagram of one example of a benefit determining unit 700 in an apparatus for determining a composite index according to an embodiment of the present specification. The above-described benefit determining unit 700 may include: a construction module 710 configured to construct a query statement-composite index pair from each index of the candidate composite index set and each query statement of the query statement set; the benefit prediction module 720 is configured to provide the constructed query statement-composite index pair to the index optimization benefit prediction model, so as to obtain index optimization benefits of each composite index under each query statement in the query statement set. The operation of the above-described benefit determining unit 700 may refer to the index optimization benefit determining process of the embodiment described above with reference to fig. 4.
In one example, the index optimization benefit prediction model may be trained based on a historical query statement set and a historical candidate composite index set for the target database. For the index optimization benefit prediction model, reference may be made to the corresponding description in the alternative implementation of step 420 in the embodiment described above with reference to fig. 4.
Embodiments of a method and apparatus for determining a composite index according to embodiments of the present specification are described above with reference to fig. 1 through 7.
The apparatus for determining a composite index in the embodiments of the present specification may be implemented in hardware, in software, or in a combination of hardware and software. Taking a software implementation as an example, the apparatus in a logical sense is formed by a processor of the device where the apparatus is located reading corresponding computer program instructions from a storage into a memory. In the embodiments of the present specification, the apparatus for determining a composite index may be implemented using an electronic device, for example.
Fig. 8 shows a schematic diagram of an apparatus 800 for determining a composite index according to an embodiment of the present disclosure.
As shown in fig. 8, an apparatus 800 for determining a composite index may include at least one processor 810, a storage (e.g., a non-volatile storage) 820, a memory 830, and a communication interface 840, and the at least one processor 810, the storage 820, the memory 830, and the communication interface 840 are connected together via a bus 850. The at least one processor 810 executes at least one computer-readable instruction (i.e., the elements described above as being implemented in software) stored or encoded in the storage.
In one embodiment, computer-executable instructions are stored in memory that, when executed, cause the at least one processor 810 to: acquiring a query statement set aiming at a target database; determining index optimization benefits of each compound index in the candidate compound index set under each query statement in the query statement set, wherein the index optimization benefits are used for reflecting query cost benefits brought by introducing the compound index on the basis of the inherent index of the target database to query the target database based on the query statement; and determining a target composite index from the candidate composite index set according to the index optimization benefits of each composite index.
It should be appreciated that the computer-executable instructions stored in the memory, when executed, cause the at least one processor 810 to perform the various operations and functions described above in connection with fig. 1-7 in various embodiments of the present specification.
According to one embodiment, a program product, such as a computer readable medium, is provided. The computer-readable medium may have instructions (i.e., the elements described above implemented in software) that, when executed by a computer, cause the computer to perform the various operations and functions described above in connection with fig. 1-7 in various embodiments of the present specification.
In particular, a system or apparatus provided with a readable storage medium having stored thereon software program code implementing the functions of any of the above embodiments may be provided, and a computer or processor of the system or apparatus may be caused to read out and execute instructions stored in the readable storage medium.
In this case, the program code itself read from the readable medium may implement the functions of any of the above-described embodiments, and thus the machine-readable code and the readable storage medium storing the machine-readable code form part of the present invention.
Computer program code required for operation of portions of the present description may be written in any one or more programming languages, including an object-oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, Python and the like, a conventional programming language such as C, Visual Basic 2003, Perl, COBOL 2002, PHP and ABAP, a dynamic programming language such as Python, Ruby and Groovy, or other programming languages and the like. The program code may execute entirely on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any form of network, such as a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet), or the program code may run in a cloud computing environment or be offered as a service, such as software as a service (SaaS).
Examples of readable storage media include floppy disks, hard disks, magneto-optical disks, optical disks (e.g., CD-ROMs, CD-Rs, CD-RWs, DVD-ROMs, DVD-RAMs, DVD-RWs), magnetic tapes, nonvolatile memory cards, and ROMs. Alternatively, the program code may be downloaded from a server computer or the cloud over a communications network.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
Not all steps or units in the above-mentioned flowcharts and system configuration diagrams are necessary, and some steps or units may be omitted according to actual needs. The order of execution of the steps is not fixed and may be determined as desired. The apparatus structures described in the above embodiments may be physical structures or logical structures, that is, some units may be implemented by the same physical entity, or some units may be implemented by multiple physical entities, or may be implemented jointly by some components in multiple independent devices.
The term "exemplary" used throughout this specification means "serving as an example, instance, or illustration," and does not mean "preferred" or "advantageous over other embodiments." The detailed description includes specific details for the purpose of providing an understanding of the described technology. However, the techniques may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form in order to avoid obscuring the concepts of the described embodiments.
Alternative implementations of the embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings. However, the embodiments of the present disclosure are not limited to the specific details of the foregoing implementations. Various simple modifications may be made to the technical solutions of the embodiments of the present disclosure within the scope of their technical concept, and all such simple modifications fall within the protection scope of the embodiments of the present disclosure.
The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (7)

1. A method for determining a composite index, comprising:
acquiring a query statement set aiming at a target database;
determining index optimization benefits of each compound index in a candidate compound index set under each query statement in the query statement set, wherein the index optimization benefits are used for reflecting query cost benefits brought by introducing the compound index on the basis of the inherent index of the target database to query the target database based on the query statement; and
determining a target composite index from the candidate composite index set according to index optimization benefits of each composite index;
wherein the determining the target composite index from the candidate composite index set according to the index optimization benefits of each composite index comprises:
determining target compound indexes from the candidate compound index set so as to maximize the sum of index optimization benefits of each target compound index under the condition that index capacity limiting conditions are met, wherein the index capacity limiting conditions comprise that the sum of storage costs of each target compound index does not exceed an index capacity limiting value, and the index optimization benefits of the target compound indexes are obtained by calculating according to the index optimization benefits of the target compound indexes under each query statement in the query statement set;
wherein determining the index optimization benefit of each compound index in the candidate compound index set under each query statement in the query statement set comprises:
constructing a query statement-composite index pair according to each index of the candidate composite index set and each query statement of the query statement set; and
providing the constructed query statement-composite index pair to an index optimization benefit prediction model to obtain index optimization benefits of each composite index under each query statement in the query statement set, wherein the index optimization benefit prediction model is obtained based on historical query statement sets and historical candidate composite index sets for the target database through training.
2. The method of claim 1, wherein the index optimization benefit is determined from a query cost based on the inherent index and a query cost after introducing a composite index based on the inherent index, the query cost including at least one of query time overhead and query required computational overhead.
3. The method of claim 1, further comprising:
determining a preferred composite index set from the candidate composite index set based on the index optimization benefits of each composite index, the index optimization benefits of each preferred composite index in the preferred composite index set being superior to the index optimization benefits of all other composite indexes in the candidate composite index set not in the preferred composite index set,
the determining the target composite index from the candidate composite index set according to the index optimization benefits of each composite index comprises:
and determining a target composite index from the preferred composite index set according to index optimization benefits of each preferred composite index.
4. An apparatus for determining a composite index, comprising:
an acquisition unit configured to acquire a query statement set for a target database;
a benefit determining unit configured to determine an index optimization benefit of each compound index in a candidate compound index set under each query statement in the query statement set, wherein the index optimization benefit is used for reflecting query cost benefit brought by introducing a compound index to query the target database based on a query statement on the basis of an inherent index of the target database; and
an index determining unit configured to determine a target composite index from the candidate composite index set according to an index optimization benefit of each composite index;
wherein the index determination unit is further configured to:
determining target compound indexes from the candidate compound index set so as to maximize the sum of index optimization benefits of each target compound index under the condition that index capacity limiting conditions are met, wherein the index capacity limiting conditions comprise that the sum of storage costs of each target compound index does not exceed an index capacity limiting value, the index optimization benefits of the target compound indexes are calculated according to the index optimization benefits of the target compound indexes under each query statement in the query statement set,
wherein the benefit determining unit includes:
a construction module configured to construct a query statement-composite index pair from each index of the candidate composite index set and each query statement of the query statement set; and
a benefit prediction module configured to provide the constructed query statement-composite index pair to an index optimization benefit prediction model to obtain index optimization benefits of each composite index under each query statement in the query statement set, wherein the index optimization benefit prediction model is obtained by training based on a historical query statement set and a historical candidate composite index set for the target database.
5. The apparatus of claim 4, wherein the apparatus further comprises:
a preferred index determination unit configured to determine a preferred composite index set from the candidate composite index set according to index optimization benefits of respective composite indexes, the index optimization benefits of respective preferred composite indexes in the preferred composite index set being superior to index optimization benefits of all other composite indexes in the candidate composite index set that are not in the preferred composite index set,
the index determination unit is further configured to determine a target composite index from the set of preferred composite indexes according to an index optimization benefit of each preferred composite index.
6. An apparatus for determining a composite index, comprising: at least one processor, a memory coupled with the at least one processor, and a computer program stored on the memory, the at least one processor executing the computer program to implement the method of any one of claims 1 to 3.
7. A computer readable storage medium storing a computer program which, when executed by a processor, implements the method of any one of claims 1 to 3.
CN202210793671.2A 2022-07-07 2022-07-07 Method and apparatus for determining a composite index Active CN115114295B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210793671.2A CN115114295B (en) 2022-07-07 2022-07-07 Method and apparatus for determining a composite index

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210793671.2A CN115114295B (en) 2022-07-07 2022-07-07 Method and apparatus for determining a composite index

Publications (2)

Publication Number Publication Date
CN115114295A CN115114295A (en) 2022-09-27
CN115114295B true CN115114295B (en) 2023-07-14

Family

ID=83333176

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210793671.2A Active CN115114295B (en) 2022-07-07 2022-07-07 Method and apparatus for determining a composite index

Country Status (1)

Country Link
CN (1) CN115114295B (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9773032B2 (en) * 2011-09-30 2017-09-26 Bmc Software, Inc. Provision of index recommendations for database access
CN113407801B (en) * 2021-06-04 2023-11-28 跬云(上海)信息科技有限公司 Cloud computing index recommendation method and system

Also Published As

Publication number Publication date
CN115114295A (en) 2022-09-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant