JP7358685B1 - Intelligent index tuning methods and systems for relational databases

Info

Publication number: JP7358685B1
Application number: JP2023015860A
Authority: JP
Inventors: 劉細涓; 楊晨
Original assignee: Zhejiang Gongshang University
Current assignee: Zhejiang Gongshang University
Priority date: 2022-05-23
Filing date: 2023-02-06
Publication date: 2023-10-11
Anticipated expiration: 2043-02-06
Also published as: JP2023172870A; CN115062011A

Abstract

[Problem] To provide an intelligent index tuning method and system that dynamically adjusts the indexes of a database instance so as to optimize the indexes effectively and improve query performance. [Solution] The method parses each query statement in a query log to generate a query statement parse tree, obtaining a set of query statement parse trees; generates a candidate index set based on the set of query statement parse trees; determines an index optimization value for each candidate index in the candidate index set; selects from the candidate index set, as a tuning index set, at least one candidate index satisfying a preset condition, namely that the sum of the index sizes of the selected candidate indexes is less than or equal to a preset threshold and the sum of their index optimization values is maximal; and stores the tuning index set in the target database. [Selected drawing] FIG. 1

Description

TECHNICAL FIELD Embodiments of the present disclosure relate to the field of computer technology, and specifically to intelligent index tuning methods and systems for relational databases.

Intelligent index tuning is a technology for adjusting the indexes of a database instance to maintain efficient query performance of the database. Currently, a common approach to index tuning is to obtain recommended indexes from query logs and then select and tune indexes using a manually designed model.
However, when performing index tuning as described above, the following technical problems often exist.
First, the indexes of the database instance cannot be adjusted dynamically, so the indexes cannot be adjusted effectively and the efficiency of data queries cannot be improved in real time.
Second, if a recommended index generated from the query log does not exist in the database instance, an optimal index cannot be accurately generated on the basis of the query optimization effect of such a non-existent index.
Third, it is not possible to quickly generate an optimal index from a large number of candidate indexes, and the speed of index generation becomes slow.

Some embodiments of the present disclosure propose an intelligent index tuning method and system for relational databases to solve one or more of the technical problems mentioned in the background section above.
In a first aspect, some embodiments of the present disclosure provide an intelligent index tuning method for relational databases. The method parses each query statement in a query log to generate a query statement parse tree, obtaining a set of query statement parse trees. A candidate index set is generated based on the set of query statement parse trees. An index optimization value is determined for each candidate index in the candidate index set. From the candidate index set, at least one candidate index satisfying a preset condition is selected as a tuning index set, the preset condition being that the sum of the index sizes of the candidate indexes in the at least one candidate index is less than or equal to a preset threshold and the sum of their index optimization values is maximal. The tuning index set is stored in the target database.
In a second aspect, some embodiments of the present disclosure provide an intelligent index tuning system for relational databases, including: a parsing unit configured to parse each query statement in a query log to generate a query statement parse tree, obtaining a set of query statement parse trees; a generation unit configured to generate a candidate index set based on the set of query statement parse trees; a determination unit configured to determine an index optimization value for each candidate index in the candidate index set; a selection unit configured to select, from the candidate index set, at least one candidate index satisfying a preset condition as a tuning index set; and a storage unit configured to store the tuning index set in the target database.
In a third aspect, some embodiments of the present disclosure provide one or more processors and a storage device storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the method described in any implementation of the first aspect above.
In a fourth aspect, some embodiments of the present disclosure are a computer readable medium storing a computer program that, when executed by a processor, implements a method according to any of the first aspects above.

Each of the above-described embodiments of the present disclosure has the following beneficial effect: with the intelligent index tuning method for relational databases of some embodiments of the present disclosure, indexes can be tuned effectively and data query speed can be improved in real time. Specifically, indexes cannot be tuned effectively because the indexes of a database instance cannot be adjusted dynamically. Based on this, the intelligent index tuning method for relational databases of some embodiments of the present disclosure first parses each query statement in the query log to generate a query statement parse tree, obtaining a set of query statement parse trees. By analyzing the query log of the application at hand, the indexes of the database instance can thus be adjusted dynamically. Next, a candidate index set is generated based on the set of query statement parse trees, which makes it possible to select the set of indexes that is most effective for query optimization. Next, the index optimization value of each candidate index in the candidate index set is determined. The query optimization effect of each candidate index on the queries is thereby quantified and can be expressed intuitively. Next, at least one candidate index satisfying a preset condition is selected from the candidate index set as a tuning index set. The preset condition is that the sum of the index sizes of the candidate indexes in the at least one candidate index is less than or equal to a preset threshold and the sum of their index optimization values is maximal. This yields the index set with the greatest query optimization effect that does not exceed the preset threshold. Finally, the tuning index set is stored in the target database, so that the tuned indexes can be used to query data in the target database. In this way, by analyzing different query statements under different application scenarios, the indexes of a database instance can be adjusted dynamically and optimized effectively, improving the query performance of the database system.

FIG. 1 is a flowchart of some embodiments of an intelligent index tuning method for relational databases according to the present disclosure.
FIG. 2 is a block diagram of some embodiments of an intelligent index tuning system for relational databases according to the present disclosure.
FIG. 3 is a structural schematic diagram of an electronic device suitable for implementing some embodiments of the present disclosure.

Hereinafter, the present disclosure will be described in detail in connection with embodiments with reference to the drawings.
FIG. 1 shows a flow 100 of some embodiments of an intelligent index tuning method for relational databases according to the present disclosure. The intelligent index tuning method for relational databases includes the following steps:

In step 101, each query statement in the query log is parsed to generate a query statement parse tree, and a set of query statement parse trees is obtained.
In some embodiments, the execution entity of the intelligent index tuning method for relational databases (for example, a computing device) may parse each query statement in the query log to generate a query statement parse tree, obtaining a set of query statement parse trees. Here, a query statement parse tree may be a query statement represented as a parse tree.
In practice, the execution entity may parse each query statement in the query log with an open-source parser and syntax analyzer to generate the query statement parse trees and obtain the set of query statement parse trees.
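The parsing in step 101 can be illustrated with a deliberately small sketch. The text only says that an open-source parser and syntax analyzer are used, so the regex-based `parse_select` below is a hypothetical stand-in that handles a single statement shape (SELECT ... FROM ... WHERE ... AND ... ORDER BY ...). The dictionary layout of the tree and the helper `collect_fields` are likewise assumptions made for illustration.

```python
import re

# Hypothetical toy grammar: one table, AND-joined comparisons, optional ORDER BY.
# A production system would use a real SQL parser instead of this regex.
_SELECT_RE = re.compile(
    r"^\s*SELECT\s+(?P<select>.+?)\s+FROM\s+(?P<table>\w+)"
    r"(?:\s+WHERE\s+(?P<where>.+?))?"
    r"(?:\s+ORDER\s+BY\s+(?P<order>.+?))?\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)
_PREDICATE_RE = re.compile(r"^\s*(\w+)\s*(<=|>=|<>|!=|=|<|>)\s*(\S+)\s*$")


def _split_columns(text):
    """Split a comma-separated column list into upper-cased names."""
    return [c.strip().upper() for c in text.split(",")] if text else []


def parse_select(sql):
    """Return a small dict-based parse tree for a simple SELECT statement."""
    m = _SELECT_RE.match(sql)
    if m is None:
        raise ValueError("unsupported statement: " + sql)
    predicates = []
    if m.group("where"):
        for part in re.split(r"\s+AND\s+", m.group("where"), flags=re.IGNORECASE):
            pm = _PREDICATE_RE.match(part)
            if pm is None:
                raise ValueError("unsupported predicate: " + part)
            predicates.append(
                {"field": pm.group(1).upper(), "op": pm.group(2), "value": pm.group(3)}
            )
    return {
        "type": "select",
        "select": _split_columns(m.group("select")),
        "from": m.group("table").upper(),
        "where": predicates,
        "order_by": _split_columns(m.group("order")),
    }


def collect_fields(tree):
    """Gather every field that the parse tree mentions."""
    where_fields = [p["field"] for p in tree["where"]]
    return set(tree["select"]) | set(where_fields) | set(tree["order_by"])


# One parse tree per logged statement gives the parse tree set of step 101.
query_log = [
    "SELECT CNO, FNAME FROM CUST WHERE LNAME = :LNAME AND CNO > :CNO ORDER BY FNAME"
]
parse_trees = [parse_select(q) for q in query_log]
```

Keeping the WHERE predicates as a separate list in the tree makes it straightforward for a later step to read off both the full field set and the selection field set.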

In step 102, a candidate index set is generated based on the set of query statement parse trees.
In some embodiments, the execution entity may generate the candidate index set based on the set of query statement parse trees.
In some optional implementations of some embodiments, generating the candidate index set based on the set of query statement parse trees may include the following steps.
For each query statement in the query log, the following index generation steps are performed.
In the first step, the set of fields contained in the query statement is determined based on the query statement parse tree corresponding to the query statement.
In practice, the execution entity may take the fields contained in the query statement parse tree corresponding to the query statement as the field set. For example, if the query statement is "SELECT CNO, FNAME FROM CUST WHERE LNAME = :LNAME AND CNO > :CNO ORDER BY FNAME", the field set contained in the query statement can be determined, based on the corresponding parse tree, to be {CNO, FNAME, LNAME}.
In the second step, the fields in the field set are combined to generate at least one standard field combination that satisfies a preset evaluation condition.
In practice, the execution entity may combine the fields in the field set and take the field combinations that satisfy the preset evaluation condition as standard field combinations. Here, the preset evaluation condition may be that the field combination satisfies at least one of the three-star index evaluation criteria. The standard field combinations satisfying the preset evaluation condition may include (LNAME, CNO), (LNAME, CNO, FNAME), and (LNAME, FNAME, CNO).
In the third step, each field subset of the selection field set contained in the selection condition statement of the query statement, together with the at least one standard field combination, is determined as the candidate indexes.
Here, the selection condition statement may be a WHERE clause. The selection field set may be the set of fields contained in the selection condition statement. In practice, the execution entity may determine each field subset of the selection field set contained in the selection condition statement of the query statement, together with the at least one standard field combination, as the candidate indexes. For example, the selection field set contained in the selection condition statement of the above query statement may be {LNAME, CNO}. The candidate indexes may then be (CNO), (LNAME), (LNAME, CNO), (LNAME, CNO, FNAME), and (LNAME, FNAME, CNO).
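The third index generation step can be sketched as follows. This is a minimal illustration, not the patented procedure: the function name `candidate_indexes` is hypothetical, the standard field combinations are passed in ready-made because the text does not spell out how the evaluation condition is checked, and the column order used inside each WHERE-field subset (the order in which the fields are supplied) is an assumption.

```python
from itertools import combinations


def candidate_indexes(where_fields, standard_combinations):
    """Union of the non-empty WHERE-field subsets and the standard combinations."""
    candidates = []
    # Every non-empty subset of the selection (WHERE) field set
    for size in range(1, len(where_fields) + 1):
        for subset in combinations(where_fields, size):
            candidates.append(tuple(subset))
    # The standard field combinations produced by the second step
    for combo in standard_combinations:
        candidates.append(tuple(combo))
    # Keep first-seen order while dropping duplicates such as (LNAME, CNO)
    return list(dict.fromkeys(candidates))


example = candidate_indexes(
    ["LNAME", "CNO"],
    [("LNAME", "CNO"), ("LNAME", "CNO", "FNAME"), ("LNAME", "FNAME", "CNO")],
)
```

With the inputs from the worked example, `example` contains the same five candidates listed in the text: (LNAME), (CNO), (LNAME, CNO), (LNAME, CNO, FNAME), and (LNAME, FNAME, CNO).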

In step 103, an index optimization value is determined for each candidate index in the candidate index set.
In some embodiments, the execution entity described above may determine an index optimization value for each candidate index in the candidate index set. Here, the above-mentioned index optimization value can characterize the degree of query optimization of the corresponding candidate index for the database system.
In some alternative implementations of some embodiments, for each candidate index in the candidate index set described above, the execution entity described above may perform the following steps.
In the first step, a pre-trained target value prediction model is used to determine the target value of the candidate index for each query statement in the query log, yielding a target value set.
The target value prediction model takes the feature attributes of the candidate index and the query statement as input and outputs a target value. In practice, the execution entity may take the output of the pre-trained target value prediction model as the target value of the candidate index for each query statement in the query log. Each target value in the target value set is the difference between the execution time required to run the query statement when the candidate index exists and the execution time required when the candidate index does not exist, and it characterizes the query optimization effect that the candidate index achieves for the query statement corresponding to that target value.
Optionally, the target value prediction model may be obtained through the following model generation steps.
In the first step, a query statement sample set and an index sample set are obtained.
In practice, the execution entity may take a first target number of query statements in the query log and a second target number of indexes as query statement samples and index samples, respectively, to obtain the query statement sample set and the index sample set. For example, the first target number may be 2,000 and the second target number may be 100.
In the second step, a feature attribute set is generated based on the query statement sample set and the index sample set.
Here, the feature attributes in the feature attribute set include, but are not limited to: the category of the query statement sample (for example, select); the number of records in the tables involved in the query statement sample; the number of operators contained in the query statement sample (for example, plus, minus, or assignment); the encoding of the fields involved in the selection condition statement of the query statement sample (for example, the encoding may be UTF-8, a variable-length character encoding); the number of records in the database table on which the index sample resides; the number of fields contained in the index sample; the encoding of the fields contained in the index sample; a flag indicating whether the query statement sample satisfies the prefix call principle with respect to the index sample; and a flag indicating whether the query statement sample causes the index sample to be updated. The prefix call principle may be that the selection condition statement of the query statement sample contains a prefix field of the index sample.
In practice, the execution entity may take the attribute features of the query statement samples in the query statement sample set and the attribute features of the index samples in the index sample set as feature attributes to obtain the feature attribute set. The feature attributes in the feature attribute set are attribute values that can be obtained for a query statement sample at the compilation stage. Here, the category of the query statement sample, the number of records in the tables involved in the query statement sample, the number of operators contained in the query statement sample, the number of records in the database table on which the index sample resides, and the number of fields contained in the index sample are all attribute values obtained at the parsing stage. If all fields of the tables in the database instance are arranged in a fixed order and the fields used in the query statement sample and the index sample are one-hot encoded in that order, the encoding of the fields involved in the selection condition statement of the query statement sample and the encoding of the fields contained in the index sample are obtained. At the parsing stage, it is possible to compare whether the query statement sample satisfies the prefix call principle of the index sample and whether the fields updated by the query statement sample include fields of the index sample, so both the flag indicating whether the prefix call principle is satisfied and the flag indicating whether the query statement sample causes an update of the index sample can be determined.
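A feature vector of the kind described above might be assembled as in the following sketch. All names (`one_hot`, `satisfies_prefix_principle`, `causes_index_update`, `build_features`) and the dictionary keys are hypothetical, the query category feature is omitted for brevity, and the prefix check is read here as "the leading field of the index appears in the WHERE clause", which is one possible interpretation of the prefix call principle.

```python
def one_hot(all_fields, used_fields):
    """Encode field usage as 0/1 flags over a fixed ordering of all table fields."""
    used = set(used_fields)
    return [1 if field in used else 0 for field in all_fields]


def satisfies_prefix_principle(where_fields, index_fields):
    """Assumed reading: the first field of the index is used in the WHERE clause."""
    return bool(index_fields) and index_fields[0] in set(where_fields)


def causes_index_update(updated_fields, index_fields):
    """True when the statement modifies at least one field stored in the index."""
    return bool(set(updated_fields) & set(index_fields))


def build_features(query, index, all_fields):
    """Concatenate query-side, index-side, and pairwise features into one list."""
    return (
        [query["table_rows"], query["operator_count"]]
        + one_hot(all_fields, query["where_fields"])
        + [index["table_rows"], len(index["fields"])]
        + one_hot(all_fields, index["fields"])
        + [
            int(satisfies_prefix_principle(query["where_fields"], index["fields"])),
            int(causes_index_update(query["updated_fields"], index["fields"])),
        ]
    )


all_fields = ["CNO", "FNAME", "LNAME"]
query = {
    "table_rows": 1000,
    "operator_count": 2,
    "where_fields": ["LNAME", "CNO"],
    "updated_fields": [],  # a SELECT statement updates nothing
}
index = {"table_rows": 1000, "fields": ["LNAME", "CNO"]}
features = build_features(query, index, all_fields)
```

Because every value here comes from the statement text and table statistics, the vector can be computed for a virtual index that has never been built.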
In the third step, at least one initial target value prediction sub-model is trained on the feature attribute set to obtain at least one target value prediction sub-model.
In practice, the execution entity may pair each query statement sample in the query statement sample set with each index sample in the index sample set, input the feature attributes of the query statement sample and the index sample of each pair into the at least one initial target value prediction sub-model for training, and take the trained models as the at least one target value prediction sub-model. Here, the at least one initial target value prediction sub-model may include, but is not limited to, a linear prediction model, a gradient boosting decision tree, and a DNN (Deep Neural Network).
In the fourth step, the weight of each target value prediction sub-model in the at least one target value prediction sub-model is determined.
In practice, the execution entity may determine the weight of each target value prediction sub-model in the at least one target value prediction sub-model using a model combination method.
In the fifth step, a target value prediction model is generated based on the determined weights and the at least one target value prediction sub-model.
In practice, the execution entity may obtain the target value prediction model by taking the weighted sum of the target value prediction sub-models according to their weights.
Weighting the different sub-models in this way makes the most of the strengths of each sub-model and improves the accuracy of the output of the resulting target value prediction model. In turn, the query optimization effect of a candidate index on a query can be predicted more accurately.
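The weighted combination of sub-models can be sketched as below. The text does not say how the weights are chosen, so `inverse_error_weights` (weights proportional to the inverse of each sub-model's validation error, assumed positive) is only one hypothetical rule, and the sub-models are represented as plain callables rather than trained linear, gradient boosting, or neural network models.

```python
def inverse_error_weights(validation_errors):
    """Hypothetical weighting rule: weight each sub-model by 1 / error, normalized to 1."""
    inverse = [1.0 / error for error in validation_errors]
    total = sum(inverse)
    return [value / total for value in inverse]


def make_target_value_model(submodels, weights):
    """Combine sub-models into one predictor via the weighted sum of their outputs."""
    if len(submodels) != len(weights):
        raise ValueError("one weight per sub-model is required")

    def predict(features):
        return sum(w * model(features) for model, w in zip(submodels, weights))

    return predict


# Stand-in sub-models; real ones would be trained on the feature attribute set.
def linear_submodel(features):
    return 2.0 * features[0]


def tree_submodel(features):
    return 1.0 if features[0] > 0.5 else 0.0


target_value_model = make_target_value_model(
    [linear_submodel, tree_submodel], [0.25, 0.75]
)
predicted = target_value_model([1.0])  # 0.25 * 2.0 + 0.75 * 1.0
```

Returning a closure keeps the combined model interchangeable with any single sub-model, since both are called with a feature list and return one number.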
The model generation steps above and their related content, as one inventive point of the embodiments of the present disclosure, solve technical problem 2 mentioned in the background: "a recommended index generated from the query log may not actually exist in the database instance, and an optimal index cannot be accurately generated on the basis of the query optimization effect of such a non-existent index." The optimal index cannot be generated accurately because the index does not actually exist in the database instance, so the target value of the index, that is, its query optimization effect, cannot be determined from the difference in query execution time with and without the index. Once this factor is addressed, the query optimization effect of an index can be determined and an optimal index generated even when the index does not actually exist. To achieve this, a target value prediction model is first built and trained on historical data. The trained target value prediction model is then used to determine the query optimization effect of a real or virtual index on a query statement. In turn, the query optimization effect of the index on the database system can be determined and an optimal index generated.
In the second step, the sum of the target values in the target value set is determined as the target value total.
In the third step, the time value of the candidate index is determined. Here, the time value characterizes the time required to build the candidate index.
In some optional implementations of some embodiments, the execution entity may determine the time value of the candidate index through the following steps:
In the first step, in response to determining that the candidate index is a real index, the time value is set to 0. Here, a real index is an index that actually exists in the database instance.
In the second step, in response to determining that the candidate index is a virtual index, the product of the index size of the candidate index and the time required to build an index of a preset unit size is determined as the time value. Here, a virtual index is an index that is generated from the query log and does not actually exist in the database instance. The preset unit size may be 1 MB. The index size may be the number of bytes that the index occupies.
In the fourth step, the difference between the product of a first value and the target value total and the product of a second value and the time value is determined as the index optimization value of the candidate index. Here, the first value and the second value may be preset values within a preset value range, used to adjust how much the cost of building the candidate index affects its usefulness. The preset value range may be [0, 1]. The product of the first value and the target value total characterizes the query optimization effect of the candidate index. The product of the second value and the time value characterizes the time cost of building the candidate index. The larger the ratio of the first value to the second value, the more the index optimization value reflects the optimization effect brought by the candidate index rather than the cost of building it.
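The time value and the index optimization value reduce to two short formulas, sketched here. The function and parameter names are hypothetical; `alpha` and `beta` stand for the first and second values, sizes are expressed in MB to match the 1 MB unit size, and each target value is assumed to be expressed as time saved (a positive number means the query runs faster with the index).

```python
def build_time_value(index_size_mb, is_real_index, seconds_per_mb):
    """Time value: 0 for an existing index, size times unit build time otherwise."""
    if is_real_index:
        return 0.0
    return index_size_mb * seconds_per_mb


def index_optimization_value(target_values, time_value, alpha, beta):
    """alpha * (sum of target values) - beta * (time value), with alpha, beta in [0, 1]."""
    for coefficient in (alpha, beta):
        if not 0.0 <= coefficient <= 1.0:
            raise ValueError("alpha and beta must lie in [0, 1]")
    return alpha * sum(target_values) - beta * time_value


# A virtual 10 MB index that saves 2 s and 3 s on two logged queries
time_value = build_time_value(10.0, is_real_index=False, seconds_per_mb=0.5)
value = index_optimization_value([2.0, 3.0], time_value, alpha=1.0, beta=0.5)
```

In this example the build cost is 5.0 and the index optimization value is 1.0 * 5.0 - 0.5 * 5.0 = 2.5; lowering `beta` relative to `alpha` makes the build cost matter less.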

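In step 104, at least one candidate index satisfying a preset condition is selected from the candidate index set as a tuning index set.
In some embodiments, the execution entity may select, from the candidate index set, at least one candidate index satisfying a preset condition as the tuning index set. The preset condition is that the sum of the index sizes of the candidate indexes in the at least one candidate index is less than or equal to a preset threshold and the sum of their index optimization values is maximal. The preset threshold may be a memory size threshold given by the user.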
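In some optional implementations of some embodiments, the execution entity may select, from the candidate index set, at least one candidate index satisfying the preset condition as the tuning index set through the following steps:
In the first step, in response to determining that the candidate index set is not empty, the following selection steps are performed based on the candidate index set and the preset threshold.
In the first sub-step, the hash value of the candidate index set under the limit of the preset threshold is determined as the current hash value.
Here, the hash value characterizes the maximum index optimization value and the optimal candidate index set of the candidate index set under the limit of the preset threshold. For example, suppose the candidate index set is {index 1, index 2, index 3, index 4} and the preset threshold is 100 MB. The hash value is then used to determine the maximum index optimization value and the optimal candidate index set of the candidate index set {index 1, index 2, index 3, index 4} under the 100 MB limit.
In the second sub-step, in response to determining that a maximum index optimization value and an optimal candidate index set corresponding to the current hash value exist in the target hash table, that maximum index optimization value and that optimal candidate index set are determined as the maximum index optimization value and the optimal candidate index set corresponding to the current hash value.
The target hash table may be a hash table that stores the hash values.
In the third sub-step, in response to determining that no maximum index optimization value and optimal candidate index set corresponding to the current hash value exist in the target hash table, the last candidate index in the candidate index set is determined as the last candidate index to be removed, the last candidate index is removed, the candidate index set after removal is taken as the candidate index set, and the selection steps are performed again.
In the second step, in response to determining that the candidate index set is empty, 0 and the empty set are determined, respectively, as the first maximum index optimization value and the first optimal candidate index set corresponding to the current hash value.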
いくつかの実施形態のいくつかの代替的な実施形態では、上述の選択ステップは、以下の
プッシュバックステップをさらに含む。
最初に、予め設定された閾値が除去された最終候補インデックスのインデックスサイズ以
上であることを決定したことに応答して、次のステップを実行する。
予め設定された閾値と削除された最終候補索引の索引サイズの差を更新された予め設定さ
れた閾値として決定する第１のサブステップと、得られた０および空集合を現在のハッシ
ュ値の第２の最大索引最適化値と第２の最適候補索引集合として決定する第２の選択ステ
ップとを再び実行する。
第２のサブステップでは、第１の最大インデックス最適化値と除去された最終候補インデ
ックスのインデックス最適化値との和を更新された第２の最大インデックス最適化値とし
て決定し、除去された最終候補インデックスを第１の最適候補インデックス集合に追加し
、追加された第１の最適候補インデックス集合を更新された第２の最適候補インデックス
集合として決定する。
第３のサブステップは、第１の最大インデックス最適化値が第２の最大インデックス最適
化値以上であることを決定することに応答して、第１の最大インデックス最適化値と第１
の最適候補インデックス集合を上記目標ハッシュテーブルに追加し、第１の最大インデッ
クス最適化値と第１の最適候補インデックス集合を現在のハッシュ値の最大インデックス
最適化値と最適候補インデックス集合として決定する。
第４のサブステップでは、第１の最大インデックス最適化値が第２の最大インデックス最
適化値よりも小さいことを決定することに応答して、第２の最大インデックス最適化値と
第２の最適候補インデックス集合を上記目標ハッシュテーブルに追加し、第２の最大イン
デックス最適化値と第２の最適候補インデックス集合を現在のハッシュ値の最大インデッ
クス最適化値と最適候補インデックス集合として決定する。
次に、最後に削除された最後の候補インデックスを候補インデックス集合に追加し、次の
ステップに従う。
第１のサブステップでは、予め設定された閾値制限の下で候補インデックス集合のハッシ
ュ値を現在のハッシュ値として決定する。
例えば、候補インデックス集合が｛候補インデックス１、候補インデックス２｝であり、
候補インデックス３を除去した集合である場合、除去された最終候補インデックスは候補
インデックス３である。最後に削除された最後の候補インデックスは候補インデックス４
である。前回削除された候補インデックス４を候補インデックス集合に追加し、更新され
た候補インデックス集合｛候補インデックス１、候補インデックス２、候補インデックス
３、候補インデックス４｝を得る。事前設定された閾値制限の下で候補インデックス集合
のハッシュ値を現在のハッシュ値として決定する。
第２のサブステップでは、最大インデックス最適化値と最適候補インデックス集合を、現
在のハッシュ値に対応する第１の最大インデックス最適化値と第１の最適候補インデック
ス集合として決定し、再び上述のバックプッシュステップを実行する。
ステップ３では、得られた最適候補インデックス集合を上記のチューニング・インデック
ス集合として決定する。
上記選択ステップ及び上記プッシュバックステップ及びその関連内容は、本開示の実施例
の１つの発明点として、背景技術が言及した技術問題３「大量の候補インデックスから最
適なインデックスを迅速に生成することができず、インデックスを生成する速度が遅い」
を解決した。インデックスの生成速度が遅い理由は、データベース・インスタンスが大き
い場合には予め設定された閾値も大きく、インデックスを選択する際には空間的制約の下
でのローカル解を列挙する必要があり、時間的なオーバーヘッドが大きく、最終的な最適
インデックスとはほとんど関係がない。このような要因を解決すれば、インデックス生成
速度を向上させることができる。この効果を達成するために、ハッシュテーブルを使用し
て選択プロセス中の各ハッシュ値を格納し、ハッシュテーブル中に結果が存在しないハッ
シュ値について、候補インデックス集合中にインデックスが１つ捨てられ、インデックス
が１つ選択された場合をさらに比較する。これら２つのケースで取得できるインデックス
最適化値の大きさに基づいて、現在のハッシュ値の結果として、ハッシュテーブルに大き
なインデックス最適化値に対応する結果を記録する。これにより、問題の重複が回避され
、最適なインデックスを生成する速度が向上する。 In step 104, at least one candidate index that satisfies preset conditions is selected as a tuning index set from among the candidate index sets.
In some embodiments, the execution entity may select, from the candidate index set, at least one candidate index that satisfies a preset condition as the tuning index set. The preset condition is that the sum of the index sizes of the candidate indexes in the at least one candidate index is less than or equal to a preset threshold and the sum of the index optimization values of those candidate indexes is maximal. The preset threshold may be a memory size threshold given by the user.
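The preset condition has the form of a 0/1 knapsack problem. The notation below is introduced here for illustration and is not used in the source: there are $n$ candidate indexes, $s_i$ is the index size of candidate $i$, $v_i$ is its index optimization value, $B$ is the preset threshold, and $x_i$ indicates whether candidate $i$ is selected.

```latex
\max_{x_1,\dots,x_n \in \{0,1\}} \; \sum_{i=1}^{n} v_i x_i
\quad \text{subject to} \quad \sum_{i=1}^{n} s_i x_i \le B
```

The requirement that at least one candidate index be selected would add the constraint $\sum_{i} x_i \ge 1$.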
In some alternative implementations of some embodiments, the execution entity may select, from the candidate index set, at least one candidate index that satisfies the preset condition as the tuning index set by performing the following steps:
First, in response to determining that the candidate index set is not empty, the following selection step is performed based on the candidate index set and the preset threshold.
In the first sub-step, the hash value of the candidate index set is determined as the current hash value under a preset threshold restriction.
Here, the hash value characterizes the maximum index optimization value and the optimal candidate index set of the candidate index set under the preset threshold restriction. For example, the candidate index set is {index 1, index 2, index 3, index 4} and the preset threshold is 100 MB. The hash value is then used to determine the maximum index optimization value and the optimal candidate index set for the candidate index set {index 1, index 2, index 3, index 4} under the 100 MB limit.
In the second sub-step, in response to a maximum index optimization value and an optimal candidate index set corresponding to the current hash value existing in the target hash table, that maximum index optimization value and that optimal candidate index set are determined as the maximum index optimization value and the optimal candidate index set corresponding to the current hash value.
Note that the target hash table may be a hash table that stores the hash values.
In the third sub-step, in response to determining that no maximum index optimization value and optimal candidate index set corresponding to the current hash value exist in the target hash table, the last candidate index in the candidate index set is determined as the final candidate index, the final candidate index is removed, the candidate index set after removal is taken as the candidate index set, and the selection step is executed again.
In the second step, in response to determining that the candidate index set is empty, 0 and the empty set are determined as the first maximum index optimization value and the first optimal candidate index set corresponding to the current hash value, respectively.
In some alternative embodiments of some embodiments, the selection step described above further includes the following pushback step.
First, in response to determining that the preset threshold is greater than or equal to the index size of the removed final candidate index, the following sub-steps are performed.
In the first sub-step, the difference between the preset threshold and the index size of the removed final candidate index is determined as the updated preset threshold, and the second step above is executed again, with the resulting 0 and empty set determined as the second maximum index optimization value and the second optimal candidate index set of the current hash value.
In the second sub-step, the sum of the first maximum index optimization value and the index optimization value of the removed final candidate index is determined as the updated second maximum index optimization value, the removed final candidate index is added to the first optimal candidate index set, and the resulting set is determined as the updated second optimal candidate index set.
In the third sub-step, in response to determining that the first maximum index optimization value is greater than or equal to the second maximum index optimization value, the first maximum index optimization value and the first optimal candidate index set are added to the target hash table and are determined as the maximum index optimization value and the optimal candidate index set of the current hash value.
In the fourth sub-step, in response to determining that the first maximum index optimization value is less than the second maximum index optimization value, the second maximum index optimization value and the second optimal candidate index set are added to the target hash table and are determined as the maximum index optimization value and the optimal candidate index set of the current hash value.
Next, the final candidate index removed the previous time is added back to the candidate index set, and the following sub-steps are performed.
In the first sub-step, the hash value of the candidate index set is determined as the current hash value under a preset threshold restriction.
For example, if the candidate index set is {candidate index 1, candidate index 2}, that is, the set from which candidate index 3 has been removed, the removed final candidate index is candidate index 3. The final candidate index removed the previous time is candidate index 4. The previously removed candidate index 4 is added to the candidate index set to obtain the updated candidate index set {candidate index 1, candidate index 2, candidate index 3, candidate index 4}. The hash value of the candidate index set under the preset threshold restriction is determined as the current hash value.
In the second sub-step, the maximum index optimization value and the optimal candidate index set are determined as the first maximum index optimization value and the first optimal candidate index set corresponding to the current hash value, and the pushback step described above is executed again.
In step 3, the obtained optimal candidate index set is determined as the tuning index set.
The selection step and the pushback step above, together with their related content, constitute one inventive point of the embodiments of the present disclosure and solve technical problem 3 mentioned in the background art: "an optimal index cannot be generated quickly from a large number of candidate indexes, and index generation is slow." Index generation is slow because, when the database instance is large, the preset threshold is also large, and selecting indexes requires enumerating local solutions under the space constraint. This incurs a large time overhead, and most of those local solutions have little to do with the final optimal indexes. Resolving this factor improves index generation speed. To achieve this effect, a hash table is used to store each hash value encountered during the selection process. For a hash value that has no result in the hash table, the case in which one index in the candidate index set is discarded is compared with the case in which that index is selected. Based on the index optimization values obtainable in these two cases, the result with the larger index optimization value is recorded in the hash table as the result for the current hash value. This avoids solving the same subproblem repeatedly and increases the speed of generating the optimal indexes.
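The selection and pushback steps can be read as a memoized 0/1 knapsack search. The following Python sketch is one possible interpretation under stated assumptions, not the patented implementation: the name `select_tuning_indexes`, the tuple layout of a candidate, and the use of a dictionary keyed by (number of remaining candidates, remaining threshold) in place of the "hash value" and "target hash table" are all introduced here for illustration. Index sizes are assumed to be whole MB so that the keys are exact.

```python
from typing import Dict, List, Tuple

# A candidate is (name, index size in MB, index optimization value).
Candidate = Tuple[str, int, float]
Result = Tuple[float, Tuple[str, ...]]


def select_tuning_indexes(candidates: List[Candidate], threshold_mb: int) -> Result:
    """Return (maximum total optimization value, selected index names)."""
    # Plays the role of the target hash table: one stored result per
    # (candidate prefix length, remaining threshold) subproblem.
    memo: Dict[Tuple[int, int], Result] = {}

    def solve(n: int, budget: int) -> Result:
        if n == 0:  # empty candidate set: value 0 and the empty set
            return 0.0, ()
        key = (n, budget)
        if key in memo:  # result already recorded for this subproblem
            return memo[key]
        name, size, value = candidates[n - 1]  # the "final" candidate
        best = solve(n - 1, budget)  # case 1: discard the final candidate
        if budget >= size:  # case 2: select it if it fits
            sub_value, sub_names = solve(n - 1, budget - size)
            taken = (sub_value + value, sub_names + (name,))
            if taken[0] > best[0]:  # ties keep the "discard" result
                best = taken
        memo[key] = best
        return best

    return solve(len(candidates), threshold_mb)


# Hypothetical candidates with a 100 MB threshold.
demo = [("idx1", 40, 10.0), ("idx2", 50, 12.0), ("idx3", 30, 7.0), ("idx4", 60, 15.0)]
best_value, best_set = select_tuning_indexes(demo, 100)
```

With these hypothetical inputs the search selects idx1 and idx4, which together use exactly 100 MB. Because each subproblem is stored once, the work is bounded by the number of distinct (prefix length, remaining threshold) pairs rather than by the number of subsets.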

In step 105, the tuning index set is stored in the target database.
In some embodiments, the execution entity may store the tuning index set in the target database. Here, the target database may be the database corresponding to the query log, and in the target database, data can be queried using the tuning indexes in the tuning index set.
The intelligent index tuning method for relational databases of some embodiments of the present disclosure can tune indexes effectively and improve the data query speed in real time. Specifically, the reason indexes cannot be tuned effectively is that the indexes of a database instance cannot be adjusted dynamically. Based on this, the intelligent index tuning method for relational databases of some embodiments of the present disclosure first parses each query statement in the query log to generate a query statement parse tree, obtaining a query statement parse tree set. Thus, by analyzing the query log of the application, the indexes of the database instance can be adjusted dynamically. Then, a candidate index set is generated based on the query statement parse tree set. Thus, the set of indexes most effective for query optimization can be selected. Next, the index optimization value of each candidate index in the candidate index set is determined. Thus, the query optimization effect of each candidate index on the queries can be quantified and expressed intuitively. Next, at least one candidate index that satisfies a preset condition is selected from the candidate index set as the tuning index set, the preset condition being that the sum of the index sizes of the candidate indexes in the at least one candidate index is less than or equal to a preset threshold and the sum of the index optimization values of those candidate indexes is maximal. Thus, an index set that does not exceed the preset threshold and has the greatest query optimization effect is obtained. Finally, the tuning index set is stored in the target database. Thus, the tuned indexes can be used to query data in the target database. In this way, by analyzing different query statements under different application scenarios, the indexes of the database instance are adjusted dynamically and optimized effectively, and the query performance of the database system is improved.
With further reference to FIG. 2, as an implementation of the methods shown in the figures above, the present disclosure provides some embodiments of an intelligent index tuning system for relational databases. These embodiments correspond to the method embodiments shown in FIG. 1, and the system can be applied to various electronic devices.
As shown in FIG. 2, an intelligent index tuning system 200 for relational databases according to some embodiments includes a parsing unit 201, a generation unit 202, a determining unit 203, a selection unit 204, and a storage unit 205. Here, the parsing unit 201 is configured to parse each query statement in the query log, generate a query statement parse tree, and obtain a query statement parse tree set. The generation unit 202 is configured to generate a candidate index set based on the query statement parse tree set. The determining unit 203 is configured to determine an index optimization value for each candidate index in the candidate index set. The selection unit 204 is configured to select, from the candidate index set, at least one candidate index that satisfies a preset condition as the tuning index set. The storage unit 205 is configured to store the tuning index set in the target database.
It will be understood that the units described in the system 200 correspond to the steps of the method described with reference to FIG. 1. Accordingly, the operations, features, and beneficial effects described above for the method apply equally to the system 200 and the units included therein, and are not described again here.
Referring now to FIG. 3, a schematic structural diagram of an electronic device 300 suitable for implementing some embodiments of the present disclosure is shown. The electronic device shown in FIG. 3 is merely an example and does not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 3, the electronic device 300 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 301 that can perform various appropriate operations and processes in accordance with a program stored in a read-only memory (ROM) 302 or a program loaded from a storage device 308 into a random access memory (RAM) 303.
The RAM 303 also stores various programs and data necessary for the operation of the electronic device 300. The processing device 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.
Generally, the electronic device includes the following devices: an input device 306 including a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 307 including a liquid crystal display (LCD), speaker, vibrator, etc.; a storage device 308 including, for example, a magnetic tape, hard disk, etc.; and a communication device 309. The communication device 309 may enable the electronic device 300 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 3 shows an electronic device 300 that includes various devices, it should be understood that not all of the illustrated devices need be implemented or included. Alternatively, more or fewer devices may be implemented or included. Each block shown in FIG. 3 may represent one device, or may represent multiple devices as necessary.

Claims

An intelligent index tuning system for relational databases, comprising:
a parsing unit configured to parse each query statement in a query log, generate a query statement parse tree, and obtain a query statement parse tree set;
a generation unit configured to generate a candidate index set based on the query statement parse tree set;
a determining unit configured to determine an index optimization value for each candidate index in the candidate index set;
a selection unit configured to select, from the candidate index set, at least one candidate index that satisfies a preset condition as a tuning index set; and
a storage unit configured to store the tuning index set in a target database,
wherein, in the determining unit:
the index optimization value characterizes the degree of query optimization that the corresponding candidate index provides for the database system;
a pre-trained target value prediction model is used to determine the target value of the candidate index for each query statement in the query log, obtaining a target value set;
the target value prediction model is a model that takes the feature attributes of the candidate index and of the query statement as input and outputs the target value, and the execution entity determines the output obtained from the pre-trained target value prediction model as the target value of the candidate index for each query statement in the query log;
a target value in the target value set is the difference between the execution time required to execute the query statement when the candidate index exists and the execution time required to execute the query statement when the candidate index does not exist, and characterizes the query optimization effect that the candidate index can achieve for the query statement corresponding to the target value;
the execution entity takes a first target number of query statements in the query log and a second target number of indexes as query statement samples and index samples, respectively, obtaining a query statement sample set and an index sample set;
the execution entity takes the attribute features of the query statement samples in the query statement sample set and the attribute features of the index samples in the index sample set as feature attributes, obtaining a feature attribute set, the feature attributes in the feature attribute set being attribute values that a query statement sample can obtain at the compilation stage, where the category of the query statement sample, the number of records in the tables related to the query statement sample, the number of operators included in the query statement sample, the number of records in the table on which the index sample exists, and the number of fields included in the index sample are all attribute values obtained at the parsing stage; depending on the database instance, all fields of the tables in the database are arranged in a fixed order and the fields used in the query statement sample and the index sample are one-hot encoded in this order, obtaining the encoding of the fields involved in the selection condition clauses of the query statement sample and the encoding of the fields included in the index sample; and, at the parsing stage, whether the query statement sample satisfies the prefix matching principle of the index sample and whether the fields updated by the query statement sample include fields in the index sample are compared, to determine respectively whether the query statement sample satisfies the prefix matching principle of the index sample and whether the query statement sample causes an update to the index sample;
the execution entity combines each query statement sample in the query statement sample set with each index sample in the index sample set, inputs the feature attributes of the query statement sample and the index sample in each combination into at least one initial target value prediction submodel for model training, and takes the trained models as at least one target value prediction submodel, where the at least one initial target value prediction submodel is a linear prediction model, a gradient boosting decision tree, or a DNN (Deep Neural Network);
the execution entity uses a combined-model method to determine the weight of each target value prediction submodel in the at least one target value prediction submodel;
the execution entity performs weighted addition based on the weight of each target value prediction submodel to obtain the target value prediction model;
the sum of the target values in the target value set is determined as the target value sum;
the time value of the candidate index is determined, where the time value characterizes the time required to build the candidate index; and
the difference between the product of a first value and the target value sum and the product of a second value and the time value is determined as the index optimization value of the candidate index, where the first value and the second value are values preset within a preset value range in order to adjust the impact of the cost of building the candidate index on the usefulness of the candidate index, the preset value range is [0, 1], the product of the first value and the target value sum characterizes the query optimization effect of the candidate index, the product of the second value and the time value characterizes the time cost of building the candidate index, and a larger ratio of the first value to the second value characterizes a greater focus on the optimization effect brought about by the candidate index rather than on the cost required to build the candidate index.