CN114510499A - Execution efficiency optimization method and device for database operation statements - Google Patents

Execution efficiency optimization method and device for database operation statements Download PDF

Info

Publication number
CN114510499A
CN114510499A CN202111537498.1A CN202111537498A CN114510499A CN 114510499 A CN114510499 A CN 114510499A CN 202111537498 A CN202111537498 A CN 202111537498A CN 114510499 A CN114510499 A CN 114510499A
Authority
CN
China
Prior art keywords
execution
database operation
operation statement
information
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111537498.1A
Other languages
Chinese (zh)
Inventor
卞中明
赵可可
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Alibaba Cloud Computing Ltd
Original Assignee
Alibaba China Co Ltd
Alibaba Cloud Computing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba China Co Ltd, Alibaba Cloud Computing Ltd filed Critical Alibaba China Co Ltd
Priority to CN202111537498.1A priority Critical patent/CN114510499A/en
Publication of CN114510499A publication Critical patent/CN114510499A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2282Tablespace storage structures; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2433Query languages

Abstract

An execution efficiency optimization method and device for database operation statements are disclosed. Information related to the execution efficiency of the database operation statement is collected within the lifecycle of the database operation statement, and a diagnostic report and/or optimization recommendation is generated based on the collected information. Therefore, the life cycle of the database operation statement is used as the data acquisition period, any information which is beneficial to improving the execution efficiency of the database operation statement can be acquired, and the diagnosis report and/or the optimization suggestion generated based on the acquired information can comprehensively and accurately optimize the execution efficiency of the database operation statement.

Description

Execution efficiency optimization method and device for database operation statements
Technical Field
The present disclosure relates to the field of database technologies, and in particular, to a method and an apparatus for optimizing execution efficiency of database operation statements.
Background
A database is a warehouse that organizes, stores, and manages data according to a data structure, which is an organized, sharable, and uniformly managed collection of large amounts of data that is stored in computers for a long period of time.
Databases may be queried, updated, and managed by executing database operation statements written in compliance with database query and programming languages. The execution efficiency of the database operation statement is generally influenced by various factors, however, the existing optimization scheme mainly performs tuning by analyzing whether the design of the database query statement is reasonable, so that the optimization efficiency is limited.
Therefore, a more efficient optimization scheme for database operation statement execution efficiency is needed.
Disclosure of Invention
One technical problem to be solved by the present disclosure is to provide a more effective optimization scheme for database operation statement execution efficiency.
According to a first aspect of the present disclosure, there is provided a method for optimizing execution efficiency of database operation statements, including: collecting information related to the execution efficiency of the database operation statement in the life cycle of the database operation statement; generating a diagnostic report and/or optimization recommendation based on the collected information.
Optionally, the lifecycle includes a pre-execution stage before executing the database operation statement, and the step of collecting information related to the execution efficiency of the database operation statement within the lifecycle of the database operation statement includes: collecting information related to the execution efficiency of the database operation statement in the pre-execution stage, and generating an efficiency diagnosis report and/or optimization suggestion based on the collected information comprises the following steps: generating a pre-execution suggestion based on the collected information, the method further comprising: performing an optimization operation based on the pre-execution suggestion prior to executing the database operation statement.
Optionally, the database targeted by the database operation statement is an MPP database, and the step of collecting information related to the execution efficiency of the database operation statement in the parsing stage includes: obtaining table information related to a table related to the database operation statement; and/or analyzing whether the query condition in the database operation statement is reasonable; and/or pre-estimating an execution plan of the database operation statement; and/or detecting whether the number of the master nodes on each host is matched with the host resources.
Optionally, the step of analyzing whether the query condition in the database operation statement is reasonable includes: detecting whether the hidden condition of the join exists in the database operation statement or not; and/or detecting whether indexes exist in a table corresponding to the query in which the ratio of the result set to the table after the where condition filtering in the database operation statement is lower than a preset threshold value.
Optionally, the table information includes first table information and second table information, where the first table information is used to represent basic information of a table, the second table information is used to represent information obtained by performing statistics and/or analysis on the table, and the first table information includes at least one of the following: the system comprises a distribution key, a row and column memory, table compression, field number, a primary key, unique constraint and partition number, wherein the second table information comprises at least one of the following items: disk space occupancy details, statistical information, expansion conditions, tilt conditions.
Optionally, the pre-execution suggestion includes at least one of: a table design suggestion; prompting the hidden condition danger of the join; index recommendation; performing a planned bottleneck prompt; an execution plan suggestion; prompting a node balance state; prompting the inclination condition of the watch; and (5) prompting statistical information.
Optionally, the lifecycle includes an execution phase for characterizing an actual execution process of the database operation statement, and the step of collecting information related to execution efficiency of the database operation statement within the lifecycle of the database operation statement includes: collecting information related to the execution efficiency of the database operation statements in the actual execution process in the execution stage, and generating an efficiency diagnosis report and/or an optimization suggestion based on the collected information comprises the following steps: and using the collected information as auxiliary optimization information.
Optionally, the database targeted by the database operation statement is an MPP database, and the step of collecting information related to the execution efficiency of the database operation statement in the actual execution process in the execution stage includes: monitoring the state of the database in the actual execution process to obtain the state information of the database; and/or monitoring the load of the host where the node is located in the actual execution process to obtain the load information of the host; and/or monitoring the execution state of the node in the actual execution process to obtain the state information of the node.
Optionally, the method further comprises: and generating an actual execution plan of the database operation statement based on an analysis tool, and analyzing the actual execution condition of the database operation statement in the execution stage.
Optionally, the method further comprises: and setting one or more optimizers, and generating and executing an actual execution plan of the database operation statement under the optimization action of the optimizers.
Optionally, the lifecycle includes a post-execution stage after an actual execution process of the database operation statement is completed, and the step of collecting information related to execution efficiency of the database operation statement in the lifecycle of the database operation statement further includes: acquiring a plurality of actual execution plans corresponding to different optimizers in the post-execution stage, and generating an efficiency diagnosis report and/or optimization suggestion based on the acquired information comprises the following steps: and comparing the actual execution plans to obtain information capable of reflecting execution differences of different optimizers.
Optionally, the step of generating an efficiency diagnostic report and/or optimization recommendation based on the collected information further comprises: a new execution plan and/or an actual execution plan bottleneck prompt is generated.
According to a second aspect of the present disclosure, there is provided a computing device comprising: a processor; and a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method as described in the first aspect above.
According to a third aspect of the present disclosure, there is provided a computer program product comprising executable code which, when executed by a processor of an electronic device, causes the processor to perform the method as described in the first aspect above.
According to a fourth aspect of the present disclosure, there is provided a non-transitory machine-readable storage medium having stored thereon executable code which, when executed by a processor of an electronic device, causes the processor to perform the method as described in the first aspect above.
Therefore, the information related to the execution efficiency of the database operation statement is collected in the life cycle of the database operation statement, so that the diagnosis report and/or the optimization suggestion generated based on the collected information can comprehensively and accurately optimize the execution efficiency of the database operation statement.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in greater detail exemplary embodiments thereof with reference to the attached drawings, in which like reference numerals generally represent like parts throughout.
FIG. 1 shows a schematic flow diagram of a method of performing efficiency optimization according to one embodiment of the present disclosure.
Figure 2 illustrates a schematic flow diagram of an MPP database oriented efficiency diagnosis and optimization assistance scheme according to one embodiment of the present disclosure.
Fig. 3 shows a schematic structural diagram of an execution efficiency optimization apparatus according to an embodiment of the present disclosure.
FIG. 4 shows a schematic structural diagram of a computing device according to one embodiment of the present disclosure.
Detailed Description
Preferred embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
According to the method, the life cycle of the database operation statement is used as a data acquisition period, information required by optimization is acquired and processed in the life cycle of the database operation statement, and finally a diagnosis report and/or an optimization suggestion which are beneficial to improving the execution efficiency of the database operation statement are generated.
The life cycle of the database operation statement can reflect the process of the database operation statement from before execution to after the execution is finished. By executing the information acquisition operation in the life cycle of the database operation statement, any information which is helpful for improving the execution efficiency of the database operation statement can be acquired, and the execution efficiency of the database operation statement can be comprehensively and accurately optimized by the diagnosis report and/or the optimization suggestion generated based on the acquired information.
FIG. 1 shows a schematic flow diagram of a method of performing efficiency optimization according to one embodiment of the present disclosure.
Referring to fig. 1, in step S110, information related to the execution efficiency of a database operation statement is collected during the lifecycle of the database operation statement.
The database operation statement refers to a statement written based on a database query language for operating on a database. For example, the database operation statement may be a statement written based on SQL (Structured Query Language), that is, an SQL statement.
The lifecycle of the database operation statement may refer to a process from the generation of the database operation statement to the completion of the execution of the database operation statement. The lifecycle of a database operation statement may be divided into a number of phases. That is, the lifecycle of a database operation statement may include multiple phases. For example, the lifecycle of a database operation statement may include three phases before, during, and after execution.
The collection operation may occur during a portion of the lifecycle of the database operation statement or may occur during the entire lifecycle of the database operation statement. Preferably, the collection operation may occur in the whole life cycle of the database operation statement to collect as much information as possible related to the execution efficiency of the database operation statement
Collecting information related to the execution efficiency of the database operation statements may refer to collecting various types of information required to optimize the execution efficiency of the database operation statements. "execution efficiency" may refer to the execution efficiency of one or more stages (or steps) in the execution process of the database operation statement, and may also refer to the overall execution efficiency of the database operation statement in the execution process.
In step S120, a diagnostic report and/or optimization recommendation is generated based on the collected information.
Diagnostic reports and/or optimization recommendations may be generated by analyzing the collected information.
For example, the execution efficiency of the database operation statements may be diagnosed based on the collected information, and a diagnosis report capable of representing factors affecting the execution efficiency of the database operation statements may be obtained; and/or the factors influencing the execution efficiency of the database operation statements can be determined by analyzing the collected information, and corresponding optimization suggestions are generated according to the factors.
The generated diagnosis report and/or optimization suggestion can be output to a user, so that the user can optimize the execution efficiency of the database operation statement according to the diagnosis report and/or optimization suggestion.
As described above, the lifecycle of a database operation statement may include three phases before, during, and after execution. The present disclosure may perform data acquisition operations (i.e., step S110) in any or all of these three phases and generate diagnostic reports and/or optimization recommendations for the different phases. Wherein the generation of a diagnostic report and/or optimization recommendation at a later stage may depend on the information collected at this stage as well as at any of the preceding stages.
The specific implementation flows of the present disclosure at different stages are further described below.
It should be noted that the optimization method of the present disclosure can be applied to various databases.
A database using an MPP (Massively Parallel Processing) architecture is referred to as an MPP database. The MPP architecture is characterized in that tasks are distributed to a plurality of servers and nodes in parallel, and after calculation on each node is completed, results of respective parts are collected together to obtain a final result.
The MPP database consists of Master and Segment. The Master is the entry for the user to access the MPP. All user operations are performed through the Master. Master contains all metadata in the entire distributed database, such as table structure definitions, indexes, data distribution information, and so on. Master does not store the actual data, which is stored in Segment. The Master receives the SQL command sent by the user, analyzes and generates a distributed execution plan, and then issues the execution plan to the corresponding Segment for execution. And sending the result of the Segment execution to the Master, and the Master receives the result of the Segment to collect and return the execution result to the user.
Unlike a single-machine database, the SQL execution efficiency of the MPP database is affected by factors such as skew, lock, concurrency, and the like, and the state of each node in the cluster affects the return of the final result, which makes the SQL efficiency optimization a difficult problem for using the MPP database. The skew is a phenomenon that after tasks are dispersed to a plurality of servers and nodes in parallel, most of the nodes are executed and finished, and a few nodes are waited for execution, so that query cannot be returned.
Considering that factors influencing the execution efficiency of database operation statements of the MPP database are complex, the optimization of the MPP database is difficult. The scheme of the present disclosure is described below by taking an MPP database as an example, and it should be understood that the present disclosure can also be applied to the execution efficiency optimization of other types of databases.
Pre-execution phase
The lifecycle of the database operation statement may include a pre-execution phase prior to execution of the database operation statement. The database operation statement can be analyzed in the pre-execution stage to obtain the analysis result of the database operation statement.
Information related to the execution efficiency of the database operation statements may be collected at a pre-execution stage.
Pre-execution suggestions may be generated based on the collected information. The proposal is pre-executed, namely, the optimization proposal generated in advance before the database operation statement is executed. Before executing the database operation statement, optimization operation can be executed based on the pre-execution suggestion so as to check and solve a part of problems influencing the execution efficiency of the database operation statement in advance.
The information gathering operations performed in the pre-execution phase may include, but are not limited to, one or more of the following.
1. Obtaining table information associated with a table involved in a database operation statement
The tables involved in the database operation statement may be physical tables or underlying tables of the view (i.e., logical tables).
The table information may include first table information and second table information.
The first table information is used for representing basic information of the table, and the first table information can also be called table basic information; the second table information is used to characterize the information obtained by performing statistics and/or analysis on the table, and may also be referred to as table high-level information.
Specifically, the first table information may include, but is not limited to, any one or more of a distribution key, a row and column memory, a table compression, a field number, a primary key, a unique constraint, and a partition number; the second table information may include, but is not limited to, any one or more of disk space usage details, statistics, inflation conditions, and tilt conditions.
The statistical information is information obtained by performing statistics on a table, and may include, but is not limited to, a histogram (histogram), density information (density information), string statistics information (string statistics), and the like. When the table is counted to obtain the statistical information, the statistical operation (e.g., count) which is relatively time-consuming may be optional.
2. Analyzing whether the query condition in the database operation statement is reasonable
The query condition in the database operation statement may include, but is not limited to, a join condition, a where condition.
Aiming at the join condition in the database operation statement, whether the hidden danger of the join condition exists can be detected, so that the problem that the performance of the database is influenced due to expansion of a result set possibly caused by unreasonable query condition setting is found.
The join condition hidden danger detection can be carried out from the following two aspects: (1) judging a Cartesian product: the table in the SQL is not associated with any other table, and the result set is increased inevitably; (2) join amplification early warning: and (4) after correlation, the result set is larger than the maximum value of the number of the table lines, namely, a 1: n correlation mode exists, and whether the application requirements are met or not needs to be determined.
For the where condition in the database operation statement, it may be detected whether an index exists in the table corresponding to the query whose ratio of the result set after the where condition filtering is lower than a preset threshold (e.g., 30%) in the table. If no index exists, the query efficiency is affected.
3. Predicting execution plan of database operation statement
The execution plan of the database operation statement can be predicted based on the Explain keyword. The simulator optimizer can execute the database operation statement by using the Explain keyword so as to predict the execution process of the database operation statement and obtain the time consumption of each step in the simulated execution process.
4. Detecting whether the number of nodes on each host matches the host resources
According to the MPP plan, each segment has a primary segment and a backup segment. Normally, the main segment executes the task allocated by the master, the minor segment is converted into a new main segment to execute the task when the main segment is unavailable, and the original main segment is converted into the minor segment.
The main/standby switching causes an imbalance phenomenon due to the over-high load of part of servers, which is called segment imbalance. When segment imbalance occurs, the cluster is required to perform segment rebalancing. A more typical scenario is that after a certain host is down, the main segments on the host are switched to the host where the standby node is located, after the machine that is down recovers and reconstructs the segments, the segments on the host are all minor segments, the load is very low, and the distribution of the segments in the cluster is unbalanced.
By detecting whether the number of the main segments on each host in the cluster is matched with the host resources, whether segment imbalance exists in the cluster can be judged. A cluster is also a system executing SQL, and the system may include multiple hosts, and each host may have one or more segments deployed thereon.
The pre-execution plan generated based on the above information acquired in the pre-execution phase may include, but is not limited to, at least one of: a table design suggestion; prompting the hidden condition danger of the join; index recommendation; performing a planned bottleneck prompt; an execution plan suggestion; prompting a node balance state; prompting the inclination condition of the watch; and prompting the statistical information condition.
The table design suggestions may be table design best practice suggestions generated based on table information (e.g., table base information) and database operation statements (e.g., the results of parsing the database operation statements). The contents of the table design suggestions may include, but are not limited to, associated field distribution key suggestions, small table copy table suggestions, large table list compression suggestions.
The distribution key suggestion refers to a distribution key which can set the maximum table association field suggestion in the database operation statement as the table. Distribution key suggestions can be given by judging whether the table distribution building fields are consistent with the association fields among the tables. That is, if the table distribution field is not consistent with the inter-table association field, a distribution key suggestion may be generated.
The small table copy table suggestion refers to a query that a database operation statement relates to more than 2 small table queries and associated fields are not small table distribution keys, and the small table can be suggested to be set as a copy table. Wherein, a small table may refer to a table with a data amount less than or equal to the first threshold (e.g., the number of rows is within 10w rows).
The large-table column memory compression suggestion refers to that for a row memory table with the data volume larger than a second threshold (for example, 1 hundred million rows), a user is suggested to consider setting the row memory table as the column memory compression table under the condition of infrequent updating so as to save IO (input/output) expenses and improve query efficiency.
The join condition hidden danger prompt refers to that a corresponding prompt is generated under the condition that the join condition hidden danger is detected to exist possibly.
The index recommendation refers to generating index recommendation when no index exists in a table corresponding to a query of which the ratio of the result set in the table is lower than a preset threshold after the work condition in the database operation statement is filtered. Where multiple conditions suggest adding a joint index.
The execution plan bottleneck prompting refers to a step which takes a long time in prompting the execution plan based on the estimation result of the execution plan. For example, steps with execution cost (or actual execution time) up to 20% may be marked red to remind the executor of heavy attention.
The execution plan suggestion may refer to a new optimized execution plan generated based on results of prediction of the execution plan by Explain.
The node balance state prompt is used for prompting the balance state of the nodes in the cluster. And judging whether segment imbalance exists in the cluster or not based on the detection result of whether the number of the nodes on each host is matched with the host resources or not.
Execution phase
The lifecycle of the database operation statement may also include an execution phase (which may also be referred to as an in-execution phase) that characterizes the actual execution of the database operation statement.
Unlike the above-mentioned Explain key used in the pre-execution stage, the Explain key is used to evaluate the execution process of the database operation statement, and the database operation statement is not actually executed in the evaluation process.
The execution phase characterizes the actual execution process of the actual execution plan of the database operation statement.
Therefore, the information which cannot be acquired in the pre-execution stage and can truly reflect the information related to the execution efficiency of the database operation statement in the actual execution process of the database operation statement can be acquired in the execution stage. For example, the load condition of the host where each node (master and segment) is located in the execution process, the database state, the monitoring condition of each node, and the like.
The information which is collected in the execution stage and can truly reflect the execution efficiency of the database operation statement in the actual execution process of the database operation statement can be used as auxiliary optimization information to be provided for users.
One or more of the following information may be collected during the execution phase.
1. Status information of a database
The state of the database may be monitored during the execution of the database operation statements to obtain state information of the database in real time (i.e., a snapshot of the state of the database). The state of the database may include, but is not limited to, locks, active sessions, external exchange files (workfiles). The external exchange files are files used when the segment queries that the memory is insufficient.
The status of the database may be used to determine the database load condition. For example, the state of the database can be used as a judgment basis for performance degradation of the database operation statement under high load.
Therefore, the state information of the database collected in the actual execution process of the database operation statement can be used as the optimization auxiliary information and provided for the user.
2. Host load information
Load information such as a CPU, IO, MEM (memory), NET (network bandwidth), and the like of the host where each node (master and segment) is located in the actual execution process of the database operation statement can be monitored through the host or an Application Program Interface (API) arranged on the host.
The load information of the host can be used for judging the performance bottleneck, so that the load information of the host collected in the actual execution process of the database operation statement can be used as optimization auxiliary information and provided for a user.
3. Status information of nodes
The execution state of the nodes (master and segment) in the actual execution process of the database operation statement can be monitored to obtain the state information of the nodes.
Taking monitoring of the execution states of the segments as an example, the execution conditions of the database operation statements in the segments can be determined by monitoring the execution states of the segments, so that the problem of unbalance calculation can be conveniently positioned. For example, if a database operation statement is found to be completed in most segments, and only one or two segments are executed, the calculation imbalance can be basically determined.
As an example, it is optional to generate an actual execution plan of the database operation statement using an analysis tool (e.g., Explain Analyze), and Analyze the actual execution of the database operation statement during the execution phase.
By way of example, one or more optimizers may also be provided, under the optimization of which actual execution plans for the database operation statements are generated. Wherein different optimizers may generate different actual execution plans. In the case where a plurality of optimizers are provided, the database operation statement may be executed once under the optimization action of different optimizers. The information collection operation can be executed in the execution process of each optimizer, so that the execution difference of different optimizers can be compared. Therefore, the execution difference information of different optimizers can be provided to the user as the optimization auxiliary information.
Post-execution phase
The lifecycle of the database operation statement may further include a post-execution stage after the execution process for characterizing the database operation statement is completed.
Multiple execution plans can be obtained in the post-execution stage, and the multiple real execution plans are compared to obtain the comparison result of the multiple execution plans. The multiple execution plans may refer to actual execution plans corresponding to different optimizers, or may refer to a historical execution plan and a current execution plan, where the historical execution plan may refer to an execution plan before SQL is rewritten or imported.
Taking a plurality of actual execution plans as actual execution plans corresponding to different optimizers as an example, the actual execution plans and information acquired in the actual execution plan execution process can be compared to obtain information capable of reflecting execution differences of the different optimizers, and the information can be provided to a user as auxiliary optimization information.
New execution plan and/or actual execution plan bottleneck hints may also be generated at the post-execution stage.
A new execution plan may be generated based on the diagnostic report and/or optimization suggestions (e.g., the execution plan comparisons, other secondary optimization information, and/or optimization suggestions mentioned above), which may be considered to be an optimized actual execution plan based on which the database operation statements may be executed.
An actual execution plan bottleneck prompt may be generated based on actual execution of the actual execution plan. Execution plan bottleneck hints may be used to hint at the more time-consuming steps in execution planning. For example, the steps with the actual execution time of 20% may be marked with red to remind the performer to pay more attention.
Application example
By taking the database as the MPP database and the database operation statement as the SQL statement as an example, the execution efficiency optimization method can be realized as an auxiliary scheme for efficiency diagnosis and optimization of the MPP database. Figure 2 illustrates a schematic flow diagram of an MPP database oriented efficiency diagnosis and optimization assistance scheme according to one embodiment of the present disclosure. The following mainly describes the implementation flow of the scheme, and for the details involved therein, see the above related description.
As shown in fig. 2, in this embodiment, various types of information required for SQL tuning can be collected and processed from three stages before, during, and after SQL execution, and finally an efficiency diagnosis report and an optimization suggestion are generated.
The main flow is as follows.
1. Before SQL is executed
Firstly, the input SQL text can be analyzed, and the table basic information and the table high-level information can be obtained through analysis.
And obtaining a table design better practice suggestion based on the table basic information and the SQL text.
By carrying out join condition scanning on the SQL text, whether join condition hidden danger exists in the execution process of the SQL text can be determined. If the hidden danger exists, a join condition hidden danger prompt can be generated.
Index recommendations can be obtained by performing where conditional judgment on the SQL text.
The execution process of the SQL can be estimated based on the Explain keywords, and an execution plan bottleneck prompt is obtained.
By executing segment balance detection on the host where the segment of the SQL statement is executed, whether segment imbalance exists in a system executing the database operation statement can be determined.
Thus, the following pre-execution suggestions can be provided before SQL is executed: the method comprises the following steps of base table design suggestion, index suggestion, join condition hidden danger prompt, index suggestion, Explain execution plan suggestion, segment balance state and table inclination condition and statistical information condition prompt. Partial SQL can solve a part of efficiency problems by pre-executing suggestions.
2. In SQL execution
Whether to open the perspective segment SQL can be selected before SQL execution. The perspective segment SQL suggests opening before SQL is executed, so that the collected segment execution information is more complete. Opening the perspective segment SQL is equivalent to opening the segment SQL to perform monitoring.
The perspective segment SQL can track the actual execution content and state of SQL on the master or stored procedures on each segment node, and can provide reference for the calculation inclination which is difficult to locate. For example, if a piece of SQL is found to be executed in most segments, and only one or two segments are executed, the calculation imbalance can be basically determined.
Before SQL is executed, whether the actual execution plan is generated and executed by means of Explain analyze can be selected. After the Explain analyze step, the corresponding commands can be assembled in front of SQL through SQL assembly to generate an execution plan. For example, if the display analysis is selected, the execution plan can be generated by splicing the display analysis select from dual, and if the display analysis is not selected, the execution plan can be directly generated by splicing the display analysis select from dual.
System load snapshots and database state snapshots may be collected in SQL execution.
And whether multiple optimizers execute can be set, and if the multiple optimizers execute, the SQL can be executed once under the action of different optimizers. For example, SQL may be executed once under the optimization of the default optimizer, and then SQL may be executed once under the optimization of the non-default optimizer by changing the optimizer options.
3. After SQL is executed
After SQL is executed, actual execution plan bottleneck judgment can be executed, and the step consuming more time is determined.
Multiple execution plans may also be compared after SQL execution, e.g., execution plans of multiple optimizers may be compared to obtain execution differences of different optimizers.
Therefore, rich optimization auxiliary information can be provided, including but not limited to operating system load, database cluster load, table storage, fields, lock state, active sessions, segment level session state monitoring, workfile state, different optimizer execution differences, actual execution plan comparison and bottleneck prompting.
In conclusion, the method and the system can realize SQL full-life cycle information collection and provide optimization suggestions based on better practice of table design; the method provides deep monitoring in the SQL execution process, not only monitors the SQL execution state of the master, but also monitors the execution state of each segment SQL. By this monitoring means a reference is provided for calculating the tilt which is difficult to locate. Meanwhile, SQL perspective capability in the execution of a complex storage process is provided, and the problem of high time consumption and slow query in positioning execution is solved; an MPP database execution plan bottleneck judgment auxiliary function is provided, and a historical execution plan comparison function is also provided, so that transverse comparison and longitudinal bottleneck positioning are facilitated; the multi-optimizer execution SQL function is provided, so that a developer can conveniently and quickly compare execution differences of different optimizers (such as legacy and orca) to determine to select a proper optimizer.
The execution efficiency optimization method of the database operation statement can also be realized as an execution efficiency optimization device. Fig. 3 shows a schematic structural diagram of an execution efficiency optimization apparatus according to an embodiment of the present disclosure. Wherein the functional elements of the execution efficiency optimization apparatus may be implemented in hardware, software, or a combination of hardware and software implementing the principles of the present disclosure. It will be appreciated by those skilled in the art that the functional units described in fig. 3 may be combined or divided into sub-units to implement the principles of the invention described above. Thus, the description herein may support any possible combination, or division, or further definition of the functional units described herein.
The functional units that the execution efficiency optimization apparatus can have and the operations that each functional unit can perform are briefly described below, and for the details related thereto, reference may be made to the above description, which is not repeated herein.
Referring to fig. 3, the execution efficiency optimization apparatus 300 includes an acquisition module 310 and a generation module 320.
The collection module 310 is used for collecting information related to the execution efficiency of the database operation statement in the life cycle of the database operation statement. The generation module 320 is used to generate diagnostic reports and/or optimization recommendations based on the collected information.
The lifecycle of the database operation statement may include a pre-execution stage before the database operation statement is executed, the collecting module 310 may collect information related to the execution efficiency of the database operation statement in the pre-execution stage, the generating module 320 may generate a pre-execution suggestion based on the collected information, and the execution efficiency optimizing apparatus 300 may further include an optimizing module configured to execute an optimizing operation based on the pre-execution suggestion before the database operation statement is executed.
The lifecycle of the database operation statement may include an execution phase for characterizing an actual execution process of the database operation statement, and the collection module 310 may collect information related to an execution efficiency of the database operation statement during the actual execution process in the execution phase. The generation module 320 may use the collected information as secondary optimization information.
The execution efficiency optimization device can also comprise a setting module for setting one or more optimizers, and an actual execution plan of the database operation statement can be generated and executed under the optimization action of the optimizers.
The life cycle may include a post-execution stage after the actual execution process of the database operation statement is completed, the acquisition module 310 may acquire a plurality of actual execution plans corresponding to different optimizers at the post-execution stage, and the generation module 320 may compare the plurality of actual execution plans to obtain information capable of reflecting execution differences of the different optimizers.
The generation module 320 may also generate new execution plans and/or actual execution plan bottleneck hints at the post-execution stage.
FIG. 4 shows a schematic structural diagram of a computing device that can be used to implement the above-described method according to an embodiment of the present disclosure.
Referring to fig. 4, computing device 400 includes memory 410 and processor 420.
The processor 420 may be a multi-core processor or may include a plurality of processors. In some embodiments, processor 420 may include a general-purpose host processor and one or more special purpose coprocessors such as, for example, a Graphics Processor (GPU), a Digital Signal Processor (DSP), or the like. In some embodiments, processor 420 may be implemented using custom circuits, such as an Application Specific Integrated Circuit (ASIC) or a Field Programmable Gate Array (FPGA).
The memory 410 may include various types of storage units, such as system memory, Read Only Memory (ROM), and permanent storage. Wherein the ROM may store static data or instructions that are required by the processor 420 or other modules of the computer. The persistent storage device may be a read-write storage device. The persistent storage may be a non-volatile storage device that does not lose stored instructions and data even after the computer is powered off. In some embodiments, the persistent storage device employs a mass storage device (e.g., magnetic or optical disk, flash memory) as the persistent storage device. In other embodiments, the permanent storage may be a removable storage device (e.g., floppy disk, optical drive). The system memory may be a read-write memory device or a volatile read-write memory device, such as a dynamic random access memory. The system memory may store instructions and data that some or all of the processors require at runtime. Further, the memory 410 may include any combination of computer-readable storage media, including various types of semiconductor memory chips (DRAM, SRAM, SDRAM, flash memory, programmable read-only memory), magnetic and/or optical disks, may also be employed. In some embodiments, memory 410 may include a removable storage device that is readable and/or writable, such as a Compact Disc (CD), a read-only digital versatile disc (e.g., DVD-ROM, dual layer DVD-ROM), a read-only Blu-ray disc, an ultra-density optical disc, a flash memory card (e.g., SD card, min SD card, Micro-SD card, etc.), a magnetic floppy disc, or the like. Computer-readable storage media do not contain carrier waves or transitory electronic signals transmitted by wireless or wired means.
The memory 410 has stored thereon executable code that, when processed by the processor 420, causes the processor 420 to perform the above-mentioned method for optimizing the execution efficiency of the database operation statements.
The method, apparatus and device for optimizing execution efficiency of database operation statements according to the present disclosure have been described in detail above with reference to the accompanying drawings.
Furthermore, the method according to the present disclosure may also be implemented as a computer program or computer program product comprising computer program code instructions for performing the above-mentioned steps defined in the above-mentioned method of the present disclosure.
Alternatively, the present disclosure may also be embodied as a non-transitory machine-readable storage medium (or computer-readable storage medium, or machine-readable storage medium) having stored thereon executable code (or a computer program, or computer instruction code) which, when executed by a processor of an electronic device (or computing device, server, etc.), causes the processor to perform the various steps of the above-described method according to the present disclosure.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems and methods according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (16)

1. A method for optimizing the execution efficiency of database operation statements comprises the following steps:
collecting information related to the execution efficiency of the database operation statement in the life cycle of the database operation statement;
generating a diagnostic report and/or optimization recommendation based on the collected information.
2. The method of claim 1, wherein,
the lifecycle includes a pre-execution phase prior to executing the database operation statement,
the step of collecting information related to the execution efficiency of the database operation statement in the life cycle of the database operation statement comprises the following steps: collecting information related to the execution efficiency of the database operation statement in the pre-execution stage,
the step of generating efficiency diagnostic reports and/or optimization recommendations based on the collected information includes: generating a pre-execution suggestion based on the collected information,
the method further comprises the following steps: performing an optimization operation based on the pre-execution suggestion prior to executing the database operation statement.
3. The method of claim 2, wherein the database for which the database operation statement is directed is an MPP database, and the step of collecting information related to the execution efficiency of the database operation statement during the parsing stage comprises:
obtaining table information related to a table related to the database operation statement; and/or
Analyzing whether the query condition in the database operation statement is reasonable or not; and/or
Predicting an execution plan of the database operation statement; and/or
And detecting whether the number of the main selection nodes on each host is matched with the host resources.
4. The method of claim 3, wherein analyzing whether the query condition in the database operation statement is reasonable comprises:
detecting whether the hidden condition of the join exists in the database operation statement or not; and/or
And detecting whether indexes exist in the table corresponding to the query in which the ratio of the result set to the table after the where condition filtering in the database operation statement is lower than a preset threshold value.
5. The method of claim 3, wherein,
the table information includes first table information and second table information, the first table information is used for representing basic information of a table, the second table information is used for representing information obtained by statistics and/or analysis of the table, and the first table information includes at least one of the following items: the system comprises a distribution key, a row and column memory, table compression, field number, a primary key, unique constraint and partition number, wherein the second table information comprises at least one of the following items: disk space occupancy details, statistical information, expansion conditions, tilt conditions.
6. The method of claim 2, wherein the pre-execution suggestion includes at least one of:
a table design suggestion;
prompting the hidden danger of the join condition;
index recommendation;
performing a planned bottleneck prompt;
an execution plan suggestion;
prompting a node balance state;
prompting the inclination condition of the watch;
and (5) prompting statistical information.
7. The method of claim 1, wherein,
the lifecycle includes an execution phase for characterizing an actual execution process of the database operation statements,
the step of collecting information related to the execution efficiency of the database operation statement in the life cycle of the database operation statement comprises the following steps: collecting information related to the execution efficiency of the database operation statement in the actual execution process in the execution stage,
the step of generating efficiency diagnostic reports and/or optimization recommendations based on the collected information includes: and using the collected information as auxiliary optimization information.
8. The method of claim 7, wherein the database for which the database operation statement is directed is an MPP database, and the step of collecting information related to the execution efficiency of the database operation statement during actual execution in the execution stage comprises:
monitoring the state of the database in the actual execution process to obtain the state information of the database; and/or
Monitoring the load of the host where the node is located in the actual execution process to obtain the load information of the host; and/or
And monitoring the execution state of the node in the actual execution process to obtain the state information of the node.
9. The method of claim 7, further comprising:
and generating an actual execution plan of the database operation statement based on an analysis tool, and analyzing the actual execution condition of the database operation statement in the execution stage.
10. The method of claim 7, further comprising:
one or more optimizers are provided for the process,
and generating and executing an actual execution plan of the database operation statement under the optimization action of the optimizer.
11. The method of claim 1, wherein,
the life cycle comprises a post-execution stage after the actual execution process of the database operation statement is finished,
the step of collecting information related to the execution efficiency of the database operation statement in the life cycle of the database operation statement further comprises: obtaining a plurality of execution plans at the post-execution stage,
the step of generating efficiency diagnostic reports and/or optimization recommendations based on the collected information includes: and comparing the plurality of execution plans to obtain execution plan comparison information.
12. The method of claim 11, wherein generating an efficiency diagnostic report and/or optimization recommendation based on the collected information further comprises:
generating a new execution plan and/or an actual execution plan bottleneck prompt at the post-execution stage.
13. An execution efficiency optimization device of database operation statements comprises:
the acquisition module is used for acquiring information related to the execution efficiency of the database operation statement in the life cycle of the database operation statement;
a generation module for generating a diagnostic report and/or optimization recommendation based on the collected information.
14. A computing device, comprising:
a processor; and
a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method of any of claims 1 to 12.
15. A computer program product comprising executable code which, when executed by a processor of an electronic device, causes the processor to perform the method of any of claims 1 to 12.
16. A non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the method of any of claims 1-12.
CN202111537498.1A 2021-12-15 2021-12-15 Execution efficiency optimization method and device for database operation statements Pending CN114510499A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111537498.1A CN114510499A (en) 2021-12-15 2021-12-15 Execution efficiency optimization method and device for database operation statements

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111537498.1A CN114510499A (en) 2021-12-15 2021-12-15 Execution efficiency optimization method and device for database operation statements

Publications (1)

Publication Number Publication Date
CN114510499A true CN114510499A (en) 2022-05-17

Family

ID=81547714

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111537498.1A Pending CN114510499A (en) 2021-12-15 2021-12-15 Execution efficiency optimization method and device for database operation statements

Country Status (1)

Country Link
CN (1) CN114510499A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117591542A (en) * 2024-01-18 2024-02-23 准检河北检测技术服务有限公司 Intelligent detection method for database software data security

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117591542A (en) * 2024-01-18 2024-02-23 准检河北检测技术服务有限公司 Intelligent detection method for database software data security
CN117591542B (en) * 2024-01-18 2024-03-22 准检河北检测技术服务有限公司 Intelligent detection method for database software data security

Similar Documents

Publication Publication Date Title
CN108255712B (en) Test system and test method of data system
US11429584B2 (en) Automatic determination of table distribution for multinode, distributed database systems
US8578348B2 (en) System and method of cost oriented software profiling
US6801903B2 (en) Collecting statistics in a database system
US7991763B2 (en) Database query optimization utilizing remote statistics collection
US7130838B2 (en) Query optimization via a partitioned environment
US7831593B2 (en) Selective automatic refreshing of stored execution plans
US7743052B2 (en) Method and apparatus for projecting the effect of maintaining an auxiliary database structure for use in executing database queries
US20080228710A1 (en) Building database statistics across a join network using skew values
Pavlo et al. External vs. internal: an essay on machine learning agents for autonomous database management systems
US20060031200A1 (en) Method and system for tracking performance by breaking down a query
Baldacci et al. A cost model for SPARK SQL
US9195730B2 (en) Verifying correctness of a database system via extended access paths
Munir et al. A cost-based storage format selector for materialized results in big data frameworks
CN114510499A (en) Execution efficiency optimization method and device for database operation statements
US20080133457A1 (en) Database optimizer plan validation and characterizations
Oliveira et al. Migration of relational databases to NoSQL-methods of analysis
KR101830936B1 (en) Performance Improving System Based Web for Database and Application
JP5084750B2 (en) Managing statistical views in a database system
US7155432B2 (en) Method and system decoding user defined functions
US7805434B2 (en) Performing a non-invasive runtime simulation for a database system
US20060085464A1 (en) Method and system for providing referential integrity constraints
Zhang et al. Getting more for less in optimized mapreduce workflows
Giurgiu et al. Do you know how to configure your enterprise relational database to reduce incidents?
Munir et al. Intermediate results materialization selection and format for data-intensive flows

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination