CN111046059B

CN111046059B - Low-efficiency SQL statement analysis method and system based on distributed database cluster

Info

Publication number: CN111046059B
Application number: CN201911248586.2A
Authority: CN
Inventors: 徐国柱; 欧万翔; 邓智鸿; 张东凯
Original assignee: China Construction Bank Corp
Current assignee: China Construction Bank Corp
Priority date: 2019-12-09
Filing date: 2019-12-09
Publication date: 2023-06-30
Anticipated expiration: 2039-12-09
Also published as: CN111046059A

Abstract

The application provides an inefficient SQL statement analysis method and system based on a distributed database cluster. By providing an SQL performance analysis model, an input SQL statement can be analyzed automatically. The model is trained on historical data; both the application data and the system data are known data, which serve as the basis for the performance results, so that the analysis results become progressively more reasonable through training. The approach frees up manpower, improves efficiency, and allows an inefficient SQL statement to be analyzed and diagnosed quickly.

Description

Low-efficiency SQL statement analysis method and system based on distributed database cluster

Technical Field

The invention relates to the technical field of database query, in particular to a low-efficiency SQL statement analysis method and system based on a distributed database cluster.

Background

Distributed databases are in most cases well suited to serve as storage, computation and analysis engines for big data. For example, Greenplum, an enterprise-level database product, is regarded as one of the most advanced open-source OLAP databases because it supports mass data storage and processing, offers high cost performance, supports real-time BI analysis, enables dynamic data warehouses, is easy to use, adopts an MPP parallel processing architecture that supports linear expansion, provides good concurrency and high availability support, supports MapReduce, and offers in-database compression.

A distributed database is a logical database composed of tens or even hundreds of individual database services. Its applications have the following characteristics: the volume of SQL access is large; the content of SQL access is unpredictable; execution efficiency is closely related to system resources; and execution efficiency is closely related to the database design. As a result, a distributed database may receive millions of query statements per day, and whether those statements are written properly and whether the database tables are designed reasonably will affect the efficiency of SQL execution and thus the user experience.

Disclosure of Invention

To solve at least one of the above problems, an embodiment of a first aspect of the present application provides an inefficient SQL statement analysis method based on a distributed database cluster, including:

obtaining analysis basis information and an inefficient SQL statement of a distributed database cluster, wherein the analysis basis information comprises application information and system information;

inputting the inefficient SQL statement and the analysis basis information into a preset SQL performance analysis model to obtain an inefficiency cause analysis result for the inefficient SQL statement;

wherein the SQL performance analysis model is trained with a plurality of known inefficient SQL statements in the distributed database cluster and the corresponding historical application information and historical system information.

In certain embodiments, the method further comprises:

establishing the SQL performance analysis model; and

training the SQL performance analysis model with a plurality of known inefficient SQL statements in the distributed database cluster and the corresponding historical application information and historical system information.

In some embodiments, establishing the SQL performance analysis model comprises:

establishing an input layer, a data source arrangement layer, an SQL analysis layer, a performance analysis layer and an output layer;

the input layer receives the analysis basis information and the inefficient SQL statement;

the data source arrangement layer extracts the analysis basis information and outputs pieces of structured information classified according to a preset rule;

the SQL analysis layer extracts the inefficient SQL statement and outputs feature information describing its structural features;

the performance analysis layer takes the feature information of the structural features of the inefficient SQL statement and each piece of structured information as input, and performs SQL performance analysis to obtain the inefficiency cause analysis result;

and the output layer outputs the inefficiency cause analysis result.

In some embodiments, training the SQL performance analysis model with a plurality of known inefficient SQL statements in the distributed database cluster and the corresponding historical application information and historical system information comprises:

inputting a known inefficient SQL statement and the corresponding historical application information and historical system information into the input layer;

generating efficiency factors that influence SQL statements, and their corresponding weights, according to the historical application information and the historical system information;

setting the structured features output by the SQL analysis layer, parsing the known inefficient SQL statement to obtain its structured feature information, and associating the efficiency factors and their corresponding weights with the structured feature information of the known inefficient SQL statement;

and inputting a plurality of known inefficient SQL statements, together with the corresponding historical application information and historical system information, to train the SQL analysis layer.

Embodiments of a second aspect of the present application provide an inefficient SQL statement analysis system based on a distributed database cluster, including:

an acquisition module, configured to acquire analysis basis information and an inefficient SQL statement of the distributed database cluster, wherein the analysis basis information comprises application information and system information;

and a performance analysis module, configured to input the inefficient SQL statement and the analysis basis information into a preset SQL performance analysis model to obtain an inefficiency cause analysis result for the inefficient SQL statement.

In certain embodiments, the system further comprises:

a model building module, configured to establish the SQL performance analysis model;

and a model training module, configured to train the SQL performance analysis model with a plurality of known inefficient SQL statements in the distributed database cluster and the corresponding historical application information and historical system information.

In certain embodiments, the SQL performance analysis model comprises: an input layer, a data source arrangement layer, an SQL analysis layer, a performance analysis layer and an output layer;

and the output layer outputs the analysis result of the inefficiency reasons.

In certain embodiments, the model training module comprises:

a sample input unit, configured to input a known inefficient SQL statement and the corresponding historical application information and historical system information into the input layer;

an efficiency factor generation unit, configured to generate efficiency factors that influence SQL statements, and their corresponding weights, according to the historical application information and the historical system information;

an association unit, configured to set the structured features output by the SQL analysis layer, parse the known inefficient SQL statement to obtain its structured feature information, and associate the efficiency factors and their corresponding weights with the structured feature information of the known inefficient SQL statement;

and a training unit, configured to input a plurality of known inefficient SQL statements, together with the corresponding historical application information and historical system information, to train the SQL analysis layer.

In yet another aspect, embodiments of the present application provide a computer device including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method described above when the program is executed.

In yet another aspect, embodiments of the present application provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method described above.

The beneficial effects of this application are as follows:

Drawings

In order to illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the invention, and a person skilled in the art may obtain other drawings from them without inventive effort.

Fig. 1 shows a flow chart of an inefficient SQL statement analysis method based on a distributed database cluster in an embodiment of the invention.

Fig. 2 shows a flow chart of an exemplary method for automatically analyzing an inefficient SQL statement based on a distributed database cluster in an embodiment of the invention.

Fig. 3 shows a schematic diagram of the inputs and outputs of process 1 (data acquisition) in Fig. 2.

Fig. 4 shows a functional overview of process 2 (data source arrangement) in Fig. 2.

Fig. 5 shows a functional overview of process 3 (SQL parsing) in Fig. 2.

Fig. 6 shows a functional overview of process 4 (report preparation) in Fig. 2.

Fig. 7 shows a functional overview of process 5 (report generation) in Fig. 2.

Fig. 8 shows a functional overview of process 6 (index analysis) in Fig. 2.

Fig. 9 shows a prior-art SQL statement analysis diagram.

Fig. 10 shows a schematic diagram of an inefficient SQL statement analysis system based on a distributed database cluster in an embodiment of the invention.

Fig. 11 shows a schematic structural diagram of a computer device suitable for implementing embodiments of the present application.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of protection of the invention.

Fig. 1 shows an inefficient SQL statement analysis method based on a distributed database cluster in an embodiment of the present application, which specifically includes:

S1: acquiring analysis basis information and an inefficient SQL statement of a distributed database cluster, wherein the analysis basis information comprises application information and system information;

S2: inputting the inefficient SQL statement and the analysis basis information into a preset SQL performance analysis model to obtain an inefficiency cause analysis result for the inefficient SQL statement.

With the inefficient SQL statement analysis method based on a distributed database cluster described here, an input SQL statement can be analyzed automatically by providing an SQL performance analysis model. The model is trained on historical data; both the application data and the system data are known data, which serve as the basis for the performance results, so that the analysis results become progressively more reasonable through training. This frees up manpower, improves efficiency, and allows an inefficient SQL statement to be analyzed and diagnosed quickly.

The SQL performance analysis model may be built online or offline. For example, in some embodiments the model is built and trained online, i.e., the method further comprises:

S01: establishing the SQL performance analysis model;

S02: training the SQL performance analysis model with a plurality of known inefficient SQL statements in the distributed database cluster and the corresponding historical application information and historical system information.

The building and training of SQL performance models is described in detail below.

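Establishing the SQL performance analysis model comprises the following steps:

establishing an input layer, a data source arrangement layer, an SQL analysis layer, a performance analysis layer and an output layer;

the input layer receives the analysis basis information and the inefficient SQL statement;

the data source arrangement layer extracts the analysis basis information and outputs pieces of structured information classified according to a preset rule;

the SQL analysis layer extracts the inefficient SQL statement and outputs feature information describing its structural features;

the performance analysis layer takes the feature information of the structural features of the inefficient SQL statement and each piece of structured information as input, and performs SQL performance analysis to obtain the inefficiency cause analysis result;

and the output layer outputs the inefficiency cause analysis result.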

The input layer, the data source arrangement layer, the SQL analysis layer, the performance analysis layer and the output layer are described in detail below with reference to the embodiments.
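As a rough illustration of how the five layers could be chained, the following Python sketch passes data through an input layer, a data source arrangement layer, an SQL analysis layer, a performance analysis layer and an output layer. All function names, dictionary keys and the two rules used in the performance analysis step are hypothetical placeholders chosen for illustration; they stand in for, and are not, the trained model described in this application.

```python
from typing import Any, Dict, List


def input_layer(sql: str, app_info: Dict[str, Any], sys_info: Dict[str, Any]) -> Dict[str, Any]:
    """Bundle the inefficient SQL statement with the analysis basis information."""
    return {"sql": sql, "application": app_info, "system": sys_info}


def data_source_arrangement_layer(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Classify the analysis basis information into structured groups (assumed rule)."""
    return {
        "metadata": raw["application"].get("tables", {}),
        "resources": raw["system"].get("load", {}),
    }


def sql_analysis_layer(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Extract very coarse structural features from the statement text."""
    text = raw["sql"].upper()
    return {"joins": text.count(" JOIN "), "has_where": " WHERE " in text}


def performance_analysis_layer(features: Dict[str, Any], structured: Dict[str, Any]) -> List[str]:
    """Apply placeholder rules; a trained model would take their place."""
    reasons = []
    if features["joins"] > 0 and not features["has_where"]:
        reasons.append("join without filter condition")
    if structured["resources"].get("cpu", 0.0) > 0.9:
        reasons.append("CPU saturated while the statement ran")
    return reasons


def output_layer(reasons: List[str]) -> str:
    """Render the inefficiency cause analysis result as one line of text."""
    return "; ".join(reasons) if reasons else "no inefficiency reason found"


def analyze(sql: str, app_info: Dict[str, Any], sys_info: Dict[str, Any]) -> str:
    raw = input_layer(sql, app_info, sys_info)
    structured = data_source_arrangement_layer(raw)
    features = sql_analysis_layer(raw)
    return output_layer(performance_analysis_layer(features, structured))


result = analyze(
    "SELECT * FROM orders o JOIN users u ON o.uid = u.id",
    {"tables": {"orders": {"rows": 1000}}},
    {"load": {"cpu": 0.95}},
)
```

Keeping each layer as a separate function mirrors the separation described above, so that any one layer could be replaced without touching the others.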

Fig. 2 shows the specific analysis flow of the whole model.

In some embodiments not shown in the figures, the input layer also performs data classification, i.e., it normalizes the input data to obtain system resource information, inefficient SQL statements, metadata information, and operation information.

In other embodiments, as shown in Fig. 2, the SQL performance analysis model further includes a data acquisition layer, which normalizes the input data to obtain system resource information, inefficient SQL statements, metadata information, and operation information.

In the embodiment in Fig. 3, the data input by the input layer includes:

A1 application information, including: A1.1 the single SQL statement to be optimized, A1.2 the database operation log, and A1.3 index information;

A2 system information, including: A2.1 database system tables and A2.2 node operation load information;

A3 experience information, including: A3.1 cluster data and A3.2 inefficient functions.

In the embodiment in Fig. 3, the input of the data acquisition layer is the output of the input layer, and its output includes:

B1 system resource information, including: B1.1 resource queue, B1.2 CPU information, B1.3 memory information, and B1.4 disk information;

B2 inefficient SQL, including: B2.1 SQL content;

B3 metadata information, including: B3.1 table information, B3.2 view information, and B3.3 field information;

B4 operation information, including: B4.1 execution information, B4.2 user information, and B4.3 cluster information.
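The grouping into B1 to B4 can be pictured as a routing step. The sketch below is only an illustration under assumptions: the raw key names (`cpu`, `sql_text`, and so on), the `ROUTES` table and the `AcquisitionOutput` class are hypothetical and do not come from this application.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class AcquisitionOutput:
    system_resources: Dict[str, Any] = field(default_factory=dict)  # B1
    inefficient_sql: Dict[str, Any] = field(default_factory=dict)   # B2
    metadata: Dict[str, Any] = field(default_factory=dict)          # B3
    run_info: Dict[str, Any] = field(default_factory=dict)          # B4


# Hypothetical routing table: raw key -> (target group, key inside the group)
ROUTES = {
    "resource_queue": ("system_resources", "resource_queue"),
    "cpu": ("system_resources", "cpu"),
    "memory": ("system_resources", "memory"),
    "disk": ("system_resources", "disk"),
    "sql_text": ("inefficient_sql", "content"),
    "tables": ("metadata", "tables"),
    "views": ("metadata", "views"),
    "fields": ("metadata", "fields"),
    "execution": ("run_info", "execution"),
    "user": ("run_info", "user"),
    "cluster": ("run_info", "cluster"),
}


def acquire(raw: Dict[str, Any]) -> AcquisitionOutput:
    """Sort raw inputs into the four groups; unknown keys are ignored."""
    out = AcquisitionOutput()
    for key, value in raw.items():
        if key in ROUTES:
            group, name = ROUTES[key]
            getattr(out, group)[name] = value
    return out


sample = acquire({"cpu": "16 cores", "sql_text": "SELECT 1", "user": "report_app", "unknown": 42})
```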

The function of the data acquisition layer in the specific embodiment is described below.

1. The information sources are the basis of the subsequent analysis and come mainly from three places:

(1) the users' existing reports and index information, the SQL supporting those report and index applications, and the SQL running time, environment, users and the like;

(2) basic information about the database environment in which the SQL runs, which can be obtained from system logs, system tables and configuration information;

(3) experience information that is independent of existing applications or systems, such as known inefficient functions, the number of nodes in the database cluster, the hardware configuration, etc.

2. What "Process 1: data acquisition" accomplishes:

According to the requirements of application optimization for a Greenplum database cluster, the types of information that need to be collected are summarized and sorted out; they include system resource information, metadata information, the SQL to be analyzed, and the related operation information recorded when the SQL runs. This information may be scattered across multiple locations in the user's overall environment: most of it can be obtained automatically, some requires manual maintenance, and some requires an extraction procedure. The input to "Process 1: data acquisition" is therefore not fixed and varies with the user environment. "Process 1: data acquisition" gathers these data of different sources and different natures into the inefficient SQL statement analysis method and system described in this application, forming the data acquisition layer B.

3. Significance of "Process 1: data acquisition":

Data acquisition takes place upstream of the inefficient SQL statement analysis method and system described in this application, and may be performed by the database system itself, an external system, or an external function. It is a necessary preparation for the subsequent work.

In some embodiments, the data source arrangement layer takes the output of the data acquisition layer as input. As shown in Fig. 4, its output includes:

C1 metadata information (table), including: C1.1 table identification + job identification, C1.2 table definition, and C1.3 table information;

C2 metadata information (field), including: C2.1 table identification + field identification + job identification, and C2.2 field information (field type and meaning);

C3 operation information, including: C3.1 job identification, C3.2 execution information, C3.3 user information, C3.4 resource queues, C3.5 application information, C3.6 machine information, C3.7 memory information, C3.8 CPU information, and C3.9 network information.
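The keys listed above (table identification, field identification, job identification) suggest flat records that can be stored and joined later. The following sketch shows one way such records could be produced; the input dictionary layout, the function name `arrange` and the lower-casing of type names are assumptions made for illustration only.

```python
from typing import Any, Dict, List, Tuple


def arrange(job_id: str, acquired: Dict[str, Any]) -> Dict[str, List[Tuple]]:
    """Turn loosely shaped acquisition output into flat records keyed by job id."""
    table_rows, field_rows = [], []
    for table, info in sorted(acquired.get("tables", {}).items()):
        # C1-style record: table identification + job identification + definition
        table_rows.append((table, job_id, info.get("definition", "")))
        for column, col_type in sorted(info.get("fields", {}).items()):
            # C2-style record: table + field + job identification, normalized type
            field_rows.append((table, column, job_id, col_type.lower()))
    # C3-style record: job identification plus a few pieces of operation information
    run_rows = [(job_id, acquired.get("user", ""), acquired.get("resource_queue", ""))]
    return {"metadata_table": table_rows, "metadata_field": field_rows, "run_info": run_rows}


arranged = arrange(
    "job-1",
    {
        "tables": {
            "orders": {
                "definition": "DISTRIBUTED BY (id)",
                "fields": {"id": "BIGINT", "amt": "NUMERIC"},
            }
        },
        "user": "etl",
    },
)
```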

The function of the data source arrangement layer in the specific embodiment is described below.

1. The data collected in the data acquisition layer B are integrated, standardized and otherwise processed.

2. "Process 2: data source arrangement" reorganizes the data of different sources and different natures into reasonably classified structured information that conforms to the design specification, forming the data source arrangement layer C, which serves as the actual data source for the subsequent analysis.

3. Significance of "Process 2: data source arrangement":

"Process 2: data source arrangement" forms the data source arrangement layer C, which keeps the functional logic of the subsequent analysis loosely coupled to, and relatively independent of, the information sources, so that the method can serve as an independent and general solution.

In the embodiment in Fig. 5, the input of the SQL parsing layer is the output data of the data acquisition layer, and the output data of the SQL parsing layer includes:

D1 table and field list, including: D1.1 job identification, D1.2 table identification, D1.3 field identification, and D1.4 statistical information;

D2 inter-table association, including: D2.1 job identification, D2.2 association identification, D2.3 left information, D2.4 right information, D2.5 association condition information, D2.6 statistical information, and D2.7 characteristic information;

D3 table filtering conditions, including: D3.1 job identification, D3.2 condition identification, D3.3 left information, D3.4 right information, D3.5 filtering condition information, D3.6 statistical information, and D3.7 characteristic information;

D4 sub-query, including: D4.1 job identification, D4.2 sub-query identification, D4.3 statistical information, and D4.4 design information;

D5 group_by, including: D5.1 job identification, D5.2 group_by identification, D5.3 statistical information, and D5.4 characteristic information;

D6 order_by, including: D6.1 job identification, D6.2 order_by identification, D6.3 statistical information, and D6.4 characteristic information.

All key elements used by the SQL, such as tables, fields, association information and condition information, are parsed out of the inefficient SQL statement.

Because the parsed results span different dimensions, they are not suited to being processed together and are therefore divided into six parts according to their characteristics. This makes the results easier to express and keeps them aligned with performance optimization analysis. The six dimensions are:

Table and field list

Inter-table association information

Table filtering information

Sub-queries

Group_by

Order_by

The parsing is based on the Python library sqlparse: the tree-shaped result produced by sqlparse is further analyzed and processed to obtain a clear structured result.

Significance of "Process 3: SQL parsing":

Through this process the inefficient SQL statement is decomposed in fine detail, in preparation for the next step, in which it is associated with the information in the data source arrangement layer C to obtain the direct evidence required by the optimization analysis.

In some embodiments, the inefficiency cause analysis result is an analysis report, in which case the performance analysis layer includes a report preparation layer and a report layer.

The report preparation layer takes the output data of the data source arrangement layer and the SQL analysis layer as input. As shown in Fig. 6, its output includes:

E1 table design, including: E1.1 job identification, E1.2 table identification, E1.3 statistical information, E1.4 data volume, E1.5 distribution information, E1.6 partition information, E1.7 compression information, E1.8 column storage information, E1.9 expansion information, and E1.10 statistics collection information;

E2 SQL association, including: E2.1 job identification, E2.2 association identification, E2.3 left information, E2.4 right information, E2.5 association condition information, E2.6 statistical information, E2.7 characteristic information, E2.8 related table design information, E2.9 related field design information, and E2.10 left and right information comparison;

E3 SQL_WHERE, including: E3.1 job identification, E3.2 condition identification, E3.3 left information, E3.4 right information, E3.5 filtering condition information, E3.6 statistical information, E3.7 characteristic information, E3.8 related table design information, E3.9 related field design information, and E3.10 left and right information comparison;

E4 system resources, including: E4.1 job identification, E4.2 execution plan overhead information, E4.3 run time period, E4.4 resource queue combination, E4.5 CPU usage, E4.6 memory usage, E4.7 disk usage, E4.8 network usage, and E4.9 system runtime usage.
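As one possible reading of how an E2 row might be assembled, the sketch below joins a parsed inter-table association with table design metadata and records a left/right comparison. The field names, the sample tables and the two checks (whether the join column is the distribution key, whether the column types match) are illustrative assumptions and are not taken from this application.

```python
from typing import Any, Dict, List


def prepare_association_rows(
    associations: List[Dict[str, str]],
    table_design: Dict[str, Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Attach table design facts to each parsed join and compare both sides."""
    rows = []
    for assoc in associations:
        left = table_design[assoc["left_table"]]
        right = table_design[assoc["right_table"]]
        left_type = left["columns"][assoc["left_column"]]
        right_type = right["columns"][assoc["right_column"]]
        rows.append({
            "association_id": assoc["id"],
            "left_on_distribution_key": assoc["left_column"] == left["distribution_key"],
            "right_on_distribution_key": assoc["right_column"] == right["distribution_key"],
            "types_match": left_type == right_type,
        })
    return rows


# Hypothetical table design metadata
design = {
    "orders": {"distribution_key": "order_id", "columns": {"order_id": "bigint", "user_id": "bigint"}},
    "users": {"distribution_key": "user_id", "columns": {"user_id": "text"}},
}
e2_rows = prepare_association_rows(
    [{"id": "a1", "left_table": "orders", "left_column": "user_id",
      "right_table": "users", "right_column": "user_id"}],
    design,
)
```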

The functions of the report preparation layer are described below.

Having completed a large amount of manual optimization analysis, the authors have accumulated rich optimization experience and have summarized and sorted out the steps of optimization analysis and the influencing factors of concern. This content is consolidated into the report preparation layer E, which comprises four parts:

E1 table design

E2 SQL association

E3 SQL_WHERE

E4 system resources

By associating, aggregating and extracting from the SQL analysis layer D and the data source arrangement layer C, the content designed in the report preparation layer E is obtained, providing the final basis for the analysis result.

Significance of "Process 4: report preparation":

the design of the report preparation layer E is in effect a summary of the SQL performance analysis method for distributed database clusters;

this step covers the efficiency factors that need attention at each stage of performance analysis and gathers evidence for each of them, in preparation for generating the final user report in the next step;

the input of this step also includes the collected application information corresponding to the SQL and the machine-learned relationship between efficiency factor weights and application efficiency, so that the final report can not only identify the points that affect efficiency but also quantify the share of each point in the final efficiency.
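Quantifying the share of each point can be as simple as normalizing the weights of the factors that were actually found. The sketch below assumes that each efficiency factor has a non-negative learned weight; the factor names, the weights and the function `impact_shares` are hypothetical.

```python
from typing import Dict


def impact_shares(flagged: Dict[str, bool], weights: Dict[str, float]) -> Dict[str, float]:
    """Return each flagged factor's share of the total weight of flagged factors."""
    active = {name: weights.get(name, 0.0) for name, hit in flagged.items() if hit}
    total = sum(active.values())
    if total == 0:
        return {name: 0.0 for name in active}
    return {name: round(weight / total, 3) for name, weight in active.items()}


shares = impact_shares(
    {"distribution_key_skew": True, "missing_statistics": True, "no_partition_pruning": False},
    {"distribution_key_skew": 3.0, "missing_statistics": 1.0, "no_partition_pruning": 2.0},
)
```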

The report layer takes the output data of the report preparation layer as input. As shown in Fig. 7, the output data of the report layer includes:

F1 table design, including: F1.1 distribution key, F1.2 partitioning, F1.3 compression, F1.4 column storage, F1.5 statistics collection, and F1.6 table expansion;

F2 SQL statement of the application, including: F2.1 complexity, F2.2 association, F2.3 filter conditions, F2.4 inefficient functions, F2.5 ordering, and F2.6 unit;

F3 system resources, including: F3.1 system resources;

F4 consolidated report, formed by combining F1, F2 and F3.

The function of the report layer is described below.

A text report is generated from the structured report preparation layer E stored in the database, giving those who need performance optimization a reference for subsequent optimization work.

Because many factors affect the final execution performance of SQL, the analysis and description must be made from different dimensions, and the preceding steps are likewise organized by category. The final report presents three relatively independent parts:

Table-design-related issues

Issues in the SQL itself

System resource issues

This process also includes secondary processing of the report preparation layer E.

The part finally submitted is the combination of the three parts above.

Significance of "Process 5: report generation":

the analysis result is presented in text form.
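A minimal sketch of such a text rendering follows. The section keys, the titles and the wording of the findings are placeholders invented for illustration; only the three-part layout follows the description above.

```python
from typing import Dict, List


def render_report(sections: Dict[str, List[str]]) -> str:
    """Render findings as a three-part plain-text report."""
    titles = [
        ("table_design", "Table design"),
        ("sql", "SQL statement"),
        ("system", "System resources"),
    ]
    lines = []
    for key, title in titles:
        findings = sections.get(key, [])
        lines.append(f"{title}:")
        if findings:
            lines.extend(f"  - {item}" for item in findings)
        else:
            lines.append("  - no issue found")
    return "\n".join(lines)


report = render_report({
    "table_design": ["statistics not collected on orders"],
    "sql": ["function applied to filter column"],
})
```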

In this embodiment, training makes the SQL performance analysis model more complete, more intelligent and more adaptable.

In some embodiments, the training step specifically includes:

S10: inputting a known inefficient SQL statement and the corresponding historical application information and historical system information into the input layer;

S20: generating efficiency factors that influence SQL statements, and their corresponding weights, according to the historical application information and the historical system information;

S30: setting the structured features output by the SQL analysis layer, parsing the known inefficient SQL statement to obtain its structured feature information, and associating the efficiency factors and their corresponding weights with the structured feature information of the known inefficient SQL statement;

S40: inputting a plurality of known inefficient SQL statements, together with the corresponding historical application information and historical system information, to train the SQL analysis layer.

In a specific embodiment, as shown in Fig. 8, the training step is process 6 (index analysis) in Fig. 2.

The G application information includes:

G1 relationship information: G1.1 application identification, G1.2 index identification, G1.3 job identification, G1.4 matching degree, G1.5 efficiency factor, and G1.6 efficiency information;

G2 weight information: G2.1 efficiency factor, G2.2 weight value, and G2.3 adjustment coefficient.

1. Using the G application information generated by "Process 6: index analysis", the performance of the SQL used by a large number of applications is analyzed and fed back through machine learning, so that the weight of each efficiency factor can be obtained, providing an accurate basis for predicting the efficiency of new applications.

2. Significance of "Process 6: index analysis":

The relationship between applications such as indexes and reports and the underlying SQL is obtained. The analysis of SQL can therefore be tied to the indexes and reports, and the weights of the SQL performance factors can be calculated from feedback on the access efficiency of those indexes and reports.
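The application does not name the learning algorithm used to obtain the weights. Purely as a stand-in, the sketch below fits one weight per efficiency factor by least squares with batch gradient descent, treating each historical statement as a row of factor indicators and an observed slowdown. The function `fit_weights`, the learning rate, the epoch count and the toy data are all assumptions.

```python
from typing import List, Sequence


def fit_weights(
    samples: Sequence[Sequence[float]],
    slowdowns: Sequence[float],
    learning_rate: float = 0.1,
    epochs: int = 2000,
) -> List[float]:
    """Least-squares fit of one weight per efficiency factor (no bias term)."""
    n_factors = len(samples[0])
    weights = [0.0] * n_factors
    for _ in range(epochs):
        gradient = [0.0] * n_factors
        for row, target in zip(samples, slowdowns):
            # Prediction error for this historical statement
            error = sum(w * x for w, x in zip(weights, row)) - target
            for i, x in enumerate(row):
                gradient[i] += 2.0 * error * x / len(samples)
        weights = [w - learning_rate * g for w, g in zip(weights, gradient)]
    return weights


# Toy history: factor 1 costs 2 units, factor 2 costs 5 units (made-up numbers)
history = [[1, 0], [0, 1], [1, 1], [0, 0]]
observed = [2.0, 5.0, 7.0, 0.0]
learned = fit_weights(history, observed)
```

With this toy data the fitted weights approach 2 and 5, the per-factor costs used to generate it. Real feedback would be noisy, and the weights would then only approximate the relative influence of each factor.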

It can be appreciated that the present application has the following beneficial effects:

1. It frees up manpower and improves efficiency. An inefficient SQL statement can be analyzed and diagnosed quickly, and the analysis is comprehensive: it draws on the database system, the principles of distributed databases, the historical access patterns of report and index application systems, SQL writing experience and so on, rather than looking only at how the SQL statement is written;

2. It lowers the threshold. Optimizing the runtime performance of a distributed database requires a good understanding of distributed database principles and demands that the analyst have rich experience in database management and application development, so the threshold is high. With the method described in this application, once sufficient multi-source information has been collected, the system produces a detailed and accurate diagnosis report with almost no threshold;

3. Batch analysis can be performed on the collected inefficient SQL statements, so that defects in database table design and SQL writing can be discovered quickly and proactively, and these findings can in turn guide and improve design and development specifications. This is of practical value where there are many developers and designers of uneven skill;

4. By tracking and analyzing the running results of a large number of report, index and SQL applications, the method can guide database administrators in tuning system parameter configuration and allocating system resources, and can guide operation, maintenance and management work. The larger the cluster, the more users and the more applications, the greater the practical significance;

5. Besides post-hoc management after an application goes online, as described above, the method can also be used for pre-checking before an application goes online, so that possible problems are found in advance.

Based on the same inventive concept, another embodiment of the present application further provides an inefficient SQL statement analysis system based on a distributed database cluster, as shown in fig. 10, including:

an acquisition module 1, which acquires analysis basis information and an inefficient SQL statement of the distributed database cluster, wherein the analysis basis information comprises application information and system information;

and a performance analysis module 2, which inputs the inefficient SQL statement and the analysis basis information into a preset SQL performance analysis model to obtain an inefficiency cause analysis result for the inefficient SQL statement.

For the same reason, in certain embodiments, the above system further comprises:

In a specific embodiment, the SQL performance analysis model comprises: an input layer, a data source arrangement layer, an SQL analysis layer, a performance analysis layer and an output layer;

and the output layer outputs the analysis result of the inefficiency reasons.

Based on the same inventive concept, in certain embodiments, the model training module comprises:

It can be understood that, by providing the SQL performance analysis model, an input SQL statement can be analyzed automatically. The model is trained on historical data; both the application data and the system data are known data, which serve as the basis for the performance results, so that the analysis results become progressively more reasonable through training. This frees up manpower, improves efficiency, and allows an inefficient SQL statement to be analyzed and diagnosed quickly.

The prior art is described in comparison.

FIG. 9 illustrates a prior art method and apparatus for structured query language SQL performance statistics, the method comprising: acquiring SQL type log information, wherein the log information comprises SQL sentences and execution performance information of the low-efficiency SQL sentences; analyzing the low-efficiency SQL sentences aiming at each piece of log information, and dividing the low-efficiency SQL sentences according to preset dividing characters to obtain divided SQL sentences; taking the divided SQL sentences as first index information of a performance statistics list, and storing the execution performance information of the SQL sentences in the current log information into the performance statistics list according to the first index information; and counting the performance data of the SQL sentences in the current log information according to the execution performance information of the SQL sentences in the current log information stored in the performance statistics list. According to the invention, the performance statistics of the SQL sentences is realized quickly and efficiently, and the SQL sentences with abnormal problems can be found accurately.

In the prior art, the SQL statement is segmented, and the segmented SQL statement is used as the first index information of the performance statistics list. The segmentation strategy is to iterate through the SQL statement according to preset separators, namely: "according to a predetermined search rule, search for each preset segmentation character in the SQL statement to be segmented; if a preset segmentation character is found, use it as a segmentation boundary and segment the SQL statement to be segmented, then take the SQL content before the found segmentation character as the SQL statement to be segmented in the next round", and repeat this process. This cannot be counted as a disadvantage, but it differs from the processing focus of the present invention. The invention uses the Python library sqlparse as its parsing tool, then further analyzes, processes, and classifies the tree-shaped result that sqlparse produces in order to obtain a clear structured result. The emphasis is on this subsequent classification and structuring. The parsing itself is done by the open-source sqlparse, whose tree representation describes an SQL statement more finely and completely. The conversion from the tree form to a structured, storable form is what characterizes the low-efficiency SQL statement analysis method based on a distributed database cluster of the invention, and it cannot be substituted.
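To make the "tree to structured, storable form" step concrete, the sketch below walks a small parse tree and classifies its leaves. It deliberately does not call sqlparse: the `Node` class is a hand-built, hypothetical stand-in for the tree a parser returns, and the node kinds (`table`, `column`, `predicate`) and the classification rule are assumptions, since the text does not spell out the actual classification scheme.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for one node of a parser's tree output."""
    kind: str                      # e.g. "statement", "table", "column"
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def flatten(node, path=()):
    """Yield one flat record per leaf, keeping the path from the root."""
    here = path + (node.kind,)
    if not node.children:
        yield {"path": "/".join(here), "kind": node.kind, "text": node.text}
        return
    for child in node.children:
        yield from flatten(child, here)


def classify(records):
    """Group leaf records by kind into a structured, storable mapping."""
    structured = {}
    for rec in records:
        structured.setdefault(rec["kind"], []).append(rec["text"])
    return structured


# Hand-built tree for: SELECT o.id FROM orders o WHERE o.amount > 100
tree = Node("statement", children=[
    Node("select", children=[Node("column", "o.id")]),
    Node("from", children=[Node("table", "orders")]),
    Node("where", children=[Node("predicate", "o.amount > 100")]),
])

records = list(flatten(tree))
structured = classify(records)
```

The flat `records` keep each leaf's position in the tree, while `structured` is the classified view that could be written to a table, one row per kind and value.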

2. The principle of judging the unknown from the known is different

One important function mentioned in the prior art is to "store the execution performance information of the SQL statement in the current log information into the performance statistics list according to the first index information; and compute statistics on the performance data of the SQL statement in the current log information from the execution performance information stored in the performance statistics list". Its aim is to obtain performance statistics for SQL statements quickly and efficiently and to locate SQL statements with abnormal behavior accurately. In essence, this process matches the segmentation result of a current SQL statement against the segmentation results of individual historical SQL statements, and takes the historical result that the current statement may match as the prediction.

This application direction of the prior art coincides with one of the application directions of the invention, but the principle is quite different.

Starting from the final applications, such as reports and indicators, the invention obtains the information covered by each application as the basis for the performance results: the users accessing the application (application users, not database users), IP addresses, access volume, start and stop times (note that these are the start and stop times of the application execution submitted by the user, not of the SQL run), the success/failure ratio, and so on. This execution and access information is fed back to the system of the invention and used as the basis for adjusting the weight that each efficiency factor has on the efficiency outcome, so that it takes part in the analysis of low-efficiency SQL. Adjusting the factor weights and tracking and feeding back the adjusted results are completed automatically by machine learning, so that the adjustment results tend toward reasonable values. Based on a reasonable set of efficiency factors and weights obtained from historical accesses, the system can predict the run-time behavior of an SQL statement that has not yet gone online, and can also predict the efficiency of the application corresponding to that SQL statement.
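The feedback loop described above, in which observed application behavior adjusts the weight of each efficiency factor, can be illustrated with a small online learner. The text does not disclose which learning algorithm is used, so logistic regression with per-record gradient updates is substituted here purely as an example. The factor names (`full_scan`, `data_skew`), the learning rate, and the feedback records are hypothetical.

```python
import math


def predict(weights, factors):
    """Estimated probability that a statement with these factors is slow."""
    score = weights.get("bias", 0.0) + sum(
        weights.get(name, 0.0) * value for name, value in factors.items()
    )
    return 1.0 / (1.0 + math.exp(-score))


def update(weights, factors, observed_slow, learning_rate=0.5):
    """Adjust the factor weights from one application feedback record."""
    error = (1.0 if observed_slow else 0.0) - predict(weights, factors)
    weights["bias"] = weights.get("bias", 0.0) + learning_rate * error
    for name, value in factors.items():
        weights[name] = weights.get(name, 0.0) + learning_rate * error * value
    return weights


def train(feedback, epochs=200):
    """Replay the historical feedback records several times."""
    weights = {}
    for _ in range(epochs):
        for factors, observed_slow in feedback:
            update(weights, factors, observed_slow)
    return weights


# Hypothetical history: slowness follows full scans, not data skew.
history = [
    ({"full_scan": 1.0, "data_skew": 0.0}, True),
    ({"full_scan": 1.0, "data_skew": 1.0}, True),
    ({"full_scan": 0.0, "data_skew": 0.0}, False),
    ({"full_scan": 0.0, "data_skew": 1.0}, False),
]
weights = train(history)

# Predict for a statement that has not yet gone online.
p_new = predict(weights, {"full_scan": 1.0, "data_skew": 0.0})
```

With this history the learned weight for `full_scan` dominates the one for `data_skew`, so a not-yet-online statement that implies a full scan is scored as likely inefficient. A production system would need far more factors, more data, and validation of the fitted weights.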

3. The degree of integration with the characteristics of a particular database type is different

The prior art is agnostic to the type of database and is concerned only with comparing one SQL statement against another. The invention, by contrast, is tightly coupled to the characteristics of distributed databases: the whole architecture and scheme incorporate years of experience in distributed database performance optimization, and the design builds that experience into the system in automated form.

The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer device, which may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

In a typical example, the computer apparatus includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the program to implement a method performed by a client as described above, or where the processor executes the program to implement a method performed by a server as described above.

Referring now to FIG. 11, there is illustrated a schematic diagram of a computer device 600 suitable for use in implementing embodiments of the present application.

As shown in FIG. 11, the computer device 600 includes a Central Processing Unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a Read-Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. The RAM 603 also stores the various programs and data required for the operation of the computer device 600. The CPU 601, ROM 602, and RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.

The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD), a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage section 608 as needed.

In particular, according to embodiments of the present invention, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present invention include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 609, and/or installed from the removable medium 611.

Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

For convenience of description, the above devices are described with their functions divided into various units. Of course, when the present application is implemented, the functions of the units may be implemented in one or more pieces of software and/or hardware.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

In this specification, the embodiments are described in a progressive manner; for parts that are identical or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, see the corresponding parts of the description of the method embodiments.

The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims

1. A low-efficiency SQL statement analysis method based on a distributed database cluster, comprising:

the SQL performance analysis model is obtained through training a plurality of known low-efficiency SQL sentences, corresponding historical application information and historical system information in the distributed database cluster;

the performance analysis model comprises an input layer, a data source arrangement layer, an SQL analysis layer and a performance analysis layer, wherein the SQL performance analysis model is trained by a plurality of known low-efficiency SQL sentences, corresponding historical application information and historical system information in the distributed database cluster, and comprises the following steps:

inputting a plurality of known low-efficiency SQL sentences, and corresponding historical application information and historical system information, and training the SQL analysis layer;

the data source arrangement layer extracts the analysis basis information and outputs each piece of structured information formed by classification according to a preset rule; and the performance analysis layer performs SQL performance analysis according to the characteristic information of the structural characteristics of the input low-efficiency SQL statement and each piece of structural information to obtain the low-efficiency reason analysis result.

2. The method of claim 1, further comprising:

establishing the SQL performance analysis model;

3. The method of claim 2, wherein the building the SQL performance analysis model comprises:

and the output layer outputs the analysis result of the inefficiency reasons.

4. A distributed database cluster-based low-efficiency SQL statement analysis system, comprising:

the performance analysis model comprises an input layer, a data source arrangement layer, an SQL analysis layer and a performance analysis layer, the system further comprises a model training module, and the model training module comprises:

the training unit is used for inputting a plurality of known low-efficiency SQL sentences, corresponding historical application information and historical system information and training the SQL analysis layer;

5. The low-efficiency SQL statement analysis system of claim 4, further comprising:

6. The low-efficiency SQL statement analysis system of claim 5, wherein the SQL performance analysis model comprises: an input layer, a data source arrangement layer, an SQL analysis layer, a performance analysis layer and an output layer;

and the output layer outputs the analysis result of the inefficiency reasons.

7. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of any of claims 1 to 3 when the program is executed.

8. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 3.