CN112612810A - Slow SQL statement identification method and system

Info

Publication number: CN112612810A
Application number: CN202011539849.8A
Authority: CN
Inventors: 王凯
Original assignee: Beike Technology Co Ltd
Current assignee: Beike Technology Co Ltd
Priority date: 2020-12-23
Filing date: 2020-12-23
Publication date: 2021-04-06

Abstract

The invention provides a method and a system for identifying slow SQL statements. The method comprises: first, acquiring characteristic information of an SQL statement to be identified; then calculating the similarity between the characteristic information of the SQL statement to be identified and the characteristic information of the historical slow SQL statements in a historical slow SQL statement library; and finally, identifying whether the SQL statement to be identified is a slow SQL statement according to the calculated similarity. Because the SQL statement to be identified is identified by calculating the similarity between its characteristic information and that of the historical slow SQL statements, no manual operation is required, the identification result is more accurate, labor is saved, and identification efficiency is improved. In addition, an SQL statement can be treated as the SQL statement to be identified before it goes online, so that system faults caused by putting slow SQL statements online can be avoided.

Description

Slow SQL statement identification method and system

Technical Field

The invention relates to the technical field of computer software, in particular to a slow SQL statement identification method and system.

Background

In software development, the slow SQL phenomenon may occur because of limits in developers' programming ability and frequent iteration of the business. The slow SQL phenomenon means that, after an application has been running a service for some time, some SQL statements occupy the CPU for a long time when executed because they frequently perform I/O operations on the database. For an application, a small number of slow SQL statements may not affect normal operation of a service, but once the number of slow SQL statements grows to a certain extent, execution of other service functions in the application may be affected, and a system failure may even result, making the system unavailable.

In the prior art, slow SQL statements in a system are generally identified only after the slow SQL phenomenon has occurred: a worker picks out the SQL statements whose execution time is greater than a threshold value. This requires the worker to have a comprehensive understanding of the system and places high demands on the worker. Moreover, this approach cannot prevent online system failures and may also lead to human misjudgment.

Disclosure of Invention

The invention provides a method and a system for identifying slow SQL statements, which are used for overcoming the defects in the prior art.

The invention provides a slow SQL statement identification method, which comprises the following steps:

acquiring characteristic information of an SQL statement to be identified;

calculating the similarity between the characteristic information of the SQL statement to be identified and the characteristic information of the historical slow SQL statement in the historical slow SQL statement library;

and identifying whether the SQL statement to be identified is a slow SQL statement or not based on the similarity.

According to the slow SQL statement identification method provided by the invention, the calculating of the similarity between the characteristic information of the SQL statement to be identified and the characteristic information of the historical slow SQL statement in the historical slow SQL statement library specifically comprises the following steps:

calculating the edit distance between the characteristic information of the SQL statement to be identified and the characteristic information of any historical slow SQL statement in the historical slow SQL statement library;

and taking the edit distance as the similarity between the characteristic information of the SQL statement to be identified and the characteristic information of any historical slow SQL statement;

or, calculating the simhash values of the characteristic information of the SQL statement to be identified and of the characteristic information of any historical slow SQL statement in the historical slow SQL statement library;

and determining a Hamming distance between the characteristic information of the SQL statement to be identified and the characteristic information of any historical slow SQL statement based on the calculated simhash values, and taking the Hamming distance as the similarity between the characteristic information of the SQL statement to be identified and the characteristic information of any historical slow SQL statement;

or, calculating the Jaccard similarity coefficient between the characteristic information of the SQL statement to be identified and the characteristic information of any historical slow SQL statement in the historical slow SQL statement library;

and taking the Jaccard similarity coefficient as the similarity between the characteristic information of the SQL statement to be identified and the characteristic information of any historical slow SQL statement.

According to the slow SQL statement identification method provided by the invention, the historical slow SQL statement library is determined by the following method:

acquiring the business information of the SQL statement to be identified;

and determining the historical slow SQL statement library based on the business information.

According to the slow SQL statement identification method provided by the invention, the historical slow SQL statement in the historical slow SQL statement library is determined by the following method:

acquiring a first type of SQL statements, whose execution time on the server side exceeds a time threshold, and a second type of SQL statements, whose execution time on the service side exceeds the time threshold;

and storing the first type and the second type of SQL statements into the corresponding historical slow SQL statement libraries according to business information, and deduplicating the historical slow SQL statements in the corresponding historical slow SQL statement libraries.

According to the slow SQL statement identification method provided by the invention, the identification of the SQL statement to be identified based on the similarity specifically comprises the following steps:

if it is determined that the similarity between the characteristic information of the SQL statement to be identified and the characteristic information of any historical slow SQL statement in the historical slow SQL statement library is greater than a preset threshold value, determining, as the identification result, that the SQL statement to be identified is a slow SQL statement.

The invention also provides a slow SQL statement identification system, which comprises a characteristic information acquisition module, a similarity calculation module and an identification module, wherein:

the characteristic information acquisition module is used for acquiring the characteristic information of the SQL statement to be identified;

the similarity calculation module is used for calculating the similarity between the characteristic information of the SQL statement to be identified and the characteristic information of the historical slow SQL statement in the historical slow SQL statement library;

and the identification module is used for identifying whether the SQL statement to be identified is a slow SQL statement or not based on the similarity.

The invention also provides an electronic device, which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor executes the computer program to realize the steps of any one of the slow SQL statement identification methods.

The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the slow SQL statement identification method according to any of the above.

The invention provides a method and a system for identifying slow SQL statements. The method first acquires the characteristic information of the SQL statement to be identified; then calculates the similarity between the characteristic information of the SQL statement to be identified and the characteristic information of the historical slow SQL statements in the historical slow SQL statement library; and finally identifies whether the SQL statement to be identified is a slow SQL statement according to the calculated similarity. Because the SQL statement to be identified is identified by calculating the similarity between its characteristic information and that of the historical slow SQL statements, no manual operation is required, the identification result is more accurate, labor is saved, and identification efficiency is improved. In addition, an SQL statement can be treated as the SQL statement to be identified before it goes online, so that system faults caused by putting slow SQL statements online can be avoided.

Drawings

In order to illustrate the technical solutions of the present invention or the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show some embodiments of the present invention, and those skilled in the art can obtain other drawings from these drawings without creative effort.

FIG. 1 is a schematic flow chart of a slow SQL statement identification method provided by the present invention;

FIG. 2 is a flow chart diagram of a slow SQL statement identification method provided by the present invention;

FIG. 3 is a schematic structural diagram of a slow SQL statement identification system provided by the present invention;

FIG. 4 is a schematic structural diagram of an electronic device provided by the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In the prior art, slow SQL statements in a system are generally identified only after the slow SQL phenomenon has occurred: a worker picks out the SQL statements whose execution time is greater than a threshold value. This requires the worker to have a comprehensive understanding of the system, and a worker with little experience cannot accurately determine the slow SQL statements. The approach therefore requires a large investment of labor and places high demands on the workers' ability. In addition, it cannot prevent online system failures and may also lead to human misjudgment. The embodiment of the invention therefore provides a slow SQL statement identification method.

Fig. 1 is a schematic flow chart of a slow SQL statement identification method provided in an embodiment of the present invention. As shown in fig. 1, the method includes:

S1, acquiring the characteristic information of the SQL statement to be identified;

S2, calculating the similarity between the characteristic information of the SQL statement to be identified and the characteristic information of the historical slow SQL statement in the historical slow SQL statement library;

S3, based on the similarity, identifying whether the SQL statement to be identified is a slow SQL statement.

Specifically, the slow SQL statement identification method provided in the embodiment of the present invention is executed by a server, which may be a local server or a cloud server; the local server may be a computer. This is not specifically limited in the embodiment of the present invention.

Step S1 is performed first. The SQL statement to be identified may be an SQL statement that is about to go online, that is, an SQL statement about to enter the production environment or about to be used by users. SQL, the Structured Query Language, is a database query and programming language used to access data and to query, update, and manage relational database systems; an SQL statement is a statement in this language that operates on a database system.

The feature information of an SQL statement, that is, the fingerprint of the SQL statement, is identification information that characterizes the SQL statement. For example, if SQL statement 1 is "30 yuan ≦ price ≦ 100 yuan", which queries target objects priced between 30 yuan and 100 yuan, the feature information of SQL statement 1 is "A ≦ price ≦ B". If SQL statement 2 is "101 yuan ≦ price ≦ 200 yuan", which queries target objects priced between 101 yuan and 200 yuan, its feature information is the same as that of SQL statement 1, namely "A ≦ price ≦ B". That is, the feature information of an SQL statement represents a class of SQL statements: it is the fixed query format obtained by ignoring the variable data values. Accordingly, the feature information of the SQL statement to be identified can be extracted from the SQL statement to be identified.
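As an illustration of the fingerprint idea described above, the following is a minimal sketch rather than the method of the embodiment. It assumes that the variable data are quoted strings and numeric literals and replaces them with a "?" placeholder; the function name sql_fingerprint and the table name goods are hypothetical.

```python
import re


def sql_fingerprint(sql: str) -> str:
    """Return a fixed query format by replacing literal values with '?'.

    Assumption: only quoted strings and numeric literals count as
    variable data; a real system would likely use an SQL parser.
    """
    text = sql.strip().lower()
    # Replace single-quoted string literals ('' is an escaped quote).
    text = re.sub(r"'(?:[^']|'')*'", "?", text)
    # Replace standalone numeric literals; digits inside identifiers
    # such as "t1" are left alone because of the word boundaries.
    text = re.sub(r"\b\d+(?:\.\d+)?\b", "?", text)
    # Collapse runs of whitespace so layout differences do not matter.
    return re.sub(r"\s+", " ", text)


# Two statements that differ only in their price bounds (hypothetical table).
fp1 = sql_fingerprint("SELECT * FROM goods WHERE price >= 30 AND price <= 100")
fp2 = sql_fingerprint("SELECT * FROM goods WHERE price >= 101 AND price <= 200")
```

Both calls yield the same fingerprint, which mirrors how SQL statement 1 and SQL statement 2 share the feature information "A ≦ price ≦ B".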

Then, step S2 is executed. The similarity between the feature information of the SQL statement to be identified and the feature information of the historical slow SQL statements in the historical slow SQL statement library is calculated. The historical slow SQL statement library used here may include all slow SQL statements collected from the database system, or may include only the slow SQL statements that match the business information of the SQL statement to be identified, which is not specifically limited in the embodiment of the present invention. The business information of the SQL statement to be identified refers to the business information corresponding to that statement, and may include information such as a database service address, a port number, a database name, and a data table name.

The historical slow SQL statement library stores the feature information of a plurality of historical slow SQL statements, so in the embodiment of the present invention the similarity between the feature information of the SQL statement to be identified and the feature information of each historical slow SQL statement needs to be calculated. The similarity may be calculated in various ways, for example by calculating the Euclidean distance between the two and taking the obtained Euclidean distance as the similarity between them; other similarity calculation methods may also be selected, which is not specifically limited in the embodiment of the present invention.

Finally, step S3 is performed. Whether the SQL statement to be identified is a slow SQL statement is identified according to the similarity, obtained in step S2, between the feature information of the SQL statement to be identified and the feature information of each historical slow SQL statement in the historical slow SQL statement library. Specifically, it may be judged whether the historical slow SQL statement library contains a historical slow SQL statement whose feature information has a similarity with the feature information of the SQL statement to be identified that reaches a preset threshold value. If such a statement exists, the identification result is that the SQL statement to be identified is a slow SQL statement; otherwise, the identification result is that it is not a slow SQL statement.
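Steps S2 and S3 can be sketched as a loop over the library followed by a threshold test. This is a hedged illustration only: the similarity function token_overlap is a stand-in chosen for brevity, and the threshold of 0.8 and all names are assumptions that do not come from the embodiment.

```python
from typing import Callable, Iterable


def token_overlap(a: str, b: str) -> float:
    """Placeholder similarity: share of distinct whitespace-separated
    tokens that the two fingerprints have in common (0.0 to 1.0)."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 1.0


def is_slow_sql(
    fingerprint: str,
    slow_library: Iterable[str],
    similarity: Callable[[str, str], float] = token_overlap,
    threshold: float = 0.8,  # assumed value; the text only says "preset"
) -> bool:
    """S2 + S3: compare against every historical slow fingerprint and
    report slow if any similarity reaches the threshold."""
    return any(similarity(fingerprint, known) >= threshold for known in slow_library)


library = ["select * from goods where price >= ? and price <= ?"]
hit = is_slow_sql("select * from goods where price >= ? and price <= ?", library)
miss = is_slow_sql("update users set name = ? where id = ?", library)
```

Passing the similarity function as a parameter keeps the threshold test independent of the chosen measure, which fits the statement that other similarity calculation methods may also be selected.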

The slow SQL statement identification method provided by the embodiment of the invention first acquires the characteristic information of the SQL statement to be identified; then calculates the similarity between the characteristic information of the SQL statement to be identified and the characteristic information of the historical slow SQL statements in the historical slow SQL statement library; and finally identifies whether the SQL statement to be identified is a slow SQL statement according to the calculated similarity. Because the SQL statement to be identified is identified by calculating the similarity between its characteristic information and that of the historical slow SQL statements, no manual operation is required, the identification result is more accurate, labor is saved, and identification efficiency is improved. In addition, an SQL statement can be treated as the SQL statement to be identified before it goes online, so that system faults caused by putting slow SQL statements online can be avoided.

On the basis of the foregoing embodiment, in the slow SQL statement identification method provided in the embodiment of the present invention, calculating the similarity between the feature information of the SQL statement to be identified and the feature information of the historical slow SQL statement in the historical slow SQL statement library specifically includes: calculating the edit distance between the feature information of the SQL statement to be identified and the feature information of any historical slow SQL statement in the historical slow SQL statement library, and taking the edit distance as the similarity between the two.

Specifically, in the embodiment of the present invention, when calculating the similarity, the edit distance between the feature information of the SQL statement to be identified and the feature information of any historical slow SQL statement in the historical slow SQL statement library may be calculated first. The edit distance between two strings is the minimum number of edit operations required to convert one string into the other. Edit operations include replacing one character with another, inserting one character, and deleting one character.

Therefore, the feature information of the SQL statement to be identified and the feature information of any historical slow SQL statement in the historical slow SQL statement library are first converted into character strings, giving a feature string s[1, …, m] of the SQL statement to be identified and a feature string t[1, …, n] of the historical slow SQL statement. The string s contains m characters, and the string t contains n characters. Suppose that converting the string s into the string t takes at least d[m, n] edit operations, that is, d[m, n] is the minimum number of edit operations. When m equals 0, the string s is empty and the minimum number of edit operations is d[0, n] = n, meaning that n characters are added to convert the string s into the string t. When n equals 0, the string t is empty and the minimum number of edit operations is d[m, 0] = m, meaning that m characters are deleted to convert the string s into the string t.

In general, to convert the string s into the string t with the minimum number of edit operations (insertions, deletions, or replacements), a first round of conversion is completed with the minimum number of edit operations, after which the result can be converted into the string t by a second round consisting of one further edit operation or of no edit operation at all. The first round of conversion falls into the following three cases:

1) the string s[1, …, m] can be converted into the string t[1, …, n-1] in k edit operations;

2) the string s[1, …, m-1] can be converted into the string t[1, …, n] in k edit operations;

3) the string s[1, …, m-1] can be converted into the string t[1, …, n-1] in k edit operations.

For case 1), it is only necessary to append the character t[n] at the end to complete the match, so k+1 operations are needed in total;

for case 2), it is only necessary to remove the character s[m] and then perform the k edit operations, so k+1 operations are needed in total;

for case 3), it is only necessary to replace the character s[m] with the character t[n] at the end so that s[1, …, m] = t[1, …, n], so k+1 operations are also needed in total.

In particular, if in case 3 the character s [ m ] is exactly equal to the character t [ n ], this conversion process can be done using only k editing operations.

Finally, to ensure that the number of edit operations obtained is always the smallest, the least costly of the above three cases is selected as the minimum number of edit operations required to convert the string s[1, …, m] into the string t[1, …, n].
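The three cases and the two boundary conditions above can be summarized as a single recurrence. The following is a restatement in the notation of the text, where the symbol c(i, j) is introduced here only for illustration and equals 0 when s[i] = t[j] and 1 otherwise:

```latex
d[i, 0] = i, \qquad d[0, j] = j,
\qquad
d[i, j] = \min\bigl(\, d[i-1, j] + 1,\;\; d[i, j-1] + 1,\;\; d[i-1, j-1] + c(i, j) \,\bigr),
\qquad
c(i, j) =
\begin{cases}
0, & s[i] = t[j] \\
1, & \text{otherwise}
\end{cases}
```

Here d[i, j-1] + 1 corresponds to case 1) (append t[j]), d[i-1, j] + 1 to case 2) (remove s[i]), and d[i-1, j-1] + c(i, j) to case 3) (replace, or no operation when the characters are equal). The edit distance between s and t is d[m, n].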

Based on this, in the embodiment of the present invention, the first step is as follows: denote the string s[1, …, m] as str1 and the string t[1, …, n] as str2; when the length of str1 or str2 is 0, return the length of the other string. For example:

if (str1.length == 0) return str2.length;

if (str2.length == 0) return str1.length;

The second step: initialize a matrix d of size (m+1) × (n+1), with the values of the first row and the first column increasing from 0. The values in the matrix d represent numbers of edit operations.

The third step: scan str1 and str2. If str1[i] = str2[j], temp is recorded as 0; otherwise temp is recorded as 1. Then d[i, j] is assigned the minimum of d[i-1, j] + 1, d[i, j-1] + 1, and d[i-1, j-1] + temp.

The fourth step: after scanning, return the last value of the matrix, d[m][n], which is the edit distance between str1 and str2.
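The four steps above can be sketched in Python as follows. This is a minimal sketch of the standard dynamic-programming edit distance, not code taken from the embodiment; in it, rows of the matrix index str1 and columns index str2, so the result is read from the bottom-right cell.

```python
def edit_distance(str1: str, str2: str) -> int:
    """Minimum number of insertions, deletions and replacements
    needed to turn str1 into str2."""
    m, n = len(str1), len(str2)
    # Step 1: if either string is empty, the answer is the other length.
    if m == 0:
        return n
    if n == 0:
        return m
    # Step 2: matrix with first row and first column counting up from 0.
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    # Step 3: fill each cell from its three neighbours.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            temp = 0 if str1[i - 1] == str2[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete a character of str1
                d[i][j - 1] + 1,         # insert a character of str2
                d[i - 1][j - 1] + temp,  # replace, or keep if equal
            )
    # Step 4: the last cell holds the edit distance.
    return d[m][n]


distance = edit_distance("kitten", "sitting")
```

The strings "kitten" and "sitting" are a common textbook pair, used here only as a check; in the method the inputs would be the two feature strings.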

And then, taking the calculated editing distance as the similarity between the characteristic information of the SQL statement to be identified and the characteristic information of any historical slow SQL statement.

It should be noted that, at this time, the preset threshold value adopted when identifying the to-be-identified SQL statement based on the similarity is the minimum similarity which is set when the edit distance is used as the similarity and can identify a certain SQL statement as a slow SQL statement. That is, when the similarity calculation methods are different, the preset thresholds that are set do not affect each other, and may be the same or different.

According to the slow SQL statement identification method provided by the embodiment of the invention, an edit distance algorithm is introduced, and the edit distance between the characteristic information of the SQL statement to be identified and the characteristic information of any historical slow SQL statement is used as the similarity between the characteristic information of the SQL statement to be identified and the characteristic information of any historical slow SQL statement, so that the obtained similarity is more accurate.

Specifically, in the embodiment of the present invention, when calculating the similarity, the simhash values of the feature information of the SQL statement to be identified and of the feature information of any historical slow SQL statement in the historical slow SQL statement library may be calculated respectively. The simhash value can be determined by a simhash algorithm, which converts a document into a 64-bit signature, that is, the simhash value.

The simhash algorithm mainly comprises five steps, which are respectively as follows: word segmentation processing, Hash calculation, weighting calculation, accumulation calculation and binarization calculation. The following description is only given by taking the example of calculating the simhash value of the feature information of the SQL statement to be identified by using the simhash algorithm, and the method of calculating the simhash value of the feature information of any historical slow SQL statement by using the simhash algorithm is consistent with the method of calculating the simhash value of the feature information of the SQL statement to be identified, which is not described herein in the embodiments of the present invention.

Word segmentation processing: the feature information of the SQL statement to be identified is usually presented in text form, so it can be segmented into a plurality of feature words of the SQL statement to be identified, and all the feature words form a feature word sequence. On this basis, the feature word sequence can be denoised, that is, noise words in the feature word sequence are removed. Each word in the feature word sequence may also be given a weight. In the embodiment of the invention, the weights may be divided into 5 levels with values 1 to 5: the weight of the subject may be 4, the weight of the predicate may be 5, the weight of the quantifier may be 3, the weight of the object may be 1, and the weight of the adjective may be 2. For example, segmenting the sentence "I am a big busy person" yields: I (4), am (5), a (3), big (2), busy person (1).

Hash calculation: the hash value of each feature word is calculated by a hash algorithm. For example, the hash value of the word "I" calculated by the hash algorithm may be 100101.

Weighting calculation: the result of the hash calculation is turned into a weighted digit string according to the weight of the feature word. Specifically, each bit with value 1 in the hash value contributes the positive weight, and each bit with value 0 contributes the negative weight. For example, the hash value corresponding to "I" is 100101, which gives the weighted digit string "4 -4 -4 4 -4 4"; the hash value corresponding to "am" is 101011, which gives the weighted digit string "5 -5 5 -5 5 5".

Accumulation calculation: the weighted digit strings of the feature words obtained by the weighting calculation are accumulated position by position. For example, "I am" gives, after accumulation: "9 -9 1 -1 1 9".

Binarization calculation: the "9 -9 1 -1 1 9" obtained by the accumulation calculation is converted into a 0-1 string to form the final simhash signature, that is, the simhash value. The conversion rule is: if the value at a position is greater than 0, it is recorded as 1; otherwise, it is recorded as 0. The result of the binarization calculation for "I am", that is, its simhash value, is therefore "101011".

The specific process of the simhash algorithm adopted in the embodiment of the invention is as follows:

The feature information of the SQL statement to be identified is to be converted into an f-dimensional vector V, which is initialized to 0; an f-bit binary number S is also initialized to 0.

For each feature word of the SQL statement to be identified, an f-bit signature b is generated for the feature word by a hash algorithm.

For i = 1 to f:

if the i-th bit of the signature b is 1, the weight of the feature word is added to the i-th element of the vector V; otherwise, the weight of the feature word is subtracted from the i-th element of the vector V.

After all feature words have been processed, if the i-th element of the vector V is greater than 0, the i-th bit of the binary number S is set to 1; otherwise, it is set to 0. Finally, the binary number S is output as the signature, which is the simhash value of the feature information of the SQL statement to be identified.
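The weighting, accumulation and binarization steps can be sketched as follows. This is a minimal illustration under stated assumptions: the per-word signatures and weights are supplied by the caller as bit strings and integers, and the function name simhash_from_signatures is hypothetical. The usage reproduces the two-word example from the text with f = 6.

```python
from typing import Iterable, Tuple


def simhash_from_signatures(words: Iterable[Tuple[str, int]], f: int) -> str:
    """Combine per-word f-bit signatures (as '0'/'1' strings) and weights
    into one f-bit simhash value, returned as a '0'/'1' string."""
    v = [0] * f  # the f-dimensional vector V, initialized to 0
    for signature, weight in words:
        if len(signature) != f:
            raise ValueError("every signature must have exactly f bits")
        for i, bit in enumerate(signature):
            # A 1 bit adds the weight, a 0 bit subtracts it.
            v[i] += weight if bit == "1" else -weight
    # Binarize: positive entries become 1, all others become 0.
    return "".join("1" if value > 0 else "0" for value in v)


# Worked example from the text: "I" = 100101 (weight 4), "am" = 101011 (weight 5).
value = simhash_from_signatures([("100101", 4), ("101011", 5)], f=6)
```

Keeping the hash function outside this routine is a design choice made for the sketch: any hash that yields an f-bit signature per feature word could be plugged in, since the text does not fix a particular one.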

And then, according to the calculated simhash values, determining a hamming distance between the feature information of the SQL statement to be recognized and the feature information of any historical slow SQL statement, specifically calculating the hamming distance between the two simhash values, and taking the hamming distance as the similarity between the feature information of the SQL statement to be recognized and the feature information of any historical slow SQL statement.
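The Hamming distance between two simhash values is the number of bit positions in which they differ. A minimal sketch, assuming the simhash values are held as Python integers (the function name is hypothetical):

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two non-negative integers."""
    # XOR leaves a 1 exactly where the two values disagree.
    return bin(a ^ b).count("1")


# The two 6-bit values from the simhash example differ in three positions.
bits_apart = hamming_distance(0b101011, 0b100101)
```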

It should be noted that, at this time, the preset threshold value adopted when identifying the SQL statement to be identified based on the similarity is the minimum similarity that can be set when the hamming distance is used as the similarity and that can identify a certain SQL statement as a slow SQL statement. That is, when the similarity calculation methods are different, the preset thresholds that are set do not affect each other, and may be the same or different.

According to the slow SQL statement identification method provided by the embodiment of the invention, the simhash algorithm is introduced to calculate the simhash value, and the hamming distance between the two simhash values is used as the similarity between the two simhash values, so that the obtained similarity is more accurate. Moreover, the simhash algorithm can reduce the dimension of the data, so that the calculation amount during the similarity calculation can be reduced and the calculation efficiency is improved.

Specifically, in the embodiment of the present invention, when calculating the similarity, the jaccard similarity coefficient between the feature information of the SQL statement to be identified and the feature information of any historical slow SQL statement in the historical slow SQL statement library may be calculated respectively. The determination of the jacqard similarity coefficient may be achieved by a jacqard similarity coefficient algorithm, which may be understood as the proportion of the intersection elements of the two sets a and B in the union, referred to as the jacqard similarity coefficients of the two sets, and denoted by the symbol J (a, B). Then there are:

the Jacard similarity coefficient algorithm mainly comprises four steps which are respectively as follows: word segmentation processing, intersection calculation, union calculation and coefficient calculation.

Word segmentation processing: the characteristic information of the SQL sentence to be recognized is usually presented in a text form, so that the characteristic information of the SQL sentence to be recognized can be subjected to word segmentation processing to obtain a plurality of characteristic words of the SQL sentence to be recognized, and a first word set is formed. Similarly, the feature information of any historical slow SQL statement is usually presented in the form of text, so that the feature information of any historical slow SQL statement can be subjected to word segmentation to obtain a plurality of feature words of any historical slow SQL statement, and a second word set is formed. In the embodiment of the present invention, the code "select? The feature information of the SQL statement to be identified and the feature words in the feature information of any historical slow SQL statement are extracted, which is not specifically limited in the embodiment of the present invention.

Intersection calculation: count the feature words that appear in both the first word set and the second word set.

Union calculation: count all feature words in the first word set and the second word set after deduplication.

Coefficient calculation: divide the size of the intersection by the size of the union to obtain the Jaccard similarity coefficient between the feature information of the SQL statement to be identified and the feature information of the historical slow SQL statement.
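The four steps above can be sketched as follows. This is a minimal illustration, assuming the feature information is a text string and that word segmentation simply splits on anything other than letters, digits, underscores and the "?" placeholder; the embodiment does not fix a segmentation rule, and the function names are hypothetical.

```python
import re


def feature_words(feature_info):
    """Word segmentation: split feature text into a set of feature words."""
    # Assumption: lowercase the text and keep runs of word characters and '?'.
    return set(re.findall(r"[A-Za-z0-9_?]+", feature_info.lower()))


def jaccard(first_info, second_info):
    """Jaccard similarity coefficient J(A, B) = |A ∩ B| / |A ∪ B|."""
    first = feature_words(first_info)    # first word set
    second = feature_words(second_info)  # second word set
    intersection = first & second        # words present in both sets
    union = first | second               # all words, deduplicated
    if not union:
        # Assumption: two empty word sets are treated as having no similarity.
        return 0.0
    return len(intersection) / len(union)
```

For example, `jaccard("select id from orders where user_id = ?", "select name from orders where user_id = ?")` shares 6 of 8 distinct feature words and evaluates to 0.75.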

According to the slow SQL statement identification method provided by the embodiment of the invention, the Jaccard similarity coefficient algorithm is introduced, and the Jaccard similarity coefficient between the feature information of the SQL statement to be identified and the feature information of any historical slow SQL statement is used as the similarity between the two, so that the obtained similarity is more accurate. Moreover, because the Jaccard similarity coefficient is simple to calculate, adopting the Jaccard similarity coefficient algorithm reduces the amount of calculation during the similarity calculation and improves the calculation efficiency.

On the basis of the foregoing embodiment, in the slow SQL statement identification method provided in the embodiment of the present invention, the historical slow SQL statement library is determined specifically by the following method:

acquiring the business information of the SQL statement to be identified;

Specifically, in the embodiment of the present invention, the database system includes a plurality of ports, a plurality of databases are provided under each port, a plurality of data tables are stored under each database, each data table corresponds to a specific service, and the function of querying data is implemented by SQL statements, so that each data table corresponds to one historical slow SQL statement library, and all slow SQL statements appearing in the corresponding data table are stored in the historical slow SQL statement library.

Therefore, the historical slow SQL statement library in the embodiment of the present invention may refer to a historical slow SQL statement library corresponding to each data table in the database system. The service information of the SQL statement to be identified may be obtained first, where the service information may be carried by the SQL statement to be identified, and specifically may include a database service address, a port number, a database name, a data table name, and the like. And obtaining the historical slow SQL sentence library corresponding to the SQL sentence to be identified through the routing of the service information.
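The routing from service information to a per-table library can be sketched as a keyed lookup. This is only an illustration: the `ServiceInfo` type, the `find_library` helper and the dictionary-based registry are hypothetical, and the embodiment does not prescribe how the libraries are stored.

```python
from typing import NamedTuple


class ServiceInfo(NamedTuple):
    """Service information carried by an SQL statement (hypothetical type)."""
    host: str      # database service address
    port: int      # port number
    database: str  # database name
    table: str     # data table name


def find_library(libraries, info):
    """Route to the historical slow SQL statement library of one data table."""
    # Assumption: a table with no recorded slow statements has an empty library.
    return libraries.get(info, [])
```

With `libraries = {ServiceInfo("Host2", 3306, "database1", "table1"): ["historical slow SQL1"]}` (the port value is a made-up example), looking up the same four fields returns that table's library, and any other table yields an empty list.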

In the embodiment of the invention, a specific method for determining the historical slow SQL statement library through the service information of the SQL statement to be identified is provided, so that the historical slow SQL statement library corresponds only to the data table of the SQL statement to be identified. This reduces the number of historical slow SQL statements in the library that have to be compared, so that whether the SQL statement to be identified is a slow SQL statement can be determined more easily, and the identification efficiency is improved.

On the basis of the foregoing embodiment, in the slow SQL statement identification method provided in the embodiment of the present invention, the historical slow SQL statement in the historical slow SQL statement library is specifically determined by the following method:

Specifically, the first type of SQL statements, whose execution time on the server side exceeds a time threshold, may be obtained first. They may be obtained through the MySQL parameters configured on the server side, and the time threshold may be set as needed, which is not specifically limited in the embodiment of the present invention. The first type of SQL statements are the historical slow SQL statements on the server side. Meanwhile, the second type of SQL statements, whose execution time on the business side exceeds the time threshold, are also obtained: the execution time of SQL statements on the business side can be monitored through code, and the SQL statements whose execution time exceeds the time threshold are extracted as the second type of SQL statements. The second type of SQL statements are the historical slow SQL statements on the business side.
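Business-side monitoring of execution time can be sketched with a small timing wrapper. The class name, the `execute` callable and the injectable clock are all hypothetical; the embodiment only states that execution time is monitored through code. On the server side, the source does not name the MySQL parameters it relies on; MySQL's slow query log settings (`slow_query_log`, `long_query_time`) are one plausible candidate.

```python
import time


class SlowStatementMonitor:
    """Records SQL statements whose execution time exceeds a threshold."""

    def __init__(self, threshold_seconds, clock=time.perf_counter):
        self.threshold_seconds = threshold_seconds
        self.clock = clock          # injectable so the monitor can be tested
        self.slow_statements = []   # (sql, elapsed_seconds) pairs

    def run(self, sql, execute):
        """Run `execute(sql)` and record the statement if it was too slow."""
        start = self.clock()
        try:
            return execute(sql)
        finally:
            # Measure even when the statement raises an exception.
            elapsed = self.clock() - start
            if elapsed > self.threshold_seconds:
                self.slow_statements.append((sql, elapsed))
```

In use, `execute` would wrap whatever database driver call the application already makes; the statements collected in `slow_statements` correspond to the second type of SQL statements.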

Then, the first type of SQL statements and the second type of SQL statements are stored into the corresponding historical slow SQL statement libraries according to their service information, that is, SQL statements of both types that correspond to the same data table are stored into the historical slow SQL statement library corresponding to that data table. Deduplication processing is then performed on the historical slow SQL statements in each library. Each historical slow SQL statement stored in the historical slow SQL statement library has two attributes: one is its feature information, and the other is the encryption information obtained by MD5 encryption. The deduplication processing may therefore compare the encryption information of every two historical slow SQL statements; if the encryption information is the same, the two historical slow SQL statements are considered the same and one of them is deleted.

In the embodiment of the present invention, the storage process and the deduplication process may be combined: each time a new historical slow SQL statement is obtained, its encryption information is computed through the MD5 encryption algorithm and compared with the encryption information of the historical slow SQL statements already in the historical slow SQL statement library corresponding to the service information of the new statement. If a statement with the same encryption information already exists, the new historical slow SQL statement is not added to the library; otherwise, it is added. This ensures that no two historical slow SQL statements in the historical slow SQL statement library have the same encryption information.
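The combined store-and-deduplicate step can be sketched as a library keyed by MD5 digest. This is a minimal illustration: the class name is hypothetical, and it assumes the MD5 value is computed over the statement's feature information, which the embodiment does not state explicitly.

```python
import hashlib


class SlowSqlLibrary:
    """Historical slow SQL statement library for one data table."""

    def __init__(self):
        # Maps MD5 digest (the "encryption information") to feature information.
        self._by_digest = {}

    def add(self, feature_info):
        """Add a statement unless one with the same MD5 digest is stored."""
        # Assumption: the digest is taken over the feature information text.
        digest = hashlib.md5(feature_info.encode("utf-8")).hexdigest()
        if digest in self._by_digest:
            return False  # duplicate: not added
        self._by_digest[digest] = feature_info
        return True

    def __len__(self):
        return len(self._by_digest)
```

Keying the store by digest makes the duplicate check a single dictionary lookup instead of a pairwise comparison of all stored statements.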

The slow SQL statement identification method provided by the embodiment of the invention obtains historical slow SQL statements from both the server side and the business side, stores them and removes duplicates, so that the historical slow SQL statement library contains enough samples without duplication, which helps ensure the accuracy of the identification result.

On the basis of the foregoing embodiment, the slow SQL statement identification method provided in the embodiment of the present invention identifies whether the to-be-identified SQL statement is a slow SQL statement based on the similarity, and specifically includes:

Specifically, in the embodiment of the present invention, when identifying the SQL statement to be identified, it may be determined whether the similarity between its feature information and the feature information of each historical slow SQL statement in the historical slow SQL statement library is greater than a preset threshold. If at least one historical slow SQL statement in the library has a similarity greater than the preset threshold, that historical slow SQL statement and the SQL statement to be identified are similar, and the SQL statement to be identified may be regarded as a slow SQL statement. If no historical slow SQL statement in the library has a similarity greater than the preset threshold, that is, the similarity for every historical slow SQL statement in the library is less than or equal to the preset threshold, then the library contains no historical slow SQL statement similar to the SQL statement to be identified, and the SQL statement to be identified may be regarded as not being a slow SQL statement.
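The decision rule described above can be sketched as a filter over the library. The function name and the pluggable `similarity` callable are hypothetical; any similarity measure that returns a number can be passed in.

```python
def similar_slow_statements(candidate, library, similarity, threshold):
    """Return the historical statements whose similarity exceeds the threshold.

    The candidate is regarded as a slow SQL statement when the returned
    list is non-empty, i.e. at least one similarity is greater than the
    preset threshold.
    """
    return [
        historical
        for historical in library
        if similarity(candidate, historical) > threshold
    ]
```

Returning the matching statements rather than a bare boolean keeps the evidence for the decision available, for example when reporting the result to staff.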

If the SQL statement to be recognized is judged to be the slow SQL statement, the related information of the SQL statement to be recognized can be sent to the staff for the staff to carry out optimization modification on the SQL statement to be recognized according to the related information. The related information may include service information of the SQL statement to be identified, and may also be attribute information.

Fig. 2 is a schematic diagram of a complete flow of the slow SQL statement identification method provided in the embodiment of the present invention, and as shown in fig. 2, the method includes:

acquiring an SQL sentence to be identified;

determining the feature information and the service information of the SQL statement to be identified, wherein the service information comprises a database service address (Host2), a port number (port2), a database name (database1) and a data table name (table1). One database service address and one port number form one database instance, and the entire database system may include three database instances: Host1:port1, Host2:port2 and Host3:port3. Host1:port1 may include two databases, databaseY and databaseX. Host2:port2 may include two databases, database1 and database2. database1 may include two data tables, table1 and table2, and database2 may include two data tables, table3 and table4. Host3:port3 may include two databases, databaseZ and databaseK. The database instance Host2:port2 is determined from Host2 and port2 in the service information of the SQL statement to be identified; database1 is then looked up under that database instance, and table1 is looked up under database1. The historical slow SQL statement library corresponding to table1 is then determined; it includes a plurality of historical slow SQL statements, denoted historical slow SQL1, historical slow SQL2, historical slow SQL3, historical slow SQL4, historical slow SQL5, historical slow SQL6 and so on. Calculating the similarity between the feature information of the SQL statement to be identified and the feature information of each historical slow SQL statement yields similarity 1, similarity 2, similarity 3, similarity 4, similarity 5 and similarity 6, respectively.
Each similarity is then compared with a preset threshold. If any similarity is greater than the preset threshold, the SQL statement to be identified is determined to be a slow SQL statement, and a staff member is notified to optimize and modify it. If no similarity is greater than the preset threshold, the SQL statement to be identified is determined not to be a slow SQL statement, and the identification process ends.

As shown in Fig. 3, on the basis of the above embodiment, an embodiment of the present invention provides a slow SQL statement identification system, including a feature information acquisition module 31, a similarity calculation module 32 and an identification module 33, wherein:

the characteristic information obtaining module 31 is configured to obtain characteristic information of an SQL statement to be identified;

the similarity calculation module 32 is configured to calculate a similarity between the feature information of the SQL statement to be identified and the feature information of the historical slow SQL statement in the historical slow SQL statement library;

the identification module 33 is configured to identify whether the SQL statement to be identified is a slow SQL statement based on the similarity.

Specifically, the functions of the modules in the slow SQL statement identification system provided in the embodiment of the present invention correspond to the operation flows of the steps in the method embodiments one to one, and the implementation effect is also consistent.
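The division into three modules can be sketched as a small composition of callables. The class name, field names and the threshold field are hypothetical, and the feature extractor and similarity function are left pluggable because the embodiment allows several choices for each.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class SlowSqlIdentificationSystem:
    # Feature information acquisition module (31): SQL text -> feature information.
    acquire_features: Callable[[str], str]
    # Similarity calculation module (32): two feature strings -> similarity.
    calculate_similarity: Callable[[str, str], float]
    # Preset threshold used by the identification module (33).
    threshold: float

    def identify(self, sql: str, historical_features: Iterable[str]) -> bool:
        """Identification module (33): True if the statement is judged slow."""
        features = self.acquire_features(sql)
        return any(
            self.calculate_similarity(features, historical) > self.threshold
            for historical in historical_features
        )
```

Keeping the modules as separate callables mirrors the one-to-one correspondence between modules and method steps, and lets one similarity measure be swapped for another without touching the identification logic.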

On the basis of the foregoing embodiment, an embodiment of the present invention provides a slow SQL statement identification system, where the similarity calculation module is specifically configured to:

On the basis of the above embodiments, an embodiment of the present invention provides a slow SQL statement identification system, further including: the historical slow SQL sentence library determining module is used for:

acquiring the business information of the SQL statement to be identified;

On the basis of the above embodiments, an embodiment of the present invention provides a slow SQL statement identification system, further including: the historical slow SQL sentence library construction module is used for:

On the basis of the foregoing embodiment, an embodiment of the present invention provides a slow SQL statement identification system, where the identification module is specifically configured to:

Fig. 4 illustrates the physical structure of an electronic device. As shown in Fig. 4, the electronic device may include a processor 410, a communication interface 420, a memory 430 and a communication bus 440, wherein the processor 410, the communication interface 420 and the memory 430 communicate with each other via the communication bus 440. The processor 410 may call logic instructions in the memory 430 to perform a slow SQL statement identification method comprising: acquiring feature information of an SQL statement to be identified; calculating the similarity between the feature information of the SQL statement to be identified and the feature information of the historical slow SQL statements in the historical slow SQL statement library; and identifying whether the SQL statement to be identified is a slow SQL statement based on the similarity.

In addition, the logic instructions in the memory 430 may be implemented in the form of software functional units and, when sold or used as independent products, stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.

In another aspect, the present invention also provides a computer program product, which includes a computer program stored on a non-transitory computer-readable storage medium, the computer program including program instructions, when the program instructions are executed by a computer, the computer being capable of executing the slow SQL statement identification method provided by the above methods, the method including: acquiring characteristic information of an SQL statement to be identified; calculating the similarity between the characteristic information of the SQL statement to be identified and the characteristic information of the historical slow SQL statement in the historical slow SQL statement library; and identifying whether the SQL statement to be identified is a slow SQL statement or not based on the similarity.

In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the slow SQL statement identification method provided above, the method comprising: acquiring feature information of an SQL statement to be identified; calculating the similarity between the feature information of the SQL statement to be identified and the feature information of the historical slow SQL statements in the historical slow SQL statement library; and identifying whether the SQL statement to be identified is a slow SQL statement based on the similarity.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and can certainly also be implemented by hardware. With this understanding in mind, the above technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk or an optical disk, and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or in some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A slow SQL statement identification method is characterized by comprising the following steps:

acquiring characteristic information of an SQL statement to be identified;

2. The method according to claim 1, wherein the calculating the similarity between the feature information of the SQL statement to be identified and the feature information of the historical slow SQL statement in the historical slow SQL statement library specifically comprises:

3. The method according to claim 1, wherein the calculating the similarity between the feature information of the SQL statement to be identified and the feature information of the historical slow SQL statement in the historical slow SQL statement library specifically comprises:

4. The method according to claim 1, wherein the calculating the similarity between the feature information of the SQL statement to be identified and the feature information of the historical slow SQL statement in the historical slow SQL statement library specifically comprises:

5. The slow SQL statement identification method according to any of claims 1-4, wherein the historical slow SQL statement library is determined by the following method:

acquiring the business information of the SQL statement to be identified;

6. The slow SQL statement identification method according to claim 5, wherein the historical slow SQL statement in the historical slow SQL statement library is determined by the following method:

7. The slow SQL statement identification method according to any of claims 1 to 4, wherein identifying whether the to-be-identified SQL statement is a slow SQL statement based on the similarity includes:

8. A slow SQL statement identification system, comprising:

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the slow SQL statement identification method according to any of claims 1 to 7 when executing the program.

10. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor, performs the steps of the slow SQL statement identification method according to any of claims 1 to 7.