CN111400338B - SQL optimization method, device, storage medium and computer equipment - Google Patents

SQL optimization method, device, storage medium and computer equipment Download PDF

Info

Publication number
CN111400338B
CN111400338B CN202010141851.3A CN202010141851A CN111400338B CN 111400338 B CN111400338 B CN 111400338B CN 202010141851 A CN202010141851 A CN 202010141851A CN 111400338 B CN111400338 B CN 111400338B
Authority
CN
China
Prior art keywords
sql
optimized
optimization
statement
sql statement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010141851.3A
Other languages
Chinese (zh)
Other versions
CN111400338A (en
Inventor
刘佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Ping An Medical Health Technology Service Co Ltd
Original Assignee
Shenzhen Ping An Medical Health Technology Service Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Ping An Medical Health Technology Service Co Ltd filed Critical Shenzhen Ping An Medical Health Technology Service Co Ltd
Priority to CN202010141851.3A priority Critical patent/CN111400338B/en
Publication of CN111400338A publication Critical patent/CN111400338A/en
Application granted granted Critical
Publication of CN111400338B publication Critical patent/CN111400338B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2433Query languages
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24534Query rewriting; Transformation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a SQL optimization method, a device, a storage medium and computer equipment, and relates to the technical field of databases. The method comprises the following steps: collecting multiple groups of SQL sample data, and creating a decision model according to the SQL sample data; acquiring an SQL statement to be optimized and application environment information corresponding to the SQL statement to be optimized; converting SQL sentences to be optimized into an abstract syntax tree; optimizing the abstract syntax tree by using a decision model according to application environment information corresponding to the SQL sentence to be optimized; and converting the optimized abstract syntax tree into an SQL statement to obtain the optimized SQL statement. According to the method, the SQL sentences can be optimized in a targeted manner according to different application environments, the SQL optimizing efficiency can be improved by optimizing the SQL sentences through the strategy model, the time consumption of SQL optimization is reduced, and a better optimizing effect is achieved.

Description

SQL optimization method and device, storage medium and computer equipment
Technical Field
The invention relates to the technical field of databases, in particular to a method and a device for optimizing SQL (structured query language), a storage medium and computer equipment.
Background
SQL (Structured Query Language), which is a Structured Query Language, is a Database Language with multiple functions of data manipulation and data definition, and is specifically used for operating a relational Database, which is the most widely used Database at present, and more typical relational databases include Oracle Database of Oracle corporation, SQL Server of microsoft corporation, DB2 of IBM corporation, postgreSQL and MySQ.
In practical application, SQL tuning is crucial to the performance of a project, but the existing SQL tuning is totally based on the technical accumulation of technicians themselves, the technical points are relatively scattered, the adjusters need professional knowledge in multiple fields such as SQL optimization, access design, SQL design, and the like, and the configuration tuning modes of different data volumes and different database scenes are different, a large amount of repeated and trivial challenge work consumes time, and the tuning work is endless due to the continuous increase of SQL load.
At present, no effective automatic optimization scheme specially aiming at SQL sentences exists in the market.
Disclosure of Invention
In view of this, the present application provides an SQL optimization method, an apparatus, a storage medium, and a computer device, and mainly aims to solve the technical problems of long time consumption, low optimization efficiency, and poor optimization effect in SQL optimization.
According to a first aspect of the present invention, there is provided a SQL optimizing method, comprising:
collecting multiple groups of SQL sample data, and creating a decision model according to the SQL sample data;
acquiring an SQL statement to be optimized and application environment information corresponding to the SQL statement to be optimized;
converting SQL sentences to be optimized into an abstract syntax tree;
optimizing the abstract syntax tree by using a decision model according to application environment information corresponding to the SQL statement to be optimized;
and converting the optimized abstract syntax tree into an SQL statement to obtain the optimized SQL statement.
According to a second aspect of the present invention, there is provided an SQL optimizing device, comprising:
the model creating module is used for acquiring multiple groups of SQL sample data and creating a decision model according to the SQL sample data;
the data acquisition module is used for acquiring the SQL statement to be optimized and the application environment information corresponding to the SQL statement to be optimized;
the semantic parsing module is used for converting SQL sentences to be optimized into abstract syntax trees;
the optimization processing module is used for optimizing the abstract syntax tree by using the decision model according to the application environment information corresponding to the SQL statement to be optimized;
and the statement conversion module is used for converting the optimized abstract syntax tree into an SQL statement to obtain the optimized SQL statement.
According to a third aspect of the present invention, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the SQL optimization method described above.
According to a fourth aspect of the present invention, there is provided a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the SQL optimization method when executing the program.
According to the SQL optimization method, the SQL optimization device, the storage medium and the computer equipment, firstly, a decision model is created by utilizing collected SQL sample data, then, the decision model is utilized to optimize SQL sentences converted into abstract syntax trees in combination with the application environment of the SQL sentences, and finally, the optimized abstract syntax trees are converted into the SQL sentences to obtain the optimized SQL sentences. Compared with the prior art, the method and the device can perform targeted optimization on the SQL sentences according to different application environments, improve the SQL optimization efficiency by optimizing the SQL sentences through the strategy model, reduce the time consumption of SQL optimization and achieve better optimization effect.
The foregoing description is only an overview of the technical solutions of the present application, and the present application can be implemented according to the content of the description in order to make the technical means of the present application more clearly understood, and the following detailed description of the present application is given in order to make the above and other objects, features, and advantages of the present application more clearly understandable.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
fig. 1 shows a schematic flow chart of an SQL optimization method according to an embodiment of the present invention;
FIG. 2 is a flow chart of another SQL optimization method provided by the embodiment of the invention;
fig. 3 shows a schematic structural diagram of an SQL optimizing apparatus according to an embodiment of the present invention;
fig. 4 shows a schematic structural diagram of another SQL optimizing apparatus according to an embodiment of the present invention.
Detailed Description
The invention will be described in detail hereinafter with reference to the drawings and embodiments. It should be noted that, in the present application, the embodiments and features of the embodiments may be combined with each other without conflict.
In the development process of actual projects, SQL optimization is a very important link, for example, in the initial operation stage of some small projects, since the amount of service data is relatively small, the influence of the execution efficiency of some SQL on the operation efficiency of the program is not obvious, and developers and operation and maintenance personnel cannot judge the influence of the SQL on the operation efficiency of the program, special optimization cannot be performed on SQL, but as time is accumulated, the amount of service data increases, the influence of the execution efficiency of the SQL on the operation efficiency of the program gradually increases, and at this time, periodic optimization needs to be performed on SQL; for another example, for some medium and large projects, due to a large data volume, targeted SQL optimization needs to be performed for a service scenario at the initial stage of project operation, so as to avoid affecting the operation efficiency of a program in the process of project operation.
In one embodiment, as shown in fig. 1, an SQL optimizing method is provided, which is described by taking an example that the method is applied to a server, and includes the following steps:
101. and collecting multiple groups of SQL sample data, and creating a decision model according to the SQL sample data.
The decision model belongs to a reinforced learning model, and the model comprises two training processes of learning and reinforcement, wherein the learning and the reinforcement are two continuous and cyclic iterative processes until an optimal decision is reached.
Specifically, the multiple groups of SQL sample data include multiple groups of SQL scripts before optimization, pre-optimized SQL scripts corresponding to the SQL scripts before optimization, and corresponding application environment information, and before the decision model is trained, the multiple groups of SQL sample data may be divided into a training set and a test set, where data in the training set may be used for training a learning process, and data in the test set may be used for training a reinforcement process.
Furthermore, in the learning process of the decision model, a plurality of decision trajectory data can be created according to the SQL script before optimization and the pre-optimized SQL script in the training set, wherein each decision trajectory data comprises a plurality of SQL grammars and optimization operation sequences corresponding to the SQL grammars, then mapping relationships between the SQL grammars and the optimization operations in all the decision trajectory data are extracted, a data set is constructed according to the extracted mapping relationships between the SQL grammars and the optimization operations, and finally, iterative computation is performed on the data set through a separation algorithm and/or a regression algorithm, and the decision model is obtained through continuous learning and training.
Further, in the process of strengthening the decision model, firstly, a performance test can be performed on the SQL script before optimization and the pre-optimized SQL script in the test set, and a target performance index can be obtained according to the result of the performance test, wherein the target performance index includes a preset SQL execution time threshold, a memory occupancy threshold, a CPU occupancy threshold, a read-write database frequency threshold, a deadlock frequency threshold, and the like, then a reward function is created according to the target performance index, and the decision model is used for performing optimization and performance test on the SQL script before optimization in the test set to obtain the performance index of the optimized SQL script, wherein the performance index includes SQL execution time, memory occupancy, CPU occupancy, read-write database frequency, and the like, then the performance index of the optimized SQL script is compared with the target performance index, and an accumulated reward is obtained through a greedy algorithm and/or a value function algorithm according to the comparison result, and finally the transition probability of the decision model is optimized according to the accumulated reward.
In this embodiment, the reward function is an important component in the decision model, and is a feedback signal, which can identify whether the SQL performance optimization reaches a preset target, the goal of the decision model is to obtain a maximum cumulative reward (maximum cumulative reward), for the serialized decision problem, reinforcement learning is to select some optimization operations to maximize the cumulative reward, for the same SQL script, there may be multiple optimization paths from a start state to a stop state, the transition of the paths is related to the transition probability, and after continuous learning and reinforcement, the transition probability can be adjusted and optimized, so as to obtain a more complete decision model.
102. And acquiring the SQL statement to be optimized and the application environment information corresponding to the SQL statement to be optimized.
The SQL application environment information refers to a database environment in which SQL statements are applied, the SQL syntax in each database environment is different, and the execution efficiency in different database environments is different for the same SQL statement, so in the process of model training and model application, the application environments of the SQL statements need to be distinguished for targeted training and application.
Specifically, the server may directly obtain the SQL statement to be optimized and the application environment information corresponding to the SQL statement to be optimized, where the application environment information corresponding to the SQL statement to be optimized includes information such as an application database and a database plug-in.
103. And converting the SQL statement to be optimized into an abstract syntax tree.
The Abstract Syntax Tree (AST) is an Abstract representation of a Syntax structure of a source code, and represents the Syntax structure of a programming language in a Tree form, and each node on the Syntax Tree represents a structure in the source code.
Specifically, the server may first split the SQL statement to be optimized into a plurality of morphemes according to a predetermined lexical rule through a lexical analyzer, then perform syntax analysis on the plurality of morphemes according to a set SQL syntax parsing rule through a syntax analyzer, and finally convert the plurality of morphemes conforming to the syntax rule into an abstract syntax tree. In this embodiment, the converted abstract syntax tree may represent the syntax structure of the SQL statement.
104. And optimizing the abstract syntax tree by using the decision model according to the application environment information corresponding to the SQL statement to be optimized.
Specifically, the query condition of the SQL statement to be optimized can be obtained through the abstract syntax tree, then the decision model can determine the optimization strategy according to the application environment information corresponding to the SQL statement to be optimized, then the redundant query condition in the SQL statement is removed according to the optimization strategy, and finally the query condition from which the redundancy is removed is replaced with the query condition with the same semantics or the query condition is split into a plurality of query conditions.
In this embodiment, the optimization strategies of the decision models are different for different application environments, that is, the transition probabilities of the optimization operations are different in different application environments, and naturally, the output optimization results are also different. Therefore, after the abstract syntax tree is optimized by using the decision model, a better SQL statement aiming at different application environments can be obtained.
105. And converting the optimized abstract syntax tree into an SQL statement to obtain the optimized SQL statement.
Specifically, the step is to convert the SQL statement into an inverse operation of the abstract syntax tree, and after the abstract syntax tree is optimized by using the decision model, semantic reverse analysis can be performed on the abstract syntax tree, so that the optimized SQL statement can be obtained.
The SQL optimization method provided in this embodiment first creates a decision model using the collected SQL sample data, then optimizes the SQL statement converted into the abstract syntax tree using the decision model in combination with the application environment of the SQL statement, and finally converts the optimized abstract syntax tree into the SQL statement to obtain the optimized SQL statement. The method can perform targeted optimization on the SQL sentences according to different application environments, and the optimization on the SQL sentences through the strategy model can improve the SQL optimization efficiency, reduce the time consumption of the SQL optimization and achieve better optimization effect.
Further, as a refinement and an extension of the specific implementation of the above embodiment, in order to fully illustrate the implementation process of the embodiment, an SQL optimization method is provided, as shown in fig. 2, and the method includes the following steps:
201. and collecting multiple groups of SQL sample data, and creating a decision model according to the SQL sample data.
Specifically, multiple groups of SQL sample data include multiple groups of SQL scripts before optimization, pre-optimized SQL scripts corresponding to the SQL scripts before optimization, and corresponding application environment information, before a decision model is trained, the multiple groups of SQL sample data may be divided into a training set and a test set, during the learning process of the decision model, multiple pieces of decision trajectory data may be created according to the SQL scripts before optimization and the pre-optimized SQL scripts in the training set, where each piece of decision trajectory data includes multiple pieces of SQL grammar and an optimization operation sequence corresponding to the SQL grammar, then mapping relationships between the SQL grammar and the optimization operation in all the decision trajectory data are extracted, a data set is constructed according to the extracted mapping relationships between the SQL grammar and the optimization operation, and finally, iterative computation is performed on the data set through a separation algorithm and/or a regression algorithm, and a decision model is obtained through continuous learning.
In this embodiment, each decision trajectory data represents an optimization process of a group of SQL sample data, that is, each decision trajectory data includes a corresponding relationship between a group of SQL syntax and an optimization operation, so that multiple groups of SQL sample data can create multiple pieces of decision trajectory data, it can be understood that multiple optimization operations for the same SQL syntax exist in the multiple pieces of decision trajectory data, a mapping relationship between each SQL syntax and the optimization operations is extracted, a data set can be constructed according to the mapping relationship therebetween, and the corresponding relationship between each SQL syntax and the optimization operations can be analyzed by performing a certain mode of operation on the data set, so as to train to obtain a decision model. It should be noted that, for the same SQL syntax, there may be multiple optimization operations, but the transition probability of each optimization operation is different, so when the decision model is used to optimize the SQL statement, different optimized SQL statements may be obtained, and in addition, the transition probability may also change for different application environments.
Further, in the process of strengthening the decision model, firstly, a performance test can be performed on the SQL script before optimization and the pre-optimized SQL script in the test set, and a target performance index can be obtained according to the result of the performance test, wherein the target performance index includes a preset SQL execution time threshold, a memory occupancy threshold, a CPU occupancy threshold, a read-write database frequency threshold, a deadlock frequency threshold, and the like, then a reward function is created according to the target performance index, and the decision model is used for performing optimization and performance test on the SQL script before optimization in the test set to obtain the performance index of the optimized SQL script, wherein the performance index includes SQL execution time, memory occupancy, CPU occupancy, read-write database frequency, and the like, then the performance index of the optimized SQL script is compared with the target performance index, and an accumulated reward is obtained through a greedy algorithm and/or a value function algorithm according to the comparison result, and finally the transition probability of the decision model is optimized according to the accumulated reward.
In this embodiment, performance tests are performed on the pre-optimized SQL script and the pre-optimized SQL script in the test set, so that the pre-optimization effect can be quantified, and a target performance index can be established accordingly, where the target performance index may be qualitative or quantitative, for example, the target performance index may be that the SQL execution time is shortened by 20%, the memory occupancy rate is reduced by 30%, or the number of deadlock times cannot exceed 1, and the like. After the target performance index is set, the trained decision model can be used for automatically optimizing the SQL script before test centralized optimization, outputting the optimized SQL statement, then performing performance test on the output optimized SQL statement, comparing the performance test result with the target performance index, accumulating rewards according to the comparison result, and adjusting the transition probability of the decision model according to the accumulated rewards, so that the decision model has a better optimization strategy.
202. And acquiring the SQL statement to be optimized and the application environment information corresponding to the SQL statement to be optimized.
Specifically, the server may directly obtain the SQL statement to be optimized and the application environment information corresponding to the SQL statement to be optimized, where the application environment information corresponding to the SQL statement to be optimized includes information such as an application database and a database plug-in.
203. And converting the SQL statement to be optimized into an abstract syntax tree.
Specifically, the server may first split the SQL statement to be optimized into a plurality of morphemes according to a predetermined lexical rule through a lexical analyzer, then perform syntax analysis on the plurality of morphemes according to a set SQL syntax analysis rule through a syntax analyzer, and finally convert the plurality of morphemes conforming to the syntax rule into an abstract syntax tree. In this embodiment, the converted abstract syntax tree may represent the syntax structure of the SQL statement.
For example, the server obtains a segment of SQL statement to be optimized as follows: the SQL sentence is divided into word sequences according to a predefined SQL script IN a lexical analysis stage to output key word sequences (TOKEN) of the SQL sentence, constant words, variable words, operational words and the like, then the decomposed multiple word sequences are analyzed, and the parsed words are converted into an abstract word sequence according to a syntax analysis stage, and finally the tree syntax structure is converted into a tree syntax structure according to the abstract syntax analysis stage.
204. And optimizing the abstract syntax tree by using the decision model according to the application environment information corresponding to the SQL statement to be optimized.
Specifically, the query condition of the SQL statement to be optimized can be obtained through the abstract syntax tree, then the decision model can determine the optimization strategy according to the application environment information corresponding to the SQL statement to be optimized, then the redundant query condition in the SQL statement is removed according to the optimization strategy, and finally the query condition from which the redundancy is removed is replaced with the query condition with the same semantic meaning or the query condition is split into a plurality of query conditions.
In the embodiment, various optimization processes can be performed on the abstract syntax tree by using the decision model. For example, the decision model may remove some meaningless query conditions, such as the statement select col1 that generates an empty table structure, col2 into # t from t where 1=0 may be replaced with create table # t (·), because the original statement will not return any result set, but will consume system resources; as another example, the decision model may replace the symbol "# with a specific list of fields, such as replacing" # in statement select from t with a specific list of fields, to avoid returning redundant columns; as another example, for consecutive values, the decision model may replace the in statement with a between statement or an exists statement, such as replacing the statement select num from a where num in (select num from b) with select num from a where exists (select 1 from b where num = a.num) to avoid a full table scan. It can be understood that the policy model may automatically select different optimization operations according to different application scenarios or different functions of the SQL statements to obtain different optimization results.
205. And converting the optimized abstract syntax tree into an SQL statement to obtain the optimized SQL statement.
Specifically, after the abstract syntax tree is optimized by using the decision model, semantic reverse analysis can be performed on the abstract syntax tree, and the optimized SQL statement can be obtained.
206. And performing performance test on the optimized SQL statement to obtain the performance index of the optimized SQL statement.
Specifically, after the decision model is used to obtain the optimized SQL statement, a performance test may be performed on the SQL statement to obtain a performance index of the optimized SQL statement. The performance indexes comprise SQL execution time, memory occupancy rate, CPU occupancy rate, database reading and writing times, deadlock times and the like.
207. And comparing the performance index of the optimized SQL statement with a preset performance index.
Specifically, after the performance index of the optimized SQL statement is obtained, the performance index of the optimized SQL statement may be compared with a preset performance index. The preset performance index may also be qualitative or quantitative, for example, the preset performance index may be that the SQL execution time is shortened by 20%, the memory occupancy rate is reduced by 30%, or the number of deadlocks cannot exceed 1.
208. And optimizing the SQL sentences which do not reach the preset performance index again and/or outputting index optimization suggestions and memory optimization suggestions of the SQL sentences to be optimized through a decision model.
Specifically, for an SQL statement that does not reach the preset performance index, the decision model may be used to optimize the SQL statement before optimization or the SQL statement after optimization again to obtain different optimization results, or the decision model may be used to output an index optimization suggestion and a memory optimization suggestion to obtain a better SQL statement.
Further, when outputting the index optimization suggestion and the memory optimization suggestion of the SQL statement to be optimized, the index information, the table building script information, and the execution plan of the SQL statement to be optimized may be first obtained, then the index execution information of the SQL statement to be optimized is obtained through the execution plan, a targeted index optimization suggestion is generated according to the index execution information, the index information of the SQL statement, and a preset index optimization policy, and a targeted memory optimization suggestion is generated according to the table building script information of the SQL statement and the preset memory optimization policy.
In this embodiment, the index optimization suggestion and the memory optimization suggestion may include various contents, and the index optimization aspect may suggest to build an index on a column of a where statement or an order by statement in a certain row, or suggest to delete an index built on some columns that are not frequently used to improve execution efficiency. In the aspect of memory optimization, a digital field can be used for replacing a character field, and a varchar field can be used for replacing a char field, so that the query and connection performance is improved, and the storage space is saved; or when the temporary table is newly built, if the data volume inserted at one time is large, the select intos is used for replacing the create table statement so as to avoid generating a large amount of logs and improve the speed, if the data volume inserted at one time is not large, the create table statement is firstly used, then the insert statement is used so as to alleviate the resources of the system table, in addition, frequent creation and deletion of the temporary table can be avoided so as to reduce the consumption of the resources of the system table, and the like.
The SQL optimization method provided by the embodiment can perform targeted optimization on SQL statements according to different application environments, perform performance testing and re-optimization on the optimized SQL statements, and output index optimization suggestions and memory optimization suggestions in a targeted manner, thereby further improving the efficiency of SQL optimization, reducing the time consumption of SQL optimization, and achieving a better optimization effect.
Further, as a specific implementation of the method shown in fig. 1 and fig. 2, this embodiment provides an SQL optimizing device, as shown in fig. 3, where the device includes: a model creating module 31, a data acquiring module 32, a semantic parsing module 33, an optimization processing module 34, and a sentence converting module 35, wherein:
the model creating module 31 is used for collecting multiple groups of SQL sample data and creating a decision model according to the SQL sample data;
the data acquisition module 32 is configured to acquire an SQL statement to be optimized and application environment information corresponding to the SQL statement to be optimized;
the semantic parsing module 33 is used for converting the SQL statement to be optimized into an abstract syntax tree;
the optimization processing module 34 is configured to perform optimization processing on the abstract syntax tree by using the decision model according to the application environment information corresponding to the SQL statement to be optimized;
and the statement conversion module 35 may be configured to convert the optimized abstract syntax tree into an SQL statement, so as to obtain the optimized SQL statement.
In a specific application scenario, the multiple groups of SQL sample data include multiple groups of first SQL scripts before optimization, and a pre-optimized second SQL script corresponding to the first SQL script and corresponding application environment information, so the model creation module 31 may be specifically configured to divide the multiple groups of SQL sample data into a training set and a test set; creating a plurality of decision track data according to a first SQL script, a second SQL script and corresponding application environment information in a training set, wherein each decision track data comprises a plurality of SQL grammars and an optimized operation sequence corresponding to the SQL grammars; extracting the mapping relation between SQL grammar and optimized operation in all decision track data, and constructing a data set according to the extracted mapping relation between SQL grammar and optimized operation; and carrying out iterative computation on the data set through a separation algorithm and/or a regression algorithm, and training to obtain a decision model.
In a specific application scenario, the model creating module 31 may be further configured to perform performance tests on the first SQL script and the second SQL script in the test set, and obtain a target performance index according to a result of the performance tests, where the target performance index includes a preset SQL execution time threshold, a preset memory occupancy threshold, a preset CPU occupancy threshold, a preset read-write database time threshold, and a preset deadlock time threshold; creating a reward function according to the target performance index; optimizing the first SQL script in the test set by using the decision model to obtain the optimized first SQL script; performing performance test on the optimized first SQL script to obtain performance indexes of the optimized first SQL script, wherein the performance indexes comprise SQL execution time, memory occupancy rate, CPU occupancy rate, database reading and writing times and deadlock times; comparing the performance index of the optimized first SQL script with a target performance index, and obtaining accumulated rewards through a greedy algorithm and/or a value function algorithm according to a comparison result; optimizing the transition probabilities of the decision model based on the cumulative rewards.
In a specific application scenario, the semantic analysis module 33 is specifically configured to split the SQL statement to be optimized into a plurality of lemmas by a lexical analyzer according to a predetermined lexical rule; performing syntactic analysis on the multiple lemmas by using a syntactic analyzer according to a set syntactic rule; and converting a plurality of word elements which conform to the grammar rules into an abstract grammar tree.
In a specific application scenario, the optimization processing module 34 may be specifically configured to obtain, through an abstract syntax tree, a query condition of an SQL statement to be optimized; determining an optimization strategy according to application environment information corresponding to the SQL statement to be optimized; removing redundant query conditions according to an optimization strategy; and replacing the query condition without redundancy with a query condition with the same semantic meaning or splitting the query condition into a plurality of query conditions according to an optimization strategy.
In a specific application scenario, as shown in fig. 4, the apparatus further includes: performance test module 36, index comparison module 37, strengthen optimizing module 38, wherein:
the performance test module 36 may be configured to perform a performance test on the optimized SQL statement to obtain a performance index of the optimized SQL statement, where the performance index includes an SQL execution time, a memory occupancy rate, a CPU occupancy rate, a number of times of reading and writing the database, and a number of times of deadlock;
the index comparison module 37 is configured to compare the performance index of the optimized SQL statement with a preset performance index;
the reinforced optimization module 38 may be configured to optimize the optimized SQL statement again and/or output an index optimization suggestion and a memory optimization suggestion of the SQL statement to be optimized through the decision model if the performance test index of the optimized SQL statement does not reach the preset performance index.
In a specific application scenario, the reinforced optimization module 38 may be specifically configured to obtain index information, table-building script information, and an execution plan of an SQL statement to be optimized; acquiring index execution information of an SQL statement to be optimized through an execution plan; generating an index optimization suggestion according to the index execution information, the index information and a preset index optimization strategy; and generating a memory optimization suggestion according to the table building script information and a preset memory optimization strategy.
It should be noted that other corresponding descriptions of the functional units related to the SQL optimizing device provided in this embodiment may refer to the corresponding descriptions in fig. 1 and fig. 2, and are not described herein again.
Based on the methods shown in fig. 1 and fig. 2, correspondingly, the present embodiment further provides a storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the SQL optimization method shown in fig. 1 and fig. 2 is implemented.
Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, and the software product to be identified may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, or the like), and include several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the method according to the implementation scenarios of the present application.
Based on the foregoing methods shown in fig. 1 and fig. 2 and the SQL optimizing apparatus embodiments shown in fig. 3 and fig. 4, to achieve the foregoing object, this embodiment further provides an SQL optimized entity device, which may specifically be a personal computer, a server, a smart phone, a tablet computer, a smart watch, or other network devices, and the entity device includes a storage medium and a processor; a storage medium for storing a computer program; a processor for executing the computer program to implement the above-mentioned methods as shown in fig. 1 and fig. 2.
Optionally, the entity device may further include a user interface, a network interface, a camera, a Radio Frequency (RF) circuit, a sensor, an audio circuit, a WI-FI module, and the like. The user interface may include a Display screen (Display), an input unit such as a keypad (Keyboard), etc., and the optional user interface may also include a USB interface, a card reader interface, etc. The network interface may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), etc.
Those skilled in the art will appreciate that the SQL-optimized entity device structure provided in this embodiment does not constitute a limitation on the entity device, and may include more or less components, or combine some components, or arrange different components.
The storage medium may further include an operating system and a network communication module. The operating system is a program for managing the hardware of the entity device and the software resources to be identified, and supports the running of the information processing program and other software and/or programs to be identified. The network communication module is used for realizing communication among components in the storage medium and communication with other hardware and software in the information processing entity device.
Through the description of the above embodiments, those skilled in the art can clearly understand that the present application can be implemented by means of software plus an essential general hardware platform, and can also be implemented by means of hardware. By applying the technical scheme of the application, the SQL sentence can be subjected to targeted optimization according to different application environments, the SQL sentence optimization efficiency can be improved by optimizing the SQL sentence through the strategy model, the time consumption of the SQL optimization is reduced, and a better optimization effect is achieved.
Those skilled in the art will appreciate that the figures are merely schematic representations of one preferred implementation scenario and that the blocks or flow diagrams in the figures are not necessarily required to practice the present application. Those skilled in the art will appreciate that the modules in the devices in the implementation scenario may be distributed in the devices in the implementation scenario according to the description of the implementation scenario, or may be located in one or more devices different from the present implementation scenario with corresponding changes. The modules of the implementation scenario may be combined into one module, or may be further split into a plurality of sub-modules.
The above application serial numbers are for description purposes only and do not represent the superiority or inferiority of the implementation scenarios. The above disclosure is only a few specific implementation scenarios of the present application, but the present application is not limited thereto, and any variations that can be made by those skilled in the art are intended to fall within the scope of the present application.

Claims (8)

1. A SQL optimization method, the method comprising:
collecting multiple groups of SQL sample data, and creating a decision model according to the SQL sample data;
acquiring an SQL statement to be optimized and application environment information corresponding to the SQL statement to be optimized;
converting the SQL statement to be optimized into an abstract syntax tree;
optimizing the abstract syntax tree by using the decision model according to the application environment information corresponding to the SQL statement to be optimized;
converting the optimized abstract syntax tree into an SQL statement to obtain an optimized SQL statement;
wherein, the creating a decision model according to the SQL sample data comprises: dividing the multiple groups of SQL sample data into a training set and a test set, wherein the multiple groups of SQL sample data comprise multiple groups of first SQL scripts before optimization, pre-optimized second SQL scripts corresponding to the first SQL scripts and corresponding application environment information; creating a plurality of decision trajectory data according to a first SQL script, a second SQL script and corresponding application environment information in the training set, wherein each decision trajectory data comprises a plurality of SQL grammars and an optimized operation sequence corresponding to the SQL grammars; extracting the mapping relation between the SQL grammar and the optimization operation in all the decision trajectory data, and constructing a data set according to the extracted SQL grammar and the mapping relation of the optimization operation; performing iterative computation on the data set through a separation algorithm and/or a regression algorithm, and training to obtain a decision model;
the optimizing the abstract syntax tree by using the decision model according to the application environment information corresponding to the SQL statement to be optimized includes: obtaining the query condition of the SQL statement to be optimized through an abstract syntax tree; determining an optimization strategy according to the application environment information corresponding to the SQL statement to be optimized; removing redundant query conditions according to the optimization strategy; and replacing the query condition with redundancy removed with a query condition with the same semantic meaning or splitting the query condition into a plurality of query conditions according to the optimization strategy.
2. The method of claim 1, further comprising:
respectively performing performance tests on the first SQL script and the second SQL script in the test set, and obtaining target performance indexes according to the performance test results, wherein the target performance indexes comprise a preset SQL execution time threshold, a memory occupancy rate threshold, a CPU occupancy rate threshold, a read-write database frequency threshold and a deadlock frequency threshold;
creating a reward function according to the target performance index;
optimizing the first SQL script in the test set by using the decision model to obtain the optimized first SQL script;
performing performance test on the optimized first SQL script to obtain performance indexes of the optimized first SQL script, wherein the performance indexes comprise SQL execution time, memory occupancy rate, CPU occupancy rate, database reading and writing times and deadlock times;
comparing the performance index of the optimized first SQL script with a target performance index, and obtaining accumulated reward through a greedy algorithm and/or a value function algorithm according to a comparison result;
optimizing transition probabilities of the decision model based on the accumulated rewards.
3. The method according to claim 2, wherein the converting the SQL statement to be optimized into an abstract syntax tree specifically comprises:
splitting the SQL sentence to be optimized into a plurality of word elements according to a predetermined lexical rule through a lexical analyzer;
using a grammar analyzer to analyze the plurality of word elements according to a set grammar rule;
and converting a plurality of word elements which conform to the grammar rules into an abstract grammar tree.
4. The method of claim 3, further comprising:
performing performance test on the optimized SQL statement to obtain a performance index of the optimized SQL statement, wherein the performance index comprises SQL execution time, memory occupancy rate, CPU occupancy rate, database reading and writing times and deadlock times;
comparing the performance index of the optimized SQL statement with a preset performance index;
if the performance test index of the optimized SQL sentence does not reach the preset performance index, optimizing the optimized SQL sentence again and/or outputting an index optimization suggestion and a memory optimization suggestion of the SQL sentence to be optimized through the decision model.
5. The method according to claim 4, wherein the outputting, by the decision model, the index optimization suggestion and the memory optimization suggestion of the SQL statement to be optimized specifically includes:
acquiring index information, table-building script information and an execution plan of an SQL statement to be optimized;
acquiring index execution information of the SQL statement to be optimized through the execution plan;
generating an index optimization suggestion according to the index execution information, the index information and a preset index optimization strategy;
and generating a memory optimization suggestion according to the table building script information and a preset memory optimization strategy.
6. An SQL optimizing apparatus, the apparatus comprising:
the model creating module is used for acquiring a plurality of groups of SQL sample data and creating a decision model according to the SQL sample data;
the data acquisition module is used for acquiring the SQL statement to be optimized and the application environment information corresponding to the SQL statement to be optimized;
the semantic parsing module is used for converting the SQL statement to be optimized into an abstract syntax tree;
the optimization processing module is used for optimizing the abstract syntax tree by utilizing the decision model according to the application environment information corresponding to the SQL statement to be optimized;
the statement conversion module is used for converting the optimized abstract syntax tree into an SQL statement to obtain an optimized SQL statement;
the model creating module is specifically used for dividing the multiple groups of SQL sample data into a training set and a test set, wherein the multiple groups of SQL sample data comprise multiple groups of first SQL scripts before optimization, pre-optimized second SQL scripts corresponding to the first SQL scripts and corresponding application environment information; creating a plurality of decision trajectory data according to a first SQL script, a second SQL script and corresponding application environment information in the training set, wherein each decision trajectory data comprises a plurality of SQL grammars and an optimized operation sequence corresponding to the SQL grammars; extracting the mapping relation between SQL grammar and optimized operation in all decision track data, and constructing a data set according to the extracted mapping relation between SQL grammar and optimized operation; performing iterative computation on the data set through a separation algorithm and/or a regression algorithm, and training to obtain a decision model;
the optimization processing module is specifically used for obtaining the query condition of the SQL statement to be optimized through an abstract syntax tree; determining an optimization strategy according to the application environment information corresponding to the SQL statement to be optimized; removing redundant query conditions according to the optimization strategy; and replacing the query condition with redundancy removed with a query condition with the same semantic meaning or splitting the query condition into a plurality of query conditions according to the optimization strategy.
7. A storage medium on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 5.
8. A computer arrangement comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the computer program realizes the steps of the method of any one of claims 1 to 5 when executed by the processor.
CN202010141851.3A 2020-03-04 2020-03-04 SQL optimization method, device, storage medium and computer equipment Active CN111400338B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010141851.3A CN111400338B (en) 2020-03-04 2020-03-04 SQL optimization method, device, storage medium and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010141851.3A CN111400338B (en) 2020-03-04 2020-03-04 SQL optimization method, device, storage medium and computer equipment

Publications (2)

Publication Number Publication Date
CN111400338A CN111400338A (en) 2020-07-10
CN111400338B true CN111400338B (en) 2022-11-22

Family

ID=71435996

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010141851.3A Active CN111400338B (en) 2020-03-04 2020-03-04 SQL optimization method, device, storage medium and computer equipment

Country Status (1)

Country Link
CN (1) CN111400338B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111897802B (en) * 2020-08-10 2023-08-04 中国工商银行股份有限公司 Database container fault positioning method and system
CN112434003B (en) * 2020-09-23 2022-11-18 苏宁云计算有限公司 SQL optimization method and device, computer equipment and storage medium
CN112347100B (en) * 2020-10-27 2024-03-26 杭州安恒信息技术股份有限公司 Database index optimization method, device, computer equipment and storage medium
CN112559554B (en) * 2020-12-24 2024-01-26 北京百家科技集团有限公司 Query statement optimization method and device
CN112667666A (en) * 2020-12-31 2021-04-16 民生科技有限责任公司 SQL operation time prediction method and system based on N-gram
CN113268494B (en) * 2021-05-24 2023-06-02 中国联合网络通信集团有限公司 Method and device for processing database statement to be optimized
CN113505141A (en) * 2021-07-05 2021-10-15 浪潮云信息技术股份公司 Automated method and system for realizing database SQL optimized execution based on artificial intelligence
CN114003231B (en) * 2021-09-28 2022-07-26 厦门国际银行股份有限公司 SQL syntax parse tree optimization method and system
CN114347039B (en) * 2022-02-14 2023-09-22 北京航空航天大学杭州创新研究院 Robot look-ahead control method and related device

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101021874A (en) * 2007-03-21 2007-08-22 金蝶软件(中国)有限公司 Method and apparatus for optimizing request to poll SQL
CN103761080A (en) * 2013-12-25 2014-04-30 中国农业大学 Structured query language (SQL) based MapReduce operation generating method and system
CN107247811A (en) * 2017-07-21 2017-10-13 中国联合网络通信集团有限公司 SQL statement performance optimization method and device based on oracle database
CN107644073A (en) * 2017-09-18 2018-01-30 广东中标数据科技股份有限公司 A kind of field consanguinity analysis method, system and device based on depth-first traversal
CN109753490A (en) * 2018-12-13 2019-05-14 深圳壹账通智能科技有限公司 Database optimizing method, system, equipment and medium based on loophole reparation
CN109766354A (en) * 2018-12-04 2019-05-17 北京辰森世纪科技股份有限公司 Optimization method, device and the equipment of business datum inquiry
CN110134706A (en) * 2019-04-01 2019-08-16 平安科技(深圳)有限公司 SQL statement automatic optimization method, device, computer equipment and storage medium
CN110362597A (en) * 2019-06-28 2019-10-22 华为技术有限公司 A kind of structured query language SQL injection detection method and device
CN110555032A (en) * 2019-09-09 2019-12-10 北京搜狐新媒体信息技术有限公司 Data blood relationship analysis method and system based on metadata
CN110555035A (en) * 2018-05-31 2019-12-10 阿里巴巴集团控股有限公司 Method and device for optimizing query statement

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9146957B2 (en) * 2012-12-20 2015-09-29 Business Objects Software Ltd. Method and system for generating optimal membership-check queries

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101021874A (en) * 2007-03-21 2007-08-22 金蝶软件(中国)有限公司 Method and apparatus for optimizing request to poll SQL
CN103761080A (en) * 2013-12-25 2014-04-30 中国农业大学 Structured query language (SQL) based MapReduce operation generating method and system
CN107247811A (en) * 2017-07-21 2017-10-13 中国联合网络通信集团有限公司 SQL statement performance optimization method and device based on oracle database
CN107644073A (en) * 2017-09-18 2018-01-30 广东中标数据科技股份有限公司 A kind of field consanguinity analysis method, system and device based on depth-first traversal
CN110555035A (en) * 2018-05-31 2019-12-10 阿里巴巴集团控股有限公司 Method and device for optimizing query statement
CN109766354A (en) * 2018-12-04 2019-05-17 北京辰森世纪科技股份有限公司 Optimization method, device and the equipment of business datum inquiry
CN109753490A (en) * 2018-12-13 2019-05-14 深圳壹账通智能科技有限公司 Database optimizing method, system, equipment and medium based on loophole reparation
CN110134706A (en) * 2019-04-01 2019-08-16 平安科技(深圳)有限公司 SQL statement automatic optimization method, device, computer equipment and storage medium
CN110362597A (en) * 2019-06-28 2019-10-22 华为技术有限公司 A kind of structured query language SQL injection detection method and device
CN110555032A (en) * 2019-09-09 2019-12-10 北京搜狐新媒体信息技术有限公司 Data blood relationship analysis method and system based on metadata

Also Published As

Publication number Publication date
CN111400338A (en) 2020-07-10

Similar Documents

Publication Publication Date Title
CN111400338B (en) SQL optimization method, device, storage medium and computer equipment
US11221832B2 (en) Pruning engine
US10430469B2 (en) Enhanced document input parsing
US9971967B2 (en) Generating a superset of question/answer action paths based on dynamically generated type sets
US9122540B2 (en) Transformation of computer programs and eliminating errors
US8364696B2 (en) Efficient incremental parsing of context sensitive programming languages
US11281864B2 (en) Dependency graph based natural language processing
CN115237920A (en) Load-oriented data index recommendation method and device and storage medium
CN111984625B (en) Database load characteristic processing method and device, medium and electronic equipment
CN115292347A (en) Active SQL algorithm performance checking device and method based on rules
CN115510139A (en) Data query method and device
CN114064606A (en) Database migration method, device, equipment, storage medium and system
CN112948419A (en) Query statement processing method and device
KR102605929B1 (en) Method for processing structured data and unstructured data by allocating different processor resource and data processing system providing the method
CN117041073B (en) Network behavior prediction method, system, equipment and storage medium
EP3944127A1 (en) Dependency graph based natural language processing
CN117609075A (en) Index use efficiency evaluation method and device, electronic equipment and storage medium
CN117520631A (en) Z+ operation optimization method and device based on big data Z+ platform
CN117290377A (en) Method and device for converting SQL sentences among relational databases
CN117609274A (en) Intelligent database language generation system and method
CN118152427A (en) SQL analysis method, system, terminal and medium based on large model
Wei et al. Exploiting Data-pattern-aware Vertical Partitioning to Achieve Fast and Low-cost Cloud Log Storage
CN117668024A (en) SQL statement analysis-based data processing and synchronizing method and system
CN115543836A (en) Script quality detection method and related equipment
CN117667976A (en) Data processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20220526

Address after: 518000 China Aviation Center 2901, No. 1018, Huafu Road, Huahang community, Huaqiang North Street, Futian District, Shenzhen, Guangdong Province

Applicant after: Shenzhen Ping An medical and Health Technology Service Co.,Ltd.

Address before: Room 12G, Area H, 666 Beijing East Road, Huangpu District, Shanghai 200001

Applicant before: PING AN MEDICAL AND HEALTHCARE MANAGEMENT Co.,Ltd.

GR01 Patent grant
GR01 Patent grant