WO2023016537A1 - Database management system, data processing method, and device - Google Patents

Database management system, data processing method, and device

Info

Publication number
WO2023016537A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
target
database
performance
data
Prior art date
Application number
PCT/CN2022/111991
Other languages
French (fr)
Chinese (zh)
Inventor
李国良
李士福
周煊赫
王天庆
Original Assignee
华为技术有限公司
清华大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司, 清华大学
Publication of WO2023016537A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/092 Reinforcement learning

Definitions

  • The present application relates to the field of database management, and in particular to a database management system, a data processing method, and a device.
  • A database is a "warehouse that organizes, stores, and manages data according to a data structure": an organized, shareable, and uniformly managed collection of large amounts of data stored in a computer over a long period. Databases are critical to the efficient operation of modern businesses.
  • Database management currently depends mainly on the database administrator (DBA). The DBA often spends considerable time and effort managing and maintaining the database manually, which is error-prone and can have catastrophic effects on the uptime, performance, and security of the database.
  • For example, failure to apply patches and security updates in a timely manner can lead to database vulnerabilities and can weaken or even completely defeat database protection measures, which in turn exposes enterprises to serious data breach risks, severe financial impact, and loss of goodwill.
  • In addition, this traditional approach relies on optimization techniques rooted in expert experience, such as cost estimation, join order selection, and parameter tuning, which can no longer meet the requirements of multi-scenario services, massive numbers of applications, and extreme performance.
  • The embodiments of the present application provide a database management system, a data processing method, and a device that combine machine learning techniques to automatically perform database optimization, protection, updating, and other routine management tasks traditionally performed by DBAs, without human intervention.
  • The embodiments of the present application first provide a database management system, which may include a self-learning optimizer, a training data collector, a model manager, and a model evaluator. Some kernel components included in the optimizer may be native kernel components of the database, but the n models included in the optimizer are newly added by this application, where n ≥ 1.
  • The optimizer is configured to obtain, through the n models and according to an SQL statement input to the database, the final physical plan to be executed (the target physical plan). The target physical plan is a physical plan whose execution cost meets a preset requirement (the first preset requirement).
  • The training data collector is configured to obtain training data from the operation data of processes in the database (such as database operation metrics, query logs, and system logs) and to construct m training sets from that training data, where m ≥ 1.
  • The model manager is configured to fine-tune the first target model using the target training set corresponding to it when the first target model satisfies a preset requirement (the second preset requirement), thereby obtaining the second target model. The second target model is essentially the first target model with updated model parameters.
  • The first target model is one of the n models, and the target training set is one of the m training sets.
  • The model parameters of the fine-tuned second target model can be passed to the model evaluator for performance evaluation.
  • The model evaluator is configured to evaluate the performance of the second target model and, when that performance satisfies a preset requirement (the third preset requirement), to update the first target model to the second target model.
  • The update may proceed as follows: the model evaluator sends the model parameters of the second target model to the optimizer, and the optimizer assigns the received parameters to the first target model, thereby obtaining the second target model.
  • The model evaluator may be a performance prediction model based on graph embedding.
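  • The interaction between the model manager, the model evaluator, and the optimizer described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the `Optimizer` class, the `update_cycle` function, and the callables passed to it are hypothetical names, and model parameters are reduced to a plain dictionary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Params = Dict[str, float]  # hypothetical stand-in for real model parameters


@dataclass
class Optimizer:
    # Maps a model name to the parameters of one of the n models.
    models: Dict[str, Params]


def update_cycle(
    optimizer: Optimizer,
    name: str,
    training_set: List[float],
    should_finetune: Callable[[Params], bool],
    finetune: Callable[[Params, List[float]], Params],
    score: Callable[[Params], float],
    accept: Callable[[float, float], bool],
) -> bool:
    """Return True if the first target model was replaced by the second."""
    first = optimizer.models[name]
    if not should_finetune(first):          # second preset requirement
        return False
    second = finetune(first, training_set)  # model manager fine-tunes
    if not accept(score(second), score(first)):  # evaluator: third requirement
        return False
    optimizer.models[name] = second         # optimizer assigns new parameters
    return True


# Toy usage: "fine-tuning" just shifts a single weight.
opt = Optimizer(models={"cost_estimator": {"w": 1.0}})
updated = update_cycle(
    opt,
    "cost_estimator",
    training_set=[0.5, 0.7],
    should_finetune=lambda p: True,
    finetune=lambda p, ts: {"w": p["w"] + sum(ts) / len(ts)},
    score=lambda p: p["w"],
    accept=lambda new, old: new - old >= 0.0,
)
```

  • Passing the checks in as callables keeps the sketch independent of how the preset requirements are actually defined.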
  • In this implementation, the database management system includes an optimizer with n models, a training data collector, a model manager, and a model evaluator. The optimizer with n models replaces the traditional heuristic optimizer.
  • The training data collector generates training data for the models involved in the database from the operation data of processes in the database, which enables continuous optimization of the database and credible autonomous operation and maintenance services.
  • The model manager calls the corresponding target training set in the training data collector to fine-tune the first target model, so that the models used in the database are dynamically updated and replaced according to the real-time operating status of the database. The model evaluator provides a secondary verification capability to ensure that the best valid model is used.
  • By combining machine learning techniques, the database management system constructed in the embodiments of the present application can automatically perform database tuning, protection, updating, and other routine management tasks traditionally performed by DBAs, without human intervention.
  • The database management system may further include a suggester, which may include p models, where p ≥ 1. The suggester is configured to discover abnormal conditions (that is, abnormal data) in the operation data of processes in the database (such as CPU utilization and user response time), to diagnose the cause of the abnormality from the abnormal data, and then to optimize, based on that cause, the optimization module corresponding to it, so as to reduce the probability that the operation data of processes in the database becomes abnormal. The optimization module is also located in the suggester, and its function is to optimize the parameters of the database.
  • With the suggester added, the database management system can use machine learning methods to implement self-monitoring, self-diagnosis, and self-optimization, and thus optimize the database automatically and intelligently.
  • This addresses the problem that prior-art database monitoring, configuration, diagnosis, and optimization methods (such as parameter tuning, slow SQL diagnosis, and index/view advisors) rely on DBAs, are costly, and cannot scale to large numbers of instances (such as cloud databases).
  • The suggester may include three models, referred to as a codec, a first model, and a second model, which are used to perform the steps of self-monitoring, self-diagnosis, and self-optimization.
  • The suggester encodes and then decodes the operation data of processes in the database through the codec to obtain reconstructed data (called encoded data in this application), and compares it with the operation data that was input to the codec to identify abnormal data.
  • The principle is that the codec can restore normal original data but cannot restore abnormal original data. The input data is therefore encoded and then decoded, and the result is compared with the original data to determine whether abnormal data is present.
  • After abnormal data is found, the suggester can use the first model to diagnose the cause of the abnormality from the abnormal data, where the first model is built on a deep learning algorithm. If the operation data is query metric data (such as average latency), the suggester can diagnose the cause of the abnormality from the abnormal data through the second model, which is also built on a deep learning algorithm.
  • The suggester may also include more or fewer models for self-monitoring and self-diagnosis of the database. The three models described here are for illustration only.
  • Through these models, the suggester implements self-monitoring and self-diagnosis of the database, so that the database management system manages the operation data of the database automatically and intelligently.
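  • The codec principle above (normal data is reconstructed well, abnormal data is not) can be illustrated with a reconstruction-error check. This is a sketch under stated assumptions: `toy_codec` is a hand-written stand-in for a trained encoder-decoder and assumes that latency is roughly twice the CPU percentage in normal operation; the threshold value is also an assumption.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def reconstruction_error(original: Vector, restored: Vector) -> float:
    """Euclidean distance between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(original, restored)) ** 0.5


def find_anomalies(
    samples: List[Vector],
    codec: Callable[[Vector], Vector],
    threshold: float,
) -> List[int]:
    """Return indices of samples the codec fails to restore."""
    return [
        i
        for i, x in enumerate(samples)
        if reconstruction_error(x, codec(x)) > threshold
    ]


def toy_codec(x: Vector) -> Tuple[float, float]:
    # Encode (cpu_percent, latency_ms) into one number, then decode it back.
    cpu, latency = x
    code = (cpu + latency / 2.0) / 2.0
    return (code, 2.0 * code)


samples = [(10.0, 20.0), (40.0, 80.0), (30.0, 200.0)]
anomalies = find_anomalies(samples, toy_codec, threshold=5.0)
```

  • The first two samples follow the assumed normal pattern and are restored exactly; the third is not, so only its index is reported.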
  • The first model may include an LSTM model and a classifier. The suggester encodes the abnormal data it finds into a compressed vector (that is, a vector whose dimensionality has been reduced or raised), and then uses a learned classifier (such as a binary or multi-class classifier) to infer the corresponding root cause (such as a database backup operation).
  • The second model may include a Tree-LSTM model and a softmax function. The suggester encodes the slow query (a query with a long execution time) by calling the Tree-LSTM model, locates the physical operator (execution operator) whose operation causes the abnormality, and then uses the softmax function to identify the root cause of the abnormality.
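  • The final softmax step can be sketched as below. The scores would come from an upstream encoder such as the Tree-LSTM; here they are hard-coded, and the root-cause labels other than the backup operation are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def most_likely_cause(scores: Dict[str, float]) -> Tuple[str, float]:
    """Turn per-cause scores into probabilities and pick the most probable cause."""
    labels = list(scores)
    probs = softmax([scores[label] for label in labels])
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs[best]


cause, prob = most_likely_cause(
    {"missing_index": 2.0, "lock_contention": 0.5, "backup_running": -1.0}
)
```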
  • The first target model described above may be any one of the n models in the optimizer or any one of the p models in the suggester.
  • Because the first target model can come from either the optimizer or the suggester, its selection range is widened: the model evaluator can evaluate the performance of models in the suggester as well as models in the optimizer, which makes the solution feasible.
  • The optimizer may include three models, referred to as the third model, the fourth model, and the fifth model, which are used to perform the steps of logical query rewriting, cost estimation, and physical plan generation.
  • The optimizer first rewrites the logical query of the SQL statement (also called an SQL query) input to the database through the third model to obtain a rewritten logical plan. The third model is built on a tree search algorithm, for example Monte Carlo tree search.
  • The optimizer then generates q physical plans from the logical plan through the fourth model, where q ≥ 1. The fourth model is built on a deep learning algorithm, for example a model based on Tree-LSTM.
  • Finally, the optimizer calculates, through the fifth model, the q execution costs corresponding to the q physical plans (one execution cost per physical plan) and determines the final target physical plan to be executed from these q execution costs. The fifth model is built on a reinforcement learning algorithm, for example deep reinforcement learning (deep Q-learning, DQN).
  • Through these models, the optimizer implements logical query rewriting, cost estimation, and physical plan generation for the database, thereby replacing the traditional heuristic optimizer.
  • Combined with machine learning techniques, it converts logical queries into physical execution plans with higher execution efficiency, and can effectively address inaccurate cost estimation caused by current database architectures and poor physical plans generated for complex SQL statements.
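  • The three optimizer steps above can be sketched as a small pipeline that selects the plan with the lowest execution cost. This is illustrative only: the three callables stand in for the third, fourth, and fifth models, plans are represented as plain strings, and the toy costs are made up.

```python
from typing import Callable, List, Tuple


def choose_target_plan(
    sql: str,
    rewrite: Callable[[str], str],
    generate_plans: Callable[[str], List[str]],
    estimate_cost: Callable[[str], float],
) -> Tuple[str, float]:
    """Rewrite the query, enumerate q physical plans, return the cheapest one."""
    logical_plan = rewrite(sql)              # stand-in for the third model
    plans = generate_plans(logical_plan)     # stand-in for the fourth model
    if not plans:
        raise ValueError("q must be at least 1")
    costs = [estimate_cost(p) for p in plans]  # stand-in for the fifth model
    best = min(range(len(plans)), key=costs.__getitem__)
    return plans[best], costs[best]


# Toy usage with two hypothetical physical plans and made-up costs.
toy_costs = {"seq_scan": 120.0, "index_scan": 35.0}
plan, cost = choose_target_plan(
    "SELECT * FROM t WHERE id = 1",
    rewrite=lambda sql: sql.strip(),
    generate_plans=lambda logical: list(toy_costs),
    estimate_cost=toy_costs.__getitem__,
)
```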
  • When the performance of the second target model does not meet the third preset requirement, the model evaluator is also used to trigger the database to generate the final target physical plan using the native kernel components in the database. For example, the index selection module uses the traditional hill-climbing algorithm to create new indexes for executing the logic of the SQL statement. In other words, if the fine-tuned second target model still does not perform well enough, the traditional kernel component algorithms of the database are used to generate the target physical plan.
  • The original kernel components of the database are not deleted; they coexist with the newly added optimizer in the database software. Therefore, during operation, if the performance of the second target model does not meet the third preset requirement, the model evaluator triggers the database to generate the target physical plan with the native kernel components, because in that case they perform better. The way the target physical plan is generated is thus adjusted dynamically according to a real-time evaluation of the second target model, which improves the overall performance of the database.
  • In addition to triggering the database to generate the target physical plan with native kernel components, the model evaluator may feed back the information that the model update failed (that is, that the performance of the second target model does not meet the third preset requirement) to the model manager, so that the model manager can adjust its fine-tuning strategy for the first target model based on that information.
  • This feedback gives the model manager a reference for subsequent model training strategies, thereby improving model training capability and training efficiency.
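  • A minimal sketch of this fallback path, assuming the evaluator's verdict is already available as a boolean. The function and parameter names are hypothetical, and the feedback to the model manager is modeled as a simple callback.

```python
from typing import Callable, List


def generate_target_plan(
    sql: str,
    second_model_accepted: bool,
    learned_optimizer: Callable[[str], str],
    native_components: Callable[[str], str],
    notify_model_manager: Callable[[str], None],
) -> str:
    """Use the learned optimizer if the update passed, else fall back."""
    if second_model_accepted:
        return learned_optimizer(sql)
    # The update failed: report it so the fine-tuning strategy can be
    # adjusted, and let the native kernel components produce the plan.
    notify_model_manager("model update failed: third preset requirement not met")
    return native_components(sql)


feedback: List[str] = []
plan = generate_target_plan(
    "SELECT 1",
    second_model_accepted=False,
    learned_optimizer=lambda sql: "learned plan",
    native_components=lambda sql: "native plan",
    notify_model_manager=feedback.append,
)
```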
  • The execution cost meeting the first preset requirement includes, but is not limited to, the following cases:
  • The execution cost of the target physical plan is the lowest among the q execution costs. The q execution costs are those of the q physical plans generated from the SQL statement input to the database, with one execution cost per physical plan, where q ≥ 1.
  • The execution cost of the target physical plan is lower than a preset value (the first preset threshold).
  • In the following, the case where the target physical plan has the lowest of the q execution costs is used as the example of the execution cost meeting the first preset requirement, and this is not repeated later.
  • The first target model meeting the second preset requirement includes, but is not limited to, the following cases:
  • The performance of the first target model declines. Its performance can be predicted by the model evaluator, and the requirement is met when the predicted probability of a performance decline reaches a preset value (the third preset threshold), for example 80%.
  • The continuous running time of the first target model reaches a preset duration, for example 30 minutes.
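  • The two example conditions above can be written as a single predicate. Combining them with a logical OR is an assumption made for this sketch, since the text only lists them as possible cases; the default values are the examples given above (80% and 30 minutes).

```python
def meets_second_requirement(
    predicted_decline_probability: float,
    continuous_runtime_seconds: float,
    probability_threshold: float = 0.8,          # the "third preset threshold"
    runtime_threshold_seconds: float = 30 * 60,  # the preset duration
) -> bool:
    """Return True if fine-tuning of the first target model should be triggered."""
    return (
        predicted_decline_probability >= probability_threshold
        or continuous_runtime_seconds >= runtime_threshold_seconds
    )
```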
  • The second target model meeting the third preset requirement may include, but is not limited to, the following cases:
  • The performance of the second target model is improved by a preset value (the fourth preset threshold) compared with the performance of the first target model. For example, the fourth preset threshold can be zero, which means that the second target model meets the third preset requirement as soon as its performance reaches that of the original first target model. As another example, the fourth preset threshold can be a value or a ratio greater than zero, which means that the second target model meets the third preset requirement only when its performance improves on that of the original first target model by a certain margin.
  • The performance of the second target model is improved by a fifth preset threshold compared with the performance of the native kernel components in the database. This verifies the improvement of the second target model over the traditional database algorithm: if the improvement reaches the threshold, the model used by the corresponding module of the database is actually replaced; otherwise, the traditional database algorithm is still used to generate the target physical plan.
  • The fifth preset threshold may be zero, or a value or ratio greater than zero. For details, see the first case above.
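  • Both cases reduce to comparing a candidate's performance against a baseline (the first target model or the native kernel components) with an absolute or relative threshold. The sketch below assumes that a higher performance score is better and that the baseline is positive when a ratio is used; the function name is hypothetical.

```python
def improves_by(
    candidate: float,
    baseline: float,
    threshold: float = 0.0,
    relative: bool = False,
) -> bool:
    """Check whether candidate beats baseline by at least the threshold.

    With relative=False the threshold is an absolute value; with
    relative=True it is a ratio of the baseline (baseline must be > 0).
    """
    gain = candidate - baseline
    if relative:
        gain = gain / baseline
    return gain >= threshold


# A threshold of zero accepts a model that merely matches the baseline.
matches_first_model = improves_by(1.0, 1.0, threshold=0.0)
# A 10% ratio threshold rejects a 5% gain over the native components.
beats_native = improves_by(1.05, 1.0, threshold=0.10, relative=True)
```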
  • The second aspect of the embodiments of the present application provides a data management method. First, a local computer device receives an SQL statement sent by a client device to the database deployed in the computer device, where the database includes an optimizer and native kernel components of the database, and the optimizer includes n models, n ≥ 1. After receiving the SQL statement, the computer device first determines whether, among the n models in the optimizer, there is a first target model (that is, one of the n models) that does not meet a preset requirement (the second preset requirement).
  • If no such model exists, the computer device obtains the target physical plan through the n models included in the optimizer according to the SQL statement, where the target physical plan is a physical plan whose execution cost meets the first preset requirement.
  • After obtaining the final target physical plan, the computer device executes it. In essence, this execution uses the generated target physical plan to carry out the actual logic of the input SQL statement.
  • In this implementation, the computer device obtains the target physical plan through the optimizer included in the database and then executes it. Because the database deployed in the computer device includes an optimizer with n models that replaces the traditional heuristic optimizer, logical queries are converted, with the help of machine learning techniques, into physical execution plans with higher execution efficiency.
  • The optimizer may include three models, referred to as the third model, the fourth model, and the fifth model, which are used to perform the steps of logical query rewriting, cost estimation, and physical plan generation.
  • The computer device may obtain the target physical plan through the n models according to the SQL statement as follows. First, it rewrites the logical query of the input SQL statement (also called an SQL query) through the third model to obtain a rewritten logical plan, where the third model is built on a tree search algorithm, for example Monte Carlo tree search. It then generates q physical plans from the logical plan through the fourth model, where the fourth model is built on a deep learning algorithm, for example a model based on Tree-LSTM, and q ≥ 1. Finally, it calculates, through the fifth model, the q execution costs corresponding to the q physical plans (one execution cost per physical plan) and determines the target physical plan from them.
  • Through these models, the optimizer implements logical query rewriting, cost estimation, and physical plan generation for the database, thereby replacing the traditional heuristic optimizer.
  • Combined with machine learning techniques, it converts logical queries into physical execution plans with higher execution efficiency, and can effectively address inaccurate cost estimation caused by current database architectures and poor physical plans generated for complex SQL statements.
  • The computer device may also send the operation data of processes in the database deployed on it to the suggester. The suggester may be deployed in the computer device or in a remote device, which is not limited here.
  • The suggester can find abnormal data in the operation data, diagnose the cause of the abnormality from the abnormal data, and finally optimize the self-optimization module corresponding to that cause, so as to reduce the probability that the operation data of processes in the database subsequently becomes abnormal.
  • In this implementation, the computer device feeds back the operation data of processes in the database to the suggester, which can give comprehensive optimization suggestions for the database based on that data. This enables unattended database performance monitoring and root cause identification, greatly reduces operation and maintenance effort, and helps the database system recover quickly from abnormalities or improve its performance.
  • In addition to sending the operation data of processes in the database to the suggester, the computer device may send it to the training data collector. The training data collector may be deployed in the computer device or in a remote device, which is not limited here. After receiving the operation data, the training data collector obtains training data from it and constructs m training sets from the training data, where m ≥ 1.
  • In this implementation, the computer device feeds back the operation data of processes in the database to the training data collector, which generates training data for the models involved in the database. This enables continuous optimization of the database system, reduces the probability of misjudgment by the database system, and provides credible autonomous operation and maintenance services.
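  • One way to picture the construction of m training sets is to group collected records by the model they are meant to train. The record layout (`model`, `features`, `label`) is an assumption for this sketch and is not specified by the application.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Example = Tuple[List[float], float]


def build_training_sets(records: List[Dict[str, Any]]) -> Dict[str, List[Example]]:
    """Group (features, label) pairs into one training set per model name."""
    sets: Dict[str, List[Example]] = defaultdict(list)
    for record in records:
        sets[record["model"]].append((record["features"], record["label"]))
    return dict(sets)


# Toy operation data: two records for a cost model, one for an anomaly codec.
records = [
    {"model": "cost_estimator", "features": [3.0, 1.0], "label": 42.0},
    {"model": "cost_estimator", "features": [5.0, 2.0], "label": 77.0},
    {"model": "anomaly_codec", "features": [0.9, 120.0], "label": 0.0},
]
training_sets = build_training_sets(records)  # here m == 2
```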
  • The computer device may further send a first instruction to a model manager (which may or may not be deployed on the computer device; this is not limited here). The first instruction instructs the model manager to fine-tune the first target model. When the performance of the second target model satisfies a preset requirement (the third preset requirement), the computer device receives the model parameters of the second target model sent by the model manager, where the second target model is obtained by the model manager by fine-tuning the first target model with the corresponding target training set. The target training set is one of the m training sets.
  • The computer device then updates the first target model to the second target model and obtains the target physical plan through the updated n models, which at this point include the second target model and no longer include the first target model.
  • In this implementation, if the first target model does not meet the second preset requirement, the model manager fine-tunes it with the corresponding target training set, and the first target model is updated when the performance of the resulting second target model meets the third preset requirement. The models used in the database are thus dynamically updated and replaced according to its real-time operating status.
  • When the performance of the second target model does not meet the third preset requirement, the computer device also receives a second instruction sent by the model evaluator. The second instruction instructs the database to generate the final target physical plan using the native kernel components in the database. The model evaluator, which evaluates the performance of the second target model, may be deployed in the computer device or in a remote device, which is not limited here.
  • In this implementation, the second instruction from the model evaluator directs the database to generate the target physical plan with its traditional algorithms (that is, the native kernel components). The embodiments of the present application thus provide more than one way to generate the target physical plan, which adds flexibility.
  • The first target model meeting the second preset requirement includes, but is not limited to, the following cases:
  • The performance of the first target model declines. Its performance can be predicted by the model evaluator, and the requirement is met when the predicted probability of a performance decline reaches a preset value (the third preset threshold), for example 80%.
  • The continuous running time of the first target model reaches a preset duration, for example 30 minutes.
  • The second target model meeting the third preset requirement may include but is not limited to:
  • compared with the performance of the first target model, the performance of the second target model is improved by a certain preset value (which may be called the fourth preset threshold).
  • For example, the fourth preset threshold can be zero, which means that as soon as the performance of the second target model reaches the performance level of the original first target model, the second target model is considered to meet the third preset requirement. As another example, the fourth preset threshold can be a certain value or ratio greater than zero, which means that the second target model is considered to meet the third preset requirement only when its performance is improved to a certain extent compared with the performance of the original first target model;
  • the performance of the second target model is improved by a fifth preset threshold compared with the performance of the native kernel components in the database. That is, the performance improvement of the second target model over the traditional database algorithm is verified. If the improvement reaches the threshold, the model used by the corresponding module of the database is actually replaced; otherwise, the traditional database algorithm is still used to generate the target physical plan.
  • The value of the fifth preset threshold may be zero, or a certain value or ratio greater than zero. For details, refer to the first case above, which is not repeated here.
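The two checks described above (when to retrain the first target model, and when a retrained second target model may replace a baseline) can be sketched roughly as follows. This is a minimal illustration, not the application's implementation: the names `UpdatePolicy`, `meets_second_requirement`, `meets_third_requirement`, and `choose_generator` are hypothetical, the default values 80% and 30 minutes are the examples given in the text, and performance is assumed to be a single number where higher is better.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdatePolicy:
    # Hypothetical container; the defaults are the examples from the text.
    decline_prob_threshold: float = 0.80  # "third preset threshold"
    max_runtime_minutes: float = 30.0     # preset continuous running time


def meets_second_requirement(decline_prob, runtime_minutes, policy=UpdatePolicy()):
    """Return True if the first target model should be retrained or fine-tuned."""
    return (decline_prob >= policy.decline_prob_threshold
            or runtime_minutes >= policy.max_runtime_minutes)


def meets_third_requirement(candidate_perf, baseline_perf, min_gain=0.0):
    """Return True if the second target model may replace the baseline.

    baseline_perf is either the first target model's performance (min_gain is
    then the fourth preset threshold) or the native kernel component's
    performance (min_gain is then the fifth preset threshold).
    Assumption: higher numbers mean better performance.
    """
    return candidate_perf - baseline_perf >= min_gain


def choose_generator(candidate_perf, native_perf, min_gain=0.0):
    """Decide which component generates the target physical plan."""
    if meets_third_requirement(candidate_perf, native_perf, min_gain):
        return "second_target_model"
    return "native_kernel_component"
```

With `min_gain=0.0` a candidate that merely matches the baseline is accepted, which mirrors the case in which the threshold is zero.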
  • The execution cost meeting the first preset requirement includes but is not limited to:
  • the execution cost of the target physical plan is the lowest among q execution costs.
  • The q execution costs are the execution costs corresponding to the q physical plans generated based on the SQL statement input into the database.
  • One physical plan corresponds to one execution cost, where q ≥ 1;
  • the execution cost of the target physical plan is lower than a certain preset value (which may be referred to as the first preset threshold).
  • In the following, the case in which the execution cost of the target physical plan is the lowest among the q execution costs is taken as the example of the execution cost meeting the first preset requirement, and this is not repeated later.
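The two selection rules for the target physical plan can be illustrated with a short sketch. The function name `select_target_plan` and the representation of a plan as an arbitrary Python object are assumptions made only for this example.

```python
def select_target_plan(plans, costs, threshold=None):
    """Pick the target physical plan from q candidate plans.

    threshold is None: return the plan with the lowest execution cost.
    threshold given:   return the first plan whose cost is below the
                       threshold (the "first preset threshold"), or None.
    """
    if not plans or len(plans) != len(costs):
        raise ValueError("need q >= 1 plans and exactly one cost per plan")
    if threshold is None:
        best = min(range(len(costs)), key=costs.__getitem__)
        return plans[best]
    for plan, cost in zip(plans, costs):
        if cost < threshold:
            return plan
    return None
```

For example, `select_target_plan(["hash_join", "nested_loop"], [12.0, 40.0])` returns `"hash_join"`.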
  • A third aspect of the embodiments of the present application provides a computer device that has the function of implementing the method of the second aspect or of any possible implementation of the second aspect.
  • This function may be implemented by hardware, or by hardware executing corresponding software.
  • The hardware or software includes one or more modules corresponding to the above function.
  • A fourth aspect of the embodiments of the present application provides a computer device, which may include a memory, a processor, and a bus system. The memory is used to store a program, and the processor is used to call the program stored in the memory to execute the method of the second aspect or of any possible implementation of the second aspect.
  • A fifth aspect of the embodiments of the present application provides a computer-readable storage medium that stores instructions which, when run on a computer, cause the computer to execute the method of the second aspect or of any possible implementation of the second aspect.
  • A sixth aspect of the embodiments of the present application provides a computer program which, when run on a computer, causes the computer to execute the method of the second aspect or of any possible implementation of the second aspect.
  • A seventh aspect of the embodiments of the present application provides a chip. The chip includes at least one processor and at least one interface circuit coupled to the processor. The at least one interface circuit performs sending and receiving functions and sends instructions to the at least one processor. The at least one processor runs computer programs or instructions and has the function of implementing the method of the second aspect or of any possible implementation of the second aspect. This function may be implemented by hardware, by software, or by a combination of hardware and software, where the hardware or software includes one or more modules corresponding to the above function.
  • In addition, the interface circuit is used to communicate with other modules outside the chip.
  • FIG. 1 is a schematic diagram of a system architecture for constructing a database management system provided by an embodiment of the present application;
  • FIG. 2 is a schematic diagram of a logical architecture of the database management system provided by an embodiment of the present application;
  • FIG. 3 is a schematic diagram of the principle of the optimizer provided by an embodiment of the present application;
  • FIG. 4 is a schematic diagram of the principle of the model evaluator provided by an embodiment of the present application;
  • FIG. 5 is a schematic diagram of the principle of the suggester provided by an embodiment of the present application;
  • FIG. 6 is a schematic diagram comparing the optimizer provided by an embodiment of the present application with three re-strategies;
  • FIG. 7 is a schematic diagram comparing the model evaluator provided by an embodiment of the present application with two known performance evaluation methods;
  • FIG. 8 is a schematic flowchart of a data processing method provided by an embodiment of the present application;
  • FIG. 9 is a schematic structural diagram of a computer device provided by an embodiment of the present application;
  • FIG. 10 is another schematic structural diagram of a computer device provided by an embodiment of the present application.
  • The embodiments of the present application provide a database management system, a data processing method, and a device. By combining machine learning technology, routine database management tasks traditionally performed by DBAs, such as database optimization, protection, and update, can be performed automatically without human intervention.
  • The embodiments of the present application involve a considerable amount of background knowledge about databases, models, and the like.
  • The following therefore first introduces related terms and concepts that may be involved in the embodiments of the present application. It should be understood that the interpretation of a concept may be constrained by the specific circumstances of an embodiment, but this does not mean that the application is limited to those circumstances, and the specific circumstances may differ between embodiments. This is not limited here.
  • A database is a computer software system that stores and manages data according to its data structure.
  • The concept of a database has two meanings:
  • a. A database is an entity, a "warehouse" that can reasonably store data. Users store the transaction data to be managed in this "warehouse", and the two concepts of "data" and "warehouse" are combined into "database".
  • b. A database is a new method and technology of data management. It can organize data more appropriately, maintain data more conveniently, control data more closely, and use data more effectively.
  • Database software is deployed on local devices, such as local servers and local terminal devices (mobile phones, smart watches, personal computers, and so on), and usually exists in the form of one or more processes, so database software can also be called a database process.
  • A neural network is composed of neural units. It can be understood as a network with an input layer, hidden layers, and an output layer: generally, the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. A neural network with many hidden layers is called a deep neural network (DNN).
  • The work of each layer in a neural network can be described by the mathematical expression $\vec{y} = a(W \cdot \vec{x} + b)$. From the physical level, the work of each layer in the neural network can be understood as completing the transformation from the input space to the output space (that is, from the row space of the matrix to its column space) through five operations on the input space (a collection of input vectors). These five operations include raising or lowering the dimension and scaling up or down, among others.
  • Here, "space" refers to the collection of all individuals of a class of things, and W is the weight matrix of each layer of the neural network; each value in this matrix represents the weight value of a neuron in the layer.
  • the matrix W determines the space transformation from the input space to the output space mentioned above, that is, the W of each layer of the neural network controls how to transform the space.
  • the purpose of training the neural network is to finally obtain the weight matrix of all layers of the trained neural network. Therefore, the training process of the neural network is essentially to learn the way to control the spatial transformation, and more specifically, to learn the weight matrix.
  • During training, the error back propagation (BP) algorithm can be used to correct the parameters of the initial neural network model so that the reconstruction error loss of the model becomes smaller and smaller. Specifically, propagating the input signal forward to the output produces an error loss, and the parameters of the initial neural network model are updated by propagating the error loss information backward, so that the error loss converges.
  • The backpropagation algorithm is a backward pass driven by the error loss, and it aims to obtain the optimal parameters of the neural network model, such as the weight matrix.
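To make the forward pass and the backward update concrete, here is a minimal sketch of training a single layer on one sample. It assumes a layer of the common form y = a(Wx + b) with a = tanh, a squared-error loss, and plain gradient descent; the function names, learning rate, and example numbers are illustrative assumptions, not taken from the application.

```python
import math


def forward(W, b, x):
    """One layer: y = tanh(W x + b)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]


def train_step(W, b, x, target, lr=0.1):
    """Forward pass, squared-error loss, then one backpropagation update.

    Updates W and b in place and returns the loss measured before the update.
    """
    y = forward(W, b, x)
    loss = 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, target))
    for i, (yi, ti) in enumerate(zip(y, target)):
        # dLoss/dz_i, using tanh'(z) = 1 - tanh(z)^2
        delta = (yi - ti) * (1.0 - yi * yi)
        for j, xj in enumerate(x):
            W[i][j] -= lr * delta * xj
        b[i] -= lr * delta
    return loss


def train(W, b, x, target, steps=200):
    """Repeat the update and return the loss history."""
    return [train_step(W, b, x, target) for _ in range(steps)]
```

Calling `train` on a small weight matrix returns a loss history that shrinks as the weight matrix is learned, which is the convergence behaviour the text describes.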
  • Machine learning is an interdisciplinary subject that involves probability theory, statistics, convex analysis, algorithmic complexity theory, and other disciplines. It studies how computers can simulate or implement human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve their own performance.
  • the following introduces several machine learning models used in the embodiments of this application:
  • Monte Carlo tree search (MCTS) is a method for making optimal decisions in artificial intelligence problems, typically for move planning in combinatorial games. It combines the generality of random simulation with the precision of tree search. Owing to its achievements in computer Go and its potential for solving some hard problems, the Monte Carlo tree search algorithm can be applied to any field that can be described in the form of (state, action) pairs and in which results can be predicted by simulation (for example, the rewrite sequence selection problem in query rewriting).
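As a small illustration of how tree search balances known good actions against untried ones, the sketch below shows a selection step based on the UCB1 score, a common choice in MCTS implementations. The application does not state which selection rule it uses, so UCB1, the exploration constant, and the rewrite-rule names in the usage note are assumptions.

```python
import math


def ucb1(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score of a child node; unvisited children are tried first."""
    if visits == 0:
        return math.inf
    exploit = total_reward / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore


def select_child(children, parent_visits):
    """children maps an action to a (total_reward, visits) pair."""
    return max(children, key=lambda a: ucb1(*children[a], parent_visits))
```

In a query-rewriting setting, an action could be a rewrite rule, for example `select_child({"push_down_predicate": (8.0, 10), "remove_subquery": (0.0, 0)}, 10)` picks the untried rule.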
  • Recurrent neural networks (RNN)
  • An RNN is a kind of neural network whose purpose is to process sequence data.
  • In a traditional neural network model, adjacent layers are fully connected, while the nodes within each layer are not connected to each other.
  • Such an ordinary neural network is powerless for many problems. For example, predicting the next word of a sentence generally requires the previous words, because the words in a sentence are not independent of each other. An RNN is called a recurrent neural network because the current output for a sequence also depends on the previous outputs.
  • In theory, an RNN can process sequence data of any length.
  • A recurrent neural network can describe dynamic temporal behavior: unlike a feedforward neural network, which accepts input with a more specific structure, an RNN circulates state through its own network and can therefore accept a wider range of time-series input.
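The idea that the current output depends on the previous state can be shown with a scalar recurrent cell. This is a toy sketch: the update h_t = tanh(w_x * x_t + w_h * h_{t-1} + b) is the standard simple-RNN form, and the weights used here are arbitrary illustrative values.

```python
import math


def rnn_step(x, h_prev, w_x=0.5, w_h=0.8, b=0.0):
    """One recurrent step: the new state mixes the input with the old state."""
    return math.tanh(w_x * x + w_h * h_prev + b)


def run_rnn(sequence, h0=0.0):
    """Feed a sequence of any length through the cell and return the final state."""
    h = h0
    for x in sequence:
        h = rnn_step(x, h)
    return h
```

Because the state is carried forward, `run_rnn([1.0, 0.0])` and `run_rnn([0.0, 1.0])` give different results even though they contain the same values.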
  • Long short-term memory (LSTM) is also called a long short-term memory network. It is a recurrent neural network over time, designed specifically to solve the long-term dependence problem of ordinary RNNs. All RNNs have the form of a chain of repeated neural network modules; in a standard RNN, this repeated module has a very simple structure, such as a single tanh layer.
  • Tree long short-term memory network (Tree-LSTM)
  • Tree-LSTM extends LSTM to tree-shaped input structures, and it surpasses the traditional LSTM model on semantic relatedness prediction and semantic classification tasks over trees.
  • Convolutional neural networks (CNN)
  • the CNN is a deep neural network with a convolutional structure.
  • the convolutional neural network includes a feature extractor composed of a convolutional layer and a subsampling layer.
  • the feature extractor can be seen as a filter, and the convolution process can be seen as using a trainable filter to convolve with an input image or convolutional feature map.
  • the convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network.
  • In a convolutional layer, a neuron may be connected to only some of the neurons in the adjacent layer.
  • a convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units.
  • Neural units of the same feature plane share weights, and the shared weights here are convolution kernels.
  • Shared weights can be understood as a way to extract image information that is independent of location. The underlying principle is that the statistical information of a certain part of the image is the same as that of other parts. That means that the image information learned in one part can also be used in another part. So for all positions on the image, we can use the same learned image information.
  • In the same convolutional layer, multiple convolution kernels can be used to extract different image information. Generally, the more convolution kernels there are, the richer the image information reflected by the convolution operation.
  • the convolution kernel can be initialized in the form of a matrix of random size, and the convolution kernel can obtain reasonable weights through learning during the training process of the convolutional neural network.
  • the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
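Weight sharing can be seen directly in code: the same small kernel is applied at every position of the input. The following pure-Python sketch computes a "valid" 2D convolution (implemented as cross-correlation, as is usual in CNN libraries); the function name and the example kernels are illustrative assumptions.

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]
```

A 2x2 kernel has only four weights regardless of the image size, which is why sharing weights reduces the number of connections between layers.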
  • Graph convolutional network (GCN)
  • Reinforcement learning (RL) is a field of machine learning that emphasizes how to act based on the environment so as to maximize the expected benefit.
  • RL is the third basic machine learning paradigm besides supervised learning and unsupervised learning. Unlike supervised learning, RL does not require labeled input/output pairs, nor does it require non-optimal actions to be corrected precisely. Its focus is finding the balance between exploration (of unknown territory) and exploitation (of existing knowledge). The exploration-exploitation trade-off in reinforcement learning has been studied most thoroughly in the multi-armed bandit problem and in finite Markov decision processes (MDP).
  • Here, RL controls the execution order of the join operations so that the total query execution cost is minimized.
  • RL also has a training process, which requires continuously executing actions, observing their effects, and accumulating experience to form a model. Unlike in supervised learning, each action generally has no directly calibrated label value as a supervisory signal; the system only gives feedback on the action performed by the algorithm.
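The balance between exploration and exploitation can be illustrated with an epsilon-greedy agent on a multi-armed bandit. This is a toy sketch, not the join-order model of the application: each arm stands for a hypothetical candidate join order, and the reward is the negative of an assumed execution cost, so that maximizing reward minimizes cost.

```python
import random


def epsilon_greedy(reward_fn, n_arms, steps=500, epsilon=0.1, seed=0):
    """Learn per-arm value estimates from feedback only (no labels)."""
    rng = random.Random(seed)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: try a random arm
        else:
            arm = max(range(n_arms), key=estimates.__getitem__)  # exploit
        reward = reward_fn(arm)
        counts[arm] += 1
        # Incremental running average of the observed rewards.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts
```

For example, with assumed costs `[30.0, 10.0, 20.0]` and `reward_fn = lambda arm: -costs[arm]`, the agent ends up preferring the arm with the lowest cost.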
  • Remote procedure call (RPC) protocol
  • RPC is a protocol for requesting services from remote computer programs over a network without requiring knowledge of the underlying network technology.
  • RPC is used to realize the rapid interaction between the database kernel component and the external model manager, such as making a model update request, creating a new model, and so on.
  • The core idea of SageDB is to build multiple cumulative distribution function (CDF) models of the data distribution, and to use these CDF models to generate learned indexes, replace cost estimation models, and accelerate physical operators.
  • SageDB first assumes that it can learn a "perfect" CDF model, that is, a model whose probability distribution accurately matches the data distribution of the corresponding data table. It then inserts the CDF model into different modules of the database to provide machine-learning-based inference: 1) for the optimizer, SageDB directly replaces the cost estimation model with the CDF model learned on a single table in order to estimate the cost and cardinality of different queries; 2) for data structures, SageDB directly replaces the blocks of the traditional multi-way search tree (balanced tree, B-Tree) with the learned CDF model, and summarizes a list of abnormal cases to correct CDF positioning errors; 3) for physical operator acceleration, taking the sorting operation as an example, SageDB first sorts the underlying data roughly based on the learned CDF model (input: data values; output: relative position numbers), and then uses a traditional sorting algorithm (such as quick sort) to obtain the final sorted result.
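The sorting example in item 3) can be sketched as follows. This is not SageDB code: the "learned" CDF is approximated here by an empirical CDF built from a sample, the bucket count is arbitrary, and Python's built-in `sorted` stands in for the traditional sorting algorithm.

```python
import bisect


def fit_cdf(sample):
    """Return a monotone CDF estimate built from a sample of the data."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda v: bisect.bisect_right(ordered, v) / n


def cdf_sort(values, cdf, n_buckets=16):
    """Rough placement by predicted position, then an exact sort per bucket."""
    buckets = [[] for _ in range(n_buckets)]
    for v in values:
        # The CDF maps a value to a relative position in [0, 1].
        idx = min(int(cdf(v) * n_buckets), n_buckets - 1)
        buckets[idx].append(v)
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))  # traditional sort fixes local order
    return result
```

Because the CDF estimate is monotone, a smaller value never lands in a later bucket than a larger one, so concatenating the sorted buckets yields a fully sorted result even when the model is imprecise; a poor model only makes the buckets uneven.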
  • SageDB is still at the research stage and uses a simple experimental CDF model that is difficult to adapt to large-scale data sets. All learning functions of SageDB are based on the cluster of CDF models learned on a single table, which suits simple single-table query scenarios but cannot deal effectively with multi-table joins. In addition, a CDF is used only to learn the data or load distribution and cannot provide intelligent decision-making functions such as query rewriting, query plan generation, and anomaly diagnosis, and SageDB provides no mechanism for managing and updating multiple CDF models.
  • Oracle Database has a long-term investment in the automated operation and maintenance of the database.
  • Oracle 10g introduced various self-management functions to simplify management, improve efficiency, and reduce the total cost of system management. These management functions include: 1) statistical analysis related to SQL query optimization; 2) automatic storage manager: simplifies how data files, control files, and log files are stored; 3) automatic workload repository: stores and manages tuning information; 4) automatic database diagnostic monitor: analyzes stored statistics, identifies possible performance bottlenecks, and provides suggestions for solving the problems found; 5) automatic query optimization: determines the execution structure by using query rewriting rules and cost models; 6) automatic generation of tuning suggestions for SQL statements or workloads;
  • 7) SQL tuning recommendations: decisions are made based on information provided by the query optimizer, including the automatic database diagnostic monitor and the automatic workload repository; 8) recommendation of indexes (including bitmap indexes, function-based indexes, and B-tree indexes), materialized views, and table partitions based on the current workload;
  • 9) optimizer statistics collector: collects statistics used for optimization; 10) coordination of all self-management in the server by managing database snapshots and stored information; 11) server-generated alerts: the system is configured to generate alerts automatically when events are triggered; 12) automated pre-installation and post-installation tasks: the system is checked automatically before installation to ensure that the installation succeeds, and changes are recommended; 13) automatic management of the shared memory used by the Oracle database instance, which removes the need for administrators to configure shared memory components manually; 14) database resource manager: allows the DBA to divide the workload logically into different units and to allocate central processing unit (CPU) resources to these units without additional overhead.
  • For example, on-line transaction processing (OLTP) should take precedence over on-line analytical processing (OLAP), and vice versa.
  • Its scheduling mechanism works on fixed time intervals and controls the number of active sessions executing at one time. When the available active-session slots are filled, the remaining sessions are queued until a slot becomes available.
  • Its first autonomous database release is 19c, which provides external services in the form of a public cloud, including functions such as automatic indexing, distributed columns, and materialized view recommendations.
  • The optimization capabilities of the Oracle database are mainly presented as single-point, independent functions that are not unified into a closed loop, and users must call them on demand according to their own needs. In addition, the autonomous functions of the Oracle database are mainly reflected in the analysis and management stage and are based on limited rules or traditional statistical learning, so the ability to optimize database exceptions is limited. Moreover, the Oracle database provides neither a unified mechanism for managing and updating models and training data nor a component performance verification function, so its tuning is passive.
  • the embodiment of this application provides a new database management system, which is based on machine learning algorithms and expert experience, realizes self-learning kernel and model optimization, and builds all-round autonomous functions of the database.
  • FIG. 1 is a schematic diagram of a system architecture for constructing the database management system provided by an embodiment of the present application.
  • The system architecture includes the database software 101, the machine learning platform component 102, and the self-learning suggester 103 (which may be referred to as the suggester 103 for short).
  • The function of each module is introduced below:
  • the database software 101 (similar to server software) is deployed on a local device, for example, can be deployed on a local server or a local terminal device (such as a mobile phone, a personal computer, etc.), and usually exists in the form of a single or multiple processes.
  • The system architecture of the embodiments of the present application includes the self-learning kernel components of the database, which replace the algorithms or implementations of the native kernel components of a traditional database (note that the native kernel components are not deleted and remain in the database process), so as to improve overall database reliability or performance.
  • A self-learning kernel component does not merely replace a single algorithm with machine learning at a single point. Its distinguishing capabilities are automatic adaptation to scenarios based on changes in system load or business status, automatic model updates based on algorithm training, and access to feedback and verification mechanisms, which together handle model drift automatically and keep the model continuously available.
  • The self-learning kernel components may be a self-learning optimizer 1011 (which may be referred to as the optimizer 1011 for short), a self-learning index, self-learning storage, a self-learning executor, and so on. Different self-learning kernel components implement the functions of the corresponding modules in the native kernel components.
  • the implementation mechanism and calling logic of the self-learning kernel component are mainly introduced by using the optimizer 1011 .
  • The data source is the database system, and the collected data includes but is not limited to internal indicators of the database (such as transactions per second (TPS), cache hits, active transactions, and resource usage), operating system information, log information, and the like.
  • The model manager 1022 (also referred to as the model management platform) completes model training by using the training algorithm combined with the data information.
  • The trained model is pushed to the model evaluator 1023. Only a model that has been evaluated and meets business expectations is identified as a model to be applied; otherwise the model must be readjusted and retrained.
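The train, evaluate, and retrain loop between the model manager and the model evaluator can be sketched as follows. The function and parameter names are hypothetical, the `train` and `evaluate` callables stand in for the real components, and the round limit is an assumption added so that the loop always terminates.

```python
def train_until_accepted(train, evaluate, expected_score, max_rounds=5):
    """Retrain until the evaluator's score meets the business expectation.

    train(round_no) returns a candidate model; evaluate(model) returns a score
    where higher is better. Returns (model, rounds_used), or (None, max_rounds)
    if no candidate was accepted.
    """
    for round_no in range(1, max_rounds + 1):
        model = train(round_no)
        if evaluate(model) >= expected_score:
            return model, round_no  # accepted: becomes the model to be applied
    return None, max_rounds
```

A caller that receives `None` would keep using the currently deployed model or the native kernel component.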
  • The information collector can be deployed in the database software 101, or it can be a process deployed separately outside the database software 101.
  • The purpose of separate deployment is to decouple the collector from the database while still implementing its specific function (that is, collecting the operating data of the database process). The embodiments of the present application do not limit the deployment mode of the information collector.
  • Because the function of the information collector is to collect data, this function can be integrated in the optimizer 1011 or in the training data collector 1021.
  • In the embodiments of the present application, the case in which the training data collector 1021 also functions as the information collector is taken as an example, and this is not repeated later.
  • the models included in the machine learning platform component 102 may be pre-trained in advance, and the models deployed in the system architecture may refer to pre-trained models.
  • In this case, the training data collected by the training data collector 1021 can be used to fine-tune the pre-trained models (for example, fine-tuning can be performed when the performance of a model declines after it has been applied for a period of time). Alternatively, the models included in the machine learning platform component 102 may not be pre-trained; initialized models are deployed directly on the system architecture, and the training data collected by the training data collector 1021 is then used to train each model and to fine-tune it subsequently. This application does not limit the state of the models included in the machine learning platform component 102 at deployment time (that is, whether they have been pre-trained).
  • The machine learning platform component 102 may be deployed on a remote device (for example, a remote server, as shown in FIG. 1), or on a local device (for example, a local server), or it may even be implemented together with the database in the same process (that is, implemented in the database kernel).
  • Deployment as an independent component program is the least intrusive to the existing capabilities of the database, and the component can serve as an accessory that gradually and iteratively replaces the capabilities of database kernel modules. Deployment on a local device is likewise less intrusive to the database, but it competes with the database for the same device resources, so a new scheduling component is usually required for balancing and resource control.
  • Integrating the machine learning platform component in the database kernel, that is, having the database itself provide machine learning capabilities (including but not limited to deep learning and reinforcement learning), is highly intrusive to the database, but it offers good data privacy protection, lower communication overhead, a simple and convenient interface, and easier model tuning or fine-tuning.
  • The suggester 103 is used to discover possible problems while the database is running, and to perform diagnosis and tuning for intelligent operation and maintenance management of the database.
  • The suggester 103 also needs a machine learning platform component to manage the algorithm models used in the intelligent operation and maintenance process.
  • The machine learning platform used can be the same one used by the database system, that is, the two can share one machine learning platform, or it can be managed separately, that is, a machine learning platform component can be deployed separately for the models in the suggester 103. This application does not limit this. In either case the functions and mechanisms of the machine learning platform component are unchanged: models are updated automatically, and a learning and feedback mechanism is provided to ensure that the models remain available.
  • The suggester 103 is not a manifestation of database kernel capability; it is used for database management, and it can tune or strengthen the capabilities provided by the database kernel modules. By interacting with the database, the suggester 103 obtains more information and suggestions and its models become better optimized, which benefits the intelligent operation of the system.
  • The suggester 103 needs to report diagnostic information and a health index, and it also needs to accept instructions from users. A web front end is required to realize this function.
  • Its implementation is conventional and is not described here.
  • Although the system architecture of the database management system described above includes the suggester 103, it can be seen from the above description that the function of the suggester 103 is to find possible problems while the database is running and to perform diagnosis and tuning. Therefore, in other implementations of the present application, the suggester 103 may not be required.
  • The functional modules are, respectively, the optimizer 1011 shown in FIG. 1, the training data collector 1021 (which may also be referred to as the training data collection platform), the model manager 1022 (which may also be referred to as the model management platform), the model evaluator 1023, and the suggester 103 (in some embodiments, the suggester 103 may not be required). Please refer to FIG. 2 for details.
  • The newly added modules include the self-learning optimizer 201 (note that some kernel components included in the optimizer 201 provided by this application may be native kernel components, whereas the n models included in the optimizer 201 are newly added by this application), the training data collector 202, the model manager 203, the model evaluator 204, and the self-learning suggester 205 (in some embodiments, the suggester 205 may not be required).
  • Each functional module is described below in terms of its specific function and calling logic:
  • The learned optimizer used in the embodiments of the present application uses machine learning technology to improve performance.
  • The optimizer 201 includes n models, where n ≥ 1.
  • The optimizer 201 uses the n models to obtain, from the SQL statement input into the database, the final physical plan to be executed (which may be called the target physical plan). The target physical plan is a physical plan whose execution cost meets a certain preset requirement (which may be referred to as the first preset requirement).
  • The execution cost meeting the first preset requirement includes but is not limited to: 1) the execution cost of the target physical plan is the lowest among q execution costs, where the q execution costs are the execution costs corresponding to q physical plans generated based on the SQL statement input into the database, one physical plan corresponds to one execution cost, and q ≥ 1; 2) the execution cost of the target physical plan is lower than a certain preset value (which may be called the first preset threshold).
  • In the following, the case in which the execution cost of the target physical plan is the lowest among the q execution costs is taken as the example of the execution cost meeting the first preset requirement, and this is not repeated later.
  • the optimizer 201 may specifically include three models, which may be referred to as model A, model B, and model C, and which are respectively used to perform the three steps of logical query rewriting, cost estimation, and physical plan generation. It should be noted that, in other implementations of this application, the optimizer 201 may also include more or fewer models for realizing logical query rewriting, cost estimation, and physical plan generation. The three models included in the optimizer 201 are used here only for illustration, and this will not be repeated later.
  • the optimizer 201 performs logical query rewriting on the SQL statement (also referred to as the SQL query) input to the database through model A, so as to obtain a rewritten logical plan, where model A is a model constructed based on a tree search algorithm, for example the Monte Carlo tree search (MCTS) algorithm; afterward, the optimizer 201 generates q physical plans according to the logical plan through model B, where model B is a model constructed based on a deep learning algorithm, for example a model based on Tree-LSTM, and q ≥ 1; finally, the optimizer 201 calculates the q execution costs corresponding to the q physical plans (one physical plan corresponds to one execution cost) through model C, and determines the final target physical plan to be executed according to the q execution costs, where model C is a model constructed based on a reinforcement learning algorithm, for example a model based on DQN.
  • the above three steps performed by the optimizer 201 can specifically be implemented through three sub-modules: a learning-type rewriter, a learning-type cost estimator, and a learning-type plan generator.
  • this process can specifically be: first, the rewriter provided in the embodiment of the present application uses a model based on a tree search algorithm to convert the initial SQL statement input into the database system into a semantically equivalent logical plan A that the database system can recognize; then, based on the cost estimator, a rewritten logical plan B with higher execution efficiency is obtained.
  • taking the case where the optimizer 201 includes three sub-modules (a learned rewriter, a learned cost estimator, and a learned plan generator) as an example, the processes of logical query rewriting, cost estimation, and physical plan generation executed by the optimizer 201 are described in detail below:
  • the learned rewriter uses an MCTS-based approach to rewrite the input SQL statement into an equivalent but less expensive query. It first constructs a strategy tree in which the root node is the original query and every other tree node is a query obtained by rewriting its parent node.
  • each child node represents a semantically equivalent logical plan obtained from its parent node through one rewriting operation; the rewriter then iteratively selects, on the strategy tree, the equivalent logical plan with the least cost or the one that has been selected least often, and expands the strategy tree (that is, for every rewriting strategy applicable to the selected plan, a new child node is added under the tree node corresponding to the selected plan); finally, the logical plan with the least execution cost on the strategy tree is selected as the output of the rewriter.
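  • The following is a minimal sketch of such a strategy-tree search, not the rewriter of this application. It assumes that plans are plain strings, that `rules` and `cost_of` are caller-supplied placeholders, and that a UCT-style score (low subtree cost plus a bonus for rarely visited nodes) stands in for the selection policy; it has no random rollouts and no learned benefit model.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    plan: str
    cost: float
    best_cost: float  # cheapest cost seen so far in this node's subtree
    visits: int = 0
    expanded: bool = False
    children: List["Node"] = field(default_factory=list)


def search_rewrites(query: str,
                    rules: List[Callable[[str], str]],
                    cost_of: Callable[[str], float],
                    iterations: int = 50,
                    explore: float = 1.0) -> str:
    """Return the cheapest equivalent plan found on the strategy tree."""
    root_cost = cost_of(query)
    root = Node(query, root_cost, root_cost)
    best = root
    for _ in range(iterations):
        # Selection: walk down, preferring cheap or rarely selected children.
        path = [root]
        node = root
        while node.expanded and node.children:
            parent = node
            node = max(
                parent.children,
                key=lambda c: -c.best_cost
                + explore * math.sqrt(math.log(parent.visits + 1) / (c.visits + 1)),
            )
            path.append(node)
        # Expansion: one child per rewrite rule that changes the plan.
        if not node.expanded:
            node.expanded = True
            for rule in rules:
                rewritten = rule(node.plan)
                if rewritten != node.plan:
                    cost = cost_of(rewritten)
                    child = Node(rewritten, cost, cost)
                    node.children.append(child)
                    if cost < best.cost:
                        best = child
        # Backpropagation: update visit counts and subtree-best costs.
        subtree_best = min([node.best_cost] + [c.best_cost for c in node.children])
        for visited in path:
            visited.visits += 1
            visited.best_cost = min(visited.best_cost, subtree_best)
    return best.plan
```

  • With toy rules such as removing a redundant `WHERE 1=1` predicate and using the string length as a stand-in cost, the search returns the shortest rewritten query it reaches.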
  • the learned cost estimator uses deep-learning-based methods to estimate the cost and cardinality of queries, and can capture the correlation between different columns. It designs a tree-structured model that matches the physical plan of the query statement.
  • by analogy with a physical plan being composed of multiple sub-plans, each tree-structured model can be composed of several sub-models.
  • the embodiment of this application uses the tree-structured model to estimate the cost or cardinality of the plan.
  • the learned plan generator uses a reinforcement-learning-based method to generate an optimized physical plan (that is, the target physical plan).
  • the logic here is: the generated equivalent logical plan corresponds to multiple execution plan trees, and each execution plan tree includes one or more execution operators (also called physical operators); each execution plan tree may have multiple execution paths involving different execution operators, and one execution plan tree corresponds to one total cost. The goal is to find the physical plan with the smallest total cost. The plan generator uses reinforcement learning with a tree-structured LSTM (Tree-LSTM) to select the join order.
  • the plan generator uses Tree-LSTM to encode the current physical plan into a compressed vector that serves as the state for deep reinforcement learning, and then iterates multiple times, each time selecting the join operation with the highest long-term reward, and finally outputs the physical plan with the lowest execution cost as the actual logic for executing the SQL statement.
  • a GCN may be used to capture the structure of the join tree, which supports database schema updates and multiple aliases for table names. The model can automatically select the appropriate physical operator.
  • the call logic of the optimizer 201 is: in the learned optimizer 201 provided by the embodiment of the present application, for the logical plan produced by the SQL query parser, the rewriter first builds a strategy tree with that logical plan as the root node, where each child node represents an equivalent logical plan obtained from its parent node through one rewriting operation. Based on MCTS, the rewriter searches the strategy tree for the equivalent logical plan with the least cost and inputs it to the plan generator. The plan generator iteratively adjusts the order of join operations to obtain multiple different physical plans. For each physical plan, the cost estimator is used to estimate the execution cost, and the physical plan with the smallest execution cost is then selected as the output.
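  • The call logic above can be sketched as a three-stage pipeline. This is only an illustration: `optimize` and the toy stand-ins `toy_rewrite`, `toy_generate`, `toy_cost` and `TABLE_ROWS` are hypothetical names, and the learned models are replaced by trivial placeholders (exhaustive join-order enumeration and an integer cost formula).

```python
from itertools import permutations
from typing import Callable, Iterable, Tuple

Plan = Tuple[str, ...]


def optimize(logical_plan: Plan,
             rewrite: Callable[[Plan], Plan],
             generate_plans: Callable[[Plan], Iterable[Plan]],
             estimate_cost: Callable[[Plan], float]) -> Tuple[Plan, float]:
    """Rewriter -> plan generator -> cost estimator -> cheapest physical plan."""
    rewritten = rewrite(logical_plan)
    candidates = list(generate_plans(rewritten))
    if not candidates:
        raise ValueError("the plan generator produced no physical plan")
    costs = [estimate_cost(plan) for plan in candidates]
    best = min(range(len(candidates)), key=costs.__getitem__)
    return candidates[best], costs[best]


# Toy stand-ins (assumptions for this example only).
TABLE_ROWS = {"a": 1000, "b": 10, "c": 100}


def toy_rewrite(plan: Plan) -> Plan:
    # Placeholder rewrite: drop repeated table references, keeping order.
    return tuple(dict.fromkeys(plan))


def toy_generate(plan: Plan) -> Iterable[Plan]:
    # Placeholder plan generator: every join order of the tables.
    return permutations(plan)


def toy_cost(plan: Plan) -> int:
    # Placeholder cost: sum of intermediate result sizes, 1% join selectivity.
    rows = TABLE_ROWS[plan[0]]
    cost = 0
    for table in plan[1:]:
        rows = rows * TABLE_ROWS[table] // 100
        cost += rows
    return cost
```

  • Calling `optimize(("a", "b", "c", "a"), toy_rewrite, toy_generate, toy_cost)` joins the two small tables first, because that order has the smallest estimated total cost.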
  • FIG. 3 is a schematic diagram of the principle of the optimizer provided by the embodiment of the present application.
  • the learning model rewrites SQL statements, estimates costs, and selects physical execution plans during database execution.
  • the model in this process can be updated through feedback and incremental training to dynamically adapt to load changes. Its core steps are as follows:
  • in the first step, a strategy tree is constructed in which the root node is the input SQL query and the non-root nodes are rewritten query statements. The MCTS search algorithm is used to find the rewriting sequence that obtains the maximum benefit: the rewriter iteratively selects, on the strategy tree, the equivalent logical plan with the least cost or the one selected least frequently, and expands the strategy tree (that is, for every rewriting strategy applicable to the selected plan, a new child node is added under the tree node corresponding to the selected plan); finally, the logical plan with the least execution cost on the strategy tree is selected as the output of the rewriter.
  • the second step is to determine the potential benefit of each tree node: a neural-network-based benefit estimation model is designed (for example, an attention layer calculates the rule-to-rule similarity over the rewriting operators) to predict the execution cost that subsequent rewriting of the query can save.
  • in the third step, to improve search efficiency, especially when the query contains many logical operators, dynamic programming is used to compute, from the bottom up, for each node and its subtree, the optimal top N nodes that have no ancestor-descendant relationship with one another, so that the total benefit is maximized; the node selection scheme corresponding to the root node is then output, which means that expanding the strategy tree from the corresponding N nodes gives the highest probability of obtaining the optimal rewritten query.
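  • The bottom-up dynamic programming of the third step can be illustrated with a small tree knapsack. The sketch assumes non-negative benefit values and a tree given as a child-list dictionary; `select_top_nodes` and its arguments are names invented for this example, and the selection holds at most N nodes.

```python
from typing import Dict, List, Tuple

Selection = Tuple[float, Tuple[str, ...]]  # (total benefit, chosen nodes)


def select_top_nodes(tree: Dict[str, List[str]],
                     benefit: Dict[str, float],
                     root: str,
                     n: int) -> Selection:
    """Pick at most n nodes, none an ancestor of another, with maximum benefit."""

    def solve(v: str) -> List[Selection]:
        # combined[k]: best choice of at most k nodes from the children's subtrees.
        combined: List[Selection] = [(0.0, ())] * (n + 1)
        for child in tree.get(v, []):
            child_best = solve(child)
            merged = list(combined)
            for k in range(1, n + 1):
                for j in range(1, k + 1):
                    total = combined[k - j][0] + child_best[j][0]
                    if total > merged[k][0]:
                        merged[k] = (total, combined[k - j][1] + child_best[j][1])
            combined = merged
        # Choosing v itself excludes every node below it.
        own: Selection = (float(benefit[v]), (v,))
        result: List[Selection] = [(0.0, ())]
        for k in range(1, n + 1):
            result.append(own if own[0] > combined[k][0] else combined[k])
        return result

    return solve(root)[n]
```

  • For a tree `q -> {a -> {a1, a2}, b}` with benefits `q=1, a=6, a1=3, a2=4, b=2`, the best two nodes are `a` and `b`, while the best three are `a1`, `a2` and `b`.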
  • the training data is based on the collected historical query statements, and through feature extraction, the training data is input into the model to be trained.
  • the weight of the model is updated through backpropagation.
  • each intermediate state contains optional plans for part of the plan tree, forming a plan forest.
  • the training process is divided into cost training and latency tuning. Cost training uses reinforcement learning to repeatedly select fragments of the execution plan and judge whether the newly selected plan operation conforms to the optimal plan; in this process, the Q value obtained from the Tree-LSTM model is used to make an initial judgment of how good a plan is.
  • in latency tuning, only the latencies of a small number of plans are used as training data for fine-tuning the model.
  • DQN uses a Q-network to estimate which execution tree is better.
  • in the plan tree there are three types of leaf nodes: columns, tables, and operations. The plan tree is traversed by depth-first search, and the Tree-LSTM network layer estimates the cost represented by each leaf node.
  • the matching plan is the real executable plan of the statement.
  • the database execution engine executes the above-mentioned optimized SQL statement execution plan, that is, executes the final target physical plan.
  • the training data collector 202 is configured to obtain training data according to the running data of the processes in the database, and construct m training sets based on the obtained training data, where m ≥ 1.
  • the training data collector 202 can automatically collect statistical information of the database, including database running indicators, query logs, system logs, and so on, and use this information to generate the training data for all the learned models involved in the database management system (i.e., the models included in the suggester 205 (if any), the optimizer 201 and the model evaluator 204).
  • corresponding training sets can be generated for different models, i.e., m training sets are constructed.
  • for example, if the optimizer 201, the suggester 205, and the model evaluator 204 include 6 models in total, 6 training sets can be constructed, with one model corresponding to one training set.
  • the number of training sets constructed can also be fewer than 6, that is, m < 6.
  • in that case, some models can share one training set. The correspondence between training sets and models is not limited.
  • the models involved in the embodiments of this application may be pre-trained, that is, the models deployed in the database management system are all pre-trained models.
  • in that case, the training data collected by the training data collector 202 can be used to fine-tune the pre-trained models (for example, fine-tuning can be performed when the performance of a model declines after a period of use); alternatively, the models involved in the embodiment of the present application may not be pre-trained, in which case the initialized models are deployed directly to the database management system, and the training data collected by the training data collector 202 is then used to train each model and perform subsequent fine-tuning.
  • the deployment state of the models included in the database management system (that is, whether they have been pre-trained) is not limited.
  • the training data collector 202 can collect the running data of the processes in the database from various aspects, including but not limited to: 1) database indicators: the running status of the database, such as queries per second (QPS), CPU usage, and cache hit rate, which are usually represented as time-series data; 2) SQL queries: SQL queries and their statistics, such as the physical plan, response time, and duration; 3) database logs: the running logs. Since different models in the database management system require different training data, this embodiment of the application can intelligently organize training data for the different learning modules, including organizing related columns into the same table to reduce join overhead, selecting training data for each model, and so on.
  • the invocation logic of the training data collector 202 is: the training data collector 202 receives the running data from the processes in the database (such as the information collected by the database Agent program), performs data cleaning and data processing on the received running data (for example, data cleaning, data merging, and correlation analysis across multiple indicators, so that the data is more suitable for subsequent model training or fine-tuning) to obtain the training data, and constructs the training data into m training sets, which are used for training or fine-tuning each model in the database management system.
  • if the model has not been pre-trained, the model is trained based on the specified algorithm and the training data; if the model has been pre-trained, the relationship between the newly obtained training data and the pre-trained model is evaluated and continuously monitored to decide whether to update the model; the monitoring interval depends on the data source used to train the model, that is, on whether the data is high-frequency and changes easily.
  • the model manager 203 is configured to, when the first target model satisfies a certain preset requirement (which may be referred to as the second preset requirement), fine-tune the first target model using the target training set corresponding to the first target model, so as to obtain a second target model (the second target model is essentially the first target model with updated model parameters).
  • the first target model is one of the n models, and the target training set is one of the m training sets.
  • the model parameters of the fine-tuned second target model can be passed to the model evaluator 204 for model performance evaluation.
  • the first target model meeting the second preset requirement includes, but is not limited to: 1) the performance of the first target model starts to decline; 2) the performance of the first target model not only declines, but the degree of decline reaches a certain preset value (which may be called the second preset threshold); 3) the real-time performance of the first target model is evaluated and its subsequent performance is predicted, for example by the model evaluator 204, and the predicted probability that the performance of the first target model will degrade reaches a certain preset value (which may be called the third preset threshold), for example 80%; 4) the continuous running time of the first target model reaches a certain preset duration, for example 30 minutes.
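  • A minimal sketch of checking the second preset requirement follows. It treats the four listed conditions as alternatives joined by a logical OR, which is an assumption of this example; the names `ModelStatus` and `needs_finetune` and the default threshold values (taken from the 80% and 30-minute examples) are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass
class ModelStatus:
    baseline_perf: float        # performance when the model was deployed
    current_perf: float         # latest measured performance
    predicted_drop_prob: float  # predicted probability of a performance drop
    running_minutes: float      # continuous running time


def needs_finetune(status: ModelStatus,
                   drop_threshold: float = 0.0,
                   prob_threshold: float = 0.8,
                   max_minutes: float = 30.0) -> bool:
    """Return True if any of the four example trigger conditions holds."""
    drop = status.baseline_perf - status.current_perf
    # Conditions 1) and 2): performance declines, optionally by at least
    # the second preset threshold (drop_threshold = 0 gives condition 1).
    if drop > 0 and drop >= drop_threshold:
        return True
    # Condition 3): predicted degradation probability reaches the third threshold.
    if status.predicted_drop_prob >= prob_threshold:
        return True
    # Condition 4): the model has run continuously for the preset duration.
    return status.running_minutes >= max_minutes
```

  • A deployment could evaluate `needs_finetune` periodically and hand the model to the fine-tuning step when it returns `True`.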
  • the model manager 203 integrates commonly used machine learning capabilities to provide a unified application access interface and support management and scheduling of learning models. Specifically, the model manager 203 generates a better model according to the training data updated by the training data collector 202 to conform to the current system running state.
  • the invocation logic of the model manager 203 is: after receiving the training data, the model manager 203 judges whether the model needs to be updated; if so, the model is updated and its model parameters are then passed to the model evaluator 204.
  • the model evaluator 204 is used to evaluate the performance of the obtained second target model and, when the performance of the second target model satisfies a certain preset requirement (which may be referred to as the third preset requirement), to update the first target model in the optimizer 201 to the second target model.
  • the update process may specifically be: the model evaluator 204 sends the model parameters of the second target model to the optimizer 201, and the optimizer 201 assigns the received updated model parameters to the first target model, Thus the second target model is obtained.
  • the model evaluator 204 may be a performance prediction model based on graph embedding.
  • the second target model meeting the third preset requirement may include, but is not limited to: 1) the performance of the second target model is improved by a certain preset value (which may be called the fourth preset threshold) compared with the performance of the first target model. As an example, the fourth preset threshold can be zero, indicating that as long as the performance of the second target model reaches the performance level of the original first target model, the second target model is considered to satisfy the third preset requirement; as another example, the fourth preset threshold can also be a certain value or a certain ratio greater than zero, indicating that the second target model is considered to meet the third preset requirement only when its performance has improved to a certain extent compared with that of the original first target model; 2) the performance of the second target model is improved by a fifth preset threshold compared with the performance of the native kernel components in the database. That is, the performance improvement of the second target model over the traditional database algorithm is verified. If the improvement reaches the threshold, the model used by the corresponding module of the database is actually replaced; otherwise, the traditional database algorithm is still used to generate the target physical plan.
  • the value of the fifth preset threshold may be zero, or a certain value or a certain ratio greater than zero. For details, refer to the first manner above, which will not be repeated here.
  • when the performance of the second target model does not meet the third preset requirement, the model evaluator 204 is also used to trigger the database to generate the final target physical plan to be executed using the native kernel components in the database; for example, the index selection module enables the traditional hill-climbing algorithm to create an index for executing the SQL statement. That is to say, if the performance of the fine-tuned second target model still does not meet the requirement, the traditional kernel component algorithm of the database is used to generate the target physical plan. Since the original kernel components of the database are not deleted but coexist with the newly added optimizer in the database software, the database can dynamically choose, in real time during operation, whichever method improves its performance, thereby improving overall database performance.
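  • The update decision can be sketched as below. The example requires both criteria (improvement over the first target model and over the native kernel component) before deploying, which is only one possible reading, since the application lists the criteria as alternatives; `Outcome`, `evaluate_update` and the feedback text are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outcome:
    deploy_new_model: bool
    use_native_component: bool
    feedback: Optional[str] = None  # sent back to the model manager on failure


def evaluate_update(new_perf: float,
                    old_perf: float,
                    native_perf: float,
                    fourth_threshold: float = 0.0,
                    fifth_threshold: float = 0.0) -> Outcome:
    """Decide whether the fine-tuned (second) target model replaces the first."""
    gain_over_old = new_perf - old_perf
    gain_over_native = new_perf - native_perf
    if gain_over_old >= fourth_threshold and gain_over_native >= fifth_threshold:
        return Outcome(deploy_new_model=True, use_native_component=False)
    # Fall back to the native kernel component and report the failed update.
    return Outcome(
        deploy_new_model=False,
        use_native_component=True,
        feedback="model update failed: third preset requirement not met",
    )
```

  • With both thresholds at zero, a fine-tuned model that merely matches the old model and the native component is still deployed, mirroring the zero-threshold example in the text.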
  • in addition to triggering the database to generate the final target physical plan to be executed using the native kernel components in the database, the model evaluator 204 may also feed back the information that the model update failed (that is, the performance of the second target model does not meet the third preset requirement) to the model manager 203, so that the model manager 203 can adjust the fine-tuning strategy for the first target model based on this information, which provides a reference for subsequent model training strategies.
  • the purpose of the model evaluator is to verify whether a model is effective for the workload. If the database uses a learned model, the model evaluator 204 can be used to predict the performance of the model. For the models deployed in the database management system provided by the embodiment of the present application, performance prediction can be performed through the model evaluator 204. If the performance of the model improves (for example, the performance improvement reaches a certain threshold), the new model obtained (that is, the original model with updated model parameters) is marked as the best model and can actually be deployed to the database management system; otherwise, the model is marked as needing to be updated and the deployment is abandoned.
  • the calling logic of the model evaluator 204 is: the model evaluator 204 obtains the latest model generated by the model manager 203 and verifies whether the model is stable and reliable and can also improve system performance. The verification result is fed back to the model manager 203, which marks the model either as the best model or as needing to be updated again.
  • the evaluator evaluates the performance of the models deployed in the database management system constructed in the embodiment of the present application (such as the models included in the optimizer and the suggester) and checks whether they can bring a performance improvement. If a new model does not improve performance, deployment of that model is abandoned.
  • the model evaluator may be implemented based on GNN, and its structure is shown in FIG. 4 . Its core steps are as follows:
  • the estimated execution effect can be given by combining the load characteristics with the old and new models, and the comparison then shows whether the new model is effective.
  • the embodiment of the present application proposes a graph compression algorithm, which deletes redundant vertices and merges similar vertices.
  • the specific implementation process is as follows:
  • a graph model is used to capture the workload characteristics, where the vertices represent the operator features extracted from the query plan, and an edge between two operators represents the query correlation and resource competition between them.
  • the features are fed into the performance prediction model, for which this application proposes a graph embedding algorithm that embeds graph features (e.g., operator features and K-hop neighbors) at the operator level, and constructs a deep learning model to predict query performance.
  • the graph compression algorithm in the load graph optimization procedure reduces the size of the load graph by merging nodes that overlap in time.
  • the method is to first cluster the time-overlapping nodes according to the execution time range of each node, and then, within each cluster, merge the nodes that have no edge relationship with one another, based on the minimum fully connected subgraph (clique).
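  • The first stage of this compression, grouping nodes whose execution time ranges overlap, can be sketched with a sweep over sorted intervals. The sketch covers only the clustering stage, not the clique-based merge; the function name and the convention that touching intervals count as overlapping are assumptions of this example.

```python
from typing import Dict, List, Tuple


def cluster_overlapping(intervals: Dict[str, Tuple[float, float]]) -> List[List[str]]:
    """Group operator nodes into clusters of transitively overlapping time ranges.

    intervals maps a node name to its (start, end) execution time range.
    """
    ordered = sorted(intervals.items(), key=lambda item: item[1][0])
    clusters: List[List[str]] = []
    current_end = None
    for name, (start, end) in ordered:
        if current_end is None or start > current_end:
            # No overlap with the running cluster: open a new one.
            clusters.append([name])
            current_end = end
        else:
            # Overlaps the running cluster: join it and extend its time span.
            clusters[-1].append(name)
            current_end = max(current_end, end)
    return clusters
```

  • For example, operators running over `(0, 5)` and `(4, 9)` fall into one cluster, while operators running over `(10, 12)` and `(11, 15)` fall into another.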
  • the database management system may further include a suggester 205, and the suggester 205 includes p models, where p ≥ 1.
  • the suggester 205 is used to discover abnormal conditions (that is, find abnormal data) in the running data of processes in the database (such as CPU utilization and user response time), diagnose the cause of the abnormality based on the abnormal data obtained, and then perform optimization through the optimization module corresponding to that cause (the optimization module is also located in the suggester 205, and its function is to optimize the parameters of the database), so as to reduce the probability of anomalies in the running data of processes in the database.
  • when the database management system includes the suggester 205, the above-mentioned first target model can be any one of the n models in the optimizer 201, or any one of the p models in the suggester 205.
  • the suggester 205 may specifically include three models, which may be called the codec, model D, and model E, and which are respectively used to perform the three steps of self-monitoring, self-diagnosis and self-optimization. It should be noted that, in other implementations of this application, the suggester 205 may also include more or fewer models for realizing the self-monitoring, self-diagnosis and self-optimization of the database. The three models included in the suggester 205 are used here only for illustration, and this will not be repeated later.
  • the suggester 205 uses a codec to encode and then decode the running data of the processes in the database to obtain the encoded data (that is, the reconstruction), and compares the encoded data with the running data input to the codec to obtain the abnormal data.
  • the principle of using the codec to obtain abnormal data is: the codec can restore normal original data but cannot restore abnormal original data, so the input original data is encoded and then decoded to obtain the encoded data, and comparing the encoded data with the original data reveals whether there is abnormal data.
  • if the running data is system index data (for example, lock conflicts), the suggester 205 can further diagnose the cause of the abnormality based on the abnormal data through model D, where model D is a model constructed based on a deep learning algorithm; for example, model D may include an LSTM model and a classifier.
  • specifically, the suggester 205 calls the LSTM model to encode the discovered abnormal data into a compressed vector (that is, a vector of reduced or increased dimension), and then uses a learned classifier (such as a binary classifier or a multi-class classifier) to infer the corresponding root cause (such as a database backup operation); if the running data is query index data (such as average latency), the suggester 205 can instead use model E to diagnose the cause of the abnormality based on the abnormal data, where model E is a model constructed based on a deep learning algorithm.
  • for example, model E may include a Tree-LSTM model and a softmax function.
  • specifically, the suggester 205 calls the Tree-LSTM model to encode a slow query (that is, a query with a long execution time), locates the physical operator (that is, the execution operator) that causes the abnormality, and then uses the softmax function to identify the root cause of the abnormality.
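  • The final softmax step can be illustrated in isolation. The sketch assumes that an upstream encoder (such as the Tree-LSTM mentioned above) has already produced one raw score per candidate root cause; the score values and root-cause names are invented for the example.

```python
import math
from typing import Dict, List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def identify_root_cause(scores: Dict[str, float]) -> Tuple[str, float]:
    """Return the most probable root cause and its softmax probability."""
    names = list(scores)
    probs = softmax([scores[name] for name in names])
    best = max(range(len(names)), key=probs.__getitem__)
    return names[best], probs[best]
```

  • For scores such as `{"missing_index": 3.0, "lock_conflict": 1.0, "bad_join_order": 0.5}`, the function reports `missing_index` as the most probable root cause.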
  • after the suggester 205 finds the root cause of the data anomaly through self-monitoring and self-diagnosis, it selects the corresponding optimization module according to the root cause of the performance degradation of the database system. For example, if the performance degradation is caused by missing indexes (that is, the root cause is that no index has been recommended), the suggester 205 can call the index selection module to build a new index based on deep reinforcement learning, so that the performance of the query load is improved; the suggester 205 can also perform parameter tuning, based on empirical rules or reinforcement learning, at the SQL level, the connection (session) level, and the system level. The purpose of optimization is for the system to produce as little abnormal running data as possible.
  • one optimization module may correspond to a series of (that is, multiple) root causes, or one optimization module may correspond to one root cause, which is not limited in this application.
  • Table 1 is a schematic diagram of the corresponding relationship between optimization modules in the suggester 205 and some root causes.
  • the suggester 205 is mainly used to implement the following three functions:
  • Self-monitoring database status and providing running data (such as CPU usage, response time, running log) when the database is running.
  • this application utilizes a codec to automatically detect anomalies based on data distribution and metric correlation. Specifically, the running data is converted into a low-dimensional representation by an encoder, and the low-dimensional representation is restored by a decoder. Data that the codec cannot reconstruct well is considered an outlier.
  • Self-diagnosis aims at automatically diagnosing anomalies and discovering the root causes of abnormal data.
  • if the abnormal data is system index data (for example, lock conflicts), the abnormal data found is encoded into a compressed vector (that is, a vector of reduced or increased dimension) by calling the LSTM model, and a learned classifier (such as a binary classifier or a multi-class classifier) is then used to infer the corresponding root cause (such as a database backup operation);
  • if the abnormal data is query index data (such as a slow query), the slow query is encoded by calling the Tree-LSTM model, the physical operator (that is, the execution operator) that causes the abnormality is located, and the softmax function is then used to identify the root cause of the abnormality.
  • Self-tuning automatically optimizes the database for query workloads, e.g. index/view recommendations.
  • learned view recommendation utilizes an encoder-decoder model to automatically recommend views.
  • Self-optimization is optimized for the database system, and the learning parameter adjustment module uses deep reinforcement learning technology to adjust parameter values.
  • the embodiment of the present application can use the Actor-Critic model to automatically select appropriate parameter values, and can support SQL-level, session-level and system-level parameter tuning.
  • the invocation logic of the suggester 205 is as follows: first, dynamically collect database and query execution status indicators, and then use the self-monitoring module (ie codec) to find abnormal data.
  • the self-diagnosis module uses the system diagnosis function (namely model D) and the query diagnosis function (namely model E) to find the root cause of database performance degradation, and then specifies the self-optimization module to perform corresponding optimization functions. For example, if the root cause is that no index is established for the accessed column, the self-diagnosis module will call the index selection module of the self-optimization module for optimization.
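  • The hand-off from self-diagnosis to self-optimization can be sketched as a lookup from root cause to optimization module. All names here (`recommend_index`, `tune_parameters`, the root-cause keys and the returned strings) are hypothetical placeholders and do not reproduce the correspondence of Table 1.

```python
from typing import Callable, Dict, Optional

Context = Dict[str, str]


def recommend_index(context: Context) -> str:
    # Placeholder for an index selection module.
    return f"create index on {context['column']}"


def tune_parameters(context: Context) -> str:
    # Placeholder for a parameter tuning module.
    return f"tune parameter {context['parameter']}"


# One optimization module may serve one or several root causes.
ROOT_CAUSE_TO_MODULE: Dict[str, Callable[[Context], str]] = {
    "missing_index": recommend_index,
    "improper_parameter": tune_parameters,
    "improper_session_parameter": tune_parameters,
}


def self_optimize(root_cause: str, context: Context) -> Optional[str]:
    """Invoke the optimization module registered for the diagnosed root cause."""
    module = ROOT_CAUSE_TO_MODULE.get(root_cause)
    if module is None:
        return None  # no optimization module handles this root cause
    return module(context)
```

  • For instance, `self_optimize("missing_index", {"column": "orders.customer_id"})` routes the diagnosis to the index placeholder.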
  • the function of the suggester 205 is to discover possible problems in the running process of the database and to perform diagnosis and optimization. Therefore, in some implementations of the present application, the suggester 205 may not be needed, and the present application does not limit this.
  • FIG. 5: there are three parts: self-monitoring, self-diagnosis and self-optimization. Self-monitoring judges, according to the performance indicators in the running data of the processes in the database, whether the database has had, currently has, or will have problems, and determines the abnormal or potentially abnormal state of the database. Once the abnormal state of the database has been identified, the self-diagnosis and self-optimization functions are used to solve the actual problems of the database.
  • the advisor includes a self-monitoring module, a self-diagnosis module, and a self-optimization module, which are used to realize self-monitoring, self-diagnosis, and self-optimization respectively.
  • the core steps are as follows:
  • the self-monitoring module continuously collects database performance indicators. When abnormalities occur inside or outside the database, it can be reflected through corresponding indicators and system logs. Therefore, openGauss performs real-time anomaly monitoring and discovery by analyzing database and operating system indicators. The specific process is as follows:
  • the training data collector continuously collects indicators and logs from the database and operating system, such as QPS, operation logs, etc., and then puts these data together to form time series data.
  • the reconstruction-based algorithm is used to find anomalies, that is, normal time series data always have a regular change pattern, and the abnormal change pattern is likely to be a system anomaly.
  • The embodiment of the present application adopts an LSTM-based autoencoder with an attention layer.
  • Raw time series data is encoded into a low-dimensional representation, and a decoder parses the representation and attempts to recover the original data.
  • the training loss is the reconstruction quality.
  • the model learns the distribution of these multidimensional data and acquires the ability to reconstruct. Data that cannot be reconstructed (errors exceed a threshold) are reported as anomalies.
  • the embodiment of the present application adopts the statistical method "extreme value theory" to determine the dynamic threshold.
  • The user sets the system sensitivity (for example, 1% or 5%), and the corresponding threshold is calculated from historical data. The training data is first normalized and then fed into the time-series autoencoder to update the model parameters. After the model has acquired the ability to reconstruct normal database indicators, openGauss collects the reconstruction errors and calculates the threshold.
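  • The thresholding step can be illustrated with a minimal sketch. Note the assumptions: the LSTM autoencoder is not reproduced here (reconstruction errors are taken as given), and an empirical quantile of the historical errors is swapped in for the extreme value theory estimate used by the embodiment. All function names are hypothetical.

```python
import math
from typing import List, Sequence

def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Scale indicator values to [0, 1] before they are fed to the model."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def quantile_threshold(errors: Sequence[float], sensitivity: float) -> float:
    """Nearest-rank (1 - sensitivity) quantile of historical reconstruction errors.

    Assumption: a plain empirical quantile stands in for extreme value theory.
    """
    if not 0.0 < sensitivity < 1.0:
        raise ValueError("sensitivity must be between 0 and 1")
    ordered = sorted(errors)
    # round() guards against floating-point noise before taking the ceiling.
    rank = math.ceil(round((1.0 - sensitivity) * len(ordered), 9))
    return ordered[max(rank, 1) - 1]

def find_anomalies(errors: Sequence[float], threshold: float) -> List[int]:
    """Indices whose reconstruction error exceeds the threshold."""
    return [i for i, e in enumerate(errors) if e > threshold]

# Historical reconstruction errors (made-up values) and a 5% sensitivity.
history = [float(i) for i in range(1, 101)]
threshold = quantile_threshold(history, 0.05)
anomalies = find_anomalies([3.0, 97.0, 50.0, 120.0], threshold)
print(threshold, anomalies)
```

  • A lower sensitivity (for example, 1%) moves the threshold further into the tail of the error distribution, so fewer points are reported.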
  • 2) If no abnormality is found, wait for a period of time (that is, the preset duration) and then repeat step 1); if an abnormality is found, execute step 3).
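  • The control flow of steps 1) to 3) can be sketched as a polling loop. This is an illustrative sketch only: the callables, the round limit, and the injectable `sleep` parameter are assumptions added so the loop can be exercised without real waiting.

```python
import time
from typing import Callable, Optional

def monitor(collect_and_detect: Callable[[], bool],
            diagnose: Callable[[], str],
            wait_seconds: float,
            max_rounds: int,
            sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Repeat detection until an anomaly is found, then run root cause analysis."""
    for _ in range(max_rounds):
        if collect_and_detect():   # step 1: collect indicators and detect anomalies
            return diagnose()      # step 3: call the self-diagnosis module
        sleep(wait_seconds)        # step 2: wait for the preset duration, then retry
    return None                    # no anomaly seen within max_rounds

# Simulated detector: two normal rounds, then an anomaly.
readings = iter([False, False, True])
waits = []
root_cause = monitor(lambda: next(readings), lambda: "missing_index",
                     5.0, 10, waits.append)
print(root_cause, waits)
```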
  • the self-diagnosis module is called to conduct root cause analysis.
  • the self-diagnosis module judges the found faults, and if there are indeed problems, it gives the root cause of the problem at the system level or SQL statement level.
  • the specific process is as follows:
  • the self-diagnosis function of the database can identify the root causes of faults or abnormalities at the system level and SQL statement level.
  • The system-level fault analysis is realized by the LSTM+KNN algorithm.
  • The SQL statement-level root cause analysis is realized by the Tree-LSTM algorithm. Once the root cause of the fault is located, the self-optimization function is called and gives corresponding optimization suggestions to solve the problem.
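  • The KNN half of the system-level diagnosis can be sketched as follows. The sketch assumes that an encoder (the LSTM in the embodiment, not reproduced here) has already turned each abnormal time window into a fixed-length vector; the vectors, labels, and function names are made up for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_root_cause(query: Sequence[float],
                   labeled_cases: List[Tuple[Sequence[float], str]],
                   k: int = 3) -> str:
    """Majority vote among the k historical anomalies closest to the query vector."""
    nearest = sorted(labeled_cases, key=lambda case: euclidean(query, case[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical encoded anomaly cases with known root causes.
cases = [
    ([0.90, 0.10], "cpu_saturation"),
    ([0.80, 0.20], "cpu_saturation"),
    ([0.10, 0.90], "io_bottleneck"),
    ([0.20, 0.80], "io_bottleneck"),
    ([0.15, 0.85], "io_bottleneck"),
]
diagnosis = knn_root_cause([0.85, 0.15], cases, k=3)
print(diagnosis)
```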
  • the self-optimization module includes optimizing parameter configuration according to the characteristics of the database system.
  • Parameter recommendation is realized through deep reinforcement learning. First, the embodiment of the present application models the database parameter configuration and its corresponding performance through historical learning, that is, it searches for the best-performing parameters in the search space composed of the selected parameters. Then, the deep reinforcement learning model takes the database state and load characteristics as the input state and, according to the parameter tuning experience learned from historical data, selects an appropriate parameter configuration as the output action, thus giving the optimal database parameter optimization scheme.
  • the self-optimization module also includes tuning for database SQL statements, for example, materialized view recommendation and index recommendation.
  • materialized view recommendation is implemented through RNN and reinforcement learning.
  • Index recommendation refers to workload-level index recommendation, which provides the optimal index configuration for a workload according to the ratio of insert, delete, query, and update operations issued by the user.
  • Figure 6 is a schematic diagram of the comparison between the optimizer provided by the embodiment of the present application and three rewriting strategies.
  • The embodiment of the present application compares the query rewriting in openGauss with three rewriting strategies (random rewriting, top-down rewriting, and heuristic rewriting).
  • the embodiment of the present application extracts 82 rewriting rules from the query optimization engine Calcite, and rewrites the query with corresponding strategies.
  • the embodiment of the present application uses the tool SQL-smith to generate 15,750 and 10,673 slow queries (>1s) for TPC-H and JOB respectively.
  • this rewriting strategy outperforms other methods in all cases, namely, the execution time of TPC-H is reduced by more than 49.7%, and that of JOB is reduced by more than 36.8%.
  • the reason is mainly twofold: First, openGauss explores a rewrite order that is less expensive to execute than the default top-down order in PostgreSQL. For example, with outer joins, PostgreSQL cannot push down predicates to input tables, while openGauss solves the problem by first converting outer joins to inner joins and then pushing down predicates. Second, the estimation model in openGauss predicts potential cost reductions, from which openGauss chooses a rewrite order with lower execution overhead. In addition, openGauss works better on TPC-H than JOB, because TPC-H query contains many subqueries that can be optimized by query rewriting, while multi-joins in JOB query will be further optimized by plan enumerator.
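  • The outer-join example can be made concrete with a toy rewrite rule. This is a minimal sketch under stated assumptions: the `Join` and `Query` classes are a hypothetical, heavily simplified plan representation, and whether a predicate is null-rejecting is supplied by the caller rather than derived. It relies on the standard equivalence that a left outer join followed by a null-rejecting filter on the right table returns the same rows as an inner join.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

@dataclass(frozen=True)
class Join:
    kind: str                            # "left_outer" or "inner"
    left: str                            # left input table
    right: str                           # right input table
    right_filter: Optional[str] = None   # predicate applied while scanning `right`

@dataclass(frozen=True)
class Query:
    join: Join
    where: Tuple[Tuple[str, str, bool], ...]  # (table, predicate, null_rejecting)

def rewrite(query: Query) -> Query:
    """Convert the outer join to an inner join where valid, then push predicates down."""
    join = query.join
    remaining = []
    for table, predicate, null_rejecting in query.where:
        # A null-rejecting predicate on the right table discards the NULL-padded
        # rows that the outer join adds, so the join can become an inner join.
        if join.kind == "left_outer" and table == join.right and null_rejecting:
            join = replace(join, kind="inner")
        if join.kind == "inner" and table == join.right:
            combined = (predicate if join.right_filter is None
                        else f"({join.right_filter}) AND ({predicate})")
            join = replace(join, right_filter=combined)   # pushed to the input table
        else:
            remaining.append((table, predicate, null_rejecting))
    return Query(join, tuple(remaining))

# orders LEFT JOIN lineitem ... WHERE l_quantity > 10
before = Query(Join("left_outer", "orders", "lineitem"),
               (("lineitem", "l_quantity > 10", True),))
after = rewrite(before)
print(after)
```

  • A predicate such as `l_comment IS NULL` is not null-rejecting, so the sketch leaves both the outer join and the filter in place.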
  • The optimizer 201 included in the database management system provided by the embodiment of the present application can perform fine-grained optimization during the execution of SQL statements according to the optimization method given by the AI model, improving the execution efficiency of SQL statements and the performance of the database.
  • Table 2 and Table 3 compare the suggester provided by the embodiment of this application with two index strategies. Taking index selection as an example, experiments were conducted on TPC-H and TPC-C, and the suggester of the embodiment of the application was compared with the default index and the manually designed index. The results are shown in Table 2 and Table 3.
  • the index selection algorithm of the present application outperforms the default and manual indexes on both workloads. This is because the index selection algorithm encodes system statistics into the state representation and is able to optimize the index selection strategy based on historical data in order to dynamically update the index configuration.
  • Index strategy                             TPC-H (s)   TPC-C (tpmC)
    Database management system (openGauss)     122.9       10202
    Database administrator (DBA)               130.1       10001
    Default setting                            140.8       9700
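  • The relative gains implied by the reported figures can be worked out directly. The following sketch only restates the numbers given above; the dictionary keys and helper names are hypothetical.

```python
# Figures as reported above: TPC-H execution time in seconds (lower is better)
# and TPC-C throughput in tpmC (higher is better).
RESULTS = {
    "openGauss": {"tpch_seconds": 122.9, "tpcc_tpmc": 10202},
    "DBA": {"tpch_seconds": 130.1, "tpcc_tpmc": 10001},
    "default": {"tpch_seconds": 140.8, "tpcc_tpmc": 9700},
}

def time_reduction_pct(baseline: float, new: float) -> float:
    """Percentage by which execution time shrinks relative to the baseline."""
    return (baseline - new) / baseline * 100.0

def throughput_gain_pct(baseline: float, new: float) -> float:
    """Percentage by which throughput grows relative to the baseline."""
    return (new - baseline) / baseline * 100.0

ours = RESULTS["openGauss"]
for name in ("DBA", "default"):
    base = RESULTS[name]
    print(name,
          round(time_reduction_pct(base["tpch_seconds"], ours["tpch_seconds"]), 1),
          round(throughput_gain_pct(base["tpcc_tpmc"], ours["tpcc_tpmc"]), 1))
```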
  • The suggester 205 included in the database management system provided by the embodiment of the present application can promptly discover whether the database had, has, or is likely to have faults or abnormalities, identify the root causes of failures according to business types and characteristics, and provide corresponding solutions together with optimal optimization and configuration schemes.
  • Figure 7 is a schematic diagram of the comparison between the model evaluator provided by the embodiment of the present application and the two known performance evaluation methods of BAL and DL.
  • BAL estimates the average buffer access latency and uses linear regression to predict the query latency of concurrent queries.
  • DL uses a neural network designed according to the query plan structure to predict the performance of a single query.
  • The embodiment of this application compares the prediction accuracy and prediction time on JOB, and the results are shown in Figure 7. From the comparison results in Figure 7, it can be seen that the error rate of the model evaluator provided by the embodiment of this application is the lowest, about 29.9 times lower than that of BAL and 22.5 times lower than that of DL.
  • The workload graph in the model evaluator encodes concurrency factors such as resource contention; such contention increases the query latency of JOB by more than 20% compared with serial execution.
  • BAL collects buffer access latencies, while DL relies on a single query function.
  • openGauss utilizes a graph embedding network to directly map structural information to performance factors, which improves generality when workloads vary. In contrast, BAL uses a linear regression approach that requires many statistical samples for a single workload. In addition, it can be seen from the figure that the prediction latency of the model evaluator provided by the embodiment of the present application is lower than that of BAL and DL, and that it remains relatively stable as the concurrency level increases.
  • the model evaluator provided in the embodiment of this application simultaneously predicts the execution time of all vertices. It embeds the localized graph for all vertices in the workload graph, so the total prediction time for the workload is close to predicting the vertex with the largest localized graph.
  • BAL takes the longest time to predict because it predicts the performance while executing the workload.
  • For DL it propagates intermediate data features in the query plan tree in a bottom-up manner, which takes relatively longer time than openGauss.
  • the model evaluator 204 included in the database management system provided by the embodiment of the present application can effectively and timely check whether the new model is valid, and if it is effective, it will be deployed; otherwise, the model update will be abandoned.
  • the training data collector 202 and the model manager 203 do not exist in the known database management system, and these two modules included in the database management system constructed in this application can ensure the reliability of data processing.
  • The embodiment of the present application provides an autonomous database framework, which implements a self-learning kernel and a suggester based on machine learning algorithms and expert experience, to build a full range of autonomous database functions.
  • The learned optimizer built into the database kernel in the database management system constructed in the embodiment of the present application may specifically include an MCTS-based rewriter, a Tree-LSTM-based cost estimator, and an RL-based plan generator, which together realize efficient query optimization to meet the demands of multi-scenario business;
  • The database management system provided by the embodiment of the present application may also include a learned suggester, which, based on machine learning technology, realizes automatic anomaly monitoring, automatic system diagnosis, automatic slow query diagnosis, and automatic performance optimization (such as parameter tuning, index recommendation, and view recommendation), provides customers with one-click operation and maintenance management, and improves operation and maintenance efficiency and database execution efficiency;
  • The database management system provided by the embodiment of the present application may also include an efficient model evaluator, which, based on machine learning technology, estimates the performance of the models deployed in the database management system, judges the benefit brought by applying the corresponding model, and ensures that the database management system always runs with high performance and high reliability;
  • the training data collector automatically collects the running data of processes in the database, including database running indicators, query logs, system logs, etc. Use this information to generate training data for the model deployed in the database management system; the model manager provides a unified interface to manage and control the model version, and dynamically update and replace the models used by each module.
  • Each module of a traditional database is based on a classic heuristic or rule-defined algorithm. However, after replacing each module with a machine learning model, each machine learning model needs to collect data and train the model, and update the model when the scene changes. If the above operations are performed separately, the training and management costs of the model are very high.
  • the embodiment of the present application provides a training data collector and a model manager with a unified interface to evaluate the availability of the model and automatically update the model according to changes in the collected information.
  • Figure 8 is a schematic flow chart of the data processing method provided by the embodiment of the present application. The method may specifically include the following steps:
  • the computer device receives the SQL statement sent by the client to the database.
  • the database is deployed in the computer device.
  • the database includes an optimizer and native kernel components.
  • The optimizer includes n models, where n ≥ 1.
  • First, a local computer device receives the SQL statement sent by the client device to the database deployed in the computer device, where the database includes an optimizer and database native kernel components, and the optimizer includes n models, n ≥ 1.
  • When the first target model does not meet the second preset requirement, the computer device obtains the target physical plan through the n models included in the optimizer according to the SQL statement, where the target physical plan is a physical plan whose execution cost meets the first preset requirement, and the first target model is one of the n models.
  • After the computer device receives the SQL statement sent by the client device, it first judges whether any of the n models in the optimizer fails to meet a preset requirement (which can be called the second preset requirement). If the first target model (that is, one of the n models) does not meet the second preset requirement, the computer device obtains the target physical plan through the n models included in the optimizer according to the SQL statement, where the target physical plan is a physical plan whose execution cost meets the first preset requirement.
  • The execution cost meeting the first preset requirement includes, but is not limited to: 1) the execution cost of the target physical plan is the lowest among q execution costs, where the q execution costs are the execution costs corresponding to the q physical plans generated based on the SQL statement input to the database, one physical plan corresponds to one execution cost, and q ≥ 1; 2) the execution cost of the target physical plan is lower than a preset value (which can be called the first preset threshold).
  • In the following embodiments, the case where the execution cost of the target physical plan is the lowest among the q execution costs is taken as an example of the execution cost meeting the first preset requirement, and details are not described again later.
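  • The two variants of the first preset requirement can be sketched as a small selection function. The plan names and costs are made up, and falling back to the lowest-cost plan when no plan is under the threshold is an assumption of this sketch rather than something the embodiment specifies.

```python
from typing import Dict, Optional

def select_target_plan(costs: Dict[str, float],
                       first_threshold: Optional[float] = None) -> str:
    """Pick the target physical plan from q candidate plans and their costs."""
    if not costs:
        raise ValueError("at least one physical plan is required (q >= 1)")
    if first_threshold is not None:
        # Variant 2: accept the first plan whose cost is below the threshold.
        for plan, cost in costs.items():
            if cost < first_threshold:
                return plan
    # Variant 1 (and the assumed fallback): the plan with the lowest cost.
    return min(costs, key=costs.get)

candidate_costs = {
    "seq_scan_hash_join": 120.0,
    "index_scan_nested_loop": 45.5,
    "index_scan_merge_join": 60.0,
}
cheapest = select_target_plan(candidate_costs)
good_enough = select_target_plan(candidate_costs, first_threshold=130.0)
print(cheapest, good_enough)
```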
  • The first target model not meeting the second preset requirement may include, but is not limited to: 1) the performance of the first target model has not declined; 2) although the performance of the first target model has declined, the degree of decline has not reached a preset value (which can be called the second preset threshold); 3) the real-time performance of the first target model is evaluated and its subsequent performance is predicted, and the predicted probability of performance degradation of the first target model does not reach a preset value (which can be called the third preset threshold), for example, the predicted probability that the performance will decline does not reach 80%; 4) the duration of continuous running of the first target model has not reached a preset duration, for example, 30 minutes.
  • the optimizer may specifically include three models, which may be called model A, model B, and model C respectively, and are respectively used to perform logical query rewriting , cost estimation and physical plan generation steps.
  • The computer device may obtain the target physical plan through the n models included in the optimizer according to the SQL statement as follows. First, the computer device uses model A to perform logical query rewriting on the SQL statement (also called the SQL query) input to the database to obtain a rewritten logical plan, where model A is a model based on a tree search algorithm, for example, the Monte Carlo tree search algorithm. Then, model B generates q physical plans according to the logical plan, where model B is a model based on a deep learning algorithm, for example, a model based on Tree-LSTM, and q ≥ 1. Finally, model C calculates the q execution costs corresponding to the q physical plans (one physical plan corresponds to one execution cost), and the final target physical plan to be executed is determined according to the q execution costs, where model C is a model constructed based on a reinforcement learning algorithm.
  • the optimizer may also include more or fewer models to implement the process of logical query rewriting, cost estimation, and physical plan generation.
  • the optimizer includes three models for illustration only, and details will not be described later.
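  • The three-model flow (rewrite, generate q physical plans, cost them, pick one) can be sketched as a pipeline of callables. The stub models below are placeholders with made-up behavior and costs; they do not implement Monte Carlo tree search, Tree-LSTM, or reinforcement learning.

```python
from typing import Callable, List, Tuple

Rewriter = Callable[[str], str]             # model A: SQL -> rewritten logical plan
PlanGenerator = Callable[[str], List[str]]  # model B: logical plan -> q physical plans
CostModel = Callable[[str], float]          # model C: physical plan -> execution cost

def optimize(sql: str, model_a: Rewriter, model_b: PlanGenerator,
             model_c: CostModel) -> Tuple[str, float]:
    """Return the physical plan with the lowest estimated cost, and that cost."""
    logical_plan = model_a(sql)
    physical_plans = model_b(logical_plan)
    if not physical_plans:
        raise ValueError("model B must return at least one plan (q >= 1)")
    costs = [model_c(plan) for plan in physical_plans]
    best = min(range(len(physical_plans)), key=costs.__getitem__)
    return physical_plans[best], costs[best]

# Placeholder models (assumptions for illustration only).
def stub_rewriter(sql: str) -> str:
    return " ".join(sql.split())            # stands in for logical rewriting

def stub_generator(logical_plan: str) -> List[str]:
    return [f"NestedLoop({logical_plan})", f"HashJoin({logical_plan})"]

def stub_cost(plan: str) -> float:
    return 10.0 if plan.startswith("HashJoin") else 25.0

target_plan, target_cost = optimize("SELECT *  FROM a JOIN b ON a.id = b.id",
                                    stub_rewriter, stub_generator, stub_cost)
print(target_plan, target_cost)
```

  • Passing the models as callables mirrors the point made above that the optimizer may contain more or fewer models: a stage can be swapped without changing the surrounding flow.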
  • the computer device executes the target physical plan.
  • After the computer device obtains the final target physical plan, it executes the target physical plan.
  • the essence of this execution process is to use the generated target physical plan to execute the actual logic of the input SQL statement.
  • the computer device obtains the target physical plan based on the optimizer included in the database and finally executes the target physical plan.
  • The database deployed in the computer device includes an optimizer with n models, which replaces the traditional heuristic optimizer.
  • By combining machine learning technology, it converts logical queries into physical execution plans with higher execution efficiency, and can effectively solve the problems of inaccurate cost estimation and poor physical plan generation for complex SQL statements caused by the current database architecture.
  • The computer device can also send the running data of the processes in the database deployed in it to the suggester, where the suggester may be deployed in the computer device or in a remote device, which is not limited here.
  • The suggester can find abnormal data based on the running data, diagnose the cause of the abnormality based on the abnormal data, and finally invoke the self-optimization module corresponding to the cause to optimize, so as to reduce the probability that the running data of subsequent processes in the database becomes abnormal.
  • For the specific functions and calling logic of the suggester involved in the embodiment of the present application, refer to the description of the suggester 205 in the above embodiment corresponding to FIG. 2; details are not repeated here.
  • The computer device can also feed back the running data of the processes in the database to the suggester, and the suggester can give all-round optimization suggestions for the database based on the running data. This enables unattended database performance monitoring and root cause identification, greatly frees up operation and maintenance manpower, and helps the database system quickly recover from abnormalities or improve performance.
  • In addition to sending the running data of the processes in the database deployed in it to the suggester, the computer device may also send the running data to the training data collector.
  • The training data collector may be deployed in the computer device or in a remote device, which is not limited here.
  • After receiving the running data, the training data collector can obtain training data according to the running data, and construct m training sets based on the training data, where m ≥ 1.
  • For the specific functions and calling logic of the training data collector involved in the embodiment of the present application, refer to the description of the training data collector 202 in the above embodiment corresponding to FIG. 2; details are not repeated here.
  • The computer device can also feed back the running data of the processes in the database to the training data collector, and the training data collector can generate training data for the models involved in the database based on this running data.
  • This enables continuous optimization of the database system, reduces the probability of misjudgment by the database system, and provides trustworthy autonomous operation and maintenance services.
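  • How running data might be turned into m training sets can be sketched as a grouping step. This assumes, purely for illustration, that each running-data record is already tagged with the model it is meant to train; the record layout and model names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[List[float], float]   # (features, label)

def build_training_sets(running_data: List[dict]) -> Dict[str, List[Sample]]:
    """Group running-data records into one training set per model."""
    training_sets: Dict[str, List[Sample]] = defaultdict(list)
    for record in running_data:
        training_sets[record["model"]].append((record["features"], record["label"]))
    return dict(training_sets)

# Hypothetical records derived from running indicators and query logs.
running_data = [
    {"model": "cost_estimator", "features": [3.0, 120.0], "label": 0.42},
    {"model": "rewriter", "features": [1.0, 0.0], "label": 1.0},
    {"model": "cost_estimator", "features": [5.0, 800.0], "label": 2.10},
]
training_sets = build_training_sets(running_data)
print({name: len(samples) for name, samples in training_sets.items()})
```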
  • The above describes, without limitation, cases in which the first target model does not meet the second preset requirement. Conversely, the first target model meeting the second preset requirement includes, but is not limited to, the following cases: 1) the performance of the first target model begins to decline; 2) the performance of the first target model not only declines, but the degree of decline reaches a preset value (which can be called the second preset threshold); 3) the performance of the first target model is predicted by the model evaluator 204, and the predicted probability that the performance of the first target model will decline reaches a preset value (which can be called the third preset threshold), for example, the predicted probability of decline reaches 80%; 4) the duration of continuous running of the first target model reaches a preset duration, for example, 30 minutes.
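  • The four triggers for fine-tuning can be sketched as a single predicate. The 80% probability and the 30-minute duration are the examples given above; the 10% value for the second preset threshold, the field names, and the `trigger_on_any_drop` switch for case 1) are assumptions of this sketch.

```python
from dataclasses import dataclass

@dataclass
class ModelStatus:
    performance_drop: float            # relative drop vs. baseline; 0.0 means no drop
    predicted_drop_probability: float  # as estimated by the model evaluator
    minutes_running: float             # continuous running time of the model

@dataclass
class Thresholds:
    second: float = 0.10               # hypothetical second preset threshold
    third: float = 0.80                # example value: 80%
    max_minutes: float = 30.0          # example value: 30 minutes
    trigger_on_any_drop: bool = False  # case 1): any decline at all triggers

def meets_second_requirement(status: ModelStatus, t: Thresholds) -> bool:
    """True if the first target model should be sent for fine-tuning."""
    any_drop = t.trigger_on_any_drop and status.performance_drop > 0.0
    return (any_drop
            or status.performance_drop >= t.second
            or status.predicted_drop_probability >= t.third
            or status.minutes_running >= t.max_minutes)

healthy = ModelStatus(performance_drop=0.02, predicted_drop_probability=0.30,
                      minutes_running=10.0)
degraded = ModelStatus(performance_drop=0.15, predicted_drop_probability=0.30,
                       minutes_running=10.0)
print(meets_second_requirement(healthy, Thresholds()),
      meets_second_requirement(degraded, Thresholds()))
```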
  • When the first target model meets the second preset requirement, the computer device may further send a first instruction to the model manager (the model manager may or may not be deployed in the computer device, which is not limited here). The first instruction is used to instruct the model manager to fine-tune the first target model.
  • When the performance of the second target model meets the third preset requirement, the computer device receives the model parameters of the second target model sent by the model manager, where the second target model is the model obtained by the model manager fine-tuning the first target model using the target training set corresponding to the first target model, and the target training set is one of the m training sets. Finally, the computer device updates the first target model to the second target model and obtains the target physical plan through the updated n models (at this time, the updated n models no longer include the first target model).
  • The second target model meeting the third preset requirement may include, but is not limited to, the following: 1) The performance of the second target model is improved by a preset value (which can be called the fourth preset threshold) compared with the performance of the first target model. As an example, the fourth preset threshold can be zero, indicating that as long as the performance of the second target model reaches the performance level of the original first target model, the second target model is considered to meet the third preset requirement. As another example, the fourth preset threshold can be a value or a ratio greater than zero, indicating that the second target model is considered to meet the third preset requirement only when its performance has improved to a certain extent compared with that of the original first target model.
  • 2) The performance of the second target model is improved by the fifth preset threshold compared with the performance of the native kernel components in the database. That is, the performance improvement of the second target model over the traditional database algorithm is verified. If the improvement reaches the threshold, the model used by the corresponding module of the database is actually replaced; otherwise, the traditional database algorithm is still used to generate the target physical plan.
  • The value of the fifth preset threshold may be zero, or a value or a ratio greater than zero. For details, refer to the first case above, which is not repeated here.
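  • The acceptance check for the fine-tuned model can be sketched as follows. The sketch assumes performance is a single score where higher is better and that thresholds are absolute differences (the text above also allows ratios); the function name, return labels, and scores are hypothetical.

```python
def choose_plan_generator(new_perf: float, old_perf: float, native_perf: float,
                          fourth: float = 0.0, fifth: float = 0.0,
                          compare_to: str = "first_model") -> str:
    """Decide whether to deploy the fine-tuned model or use the native component."""
    if compare_to == "first_model":
        # Case 1): improvement over the first target model reaches the fourth threshold.
        accepted = new_perf - old_perf >= fourth
    elif compare_to == "native":
        # Case 2): improvement over the native kernel component reaches the fifth threshold.
        accepted = new_perf - native_perf >= fifth
    else:
        raise ValueError("compare_to must be 'first_model' or 'native'")
    return "second_target_model" if accepted else "native_kernel_component"

# Made-up scores: fine-tuned model 0.92, previous model 0.90, native algorithm 0.85.
decision = choose_plan_generator(0.92, 0.90, 0.85)
fallback = choose_plan_generator(0.88, 0.90, 0.85)
print(decision, fallback)
```

  • With a threshold of zero, a fine-tuned model that merely matches the previous score is accepted, which corresponds to the first example given for the fourth preset threshold.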
  • the model manager invokes the corresponding target training set in the training data collector to fine-tune the first target model , which can dynamically update and replace the corresponding model used in the database according to the real-time operation status of the database.
  • When the performance of the second target model does not meet the third preset requirement, the computer device also receives a second instruction sent by the model evaluator, where the second instruction is used to instruct the database to use the native kernel component in the database to generate the final target physical plan. The model evaluator can be deployed in the computer device or in a remote device, which is not limited here.
  • the model evaluator is used to evaluate the performance of the second target model.
  • the computer device receives the second instruction of the model evaluator to instruct the database to use the traditional algorithm of the database (that is, the native kernel component) to generate the target physical plan.
  • the embodiment of the present application provides multiple options for generating the target physical plan, and has flexibility.
  • FIG. 9 is a schematic diagram of a computer device provided by an embodiment of the present application.
  • The computer device 900 may specifically include a receiving module 901, a determining module 902, and an execution module 903. The receiving module 901 is used to receive the SQL statement sent by the client to the database. The determining module 902 is used to obtain the target physical plan through the n models according to the SQL statement when the first target model does not meet the second preset requirement, where the target physical plan is a physical plan whose execution cost meets a first preset requirement, and the first target model is one of the n models. The execution module 903 is configured to execute the target physical plan.
  • the n models include a third model, a fourth model, and a fifth model.
  • The determination module 902 is specifically configured to: perform logical query rewriting on the SQL statement through the third model to obtain a rewritten logical plan, where the third model is a model constructed based on a tree search algorithm; generate q physical plans according to the logical plan through the fourth model, where the fourth model is a model constructed based on a deep learning algorithm, q ≥ 1; and calculate, through the fifth model, the q execution costs corresponding to the q physical plans and determine the target physical plan according to the q execution costs, where one physical plan corresponds to one execution cost and the fifth model is a model constructed based on a reinforcement learning algorithm.
  • The computer device 900 further includes a sending module 904, configured to: send the running data of the processes in the database to the suggester, so that the suggester finds abnormal data based on the running data, diagnoses the cause of the abnormality based on the abnormal data, and invokes the self-optimization module corresponding to the cause to optimize, so as to reduce the probability that the running data becomes abnormal, where the suggester includes p models, p ≥ 1.
  • The sending module 904 can also be configured to: send the running data of the processes in the database to the training data collector, so that the training data collector obtains training data according to the running data and constructs m training sets based on the training data, m ≥ 1.
  • The sending module 904 may also be configured to: when the first target model meets the second preset requirement, send a first instruction to the model manager, where the first instruction is used to instruct the model manager to fine-tune the first target model, and the first target model is one of the n models. The receiving module 901 may also be used to: when the performance of the second target model meets the third preset requirement, receive the model parameters of the second target model sent by the model manager, where the second target model is the model obtained by the model manager fine-tuning the first target model using the target training set corresponding to the first target model, and the target training set is one of the m training sets. The determination module 902 can also be used to: update the first target model to the second target model, and obtain the target physical plan through the updated n models.
  • The receiving module 901 may also be configured to: when the performance of the second target model does not meet the third preset requirement, receive the second instruction sent by the model evaluator, where the second instruction is used to instruct the database to use native kernel components in the database to generate the target physical plan, and the model evaluator is used to evaluate the performance of the second target model.
  • The first target model meeting the second preset requirement includes at least any one of the following: the performance of the first target model is degraded; or, the degree of performance degradation of the first target model reaches the second preset threshold; or, the predicted probability of performance degradation of the first target model reaches a third preset threshold; or, the duration of continuous running of the first target model reaches a preset duration.
  • the performance of the second target model meeting the third preset requirement includes at least any one of the following: the performance of the second target model is improved by a fourth preset compared with the performance of the first target model threshold; or, the performance of the second target model is improved by a fifth preset threshold compared with the performance of the native kernel components in the database.
  • The execution cost meeting the first preset requirement includes at least any one of the following: the execution cost of the target physical plan is the lowest among q execution costs, where the q execution costs are the execution costs corresponding to the q physical plans generated based on the SQL statement, one physical plan corresponds to one execution cost, and q ≥ 1; or, the execution cost of the target physical plan is lower than a first preset threshold.
  • FIG. 10 is a schematic structural diagram of the computer device provided by the embodiment of the present application.
  • The computer device 900 described above can be deployed on the computer device 1000 to realize the functions of the computer device 900 in the corresponding embodiment above. The computer device 1000, whose configuration may differ between implementations, may include one or more central processing units (CPU) 1022 and memory 1032, and one or more storage media 1030 (such as one or more mass storage devices) for storing application programs 1042 or data 1044.
  • the memory 1032 and the storage medium 1030 may be temporary storage or persistent storage.
  • the program stored in the storage medium 1030 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the computer device 1000 .
  • the central processing unit 1022 may be configured to communicate with the storage medium 1030 , and execute a series of instruction operations in the storage medium 1030 on the computer device 1000 .
  • the computer device 1000 can also include one or more power supplies 1026, one or more wired or wireless network interfaces 1050, one or more input/output interfaces 1058, and/or one or more operating systems 1041, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
  • the central processing unit 1022 is configured to execute the steps performed by the computer device in the embodiment corresponding to FIG. 8 .
  • the central processing unit 1022 can be used to: first, receive the SQL statement sent by the client device to the database deployed in the computer device, where the database includes an optimizer and native database kernel components, and the optimizer includes n models, n ≥ 1. After receiving the SQL statement, it first determines whether any of the n models in the optimizer fails to meet a preset requirement (which can be called the second preset requirement).
  • if a first target model (one of the n models) does not meet the second preset requirement, the target physical plan is obtained, according to the SQL statement, through the n models included in the optimizer, where the target physical plan is a physical plan whose execution overhead meets the first preset requirement. After the target physical plan is obtained, it is executed; in essence, the generated target physical plan is used to carry out the actual logic of the input SQL statement.
  • the device embodiments described above are only illustrative; the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they can be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • the connection relationship between the modules indicates that they have communication connections, which can be specifically implemented as one or more communication buses or signal lines.
  • the essence of the technical solution of this application, or the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, removable hard disk, read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disc, and includes several instructions that cause a computer device (which can be a personal computer, a training device, a network device, etc.) to execute the methods described in the various embodiments of the present application.
  • all or part of them may be implemented by software, hardware, firmware or any combination thereof.
  • when implemented using software, it may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, wireless, microwave).
  • the computer-readable storage medium may be any available medium that can be stored by a computer, or a data storage device such as a training device or a data center integrated with one or more available media.
  • the available medium may be a magnetic medium (such as a floppy disk, a hard disk, or a magnetic tape), an optical medium (such as a DVD), or a semiconductor medium (such as a solid state disk (solid state disk, SSD)), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Hardware Design (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed in the present application are a database management system, a data processing method, and a device. The system comprises: an optimizer including n models; a training data collector; a model manager; and a model evaluator. The optimizer is used to obtain a target physical plan by means of the n models according to an SQL statement; the training data collector is used to construct m training sets according to operating data of a database process; the model manager is used to fine-tune a first target model (which belongs to the n models and needs to meet a preset requirement, e.g. a performance reduction) by using a target training set (which belongs to the m training sets), so as to obtain a second target model; and the model evaluator is used to evaluate the performance of the second target model, and update the first target model to the second target model when the performance meets a preset requirement (e.g. a performance improvement). The present application is combined with machine learning, so as to realize the function of automatically executing database tuning and updating and other database management tasks that are traditionally executed by a DBA, without manual intervention.

Description

A database management system, data processing method, and device
This application claims priority to Chinese Patent Application No. 202110930569.8, filed with the China Patent Office on August 13, 2021 and entitled "A database management system, data processing method and device", which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the field of database management, and in particular to a database management system, a data processing method, and a device.
Background
A database is a "warehouse that organizes, stores, and manages data according to a data structure": a large, organized, shareable, and uniformly managed collection of data stored in a computer over the long term. Databases are critical to the efficient operation of modern enterprises.
In practice, database management relies mainly on database administrators (DBAs). DBAs often spend a great deal of time and effort managing and maintaining databases manually, which is highly error-prone and can have a catastrophic impact on database uptime, performance, and security. For example, failing to apply patches and install security updates correctly and on time can lead to database vulnerabilities and weaken or even completely disable database protection measures, exposing the enterprise to a serious risk of data breaches, with severe financial impact and loss of goodwill.
However, this traditional approach to database management rests on optimization techniques driven by expert experience, such as cost estimation, join order selection, and parameter tuning, and can no longer meet the requirements of multi-scenario services, massive applications, and extreme performance.
Summary of the Invention
Embodiments of the present application provide a database management system, a data processing method, and a device that, by incorporating machine learning, automatically perform database tuning, protection, updating, and other routine database management tasks traditionally performed by a DBA, without manual intervention.
On this basis, the embodiments of the present application provide the following technical solutions:
In a first aspect, an embodiment of the present application provides a database management system. The system may include a self-learning optimizer, a training data collector, a model manager, and a model evaluator. (Note that some of the kernel components included in the optimizer may be native kernel components, but the n models included in the optimizer are newly added by the present application, n ≥ 1.) The optimizer is configured to obtain, through the n models and according to an SQL statement input to the database, the physical plan that will finally be executed (the target physical plan), where the target physical plan is a physical plan whose execution overhead meets a preset requirement (the first preset requirement). The training data collector is configured to obtain training data from the running data of processes in the database (for example, database running metrics, query logs, and system logs) and to construct m training sets from that training data, m ≥ 1. The model manager is configured to, when a first target model meets a preset requirement (the second preset requirement), fine-tune the first target model using the target training set corresponding to it, thereby obtaining a second target model (the second target model is essentially the first target model with updated model parameters). The first target model is one of the n models, and the target training set is one of the m training sets. The model parameters of the fine-tuned second target model can then be passed to the model evaluator for performance evaluation. Note that m and n may or may not be equal. If m = n, each of the n models has its own training set. If m ≠ n, several of the n models may share one training set (m < n), or one model may have several training sets available for training (m > n); the present application does not limit this. The model evaluator is configured to evaluate the performance of the second target model and, when that performance meets a preset requirement (the third preset requirement), to update the first target model in the optimizer to the second target model. As an example, the update may proceed as follows: the model evaluator sends the model parameters of the second target model to the optimizer, and the optimizer assigns the received parameters to the first target model, thereby obtaining the second target model. Note that in some embodiments of the present application, the model evaluator may be a performance prediction model based on graph embedding.
In the above embodiment of the present application, the database management system includes an optimizer with n models, a training data collector, a model manager, and a model evaluator. The optimizer with n models replaces the traditional heuristic optimizer and, by incorporating machine learning, converts logical queries into physical execution plans that execute more efficiently. The training data collector generates training data for the models used by the database from the running data of database processes, so that the database can be optimized continuously and trusted autonomous operation and maintenance services can be provided. The model manager can call the corresponding target training set in the training data collector to fine-tune the first target model, so that the models used in the database are dynamically updated and replaced according to the real-time running state of the database. The model evaluator provides a second round of verification to ensure that the best valid model is used. By incorporating machine learning, the database management system constructed in the embodiments of the present application automatically performs database tuning, protection, updating, and other routine database management tasks traditionally performed by a DBA, without manual intervention.
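The interaction between the model manager, the model evaluator, and the optimizer can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `Model` class, the `maintain_model` function, and the three callables are hypothetical names introduced here, and the actual fine-tuning and evaluation logic is left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Model:
    """Hypothetical stand-in for one of the optimizer's n models."""
    name: str
    params: List[float]


def maintain_model(
    optimizer_models: Dict[str, Model],
    name: str,
    needs_finetune: Callable[[Model], bool],        # second preset requirement
    finetune: Callable[[Model], Model],             # model manager (uses the target training set)
    is_acceptable: Callable[[Model, Model], bool],  # third preset requirement (model evaluator)
) -> bool:
    """Return True if the first target model inside the optimizer was updated."""
    first = optimizer_models[name]
    if not needs_finetune(first):
        return False
    second = finetune(first)
    if not is_acceptable(first, second):
        return False
    # The update is done by assigning the new parameters to the existing model,
    # mirroring the "send parameters to the optimizer" example in the text.
    first.params = list(second.params)
    return True
```

The update step copies parameters rather than swapping objects, so other components holding a reference to the first target model see the new parameters immediately.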
In a possible implementation of the first aspect, the database management system may further include a suggester, which may include p models, p ≥ 1. The suggester is configured to detect abnormal conditions in the running data of database processes (for example, CPU utilization and user response time), that is, to find abnormal data, and to diagnose the cause of the abnormality from the abnormal data. It then optimizes, based on that cause, the optimization module corresponding to the cause (the optimization module is also located in the suggester and is used to tune database parameters), so as to reduce the probability that the running data of database processes becomes abnormal.
In the above embodiment of the present application, the database management system adds a machine-learning-based suggester that provides self-monitoring, self-diagnosis, and self-optimization, so that the database is optimized automatically and intelligently. This addresses the problem that prior-art database monitoring, configuration, diagnosis, and optimization methods (for example, parameter tuning, slow SQL diagnosis, and index/view advisors) depend on DBAs, are costly, and cannot scale to large numbers of instances (for example, cloud databases).
In a possible implementation of the first aspect, the suggester may include three models, referred to as the codec, the first model, and the second model, which are used for the self-monitoring, self-diagnosis, and self-optimization steps of the database. Specifically, through the codec, the suggester encodes and then decodes the running data of database processes to obtain encoded data, and compares the encoded data with the running data that was input to the codec to obtain abnormal data. The principle is that the codec can restore normal original data but cannot restore abnormal original data; encoding and then decoding the input therefore yields encoded data that, when compared with the original data, reveals whether abnormal data is present. Once abnormal data has been obtained, if the running data is system metric data (for example, page faults), the suggester can use the first model to diagnose the cause of the abnormality from the abnormal data, where the first model is built on a deep learning algorithm. If the running data is query metric data (for example, average latency), the suggester can use the second model to diagnose the cause of the abnormality from the abnormal data, where the second model is also built on a deep learning algorithm. Note that in other embodiments of the present application the suggester may include more or fewer models to implement the self-monitoring and self-diagnosis of the database; the three models described here are only an example.
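The reconstruction-based idea behind the codec can be illustrated with a deliberately simple stand-in. The sketch below replaces the learned codec with a nearest-prototype (vector quantization) codec built from samples assumed to be normal; `CodebookCodec`, `find_anomalies`, and the threshold are hypothetical names and values chosen for illustration, not part of the source.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two metric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class CodebookCodec:
    """Stand-in codec: a sample is encoded as the index of its nearest
    'normal' prototype and decoded back to that prototype."""

    def __init__(self, normal_samples: Sequence[Vector]) -> None:
        self.codebook: List[Tuple[float, ...]] = [tuple(s) for s in normal_samples]

    def encode(self, sample: Vector) -> int:
        return min(range(len(self.codebook)),
                   key=lambda i: distance(sample, self.codebook[i]))

    def decode(self, code: int) -> Tuple[float, ...]:
        return self.codebook[code]


def find_anomalies(codec: CodebookCodec, samples: Sequence[Vector],
                   threshold: float) -> List[Tuple[float, ...]]:
    """Flag samples whose reconstruction differs from the input by more than threshold."""
    anomalies = []
    for sample in samples:
        reconstructed = codec.decode(codec.encode(sample))
        if distance(sample, reconstructed) > threshold:
            anomalies.append(tuple(sample))
    return anomalies
```

Normal samples reconstruct with a small error because a nearby prototype exists; an abnormal sample has no close prototype, so its reconstruction error is large. A learned autoencoder would play the same role with a trained encoder and decoder.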
The above embodiment of the present application describes how the suggester uses the models it contains to provide self-monitoring and self-diagnosis of the database, so that the database management system manages the running data of the database automatically and intelligently.
In a possible implementation of the first aspect, the first model may include an LSRM model and a classifier. Specifically, the suggester calls the LSRM model to encode the detected abnormal data into a compressed vector (that is, a vector after dimensionality reduction or increase), and then uses a learned classifier (for example, a binary classifier or a multi-class classifier) to infer the corresponding root cause (for example, a database backup operation). The second model may include a Tree-LSTM model and a softmax function. Specifically, the suggester calls the Tree-LSTM model to encode a slow query (that is, a query with a long execution time) and locate the physical operator (that is, the execution operator) that causes the abnormality, and then uses the softmax function to identify the root cause of the abnormality.
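The final softmax step can be sketched in isolation. The example below assumes the upstream encoder has already produced one raw score per candidate root cause; the cause labels and the function name `most_likely_root_cause` are hypothetical and used only for illustration.

```python
import math
from typing import Dict, List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def most_likely_root_cause(scores_by_cause: Dict[str, float]) -> Tuple[str, float]:
    """Return the candidate root cause with the highest softmax probability."""
    causes = list(scores_by_cause)
    probs = softmax([scores_by_cause[c] for c in causes])
    best = max(range(len(causes)), key=probs.__getitem__)
    return causes[best], probs[best]
```

Because softmax is monotonic, the winning cause is the one with the highest raw score; the probabilities are mainly useful for thresholding or for reporting confidence alongside the diagnosis.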
The above embodiment of the present application describes a typical implementation of the first model and the second model, which makes the solution practicable.
In a possible implementation of the first aspect, if the database management system constructed in the embodiments of the present application includes the suggester, the first target model described above may be any one of the n models in the optimizer or any one of the p models in the suggester.
The above embodiment of the present application makes clear that the first target model can be any model in the optimizer or the suggester. This widens the range from which the first target model can be selected, so that the model evaluator can evaluate the performance of models in the suggester as well as models in the optimizer, which makes the solution practicable.
In a possible implementation of the first aspect, the optimizer may include three models, referred to as the third model, the fourth model, and the fifth model, which are used for logical query rewriting, cost estimation, and physical plan generation, respectively. Specifically, through the third model, the optimizer performs logical query rewriting on the SQL statement (also called an SQL query) input to the database to obtain a rewritten logical plan; the third model is built on a tree search algorithm, for example Monte Carlo tree search. Through the fourth model, the optimizer then generates q physical plans from the logical plan, q ≥ 1; the fourth model is built on a deep learning algorithm and may be, for example, a Tree-LSTM-based model. Finally, through the fifth model, the optimizer computes q execution overheads corresponding to the q physical plans (one execution overhead per physical plan) and determines the target physical plan to be executed according to the q execution overheads; the fifth model is built on a reinforcement learning algorithm and may be, for example, a model based on deep reinforcement learning (deep Q-learning, DQN). Note that in other embodiments of the present application the optimizer may include more or fewer models to implement logical query rewriting, cost estimation, and physical plan generation; the three models described here are only an example.
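The three-stage flow above can be sketched as a small pipeline in which each stage is a pluggable callable. This is only an illustration of the data flow under the assumption that the lowest execution overhead is chosen; `choose_target_plan` and its parameters are hypothetical names, and the learned models themselves are not reproduced here.

```python
from typing import Callable, List, Tuple, TypeVar

LogicalPlan = TypeVar("LogicalPlan")
PhysicalPlan = TypeVar("PhysicalPlan")


def choose_target_plan(
    sql: str,
    rewrite: Callable[[str], LogicalPlan],                         # third model
    enumerate_plans: Callable[[LogicalPlan], List[PhysicalPlan]],  # fourth model
    estimate_overhead: Callable[[PhysicalPlan], float],            # fifth model
) -> Tuple[PhysicalPlan, float]:
    """Rewrite the query, generate q physical plans, and pick the cheapest one."""
    logical_plan = rewrite(sql)
    physical_plans = enumerate_plans(logical_plan)
    if not physical_plans:
        raise ValueError("at least one physical plan is required (q >= 1)")
    overheads = [estimate_overhead(plan) for plan in physical_plans]
    best = min(range(len(overheads)), key=overheads.__getitem__)
    return physical_plans[best], overheads[best]
```

Keeping the stages as separate callables reflects the point that each model can be fine-tuned or replaced independently without changing the surrounding flow.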
The above embodiment of the present application describes how the optimizer uses the models it contains to perform logical query rewriting, cost estimation, and physical plan generation. The optimizer thereby replaces the traditional heuristic optimizer and, by incorporating machine learning, converts logical queries into physical execution plans that execute more efficiently. This effectively addresses the inaccurate cost estimation caused by current database architectures and the poor physical plans generated for complex SQL statements.
In a possible implementation of the first aspect, if the performance of the second target model does not meet the third preset requirement, the model evaluator is further configured to trigger the database to generate the target physical plan using the native kernel components in the database; for example, the index selection module uses the traditional hill-climbing algorithm to create indexes in order to execute the logic of the SQL statement. In other words, if the fine-tuned second target model still does not perform well enough, the traditional kernel component algorithms of the database are used to generate the target physical plan.
In the above embodiment of the present application, the native kernel components of the database are not removed; they coexist with the newly added optimizer in the database software. Therefore, while the database is running, if the performance of the second target model does not meet the third preset requirement, the model evaluator triggers the database to generate the target physical plan using the native kernel components (because in this case the native kernel components perform better). The way the target physical plan is generated can thus be adjusted dynamically in real time, based on the evaluation of the current performance of the second target model, which improves overall database performance.
In a possible implementation of the first aspect, if the performance of the second target model does not meet the third preset requirement, then in addition to triggering the database to generate the target physical plan using the native kernel components, the model evaluator may feed back the model update failure (that is, the fact that the performance of the second target model does not meet the third preset requirement) to the model manager, so that the model manager adjusts its fine-tuning strategy for the first target model based on this information.
In the above embodiment of the present application, the model evaluator can additionally feed back the model update failure to the model manager, giving the model manager a reference for subsequent training strategies and thereby improving model training capability and training efficiency.
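The fallback and feedback behavior described above can be sketched as a single decision function. The function name, the returned labels, and the feedback message are hypothetical; how the model manager actually uses the feedback is not specified in the source.

```python
from typing import List


def plan_source_after_evaluation(meets_third_requirement: bool,
                                 feedback: List[str]) -> str:
    """Decide which component generates the target physical plan.

    If the fine-tuned model is not good enough, fall back to the native
    kernel components and record the failure for the model manager.
    """
    if meets_third_requirement:
        return "learned_model"
    feedback.append("second target model did not meet the third preset requirement")
    return "native_kernel_components"
```

The feedback list stands in for whatever channel carries the failure information back to the model manager so that it can adjust its fine-tuning strategy.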
In a possible implementation of the first aspect, the execution overhead meeting the first preset requirement includes but is not limited to:
1) the execution overhead of the target physical plan is the lowest among q execution overheads, where the q execution overheads correspond one-to-one to the q physical plans generated from the SQL statement input to the database, q ≥ 1;
2) the execution overhead of the target physical plan is lower than a preset value (the first preset threshold). For ease of description, subsequent embodiments of the present application take the case in which the execution overhead of the target physical plan is the lowest among the q execution overheads as the case in which the first preset requirement is met; this is not repeated below.
The above embodiment of the present application describes several specific cases in which the execution overhead meets the first preset requirement, which provides broad applicability and flexibility.
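The two criteria can be written as one small predicate. This is a sketch under the assumption that either criterion suffices; `meets_first_requirement` and its parameter names are hypothetical.

```python
from typing import Optional, Sequence


def meets_first_requirement(overhead: float,
                            all_overheads: Sequence[float],
                            first_threshold: Optional[float] = None) -> bool:
    """True if the plan's overhead is the lowest of the q overheads,
    or is below the first preset threshold when one is configured."""
    is_lowest = overhead <= min(all_overheads)
    below_threshold = first_threshold is not None and overhead < first_threshold
    return is_lowest or below_threshold
```

With no threshold configured, only the "lowest of q" criterion applies, which matches the case the text adopts for its later embodiments.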
In a possible implementation of the first aspect, the first target model meeting the second preset requirement includes but is not limited to:
1) the performance of the first target model starts to degrade;
2) the performance of the first target model degrades, and the degree of degradation reaches a preset value (the second preset threshold);
3) the real-time performance of the first target model is evaluated and its subsequent performance is predicted (for example, by the model evaluator), and the predicted probability that the performance of the first target model will degrade reaches a preset value (the third preset threshold), for example 80%;
4) the duration for which the first target model has run continuously reaches a preset duration, for example 30 minutes.
The above embodiment of the present application describes several specific cases in which the first target model meets the second preset requirement, which provides broad applicability and flexibility.
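The four trigger conditions can be combined into one check. The sketch below assumes a scalar performance score where higher is better; `ModelStatus`, `needs_finetune`, and the default values (80% and 30 minutes, taken from the examples in the text) are illustrative and not prescribed by the source.

```python
from dataclasses import dataclass


@dataclass
class ModelStatus:
    """Hypothetical snapshot of a model's state; higher scores mean better performance."""
    baseline_score: float
    current_score: float
    predicted_degradation_prob: float
    running_seconds: float


def needs_finetune(status: ModelStatus,
                   degrade_threshold: float = 0.0,
                   prob_threshold: float = 0.8,
                   max_running_seconds: float = 30 * 60) -> bool:
    """True if any of the four conditions for the second preset requirement holds."""
    drop = status.baseline_score - status.current_score
    # Conditions 1) and 2): with degrade_threshold == 0 any drop triggers;
    # with a positive threshold the drop must reach that threshold.
    degraded = drop > 0 and drop >= degrade_threshold
    # Condition 3): predicted probability of degradation reaches the threshold.
    likely_to_degrade = status.predicted_degradation_prob >= prob_threshold
    # Condition 4): the model has run continuously for the preset duration.
    ran_long_enough = status.running_seconds >= max_running_seconds
    return degraded or likely_to_degrade or ran_long_enough
```

A deployment that relies on only one of the conditions could disable the others by choosing thresholds that can never be reached.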
In a possible implementation of the first aspect, the second target model meeting the third preset requirement may include but is not limited to:
1) the performance of the second target model is improved over that of the first target model by a preset value (the fourth preset threshold). As one example, the fourth preset threshold may be zero, meaning that the second target model is considered to meet the third preset requirement as long as its performance reaches the level of the original first target model. As another example, the fourth preset threshold may be a value or ratio greater than zero, meaning that the second target model is considered to meet the third preset requirement only when its performance has improved over that of the original first target model by a certain amount;
2)第二目标模型的性能相比数据库内原生内核组件的性能提高了第五预设阈值。即验证该第二目标模型相对于传统数据库算法的性能提升,如果性能提升达到一定阈值,则实际替换数据相应模块使用的模型,否则,还是采用传统数据库算法执行目标物理计划。其中,第五预设阈值的取值可以为零,也可以是大于零的某个值或某个比例,具体请参阅上述第一种方式,此处不予赘述。2) The performance of the second target model is improved by a fifth preset threshold compared with the performance of the native kernel components in the database. That is to verify the performance improvement of the second target model compared with the traditional database algorithm. If the performance improvement reaches a certain threshold, the model used by the corresponding module of the data is actually replaced. Otherwise, the traditional database algorithm is still used to execute the target physical plan. Wherein, the value of the fifth preset threshold may be zero, or a certain value or a certain ratio greater than zero. For details, please refer to the above-mentioned first method, which will not be repeated here.
在本申请上述实施方式中,具体阐述了第二目标模型满足第三预设要求的几种具体情形,具备广泛适用性以及灵活性。In the above embodiments of the present application, several specific situations in which the second target model satisfies the third preset requirement are specifically described, which has wide applicability and flexibility.
A second aspect of the embodiments of this application further provides a data management method. The method includes the following. First, a local computer device receives an SQL statement sent by a client device to a database deployed on the computer device, where the database includes an optimizer and a native database kernel component, and the optimizer includes n models, n≥1. After receiving the SQL statement, the computer device first determines whether any of the n models in the optimizer fails to meet a preset requirement (which may be referred to as a second preset requirement). If a first target model (that is, one of the n models) does not meet the second preset requirement, the computer device obtains a target physical plan from the SQL statement through the n models included in the optimizer, where the target physical plan is a physical plan whose execution overhead meets a first preset requirement. After obtaining the final target physical plan, the computer device executes it; in essence, this execution uses the generated target physical plan to carry out the actual logic of the input SQL statement.
The foregoing implementation describes how the computer device obtains the target physical plan based on the optimizer included in the database and finally executes it. The database deployed on the computer device includes an optimizer with n models, which replaces the traditional heuristic optimizer and, by incorporating machine learning techniques, converts logical queries into physical execution plans that execute more efficiently.
In a possible implementation of the second aspect, the optimizer may specifically include three models, which may be referred to as a third model, a fourth model, and a fifth model, and which are respectively used to perform the steps of logical query rewriting, cost estimation, and physical plan generation. In this case, the computer device may obtain the target physical plan from the SQL statement through the n models included in the optimizer as follows. First, the computer device uses the third model to perform logical query rewriting on the SQL statement (also referred to as an SQL query) input to the database, obtaining a rewritten logical plan, where the third model is a model built on a tree search algorithm, for example the Monte Carlo tree search algorithm. Next, the fourth model generates q physical plans from the logical plan, where the fourth model is a model built on a deep learning algorithm, for example a Tree-LSTM-based model, q≥1. Finally, the fifth model calculates q execution overheads corresponding to the q physical plans (one execution overhead per physical plan), and the final target physical plan to be executed is determined from the q execution overheads, where the fifth model is a model built on a reinforcement learning algorithm, for example a DQN-based model. Note that in other implementations of this application the optimizer may include more or fewer models to implement logical query rewriting, cost estimation, and physical plan generation; the three-model optimizer in this embodiment is only illustrative.
The foregoing implementation describes how the optimizer uses the models it includes to implement logical query rewriting, cost estimation, and physical plan generation for the database, thereby replacing the traditional heuristic optimizer. By incorporating machine learning techniques, it converts logical queries into physical execution plans that execute more efficiently, and it can effectively address the inaccurate cost estimation and the poor physical plans generated for complex SQL statements that result from current database architecture problems.
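The three-stage flow described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of this application: `rewrite`, `generate_plans`, and `estimate_overhead` are hypothetical stand-ins for the third, fourth, and fifth models, and the toy plan names and overhead values are invented for the example.

```python
from typing import Callable, List, Tuple


def choose_target_plan(
    sql: str,
    rewrite: Callable[[str], str],                # stand-in for the third model
    generate_plans: Callable[[str], List[str]],   # stand-in for the fourth model
    estimate_overhead: Callable[[str], float],    # stand-in for the fifth model
) -> Tuple[str, float]:
    """Return the candidate physical plan with the lowest estimated overhead."""
    logical_plan = rewrite(sql)
    candidates = generate_plans(logical_plan)     # q >= 1 physical plans
    if not candidates:
        raise ValueError("at least one physical plan is required (q >= 1)")
    overheads = [estimate_overhead(p) for p in candidates]  # one per plan
    best = min(range(len(candidates)), key=overheads.__getitem__)
    return candidates[best], overheads[best]


# Toy stand-ins, invented for illustration only.
toy_overheads = {"hash_join": 12.0, "nested_loop": 40.0, "merge_join": 15.5}
plan, overhead = choose_target_plan(
    "SELECT * FROM a JOIN b ON a.id = b.id",
    rewrite=lambda s: s.strip(),
    generate_plans=lambda logical: list(toy_overheads),
    estimate_overhead=toy_overheads.__getitem__,
)
print(plan, overhead)  # hash_join 12.0
```

Passing the three stages in as callables keeps the sketch independent of how many models the optimizer actually contains, which the text leaves open.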
In a possible implementation of the second aspect, the computer device may further send running data of processes in the database deployed on it to an advisor. The advisor may be deployed on the computer device or on a remote device, which is not limited here. After receiving the running data, the advisor may detect abnormal data based on the running data, diagnose the cause of the abnormality based on the abnormal data, and finally optimize the self-optimization module corresponding to that cause, so as to reduce the probability that the running data of processes in the database becomes abnormal later. The advisor includes p models, p≥1.
The foregoing implementation describes that the computer device may also feed the running data of processes in the database back to the advisor, which can give comprehensive optimization suggestions for the database based on that data. This enables unattended database performance monitoring and root cause identification, greatly frees up operation and maintenance staff, and helps the database system recover quickly from anomalies or improve performance.
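As a rough illustration of the detect, diagnose, and optimize sequence, the sketch below flags anomalous samples with a simple z-score test and looks up a cause and a module in two tables. Everything here is an assumption made for the example: the z-score rule stands in for the p models of the advisor, and the metric names, root causes, and module names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List

# Hypothetical lookup tables, invented for illustration only.
ROOT_CAUSES: Dict[str, str] = {
    "query_latency_ms": "missing index",
    "lock_wait_ms": "lock contention",
}
MODULES: Dict[str, str] = {
    "missing index": "index advisor",
    "lock contention": "workload scheduler",
}


def find_anomalies(samples: List[float], z_threshold: float = 2.0) -> List[int]:
    """Indices of samples more than z_threshold standard deviations from the mean."""
    mu, sigma = mean(samples), pstdev(samples)
    if sigma == 0:
        return []
    return [i for i, x in enumerate(samples) if abs(x - mu) / sigma > z_threshold]


def advise(metrics: Dict[str, List[float]]) -> Dict[str, str]:
    """Map each anomalous metric to the module that would be optimized."""
    actions = {}
    for name, samples in metrics.items():
        if find_anomalies(samples):
            cause = ROOT_CAUSES.get(name, "unknown")
            actions[name] = MODULES.get(cause, "manual review")
    return actions


running_data = {
    "query_latency_ms": [10, 11, 9, 10, 12, 10, 11, 95, 10, 9],
    "lock_wait_ms": [5, 5, 6, 5, 6, 5],
}
print(advise(running_data))  # {'query_latency_ms': 'index advisor'}
```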
In a possible implementation of the second aspect, in addition to sending the running data of processes in the database deployed on it to the advisor, the computer device may also send the running data to a training data collector. The training data collector may be deployed on the computer device or on a remote device, which is not limited here. After receiving the running data, the training data collector may derive training data from it and construct m training sets from the training data, m≥1.
The foregoing implementation describes that the computer device may also feed the running data of processes in the database back to the training data collector, which can generate training data for the models involved in the database from that running data. This allows the database system to be optimized continuously, reduces the probability of misjudgment by the database system, and provides trustworthy autonomous operation and maintenance services.
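One way to picture the construction of the m training sets is to group collected samples by the model they are meant to train. The sketch below assumes a hypothetical record layout of (model name, features, observed label); neither the layout nor the model names come from this application.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (model name, features, observed label).
Record = Tuple[str, dict, float]
TrainingSet = List[Tuple[dict, float]]


def build_training_sets(records: Iterable[Record]) -> Dict[str, TrainingSet]:
    """Group samples derived from running data into one training set per model."""
    sets: Dict[str, TrainingSet] = defaultdict(list)
    for model_name, features, label in records:
        sets[model_name].append((features, label))
    return dict(sets)


records = [
    ("cost_estimator", {"rows": 1000, "joins": 2}, 12.5),
    ("plan_generator", {"tables": 3}, 1.0),
    ("cost_estimator", {"rows": 50, "joins": 0}, 0.4),
]
training_sets = build_training_sets(records)
print(len(training_sets))  # 2, so m = 2 in this toy example
```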
In a possible implementation of the second aspect, if the first target model meets the second preset requirement, the computer device may further send a first instruction to a model manager (which may or may not be deployed on the computer device; this is not limited here). The first instruction instructs the model manager to fine-tune the first target model. When the performance of a second target model meets a preset requirement (which may be referred to as a third preset requirement), the computer device receives the model parameters of the second target model sent by the model manager, where the second target model is the model obtained by the model manager fine-tuning the first target model with the target training set corresponding to the first target model, and the target training set is one of the m training sets. Finally, the computer device updates the first target model to the second target model and obtains the target physical plan through the updated n models (which now include the second target model instead of the first target model).
The foregoing implementation describes that, when the first target model meets the second preset requirement, the model manager additionally fine-tunes the first target model with the target training set corresponding to it, and the first target model is updated when the performance of the resulting second target model meets the third preset requirement. In this way, the models used in the database are dynamically updated and replaced according to the real-time running state of the database.
In a possible implementation of the second aspect, if the performance of the second target model does not meet the third preset requirement, the computer device further receives a second instruction sent by a model evaluator. The second instruction instructs the database to generate the final target physical plan with the native kernel component in the database. The model evaluator may be deployed on the computer device or on a remote device, which is not limited here, and is used to evaluate the performance of the second target model.
The foregoing implementation describes that, when the fine-tuned second target model still does not meet the third preset requirement, the computer device receives the second instruction from the model evaluator, which instructs the database to generate the target physical plan with the traditional database algorithm (that is, the native kernel component). The embodiments of this application thus provide multiple options for generating the target physical plan and are flexible.
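Taken together, the implementations above amount to a three-way decision. The sketch below is a minimal summary of that decision, assuming the two checks have already been reduced to booleans; the `Action` names are hypothetical labels introduced for the example.

```python
from enum import Enum


class Action(Enum):
    KEEP_CURRENT = "keep using the first target model"
    USE_FINE_TUNED = "replace it with the second target model"
    USE_NATIVE = "fall back to the native kernel component"


def decide(meets_second_requirement: bool, meets_third_requirement: bool) -> Action:
    """Choose how the target physical plan will be generated.

    meets_second_requirement: the first target model should be fine-tuned.
    meets_third_requirement: the fine-tuned model is good enough to deploy.
    """
    if not meets_second_requirement:
        return Action.KEEP_CURRENT
    if meets_third_requirement:
        return Action.USE_FINE_TUNED
    return Action.USE_NATIVE


print(decide(True, False).value)  # fall back to the native kernel component
```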
In a possible implementation of the second aspect, the first target model meeting the second preset requirement includes, but is not limited to:
1) The performance of the first target model begins to decline;
2) The performance of the first target model not only declines, but the degree of decline reaches a preset value (which may be referred to as a second preset threshold);
3) The real-time performance of the first target model is evaluated and its subsequent performance is predicted. For example, a model evaluator may predict the performance of the first target model, and the predicted probability that its performance will decline reaches a preset value (which may be referred to as a third preset threshold), for example 80%;
4) The continuous running time of the first target model reaches a preset duration, for example 30 minutes.
The foregoing implementation describes several specific cases in which the first target model meets the second preset requirement, and is therefore widely applicable and flexible.
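The four cases listed above can be checked mechanically once the model's status is recorded. The sketch below is illustrative only: the `ModelStatus` fields and the 0.10 value for the second preset threshold are assumptions, while the 80% probability and the 30-minute duration are the examples given in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelStatus:
    baseline_score: float       # performance at deployment (higher is better)
    current_score: float        # most recent measured performance
    predicted_drop_prob: float  # evaluator's predicted probability of a decline
    running_minutes: float      # how long the model has been running


def triggered_conditions(
    s: ModelStatus,
    drop_threshold: float = 0.10,  # assumed value for the second preset threshold
    prob_threshold: float = 0.80,  # third preset threshold (80% in the text)
    max_minutes: float = 30.0,     # preset duration (30 minutes in the text)
) -> List[int]:
    """Return the numbers (1 to 4) of the listed cases that currently hold."""
    drop = s.baseline_score - s.current_score
    hits = []
    if drop > 0:
        hits.append(1)
    if drop >= drop_threshold:
        hits.append(2)
    if s.predicted_drop_prob >= prob_threshold:
        hits.append(3)
    if s.running_minutes >= max_minutes:
        hits.append(4)
    return hits


status = ModelStatus(baseline_score=0.90, current_score=0.85,
                     predicted_drop_prob=0.82, running_minutes=12.0)
print(triggered_conditions(status))  # [1, 3]
```

Which of the four cases actually counts as meeting the second preset requirement is a deployment choice; the function only reports which ones hold.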
In a possible implementation of the second aspect, the second target model meeting the third preset requirement may include, but is not limited to:
1) The performance of the second target model is improved by a preset value (which may be referred to as a fourth preset threshold) compared with the performance of the first target model. As an example, the fourth preset threshold may be zero, which means that the second target model is considered to meet the third preset requirement as long as its performance reaches the level of the original first target model. As another example, the fourth preset threshold may be a value or ratio greater than zero, which means that the second target model is considered to meet the third preset requirement only when its performance has improved to a certain extent relative to the original first target model;
2) The performance of the second target model is improved by a fifth preset threshold compared with the performance of the native kernel component in the database. That is, the performance gain of the second target model over the traditional database algorithm is verified: if the gain reaches the threshold, the model used by the corresponding database module is actually replaced; otherwise, the traditional database algorithm is still used to execute the target physical plan. The fifth preset threshold may be zero, or a value or ratio greater than zero; for details, see item 1) above, which is not repeated here.
The foregoing implementation describes several specific cases in which the second target model meets the third preset requirement, and is therefore widely applicable and flexible.
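Both cases above compare the second target model with a reference (the first target model or the native kernel component) against a threshold that is either an absolute value or a ratio. A minimal sketch, assuming performance is a single score where higher is better; the function and parameter names are hypothetical.

```python
def meets_third_requirement(
    new_score: float,
    reference_score: float,
    threshold: float = 0.0,
    relative: bool = False,
) -> bool:
    """Check whether the gain over the reference reaches the threshold.

    reference_score is the score of the first target model (fourth preset
    threshold) or of the native kernel component (fifth preset threshold).
    With relative=True the threshold is read as a ratio of the reference.
    """
    gain = new_score - reference_score
    if relative:
        if reference_score == 0:
            raise ValueError("a relative gain needs a non-zero reference score")
        gain /= abs(reference_score)
    return gain >= threshold


print(meets_third_requirement(0.90, 0.90))                  # True: threshold of zero
print(meets_third_requirement(0.92, 0.90, threshold=0.05))  # False: gain too small
```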
In a possible implementation of the second aspect, the execution overhead meeting the first preset requirement includes, but is not limited to:
1) The execution overhead of the target physical plan is the lowest among q execution overheads, where the q execution overheads are those of the q physical plans generated from the SQL statement input to the database, one execution overhead per physical plan, q≥1;
2) The execution overhead of the target physical plan is lower than a preset value (which may be referred to as a first preset threshold). For ease of description, subsequent embodiments of this application all take the case in which the execution overhead of the target physical plan is the lowest among the q execution overheads as the case in which the execution overhead meets the first preset requirement; this is not repeated later.
The foregoing implementation describes several specific cases in which the execution overhead meets the first preset requirement, and is therefore widely applicable and flexible.
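The two cases differ in what they guarantee: the first always yields a plan, while the second may yield none if every overhead is at or above the threshold. The sketch below contrasts them; returning the first plan under the threshold is an assumption made here, since the text does not say how to choose when several plans qualify.

```python
from typing import List, Optional


def lowest_overhead(overheads: List[float]) -> int:
    """Case 1: index of the plan with the lowest execution overhead."""
    if not overheads:
        raise ValueError("q must be at least 1")
    return min(range(len(overheads)), key=overheads.__getitem__)


def first_below(overheads: List[float], threshold: float) -> Optional[int]:
    """Case 2: index of the first plan whose overhead is below the threshold."""
    for i, value in enumerate(overheads):
        if value < threshold:
            return i
    return None  # no plan meets the first preset threshold


overheads = [40.0, 12.0, 15.5]
print(lowest_overhead(overheads))    # 1
print(first_below(overheads, 10.0))  # None
```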
A third aspect of the embodiments of this application provides a computer device that has the function of implementing the method of the second aspect or any possible implementation of the second aspect. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the function.
A fourth aspect of the embodiments of this application provides a computer device that may include a memory, a processor, and a bus system, where the memory is configured to store a program, and the processor is configured to call the program stored in the memory to perform the method of the second aspect or any possible implementation of the second aspect of the embodiments of this application.
A fifth aspect of the embodiments of this application provides a computer-readable storage medium that stores instructions which, when run on a computer, enable the computer to perform the method of the second aspect or any possible implementation of the second aspect.
A sixth aspect of the embodiments of this application provides a computer program which, when run on a computer, causes the computer to perform the method of the second aspect or any possible implementation of the second aspect.
A seventh aspect of the embodiments of this application provides a chip. The chip includes at least one processor and at least one interface circuit coupled to the processor. The at least one interface circuit is configured to perform transceiving functions and send instructions to the at least one processor, and the at least one processor is configured to run a computer program or instructions. The chip has the function of implementing the method of the second aspect or any possible implementation of the second aspect; this function may be implemented by hardware, by software, or by a combination of hardware and software, and the hardware or software includes one or more modules corresponding to the function. In addition, the interface circuit is used to communicate with modules outside the chip.
Brief Description of Drawings
FIG. 1 is a schematic diagram of a system architecture for constructing a database management system according to an embodiment of this application;
FIG. 2 is a schematic diagram of a logical architecture of a database management system according to an embodiment of this application;
FIG. 3 is a schematic diagram of the principle of an optimizer according to an embodiment of this application;
FIG. 4 is a schematic diagram of the principle of a model evaluator according to an embodiment of this application;
FIG. 5 is a schematic diagram of the principle of an advisor according to an embodiment of this application;
FIG. 6 is a schematic diagram comparing the optimizer according to an embodiment of this application with three rewriting strategies;
FIG. 7 is a schematic diagram comparing the model evaluator according to an embodiment of this application with two known performance evaluation methods;
FIG. 8 is a schematic flowchart of a data processing method according to an embodiment of this application;
FIG. 9 is a schematic structural diagram of a computer device according to an embodiment of this application;
FIG. 10 is another schematic structural diagram of a computer device according to an embodiment of this application.
Detailed Description
The embodiments of this application provide a database management system, a data processing method, and a device that, by incorporating machine learning techniques, automatically perform database tuning, protection, updating, and other routine database management tasks traditionally performed by a DBA, without manual intervention.
The terms "first", "second", and the like in the specification, claims, and drawings of this application are used to distinguish similar objects and do not necessarily describe a specific order or sequence. It should be understood that terms used in this way are interchangeable where appropriate; they are merely a way of distinguishing objects with the same attributes when describing the embodiments of this application. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, system, product, or device that includes a series of units is not necessarily limited to those units, but may include other units that are not expressly listed or that are inherent to the process, method, product, or device.
The embodiments of this application involve a good deal of background knowledge about databases, models, and related topics. To make the solutions of the embodiments easier to understand, the related terms and concepts that may be involved are introduced first. It should be understood that the explanations of these concepts may be constrained by the specific circumstances of the embodiments, but this does not mean that this application is limited to those circumstances; the specifics may differ between embodiments, which is not limited here.
(1) Database
A database is a computer software system that stores and manages data according to a data structure. The concept of a database has two layers of meaning: a. A database is an entity, a "warehouse" in which data can be kept properly and in which users store the transactional data to be managed; the two concepts "data" and "base" combine into "database". b. A database is a new method and technology for data management that organizes data more suitably, maintains data more conveniently, controls data more rigorously, and uses data more effectively.
(2) Database software
Database software is deployed on a local device, such as a local server or a local terminal device (for example, a mobile phone, a smart watch, or a personal computer), and usually exists in the form of one or more processes. Database software may therefore also be referred to as a database process.
(3) Neural network
A neural network may be composed of neural units. It can be understood as a network with an input layer, hidden layers, and an output layer: in general, the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. A neural network with many hidden layers is called a deep neural network (DNN). The work of each layer in a neural network can be described by the mathematical expression $\vec{y} = a(W \cdot \vec{x} + b)$. At the physical level, the work of each layer can be understood as transforming the input space into the output space (that is, the row space of the matrix into its column space) through five operations on the input space (the set of input vectors): 1. raising or lowering the dimension; 2. scaling up or down; 3. rotation; 4. translation; 5. "bending". Operations 1, 2, and 3 are performed by $W \cdot \vec{x}$, operation 4 by $+b$, and operation 5 by $a()$. The word "space" is used because the object being classified is not a single thing but a class of things, and the space is the set of all individuals of that class. Here W is the weight matrix of a layer of the neural network, and each value in the matrix represents the weight of a neuron in that layer. The matrix W determines the transformation from the input space to the output space described above; that is, the W of each layer controls how the space is transformed. The purpose of training a neural network is ultimately to obtain the weight matrices of all layers of the trained network. The training process is therefore essentially learning how to control the spatial transformation, and more specifically learning the weight matrices.
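To make the per-layer expression $\vec{y} = a(W \cdot \vec{x} + b)$ concrete, the following sketch evaluates one layer in plain Python. The weights, bias, and the choice of ReLU as the activation $a()$ are arbitrary values picked for the example.

```python
from typing import Callable, List


def relu(z: float) -> float:
    """One possible activation a(); it supplies the "bending" step."""
    return max(0.0, z)


def layer_forward(
    W: List[List[float]],
    x: List[float],
    b: List[float],
    a: Callable[[float], float] = relu,
) -> List[float]:
    """Compute y = a(W . x + b) for one layer."""
    return [
        a(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
        for row, b_i in zip(W, b)
    ]


# A 3x2 weight matrix maps a 2-dimensional input to a 3-dimensional output.
W = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.0]]
x = [1.0, 2.0]
b = [0.5, 0.0, -1.0]
print(layer_forward(W, x, b))  # [5.5, 0.0, 2.0]
```

The 3x2 shape of W shows the dimension-raising operation, the bias shows the translation, and the middle output is clipped to zero by the activation.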
(4) Loss function
When training a neural network, the output of the network should be as close as possible to the value that is actually to be predicted. The predicted value of the current network is therefore compared with the desired target value, and the weight matrix of each layer is updated according to the difference between the two (there is usually an initialization step before the first update, in which parameters are preconfigured for each layer of the network). For example, if the predicted value of the network is too high, the weight matrices are adjusted so that it predicts lower, and the adjustment continues until the network can predict the desired target value. It is therefore necessary to define in advance how to compare the difference between the predicted value and the target value. This is the role of the loss function or objective function, which are important equations for measuring that difference. Taking the loss function as an example, a higher output value (loss) indicates a larger difference, so training the neural network becomes a process of reducing the loss as much as possible.
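As one common example of such a function, the mean squared error averages the squared differences between predictions and targets. The text does not name a particular loss, so this choice is only illustrative.

```python
from typing import Sequence


def mse(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error: larger values mean predictions are further from targets."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("predictions and targets must be non-empty and equal in length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


print(mse([3.0, 5.0], [1.0, 5.0]))  # 2.0
print(mse([1.5, 5.0], [1.0, 5.0]))  # 0.125, closer predictions give a lower loss
```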
(5) Backpropagation algorithm
During training, a neural network can use the error backpropagation (BP) algorithm to correct the values of the parameters in the initial neural network model, so that the reconstruction error loss of the model becomes smaller and smaller. Specifically, passing the input signal forward until the output produces an error loss, and the parameters in the initial neural network model are updated by propagating the error loss information backward, so that the error loss converges. The backpropagation algorithm is a backward pass driven by the error loss, and aims to obtain the optimal parameters of the neural network model, such as the weight matrices.
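The forward and backward passes can be followed by hand on a very small network. The sketch below assumes one tanh hidden unit and one linear output, a mean squared error loss, and full-batch gradient descent; the data, initial parameters, and learning rate are arbitrary choices made for the example.

```python
import math
from typing import List, Sequence, Tuple

Data = Sequence[Tuple[float, float]]  # (input x, target t) pairs


def forward(params: Sequence[float], x: float) -> Tuple[float, float]:
    """Forward pass: return (hidden activation, prediction)."""
    w1, b1, w2, b2 = params
    h = math.tanh(w1 * x + b1)
    return h, w2 * h + b2


def loss(params: Sequence[float], data: Data) -> float:
    """Mean squared error over the data set."""
    return sum((forward(params, x)[1] - t) ** 2 for x, t in data) / len(data)


def gradients(params: Sequence[float], data: Data) -> List[float]:
    """Backward pass: propagate the error from the output back to each parameter."""
    w2 = params[2]
    g = [0.0, 0.0, 0.0, 0.0]
    for x, t in data:
        h, y = forward(params, x)
        dy = 2.0 * (y - t) / len(data)  # dL/dy
        dz = dy * w2 * (1.0 - h * h)    # back through the output layer and tanh
        g[0] += dz * x                  # dL/dw1
        g[1] += dz                      # dL/db1
        g[2] += dy * h                  # dL/dw2
        g[3] += dy                      # dL/db2
    return g


def train(params: Sequence[float], data: Data, lr: float = 0.1, epochs: int = 200) -> List[float]:
    """Repeat forward pass, backward pass, and parameter update."""
    params = list(params)
    for _ in range(epochs):
        g = gradients(params, data)
        params = [p - lr * gi for p, gi in zip(params, g)]
    return params


data = [(-1.0, -0.5), (-0.5, -0.25), (0.0, 0.0), (0.5, 0.25), (1.0, 0.5)]
start = [0.5, 0.0, 0.5, 0.0]
trained = train(start, data)
print(loss(start, data), loss(trained, data))  # the second value is smaller
```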
(6)机器学习(6) Machine learning
机器学习是一门多领域交叉学科,涉及概率论、统计学、凸分析、算法复杂度等多门学科,专门研究计算机怎么模拟或实现人类的学习行为,以获得知识或技能,重新组织已有的知识结构使之不断改善自身的功能。下面对本申请实施例中使用到的几种机器学习模型进行介绍:Machine learning is a multi-field interdisciplinary subject, involving probability theory, statistics, convex analysis, algorithm complexity and other disciplines. It specializes in how computers simulate or realize human learning behaviors to acquire knowledge or skills and reorganize existing The knowledge structure enables it to continuously improve its own functions. The following introduces several machine learning models used in the embodiments of this application:
a. Monte Carlo tree search
Monte Carlo tree search (MCTS) is a method for policy optimization in artificial intelligence problems, typically for the move-planning part of combinatorial games. It combines the generality of random simulation with the accuracy of tree search. Owing to its results in computer Go and its potential for solving certain hard problems, the application of MCTS now extends beyond games to any domain that can be described in (state, action) form and whose outcomes can be predicted by simulation (for example, selecting a rewrite sequence in query rewriting).
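As a rough illustration of how MCTS searches a (state, action) space, the sketch below picks a sequence of rewrite rules with the usual four phases (selection by UCB1, expansion, random simulation, back propagation of the reward). The rule names, the sequence length, and the toy `reward` function are hypothetical stand-ins; a real system would score a sequence by the estimated cost of the rewritten query.

```python
import math
import random

ACTIONS = ("A", "B", "C")   # hypothetical rewrite rules
DEPTH = 3                   # length of the rewrite sequence

def reward(seq):
    """Toy reward in [0, 1]: fraction of positions matching a hidden best sequence."""
    best = ("B", "A", "C")
    return sum(a == b for a, b in zip(seq, best)) / DEPTH

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}                 # action -> Node
        self.visits, self.total = 0, 0.0

def uct_child(node, c=1.4):
    # UCB1: average reward plus an exploration bonus.
    return max(node.children.values(),
               key=lambda ch: ch.total / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def mcts(iterations=500, seed=0):
    rng = random.Random(seed)
    root = Node(())
    for _ in range(iterations):
        node = root
        # 1) Selection: descend while the node is fully expanded and not terminal.
        while len(node.state) < DEPTH and len(node.children) == len(ACTIONS):
            node = uct_child(node)
        # 2) Expansion: add one untried action.
        if len(node.state) < DEPTH:
            action = rng.choice([a for a in ACTIONS if a not in node.children])
            node.children[action] = Node(node.state + (action,), node)
            node = node.children[action]
        # 3) Simulation: complete the sequence at random and score it.
        tail = tuple(rng.choice(ACTIONS) for _ in range(DEPTH - len(node.state)))
        r = reward(node.state + tail)
        # 4) Back propagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.total += r
            node = node.parent
    # Read out the most-visited path as the chosen rewrite sequence.
    seq, node = [], root
    while node.children:
        action, node = max(node.children.items(), key=lambda kv: kv[1].visits)
        seq.append(action)
    return tuple(seq)

chosen = mcts()
```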
b. Recurrent neural network (RNN)
An RNN is a type of neural network designed to process sequence data. In a traditional neural network model, data flows from the input layer to the hidden layer and then to the output layer; adjacent layers are fully connected, while the nodes within each layer are not connected to each other. Such ordinary neural networks are powerless for many problems. For example, predicting the next word of a sentence generally requires the preceding words, because the words in a sentence are not independent of each other. An RNN is called recurrent because the current output of a sequence also depends on the previous outputs. Concretely, the network memorizes earlier information and applies it to the computation of the current output: the nodes of the hidden layer are connected rather than unconnected, and the input to the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous time step. In theory, an RNN can process sequence data of any length.
A plain RNN cannot cope with the exponential weight explosion or vanishing gradients that arise as the recursion deepens, so it has difficulty capturing long-term temporal dependencies; combining it with LSTM variants addresses this problem well. A recurrent neural network can describe dynamic temporal behavior because, unlike a feedforward neural network, which accepts input of a more specific structure, an RNN passes state cyclically through its own network and can therefore accept a wider range of time-series input.
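The recurrence described above, in which the hidden layer receives both the current input and its own output from the previous time step, can be sketched with a scalar RNN cell. The weights `w_x`, `w_h`, and `b` are arbitrary illustrative values, not parameters from this application.

```python
import math

def rnn_forward(xs, w_x=0.8, w_h=0.5, b=0.0):
    """Scalar RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b), starting from h_0 = 0."""
    h = 0.0
    states = []
    for x in xs:                 # works for a sequence of any length
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# The second input is zero, yet the second state is nonzero:
# the network has carried information forward from the first step.
states = rnn_forward([1.0, 0.0])
```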
c. Long short-term memory (LSTM) network
An LSTM, also called a long short-term memory network, is a recurrent neural network designed specifically to solve the long-term dependency problem of ordinary RNNs. All RNNs take the form of a chain of repeating neural network modules. In a standard RNN, this repeating module has a very simple structure, such as a single tanh layer.
d. Tree long short-term memory (Tree-LSTM) network
Because a traditional LSTM retains sequence information, it performs well on sequence modeling tasks. However, many problems in a database are modeled as tree structures, such as the logical plan and physical plan of a query statement, whereas an LSTM accepts only linear sequences and therefore cannot handle tree-shaped input well. Tree-LSTM extends LSTM to tree-structured input and outperforms the traditional LSTM model on tasks such as predicting semantic relatedness and semantic classification over trees.
e. Convolutional neural network (CNN)
A CNN is a deep neural network with a convolutional structure. It contains a feature extractor composed of convolutional layers and subsampling layers. The feature extractor can be regarded as a filter, and the convolution process can be regarded as convolving a trainable filter with an input image or a convolutional feature map. A convolutional layer is a layer of neurons that performs convolution on the input signal. In a convolutional layer, a neuron may be connected to only some of the neurons in the adjacent layer. A convolutional layer usually contains several feature maps, and each feature map may consist of neural units arranged in a rectangle. Neural units in the same feature map share weights, and the shared weights are the convolution kernel. Weight sharing can be understood as extracting image information in a way that is independent of position. The underlying assumption is that the statistics of one part of an image are the same as those of other parts, which means that image information learned in one part can also be used in another. The same learned image information can therefore be applied at every position in the image. Within one convolutional layer, multiple convolution kernels can be used to extract different image information; in general, the more convolution kernels there are, the richer the image information captured by the convolution operation.
A convolution kernel can be initialized as a matrix of random values, and it acquires reasonable weights through learning during the training of the convolutional neural network. In addition, the direct benefit of weight sharing is fewer connections between the layers of the convolutional neural network, which also reduces the risk of overfitting.
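Weight sharing can be made concrete with a small sketch in which one kernel is slid over every position of an input. The image, the kernel values, and the function name are illustrative assumptions; as in most CNN implementations, the operation shown is strictly a cross-correlation (the kernel is not flipped).

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation of a list-of-lists image with one shared kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    # The same kernel weights are reused at every output position (i, j).
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
edge_kernel = [[-1, 1],
               [-1, 1]]   # responds where intensity increases from left to right
feature_map = conv2d(image, edge_kernel)
# feature_map == [[0, 2, 0], [0, 2, 0]]: the vertical edge is detected in both rows
```

The kernel has only four weights regardless of the image size, which is the reduction in connections that weight sharing provides.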
f. Graph convolutional network (GCN)
Motivated by the great success of CNNs in computer vision, many methods have recently emerged that redefine the concept of convolution for graph data. These methods fall under the category of GCNs. Spectral methods usually process the entire graph at once and are difficult to parallelize or scale to large graphs, whereas spatial GCNs perform convolution directly on the graph structure by aggregating information from neighboring nodes. Combined with a sampling strategy, the computation can be performed on a batch of nodes rather than on the whole graph, which can effectively improve the processing efficiency of graph-related problems in a database (for example, modeling concurrent queries).
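The spatial aggregation step can be sketched as follows. This is a deliberately simplified instance that assumes scalar node features, mean aggregation over a node and its neighbors, a single scalar weight, and no nonlinearity; the graph of concurrent queries `q1` to `q3` is hypothetical.

```python
def aggregate(features, adjacency, batch, weight=1.0):
    """One spatial graph-convolution step, computed only for the nodes in `batch`."""
    out = {}
    for node in batch:
        neighborhood = [node] + adjacency[node]
        # Aggregate the features of the node itself and its neighbors.
        mean = sum(features[n] for n in neighborhood) / len(neighborhood)
        out[node] = weight * mean
    return out

# Hypothetical graph: q1 interacts with q2 and q3.
adjacency = {"q1": ["q2", "q3"], "q2": ["q1"], "q3": ["q1"]}
features = {"q1": 3.0, "q2": 6.0, "q3": 0.0}

# Only a batch of nodes is processed, not the whole graph.
updated = aggregate(features, adjacency, batch=["q1", "q2"])
```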
g. Reinforcement learning (RL)
RL is an area of machine learning concerned with how to act on the basis of the environment so as to maximize the expected reward. RL is the third basic machine learning paradigm, alongside supervised learning and unsupervised learning. Unlike supervised learning, RL needs neither labeled input/output pairs nor precise correction of suboptimal actions. Its focus is on finding a balance between exploration (of uncharted territory) and exploitation (of existing knowledge). This exploration-exploitation trade-off has been studied most thoroughly in the multi-armed bandit problem and in finite Markov decision processes (MDPs).
For example, in the embodiments of this application, for a query statement containing multiple join operations, RL controls the order in which the join operations are executed so that the total query execution cost is minimized. RL also has a training process, in which actions are repeatedly executed, their effects are observed, and experience is accumulated to form a model. Unlike supervised learning, each action here generally has no directly annotated label value as a supervisory signal; the system only gives feedback on the actions performed by the algorithm.
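A minimal sketch of learning a join order from feedback alone is given below, using tabular Q-learning with an epsilon-greedy policy. The table cardinalities, the uniform selectivity, the cost model (sum of intermediate result sizes), and the hyperparameters are all hypothetical; the embodiments may use a different RL algorithm and a real cost signal.

```python
import random

CARD = {"A": 1000, "B": 10, "C": 100}   # hypothetical table cardinalities
SEL = 0.01                               # hypothetical selectivity of every join

def step_cost(order):
    """Size of the intermediate result after joining the tables in `order`."""
    if len(order) < 2:
        return 0.0
    size = 1.0
    for table in order:
        size *= CARD[table]
    return size * SEL ** (len(order) - 1)

def learn_join_order(episodes=2000, alpha=0.5, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {}   # (state, action) -> estimated return, i.e. negative remaining cost
    for _ in range(episodes):
        state = ()
        while len(state) < len(CARD):
            actions = [t for t in CARD if t not in state]
            if rng.random() < epsilon:
                action = rng.choice(actions)                                   # explore
            else:
                action = max(actions, key=lambda a: q.get((state, a), 0.0))   # exploit
            nxt = state + (action,)
            reward = -step_cost(nxt)        # feedback only; no labeled "correct" action
            remaining = [t for t in CARD if t not in nxt]
            future = max((q.get((nxt, a), 0.0) for a in remaining), default=0.0)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + future - old)
            state = nxt
    # Greedy read-out of the learned policy.
    state = ()
    while len(state) < len(CARD):
        actions = [t for t in CARD if t not in state]
        state += (max(actions, key=lambda a: q.get((state, a), 0.0)),)
    return state

order = learn_join_order()
```

In this toy cost model the cheapest plans join the two small tables B and C first and the large table A last, so the learned order is expected to end with A.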
(7) Remote procedure call (RPC)
RPC is a protocol for requesting a service from a program on a remote computer over a network without needing to understand the underlying network technology.
In the embodiments of this application, RPC is used for fast interaction between the database kernel components and the external model manager, for example to issue a model update request or to create a new model.
In addition, before the embodiments of this application are introduced, several management architectures of current database management systems are briefly described to make the embodiments easier to understand.
Approach 1: the database management system SageDB
The core idea of SageDB is to build multiple cumulative distribution function (CDF) models of the data distribution and to use these CDF models to generate learned indexes, replace the cost estimation model, accelerate physical operators, and so on.
As a conceptual system, SageDB first assumes that it can learn a "perfect" CDF model, that is, a model whose probability distribution exactly matches the data distribution of the corresponding table. It then inserts CDF models into different database modules to provide machine-learning-based inference: 1) for the optimizer, SageDB directly replaces the cost estimation model with a CDF model learned on a single table to estimate the cost and cardinality of different queries; 2) for data structures, SageDB directly replaces the blocks of a traditional multi-way search tree (balance tree, B-Tree) with the learned CDF model and maintains a list of exceptions to correct CDF positioning errors; 3) for physical operator acceleration, taking sorting as an example, SageDB first coarsely sorts the underlying data using the learned CDF model (input: a data value; output: a relative position number) and then applies a traditional sorting algorithm (for example, quicksort) to obtain the final sorted result.
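The CDF-assisted sort in item 3) can be sketched as follows. As a stand-in for a learned CDF model, the sketch assumes an empirical CDF built from a sample; the function names are hypothetical, and Python's built-in sort plays the role of the traditional finishing algorithm. It is not SageDB's actual implementation.

```python
import bisect

def fit_cdf(sample):
    """Stand-in for a learned CDF model: the empirical CDF of a sample."""
    ordered = sorted(sample)
    def cdf(x):
        return bisect.bisect_right(ordered, x) / len(ordered)
    return cdf

def learned_sort(values, cdf):
    n = len(values)
    buckets = [[] for _ in range(n)]
    # Coarse pass: the model maps each value to an approximate relative position.
    for v in values:
        pos = min(int(cdf(v) * n), n - 1)
        buckets[pos].append(v)
    # A CDF is monotone, so the buckets are already in order relative to each other;
    # a conventional sort inside each bucket repairs the remaining disorder.
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))
    return result

data = [42, 7, 19, 88, 3, 61, 19, 55]
model = fit_cdf([5, 20, 40, 60, 80])     # trained on a small, imperfect sample
sorted_data = learned_sort(data, model)
```

Even with an imperfect model the result is correct; the accuracy of the CDF only affects how evenly the values spread across buckets and therefore how much work the final pass has to do.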
SageDB is still at an early research stage and uses a simple experimental CDF model that is difficult to adapt to large-scale data sets. Moreover, all of SageDB's learning functions are based on a cluster of CDF models learned on single tables, which suits simple single-table query scenarios but cannot effectively handle multi-table joins. In addition, a CDF is used only to learn the data or workload distribution and cannot provide intelligent decision functions such as query rewriting, query plan generation, or anomaly diagnosis, and SageDB provides no model management or model update mechanism for multiple CDFs.
Approach 2: the relational database management system Oracle
Oracle has invested in automated database operation and maintenance for a long time. Oracle 10g introduced various self-management functions to simplify management, improve efficiency, and reduce the total cost of system management. These management functions include: 1) statistical analysis related to SQL query optimization; 2) an automatic storage manager, which simplifies how data files, control files, and log files are stored; 3) an automatic workload repository, which stores and manages the information used for self-tuning; 4) an automatic database diagnostic monitor, which analyzes the stored statistics, identifies possible performance bottlenecks, and recommends ways to resolve the problems found; 5) automatic query optimization, which determines an efficient way to execute a structured query language (SQL) query by using query rewrite rules and a cost model; 6) automatic generation of tuning recommendations for SQL statements or workloads, which are presented to the user, who decides whether to accept or reject them; 7) SQL tuning recommendations, which are made on the basis of information provided by the query optimizer, including the automatic database diagnostic monitor and the automatic workload repository; 8) recommendation of indexes (including bitmap indexes, function-based indexes, and B-tree indexes), materialized views, and table partitions according to the current workload, in which content is taken from the SQL cache and appropriate indexes and materialized views are selected after analysis; 9) an optimizer statistics collector, which collects statistics relevant to optimization; 10) coordination of all self-management within the server by managing database snapshots and storing the information; 11) server-generated alerts, in which the system is configured to raise an alert automatically when an event is triggered; 12) automated pre-installation and post-installation tasks, in which the system is checked automatically before installation to ensure that the installation succeeds and to suggest changes; 13) automatic management of the shared memory used by an Oracle database instance, which relieves administrators of manually configuring shared memory components; 14) a database resource manager, which allows the DBA to divide the workload logically into units and allocate central processing unit (CPU) resources to those units without additional overhead. During peak hours, online transaction processing (OLTP) workloads should take precedence over online analytical processing (OLAP) workloads, and vice versa. The scheduling mechanism works over fixed time intervals and controls the number of active sessions executing at one time. When the available slots for active sessions are filled by new sessions, the remaining sessions are queued until a slot becomes available. Oracle's first autonomous database release was 19c, which is offered as a public cloud service and includes functions such as automatic indexing and recommendation of distribution columns and materialized views.
However, the optimization capabilities of the Oracle database are presented mainly as single-point, independent functions. The functions are not unified into a closed loop, and users must invoke them on demand according to their own needs. In addition, the autonomous functions of the Oracle database are concentrated in the data statistics, analysis, and management stages; being based on limited rules or traditional statistical learning, they have limited ability to optimize database anomalies. Furthermore, the Oracle database provides neither a unified mechanism for managing and updating models and training data nor a function for verifying component performance, so its tuning is passive.
In summary, to solve the above problems, the embodiments of this application provide a new database management system that, based on machine learning algorithms and expert experience, implements self-learning kernel and model optimization and builds comprehensive autonomous capabilities for the database.
The embodiments of this application are described below with reference to the accompanying drawings. A person of ordinary skill in the art will appreciate that, as technology develops and new scenarios emerge, the technical solutions provided in the embodiments of this application are equally applicable to similar technical problems.
First, the system architecture of the database management system constructed in the embodiments of this application is described. Referring to FIG. 1, FIG. 1 is a schematic diagram of a system architecture for constructing the database management system provided by an embodiment of this application. The system architecture includes database software 101, a machine learning platform component 102, and a self-learning suggester 103 (suggester 103 for short). The functions of each module are introduced below:
(1) Module functions of the database software 101
The database software 101 (similar to server software) is deployed on a local device, for example on a local server or a local terminal device (such as a mobile phone or a personal computer), and usually exists as one or more processes. The system architecture of the embodiments of this application includes self-learning kernel components of the database, which substitute for or replace the algorithms or implementations of the traditional native kernel components of the database (note that the native kernel components are not removed and remain in the database process) in order to improve the overall reliability or performance of the database. A self-learning kernel component does not merely replace one algorithm with a machine learning method at a single point. Its distinguishing capability is that it adapts automatically to the scenario as the system load or service state changes, updates the model automatically through algorithm-based training, and incorporates a feedback mechanism and a verification mechanism, so that the model shifts automatically (model drift) and remains usable.
It should be noted that, in the embodiments of this application, the self-learning kernel components may be a self-learning optimizer 1011 (optimizer 1011 for short), a self-learning index, self-learning storage, a self-learning executor, and so on. Different self-learning kernel components implement the functions of the corresponding modules of the native kernel components. For ease of description, the embodiments of this application mainly use the optimizer 1011 to introduce the implementation mechanism and invocation logic of the self-learning kernel components.
(2) Module functions of the machine learning platform component 102
In the embodiments of this application, whichever machine learning algorithm is used, the data source is the database system, including but not limited to internal database metrics (for example, transactions per second (TPS), cache hits, active transactions, and resource usage), operating system information, and log information. This information is used for model training. Specifically, an information collector writes the information into the training data collector 1021 (also called the training data collection platform), and the model manager 1022 (also called the model management platform) completes model training based on the algorithm to be trained together with the data. The trained model is pushed to the model evaluator 1023. Only a model that has been evaluated and meets service expectations can be marked as a model to be applied; otherwise it must be readjusted and retrained.
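The flow from data collection through training to the evaluation gate can be sketched as follows. The class names mirror the components named above, but their methods, the placeholder "mean predictor" algorithm, the mean-absolute-error criterion, and the threshold are all hypothetical; the application does not prescribe these interfaces.

```python
from statistics import mean

class TrainingDataCollector:
    """Stores (features, label) rows written by the information collector."""
    def __init__(self):
        self.rows = []

    def write(self, features, label):
        self.rows.append((features, label))

class ModelManager:
    def train(self, rows):
        # Placeholder algorithm: always predict the mean label seen in training.
        avg = mean(label for _, label in rows)
        return lambda features: avg

class ModelEvaluator:
    def __init__(self, max_error):
        self.max_error = max_error   # stands in for the "service expectation"

    def accept(self, model, rows):
        error = mean(abs(model(f) - y) for f, y in rows)
        return error <= self.max_error

collector = TrainingDataCollector()
collector.write({"tps": 100, "cache_hit": 0.9}, 10.0)   # label: e.g. query latency
collector.write({"tps": 300, "cache_hit": 0.7}, 12.0)

model = ModelManager().train(collector.rows)
# Only a model that passes evaluation would be marked as ready to apply.
ready = ModelEvaluator(max_error=1.5).accept(model, collector.rows)
```

A model that fails the check would be sent back for adjustment and retraining rather than deployed.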
Note that the information collector may be deployed inside the database software 101 or as a separate process outside the database software 101. The purpose of separate deployment is to decouple it from the database and dedicate it to a specific function (collecting the runtime data of the processes in the database). The embodiments of this application do not limit how the information collector is deployed.
Note also that, in other implementations of this application, because the function of the information collector is to collect data, that function may be integrated into the optimizer 1011 or into the training data collector 1021; this application does not limit this. For ease of description, the following embodiments all take as an example the case in which the training data collector 1021 also performs the function of the information collector, and this is not repeated below.
It should be noted that, in the embodiments of this application, the models included in the machine learning platform component 102 may have been pre-trained in advance, in which case the models deployed in the system architecture are pre-trained models, and the training data collected by the training data collector 1021 can be used to fine-tune them (for example, when performance degrades after a model has been in use for some time). Alternatively, the models included in the machine learning platform component 102 may not have been pre-trained; instead, initialized models are deployed directly in the system architecture, and the training data collected by the training data collector 1021 is then used to train each model and subsequently fine-tune it. This application does not limit the state of the models included in the machine learning platform component 102 at deployment (that is, whether they have been pre-trained).
It should also be noted that, in some implementations of this application, the machine learning platform component 102 may be deployed on a remote device (for example, a remote server; FIG. 1 shows deployment on a remote device), on a local device (for example, a local server), or even in the same process as the database (that is, implemented in the database kernel). A standalone component program is the least intrusive option with respect to the existing capabilities of the database and can serve as an add-on that iteratively and gradually replaces the capabilities of database kernel modules. Deployment on the local device is likewise only slightly intrusive to the database, but it competes with the database for the resources of the same device, so a scheduling component usually has to be added for balancing and resource control. Integrating the machine learning platform component into the database kernel, that is, having the database itself provide machine learning capabilities (including but not limited to deep learning and reinforcement learning), is highly intrusive to the database, but it protects data privacy well, reduces communication overhead, makes the interfaces simple and convenient to implement, and makes model tuning or fine-tuning easier.
(3) Module functions of the suggester 103
The suggester 103 discovers problems that may exist while the database is running and performs diagnosis and tuning, for intelligent operation and maintenance management of the database. The suggester 103 also needs a machine learning platform component to manage the algorithm models used in intelligent operation and maintenance. That machine learning platform may be shared with the database system, that is, both may use the same machine learning platform, or it may be managed separately, that is, a separate machine learning platform component may be deployed for the models in the suggester 103. This application does not limit this. In either case the functions and mechanisms of the machine learning platform component are unchanged: it automatically updates models and provides learning and feedback mechanisms to keep the models usable.
It should be noted here that the suggester 103 is not itself a database kernel capability, but it is used for database management and can tune or strengthen the capabilities provided by the database kernel modules. Through interaction with the database, the suggester 103 obtains more information and suggestions, and its models are further optimized, which benefits intelligent operation of the system.
The suggester 103 usually needs to report diagnostic information and a health index and also needs to accept instructions from users, which requires a web front end. Its implementation is conventional and is not described here.
It should also be noted that, in the foregoing embodiments of this application, the system architecture of the database management system includes the suggester 103. As described above, the module function of the suggester 103 is to discover problems that may exist while the database is running and to perform diagnosis and tuning. Therefore, in other implementations of this application, the suggester 103 may be omitted.
Based on the above description of the system architecture of the database management system, the specific functions and invocation logic of each functional module of the database management system provided by the embodiments of this application are described below. The functional modules are the optimizer 1011, the training data collector 1021 (also called the training data collection platform), the model manager 1022 (also called the model management platform), the model evaluator 1023, and the suggester 103 (which may be omitted in some embodiments) shown in FIG. 1. Referring to FIG. 2, FIG. 2 is a schematic diagram of a logical architecture of the database management system provided by an embodiment of this application. The SQL query parser and the storage engine in the database management system are existing modules, and all the others are newly added. The newly added modules include a self-learning optimizer 201 (note that some of the kernel components included in the optimizer 201 may be native kernel components, but the n models included in the optimizer 201 are newly added by this application), a training data collector 202, a model manager 203, a model evaluator 204, and a self-learning suggester 205 (which may be omitted in some embodiments). Each functional module is described below in terms of its specific functions and invocation logic:
1. Optimizer 201
Most problems in an optimizer (for example, query rewriting, cost estimation, and physical plan generation) are NP-hard (that is, no efficient algorithm is known that solves them exactly). Existing optimization techniques rely on heuristics and may become trapped in local optima. To address these problems, the learned optimizer provided by the embodiments of the present application uses machine learning techniques to improve performance.
Specifically, in the embodiments of the present application, the optimizer 201 includes n models, where n ≥ 1. The optimizer 201 is configured to obtain, through the n models and according to an SQL statement input to the database, the physical plan that is finally to be executed (referred to as the target physical plan). The target physical plan is a physical plan whose execution cost satisfies a preset requirement (referred to as the first preset requirement).
It should be noted that, in some implementations of the present application, the execution cost satisfying the first preset requirement includes but is not limited to: 1) the execution cost of the target physical plan is the lowest among q execution costs, where the q execution costs are the execution costs of q physical plans generated from the SQL statement input to the database, each physical plan corresponds to one execution cost, and q ≥ 1; 2) the execution cost of the target physical plan is lower than a preset value (referred to as the first preset threshold). For ease of description, in the subsequent embodiments of the present application, the case in which the execution cost of the target physical plan is the lowest among the q execution costs is taken as the case in which the execution cost satisfies the first preset requirement, and this is not repeated below.
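As a minimal illustrative sketch (not part of the described embodiments), the two forms of the first preset requirement can be expressed as a single selection function. The function name, the parameter names, and the example costs are assumptions introduced here for illustration only.

```python
from typing import Optional, Sequence


def select_target_plan(costs: Sequence[float],
                       first_threshold: Optional[float] = None) -> Optional[int]:
    """Return the index of the target physical plan among q candidates.

    With first_threshold=None, criterion 1 applies: the cheapest plan wins.
    Otherwise criterion 2 applies: the cheapest plan is accepted only if its
    cost is below the first preset threshold; None means no plan qualifies.
    """
    if not costs:
        raise ValueError("at least one physical plan is required (q >= 1)")
    best = min(range(len(costs)), key=lambda i: costs[i])
    if first_threshold is None or costs[best] < first_threshold:
        return best
    return None


estimated_costs = [12.0, 7.5, 9.1]  # hypothetical costs of q = 3 plans
print(select_target_plan(estimated_costs))                       # 1
print(select_target_plan(estimated_costs, first_threshold=5.0))  # None
```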
It should be noted that, in some implementations of the present application, the optimizer 201 may specifically include three models, referred to as model A, model B, and model C, which are used to perform the three steps of logical query rewriting, cost estimation, and physical plan generation, respectively. Note that, in other implementations of the present application, the optimizer 201 may include more or fewer models to implement logical query rewriting, cost estimation, and physical plan generation. The three models included in the optimizer 201 in the embodiments of the present application are merely illustrative, and this is not repeated below.
Specifically, the optimizer 201 uses model A to perform logical query rewriting on the SQL statement (also referred to as an SQL query) input to the database, thereby obtaining a rewritten logical plan. Model A is a model built on a tree search algorithm, for example, the Monte Carlo tree search (MCTS) algorithm. The optimizer 201 then uses model B to generate q physical plans from the logical plan, where q ≥ 1. Model B is a model built on a deep learning algorithm, for example, a Tree-LSTM-based model. Finally, the optimizer 201 uses model C to compute the q execution costs corresponding to the q physical plans (one execution cost per physical plan) and determines the target physical plan to be executed according to the q execution costs. Model C is a model built on a reinforcement learning algorithm, for example, a DQN-based model.
It should be noted that, in some implementations of the present application, the three steps performed by the optimizer 201 may be carried out by three submodules: a learned rewriter, a learned cost estimator, and a learned plan generator. The process may be as follows. First, the rewriter provided by the embodiments of the present application uses a model based on a tree search algorithm to convert the initial SQL statement input to the database system into a semantically equivalent logical plan A that the database system can recognize; then, based on the cost estimator, a rewritten logical plan B with higher execution efficiency is obtained. The plan generator then generates x physical plans (for example, x = 5) from logical plan B, and the cost estimator provides the execution cost of each of the x physical plans. The physical plan whose execution cost satisfies the first preset requirement (that is, the target physical plan) is selected, and the target physical plan is finally used to execute the actual logic of the initial SQL statement.
For ease of understanding, taking the case in which the optimizer 201 includes the learned rewriter, the learned cost estimator, and the learned plan generator as an example, the processes of logical query rewriting, cost estimation, and physical plan generation performed by the optimizer 201 are described in detail below:
(1) Logical query rewriting
The learned rewriter uses an MCTS-based method to rewrite the input SQL statement into an equivalent query with a lower execution cost. It first builds a strategy tree whose root node is the original query and whose other nodes are queries obtained by applying a rewrite rule to the parent node (rewrite rules are known techniques, such as subquery pull-up and redundant filter removal, and are not detailed here). Specifically, the rewriter first builds a strategy tree with the input logical plan as the root node, in which each child node represents a semantically equivalent logical plan obtained from its parent through one rewrite operation. It then iteratively selects, on the strategy tree, the equivalent logical plan with the lowest cost or the lowest probability of having been selected, and expands the strategy tree (that is, for every rewrite strategy applicable to the selected plan, a new child node is added under the tree node of the selected plan). Finally, it selects the logical plan with the lowest execution cost on the strategy tree as the output of the rewriter.
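The search described above can be illustrated with a heavily simplified sketch. It is an assumption-laden toy, not the rewriter of the present application: a logical plan is modeled as a tuple of operator names, the two rewrite rules and the cost weights are invented for the example, and a standard UCT score stands in for the trade-off between low cost and rarely selected nodes.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, Tuple

Plan = Tuple[str, ...]                   # toy logical plan: a tuple of operator names
Rule = Callable[[Plan], Optional[Plan]]  # returns None when the rule does not apply


@dataclass
class Node:
    plan: Plan
    cost: float
    visits: int = 0
    total_gain: float = 0.0
    expanded: bool = False
    children: List["Node"] = field(default_factory=list)


def uct(parent: Node, child: Node, c: float) -> float:
    # Mean cost reduction seen below the child plus a bonus for rarely visited children.
    mean_gain = child.total_gain / child.visits if child.visits else 0.0
    return mean_gain + c * math.sqrt(math.log(parent.visits + 1) / (child.visits + 1))


def mcts_rewrite(plan: Plan, rules: Sequence[Rule],
                 cost_fn: Callable[[Plan], float],
                 iterations: int = 30, c: float = 1.0) -> Plan:
    root = Node(plan, cost_fn(plan))
    best, seen = root, {plan}
    for _ in range(iterations):
        # Selection: walk down from the root, following the best UCT score.
        path = [root]
        while path[-1].expanded and path[-1].children:
            parent = path[-1]
            path.append(max(parent.children, key=lambda ch: uct(parent, ch, c)))
        leaf = path[-1]
        # Expansion: add one child per applicable rewrite rule.
        if not leaf.expanded:
            leaf.expanded = True
            for rule in rules:
                new_plan = rule(leaf.plan)
                if new_plan is not None and new_plan not in seen:
                    seen.add(new_plan)
                    child = Node(new_plan, cost_fn(new_plan))
                    leaf.children.append(child)
                    if child.cost < best.cost:
                        best = child
        # Backpropagation: credit the cost reduction to every node on the path.
        gain = root.cost - min([leaf.cost] + [ch.cost for ch in leaf.children])
        for node in path:
            node.visits += 1
            node.total_gain += gain
    return best.plan


# Hypothetical operator weights and rewrite rules, for illustration only.
WEIGHTS = {"scan": 1.0, "filter": 1.0, "redundant_filter": 2.0,
           "subquery": 10.0, "join": 4.0}


def toy_cost(plan: Plan) -> float:
    return sum(WEIGHTS[op] for op in plan)


def drop_redundant_filter(plan: Plan) -> Optional[Plan]:
    if "redundant_filter" not in plan:
        return None
    return tuple(op for op in plan if op != "redundant_filter")


def pull_up_subquery(plan: Plan) -> Optional[Plan]:
    if "subquery" not in plan:
        return None
    return tuple("join" if op == "subquery" else op for op in plan)


query = ("scan", "subquery", "redundant_filter", "filter")
print(mcts_rewrite(query, [drop_redundant_filter, pull_up_subquery], toy_cost))
```

In a real system the cost function would be supplied by the cost estimator and the rules would operate on actual logical plan trees; the sketch only shows the select, expand, and back-propagate loop.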
(2) Cost estimation
The learned cost estimator uses a deep-learning-based method to estimate the cost and cardinality of a query and can capture correlations between different columns. It uses a tree-structured model that matches the structure of the physical plan of the query statement: just as a physical plan is composed of multiple sub-plans, each tree-structured model may be composed of several sub-models. The embodiments of the present application use this tree-structured model to estimate the cost or cardinality of a plan.
(3) Physical plan generation
The learned plan generator uses a reinforcement-learning-based method to generate an optimized physical plan (that is, the target physical plan). The logic is as follows: an equivalent logical plan corresponds to multiple execution plan trees; each execution plan tree includes one or more execution operators (also called physical operators) and may contain multiple execution paths involving different execution operators; each execution plan tree has a total cost; and the goal is to find the physical plan with the smallest total cost. The plan generator performs join order selection using reinforcement learning with a tree-structured LSTM. Specifically, it uses Tree-LSTM to encode the current physical plan into a compressed vector that serves as the state for deep reinforcement learning, then iterates multiple times, each time selecting the join operation with the highest long-term reward, and finally outputs the physical plan with the lowest execution cost as the actual logic for executing the SQL statement. The embodiments of the present application may use a GCN to capture the structure of a join tree that supports database schema updates and multiple aliases for a table name. The model can automatically select appropriate physical operators.
In summary, the invocation logic of the optimizer 201 is as follows. In the learned optimizer 201 provided by the embodiments of the present application, for a logical plan input through the SQL query parser, the rewriter first builds a strategy tree with the input logical plan as the root node, in which each child node represents an equivalent logical plan obtained from its parent through one rewrite operation. The rewriter uses MCTS to search the strategy tree for the equivalent logical plan with the lowest cost and passes it to the plan generator. The plan generator iteratively adjusts the order of join operations to obtain multiple different physical plans. For each physical plan, the cost estimator estimates the execution cost, and the physical plan with the lowest execution cost is output.
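The invocation logic summarized above can be sketched as a three-stage pipeline. The sketch below is illustrative only: the three stages are passed in as plain functions, and the toy stand-ins (deduplicating table references, enumerating every join order, and a left-deep cost formula with a fixed 1/100 selectivity) are assumptions that replace the learned models.

```python
from itertools import permutations
from typing import Callable, Iterable, Tuple

JoinOrder = Tuple[str, ...]


def optimize(logical_plan: JoinOrder,
             rewrite: Callable[[JoinOrder], JoinOrder],
             generate_plans: Callable[[JoinOrder], Iterable[JoinOrder]],
             estimate_cost: Callable[[JoinOrder], int]) -> JoinOrder:
    """Rewriter -> plan generator -> cost estimator -> cheapest physical plan."""
    rewritten = rewrite(logical_plan)
    candidates = list(generate_plans(rewritten))
    return min(candidates, key=estimate_cost)


# Hypothetical table sizes used by the toy cost function.
ROWS = {"orders": 1000, "customers": 100, "regions": 10}


def toy_rewrite(tables: JoinOrder) -> JoinOrder:
    # Stand-in for the learned rewriter: drop duplicate table references.
    return tuple(dict.fromkeys(tables))


def toy_generate(tables: JoinOrder) -> Iterable[JoinOrder]:
    # Stand-in for the learned plan generator: enumerate all join orders.
    return permutations(tables)


def toy_cost(order: JoinOrder) -> int:
    # Stand-in for the learned cost estimator: sum of intermediate result
    # sizes of a left-deep join, assuming a selectivity of 1/100 per join.
    size, total = ROWS[order[0]], 0
    for table in order[1:]:
        size = size * ROWS[table] // 100
        total += size
    return total


best = optimize(("orders", "customers", "orders", "regions"),
                toy_rewrite, toy_generate, toy_cost)
print(best, toy_cost(best))
```

With these toy numbers, the cheapest orders join the two small tables first and the large table last.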
For ease of understanding, the execution process of the optimizer 201 is described in detail below based on its working principle. Refer to FIG. 3, which is a schematic diagram of the principle of the optimizer provided by an embodiment of the present application. The optimizer uses machine learning models to perform SQL statement rewriting, cost estimation, and physical execution plan selection during database execution. The models involved can be updated through feedback and incremental training so as to adapt dynamically to workload changes. The core steps are as follows:
1) Before an SQL statement input to the database is executed, the logical query rewriting function of the optimizer (that is, the learned rewriter) is invoked to rewrite it at the statement level, so as to prevent performance problems caused by poor SQL writing habits. The specific process is as follows:
a. Step 1: build a strategy tree whose root node is the input SQL query and whose non-root nodes are rewritten query statements. The MCTS algorithm is used to find the rewrite order that yields the greatest benefit: the equivalent logical plan with the lowest cost or the one selected least often is iteratively chosen on the strategy tree, and the tree is expanded (that is, for every rewrite strategy applicable to the selected plan, a new child node is added under the tree node of the selected plan). Finally, the logical plan with the lowest execution cost on the strategy tree is selected as the output of the rewriter.
b. Step 2: determine the potential benefit of each tree node. Based on the given query statement (the original query or one produced during rewriting), the available rewrite rules, and data column information, a neural-network-based benefit estimation model is designed (for example, an attention layer computes the similarity between rules with respect to the operators they rewrite), and the reduction in execution cost that can subsequently be achieved for the query statement is predicted.
c. Step 3: to improve search efficiency, especially when the query contains many logical operators, dynamic programming is used to compute, bottom-up for each node and its subtree, the best N nodes that have no ancestor-descendant relationship with one another, such that the total benefit is maximized. The node selection for the root node is then output, indicating that expanding the strategy tree from these N nodes has the highest probability of reaching the optimal rewritten query.
d. Step 4: when the maximum number of iterations is reached or no new leaf node exists, output the rewritten query statement with the lowest execution cost.
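The bottom-up selection in step 3 can be illustrated as a tree knapsack. The sketch below is one possible formulation, stated under assumptions: the strategy tree is given as a child-list dictionary, each node's benefit is a given non-negative number (in the described system it would come from the benefit estimation model), and "best N" is read as "at most N nodes, no two in an ancestor-descendant relationship, with maximum total benefit".

```python
from typing import Dict, List, Tuple

Choice = Tuple[float, Tuple[str, ...]]  # (total benefit, selected nodes)


def select_top_n(tree: Dict[str, List[str]], benefit: Dict[str, float],
                 root: str, n: int) -> Choice:
    def solve(v: str) -> List[Choice]:
        # table[k] = best choice of at most k nodes inside the subtree of v.
        table: List[Choice] = [(0.0, ())] * (n + 1)
        for child in tree.get(v, []):
            sub = solve(child)
            merged = list(table)
            # Knapsack merge: i nodes from earlier children, j from this child.
            for i in range(n + 1):
                for j in range(n + 1 - i):
                    value = table[i][0] + sub[j][0]
                    if value > merged[i + j][0]:
                        merged[i + j] = (value, table[i][1] + sub[j][1])
            table = merged
        # Alternative: take v itself, which rules out all of its descendants.
        for k in range(1, n + 1):
            if benefit[v] > table[k][0]:
                table[k] = (benefit[v], (v,))
        return table

    return solve(root)[n]


# Hypothetical strategy tree: q0 is the input query, the others are rewrites.
tree = {"q0": ["q1", "q2"], "q1": ["q3", "q4"]}
benefit = {"q0": 5.0, "q1": 4.0, "q2": 3.0, "q3": 3.0, "q4": 2.5}
print(select_top_n(tree, benefit, "q0", 2))  # (7.0, ('q1', 'q2'))
```

Here q1 and q2 are chosen together because they are siblings; choosing q1 together with its child q3 would violate the ancestor-descendant constraint.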
2) The learned cost estimator is invoked to compute cost estimates with a machine learning model. A Tree-LSTM model estimates the cardinality and cost of any execution plan so that the optimizer can select an execution path. The specific process is as follows:
a. For offline training, the training data is derived from collected historical query statements; after feature extraction, it is input into the model to be trained. During training, the model weights are updated by backpropagation based on the current training loss.
b. For online cost estimation, if the sub-plan rooted at the current node has already been evaluated, its estimate is fetched from the cache pool. If the sub-plan has not been evaluated, the root is encoded and the encoded plan vector is input into the Tree-LSTM model; the model then returns the estimated cost and the plan to the optimizer and places the new estimate in the cache pool for use by subsequent query statements.
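The cache-pool behavior of online estimation can be sketched as memoization over sub-plans. This is an illustrative sketch only: the plan encoding (nested tuples), the operator costs, and the additive `_model` method are assumptions that stand in for the Tree-LSTM model, which is not reproduced here.

```python
from typing import Dict, List, Tuple

Plan = Tuple  # (operator, *child_plans), e.g. ("filter", ("scan:orders",))

# Hypothetical per-operator costs used by the stand-in model.
OP_COST = {"scan": 10.0, "filter": 1.0, "hash_join": 5.0}


class CachedCostEstimator:
    def __init__(self) -> None:
        self.cache: Dict[Plan, float] = {}  # the "cache pool"
        self.model_calls = 0

    def _model(self, op: str, child_costs: List[float]) -> float:
        # Stand-in for the learned model: combines the operator with child outputs.
        self.model_calls += 1
        return OP_COST[op.split(":")[0]] + sum(child_costs)

    def estimate(self, plan: Plan) -> float:
        if plan in self.cache:               # sub-plan already evaluated
            return self.cache[plan]
        op, *children = plan
        cost = self._model(op, [self.estimate(child) for child in children])
        self.cache[plan] = cost              # keep for later queries
        return cost


estimator = CachedCostEstimator()
left = ("filter", ("scan:orders",))
right = ("scan:customers",)
print(estimator.estimate(("hash_join", left, right)), estimator.model_calls)  # 26.0 4
print(estimator.estimate(("hash_join", right, left)), estimator.model_calls)  # 26.0 5
```

The second call evaluates only the new root node, because both of its sub-plans are already in the cache.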
3) When a complex SQL statement is detected, a model based on deep reinforcement learning is invoked to enumerate execution paths. Compared with the heuristic algorithms used in traditional databases, such as genetic algorithms, this obtains a feasible SQL execution plan faster and more efficiently. The specific process is as follows:
a. A deep Q-network (DQN) combined with a Tree-LSTM model is used to find the optimal plan.
b. An empty state containing only the basic information of the query is initialized first. A number of intermediate states are then designed, each of which contains candidate plans for partial plan trees, together forming a plan forest.
c. The training process is divided into cost training and latency tuning. In cost training, a reinforcement learning method repeatedly selects fragments of the execution plan and judges whether the newly selected plan operation is consistent with the best plan; during this process, the Q-value retrieval method of the Tree-LSTM model is used to make a preliminary judgment of plan quality. In latency tuning, only a small number of plan latencies are used as training data to fine-tune the model.
d. The DQN uses Q-network estimates to determine which execution tree is better. A plan tree has three types of leaf nodes: columns, tables, and operations. The plan tree is traversed by depth-first search, and the Tree-LSTM network layer evaluates the cost represented by each leaf node.
4) When a simple SQL statement is detected, a machine learning algorithm is used, based on statement features and data distribution, to search the list of candidate execution plans in the plan buffer and match a similar plan as the final execution plan. The specific process is as follows:
a. Build a buffer. When a simple SQL statement is executed for the first time, its plan is added to the buffer.
b. For the second and subsequent executions of the simple SQL statement, a 90:10 policy is applied: in 90% of the executions the plan is fetched from the buffer, and in 10% the plan is regenerated before execution. If a regenerated plan is not already in the buffer, it is added to the buffer as a candidate plan. At most y candidate plans are stored per statement, where y is configurable, for example, y = 5.
c. The k-nearest neighbor (KNN) classification algorithm is used to match the features of the newly executed statement against the information of the buffered plans, and the execution plan that matches is selected. In other words, the matched plan is the plan actually executed for the statement.
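Steps a to c can be sketched as a small plan buffer. The sketch makes several assumptions that are not in the source: statement features are numeric tuples, plans are represented by plain strings, matching uses the single nearest neighbor by Euclidean distance (KNN with k = 1), and the `generate` callback stands in for the regular plan generation path.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Tuple

Features = Tuple[float, ...]


class PlanBuffer:
    def __init__(self, max_plans: int = 5, reuse_ratio: float = 0.9,
                 rng: Optional[random.Random] = None) -> None:
        self.max_plans = max_plans      # y candidate plans per statement
        self.reuse_ratio = reuse_ratio  # 0.9 gives the 90:10 policy
        self.rng = rng or random.Random()
        self.buffer: Dict[str, List[Tuple[Features, str]]] = {}

    def get_plan(self, statement: str, features: Features,
                 generate: Callable[[Features], str]) -> str:
        entries = self.buffer.setdefault(statement, [])
        if entries and self.rng.random() < self.reuse_ratio:
            # Reuse path: pick the buffered plan whose features are closest.
            return min(entries, key=lambda e: math.dist(e[0], features))[1]
        # First execution, or the 10% branch: regenerate the plan.
        plan = generate(features)
        known = any(p == plan for _, p in entries)
        if not known and len(entries) < self.max_plans:
            entries.append((features, plan))
        return plan


def toy_generate(features: Features) -> str:
    # Hypothetical planner: an index scan for highly selective predicates.
    return "index_scan" if features[0] < 0.1 else "seq_scan"


buf = PlanBuffer(reuse_ratio=0.9, rng=random.Random(0))
statement = "SELECT * FROM t WHERE a = ?"
for selectivity in (0.01, 0.02, 0.9, 0.8):
    buf.get_plan(statement, (selectivity,), toy_generate)
print(len(buf.buffer[statement]) <= buf.max_plans)  # True
```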
5) The database execution engine executes the optimized SQL statement execution plan described above, that is, it executes the final target physical plan.
2. Training data collector 202
In the embodiments of the present application, the training data collector 202 is configured to obtain training data from the running data of processes in the database and to construct m training sets from the obtained training data, where m ≥ 1.
Specifically, in some implementations of the present application, the training data collector 202 may automatically collect statistical information about the database, including database runtime metrics, query logs, system logs, and the like, and use this information to generate training data for all the learned models involved in the database management system (that is, the models included in the suggester 205 (if present), the optimizer 201, and the model evaluator 204). It may also generate a corresponding training set for each model (that is, construct m training sets).
As an example, assume that the optimizer 201, the suggester 205, and the model evaluator 204 together include six models. Six different training sets may then be generated according to the characteristics of the respective models, that is, m = 6, in which case each model has its own training set. In other implementations of the present application, fewer than six training sets may be constructed, that is, m < 6, in which case some models share a training set. The present application does not limit the number of training sets constructed or the correspondence between training sets and models.
It should be noted that the models involved in the embodiments of the present application may be pre-trained, that is, the models deployed in the database management system are all pre-trained models. In this case, the training data collected by the training data collector 202 may be used to fine-tune the pre-trained models (for example, a model may be fine-tuned when its performance degrades after it has been in use for some time). Alternatively, the models may not be pre-trained: initialized models are deployed directly in the database management system, and the training data collected by the training data collector 202 is then used to train each model and to fine-tune it subsequently. The present application does not limit the state of the models at deployment time (that is, whether they have been pre-trained).
It should also be noted that, in the embodiments of the present application, the training data collector 202 may collect the running data of processes in the database from multiple aspects, including but not limited to: 1) database metrics: the running status of the database, such as queries per second (QPS), CPU utilization, and cache hit rate, which are usually represented as time-series data; 2) SQL queries: SQL queries and their statistics, such as physical plan, response time, and duration; 3) database logs: runtime logs. Because different models in the database management system require different training data, the embodiments of the present application can intelligently organize training data for the different learning modules, including organizing related columns into the same table to reduce join overhead and selecting training data for each model.
In summary, the invocation logic of the training data collector 202 is as follows. The training data collector 202 receives running data from processes in the database (for example, information collected by a database agent program) and performs data cleaning and data processing on the received running data (for example, data cleaning, data merging, and direct correlation analysis across multiple metrics, so that the data is better suited to subsequent model training or fine-tuning) to obtain training data. The training data is organized into m training sets, which are used to train or fine-tune the models in the database management system. If a model has not been pre-trained, it is trained using the specified algorithm and the training data. If a model has been pre-trained, the relationship between the newly obtained training data and the pre-trained model is evaluated and continuously monitored to decide whether the model needs to be updated. The monitoring interval depends on the source of the model's training data, namely whether that data is generated frequently and changes readily.
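A minimal sketch of the clean-then-route step is given below, under stated assumptions: running data arrives as flat dictionaries with a `kind` field, cleaning consists only of dropping incomplete and duplicate records, and the `routing` table that maps each model to the record kinds it consumes is hypothetical. Real data processing (merging, correlation analysis) is not shown.

```python
from typing import Dict, List, Set

Record = Dict[str, object]


def build_training_sets(records: List[Record],
                        routing: Dict[str, Set[str]]) -> Dict[str, List[Record]]:
    """Clean raw running data and build one training set per routing entry."""
    seen = set()
    cleaned: List[Record] = []
    for rec in records:
        if any(value is None for value in rec.values()):
            continue                      # drop incomplete records
        key = tuple(sorted(rec.items()))
        if key in seen:
            continue                      # drop exact duplicates
        seen.add(key)
        cleaned.append(rec)
    # m = len(routing); a record may feed several training sets.
    return {model: [r for r in cleaned if r["kind"] in kinds]
            for model, kinds in routing.items()}


raw = [
    {"kind": "sql_query", "sql": "SELECT 1", "latency_ms": 3.0},
    {"kind": "sql_query", "sql": "SELECT 1", "latency_ms": 3.0},  # duplicate
    {"kind": "metric", "name": "qps", "value": None},              # incomplete
    {"kind": "metric", "name": "qps", "value": 120.0},
]
routing = {"cost_estimator": {"sql_query"}, "suggester": {"metric", "sql_query"}}
sets = build_training_sets(raw, routing)
print({name: len(rows) for name, rows in sets.items()})
# {'cost_estimator': 1, 'suggester': 2}
```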
3. Model manager 203
In the embodiments of the present application, the model manager 203 is configured to, when a first target model satisfies a preset requirement (referred to as the second preset requirement), fine-tune the first target model using a target training set corresponding to the first target model, thereby obtaining a second target model (the second target model is essentially the first target model with updated model parameters). The first target model is one of the n models, and the target training set is one of the m training sets. The model parameters of the fine-tuned second target model may then be passed to the model evaluator 204 for evaluation of model performance.
Note that m and n may or may not be equal. If m = n, each of the n models has its own training set. If m ≠ n, either multiple models among the n models share one training set (the case m < n), or a single model has multiple training sets available for training (the case m > n). The present application does not limit this.
It should also be noted that, in some implementations of the present application, the first target model satisfying the second preset requirement includes but is not limited to: 1) the performance of the first target model begins to degrade; 2) the performance of the first target model degrades, and the degree of degradation reaches a preset value (referred to as the second preset threshold); 3) the real-time performance of the first target model is evaluated and its upcoming performance is predicted (for example, by the model evaluator 204), and the predicted probability that the performance of the first target model will degrade reaches a preset value (referred to as the third preset threshold), for example, 80%; 4) the first target model has been running continuously for a preset duration, for example, 30 minutes.
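The four example triggers can be sketched as a single policy check. Everything named below is an assumption for illustration: the `ModelStatus` fields, the policy names, the use of a higher-is-better performance score, and the default thresholds (only the 80% probability and 30-minute duration echo the examples in the text).

```python
from dataclasses import dataclass


@dataclass
class ModelStatus:
    baseline_score: float          # performance when the model was deployed
    current_score: float           # latest measured performance (higher is better)
    predicted_drop_probability: float
    uptime_minutes: float


def needs_fine_tuning(status: ModelStatus, policy: str,
                      second_threshold: float = 0.05,
                      third_threshold: float = 0.8,
                      max_uptime_minutes: float = 30.0) -> bool:
    """Check one variant of the second preset requirement."""
    drop = status.baseline_score - status.current_score
    if policy == "any_drop":           # 1) performance has started to degrade
        return drop > 0
    if policy == "drop_threshold":     # 2) degradation reaches the second threshold
        return drop >= second_threshold
    if policy == "predicted_drop":     # 3) predicted degradation probability
        return status.predicted_drop_probability >= third_threshold
    if policy == "uptime":             # 4) continuous running time
        return status.uptime_minutes >= max_uptime_minutes
    raise ValueError(f"unknown policy: {policy}")


status = ModelStatus(baseline_score=0.90, current_score=0.88,
                     predicted_drop_probability=0.85, uptime_minutes=10.0)
for name in ("any_drop", "drop_threshold", "predicted_drop", "uptime"):
    print(name, needs_fine_tuning(status, name))
```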
In the embodiments of the present application, the model manager 203 integrates common machine learning capabilities, provides a unified application access interface, and supports the management and scheduling of learned models. Specifically, the model manager 203 generates a better model from the training data updated by the training data collector 202, so that the model matches the current running state of the system.
In summary, the invocation logic of the model manager 203 is as follows: after receiving training data, the model manager 203 determines whether a model needs to be updated; if so, after updating the model, it passes the model parameters of the updated model to the model evaluator 204.
4. Model evaluator 204
In the embodiments of the present application, the model evaluator 204 is configured to evaluate the performance of the obtained second target model and, when the performance of the second target model satisfies a preset requirement (referred to as the third preset requirement), update the first target model in the optimizer 201 to the second target model. As an example, the update may proceed as follows: the model evaluator 204 sends the model parameters of the second target model to the optimizer 201, and the optimizer 201 assigns the received updated model parameters to the first target model, thereby obtaining the second target model. Note that, in some implementations of the present application, the model evaluator 204 may be a performance prediction model based on graph embedding.
It should be noted that, in some implementations of the present application, the second target model satisfying the third preset requirement may include but is not limited to: 1) the performance of the second target model is improved by a preset value (referred to as the fourth preset threshold) relative to the performance of the first target model. As one example, the fourth preset threshold may be zero, meaning that the second target model is considered to satisfy the third preset requirement as long as its performance reaches the level of the original first target model. As another example, the fourth preset threshold may be a value or ratio greater than zero, meaning that the second target model is considered to satisfy the third preset requirement only if its performance improves on that of the original first target model by a certain amount. 2) The performance of the second target model is improved by a fifth preset threshold relative to the performance of the native kernel component in the database. That is, the performance improvement of the second target model over the traditional database algorithm is verified; if the improvement reaches the threshold, the model used by the corresponding database module is actually replaced; otherwise, the traditional database algorithm is still used to execute the target physical plan. The fifth preset threshold may be zero, or a value or ratio greater than zero; for details, refer to the first case above, which is not repeated here.
It should also be noted that, in some implementations of the present application, if the performance of the second target model does not meet the third preset requirement, the model evaluator 204 is further configured to trigger the database to generate the final target physical plan to be executed using the native kernel component in the database. For example, the index selection module falls back to the traditional hill-climbing algorithm to create indexes so as to execute the logic of the SQL statement. In other words, if the performance of the fine-tuned second target model still falls short of the requirement, the traditional kernel component algorithm of the database is used to generate the target physical plan. Because the native kernel components are not removed but coexist with the newly added optimizer in the database software, the database can switch dynamically at run time to whichever approach yields better performance, which improves database performance overall.
It should also be noted that, in other implementations of the present application, if the performance of the second target model does not meet the third preset requirement, the model evaluator 204 may, in addition to triggering the database to generate the final target physical plan using the native kernel component, feed back model update failure information (that is, that the performance of the second target model does not meet the third preset requirement) to the model manager 203. The model manager 203 then adjusts its fine-tuning strategy for the first target model based on this information, which provides a reference for subsequent model training strategies.
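The acceptance logic described above can be sketched as follows. This is a minimal illustration only: the names `Evaluation`, `Decision`, and `decide` are hypothetical, performance is assumed to be a single score where higher is better, and requiring both criteria at once is an assumption (the description allows either criterion, among others).

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    DEPLOY_NEW_MODEL = "deploy"      # copy the second model's parameters into the first
    USE_NATIVE_COMPONENT = "native"  # fall back to the database's native kernel component


@dataclass
class Evaluation:
    new_model_perf: float      # predicted performance of the second target model
    current_model_perf: float  # performance of the first target model
    native_perf: float         # performance of the native kernel component


def decide(ev, fourth_threshold=0.0, fifth_threshold=0.0):
    """Return (decision, feedback for the model manager or None)."""
    # Criterion 1: gain over the currently deployed model.
    beats_current = ev.new_model_perf - ev.current_model_perf >= fourth_threshold
    # Criterion 2: gain over the traditional (native) algorithm.
    beats_native = ev.new_model_perf - ev.native_perf >= fifth_threshold
    if beats_current and beats_native:  # assumption: both criteria are required
        return Decision.DEPLOY_NEW_MODEL, None
    # Otherwise use the native component and report the failed update.
    return Decision.USE_NATIVE_COMPONENT, "third preset requirement not met"


accepted = decide(Evaluation(1.2, 1.0, 1.1))
rejected = decide(Evaluation(0.9, 1.0, 1.1))
```

With both thresholds at zero, a new model that merely matches the existing model and the native component is accepted, which mirrors the zero-threshold example in the text.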
Based on the above, the purpose of the model evaluator is to verify whether a model is effective for the workload. If the database uses a learned model, the model evaluator 204 can be used to predict that model's performance. Every model deployed in the database management system provided in the embodiments of the present application can have its performance predicted by the model evaluator 204. If the model's performance improves (for example, the gain reaches a certain threshold), the resulting new model (that is, the original model with updated parameters) is marked as the best model and can actually be deployed to the database management system; otherwise, it is marked as needing a further update and is not deployed.
In summary, the calling logic of the model evaluator 204 is as follows: the model evaluator 204 obtains the latest model generated by the model manager 203 and verifies whether the model is stable and reliable and whether it improves system performance. The verification result is fed back to the model manager 203, marking the model either as the best model or as needing to be updated again.
For ease of understanding, the execution process of the model evaluator 204 is described in detail below based on its principle. Refer to FIG. 4, which is a schematic diagram of the principle of the model evaluator provided in an embodiment of the present application. The model evaluator evaluates the performance of the models deployed in the database management system constructed in the embodiments (for example, the models included in the optimizer and the suggester) and checks whether they yield a performance gain; if a new model does not improve performance, its deployment is abandoned. Specifically, in some implementations of the present application, the model evaluator may be implemented based on a graph neural network (GNN), with the structure shown in FIG. 4. Its core steps are as follows:
1) First, the workload to be executed by the user is represented as vectors, and the GNN-based performance prediction model is invoked for evaluation. Combining the workload features under the old and new models yields the estimated execution performance of each, and the two are compared to determine whether the new model is effective.
2) Second, the workload graph is input into the prediction model. If the graph is too large, prediction efficiency may suffer. The embodiments of the present application therefore propose a graph compression algorithm that removes redundant vertices and merges similar vertices. The specific process is as follows:
a. First, during workload graph construction, a graph model is used to capture workload features: each vertex represents operator features extracted from a query plan, and an edge between two operators represents the query correlation and resource contention between them.
b. Second, the features are input into the performance prediction model. For this model, the present application proposes a graph embedding algorithm that embeds graph features (for example, operator features and K-hop neighbors) at the operator level, and builds a deep learning model to predict query performance.
c. In addition, if the graph is too large, the graph compression algorithm in the workload graph optimizer reduces its size by merging vertices whose execution times overlap. The method first clusters vertices whose execution time ranges overlap, and then, within each cluster, merges vertices that have no edge between them by means of a minimum clique partition.
d. Finally, the query performance is predicted to verify whether the input model brings a benefit.
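The compression step in c can be sketched as follows. This is an illustrative sketch, not the patented algorithm: the function names and data layout are hypothetical, and a simple greedy grouping stands in for an exact minimum clique partition, which is expensive to compute in general.

```python
def cluster_by_time(nodes):
    """Group vertices whose execution time ranges overlap.

    nodes: dict mapping vertex id -> (start, end).
    """
    ordered = sorted(nodes, key=lambda n: nodes[n][0])
    clusters, current, current_end = [], [], None
    for n in ordered:
        start, end = nodes[n]
        if current and start <= current_end:
            current.append(n)                # overlaps the running cluster
            current_end = max(current_end, end)
        else:
            if current:
                clusters.append(current)
            current, current_end = [n], end  # start a new cluster
    if current:
        clusters.append(current)
    return clusters


def merge_unconnected(cluster, edges):
    """Greedily group vertices that share no edge with any group member."""
    groups = []
    for n in cluster:
        for group in groups:
            if all(frozenset((n, m)) not in edges for m in group):
                group.append(n)
                break
        else:
            groups.append([n])
    return groups


def compress(nodes, edges):
    """Each returned group would become one merged vertex."""
    return [g for c in cluster_by_time(nodes) for g in merge_unconnected(c, edges)]


# Hypothetical workload: four operators, one edge between a and b.
nodes = {"a": (0, 5), "b": (1, 4), "c": (2, 6), "d": (10, 12)}
edges = {frozenset(("a", "b"))}
merged = compress(nodes, edges)
```

Here operators a, b, and c overlap in time and form one cluster; a and c share no edge and are merged, while b stays separate because of its edge to a.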
5. Suggester 205
Existing database monitoring, configuration, diagnosis, and optimization methods (for example, parameter tuning, slow-SQL diagnosis, and index/view advisors) rely on database administrators (DBAs), are costly, and cannot scale to large numbers of instances (for example, cloud databases). To address this problem, the present application can implement self-monitoring, self-diagnosis, and self-optimization based on machine learning methods, so as to optimize the database automatically and intelligently.
Therefore, in the embodiments of the present application, the database management system may further include a suggester 205, which includes p models, where p≥1. The suggester 205 is configured to detect anomalies in the running data of processes in the database (for example, CPU utilization and user response time), that is, to find abnormal data; to diagnose the cause of the anomaly based on the abnormal data; and then to invoke the optimization module corresponding to that cause (the optimization module is also located in the suggester 205 and is used to tune the database), so as to reduce the probability that the running data of processes in the database becomes abnormal. Note that if the database management system constructed in the embodiments of the present application includes the suggester 205, the first target model described above may be any one of the n models in the optimizer 201 or any one of the p models in the suggester 205.
It should be noted that, in some implementations of the present application, the suggester 205 may specifically include three models, referred to as the codec, model D, and model E, which are used to carry out the three steps of self-monitoring, self-diagnosis, and self-optimization of the database. In other implementations of the present application, the suggester 205 may include more or fewer models to implement self-monitoring, self-diagnosis, and self-optimization. The three-model suggester 205 described in the embodiments is only an example, and this is not repeated below.
Specifically, the suggester 205 uses the codec to encode and then decode the running data of processes in the database, obtaining reconstructed data (referred to as the encoded data), and compares the encoded data with the running data that was input to the codec to identify abnormal data. The principle is that the codec can restore normal original data but cannot restore abnormal original data; encoding and then decoding the input therefore yields encoded data whose difference from the original data reveals whether abnormal data is present.

After abnormal data is obtained, if the running data is system metric data (for example, page faults), the suggester 205 may further use model D to diagnose the cause of the anomaly from the abnormal data. Model D is built on a deep learning algorithm; for example, it may include an LSTM model and a classifier. Specifically, the suggester 205 invokes the LSTM model to encode the detected abnormal data into a compressed vector (that is, a vector of reduced or increased dimension), and then uses a learned classifier (for example, a binary or multi-class classifier) to infer the corresponding root cause (for example, a database backup operation).

If the running data is query metric data (for example, average latency), the suggester 205 may further use model E to diagnose the cause of the anomaly from the abnormal data. Model E is likewise built on a deep learning algorithm; for example, it may include a Tree-LSTM model and a softmax function. Specifically, the suggester 205 invokes the Tree-LSTM model to encode a slow query (that is, a query with a long execution time), locates the physical operator (that is, the execution operator) that causes the anomaly, and then uses the softmax function to identify the root cause.
Finally, after finding the root cause of the data anomaly through self-monitoring and self-diagnosis, the suggester 205 selects the optimization module that corresponds to the root cause of the performance degradation. For example, if performance has degraded because an index is missing, the suggester 205 may invoke the index selection module to create a new index based on deep reinforcement learning, improving the performance of the query workload. If performance has degraded because of parameter settings, the suggester 205 may perform parameter tuning at the query level, connection level, and system level, based on empirical rules or reinforcement learning. The goal of optimization is for the system to produce abnormal running data as rarely as possible.
Note that there is a correspondence between the optimization modules included in the suggester 205 and the root causes, but it is not necessarily one-to-one. For example, one optimization module may correspond to a series of (that is, multiple) root causes, or to a single root cause; this is not limited in the present application. As an example, Table 1 shows the correspondence between the optimization modules in the suggester 205 and some root causes.
Table 1. Correspondence between optimization modules and some root causes
Figure PCTCN2022111991-appb-000003
Based on the above, the suggester 205 is mainly used to implement the following three functions:
(1) Self-monitoring
Self-monitoring tracks the database status and provides running data collected while the database is running (for example, CPU usage, response time, and run logs). For anomaly detection, the present application uses a codec to detect anomalies automatically based on data distribution and metric correlation. Specifically, the encoder converts the running data into a low-dimensional representation, and the decoder restores the data from that representation. Data that the codec cannot reconstruct well is regarded as abnormal data.
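The reconstruction principle can be illustrated with a deliberately simplified stand-in. In the sketch below, a linear one-dimensional projection (the first principal direction, found by power iteration) replaces the learned codec, and an empirical quantile of the training errors serves as the threshold. Both substitutions, all function names, and the sample metrics are assumptions made for illustration.

```python
import math


def fit_linear_codec(samples, iters=100):
    """Learn the mean and first principal direction of the samples."""
    d = len(samples[0])
    mean = [sum(s[i] for s in samples) / len(samples) for i in range(d)]
    centered = [[s[i] - mean[i] for i in range(d)] for s in samples]
    v = [1.0] * d
    for _ in range(iters):  # power iteration on the covariance matrix
        w = [0.0] * d
        for x in centered:
            dot = sum(a * b for a, b in zip(x, v))
            for i in range(d):
                w[i] += dot * x[i]
        norm = math.sqrt(sum(a * a for a in w))
        v = [a / norm for a in w]
    return mean, v


def reconstruction_error(sample, mean, v):
    x = [a - m for a, m in zip(sample, mean)]
    code = sum(a * b for a, b in zip(x, v))  # "encoder": project to 1-D
    recon = [code * b for b in v]            # "decoder": map back
    return math.sqrt(sum((a - r) ** 2 for a, r in zip(x, recon)))


def quantile_threshold(errors, sensitivity):
    """Error level exceeded by roughly `sensitivity` of the training data."""
    ordered = sorted(errors)
    idx = int(math.ceil((1.0 - sensitivity) * len(ordered))) - 1
    return ordered[max(0, min(idx, len(ordered) - 1))]


# Hypothetical training window: two metrics that normally move together.
train = [(float(t), 2.0 * t + 0.1 * (t % 3 - 1)) for t in range(1, 21)]
mean, direction = fit_linear_codec(train)
errors = [reconstruction_error(s, mean, direction) for s in train]
threshold = quantile_threshold(errors, sensitivity=0.05)


def is_anomaly(sample):
    return reconstruction_error(sample, mean, direction) > threshold
```

A sample that follows the usual relationship between the two metrics reconstructs almost exactly, while a sample that breaks the relationship leaves a large residual and is flagged.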
(2) Self-diagnosis
Self-diagnosis automatically diagnoses anomalies in order to find the root cause of abnormal data. If the abnormal data is system metric data (for example, lock conflicts), the LSTM model is invoked to encode the detected abnormal data into a compressed vector (that is, a vector of reduced or increased dimension), and a learned classifier (for example, a binary or multi-class classifier) then infers the corresponding root cause (for example, a database backup operation). If the abnormal data is query metric data (for example, slow queries), the Tree-LSTM model is invoked to encode the slow query and locate the physical operator (that is, the execution operator) that causes the anomaly, and the softmax function is then used to identify the root cause.
(3) Self-optimization
Self-optimization automatically optimizes the database for the query workload, for example through index and view recommendation: deep reinforcement learning is used to recommend indexes automatically, and learned view recommendation uses an encoder-decoder model to recommend views automatically. Self-optimization also tunes the database system itself: the learned parameter tuning module uses deep reinforcement learning to adjust parameter values. The embodiments of the present application may use an Actor-Critic model to select suitable parameter values automatically, and can support SQL-level, session-level, and system-level parameter tuning.
In summary, the calling logic of the suggester 205 is as follows. First, status metrics of the database and of query execution are collected dynamically, and the self-monitoring module (that is, the codec) is used to find abnormal data. For abnormal data, the self-diagnosis module uses the system diagnosis function (model D) and the query diagnosis function (model E) to find the root cause of the performance degradation, and then designates the self-optimization module to perform the corresponding optimization. For example, if the root cause is that an accessed column has no index, the self-diagnosis module invokes the index selection module of the self-optimization module.
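The calling logic above amounts to routing each anomaly to a diagnosis model by metric type, then routing the root cause to an optimization module. The sketch below illustrates that flow only; the metric names, root-cause labels, module names, and stub models are all hypothetical and are not taken from Table 1.

```python
# Hypothetical metric categories.
SYSTEM_METRICS = {"page_faults", "lock_conflicts", "cpu_usage"}
QUERY_METRICS = {"avg_latency", "slow_query"}

# Root cause -> optimization module; several causes may share one module.
OPTIMIZERS = {
    "missing_index": "index_selection",
    "bad_parameter_setting": "parameter_tuning",
    "low_buffer_memory": "parameter_tuning",
}


def diagnose(anomaly, system_model, query_model):
    """Route the anomaly to model D (system) or model E (query)."""
    if anomaly["metric"] in SYSTEM_METRICS:
        return system_model(anomaly)
    if anomaly["metric"] in QUERY_METRICS:
        return query_model(anomaly)
    return None


def advise(anomalies, system_model, query_model):
    """Return (metric, root cause, optimization module) for each anomaly."""
    actions = []
    for anomaly in anomalies:
        cause = diagnose(anomaly, system_model, query_model)
        module = OPTIMIZERS.get(cause)
        if module is not None:
            actions.append((anomaly["metric"], cause, module))
    return actions


# Stub diagnosis models standing in for the learned models D and E.
def stub_model_d(anomaly):
    return "low_buffer_memory"


def stub_model_e(anomaly):
    return "missing_index"


actions = advise(
    [{"metric": "avg_latency"}, {"metric": "page_faults"}],
    stub_model_d,
    stub_model_e,
)
```

Keeping the cause-to-module mapping in a table makes the many-to-one correspondence explicit: two different root causes here resolve to the same parameter tuning module.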
It should be noted that the function of the suggester 205 is to discover problems that may arise while the database is running and to diagnose and tune them. Therefore, in some implementations of the present application, the suggester 205 may be omitted; this is not limited in the present application.
For ease of understanding, the execution process of the suggester 205 is described in detail below based on its principle. Refer to FIG. 5, which is a schematic diagram of the principle of the suggester provided in an embodiment of the present application. The suggester includes three parts: self-monitoring, self-diagnosis, and self-optimization. Self-monitoring uses the performance metrics in the running data of database processes to determine whether the database has had, currently has, or is likely to have a problem, and thus identifies actual or potential abnormal states. Detecting an abnormal state triggers the self-diagnosis and self-optimization functions, which resolve the actual problem in the database. For ease of description, assume that the suggester includes a self-monitoring module, a self-diagnosis module, and a self-optimization module, which implement self-monitoring, self-diagnosis, and self-optimization respectively. The core steps are as follows:
1) The self-monitoring module continuously collects database performance metrics. When an anomaly occurs inside or outside the database, it is reflected in the corresponding metrics and system logs. openGauss therefore monitors and detects anomalies in real time by analyzing database and operating system metrics. The specific process is as follows:
a. First, the training data collector continuously collects metrics and logs, such as queries per second (QPS) and run logs, from the database and the operating system, and then combines these data into time series data.
b. Second, a reconstruction-based algorithm is used to detect anomalies: normal time series data follows regular patterns of change, so an irregular pattern is likely to indicate a system anomaly. The embodiments of the present application use an LSTM-based autoencoder with an attention layer. The encoder converts the raw time series data into a low-dimensional representation, and the decoder parses that representation and attempts to restore the original data. The training loss is the reconstruction quality. The model learns the distribution of the multidimensional data and thereby acquires the ability to reconstruct it. Data that cannot be reconstructed (that is, whose reconstruction error exceeds a threshold) is reported as abnormal. The embodiments use the statistical method of extreme value theory to determine a dynamic threshold, so the user only needs to set the system sensitivity, for example to 1% or 5%, and the corresponding threshold is computed from historical data. Specifically, the training data is first normalized and then fed into the time series autoencoder to update the model parameters; once the model is able to reconstruct normal database metrics, openGauss collects the reconstruction errors and computes the threshold.
2) If no anomaly is found, step 1) is repeated after waiting for a period of time (that is, a preset duration); if an anomaly is found, step 3) is performed.
3) When a past, current, or future problem or potential problem is found in the database, the self-diagnosis module is invoked to perform root cause analysis.
4) The self-diagnosis module assesses the detected fault and, if a problem does exist, reports its root cause at the system level or the SQL statement level. The specific process is as follows: the database self-diagnosis function can identify the root causes of faults or anomalies at the system level and the SQL statement level. System-level fault analysis is implemented with an LSTM+KNN algorithm, and SQL-statement-level root cause analysis is implemented with a Tree-LSTM algorithm. Once the root cause of a fault has been located, the self-optimization function is invoked to give corresponding optimization suggestions and thereby resolve the problem.
5) Suggestions for the database system are made based on the root cause given by self-diagnosis, and the self-optimization module is invoked for tuning. The specific process is as follows:
a. The self-optimization module optimizes the parameter configuration according to the characteristics of the database system. Parameter recommendation is implemented with deep reinforcement learning. First, the embodiments of the present application build a model from historically observed database parameter configurations and their corresponding performance, that is, they search the space formed by the selected parameters for the combination with the best performance. Then, the deep reinforcement learning model takes the database state and workload features as its input state and, drawing on the tuning experience learned from historical data, selects a suitable parameter configuration as its output action, thereby giving an optimal database parameter tuning scheme.
b. The self-optimization module also tunes database SQL statements, for example through materialized view recommendation and index recommendation. Materialized view recommendation is implemented with an RNN and reinforcement learning: it analyzes the user's workload and, by enumeration and evaluation, recommends materialized views that can be created; the user can create such views to accelerate the workload. Index recommendation operates at the workload level and gives the optimal index configuration for the user's particular mix of insert, delete, query, and update operations.
To give a more intuitive understanding of the beneficial effects of the embodiments of the present application, the technical effects of the embodiments are compared further below. The comparison results are as follows:
(1) Comparison of the optimizer 201 with three rewrite strategies (random rewriting, top-down rewriting, and heuristic rewriting).
For details, refer to FIG. 6, which is a schematic diagram comparing the optimizer provided in an embodiment of the present application with three rewrite strategies. Taking query rewriting as an example, the embodiments compare query rewriting in openGauss with three rewrite strategies: random rewriting, top-down rewriting, and heuristic rewriting. For random rewriting and top-down rewriting, 82 rewrite rules were extracted from the query optimization engine Calcite, and queries were rewritten with the corresponding strategy. In addition, the tool SQLsmith was used to generate 15,750 slow queries (>1 s) for TPC-H and 10,673 for JOB. As shown in FIG. 6, the rewrite strategy of the present application outperforms the other methods in all cases: execution time is reduced by more than 49.7% on TPC-H and by more than 36.8% on JOB. There are two main reasons. First, openGauss explores rewrite orders whose execution cost is lower than that of the default top-down order in PostgreSQL. For example, with an outer join, PostgreSQL cannot push a predicate down to the input table, whereas openGauss solves the problem by first converting the outer join into an inner join and then pushing the predicate down. Second, the estimation model in openGauss predicts the potential cost reduction, and openGauss uses this prediction to choose a rewrite order with lower execution overhead. In addition, openGauss performs better on TPC-H than on JOB, because TPC-H queries contain many subqueries that can be optimized by query rewriting, whereas the multi-way joins in JOB queries are optimized further by the plan enumerator.
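The outer-join example can be made concrete with a small, self-contained check. The sketch below uses Python's built-in `sqlite3` module purely as a convenient SQL engine (it is not openGauss or PostgreSQL), and the table names and data are hypothetical. It shows that when a `WHERE` predicate rejects the NULL rows produced by a left outer join, the join can be rewritten as an inner join with the predicate pushed into the input table, without changing the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER, customer_id INTEGER);
CREATE TABLE customers (id INTEGER, region TEXT);
INSERT INTO orders VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO customers VALUES (10, 'EU'), (20, 'US');
""")

# Original form: the filter on c.region discards the NULL-extended rows,
# so the outer join behaves like an inner join.
original = """
SELECT o.id, c.region
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.id
WHERE c.region = 'EU'
ORDER BY o.id
"""

# Rewritten form: inner join, with the predicate pushed down to customers.
rewritten = """
SELECT o.id, c.region
FROM orders o
JOIN (SELECT id, region FROM customers WHERE region = 'EU') c
  ON o.customer_id = c.id
ORDER BY o.id
"""

rows_original = conn.execute(original).fetchall()
rows_rewritten = conn.execute(rewritten).fetchall()
conn.close()
```

Both queries return the same rows; the rewritten form lets the engine filter `customers` before joining, which is the benefit that the rewrite order aims for.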
In summary, the optimizer 201 included in the database management system provided in the embodiments of the present application can perform fine-grained optimization during SQL statement execution according to the optimization methods given by the AI models, which improves the execution efficiency of SQL statements and the performance of the database.
(2) Comparison of the suggester 205 (if present) with two index strategies (default indexes and manually designed indexes).
For details, refer to Table 2 and Table 3, which present the comparison results for the suggester provided in the embodiments of the present application. Taking index selection as an example, experiments were conducted on TPC-H and TPC-C, and the suggester was compared with default indexes and manually designed indexes. The results are shown in Table 2 and Table 3. The index selection algorithm of the present application outperforms both default indexes and manually designed indexes on both workloads, because it encodes system statistics into the state representation and can refine its index selection strategy based on historical data, so as to update the index configuration dynamically.
Table 2. Indexing program
Method                                  TPC-H (s)   TPC-C (tpmC)
Database management system openGauss    122.9       10202
Database administrator (DBA)            130.1       10001
Default setting                         140.8       9700
Table 3. Anomaly detection (TPC-C)
Method                                  Precision   Recall   F1 score
Database management system openGauss    0.795       0.776    0.785
Variational autoencoder (VAE)           0.302       0.821    0.441
Generative adversarial network (GAN)    0.554       0.745    0.635
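As a consistency check on Table 3, the F1 scores can be recomputed from the other two columns using F1 = 2PR / (P + R). The sketch below assumes that the first metric column is precision P and the second is recall R; the function and variable names are illustrative.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (first metric column, recall, reported F1) as listed in Table 3
table3 = {
    "openGauss": (0.795, 0.776, 0.785),
    "VAE": (0.302, 0.821, 0.441),
    "GAN": (0.554, 0.745, 0.635),
}

# Largest gap between the recomputed and the reported F1 score.
max_gap = max(abs(f1(p, r) - reported) for p, r, reported in table3.values())
```

The recomputed values agree with the reported F1 scores to within 0.001, which is consistent with rounding of the inputs to three decimals.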
In summary, the suggester 205 (if present) included in the database management system provided in the embodiments of the present application can promptly detect whether the database has had, currently has, or is likely to have faults or anomalies, can give the corresponding root cause according to the service type and characteristics, and can provide an optimal optimization and configuration scheme.
(3) Comparison of the model evaluator 204 with two known performance evaluation methods.
For details, refer to FIG. 7, which is a schematic diagram comparing the model evaluator provided in an embodiment of the present application with two known performance evaluation methods, BAL and DL. BAL estimates the average buffer access latency and uses linear regression to predict the query latency of concurrent queries; DL uses a neural network designed around the query plan structure to predict the performance of a single query. Prediction accuracy and prediction time were compared on JOB, with the results shown in FIG. 7. The model evaluator provided in the embodiments has the lowest error rate, about 29.9 times lower than BAL and 22.5 times lower than DL. There are two reasons. First, the workload graph in the model evaluator encodes concurrency factors such as resource contention, which increases the query latency of JOB by more than 20% compared with serial execution. By contrast, BAL only collects buffer access latency, and DL relies on the features of a single query. Second, openGauss uses a graph embedding network to map structural information directly to performance factors, which improves generality when the workload changes. By contrast, BAL uses linear regression, which requires many statistical samples for each individual workload.

In addition, FIG. 7 shows that the prediction latency of the model evaluator provided in the embodiments is lower than that of both BAL and DL, and remains relatively stable as the concurrency level increases. In openGauss, the model evaluator predicts the execution time of all vertices simultaneously. It embeds the localized graph of every vertex in the workload graph, so the total prediction time for the workload is close to the time needed to predict the vertex with the largest localized graph. BAL needs the longest prediction time because it predicts performance while the workload is executing. DL propagates intermediate data features bottom-up through the query plan tree, which takes relatively longer than openGauss.
In summary, the model evaluator 204 included in the database management system provided by the embodiments of this application can effectively and promptly verify whether a new model is effective. If it is, the model is deployed; otherwise, the model update is abandoned.
(4) The training data collector 202 and the model manager 203 do not exist in known database management systems. These two modules, included in the database management system constructed in this application, can ensure the reliability of data processing.
Based on the foregoing description, with reference to Figures 1 to 7, of the database management system constructed in the embodiments of this application, the embodiments provide an autonomous database framework that, based on machine learning algorithms and expert experience, implements a self-learning kernel and an advisor and builds comprehensive autonomous capabilities for the database.
Specifically, the database management system constructed in the embodiments of this application has a learned optimizer built into the database kernel, which may include an MCTS-based rewriter, a Tree-LSTM-based cost estimator, and an RL-based plan generator, so that the optimizer performs efficient query optimization and meets the demands of multi-scenario services. The database management system may further include a learned advisor for the database, which uses machine learning to implement automatic anomaly monitoring, automatic system diagnosis, automatic slow-query diagnosis, and automatic performance optimization (for example, parameter tuning, index recommendation, and view recommendation), giving customers one-click operation and maintenance management and improving both operation and maintenance efficiency and database execution efficiency. The database management system may further include an efficient model evaluator, which uses machine learning to estimate the performance of the models deployed in the database management system and to judge the benefit of applying the corresponding model, so that the database management system keeps running with high performance and high reliability. The database management system may further include a training data collector and a model manager based on a unified interface. The training data collector automatically collects the running data of processes in the database, including database running metrics, query logs, and system logs, and uses this information to generate training data for the models deployed in the database management system. The model manager provides a unified interface to manage and control model versions, and dynamically updates and replaces the models used by each module.
The database management system described above can address the following technical problems of existing traditional database technology:
1) Traditional database technology has different modules and functions, and it is difficult to select a suitable automation algorithm that maximizes the performance gain. To replace a database module that may be a performance bottleneck with a learned model, a suitable machine learning model or algorithm must first be selected. For example, deep learning models can be applied to cost estimation, because deep learning can fit the data characteristics and access correlations of high-dimensional data columns; deep reinforcement learning can be applied to parameter tuning, because it does not require training data to be provided in advance and can efficiently explore a continuous high-dimensional parameter space with few samples. Through the performance evaluation performed by the model evaluator, the embodiments of this application provide suitable machine learning models or algorithms for complex and changing service scenarios and for high-concurrency, high-performance services.
2) Model effectiveness is difficult to evaluate. Before a newly designed learned model is deployed to the database, it must be confirmed in advance whether the model improves performance over traditional algorithms. Traditional methods often rely on expert experience or a large number of test runs, which is costly and inefficient. The embodiments of this application address this by designing a model evaluator that predicts the performance of different learned models in typical application scenarios.
3) Each module of a traditional database is based on classic heuristics or rule-defined algorithms. Once the modules are replaced with machine learning models, however, each model needs data collection and training, and must be updated when the scenario changes; performing these operations separately for each model makes training and management very costly. The embodiments of this application provide a training data collector and a model manager with a unified interface, which evaluate model availability according to changes in the collected information and update models automatically.
On the basis of the embodiments corresponding to Figures 1 to 5, a method that applies the database management system is described below. Refer to Figure 8, which is a schematic flowchart of the data processing method provided by the embodiments of this application. The method may include the following steps:
801. A computer device receives an SQL statement sent by a client to a database. The database is deployed in the computer device and includes an optimizer and a native kernel component; the optimizer includes n models, where n ≥ 1.
First, the local computer device (that is, the local device described in Figure 1) receives the SQL statement that a client device sends to the database deployed in the computer device. The database includes an optimizer and a native database kernel component, and the optimizer includes n models, where n ≥ 1.
802. When a first target model does not meet a second preset requirement, the computer device obtains a target physical plan from the SQL statement through the n models included in the optimizer. The target physical plan is a physical plan whose execution cost meets a first preset requirement, and the first target model is one of the n models.
After receiving the SQL statement sent by the client device, the computer device first checks the n models in the optimizer against a preset requirement (which may be called the second preset requirement). If a first target model (that is, one of the n models) does not meet the second preset requirement, the computer device obtains the target physical plan from the SQL statement through the n models included in the optimizer, where the target physical plan is a physical plan whose execution cost meets the first preset requirement.
It should be noted that, in some implementations of this application, the execution cost meeting the first preset requirement includes but is not limited to: 1) the execution cost of the target physical plan is the lowest among q execution costs, where the q execution costs are the execution costs of the q physical plans generated from the SQL statement input to the database, each physical plan corresponds to one execution cost, and q ≥ 1; 2) the execution cost of the target physical plan is lower than a preset value (which may be called the first preset threshold). For ease of description, subsequent embodiments of this application all take the case in which the target physical plan has the lowest of the q execution costs as the case in which the execution cost meets the first preset requirement; this is not repeated below.
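The two alternatives above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `pick_target_plan` and its parameters are hypothetical, and for the threshold case the sketch simply returns the first plan whose cost falls below the threshold, which the source does not specify.

```python
from typing import Optional, Sequence


def pick_target_plan(costs: Sequence[float],
                     first_threshold: Optional[float] = None) -> Optional[int]:
    """Return the index of a plan that meets the first preset requirement.

    costs[i] is the execution cost of the i-th of the q physical plans.
    Hypothetical helper: the name and signature are not from the source.
    """
    if not costs:
        raise ValueError("q must be at least 1")
    if first_threshold is None:
        # Alternative 1: the plan with the lowest of the q execution costs.
        return min(range(len(costs)), key=costs.__getitem__)
    # Alternative 2: a plan whose cost is below the first preset threshold.
    # Assumption: take the first such plan; return None if there is none.
    for index, cost in enumerate(costs):
        if cost < first_threshold:
            return index
    return None
```

For example, `pick_target_plan([5.0, 2.0, 9.0])` selects the plan at index 1 under the lowest-cost alternative.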
It should also be noted that, in some implementations of this application, the first target model not meeting the second preset requirement may include but is not limited to: 1) the performance of the first target model has not degraded; 2) the performance of the first target model has degraded, but the degree of degradation has not reached a preset value (which may be called the second preset threshold); 3) the real-time performance of the first target model is evaluated and its subsequent performance is predicted, and the predicted probability of performance degradation has not reached a preset value (which may be called the third preset threshold), for example, the predicted probability of degradation has not reached 80%; 4) the continuous running time of the first target model has not reached a preset duration, for example, the first target model has not yet run continuously for 30 minutes.
It should also be noted that, in some implementations of this application, as an example, the optimizer may include three models, referred to as model A, model B, and model C, which are used for the steps of logical query rewriting, cost estimation, and physical plan generation. In this case, the computer device may obtain the target physical plan from the SQL statement through the n models included in the optimizer as follows. First, the computer device uses model A to perform logical query rewriting on the SQL statement (also called an SQL query) input to the database, obtaining a rewritten logical plan; model A is a model built on a tree search algorithm, for example, the Monte Carlo tree search algorithm. Next, model B generates q physical plans from the logical plan; model B is a model built on a deep learning algorithm, for example, a Tree-LSTM-based model, and q ≥ 1. Finally, model C calculates the q execution costs corresponding to the q physical plans (one execution cost per physical plan) and determines the target physical plan to be executed according to the q execution costs; model C is a model built on a reinforcement learning algorithm, for example, a model based on deep reinforcement learning (deep Q-learning, DQN).
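The three-stage flow described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the three models are passed in as plain callables, and the toy stand-ins (`toy_rewriter`, `toy_plan_generator`, `toy_cost_model`) are hypothetical placeholders for the MCTS-based, Tree-LSTM-based, and DQN-based models named in the text.

```python
from typing import Callable, List, Tuple


def optimize(sql: str,
             model_a: Callable[[str], str],
             model_b: Callable[[str], List[str]],
             model_c: Callable[[str], float]) -> Tuple[str, float]:
    """Return the target physical plan and its execution cost."""
    logical_plan = model_a(sql)             # logical query rewriting
    physical_plans = model_b(logical_plan)  # q candidate physical plans
    if not physical_plans:
        raise ValueError("q must be at least 1")
    costs = [model_c(plan) for plan in physical_plans]  # one cost per plan
    best = min(range(len(costs)), key=costs.__getitem__)
    return physical_plans[best], costs[best]


# Toy stand-ins (hypothetical); real models would be learned.
def toy_rewriter(sql: str) -> str:
    return " ".join(sql.split()).rstrip(";")


def toy_plan_generator(logical_plan: str) -> List[str]:
    return [f"SeqScan[{logical_plan}]", f"IndexScan[{logical_plan}]"]


def toy_cost_model(plan: str) -> float:
    return 10.0 if plan.startswith("SeqScan") else 3.0
```

Passing the models as callables keeps the selection logic independent of how each model is built, so any one of them can be swapped without touching the others.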
Note that, in other implementations of this application, the optimizer may include more or fewer models to implement logical query rewriting, cost estimation, and physical plan generation. In the embodiments of this application, an optimizer with three models is only an illustration; this is not repeated below.
Note also that the specific functions and invocation logic of the optimizer in the computer device are described in the optimizer 201 part of the embodiment corresponding to Figure 2; details are not repeated here.
803. The computer device executes the target physical plan.
After obtaining the final target physical plan, the computer device executes it. In essence, this execution uses the generated target physical plan to carry out the actual logic of the input SQL statement.
The foregoing implementations describe how the computer device obtains the target physical plan based on the optimizer included in the database and then executes it. The database deployed in the computer device includes an optimizer with n models, which replaces the traditional heuristic optimizer. By combining with machine learning, it converts logical queries into physical execution plans with higher execution efficiency, and can effectively address the inaccurate cost estimation and the poor physical plans for complex SQL statements that result from current database architectures.
It should be noted that, in some implementations of this application, the computer device may also send the running data of processes in the database deployed in it to an advisor. The advisor may be deployed in the computer device or on a remote device; this is not limited here. After receiving the running data, the advisor may detect abnormal data based on the running data, diagnose the cause of the anomaly based on the abnormal data, and then optimize the self-optimization module corresponding to that cause, so as to reduce the probability that the running data of processes in the database becomes abnormal later. The advisor includes p models, where p ≥ 1.
Note that the specific functions and invocation logic of the advisor in the embodiments of this application are described in the advisor 205 part of the embodiment corresponding to Figure 2; details are not repeated here.
The foregoing implementations describe that the computer device may also feed back the running data of processes in the database to the advisor, which can give comprehensive optimization suggestions for the database based on that data. This enables unattended database performance monitoring and root cause identification, greatly reduces operation and maintenance labor, and helps the database system quickly recover from anomalies or improve performance.
It should also be noted that, in other implementations of this application, in addition to sending the running data of processes in the database deployed in it to the advisor, the computer device may also send the running data to a training data collector. The training data collector may be deployed in the computer device or on a remote device; this is not limited here. After receiving the running data, the training data collector may obtain training data from it and construct m training sets based on the training data, where m ≥ 1.
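One way to picture how running data could be turned into m training sets is sketched below. The source does not say how records are assigned to training sets, so everything here is an assumption made for illustration: each record carries a `source` field, and a hypothetical `routing` table maps a data source to the model whose training set it feeds.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def build_training_sets(records: Iterable[Mapping[str, object]],
                        routing: Mapping[str, str]
                        ) -> Dict[str, List[Mapping[str, object]]]:
    """Group running-data records into one training set per model.

    routing maps a record's "source" (for example "query_log") to a
    model name. Both the field name and the table are assumptions.
    """
    training_sets: Dict[str, List[Mapping[str, object]]] = defaultdict(list)
    for record in records:
        model_name = routing.get(str(record.get("source")))
        if model_name is not None:  # records with no consumer are skipped
            training_sets[model_name].append(record)
    return dict(training_sets)
```

With this layout, the number of keys in the returned dictionary plays the role of m, and the training set for a given model can be looked up by name when that model needs fine-tuning.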
Note that the specific functions and invocation logic of the training data collector in the embodiments of this application are described in the training data collector 202 part of the embodiment corresponding to Figure 2; details are not repeated here.
The foregoing implementations describe that the computer device may also feed back the running data of processes in the database to the training data collector, which can generate training data for the models used by the database from that running data. This enables continuous optimization of the database system, reduces its probability of misjudgment, and provides trustworthy autonomous operation and maintenance services.
It should also be noted that the foregoing implementations describe, without limitation, the cases in which the first target model does not meet the second preset requirement. Conversely, the first target model meeting the second preset requirement includes but is not limited to: 1) the performance of the first target model begins to degrade; 2) the performance of the first target model has degraded and the degree of degradation reaches a preset value (which may be called the second preset threshold); 3) the real-time performance of the first target model is evaluated and its subsequent performance is predicted (for example, by the model evaluator 204), and the predicted probability of performance degradation reaches a preset value (which may be called the third preset threshold), for example, the predicted probability of degradation reaches 80%; 4) the continuous running time of the first target model reaches a preset duration, for example, the first target model has run continuously for 30 minutes.
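The four alternative conditions can be combined into a single check, sketched below. The 80% probability and the 30-minute duration are the examples given in the text; the `ModelStatus` fields, the use of an error metric to measure degradation, and the function name are assumptions introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelStatus:
    baseline_error: float              # error when the model was deployed
    current_error: float               # error observed now (higher is worse)
    predicted_degradation_prob: float  # e.g. from a model evaluator
    running_minutes: float             # continuous running time


def meets_second_requirement(status: ModelStatus,
                             degradation_threshold: Optional[float] = None,
                             prob_threshold: float = 0.8,
                             max_running_minutes: float = 30.0) -> bool:
    """True if the model should be sent for fine-tuning."""
    degradation = status.current_error - status.baseline_error
    if degradation_threshold is None:
        degraded = degradation > 0                       # condition 1
    else:
        degraded = degradation >= degradation_threshold  # condition 2
    likely = status.predicted_degradation_prob >= prob_threshold  # condition 3
    expired = status.running_minutes >= max_running_minutes      # condition 4
    return degraded or likely or expired
```

Because any single condition is sufficient, a deployment that wants a stricter trigger can pass a larger `degradation_threshold` or `prob_threshold` rather than changing the logic.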
Therefore, in other implementations of this application, if the first target model meets the second preset requirement, the computer device may further send a first instruction to a model manager (which may or may not be deployed in the computer device; this is not limited here). The first instruction instructs the model manager to fine-tune the first target model. When the performance of a second target model meets a preset requirement (which may be called the third preset requirement), the computer device receives the model parameters of the second target model sent by the model manager. The second target model is the model obtained by the model manager fine-tuning the first target model with the target training set corresponding to the first target model, and the target training set is one of the m training sets. Finally, the computer device updates the first target model to the second target model and obtains the target physical plan through the updated n models (which now include the second target model instead of the first target model).
It should be noted that, in some implementations of this application, the second target model meeting the third preset requirement may include but is not limited to: 1) the performance of the second target model is improved over that of the first target model by a preset value (which may be called the fourth preset threshold). As one example, the fourth preset threshold may be zero, meaning that the second target model is considered to meet the third preset requirement as long as its performance reaches the level of the original first target model. As another example, the fourth preset threshold may be a value or ratio greater than zero, meaning that the second target model is considered to meet the third preset requirement only when its performance improves over the original first target model to a certain degree. 2) The performance of the second target model is improved over that of the native kernel component in the database by a fifth preset threshold. That is, the performance improvement of the second target model relative to the traditional database algorithm is verified: if the improvement reaches the threshold, the model used by the corresponding module of the database is actually replaced; otherwise, the traditional database algorithm is still used to execute the target physical plan. The fifth preset threshold may be zero, or a value or ratio greater than zero; see the first case above for details, which are not repeated here.
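Both cases above reduce to the same comparison against a reference, which the following sketch makes explicit. It assumes a performance score where higher is better; the function name and the `relative` flag (to treat the threshold as a ratio rather than an absolute value) are hypothetical.

```python
def meets_third_requirement(new_perf: float,
                            reference_perf: float,
                            threshold: float = 0.0,
                            relative: bool = False) -> bool:
    """Check the fine-tuned model against a reference.

    reference_perf is the first target model's performance (fourth
    preset threshold) or the native kernel component's performance
    (fifth preset threshold). Assumption: higher scores are better.
    """
    gain = new_perf - reference_perf
    if relative:
        if reference_perf <= 0:
            raise ValueError("reference_perf must be positive for a ratio")
        gain /= reference_perf  # express the gain as a fraction
    return gain >= threshold
```

With the default threshold of zero, matching the reference is enough; with `threshold=0.1` and `relative=True`, the new model must be at least 10% better.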
The foregoing implementations describe that, when the first target model meets the second preset requirement, the model manager invokes the corresponding target training set in the training data collector to fine-tune the first target model, so that the corresponding models used in the database can be dynamically updated and replaced according to the real-time running state of the database.
Note that the specific functions and invocation logic of the model manager in the embodiments of this application are described in the model manager 203 part of the embodiment corresponding to Figure 2; details are not repeated here.
It should also be noted that, if the performance of the second target model does not meet the third preset requirement, the computer device also receives a second instruction sent by a model evaluator. The second instruction instructs the database to use the native kernel component in the database to generate the final target physical plan. The model evaluator may be deployed in the computer device or on a remote device; this is not limited here. The model evaluator is used to evaluate the performance of the second target model.
The foregoing implementations describe that, when the fine-tuned second target model still does not meet the third preset requirement, the computer device receives the second instruction from the model evaluator, which instructs the database to use the traditional database algorithm (that is, the native kernel component) to generate the target physical plan. The embodiments of this application thus provide multiple options for generating the target physical plan, which offers flexibility.
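The update-or-fall-back decision described above can be sketched as a small orchestration function. This is an illustration under assumptions, not the patented flow: the model manager and the model evaluator are represented by the hypothetical callables `fine_tune` and `evaluate`, models are opaque objects held in a dictionary, and a higher evaluation score is taken to mean better performance.

```python
from typing import Callable, Dict, Tuple


def update_or_fall_back(models: Dict[str, object],
                        target: str,
                        fine_tune: Callable[[object], object],
                        evaluate: Callable[[object], float],
                        fourth_threshold: float = 0.0
                        ) -> Tuple[Dict[str, object], bool]:
    """Fine-tune models[target] and decide whether to deploy the result.

    Returns (models to use, use_learned). When use_learned is False the
    caller should generate the plan with the native kernel component.
    """
    first = models[target]
    second = fine_tune(first)  # model manager step
    if evaluate(second) - evaluate(first) >= fourth_threshold:
        updated = dict(models)  # copy so the old set stays intact
        updated[target] = second
        return updated, True
    return models, False        # evaluator's "second instruction" case
```

Returning a flag instead of raising an error keeps the fallback to the native kernel component an ordinary control-flow path, which matches the text's point that plan generation always has an available option.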
Note that the specific functions and invocation logic of the model evaluator in the embodiments of this application are described in the model evaluator 204 part of the embodiment corresponding to Figure 2; details are not repeated here.
On the basis of the embodiment corresponding to Figure 8, to better implement the foregoing solutions of the embodiments of this application, a computer device for implementing them is provided below. Refer to Figure 9, which is a schematic diagram of a computer device provided by the embodiments of this application. The computer device 900 may include a receiving module 901, a determining module 902, and an execution module 903. The receiving module 901 is configured to receive an SQL statement sent by a client to the database. The determining module 902 is configured to, when a first target model does not meet a second preset requirement, obtain a target physical plan from the SQL statement through the n models, where the target physical plan is a physical plan whose execution cost meets a first preset requirement and the first target model is one of the n models. The execution module 903 is configured to execute the target physical plan.
In a possible design, the n models include a third model, a fourth model, and a fifth model. In this case, the determining module 902 is specifically configured to: perform logical query rewriting on the SQL statement through the third model to obtain a rewritten logical plan, where the third model is a model built on a tree search algorithm; generate q physical plans from the logical plan through the fourth model, where the fourth model is a model built on a deep learning algorithm and q ≥ 1; and calculate, through the fifth model, the q execution costs corresponding to the q physical plans and determine the target physical plan according to the q execution costs, where each physical plan corresponds to one execution cost and the fifth model is a model built on a reinforcement learning algorithm.
In a possible design, the computer device 900 further includes a sending module 904, configured to send the running data of processes in the database to an advisor, so that the advisor detects abnormal data based on the running data, diagnoses the cause of the anomaly based on the abnormal data, and optimizes the self-optimization module corresponding to that cause, so as to reduce the probability that the running data becomes abnormal. The advisor includes p models, where p ≥ 1.
In a possible design, the sending module 904 may be further configured to send the running data of processes in the database to a training data collector, so that the training data collector obtains training data from the running data and constructs m training sets based on the training data, where m ≥ 1.
In a possible design, the sending module 904 may be further configured to: when the first target model meets the second preset requirement, send a first instruction to a model manager, where the first instruction instructs the model manager to fine-tune the first target model, and the first target model is one of the n models. The receiving module 901 may be further configured to: when the performance of a second target model meets a third preset requirement, receive the model parameters of the second target model sent by the model manager, where the second target model is the model obtained by the model manager by fine-tuning the first target model with the target training set corresponding to the first target model, and the target training set is one of the m training sets. The determining module 902 may be further configured to: update the first target model to the second target model, and obtain the target physical plan by using the updated n models.
In a possible design, the receiving module 901 may be further configured to: when the performance of the second target model does not meet the third preset requirement, receive a second instruction sent by a model evaluator, where the second instruction instructs the database to generate the target physical plan by using the native kernel components in the database, and the model evaluator is configured to evaluate the performance of the second target model.
In a possible design, the first target model meeting the second preset requirement includes at least any one of the following: the performance of the first target model declines; or the degree to which the performance of the first target model has declined reaches a second preset threshold; or the predicted probability that the performance of the first target model will decline reaches a third preset threshold; or the duration for which the first target model has run continuously reaches a preset duration.
In a possible design, the performance of the second target model meeting the third preset requirement includes at least any one of the following: the performance of the second target model is improved by a fourth preset threshold compared with the performance of the first target model; or the performance of the second target model is improved by a fifth preset threshold compared with the performance of the native kernel components in the database.
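The fine-tuning trigger (second preset requirement), the acceptance check (third preset requirement), and the fallback to the native kernel components can be sketched together as below. This is a hedged sketch, not the claimed implementation: the `ModelStatus` fields, the function names, and the convention that performance is a single score where higher is better and improvement is measured as an absolute difference are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ModelStatus:
    """Hypothetical monitoring record for the first target model."""
    baseline_score: float              # performance when the model was deployed
    current_score: float               # most recent measured performance
    predicted_drop_probability: float  # predicted probability of a performance decline
    hours_running: float               # continuous running time


def needs_fine_tuning(
    status: ModelStatus,
    drop_threshold: float,         # second preset threshold
    probability_threshold: float,  # third preset threshold
    max_hours: float,              # preset duration
) -> bool:
    """Second preset requirement: any one satisfied condition triggers fine-tuning."""
    drop = status.baseline_score - status.current_score
    # The "any decline at all" variant would test `drop > 0` instead of a threshold.
    return (
        drop >= drop_threshold
        or status.predicted_drop_probability >= probability_threshold
        or status.hours_running >= max_hours
    )


def accept_update(
    new_score: float,
    old_score: float,
    native_score: float,
    gain_over_old: float,     # fourth preset threshold
    gain_over_native: float,  # fifth preset threshold
) -> bool:
    """Third preset requirement: enough gain over the old model or the native components."""
    return (
        new_score - old_score >= gain_over_old
        or new_score - native_score >= gain_over_native
    )


def plan_source_after_fine_tuning(accepted: bool) -> str:
    """If the fine-tuned model is rejected, plans come from the native kernel components."""
    return "second_target_model" if accepted else "native_kernel_components"
```

Keeping the trigger and the acceptance check as separate functions mirrors the split between the component that requests fine-tuning and the model evaluator that decides whether the result is deployed.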
In a possible design, the execution cost meeting the first preset requirement includes at least any one of the following: the execution cost of the target physical plan is the lowest among q execution costs, where the q execution costs are the execution costs respectively corresponding to q physical plans generated based on the SQL statement, each physical plan corresponds to one execution cost, and q≥1; or the execution cost of the target physical plan is lower than a first preset threshold.
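The two alternative forms of the first preset requirement can be expressed as a small predicate. This is a minimal sketch under assumptions: the function name `meets_first_requirement` is hypothetical, and the choice that a supplied threshold takes precedence over the lowest-cost test is one possible reading, since the text only requires that at least one criterion be used.

```python
from typing import Optional, Sequence


def meets_first_requirement(
    cost: float,
    all_costs: Sequence[float],
    threshold: Optional[float] = None,  # first preset threshold, if configured
) -> bool:
    """Check whether a plan's execution cost satisfies the first preset requirement."""
    if threshold is not None:
        # Threshold variant: the cost must be strictly below the first preset threshold.
        return cost < threshold
    # Lowest-cost variant: the cost must be the minimum of the q execution costs.
    return cost == min(all_costs)
```

With the threshold variant, a plan can be accepted as soon as a sufficiently cheap candidate is found; with the lowest-cost variant, all q costs must be known first.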
It should be noted that the information exchange between the modules/units of the computer device 900, their execution processes, and related details are based on the same concept as the method embodiment corresponding to FIG. 8 of this application. For details, refer to the description in the foregoing method embodiments of this application; they are not repeated here.
The following describes another computer device provided in an embodiment of this application. Refer to FIG. 10, which is a schematic structural diagram of a computer device according to an embodiment of this application. The computer device 900 described in the embodiment corresponding to FIG. 9 may be deployed on the computer device 1000 to implement the functions of the computer device 900 in that embodiment. Specifically, the computer device 1000 is implemented by one or more servers and may vary considerably depending on configuration or performance. It may include one or more central processing units (CPUs) 1022, a memory 1032, and one or more storage media 1030 (for example, one or more mass storage devices) that store application programs 1042 or data 1044. The memory 1032 and the storage medium 1030 may provide transient or persistent storage. A program stored in the storage medium 1030 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the computer device 1000. Further, the central processing unit 1022 may be configured to communicate with the storage medium 1030 and to execute, on the computer device 1000, the series of instruction operations stored in the storage medium 1030.
The computer device 1000 may further include one or more power supplies 1026, one or more wired or wireless network interfaces 1050, one or more input/output interfaces 1058, and/or one or more operating systems 1041, such as Windows Server™, Mac OS X™, Unix™, Linux™, or FreeBSD™.
In this embodiment of the application, the central processing unit 1022 is configured to perform the steps performed by the computer device in the embodiment corresponding to FIG. 8. For example, the central processing unit 1022 may be configured to first receive an SQL statement sent by a client device to the database deployed on the computer device, where the database includes an optimizer and database native kernel components, and the optimizer includes n models, n≥1. After the SQL statement sent by the client device is received, it is first determined whether any of the n models in the optimizer fails to meet a preset requirement (referred to as the second preset requirement). If a first target model (that is, one of the n models) does not meet the second preset requirement, the target physical plan is obtained from the SQL statement by using the n models included in the optimizer, where the target physical plan is a physical plan whose execution cost meets the first preset requirement. After the final target physical plan is obtained, it is executed; in essence, executing the generated target physical plan carries out the actual logic of the input SQL statement.
It should be noted that the specific manner in which the central processing unit 1022 performs the foregoing steps is based on the same concept as the method embodiment corresponding to FIG. 8 of this application and brings the same technical effects as the foregoing embodiments. For details, refer to the description in the foregoing method embodiments of this application; they are not repeated here.
It should also be noted that the apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the apparatus embodiments provided in this application, a connection between modules indicates that they have a communication connection, which may be implemented as one or more communication buses or signal lines.
From the description of the foregoing implementations, a person skilled in the art can clearly understand that this application may be implemented by software plus the necessary general-purpose hardware, or by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, and dedicated components. In general, any function performed by a computer program can readily be implemented by corresponding hardware, and the specific hardware structure used to implement a given function may take many forms, such as an analog circuit, a digital circuit, or a dedicated circuit. For this application, however, a software implementation is the preferred implementation in most cases. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a training device, a network device, or the like) to perform the methods described in the embodiments of this application.
The foregoing embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When implemented by software, they may be implemented wholly or partly in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are produced wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can use for storage, or a data storage device, such as a training device or a data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state disk (SSD)), or the like.

Claims (25)

  1. A database management system, comprising:
    an optimizer, a training data collector, a model manager, and a model evaluator, wherein the optimizer comprises n models, n≥1;
    the optimizer is configured to obtain a target physical plan by using the n models based on an SQL statement input to the database, wherein the target physical plan is a physical plan whose execution cost meets a first preset requirement;
    the training data collector is configured to obtain training data from operation data of processes in the database, and construct m training sets based on the training data, m≥1;
    the model manager is configured to, when a first target model meets a second preset requirement, fine-tune the first target model by using a target training set corresponding to the first target model to obtain a second target model, wherein the first target model is one of the n models, and the target training set is one of the m training sets; and
    the model evaluator is configured to evaluate the performance of the second target model, and when the performance of the second target model meets a third preset requirement, update the first target model to the second target model.
  2. The system according to claim 1, further comprising:
    an advisor, wherein the advisor comprises p models, p≥1;
    the advisor is configured to detect abnormal data in the operation data, and diagnose a cause of the abnormality based on the abnormal data; and
    the advisor is further configured to optimize, based on the cause of the abnormality, an optimization module corresponding to that cause, so as to reduce the probability that the operation data becomes abnormal.
  3. The system according to claim 2, wherein the p models comprise a codec, a first model, and a second model, and the advisor is specifically configured to:
    encode and then decode the operation data by using the codec to obtain encoded data, and compare the encoded data with the operation data to obtain the abnormal data;
    when the operation data is system metric data, diagnose the cause of the abnormality from the abnormal data by using the first model, wherein the first model is a model constructed based on a deep learning algorithm; and
    when the operation data is query metric data, diagnose the cause of the abnormality from the abnormal data by using the second model, wherein the second model is a model constructed based on a deep learning algorithm.
  4. The system according to claim 3, wherein:
    the first model comprises a long short-term memory (LSTM) model and a classifier; and
    the second model comprises a tree-structured long short-term memory (Tree-LSTM) model and a softmax function.
  5. The system according to any one of claims 2 to 4, wherein the first target model further comprises:
    any one of the p models.
  6. The system according to any one of claims 1 to 5, wherein the n models comprise a third model, a fourth model, and a fifth model, and the optimizer is specifically configured to:
    perform logical query rewriting on the SQL statement input to the database by using the third model to obtain a rewritten logical plan, wherein the third model is a model constructed based on a tree search algorithm;
    generate q physical plans from the logical plan by using the fourth model, wherein the fourth model is a model constructed based on a deep learning algorithm, q≥1; and
    calculate, by using the fifth model, q execution costs corresponding to the q physical plans, and determine the target physical plan based on the q execution costs, wherein each physical plan corresponds to one execution cost, and the fifth model is a model constructed based on a reinforcement learning algorithm.
  7. The system according to any one of claims 1 to 6, wherein the model evaluator is further configured to:
    when the performance of the second target model does not meet the third preset requirement, trigger the database to generate the target physical plan by using native kernel components in the database.
  8. The system according to claim 7, wherein the model evaluator is further configured to:
    feed back information about a model update failure to the model manager, so that the model manager adjusts its strategy for fine-tuning the first target model based on the information, wherein a model update failure means that the performance of the second target model does not meet the third preset requirement.
  9. The system according to any one of claims 1 to 8, wherein the execution cost meeting the first preset requirement comprises at least any one of the following:
    the execution cost of the target physical plan is the lowest among q execution costs, wherein the q execution costs are the execution costs respectively corresponding to q physical plans generated based on the SQL statement, each physical plan corresponds to one execution cost, and q≥1;
    or
    the execution cost of the target physical plan is lower than a first preset threshold.
  10. The system according to any one of claims 1 to 9, wherein the first target model meeting the second preset requirement comprises at least any one of the following:
    the performance of the first target model declines;
    or
    the degree to which the performance of the first target model has declined reaches a second preset threshold;
    or
    the predicted probability that the performance of the first target model will decline reaches a third preset threshold;
    or
    the duration for which the first target model has run continuously reaches a preset duration.
  11. The system according to any one of claims 1 to 10, wherein the performance of the second target model meeting the third preset requirement comprises at least any one of the following:
    the performance of the second target model is improved by a fourth preset threshold compared with the performance of the first target model;
    or
    the performance of the second target model is improved by a fifth preset threshold compared with the performance of the native kernel components in the database.
  12. A data processing method, wherein the method is applied to a computer device on which a database is deployed, the database comprises an optimizer and native kernel components, the optimizer comprises n models, n≥1, and the method comprises:
    receiving, by the computer device, an SQL statement sent by a client to the database;
    when a first target model does not meet a second preset requirement, obtaining, by the computer device, a target physical plan by using the n models based on the SQL statement, wherein the target physical plan is a physical plan whose execution cost meets a first preset requirement, and the first target model is one of the n models; and
    executing, by the computer device, the target physical plan.
  13. The method according to claim 12, wherein the n models comprise a third model, a fourth model, and a fifth model, and the obtaining, by the computer device, a target physical plan by using the n models based on the SQL statement comprises:
    performing, by the computer device, logical query rewriting on the SQL statement by using the third model to obtain a rewritten logical plan, wherein the third model is a model constructed based on a tree search algorithm;
    generating, by the computer device, q physical plans from the logical plan by using the fourth model, wherein the fourth model is a model constructed based on a deep learning algorithm, q≥1; and
    calculating, by the computer device by using the fifth model, q execution costs corresponding to the q physical plans, and determining the target physical plan based on the q execution costs, wherein each physical plan corresponds to one execution cost, and the fifth model is a model constructed based on a reinforcement learning algorithm.
  14. The method according to any one of claims 12 to 13, further comprising:
    sending, by the computer device, operation data of processes in the database to an advisor, so that the advisor detects abnormal data based on the operation data, diagnoses a cause of the abnormality based on the abnormal data, and optimizes, based on the cause of the abnormality, a self-optimization module corresponding to that cause, so as to reduce the probability that the operation data becomes abnormal, wherein the advisor comprises p models, p≥1.
  15. The method according to any one of claims 12 to 14, further comprising:
    sending, by the computer device, operation data of processes in the database to a training data collector, so that the training data collector obtains training data from the operation data and constructs m training sets based on the training data, m≥1.
  16. The method according to claim 15, further comprising:
    when the first target model meets the second preset requirement, sending, by the computer device, a first instruction to a model manager, wherein the first instruction instructs the model manager to fine-tune the first target model, and the first target model is one of the n models;
    when the performance of a second target model meets a third preset requirement, receiving, by the computer device, model parameters of the second target model sent by the model manager, wherein the second target model is a model obtained by the model manager by fine-tuning the first target model with a target training set corresponding to the first target model, and the target training set is one of the m training sets; and
    updating, by the computer device, the first target model to the second target model, and obtaining the target physical plan by using the updated n models.
  17. The method according to claim 16, further comprising:
    when the performance of the second target model does not meet the third preset requirement, receiving, by the computer device, a second instruction sent by a model evaluator, wherein the second instruction instructs the database to generate the target physical plan by using the native kernel components in the database, and the model evaluator is configured to evaluate the performance of the second target model.
  18. The method according to any one of claims 16 to 17, wherein the first target model meeting the second preset requirement comprises at least any one of the following:
    the performance of the first target model declines;
    or
    the degree to which the performance of the first target model has declined reaches a second preset threshold;
    or
    the predicted probability that the performance of the first target model will decline reaches a third preset threshold;
    or
    the duration for which the first target model has run continuously reaches a preset duration.
  19. The method according to any one of claims 16 to 18, wherein the performance of the second target model meeting the third preset requirement comprises at least any one of the following:
    the performance of the second target model is improved by a fourth preset threshold compared with the performance of the first target model;
    or
    the performance of the second target model is improved by a fifth preset threshold compared with the performance of the native kernel components in the database.
  20. The method according to any one of claims 12 to 19, wherein the execution cost meeting the first preset requirement comprises at least any one of the following:
    the execution cost of the target physical plan is the lowest among q execution costs, wherein the q execution costs are the execution costs respectively corresponding to q physical plans generated based on the SQL statement, each physical plan corresponds to one execution cost, and q≥1;
    or
    the execution cost of the target physical plan is lower than a first preset threshold.
  21. A computer device, wherein the device has a function of implementing the method according to any one of claims 12 to 20, the function is implemented by hardware or by hardware executing corresponding software, and the hardware or the software comprises one or more modules corresponding to the function.
  22. A computer device, comprising a processor and a memory, the processor being coupled to the memory, wherein:
    the memory is configured to store a program; and
    the processor is configured to execute the program in the memory, so that the computer device performs the method according to any one of claims 12 to 20.
  23. A computer-readable storage medium, comprising a program which, when run on a computer, causes the computer to perform the method according to any one of claims 12 to 20.
  24. A computer program product comprising instructions which, when run on a computer, cause the computer to perform the method according to any one of claims 12 to 20.
  25. A chip, comprising a processor and a data interface, wherein the processor reads, through the data interface, instructions stored in a memory and performs the method according to any one of claims 12 to 20.
PCT/CN2022/111991 2021-08-13 2022-08-12 Database management system, data processing method, and device WO2023016537A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110930569.8A CN115705322A (en) 2021-08-13 2021-08-13 Database management system, data processing method and equipment
CN202110930569.8 2021-08-13

Publications (1)

Publication Number Publication Date
WO2023016537A1 true WO2023016537A1 (en) 2023-02-16

Family

ID=85180159

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/111991 WO2023016537A1 (en) 2021-08-13 2022-08-12 Database management system, data processing method, and device

Country Status (2)

Country Link
CN (1) CN115705322A (en)
WO (1) WO2023016537A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116108025B (en) * 2023-04-14 2023-08-01 安元科技股份有限公司 Data virtualization performance optimization method
CN116821171B (en) * 2023-06-27 2024-04-19 杭州云之重器科技有限公司 Method for generating new virtual view to accelerate computing task
CN116627773B (en) * 2023-07-21 2023-09-22 四川发展环境科学技术研究院有限公司 Abnormality analysis method and system of production and marketing difference statistics platform system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100030758A1 (en) * 2008-07-30 2010-02-04 Oracle International Corporation Hybrid optimization strategies in automatic SQL tuning
CN106991116A (en) * 2017-02-10 2017-07-28 阿里巴巴集团控股有限公司 The optimization method and device of database executive plan
CN111813888A (en) * 2019-04-12 2020-10-23 微软技术许可有限责任公司 Training target model
CN112215357A (en) * 2020-09-29 2021-01-12 三一专用汽车有限责任公司 Model optimization method, device, equipment and computer readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YANG HE-BIAO, LIU LING, YANG LI-FAN: "A Study of the Automated Programming Assessment Model for SQL Based on Structure Similarity Matching", COMPUTER ENGINEERING AND SCIENCE, GUOFANG KEJI DAXUE JISUANJI XUEYUAN, CN, vol. 32, no. 11, 15 November 2010 (2010-11-15), pages 92-96, XP009543424, ISSN: 1007-130X *

Also Published As

Publication number Publication date
CN115705322A (en) 2023-02-17

Similar Documents

Publication Publication Date Title
US11567937B2 (en) Automated configuration parameter tuning for database performance
WO2023016537A1 (en) Database management system, data processing method, and device
Li et al. Qtune: A query-aware database tuning system with deep reinforcement learning
Hilprecht et al. Learning a partitioning advisor for cloud databases
Christophides et al. End-to-end entity resolution for big data: A survey
US20210286786A1 (en) Database performance tuning method, apparatus, and system, device, and storage medium
US20180032405A1 (en) Method for data protection for cloud-based service system
Raza et al. Autonomic performance prediction framework for data warehouse queries using lazy learning approach
CN113010547A (en) Database query optimization method and system based on graph neural network
Zhao et al. Queryformer: A tree transformer model for query plan representation
Zhang et al. CDBTune+: An efficient deep reinforcement learning-based automatic cloud database tuning system
Prats et al. You only run once: spark auto-tuning from a single run
Paludo Licks et al. SmartIX: A database indexing agent based on reinforcement learning
Zhao et al. Automatic database knob tuning: a survey
US20150356485A1 (en) Methods and systems for intelligent evolutionary optimization of workflows using big data infrastucture
Cai et al. HUNTER: an online cloud database hybrid tuning system for personalized requirements
Zhu et al. Lero: A learning-to-rank query optimizer
Chen et al. LOGER: A learned optimizer towards generating efficient and robust query execution plans
US20230067285A1 (en) Linkage data generator
Chen et al. Leon: A new framework for ml-aided query optimization
Doshi et al. Kepler: Robust Learning for Parametric Query Optimization
Wu et al. Dynamic index construction with deep reinforcement learning
Lyu et al. Fine-grained modeling and optimization for intelligent resource management in big data processing
Mozaffari et al. Feedback control loop design for workload change detection in self-tuning NoSQL wide column stores
Valavala et al. Automatic database index tuning using machine learning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22855517

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE