WO2020228378A1 - 一种确定数据库的配置参数的方法及装置 - Google Patents
一种确定数据库的配置参数的方法及装置 Download PDFInfo
- Publication number
- WO2020228378A1 WO2020228378A1 PCT/CN2020/073429 CN2020073429W WO2020228378A1 WO 2020228378 A1 WO2020228378 A1 WO 2020228378A1 CN 2020073429 W CN2020073429 W CN 2020073429W WO 2020228378 A1 WO2020228378 A1 WO 2020228378A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- query
- database
- configuration parameter
- information
- model
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 103
- 239000013598 vector Substances 0.000 claims abstract description 163
- 230000008569 process Effects 0.000 claims description 48
- 238000004088 simulation Methods 0.000 claims description 31
- 230000008859 change Effects 0.000 claims description 17
- 239000003795 chemical substances by application Substances 0.000 claims description 11
- 230000002787 reinforcement Effects 0.000 claims description 9
- 238000004422 calculation algorithm Methods 0.000 claims description 7
- 238000004590 computer program Methods 0.000 claims description 6
- 230000004044 response Effects 0.000 claims description 5
- 230000004927 fusion Effects 0.000 claims description 4
- 230000009977 dual effect Effects 0.000 abstract description 2
- 238000010586 diagram Methods 0.000 description 23
- 230000006870 function Effects 0.000 description 17
- 238000007726 management method Methods 0.000 description 12
- 238000012549 training Methods 0.000 description 12
- 238000004891 communication Methods 0.000 description 11
- 238000012545 processing Methods 0.000 description 10
- 238000013528 artificial neural network Methods 0.000 description 7
- 238000013500 data storage Methods 0.000 description 6
- 238000010801 machine learning Methods 0.000 description 5
- 238000003062 neural network model Methods 0.000 description 5
- 230000009471 action Effects 0.000 description 4
- 238000012986 modification Methods 0.000 description 4
- 230000004048 modification Effects 0.000 description 4
- 238000012360 testing method Methods 0.000 description 4
- 238000012546 transfer Methods 0.000 description 4
- 230000008878 coupling Effects 0.000 description 3
- 238000010168 coupling process Methods 0.000 description 3
- 238000005859 coupling reaction Methods 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 3
- 238000007667 floating Methods 0.000 description 3
- 238000013139 quantization Methods 0.000 description 3
- 238000004458 analytical method Methods 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 2
- 230000005540 biological transmission Effects 0.000 description 2
- 238000013480 data collection Methods 0.000 description 2
- 230000003247 decreasing effect Effects 0.000 description 2
- 238000013136 deep learning model Methods 0.000 description 2
- 238000012217 deletion Methods 0.000 description 2
- 230000037430 deletion Effects 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 238000012423 maintenance Methods 0.000 description 2
- 238000007637 random forest analysis Methods 0.000 description 2
- 230000000306 recurrent effect Effects 0.000 description 2
- 230000001960 triggered effect Effects 0.000 description 2
- 230000002776 aggregation Effects 0.000 description 1
- 238000004220 aggregation Methods 0.000 description 1
- 238000003491 array Methods 0.000 description 1
- 230000006399 behavior Effects 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 238000006243 chemical reaction Methods 0.000 description 1
- 238000013499 data model Methods 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 239000000284 extract Substances 0.000 description 1
- 238000003780 insertion Methods 0.000 description 1
- 230000037431 insertion Effects 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 238000012544 monitoring process Methods 0.000 description 1
- 210000002569 neuron Anatomy 0.000 description 1
- 238000010606 normalization Methods 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 239000013307 optical fiber Substances 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 238000007500 overflow downdraw method Methods 0.000 description 1
- 230000002093 peripheral effect Effects 0.000 description 1
- 238000011084 recovery Methods 0.000 description 1
- 230000008521 reorganization Effects 0.000 description 1
- 238000005070 sampling Methods 0.000 description 1
- 239000007787 solid Substances 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24534—Query rewriting; Transformation
- G06F16/24549—Run-time optimisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/217—Database tuning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2452—Query translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24534—Query rewriting; Transformation
- G06F16/24542—Plan optimisation
- G06F16/24545—Selectivity estimation or determination
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- This application relates to the field of database technology, and in particular to a method and device for determining configuration parameters of a database.
- a large amount of data is stored in a database (DB), which can provide data services for a large number of clients.
- the client is the load of the database.
- the database When the database is working, it can be run based on different configuration parameters to provide different performance data services to the client. In other words, if the configuration parameters of the database match the load business, the database will respond faster when it provides services. If the database configuration parameters do not match the load business, the database will respond quickly when it provides services. Slower.
- the database administrator In order to better provide services for the client, during the operation of the database, the database administrator (DBA) will determine the configuration parameters of the database according to the load situation.
- the DBA determines the configuration parameters of the database to test the service to be deployed in a test environment. It takes a lot of time to determine the configuration parameters of the database suitable for this service. Once the service changes, it needs to be tested again. For the number of services or loads in the network that may change at any time, this DBA tuning method cannot meet current needs.
- the embodiment of the present application provides a method for determining the configuration parameters of the database, which is used to dynamically determine the configuration parameters of the database according to the load situation and the current state information of the database, which can ensure the good performance of the database under different load requirements, for example: Performance in terms of delay and throughput.
- a method for determining configuration parameters of a database may include: obtaining N query sentences sent by M clients, where M is an integer greater than 0, N is an integer greater than 0, and N ⁇ M; determine N query information from the N query statements, and the N query statements have a one-to-one correspondence with the N query information; perform vectorized encoding on the N query information to obtain the corresponding N target feature vectors, the N query information corresponds to the N target feature vectors one-to-one; according to the N target feature vectors and the current state information of the database, it is determined to correspond to the N query sentences
- the configuration parameter set of the database, the configuration parameter set includes at least one target configuration parameter group, the target configuration parameter group includes at least one configuration parameter, and the configuration parameter set is used to execute the N query statements.
- the configuration parameters currently used by the database can be adjusted based on the configuration parameter set.
- the configuration parameter adjustment method can be to replace the configuration parameters currently in the database with the configuration parameters in the current configuration parameter set.
- the configuration parameter used may also be the adjustment of the configuration parameter currently used by the database by increasing or decreasing the offset value from the value of the configuration parameter currently used by the database.
- the configuration parameters of the database can be determined dynamically according to the load situation and the current state information of the database, which can ensure the good performance of the database under different load requirements, for example, it can meet the performance of delay and throughput.
- the method may further include: in response to the mode selection instruction, determining the parameter adjustment mode from the query-level tuning mode, the load-level tuning mode, and the cluster-level tuning mode.
- the target parameter adjustment mode for the N query sentences that is to say, this possible implementation provides three granular tuning modes: query-level tuning mode, load-level tuning mode, and cluster-level tuning mode.
- the mode selection command can be triggered by the user. , It can also be triggered by the database administrator, that is to say, the user or the database administrator can switch the tuning mode under different demand scenarios, so as to meet various tuning needs.
- a configuration parameter set can be determined for a query statement.
- the configuration parameter set includes a target configuration parameter group for the query statement, and the target configuration parameter group is used to configure the database when the query statement is executed.
- the method when the target tuning mode is the load level tuning mode, the method further includes: fusing the N target feature vectors to obtain A unified vector after fusion; the determining the configuration parameter set of the database corresponding to the N query sentences according to the N target feature vectors and the current state information of the database includes: according to the unified vector And the current state information of the database, determine the configuration parameter set of the database corresponding to the N query statements, the configuration parameter set includes a target configuration parameter group, and the target configuration parameter group is used for executing When the N query statements are used, the configuration parameters currently used by the database are adjusted. From a possible implementation of the first aspect, it can be seen that this load-level parameter adjustment mode can perform a parameter adjustment for N query statements at a time, which can meet the requirements of a large throughput scenario.
- Determining the configuration parameter set of the database corresponding to the N query sentences includes: determining N configuration parameter groups according to the N target feature vectors and the current state information of the database, where N configurations The parameter groups correspond to the N target feature vectors one-to-one; the N configuration parameter groups are clustered to obtain Q target configuration parameter groups, where the first configuration parameter group corresponds to at least one target feature vector, so The first configuration parameter group is any one of the Q target configuration parameter groups, the Q is an integer greater than 0, and Q ⁇ N, and the first configuration parameter group is used to execute the at least one target When the query sentence corresponding to the feature vector is adjusted, the configuration parameters currently used by the database are adjusted. From a possible implementation of the first aspect, it can be seen that this cluster-level parameter adjustment mode can determine a set of target configuration parameter sets for the same type of query statement, which can ensure greater throughput and satisfy
- the first query information in the N query information includes the first query type, the first table information, the type of the operator involved in the query, and the corresponding cost information
- the first query information is any one of the N query information; the first query type is used to indicate the operation type of the query statement corresponding to the first query information on the database; A table of information is used to indicate the relational table involved when the query sentence corresponding to the first query information is executed by the database; the cost information is used to indicate that the involved operator is performing the first query information. The respective execution cost of the corresponding query statement.
- the target feature vector corresponding to the first query information includes the identifier of the first query type and the identifier of the second query type, the identifier of the first table information, and the second table information.
- the floating-point bits of each type of operator wherein the identifier of the first query type is represented by a first value, the identifier of the second query type is represented by a second value, and the second query type belongs to The query type of the database is not included in the first query information; the identifier of the first table information is represented by a third value, the identifier of the second table information is represented by a fourth value, and the second
- the table information belongs to the table information of the database and is not included in the first query information; the floating-point bits of each type of operator include the execution cost of the corresponding type of operator.
- the identifier can be a bit identifier or a floating-point identifier.
- 1 can be used to indicate the first query type
- 0 can be used to indicate the second query type
- T represents the first query type
- F can be used to represent the second query type, or other bit representations can also be applied to this solution.
- it is a floating point identifier
- Delete (delete) is represented by 4.
- these four query types are used as examples for illustration. It is not limited to only these four query types, nor is it limited to be expressed in the form of numerical values listed here, and can also be expressed by other symbols Or numerical form to represent different query types.
- the execution cost of the corresponding type of operator is a normalized execution cost.
- the determining a configuration parameter set of the database corresponding to the N query sentences according to the N target feature vectors and current state information of the database Including: according to the N target feature vectors and the current state information of the database, using a dual-state deep reinforcement learning (DS-DRL) model to determine all corresponding to the N query sentences
- the configuration parameter set of the database can process the data of both the feature vector of the query statement and the status information of the database, so as to obtain the configuration parameters of the database suitable for the query statement.
- the DS-DRL model includes a predictive model, an environment model, and a proxy model; the use of dual data is based on the N target feature vectors and the current state information of the database.
- the state deep reinforcement learning DS-DRL model determines the configuration parameter set of the database corresponding to the N query sentences, including: predicting the state information of the database according to the N target feature vectors through the prediction model
- the amount of change ⁇ S, the ⁇ S is the amount of change in the state information of the database after the execution of the N query statements relative to the state information of the database before the execution of the N query statements; through the environment model, according to the ⁇ S, and execution
- the state information S of the database before the N query statements determines the simulation state information S′ of the database after the execution of the N query statements; through the proxy model, according to the simulation state information S′ , Determine a recommended configuration parameter set; through the environment model, perform simulation configuration according to the recommended configuration parameter set, and execute the N target feature vectors to update the S and S
- the DS-DRL model implements the above-mentioned process of determining the configuration parameter set through the cooperation of the three models of the predictive model, the environment model and the proxy model, which can ensure the accuracy of the configuration parameter set.
- Satisfying the preset condition in this possible implementation manner may include that the value of R exceeds a preset threshold, or a predetermined resource restriction condition is reached, for example: iterated a predetermined number of times, or iterated for a preset duration.
- the proxy model when the DS-DRL model is a dual-state deep deterministic policy gradient algorithm DS-DDPG model, the proxy model includes a role actor model and a scoring critic model; Using the proxy model to determine a recommended configuration parameter set based on the simulation state information S'includes: determining a recommended configuration parameter set based on the actor model and the simulation state information S'; using the critic model, Determine the score of the recommended configuration parameter set according to the simulation state information S'and the recommended configuration parameter set.
- the updating the parameter adjustment strategy according to the performance value R may include: updating the weight of the actor model according to the score through the actor model.
- the updating the parameter adjustment strategy according to the performance value R may include: updating the weight of the critic model according to the performance value R through the critic model.
- a device for determining configuration parameters of a database which is used to execute the foregoing first aspect or any possible implementation of the first aspect.
- the device includes a module or unit for executing the foregoing first aspect or any possible implementation of the first aspect.
- a device for determining configuration parameters of a database may include at least one processor, memory, and communication interface.
- the processor is coupled with the memory and the communication interface.
- the memory is used to store instructions
- the processor is used to execute the instructions
- the communication interface is used to communicate with other network elements under the control of the processor.
- a database system in the fourth aspect, includes the device for determining the configuration parameters of the database and the database of the second aspect or the third aspect.
- a computer-readable storage medium stores a program that enables a data query device to execute the first aspect and any of its various implementations to determine the database The method of configuring parameters.
- a computer program product in a sixth aspect, includes computer-executable instructions stored in a computer-readable storage medium; at least one processor of the device can read the computer from the computer-readable storage medium. The instruction is executed, and at least one processor executes the computer-executed instruction to enable the device to implement the method for determining the configuration parameters of the database provided by the foregoing first aspect or any one of the possible implementation manners of the first aspect.
- any device, computer storage medium, or computer program product for determining configuration parameters of a database provided above is used to execute the corresponding method for determining configuration parameters of a database provided above, and therefore, it can achieve
- the beneficial effects of please refer to the beneficial effects of the corresponding method provided above, which will not be repeated here.
- FIG. 1 is a schematic diagram of the architecture of a database system provided by an embodiment of the present application.
- FIG. 2 is a schematic diagram of another architecture of a database system provided by an embodiment of the present application.
- FIG. 3 is another schematic diagram of the architecture of the database system provided by an embodiment of the present application.
- FIG. 4 is another schematic diagram of the architecture of the database system provided by an embodiment of the present application.
- FIG. 5 is a schematic structural diagram of a database server provided by an embodiment of the present application.
- FIG. 6 is another schematic structural diagram of a database server provided by an embodiment of the present application.
- FIG. 7 is a schematic diagram of a neural network model provided by an embodiment of the present application.
- FIG. 8 is another schematic diagram of the architecture of the database system provided by an embodiment of the present application.
- FIG. 9 is a schematic diagram of an embodiment of a method for determining configuration parameters of a database provided by an embodiment of the present application.
- FIG. 10 is another schematic diagram of the architecture of the database system provided by an embodiment of the present application.
- FIG. 11 is a schematic diagram of another embodiment of a method for determining configuration parameters of a database provided by an embodiment of the present application.
- FIG. 12 is another schematic diagram of a neural network model provided by an embodiment of the present application.
- FIG. 13 is a schematic diagram of a query sentence vectorized encoding process provided by an embodiment of the present application.
- FIG. 14 is a schematic diagram of a DS-DRL architecture provided by an embodiment of the present application.
- FIG. 15 is a schematic diagram of a DS-DDPG architecture provided by an embodiment of the present application.
- FIG. 16 is a schematic diagram of an embodiment of an apparatus for determining configuration parameters of a database provided by an embodiment of the present application.
- FIG. 17 is a schematic diagram of another structure of a database server provided by an embodiment of the present application.
- the architecture of the database system applied by the embodiment of the present application is shown in FIG. 1, and the database system includes a database 101 and a database management system (DBMS) 102.
- DBMS database management system
- the database 101 refers to an organized data collection stored in a data store for a long time, that is, a related data collection organized, stored, and used according to a certain data model.
- the database 101 may include one or more Table data.
- the DBMS 102 is used to establish, use, and maintain the database 101, and to perform unified management and control of the database 101 to ensure the security and integrity of the database 101.
- the user can access the data in the database 101 through the DBMS 102, and the database administrator (DBA) also performs the maintenance of the database through the DBMS 102.
- the DBMS 102 provides multiple functions, allowing multiple applications and user equipment to use different methods to create, modify, and query the database at the same time or at different times.
- the applications and user equipment can be collectively referred to as the client.
- the functions provided by DBMS 102 can include the following: (1) Data definition function.
- DBMS 102 provides Data Definition Language (DDL) to define the database structure.
- DDL Data Definition Language
- DDL is used to describe the database framework and can be stored in the data.
- DBMS 102 provides Data Manipulation Language (DML) to realize basic access operations on database data, such as retrieval, insertion, modification and deletion
- DML Data Manipulation Language
- Database operation management Function DBMS 102 provides data control functions, that is, data security, integrity and concurrency control, etc., to effectively control and manage the operation of the database to ensure that the data is correct and effective
- the establishment and maintenance of the database including the database Initial data loading, database dump, recovery, reorganization, system performance monitoring, analysis and other functions
- Database transmission DBMS 102 provides processing data transmission, and realizes communication between client and DBMS 102, Usually completed in coordination with the operating system.
- Figure 2 is a schematic diagram of a stand-alone database system, including a database management system and data store (Data Store), the database management system is used to provide database query and modification services, the database management system stores data in the data store .
- the database management system and data storage are usually located on a single server, such as a Symmetric Multi-Processor (SMP) server.
- SMP server includes multiple processors, and all processors share resources, such as buses, memory, and I/O systems.
- the functions of the database management system can be implemented by one or more processors executing programs in the memory.
- FIG 3 is a schematic diagram of a cluster database system using a shared disk (Shared-storage) architecture, including multiple nodes (node 1-N in Figure 3), each node is deployed with a database management system to provide users with database query
- multiple database management systems store shared data in the shared data storage, and perform read and write operations on the data in the data storage through the switch.
- the shared data storage can be a shared disk array.
- the nodes in the cluster database system can be physical machines, such as database servers, or virtual machines running on abstract hardware resources. If the node is a physical machine, the switch is a Storage Area Network (SAN) switch, an Ethernet switch, an optical fiber switch or other physical switching equipment. If the node is a virtual machine, the switch is a virtual switch.
- SAN Storage Area Network
- FIG 4 is a schematic diagram of a cluster database system adopting a Shared-nothing architecture.
- Each node has its own hardware resources (such as data storage), operating system and database, and the nodes communicate through the network. Under this system, data will be distributed to each node according to the database model and application characteristics, and the query task will be divided into several parts, which will be executed in parallel on all nodes and coordinated with each other to provide database services as a whole. All communication functions are in Realized on a high-bandwidth network interconnection system.
- the nodes here can be either physical machines or virtual machines.
- the data store of the database system includes but is not limited to solid state drives (SSD), disk arrays, or other types of non-transitory computer readable media.
- SSD solid state drives
- the database is not shown in FIGS. 2 to 4, it should be understood that the database is stored in a data storage.
- a database system may include fewer or more components than those shown in Figures 2 to 4, or include components different from those shown in Figures 2 to 4, as shown in Figures 2 to 4 FIG. 4 only shows components that are more relevant to the implementation manner disclosed in the embodiments of the present application.
- a cluster database system may include any number of nodes.
- the database management system function of each node can be implemented by a suitable combination of software, hardware and/or firmware running on each node.
- a database management system which can be applied to a stand-alone database system, a cluster database system of Shared-nothing architecture, and Shared -storage architecture cluster database system, or other types of database systems.
- the DBMS 102 when the DBMS 102 executes a query of the database 101, it usually needs to perform syntax analysis, pre-compilation, and optimization of the query statement to estimate the execution mode that the database system considers the least expensive, and then generate the least expensive execution plan , The runtime execution structure will perform data operations in accordance with the generated execution plan to improve the performance of the database system.
- the DBMS 102 performs cost estimation on a query statement, it needs to collect statistical information of the query statement, and perform cost estimation based on the collected statistical information.
- the method of collecting statistical information may be model information obtained through model training through machine learning, or statistical information obtained through data sampling and statistics, and model information may also be referred to as statistical information.
- the DBMS 102 may be located in a database server.
- the database server may specifically be the SMP server in the stand-alone database system described in FIG. 2 or the node described in FIG. 3 or FIG. 4.
- the database server may include a core 1021, and an external trainer 1022 independent of the core 1021 and located inside the database server; or, as shown in FIG. 6, the database server includes a core 1021, an external trainer 1022 is located outside the database server.
- the kernel 1021 is the core of the database server, and can be used to perform various functions provided by the DBMS 102.
- the kernel 1021 may include a utility program 10211 and an optimizer 10212.
- the utility program 10211 may trigger the external trainer 1022 to perform model training through machine learning, thereby obtaining model information of the training model.
- the optimizer 10212 can estimate the cost according to the model information trained by the external trainer 1022 to generate an execution plan with the least cost, so that the execution structure performs data operations according to the generated execution plan to improve the performance of the database system.
- Machine learning refers to the process of obtaining new inference models that rely on learning or observation of existing data.
- Machine learning can be implemented through a variety of different algorithms.
- Common machine learning algorithms can include: neural network (NN) and random forest (Random Forest, RF) models.
- a neural network may include a forward feedback neural network (Feed Forward Neural Network, FFNN) and a recurrent neural network (Recurrent Neural Network, RNN).
- FFNN forward feedback neural network
- RNN recurrent neural network
- FIG. 7 it is a schematic diagram of a neural network model.
- the model may include an input layer, a hidden layer, and an output layer. Each layer may include a different number of neurons.
- the embodiment of the present application provides a A method for determining configuration parameters of a database.
- the method provides a dual-state-based configuration parameter adjustment solution, where the dual-states can be a feature vector of a query statement (query vector) and database state information (database states).
- the feature vector of the query statement will affect the statistical information of the various activities of the database. It can also be said that the query statement or load will affect the statistical information of the various activities of the database.
- the state information of the database can include the internal configuration of the database, which will be affected by the configuration. Influence of tuning behavior.
- the adjustment scheme based on the dual-state configuration parameters provided by the embodiments of the present application may be implemented through a dual-state deep reinforcement learning (DS-DRL) model, which can determine the characteristics of the query sentence
- the vector and the status information of the database are processed to obtain the configuration parameters of the database suitable for the query statement.
- the DS-DRL model can have a variety of specific manifestations, one of which can be, for example, the dual-state deep deterministic policy gradient (DS-DDPG) model.
- DS-DDPG dual-state deep deterministic policy gradient
- About the DS-DRL model and the DS-DDPG model The specific structure of the DDPG model and its role in the process of determining the configuration parameters of the database will be described later, and we will not expand too much here.
- the embodiment of the present application includes three granular parameter adjustment modes in the above-mentioned dual-state configuration parameter adjustment scheme, namely: query-level tuning mode and load-level tuning mode (workload- level tuning) and cluster-level tuning mode (cluster-level tuning).
- query-level tuning mode and load-level tuning mode (workload- level tuning)
- cluster-level tuning mode cluster-level tuning
- the DBMS provides the DB with a configuration parameter set matching the load, and then adjusts the configuration parameters of the DB according to the configuration parameter set. Because both the DBMS and the DB belong to the database system, the following describes the process of determining the configuration parameters of the database in the embodiment of the present application with reference to FIG. 8.
- FIG. 8 is a schematic diagram of another architecture of the database system provided by an embodiment of the application.
- the database system includes a DBMS and a DB.
- the functions of the DB are basically the same as those described in the embodiment corresponding to FIG. 1, and will not be described in detail here.
- the DB can adjust its own configuration parameters based on the configuration parameter set provided by the DBMS.
- the DBMS may include a controller (Controller), a query sentence vectorization module (Query2Vector), a load vectorization module (Workload2Vector), a load clustering module (Workload2Cluster), and a parameter tuning system (tuner).
- the DBMS may also include training The parameter set storage module, if the training parameter set storage module is included, training data (Training Data) is stored in the module.
- training data Training Data
- the training data is mainly used to train the DS-DRL model. If the DS-DRL model is trained offline, the database system may not include the training parameter set storage module. In a possible implementation, even if the DS-DRL model is trained offline, it can also include the training parameter set storage module. During the tuning process, the new data generated by the tuning system can also be updated to the training parameter set storage. In the training data of the module, the online DS-DRL model can be optimized.
- the number of clients shown in FIG. 8 is not limited, and it can mean that there is one client, or it can mean that there are two or more clients.
- the client When the client uses the database, it will send a query (Query), which can be sent through a query request.
- Query a query
- the DBMS can execute the process of the method for determining the configuration parameters of the database as shown in FIG. 9.
- an embodiment of a method for determining configuration parameters of a database may include:
- the M is an integer greater than 0, the N is an integer greater than 0, and N ⁇ M.
- the N query sentences correspond to the N query information one to one.
- the N pieces of query information correspond to the N pieces of target feature vectors one-to-one.
- the configuration parameter set includes at least one target configuration parameter group, and the target configuration parameter group includes at least one configuration parameter.
- the configuration parameter set is used to execute the N query statements.
- step 205 may be performed after step 204.
- the configuration parameter adjustment method can be to replace the configuration parameters currently used by the database with the configuration parameters in the current configuration parameter set, or to adjust the current database by increasing or decreasing the offset value from the value of the configuration parameter currently used by the database.
- the configuration parameters used can be to replace the configuration parameters currently used by the database with the configuration parameters in the current configuration parameter set, or to adjust the current database by increasing or decreasing the offset value from the value of the configuration parameter currently used by the database. The configuration parameters used.
- the above steps 201 to 205 can also be executed by using the remaining part of the DBMS excluding the controller as a device for determining the configuration parameters of the database.
- the controller provides the client with access to the database. Interface to forward the query statement.
- the above steps 201 to 205 are executed by the device for determining the configuration parameters of the database.
- the configuration parameters of the database can be determined by the target feature vector of the query sentence and the current state information of the database, which can ensure the good performance of the database under different load requirements. For example: It can meet the performance of delay and throughput.
- the query sentence vectorization module (Query2Vector) vectorizes a single query sentence to obtain the target feature vector of the query sentence, and then The target feature vector is passed to the tuning system (tuner), and the tuning system determines the configuration parameter set of the database for the query statement according to the feature vector of the query statement. That is to say, when the query-level parameter adjustment mode is executed, a configuration parameter set will be obtained through the above process.
- the configuration parameter set includes a target configuration parameter group corresponding to the query statement, and the target configuration parameter group is used for The configuration parameters currently used by the database are adjusted when the single query statement is executed.
- the query-level parameter adjustment mode performs one-time parameter adjustment for a query statement. This parameter adjustment time delay is small and can meet the personalized parameter adjustment requirements for different query statements.
- the query sentence vectorization module performs vectorization coding on each of the N query sentences to obtain N target feature vectors, N A query statement is all query statements to be served by the database. Then, the query sentence vectorization module passes the N target feature vectors to the load vectorization module (Workload2Vector). The load vectorization module fuses the N target feature vectors into a unified vector, and then the load vectorization module unifies the N target feature vectors. The vector is passed to the parameter tuning system, and the parameter tuning system determines the configuration parameter set of the database for the N query statements according to the unified vector.
- the process of the load vectorization module fusing N target feature vectors into a unified vector can be to merge the corresponding parts of the N target feature vectors, for example: each target feature vector includes query type, table information and cost
- each target feature vector includes query type, table information and cost
- This load-level parameter adjustment mode can perform one-time parameter adjustment for N query statements, which can meet the needs of large throughput scenarios.
- cluster-level tuning the query sentence vectorization module (Query2Vector) performs vectorization coding on each of the N query sentences to obtain N target feature vectors, The N query statements are all the query statements for the database to provide services. Then the query sentence vectorization module transfers the N target feature vectors to the tuning system. The tuning system determines N configuration parameter groups for the N target feature vectors, and then the tuning system transfers these N configuration parameter groups to the load.
- the clustering module (Workload2Cluster), the load clustering module clusters N configuration parameter groups to obtain Q target configuration parameter groups.
- the load clustering module then passes the Q target configuration parameter groups to the parameter adjustment system, and the parameter adjustment system determines the configuration parameter set for the database of N query statements according to the Q target configuration parameter groups. Wherein, the load clustering module clusters N configuration parameter groups to obtain Q target configuration parameter groups.
- the process may be clustering according to the similarity of the configuration parameters of each group, such as for the same type of configuration in the two groups.
- the parameters can be clustered by the shortest Euclidean distance.
- the tuning system determines N configuration parameter groups for the N target feature vectors, and then the load clustering module clusters the N configuration parameter groups into Q target configuration parameter groups. In fact, it is not limited to this way of configuring parameter groups from N target feature vectors to Q targets.
- the load clustering module can also determine N configuration parameter groups according to the N target feature vectors, and then cluster them into Q target configuration parameter groups.
- the load clustering module can include two parts: vector to parameter module (Vector2Pattern) and parameter clustering module (Pattern2Cluster). In this case, the content related to clustering can be understood by referring to Figure 10.
- the load vectorization module (Workload2Vector) that is not related to clustering is not shown in FIG.
- the query sentence vectorization module transfers N target feature vectors to the vector to parameter module, and the vector to parameter module determines N configuration parameter groups for the N target feature vectors.
- the vector-to-parameter module then passes the N configuration parameter groups to the parameter clustering module, and the parameter clustering module clusters the N configuration parameter groups into Q target configuration parameter groups.
- the vector to parameter module may include a deep learning model (deep learning model, DL Model), which can determine discrete values as configuration parameters, for example, according to the relationship between the estimated value and the default value in ⁇ -1,0,1 ⁇
- DL Model deep learning model
- the configuration parameters where: when the estimated value is near the default value, take 0; when the estimated value is much larger than the default value, take 1; when the estimated value is much smaller than the default value, take -1.
- the method for determining the configuration parameters of the database may further include:
- the target parameter adjustment mode for the N query statements is determined from the parameter adjustment mode of the query level, the parameter adjustment mode of the load level, and the parameter adjustment mode of the cluster level.
- the target tuning mode is the query-level tuning mode
- the statement determines a configuration parameter set
- the configuration parameter set includes a target configuration parameter set for the query statement
- the target configuration parameter set is used to configure the database when the query statement is executed.
- the method for determining the configuration parameters of the database may further include: fusing the N target feature vectors to obtain a fused unified vector; wherein , Step 204 may include determining the configuration parameter set of the database corresponding to the N query sentences according to the unified vector and the current state information of the database, and the configuration parameter set includes a target configuration parameter group, so The one target configuration parameter group is used to adjust the configuration parameters currently used by the database when the N query statements are executed.
- This load-level parameter adjustment mode can perform one-time parameter adjustment for N query statements, which can meet the needs of large throughput scenarios.
- the configuration parameter set of the database can include:
- N configuration parameter groups According to N target feature vectors and the current state information of the database, wherein the N configuration parameter groups correspond to the N target feature vectors one to one;
- the N configuration parameter groups Clustering the N configuration parameter groups to obtain Q target configuration parameter groups, wherein the first configuration parameter group corresponds to at least one target feature vector, and the first configuration parameter group is the Q target configuration parameters Any one of the group, the Q is an integer greater than 0, and Q ⁇ N, the first configuration parameter group is used to adjust the current use of the database when executing the query sentence corresponding to the at least one target feature vector Configuration parameters.
- This cluster-level parameter adjustment mode can determine a set of target configuration parameter groups for the same type of query statement, which can ensure a large throughput and meet the needs of low latency.
- FIG. 8 and 10 describe the process of determining the configuration parameters of the database from the perspective of DBMS modularization.
- Figure 11 describes the subsequent processing of the N target feature vectors in the above embodiment from the perspective of the processing flow to obtain the configuration parameter set. The process of determining the configuration parameters of the database.
- the embodiment of the present application provides the processing process of the three-granularity parameter adjustment mode, and also extracts the working process of the parameter adjustment system and the clustering level from the processing process of the three-granularity parameter adjustment mode
- the parameter tuning model in Figure 11 is based on DS-DDPG, which is actually not limited to the DS-DDPG tuning model.
- Q ⁇ Q 1
- Q ⁇ Q 1 , Q 2 ,... Q n ⁇
- Set A determines the configuration parameters of the database.
- the N configuration parameter groups [A 1 , A 2 ,...A n ] corresponding to these N target feature vectors can be determined through the DS-DDPG model, or the vector can be passed to the parameter
- the DL Model of (Vector2Pattern) determines the N configuration parameter groups [P 1 , P 2 ,...P n ] corresponding to the N target feature vectors, no matter it is N configuration parameter groups [A 1 , A 2 ,...A n ], or N configuration parameter groups [P 1 , P 2 ,...P n ], and then clustering is performed to obtain Q target configuration parameter groups ⁇ U 1 , U 2 ,...U q ⁇ , and then each target configuration parameters do U j to the quantization and coding, to obtain the quantization
- the specific tuning process of the DS-DDPG model will be introduced in detail later, and I will not expand it here.
- the DL Model can also be understood by referring to Figure 12, where L1 is the input layer, L2, L3, and L4 are hidden layers, and L5 is the output layer.
- the query statement needs to be vectorized encoding.
- the vectorized encoding is the query information in the query statement.
- the query information in the embodiment of this application includes query type and table information. And the type of operator involved in the query and the corresponding cost information. Among them, the query type may include, for example, insert, select, update, delete, and so on.
- the table information may include the identification information of the table, and may also include the data volume and data structure of the table.
- the types of operators involved in the query can include sequential scan (Seq_Scan), hash join (Hash_Join), and aggregation operation (Aggregate).
- the cost information of an operator refers to the execution cost of the operator when the query statement is executed. The execution cost of each operator can be calculated in advance by the optimizer according to the query plan of the query statement, that is, the execution cost of each operator can be obtained in advance.
- Figure 13 shows the process of vectorization for a query statement. In this process, the execution cost from the query plan to the pre-estimator, and then the process of vectorization for each part of the query plan.
- a template vector can include three parts: query type (DML), table (Tables), and operator cost (Operation Costs).
- DML query type
- Tables table
- Operator Costs operator cost
- the template vector can include insert, select, update, and delete. If the query sentence is a select operation, fill 1 in the select bit, and fill 0 in the other insert, update and delete bits.
- the query plan of the query statement includes the table information of the three tables tbl1, tbl2, and tbl3, and the bits of tbl1, tbl2, and tbl3 are filled with 1, and the bits from tbl4 to tbl8 are filled with 0.
- each operator includes the time point of the corresponding operation of the cost estimation.
- the time points of the corresponding operation of the cost estimation include the total time (Total), the start time (Startup) and the child node time (Child).
- the total time indicates the time point when the execution results are all returned;
- the start time indicates the time point when the results are returned;
- the child node time indicating the earliest time when the node operator can be executed.
- the target feature vector of the query sentence is obtained.
- the execution cost of these operators can be normalized to obtain the normalized target feature vector, and all features are constrained to the same order of magnitude. For example, a feature with a value of 10000 accounts for a larger proportion in the process of generating the final result than a feature with a value of -1, but after normalization, both are 0.5, indicating that the two have the same degree of influence .
- the representation of DML and Table in the target feature vector is represented by bit identification.
- bit identification it is not limited to this representation of bit identification, and can also be identified by floating-point bits.
- bit identification it is not limited to the above-mentioned representations of 0 and 1, for example: T can be used instead of 1, F can be used instead of 0, or other representations can be used instead of 1 and 0. .
- floating-point bit identification all the query types involved can be marked with different characters. For example, insert is represented by 1, selection is represented by 2, update is represented by 3, and deletion is represented by 4.
- insert is represented by 1
- selection is represented by 2
- update is represented by 3
- deletion is represented by 4.
- these four query types for illustration. It is not limited to only these four query types, nor is it limited to the numerical value listed here. Different query types can also be represented by other symbols or numerical values. .
- the first query information includes the first query information.
- the query type, the first table information, the type of the operator involved in the query, and the corresponding cost information where the first query type is used to indicate the operation type of the query statement corresponding to the first query information on the database
- the first table information is used to indicate the relational table involved when the query sentence corresponding to the first query information is executed by the database;
- the cost information is used to indicate that the involved operator is executing the first The respective execution cost of the query statement corresponding to the query information.
- the target feature vector corresponding to the first query information includes the identifier of the first query type and the identifier of the second query type, the identifier of the first table information and the identifier of the second table information, and the floating-point bits of each type of operator;
- the identifier of the first query type is represented by a first value
- the identifier of the second query type is represented by a second value
- the second query type belongs to the query type of the database and is not included in the first value.
- the identifier of the first table information is represented by a third value
- the identifier of the second table information is represented by a fourth value
- the second table information belongs to the table information of the database and is not included in
- the floating-point bits of each type of operator include the execution cost of the corresponding type of operator.
- the execution cost of the corresponding type operator is the normalized execution cost.
- the DS-DRL model includes a predictor model, an environment model, and an agent model.
- the environment model includes two parts: an external measurement (outer metric) and an internal state (inner state).
- the process of adjusting parameters through the DS-DRL model can include:
- Workload provides a query statement to the query statement vectorization module.
- the query sentence vectorization module executes the vectorization encoding process described above on the query sentence, and N target feature vectors are obtained.
- the query sentence vectorization module transmits the N target feature vectors to the prediction model.
- the prediction model predicts the state information change ⁇ S of the database according to the N target feature vectors, where ⁇ S is the state information of the database after the execution of the N query sentences relative to the state information of the database before the execution of the N query sentences The amount of change.
- the prediction model can be a neural network model.
- the prediction model transmits the ⁇ S to the environment model.
- the status information of the database can include the utilization of the central processing unit (CPU) of the database, the utilization of the network, the number of input/output (I/O) of the disk, the busyness of the disk and the number of memory page changes Wait.
- CPU central processing unit
- I/O input/output
- the environment model passes S'to the agent model.
- the proxy model takes S'as input and generates a recommended configuration parameter set (Action).
- the agent model transfers the recommended configuration parameter set to the environment model.
- the environment model performs simulation configuration according to the recommended configuration parameter set.
- the environment model also includes an internal state (inner state), and the internal state stores the current configuration parameters of the database, so that the configuration parameters in the internal state can be adjusted according to the recommended configuration parameter set (Action).
- the query sentence vectorization module transmits the N target feature vectors to the environment model.
- the environment model executes the N target feature vectors to update the S and S'.
- the environment model determines the performance value (Reward, R) of the N target feature vectors after being executed.
- R is the performance value after the execution of the N target feature vectors; w i is the weight of the i-th performance indicator among m performance indicators (such as delay, throughput, etc.), and r i is the performance of the i-th performance indicator value.
- ⁇ 0, t represents the amount of change in the current performance relative to the initial performance
- ⁇ (t-1), t represents the amount of change in the current performance relative to the previous performance
- m 0 represents the initial performance metric
- m t represents the current performance metric
- m t-1 represents the previous performance metric relative to the current one, where the performance metric can be such as delay, throughput, and execution time.
- the environment model passes the R value to the agent model.
- the agent model updates the parameter adjustment strategy according to the performance value R, and determines an updated recommended configuration parameter set according to the updated S'.
- the performance value R does not meet the preset condition, repeat the process of updating the S and the S'in the environment model in 305 and 306, and the proxy model determining the updated recommended configuration parameter set until the performance value R meets A preset condition, and the updated recommended configuration parameter set corresponding to the R value meeting the preset condition is the configuration parameter set.
- Satisfying the preset condition in this possible implementation manner may include that the value of R exceeds a preset threshold, or a predetermined resource restriction condition is reached, for example: iterated a predetermined number of times, or iterated for a preset duration.
- step 204 may include: according to the N target feature vectors and the current state information of the database, using a dual-state deep reinforcement learning DS-DRL model to determine the database corresponding to the N query sentences The set of configuration parameters.
- the two-state deep reinforcement learning DS-DRL model is used according to the N target feature vectors and the current state information of the database to determine and
- the configuration parameter set of the database corresponding to the N query sentences may include:
- the state information change amount ⁇ S of the database is predicted according to the N target feature vectors, and the ⁇ S is the difference between the execution of the N query sentences and before the execution of the N query sentences.
- the amount of change in the status information of the database is predicted according to the N target feature vectors, and the ⁇ S is the difference between the execution of the N query sentences and before the execution of the N query sentences.
- the updated recommended configuration parameter set corresponding to the R value satisfying the preset condition is the configuration parameter set.
- the proxy model includes a role actor model and a score critic model.
- the role actor model and the scoring critic model can be neural network models.
- Steps 401 to 403 in the process of introducing parameters based on the DS-DDPG model are the same as steps 301 to 303 in the foregoing embodiment, and will not be repeated here.
- the environment model passes S'to the actor model.
- the role model takes S'as input and generates a recommended configuration parameter set (Action). That is to say, determining a recommended configuration parameter set according to the simulation state information S'through the proxy model may include: determining a recommended configuration parameter set according to the simulation state information S'through the actor model ; Determine the score of the recommended configuration parameter set according to the simulation state information S'and the recommended configuration parameter set through the critic model.
- Steps 405 and 406 are the same as steps 305 and 306 in the foregoing embodiment, and will not be repeated here.
- the environment model passes the R value to the critic model.
- the actor model passes the configuration parameter set to the critic model.
- the critic model determines the score of the recommended configuration parameter set according to the simulation state information S'and the recommended configuration parameter set.
- the critic model passes the score to the role model.
- the updating of the parameter adjustment strategy according to the performance value R in the above 307 may include:
- the actor model updates the weight of the actor model according to the score. That is, through the actor model, the weight of the actor model is updated according to the score.
- the critic model updates the weight of the critic model according to the performance value R. That is to say, through the critic model, the weight of the critic model is updated according to the performance value R.
- the solution for adjusting the configuration parameters of the database has a shorter delay than the default configuration or the recommended configuration of the DBA.
- Table 1 is the benchmark of the Transaction Processing Performance Council Benchmark TM H (TPC- H) Use case as an example to test the delay test results of TPCH10x use case and TPCH30x use case under different configurations.
- Table 1 Total delay of job execution under different configurations (seconds: s)
- the time delay of the tuning recommended configuration solution provided by the embodiment of the present application is far less than the default configuration and the recommended configuration of DBA.
- an embodiment of an apparatus 50 for determining configuration parameters of a database provided by an embodiment of the present application includes:
- the obtaining unit 501 is configured to obtain N query sentences sent by M clients, where M is an integer greater than 0, and the N is an integer greater than 0, and N ⁇ M;
- the first determining unit 502 is configured to determine N query information from the N query statements obtained by the obtaining unit 501, and the N query statements correspond to the N query information one-to-one;
- the vector encoding unit 503 is configured to vectorize and encode the N query information determined by the first determining unit 502 to obtain corresponding N target feature vectors, where the N query information and the N target feature vectors One-to-one correspondence
- the second determining unit 504 is configured to determine the configuration parameter set of the database corresponding to the N query sentences according to the N target feature vectors encoded by the vector encoding unit 503 and the current state information of the database,
- the configuration parameter set includes at least one target configuration parameter group, the target configuration parameter group includes at least one configuration parameter, and the configuration parameter set is used to execute the N query statements.
- the configuration parameters of the database can be determined by the target feature vector of the query sentence and the current state information of the database, which can ensure the good performance of the database under different load requirements. For example: It can meet the performance of delay and throughput.
- the device 50 further includes:
- the third determining unit 505 is configured to determine, in response to the mode selection instruction, the N queries acquired by the acquiring unit 501 from the query-level tuning mode, the load-level tuning mode, and the cluster-level tuning mode The target tuning mode of the statement.
- the vector encoding unit 503 is further configured to merge the N target feature vectors when the target parameter adjustment mode is the load level parameter adjustment mode to obtain the fusion The unified vector after;
- the second determining unit 504 is configured to determine a configuration parameter set of the database corresponding to the N query sentences according to the unified vector and the current state information of the database, and the configuration parameter set includes a target A configuration parameter group, where the one target configuration parameter group is used to adjust the configuration parameters currently used by the database when the N query statements are executed.
- the second determining unit 504 is configured to, when the target parameter adjustment mode is the cluster-level parameter adjustment mode, according to the N target feature vectors and the database Current status information, determine N configuration parameter groups, where N configuration parameter groups correspond to the N target feature vectors one-to-one; cluster the N configuration parameter groups to obtain Q target configuration parameter groups , Wherein the first configuration parameter group corresponds to at least one target feature vector, the first configuration parameter group is any one of the Q target configuration parameter groups, the Q is an integer greater than 0, and Q ⁇ N, The first configuration parameter group is used to adjust the configuration parameters currently used by the database when the query sentence corresponding to the at least one target feature vector is executed.
- the first query information in the N query information includes the first query type, the first table information, the type of the operator involved in the query and the corresponding cost information, and the first query The information is any one of the N query information;
- the first query type is used to indicate the operation type of the query sentence request corresponding to the first query information on the database
- the first table information is used to indicate the relational table involved when the query sentence corresponding to the first query information is executed by the database;
- the cost information is used to indicate the respective execution costs of the operators involved when executing the query sentence corresponding to the first query information.
- the target feature vector corresponding to the first query information includes the identifier of the first query type and the identifier of the second query type, the identifier of the first table information and the identifier of the second table information, and each Floating point of the type operator;
- the identifier of the first query type is represented by a first value
- the identifier of the second query type is represented by a second value
- the second query type belongs to the query type of the database and is not included in the first value.
- the identifier of the first table information is represented by a third value
- the identifier of the second table information is represented by a fourth value
- the second table information belongs to the table information of the database and is not included in the first query Information
- the floating-point bits of each type of operator include the execution cost of the corresponding type of operator.
- the execution cost of the corresponding type of operator is a normalized execution cost.
- the second determining unit 504 is configured to use a dual-state deep reinforcement learning DS-DRL model according to the N target feature vectors and the current state information of the database to determine The configuration parameter set of the database corresponding to the N query statements.
- the second determining unit 504 is configured to: when the DS-DRL model includes a prediction model, an environment model, and a proxy model;
- the state information change amount ⁇ S of the database is predicted according to the N target feature vectors, and the ⁇ S is the difference between the execution of the N query sentences and before the execution of the N query sentences.
- the amount of change in the status information of the database is predicted according to the N target feature vectors, and the ⁇ S is the difference between the execution of the N query sentences and before the execution of the N query sentences.
- the updated recommended configuration parameter set corresponding to the R value satisfying the preset condition is the configuration parameter set.
- the second determining unit 504 is configured to: when the DS-DRL model is a dual-state depth deterministic policy gradient algorithm DS-DDPG model, the proxy model includes a role actor model and a score critic Model time
- the score of the recommended configuration parameter set is determined according to the simulation state information S'and the recommended configuration parameter set.
- the second determining unit 504 is configured to update the weight of the actor model according to the score through the actor model.
- the second determining unit 504 is configured to update the weight of the critic model according to the performance value R through the critic model.
- the corresponding content of the device 50 for determining the configuration parameters of the database can be understood by participating in the corresponding content of the embodiments corresponding to FIG. 8 to FIG. 15, and will not be repeated here.
- the device for determining the configuration parameters of the database may be a database server, and the acquiring unit 501, the first determining unit 502, the vector encoding unit 503, the second determining unit 504, and the parameter adjusting unit may all be processors.
- the acquiring unit 501 may also be a communication interface.
- FIG. 17 shows a schematic diagram of a possible logical structure of the database server 60 involved in the foregoing embodiment provided by the embodiment of this application.
- the database server 60 includes a processor 601, a communication interface 602, a memory 603, and a bus 604.
- the processor 601, the communication interface 602, and the memory 603 are connected to each other through a bus 604.
- the processor 601 is used to control and manage the actions of the database server 60.
- the processor 601 is used to execute steps 201 to 205 in FIG. 9 and/or the technology described herein. Other processes.
- the communication interface 602 is used to support the database server 60 to communicate.
- the memory 603 is used to store the program code and data of the database server 60.
- the processor 601 may be a central processing unit, a general-purpose processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic devices, transistor logic devices, hardware components, or any combination thereof. It can implement or execute various exemplary logical blocks, modules and circuits described in conjunction with the disclosure of this application.
- the processor may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, a combination of a digital signal processor and a microprocessor, and so on.
- the bus 604 may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus or the like. The bus can be divided into address bus, data bus, control bus, etc. For ease of representation, only one thick line is used in FIG. 17, but it does not mean that there is only one bus or one type of bus.
- PCI Peripheral Component Interconnect
- EISA Extended Industry Standard Architecture
- a computer-readable storage medium stores computer-executable instructions.
- the device executes the above-mentioned figure.
- the method for determining the configuration parameters of the database is described in the part of the embodiments from 8 to 15.
- a computer program product in another embodiment, includes computer-executable instructions stored in a computer-readable storage medium; at least one processor of the device can be accessed from a computer The reading storage medium reads the computer-executable instruction, and at least one processor executes the computer-executable instruction to make the device execute the method for determining the configuration parameters of the database described in the above-mentioned partial embodiments of FIGS. 8-15.
- the disclosed system, device, and method may be implemented in other ways.
- the device embodiments described above are only illustrative.
- the division of the units is only a logical function division, and there may be other divisions in actual implementation, for example, multiple units or components can be combined or It can be integrated into another system, or some features can be ignored or not implemented.
- the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
- the functional units in the various embodiments of the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
- the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer readable storage medium.
- the technical solutions of the embodiments of the present application can be embodied in the form of software products in essence or the parts that contribute to the prior art or the parts of the technical solutions, and the computer software products are stored in a storage medium.
- Including several instructions to make a computer device (which can be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the embodiments of the present application.
- the aforementioned storage media include: U disk, mobile hard disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk or optical disk and other media that can store program code .
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Operations Research (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Stored Programmes (AREA)
Abstract
一种确定数据库的配置参数的方法,可以应用于数据库管理系统。该方法通过对查询语句进行向量化编码,从而得到查询语句的目标特征向量,然后再根据目标特征向量和所述数据库(101)的当前状态信息这两方面的双状态信息确定用于确定数据库的配置参数集,该配置参数集用于执行所述N个查询语句。可以动态的根据查询语句和数据库(101)的当前状态信息确定数据库(101)的配置参数,可以保障数据库(101)在不同负载需求下的良好表现,例如:时延、吞吐量等方面的表现。
Description
本申请要求于2019年5月14日提交中国专利局、申请号为201910401715.0、发明名称为“一种确定数据库的配置参数的方法及装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
本申请涉及数据库技术领域,具体涉及一种确定数据库的配置参数的方法及装置。
数据库(database,DB)中存储有海量数据,可以为大量客户端提供数据服务。换句话说,客户端就是数据库的负载。数据库工作时可以基于不同的配置参数运行,对客户端提供不同表现的数据服务。也就是说,若数据库的配置参数与负载的业务相匹配,则数据库提供服务时响应的就会较快,若数据库的配置参数与负载的业务不相匹配,则数据库提供服务时响应的就会较慢。
为了更好的为客户端提供服务,数据库运行期间,数据库管理员(database administrator,DBA)会根据负载的情况确定数据库的配置参数。
目前,DBA确定数据库的配置参数需要在测试环境中对所要部署的业务进行测试,需要花费大量时间才能确定适合这次业务的数据库的配置参数,而一旦业务发生变化,则需要重新测试。针对网络中随时有可能变化的业务或者负载的数量,这种DBA调参的方式无法满足当前的需求。
发明内容
本申请实施例提供一种确定数据库的配置参数的方法,用于动态的根据负载的情况和数据库的当前状态信息确定数据库的配置参数,可以保障数据库在不同负载需求下的良好表现,例如:时延、吞吐量等方面的表现。
为达到上述目的,本申请的实施例采用如下技术方案:
第一方面,提供一种确定数据库的配置参数的方法,可以包括:获取M个客户端发送的N个查询语句,所述M为大于0的整数,所述N为大于0的整数,且N≥M;从所述N个查询语句中确定N个查询信息,所述N个查询语句与所述N个查询信息一一对应;对所述N个查询信息进行向量化编码,以得到对应的N个目标特征向量,所述N个查询信息与所述N个目标特征向量一一对应;根据所述N个目标特征向量和所述数据库的当前状态信息,确定与所述N个查询语句对应的所述数据库的配置参数集,所述配置参数集包括至少一个目标配置参数组,所述目标配置参数组包括至少一个配置参数,所述配置参数集用于执行所述N个查询语句。
上述第一方面提供的技术方案中,在确定配置参数集后可以基于该配置参数集调整数据库当前所使用的配置参数,配置参数的调整方式可以是用当前配置参数集中的配置参数替换数据库当前所使用的配置参数,也可以是通过在数据库当前所使用的配置参数的数值上增加或减少偏移值的方式调整数据库当前所使用的配置参数。该第一方面中可以动态的根据负载的情况和数据库的当前状态信息确定数据库的配置参数,可以保障数据库在不同 负载需求下的良好表现,例如:可以满足时延和吞吐量等方面的表现。
在第一方面的一种可能的实现方式中,该方法还可以包括:响应于模式选择指令,从查询级别的调参模式、负载级别的调参模式和聚类级别的调参模式中确定用于所述N个查询语句的目标调参模式。也就是说,该种可能的实现方式提供了查询级别的调参模式、负载级别的调参模式和聚类级别的调参模式这三种粒度的调参模式,模式选择指令可以是用户触发的,也可以是数据库管理员触发的,也就是说用户或者数据库管理员可以在不同需求场景下对调参模式进行切换,从而可以满足多种调参需求。
在第一方面的一种可能的实现方式中,当所述目标调参模式为所述查询级别的调参模式时,所述M=1,所述N=1,也就是说,针对查询级别的调参模式,可以针对一个查询语句确定一个配置参数集,该配置参数集中包括针对该查询语句的一个目标配置参数组,该一个目标配置参数组用于执行该一个查询语句时配置给数据库。
在第一方面的一种可能的实现方式中,当所述目标调参模式为所述负载级别的调参模式时,所述方法还包括:对所述N个目标特征向量进行融合,以得到融合后的统一向量;所述根据所述N个目标特征向量和所述数据库的当前状态信息,确定与所述N个查询语句对应的所述数据库的配置参数集,包括:根据所述统一向量和所述数据库的当前状态信息,确定与所述N个查询语句对应的所述数据库的配置参数集,所述配置参数集包括一个目标配置参数组,所述一个目标配置参数组用于在执行所述N个查询语句时调整所述数据库当前所使用的配置参数。由该第一方面的一种可能的实现方式可知,这种负载级别的调参模式可以为N个查询语句进行一次调参,可以满足大吞吐量的场景需求。
在第一方面的一种可能的实现方式中,当所述目标调参模式为所述聚类级别的调参模式时,所述根据所述N个目标特征向量和所述数据库的当前状态信息,确定与所述N个查询语句对应的所述数据库的配置参数集,包括:根据所述N个目标特征向量和所述数据库的当前状态信息,确定N个配置参数组,其中,N个配置参数组与所述N个目标特征向量一一对应;对所述N个配置参数组进行聚类,以得到Q个目标配置参数组,其中,第一配置参数组对应至少一个目标特征向量,所述第一配置参数组为所述Q个目标配置参数组中的任意一个,所述Q为大于0的整数,且Q<N,所述第一配置参数组用于在执行所述至少一个目标特征向量对应的查询语句时调整所述数据库当前所使用的配置参数。由该第一方面的一种可能的实现方式可知,这种聚类级别的调参模式可以针对同类型的查询语句确定一组目标配置参数组,既可以保证较大的吞吐量,又可以满足低时延的需求。
在第一方面的一种可能的实现方式中,所述N个查询信息中的第一查询信息包括第一查询类型、第一表信息和查询时所涉及算子的类型和对应的代价信息,所述第一查询信息为所述N个查询信息中的任意一个;所述第一查询类型用于指示所述第一查询信息所对应的查询语句请求对所述数据库的操作类型;所述第一表信息用于指示所述第一查询信息所对应的查询语句被所述数据库执行时所涉及的关系表;所述代价信息用于指示所涉及的算子在执行所述第一查询信息所对应的查询语句时各自的执行代价。
在第一方面的一种可能的实现方式中,所述第一查询信息对应的目标特征向量包括第一查询类型的标识和第二查询类型的标识、第一表信息的标识和第二表信息的标识,以及 各类型算子的浮点位;其中,所述第一查询类型的标识用第一值表示,所述第二查询类型的标识用第二值表示,所述第二查询类型属于所述数据库的查询类型且未包括在所述第一查询信息中;所述第一表信息的标识用第三值表示,所述第二表信息的标识用第四值表示,所述第二表信息属于所述数据库的表信息且未包括在所述第一查询信息中;所述各类型算子的浮点位上包括对应类型算子的执行代价。
该种可能的实现方式中,标识可以是比特位标识,也可以是浮点位标识,当是比特位标识时可以通过1表示第一查询类型,可以通过0表示第二查询类型,也可以通过T来表示第一查询类型,可以通过F表示第二查询类型,或者其他比特位的表示方式也都可以适用于本方案。当是浮点位标识时,可以对所涉及到的所有查询类型都用不同字符做标记,例如:插入(insert)用1表示、选择(select)用2表示、更新(update)用3表示,删除(delete)用4表示,当然,这里只是以这四种查询类型为例进行说明,不限定只有这四种查询类型,也不限于用该处列举的数值的形式表示,也可以通过其他符号或数值的形式来表示不同的查询类型。
在第一方面的一种可能的实现方式中,所述对应类型算子的执行代价为归一化后的执行代价。
在第一方面的一种可能的实现方式中,所述根据所述N个目标特征向量和所述数据库的当前状态信息,确定与所述N个查询语句对应的所述数据库的配置参数集,包括:根据所述N个目标特征向量和所述数据库的当前状态信息,使用双状态深度强化学习(double-state deep reinforcement learning,DS-DRL)模型,确定与所述N个查询语句对应的所述数据库的配置参数集。该DS-DRL模型可以对查询语句的特征向量和数据库的状态信息这两方面的数据进行处理,从而得出适合查询语句的数据库的配置参数。
在第一方面的一种可能的实现方式中,所述DS-DRL模型包括预测模型、环境模型和代理模型;所述根据所述N个目标特征向量和所述数据库的当前状态信息,使用双状态深度强化学习DS-DRL模型,确定与所述N个查询语句对应的所述数据库的配置参数集,包括:通过所述预测模型,根据所述N个目标特征向量预测所述数据库的状态信息变化量ΔS,所述ΔS是执行所述N个查询语句之后相对于执行所述N个查询语句之前的所述数据库的状态信息的变化量;通过所述环境模型,根据所述ΔS,以及执行所述N个查询语句之前的所述数据库的状态信息S确定模拟所述N个查询语句执行后的所述数据库的模拟状态信息S’;通过所述代理模型,根据所述模拟状态信息S’,确定推荐配置参数集;通过所述环境模型,根据所述推荐配置参数集进行模拟配置,并执行所述N个目标特征向量,以更新所述S和S’,并确定所述N个目标特征向量被执行后的表现值R;通过所述代理模型,根据所述表现值R更新调参策略,并根据更新后的所述S’确定更新的推荐配置参数集;若所述表现值R满足预设条件,则确定更新的推荐配置参数集为所述配置参数集;若所述表现值R不满足预设条件,则重复执行上述环境模型更新所述S以及所述S’,以及所述代理模型确定更新的推荐配置参数集的过程直至表现值R满足预设条件,所述满足预设条件的R值所对应的更新的推荐配置参数集为所述配置参数集。由该种可能的实现方式可知,该DS-DRL模型通过预测模型、环境模型和代理模型三个模型的配合实现上述确定配置参数集的过程, 可以确保配置参数集的准确度。
该种可能的实现方式中的满足预设条件可以包括经过R值超过预设阈值,或者达到预先规定的资源限制条件,例如:迭代了预定次数,或者迭代了预设时长等。
在第一方面的一种可能的实现方式中,当所述DS-DRL模型为双状态深度确定性策略梯度算法DS-DDPG模型时,所述代理模型包括角色actor模型和评分critic模型;所述通过所述代理模型,根据所述模拟状态信息S’,确定推荐配置参数集,包括:通过所述actor模型,根据所述模拟状态信息S’,确定推荐配置参数集;通过所述critic模型,根据所述模拟状态信息S’和所述推荐配置参数集,确定所述推荐配置参数集的评分。
在第一方面的一种可能的实现方式中,所述根据所述表现值R更新调参策略,可以包括:通过所述actor模型,根据所述评分更新所述actor模型的权重。
在第一方面的一种可能的实现方式中,所述根据所述表现值R更新调参策略,可以包括:通过所述critic模型,根据表现值R,更新所述critic模型的权重。
第二方面,提供了一种确定数据库的配置参数的装置,用于执行上述第一方面或第一方面的任意可能的实现方式中的方法。具体地,该装置包括用于执行上述第一方面或第一方面的任意可能的实现方式中的方法的模块或单元。
第三方面,提供了一种确定数据库的配置参数的装置。该装置可以包括至少一个处理器、存储器和通信接口。处理器与存储器和通信接口耦合。存储器用于存储指令,处理器用于执行该指令,通信接口用于在处理器的控制下与其他网元进行通信。该指令在被处理器执行时,使处理器执行第一方面或第一方面的任意可能的实现方式中的方法。
第四方面,提供了一种数据库系统。该数据库系统包括第二方面或第三方面的确定数据库的配置参数的装置和数据库。
第五方面,提供了一种计算机可读存储介质,该计算机可读存储介质存储有程序,该程序使得数据查询的装置执行上述第一方面,及其各种实现方式中的任一种确定数据库的配置参数的方法。
第六方面,提供一种计算机程序产品,该计算机程序产品包括计算机执行指令,该计算机执行指令存储在计算机可读存储介质中;设备的至少一个处理器可以从计算机可读存储介质读取该计算机执行指令,至少一个处理器执行该计算机执行指令使得设备实施上述第一方面或者第一方面的任一种可能的实现方式所提供的确定数据库的配置参数的方法。
可以理解地,上述提供的任一种确定数据库的配置参数的装置、计算机存储介质或者计算机程序产品均用于执行上文所提供的对应的确定数据库的配置参数的方法,因此,其所能达到的有益效果可参考上文所提供的对应的方法中的有益效果,此处不再赘述。
图1是本申请实施例提供的数据库系统的一架构示意图;
图2是本申请实施例提供的数据库系统的另一架构示意图;
图3是本申请实施例提供的数据库系统的又一架构示意图;
图4是本申请实施例提供的数据库系统的又一架构示意图;
图5是本申请实施例提供的数据库服务器的一结构示意图;
图6是本申请实施例提供的数据库服务器的另一结构示意图;
图7是本申请实施例提供的神经网络模型的一示意图;
图8是本申请实施例提供的数据库系统的又一架构示意图;
图9是本申请实施例提供的确定数据库的配置参数的方法的一实施例示意图;
图10是本申请实施例提供的数据库系统的又一架构示意图;
图11是本申请实施例提供的确定数据库的配置参数的方法的另一实施例示意图;
图12是本申请实施例提供的神经网络模型的另一示意图;
图13是本申请实施例提供的查询语句向量化编码过程的一示意图;
图14是本申请实施例提供的DS-DRL的一架构示意图;
图15是本申请实施例提供的DS-DDPG的一架构示意图;
图16是本申请实施例提供的确定数据库的配置参数的装置的一实施例示意图;
图17是本申请实施例提供的数据库服务器的另一结构示意图。
下面结合附图,对本申请的实施例进行描述,显然,所描述的实施例仅仅是本申请一部分的实施例,而不是全部的实施例。本领域普通技术人员可知,随着技术的发展和新场景的出现,本申请实施例提供的技术方案对于类似的技术问题,同样适用。
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换,以便这里描述的实施例能够以除了在这里图示或描述的内容以外的顺序实施。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或单元的过程、方法、系统、产品或设备不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。
本申请的实施例所应用的数据库系统的架构如图1所示,该数据库系统包括数据库101和数据库管理系统(Database Management System,DBMS)102。
其中,数据库101是指长期存储在数据存储器(Data Store)中的有组织的数据集合,即按照一定的数据模型组织、存储和使用的相关联的数据集合,比如,数据库101可以包括一个或者多个表数据。
DBMS 102用于建立、使用和维护数据库101,以及对数据库101进行统一的管理和控制,以保证数据库101的安全性和完整性。用户可以通过DBMS 102访问数据库101中的数据,数据库管理员(database administrator,DBA)也通过DBMS 102进行数据库的维护工作。DBMS102提供多种功能,可使多个应用程序和用户设备使用不同的方法,在同一时刻或不同时刻去建立,修改和询问数据库,应用程序和用户设备可以统称为客户端。DBMS 102所提供的功能可以包括以下几项:(1)数据定义功能,DBMS 102提供数据定义语言(Data Definition Language,DDL)来定义数据库结构,DDL用于刻画数据库框架,并可以被保存在数据字典中;(2)数据存取功能,DBMS 102提供数据操纵语言(Data Manipulation Language,DML),实现对数据库数据的基本存取操作,比如检索、插入、修改和删除;(3)数据库运行管理 功能,DBMS 102提供数据控制功能,即是数据的安全性、完整性和并发控制等对数据库运行进行有效地控制和管理,以确保数据正确有效;(4)数据库的建立和维护功能,包括数据库初始数据的装入,数据库的转储、恢复、重组织,系统性能监视、分析等功能;(5)数据库的传输,DBMS 102提供处理数据的传输,实现客户端与DBMS 102之间的通信,通常与操作系统协调完成。
具体地,图2为单机数据库系统示意图,包括一个数据库管理系统和数据存储器(Data Store),该数据库管理系统用于提供数据库的查询和修改等服务,该数据库管理系统将数据存储到数据存储器中。在单机数据库系统中,数据库管理系统和数据存储器通常位于单一服务器上,比如一台对称多处理器(Symmetric Multi-Processor,SMP)服务器。该SMP服务器包括多个处理器,所有的处理器共享资源,如总线,内存和I/O系统等。数据库管理系统的功能可由一个或多个处理器执行内存中的程序来实现。
图3为采用共享磁盘(Shared-storage)架构的集群数据库系统示意图,包括多个节点(如图3中的节点1-N),每个节点部署有数据库管理系统,分别为用户提供数据库的查询和修改等服务,多个数据库管理系统存储有共享的数据在共享数据存储器中,并且通过交换机对数据存储器中的数据执行读写操作。共享数据存储器可以为共享磁盘阵列。集群数据库系统中的节点可以为物理机,比如数据库服务器,也可以为运行在抽象硬件资源上的虚拟机。若节点为物理机,则交换机为存储区网络(Storage Area Network,SAN)交换机、以太网交换机,光纤交换机或其它物理交换设备。若节点为虚拟机,则交换机为虚拟交换机。
图4为采用无共享(Shared-nothing)架构的集群数据库系统示意图,每个节点具有各自独享的硬件资源(如数据存储器)、操作系统和数据库,节点之间通过网络来通信。该体系下,数据将根据数据库模型和应用特点被分配到各个节点上,查询任务将被分割成若干部分,在所有节点上并行执行,彼此协同计算,作为整体提供数据库服务,所有通信功能都在一个高宽带网络互联体系上实现。如同图3所描述的共享磁盘架构的集群数据库系统一样,这里的节点既可以是物理机,也可以是虚拟机。
在本申请所有实施例中,数据库系统的数据存储器(Data Store)包括但不限于固态硬盘(SSD)、磁盘阵列或其他类型的非瞬态计算机可读介质。图2至图4中虽未示出数据库,应理解,数据库存储在数据存储器中。所属领域的技术人员可以理解一个数据库系统可能包括比图2至图4中所示的部件更少或更多的组件,或者包括与图2至图4中所示组件不同的组件,图2至图4仅仅示出了与本申请实施例所公开的实现方式更加相关的组件。例如,虽然图3和图4中已经描述了4个节点,但所属领域的技术人员可理解成一个集群数据库系统可包含任何数量的节点。各节点的数据库管理系统功能可分别由运行在各节点上的软件、硬件和/或固件的适当组合来实现。
本领域技术人员根据本申请实施例的教导可以很清楚地理解,本申请实施例的方法应用于数据库管理系统,该数据库管理系统可应用于单机数据库系统、Shared-nothing架构的集群数据库系统、Shared-storage架构的集群数据库系统,或其它类型的数据库系统。
进一步地,参见图1,DBMS 102在执行数据库101查询时,通常需要对查询语句进行语 法分析、预编译和优化等步骤,估算出数据库系统认为代价最小的执行方式,继而生成代价最小的执行计划,运行时执行结构体将按照生成的执行计划进行数据操作,以提高数据库系统的性能。DBMS 102在对查询语句进行代价估算时,需要收集查询语句的统计信息,并根据收集的统计信息进行代价估算。其中,收集统计信息的方法可以是通过机器学习进行模型训练得到的模型信息,或者是通过数据采样统计得到的统计信息,模型信息也可以称为统计信息。
其中,DBMS 102可以位于数据库服务器中,比如,该数据库服务器具体可以为图2所述的单机数据库系统中的SMP服务器,或者图3或图4中所述的节点。具体的,如图5所示,数据库服务器可以包括内核1021、以及独立于内核1021的且位于数据库服务器内部的外部训练器1022;或者,如图6所示,数据库服务器包括内核1021,外部训练器1022位于数据库服务器之外。其中,内核1021是数据库服务器的核心,可以用于执行DBMS 102所提供的多种功能。内核1021可以包括实用程序10211和优化器10212。在数据库服务器在执行数据库101查询时,实用程序10211可以触发外部训练器1022通过机器学习进行模型训练,从而得到训练模型的模型信息。优化器10212可以根据外部训练器1022训练得到的模型信息进行代价估算,从而生成代价最小的执行计划,使得执行结构体按照生成的执行计划进行数据操作,以提高数据库系统的性能。
机器学习是指依赖于对现存数据的学习或者观察获取新的推理模型的过程。机器学习可以通过多种不同的算法进行实现,常见的机器学习的算法可以包括:神经网络(Neural Network,NN)和随机森林(Random Forest,RF)等模型。比如,神经网络可以包括前向反馈神经网络(Feed Forward Neural Network,FFNN)和循环神经网络(Recurrent Neural Network,RNN)。如图7所示,为一种神经网络的模型示意图,该模型可以包括输入层、隐层和输出层,每一层可以包括不同数量的神经元。
因为请求数据库提供服务的客户端数量不是固定的,为了使数据库在为客户端提供服务时能有更好的表现(如:时延、吞吐量方面的表现),本申请实施例提供了一种确定数据库的配置参数的方法,该方法提供了一种基于双状态的配置参数的调整方案,其中的双状态可以是,查询语句的特征向量(query vector)和数据库的状态信息(database states)。其中,查询语句的特征向量会影响数据库各项活动的统计信息,也可以说是查询语句或者负载会影响数据库各项活动的统计信息,数据库的状态信息可以包括数据库的内部配置,该配置会受调参行为影响。本申请实施例提供的基于双状态的配置参数的调整方案可以是通过双状态深度强化学习(double-state deep reinforcement learning,DS-DRL)模型实现的,该DS-DRL模型可以对查询语句的特征向量和数据库的状态信息这两方面的数据进行处理,从而得出适合查询语句的数据库的配置参数。DS-DRL模型的具体表现形式可以有多种,其中一种例如可以为双状态深度确定性策略梯度算法(double-state deep deterministic policy gradient,DS-DDPG)模型,关于DS-DRL模型和DS-DDPG模型的具体结构和在确定数据库的配置参数的过程中的作用在后文进行描述,此处不做过多展开。
本申请实施例在上述基于双状态的配置参数的调整方案中包括三种粒度的调参模式,分别为:查询级别的调参模式(query-level tuning)、负载级别的调参模式 (workload-level tuning)和聚类级别的调参模式(cluster-level tuning)。关于这三种粒度的调参模式在确定数据库的配置参数的过程中的具体内容在后文进行描述,此处不做过多展开。
因本申请实施例所提供的确定数据库的配置参数的方法应用于DBMS,由DBMS为DB提供与负载相匹配的配置参数集,进而根据配置参数集调整DB的配置参数。因为DBMS和DB都属于数据库系统,下面结合图8介绍本申请实施例的确定数据库的配置参数的过程。
图8为本申请实施例提供的数据库系统的又一架构示意图。
如图8所示,该数据库系统包括DBMS和DB,其中,DB的功能与图1所对应实施例描述的功能基本相同,此处不再做详细赘述。另外,需要说明的是该DB可以基于DBMS所提供的配置参数集调整自身所使用的配置参数。
其中,DBMS可以包括控制器(Controller)、查询语句向量化模块(Query2Vector)、负载向量化模块(Workload2Vector)、负载聚类模块(Workload2Cluster)和调参系统(tuner),该DBMS中还可以包括训练参数集存储模块,若包括训练参数集存储模块则该模块中存储有训练数据(Training Data)。需要说明的是,(Query2Vector)中的“2”表示的是英文“to”的意思,表示的是从查询语句到向量,其他模块中的“2”也都是相同的含义,此处不再一一列举。
训练数据主要用于训练DS-DRL模型,若DS-DRL模型是离线训练的,则该数据库系统也可以不包括该训练参数集存储模块。一种可能的实现方式中,即使DS-DRL模型是离线训练的也可以包括该训练参数集存储模块,在调参过程中,调参系统所产生的新的数据也可以更新到训练参数集存储模块的训练数据中,从而实现在线DS-DRL模型的优化。
图8中所示的客户端并不限制数量,可以表示有一个客户端,也可以表示有两个或多个客户端。
客户端在使用数据库时,会发送查询语句(Query),可以是通过查询请求的方式发送查询语句。
DBMS可以执行如图9所示的确定数据库的配置参数的方法的过程。
如图9所示,本申请实施例提供的确定数据库的配置参数的方法的一实施例可以包括:
201、获取M个客户端发送的N个查询语句。
所述M为大于0的整数,所述N为大于0的整数,且N≥M。
本申请实施例中,可以只有一个客户端,也可以有两个或多个客户端,一个客户端可以发送一个查询语句,也可以发送两个或多个查询语句。
202、从所述N个查询语句中确定N个查询信息。
所述N个查询语句与所述N个查询信息一一对应。
203、对所述N个查询信息进行向量化编码,以得到对应的N个目标特征向量。
所述N个查询信息与所述N个目标特征向量一一对应。
204、根据所述N个目标特征向量和所述数据库的当前状态信息,确定与所述N个查询语句对应的所述数据库的配置参数集。
所述配置参数集包括至少一个目标配置参数组,所述目标配置参数组包括至少一个配 置参数。其中,所述配置参数集用于执行所述N个查询语句。
一种可能的实现方式中,在步骤204之后还可以执行步骤205。
205、在执行所述N个查询语句时,根据所述配置参数集调整所述数据库当前所使用的配置参数。
配置参数的调整方式可以是用当前配置参数集中的配置参数替换数据库当前所使用的配置参数,也可以是通过在数据库当前所使用的配置参数的数值上增加或减少偏移值的方式调整数据库当前所使用的配置参数。
一种可能的实现方式中,上述步骤201至205也可以是将DBMS中除去控制器的剩余部分作为一个确定数据库的配置参数的装置来执行的,这种情况下控制器为客户端提供访问数据库的接口,转发查询语句。然后由确定数据库的配置参数的装置来执行上述步骤201至205。
本申请实施例可以通过查询语句的目标特征向量和数据库的当前状态信息确定数据库的配置参数,可以保障数据库在不同负载需求下的良好表现。例如:可以满足时延和吞吐量等方面的表现。
前述已经介绍了本申请实施例提供了三种粒度的调参模式,这三种调参模式在向量化编码查询语句时的过程分别为:
1、对于查询级别的调参模式(query-level tuning):由查询语句向量化模块(Query2Vector)对单个查询语句进行向量化编码,以得到该查询语句的目标特征向量,然后将该查询语句的目标特征向量传递给调参系统(tuner),调参系统根据该查询语句的特征向量确定针对该查询语句的数据库的配置参数集。也就是说,当执行查询级别的调参模式时,会通过上述过程得到一个配置参数集,该配置参数集包括与该查询语句对应的一个目标配置参数组,所述一个目标配置参数组用于在执行所述该单个查询语句时调整所述数据库当前所使用的配置参数。查询级别的调参模式针对一个查询语句进行一次调参,这种调参时延很小,可以满足对不同查询语句的个性化调参需求。
2、对于负载级别的调参模式(workload-level tuning):由查询语句向量化模块(Query2Vector)对N个查询语句中的每个查询语句进行向量化编码,以得到N个目标特征向量,N个查询语句为要数据库提供服务的所有查询语句。然后,查询语句向量化模块将N个目标特征向量传递给负载向量化模块(Workload2Vector),由该负载向量化模块将N个目标特征向量融合为一个统一向量,然后由负载向量化模块将该统一向量传递给调参系统,调参系统根据该统一向量确定针对N个查询语句的数据库的配置参数集。
其中,负载向量化模块将N个目标特征向量融合为一个统一向量的过程可以是对N个目标特征向量各自对应的部分做合并,例如:每个目标特征向量都包括查询类型、表信息和代价信息三种子向量,则在做向量融合时可以是:对于查询类型的子向量和表信息的子向量,可以按照“并”的逻辑进行合并;对于代价信息的子向量,可以通过加和的方式进行合并。当然,也不仅限定于这里列举的这种融合方式,也可以是有的子向量按照“并”的逻辑进行合并,有的子向量用向量积的方式进行合并。
这种负载级别的调参模式可以为N个查询语句进行一次调参,可以满足大吞吐量的场景需求。
3、对于聚类级别的调参模式(cluster-level tuning):由查询语句向量化模块(Query2Vector)对N个查询语句中的每个查询语句进行向量化编码,以得到N个目标特征向量,N个查询语句为要数据库提供服务的所有查询语句。然后由查询语句向量化模块将N个目标特征向量传递给调参系统,调参系统为N个目标特征向量确定N个配置参数组,然后调参系统再将这N个配置参数组传递给负载聚类模块(Workload2Cluster),负载聚类模块对N个配置参数组进行聚类,以得到Q个目标配置参数组。负载聚类模块再将这Q个目标配置参数组传递给调参系统,调参系统根据该Q个目标配置参数组确定针对N个查询语句的数据库的配置参数集。其中,负载聚类模块对N个配置参数组进行聚类,以得到Q个目标配置参数组的过程可以是按照各组的配置参数的相似度进行聚类,如针对两组中同类型的配置参数的可以通过最短欧氏距离的方式进行聚类。
上述聚类级别的调参模式中,是由调参系统为N个目标特征向量确定N个配置参数组,然后由负载聚类模块将N个配置参数组聚类为Q个目标配置参数组。实际上也不限于这一种由N个目标特征向量到Q个目标配置参数组的方式。还可以通过负载聚类模块根据N个目标特征向量确定N个配置参数组,然后再聚类为Q个目标配置参数组。这种实现方式的情况下,负载聚类模块可以包括向量到参数模块(Vector2Pattern)和参数聚类模块(Pattern2Cluster)两部分,这种情况下与聚类相关的内容可以参阅图10进行理解,与聚类不相关的负载向量化模块(Workload2Vector)在图10中没有示出,但不表示该负载向量化模块不存在。
在图10所示的聚类过程中,查询语句向量化模块将N个目标特征向量传递给向量到参数模块,向量到参数模块为N个目标特征向量确定N个配置参数组。向量到参数模块再将N个配置参数组传递给参数聚类模块,由参数聚类模块将N个配置参数组聚类为Q个目标配置参数组。其中,向量到参数模块可以包括一个深度学习模型(deep learning model,DL Model),该DL Model可以确定离散值作为配置参数,例如可以根据估计值与默认值的关系在{-1,0,1}中确定配置参数,其中:当估计值在默认值附近时取0;当估计值远大于默认值时取1;当估计值远小于默认值时取-1。这种通过离散的数值来表示配置参数的方案,可以节省大量的计算开销。
关于执行哪种调参模式可以是由用户或管理员基于实际需求进行选择的,这种情况下,本申请实施例提供的确定数据库的配置参数的方法还可以包括:
响应于模式选择指令,从查询级别的调参模式、负载级别的调参模式和聚类级别的调参模式中确定用于所述N个查询语句的目标调参模式。
也就是说,当所述目标调参模式为所述查询级别的调参模式时,所述M=1,所述N=1,也就是说,针对查询级别的调参模式,可以针对一个查询语句确定一个配置参数集,该配置参数集中包括针对该查询语句的一个目标配置参数组,该一个目标配置参数组用于执行该一个查询语句时配置给数据库。
当所述目标调参模式为所述负载级别的调参模式时,上述确定数据库的配置参数的方法还可以包括:对所述N个目标特征向量进行融合,以得到融合后的统一向量;其中,步骤204可以包括根据所述统一向量和所述数据库的当前状态信息,确定与所述N个查询语句对 应的所述数据库的配置参数集,所述配置参数集包括一个目标配置参数组,所述一个目标配置参数组用于在执行所述N个查询语句时调整所述数据库当前所使用的配置参数。这种负载级别的调参模式可以为N个查询语句进行一次调参,可以满足大吞吐量的场景需求。
当所述目标调参模式为所述聚类级别的调参模式时,所述根据所述N个目标特征向量和所述数据库的当前状态信息,确定与所述N个查询语句对应的所述数据库的配置参数集,可以包括:
根据所述N个目标特征向量和所述数据库的当前状态信息,确定N个配置参数组,其中,N个配置参数组与所述N个目标特征向量一一对应;
对所述N个配置参数组进行聚类,以得到Q个目标配置参数组,其中,第一配置参数组对应至少一个目标特征向量,所述第一配置参数组为所述Q个目标配置参数组中的任意一个,所述Q为大于0的整数,且Q<N,所述第一配置参数组用于在执行所述至少一个目标特征向量对应的查询语句时调整所述数据库当前所使用的配置参数。
这种聚类级别的调参模式可以针对同类型的查询语句确定一组目标配置参数组,既可以保证较大的吞吐量,又可以满足低时延的需求。
上述图8和图10是从DBMS模块化的角度描述的确定数据库的配置参数的过程,图11从处理流程的角度描述了上述实施例对N个目标特征向量进行后续处理从而得到配置参数集,以确定数据库的配置参数的过程。
如图11所示,本申请实施例提供的三种粒度的调参模式的处理过程,并且还从三种粒度的调参模式的处理过程中提取了调参系统的工作过程,以及聚类级别的调参模式中的向量到参数(Vector2Pattern)的处理过程,图11中的调参模型是以DS-DDPG为例的,实际上不限于DS-DDPG这一种调参模型。
图11中,查询语句级别(Query-level)的调参模式:任何一个查询语句Q
i被向量化编码后都可以得到一个特征向量,该特征向量输入到DS-DDPG模型后,就可以得到针对该查询语句Q
i的一个配置参数组[a
1,a
2,…a
m],在(Query-level)的调参模式下配置参数集A中包括该配置参数组[a
1,a
2,…a
m],即:A=[a
1,a
2,…a
m],在执行该查询语句Q
i时就可以使用该配置参数集A确定数据库的配置参数。
图11中,负载级别(workload-level)的调参模式:查询语句全集有N个,可以表示为Q={Q
1,Q
2,…Q
n},对这N个查询语句分别进行向量化编码,以得到N个目标特征向量,再将这N个目标特征向量融合为一个统一向量V
*,然后将V
*输入到DS-DDPG模型,就可以得到针对N个查询语句Q={Q
1,Q
2,…Q
n}的一个配置参数组[a
1,a
2,…a
m],在(workload-level)的调参模式下配置参数集A中包括该配置参数组[a
1,a
2,…a
m],即:A=[a
1,a
2,…a
m],在执行N个查询语句Q={Q
1,Q
2,…Q
n}时就可以使用该配置参数集A确定数据库的配置参数。
图11中,聚类级别(cluster-level)的调参模式:查询语句全集有N个,可以表示为Q={Q
1,Q
2,…Q
n},对这N个查询语句分别进行向量化编码,以得到N个目标特征向量,可以通过DS-DDPG模型确定出这N个目标特征向量对应的N个配置参数组[A
1,A
2,…A
n],也可以通过向量到参数(Vector2Pattern)的DL Model确定出这N个目标特征向量对应的N个配置参数组[P
1,P
2,…P
n],无论是N个配置参数组[A
1,A
2,…A
n],还是N个配置参数组[P
1,P
2,…P
n], 接下来都进行聚类,以得到Q个目标配置参数组{U
1,U
2,…U
q},然后将每个目标配置参数组U
j做向量化编码,以得到该U
j对应的向量化编码V
**,然后将该U
j对应的向量化编码V
**输入到DS-DDPG模型,就可以得到该U
j对应的配置参数集A,在执行与U
j对应的查询语句时,就可以使用该配置参数集A确定数据库的配置参数。
在上述三个粒度的调参模式中,DS-DDPG模型的工作原理都基本是相同的,DS-DDPG模型会根据查询语句的目标特征向量预测数据库的状态变化量ΔS=[Δs
1,Δs
2,…,Δs
m],然后结合当前的状态信息S=[s
1,s
2,…,s
m],按照S’=ΔS+S的方式确定S’=[S’
1,S’
2,…,S’
m],然后将S’作为表现值(表现值也可以称为观察值Observation)输出给角色(Actor)-评分(Critic)模块,该(Actor-Critic)模块也可以称为(Actor-Critic)模型,然后由(Actor-Critic)模型确定配置参数集A,关于DS-DDPG模型的具体调参过程后文再做详细介绍,此处不做过多展开。
在聚类级别(cluster-level)的调参模式中,DL Model的过程是以查询语句的目标特征向量作为输入,然后经过输入层(Input Layer)、隐藏层(Hidden Layer)和输出层(Output Layer)转换后得到目标特征配置参数组P=[p
1,p
2,…p
n]。
关于DL Model也可以参阅图12进行理解,其中,L1为输入层,L2、L3和L4为隐藏层,L5为输出层,目标特征向量v
1,v
2至v
k输入到L1层,然后经过L1、L2、L3、L4和L5之间的关系后,输出目标特征配置参数组P=[p
1,p
2,…p
n]。
由上述描述可知,无论哪种粒度的调参模式,都需要对查询语句进行向量化编码,向量化编码的是查询语句中的查询信息,本申请实施例中的查询信息包括查询类型、表信息以及查询时所涉及算子的类型和对应的代价信息。其中,查询类型例如可以包括:插入(insert)、选择(select)、更新(update)和删除(delete)等。表信息可以包括表的标识信息,也可以包括表的数据体量和数据结构。查询时所涉及算子的类型可以包括顺序扫描(Seq_Scan)、哈希连接(Hash_Join)和聚合操作(Aggregate)等类型。算子的代价信息指的是在执行该查询语句时该算子所花费的执行代价。每个算子的执行代价可以预先通过优化器根据该查询语句的查询计划计算出来,也就是说,每个算子的执行代价可以预先得到。
如图13示出了针对一个查询语句进行向量化的过程,该过程中从查询计划到预估算子的执行代价,然后该查询计划的各部分进行向量化的过程。
如图13中,一个模板向量可以包括查询类型(DML)、表(Tables)和算子代价(Operation Costs)三部分。
针对DML,模板向量可以包括插入(insert)、选择(select)、更新(update)和删除(delete)。若该查询语句是一个选择操作,则在select的比特位上填1,其他insert、update和delete的比特位都填0。
针对表部分,该查询语句的查询计划中包括tbl1、tbl2和tbl3三个表的表信息,则在tbl1、tbl2和tbl3的比特位上填1,其他tbl4至tbl8的比特位上都填0。
针对各类型的算子,可以对算子结构树上的同类型算子的执行代价做累加,然后在相应类型的算子的浮点位上填累加后的代价值。例如:图13中,每个算子都包括代价预估相 应操作的时间点,该代价预估相应操作的时间点分包括总时间(Total)、开始时间(Startup)和子节点时间(Child)。其中,总时间指示执行结果全部返回的时间点;开始时间指示开始返回结果的时间点;子节点时间(指示可以开始执行节点算子的最早时间。对于查询计划上的任意一个算子节点,可以用(总时间-子节点时间)表示它最大的执行代价)。例如:图13中有三个顺序扫描算子,而且每个顺序扫描算子的子节点时间都为0,所以这三个顺序扫描算子的执行代价加和为2.41+17.8+20.7=40.91。同理,Hash_Join算子的执行代价为23.19+48.16-2.43=68.92,Aggregate算子的执行代价为48.28-23.24=25.04。
上述在模板向量上,对查询类型、表信息和算子代价三部分都赋值后就得到该查询语句的目标特征向量。为了消除由于取值范围不同导致的特征重要性差异,可以这些算子的执行代价可以做归一化处理,得到归一化处理后的目标特征向量,将所有特征约束在相同的数量级上。比如,取值为10000的特征在生成最终结果的过程中比取值为-1的特征占更大的比重,但可能经过归一化后二者都是0.5,表示两者有相同的影响程度。
上述目标特征向量中的DML和Table的表示方式是用比特位标识的方式来表示的,实际上本申请实施例中不限于这种比特位标识的表示方式,还可以是通过浮点位标识的形式来表示。当通过比特位标识来表示时也不限于上述表示方式中的0、1这种表示方式,例如:也可以用T来代替1,用F来代替0,或者用其他的表示来代替1和0。当通过浮点位标识来表示时,可以对所涉及到的所有查询类型都用不同字符做标记,例如:插入用1表示、选择用2表示、更新用3表示,删除用4表示,当然,这里只是以这四种查询类型为例进行说明,不限定只有这四种查询类型,也不限于用该处列举的数值的形式表示,也可以通过其他符号或数值的形式来表示不同的查询类型。
上述是按照对向量模板进行赋值的过程来描述向量化的,实际上,若从N个查询信息中选择任意一个称为第一查询信息,上述过程也可以描述为:第一查询信息包括第一查询类型、第一表信息和查询时所涉及算子的类型和对应的代价信息,所述第一查询类型用于指示所述第一查询信息所对应的查询语句请求对所述数据库的操作类型;所述第一表信息用于指示所述第一查询信息所对应的查询语句被所述数据库执行时所涉及的关系表;所述代价信息用于指示所涉及的算子在执行所述第一查询信息所对应的查询语句时各自的执行代价。
所述第一查询信息对应的目标特征向量包括第一查询类型的标识和第二查询类型的标识、第一表信息的标识和第二表信息的标识,以及各类型算子的浮点位;其中,所述第一查询类型的标识用第一值表示,所述第二查询类型的标识用第二值表示,所述第二查询类型属于所述数据库的查询类型且未包括在所述第一查询信息中;所述第一表信息的标识用第三值表示,所述第二表信息的标识用第四值表示,所述第二表信息属于所述数据库的表信息且未包括在所述第一查询信息中;所述各类型算子的浮点位上包括对应类型算子的执行代价。所述对应类型算子的执行代价为归一化后的执行代价。
上述描述了对查询语句进行向量化编码的过程,下面再介绍调参系统(tuner)基于DS-DRL模型进行调参的过程,以及基于DS-DDPG模型进行调参的过程。
下面先结合图14介绍基于DS-DRL模型进行调参的过程。
DS-DRL模型包括预测模型(Predictor)、环境模型(Environment)和代理模型(Agent);其中,环境模型包括外部测量(outer metric)和内部状态(inner state)两部分。通过DS-DRL模型调参的过程可以包括:
301、负载(Workload)向查询语句向量化模块提供查询语句。
查询语句向量化模块对查询语句执行上文所描述的向量化编码过程,会得到N个目标特征向量。
302、查询语句向量化模块将N个目标特征向量传递给预测模型。
预测模型根据所述N个目标特征向量预测所述数据库的状态信息变化量ΔS,所述ΔS是执行所述N个查询语句之后相对于执行所述N个查询语句之前的所述数据库的状态信息的变化量。
预测模型可以是一个神经网络模型。
303、预测模型将ΔS传递给环境模型。
环境模型获知该ΔS,并根据外部状态(outer metric)中存储的执行所述N个查询语句之前的所述数据库的状态信息S,确定模拟所述N个查询语句执行后的所述数据库的模拟状态信息S’。可以是S’=ΔS+S。
数据库的状态信息可以包括数据库的中央处理器(central processing unit,CPU)的利用率,网络利用率,磁盘的输入/输出(input/output,I/O)数,磁盘繁忙程度和内存换页数等。
304、环境模型将S’传递给代理模型。
代理模型将S’作为输入,生成一个推荐配置参数集(Action)。
305、代理模型将该推荐配置参数集传递给环境模型。
环境模型根据所述推荐配置参数集进行模拟配置。
环境模型中还包括内部状态(inner state),内部状态存储有数据库当前的配置参数,这样就可以根据推荐配置参数集(Action)调整内部状态中的配置参数。
306、查询语句向量化模块将N个目标特征向量传递给环境模型。
环境模型执行所述N个目标特征向量,以更新所述S和S’。
环境模型确定所述N个目标特征向量被执行后的表现值(Reward,R)。
其中R的计算过程可以参阅如下公式进行理解:
其中,R为N个目标特征向量被执行后的表现值;w
i是m个表现指标(如延迟、吞吐量等)中第i个表现指标的权重,r
i为第i个表现指标的表现值。
其中,Δ
0,t表示当前表现相对于起始表现的变化量,Δ
(t-1),t表示当前表现相对于上一个表现的变化量。
其中,m
0表示起始的表现度量,m
t表示当前的表现度量,m
t-1表示相对于当前的上一个的表现度量,其中,表现度量可以是如延迟、吞吐量和执行时间等。
307、环境模型将R值传递给代理模型。
所述代理模型根据所述表现值R更新调参策略,并根据更新后的所述S’确定更新的推荐配置参数集。
若所述表现值R满足预设条件,则确定更新的推荐配置参数集为所述配置参数集;
若所述表现值R不满足预设条件,则重复执行305和306中环境模型更新所述S以及所述S’,以及所述代理模型确定更新的推荐配置参数集的过程直至表现值R满足预设条件,所述满足预设条件的R值所对应的更新的推荐配置参数集为所述配置参数集。
该种可能的实现方式中的满足预设条件可以包括经过R值超过预设阈值,或者达到预先规定的资源限制条件,例如:迭代了预定次数,或者迭代了预设时长等。
也就是说,步骤204可以包括:根据所述N个目标特征向量和所述数据库的当前状态信息,使用双状态深度强化学习DS-DRL模型,确定与所述N个查询语句对应的所述数据库的配置参数集。
当所述DS-DRL模型包括预测模型、环境模型和代理模型时,所述根据所述N个目标特征向量和所述数据库的当前状态信息,使用双状态深度强化学习DS-DRL模型,确定与所述N个查询语句对应的所述数据库的配置参数集,可以包括:
通过所述预测模型,根据所述N个目标特征向量预测所述数据库的状态信息变化量ΔS,所述ΔS是执行所述N个查询语句之后相对于执行所述N个查询语句之前的所述数据库的状态信息的变化量;
通过所述环境模型,根据所述ΔS,以及执行所述N个查询语句之前的所述数据库的状态信息S确定模拟所述N个查询语句执行后的所述数据库的模拟状态信息S’;
通过所述代理模型,根据所述模拟状态信息S’,确定推荐配置参数集;
通过所述环境模型,根据所述推荐配置参数集进行模拟配置,并执行所述N个目标特征向量,以更新所述S和S’,并确定所述N个目标特征向量被执行后的表现值R;
通过所述代理模型,根据所述表现值R更新调参策略,并根据更新后的所述S’确定更 新的推荐配置参数集;
若所述表现值R满足预设条件,则确定更新的推荐配置参数集为所述配置参数集;
若所述表现值R不满足预设条件,则重复执行上述环境模型更新所述S以及所述S’,以及所述代理模型确定更新的推荐配置参数集的过程直至表现值R满足预设条件,所述满足预设条件的R值所对应的更新的推荐配置参数集为所述配置参数集。
上述是通过DS-DRL模型介绍的调参的过程,下面结合图15基于DS-DDPG模型介绍调参的过程。
当所述DS-DRL模型为DS-DDPG模型时,所述代理模型包括角色actor模型和评分critic模型。角色actor模型和评分critic模可以是神经网络模型。
基于DS-DDPG模型介绍调参的过程中的步骤401至403与上述实施例中的步骤301至303相同,此处不再重复赘述。
404、环境模型将S’传递给actor模型。
角色模型将S’作为输入,生成一个推荐配置参数集(Action)。也就是说,所述通过所述代理模型,根据所述模拟状态信息S’,确定推荐配置参数集,可以包括:通过所述actor模型,根据所述模拟状态信息S’,确定推荐配置参数集;通过所述critic模型,根据所述模拟状态信息S’和所述推荐配置参数集,确定所述推荐配置参数集的评分。
步骤405和406与上述实施例中的步骤305和306相同,此处不再重复赘述。
407、环境模型将R值传递给critic模型。
408、actor模型将配置参数集传递给critic模型。
所述critic模型根据所述模拟状态信息S’和所述推荐配置参数集,确定所述推荐配置参数集的评分。
409、critic模型将评分传递给角色模型。
上述307中的所述根据所述表现值R更新调参策略,可以包括:
所述actor模型根据所述评分更新所述actor模型的权重。也就是说,通过所述actor模型,根据所述评分更新所述actor模型的权重。
所述critic模型根据表现值R,更新所述critic模型的权重。也就是说,通过所述critic模型,根据表现值R,更新所述critic模型的权重。
本申请实施例提供的数据库的配置参数调整的方案相对于默认配置或者DBA的推荐配置在时延更短,下面表1是以事务处理性能委员会的基准(transaction processing performance council Benchmark
TMH,TPC-H)用例为例,分别测试了TPCH10x用例和TPCH30x用例在不同配置下的时延测试结果。
表1:不同配置下的作业执行总时延(秒:s)
默认配置 | DBA推荐配置 | 调优推荐配置 | |
TPCH10x | 293.37 | 283.12 | 197.90 |
TPCH30x | 1083.67 | 872.45 | 539.86 |
由表1可以看出,无论是TPCH10x用例还是TPCH30x用例,本申请实施例提供的调优推荐配置方案的时延都远远小于默认配置和DBA的推荐配置。
以上多个实施例描述的数据库系统以及确定数据库的配置参数的方法,下面结合附图介绍本申请实施例提供的确定数据库的配置参数的装置50。
如图16所示,本申请实施例提供的确定数据库的配置参数的装置50的一实施例包括:
获取单元501,用于获取M个客户端发送的N个查询语句,所述M为大于0的整数,所述N为大于0的整数,且N≥M;
第一确定单元502,用于从所述获取单元501获取的N个查询语句中确定N个查询信息,所述N个查询语句与所述N个查询信息一一对应;
向量编码单元503,用于对所述第一确定单元502确定的N个查询信息进行向量化编码,以得到对应的N个目标特征向量,所述N个查询信息与所述N个目标特征向量一一对应;
第二确定单元504,用于根据所述向量编码单元503编码得到的N个目标特征向量和所述数据库的当前状态信息,确定与所述N个查询语句对应的所述数据库的配置参数集,所述配置参数集包括至少一个目标配置参数组,所述目标配置参数组包括至少一个配置参数,所述配置参数集用于执行所述N个查询语句。
本申请实施例可以通过查询语句的目标特征向量和数据库的当前状态信息确定数据库的配置参数,可以保障数据库在不同负载需求下的良好表现。例如:可以满足时延和吞吐量等方面的表现。
一种可能的实现方式中,该装置50还包括:
第三确定单元505,用于响应于模式选择指令,从查询级别的调参模式、负载级别的调参模式和聚类级别的调参模式中确定用于所述获取单元501获取的N个查询语句的目标调参模式。
一种可能的实现方式中,所述向量编码单元503,还用于当所述目标调参模式为所述负载级别的调参模式时,对所述N个目标特征向量进行融合,以得到融合后的统一向量;
所述第二确定单元504,用于根据所述统一向量和所述数据库的当前状态信息,确定与所述N个查询语句对应的所述数据库的配置参数集,所述配置参数集包括一个目标配置参数组,所述一个目标配置参数组用于在执行所述N个查询语句时调整所述数据库当前所使用的配置参数。
一种可能的实现方式中,所述第二确定单元504,用于当所述目标调参模式为所述聚类级别的调参模式时,根据所述N个目标特征向量和所述数据库的当前状态信息,确定N个配置参数组,其中,N个配置参数组与所述N个目标特征向量一一对应;对所述N个配置参数组进行聚类,以得到Q个目标配置参数组,其中,第一配置参数组对应至少一个目标特征向量,所述第一配置参数组为所述Q个目标配置参数组中的任意一个,所述Q为大于0的整数,且Q<N,所述第一配置参数组用于在执行所述至少一个目标特征向量对应的查询语句时调整所述数据库当前所使用的配置参数。
一种可能的实现方式中,所述N个查询信息中的第一查询信息包括第一查询类型、第一表信息和查询时所涉及算子的类型和对应的代价信息,所述第一查询信息为所述N个查询信息中的任意一个;
所述第一查询类型用于指示所述第一查询信息所对应的查询语句请求对所述数据库的 操作类型;
所述第一表信息用于指示所述第一查询信息所对应的查询语句被所述数据库执行时所涉及的关系表;
所述代价信息用于指示所涉及的算子在执行所述第一查询信息所对应的查询语句时各自的执行代价。
一种可能的实现方式中,所述第一查询信息对应的目标特征向量包括第一查询类型的标识和第二查询类型的标识、第一表信息的标识和第二表信息的标识,以及各类型算子的浮点位;
其中,所述第一查询类型的标识用第一值表示,所述第二查询类型的标识用第二值表示,所述第二查询类型属于所述数据库的查询类型且未包括在所述第一查询信息中;
所述第一表信息的标识用第三值表示,所述第二表信息的标识用第四值表示,所述第二表信息属于所述数据库的表信息且未包括在所述第一查询信息中;
所述各类型算子的浮点位上包括对应类型算子的执行代价。
一种可能的实现方式中,所述对应类型算子的执行代价为归一化后的执行代价。
一种可能的实现方式中,所述第二确定单元504,用于根据所述N个目标特征向量和所述数据库的当前状态信息,使用双状态深度强化学习DS-DRL模型,确定与所述N个查询语句对应的所述数据库的配置参数集。
一种可能的实现方式中,所述第二确定单元504用于:当所述DS-DRL模型包括预测模型、环境模型和代理模型时;
通过所述预测模型,根据所述N个目标特征向量预测所述数据库的状态信息变化量ΔS,所述ΔS是执行所述N个查询语句之后相对于执行所述N个查询语句之前的所述数据库的状态信息的变化量;
通过所述环境模型,根据所述ΔS,以及执行所述N个查询语句之前的所述数据库的状态信息S确定模拟所述N个查询语句执行后的所述数据库的模拟状态信息S’;
通过所述代理模型,根据所述模拟状态信息S’,确定推荐配置参数集;
通过所述环境模型,根据所述推荐配置参数集进行模拟配置,并执行所述N个目标特征向量,以更新所述S和S’,并确定所述N个目标特征向量被执行后的表现值R;
通过所述代理模型,根据所述表现值R更新调参策略,并根据更新后的所述S’确定更新的推荐配置参数集;
若所述表现值R满足预设条件,则确定更新的推荐配置参数集为所述配置参数集;
若所述表现值R不满足预设条件,则重复执行上述环境模型更新所述S以及所述S’,以及所述代理模型确定更新的推荐配置参数集的过程直至表现值R满足预设条件,所述满足预设条件的R值所对应的更新的推荐配置参数集为所述配置参数集。
一种可能的实现方式中,所述第二确定单元504用于:当所述DS-DRL模型为双状态深度确定性策略梯度算法DS-DDPG模型,所述代理模型包括角色actor模型和评分critic模型时;
通过所述actor模型,根据所述模拟状态信息S’,确定推荐配置参数集;
通过所述critic模型,根据所述模拟状态信息S’和所述推荐配置参数集,确定所述推荐 配置参数集的评分。
一种可能的实现方式中,所述第二确定单元504,用于通过所述actor模型,根据所述评分更新所述actor模型的权重。
一种可能的实现方式中,所述第二确定单元504,用于通过所述critic模型,根据表现值R,更新所述critic模型的权重。
关于确定数据库的配置参数的装置50的相应内容可以参与上述图8至图15所对应的实施例的相应内容进行理解,本处不做过多赘述。
在硬件实现上,上述确定数据库的配置参数的装置可以为数据库服务器,上述获取单元501、第一确定单元502、向量编码单元503、第二确定单元504和调参单元都可以为处理器,当然获取单元501也可以为通信接口。
图17所示,为本申请的实施例提供的上述实施例中所涉及的数据库服务器60的一种可能的逻辑结构示意图。数据库服务器60包括:处理器601、通信接口602、存储器603以及总线604。处理器601、通信接口602以及存储器603通过总线604相互连接。在本申请的实施例中,处理器601用于对数据库服务器60的动作进行控制管理,例如,处理器601用于执行图9中的步骤201至205和/或用于本文所描述的技术的其他过程。通信接口602用于支持数据库服务器60进行通信。存储器603,用于存储数据库服务器60的程序代码和数据。
其中,处理器601可以是中央处理器单元,通用处理器,数字信号处理器,专用集成电路,现场可编程门阵列或者其他可编程逻辑器件、晶体管逻辑器件、硬件部件或者其任意组合。其可以实现或执行结合本申请公开内容所描述的各种示例性的逻辑方框,模块和电路。所述处理器也可以是实现计算功能的组合,例如包含一个或多个微处理器组合,数字信号处理器和微处理器的组合等等。总线604可以是外设部件互连标准(Peripheral Component Interconnect,PCI)总线或扩展工业标准结构(Extended Industry Standard Architecture,EISA)总线等。所述总线可以分为地址总线、数据总线、控制总线等。为便于表示,图17中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。
在本申请的另一实施例中,还提供一种计算机可读存储介质,计算机可读存储介质中存储有计算机执行指令,当设备的至少一个处理器执行该计算机执行指令时,设备执行上述图8至图15部分实施例所描述的确定数据库的配置参数的方法。
在本申请的另一实施例中,还提供一种计算机程序产品,该计算机程序产品包括计算机执行指令,该计算机执行指令存储在计算机可读存储介质中;设备的至少一个处理器可以从计算机可读存储介质读取该计算机执行指令,至少一个处理器执行该计算机执行指令使得设备执行上述图8至图15部分实施例所描述的确定数据库的配置参数的方法。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请实施例的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装 置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请实施例所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请实施例各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请实施例的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请实施例各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请实施例的具体实施方式,但本申请实施例的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请实施例揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请实施例的保护范围之内。因此,本申请实施例的保护范围应以所述权利要求的保护范围为准。
Claims (27)
- 一种确定数据库的配置参数的方法,其特征在于,包括:获取M个客户端发送的N个查询语句,所述M为大于0的整数,所述N为大于0的整数,且N≥M;从所述N个查询语句中确定N个查询信息,所述N个查询语句与所述N个查询信息一一对应;对所述N个查询信息进行向量化编码,以得到对应的N个目标特征向量,所述N个查询信息与所述N个目标特征向量一一对应;根据所述N个目标特征向量和所述数据库的当前状态信息,确定与所述N个查询语句对应的所述数据库的配置参数集,所述配置参数集包括至少一个目标配置参数组,所述目标配置参数组包括至少一个配置参数,所述配置参数集用于执行所述N个查询语句。
- 根据权利要求1所述的方法,其特征在于,所述方法还包括:响应于模式选择指令,从查询级别的调参模式、负载级别的调参模式和聚类级别的调参模式中确定用于所述N个查询语句的目标调参模式。
- 根据权利要求2所述的方法,其特征在于,当所述目标调参模式为所述负载级别的调参模式时,所述方法还包括:对所述N个目标特征向量进行融合,以得到融合后的统一向量;所述根据所述N个目标特征向量和所述数据库的当前状态信息,确定与所述N个查询语句对应的所述数据库的配置参数集,包括:根据所述统一向量和所述数据库的当前状态信息,确定与所述N个查询语句对应的所述数据库的配置参数集,所述配置参数集包括一个目标配置参数组,所述一个目标配置参数组用于在执行所述N个查询语句时调整所述数据库当前所使用的配置参数。
- 根据权利要求2所述的方法,其特征在于,当所述目标调参模式为所述聚类级别的调参模式时,所述根据所述N个目标特征向量和所述数据库的当前状态信息,确定与所述N个查询语句对应的所述数据库的配置参数集,包括:根据所述N个目标特征向量和所述数据库的当前状态信息,确定N个配置参数组,其中,N个配置参数组与所述N个目标特征向量一一对应;对所述N个配置参数组进行聚类,以得到Q个目标配置参数组,其中,第一配置参数组对应至少一个目标特征向量,所述第一配置参数组为所述Q个目标配置参数组中的任意一个,所述Q为大于0的整数,且Q<N,所述第一配置参数组用于在执行所述至少一个目标特征向量对应的查询语句时调整所述数据库当前所使用的配置参数。
- 根据权利要求1-4任一项所述的方法,其特征在于,所述N个查询信息中的第一查询信息包括第一查询类型、第一表信息和查询时所涉及算子的类型和对应的代价信息,所述第一查询信息为所述N个查询信息中的任意一个;所述第一查询类型用于指示所述第一查询信息所对应的查询语句请求对所述数据库的操作类型;所述第一表信息用于指示所述第一查询信息所对应的查询语句被所述数据库执行时 所涉及的关系表;所述代价信息用于指示所涉及的算子在执行所述第一查询信息所对应的查询语句时各自的执行代价。
- 根据权利要求5所述的方法,其特征在于,所述第一查询信息对应的目标特征向量包括第一查询类型的标识和第二查询类型的标识、第一表信息的标识和第二表信息的标识,以及各类型算子的浮点位;其中,所述第一查询类型的标识用第一值表示,所述第二查询类型的标识用第二值表示,所述第二查询类型属于所述数据库的查询类型且未包括在所述第一查询信息中;所述第一表信息的标识用第三值表示,所述第二表信息的标识用第四值表示,所述第二表信息属于所述数据库的表信息且未包括在所述第一查询信息中;所述各类型算子的浮点位上包括对应类型算子的执行代价。
- 根据权利要求6所述的方法,其特征在于,所述对应类型算子的执行代价为归一化后的执行代价。
- 根据权利要求1-7任一项所述的方法,其特征在于,所述根据所述N个目标特征向量和所述数据库的当前状态信息,确定与所述N个查询语句对应的所述数据库的配置参数集,包括:根据所述N个目标特征向量和所述数据库的当前状态信息,使用双状态深度强化学习DS-DRL模型,确定与所述N个查询语句对应的所述数据库的配置参数集。
- 根据权利要求8所述的方法,其特征在于,所述DS-DRL模型包括预测模型、环境模型和代理模型;所述根据所述N个目标特征向量和所述数据库的当前状态信息,使用双状态深度强化学习DS-DRL模型,确定与所述N个查询语句对应的所述数据库的配置参数集,包括:通过所述预测模型,根据所述N个目标特征向量预测所述数据库的状态信息变化量ΔS,所述ΔS是执行所述N个查询语句之后相对于执行所述N个查询语句之前的所述数据库的状态信息的变化量;通过所述环境模型,根据所述ΔS,以及执行所述N个查询语句之前的所述数据库的状态信息S确定模拟所述N个查询语句执行后的所述数据库的模拟状态信息S’;通过所述代理模型,根据所述模拟状态信息S’,确定推荐配置参数集;通过所述环境模型,根据所述推荐配置参数集进行模拟配置,并执行所述N个目标特征向量,以更新所述S和S’,并确定所述N个目标特征向量被执行后的表现值R;通过所述代理模型,根据所述表现值R更新调参策略,并根据更新后的所述S’确定更新的推荐配置参数集;若所述表现值R满足预设条件,则确定更新的推荐配置参数集为所述配置参数集;若所述表现值R不满足预设条件,则重复执行上述环境模型更新所述S以及所述S’,以及所述代理模型确定更新的推荐配置参数集的过程直至表现值R满足预设条件,所述满足预设条件的R值所对应的更新的推荐配置参数集为所述配置参数集。
- 根据权利要求9所述的方法,其特征在于,当所述DS-DRL模型为双状态深度确定 性策略梯度算法DS-DDPG模型时,所述代理模型包括角色actor模型和评分critic模型;所述通过所述代理模型,根据所述模拟状态信息S’,确定推荐配置参数集,包括:通过所述actor模型,根据所述模拟状态信息S’,确定推荐配置参数集;通过所述critic模型,根据所述模拟状态信息S’和所述推荐配置参数集,确定所述推荐配置参数集的评分。
- 根据权利要求10所述的方法,其特征在于,所述根据所述表现值R更新调参策略,包括:通过所述actor模型,根据所述评分更新所述actor模型的权重。
- 根据权利要求10或11所述的方法,其特征在于,所述根据所述表现值R更新调参策略,包括:通过所述critic模型,根据表现值R,更新所述critic模型的权重。
- 一种确定数据库的配置参数的装置,其特征在于,包括:获取单元,用于获取M个客户端发送的N个查询语句,所述M为大于0的整数,所述N为大于0的整数,且N≥M;第一确定单元,用于从所述获取单元获取的N个查询语句中确定N个查询信息,所述N个查询语句与所述N个查询信息一一对应;向量编码单元,用于对所述第一确定单元确定的N个查询信息进行向量化编码,以得到对应的N个目标特征向量,所述N个查询信息与所述N个目标特征向量一一对应;第二确定单元,用于根据所述向量编码单元编码得到的N个目标特征向量和所述数据库的当前状态信息,确定与所述N个查询语句对应的所述数据库的配置参数集,所述配置参数集包括至少一个目标配置参数组,所述目标配置参数组包括至少一个配置参数,所述配置参数集用于执行所述N个查询语句。
- 根据权利要求13所述的装置,其特征在于,所述装置还包括:第三确定单元,用于响应于模式选择指令,从查询级别的调参模式、负载级别的调参模式和聚类级别的调参模式中确定用于所述获取单元获取的N个查询语句的目标调参模式。
- 根据权利要求14所述的装置,其特征在于,所述向量编码单元,还用于当所述目标调参模式为所述负载级别的调参模式时,对所述N个目标特征向量进行融合,以得到融合后的统一向量;所述第二确定单元,用于根据所述统一向量和所述数据库的当前状态信息,确定与所述N个查询语句对应的所述数据库的配置参数集,所述配置参数集包括一个目标配置参数组,所述一个目标配置参数组用于在执行所述N个查询语句时调整所述数据库当前所使用的配置参数。
- 根据权利要求14所述的装置,其特征在于,所述第二确定单元,用于当所述目标调参模式为所述聚类级别的调参模式时,根据所述N个目标特征向量和所述数据库的当前状态信息,确定N个配置参数组,其中,N个配置参数组与所述N个目标特征向量一一对应;对所述N个配置参数组进行聚类,以得到Q个目标配置参数组,其中,第一配置参数组对应至少一个目标特征向量,所述第一配置参数组为所 述Q个目标配置参数组中的任意一个,所述Q为大于0的整数,且Q<N,所述第一配置参数组用于在执行所述至少一个目标特征向量对应的查询语句时调整所述数据库当前所使用的配置参数。
- 根据权利要求13-16任一项所述的装置,其特征在于,所述N个查询信息中的第一查询信息包括第一查询类型、第一表信息和查询时所涉及算子的类型和对应的代价信息,所述第一查询信息为所述N个查询信息中的任意一个;所述第一查询类型用于指示所述第一查询信息所对应的查询语句请求对所述数据库的操作类型;所述第一表信息用于指示所述第一查询信息所对应的查询语句被所述数据库执行时所涉及的关系表;所述代价信息用于指示所涉及的算子在执行所述第一查询信息所对应的查询语句时各自的执行代价。
- 根据权利要求17所述的装置,其特征在于,所述第一查询信息对应的目标特征向量包括第一查询类型的标识和第二查询类型的标识、第一表信息的标识和第二表信息的标识,以及各类型算子的浮点位;其中,所述第一查询类型的标识用第一值表示,所述第二查询类型的标识用第二值表示,所述第二查询类型属于所述数据库的查询类型且未包括在所述第一查询信息中;所述第一表信息的标识用第三值表示,所述第二表信息的标识用第四值表示,所述第二表信息属于所述数据库的表信息且未包括在所述第一查询信息中;所述各类型算子的浮点位上包括对应类型算子的执行代价。
- 根据权利要求18所述的装置,其特征在于,所述对应类型算子的执行代价为归一化后的执行代价。
- 根据权利要求13-19任一项所述的装置,其特征在于,所述第二确定单元,用于根据所述N个目标特征向量和所述数据库的当前状态信息,使用双状态深度强化学习DS-DRL模型,确定与所述N个查询语句对应的所述数据库的配置参数集。
- 根据权利要求20所述的装置,其特征在于,所述第二确定单元用于:当所述DS-DRL模型包括预测模型、环境模型和代理模型时;通过所述预测模型,根据所述N个目标特征向量预测所述数据库的状态信息变化量ΔS,所述ΔS是执行所述N个查询语句之后相对于执行所述N个查询语句之前的所述数据库的状态信息的变化量;通过所述环境模型,根据所述ΔS,以及执行所述N个查询语句之前的所述数据库的状态信息S确定模拟所述N个查询语句执行后的所述数据库的模拟状态信息S’;通过所述代理模型,根据所述模拟状态信息S’,确定推荐配置参数集;通过所述环境模型,根据所述推荐配置参数集进行模拟配置,并执行所述N个目标特征向量,以更新所述S和S’,并确定所述N个目标特征向量被执行后的表现值R;通过所述代理模型,根据所述表现值R更新调参策略,并根据更新后的所述S’确定更 新的推荐配置参数集;若所述表现值R满足预设条件,则确定更新的推荐配置参数集为所述配置参数集;若所述表现值R不满足所述预设条件,则重复执行上述环境模型更新所述S以及所述S’,以及所述代理模型确定更新的推荐配置参数集的过程直至表现值R满足预设条件,所述满足预设条件的R值所对应的更新的推荐配置参数集为所述配置参数集。
- 根据权利要求21所述的装置,其特征在于,所述第二确定单元用于:当所述DS-DRL模型为双状态深度确定性策略梯度算法DS-DDPG模型,所述代理模型包括角色actor模型和评分critic模型时;通过所述actor模型,根据所述模拟状态信息S’,确定推荐配置参数集;通过所述critic模型,根据所述模拟状态信息S’和所述推荐配置参数集,确定所述推荐配置参数集的评分。
- 根据权利要求22所述的装置,其特征在于,所述第二确定单元,用于通过所述actor模型,根据所述评分更新所述actor模型的权重。
- 根据权利要求22或23所述的装置,其特征在于,所述第二确定单元,用于通过所述critic模型,根据表现值R,更新所述critic模型的权重。
- 一种确定数据库的配置参数的装置,其特征在于,所述装置包括至少一个处理器、存储器及存储在所述存储器上并可被所述至少一个处理器执行的指令,其特征在于,所述至少一个处理器执行所述指令,以实现权利要求1至12中任一项所述的方法的步骤。
- 一种计算机可读存储介质,其上存储有计算机程序,其特征在于,该程序被处理器执行时实现权利要求1至12中任一项所述的方法的步骤。
- 一种数据库系统,其特征在于,包括根据权利要求13至26中任一项所述的确定数据库的配置参数的装置和数据库。
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA3137748A CA3137748C (en) | 2019-05-14 | 2020-01-21 | Method and apparatus for determining configuration knob of database |
EP20804754.8A EP3961413A4 (en) | 2019-05-14 | 2020-01-21 | METHOD AND DEVICE FOR DETERMINING DATABASE CONFIGURATION PARAMETERS |
US17/525,435 US20220067008A1 (en) | 2019-05-14 | 2021-11-12 | Method and apparatus for determining configuration knob of database |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910401715.0 | 2019-05-14 | ||
CN201910401715.0A CN111949631B (zh) | 2019-05-14 | 2019-05-14 | 一种确定数据库的配置参数的方法及装置 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/525,435 Continuation US20220067008A1 (en) | 2019-05-14 | 2021-11-12 | Method and apparatus for determining configuration knob of database |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020228378A1 true WO2020228378A1 (zh) | 2020-11-19 |
Family
ID=73290033
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/073429 WO2020228378A1 (zh) | 2019-05-14 | 2020-01-21 | 一种确定数据库的配置参数的方法及装置 |
Country Status (5)
Country | Link |
---|---|
US (1) | US20220067008A1 (zh) |
EP (1) | EP3961413A4 (zh) |
CN (1) | CN111949631B (zh) |
CA (1) | CA3137748C (zh) |
WO (1) | WO2020228378A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023150039A1 (en) * | 2022-02-01 | 2023-08-10 | Capital One Services, Llc | Optimization of virtual warehouse computing resource allocation |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113064881B (zh) * | 2021-04-15 | 2024-05-24 | 深信服科技股份有限公司 | 一种数据库管理方法、装置、设备及可读存储介质 |
CN113010547B (zh) * | 2021-05-06 | 2023-04-07 | 电子科技大学 | 一种基于图神经网络的数据库查询优化方法及系统 |
CN114238269B (zh) * | 2021-12-03 | 2024-01-23 | 中兴通讯股份有限公司 | 数据库参数调整方法、装置、电子设备和存储介质 |
CN114911817A (zh) * | 2022-04-19 | 2022-08-16 | 北京百度网讯科技有限公司 | 一种数据处理方法、装置、电子设备及存储介质 |
CN115757464B (zh) * | 2022-11-18 | 2023-07-25 | 中国科学院软件研究所 | 一种基于深度强化学习的智能物化视图查询方法 |
CN116842060B (zh) * | 2023-08-30 | 2024-01-09 | 之江实验室 | 一种基于代理模型重排技术的推理查询优化方法及装置 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120143847A1 (en) * | 2009-04-24 | 2012-06-07 | Nhn Business Platform Corporation | Database management method and system |
CN105224599A (zh) * | 2015-08-31 | 2016-01-06 | 浪潮(北京)电子信息产业有限公司 | 一种数据库自动优化配置的方法及系统 |
CN108234177A (zh) * | 2016-12-21 | 2018-06-29 | 深圳先进技术研究院 | 一种HBase配置参数自动调优方法及装置、用户设备 |
CN108509628A (zh) * | 2018-04-08 | 2018-09-07 | 腾讯科技(深圳)有限公司 | 数据库配置方法、装置、计算机设备和存储介质 |
CN108763398A (zh) * | 2018-05-22 | 2018-11-06 | 腾讯科技(深圳)有限公司 | 数据库配置参数处理方法、装置、计算机设备和存储介质 |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100082599A1 (en) * | 2008-09-30 | 2010-04-01 | Goetz Graefe | Characterizing Queries To Predict Execution In A Database |
US11537615B2 (en) * | 2017-05-01 | 2022-12-27 | Futurewei Technologies, Inc. | Using machine learning to estimate query resource consumption in MPPDB |
US11106996B2 (en) * | 2017-08-23 | 2021-08-31 | Sap Se | Machine learning based database management |
-
2019
- 2019-05-14 CN CN201910401715.0A patent/CN111949631B/zh active Active
-
2020
- 2020-01-21 CA CA3137748A patent/CA3137748C/en active Active
- 2020-01-21 WO PCT/CN2020/073429 patent/WO2020228378A1/zh unknown
- 2020-01-21 EP EP20804754.8A patent/EP3961413A4/en active Pending
-
2021
- 2021-11-12 US US17/525,435 patent/US20220067008A1/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120143847A1 (en) * | 2009-04-24 | 2012-06-07 | Nhn Business Platform Corporation | Database management method and system |
CN105224599A (zh) * | 2015-08-31 | 2016-01-06 | 浪潮(北京)电子信息产业有限公司 | 一种数据库自动优化配置的方法及系统 |
CN108234177A (zh) * | 2016-12-21 | 2018-06-29 | 深圳先进技术研究院 | 一种HBase配置参数自动调优方法及装置、用户设备 |
CN108509628A (zh) * | 2018-04-08 | 2018-09-07 | 腾讯科技(深圳)有限公司 | 数据库配置方法、装置、计算机设备和存储介质 |
CN108763398A (zh) * | 2018-05-22 | 2018-11-06 | 腾讯科技(深圳)有限公司 | 数据库配置参数处理方法、装置、计算机设备和存储介质 |
Non-Patent Citations (1)
Title |
---|
See also references of EP3961413A4 |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023150039A1 (en) * | 2022-02-01 | 2023-08-10 | Capital One Services, Llc | Optimization of virtual warehouse computing resource allocation |
Also Published As
Publication number | Publication date |
---|---|
CN111949631B (zh) | 2024-06-25 |
CA3137748A1 (en) | 2020-11-19 |
CA3137748C (en) | 2024-03-26 |
EP3961413A4 (en) | 2022-05-18 |
CN111949631A (zh) | 2020-11-17 |
EP3961413A1 (en) | 2022-03-02 |
US20220067008A1 (en) | 2022-03-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020228378A1 (zh) | 一种确定数据库的配置参数的方法及装置 | |
US10642835B2 (en) | System and method for increasing accuracy of approximating query results using neural networks | |
US20200012657A1 (en) | Method and system for synthetic generation of time series data | |
US20130332490A1 (en) | Method, Controller, Program and Data Storage System for Performing Reconciliation Processing | |
JP6953738B2 (ja) | データセンターのネットワークにおいてクエリを実行する、コンピュータにより実施される方法 | |
CN108804473B (zh) | 数据查询的方法、装置和数据库系统 | |
CN112231592B (zh) | 基于图的网络社团发现方法、装置、设备以及存储介质 | |
CN110633796B (zh) | 模型更新方法、装置、电子设备及存储介质 | |
WO2015180340A1 (zh) | 一种数据挖掘方法及装置 | |
WO2022252694A1 (zh) | 神经网络优化方法及其装置 | |
JP5673473B2 (ja) | 分散計算機システム及び分散計算機システムの制御方法 | |
US20240152650A1 (en) | Computer-implemented methods, apparatuses, and computer program products for frequency based operations | |
CN116057518A (zh) | 使用机器学习模型的自动查询谓词选择性预测 | |
CN116302481A (zh) | 基于稀疏知识图谱链接预测的资源分配方法及系统 | |
US20230078246A1 (en) | Centralized Management of Distributed Data Sources | |
EP3851967A1 (en) | Smart network interface controller for caching distributed data | |
Luan et al. | Evaluation for sortie generation capacity of the carrier aircraft based on the variable structure RBF neural network with the fast learning rate | |
CN113064554B (zh) | 基于分布式存储的最优存储节点匹配方法、装置及介质 | |
KR20230026137A (ko) | 분산 학습용 서버 및 분산 학습 방법 | |
KR20160044623A (ko) | 리눅스 가상 서버의 로드 밸런싱 방법 | |
KR101617074B1 (ko) | 스마트 워터 그리드에서의 수자원 분배를 위한 컨텍스트 인지 추천방법 및 장치 | |
US11966393B2 (en) | Adaptive data prefetch | |
US12112331B2 (en) | Rule based machine learning for precise fraud detection | |
Wang et al. | Automatically setting parameter-exchanging interval for deep learning | |
US20230388197A1 (en) | Centralized Management of Distributed Data Sources |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20804754 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 3137748 Country of ref document: CA |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 2020804754 Country of ref document: EP Effective date: 20211122 |