WO2018201916A1

WO2018201916A1 - Data query method, device, and database system

Info

Publication number: WO2018201916A1
Application number: PCT/CN2018/083826
Authority: WO
Inventors: 杨新颖
Original assignee: 华为技术有限公司
Priority date: 2017-05-04
Filing date: 2018-04-20
Publication date: 2018-11-08
Also published as: CN108804473A; CN108804473B

Abstract

A data query method, device, and database system, the method comprising: determining, according to the type of a pre-configured training model, a plurality of candidate predicate combinations supported by the pre-configured training model among a plurality of predicate combinations, each of the candidate predicate combinations in the plurality of candidate predicate combinations comprising at least two predicates; determining a first predicate combination in the plurality of candidate predicate combinations, the first predicate combination comprising predicates that are different from each other; and using a training model corresponding to the first predicate combination to determine a first execution plan, and using the first execution plan to perform a data query. The data query method, device and database system may improve the accuracy of the predicate selection rate, thereby improving the query performance. Furthermore, when there are repeated predicates in at least two second predicate combinations, the accuracy of the predicate selection rate may be improved.

Description

Data query method, device and database system

This application claims priority to Chinese Patent Application No. 201710308623.9, entitled "Method, Apparatus and Database System for Data Query", filed with the Chinese Patent Office on May 4, 2017, the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to the field of databases and, more particularly, to a method, apparatus and database system for data query.

Background

When a database system processes a query from a client, for example a query expressed in Structured Query Language (SQL), the query must be parsed, pre-compiled, and optimized to generate an execution plan before it is executed. The optimizer is the component of the database system that most affects the execution efficiency of an SQL statement; its output is the execution plan with the lowest estimated cost (also called the optimal execution plan). When the optimizer selects the optimal execution plan, estimating the selection rate (selectivity) of predicates is very important. The accuracy of this estimate directly affects the accuracy of the optimizer's subsequent cost estimates for the operators in the execution plan, and therefore affects which plan is output as optimal.

Traditional methods for estimating predicate selectivity include histogram-based estimation, estimation based on common values, and estimation based on the frequencies of common values. For multi-column compound predicates, there are composite selectivity estimation algorithms that build on single-predicate selectivity and on multi-column statistics, such as multi-column histograms. However, all of these compute the predicate selection rate for a single predicate or for multiple predicates, and their accuracy needs to be improved. In particular, when a predicate falls into several selection rate calculation models at the same time, the predicate selection rate is computed with low accuracy, which affects the output of the optimal execution plan.

Summary of the invention

The application provides a data query method, device and database system to improve the accuracy of the predicate selection rate, thereby improving query performance.

In a first aspect, a method of data query is provided, including:

The database server receives a query statement from a client and parses the query statement to obtain a plurality of predicates; performs predicate combination on the plurality of predicates to obtain a plurality of predicate combinations; determines, among the plurality of predicate combinations and according to the type of a pre-configured training model, a plurality of candidate predicate combinations supported by the pre-configured training model, each of the plurality of candidate predicate combinations comprising at least two predicates; determines a first predicate combination among the plurality of candidate predicate combinations, the first predicate combination comprising predicates that are different from each other; and finally determines a first execution plan using the training model corresponding to the first predicate combination, and performs a data query using the first execution plan.

In the above technical solution, the training model of a predicate combination can be obtained based on the correlation between its predicates, and the predicate selection rate is calculated from that model. It is not necessary to calculate the predicate selection rate of each predicate in the combination separately and then multiply the individual rates together. That is to say, calculating the predicate selection rate with the training model takes the correlation between predicates into account, so the obtained predicate selection rate is more accurate, thereby improving query performance.

In a possible implementation, if at least two predicate combinations have the same or repeated predicates, the database server may select an appropriate predicate combination based on the confidence of the training model.

In a possible implementation manner, the method further includes:

The database server determines at least two second predicate combinations among the plurality of candidate predicate combinations, the at least two second predicate combinations having at least one identical predicate; determines a target predicate combination among the at least two second predicate combinations according to the confidence of the training model corresponding to each of the at least two second predicate combinations, where the confidence indicates the accuracy of the training model; and determines a second execution plan using the training model corresponding to the target predicate combination, and performs a data query using the second execution plan.

In the above technical solution, the database server determines at least two second predicate combinations that have at least one identical predicate, determines a target predicate combination among them according to the confidence of the training model corresponding to each second predicate combination, determines a second execution plan using the training model corresponding to the target predicate combination, and then performs a data query using the second execution plan. In this way, when the selection rate is calculated for predicate combinations that have overlapping predicates, the accuracy of the predicate selection rate can be improved.

In a possible implementation, the database server may determine at least two second predicate combinations among the plurality of candidate predicate combinations, wherein each second predicate combination may include at least two predicates, and the at least two second predicate combinations have at least one identical or repeated predicate.

In a possible implementation manner, the database server may select an appropriate or optimal predicate combination, such as a target predicate combination, according to the confidence of the training model corresponding to the second predicate combination.

In a possible implementation, the database server may also select the target predicate combination according to other screening conditions. For example, the database server may set a threshold screening condition and select, among the at least two second predicate combinations, the second predicate combination that satisfies the threshold screening condition as the target predicate combination, eliminating the other second predicate combinations that do not satisfy the condition.

In a possible implementation manner, the method further includes:

Obtaining a confidence level of a training model corresponding to each of the at least two second predicate combinations.

Optionally, the database server may obtain the confidence of the training model corresponding to each predicate combination from the system table of the database.

Optionally, the system table in the database system includes training results (such as weights, offsets, and the like) of the training model of each predicate combination and the confidence of the model. Among them, the confidence of the model is used to indicate the accuracy of the training model.

In some possible implementations, the confidence of the training model corresponding to the target predicate combination is greater than the confidence of the training model of the other second predicate combinations of the at least two second predicate combinations.

In some possible implementations, in the at least two second predicate combinations determined in the plurality of candidate predicate combinations, the confidence of the training model corresponding to each second predicate combination satisfies a preset condition.

Optionally, the “preset condition” may be a specific threshold, or may be a specific screening condition.

In some possible implementations, the confidence of the training model corresponding to each of the at least two second predicate combinations is greater than a first threshold.
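The confidence-based selection described in the implementations above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the function name `select_target_combination`, the dictionary layout, the predicate names, and the threshold value 0.8 are all hypothetical.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical layout: each second predicate combination maps to the
# confidence of its trained model (as would be read from the system table).
Confidences = Dict[FrozenSet[str], float]


def select_target_combination(
    overlapping: Confidences, first_threshold: float
) -> Optional[FrozenSet[str]]:
    """Pick the overlapping combination whose model has the highest confidence.

    Combinations whose confidence does not exceed the threshold are screened
    out first; None is returned if nothing survives the screening.
    """
    eligible = {c: conf for c, conf in overlapping.items() if conf > first_threshold}
    if not eligible:
        return None
    return max(eligible, key=eligible.get)


# Three second predicate combinations that all share the predicate "p1".
second_combinations = {
    frozenset({"p1", "p2"}): 0.92,
    frozenset({"p1", "p4"}): 0.85,
    frozenset({"p1", "p5"}): 0.60,  # screened out by the assumed threshold
}
target = select_target_combination(second_combinations, first_threshold=0.8)
print(sorted(target))  # ['p1', 'p2']
```

Screening before taking the maximum means that a query falls back to some other estimation path (here signalled by `None`) when no model is considered reliable enough; what that fallback is would be a separate design decision.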

In some possible implementations, the second execution plan is determined by using the training model corresponding to the target predicate combination, including:

The database server acquires model parameters of the training model corresponding to the target predicate combination, the model parameters including at least one of a weight and an offset, and generates the second execution plan using the model parameters.

In a second aspect, an apparatus for data query is provided, configured to perform the method of the first aspect or any of the possible implementations of the first aspect. In particular, the apparatus comprises modules or units for performing the method of the first aspect or any of the possible implementations of the first aspect.

In a third aspect, an apparatus for data query is provided. The device includes a processor, a memory, and a communication interface. The processor is coupled to the memory and the communication interface. The memory is for storing instructions for the processor to execute, and the communication interface is for communicating with other network elements under the control of the processor. The instructions, when executed by the processor, cause the processor to perform the method of the first aspect or any of the possible implementations of the first aspect.

In a fourth aspect, a database system is provided. The database system includes a database and the apparatus for data query of the second aspect or the third aspect.

In a fifth aspect, a computer readable storage medium is provided. The computer readable storage medium stores a program that causes an apparatus for data query to perform the method of data query of the first aspect or any of its implementations.
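As an illustration of how stored weights and an offset might yield a selection rate, the sketch below assumes the simplest possible model, a single-layer unit with a sigmoid output. The application does not specify the model form or the feature encoding; the function `predict_selectivity`, the parameter values, the feature values, and the table cardinality are all assumptions made for the example.

```python
import math
from typing import Sequence


def predict_selectivity(
    weights: Sequence[float], offset: float, features: Sequence[float]
) -> float:
    """Assumed single-layer model: sigmoid(w . x + b), which lies in (0, 1)."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    z = sum(w * x for w, x in zip(weights, features)) + offset
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical parameters, as might be read from the system table for the
# target predicate combination.
weights = [1.5, -2.0]
offset = -0.5
# Hypothetical normalized predicate constants for a two-predicate combination.
features = [0.3, 0.7]

selectivity = predict_selectivity(weights, offset, features)
table_rows = 1_000_000  # assumed table cardinality
estimated_rows = selectivity * table_rows  # input to operator cost estimation
print(f"selectivity={selectivity:.4f}, estimated rows={estimated_rows:.0f}")
```

The estimated row count is what an optimizer would typically feed into its operator cost formulas when comparing candidate plans.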

Description of the drawings

FIG. 1 is a schematic structural diagram of a database system to which an embodiment of the present application is applied.

FIG. 2 is a schematic diagram of a stand-alone database system to which an embodiment of the present application is applied.

FIG. 3 is a schematic diagram of a cluster database system adopting a shared disk architecture according to an embodiment of the present application.

FIG. 4 is a schematic diagram of a cluster database system employing a shared-nothing architecture according to an embodiment of the present application.

FIG. 5 is a schematic diagram of a database server to which an embodiment of the present application is applied.

FIG. 6 is a schematic flowchart of a method for data query according to an embodiment of the present application.

FIG. 7 is a schematic flowchart of a method for data query according to another embodiment of the present application.

FIG. 8 is a schematic diagram of an example of a plurality of candidate predicate combinations in accordance with an embodiment of the present application.

FIG. 9 is a flow chart of an example in accordance with an embodiment of the present application.

FIG. 10 is a flow chart of a specific example in accordance with an embodiment of the present application.

FIG. 11 is a schematic diagram of an example of application of an embodiment of the present application.

FIG. 12 is a schematic block diagram of an apparatus for data query according to an embodiment of the present application.

FIG. 13 is a schematic block diagram of an apparatus for data query according to another embodiment of the present application.

FIG. 14 is a schematic block diagram of a database system in accordance with an embodiment of the present application.

FIG. 15 is a structural block diagram of an apparatus for data query provided by an embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are some, but not all, of the embodiments of the present application.

The technical solution of the embodiment of the present application can be used in a database system or a database management system (DBMS), such as a relational database management system.

The architecture of the database system to which the embodiment of the present application is applied is shown in FIG. 1. The database system includes a database and a database management system (DBMS). A database is an organized collection of data stored in a data store, that is, an associated set of data that is organized, stored, and used in accordance with a certain data model. For example, the database may include one or more tables.

A DBMS is used to establish, use, and maintain databases, and to manage and control the database in a unified manner to ensure its security and integrity. Users access the data in the database through the DBMS, and database administrators also maintain the database through the DBMS. The DBMS provides a variety of functions that enable multiple applications and user devices to create, modify, and query databases in different ways, at the same time or at different times. Applications and user devices can be collectively referred to as clients. The functions provided by the DBMS can include the following: (1) data definition: the DBMS provides a data definition language (DDL) to define the database structure; the DDL describes the database framework, which can be saved in the data dictionary; (2) data access: the DBMS provides a data manipulation language (DML) to implement basic access operations on database data, such as retrieval, insertion, modification, and deletion; (3) database operation management: the DBMS provides data control functions, namely data security, integrity, and concurrency control, to effectively control and manage database operation and ensure that data is correct and valid; (4) database establishment and maintenance, including initial data loading, database dump, recovery, reorganization, and system performance monitoring and analysis; (5) database transmission: the DBMS handles the transmission of data to implement communication between the client and the DBMS, usually in coordination with the operating system.

Specifically, FIG. 2 is a schematic diagram of a stand-alone database system, which includes a database management system and a data store. The database management system provides services such as querying and modifying the database, and stores data in the data store. In a stand-alone database system, the database management system and the data store are usually located on a single server, such as a Symmetric Multi-Processor (SMP) server. The SMP server includes multiple processors, all of which share resources such as the bus, memory, and I/O system. The functionality of the database management system can be implemented by one or more processors executing programs in memory.

FIG. 3 is a schematic diagram of a cluster database system using a shared-storage architecture. The system includes multiple nodes (such as nodes 1-N in FIG. 3), and a database management system is deployed on each node to provide users with services such as querying and modifying the database. The multiple database management systems store shared data in the shared data store and read and write the data in the data store through a switch. The shared data store can be a shared disk array. A node in a cluster database system can be a physical machine, such as a database server, or a virtual machine running on abstracted hardware resources. If the node is a physical machine, the switch is a Storage Area Network (SAN) switch, an Ethernet switch, a fiber switch, or another physical switching device. If the node is a virtual machine, the switch is a virtual switch.

FIG. 4 is a schematic diagram of a cluster database system adopting a shared-nothing architecture. Each node has its own exclusive hardware resources (such as data storage), operating system, and database, and the nodes communicate through a network. In this system, data is distributed to the nodes according to the database model and application characteristics. A query task is divided into several parts that are executed in parallel on all nodes, and the nodes coordinate with one another to provide database services as a whole. All communication functions are implemented on a high-bandwidth network interconnect. As in the cluster database system with the shared disk architecture described in FIG. 3, the nodes here can be either physical machines or virtual machines.

In all embodiments of the present application, the data store of the database system includes, but is not limited to, a solid state drive (SSD), a disk array, or another type of non-transitory computer readable medium. Although the database is not shown in FIG. 2 through FIG. 4, it should be understood that the database is stored in a data store. One of ordinary skill in the art will appreciate that a database system may include fewer or more components than those shown in FIG. 2 through FIG. 4, or components different from those shown there; FIG. 2 through FIG. 4 show only the components that are most relevant to the implementations disclosed in the embodiments of the present application. For example, although four nodes have been described in FIG. 3 and FIG. 4, those skilled in the art will appreciate that a cluster database system can include any number of nodes. The database management system functions of each node may be implemented by appropriate combinations of software, hardware, and/or firmware running on that node.

A person skilled in the art can clearly understand, according to the teachings of the embodiments of the present application, that the method of the embodiments of the present application can be generally applied to a database management system installed or deployed in a stand-alone database system, a cluster database system with a shared-nothing architecture, a cluster database system with a shared-storage architecture, or another type of database system.

For ease of understanding and description, by way of example and not limitation, the scheme of the embodiment of the present application is described below by taking a database server as an example. The database server may specifically be an SMP server in the stand-alone database system described in FIG. 2, or a node as described in FIG. 3 or FIG. 4. Specifically, as shown in FIG. 5, the database server 100 includes at least one processor 104, a non-transitory computer-readable medium 106 that stores executable code, and a database management system 108. The executable code, when executed by the at least one processor 104, is configured to implement the components and functions of the database management system 108. The non-transitory computer readable medium 106 can include one or more non-volatile memories. As an example, the non-volatile memory includes semiconductor memory devices, such as Erasable Programmable Read Only Memory (EPROM), Electrically Erasable Programmable Read Only Memory (EEPROM), and flash memory; magnetic disks, such as internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM. Moreover, the non-transitory computer readable medium 106 can also include any device that is configured as a main memory. The at least one processor 104 can include any type of general purpose computing circuit or special purpose logic circuit, such as an FPGA (Field Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit). The at least one processor 104 can also be one or more processors, such as a CPU, coupled to one or more semiconductor substrates.

The database management system 108 can be a Relational Database Management System (RDBMS). The database management system 108 supports Structured Query Language (SQL). In general, SQL refers to a specialized programming language dedicated to managing data stored in relational databases. SQL can refer to various types of data-related languages, including, for example, data definition languages and data manipulation languages, and its scope can include data insertion, query, update and deletion, schema creation and modification, and data access control. Moreover, in some examples, SQL can include descriptions related to various language elements, including clauses, expressions, predicates, queries, and statements. An expression can be configured to generate a scalar value and/or a table comprising data columns and/or rows. A predicate is a logical expression that evaluates to a logical value (such as TRUE, FALSE, or UNKNOWN) and can be used to describe the connection relationship between objects. For example, in a SELECT query, the filter conditions in the WHERE clause and the HAVING clause can be understood as specified predicates.

A query is a request to view, access, and/or manipulate data stored in a database. For example, the database management system 108 can receive a query in SQL format (referred to as a SQL query) from the database client 102. Generally, the database management system 108 receives the client's query through a communication interface, such as an application program interface (API) or an Ethernet interface, accesses the relevant data in the database, manipulates that data to generate a query result corresponding to the query, and returns the query result to the database client 102 via the same communication interface. A database is a collection of data organized, described, and stored in a mathematical model that can include one or more database structures or formats, such as row storage and column storage. The database is typically stored in a data store, such as the external data store 120 in FIG. 5, or the non-transitory computer readable medium 106. When the database is stored on the non-transitory computer readable medium 106, the database management system 108 is an in-memory database management system.
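To make the three logical values of a predicate concrete, the following sketch evaluates a predicate over a small table. It uses SQLite through Python's standard `sqlite3` module purely for illustration; the table `t` and its columns are hypothetical and have no connection to the database system of the embodiments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (5, None), (9, 30)])

# Evaluate the predicate "b > 15" for every row. SQLite reports TRUE as 1,
# FALSE as 0, and UNKNOWN (a comparison with NULL) as NULL, i.e. None.
truth_values = conn.execute("SELECT a, b > 15 FROM t ORDER BY a").fetchall()
print(truth_values)  # [(1, 0), (5, None), (9, 1)]

# A WHERE clause keeps only the rows for which the whole predicate is TRUE,
# so the row with the UNKNOWN result is filtered out as well.
kept = conn.execute(
    "SELECT a FROM t WHERE a > 2 AND b > 15 ORDER BY a"
).fetchall()
print(kept)  # [(9,)]
conn.close()
```

Here the WHERE clause contains two predicates, `a > 2` and `b > 15`; the fraction of rows that satisfy them (one of three in this table) is the quantity the predicate selection rate estimates.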

Database client 102 can include any type of device or application that is configured to interact with database management system 108. In some examples, database client 102 includes one or more application servers.

The database management system 108 includes a parser 112, a query optimizer 114, a query executor 122, and a storage engine 134. The parser 112 is configured to perform syntax and semantic analysis of a query submitted by the client 102, and to expand the views in the query into small query blocks. The query optimizer 114 generates a set of execution plans that may be used for the query, estimates the cost of each execution plan, compares the costs of the plans, and ultimately selects an optimal execution plan. The query executor 122 operates in accordance with the execution plan of the query to generate a query result. The storage engine 134 is responsible for managing the table data and the actual content of the indexes, and also manages runtime data such as the cache, buffers, transactions, and logs. For example, the storage engine 134 can write execution results of the query executor 122 to the data store 120 via physical I/O.

The calculation of Predicate Selectivity is a very important part of the query optimizer 114 in selecting the optimal execution plan. The accuracy of the predicate selection rate directly affects the accuracy of the execution plan, such as the accuracy of the estimate of the cost of each operator in the execution plan, which affects the output of the optimal execution plan.

Based on the database server 100 described above, the present application proposes a data query method, applicable to predicate combinations with repeated predicates, to improve the accuracy of the predicate selection rate and thereby improve query performance.

FIG. 6 shows a schematic flowchart of a method 600 for data query according to an embodiment of the present application. Referring to FIG. 5, the method includes:

S610, the database management system 108 receives a query statement submitted by the client through a communication connection established with the database server;

S620, the parser 112 of the database management system 108 parses the query statement to obtain a plurality of predicates;

S630, the query optimizer 114 performs predicate combination on the plurality of predicates to obtain a plurality of predicate combinations;

S640. The query optimizer 114 determines, among the plurality of predicate combinations and according to the type of the pre-configured training model, a plurality of candidate predicate combinations supported by the pre-configured training model, each of the plurality of candidate predicate combinations including at least two predicates;

Optionally, the query optimizer 114 may select a plurality of candidate predicate combinations available according to the training model category, and each candidate predicate combination has a corresponding training model.

Optionally, in the embodiment of the present application, the training model may be a supervised learning model or an unsupervised learning model obtained by a machine learning algorithm, such as a neural network (NN) model, a support vector machine (SVM) model, a fuzzy model, or a random forest model. Specifically, the neural network model includes a feedforward neural network (FFNN) model, a recurrent neural network (RNN) model, and the like.

It should be noted that the machine learning model and its training process are external to the database, and the database kernel maintains a system table associated with the external machine learning model. After a model for predicate selection rate has been trained, the trained model and its corresponding predicate combination are stored in this system table, with each predicate combination corresponding to one training model. Further, the training model can be tested with data that was not used for training, and the resulting model confidence (accuracy) value is stored in the same system table. For the specific model training process after a machine learning model is introduced into the database query optimizer, and for the process of writing the training results into the system table, refer to the prior application ZL201710109372.1, "An Information Processing Method and Apparatus"; details are not repeated here.
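A system table of the kind described above might be laid out as in the following sketch. The application does not specify a schema, so the table name `sys_selectivity_model`, its columns, the JSON encoding of the parameters, and the helper functions are all hypothetical; SQLite via Python's `sqlite3` module stands in for the database kernel.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per predicate combination.
conn.execute(
    """
    CREATE TABLE sys_selectivity_model (
        predicate_combination TEXT PRIMARY KEY,  -- canonical key, e.g. "t.a,t.b"
        model_type TEXT NOT NULL,                -- e.g. "NN" or "SVM"
        parameters TEXT NOT NULL,                -- JSON with weights and offset
        confidence REAL NOT NULL                 -- accuracy on held-out data
    )
    """
)


def combination_key(columns):
    """Order-independent key so that (a, b) and (b, a) map to the same model."""
    return ",".join(sorted(columns))


def store_model(columns, model_type, weights, offset, confidence):
    """Write (or overwrite) the training result for one predicate combination."""
    conn.execute(
        "INSERT OR REPLACE INTO sys_selectivity_model VALUES (?, ?, ?, ?)",
        (
            combination_key(columns),
            model_type,
            json.dumps({"weights": weights, "offset": offset}),
            confidence,
        ),
    )


def load_model(columns):
    """Return (model_type, parameters, confidence), or None if no model exists."""
    row = conn.execute(
        "SELECT model_type, parameters, confidence FROM sys_selectivity_model "
        "WHERE predicate_combination = ?",
        (combination_key(columns),),
    ).fetchone()
    if row is None:
        return None
    model_type, parameters, confidence = row
    return model_type, json.loads(parameters), confidence


store_model(["t.b", "t.a"], "NN", [1.5, -2.0], -0.5, 0.92)
print(load_model(["t.a", "t.b"]))
```

Keying the row on a canonical form of the combination is one way to guarantee that each predicate combination corresponds to exactly one training model, as the text requires.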

S650. The query optimizer 114 determines a first predicate combination in the plurality of candidate predicate combinations, where the first predicate combination includes predicates different from each other;

Optionally, the query optimizer 114 may also determine at least one first predicate combination in the plurality of candidate predicate combinations, and the at least one first predicate combination includes predicates different from each other.

Here, "the at least one first predicate combination includes predicates different from each other" is for the predicate combination. For example, if the predicate combination 1 includes the predicate 1 and the predicate 2, and the predicate combination 2 includes the predicate 3 and the predicate 4, it can be seen that the predicate combination 1 and the predicate combination 2 include predicates that are different from each other.

S660. The query optimizer 114 determines the first execution plan by using the training model corresponding to the first predicate combination, and the query executor 122 performs a data query using the execution plan generated by the query optimizer 114, and returns the query result to the client 102.

Specifically, when receiving a query statement (such as a SQL statement) from the client, the database server 100 may parse the query statement to obtain a plurality of predicates. Next, the query optimizer 114 may combine or recombine the plurality of predicates based on the connection relationships between the predicates, which the query optimizer 114 can learn, to obtain a plurality of predicate combinations. For example, the query optimizer 114 can recombine predicates that are at the same level of a hierarchy. Then, the query optimizer 114 may select, among the plurality of predicate combinations, a plurality of candidate predicate combinations supported by the training model according to the type of the training model saved in the system table. The query optimizer 114 may select, among the plurality of candidate predicate combinations, the first predicate combination including the predicates that are different from each other. Finally, the query optimizer 114 determines a first execution plan using the training model corresponding to the first predicate combination, and performs a data query using the first execution plan.
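Steps S630 to S650 can be sketched as follows. The sketch makes two simplifying assumptions that are not taken from the application: predicates are combined by exhaustive enumeration rather than by their connection relationships, and the set of combinations that have a trained model (`supported`) is given directly. The predicate names are hypothetical.

```python
from itertools import combinations

# Hypothetical predicates parsed from one query statement (S620).
predicates = ["p1", "p2", "p3", "p4", "p5"]

# Hypothetical combinations for which a pre-configured model exists.
supported = {
    frozenset({"p1", "p2"}),
    frozenset({"p3", "p4"}),
    frozenset({"p3", "p5"}),
}

# S630: enumerate every combination of at least two predicates.
all_combinations = [
    frozenset(c)
    for size in range(2, len(predicates) + 1)
    for c in combinations(predicates, size)
]

# S640: keep only the candidate combinations supported by a model.
candidates = [c for c in all_combinations if c in supported]


def shares_predicate(combo, others):
    """True if combo has a predicate in common with any other combination."""
    return any(combo & other for other in others if other != combo)


# S650: first predicate combinations share no predicate with other candidates.
first_combinations = [c for c in candidates if not shares_predicate(c, candidates)]
# The remaining candidates repeat at least one predicate among themselves.
overlapping = [c for c in candidates if shares_predicate(c, candidates)]

print([sorted(c) for c in first_combinations])  # [['p1', 'p2']]
print([sorted(c) for c in overlapping])  # [['p3', 'p4'], ['p3', 'p5']]
```

Exhaustive enumeration grows exponentially with the number of predicates, which is one practical reason to restrict combination to predicates that are actually connected in the query.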

Here, if the query optimizer 114 determines at least one first predicate combination among the plurality of candidate predicate combinations (the predicates included in the at least one first predicate combination being different from each other), the query optimizer 114 determines the execution plan using the training model corresponding to each first predicate combination as follows: the query optimizer 114 calculates a predicate selection rate using the training model corresponding to each first predicate combination to obtain a plurality of predicate selection rates, multiplies the plurality of predicate selection rates to obtain a final predicate selection rate, and determines an execution plan based on the final predicate selection rate. For example, if the query optimizer 114 obtains a predicate selection rate A for the predicate combination formed by C1 and C2 and a predicate selection rate B for the predicate combination formed by C3 and C4, the predicate selection rate corresponding to C1, C2, C3, and C4 is A*B. The query optimizer 114 determines the final execution plan based on the predicate selection rate A*B.

Therefore, the training model of a predicate combination can be obtained based on the correlation between its predicates, and the predicate selection rate is calculated from that model. It is not necessary to calculate the predicate selection rate of each predicate in the combination separately and then multiply the individual rates together. That is to say, calculating the predicate selection rate with the training model takes the correlation between predicates into account, so the obtained predicate selection rate is more accurate, thereby improving the query performance or the SQL execution performance.
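The A*B computation in the example can be written out as a short sketch. The selection rate values and the table cardinality below are made up for illustration; in the method they would come from the training models and from table statistics.

```python
from math import prod

# Hypothetical model outputs for two first predicate combinations that
# share no predicate with each other.
selectivity_by_combination = {
    ("C1", "C2"): 0.20,  # A
    ("C3", "C4"): 0.05,  # B
}

# Final predicate selection rate for C1, C2, C3 and C4 together: A * B.
final_selectivity = prod(selectivity_by_combination.values())

table_rows = 1_000_000  # assumed table cardinality
estimated_rows = final_selectivity * table_rows
print(f"final selectivity ~ {final_selectivity:.4f}")
print(f"estimated rows ~ {estimated_rows:.0f}")
```

Multiplying across combinations still treats the combinations as independent of one another; only the correlation inside each combination is captured by its model.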
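The effect of ignoring correlation can be seen in a small constructed example. The data below is synthetic and deliberately extreme (column b always equals column a), and no training model is involved; the sketch only shows why multiplying per-predicate selection rates can be far from the true joint selection rate.

```python
# Synthetic table with two perfectly correlated columns (a, b), where b == a.
rows = [(i % 10, i % 10) for i in range(1000)]


def selectivity(predicate):
    """Fraction of rows for which the predicate is true."""
    return sum(1 for row in rows if predicate(row)) / len(rows)


sel_a = selectivity(lambda r: r[0] == 3)  # predicate 1: a = 3
sel_b = selectivity(lambda r: r[1] == 3)  # predicate 2: b = 3

# Estimate obtained by multiplying the per-predicate selection rates.
independent_estimate = sel_a * sel_b
# Actual selection rate of the combination "a = 3 AND b = 3".
actual = selectivity(lambda r: r[0] == 3 and r[1] == 3)

# Roughly 0.01 versus 0.1: the product underestimates by a factor of ten.
print(f"product of rates: {independent_estimate:.3f}, actual: {actual:.3f}")
```

An estimator that treats the combination as a unit, such as a model trained on the combination, has the opportunity to return a value close to the actual rate instead of the product.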

The case where the predicate combinations have no repeated predicates is described above. Optionally, as an embodiment, if two predicate combinations have the same or repeated predicates, the query optimizer 114 may select an appropriate predicate combination based on the confidence of the training model. It should be understood that, in the embodiments of the present application, the terms "first predicate combination" and "second predicate combination" are introduced only to distinguish different objects and do not limit the embodiments of the present application.

A method 700 of data query in accordance with another embodiment of the present application will now be described in conjunction with FIG. 7. As shown in FIG. 7, the method 700 includes:

S710. Determine at least two second predicate combinations in the plurality of candidate predicate combinations, the at least two second predicate combinations having at least one identical predicate;

Optionally, the query optimizer 114 may determine at least two second predicate combinations in the plurality of candidate predicate combinations, wherein each second predicate combination may include at least two predicates, and the at least two second predicate combinations have at least one identical or repeated predicate.

Here, two second predicate combinations having repeated predicates determined by the query optimizer 114 are taken as an example, wherein each second predicate combination may include a plurality of predicates.

For example, the predicate combination 1 may include a predicate 1 and a predicate 2, and the predicate combination 2 may include a predicate 1 and a predicate 4, wherein the repeated predicate between the predicate combination 1 and the predicate combination 2 is a predicate 1.

As another example, the predicate combination 3 may include a predicate 1, a predicate 2, and a predicate 3, and the predicate combination 4 may include a predicate 1, a predicate 2, and a predicate 5, wherein the repeated predicates between the predicate combination 3 and the predicate combination 4 are the predicate 1 and the predicate 2.
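Finding the repeated predicates between two combinations amounts to a set intersection. The following sketch reproduces the two examples above; `shared_predicates` is a hypothetical helper name.

```python
def shared_predicates(combo_a, combo_b):
    """Return the predicates that appear in both combinations."""
    return set(combo_a) & set(combo_b)

combo1, combo2 = {"PRED1", "PRED2"}, {"PRED1", "PRED4"}
combo3, combo4 = {"PRED1", "PRED2", "PRED3"}, {"PRED1", "PRED2", "PRED5"}

repeated_1_2 = shared_predicates(combo1, combo2)
repeated_3_4 = shared_predicates(combo3, combo4)
```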

Optionally, in the embodiment of the present application, each predicate combination has a corresponding training model. Here, the training model can be understood as the selectivity model of the predicate combination. For example, for a predicate combination consisting of field 1 and field 2, a two-column correlation model can be established.

Optionally, the method 600 or the method 700 may further include: obtaining the confidence of the training model corresponding to each second predicate combination.

Optionally, the query optimizer 114 may obtain the confidence of the training model corresponding to each second predicate combination from the system table of the database.

Optionally, the system table in the database system includes the training results (such as weights and offsets) of the training model of each predicate combination and the confidence of the model, where the confidence of the model indicates the accuracy of the training model. For example, partial data of a training model saved in a system table of a database is shown in Table 1 below.

Table 1 Part of the training model saved in the system table

In Table 1, sel2 indicates that the query contains two correlated predicates. PRED1 and PRED2, PRED1 and PRED4, and PRED3 and PRED5 are each correlated predicate combinations, and each corresponds to a different training model. Valid is the flag bit of the training model, and its value indicates whether the training model is valid: when the valid value is 1, the training model is valid; when the valid value is 0, the training model is invalid. Confidence indicates the confidence of the training model. For example, in Table 1, the confidence of the training model corresponding to PRED1 and PRED2 is 0.76, the confidence of the training model corresponding to PRED1 and PRED4 is 0.93, and the confidence of the training model corresponding to PRED3 and PRED5 is 0.26.

It should be understood that the foregoing is only an example of the data in Table 1. In practice, other possible data may be included in the system table of the database, which is not limited thereto.
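One possible in-memory representation of the system-table records described for Table 1 is sketched below. The record layout, the names `ModelRecord`, `SYSTEM_TABLE`, and `lookup_confidence`, and the valid flags are assumptions made for illustration; only the three confidence values come from the description.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelRecord:
    predicates: tuple   # e.g. ("PRED1", "PRED2")
    valid: int          # 1 = training model valid, 0 = training model invalid
    confidence: float   # indicates the accuracy of the training model

# Confidence values follow the description; the valid flags are assumed.
SYSTEM_TABLE = [
    ModelRecord(("PRED1", "PRED2"), 1, 0.76),
    ModelRecord(("PRED1", "PRED4"), 1, 0.93),
    ModelRecord(("PRED3", "PRED5"), 1, 0.26),
]

def lookup_confidence(predicates, table=SYSTEM_TABLE):
    """Return the confidence of the valid model for this combination, or None."""
    wanted = frozenset(predicates)
    for record in table:
        if record.valid == 1 and frozenset(record.predicates) == wanted:
            return record.confidence
    return None
```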

S720. Determine a target predicate combination in the at least two second predicate combinations according to the confidence of the training model corresponding to each second predicate combination in the at least two second predicate combinations, where the confidence is used to indicate the accuracy of the training model;

Optionally, the query optimizer 114 may select an appropriate or optimal predicate combination, that is, a target predicate combination, according to the confidence of the training model corresponding to the second predicate combination.

Alternatively, the query optimizer 114 may also select the target predicate combination according to other screening criteria. For example, the query optimizer 114 may set a threshold screening condition, select a target predicate combination that satisfies the threshold screening condition among the plurality of second predicate combinations, and eliminate other second predicate combinations that do not satisfy the threshold screening condition.

Optionally, the confidence of the training model corresponding to the target predicate combination is greater than the confidence of the training models corresponding to the other predicate combinations in the at least two second predicate combinations; that is, the confidence of the training model corresponding to the target predicate combination is the largest among all the second predicate combinations.

That is, the query optimizer 114 may compare the confidences of the training models corresponding to the plurality of second predicate combinations and pick out the maximum confidence, thereby determining the target predicate combination and eliminating the other predicate combinations.

For example, if the confidence of the training model corresponding to predicate combination 1 (PRED1, PRED2) is 0.76, and the confidence of the training model corresponding to predicate combination 2 (PRED1, PRED4) is 0.93, then predicate combination 2 (PRED1, PRED4), which has the greater confidence, is selected as the target predicate combination, and predicate combination 1 is eliminated.

It should be understood that the description of the predicate combination 1 and the predicate combination 2 is merely taken as an example. In practice, a plurality of predicate combinations may be included, which is not limited thereto.
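Selecting the target predicate combination by maximum confidence can be sketched in a few lines. The dictionary layout and the name `select_target` are assumptions for illustration.

```python
def select_target(confidences):
    """Return the predicate combination whose training model has the highest confidence."""
    return max(confidences, key=confidences.get)

# Two second predicate combinations that share PRED1, with the confidences from the example.
overlapping = {
    ("PRED1", "PRED2"): 0.76,
    ("PRED1", "PRED4"): 0.93,
}
target = select_target(overlapping)
eliminated = [combo for combo in overlapping if combo != target]
```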

S730: Determine a second execution plan by using a training model corresponding to the target predicate combination, and perform data query using the second execution plan.

Specifically, the query optimizer 114 may perform a corresponding calculation using the training model corresponding to the target predicate combination to obtain a corresponding execution plan (such as a second execution plan), thereby performing data query using the second execution plan. Since the target predicate combination is the filtered optimal predicate combination, the query optimizer 114 can obtain an optimal execution plan according to the training model corresponding to the target predicate combination.

In an embodiment of the present application, the query optimizer 114 determines at least two second predicate combinations, where each of the at least two second predicate combinations includes at least two predicates, the at least two second predicate combinations have at least one identical predicate, and each of the at least two second predicate combinations has a corresponding training model. The query optimizer 114 determines a target predicate combination in the at least two second predicate combinations according to the confidence of the training model corresponding to each second predicate combination, determines a second execution plan using the training model corresponding to the target predicate combination, and then uses the second execution plan to perform a data query. In this way, the accuracy of the predicate selection rate can be improved when calculating the selection rate of predicate combinations having overlapping predicates.

It should be understood that the method 600 and the method 700 may be used in combination or independently. For example, in a plurality of candidate predicate combinations, some predicate combinations may have no repeated predicates while others have repeated predicates; or there may be only predicate combinations without repeated predicates; or there may be only predicate combinations having repeated predicates. This is not limited in the embodiments of the present application.

How to calculate the confidence of the training model is described below. It should be understood that the confidence of the training model can be calculated by using various evaluation methods; only one possible calculation method is described here as an example, and the embodiments of the present application are not limited thereto. It should also be understood that the "calculation operation of the confidence of the training model" and the "training operation of the training model" may be performed by the same execution entity, which may be a module independent of the database or another implementation located outside the database; this is not limited here. It should also be understood that the kernel of the database can establish associated system table meta-information with an external training model in order to learn the training results or related data of the training model.

For example, the calculation process of the confidence level of the first training model is taken as an example, and may include:

Obtaining a first training predicate combination, and calculating a first selection rate of the first training predicate combination;

Substituting the first training predicate combination into a corresponding first training model, and calculating a second selection rate of the first training predicate combination, where the first training model is the training model corresponding to any one of the at least two predicate combinations;

Calculating, according to the first selection rate and the second selection rate, a first confidence level corresponding to the first training predicate combination;

A confidence level of the first training model is determined based on the plurality of the first confidence levels.

Specifically, assume for example that the first training predicate combination is PRED1 = const1 and PRED2 = const2, and that the function corresponding to the first training model is $f_{ml}$. First, the selection rate $S_{ml}$ of the training predicate combination is calculated according to the first training model, as shown in the following formula:

$$S_{ml} = f_{ml}(\mathrm{const1}, \mathrm{const2})$$

Then the true selection rate $S$ of the first training predicate combination is calculated, as shown in the following equation:

$$S = \frac{\mathrm{count}(\mathrm{const1}, \mathrm{const2})}{\mathrm{count}(*)}$$

Here, count has the same meaning as count in SQL and indicates the number of tuples that satisfy a predicate condition. For example, if there are 10 pieces of data in a table, among which 4 tuples satisfy the predicate condition PRED1 = const1 and PRED2 = const2, the result of count(const1, const2) is 4, and the result of count(*) is 10.
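The true selection rate in this example is the number of matching tuples divided by the total number of tuples. A minimal sketch, assuming the table is held as a list of two-column tuples (the data and the name `true_selectivity` are invented for illustration):

```python
def true_selectivity(rows, const1, const2):
    """S = (number of tuples with col1 == const1 and col2 == const2) / (number of tuples)."""
    matching = sum(1 for a, b in rows if a == const1 and b == const2)
    return matching / len(rows)

# Ten hypothetical (col1, col2) tuples, four of which satisfy both predicates.
rows = [(1, 2)] * 4 + [(1, 0)] * 3 + [(0, 2)] * 3
S = true_selectivity(rows, const1=1, const2=2)
```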

Here, assume that the calculation results are $S_{ml} = 0.3$ and $S = 0.28$. The first confidence level of the first training model for the first training predicate combination is denoted $c_1$, and the value of $c_1$ is defined as follows:

Specifically, since $S_{ml}/S = 0.3/0.28 \approx 1.07$, $c_1 = 1$; if instead $S_{ml} = 0.3$ and $S = 0.38$, then since $S_{ml}/S = 0.3/0.38 \approx 0.79$, $c_1 = 0$.

The above describes a way to calculate confidence. Similarly, for multiple training predicate combinations, the corresponding confidence can be calculated in a similar way. That is, the query optimizer 114 may acquire a plurality of first training predicate combinations to obtain a first confidence level corresponding to each first training predicate combination. The query optimizer 114 then calculates the confidence of the first training model using a plurality of first confidences.

For example, n training predicate combinations correspond to n values of $c_i$. The query optimizer 114 combines the n values of $c_i$ and calculates the confidence $C$ of the training model as:

where $i \in \{1, 2, \ldots, n\}$ and n is the number of training predicate combinations.

Here, the plurality of first training predicate combinations can be understood as data that was not used to train the training model and that serves to verify the accuracy of the training model. That is to say, the model can be verified by using data that is not part of the model training, so as to obtain a value for the accuracy of the training model.
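The confidence calculation can be sketched as follows under two loudly stated assumptions that are not taken from the description: each per-sample confidence is 1 when the ratio of model estimate to true selection rate lies within a tolerance of 1 (0.1 is an arbitrary choice that happens to agree with the two worked values 1.07 and 0.79), and the model confidence is the mean of the per-sample values. Other definitions are equally possible.

```python
def per_sample_confidence(s_ml, s_true, tolerance=0.1):
    """Return 1 if s_ml / s_true is within the ASSUMED tolerance of 1, otherwise 0."""
    if s_true == 0:
        return 1 if s_ml == 0 else 0
    return 1 if abs(s_ml / s_true - 1.0) <= tolerance else 0

def model_confidence(samples, tolerance=0.1):
    """Aggregate per-sample confidences; using the mean is an ASSUMED aggregation rule."""
    scores = [per_sample_confidence(s_ml, s_true, tolerance) for s_ml, s_true in samples]
    return sum(scores) / len(scores)

c_first = per_sample_confidence(0.3, 0.28)    # ratio is about 1.07
c_second = per_sample_confidence(0.3, 0.38)   # ratio is about 0.79
C = model_confidence([(0.3, 0.28), (0.3, 0.38)])
```

The samples passed to `model_confidence` would be held-out predicate combinations, that is, pairs of model estimate and true selection rate that were not used for training.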

It should be understood that the first training model is used as an example for description. In the embodiment of the present application, the confidence of each training model may be calculated by using the foregoing method, which is not limited thereto.

Therefore, for a plurality of predicate combinations having repeated predicates, the query optimizer 114 may determine the target predicate combination among the plurality of second predicate combinations according to the confidence of the training model corresponding to each predicate combination, determine a second execution plan using the training model corresponding to the target predicate combination, and then use the second execution plan to perform a data query, thereby improving the accuracy of the predicate selection rate when calculating the selection rate of predicate combinations having overlapping predicates.

Optionally, determining a second execution plan by using a training model corresponding to the target predicate combination, including:

Obtaining a model parameter of the training model corresponding to the target predicate combination, where the training model parameter includes at least one of a weight and an offset;

The second execution plan is generated using the model parameters.

Specifically, the query optimizer 114 may look up, in the system table of the database, the model parameters of the training model corresponding to the target predicate combination. The model parameters may include the training results of the training model, such as weights and offsets. For example, the weights may be the neuron connection weights in a neural network training model, including the weights between the input layer and the output layer, a hidden layer threshold, an output layer threshold, and the weight matrix between the hidden layer and the output layer; the offset may be the offset corresponding to the weights obtained by training the neural network training model. Thus, the query optimizer 114 calculates the predicate selection rate based on the model parameters and in turn generates a second execution plan.
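To make the role of the stored weights and offsets concrete, the sketch below evaluates a small feed-forward network to obtain a selection-rate estimate. The one-hidden-layer architecture, the sigmoid activations, the parameter names, and all numeric values are assumptions for illustration; the description does not fix the network structure.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_selectivity(inputs, params):
    """Forward pass of a one-hidden-layer network (an ASSUMED architecture)."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + offset)
        for weights, offset in zip(params["hidden_weights"], params["hidden_offsets"])
    ]
    output = sum(w * h for w, h in zip(params["output_weights"], hidden))
    # The final sigmoid keeps the estimate strictly between 0 and 1.
    return sigmoid(output + params["output_offset"])

# Made-up parameters standing in for values read from the system table.
params = {
    "hidden_weights": [[0.5, -0.2], [0.1, 0.3]],
    "hidden_offsets": [0.0, -0.1],
    "output_weights": [1.2, -0.7],
    "output_offset": -0.5,
}

# Inputs are hypothetical normalized encodings of const1 and const2.
estimate = predict_selectivity([0.4, 0.9], params)
```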

It should be understood that for at least one predicate combination having no repeated predicates as described above, the corresponding execution plan may also be obtained by referring to the method introduced herein, and for brevity, no further description is made.

Optionally, in the at least two second predicate combinations determined from the plurality of candidate predicate combinations, the confidence of the training model corresponding to each second predicate combination satisfies a preset condition.

Specifically, the query optimizer 114 may acquire a plurality of candidate predicate combinations, which are candidate predicate combinations selected by the query optimizer 114 based on the machine learning algorithm, or may be understood as predicate combinations supported by the trained model. For example, the plurality of candidate predicate combinations may be: predicate combination 1 (PRED1, PRED2); predicate combination 2 (PRED1, PRED4); predicate combination 3 (PRED3, PRED5). Then, the query optimizer 114 may select, among the plurality of candidate predicate combinations, the at least two second predicate combinations that satisfy the preset condition. For example, the query optimizer 114 may evaluate the confidence of the training model corresponding to each candidate predicate combination and take the candidate predicate combinations that satisfy the preset condition as the at least two second predicate combinations, so as to facilitate subsequently determining a target predicate combination among the at least two second predicate combinations.

Optionally, the confidence of the training model corresponding to each of the at least two second predicate combinations is greater than the first threshold.

Here, the first threshold can be understood as a constant recognized internally by the query optimizer 114. If the confidence level of the training model of a certain set of predicate combinations is greater than the first threshold, the accuracy of the training model is considered to be higher.

For example, suppose the first threshold is 0.3, the confidence of the training model corresponding to predicate combination 1 (PRED1, PRED2) is 0.76, the confidence of the training model corresponding to predicate combination 2 (PRED1, PRED4) is 0.93, and the confidence of the training model corresponding to predicate combination 3 (PRED3, PRED5) is 0.26. Then, when determining the at least two second predicate combinations, the query optimizer 114 selects the predicate combinations whose confidence is greater than 0.3, that is, predicate combination 1 (PRED1, PRED2) and predicate combination 2 (PRED1, PRED4), and eliminates the predicate combination whose confidence is less than 0.3, that is, predicate combination 3 (PRED3, PRED5).
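The threshold screening in this example can be written as a simple filter. The dictionary layout and the name `filter_by_threshold` are hypothetical.

```python
def filter_by_threshold(confidences, threshold):
    """Keep only the combinations whose model confidence is strictly greater than threshold."""
    return {combo: c for combo, c in confidences.items() if c > threshold}

confidences = {
    ("PRED1", "PRED2"): 0.76,
    ("PRED1", "PRED4"): 0.93,
    ("PRED3", "PRED5"): 0.26,
}
kept = filter_by_threshold(confidences, threshold=0.3)
```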

Alternatively, the query optimizer 114 may also set a filter condition: it sorts the confidences of the training models of all predicate combinations and then adopts the training models whose confidence ranks within a first proportion of the ranking (such as the top 30%). A training model with a lower confidence ranking (such as one in the bottom 70%) is considered not to meet the filter condition and is not adopted by the query optimizer 114.
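The ranking-based filter condition can be sketched as below. How the cut-off is rounded and whether at least one model is always kept are not specified in the description, so rounding down with a minimum of one is an assumption, as are the model names.

```python
import math

def filter_by_top_ratio(confidences, ratio=0.3):
    """Keep the top `ratio` share of models ranked by confidence (ASSUMED: floor, at least one)."""
    ranked = sorted(confidences, key=confidences.get, reverse=True)
    keep = max(1, math.floor(len(ranked) * ratio))
    return ranked[:keep]

# Ten hypothetical models M0..M9 with confidences 0.0, 0.1, ..., 0.9.
model_confidences = {f"M{i}": i / 10 for i in range(10)}
adopted = filter_by_top_ratio(model_confidences, ratio=0.3)
```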

Therefore, by introducing a threshold or a filter condition, the query optimizer 114 can select, among the plurality of candidate predicate combinations, the at least two predicate combinations that satisfy the preset condition, thereby obtaining training models with higher accuracy so as to facilitate the subsequent output of an execution plan.

To help those skilled in the art clearly understand the embodiments of the present application, examples are described below with reference to FIGS. 8 to 11.

FIG. 8 shows a schematic diagram of an example of a plurality of candidate predicate combinations in accordance with an embodiment of the present application. As shown in FIG. 8, the database management system 108 can receive an SQL query statement submitted by the client through a communication connection established with the database server (as shown in the uppermost box of FIG. 8), in which the underlined portions are constant predicates (for example, a constant predicate can be a constant expression or a constant function). Next, the parser 112 of the database management system 108 can analyze the SQL query statement to obtain the predicates that can be supported by the training model (or machine learning model); after analysis, PRED1, PRED2, PRED3, PRED4, and PRED5 are obtained (the underlined predicates in the middle box of FIG. 8). Here, the query optimizer 114 determines that the connection predicate is not supported by the training model. Further, the query optimizer 114 may obtain a plurality of candidate predicate combinations based on PRED1, PRED2, PRED3, PRED4, and PRED5. As shown in the lowermost box of FIG. 8, the query optimizer 114 obtains three predicate combinations with two-column selection rates (that is, two predicates in each predicate combination), namely: PRED2 and PRED1, PRED1 and PRED4, and PRED3 and PRED5. Each predicate combination corresponds to one training model, and each training model has a confidence level. Thus, the query optimizer 114 can perform subsequent operations based on the plurality of candidate predicate combinations.

FIG. 9 shows a flow chart of an example in accordance with an embodiment of the present application. As shown in FIG. 9, the query optimizer 114 can acquire a plurality of candidate predicate combinations (such as the plurality of candidate predicate combinations shown in FIG. 8) by the preliminary screening operation, and judge the confidence of each candidate predicate combination. If it is determined that the confidence does not satisfy the preset condition, the candidate predicate combination is eliminated; if it is determined that the confidence meets the preset condition, the remaining candidate predicate combinations are subjected to secondary screening. It should be understood that the preset condition may be a threshold or other screening conditions, which is not limited thereto. Optionally, the query optimizer 114 may also determine whether the training model corresponding to the candidate predicate combination is valid (such as a valid value), and may enter the next operation when the training model is valid.

Then, in a secondary screening operation, for at least two predicate combinations with repeated or identical predicates, the query optimizer 114 needs to determine which of them has the greatest confidence. The query optimizer 114 selects the predicate combination with the greatest confidence among the at least two predicate combinations with repeated or identical predicates as the winning predicate combination, uses the training model corresponding to the winning predicate combination to calculate the corresponding selection rate, and finally outputs an optimal execution plan. Optionally, the query optimizer 114 may eliminate the other predicate combinations whose confidence is not the greatest. Therefore, the query optimizer 114 can obtain the training model corresponding to the optimal predicate combination through two rounds of screening, and perform corresponding calculations to obtain an optimal execution plan.

Optionally, in the secondary screening operation, the query optimizer 114 may also obtain a predicate combination that has no repeated predicate but whose corresponding confidence also satisfies the foregoing preset condition (not shown in the figure). In this case, the query optimizer 114 can use the corresponding training model to perform corresponding calculations to obtain the corresponding execution plan.

It should be understood that the at least two predicate combinations may have one or more repeated or identical predicates, which is not limited here.

FIG. 10 shows a flow chart of a specific example in accordance with an embodiment of the present application. Here, FIG. 10 further illustrates the preceding flow with concrete values. As shown in FIG. 10, the three candidate predicate combinations obtained by the query optimizer 114 through the preliminary screening are: PRED1 and PRED2, PRED1 and PRED4, and PRED3 and PRED5. It can be seen that the combinations PRED1 and PRED2 and PRED1 and PRED4 share the repeated predicate PRED1. The confidence of the training model corresponding to PRED1 and PRED2 is 0.76; the confidence of the training model corresponding to PRED1 and PRED4 is 0.93; and the confidence of the training model corresponding to PRED3 and PRED5 is 0.26. Next, the query optimizer 114 determines whether the respective confidence levels of the three candidate predicate combinations are greater than 0.3. Since 0.26 is less than 0.3, the query optimizer 114 eliminates the predicate combination PRED3 and PRED5; since 0.76 and 0.93 are both greater than 0.3, the query optimizer 114 performs secondary screening on PRED1 and PRED2 and on PRED1 and PRED4. Next, the query optimizer 114 compares the confidence levels of the predicate combinations with the repeated predicate PRED1 (that is, PRED1 and PRED2, and PRED1 and PRED4), selects the predicate combination with the highest confidence, here PRED1 and PRED4, and eliminates PRED1 and PRED2. Finally, the query optimizer 114 performs corresponding calculations using the training model corresponding to PRED1 and PRED4 to output an execution plan.
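The two screening steps in this example can be combined into one short routine. The sketch below reproduces the values above; resolving overlaps by visiting combinations in descending order of confidence is an assumed way of generalizing "keep the most confident of the overlapping combinations" to more than two combinations, and the name `two_stage_screening` is hypothetical.

```python
def two_stage_screening(confidences, threshold=0.3):
    """Return the predicate combinations whose models are used after both screenings."""
    # Preliminary screening: drop combinations whose confidence is not above the threshold.
    survivors = {frozenset(c): v for c, v in confidences.items() if v > threshold}
    # Secondary screening (ASSUMED greedy rule): visit combinations by descending
    # confidence and drop any that shares a predicate with one already kept.
    selected, used = [], set()
    for combo in sorted(survivors, key=survivors.get, reverse=True):
        if used.isdisjoint(combo):
            selected.append(combo)
            used |= combo
    return selected

candidates = {
    ("PRED1", "PRED2"): 0.76,
    ("PRED1", "PRED4"): 0.93,
    ("PRED3", "PRED5"): 0.26,
}
winners = two_stage_screening(candidates)
```

With these inputs only the combination of PRED1 and PRED4 remains. A combination with no repeated predicate and sufficient confidence would pass both steps unchanged.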

It should be understood that the description of the repeated predicate PRED1 is taken as an example. In practice, there may be multiple repeated predicates, and the method of the embodiment of the present application may also be used, which is not limited thereto.

Optionally, in the secondary screening operation, it is also possible to obtain a predicate combination that has no repeated predicate but whose corresponding confidence is also greater than 0.3, such as PRED6 and PRED7 (not shown in FIG. 10). In this case, the query optimizer 114 can perform corresponding calculations using the training model corresponding to PRED6 and PRED7 to obtain the corresponding execution plan.

FIG. 11 is a diagram showing an application example of an embodiment of the present application. The predicate combination that wins in FIG. 10 is shown visually in FIG. 11. As shown in FIG. 11, among the plurality of candidate predicate combinations (PRED1 and PRED2, PRED1 and PRED4, PRED3 and PRED5), the predicate combination that the query optimizer 114 finally selects as the winner is PRED1 and PRED4.

It should be understood that the foregoing description is only made by taking FIG. 10 and FIG. 11 as an example, and does not limit the embodiment of the present application.

The data query method of the embodiment of the present application can improve the accuracy of the predicate selection rate, thereby improving the query performance of the data query. Further, for at least two predicate combinations having repeated predicates, the training model corresponding to the predicate combination with high confidence is selected according to the confidence degree, and the accuracy of the predicate selection rate can be improved.

It should be understood that, in the various embodiments of the present application, the sequence numbers of the foregoing processes do not imply an order of execution. The order of execution of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.

The method of data query according to an embodiment of the present application is described in detail above, and an apparatus and database system for data query according to an embodiment of the present application will be described below. The device for querying the data and the database system can perform the method of data query of the foregoing embodiment of the present application.

FIG. 12 shows a schematic block diagram of an apparatus 1200 for data query in accordance with an embodiment of the present application. As shown in FIG. 12, the apparatus 1200 includes:

The receiving module 1210 is configured to receive a query statement.

The processing module 1220 is configured to parse the query statement to obtain a plurality of predicates, and is further configured to perform a predicate combination on the plurality of predicates to obtain a plurality of predicate combinations;

The first determining module 1230 is configured to determine, in the plurality of predicate combinations, a plurality of candidate predicate combinations supported by the pre-configured training model according to a type of the pre-configured training model, where each candidate predicate combination in the plurality of candidate predicate combinations includes at least two predicates;

The first determining module 1230 is further configured to determine, in the plurality of candidate predicate combinations, a first predicate combination, where the first predicate combination includes predicates different from each other;

The processing module 1220 is further configured to determine a first execution plan by using a training model corresponding to the first predicate combination, and perform data query by using the first execution plan.

The apparatus 1200 for data query in the embodiment of the present application may determine, in a plurality of candidate predicate combinations, a first predicate combination that does not have the same predicate. Since there is a corresponding training model for each candidate predicate combination, if the first predicate combination does not have the same predicate, the first execution plan may be determined using the training model corresponding to the first predicate combination, that is, the first predicate combination is used. The training model calculates the predicate selection rate, thereby generating a first execution plan and performing a data query based on the first execution plan. In other words, for a predicate combination, the training model of the predicate combination can be obtained based on the relevance of the predicate, thereby calculating the predicate selection rate. It is not necessary to separately calculate the predicate selection rate of each predicate in a predicate combination, and multiply each predicate selection rate. That is to say, the method of calculating the predicate selection rate by using the training model considers the relevance of the predicate, and the obtained predicate selection rate is more accurate, thereby improving the query performance.

It should be noted that in the embodiment of the present application, the apparatus 1200 may be the query optimizer 114 described above or a software/hardware functional unit integrated in the query optimizer 114. For example, the receiving module 1210 can be implemented by a receiver, or a communication interface, and the functions of the processing module 1220 and the first determining module 1230 can be implemented by at least one processor executing instructions in memory. Optionally, the components in the database query device may be coupled together by a bus system, wherein the bus system includes a power bus, a control bus, a status signal bus, and the like in addition to the data bus.

Optionally, as an embodiment, the first determining module 1230 is further configured to: determine, in the plurality of candidate predicate combinations, at least two second predicate combinations, the at least two second predicate combinations having at least one identical predicate;

As shown in FIG. 13 , as an embodiment, the apparatus 1200 further includes:

A second determining module 1240, configured to determine a target predicate combination in the at least two second predicate combinations according to a confidence level of a training model corresponding to each second predicate combination in the at least two second predicate combinations, where the confidence level is used to indicate the accuracy of the training model;

The processing module 1220 is further configured to determine a second execution plan by using a training model corresponding to the target predicate combination, and perform data query using the second execution plan.

Optionally, as an embodiment, the apparatus 1200 further includes:

An obtaining module, configured to acquire a confidence level of the training model corresponding to each second predicate combination in the at least two second predicate combinations.

Optionally, the confidence level of the training model corresponding to the target predicate combination is the largest among the at least two second predicate combinations.

Optionally, for the at least two second predicate combinations determined from the plurality of candidate predicate combinations, the confidence level of the training model corresponding to each second predicate combination satisfies a preset condition.

Optionally, a confidence level of the training model corresponding to each of the at least two second predicate combinations is greater than a first threshold.
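The confidence-based selection described above can be sketched as follows. The function name `choose_target`, the confidence values, and the fallback of returning `None` when no combination passes the threshold are assumptions made for this illustration and are not specified by the embodiments.

```python
from typing import Dict, FrozenSet, Optional

Combination = FrozenSet[str]


def choose_target(
    confidences: Dict[Combination, float], first_threshold: float
) -> Optional[Combination]:
    """Pick the overlapping combination whose training model is most trusted."""
    # Preset condition: keep only combinations whose confidence exceeds the threshold.
    eligible = {c: v for c, v in confidences.items() if v > first_threshold}
    if not eligible:
        return None  # assumed fallback, for example to conventional estimation
    # The target predicate combination has the greatest confidence level.
    return max(eligible, key=eligible.get)


# Hypothetical confidence levels for combinations that share the predicate "c < 5".
confidences = {
    frozenset({"c < 5", "d = 7"}): 0.92,
    frozenset({"c < 5", "e = 0"}): 0.81,
    frozenset({"c < 5", "f = 3"}): 0.40,
}
target = choose_target(confidences, first_threshold=0.5)
```

With these values the third combination is filtered out by the threshold and the first combination is chosen as the target predicate combination.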

Optionally, the processing module 1220 is specifically configured to:

obtain a model parameter of the training model corresponding to the target predicate combination, the model parameter including at least one of a weight and an offset; and generate the second execution plan by using the model parameter.
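As a hedged illustration of how a weight and an offset might be turned into a predicate selection rate, the sketch below assumes a single-layer model with a sigmoid output. The model form, the feature encoding, the `index_cutoff` value, and the names `ModelParams`, `predict_selectivity`, and `choose_scan` are all assumptions for this example; the embodiments only state that the model parameter includes at least one of a weight and an offset.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ModelParams:
    weights: Sequence[float]  # one weight per input feature (assumed layout)
    offset: float             # bias term of the model


def predict_selectivity(params: ModelParams, features: Sequence[float]) -> float:
    """Map encoded predicate constants to a selection rate in (0, 1)."""
    z = sum(w * x for w, x in zip(params.weights, features)) + params.offset
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid keeps the estimate in (0, 1)


def choose_scan(selectivity: float, index_cutoff: float = 0.1) -> str:
    """Toy plan choice: prefer an index scan when few rows are expected."""
    return "index_scan" if selectivity < index_cutoff else "seq_scan"


# Hypothetical parameters and normalized predicate constants.
params = ModelParams(weights=[2.0, -1.0], offset=-3.0)
sel = predict_selectivity(params, [0.5, 1.0])
plan = choose_scan(sel)
```

A real optimizer would feed the estimated selection rate into its cost model for each operator rather than apply a single cutoff; the cutoff here only shows where the estimate enters plan generation.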

The apparatus 1200 for data query according to an embodiment of the present application may perform the method 600 or 700 of data query according to an embodiment of the present application, and the above and other operations and/or functions of the respective modules in the apparatus 1200 are respectively intended to implement the corresponding processes of the foregoing methods, which are not described herein again for the sake of brevity. Additionally, the functions of the second determining module 1240 and the obtaining module may also be implemented by at least one processor executing instructions in memory. The apparatus 1200 for data query in the embodiment of the present application may select, among a plurality of candidate predicate combinations, a first predicate combination that does not have the same predicate. Since each candidate predicate combination has a corresponding training model, if the first predicate combination includes different predicates, the first execution plan may be determined using the training model corresponding to the first predicate combination. That is, the training model corresponding to the first predicate combination is used to calculate the predicate selection rate, a first execution plan is generated accordingly, and a data query is performed based on the first execution plan. In other words, for a predicate combination, the training model of the predicate combination can be obtained based on the correlation between the predicates and then used to calculate the predicate selection rate. It is not necessary to calculate the selection rate of each predicate in the combination separately and then multiply the individual selection rates together. Because calculating the predicate selection rate with the training model takes the correlation between predicates into account, the obtained predicate selection rate is more accurate, which improves query performance.

FIG. 14 shows a schematic block diagram of a database system 1400 in accordance with an embodiment of the present application. As shown in FIG. 14, the database system 1400 includes the apparatus 1200 for data query of the foregoing embodiment of the present application and a database 1410. The database system 1400 can perform the foregoing method of data query in the embodiment of the present application and perform a query on the database 1410.

FIG. 15 shows the structure of an apparatus for data query provided by an embodiment of the present application, including at least one processor 1502 (for example, a CPU), at least one network interface 1503 or other communication interface, and a memory 1504. Optionally, a receiver 1505 and a transmitter 1506 may also be included. The processor 1502 is configured to execute an executable module, such as a computer program, stored in the memory 1504. The memory 1504 may include a high-speed random access memory (RAM), and may also include a non-volatile memory such as at least one disk memory. A communication connection with at least one other network element is achieved through the at least one network interface 1503, which may be wired or wireless. The receiver 1505 and the transmitter 1506 are used to transmit various signals or information.

In some embodiments, the memory 1504 stores a program 15041 that can be executed by the processor 1502 for performing the method of data query of the foregoing embodiments of the present application.

Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the various examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the solution. A person skilled in the art can use different methods to implement the described functions for each particular application, but such implementation should not be considered to be beyond the scope of the embodiments of the present application.

A person skilled in the art can clearly understand that for the convenience and brevity of the description, the specific working process of the system, the device and the unit described above can refer to the corresponding process in the foregoing method embodiment, and details are not described herein again.

In the several embodiments provided by the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the device embodiments described above are merely illustrative. For example, the division of the units is only a logical function division. In actual implementation, there may be another division manner; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.

If the functions are implemented in the form of a software functional unit and sold or used as a standalone product, they may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the embodiments of the present application, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium. The software product includes a plurality of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

The foregoing is only a specific implementation of the embodiments of the present application, but the scope of protection of the embodiments of the present application is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed in the embodiments of the present application shall fall within the scope of protection of the embodiments of the present application. Therefore, the scope of protection of the embodiments of the present application is subject to the scope of protection of the claims.

Claims

A method for data query, comprising:

Receiving a query statement;

Parsing the query statement to obtain a plurality of predicates;

Performing a predicate combination on the plurality of predicates to obtain a plurality of predicate combinations;

Determining, in the plurality of predicate combinations, a plurality of candidate predicate combinations corresponding to a pre-configured training model according to a type of the pre-configured training model, each candidate predicate combination of the plurality of candidate predicate combinations including at least two predicates;

Determining, in the plurality of candidate predicate combinations, a first predicate combination, the first predicate combination including predicates different from each other;

Determining a first execution plan by using the training model corresponding to the first predicate combination, and performing a data query using the first execution plan.
The method of claim 1 further comprising:

Determining at least two second predicate combinations in the plurality of candidate predicate combinations, the at least two second predicate combinations having at least one identical predicate;

Determining a target predicate combination in the at least two second predicate combinations according to a confidence level of a training model corresponding to each of the at least two second predicate combinations, the confidence level being used to indicate the accuracy of the training model;

Determining a second execution plan by using a training model corresponding to the target predicate combination, and performing a data query using the second execution plan.
The method of claim 2, wherein the method further comprises:

Obtaining a confidence level of a training model corresponding to each of the at least two second predicate combinations.
The method according to claim 3, wherein said target predicate combination is a predicate combination having the greatest confidence of the training model in said at least two second predicate combinations.
The method according to any one of claims 2 to 4, wherein, for the at least two second predicate combinations determined from the plurality of candidate predicate combinations, the confidence level of the training model corresponding to each second predicate combination satisfies a preset condition.
The method according to claim 5, wherein the confidence of the training model corresponding to each of the at least two second predicate combinations is greater than a first threshold.
The method according to any one of claims 2 to 6, wherein the determining the second execution plan by using the training model corresponding to the target predicate combination comprises:

Obtaining a model parameter of the training model corresponding to the target predicate combination, where the model parameter includes at least one of a weight and an offset;

Generating the second execution plan by using the model parameter.
A device for data query, comprising:

a receiving module, configured to receive a query statement;

a processing module, configured to parse the query statement to obtain a plurality of predicates; and further configured to perform a predicate combination on the plurality of predicates to obtain a plurality of predicate combinations;

a first determining module, configured to determine, in the plurality of predicate combinations, a plurality of candidate predicate combinations corresponding to a pre-configured training model according to a type of the pre-configured training model, where each candidate predicate combination of the plurality of candidate predicate combinations includes at least two predicates;

The first determining module is further configured to determine, in the plurality of candidate predicate combinations, a first predicate combination, where the first predicate combination includes predicates different from each other;

The processing module is further configured to determine a first execution plan by using a training model corresponding to the first predicate combination, and perform data query using the first execution plan.
The apparatus according to claim 8, wherein the first determining module is further configured to determine at least two second predicate combinations in the plurality of candidate predicate combinations, the at least two second predicate combinations having at least one identical predicate;

The device also includes:

a second determining module, configured to determine a target predicate combination in the at least two second predicate combinations according to a confidence level of a training model corresponding to each second predicate combination in the at least two second predicate combinations, the confidence level being used to indicate the accuracy of the training model;

The processing module is further configured to determine a second execution plan by using a training model corresponding to the target predicate combination, and perform data query using the second execution plan.
The device according to claim 9, wherein the device further comprises:

an obtaining module, configured to acquire a confidence level of the training model corresponding to each second predicate combination in the at least two second predicate combinations.
The apparatus according to claim 10, wherein said target predicate combination is a predicate combination having the greatest confidence of the training model in said at least two second predicate combinations.
The apparatus according to any one of claims 9 to 11, wherein, for the at least two second predicate combinations determined from the plurality of candidate predicate combinations, the confidence level of the training model corresponding to each second predicate combination satisfies a preset condition.
The apparatus according to claim 12, wherein a confidence level of the training model corresponding to each of the at least two second predicate combinations is greater than a first threshold.
The device according to any one of claims 9 to 13, wherein the processing module is specifically configured to:

obtain a model parameter of the training model corresponding to the target predicate combination, the model parameter including at least one of a weight and an offset; and generate the second execution plan by using the model parameter.
An apparatus for data query, the apparatus comprising at least one processor, a memory, and instructions stored on the memory and executable by the at least one processor, wherein the at least one processor executes the instructions to implement the steps of the method of any one of claims 1 to 7.
A computer readable storage medium having stored thereon a computer program, wherein the program is executed by a processor to perform the steps of the method of any one of claims 1 to 7.
A database system, comprising a database and the device for data query according to any one of claims 8 to 14.