WO2018201916A1 - Data query method, device, and database system - Google Patents

Data query method, device, and database system

Info

Publication number
WO2018201916A1
Authority
WO
WIPO (PCT)
Prior art keywords
predicate
combinations
training model
combination
candidate
Prior art date
Application number
PCT/CN2018/083826
Other languages
French (fr)
Chinese (zh)
Inventor
杨新颖
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2018201916A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • the present application relates to the field of databases and, more particularly, to a method, apparatus and database system for data query.
  • when the database system processes a query from the client, for example a query expressed in Structured Query Language (SQL), the query needs to be parsed, precompiled, and optimized to produce an execution plan, which is then executed.
  • the optimizer is the component of the database system that most affects the execution efficiency of an SQL statement; it outputs the execution plan that it estimates to have the lowest cost (also called the optimal execution plan).
  • the selection rate estimation of the predicate is very important. The accuracy of the predicate selection rate estimation directly affects the accuracy of the optimizer's subsequent estimation of the operator's cost in the execution plan, thus affecting the output of the overall optimal execution plan.
  • the application provides a data query method, device and database system to improve the accuracy of the predicate selection rate, thereby improving query performance.
  • a method of data query including:
  • the database server receives a query statement from the client and parses it to obtain a plurality of predicates; performs predicate combination on the plurality of predicates to obtain a plurality of predicate combinations; determines, according to the type of the pre-configured training model, a plurality of candidate predicate combinations among the plurality of predicate combinations that are supported by the pre-configured training model, each of the plurality of candidate predicate combinations comprising at least two predicates; determines a first predicate combination among the plurality of candidate predicate combinations, the first predicate combination including predicates that are different from each other; and finally determines a first execution plan using the training model corresponding to the first predicate combination and uses the first execution plan to perform the data query.
  • the training model of the predicate combination can be obtained based on the correlation of the predicate, thereby calculating the predicate selection rate. It is not necessary to separately calculate the predicate selection rate of each predicate in a predicate combination, and multiply the predicate selection rate of each predicate. That is to say, the method of calculating the predicate selection rate by using the training model considers the relevance of the predicate, and the obtained predicate selection rate is more accurate, thereby improving the query performance.
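The effect of ignoring predicate correlation can be illustrated with a small worked example. The sketch below uses a hypothetical four-row table (the data, `rows`, and `selectivity` are illustrative assumptions, not part of the application) and compares the product of the individual predicate selection rates with the true joint selection rate of two correlated predicates.

```python
# Hypothetical toy table of (city, country) rows; the two columns are correlated.
rows = [("Beijing", "CN"), ("Shanghai", "CN"), ("Paris", "FR"), ("Lyon", "FR")]


def selectivity(pred):
    """Return the fraction of rows for which the predicate is true."""
    return sum(1 for r in rows if pred(r)) / len(rows)


def is_paris(r):
    return r[0] == "Paris"


def is_france(r):
    return r[1] == "FR"


# Multiplying per-predicate selection rates assumes the predicates are independent.
independent_estimate = selectivity(is_paris) * selectivity(is_france)

# The true joint selection rate, which a model of the combination tries to approximate.
actual = selectivity(lambda r: is_paris(r) and is_france(r))

print(independent_estimate, actual)
```

Here the independence assumption underestimates the joint selection rate by a factor of two, because every row that satisfies the city predicate also satisfies the country predicate.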
  • the database server may select an appropriate predicate combination based on the confidence of the training model.
  • the method further includes:
  • the database server determines at least two second predicate combinations among the plurality of candidate predicate combinations, the at least two second predicate combinations having at least one identical predicate; determines a target predicate combination among the at least two second predicate combinations according to the confidence of the training model corresponding to each of the at least two second predicate combinations, where the confidence indicates the accuracy of the training model; and determines a second execution plan using the training model corresponding to the target predicate combination and uses the second execution plan to perform the data query.
  • the database server determines at least two second predicate combinations that have at least one identical predicate, determines a target predicate combination among them according to the confidence of the training model corresponding to each second predicate combination, determines a second execution plan using the training model corresponding to the target predicate combination, and then uses the second execution plan to perform the data query. In this way, the accuracy of the predicate selection rate can be improved when calculating the selection rate of predicate combinations that have overlapping predicates.
  • the database server may determine at least two second predicate combinations in the plurality of candidate predicate combinations, wherein a second predicate combination may include at least two predicates, the at least two second A predicate combination has at least one identical or repeated predicate.
  • the database server may select an appropriate or optimal predicate combination, such as a target predicate combination, according to the confidence of the training model corresponding to the second predicate combination.
  • the database server may also select the target predicate combination according to other filtering conditions. For example, the database server may set a threshold screening condition and select, among the at least two second predicate combinations, the second predicate combination that satisfies the threshold screening condition as the target predicate combination, eliminating the other second predicate combinations that do not satisfy the condition.
  • the method further includes:
  • the database server may obtain the confidence of the training model corresponding to each predicate combination from the system table of the database.
  • the system table in the database system includes training results (such as weights, offsets, and the like) of the training model of each predicate combination and the confidence of the model.
  • the confidence of the model is used to indicate the accuracy of the training model.
  • the confidence of the training model corresponding to the target predicate combination is greater than the confidence of the training model of the other second predicate combinations of the at least two second predicate combinations.
  • the confidence of the training model corresponding to each second predicate combination satisfies a preset condition.
  • the “preset condition” may be a specific threshold, or may be a specific screening condition.
  • the confidence of the training model corresponding to each of the at least two second predicate combinations is greater than the first threshold.
  • the second execution plan is determined by using the training model corresponding to the target predicate combination, including:
  • the database server acquires model parameters of the training model corresponding to the target predicate combination, the training model parameters include at least one of a weight and an offset; and the second execution plan is generated by using the model parameter.
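As an illustration of how stored model parameters could be turned into a predicate selection rate, the following minimal sketch assumes the simplest possible model: a single logistic unit with one weight per input feature and one offset. The application only states that the parameters include weights and/or offsets; the model form, the feature encoding, and the name `predict_selectivity` are assumptions made for this example.

```python
import math


def predict_selectivity(features, weights, offset):
    """Estimate a selection rate in (0, 1) from stored weights and an offset.

    Assumes a single logistic unit; a real training model may have more layers.
    """
    z = sum(w * x for w, x in zip(weights, features)) + offset
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical normalized predicate constants for a two-predicate combination.
estimate = predict_selectivity([0.3, 0.8], weights=[1.5, -2.0], offset=0.1)
print(round(estimate, 4))
```

The optimizer would then feed such an estimate into its cost model in place of the product of single-predicate selection rates.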
  • an apparatus for data query is provided.
  • the apparatus comprises a module or unit for performing the method of any of the above-described first aspect or any of the possible implementations of the first aspect.
  • an apparatus for data query includes a processor, a memory, and a communication interface.
  • the processor is coupled to the memory and the communication interface.
  • the memory is for storing instructions for the processor to execute, and the communication interface is for communicating with other network elements under the control of the processor.
  • the instructions when executed by the processor, cause the processor to perform the method of the first aspect or any of the possible implementations of the first aspect.
  • in a fourth aspect, a database system is provided, which includes a database and the apparatus for data query of the second aspect or the third aspect.
  • in a fifth aspect, a computer readable storage medium is provided, which stores a program that causes an apparatus for data query to perform the method of data query of the first aspect or any of its implementations.
  • FIG. 1 is a schematic structural diagram of a database system to which an embodiment of the present application is applied.
  • FIG. 2 is a schematic diagram of a stand-alone database system to which an embodiment of the present application is applied.
  • FIG. 3 is a schematic diagram of a cluster database system adopting a shared disk architecture according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a cluster database system employing a shared-nothing disk architecture according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a database server to which an embodiment of the present application is applied.
  • FIG. 6 is a schematic flowchart of a method for data query according to an embodiment of the present application.
  • FIG. 7 is a schematic flowchart of a method for data query according to another embodiment of the present application.
  • FIG. 8 is a schematic diagram of an example of a plurality of candidate predicate combinations in accordance with an embodiment of the present application.
  • FIG. 9 is a flow chart of an example in accordance with an embodiment of the present application.
  • FIG. 10 is a flow chart of a specific example in accordance with an embodiment of the present application.
  • FIG. 11 is a schematic diagram of an example of application of an embodiment of the present application.
  • FIG. 12 is a schematic block diagram of an apparatus for data query according to an embodiment of the present application.
  • FIG. 13 is a schematic block diagram of an apparatus for data query according to another embodiment of the present application.
  • FIG. 14 is a schematic block diagram of a database system in accordance with an embodiment of the present application.
  • FIG. 15 is a structural block diagram of an apparatus for data query provided by an embodiment of the present application.
  • the technical solution of the embodiment of the present application can be used in a database system or a database management system (DBMS), such as a relational database management system.
  • the architecture of the database system to which the embodiment of the present application is applied is as shown in FIG. 1.
  • the database system includes a database and a database management system DBMS.
  • a database refers to an organized collection of data stored in a data store, i.e., an associated set of data that is organized, stored, and used in accordance with a certain data model.
  • the database may include data of one or more tables.
  • DBMS is used to establish, use and maintain databases, and to manage and control the database in a unified manner to ensure the security and integrity of the database.
  • the user can access the data in the database through the DBMS, and the database administrator also performs database maintenance through the DBMS.
  • DBMS provides a variety of functions that enable multiple applications and user devices to use different methods to create, modify, and query databases at the same time or at different times. Applications and user devices can be collectively referred to as clients.
  • the functions provided by the DBMS can include the following items:
  • (1) data definition function: the DBMS provides a data definition language (DDL) to define the database structure; the DDL is used to describe the database framework and can be saved in the data dictionary;
  • (2) data access function: the DBMS provides a data manipulation language (DML) to implement basic access operations on database data, such as retrieval, insertion, modification, and deletion;
  • (3) database operation management function: the DBMS provides data control functions, that is, data security, integrity, and concurrency control, to effectively control and manage database operations and ensure that the data is correct and valid;
  • (4) database establishment and maintenance functions: including initial data loading, database dump, recovery, reorganization, and system performance monitoring and analysis;
  • (5) database transmission: the DBMS provides data transmission processing to implement communication between the client and the DBMS, usually in coordination with the operating system.
  • FIG. 2 is a schematic diagram of a stand-alone database system, which includes a database management system and a data storage. The database management system provides services such as querying and modifying the database, and stores data in the data storage.
  • the database management system and data storage are usually located on a single server, such as a Symmetric Multi-Processor (SMP) server.
  • SMP server includes multiple processors, all of which share resources such as bus, memory, and I/O systems.
  • the functionality of the database management system can be implemented by one or more processors executing programs in memory.
  • FIG. 3 is a schematic diagram of a cluster database system using a shared-storage architecture. The system includes multiple nodes (such as nodes 1-N in FIG. 3), and a database management system is deployed on each node to provide users with services such as querying and modifying the database. The multiple database management systems store shared data in the shared data storage and perform read and write operations on the data in the data storage through the switch.
  • the shared data storage can be a shared disk array.
  • a node in a clustered database system can be a physical machine, such as a database server, or a virtual machine running on an abstract hardware resource. If the node is a physical machine, the switch is a Storage Area Network (SAN) switch, an Ethernet switch, a fiber switch, or other physical switching device. If the node is a virtual machine, the switch is a virtual switch.
  • FIG. 4 is a schematic diagram of a cluster database system adopting a shared-nothing architecture. Each node has its own hardware resources (such as data storage), operating system, and database, and the nodes communicate through a network. Under this system, the data is distributed to the nodes according to the database model and application characteristics. A query task is divided into several parts that are executed in parallel on all nodes, and the nodes coordinate with each other to provide database services as a whole. All communication functions are implemented on a high-bandwidth network interconnection system. As in the cluster database system of the shared-storage architecture described in FIG. 3, the nodes here can be either physical machines or virtual machines.
  • the data store of the database system includes, but is not limited to, a solid state drive (SSD), a disk array, or other type of non-transitory computer readable medium.
  • the database is stored in a data store.
  • a database system may include fewer or more components than those shown in FIG. 2 to FIG. 4, or include components different from those shown in FIG. 2 to FIG. 4; FIG. 2 to FIG. 4 show only the components that are more relevant to the implementations disclosed in the embodiments of the present application.
  • for example, although four nodes are depicted in FIG. 3 and FIG. 4, those skilled in the art will appreciate that a cluster database system can include any number of nodes.
  • the database management system functions of each node may be implemented by appropriate combinations of software, hardware, and/or firmware running on each node, respectively.
  • a person skilled in the art can clearly understand, according to the teachings of the embodiments of the present application, that the method of the embodiments of the present application can be generally applied to a database management system installed or deployed in a stand-alone database system, a cluster database system of a shared-nothing architecture, a cluster database system of a shared-storage architecture, or other types of database systems.
  • the database server 100 includes at least one processor 104, a non-transitory computer-readable medium 106 that stores executable code, and a database management system 108.
  • the executable code when executed by at least one processor 104, is configured to implement the components and functions of database management system 108.
  • the non-transitory computer readable medium 106 can include one or more non-volatile memories.
  • the non-volatile memory includes semiconductor memory devices, such as Erasable Programmable Read Only Memory (EPROM), Electrically Erasable Programmable Read Only Memory (EEPROM), and flash memory; magnetic disks, such as internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM.
  • the non-transitory computer readable medium 106 can also include any device that is configured as a main memory.
  • the at least one processor 104 can include any type of general purpose computing circuit or special purpose logic circuit, such as an FPGA (Field Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit).
  • the at least one processor 104 can also be one or more processors, such as a CPU, coupled to one or more semiconductor substrates.
  • the database management system 108 can be a Relational Database Management System (RDBMS).
  • Database management system 108 supports Structured Query Language (SQL).
  • SQL refers to a specialized programming language that is dedicated to managing data stored in relational databases.
  • SQL can refer to various types of data-related languages, including, for example, data definition languages and data manipulation languages, where SQL can include data insertion, query, update and delete, schema creation and modification, and data access control.
  • SQL can include descriptions related to various language elements, including clauses, expressions, predicates, queries, and statements. Wherein the expression can be configured to generate a scalar value and/or a table comprising data columns and/or rows.
  • a predicate is a logical expression that evaluates to a logical value (such as TRUE, FALSE, or UNKNOWN) and can be used to describe the connection relationship between objects.
  • the filter in the WHERE clause and the HAVING clause can be understood as a specified predicate.
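The three logical values mentioned above follow SQL's three-valued logic, in which a comparison with NULL yields UNKNOWN and a WHERE filter keeps only rows whose predicate is TRUE. A minimal sketch, using Python's `None` to stand in for both NULL and UNKNOWN (the helper names are hypothetical):

```python
def greater_than(value, constant):
    """Evaluate `value > constant`; a NULL operand makes the result UNKNOWN."""
    return None if value is None else value > constant


def sql_and(a, b):
    """Three-valued AND: FALSE dominates, then UNKNOWN, otherwise TRUE."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def where_keeps(result):
    """A WHERE filter keeps a row only when the predicate is TRUE."""
    return result is True


ages = [25, None, 70]
kept = [a for a in ages if where_keeps(sql_and(greater_than(a, 18), True))]
print(kept)
```

Rows whose predicate evaluates to UNKNOWN are filtered out just like rows that evaluate to FALSE, which is why NULL fractions matter when estimating selection rates.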
  • a query is a request to view, access, and/or manipulate data stored in a database.
  • database management system 108 can receive a query in SQL format (referred to as a SQL query) from database client 102.
  • the database management system 108 receives the client's query through a communication interface, such as an application program interface (API) or an Ethernet interface, accesses the relevant data in the database, manipulates that data to generate a query result corresponding to the query, and returns the query result to the database client 102 via the same communication interface.
  • a database is a collection of data organized, described, and stored in a mathematical model that can include one or more database structures or formats, such as row storage and column storage.
  • the database is typically stored in a data store, such as external data store 120 in FIG. 5, or non-transitory computer readable medium 106.
  • the database management system 108 is an in-memory database management system.
  • Database client 102 can include any type of device or application that is configured to interact with database management system 108.
  • database client 102 includes one or more application servers.
  • the database management system 108 includes a parser 112, a query optimizer 114, a query executor 122, and a storage engine 134.
  • the parser 112 is configured to perform syntax and semantic analysis of a query submitted by the client 102, and to expand the views in the query into small query blocks.
  • the query optimizer 114 generates a set of execution plans that may be used for the query, estimates the cost of each execution plan, compares the cost of the plan, and ultimately selects an optimal execution plan.
  • the query executor 122 operates in accordance with the execution plan of the query to generate a query result.
  • the storage engine 134 is responsible for managing the table data and the actual content of the indexes, and also manages runtime data such as the cache, buffers, transactions, and logs. For example, the storage engine 134 can write the execution results of the query executor 122 to the data store 120 via physical I/O.
  • the predicate selection rate (predicate selectivity) is a very important input when the query optimizer 114 selects the optimal execution plan.
  • the accuracy of the predicate selection rate directly affects the accuracy of the execution plan, such as the accuracy of the estimate of the cost of each operator in the execution plan, which affects the output of the optimal execution plan.
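To make the link between the predicate selection rate and plan choice concrete, the sketch below compares two access paths under a deliberately simplified cost model. The cost formulas, constants, and plan names are assumptions chosen for illustration only; real optimizers use far more detailed models.

```python
def choose_plan(selectivity, n_rows=10_000):
    """Pick the cheaper of two hypothetical access paths for a filter."""
    costs = {
        # A sequential scan reads every row regardless of the filter.
        "seq_scan": n_rows * 1.0,
        # An index scan pays a fixed startup cost plus a higher per-matching-row cost.
        "index_scan": 50.0 + selectivity * n_rows * 4.0,
    }
    return min(costs, key=costs.get)


print(choose_plan(0.01))  # few matching rows
print(choose_plan(0.50))  # many matching rows
```

With these assumed costs, an estimate of 0.01 favors the index scan while an estimate of 0.50 favors the sequential scan, so a badly wrong selection rate can flip the chosen plan.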
  • based on the database server 100 described above, a data query method is proposed for predicate combinations that have repeated predicates, so as to improve the accuracy of the predicate selection rate and thereby improve query performance.
  • FIG. 6 shows a schematic flowchart of a method 600 for data query according to an embodiment of the present application. Referring to FIG. 5, the method includes:
  • the database management system 108 receives a query statement submitted by the client through a communication connection established with the database server;
  • the parser 112 of the database management system 108 parses the query statement to obtain a plurality of predicates
  • the query optimizer 114 performs predicate combination on the plurality of predicates to obtain a plurality of predicate combinations
  • the query optimizer 114 determines, according to the type of the pre-configured training model, a plurality of candidate predicate combinations among the plurality of predicate combinations that correspond to the pre-configured training model, where each of the plurality of candidate predicate combinations includes at least two predicates;
  • the query optimizer 114 may select a plurality of candidate predicate combinations available according to the training model category, and each candidate predicate combination has a corresponding training model.
  • the training model may be a supervised learning model or an unsupervised learning model obtained by a machine learning algorithm, such as a neural network (NN) model, a support vector machine (SVM) model, a fuzzy model, a random forest model, or another model.
  • the neural network model includes a feedforward neural network (FFNN) model, a recurrent neural network (RNN) model, and the like.
  • the machine learning training model and process are external to the database, and the database kernel establishes a system table associated with the external machine learning model.
  • the obtained training model and the predicate combination corresponding to the training model are stored in the above system table, and each predicate combination corresponds to a training model.
  • the training model can be tested with partial untrained data, and the summarized model confidence (accuracy) values are stored in the above system table.
  • for the specific model training process and the related technical process of writing the training results into the system table, refer to the prior application ZL201710109372.1 - "An Information Processing Method and Apparatus". Details are not repeated here.
  • the query optimizer 114 determines a first predicate combination in the plurality of candidate predicate combinations, where the first predicate combination includes predicates different from each other;
  • the query optimizer 114 may also determine at least one first predicate combination in the plurality of candidate predicate combinations, and the at least one first predicate combination includes predicates different from each other.
  • "the at least one first predicate combination includes predicates different from each other" means that no predicate is shared between the combinations. For example, if predicate combination 1 includes predicate 1 and predicate 2, and predicate combination 2 includes predicate 3 and predicate 4, then predicate combination 1 and predicate combination 2 include predicates that are different from each other.
  • the query optimizer 114 determines the first execution plan by using the training model corresponding to the first predicate combination, and the query executor 122 performs a data query using the execution plan generated by the query optimizer 114, and returns the query result to the client 102.
  • the database server 100 may parse the one query statement to obtain a plurality of predicates.
  • the query optimizer 114 may perform predicate combination or recombination on the plurality of predicates based on the connection relationship of the predicates to obtain a plurality of predicate combinations.
  • the query optimizer 114 can perform peer-level predicate reorganization in a hierarchy.
  • the query optimizer 114 can learn the connection relationship between the predicates.
  • the query optimizer 114 may select a plurality of candidate predicate combinations supported by the training model among the plurality of predicate combinations according to the type of the training model saved in the system table.
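The combination and filtering steps described above can be sketched as follows. The set `supported` stands in for the predicate combinations recorded in the system table; its contents, the predicate names, and the function `candidate_combinations` are illustrative assumptions rather than the application's actual data structures.

```python
from itertools import combinations

# Hypothetical stand-in for the predicate combinations that have a trained model.
supported = {
    frozenset({"PRED1", "PRED2"}),
    frozenset({"PRED1", "PRED4"}),
    frozenset({"PRED3", "PRED5"}),
}


def candidate_combinations(predicates, supported_combos):
    """Enumerate combinations of two or more predicates and keep the supported ones."""
    all_combos = (
        frozenset(combo)
        for size in range(2, len(predicates) + 1)
        for combo in combinations(predicates, size)
    )
    return [combo for combo in all_combos if combo in supported_combos]


# Predicates parsed from a hypothetical query.
candidates = candidate_combinations(["PRED1", "PRED2", "PRED4"], supported)
print(candidates)
```

Exhaustive enumeration grows exponentially with the number of predicates; it is used here only to keep the sketch short, whereas the text describes recombining predicates based on their connection relationship.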
  • the query optimizer 114 may select, among the plurality of candidate predicate combinations, the first predicate combination including the predicates that are different from each other. Finally, the query optimizer 114 determines a first execution plan using the training model corresponding to the first predicate combination, and performs a data query using the first execution plan.
  • when at least one first predicate combination is determined, the query optimizer 114 determines an execution plan using the training model corresponding to each of the at least one first predicate combination. Specifically, the query optimizer 114 calculates a predicate selection rate using the training model corresponding to each first predicate combination to obtain multiple predicate selection rates, multiplies the multiple predicate selection rates to obtain a final predicate selection rate, and determines the execution plan based on the final predicate selection rate.
  • for example, if the query optimizer 114 obtains a predicate selection rate A for the predicate combination of C1 and C2 and a predicate selection rate B for the predicate combination of C3 and C4, the predicate selection rate corresponding to C1, C2, C3, and C4 is A*B.
  • the query optimizer 114 then determines the final execution plan based on the predicate selection rate A*B.
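The A*B calculation can be sketched in a few lines. The function below multiplies the per-combination selection rates and, as a safeguard added for this example, rejects combinations that share a predicate, since the product is only described for combinations whose predicates differ from each other. The numeric values and the name `combined_selectivity` are hypothetical.

```python
import math


def combined_selectivity(combos):
    """Multiply the selection rates of combinations that share no predicate.

    `combos` is a list of (predicates, selection_rate) pairs.
    """
    seen = set()
    for predicates, _ in combos:
        if seen & set(predicates):
            raise ValueError("combinations must not share predicates")
        seen |= set(predicates)
    return math.prod(rate for _, rate in combos)


A, B = 0.2, 0.5  # hypothetical model outputs for (C1, C2) and (C3, C4)
final = combined_selectivity([(("C1", "C2"), A), (("C3", "C4"), B)])
print(final)
```

Correlation is captured inside each combination by its model, while independence is still assumed across the disjoint combinations.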
  • the training model of the predicate combination can be obtained based on the correlation of the predicate, thereby calculating the predicate selection rate. It is not necessary to separately calculate the predicate selection rate of each predicate in a predicate combination, and multiply the predicate selection rate of each predicate. That is to say, the method of calculating the predicate selection rate by using the training model considers the relevance of the predicate, and the obtained predicate selection rate is more accurate, thereby improving the query performance or the SQL execution performance.
  • the query optimizer 114 may select an appropriate predicate combination based on the confidence of the training model. It should be understood that, in the embodiments of the present application, the terms "first predicate combination" and "second predicate combination" are introduced only to distinguish different objects and do not limit the embodiments of the present application.
  • a method 700 of data query in accordance with another embodiment of the present application will now be described in conjunction with FIG. 7. As shown in FIG. 7, the method 700 includes:
  • the query optimizer 114 may determine at least two second predicate combinations among the plurality of candidate predicate combinations, where each second predicate combination may include at least two predicates and the at least two second predicate combinations have at least one identical or repeated predicate.
  • each second predicate combination may include a plurality of predicates.
  • for example, predicate combination 1 may include predicate 1 and predicate 2, and predicate combination 2 may include predicate 1 and predicate 4, in which case the repeated predicate between predicate combination 1 and predicate combination 2 is predicate 1.
  • as another example, predicate combination 3 may include predicate 1, predicate 2, and predicate 3, and predicate combination 4 may include predicate 1, predicate 2, and predicate 5, in which case the repeated predicates between predicate combination 3 and predicate combination 4 are predicate 1 and predicate 2.
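Detecting which candidate combinations share predicates amounts to a pairwise set intersection. The following minimal sketch reproduces the second example above; the function name `overlapping_pairs` and the string labels are illustrative assumptions.

```python
from itertools import combinations


def overlapping_pairs(candidates):
    """Return (combo_a, combo_b, shared_predicates) for every overlapping pair."""
    return [(a, b, a & b) for a, b in combinations(candidates, 2) if a & b]


combo3 = frozenset({"predicate 1", "predicate 2", "predicate 3"})
combo4 = frozenset({"predicate 1", "predicate 2", "predicate 5"})
combo_other = frozenset({"predicate 6", "predicate 7"})

pairs = overlapping_pairs([combo3, combo4, combo_other])
print(sorted(pairs[0][2]))
```

Only the pair formed by combinations 3 and 4 is reported, with predicate 1 and predicate 2 as the repeated predicates.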
  • each predicate combination has a corresponding training model.
  • the training model can be understood as the selectivity model of the predicate combination.
  • a two-column correlation model can be established.
  • the method 600 or the method 700 may further include:
  • the query optimizer 114 may obtain the confidence of the training model corresponding to each second predicate combination from the system table of the database.
  • the system table in the database system includes training results (such as weights, offsets, and the like) of the training model of each predicate combination and the confidence of the model.
  • the confidence of the model is used to indicate the accuracy of the training model.
  • an example of partial training model data saved in a system table of a database is shown in Table 1.
  • in Table 1, PRED1 and PRED2, PRED1 and PRED4, and PRED3 and PRED5 are three combinations of related predicates, and each combination corresponds to a different training model.
  • Valid can be understood as the identification bit of the training model. Its value indicates the validity of the training model: when the Valid value is 1, the training model is valid; when the Valid value is 0, the training model is invalid.
  • Confidence indicates the confidence of the training model: the confidence of the training model corresponding to PRED1 and PRED2 is 0.76, the confidence of the training model corresponding to PRED1 and PRED4 is 0.93, and the confidence of the training model corresponding to PRED3 and PRED5 is 0.26.
  • S720 Determine a target predicate combination among the at least two second predicate combinations according to the confidence of the training model corresponding to each of the at least two second predicate combinations, where the confidence indicates the accuracy of the training model;
  • the query optimizer 114 may select an appropriate or optimal predicate combination, that is, a target predicate combination, according to the confidence of the training model corresponding to the second predicate combination.
  • the query optimizer 114 may also select the target predicate combination according to other screening criteria. For example, the query optimizer 114 may set a threshold screening condition, select a target predicate combination that satisfies the threshold screening condition among the plurality of second predicate combinations, and eliminate other second predicate combinations that do not satisfy the threshold screening condition.
  • the confidence of the training model corresponding to the target predicate combination is greater than the confidence of the training models corresponding to the other predicate combinations in the at least two second predicate combinations; that is, it is the largest among all second predicate combinations.
  • the query optimizer 114 may compare the confidence of the training model corresponding to each second predicate combination of the plurality of second predicate combinations and pick out the maximum confidence, thereby determining the target predicate combination and eliminating the other predicate combinations.
  • PRED1, PRED2: the confidence of the training model corresponding to predicate combination 1
  • PRED1, PRED4: the confidence of the training model corresponding to predicate combination 2
  • predicate combination 1 and predicate combination 2 are merely taken as examples. In practice, more predicate combinations may be included, and this is not limited herein.
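The selection of the target predicate combination can be sketched as a simple maximum over confidences. The function name `select_target` and the dictionary layout are illustrative assumptions; the two confidence values are those given above for PRED1/PRED2 and PRED1/PRED4.

```python
def select_target(confidences):
    """Pick the combination whose training model has the highest confidence.

    `confidences` maps a predicate combination (tuple) to its model confidence.
    """
    if len(confidences) < 2:
        raise ValueError("expected at least two second predicate combinations")
    return max(confidences, key=confidences.get)


# The two combinations share the repeated predicate PRED1.
second_combinations = {
    ("PRED1", "PRED2"): 0.76,
    ("PRED1", "PRED4"): 0.93,
}
target = select_target(second_combinations)
```

Here `target` is the combination (PRED1, PRED4), and the other combination would be eliminated.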
  • S730 Determine a second execution plan by using a training model corresponding to the target predicate combination, and perform data query using the second execution plan.
  • the query optimizer 114 may perform a corresponding calculation using the training model corresponding to the target predicate combination to obtain a corresponding execution plan (such as a second execution plan), thereby performing data query using the second execution plan. Since the target predicate combination is the filtered optimal predicate combination, the query optimizer 114 can obtain an optimal execution plan according to the training model corresponding to the target predicate combination.
  • the query optimizer 114 determines at least two second predicate combinations, each of which includes at least two predicates; the at least two second predicate combinations have at least one identical predicate, and each second predicate combination has a corresponding training model. The query optimizer 114 determines a target predicate combination among the at least two second predicate combinations according to the confidence of the training model corresponding to each second predicate combination, determines a second execution plan using the training model corresponding to the target predicate combination, and then performs a data query using the second execution plan. In this way, the accuracy of the predicate selection rate can be improved when calculating the selection rate of predicate combinations having overlapping predicates.
  • the method 600 and the method 700 may be used in combination or independently.
  • the confidence of the training model can be calculated using various evaluation methods; only one possible calculation method is described here as an example, and the embodiments of the present application are not limited thereto.
  • the "calculation operation of the confidence of the training model" and the "training operation of the training model" may be performed by the same execution subject, which may be a module independent of the database or another implementation located outside the database; this is not limited herein.
  • the kernel of the database can establish associated system table element information with an external training model to learn the training results or related data of the training model.
  • the calculation process of the confidence level of the first training model is taken as an example, and may include:
  • the first training model is the training model corresponding to any one of the at least two predicate combinations
  • a confidence level of the first training model is determined based on the plurality of the first confidence levels.
  • the selection rate S ml of the corresponding training predicate combination is calculated according to the first training model, as shown in the following formula:
  • the query optimizer 114 may acquire a plurality of first training predicate combinations to obtain a first confidence level corresponding to each first training predicate combination. The query optimizer 114 then calculates the confidence of the first training model using a plurality of first confidences.
  • n training predicate combinations may correspond to n values of c_i.
  • the query optimizer 114 integrates the n values of c_i and calculates the confidence C of the training model as: where i ∈ {1, 2, ..., n}, and n is the number of training predicate combinations.
  • a plurality of first training predicate combinations can be understood as some untrained data of the training model for verifying the accuracy of the training model. That is to say, the model can be verified by using data that is not part of the model training to obtain the value of the accuracy of the training model.
  • the first training model is used as an example for description.
  • the confidence of each training model may be calculated by using the foregoing method, which is not limited thereto.
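The formulas for the per-sample confidence c_i and the overall confidence C are not reproduced in this text, so the sketch below only illustrates the shape of the computation. Both the per-sample score (one minus the relative error between the model estimate S_ml and the observed selection rate) and the aggregation (arithmetic mean) are assumptions of this example, not the application's definitions; the held-out samples are made up.

```python
def sample_confidence(s_ml, s_actual):
    """ASSUMED per-sample score: 1 minus relative error, clamped at 0."""
    if s_actual <= 0:
        raise ValueError("actual selection rate must be positive")
    return max(0.0, 1.0 - abs(s_ml - s_actual) / s_actual)


def model_confidence(pairs):
    """ASSUMED aggregation: arithmetic mean of the n per-sample scores c_i."""
    scores = [sample_confidence(s_ml, s_actual) for s_ml, s_actual in pairs]
    if not scores:
        raise ValueError("need at least one held-out training predicate combination")
    return sum(scores) / len(scores)


# Hypothetical held-out samples: (model estimate S_ml, observed selection rate).
held_out = [(0.10, 0.10), (0.30, 0.20), (0.05, 0.10)]
C = model_confidence(held_out)
```

The held-out samples play the role of the data that did not take part in training, so C reflects how well the model generalizes rather than how well it fits its training set.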
  • the query optimizer 114 may determine the target predicate combination among the plurality of second predicate combinations according to the confidence of the training model corresponding to each predicate combination, determine a second execution plan using the training model corresponding to the target predicate combination, and then perform a data query using the second execution plan, thereby improving the accuracy of the predicate selection rate when calculating the selection rate of predicate combinations having overlapping predicates.
  • determining a second execution plan by using a training model corresponding to the target predicate combination including:
  • the second execution plan is generated using the model parameters.
  • the query optimizer 114 may search, in the system table of the database, model parameters of the training model corresponding to the target predicate combination, and the model parameters may include training results of the training model, such as weights, offsets, and similar parameters.
  • the weight may be a neuron connection weight in a neural network training model, including weights between the input layer and the output layer, a hidden layer threshold, an output layer threshold, hidden layer and output layer weight matrices, and so on.
  • the offset may be an offset corresponding to the weight, obtained by training the neural network training model.
  • the query optimizer 114 calculates the predicate selection rate based on the model parameters and then generates the second execution plan.
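To make the role of the stored weights and offsets concrete, here is a minimal sketch of a forward pass that turns model parameters into a selection rate. The network shape (one hidden layer), the sigmoid activation, the parameter layout in `params`, the feature encoding, and all numeric values are assumptions for illustration; the text does not specify them.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def estimate_selection_rate(features, params):
    """Forward pass of a small network: input -> hidden layer -> one output.

    `params` holds hypothetical stored training results: hidden-layer
    weights and offsets, then output-layer weights and offset.
    """
    hidden = [
        sigmoid(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(params["w_hidden"], params["b_hidden"])
    ]
    out = sum(w * h for w, h in zip(params["w_out"], hidden)) + params["b_out"]
    return sigmoid(out)  # squashed into (0, 1), like a selection rate


# Made-up parameters for illustration only; these are not trained values.
params = {
    "w_hidden": [[0.5, -0.2], [0.1, 0.8]],
    "b_hidden": [0.0, -0.1],
    "w_out": [1.2, -0.7],
    "b_out": 0.05,
}
# Features stand in for the normalized constants of a two-predicate combination.
rate = estimate_selection_rate([0.3, 0.6], params)
estimated_rows = round(rate * 10_000)  # row estimate for a 10,000-row table
```

The optimizer would feed a row estimate of this kind into its operator cost model when comparing candidate execution plans.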
  • the confidence of the training model corresponding to each second predicate combination satisfies a preset condition.
  • the query optimizer 114 may acquire a plurality of candidate predicate combinations, which are candidate predicate combinations selected by the query optimizer 114 based on the machine learning algorithm, or may be understood as predicate combinations supported by the trained model.
  • the plurality of candidate predicate combinations may be: predicate combination 1 (PRED1, PRED2); predicate combination 2 (PRED1, PRED4); predicate combination 3 (PRED3, PRED5).
  • the query optimizer 114 may select the at least two second predicate combinations that satisfy the preset condition among the plurality of candidate predicate combinations.
  • the query optimizer 114 may examine the confidence of the training model corresponding to each candidate predicate combination and take the candidate predicate combinations that satisfy the preset condition as the at least two second predicate combinations, so as to facilitate subsequently determining a target predicate combination among the at least two second predicate combinations.
  • the “preset condition” may be a specific threshold, or may be a specific screening condition.
  • the confidence of the training model corresponding to each of the at least two second predicate combinations is greater than the first threshold.
  • the first threshold can be understood as a constant recognized internally by the query optimizer 114. If the confidence level of the training model of a certain set of predicate combinations is greater than the first threshold, the accuracy of the training model is considered to be higher.
  • with a first threshold of 0.3, the query optimizer 114 selects the predicate combinations whose confidence is greater than 0.3, that is, predicate combination 1 (PRED1, PRED2) and predicate combination 2 (PRED1, PRED4), and eliminates the predicate combination whose confidence is less than 0.3, that is, predicate combination 3 (PRED3, PRED5).
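The threshold screening above can be sketched as follows. The constant name `FIRST_THRESHOLD` and the function `filter_by_threshold` are hypothetical; the threshold value 0.3 and the three confidences are taken from the example in the text.

```python
FIRST_THRESHOLD = 0.3  # value used in the example; a constant inside the optimizer


def filter_by_threshold(candidates, threshold=FIRST_THRESHOLD):
    """Split candidates into those above the threshold and those eliminated."""
    kept = {c: conf for c, conf in candidates.items() if conf > threshold}
    eliminated = {c: conf for c, conf in candidates.items() if conf <= threshold}
    return kept, eliminated


candidates = {
    ("PRED1", "PRED2"): 0.76,
    ("PRED1", "PRED4"): 0.93,
    ("PRED3", "PRED5"): 0.26,
}
kept, eliminated = filter_by_threshold(candidates)
```

With these inputs, `kept` holds predicate combinations 1 and 2, and `eliminated` holds predicate combination 3.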
  • the query optimizer 114 may also set a filter condition, that is, sort the confidences of the training models of all predicate combinations and then adopt the training models whose confidence ranks within a first proportion of the sorted list (such as the top 30%). A training model with a lower confidence ranking (such as the last 70% of the sorted list) can be considered not to meet the screening condition and will not be adopted by the query optimizer 114.
  • a filter condition that is, sort the confidence of the training model of all predicate combinations, and then select the first ratio in which the confidence is ranked in the ranking (such as before the sequence table). 30%) of the training model as the adopted training model.
  • the query optimizer 114 can select the at least two predicate combinations that satisfy the preset condition among the plurality of candidate predicate combinations by introducing a threshold or a filter condition, thereby obtaining a training model with a higher accuracy rate, so as to facilitate subsequent output execution. plan.
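The ranking-based filter condition can be sketched in the same style. The model names `M0` to `M9` and their confidences are hypothetical, and rounding the cut-off up so that at least one model is always adopted is an assumption of this example.

```python
def filter_by_rank(candidates, top_percent=30):
    """Keep the entries whose confidence ranks in the top `top_percent` percent."""
    ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
    # ASSUMPTION: round the cut-off up (integer ceiling) and keep at least one model.
    keep_count = max(1, (len(ranked) * top_percent + 99) // 100)
    return dict(ranked[:keep_count])


# Ten hypothetical training models with made-up confidences.
models = {
    "M0": 0.93, "M1": 0.76, "M2": 0.26, "M3": 0.81, "M4": 0.55,
    "M5": 0.40, "M6": 0.67, "M7": 0.12, "M8": 0.88, "M9": 0.34,
}
adopted = filter_by_rank(models)
```

Unlike a fixed threshold, a ranking filter always adopts some models, even when every confidence is low, so the two conditions suit different deployments.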
  • FIG. 8 shows a schematic diagram of an example of a plurality of candidate predicate combinations in accordance with an embodiment of the present application.
  • the database management system 108 can receive SQL query statements submitted by the client through a communication connection established with the database server (as shown in the uppermost box of FIG. 8), and the underlined portion is a constant predicate (for example, a constant predicate can be a constant expression or a constant function).
  • the parser 112 of the database management system 108 can analyze the SQL query statement to obtain the predicates that can be supported by the training model (or machine learning model), and obtains PRED1, PRED2, PRED3, PRED4, and PRED5 after analysis (the underlined predicates in the middle box of FIG. 8).
  • the query optimizer 114 determines that the connection (join) predicate is not supported by the training model. Further, the query optimizer 114 may obtain a plurality of candidate predicate combinations based on PRED1, PRED2, PRED3, PRED4, and PRED5. As shown in the lowermost box of FIG. 8, the query optimizer 114 obtains three sets of predicate combinations with two-column selection rates (that is, two predicates in each predicate combination), namely: PRED2 and PRED1, PRED1 and PRED4, and PRED3 and PRED5. Each set of predicate combinations corresponds to one training model, and each training model has a confidence. Thus, the query optimizer 114 can perform subsequent operations based on the plurality of candidate predicate combinations.
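One way to picture the derivation of candidate predicate combinations is to enumerate all two-predicate combinations and keep those for which a trained model exists. The function `candidate_combinations` and the `supported` set are illustrative assumptions; the predicate names and the three supported pairs follow the FIG. 8 example.

```python
from itertools import combinations


def candidate_combinations(predicates, supported):
    """Enumerate two-predicate combinations and keep those a trained model supports."""
    return [
        pair for pair in combinations(predicates, 2)
        if frozenset(pair) in supported
    ]


predicates = ["PRED1", "PRED2", "PRED3", "PRED4", "PRED5"]
# Pairs that have a trained two-column model, per the example above.
supported = {
    frozenset({"PRED1", "PRED2"}),
    frozenset({"PRED1", "PRED4"}),
    frozenset({"PRED3", "PRED5"}),
}
candidates = candidate_combinations(predicates, supported)
```

Using `frozenset` makes the match independent of order, so "PRED2 and PRED1" and "PRED1 and PRED2" refer to the same candidate.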
  • FIG. 9 shows a flow chart of an example in accordance with an embodiment of the present application.
  • the query optimizer 114 can acquire a plurality of candidate predicate combinations (such as the plurality of candidate predicate combinations shown in FIG. 8) by the preliminary screening operation, and judge the confidence of each candidate predicate combination. If it is determined that the confidence does not satisfy the preset condition, the candidate predicate combination is eliminated; if it is determined that the confidence meets the preset condition, the remaining candidate predicate combinations are subjected to secondary screening.
  • the preset condition may be a threshold or other screening conditions, which is not limited thereto.
  • the query optimizer 114 may also determine whether the training model corresponding to the candidate predicate combination is valid (for example, by checking the valid value), and may enter the next operation when the training model is valid.
  • for at least two predicate combinations that have repeated or identical predicates, the query optimizer 114 needs to determine which of them has the greatest confidence. The query optimizer 114 then selects the predicate combination with the greatest confidence as the winning predicate combination, uses the training model corresponding to the winning predicate combination to calculate the corresponding selection rate, and finally outputs the optimal execution plan.
  • query optimizer 114 may eliminate other predicate combinations where the confidence is not the greatest. Therefore, the query optimizer 114 can obtain the training model corresponding to the optimal predicate combination through two screenings, and perform corresponding calculations to obtain an optimal execution plan.
  • the query optimizer 114 may also obtain a predicate combination that contains no repeated predicate but whose confidence satisfies the foregoing preset condition (not shown in the figure). In this case, the query optimizer 114 can use the corresponding training model to perform the corresponding calculation and obtain the corresponding execution plan.
  • the at least two predicate combinations may share one or more repeated (identical) predicates, which is not limited herein.
  • FIG. 10 shows a flow chart of a specific example in accordance with an embodiment of the present application.
  • FIG. 10 is a further visual representation of FIG.
  • the three sets of candidate predicate combinations obtained by the query optimizer 114 through the preliminary screening are: PRED1 and PRED2, PRED1 and PRED4, PRED3 and PRED5.
  • PRED1 and PRED2, and PRED1 and PRED4, share the repeated predicate PRED1.
  • the confidence of the training model corresponding to PRED1 and PRED2 is 0.76
  • the confidence of the training model corresponding to PRED1 and PRED4 is 0.93
  • the confidence of the training model corresponding to PRED3 and PRED5 is 0.26.
  • the query optimizer 114 determines whether the confidence of each of the three sets of candidate predicate combinations is greater than 0.3. Since 0.26 is less than 0.3, the query optimizer 114 eliminates the predicate combination PRED3 and PRED5; since 0.76 and 0.93 are both greater than 0.3, the query optimizer 114 performs secondary screening on PRED1 and PRED2, and PRED1 and PRED4. Next, the query optimizer 114 compares the confidences of the predicate combinations sharing the repeated predicate PRED1 (that is, PRED1 and PRED2, and PRED1 and PRED4), selects the predicate combination with the highest confidence, here PRED1 and PRED4, and eliminates PRED1 and PRED2. Finally, the query optimizer 114 performs the corresponding calculation using the training model corresponding to PRED1 and PRED4 to output an execution plan.
  • the repeated predicate PRED1 is taken as an example. In practice there may be multiple repeated predicates, to which the method of the embodiments of the present application also applies; this is not limited herein.
  • the query optimizer 114 can perform corresponding calculations using the training models corresponding to PRED6 and PRED7 to obtain their corresponding execution plans.
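The two screenings of this example can be combined in one short sketch. The function name `two_stage_screening`, the greedy highest-confidence-first way of resolving overlaps, and the confidence assigned to PRED6 and PRED7 are assumptions for illustration; the threshold 0.3 and the other three confidences come from the example.

```python
def two_stage_screening(candidates, threshold=0.3):
    """Return the combinations whose training models the optimizer would use.

    Stage 1: drop combinations whose confidence is not above the threshold.
    Stage 2: among combinations sharing a predicate, keep the most confident.
    """
    survivors = {c: conf for c, conf in candidates.items() if conf > threshold}
    winners, used = [], set()
    # ASSUMPTION: resolve overlaps greedily, highest confidence first.
    for combo in sorted(survivors, key=survivors.get, reverse=True):
        if used.isdisjoint(combo):
            winners.append(combo)
            used.update(combo)
    return winners


candidates = {
    ("PRED1", "PRED2"): 0.76,
    ("PRED1", "PRED4"): 0.93,
    ("PRED3", "PRED5"): 0.26,
    ("PRED6", "PRED7"): 0.55,  # hypothetical confidence, not given in the text
}
winners = two_stage_screening(candidates)
```

PRED3 and PRED5 fall at the first screening, PRED1 and PRED2 lose to PRED1 and PRED4 at the second, and a combination with no repeated predicate, such as PRED6 and PRED7, passes through with its own training model.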
  • Fig. 11 is a diagram showing an example of application of an embodiment of the present application.
  • the predicate combination that wins in Figure 10 is visually shown in Figure 11.
  • among the plurality of candidate predicate combinations (PRED1 and PRED2, PRED1 and PRED4, PRED3 and PRED5), the query optimizer 114 finally selects PRED1 and PRED4 as the winning predicate combination.
  • the data query method of the embodiment of the present application can improve the accuracy of the predicate selection rate, thereby improving the query performance of the data query. Further, for at least two predicate combinations having repeated predicates, the training model corresponding to the predicate combination with high confidence is selected according to the confidence degree, and the accuracy of the predicate selection rate can be improved.
  • the sequence numbers of the foregoing processes do not imply an order of execution; the order of execution of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • the method of data query according to an embodiment of the present application is described in detail above, and an apparatus and database system for data query according to an embodiment of the present application will be described below.
  • the device for querying the data and the database system can perform the method of data query of the foregoing embodiment of the present application.
  • FIG. 12 shows a schematic block diagram of an apparatus 1200 for data query in accordance with an embodiment of the present application. As shown in FIG. 12, the apparatus 1200 includes:
  • the receiving module 1210 is configured to receive a query statement.
  • the processing module 1220 is configured to parse the query statement to obtain a plurality of predicates, and is further configured to perform a predicate combination on the plurality of predicates to obtain a plurality of predicate combinations;
  • the first determining module 1230 is configured to determine, in the plurality of predicate combinations, a plurality of candidate predicate combinations corresponding to the pre-configured training model according to a type of the pre-configured training model, where the plurality of candidate predicate combinations are Each candidate predicate combination includes at least two predicates;
  • the first determining module 1230 is further configured to determine, in the plurality of candidate predicate combinations, a first predicate combination, where the first predicate combination includes predicates different from each other;
  • the processing module 1220 is further configured to determine a first execution plan by using a training model corresponding to the first predicate combination, and perform data query by using the first execution plan.
  • the apparatus 1200 for data query in the embodiment of the present application may determine, in a plurality of candidate predicate combinations, a first predicate combination that does not have the same predicate. Since there is a corresponding training model for each candidate predicate combination, if the first predicate combination does not have the same predicate, the first execution plan may be determined using the training model corresponding to the first predicate combination, that is, the first predicate combination is used. The training model calculates the predicate selection rate, thereby generating a first execution plan and performing a data query based on the first execution plan.
  • the training model of the predicate combination can be obtained based on the relevance of the predicate, thereby calculating the predicate selection rate. It is not necessary to separately calculate the predicate selection rate of each predicate in a predicate combination, and multiply each predicate selection rate. That is to say, the method of calculating the predicate selection rate by using the training model considers the relevance of the predicate, and the obtained predicate selection rate is more accurate, thereby improving the query performance.
  • the apparatus 1200 may be the query optimizer 114 described above or a software/hardware functional unit integrated in the query optimizer 114.
  • the receiving module 1210 can be implemented by a receiver, or a communication interface
  • the functions of the processing module 1220 and the first determining module 1230 can be implemented by at least one processor executing instructions in memory.
  • the components in the database query device may be coupled together by a bus system, wherein the bus system includes a power bus, a control bus, a status signal bus, and the like in addition to the data bus.
  • the first determining module 1230 is further configured to: determine, in the plurality of candidate predicate combinations, at least two second predicate combinations, the at least two second predicate combinations having at least one identical predicate;
  • the apparatus 1200 further includes:
  • a second determining module 1240, configured to determine a target predicate combination in the at least two second predicate combinations according to a confidence of the training model corresponding to each second predicate combination in the at least two second predicate combinations, where the confidence is used to indicate the accuracy of the training model;
  • the processing module 1220 is further configured to determine a second execution plan by using a training model corresponding to the target predicate combination, and perform data query using the second execution plan.
  • the apparatus 1200 further includes:
  • an obtaining module configured to acquire a confidence level of the training model corresponding to each second predicate combination in the at least two second predicate combinations.
  • a confidence level of the training model corresponding to the target predicate combination is the largest among the at least two predicate combinations.
  • the confidence of the training model corresponding to each second predicate combination satisfies a preset condition.
  • a confidence level of the training model corresponding to each of the at least two second predicate combinations is greater than a first threshold.
  • processing module 1220 is specifically configured to:
  • the training model parameter includes at least one of a weight and an offset; and generating the second execution plan by using the model parameter.
  • the apparatus 1200 for data query according to an embodiment of the present application may perform the method 600 or 700 of data query according to an embodiment of the present application, and the above and other operations and/or functions of the respective modules in the apparatus 1200 are respectively intended to implement the corresponding processes of the foregoing methods, which are not described again here for brevity. Additionally, the functions of the second determining module 1240 and the obtaining module may also be implemented by at least one processor executing instructions in memory. The apparatus 1200 for data query in the embodiment of the present application may select a first predicate combination that does not have the same predicate among a plurality of candidate predicate combinations.
  • the first execution plan may be determined using the training model corresponding to the first predicate combination, that is, the first predicate combination is used.
  • the training model calculates the predicate selection rate, thereby generating a first execution plan and performing a data query based on the first execution plan.
  • the training model of the predicate combination can be obtained based on the relevance of the predicate, thereby calculating the predicate selection rate. It is not necessary to separately calculate the predicate selection rate of each predicate in a predicate combination, and multiply each predicate selection rate. That is to say, the method of calculating the predicate selection rate by using the training model considers the relevance of the predicate, and the obtained predicate selection rate is more accurate, thereby improving the query performance.
  • FIG. 14 shows a schematic block diagram of a database system 1400 in accordance with an embodiment of the present application.
  • the database system 1400 includes the device 1200 and database 1410 of the data query of the foregoing embodiment of the present application.
  • the database system 1400 can perform the foregoing method of data query in the embodiment of the present application, and perform a query in the database 1410.
  • FIG. 15 shows the structure of an apparatus for data query provided by an embodiment of the present application, including at least one processor 1502 (for example, a CPU), at least one network interface 1503 or other communication interface, and a memory 1504. Optionally, a receiver 1505 and a transmitter 1506 may also be included.
  • the processor 1502 is configured to execute an executable module, such as a computer program, stored in the memory 1504.
  • the memory 1504 may include a high speed random access memory RAM, and may also include a non-volatile memory such as at least one disk memory.
  • a communication connection with at least one other network element is achieved by at least one network interface 1503, which may be wired or wireless.
  • Receiver 1505 and transmitter 1506 are used to transmit various signals or information.
  • the memory 1504 stores a program 15041 that can be executed by the processor 1502 for performing the method of data query of the foregoing embodiments of the present application.
  • the disclosed system, apparatus, and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of the unit is only a logical function division.
  • in actual implementation there may be another division manner; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the functions may be stored in a computer readable storage medium if implemented in the form of a software functional unit and sold or used as a standalone product. Based on such understanding, the technical solution of the embodiments of the present application, or the part contributing to the prior art or the part of the technical solution, may be embodied in the form of a software product stored in a storage medium.
  • the software product includes a plurality of instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present application.
  • the foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data query method, device, and database system, the method comprising: determining, according to the type of a pre-configured training model, a plurality of candidate predicate combinations supported by the pre-configured training model among a plurality of predicate combinations, each of the candidate predicate combinations in the plurality of candidate predicate combinations comprising at least two predicates; determining a first predicate combination in the plurality of candidate predicate combinations, the first predicate combination comprising predicates that are different from each other; and using a training model corresponding to the first predicate combination to determine a first execution plan, and using the first execution plan to perform a data query. The data query method, device and database system may improve the accuracy of the predicate selection rate, thereby improving the query performance. Furthermore, when there are repeated predicates in at least two second predicate combinations, the accuracy of the predicate selection rate may be improved.

Description

Data query method, device and database system
This application claims priority to Chinese Patent Application No. 201710308623.9, filed with the Chinese Patent Office on May 4, 2017 and entitled "Method, Apparatus and Database System for Data Query", the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of databases and, more particularly, to a method, apparatus and database system for data query.
Background
When a database system processes a query from a client, for example a query expressed in Structured Query Language (SQL), it needs to parse, precompile, and optimize the query, and then generate an execution plan. The optimizer is the component of the database system that most affects the execution efficiency of SQL statements; it outputs the execution plan with the lowest estimated cost (also called the optimal execution plan). In the process of selecting the optimal execution plan, estimating the selection rate of predicates is a very important step. The accuracy of the predicate selection rate estimate directly affects the accuracy of the optimizer's subsequent cost estimates for each operator in the execution plan, and thus affects the output of the overall optimal execution plan.
Traditional predicate selectivity estimation methods include estimation based on histograms, on common values, and on the frequencies of common values. For estimating the selection rate of multi-column compound predicates, there are composite selectivity estimation algorithms based on single-predicate selectivities and techniques based on multi-column statistics, such as histograms that combine several columns. However, these all calculate the predicate selection rate for single or multiple predicates, and the accuracy of the calculation needs to be improved. In particular, when a predicate falls into several selection rate calculation models at the same time, the calculated predicate selection rate is less accurate, which affects the output of the optimal execution plan.
Summary of the invention
This application provides a data query method, apparatus, and database system that improve the accuracy of predicate selectivity and thereby improve query performance.
In a first aspect, a data query method is provided, including:
A database server receives a query statement from a client and parses the query statement to obtain a plurality of predicates; combines the plurality of predicates to obtain a plurality of predicate combinations; determines, among the plurality of predicate combinations and according to the types of the pre-configured training models, a plurality of candidate predicate combinations supported by the pre-configured training models, where each of the candidate predicate combinations includes at least two predicates; determines a first predicate combination among the plurality of candidate predicate combinations, where the predicates included in the first predicate combination are mutually different; and finally determines a first execution plan using the training model corresponding to the first predicate combination, and performs the data query using the first execution plan.
In the above technical solution, for a predicate combination, a training model of the combination can be obtained based on the correlation between its predicates and used to calculate the predicate selectivity. There is no need to calculate the selectivity of each predicate in the combination separately and then multiply the individual selectivities together. In other words, calculating selectivity with a training model takes the correlation between predicates into account, so the resulting selectivity is more accurate, which improves query performance.
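To illustrate why multiplying per-predicate selectivities can be inaccurate, the following minimal sketch compares the product of two single-predicate selectivities with the true joint selectivity on a toy table. The table contents and the names `rows`, `selectivity`, `p_city`, and `p_zip` are assumptions made for this example and do not come from the application.

```python
# Toy table in which the two columns (city, zip_code) are strongly correlated.
# The data is illustrative only.
rows = [
    ("Shenzhen", "518000"), ("Shenzhen", "518000"), ("Shenzhen", "518000"),
    ("Beijing", "100000"), ("Beijing", "100000"),
    ("Shanghai", "200000"), ("Shanghai", "200000"), ("Shanghai", "200000"),
]


def selectivity(predicate, table):
    """Fraction of rows in the table for which the predicate holds."""
    return sum(1 for row in table if predicate(row)) / len(table)


def p_city(row):
    return row[0] == "Shenzhen"


def p_zip(row):
    return row[1] == "518000"


# Independence assumption: multiply the two single-predicate selectivities.
independent_estimate = selectivity(p_city, rows) * selectivity(p_zip, rows)

# Joint selectivity of the combination, measured directly on the data.
true_joint = selectivity(lambda row: p_city(row) and p_zip(row), rows)

print(independent_estimate, true_joint)
```

Because the two predicates select exactly the same rows here, the product underestimates the joint selectivity; an estimator that accounts for the correlation between predicates avoids this kind of error.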
In a possible implementation, if at least two predicate combinations share identical or repeated predicates, the database server may select a suitable predicate combination based on the confidence of the training models.
In a possible implementation, the method further includes:
The database server determines at least two second predicate combinations among the plurality of candidate predicate combinations, where the at least two second predicate combinations have at least one predicate in common; determines a target predicate combination among the at least two second predicate combinations according to the confidence of the training model corresponding to each of the at least two second predicate combinations, where the confidence indicates the accuracy of the training model; and determines a second execution plan using the training model corresponding to the target predicate combination, and performs the data query using the second execution plan.
In the above technical solution, the database server determines at least two second predicate combinations that have at least one predicate in common, determines the target predicate combination among them according to the confidence of the training model corresponding to each second predicate combination, determines the second execution plan using the training model corresponding to the target predicate combination, and then performs the data query using the second execution plan. This improves the accuracy of predicate selectivity when the selectivity of predicate combinations with overlapping predicates is calculated.
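The selection among overlapping combinations can be sketched as follows. This is only an illustration under stated assumptions: predicate combinations are modelled as `frozenset` objects of predicate labels, the confidence values are invented, and `choose_target` is a hypothetical helper name.

```python
def choose_target(second_combinations, confidence):
    """Return the overlapping combination whose model has the highest confidence."""
    return max(second_combinations, key=lambda combo: confidence[combo])


# Two combinations that share the predicate "p4" (labels are placeholders).
combo_a = frozenset({"p3", "p4"})
combo_b = frozenset({"p4", "p5"})

# Hypothetical confidence values of the corresponding training models.
confidence = {combo_a: 0.82, combo_b: 0.91}

target = choose_target([combo_a, combo_b], confidence)
print(sorted(target))
```

Under these assumed values the combination containing `p4` and `p5` is chosen, so the shared predicate `p4` is estimated by the more accurate model only.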
In a possible implementation, the database server may determine at least two second predicate combinations among the plurality of candidate predicate combinations, where each second predicate combination may include at least two predicates, and the at least two second predicate combinations have at least one identical or repeated predicate.
In a possible implementation, the database server may select a suitable or optimal predicate combination, such as the target predicate combination, according to the confidence of the training models corresponding to the second predicate combinations.
In a possible implementation, the database server may also select the target predicate combination according to other filtering conditions. For example, the database server may set a threshold filtering condition and select, among the at least two second predicate combinations, the second predicate combination that satisfies the condition as the target predicate combination, eliminating the other second predicate combinations that do not satisfy it.
In a possible implementation, the method further includes:
Obtaining the confidence of the training model corresponding to each of the at least two second predicate combinations.
Optionally, the database server may obtain the confidence of the training model corresponding to each predicate combination from a system table of the database.
Optionally, a system table in the database system includes, for each predicate combination, the training result of its training model (model parameters such as weights and offsets) and the confidence of the model. The confidence of the model indicates the accuracy of the training model.
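A minimal sketch of such a lookup is given below. The application does not describe the layout of the system table, so the in-memory dictionary `SYSTEM_TABLE`, the `ModelEntry` record, the predicate strings, and all numeric values are assumptions used purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    """One hypothetical system-table row for a predicate combination."""
    weights: tuple
    offsets: tuple
    confidence: float


# Hypothetical stand-in for the system table, keyed by predicate combination.
SYSTEM_TABLE = {
    frozenset({"t.a > 10", "t.b < 5"}): ModelEntry((0.4, -0.2), (0.1,), 0.91),
    frozenset({"t.b < 5", "t.c = 1"}): ModelEntry((0.3, 0.3), (0.0,), 0.78),
}


def lookup_confidence(combination):
    """Return the stored confidence, or None if no model exists for the combination."""
    entry = SYSTEM_TABLE.get(frozenset(combination))
    return None if entry is None else entry.confidence


print(lookup_confidence(["t.b < 5", "t.a > 10"]))
print(lookup_confidence(["t.x = 1", "t.y = 2"]))
```

Keying the table by an unordered set means the lookup does not depend on the order in which the predicates appear in the query.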
In some possible implementations, the confidence of the training model corresponding to the target predicate combination is greater than the confidence of the training models of the other second predicate combinations among the at least two second predicate combinations.
In some possible implementations, among the at least two second predicate combinations determined from the plurality of candidate predicate combinations, the confidence of the training model corresponding to each second predicate combination satisfies a preset condition.
Optionally, the "preset condition" may be a specific threshold, or may be some other specific filtering condition.
In some possible implementations, the confidence of the training model corresponding to each of the at least two second predicate combinations is greater than a first threshold.
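The threshold condition can be illustrated with a short filter. The function name `above_threshold`, the predicate labels, the confidence values, and the threshold value 0.8 are all assumptions chosen for the example.

```python
def above_threshold(candidates, confidence, first_threshold):
    """Keep only combinations whose model confidence exceeds the first threshold."""
    return [combo for combo in candidates if confidence[combo] > first_threshold]


# Three overlapping combinations with hypothetical model confidences.
c1 = frozenset({"p1", "p2"})
c2 = frozenset({"p2", "p3"})
c3 = frozenset({"p2", "p4"})
confidence = {c1: 0.85, c2: 0.60, c3: 0.92}

second_combinations = above_threshold([c1, c2, c3], confidence, first_threshold=0.8)
print(len(second_combinations))
```

In this sketch the combination with confidence 0.60 is eliminated before the remaining combinations are compared with each other.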
In some possible implementations, determining the second execution plan using the training model corresponding to the target predicate combination includes:
The database server obtains the model parameters of the training model corresponding to the target predicate combination, where the model parameters include at least one of a weight and an offset, and generates the second execution plan using the model parameters.
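How stored weights and an offset might be turned into a selectivity estimate is sketched below. The application does not fix the model form here; a single logistic unit is assumed purely for illustration, and `estimate_selectivity`, the feature encoding, and the numeric values are hypothetical.

```python
import math


def estimate_selectivity(features, weights, offset):
    """Apply one logistic unit: sigmoid(weights . features + offset), a value in (0, 1)."""
    z = sum(w * x for w, x in zip(weights, features)) + offset
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical model parameters read for the target predicate combination.
weights = [0.8, -0.5]
offset = 0.0

# Hypothetical encoded predicate constants (for example, normalized bounds).
features = [0.0, 0.0]

sel = estimate_selectivity(features, weights, offset)
estimated_rows = round(sel * 10_000)  # cardinality estimate for a 10,000-row table
print(sel, estimated_rows)
```

The optimizer would then use a cardinality estimate of this kind when costing the operators of candidate execution plans.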
In a second aspect, a data query apparatus is provided, configured to perform the method in the first aspect or any possible implementation of the first aspect. Specifically, the apparatus includes modules or units for performing the method in the first aspect or any possible implementation of the first aspect.
In a third aspect, a data query apparatus is provided. The apparatus includes a processor, a memory, and a communication interface. The processor is coupled to the memory and the communication interface. The memory is configured to store instructions, the processor is configured to execute the instructions, and the communication interface is configured to communicate with other network elements under the control of the processor. When executed by the processor, the instructions cause the processor to perform the method in the first aspect or any possible implementation of the first aspect.
In a fourth aspect, a database system is provided. The database system includes a database and the data query apparatus of the second aspect or the third aspect.
In a fifth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores a program that causes a data query apparatus to perform the data query method of the first aspect or any of its implementations.
Brief description of the drawings
FIG. 1 is a schematic architecture diagram of a database system to which an embodiment of this application is applied.
FIG. 2 is a schematic diagram of a stand-alone database system to which an embodiment of this application is applied.
FIG. 3 is a schematic diagram of a cluster database system with a shared-disk architecture to which an embodiment of this application is applied.
FIG. 4 is a schematic diagram of a cluster database system with a shared-nothing architecture to which an embodiment of this application is applied.
FIG. 5 is a schematic diagram of a database server to which an embodiment of this application is applied.
FIG. 6 is a schematic flowchart of a data query method according to an embodiment of this application.
FIG. 7 is a schematic flowchart of a data query method according to another embodiment of this application.
FIG. 8 is a schematic diagram of an example of a plurality of candidate predicate combinations according to an embodiment of this application.
FIG. 9 is a flowchart of an example according to an embodiment of this application.
FIG. 10 is a flowchart of a specific example according to an embodiment of this application.
FIG. 11 is a schematic diagram of an example to which an embodiment of this application is applied.
FIG. 12 is a schematic block diagram of a data query apparatus according to an embodiment of this application.
FIG. 13 is a schematic block diagram of a data query apparatus according to another embodiment of this application.
FIG. 14 is a schematic block diagram of a database system according to an embodiment of this application.
FIG. 15 is a structural block diagram of a data query apparatus provided by an embodiment of this application.
Detailed description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some rather than all of the embodiments of this application.
The technical solutions of the embodiments of this application can be used in a database system or a database management system (DBMS), such as a relational database management system.
The architecture of the database system to which the embodiments of this application are applied is shown in FIG. 1. The database system includes a database and a DBMS. The database is an organized collection of data stored in a data store, that is, a collection of related data organized, stored, and used according to a certain data model. For example, the database may include the data of one or more tables.
The DBMS is used to create, use, and maintain the database, and to manage and control the database in a unified manner to ensure its security and integrity. Users access the data in the database through the DBMS, and database administrators also maintain the database through the DBMS. The DBMS provides a variety of functions that allow multiple applications and user devices to create, modify, and query the database in different ways, at the same time or at different times; applications and user devices are collectively referred to as clients. The functions provided by the DBMS may include the following: (1) data definition: the DBMS provides a Data Definition Language (DDL) to define the database structure; the DDL describes the database framework and can be saved in the data dictionary; (2) data access: the DBMS provides a Data Manipulation Language (DML) for basic access operations on database data, such as retrieval, insertion, modification, and deletion; (3) database operation management: the DBMS provides data control functions, namely security, integrity, and concurrency control, to control and manage database operation effectively and ensure that data is correct and valid; (4) database creation and maintenance, including loading of initial data, dumping, recovery, and reorganization of the database, and system performance monitoring and analysis; (5) database transmission: the DBMS handles the transmission of data and implements communication between the client and the DBMS, usually in coordination with the operating system.
Specifically, FIG. 2 is a schematic diagram of a stand-alone database system, which includes a database management system and a data store. The database management system provides services such as querying and modifying the database, and stores data in the data store. In a stand-alone database system, the database management system and the data store are usually located on a single server, such as a Symmetric Multi-Processor (SMP) server. The SMP server includes multiple processors, all of which share resources such as the bus, memory, and I/O system. The functions of the database management system can be implemented by one or more processors executing programs in memory.
FIG. 3 is a schematic diagram of a cluster database system with a shared-disk (Shared-storage) architecture. It includes multiple nodes (nodes 1 to N in FIG. 3), each of which is deployed with a database management system that provides users with services such as querying and modifying the database. The multiple database management systems store shared data in a shared data store and read and write the data in the data store through a switch. The shared data store may be a shared disk array. A node in the cluster database system may be a physical machine, such as a database server, or a virtual machine running on abstracted hardware resources. If the node is a physical machine, the switch is a Storage Area Network (SAN) switch, an Ethernet switch, a fiber switch, or another physical switching device. If the node is a virtual machine, the switch is a virtual switch.
FIG. 4 is a schematic diagram of a cluster database system with a shared-nothing architecture. Each node has its own dedicated hardware resources (such as a data store), operating system, and database, and the nodes communicate with one another over a network. In this architecture, data is distributed to the nodes according to the database model and the characteristics of the application, and a query task is divided into several parts that are executed in parallel on all nodes; the nodes compute cooperatively and provide the database service as a whole, and all communication takes place over a high-bandwidth network interconnect. As in the shared-disk cluster database system described in FIG. 3, the nodes here may be either physical machines or virtual machines.
In all embodiments of this application, the data store of the database system includes, but is not limited to, a solid-state drive (SSD), a disk array, or another type of non-transitory computer-readable medium. Although the database is not shown in FIG. 2 to FIG. 4, it should be understood that the database is stored in the data store. A person skilled in the art can understand that a database system may include fewer or more components than those shown in FIG. 2 to FIG. 4, or components different from those shown; FIG. 2 to FIG. 4 show only the components most relevant to the implementations disclosed in the embodiments of this application. For example, although four nodes have been described in FIG. 3 and FIG. 4, a person skilled in the art can understand that a cluster database system may include any number of nodes. The database management system functions of each node may be implemented by an appropriate combination of software, hardware, and/or firmware running on that node.
A person skilled in the art can clearly understand from the teachings of the embodiments of this application that the method of the embodiments can typically be applied to a database management system installed or deployed in a stand-alone database system, a cluster database system with a shared-nothing architecture, a cluster database system with a shared-storage architecture, or another type of database system.
For ease of understanding and description, and by way of example rather than limitation, the solution of the embodiments of this application is described below using a database server as an example. The database server may be the SMP server in the stand-alone database system described in FIG. 2, or one of the nodes described in FIG. 3 or FIG. 4. Specifically, as shown in FIG. 5, the database server 100 includes at least one processor 104, a non-transitory computer-readable medium 106 storing executable code, and a database management system 108. The executable code, when executed by the at least one processor 104, is configured to implement the components and functions of the database management system 108. The non-transitory computer-readable medium 106 may include one or more non-volatile memories. Examples of non-volatile memory include semiconductor memory devices such as Erasable Programmable Read Only Memory (EPROM), Electrically Erasable Programmable Read Only Memory (EEPROM), and flash memory; magnetic disks such as internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM. In addition, the non-transitory computer-readable medium 106 may include any device configured as main memory. The at least one processor 104 may include any type of general-purpose computing circuit or special-purpose logic circuit, such as a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC). The at least one processor 104 may also be one or more processors, such as CPUs, coupled to one or more semiconductor substrates.
The database management system 108 may be a Relational Database Management System (RDBMS). The database management system 108 supports Structured Query Language (SQL). In general, SQL is a special-purpose programming language for managing data stored in relational databases. SQL can refer to various types of data-related languages, including, for example, a data definition language and a data manipulation language, and its scope can include data insertion, query, update, and deletion, schema creation and modification, and data access control. In some examples, SQL can also include descriptions related to various language elements, including clauses, expressions, predicates, queries, and statements. An expression can be configured to produce a scalar value and/or a table including columns and/or rows of data. A predicate (PRED for short) is a logical expression that evaluates to a logical value (such as TRUE, FALSE, or UNKNOWN) and can be used to describe the connection relationships between objects. For example, in a SELECT query statement, the filter conditions in the WHERE clause and the HAVING clause can be understood as specifying predicates.
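The three logical values of a predicate can be illustrated with a small sketch that mimics how a WHERE clause keeps only the rows for which the predicate is TRUE. Python's `None` stands in for both SQL NULL and UNKNOWN here; the helper names `greater_than` and `where` and the sample rows are assumptions for the example.

```python
def greater_than(value, constant):
    """Evaluate `value > constant`; return None (UNKNOWN) when the value is NULL."""
    if value is None:
        return None
    return value > constant


def where(rows, predicate):
    """Keep only rows for which the predicate is TRUE (FALSE and UNKNOWN are dropped)."""
    return [row for row in rows if predicate(row) is True]


rows = [{"age": 42}, {"age": 17}, {"age": None}]

# Roughly corresponds to: SELECT * FROM t WHERE age > 30
kept = where(rows, lambda row: greater_than(row["age"], 30))
print(kept)
```

The row with a NULL age is filtered out because the predicate evaluates to UNKNOWN rather than TRUE for it.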
A query is a request to view, access, and/or manipulate data stored in a database. For example, the database management system 108 can receive a query in SQL format (referred to as an SQL query) from the database client 102. Typically, the database management system 108 receives the client's query through a communication interface, for example an application programming interface (API) or a network interface such as an Ethernet interface, accesses the relevant data in the database and manipulates it to generate the query result corresponding to the query, and returns the query result to the database client 102 through the same communication interface. A database is a collection of data organized, described, and stored according to a certain mathematical model, and may include one or more database structures or formats, such as row storage and column storage. The database is typically stored in a data store, such as the external data store 120 in FIG. 5, or in the non-transitory computer-readable medium 106. When the database is stored in the non-transitory computer-readable medium 106, the database management system 108 is an in-memory database management system.
The database client 102 may include any type of device or application configured to interact with the database management system 108. In some examples, the database client 102 includes one or more application servers.
The database management system 108 includes a parser 112, a query optimizer 114, a query executor 122, and a storage engine 134. The parser 112 performs syntax and semantic analysis of the query submitted by the client 102, expands the views in the query, and divides the query into small query blocks. The query optimizer 114 generates a set of execution plans that could be used for the query, estimates the cost of each execution plan, compares the costs, and finally selects an optimal execution plan. The query executor 122 operates according to the execution plan of the query to produce the query result. The storage engine 134 manages the actual content of table data and indexes, and also manages runtime data such as caches, buffers, transactions, and logs. For example, the storage engine 134 can write the execution results of the query executor 122 to the data store 120 through physical I/O.
When the query optimizer 114 selects the optimal execution plan, calculating predicate selectivity is a very important step. The accuracy of the predicate selectivity directly affects the accuracy of the execution plan; for example, it may affect the accuracy of the cost estimate of each operator in the execution plan, and thus the output of the optimal execution plan.
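The effect of a selectivity error on plan choice can be sketched with a deliberately simplified cost model. The two access paths, the cost constants, and the function `choose_access_path` are assumptions made for this illustration and are not the cost model of any particular database system.

```python
def choose_access_path(table_rows, selectivity,
                       seq_cost_per_row=1.0, index_cost_per_row=4.0):
    """Return the cheaper of two hypothetical access paths and its estimated cost."""
    # A sequential scan reads every row of the table.
    seq_cost = table_rows * seq_cost_per_row
    # An index scan touches only matching rows, but each access is more expensive.
    index_cost = table_rows * selectivity * index_cost_per_row
    if index_cost < seq_cost:
        return "index_scan", index_cost
    return "seq_scan", seq_cost


# With an accurate selectivity of 0.5 the sequential scan is cheaper.
accurate = choose_access_path(10_000, 0.5)

# If the selectivity is underestimated as 0.01, the index scan is chosen instead.
underestimated = choose_access_path(10_000, 0.01)

print(accurate[0], underestimated[0])
```

Under these assumed constants, the same query is given a different plan solely because the selectivity estimate changed, which is why estimation accuracy matters for the optimizer's output.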
Based on the database server 100 described above, the embodiments of this application propose a data query method for predicate combinations with repeated predicates, to improve the accuracy of predicate selectivity and thereby improve query performance.
FIG. 6 is a schematic flowchart of a data query method 600 according to an embodiment of this application. With reference to FIG. 5, the method includes:
S610, the database management system 108 receives a query statement submitted by the client over a communication connection established with the database server;
S620, the parser 112 of the database management system 108 parses the query statement to obtain a plurality of predicates;
S630, the query optimizer 114 combines the plurality of predicates to obtain a plurality of predicate combinations;
S640, the query optimizer 114 determines, among the plurality of predicate combinations and according to the types of the pre-configured training models, a plurality of candidate predicate combinations corresponding to the pre-configured training models, where each of the candidate predicate combinations includes at least two predicates;
Optionally, the query optimizer 114 may select the available candidate predicate combinations according to the kinds of training models, where each candidate predicate combination has a corresponding training model.
Optionally, in the embodiments of this application, a training model may be a supervised or unsupervised learning model obtained through a machine learning algorithm, such as a neural network (NN) model, a support vector machine (SVM) model, a fuzzy model, or a random forest model. For example, neural network models include the feedforward neural network (FFNN) model, the recurrent neural network (RNN) model, and so on.
It should be noted that the machine learning model and its training process are external to the database, and the database kernel creates a system table associated with the external machine learning model. After a model for predicate selectivity estimation has been trained, the resulting training model and the predicate combination it corresponds to are stored in this system table; each predicate combination corresponds to one training model. Further, the training model can be validated with data that was not used for training, and the resulting model confidence (accuracy) value is stored in the system table. In addition, for the specific model training process after a machine learning model is introduced into the database query optimizer, and for the related technical flow of writing training results into the system table, see the prior application ZL201710109372.1, "An Information Processing Method and Apparatus"; details are not repeated here.
S650. The query optimizer 114 determines a first predicate combination among the plurality of candidate predicate combinations, where the predicates included in the first predicate combination are different from one another.
Optionally, the query optimizer 114 may also determine at least one first predicate combination among the plurality of candidate predicate combinations, where the predicates included in the at least one first predicate combination are different from one another.
Here, "the predicates included in the at least one first predicate combination are different from one another" describes a comparison between predicate combinations. For example, if predicate combination 1 includes predicate 1 and predicate 2, and predicate combination 2 includes predicate 3 and predicate 4, then predicate combination 1 and predicate combination 2 share no predicate.
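The "different from one another" condition can be illustrated with a small check. This is a minimal sketch, not the optimizer's actual logic: the function name `pairwise_disjoint` and the tuple representation of a predicate combination are assumptions made for illustration.

```python
from itertools import combinations

def pairwise_disjoint(combos):
    """Return True if no predicate appears in more than one combination.

    Each combination is modelled as a tuple of predicate names
    (a hypothetical representation chosen for this sketch).
    """
    return all(set(a).isdisjoint(b) for a, b in combinations(combos, 2))

combo1 = ("PRED1", "PRED2")
combo2 = ("PRED3", "PRED4")
combo3 = ("PRED1", "PRED4")

print(pairwise_disjoint([combo1, combo2]))  # True: no shared predicate
print(pairwise_disjoint([combo1, combo3]))  # False: PRED1 is shared
```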
S660. The query optimizer 114 determines a first execution plan by using the training model corresponding to the first predicate combination; the query executor 122 performs the data query by using the execution plan generated by the query optimizer 114 and returns the query result to the client 102.
Specifically, when the database server 100 receives a query statement (for example, an SQL statement) from the client, it may parse the query statement to obtain a plurality of predicates. The query optimizer 114, which is able to determine the connection relationships between predicates, may then combine or recombine the predicates based on those relationships to obtain a plurality of predicate combinations; for example, it may recombine predicates of the same level, level by level. Next, according to the types of training models stored in the system table, the query optimizer 114 may select from these predicate combinations a plurality of candidate predicate combinations that are supported by the training models. From the candidate predicate combinations, the query optimizer 114 may select a first predicate combination whose predicates are different from one another. Finally, the query optimizer 114 determines a first execution plan by using the training model corresponding to the first predicate combination, and the data query is performed using the first execution plan.
Here, if the query optimizer 114 determines at least one first predicate combination among the plurality of candidate predicate combinations (where the predicates included in the at least one first predicate combination are different from one another), the query optimizer 114 determines one execution plan by using the training model corresponding to each first predicate combination as follows: it uses the training model of each first predicate combination to compute a predicate selection rate, obtaining a plurality of predicate selection rates; it multiplies these selection rates together to obtain one final predicate selection rate; and it determines one execution plan based on that final selection rate. For example, if the query optimizer 114 computes a selection rate A for the predicate combination of C1 and C2 and a selection rate B for the predicate combination of C3 and C4, the selection rate for C1, C2, C3, and C4 together is A*B, and the query optimizer 114 determines the final execution plan based on A*B.
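The multiplication step can be sketched as follows. The values assigned to A and B are hypothetical (the text gives no numbers), and `combined_selection_rate` is an illustrative name rather than a function of any real optimizer.

```python
import math

def combined_selection_rate(rates):
    """Multiply the per-combination selection rates into one final rate."""
    for r in rates:
        if not 0.0 <= r <= 1.0:
            raise ValueError(f"selection rate out of range: {r}")
    return math.prod(rates)

a = 0.2  # hypothetical model output for the combination of C1 and C2
b = 0.5  # hypothetical model output for the combination of C3 and C4
final_rate = combined_selection_rate([a, b])
print(final_rate)
```

Multiplying is only applied across combinations that share no predicate; the correlation inside each combination is already captured by that combination's training model.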
Therefore, for a predicate combination, a training model of the combination can be obtained based on the correlation between its predicates and then used to compute the predicate selection rate. There is no need to compute the selection rate of each predicate in the combination separately and multiply the results. In other words, computing the selection rate with a training model takes the correlation between predicates into account, so the resulting selection rate is more accurate, which improves query performance (SQL execution performance).
The foregoing describes the case in which the predicate combinations share no predicates. Optionally, in an embodiment, if two predicate combinations have the same (repeated) predicate, the query optimizer 114 may select a suitable predicate combination based on the confidence of the training models. It should be understood that, in the embodiments of this application, the terms "first predicate combination" and "second predicate combination" are introduced only to distinguish different objects and do not limit the embodiments of this application.
A data query method 700 according to another embodiment of this application is described below with reference to FIG. 7. As shown in FIG. 7, the method 700 includes:
S710. Determine at least two second predicate combinations among the plurality of candidate predicate combinations, where the at least two second predicate combinations have at least one predicate in common.
Optionally, the query optimizer 114 may determine at least two second predicate combinations among the plurality of candidate predicate combinations, where one second predicate combination may include at least two predicates, and the at least two second predicate combinations have at least one identical (repeated) predicate.
The following description uses, as an example, two second predicate combinations with a repeated predicate determined by the query optimizer 114, where each second predicate combination may include a plurality of predicates.
For example, predicate combination 1 may include predicate 1 and predicate 2, and predicate combination 2 may include predicate 1 and predicate 4; the predicate repeated between predicate combination 1 and predicate combination 2 is predicate 1.
As another example, predicate combination 3 may include predicate 1, predicate 2, and predicate 3, and predicate combination 4 may include predicate 1, predicate 2, and predicate 5; the predicates repeated between predicate combination 3 and predicate combination 4 are predicate 1 and predicate 2.
Optionally, in the embodiments of this application, each predicate combination has a corresponding training model. The training model can be understood as a selection rate model of the predicate combination. For example, for a predicate combination composed of field 1 and field 2, a selection rate model for two correlated columns can be built.
Optionally, the method 600 or the method 700 may further include:
obtaining the confidence of the training model corresponding to each of the at least two second predicate combinations.
Optionally, the query optimizer 114 may obtain the confidence of the training model corresponding to each second predicate combination from the system table of the database.
Optionally, the system table in the database system includes, for the training model of each predicate combination, the training results (model parameters such as weights and offsets) and the confidence of the model. The confidence of a model indicates the accuracy of the training model. Table 1 below shows an example of part of the training model data stored in the system table of the database:
Table 1. Part of the training model data stored in the system table
Figure PCTCN2018083826-appb-000001
In Table 1, sel2 indicates that the query statement contains 2 correlated predicates. PRED1 and PRED2, PRED1 and PRED4, and PRED3 and PRED5 are each a combination of correlated predicates, and each combination corresponds to a different training model. valid can be understood as a flag bit of the training model whose value indicates whether the model is valid: a valid value of 1 indicates that the training model is valid, and a valid value of 0 indicates that it is invalid. confidence indicates the confidence of the training model; in Table 1, the confidence of the training model for PRED1 and PRED2 is 0.76, that for PRED1 and PRED4 is 0.93, and that for PRED3 and PRED5 is 0.26.
It should be understood that the data in Table 1 is only an example; in practice, the system table of the database may also include other data, which is not limited here.
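The fields discussed for Table 1 can be modelled in memory as below. This is a sketch under assumptions: the class name `ModelEntry`, the field layout, and the `lookup` helper are hypothetical, and every `valid` flag is set to 1 for illustration because the table image is not reproduced in the text. Only the three confidence values come from the description.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class ModelEntry:
    predicates: Tuple[str, ...]  # the predicate combination
    valid: int                   # 1 = model valid, 0 = model invalid
    confidence: float            # accuracy of the training model

# Confidence values are the ones quoted for Table 1; valid=1 is assumed.
SYSTEM_TABLE = [
    ModelEntry(("PRED1", "PRED2"), valid=1, confidence=0.76),
    ModelEntry(("PRED1", "PRED4"), valid=1, confidence=0.93),
    ModelEntry(("PRED3", "PRED5"), valid=1, confidence=0.26),
]

def lookup(predicates) -> Optional[ModelEntry]:
    """Find the entry for a predicate combination, ignoring predicate order."""
    key = frozenset(predicates)
    for entry in SYSTEM_TABLE:
        if frozenset(entry.predicates) == key:
            return entry
    return None

entry = lookup(("PRED4", "PRED1"))
print(entry.confidence if entry else None)  # 0.93
```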
S720. Determine a target predicate combination among the at least two second predicate combinations according to the confidence information of the training model corresponding to each of the at least two second predicate combinations, where the confidence indicates the accuracy of the training model.
Optionally, the query optimizer 114 may select a suitable or optimal predicate combination, that is, the target predicate combination, according to the confidence of the training models corresponding to the second predicate combinations.
Optionally, the query optimizer 114 may also select the target predicate combination according to other screening conditions. For example, the query optimizer 114 may set a threshold screening condition, select from the plurality of second predicate combinations the target predicate combination that satisfies the threshold screening condition, and eliminate the other second predicate combinations that do not satisfy it.
Optionally, the confidence of the training model corresponding to the target predicate combination is greater than the confidence of the training models corresponding to the other predicate combinations among the at least two second predicate combinations; that is, the confidence of the training model corresponding to the target predicate combination is the largest among all the second predicate combinations.
In other words, the query optimizer 114 may compare the confidences of the training models corresponding to the second predicate combinations, pick the largest confidence to determine the target predicate combination, and eliminate the other predicate combinations.
For example, if the confidence of the training model corresponding to predicate combination 1 (PRED1, PRED2) is 0.76 and the confidence of the training model corresponding to predicate combination 2 (PRED1, PRED4) is 0.93, predicate combination 2 (PRED1, PRED4), which has the larger confidence, is selected as the target predicate combination, and predicate combination 1 is eliminated.
It should be understood that predicate combination 1 and predicate combination 2 are used here only as an example; in practice, more predicate combinations may be involved, which is not limited here.
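The selection in S720 amounts to taking the maximum over the confidences of the overlapping combinations. A minimal sketch, using the two combinations from the example above; the function name `pick_target` and the dictionary representation are illustrative assumptions.

```python
def pick_target(confidence_by_combo):
    """Return the predicate combination whose training model has the highest confidence."""
    if not confidence_by_combo:
        raise ValueError("no second predicate combinations to choose from")
    return max(confidence_by_combo, key=confidence_by_combo.get)

# Both combinations contain PRED1, so only one of them may be used.
overlapping = {
    ("PRED1", "PRED2"): 0.76,
    ("PRED1", "PRED4"): 0.93,
}
target = pick_target(overlapping)
print(target)  # ('PRED1', 'PRED4')
```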
S730. Determine a second execution plan by using the training model corresponding to the target predicate combination, and perform the data query by using the second execution plan.
Specifically, the query optimizer 114 may perform the corresponding computation with the training model of the target predicate combination to obtain an execution plan (for example, the second execution plan), and the data query is then performed using the second execution plan. Because the target predicate combination is the optimal predicate combination after screening, the query optimizer 114 can obtain an optimal execution plan from the training model corresponding to the target predicate combination.
In the embodiments of this application, the query optimizer 114 determines at least two second predicate combinations, each of which includes at least two predicates and has a corresponding training model, and which have at least one predicate in common. The query optimizer 114 then determines a target predicate combination among the at least two second predicate combinations according to the confidence of the training model corresponding to each second predicate combination, determines a second execution plan by using the training model corresponding to the target predicate combination, and performs the data query by using the second execution plan. In this way, the accuracy of the predicate selection rate can be improved when computing the selection rate of predicate combinations with overlapping predicates.
It should be understood that the method 600 and the method 700 may be used in combination or independently. For example, among the plurality of candidate predicate combinations, some predicate combinations may have no repeated predicates while others have repeated predicates; or there may be only predicate combinations without repeated predicates; or there may be only predicate combinations with repeated predicates. This is not limited in the embodiments of this application.
The following describes in detail how the confidence of a training model is computed. It should be understood that the confidence of a training model can be computed with various evaluation methods; only one possible method is described here as an example, and it does not limit the embodiments of this application. It should also be understood that the operation of computing the confidence of a training model and the operation of training the model may be performed by the same entity, which may be a module or other apparatus independent of the database and may be located outside the database; this is not limited here. It should further be understood that the database kernel may establish system table metadata associated with the external training models, so as to obtain the training results or related data of the training models.
For example, taking the first training model as an example, computing its confidence may include:
obtaining a first training predicate combination, and computing a first selection rate of the first training predicate combination;
substituting the first training predicate combination into the corresponding first training model, and computing a second selection rate of the first training predicate combination, where the first training model is the training model corresponding to any one of the at least two predicate combinations;
computing, according to the first selection rate and the second selection rate, a first confidence corresponding to the first training predicate combination; and
determining the confidence of the first training model according to a plurality of such first confidences.
Specifically, for example, assume that the first training predicate combination is PRED1 = const1, PRED2 = const2, and that the function of the corresponding first training model is $f_{ml}$. First, the selection rate $S_{ml}$ of the training predicate combination is computed with the first training model, as shown in the following equation:
$$S_{ml} = f_{ml}(\mathrm{const1}, \mathrm{const2})$$
Then the true selection rate $S$ of the first training predicate combination is computed, as shown in the following equation:
$$S = \frac{\mathrm{count}(\mathrm{const1}, \mathrm{const2})}{\mathrm{count}(*)}$$
Here, count has the meaning of the COUNT syntax in SQL and denotes the number of tuples that satisfy a given predicate condition. For example, if a table contains 10 rows in total, of which 4 tuples satisfy the predicate condition PRED1 = const1 AND PRED2 = const2, then count(const1, const2) is 4 and count(*) is 10.
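The count-based example can be reproduced on an in-memory table. This is a sketch under assumptions: the rows are fabricated toy data sized to match the 4-of-10 example, and treating PRED1 and PRED2 as column names compared against constants is a simplification made for illustration.

```python
def true_selection_rate(rows, const1, const2):
    """Fraction of rows satisfying PRED1 = const1 AND PRED2 = const2."""
    if not rows:
        raise ValueError("table is empty")
    matching = sum(
        1 for row in rows
        if row["PRED1"] == const1 and row["PRED2"] == const2
    )
    return matching / len(rows)

# Toy table: 4 matching rows and 6 non-matching rows, 10 rows in total.
rows = [{"PRED1": 1, "PRED2": 2}] * 4 + [{"PRED1": 0, "PRED2": 0}] * 6
s = true_selection_rate(rows, const1=1, const2=2)
print(s)  # 0.4
```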
Here, assume that the computed results are $S_{ml} = 0.3$ and $S = 0.28$. Let $c_1$ denote the first confidence of the first training model for this first training predicate combination; the value of $c_1$ is defined as follows:
Figure PCTCN2018083826-appb-000003
Specifically, since $S_{ml}/S = 0.3/0.28 \approx 1.07$, $c_1 = 1$. If instead $S_{ml} = 0.3$ and $S = 0.38$, then since $S_{ml}/S = 0.3/0.38 \approx 0.79$, $c_1 = 0$.
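The two worked values can be reproduced with an indicator on the ratio of the model's selection rate to the true selection rate. The exact acceptance band is defined in a figure that is not reproduced in the text, so the tolerance of 0.1 around a ratio of 1 used below is an assumption, chosen only because it is consistent with both examples (1.07 accepted, 0.79 rejected).

```python
def sample_confidence(s_ml, s_true, tolerance=0.1):
    """Return 1 if s_ml / s_true lies within the assumed band around 1, else 0.

    tolerance=0.1 is a hypothetical value; the source defines the band
    in a figure that is not available here.
    """
    if s_true <= 0:
        raise ValueError("true selection rate must be positive")
    ratio = s_ml / s_true
    return 1 if abs(ratio - 1.0) <= tolerance else 0

c_a = sample_confidence(0.3, 0.28)  # ratio about 1.07
c_b = sample_confidence(0.3, 0.38)  # ratio about 0.79
print(c_a, c_b)  # 1 0
```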
The foregoing describes one way of computing a confidence. For a plurality of training predicate combinations, the corresponding confidences can be computed in a similar way. That is, the query optimizer 114 may obtain a plurality of first training predicate combinations and thereby obtain the first confidence corresponding to each first training predicate combination. The query optimizer 114 then uses the plurality of first confidences to compute the confidence of the first training model.
For example, n training predicate combinations correspond to n values of $c_i$. The query optimizer 114 combines the n values of $c_i$ and computes the confidence $C$ of the training model as:
Figure PCTCN2018083826-appb-000004
where $i \in \{1, 2, \ldots, n\}$ and n is the number of training predicate combinations.
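One plausible way to combine the n values of c_i is their arithmetic mean, which yields a confidence between 0 and 1 like the values quoted for Table 1. The actual formula is given in a figure that is not reproduced in the text, so the averaging below is an assumption, not the confirmed definition of C.

```python
def model_confidence(sample_confidences):
    """Average the per-sample confidences c_i (assumed aggregation rule)."""
    if not sample_confidences:
        raise ValueError("at least one training predicate combination is required")
    return sum(sample_confidences) / len(sample_confidences)

c_values = [1, 1, 0, 1]  # hypothetical c_i for n = 4 validation samples
C = model_confidence(c_values)
print(C)  # 0.75
```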
Here, the plurality of first training predicate combinations can be understood as data not used to train the model, which is used to verify the accuracy of the training model. In other words, the model may be validated with data that did not take part in model training, to obtain a value for the accuracy of the training model.
It should be understood that the first training model is used here only as an example; in the embodiments of this application, the confidence of every training model may be computed with the foregoing method, which is not limited here.
Therefore, for a plurality of predicate combinations with repeated predicates, the query optimizer 114 may determine a target predicate combination among the plurality of second predicate combinations according to the confidence of the training model corresponding to each predicate combination, determine a second execution plan by using the first training model corresponding to the target predicate combination, and then perform the data query by using the second execution plan. In this way, the accuracy of the predicate selection rate can be improved when computing the selection rate of predicate combinations with overlapping predicates.
Optionally, determining the second execution plan by using the training model corresponding to the target predicate combination includes:
obtaining model parameters of the training model corresponding to the target predicate combination, where the model parameters include at least one of a weight and an offset; and
generating the second execution plan by using the model parameters.
Specifically, the query optimizer 114 may look up, in the system table of the database, the model parameters of the training model corresponding to the target predicate combination. The model parameters may include the training results of the training model, such as weights and offsets. For example, the weights may be the neuron connection weights of a neural network training model, including the weights between the input layer and the output layer, hidden-layer thresholds, output-layer thresholds, and the weight matrix between the hidden layer and the output layer; the offsets may be the offsets corresponding to the weights obtained by training the neural network model. The query optimizer 114 thus computes the predicate selection rate based on the model parameters and then generates the second execution plan.
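To make the role of weights and offsets concrete, the sketch below runs a forward pass through a network with one hidden layer and a sigmoid output, so the result lies between 0 and 1. The architecture, the activation function, the input encoding, and all parameter values are assumptions for illustration; the text does not specify how the stored parameters are evaluated.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_selection_rate(inputs, hidden_w, hidden_b, out_w, out_b):
    """Forward pass: inputs -> hidden layer -> single sigmoid output.

    hidden_w is a list of weight rows (one per hidden neuron), hidden_b the
    matching offsets, out_w the hidden-to-output weights, out_b its offset.
    """
    hidden = [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(hidden_w, hidden_b)
    ]
    return sigmoid(sum(w * h for w, h in zip(out_w, hidden)) + out_b)

# Hypothetical parameters, as if read from the system table.
rate = predict_selection_rate(
    inputs=[0.3, 0.7],                 # normalised const1 and const2 (assumed encoding)
    hidden_w=[[0.5, -0.2], [0.1, 0.4]],
    hidden_b=[0.0, -0.1],
    out_w=[0.8, -0.6],
    out_b=-0.5,
)
print(rate)
```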
It should be understood that, for the at least one predicate combination without repeated predicates described earlier, the corresponding execution plan may also be obtained with the method described here; for brevity, details are not repeated.
Optionally, among the at least two second predicate combinations determined from the plurality of candidate predicate combinations, the confidence of the training model corresponding to each second predicate combination satisfies a preset condition.
Specifically, the query optimizer 114 may obtain a plurality of candidate predicate combinations, which are the candidate predicate combinations screened out by the query optimizer 114 based on the machine learning algorithm, or can be understood as the predicate combinations supported by the training models. For example, the candidate predicate combinations may be: predicate combination 1 (PRED1, PRED2); predicate combination 2 (PRED1, PRED4); and predicate combination 3 (PRED3, PRED5). The query optimizer 114 may then select, from these candidate predicate combinations, the at least two second predicate combinations that satisfy the preset condition. Specifically, for example, the query optimizer 114 may check the confidence of the training model corresponding to each candidate predicate combination and take the candidate predicate combinations whose confidence satisfies the preset condition as the at least two second predicate combinations, so that the target predicate combination can subsequently be determined among them.
Optionally, the "preset condition" may be a specific threshold or a specific screening condition.
Optionally, the confidence of the training model corresponding to each of the at least two second predicate combinations is greater than a first threshold.
Here, the first threshold can be understood as a constant accepted internally by the query optimizer 114. If the confidence of the training model of a predicate combination is greater than the first threshold, the accuracy of that training model is considered high.
For example, assume that the first threshold is 0.3, the confidence of the training model corresponding to predicate combination 1 (PRED1, PRED2) is 0.76, that of predicate combination 2 (PRED1, PRED4) is 0.93, and that of predicate combination 3 (PRED3, PRED5) is 0.26. When determining the at least two predicate combinations, the query optimizer 114 selects the predicate combinations whose confidence is greater than 0.3, namely predicate combination 1 (PRED1, PRED2) and predicate combination 2 (PRED1, PRED4), and eliminates the predicate combination whose confidence is less than 0.3, namely predicate combination 3 (PRED3, PRED5).
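The threshold example maps directly onto a filter. A minimal sketch using the numbers from the example; the names `FIRST_THRESHOLD` and `filter_by_threshold` are illustrative.

```python
FIRST_THRESHOLD = 0.3  # the first threshold from the example

candidates = {
    ("PRED1", "PRED2"): 0.76,
    ("PRED1", "PRED4"): 0.93,
    ("PRED3", "PRED5"): 0.26,
}

def filter_by_threshold(confidence_by_combo, threshold):
    """Keep only combinations whose model confidence is greater than the threshold."""
    return {
        combo: conf
        for combo, conf in confidence_by_combo.items()
        if conf > threshold
    }

kept = filter_by_threshold(candidates, FIRST_THRESHOLD)
print(sorted(kept))  # [('PRED1', 'PRED2'), ('PRED1', 'PRED4')]
```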
Alternatively, the query optimizer 114 may set a screening condition as follows: sort the confidences of the training models of all predicate combinations, and then adopt the training models whose confidence falls within a first proportion at the top of the ranking (for example, the top 30% of the ordered list). Training models ranked lower (for example, the bottom 70% of the ordered list) are considered not to satisfy the screening condition and are not adopted by the query optimizer 114.
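The ranking-based condition can be sketched as follows. The model names and confidence values are hypothetical, and the rounding rule (floor of n times 30%, but at least one model) is an assumption, since the text does not say how a fractional count is handled.

```python
def top_fraction(confidence_by_model, percent=30):
    """Return the names of the models in the top `percent` by confidence.

    Integer arithmetic avoids floating-point surprises; keeping at least
    one model is an assumed tie-breaking choice.
    """
    ranked = sorted(confidence_by_model.items(), key=lambda kv: kv[1], reverse=True)
    keep = max(1, len(ranked) * percent // 100)
    return [name for name, _ in ranked[:keep]]

# Ten hypothetical models with made-up confidences.
confidences = [0.93, 0.76, 0.26, 0.55, 0.81, 0.40, 0.67, 0.12, 0.88, 0.35]
models = {f"model_{i}": c for i, c in enumerate(confidences)}

adopted = top_fraction(models)
print(adopted)  # ['model_0', 'model_8', 'model_4']
```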
Therefore, by introducing a threshold or a screening condition, the query optimizer 114 can select, from the plurality of candidate predicate combinations, the at least two predicate combinations that satisfy the preset condition, and thereby obtain training models with higher accuracy for the subsequent output of an execution plan.
To help those skilled in the art understand the embodiments of this application more clearly, the following description is given with reference to FIG. 8 to FIG. 11.
FIG. 8 is a schematic diagram of an example of a plurality of candidate predicate combinations according to an embodiment of this application. As shown in FIG. 8, the database management system 108 may receive an SQL query statement submitted by the client over the communication connection established with the database server (shown in the uppermost box of FIG. 8), where the underlined parts are constant predicates (for example, a constant predicate may be a constant expression or a constant function). The parser 112 of the database management system 108 may then analyze the SQL query statement to obtain the predicates that can be supported by the training models (machine learning models); the analysis yields PRED1, PRED2, PRED3, PRED4, and PRED5 (the underlined predicates in the middle box of FIG. 8). The query optimizer 114 determines that join predicates are not supported by the training models. Further, the query optimizer 114 may obtain a plurality of candidate predicate combinations from PRED1, PRED2, PRED3, PRED4, and PRED5. As shown in the lowermost box of FIG. 8, the query optimizer 114 obtains three predicate combinations with a two-column selection rate (that is, each predicate combination includes two predicates): PRED2 and PRED1, PRED1 and PRED4, and PRED3 and PRED5. Each predicate combination corresponds to one training model, and each training model has a confidence. The query optimizer 114 can then perform subsequent operations based on these candidate predicate combinations.
FIG. 9 is a flowchart of an example according to an embodiment of the present application. As shown in FIG. 9, the query optimizer 114 may obtain a plurality of candidate predicate combinations (such as those shown in FIG. 8) through a preliminary screening operation and check the confidence of each candidate predicate combination. If the confidence does not satisfy a preset condition, the candidate predicate combination is eliminated; the remaining candidate predicate combinations, whose confidence satisfies the preset condition, undergo secondary screening. It should be understood that the preset condition may be a threshold or another screening condition, which is not limited herein. Optionally, the query optimizer 114 may also determine whether the training model corresponding to a candidate predicate combination is valid (for example, by checking a valid value); the combination proceeds to the next operation only when the training model is valid.
Then, in the secondary screening operation, for at least two predicate combinations that share a repeated or identical predicate, the query optimizer 114 determines which of them has the greatest confidence. The query optimizer 114 selects that predicate combination as the winning predicate combination, uses the training model corresponding to the winning predicate combination to calculate the corresponding selection rate, and finally outputs the optimal execution plan. Optionally, the query optimizer 114 may eliminate the other predicate combinations, whose confidence is not the greatest. Through these two rounds of screening, the query optimizer 114 can obtain the training model corresponding to the optimal predicate combination and perform the corresponding calculation to obtain the optimal execution plan.
Optionally, the secondary screening operation may also yield a predicate combination that shares no predicate with any other combination but whose confidence still satisfies the foregoing preset condition (not shown in FIG. 9). In this case, the query optimizer 114 may use the corresponding training model to perform the calculation and obtain the corresponding execution plan.
It should be understood that the at least two predicate combinations may share one repeated or identical predicate or several, which is not limited herein.
FIG. 10 is a flowchart of a specific example according to an embodiment of the present application; it is a more concrete illustration of FIG. 9. As shown in FIG. 10, the three candidate predicate combinations obtained by the query optimizer 114 through preliminary screening are PRED1 and PRED2, PRED1 and PRED4, and PRED3 and PRED5. The combinations PRED1 and PRED2 and PRED1 and PRED4 share the repeated predicate PRED1. The confidence of the training model corresponding to PRED1 and PRED2 is 0.76; the confidence of the training model corresponding to PRED1 and PRED4 is 0.93; and the confidence of the training model corresponding to PRED3 and PRED5 is 0.26. Next, the query optimizer 114 determines whether the confidence corresponding to each of the three candidate predicate combinations is greater than 0.3. Because 0.26 is less than 0.3, the query optimizer 114 eliminates the combination PRED3 and PRED5; because 0.76 and 0.93 are both greater than 0.3, the query optimizer 114 performs secondary screening on PRED1 and PRED2 and on PRED1 and PRED4. The query optimizer 114 then compares the confidences of the combinations that share the repeated predicate PRED1, selects the combination with the greatest confidence, here PRED1 and PRED4, and eliminates PRED1 and PRED2. Finally, the query optimizer 114 performs the corresponding calculation using the training model corresponding to PRED1 and PRED4 to output an execution plan.
It should be understood that a single repeated predicate, PRED1, is used here only as an example. In practice there may be multiple repeated predicates, and the method of the embodiments of the present application applies equally, which is not limited herein.
Optionally, the secondary screening operation may also yield a predicate combination that has no repeated predicate but whose confidence is also greater than 0.3, such as PRED6 and PRED7 (not shown in FIG. 10). In this case, the query optimizer 114 may perform the corresponding calculation using the training model corresponding to PRED6 and PRED7 to obtain the corresponding execution plan.
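The two-stage screening of FIG. 9 and FIG. 10 can be sketched as follows. This is a minimal sketch under assumptions, not the implementation of the present application: the `Candidate` class and the function `select_combinations` are hypothetical names, and the conflict rule used here (visit the surviving combinations in descending order of confidence and keep a combination only if it shares no predicate with one already kept) is one simple way to realize "choose the combination with the greatest confidence among those sharing a predicate". The text does not prescribe how longer chains of overlapping combinations are resolved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    predicates: frozenset   # names of the predicates in the combination
    confidence: float       # confidence of the corresponding training model
    valid: bool = True      # optional validity flag of the training model


def select_combinations(candidates, threshold=0.3):
    # Preliminary screening: drop invalid models and low-confidence combinations.
    survivors = [c for c in candidates if c.valid and c.confidence > threshold]

    # Secondary screening (assumed greedy rule): highest confidence first;
    # a combination loses if it shares a predicate with an earlier winner.
    winners, used = [], set()
    for cand in sorted(survivors, key=lambda c: c.confidence, reverse=True):
        if used.isdisjoint(cand.predicates):
            winners.append(cand)
            used |= cand.predicates
    return winners


# The three candidate combinations and confidences from FIG. 10.
candidates = [
    Candidate(frozenset({"PRED1", "PRED2"}), 0.76),
    Candidate(frozenset({"PRED1", "PRED4"}), 0.93),
    Candidate(frozenset({"PRED3", "PRED5"}), 0.26),
]

winners = select_combinations(candidates)
print([sorted(w.predicates) for w in winners])  # [['PRED1', 'PRED4']]
```

Under the same rule, a combination without any repeated predicate, such as PRED6 and PRED7 with a confidence above the threshold, is simply kept alongside the winner, which matches the optional case described above.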
FIG. 11 is a schematic diagram of an example in which an embodiment of the present application is applied. FIG. 11 shows the predicate combination that wins in FIG. 10. As shown in FIG. 11, among the plurality of candidate predicate combinations (PRED1 and PRED2, PRED1 and PRED4, and PRED3 and PRED5), the query optimizer 114 finally obtains PRED1 and PRED4 as the winning predicate combination.
It should be understood that FIG. 10 and FIG. 11 are described above only as examples and do not limit the embodiments of the present application.
The data query method of the embodiments of the present application can improve the accuracy of the predicate selection rate and thereby improve the performance of data queries. Further, for at least two predicate combinations that have a repeated predicate, selecting the training model corresponding to the predicate combination with the higher confidence can improve the accuracy of the predicate selection rate.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the foregoing processes do not imply an order of execution. The order in which the processes are executed should be determined by their functions and internal logic, and the sequence numbers should not constitute any limitation on the implementation of the embodiments of the present application.
The data query method according to the embodiments of the present application has been described in detail above. The data query apparatus and the database system according to the embodiments of the present application are described below. The data query apparatus and the database system can perform the data query method of the foregoing embodiments.
FIG. 12 is a schematic block diagram of a data query apparatus 1200 according to an embodiment of the present application. As shown in FIG. 12, the apparatus 1200 includes:
a receiving module 1210, configured to receive a query statement;
a processing module 1220, configured to parse the query statement to obtain a plurality of predicates, and further configured to combine the plurality of predicates to obtain a plurality of predicate combinations;
a first determining module 1230, configured to determine, among the plurality of predicate combinations and according to the type of a pre-configured training model, a plurality of candidate predicate combinations corresponding to the pre-configured training model, where each candidate predicate combination of the plurality of candidate predicate combinations includes at least two predicates;
the first determining module 1230 is further configured to determine a first predicate combination among the plurality of candidate predicate combinations, where the predicates included in the first predicate combination are different from one another;
the processing module 1220 is further configured to determine a first execution plan using the training model corresponding to the first predicate combination, and to perform the data query using the first execution plan.
The data query apparatus 1200 of this embodiment of the present application can determine, among the plurality of candidate predicate combinations, a first predicate combination that does not have the same predicate. Because each candidate predicate combination has a corresponding training model, if the first predicate combination does not have the same predicate, the training model corresponding to the first predicate combination can be used to determine the first execution plan. That is, the training model corresponding to the first predicate combination is used to calculate the predicate selection rate, the first execution plan is generated from it, and the data query is performed based on the first execution plan. In other words, for a predicate combination, a training model that captures the correlation between its predicates can be obtained and used to calculate the predicate selection rate, so there is no need to calculate the selection rate of each predicate in the combination separately and multiply the individual selection rates together. Because this approach takes the correlation between predicates into account, the resulting predicate selection rate is more accurate, which improves query performance.
It should be noted that, in the embodiments of the present application, the apparatus 1200 may be the query optimizer 114 described above, or a software/hardware functional unit integrated in the query optimizer 114. For example, the receiving module 1210 may be implemented by a receiver or a communication interface, and the functions of the processing module 1220 and the first determining module 1230 may be implemented by at least one processor executing instructions in a memory. Optionally, the components of the data query apparatus may be coupled together by a bus system, where the bus system includes, in addition to a data bus, a power bus, a control bus, a status signal bus, and the like.
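Why multiplying the individual selection rates can be inaccurate is easy to see on a small example. The sketch below uses an invented toy table (the rows, column meanings, and predicate choices are assumptions made purely for illustration) in which two columns are perfectly correlated, and compares the product of the per-predicate selection rates with the true selection rate of the combination. Exact fractions are used so that the comparison does not depend on floating-point rounding.

```python
from fractions import Fraction

# Toy table of (city, country) rows; the two columns are perfectly correlated.
rows = (
    [("Paris", "France")] * 2
    + [("Berlin", "Germany")] * 3
    + [("Rome", "Italy")] * 5
)


def selection_rate(rows, predicate):
    """Fraction of rows that satisfy the predicate."""
    return Fraction(sum(1 for row in rows if predicate(row)), len(rows))


def is_paris(row):
    return row[0] == "Paris"


def is_france(row):
    return row[1] == "France"


# Independence assumption: multiply the per-predicate selection rates.
independent_estimate = selection_rate(rows, is_paris) * selection_rate(rows, is_france)

# True selection rate of the predicate combination.
joint_rate = selection_rate(rows, lambda row: is_paris(row) and is_france(row))

print(independent_estimate)  # 1/25
print(joint_rate)            # 1/5
```

Here the product underestimates the number of matching rows by a factor of five. A model trained on the column pair can, in principle, learn the joint behavior directly, which is the motivation given above for estimating per combination rather than per predicate.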
Optionally, as an embodiment, the first determining module 1230 is further configured to determine at least two second predicate combinations among the plurality of candidate predicate combinations, where the at least two second predicate combinations have at least one identical predicate;
As shown in FIG. 13, optionally, as an embodiment, the apparatus 1200 further includes:
a second determining module 1240, configured to determine a target predicate combination among the at least two second predicate combinations according to the confidence of the training model corresponding to each of the at least two second predicate combinations, where the confidence indicates the accuracy of the training model;
the processing module 1220 is further configured to determine a second execution plan using the training model corresponding to the target predicate combination, and to perform the data query using the second execution plan.
Optionally, as an embodiment, the apparatus 1200 further includes:
an obtaining module, configured to obtain the confidence of the training model corresponding to each of the at least two second predicate combinations.
Optionally, the confidence of the training model corresponding to the target predicate combination is the greatest among the at least two second predicate combinations.
Optionally, among the at least two second predicate combinations determined from the plurality of candidate predicate combinations, the confidence of the training model corresponding to each second predicate combination satisfies a preset condition.
Optionally, the confidence of the training model corresponding to each of the at least two second predicate combinations is greater than a first threshold.
Optionally, the processing module 1220 is specifically configured to:
obtain model parameters of the training model corresponding to the target predicate combination, where the model parameters include at least one of a weight and an offset; and generate the second execution plan using the model parameters.
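How model parameters such as weights and an offset might feed into plan generation can be sketched as follows. The form of the training model is not specified at this point in the text, so the sketch assumes, purely for illustration, a single-layer model with a sigmoid output that maps normalized predicate constants to a selection rate. The function names, the feature encoding, and all parameter values are hypothetical.

```python
import math


def estimate_selection_rate(features, weights, offset):
    """Assumed model form: sigmoid(weights . features + offset).

    features: normalized predicate constants (hypothetical encoding).
    weights, offset: model parameters of the selected training model.
    """
    z = sum(w * x for w, x in zip(weights, features)) + offset
    return 1.0 / (1.0 + math.exp(-z))


def estimate_rows(table_rows, features, weights, offset):
    """Estimated row count that a cost-based optimizer could use in a plan."""
    return table_rows * estimate_selection_rate(features, weights, offset)


# Hypothetical parameters for the model of the winning combination.
weights = [1.5, -0.5]
offset = -1.0
features = [0.2, 0.4]  # normalized constants of the two predicates

rate = estimate_selection_rate(features, weights, offset)
print(f"estimated selection rate: {rate:.3f}")
print(f"estimated rows out of 1,000,000: {estimate_rows(1_000_000, features, weights, offset):.0f}")
```

The estimated row count is the quantity that operator cost formulas typically consume, so a more accurate selection rate propagates into the cost comparison between alternative plans.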
The data query apparatus 1200 according to the embodiments of the present application can perform the data query method 600 or 700 according to the embodiments of the present application, and the foregoing and other operations and/or functions of the modules in the apparatus 1200 implement the corresponding procedures of those methods. For brevity, details are not repeated here. In addition, the functions of the second determining module 1240 and the obtaining module may also be implemented by at least one processor executing instructions in a memory. The data query apparatus 1200 of this embodiment of the present application can select, among the plurality of candidate predicate combinations, a first predicate combination that does not have the same predicate. Because each candidate predicate combination has a corresponding training model, if the predicates included in the first predicate combination are different from one another, the training model corresponding to the first predicate combination can be used to determine the first execution plan. That is, the training model corresponding to the first predicate combination is used to calculate the predicate selection rate, the first execution plan is generated from it, and the data query is performed based on the first execution plan. In other words, for a predicate combination, a training model that captures the correlation between its predicates can be obtained and used to calculate the predicate selection rate, so there is no need to calculate the selection rate of each predicate in the combination separately and multiply the individual selection rates together. Because this approach takes the correlation between predicates into account, the resulting predicate selection rate is more accurate, which improves query performance.
FIG. 14 is a schematic block diagram of a database system 1400 according to an embodiment of the present application. As shown in FIG. 14, the database system 1400 includes the data query apparatus 1200 of the foregoing embodiments and a database 1410. The database system 1400 can perform the data query method of the foregoing embodiments to query the database 1410.
FIG. 15 shows the structure of a data query apparatus provided by an embodiment of the present application, which includes at least one processor 1502 (for example, a CPU), at least one network interface 1503 or other communication interface, and a memory 1504. Optionally, the apparatus may further include a receiver 1505 and a transmitter 1506. The processor 1502 is configured to execute executable modules, such as computer programs, stored in the memory 1504. The memory 1504 may include high-speed random access memory (RAM) and may also include non-volatile memory, for example at least one disk memory. A communication connection with at least one other network element is implemented through the at least one network interface 1503 (which may be wired or wireless). The receiver 1505 and the transmitter 1506 are used to transmit various signals or information.
In some implementations, the memory 1504 stores a program 15041, and the program 15041 can be executed by the processor 1502 to perform the data query method of the foregoing embodiments of the present application.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such an implementation should not be considered to go beyond the scope of the embodiments of the present application.
A person skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may be found in the corresponding processes of the foregoing method embodiments, and details are not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function; in an actual implementation there may be other divisions, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as a standalone product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing is only a specific implementation of the embodiments of the present application, but the protection scope of the embodiments of the present application is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed in the embodiments of the present application shall fall within the protection scope of the embodiments of the present application. Therefore, the protection scope of the embodiments of the present application shall be subject to the protection scope of the claims.

Claims (17)

  1. A data query method, comprising:
    receiving a query statement;
    parsing the query statement to obtain a plurality of predicates;
    combining the plurality of predicates to obtain a plurality of predicate combinations;
    determining, among the plurality of predicate combinations and according to a type of a pre-configured training model, a plurality of candidate predicate combinations corresponding to the pre-configured training model, wherein each candidate predicate combination of the plurality of candidate predicate combinations includes at least two predicates;
    determining a first predicate combination among the plurality of candidate predicate combinations, wherein the predicates included in the first predicate combination are different from one another; and
    determining a first execution plan using the training model corresponding to the first predicate combination, and performing a data query using the first execution plan.
  2. The method according to claim 1, wherein the method further comprises:
    determining at least two second predicate combinations among the plurality of candidate predicate combinations, wherein the at least two second predicate combinations have at least one identical predicate;
    determining a target predicate combination among the at least two second predicate combinations according to a confidence of the training model corresponding to each of the at least two second predicate combinations, wherein the confidence indicates the accuracy of the training model; and
    determining a second execution plan using the training model corresponding to the target predicate combination, and performing a data query using the second execution plan.
  3. The method according to claim 2, wherein the method further comprises:
    obtaining the confidence of the training model corresponding to each of the at least two second predicate combinations.
  4. The method according to claim 3, wherein the target predicate combination is the predicate combination whose training model has the greatest confidence among the at least two second predicate combinations.
  5. The method according to any one of claims 2 to 4, wherein, among the at least two second predicate combinations determined from the plurality of candidate predicate combinations, the confidence of the training model corresponding to each second predicate combination satisfies a preset condition.
  6. The method according to claim 5, wherein the confidence of the training model corresponding to each of the at least two second predicate combinations is greater than a first threshold.
  7. The method according to any one of claims 2 to 6, wherein the determining a second execution plan using the training model corresponding to the target predicate combination comprises:
    obtaining model parameters of the training model corresponding to the target predicate combination, wherein the model parameters include at least one of a weight and an offset; and
    generating the second execution plan using the model parameters.
  8. A data query apparatus, comprising:
    a receiving module, configured to receive a query statement;
    a processing module, configured to parse the query statement to obtain a plurality of predicates, and further configured to combine the plurality of predicates to obtain a plurality of predicate combinations; and
    a first determining module, configured to determine, among the plurality of predicate combinations and according to a type of a pre-configured training model, a plurality of candidate predicate combinations corresponding to the pre-configured training model, wherein each candidate predicate combination of the plurality of candidate predicate combinations includes at least two predicates;
    wherein the first determining module is further configured to determine a first predicate combination among the plurality of candidate predicate combinations, the predicates included in the first predicate combination being different from one another; and
    the processing module is further configured to determine a first execution plan using the training model corresponding to the first predicate combination, and to perform a data query using the first execution plan.
  9. 根据权利要求8所述的装置,其特征在于,第一确定模块还用于,在所述多个候选谓词组合中确定至少两个第二谓词组合,所述至少两个第二谓词组合具有至少一个相同的谓词;The apparatus according to claim 8, wherein the first determining module is further configured to determine at least two second predicate combinations in the plurality of candidate predicate combinations, the at least two second predicate combinations having at least An identical predicate;
    所述装置还包括:The device also includes:
    第二确定模块,用于根据所述至少两个第二谓词组合中每个第二谓词组合对应的训练模型的置信度,在所述至少两个第二谓词组合中确定目标谓词组合,所述置信度用于指示训练模型的准确度;a second determining module, configured to determine a target predicate combination in the at least two second predicate combinations according to a confidence level of a training model corresponding to each second predicate combination in the at least two second predicate combinations, Confidence is used to indicate the accuracy of the training model;
    所述处理模块还用于,使用所述目标谓词组合对应的训练模型确定第二执行计划,并使用所述第二执行计划进行数据查询。The processing module is further configured to determine a second execution plan by using a training model corresponding to the target predicate combination, and perform data query using the second execution plan.
  10. 根据权利要求9所述的装置,其特征在于,所述装置还包括:The device according to claim 9, wherein the device further comprises:
    获取模块,用于获取所述至少两个第二谓词组合中每个第二谓词组合对应的训练模型的置信度。And an obtaining module, configured to acquire a confidence level of the training model corresponding to each second predicate combination in the at least two second predicate combinations.
  11. 根据权利要求10所述的装置,其特征在于,所述目标谓词组合为所述至少两个第二谓词组合中训练模型的置信度最大的谓词组合。The apparatus according to claim 10, wherein said target predicate combination is a predicate combination having the greatest confidence of the training model in said at least two second predicate combinations.
  12. 根据权利要求9至11中任一项所述的装置,其特征在于,在从所述多个候选谓词组合中确定的所述至少两个第二谓词组合中,每个第二谓词组合对应的训练模型的置信度均满足预设条件。The apparatus according to any one of claims 9 to 11, wherein each of the at least two second predicate combinations determined from the plurality of candidate predicate combinations corresponds to each second predicate combination The confidence of the training model satisfies the preset conditions.
  13. The apparatus according to claim 12, wherein the confidence of the training model corresponding to each of the at least two second predicate combinations is greater than a first threshold.
  14. The apparatus according to any one of claims 9 to 13, wherein the processing module is specifically configured to:
    obtain model parameters of the training model corresponding to the target predicate combination, the model parameters including at least one of a weight and an offset; and generate the second execution plan by using the model parameters.
  15. An apparatus for data query, comprising at least one processor, a memory, and instructions stored in the memory and executable by the at least one processor, wherein the at least one processor executes the instructions to implement the steps of the method according to any one of claims 1 to 7.
  16. A computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
  17. A database system, comprising the apparatus for data query according to any one of claims 8 to 14 and a database.
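
For illustration only, the selection logic recited in claims 9 to 14 can be sketched in Python. This is a minimal sketch under stated assumptions, not the claimed implementation. The names `TrainedModel`, `select_target`, and `estimate_selectivity` are hypothetical, as is the choice to form the group of second predicate combinations around a single shared predicate. The sigmoid over a weighted sum is likewise an assumed way of turning a weight and an offset into a selectivity estimate; the claims only state that the model parameters include at least one of a weight and an offset.

```python
import math
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Sequence


@dataclass(frozen=True)
class TrainedModel:
    """Hypothetical record of the training model for one predicate combination."""
    predicates: FrozenSet[str]   # the candidate predicate combination
    confidence: float            # indicates the accuracy of the training model
    weights: Sequence[float]     # assumed model parameter: weights
    offset: float                # assumed model parameter: offset (bias)


def select_target(candidates: List[TrainedModel],
                  shared_predicate: str,
                  first_threshold: float) -> Optional[TrainedModel]:
    """Pick the target predicate combination among overlapping candidates.

    Assumption: the "second predicate combinations" are the candidates that
    contain `shared_predicate` and whose confidence exceeds `first_threshold`.
    """
    group = [m for m in candidates
             if shared_predicate in m.predicates
             and m.confidence > first_threshold]
    if len(group) < 2:
        # Fewer than two overlapping combinations: nothing to arbitrate.
        return None
    # Highest confidence wins, mirroring claim 11.
    return max(group, key=lambda m: m.confidence)


def estimate_selectivity(model: TrainedModel,
                         features: Sequence[float]) -> float:
    """Assumed scoring step: sigmoid of the weighted sum plus offset."""
    z = sum(w * x for w, x in zip(model.weights, features)) + model.offset
    return 1.0 / (1.0 + math.exp(-z))
```

In such a sketch, an optimizer would call `select_target` once for each predicate that appears in more than one candidate combination, and then pass the estimate from the chosen model into its cost calculation when building the second execution plan.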
PCT/CN2018/083826 2017-05-04 2018-04-20 Data query method, device, and database system WO2018201916A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710308623.9 2017-05-04
CN201710308623.9A CN108804473B (en) 2017-05-04 2017-05-04 Data query method, device and database system

Publications (1)

Publication Number Publication Date
WO2018201916A1 (en) 2018-11-08

Family

ID=64016819

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/083826 WO2018201916A1 (en) 2017-05-04 2018-04-20 Data query method, device, and database system

Country Status (2)

Country Link
CN (1) CN108804473B (en)
WO (1) WO2018201916A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109391565A (en) * 2018-11-15 2019-02-26 天津津航计算技术研究所 A kind of fiber buss network automatic Verification system and method
CN111444220A (en) * 2020-05-09 2020-07-24 南京大学 Cross-platform SQL query optimization method combining rule driving and data driving
CN115827930A (en) * 2023-02-15 2023-03-21 杭州悦数科技有限公司 Data query optimization method, system and device of graph database
WO2023236238A1 (en) * 2022-06-09 2023-12-14 深圳计算科学研究院 Relational data-based data processing method and apparatus thereof

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111324605B (en) * 2020-01-22 2020-11-10 北京东方金信科技有限公司 Dynamic adjustment method and application for data hybrid storage in database
CN113806190A (en) * 2020-06-17 2021-12-17 华为技术有限公司 Method, device and system for predicting performance of database management system
CN112347104B (en) * 2020-11-06 2023-09-29 中国人民大学 Column storage layout optimization method based on deep reinforcement learning
CN115048425A (en) * 2022-06-09 2022-09-13 深圳计算科学研究院 Data screening method and device based on reinforcement learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1825305A (en) * 2005-10-31 2006-08-30 北京神舟航天软件技术有限公司 Query plan caching method and system based on predicate criticality analysis
US20070219951A1 (en) * 2006-03-15 2007-09-20 Oracle International Corporation Join predicate push-down optimizations
US20090299989A1 (en) * 2004-07-02 2009-12-03 Oracle International Corporation Determining predicate selectivity in query costing
CN105303501A (en) * 2015-11-23 2016-02-03 北京航空航天大学 Community information service system and method based on picture recommendation
CN106095956A (en) * 2016-06-15 2016-11-09 北京智能管家科技有限公司 support information fission querying method and device

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9158815B2 (en) * 2010-10-19 2015-10-13 Hewlett-Packard Development Company, L.P. Estimating a number of unique values in a list
CN102760143A (en) * 2011-04-28 2012-10-31 国际商业机器公司 Method and device for dynamically integrating executing structures in database system
US9720966B2 (en) * 2012-12-20 2017-08-01 Teradata Us, Inc. Cardinality estimation for optimization of recursive or iterative database queries by databases
CN104216891B (en) * 2013-05-30 2018-02-02 国际商业机器公司 The optimization method and equipment of query statement in relevant database
CN104915717B (en) * 2015-06-02 2017-11-14 百度在线网络技术(北京)有限公司 Data processing method, Analysis of Knowledge Bases Reasoning method and relevant apparatus

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109391565A (en) * 2018-11-15 2019-02-26 天津津航计算技术研究所 A kind of fiber buss network automatic Verification system and method
CN111444220A (en) * 2020-05-09 2020-07-24 南京大学 Cross-platform SQL query optimization method combining rule driving and data driving
CN111444220B (en) * 2020-05-09 2023-09-01 南京大学 Cross-platform SQL query optimization method combining rule driving and data driving
WO2023236238A1 (en) * 2022-06-09 2023-12-14 深圳计算科学研究院 Relational data-based data processing method and apparatus thereof
CN115827930A (en) * 2023-02-15 2023-03-21 杭州悦数科技有限公司 Data query optimization method, system and device of graph database

Also Published As

Publication number Publication date
CN108804473A (en) 2018-11-13
CN108804473B (en) 2022-02-11

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 18794618; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 18794618; Country of ref document: EP; Kind code of ref document: A1)