CN116521710A - Data query method, device, electronic equipment and computer readable storage medium - Google Patents
Data query method, device, electronic equipment and computer readable storage medium
- Publication number
- CN116521710A (CN202310492650.1A)
- Authority
- CN
- China
- Prior art keywords
- query
- sub
- data
- statement
- determining
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
- G06F16/2445—Data retrieval commands; View definitions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24553—Query execution of query operations
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiments of the present application provide a data query method, a data query device, an electronic device and a computer readable storage medium, and relate to the field of computer technologies. In the embodiments of the present application, at least one query topological relation corresponding to an original query statement is determined; the query topological relation is parsed to obtain a query sub-operation corresponding to the query topological relation, and a sub-operation statement corresponding to the query sub-operation is determined; a first data volume operated on by the query sub-operation is determined according to the sub-operation statement; and a target query topological relation for querying target data is determined according to the first data volume. In this way, a preferred target query topological relation can be selected from the plurality of query topological relations corresponding to the original query statement; for example, the query topological relation that operates on the least data when querying the target data can be selected as the target query topological relation. Querying the target data through the target query topological relation saves query time and improves query efficiency.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data query method, a data query device, an electronic device, and a computer readable storage medium.
Background
In modern database systems, the optimizer is an essential component. Because structured query language (Structured Query Language, SQL) queries executed by a database are declarative, an SQL instruction describes only what data needs to be retrieved, not how to retrieve it. Typically, a database has several different execution modes by which the target data can be obtained, so the database relies internally on an optimizer to select the most efficient mode.
For a user of the database, the optimizer is a black box: it does not expose its internal implementation details, so it is difficult for a developer to learn the optimization details, for example the actual execution steps of each mode and the amount of data operated on. As a result, the mode selected by the optimizer cannot be evaluated, it is difficult to determine the optimal mode, and data query efficiency is affected.
Disclosure of Invention
The object of the present application is to solve at least one of the above technical drawbacks, in particular the drawback that an optimal manner of querying data cannot be determined.
According to one aspect of the present application, there is provided a data query method, the method comprising:
receiving an original query statement of query target data, and determining at least one query topological relation corresponding to the original query statement;
analyzing the query topological relation to obtain a query sub-operation corresponding to the query topological relation, and determining a sub-operation statement corresponding to the query sub-operation;
determining a first data volume operated by the query sub-operation according to the sub-operation statement;
and determining a target query topological relation for querying the target data according to the first data volume.
Optionally, after the analyzing the query topological relation to obtain the query sub-operation corresponding to the query topological relation, the method further includes:
acquiring an estimated data quantity corresponding to the query sub-operation; wherein the estimated data volume comprises an estimated value of the data volume operated by the query sub-operation;
and determining estimation accuracy according to the estimated data quantity and the first data quantity.
Optionally, the obtaining the estimated data volume corresponding to the query sub-operation includes:
and acquiring the estimated data quantity corresponding to the query sub-operation through a query interface and a preset query statement.
Optionally, the query topological relation corresponds to at least two query sub-operations, and
the determining the sub-operation statement corresponding to the query sub-operation includes:
determining the operation order of the query sub-operations, and determining a conversion template corresponding to the operation order;
and generating the sub-operation statements based on the correspondence between query sub-operations and sub-operation statements in the conversion template.
Optionally, the method further comprises:
combining at least two of the query sub-operations in the case that the operation order is inconsistent with the order indicated by the conversion template.
Optionally, the determining, according to the sub-operation statement, the first data volume operated on by the query sub-operation includes:
querying the database with the sub-operation statement to obtain the first data volume.
Optionally, the estimated data volume includes an estimated number of data rows operated on by the query sub-operation;
the first data volume includes an actual number of data rows operated on by the query sub-operation.
According to another aspect of the present application, there is provided a data query apparatus, the apparatus comprising:
the receiving module is used for receiving an original query statement of query target data and determining at least one query topological relation corresponding to the original query statement;
the generation module is used for analyzing the query topological relation to obtain a query sub-operation corresponding to the query topological relation and determining a sub-operation statement corresponding to the query sub-operation;
the first determining module is used for determining a first data volume operated on by the query sub-operation according to the sub-operation statement;
and the second determining module is used for determining a target query topological relation for querying the target data according to the first data volume.
According to another aspect of the present application, there is provided an electronic device including:
a memory, a processor and a computer program stored on the memory, characterized in that the processor executes the computer program to implement the steps of the data query method according to any of the first aspects of the present application.
For example, in a third aspect of the present application, there is provided a computing device comprising: the device comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface are communicated with each other through the communication bus;
the memory is configured to store at least one executable instruction, where the executable instruction causes the processor to perform operations corresponding to the data query method according to the first aspect of the present application.
According to a further aspect of the present application there is provided a computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor implements the steps of the data query method of any of the first aspects of the present application.
For example, in a fourth aspect of the embodiments of the present application, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the data query method shown in the first aspect of the present application.
According to one aspect of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions are read from a computer-readable storage medium by a processor of a computer device, which executes the computer instructions, causing the computer device to perform the methods provided in the various alternative implementations of the first aspect described above.
The beneficial effects brought by the technical solutions provided in the present application are:
in the embodiment of the application, at least one query topological relation corresponding to an original query statement is determined by receiving the original query statement of query target data; analyzing the query topological relation to obtain a query sub-operation corresponding to the query topological relation, and determining a sub-operation statement corresponding to the query sub-operation; determining a first data volume operated by the query sub-operation according to the sub-operation statement; and determining a target query topological relation for querying the target data according to the first data volume. In this way, a preferred target query topological relation can be selected from a plurality of query topological relations corresponding to the original query statement, for example, the query topological relation with the least data quantity operated when the target data is queried can be selected as the target query topological relation, and the target data can be queried through the target query topological relation, so that the query time can be saved, and the query efficiency can be improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings that are required to be used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a first schematic flow chart of a data query method according to an embodiment of the present application;
Fig. 2 is a first application scenario diagram of a data query method according to an embodiment of the present application;
Fig. 3 is a second application scenario diagram of a data query method according to an embodiment of the present application;
Fig. 4 is a third application scenario diagram of a data query method according to an embodiment of the present application;
Fig. 5 is a fourth application scenario diagram of a data query method according to an embodiment of the present application;
Fig. 6 is a fifth application scenario diagram of a data query method according to an embodiment of the present application;
Fig. 7 is a sixth application scenario diagram of a data query method according to an embodiment of the present application;
Fig. 8 is a second schematic flow chart of a data query method according to an embodiment of the present application;
Fig. 9 is a seventh application scenario diagram of a data query method according to an embodiment of the present application;
Fig. 10 is a third schematic flow chart of a data query method according to an embodiment of the present application;
Fig. 11 is an eighth application scenario diagram of a data query method according to an embodiment of the present application;
Fig. 12 is a ninth application scenario diagram of a data query method according to an embodiment of the present application;
Fig. 13 is a tenth application scenario diagram of a data query method according to an embodiment of the present application;
Fig. 14 is a schematic structural diagram of a data query device according to an embodiment of the present application;
Fig. 15 is a schematic structural diagram of an electronic device for data query according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the drawings in the present application. It should be understood that the embodiments described below with reference to the drawings are exemplary descriptions for explaining the technical solutions of the embodiments of the present application, and the technical solutions of the embodiments of the present application are not limited.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and "comprising," when used in this application, specify the presence of stated features, information, data, steps, operations, elements, and/or components, but do not preclude the presence or addition of other features, information, data, steps, operations, elements, components, and/or groups thereof, all of which may be included in the present application. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein indicates at least one of the items defined by the term; e.g., "A and/or B" may be implemented as "A", or as "B", or as "A and B".
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
In modern database systems, the optimizer is an essential component. Because structured query language (Structured Query Language, SQL) queries executed by a database are declarative, an SQL instruction describes only what data needs to be retrieved, not how to retrieve it. Typically, a database has several different execution modes by which a desired piece of data can be obtained, so the database relies internally on an optimizer to select the most efficient mode.
For a user of the database, the optimizer is a black box: it does not expose its internal implementation details, so it is difficult for a developer to learn the optimization details, for example the actual execution steps of each mode and the amount of data operated on. As a result, the mode selected by the optimizer cannot be evaluated, and it is difficult to determine the optimal mode.
The data query method, device, electronic equipment and computer readable storage medium provided by the application aim to solve the above technical problems in the prior art.
The following describes the technical solutions of the present application and how the technical solutions of the present application solve the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
The scheme provided by the embodiment of the application can be executed by any electronic device, such as a terminal device or a server, wherein the server can be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server for providing cloud computing service. The terminal may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, etc. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the terminal and the server may include a database.
For the technical problems in the prior art, the data query method, the device, the electronic equipment and the storage medium provided by the application aim to solve at least one of the technical problems in the prior art.
The embodiment of the present application provides a possible implementation manner. As shown in Fig. 1, a flowchart of a data query method is provided. The method may be executed by any electronic device; optionally, the method may be applied to a server side or a terminal device. For convenience of description, the method provided in the embodiment of the present application is described below with a server as the execution body, where the execution body may be a processing module in a database installed on the server.
The embodiment of the application can be applied to the technical field of databases; determining at least one query topological relation corresponding to an original query statement by receiving the original query statement of query target data; analyzing the query topological relation to obtain a query sub-operation corresponding to the query topological relation, and determining a sub-operation statement corresponding to the query sub-operation; determining a first data volume operated by the query sub-operation according to the sub-operation statement; and determining a target query topological relation for querying the target data according to the first data volume. In this way, a preferred target query topological relation can be selected from a plurality of query topological relations corresponding to the original query statement, for example, the query topological relation with the least data quantity operated when the target data is queried can be selected as the target query topological relation, and the target data can be queried through the target query topological relation, so that the query time can be saved, and the query efficiency can be improved.
Specifically, as shown in fig. 1, the data query method in the embodiment of the present application may include the following steps:
S101: receiving an original query statement of query target data, and determining at least one query topological relation corresponding to the original query statement.
Optionally, the embodiment of the application may be applied to an application scenario of querying data in a database.
The target data comprises data in the queried database. As example one, the target data may be data having a specific data attribute; for example, in an actual implementation scenario, the data stored in the database is employee information data, and the queried target data may be the names of employees whose age is 30. As example two, the target data may be data satisfying a preset condition; for example, in an actual implementation scenario, the data stored in the database is student examination data, and the queried target data may be the number of students whose score is greater than 80 and whose language score is greater than 75, and so on. The above are merely examples; the target data may be selected according to the actual application scenario, which is not limited in the embodiment of the present application.
The original query statement is a query statement for querying the target data. For example, the query statement may be a statement written in a structured query language (Structured Query Language, SQL); in some implementations, the query statement may also be a statement written in other query languages.
After receiving the original query statement, determining at least one query topological relation corresponding to the original query statement according to the original query statement.
The query topology may be used to indicate a query manner and/or a query step of querying the target data. It will be appreciated that in an actual scenario, there may be multiple query ways to query the target data, each of which may include different query steps.
In an actual implementation, the query step may be an operation of relational algebra, for example an operation in an SQL query statement; in the database field, such an operation may be referred to as an operator. Connecting the operators included in each query manner according to the corresponding query order forms the query topological relation. The query topological relation may be represented as an "operator tree"; in an actual scenario, an "operator tree" is the whole tree structure made up of "operators".
As example one, the operators may include the following four:
Operator one: table read, referred to hereinafter as Scan. In an operator tree, a Scan typically has no "child" operators, i.e., there are no branches after a Scan. For example, Scan(t1) represents reading the data in data table t1.
Operator two: filter, referred to hereinafter as Select. In an operator tree, a Select typically has exactly one "child" operator, i.e., there is exactly one branch after a Select. Select indicates a filtering condition. For example, Select(b×c > 100) represents selecting, from its "child" operator, the data rows that satisfy b×c > 100.
Operator three: aggregation, referred to hereinafter as Aggregate. In an operator tree, an Aggregate typically has exactly one "child" operator, i.e., there is exactly one branch after an Aggregate. Aggregate characterizes the aggregate functions and the grouping column. For example, Aggregate(avg(a), sum(b)) with grouping column c represents grouping by column c and calculating, for each group, the average of column a and the sum of column b.
Operator four: join, referred to hereinafter as Join. In an operator tree, a Join typically has exactly two "child" operators, i.e., there are exactly two branches after a Join. Join characterizes the join condition. For example, Join(R.a = S.b) represents computing the Cartesian product of R and S and selecting the data rows that satisfy R.a = S.b.
As example two, referring to Fig. 2, Fig. 2 is a schematic diagram of an operator tree. The operator tree includes four operators, which in arrow order are Scan, Scan, Join and Select. That is, the query steps in the query topological relation corresponding to the operator tree are, in order: reading data tables t1 and t2, joining on t1.a = t2.a (selecting the data rows that satisfy t1.a = t2.a), and filtering on b×c > 100 (selecting the data rows that satisfy b×c > 100).
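The operator tree of example two can be pictured as a small recursive data structure. The following is a minimal sketch, not the structure used by the application: the `Operator` class, the `ARITY` table, and the `validate` and `render` helpers are hypothetical names introduced only for illustration, and the child counts simply encode what the text says about each operator.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Operator:
    """One node of an operator tree (hypothetical illustration)."""
    kind: str  # "Scan", "Select", "Aggregate", or "Join"
    arg: str   # table name, filter condition, or join condition
    children: List["Operator"] = field(default_factory=list)


# Number of "child" operators each kind typically has, as described in the text.
ARITY = {"Scan": 0, "Select": 1, "Aggregate": 1, "Join": 2}


def validate(op: Operator) -> bool:
    """Return True if every node has the child count the text describes."""
    return len(op.children) == ARITY[op.kind] and all(validate(c) for c in op.children)


def render(op: Operator, depth: int = 0) -> str:
    """Render the tree as indented text, root first."""
    lines = ["  " * depth + f"{op.kind}({op.arg})"]
    for child in op.children:
        lines.append(render(child, depth + 1))
    return "\n".join(lines)


# Operator tree of example two: Scan(t1) and Scan(t2) feed Join, which feeds Select.
fig2_tree = Operator("Select", "b*c > 100", [
    Operator("Join", "t1.a = t2.a", [
        Operator("Scan", "t1"),
        Operator("Scan", "t2"),
    ]),
])

print(render(fig2_tree))
```

Here the root is the last operation to execute (Select) and the leaves are the table reads, which matches the arrow order Scan, Scan, Join, Select when the tree is read from the leaves upward.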
When the query topological relation is determined, all query topological relations corresponding to the original query statement can be enumerated according to the original query statement. For example, in actual implementation, the original query statement may be transformed and morphed to generate the query topological relationship.
S102: analyzing the query topological relation to obtain a query sub-operation corresponding to the query topological relation, and determining a sub-operation statement corresponding to the query sub-operation.
After the query topological relation is determined, the query topological relation can be analyzed and processed to obtain a query sub-operation corresponding to the query topological relation. The query sub-operation includes an operation of querying the target data, that is, an operation corresponding to each operator, that is, the query sub-operation may include operations of reading a table, filtering, aggregating, connecting, and the like.
Still referring to example two above, the query topological relation (i.e., the operator tree) of Fig. 2 includes four operators: Scan, Scan, Join and Select. Parsing the query topological relation therefore yields four query sub-operations: reading data table t1, reading data table t2, joining on t1.a = t2.a (selecting the data rows that satisfy t1.a = t2.a), and filtering on b×c > 100 (selecting the data rows that satisfy b×c > 100).
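One way to picture this parsing step is a post-order walk of the operator tree, which lists each operator after its children and therefore in execution order. The sketch below is an assumption about how such parsing could be done, not the application's implementation; the tuple encoding `(kind, argument, children)` and the function name `sub_operations` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical encoding of an operator-tree node: (kind, argument, children).
Node = Tuple[str, str, list]

# Operator tree of example two.
tree: Node = ("Select", "b*c > 100", [
    ("Join", "t1.a = t2.a", [
        ("Scan", "t1", []),
        ("Scan", "t2", []),
    ]),
])


def sub_operations(node: Node) -> List[Tuple[str, str]]:
    """Post-order walk: children first, so operations appear in execution order."""
    kind, arg, children = node
    ops: List[Tuple[str, str]] = []
    for child in children:
        ops.extend(sub_operations(child))
    ops.append((kind, arg))
    return ops


for kind, arg in sub_operations(tree):
    print(f"{kind}({arg})")
```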
After the query sub-operation is obtained, a sub-operation statement corresponding to the query sub-operation can be determined.
Specifically, the sub-operation statement is a query statement of each query sub-operation, and in an actual scene, the sub-operation statement may be an SQL statement; in some implementations, the query statement may also be a statement written in other query languages.
When the sub-operation statement is determined, the sub-operation statement may be received or acquired; for example, the sub-operation statement may be acquired from a storage space storing operation statements. Alternatively, the sub-operation statement may be generated, for example according to a preset conversion template. Optionally, in some implementation scenarios, the conversion template may include a correspondence between query sub-operations and sub-operation statements, and when the sub-operation statement needs to be generated, the conversion template may be queried to determine the sub-operation statement corresponding to the query sub-operation.
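A conversion template of this kind could be as simple as a lookup from an operation order to a list of statement patterns. The sketch below is only an illustration under stated assumptions: the `TEMPLATES` dictionary, the placeholder names (`left`, `right`, `join_cond`, `filter_cond`), and the choice of `SELECT COUNT(*)` statements are all hypothetical, since the text does not specify the template contents.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical conversion template: one entry per operation order, holding one
# statement pattern per query sub-operation. The patterns count rows because the
# next step needs the number of rows each sub-operation operates on (assumption).
TEMPLATES: Dict[Tuple[str, ...], List[str]] = {
    ("Scan", "Scan", "Join", "Select"): [
        "SELECT COUNT(*) FROM {left}",
        "SELECT COUNT(*) FROM {right}",
        "SELECT COUNT(*) FROM {left} JOIN {right} ON {join_cond}",
        "SELECT COUNT(*) FROM {left} JOIN {right} ON {join_cond} WHERE {filter_cond}",
    ],
}


def sub_operation_statements(sequence: Sequence[str], params: Dict[str, str]) -> List[str]:
    """Look up the template for the operation order and fill in its placeholders."""
    template = TEMPLATES[tuple(sequence)]
    return [pattern.format(**params) for pattern in template]


statements = sub_operation_statements(
    ["Scan", "Scan", "Join", "Select"],
    {"left": "t1", "right": "t2", "join_cond": "t1.a = t2.a", "filter_cond": "b*c > 100"},
)
for statement in statements:
    print(statement)
```

Keying the template by the whole operation order, rather than by a single operator, reflects the point that a later sub-operation's statement has to include the operations beneath it in the tree.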
S103: determining a first data volume operated on by the query sub-operation according to the sub-operation statement.
After determining the sub-operation statement, a first data volume operated on by the query sub-operation may be determined. The first data volume includes the actual data volume operated on by the query sub-operation, that is, the data volume operated on when the query sub-operation is executed. In an actual scenario, the actual data volume may be the actual number of data rows operated on when the query sub-operation is performed. For example, for a table read, the first data volume corresponding to Scan(t1) is the number of data rows read when data table t1 is read; for a filter, the first data volume corresponding to the filter b×c > 100 is the number of data rows selected when screening for data that satisfies b×c > 100.
In an actual implementation scenario, when the first data size is determined, the sub-operation statement may be input to a database for query, so as to obtain the first data size.
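As an illustration of inputting sub-operation statements to a database, the sketch below runs row-counting statements against an in-memory SQLite database using Python's standard `sqlite3` module. The table contents, the statement texts, and the helper name `first_data_volume` are assumptions made for this example; the application does not name a particular database or statement form.

```python
import sqlite3

# Small in-memory database with made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a INTEGER, b INTEGER);
    CREATE TABLE t2 (a INTEGER, c INTEGER);
    INSERT INTO t1 VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO t2 VALUES (1, 5), (2, 20), (2, 1), (4, 100);
""")

# Hypothetical sub-operation statements, one per query sub-operation.
statements = {
    "Scan(t1)": "SELECT COUNT(*) FROM t1",
    "Scan(t2)": "SELECT COUNT(*) FROM t2",
    "Join(t1.a = t2.a)": "SELECT COUNT(*) FROM t1 JOIN t2 ON t1.a = t2.a",
    "Select(b*c > 100)": "SELECT COUNT(*) FROM t1 JOIN t2 ON t1.a = t2.a WHERE b*c > 100",
}


def first_data_volume(connection: sqlite3.Connection, statement: str) -> int:
    """Run one sub-operation statement and return the actual row count."""
    return connection.execute(statement).fetchone()[0]


volumes = {op: first_data_volume(conn, sql) for op, sql in statements.items()}
for op, rows in volumes.items():
    print(f"{op}: {rows} rows")
```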
S104: determining a target query topological relation for querying the target data according to the first data volume.
After determining the first data amount operated by the query sub-operation, the total data amount corresponding to the query topological relation can be determined, namely the total data amount is the sum of the first data amounts of all the query sub-operations included by the query topological relation.
Still referring to example two above, the query topological relation (i.e., the operator tree) of Fig. 2 includes four operators: Scan, Scan, Join and Select. The first data volume corresponding to Scan(t1) is 1000 rows; the first data volume corresponding to Scan(t2) is 1000 rows; the first data volume corresponding to Join(t1.a = t2.a) is 7000 rows; and the first data volume corresponding to Select(b×c > 100) is 300 rows. The total data volume is therefore 1000 + 1000 + 7000 + 300 = 9300 rows.
After the total data volume is determined, a target query topological relation for querying the target data can be determined according to the total data volume. In some alternative implementations, the query topology with the smallest total data amount may be selected as the target query topology. It can be understood that the total data volume is the smallest, that is, the data volume operated when the target data is queried through the query topological relation is the smallest, so that the query time can be saved and the query efficiency can be improved by querying the target data in the query mode corresponding to the query topological relation.
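The selection rule described here (sum the first data volumes per query topological relation, then take the smallest total) can be sketched in a few lines. In the sketch below, the first candidate uses the row counts given for example two; the second candidate and all of its numbers are invented purely to have something to compare against, and the names `candidates`, `total_volume`, and `target` are hypothetical.

```python
from typing import Dict

# First data volumes (actual row counts) per query sub-operation.
candidates: Dict[str, Dict[str, int]] = {
    # Row counts taken from example two in the text.
    "fig2_plan": {
        "Scan(t1)": 1000,
        "Scan(t2)": 1000,
        "Join(t1.a = t2.a)": 7000,
        "Select(b*c > 100)": 300,
    },
    # Hypothetical alternative with invented numbers, for comparison only.
    "hypothetical_plan": {
        "Scan(t1)": 1000,
        "Scan(t2)": 1000,
        "Select": 400,
        "Join": 2500,
    },
}


def total_volume(volumes: Dict[str, int]) -> int:
    """Total data volume: the sum of the first data volumes of all sub-operations."""
    return sum(volumes.values())


totals = {name: total_volume(volumes) for name, volumes in candidates.items()}
target = min(totals, key=totals.get)  # topology with the smallest total data volume

print(totals)
print("target query topological relation:", target)
```

Choosing the minimum total is only one of the optional selection rules the text mentions; other criteria could be substituted for the `min` call.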
In the embodiment of the application, at least one query topological relation corresponding to an original query statement is determined by receiving the original query statement of query target data; analyzing the query topological relation to obtain a query sub-operation corresponding to the query topological relation, and determining a sub-operation statement corresponding to the query sub-operation; determining a first data volume operated by the query sub-operation according to the sub-operation statement; and determining a target query topological relation for querying the target data according to the first data volume. In this way, a preferred target query topological relation can be selected from a plurality of query topological relations corresponding to the original query statement, for example, the query topological relation with the least data quantity operated when the target data is queried can be selected as the target query topological relation, and the target data can be queried through the target query topological relation, so that the query time can be saved, and the query efficiency can be improved.
In an embodiment of the present application, after the parsing process is performed on the query topological relation to obtain a query sub-operation corresponding to the query topological relation, the method further includes:
acquiring an estimated data quantity corresponding to the query sub-operation; wherein the estimated data volume comprises an estimated value of the data volume operated by the query sub-operation;
And determining estimation accuracy according to the estimated data quantity and the first data quantity.
In this embodiment of the present application, after obtaining the query sub-operation corresponding to the query topological relation, the estimated data volume corresponding to the query sub-operation may also be obtained, and optionally, the estimated data volume may be an estimated data line number of the data volume operated by the query sub-operation.
In one embodiment of the present application, the obtaining the estimated data volume corresponding to the query sub-operation includes:
and acquiring the estimated data quantity corresponding to the query sub-operation through a query interface and a preset query statement.
Specifically, the query interface may be a newly added query interface, for example, the query interface may be an SQL interface of a database, and the preset query statement may be a TRACE statement.
As an example, in connection with fig. 3, the operator tree shown in fig. 3 includes four operators: Scan (t1); Scan (t2); Join (t1.a = t2.a); and Aggregate (sum(t1.c); t1.b). The estimated data amount corresponding to Scan (t1) is 1000 rows; the estimated data amount corresponding to Scan (t2) is 1000 rows; the estimated data amount corresponding to Join (t1.a = t2.a) is 60000 rows; and the estimated data amount corresponding to Aggregate (sum(t1.c); t1.b) is 300 rows.
After the estimated data amount is obtained, an estimation accuracy can be determined according to the estimated data amount and the first data amount. In a practical scenario, an estimation error may also be determined from the estimated data amount and the first data amount, e.g. the estimation error may be denoted as p-error. The specific calculation mode is as follows:
when the first data amount is greater than the estimated data amount: p-error = 1 - (first data amount / estimated data amount); when the estimated data amount is greater than the first data amount: p-error = (estimated data amount / first data amount) - 1. If the divisor is 0, 1 is added to both the numerator and the denominator before the above calculation is performed. The larger the absolute value of p-error, the larger the error; a positive value represents overestimation and a negative value represents underestimation.
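A minimal sketch of this calculation is given below. It follows the stated sign convention (positive for overestimation, negative for underestimation) and the add-one rule for a zero divisor. The function name `p_error` and the choice to return 0 when the two amounts are equal are assumptions made for illustration.

```python
def p_error(actual_rows, estimated_rows):
    """Signed estimation error.

    Positive values mean the estimate was too high (overestimation),
    negative values mean it was too low (underestimation).
    """
    if estimated_rows == actual_rows:
        return 0.0  # assumption: equal amounts are treated as zero error
    if estimated_rows > actual_rows:
        num, den = estimated_rows, actual_rows
        if den == 0:  # avoid dividing by zero: add 1 to both terms
            num, den = num + 1, den + 1
        return num / den - 1
    num, den = actual_rows, estimated_rows
    if den == 0:  # avoid dividing by zero: add 1 to both terms
        num, den = num + 1, den + 1
    return 1 - num / den


# Illustration: an estimate of 60000 rows against 7000 actual rows.
print(p_error(7000, 60000))
```

Pairing the 60000-row estimate with the 7000-row count is only an illustration that reuses numbers from two separate examples in the text.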
In this embodiment of the present application, by obtaining the estimated data amount corresponding to the query sub-operation and comparing it with the first data amount, the accuracy with which the estimation module estimates data amounts can be determined.
In one embodiment of the present application, the query topological relation corresponds to at least two of the query sub-operations,
the determining the sub-operation statement corresponding to the query sub-operation includes:
Determining the operation sequence of the query sub-operation, and determining a conversion template corresponding to the operation sequence;
and generating the sub-operation sentences based on the corresponding relation between the query operation and the operation sentences in the conversion template.
In this embodiment of the present application, the query topological relation may include a plurality of query sub-operations, and when generating the sub-operation statement, a conversion template corresponding to the operation sequence may be determined according to the operation sequence of the query sub-operations.
For example, in the actual implementation process, it may be determined whether the operation sequence of the query sub-operation accords with the preset sequence corresponding to the conversion template, and when the operation sequence accords with the preset sequence, the sub-operation statement may be generated based on the correspondence between the query operation and the operation statement in the conversion template.
As an example, take the preset order corresponding to the conversion template to be "Scan-Join-Select-Aggregate-Select". In conjunction with fig. 4, it may first be determined whether the operators (i.e., the query sub-operations) in the operator tree shown in fig. 4, read from bottom to top, conform to the order of the operators contained in "Scan-Join-Select-Aggregate-Select". Operators other than Scan may be absent, but no operator outside the template and no operator out of order may be present. If the order is met, the sub-operation statement can be generated based on the correspondence between the query operation and the operation statement in the conversion template. The operators shown in fig. 4 conform to the order of the operators contained in "Scan-Join-Select-Aggregate-Select".
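The order check can be sketched as a subsequence test against the template. This is an illustrative sketch only: it assumes the operators of one branch are supplied as a bottom-to-top list of type names, and the name `conforms` is hypothetical.

```python
TEMPLATE = ["Scan", "Join", "Select", "Aggregate", "Select"]


def conforms(ops):
    """Check that ops (bottom-to-top operator types of one branch) follow
    the template order: Scan first, later operators optional, and nothing
    extra or out of order."""
    if not ops or ops[0] != "Scan":
        return False
    pos = 1  # index of the next template slot that may still be used
    for op in ops[1:]:
        try:
            # Find the operator at or after the current slot, then move past it.
            pos = TEMPLATE.index(op, pos) + 1
        except ValueError:
            return False
    return True


print(conforms(["Scan", "Join", "Select"]))
print(conforms(["Scan", "Aggregate", "Select", "Join", "Select"]))
```

Because the template contains Select twice, a Select that follows an Aggregate is matched against the second slot, which corresponds to filtering after aggregation.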
For example, when generating the sub-operation statement, Scan and Join may be converted into a FROM clause and an ON clause. The names of the data tables involved are connected by JOIN in the FROM clause, and the connection condition is converted into the expression following ON. A Select that precedes Aggregate may be converted into a WHERE clause in SQL: the filtering condition in the operator is restored to the corresponding SQL text and placed after the WHERE keyword. Aggregate may be converted into a SELECT clause and a GROUP BY clause in SQL: the aggregation function and the grouping column are restored to the corresponding SQL text, the aggregation function is added to the expression list after the SELECT keyword, and the grouping column is placed directly after the GROUP BY keyword. A Select that follows Aggregate may be converted into a HAVING clause, and so on.
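The clause mapping above can be illustrated with a small string builder. This is a sketch under stated assumptions: the function `build_sql` and its parameter names are hypothetical, and emitting `SELECT *` when there is no Aggregate is a choice made here, not something the text specifies.

```python
def build_sql(tables, join_on=None, where=None, aggregates=None,
              group_by=None, having=None):
    """Assemble one SQL statement from operators in template order.

    tables     -> Scan operators (table names joined by JOIN in FROM)
    join_on    -> Join condition (ON clause)
    where      -> Select before Aggregate (WHERE clause)
    aggregates -> aggregation functions (SELECT expression list)
    group_by   -> grouping columns (GROUP BY clause)
    having     -> Select after Aggregate (HAVING clause)
    """
    select_list = ", ".join(aggregates) if aggregates else "*"
    sql = "SELECT " + select_list + " FROM " + " JOIN ".join(tables)
    if join_on:
        sql += " ON " + join_on
    if where:
        sql += " WHERE " + where
    if group_by:
        sql += " GROUP BY " + ", ".join(group_by)
    if having:
        sql += " HAVING " + having
    return sql


print(build_sql(["t1", "t2"], join_on="t1.a = t2.a",
                aggregates=["sum(t1.c)"], group_by=["t1.b"]))
```

Calling the builder with only the operators up to a given point in the tree yields the sub-operation statement for that operator.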
In one embodiment of the present application, at least two of the query sub-operations are combined in the event that the order of operations is not consistent with the order indicated by the conversion template.
As an example, when the operators in the operator tree (i.e., the query sub-operations), read from bottom to top, do not conform to the order "Scan-Join-Select-Aggregate-Select", the portion that does conform to the order may be selected, combined into a new Scan operator, and converted into an SQL query statement. Because a Scan operator followed by any operator of the template accords with the preset sequence corresponding to the conversion template, the sub-operation statements for the remaining operators can then be generated based on the correspondence between the query operation and the operation statement in the conversion template.
Referring to fig. 5, the original operator tree shown in fig. 5 includes two branches. The operator sequence of the right branch is "Scan, Aggregate, Select, Join, Select", which does not conform to the preset sequence "Scan-Join-Select-Aggregate-Select". The leading portion "Scan, Aggregate, Select" of the right branch does conform to the preset sequence, so it can be combined into a new Scan operator, namely the Scan_tmp operator shown in fig. 6. This yields a new, converted operator tree whose operator sequence is "Scan-Join-Select". Because a Scan operator followed by any operator of the template accords with the preset sequence, the sub-operation statement can then be generated based on the correspondence between the query operation and the operation statement in the conversion template. After the sub-operation statements are generated, the estimated data amount, operator type, sub-operation statement, and the like corresponding to each operator can be output in table form, as shown in fig. 7.
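The combination step for the right branch of fig. 5 can be sketched as folding the longest conforming leading portion into a temporary Scan. The helper names below are hypothetical, and the sketch handles a single branch given as a bottom-to-top list of operator types; it performs one fold only.

```python
TEMPLATE = ("Scan", "Join", "Select", "Aggregate", "Select")


def conforming_prefix_len(ops):
    """Length of the longest leading run of ops that follows TEMPLATE order."""
    pos = 0
    for i, op in enumerate(ops):
        if op not in TEMPLATE[pos:]:
            return i
        pos = TEMPLATE.index(op, pos) + 1
    return len(ops)


def fold_prefix(ops, tmp_name="Scan_tmp"):
    """Replace the conforming leading portion with one temporary Scan.

    Returns (new_sequence, folded_operators). If the whole sequence already
    conforms, nothing is folded and the second item is an empty list.
    """
    n = conforming_prefix_len(ops)
    if n == len(ops):
        return list(ops), []
    return [tmp_name] + list(ops[n:]), list(ops[:n])


new_ops, folded = fold_prefix(["Scan", "Aggregate", "Select", "Join", "Select"])
print(new_ops, folded)
```

The folded operators would first be converted into their own SQL query statement, and the temporary Scan then stands in for that statement's result in the remaining sequence.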
In summary, by determining the operation sequence of the query sub-operation, determining the conversion template corresponding to the operation sequence, and generating the sub-operation statement based on the corresponding relationship between the query operation and the operation statement in the conversion template, the sub-operation statement corresponding to each operator can be generated conveniently, and the generation efficiency of the sub-operation statement is improved.
The overall implementation scenario and flow of the embodiments of the present application are described below with reference to fig. 8 to 13:
As shown in fig. 8, the flow proceeds as follows.
Step D1: import a data set into the database in preparation for executing query statements.
Step D2: add TRACE to the statement to be queried and send it to the database. All results, namely the output sub-operation statements and the estimated data line numbers of the sub-operation statements, are obtained through the new query interface shown in fig. 9 (the step of outputting the sub-operation statements can be seen in fig. 10).
Step D3: execute each sub-operation statement and record the corresponding actual data line number.
Step D4: calculate the estimation error p-error from the actual data line number and the estimated data line number.
Step D5: generate, from the output results, a HyperText Markup Language (HTML) file containing the content produced in the following sub-steps.
Step D5-1: take the absolute values of all p-error values, calculate the 50th, 80th, and 95th percentiles and the maximum value, and present them in a table, as shown in fig. 11.
Step D5-2: plot the distribution of all p-error values as a bar graph, as shown in fig. 12.
Step D5-3: list the 20 sub-operation statements with the largest absolute p-error, together with their estimated data line numbers and actual data line numbers.
Optionally, in some embodiments, the calculation of the estimation error p-error and the generation of the HTML file described above may be performed by the analysis program in fig. 9, and the form of the generated HTML file is shown in fig. 13.
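Steps D5-1 and D5-3 can be sketched as follows. This is an illustration rather than the analysis program of fig. 9: the record layout, the function names, and the nearest-rank percentile definition are all assumptions, since the text does not state which percentile method is used.

```python
def percentile(sorted_vals, pct):
    """Nearest-rank percentile of an ascending list (an assumed definition)."""
    rank = max(1, (pct * len(sorted_vals) + 99) // 100)  # integer ceiling
    return sorted_vals[rank - 1]


def summarize(records, top_n=20):
    """records: list of (statement, estimated_rows, actual_rows, p_error).

    Returns the percentile summary of |p-error| and the top_n records
    with the largest |p-error|.
    """
    abs_errors = sorted(abs(r[3]) for r in records)
    summary = {
        "p50": percentile(abs_errors, 50),
        "p80": percentile(abs_errors, 80),
        "p95": percentile(abs_errors, 95),
        "max": abs_errors[-1],
    }
    worst = sorted(records, key=lambda r: abs(r[3]), reverse=True)[:top_n]
    return summary, worst


# Hypothetical p-error values, used only to exercise the functions.
errors = [-4.0, 0.5, 1.0, 2.0, -0.25, 3.0, 0.0, 0.75, 1.5, 9.0]
records = [("stmt_%d" % i, 0, 0, e) for i, e in enumerate(errors)]
summary, worst = summarize(records, top_n=2)
print(summary, worst)
```

The resulting summary and list could then be rendered into the HTML table and listing described in step D5.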
The embodiment of the present application provides a data query device, as shown in fig. 14, the data query device 140 may include: a receiving module 1401, a generating module 1402, a first determining module 1403 and a second determining module 1404, wherein,
a receiving module 1401, configured to receive an original query statement for querying target data and determine at least one query topological relation corresponding to the original query statement;
a generating module 1402, configured to parse the query topological relation to obtain a query sub-operation corresponding to the query topological relation, and determine a sub-operation statement corresponding to the query sub-operation;
a first determining module 1403, configured to determine, according to the sub-operation statement, a first data amount operated by the query sub-operation;
a second determining module 1404 is configured to determine a target query topology relationship for querying the target data according to the first data amount.
In one embodiment of the present application, the apparatus further comprises an estimation module configured to perform the following after the query topological relation is parsed to obtain the query sub-operation corresponding to the query topological relation:
acquiring an estimated data quantity corresponding to the query sub-operation; wherein the estimated data volume comprises an estimated value of the data volume operated by the query sub-operation;
and determining estimation accuracy according to the estimated data quantity and the first data quantity.
In an embodiment of the present application, the estimation module is specifically configured to obtain, through a query interface and a preset query statement, an estimated data amount corresponding to the query sub-operation.
In one embodiment of the present application, the query topological relation corresponds to at least two of the query sub-operations,
the generation module is specifically used for determining the operation sequence of the query sub-operation and determining a conversion template corresponding to the operation sequence;
and generating the sub-operation sentences based on the corresponding relation between the query operation and the operation sentences in the conversion template.
In one embodiment of the present application, the generating module is specifically configured to combine at least two of the query sub-operations when the operation sequence is inconsistent with the sequence indicated by the conversion template.
In an embodiment of the present application, the first determining module is specifically configured to query, through the sub-operation statement, a database to obtain the first data amount.
In one embodiment of the present application, the estimated data amount includes an estimated data line number operated by the query sub-operation;
the first data volume includes an actual number of data lines operated on by the query sub-operation.
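Obtaining the actual number of data lines by querying a database with a sub-operation statement can be sketched as below. The sketch uses Python's built-in `sqlite3` module and an in-memory database purely for illustration; the tables, their contents, and the function name `actual_row_count` are hypothetical, and a real deployment would use its own database.

```python
import sqlite3


def actual_row_count(conn, sub_statement):
    """Run a sub-operation statement and return how many rows it produces."""
    # Wrap the statement so the database returns only the row count.
    cur = conn.execute("SELECT COUNT(*) FROM (" + sub_statement + ") AS sub")
    return cur.fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a INTEGER, b INTEGER, c INTEGER);
    CREATE TABLE t2 (a INTEGER);
    INSERT INTO t1 VALUES (1, 10, 5), (2, 20, 7), (3, 30, 9);
    INSERT INTO t2 VALUES (1), (1), (3);
""")

scan_rows = actual_row_count(conn, "SELECT a, b, c FROM t1")
join_rows = actual_row_count(
    conn, "SELECT t1.a, t1.b FROM t1 JOIN t2 ON t1.a = t2.a")
print(scan_rows, join_rows)
```

Counting inside the database avoids transferring every row to the client when only the row count is needed.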
The apparatus of the embodiments of the present application may perform the method provided by the embodiments of the present application, and implementation principles of the method are similar, and actions performed by each module in the apparatus of each embodiment of the present application correspond to steps in the method of each embodiment of the present application, and detailed functional descriptions of each module of the apparatus may be referred to in the corresponding method shown in the foregoing, which is not repeated herein.
An embodiment of the present application provides an electronic device, including a memory and a processor, with at least one program stored in the memory for execution by the processor. When executed by the processor, the program performs the following: receiving an original query statement for querying target data, and determining at least one query topological relation corresponding to the original query statement; parsing the query topological relation to obtain a query sub-operation corresponding to the query topological relation, and determining a sub-operation statement corresponding to the query sub-operation; determining a first data volume operated by the query sub-operation according to the sub-operation statement; and determining a target query topological relation for querying the target data according to the first data volume. In this way, a preferred target query topological relation can be selected from the plurality of query topological relations corresponding to the original query statement. For example, the query topological relation that operates on the least data when querying the target data can be selected as the target query topological relation, and the target data can be queried through it, which saves query time and improves query efficiency.
In an alternative embodiment, there is provided an electronic device, as shown in fig. 15, the electronic device 4000 shown in fig. 15 includes: a processor 4001 and a memory 4003. Wherein the processor 4001 is coupled to the memory 4003, such as via a bus 4002. Optionally, the electronic device 4000 may further comprise a transceiver 4004, the transceiver 4004 may be used for data interaction between the electronic device and other electronic devices, such as transmission of data and/or reception of data, etc. It should be noted that, in practical applications, the transceiver 4004 is not limited to one, and the structure of the electronic device 4000 is not limited to the embodiment of the present application.
The processor 4001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various exemplary logic blocks, modules, and circuits described in connection with this disclosure. The processor 4001 may also be a combination that implements computing functionality, e.g., a combination of one or more microprocessors, a combination of a DSP and a microprocessor, etc.
Bus 4002 may include a path to transfer information between the aforementioned components. Bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 4002 can be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in fig. 15, but this does not mean that there is only one bus or one type of bus.
Memory 4003 may be, but is not limited to, a ROM (Read Only Memory) or another type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or another type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 4003 is used for storing application program codes (computer programs) for executing the present application, and execution is controlled by the processor 4001. The processor 4001 is configured to execute application program codes stored in the memory 4003 to realize what is shown in the foregoing method embodiment.
The electronic device includes, but is not limited to: mobile phones, notebook computers, multimedia players, desktop computers, etc.
The present application provides a computer readable storage medium having a computer program stored thereon, which when run on a computer, causes the computer to perform the corresponding method embodiments described above.
The terms "first," "second," "third," "fourth," "1," "2," and the like in the description and in the claims of this application and in the above-described figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the present application described herein may be implemented in other sequences than those illustrated or otherwise described.
It should be understood that, although the flowcharts of the embodiments of the present application indicate the respective operation steps by arrows, the order of implementation of these steps is not limited to the order indicated by the arrows. In some implementations of embodiments of the present application, the implementation steps in the flowcharts may be performed in other orders as desired, unless explicitly stated herein. Furthermore, some or all of the steps in the flowcharts may include multiple sub-steps or multiple stages based on the actual implementation scenario. Some or all of these sub-steps or phases may be performed at the same time, or each of these sub-steps or phases may be performed at different times, respectively. In the case of different execution time, the execution sequence of the sub-steps or stages may be flexibly configured according to the requirement, which is not limited in the embodiment of the present application.
The foregoing describes only optional implementations of some implementation scenarios of the present application. It should be noted that, for those skilled in the art, other similar implementations adopted on the basis of the technical ideas of the present application, without departing from those ideas, also fall within the protection scope of the embodiments of the present application.
Claims (10)
1. A method of querying data, comprising:
receiving an original query statement of query target data, and determining at least one query topological relation corresponding to the original query statement;
analyzing the query topological relation to obtain a query sub-operation corresponding to the query topological relation, and determining a sub-operation statement corresponding to the query sub-operation;
determining a first data volume operated by the query sub-operation according to the sub-operation statement;
and determining a target query topological relation for querying the target data according to the first data volume.
2. The data query method according to claim 1, wherein after the parsing process is performed on the query topological relation to obtain the query sub-operation corresponding to the query topological relation, the method further comprises:
Acquiring an estimated data quantity corresponding to the query sub-operation; wherein the estimated data volume comprises an estimated value of the data volume operated by the query sub-operation;
and determining estimation accuracy according to the estimated data quantity and the first data quantity.
3. The method of claim 2, wherein the obtaining the estimated data volume corresponding to the query sub-operation includes:
and acquiring the estimated data quantity corresponding to the query sub-operation through a query interface and a preset query statement.
4. The data query method of claim 1, wherein the query topology corresponds to at least two of the query sub-operations,
the determining the sub-operation statement corresponding to the query sub-operation includes:
determining the operation sequence of the query sub-operation, and determining a conversion template corresponding to the operation sequence;
and generating the sub-operation sentences based on the corresponding relation between the query operation and the operation sentences in the conversion template.
5. The data query method of claim 4, wherein the method further comprises:
and combining at least two query sub-operations under the condition that the operation sequence is inconsistent with the sequence indicated by the conversion template.
6. The data query method of claim 1, wherein determining a first amount of data corresponding to the query sub-operation according to the sub-operation statement comprises:
and inquiring the database through the sub-operation statement to obtain the first data volume.
7. The method for querying data according to any one of claims 2 to 6, wherein,
the estimated data quantity comprises the estimated data line number operated by the inquiring sub-operation;
the first data volume includes an actual number of data lines operated on by the query sub-operation.
8. A data query device, comprising:
the receiving module is used for determining at least one query topological relation corresponding to the original query statement by receiving the original query statement of the query target data;
the generation module is used for analyzing the query topological relation to obtain a query sub-operation corresponding to the query topological relation and determining a sub-operation statement corresponding to the query sub-operation;
the first determining module is used for determining a first data volume operated by the inquiring sub-operation according to the sub-operation statement;
and the second determining module is used for determining a target query topological relation for querying the target data according to the first data volume.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory, characterized in that the processor executes the computer program to carry out the steps of the data query method of any of claims 1-7.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the data query method of any of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310492650.1A CN116521710A (en) | 2023-05-04 | 2023-05-04 | Data query method, device, electronic equipment and computer readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310492650.1A CN116521710A (en) | 2023-05-04 | 2023-05-04 | Data query method, device, electronic equipment and computer readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116521710A true CN116521710A (en) | 2023-08-01 |
Family
ID=87389905
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310492650.1A Pending CN116521710A (en) | 2023-05-04 | 2023-05-04 | Data query method, device, electronic equipment and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116521710A (en) |
- 2023-05-04: CN application CN202310492650.1A (patent/CN116521710A/en), status: active, Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||