CN115563150B

CN115563150B - Method, equipment and storage medium for mapping Hive SQL (structured query language) and execution engine DAG (directed acyclic graph)

Info

Publication number: CN115563150B
Application number: CN202211538051.0A
Authority: CN
Inventors: 栗征征; 周明伟; 钱浩东; 舒凡; 占文平; 柳杨
Original assignee: Zhejiang Dahua Technology Co Ltd
Current assignee: Zhejiang Dahua Technology Co Ltd
Priority date: 2022-12-02
Filing date: 2022-12-02
Publication date: 2023-04-18
Anticipated expiration: 2042-12-02
Also published as: CN115563150A

Abstract

The application discloses a method, a device and a storage medium for mapping Hive SQL to an execution engine DAG. The mapping method comprises the following steps: acquiring an SQL statement to be mapped; performing task parsing and/or division on the SQL statement to be mapped by using a grammar rule to obtain at least one SQL task; performing keyword parsing on each SQL task, and dividing the SQL task into at least one first sub-stage, wherein each keyword corresponds to one first sub-stage; and establishing a mapping relation between each first sub-stage of the SQL statement to be mapped and the second sub-stage of the execution engine DAG that executes that first sub-stage. With this scheme, the efficiency and accuracy with which a user analyzes a specific SQL statement can be improved.

Description

Method, equipment and storage medium for mapping Hive SQL (structured query language) and execution engine DAG (directed acyclic graph)

Technical Field

The present application relates to the field of data analysis and processing technologies, and in particular, to a method, a device, and a storage medium for mapping Hive SQL and execution engine DAG.

Background

Apache Hive is a data warehouse component built on the Hadoop file system, and is an OLAP (Online Analytical Processing) data warehouse. A series of ETL tasks can be run over the massive data in the data warehouse for analysis and query, and the results are used for management-level decision analysis.

As data volumes grow and scenarios become more complex, the performance requirements on Hive rise as well, so monitoring, statistics and analysis of Hive tasks become increasingly important. However, Hive, as an analysis engine, has no execution capability of its own; it generally translates SQL (Structured Query Language) into tasks that an execution engine can recognize, such as Spark tasks. When analyzing a Hive SQL task, it is therefore necessary first to match each SQL keyword to a stage of the Spark task, and then to analyze the SQL task according to the Spark task information collected by the execution engine. For example, to find out why an SQL statement executes inefficiently, the stages of the SQL statement and of the Spark task must be matched to each other, and the SQL problem must then be inferred from the execution of the Spark task, such as the time consumed by task execution and the amount of data processed by a single task.

It can thus be seen that statistical analysis of Hive tasks depends on the DAG (Directed Acyclic Graph) of the execution engine. Analyzing Hive tasks is therefore demanding for users: it requires some knowledge of the execution engine, and the analysis is inefficient.

Disclosure of Invention

The technical problem mainly solved by the application is to provide a method, a device and a storage medium for mapping Hive SQL to an execution engine DAG, which can improve the efficiency and accuracy with which a user analyzes a specific SQL statement.

In order to solve the above problem, a first aspect of the present application provides a method for mapping Hive SQL to an execution engine DAG, where the mapping method includes: acquiring an SQL statement to be mapped; performing task parsing and/or division on the SQL statement to be mapped by using a grammar rule to obtain at least one SQL task; performing keyword parsing on each SQL task, and dividing the SQL task into at least one first sub-stage, wherein each keyword corresponds to one first sub-stage; and establishing a mapping relation between each first sub-stage of the SQL statement to be mapped and the second sub-stage of the execution engine DAG that executes that first sub-stage.

The step of performing keyword parsing on each SQL task and dividing the SQL task into at least one first sub-stage includes: performing keyword parsing on each SQL task to generate an operation tree corresponding to the SQL statement to be mapped, wherein each node of the operation tree represents a keyword; and dividing the SQL statement to be mapped into at least one first sub-stage based on the nodes of the operation tree.

After the step of performing keyword parsing on each SQL task and generating the operation tree corresponding to the SQL statement to be mapped, the method further includes: traversing the nodes of the operation tree in sequence based on the keywords; and if the operation tree omits the nodes corresponding to some of the keywords, adding those nodes at the corresponding positions of the operation tree based on the positions of the keywords in the SQL task.

The step of performing task parsing and/or division on the SQL statement to be mapped by using a grammar rule to obtain at least one SQL task includes: performing task parsing and/or division on the SQL statement to be mapped by using the grammar rule of an abstract syntax tree.

The step of establishing a mapping relation between each first sub-stage and the second sub-stage of the execution engine DAG that executes that first sub-stage includes: detecting whether a first sub-stage of each SQL task corresponds to a plurality of second sub-stages of the execution engine DAG; and if one first sub-stage corresponds to a plurality of second sub-stages of the execution engine DAG, adding to the first sub-stage as many lower-order sub-stages as there are second sub-stages, each lower-order sub-stage corresponding to one of the second sub-stages of the execution engine DAG.

The step of detecting whether a first sub-stage of each SQL task corresponds to a plurality of second sub-stages of the execution engine DAG includes: if the task amount corresponding to the first sub-stage is larger than a set value, distributing the first sub-stage to a plurality of second sub-stages based on the task amount of the first sub-stage.

The mapping method further includes: receiving an instruction for querying the mapping relation; and displaying the mapping relation.

The mapping method further includes: recording the execution time taken by each second sub-stage to execute its corresponding first sub-stage; and determining the execution efficiency of each SQL task of the SQL statement to be mapped based on the execution time.

In order to solve the above problem, a second aspect of the present application provides an electronic device including a processor and a memory connected to each other; the memory is configured to store program instructions, and the processor is configured to execute the program instructions to implement the method for mapping Hive SQL to an execution engine DAG according to the first aspect.

To solve the above problem, a third aspect of the present application provides a computer-readable storage medium, on which program instructions are stored, and the program instructions, when executed by a processor, implement the mapping method between Hive SQL and execution engine DAG of the first aspect.

The beneficial effects of the invention are as follows. Different from the prior art, in the method for mapping Hive SQL to an execution engine DAG, after the SQL statement to be mapped is obtained, task parsing and/or division is performed on it by using a grammar rule to obtain at least one SQL task. Keyword parsing is then performed on each SQL task, and the SQL task is divided into at least one first sub-stage, where each keyword corresponds to one first sub-stage. A mapping relationship can thus be established between each first sub-stage of the SQL statement and the second sub-stage of the execution engine DAG that executes that first sub-stage. With the mapping between the SQL statement and the execution engine DAG in place, the execution of each execution engine stage corresponding to a keyword of the SQL statement can be analyzed more intuitively, which improves the efficiency and accuracy with which a user analyzes a specific SQL statement.

Drawings

FIG. 1 is a flowchart illustrating an embodiment of a mapping method for Hive SQL and DAG of an execution engine in the present application;

FIG. 2 is a schematic flowchart of an embodiment of step S13 in FIG. 1;

FIG. 3 is a schematic flow chart diagram of another embodiment of step S13 in FIG. 1;

FIG. 4 is a schematic flowchart of dividing an SQL task into stages in an application scenario of the present application;

FIG. 5 is a flowchart illustrating an embodiment of step S14 in FIG. 1;

FIG. 6 is a schematic flow chart illustrating the establishment of a mapping relationship in an application scenario of the present application;

FIG. 7 is a flowchart illustrating another embodiment of a mapping method for Hive SQL and DAG of an execution engine of the present application;

FIG. 8 is a flowchart illustrating a further embodiment of the method for mapping Hive SQL to DAG of an execution engine according to the present application;

FIG. 9 is a schematic structural diagram of an embodiment of a mapping apparatus of the present application;

FIG. 10 is a schematic block diagram of an embodiment of an electronic device of the present application;

FIG. 11 is a schematic structural diagram of an embodiment of a computer-readable storage medium of the present application.

Detailed Description

The following describes in detail the embodiments of the present application with reference to the drawings attached hereto.

In the following description, for purposes of explanation rather than limitation, specific details are set forth such as the particular system architecture, interfaces, techniques, etc., in order to provide a thorough understanding of the present application.

The terms "system" and "network" are often used interchangeably herein. The term "and/or" herein merely describes an association between associated objects and means that three relationships may exist; for example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone. In addition, the character "/" herein generally indicates that the objects before and after it are in an "or" relationship. Further, the term "plurality" herein means two or more.

The main task of Hive is to translate SQL statements into programs that various physical engines can recognize, so that the user does not need to program the physical engines; how Hive converts an SQL statement into the execution plan of a physical engine is transparent to the user. In most cases, however, the user needs some understanding of how the SQL is executed in the physical engine, and Hive currently provides no direct link between the two. The application therefore provides a method for mapping Hive SQL to an execution engine DAG, which establishes the relationship between the execution plan of the physical engine and the SQL.

Referring to fig. 1, fig. 1 is a schematic flowchart illustrating an embodiment of a mapping method for Hive SQL and DAG of an execution engine according to the present application. The method for mapping Hive SQL and execution engine DAG in the embodiment comprises the following steps:

Step S11: acquiring the SQL statement to be mapped.

Step S12: performing task parsing and/or division on the SQL statement to be mapped by using a grammar rule to obtain at least one SQL task.

To address the problem that the execution engine DAG cannot be mapped directly to specific parts of the SQL, the application provides a DAG at the Hive level (SQL_DAG), through which a user can intuitively understand the execution flow of the SQL in the physical execution engine. An SQL_DAG (SQL directed acyclic graph, representing the execution plan of a physical engine) describes an SQL statement submitted by a user, with one SQL_DAG for each SQL statement. After the SQL statement to be mapped is obtained, task parsing and/or division can be performed on it by using a grammar rule to obtain at least one SQL task. It can be understood that if the SQL statement to be mapped contains only one task, no task division is required and only task parsing is needed. In an SQL_DAG, each layer of query in the SQL statement is identified by one SQL task (i.e., SQL_JOB). For example, the SQL statement select count(*) from (select * from ta group by ta_1) tb has two layers of query: one SQL_DAG corresponds to the whole SQL statement, and two SQL_JOBs correspond to the two layers of query, where SQL_JOB_0 represents the sub-query (select * from ta group by ta_1) tb, and SQL_JOB_1 represents the parent query select count(*) from tb.

In an embodiment, the step S12 may specifically include: performing task parsing and/or division on the SQL statement to be mapped by using the grammar rule of an abstract syntax tree.

The SQL_JOBs are divided on the basis of the abstract syntax tree: once the AST (Abstract Syntax Tree) has been parsed out of the SQL statement to be mapped, the SQL_JOBs of the SQL_DAG can be divided according to the AST.
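
As a minimal sketch of this division, the following Python code assumes a toy AST in which each query layer holds at most one nested sub-query. The `QueryNode` class and the `divide_sql_jobs` function are hypothetical names introduced only for illustration and are not Hive's actual AST classes. The innermost layer is numbered first, in line with the SQL_JOB_0 and SQL_JOB_1 example above.

```python
from typing import Optional


class QueryNode:
    """Hypothetical, simplified AST node for one query layer."""

    def __init__(self, text: str, subquery: Optional["QueryNode"] = None):
        self.text = text          # SQL text of this query layer
        self.subquery = subquery  # nested query in the FROM clause, if any


def divide_sql_jobs(root: QueryNode) -> dict:
    """Return {job_name: query_text}, numbering the innermost layer first."""
    layers = []
    node = root
    while node is not None:       # walk from the outermost layer inwards
        layers.append(node.text)
        node = node.subquery
    layers.reverse()              # the innermost sub-query becomes SQL_JOB_0
    return {f"SQL_JOB_{i}": text for i, text in enumerate(layers)}


inner = QueryNode("select * from ta group by ta_1")
outer = QueryNode("select count(*) from tb", subquery=inner)
jobs = divide_sql_jobs(outer)
for name, text in jobs.items():
    print(name, "->", text)
```

A real AST can contain several sub-queries per layer (joins, unions); the single-chain assumption here only keeps the sketch short.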

Step S13: performing keyword parsing on each SQL task, and dividing the SQL task into at least one first sub-stage, wherein each keyword corresponds to one first sub-stage.

Specifically, each SQL_JOB of the SQL_DAG is further divided into at least one first sub-stage (i.e., SQL_STAGE) according to the keywords, so that each keyword corresponds to one first sub-stage. For example, select * from ta group by ta_1 in the SQL statement to be mapped is executed as one Spark task and corresponds to two first sub-stages, one being the group by stage and the other being the remaining stage; each first sub-stage corresponds to one keyword.

Step S14: establishing a mapping relation between each first sub-stage of the SQL statement to be mapped and the second sub-stage of the execution engine DAG that executes that first sub-stage.

The execution engine DAG displays the stages that execute specific tasks, and each such stage is taken as a second sub-stage; that is, the second sub-stages of the execution engine DAG correspond to the first sub-stages of the SQL tasks of the SQL statement to be mapped. A mapping relationship can therefore be established between each first sub-stage of the SQL statement to be mapped and the second sub-stage of the execution engine DAG that executes it. Since each keyword corresponds to one first sub-stage, the keywords of the SQL statement to be mapped can be matched to the execution engine DAG, so that the running status of a specific keyword of the SQL statement can be collected.

With this scheme, the SQL statement is mapped to the execution engine DAG, the execution of each execution engine stage corresponding to a keyword of the SQL statement can be analyzed intuitively through the mapping relationship, and the efficiency and accuracy with which a user analyzes a specific SQL statement can be improved.
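
One way to picture the resulting structure is the following Python sketch. It is an assumed data model, not the patented implementation: the class names `SqlDag`, `SqlJob` and `SqlStage`, the method `keyword_mapping`, and the engine stage IDs 0 to 2 are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SqlStage:
    name: str                      # first sub-stage, e.g. "SQL_STAGE_GROUPBY"
    keyword: str                   # keyword that produced this first sub-stage
    engine_stage_ids: List[int] = field(default_factory=list)  # second sub-stages


@dataclass
class SqlJob:
    name: str                      # one query layer, e.g. "SQL_JOB_0"
    stages: List[SqlStage] = field(default_factory=list)


@dataclass
class SqlDag:
    statement: str                 # the SQL statement to be mapped
    jobs: List[SqlJob] = field(default_factory=list)

    def keyword_mapping(self) -> Dict[str, List[int]]:
        """Map each 'job/stage' name to the engine stages that execute it."""
        return {
            f"{job.name}/{stage.name}": stage.engine_stage_ids
            for job in self.jobs
            for stage in job.stages
        }


dag = SqlDag(
    statement="select * from t1 group by name order by name",
    jobs=[SqlJob("SQL_JOB_0", [
        SqlStage("SQL_STAGE_FROM", "from", [0]),        # IDs are made up
        SqlStage("SQL_STAGE_GROUPBY", "group by", [1]),
        SqlStage("SQL_STAGE_ORDERBY", "order by", [2]),
    ])],
)
mapping = dag.keyword_mapping()
print(mapping)
```

Keeping the engine stage IDs as a list on each `SqlStage` leaves room for the case in which one first sub-stage is executed by several second sub-stages.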

Referring to fig. 2, fig. 2 is a schematic flowchart illustrating an embodiment of step S13 in fig. 1. In an embodiment, the step S13 specifically includes:

Step S131: performing keyword parsing on each SQL task to generate an operation tree corresponding to the SQL statement to be mapped, wherein each node of the operation tree represents a keyword.

Step S132: dividing the SQL statement to be mapped into at least one first sub-stage based on the nodes of the operation tree.

The SQL_STAGEs are divided while the AST (Abstract Syntax Tree) is being parsed to generate the OP Tree (Operator Tree), which makes the stage division more accurate. The AST only captures the syntax and semantics of the SQL statement, and the operations required by the statement cannot be obtained from it directly. Conversely, if the stages were divided only after the OP Tree had been produced, part of the keyword information would already be lost, and the keywords corresponding to certain operations in the OP Tree could not be identified. Dividing the stages while the AST is being parsed therefore makes it possible to obtain directly the operations required by the keywords in the SQL statement to be mapped and to divide the stages according to the keywords, which ensures that no information is lost and that the OP Tree is divided into SQL_STAGEs accurately.

During SQL_STAGE division, the keywords that cause the execution engine DAG to split into stages are used as the basis of division. The OP Tree of the SQL statement is divided into several independent SQL_STAGEs named SQL_STAGE_OP, where OP denotes the type of the keyword. Taking the Spark execution engine as an example, when the SQL statement "select * from t1 group by name order by name" is executed, Spark is divided into three stages by the keywords group by and order by: the first stage reads the data, the second stage performs the group by, and the last stage performs the order by. When the SQL_STAGEs are divided, the three stages are named according to the function of each: the first stage, which reads data, is SQL_STAGE_FROM; the second stage is SQL_STAGE_GROUPBY; and the last stage is SQL_STAGE_ORDERBY.

Referring to fig. 3, fig. 3 is a schematic flowchart illustrating another embodiment of step S13 in fig. 1. In an embodiment, the step S13 specifically includes:

Step S131a: performing keyword parsing on each SQL task to generate an operation tree corresponding to the SQL statement to be mapped, wherein each node of the operation tree represents a keyword.

Step S132a: traversing the nodes of the operation tree in sequence based on the keywords.

Step S133a: if the operation tree omits the nodes corresponding to some of the keywords, adding those nodes at the corresponding positions of the operation tree based on the positions of the keywords in the SQL task.

Step S134a: dividing the SQL statement to be mapped into at least one first sub-stage based on the nodes of the operation tree.
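
The node-completion idea in steps S132a and S133a can be sketched as follows. This is a simplified illustration under two assumptions that are not stated in the source: the operation tree is linearised into an ordered list of keywords, and the keywords present in the tree form a subsequence of the keywords of the SQL task. The function name `restore_missing_nodes` and the sample input are hypothetical.

```python
def restore_missing_nodes(task_keywords, tree_keywords):
    """Return the tree's keyword sequence with omitted nodes re-inserted.

    task_keywords: keywords of the SQL task, in the order used for traversal.
    tree_keywords: keywords that currently have a node in the operation tree;
                   assumed to be a subsequence of task_keywords.
    """
    restored = []
    remaining = list(tree_keywords)
    for kw in task_keywords:
        if remaining and remaining[0] == kw:
            restored.append(remaining.pop(0))   # node exists: keep it in place
        else:
            restored.append(kw)                 # node omitted: add it here
    return restored


# Hypothetical case: the tree has no node for "group by".
nodes = restore_missing_nodes(
    ["from", "group by", "order by"],
    ["from", "order by"],
)
print(nodes)
```

After this step every keyword of the SQL task has a node at the position implied by the task, so the subsequent division in step S134a does not lose a keyword.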

Referring to fig. 4, fig. 4 is a schematic flowchart of the SQL task stage division in an application scenario of the present application. When the AST is parsed out of the SQL statement, the SQL_JOBs can be divided according to the AST; when the OP Tree is generated by parsing the AST, the SQL_STAGEs can be divided. To divide the SQL_STAGEs of the current SQL_JOB, the current SQL_JOB and its root OP (i.e., the root node of the OP Tree) are first obtained, where each node of the OP Tree expresses one execution process, and the SQL_STAGE of the current SQL_JOB is initialized to SQL_STAGE_FROM. The OP Tree of the current SQL_JOB is then traversed, and for each node it is judged whether SQL_STAGE division is needed. If the current node does not require SQL_STAGE division, the current node is mapped to the current SQL_STAGE. If the current keyword requires SQL_STAGE division, such as group by or order by, a new SQL_STAGE named SQL_STAGE_OP is created and mapped to the current SQL_JOB and to the preceding SQL_STAGE: the newly divided SQL_STAGE belongs to the current SQL_JOB and depends on the preceding SQL_STAGE. All nodes of the OP Tree of the current SQL_JOB are processed in this way until the current SQL_JOB ends. If any SQL_JOBs remain, the above steps are repeated for them until the division is complete.
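
The traversal described for fig. 4 can be sketched in Python as below. The sketch is illustrative only and makes several assumptions: the OP Tree is reduced to a chain of nodes, the set of dividing keywords contains just group by and order by (the two examples named in the text), and the names `OpNode`, `SqlStage`, `DIVIDING` and `divide_job` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Keywords assumed to trigger a new stage in the execution engine DAG.
DIVIDING = {"group by": "GROUPBY", "order by": "ORDERBY"}


@dataclass
class OpNode:
    keyword: str
    child: Optional["OpNode"] = None   # simplified: a chain instead of a tree


@dataclass
class SqlStage:
    name: str
    job: str                           # the SQL_JOB this stage belongs to
    depends_on: Optional[str] = None   # the preceding SQL_STAGE, if any
    ops: List[str] = field(default_factory=list)


def divide_job(job_name: str, root: OpNode) -> List[SqlStage]:
    current = SqlStage("SQL_STAGE_FROM", job_name)   # initial stage
    stages = [current]
    node = root
    while node is not None:
        op = DIVIDING.get(node.keyword)
        if op is not None:
            # Dividing keyword: open a new stage that belongs to the job
            # and depends on the preceding stage.
            current = SqlStage(f"SQL_STAGE_{op}", job_name, depends_on=current.name)
            stages.append(current)
        current.ops.append(node.keyword)   # map the node to the current stage
        node = node.child
    return stages


root = OpNode("from", OpNode("select", OpNode("group by", OpNode("order by"))))
stages = divide_job("SQL_JOB_0", root)
for s in stages:
    print(s.name, "depends on", s.depends_on, "ops:", s.ops)
```

Calling `divide_job` once per SQL_JOB corresponds to repeating the steps for the remaining SQL_JOBs.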

Referring to fig. 5, fig. 5 is a schematic flowchart illustrating an embodiment of step S14 in fig. 1. In an embodiment, the step S14 specifically includes:

Step S141: detecting whether a first sub-stage of each SQL task corresponds to a plurality of second sub-stages of the execution engine DAG.

In an embodiment, the step S141 may specifically include: if the task amount corresponding to the first sub-stage is larger than a set value, distributing the first sub-stage to a plurality of second sub-stages based on the task amount of the first sub-stage.

Step S142: if one first sub-stage corresponds to a plurality of second sub-stages of the execution engine DAG, adding to the first sub-stage as many lower-order sub-stages as there are second sub-stages, each lower-order sub-stage corresponding to one of the second sub-stages of the execution engine DAG.

After the SQL_DAG divided into SQL_STAGEs is generated, a mapping relationship needs to be established with the execution engine DAG. The mapping relationship may change because of Hive optimization rules, so that one SQL_STAGE may correspond to n stages of the execution engine DAG (a 1-to-n relationship). Taking the Spark execution engine as an example, no rule optimization applies to the SQL statement "select * from t1 order by name", so the resulting SQL_STAGEs and the stages of the Spark execution engine DAG are in a 1-to-1 relationship. When skew join optimization is enabled, if a certain key value is larger than a set threshold, the Spark execution engine divides the group by into two stages, and the SQL_STAGE_GROUPBY generated from an SQL statement containing group by is then in a 1-to-2 relationship with the stages of the Spark execution engine DAG.

Referring to fig. 6, fig. 6 is a schematic flowchart of establishing the mapping relationship in an application scenario of the present application. During the subsequent rule optimization and generation of the execution engine DAG, the relationships among SQL_JOB, SQL_STAGE and the OP Tree are retained. When the OP Tree changes, the new OP Tree inherits the original mapping relationship. When an SQL_STAGE changes, the new SQL_STAGE inherits the original mapping relationship, and the mapping can handle both the 1-to-1 and the 1-to-many cases. When a 1-to-many change occurs, sub SQL_STAGEs are added to the changed SQL_STAGE; the number of sub SQL_STAGEs equals the number of new stages, and the sub SQL_STAGEs are numbered. For example, when skew join optimization is enabled for the SQL statement "select * from t1 group by name order by name", group by becomes two stages in the Spark execution engine DAG, so two sub SQL_STAGEs are added to SQL_STAGE_GROUPBY, namely SQL_STAGE_GROUPBY_0 and SQL_STAGE_GROUPBY_1.
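
The numbering of sub SQL_STAGEs in the 1-to-many case can be illustrated with a short Python sketch. The function `map_stage` and the engine stage IDs are hypothetical; the sketch only shows the naming scheme described above and says nothing about how the engine itself decides to split a stage.

```python
def map_stage(sql_stage: str, engine_stage_ids: list) -> dict:
    """Map one SQL_STAGE to the engine stages that execute it."""
    if len(engine_stage_ids) <= 1:
        return {sql_stage: list(engine_stage_ids)}     # 1-to-1 case
    # 1-to-many case: add one numbered sub SQL_STAGE per engine stage.
    return {f"{sql_stage}_{i}": [sid] for i, sid in enumerate(engine_stage_ids)}


# Engine stage IDs 1, 2 and 3 are made up for the example.
groupby = map_stage("SQL_STAGE_GROUPBY", [1, 2])
orderby = map_stage("SQL_STAGE_ORDERBY", [3])
print(groupby)
print(orderby)
```

With two engine stages for group by, the result contains SQL_STAGE_GROUPBY_0 and SQL_STAGE_GROUPBY_1, while the order by stage keeps its original name.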

In the present application, the correspondence between the execution engine DAG and the SQL statement is established step by step according to the AST and the OP Tree. The process requires no human participation, which reduces the influence of subjective human factors on the analysis of the SQL statement and improves the accuracy and efficiency of that analysis; it also simplifies the analysis of a specific SQL statement and makes the SQL task easier to understand. Because the SQL_STAGE division is based on the AST and OP Tree of the Hive compilation process, the scheme is extensible: when a user needs to support an additional physical engine, only the processing logic for that engine needs to be added. In addition, the mapping is established from the compilation products while Hive compiles the SQL statement, so no external component is needed and the user incurs no additional Hive maintenance cost.

Referring to fig. 7, fig. 7 is a schematic flowchart illustrating another embodiment of a mapping method for Hive SQL and execution engine DAG according to the present application. The method for mapping Hive SQL and execution engine DAG in the embodiment comprises the following steps:

Step S71: acquiring the SQL statement to be mapped.

Step S72: performing task parsing and/or division on the SQL statement to be mapped by using a grammar rule to obtain at least one SQL task.

Step S73: performing keyword parsing on each SQL task, and dividing the SQL task into at least one first sub-stage, wherein each keyword corresponds to one first sub-stage.

Step S74: establishing a mapping relation between each first sub-stage of the SQL statement to be mapped and the second sub-stage of the execution engine DAG that executes that first sub-stage.

Steps S71 to S74 in this embodiment are substantially the same as steps S11 to S14 in the foregoing embodiment of the mapping method for Hive SQL and execution engine DAG of this application, and are not described herein again.

The difference from the foregoing embodiment is that the mapping method of Hive SQL and execution engine DAG in this embodiment may further include the following steps:

Step S75: receiving an instruction for querying the mapping relation.

Step S76: displaying the mapping relation.

After the mapping relationship between each first sub-stage of the SQL statement to be mapped and the second sub-stage of the execution engine DAG that executes it is established, the mapping relationship can be visualized by displaying it, so that a user can intuitively understand the execution flow of the SQL statement to be mapped in the physical execution engine.
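
The source does not specify how the mapping is displayed. As one possible illustration, the following Python sketch renders a mapping as indented text; the function `format_mapping`, the nested-dict layout and the sample values are assumptions made for this example.

```python
def format_mapping(mapping: dict) -> str:
    """Render {job: {sql_stage: [engine_stage_ids]}} as indented text."""
    lines = []
    for job, stages in mapping.items():
        lines.append(job)
        for stage, engine_ids in stages.items():
            ids = ", ".join(f"stage {i}" for i in engine_ids)
            lines.append(f"  {stage} -> {ids}")
    return "\n".join(lines)


# Hypothetical mapping; the engine stage IDs are made up.
sample = {
    "SQL_JOB_0": {
        "SQL_STAGE_FROM": [0],
        "SQL_STAGE_GROUPBY_0": [1],
        "SQL_STAGE_GROUPBY_1": [2],
        "SQL_STAGE_ORDERBY": [3],
    }
}
text = format_mapping(sample)
print(text)
```

A graphical rendering would serve the same purpose; plain text is used here only to keep the sketch self-contained.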

Referring to fig. 8, fig. 8 is a schematic flowchart illustrating a mapping method of Hive SQL and DAG for an execution engine according to another embodiment of the present application. The mapping method of Hive SQL and execution engine DAG in the embodiment comprises the following steps:

Step S81: acquiring the SQL statement to be mapped.

Step S82: performing task parsing and/or division on the SQL statement to be mapped by using a grammar rule to obtain at least one SQL task.

Step S83: performing keyword parsing on each SQL task, and dividing the SQL task into at least one first sub-stage, wherein each keyword corresponds to one first sub-stage.

Step S84: establishing a mapping relation between each first sub-stage of the SQL statement to be mapped and the second sub-stage of the execution engine DAG that executes that first sub-stage.

Steps S81 to S84 in this embodiment are substantially the same as steps S11 to S14 in the foregoing embodiment of the mapping method of Hive SQL and execution engine DAG in this application, and are not described here again.

Step S85: recording the execution time taken by each second sub-stage to execute its corresponding first sub-stage.

Step S86: determining the execution efficiency of each SQL task of the SQL statement to be mapped based on the execution time.

By recording the time each second sub-stage takes to execute its corresponding first sub-stage, the execution status of a single SQL task in the SQL statement to be mapped can be determined. This reflects the execution efficiency of that specific SQL task and can effectively improve the efficiency and accuracy of analyzing it.
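
A minimal sketch of steps S85 and S86 is given below. The source does not define how execution efficiency is computed, so this example simply assumes that the time of a first sub-stage is the sum of the recorded times of the engine stages mapped to it, and reports the total and the slowest sub-stage per SQL task. The function `summarize`, the millisecond durations and the stage IDs are hypothetical.

```python
def summarize(mapping: dict, durations_ms: dict) -> dict:
    """Aggregate engine-stage durations per SQL_STAGE and per SQL_JOB.

    mapping:      {job: {sql_stage: [engine_stage_ids]}}
    durations_ms: {engine_stage_id: recorded execution time in milliseconds}
    """
    report = {}
    for job, stages in mapping.items():
        # Assumption: a first sub-stage's time is the sum of its engine stages.
        stage_times = {
            stage: sum(durations_ms[i] for i in ids)
            for stage, ids in stages.items()
        }
        report[job] = {
            "stage_times": stage_times,
            "total": sum(stage_times.values()),
            "slowest": max(stage_times, key=stage_times.get),
        }
    return report


# Made-up measurements for illustration only.
mapping = {"SQL_JOB_0": {
    "SQL_STAGE_FROM": [0],
    "SQL_STAGE_GROUPBY": [1, 2],
    "SQL_STAGE_ORDERBY": [3],
}}
durations_ms = {0: 1200, 1: 5400, 2: 300, 3: 800}
report = summarize(mapping, durations_ms)
print(report["SQL_JOB_0"])
```

Because the report is keyed by SQL_JOB and SQL_STAGE rather than by engine stage, a slow keyword can be identified without reading the engine DAG directly.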

Referring to fig. 9, fig. 9 is a schematic structural diagram of an embodiment of a mapping apparatus of the present application. The mapping apparatus 90 in this embodiment includes an obtaining module 900, a first parsing and dividing module 902, a second parsing and dividing module 904, and a mapping relationship establishing module 906 that are connected to each other. The obtaining module 900 is configured to obtain an SQL statement to be mapped; the first parsing and dividing module 902 is configured to perform task parsing and/or division on the SQL statement to be mapped by using a grammar rule to obtain at least one SQL task; the second parsing and dividing module 904 is configured to perform keyword parsing on each SQL task and divide the SQL task into at least one first sub-stage, wherein each keyword corresponds to one first sub-stage; and the mapping relationship establishing module 906 is configured to establish a mapping relationship between each first sub-stage of the SQL statement to be mapped and the second sub-stage of the execution engine DAG that executes that first sub-stage.

In an embodiment, the step in which the second parsing and dividing module 904 performs keyword parsing on each SQL task and divides the SQL task into at least one first sub-stage specifically includes: performing keyword parsing on each SQL task to generate an operation tree corresponding to the SQL statement to be mapped, wherein each node of the operation tree represents a keyword; and dividing the SQL statement to be mapped into at least one first sub-stage based on the nodes of the operation tree.

In an embodiment, after the step of performing keyword parsing on each SQL task and generating the operation tree corresponding to the SQL statement to be mapped, the second parsing and dividing module 904 is further configured to: traverse the nodes of the operation tree in sequence based on the keywords; and if the operation tree omits the nodes corresponding to some of the keywords, add those nodes at the corresponding positions of the operation tree based on the positions of the keywords in the SQL task.

In an embodiment, the step in which the first parsing and dividing module 902 performs task parsing and/or division on the SQL statement to be mapped by using a grammar rule to obtain at least one SQL task specifically includes: performing task parsing and/or division on the SQL statement to be mapped by using the grammar rule of an abstract syntax tree.

In an embodiment, the step in which the mapping relationship establishing module 906 establishes a mapping relationship between each first sub-stage of the SQL statement to be mapped and the second sub-stage of the execution engine DAG that executes that first sub-stage specifically includes: detecting whether a first sub-stage of each SQL task corresponds to a plurality of second sub-stages of the execution engine DAG; and if one first sub-stage corresponds to a plurality of second sub-stages of the execution engine DAG, adding to the first sub-stage as many lower-order sub-stages as there are second sub-stages, each lower-order sub-stage corresponding to one of the second sub-stages of the execution engine DAG.

In an embodiment, the step in which the mapping relationship establishing module 906 detects whether a first sub-stage of each SQL task corresponds to a plurality of second sub-stages of the execution engine DAG may specifically include: if the task amount corresponding to the first sub-stage is larger than a set value, distributing the first sub-stage to a plurality of second sub-stages based on the task amount of the first sub-stage.

In an embodiment, the mapping apparatus 90 further includes a display module (not shown) for displaying the mapping relationship after receiving an instruction for querying the mapping relationship.

In an embodiment, the mapping apparatus 90 further includes a determining module (not shown), where the determining module is configured to record the execution time taken by each second sub-phase to execute its corresponding first sub-phase, and to determine, based on the execution time, the execution efficiency of each SQL task of the SQL statement to be mapped.
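A minimal sketch of how the recorded execution times might be aggregated per SQL task is given below. The application does not define a formula for execution efficiency; this sketch assumes the total elapsed time of a task and its slowest first sub-phase as stand-ins, and the names `StageTiming` and `summarize` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class StageTiming(NamedTuple):
    """Execution time recorded for one second sub-phase executing a first sub-phase."""
    sql_task: str
    first_sub_phase: str
    second_sub_phase: str
    seconds: float


def summarize(timings: List[StageTiming]) -> Dict[str, Tuple[float, str]]:
    """Return, per SQL task, its total time and its slowest first sub-phase."""
    totals: Dict[str, float] = defaultdict(float)
    per_phase: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for timing in timings:
        totals[timing.sql_task] += timing.seconds
        per_phase[timing.sql_task][timing.first_sub_phase] += timing.seconds
    return {
        task: (totals[task], max(per_phase[task], key=per_phase[task].get))
        for task in totals
    }


# Hypothetical recorded timings for one SQL task.
timings = [
    StageTiming("task_1", "SELECT", "Map 1", 1.0),
    StageTiming("task_1", "JOIN", "Map 2", 2.0),
    StageTiming("task_1", "JOIN", "Reducer 3", 3.5),
]
summary = summarize(timings)
print(summary)
```

Because every timing is keyed by both the first and the second sub-phase, a slow stage of the DAG can be traced back to the part of the SQL statement that caused it.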

Referring to fig. 10, fig. 10 is a schematic structural diagram of an embodiment of an electronic device of the present application. The electronic device 100 in the present embodiment includes a processor 102 and a memory 101 connected to each other; the memory 101 is used for storing program instructions, and the processor 102 is used for executing the program instructions stored in the memory 101, so as to implement the steps of any one of the above-described embodiments of the method for mapping Hive SQL and execution engine DAG. In a specific implementation scenario, the electronic device 100 may include, but is not limited to, a microcomputer or a server.

Specifically, the processor 102 is configured to control itself and the memory 101 to implement the steps of any of the above-described embodiments of the method for mapping Hive SQL and execution engine DAG. The processor 102 may also be referred to as a CPU (Central Processing Unit). The processor 102 may be an integrated circuit chip having signal processing capabilities. The processor 102 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. Additionally, the processor 102 may be jointly implemented by a plurality of integrated circuit chips.

Referring to fig. 11, fig. 11 is a schematic structural diagram of an embodiment of a computer-readable storage medium according to the present application. The computer readable storage medium 110 of the present application has stored thereon program instructions 1100, and when executed by a processor, the program instructions 1100 implement the steps in any of the above-described embodiments of the method for mapping Hive SQL to execution engine DAG.

The computer-readable storage medium 110 may be a medium that can store the program instructions 1100, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, or may be a server that stores the program instructions 1100; the server may transmit the stored program instructions 1100 to another device for execution, or may execute the stored program instructions 1100 itself.

In the several embodiments provided in the present application, it should be understood that the disclosed method, apparatus, and device may be implemented in other ways. For example, the above-described apparatus and device embodiments are merely illustrative: the division into modules or units is merely a logical division, and an actual implementation may use another division; for instance, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; that is, they may be located in one place or distributed over a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application in essence, or the part thereof that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.

Claims

1. A method for mapping Hive SQL and execution engine DAG, characterized in that the mapping method comprises the following steps:

acquiring an SQL statement to be mapped;

performing task parsing and/or division on the SQL statement to be mapped by using a syntax rule to obtain at least one SQL task;

performing keyword analysis on each SQL task, and dividing the SQL task into at least one first sub-phase; wherein each keyword corresponds to one first sub-phase;

establishing a mapping relation between each first sub-phase of the SQL statement to be mapped and the second sub-phase in which an execution engine DAG executes that first sub-phase;

wherein the step of establishing a mapping relation between each first sub-phase of the SQL statement to be mapped and the second sub-phase in which the execution engine DAG executes that first sub-phase comprises:

if the task amount corresponding to the first sub-phase is larger than a set value, distributing the first sub-phase to a plurality of second sub-phases based on the task amount of the first sub-phase;

and if one first sub-phase corresponds to a plurality of second sub-phases of the execution engine DAG, adding to the first sub-phase the same number of lower-order sub-phases as there are second sub-phases, the lower-order sub-phases corresponding respectively to the second sub-phases of the execution engine DAG.

2. The mapping method according to claim 1, wherein the step of performing keyword analysis on each SQL task and dividing the SQL task into at least one first sub-phase comprises:

performing keyword analysis on each SQL task to generate an operation tree corresponding to the SQL statement to be mapped; each node of the operation tree represents a keyword;

and dividing the SQL statement to be mapped into at least one first sub-phase based on each node of the operation tree.

3. The mapping method according to claim 2, wherein after the step of performing keyword analysis on each SQL task to generate the operation tree corresponding to the SQL statement to be mapped, the method further comprises:

sequentially traversing each node of the operation tree based on the keywords;

and if the operation tree omits the nodes corresponding to some of the keywords, adding the nodes corresponding to those keywords at the corresponding positions in the operation tree, based on the positions of the keywords in the SQL task.

4. The mapping method according to claim 1, wherein the step of performing task parsing and/or division on the SQL statement to be mapped by using a syntax rule to obtain at least one SQL task comprises:

performing task parsing and/or division on the SQL statement to be mapped by using a syntax rule of an abstract syntax tree.

5. The mapping method according to claim 1, wherein the mapping method further comprises:

receiving an instruction for querying the mapping relation;

and displaying the mapping relation.

6. The mapping method according to claim 1, wherein the mapping method further comprises:

recording the execution time taken by each second sub-phase to execute its corresponding first sub-phase;

and determining the execution efficiency of each SQL task of the SQL statement to be mapped based on the execution time.

7. An electronic device, characterized in that the electronic device comprises a processor and a memory connected to each other;

the memory is configured to store program instructions, and the processor is configured to execute the program instructions to implement the method for mapping Hive SQL and execution engine DAG according to any one of claims 1 to 6.

8. A computer readable storage medium having stored thereon program instructions which, when executed by a processor, implement the method of mapping Hive SQL and execution engine DAG according to any of claims 1 to 6.