CN115563150B - Method, device and storage medium for mapping Hive SQL (structured query language) to an execution engine DAG (directed acyclic graph) - Google Patents

Publication number
CN115563150B
Authority
CN
China
Prior art keywords
sql
sub
task
phase
dag
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211538051.0A
Other languages
Chinese (zh)
Other versions
CN115563150A (en
Inventor
栗征征
周明伟
钱浩东
舒凡
占文平
柳杨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd
Priority to CN202211538051.0A
Publication of CN115563150A
Application granted
Publication of CN115563150B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method, a device and a storage medium for mapping Hive SQL to an execution engine DAG. The method comprises the following steps: acquiring an SQL statement to be mapped; performing task parsing and/or division on the SQL statement to be mapped by using a grammar rule to obtain at least one SQL task; performing keyword parsing on each SQL task and dividing the SQL task into at least one first sub-phase, wherein each keyword corresponds to one first sub-phase; and establishing, for each first sub-phase of the SQL statement to be mapped, a mapping relationship with the second sub-phase of the execution engine DAG that executes that first sub-phase. With this scheme, the efficiency and accuracy with which a user analyzes a specific SQL statement can be improved.

Description

Method, device and storage medium for mapping Hive SQL (structured query language) to an execution engine DAG (directed acyclic graph)
Technical Field
The present application relates to the field of data analysis and processing technologies, and in particular, to a method, a device, and a storage medium for mapping Hive SQL and execution engine DAG.
Background
Apache Hive is a data warehouse component built on the Hadoop file system, and is also an OLAP (Online Analytical Processing) based data warehouse. A series of ETL tasks can be run on the massive data in the data warehouse for analysis and query, and the results are used for decision analysis by management.
With growing data volumes and increasingly complex scenarios, the performance requirements on Hive keep rising, so monitoring, statistics and analysis of Hive tasks become more and more important. However, Hive, as an analysis engine, has no execution capability of its own; it generally translates SQL (Structured Query Language) into tasks that an execution engine can recognize, such as Spark tasks. When analyzing a Hive SQL task, it is therefore necessary first to match each SQL keyword to the corresponding stage of the Spark task, and then to analyze the SQL task according to the Spark task information collected by the execution engine. For example, to find out why an SQL statement executes slowly, the SQL must be matched to the stages of the Spark task, and the SQL problem must be inferred from how the Spark task executed, for instance from the time consumed by task execution and the amount of data processed by a single task.
It can thus be seen that statistical analysis of Hive tasks depends on the DAG (Directed Acyclic Graph) of the execution engine. Analyzing Hive tasks is therefore demanding for users: it requires some knowledge of the execution engine, and the analysis is inefficient.
Disclosure of Invention
The technical problem mainly solved by the application is to provide a method, a device and a storage medium for mapping Hive SQL to an execution engine DAG, which can improve the efficiency and accuracy with which a user analyzes a specific SQL statement.
To solve the above problem, a first aspect of the present application provides a method for mapping Hive SQL to an execution engine DAG, the method including: acquiring an SQL statement to be mapped; performing task parsing and/or division on the SQL statement to be mapped by using a grammar rule to obtain at least one SQL task; performing keyword parsing on each SQL task and dividing the SQL task into at least one first sub-phase, wherein each keyword corresponds to one first sub-phase; and establishing, for each first sub-phase of the SQL statement to be mapped, a mapping relationship with the second sub-phase of the execution engine DAG that executes that first sub-phase.
Wherein the step of performing keyword parsing on each SQL task and dividing the SQL task into at least one first sub-phase includes: performing keyword parsing on each SQL task to generate an operation tree corresponding to the SQL statement to be mapped, wherein each node of the operation tree represents a keyword; and dividing the SQL statement to be mapped into at least one first sub-phase based on the nodes of the operation tree.
Wherein, after the step of performing keyword parsing on each SQL task to generate the operation tree corresponding to the SQL statement to be mapped, the method further includes: traversing the nodes of the operation tree in sequence based on the keywords; and if the operation tree omits the nodes corresponding to some of the keywords, adding those nodes to the corresponding positions of the operation tree based on the positions of the keywords in the SQL task.
Wherein the step of performing task parsing and/or division on the SQL statement to be mapped by using a grammar rule to obtain at least one SQL task includes: performing task parsing and/or division on the SQL statement to be mapped by using the grammar rules of an abstract syntax tree.
Wherein the step of establishing a mapping relationship between each first sub-phase and the second sub-phase of the execution engine DAG that executes it includes: detecting whether a first sub-phase of each SQL task corresponds to a plurality of second sub-phases of the execution engine DAG; and if one first sub-phase corresponds to a plurality of second sub-phases of the execution engine DAG, adding to that first sub-phase as many lower-level sub-phases as there are second sub-phases, each corresponding to one of the second sub-phases of the execution engine DAG.
Wherein the step of detecting whether the first sub-phase of each SQL task corresponds to a plurality of second sub-phases of the execution engine DAG includes: if the task amount corresponding to the first sub-phase is larger than a set value, distributing the first sub-phase to a plurality of second sub-phases based on the task amount of the first sub-phase.
Wherein the mapping method further includes: receiving an instruction for querying the mapping relationship; and displaying the mapping relationship.
Wherein the mapping method further includes: recording the execution time taken by each second sub-phase to execute its corresponding first sub-phase; and determining the execution efficiency of each SQL task of the SQL statement to be mapped based on the execution time.
To solve the above problem, a second aspect of the present application provides an electronic device, where the electronic device includes a processor and a memory connected to each other; the memory is configured to store program instructions, and the processor is configured to execute the program instructions to implement the method for mapping Hive SQL to an execution engine DAG according to the first aspect.
To solve the above problem, a third aspect of the present application provides a computer-readable storage medium, on which program instructions are stored, and the program instructions, when executed by a processor, implement the mapping method between Hive SQL and execution engine DAG of the first aspect.
The beneficial effects of the invention are as follows: different from the prior art, in the method for mapping Hive SQL to an execution engine DAG, after the SQL statement to be mapped is acquired, task parsing and/or division is performed on it by using grammar rules to obtain at least one SQL task; keyword parsing is then performed on each SQL task, and the SQL task is divided into at least one first sub-phase, where each keyword corresponds to one first sub-phase. A mapping relationship can thus be established between each first sub-phase of the SQL statement and the second sub-phase of the execution engine DAG that executes that first sub-phase. With the mapping between the SQL statement and the execution engine DAG in place, the execution of each execution-engine stage corresponding to a keyword of the SQL statement can be analyzed more intuitively, and the efficiency and accuracy of a user's analysis of specific SQL can be improved.
Drawings
FIG. 1 is a flowchart illustrating an embodiment of a mapping method for Hive SQL and DAG of an execution engine in the present application;
FIG. 2 is a schematic flowchart of an embodiment of step S13 in FIG. 1;
FIG. 3 is a schematic flow chart diagram of another embodiment of step S13 in FIG. 1;
FIG. 4 is a schematic flow chart illustrating a SQL task partitioning phase in an application scenario of the present application;
FIG. 5 is a flowchart illustrating an embodiment of step S14 in FIG. 1;
FIG. 6 is a schematic flow chart illustrating the establishment of a mapping relationship in an application scenario of the present application;
FIG. 7 is a flowchart illustrating another embodiment of a mapping method for Hive SQL and DAG of an execution engine of the present application;
FIG. 8 is a flowchart illustrating a further embodiment of the method for mapping Hive SQL to DAG of an execution engine according to the present application;
FIG. 9 is a schematic structural diagram of an embodiment of a mapping apparatus of the present application;
FIG. 10 is a schematic block diagram of an embodiment of an electronic device of the present application;
FIG. 11 is a schematic structural diagram of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The following describes in detail the embodiments of the present application with reference to the drawings attached hereto.
In the following description, for purposes of explanation rather than limitation, specific details are set forth such as the particular system architecture, interfaces, techniques, etc., in order to provide a thorough understanding of the present application.
The terms "system" and "network" are often used interchangeably herein. The term "and/or" herein merely describes an association between associated objects and means that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the objects before and after it are in an "or" relationship. Further, the term "plurality" herein means two or more.
The main task of Hive is to translate SQL statements into programs that various physical engines can recognize, so that the user does not need to program the physical engines; how Hive converts an SQL statement into an execution plan of the physical engine is transparent to (that is, hidden from) the user. In most cases, however, the user needs some knowledge of how the SQL executes in the physical engine, and Hive currently provides no direct link between the two. Therefore, the application provides a method for mapping Hive SQL to an execution engine DAG, which establishes the relationship between the execution plan of the physical engine and the SQL.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating an embodiment of a mapping method for Hive SQL and DAG of an execution engine according to the present application. The method for mapping Hive SQL and execution engine DAG in the embodiment comprises the following steps:
Step S11: acquiring the SQL statement to be mapped.
Step S12: performing task parsing and/or division on the SQL statement to be mapped by using a grammar rule to obtain at least one SQL task.
To address the problem that the DAG of the execution engine cannot be mapped directly to specific parts of the SQL, the application provides a DAG at the Hive level (SQL_DAG), through which a user can see clearly how the SQL executes in the physical execution engine. An SQL_DAG (SQL directed acyclic graph, representing the execution plan of a physical engine) describes an SQL statement submitted by a user, with one SQL_DAG for each SQL statement. After the SQL statement to be mapped is acquired, task parsing and/or division can be performed on it by using a grammar rule to obtain at least one SQL task. It can be understood that if the SQL statement to be mapped contains only one task, no task division is required and only task parsing is performed. In an SQL_DAG, one layer of query in the SQL statement is represented by one SQL task (i.e., SQL_JOB). For example, the SQL statement "select count(*) from (select * from ta group by ta_1) tb" has two layers of queries: one SQL_DAG corresponds to the whole SQL statement, and two SQL_JOBs correspond to the two layers of queries, where SQL_JOB_0 represents the sub-query "(select * from ta group by ta_1) tb" and SQL_JOB_1 represents the parent query "select count(*) from tb", in which tb is the result of the sub-query.
In an embodiment, step S12 may specifically include: performing task parsing and/or division on the SQL statement to be mapped by using the grammar rules of an abstract syntax tree.
The SQL_JOBs are divided according to the abstract syntax tree. Therefore, once the AST (Abstract Syntax Tree) has been parsed out of the SQL statement to be mapped, the SQL_JOBs of the SQL_DAG can be divided according to the AST.
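The division of query layers into SQL_JOBs can be sketched as follows. This is a minimal illustration, not Hive's actual AST API: the `QueryNode` class and the `divide_jobs` function are hypothetical names, and the only behavior taken from the description is that an inner query layer receives a lower SQL_JOB number than the query that consumes it.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class QueryNode:
    """Hypothetical stand-in for one query layer found in the AST."""
    text: str
    subqueries: List["QueryNode"] = field(default_factory=list)


def divide_jobs(root: QueryNode) -> List[Tuple[str, str]]:
    """Return (job_name, query_text) pairs, innermost query layer first."""
    jobs: List[Tuple[str, str]] = []

    def visit(node: QueryNode) -> None:
        # Post-order walk: number the sub-queries before the query that uses them.
        for sub in node.subqueries:
            visit(sub)
        jobs.append((f"SQL_JOB_{len(jobs)}", node.text))

    visit(root)
    return jobs


# Illustrative two-layer query: the sub-query produces tb, the parent counts it.
inner = QueryNode("select * from ta group by ta_1")
outer = QueryNode("select count(*) from tb", [inner])
jobs = divide_jobs(outer)
```

Here `jobs` holds one entry per query layer, so a statement without sub-queries yields a single SQL_JOB and needs no division.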
Step S13: performing keyword parsing on each SQL task, and dividing the SQL task into at least one first sub-phase, wherein each keyword corresponds to one first sub-phase.
Specifically, each SQL_JOB of the SQL_DAG is further divided into at least one first sub-phase (i.e., SQL_STAGE) according to its keywords, so that each keyword can correspond to one first sub-phase. For example, "select * from ta group by ta_1" in the SQL statement to be mapped is run as one Spark task and corresponds to two first sub-phases: one is the group by phase and the other is the remaining phase; each first sub-phase corresponds to one keyword.
Step S14: establishing, for each first sub-phase of the SQL statement to be mapped, a mapping relationship with the second sub-phase of the execution engine DAG that executes that first sub-phase.
The execution engine DAG shows the stages in which a specific task is executed, and each such stage is taken as a second sub-phase; that is, the second sub-phases of the execution engine DAG correspond to the first sub-phases of the SQL tasks of the SQL statement to be mapped. A mapping relationship can therefore be established between each first sub-phase of the SQL statement to be mapped and the second sub-phase of the execution engine DAG that executes it. Since each keyword corresponds to one first sub-phase, the keywords of the SQL statement to be mapped can be matched to the execution engine DAG, so that the running status of a specific keyword of the SQL statement can be collected.
According to the scheme, the SQL statement and the DAG of the execution engine are mapped, the execution condition of each stage of the execution engine corresponding to the keyword of the SQL statement can be analyzed visually through the mapping relation, and the efficiency and the accuracy of a user on specific SQL analysis can be improved.
Referring to fig. 2, fig. 2 is a schematic flowchart illustrating an embodiment of step S13 in fig. 1. In an embodiment, the step S13 specifically includes:
Step S131: performing keyword parsing on each SQL task to generate an operation tree corresponding to the SQL statement to be mapped, wherein each node of the operation tree represents a keyword.
Step S132: dividing the SQL statement to be mapped into at least one first sub-phase based on the nodes of the operation tree.
SQL_STAGE division is performed while the AST (Abstract Syntax Tree) is being parsed to generate the OP Tree (Operator Tree), which makes the stage division more accurate. The AST only carries out syntactic and semantic analysis of the SQL statement, and the operations required by the statement cannot be read from it directly. If, on the other hand, stages were divided only after the OP Tree had been produced, part of the keyword information would already be lost, and the keywords corresponding to some operations in the OP Tree could no longer be identified. Dividing stages while the AST is being parsed into the OP Tree gives direct access to the operations required by each keyword in the SQL statement to be mapped; dividing by keyword ensures that no information is lost and that the OP Tree is divided into SQL_STAGEs accurately.
In SQL_STAGE division, the keywords that cause the execution engine DAG to split stages are used as the basis for division. The OP Tree of the SQL statement is divided into several independent SQL_STAGEs named SQL_STAGE_OP, where OP denotes the keyword type. Taking the Spark execution engine as an example, when the SQL statement "select * from t1 group by name order by name" is executed, Spark divides it into three stages at the keywords group by and order by: the first stage reads data, the second performs the group by, and the last performs the order by. When SQL_STAGEs are divided, the three stages are named according to what each does: the first stage, which reads data, becomes SQL_STAGE_FROM; the second becomes SQL_STAGE_GROUPBY; and the last becomes SQL_STAGE_ORDERBY.
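The SQL_STAGE_OP naming rule can be illustrated with a short sketch. It assumes the keywords of one SQL_JOB are already available in execution order; the `STAGE_KEYWORDS` table and the `divide_stages` function are hypothetical, and the set of stage-splitting keywords is limited to the two named in the example.

```python
from typing import List

# Assumption: keywords that make the execution engine start a new stage.
STAGE_KEYWORDS = {"group by": "GROUPBY", "order by": "ORDERBY"}


def divide_stages(keywords: List[str]) -> List[str]:
    """Name the SQL_STAGEs of one SQL_JOB from its keywords in execution order."""
    stages = ["SQL_STAGE_FROM"]  # every job starts with the stage that reads data
    for kw in keywords:
        op = STAGE_KEYWORDS.get(kw.lower())
        if op is not None:
            # A stage-splitting keyword opens a new stage named after it.
            stages.append(f"SQL_STAGE_{op}")
    return stages


# Keywords of "select * from t1 group by name order by name".
stages = divide_stages(["from", "group by", "order by"])
```

Keywords that do not split stages, such as a filter, stay inside the stage that is current when they are reached.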
Referring to fig. 3, fig. 3 is a schematic flowchart illustrating another embodiment of step S13 in fig. 1. In an embodiment, the step S13 specifically includes:
Step S131a: performing keyword parsing on each SQL task to generate an operation tree corresponding to the SQL statement to be mapped, wherein each node of the operation tree represents a keyword.
Step S132a: traversing the nodes of the operation tree in sequence based on the keywords.
Step S133a: if the operation tree omits the nodes corresponding to some of the keywords, adding those nodes to the corresponding positions of the operation tree based on the positions of the keywords in the SQL task.
Step S134a: dividing the SQL statement to be mapped into at least one first sub-phase based on the nodes of the operation tree.
Referring to fig. 4, fig. 4 is a schematic flowchart of the SQL task stage division in an application scenario of the present application. When the AST is parsed out of the SQL statement, SQL_JOBs can be divided from the AST; when the OP Tree is generated by parsing the AST, SQL_STAGEs can be divided. To divide the SQL_STAGEs of the current SQL_JOB, the current SQL_JOB and its root OP (i.e., the root node of the OP Tree, each node of which expresses one execution step) are first obtained, and the SQL_STAGE of the current SQL_JOB is initialized to SQL_STAGE_FROM. The OP Tree of the current SQL_JOB is then traversed, and for each node it is judged whether SQL_STAGE division is needed. If the current node does not require division, it is mapped to the current SQL_STAGE. If the current keyword requires SQL_STAGE division, such as group by or order by, a new SQL_STAGE named SQL_STAGE_OP is created and mapped to the current SQL_JOB and to the preceding SQL_STAGE: the new SQL_STAGE belongs to the current SQL_JOB and depends on the preceding SQL_STAGE. All nodes of the OP Tree of the current SQL_JOB are processed in this way until the current SQL_JOB ends. If any SQL_JOBs remain, the above steps are repeated for them until the division is complete.
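The traversal described for fig. 4 can be sketched as below. For brevity the OP Tree of a job is modeled as a linear list of operators walked from the root, which is a simplifying assumption; `SqlStage`, `divide_job`, `SPLIT_OPS` and the operator labels are hypothetical and do not correspond to Hive class names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Assumption: operator types that require a new SQL_STAGE.
SPLIT_OPS = {"GROUPBY", "ORDERBY"}


@dataclass
class SqlStage:
    name: str
    job: str                   # the SQL_JOB this stage belongs to
    depends_on: Optional[str]  # name of the preceding SQL_STAGE, if any


def divide_job(job: str, ops: List[str]) -> Tuple[List[SqlStage], Dict[int, str]]:
    """Walk the operators of one job and split them into SQL_STAGEs.

    Returns the stages plus a map from operator index to stage name.
    """
    current = SqlStage("SQL_STAGE_FROM", job, None)  # initial stage of the job
    stages = [current]
    op_to_stage: Dict[int, str] = {}
    for i, op in enumerate(ops):
        if op in SPLIT_OPS:
            # New stage: belongs to the current job, depends on the preceding stage.
            current = SqlStage(f"SQL_STAGE_{op}", job, current.name)
            stages.append(current)
        # Every operator is mapped to the stage that is current when it is visited.
        op_to_stage[i] = current.name
    return stages, op_to_stage


stages, op_to_stage = divide_job(
    "SQL_JOB_0", ["TABLESCAN", "SELECT", "GROUPBY", "ORDERBY", "FILESINK"]
)
```

Calling `divide_job` once per SQL_JOB covers the final step of the flow, in which the remaining jobs are divided in the same way.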
Referring to fig. 5, fig. 5 is a schematic flowchart illustrating an embodiment of step S14 in fig. 1. In an embodiment, the step S14 specifically includes:
Step S141: detecting whether a first sub-phase of each SQL task corresponds to a plurality of second sub-phases of the execution engine DAG.
In an embodiment, step S141 may specifically include: if the task amount corresponding to the first sub-phase is larger than a set value, distributing the first sub-phase to a plurality of second sub-phases based on the task amount of the first sub-phase.
Step S142: if one first sub-phase corresponds to a plurality of second sub-phases of the execution engine DAG, adding to that first sub-phase as many lower-level sub-phases as there are second sub-phases, each corresponding to one of the second sub-phases of the execution engine DAG.
After the SQL_DAG divided into SQL_STAGEs has been generated, a mapping relationship needs to be established with the execution engine DAG. The mapping may change because of Hive optimization rules, so that one SQL_STAGE corresponds to n stages of the execution engine DAG (a 1-to-n relationship). Taking the Spark execution engine as an example, the SQL statement "select * from t1 order by name" triggers no rule optimization, so the resulting SQL_STAGEs and the stages of the Spark execution engine DAG are in a 1-to-1 relationship. When skew join optimization is enabled and a certain key value exceeds a set threshold, the Spark execution engine divides the group by into two stages, and the SQL_STAGE_GROUPBY generated from the SQL statement and the Spark execution engine DAG are in a 1-to-2 relationship.
Referring to fig. 6, fig. 6 is a schematic flowchart of establishing the mapping relationship in an application scenario of the present application. During the subsequent rule optimization and the generation of the execution engine DAG, the relationships among SQL_JOB, SQL_STAGE and the OP Tree are retained. When the OP Tree changes, the new OP Tree inherits the original mapping relationship. When an SQL_STAGE changes, the new SQL_STAGE likewise inherits the original mapping relationship, and the mapping relationship can handle both the 1-to-1 and the 1-to-many cases. When a 1-to-many change occurs, sub SQL_STAGEs are added to the changed SQL_STAGE; their number equals the number of new stages, and they are numbered. For example, when skew join optimization is enabled for the SQL statement "select * from t1 group by name order by name", group by becomes two stages in the Spark execution engine DAG, and SQL_STAGE_GROUPBY is given two sub SQL_STAGEs, namely SQL_STAGE_GROUPBY_0 and SQL_STAGE_GROUPBY_1.
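The handling of the 1-to-1 and 1-to-many cases can be sketched as follows. The function name `map_to_engine` and the engine stage identifiers such as "stage-1" are hypothetical; only the numbering scheme for sub SQL_STAGEs is taken from the description.

```python
from typing import Dict, List


def map_to_engine(sql_stage: str, engine_stages: List[str]) -> Dict[str, str]:
    """Map one SQL_STAGE to the engine stages that execute it.

    With a single engine stage the mapping is 1 to 1. With several, one
    numbered sub SQL_STAGE is added per engine stage.
    """
    if len(engine_stages) == 1:
        return {sql_stage: engine_stages[0]}
    return {f"{sql_stage}_{i}": es for i, es in enumerate(engine_stages)}


# No optimization applied: order by runs in one engine stage.
one_to_one = map_to_engine("SQL_STAGE_ORDERBY", ["stage-3"])
# Optimization split group by into two engine stages.
one_to_two = map_to_engine("SQL_STAGE_GROUPBY", ["stage-1", "stage-2"])
```

Keeping the parent stage name as the prefix of each sub-stage means that a lookup by keyword still finds every engine stage that took part in executing it.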
In the present application, the correspondence between the execution engine DAG and the SQL statement is established step by step from the AST and the OP Tree. The process requires no human participation, which reduces the influence of subjective human factors on the analysis of the SQL statement and improves the accuracy and efficiency of that analysis; it also simplifies the analysis of a specific SQL statement and makes the SQL task easier to understand. Because SQL_STAGE division is based on the AST and OP Tree produced by the Hive compilation process, the scheme is extensible: when a user needs to support an additional physical engine, only the processing logic for that engine needs to be added. In addition, the mapping is established from the compilation products while Hive compiles the SQL statement, so no external components are needed and the user incurs no additional Hive maintenance cost.
Referring to fig. 7, fig. 7 is a schematic flowchart illustrating another embodiment of a mapping method for Hive SQL and execution engine DAG according to the present application. The method for mapping Hive SQL and execution engine DAG in the embodiment comprises the following steps:
Step S71: acquiring the SQL statement to be mapped.
Step S72: performing task parsing and/or division on the SQL statement to be mapped by using a grammar rule to obtain at least one SQL task.
Step S73: performing keyword parsing on each SQL task, and dividing the SQL task into at least one first sub-phase, wherein each keyword corresponds to one first sub-phase.
Step S74: establishing, for each first sub-phase of the SQL statement to be mapped, a mapping relationship with the second sub-phase of the execution engine DAG that executes that first sub-phase.
Steps S71 to S74 in this embodiment are substantially the same as steps S11 to S14 in the foregoing embodiment of the mapping method for Hive SQL and execution engine DAG of this application, and are not described herein again.
The difference from the foregoing embodiment is that the mapping method of Hive SQL and execution engine DAG in this embodiment may further include the following steps:
Step S75: receiving an instruction for querying the mapping relationship.
Step S76: displaying the mapping relationship.
After the mapping relationship between each first sub-phase of the SQL statement to be mapped and the second sub-phase of the execution engine DAG that executes it has been established, the mapping relationship can be visualized by displaying it, so that a user can see clearly how the SQL statement to be mapped executes in the physical execution engine.
Referring to fig. 8, fig. 8 is a schematic flowchart illustrating a mapping method of Hive SQL and DAG for an execution engine according to another embodiment of the present application. The mapping method of Hive SQL and execution engine DAG in the embodiment comprises the following steps:
Step S81: acquiring the SQL statement to be mapped.
Step S82: performing task parsing and/or division on the SQL statement to be mapped by using a grammar rule to obtain at least one SQL task.
Step S83: performing keyword parsing on each SQL task, and dividing the SQL task into at least one first sub-phase, wherein each keyword corresponds to one first sub-phase.
Step S84: establishing, for each first sub-phase of the SQL statement to be mapped, a mapping relationship with the second sub-phase of the execution engine DAG that executes that first sub-phase.
Steps S81 to S84 in this embodiment are substantially the same as steps S11 to S14 in the foregoing embodiment of the mapping method of Hive SQL and execution engine DAG in this application, and are not described here again.
The difference from the foregoing embodiment is that the mapping method of Hive SQL and execution engine DAG in this embodiment may further include the following steps:
Step S85: recording the execution time taken by each second sub-phase to execute its corresponding first sub-phase.
Step S86: determining the execution efficiency of each SQL task of the SQL statement to be mapped based on the execution time.
By recording the execution time taken by each second sub-phase to execute its corresponding first sub-phase, the execution of a single SQL task in the SQL statement to be mapped can be determined. This reflects the execution efficiency of that specific SQL task and effectively improves the efficiency and accuracy of its analysis.
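Steps S85 and S86 can be illustrated by rolling engine stage times up through the mapping. This is a sketch under assumptions: the function `job_durations`, the engine stage identifiers and the timing figures are all hypothetical, and the description does not prescribe how execution efficiency is computed from the recorded times.

```python
from collections import defaultdict
from typing import Dict, Tuple


def job_durations(
    stage_seconds: Dict[str, float],
    stage_to_sql: Dict[str, Tuple[str, str]],
) -> Dict[str, Dict[str, float]]:
    """Sum engine stage times per SQL_STAGE within each SQL_JOB."""
    result: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for engine_stage, seconds in stage_seconds.items():
        # The mapping gives the (SQL_JOB, SQL_STAGE) that this engine stage executes.
        job, sql_stage = stage_to_sql[engine_stage]
        result[job][sql_stage] += seconds
    return {job: dict(stages) for job, stages in result.items()}


# Illustrative timings: group by was split across two engine stages.
stage_seconds = {"stage-0": 1.5, "stage-1": 2.0, "stage-2": 3.0}
stage_to_sql = {
    "stage-0": ("SQL_JOB_0", "SQL_STAGE_FROM"),
    "stage-1": ("SQL_JOB_0", "SQL_STAGE_GROUPBY"),
    "stage-2": ("SQL_JOB_0", "SQL_STAGE_GROUPBY"),
}
durations = job_durations(stage_seconds, stage_to_sql)
# The keyword whose stages took longest is a natural starting point for analysis.
slowest = max(durations["SQL_JOB_0"], key=durations["SQL_JOB_0"].get)
```

With the times attributed to keywords in this way, a user can see which part of the SQL dominates the run without reading the engine DAG directly.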
Referring to fig. 9, fig. 9 is a schematic structural diagram of a mapping apparatus according to an embodiment of the present application. The mapping apparatus 90 in this embodiment includes an obtaining module 900, a first parsing and dividing module 902, a second parsing and dividing module 904, and a mapping relationship establishing module 906 that are connected to each other. The obtaining module 900 is configured to acquire an SQL statement to be mapped. The first parsing and dividing module 902 is configured to perform task parsing and/or division on the SQL statement to be mapped by using a grammar rule to obtain at least one SQL task. The second parsing and dividing module 904 is configured to perform keyword parsing on each SQL task and divide the SQL task into at least one first sub-phase, wherein each keyword corresponds to one first sub-phase. The mapping relationship establishing module 906 is configured to establish, for each first sub-phase of the SQL statement to be mapped, a mapping relationship with the second sub-phase of the execution engine DAG that executes that first sub-phase.
In an embodiment, when performing keyword parsing on each SQL task and dividing the SQL task into at least one first sub-phase, the second parsing and dividing module 904 is specifically configured to: perform keyword parsing on each SQL task to generate an operation tree corresponding to the SQL statement to be mapped, where each node of the operation tree represents a keyword; and divide the SQL statement to be mapped into at least one first sub-phase based on the nodes of the operation tree.
In an embodiment, after performing keyword parsing on each SQL task and generating the operation tree corresponding to the SQL statement to be mapped, the second parsing and dividing module 904 is further configured to: traverse the nodes of the operation tree in sequence based on the keywords; and, if the operation tree omits the nodes corresponding to some of the keywords, add those nodes to the operation tree at the positions that correspond to the positions of the keywords in the SQL task.
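As a non-limiting illustration, the traversal and node completion may be sketched as follows. The sketch assumes, for simplicity, that the operation tree is a linear chain held in a list and ordered like the keywords of the SQL task; the `Node` class and its `inferred` flag are hypothetical names introduced here.

```python
from dataclasses import dataclass

@dataclass
class Node:
    keyword: str
    inferred: bool = False  # True when the node was added by the completion pass

def fill_missing_nodes(keywords_in_task, chain):
    """Walk the keywords in order and add a node for each keyword the chain omits."""
    by_keyword = {node.keyword: node for node in chain}
    repaired = []
    for keyword in keywords_in_task:
        node = by_keyword.get(keyword)
        if node is None:
            # The tree omitted this keyword: add a node at the keyword's position.
            node = Node(keyword, inferred=True)
        repaired.append(node)
    return repaired

keywords = ["SELECT", "FROM", "WHERE", "LIMIT"]
chain = [Node("SELECT"), Node("FROM"), Node("LIMIT")]  # WHERE node is missing
repaired = fill_missing_nodes(keywords, chain)
```

After the pass, every keyword of the SQL task has a node, so every keyword can be given a first sub-phase even when the generated tree left some of them out.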
In an embodiment, when performing task parsing and/or division on the SQL statement to be mapped by using syntax rules to obtain at least one SQL task, the first parsing and dividing module 902 is specifically configured to: perform task parsing and/or division on the SQL statement to be mapped by using the syntax rules of an abstract syntax tree.
In an embodiment, when establishing the mapping relationship between each first sub-phase of the SQL statement to be mapped and the second sub-phase in which the execution engine DAG executes that first sub-phase, the mapping relationship establishing module 906 is specifically configured to: detect whether a first sub-phase of any SQL task corresponds to a plurality of second sub-phases of the execution engine DAG; and, if one first sub-phase corresponds to a plurality of second sub-phases of the execution engine DAG, add to that first sub-phase lower-order sub-phases equal in number to the second sub-phases, the lower-order sub-phases respectively corresponding to the second sub-phases of the execution engine DAG.
In an embodiment, when detecting whether the first sub-phase of each SQL task corresponds to a plurality of second sub-phases of the execution engine DAG, the mapping relationship establishing module 906 may specifically be configured to: if the task amount corresponding to a first sub-phase is larger than a set value, distribute the first sub-phase to a plurality of second sub-phases based on the task amount of the first sub-phase.
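As a non-limiting illustration, the distribution of a first sub-phase over several second sub-phases and the addition of matching lower-order sub-phases may be sketched as follows. The rule used to pick the number of second sub-phases (task amount divided by the set value, rounded up), the unit of the task amount, and the naming scheme `FROM.1`, `Map 1` are assumptions of this sketch; the embodiment only requires that the numbers of lower-order sub-phases and second sub-phases match.

```python
import math

def map_phase(phase, task_amount, set_value, stage_prefix):
    """Return {first or lower-order sub-phase: second sub-phase} for one sub-phase."""
    if task_amount <= set_value:
        # Small enough: one first sub-phase maps to one second sub-phase.
        return {phase: f"{stage_prefix} 1"}
    # Assumed sizing rule: enough second sub-phases so none exceeds the set value.
    count = math.ceil(task_amount / set_value)
    # Add one lower-order sub-phase per second sub-phase, mapped one to one.
    return {f"{phase}.{i}": f"{stage_prefix} {i}" for i in range(1, count + 1)}

large = map_phase("FROM", task_amount=250, set_value=100, stage_prefix="Map")
small = map_phase("WHERE", task_amount=40, set_value=100, stage_prefix="Map")
```

Here the FROM sub-phase exceeds the set value and is given three lower-order sub-phases, one per second sub-phase, while the WHERE sub-phase keeps a single one-to-one mapping.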
In an embodiment, the mapping apparatus 90 further includes a display module (not shown) for displaying the mapping relationship after receiving an instruction for querying the mapping relationship.
In an embodiment, the mapping apparatus 90 further includes a determining module (not shown), where the determining module is configured to record the execution time that each second sub-phase takes to execute its corresponding first sub-phase, and to determine, based on the execution time, the execution efficiency of each SQL task of the SQL statement to be mapped.
Referring to FIG. 10, FIG. 10 is a schematic structural diagram of an embodiment of an electronic device of the present application. The electronic device 100 in the present embodiment includes a processor 102 and a memory 101 connected to each other; the memory 101 is used for storing program instructions, and the processor 102 is used for executing the program instructions stored in the memory 101, so as to implement the steps of any one of the above-mentioned embodiments of the method for mapping Hive SQL and execution engine DAG. In a specific implementation scenario, the electronic device 100 may include, but is not limited to, a microcomputer or a server.
In particular, the processor 102 is configured to control itself and the memory 101 to implement the steps of any of the above-described embodiments of the method for mapping Hive SQL and execution engine DAG. The processor 102 may also be referred to as a CPU (Central Processing Unit). The processor 102 may be an integrated circuit chip having signal processing capabilities. The processor 102 may also be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor or any conventional processor. Additionally, the processor 102 may be jointly implemented by a plurality of integrated circuit chips.
Referring to fig. 11, fig. 11 is a schematic structural diagram of an embodiment of a computer-readable storage medium according to the present application. The computer readable storage medium 110 of the present application has stored thereon program instructions 1100, and when executed by a processor, the program instructions 1100 implement the steps in any of the above-described embodiments of the method for mapping Hive SQL to execution engine DAG.
The computer-readable storage medium 110 may be a medium that can store the program instructions 1100, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk. It may also be a server that stores the program instructions 1100; the server may transmit the stored program instructions 1100 to another device for execution, or may execute the stored program instructions 1100 itself.
In the several embodiments provided in the present application, it should be understood that the disclosed method, apparatus, and device may be implemented in other ways. The above-described apparatus and device embodiments are merely illustrative. For example, the division into modules or units is merely a logical division, and an actual implementation may divide them differently: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in an electrical, mechanical, or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

Claims (8)

1. A method for mapping Hive SQL and an execution engine DAG, characterized in that the mapping method comprises:
acquiring an SQL statement to be mapped;
performing task parsing and/or division on the SQL statement to be mapped by using syntax rules to obtain at least one SQL task;
performing keyword parsing on each SQL task, and dividing the SQL task into at least one first sub-phase, wherein each keyword corresponds to one first sub-phase;
establishing a mapping relationship between each first sub-phase of the SQL statement to be mapped and the second sub-phase in which the execution engine DAG executes that first sub-phase;
wherein the step of establishing the mapping relationship between each first sub-phase and the second sub-phase in which the execution engine DAG executes that first sub-phase comprises:
if the task amount corresponding to a first sub-phase is larger than a set value, distributing the first sub-phase to a plurality of second sub-phases based on the task amount of the first sub-phase; and
if one first sub-phase corresponds to a plurality of second sub-phases of the execution engine DAG, adding to the first sub-phase lower-order sub-phases equal in number to the second sub-phases, the lower-order sub-phases respectively corresponding to the second sub-phases of the execution engine DAG.
2. The mapping method according to claim 1, wherein the step of performing keyword parsing on each SQL task and dividing the SQL task into at least one first sub-phase comprises:
performing keyword analysis on each SQL task to generate an operation tree corresponding to the SQL statement to be mapped; each node of the operation tree represents a keyword;
and dividing the SQL statement to be mapped into at least one first sub-phase based on each node of the operation tree.
3. The mapping method according to claim 2, wherein after the step of performing keyword analysis on each SQL task to generate the operation tree corresponding to the SQL statement to be mapped, the method further comprises:
traversing the nodes of the operation tree in sequence based on the keywords;
and if the operation tree omits the nodes corresponding to some of the keywords, adding those nodes to the operation tree at the positions that correspond to the positions of the keywords in the SQL task.
4. The mapping method according to claim 1, wherein the step of performing task parsing and/or division on the SQL statement to be mapped by using syntax rules to obtain at least one SQL task comprises:
performing task parsing and/or division on the SQL statement to be mapped by using the syntax rules of an abstract syntax tree.
5. The mapping method according to claim 1, wherein the mapping method further comprises:
receiving an instruction for inquiring the mapping relation;
and displaying the mapping relation.
6. The mapping method according to claim 1, wherein the mapping method further comprises:
recording the execution time that each second sub-phase takes to execute its corresponding first sub-phase;
and determining the execution efficiency of each SQL task of the SQL statement to be mapped based on the execution time.
7. An electronic device, characterized in that the electronic device comprises a processor and a memory connected to each other;
the memory is configured to store program instructions, and the processor is configured to execute the program instructions to implement the method for mapping Hive SQL and execution engine DAG according to any one of claims 1 to 6.
8. A computer readable storage medium having stored thereon program instructions which, when executed by a processor, implement the method of mapping Hive SQL and execution engine DAG according to any of claims 1 to 6.
CN202211538051.0A 2022-12-02 2022-12-02 Method, equipment and storage medium for mapping Hive SQL (structured query language) and execution engine DAG (directed acyclic graph) Active CN115563150B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211538051.0A CN115563150B (en) 2022-12-02 2022-12-02 Method, equipment and storage medium for mapping Hive SQL (structured query language) and execution engine DAG (directed acyclic graph)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211538051.0A CN115563150B (en) 2022-12-02 2022-12-02 Method, equipment and storage medium for mapping Hive SQL (structured query language) and execution engine DAG (directed acyclic graph)

Publications (2)

Publication Number Publication Date
CN115563150A CN115563150A (en) 2023-01-03
CN115563150B true CN115563150B (en) 2023-04-18

Family

ID=84770368

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211538051.0A Active CN115563150B (en) 2022-12-02 2022-12-02 Method, equipment and storage medium for mapping Hive SQL (structured query language) and execution engine DAG (directed acyclic graph)

Country Status (1)

Country Link
CN (1) CN115563150B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291990A (en) * 2020-02-04 2020-06-16 浙江大华技术股份有限公司 Quality monitoring processing method and device
CN113204571A (en) * 2021-04-23 2021-08-03 新华三大数据技术有限公司 SQL execution method and device related to write-in operation and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105786808B (en) * 2014-12-15 2019-06-18 阿里巴巴集团控股有限公司 A kind of method and apparatus for distributed execution relationship type computations
CN111723249A (en) * 2020-05-22 2020-09-29 上海明略人工智能(集团)有限公司 Method and device for realizing data processing, computer storage medium and terminal
CN112256721B (en) * 2020-10-21 2021-08-17 平安科技(深圳)有限公司 SQL statement parsing method, system, computer device and storage medium

Also Published As

Publication number Publication date
CN115563150A (en) 2023-01-03

Similar Documents

Publication Publication Date Title
CN111522816B (en) Data processing method, device, terminal and medium based on database engine
US20220091827A1 (en) Pruning Engine
EP3080721B1 (en) Query techniques and ranking results for knowledge-based matching
US9703830B2 (en) Translation of a SPARQL query to a SQL query
CN110795455A (en) Dependency relationship analysis method, electronic device, computer device and readable storage medium
US8417690B2 (en) Automatically avoiding unconstrained cartesian product joins
US9892191B2 (en) Complex query handling
CN110502227B (en) Code complement method and device, storage medium and electronic equipment
US11893011B1 (en) Data query method and system, heterogeneous acceleration platform, and storage medium
CN109471889B (en) Report accelerating method, system, computer equipment and storage medium
CN108710662B (en) Language conversion method and device, storage medium, data query system and method
CN113672628A (en) Data blood margin analysis method, terminal device and medium
CN114492264B (en) Gate-level circuit translation method, system, storage medium and equipment
CN108959454B (en) Prompting clause specifying method, device, equipment and storage medium
CN110580170B (en) Method and device for identifying software performance risk
CN115563150B (en) Method, equipment and storage medium for mapping Hive SQL (structured query language) and execution engine DAG (directed acyclic graph)
CN114638184B (en) Gate-level circuit simulation method, system, storage medium and equipment
RU2393536C2 (en) Method of unified semantic processing of information, which provides for, within limits of single formal model, presentation, control of semantic accuracy, search and identification of objects description
CN112783758B (en) Test case library and feature library generation method, device and storage medium
CN114547083A (en) Data processing method and device and electronic equipment
CN114398426A (en) SQL statement global variable attribute query method and tool
CN114090558A (en) Data quality management method and device for database
CN115237936B (en) Method, device, storage medium and equipment for detecting fields in SQL (structured query language) statement
CN112307050B (en) Identification method and device for repeated correlation calculation and computer system
CN114880351B (en) Recognition method and device of slow query statement, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant