CN113032362A

CN113032362A - Data blood margin analysis method and device, electronic equipment and storage medium

Info

Publication number: CN113032362A
Application number: CN202110292110.XA
Authority: CN
Inventors: 于泽; 陈颖; 林义明; 郭酉晨; 解翔
Original assignee: Guangzhou Huya Technology Co Ltd
Current assignee: Guangzhou Huya Technology Co Ltd
Priority date: 2021-03-18
Filing date: 2021-03-18
Publication date: 2021-06-25
Anticipated expiration: 2041-03-18
Also published as: CN113032362B

Abstract

The application discloses a data blood relationship (data lineage) analysis method and device, an electronic device, and a storage medium. The data blood relationship analysis method comprises: acquiring an SQL statement to be processed, and converting the SQL statement into an abstract syntax tree; acquiring parsing rules and parsing strategies for the abstract syntax tree, wherein each of the multiple types of data corresponding to the SQL statement corresponds to one parsing rule, and each parsing rule comprises different parsing strategies for nodes at different levels of the abstract syntax tree; traversing the nodes at multiple levels of the abstract syntax tree based on the parsing rules and parsing strategies; converting the abstract syntax tree into a logical execution plan using the nodes at the multiple levels; and iteratively analyzing the logical execution plan to obtain the blood relationship between the data corresponding to the SQL statement. In this way, the blood relationship between data in the data processing process can be obtained, and the cost of data screening is reduced.

Description

Data blood margin analysis method and device, electronic equipment and storage medium

Technical Field

The present application relates to the field of data processing technologies, and in particular, to a data blood relationship analysis method, apparatus, electronic device, and storage medium.

Background

With the advent of the data explosion age, data has become an important asset for both individuals and businesses. During data construction and management, blood relationships (lineage) arise between data as the data is produced, processed and circulated, and these relationships become more complicated as the data volume keeps growing. When a data quality problem appears, the blood relationship between data at different levels cannot be obtained, so the data has to be screened layer by layer, which greatly increases the cost of data screening. In view of this, how to obtain the blood relationship between data in the data processing process and reduce the cost of data screening has become a problem to be solved urgently.

Disclosure of Invention

The technical problem mainly solved by the application is to provide a data blood relationship analysis method, a data blood relationship analysis device, electronic equipment and a storage medium, which can acquire the blood relationship among data in a data processing process and reduce the cost of data screening.

In order to solve the above technical problem, a first aspect of the present application provides a data blood margin analysis method, including: acquiring an SQL statement to be processed, and converting the SQL statement into an abstract syntax tree; acquiring a parsing rule and a parsing strategy of the abstract syntax tree; the SQL statement comprises a plurality of types of data, wherein the plurality of types of data corresponding to the SQL statement respectively correspond to one type of analysis rule, and the analysis rule comprises different analysis strategies aiming at nodes of different levels of the abstract syntax tree; traversing nodes of a plurality of levels of the abstract syntax tree based on parsing rules and the parsing policy of the abstract syntax tree; converting the abstract syntax tree into a logical execution plan using the plurality of levels of nodes; and carrying out iterative analysis on the logic execution plan to obtain the blood relationship between the data corresponding to the SQL statement.

In order to solve the above technical problem, a second aspect of the present application provides a data blood margin analysis device, including: a processing module, configured to acquire an SQL statement to be processed and convert the SQL statement into an abstract syntax tree; an acquisition module, configured to acquire the parsing rules and parsing strategies of the abstract syntax tree, wherein each of the multiple types of data corresponding to the SQL statement corresponds to one parsing rule, and each parsing rule comprises different parsing strategies for nodes at different levels of the abstract syntax tree; a search module, configured to traverse the nodes at multiple levels of the abstract syntax tree based on the parsing rules and parsing strategies of the abstract syntax tree; a conversion module, configured to convert the abstract syntax tree into a logical execution plan using the nodes at the multiple levels; and an analysis module, configured to iteratively analyze the logical execution plan to obtain the blood relationship between the data corresponding to the SQL statement.

In order to solve the above technical problem, a third aspect of the present application provides an electronic device, including a memory and a processor, which are coupled to each other, wherein the memory stores program instructions, and the processor is configured to execute the program instructions stored in the memory to implement the data blood margin analysis method of the first aspect.

In order to solve the above technical problem, a fourth aspect of the present application provides a computer-readable storage medium, on which program instructions are stored, and the program instructions, when executed by a processor, implement the data blood margin analysis method of the first aspect.

The beneficial effects of the present application are as follows. The SQL statement is converted into an abstract syntax tree, and the parsing rules and parsing strategies of the abstract syntax tree are acquired, where each type of data corresponds to one parsing rule and nodes at different levels of the abstract syntax tree correspond to the parsing strategies of the respective levels. Based on the parsing rules and the parsing strategies for the different nodes, the nodes at multiple levels of the abstract syntax tree are traversed to obtain a logical execution plan, and the logical execution plan is iteratively analyzed to extract the blood relationship between the data corresponding to the SQL statement. The SQL statement to be processed is thus converted and analyzed to obtain the blood relationship between the data, which improves efficiency when screening data at different levels and reduces the cost of data screening.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts. Wherein:

FIG. 1 is a schematic flow chart diagram illustrating an embodiment of a data blood relationship analysis method of the present application;

FIG. 2 is a schematic flow chart diagram illustrating another embodiment of a data blood relationship analysis method of the present application;

FIG. 3 is a flowchart illustrating an embodiment corresponding to step S207 in FIG. 2;

FIG. 4 is a block diagram of an embodiment of a data blood margin analysis device according to the present application;

FIG. 5 is a block diagram of an embodiment of an electronic device of the present application;

FIG. 6 is a block diagram of an embodiment of a computer-readable storage medium of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The terms "system" and "network" are often used interchangeably herein. The term "and/or" herein merely describes an association between associated objects and means that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the former and latter associated objects are in an "or" relationship. Further, the term "plurality" herein means two or more.

Referring to fig. 1, fig. 1 is a schematic flow chart of an embodiment of a data blood relationship analysis method according to the present application. Specifically, the method may include the steps of:

Step S101: acquiring the SQL statement to be processed, and converting the SQL statement into an abstract syntax tree.

Specifically, a source code of the SQL statement is obtained, the character stream of the source code is read in sequentially, and the source code is parsed to obtain an abstract syntax tree.

In an application mode, a source code corresponding to the SQL statement to be processed is acquired, and the source code is segmented based on the end symbols of the source code to obtain multiple lines of code corresponding to the source code; redundant characters in the multiple lines of code are removed to obtain processed multi-line code; and a syntax analyzer performs semantic analysis on the processed multi-line code to obtain the abstract syntax tree.

Specifically, the source code of an SQL statement written by a developer is obtained, and the source code is segmented into multiple lines of code using the semicolon end symbol as the segmentation point, forming a code structure corresponding to the source code. Redundant line feed symbols, index symbols and annotation statements in the multiple lines of code are removed to obtain the processed multi-line code, and SQL semantic analysis is performed on the processed multi-line code line by line to obtain the abstract syntax tree. Removing the redundant characters improves the efficiency of the semantic analysis, reduces the influence of redundant characters on the semantic analysis, and improves its accuracy, so that the resulting abstract syntax tree matches the SQL statement to be processed more closely.
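The segmentation and cleaning described above can be sketched as follows. This is a minimal illustration only, not the implementation of the present application: the function name split_and_clean is hypothetical, and the sketch assumes that semicolons and comment markers never occur inside string literals.

```python
import re
from typing import List


def split_and_clean(sql_source: str) -> List[str]:
    """Split SQL source on semicolons and strip comments and extra whitespace.

    Assumption: ';', '--' and '/*' do not appear inside quoted strings.
    """
    # Remove /* ... */ block comments, then -- line comments.
    no_block = re.sub(r"/\*.*?\*/", " ", sql_source, flags=re.DOTALL)
    no_line = re.sub(r"--[^\n]*", " ", no_block)
    statements = []
    for piece in no_line.split(";"):
        # Collapse line feeds, tabs and repeated spaces into single spaces.
        collapsed = " ".join(piece.split())
        if collapsed:
            statements.append(collapsed)
    return statements
```

Each returned string is one cleaned statement that could then be handed to a parser.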

Step S102: acquiring the parsing rules and parsing strategies of the abstract syntax tree, wherein each of the multiple types of data corresponding to the SQL statement corresponds to one parsing rule, and each parsing rule comprises different parsing strategies for nodes at different levels of the abstract syntax tree.

Specifically, multiple types of data corresponding to the SQL statement respectively correspond to one parsing rule, and the parsing rule includes different parsing strategies for nodes of different levels.

Further, the predefined parsing rules and parsing strategies are acquired. For types of data such as table objects, field objects, aggregation functions, sub-queries, grouping functions and alias references, a table object parser, a field object parser, an aggregation function parser, a sub-query parser, a grouping function parser and an alias reference parser are defined as the preset parsing rules. Within the parsing rule of each type of data, a case conversion parsing strategy, a sorting and grouping parsing strategy, an alias parsing strategy and an aggregation calculation parsing strategy are preset for the nodes at each level, so that the parsing requirements of different types of data and of nodes at different levels in the abstract syntax tree are met and the parsing accuracy is improved.
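One way to picture the relationship between parsing rules (one per data type) and parsing strategies (one per node level) is a two-level lookup table. The sketch below is an assumption made for illustration: the names PARSING_RULES, apply_rule, case_conversion and alias_strategy, and the level labels "leaf" and "unary", are hypothetical and do not come from the present application.

```python
from typing import Callable, Dict

Strategy = Callable[[str], str]


def case_conversion(text: str) -> str:
    """Case conversion strategy: normalise identifiers to lower case."""
    return text.lower()


def alias_strategy(text: str) -> str:
    """Alias strategy: drop a trailing 'AS alias' and keep the source expression."""
    idx = text.lower().rfind(" as ")
    return text[:idx] if idx != -1 else text


# One parsing rule per data type; each rule maps a node level to a strategy.
PARSING_RULES: Dict[str, Dict[str, Strategy]] = {
    "table": {"leaf": case_conversion},
    "field": {"leaf": case_conversion, "unary": alias_strategy},
}


def apply_rule(data_type: str, level: str, text: str) -> str:
    """Apply the strategy registered for (data_type, level); pass through if none."""
    strategy = PARSING_RULES.get(data_type, {}).get(level)
    return strategy(text) if strategy else text
```

For example, apply_rule("field", "unary", "price AS p") selects the alias strategy of the field rule, while an unregistered data type is returned unchanged.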

Step S103: traversing nodes of a plurality of levels of the abstract syntax tree based on parsing rules and parsing policies of the abstract syntax tree.

Specifically, parsing rules and parsing strategies for a plurality of hierarchical nodes of the abstract syntax tree are predefined, after the parsing rules and the parsing strategies of the abstract syntax tree are obtained, the nodes of the abstract syntax tree are analyzed based on the parsing rules and the parsing strategies of the abstract syntax tree, and all nodes of the abstract syntax tree are traversed from nodes at a high hierarchy to nodes at a low hierarchy.

In an application mode, the abstract syntax tree comprises, from high level to low level, unary nodes, binary nodes and leaf nodes. Corresponding parsing rules are set for the table objects and field objects in the abstract syntax tree, and corresponding parsing strategies are set for objects of the same type at the unary nodes, binary nodes and leaf nodes. Based on the preset parsing rules and parsing strategies, the objects of different types in the abstract syntax tree are parsed at the unary nodes, binary nodes and leaf nodes to obtain the nodes at multiple levels.

Step S104: the abstract syntax tree is converted into a logical execution plan using a plurality of levels of nodes.

Specifically, the nodes of the multiple hierarchies are converted to obtain a logic execution plan based on the nodes of the multiple hierarchies after the resolution.

In an application mode, the nodes of the abstract syntax tree are traversed, and the steps of applying optimizer strategies, scanning data files and generating a physical execution plan are omitted; the logical execution plan is obtained based on the parsed nodes at multiple levels, which facilitates the subsequent analysis and iteration.

Step S105: iteratively analyzing the logical execution plan to obtain the blood relationship between the data corresponding to the SQL statement.

Specifically, after the SQL statement is converted into the logical execution plan, the logical execution plan is analyzed iteratively. Nodes at multiple levels are encountered during the iteration, the node at the highest level is reached recursively, and the inheritance relationships between data during the iteration are recorded, so that the blood relationship between the data corresponding to the SQL statement is extracted.

In an application mode, the logical execution plan is analyzed iteratively, and leaf nodes, binary nodes and unary nodes are encountered during the iteration. The inheritance relationships during the iteration are recorded so that the data information and the processing logic of the data can be read, for example, which objects the data in a given field of the current table comes from, and whether the calculation includes aggregation, deduplication and the like. The logical execution plan is traversed depth-first, so that the purpose of data blood relationship analysis is finally achieved.
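The recursive, depth-first iteration over leaf, unary and binary plan nodes can be illustrated with a small sketch. The PlanNode structure and the column_lineage function are hypothetical and serve only to show how inheritance relationships may be recorded; the sketch assumes column names are unique across the joined tables.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class PlanNode:
    kind: str                    # "scan" (leaf), "aggregate"/"project" (unary), "join" (binary)
    table: Optional[str] = None  # only set for leaf scans
    # Output column -> input columns it is computed from (empty lists for scans).
    outputs: Dict[str, List[str]] = field(default_factory=dict)
    children: List["PlanNode"] = field(default_factory=list)


def column_lineage(node: PlanNode) -> Dict[str, Set[str]]:
    """Map each output column of `node` to the source 'table.column' names."""
    if node.kind == "scan":
        return {col: {f"{node.table}.{col}"} for col in node.outputs}
    # Recurse into the children first (depth-first) and merge what they expose.
    available: Dict[str, Set[str]] = {}
    for child in node.children:
        for col, sources in column_lineage(child).items():
            available.setdefault(col, set()).update(sources)
    if not node.outputs:  # e.g. a join that passes all columns through
        return available
    result: Dict[str, Set[str]] = {}
    for out_col, inputs in node.outputs.items():
        sources: Set[str] = set()
        for col in inputs:
            sources |= available.get(col, set())
        result[out_col] = sources
    return result
```

Calling column_lineage on the root of a plan returns, for every result field, the set of source table fields it inherits from.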

According to the scheme, the SQL sentences are converted into the abstract syntax tree, the parsing rules and the parsing strategies of the abstract syntax tree are obtained, each type of data corresponds to one parsing rule, nodes of different levels of the abstract syntax tree correspond to the parsing strategies of all levels, the nodes of multiple levels of the abstract syntax tree are traversed to obtain a logic execution plan based on the parsing rules and the parsing strategies of the different nodes of the abstract syntax tree, and the logic execution plan is subjected to iterative analysis so as to extract the blood relation between the data corresponding to the SQL sentences. Therefore, the SQL sentences to be processed are converted and analyzed to obtain the blood relationship among the data, so that the efficiency can be improved when the data of different levels are screened, and the cost of data screening is reduced.

Referring to fig. 2, fig. 2 is a schematic flow chart of another embodiment of the data blood relationship analysis method of the present application. Specifically, the method may include the steps of:

Step S201: acquiring the SQL statement to be processed, and converting the SQL statement into an abstract syntax tree.

Specifically, the SQL statement is parsed by the parser to avoid scanning the original file, and the SQL statement is parsed line by line to convert the SQL statement into an abstract syntax tree.

In an application mode, an ANTLR4 parser (ANTLR stands for ANother Tool for Language Recognition) is used for lexical and syntactic analysis. The source code of the structured query language is obtained, the character stream of the source code is read in sequentially, and the lexemes in the character stream are recognized. Each lexeme is mapped to a token, and each token is mapped to one type in the grammar of the structured query language. Syntactic analysis is then performed on all the tokens, and the tokens are combined to generate the abstract syntax tree.
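The lexical stage, in which lexemes are recognized in the character stream and mapped to typed tokens, can be illustrated with a hand-written tokenizer. This sketch does not use ANTLR4 and is not the grammar of the present application; the token types and the tiny keyword list are assumptions chosen for illustration.

```python
import re
from typing import List, Tuple

# Hypothetical token types; a real SQL grammar defines many more.
TOKEN_SPEC = [
    ("STRING", r"'[^']*'"),
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("KEYWORD", r"\b(?:select|from|where|as|and|or|join|on|group|by)\b"),
    ("IDENT", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("OP", r"<>|!=|<=|>=|[=<>*,.()]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.IGNORECASE,
)


def tokenize(sql: str) -> List[Tuple[str, str]]:
    """Read the character stream left to right and emit (token type, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(sql):
        match = MASTER.match(sql, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}: {sql[pos]!r}")
        if match.lastgroup != "SKIP":  # whitespace carries no meaning here
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

A parser would then combine the token sequence into a tree according to the grammar rules.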

Step S202: acquiring the parsing rules and the parsing strategies.

Specifically, each of the multiple types of data corresponding to the SQL statement corresponds to one parsing rule, each parsing rule includes different parsing strategies for nodes at different levels, and the parsing strategies are compatible with the grammar rules of multiple types of SQL statements. The parsing rules and parsing strategies are backward compatible with other SQL grammar rules such as Presto. Parsing strategies compatible with multiple grammar rules are preset for the different grammar rules, so as to meet the parsing requirements of different types of data under the various grammar rules and to improve compatibility when the abstract syntax tree is parsed.

Step S203: acquiring a user-defined custom function and a mapping cache corresponding to the custom function, wherein the mapping cache corresponds to constant data of the custom function.

Specifically, the user-defined custom function is acquired and a mapping cache corresponding to it is established: based on the basic information from the initialization stage, the external program code required by the custom function is loaded dynamically into a ClassLoader, the registration of the custom function is completed, and the corresponding mapping cache is established, so that the custom function can be used conveniently in subsequent steps.

In an application mode, before the step of acquiring the user-defined custom function and the mapping cache corresponding to the custom function, the method includes: acquiring metadata information from an external storage and caching it, wherein the metadata information includes custom functions. Further, the step of acquiring the user-defined custom function and the mapping cache corresponding to the custom function includes: acquiring a plurality of custom functions from the external storage and/or the SQL statement, and establishing a corresponding mapping cache for each of the plurality of custom functions.

Specifically, metadata information such as database names, table names and custom functions is stored in the external storage, and after partial metadata information is cached, the extra overhead of acquiring metadata in subsequent steps can be reduced, and the data processing efficiency is improved.

Further, the external storage element stores a directory of metadata information, a desired custom function can be obtained from the directory of the metadata information, or the custom function desired by a user is created through an SQL statement, and the source and destination of constant data of the custom function are respectively mapped with the custom function to obtain a mapping cache corresponding to the custom function, which is convenient for calling of subsequent steps.

Step S204: traversing the nodes at multiple levels of the abstract syntax tree based on the mapping cache corresponding to the custom function, the parsing rules, and the parsing strategies for nodes at different levels.

Specifically, the abstract syntax tree is analyzed in depth by combining the predefined parsing rules, parsing strategies and custom functions, so as to traverse the nodes at multiple levels of the abstract syntax tree. For data with a predefined parsing rule, the parsing rule of the corresponding type is used, with the corresponding parsing strategy for the nodes at each level; for data without a predefined parsing rule, the custom function is acquired to parse the data. This meets the parsing requirements of the multiple types of data in the abstract syntax tree and improves the traversal efficiency and parsing accuracy.

Step S205: obtaining the objectified elements corresponding to the abstract syntax tree using the nodes at the multiple levels.

Specifically, the parsed nodes at the multiple levels are obtained, and the parsed abstract syntax tree is objectified and converted into sub-query objects, table scan objects and associated query objects. The converted objectified elements can then be operated on directly in the subsequent steps instead of operating directly on the syntax tree objects, which reduces the processing burden of the system and improves the processing efficiency.
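The objectification step can be pictured as converting generic tree nodes into typed objects. In the sketch below, the nested-dictionary input format and the class names TableScan, SubQuery and JoinQuery are assumptions made for illustration; the present application does not specify these structures.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class TableScan:
    name: str


@dataclass
class SubQuery:
    alias: str
    source: "Element"


@dataclass
class JoinQuery:  # stands in for an "associated query object"
    left: "Element"
    right: "Element"
    condition: str


Element = Union[TableScan, SubQuery, JoinQuery]


def objectify(node: dict) -> Element:
    """Convert a parsed tree node (hypothetical dict form) into an objectified element."""
    kind = node["type"]
    if kind == "table":
        return TableScan(node["name"])
    if kind == "subquery":
        return SubQuery(node["alias"], objectify(node["source"]))
    if kind == "join":
        return JoinQuery(objectify(node["left"]), objectify(node["right"]), node["on"])
    raise ValueError(f"unsupported node type: {kind}")
```

Later steps can then work with these typed objects rather than with raw syntax tree nodes.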

Step S206: a logical execution plan is generated based on the objectified elements.

Specifically, the objectified elements are converted into a logical execution plan. When the parser corresponding to any type of data recognizes a logical execution plan, that parser stops parsing, and the parsers corresponding to the other types of data continue the deep traversal of the abstract syntax tree until every type of data has been converted into objectified elements and the logical execution plan has been generated.

Step S207: iteratively analyzing the objectified elements corresponding to the logical execution plan to obtain, during the iterative analysis, the blood relationship among the multiple types of data corresponding to the SQL statement.

Specifically, data extraction and analysis are performed based on the objectified elements corresponding to the logic execution plan, and processing conditions corresponding to the input/output table, the fields and the partitions are analyzed to obtain relationships among the data tables, the fields and the partitions, so that the blood relationship among the data in the data processing process is obtained, and the cost of data screening is reduced.

In a specific application scenario, please refer to fig. 3, and fig. 3 is a flowchart illustrating an embodiment corresponding to step S207 in fig. 2. Specifically, the method may include the steps of:

Step S301: extracting information from the objectified element to obtain basic information in the objectified element.

Specifically, information is extracted from the objectified elements, which include query objects and execution command objects, and the objectified elements are analyzed iteratively. A query object contains the select and with keywords, and an execution command object contains the create and add keywords. For a query object, the table name, field name, remarks, partition information and calculation mode are extracted through the bound directory information, and during iteration the unique hash value of the query object is recorded and used as the parent node code of the next element. For an execution command object, the corresponding operation is executed, such as switching the default database or creating a custom function. When the iterative analysis reaches an element that has no lower-level objectified element, the iterative analysis ends and the basic information is obtained.

Step S302: and performing constant identification on the objectified element to obtain constant information in the objectified element.

Specifically, the constant information written in the query expression, the filtering condition expression and the grouping expression in the objectified element is analyzed to obtain the constant information in the objectified element.

In an application scenario, for the expression select '1' as f1 from xx where dt = '2020-01-01', the analysis result contains the constant value '1' for the f1 field and the constant value '2020-01-01' for the dt field, so that the constant identification result is available in subsequent steps.
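The constant identification in this scenario can be sketched with two regular expressions. This is a simplification for illustration only: the present application analyzes objectified elements rather than raw text, and the function name constant_fields is hypothetical.

```python
import re
from typing import Dict


def constant_fields(sql: str) -> Dict[str, str]:
    """Collect fields that are bound to string constants in a simple query."""
    constants: Dict[str, str] = {}
    # Query expression: 'literal' AS alias
    for value, alias in re.findall(r"'([^']*)'\s+as\s+(\w+)", sql, flags=re.IGNORECASE):
        constants[alias] = value
    # Filter condition expression: column = 'literal'
    for column, value in re.findall(r"([\w.]+)\s*=\s*'([^']*)'", sql):
        constants[column] = value
    return constants
```

Applied to the expression above, the sketch yields one entry for f1 and one for dt.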

Step S303: performing filter condition analysis on the objectified element to obtain filtering limitation information in the objectified element.

Specifically, the filtering limitation conditions contained in single tables, multiple tables and sub-query modules in the objectified element are analyzed to extract information, the field sources are recorded, and the filtering limitation relationships are recorded. The filtering limitation relationships at least include greater-than, equal-to, less-than and not-equal-to restrictions, as well as IF conditions. Filter condition inference is then performed according to the constant identification result to obtain the filtering limitation information.

In an application scenario, if the restriction condition on table A is dt = '2020-01-01' and the condition for associating table A with table B is A.dt = B.dt, it can be inferred that B.dt = '2020-01-01'.
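The inference in this scenario amounts to propagating known constants across equality conditions until nothing new can be derived. The following sketch illustrates that idea; the function name infer_constants and the input shapes are assumptions, not part of the present application.

```python
from typing import Dict, List, Tuple


def infer_constants(
    constants: Dict[str, str],
    equalities: List[Tuple[str, str]],
) -> Dict[str, str]:
    """Propagate constants across 'x = y' conditions until a fixed point is reached."""
    known = dict(constants)
    changed = True
    while changed:
        changed = False
        for left, right in equalities:
            # An equality works in both directions.
            for src, dst in ((left, right), (right, left)):
                if src in known and dst not in known:
                    known[dst] = known[src]
                    changed = True
    return known
```

With constants {"A.dt": "2020-01-01"} and the join condition ("A.dt", "B.dt"), the result also binds B.dt to the same date.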

Step S304: performing custom function identification on the objectified element to obtain the custom function in the objectified element, and acquiring constant data from the mapping cache using a reflection mechanism.

Specifically, the custom function defined in the SQL statement to be processed is analyzed to determine whether all of its input parameters are constants. If they are, the instantiated class of the UDF is obtained from the mapping cache and invoked through the Java reflection mechanism, and the calculation result is stored as constant data.
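The idea of evaluating a custom function ahead of time when all of its arguments are constants can be sketched as follows. The present application describes Java reflection and a ClassLoader; this sketch substitutes Python's importlib and getattr as a rough analogue, and the names UDF_CACHE, register_udf and fold_udf_call, as well as the ("const", value) / ("col", name) argument encoding, are hypothetical.

```python
import importlib
from typing import Any, Callable, Dict, List, Optional, Tuple

# Mapping cache: registered function name -> loaded callable.
UDF_CACHE: Dict[str, Callable[..., Any]] = {}


def register_udf(name: str, module_name: str, attr_name: str) -> None:
    """Load the implementation dynamically and store it in the mapping cache."""
    module = importlib.import_module(module_name)
    UDF_CACHE[name] = getattr(module, attr_name)


def fold_udf_call(name: str, args: List[Tuple[str, Any]]) -> Optional[Any]:
    """Evaluate the UDF if every argument is a constant; otherwise return None."""
    if not all(kind == "const" for kind, _ in args):
        return None  # at least one argument depends on a column
    func = UDF_CACHE[name]
    return func(*(value for _, value in args))
```

For instance, after register_udf("my_sqrt", "math", "sqrt"), a call with a single constant argument can be folded into a constant, while a call that takes a column reference is left alone.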

Step S305: pseudo dynamic partition recognition is performed on the objectified element to obtain an implicit constant partition field in the objectified element.

Specifically, when a partition of a table is overwritten and the SQL statement is written in dynamic partition form, whether the SQL statement contains an implicit constant partition field is analyzed based on the constant identification result and the filtering limitation information from the above steps, and the partition field is bound to its constant. Pseudo dynamic partition identification is thereby achieved and the implicit constant partition field is obtained.
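As an illustration, consider a hypothetical Hive-style statement such as insert overwrite table t partition (dt) select id, dt from a where dt = '2020-01-01': the partition is written dynamically, yet dt can only take one value. The sketch below binds such partition fields to their constants; the function name and input shapes are assumptions made for this example.

```python
from typing import Dict, List


def bind_partition_constants(
    partition_fields: List[str],
    partition_sources: Dict[str, str],
    known_constants: Dict[str, str],
) -> Dict[str, str]:
    """Bind dynamic partition fields whose source column is known to be constant.

    partition_fields:  partition fields written without a static value
    partition_sources: partition field -> source column feeding it
    known_constants:   column -> constant, from constant identification and filter inference
    """
    bound: Dict[str, str] = {}
    for part in partition_fields:
        source = partition_sources.get(part)
        if source in known_constants:
            bound[part] = known_constants[source]
    return bound
```

Fields that remain unbound are treated as truly dynamic partitions.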

Step S306: based on the basic information, the constant information, the filtering limitation information, the constant data and the implicit constant partition field, obtaining respective conversion logic of the multiple types of data so as to obtain the blood relationship among the multiple types of data corresponding to the SQL statement.

Specifically, through a recursive parsing iteration process, the objectified elements of the logical execution plan are analyzed iteratively, so that the extraction of the basic information, the constant information, the filtering limitation information, the constant data and the implicit constant partition field is completed. The information of the read object tables and the field processing logic of the write object table are thereby extracted, for example, which read object table the data of a given field in the result table is derived from, and whether the calculation mode includes aggregation, deduplication and the like, so that the purpose of data blood relationship analysis is finally achieved.

In this embodiment, the SQL statement to be processed is converted into an abstract syntax tree, the nodes at multiple levels of the abstract syntax tree are parsed using the preset parsing strategies and the custom function, and the objectified elements are obtained and converted into a logical execution plan. The objectified elements corresponding to the logical execution plan are then analyzed iteratively to obtain the blood relationship among the multiple types of data corresponding to the SQL statement. Blood relationship analysis among data such as tables, partitions, fields and calculation modes is thereby realized, and the cost of manually sorting out the data is saved.

Referring to fig. 4, fig. 4 is a schematic diagram of a frame of an embodiment of a data blood margin analysis device according to the present application. The data blood margin analysis device 40 includes: a processing module 400, an obtaining module 402, a searching module 404, a conversion module 406, and an analysis module 408. The processing module 400 is configured to obtain an SQL statement to be processed, and convert the SQL statement into an abstract syntax tree; the obtaining module 402 is configured to obtain the parsing rules and parsing strategies of the abstract syntax tree, wherein each of the multiple types of data corresponding to the SQL statement corresponds to one parsing rule, and each parsing rule comprises different parsing strategies for nodes at different levels of the abstract syntax tree; the searching module 404 is configured to traverse the nodes at multiple levels of the abstract syntax tree based on the parsing rules and parsing strategies of the abstract syntax tree; the conversion module 406 is configured to convert the abstract syntax tree into a logical execution plan using the nodes at the multiple levels; the analysis module 408 is configured to iteratively analyze the logical execution plan to obtain the blood relationship between the data corresponding to the SQL statement.

In the above scheme, the processing module 400 converts the SQL statement into an abstract syntax tree, the obtaining module 402 obtains a parsing rule and a parsing policy of the abstract syntax tree, wherein each type of data corresponds to one parsing rule and nodes at different levels of the abstract syntax tree correspond to parsing policies at different levels, the searching module 404 traverses nodes at multiple levels of the abstract syntax tree based on the parsing rules and the parsing policies at different nodes of the abstract syntax tree to obtain a logic execution plan, and the analyzing module 408 performs iterative analysis on the logic execution plan to extract a blood relationship between data corresponding to the SQL statement. Therefore, the SQL sentences to be processed are converted and analyzed to obtain the blood relationship among the data, so that the efficiency can be improved when the data of different levels are screened, and the cost of data screening is reduced.

In some embodiments, the obtaining module 402 may be further configured to acquire a user-defined custom function and a mapping cache corresponding to the custom function, where the mapping cache corresponds to constant data of the custom function. The conversion module 406 may also be configured to traverse the nodes at the multiple levels of the abstract syntax tree based on the mapping cache corresponding to the custom function, the parsing rule, and the parsing strategies of that rule for nodes at different levels.
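One way to picture this traversal is a rule table keyed by statement type, where each rule holds one strategy per kind of node. The sketch below is illustrative only: the `Node` class, the node kinds (`insert`, `table`, `column`, `function`), and the shape of the rule table are assumptions made for this example, not structures named in the application.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical abstract syntax tree node."""
    kind: str
    text: str = ""
    children: list = field(default_factory=list)


def traverse(root, rules, mapping_caches):
    """Visit nodes level by level in pre-order, applying the matching strategy."""
    strategies = rules[root.kind]          # one parsing rule per statement type
    collected, stack = [], [(root, 0)]
    while stack:                           # explicit stack instead of recursion
        node, depth = stack.pop()
        strategy = strategies.get(node.kind)
        if strategy is not None:
            collected.append((depth, strategy(node, mapping_caches)))
        # Push children reversed so they are popped in their original order.
        stack.extend((child, depth + 1) for child in reversed(node.children))
    return collected


# Assumed rule table: strategies for the node kinds of an "insert" statement.
rules = {
    "insert": {
        "table": lambda n, caches: ("table", n.text),
        "column": lambda n, caches: ("column", n.text),
        # A function node counts as a custom function if it has a mapping cache.
        "function": lambda n, caches: ("udf" if n.text in caches else "builtin", n.text),
    }
}

tree = Node("insert", children=[
    Node("table", "dw.t"),
    Node("select", children=[
        Node("column", "a"),
        Node("function", "region_name", [Node("column", "code")]),
    ]),
])
found = traverse(tree, rules, {"region_name": {"1": "north"}})
```

Each collected entry keeps the depth at which the node was found, which is one simple way to retain the level information that the later conversion step relies on.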

In some embodiments, the conversion module 406 may also be configured to: obtain objectified elements corresponding to the abstract syntax tree using the nodes at the multiple levels; and generate the logic execution plan based on the objectified elements.

In some embodiments, the analysis module 408 may also be configured to perform iterative analysis on the objectified elements corresponding to the logic execution plan, so as to obtain, during the iterative analysis, the blood relationship among the multiple types of data corresponding to the SQL statement.

In some embodiments, the analysis module 408 may also be configured to: extract information from the objectified elements to obtain basic information; perform constant identification on the objectified elements to obtain constant information; perform filter condition analysis on the objectified elements to obtain filtering limitation information; perform custom function identification on the objectified elements to obtain the custom function they contain, and acquire the constant data in the mapping cache using a reflection mechanism; perform pseudo-dynamic partition identification on the objectified elements to obtain implicit constant partition fields; and, based on the basic information, the constant information, the filtering limitation information, the constant data, and the implicit constant partition fields, obtain the conversion logic of each of the multiple types of data, thereby obtaining the blood relationship among the multiple types of data corresponding to the SQL statement.
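The analysis steps listed above can be sketched as an iterative walk over objectified elements. The classes `Column`, `Constant`, `Call`, and `Plan`, and the layout of the result, are assumptions made for this example. In particular, the sketch treats a partition field whose expression contains only constants as an "implicit constant partition field"; that is one possible reading of pseudo-dynamic partition identification, not a definition taken from the application.

```python
from dataclasses import dataclass


@dataclass
class Column:        # reference to a source column
    table: str
    name: str


@dataclass
class Constant:      # literal value
    value: object


@dataclass
class Call:          # function call over argument expressions
    func: str
    args: list


@dataclass
class Plan:          # hypothetical objectified logic execution plan
    target_table: str
    outputs: dict    # target column name -> expression
    filters: list    # filter conditions kept as text
    partition: dict  # partition field name -> expression


def analyze(plan, mapping_caches):
    """Derive, per target field, its sources, constants, custom functions, and filters."""
    lineage = {}
    for target, expr in {**plan.outputs, **plan.partition}.items():
        sources, constants, udfs = set(), [], []
        stack = [expr]                       # iterative walk of the expression
        while stack:
            node = stack.pop()
            if isinstance(node, Column):     # basic information
                sources.add(f"{node.table}.{node.name}")
            elif isinstance(node, Constant): # constant identification
                constants.append(node.value)
            elif isinstance(node, Call):     # custom function identification
                if node.func in mapping_caches:
                    udfs.append(node.func)
                stack.extend(node.args)
        lineage[f"{plan.target_table}.{target}"] = {
            "sources": sorted(sources),
            "constants": constants,
            "udfs": udfs,
            "filters": list(plan.filters),   # filtering limitation information
            # Assumed reading: a partition field fed only by constants.
            "implicit_constant_partition":
                target in plan.partition and not sources and bool(constants),
        }
    return lineage


plan = Plan(
    target_table="dw.user_region",
    outputs={
        "uid": Column("ods.user", "id"),
        "region": Call("region_name", [Column("ods.user", "region_code")]),
        "source": Constant("app"),
    },
    filters=["ods.user.is_valid = 1"],
    partition={"dt": Constant("2021-03-18")},
)
result = analyze(plan, {"region_name": {"1": "north"}})
```

The table and field names in the usage example are invented for illustration. The result maps each target field to the source fields, constants, custom functions, and filter conditions that determine it, which is the kind of conversion logic the paragraph describes.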

In some embodiments, the obtaining module 402 may be further configured to: obtain and cache metadata information from an external storage, where the metadata information comprises the custom function; and acquire a plurality of custom functions from the external storage and/or the SQL statement, and establish a corresponding mapping cache for each of the custom functions.
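A mapping cache of this kind can be pictured as a dictionary from custom function name to that function's constant data. The sketch below is a rough illustration under stated assumptions: the classes `RegionNameUdf` and `PlainUdf`, the attribute name `CONSTANT_DATA`, and the registry layout are all hypothetical, and Python's `getattr` stands in for whatever reflection mechanism an actual implementation would use.

```python
class RegionNameUdf:
    """Hypothetical custom function: maps a region code to a region name."""
    CONSTANT_DATA = {"1": "north", "2": "south"}

    def evaluate(self, code):
        return self.CONSTANT_DATA.get(code)


class PlainUdf:
    """Hypothetical custom function that carries no constant data."""

    def evaluate(self, value):
        return value


def build_mapping_caches(udf_registry):
    """Build one mapping cache per custom function that exposes constant data."""
    caches = {}
    for name, cls in udf_registry.items():
        # Look the attribute up by name at run time (reflection stand-in).
        constants = getattr(cls, "CONSTANT_DATA", None)
        if isinstance(constants, dict):
            caches[name] = dict(constants)   # copy so the cache is independent
    return caches


# The registry stands in for custom functions found in external storage
# and/or in the SQL statement.
caches = build_mapping_caches({"region_name": RegionNameUdf, "plain": PlainUdf})
```

Building the caches once, before traversal, means later steps can look up a custom function's constant data by name without loading the function again.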

In some embodiments, the processing module 400 may also be configured to: acquire the source code corresponding to the SQL statement to be processed, and split the source code at its end symbols to obtain multiple lines of code; remove redundant characters from the multiple lines of code to obtain processed code; and perform semantic analysis on the processed code using a syntax parser to obtain the abstract syntax tree.
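The first two preprocessing steps can be sketched as follows. This is a minimal sketch that assumes the end symbol is a semicolon and that "redundant characters" means comments and surplus whitespace; neither assumption is confirmed by the text. The function names are hypothetical, and the final parsing step is omitted because it depends on the syntax parser in use.

```python
import re


def split_statements(source, terminator=";"):
    """Split source code at end symbols, ignoring those inside single quotes."""
    statements, current, in_quote = [], [], False
    for ch in source:
        if ch == "'":
            in_quote = not in_quote
        if ch == terminator and not in_quote:
            statements.append("".join(current))
            current = []
        else:
            current.append(ch)
    statements.append("".join(current))      # trailing text after the last end symbol
    return statements


def clean_statement(stmt):
    """Remove comments and collapse whitespace (assumed redundant characters)."""
    stmt = re.sub(r"--[^\n]*", " ", stmt)                  # line comments
    stmt = re.sub(r"/\*.*?\*/", " ", stmt, flags=re.S)     # block comments
    return re.sub(r"\s+", " ", stmt).strip()


def preprocess(source):
    """Return cleaned, non-empty statements ready to hand to a parser."""
    cleaned = (clean_statement(s) for s in split_statements(source))
    return [s for s in cleaned if s]


src = "SELECT a,\n  b -- pick columns\nFROM t;\n\nINSERT INTO u SELECT 'x;y' FROM t;\n"
statements = preprocess(src)
```

The sketch splits before cleaning, following the order given above, so a comment that itself contains a semicolon or a quote would be split incorrectly, and a `--` sequence inside a string literal would be stripped. A production implementation would need a proper tokenizer for those cases.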

Referring to fig. 5, fig. 5 is a schematic block diagram of an embodiment of an electronic device according to the present application. The electronic device 50 comprises a memory 501 and a processor 502 coupled to each other. The memory 501 stores program instructions, and the processor 502 is configured to execute the program instructions stored in the memory 501 to implement the steps of any of the embodiments of the data blood margin analysis method described above.

In particular, the processor 502 is configured to control itself and the memory 501 to implement the steps of any of the embodiments of the data blood margin analysis method described above. The processor 502 may also be referred to as a CPU (Central Processing Unit). The processor 502 may be an integrated circuit chip having signal processing capabilities. The processor 502 may also be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor may be a microprocessor or any conventional processor. Additionally, the processor 502 may be implemented jointly by integrated circuit chips.

In the above scheme, the processor 502 reduces the cost of data screening by obtaining the blood relationship between data in the data processing process.

Referring to fig. 6, fig. 6 is a block diagram illustrating an embodiment of a computer-readable storage medium according to the present application. The computer-readable storage medium 60 stores program instructions 600 executable by a processor, the program instructions 600 being used to implement the steps of any of the embodiments of the data blood margin analysis method described above.

In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into modules or units is only a logical division, and an actual implementation may divide them differently; for instance, units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection between devices or units through some interfaces, and may be electrical, mechanical, or in another form.

Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

Claims

1. A method of data blood margin analysis, the method comprising:

acquiring an SQL statement to be processed, and converting the SQL statement into an abstract syntax tree;

acquiring a parsing rule and a parsing strategy of the abstract syntax tree, wherein each of a plurality of types of data corresponding to the SQL statement corresponds to one parsing rule, and the parsing rule comprises different parsing strategies for nodes at different levels of the abstract syntax tree;

traversing nodes of a plurality of levels of the abstract syntax tree based on parsing rules and parsing policies of the abstract syntax tree;

converting the abstract syntax tree into a logical execution plan using the plurality of levels of nodes;

and carrying out iterative analysis on the logic execution plan to obtain the blood relationship between the data corresponding to the SQL statement.

2. The method of claim 1, wherein traversing the nodes of the plurality of levels of the abstract syntax tree based on the parsing rules and parsing strategies of the abstract syntax tree is preceded by the step of:

acquiring a user-defined custom function and a mapping cache corresponding to the custom function, wherein the mapping cache corresponds to constant data of the custom function;

the step of traversing nodes of a plurality of levels of the abstract syntax tree based on parsing rules and parsing policies of the abstract syntax tree comprises:

traversing the nodes of the plurality of levels of the abstract syntax tree based on the mapping cache corresponding to the custom function, the parsing rule and the parsing strategy thereof for the nodes of different levels.

3. The method according to claim 2, wherein the step of converting the abstract syntax tree into a logic execution plan using the plurality of levels of nodes comprises:

obtaining objectification elements corresponding to the abstract syntax tree by utilizing the nodes of the multiple hierarchies;

generating the logic execution plan based on the objectified element.

4. The method according to claim 3, wherein the step of iteratively analyzing the logic execution plan to obtain the blood relationship between the data corresponding to the SQL statement comprises:

and performing iterative analysis on the objectified elements corresponding to the logic execution plan to obtain a blood relationship among multiple types of data corresponding to the SQL statements in the iterative analysis process.

5. The method according to claim 4, wherein the step of performing iterative analysis on the objectified element corresponding to the logic execution plan to obtain the blood relationship between multiple types of data corresponding to the SQL statement in the iterative analysis process comprises:

extracting information from the objectified element to obtain basic information in the objectified element; and

performing constant identification on the objectified element to obtain constant information in the objectified element; and

performing filter condition analysis on the objectified element to obtain filtering limitation information in the objectified element; and

performing custom function identification on the objectified element to obtain a custom function in the objectified element, and acquiring the constant data in the mapping cache by using a reflection mechanism; and

performing pseudo-dynamic partition identification on the objectified element to obtain an implicit constant partition field in the objectified element;

based on the basic information, the constant information, the filtering limitation information, the constant data and the implicit constant partition field, obtaining respective conversion logic of the multiple types of data to obtain the blood relationship between the multiple types of data corresponding to the SQL statement.

6. The method according to claim 2, wherein the step of obtaining the user-defined custom function and the mapping cache corresponding to the custom function is preceded by:

obtaining and caching metadata information from an external storage, wherein the metadata information comprises the custom function;

the step of obtaining the user-defined custom function and the mapping cache corresponding to the user-defined custom function comprises:

acquiring a plurality of the custom functions from the external storage and/or the SQL statement, and establishing a corresponding mapping cache for each of the plurality of custom functions.

7. The method according to claim 1, wherein the step of obtaining the SQL statement to be processed and converting the SQL statement into an abstract syntax tree comprises:

acquiring a source code corresponding to the SQL statement to be processed, and segmenting the source code based on an end character of the source code to obtain a plurality of lines of codes corresponding to the source code;

removing redundant characters in the multi-line code to obtain a processed multi-line code;

and performing semantic analysis on the processed multi-line code by using a syntax analyzer to obtain the abstract syntax tree.

8. A data blood margin analysis device, comprising:

a processing module, configured to acquire an SQL statement to be processed and convert the SQL statement into an abstract syntax tree;

an obtaining module, configured to acquire a parsing rule and a parsing strategy of the abstract syntax tree, wherein each of a plurality of types of data corresponding to the SQL statement corresponds to one parsing rule, and the parsing rule comprises different parsing strategies for nodes at different levels of the abstract syntax tree;

a searching module, configured to traverse nodes of a plurality of levels of the abstract syntax tree based on the parsing rule and the parsing strategy of the abstract syntax tree;

a conversion module, configured to convert the abstract syntax tree into a logic execution plan using the plurality of levels of nodes; and

an analysis module, configured to perform iterative analysis on the logic execution plan to obtain the blood relationship between the data corresponding to the SQL statement.

9. An electronic device, comprising: a memory and a processor coupled to each other, wherein the memory stores program data that the processor calls to perform the method of any of claims 1-7.

10. A computer-readable storage medium, on which program data are stored, which program data, when being executed by a processor, carry out the method of any one of claims 1-7.