CN112256721B - SQL statement parsing method, system, computer device and storage medium - Google Patents


Info

Publication number
CN112256721B
Authority
CN
China
Prior art keywords
layer
sql
field
directed acyclic
acyclic graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011134553.8A
Other languages
Chinese (zh)
Other versions
CN112256721A (en)
Inventor
陈玉
张茜
凌海挺
杜均
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202011134553.8A (patent CN112256721B)
Priority to PCT/CN2020/135735 (patent WO2021179722A1)
Publication of CN112256721A
Application granted
Publication of CN112256721B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/42 Syntactic analysis
    • G06F8/425 Lexical analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/42 Syntactic analysis
    • G06F8/427 Parsing

Abstract

The invention provides an SQL statement parsing method, system, computer device and storage medium. The method obtains a directed acyclic graph produced by parsing and organizing SQL statements; traverses the layers of the directed acyclic graph that contain SQL sub-queries to find the layers in which a wildcard ("*", the all-fields substitute symbol) appears, taking the first such layer encountered during traversal as the current layer; obtains the fields that the wildcard in the current layer stands for from the fields of the layer preceding the current layer; and continues with the remaining layers that contain wildcards, replacing each one until every wildcard in the directed acyclic graph has been resolved. An SQL-based data lineage analysis tool or system that uses this method therefore does not need to interact with database software or access additional metadata, protects the user's metadata as far as possible, and does not expose metadata that does not appear in the SQL under analysis. The invention also relates to blockchain technology.

Description

SQL statement parsing method, system, computer device and storage medium
Technical Field
The invention relates to the technical field of data processing, and in particular to a method, system, computer device and storage medium for parsing SQL statements.
Background
Data is an important enterprise asset that must be managed, maintained and used through data governance. Data lineage is an important component of data governance: it analyzes where data comes from and where it goes by examining the SQL statements used during data processing, and provides an important data foundation for other data governance work. The SQL statements used for data lineage analysis fall into two kinds: original SQL statements and SQL statements converted by the database system.
Compared with SQL statements converted by database software, original SQL statements are essentially independent of the database software, which reduces the complexity of the data lineage analysis system. A data dictionary can supply auxiliary data for resolving implicit references, making the lineage result more comprehensive and detailed. However, it may also leak unnecessary data information: in scenarios where data security is sensitive, the data dictionary enlarges the range of metadata accessible during SQL parsing, which creates a security risk.
A method is therefore needed that, without using a data dictionary, resolves the implicit references among the fields appearing in SQL statements as completely as possible and generates the data lineage.
Disclosure of Invention
In view of this, an SQL statement parsing method, system, computer device and storage medium are provided that avoid interaction with database software, need no access to additional metadata, and protect the security of user metadata as far as possible.
To achieve the above object, the invention provides a data-lineage-based SQL statement parsing method, comprising:
step one, obtaining a directed acyclic graph, wherein the directed acyclic graph is obtained by parsing and organizing SQL statements;
step two, traversing the layers of the directed acyclic graph that contain SQL sub-queries to find the layers in which a wildcard ("*") appears, and taking the first such layer encountered during traversal as the current layer;
step three, obtaining the fields that the wildcard in the current layer stands for from the fields of the layer preceding the current layer;
and step four, continuing with the remaining layers that contain wildcards and replacing the wildcards until every wildcard in the directed acyclic graph has been resolved.
Preferably, obtaining the directed acyclic graph by parsing and organizing SQL statements includes:
extracting regularized SQL statements from a script file containing SQL code, thereby cleaning the SQL statements;
and performing lexical analysis on the regularized SQL statements to generate an abstract syntax tree, and generating the directed acyclic graph from the abstract syntax tree.
Preferably, extracting the regularized SQL statements from the script file containing SQL code to clean the SQL statements includes:
obtaining a script file containing SQL code, and searching for the flag bits of the SQL code;
and using the flag bits to filter out irrelevant content in the script file, retaining the regularized SQL code statements.
Preferably, traversing the layers of the directed acyclic graph that contain SQL sub-queries includes: traversing those layers from top to bottom, or traversing those layers from bottom to top.
Preferably, obtaining the fields that the wildcard ("*") in the current layer stands for from the fields of the layer preceding the current layer includes: the layer preceding the current layer is a field layer; the field layer maps the corresponding fields to the sub-query of the current layer according to the sub-query name, and the fields of the field layer replace the wildcard of the current layer.
Preferably, obtaining the fields that the wildcard ("*") in the current layer stands for from the fields of the layer preceding the current layer includes: if the current layer is the first layer of the traversal, continuing to search for a layer without a wildcard, taking the first layer without a wildcard as the field layer and the layer preceding the field layer as the wildcard layer, and inferring the fields that the wildcard in the wildcard layer stands for from the field layer.
Preferably, after every wildcard in the directed acyclic graph has been resolved, the parsing result is uploaded to a blockchain, so that the blockchain stores the parsing result in encrypted form.
The invention also provides a data-lineage-based SQL statement parsing system, comprising:
a data set module for obtaining a directed acyclic graph, wherein the directed acyclic graph is obtained by parsing and organizing SQL statements;
a traversal module for traversing the layers of the directed acyclic graph that contain SQL sub-queries to find the layers in which a wildcard ("*") appears, and taking the first such layer encountered during traversal as the current layer;
and a replacement module for obtaining the fields that the wildcard in the current layer stands for from the fields of the layer preceding the current layer; the replacement module continues with the remaining wildcard-containing layers marked by the traversal module and replaces the wildcards until every wildcard in the directed acyclic graph has been resolved.
To achieve the above object, the present invention further provides a computer device, including a memory and a processor, where the memory stores computer readable instructions, and the computer readable instructions, when executed by the processor, cause the processor to execute the steps of the SQL statement parsing method.
In order to achieve the above object, the present invention further provides a storage medium storing a program file capable of implementing the SQL statement parsing method.
The invention provides an SQL statement parsing method, system, computer device and storage medium. The method obtains a directed acyclic graph produced by parsing and organizing SQL statements; traverses the layers of the directed acyclic graph that contain SQL sub-queries to find the layers in which a wildcard ("*") appears, taking the first such layer encountered during traversal as the current layer; obtains the fields that the wildcard in the current layer stands for from the fields of the layer preceding the current layer; and continues with the remaining layers that contain wildcards, replacing each one until every wildcard in the directed acyclic graph has been resolved. An SQL-based data lineage analysis tool or system that uses this method can therefore analyze data lineage without interacting with database software or accessing additional metadata. In addition, in scenarios with stricter data security requirements, the method provides a data lineage analysis function with high cohesion and low coupling, reduces external dependencies, does not need to obtain metadata that does not appear in the data lineage, protects user metadata as far as possible, and does not expose metadata that does not appear in the SQL under analysis. The invention also relates to blockchain technology.
Drawings
FIG. 1 is a diagram of an implementation environment of a method for SQL statement parsing provided in one embodiment;
FIG. 2 is a block diagram showing an internal configuration of a computer device according to an embodiment;
FIG. 3 is a flow diagram of a method for SQL statement parsing in one embodiment;
FIG. 4 is a flow diagram of directed acyclic graph generation in one embodiment;
FIG. 5 is a flow diagram of SQL code cleaning in one embodiment;
FIG. 6 is a diagram of an SQL nested query structure, under an embodiment;
FIG. 7 is a schematic diagram of a directed acyclic graph generated according to FIG. 6;
FIG. 8 is a diagram of a directed acyclic graph in one embodiment;
FIG. 9 is a schematic diagram of a directed acyclic graph of example 1 in one embodiment;
FIG. 10 is a schematic diagram of a directed acyclic graph according to example 2;
FIG. 11 is a schematic diagram of a directed acyclic graph according to example 3, in one embodiment;
FIG. 12 is a schematic diagram of an SQL statement parsing system, under an embodiment;
FIG. 13 is a diagram of a data set module in one embodiment;
FIG. 14 is a schematic view of a cleaning module in one embodiment;
FIG. 15 is a schematic diagram of a computer apparatus in one embodiment;
FIG. 16 is a schematic diagram of a storage medium in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It will be understood that the terms "first," "second," and the like may be used herein to describe various elements, but these elements are not limited by these terms. These terms are only used to distinguish one element from another.
Fig. 1 is a diagram of an implementation environment of the data-lineage-based SQL statement parsing method provided in an embodiment. As shown in fig. 1, the implementation environment includes a computer device 110 and a display device 120.
The computer device 110 may be a computer device used by a user, on which a data-lineage-based SQL statement parsing system is installed. The user can run the data-lineage-based SQL statement parsing method on the computer device 110 to parse the data, and view the parsing result on the display device 120.
It should be noted that the combination of the computer device 110 and the display device 120 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, and the like.
FIG. 2 is a diagram showing the internal configuration of a computer device according to an embodiment. As shown in fig. 2, the computer device includes a processor, a non-volatile storage medium, a memory, and a network interface connected through a system bus. The non-volatile storage medium of the computer device stores an operating system, a database and computer readable instructions; the database can store control information sequences, and the computer readable instructions, when executed by the processor, enable the processor to implement a data-lineage-based SQL statement parsing method. The processor of the computer device provides computing and control capability and supports the operation of the whole computer device. The memory of the computer device may store computer readable instructions that, when executed by the processor, cause the processor to perform a data-lineage-based SQL statement parsing method. The network interface of the computer device is used for connecting to and communicating with the terminal. Those skilled in the art will appreciate that the architecture shown in fig. 2 is merely a block diagram of part of the structure related to the disclosed solution and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than shown, may combine certain components, or may arrange the components differently.
As shown in fig. 3, in an embodiment, a data-lineage-based SQL statement parsing method is provided, which may be applied to the computer device 110 and the display device 120, and specifically includes the following steps:
Step 31: obtaining a directed acyclic graph, wherein the directed acyclic graph is obtained by parsing and organizing SQL statements.
Specifically, a complex SQL statement may be composed of multiple sub-SQL statements (query statements), and the most basic, atomic SQL statement is composed of basic elements such as SQL keywords, fields, tables and functions. A directed acyclic graph is used to record each basic element in the SQL and the relationships between them as completely as possible, so that a field-level mapping relationship can be generated. Further, referring to fig. 4 and fig. 5, the SQL statements are extracted from a script containing SQL code, and obtaining the directed acyclic graph by parsing and organizing the SQL statements includes:
S311, extracting regularized SQL statements from the script file containing SQL code, thereby cleaning the SQL statements;
further, S311 includes:
S3111, obtaining a script file containing SQL code, and searching for the flag bits of the SQL code;
preferably, the script file may be a Perl script or a similar script.
S3112, using the flag bits to filter out irrelevant content in the script file, retaining the regularized SQL code statements.
S312, performing lexical analysis on the regularized SQL statements to generate an abstract syntax tree, and generating the directed acyclic graph from the abstract syntax tree.
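The cleaning sub-steps S3111 and S3112 can be illustrated with a minimal sketch. The patent does not say what the flag bits look like, so the `-- SQL_BEGIN` and `-- SQL_END` marker lines, the function name `extract_sql`, and the sample script are all assumptions made for illustration only.

```python
import re

# Hypothetical flag markers; the source does not specify the real flag format.
SQL_BEGIN = "-- SQL_BEGIN"
SQL_END = "-- SQL_END"


def extract_sql(script_text: str) -> list[str]:
    """Keep only the text between the flag markers, one normalized statement per block."""
    statements = []
    inside = False
    buffer = []
    for line in script_text.splitlines():
        stripped = line.strip()
        if stripped == SQL_BEGIN:
            inside, buffer = True, []
        elif stripped == SQL_END and inside:
            inside = False
            # Collapse line breaks and repeated whitespace into single spaces.
            sql = re.sub(r"\s+", " ", " ".join(buffer)).strip()
            if sql:
                statements.append(sql)
        elif inside:
            buffer.append(stripped)
    return statements


# A Perl-like script with one embedded SQL block (illustrative content).
script = """
my $dbh = connect_db();
-- SQL_BEGIN
select col4,
       col5
from table3
-- SQL_END
print "done";
"""
print(extract_sql(script))
```

Everything outside the marked block is discarded, so the later lexical analysis only sees SQL text.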
Referring to fig. 6, a specific example of how a directed acyclic graph is formed from SQL statements is described below; that is, the SQL nested query structure can be converted into the following tree structure.
Specifically, referring to fig. 6, let the SQL query statement in fig. 6 be $S$. $S$ contains fields $C_{ij}$ ($i, j \in N$) and source tables $T_i$ ($i \in N$), where $C_{i0}$ denotes the $i$-th field of the query result of $S$; $C_{ij}$ ($i \in N$, $j \in N$) denotes the $j$-th field referenced by the $i$-th field of the query result of $S$; and $T_i$ denotes the $i$-th source table appearing in $S$. When a source table in $S$ is a sub-query, it is denoted $S_i$ with a distinct subscript.
In fig. 6, the outermost SQL query statement is $S_0$. The source tables of $S_0$ are $S_1$ and $S_2$, where $S_1$ selects from $T_1$ and $S_2$ selects from $T_2$ ($T_1$ and $T_2$ are table aliases). $S_0$ includes fields $C_{00}$ and $C_{10}$, $S_1$ includes fields $C_{01}$ and $C_{02}$, and $S_2$ includes field $C_{11}$. Further, $C_{00}$ denotes the 0th field of the query result of $S_0$, and $C_{10}$ denotes the 1st field of the query result of $S_0$; $C_{01}$ denotes the 1st field, in $S_1$, referenced by the 0th field of the query result, and $C_{02}$ denotes the 2nd field, in $S_1$, referenced by the 0th field of the query result; $C_{11}$ denotes the 1st field, in $S_2$, referenced by the 1st field of the query result; $T_1$ denotes the 1st source table, which appears in $S_1$, and $T_2$ denotes the 2nd source table, which appears in $S_2$. Because the source tables of $S_0$ are sub-queries, $S_0$ has the two sub-queries $S_1$ and $S_2$.
From the above description, it can be seen that the SQL nested query structure in fig. 6 is converted into a tree structure with three layers, that is, the generated directed acyclic graph has three layers, and the generated directed acyclic graph is shown in fig. 7.
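A minimal sketch of the three-layer structure just described, assuming a simple in-memory node type. The `Node` class and the `layers` helper are illustrative names, not part of the patent, and the breadth-first grouping assumes a tree-shaped graph in which each node has a single parent.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A query or table in the graph; plain tables have no field list."""
    name: str
    fields: list[str] = field(default_factory=list)
    sources: list["Node"] = field(default_factory=list)


def layers(root: Node) -> list[list[str]]:
    """Group node names by depth; layer 0 is the outermost query."""
    result = []
    current = [root]
    while current:
        result.append([node.name for node in current])
        current = [src for node in current for src in node.sources]
    return result


# The structure of fig. 6 as described in the text: S0 reads from S1 and S2,
# S1 reads from table T1 and S2 reads from table T2.
t1, t2 = Node("T1"), Node("T2")
s1 = Node("S1", ["C01", "C02"], [t1])
s2 = Node("S2", ["C11"], [t2])
s0 = Node("S0", ["C00", "C10"], [s1, s2])

print(layers(s0))  # [['S0'], ['S1', 'S2'], ['T1', 'T2']]
```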
It should be noted that the SQL statement parsing method is mainly used when "*" (the all-fields wildcard) appears in a complex SQL statement: it performs bidirectional inference from the field names used in the tables and sub-queries of the upper and lower layers, and converts each "*" into real field references. However, there are two cases in which the fields represented by "*" cannot be inferred:
In the first case, when all query fields in the SQL statement are written as "*", no inference can be made.
For example, in fig. 8, if $S_0$ is "*" and $S_1$, $S_2$ and $S_3$ are all "*", no inference can be made.
In the second case, when the fields listed by a query in the SQL statement do not specify table aliases, there are multiple source tables or sub-queries with an association relationship between them, and the fields of at least two of the sub-queries are written as "*", the fields represented by "*" cannot be inferred accurately.
For example, in fig. 8, if $S_0$ is "a + b + c", $S_1$ is "*", $S_2$ is "*", and $S_3$ explicitly lists c, then the fields of $S_1$ and $S_2$ cannot be inferred.
Conversely, there are three cases in which the SQL statement parsing method can infer the fields. In the first case, table aliases are present, and inference follows the mapping relationship. In the second case, the source tables or sub-queries are related by a set operation, and inference follows the mapping relationship. In the third case, the fields listed by a query in the SQL statement do not specify table aliases and there are multiple associated source tables or sub-queries, but only one sub-query writes its fields as "*"; inference is then possible through the associations between the upper and lower layers and within the same layer.
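The two non-inferable cases can be expressed as a small predicate. This is only one reading of the rules above: the function name `can_infer` and the input shapes are assumptions, and the association relationship between source tables is not modeled.

```python
def can_infer(upper_fields, subqueries):
    """Return False for the two non-inferable cases, True otherwise.

    upper_fields: list of (table_alias_or_None, field_name) used by the upper query;
                  a bare "*" is written as (None, "*").
    subqueries:   dict mapping a sub-query name to its field list (["*"] for a wildcard).
    """
    wildcard_subs = [name for name, fields in subqueries.items() if "*" in fields]
    upper_all_wildcard = all(name == "*" for _, name in upper_fields)

    # Case 1: every query field, at every level, is "*".
    if upper_all_wildcard and len(wildcard_subs) == len(subqueries):
        return False

    # Case 2: unaliased fields, several sources, and two or more of them use "*".
    unaliased = any(alias is None and name != "*" for alias, name in upper_fields)
    if unaliased and len(subqueries) > 1 and len(wildcard_subs) >= 2:
        return False

    return True


# The fig. 8 situations described in the text.
print(can_infer([(None, "*")], {"S1": ["*"], "S2": ["*"], "S3": ["*"]}))
print(can_infer([(None, "a"), (None, "b"), (None, "c")],
                {"S1": ["*"], "S2": ["*"], "S3": ["c"]}))
# Only one sub-query uses "*", so inference remains possible.
print(can_infer([(None, "a"), (None, "b")], {"S1": ["*"], "S2": ["b"]}))
```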
Step 32: traversing the layers of the directed acyclic graph that contain SQL sub-queries to find the layers in which a wildcard ("*") appears, and taking the first such layer encountered during traversal as the current layer.
Specifically, because of the nature of the directed acyclic graph, the layers that contain SQL sub-queries may be traversed from top to bottom or from bottom to top.
More specifically, assume that the directed acyclic graph has five layers, of which the second and third layers contain wildcards. If the graph is traversed from top to bottom, the second layer is the current layer; if it is traversed from bottom to top, the third layer is the current layer.
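The five-layer example can be checked with a short sketch. The list-of-lists representation and the function name `first_wildcard_layer` are assumptions chosen for illustration; indices are zero-based, so index 1 is the second layer.

```python
def first_wildcard_layer(layers, top_down=True):
    """Return the index of the first layer containing "*" in the chosen direction.

    layers: list of layers; each layer is a list of field lists, one per sub-query.
    """
    indices = range(len(layers)) if top_down else range(len(layers) - 1, -1, -1)
    for index in indices:
        if any("*" in fields for fields in layers[index]):
            return index
    return None  # no wildcard anywhere


# Five layers; the second and third (indices 1 and 2) contain a wildcard.
five_layers = [
    [["a", "b"]],
    [["*"]],
    [["*"], ["c"]],
    [["a", "b", "c"]],
    [["x"]],
]
print(first_wildcard_layer(five_layers, top_down=True))   # 1 (second layer)
print(first_wildcard_layer(five_layers, top_down=False))  # 2 (third layer)
```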
Step 33: obtaining the fields that the wildcard ("*") in the current layer stands for from the fields of the layer preceding the current layer.
Specifically, in an embodiment, the layer preceding the current layer is a field layer; the field layer maps the corresponding fields to the sub-query of the current layer according to the sub-query name, and the fields of the field layer replace the wildcard of the current layer.
Specifically, in an embodiment, if the current layer is the first layer of the traversal, the search continues for a layer without a wildcard; the first layer without a wildcard is taken as the field layer and the layer preceding the field layer as the wildcard layer, and the fields that the wildcard in the wildcard layer stands for are inferred from the field layer.
Further, if there are multiple sub-queries in the same layer and these sub-queries are related by a set operation, field mapping may be performed between the sub-queries of that layer to obtain the fields that the wildcard stands for.
It should additionally be noted that, according to SQL syntax rules, the final layer is necessarily a table rather than a sub-query, so neither fields nor "*" appear in it; the final layer therefore need not be considered during inference, as can be seen in fig. 8.
Step 34: continuing with the remaining layers that contain wildcards and replacing the wildcards until every wildcard in the directed acyclic graph has been resolved.
Specifically, following step 32 and step 33, if wildcards exist in other layers, field inference is performed through the mapping relationship between the upper and lower layers until all wildcards are resolved.
Referring to fig. 8, a tree structure is shown for the four steps above. The SQL statement parsing method handles the cases in which non-"*" fields appear in the $L_0$ layer (the uppermost layer), an $L_m$ layer (an intermediate layer), or the $L_n$ layer (the lowest layer) of the query statement: the fields represented by "*" in the layers above and below are inferred and a data lineage result is generated. $L_{n+1}$ is the final layer, which contains no sub-query.
When a non-"*" field $C_{ij}$ appears in $L_0$, the corresponding field can be mapped to the lower-layer sub-query according to the sub-query name. If the lower-layer sub-query contains "*", the "*" is replaced with field $C_{ij}$; if the lower-layer sub-query does not contain field $C_{ij}$, field $C_{ij}$ is added to the field list of the lower-layer sub-query. The same steps are then applied recursively downward.
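A minimal sketch of this top-down mapping for a single pair of layers. The `alias.field` string form, the dict layout, and the name `push_down` are assumptions; the sketch also treats the replaced field list as complete, which the patent text does not guarantee in general.

```python
def push_down(upper_refs, lower):
    """Map alias-qualified fields of the upper layer into the lower-layer sub-queries.

    upper_refs: list of "alias.field" strings used by the upper query.
    lower:      dict mapping a sub-query alias to its field list (["*"] for a wildcard).
    Returns a new dict; the input is not modified.
    """
    resolved = {alias: list(fields) for alias, fields in lower.items()}
    for ref in upper_refs:
        alias, _, name = ref.partition(".")
        if alias not in resolved:
            continue  # plain table or unknown alias: nothing to resolve here
        fields = resolved[alias]
        if "*" in fields:
            fields.remove("*")   # the wildcard is replaced by the referenced fields
        if name not in fields:
            fields.append(name)  # a field missing from the sub-query is added
    return resolved


# The upper query references two table fields and two fields of sub-query S1.
print(push_down(["T1.col1", "T2.col2", "S1.col4", "S1.col5"], {"S1": ["*"]}))
# {'S1': ['col4', 'col5']}
```

Applying the same function again, with the resolved sub-query as the new upper layer, gives the recursive downward step.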
Specifically, consider example 1:
[Figure BDA0002736235430000081: SQL statement of example 1; image not reproduced in this text]
Referring to fig. 9, the above SQL statement is converted into the tree structure (directed acyclic graph) shown in fig. 9, in which solid lines represent dependency relationships between fields and dotted lines represent inferred relationships between fields. Following the SQL statement, the graph is read as follows:
Specifically, the top layer is layer 0, which represents the data query that generates table0; the table0 obtained from this query contains three fields: col3, col4 and col5. From the layer 0 expression T1.col1 + T2.col2 as col3, the col3 data comes from T1.col1 and T2.col2; the col4 data comes from S1.col4 and the col5 data comes from S1.col5. The next layer (layer 1) consists of two tables and one sub-query: table1 with alias T1, table2 with alias T2, and a sub-query with alias S1. The field queried by sub-query S1 in layer 1 is "*", so layer 0 is searched for fields qualified with the name S1; layer 0 contains S1.col4 and S1.col5, which reference fields of sub-query S1, so col4 and col5 replace the "*" in sub-query S1, which after replacement reads select col4, col5 …. The next layer (layer 2), below T3, is sub-query S2. The field queried in sub-query S2 is "*", so the referenced fields are looked up in the layer above. Because the only source of S1 is sub-query S2, all fields appearing in S1 must come from sub-query S2, so the "*" in S2 is replaced with the fields that appear under T3; after replacement, S2 reads select col4, col5 from table3.
When the fields of an intermediate layer $L_m$ are not "*": if the "*" in the query of the layer above carries a table alias (e.g. T3.*), the alias determines whether the fields listed in the intermediate layer are the fields that the "*" of the layer above stands for. If the "*" of the layer above carries no table alias, it is determined whether the current query is the only source table of the query above. If it is the only source table, the fields produced by the current query are the fields that the "*" of the layer above stands for; if it is not, all the fields listed in all the source tables (sub-queries) are the fields that the "*" of the layer above stands for.
Specifically, consider example 2:
[Figure BDA0002736235430000091: SQL statement of example 2; image not reproduced in this text]
Referring to fig. 10, the above SQL statement is converted into the tree structure (directed acyclic graph) shown in fig. 10. From col4 in S1 it can be inferred that the "*" in table0 stands for col4.
Specifically, the SQL shows that the fields of table0 in layer 0 are written as "*". The layer 1 sub-query S1 contains the field col4, and S1 is the only data source in the layer below table0, so all fields of S1 must appear in the query of the layer above. All fields of the layer 1 sub-query S1 therefore replace the "*" of the layer above, and after replacement the layer 0 query reads select col4 from ….
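The bottom-up direction can be sketched in the same style, for the case where the "*" of the upper query carries no table alias. The function name `pull_up` and the dict layout are assumptions; the alias-qualified variant (such as T3.*) is not covered here.

```python
def pull_up(upper_fields, sources):
    """Replace an unaliased "*" in the upper query with the fields of its sources.

    upper_fields: field list of the upper query, possibly containing "*".
    sources:      dict mapping each source sub-query alias to its field list.
    """
    if "*" not in upper_fields:
        return list(upper_fields)
    if not sources or any("*" in fields for fields in sources.values()):
        return list(upper_fields)  # a source is still unresolved; keep "*" for now

    # Collect the fields of all sources, in order, without duplicates.
    expanded = []
    for fields in sources.values():
        for name in fields:
            if name not in expanded:
                expanded.append(name)

    # Substitute "*" in place, keeping any explicitly listed fields.
    result = []
    for name in upper_fields:
        for candidate in (expanded if name == "*" else [name]):
            if candidate not in result:
                result.append(candidate)
    return result


# Example 2: table0 selects "*" and its only source S1 lists col4.
print(pull_up(["*"], {"S1": ["col4"]}))  # ['col4']
```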
When the queries are related in the form $S_0$, $S_1$, $S_2$ and the operations between sub-queries of the same level are set operations (union / intersect / minus), field mapping can be performed between the sub-queries of that level. Because a set operation requires the field names and field order to be identical, field references can be mapped between the sub-queries taking part in the set operation according to their non-"*" fields: the fields that $S_2$ should contain are inferred from the fields in $S_3$, and the fields that table0 should contain are then inferred from the fields in $S_3$ and $S_2$.
Specifically, consider example 3:
[Figure BDA0002736235430000101: SQL statement of example 3; image not reproduced in this text]
Referring to fig. 11, UNION denotes a set operation in the SQL statement, and the above SQL statement is converted into the tree structure (directed acyclic graph) shown in fig. 11.
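Because the branches of a set operation must agree on field names and order, a wildcard branch can take its field list from a sibling branch that lists its fields explicitly. The sketch below assumes that reading; the name `align_set_branches` and the field names col1 and col2 are hypothetical, since the SQL of example 3 is not reproduced in this text.

```python
def align_set_branches(branches):
    """Resolve "*" in the branches of a set operation (union / intersect / minus).

    branches: dict mapping each branch alias to its field list.
    Returns a new dict; if every branch is "*", nothing can be inferred.
    """
    # Use the first branch with explicit fields as the template.
    template = next((fields for fields in branches.values() if "*" not in fields), None)
    if template is None:
        return {alias: list(fields) for alias, fields in branches.items()}
    return {
        alias: list(template) if "*" in fields else list(fields)
        for alias, fields in branches.items()
    }


# S2 uses "*" and S3 lists its fields; both take part in the same UNION.
print(align_set_branches({"S2": ["*"], "S3": ["col1", "col2"]}))
# {'S2': ['col1', 'col2'], 'S3': ['col1', 'col2']}
```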
Only three examples are listed above. In practical applications, the above methods can be combined and iterated multiple times, according to the complexity of the SQL, until the implicit field reference relationships are fully enumerated.
In an alternative embodiment, the parsing result of the data-lineage-based SQL statement parsing method may also be uploaded to a blockchain.
Specifically, summary information is derived from the parsing result of the data-lineage-based SQL statement parsing method by hashing the parsing result, for example with the SHA-256 algorithm. Uploading the summary information to the blockchain ensures its security and its fairness and transparency to the user. The user can download the summary information from the blockchain to verify whether the parsing result of the data-lineage-based SQL statement parsing method has been tampered with. The blockchain referred to in this example is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, in which each data block contains the information of a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
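The summary information can be computed with a standard SHA-256 implementation. In this sketch the JSON serialization, the function name `lineage_digest`, and the shape of the result dict are assumptions; the upload to a blockchain is not shown.

```python
import hashlib
import json


def lineage_digest(result: dict) -> str:
    """Return the SHA-256 hex digest of a parsing result.

    The result is serialized with sorted keys so that the same content
    always produces the same digest regardless of key order.
    """
    canonical = json.dumps(result, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Illustrative parsing result: resolved field lists per sub-query.
parsed = {"S1": ["col4", "col5"], "S2": ["col4", "col5"]}
digest = lineage_digest(parsed)

# Verification: recompute the digest and compare it with the stored value.
tampered = {"S1": ["col4", "col5"], "S2": ["col4", "col6"]}
print(lineage_digest(parsed) == digest)    # True
print(lineage_digest(tampered) == digest)  # False
```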
The invention provides a data-lineage-based SQL statement parsing method that obtains a directed acyclic graph produced by parsing and organizing SQL statements; traverses the layers of the directed acyclic graph that contain SQL sub-queries to find the layers in which a wildcard ("*") appears, taking the first such layer encountered during traversal as the current layer; obtains the fields that the wildcard in the current layer stands for from the fields of the layer preceding the current layer; and continues with the remaining layers that contain wildcards, replacing each one until every wildcard in the directed acyclic graph has been resolved. An SQL-based data lineage analysis tool or system that uses this method can therefore analyze data lineage without interacting with database software or accessing additional metadata. In addition, in scenarios with stricter data security requirements, the method provides a data lineage analysis function with high cohesion and low coupling, reduces external dependencies, does not need to obtain metadata that does not appear in the data lineage, protects user metadata as far as possible, and does not expose metadata that does not appear in the SQL under analysis. The invention also relates to blockchain technology.
As shown in fig. 12, the present invention further provides a data-lineage-based SQL statement parsing system, which may be integrated in the computer device 110 and may specifically include a data set module 20, a traversal module 30, and a substitution module 40.
The data set module 20 is configured to obtain a directed acyclic graph, where the directed acyclic graph is derived by processing SQL statements;
the traversal module 30 is configured to traverse the layers of the directed acyclic graph that have SQL sub-queries, obtain the layers in which a substitute symbol appears, and take the first such layer encountered during traversal as the current layer;
the substitution module 40 is configured to obtain the field corresponding to the substitute symbol in the current layer according to the fields of the layer preceding the current layer; the substitution module then continues to obtain the remaining layers marked by the traversal module as containing substitute symbols, and replaces those symbols until all substitute symbols in the directed acyclic graph have been resolved.
In one embodiment, referring to FIG. 13, the data set module 20 includes a cleaning module 21 and a generating module 22. The cleaning module 21 extracts regularized SQL statements from a script file containing SQL code, thereby completing the cleaning of the SQL statements; the generating module 22 is configured to perform lexical analysis on the regularized SQL statements to generate an abstract syntax tree, and to generate a directed acyclic graph from the abstract syntax tree.
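As a rough illustration of the pipeline from lexical analysis to a layered graph, the sketch below tokenizes a statement, groups tokens into nested sub-query nodes, and emits edges from each sub-query to the query that reads it. It is not the generating module's actual algorithm: the names `tokenize`, `Query`, `parse`, and `to_edges` are hypothetical, the tokenizer ignores string literals and comments, and the nesting tree stands in for a full abstract syntax tree.

```python
import re
from dataclasses import dataclass, field


def tokenize(sql):
    """Split SQL into identifiers/keywords, numbers, and single punctuation marks."""
    return re.findall(r"[A-Za-z_][\w.]*|\d+|[^\s\w]", sql)


@dataclass
class Query:
    tokens: list = field(default_factory=list)    # tokens of this query level
    children: list = field(default_factory=list)  # nested sub-queries


def parse(tokens, pos=0):
    """Build a Query node starting at pos; return (node, next position)."""
    node = Query()
    depth = 0  # parentheses that are not sub-queries, e.g. function calls
    while pos < len(tokens):
        tok = tokens[pos]
        if tok == "(" and pos + 1 < len(tokens) and tokens[pos + 1].upper() == "SELECT":
            child, pos = parse(tokens, pos + 1)
            node.children.append(child)
            node.tokens.append("<subquery>")
            continue
        if tok == "(":
            depth += 1
        elif tok == ")":
            if depth == 0:
                return node, pos + 1  # this ")" closes the current sub-query
            depth -= 1
        node.tokens.append(tok)
        pos += 1
    return node, pos


def to_edges(root):
    """Number the nodes in pre-order and return (nodes, child -> parent edges)."""
    nodes, edges = [], []

    def visit(node):
        idx = len(nodes)
        nodes.append(node)
        for child in node.children:
            edges.append((visit(child), idx))
        return idx

    visit(root)
    return nodes, edges
```

Each edge points from an inner sub-query to the query that consumes it, so the edges form a graph without cycles whose nodes play the role of the layers discussed in this document.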
In one embodiment, referring to fig. 14, the cleaning module 21 includes a searching module 211 and a rule module 212. The searching module 211 is configured to obtain a script file containing SQL code and to search for the flag bits of the SQL code; the rule module 212 is configured to use the flag bits to filter out irrelevant content in the script file and retain the regularized SQL statements.
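The cleaning step can be illustrated with a small filter. This document does not say what the flag bits are, so the sketch assumes, purely for illustration, that a statement starts at one of a few SQL keywords and ends at the next semicolon. The pattern `SQL_STATEMENT` and the function `extract_sql` are hypothetical names.

```python
import re

# Assumed flag bits: a leading SQL keyword marks the start of a statement
# and the next semicolon marks its end. Real scripts may need other markers.
SQL_STATEMENT = re.compile(
    r"\b(?:SELECT|INSERT|CREATE|WITH)\b.*?;",
    re.IGNORECASE | re.DOTALL,
)


def extract_sql(script_text):
    """Return the SQL statements found in a script, one normalized line each."""
    statements = []
    for match in SQL_STATEMENT.finditer(script_text):
        # Collapse newlines and repeated spaces so each statement is regular.
        statements.append(" ".join(match.group(0).split()))
    return statements
```

Applied to a shell script that wraps a query in a command-line call, the filter drops the shell lines and keeps only the statement text. A keyword that appears in ordinary comments would be picked up as well, so a production filter would need stricter markers.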
In one embodiment, the traversal module 30 further implements: traversing the layers of the directed acyclic graph that have SQL sub-queries from top to bottom, or traversing them from bottom to top.
In one embodiment, the substitution module 40 further implements: if the layer preceding the current layer is a field layer, the field layer maps the corresponding fields to the sub-query of the current layer according to the sub-query name, and the substitute symbol of the current layer is replaced with the fields of the field layer.
In one embodiment, the substitution module 40 further implements: if the current layer is the first layer of the traversal, continuing to search for a layer without a substitute symbol, taking the first such layer as the field layer and the layer preceding the field layer as the substitute layer, and deducing the fields corresponding to the substitute symbol in the substitute layer from the field layer.
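The case in which the first traversed layer already contains a substitute symbol can be sketched as below. The sketch assumes the substitute symbol is `*`, that each layer is given as a plain list of unqualified field names in traversal order, and that only the single inference step described above is performed. The name `infer_substitute_layer` is hypothetical.

```python
SUBSTITUTE = "*"  # assumed form of the "substitute symbol"


def infer_substitute_layer(layers):
    """Infer the fields of the substitute layer from the first field layer.

    layers[i] is the field list of the i-th traversed layer. A new list of
    layers is returned; the input is left unchanged.
    """
    result = [list(fields) for fields in layers]
    if not result or SUBSTITUTE not in result[0]:
        return result  # this rule only applies when the first layer has a substitute
    # The first layer without a substitute symbol is the field layer.
    field_idx = next(
        (i for i, fields in enumerate(result) if SUBSTITUTE not in fields), None
    )
    if field_idx is None:
        return result  # no field layer exists, so nothing can be inferred
    field_layer = result[field_idx]
    substitute_idx = field_idx - 1  # the layer just before the field layer
    expanded = []
    for name in result[substitute_idx]:
        expanded.extend(field_layer if name == SUBSTITUTE else [name])
    # Drop duplicates while keeping the original order.
    result[substitute_idx] = list(dict.fromkeys(expanded))
    return result
```

Layers earlier than the substitute layer still contain their substitute symbols after this step. They would be handled by repeating the replacement on the remaining layers.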
In one embodiment, the parsing system further includes a display module (not shown) for displaying the parsing result; the display module may be the display of a desktop computer or a display device of other computer equipment.
Referring to fig. 15, fig. 15 is a schematic structural diagram of an apparatus according to an embodiment of the present invention. As shown in fig. 15, the apparatus 200 includes a processor 201 and a memory 202 coupled to the processor 201.
The memory 202 stores program instructions for implementing the data-lineage-based SQL statement parsing method according to any of the above embodiments.
The processor 201 is used to execute program instructions stored by the memory 202.
The processor 201 may also be referred to as a Central Processing Unit (CPU). The processor 201 may be an integrated circuit chip having signal processing capabilities. The processor 201 may also be a general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
Referring to fig. 16, fig. 16 is a schematic structural diagram of a storage medium according to an embodiment of the invention. The storage medium of the embodiment of the present invention stores a program file 301 capable of implementing all the methods described above, wherein the program file 301 may be stored in the storage medium in the form of a software product and includes several instructions that enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, as well as terminal devices such as a computer, a server, a mobile phone, or a tablet.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments. Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

Claims (9)

1. A data-lineage-based SQL statement parsing method, characterized by comprising the following steps:
step one, obtaining a directed acyclic graph, wherein the directed acyclic graph is derived by processing SQL statements;
step two, traversing the layers of the directed acyclic graph that have SQL sub-queries to obtain the layers in which a substitute symbol appears, and taking the first such layer encountered during traversal as the current layer;
step three, obtaining the field corresponding to the substitute symbol in the current layer according to the fields of the layer preceding the current layer;
step four, continuing to obtain the remaining layers in which substitute symbols appear, and replacing the substitute symbols until all substitute symbols in the directed acyclic graph have been resolved;
wherein obtaining the field corresponding to the substitute symbol in the current layer according to the fields of the layer preceding the current layer comprises: the layer preceding the current layer is a field layer, the field layer can map the corresponding fields to the sub-query of the current layer according to the sub-query name, and the substitute symbol of the current layer is replaced with the fields of the field layer.
2. The SQL statement parsing method according to claim 1, wherein deriving the directed acyclic graph by processing SQL statements comprises:
extracting regularized SQL statements from a script file containing SQL code, thereby completing the cleaning of the SQL statements;
and performing lexical analysis on the regularized SQL statements to generate an abstract syntax tree, and generating a directed acyclic graph from the abstract syntax tree.
3. The SQL statement parsing method according to claim 2, wherein extracting regularized SQL statements from a script file containing SQL code to complete the cleaning of the SQL statements comprises:
obtaining a script file containing SQL code, and searching for the flag bits of the SQL code;
and using the flag bits to filter out irrelevant content in the script file and retain the regularized SQL statements.
4. The SQL statement parsing method according to claim 1, wherein traversing the layers of the directed acyclic graph that have SQL sub-queries comprises: traversing the layers of the directed acyclic graph that have SQL sub-queries from top to bottom, or traversing them from bottom to top.
5. The SQL statement parsing method according to claim 1, wherein obtaining the field corresponding to the substitute symbol in the current layer according to the fields of the layer preceding the current layer comprises: if the current layer is the first layer of the traversal, continuing to search for a layer without a substitute symbol, taking the first such layer as the field layer and the layer preceding the field layer as the substitute layer, and deducing the fields corresponding to the substitute symbol in the substitute layer from the field layer.
6. The SQL statement parsing method according to claim 1, wherein after all substitute symbols in the directed acyclic graph have been resolved, the parsing result is uploaded to a blockchain, so that the blockchain encrypts and stores the parsing result.
7. A data-lineage-based SQL statement parsing system, characterized by comprising:
a data set module, configured to obtain a directed acyclic graph, wherein the directed acyclic graph is derived by processing SQL statements;
a traversal module, configured to traverse the layers of the directed acyclic graph that have SQL sub-queries to obtain the layers in which a substitute symbol appears, and to take the first such layer encountered during traversal as the current layer;
a substitution module, configured to obtain the field corresponding to the substitute symbol in the current layer according to the fields of the layer preceding the current layer; the substitution module continues to obtain the remaining layers marked by the traversal module as containing substitute symbols, and replaces the substitute symbols until all substitute symbols in the directed acyclic graph have been resolved; wherein obtaining the field corresponding to the substitute symbol in the current layer according to the fields of the layer preceding the current layer comprises: the layer preceding the current layer is a field layer, the field layer can map the corresponding fields to the sub-query of the current layer according to the sub-query name, and the substitute symbol of the current layer is replaced with the fields of the field layer.
8. A computer device comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the SQL statement parsing method according to any one of claims 1 to 6.
9. A storage medium storing a program file capable of implementing the SQL statement parsing method according to any one of claims 1 to 6.
CN202011134553.8A 2020-10-21 2020-10-21 SQL statement parsing method, system, computer device and storage medium Active CN112256721B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011134553.8A CN112256721B (en) 2020-10-21 2020-10-21 SQL statement parsing method, system, computer device and storage medium
PCT/CN2020/135735 WO2021179722A1 (en) 2020-10-21 2020-12-11 Sql statement parsing method and system, and computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011134553.8A CN112256721B (en) 2020-10-21 2020-10-21 SQL statement parsing method, system, computer device and storage medium

Publications (2)

Publication Number Publication Date
CN112256721A CN112256721A (en) 2021-01-22
CN112256721B true CN112256721B (en) 2021-08-17

Family

ID=74264521

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011134553.8A Active CN112256721B (en) 2020-10-21 2020-10-21 SQL statement parsing method, system, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN112256721B (en)
WO (1) WO2021179722A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114911785A (en) * 2022-05-16 2022-08-16 北京航空航天大学 Data blood reason management method and device and electronic equipment
CN115080599B (en) * 2022-07-25 2022-11-25 成都烽顺科技有限公司 Database query SQL field blood relationship generation method
CN115237936B (en) * 2022-09-14 2024-04-05 北京海致星图科技有限公司 Method, device, storage medium and equipment for detecting fields in SQL (structured query language) statement
CN115544065B (en) * 2022-11-28 2023-02-28 北京数语科技有限公司 Data blood relationship discovery method, system, equipment and storage medium
CN115563150B (en) * 2022-12-02 2023-04-18 浙江大华技术股份有限公司 Method, equipment and storage medium for mapping Hive SQL (structured query language) and execution engine DAG (direct current)
CN116303370B (en) * 2023-05-17 2023-08-15 建信金融科技有限责任公司 Script blood margin analysis method, script blood margin analysis device, storage medium, script blood margin analysis equipment and script blood margin analysis product
CN116541887B (en) * 2023-07-07 2023-09-15 云启智慧科技有限公司 Data security protection method for big data platform

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109582660A (en) * 2018-12-06 2019-04-05 深圳前海微众银行股份有限公司 Data consanguinity analysis method, apparatus, equipment, system and readable storage medium storing program for executing
CN111125758A (en) * 2019-12-19 2020-05-08 北京安华金和科技有限公司 Dynamic desensitization method based on full syntax tree analysis
CN111538743A (en) * 2020-04-22 2020-08-14 电子科技大学 SQL-based data blood relationship analysis method and system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130091266A1 (en) * 2011-10-05 2013-04-11 Ajit Bhave System for organizing and fast searching of massive amounts of data
CN103186541B (en) * 2011-12-27 2016-08-24 阿里巴巴集团控股有限公司 A kind of mapping relations generate method and device
CN107239458B (en) * 2016-03-28 2021-01-29 阿里巴巴集团控股有限公司 Method and device for calculating development object relationship based on big data
CN109033109B (en) * 2017-06-09 2020-11-27 杭州海康威视数字技术股份有限公司 Data processing method and system
CN109325078A (en) * 2018-09-18 2019-02-12 拉扎斯网络科技(上海)有限公司 Method and device is determined based on the data blood relationship of structured data

Also Published As

Publication number Publication date
CN112256721A (en) 2021-01-22
WO2021179722A1 (en) 2021-09-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant