CN107644073A - A kind of field consanguinity analysis method, system and device based on depth-first traversal - Google Patents

A field consanguinity analysis method, system and device based on depth-first traversal

Info

Publication number
CN107644073A
Authority
CN
China
Prior art keywords
traversal
depth
field
syntax tree
abstract syntax
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710842320.5A
Other languages
Chinese (zh)
Inventor
陈乐华
涂继来
黄晓晖
陈星�
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Idatatech Co Ltd
Original Assignee
Guangdong Idatatech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a field consanguinity (lineage) analysis method, system and device based on depth-first traversal. The method comprises: converting the SQL statement to be analyzed into an abstract syntax tree; performing a depth-first traversal of the generated abstract syntax tree; and verifying and parsing the result of the depth-first traversal to obtain the lineage relationships between fields in the SQL statement. The system comprises a conversion processing module, a depth-first traversal module and a verification parsing module. The device comprises a memory and a processor. By analyzing the lineage relationships between fields through depth-first traversal, the invention reduces the complexity of the analysis and improves its precision. In addition, the method processes SQL statements directly and does not depend on an existing metadata system, which further reduces the complexity of association analysis and improves compatibility. The invention can be widely applied in the field of association analysis.

Description

A field consanguinity analysis method, system and device based on depth-first traversal
Technical field
The present invention relates to the field of association analysis, and in particular to a field consanguinity analysis method, system and device based on depth-first traversal.
Background
With the innovation of land tax business systems and the rapid progress of integration, the variety of products has expanded rapidly and the relationships between products have become increasingly complex. At the same time, because business requirements in the Internet era change quickly, the workload and complexity of system development, testing, and operation and maintenance have increased substantially. At present, the main reasons for the high complexity of land tax information systems are as follows. First, the system scale grows as products are extended and expanded, producing a huge body of program and software resources; the system may even be split into multiple subsystems that are developed and maintained by multiple organizations. Second, the early land tax core system architecture used process-oriented design techniques, so isolation and encapsulation between program modules are relatively poor and coupling between subsystems is high. Third, business processes change often and implementation cycles are short, so core code is modified frequently. For these reasons, accurately identifying the association relationships between software entities is essential to development, testing, workload assessment and quality management.
Software entities include software resources such as programs, databases, files and common components, together with the structure definitions of their communication interfaces. Association analysis of software entities can be divided into two levels: the entity level and the data-field level. Entity-level association refers to the coupling and call relationships between software entities. Data-field-level association refers to coupling relationships, such as mapping, computation and transfer, between the data fields of one software entity and the data fields of other software entities. The former is simple to implement and efficient to run, but its analysis precision is low and it offers limited guidance for research and development. The latter works from program source code (or intermediate code) and, through semantic analysis of program instructions, identifies the mapping and transfer relationships between the data fields of related software entities. Its analysis precision is high, and its results can be used to create impact analysis views and business logic views of a program, which helps design, testing, and operation and maintenance staff assess the details of how software entities affect one another and determine the scope of program changes. It therefore has broader application prospects and higher application value, but it is relatively complex to implement.
In addition, existing field-level consanguinity analysis must be paired with a specific metadata system. It has poor compatibility and depends heavily on a particular system or platform, so existing field-level consanguinity analysis methods cannot be widely applied in the field of association analysis.
Summary of the invention
To solve the above technical problems, the first object of the present invention is to provide a field consanguinity analysis method based on depth-first traversal that has low complexity, high analysis precision and good compatibility.
The second object of the present invention is to provide a field consanguinity analysis system based on depth-first traversal that has low complexity, high analysis precision and good compatibility.
The third object of the present invention is to provide a field consanguinity analysis device based on depth-first traversal that has low complexity, high analysis precision and good compatibility.
The first technical solution adopted by the present invention is:
A field consanguinity analysis method based on depth-first traversal, comprising the following steps:
Converting the SQL statement to be analyzed to generate an abstract syntax tree;
Performing a depth-first traversal of the generated abstract syntax tree;
Verifying and parsing the result of the depth-first traversal to obtain the lineage relationships between fields in the SQL statement.
Further, the step of converting the SQL statement to be analyzed to generate an abstract syntax tree comprises the following steps:
Performing syntax parsing on the SQL statement to generate an abstract syntax tree;
Traversing the nodes of the abstract syntax tree to generate query blocks.
Further, the step of traversing the nodes of the abstract syntax tree to generate query blocks comprises the following steps:
S1. Create a parent query block and take it as the current query block;
S2. Determine whether the node in the abstract syntax tree is TOK_FROM; if so, save the syntax fragment for the node's table name into the aliasToTabs attribute of the current query block; otherwise, perform step S3;
S3. Determine whether the node in the abstract syntax tree is TOK_DESTINATION; if so, save the syntax fragment for the node's output destination into the nameToDest attribute of the current query block; otherwise, perform step S4;
S4. Determine whether the node in the abstract syntax tree is TOK_SELECT; if so, save the syntax fragments of the node's query expressions into the destToSelExpr, destToAggregationExprs and destToDistinctFuncExprs attributes of the current query block; otherwise, perform step S5;
S5. Determine whether the node in the abstract syntax tree is TOK_WHERE; if so, save the syntax fragment for the node's specified condition into the destToWhereExpr attribute of the current query block; otherwise, perform step S6;
S6. Determine whether the node in the abstract syntax tree is TOK_INSERT; if so, create a subquery block, take the subquery block as the current query block and return to step S2; otherwise, perform step S7;
S7. Output the parent query block and the subquery blocks.
Further, the step of outputting the parent query block and the subquery blocks comprises the following steps:
Performing a preorder traversal of the parent query block and the subquery blocks;
Performing a further preorder traversal of the result of that traversal to generate an intermediate table;
Determining whether the generated intermediate table meets a set condition; if so, parsing the fields in the intermediate table and then performing the next step; otherwise, performing the next step directly;
Performing information mining on the parent query block and the subquery blocks.
Further, the step of performing a depth-first traversal of the generated abstract syntax tree comprises the following steps:
Traversing the abstract syntax tree and storing the metadata of tables and the metadata of fields in a state machine;
Traversing the abstract syntax tree to obtain Union subqueries, Join subqueries and SubSelect subqueries, and pushing the obtained subqueries onto a stack in their execution order;
Popping the stacked subqueries, performing lineage analysis on them, and storing the analysis results in the state machine.
Further, the step of popping the stacked subqueries, performing lineage analysis on them, and storing the analysis results in the state machine is specifically:
The state machine determines the statement type of the subquery currently popped:
If the current subquery is a Union subquery, field association is performed on the subquery and the result is stored in the state machine;
If the current subquery is a Join subquery or a SubSelect subquery, the subquery is stored in the state machine;
If the current subquery is none of a Union subquery, a Join subquery or a SubSelect subquery, consanguinity analysis is performed on the current subquery according to the tables and fields stored in the state machine.
Further, the step of verifying and parsing the result of the depth-first traversal to obtain the lineage relationships between fields in the SQL statement comprises the following steps:
Inputting a first test case;
Parsing the first test case both manually and by machine;
Determining whether the manual parsing result is consistent with the machine parsing result; if so, performing the next step; otherwise, inputting a new test case as the first test case and returning to the step of parsing the first test case both manually and by machine;
Inputting a second test case and parsing it by machine;
Analyzing the lineage relationship between the first test case and the second test case according to the machine parsing results of the two test cases;
Verifying the machine parsing result of the second test case to obtain the lineage relationships between fields in the SQL statement.
The second technical solution adopted by the present invention is:
A field consanguinity analysis system based on depth-first traversal, comprising:
A conversion processing module, configured to convert the SQL statement to be analyzed to generate an abstract syntax tree;
A depth-first traversal module, configured to perform a depth-first traversal of the generated abstract syntax tree;
A verification parsing module, configured to verify and parse the result of the depth-first traversal to obtain the lineage relationships between fields in the SQL statement.
Further, the depth-first traversal module comprises:
A metadata storage module, configured to traverse the abstract syntax tree and store the metadata of tables and the metadata of fields in a state machine;
A stack storage module, configured to traverse the abstract syntax tree, obtain Union subqueries, Join subqueries and SubSelect subqueries, and push the subqueries onto a stack in their execution order;
A consanguinity analysis module, configured to pop the stacked subqueries, perform lineage analysis on them, and store the analysis results in the state machine.
The third technical solution adopted by the present invention is:
A field consanguinity analysis device based on depth-first traversal, comprising:
A memory, configured to store a program;
A processor, configured to execute the program in order to: convert the SQL statement to be analyzed to generate an abstract syntax tree;
Perform a depth-first traversal of the generated abstract syntax tree;
Verify and parse the result of the depth-first traversal to obtain the lineage relationships between fields in the SQL statement.
The beneficial effect of the method of the present invention is as follows. The method performs a depth-first traversal of the abstract syntax tree and verifies and parses the result of the traversal to obtain the lineage relationships between fields in the SQL statement. Compared with existing association analysis methods based on the software entity level, it reduces the complexity of the analysis and improves its precision. In addition, the method processes SQL statements directly and does not depend on an existing metadata system, which further reduces the complexity of association analysis and improves compatibility.
The beneficial effect of the system of the present invention is as follows. The system performs a depth-first traversal of the abstract syntax tree through the depth-first traversal module, and verifies and parses the result of the traversal through the verification parsing module, to obtain the lineage relationships between fields in the SQL statement. Compared with existing association analysis systems based on the software entity level, it reduces the complexity of the analysis and improves its precision. In addition, the system processes SQL statements directly and does not depend on an existing metadata system, which further reduces the complexity of association analysis and improves the compatibility of the system.
The beneficial effect of the device of the present invention is as follows. The device performs a depth-first traversal of the abstract syntax tree through the processor and verifies and parses the result of the traversal to obtain the lineage relationships between fields in the SQL statement. Compared with existing association analysis systems based on the software entity level, it reduces the complexity of the analysis and improves its precision. In addition, the device processes SQL statements directly and does not depend on an existing metadata system, which further reduces the complexity of association analysis and improves the compatibility of the device.
Brief description of the drawings
Fig. 1 is a flowchart of the steps of the field consanguinity analysis method based on depth-first traversal of the present invention;
Fig. 2 is a block diagram of the program modules of the field consanguinity analysis system based on depth-first traversal of the present invention;
Fig. 3 is a flowchart of the steps of the verification parsing process in Embodiment One of the present invention.
Detailed description of the embodiments
Referring to Fig. 1, a field consanguinity analysis method based on depth-first traversal comprises the following steps:
Converting the SQL statement to be analyzed to generate an abstract syntax tree;
Performing a depth-first traversal of the generated abstract syntax tree;
Verifying and parsing the result of the depth-first traversal to obtain the lineage relationships between fields in the SQL statement.
As a further preferred embodiment, the step of converting the SQL statement to be analyzed to generate an abstract syntax tree comprises the following steps:
Performing syntax parsing on the SQL statement to generate an abstract syntax tree;
Traversing the nodes of the abstract syntax tree to generate query blocks.
Here, a query block corresponds to a query statement in the abstract syntax tree and is used to hold the syntax fragments of the corresponding nodes.
As a further preferred embodiment, the step of traversing the nodes of the abstract syntax tree to generate query blocks comprises the following steps:
S1. Create a parent query block and take it as the current query block;
S2. Determine whether the node in the abstract syntax tree is TOK_FROM; if so, save the syntax fragment for the node's table name into the aliasToTabs attribute of the current query block; otherwise, perform step S3;
S3. Determine whether the node in the abstract syntax tree is TOK_DESTINATION; if so, save the syntax fragment for the node's output destination into the nameToDest attribute of the current query block; otherwise, perform step S4;
S4. Determine whether the node in the abstract syntax tree is TOK_SELECT; if so, save the syntax fragments of the node's query expressions into the destToSelExpr, destToAggregationExprs and destToDistinctFuncExprs attributes of the current query block; otherwise, perform step S5;
S5. Determine whether the node in the abstract syntax tree is TOK_WHERE; if so, save the syntax fragment for the node's specified condition into the destToWhereExpr attribute of the current query block; otherwise, perform step S6;
S6. Determine whether the node in the abstract syntax tree is TOK_INSERT; if so, create a subquery block, take the subquery block as the current query block and return to step S2; otherwise, perform step S7;
S7. Output the parent query block and the subquery blocks.
In the SQL syntax, TOK_FROM, TOK_DESTINATION, TOK_SELECT, TOK_WHERE and TOK_INSERT are the corresponding node labels in the abstract syntax tree; aliasToTabs, nameToDest, destToSelExpr, destToAggregationExprs, destToDistinctFuncExprs and destToWhereExpr are the corresponding attribute identifiers in a query block.
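Steps S1 to S7 can be illustrated with a minimal Python sketch. It is not the patented implementation: the Node class, the sub_blocks attribute and the TOK_QUERY root label are hypothetical, the tree is built by hand rather than by a parser, and only destToSelExpr is modelled for step S4.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy AST node (hypothetical): a token label, a text payload, child nodes."""
    label: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


@dataclass
class QueryBlock:
    """Attribute names follow the text; sub_blocks is an assumption of this sketch."""
    aliasToTabs: list[str] = field(default_factory=list)
    nameToDest: list[str] = field(default_factory=list)
    destToSelExpr: list[str] = field(default_factory=list)
    destToWhereExpr: list[str] = field(default_factory=list)
    sub_blocks: list["QueryBlock"] = field(default_factory=list)


def build_query_blocks(root: Node) -> QueryBlock:
    parent = QueryBlock()                        # S1: parent block is current

    def visit(node: Node, current: QueryBlock) -> None:
        if node.label == "TOK_FROM":             # S2: table name fragment
            current.aliasToTabs.append(node.text)
        elif node.label == "TOK_DESTINATION":    # S3: output destination fragment
            current.nameToDest.append(node.text)
        elif node.label == "TOK_SELECT":         # S4: query expression fragment
            current.destToSelExpr.append(node.text)
        elif node.label == "TOK_WHERE":          # S5: condition fragment
            current.destToWhereExpr.append(node.text)
        elif node.label == "TOK_INSERT":         # S6: new subquery block becomes current
            sub = QueryBlock()
            current.sub_blocks.append(sub)
            current = sub
        for child in node.children:              # "return to S2" for the child nodes
            visit(child, current)

    visit(root, parent)
    return parent                                # S7: output parent and sub-blocks


ast = Node("TOK_QUERY", children=[
    Node("TOK_FROM", "orders o"),
    Node("TOK_INSERT", children=[
        Node("TOK_DESTINATION", "report"),
        Node("TOK_SELECT", "o.amount"),
        Node("TOK_WHERE", "o.amount > 0"),
    ]),
])
blocks = build_query_blocks(ast)
```

In this sketch the table name ends up on the parent block, while the destination, select expression and condition end up on the subquery block created at the TOK_INSERT node.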
As a further preferred embodiment, the step of outputting the parent query block and the subquery blocks comprises the following steps:
Performing a preorder traversal of the parent query block and the subquery blocks;
Performing a further preorder traversal of the result of that traversal to generate an intermediate table;
Determining whether the generated intermediate table meets a set condition; if so, parsing the fields in the intermediate table and then performing the next step; otherwise, performing the next step directly;
Performing information mining on the parent query block and the subquery blocks.
In the SQL syntax, a table is a two-dimensional table used to store a data set, and its keyword is TABLE. The intermediate table in this embodiment is a table generated by the preorder traversal and used in the subsequent determination. Performing information mining on the parent query block and the subquery blocks means extracting information such as the identifiers and attributes stored in the respective query blocks.
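The output step can be sketched as follows. The text does not specify the set condition or the layout of the intermediate table, so both are assumptions here: blocks are plain dictionaries, the intermediate table is a list of rows, and the condition is a caller-supplied function.

```python
from typing import Callable, Iterator

# Hypothetical block shape: {"name": str, "fields": [str], "children": [block, ...]}
Block = dict
Row = tuple[int, str, list[str]]


def preorder(block: Block, depth: int = 0) -> Iterator[tuple[int, Block]]:
    """Visit a block before its sub-blocks (preorder)."""
    yield depth, block
    for child in block["children"]:
        yield from preorder(child, depth + 1)


def build_intermediate_table(root: Block) -> list[Row]:
    """One row per query block: nesting depth, block name, field references."""
    return [(depth, b["name"], b["fields"]) for depth, b in preorder(root)]


def parse_fields(table: list[Row]) -> list[tuple[str, str]]:
    """Strip the alias from each 'alias.column' reference, keeping the block name."""
    return [(name, ref.rpartition(".")[2]) for _, name, refs in table for ref in refs]


def export_blocks(root: Block, condition: Callable[[list[Row]], bool]):
    table = build_intermediate_table(root)
    # Fields are parsed only when the (assumed) condition holds; otherwise skipped.
    parsed = parse_fields(table) if condition(table) else []
    return table, parsed


root = {
    "name": "parent",
    "fields": ["o.amount"],
    "children": [{"name": "sub1", "fields": ["c.id", "c.name"], "children": []}],
}
table, parsed = export_blocks(root, condition=lambda t: len(t) > 1)
```

Keeping the condition as a parameter leaves the choice of criterion open, which matches the fact that the text only says that a condition is set.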
As a further preferred embodiment, the step of performing a depth-first traversal of the generated abstract syntax tree comprises the following steps:
Traversing the abstract syntax tree and storing the metadata of tables and the metadata of fields in a state machine;
Traversing the abstract syntax tree to obtain Union subqueries, Join subqueries and SubSelect subqueries, and pushing the obtained subqueries onto a stack in their execution order;
Popping the stacked subqueries, performing lineage analysis on them, and storing the analysis results in the state machine.
Here, the state machine is used to store the data of the relevant tables and fields. In the SQL syntax, Union subqueries, Join subqueries and SubSelect subqueries denote different types of query statement.
By adding a state machine to cache the information produced during the depth-first traversal, the present invention can perform consanguinity analysis not only on operations such as create, insert and update but also on change operations such as alter, which makes the analysis more comprehensive.
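One way to picture how cached state lets an alter operation take part in the analysis is the sketch below. The class layout, the method names and the choice of a column rename as the alter operation are all assumptions made for illustration; the text does not describe the state machine's internal structure.

```python
class StateMachine:
    """Hypothetical cache of table metadata and field lineage."""

    def __init__(self) -> None:
        self.tables: dict[str, list[str]] = {}     # table -> column names
        self.lineage: dict[str, set[str]] = {}     # "table.col" -> source fields

    def create(self, table: str, columns: list[str]) -> None:
        self.tables[table] = list(columns)

    def insert_select(self, target: str, mapping: dict[str, str]) -> None:
        """mapping: target column -> fully qualified source field."""
        for column, source in mapping.items():
            self.lineage.setdefault(f"{target}.{column}", set()).add(source)

    def rename_column(self, table: str, old: str, new: str) -> None:
        """An alter-style change: keep the metadata and recorded lineage consistent."""
        columns = self.tables[table]
        columns[columns.index(old)] = new
        old_key, new_key = f"{table}.{old}", f"{table}.{new}"
        if old_key in self.lineage:
            self.lineage[new_key] = self.lineage.pop(old_key)
        for sources in self.lineage.values():
            if old_key in sources:
                sources.remove(old_key)
                sources.add(new_key)


sm = StateMachine()
sm.create("src", ["amt"])
sm.create("rpt", ["total"])
sm.insert_select("rpt", {"total": "src.amt"})
sm.rename_column("src", "amt", "amount")
```

Because earlier results stay in the cache, the rename can be applied to lineage that was recorded before the alter statement was seen.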
As a further preferred embodiment, the step of popping the stacked subqueries, performing lineage analysis on them, and storing the analysis results in the state machine is specifically:
The state machine determines the statement type of the subquery currently popped:
If the current subquery is a Union subquery, field association is performed on the subquery and the result is stored in the state machine;
If the current subquery is a Join subquery or a SubSelect subquery, the subquery is stored in the state machine;
If the current subquery is none of a Union subquery, a Join subquery or a SubSelect subquery, consanguinity analysis is performed on the current subquery according to the tables and fields stored in the state machine.
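The three-way dispatch above might look roughly like the following sketch. The SubQuery and StateMachine classes are hypothetical, Union field association is modelled as positional matching of branch columns, and the stack is assumed to be arranged so that the statement that executes first is popped first.

```python
from dataclasses import dataclass, field


@dataclass
class SubQuery:
    kind: str                     # "union", "join", "subselect", or anything else
    name: str                     # alias of the subquery, or the target table
    fields: dict[str, list[str]] = field(default_factory=dict)  # output -> source refs
    branches: list[list[str]] = field(default_factory=list)     # union only


@dataclass
class StateMachine:
    stored: dict[str, dict[str, list[str]]] = field(default_factory=dict)
    lineage: dict[str, set[str]] = field(default_factory=dict)

    def resolve(self, ref: str) -> set[str]:
        """Expand 'alias.col' through stored subqueries down to base fields."""
        alias, _, col = ref.partition(".")
        mapping = self.stored.get(alias, {})
        if col not in mapping:
            return {ref}
        resolved: set[str] = set()
        for src in mapping[col]:
            resolved |= self.resolve(src)
        return resolved


def analyse(stack: list[SubQuery], state: StateMachine) -> None:
    while stack:
        sq = stack.pop()
        if sq.kind == "union":
            # Field association: the i-th column of every branch feeds one output column.
            assoc = {}
            for i, ref in enumerate(sq.branches[0]):
                assoc[ref.rpartition(".")[2]] = [branch[i] for branch in sq.branches]
            state.stored[sq.name] = assoc
        elif sq.kind in ("join", "subselect"):
            state.stored[sq.name] = sq.fields       # stored as-is for later lookups
        else:
            for target, sources in sq.fields.items():
                traced: set[str] = set()
                for src in sources:
                    traced |= state.resolve(src)
                state.lineage[f"{sq.name}.{target}"] = traced


stack = [
    SubQuery("insert", "report", {"total": ["u.amount"], "who": ["j.name"]}),
    SubQuery("join", "j", {"name": ["c.name"]}),
    SubQuery("union", "u", branches=[["a.amount"], ["b.amt"]]),
]
state = StateMachine()
analyse(stack, state)
```

In this example report.total traces back to both Union branches, and report.who traces through the Join subquery to c.name.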
Referring to Fig. 3, as a further preferred embodiment, the step of verifying and parsing the result of the depth-first traversal to obtain the lineage relationships between fields in the SQL statement comprises the following steps:
Inputting a first test case;
Parsing the first test case both manually and by machine;
Determining whether the manual parsing result is consistent with the machine parsing result; if so, performing the next step; otherwise, inputting a new test case as the first test case and returning to the step of parsing the first test case both manually and by machine;
Inputting a second test case and parsing it by machine;
Analyzing the lineage relationship between the first test case and the second test case according to the machine parsing results of the two test cases;
Verifying the machine parsing result of the second test case to obtain the lineage relationships between fields in the SQL statement.
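The verification flow can be sketched as below. The machine parser is replaced by a lookup table of canned results (MACHINE_RESULTS), and the way the two test cases are related (tracing the second result through the targets of the first) is an assumption; the text does not define either in detail.

```python
from typing import Callable, Iterable, Optional

Lineage = dict[str, set[str]]     # target field -> source fields

Q0 = "INSERT INTO stg SELECT x FROM src"
Q1 = "INSERT INTO stg SELECT amt FROM src"
Q2 = "INSERT INTO rpt SELECT amt AS total FROM stg"

# Stand-in for the machine parser: canned results, one of them deliberately wrong.
MACHINE_RESULTS: dict[str, Lineage] = {
    Q0: {"stg.x": {"src.y"}},
    Q1: {"stg.amt": {"src.amt"}},
    Q2: {"rpt.total": {"stg.amt"}},
}


def pick_first_case(
    cases: Iterable[tuple[str, Lineage]],
    machine_parse: Callable[[str], Lineage],
) -> Optional[str]:
    """Try candidate first test cases until manual and machine results agree."""
    for sql, manual in cases:
        if machine_parse(sql) == manual:
            return sql
    return None


def link(first: Lineage, second: Lineage) -> Lineage:
    """Trace the second result's sources through the first result's targets."""
    linked: Lineage = {}
    for target, sources in second.items():
        traced: set[str] = set()
        for src in sources:
            traced |= first.get(src, {src})
        linked[target] = traced
    return linked


candidates = [(Q0, {"stg.x": {"src.x"}}), (Q1, {"stg.amt": {"src.amt"}})]
first_sql = pick_first_case(candidates, MACHINE_RESULTS.__getitem__)
linked = link(MACHINE_RESULTS[first_sql], MACHINE_RESULTS[Q2])
```

Here the first candidate is rejected because the manual and machine results differ, the second is accepted, and the second test case is then traced back to the base table.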
Referring to Fig. 2, a field consanguinity analysis system based on depth-first traversal according to the present invention, corresponding to the method of Fig. 1, comprises:
A conversion processing module, configured to convert the SQL statement to be analyzed to generate an abstract syntax tree;
A depth-first traversal module, configured to perform a depth-first traversal of the generated abstract syntax tree;
A verification parsing module, configured to verify and parse the result of the depth-first traversal to obtain the lineage relationships between fields in the SQL statement.
As a further preferred embodiment, the depth-first traversal module comprises:
A metadata storage module, configured to traverse the abstract syntax tree and store the metadata of tables and the metadata of fields in a state machine;
A stack storage module, configured to traverse the abstract syntax tree, obtain Union subqueries, Join subqueries and SubSelect subqueries, and push the subqueries onto a stack in their execution order;
A consanguinity analysis module, configured to pop the stacked subqueries, perform lineage analysis on them, and store the analysis results in the state machine.
Corresponding to the method of Fig. 1, a field consanguinity analysis device based on depth-first traversal according to the present invention comprises:
A memory, configured to store a program;
A processor, configured to execute the program in order to: convert the SQL statement to be analyzed to generate an abstract syntax tree;
Perform a depth-first traversal of the generated abstract syntax tree;
Verify and parse the result of the depth-first traversal to obtain the lineage relationships between fields in the SQL statement.
The present invention is described in further detail below with reference to the drawings and a specific embodiment.
Embodiment One
In view of the low analysis precision of existing entity-level association analysis and the complex implementation and poor compatibility of existing field-level association analysis, the present invention proposes a field consanguinity analysis method, system and device based on depth-first traversal. The invention analyzes the lineage relationships between fields by depth-first traversal, which greatly reduces the complexity of the analysis and improves its precision. At the same time, the invention does not depend on an existing metadata system and performs consanguinity analysis solely by parsing SQL statements, which greatly improves compatibility and makes it applicable to any platform.
The present invention is described in detail below in two parts: terminology and the specific workflow.
(1) Terminology
antlr: antlr is an open-source grammar analyzer that can automatically generate a syntax tree from its input and display it visually. antlr combines a lexical analyzer, a syntax analyzer and a tree analyzer. It lets us define lexical rules for recognizing character streams and parsing rules for interpreting token streams. antlr then automatically generates the corresponding lexer and parser from a grammar file supplied by the user, and the user can use the generated lexer and parser to compile input text and convert it into other forms (such as an AST: Abstract Syntax Tree).
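To make the division of labour between lexical rules and parsing rules concrete, here is a small hand-written sketch in Python. It does not use antlr or its API; the tiny grammar (SELECT columns FROM table), the token names and the tuple-shaped tree are assumptions chosen only to show a character stream becoming a token stream and then a tree.

```python
import re

# Lexical rules: keywords, dotted identifiers and commas (an assumed toy grammar).
TOKEN_RE = re.compile(
    r"\s*(?:(?P<KW>SELECT|FROM)\b|(?P<ID>[A-Za-z_][\w.]*)|(?P<COMMA>,))",
    re.IGNORECASE,
)


def lex(sql: str) -> list[tuple[str, str]]:
    """Character stream -> token stream."""
    sql, pos, tokens = sql.strip(), 0, []
    while pos < len(sql):
        match = TOKEN_RE.match(sql, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        kind = match.lastgroup
        text = match.group(kind)
        tokens.append((kind, text.upper() if kind == "KW" else text))
        pos = match.end()
    return tokens


def parse(tokens: list[tuple[str, str]]):
    """Token stream -> tree. Rule: query := SELECT ID (',' ID)* FROM ID"""
    pos = 0

    def peek() -> tuple[str, str]:
        return tokens[pos] if pos < len(tokens) else ("EOF", "")

    def expect(kind: str, text: str = "") -> str:
        nonlocal pos
        got_kind, got_text = peek()
        if got_kind != kind or (text and got_text != text):
            raise SyntaxError(f"expected {text or kind}, got {got_text or got_kind}")
        pos += 1
        return got_text

    expect("KW", "SELECT")
    columns = [expect("ID")]
    while peek()[0] == "COMMA":
        expect("COMMA")
        columns.append(expect("ID"))
    expect("KW", "FROM")
    table = expect("ID")
    if peek()[0] != "EOF":
        raise SyntaxError("unexpected trailing input")
    return ("TOK_QUERY", ("TOK_FROM", table), ("TOK_SELECT", columns))


tokens = lex("select o.amount, o.id from orders")
tree = parse(tokens)
```

A generated lexer and parser play the same two roles for a full SQL grammar; the sketch only covers a single statement shape.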
(2) Specific workflow
Referring to Fig. 1, the specific workflow of the field consanguinity analysis method based on depth-first traversal of the present invention is as follows:
Step 1: Parsing process.
The parsing process comprises the following steps:
Inputting the SQL statement to be analyzed into the antlr parser;
The antlr parser converts the SQL statement into an abstract syntax tree according to specific lexical and syntax rules, performing lexical and syntax parsing on the SQL statement to be analyzed at the same time.
Here, the step in which the antlr parser converts the SQL statement into an abstract syntax tree according to specific lexical and syntax rules, while performing lexical and syntax parsing on the SQL statement to be analyzed, comprises the following steps:
Performing syntax parsing on the SQL statement to generate an abstract syntax tree;
Traversing the nodes of the abstract syntax tree to generate query blocks.
Specifically, the step of traversing the nodes of the abstract syntax tree to generate query blocks comprises the following steps:
S1. Create a parent query block and take it as the current query block;
S2. Determine whether the node in the abstract syntax tree is TOK_FROM; if so, save the syntax fragment for the node's table name into the aliasToTabs attribute of the current query block; otherwise, perform step S3;
S3. Determine whether the node in the abstract syntax tree is TOK_DESTINATION; if so, save the syntax fragment for the node's output destination into the nameToDest attribute of the current query block; otherwise, perform step S4;
S4. Determine whether the node in the abstract syntax tree is TOK_SELECT; if so, save the syntax fragments of the node's query expressions into the destToSelExpr, destToAggregationExprs and destToDistinctFuncExprs attributes of the current query block; otherwise, perform step S5;
S5. Determine whether the node in the abstract syntax tree is TOK_WHERE; if so, save the syntax fragment for the node's specified condition into the destToWhereExpr attribute of the current query block; otherwise, perform step S6;
S6. Determine whether the node in the abstract syntax tree is TOK_INSERT; if so, create a subquery block, take the subquery block as the current query block and return to step S2; otherwise, perform step S7;
S7. Output the parent query block and the subquery blocks.
The step of outputting the parent query block and the subquery blocks comprises the following steps:
Performing a preorder traversal of the parent query block and the subquery blocks;
Performing a further preorder traversal of the result of that traversal to generate an intermediate table;
Determining whether the generated intermediate table meets a set condition; if so, parsing the fields in the intermediate table and then performing the next step; otherwise, performing the next step directly;
Performing information mining on the parent query block and the subquery blocks.
Step 2: Depth-first traversal process.
The depth-first traversal process comprises the following steps:
Traversing the abstract syntax tree and storing the metadata of tables and the metadata of fields in a state machine;
Traversing the abstract syntax tree to obtain Union subqueries, Join subqueries and SubSelect subqueries, and pushing the obtained subqueries onto a stack in their execution order;
Popping the stacked subqueries, performing lineage analysis on them, and storing the analysis results in the state machine.
Here, the step of popping the stacked subqueries, performing lineage analysis on them, and storing the analysis results in the state machine is specifically:
The state machine determines the statement type of the subquery currently popped:
If the current subquery is a Union subquery, field association is performed on the subquery and the result is stored in the state machine;
If the current subquery is a Join subquery, the subquery is stored in the state machine;
If the current subquery is a SubSelect subquery, the subquery is stored in the state machine;
If the current subquery is none of a Union subquery, a Join subquery or a SubSelect subquery, consanguinity analysis is performed on the current subquery according to the tables and fields stored in the state machine.
The present invention by being parsed to SQL statement to be analyzed, obtain corresponding input/output list, input and output field with And corresponding treatment conditions, show so as to carry out analysis to the genetic connection between field.
The depth-first traversal process of the present invention is realized by traversing the abstract syntax tree of the SQL statement to be analyzed several times:
First, the tree is traversed and the metadata of the tables and fields obtained is stored in the state machine, with aliases resolved so that they do not affect the analysis;
Then the tree is traversed again to identify the subquery statements containing Union and Join (under the Oracle SQL standard, subquery statements containing SubSelect must also be identified), and the identified subquery statements are pushed onto the stack in execution order;
Finally, each subquery statement is popped and traversed independently, and the corresponding field associations are stored in the state machine. The state machine determines the scope of each subquery statement and then decides whether to merge or discard the field association. The result of every SQL statement analysis is retained in the state machine, so that each analysis can provide metadata and association information for later analyses.
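The two ideas in this passage, removing the influence of aliases and keeping earlier results available for later statements, can be illustrated with a small Python sketch. It is written under stated assumptions and is not taken from the patent: the class `LineageState` and its methods `record` and `origins` are hypothetical names, and each statement is assumed to have been reduced already to a target field, its source fields and an alias map.

```python
class LineageState:
    """Hypothetical state that survives across the analysis of several statements."""

    def __init__(self):
        # "table.field" -> set of "table.field" entries it is computed from
        self.direct = {}

    def record(self, aliases, target, sources):
        """Store one field association after replacing aliases with table names."""
        resolved = set()
        for source in sources:
            table, _, column = source.partition(".")
            resolved.add(f"{aliases.get(table, table)}.{column}")
        self.direct.setdefault(target, set()).update(resolved)

    def origins(self, field_name, _seen=None):
        """Trace a field back to the fields that have no recorded parents."""
        seen = set() if _seen is None else _seen
        if field_name in seen:
            return set()  # already visited on this trace; also guards against cycles
        seen.add(field_name)
        parents = self.direct.get(field_name)
        if not parents:
            return {field_name}
        result = set()
        for parent in parents:
            result |= self.origins(parent, seen)
        return result


state = LineageState()
# Statement 1 (assumed): INSERT INTO b SELECT x FROM a
state.record({}, "b.x", ["a.x"])
# Statement 2 (assumed): INSERT INTO c SELECT t.x FROM b t   -- "t" is an alias of b
state.record({"t": "b"}, "c.x", ["t.x"])
print(state.origins("c.x"))
```

Because the association recorded for the first statement is still in the state when the second statement is analyzed, the field of table c can be traced through table b back to table a.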
Referring to Fig. 3, step 3: the verification parsing process.
The verification parsing process specifically comprises the following steps:
A first test case is input;
The first test case is parsed both manually and by machine;
Whether the results of the manual parsing and the machine parsing are consistent is judged; if so, the next step is performed; otherwise, a new test case is input as the first test case and the process returns to the step of parsing the first test case both manually and by machine;
A second test case is input, and the second test case is parsed by machine;
According to the machine parsing result of the first test case and the machine parsing result of the second test case, the genetic connection between the first test case and the second test case is analyzed;
A verification operation is performed on the machine parsing result of the second test case to obtain the genetic connection between the fields in the SQL statement.
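The control flow of the verification steps above can be sketched as follows. This is a hedged illustration only: the functions `pick_trusted_case` and `verify`, the representation of a parsing result as a mapping from target field to a set of source fields, and the canned stand-in for the machine parser are all assumptions introduced here, and the final verification operation on the second result is not modeled because the text does not specify it.

```python
def pick_trusted_case(candidates, machine_parse):
    """Return the first (sql, result) whose machine result equals the manual one."""
    for sql, manual_result in candidates:
        machine_result = machine_parse(sql)
        if machine_result == manual_result:
            return sql, machine_result
    return None  # every candidate disagreed with its manual parsing


def verify(candidates, second_sql, machine_parse):
    """Machine-parse the second case and link it to the trusted first case."""
    trusted = pick_trusted_case(candidates, machine_parse)
    if trusted is None:
        raise ValueError("no test case where manual and machine parsing agree")
    _, first_result = trusted
    second_result = machine_parse(second_sql)
    # Fields of the second case whose sources are outputs of the first case.
    linked = {}
    for target, sources in second_result.items():
        shared = sources & set(first_result)
        if shared:
            linked[target] = shared
    return second_result, linked


# Stand-in for the machine parser: a canned lookup, purely for illustration.
canned = {
    "INSERT INTO b SELECT x FROM a": {"b.x": {"a.x"}},
    "INSERT INTO c SELECT x FROM b": {"c.x": {"b.x"}},
}
candidates = [("INSERT INTO b SELECT x FROM a", {"b.x": {"a.x"}})]
second_result, linked = verify(
    candidates, "INSERT INTO c SELECT x FROM b", canned.get
)
print(linked)
```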
In summary, the field consanguinity analysis method based on depth-first traversal of the present invention has the following advantages:
1) The present invention analyzes the genetic connection between fields by means of depth-first traversal, which, compared with existing association analysis methods based on the software entity level, reduces the complexity of the analysis and improves its precision.
2) The present invention parses SQL statements directly and does not depend on an existing metadata system, which further reduces the complexity of the association analysis and improves the compatibility of the system.
3) By introducing a state machine, the present invention can cache the information processed during association analysis, so that the result of each analysis can provide metadata and association information for later analyses.
4) In addition to the SQL92 standard, the present invention also supports other SQL standards such as MySQL and Oracle.
5) By introducing a state machine, the present invention can perform consanguinity analysis not only on operations such as create, insert and update but also on change operations such as alter, which makes the analysis more comprehensive.
The above describes the preferred embodiments of the present invention, but the present invention is not limited to these embodiments. Those skilled in the art can make various equivalent variations or substitutions without departing from the spirit of the present invention, and such equivalent variations or substitutions all fall within the scope defined by the claims of this application.

Claims (10)

  1. A field consanguinity analysis method based on depth-first traversal, characterized in that it comprises the following steps:
    Performing conversion processing on the SQL statement to be analyzed to generate an abstract syntax tree;
    Performing depth-first traversal on the generated abstract syntax tree;
    Performing verification parsing on the result of the depth-first traversal to obtain the genetic connection between the fields in the SQL statement.
  2. The field consanguinity analysis method based on depth-first traversal according to claim 1, characterized in that the step of performing conversion processing on the SQL statement to be analyzed to generate an abstract syntax tree comprises the following steps:
    Performing syntax parsing on the SQL statement to generate an abstract syntax tree;
    Traversing the nodes in the abstract syntax tree to generate query blocks.
  3. The field consanguinity analysis method based on depth-first traversal according to claim 2, characterized in that the step of traversing the nodes in the abstract syntax tree to generate query blocks comprises the following steps:
    S1, creating a parent query block and taking the parent query block as the current query block;
    S2, judging whether the node in the abstract syntax tree is TOK_FROM; if so, saving the syntax part of the table name of the node into the aliasToTabs attribute of the current query block; otherwise, performing step S3;
    S3, judging whether the node in the abstract syntax tree is TOK_DESTINATION; if so, saving the syntax part of the output target of the node into the nameToDest attribute of the current query block; otherwise, performing step S4;
    S4, judging whether the node in the abstract syntax tree is TOK_SELECT; if so, saving the syntax part of the query expression of the node into the destToSelExpr, destToAggregationExprs and destToDistinctFuncExprs attributes of the current query block; otherwise, performing step S5;
    S5, judging whether the node in the abstract syntax tree is TOK_WHERE; if so, saving the syntax part of the condition specified by the node into the destToWhereExpr attribute of the current query block; otherwise, performing step S6;
    S6, judging whether the node in the abstract syntax tree is TOK_INSERT; if so, creating a subquery block, taking the subquery block as the current query block and returning to step S2; otherwise, performing step S7;
    S7, outputting the parent query block and the subquery blocks.
  4. The field consanguinity analysis method based on depth-first traversal according to claim 3, characterized in that the step of outputting the parent query block and the subquery blocks comprises the following steps:
    Performing preorder traversal on the parent query block and the subquery blocks;
    Performing preamble traversal on the result of the preorder traversal to generate an intermediate table;
    Judging whether the generated intermediate table satisfies a set condition; if so, parsing the fields in the intermediate table and performing the next step; otherwise, directly performing the next step;
    Performing an information mining operation on the parent query block and the subquery blocks.
  5. The field consanguinity analysis method based on depth-first traversal according to claim 1, characterized in that the step of performing depth-first traversal on the generated abstract syntax tree comprises the following steps:
    Traversing the abstract syntax tree and storing the metadata of the tables and the metadata of the fields in a state machine;
    Traversing the abstract syntax tree to obtain the Union subqueries, Join subqueries and SubSelect subqueries, and pushing the obtained subqueries onto a stack one by one in their corresponding execution order;
    Popping the stacked subqueries, analyzing them for genetic connection, and storing the analysis results in the state machine.
  6. The field consanguinity analysis method based on depth-first traversal according to claim 5, characterized in that the step of popping the stacked subqueries, analyzing them for genetic connection and storing the analysis results in the state machine is specifically:
    The state machine determines the statement type of the subquery currently popped:
    If the current subquery is a Union subquery, field association is performed on the subquery and the result is stored in the state machine;
    If the current subquery is a Join subquery or a SubSelect subquery, the subquery is stored in the state machine;
    If the current subquery is not a Union, Join or SubSelect subquery, consanguinity analysis is performed on the current subquery according to the tables and fields stored in the state machine.
  7. The field consanguinity analysis method based on depth-first traversal according to claim 1, characterized in that the step of performing verification parsing on the result of the depth-first traversal to obtain the genetic connection between the fields in the SQL statement comprises the following steps:
    Inputting a first test case;
    Parsing the first test case both manually and by machine;
    Judging whether the results of the manual parsing and the machine parsing are consistent; if so, performing the next step; otherwise, inputting a new test case as the first test case and returning to the step of parsing the first test case both manually and by machine;
    Inputting a second test case and parsing the second test case by machine;
    Analyzing the genetic connection between the first test case and the second test case according to the machine parsing result of the first test case and the machine parsing result of the second test case;
    Performing a verification operation on the machine parsing result of the second test case to obtain the genetic connection between the fields in the SQL statement.
  8. A field consanguinity analysis system based on depth-first traversal, characterized in that it comprises:
    A conversion processing module, for performing conversion processing on the SQL statement to be analyzed to generate an abstract syntax tree;
    A depth-first traversal module, for performing depth-first traversal on the generated abstract syntax tree;
    A verification parsing module, for performing verification parsing on the result of the depth-first traversal to obtain the genetic connection between the fields in the SQL statement.
  9. The field consanguinity analysis system based on depth-first traversal according to claim 8, characterized in that the depth-first traversal module comprises:
    A metadata storage module, for traversing the abstract syntax tree and storing the metadata of the tables and the metadata of the fields in a state machine;
    A stack storage module, for traversing the abstract syntax tree to obtain the Union subqueries, Join subqueries and SubSelect subqueries, and pushing the subqueries onto a stack one by one in their corresponding execution order;
    A consanguinity analysis module, for popping the stacked subqueries, analyzing them for genetic connection, and storing the analysis results in the state machine.
  10. A field consanguinity analysis device based on depth-first traversal, characterized in that it comprises:
    A memory, for storing a program;
    A processor, for executing the program in order to: perform conversion processing on the SQL statement to be analyzed to generate an abstract syntax tree;
    Perform depth-first traversal on the generated abstract syntax tree;
    Perform verification parsing on the result of the depth-first traversal to obtain the genetic connection between the fields in the SQL statement.
CN201710842320.5A 2017-09-18 2017-09-18 A kind of field consanguinity analysis method, system and device based on depth-first traversal Pending CN107644073A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710842320.5A CN107644073A (en) 2017-09-18 2017-09-18 A kind of field consanguinity analysis method, system and device based on depth-first traversal

Publications (1)

Publication Number Publication Date
CN107644073A true CN107644073A (en) 2018-01-30

Family

ID=61111944

Country Status (1)

Country Link
CN (1) CN107644073A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20180130)