CN117851439A

CN117851439A - SQL statement-based data blood-edge analysis method and device

Info

Publication number: CN117851439A
Application number: CN202311852135.6A
Authority: CN
Inventors: 王涛; 杨克伟; 曾东将; 蓝会晶
Original assignee: Hangzhou Eastcom Software Technology Co ltd
Current assignee: Hangzhou Eastcom Software Technology Co ltd
Priority date: 2023-12-29
Filing date: 2023-12-29
Publication date: 2024-04-09

Abstract

A data blood-edge analysis method based on SQL statements, the method comprising: acquiring a table name of each table and field names of a plurality of fields in a plurality of database tables; analyzing the SQL sentence through the established multi-level analysis model to determine a first field name, a second field name and a third field name; the first field name is a field name corresponding to the table name; the second field name is a field name corresponding to a table name included in a view layer, the view layer comprises at least one view, and the view corresponds to one sub SQL statement included in the SQL statement; the third field name is a field name included in the SQL statement execution result; recording a first field name, a second field name, a third field name, a corresponding relation between the first field name and the second field name, and a corresponding relation between the second field name and the third field name, wherein the corresponding relation is used for representing the data blood edges of SQL sentences. The method can accurately analyze the data flow direction and the dependency relationship between the data in the multi-level SQL statement.

Description

SQL statement-based data blood-edge analysis method and device

Technical Field

The invention relates to the technical field of data governance development, in particular to a data blood-margin analysis method and device based on SQL sentences.

Background

Data lineage, also known as data blood lineage, data origin, data lineage, refers to the complete life cycle of data from generation, processing, fusion, flow to final extinction of data, and a relationship naturally forms between data. It records the link relationships generated by the data, which are similar to human blood relationship, and therefore are called data blood relationship. With the advent of the big data age, the processing of massive data became an important task. To ensure accuracy and reliability of data, it is necessary to perform a blood-lineage analysis on the data, i.e., to find the source of the data, and which operations the data is affected by. Currently, the data lineage analysis tools in common use are mainly based on analysis of structured query language (structured query language, SQL) statements. The tool can analyze SQL sentences into query objects, parse the SQL sentences and establish basic data blood-edge relations according to database table structures.

However, in some related prior arts, data lineage analysis based on SQL statements can only obtain basic data lineage relationships, and cannot handle complex query operations and data flow and dependency between multiple levels.

Disclosure of Invention

In order to solve the problems in the prior art, the embodiment of the application provides a method, a device, a computing device, a computer storage medium and a product containing a computer program for analyzing data blood edges based on SQL sentences, which can accurately analyze the data flow direction and the dependency relationship between data in the multi-level SQL sentences.

In a first aspect, an embodiment of the present application provides a method for analyzing data blood edges based on an SQL statement, where the SQL statement includes a plurality of identifiers of database tables, each identifier of the database table corresponds to one database table, each database table includes at least one field, the SQL statement includes a plurality of layers, and each layer is a sub-SQL statement, and the method includes: acquiring a table name of each table and field names of a plurality of fields in a plurality of database tables; analyzing the SQL sentence through the established multi-level analysis model to determine a first field name, a second field name and a third field name; the first field name is a field name corresponding to the table name; the second field name is a field name corresponding to a table name included in a view layer, the view layer comprises at least one view, and the view corresponds to one sub SQL statement included in the SQL statement; the third field name is a field name included in the SQL statement execution result; recording a first field name, a second field name, a third field name, a corresponding relation between the first field name and the second field name, and a corresponding relation between the second field name and the third field name, wherein the corresponding relation is used for representing the data blood edges of SQL sentences.

In some possible implementations, the multi-level analytical model includes a root surface layer, n view layers, and a result layer, where n is a positive integer greater than 0; the root surface layer is used for recording table names and first field names; an ith view layer in the n view layers is used for recording a table name and a second field name included in the current view layer; the result layer is used for recording the third field name.

In some possible implementations, each view of the n view layers is denoted as i_unit_j, where i layer view is characterized and j characterizes j layer view, where i, j are positive integers.

In some possible implementations, the sub-SQL statement is included in the from clause of the SQL statement and is a view when directly connected to the keyword from.

In some possible implementations, different views in the same view layer are connected by a keyword unit.

In some possible implementations, the method further includes: and displaying the corresponding relation between the first field name and the second field name and the corresponding relation between the second field name and the third field name in a graphical mode through a graphical rendering engine.

In some possible implementations, in a case where the number of view layers is j and j is greater than 1, the method further includes: recording the corresponding relation between the second field name in the kth view layer and the second field name in the kth+1th view layer, wherein k is more than 0 and less than j; recording the relation between the j+1th second field name and the first field name; and recording the corresponding relation between the second field name and the third field name in the 1 st view layer.

In some possible implementations, the correspondence includes a directional relationship, which is used to characterize the flow direction of the data.

In a second aspect, an embodiment of the present application provides a data blood edge analysis device based on an SQL statement, where the SQL statement includes a plurality of identifiers of database tables, each identifier of the database table corresponds to one database table, each database table includes at least one field, the SQL statement includes a plurality of layers, and each layer is a sub-SQL statement, and the device includes: the acquisition module is used for acquiring the table name of each table and the field names of a plurality of fields in the database tables; the processing module is used for analyzing the SQL sentence through the established multi-level analysis model and determining a first field name, a second field name and a third field name; the first field name is a field name corresponding to the table name; the second field name is a field name corresponding to a table name included in a view layer, the view layer comprises at least one view, and the view corresponds to one sub SQL statement included in the SQL statement; the third field name is a field name included in the SQL statement execution result; the recording module is used for recording the first field name, the second field name, the third field name, the corresponding relation between the first field name and the second field name and the corresponding relation between the second field name and the third field name, wherein the corresponding relation is used for representing the data blood edges of SQL sentences.

In a third aspect, embodiments of the present application provide a computer-readable storage medium comprising computer-readable instructions which, when read and executed by a computer, cause the computer to perform the method of any one of the first aspects.

In a fourth aspect, embodiments of the present application provide a computing device comprising a processor and a memory, wherein the memory has stored therein computer program instructions that, when executed by the processor, perform the method of any of the first aspects.

In a fifth aspect, embodiments of the present application provide a product comprising a computer program which, when run on a processor, causes the processor to perform the method according to any of the first aspects.

By the method, the data flow direction and the dependency relationship can be accurately analyzed, and the data flow direction and the dependency relationship can be intuitively displayed through the graphical interface.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of a data blood-edge analysis method based on SQL statements provided in an embodiment of the application;

FIG. 2 is a schematic diagram of a data flow provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of another data flow provided in an embodiment of the present application;

FIG. 4 is a schematic diagram of a view layer obtained by parsing an SQL statement according to an embodiment of the present application;

FIG. 5 is a schematic diagram of a relationship between blood edges of data after an imaging process according to an embodiment of the present disclosure;

fig. 6 is a schematic structural diagram of a data blood edge analysis device based on SQL statements according to an embodiment of the present application;

fig. 7 is a schematic structural diagram of a computing device according to an embodiment of the present application.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The term "and/or" herein is an association relationship describing an associated object, and means that there may be three relationships, for example, a and/or B may mean: a exists alone, A and B exist together, and B exists alone. The symbol "/" herein indicates that the associated object is or is a relationship, e.g., A/B indicates A or B.

The terms "first" and "second" and the like in the description and in the claims are used for distinguishing between different objects and not for describing a particular sequential order of objects. For example, the first response message and the second response message, etc. are used to distinguish between different response messages, and are not used to describe a particular order of response messages.

In the embodiments of the present application, words such as "exemplary" or "such as" are used to mean serving as examples, illustrations, or descriptions. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.

In the description of the embodiments of the present application, unless otherwise specified, the meaning of "a plurality of" means two or more, for example, a plurality of processing units means two or more processing units and the like; the plurality of elements means two or more elements and the like.

For the purpose of facilitating an understanding of the embodiments of the present application, reference will now be made to the following description of specific embodiments, taken in conjunction with the accompanying drawings, which are not intended to limit the embodiments of the invention.

First, technical terms involved in the embodiments of the present application will be described:

1. the structured query language (structured query language, SQL) is a database language with multiple functions of data manipulation, data definition and the like, has the characteristic of interactivity, can provide great convenience for users, and can improve the working quality and efficiency of a computer application system by utilizing the SQL language.

Next, the technical solution provided in the embodiments of the present application will be described.

In the big data age, the source of data is extremely wide, various types of data are rapidly generated and also are explosively growing, which results in that the relationship between data is becoming more complex. Therefore, how to manage complex relationships between tables and between codes, so as to better recognize and understand the relationship between business systems and underlying tables and the relationship between tables of underlying tables, and to understand where and where current data (fields, key indicators or data labels) is located, and to know which downstream systems are using these data, etc. becomes an important issue. Data blood-source describes the source and destination of data, and is an important fundamental capability within an organization to make data valuable.

The existing data blood-edge analysis tool is mainly based on SQL statement analysis, and cannot well process complex query operation and data flow direction and dependency relationship among multiple layers. Meanwhile, the existing SQL statement analysis tool usually adopts a form or text mode to display the data blood-edge relationship, so that the data flow direction and the dependency relationship between the data cannot be intuitively displayed.

In view of this, the embodiment of the present application provides a data blood-edge analysis method, which analyzes an SQL statement to obtain a query object, and analyzes the SQL statement through a multi-level analysis model to obtain a flow direction of each field (i.e., a flow direction of data) and a blood-edge relationship (i.e., a dependency relationship between data) related to the SQL statement.

Exemplary, fig. 1 shows a flow chart of a data blood-edge analysis method based on SQL statements according to an embodiment of the present application. The SQL sentence comprises a plurality of identifiers of database tables, each identifier of the database table corresponds to one database table, and each database table comprises at least one field. The SQL statement comprises a plurality of layers, each layer is a sub SQL statement, and the SQL statement is not among the where clauses, because the where clauses are conditional clauses, the data in the SQL statement does not participate in circulation, has no flow direction, and has no blood margin. For example, referring to table 1, table 1 shows a specific SQL statement, which includes a sub-SQL statement, "select customer_ id from customers", indicating that the query is performed in the query result of the sub-SQL statement, i.e., the sub-SQL statement is a hierarchy. That is, the SQL statement in Table 1 includes a hierarchy in addition to the keyword "select" contained in the where clause, an SQL statement beginning with the keyword "select", is a hierarchy.

TABLE 1

Fig. 2 is a schematic diagram of a data flow provided in an embodiment of the present application, in which a data flow when the SQL statement in table 1 is executed, that is, a blood-edge relationship between data is shown. The SQL sentence execution can directly inquire in a database table to obtain an execution result, and the select clause execution of the outermost layer is not required to be executed in the select execution result of the inner layer. As shown in fig. 2, during the execution of the SQL statement, the database table related to the SQL statement is acquired according to the identifier (database table name) of the data table included in the SQL statement, and the field names in the database table are acquired and recorded one by one. At the time of recording, it may be that only the database table name 21, and the field name 22 are recorded. In field names 22, a plurality of specific field names 221 may be included. Executing the SQL sentence to obtain an execution result. Each specific field name 221 in the execution result field name 22 is recorded with the corresponding relation of the specific field name 221 in the recorded database table 21, and the corresponding relation of the record can embody the data blood-edge relation. For example, field "customer_id" is derived from field customer_id in database customers, field "customer_name" is derived from field customer_name in database customers, and field "order_date" is derived from field order_date in database orders.

For another example, referring to table 2, a specific SQL statement is shown in table 2, which contains 2 keywords "select" in addition to the keywords "select" contained in the where clause, i.e., the SQL statement has 2 levels in total.

TABLE 2

Fig. 3 is another data flow diagram provided in an embodiment of the present application, in which a data flow when executing the SQL statement in table 2 is shown. When the SQL statement in the table 2 is executed, the outermost layer of the select clause cannot be queried in the physically existing database table to obtain the execution result, i.e. the execution of the outermost layer of the select clause is required to be executed in the execution result of the inner layer of the select clause. And taking the execution result of the inner layer select clause as a virtual table t, and carrying out query operation on the outermost layer select clause in the virtual table t. The virtual table t is recorded as "view1", and the execution result is recorded as "view2". The correspondence between each field name 221 in the virtual table t and each field name 221 in the database table 21 is recorded, and the correspondence between each field name 221 in the execution result and each field name 221 in the virtual table t is recorded. In this way, the relationship of the blood edges of the data can be determined, for example, the field "customer_id" in the query result is derived from "customer_id" in the virtual table t, which in turn is derived from the customer_id in the database table orders. Thus, the dependency relationship of the data, namely the blood relationship, can be embodied. In other words, it can be understood that the customer_id in the database table orders flows to the "customer_id" in the virtual table t, which in turn flows to the query result.

As shown in fig. 1, the data blood-edge analysis method comprises the following steps:

s11: the table name of each table in the database tables and the field name of each field are obtained.

In this embodiment, an SQL statement may be parsed by an SQL statement parser, for example, jsqlparser, to obtain a query (query) object corresponding to the SQL statement. In the query object, a database table identifier in the SQL statement (i.e., a database table name in the SQL statement) is included, and a field in each database table, the field including a field name.

For example, referring to Table 3, a specific SQL statement is shown in Table 3. The SQL statement is composed of the SQL statement in Table 1 and Table 2 connected by the keyword "unit", describing a scenario in which a customer (customer) orders a book (book). And analyzing the SQL sentence through an SQL sentence analyzer to obtain a query object of the SQL sentence, wherein the query object is a customer object and an order object. In the customer object, a database table name customer is included, and fields customer_id, customer_name in the database table customer. In order objects, database table names order are included, and fields customer_id, book_id, order_date in database table order are included.

TABLE 3 Table 3

S12: analyzing the SQL sentence through the established multi-level analysis model to determine a first field name, a second field name and a third field name; the first field name is a field name corresponding to a table name of the database table; the second field name is a field name corresponding to a view name included in a view layer, the view layer comprises at least one view, and the view is a sub SQL statement included in the SQL statement; the third field name is a field name included in the execution result of the SQL statement.

In this embodiment, the SQL statement may be parsed, and each field name in the root layer included in the SQL statement may be obtained, where the field names are referred to as the first field name. The field names contained in each view layer can also be obtained, and the field names in the view layer are called second field names. Wherein the collection of non-entity tables is referred to as a view, i.e., the collection is not physically present in the database tables, which is a collection formed to facilitate understanding of the schema. For example, with continued reference to Table 2, the SQL statement in Table 2 contains a from clause whose content is "from (select_id, order_ date from orders where customer _id not in (select_ id from customers)) as" shown in a query result (i.e., "select

The customer_id, order_ date from orders where customer _id not in (select customer_ id from customers)) as t ") and the result of this query is referred to as a table t. The lookup result table t at this time is not a table physically existing in the database, and such a virtual table t is regarded as one view. Each view has its own independent name. The field names in each layer view are referred to as second field names. It is noted that the views mentioned in the embodiments of the present application, and the hierarchy in SQL statements, may not be a concept. The parsing result of the SQL sentence can also comprise field names contained in the result layer, and the field names in the result layer are called as third field names. Fields in the result layer, i.e., fields in the select clause in the SQL statement. For example, the SQL statement in Table 1 has select clauses of "select_memories_id, store_name, and order_date", i.e., the third field name includes "memory_id", "store_name", and "order_date". The result layer is also not essentially a physically present database table in the database, and can therefore be regarded as a special view layer.

In some possible embodiments, each of the first field name, the second field name, and the third field name may include at least one of a field name, a field alias, a field type, a field home table alias, a field function, and a field script. The field alias is an additional name of the field, for example, an SQL statement in table 2, "'Unknown' as customer_name", and customer_name is an alias of 'Unknown'. The field home table alias is an alias of a table to which a field belongs, for example, the SQL statement in table 2 is named as t for the virtual table "select customer_id, order_ date from orders where customer _id not in (select customer_ id from customers)".

In some possible embodiments, the multi-level analytical model includes a root surface layer, n view layers, and a result layer, where n is a positive integer greater than 0; the root surface layer is used for recording the table names and the first field names; an ith view layer in the n view layers is used for recording a table name and a second field name included in the current view layer; the result layer is used for recording a third field name.

In this embodiment, the multi-level analytical model includes a root surface layer, a view layer, and a result layer. The root surface layer is recorded with a database table characterized by a query object analyzed by the SQL statement and various fields in the database table. The number of database tables is at least one. The result layer records each field name in the result after the SQL statement is executed, and the result layer can be marked as 'view_result'. For the view layer, the view layer may be included in one SQL statement, or the view layer may not be included. When the view layer is included, the number of view layers may be one or more. In the view layer, at least one view, i.e. a table not present in the database, which is a virtual table, may be included.

In some possible embodiments, different views in the same view layer are connected by a keyword unit.

In this embodiment, for the ith view layer in the n view layers, when the i value is 1, it indicates that the view layer is the outermost view layer in the SQL statement. With continued reference to table 3, the SQL statements in table 3 include the SQL statements in table 1 and the SQL statements in table 2, and the SQL statements in table 1 and the SQL statements in table 2 are connected by a unit. At this time, the SQL statement in table 1 and the SQL statement in table 2 are respectively one view and are the same view layer, and at this time, the view layer is the outermost view layer. Referring to fig. 4, fig. 4 is a schematic diagram of a view layer obtained by parsing the SQL statement in table 3. The first view at this time is named 1_unit_1 and the second view is named 1_unit_2. The SQL statement is formed by splicing the SQL statement in the table 1 (i.e. 1_unit_1 in the first view layer in fig. 2) with the SQL statement in the table 2 (i.e. 1_unit_2 in the first view layer in fig. 2) through a unit keyword. The SQL statement in Table 1 is a view and the SQL statement in Table 2 is a view. The two views are the same view layer.

In some possible embodiments, a sub-SQL statement is included in the from clause of the SQL statement and is a view when directly connected to the keyword from.

In this embodiment, when dividing each view in the nth (n > 1) view layer, division may be performed according to keywords. The view (layer) at this time is not the outermost view layer. For example, referring to the SQL statement in Table 2, the keyword from clause is a select clause, meaning that the query is performed in a query result. The SQL statement in which the select clause is located may be taken as a view where the select clause immediately follows the keyword from.

S13: and recording the first field name, the second field name, the third field name, the correspondence between the first field name and the second field name, and the correspondence between the second field name and the third field name.

In this embodiment, the root surface layer may be used as the n-1 th view layer, the result layer may be used as the n+1 th view layer, and then the correspondence between the field names in each view layer and the field names in the next view layer may be recorded. It is noted that the second field name is a field name corresponding to each view in the view layer. Because of the nesting reason of the SQL sentence, after each sub SQL is nested, the last view layer is closest to the root surface layer, and the corresponding relation between the first field name and the second field name at the moment is the corresponding relation between the root surface layer and the field name of the nth view layer. Similarly, the first view layer is closest to the result layer, and the corresponding relationship between the second field name and the third field name is the corresponding relationship between the field name of the first view layer and the field name of the result layer. The corresponding relationship is the data blood-edge relationship in the SQL sentence. The record correspondence may be performed by recording a direction relationship, i.e. the flow direction of data, i.e. a first field "customer_id" corresponds to a second field "customer_id", and it may be recorded that the first field "customer_id" points to the second field "customer_id", indicating that the data "customer_id" flows from the first field to the second field.

In some possible embodiments, in the case where the number of view layers is j and j is greater than 1, the method further includes: recording the corresponding relation between the second field name in the kth view layer and the second field name in the kth+1th view layer, wherein k is more than 0 and less than j; recording the relation between the j+1th second field name and the first field name; and recording the corresponding relation between the second field name and the third field name in the 1 st view layer.

In this embodiment, the view layer includes a plurality of sub-view layers, and the number of sub-view layers is j. For the kth view layer, the data in the kth view layer originates from the last view layer, and therefore, the correspondence between the view layer and the last view layer needs to be recorded.

In some possible embodiments, the mapping process may also be performed on the first field name and the second field name of the record, and the correspondence between the second field name and the third field name.

In this embodiment, after the corresponding relationship between the first field name and the second field name and the corresponding relationship between the second field name and the third field name are recorded, the blood edge analysis result may be graphically displayed by a graphic rendering engine (for example, X6). Specifically, each table or field may be represented as a node (or a grid in the table), and the data flow directions (i.e., the correspondence between the first field name and the second field name, and the correspondence between the second field name and the third field name) between different nodes are linked by using a connection line to form a data blood-edge graph. For example, referring to fig. 5, fig. 5 shows a schematic diagram of a blood-edge relationship of data after an imaging process according to an embodiment of the present application. As shown in fig. 5, after performing the blood-edge analysis on the SQL statement in table 3, the database table name and the first field name of the root surface layer, the view name of the first view layer, and the second field name under each view, the view name of the second view layer, and the second field name under the view, the name of the result layer, and the third field name may be displayed in the form of tables, and at the same time, the correspondence relationship therein may be displayed in a wired manner.

The above is the data blood edge analysis method based on the SQL statement provided by the embodiment of the application, the query object is obtained by analyzing the SQL statement, and the flow direction (i.e. the flow direction of the data) of each field involved in the SQL statement and the blood edge relationship (i.e. the dependency relationship between the data) of the data are obtained by analyzing the SQL statement through a multi-level analysis model. The method supports various SQL grammars, can process the query operation of complex multi-level SQL sentences and multi-level data blood edge analysis, and can meet the requirements of different levels of data blood edge analysis. Through the graphical display mode, the display device can be further convenient for users to understand and use. The method can bring new achievements and development opportunities to the field of data analysis, and has wide application prospect and market value.

It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not mean the order of execution, and the execution order of the processes should be determined by the functions and the internal logic, and should not be construed as limiting the implementation process of the embodiments in this application. In addition, in some possible implementations, each step in the foregoing embodiments may be selectively performed according to practical situations, and may be partially performed or may be performed entirely, which is not limited herein. All or part of any features of any of the embodiments of the present application may be freely, and arbitrarily combined without conflict. The combined technical scheme is also within the scope of the application.

Based on the method in the above embodiment, the embodiment of the present application further provides a data blood edge analysis device based on the SQL statement. Fig. 6 is a schematic structural diagram of a data blood edge analysis device based on an SQL statement according to an embodiment of the present application. As shown in fig. 6, the apparatus 600 may include an acquisition module 601, a processing module 602, and a recording module 603.

The acquiring module 601 is configured to acquire a table name of each table and a field name of a plurality of fields in the plurality of database tables;

the processing module 602 is configured to parse the SQL statement through the built multi-level parsing model to determine a first field name, a second field name, and a third field name; the first field name is a field name corresponding to the table name; the second field name is a field name corresponding to a table name included in a view layer, the view layer comprises at least one view, and the view corresponds to one sub SQL statement included in the SQL statement; the third field name is a field name included in the SQL statement execution result;

the recording module 603 is configured to record the first field name, the second field name, the third field name, a correspondence between the first field name and the second field name, and a correspondence between the second field name and the third field name, where the correspondence is used to characterize a data blood edge of the SQL statement.

Fig. 7 shows a schematic structural diagram of a computer device provided in an embodiment of the present specification, where the computer device may include: processor 710, memory 720, input/output interface 730, communication interface 740, and bus 750. Wherein processor 710, memory 720, input/output interface 730, and communication interface 740 implement a communication connection among each other within the device via bus 750. The computer device may be used to perform the method shown in fig. 2 described above.

The processor 710 may be implemented in a general-purpose CPU (Central Processing Unit ), microprocessor, application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, etc. for executing relevant programs to implement the technical solutions provided in the embodiments of the present disclosure.

The Memory 720 may be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory ), static storage device, dynamic storage device, or the like. Memory 720 may store an operating system and other application programs, and when the technical solutions provided by the embodiments of the present specification are implemented in software or firmware, relevant program codes are stored in memory 720 and invoked for execution by processor 710.

The input/output interface 730 is used to connect with an input/output module to realize information input and output. The input/output module may be configured as a component in a device (not shown in the figure) or may be external to the device to provide corresponding functionality. Wherein the input devices may include a keyboard, mouse, touch screen, microphone, various types of sensors, etc., and the output devices may include a display, speaker, vibrator, indicator lights, etc.

The communication interface 740 is used to connect with a communication module (not shown) to enable communication interactions between the device and other devices. The communication module may implement communication through a wired manner (such as USB, network cable, etc.), or may implement communication through a wireless manner (such as mobile network, WIFI, bluetooth, etc.).

Bus 750 includes a path to transfer information between elements of the device (e.g., processor 710, memory 720, input/output interface 730, and communication interface 740).

It should be noted that although the above-described device only shows processor 710, memory 720, input/output interface 730, communication interface 740, and bus 750, in particular implementations, the device may include other components necessary to achieve proper operation. Furthermore, it will be understood by those skilled in the art that the above-described apparatus may include only the components necessary to implement the embodiments of the present description, and not all the components shown in the drawings.

Based on the methods in the above embodiments, the present application provides a computer-readable storage medium storing a computer program that, when executed on a processor, causes the processor to perform the methods in the above embodiments.

Based on the methods in the above embodiments, embodiments of the present application provide a computer program product that, when run on a processor, causes the processor to perform the methods in the above embodiments.

It is to be appreciated that the processor in embodiments of the present application may be a central processing unit (central processing unit, CPU), but may also be other general purpose processors, digital signal processors (digital signal processor, DSP), application specific integrated circuits (application specific integrated circuit, ASIC), field programmable gate arrays (field programmable gate array, FPGA) or other programmable logic devices, transistor logic devices, hardware components, or any combination thereof. The general purpose processor may be a microprocessor, but in the alternative, it may be any conventional processor.

The method steps in the embodiments of the present application may be implemented by hardware, or may be implemented by a processor executing software instructions. The software instructions may be comprised of corresponding software modules that may be stored in random access memory (random access memory, RAM), flash memory, read-only memory (ROM), programmable ROM (PROM), erasable programmable PROM (EPROM), electrically erasable programmable EPROM (EEPROM), registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC.

In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, produces a flow or function in accordance with embodiments of the present application, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in or transmitted across a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by a wired (e.g., coaxial cable, fiber optic, digital Subscriber Line (DSL)), or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), or the like.

It will be appreciated that the various numerical numbers referred to in the embodiments of the present application are merely for ease of description and are not intended to limit the scope of the embodiments of the present application.

Claims

1. A method for analyzing data blood edges based on SQL statements, wherein the SQL statements include a plurality of identifiers of database tables, each identifier of the database table corresponds to a database table, each database table includes at least one field, the SQL statements include a plurality of layers, each layer is a sub-SQL statement, the method comprising:

acquiring a table name of each table and field names of a plurality of fields in the database tables;

analyzing the SQL sentence through the established multi-level analysis model to determine a first field name, a second field name and a third field name; the first field name is a field name corresponding to the table name; the second field name is a field name corresponding to a table name included in a view layer, the view layer comprises at least one view, and the view corresponds to one sub SQL statement included in the SQL statement; the third field name is a field name included in the SQL statement execution result;

and recording the first field name, the second field name, the third field name, the corresponding relation between the first field name and the second field name, and the corresponding relation between the second field name and the third field name, wherein the corresponding relation is used for representing the data blood edges of the SQL statement.

2. The method of claim 1, wherein the multi-level analytical model comprises a root skin, n view layers, and a result layer, wherein n is a positive integer greater than 0; wherein,

the root surface layer is used for recording the table name and a first field name;

an ith view layer in the n view layers is used for recording a table name and a second field name included in the current view layer;

the result layer is used for recording a third field name.

3. The method of claim 1, wherein each view of the n view layers is denoted as i_unit_j, wherein the view representing the i layer, j representing the j view, i, j being a positive integer.

4. The method of claim 1, wherein a sub-SQL statement is included in a from clause of the SQL statement and is a view when directly connected to a keyword from.

5. The method of claim 1, wherein different views in the same view layer are connected by a keyword unit.

6. The method according to claim 1, wherein the method further comprises: and displaying the corresponding relation between the first field name and the second field name and the corresponding relation between the second field name and the third field name in a graphical mode through a graphical rendering engine.

7. The method according to claim 1, wherein in case the number of view layers is j and j is greater than 1, the method further comprises:

recording the corresponding relation between the second field name in the kth view layer and the second field name in the kth+1th view layer, wherein k is more than 0 and less than j;

recording the relation between the j+1th second field name and the first field name;

and recording the corresponding relation between the second field name and the third field name in the 1 st view layer.

8. The method of claim 1, wherein the correspondence comprises a directional relationship, the directional relationship being used to characterize a flow direction of data.

9. A data blood edge analysis device based on an SQL statement, wherein the SQL statement includes a plurality of identifiers of database tables, each identifier of the database table corresponds to one database table, each database table includes at least one field, the SQL statement includes a plurality of layers, each layer is a sub-SQL statement, and the device includes:

the acquisition module is used for acquiring the table name of each table and the field names of a plurality of fields in the database tables;

the processing module is used for analyzing the SQL sentence through the established multi-level analysis model and determining a first field name, a second field name and a third field name; the first field name is a field name corresponding to the table name; the second field name is a field name corresponding to a table name included in a view layer, the view layer comprises at least one view, and the view corresponds to one sub SQL statement included in the SQL statement; the third field name is a field name included in the SQL statement execution result;

the recording module is used for recording the first field name, the second field name, the third field name, the corresponding relation between the first field name and the second field name and the corresponding relation between the second field name and the third field name, wherein the corresponding relation is used for representing the data blood edges of the SQL sentences.

10. A computer readable storage medium storing a computer program which, when run on a processor, causes the processor to perform the method of any one of claims 1-8.