CN112364095A

CN112364095A - Data traceability analysis visualization method

Info

Publication number: CN112364095A
Application number: CN202011293369.8A
Authority: CN
Inventors: 郑敏; 吴呈良; 李欣阳
Original assignee: Chaozhou Zhuoshu Big Data Industry Development Co Ltd
Current assignee: Chaozhou Zhuoshu Big Data Industry Development Co Ltd
Priority date: 2020-11-18
Filing date: 2020-11-18
Publication date: 2021-02-12

Abstract

The invention provides a visualization method for data traceability analysis, in the technical field of data management. The method reduces the complexity of an enterprise's data management work. When daily data is abnormal, it helps the user quickly locate the data source and the stage at which the anomaly first occurred. When the business changes, the associated tables can be quickly sorted out and analyzed, which shortens the development time for change-related data work and reduces the data risk caused by the business change.

Description

Data traceability analysis visualization method

Technical Field

The invention relates to data management technology, and in particular to a data traceability analysis visualization method.

Background

With the continuous emergence and wide application of new computer technologies and services, the variety of data keeps growing. The data that enterprises analyze and process comes from a wide range of sources: internal information systems, external environments such as public Internet data, or data purchased from third parties. Different data sources differ in quality and affect the results of analysis and processing differently. When data is abnormal, a reliable tool is needed to trace it back and quickly locate its source. In addition, when a data source or a business process changes, the scope of the impact must be assessed quickly so that a timely response can be made.

Disclosure of Invention

To solve the above technical problems, the invention provides a data traceability analysis visualization method that reduces the complexity of an enterprise's data classification management work, improves data management efficiency, reduces the data risk caused by business changes, and improves the efficiency of analyzing abnormal data.

The technical scheme of the invention is as follows:

A data traceability analysis visualization method, characterized in that the scripts and processing jobs used in data cleaning and processing are parsed, the association relations among data tables are stored, and these relations are then displayed to the user as visual charts, thereby realizing visual traceability analysis of all kinds of data.

Further, the method is implemented by a system comprising the following modules:

(1) Traceability analysis storage module:

According to granularity, from coarse to fine, the relationships in the tracing process are divided into three levels: task level, table level and field level. A different storage model is configured for each level:

Task level:

Task relation table R_JOB

Field name	Data type	Note
JOB_ID	Numeric	Task ID
JOB_PARENT_ID	Numeric	Parent task ID

Task list DM_JOB

Field name	Data type	Note
JOB_ID	Numeric	Task ID
JOB_NAME	Character	Task name
JOB_FREQ	Numeric	Task execution frequency
JOB_EXEC_TIME	Timestamp	Last execution time of the task

Note: the task execution frequency is classified as daily, weekly, monthly or other, coded 1, 2, 3 and 0 respectively; the code is used to classify the lines between task nodes in the subsequent visual display.
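As a minimal sketch of how the JOB_FREQ code could drive that line classification, the Python below maps each code to an edge style. The `JobFreq` enum, the style names, and the rule that an edge takes the frequency of its child (downstream) task are assumptions for illustration only; the text states merely that the code is used to classify the lines.

```python
from enum import IntEnum
from typing import Dict, Iterable, List, Tuple


class JobFreq(IntEnum):
    """JOB_FREQ codes from the note: 0 = other, 1/2/3 = day/week/month."""
    OTHER = 0
    DAILY = 1
    WEEKLY = 2
    MONTHLY = 3


# Hypothetical line styles for the task graph; any visual encoding would do.
EDGE_STYLE = {
    JobFreq.DAILY: "solid",
    JobFreq.WEEKLY: "dashed",
    JobFreq.MONTHLY: "dotted",
    JobFreq.OTHER: "thin",
}


def edge_style(job_freq: int) -> str:
    """Return the line style for a JOB_FREQ code; unknown codes count as 'other'."""
    try:
        return EDGE_STYLE[JobFreq(job_freq)]
    except ValueError:
        return EDGE_STYLE[JobFreq.OTHER]


def classify_edges(
    r_job: Iterable[Tuple[int, int]], job_freq: Dict[int, int]
) -> List[Tuple[int, int, str]]:
    """Turn R_JOB rows (JOB_ID, JOB_PARENT_ID) into (parent, child, style) edges.

    job_freq maps JOB_ID -> JOB_FREQ as read from DM_JOB. Assumption: an edge
    is styled by the frequency of its child task; unknown tasks count as 'other'.
    """
    return [
        (parent_id, job_id, edge_style(job_freq.get(job_id, JobFreq.OTHER)))
        for job_id, parent_id in r_job
    ]
```

For example, with R_JOB rows (2, 1) and (3, 2) and DM_JOB frequencies {2: 1, 3: 3}, `classify_edges` yields a solid edge from task 1 to task 2 and a dotted edge from task 2 to task 3.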

Table level:

Table relation table R_TABLE

Field name	Data type	Note
TABLE_ID	Numeric	Table ID
TABLE_PARENT_ID	Numeric	Parent table ID

Table list DM_TABLE

Field level:

Field relation table R_COLUMN

Field name	Data type	Note
COLUMN_ID	Numeric	Field ID
COLUMN_PARENT_ID	Numeric	Parent field ID

Field list DM_COLUMN

Field name	Data type	Note
COLUMN_ID	Numeric	Field ID
TABLE_ID	Numeric	ID of the table the field belongs to
COLUMN_NAME	Character	Field name
COLUMN_COMMENT	Character	Field comment
COLUMN_TYPE	Character	Field type
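One possible physical layout for the storage model is sketched below with Python's built-in `sqlite3`. The choice of SQLite, the mapping of "numeric", "character" and "timestamp" types to SQLite `INTEGER`/`TEXT`, and the composite primary keys on the relation tables (which keep duplicate edges out) are assumptions. The columns of DM_TABLE are taken from its field list in the Detailed Description.

```python
import sqlite3

# Six tables: a list (DM_*) and a relation table (R_*) per tracing level.
# Relation tables store (child ID, parent ID); one node may have many parents.
SCHEMA = """
CREATE TABLE DM_JOB (
    JOB_ID        INTEGER PRIMARY KEY,
    JOB_NAME      TEXT,
    JOB_FREQ      INTEGER,  -- 1 day, 2 week, 3 month, 0 other
    JOB_EXEC_TIME TEXT      -- timestamp stored as ISO-8601 text (assumption)
);
CREATE TABLE R_JOB (
    JOB_ID        INTEGER NOT NULL,
    JOB_PARENT_ID INTEGER NOT NULL,
    PRIMARY KEY (JOB_ID, JOB_PARENT_ID)
);
CREATE TABLE DM_TABLE (
    TABLE_ID          INTEGER PRIMARY KEY,
    JOB_ID            INTEGER,
    TABLE_NAME        TEXT,
    TABLE_COMMENT     TEXT,
    TABLE_OWNER       TEXT,
    TABLE_DB          TEXT,
    TABLE_CREATE_TIME TEXT
);
CREATE TABLE R_TABLE (
    TABLE_ID        INTEGER NOT NULL,
    TABLE_PARENT_ID INTEGER NOT NULL,
    PRIMARY KEY (TABLE_ID, TABLE_PARENT_ID)
);
CREATE TABLE DM_COLUMN (
    COLUMN_ID      INTEGER PRIMARY KEY,
    TABLE_ID       INTEGER,
    COLUMN_NAME    TEXT,
    COLUMN_COMMENT TEXT,
    COLUMN_TYPE    TEXT
);
CREATE TABLE R_COLUMN (
    COLUMN_ID        INTEGER NOT NULL,
    COLUMN_PARENT_ID INTEGER NOT NULL,
    PRIMARY KEY (COLUMN_ID, COLUMN_PARENT_ID)
);
"""


def create_storage_model(conn: sqlite3.Connection) -> None:
    """Create the storage-model tables in the given SQLite connection."""
    conn.executescript(SCHEMA)
```

With the composite keys, inserting the same (child, parent) relation twice raises `sqlite3.IntegrityError`, so re-running a parse cannot create duplicate edges.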

(2) Analysis module:

The analysis module mainly parses the SQL scripts and processing jobs used in data cleaning and processing.

a. For workflows built from SQL scripts, each script is parsed into an abstract syntax tree (AST), from which the source tables, target tables, source fields and target fields involved in the SQL code segment are derived. All tables involved are de-duplicated and merged into DM_TABLE, and their remaining basic information is fetched from the corresponding DBMS and merged into DM_TABLE; the source-to-target table correspondences are then stored in R_TABLE. Likewise, all fields involved are de-duplicated and merged into DM_COLUMN, their remaining basic information is fetched from the corresponding DBMS and merged into DM_COLUMN, and the source-to-target field correspondences are then stored in R_COLUMN.

b. For processing jobs built with a common data processing tool, such as one based on Pentaho Data Integration, the metadata of the processing flows in the tool's back-end repository is parsed to obtain the task-level, table-level and field-level association relations, which are stored in the table of the corresponding level.

c. For other scheduling forms, R_JOB and DM_JOB can be maintained and updated manually.
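Item a can be sketched as follows, assuming the SQL script has already been parsed. The text does not name a SQL parser, so the dataclasses `ColumnRef`, `SelectItem`, `Select` and `Insert` below are a hypothetical, minimal AST covering only `INSERT ... SELECT`; a real system would obtain the tree from an actual SQL parser and handle many more statement types. `extract_lineage` reads table-level and field-level edges off the tree, and `to_relation_rows` imitates the de-duplicate-and-merge step, with a plain dict standing in for DM_TABLE.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


@dataclass(frozen=True)
class ColumnRef:
    table: str   # physical table name; aliases are assumed already resolved
    column: str


@dataclass
class SelectItem:
    sources: List[ColumnRef]  # one projected expression may read several columns


@dataclass
class Select:
    from_tables: List[str]
    items: List[SelectItem]


@dataclass
class Insert:
    target_table: str
    target_columns: List[str]
    query: Select


def extract_lineage(stmt: Insert) -> Tuple[Set[Edge], Set[Edge]]:
    """Walk an INSERT ... SELECT tree and return (table_edges, column_edges).

    Each edge is a (source, target) pair; columns are written "table.column".
    Sets de-duplicate edges that a script mentions more than once.
    """
    if len(stmt.target_columns) != len(stmt.query.items):
        raise ValueError("target column list does not match the SELECT list")
    table_edges = {(src, stmt.target_table) for src in stmt.query.from_tables}
    column_edges: Set[Edge] = set()
    # INSERT ... SELECT binds SELECT items to target columns by position.
    for target_col, item in zip(stmt.target_columns, stmt.query.items):
        for ref in item.sources:
            column_edges.add(
                (f"{ref.table}.{ref.column}", f"{stmt.target_table}.{target_col}")
            )
    return table_edges, column_edges


def to_relation_rows(edges: Set[Edge], registry: Dict[str, int]) -> Set[Tuple[int, int]]:
    """Convert name edges into (ID, PARENT_ID) rows in the style of R_TABLE.

    registry (name -> ID) stands in for DM_TABLE: a name seen for the first
    time gets the next free integer ID, so every table is registered once.
    """
    rows = set()
    for src, tgt in sorted(edges):  # sorted so ID assignment is deterministic
        for name in (src, tgt):
            if name not in registry:
                registry[name] = max(registry.values(), default=0) + 1
        rows.add((registry[tgt], registry[src]))
    return rows
```

For a statement such as `INSERT INTO dw_sales (cust_id, amount) SELECT c.id, o.price * o.qty FROM ods_order o JOIN ods_customer c ...`, the tree yields the table edges ods_order to dw_sales and ods_customer to dw_sales, plus a field edge from each of ods_order.price and ods_order.qty to dw_sales.amount.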

(3) Inspection and early-warning module:

This module performs inspection and early warning on two kinds of content:

a. For scripts and flows that were parsed but whose source-target relation could not be determined, the relevant information is fed back to the user, who then specifies the relation manually.

b. The relations at each level that were successfully parsed and stored in the storage model are polled daily for changes; if a change is found, the user is notified of it.
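A minimal sketch of the two checks, assuming each parsing attempt reports either a set of (source, target) edges or `None` when the relation could not be confirmed, and that daily polling compares two snapshots of the stored edge set. `ParseResult` and `run_inspection` are hypothetical names; how the warnings reach the user is left open, as in the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str]


@dataclass
class ParseResult:
    script: str                 # identifier of the script or flow
    edges: Optional[Set[Edge]]  # None: source-target relation not confirmed


def run_inspection(
    results: List[ParseResult], yesterday: Set[Edge], today: Set[Edge]
) -> Dict[str, list]:
    """Collect the two kinds of warnings described in items a and b.

    'unresolved' lists scripts/flows the user must resolve by hand (item a);
    'added' and 'removed' list relations that changed between the previous
    and the current daily snapshot of the storage model (item b).
    """
    return {
        "unresolved": [r.script for r in results if r.edges is None],
        "added": sorted(today - yesterday),
        "removed": sorted(yesterday - today),
    }
```

Testing `edges is None` rather than emptiness keeps scripts that legitimately produce no lineage (for example, pure DDL) out of the manual-review queue.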

(4) Visual display module

The visual display module presents the models at every level of the traceability storage model in visual form (for example, charts and tables); the user can also modify the models in response to the anomalies reported by the inspection and early-warning module. The visual display corresponds one-to-one with the levels of the storage model, and tasks, tables and fields can be drilled into one another: for example, from the task level the user can drill down to the tables belonging to a task, or from a table drill up to the tasks related to it, so that data relationships at different levels can be displayed. The module also supports searching at each level to help the user quickly locate the required data.
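The drill-down, drill-up, search and tracing operations behind such a display can be sketched against the storage model with Python's built-in `sqlite3`, assuming the tables hold the columns listed above (only those used here are needed). The function names are hypothetical, and the breadth-first walk over R_TABLE is one possible way to implement tracing: following parents answers "where did this data come from?" when an anomaly appears, while following children estimates the impact of a change.

```python
import sqlite3
from collections import deque
from typing import List


def drill_down(conn: sqlite3.Connection, job_id: int) -> List[str]:
    """Task level -> table level: names of the tables belonging to a task."""
    rows = conn.execute(
        "SELECT TABLE_NAME FROM DM_TABLE WHERE JOB_ID = ? ORDER BY TABLE_NAME",
        (job_id,),
    )
    return [name for (name,) in rows]


def drill_up(conn: sqlite3.Connection, table_name: str) -> List[str]:
    """Table level -> task level: names of the tasks a table belongs to."""
    rows = conn.execute(
        "SELECT j.JOB_NAME FROM DM_TABLE t JOIN DM_JOB j ON j.JOB_ID = t.JOB_ID "
        "WHERE t.TABLE_NAME = ? ORDER BY j.JOB_NAME",
        (table_name,),
    )
    return [name for (name,) in rows]


def search_tables(conn: sqlite3.Connection, keyword: str) -> List[str]:
    """Table-level search: names containing keyword (LIKE wildcards apply)."""
    rows = conn.execute(
        "SELECT TABLE_NAME FROM DM_TABLE WHERE TABLE_NAME LIKE ? ORDER BY TABLE_NAME",
        (f"%{keyword}%",),
    )
    return [name for (name,) in rows]


def trace(conn: sqlite3.Connection, table_id: int, upstream: bool = True) -> List[int]:
    """Breadth-first walk of R_TABLE starting from table_id.

    upstream=True follows TABLE_PARENT_ID (sources of the data);
    upstream=False follows children (tables affected by a change).
    Returns table IDs in visit order; the seen set stops cycles from looping.
    """
    if upstream:
        sql = "SELECT TABLE_PARENT_ID FROM R_TABLE WHERE TABLE_ID = ? ORDER BY 1"
    else:
        sql = "SELECT TABLE_ID FROM R_TABLE WHERE TABLE_PARENT_ID = ? ORDER BY 1"
    seen, order, queue = {table_id}, [], deque([table_id])
    while queue:
        current = queue.popleft()
        for (nxt,) in conn.execute(sql, (current,)).fetchall():
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order
```

For example, if R_TABLE holds the rows (3, 1), (3, 2) and (4, 3), then `trace(conn, 4)` returns `[3, 1, 2]` and `trace(conn, 1, upstream=False)` returns `[3, 4]`.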

The advantages of the invention are as follows:

(1) The traceability relations among the various metadata are maintained automatically and presented to the user in visual form, which reduces the complexity of the user's data management work.

(2) When data is abnormal, the source of the anomaly can be located quickly, improving the efficiency of resolving data problems.

(3) When a data source or the business changes, the tables and flows affected by the change can be identified quickly, which shortens the time needed for subsequent development and reduces the data risk caused by the change.

Drawings

FIG. 1 is a schematic workflow diagram of the present invention.

Detailed Description

To make the objectives, technical solutions and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.

The invention comprises the following modules:

(1) Traceability analysis storage module:

Task level:

Task relation table R_JOB

Task list DM_JOB

Note: the task execution frequency is classified as daily, weekly, monthly or other, coded 1, 2, 3 and 0 respectively; the code is used to classify the lines between task nodes in the subsequent visual display;

Table level:

Table relation table R_TABLE

Table list DM_TABLE

Field name	Data type	Note
TABLE_ID	Numeric	Table ID
JOB_ID	Numeric	ID of the task the table belongs to
TABLE_NAME	Character	Table name
TABLE_COMMENT	Character	Table comment
TABLE_OWNER	Character	Person responsible for the table
TABLE_DB	Character	Database containing the table
TABLE_CREATE_TIME	Timestamp	Table creation time

Field level:

Field relation table R_COLUMN

Field list DM_COLUMN

(2) Analysis module:

(3) Inspection and early-warning module:

This module performs inspection and early warning on two kinds of content:

(4) Visual display module

The visual display module presents the models at every level of the traceability storage model in visual form (for example, charts and tables); the user can also modify the models in response to the anomalies reported by the inspection and early-warning module. The visual display corresponds one-to-one with the levels of the storage model, and drill operations can be performed between tasks, tables and fields: for example, from the task level the user can drill down to the tables belonging to a task, or from a table drill up to the tasks related to it, so that data relationships at different levels are displayed. The module also supports searching at each level to help the user quickly locate the required data.

The above description is only a preferred embodiment of the present invention, and is only used to illustrate the technical solutions of the present invention, and not to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A data traceability analysis visualization method, characterized in that

the scripts and processing jobs used in data cleaning and processing are parsed, the association relations among data tables are stored, and the relations are displayed to the user as visual charts, thereby realizing visual traceability analysis of all kinds of data.

2. The method of claim 1,

wherein the method is implemented by a system comprising the following modules:

(1) traceability analysis storage module:

according to granularity, from coarse to fine, the relationships in the tracing process are divided into three levels: task level, table level and field level, and a different storage model is configured for each level;

(2) analysis module:

parsing the SQL scripts and processing jobs used in data cleaning and processing;

(3) inspection and early-warning module:

performing inspection and early warning on two kinds of content;

(4) visual display module

displaying the models at every level of the traceability storage model in visual form, the user being able to modify the models in response to the anomalies reported by the inspection and early-warning module.

3. The method of claim 2,

Task level:

Task relation table R_JOB

Task list DM_JOB

Table level:

Table relation table R_TABLE

Table list DM_TABLE

Field level:

Field relation table R_COLUMN

Field list DM_COLUMN

4. The method of claim 2,

for workflows built from SQL scripts, parsing the SQL script into an abstract syntax tree, and deriving from the tree the source tables, target tables, source fields and target fields involved in the SQL code segment;

for all tables involved, de-duplicating them and merging them into DM_TABLE, and merging their remaining basic information from the corresponding DBMS into DM_TABLE; then storing the source-to-target table correspondences in R_TABLE; for all fields involved, de-duplicating them and merging them into DM_COLUMN, and merging their remaining basic information from the corresponding DBMS into DM_COLUMN; then storing the source-to-target field correspondences in R_COLUMN;

for processing jobs, obtaining the task-level, table-level and field-level association relations by parsing the metadata of the processing flows in the back-end repository, and storing them in the table of the corresponding level.

5. The method of claim 4,

for other scheduling forms, R_JOB and DM_JOB are maintained and updated manually.

6. The method of claim 2,

the inspection and early-warning module specifically performs the following:

a. for scripts and flows that were parsed but whose source-target relation could not be determined, feeding the relevant information back to the user, who specifies the relation manually;

7. The method of claim 2,

the visual display module displays the models at every level of the traceability storage model in visual form, the user being able to modify the models in response to the anomalies reported by the inspection and early-warning module.

8. The method of claim 7,

the visual display corresponds one-to-one with the levels of the storage model, and tasks, tables and fields can be drilled into one another to display data relationships at different levels;

and the module supports searching at each level to help the user quickly locate the required data.