CN112364095A - Data traceability analysis visualization method - Google Patents

Publication number
CN112364095A
Authority
CN
China
Prior art keywords
data
field
type
task
name
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202011293369.8A
Other languages
Chinese (zh)
Inventor
郑敏
吴呈良
李欣阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chaozhou Zhuoshu Big Data Industry Development Co Ltd
Original Assignee
Chaozhou Zhuoshu Big Data Industry Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chaozhou Zhuoshu Big Data Industry Development Co Ltd
Priority to CN202011293369.8A
Publication of CN112364095A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26 Visual data mining; Browsing structured data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G06F16/24554 Unary operations; Data partitioning operations
    • G06F16/24556 Aggregation; Duplicate elimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data traceability analysis visualization method, which belongs to the technical field of data management. The method reduces the complexity of an enterprise's data management work. When daily data is abnormal, it helps the user quickly locate the data source and the step at which the anomaly first occurred. When the business changes, the associated tables can be quickly identified and analyzed, which shortens the development time for change-related data and reduces the data risk caused by the business change.

Description

Data traceability analysis visualization method
Technical Field
The invention relates to a data management technology, in particular to a data traceability analysis visualization method.
Background
With the continuous emergence and wide application of new computer technologies and services, the types of data keep increasing, and the data that enterprises analyze and process comes from many sources: internal information systems, external environments such as public internet data, or data purchased from third parties. Different data sources differ in quality and affect the results of analysis and processing differently. When data is abnormal, a reliable tool is needed to trace the data back to its source and quickly locate the problem. In addition, when a data source or a service changes, the scope of the impact must be evaluated rapidly so that a timely response can be made.
Disclosure of Invention
In order to solve the technical problems, the invention provides a data traceability analysis visualization method, which is used for reducing the complexity of work of an enterprise in data classification management, improving the management efficiency of data, reducing data risks caused by business changes and improving the analysis efficiency of abnormal data.
The technical scheme of the invention is as follows:
A data traceability analysis visualization method, in which the scripts and processing jobs used in data cleaning and processing are parsed, the association relationships among data tables are stored, and these relationships are then displayed to the user as visual charts, thereby realizing visual traceability analysis of the data.
Further,
the system comprises the following modules:
(1) Traceability analysis storage module:
In order of decreasing granularity, the relationships involved in tracing are divided into three levels: task level, table level and field level. A separate storage model is configured for each level:
Task level:
Task relation table R_JOB
Field name | Data type | Note
JOB_ID | Numeric | Task ID
JOB_PARENT_ID | Numeric | Parent task ID
Task list DM_JOB
Field name | Data type | Note
JOB_ID | Numeric | Task ID
JOB_NAME | Character | Task name
JOB_FREQ | Numeric | Task execution frequency
JOB_EXEC_TIME | Timestamp | Last execution time of the task
Note: task execution frequency is classified as daily, weekly, monthly or other, coded 1, 2, 3 and 0 respectively; the code is used to distinguish the lines drawn between task nodes in the subsequent visual display.
Table level:
Table relation table R_TABLE
Field name | Data type | Note
TABLE_ID | Numeric | Table ID
TABLE_PARENT_ID | Numeric | Parent data table ID
Table list DM_TABLE
Field name | Data type | Note
TABLE_ID | Numeric | Table ID
JOB_ID | Numeric | ID of the task to which the table belongs
TABLE_NAME | Character | Data table name
TABLE_COMMENT | Character | Data table comment
TABLE_OWNER | Character | Person in charge of the data table
TABLE_DB | Character | Database containing the data table
TABLE_CREATE_TIME | Timestamp | Data table creation time
Field level:
Field relation table R_COLUMN
Field name | Data type | Note
COLUMN_ID | Numeric | Field ID
COLUMN_PARENT_ID | Numeric | Parent field ID
Field list DM_COLUMN
Field name | Data type | Note
COLUMN_ID | Numeric | Field ID
TABLE_ID | Numeric | ID of the table to which the field belongs
COLUMN_NAME | Character | Field name
COLUMN_COMMENT | Character | Field comment
COLUMN_TYPE | Character | Field type
(2) Analysis module:
The analysis module parses the SQL scripts and processing jobs used in data cleaning and processing.
a. For job flows of the SQL-script type, the SQL script is parsed into an abstract syntax tree, and the source tables, target tables, source fields and target fields involved in the SQL code segment are then extracted from that tree. All involved tables are de-duplicated and merged into DM_TABLE, and the remaining basic table information is fetched from the corresponding DBMS and merged into DM_TABLE. The correspondence between source tables and target tables is then stored in R_TABLE. Likewise, all involved fields are de-duplicated and merged into DM_COLUMN, the remaining basic field information is fetched from the corresponding DBMS and merged into DM_COLUMN, and the correspondence between source fields and target fields is stored in R_COLUMN.
b. For processing jobs built with a common data processing tool, such as one based on Pentaho Data Integration, the metadata of the processing flow in the tool's back-end repository is parsed to obtain the task-level, table-level and field-level associations, which are stored in the tables of the corresponding level.
c. For other scheduling forms, R_JOB and DM_JOB can be maintained and updated manually.
(3) Inspection and early-warning module:
The module inspects and warns about two kinds of content:
a. For scripts and flows that the system attempted to parse but whose source-target relationship could not be determined, the corresponding information is fed back to the user, who specifies the relationship manually.
b. Hierarchy relationships that were successfully parsed and stored in the storage model are checked for changes daily; if a change is found, the user is notified of it.
(4) Visual display module:
The visual display module presents the models at each level of the traceability storage model in visual form (such as charts and tables), and lets the user modify the models in response to the abnormal conditions reported by the inspection and early-warning module. The visual display corresponds one-to-one with the levels of the storage model, and the user can drill between tasks, tables and fields: for example, drilling down from a task to the tables that belong to it, or drilling up from a table to the tasks related to it, so that data relationships at different levels are displayed. The module also supports searching at each level to help the user quickly locate the required data.
The invention has the following advantages:
(1) The traceability relationships among the various metadata are maintained automatically and displayed to the user in visual form, reducing the complexity of the user's data management work.
(2) When data is abnormal, the source of the anomaly can be located quickly, improving the efficiency of resolving data problems.
(3) When a data source or the business changes, the tables and flows affected by the change can be identified quickly, shortening the time required for subsequent development and reducing the data risk caused by the change.
Drawings
FIG. 1 is a schematic workflow diagram of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings. The described embodiments are some, but not all, embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
The invention comprises the following modules:
(1) Traceability analysis storage module:
In order of decreasing granularity, the relationships involved in tracing are divided into three levels: task level, table level and field level. A separate storage model is configured for each level:
Task level:
Task relation table R_JOB
Field name | Data type | Note
JOB_ID | Numeric | Task ID
JOB_PARENT_ID | Numeric | Parent task ID
Task list DM_JOB
Field name | Data type | Note
JOB_ID | Numeric | Task ID
JOB_NAME | Character | Task name
JOB_FREQ | Numeric | Task execution frequency
JOB_EXEC_TIME | Timestamp | Last execution time of the task
Note: task execution frequency is classified as daily, weekly, monthly or other, coded 1, 2, 3 and 0 respectively; the code is used to distinguish the lines drawn between task nodes in the subsequent visual display.
Table level:
Table relation table R_TABLE
Field name | Data type | Note
TABLE_ID | Numeric | Table ID
TABLE_PARENT_ID | Numeric | Parent data table ID
Table list DM_TABLE
Field name | Data type | Note
TABLE_ID | Numeric | Table ID
JOB_ID | Numeric | ID of the task to which the table belongs
TABLE_NAME | Character | Data table name
TABLE_COMMENT | Character | Data table comment
TABLE_OWNER | Character | Person in charge of the data table
TABLE_DB | Character | Database containing the data table
TABLE_CREATE_TIME | Timestamp | Data table creation time
Field level:
Field relation table R_COLUMN
Field name | Data type | Note
COLUMN_ID | Numeric | Field ID
COLUMN_PARENT_ID | Numeric | Parent field ID
Field list DM_COLUMN
Field name | Data type | Note
COLUMN_ID | Numeric | Field ID
TABLE_ID | Numeric | ID of the table to which the field belongs
COLUMN_NAME | Character | Field name
COLUMN_COMMENT | Character | Field comment
COLUMN_TYPE | Character | Field type
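The three-level storage model can be illustrated as a relational schema. The following is only a minimal sketch using Python's built-in sqlite3 module: the SQLite column types, the primary keys and the helper name create_lineage_schema are assumptions made for illustration and are not specified in the source.

```python
import sqlite3

# Hypothetical DDL for the six tables of the storage model.
# Column names follow the text; types and keys are assumed.
SCHEMA = """
CREATE TABLE DM_JOB (
    JOB_ID        INTEGER PRIMARY KEY,
    JOB_NAME      TEXT,
    JOB_FREQ      INTEGER,   -- 1 = daily, 2 = weekly, 3 = monthly, 0 = other
    JOB_EXEC_TIME TIMESTAMP
);
CREATE TABLE R_JOB (
    JOB_ID        INTEGER,
    JOB_PARENT_ID INTEGER
);
CREATE TABLE DM_TABLE (
    TABLE_ID          INTEGER PRIMARY KEY,
    JOB_ID            INTEGER,
    TABLE_NAME        TEXT,
    TABLE_COMMENT     TEXT,
    TABLE_OWNER       TEXT,
    TABLE_DB          TEXT,
    TABLE_CREATE_TIME TIMESTAMP
);
CREATE TABLE R_TABLE (
    TABLE_ID        INTEGER,
    TABLE_PARENT_ID INTEGER
);
CREATE TABLE DM_COLUMN (
    COLUMN_ID      INTEGER PRIMARY KEY,
    TABLE_ID       INTEGER,
    COLUMN_NAME    TEXT,
    COLUMN_COMMENT TEXT,
    COLUMN_TYPE    TEXT
);
CREATE TABLE R_COLUMN (
    COLUMN_ID        INTEGER,
    COLUMN_PARENT_ID INTEGER
);
"""


def create_lineage_schema(conn):
    """Create the task-, table- and field-level tables in the given database."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_lineage_schema(conn)
table_names = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(table_names)
```

Each level pairs a list table (DM_*) that describes the nodes with a relation table (R_*) that stores child-to-parent edges, so a node with several parents simply has several rows in the relation table.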
(2) Analysis module:
The analysis module parses the SQL scripts and processing jobs used in data cleaning and processing.
a. For job flows of the SQL-script type, the SQL script is parsed into an abstract syntax tree, and the source tables, target tables, source fields and target fields involved in the SQL code segment are then extracted from that tree. All involved tables are de-duplicated and merged into DM_TABLE, and the remaining basic table information is fetched from the corresponding DBMS and merged into DM_TABLE. The correspondence between source tables and target tables is then stored in R_TABLE. Likewise, all involved fields are de-duplicated and merged into DM_COLUMN, the remaining basic field information is fetched from the corresponding DBMS and merged into DM_COLUMN, and the correspondence between source fields and target fields is stored in R_COLUMN.
b. For processing jobs built with a common data processing tool, such as one based on Pentaho Data Integration, the metadata of the processing flow in the tool's back-end repository is parsed to obtain the task-level, table-level and field-level associations, which are stored in the tables of the corresponding level.
c. For other scheduling forms, R_JOB and DM_JOB can be maintained and updated manually.
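The table-level part of step a can be illustrated with a small example. The source describes building an abstract syntax tree; the sketch below deliberately substitutes a much simpler regular-expression scan that only handles statements of the form INSERT INTO ... SELECT ... FROM ... JOIN ..., so it is a stand-in for the idea and not the parsing approach itself. The function name extract_table_lineage and the sample table names are hypothetical.

```python
import re

# Simplified stand-in for AST-based parsing: plain regular expressions.
TARGET_RE = re.compile(r"\bINSERT\s+INTO\s+([\w.]+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def extract_table_lineage(sql):
    """Return de-duplicated (source_table, target_table) pairs for one statement."""
    target_match = TARGET_RE.search(sql)
    if target_match is None:
        # No recognizable target: in the described system such a script
        # would be reported to the user for manual mapping.
        return []
    target = target_match.group(1).lower()
    sources = []
    for name in SOURCE_RE.findall(sql):
        name = name.lower()
        if name != target and name not in sources:
            sources.append(name)
    return [(source, target) for source in sources]


sql = """
INSERT INTO dw.order_summary
SELECT o.customer_id, SUM(o.amount)
FROM ods.orders o
JOIN ods.customers c ON c.id = o.customer_id
GROUP BY o.customer_id
"""
edges = extract_table_lineage(sql)
print(edges)
```

Each returned pair corresponds to one row of R_TABLE once the table names have been resolved to TABLE_ID values in DM_TABLE. A regular-expression scan misses subqueries, common table expressions and dialect-specific syntax, which is why a real parser producing a syntax tree is the approach the text describes.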
(3) Inspection and early-warning module:
The module inspects and warns about two kinds of content:
a. For scripts and flows that the system attempted to parse but whose source-target relationship could not be determined, the corresponding information is fed back to the user, who specifies the relationship manually.
b. Hierarchy relationships that were successfully parsed and stored in the storage model are checked for changes daily; if a change is found, the user is notified of it.
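The daily change check in item b can be pictured as a comparison of two snapshots of a relation table. The sketch below assumes each snapshot is a list of (child ID, parent ID) pairs, as they would be read from R_JOB, R_TABLE or R_COLUMN; the function name diff_relations and the sample data are hypothetical, and how the user is actually notified is not specified in the source.

```python
def diff_relations(previous, current):
    """Compare two snapshots of (child_id, parent_id) edges."""
    previous, current = set(previous), set(current)
    return {
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
    }


# Hypothetical snapshots of R_TABLE taken on consecutive days.
yesterday = [(10, 1), (10, 2), (11, 10)]
today = [(10, 1), (11, 10), (11, 3)]

changes = diff_relations(yesterday, today)
if changes["added"] or changes["removed"]:
    # Stand-in for the notification step.
    print("lineage changed:", changes)
```

Treating the relations as sets makes the check independent of row order, so only edges that genuinely appeared or disappeared are reported.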
(4) Visual display module:
The visual display module presents the models at each level of the traceability storage model in visual form (such as charts and tables), and lets the user modify the models in response to the abnormal conditions reported by the inspection and early-warning module. The visual display corresponds one-to-one with the levels of the storage model, and the user can drill between tasks, tables and fields: for example, drilling down from a task to the tables that belong to it, or drilling up from a table to the tasks related to it, so that data relationships at different levels are displayed. The module also supports searching at each level to help the user quickly locate the required data.
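The queries behind such a display can be sketched in a few lines. The example below assumes that R_TABLE rows are available as (TABLE_ID, TABLE_PARENT_ID) pairs and DM_TABLE rows as (TABLE_ID, JOB_ID) pairs; the function names trace_upstream and tables_of_job and the sample IDs are hypothetical, and the rendering itself is out of scope.

```python
from collections import deque


def trace_upstream(relations, start_id):
    """Breadth-first walk from a table to all of its direct and indirect parents."""
    parents = {}
    for child_id, parent_id in relations:
        parents.setdefault(child_id, []).append(parent_id)
    seen, order, queue = {start_id}, [], deque([start_id])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:  # also guards against cyclic relations
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


def tables_of_job(dm_table_rows, job_id):
    """Drill down from a task to the tables that belong to it."""
    return [table_id for table_id, owner_job in dm_table_rows if owner_job == job_id]


r_table = [(4, 2), (4, 3), (2, 1), (3, 1)]           # (TABLE_ID, TABLE_PARENT_ID)
dm_table = [(1, 100), (2, 101), (3, 101), (4, 102)]  # (TABLE_ID, JOB_ID)

upstream = trace_upstream(r_table, 4)
job_tables = tables_of_job(dm_table, 101)
print(upstream, job_tables)
```

The same traversal applies unchanged to R_JOB and R_COLUMN, since all three relation tables share the child-to-parent shape; tracing from an abnormal table to its ancestors is what allows the source of a problem to be located.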
The above description is only a preferred embodiment of the present invention, and is only used to illustrate the technical solutions of the present invention, and not to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (8)

1. A data traceability analysis visualization method, characterized in that
the scripts and processing jobs used in data cleaning and processing are parsed, the association relationships among data tables are stored, and these relationships are displayed to the user as visual charts, thereby realizing visual traceability analysis of the data.
2. The method of claim 1,
wherein the system comprises the following modules:
(1) a traceability analysis storage module:
in order of decreasing granularity, the relationships involved in tracing are divided into three levels: task level, table level and field level; a separate storage model is configured for each level;
(2) an analysis module:
parsing the SQL scripts and processing jobs used in data cleaning and processing;
(3) an inspection and early-warning module:
inspecting and issuing early warnings on two kinds of content;
(4) a visual display module:
the models at each level of the traceability storage model are displayed in visual form, and the user can modify the models in response to the abnormal conditions reported by the inspection and early-warning module.
3. The method of claim 2,
Task level:
Task relation table R_JOB
Field name | Data type | Note
JOB_ID | Numeric | Task ID
JOB_PARENT_ID | Numeric | Parent task ID
Task list DM_JOB
Field name | Data type | Note
JOB_ID | Numeric | Task ID
JOB_NAME | Character | Task name
JOB_FREQ | Numeric | Task execution frequency
JOB_EXEC_TIME | Timestamp | Last execution time of the task
Note: task execution frequency is classified as daily, weekly, monthly or other, coded 1, 2, 3 and 0 respectively; the code is used to distinguish the lines drawn between task nodes in the subsequent visual display;
Table level:
Table relation table R_TABLE
Field name | Data type | Note
TABLE_ID | Numeric | Table ID
TABLE_PARENT_ID | Numeric | Parent data table ID
Table list DM_TABLE
Field name | Data type | Note
TABLE_ID | Numeric | Table ID
JOB_ID | Numeric | ID of the task to which the table belongs
TABLE_NAME | Character | Data table name
TABLE_COMMENT | Character | Data table comment
TABLE_OWNER | Character | Person in charge of the data table
TABLE_DB | Character | Database containing the data table
TABLE_CREATE_TIME | Timestamp | Data table creation time
Field level:
Field relation table R_COLUMN
Field name | Data type | Note
COLUMN_ID | Numeric | Field ID
COLUMN_PARENT_ID | Numeric | Parent field ID
Field list DM_COLUMN
Field name | Data type | Note
COLUMN_ID | Numeric | Field ID
TABLE_ID | Numeric | ID of the table to which the field belongs
COLUMN_NAME | Character | Field name
COLUMN_COMMENT | Character | Field comment
COLUMN_TYPE | Character | Field type
4. The method of claim 2,
for job flows of the SQL-script type, the SQL script is parsed into an abstract syntax tree, and the source tables, target tables, source fields and target fields involved in the SQL code segment are then extracted from that tree;
all involved tables are de-duplicated and merged into DM_TABLE, and the remaining basic table information is fetched from the corresponding DBMS and merged into DM_TABLE; the correspondence between source tables and target tables is then stored in R_TABLE; all involved fields are de-duplicated and merged into DM_COLUMN, and the remaining basic field information is fetched from the corresponding DBMS and merged into DM_COLUMN; the correspondence between source fields and target fields is then stored in R_COLUMN;
for processing jobs, the metadata of the processing flow in the back-end repository is parsed to obtain the task-level, table-level and field-level associations, which are stored in the tables of the corresponding level.
5. The method of claim 4,
for other scheduling forms, R_JOB and DM_JOB are maintained and updated manually.
6. The method of claim 2,
the inspection and early-warning module works as follows:
a. for scripts and flows that the system attempted to parse but whose source-target relationship could not be determined, the corresponding information is fed back to the user, who specifies the relationship manually;
b. hierarchy relationships that were successfully parsed and stored in the storage model are checked for changes daily; if a change is found, the user is notified of it.
7. The method of claim 2,
the visual display module displays the models at each level of the traceability storage model in visual form, and the user can modify the models in response to the abnormal conditions reported by the inspection and early-warning module.
8. The method of claim 7,
the visual display corresponds one-to-one with the levels of the storage model, and the user can drill between tasks, tables and fields to display data relationships at different levels;
the module also supports searching at each level to help the user quickly locate the required data.
CN202011293369.8A 2020-11-18 2020-11-18 Data traceability analysis visualization method Withdrawn CN112364095A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011293369.8A CN112364095A (en) 2020-11-18 2020-11-18 Data traceability analysis visualization method

Publications (1)

Publication Number Publication Date
CN112364095A true CN112364095A (en) 2021-02-12

Family

ID=74533461

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011293369.8A Withdrawn CN112364095A (en) 2020-11-18 2020-11-18 Data traceability analysis visualization method

Country Status (1)

Country Link
CN (1) CN112364095A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113486108A (en) * 2021-07-06 2021-10-08 建信金融科技有限责任公司 Data processing method and device, electronic equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210212