CN117235037A - Data entity-based data blood relationship tracing method

Publication number
CN117235037A
Authority
CN
China
Legal status
Pending
Application number
CN202311025166.4A
Other languages
Chinese (zh)
Inventor
刘静涛
岳丽军
周万宁
管东林
屈峰
王一
苏思
张建延
张新建
陈单英
彦世兵
杨春
Current Assignee
Unit 91977 Of Pla
Original Assignee
Unit 91977 Of Pla
Priority date
2023-08-15
Filing date
2023-08-15
Publication date
2023-12-15
Application filed by Unit 91977 Of Pla
Priority to CN202311025166.4A
Publication of CN117235037A
Current legal status: Pending

Classifications

    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to a data entity-based data lineage tracing method comprising the following steps: establishing a mapping with the data entity to obtain mapping association data, collecting the metadata lineage records of the data entity, and generating traceability data; defining the lineage data and setting its coarse or fine granularity at the table level and the field level; analyzing the mapping data to derive the field-level and table-level lineage of the data entity; storing the data entity mapping relationships and table relationships according to the structure of the field-level and table-level lineage, and obtaining the lineage levels and the mapping relationship data set corresponding to the data entity ID; and traversing the lineage data set with a recursive loop and a breadth-first algorithm to analyze, organize and visually display the lineage data. The method can display the contextual relationships among the data objects of a data management platform and rapidly analyze the impact between data objects.

Description

Data entity-based data blood relationship tracing method
Technical Field
The invention relates to the technical field of data processing, in particular to a data lineage tracing method based on data entities.
Background
Mainstream methods for data tracing and tracking include the annotation method, the reverse-query method, the data tracking method, the bidirectional pointer tracking method, and tracking with a dedicated query language based on graph theory. The annotation method makes tracing simple and convenient, but it needs extra storage space for the annotation information, which makes it unsuitable for fine-grained tracing, especially on large data sets. The reverse-query method needs less storage space than the annotation method, but it is complex to implement, is limited in the scenarios it supports, and on its own cannot effectively trace lineage at the table and field levels of a data entity. The lineage tracing described here therefore combines the reverse-query method with data relationship mapping and a recursive breadth-first traversal algorithm to collect, query and display data lineage, which supports lineage tracing at the table and field levels of a data entity.
The principle of data lineage tracing is shown in FIG. 1.
Data lineage tracing establishes a mapping with a data entity over its data lifecycle, then collects the relationships and evolution of the data entity's metadata across stages such as collection, storage, processing, transmission, exchange and archiving, generates a data tracing link, and stores the data (including mapping relationship data and table relationships) according to the lineage data structure.
When a data entity is selected through a query for lineage display, the data mapping corresponding to its data entity ID is matched and a related query is run on that ID. The lineage tracing algorithm then traverses the data to form a data set, from which a graph drawing algorithm renders the lineage of the nodes at different levels.
Disclosure of Invention
The invention aims to provide a data entity-based data lineage tracing method that realizes table-level and field-level lineage tracing.
To achieve this purpose, the data entity-based data lineage tracing method of the present invention includes:
step 1, establishing a mapping with a data entity to obtain mapping association data, collecting the lineage records of the data entity's metadata according to the mapping relationship, labeling the data, and generating traceability data;
step 2, defining the lineage data according to the tracing requirements for the data entity, and setting the coarse or fine granularity of the lineage at the table level and the field level;
step 3, analyzing the mapping data with a lineage analysis algorithm to derive the field-level and table-level lineage of the data entity;
step 4, storing the data entity mapping relationships and the table-to-table relationships according to the structure of the field-level and table-level lineage, using a relational database or a graph database as the store;
step 5, querying the data entity to trace its lineage, and obtaining the lineage levels and the mapping relationship data set corresponding to the data entity ID;
step 6, traversing the lineage data set with a recursive loop and a breadth-first algorithm, and analyzing, organizing and visually displaying the lineage data.
Further, step 2 includes creating an ANTLR grammar file.
Further, step 3 includes generating the lexical and syntax analysis classes.
Further, step 3 includes tree parsing.
Further, the tree parsing includes field-table relationship parsing.
Further, the tree parsing includes: 1) parsing the INSERT; 2) parsing the SELECT; 3) handling asterisks: if the INSERT statement specifies no insert fields, the field aliases of the first-layer query are used as the insert fields, and a field without an alias contributes its field name; if the insert fields are empty and the query field is an asterisk, this type of query is not yet supported. If a query contains an asterisk: when a first-layer query field is or contains an asterisk, the asterisk is replaced with the insert fields not already among the query fields; when the last-layer query fields contain an asterisk, it is replaced with the parent query fields not already among this layer's query fields; when a middle-layer query contains an asterisk, it is replaced with the sub-query fields not already among this layer's query fields. Unsupported cases are a UNION containing an asterisk and parent and child queries that both contain asterisks; 4) analyzing the association relationships.
The method of the invention has the following advantages:
The method generates the lineage of data objects directly through lexical analysis and visual data processing, clearly displays the contextual relationships among the data objects of the data management platform, and uses lineage to rapidly analyze the impact between data objects along the data flow link.
Drawings
FIG. 1 is a schematic diagram of the data lineage tracing principle;
FIG. 2 shows the implementation process of data lineage tracing.
Detailed Description
The technical solution of the present invention is described clearly and completely below in conjunction with specific embodiments. Those skilled in the art should understand that the embodiments described below only illustrate the present invention and should not be construed as limiting its scope. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of the present invention.
Technical terms used in the invention:
Data mapping association: establishes an association mapping with the data entity object so that lineage entity IDs can be matched.
Data definition: marks, on the data association mapping, the granularity at which lineage is recorded, such as table level, field level or table-record level.
Lineage analysis: processes data records such as entity metadata mapping relationships and inter-table relationships according to the data definition.
Lineage storage: stores the data produced by lineage analysis, including data entity metadata, lineage mapping relationships and table relationships.
Lineage query: queries the lineage of a selected data entity, which can be chosen or imported through a visual interface.
Lineage traversal analysis: analyzes the lineage according to a specified model or method.
Lineage display: displays the lineage visually according to the lineage levels and distribution of the data entity.
Tracing application: builds on the basic tracing data and tracing access; application scenarios include data quality, audit trails, data re-derivation and data analysis.
Data lineage reflects the origin and evolution of data. It helps track where data comes from and how it is processed, and presents its sources, transformations, storage and other processes as a lineage graph. Lineage starts from a given entity and traces its processing back to the data source interfaces of the data system. Different types of entities involve different types of transformation: a base warehouse entity involves the ETL process; a warehouse summary table may involve both ETL and warehouse aggregation; and an indicator additionally involves the indicator generation process. Data source interface entities are supplied by source systems as inputs to the data system, while all other data entities undergo one or more types of processing. Lineage analysis gives users insight, as needed, into each process and its inputs and outputs.
Lineage analysis realizes table-level and field-level lineage tracing, and a data map can be built on top of the lineage.
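To illustrate how the two granularities relate, the following Java sketch (an assumption, not the patent's code) derives table-level lineage from field-level lineage by discarding field names and removing duplicates. The record names FieldEdge and TableEdge, and the sample table names, are hypothetical.

```java
import java.util.*;

/** Hypothetical field-level lineage edge: sourceTable.sourceField -> targetTable.targetField. */
record FieldEdge(String sourceTable, String sourceField, String targetTable, String targetField) {}

/** Hypothetical table-level lineage edge, obtained by dropping the field names. */
record TableEdge(String sourceTable, String targetTable) {}

final class LineageGranularity {
    /** Collapses field-level edges into distinct table-level edges, keeping first-seen order. */
    static List<TableEdge> toTableLevel(List<FieldEdge> fieldEdges) {
        Set<TableEdge> tableEdges = new LinkedHashSet<>();
        for (FieldEdge e : fieldEdges) {
            // Records implement equals/hashCode, so duplicate table pairs collapse here.
            tableEdges.add(new TableEdge(e.sourceTable(), e.targetTable()));
        }
        return new ArrayList<>(tableEdges);
    }

    public static void main(String[] args) {
        List<FieldEdge> edges = List.of(
            new FieldEdge("ods_order", "amount", "dw_sales", "total_amount"),
            new FieldEdge("ods_order", "order_date", "dw_sales", "sale_date"),
            new FieldEdge("ods_customer", "name", "dw_sales", "customer_name"));
        // [TableEdge[sourceTable=ods_order, targetTable=dw_sales], TableEdge[sourceTable=ods_customer, targetTable=dw_sales]]
        System.out.println(toTableLevel(edges));
    }
}
```

Storing only field-level edges and deriving the table level on demand is one design choice; a store could equally persist both levels, as step 4 allows.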
The implementation process of lineage tracing is shown in FIG. 2.
Lineage tracing can cover the acquisition, processing, storage, application and other processes in the lifecycle of the data entity metadata. The specific implementation is as follows:
Step 1, establishing a mapping with the data entity to obtain mapping association data, collecting the lineage records of the data entity metadata according to the mapping relationship, and labeling the data to generate traceability data.
Step 2, defining the lineage data according to the tracing requirements for the data entity, and setting the coarse or fine granularity of the lineage at the table level and the field level.
Step 3, analyzing the mapping data with a lineage analysis algorithm to derive the field-level and table-level lineage of the data entity.
Step 4, storing data such as the data entity mapping relationships and inter-table relationships according to the structure of the field-level and table-level lineage; a relational database or a graph database can be used as the store.
Step 5, querying the lineage of the data entity, and obtaining the lineage levels and the mapping relationship data set corresponding to the data entity ID.
Step 6, traversing the lineage data set with a recursive loop and a breadth-first algorithm, and analyzing, organizing and visually displaying the lineage data.
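Step 6 combines a recursive loop with breadth-first search; the Java sketch below shows only the breadth-first part, which assigns each upstream node a level for layered display. It assumes the stored lineage has been loaded into an in-memory map from an entity ID to the IDs of its direct sources; the class name and sample IDs are hypothetical.

```java
import java.util.*;

final class LineageTraversal {
    /**
     * Breadth-first walk over upstream lineage edges starting at an entity ID.
     * Returns each reachable node mapped to its level (0 = the queried entity,
     * 1 = its direct sources, ...). Already-visited nodes are skipped, so cycles terminate.
     */
    static Map<String, Integer> levels(String startId, Map<String, List<String>> upstream) {
        Map<String, Integer> level = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        level.put(startId, 0);
        queue.add(startId);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            int next = level.get(node) + 1;
            for (String src : upstream.getOrDefault(node, List.of())) {
                if (!level.containsKey(src)) { // first visit gives the shortest distance in BFS
                    level.put(src, next);
                    queue.add(src);
                }
            }
        }
        return level;
    }

    public static void main(String[] args) {
        Map<String, List<String>> upstream = Map.of(
            "report", List.of("dw_sales"),
            "dw_sales", List.of("ods_order", "ods_customer"),
            "ods_order", List.of("src_api"));
        // {report=0, dw_sales=1, ods_order=2, ods_customer=2, src_api=3}
        System.out.println(levels("report", upstream));
    }
}
```

Nodes sharing a level value can then be drawn in the same column of the lineage graph.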
DETAILED DESCRIPTION OF EMBODIMENTS
1. Creating an ANTLR grammar file
Creating tokens
The INSERT_SELECT_STATEMENT token represents the entire INSERT ... SELECT statement, and INSERT_COL_LIST represents the list of inserted columns. Example grammar file:
2. Generating the lexical and syntax analysis classes
The ANTLRWorks tool generates SqlLexer.java and SqlParser.java from the lexical rules, tokens and grammar file.
SqlLexer.java scans the input stream according to the SQL lexical rules; for each lexical rule it contains a method named m followed by the rule name (m<RuleName>), and these methods split the input stream into tokens.
SqlParser.java constructs a parser on top of the lexer, applies the grammar rules to the token stream, and generates a tree for each grammar rule in the grammar file.
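To make the m<RuleName> convention concrete, the hand-written Java sketch below imitates it: one method per lexical rule, each consuming characters and emitting a token, plus a dispatcher that picks the rule from the next character. It is a simplified stand-in, not code generated by ANTLRWorks; the class name ToySqlLexer and the rules ID, WS and PUNCT are hypothetical.

```java
import java.util.*;

/** Hand-written stand-in for a generated lexer; NOT ANTLR output. */
final class ToySqlLexer {
    private final String input;
    private int pos = 0;
    private final List<String> tokens = new ArrayList<>();

    ToySqlLexer(String input) { this.input = input; }

    /** Rule ID: a run of letters, digits and underscores (keywords and names alike). */
    private void mID() {
        int start = pos;
        while (pos < input.length()
                && (Character.isLetterOrDigit(input.charAt(pos)) || input.charAt(pos) == '_')) {
            pos++;
        }
        tokens.add(input.substring(start, pos));
    }

    /** Rule WS: whitespace is consumed and discarded. */
    private void mWS() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
    }

    /** Rule PUNCT: any other single character, such as ',', '(', ')', '*' or '.'. */
    private void mPUNCT() {
        tokens.add(String.valueOf(input.charAt(pos++)));
    }

    /** Chooses a rule method from the next character, roughly as a generated lexer dispatches. */
    List<String> tokenize() {
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (Character.isWhitespace(c)) mWS();
            else if (Character.isLetterOrDigit(c) || c == '_') mID();
            else mPUNCT();
        }
        return tokens;
    }
}
```

For example, `new ToySqlLexer("INSERT INTO t(a) SELECT x.a FROM s x").tokenize()` yields the tokens INSERT, INTO, t, (, a, ), SELECT, x, ., a, FROM, s, x, which a parser would then assemble into a tree.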
3. Tree parsing
1) Tree parsing in detail
Root node
The root node is the token INSERT_STATEMENT.
First layer child node
The first-layer child nodes fall into two main categories: nodes related to the INSERT, and the whole SELECT statement (including nested sub-queries).
INSERT node
The child nodes of the INSERT node are TABLE_NAME, TABLE_ALIAS and INSERT_COL_LIST, where TABLE_NAME is the name of the table inserted into, TABLE_ALIAS is its alias, and INSERT_COL_LIST lists the inserted fields. Each subtree is traversed to obtain the table name, table alias and field names.
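A minimal Java sketch of this INSERT-node traversal follows. It assumes a hypothetical Node record (token name, text, children) in place of the ANTLR tree classes and a hypothetical COLUMN token for each entry under INSERT_COL_LIST; InsertInfo is likewise an illustrative name.

```java
import java.util.*;

/** Minimal stand-in for a parse-tree node: token name, optional text, children (hypothetical). */
record Node(String name, String text, List<Node> children) {
    Node(String name, String text) { this(name, text, List.of()); }
    Node(String name, List<Node> children) { this(name, null, children); }
}

/** What the INSERT subtree yields: target table, its alias, and the inserted fields. */
record InsertInfo(String tableName, String tableAlias, List<String> columns) {}

final class InsertNodeParser {
    /** Visits each child of the INSERT node once; a missing INSERT_COL_LIST leaves columns empty. */
    static InsertInfo parse(Node insert) {
        String table = null;
        String alias = null;
        List<String> cols = new ArrayList<>();
        for (Node child : insert.children()) {
            switch (child.name()) {
                case "TABLE_NAME" -> table = child.text();
                case "TABLE_ALIAS" -> alias = child.text();
                case "INSERT_COL_LIST" -> child.children().forEach(c -> cols.add(c.text()));
                default -> { } // other children are ignored in this sketch
            }
        }
        return new InsertInfo(table, alias, cols);
    }
}
```

An empty column list is later filled from the first-layer query, as described under handling asterisks.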
INSERT_SELECT_STATEMENT node
This node is named INSERT_SELECT_STATEMENT; its child nodes are SELECT_COL_EXPRS, SELECT_TAB_EXPRS, SELECT_WITH_EXPRS, SELECT_UNION, SELECT_JOIN and SELECT_DBLINK.
SELECT_COL_EXPRS node
The root node is named SELECT_COL_EXPRS, and its main child node is COLUMN_EXPR, the column syntax token. The main child nodes of COLUMN_EXPR are ATOM_EXPR and SELECT_COL_ALIAS; ATOM_EXPR is the column name node, whose main child nodes are FUNCTION_EXPR and sql_identifier.
SELECT_TAB_EXPRS node
This node represents the queried tables. The root node is named SELECT_TAB_EXPRS, and its main child nodes are SELECT_TAB (the queried table), SELECT_TAB_ALIAS (the table alias), SELECT_UNION (UNION or UNION ALL), SELECT_JOIN (JOIN) and SELECT_STATEMENT. Each node is traversed in turn to obtain the table names and table aliases; note that the table name obtained for a sub-query is null.
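The table collection on SELECT_TAB_EXPRS can be sketched in Java as below, again with a hypothetical Node record standing in for the ANTLR tree. The sketch assumes each SELECT_TAB_ALIAS sibling applies to the table or sub-query immediately before it; a nested SELECT_STATEMENT is recorded with a null table name, as the text describes. SELECT_UNION and SELECT_JOIN children are left to their own parsers here, and TableRef is an illustrative name.

```java
import java.util.*;

/** Minimal stand-in for a parse-tree node (hypothetical). */
record Node(String name, String text, List<Node> children) {
    Node(String name, String text) { this(name, text, List.of()); }
    Node(String name, List<Node> children) { this(name, null, children); }
}

/** A queried table; tableName == null marks a sub-query. */
record TableRef(String tableName, String alias) {}

final class TableExprCollector {
    static List<TableRef> collect(Node tabExprs) {
        List<TableRef> out = new ArrayList<>();
        for (Node child : tabExprs.children()) {
            switch (child.name()) {
                case "SELECT_TAB" -> out.add(new TableRef(child.text(), null));
                case "SELECT_STATEMENT" -> out.add(new TableRef(null, null)); // sub-query
                case "SELECT_TAB_ALIAS" -> {
                    // Attach the alias to the most recently collected table or sub-query.
                    if (!out.isEmpty()) {
                        TableRef last = out.remove(out.size() - 1);
                        out.add(new TableRef(last.tableName(), child.text()));
                    }
                }
                default -> { } // SELECT_UNION / SELECT_JOIN handled elsewhere
            }
        }
        return out;
    }
}
```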
SELECT_TAB_ALIAS node
This is the table alias node; its child node holds the table alias.
SELECT_UNION node
The root node is named SELECT_UNION; its main child nodes are SELECT_COL_EXPRS, SELECT_COL_ALIAS, SELECT_TAB_EXPRS, SELECT_TAB_ALIAS and SELECT_WITH_EXPRS, which are traversed in turn to obtain the UNION node information.
SELECT_JOIN node
The root node is named SELECT_JOIN; its main child nodes are SELECT_TABLE, SELECT_ALIAS and ON_EXPRS, and traversing each child node in turn yields the JOIN node information.
SELECT_DBLINK node
The root node is named SELECT_DBLINK, and its child nodes are at_sign and sql_identifier.
2) Field-table relationship parsing:
Because the tree structure is complex and nests recursively downward, a single logical check cannot resolve the relationships between the target fields of the target table and the source fields, or between intermediate source fields and the fields of the source tables. Auxiliary classes are therefore used for the analysis.
Auxiliary classes
The QueryEntity class records the parse result of each query statement; if a sub-query exists, the whole sub-query is replaced by a table whose name is empty.
Table 1 QueryEntity class attributes
The TableEntity class records information about a queried table.
Table 2 TableEntity class attributes
The FieldEntity class records information about a field.
Table 3 FieldEntity class attributes
The UnionQueryEntity class records information about a UNION query.
Table 4 UnionQueryEntity class attributes
The relation entity class holds the parse results.
Table 5 Relation entity class attributes
QueryComparator class
Implements the Comparator interface to sort a List<QueryEntity> by layer: Collections.sort(list, new QueryComparator());
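Tables 1 to 5 are not reproduced here, so the Java sketch below gives QueryEntity only two assumed attributes (a layer number and its query fields) and shows QueryComparator sorting by ascending layer with Collections.sort. The ascending order, the attribute names and the SortDemo class are assumptions for illustration.

```java
import java.util.*;

/** Hypothetical subset of the QueryEntity attributes. */
final class QueryEntity {
    final int layer;           // assumed: 1 = outermost SELECT, larger = more deeply nested
    final List<String> fields; // query field names or aliases at this layer

    QueryEntity(int layer, List<String> fields) {
        this.layer = layer;
        this.fields = fields;
    }

    @Override
    public String toString() { return "QueryEntity(layer=" + layer + ")"; }
}

/** Orders QueryEntity objects by ascending layer. */
final class QueryComparator implements Comparator<QueryEntity> {
    @Override
    public int compare(QueryEntity a, QueryEntity b) {
        return Integer.compare(a.layer, b.layer);
    }
}

final class SortDemo {
    public static void main(String[] args) {
        List<QueryEntity> list = new ArrayList<>(List.of(
            new QueryEntity(3, List.of("amount")),
            new QueryEntity(1, List.of("total")),
            new QueryEntity(2, List.of("amt"))));
        Collections.sort(list, new QueryComparator());
        System.out.println(list); // [QueryEntity(layer=1), QueryEntity(layer=2), QueryEntity(layer=3)]
    }
}
```

Sorting by layer lets the association analysis step from one query layer to the next by index.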
3) Outline of the tree parsing steps:
(1) Parsing the INSERT
The inserted table name, its alias and the inserted field names are extracted from the first-layer child nodes of the main tree; if no inserted field names are given, the field list is set to empty.
(2) Parsing the SELECT
The SELECT is parsed layer by layer: a QueryEntity object is built for each layer, a FieldEntity object for each query column, a TableEntity object for each queried table, and a UnionQueryEntity object if a SELECT_UNION node exists. The relationships among these objects are then established.
(3) Handling asterisks
No insert fields specified
If the INSERT statement does not specify insert fields, the field aliases of the first-layer query are used as the insert fields; a field without an alias contributes its field name. If the insert fields are empty and the query field is an asterisk, this type of SQL query is not yet supported.
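The fallback rule for unspecified insert fields might look like the following Java sketch. QueryColumn and InsertFieldResolver are hypothetical names, and throwing UnsupportedOperationException is an assumed way of signalling the unsupported case.

```java
import java.util.*;

/** Hypothetical first-layer query column: field name plus optional alias (null if none). */
record QueryColumn(String name, String alias) {}

final class InsertFieldResolver {
    /**
     * Uses the declared insert fields when present; otherwise each first-layer
     * query column contributes its alias, or its name when it has no alias.
     * An unspecified insert list combined with a "*" query column is rejected.
     */
    static List<String> resolve(List<String> declaredInsertFields, List<QueryColumn> firstLayer) {
        if (declaredInsertFields != null && !declaredInsertFields.isEmpty()) {
            return declaredInsertFields;
        }
        List<String> result = new ArrayList<>();
        for (QueryColumn c : firstLayer) {
            if ("*".equals(c.name())) {
                throw new UnsupportedOperationException("insert fields unspecified and query uses *");
            }
            result.add(c.alias() != null ? c.alias() : c.name());
        }
        return result;
    }
}
```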
Queries containing asterisks
If a first-layer query field is or contains an asterisk, the asterisk is replaced with the insert fields that are not already among the query fields (the complement of the query field list within the insert field list). If the last-layer query fields contain an asterisk, the asterisk is replaced with the parent query fields not already among this layer's query fields.
If a middle-layer query contains an asterisk, the asterisk is replaced with the sub-query fields not already among this layer's query fields.
Unsupported cases
A UNION containing an asterisk is not supported, nor are parent and child queries that both contain asterisks.
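Reading "complement" as a set difference, the asterisk replacement can be sketched in Java as follows. The reference list depends on the layer (the insert fields for the first layer, the parent query fields for the last layer, the sub-query fields for a middle layer). The sketch assumes fields are compared by the name or alias under which they appear and that '*' occurs at most once per layer; StarExpansion is a hypothetical name.

```java
import java.util.*;

final class StarExpansion {
    /**
     * Replaces "*" in queryFields with the reference fields that this layer
     * does not already list explicitly, preserving the reference order.
     */
    static List<String> expand(List<String> queryFields, List<String> referenceFields) {
        Set<String> explicit = new LinkedHashSet<>(queryFields);
        explicit.remove("*");
        List<String> complement = new ArrayList<>();
        for (String f : referenceFields) {
            if (!explicit.contains(f)) complement.add(f);
        }
        List<String> out = new ArrayList<>();
        for (String f : queryFields) {
            if ("*".equals(f)) out.addAll(complement); // assumes a single '*'
            else out.add(f);
        }
        return out;
    }
}
```

For example, with `INSERT INTO t(a, b, c) SELECT a, * FROM ...`, the first-layer call `expand(List.of("a", "*"), List.of("a", "b", "c"))` yields [a, b, c].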
(4) Association analysis
Detailed algorithm
First, the table entity list of a query entity is traversed. When a field's table qualifier equals a table alias: if the table type is JOIN, a relation entity object is constructed and added to the result list; if the table type is common and the table name is not empty, a relation entity object is likewise added to the result list; if the table name is empty (a sub-query), traversal moves to the next-layer query entity and the same steps are repeated.
For each layer of query entity, if its List<UnionQueryEntity> is not empty, the inserted field at the same index as the UNION field is found, and a relation entity object is constructed and added to the result list.
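A simplified Java sketch of the alias matching and layer descent in the detailed algorithm follows. Assumptions: inserted fields pair with first-layer fields by position, a sub-query is resolved only against the next layer, the UNION branch is omitted, and all type names are hypothetical stand-ins for the auxiliary classes.

```java
import java.util.*;

enum TableType { COMMON, JOIN }
record TableEntry(String name, String alias, TableType type) {}        // name == null: sub-query
record FieldEntry(String outputName, String tableAlias, String column) {}
record Layer(List<TableEntry> tables, List<FieldEntry> fields) {}
record Relation(String targetField, String sourceTable, String sourceField) {}

final class AssociationAnalysis {
    /** Traces one field at layer `depth` back to physical tables, appending relations to `out`. */
    static void trace(String target, FieldEntry field, List<Layer> layers, int depth, List<Relation> out) {
        for (TableEntry t : layers.get(depth).tables()) {
            if (!Objects.equals(t.alias(), field.tableAlias())) continue;
            if (t.type() == TableType.JOIN || t.name() != null) {
                out.add(new Relation(target, t.name(), field.column()));
            } else if (depth + 1 < layers.size()) {
                // Sub-query: find the column by its output name in the next layer and recurse.
                for (FieldEntry inner : layers.get(depth + 1).fields()) {
                    if (inner.outputName().equals(field.column())) {
                        trace(target, inner, layers, depth + 1, out);
                    }
                }
            }
        }
    }

    /** Pairs inserted fields with first-layer fields by position and traces each pair. */
    static List<Relation> analyze(List<String> insertFields, List<Layer> layers) {
        List<Relation> out = new ArrayList<>();
        List<FieldEntry> first = layers.get(0).fields();
        for (int i = 0; i < insertFields.size() && i < first.size(); i++) {
            trace(insertFields.get(i), first.get(i), layers, 0, out);
        }
        return out;
    }
}
```

For `INSERT INTO dw_sales(total, cust) SELECT s.amt, c.name FROM (SELECT o.amount AS amt FROM ods_order o) s JOIN ods_customer c ON ...`, the sketch yields total from ods_order.amount and cust from ods_customer.name.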
Analysis result
The parse results are stored in the relation entity list; one inserted field may derive from multiple tables and multiple fields.
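Because one inserted field can derive from several sources, a consumer may want the flat relation list grouped by inserted field. A Java sketch using the standard Collectors.groupingBy, with a hypothetical Relation record:

```java
import java.util.*;
import java.util.stream.*;

/** Hypothetical flat relation: inserted field <- sourceTable.sourceField. */
record Relation(String targetField, String sourceTable, String sourceField) {}

final class RelationGrouping {
    /** Groups relations by inserted field, keeping first-seen order and "table.field" sources. */
    static Map<String, List<String>> byTarget(List<Relation> relations) {
        return relations.stream().collect(Collectors.groupingBy(
            Relation::targetField,
            LinkedHashMap::new,
            Collectors.mapping(r -> r.sourceTable() + "." + r.sourceField(), Collectors.toList())));
    }

    public static void main(String[] args) {
        List<Relation> rels = List.of(
            new Relation("total", "ods_order", "amount"),
            new Relation("total", "ods_refund", "amount"),
            new Relation("day", "ods_order", "order_date"));
        // {total=[ods_order.amount, ods_refund.amount], day=[ods_order.order_date]}
        System.out.println(byTarget(rels));
    }
}
```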
Although the invention has been described in detail above through a general description and specific embodiments, it will be apparent to those skilled in the art that modifications or improvements can be made on this basis. Such modifications or improvements, made without departing from the spirit of the invention, fall within the scope of the invention as claimed.

Claims (6)

1. A data entity-based data lineage tracing method, comprising the following steps:
step 1, establishing a mapping with a data entity to obtain mapping association data, collecting the lineage records of the data entity's metadata according to the mapping relationship, labeling the data, and generating traceability data;
step 2, defining the lineage data according to the tracing requirements for the data entity, and setting the coarse or fine granularity of the lineage at the table level and the field level;
step 3, analyzing the mapping data with a lineage analysis algorithm to derive the field-level and table-level lineage of the data entity;
step 4, storing the data entity mapping relationships and the table-to-table relationships according to the structure of the field-level and table-level lineage, using a relational database or a graph database as the store;
step 5, querying the data entity to trace its lineage, and obtaining the lineage levels and the mapping relationship data set corresponding to the data entity ID;
step 6, traversing the lineage data set with a recursive loop and a breadth-first algorithm, and analyzing, organizing and visually displaying the lineage data.
2. The method according to claim 1, wherein step 2 includes creating an ANTLR grammar file.
3. The method according to claim 1, wherein step 3 includes generating the lexical and syntax analysis classes.
4. The data entity-based data lineage tracing method according to claim 3, wherein step 3 comprises tree parsing.
5. The data entity-based data lineage tracing method according to claim 4, wherein the tree parsing includes field-table lineage parsing.
6. The data entity-based data lineage tracing method according to claim 4, wherein the tree parsing comprises: 1) parsing the INSERT; 2) parsing the SELECT; 3) handling asterisks: if the INSERT statement specifies no insert fields, the field aliases of the first-layer query are used as the insert fields, and a field without an alias contributes its field name; if the insert fields are empty and the query field is an asterisk, this type of query is not yet supported; if a first-layer query field is or contains an asterisk, the asterisk is replaced with the insert fields not already among the query fields; if the last-layer query fields contain an asterisk, it is replaced with the parent query fields not already among this layer's query fields; if a middle-layer query contains an asterisk, it is replaced with the sub-query fields not already among this layer's query fields; a UNION containing an asterisk, and parent and child queries that both contain asterisks, are not supported; 4) analyzing the association relationships.
CN202311025166.4A 2023-08-15 2023-08-15 Data entity-based data blood relationship tracing method Pending CN117235037A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311025166.4A CN117235037A (en) 2023-08-15 2023-08-15 Data entity-based data blood relationship tracing method

Publications (1)

Publication Number Publication Date
CN117235037A (en) 2023-12-15

Family

ID=89091996

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination