CN112783857A - Data lineage management method and apparatus, electronic device, and storage medium - Google Patents

Info

Publication number
CN112783857A
Authority
CN
China
Prior art keywords
target
data
source
workflow
field
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011623179.8A
Other languages
Chinese (zh)
Other versions
CN112783857B (en)
Inventor
任亮
傅雨梅
杨飞
文齐辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhiyin Intelligent Technology Co ltd
Original Assignee
Beijing Zhiyin Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhiyin Intelligent Technology Co ltd
Priority to CN202011623179.8A
Publication of CN112783857A
Application granted
Publication of CN112783857B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a data lineage management method and apparatus, an electronic device, and a storage medium. The data lineage management method comprises: acquiring a workflow of target metadata and storing the workflow to a target node, the workflow including at least one flow component; parsing the flow components in the workflow, and determining a source table and a target table of each flow component and an association relationship between a source field in the source table and a target field in the target table; and managing the data lineage through the attribute information of the target node and a branch graph. By recording and managing the workflow as a whole during data processing, the application records, across the full process, the table lineage and the field lineage of each node and each flow component in the workflow, so that data problems in a data warehouse can be located, tracked, and traced back in a reasonable manner.

Description

Data lineage management method and apparatus, electronic device, and storage medium
Technical Field
The present application relates to the field of data management technologies, and in particular to a data lineage management method and apparatus, an electronic device, and a storage medium.
Background
At present, most data lineage monitoring on the market is derived from the lineage of a single component such as hive, from individual data tables, or from a single direction of data flow. There is no complete, end-to-end lineage monitoring that covers the whole metadata and data processing process, from data flow to data table to data field. Therefore, in daily data processing and in managing all the data tables and data processing flows in a data warehouse, when a data problem occurs or data needs to be managed, existing lineage solutions cannot locate, track, and trace back the problem in a reasonable manner.
Disclosure of Invention
In view of this, an object of the present application is to provide a data lineage management method and apparatus, an electronic device, and a storage medium which, by recording and managing the workflow as a whole during data processing, record across the full process the table lineage and the field lineage of each node and each flow component in the workflow, so that data problems in a data warehouse can be located, tracked, and traced back in a reasonable manner.
The application mainly comprises the following aspects:
In a first aspect, an embodiment of the present application provides a data lineage management method, where the data lineage management method includes:
acquiring a workflow of target metadata, and storing the workflow to a target node; the workflow comprises at least one flow component;
parsing the flow components in the workflow, determining a source table and a target table of each flow component and an association relationship between a source field in the source table and a target field in the target table, and displaying the determined source table, the determined target table, and the association relationship between the source field and the target field in the form of a branch graph;
and managing the data lineage through the attribute information of the target node and the branch graph.
In a possible implementation, the flow components specifically include: a data exchange flow component, a data development flow component, a data quality verification flow component, and a data visualization flow component.
In a possible implementation, the parsing the flow components in the workflow, determining a source table and a target table of each flow component and an association relationship between a source field in the source table and a target field in the target table, and displaying the determined source table, the target table, and the association relationship between the source field and the target field in the form of a branch graph includes:
parsing the synchronization script of the data exchange flow component in the workflow, and determining a source table and a target table of the data exchange flow component;
parsing the corresponding fields of the source data corresponding to the metadata in the source table and the target data corresponding to the metadata in the target table, determining the association relationship between the source fields in the source table and the target fields in the target table, and displaying the determined source table, the target table, and the association relationship between the source fields and the target fields in the form of a branch graph.
In a possible implementation, the parsing the flow components in the workflow, determining a source table and a target table of each flow component and an association relationship between a source field in the source table and a target field in the target table, and displaying the determined source table, the target table, and the association relationship between the source field and the target field in the form of a branch graph further includes:
parsing the script of the data development flow component in the workflow through custom code, and synchronously parsing specific fields in the metadata.
In a possible implementation, the attribute information of the target node includes: the creation time of each node, the capacity of each node, the internal execution directory of each node, and the execution server corresponding to each node.
In a second aspect, an embodiment of the present application further provides a data lineage management apparatus, including:
a first obtaining module, configured to acquire a workflow of target metadata and store the workflow to a target node; the workflow comprises at least one flow component;
a parsing module, configured to parse the flow components in the workflow, determine a source table and a target table of each flow component and an association relationship between a source field in the source table and a target field in the target table, and display the determined source table, the determined target table, and the association relationship between the source field and the target field in the form of a branch graph;
and a management module, configured to manage the data lineage through the attribute information of the target node and the branch graph.
In a possible implementation, the flow components in the first obtaining module specifically include: a data exchange flow component, a data development flow component, a data quality verification flow component, and a data visualization flow component.
In a possible implementation, the parsing module includes:
a first data exchange flow parsing unit, configured to parse the synchronization script of the data exchange flow component in the workflow, and determine a source table and a target table of the data exchange flow component;
and a second data exchange flow parsing unit, configured to parse the corresponding fields of the source data corresponding to the metadata in the source table and the target data corresponding to the metadata in the target table, determine the association relationship between the source fields in the source table and the target fields in the target table, and display the determined source table, the target table, and the association relationship between the source fields and the target fields in the form of a branch graph.
In a third aspect, an embodiment of the present application further provides an electronic device, including: a processor, a memory, and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is operating, the machine-readable instructions when executed by the processor performing the steps of the data lineage management method as described above.
In a fourth aspect, the present application further provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, performs the steps of the data lineage management method as described above.
Compared with data lineage management methods in the prior art, the data lineage management method and apparatus provided by the embodiments of the present application record and manage the workflow as a whole during data processing, thereby recording across the full process the table lineage and the field lineage of each node and each flow component in the workflow, so that data problems in a data warehouse can be located, tracked, and traced back in a reasonable manner.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
FIG. 1 is a flowchart of a data lineage management method provided by an embodiment of the present application;
FIG. 2 is a flowchart of another data lineage management method provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a data lineage management apparatus provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of another data lineage management apparatus provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present application;
FIG. 6 is a flowchart of a workflow in a data lineage management method provided by an embodiment of the present application.
In the figures:
300-data lineage management apparatus; 310-first obtaining module; 320-parsing module; 321-first data exchange flow parsing unit; 322-second data exchange flow parsing unit; 323-data development flow parsing unit; 330-management module; 500-electronic device; 510-processor; 520-memory; 530-bus.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. Every other embodiment that can be obtained by a person skilled in the art without making creative efforts based on the embodiments of the present application falls within the protection scope of the present application.
Research shows that most data lineage monitoring on the market at present is derived from the lineage of a single component such as hive, from individual data tables, or from a single direction of data flow. There is no complete, end-to-end lineage monitoring that covers the whole metadata and data processing process, from data flow to data table to data field. Therefore, in daily data processing and in managing all the data tables and data processing flows in a data warehouse, when a data problem occurs or data needs to be managed, existing lineage solutions cannot locate, track, and trace back the problem in a reasonable manner.
Based on this, embodiments of the present application provide a data lineage management method and apparatus, an electronic device, and a storage medium which, by recording and managing the workflow as a whole during data processing, record across the full process the table lineage and the field lineage of each node and each flow component in the workflow, so that data problems in a data warehouse can be located, tracked, and traced back in a reasonable manner.
Referring to FIG. 1, FIG. 1 is a flowchart of a data lineage management method provided by an embodiment of the present application. As shown in FIG. 1, the data lineage management method provided by the embodiment of the present application includes the following steps:
S101, acquiring a workflow of target metadata, and storing the workflow to a target node; the workflow includes at least one flow component.
In this step, the workflow of the target metadata is acquired, the content of each flow component is recorded according to the content of each workflow, and the recorded content of each flow component is stored in the target node, thereby recording the lineage of the workflow.
Here, an example workflow is shown in FIG. 6: data exchange, cleaning layer data processing, data quality verification, fusion layer data processing, data quality verification, subject layer data processing, data quality verification, and data synchronization.
Specifically, the workflow may be: data exchange (sqoop) -> hql (cleaning layer hive data processing) -> qualitis (quality inspection) -> hql (fusion layer hive data processing) -> qualitis (quality inspection) -> hql (subject layer hive data processing) -> qualitis (quality inspection) -> shell (waterdrop synchronizes hive data to clickhouse).
Further, the flow components specifically include: a data exchange flow component, a data development flow component, a data quality verification flow component, and a data visualization flow component.
Here, the data exchange flow components include sqoop flow components and datax flow components; the data development flow components include shell flow components, hql flow components, sql flow components, spark flow components, python flow components, and the like; the data quality verification flow component is used to check the data for problems; and the data visualization flow component is used to visually display the data of each flow component during workflow processing, which facilitates the display of indicators.
Further, the attribute information of the target node includes: the creation time of each node, the capacity of each node, the internal execution directory of each node, and the execution server corresponding to each node.
Here, the target node is provided so that, when a data problem occurs during data processing flow management or an error occurs in a target node, the corresponding data source can be quickly located, tracked, and traced back according to the type and the attribute information of the target node, and displayed clearly.
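The record kept in step S101 can be pictured with a small data model. The following is only an illustrative sketch: the class names `FlowComponent` and `TargetNode`, the function `store_workflow`, the kind labels, and the sample attribute values are all hypothetical and do not come from the application; only the four component categories, the four node attributes, and the FIG. 6 component order are taken from the text above.

```python
from dataclasses import dataclass, field
from typing import List

# The four component categories named in the description (labels are hypothetical).
COMPONENT_KINDS = {
    "data_exchange", "data_development", "data_quality", "data_visualization",
}


@dataclass
class FlowComponent:
    name: str          # e.g. "sqoop", "hql", "qualitis", "shell"
    kind: str          # one of COMPONENT_KINDS
    script: str = ""   # raw script text, parsed later in S102

    def __post_init__(self):
        if self.kind not in COMPONENT_KINDS:
            raise ValueError(f"unknown component kind: {self.kind}")


@dataclass
class TargetNode:
    created_at: str      # creation time of the node
    capacity_bytes: int  # capacity of the node
    exec_dir: str        # internal execution directory
    exec_server: str     # execution server for the node
    components: List[FlowComponent] = field(default_factory=list)


def store_workflow(node: TargetNode, components: List[FlowComponent]) -> TargetNode:
    """Store a workflow on a target node; a workflow needs at least one component."""
    if not components:
        raise ValueError("a workflow includes at least one flow component")
    node.components = list(components)
    return node


# The example workflow of FIG. 6, in order (attribute values are made up).
workflow = [
    FlowComponent("sqoop", "data_exchange"),
    FlowComponent("hql", "data_development"),
    FlowComponent("qualitis", "data_quality"),
    FlowComponent("hql", "data_development"),
    FlowComponent("qualitis", "data_quality"),
    FlowComponent("hql", "data_development"),
    FlowComponent("qualitis", "data_quality"),
    FlowComponent("shell", "data_development"),
]
node = store_workflow(
    TargetNode("2020-12-31T00:00:00", 0, "/tmp/wf", "worker-1"), workflow
)
print([c.name for c in node.components])
```

Keeping the node attributes next to the ordered component list is what later allows a failing step to be tied back to a directory and a server.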
S102, parsing the flow components in the workflow, determining a source table and a target table of each flow component and an association relationship between a source field in the source table and a target field in the target table, and displaying the determined source table, the determined target table, and the association relationship between the source field and the target field in the form of a branch graph.
In this step, a data processing flow may generate at least one workflow, and each workflow may use at least one flow component; each flow component therefore needs to be parsed to determine its source table and target table, and the association relationship between the source fields in the source table and the target fields in the target table.
To display the source table, the target table, and the association relationship between the source fields and the target fields more intuitively and clearly, they may be displayed in the form of a branch graph, where the branch graph is a knowledge graph structure.
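One way to picture the branch graph is as table-level edges with field-level edges nested beneath them. The sketch below is an assumption about the shape of that structure, not the rendering used by the application; `build_branch_graph`, `render`, and the sample table and field names are hypothetical.

```python
from collections import defaultdict


def build_branch_graph(mappings):
    """Group field associations under their (source table, target table) edge.

    mappings: iterable of (src_table, src_field, dst_table, dst_field).
    """
    graph = defaultdict(lambda: defaultdict(list))
    for src_table, src_field, dst_table, dst_field in mappings:
        graph[(src_table, dst_table)][src_field].append(dst_field)
    return graph


def render(graph):
    """Render the graph as indented text: one branch per table edge."""
    lines = []
    for src_table, dst_table in sorted(graph):
        lines.append(f"{src_table} -> {dst_table}")
        fields = graph[(src_table, dst_table)]
        for src_field in sorted(fields):
            for dst_field in fields[src_field]:
                lines.append(f"  {src_field} -> {dst_field}")
    return "\n".join(lines)


graph = build_branch_graph([
    ("ods.orders", "id", "dwd.orders", "order_id"),
    ("ods.orders", "amt", "dwd.orders", "amount"),
])
print(render(graph))
```

A production system would more likely hand these edges to a graph database or a front-end graph widget; the text rendering here only makes the two levels (table lineage and field lineage) visible.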
S103, managing the data lineage through the attribute information of the target node and the branch graph.
In this step, the attribute information of the target node is combined with the source table, the target table, and the association relationship between the source field and the target field, and the combined information is presented in one place, driven by the branch graph.
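Locating and tracing back a data problem then amounts to walking the field-level edges upstream and attaching the node attributes found along the way. The following is a minimal sketch under that assumption; `trace_upstream`, `locate_sources`, the edge dictionary layout, and the sample tables and servers are hypothetical and not taken from the application.

```python
from collections import deque


def trace_upstream(field_edges, start):
    """Breadth-first walk from a target field back to all of its upstream fields.

    field_edges: dict mapping a target (table, field) to a list of source
    (table, field) pairs. Returns upstream fields in visiting order.
    """
    seen, order = {start}, []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for src in field_edges.get(current, []):
            if src not in seen:  # guards against cycles and repeated sources
                seen.add(src)
                order.append(src)
                queue.append(src)
    return order


def locate_sources(field_edges, node_attrs, start):
    """Pair each upstream field with the execution server recorded for its table."""
    return [
        (src, node_attrs.get(src[0], {}).get("exec_server"))
        for src in trace_upstream(field_edges, start)
    ]


edges = {
    ("ads.report", "total"): [("dws.orders", "amount")],
    ("dws.orders", "amount"): [("dwd.orders", "amount"), ("dwd.refunds", "amount")],
}
attrs = {
    "dws.orders": {"exec_server": "worker-2"},
    "dwd.orders": {"exec_server": "worker-1"},
}
print(locate_sources(edges, attrs, ("ads.report", "total")))
```

Tables with no recorded node attributes simply yield `None` for the server, which keeps the trace usable even when the metadata is incomplete.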
Compared with data lineage management methods in the prior art, the data lineage management method provided by the embodiment of the present application records and manages the workflow as a whole during data processing, thereby recording across the full process the table lineage and the field lineage of each node and each flow component in the workflow, so that data problems in a data warehouse can be located, tracked, and traced back in a reasonable manner.
Referring to FIG. 2, FIG. 2 is a flowchart of a data lineage management method provided by another embodiment of the present application. As shown in FIG. 2, the data lineage management method provided by the embodiment of the present application includes:
S201, acquiring a workflow of target metadata, and storing the workflow to a target node; the workflow includes at least one flow component.
S202, parsing the synchronization script of the data exchange flow component in the workflow, and determining a source table and a target table of the data exchange flow component.
In this step, the flow components in the workflow specifically include a data exchange flow component, a data development flow component, a data quality verification flow component, and a data visualization flow component. The synchronization script of the data exchange flow component needs to be parsed to determine the source table and the target table of the data exchange flow component.
S203, parsing the corresponding fields of the source data corresponding to the metadata in the source table and the target data corresponding to the metadata in the target table, determining the association relationship between the source fields in the source table and the target fields in the target table, and displaying the determined source table, the target table, and the association relationship between the source fields and the target fields in the form of a branch graph.
The data exchange flow component sqoop is parsed as follows: the synchronization script is parsed, the source table and the target table are obtained from the table and the hive table in the script, and the correspondence between the source fields in the source table and the target fields in the target table is parsed in combination with mysql, oracle, and the like.
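The sqoop case can be sketched as plain command-line parsing. The example below assumes the synchronization script is a single `sqoop import` command that uses the standard Sqoop options `--table`, `--hive-table`, and `--columns`; the function name `parse_sqoop_import`, the connection string, and the table names are hypothetical. It does not query MySQL or Oracle metadata, which the application combines with the script to resolve the field correspondence.

```python
import shlex


def parse_sqoop_import(script):
    """Extract source table, target hive table and column list from a sqoop command."""
    # Join shell line continuations before tokenising.
    tokens = shlex.split(script.replace("\\\n", " "))
    opts = {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("--"):
            has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("--")
            if has_value:
                opts[tok[2:]] = tokens[i + 1]
                i += 2
                continue
            opts[tok[2:]] = True  # boolean flag such as --hive-import
        i += 1
    columns = opts.get("columns")
    return {
        "source_table": opts.get("table"),
        "target_table": opts.get("hive-table"),
        "columns": columns.split(",") if isinstance(columns, str) else [],
    }


script = r"""sqoop import \
  --connect jdbc:mysql://db.example/shop \
  --table orders \
  --columns "id,amount" \
  --hive-import \
  --hive-table ods.orders"""
result = parse_sqoop_import(script)
print(result)
```

When `--columns` is absent, the column list comes back empty and the field correspondence would have to be read from the source database's metadata instead.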
For the data exchange flow component hive, data lineage analysis can be achieved by hive.
Further, the parsing the flow components in the workflow, determining a source table and a target table of each flow component and an association relationship between a source field in the source table and a target field in the target table, and displaying the determined source table, the target table, and the association relationship between the source field and the target field in the form of a branch graph further includes:
parsing the script of the data development flow component in the workflow through custom code, and synchronously parsing specific fields in the metadata.
For the data development flow component spark, specific fields in the metadata can be parsed synchronously by customizing org.
For the data development flow component hadoop, specific fields in the metadata can be parsed synchronously by customizing org.
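As a rough illustration of script-level parsing for a data development component, the sketch below handles one simple hql shape, `INSERT ... SELECT ... FROM`, and pairs the selected expressions positionally with the target table's columns taken from metadata. This is a regex-based stand-in, not the custom code the application refers to; `parse_hql_insert` and the sample names are hypothetical, and select lists containing function calls with commas, joins, subqueries, or partition clauses are out of scope.

```python
import re

_INSERT_SELECT = re.compile(
    r"insert\s+(?:overwrite|into)\s+(?:table\s+)?([\w.]+)\s+"
    r"select\s+(.*?)\s+from\s+([\w.]+)",
    re.IGNORECASE | re.DOTALL,
)


def parse_hql_insert(script, target_columns):
    """Return source table, target table and (source field, target field) pairs.

    target_columns: ordered column names of the target table, as read from
    metadata; select expressions are matched to them by position.
    """
    match = _INSERT_SELECT.search(script)
    if match is None:
        raise ValueError("no INSERT ... SELECT ... FROM statement found")
    target_table, select_list, source_table = match.groups()
    expressions = [e.strip() for e in select_list.split(",")]
    if len(expressions) != len(target_columns):
        raise ValueError("select list and target columns differ in length")
    pairs = []
    for expr, target in zip(expressions, target_columns):
        # Drop an "AS alias" suffix so only the source expression remains.
        source = re.split(r"\s+as\s+", expr, flags=re.IGNORECASE)[0]
        pairs.append((source, target))
    return {"source_table": source_table, "target_table": target_table, "fields": pairs}


hql = "INSERT OVERWRITE TABLE dwd.orders SELECT id, amt AS amount FROM ods.orders"
parsed = parse_hql_insert(hql, ["order_id", "amount"])
print(parsed)
```

Positional matching reflects how an insert-select fills the target table; a real parser would walk the query's syntax tree rather than split on commas.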
Here, the parsing of the corresponding fields between the source field and the target field may specifically be as follows:
For hadoop, custom code can perform the parsing in org.
For elasticsearch, when hive pushes data to ES through a mapping table, the table creation statement of the mapping table is parsed to obtain the source fields and the target fields; when a spark program is used, the corresponding fields are parsed according to the sql fields specified in the program.
For clickhouse, the target table and the source table are parsed from the waterdrop synchronization script, and the source fields and the target fields are obtained from the sql statement in the script and the metadata of clickhouse.
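For the elasticsearch case, the table creation statement of the mapping table can be read as follows. This sketch assumes the mapping table is declared with the elasticsearch-hadoop storage handler and uses its `es.resource` and `es.mapping.names` table properties (the latter lists `hive_column:es_field` renames); the function name and the sample DDL are hypothetical, and column types that contain commas are not handled.

```python
import re


def parse_es_mapping_table(ddl):
    """Extract hive columns and their Elasticsearch field names from a mapping-table DDL."""
    head = re.search(
        r"create\s+external\s+table\s+([\w.]+)\s*\((.*?)\)\s*stored\s+by",
        ddl,
        re.IGNORECASE | re.DOTALL,
    )
    if head is None:
        raise ValueError("not a recognised mapping-table DDL")
    hive_table, column_defs = head.groups()
    # First token of each column definition is the column name.
    columns = [d.split()[0] for d in column_defs.split(",") if d.strip()]
    # Collect 'key' = 'value' pairs from TBLPROPERTIES.
    props = dict(re.findall(r"'([\w.]+)'\s*=\s*'([^']*)'", ddl))
    renames = {}
    for pair in props.get("es.mapping.names", "").split(","):
        if ":" in pair:
            hive_col, es_field = pair.split(":", 1)
            renames[hive_col.strip()] = es_field.strip()
    return {
        "source_table": hive_table,
        "target_index": props.get("es.resource"),
        # Columns without an explicit rename keep their own name in ES.
        "fields": [(c, renames.get(c, c)) for c in columns],
    }


ddl = """CREATE EXTERNAL TABLE es_artists (id BIGINT, name STRING, ts TIMESTAMP)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'radio/artists', 'es.mapping.names' = 'ts:@timestamp')"""
mapping = parse_es_mapping_table(ddl)
print(mapping)
```

The clickhouse case follows the same pattern in spirit: the waterdrop script names the tables, and the field pairs come from its sql statement together with the clickhouse table metadata.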
S204, managing the data lineage through the attribute information of the target node and the branch graph.
For the description of S201, refer to the description of S101; the same technical effect can be achieved, and details are not repeated here.
Compared with data lineage management methods in the prior art, the data lineage management method provided by the embodiment of the present application records and manages the workflow as a whole during data processing, thereby recording across the full process the table lineage and the field lineage of each node and each flow component in the workflow, so that data problems in a data warehouse can be located, tracked, and traced back in a reasonable manner.
Referring to FIG. 3 and FIG. 4, FIG. 3 is a schematic structural diagram of a data lineage management apparatus provided by an embodiment of the present application, and FIG. 4 is a schematic structural diagram of another data lineage management apparatus provided by an embodiment of the present application. As shown in FIG. 3, the data lineage management apparatus 300 includes:
A first obtaining module 310, configured to acquire a workflow of target metadata, and store the workflow to a target node; the workflow includes at least one flow component.
Further, the flow components in the first obtaining module 310 specifically include: a data exchange flow component, a data development flow component, a data quality verification flow component, and a data visualization flow component.
Further, the attribute information of the target node in the first obtaining module 310 includes: the creation time of each node, the capacity of each node, the internal execution directory of each node, and the execution server corresponding to each node.
A parsing module 320, configured to parse the flow components in the workflow, determine a source table and a target table of each flow component and an association relationship between a source field in the source table and a target field in the target table, and display the determined source table, the target table, and the association relationship between the source field and the target field in the form of a branch graph.
A management module 330, configured to manage the data lineage through the attribute information of the target node and the branch graph.
Compared with data lineage management apparatuses in the prior art, the data lineage management apparatus provided by the embodiment of the present application records and manages the workflow as a whole during data processing, thereby recording across the full process the table lineage and the field lineage of each node and each flow component in the workflow, so that data problems in a data warehouse can be located, tracked, and traced back in a reasonable manner.
As shown in FIG. 4, the data lineage management apparatus 300 includes:
A first obtaining module 310, configured to acquire a workflow of target metadata, and store the workflow to a target node; the workflow includes at least one flow component.
Further, the flow components in the first obtaining module 310 specifically include: a data exchange flow component, a data development flow component, a data quality verification flow component, and a data visualization flow component.
Further, the attribute information of the target node in the first obtaining module 310 includes: the creation time of each node, the capacity of each node, the internal execution directory of each node, and the execution server corresponding to each node.
A parsing module 320, configured to parse the flow components in the workflow, determine a source table and a target table of each flow component and an association relationship between a source field in the source table and a target field in the target table, and display the determined source table, the target table, and the association relationship between the source field and the target field in the form of a branch graph.
Further, the parsing module 320 includes:
A first data exchange flow parsing unit 321, configured to parse the synchronization script of the data exchange flow component in the workflow, and determine a source table and a target table of the data exchange flow component.
A second data exchange flow parsing unit 322, configured to parse the corresponding fields of the source data corresponding to the metadata in the source table and the target data corresponding to the metadata in the target table, determine the association relationship between the source fields in the source table and the target fields in the target table, and display the determined source table, the target table, and the association relationship between the source fields and the target fields in the form of a branch graph.
A data development flow parsing unit 323, configured to parse the script of the data development flow component in the workflow through custom code, and synchronously parse specific fields in the metadata.
A management module 330, configured to manage the data lineage through the attribute information of the target node and the branch graph.
Compared with data lineage management apparatuses in the prior art, the data lineage management apparatus provided by the embodiment of the present application records and manages the workflow as a whole during data processing, thereby recording across the full process the table lineage and the field lineage of each node and each flow component in the workflow, so that data problems in a data warehouse can be located, tracked, and traced back in a reasonable manner.
Referring to fig. 5, fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 5, the electronic device 500 includes a processor 510, a memory 520, and a bus 530.
The memory 520 stores machine-readable instructions executable by the processor 510. When the electronic device 500 is running, the processor 510 and the memory 520 communicate via the bus 530, and the machine-readable instructions, when executed by the processor 510, perform the steps of the data lineage management method in the method embodiments shown in fig. 1 and fig. 2.
In particular, the machine readable instructions, when executed by the processor 510, may perform the following:
acquiring a workflow of target metadata, and storing the workflow to a target node, where the workflow includes at least one process component;
parsing the process components in the workflow, determining, for each process component, a source table, a target table, and an association relationship between a source field in the source table and a target field in the target table, and displaying the determined source table, target table, and association relationship between the source field and the target field in the form of a branch graph; and
managing the data lineage through the attribute information of the target node and the branch graph.
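The parsing step for a data exchange flow component can be sketched as follows. The patent does not specify the format of the synchronization script, so this example assumes a JSON document with a `reader` (source) and a `writer` (target) section and pairs fields by position. The layout, the function names `parse_sync_script` and `render_branch_graph`, and the indented text output standing in for the branch graph are all assumptions.

```python
import json

# Hypothetical synchronization script; the reader/writer layout is an assumption.
SYNC_SCRIPT = """
{
  "reader": {"table": "crm.customer", "columns": ["cust_id", "cust_name"]},
  "writer": {"table": "ods.customer", "columns": ["id", "name"]}
}
"""


def parse_sync_script(text):
    """Return (source_table, target_table, field_edges) from the script text."""
    config = json.loads(text)
    reader, writer = config["reader"], config["writer"]
    if len(reader["columns"]) != len(writer["columns"]):
        raise ValueError("source and target column counts differ")
    # Assumption: columns are associated by position.
    field_edges = [
        (f"{reader['table']}.{src}", f"{writer['table']}.{dst}")
        for src, dst in zip(reader["columns"], writer["columns"])
    ]
    return reader["table"], writer["table"], field_edges


def render_branch_graph(source_table, target_table, field_edges):
    """Plain-text stand-in for the branch graph: table edge, then field edges."""
    lines = [f"{source_table} -> {target_table}"]
    lines.extend(f"  {src} -> {dst}" for src, dst in field_edges)
    return "\n".join(lines)


source_table, target_table, field_edges = parse_sync_script(SYNC_SCRIPT)
graph_text = render_branch_graph(source_table, target_table, field_edges)
print(graph_text)
```

The table-level edge comes from the script alone, while the field-level edges additionally rely on the column lists, which mirrors the split between determining the tables first and the field associations second.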
By recording and managing the workflow as a whole during data processing, this application keeps a full-process record of the table lineage and field lineage of each node in the workflow and of each process component in the workflow, so that data problems in the data warehouse can be located, tracked, and backtracked in a reasonable manner.
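One way to picture managing lineage through node attribute information together with the branch graph is a per-node record that holds both. The sketch below uses the attribute kinds named in claim 5 (creation time, capacity, internal execution directory, execution server). The class `TargetNode`, its field names, the capacity unit, and the two query helpers are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class TargetNode:
    """One workflow node plus its attribute information and table-level edges."""
    name: str
    created_at: str        # creation time of the node (ISO 8601 string here)
    capacity_mb: int       # capacity of the node; the unit is an assumption
    execution_dir: str     # internal execution directory of the node
    execution_server: str  # execution server corresponding to the node
    table_edges: list = field(default_factory=list)  # (source_table, target_table)


def nodes_writing_table(nodes, table):
    """Names of nodes whose recorded edges have `table` as a target table."""
    return [n.name for n in nodes if any(dst == table for _, dst in n.table_edges)]


def nodes_on_server(nodes, server):
    """Names of nodes that run on the given execution server."""
    return [n.name for n in nodes if n.execution_server == server]


NODES = [
    TargetNode("sync_customer", "2020-12-01T02:00:00", 512,
               "/jobs/sync_customer", "etl-01", [("crm.customer", "ods.customer")]),
    TargetNode("build_orders", "2020-12-01T03:00:00", 2048,
               "/jobs/build_orders", "etl-02", [("ods.orders_raw", "dw.orders")]),
]

suspects = nodes_writing_table(NODES, "dw.orders")
```

With such records, locating a data problem in a table reduces to finding the nodes that write it and then reading their execution directory and server from the attributes.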
An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the steps of the data lineage management method in the method embodiments shown in fig. 1 and fig. 2 may be performed.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application, or the part of it that contributes over the prior art, may in essence be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above embodiments are only specific embodiments of the present application. They are used to illustrate the technical solutions of the present application, not to limit them, and the protection scope of the present application is not limited thereto. Although the present application is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art may, within the technical scope disclosed in the present application, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes, or substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present application, and are intended to be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A data lineage management method, the method comprising:
acquiring a workflow of target metadata, and storing the workflow to a target node; the workflow comprises at least one process component;
parsing the process components in the workflow, determining, for each process component, a source table, a target table, and an association relationship between a source field in the source table and a target field in the target table, and displaying the determined source table, target table, and association relationship between the source field and the target field in the form of a branch graph;
and managing the data lineage through the attribute information of the target node and the branch graph.
2. The data lineage management method according to claim 1, wherein the process components specifically include: a data exchange flow component, a data development flow component, a data quality verification flow component, and a data visualization flow component.
3. The method according to claim 2, wherein the parsing the process components in the workflow, determining, for each process component, the source table, the target table, and the association relationship between the source field in the source table and the target field in the target table, and displaying the determined source table, target table, and association relationship between the source field and the target field in the form of a branch graph comprises:
parsing the synchronization script of the data exchange flow component in the workflow, and determining a source table and a target table of the data exchange flow component;
parsing the corresponding fields of the source data corresponding to the metadata in the source table and of the target data corresponding to the metadata in the target table, determining an association relationship between the source fields in the source table and the target fields in the target table, and displaying the determined source table, target table, and association relationship between the source fields and the target fields in the form of a branch graph.
4. The method according to claim 2, wherein the parsing the process components in the workflow, determining, for each process component, the source table, the target table, and the association relationship between the source field in the source table and the target field in the target table, and displaying the determined source table, target table, and association relationship between the source field and the target field in the form of a branch graph further comprises:
parsing the script of the data development flow component in the workflow using custom code, and synchronously parsing specific fields in the metadata.
5. The data lineage management method according to claim 1, wherein the attribute information of the target node includes: the creation time of each node, the capacity of each node, the internal execution directory of each node, and the execution server corresponding to each node.
6. A data lineage management device, comprising:
the first acquisition module is used for acquiring a workflow of target metadata and storing the workflow to a target node; the workflow comprises at least one process component;
the analysis module is used for analyzing the process components in the workflow, determining a source table and a target table of each process component and an association relationship between a source field in the source table and a target field in the target table, and displaying the determined source table, the determined target table and the association relationship between the source field and the target field in a branch graph form;
and the management module is used for managing the data lineage through the attribute information of the target node and the branch graph.
7. The data lineage management device according to claim 6, wherein the process components in the first acquisition module specifically include: a data exchange flow component, a data development flow component, a data quality verification flow component, and a data visualization flow component.
8. The data lineage management device according to claim 6, wherein the parsing module includes:
a first data exchange flow parsing unit, configured to parse the synchronization script of the data exchange flow component in the workflow, and determine a source table and a target table of the data exchange flow component;
and a second data exchange flow parsing unit, configured to parse the corresponding fields of the source data corresponding to the metadata in the source table and of the target data corresponding to the metadata in the target table, determine an association relationship between the source fields in the source table and the target fields in the target table, and display the determined source table, target table, and association relationship between the source fields and the target fields in the form of a branch graph.
9. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is operated, the machine-readable instructions being executable by the processor to perform the steps of the data lineage management method according to any one of claims 1 to 5.
10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, performs the steps of the data lineage management method according to any one of claims 1 to 5.
CN202011623179.8A 2020-12-31 2020-12-31 Data blood-margin management method and device, electronic equipment and storage medium Active CN112783857B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011623179.8A CN112783857B (en) 2020-12-31 2020-12-31 Data blood-margin management method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011623179.8A CN112783857B (en) 2020-12-31 2020-12-31 Data blood-margin management method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112783857A true CN112783857A (en) 2021-05-11
CN112783857B CN112783857B (en) 2023-10-20

Family

ID=75753280

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011623179.8A Active CN112783857B (en) 2020-12-31 2020-12-31 Data blood-margin management method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112783857B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113138990A (en) * 2021-05-17 2021-07-20 青岛海信网络科技股份有限公司 Data blood margin construction and tracing method, device and equipment
CN113360720A (en) * 2021-06-24 2021-09-07 平安普惠企业管理有限公司 Data asset visualization method, device and equipment based on data consanguinity relationship
CN113360496A (en) * 2021-05-26 2021-09-07 国网能源研究院有限公司 Method and device for constructing metadata tag library
CN113468257A (en) * 2021-07-05 2021-10-01 乐融致新电子科技(天津)有限公司 Data quality monitoring method and device based on data warehouse
CN114064752A (en) * 2021-11-09 2022-02-18 珠海市新德汇信息技术有限公司 Data influence analysis method based on record-level blood relationship, storage medium and equipment
CN114143177A (en) * 2021-12-01 2022-03-04 云赛智联股份有限公司 Business service monitoring system and monitoring method based on data blood margin
CN114911785A (en) * 2022-05-16 2022-08-16 北京航空航天大学 Data blood reason management method and device and electronic equipment

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150286495A1 (en) * 2014-04-02 2015-10-08 International Business Machines Corporation Metadata-driven workflows and integration with genomic data processing systems and techniques
WO2017020716A1 (en) * 2015-08-03 2017-02-09 阿里巴巴集团控股有限公司 Method and device for data access control
US20170270022A1 (en) * 2016-03-16 2017-09-21 ASG Technologies Group, Inc. dba ASG Technologies Intelligent Metadata Management and Data Lineage Tracing
CN109213754A (en) * 2018-03-29 2019-01-15 北京九章云极科技有限公司 A kind of data processing system and data processing method
CN109325078A (en) * 2018-09-18 2019-02-12 拉扎斯网络科技(上海)有限公司 Data blood margin determination method and device based on structural data
CN109542901A (en) * 2018-11-12 2019-03-29 北京懿医云科技有限公司 Data processing method, device, computer readable storage medium and electronic equipment
US20190138345A1 (en) * 2017-11-09 2019-05-09 Cloudera, Inc. Information based on run-time artifacts in a distributed computing cluster
CN109739893A (en) * 2018-12-28 2019-05-10 上海连尚网络科技有限公司 A kind of metadata management method, equipment and computer-readable medium
CN109918437A (en) * 2019-03-08 2019-06-21 北京中油瑞飞信息技术有限责任公司 Distributed data processing method, apparatus and data assets management system
TW201933053A (en) * 2017-10-12 2019-08-16 美商帕萊堤卡有限責任公司 System of mapping and transforming data and method of collaboratively manipulating and sharing ideas
CN110399359A (en) * 2019-07-24 2019-11-01 阿里巴巴集团控股有限公司 A kind of data retrogressive method, device and equipment
CN110908997A (en) * 2019-10-09 2020-03-24 支付宝(杭州)信息技术有限公司 Data blood margin construction method and device, server and readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
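RAJENDRA BOSE et al.: "Lineage retrieval for scientific data processing: a survey", 《ACM COMPUTING SURVEYS》, vol. 37, no. 1, pages 1 - 28, XP058350704, DOI: 10.1145/1057977.1057978 *
叶天琦 et al.: "数据血缘可视化分析平台研究与应用" (Research and Application of a Data Lineage Visualization Analysis Platform), 《信息技术与标准化》 (Information Technology & Standardization), no. 11, pages 17 - 20 *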

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113138990A (en) * 2021-05-17 2021-07-20 青岛海信网络科技股份有限公司 Data blood margin construction and tracing method, device and equipment
CN113138990B (en) * 2021-05-17 2023-04-18 青岛海信网络科技股份有限公司 Data blood margin construction and tracing method, device and equipment
CN113360496A (en) * 2021-05-26 2021-09-07 国网能源研究院有限公司 Method and device for constructing metadata tag library
CN113360496B (en) * 2021-05-26 2024-05-14 国网能源研究院有限公司 Method and device for constructing metadata tag library
CN113360720A (en) * 2021-06-24 2021-09-07 平安普惠企业管理有限公司 Data asset visualization method, device and equipment based on data consanguinity relationship
CN113360720B (en) * 2021-06-24 2023-11-21 湖北华中电力科技开发有限责任公司 Data asset visualization method, device and equipment based on data blood relationship
CN113468257A (en) * 2021-07-05 2021-10-01 乐融致新电子科技(天津)有限公司 Data quality monitoring method and device based on data warehouse
CN114064752A (en) * 2021-11-09 2022-02-18 珠海市新德汇信息技术有限公司 Data influence analysis method based on record-level blood relationship, storage medium and equipment
CN114143177A (en) * 2021-12-01 2022-03-04 云赛智联股份有限公司 Business service monitoring system and monitoring method based on data blood margin
CN114911785A (en) * 2022-05-16 2022-08-16 北京航空航天大学 Data blood reason management method and device and electronic equipment

Also Published As

Publication number Publication date
CN112783857B (en) 2023-10-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant