CN112783857A - Data lineage management method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN112783857A
- CN202011623179.8A
- Authority
- CN
- China
- Prior art keywords
- target
- data
- source
- workflow
- field
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/283—Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The application provides a data lineage management method and device, an electronic device, and a storage medium. The data lineage management method comprises the following steps: acquiring a workflow of target metadata, and storing the workflow to a target node, the workflow including at least one process component; parsing the process components in the workflow, and determining a source table and a target table of each process component and an association relationship between a source field in the source table and a target field in the target table; and managing the data lineage through the attribute information of the target node and the branch graph. By recording and managing the workflow as a whole during data processing, the application records the table lineage and field lineage of each node and each process component in the workflow across the full process, so that data problems in a data warehouse can be located, tracked, and backtracked in a reasonable manner.
Description
Technical Field
The present application relates to the field of data management technologies, and in particular, to a data lineage management method and apparatus, an electronic device, and a storage medium.
Background
At present, much of the data lineage monitoring on the market is obtained by monitoring the lineage of a single component such as hive, by monitoring individual data tables, or by monitoring a single data flow direction. There is no complete, end-to-end lineage monitoring that covers the whole metadata and data processing process, from data flow to data table to data field. Therefore, in daily data processing and in managing all data tables and data processing flows in a data warehouse, when data has a problem or needs to be managed, the lineage solutions currently on the market cannot locate, track, and backtrack the problem in a reasonable manner.
Disclosure of Invention
In view of this, an object of the present application is to provide a data lineage management method, apparatus, electronic device and storage medium, which record and manage the workflow as a whole during data processing, thereby recording the table lineage and field lineage of each node and each process component in the workflow across the full process, so that data problems in a data warehouse can be located, tracked, and backtracked in a reasonable manner.
The application mainly comprises the following aspects:
In a first aspect, an embodiment of the present application provides a data lineage management method, where the data lineage management method includes:
acquiring a workflow of target metadata, and storing the workflow to a target node; the workflow comprises at least one process component;
parsing the process components in the workflow, determining a source table and a target table of each process component and an association relationship between a source field in the source table and a target field in the target table, and displaying the determined source table, the determined target table and the association relationship between the source field and the target field in a branch graph form;
and managing the data lineage through the attribute information of the target node and the branch graph.
In a possible implementation manner, the process component specifically includes: a data exchange process component, a data development process component, a data quality verification process component and a data visualization process component.
In a possible embodiment, the parsing the process components in the workflow, determining a source table, a target table, and an association relationship between a source field in the source table and a target field in the target table for each process component, and displaying the determined source table, the target table, and the association relationship between the source field and the target field in a branch graph form includes:
parsing the synchronization script of the data exchange process component in the workflow, and determining a source table and a target table of the data exchange process component;
parsing the corresponding fields based on the source data corresponding to the metadata in the source table and the target data corresponding to the metadata in the target table, determining an association relationship between the source fields in the source table and the target fields in the target table, and displaying the determined source table, the target table and the association relationship between the source fields and the target fields in a branch graph form.
In a possible implementation manner, the parsing the process components in the workflow, determining a source table, a target table, and an association relationship between a source field in the source table and a target field in the target table for each process component, and displaying the determined source table, the target table, and the association relationship between the source field and the target field in a branch graph form further includes:
parsing the script of the data development process component in the workflow through custom code, and synchronously parsing specific fields in the metadata.
In one possible implementation, the attribute information of the target node includes: the creation time of each node, the capacity of each node, the internal execution directory of each node, and the execution server corresponding to each node.
In a second aspect, an embodiment of the present application further provides a data lineage management device, including:
a first obtaining module, configured to obtain a workflow of target metadata and store the workflow to a target node; the workflow comprises at least one process component;
a parsing module, configured to parse the process components in the workflow, determine a source table and a target table of each process component and an association relationship between a source field in the source table and a target field in the target table, and display the determined source table, the determined target table and the association relationship between the source field and the target field in a branch graph form;
and a management module, configured to manage the data lineage through the attribute information of the target node and the branch graph.
In a possible implementation manner, the process component in the first obtaining module specifically includes: a data exchange process component, a data development process component, a data quality verification process component and a data visualization process component.
In a possible implementation, the parsing module includes:
a first data exchange process parsing unit, configured to parse the synchronization script of the data exchange process component in the workflow, and determine a source table and a target table of the data exchange process component;
and a second data exchange process parsing unit, configured to parse the corresponding fields based on the source data corresponding to the metadata in the source table and the target data corresponding to the metadata in the target table, determine an association relationship between the source fields in the source table and the target fields in the target table, and display the determined source table, the target table and the association relationship between the source fields and the target fields in a branch graph form.
In a third aspect, an embodiment of the present application further provides an electronic device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is operating, the machine-readable instructions when executed by the processor performing the steps of the data lineage management method as described above.
In a fourth aspect, the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, performs the steps of the data lineage management method as described above.
Compared with data lineage management methods in the prior art, the data lineage management method and device provided by the embodiments of the application record and manage the workflow as a whole during data processing, thereby recording the table lineage and field lineage of each node and each process component in the workflow across the full process, so that data problems in a data warehouse can be located, tracked, and backtracked in a reasonable manner.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
FIG. 1 is a flowchart illustrating a data lineage management method provided by an embodiment of the present application;
FIG. 2 is a flowchart illustrating another data lineage management method provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram illustrating a data lineage management device according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of another data lineage management device provided in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an electronic device provided in an embodiment of the present application;
FIG. 6 is a flowchart illustrating a workflow in a data lineage management method according to an embodiment of the present application.
In the figures:
300-data lineage management device; 310-first obtaining module; 320-parsing module; 321-first data exchange process parsing unit; 322-second data exchange process parsing unit; 323-data development process parsing unit; 330-management module; 500-electronic device; 510-processor; 520-memory; 530-bus.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. Every other embodiment that can be obtained by a person skilled in the art without making creative efforts based on the embodiments of the present application falls within the protection scope of the present application.
Research shows that much of the data lineage monitoring currently on the market is obtained by monitoring the lineage of a single component such as hive, by monitoring individual data tables, or by monitoring a single data flow direction. There is no complete, end-to-end lineage monitoring that covers the whole metadata and data processing process, from data flow to data table to data field. Therefore, in daily data processing and in managing all data tables and data processing flows in a data warehouse, when data has a problem or needs to be managed, the lineage solutions currently on the market cannot locate, track, and backtrack the problem in a reasonable manner.
Based on this, embodiments of the present application provide a data lineage management method and apparatus, an electronic device, and a storage medium, which record and manage the workflow as a whole during data processing, thereby recording the table lineage and field lineage of each node and each process component in the workflow across the full process, so that data problems in a data warehouse can be located, tracked, and backtracked in a reasonable manner.
Referring to fig. 1, fig. 1 is a flowchart illustrating a data lineage management method according to an embodiment of the present application. As shown in fig. 1, a data lineage management method provided by an embodiment of the present application includes the following steps:
S101, acquiring a workflow of target metadata, and storing the workflow to a target node; the workflow includes at least one process component.
In this step, the workflows of the target metadata are acquired, the content of each process component is recorded according to the content of each workflow, and the recorded content of each process component is stored in the target node, thereby recording the lineage of the workflow.
Here, the flow of a certain workflow is taken as an example, as shown in fig. 6: data exchange, cleaning layer data processing, data quality verification, fusion layer data processing, data quality verification, subject layer data processing, data quality verification and data synchronization.
Specifically, the workflow may be: data exchange (sqoop) -> hql (cleaning layer hive data processing) -> qualitis (quality inspection) -> hql (fusion layer hive data processing) -> qualitis (quality inspection) -> hql (subject layer hive data processing) -> qualitis (quality inspection) -> shell (waterdrop synchronizes hive data to clickhouse).
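As an illustration only, the example workflow above could be modeled as an ordered list of steps. This is a minimal sketch, not the application's implementation; the names `Step`, `WORKFLOW` and `edges` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    component: str  # process component type, e.g. "sqoop" or "hql"
    purpose: str    # what the step does in the workflow


# The example workflow from the description, in execution order.
WORKFLOW = [
    Step("sqoop", "data exchange"),
    Step("hql", "cleaning layer hive data processing"),
    Step("qualitis", "quality inspection"),
    Step("hql", "fusion layer hive data processing"),
    Step("qualitis", "quality inspection"),
    Step("hql", "subject layer hive data processing"),
    Step("qualitis", "quality inspection"),
    Step("shell", "waterdrop synchronizes hive data to clickhouse"),
]


def edges(workflow):
    """Return (upstream, downstream) index pairs for a linear workflow."""
    return [(i, i + 1) for i in range(len(workflow) - 1)]
```

Storing the steps in order keeps the flow-level lineage (which step feeds which) separate from the table-level and field-level lineage derived later.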
Further, the process component specifically includes: a data exchange process component, a data development process component, a data quality verification process component and a data visualization process component.
Here, the data exchange process components include sqoop process components and datax process components; the data development process components include shell, hql, sql, spark, and python process components, and the like; the data quality verification process components are used for checking data for problems; and the data visualization process components are used for visually displaying the data of each process component during workflow processing, which facilitates the display of certain indicators.
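The grouping of component types into categories can be sketched as a simple lookup. The table below only covers the component types named in this description (qualitis is taken from the example workflow); the names `CATEGORY_BY_COMPONENT` and `categorize` are hypothetical.

```python
# Component type -> process component category, as listed in the description.
CATEGORY_BY_COMPONENT = {
    "sqoop": "data exchange",
    "datax": "data exchange",
    "shell": "data development",
    "hql": "data development",
    "sql": "data development",
    "spark": "data development",
    "python": "data development",
    "qualitis": "data quality verification",
}


def categorize(component: str) -> str:
    """Return the category of a component type, or "unknown" if unlisted."""
    return CATEGORY_BY_COMPONENT.get(component.lower(), "unknown")
```

A parser dispatcher could use the category to decide which parsing strategy applies to a given step.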
Further, the attribute information of the target node includes: the creation time of each node, the capacity of each node, the internal execution directory of each node, and the execution server corresponding to each node.
Here, the target nodes are set up so that, when data has a problem during data processing flow management or when an error occurs in a certain target node, the corresponding data source can be quickly located, tracked, and backtracked according to the type and the attribute information of the target node, and can be clearly displayed.
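A target node record holding these attributes might look like the following sketch. The field names, the choice of bytes as the capacity unit, and the `describe` helper are assumptions made for illustration; the description does not specify them.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TargetNode:
    name: str               # hypothetical node identifier
    node_type: str          # type of the node, e.g. "hql"
    created_at: datetime    # creation time of the node
    capacity_bytes: int     # capacity of the node (unit assumed to be bytes)
    execution_dir: str      # internal execution directory of the node
    execution_server: str   # execution server corresponding to the node


def describe(node: TargetNode) -> str:
    """One-line summary that tells an operator where the node ran."""
    return (f"{node.name} [{node.node_type}] on "
            f"{node.execution_server}:{node.execution_dir}")
```

For example, `describe(TargetNode("clean_orders", "hql", datetime(2020, 12, 30), 1024, "/tmp/jobs/clean_orders", "etl-01"))` yields a string naming the server and directory to inspect when that node fails.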
S102, parsing the process components in the workflow, determining a source table and a target table of each process component and an association relationship between a source field in the source table and a target field in the target table, and displaying the determined source table, the determined target table and the association relationship between the source field and the target field in a branch graph form.
In this step, in a data processing flow, at least one different workflow may be generated, and at least one flow component may be used in each workflow, so that it is necessary to perform corresponding analysis on each flow component to determine a source table and a target table of each flow component, and an association relationship between a source field in the source table and a target field in the target table.
In order to display the source table, the target table, and the association relationship between the source field and the target field more intuitively and clearly, they may be displayed in the form of a branch graph, where the branch graph is a knowledge graph structure.
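One way to hold the data behind such a branch graph is a pair of adjacency maps, one for table lineage and one for field lineage. This is a minimal in-memory sketch under that assumption; the class `LineageGraph` and its methods are hypothetical, and rendering is out of scope.

```python
from collections import defaultdict


class LineageGraph:
    def __init__(self):
        # source table -> set of target tables
        self.table_edges = defaultdict(set)
        # (table, field) -> set of (table, field)
        self.field_edges = defaultdict(set)

    def add_mapping(self, source_table, target_table, field_pairs):
        """Record one process component's table edge and its field edges."""
        self.table_edges[source_table].add(target_table)
        for src_field, dst_field in field_pairs:
            self.field_edges[(source_table, src_field)].add(
                (target_table, dst_field))

    def downstream_fields(self, table, field):
        """All fields reachable from (table, field), breadth-first."""
        seen, queue = set(), [(table, field)]
        while queue:
            current = queue.pop(0)
            for nxt in self.field_edges.get(current, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen
```

Calling `add_mapping` once per parsed process component builds up the graph; `downstream_fields` then answers which fields are affected when a given source field is wrong.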
S103, managing the data lineage through the attribute information of the target node and the branch graph.
In this step, the attribute information of the target node is combined with the source table, the target table and the association relationship between the source field and the target field, and these are presented and displayed together through the branch graph.
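Combining node attributes with table lineage makes backtracking a simple upstream walk. The sketch below assumes, for simplicity, that each table is produced by exactly one node from one source table; `trace_upstream`, `produced_by` and `nodes` are hypothetical names.

```python
def trace_upstream(table, produced_by, nodes):
    """Walk from a problematic table back to its origin.

    produced_by: target table -> (source table, name of producing node)
    nodes: node name -> dict of node attributes
    Returns one record per hop, nearest producer first.
    """
    path = []
    visited = set()
    while table in produced_by and table not in visited:
        visited.add(table)  # guards against cycles in the lineage
        source_table, node_name = produced_by[table]
        path.append({"table": table, "node": node_name,
                     **nodes.get(node_name, {})})
        table = source_table
    return path
```

Each returned record carries the producing node's attributes, such as its execution server, so an operator can see where to look at every hop.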
Compared with data lineage management methods in the prior art, the data lineage management method provided by the embodiment of the application records and manages the workflow as a whole during data processing, thereby recording the table lineage and field lineage of each node and each process component in the workflow across the full process, so that data problems in a data warehouse can be located, tracked, and backtracked in a reasonable manner.
Referring to fig. 2, fig. 2 is a flowchart illustrating a data lineage management method according to another embodiment of the present application. As shown in fig. 2, a data lineage management method provided in an embodiment of the present application includes:
S201, acquiring a workflow of target metadata, and storing the workflow to a target node; the workflow includes at least one process component.
S202, parsing the synchronization script of the data exchange process component in the workflow, and determining a source table and a target table of the data exchange process component.
In this step, the process components in the workflow specifically include: a data exchange process component, a data development process component, a data quality verification process component and a data visualization process component. The synchronization script of the data exchange process component needs to be parsed to determine the source table and the target table of the data exchange process component.
S203, parsing the corresponding fields based on the source data corresponding to the metadata in the source table and the target data corresponding to the metadata in the target table, determining an association relationship between the source fields in the source table and the target fields in the target table, and displaying the determined source table, the target table and the association relationship between the source fields and the target fields in a branch graph form.
The data exchange process component sqoop is parsed as follows: the synchronization script is parsed, the source table and the target table are obtained according to the table and the hive table named in the script, and the correspondence between source fields in the source table and target fields in the target table is parsed in combination with mysql/oracle and the like.
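As a rough illustration of the table-level part of this step, the sketch below reads the source and target tables from a sqoop import command, assuming the script uses the standard `--table`, `--hive-table` and `--columns` options. The function name `parse_sqoop_import` is hypothetical, and the field-level match against mysql/oracle metadata is not shown.

```python
import shlex


def parse_sqoop_import(script: str) -> dict:
    """Extract source table, target hive table and columns from a sqoop command."""
    # Join backslash-continued lines before tokenizing.
    tokens = shlex.split(script.replace("\\\n", " "))
    opts = {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("--"):
            # An option takes a value unless the next token is another option.
            has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("--")
            opts[tok[2:]] = tokens[i + 1] if has_value else True
            i += 2 if has_value else 1
        else:
            i += 1
    columns = opts.get("columns")
    return {
        "source_table": opts.get("table"),
        "target_table": opts.get("hive-table"),
        "columns": columns.split(",") if isinstance(columns, str) else [],
    }
```

Scripts that use a free-form `--query` instead of `--table` would need SQL parsing, which this sketch does not attempt.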
For the data exchange process component hive, data lineage analysis can be achieved by hive.
Further, the parsing the process components in the workflow, determining a source table, a target table, and an association relationship between a source field in the source table and a target field in the target table of each process component, and displaying the determined source table, the target table, and the association relationship between the source field and the target field in a branch graph form, further includes:
parsing the script of the data development process component in the workflow through custom code, and synchronously parsing specific fields in the metadata.
For the data development process component spark, specific fields in the metadata can be parsed synchronously by customizing org.
For the data development process component hadoop, specific fields in the metadata can be parsed synchronously by customizing org.
Here, the parsing of the corresponding field between the source field and the target field may specifically be:
For hadoop, custom code can be parsed in org.
For Elasticsearch (ES), when hive pushes data to ES through a mapping table, the table creation statement of the mapping table is parsed to obtain the source fields and target fields; when a spark program is used, the corresponding fields are parsed according to the sql fields specified in the program.
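The mapping-table case can be sketched as follows, assuming the mapping table is a hive external table whose `TBLPROPERTIES` may carry an `es.mapping.names` entry in the `hive_column:es_field` form used by elasticsearch-hadoop. The function name is hypothetical, and the column parsing only handles simple types without commas (for example, not `decimal(10,2)`).

```python
import re


def parse_es_mapping_table(ddl: str) -> dict:
    """Return {hive column (source field): ES field (target field)}."""
    # Column list: the parenthesised block that precedes STORED BY.
    cols_match = re.search(r"\(\s*(.*?)\s*\)\s*STORED\s+BY", ddl, re.S | re.I)
    columns = []
    if cols_match:
        for part in cols_match.group(1).split(","):
            columns.append(part.split()[0].strip("`"))

    # Optional renames declared through es.mapping.names.
    renames = {}
    names_match = re.search(r"'es\.mapping\.names'\s*=\s*'([^']*)'", ddl)
    if names_match:
        for pair in names_match.group(1).split(","):
            if ":" in pair:
                hive_col, es_field = (s.strip() for s in pair.split(":", 1))
                renames[hive_col] = es_field

    # Columns without an explicit rename keep their own name in ES.
    return {col: renames.get(col, col) for col in columns}
```

The resulting dictionary gives the field-level edges between the hive mapping table and the ES index for the branch graph.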
For clickhouse, the target table and the source table are parsed from the waterdrop synchronization script, and the source fields and target fields are obtained from the sql statement in the script and the clickhouse metadata.
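The sql part of this step can be illustrated with a deliberately small parser for statements of the form `SELECT a AS x, b FROM t`. It is a sketch under that assumption only: expressions, joins and subqueries would require a real SQL parser, and the function name `select_field_lineage` is hypothetical. Reading the waterdrop configuration and the clickhouse metadata is not shown.

```python
import re


def select_field_lineage(sql: str):
    """Return (source_table, [(source_field, target_field), ...])."""
    m = re.match(r"\s*select\s+(.*?)\s+from\s+([\w.]+)", sql, re.I | re.S)
    if not m:
        raise ValueError("unsupported statement")
    select_list, source_table = m.groups()
    pairs = []
    for item in select_list.split(","):
        # "col AS alias" maps col -> alias; a bare column maps to itself.
        parts = re.split(r"\s+as\s+", item.strip(), flags=re.I)
        source_field = parts[0]
        target_field = parts[1] if len(parts) == 2 else parts[0]
        pairs.append((source_field, target_field))
    return source_table, pairs
```

The target field names would then be checked against the column list of the clickhouse target table to confirm the mapping.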
S204, managing the data lineage through the attribute information of the target node and the branch graph.
The description of S201 may refer to the description of S101, and the same technical effect may be achieved, which is not described in detail herein.
Compared with data lineage management methods in the prior art, the data lineage management method provided by the embodiment of the application records and manages the workflow as a whole during data processing, thereby recording the table lineage and field lineage of each node and each process component in the workflow across the full process, so that data problems in a data warehouse can be located, tracked, and backtracked in a reasonable manner.
Referring to fig. 3 and 4, fig. 3 is a schematic structural diagram of a data lineage management device according to an embodiment of the present application, and fig. 4 is a schematic structural diagram of another data lineage management device according to an embodiment of the present application. As shown in fig. 3, the data lineage management device 300 includes:
a first obtaining module 310, configured to obtain a workflow of target metadata, and store the workflow to a target node; the workflow includes at least one process component.
Further, the process components in the first obtaining module 310 specifically include: a data exchange process component, a data development process component, a data quality verification process component and a data visualization process component.
Further, the attribute information of the target node in the first obtaining module 310 includes: the creation time of each node, the capacity of each node, the internal execution directory of each node, and the execution server corresponding to each node.
The parsing module 320 is configured to parse the process components in the workflow, determine a source table, a target table, and an association relationship between a source field in the source table and a target field in the target table of each process component, and display the determined source table, the target table, and the association relationship between the source field and the target field in a branch graph form.
The management module 330 is configured to manage the data lineage through the attribute information of the target node and the branch graph.
Compared with data lineage management devices in the prior art, the data lineage management device provided by the embodiment of the application records and manages the workflow as a whole during data processing, thereby recording the table lineage and field lineage of each node and each process component in the workflow across the full process, so that data problems in a data warehouse can be located, tracked, and backtracked in a reasonable manner.
As shown in fig. 4, the data blood margin management device 300 includes:
a first obtaining module 310, configured to obtain a workflow of target metadata, and store the workflow to a target node; the workflow includes at least one flow component.
Further, the flow components in the first obtaining module 310 specifically include: the system comprises a data exchange flow component, a data development flow component, a data quality verification flow component and a data visualization flow component.
Further, the attribute information of the target node in the first obtaining module 310 includes: the method comprises the steps of establishing time of each node, capacity of each node, internal execution directories of each node and execution servers corresponding to each node.
The parsing module 320 is configured to parse the process components in the workflow, determine a source table, a target table, and an association relationship between a source field in the source table and a target field in the target table of each process component, and display the determined source table, the target table, and the association relationship between the source field and the target field in a branch graph form.
Further, the parsing module 320 includes:
a data exchange flow parsing first unit 321, configured to parse the synchronization script for the data exchange flow component in the workflow, and determine a source table and a target table of the data exchange flow component.
A second data exchange flow parsing unit 322, configured to parse corresponding fields for source data corresponding to the metadata in the source table and target data corresponding to the metadata in the target table, determine an association relationship between a source field in the source table and a target field in the target table, and display the determined source table, the target table, and the association relationship between the source field and the target field in a branch graph form.
A data development flow parsing unit 323, configured to parse the script of the data development flow component in the workflow by means of custom code, and to synchronously parse specific fields in the metadata.
A management module 330, configured to manage the data lineage through the attribute information of the target node and the branch graph.
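The work of the parsing units 321 to 323 can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the patent does not specify the format of the synchronization script or of the development script, so the dictionary layout (`reader`, `writer`, `table`, `columns`), the `ComponentLineage` class, the positional pairing of fields, and the single `INSERT INTO ... SELECT ... FROM` script shape are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ComponentLineage:
    """Table-level and field-level lineage of one flow component (hypothetical type)."""
    source_table: str
    target_table: str
    field_pairs: List[Tuple[str, str]] = field(default_factory=list)


def parse_exchange_component(sync_script: dict) -> ComponentLineage:
    """Units 321/322: read the source and target tables, then pair fields by position.

    Assumes a hypothetical script layout with 'reader' and 'writer' sections.
    """
    reader, writer = sync_script["reader"], sync_script["writer"]
    pairs = list(zip(reader["columns"], writer["columns"]))
    return ComponentLineage(reader["table"], writer["table"], pairs)


# Only one simple script shape is recognised; real scripts would need a proper SQL parser.
_INSERT_SELECT = re.compile(
    r"INSERT\s+INTO\s+(?P<target>\w+)\s*\((?P<tcols>[^)]*)\)\s*"
    r"SELECT\s+(?P<scols>.+?)\s+FROM\s+(?P<source>\w+)",
    re.IGNORECASE | re.DOTALL,
)


def parse_development_component(sql: str) -> ComponentLineage:
    """Unit 323: 'custom code' that extracts tables and fields from a development script."""
    match = _INSERT_SELECT.search(sql)
    if match is None:
        raise ValueError("unsupported script shape")
    target_cols = [c.strip() for c in match["tcols"].split(",")]
    source_cols = [c.strip() for c in match["scols"].split(",")]
    return ComponentLineage(match["source"], match["target"],
                            list(zip(source_cols, target_cols)))
```

Each call returns one source table, one target table, and a list of (source field, target field) pairs, which is the information the branch graph is said to display.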
Compared with data lineage management devices in the prior art, the data lineage management device provided by this embodiment of the present application records and manages the workflow as a whole during data processing. This yields a full-process record of the table lineage and field lineage of each node in the workflow and of each flow component in the workflow, so that data problems in the data warehouse can be located, tracked, and traced back in a reasonable manner.
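Locating and backtracking a data problem can be sketched as an upstream walk over field-level lineage edges. This is an illustrative assumption rather than the method of the application: the edge representation (pairs of `"table.field"` strings) and the function names `build_upstream_index` and `trace_upstream` are hypothetical.

```python
from collections import defaultdict, deque


def build_upstream_index(edges):
    """Map each target field to the set of source fields that feed it directly."""
    upstream = defaultdict(set)
    for source_field, target_field in edges:
        upstream[target_field].add(source_field)
    return upstream


def trace_upstream(upstream, start):
    """Breadth-first walk from a problematic field back to every field it depends on."""
    seen = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for parent in upstream.get(current, ()):
            if parent not in seen:  # the 'seen' set also guards against cycles
                seen.add(parent)
                queue.append(parent)
    return seen
```

Given the edges `ods.a -> dw.b`, `dw.b -> dm.c`, and `ods.x -> dm.c`, tracing upstream from `dm.c` yields `dw.b`, `ods.a`, and `ods.x`, which are the candidate origins of a fault observed in `dm.c`.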
Referring to fig. 5, fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 5, the electronic device 500 includes a processor 510, a memory 520, and a bus 530.
The memory 520 stores machine-readable instructions executable by the processor 510. When the electronic device 500 is running, the processor 510 and the memory 520 communicate via the bus 530, and the machine-readable instructions, when executed by the processor 510, perform the steps of the data lineage management method in the method embodiments of fig. 1 and fig. 2.
In particular, the machine readable instructions, when executed by the processor 510, may perform the following:
acquiring a workflow of target metadata, and storing the workflow to a target node, where the workflow includes at least one flow component;
parsing the flow components in the workflow, determining, for each flow component, a source table, a target table, and an association relationship between a source field in the source table and a target field in the target table, and displaying the determined source table, target table, and association relationship between the source field and the target field in the form of a branch graph; and
managing the data lineage through the attribute information of the target node and the branch graph.
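The three steps can be sketched end to end as follows. This is a minimal sketch under stated assumptions: the `TargetNode` fields mirror the attribute information named in the description (creation time, capacity, internal execution directory, execution server), but the field names, the dictionary layout of a flow component, the text rendering of the branch graph, and all function names are hypothetical. The components are assumed to have been parsed already, so the sketch covers only storing, display, and management.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class TargetNode:
    """Target node with the attribute information listed in the description."""
    name: str
    created_at: datetime
    capacity_bytes: int
    execution_dir: str
    execution_server: str
    workflow: List[dict] = field(default_factory=list)


def store_workflow(node: TargetNode, workflow: List[dict]) -> TargetNode:
    """Step 1: store the workflow (at least one flow component) on the target node."""
    if not workflow:
        raise ValueError("a workflow needs at least one flow component")
    node.workflow = list(workflow)
    return node


def render_branch_graph(components: List[dict]) -> str:
    """Step 2 (display part): one line per table edge, indented lines per field edge."""
    lines = []
    for comp in components:
        src, dst = comp["source_table"], comp["target_table"]
        lines.append(f"{src} -> {dst}")
        for src_field, dst_field in comp["field_pairs"]:
            lines.append(f"  {src}.{src_field} -> {dst}.{dst_field}")
    return "\n".join(lines)


def manage_lineage(node: TargetNode, graph: str) -> dict:
    """Step 3: combine node attribute information with the branch graph in one record."""
    return {
        "node": node.name,
        "created_at": node.created_at.isoformat(),
        "capacity_bytes": node.capacity_bytes,
        "execution_dir": node.execution_dir,
        "execution_server": node.execution_server,
        "branch_graph": graph,
    }
```

Keeping the node attributes next to the graph means that a lineage record identifies not only which tables and fields are linked but also where and when the workflow that produced them ran.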
By recording and managing the workflow as a whole during data processing, the present application achieves a full-process record of the table lineage and field lineage of each node in the workflow and of each flow component in the workflow, so that data problems in the data warehouse can be located, tracked, and traced back in a reasonable manner.
An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the steps of the data lineage management method in the method embodiments shown in fig. 1 and fig. 2 may be executed.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above embodiments are only specific embodiments of the present application, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the art may, within the technical scope disclosed in the present application, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes, or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. A data lineage management method, the method comprising:
acquiring a workflow of target metadata, and storing the workflow to a target node; the workflow comprises at least one process component;
parsing the process components in the workflow, determining, for each process component, a source table, a target table, and an association relationship between a source field in the source table and a target field in the target table, and displaying the determined source table, target table, and association relationship between the source field and the target field in the form of a branch graph;
and managing the data lineage through the attribute information of the target node and the branch graph.
2. The data lineage management method according to claim 1, wherein the process components specifically include: a data exchange flow component, a data development flow component, a data quality verification flow component, and a data visualization flow component.
3. The method according to claim 2, wherein the parsing the process components in the workflow, determining, for each process component, the source table, the target table, and the association relationship between the source field in the source table and the target field in the target table, and displaying the determined source table, target table, and association relationship between the source field and the target field in the form of a branch graph comprises:
parsing the synchronization script of the data exchange flow component in the workflow, and determining a source table and a target table of the data exchange flow component;
parsing corresponding fields for source data corresponding to the metadata in the source table and target data corresponding to the metadata in the target table, determining an association relationship between the source fields in the source table and the target fields in the target table, and displaying the determined source table, target table, and association relationship between the source fields and the target fields in the form of a branch graph.
4. The method according to claim 2, wherein the parsing the process components in the workflow, determining, for each process component, the source table, the target table, and the association relationship between the source field in the source table and the target field in the target table, and displaying the determined source table, target table, and association relationship between the source field and the target field in the form of a branch graph further comprises:
parsing the script of the data development flow component in the workflow by means of custom code, and synchronously parsing specific fields in the metadata.
5. The data lineage management method according to claim 1, wherein the attribute information of the target node includes: the creation time of each node, the capacity of each node, the internal execution directory of each node, and the execution server corresponding to each node.
6. A data lineage management device, comprising:
a first acquisition module, configured to acquire a workflow of target metadata and store the workflow to a target node; the workflow comprises at least one process component;
a parsing module, configured to parse the process components in the workflow, determine, for each process component, a source table, a target table, and an association relationship between a source field in the source table and a target field in the target table, and display the determined source table, target table, and association relationship between the source field and the target field in the form of a branch graph;
and a management module, configured to manage the data lineage through the attribute information of the target node and the branch graph.
7. The data lineage management device according to claim 6, wherein the process components in the first acquisition module specifically include: a data exchange flow component, a data development flow component, a data quality verification flow component, and a data visualization flow component.
8. The data lineage management device according to claim 6, wherein the parsing module includes:
a first data exchange flow parsing unit, configured to parse the synchronization script of the data exchange flow component in the workflow, and determine a source table and a target table of the data exchange flow component;
and a second data exchange flow parsing unit, configured to parse corresponding fields for source data corresponding to the metadata in the source table and target data corresponding to the metadata in the target table, determine an association relationship between the source fields in the source table and the target fields in the target table, and display the determined source table, target table, and association relationship between the source fields and the target fields in the form of a branch graph.
9. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is operated, the machine-readable instructions being executable by the processor to perform the steps of the data lineage management method according to any one of claims 1 to 5.
10. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, performs the steps of the data lineage management method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011623179.8A CN112783857B (en) | 2020-12-31 | 2020-12-31 | Data blood-margin management method and device, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011623179.8A CN112783857B (en) | 2020-12-31 | 2020-12-31 | Data blood-margin management method and device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112783857A (en) | 2021-05-11 |
CN112783857B (en) | 2023-10-20 |
Family
ID=75753280
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011623179.8A Active CN112783857B (en) | 2020-12-31 | 2020-12-31 | Data blood-margin management method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112783857B (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113138990A (en) * | 2021-05-17 | 2021-07-20 | 青岛海信网络科技股份有限公司 | Data blood margin construction and tracing method, device and equipment |
CN113360720A (en) * | 2021-06-24 | 2021-09-07 | 平安普惠企业管理有限公司 | Data asset visualization method, device and equipment based on data consanguinity relationship |
CN113360496A (en) * | 2021-05-26 | 2021-09-07 | 国网能源研究院有限公司 | Method and device for constructing metadata tag library |
CN113468257A (en) * | 2021-07-05 | 2021-10-01 | 乐融致新电子科技(天津)有限公司 | Data quality monitoring method and device based on data warehouse |
CN114064752A (en) * | 2021-11-09 | 2022-02-18 | 珠海市新德汇信息技术有限公司 | Data influence analysis method based on record-level blood relationship, storage medium and equipment |
CN114143177A (en) * | 2021-12-01 | 2022-03-04 | 云赛智联股份有限公司 | Business service monitoring system and monitoring method based on data blood margin |
CN114911785A (en) * | 2022-05-16 | 2022-08-16 | 北京航空航天大学 | Data blood reason management method and device and electronic equipment |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150286495A1 (en) * | 2014-04-02 | 2015-10-08 | International Business Machines Corporation | Metadata-driven workflows and integration with genomic data processing systems and techniques |
WO2017020716A1 (en) * | 2015-08-03 | 2017-02-09 | 阿里巴巴集团控股有限公司 | Method and device for data access control |
US20170270022A1 (en) * | 2016-03-16 | 2017-09-21 | ASG Technologies Group, Inc. dba ASG Technologies | Intelligent Metadata Management and Data Lineage Tracing |
CN109213754A (en) * | 2018-03-29 | 2019-01-15 | 北京九章云极科技有限公司 | A kind of data processing system and data processing method |
CN109325078A (en) * | 2018-09-18 | 2019-02-12 | 拉扎斯网络科技(上海)有限公司 | Data blood margin determination method and device based on structural data |
CN109542901A (en) * | 2018-11-12 | 2019-03-29 | 北京懿医云科技有限公司 | Data processing method, device, computer readable storage medium and electronic equipment |
US20190138345A1 (en) * | 2017-11-09 | 2019-05-09 | Cloudera, Inc. | Information based on run-time artifacts in a distributed computing cluster |
CN109739893A (en) * | 2018-12-28 | 2019-05-10 | 上海连尚网络科技有限公司 | A kind of metadata management method, equipment and computer-readable medium |
CN109918437A (en) * | 2019-03-08 | 2019-06-21 | 北京中油瑞飞信息技术有限责任公司 | Distributed data processing method, apparatus and data assets management system |
TW201933053A (en) * | 2017-10-12 | 2019-08-16 | 美商帕萊堤卡有限責任公司 | System of mapping and transforming data and method of collaboratively manipulating and sharing ideas |
CN110399359A (en) * | 2019-07-24 | 2019-11-01 | 阿里巴巴集团控股有限公司 | A kind of data retrogressive method, device and equipment |
CN110908997A (en) * | 2019-10-09 | 2020-03-24 | 支付宝(杭州)信息技术有限公司 | Data blood margin construction method and device, server and readable storage medium |
Patent Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150286495A1 (en) * | 2014-04-02 | 2015-10-08 | International Business Machines Corporation | Metadata-driven workflows and integration with genomic data processing systems and techniques |
WO2017020716A1 (en) * | 2015-08-03 | 2017-02-09 | 阿里巴巴集团控股有限公司 | Method and device for data access control |
US20170270022A1 (en) * | 2016-03-16 | 2017-09-21 | ASG Technologies Group, Inc. dba ASG Technologies | Intelligent Metadata Management and Data Lineage Tracing |
TW201933053A (en) * | 2017-10-12 | 2019-08-16 | 美商帕萊堤卡有限責任公司 | System of mapping and transforming data and method of collaboratively manipulating and sharing ideas |
US20190138345A1 (en) * | 2017-11-09 | 2019-05-09 | Cloudera, Inc. | Information based on run-time artifacts in a distributed computing cluster |
CN109213754A (en) * | 2018-03-29 | 2019-01-15 | 北京九章云极科技有限公司 | A kind of data processing system and data processing method |
CN109325078A (en) * | 2018-09-18 | 2019-02-12 | 拉扎斯网络科技(上海)有限公司 | Data blood margin determination method and device based on structural data |
CN109542901A (en) * | 2018-11-12 | 2019-03-29 | 北京懿医云科技有限公司 | Data processing method, device, computer readable storage medium and electronic equipment |
CN109739893A (en) * | 2018-12-28 | 2019-05-10 | 上海连尚网络科技有限公司 | A kind of metadata management method, equipment and computer-readable medium |
CN109918437A (en) * | 2019-03-08 | 2019-06-21 | 北京中油瑞飞信息技术有限责任公司 | Distributed data processing method, apparatus and data assets management system |
CN110399359A (en) * | 2019-07-24 | 2019-11-01 | 阿里巴巴集团控股有限公司 | A kind of data retrogressive method, device and equipment |
CN110908997A (en) * | 2019-10-09 | 2020-03-24 | 支付宝(杭州)信息技术有限公司 | Data blood margin construction method and device, server and readable storage medium |
Non-Patent Citations (2)
Title |
---|
RAJENDRA BOSE et al.: "Lineage retrieval for scientific data processing: a survey", 《ACM COMPUTING SURVEYS》, vol. 37, no. 1, pages 1 - 28, XP058350704, DOI: 10.1145/1057977.1057978 * |
Ye Tianqi et al.: "Research and Application of a Data Lineage Visualization Analysis Platform", 《Information Technology & Standardization》, no. 11, pages 17 - 20 * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113138990A (en) * | 2021-05-17 | 2021-07-20 | 青岛海信网络科技股份有限公司 | Data blood margin construction and tracing method, device and equipment |
CN113138990B (en) * | 2021-05-17 | 2023-04-18 | 青岛海信网络科技股份有限公司 | Data blood margin construction and tracing method, device and equipment |
CN113360496A (en) * | 2021-05-26 | 2021-09-07 | 国网能源研究院有限公司 | Method and device for constructing metadata tag library |
CN113360496B (en) * | 2021-05-26 | 2024-05-14 | 国网能源研究院有限公司 | Method and device for constructing metadata tag library |
CN113360720A (en) * | 2021-06-24 | 2021-09-07 | 平安普惠企业管理有限公司 | Data asset visualization method, device and equipment based on data consanguinity relationship |
CN113360720B (en) * | 2021-06-24 | 2023-11-21 | 湖北华中电力科技开发有限责任公司 | Data asset visualization method, device and equipment based on data blood relationship |
CN113468257A (en) * | 2021-07-05 | 2021-10-01 | 乐融致新电子科技(天津)有限公司 | Data quality monitoring method and device based on data warehouse |
CN114064752A (en) * | 2021-11-09 | 2022-02-18 | 珠海市新德汇信息技术有限公司 | Data influence analysis method based on record-level blood relationship, storage medium and equipment |
CN114143177A (en) * | 2021-12-01 | 2022-03-04 | 云赛智联股份有限公司 | Business service monitoring system and monitoring method based on data blood margin |
CN114911785A (en) * | 2022-05-16 | 2022-08-16 | 北京航空航天大学 | Data blood reason management method and device and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN112783857B (en) | 2023-10-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112783857A (en) | Data blood reason management method and device, electronic equipment and storage medium | |
JP6750047B2 (en) | Application migration system | |
US10579619B2 (en) | Validation of query plan | |
US20160306612A1 (en) | Determining errors and warnings corresponding to a source code revision | |
US8447737B2 (en) | System and method for versioning of configuration items | |
EP3084610A1 (en) | Process for displaying test coverage data during code reviews | |
US9846844B2 (en) | Method and system for quantitatively evaluating the confidence in information received from a user based on cognitive behavior | |
CN109426604A (en) | The monitoring method and equipment of code development | |
US20160162825A1 (en) | Monitoring the impact of information quality on business application components through an impact map to data sources | |
CN108446326A (en) | A kind of isomeric data management method and system based on container | |
CN111241198B (en) | Data synchronization method and device and data processing equipment | |
CN114153822A (en) | Data migration method and device, electronic equipment and storage medium | |
CN109359027A (en) | Monkey test method, device, electronic equipment and computer readable storage medium | |
CN116186174A (en) | Data blood relationship graph construction method and related equipment based on data analysis | |
US8924343B2 (en) | Method and system for using confidence factors in forming a system | |
CN113010208A (en) | Version information generation method, version information generation device, version information generation equipment and storage medium | |
WO2023098462A1 (en) | Improving performance of sql execution sequence in production database instance | |
CN113781068B (en) | Online problem solving method, device, electronic equipment and storage medium | |
CN113254315B (en) | Reporting method of embedded point information, embedded point method, device, medium and electronic equipment | |
CN112162954B (en) | User operation log generation and path positioning method, device, equipment and medium | |
US20130311967A1 (en) | Method and System for Collapsing Functional Similarities and Consolidating Functionally Similar, Interacting Systems | |
US20180101596A1 (en) | Deriving and interpreting users collective data asset use across analytic software systems | |
CN104317820A (en) | Statistical method and device of report | |
US20200394091A1 (en) | Failure analysis support system, failure analysis support method, and computer readable recording medium | |
CN110347986B (en) | Form-based information acquisition method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||