CN109299073B - Data blood margin generation method and system, electronic equipment and storage medium - Google Patents

Data blood margin generation method and system, electronic equipment and storage medium

Info

Publication number
CN109299073B
Authority
CN
China
Prior art keywords
data
source
task
information
generating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811221914.5A
Other languages
Chinese (zh)
Other versions
CN109299073A (en)
Inventor
贾涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dt Dream Technology Co Ltd
Original Assignee
Hangzhou Dt Dream Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dt Dream Technology Co Ltd
Priority to CN201811221914.5A
Publication of CN109299073A
Application granted
Publication of CN109299073B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The invention discloses a method, a system, an electronic device, and a storage medium for generating data lineage (rendered literally as "data blood margin"). The method comprises: acquiring each target data source and the current task information; defining each target data source and the current task information in a unified preset protocol format; and, when the current task is executed, generating the corresponding data lineage according to the execution process of the current task and the unified preset protocol format. Whatever the format of a target data source, and whatever data platform or tool is used, lineage can be generated as long as the preset protocol format is followed. This removes the barriers to lineage generation between different data sources and makes the method compatible with different data platforms and tools. It also removes the restriction that only a database system or distributed database can serve as a data source and only an ETL or MapReduce task can serve as the basis for lineage, which widens the range over which data lineage can be generated. In addition, it addresses lineage generation when no network connection is available between interacting systems.

Description

Data blood margin generation method and system, electronic equipment and storage medium
Technical Field
The invention relates to data tracing technology in the field of big data, and in particular to a method and a system for generating data lineage, an electronic device, and a storage medium.
Background
In recent years, with the development of databases and networks, data lineage has become an important research field. Data lineage is the ancestry information of data: it records the entire history of data processing, including the origin of the data and the whole process by which the data is generated and evolves over time. For a database system, it is sometimes necessary to trace a query result back to its source in order to assess the reliability and quality of the data.
The main purpose of data lineage research is to address the credibility, quality, and version information of data in distributed data sharing by tracking data lineage. The prior art is limited to parsing SQL (Structured Query Language) to generate library-level and table-level lineage; field-level lineage is very difficult to support and is inaccurate, which limits lineage tracking. In particular, because different data sources have different formats, lineage must be generated with a different lineage tool for each of them, which severely restricts data lineage generation.
Therefore, how to remove the barriers to data lineage generation between different data sources, and thereby achieve compatibility with different data platforms or tools, is a technical problem that those skilled in the art currently need to solve.
Disclosure of Invention
The invention aims to provide a method, a system, an electronic device, and a storage medium for generating data lineage that remove the barriers to lineage generation between different data sources and thereby achieve compatibility with different data platforms or tools.
To solve the above technical problem, the invention provides the following technical solutions:
A method for generating data lineage, comprising:
acquiring each target data source and the current task information;
defining each target data source and the current task information in a unified preset protocol format; and
when the current task is executed, generating the corresponding data lineage according to the execution process of the current task and the unified preset protocol format.
Preferably, acquiring each target data source and the current task information includes:
acquiring metadata information of each source data source and each destination data source, wherein the metadata information comprises libraries, tables, and fields; and
acquiring the current task information.
Preferably, defining each target data source and the current task information in a unified preset protocol format includes:
defining each target data source in the unified preset protocol format, parsing the metadata information, and generating association information describing which tables each library contains and which fields each table contains; and
defining the task information in the unified preset protocol format, parsing the task information, and determining the source and destination of the libraries, tables, and/or fields contained in the task information.
Preferably, generating the corresponding data lineage according to the execution process of the current task and the unified preset protocol format when the current task is executed includes:
when the current task is executed, generating the corresponding data lineage according to the source and destination of the libraries, tables, and/or fields contained in the task information.
Preferably, the preset protocol format is the JSON data exchange format.
Preferably, parsing the task information to determine the source and destination of the libraries, tables, and/or fields contained in the task information includes:
parsing the data source objects of the task and the libraries, tables, and/or fields contained in the task; and
determining the source and destination of the libraries, tables, and/or fields contained in the task information according to the parsed content of the task.
Preferably, determining the source and destination of the libraries, tables, and/or fields contained in the task information according to the parsed content of the task includes:
according to the parsed content of the task, associating the names of the corresponding source and destination data sources with the corresponding library names, table names, and field names, so as to determine the source and destination of the libraries, tables, and/or fields contained in the task information.
A system for generating data lineage, comprising:
an acquisition module, configured to acquire each target data source and the current task information;
a definition module, configured to define each target data source and the current task information in a unified preset protocol format; and
a lineage generation module, configured to generate the corresponding data lineage according to the execution process of the current task and the unified preset protocol format when the current task is executed.
Preferably, the acquisition module includes:
a metadata acquisition unit, configured to acquire metadata information of each source data source and each destination data source, wherein the metadata information comprises libraries, tables, and fields; and
a task information acquisition unit, configured to acquire the current task information.
Preferably, the definition module includes:
a first definition unit, configured to define each target data source in the unified preset protocol format, parse the metadata information, and generate association information describing which tables each library contains and which fields each table contains; and
a second definition unit, configured to define the task information in the unified preset protocol format, parse the task information, and determine the source and destination of the libraries, tables, and/or fields contained in the task information.
Preferably, the lineage generation module includes:
a lineage generation unit, configured to generate the corresponding data lineage according to the source and destination of the libraries, tables, and/or fields contained in the task information when the current task is executed.
Preferably, the first definition unit is:
a JSON format definition unit, configured to convert the metadata information of each source data source and each destination data source into JSON format.
Preferably, the second definition unit includes:
a first parsing subunit, configured to parse the data source objects of the task and the libraries, tables, and/or fields contained in the task; and
a first determining subunit, configured to determine the source and destination of the libraries, tables, and/or fields contained in the task information according to the parsed content of the task.
Preferably, the first determining subunit includes:
an association unit, configured to associate, according to the parsed content of the task, the names of the corresponding source and destination data sources with the corresponding library names, table names, and field names, so as to determine the source and destination of the libraries, tables, and/or fields contained in the task information.
An electronic device for generating data lineage, comprising:
a memory for storing a computer program; and
a processor for implementing the steps of any of the above methods for generating data lineage when executing the computer program.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the above methods for generating data lineage.
Compared with the prior art, the above technical solutions have the following advantages:
An embodiment of the invention provides a method for generating data lineage, comprising: acquiring each target data source and the current task information; defining each target data source and the current task information in a unified preset protocol format; and, when the current task is executed, generating the corresponding data lineage according to the execution process of the current task and the unified preset protocol format. Because the target data sources and the task are defined in a unified protocol format and the data lineage is generated from that format, lineage can be generated whatever the format of a target data source and whatever data platform or tool is used, as long as the preset protocol format is followed. This removes the barriers to lineage generation between different data sources and achieves compatibility with different data platforms and tools. It also removes the restriction that only a database system or distributed database can serve as a data source and only an ETL or MapReduce task can serve as the basis for lineage, which widens the range over which data lineage can be generated. In addition, it addresses lineage generation when no network connection is available between interacting systems.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a method for generating data lineage according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a system for generating data lineage according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an electronic device for generating data lineage according to an embodiment of the present invention.
Detailed Description
The core of the invention is to provide a method, a system, an electronic device, and a storage medium for generating data lineage that remove the barriers to lineage generation between different data sources and thereby achieve compatibility with different data platforms or tools.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
In the following description, specific details are set forth in order to provide a thorough understanding of the present invention. The invention can be implemented in a number of ways different from those described herein and similar generalizations can be made by those skilled in the art without departing from the spirit of the invention. The invention is therefore not limited to the specific implementations disclosed below.
Referring to FIG. 1, FIG. 1 is a flowchart of a method for generating data lineage according to an embodiment of the present invention.
One embodiment of the present invention provides a method for generating data lineage, including:
S11: acquiring each target data source and the current task information.
S12: defining each target data source and the current task information in a unified preset protocol format.
S13: when the current task is executed, generating the corresponding data lineage according to the execution process of the current task and the unified preset protocol format.
In this embodiment, the target data sources and the task are defined in a unified protocol format, and the data lineage is then generated from that format. Whatever the format of a target data source, and whatever data platform or tool is used, lineage can be generated as long as the preset protocol format is followed, which removes the barriers to lineage generation between different data sources and achieves compatibility with different data platforms and tools. It also removes the restriction that only a database system or distributed database can serve as a data source and only an ETL or MapReduce task can serve as the basis for lineage, which widens the range over which data lineage can be generated. In addition, it addresses lineage generation when no network connection is available between interacting systems.
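Steps S11 to S13 can be pictured with the following minimal Python sketch. It is only an illustration under stated assumptions: the function names, the dictionary keys (`sources`, `task`, `inputs`, `output`), and the dotted names are hypothetical and do not come from the patent, which does not fix a concrete schema.

```python
import json


def acquire(sources, task):
    """S11: collect the target data sources and the current task information."""
    return {"sources": sources, "task": task}


def define(acquired):
    """S12: express sources and task in one protocol format (JSON is assumed here)."""
    return json.dumps(acquired, sort_keys=True)


def generate_lineage(protocol_text):
    """S13: derive lineage edges (source, destination) from the protocol document."""
    doc = json.loads(protocol_text)
    task = doc["task"]
    return [(src, task["output"]) for src in task["inputs"]]


# Hypothetical sources and task, named after the Spark job described later.
sources = [{"name": "hdfs_a"}, {"name": "web_db"}]
task = {
    "name": "collect_urls",
    "inputs": ["hdfs_a.a.a.content"],
    "output": "web_db.web.webinfo.urls",
}
edges = generate_lineage(define(acquire(sources, task)))
```

Because step S13 reads only the protocol document, it does not need to know which platform produced it, which is the property the embodiment relies on.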
On the basis of the above embodiment, in one embodiment of the present invention, acquiring each target data source and the current task information includes: acquiring metadata information of each source data source and each destination data source, wherein the metadata information comprises libraries, tables, and fields; and acquiring the current task information.
Further, defining each target data source and the current task information in a unified preset protocol format includes: defining each target data source in the unified preset protocol format, parsing the metadata information, and generating association information describing which tables each library contains and which fields each table contains; and defining the task information in the unified preset protocol format, parsing the task information, and determining the source and destination of the libraries, tables, and/or fields contained in the task information.
Preferably, generating the corresponding data lineage according to the execution process of the current task and the unified preset protocol format when the current task is executed includes: when the current task is executed, generating the corresponding data lineage according to the source and destination of the libraries, tables, and/or fields contained in the task information.
In one embodiment of the present invention, defining each target data source in the unified preset protocol format specifically includes converting the metadata information of each source data source and each destination data source into JSON format.
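As a sketch of what converting data source metadata into JSON might look like, the snippet below serializes a nested library/table/field description. The key names (`name`, `type`, `libraries`, `tables`, `fields`) and the `"mysql"` type value are assumptions made for illustration; the patent names JSON as the format but does not publish a schema.

```python
import json


def source_to_protocol(name, source_type, libraries):
    """Serialize one data source's metadata as a JSON protocol document.

    `libraries` maps library name -> {table name -> list of field names}.
    The key layout is hypothetical.
    """
    doc = {
        "name": name,
        "type": source_type,
        "libraries": [
            {
                "name": library,
                "tables": [
                    {"name": table, "fields": list(fields)}
                    for table, fields in tables.items()
                ],
            }
            for library, tables in libraries.items()
        ],
    }
    return json.dumps(doc, indent=2)


# The web library and webinfo table come from the example job; the rest is assumed.
protocol_text = source_to_protocol("web_db", "mysql", {"web": {"webinfo": ["urls"]}})
```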
Further, parsing the task information and determining the source and destination of the libraries, tables, and/or fields contained in the task information includes: parsing the data source objects of the task and the libraries, tables, and/or fields contained in the task; and determining the source and destination of the libraries, tables, and/or fields contained in the task information according to the parsed content of the task.
Furthermore, determining the source and destination of the libraries, tables, and/or fields contained in the task information according to the parsed content of the task includes: according to the parsed content of the task, associating the names of the corresponding source and destination data sources with the corresponding library names, table names, and field names, so as to determine the source and destination of the libraries, tables, and/or fields contained in the task information.
In this embodiment, to generate data lineage in a business system, the library, table, and field information is generated first. To overcome the format restrictions of data sources on different data platforms or tools, lineage generation first obtains the metadata information of each data source and then converts it into a unified data source protocol format.
JSON (JavaScript Object Notation) is a lightweight data exchange format. It has a simple and clear hierarchical structure, is easy for people to read and write, is easy for machines to parse and generate, and is efficient to transmit over a network.
Parsing the metadata information to generate the association information describing which tables each library contains and which fields each table contains includes: parsing the library, table, and field information contained in the metadata information; and generating a corresponding relation chain according to the containment relationships among the libraries, tables, and fields.
Data lineage depends on library, table, and field information. Generating a relation chain from the containment relationships among libraries, tables, and fields makes it easy to find the table and library to which a target field belongs.
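The relation chain can be sketched as a lookup from a field to its owning table and library. This is a minimal illustration: the nested-dictionary input shape, the dotted path used as a key, and the `title` field are assumptions, not details from the patent.

```python
def build_relation_chain(metadata):
    """Map each field path to the (library, table) that contains it.

    `metadata` maps library name -> {table name -> list of field names}.
    """
    chain = {}
    for library, tables in metadata.items():
        for table, fields in tables.items():
            for field in fields:
                chain[f"{library}.{table}.{field}"] = (library, table)
    return chain


# "title" is a hypothetical second field added for illustration.
chain = build_relation_chain({"web": {"webinfo": ["urls", "title"]}})
owner = chain["web.webinfo.urls"]  # the library and table that contain the field
```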
Parsing the task information yields the source and destination of the data in the task, that is, the source data source and the destination data source, which are then matched and associated with the libraries, tables, and fields in which the task's data resides. To make it easier to generate data lineage and trace data sources, in this embodiment it is further preferred that determining the source and destination of the libraries, tables, and/or fields contained in the task information according to the parsed content of the task includes: according to the parsed content of the task, associating the names of the corresponding source and destination data sources with the corresponding library names, table names, and field names. In other words, library, table, and field information can be looked up in the existing data sources by unique name.
Data in a task may come from different source data sources, or from different libraries, tables, and/or fields. The data lineage is generated by assembling the library-, table-, and field-level lineage, so that when lineage is viewed, the lineage at the library, table, and field levels can be queried from the database by unique name.
Specifically, in the above embodiments the target data sources and the task are defined in a unified protocol format, and the data lineage is then generated from that format. Whatever the format of a target data source, and whatever data platform or tool is used, lineage can be generated as long as the preset protocol format is followed, which removes the barriers to lineage generation between different data sources and achieves compatibility with different data platforms and tools.
In addition, for a system that currently cannot be accessed, as long as its metadata (libraries, tables, fields, and so on) and the processing performed by its tasks are known, protocol content can be produced from that known information and lineage can be generated from the protocol content. Lineage can therefore be generated even when the data source cannot be connected to.
In one embodiment of the present invention, the processing of an actual business workflow is described in detail below.
This embodiment takes as its example a web crawler that captures web page data and stores it in the HDFS file system, after which Spark processes the stored web page data to produce structured data. The processing flow of the whole workflow and the related lineage information are shown. The workflow consists of two jobs:
(1) Assume that the web crawler crawls the content of target web page A and stores it in target directory A of the HDFS file system.
This job involves two data sources: the source data source is target web page A, and the destination data source is target directory A of the HDFS file system. The lineage originates from the web page content captured by the crawler; the data comes from the source data source and goes to the destination data source. Both data sources can be abstracted as libraries, tables, and fields, and here the library name, table name, and field name are the same. During crawling, the source and the destination are associated through a designated reader (source) and writer (destination), and the lineage is generated from the reader and the writer.
The library, table, and field information of target web page A can be determined from the link requested by the URL; the library, table, and field information of the HDFS data source is determined from the file information stored in HDFS. The library, table, and field names may be the same.
(2) Suppose Spark reads the contents of target directory A of the HDFS file system, collects the URLs referenced on the page, and stores them in the urls field of the webinfo table of the web library.
This job also involves two data sources: the source data source is target directory A of the HDFS file system, and the destination data source is the webinfo table of the web library. The lineage is generated in the same way as in the first job and is not described again here.
In this embodiment, the two jobs are analyzed separately; that is, the data lineage of each job is analyzed and generated on its own.
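Although each job is analyzed on its own, the per-job lineage can be chained to trace a field back to its origin. The sketch below assumes each job contributes one (source, destination) edge; the dotted names such as `page_a.a.a.a` are hypothetical stand-ins for target web page A and target directory A, whose library, table, and field names are taken to be identical as described above.

```python
def trace_upstream(edges, target):
    """Return every ancestor of `target`, nearest first, by walking edges backwards."""
    parents = {}
    for src, dst in edges:
        parents.setdefault(dst, []).append(src)
    seen, stack = [], [target]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.append(parent)
                stack.append(parent)
    return seen


# Job 1 (crawler): web page A -> HDFS directory A. Job 2 (Spark): HDFS -> web.webinfo.urls.
job_edges = [
    ("page_a.a.a.a", "hdfs_a.a.a.a"),
    ("hdfs_a.a.a.a", "web_db.web.webinfo.urls"),
]
ancestors = trace_upstream(job_edges, "web_db.web.webinfo.urls")
```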
Preferably, the data source, library, table, and field information involved in each job may exist in the system under a unique name in the format "data source name, library name, table name, field name (column name)".
When a job is analyzed, the existing data sources and their library, table, and field information can be looked up by this unique name.
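A unique name of this four-part form could be modeled as below. The separator is an assumption: the text lists the four parts but does not state how they are joined, so a dot is used here purely for illustration, and the class and function names are hypothetical.

```python
from typing import NamedTuple


class UniqueName(NamedTuple):
    """Four-part identifier: data source, library, table, field (column)."""

    source: str
    library: str
    table: str
    field: str

    def __str__(self):
        # Assumed separator; the patent does not specify one.
        return ".".join(self)


def parse_unique_name(text):
    """Split a dotted unique name back into its four parts."""
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 dot-separated parts, got {len(parts)}")
    return UniqueName(*parts)


name = UniqueName("web_db", "web", "webinfo", "urls")
round_trip = parse_unique_name(str(name))
```

A real system would also need an escaping rule for names that themselves contain the separator; that is left out of this sketch.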
Next, the input and output lineage information of the job template is parsed, the corresponding library, table, and field information is looked up in the existing data sources by unique name and assembled, and the source and destination of the job's data are associated with that library, table, and field information to generate the lineage, which is stored in the database.
When lineage needs to be queried, the library-, table-, and field-level lineage can be queried from the database by unique name.
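Storing lineage and querying it by unique name can be sketched with an in-memory SQLite database. The table name `lineage`, its two columns, and the dotted unique names are assumptions for illustration; the patent does not describe the storage schema or name a database product.

```python
import sqlite3

# Hypothetical schema: one row per (source, destination) lineage edge.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lineage (source TEXT, destination TEXT)")
conn.executemany(
    "INSERT INTO lineage VALUES (?, ?)",
    [
        ("page_a.a.a.a", "hdfs_a.a.a.a"),
        ("hdfs_a.a.a.a", "web_db.web.webinfo.urls"),
    ],
)


def upstream_of(connection, unique_name):
    """Return the direct sources of the field identified by `unique_name`."""
    rows = connection.execute(
        "SELECT source FROM lineage WHERE destination = ? ORDER BY source",
        (unique_name,),
    )
    return [row[0] for row in rows]


direct_sources = upstream_of(conn, "web_db.web.webinfo.urls")
```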
The job template is a predefined template for a job whose format can be defined; it includes information such as the job name, description, status, creation time, start/end time, and lineage. Lineage generation mainly parses the lineage information in the job. The lineage information has two parts, input and output, with one or more inputs corresponding to one output. The reader in the lineage information serves as the input and includes the data source type, name, description, input table, input columns, and output columns, where each input column corresponds to one output column. The writer in the lineage information serves as the output and includes information such as the data source type, name, description, output columns, dependency type, and dependency expression. The dependency expression indicates which data processing steps, such as conversion, splitting, merging, and/or filtering, produce the column from what may be several fields.
It should be noted that the term "column" as used herein refers to a field.
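A job template with reader and writer sections, and the derivation of field-level lineage from it, might look like the following sketch. All key names, the type values `"hdfs"` and `"mysql"`, the `content` column, and the expression string are hypothetical; only the kinds of information (input table, input/output columns, dependency type, dependency expression) come from the description above.

```python
# Hypothetical job template for the Spark job; the key names are assumptions.
job = {
    "name": "collect_urls",
    "lineage": {
        "readers": [
            {
                "type": "hdfs",
                "name": "hdfs_a",
                "table": "a.a",
                "columns": [{"input": "content", "output": "urls"}],
            }
        ],
        "writer": {
            "type": "mysql",
            "name": "web_db",
            "table": "web.webinfo",
            "columns": [
                {
                    "name": "urls",
                    "dependency_type": "conversion",
                    "expression": "extract_urls(content)",
                }
            ],
        },
    },
}


def lineage_edges(job_template):
    """Pair each reader input column with the writer output column it feeds."""
    lineage = job_template["lineage"]
    writer = lineage["writer"]
    edges = []
    for reader in lineage["readers"]:
        for column in reader["columns"]:
            src = f'{reader["name"]}.{reader["table"]}.{column["input"]}'
            dst = f'{writer["name"]}.{writer["table"]}.{column["output"]}'
            edges.append((src, dst))
    return edges


template_edges = lineage_edges(job)
```

Several readers may feed one writer, which matches the rule that one or more inputs correspond to one output.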
In the data lineage generation method provided by the embodiments of the invention, the metadata of the target data sources and the task are each defined in a unified protocol format. This removes the restriction that only a database system or distributed database can serve as a data source and only an ETL or MapReduce task can serve as the basis for lineage, and widens the range over which data lineage can be generated.
On the basis of the above embodiments, the range of data sources and tasks can be extended. The web crawler's data capture process is one example: the web page and the HDFS disk file can both be defined as data sources, and the crawler's capture of the web page can be defined as the basis for lineage.
In addition, the data lineage generation method provided by the embodiments of the invention also addresses lineage generation when no network is available between interacting systems, that is, the offline case. The lineage generation process is essentially the same as in the above embodiments, except that the following steps are added in the offline state: if any system A contains data lineage, it can generate the lineage in the specified protocol format and store it in a designated directory on disk, which can be preset. An operator copies the data in the specified format to the designated directory of the lineage generation system, and the lineage generation system reads that directory periodically to parse the data and generate the lineage.
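The offline flow can be sketched as a function that scans a drop directory for protocol files and parses them into lineage edges. The file layout (`*.json` files with an `edges` list of `source`/`destination` pairs) and all names are assumptions; a temporary directory stands in for the preset directory, and the periodic scheduling that would call the function repeatedly is omitted.

```python
import json
import tempfile
from pathlib import Path


def ingest_directory(directory):
    """Parse every protocol file in the drop directory into lineage edges."""
    edges = []
    for path in sorted(Path(directory).glob("*.json")):
        doc = json.loads(path.read_text(encoding="utf-8"))
        edges.extend((e["source"], e["destination"]) for e in doc["edges"])
    return edges


with tempfile.TemporaryDirectory() as drop_dir:
    # Stands in for the operator copying system A's exported file into place.
    exported = {"edges": [{"source": "a.lib.t.f", "destination": "b.lib.t.f"}]}
    Path(drop_dir, "system_a.json").write_text(json.dumps(exported), encoding="utf-8")
    offline_edges = ingest_directory(drop_dir)
```

A production version would also need to record which files have already been ingested so that periodic scans do not import the same lineage twice.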
Referring to FIG. 2, FIG. 2 is a schematic diagram of a system for generating data lineage according to an embodiment of the present invention.
Accordingly, an embodiment of the present invention further provides a system for generating data lineage, including: an acquisition module 21, configured to acquire each target data source and the current task information; a definition module 22, configured to define each target data source and the current task information in a unified preset protocol format; and a lineage generation module 23, configured to generate the corresponding data lineage according to the execution process of the current task and the unified preset protocol format when the current task is executed.
The target data sources and the task are defined in a unified protocol format, and the data lineage is then generated from that format. Whatever the format of a target data source, and whatever data platform or tool is used, lineage can be generated as long as the preset protocol format is followed, which removes the barriers to lineage generation between different data sources and achieves compatibility with different data platforms and tools. It also removes the restriction that only a database system or distributed database can serve as a data source and only an ETL or MapReduce task can serve as the basis for lineage, which widens the range over which data lineage can be generated. In addition, it addresses lineage generation when no network connection is available between interacting systems.
In one embodiment of the present invention, the acquisition module 21 includes: a metadata acquisition unit, configured to acquire metadata information of each source data source and each destination data source, wherein the metadata information comprises libraries, tables, and fields; and a task information acquisition unit, configured to acquire the current task information.
Further, the definition module 22 includes: a first definition unit, configured to define each target data source in the unified preset protocol format, parse the metadata information, and generate association information describing which tables each library contains and which fields each table contains; and a second definition unit, configured to define the task information in the unified preset protocol format, parse the task information, and determine the source and destination of the libraries, tables, and/or fields contained in the task information.
Still further, the lineage generation module 23 includes: a lineage generation unit, configured to generate the corresponding data lineage according to the source and destination of the libraries, tables, and/or fields contained in the task information when the current task is executed.
In this embodiment, in order to generate a data lineage in a business system, first, library, table, and field information is generated. In order to break format limitations of data sources in different data platforms or tools, when data consanguinity is generated, metadata information of each data source is obtained first, and then the metadata information is converted into a uniform data source protocol format.
The data consanguinity depends on information of the library, the table and the field, and a corresponding relation chain is generated by analyzing the subordination relation of the library, the table and the field, so that the table and the library to which the target field belongs can be conveniently known.
Parsing the task information yields the source and destination of the data in the task, that is, its source data source and destination data source, which are then matched and associated with the libraries, tables, and fields in which the task's data resides.
In one embodiment of the present invention, the first definition unit is a JSON format definition unit, configured to convert the metadata information of each source data source and each destination data source into JSON format.
JSON (JavaScript Object Notation) is a lightweight data exchange format. It has a simple, clear hierarchical structure, is easy for people to read and write and for machines to parse and generate, and effectively improves network transmission efficiency.
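As a rough illustration of converting metadata into JSON, the sketch below groups flat (library, table, field) rows into one nested document. The key names (`dataSource`, `type`, `libraries`, `tables`, `fields`), the sample rows, and the function `metadata_to_protocol` are hypothetical; the patent requires only that the protocol format be JSON and does not fix a schema here.

```python
import json


def metadata_to_protocol(source_name, source_type, columns):
    """Group (library, table, field) rows into one nested, JSON-serializable document."""
    libraries = {}
    for library, table, field in columns:
        libraries.setdefault(library, {}).setdefault(table, []).append(field)
    return {
        "dataSource": source_name,  # hypothetical key names throughout
        "type": source_type,
        "libraries": [
            {
                "name": library,
                "tables": [
                    {"name": table, "fields": fields}
                    for table, fields in tables.items()
                ],
            }
            for library, tables in libraries.items()
        ],
    }


# Rows as they might be read from any platform's catalog (illustrative values).
rows = [
    ("sales_db", "orders", "order_id"),
    ("sales_db", "orders", "amount"),
]
document = metadata_to_protocol("mysql_prod", "mysql", rows)
text = json.dumps(document, indent=2)
print(text)
```

Because every data source is reduced to the same nested shape, later steps need not know which platform or tool the metadata came from.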
Preferably, the second definition unit includes: a first parsing subunit, configured to parse the data source objects of the task and the libraries, tables and/or fields contained in the task; and a first determining subunit, configured to determine, from the parsed content of the task, the source and destination of the libraries, tables and/or fields contained in the task information.
Preferably, the first determining subunit includes: an association unit, configured to associate, according to the parsed content of the task, the names of the corresponding source and destination data sources with the corresponding library names, table names, and field names, so as to determine the source and destination of the libraries, tables and/or fields contained in the task information.
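The association step can be pictured as joining each data source name with its library, table, and field names to form a unique name for both ends of every data flow. The task layout, its key names, and the helpers `qualified_name` and `resolve_field_flows` below are assumptions made for illustration only.

```python
import json

# Hypothetical task definition expressed in the unified JSON format.
TASK_JSON = """
{
  "task": "daily_order_sync",
  "source": {"dataSource": "mysql_prod", "library": "sales_db", "table": "orders"},
  "destination": {"dataSource": "hive_dw", "library": "dw", "table": "fact_orders"},
  "fieldMapping": {"order_id": "order_id", "amount": "order_amount"}
}
"""


def qualified_name(endpoint, field=None):
    """Join data source, library, table (and optionally field) into one unique name."""
    parts = [endpoint["dataSource"], endpoint["library"], endpoint["table"]]
    if field is not None:
        parts.append(field)
    return ".".join(parts)


def resolve_field_flows(task):
    """Return a (source, destination) unique-name pair for each mapped field."""
    return [
        (qualified_name(task["source"], src), qualified_name(task["destination"], dst))
        for src, dst in task["fieldMapping"].items()
    ]


task = json.loads(TASK_JSON)
flows = resolve_field_flows(task)
for source, destination in flows:
    print(source, "->", destination)
```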
In this embodiment, library, table, and field information can be looked up in an existing data source by unique name. The data in a task may come from different source data sources, or from different libraries, tables, and/or fields. The data lineage is generated by assembling the library-level, table-level, and field-level lineage, so that when the lineage is viewed, lineage at each of these levels can be queried from the database by unique name.
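One possible way to assemble lineage at the three levels is to start from field-level edges and derive the table-level and library-level edges by truncating the unique names. The sketch assumes unique names of the form `dataSource.library.table.field` with no dots inside any individual name; the sample edges and the functions `assemble_lineage` and `upstream_of` are hypothetical, and an in-memory dictionary stands in for the database mentioned above.

```python
from collections import defaultdict

# Hypothetical field-level edges: (source unique name, destination unique name).
FIELD_EDGES = [
    ("mysql_prod.sales_db.orders.order_id", "hive_dw.dw.fact_orders.order_id"),
    ("mysql_prod.sales_db.orders.amount", "hive_dw.dw.fact_orders.order_amount"),
    ("hive_dw.dw.fact_orders.order_amount", "hive_dw.mart.daily_revenue.revenue"),
]


def truncate(name, depth):
    """Keep the first `depth` dot-separated parts of a unique name."""
    return ".".join(name.split(".")[:depth])


def assemble_lineage(field_edges):
    """Derive library- and table-level edges from field-level edges."""
    # Depth 2 = library, 3 = table, 4 = field under the assumed naming scheme.
    levels = {"library": 2, "table": 3, "field": 4}
    lineage = {level: defaultdict(set) for level in levels}
    for src, dst in field_edges:
        for level, depth in levels.items():
            upstream, downstream = truncate(src, depth), truncate(dst, depth)
            if upstream != downstream:  # skip flows that stay inside one object
                lineage[level][downstream].add(upstream)
    return lineage


def upstream_of(lineage, level, name):
    """Return the direct upstream objects of `name` at the given level."""
    return sorted(lineage[level].get(name, ()))


LINEAGE = assemble_lineage(FIELD_EDGES)
print(upstream_of(LINEAGE, "table", "hive_dw.dw.fact_orders"))
print(upstream_of(LINEAGE, "library", "hive_dw.mart"))
```

Keying each level by unique name means a viewer can ask for library-, table-, or field-level lineage with the same lookup.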
Because the target data sources and the task are defined in a unified protocol format and the data lineage is generated from that format, lineage can be generated regardless of the format of the target data source or of the data platform or tool in use, provided the preset protocol format is followed. This removes the barrier to generating data lineage across different data sources and achieves compatibility across different data platforms and tools.
In addition, for a system that currently cannot be accessed, as long as its metadata information (libraries, tables, and fields) and the processing performed by its tasks are known, protocol content can be generated from that known information and lineage information generated from the protocol content, so lineage can be produced even when the data source cannot be connected.
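The fallback for an unreachable system might look like the following sketch: live metadata is used when the source can be read, and otherwise the protocol content is built from metadata that is already known. The function names, the key names, and the use of Python's built-in `ConnectionError` to signal an unreachable system are all assumptions made for this example.

```python
def build_protocol_content(read_metadata, declared_metadata, task):
    """Build protocol content from live metadata, or from known metadata if unreachable."""
    try:
        metadata, origin = read_metadata(), "live"
    except ConnectionError:
        # The system cannot be accessed: fall back to what is already known about it.
        metadata, origin = declared_metadata, "declared"
    return {"metadataOrigin": origin, "metadata": metadata, "task": task}


def generate_lineage(content):
    """Produce (source, destination) pairs from the protocol content alone."""
    task = content["task"]
    return [(task["source"], task["destination"])]


def unreachable_source():
    """Stand-in for a metadata reader whose system is offline."""
    raise ConnectionError("data source system cannot be accessed")


# Metadata and task details supplied by someone who knows the offline system.
known_metadata = {"finance": {"ledger": ["entry_id", "amount"]}}
known_task = {
    "source": "legacy_erp.finance.ledger",
    "destination": "hive_dw.dw.fact_ledger",
}

content = build_protocol_content(unreachable_source, known_metadata, known_task)
print(content["metadataOrigin"], generate_lineage(content))
```

Note that `generate_lineage` reads only the protocol content, so it behaves the same whether the metadata was fetched or declared.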
Referring to fig. 3, which is a schematic structural diagram of an electronic device for generating data lineage according to an embodiment of the present invention.
Accordingly, an embodiment of the present invention further provides an electronic device 3 for generating data lineage, including: a memory 31 for storing a computer program; and a processor 32 for executing the computer program to implement the steps of the data lineage generation method provided in any of the above embodiments.
Of course, the electronic device for generating data lineage may further include network interfaces, a power supply, and other components as needed, which are not limited herein.
Accordingly, an embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the data lineage generation method provided in any of the above embodiments. The computer-readable storage medium may be any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk; this embodiment does not specifically limit the medium.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In summary, in the data lineage generation method, system, electronic device and storage medium provided by the present invention, the target data sources and the task are defined in a unified protocol format, and the data lineage is then generated from that format. Lineage can therefore be generated regardless of the format of the target data source or of the data platform or tool in use, provided the preset protocol format is followed, which removes the barrier to generating data lineage across different data sources and achieves compatibility across different data platforms and tools. The method overcomes the limitation that only a database system or a distributed database can serve as the data source and only an ETL task or a MapReduce task can serve as the basis for lineage, and so broadens the scope in which data lineage can be generated. It also solves the problem of generating lineage when the network between interacting systems is blocked.
The method, system, electronic device and storage medium for generating data lineage provided by the present invention have been described in detail above. The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the method and its core concepts. It should be noted that those skilled in the art can make various improvements and modifications to the present invention without departing from its principle, and such improvements and modifications also fall within the scope of the claims of the present invention.

Claims (8)

1. A method for generating data lineage (blood margin), comprising:
acquiring each target data source and current task information;
respectively defining each target data source and the current task information in a unified preset protocol format, the preset protocol format being the JSON data exchange format;
when a current task is executed, generating a corresponding data lineage according to the execution process of the current task and the unified preset protocol format;
wherein the acquiring of each target data source and current task information includes: acquiring metadata information of each source data source and each destination data source, the metadata information comprising libraries, tables, and fields; and acquiring the current task information;
and, if the data source system cannot be accessed, generating protocol content based on the metadata information and the task information, and generating the corresponding data lineage based on the protocol content.
2. The method according to claim 1, wherein the respectively defining each target data source and the current task information in a unified preset protocol format comprises:
defining each target data source in the unified preset protocol format, parsing the metadata information, and generating association information recording which tables each library contains and which fields each table contains;
and defining the task information in the unified preset protocol format, parsing the task information, and determining the source and destination of the libraries, tables and/or fields contained in the task information.
3. The method according to claim 2, wherein the generating, when the current task is executed, of the corresponding data lineage according to the execution process of the current task and the unified preset protocol format comprises:
when the current task is executed, generating the corresponding data lineage according to the source and destination of the libraries, tables and/or fields contained in the task information.
4. The method according to claim 2 or 3, wherein the parsing of the task information and determining of the source and destination of the libraries, tables and/or fields contained in the task information comprises:
parsing the data source objects of the task and the libraries, tables and/or fields contained in the task;
and determining the source and destination of the libraries, tables and/or fields contained in the task information according to the parsed content of the task.
5. The method according to claim 4, wherein the determining of the source and destination of the libraries, tables and/or fields contained in the task information according to the parsed content of the task comprises:
associating, according to the parsed content of the task, the names of the corresponding source and destination data sources with the corresponding library names, table names and field names, so as to determine the source and destination of the libraries, tables and/or fields contained in the task information.
6. A system for generating data lineage, comprising:
an acquisition module, configured to acquire each target data source and current task information;
a definition module, configured to respectively define each target data source and the current task information in a unified preset protocol format, the preset protocol format being the JSON data exchange format;
a lineage generation module, configured to generate, when the current task is executed, a corresponding data lineage according to the execution process of the current task and the unified preset protocol format;
wherein the acquisition module is specifically configured to: acquire metadata information of each source data source and each destination data source, the metadata information comprising libraries, tables, and fields; and acquire the current task information;
and, if the data source system cannot be accessed, protocol content is generated based on the metadata information and the task information, and the corresponding data lineage is generated based on the protocol content.
7. An electronic device for generating data lineage, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the method for generating data lineage according to any one of claims 1 to 5 when executing the computer program.
8. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method for generating data lineage according to any one of claims 1 to 5.
CN201811221914.5A 2018-10-19 2018-10-19 Data blood margin generation method and system, electronic equipment and storage medium Active CN109299073B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811221914.5A CN109299073B (en) 2018-10-19 2018-10-19 Data blood margin generation method and system, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN109299073A (en) 2019-02-01
CN109299073B (en) 2019-12-24

Family

ID=65158238

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811221914.5A Active CN109299073B (en) 2018-10-19 2018-10-19 Data blood margin generation method and system, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109299073B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant