CN114691786A - Method and device for determining data blood relationship, storage medium and electronic device - Google Patents

Info

Publication number
CN114691786A
Authority
CN
China
Prior art keywords
metadata
data
relationship
database
graph database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011617620.1A
Other languages
Chinese (zh)
Inventor
韩林
侯春华
申光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp
Priority to CN202011617620.1A (CN114691786A)
Priority to PCT/CN2021/136131 (WO2022143045A1)
Publication of CN114691786A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9032 Query formulation

Abstract

The invention provides a method and a device for determining a data blood relationship (data lineage), a storage medium and an electronic device. The method comprises: obtaining metadata of an extract-transform-load (ETL) task, wherein the metadata comprises at least one of the following: a database, a data table, and data fields; analyzing and processing the metadata to store the metadata of the ETL task, the inclusion relationship of the metadata and the mapping relationship between the metadata in a graph database, wherein the inclusion relationship indicates the pairwise inclusion relationship among the database, the data table and the data fields, and the mapping relationship indicates the pairwise mapping relationship among the database, the data table and the data fields; and, in response to a data query request for target data, determining the data blood relationship of the target data through the graph database. That is, the metadata of the ETL task, the inclusion relationship of the metadata and the mapping relationship between the metadata are stored in the graph database, and the data blood relationship of the target data is then determined through the graph database.

Description

Method and device for determining data blood relationship, storage medium and electronic device
Technical Field
The present invention relates to the field of communications, and in particular, to a method and an apparatus for determining a data blood relationship, a storage medium, and an electronic apparatus.
Background
With the rapid development of informatization and internet technology, the age of "information explosion" has arrived. For governments and enterprises alike, digitization has become an inevitable trend of development. The data held in the various information systems is huge in volume and is stored on diverse media and in diverse formats, so eliminating data islands and integrating, sharing and mining the integrated data are becoming more and more important.
Among the approaches to solving the "data island" problem, the data warehouse is a best practice. A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data, and ETL (Extract-Transform-Load) is a key link in constructing a data warehouse. Recording and analyzing the data flow during data exchange and sharing through ETL is also of great practical significance, for example for data traceability, data value evaluation, data quality evaluation, and as a reference for data archiving and destruction.
In the related art, data blood relationship analysis methods based on a relational database suffer from model creation efficiency, storage efficiency and query efficiency that cannot meet the requirements of complex scenarios. Modeling the data blood relationship in a traditional relational database is complex: it involves a number of associated data tables and many concepts, and is not easy for developers to understand. Storage requires writing to multiple tables, so the code logic is complex. Query speed is limited by multi-table queries in the relational database, and the performance problem is particularly obvious when the data blood relationship link is long and complex.
No effective technical solution has yet been proposed for the problems in the related art that model establishment, data storage and data blood relationship query are complex in data blood relationship analysis methods based on a relational database.
Disclosure of Invention
The embodiments of the invention provide a method and a device for determining a data blood relationship, a storage medium and an electronic device, so as to at least solve the problems in the related art that model establishment, data storage and data blood relationship query are complex in data blood relationship analysis methods based on a relational database.
An embodiment of the invention provides a method for determining a data blood relationship, which comprises the following steps: obtaining metadata of an extract-transform-load (ETL) task, wherein the metadata comprises at least one of: a database, a data table, and data fields; analyzing and processing the metadata to store the metadata of the ETL task, the inclusion relationship of the metadata and the mapping relationship between the metadata in a graph database, wherein the inclusion relationship is used for indicating the pairwise inclusion relationship among the database, the data table and the data fields, and the mapping relationship is used for indicating the pairwise mapping relationship among the database, the data table and the data fields; and, in response to a data query request for target data, determining the data blood relationship of the target data through the graph database.
Optionally, analyzing the metadata to store the metadata of the ETL task in a graph database includes: acquiring metadata of a data source end of the ETL task and metadata of a data destination end of the ETL task; determining a first metadata type of metadata of the data source end and a second metadata type of metadata of the data destination end according to the metadata types provided by the graph database; and storing the metadata of the data source end in the graph database according to the first metadata type, and storing the metadata of the data destination end in the graph database according to the second metadata type.
Optionally, analyzing and processing the metadata to store the inclusion relationship of the metadata in a graph database includes: determining the pairwise inclusion relationships among the database, the data table and the data fields; and creating the pairwise inclusion relationships according to an object creation mode provided by the graph database, and storing the created pairwise inclusion relationships in the graph database.
Optionally, analyzing and processing the metadata to store the mapping relationship between the metadata in a graph database includes: creating an ETL task metadata type in the graph database, wherein the ETL task metadata type comprises an input and output list and a mapping relationship of the metadata, and the input and output list attribute is used for storing the metadata of the data source end and the metadata of the data destination end; and acquiring the mapping relationship between the metadata, and storing it in the created ETL task metadata type so as to store the mapping relationship between the metadata in the graph database.
Optionally, in response to a data query request for the target data, determining the data blood relationship of the target data through the graph database includes: in response to the data query request, performing a traversal query in the input and output list through the traversal language of the graph database to determine the data blood relationship of the target data.
Optionally, performing a traversal query in the input and output list through the traversal language of the graph database to determine the data blood relationship of the target data includes: performing, in the input and output list, a traversal query through the traversal language according to the input direction and/or the output direction so as to determine the data blood relationship of the target data.
According to another embodiment of the present invention, there is also provided a device for determining a data blood relationship, including: an obtaining module, configured to obtain metadata of an extract-transform-load (ETL) task, where the metadata includes at least one of: a database, a data table, and data fields; a processing module, configured to analyze and process the metadata so as to store the metadata of the ETL task, the inclusion relationship of the metadata and the mapping relationship between the metadata in a graph database, wherein the inclusion relationship is used for indicating the pairwise inclusion relationship among the database, the data table and the data fields, and the mapping relationship is used for indicating the pairwise mapping relationship among the database, the data table and the data fields; and a response module, configured to respond to a data query request for target data and determine the data blood relationship of the target data through the graph database.
Optionally, the processing module is further configured to obtain metadata of a data source end of the ETL task and metadata of a data destination end of the ETL task; determining a first metadata type of metadata of the data source end and a second metadata type of metadata of the data destination end according to the metadata types provided by the graph database; and storing the metadata of the data source end in the graph database according to the first metadata type, and storing the metadata of the data destination end in the graph database according to the second metadata type.
According to a further embodiment of the present invention, a computer-readable storage medium is also provided, in which a computer program is stored, wherein the computer program is configured to perform the steps of any of the above-described method embodiments when executed.
According to yet another embodiment of the present invention, there is also provided an electronic device, including a memory in which a computer program is stored and a processor configured to execute the computer program to perform the steps in any of the above method embodiments.
Through the above technical scheme, metadata of an extract-transform-load (ETL) task is obtained, wherein the metadata comprises at least one of the following: a database, a data table, and data fields; the metadata is analyzed and processed to store the metadata of the ETL task, the inclusion relationship of the metadata and the mapping relationship between the metadata in a graph database, wherein the inclusion relationship indicates the pairwise inclusion relationship among the database, the data table and the data fields, and the mapping relationship indicates the pairwise mapping relationship among the database, the data table and the data fields; and, in response to a data query request for target data, the data blood relationship of the target data is determined through the graph database. This technical scheme solves the problems in the related art that model establishment, data storage and data blood relationship query are complex in data blood relationship analysis methods based on a relational database: determining the data blood relationship based on a graph database makes model establishment, data storage and data blood relationship query simpler and more efficient.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention and do not constitute a limitation of the invention. In the drawings:
FIG. 1 is a block diagram of the hardware configuration of a computer terminal for a method for determining a data blood relationship according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method for determining a data blood relationship according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of metadata type definition and creation according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of data blood relationship construction according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of data blood relationship analysis according to an embodiment of the present invention;
FIG. 6 is a diagram of graph traversal nodes and directed edges according to an embodiment of the present invention;
FIG. 7 is a block diagram of a device for determining a data blood relationship according to an embodiment of the present invention.
Detailed Description
The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be used. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The method provided by the embodiment of the application can be executed in a mobile terminal, a computer terminal or a similar operation device. Taking the example of being operated on a computer terminal, fig. 1 is a hardware structure block diagram of the computer terminal of the method for determining a data blood relationship according to the embodiment of the present invention. As shown in fig. 1, the computer terminal may include one or more (only one shown in fig. 1) processors 102 (the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA) and a memory 104 for storing data, and optionally, a transmission device 106 for communication functions and an input-output device 108. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and is not intended to limit the structure of the computer terminal. For example, the computer terminal may also include more or fewer components than shown in FIG. 1, or have a different configuration with equivalent functionality to that shown in FIG. 1 or with more functionality than that shown in FIG. 1. The memory 104 may be used to store a computer program, for example, a software program and a module of an application software, such as a computer program corresponding to the method for determining a data blood relationship in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the computer program stored in the memory 104, so as to implement the method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. 
In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The transmission device 106 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal. In one example, the transmission device 106 includes a Network Interface Controller (NIC), which can be connected to other network devices through a base station so as to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.
According to an embodiment of the present invention, there is provided a method for determining a data blood relationship, which is applied to the above computer terminal. FIG. 2 is a flowchart of the method for determining a data blood relationship according to the embodiment of the present invention. As shown in FIG. 2, the method includes:
Step S202, obtaining metadata of an extract-transform-load (ETL) task, wherein the metadata comprises at least one of the following: a database, a data table, and data fields;
Step S204, analyzing and processing the metadata to store the metadata of the ETL task, the inclusion relationship of the metadata and the mapping relationship between the metadata in a graph database, wherein the inclusion relationship is used for indicating the pairwise inclusion relationship among the database, the data table and the data fields, and the mapping relationship is used for indicating the pairwise mapping relationship among the database, the data table and the data fields;
Step S206, in response to a data query request for target data, determining the data blood relationship of the target data through the graph database.
Through the above steps, metadata of an extract-transform-load (ETL) task is obtained, wherein the metadata comprises at least one of the following: a database, a data table, and data fields; the metadata is analyzed and processed to store the metadata of the ETL task, the inclusion relationship of the metadata and the mapping relationship between the metadata in a graph database, wherein the inclusion relationship indicates the pairwise inclusion relationship among the database, the data table and the data fields, and the mapping relationship indicates the pairwise mapping relationship among the database, the data table and the data fields; and, in response to a data query request for target data, the data blood relationship of the target data is determined through the graph database. This technical scheme solves the problems in the related art that model establishment, data storage and data blood relationship query are complex in data blood relationship analysis methods based on a relational database: determining the data blood relationship based on a graph database makes model establishment, data storage and data blood relationship query simpler and more efficient.
The specific implementation steps of step S204 are as follows:
Step 1: analyzing and processing the metadata to store the metadata of the ETL task in a graph database. Specifically, the metadata of the data source end of the ETL task and the metadata of the data destination end of the ETL task are acquired; a first metadata type of the metadata of the data source end and a second metadata type of the metadata of the data destination end are determined according to the metadata types provided by the graph database; and the metadata of the data source end is stored in the graph database according to the first metadata type, and the metadata of the data destination end is stored in the graph database according to the second metadata type.
That is, the metadata of the data source end and of the data destination end of the ETL task is obtained; the metadata types of the source-end metadata and the destination-end metadata, namely the first metadata type and the second metadata type, are determined according to the metadata type definition method provided by the graph database; and the source-end metadata and the destination-end metadata are then stored in the graph database according to the first metadata type and the second metadata type, respectively.
Step 2: analyzing and processing the metadata to store the inclusion relationship of the metadata in the graph database. Specifically, the pairwise inclusion relationships among the database, the data table and the data fields are determined; the pairwise inclusion relationships are created according to an object creation mode provided by the graph database; and the created pairwise inclusion relationships are stored in the graph database.
It should be noted that the specific information of the source-end metadata and the destination-end metadata of the ETL task, namely the pairwise inclusion relationships among the database, the data table and the data fields, is analyzed; the pairwise inclusion relationships are created using the object creation mode provided by the graph database; and the created pairwise inclusion relationships are then stored in the graph database.
Step 3: analyzing and processing the metadata to store the mapping relationship between the metadata in the graph database. Specifically, an ETL task metadata type is created in the graph database, wherein the ETL task metadata type comprises an input and output list and a mapping relationship of the metadata, and the input and output list attribute is used for storing the metadata of the data source end and the metadata of the data destination end; the mapping relationship between the metadata is acquired and stored in the created ETL task metadata type, so that the mapping relationship between the metadata is stored in the graph database.
Specifically, the mapping relationship between the source-end metadata and the destination-end metadata in the ETL task is analyzed. This can be understood as the direction information from the data source end to the data destination end together with the correspondence between fields. An ETL task metadata type is created, comprising an input and output list and a mapping relationship of the metadata. The source-end metadata and the destination-end metadata of the ETL task are stored in the corresponding input and output list, and the mapping relationship between the metadata is stored in the mapping relationship attribute of the corresponding ETL task metadata type, so that the mapping relationship between the metadata is stored in the graph database.
Completing steps 1 to 3 stores the metadata of the ETL task, the inclusion relationship of the metadata and the mapping relationship between the metadata in the graph database.
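Steps 1 to 3 can be illustrated with a minimal sketch. It assumes a toy in-memory structure in place of a real graph database; the class `LineageGraph`, its methods, and the example database, table and field names are all hypothetical and are not taken from the embodiment or from any graph database API.

```python
from dataclasses import dataclass, field


@dataclass
class LineageGraph:
    """Toy in-memory stand-in for a graph database (hypothetical, not a real API)."""
    nodes: dict = field(default_factory=dict)    # qualified name -> metadata type
    contains: set = field(default_factory=set)   # (parent, child) inclusion edges
    tasks: dict = field(default_factory=dict)    # ETL task name -> task attributes

    def add_table(self, database, table, columns):
        """Steps 1 and 2: store typed metadata and the pairwise inclusion relationships."""
        table_name = f"{database}.{table}"
        self.nodes[database] = "database"
        self.nodes[table_name] = "table"
        self.contains.add((database, table_name))
        for column in columns:
            column_name = f"{table_name}.{column}"
            self.nodes[column_name] = "column"
            self.contains.add((table_name, column_name))
        return table_name

    def add_etl_task(self, name, inputs, outputs, field_mapping):
        """Step 3: store the input and output lists and the field mapping."""
        self.tasks[name] = {
            "inputs": list(inputs),
            "outputs": list(outputs),
            "field_mapping": dict(field_mapping),
        }


graph = LineageGraph()
src = graph.add_table("mysql_db", "table_a", ["id", "name"])
dst = graph.add_table("oracle_db", "table_b", ["id", "full_name"])
graph.add_etl_task(
    "copy_a_to_b",
    inputs=[src],
    outputs=[dst],
    field_mapping={f"{src}.id": f"{dst}.id", f"{src}.name": f"{dst}.full_name"},
)
```

Using qualified names as node keys is one simple way to keep each metadata object unique across tasks; a real graph database would typically enforce this with its own unique-attribute mechanism.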
There are many ways to implement step S206. In an exemplary embodiment, in response to the data query request, a traversal query is performed in the input and output list through the traversal language of the graph database to determine the data blood relationship of the target data.
Specifically, in the input and output list, the traversal query is performed through the traversal language according to the input direction and/or the output direction to determine the data blood relationship of the target data.
In other words, to determine the data blood relationship of the target data through the graph database, a data query request is first acquired; then, starting from the target data and based on the input and output list of the ETL task metadata type, the data blood relationship of the target data is queried through the traversal language of the graph database according to the input direction and/or the output direction.
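The choice of traversal direction can be sketched as follows. This is only an illustration under stated assumptions: ETL tasks are represented as plain dictionaries with `inputs` and `outputs` lists, and the function `neighbors` and the dataset names are hypothetical rather than part of any graph database traversal language.

```python
def neighbors(tasks, target, direction="both"):
    """Return the datasets exactly one ETL task away from `target`.

    tasks: iterable of dicts with "inputs" and "outputs" lists (hypothetical shape).
    direction: "output" follows the data downstream, "input" follows it upstream,
    and "both" returns the union of the two.
    """
    if direction not in ("input", "output", "both"):
        raise ValueError(f"unknown direction: {direction}")
    upstream, downstream = set(), set()
    for task in tasks:
        if target in task["inputs"]:
            downstream.update(task["outputs"])
        if target in task["outputs"]:
            upstream.update(task["inputs"])
    if direction == "output":
        return downstream
    if direction == "input":
        return upstream
    return upstream | downstream


tasks = [
    {"inputs": ["raw.orders"], "outputs": ["dw.orders"]},
    {"inputs": ["dw.orders"], "outputs": ["mart.sales"]},
]
print(neighbors(tasks, "dw.orders", "input"))
print(neighbors(tasks, "dw.orders", "output"))
```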
The flow of the above method for determining the data blood relationship is explained below with reference to several alternative embodiments, which are not intended to limit the technical solution of the embodiments of the present invention.
In order to effectively integrate dispersed, heterogeneous data information resources and eliminate the data island phenomenon, ETL tools are currently used to orchestrate processing tasks. An ETL task can be regarded as a form of data flow whose three basic elements are the data source, the data flow direction and the data destination; more detailed information, for example the blood relationship between corresponding fields, can also be included. The specific implementation steps are as follows:
Step 1: The metadata types of the data source of the ETL task and of the data destination of the ETL task are abstracted, including the metadata types of the database, the data table and the data fields. The dependency relationships among the metadata types are defined (for example, a data field belongs to a data table, and a data table belongs to a database), and the metadata types are initialized according to the metadata type definition method provided by the graph database. In addition, an ETL task metadata type is defined for the ETL task. It includes the list attributes of the input and output objects transformed in the ETL task and the corresponding field mapping relationships; these list attributes are an important part of analyzing the data blood relationship.
Step 2: The specific information of the data source and the data destination in the ETL task (corresponding to the inclusion relationship of the metadata in the above embodiment) is analyzed, namely the database, the data table and the data fields and the relationships among them. The metadata object creation method provided by the graph database is then used to create the metadata objects corresponding to the metadata types and the inclusion dependency relationships between the objects; these objects and relationships are globally unique. Step 1 and step 2 are shown in FIG. 3, which is a schematic diagram of metadata type definition and creation according to an embodiment of the present invention.
Step 3: The data flow direction between the data elements in the ETL task (corresponding to the mapping relationship between the metadata in the above embodiment) is analyzed, namely the direction information from the data source to the data destination and the correspondence between fields. An ETL task metadata type object is created and the related information is stored. Most importantly, the metadata objects created for the data source and the data destination of the ETL task are stored in the input and output object lists of the ETL task metadata type, and the correspondence between the fields of the data source and data destination metadata objects is stored in the field mapping relationship of the ETL task metadata object. FIG. 4 is a schematic diagram of data blood relationship construction according to an embodiment of the present invention.
Step 4: Based on the data stored in the graph database, the data blood relationship is queried and analyzed through the query language and methods provided by the graph database. The principle is that, based on the input object list and the output object list stored in the ETL task metadata type object, a traversal query is carried out in both the input and output directions from a given object through the traversal language of the graph database. The traversal depth can be adjusted according to the query requirements, so that the blood relationship can be queried.
Step 5: After the queried blood relationship data is sorted and converted, it can be consumed. For example, a data blood relationship graph can be drawn that clearly shows the upstream sources and downstream destinations of the data together with the associated ETL task information, which facilitates further analysis such as data tracing. FIG. 5 is a schematic diagram of data blood relationship analysis according to an embodiment of the present invention.
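As one possible illustration of the consumption in step 5, queried blood relationship data could be converted into Graphviz DOT text and then drawn with any DOT renderer. This is a sketch under assumptions: the query result is assumed to be a list of (source, ETL task, destination) triples, and the function `to_dot` and the dataset names are hypothetical; the embodiment does not prescribe a drawing format.

```python
def to_dot(edges):
    """Render (source, task, destination) triples as Graphviz DOT text."""
    lines = ["digraph lineage {", "  rankdir=LR;"]
    for src, task, dst in edges:
        # Each edge runs from the upstream source to the downstream destination
        # and is labelled with the associated ETL task.
        lines.append(f'  "{src}" -> "{dst}" [label="{task}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical query result: upstream source, ETL task, downstream destination.
edges = [
    ("raw.orders", "load_orders", "dw.orders"),
    ("dw.orders", "build_sales", "mart.sales"),
]
dot_text = to_dot(edges)
print(dot_text)
```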
The following details are taken as an example to extract an ETL task and an Atlas metadata tool which are loaded into an Oracle database data table B from data in a Mysql database data table A as a graph data storage tool frame, and the specific steps are as follows:
step 1: first, metadata types such as Mysql database, Oracle database, database table, field and the like are abstracted, and because Mysql and Oracle are both relational databases, the abstractions are of the same type, namely rdbsource (database), rdtable (data table), and rdcolumn (data table field) (which are equivalent to the data field in the above embodiment). Defining Atlas metadata type in xml format, wherein in addition to the attributes of basic name, type and the like, the RdbResource contains a list data attribute of rdbTable type as a contained data table; the RdbTable contains a rdbccolumn type list data attribute which is a contained data table field. In addition, for the ETL task, an ETL task metadata type ETLJob is abstracted, and the ETLJob contains a list data attribute of an ETLJobTrans type except the attributes of a basic name, a type and the like; ETLJobTrans contains input and output object list attributes, as well as field mappings, in addition to basic attributes.
Step 2: After the database connections are configured, automatic database metadata analysis is used to parse the databases to which MySQL data table A and Oracle data table B of the ETL task belong, together with the data tables and fields they contain. An Atlas interface is then called to create the corresponding RdbResource, RdbTable, and RdbColumn type objects.
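Automatic metadata analysis of this kind can be sketched as follows. Because a live MySQL or Oracle connection is not available in a self-contained example, the sketch uses Python's built-in `sqlite3` module as a stand-in database; the function `parse_metadata`, the `typeName` keys, and the resource name are hypothetical, and the returned dictionaries only represent what a real integration would hand to the metadata tool's object creation interface.

```python
import sqlite3


def parse_metadata(conn, resource_name):
    """Read table and column metadata from a database connection.

    SQLite stands in for MySQL/Oracle here; those systems expose the same
    information through their own catalog views.
    """
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    tables = []
    for (table_name,) in rows:
        # PRAGMA table_info yields (cid, name, type, notnull, default, pk).
        columns = conn.execute(f'PRAGMA table_info("{table_name}")').fetchall()
        tables.append({
            "typeName": "RdbTable",
            "name": table_name,
            "columns": [
                {"typeName": "RdbColumn", "name": col[1], "type": col[2]}
                for col in columns
            ],
        })
    return {"typeName": "RdbResource", "name": resource_name, "tables": tables}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (id INTEGER, name TEXT)")
metadata = parse_metadata(conn, "demo_db")
```

The nested result already reflects the inclusion relationship: the database contains its tables, and each table contains its fields.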
Step 3: The conversion steps in the ETL task (equivalent to the mapping relationship between the metadata in the above embodiment) are analyzed, and the corresponding ETLJob and ETLJobTrans metadata type objects are created. MySQL data table A and Oracle data table B are added to the input and output object list attributes of the ETLJobTrans object, respectively, and the corresponding field mapping relationship is also stored in Atlas.
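As a rough illustration of this step, the sketch below turns a hypothetical ETL task definition into ETLJob and ETLJobTrans objects. The layout of `etl_definition`, the function `build_job_objects`, and the key names are assumptions made for the example; a real ETL tool would have its own task format, and the resulting objects would be saved through the metadata tool rather than kept as dictionaries.

```python
# Hypothetical ETL task definition: one conversion step copying two fields.
etl_definition = {
    "name": "copy_a_to_b",
    "steps": [
        {
            "name": "load_b",
            "source": "mysql.table_a",
            "target": "oracle.table_b",
            "fields": [("id", "ID"), ("name", "NAME")],
        },
    ],
}


def build_job_objects(definition):
    """Create one ETLJob object and one ETLJobTrans object per step."""
    trans_objects = []
    for step in definition["steps"]:
        trans_objects.append({
            "typeName": "ETLJobTrans",
            "name": step["name"],
            "inputs": [step["source"]],    # input object list
            "outputs": [step["target"]],   # output object list
            "fieldMappings": {
                f"{step['source']}.{src}": f"{step['target']}.{dst}"
                for src, dst in step["fields"]
            },
        })
    job = {
        "typeName": "ETLJob",
        "name": definition["name"],
        "transformations": [t["name"] for t in trans_objects],
    }
    return job, trans_objects


job, transformations = build_job_objects(etl_definition)
```

Keeping the field mappings on the transformation object is what later allows field-level blood relationships to be queried in addition to table-level ones.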
Step 4: The data stored in Atlas is queried and analyzed. Using the graph data traversal language provided by Atlas, a loop traversal query is performed starting from MySQL table A, following the blood relationships associated with the input and output object list attributes of the ETLJobTrans metadata object. The traversal proceeds as follows: along the input edge of MySQL data table A, the input object list node is queried, and from it the ETLJobTrans node; then, along the output edge of that node, the output object list node and the Oracle data table B node in the output object list are queried. This is a simple method for querying one level of the downstream data blood relationship of MySQL data table A; repeating it multiple times yields the downstream multi-level data blood relationship. Conversely, the upstream data blood relationship of MySQL data table A can be queried by traversing in the opposite direction. Fig. 6 is a schematic diagram of graph traversal nodes and directed edges according to an embodiment of the present invention.
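The one-level query and its repetition can be sketched in plain Python over a list of directed edges. This is not the Atlas traversal language: the `EDGES` list, the functions `one_level` and `lineage`, the second transformation, and the table `hive.table_c` are hypothetical additions used only to show a multi-level result.

```python
# Directed edges: table -> ETLJobTrans (input edge) and
# ETLJobTrans -> table (output edge). All names are hypothetical.
EDGES = [
    ("mysql.table_a", "trans_1"),
    ("trans_1", "oracle.table_b"),
    ("oracle.table_b", "trans_2"),
    ("trans_2", "hive.table_c"),
]


def one_level(table, edges):
    """Tables exactly one ETL transformation away from `table`."""
    trans_nodes = [dst for src, dst in edges if src == table]
    return sorted({dst for src, dst in edges if src in trans_nodes})


def lineage(table, levels, upstream=False):
    """Repeat the one-level query `levels` times; reverse edges for upstream."""
    edges = [(dst, src) for src, dst in EDGES] if upstream else EDGES
    frontier, result = [table], []
    for _ in range(levels):
        frontier = sorted({t for node in frontier for t in one_level(node, edges)})
        if not frontier:
            break
        result.append(frontier)
    return result
```

Here `lineage("mysql.table_a", 2)` returns one list of tables per level, and passing `upstream=True` runs the same procedure over the reversed edges.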
Step 5: The data found by the graph traversal can be simplified and converted as required, to give consumers of the blood relationship data a data structure that is convenient to process. For example, the result can be simplified into two sets of data: a node data list that contains only data table nodes and ETL task nodes, and a directed edge list that stores only source node ids and destination node ids. A data consumer can reconstruct the actual blood relationship from these two sets of data. In addition, other related data, such as database blood relationships and data table field blood relationships, can be queried through customized graph database traversal statements.
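The simplification into a node list and an edge list might look like the following sketch. The structure of `raw_result`, the `KEPT_TYPES` set, and the functions `simplify` and `rebuild` are assumptions for illustration; the actual traversal output depends on the graph database in use.

```python
# Hypothetical raw traversal output, including a field node and a
# containment edge that the consumer does not need.
raw_result = {
    "nodes": [
        {"id": 1, "type": "RdbTable", "name": "table_a"},
        {"id": 2, "type": "ETLJobTrans", "name": "load_b"},
        {"id": 3, "type": "RdbTable", "name": "table_b"},
        {"id": 4, "type": "RdbColumn", "name": "id"},
    ],
    "edges": [
        {"source": 1, "target": 2, "label": "input"},
        {"source": 2, "target": 3, "label": "output"},
        {"source": 1, "target": 4, "label": "contains"},
    ],
}

KEPT_TYPES = {"RdbTable", "ETLJobTrans"}  # data tables and ETL task nodes


def simplify(result):
    """Keep only table/task nodes and (source id, destination id) pairs."""
    nodes = [n for n in result["nodes"] if n["type"] in KEPT_TYPES]
    kept_ids = {n["id"] for n in nodes}
    edges = [
        (e["source"], e["target"])
        for e in result["edges"]
        if e["source"] in kept_ids and e["target"] in kept_ids
    ]
    return nodes, edges


def rebuild(nodes, edges):
    """Consumer side: restore named relationships from the two data sets."""
    names = {n["id"]: n["name"] for n in nodes}
    return [(names[src], names[dst]) for src, dst in edges]


nodes, edges = simplify(raw_result)
relationships = rebuild(nodes, edges)
```

The consumer only needs the ids in the edge list to look up node details, which keeps the transferred data small.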
Through the above steps, for the various ETL tasks that are developed and defined, the custom metadata function of the graph database is first used to predefine the various data elements, including but not limited to common relational databases (such as MySQL, Oracle, and SQL Server), big data related types (Hive, Impala, HBase, ES, MongoDB, and the like), and structured or unstructured files (FTP, HDFS), as well as the ETL task types, all of which are initialized in the graph database. Then, for each ETL task, the blood relationship among its data elements is analyzed, converted into relationships between graph database entities, and stored in the graph database. Finally, data blood relationship analysis is performed using the efficient and convenient query methods provided by the graph database.
The above specifically describes the preferred implementation of the present application, but the implementation is not limited to it. In particular, the definition of the graph data metadata types and the use of the graph database traversal query language may be customized or transformed according to different data blood relationship analysis requirements, for example by simplifying or omitting the definition of the ETL task metadata types and directly associating the data table metadata objects with directed edges. Such equivalent adaptations and modifications are intended to be included within the scope of the present invention as defined in the following claims.
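The simplified variant mentioned above, in which ETL task objects are omitted and data tables are linked directly, can be sketched as follows. The function `collapse_tasks` and the input layout are hypothetical and serve only to show how task-mediated relationships reduce to direct table-to-table edges.

```python
def collapse_tasks(task_io):
    """Turn {task: (inputs, outputs)} into direct table-to-table edges."""
    return sorted({
        (src, dst)
        for inputs, outputs in task_io.values()
        for src in inputs
        for dst in outputs
    })


# Hypothetical tasks: job_1 fans out from A to B and C, job_2 copies B to D.
direct_edges = collapse_tasks({
    "job_1": (["A"], ["B", "C"]),
    "job_2": (["B"], ["D"]),
})
```

The trade-off of this variant is that the associated ETL task information is no longer available from the graph itself.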
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
In this embodiment, a device for determining a data blood relationship is further provided, and the device is used to implement the foregoing embodiments and preferred embodiments, which have already been described and are not described again. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 7 is a block diagram of a data blood relationship determination apparatus according to an embodiment of the present invention, as shown in fig. 7, the apparatus includes:
an obtaining module 72, configured to obtain metadata of the extract transform load ETL task, where the metadata includes at least one of: a database, a data table, and data fields;
a processing module 74, configured to analyze and process the metadata, so as to store, in a graph database, the metadata of the ETL task, an inclusion relationship of the metadata, and a mapping relationship between the metadata, where the inclusion relationship is used to indicate a pairwise inclusion relationship among the database, the data table, and the data field, and the mapping relationship is used to indicate a pairwise mapping relationship among the database, the data table, and the data field;
a response module 76, configured to determine the data blood relationship of the target data through the graph database in response to the data query request of the target data.
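How the three modules could cooperate in a software implementation is sketched below. The class names follow the module names above, but the method names, the dictionary-of-sets graph, and the sample task are hypothetical; a real implementation would write to and query a graph database instead.

```python
class ObtainingModule:
    """Obtains the metadata (database, data table, fields) of an ETL task."""

    def obtain(self, etl_task):
        return [etl_task["source"], etl_task["target"]]


class ProcessingModule:
    """Stores metadata, inclusion relationships, and mapping relationships."""

    def __init__(self, graph):
        self.graph = graph

    def process(self, etl_task, metadata):
        for item in metadata:
            database, table = item["database"], item["table"]
            self.graph["contains"].add((database, table))
            for column in item["fields"]:
                self.graph["contains"].add((table, f"{table}.{column}"))
        self.graph["maps_to"].add(
            (etl_task["source"]["table"], etl_task["target"]["table"])
        )


class ResponseModule:
    """Answers a data query request for a target table from the graph."""

    def __init__(self, graph):
        self.graph = graph

    def respond(self, target):
        maps = self.graph["maps_to"]
        return {
            "upstream": sorted(src for src, dst in maps if dst == target),
            "downstream": sorted(dst for src, dst in maps if src == target),
        }


# Hypothetical graph store and ETL task used to exercise the modules.
graph = {"contains": set(), "maps_to": set()}
task = {
    "source": {"database": "mysql_db", "table": "table_a", "fields": ["id"]},
    "target": {"database": "oracle_db", "table": "table_b", "fields": ["ID"]},
}
ProcessingModule(graph).process(task, ObtainingModule().obtain(task))
answer = ResponseModule(graph).respond("table_a")
```

The modules share only the graph store, so they can run in the same process or be deployed separately.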
According to the invention, metadata of an extract-transform-load (ETL) task is obtained, where the metadata includes at least one of the following: a database, a data table, and data fields; the metadata is analyzed so as to store the metadata of the ETL task, the inclusion relationship of the metadata, and the mapping relationship between the metadata in a graph database, where the inclusion relationship indicates the pairwise inclusion relationship among the database, the data table, and the data field, and the mapping relationship indicates the pairwise mapping relationship among the database, the data table, and the data field; and, in response to a data query request for target data, the data blood relationship of the target data is determined through the graph database. This technical scheme solves the problem in the related art that data blood relationship analysis methods based on a relational database are complex in model establishment, data storage, and data blood relationship query; determining the data blood relationship based on a graph database makes model establishment, data storage, and data blood relationship query simpler and more efficient.
In an optional embodiment, the processing module is further configured to analyze and process the metadata to store the metadata of the ETL task in the graph database, specifically by: obtaining the metadata of a data source end of the ETL task and the metadata of a data destination end of the ETL task; determining a first metadata type of the metadata of the data source end and a second metadata type of the metadata of the data destination end according to the metadata types provided by the graph database; and storing the metadata of the data source end in the graph database according to the first metadata type, and storing the metadata of the data destination end in the graph database according to the second metadata type.
That is to say, the metadata of the data source end and of the data destination end of the ETL task is obtained; the metadata types of the two, namely the first metadata type and the second metadata type, are determined according to the metadata type definition method provided by the graph database; and the metadata of the data source end and of the data destination end is then stored in the graph database according to the first and second metadata types, respectively.
In an optional embodiment, the processing module is further configured to analyze and process the metadata to store the inclusion relationship of the metadata in the graph database, specifically by: determining the pairwise inclusion relationships among the database, the data table, and the data field; and creating the pairwise inclusion relationships according to the object creation mode provided by the graph database, and storing the created pairwise inclusion relationships in the graph database.
It should be noted that the specific information of the metadata at the data source end and the data destination end of the ETL task, that is, the pairwise inclusion relationships among the database, the data table, and the data fields, is analyzed; these pairwise inclusion relationships are created using the object creation method provided by the graph database; and the created pairwise inclusion relationships are then stored in the graph database.
In an optional embodiment, the processing module is further configured to analyze and process the metadata to store the mapping relationship between the metadata in the graph database. Specifically, an ETL task metadata type is created in the graph database, where the ETL task metadata type includes an input/output list and a mapping relationship of the metadata, and the attributes of the input/output list are used to store the metadata of the data source end and the metadata of the data destination end; the mapping relationship between the metadata is then acquired and stored in the created ETL task metadata type, so that the mapping relationship between the metadata is stored in the graph database.
Specifically, the mapping relationship between the metadata of the data source end and the metadata of the data destination end in the ETL task is analyzed; this can be understood as the direction information and the field correspondence between the data source end and the data destination end. An ETL task metadata type is created that includes an input/output list and a mapping relationship of the metadata. The metadata of the data source end and of the data destination end in the ETL task is stored in the corresponding input/output list, and the mapping relationship between the metadata is stored in the mapping relationship attribute of the corresponding ETL task metadata type, and thereby in the graph database.
This completes the storage of the metadata of the ETL task, the inclusion relationship of the metadata, and the mapping relationship between the metadata in the graph database.
In an exemplary embodiment, the response module is further configured to, in response to the data query request, perform a traversal query on the input/output list using the traversal language of the graph database to determine the data blood relationship of the target data.
Specifically, the response module is further configured to perform traversal query on the input/output list through the traversal language according to the input direction and/or the output direction to determine the data blood relationship of the target data.
In order to determine the data blood relationship of the target data through the graph database, firstly, a data query request is acquired, and the data blood relationship of the target data is queried through a traversal language of the graph database according to an input direction and/or an output direction from the target data based on an input/output list of an ETL task metadata type.
It should be noted that, the above modules may be implemented by software or hardware, and for the latter, the following may be implemented, but not limited to: the modules are all positioned in the same processor; alternatively, the modules are located in different processors in any combination.
Embodiments of the present invention also provide a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:
S1, obtaining metadata of an extract-transform-load (ETL) task, wherein the metadata comprises at least one of the following: a database, a data table, and data fields;
S2, analyzing and processing the metadata to store the metadata of the ETL task, an inclusion relationship of the metadata, and a mapping relationship between the metadata in a graph database, wherein the inclusion relationship is used for indicating a pairwise inclusion relationship among the database, the data table, and the data field, and the mapping relationship is used for indicating a pairwise mapping relationship among the database, the data table, and the data field;
and S3, responding to the data query request of the target data, and determining the data blood relationship of the target data through the graph database.
Optionally, in this embodiment, the storage medium may include, but is not limited to: various media capable of storing computer programs, such as a usb disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Embodiments of the present invention further provide an electronic device, comprising a memory in which a computer program is stored and a processor configured to execute the computer program to perform the steps in any of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
S1, obtaining metadata of an extract-transform-load (ETL) task, wherein the metadata comprises at least one of the following: a database, a data table, and data fields;
S2, analyzing and processing the metadata to store the metadata of the ETL task, an inclusion relationship of the metadata, and a mapping relationship between the metadata in a graph database, wherein the inclusion relationship is used for indicating a pairwise inclusion relationship among the database, the data table, and the data field, and the mapping relationship is used for indicating a pairwise mapping relationship among the database, the data table, and the data field;
and S3, responding to the data query request of the target data, and determining the data blood relationship of the target data through the graph database.
Optionally, in this embodiment, the storage medium may include, but is not limited to: various media capable of storing program codes, such as a usb disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments and optional implementation manners, and this embodiment is not described herein again.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method for determining data blood relationship, comprising:
obtaining metadata of an extract transform load ETL task, wherein the metadata comprises at least one of: a database, a data table, and data fields;
analyzing the metadata to store the metadata of the ETL task, the inclusion relationship of the metadata and the mapping relationship between the metadata in a graph database, wherein the inclusion relationship is used for indicating the pairwise inclusion relationship among the database, the data table and the data field, and the mapping relationship is used for indicating the pairwise mapping relationship among the database, the data table and the data field;
and responding to a data query request of the target data, and determining the data blood relationship of the target data through the graph database.
2. The method of claim 1, wherein analyzing the metadata to store the metadata of the ETL task in a graph database comprises:
acquiring metadata of a data source end of the ETL task and metadata of a data destination end of the ETL task;
determining a first metadata type of metadata of the data source end and a second metadata type of metadata of the data destination end according to the metadata types provided by the graph database;
and storing the metadata of the data source end in the graph database according to the first metadata type, and storing the metadata of the data destination end in the graph database according to the second metadata type.
3. The method of claim 1, wherein analyzing the metadata to store the inclusion relationship of the metadata in a graph database comprises:
determining pairwise inclusion relations among the database, the data table and the data fields;
and establishing the pairwise inclusion relationship according to an object establishing mode provided by the graph database, and storing the established pairwise inclusion relationship in the graph database.
4. The method of claim 1, wherein analyzing the metadata to store the mapping relationship between the metadata in a graph database comprises:
creating an ETL task metadata type in the graph database, wherein the ETL task metadata type comprises: an input/output list and a mapping relationship of the metadata, and the attributes of the input/output list are used for storing the metadata of the data source end and the metadata of the data destination end;
and acquiring the mapping relationship between the metadata, and storing the mapping relationship in the created ETL task metadata type so as to store the mapping relationship between the metadata in the graph database.
5. The method of claim 1, wherein determining the data blood relationship of target data through the graph database in response to a data query request for the target data comprises:
and responding to the data query request, and performing traversal query in the input and output list through the traversal language of the graph database to determine the data blood relationship of the target data.
6. The method of claim 5, wherein performing a traversal query in the input and output list through the traversal language of the graph database to determine the data blood relationship of the target data comprises:
and in the input and output list, performing traversal query through the traversal language according to the input direction and/or the output direction so as to determine the data blood relationship of the target data.
7. An apparatus for determining data blood relationship, comprising:
an obtaining module, configured to obtain metadata of an extract transform load ETL task, where the metadata includes at least one of: a database, a data table, and data fields;
the processing module is used for analyzing and processing the metadata so as to store the metadata of the ETL task, the inclusion relationship of the metadata and the mapping relationship between the metadata in a graph database, wherein the inclusion relationship is used for indicating the pairwise inclusion relationship among the database, the data table and the data field, and the mapping relationship is used for indicating the pairwise mapping relationship among the database, the data table and the data field;
and the response module is used for responding to a data query request of the target data and determining the data blood relationship of the target data through the graph database.
8. The apparatus of claim 7, wherein the processing module is further configured to obtain metadata of a data source end of the ETL task and metadata of a data destination end of the ETL task; determining a first metadata type of metadata of the data source end and a second metadata type of metadata of the data destination end according to the metadata types provided by the graph database; and storing the metadata of the data source end in the graph database according to the first metadata type, and storing the metadata of the data destination end in the graph database according to the second metadata type.
9. A computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to carry out the method of any one of claims 1 to 6 when executed.
10. An electronic device comprising a memory and a processor, wherein the memory has a computer program stored therein, and the processor is configured to execute the computer program to perform the method of any of claims 1 to 6.
CN202011617620.1A 2020-12-30 2020-12-30 Method and device for determining data blood relationship, storage medium and electronic device Pending CN114691786A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011617620.1A CN114691786A (en) 2020-12-30 2020-12-30 Method and device for determining data blood relationship, storage medium and electronic device
PCT/CN2021/136131 WO2022143045A1 (en) 2020-12-30 2021-12-07 Method and apparatus for determining data blood relationship, and storage medium and electronic apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011617620.1A CN114691786A (en) 2020-12-30 2020-12-30 Method and device for determining data blood relationship, storage medium and electronic device

Publications (1)

Publication Number Publication Date
CN114691786A true CN114691786A (en) 2022-07-01

Family

ID=82134098

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011617620.1A Pending CN114691786A (en) 2020-12-30 2020-12-30 Method and device for determining data blood relationship, storage medium and electronic device

Country Status (2)

Country Link
CN (1) CN114691786A (en)
WO (1) WO2022143045A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115062049B (en) * 2022-07-28 2022-11-18 浙江城云数字科技有限公司 Data blood margin analysis method and device
CN115168363B (en) * 2022-07-29 2023-04-18 北京远舢智能科技有限公司 Metadata processing method and device, electronic equipment and storage medium
CN115757655B (en) * 2022-11-14 2023-07-07 中国兵器工业计算机应用技术研究所 Metadata management-based data blood-edge analysis system and method
CN116028248B (en) * 2023-03-30 2023-07-25 紫金诚征信有限公司 Data processing method and device suitable for WEB terminal and electronic equipment
CN116166718B (en) * 2023-04-25 2023-07-14 北京捷泰云际信息技术有限公司 Data blood margin acquisition method and device
CN116541887B (en) * 2023-07-07 2023-09-15 云启智慧科技有限公司 Data security protection method for big data platform
CN117273131B (en) * 2023-11-22 2024-02-13 四川三合力通科技发展集团有限公司 Cross-node data relationship discovery system and method
CN117555950B (en) * 2024-01-12 2024-04-02 山东再起数据科技有限公司 Data blood relationship construction method based on data center

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109739893B (en) * 2018-12-28 2022-04-22 上海尚往网络科技有限公司 Metadata management method, equipment and computer readable medium
CN109739894B (en) * 2019-01-04 2022-12-09 深圳前海微众银行股份有限公司 Method, device, equipment and storage medium for supplementing metadata description
CN112115192B (en) * 2020-10-09 2021-07-02 北京东方通软件有限公司 Efficient flow arrangement method and system for ETL system

Also Published As

Publication number Publication date
WO2022143045A1 (en) 2022-07-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination