CN114925042A

CN114925042A - Method for constructing metadata relation based on graphic database

Info

Publication number: CN114925042A
Application number: CN202210706119.5A
Authority: CN
Inventors: 李良昆; 岳正飞; 杨融; 高攀
Original assignee: Positive Network Technology Co ltd
Current assignee: Positive Network Technology Co ltd
Priority date: 2022-06-21
Filing date: 2022-06-21
Publication date: 2022-08-19

Abstract

The invention discloses a method for constructing a metadata relationship based on a graph database, comprising the following steps: S1, analyzing the database tables of each department's business system and generating a data survey report; S2, configuring ETL tasks for each business system with an ETL tool according to the characteristics and database version of that system; and S3, performing registration, governance and scheduling on the configured tasks through a big data platform. The method builds a metadata management container on the JanusGraph graph database. This improves the efficiency and intuitiveness of metadata identification, modeling, metadata relationship management and data view generation in large-scale government data management and applications. It also solves the problems of slow metadata tracing, poor operating efficiency, lack of clustered deployment and low concurrency that arise when the relationship depth is large.

Description

Method for constructing metadata relation based on graphic database

Technical Field

The invention belongs to the technical field of graph databases, and in particular relates to a method for constructing a metadata relationship based on a graph database.

Background

Digital government construction follows an overall national plan and is being accelerated so that government functions gradually shift from the original management mode to a more advanced service mode. This transformation requires breaking down the information barriers between existing government departments, continuously promoting the development and sharing of government data, integrating resources and improving governance capacity. To this end, government departments build big data centers at the provincial, city, district and county levels. These centers centrally integrate the business system databases used by each department into a data warehouse that is shared externally. When the warehouse is shared, however, data standards are not uniform and the associations between data are unclear. As a result, much of the shared data becomes problematic metadata, sharing is inefficient, and the shared data cannot be used directly.

Metadata lineage is currently built mainly on traditional relational databases. Although this allows metadata to be traced, it has limitations and defects in practice. First, metadata whose relationship depth exceeds a certain threshold cannot be traced at all, and tracing is inefficient even below that threshold. Second, a technical interface has to be developed for data tracing, which raises the technical threshold. Third, supported concurrency is low, so high-concurrency services cannot be served. Government metadata places particular emphasis on the timeliness and accuracy of data tracing, so existing lineage construction approaches fall short in government data applications and cannot support them effectively.

Disclosure of Invention

In view of these shortcomings of the prior art, the invention provides a method for constructing a metadata relationship based on a graph database. The method builds a metadata management container on the JanusGraph graph database, improves the efficiency and intuitiveness of metadata identification, modeling, metadata relationship management and data view generation in large-scale government data management and applications, and solves the problems of slow metadata tracing, poor operating efficiency, lack of clustered deployment and low concurrency when the relationship depth is large.

The invention provides the following technical scheme:

A method for constructing a metadata relationship based on a graph database comprises the following steps:

S1, analyzing the database tables of each department's business system, and generating a data survey report;

S2, configuring ETL tasks for each business system with an ETL tool according to the characteristics and database version of that system; and

S3, performing registration, governance and scheduling on the configured tasks through a big data platform. The configured tasks are registered with the task scheduling center of the big data platform, which then collects data from the business system databases.

Preferably, in step S1, the database tables are analyzed as follows: first, the table structures of each department's existing business systems are compiled; then the associations among the fields of each business system and the actual meaning of each field are analyzed.

Preferably, the big data platform comprises a data governance platform and a task scheduling platform.

Preferably, in step S2, the ETL tool collects the data of each business system and couples each metadata item with its data, and the collected metadata and data are managed through the data governance platform.

Preferably, the data governance platform manages the metadata as follows: the collected metadata are associated through a microservice, and the resulting association relationships are written directly into the JanusGraph database.

Preferably, the task scheduling platform performs data warehouse layering on the data processed by the data governance platform to build a comprehensive database or a thematic database.

Preferably, while scheduling data, the task scheduling platform records the data flow direction through a log component and updates it in the JanusGraph database.

Preferably, in step S3, the big data platform further includes an operation and maintenance monitoring platform that monitors the status of the collection tasks in real time.

Preferably, the log component includes a system log, an error log and an intermediate table log. The system log records operations on the data source and the data warehouse, including the current user, the system time, the operation performed and the total number of users. The error log records error information when an error occurs at some point in the flow, which helps developers debug. The intermediate table log records the creation information of the tables built while the system transfers data, the system running time and running period, and how the program flows during data conversion; it shows how data is extracted from the source database and loaded into the target database.

Preferably, the ETL tool first extracts and loads the data and finally transforms it inside the system's data warehouse; that is, data transformation follows data loading.

Compared with the prior art, the invention has the following beneficial effects:

(1) By building a metadata management container on the JanusGraph graph database, the disclosed method improves the efficiency and intuitiveness of metadata identification, modeling, metadata relationship management and data view generation in large-scale government data management and applications, and solves the problems of slow metadata tracing, poor operating efficiency, lack of clustered deployment and low concurrency when the relationship depth is large.

(2) The ETL tool used by the method supports collection from multiple types of relational and non-relational databases, supports multi-link collection and resumable transfer from breakpoints, collects both metadata and data, and requires no complex processing during collection.

(3) Through the data governance platform, the method can establish unified data standards, check data quality, accurately describe the metadata attributes of the data, analyze the associations among data, form a data resource catalog, enable rapid data retrieval and manage the full life cycle of the data.

(4) The greater the depth of the metadata tracing relationship, the more pronounced the advantages of the method. In government metadata management applications, the depth of data reflects its value, and handling data depth is also the basis for building a metadata model.

Drawings

To illustrate the technical solutions of the embodiments more clearly, the drawings used in the embodiments are briefly described below. The following drawings show only some embodiments of the present invention and should not be regarded as limiting its scope; those skilled in the art can obtain other related drawings from them without inventive effort.

FIG. 1 is a flow chart of the method for constructing a metadata relationship based on a graph database according to the present invention.

FIG. 2 is a metadata lineage diagram according to the present invention.

Fig. 3 is a schematic diagram of ETL operation of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. It is to be understood that the described embodiments are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.

Thus, the following detailed description of the embodiments of the present invention, as presented in the figures, is not intended to limit the scope of the invention as claimed, but is merely representative of selected embodiments of the invention.

Example 1

Referring to fig. 1-2, a method for constructing a metadata relationship based on a graph database includes the following steps:

S1, analyzing the database tables of each department's business system, and generating a data survey report;

S2, configuring ETL (Extract-Transform-Load) tasks for each business system with an ETL tool according to the characteristics and database version of that system;

and S3, performing registration, governance and scheduling on the configured tasks through a big data platform.

In step S1, the database tables are analyzed as follows: first, the table structures of each department's existing business systems are compiled; then the associations among the fields of each business system and the actual meaning of each field are analyzed.
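A minimal sketch of the table-analysis step, assuming the business database can be opened with Python's built-in `sqlite3` module (a real survey would use each system's own driver and catalog views). It lists every table with its columns and declared types, and reports declared foreign keys as candidate field associations. Associations that are not declared in the schema, and the real meaning of each field, still have to be established by analysts as the text describes. The function name `survey_report` and the sample tables are hypothetical.

```python
import sqlite3


def survey_report(conn: sqlite3.Connection):
    """Collect table structures and declared field associations for a survey report.

    Returns (columns, associations):
      columns      -> {table: [(column_name, declared_type), ...]}
      associations -> [(table, column, referenced_table, referenced_column), ...]
    """
    columns, associations = {}, []
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Each table_info row is (cid, name, type, notnull, dflt_value, pk).
        info = conn.execute(f"PRAGMA table_info('{table}')").fetchall()
        columns[table] = [(row[1], row[2]) for row in info]
        # Each foreign_key_list row is (id, seq, table, from, to, on_update, on_delete, match).
        for fk in conn.execute(f"PRAGMA foreign_key_list('{table}')").fetchall():
            associations.append((table, fk[3], fk[2], fk[4]))
    return columns, associations


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dept (id INTEGER PRIMARY KEY, dname TEXT)")
    conn.execute(
        "CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, "
        "dept_id INTEGER REFERENCES dept(id))"
    )
    cols, assoc = survey_report(conn)
    print(cols["user"])  # [('id', 'INTEGER'), ('name', 'TEXT'), ('dept_id', 'INTEGER')]
    print(assoc)         # [('user', 'dept_id', 'dept', 'id')]
```

The output of such a function would only be the raw material for the survey report; the analysis of field meaning remains a manual step.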

The big data platform comprises a data governance platform and a task scheduling platform.

In step S2, the ETL tool collects the data of each business system and couples each metadata item with its data, so that it is known exactly which field, table and database any piece of data belongs to. The collected metadata and data are then managed through the data governance platform.
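Coupling each value with its metadata can be pictured as tagging every collected value with the system, database, table and field it came from. The sketch below assumes a simple record layout; the patent does not specify how the ETL tool represents this internally, and the names `FieldRef`, `CollectedValue`, `couple_row` and the sample row are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class FieldRef:
    """Where a value lives: business system -> database (library) -> table -> field."""
    system: str
    library: str
    table: str
    field: str

    def path(self) -> str:
        # Hyphen-joined form matching the text's example, e.g. "yw_rs-yw_ku-user-name".
        return "-".join((self.system, self.library, self.table, self.field))


@dataclass(frozen=True)
class CollectedValue:
    ref: FieldRef  # metadata locating the value
    value: Any     # the collected data itself


def couple_row(system: str, library: str, table: str,
               row: Dict[str, Any]) -> List[CollectedValue]:
    """Attach the locating metadata to every value of one collected source row."""
    return [CollectedValue(FieldRef(system, library, table, col), val)
            for col, val in row.items()]


if __name__ == "__main__":
    for v in couple_row("yw_rs", "yw_ku", "user", {"id": 1, "name": "Zhang San"}):
        print(v.ref.path(), "=", v.value)
    # yw_rs-yw_ku-user-id = 1
    # yw_rs-yw_ku-user-name = Zhang San
```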

The data governance platform manages the metadata as follows: the collected metadata are associated through a microservice, and the resulting association relationships are written directly into the JanusGraph database. The lineage diagram is shown in FIG. 2. For example, the name field (name) in the user table (user) of the business database (yw_ku) under the human resources and social security business system (yw_rs) forms the association relationship yw_rs-yw_ku-user-name.
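A minimal sketch of turning an association path such as `yw_rs-yw_ku-user-name` into chained graph edges. An in-memory adjacency map stands in for JanusGraph here, so no JanusGraph or Gremlin API is shown; `LineageGraph` and `add_path` are hypothetical names. Each vertex id is qualified by its full prefix (for example `yw_rs.yw_ku.user`) so that identically named tables in different systems do not collide; this is a design assumption, not something the patent states.

```python
from collections import defaultdict


class LineageGraph:
    """In-memory stand-in for the metadata graph (not the JanusGraph API)."""

    def __init__(self) -> None:
        self.children = defaultdict(set)  # vertex -> vertices it contains
        self.parents = defaultdict(set)   # vertex -> vertices containing it

    def add_edge(self, parent: str, child: str) -> None:
        self.children[parent].add(child)
        self.parents[child].add(parent)

    def add_path(self, path: str, sep: str = "-") -> str:
        """Write 'system-library-table-field' as chained edges; return the field vertex id."""
        parts = path.split(sep)
        # Qualify each level by its prefix: yw_rs, yw_rs.yw_ku, yw_rs.yw_ku.user, ...
        ids = [".".join(parts[: i + 1]) for i in range(len(parts))]
        for parent, child in zip(ids, ids[1:]):
            self.add_edge(parent, child)
        return ids[-1]


if __name__ == "__main__":
    g = LineageGraph()
    print(g.add_path("yw_rs-yw_ku-user-name"))     # yw_rs.yw_ku.user.name
    g.add_path("yw_rs-yw_ku-user-id")
    print(sorted(g.children["yw_rs.yw_ku.user"]))  # ['yw_rs.yw_ku.user.id', 'yw_rs.yw_ku.user.name']
```

Writing the same path twice is harmless because edges are stored in sets; a graph database would need an equivalent "create if absent" check.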

Metadata lineage reconstructs how the whole family of metadata was built and describes the paths by which its members are connected. When some data is wrong or abnormal, upstream analysis can pinpoint the source of the problem; when some data is modified, downstream analysis shows which data entities are affected. Because lineage is recorded at the column level, tracking granularity is refined down to individual fields.
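Upstream (root-cause) and downstream (impact) analysis are both reachability walks over the lineage edges, in opposite directions. In JanusGraph this would typically be expressed as a Gremlin `repeat()` traversal over incoming or outgoing edges; the sketch below instead uses a plain breadth-first search over an edge list so that it runs without a graph server. Apart from `yw_rs.yw_ku.user.name` and `ADS_zh_ku`, which follow the text's example, the vertices (`sys_b.db_b.person.name`, `population_report`) are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple


def traverse(edges: List[Tuple[str, str]], start: str, direction: str) -> List[str]:
    """Breadth-first walk over lineage edges (src, dst).

    direction="up":   everything `start` was derived from (locate the source of a problem).
    direction="down": everything derived from `start` (find data affected by a change).
    """
    if direction not in ("up", "down"):
        raise ValueError("direction must be 'up' or 'down'")
    adj: Dict[str, List[str]] = {}
    for src, dst in edges:
        a, b = (dst, src) if direction == "up" else (src, dst)
        adj.setdefault(a, []).append(b)
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, []):
            if nxt not in seen:  # avoid revisiting shared ancestors or cycles
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


if __name__ == "__main__":
    edges = [
        ("yw_rs.yw_ku.user.name", "ADS_zh_ku"),
        ("sys_b.db_b.person.name", "ADS_zh_ku"),
        ("ADS_zh_ku", "population_report"),
    ]
    print(traverse(edges, "population_report", "up"))
    # ['ADS_zh_ku', 'yw_rs.yw_ku.user.name', 'sys_b.db_b.person.name']
    print(traverse(edges, "yw_rs.yw_ku.user.name", "down"))
    # ['ADS_zh_ku', 'population_report']
```

The walk's cost grows with the number of reachable vertices rather than with repeated self-joins, which is the property the text relies on when it argues that deep relationships favor a graph store.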

The task scheduling platform performs data warehouse layering on the data processed by the data governance platform to build a comprehensive database or a thematic database.

While scheduling data, the task scheduling platform records the data flow direction in logs and finally updates it in the JanusGraph database. For example, data flows from the ODS (original data layer) to the DWD (detailed data layer) and the DWS (service data layer), and finally to a subject database or the ADS (data application layer). Only the position the data finally reaches, that is, the comprehensive or subject database, is recorded; the intermediate flow does not need to be recorded. In the human resources and social security example above, the lineage records that the name field (name) in the user table (user) of the business database (yw_ku) under the business system (yw_rs) flows into the comprehensive population database (ADS_zh_ku).
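Recording only where data finally lands can be sketched as collapsing the hop-by-hop flow of one scheduled run into a single source-to-destination edge, while checking that the hops move forward through the ODS, DWD, DWS and ADS layers. The sketch assumes that layer membership can be read from a table-name prefix and that a run is a single linear chain; the intermediate table names (`ods_rs_user`, `dwd_person`, `dws_person`) and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

# Layer order described in the text: ODS -> DWD -> DWS -> ADS.
LAYER_ORDER = {"ODS": 0, "DWD": 1, "DWS": 2, "ADS": 3}


def layer_of(node: str) -> Optional[int]:
    """Return the layer rank from a name prefix such as 'ods_' or 'ADS_', else None."""
    return LAYER_ORDER.get(node.split("_", 1)[0].upper())


def collapse_flow(hops: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Reduce the (src, dst) hops of one linear run to (origin, final destination)."""
    nxt = dict(hops)
    origins = set(nxt) - set(nxt.values())
    if len(origins) != 1:
        raise ValueError("expected one linear flow with a single origin")
    origin = node = origins.pop()
    last_rank, seen = -1, {node}
    while node in nxt:
        node = nxt[node]
        if node in seen:
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        rank = layer_of(node)
        if rank is not None:
            if rank < last_rank:
                raise ValueError(f"layer order violated at {node}")
            last_rank = rank
    # Only this pair would be written to the graph; intermediate hops are dropped.
    return origin, node


if __name__ == "__main__":
    hops = [("yw_rs.yw_ku.user.name", "ods_rs_user"), ("ods_rs_user", "dwd_person"),
            ("dwd_person", "dws_person"), ("dws_person", "ADS_zh_ku")]
    print(collapse_flow(hops))  # ('yw_rs.yw_ku.user.name', 'ADS_zh_ku')
```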

In step S3, the big data platform further includes an operation and maintenance monitoring platform that monitors the status of the collection tasks in real time.

The preliminary data survey gives a thorough understanding of the business systems to be collected and produces a clear survey report on their databases, fields and attributes.

The ETL tool supports collection from multiple types of relational and non-relational databases, supports multi-link collection and resumable transfer from breakpoints, collects both metadata and data, and requires no complex processing during collection.
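Resumable transfer from breakpoints is commonly implemented with a persisted checkpoint. The sketch below is an assumption rather than the ETL tool's documented mechanism: it copies rows in key order in batches and commits each batch together with its checkpoint in the target database, so an interrupted run resumes after the last committed batch. It uses `sqlite3` for both ends; the `etl_checkpoint` table and `collect_incremental` are hypothetical names, and the source table is assumed to have a monotonically increasing positive integer key.

```python
import sqlite3


def collect_incremental(src: sqlite3.Connection, dst: sqlite3.Connection,
                        table: str, key: str = "id", batch_size: int = 100) -> int:
    """Copy rows of `table` whose key is above the saved checkpoint; return rows copied."""
    dst.execute("CREATE TABLE IF NOT EXISTS etl_checkpoint "
                "(table_name TEXT PRIMARY KEY, last_key INTEGER)")
    row = dst.execute("SELECT last_key FROM etl_checkpoint WHERE table_name = ?",
                      (table,)).fetchone()
    last_key = row[0] if row else 0  # assumes keys are positive integers
    cols = [c[1] for c in src.execute(f"PRAGMA table_info('{table}')")]
    dst.execute(f"CREATE TABLE IF NOT EXISTS {table} ({', '.join(cols)})")
    placeholders = ", ".join("?" * len(cols))  # "?, ?, ..." one per column
    copied = 0
    while True:
        batch = src.execute(
            f"SELECT {', '.join(cols)} FROM {table} WHERE {key} > ? ORDER BY {key} LIMIT ?",
            (last_key, batch_size)).fetchall()
        if not batch:
            return copied
        last_key = batch[-1][cols.index(key)]
        with dst:  # the batch and its checkpoint commit (or roll back) together
            dst.executemany(f"INSERT INTO {table} VALUES ({placeholders})", batch)
            dst.execute("INSERT OR REPLACE INTO etl_checkpoint VALUES (?, ?)",
                        (table, last_key))
        copied += len(batch)


if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT)")
    src.executemany("INSERT INTO user VALUES (?, ?)", [(i, f"n{i}") for i in range(1, 6)])
    dst = sqlite3.connect(":memory:")
    print(collect_incremental(src, dst, "user", batch_size=2))  # 5
    src.execute("INSERT INTO user VALUES (6, 'n6')")
    print(collect_incremental(src, dst, "user", batch_size=2))  # 1 (resumes after key 5)
```

Keeping the checkpoint in the same transaction as the data is the design choice that prevents a crash from either skipping or duplicating a batch.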

A data task scheduling platform and an operation and maintenance monitoring platform are set up to perform data scheduling and operation and maintenance control over the whole task chain.

The data governance platform can establish unified data standards, check data quality, accurately describe metadata attributes, analyze the associations among data, form a data resource catalog, enable rapid data retrieval and manage the full life cycle of the data.

The greater the depth of the metadata tracing relationship, the more pronounced the advantages of the method. In government metadata management applications, the depth of data reflects its value, and handling data depth is also the basis for building a metadata model. Built on the JanusGraph graph database, the method implements metadata design and management, traces data back to the source business systems, and displays the relationships between metadata graphically, which makes them more intuitive and clear.

Example 2

Building on Example 1, and referring to FIG. 3, the log component includes a system log, an error log and an intermediate table log. The system log records operations on the data source and the data warehouse, including the current user, the system time, the operation performed and the total number of users. The error log records error information when an error occurs at some point in the flow, which helps developers debug. The intermediate table log records the creation information of the tables built while the system transfers data, the system running time and running period, and how the program flows during data conversion; it shows how data is extracted from the source database and loaded into the target database.
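The three log types can be modelled as simple record structures. The sketch below is one possible layout based on the fields listed in the text. The class names, the `source` and `target` fields on intermediate-table entries, and the `lineage_edges` helper, which yields the pairs that would be pushed to the graph database, are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Tuple


@dataclass
class SystemLogEntry:
    user: str          # current user
    time: datetime     # system time
    operation: str     # operation performed on the data source or warehouse
    total_users: int   # total number of users


@dataclass
class ErrorLogEntry:
    time: datetime
    flow_step: str     # point in the flow where the error occurred
    message: str       # error information for developers to debug


@dataclass
class IntermediateTableLogEntry:
    table: str           # table built during data transfer
    created_at: datetime
    run_seconds: float   # system running time
    period: str          # running period, e.g. "daily"
    source: str          # where the data was extracted from
    target: str          # where the data was loaded to


@dataclass
class LogComponent:
    system: List[SystemLogEntry] = field(default_factory=list)
    errors: List[ErrorLogEntry] = field(default_factory=list)
    intermediate: List[IntermediateTableLogEntry] = field(default_factory=list)

    def lineage_edges(self) -> List[Tuple[str, str]]:
        """Distinct (source, target) pairs, in first-seen order, for updating the graph."""
        return list(dict.fromkeys((e.source, e.target) for e in self.intermediate))


if __name__ == "__main__":
    logs = LogComponent()
    t = datetime(2024, 1, 1, 8, 0)
    for secs in (12.5, 11.0):  # two runs of the same daily task
        logs.intermediate.append(IntermediateTableLogEntry(
            "tmp_user_name", t, secs, "daily", "yw_rs.yw_ku.user", "ADS_zh_ku"))
    print(logs.lineage_edges())  # [('yw_rs.yw_ku.user', 'ADS_zh_ku')]
```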

The ETL tool first extracts and loads the data and finally transforms it inside the system's data warehouse; that is, data transformation follows data loading. Conventional ETL tools have several disadvantages.

1. Performance problem. Transformation is clearly the most computationally intensive of the three ETL steps, and in the conventional ETL method it is executed entirely by the ETL tool on a dedicated server. The tool transforms or quality-checks the data item by item, which easily makes transformation the bottleneck of the whole ETL process. In addition, forwarding data between sources, destinations and the tool increases network traffic and causes further operational problems. Consider an ETL process in which data is extracted from a source table on the database side, some data must be looked up in a reference table of the data warehouse to complete it, and the result is loaded into a target table of the warehouse. Conventional ETL tools usually handle this in one of two ways:

a. Load the warehouse-side reference table into memory. The entire reference table is retrieved from the data warehouse and loaded into the memory of the central engine. The extracted source data is then recombined (transformed) in the engine memory and finally loaded into the destination table of the warehouse. If the reference table is large, the run needs a lot of memory and a long time to load and re-index the data in the engine.

b. Look up the reference table row by row. For each extracted row, the ETL tool queries the reference table of the data warehouse, and the query returns the single row that matches the source data. If the source table has 10 million rows, the ETL engine sends 10 million queries. This significantly slows down the ETL process, significantly increases the load on the data warehouse, and is almost impossible to sustain for large-scale business data integration.

Both approaches are clearly inefficient, so the conventional ETL method has inherent performance drawbacks.

2. Cost problem. The cost of implementing an ETL process is generally expected to be offset by labor savings, but a ROI (Return on Investment) analysis of the ETL process must also account for additional and hidden expenses. The most obvious cost is buying a dedicated server and ETL engine software. Because the ETL engine is a middle-tier component that performs a large amount of computation, a powerful server or even a server cluster is needed to meet the heavy computing requirements. As the data warehouse grows, the ETL server also incurs ongoing hardware and software maintenance and upgrade costs. Conventional ETL tools also carry many hidden costs, including consulting fees for deployment and debugging and the cost of rewriting code when integration requirements change.

The data transfer architecture designed in the invention differs from the conventional process in that data transformation is performed after the data is loaded, which remedies several disadvantages of conventional ETL tools. As shown in FIG. 3, the data is extracted and loaded and then transformed in the system warehouse; transformation functions or triggers can be written in PL/SQL and run during or after loading. This kind of data transfer is therefore not simply a complete copy of the data source into the data warehouse that merely creates a mirror image of the source there.
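The contrast between approach b and load-then-transform can be reproduced on a small scale. The sketch below uses SQLite through Python's `sqlite3` in place of the PL/SQL environment the text mentions, and all table and column names (`stg_user`, `ref_region`, `dwd_user`, `region_code`) and sample rows are hypothetical. `row_by_row` issues one lookup per source row, as in approach b, while `load_then_transform` loads the raw rows into a staging table and performs the lookup as a single set-based join inside the warehouse.

```python
import sqlite3
from typing import List, Optional, Tuple

Row = Tuple[int, str, str]  # (id, name, region_code) as extracted from the source


def make_warehouse() -> sqlite3.Connection:
    """Warehouse with a reference table and an empty detail-layer target table."""
    wh = sqlite3.connect(":memory:")
    wh.execute("CREATE TABLE ref_region (region_code TEXT PRIMARY KEY, region_name TEXT)")
    wh.executemany("INSERT INTO ref_region VALUES (?, ?)",
                   [("R01", "District A"), ("R02", "District B")])
    wh.execute("CREATE TABLE dwd_user (id INTEGER, name TEXT, region_name TEXT)")
    return wh


def row_by_row(wh: sqlite3.Connection,
               rows: List[Row]) -> Tuple[List[Tuple[int, str, Optional[str]]], int]:
    """Conventional approach b: one reference-table query per extracted row."""
    out, queries = [], 0
    for id_, name, code in rows:
        hit = wh.execute("SELECT region_name FROM ref_region WHERE region_code = ?",
                         (code,)).fetchone()
        queries += 1
        out.append((id_, name, hit[0] if hit else None))
    return out, queries


def load_then_transform(wh: sqlite3.Connection, rows: List[Row]) -> None:
    """Load raw rows first, then transform with one join inside the warehouse."""
    wh.execute("CREATE TABLE IF NOT EXISTS stg_user (id INTEGER, name TEXT, region_code TEXT)")
    wh.executemany("INSERT INTO stg_user VALUES (?, ?, ?)", rows)  # load, unchanged
    wh.execute("""
        INSERT INTO dwd_user (id, name, region_name)
        SELECT s.id, s.name, r.region_name
        FROM stg_user AS s LEFT JOIN ref_region AS r ON r.region_code = s.region_code
    """)  # transform: one set-based statement, no per-row round trips
    wh.commit()


if __name__ == "__main__":
    rows = [(1, "a", "R01"), (2, "b", "R02"), (3, "c", "R99")]
    wh = make_warehouse()
    expected, n_queries = row_by_row(wh, rows)
    load_then_transform(wh, rows)
    got = wh.execute("SELECT id, name, region_name FROM dwd_user ORDER BY id").fetchall()
    print(got == expected, n_queries)  # True 3
```

Both paths produce the same rows, but with 10 million source rows `row_by_row` would issue 10 million lookup queries, whereas the load-then-transform path still needs one bulk load and one join statement that the warehouse engine can optimize as a whole.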
Large databases on other platforms can be accessed across platforms over TCP/IP: a large database (such as DB2) exposes the IP address and port of its data service, so the service can be reached through that IP address and port.
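Before configuring a cross-platform collection task, the database service's IP address and port can be checked for TCP reachability. This is a minimal connectivity probe built on Python's standard `socket` module; it only confirms that a TCP connection can be opened and does not authenticate against DB2 or any other database. The function name `service_reachable` is hypothetical.

```python
import socket


def service_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established within `timeout` seconds."""
    try:
        # create_connection resolves the address and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, unresolvable or timed out
        return False
```

For example, `service_reachable("192.0.2.10", 50000)` would probe a hypothetical database host at a documentation-range address; a `False` result points to a network or firewall problem to resolve before any driver-level configuration.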

The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and it will be apparent to those skilled in the art that various modifications and variations can be made in the present invention; any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

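1. A method for constructing a metadata relationship based on a graph database, characterized by comprising the following steps:

S1, analyzing the database tables of each department's business system, and generating a data survey report;

S2, configuring ETL tasks for each business system with an ETL tool according to the characteristics and database version of that system;

S3, performing registration, governance and scheduling on the configured tasks through a big data platform.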

2. The method for constructing a metadata relationship based on a graph database according to claim 1, wherein in step S1 the database tables are analyzed as follows: first, the table structures of each department's existing business systems are compiled; then the associations among the fields of each business system and the actual meaning of each field are analyzed.

3. The method for constructing a metadata relationship based on a graph database according to claim 1, wherein the big data platform comprises a data governance platform and a task scheduling platform.

4. The method for constructing a metadata relationship based on a graph database according to claim 3, wherein in step S2 the ETL tool collects the data of each business system and couples each metadata item with its data, and the collected metadata and data are managed through the data governance platform.

5. The method for constructing a metadata relationship based on a graph database according to claim 4, wherein the data governance platform manages the metadata as follows: the collected metadata are associated through a microservice, and the resulting association relationships are written directly into a JanusGraph database.

6. The method for constructing a metadata relationship based on a graph database according to claim 5, wherein the task scheduling platform performs data warehouse layering on the data processed by the data governance platform to build a comprehensive database or a thematic database.

7. The method for constructing a metadata relationship based on a graph database according to claim 6, wherein, while scheduling data, the task scheduling platform records the data flow direction through a log component and updates it in the JanusGraph database.