CN114925042A - Method for constructing metadata relation based on graphic database

Info

Publication number
CN114925042A
Authority
CN
China
Prior art keywords
data
metadata
database
platform
constructing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210706119.5A
Other languages
Chinese (zh)
Inventor
李良昆
岳正飞
杨融
高攀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Positive Network Technology Co ltd
Original Assignee
Positive Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Positive Network Technology Co ltd
Priority to CN202210706119.5A
Publication of CN114925042A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/211 Schema design and management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a method for constructing metadata relationships based on a graph database, comprising the following steps: S1, analyzing the database tables of each department's business system and generating a data survey report; S2, configuring an ETL task for each business system with an ETL tool, according to the characteristics of that system and its database version; and S3, registering, governing and scheduling the configured tasks through a big data platform. The method builds a metadata management container on the JanusGraph graph database. It improves the efficiency and intuitiveness of metadata identification, modeling, metadata relationship management and data view generation in large-scale government data governance and application, and addresses the problems that arise when relationships are deep: slow metadata tracing, poor runtime efficiency, no support for clustered deployment, and low concurrency.

Description

Method for constructing metadata relation based on graphic database
Technical Field
The invention belongs to the technical field of graph databases, and particularly relates to a method for constructing metadata relationships based on a graph database.
Background
Digital government programs carry out the overall planning of national digital government construction, accelerate that construction, and gradually transform government functions from the original management-oriented mode to a service-oriented mode. This transformation requires breaking down the information barriers between existing government departments, continuously promoting the development and sharing of government data, integrating resources, and improving governance capability. To that end, government departments build big data centers at the provincial, municipal and county levels and centrally integrate the business system databases used by each department into a data warehouse that is open for external sharing. When the warehouse is shared externally, data standards are often inconsistent and data association relationships unclear, so much of the shared data becomes problematic metadata: sharing is inefficient and the shared data cannot be used directly.
Metadata lineage (the "blood relationship" of metadata) in wide use today is built mainly on traditional relational databases. Although metadata can be traced this way, the approach has limitations in practice. First, metadata whose relationship depth exceeds a certain value cannot be traced at all, and tracing is inefficient even below that value. Second, a technical interface must be developed for data tracing, which raises the technical threshold. Third, concurrency support is low, so high-concurrency services cannot be supported. Because government metadata places particular emphasis on the timeliness and accuracy of data tracing, existing lineage construction falls short in government data applications and cannot support them effectively.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide a method for constructing metadata relationships based on a graph database. The method builds a metadata management container on the JanusGraph graph database, improves the efficiency and intuitiveness of metadata identification, modeling, metadata relationship management and data view generation in large-scale government data governance and application, and addresses the problems that arise when relationships are deep: slow metadata tracing, poor runtime efficiency, no support for clustered deployment, and low concurrency.
The invention provides the following technical scheme:
A method for constructing metadata relationships based on a graph database comprises the following steps:
S1, analyzing the database tables of each department's business system and generating a data survey report;
S2, configuring an ETL task for each business system with an ETL tool, according to the characteristics of that system and its database version;
S3, registering, governing and scheduling the configured tasks through a big data platform: the configured tasks are registered with the task scheduling center of the big data platform, and the business system databases are collected.
Preferably, in step S1, the database tables are analyzed as follows: first, the database table structures of each department's existing business systems are sorted out; then the association relationships between the fields of each business system and the actual meanings of the fields are analyzed.
Preferably, the big data platform comprises a data governance platform and a task scheduling platform.
Preferably, in step S2, the ETL tool is configured to collect the data of each business system and to couple each piece of metadata with its data; the collected metadata and data are then governed through the data governance platform.
Preferably, the data governance platform governs the metadata as follows: the collected metadata is associated by microservices, and the generated association relationships are written directly into the JanusGraph library.
Preferably, the task scheduling platform performs data warehouse layering on the data governed by the data governance platform, in order to build a comprehensive library or a thematic library.
Preferably, while data is being scheduled, the task scheduling platform updates the data flow direction to the JanusGraph library through a log component.
Preferably, in step S3, the big data platform further comprises an operation and maintenance monitoring platform for monitoring task collection status in real time.
Preferably, the log component comprises a system log, an error log and an intermediate table log. The system log records operations on the data sources and the data warehouse, including the current user, the system time, the operation performed and the total number of users. The error log records error information at the point where a flow fails and helps developers debug. The intermediate table log records the creation information of the tables built during data transfer, the system running time and the running period, as well as the program flow during data conversion, showing how data is extracted from the source database and loaded into the target database.
Preferably, the ETL tool works by extracting first, then loading, and finally performing data conversion inside the system warehouse; that is, data conversion follows data loading.
Compared with the prior art, the invention has the following beneficial effects:
(1) By building the metadata management container on the JanusGraph graph database, the method improves the efficiency and intuitiveness of metadata identification, modeling, metadata relationship management and data view generation in large-scale government data governance and application, and addresses the problems that arise when relationships are deep: slow metadata tracing, poor runtime efficiency, no support for clustered deployment, and low concurrency.
(2) The ETL tool used by the method supports collection from multiple types of relational and non-relational databases, supports multi-link collection and breakpoint resume, collects both metadata and data, and requires no complex governance process during collection.
(3) Through the data governance platform, the method can establish unified data standards, check data quality, accurately describe the metadata attributes of data, analyze the association relationships between data, form a data resource catalog, enable rapid data retrieval and manage the full data life cycle.
(4) The greater the depth of the metadata tracing relationship, the more pronounced the advantage of the method. In government metadata management applications, data depth reflects the value of the data, and handling data depth is also the basis for building a metadata model.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a metadata lineage (blood relationship) diagram of the present invention.
Fig. 3 is a schematic diagram of ETL operation of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. It is to be understood that the described embodiments are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
Thus, the following detailed description of the embodiments of the present invention, as presented in the figures, is not intended to limit the scope of the invention as claimed, but is merely representative of selected embodiments of the invention.
Example 1
Referring to FIGS. 1 and 2, a method for constructing metadata relationships based on a graph database comprises the following steps:
S1, analyzing the database tables of each department's business system and generating a data survey report;
S2, configuring an ETL (Extract-Transform-Load) task for each business system with an ETL tool, according to the characteristics of that system and its database version;
S3, registering, governing and scheduling the configured tasks through a big data platform.
In step S1, the database tables are analyzed as follows: first, the database table structures of each department's existing business systems are sorted out; then the association relationships between the fields of each business system and the actual meanings of the fields are analyzed.
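The table-structure part of step S1 can be illustrated with a minimal sketch. It assumes, purely for illustration, that a business system's database can be introspected the way SQLite can; the function name `survey_database` and the sample `user` table are hypothetical and are not part of the patent. The semantic analysis of field meanings described above still has to be done by people.

```python
import sqlite3


def survey_database(conn: sqlite3.Connection) -> list[dict]:
    """Return one survey row per column: table, field, declared type, key flag."""
    report = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        for _cid, name, col_type, _notnull, _default, pk in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            report.append(
                {"table": table, "field": name, "type": col_type, "primary_key": bool(pk)}
            )
    return report


# Hypothetical business-system table, used only to exercise the sketch.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "user" (id INTEGER PRIMARY KEY, name TEXT)')
census = survey_database(conn)
for row in census:
    print(row)
```

A real survey would run an equivalent catalog query against each source database type and add the field meanings gathered from the departments.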
The big data platform comprises a data governance platform and a task scheduling platform.
In step S2, the ETL tool is configured to collect the data of each business system and to couple each piece of metadata with its data, so that for any data item it is possible to determine which field, which table and which library it belongs to. The collected metadata and data are then governed through the data governance platform.
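One way to picture the coupling of metadata with data is to attach an origin reference to every collected value. The sketch below is only an assumption about how such coupling could look; `FieldRef`, `couple` and the sample row are hypothetical names, and the patent does not specify the ETL tool's internal representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldRef:
    """Where a value comes from: business system, library, table and field."""
    system: str
    library: str
    table: str
    field: str


def couple(system: str, library: str, table: str, rows: list[dict]) -> list[dict]:
    """Pair every collected value with the metadata describing its origin."""
    return [
        {name: (value, FieldRef(system, library, table, name)) for name, value in row.items()}
        for row in rows
    ]


# Hypothetical sample row collected from the user table.
records = couple("yw_rs", "yw_ku", "user", [{"name": "Zhang San"}])
value, origin = records[0]["name"]
print(value, origin)
```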
The data governance platform governs the metadata as follows: the collected metadata is associated by microservices, and the generated association relationships are written directly into the JanusGraph library. The lineage diagram is shown in FIG. 2. For example, the name field (name) in the user table (user) in the business library (yw_ku) under the human resources and social security business system (yw_rs) forms the association relationship yw_rs-yw_ku-user-name.
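The system, library, table and field chain in this example maps naturally onto vertices and edges. The following sketch uses a small in-memory structure as a stand-in for the JanusGraph library, so it makes no claim about the JanusGraph or Gremlin API; the class `LineageGraph`, the level names and the vertex-id scheme are all assumptions made for illustration, with the identifiers written without stray spaces.

```python
from collections import defaultdict

LEVELS = ("system", "library", "table", "field")


class LineageGraph:
    """In-memory stand-in for the graph store; not the JanusGraph API."""

    def __init__(self):
        self.vertices = {}             # vertex id -> level label
        self.edges = defaultdict(set)  # parent vertex id -> child vertex ids

    def add_path(self, system: str, library: str, table: str, field: str) -> str:
        """Create one vertex per level and a containment edge between levels."""
        names = (system, library, table, field)
        parent = None
        for depth, level in enumerate(LEVELS):
            vertex_id = "-".join(names[: depth + 1])
            self.vertices[vertex_id] = level
            if parent is not None:
                self.edges[parent].add(vertex_id)
            parent = vertex_id
        return parent  # id of the field-level vertex


graph = LineageGraph()
leaf = graph.add_path("yw_rs", "yw_ku", "user", "name")
print(leaf)
```

In a deployment the same four vertices and three edges would be written to the graph database by the association microservice rather than kept in Python dictionaries.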
Metadata lineage reconstructs how the whole metadata family was built and describes the context and paths that connect its members. When some data is wrong or abnormal, tracing upward locates the source of the problem; when some data is modified, tracing downward identifies which data entities are affected. Column-level access refines the tracking granularity to individual fields.
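Upward and downward analysis are both graph traversals over the lineage edges, differing only in direction. The sketch below shows this with a breadth-first search over a plain dictionary; the edge data, including the `person` and `report-population` names, is hypothetical and serves only to make the example runnable.

```python
from collections import deque

# Hypothetical column-level lineage: each source field maps to the fields derived from it.
DOWNSTREAM = {
    "yw_rs-yw_ku-user-name": ["ADS_zh_ku-person-name"],
    "ADS_zh_ku-person-name": ["report-population-name"],
}


def invert(edges: dict) -> dict:
    """Reverse every edge so the same traversal can walk toward the sources."""
    reverse = {}
    for src, targets in edges.items():
        for dst in targets:
            reverse.setdefault(dst, []).append(src)
    return reverse


def trace(start: str, edges: dict) -> list:
    """Breadth-first traversal returning every node reachable from start."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


UPSTREAM = invert(DOWNSTREAM)
affected = trace("yw_rs-yw_ku-user-name", DOWNSTREAM)  # downward: impact analysis
sources = trace("report-population-name", UPSTREAM)    # upward: locate the problem source
print(affected, sources)
```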
The task scheduling platform performs data warehouse layering on the data governed by the data governance platform, in order to build a comprehensive library or a thematic library.
While data is being scheduled, the task scheduling platform records the data flow direction in logs and finally updates it to the JanusGraph library. For example, data flows from ODS (raw data layer) to DWD (detail data layer) to DWS (service data layer) and finally to a thematic library or ADS (data application layer). Note that only the final position of the data in the comprehensive or thematic library is recorded; the intermediate flow does not need to be recorded. Continuing the lineage example above, the name field (name) in the user table (user) in the business library (yw_ku) under the human resources and social security business system (yw_rs) flows into the comprehensive population library (ADS_zh_ku).
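Recording only the final position means that a multi-hop flow log collapses into a single lineage edge from the source field to the comprehensive or thematic library. A minimal sketch of that reduction follows; the log format, the function name `lineage_edge_from_flow` and the intermediate table names are assumptions, not something the patent defines.

```python
def lineage_edge_from_flow(flow_log: list) -> tuple:
    """Collapse an ordered list of (layer, location) hops into one edge.

    Only the first hop (the source) and the last hop (the comprehensive or
    thematic library) are kept; the intermediate warehouse layers are dropped.
    """
    if len(flow_log) < 2:
        raise ValueError("a flow needs at least a source and a destination")
    source = flow_log[0][1]
    destination = flow_log[-1][1]
    return (source, destination)


# Hypothetical flow log for the name field; the intermediate table names are invented.
flow = [
    ("SOURCE", "yw_rs-yw_ku-user-name"),
    ("ODS", "ods_user.name"),
    ("DWD", "dwd_user.name"),
    ("DWS", "dws_person.name"),
    ("ADS", "ADS_zh_ku"),
]
edge = lineage_edge_from_flow(flow)
print(edge)
```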
In step S3, the big data platform further comprises an operation and maintenance monitoring platform for monitoring task collection status in real time.
The preliminary data survey provides a comprehensive, in-depth understanding of the business systems to be collected and yields a clear survey report on their databases, fields and attributes.
The ETL tool supports collection from multiple types of relational and non-relational databases, supports multi-link collection and breakpoint resume, collects both metadata and data, and requires no complex governance process during collection.
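Breakpoint resume generally relies on persisting a progress marker so that an interrupted collection restarts where it stopped rather than from the beginning. The sketch below shows that idea with an in-memory checkpoint; the function `collect`, the `fail_at` switch that simulates a dropped link, and the batch size are hypothetical, and the actual ETL tool's mechanism is not described in the source.

```python
def collect(source: list, sink: list, checkpoint: dict, batch_size: int = 2, fail_at=None) -> None:
    """Copy rows from source to sink in batches, resuming from checkpoint['offset']."""
    offset = checkpoint.get("offset", 0)
    while offset < len(source):
        if fail_at is not None and offset >= fail_at:
            raise ConnectionError("simulated link failure")
        batch = source[offset : offset + batch_size]
        sink.extend(batch)
        offset += len(batch)
        checkpoint["offset"] = offset  # record progress after every batch


source = list(range(7))
sink, checkpoint = [], {}
try:
    collect(source, sink, checkpoint, fail_at=4)  # first attempt breaks after two batches
except ConnectionError:
    pass
collect(source, sink, checkpoint)  # second attempt resumes at the saved offset
print(sink, checkpoint)
```

In practice the checkpoint would live in durable storage and the batch write and offset update would need to be atomic to avoid duplicates.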
A data task scheduling platform and an operation and maintenance monitoring platform are set up to perform data scheduling, operation and maintenance control and the like over the whole task chain.
The data governance platform can establish unified data standards, check data quality, accurately describe data element attributes, analyze the association relationships between data, form a data resource catalog, enable rapid data retrieval and manage the full data life cycle.
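A unified standard paired with a quality check can be as simple as a per-field rule table that every record is validated against. The sketch below assumes such a rule table; the `STANDARD` dictionary, its two fields and their patterns are invented for illustration and do not come from the patent.

```python
import re

# Hypothetical unified standard: one rule per field.
STANDARD = {
    "name": {"required": True, "pattern": r".+"},
    "id_number": {"required": True, "pattern": r"\d{17}[\dX]"},
}


def check_quality(record: dict, standard: dict = STANDARD) -> list:
    """Return a list of human-readable problems; an empty list means the record passes."""
    problems = []
    for field, rule in standard.items():
        value = record.get(field)
        if value is None or value == "":
            if rule["required"]:
                problems.append(f"{field}: missing")
            continue
        if not re.fullmatch(rule["pattern"], str(value)):
            problems.append(f"{field}: does not match standard format")
    return problems


good = check_quality({"name": "Li", "id_number": "11010519491231002X"})
bad = check_quality({"name": "", "id_number": "123"})
print(good, bad)
```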
The greater the depth of the metadata tracing relationship, the more pronounced the advantage of the method. In government metadata management applications, data depth reflects the value of the data, and handling data depth is also the basis for building a metadata model. The method is based on the JanusGraph graph database and implements the design and management of metadata; source business system data can be traced back successfully, and the relationships between metadata are displayed graphically, which makes them more intuitive and clear.
Example 2
On the basis of Example 1 and referring to FIG. 3, the log component comprises a system log, an error log and an intermediate table log. The system log records operations on the data sources and the data warehouse, including the current user, the system time, the operation performed and the total number of users. The error log records error information at the point where a flow fails and helps developers debug. The intermediate table log records the creation information of the tables built during data transfer, the system running time and the running period, as well as the program flow during data conversion, showing how data is extracted from the source database and loaded into the target database.
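The three log types can be modelled as three record shapes held by one component. The sketch below is one possible reading of the description; every class and field name, the sample entries and the `data_flow` helper are assumptions rather than the patent's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class SystemLog:
    user: str            # current user
    time: datetime       # system time
    operation: str       # operation performed on a data source or the warehouse
    total_users: int     # total number of users


@dataclass
class ErrorLog:
    step: str            # point in the flow where the error occurred
    message: str         # error information for developers


@dataclass
class IntermediateTableLog:
    table: str
    created_at: datetime
    run_seconds: float
    period: str
    flow: list = field(default_factory=list)  # ordered locations the data passed through


class LogComponent:
    def __init__(self):
        self.entries = {SystemLog: [], ErrorLog: [], IntermediateTableLog: []}

    def record(self, entry) -> None:
        self.entries[type(entry)].append(entry)

    def data_flow(self) -> list:
        """Source-to-target pairs taken from the intermediate table logs."""
        logs = self.entries[IntermediateTableLog]
        return [(e.flow[0], e.flow[-1]) for e in logs if len(e.flow) >= 2]


logs = LogComponent()
now = datetime(2022, 6, 21, 8, 0)
logs.record(SystemLog("etl_admin", now, "load staging_user", total_users=3))
logs.record(ErrorLog("transform", "region_code 99 not in reference table"))
logs.record(IntermediateTableLog("staging_user", now, 1.5, "daily",
                                 ["yw_ku.user", "staging_user", "ADS_zh_ku"]))
print(logs.data_flow())
```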
The ETL tool works by extracting first, then loading, and finally performing the data conversion inside the system warehouse; that is, data conversion follows data loading. Conventional ETL tools have several disadvantages.
1. Performance. The transformation step is clearly the most computationally intensive of the three ETL steps, and in the conventional approach it is executed entirely by the ETL engine on a dedicated server. The ETL tool converts or quality-checks the data item by item, which easily makes the transformation flow the bottleneck of the whole ETL process. In addition, moving data between the source, the target and the tool increases network traffic and causes additional operational problems. Consider an ETL procedure in which data is extracted from a source table on the database side, completed with data selected from a reference table in the data warehouse, and then loaded into a target table of the data warehouse. Conventional ETL tools typically accomplish this in one of two ways:
a. Loading the warehouse-side reference table into memory. The entire reference table is retrieved from the data warehouse and loaded into the central engine's memory. The extracted source-table data is then recombined (transformed) in the engine's memory and finally loaded into the target table of the data warehouse. If the reference table is large, the run requires a lot of memory and a long time to load and re-index the data in the engine.
b. Looking up the reference table row by row. For each extracted row, the ETL tool queries the reference table in the data warehouse, and the query returns the single row that matches the source data. If the source table has 10 million rows, the ETL engine sends 10 million queries. This significantly slows down the ETL process and significantly increases the load on the data warehouse, and is almost impossible to complete for large-scale business data integration.
Both approaches are inefficient, which shows that the conventional ETL method has performance drawbacks.
2. Cost. Generally, the cost of implementing the ETL process is expected to be offset by savings in labor. In the ROI (return on investment) analysis of the ETL process, additional and hidden expenses must also be taken into account. The most obvious cost is purchasing a dedicated server and ETL engine software. Because the ETL engine is a middle-tier component that performs a large amount of computation, a powerful server, or even a server cluster, is needed to meet the high-intensity computing requirements. As the data warehouse grows, the ETL server also incurs ongoing hardware and software maintenance and upgrade costs. Conventional ETL tools also carry many hidden costs, including consulting fees for deployment and debugging and the cost of rewriting code as integration requirements change.
The data transfer architecture designed in the present invention differs from the conventional process in that data conversion is performed after the data is loaded, which remedies several of the disadvantages of conventional ETL tools. As shown in FIG. 3, the data conversion is performed in the system warehouse after the data is extracted and loaded, and conversion functions or triggers may be written in PL/SQL and run during or after loading. Such data transfer is not simply a complete replication of the data source that creates a mirror image of it on the data warehouse side.
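The load-then-transform order can be sketched with an in-memory SQLite database standing in for the system warehouse. The source mentions PL/SQL functions or triggers; the plain SQL below is a substitute chosen so the example runs anywhere, and all table names, column names and sample rows are hypothetical. The point of the sketch is that the reference-table lookup becomes one set-based join inside the warehouse instead of one query per source row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE staging_user (id INTEGER, name TEXT, region_code TEXT);
    CREATE TABLE ref_region (code TEXT PRIMARY KEY, region_name TEXT);
    CREATE TABLE target_user (id INTEGER, name TEXT, region_name TEXT);
    """
)
conn.executemany("INSERT INTO ref_region VALUES (?, ?)", [("01", "North"), ("02", "South")])

# Extract and load: source rows are copied unchanged into a staging table.
source_rows = [(1, " alice ", "01"), (2, "bob", "02"), (3, "carol", "99")]
conn.executemany("INSERT INTO staging_user VALUES (?, ?, ?)", source_rows)

# Transform after loading: a single statement joins against the reference table
# inside the warehouse, so no per-row lookups cross the network.
conn.execute(
    """
    INSERT INTO target_user (id, name, region_name)
    SELECT s.id, TRIM(s.name), COALESCE(r.region_name, 'UNKNOWN')
    FROM staging_user AS s
    LEFT JOIN ref_region AS r ON r.code = s.region_code
    """
)
result = conn.execute("SELECT id, name, region_name FROM target_user ORDER BY id").fetchall()
print(result)
```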
For large cross-platform databases, cross-platform access can be performed over TCP/IP: because a large database (such as DB2) exposes the IP address and port of its data service, the service can be reached through that IP address and port.
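Before configuring a collection task against a remote database, it can be useful to confirm that its service address is reachable at the TCP level. The helper below is a hypothetical pre-check using only the standard library; it does not speak any database protocol, and the host and port in the comment are placeholders, since the real values depend on how the database service is configured.

```python
import socket


def service_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call with placeholder values (documentation address, assumed port):
# service_reachable("192.0.2.10", 50000)
```

A successful check only shows that the port accepts connections; the ETL tool still needs the appropriate database driver and credentials to read data.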
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and it will be apparent to those skilled in the art that various modifications and variations can be made in the present invention; any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (7)

1. A method for constructing metadata relationships based on a graph database, characterized by comprising the following steps:
S1, analyzing the database tables of each department's business system and generating a data survey report;
S2, configuring an ETL task for each business system with an ETL tool, according to the characteristics of that system and its database version;
S3, registering, governing and scheduling the configured tasks through a big data platform.
2. The method for constructing metadata relationships based on a graph database according to claim 1, wherein in step S1 the database tables are analyzed as follows: first, the database table structures of each department's existing business systems are sorted out; then the association relationships between the fields of each business system and the actual meanings of the fields are analyzed.
3. The method for constructing metadata relationships based on a graph database according to claim 1, wherein the big data platform comprises a data governance platform and a task scheduling platform.
4. The method for constructing metadata relationships based on a graph database according to claim 3, wherein in step S2 the ETL tool is configured to collect the data of each business system, couple each piece of metadata with the data, and govern the collected metadata and data through the data governance platform.
5. The method for constructing metadata relationships based on a graph database according to claim 4, wherein the data governance platform governs the metadata as follows: the collected metadata is associated by microservices, and the generated association relationships are written directly into the JanusGraph library.
6. The method for constructing metadata relationships based on a graph database according to claim 5, wherein the task scheduling platform performs data warehouse layering on the data governed by the data governance platform, in order to build a comprehensive library or a thematic library.
7. The method for constructing metadata relationships based on a graph database according to claim 6, wherein, while data is being scheduled, the task scheduling platform updates the data flow direction to the JanusGraph library through a log component.
CN202210706119.5A 2022-06-21 2022-06-21 Method for constructing metadata relation based on graphic database Pending CN114925042A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210706119.5A CN114925042A (en) 2022-06-21 2022-06-21 Method for constructing metadata relation based on graphic database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210706119.5A CN114925042A (en) 2022-06-21 2022-06-21 Method for constructing metadata relation based on graphic database

Publications (1)

Publication Number Publication Date
CN114925042A (en) 2022-08-19

Family

ID=82814669

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210706119.5A Pending CN114925042A (en) 2022-06-21 2022-06-21 Method for constructing metadata relation based on graphic database

Country Status (1)

Country Link
CN (1) CN114925042A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116541887B (en) * 2023-07-07 2023-09-15 云启智慧科技有限公司 Data security protection method for big data platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination