CN112925777A - Method and system for detecting data lineage of HIVE database

Method and system for detecting data lineage of HIVE database

Info

Publication number
CN112925777A
Authority
CN
China
Prior art keywords
data
hive
log
neo4j
analyzing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110211183.1A
Other languages
Chinese (zh)
Inventor
苏瑀
陈筱进
刘登贺
张世杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jilin Yillion Bank Co ltd
Original Assignee
Jilin Yillion Bank Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jilin Yillion Bank Co ltd
Priority to CN202110211183.1A
Publication of CN112925777A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a system for detecting the data lineage of a HIVE database. The method comprises: configuring a LineageLogger Hook function; analyzing HiveSQL based on the LineageLogger Hook function to generate a hive.log log; cleaning the hive.log log into JSON format and importing the cleaned data into the open-source graph database neo4j; querying the dependencies between fields through the neo4j interface; and calling the graph database neo4j API interface, parsing the JSON string, and visually displaying the data lineage. The invention can effectively analyze and sort out the data lineage among data tables and fields.

Description

Method and system for detecting data lineage of HIVE database
Technical Field
The invention relates to the technical field of big data management, and in particular to a method and a system for detecting the data lineage of a HIVE database.
Background
Since the big data era began in 2013, big data has brought new opportunities and challenges to every industry, and industries attach growing importance to the value hidden in massive data. A data warehouse gathers the commonly used and important business indicator data from that mass of data, which reduces the time cost of data retrieval, improves data quality and consistency, and makes better use of historical data, so that the value hidden in the data can be mined more effectively.
Data lineage depicts how data is aggregated layer by layer from the bottom up, accurately and clearly reveals the lineage relationships among data entities at every level, and strongly supports the development, testing, operation, and maintenance of business systems. It records the entire history of data processing, including the origin of the data and every subsequent processing step, and is particularly important for analyzing data, tracking how data evolves, measuring data reliability, and ensuring data quality. As a system runs and the related business systems are continually adjusted in practice, more and more data nodes develop problems, maintenance becomes very costly, and only a few commonly used reports keep working normally. When this happens, the data lineage can be used to trace back and identify the specific nodes that have problems.
When some data is abnormal and triggers an alarm, the cause of the anomaly can be traced downward through the data lineage graph, and the data entities that may be affected can be analyzed upward through the impact graph. When a table structure changes, the impact graph shows which programs need to be modified. Data lineage also helps data warehouse staff sort out the business more effectively, makes it easier to establish the dependencies for ETL task scheduling, and makes it possible to judge quickly whether a failed batch task affects downstream systems.
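The two directions of analysis described above can be illustrated with a minimal sketch, assuming the lineage is already available as a list of (upstream, downstream) pairs. The table names and the helper functions below are hypothetical and are not taken from the patent; the traversal is an ordinary breadth-first search.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (upstream entity, downstream entity).
EDGES = [
    ("ods.orders", "dwd.orders"),
    ("ods.customers", "dwd.orders"),
    ("dwd.orders", "dws.daily_sales"),
    ("dws.daily_sales", "ads.sales_report"),
]

def build_index(edges):
    """Return (downstream, upstream) adjacency maps for the edge list."""
    downstream, upstream = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        downstream[src].add(dst)
        upstream[dst].add(src)
    return downstream, upstream

def reachable(start, adjacency):
    """Breadth-first search: every entity reachable from start."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

downstream, upstream = build_index(EDGES)
# Root-cause tracing: which entities feed the abnormal table?
print(sorted(reachable("dws.daily_sales", upstream)))
# Impact analysis: what is affected if ods.orders changes?
print(sorted(reachable("ods.orders", downstream)))
```

Tracing toward the sources answers "why is this data wrong", while tracing toward the consumers answers "what breaks if this table changes".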
Metadata management becomes more and more important as the number of tables connected to the data warehouse and of models built on it grows. Table-level metadata lineage maintains the relationships between tables, and good metadata management makes the relationships between tables and models clear. Mining metadata lineage plays an important role in tracking data flow, troubleshooting business problems, reducing maintenance cost, and improving development efficiency.
Therefore, how to effectively determine data lineage is an urgent problem to be solved.
Disclosure of Invention
In view of this, the invention provides a method for detecting the data lineage of a HIVE database, which can effectively analyze and sort out the data lineage among data tables and fields.
The invention provides a method for detecting the data lineage of a HIVE database, which comprises the following steps:
configuring a LineageLogger Hook function;
analyzing HiveSQL based on the LineageLogger Hook function to generate a hive.log log;
cleaning the hive.log log into JSON format, and importing the cleaned data into the open-source graph database neo4j;
querying the dependencies between fields through the neo4j interface;
calling the graph database neo4j API interface, parsing the JSON string, and visually displaying the data lineage.
Preferably, configuring the LineageLogger Hook function includes:
configuring the hive-site.xml file by adding parameters, in Hive 2.0 or later, and configuring the Hook output at the same time.
Preferably, calling the graph database neo4j API interface, parsing the JSON string, and visually displaying the data lineage includes:
calling the graph database neo4j API interface through the visualization tool Tableau, parsing the JSON string, and visually displaying the data lineage.
A system for detecting the data lineage of a HIVE database, comprising:
a configuration module, used for configuring a LineageLogger Hook function;
a first analysis module, used for analyzing HiveSQL based on the LineageLogger Hook function to generate a hive.log log;
a cleaning module, used for cleaning the hive.log log into JSON format and importing the cleaned data into the open-source graph database neo4j;
a query module, used for querying the dependencies between fields through the neo4j interface;
and a second analysis module, used for calling the graph database neo4j API interface, parsing the JSON string, and visually displaying the data lineage.
Preferably, the configuration module is specifically configured to:
configure the hive-site.xml file by adding parameters, in Hive 2.0 or later, and configure the Hook output at the same time.
Preferably, the second analysis module is specifically configured to:
call the graph database neo4j API interface through the visualization tool Tableau, parse the JSON string, and visually display the data lineage.
In summary, the invention discloses a method for detecting the data lineage of a HIVE database. When the data lineage of the HIVE database needs to be detected, a LineageLogger Hook function is first configured; HiveSQL is then analyzed based on the LineageLogger Hook function to generate a hive.log log; the hive.log log is cleaned into JSON format, and the cleaned data is imported into the open-source graph database neo4j; the dependencies between fields are queried through the neo4j interface; and the graph database neo4j API interface is called, the JSON string is parsed, and the data lineage is displayed visually. The invention can effectively analyze and sort out the data lineage among data tables and fields.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of embodiment 1 of the method for detecting the data lineage of a HIVE database disclosed in the present invention;
FIG. 2 is a flowchart of embodiment 2 of the method for detecting the data lineage of a HIVE database disclosed in the present invention;
FIG. 3 is a flowchart of embodiment 3 of the method for detecting the data lineage of a HIVE database disclosed in the present invention;
FIG. 4 is a schematic structural diagram of embodiment 1 of the system for detecting the data lineage of a HIVE database disclosed in the present invention;
FIG. 5 is a schematic structural diagram of embodiment 2 of the system for detecting the data lineage of a HIVE database disclosed in the present invention;
FIG. 6 is a schematic structural diagram of embodiment 3 of the system for detecting the data lineage of a HIVE database disclosed in the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
FIG. 1 is a flowchart of embodiment 1 of the method for detecting the data lineage of a HIVE database disclosed in the present invention. The method may include the following steps:
S101, configuring a LineageLogger Hook function;
When the data lineage of the HIVE database needs to be detected, the LineageLogger Hook function is first configured in Hive 2.0 or later.
S102, analyzing HiveSQL based on the LineageLogger Hook function to generate a hive.log log;
After the LineageLogger Hook function has been configured, it is used to perform field-level analysis of the operations run by the Hive component, obtain the dependencies among the fields, and generate a hive.log log.
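The patent does not show what the generated log contains. As a rough sketch, assuming each lineage entry is a single log line whose JSON payload follows a logger prefix, the payload can be pulled out as below. The sample line, its timestamp prefix, and the JSON field names (`queryText`, `edges`, `vertices`, `vertexId`) are assumptions made for illustration and should be checked against the actual hive.log of the Hive version in use.

```python
import json

# Hypothetical hive.log line; the prefix and JSON field names are assumptions.
SAMPLE_LINE = (
    '2021-02-25T10:00:00,000 INFO  [main] hooks.LineageLogger: '
    '{"queryText": "insert into dw.t2 select id from dw.t1", '
    '"edges": [{"sources": [0], "targets": [1], "edgeType": "PROJECTION"}], '
    '"vertices": [{"id": 0, "vertexType": "COLUMN", "vertexId": "dw.t1.id"}, '
    '{"id": 1, "vertexType": "COLUMN", "vertexId": "dw.t2.id"}]}'
)

def extract_lineage(line, marker="LineageLogger"):
    """Return the JSON payload of a lineage log line, or None."""
    if marker not in line:
        return None  # ordinary log line, not a lineage entry
    start = line.find("{", line.index(marker))
    if start < 0:
        return None
    try:
        return json.loads(line[start:])
    except json.JSONDecodeError:
        return None  # truncated or malformed payload

record = extract_lineage(SAMPLE_LINE)
print(record["queryText"])
print(len(record["edges"]), "edge(s),", len(record["vertices"]), "vertices")
```

Lines without the marker, or with a payload that does not parse, are skipped rather than raising, since a shared log file mixes lineage entries with unrelated output.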
S103, cleaning the hive.log log into JSON format, and importing the cleaned data into the open-source graph database neo4j;
After the hive.log log is generated, it still contains a large amount of irrelevant data, cannot show the inter-table field dependencies clearly and concisely, and cannot be accessed directly through the neo4j interface. The hive.log log therefore needs to be cleaned; the cleaned data is stored in JSON format, and the processed data is imported into the open-source graph database neo4j.
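The cleaning rules are not spelled out in the patent. One possible reading, sketched here under assumptions, is to keep only column-to-column data-flow edges and write them as flat JSON records. The record layout, the `PROJECTION` and `PREDICATE` edge types, and the `source`/`target` output keys are all hypothetical choices for this example.

```python
import json

# Hypothetical parsed lineage record (field names are assumptions).
RECORD = {
    "edges": [
        {"sources": [2, 3], "targets": [0], "edgeType": "PROJECTION"},
        {"sources": [2], "targets": [0, 1], "edgeType": "PREDICATE"},
    ],
    "vertices": [
        {"id": 0, "vertexType": "COLUMN", "vertexId": "dw.t2.total"},
        {"id": 1, "vertexType": "COLUMN", "vertexId": "dw.t2.id"},
        {"id": 2, "vertexType": "COLUMN", "vertexId": "dw.t1.price"},
        {"id": 3, "vertexType": "COLUMN", "vertexId": "dw.t1.qty"},
    ],
}

def clean(record, keep_types=("PROJECTION",)):
    """Flatten one record into unique source->target column pairs."""
    names = {v["id"]: v["vertexId"] for v in record["vertices"]
             if v.get("vertexType") == "COLUMN"}
    pairs = set()
    for edge in record["edges"]:
        if edge.get("edgeType") not in keep_types:
            continue  # drop filter/join predicates, keep data flow
        for src in edge["sources"]:
            for dst in edge["targets"]:
                if src in names and dst in names:
                    pairs.add((names[src], names[dst]))
    return [{"source": s, "target": t} for s, t in sorted(pairs)]

cleaned = clean(RECORD)
print(json.dumps(cleaned, indent=2))
```

Using a set removes the duplicate pairs that arise when the same statement is logged more than once; whether predicate edges should be kept is a design decision the patent leaves open.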
S104, querying the dependencies among the fields through the neo4j interface;
Then, relying on neo4j's strong ability to represent relationships, the field dependencies are combined.
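A dependency query of this kind could be expressed in Cypher with a variable-length path. The sketch below only builds the query text and its parameters; the graph model (a `Column` label with a `name` property and a `DERIVES` relationship type) is an assumption, since the patent does not specify one.

```python
# Assumed graph model (not specified in the patent):
#   (:Column {name: "db.table.column"})-[:DERIVES]->(:Column {...})

def upstream_query(max_depth=5):
    """Cypher text listing every column the $name column depends on."""
    depth = int(max_depth)  # the bound is written literally into the pattern
    return (
        "MATCH (s:Column)-[:DERIVES*1..%d]->(t:Column {name: $name}) "
        "RETURN DISTINCT s.name AS source" % depth
    )

def downstream_query(max_depth=5):
    """Cypher text listing every column derived from the $name column."""
    depth = int(max_depth)
    return (
        "MATCH (s:Column {name: $name})-[:DERIVES*1..%d]->(t:Column) "
        "RETURN DISTINCT t.name AS target" % depth
    )

query = upstream_query(3)
params = {"name": "dw.t2.total"}
print(query)
print(params)
```

The column name is passed as a query parameter rather than concatenated into the text, and the depth bound keeps traversal cost predictable on deep warehouses.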
S105, calling the graph database neo4j API interface, parsing the JSON string, and visually displaying the data lineage.
Finally, the graph database neo4j API interface is called and the JSON string is parsed, so that the data lineage is displayed visually.
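The patent does not say which neo4j API is called. As one possibility, the sketch below assumes neo4j's transactional HTTP endpoint, which accepts a JSON body with a `statements` list and returns `results` and `errors`. It builds the request body and converts a response into a node and link structure that a renderer could draw. The sample response is hypothetical, and no network call is made.

```python
import json

def build_request(cypher, parameters):
    """JSON body for a statement sent to the transactional HTTP endpoint."""
    return json.dumps({"statements": [
        {"statement": cypher, "parameters": parameters}
    ]})

# Hypothetical response in the row format of the HTTP API.
SAMPLE_RESPONSE = json.dumps({
    "results": [{
        "columns": ["source", "target"],
        "data": [
            {"row": ["dw.t1.price", "dw.t2.total"], "meta": [None, None]},
            {"row": ["dw.t1.qty", "dw.t2.total"], "meta": [None, None]},
        ],
    }],
    "errors": [],
})

def to_graph(response_text):
    """Parse the JSON string into nodes and links for a renderer."""
    body = json.loads(response_text)
    if body.get("errors"):
        raise RuntimeError(body["errors"])
    nodes, links = set(), []
    for result in body["results"]:
        for item in result["data"]:
            src, dst = item["row"]  # assumes two returned columns
            nodes.update((src, dst))
            links.append({"source": src, "target": dst})
    return {"nodes": sorted(nodes), "links": links}

graph = to_graph(SAMPLE_RESPONSE)
print(json.dumps(graph, indent=2))
```

The endpoint path and authentication differ between neo4j versions, so they are deliberately left out of this sketch.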
In summary, in the above embodiment, when the data lineage of the HIVE database needs to be detected, the LineageLogger Hook function is first configured; HiveSQL is then analyzed based on the LineageLogger Hook function to generate a hive.log log; the hive.log log is cleaned into JSON format, and the cleaned data is imported into the open-source graph database neo4j; the dependencies between fields are queried through the neo4j interface; and the graph database neo4j API interface is called, the JSON string is parsed, and the data lineage is displayed visually. The data lineage among data tables and fields can thus be analyzed and sorted out effectively.
FIG. 2 is a flowchart of embodiment 2 of the method for detecting the data lineage of a HIVE database disclosed in the present invention. The method may include the following steps:
S201, configuring the hive-site.xml file by adding parameters, in Hive 2.0 or later, and configuring the Hook output at the same time;
When the data lineage of the HIVE database needs to be detected, the LineageLogger Hook function is first configured in Hive 2.0 or later.
Specifically, from the perspective of parsing HiveSQL, the hive-site.xml file is configured by adding parameters in Hive 2.0 or later, and the Hook output is configured at the same time, that is, the hive-log4j2.properties configuration file is modified, which completes the configuration of the LineageLogger Hook function.
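The patent does not list the parameters that are added. A common way to enable this hook is to register the class `org.apache.hadoop.hive.ql.hooks.LineageLogger` under the `hive.exec.post.hooks` property; both names, and the log4j2 lines further down, are assumptions here rather than values taken from the patent. The sketch generates the corresponding `<property>` element so the fragment can be checked before it is merged into a real hive-site.xml.

```python
import xml.etree.ElementTree as ET

# Assumed values, not taken from the patent text.
HOOK_PROPERTY = "hive.exec.post.hooks"
HOOK_CLASS = "org.apache.hadoop.hive.ql.hooks.LineageLogger"

def set_property(configuration, name, value):
    """Set a <property>, updating it if the name already exists."""
    for prop in configuration.findall("property"):
        if prop.findtext("name") == name:
            prop.find("value").text = value
            return prop
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return prop

# Start from an empty <configuration>; a real deployment would parse
# the existing hive-site.xml instead.
root = ET.Element("configuration")
set_property(root, HOOK_PROPERTY, HOOK_CLASS)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# Assumed lines for hive-log4j2.properties (the "Hook output").
LOG4J2_LINES = [
    "logger.LineageLogger.name = " + HOOK_CLASS,
    "logger.LineageLogger.level = INFO",
]
print("\n".join(LOG4J2_LINES))
```

Depending on the Hive and log4j2 versions, the logger may also have to be added to the `loggers` list in hive-log4j2.properties; that detail should be verified against the installed configuration file.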
S202, analyzing HiveSQL based on the LineageLogger Hook function to generate a hive.log log;
After the LineageLogger Hook function has been configured, it is used to perform field-level analysis of the operations run by the Hive component, obtain the dependencies among the fields, and generate a hive.log log.
S203, cleaning the hive.log log into JSON format, and importing the cleaned data into the open-source graph database neo4j;
After the hive.log log is generated, it still contains a large amount of irrelevant data, cannot show the inter-table field dependencies clearly and concisely, and cannot be accessed directly through the neo4j interface. The hive.log log therefore needs to be cleaned; the cleaned data is stored in JSON format, and the processed data is imported into the open-source graph database neo4j.
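For the import half of this step, one option is a batched Cypher `UNWIND ... MERGE` statement, which creates each column node and relationship only once. The sketch below prepares the statement and its parameters from a JSON string; the `Column` label, the `DERIVES` relationship type, and the `source`/`target` keys are hypothetical, and nothing is sent to a database.

```python
import json

# Hypothetical cleaned edges; the key names are assumptions.
CLEANED_JSON = (
    '[{"source": "dw.t1.price", "target": "dw.t2.total"},'
    ' {"source": "dw.t1.qty", "target": "dw.t2.total"}]'
)

# The Column label and DERIVES relationship type are assumptions.
IMPORT_CYPHER = (
    "UNWIND $rows AS row "
    "MERGE (s:Column {name: row.source}) "
    "MERGE (t:Column {name: row.target}) "
    "MERGE (s)-[:DERIVES]->(t)"
)

def build_import(cleaned_json, batch_size=1000):
    """Yield (cypher, parameters) pairs, one per batch of edges."""
    rows = json.loads(cleaned_json)
    for i in range(0, len(rows), batch_size):
        yield IMPORT_CYPHER, {"rows": rows[i:i + batch_size]}

batches = list(build_import(CLEANED_JSON, batch_size=1))
for cypher, params in batches:
    print(cypher, params)
```

With the official neo4j Python driver, each pair could be passed to `session.run(cypher, params)`. `MERGE` makes the import idempotent, so re-running it over an overlapping log does not duplicate nodes.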
S204, querying the dependencies among the fields through the neo4j interface;
Then, relying on neo4j's strong ability to represent relationships, the field dependencies are combined.
S205, calling the graph database neo4j API interface, parsing the JSON string, and visually displaying the data lineage.
Finally, the graph database neo4j API interface is called and the JSON string is parsed, so that the data lineage is displayed visually.
To sum up, in this embodiment, on the basis of the above embodiment, the LineageLogger Hook function may be configured by adding parameters to the hive-site.xml file in Hive 2.0 or later and configuring the Hook output at the same time.
FIG. 3 is a flowchart of embodiment 3 of the method for detecting the data lineage of a HIVE database disclosed in the present invention. The method may include the following steps:
S301, configuring the hive-site.xml file by adding parameters, in Hive 2.0 or later, and configuring the Hook output at the same time;
When the data lineage of the HIVE database needs to be detected, the LineageLogger Hook function is first configured in Hive 2.0 or later.
Specifically, from the perspective of parsing HiveSQL, the hive-site.xml file is configured by adding parameters in Hive 2.0 or later, and the Hook output is configured at the same time, that is, the hive-log4j2.properties configuration file is modified, which completes the configuration of the LineageLogger Hook function.
S302, analyzing HiveSQL based on the LineageLogger Hook function to generate a hive.log log;
After the LineageLogger Hook function has been configured, it is used to perform field-level analysis of the operations run by the Hive component, obtain the dependencies among the fields, and generate a hive.log log.
S303, cleaning the hive.log log into JSON format, and importing the cleaned data into the open-source graph database neo4j;
After the hive.log log is generated, it still contains a large amount of irrelevant data, cannot show the inter-table field dependencies clearly and concisely, and cannot be accessed directly through the neo4j interface. The hive.log log therefore needs to be cleaned; the cleaned data is stored in JSON format, and the processed data is imported into the open-source graph database neo4j.
S304, querying the dependencies among the fields through the neo4j interface;
Then, relying on neo4j's strong ability to represent relationships, the field dependencies are combined.
S305, calling the graph database neo4j API interface through the visualization tool Tableau, parsing the JSON string, and visually displaying the data lineage.
Finally, the graph database neo4j API interface is called through Tableau, a highly flexible and dynamic visualization tool, and the JSON string is parsed, so that the data lineage is displayed visually.
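The patent does not describe how Tableau is connected to neo4j. As a substitute technique for illustration only, and not the patent's API call, the sketch below flattens lineage links into a CSV table, a form that tabular visualization tools can generally read. The link records and the `db.table.column` naming are hypothetical.

```python
import csv
import io

# Hypothetical lineage links with fully qualified column names.
LINKS = [
    {"source": "dw.t1.price", "target": "dw.t2.total"},
    {"source": "dw.t1.qty", "target": "dw.t2.total"},
]

def split_column(qualified):
    """Split 'db.table.column' into its three parts."""
    database, table, column = qualified.split(".", 2)
    return database, table, column

def to_csv(links):
    """Render the links as CSV text with one row per dependency."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["source_table", "source_column",
                     "target_table", "target_column"])
    for link in links:
        sdb, stab, scol = split_column(link["source"])
        tdb, ttab, tcol = split_column(link["target"])
        writer.writerow([f"{sdb}.{stab}", scol, f"{tdb}.{ttab}", tcol])
    return buffer.getvalue()

csv_text = to_csv(LINKS)
print(csv_text)
```

Splitting the qualified name into table and column lets the same file drive both table-level and field-level views.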
In summary, from the perspective of parsing HiveSQL, the present invention configures the LineageLogger Hook function in Hive 2.0 or later to analyze HiveSQL, achieving field-level analysis of Hive component operations, obtaining the dependencies among fields, and generating a log. This approach makes full use of Hive's internal methods for the analysis, which improves the efficiency of parsing HiveSQL and reduces its complexity. The log of field dependencies is cleaned and preprocessed and then stored in JSON format, which is convenient for storage and querying. The JSON data file is imported into the open-source graph database neo4j, the field dependencies are combined by relying on neo4j's strong ability to represent relationships, and the data lineage is displayed visually by calling the graph database neo4j API interface through Tableau, a highly flexible and dynamic visualization tool. The overall maintenance cost of the data warehouse is reduced, and data quality problems are reduced.
FIG. 4 is a schematic structural diagram of embodiment 1 of the system for detecting the data lineage of a HIVE database disclosed in the present invention. The system may include:
a configuration module 401, used for configuring a LineageLogger Hook function;
When the data lineage of the HIVE database needs to be detected, the LineageLogger Hook function is first configured in Hive 2.0 or later.
a first analysis module 402, used for analyzing HiveSQL based on the LineageLogger Hook function to generate a hive.log log;
After the LineageLogger Hook function has been configured, it is used to perform field-level analysis of the operations run by the Hive component, obtain the dependencies among the fields, and generate a hive.log log.
a cleaning module 403, used for cleaning the hive.log log into JSON format and importing the cleaned data into the open-source graph database neo4j;
After the hive.log log is generated, it still contains a large amount of irrelevant data, cannot show the inter-table field dependencies clearly and concisely, and cannot be accessed directly through the neo4j interface. The hive.log log therefore needs to be cleaned; the cleaned data is stored in JSON format, and the processed data is imported into the open-source graph database neo4j.
a query module 404, used for querying the dependencies between fields through the neo4j interface;
Then, relying on neo4j's strong ability to represent relationships, the field dependencies are combined.
and a second analysis module 405, used for calling the graph database neo4j API interface, parsing the JSON string, and visually displaying the data lineage.
Finally, the graph database neo4j API interface is called and the JSON string is parsed, so that the data lineage is displayed visually.
In summary, in the above embodiment, when the data lineage of the HIVE database needs to be detected, the LineageLogger Hook function is first configured; HiveSQL is then analyzed based on the LineageLogger Hook function to generate a hive.log log; the hive.log log is cleaned into JSON format, and the cleaned data is imported into the open-source graph database neo4j; the dependencies between fields are queried through the neo4j interface; and the graph database neo4j API interface is called, the JSON string is parsed, and the data lineage is displayed visually. The data lineage among data tables and fields can thus be analyzed and sorted out effectively.
FIG. 5 is a schematic structural diagram of embodiment 2 of the system for detecting the data lineage of a HIVE database disclosed in the present invention. The system may include:
a configuration module 501, used for configuring the hive-site.xml file by adding parameters, in Hive 2.0 or later, and configuring the Hook output at the same time;
When the data lineage of the HIVE database needs to be detected, the LineageLogger Hook function is first configured in Hive 2.0 or later.
Specifically, from the perspective of parsing HiveSQL, the hive-site.xml file is configured by adding parameters in Hive 2.0 or later, and the Hook output is configured at the same time, that is, the hive-log4j2.properties configuration file is modified, which completes the configuration of the LineageLogger Hook function.
a first analysis module 502, used for analyzing HiveSQL based on the LineageLogger Hook function to generate a hive.log log;
After the LineageLogger Hook function has been configured, it is used to perform field-level analysis of the operations run by the Hive component, obtain the dependencies among the fields, and generate a hive.log log.
a cleaning module 503, used for cleaning the hive.log log into JSON format and importing the cleaned data into the open-source graph database neo4j;
After the hive.log log is generated, it still contains a large amount of irrelevant data, cannot show the inter-table field dependencies clearly and concisely, and cannot be accessed directly through the neo4j interface. The hive.log log therefore needs to be cleaned; the cleaned data is stored in JSON format, and the processed data is imported into the open-source graph database neo4j.
a query module 504, used for querying the dependencies between fields through the neo4j interface;
Then, relying on neo4j's strong ability to represent relationships, the field dependencies are combined.
and a second analysis module 505, used for calling the graph database neo4j API interface, parsing the JSON string, and visually displaying the data lineage.
Finally, the graph database neo4j API interface is called and the JSON string is parsed, so that the data lineage is displayed visually.
To sum up, in this embodiment, on the basis of the above embodiment, the LineageLogger Hook function may be configured by adding parameters to the hive-site.xml file in Hive 2.0 or later and configuring the Hook output at the same time.
FIG. 6 is a schematic structural diagram of embodiment 3 of the system for detecting the data lineage of a HIVE database disclosed in the present invention. The system may include:
a configuration module 601, used for configuring the hive-site.xml file by adding parameters, in Hive 2.0 or later, and configuring the Hook output at the same time;
When the data lineage of the HIVE database needs to be detected, the LineageLogger Hook function is first configured in Hive 2.0 or later.
Specifically, from the perspective of parsing HiveSQL, the hive-site.xml file is configured by adding parameters in Hive 2.0 or later, and the Hook output is configured at the same time, that is, the hive-log4j2.properties configuration file is modified, which completes the configuration of the LineageLogger Hook function.
a first analysis module 602, used for analyzing HiveSQL based on the LineageLogger Hook function to generate a hive.log log;
After the LineageLogger Hook function has been configured, it is used to perform field-level analysis of the operations run by the Hive component, obtain the dependencies among the fields, and generate a hive.log log.
a cleaning module 603, used for cleaning the hive.log log into JSON format and importing the cleaned data into the open-source graph database neo4j;
After the hive.log log is generated, it still contains a large amount of irrelevant data, cannot show the inter-table field dependencies clearly and concisely, and cannot be accessed directly through the neo4j interface. The hive.log log therefore needs to be cleaned; the cleaned data is stored in JSON format, and the processed data is imported into the open-source graph database neo4j.
a query module 604, used for querying the dependencies between fields through the neo4j interface;
Then, relying on neo4j's strong ability to represent relationships, the field dependencies are combined.
and a second analysis module 605, used for calling the graph database neo4j API interface through the visualization tool Tableau, parsing the JSON string, and visually displaying the data lineage.
Finally, the graph database neo4j API interface is called through Tableau, a highly flexible and dynamic visualization tool, and the JSON string is parsed, so that the data lineage is displayed visually.
In summary, from the perspective of parsing HiveSQL, the present invention configures the LineageLogger Hook function in Hive 2.0 or later to analyze HiveSQL, achieving field-level analysis of Hive component operations, obtaining the dependencies among fields, and generating a log. This approach makes full use of Hive's internal methods for the analysis, which improves the efficiency of parsing HiveSQL and reduces its complexity. The log of field dependencies is cleaned and preprocessed and then stored in JSON format, which is convenient for storage and querying. The JSON data file is imported into the open-source graph database neo4j, the field dependencies are combined by relying on neo4j's strong ability to represent relationships, and the data lineage is displayed visually by calling the graph database neo4j API interface through Tableau, a highly flexible and dynamic visualization tool. The overall maintenance cost of the data warehouse is reduced, and data quality problems are reduced.
The embodiments in this description are presented progressively: each embodiment focuses on its differences from the others, and the parts that are the same or similar can be cross-referenced between embodiments. Because the system disclosed in the embodiments corresponds to the method disclosed in the embodiments, it is described briefly, and the relevant details can be found in the description of the method.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (6)

1. A method for detecting data lineage in a HIVE database, comprising:
configuring a LineageLogger Hook function;
analyzing the HiveSql based on the LineageLogger Hook function to generate a hive.log log;
performing data cleaning on the hive.log log to form a JSON format, and importing the cleaned data into the open source graph database neo4j;
querying the dependency relationships between fields by using a neo4j interface; and
calling the graph database neo4j API, parsing the JSON strings, and visually displaying the data lineage.
2. The method of claim 1, wherein configuring the LineageLogger Hook function comprises:
configuring the hive-site.xml file by adding parameters, in Hive version 2.0 or later, and simultaneously configuring the Hook output.
3. The method according to claim 2, wherein calling the graph database neo4j API, parsing the JSON strings, and visually displaying the data lineage comprises:
calling the graph database neo4j API (application program interface) through the visualization tool Tableau, parsing the JSON (JavaScript Object Notation) strings, and visually displaying the data lineage.
4. A system for detecting data lineage in a HIVE database, comprising:
a configuration module, used for configuring a LineageLogger Hook function;
a first analysis module, used for analyzing the HiveSql based on the LineageLogger Hook function to generate a hive.log log;
a cleaning module, used for performing data cleaning on the hive.log log to form a JSON format, and importing the cleaned data into the open source graph database neo4j;
a query module, used for querying the dependency relationships between fields by using the neo4j interface; and
a second analysis module, used for calling the graph database neo4j API, parsing the JSON strings, and visually displaying the data lineage.
5. The system of claim 4, wherein the configuration module is specifically configured to:
configure the hive-site.xml file by adding parameters, in Hive version 2.0 or later, and simultaneously configure the Hook output.
6. The system of claim 5, wherein the second analysis module is specifically configured to:
call the graph database neo4j API (application program interface) through the visualization tool Tableau, parse the JSON (JavaScript Object Notation) strings, and visually display the data lineage.
CN202110211183.1A 2021-02-25 2021-02-25 Method and system for detecting data blood margin of HIVE database Pending CN112925777A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110211183.1A CN112925777A (en) 2021-02-25 2021-02-25 Method and system for detecting data blood margin of HIVE database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110211183.1A CN112925777A (en) 2021-02-25 2021-02-25 Method and system for detecting data blood margin of HIVE database

Publications (1)

Publication Number Publication Date
CN112925777A true CN112925777A (en) 2021-06-08

Family

ID=76171788

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110211183.1A Pending CN112925777A (en) 2021-02-25 2021-02-25 Method and system for detecting data blood margin of HIVE database

Country Status (1)

Country Link
CN (1) CN112925777A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113343036A (en) * 2021-08-04 2021-09-03 杭州远眺科技有限公司 Data blood relationship analysis method and system based on key topological structure analysis

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109446279A (en) * 2018-10-15 2019-03-08 顺丰科技有限公司 Based on neo4j big data genetic connection management method, system, equipment and storage medium
CN110232056A (en) * 2019-05-21 2019-09-13 苏宁云计算有限公司 A kind of the blood relationship analytic method and its tool of structured query language
CN111538743A (en) * 2020-04-22 2020-08-14 电子科技大学 SQL-based data blood relationship analysis method and system
CN112084270A (en) * 2020-09-17 2020-12-15 腾讯科技(深圳)有限公司 Data blood margin processing method and device, storage medium and equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
阿武Z: "HIVE field-level lineage analysis, written to Neo4j" (HIVE 字段级血缘分析 写入Neo4j), 《HTTPS://BLOG.CSDN.NET/XW514124202/ARTICLE/DETAILS/94029564》 *

Similar Documents

Publication Publication Date Title
US11968264B2 (en) Systems and methods for operation management and monitoring of bots
CN108847994B (en) Alarm positioning method, device, equipment and storage medium based on data analysis
US11537492B1 (en) Application topology graph for representing instrumented and uninstrumented objects in a microservices-based architecture
CN110232024B (en) Software automation test framework and test method
EP3757793A1 (en) Machine-assisted quality assurance and software improvement
US8978016B2 (en) Error list and bug report analysis for configuring an application tracer
US20150347283A1 (en) Multiple Tracer Configurations Applied on a Function-by-Function Level
US20140317454A1 (en) Tracer List for Automatically Controlling Tracer Behavior
US20140317605A1 (en) User Interaction Analysis of Tracer Data for Configuring an Application Tracer
US20140317604A1 (en) Real Time Analysis of Tracer Summaries to Change Tracer Behavior
US10116534B2 (en) Systems and methods for WebSphere MQ performance metrics analysis
US8688729B2 (en) Efficiently collecting transaction-separated metrics in a distributed enviroment
US20160266998A1 (en) Error list and bug report analysis for configuring an application tracer
US9588869B2 (en) Computer implemented system and method of instrumentation for software applications
US20130047169A1 (en) Efficient Data Structure To Gather And Distribute Transaction Events
Raemaekers et al. An analysis of dependence on third-party libraries in open source and proprietary systems
WO2017172669A2 (en) Tagged tracing, logging and performance measurements
US20180143897A1 (en) Determining idle testing periods
CN110837496A (en) Data quality management method and system based on dynamic sql
CN113762914A (en) Early warning auditing method and related equipment
CN112925777A (en) Method and system for detecting data blood margin of HIVE database
Zhao et al. Research on international standardization of software quality and software testing
KR101830936B1 (en) Performance Improving System Based Web for Database and Application
US20220294710A1 (en) Automatic automation recommendation
Chakraborty et al. Architecture of a modern monitoring system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210608