CN112925777A - Method and system for detecting data lineage of HIVE database
- Publication number
- CN112925777A, CN202110211183.1A
- Authority
- CN
- China
- Prior art keywords
- data
- hive
- log
- neo4j
- analyzing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/248—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/283—Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method and a system for detecting the data lineage of a Hive database. The method comprises: configuring a LineageLogger hook function; parsing the HiveSQL based on the LineageLogger hook function to generate a hive.log log; cleaning the data of the hive.log log into JSON format, and importing the cleaned data into the open-source graph database neo4j; querying the dependency relationships between fields by using a neo4j interface; and calling the API of the graph database neo4j, parsing the JSON strings, and visually displaying the data lineage. The invention can effectively analyze and sort out the data lineage among data tables and fields.
Description
Technical Field
The invention relates to the technical field of data management for big data, and in particular to a method and a system for detecting the data lineage of a Hive database.
Background
Since the big data era began in 2013, big data has brought new opportunities and challenges to the development of various industries, and industries attach growing importance to the value implied in massive data. A data warehouse gathers all commonly used and important business-related indicator data from massive data, which reduces the time cost of data retrieval, improves data quality and consistency, and improves the use of historical data, so that the value hidden in the data can be better mined.
Data lineage vividly depicts how data is aggregated layer by layer from the bottom up, accurately and clearly reveals the lineage among data entities at all levels, and strongly supports the development, testing, operation and maintenance of business systems. It records the entire history of data processing, including the origin of the data and all subsequent processing, and is particularly important for analyzing data, tracking the dynamic evolution of data, measuring data reliability, and ensuring data quality. In practice, as a system runs and the related business systems are continually adjusted, more and more data nodes develop problems, maintenance costs become very high, and only a few commonly used reports work normally. When this happens, the data lineage can be used to trace back and detect which specific nodes have problems.
When some data becomes abnormal and raises an alarm, the cause of the abnormality can be traced downward through the data lineage graph, and the data entities that may be affected can be analyzed upward through the impact graph. When a table structure changes, the impact graph shows which programs need to be modified. Data lineage also helps data warehouse staff sort out the business better, establish the dependencies for ETL task scheduling more conveniently, and quickly judge whether a failed batch task affects downstream systems.
Metadata management becomes more and more important as the number of tables accessed by the data warehouse and of models built on it increases, and the table-level lineage in the metadata maintains the relationships between tables. Good metadata management makes the relationships between tables and models clear. Mining metadata lineage plays an important role in tracking data flow, troubleshooting business problems, reducing maintenance costs, and improving development efficiency.
Therefore, how to effectively determine data lineage is an urgent problem to be solved.
Disclosure of Invention
In view of this, the invention provides a method for detecting the data lineage of a Hive database, which can effectively analyze and sort out the data lineage among data tables and fields.
The invention provides a method for detecting the data lineage of a Hive database, which comprises the following steps:
configuring a LineageLogger hook function;
parsing the HiveSQL based on the LineageLogger hook function to generate a hive.log log;
cleaning the data of the hive.log log into JSON format, and importing the cleaned data into the open-source graph database neo4j;
querying the dependency relationships between fields by using a neo4j interface;
calling the API of the graph database neo4j, parsing the JSON strings, and visually displaying the data lineage.
Preferably, configuring the LineageLogger hook function includes:
configuring the hive-site.xml file by adding parameters on Hive version 2.0 or later, and configuring the hook output at the same time.
Preferably, calling the API of the graph database neo4j, parsing the JSON strings, and visually displaying the data lineage includes:
calling the API of the graph database neo4j through the visualization tool Tableau, parsing the JSON (JavaScript Object Notation) strings, and visually displaying the data lineage.
A system for detecting the data lineage of a Hive database, comprising:
a configuration module, used to configure a LineageLogger hook function;
a first parsing module, used to parse the HiveSQL based on the LineageLogger hook function and generate a hive.log log;
a cleaning module, used to clean the data of the hive.log log into JSON format and import the cleaned data into the open-source graph database neo4j;
a query module, used to query the dependency relationships between fields by using the neo4j interface;
and a second parsing module, used to call the API of the graph database neo4j, parse the JSON strings, and visually display the data lineage.
Preferably, the configuration module is specifically used to:
configure the hive-site.xml file by adding parameters on Hive version 2.0 or later, and configure the hook output at the same time.
Preferably, the second parsing module is specifically used to:
call the API of the graph database neo4j through the visualization tool Tableau, parse the JSON (JavaScript Object Notation) strings, and visually display the data lineage.
In summary, the invention discloses a method for detecting the data lineage of a Hive database. When the data lineage of the Hive database needs to be detected, a LineageLogger hook function is configured first; the HiveSQL is then parsed based on the LineageLogger hook function to generate a hive.log log; the data of the hive.log log is cleaned into JSON format, and the cleaned data is imported into the open-source graph database neo4j; the dependency relationships between fields are queried by using a neo4j interface; and the API of the graph database neo4j is called, the JSON strings are parsed, and the data lineage is displayed visually. The invention can effectively analyze and sort out the data lineage among data tables and fields.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flowchart of embodiment 1 of the method for detecting the data lineage of a Hive database disclosed in the present invention;
FIG. 2 is a flowchart of embodiment 2 of the method for detecting the data lineage of a Hive database disclosed in the present invention;
FIG. 3 is a flowchart of embodiment 3 of the method for detecting the data lineage of a Hive database disclosed in the present invention;
FIG. 4 is a schematic structural diagram of embodiment 1 of the system for detecting the data lineage of a Hive database disclosed in the present invention;
FIG. 5 is a schematic structural diagram of embodiment 2 of the system for detecting the data lineage of a Hive database disclosed in the present invention;
FIG. 6 is a schematic structural diagram of embodiment 3 of the system for detecting the data lineage of a Hive database disclosed in the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
FIG. 1 is a flowchart of embodiment 1 of the method for detecting the data lineage of a Hive database disclosed in the present invention. The method may include the following steps:
S101, configuring a LineageLogger hook function;
When the data lineage of the Hive database needs to be detected, a LineageLogger hook function is first configured on Hive version 2.0 or later.
S102, parsing the HiveSQL based on the LineageLogger hook function to generate a hive.log log;
After the LineageLogger hook function has been configured, field-level parsing of Hive operations is performed based on the configured hook, the dependency relationships among the fields are obtained, and a hive.log log is generated.
S103, cleaning the data of the hive.log log into JSON format, and importing the cleaned data into the open-source graph database neo4j;
After the hive.log log is generated, because hive.log contains a large amount of irrelevant data, it cannot show the field dependencies between tables clearly and concisely, and its data cannot be accessed directly through the neo4j interface. The hive.log therefore needs to be cleaned; the cleaned data is stored in JSON format, and the processed data is imported into the open-source graph database neo4j.
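The cleaning step can be illustrated with a minimal Python sketch. It assumes, beyond what the text states, that each lineage record appears in hive.log as a single line ending in a JSON object with `vertices` (each carrying `id` and `vertexId`) and `edges` (each carrying `sources`, `targets` and `edgeType`); the function names and the output keys `source` and `target` are hypothetical.

```python
import json
from typing import Iterable, Iterator, Optional


def extract_payload(line: str) -> Optional[dict]:
    """Return the lineage JSON object embedded in a log line, or None."""
    start = line.find("{")
    if start == -1:
        return None
    try:
        payload = json.loads(line[start:])
    except json.JSONDecodeError:
        return None
    # Assumed record shape: a dict with both "edges" and "vertices".
    if not isinstance(payload, dict):
        return None
    if "edges" not in payload or "vertices" not in payload:
        return None
    return payload


def column_edges(payload: dict) -> Iterator[dict]:
    """Yield one {source, target} record per column-to-column projection."""
    names = {v["id"]: v["vertexId"] for v in payload["vertices"]}
    for edge in payload["edges"]:
        if edge.get("edgeType") != "PROJECTION":
            continue  # skip non-projection edges in this sketch
        for src in edge["sources"]:
            for dst in edge["targets"]:
                yield {"source": names[src], "target": names[dst]}


def clean(lines: Iterable[str]) -> list:
    """Drop non-lineage lines and flatten the rest into edge records."""
    records = []
    for line in lines:
        payload = extract_payload(line)
        if payload is not None:
            records.extend(column_edges(payload))
    return records
```

The resulting list can be written out with `json.dumps(records)` to obtain the JSON file that is later imported into the graph database.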
S104, querying the dependency relationships between fields by using a neo4j interface;
Then, relying on neo4j's strong ability to represent relationships, the field dependencies are combined.
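As a hedged illustration of this query step, the Python sketch below only builds Cypher statement strings; it does not connect to a database. The node label `Column`, the relationship type `FEEDS`, the parameter names `$edges` and `$name`, and all function names are assumptions introduced for the example and do not come from the patent text. An in-memory traversal is included so the expected result of the upstream query can be checked without neo4j.

```python
from collections import defaultdict, deque


def import_statement() -> str:
    """Cypher that loads {source, target} records passed as $edges."""
    return (
        "UNWIND $edges AS e "
        "MERGE (s:Column {name: e.source}) "
        "MERGE (t:Column {name: e.target}) "
        "MERGE (s)-[:FEEDS]->(t)"
    )


def upstream_query(max_depth: int = 10) -> str:
    """Cypher that returns every field the field $name depends on."""
    if max_depth < 1:
        raise ValueError("max_depth must be at least 1")
    # Path-length bounds cannot be query parameters, so the validated
    # integer is interpolated into the statement text.
    return (
        f"MATCH (s:Column)-[:FEEDS*1..{max_depth}]->"
        "(t:Column {name: $name}) "
        "RETURN DISTINCT s.name AS upstream"
    )


def upstream_in_memory(edges: list, name: str) -> set:
    """Reference traversal: all direct and indirect sources of `name`."""
    parents = defaultdict(set)
    for e in edges:
        parents[e["target"]].add(e["source"])
    seen, queue = set(), deque([name])
    while queue:
        for parent in parents[queue.popleft()]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen
```

Unlike the Cypher query, the in-memory version has no depth limit; it is only meant for checking small examples.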
S105, calling the API of the graph database neo4j, parsing the JSON strings, and visually displaying the data lineage.
Finally, the API of the graph database neo4j is called and the JSON strings are parsed to visually display the data lineage.
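The JSON parsing in this step might look like the following Python sketch. It assumes the response has the shape returned by the Neo4j transactional HTTP endpoint (a top-level `results` list whose items have `columns` and `data`, each data item holding a `row`, plus an `errors` list), and that the query returned two columns named `source` and `target`; those column names and the `nodes`/`links` output layout are hypothetical choices for feeding a visualization tool.

```python
import json


def to_graph(response_text: str) -> dict:
    """Convert a neo4j JSON response into node and link lists."""
    body = json.loads(response_text)
    if body.get("errors"):
        raise RuntimeError(f"query failed: {body['errors']}")
    nodes, links = set(), []
    for result in body.get("results", []):
        columns = result["columns"]
        for item in result["data"]:
            # Pair each value in the row with its column name.
            row = dict(zip(columns, item["row"]))
            nodes.update((row["source"], row["target"]))
            links.append({"source": row["source"], "target": row["target"]})
    return {"nodes": sorted(nodes), "links": links}
```

The `nodes` list is deduplicated and sorted so that repeated fields across rows appear once in the rendered graph.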
In summary, in the above embodiment, when the data lineage of the Hive database needs to be detected, the LineageLogger hook function is configured first; the HiveSQL is then parsed based on the LineageLogger hook function to generate a hive.log log; the data of the hive.log log is cleaned into JSON format, and the cleaned data is imported into the open-source graph database neo4j; the dependency relationships between fields are queried by using a neo4j interface; and the API of the graph database neo4j is called, the JSON strings are parsed, and the data lineage is displayed visually. The data lineage among data tables and fields can thus be effectively analyzed and sorted out.
FIG. 2 is a flowchart of embodiment 2 of the method for detecting the data lineage of a Hive database disclosed in the present invention. The method may include the following steps:
S201, configuring the hive-site.xml file by adding parameters on Hive version 2.0 or later, and configuring the hook output at the same time;
When the data lineage of the Hive database needs to be detected, a LineageLogger hook function is first configured on Hive version 2.0 or later.
Specifically, taking the approach of parsing HiveSQL, the hive-site.xml file is configured by adding parameters on Hive version 2.0 or later, and the hook output is configured at the same time by modifying the hive-log4j2.properties configuration file, which completes the configuration of the LineageLogger hook function.
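The text does not name the parameter that is added to hive-site.xml. As a sketch only, the Python below assumes the stock Apache Hive post-execution hook property `hive.exec.post.hooks` with the class `org.apache.hadoop.hive.ql.hooks.LineageLogger`, and appends that property to an existing configuration document. The helper names are hypothetical. The matching change in hive-log4j2.properties would presumably enable INFO-level output for the same class, which is likewise an assumption.

```python
import xml.etree.ElementTree as ET

# Assumption: property and class names from stock Apache Hive; the patent
# text itself does not spell them out.
HOOK_PROPERTY = "hive.exec.post.hooks"
HOOK_CLASS = "org.apache.hadoop.hive.ql.hooks.LineageLogger"


def lineage_hook_property() -> ET.Element:
    """Build the <property> element that registers the lineage hook."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = HOOK_PROPERTY
    ET.SubElement(prop, "value").text = HOOK_CLASS
    return prop


def add_hook(hive_site_xml: str) -> str:
    """Return the hive-site.xml content with the hook property present.

    If a property with the same name already exists it is left untouched;
    merging with other configured hooks is outside this sketch.
    """
    root = ET.fromstring(hive_site_xml)
    names = [p.findtext("name") for p in root.findall("property")]
    if HOOK_PROPERTY not in names:
        root.append(lineage_hook_property())
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(add_hook("<configuration></configuration>"))
```

Calling `add_hook` twice on the same content leaves a single hook property, so the helper can be rerun safely.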
S202, parsing the HiveSQL based on the LineageLogger hook function to generate a hive.log log;
After the LineageLogger hook function has been configured, field-level parsing of Hive operations is performed based on the configured hook, the dependency relationships among the fields are obtained, and a hive.log log is generated.
S203, cleaning the data of the hive.log log into JSON format, and importing the cleaned data into the open-source graph database neo4j;
After the hive.log log is generated, because hive.log contains a large amount of irrelevant data, it cannot show the field dependencies between tables clearly and concisely, and its data cannot be accessed directly through the neo4j interface. The hive.log therefore needs to be cleaned; the cleaned data is stored in JSON format, and the processed data is imported into the open-source graph database neo4j.
S204, querying the dependency relationships between fields by using a neo4j interface;
Then, relying on neo4j's strong ability to represent relationships, the field dependencies are combined.
S205, calling the API of the graph database neo4j, parsing the JSON strings, and visually displaying the data lineage.
Finally, the API of the graph database neo4j is called and the JSON strings are parsed to visually display the data lineage.
In summary, on the basis of the above embodiment, when the LineageLogger hook function is configured in this embodiment, the hive-site.xml file may be configured by adding parameters on Hive version 2.0 or later, and the hook output is configured at the same time, which completes the configuration of the LineageLogger hook function.
FIG. 3 is a flowchart of embodiment 3 of the method for detecting the data lineage of a Hive database disclosed in the present invention. The method may include the following steps:
S301, configuring the hive-site.xml file by adding parameters on Hive version 2.0 or later, and configuring the hook output at the same time;
When the data lineage of the Hive database needs to be detected, a LineageLogger hook function is first configured on Hive version 2.0 or later.
Specifically, taking the approach of parsing HiveSQL, the hive-site.xml file is configured by adding parameters on Hive version 2.0 or later, and the hook output is configured at the same time by modifying the hive-log4j2.properties configuration file, which completes the configuration of the LineageLogger hook function.
S302, parsing the HiveSQL based on the LineageLogger hook function to generate a hive.log log;
After the LineageLogger hook function has been configured, field-level parsing of Hive operations is performed based on the configured hook, the dependency relationships among the fields are obtained, and a hive.log log is generated.
S303, cleaning the data of the hive.log log into JSON format, and importing the cleaned data into the open-source graph database neo4j;
After the hive.log log is generated, because hive.log contains a large amount of irrelevant data, it cannot show the field dependencies between tables clearly and concisely, and its data cannot be accessed directly through the neo4j interface. The hive.log therefore needs to be cleaned; the cleaned data is stored in JSON format, and the processed data is imported into the open-source graph database neo4j.
S304, querying the dependency relationships between fields by using a neo4j interface;
Then, relying on neo4j's strong ability to represent relationships, the field dependencies are combined.
S305, calling the API of the graph database neo4j through the visualization tool Tableau, parsing the JSON (JavaScript Object Notation) strings, and visually displaying the data lineage.
Finally, the API of the graph database neo4j is called through Tableau, a visualization tool with high flexibility and dynamics, the JSON strings are parsed, and the data lineage is displayed visually.
In summary, taking the approach of parsing HiveSQL, the invention configures the LineageLogger hook function on Hive version 2.0 or later to parse HiveSQL, thereby performing field-level parsing of Hive operations, obtaining the dependency relationships between fields, and generating a log. This approach makes full use of Hive's internal parsing, improves the efficiency of parsing HiveSQL, and reduces its complexity. The field-dependency log is cleaned and preprocessed and then stored in JSON format, which is convenient for storage and querying. The JSON data file is imported into the open-source graph database neo4j, the field dependencies are combined by relying on neo4j's strong ability to represent relationships, and the data lineage is displayed visually by calling the neo4j API through Tableau, a visualization tool with high flexibility and dynamics. This reduces the overall maintenance cost of the data warehouse and reduces data quality problems.
FIG. 4 is a schematic structural diagram of embodiment 1 of the system for detecting the data lineage of a Hive database disclosed in the present invention. The system may include:
a configuration module 401, used to configure a LineageLogger hook function;
When the data lineage of the Hive database needs to be detected, a LineageLogger hook function is first configured on Hive version 2.0 or later.
a first parsing module 402, used to parse the HiveSQL based on the LineageLogger hook function and generate a hive.log log;
After the LineageLogger hook function has been configured, field-level parsing of Hive operations is performed based on the configured hook, the dependency relationships among the fields are obtained, and a hive.log log is generated.
a cleaning module 403, used to clean the data of the hive.log log into JSON format and import the cleaned data into the open-source graph database neo4j;
After the hive.log log is generated, because hive.log contains a large amount of irrelevant data, it cannot show the field dependencies between tables clearly and concisely, and its data cannot be accessed directly through the neo4j interface. The hive.log therefore needs to be cleaned; the cleaned data is stored in JSON format, and the processed data is imported into the open-source graph database neo4j.
a query module 404, used to query the dependency relationships between fields by using the neo4j interface;
Then, relying on neo4j's strong ability to represent relationships, the field dependencies are combined.
and a second parsing module 405, used to call the API of the graph database neo4j, parse the JSON strings, and visually display the data lineage.
Finally, the API of the graph database neo4j is called and the JSON strings are parsed to visually display the data lineage.
In summary, in the above embodiment, when the data lineage of the Hive database needs to be detected, the LineageLogger hook function is configured first; the HiveSQL is then parsed based on the LineageLogger hook function to generate a hive.log log; the data of the hive.log log is cleaned into JSON format, and the cleaned data is imported into the open-source graph database neo4j; the dependency relationships between fields are queried by using a neo4j interface; and the API of the graph database neo4j is called, the JSON strings are parsed, and the data lineage is displayed visually. The data lineage among data tables and fields can thus be effectively analyzed and sorted out.
FIG. 5 is a schematic structural diagram of embodiment 2 of the system for detecting the data lineage of a Hive database disclosed in the present invention. The system may include:
a configuration module 501, used to configure the hive-site.xml file by adding parameters on Hive version 2.0 or later and to configure the hook output at the same time;
When the data lineage of the Hive database needs to be detected, a LineageLogger hook function is first configured on Hive version 2.0 or later.
Specifically, taking the approach of parsing HiveSQL, the hive-site.xml file is configured by adding parameters on Hive version 2.0 or later, and the hook output is configured at the same time by modifying the hive-log4j2.properties configuration file, which completes the configuration of the LineageLogger hook function.
a first parsing module 502, used to parse the HiveSQL based on the LineageLogger hook function and generate a hive.log log;
After the LineageLogger hook function has been configured, field-level parsing of Hive operations is performed based on the configured hook, the dependency relationships among the fields are obtained, and a hive.log log is generated.
a cleaning module 503, used to clean the data of the hive.log log into JSON format and import the cleaned data into the open-source graph database neo4j;
After the hive.log log is generated, because hive.log contains a large amount of irrelevant data, it cannot show the field dependencies between tables clearly and concisely, and its data cannot be accessed directly through the neo4j interface. The hive.log therefore needs to be cleaned; the cleaned data is stored in JSON format, and the processed data is imported into the open-source graph database neo4j.
a query module 504, used to query the dependency relationships between fields by using the neo4j interface;
Then, relying on neo4j's strong ability to represent relationships, the field dependencies are combined.
and a second parsing module 505, used to call the API of the graph database neo4j, parse the JSON strings, and visually display the data lineage.
Finally, the API of the graph database neo4j is called and the JSON strings are parsed to visually display the data lineage.
In summary, on the basis of the above embodiment, when the LineageLogger hook function is configured in this embodiment, the hive-site.xml file may be configured by adding parameters on Hive version 2.0 or later, and the hook output is configured at the same time, which completes the configuration of the LineageLogger hook function.
FIG. 6 is a schematic structural diagram of embodiment 3 of the system for detecting the data lineage of a Hive database disclosed in the present invention. The system may include:
a configuration module 601, used to configure the hive-site.xml file by adding parameters on Hive version 2.0 or later and to configure the hook output at the same time;
When the data lineage of the Hive database needs to be detected, a LineageLogger hook function is first configured on Hive version 2.0 or later.
Specifically, taking the approach of parsing HiveSQL, the hive-site.xml file is configured by adding parameters on Hive version 2.0 or later, and the hook output is configured at the same time by modifying the hive-log4j2.properties configuration file, which completes the configuration of the LineageLogger hook function.
a first parsing module 602, used to parse the HiveSQL based on the LineageLogger hook function and generate a hive.log log;
After the LineageLogger hook function has been configured, field-level parsing of Hive operations is performed based on the configured hook, the dependency relationships among the fields are obtained, and a hive.log log is generated.
a cleaning module 603, used to clean the data of the hive.log log into JSON format and import the cleaned data into the open-source graph database neo4j;
After the hive.log log is generated, because hive.log contains a large amount of irrelevant data, it cannot show the field dependencies between tables clearly and concisely, and its data cannot be accessed directly through the neo4j interface. The hive.log therefore needs to be cleaned; the cleaned data is stored in JSON format, and the processed data is imported into the open-source graph database neo4j.
a query module 604, used to query the dependency relationships between fields by using the neo4j interface;
Then, relying on neo4j's strong ability to represent relationships, the field dependencies are combined.
and a second parsing module 605, used to call the API of the graph database neo4j through the visualization tool Tableau, parse the JSON (JavaScript Object Notation) strings, and visually display the data lineage.
Finally, the API of the graph database neo4j is called through Tableau, a visualization tool with high flexibility and dynamics, the JSON strings are parsed, and the data lineage is displayed visually.
In summary, taking the approach of parsing HiveSQL, the invention configures the LineageLogger hook function on Hive version 2.0 or later to parse HiveSQL, thereby performing field-level parsing of Hive operations, obtaining the dependency relationships between fields, and generating a log. This approach makes full use of Hive's internal parsing, improves the efficiency of parsing HiveSQL, and reduces its complexity. The field-dependency log is cleaned and preprocessed and then stored in JSON format, which is convenient for storage and querying. The JSON data file is imported into the open-source graph database neo4j, the field dependencies are combined by relying on neo4j's strong ability to represent relationships, and the data lineage is displayed visually by calling the neo4j API through Tableau, a visualization tool with high flexibility and dynamics. This reduces the overall maintenance cost of the data warehouse and reduces data quality problems.
The embodiments in this description are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the system disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is brief; for relevant details, refer to the description of the method.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (6)
1. A method for detecting the data lineage (blood margin) of a HIVE database, comprising:
configuring a LineageLogger Hook function;
analyzing the HiveSql based on the LineageLogger Hook function to generate a hive.log log;
performing data cleaning on the hive.log log to form a JSON format, and importing the cleaned data into the open-source graph database neo4j;
querying the dependency relationships between fields by using a neo4j interface;
calling the graph database neo4j API, parsing the JSON string, and visually displaying the data lineage.
2. The method of claim 1, wherein configuring the LineageLogger Hook function comprises:
configuring the hive-site.xml file by adding parameters, in Hive 2.0 or above, and at the same time configuring the Hook output.
3. The method according to claim 2, wherein calling the graph database neo4j API, parsing the JSON string, and visually displaying the data lineage comprises:
calling the graph database neo4j API through the visualization tool Tableau, parsing the JSON string, and visually displaying the data lineage.
4. A system for detecting the data lineage (blood margin) of a HIVE database, comprising:
a configuration module, used for configuring a LineageLogger Hook function;
a first analysis module, used for analyzing the HiveSql based on the LineageLogger Hook function to generate a hive.log log;
a cleaning module, used for performing data cleaning on the hive.log log to form a JSON format and importing the cleaned data into the open-source graph database neo4j;
a query module, used for querying the dependency relationships between fields by using the neo4j interface;
and a second analysis module, used for calling the graph database neo4j API, parsing the JSON string, and visually displaying the data lineage.
5. The system of claim 4, wherein the configuration module is specifically configured to:
configure the hive-site.xml file by adding parameters, in Hive 2.0 or above, and at the same time configure the Hook output.
6. The system of claim 5, wherein the second analysis module is specifically configured to:
call the graph database neo4j API through the visualization tool Tableau, parse the JSON string, and visually display the data lineage.
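The import and query steps recited in the claims could look roughly like the following Python sketch. It is illustrative only and makes several assumptions: the node label `Field`, the relationship type `DERIVES`, and all function and table names are hypothetical and do not come from the patent. The Cypher text is built as plain strings so that the example runs without a database; `upstream_fields` is a pure-Python stand-in that mirrors what the parameterised Cypher query is intended to return.

```python
import json


def merge_statement(source, target):
    """Build one idempotent Cypher MERGE for a source -> target field edge."""
    return (
        f"MERGE (s:Field {{name: {json.dumps(source)}}}) "
        f"MERGE (t:Field {{name: {json.dumps(target)}}}) "
        "MERGE (s)-[:DERIVES]->(t)"
    )


# Parameterised query: every field that feeds $name, directly or transitively.
UPSTREAM_QUERY = (
    "MATCH (s:Field)-[:DERIVES*1..]->(t:Field {name: $name}) "
    "RETURN DISTINCT s.name AS upstream"
)


def upstream_fields(edges, name):
    """Pure-Python stand-in for UPSTREAM_QUERY (no database needed)."""
    parents = {}
    for src, dst in edges:
        parents.setdefault(dst, set()).add(src)
    seen, stack = set(), [name]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return sorted(seen)


# Hypothetical cleaned edges (source field, target field).
EDGES = [
    ("ods.orders.amount", "dwd.orders.amount"),
    ("dwd.orders.amount", "dws.sales.total"),
]

statements = [merge_statement(src, dst) for src, dst in EDGES]
lineage = upstream_fields(EDGES, "dws.sales.total")
```

Using `MERGE` rather than `CREATE` keeps the import idempotent, so re-importing the same cleaned log would not duplicate field nodes or dependency edges.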
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110211183.1A CN112925777A (en) | 2021-02-25 | 2021-02-25 | Method and system for detecting data blood margin of HIVE database |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112925777A (en) | 2021-06-08 |
Family
ID=76171788
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110211183.1A Pending CN112925777A (en) | 2021-02-25 | 2021-02-25 | Method and system for detecting data blood margin of HIVE database |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112925777A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113343036A (en) * | 2021-08-04 | 2021-09-03 | 杭州远眺科技有限公司 | Data blood relationship analysis method and system based on key topological structure analysis |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109446279A (en) * | 2018-10-15 | 2019-03-08 | 顺丰科技有限公司 | Based on neo4j big data genetic connection management method, system, equipment and storage medium |
CN110232056A (en) * | 2019-05-21 | 2019-09-13 | 苏宁云计算有限公司 | A kind of the blood relationship analytic method and its tool of structured query language |
CN111538743A (en) * | 2020-04-22 | 2020-08-14 | 电子科技大学 | SQL-based data blood relationship analysis method and system |
CN112084270A (en) * | 2020-09-17 | 2020-12-15 | 腾讯科技(深圳)有限公司 | Data blood margin processing method and device, storage medium and equipment |
Non-Patent Citations (1)
Title |
---|
阿武Z: "HIVE field-level lineage analysis, written to Neo4j", 《HTTPS://BLOG.CSDN.NET/XW514124202/ARTICLE/DETAILS/94029564》 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11968264B2 (en) | Systems and methods for operation management and monitoring of bots | |
CN108847994B (en) | Alarm positioning method, device, equipment and storage medium based on data analysis | |
US11537492B1 (en) | Application topology graph for representing instrumented and uninstrumented objects in a microservices-based architecture | |
CN110232024B (en) | Software automation test framework and test method | |
EP3757793A1 (en) | Machine-assisted quality assurance and software improvement | |
US8978016B2 (en) | Error list and bug report analysis for configuring an application tracer | |
US20150347283A1 (en) | Multiple Tracer Configurations Applied on a Function-by-Function Level | |
US20140317454A1 (en) | Tracer List for Automatically Controlling Tracer Behavior | |
US20140317605A1 (en) | User Interaction Analysis of Tracer Data for Configuring an Application Tracer | |
US20140317604A1 (en) | Real Time Analysis of Tracer Summaries to Change Tracer Behavior | |
US10116534B2 (en) | Systems and methods for WebSphere MQ performance metrics analysis | |
US8688729B2 (en) | Efficiently collecting transaction-separated metrics in a distributed enviroment | |
US20160266998A1 (en) | Error list and bug report analysis for configuring an application tracer | |
US9588869B2 (en) | Computer implemented system and method of instrumentation for software applications | |
US20130047169A1 (en) | Efficient Data Structure To Gather And Distribute Transaction Events | |
Raemaekers et al. | An analysis of dependence on third-party libraries in open source and proprietary systems | |
WO2017172669A2 (en) | Tagged tracing, logging and performance measurements | |
US20180143897A1 (en) | Determining idle testing periods | |
CN110837496A (en) | Data quality management method and system based on dynamic sql | |
CN113762914A (en) | Early warning auditing method and related equipment | |
CN112925777A (en) | Method and system for detecting data blood margin of HIVE database | |
Zhao et al. | Research on international standardization of software quality and software testing | |
KR101830936B1 (en) | Performance Improving System Based Web for Database and Application | |
US20220294710A1 (en) | Automatic automation recommendation | |
Chakraborty et al. | Architecture of a modern monitoring system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210608 |