CN110851325A - Method, device and equipment for monitoring data warehouse based on Hive table - Google Patents
- Publication number
- CN110851325A (application CN201911089755.2A)
- Authority
- CN
- China
- Prior art keywords
- hive
- data warehouse
- hive table
- generation process
- time period
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/302—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Quality & Reliability (AREA)
- Data Mining & Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method, a device and equipment for monitoring a data warehouse based on a Hive table. The method comprises the following steps: responding to a user's instruction to monitor the data warehouse within a first preset time period, wherein the data warehouse comprises a configuration database; acquiring a Hive metadata database in the configuration database, and acquiring, according to the Hive metadata database, the generation process of a Hive table within a second preset time period; analyzing, according to that generation process, whether an error occurred while the Hive table was generated within the second preset time period; and judging whether the data in the data warehouse is abnormal according to whether an error was found. Because the data warehouse is monitored through the generation process of the Hive table, efficiency is improved and the cumbersome work of writing scripts to monitor the data warehouse is avoided.
Description
Technical Field
The invention relates to the field of data processing, in particular to a method, a device and equipment for monitoring a data warehouse based on a Hive table.
Background
Hive is a Hadoop-based data warehouse tool that maps structured data files onto database tables, provides a simple SQL query function, and converts SQL statements into MapReduce tasks for execution. Its advantages are a low learning cost and the ability to implement simple MapReduce statistics quickly through SQL-like statements without developing a dedicated MapReduce application, which makes it well suited to statistical analysis in a data warehouse.
In order to monitor the data warehouse, the Hive tables generated each day need to be acquired daily from the Hive metadata base of the data warehouse. In the prior art, monitoring of the Hive table is usually done by writing a script; because the daily monitoring is implemented in scripts, the code becomes long and the scripts are hard to maintain. That is, the prior art does not use the generation process of the Hive table to monitor the data warehouse.
Disclosure of Invention
The invention mainly aims to provide a method, a device and equipment for monitoring a data warehouse based on a Hive table, and aims to solve the problem that the data warehouse is not monitored by adopting the generation process of the Hive table in the related art.
A method of monitoring a data warehouse based on Hive tables, comprising:
responding to a user's instruction to monitor the data warehouse within a first preset time period; wherein the data warehouse comprises a configuration database;
acquiring a Hive metadata database in a configuration database, and acquiring a generation process of a Hive table within a second preset time period according to the Hive metadata database in the configuration database;
analyzing, according to the generation process of the Hive table in the second preset time period, whether an error occurred in that generation process;
and judging whether the data in the data warehouse is abnormal according to the result of analyzing whether an error occurred.
Preferably, the step of responding to the instruction of the user to monitor the data warehouse within the first preset time period is preceded by:
acquiring a data warehouse required to be monitored;
presetting at least one configuration database in the data warehouse;
configuring a Hive table to be monitored in a configuration database according to preset rules.
Preferably, the step of configuring the Hive table to be monitored in the configuration database according to the preset rule includes:
scanning and extracting fields of the Hive table;
according to the fields extracted by the Hive table, performing descending processing on the Hive table;
and configuring the Hive table subjected to descending processing in the configuration database through a preset rule.
Preferably, after the step of obtaining the Hive metadata database in the configuration database and obtaining the generation process of the Hive table within the second preset time period according to the Hive metadata database in the configuration database, the method further includes:
according to the generation process of the Hive table in a second preset time period, at least one of the number of generated records, the generation mode and the size of a generated file in the generation process is obtained;
matching the determined at least one generation process with a corresponding preset threshold;
judging whether the determined at least one generation process reaches a preset threshold value;
determining that the data warehouse is in a normal state under the condition that at least one determined generation process reaches a preset threshold value;
and determining that the data warehouse is in an abnormal state under the condition that the determined at least one generation process does not reach the preset threshold value.
Preferably, the step of analyzing whether the generation process of the Hive table in the second preset time period has an error according to the generation process of the Hive table in the second preset time period includes:
acquiring a character set of the Hive table within a second preset time period;
analyzing whether the character set in the Hive table contains garbled characters;
and when the character set in the Hive table contains garbled characters, determining that an error has occurred in the Hive table and outputting error information.
Preferably, after the step of determining that an error has occurred in the Hive table and outputting error information when the character set in the Hive table contains garbled characters, the method includes:
responding to a Hive table repair instruction input by the user, and configuring a MySQL character set in the corresponding Hive table.
Preferably, the method of monitoring a data warehouse further comprises:
acquiring a plurality of Hive tables based on a configuration database, and selecting at least two Hive tables according to the Hive tables;
respectively determining fields according to the selected at least two Hive tables;
comparing the fields of the two Hive tables, and outputting a comparison result;
and determining whether the data warehouse is abnormal or not according to the comparison result.
The invention also provides a device for monitoring the data warehouse based on the Hive table, which comprises the following components:
the response module is used for responding to an instruction of monitoring the data warehouse within a first preset time period;
the acquisition module is used for acquiring a Hive metadata base in a configuration database and acquiring a generation process of a Hive table within a second preset time period according to the Hive metadata base in the configuration database;
the analysis module is used for analyzing whether the generation process of the Hive table in the second preset time period has an error according to the generation process of the Hive table in the second preset time period;
and the judging module is used for judging whether the data in the data warehouse is abnormal according to the result of analyzing whether an error occurred.
The invention also provides a device for monitoring a data warehouse based on a Hive table, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor executes the computer program to realize the steps of the method for monitoring the data warehouse based on the Hive table.
The present invention also provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method for monitoring a data warehouse based on a Hive table as described above.
The method for monitoring the data warehouse based on the Hive table, provided by the invention, at least has the following beneficial effects:
The method responds to a user's instruction to monitor the data warehouse within a first preset time period, wherein the data warehouse comprises a configuration database; acquires a Hive metadata base in the configuration database and, according to it, the generation process of a Hive table within a second preset time period; analyzes, according to that generation process, whether an error occurred while the Hive table was generated within the second preset time period; and judges whether the data in the data warehouse is abnormal according to whether an error was found. Because the data warehouse is monitored through the generation process of the Hive table, efficiency is improved and the cumbersome work of writing scripts to monitor the data warehouse is avoided.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the structures shown in the drawings without creative efforts.
Fig. 1 is a schematic flow chart of a method for monitoring a data warehouse based on a Hive table according to an embodiment of the present invention;
Fig. 2 is an application scenario diagram of a method for monitoring a data warehouse based on a Hive table according to an embodiment of the present invention;
Fig. 3 is a schematic flow chart of the process before step S10 shown in Fig. 1;
Fig. 4 is a schematic flow chart of the process after step S20 shown in Fig. 1;
Fig. 5 is a schematic flow chart of a method for monitoring a data warehouse according to another embodiment of the present invention;
Fig. 6 is a schematic diagram of an apparatus for monitoring a data warehouse based on a Hive table according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of an apparatus for monitoring a data warehouse based on a Hive table according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," and "third," etc. in the description and claims of the present invention and the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "comprises" and any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
As shown in fig. 1, a method for monitoring a data warehouse based on a Hive table includes:
Step S10, responding to a user's instruction to monitor the data warehouse within a first preset time period; wherein the data warehouse comprises a configuration database;
In an embodiment of the present invention, the server stores a large amount of data, which is managed through a data warehouse. The data warehouse is composed of a database, metadata, a data mart, a data extraction tool, a data warehouse management system, an information publishing system and an access tool. In other words, the data warehouse is a database that integrates various application systems and provides a solid platform for unified analysis of historical data.
In this embodiment, the first preset time period may be 00:00 to 24:00 every Friday. Accordingly, the user may input the monitoring instruction at any time on Friday, so that the data warehouse is monitored and managed regularly.
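As an illustration of the first preset time period, the following minimal sketch checks whether a monitoring instruction arrives on a Friday. The function names `in_first_period` and `handle_monitor_instruction` and the returned strings are hypothetical; the embodiment only states that the period may be all day Friday.

```python
from datetime import datetime

FRIDAY = 4  # datetime.weekday() counts Monday as 0, so Friday is 4


def in_first_period(now: datetime) -> bool:
    """Return True when `now` falls on a Friday (00:00 up to 24:00)."""
    return now.weekday() == FRIDAY


def handle_monitor_instruction(now: datetime) -> str:
    """Accept the user's instruction only inside the first preset time period."""
    if in_first_period(now):
        return "accepted"
    return "rejected: outside the first preset time period"
```

For example, an instruction received at 13:00 on Friday 8 November 2019 would be accepted, while one received on the following Saturday would be rejected.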
Step S20, acquiring a Hive metadata database in the configuration database, and acquiring a generation process of the Hive table within a second preset time period according to the Hive metadata database in the configuration database;
In a specific embodiment of the present invention, after receiving the instruction to monitor the data warehouse, the server acquires the Hive metadata base in the configuration database, so as to obtain the basic attributes and the generation process of the Hive table from it. Here the second preset time period runs from 00:00 on Monday to 24:00 on Friday of each week, and the server acquires from the Hive metadata base the generation process of the Hive table within that period. That is, when the server monitors the Hive table in the configuration database during the first preset time period, the generation process can be monitored at the same time.
The configuration database configured in the data warehouse comprises a Hive metadata base, specifically, Hive is a data warehouse tool based on Hadoop, and can map a structured data file into a table and provide a query function of SQL-like statements; the Hive metadata base is a basic element of Hive and mainly comprises basic attributes of Hive tables, such as database names, table names, field names and types, partition fields and types of Hive tables, partitions of tables, attribute locations of partitions, and the like.
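One way to read step S20 is as a query against the relational tables of the Hive metadata base. The sketch below is only an illustration under stated assumptions: it treats a table's creation time as a proxy for its "generation process", it uses the `TBLS` and `DBS` tables and the `CREATE_TIME` column (epoch seconds) of the Hive metastore schema, and it uses an in-memory SQLite database as a stand-in for the real backing database. The helper names `second_period`, `tables_created_between` and `demo_metastore` are hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta


def second_period(now):
    """Monday 00:00 up to (not including) Saturday 00:00 of the week containing `now`."""
    monday = (now - timedelta(days=now.weekday())).replace(
        hour=0, minute=0, second=0, microsecond=0)
    return monday, monday + timedelta(days=5)


def tables_created_between(conn, start, end):
    """List (database, table) pairs whose CREATE_TIME falls inside [start, end)."""
    rows = conn.execute(
        "SELECT d.NAME, t.TBL_NAME FROM TBLS t "
        "JOIN DBS d ON t.DB_ID = d.DB_ID "
        "WHERE t.CREATE_TIME >= ? AND t.CREATE_TIME < ? "
        "ORDER BY t.CREATE_TIME",
        (int(start.timestamp()), int(end.timestamp())),
    )
    return [(db, tbl) for db, tbl in rows]


def demo_metastore():
    """Build a tiny stand-in metastore with one table inside the window and one outside."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE DBS (DB_ID INTEGER PRIMARY KEY, NAME TEXT)")
    conn.execute("CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, DB_ID INTEGER, "
                 "TBL_NAME TEXT, CREATE_TIME INTEGER)")
    conn.execute("INSERT INTO DBS VALUES (1, 'dw')")
    inside = int(datetime(2019, 11, 6, 3, 0).timestamp())    # Wednesday
    outside = int(datetime(2019, 11, 10, 3, 0).timestamp())  # Sunday
    conn.execute("INSERT INTO TBLS VALUES (1, 1, 'orders', ?)", (inside,))
    conn.execute("INSERT INTO TBLS VALUES (2, 1, 'users', ?)", (outside,))
    return conn
```

Calling `tables_created_between(demo_metastore(), *second_period(datetime(2019, 11, 8, 10, 0)))` returns only the `orders` table, because `users` was created on the Sunday after the window closed.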
Step S30, analyzing, according to the generation process of the Hive table in the second preset time period, whether an error occurred in that generation process;
In a specific embodiment of the present invention, after the server acquires the generation process of the Hive table within the second preset time period, it can determine from that generation process whether an error occurred while the Hive table was generated. In general, an error in generating a Hive table shows up as garbled characters in its character set.
Therefore, the step S30 of analyzing whether the generation process of the Hive table in the second preset time period has an error according to the generation process of the Hive table in the second preset time period includes:
step S31, acquiring a character set of the Hive table in a second preset time period;
step S32, analyzing whether the character set in the Hive table contains garbled characters;
and step S33, when the character set in the Hive table contains garbled characters, determining that an error has occurred in the Hive table and outputting error information.
In the specific embodiment of the present invention, while monitoring the data warehouse, the server uses the above steps to determine whether an error occurred in the generation process of the Hive table. In general, it determines the character set of the Hive table within the second preset time period and analyzes whether that character set contains garbled characters. When it does, the server determines that an error occurred in the generation process of the Hive table and sends error information so that the user can correct it.
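Steps S31 to S33 can be sketched as a simple text check. The patent does not say how garbled characters are recognized, so the heuristic below is an assumption: a string is flagged if it contains the Unicode replacement character, or if it looks like UTF-8 bytes that were wrongly decoded as Latin-1. The names `has_garbled_text` and `check_table` and the error message format are hypothetical.

```python
from typing import Dict, List

REPLACEMENT_CHAR = "\ufffd"  # inserted by decoders when bytes cannot be decoded


def has_garbled_text(value: str) -> bool:
    """Heuristically detect garbled text in a metadata string."""
    if REPLACEMENT_CHAR in value:
        return True
    try:
        # UTF-8 text mis-decoded as Latin-1 round-trips back into different text.
        repaired = value.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not Latin-1 encodable, or not valid UTF-8 bytes: treat as clean.
        return False
    return repaired != value


def check_table(table: str, comments: Dict[str, str]) -> List[str]:
    """Return one error message per column whose comment looks garbled."""
    return [
        f"{table}.{column}: garbled comment {text!r}"
        for column, text in comments.items()
        if has_garbled_text(text)
    ]
```

A heuristic of this kind cannot catch every case (for instance, text already replaced by question marks), so a real deployment would likely combine it with checks specific to the character sets in use.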
Specifically, after step S33 of determining that an error has occurred in the Hive table and outputting error information when the character set in the Hive table contains garbled characters, the method includes:
step S34, responding to the Hive table repair instruction input by the user, and configuring a MySQL character set in the corresponding Hive table.
In this embodiment, the error information of the Hive table is sent to the user; by responding to the Hive table repair instruction input by the user, the server receives the MySQL character set input by the user, so that the server can repair the error in the Hive table and prevent the data warehouse from becoming abnormal.
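The patent does not spell out what "configuring a MySQL character set" involves. One plausible reading, assumed here, is that the Hive metadata base is stored in MySQL and the repair changes the character set of the metadata columns that hold the garbled text. The sketch below only builds MySQL `ALTER TABLE ... MODIFY COLUMN ... CHARACTER SET` statements; the function name `charset_fix_statements` is hypothetical, and the example target (`COLUMNS_V2.COMMENT` as `varchar(256)`) is an assumption that should be checked against the actual metastore schema before use.

```python
import re
from typing import List, Tuple

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_TYPE = re.compile(r"^[A-Za-z]+(\(\d+\))?$")


def charset_fix_statements(targets: List[Tuple[str, str, str]],
                           charset: str = "utf8") -> List[str]:
    """Build one ALTER TABLE statement per (table, column, column_type) target."""
    statements = []
    for table, column, column_type in targets:
        # Reject anything that is not a plain identifier to avoid SQL injection.
        for name in (table, column, charset):
            if not _IDENT.match(name):
                raise ValueError(f"unsafe identifier: {name!r}")
        if not _TYPE.match(column_type):
            raise ValueError(f"unsupported column type: {column_type!r}")
        statements.append(
            f"ALTER TABLE `{table}` MODIFY COLUMN `{column}` "
            f"{column_type} CHARACTER SET {charset};"
        )
    return statements
```

The statements are returned rather than executed so that the user who issued the repair instruction can review them first.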
Step S40, judging whether the data in the data warehouse is abnormal according to the result of analyzing whether an error occurred.
In an embodiment of the present invention, the result of step S30, namely whether an error occurred in the Hive table, is used to determine whether the data warehouse is abnormal: if an error occurred in the generation process of the Hive table, the data warehouse is determined to be abnormal and the error is repaired accordingly. In this way the data warehouse is monitored on the basis of whether the Hive table has an error.
As shown in fig. 2, the above embodiment involves a server, a user and a data warehouse. A configuration database is configured in the data warehouse in advance, and the Hive metadata base is called from the configuration database, so that the server can make its determination through the Hive tables recorded in the Hive metadata base. Specifically, after the user inputs an instruction to monitor the data warehouse, the server determines whether an error occurred in the generation process of a Hive table in the Hive metadata base, thereby monitoring the data warehouse.
As shown in fig. 3, before the step of responding to the instruction of the user to monitor the data warehouse within the first preset time period in step S10, the method includes:
s01, acquiring a data warehouse needing to be monitored;
s02, presetting at least one configuration database in the data warehouse;
and S03, configuring the Hive table to be monitored in the configuration database according to the preset rule.
In a specific embodiment of the present invention, in order to monitor the data warehouse for anomalies, at least one configuration database may be configured in advance, as needed, in the data warehouse to be monitored, and the Hive table to be monitored is configured in the configuration database, so that the data warehouse is monitored through the generation process of the Hive table. Specifically, the preset rule may be an Oracle connection configuration, a MySQL connection configuration, a SQL Server connection configuration, a DB2 connection configuration, a PostgreSQL connection configuration, or a Sybase connection configuration.
Further, the step S03 of configuring the Hive table to be monitored in the configuration database according to the preset rule includes:
step one, scanning and extracting fields of a Hive table;
step two, performing descending processing on the Hive table according to the fields extracted by the Hive table;
and step three, configuring the Hive table subjected to descending processing in a configuration database through a preset rule.
In a specific embodiment of the present invention, to configure the Hive table in the configuration database, the fields of the Hive table are first scanned, then extracted and processed in descending order. The scanning may be arranged according to actual needs: for example, when there are many fields they are scanned in separate passes, and when there are few fields they are all scanned at once. The fields are then processed in descending order and the Hive table is configured in the configuration database.
As shown in fig. 4, after the step of acquiring the Hive metadata database in the configuration database in step S20, and acquiring the generation process of the Hive table within the second preset time period according to the Hive metadata database in the configuration database, the method further includes:
step S21, acquiring at least one of the number of generated records, the generation mode and the size of the generated file in the generation process according to the generation process of the Hive table in a second preset time period;
step S22, matching the at least one determined generation process with a corresponding preset threshold;
step S23, judging whether the at least one determined generation process reaches a preset threshold value;
step S24, determining that the data warehouse is in a normal state under the condition that at least one determined generation process reaches a preset threshold value;
and step S25, determining that the data warehouse is in an abnormal state under the condition that the determined at least one generation process does not reach the preset threshold value.
In the specific embodiment of the invention, the generation process of the Hive table includes the number of generated records, the generation mode and the size of the generated file, and whether the data warehouse is abnormal is judged from at least one of them. For example, if the number of generated records is A and the preset threshold for the number of generated records is H, the data warehouse is determined to be in a normal state when A reaches H; when A exceeds or is lower than H, the data warehouse is in an abnormal state.
In practice, in order to match the at least one determined generation process with a corresponding preset threshold, a preset threshold is set for each generation process. For example, the preset threshold for the generated file size may be 100 GB, in which case the determined generated file size needs to reach 100 GB. The thresholds may be adjusted according to actual needs and are not limited herein.
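Steps S21 to S25 amount to comparing measured values with configured thresholds. The sketch below is a minimal illustration that interprets "reaches the preset threshold" as "is at least the threshold"; that interpretation, the metric names `record_count` and `file_size_bytes`, the decimal definition of a gigabyte, and the function name `warehouse_state` are all assumptions rather than details given in the patent.

```python
from typing import Dict

GB = 10 ** 9  # decimal gigabyte; the patent does not say which definition applies


def warehouse_state(metrics: Dict[str, float], thresholds: Dict[str, float]) -> str:
    """Return "normal" if every measured metric reaches its threshold, else "abnormal"."""
    checked = [name for name in thresholds if name in metrics]
    if not checked:
        raise ValueError("no measured metric has a configured threshold")
    for name in checked:
        if metrics[name] < thresholds[name]:
            return "abnormal"
    return "normal"
```

With thresholds of 100 GB for file size and 1000 for record count, a table whose generated file is 120 GB is reported as normal, while one with an 80 GB file is reported as abnormal even if its record count is sufficient. If the stricter reading of the example above is intended (any deviation from H is abnormal), the comparison would become an equality or tolerance check instead.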
In addition, as shown in fig. 5, the method for monitoring a data warehouse further includes:
and step 400, determining whether the data warehouse is abnormal according to the comparison result.
In the specific embodiment of the present invention, the monitoring method may select two Hive tables in the configuration database and compare their fields. For example, the fields of Hive table 1 are determined to be user ID, order and style, and the fields of Hive table 2 are determined to be user ID, price and order; comparing the fields of the two Hive tables determines whether the generation process of the Hive tables is correct, and hence whether the data warehouse is abnormal.
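The field comparison can be sketched with set operations. The patent does not state which comparison outcome counts as abnormal, so the rule used below (a list of fields that both tables are expected to share) is an assumption, as are the function names `compare_fields` and `is_abnormal` and the snake_case field names.

```python
from typing import Dict, Iterable, List


def compare_fields(fields_a: Iterable[str], fields_b: Iterable[str]) -> Dict[str, List[str]]:
    """Report which fields two Hive tables share and which are unique to each."""
    a, b = set(fields_a), set(fields_b)
    return {
        "shared": sorted(a & b),
        "only_in_first": sorted(a - b),
        "only_in_second": sorted(b - a),
    }


def is_abnormal(result: Dict[str, List[str]], required_shared: Iterable[str]) -> bool:
    """Flag an anomaly when an expected shared field is missing from either table."""
    return not set(required_shared) <= set(result["shared"])
```

Using the example from the text, comparing `["user_id", "order", "style"]` with `["user_id", "price", "order"]` reports `order` and `user_id` as shared, `style` as unique to the first table and `price` as unique to the second.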
The data warehouse generally stores a user's decoration style data, price intention data, preset information data, area size data and the like. Managing the data warehouse and monitoring it based on the Hive table ensures the safety and stability of this data, improves efficiency when the user checks the data, and prevents anomalies in the data warehouse from going undetected.
Therefore, the above embodiment of the present invention responds to an instruction to monitor the data warehouse within a first preset time period, wherein the data warehouse comprises a configuration database; acquires a Hive metadata base in the configuration database and, according to it, the generation process of a Hive table within a second preset time period; analyzes, according to that generation process, whether an error occurred while the Hive table was generated within the second preset time period; and judges whether the data in the data warehouse is abnormal according to whether an error was found. Because the data warehouse is monitored through the generation process of the Hive table, efficiency is improved and the cumbersome work of writing scripts to monitor the data warehouse is avoided.
As shown in fig. 6, the present invention further provides an apparatus 2 for monitoring a data warehouse based on Hive tables, including:
the response module 21 is configured to respond to an instruction of a user to monitor the data warehouse within a first preset time period;
the obtaining module 22 is configured to obtain a Hive metadata base in the configuration database, and obtain a generation process of the Hive table within a second preset time period according to the Hive metadata base in the configuration database;
the analysis module 23 is configured to analyze whether an error occurs in the generation process of the Hive table in the second preset time period according to the generation process of the Hive table in the second preset time period;
and the judging module 24 is configured to judge whether the data in the data warehouse is abnormal according to the result of analyzing whether an error occurred.
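The four modules above can be pictured as stages of one pipeline. The following sketch is only a structural illustration: the class name `HiveWarehouseMonitor`, its callable parameters and the returned strings are hypothetical, and the patent does not prescribe any particular programming interface.

```python
from datetime import datetime
from typing import Callable, List


class HiveWarehouseMonitor:
    """Wires the response, obtaining, analysis and judging stages together."""

    def __init__(self,
                 in_first_period: Callable[[datetime], bool],
                 fetch_generation: Callable[[datetime], List[str]],
                 find_errors: Callable[[List[str]], List[str]]) -> None:
        self.in_first_period = in_first_period    # response module 21
        self.fetch_generation = fetch_generation  # obtaining module 22
        self.find_errors = find_errors            # analysis module 23

    def run(self, now: datetime) -> str:
        """Judging module 24: map the analysis result to a warehouse state."""
        if not self.in_first_period(now):
            return "skipped"
        records = self.fetch_generation(now)
        errors = self.find_errors(records)
        return "abnormal" if errors else "normal"
```

Passing the stages in as callables keeps each module replaceable, for example swapping a garbled-character check for a threshold check in the analysis stage.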
As shown in fig. 7, the present invention further provides an apparatus for monitoring a data warehouse based on a Hive table, which includes a memory 11, a processor 13, and a computer program 12 stored in the memory 11 and executable on the processor 13, where the processor 13 executes the computer program 12 to implement the steps of the method for monitoring a data warehouse based on a Hive table as described above.
Specifically, the processor 13 implements the following steps when executing the computer program 12: responding to an instruction of monitoring the data warehouse within a first preset time period by a user; wherein the data warehouse comprises a configuration database; acquiring a Hive metadata base in a configuration database, and acquiring a generation process of a Hive table within a second preset time period according to the Hive metadata base in the configuration database; analyzing whether the generation process of the Hive table in the second preset time period has an error or not according to the generation process of the Hive table in the second preset time period; and judging whether the data in the data warehouse is abnormal or not according to the result of whether the error report is generated or not.
Specifically, the processor 13 implements the following steps when executing the computer program 12: acquiring a data warehouse required to be monitored; presetting at least one configuration database in a data warehouse; configuring a Hive table to be monitored in a configuration database according to preset rules.
Specifically, the processor 13 implements the following steps when executing the computer program 12: scanning and extracting fields of the Hive table; according to the fields extracted from the Hive table, performing descending processing on the Hive table; and configuring the Hive table subjected to descending processing in a configuration database through a preset rule.
Specifically, the processor 13 implements the following steps when executing the computer program 12: acquiring at least one of the number of generated records, the generation mode and the size of a generated file in the generation process according to the generation process of the Hive table in a second preset time period; matching the determined at least one generation process with a corresponding preset threshold; judging whether the determined at least one generation process reaches a preset threshold value; determining that the data warehouse is in a normal state under the condition that at least one determined generation process reaches a preset threshold value; and determining that the data warehouse is in an abnormal state under the condition that the determined at least one generation process does not reach the preset threshold value.
Specifically, the processor 13 implements the following steps when executing the computer program 12: acquiring the character set of the Hive table within the second preset time period; analyzing whether the Hive table contains garbled characters under that character set; and, when garbled characters are found, determining that the Hive table has an error and outputting error report information.
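The text does not say how garbled characters are recognised. One possible heuristic, shown here purely as an assumption, flags a value if it contains the Unicode replacement character or if it looks like UTF-8 text that was wrongly decoded as Latin-1. The function names `looks_garbled` and `find_garbled_rows` are hypothetical.

```python
from typing import Iterable, List


def looks_garbled(text: str) -> bool:
    """Heuristically detect mojibake in a single string value."""
    # U+FFFD is what decoders emit when they hit bytes they cannot interpret.
    if "\ufffd" in text:
        return True
    try:
        # If re-encoding as Latin-1 and decoding as UTF-8 succeeds and yields a
        # different string, the value was probably UTF-8 read with the wrong charset.
        repaired = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False
    return repaired != text


def find_garbled_rows(values: Iterable[str]) -> List[int]:
    """Return the positions of values that look garbled."""
    return [i for i, value in enumerate(values) if looks_garbled(value)]
```

A non-empty result from `find_garbled_rows` on a sample of a table's string columns would correspond to the error report described above. The heuristic covers only these two failure patterns and would miss other encoding mix-ups.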
Specifically, the processor 13 implements the following step when executing the computer program 12: in response to a Hive table repair instruction input by a user, configuring a MySQL character set for the corresponding Hive table.
Specifically, the processor 13 implements the following steps when executing the computer program 12: acquiring a plurality of Hive tables based on the configuration database, and selecting at least two of them; determining the fields of each selected Hive table; comparing the fields of the two Hive tables and outputting a comparison result; and determining whether the data warehouse is abnormal according to the comparison result.
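The field comparison can be illustrated with a short sketch. It assumes each table's fields are available as a mapping from column name to type name, and it treats any difference as a sign of an abnormal warehouse. Both points are assumptions, since the text does not define the comparison rule. `compare_fields` and `is_abnormal` are hypothetical names.

```python
from typing import Dict, List


def compare_fields(left: Dict[str, str], right: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two tables' fields, given as {column name: type name} mappings."""
    shared = left.keys() & right.keys()
    return {
        "only_in_left": sorted(left.keys() - right.keys()),
        "only_in_right": sorted(right.keys() - left.keys()),
        # Type names are compared case-insensitively, e.g. "STRING" vs "string".
        "type_mismatch": sorted(
            name for name in shared if left[name].lower() != right[name].lower()
        ),
    }


def is_abnormal(result: Dict[str, List[str]]) -> bool:
    """Treat any reported difference as an abnormal state (assumed rule)."""
    return any(result.values())
```

For example, comparing a source table with the table derived from it would surface a dropped column under `only_in_left` and a changed type under `type_mismatch`.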
That is, in this embodiment of the present invention, when the processor 13 of the device 1 for monitoring a data warehouse based on a Hive table executes the computer program 12, it implements the steps of the method for monitoring a data warehouse based on a Hive table. The data warehouse is monitored through the generation process of the Hive table, which improves efficiency and avoids the cumbersome work of writing scripts to monitor the data warehouse.
It should be noted that, since the processor 13 of the device 1 for monitoring a data warehouse based on a Hive table implements the steps of the method for monitoring a data warehouse based on a Hive table when executing the computer program 12, all the embodiments of the method for monitoring a data warehouse based on a Hive table are applicable to the device 1 for monitoring a data warehouse based on a Hive table, and can achieve the same or similar beneficial effects.
The present invention also provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method for monitoring a data warehouse based on a Hive table as described above.
Specifically, in one embodiment of the invention, the computer program stored on the computer-readable storage medium, when executed by a processor, performs the following steps: responding to a user's instruction to monitor the data warehouse within a first preset time period, wherein the data warehouse comprises a configuration database; acquiring the Hive metadata database in the configuration database, and acquiring, according to that Hive metadata database, the generation process of a Hive table within a second preset time period; analyzing, according to that generation process, whether the generation of the Hive table within the second preset time period reported an error; and judging whether the data in the data warehouse is abnormal according to whether an error was reported.
Specifically, in one embodiment of the invention, the computer program stored on the computer-readable storage medium, when executed by a processor, performs the following steps: acquiring the data warehouse to be monitored; presetting at least one configuration database in the data warehouse; and configuring the Hive table to be monitored in the configuration database according to preset rules.
Specifically, in one embodiment of the invention, the computer program stored on the computer-readable storage medium, when executed by a processor, performs the following steps: scanning and extracting the fields of the Hive table; performing descending processing on the Hive table according to the extracted fields; and configuring the Hive table subjected to descending processing in the configuration database according to a preset rule.
Specifically, in one embodiment of the invention, the computer program stored on the computer-readable storage medium, when executed by a processor, performs the following steps: acquiring, according to the generation process of the Hive table within the second preset time period, at least one of the number of generated records, the generation mode, and the size of the generated file; matching the at least one acquired item with its corresponding preset threshold; judging whether the at least one acquired item reaches the preset threshold; determining that the data warehouse is in a normal state when the at least one acquired item reaches the preset threshold; and determining that the data warehouse is in an abnormal state when the at least one acquired item does not reach the preset threshold.
Specifically, in one embodiment of the invention, the computer program stored on the computer-readable storage medium, when executed by a processor, performs the following steps: acquiring the character set of the Hive table within the second preset time period; analyzing whether the Hive table contains garbled characters under that character set; and, when garbled characters are found, determining that the Hive table has an error and outputting error report information.
Specifically, in one embodiment of the invention, the computer program stored on the computer-readable storage medium, when executed by a processor, performs the following step: in response to a Hive table repair instruction input by a user, configuring a MySQL character set for the corresponding Hive table.
Specifically, in one embodiment of the invention, the computer program stored on the computer-readable storage medium, when executed by a processor, performs the following steps: acquiring a plurality of Hive tables based on the configuration database, and selecting at least two of them; determining the fields of each selected Hive table; comparing the fields of the two Hive tables and outputting a comparison result; and determining whether the data warehouse is abnormal according to the comparison result.
That is, in this embodiment of the present invention, the computer program, when executed by a processor, implements the steps of the above method for monitoring a data warehouse based on a Hive table. The data warehouse is monitored through the generation process of the Hive table, which improves efficiency and avoids the cumbersome work of writing scripts to monitor the data warehouse.
It should be noted that, since the computer program is executed by the processor to implement the steps of the method for monitoring a data warehouse based on a Hive table, all the embodiments of the method for monitoring a data warehouse based on a Hive table are applicable to the computer-readable storage medium, and can achieve the same or similar advantages.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a smart speaker, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention, and all modifications and equivalents of the present invention, which are made by the contents of the present specification and the accompanying drawings, or directly/indirectly applied to other related technical fields, are included in the scope of the present invention.
Claims (10)
1. A method for monitoring a data warehouse based on Hive tables is characterized by comprising the following steps:
responding to a user's instruction to monitor the data warehouse within a first preset time period, wherein the data warehouse comprises a configuration database;
acquiring a Hive metadata database in the configuration database, and acquiring a generation process of a Hive table within a second preset time period according to the Hive metadata database in the configuration database;
analyzing, according to the generation process of the Hive table within the second preset time period, whether that generation process has reported an error;
and judging whether the data in the data warehouse is abnormal according to whether an error was reported.
2. The method for monitoring a data warehouse based on Hive tables according to claim 1, wherein the step of responding to a user's instruction to monitor the data warehouse within a first preset time period is preceded by the steps of:
acquiring a data warehouse required to be monitored;
presetting at least one configuration database in the data warehouse;
configuring a Hive table to be monitored in a configuration database according to preset rules.
3. The method for monitoring a data warehouse based on Hive tables according to claim 2, wherein the step of configuring the Hive tables to be monitored in the configuration database according to preset rules comprises:
scanning and extracting fields of the Hive table;
according to the fields extracted by the Hive table, performing descending processing on the Hive table;
and configuring the Hive table subjected to descending processing in the configuration database through a preset rule.
4. The method for monitoring a data warehouse based on Hive tables according to claim 1, wherein the step of obtaining the Hive metadata database in the configuration database and obtaining the generation process of Hive tables within a second preset time period according to the Hive metadata database in the configuration database further comprises:
according to the generation process of the Hive table in a second preset time period, at least one of the number of generated records, the generation mode and the size of a generated file in the generation process is obtained;
matching the determined at least one generation process with a corresponding preset threshold;
judging whether the determined at least one generation process reaches a preset threshold value;
determining that the data warehouse is in a normal state under the condition that at least one determined generation process reaches a preset threshold value;
and determining that the data warehouse is in an abnormal state under the condition that the determined at least one generation process does not reach the preset threshold value.
5. The method for monitoring a data warehouse based on Hive tables according to claim 1, wherein the step of analyzing, according to the generation process of the Hive table within the second preset time period, whether that generation process has reported an error comprises:
acquiring a character set of the Hive table within the second preset time period;
analyzing whether the Hive table contains garbled characters under that character set;
and, when garbled characters are found, determining that the Hive table has an error and outputting error report information.
6. The method for monitoring a data warehouse based on Hive tables according to claim 5, wherein the step of determining that the Hive table has an error and outputting error report information when garbled characters are found comprises:
in response to a Hive table repair instruction input by a user, configuring a MySQL character set for the corresponding Hive table.
7. The Hive table based method for monitoring a data warehouse of claim 1, further comprising:
acquiring a plurality of Hive tables based on a configuration database, and selecting at least two Hive tables according to the Hive tables;
respectively determining fields according to the selected at least two Hive tables;
comparing the fields of the two Hive tables, and outputting a comparison result;
and determining whether the data warehouse is abnormal or not according to the comparison result.
8. An apparatus for monitoring a data warehouse based on Hive tables, comprising:
the response module is used for responding to an instruction of monitoring the data warehouse within a first preset time period;
the acquisition module is used for acquiring a Hive metadata base in a configuration database and acquiring a generation process of a Hive table within a second preset time period according to the Hive metadata base in the configuration database;
the analysis module is used for analyzing whether the generation process of the Hive table in the second preset time period has an error according to the generation process of the Hive table in the second preset time period;
and the judging module is used for judging whether the data in the data warehouse is abnormal according to whether an error was reported.
9. An apparatus for monitoring a data warehouse based on Hive tables, comprising a memory, a processor and a computer program stored in the memory and operable on the processor, wherein the computer program, when executed by the processor, implements the steps of the method for monitoring a data warehouse based on Hive tables as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium, storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the method for monitoring a data warehouse based on a Hive table according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911089755.2A CN110851325B (en) | 2019-11-08 | 2019-11-08 | Method, device and equipment for monitoring data warehouse based on Hive table |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911089755.2A CN110851325B (en) | 2019-11-08 | 2019-11-08 | Method, device and equipment for monitoring data warehouse based on Hive table |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110851325A (en) | 2020-02-28 |
CN110851325B (en) | 2024-03-15 |
Family ID: 69600107
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911089755.2A Active CN110851325B (en) | 2019-11-08 | 2019-11-08 | Method, device and equipment for monitoring data warehouse based on Hive table |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110851325B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105095056A (en) * | 2015-08-14 | 2015-11-25 | 焦点科技股份有限公司 | Method for monitoring data in data warehouse |
CN108090138A (en) * | 2017-11-29 | 2018-05-29 | 链家网(北京)科技有限公司 | The monitoring method and system of a kind of data warehouse |
CN109902507A (en) * | 2019-01-08 | 2019-06-18 | 河南智业科技发展有限公司 | A kind of data warehouse monitoring system |
2019-11-08: CN application CN201911089755.2A (patent CN110851325B), status: Active
Also Published As
Publication number | Publication date |
---|---|
CN110851325B (en) | 2024-03-15 |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Address after: 518000 R & D building 3501, block a, building 7, Vanke Cloud City Phase I, Xingke 1st Street, Xili community, Xili street, Nanshan, Shenzhen, Guangdong; Applicant after: Tubatu Group Co.,Ltd.; Address before: 1001-a, 10th floor, bike technology building, No.9, Keke Road, high tech Zone, Nanshan District, Shenzhen, Guangdong 518000; Applicant before: SHENZHEN BINCENT TECHNOLOGY Co.,Ltd. |
GR01 | Patent grant | |