CN113986707B

CN113986707B - Method for monitoring and controlling slow SQL based on big data kudu partition

Info

Publication number: CN113986707B
Application number: CN202111293515.1A
Authority: CN
Inventors: 于洋; 高经郡
Original assignee: Beijing Kejie Technology Co ltd
Current assignee: Beijing Kejie Technology Co ltd
Priority date: 2021-11-03
Filing date: 2021-11-03
Publication date: 2022-06-14
Anticipated expiration: 2041-11-03
Also published as: CN113986707A

Abstract

The invention discloses a method for monitoring and controlling slow SQL based on big data Kudu partitions. The method divides large volumes of data of many categories into physical partitions by cluster and category, and displays more intuitively the data-state problems, storage volume, number of partitions and similar information for each partition. It also makes the slow SQL present in the system directly visible: a dictionary table is used to configure the time threshold dynamically, and the method queries a dynamic trend chart of the number of slow SQL statements exceeding that threshold together with a detail list showing the running state of each slow SQL statement. System maintainers can therefore see data problems at a glance and quickly resolve issues such as slow-link timeouts and occasional system interruptions caused by SQL. This addresses the SQL pain points found in many systems without requiring link tracing, and is convenient and efficient.

Description

Method for monitoring and controlling slow SQL based on big data kudu partition

Technical Field

The invention relates to the technical field of Kudu, and in particular to a method for monitoring and controlling slow SQL based on big data Kudu partitions.

Background

In recent years, Kudu has been used more and more widely in big data platforms, where it holds an irreplaceable position. Owing to Kudu's characteristics, massive-data OLAP scenarios of this kind generally need no preprocessing scheme, such as Cube management in the manner of eBay's Kylin, or predefined aggregation operations driven by business requirements in the manner of Google's Mesa. In addition, a data channel can be built to connect the real-time processing and batch processing systems in series, so that each plays to its own strengths. Kudu is positioned as a fast-analytics data warehouse for rapidly changing data. It aims to support, through its own capabilities, application scenarios that require both high throughput and random reads and writes (for example, time-series data analysis and real-time monitoring and analysis of log data). It offers performance characteristics between those of HDFS and HBase, strikes a balance between random read/write and batch scanning, and keeps response latency stable and predictable. At present, an effective method for Kudu partition monitoring and slow SQL monitoring is lacking.

Disclosure of Invention

In view of the deficiencies of the prior art, the invention aims to provide a method for monitoring and controlling slow SQL based on big data Kudu partitions.

To achieve this purpose, the invention adopts the following technical scheme:

a method for monitoring and controlling slow sql based on big data kudu partition comprises the following specific processes:

the process of Kudu partition monitoring is as follows: when Hive partition scheduling is executed, the Hive queues and tenants are configured, and the HDFS configuration is then initialized according to the different partitions configured for the Hadoop cluster; next, based on the initialized HDFS configuration of the different cluster partitions, Kudu partition statistics are collected to obtain Kudu partition statistical information, from which it is determined whether the case is cm5 or cm6, and the corresponding partition tables and storage capacity are obtained according to the result; data are then inserted into the Kudu partition table, detailed and summary statistical monitoring information for the Kudu partitions is acquired, and the status of the different Kudu partition tables is monitored, aggregated and displayed according to the summary statistical monitoring information;

the process by which Kudu monitors and schedules slow SQL information is as follows: when Kudu monitoring scheduling is executed, the Cloudera Manager information is configured according to the Hadoop cluster nodes; a slow SQL result set is then queried according to the slow query time threshold and the filtering conditions configured in the dictionary table, the slow SQL occurring within a set time period is displayed, and its details can be viewed.

Further, the slow query time threshold, the filtering conditions and the set time period configured in the dictionary table are all user-defined.

Further, after the slow SQL result set is obtained by query, it is inserted in batches into the Kudu monitoring table obtained through Kudu partition monitoring, the slow SQL details are obtained, the slow SQL occurring within the set time period is displayed, and its details can be viewed.

The invention has the beneficial effects that:

1. The method of the invention handles the slow SQL queries in each application system more effectively. For each slow query it can expand and analyze, one by one, the list of users who executed the SQL, so that maintainers can better resolve problems in the system, improve system performance, make system links respond faster, and avoid problems such as link timeouts, waiting and interruptions caused by SQL.

2. For large volumes of data of different types, the invention uses per-partition statistics; the partition tables and storage volumes are obtained differently for different cm versions. The invention can monitor the data of the different partitions, monitor the partitions whose data are abnormal, and collect statistics on storage volume, number of partitions and state data. System maintainers can thus see more directly which partitions are abnormal and how much storage and how many partitions each one has. They can then optimize the abnormal partitions and the data with large storage volumes, improving the quality of the system's data and keeping the system running stably.
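The roll-up from detailed partition statistics to a per-table summary (partition count, storage volume, abnormal partitions) can be illustrated with a minimal sketch. The row schema `PartitionDetail`, the state values and the function `summarize` are hypothetical and are not taken from the invention; the sketch only shows one plausible way to aggregate such rows.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PartitionDetail:
    """One row of detailed partition statistics (hypothetical schema)."""
    table: str
    partition: str
    size_bytes: int
    state: str  # illustrative values: "RUNNING" is healthy, anything else is abnormal


def summarize(details):
    """Roll detail rows up to one summary entry per table."""
    summary = defaultdict(lambda: {"partitions": 0, "size_bytes": 0, "abnormal": []})
    for d in details:
        entry = summary[d.table]
        entry["partitions"] += 1
        entry["size_bytes"] += d.size_bytes
        if d.state != "RUNNING":
            entry["abnormal"].append(d.partition)
    return dict(summary)


details = [
    PartitionDetail("orders", "p2021_10", 4_000, "RUNNING"),
    PartitionDetail("orders", "p2021_11", 6_000, "FAILED"),
    PartitionDetail("users", "p0", 1_500, "RUNNING"),
]
report = summarize(details)
for table, entry in report.items():
    print(table, entry)
```

A display layer could sort such a report by `size_bytes` or list only tables with a non-empty `abnormal` field, so that the largest or problematic partitions appear first.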

The invention divides large volumes of classified data into physical partitions by cluster and category, and displays more intuitively the data-state problems, storage volume, number of partitions and similar information for each partition. It also makes the slow SQL present in the system directly visible: the time threshold can be configured dynamically through the dictionary table and, taking that threshold as the reference, the method queries a dynamic trend chart of the number of slow SQL statements exceeding it together with a detail list showing the running state of each slow SQL statement. System maintainers can therefore see data problems at a glance and quickly resolve issues such as slow-link timeouts and occasional system interruptions caused by SQL. This addresses the SQL pain points found in many systems without requiring link tracing, and is convenient and efficient.

Drawings

FIG. 1 is a flow chart of a method according to an embodiment of the present invention.

Detailed Description

The present invention is further described below with reference to the accompanying drawing. It should be noted that this embodiment, which is based on the above technical scheme, gives a detailed implementation and a specific operating procedure, but the protection scope of the invention is not limited to this embodiment.

This embodiment provides a method for monitoring and controlling slow SQL based on big data Kudu partitions. As shown in FIG. 1, the specific process is as follows:

During Kudu partition monitoring: when Hive partition scheduling is executed, the Hive queues and tenants are configured, and the HDFS configuration is then initialized according to the different partitions configured for the Hadoop cluster; next, based on the initialized HDFS configuration of the different cluster partitions, Kudu partition statistics are collected to obtain Kudu partition statistical information, from which it is determined whether the case is cm5 or cm6, and the corresponding partition tables and storage capacity are obtained according to the result; when data are inserted into the Kudu partition table, detailed and summary statistical monitoring information for the Kudu partitions is acquired, and the status of the different Kudu partition tables is monitored, aggregated and displayed according to the summary statistical monitoring information;

when Kudu monitors and schedules slow SQL information: when Kudu monitoring scheduling is executed, the Cloudera Manager information is configured according to the Hadoop cluster nodes; a slow SQL result set is then queried according to the slow query time threshold and the filtering conditions configured in the dictionary table; the slow SQL occurring within a set time period is displayed, and its details can be viewed.

It should be noted that the slow query time threshold, the filtering conditions and the set time period configured in the dictionary table are all user-defined. A query whose execution time exceeds the slow query time threshold is treated as a slow query.
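The slow-query rule stated above can be sketched as follows. This is an illustrative example only: the `dictionary` mapping stands in for the dictionary table, and its keys, the `QueryRecord` fields and the user-exclusion filter are assumptions made for the sketch rather than details given in this document.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class QueryRecord:
    """One executed query as reported by the monitor (hypothetical schema)."""
    user: str
    statement: str
    started_at: datetime
    duration_ms: int


# Stand-in for the dictionary table: user-editable settings (keys are assumed).
dictionary = {
    "slow_query_threshold_ms": 3000,
    "exclude_users": {"etl_batch"},  # illustrative filtering condition
    "period_start": datetime(2021, 11, 1),
    "period_end": datetime(2021, 11, 2),
}


def find_slow_queries(records, cfg):
    """Keep records inside the period, passing the filter, and over the threshold."""
    return [
        r for r in records
        if cfg["period_start"] <= r.started_at < cfg["period_end"]
        and r.user not in cfg["exclude_users"]
        and r.duration_ms > cfg["slow_query_threshold_ms"]
    ]


records = [
    QueryRecord("alice", "SELECT * FROM orders", datetime(2021, 11, 1, 9), 4500),
    QueryRecord("bob", "SELECT * FROM users", datetime(2021, 11, 1, 10), 800),
    QueryRecord("etl_batch", "INSERT INTO t SELECT * FROM s", datetime(2021, 11, 1, 11), 9000),
    QueryRecord("alice", "SELECT * FROM orders", datetime(2021, 11, 3, 9), 7000),
]
slow = find_slow_queries(records, dictionary)
for r in slow:
    print(r.user, r.duration_ms, r.statement)
```

Because the threshold, filter and period are read from the configuration on each call, changing a value in the dictionary changes the result of the next query without any code change, which is the point of configuring them dynamically.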

Further, when the slow SQL result set is queried, it is inserted in batches into the Kudu monitoring table obtained through Kudu partition monitoring; the slow SQL details are obtained, the slow SQL occurring within the set time period is displayed, and its details can be viewed.
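Batch insertion can be illustrated with a short sketch that splits a result set into fixed-size chunks and hands each chunk to a sink. The class `InMemoryMonitorTable` is a hypothetical stand-in for the Kudu monitoring table, and `insert_many` is not a Kudu client call; a real implementation would write through whatever client the deployment uses.

```python
from itertools import islice


def batched(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


class InMemoryMonitorTable:
    """Hypothetical stand-in for the monitoring table; counts batches received."""

    def __init__(self):
        self.rows = []
        self.batches = 0

    def insert_many(self, batch):
        self.rows.extend(batch)
        self.batches += 1


def store_slow_queries(rows, table, batch_size=500):
    """Write the slow SQL result set to the table one batch at a time."""
    for chunk in batched(rows, batch_size):
        table.insert_many(chunk)


table = InMemoryMonitorTable()
store_slow_queries([{"id": i} for i in range(5)], table, batch_size=2)
print(table.batches, len(table.rows))
```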

It should be noted that displaying the slow SQL occurring within the set time period and viewing its details includes a dynamic trend chart of the number of slow SQL statements exceeding the slow query time threshold and a detail list showing the running state of each slow SQL statement.
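The data behind such a trend chart and detail list can be sketched as below. The hourly bucket size, the record fields and the function name `hourly_trend` are assumptions chosen for illustration; the document does not specify the granularity of the trend or the layout of the detail list.

```python
from collections import Counter
from datetime import datetime


def hourly_trend(slow_queries):
    """Count slow queries per hour; returns a time-sorted list of (hour, count)."""
    counts = Counter(
        q["started_at"].replace(minute=0, second=0, microsecond=0)
        for q in slow_queries
    )
    return sorted(counts.items())


slow_queries = [
    {"user": "alice", "started_at": datetime(2021, 11, 1, 9, 5), "duration_ms": 4500},
    {"user": "bob", "started_at": datetime(2021, 11, 1, 9, 40), "duration_ms": 3200},
    {"user": "alice", "started_at": datetime(2021, 11, 1, 11, 15), "duration_ms": 8000},
]

# Points for the trend chart: one (hour, count) pair per hour that had slow SQL.
trend = hourly_trend(slow_queries)

# Rows for the detail list, slowest statement first.
details = sorted(slow_queries, key=lambda q: q["duration_ms"], reverse=True)

for hour, count in trend:
    print(hour.isoformat(), count)
```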

Various changes and modifications can be made by those skilled in the art based on the above technical solutions and concepts, and all such changes and modifications should be included in the protection scope of the present invention.

Claims

1. A method for monitoring and controlling slow sql on the basis of big data kudu partition is characterized by comprising the following specific processes:

the process of Kudu partition monitoring is: when hive partition scheduling is executed, hive queues and tenants are configured, and then hdfs configuration is initialized according to different partitions configured by a hadoop cluster; then, according to hdfs configuration after initialization of different cluster partitions, performing kudu partition statistics to obtain kudu partition statistical information, accordingly judging whether the kudu partition statistical information is cm5 or cm6, and obtaining a corresponding partition table and storage capacity according to the judgment result; inserting a kudu partition table, acquiring detailed statistical monitoring information of the kudu partition, acquiring summary statistical monitoring information of the kudu partition, and performing partition monitoring statistics and display on the conditions of different kudu partition tables according to the summary statistical monitoring information;

2. The method of claim 1, wherein the slow query time threshold, the filtering conditions and the set time period configured in the dictionary table are all customized by a user.

3. The method of claim 1, wherein, after the slow SQL result set is obtained by query, it is inserted in batches into the Kudu monitoring table obtained through Kudu partition monitoring, the slow SQL details are obtained, the slow SQL occurring within a set time period is displayed, and its details can be viewed.