CN111897802B - Database container fault positioning method and system

Info

Publication number: CN111897802B
Application number: CN202010794362.8A
Authority: CN
Inventors: 张晓娜; 暨光耀; 张�浩; 傅媛媛
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2020-08-10
Filing date: 2020-08-10
Publication date: 2023-08-04
Anticipated expiration: 2040-08-10
Also published as: CN111897802A

Abstract

The invention provides a method and a system for positioning faults of a database container, and belongs to the technical field of artificial intelligence. The database container fault positioning method comprises the following steps: acquiring the performance characteristics and the current SQL characteristics of a current database container positioned in the same time window; inputting the performance characteristics of the current database container into a database container decision tree model created based on the performance characteristics of the historical database container and the identification results of the historical database container to obtain the identification results of the current database container; inputting the current SQL feature into an SQL decision tree model created based on the historical SQL feature and the historical SQL recognition result to obtain a current SQL recognition result; when the current database container identification result is abnormal, judging whether the current SQL identification result is abnormal; and uploading the first fault location information when the current SQL identification result is normal, otherwise uploading the second fault location information. The invention can accurately and efficiently identify and locate the fault of the database container.

Description

Database container fault positioning method and system

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to a database container fault positioning method and system.

Background

At present, production operation and maintenance staff generally analyze database performance problems in two steps: first, monitoring finds that a related performance index exceeds an empirically set threshold, which indicates that the database container has a performance bottleneck; then, an expert analyzes the potentially problematic inefficient SQL (Structured Query Language) statements to further locate the cause of the problem. In practice, this method can basically meet the needs of daily database operation and maintenance. However, whether a database container has a performance problem is not simply a matter of a single index, or a combination of several indexes, exceeding a threshold; rather, it is a comprehensive reflection of the performance indexes related to the database container. Therefore, judging database container performance problems by empirical thresholds may produce misjudgments and missed detections, and accuracy is difficult to guarantee. Furthermore, database performance problems are often complex: hundreds or thousands of SQL statements may be involved, and identifying among them the inefficient SQL that causes the performance bottleneck is not easy. Whether the root cause can be precisely located therefore depends largely on the accumulated experience and technical ability of the operation and maintenance personnel concerned. A small number of senior staff with rich experience and strong technical skills can accurately locate and solve most performance problems; for most ordinary first-line staff, however, the situation is less optimistic: the analysis may be time-consuming and laborious and still fail to locate the specific cause, or the analysis may be inaccurate.

Disclosure of Invention

The embodiment of the invention mainly aims to provide a database container fault positioning method and system so as to accurately and efficiently identify and position a database container fault.

In order to achieve the above object, an embodiment of the present invention provides a method for locating a database container fault, including:

acquiring the performance characteristics and the current SQL characteristics of a current database container positioned in the same time window;

inputting the performance characteristics of the current database container into a database container decision tree model created based on the performance characteristics of the historical database container and the identification results of the historical database container under a random forest algorithm to obtain the identification results of the current database container;

inputting the current SQL feature into an SQL decision tree model created based on the historical SQL feature and the historical SQL recognition result under a random forest algorithm to obtain a current SQL recognition result;

when the current database container identification result is abnormal, judging whether the current SQL identification result is abnormal;

and uploading the first fault location information when the current SQL identification result is normal, otherwise uploading the second fault location information.

The embodiment of the invention also provides a database container fault positioning system, which comprises:

The acquisition unit is used for acquiring the performance characteristics and the current SQL characteristics of the current database container positioned in the same time window;

the database container identification result unit is used for inputting the performance characteristics of the current database container into a database container decision tree model created based on the performance characteristics of the historical database container and the identification results of the historical database container under a random forest algorithm to obtain the identification results of the current database container;

the SQL recognition result unit is used for inputting the current SQL feature into an SQL decision tree model created based on the historical SQL feature and the historical SQL recognition result under a random forest algorithm to obtain the current SQL recognition result;

the judging unit is used for judging whether the current SQL identification result is abnormal or not when the current database container identification result is abnormal;

and the uploading unit is used for uploading the first fault location information when the current SQL identification result is normal, and otherwise uploading the second fault location information.

The embodiment of the invention also provides computer equipment, which comprises a memory, a processor and a computer program stored on the memory and running on the processor, wherein the processor realizes the steps of the database container fault positioning method when executing the computer program.

The embodiment of the invention also provides a computer readable storage medium, on which a computer program is stored, which when being executed by a processor, implements the steps of the database container fault locating method.

According to the database container fault positioning method and system, the current database container performance characteristics and the current SQL characteristics of the same time window are respectively input into the respective decision tree models to obtain the current database container identification result and the current SQL identification result, and when the current database container identification result is abnormal, fault positioning information is uploaded according to the current SQL identification result, so that the database container fault can be accurately and efficiently identified and positioned.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of a method of database container fault localization in an embodiment of the present invention;

FIG. 2 is a flow chart of creating a database container decision tree model in an embodiment of the invention;

FIG. 3 is a flow chart of creating an SQL decision tree model in an embodiment of the invention;

FIG. 4 is a flow chart of determining database container performance characteristics in an embodiment of the invention;

FIG. 5 is a flow chart of determining SQL features in an embodiment of the invention;

FIG. 6 is a flow chart of a method of database container fault localization in another embodiment of the present invention;

FIG. 7 is a flow chart of data acquisition in an embodiment of the invention;

FIG. 8 is a flow chart of determining database container performance characteristics in an embodiment of the invention;

FIG. 9 is a flow chart of determining SQL features in an embodiment of the invention;

FIG. 10 is a flow chart of creating a database container decision tree model in accordance with another embodiment of the present invention;

FIG. 11 is a flow chart of a method of database container fault localization in yet another embodiment of the present invention;

FIG. 12 is a schematic diagram of fault location information in an embodiment of the present invention;

FIG. 13 is a block diagram of a database container fault localization system in accordance with an embodiment of the present invention;

FIG. 14 is a block diagram showing the structure of a computer device in an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Those skilled in the art will appreciate that embodiments of the invention may be implemented as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the following forms, namely: complete hardware, complete software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.

In view of the fact that the specific problem causes cannot be located or analysis is inaccurate in the prior art, the embodiment of the invention provides a database container fault locating method, so that the database container faults can be accurately and efficiently identified and located. The present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a flow chart of a method of database container fault localization in an embodiment of the present invention. FIG. 6 is a flow chart of a method of database container fault localization in another embodiment of the present invention. FIG. 11 is a flow chart of a method of database container fault localization in yet another embodiment of the present invention. Fig. 12 is a schematic diagram of fault location information in an embodiment of the present invention. As shown in fig. 1, 6, 11 and 12, the database container fault localization method includes:

S101: Acquire the current database container performance characteristics and the current SQL characteristics located in the same time window.

In specific implementation, the data analysis module selects data within the same time window t_intv_i from the database container performance characteristic record table and the SQL characteristic record table, respectively. For example, the data selected from the database container performance characteristic record table is Data4DB, and the data selected from the SQL characteristic record table is Data4MYSQL. Data4DB contains only one record, whereas Data4MYSQL may contain several records; all of them belong to the same time window, for example the time window T₃₁-T₆₀.

S102: Input the current database container performance characteristics into a database container decision tree model, created under a random forest algorithm based on historical database container performance characteristics and historical database container identification results, to obtain the current database container identification result.

In specific implementation, the m decision trees of the database container decision tree model yield m database container recognition results, and the final current database container recognition result is determined by majority vote over these m results. For example, if the number of abnormal database container recognition results is greater than the number of normal ones, the current database container recognition result is abnormal.

S103: Input the current SQL characteristics into an SQL decision tree model, created under a random forest algorithm based on historical SQL characteristics and historical SQL recognition results, to obtain the current SQL recognition result.

In specific implementation, the m' decision trees of the SQL decision tree model yield m' SQL recognition results, and the final current SQL recognition result is determined by majority vote over these m' results. For example, if the number of abnormal SQL recognition results is greater than the number of normal SQL recognition results, the current SQL recognition result is abnormal.
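The majority vote used in S102 and S103 can be sketched as follows. This is a minimal illustration only; the function name `majority_vote` and the string labels "abnormal" and "normal" are assumptions made for the example and do not appear in the specification.

```python
from collections import Counter


def majority_vote(tree_results):
    """Combine per-tree results ("abnormal"/"normal") by majority rule.

    The labels and the function name are hypothetical. With an odd number
    of trees, a tie between the two labels cannot occur.
    """
    if not tree_results:
        raise ValueError("at least one tree result is required")
    counts = Counter(tree_results)
    # Abnormal wins only when it has strictly more votes than normal.
    return "abnormal" if counts["abnormal"] > counts["normal"] else "normal"


if __name__ == "__main__":
    print(majority_vote(["abnormal", "normal", "abnormal"]))
```

The same function applies to both models, since each only needs the list of results produced by its own trees.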

S104: When the current database container identification result is abnormal, judge whether the current SQL identification result is abnormal.

S105: Upload the first fault location information when the current SQL identification result is normal; otherwise, upload the second fault location information.

In specific implementation, the identification result can be sent to the relevant operation and maintenance personnel by telephone, short message, mail, or similar means. The identification result consists of the following data fields:

Time | Database container ID | Whether a performance bottleneck exists | Suspicious inefficient SQL

Here, the time is the point in time at which the performance bottleneck occurred (the current database container identification result is abnormal); the database container ID uniquely identifies the database container that has the performance bottleneck; whether a performance bottleneck exists indicates whether the database container has a performance problem; and the suspicious inefficient SQL is the inefficient SQL statement (abnormal SQL identification result) that causes the performance problem of the database container.

When the current database container identification result is abnormal but the current SQL identification result is normal, the database resources are being overused because the volume of database requests is too high. In this case, the uploaded first fault location information includes the performance analysis result of the database container, the actual thread count of the container, the configured maximum thread count of the container, the actual number of database connections, and the configured maximum number of database connections. It serves to remind operation and maintenance personnel to evaluate the business transaction pressure and expand the database container in time.

When both the current database container identification result and the current SQL identification result are abnormal, the uploaded second fault location information includes the database container with the performance problem, the IP address and port of the database instance where the inefficient SQL resides, the inefficient SQL itself, and the index names, index fields, index field ordering, and the like of the tables related to the inefficient SQL. When the current database container identification result is normal but the current SQL identification result is abnormal, the second fault location information can also be uploaded to the operation and maintenance personnel.
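The branching in S104 and S105, together with the optional case described above, can be summarized in a short sketch. The function name `select_fault_info` and the return values "first", "second", and `None` are hypothetical labels chosen for illustration; the specification does not prescribe any particular representation.

```python
def select_fault_info(container_result, sql_result):
    """Decide which fault location information to upload.

    Inputs are the hypothetical labels "abnormal" or "normal".
    Returns "first", "second", or None when there is nothing to upload.
    """
    container_abnormal = container_result == "abnormal"
    sql_abnormal = sql_result == "abnormal"
    if container_abnormal and not sql_abnormal:
        # Request volume too high: report thread and connection usage
        # so that the container can be expanded.
        return "first"
    if sql_abnormal:
        # Inefficient SQL found. The text makes this upload mandatory when
        # the container is abnormal and optional when it is normal; this
        # sketch uploads in both cases.
        return "second"
    return None


if __name__ == "__main__":
    print(select_fault_info("abnormal", "normal"))
```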

The database container fault location method shown in fig. 1 may be implemented by a computer. As can be seen from the flow shown in fig. 1, the database container fault locating method in the embodiment of the present invention inputs the current database container performance feature and the current SQL feature of the same time window into respective decision tree models to obtain the current database container identification result and the current SQL identification result, and uploads the fault locating information according to the current SQL identification result when the current database container identification result is abnormal, so that the database container fault can be accurately and efficiently identified and located.

For example, when the performance data analysis module selects the data of time window T from the database container performance characteristic record table as input to identify whether a performance bottleneck exists (the current database container identification result is abnormal), the MYSQL slow log analysis module also selects the data of time window T from the SQL characteristic record table as input to identify whether inefficient SQL exists at that time (the current SQL identification result is abnormal). If inefficient SQL is identified, it is the SQL that causes the database container's performance bottleneck, so the efficiency of identifying, locating, and handling database container performance problems can be greatly improved.

FIG. 2 is a flow chart of creating a database container decision tree model in an embodiment of the invention. FIG. 10 is a flow chart of creating a database container decision tree model in accordance with another embodiment of the present invention. As shown in fig. 2 and 10, creating a database container decision tree model includes:

S201: A historical database container sample is obtained.

The historical database container sample includes historical database container performance characteristics and historical database container identification results.

Table 1 is a historical database container sample table. As shown in Table 1, the historical database container samples include cpu_ratio, disk_io, disk_rt, memory_ratio, network_io, thread_number, max_thread, and Y1/N1, which respectively represent CPU usage, disk IO, disk IO response time, memory usage, network IO, actual thread count, maximum thread count, and the historical database container identification result. Y1 indicates that the database container corresponding to the historical database container performance characteristics has a performance bottleneck (the database container identification result is abnormal), and N1 indicates that the database container has no performance bottleneck (the database container identification result is normal).

TABLE 1

S202: Select a first preset number of samples from the historical database container samples as a database container training set.

For example, n samples may be selected from the historical database container samples by random sampling with replacement to form the database container training set.
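Random sampling with replacement (bootstrap sampling) can be sketched with the Python standard library as below. The function name `bootstrap_sample` and the `seed` parameter are assumptions added for illustration and reproducibility; they are not part of the specification.

```python
import random


def bootstrap_sample(samples, n, seed=None):
    """Draw n samples with replacement from the historical samples.

    Because sampling is with replacement, the same historical sample may
    appear more than once in the returned training set.
    """
    rng = random.Random(seed)
    return rng.choices(samples, k=n)


if __name__ == "__main__":
    history = ["s1", "s2", "s3", "s4", "s5"]
    print(bootstrap_sample(history, n=5, seed=0))
```

Repeating the draw once per tree gives each decision tree its own training set.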

S203: determining the segmentation nodes of the database container decision tree according to the historical database container identification result and the historical database container performance characteristics in the database container training set.

S204: creating a database container decision tree model according to the segmentation nodes of the database container decision tree.

In specific implementation, the ID3 algorithm can be used for feature selection to generate the database container decision tree model. For the database container training set, four features can be selected from the seven features of CPU usage, disk IO, disk IO response time, memory usage, network IO, actual thread count, and maximum thread count. CPU usage and memory usage are mandatory, and the other two features are randomly selected from the remaining five.
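The per-tree feature subset (two mandatory features plus two drawn at random from the remaining five) might be chosen as in the following sketch. The column names follow Table 1 with normalized spelling; the function name `pick_features` and the `seed` parameter are hypothetical.

```python
import random

# Column names as in Table 1; CPU usage and memory usage are always kept.
MANDATORY = ["cpu_ratio", "memory_ratio"]
OPTIONAL = ["disk_io", "disk_rt", "network_io", "thread_number", "max_thread"]


def pick_features(seed=None):
    """Return four feature names: two mandatory and two random optional ones."""
    rng = random.Random(seed)
    # sample() draws without replacement, so the two extras are distinct.
    return MANDATORY + rng.sample(OPTIONAL, 2)


if __name__ == "__main__":
    print(pick_features(seed=0))
```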

After selecting the features, the ID3 algorithm is utilized to select the features with the maximum information gain from the features as the segmentation nodes to construct a decision tree. The database container decision tree model includes m database container decision trees (m is an odd number). The ID3 algorithm is used for calculating the conditional entropy of each feature in the database container training set to obtain the information gain of the feature, and finally, the feature with the maximum information gain is selected as a node to split the sample.

For information entropy, assume that among the n samples the proportion of samples whose identification result belongs to the i-th class is p_i. The information entropy is defined as follows:

$$\mathrm{Info}(n) = -\sum_{i=1}^{c} p_i \log_2 p_i$$

where Info(n) is the information entropy of the n samples, and c is the number of classes of the identification result. The smaller the value of Info(n), the higher the purity of the samples. For example, for a database container training set, if 20% of the records have the historical database container identification result Y1 and 80% have N1, then:

$$\mathrm{Info}(n) = -(0.2 \times \log_2 0.2 + 0.8 \times \log_2 0.8) = 0.7219$$

For conditional entropy, suppose the n samples are divided by feature A into k parts, each part corresponding to one value of feature A, and the number of samples in the j-th part is n_j. The conditional entropy of the n samples under feature A is defined as follows:

$$\mathrm{Info}_A(n) = \sum_{j=1}^{k} \frac{n_j}{n}\,\mathrm{Info}(n_j)$$

where Info_A(n) is the conditional entropy of the n samples under feature A, and Info(n_j) is the information entropy of the sample data in the j-th part. For example, suppose the CPU usage feature is selected when splitting the database container training set, and CPU usage is divided into 5 parts, namely 0-20%, 21-40%, 41-60%, 61-80%, and 81-100%, with corresponding sample sets n₁, n₂, n₃, n₄, and n₅. With |n_i| denoting the number of samples in the i-th part n_i, and taking A to be the CPU usage feature, the conditional entropy obtained by splitting the samples on CPU usage is:

$$\mathrm{Info}_A(n) = \sum_{i=1}^{5} \frac{|n_i|}{n}\,\mathrm{Info}(n_i)$$

When the data is segmented by feature A, the information gain is defined as follows:

$$\mathrm{Gain}(A) = \mathrm{Info}(n) - \mathrm{Info}_A(n)$$

Recursively performing the above steps generates a database container decision tree.
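The entropy, conditional entropy, and information gain calculations used by ID3 can be sketched in a few lines. This is an illustrative sketch, not the patented implementation: the function names are hypothetical, and it assumes each feature has already been discretized into bands (such as the five CPU usage ranges in the example), with each sample stored as a dictionary.

```python
import math
from collections import Counter


def info(labels):
    """Information entropy of a list of class labels (e.g. "Y1"/"N1")."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows, labels, feature):
    """Gain(A) = Info(n) - Info_A(n) for one discretized feature."""
    n = len(labels)
    parts = {}
    for row, label in zip(rows, labels):
        parts.setdefault(row[feature], []).append(label)
    # Conditional entropy: size-weighted entropy of each part.
    conditional = sum(len(p) / n * info(p) for p in parts.values())
    return info(labels) - conditional


def best_feature(rows, labels, features):
    """Pick the split feature with the largest information gain."""
    return max(features, key=lambda f: info_gain(rows, labels, f))


if __name__ == "__main__":
    labels = ["Y1"] * 2 + ["N1"] * 8
    print(round(info(labels), 4))  # the 20% / 80% example from the text
```

A full ID3 tree would call `best_feature` at each node, split the samples by the chosen feature's values, and recurse on each part until the labels are pure or no features remain.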

FIG. 3 is a flow chart of creating an SQL decision tree model in an embodiment of the invention. As shown in fig. 3, creating the SQL decision tree model includes:

S301: A historical SQL sample is obtained.

The historical SQL sample comprises historical SQL features and historical SQL identification results.

Table 2 is a historical SQL sample table. As shown in table 2, the historical SQL samples include sql_duration, sql_locktime, sql_lownum, sql_scan_lownum, and Y2/N2, which represent the SQL time consumption, the SQL lock time, the number of rows sent by the SQL, the number of rows scanned by the SQL, and the historical SQL recognition result, respectively. Y2 indicates that the SQL corresponding to the historical SQL feature is inefficient (the SQL recognition result is abnormal), and N2 indicates that the SQL is not inefficient (the SQL recognition result is normal).

TABLE 2

S302: Select a second preset number of samples from the historical SQL samples as an SQL training set.

For example, n' samples may be selected from the historical SQL samples by random sampling with replacement to form the SQL training set.

S303: Determine the segmentation nodes of the SQL decision tree according to the historical SQL recognition results and the historical SQL features in the SQL training set.

S304: Create an SQL decision tree model according to the segmentation nodes of the SQL decision tree.

In specific implementation, the ID3 algorithm can be used for feature selection to generate an SQL decision tree model. For the SQL training set, three features can be selected from four features, namely SQL time consumption, SQL locking time, the number of lines sent by SQL and the number of lines scanned by SQL, wherein the SQL time consumption and the SQL locking time are necessary options, and the other feature randomly selects one from the number of lines sent by SQL and the number of lines scanned by SQL. The creation flow of the SQL decision tree model can refer to the creation flow of the database container decision tree model as described above. The SQL decision tree model comprises m 'SQL decision trees (m' is an odd number).

FIG. 4 is a flow chart of determining database container performance characteristics in an embodiment of the invention. FIG. 7 is a flow chart of data acquisition in an embodiment of the invention. FIG. 8 is a flow chart of determining database container performance characteristics in an embodiment of the invention. As shown in fig. 4, 7 and 8, the database container fault locating method further includes:

S401: Database container performance data is collected.

Wherein the database container performance data includes historical database container performance data and current database container performance data. The data acquisition module for acquiring the database container performance data is automatically started along with the database container, and a database container performance data acquisition thread is created after the data acquisition module is started to acquire the database container performance data.

In specific implementation, a data collection agent can be deployed in the database container to collect the database container performance data and transmit it to the data storage module. The data collection agent contains a database instance configuration file for configuring the instance names and ports of the database instances under the container. The data acquisition period can be set according to the practical application, for example once every 10 seconds. The database container performance data includes the container IP address, CPU (central processing unit) usage, disk IO (input/output), disk IO response time, memory usage, network IO, actual thread count, and maximum thread count.

The data storage module receives and caches the database container performance data collected by the data collection agent and stores it in a database container performance data record table in ascending order of collection time; each piece of database container performance data has a corresponding collection time. The stored database container performance data is cleaned periodically according to a cleaning cycle set by the system administrator. For example, with a weekly cleaning cycle, the data is cleaned once a week and only the data of the most recent week is retained, so as to avoid occupying excessive disk space. Table 3 is a database container performance data record table. As shown in Table 3, T₁, host_ip1, cpu_ratio, disk_io, disk_rt, memory_ratio, network_io, thread_number, and max_thread represent the database container performance data collection time, container IP address, CPU usage, disk IO, disk IO response time, memory usage, network IO, actual thread count, and maximum thread count, respectively.

TABLE 3

S402: Perform denoising and median filling on the database container performance data.

Data denoising can be performed using the 3σ rule of the normal distribution. The collected database container performance data is idealized as normally distributed, with the following model:

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

where σ is the standard deviation of the database container performance data, μ is the mean of the database container performance data, and x is the database container performance data.

Noise data can be understood as low-probability data relative to normal data. Using the property of the normal distribution that the probability of x falling outside (μ-3σ, μ+3σ) is less than three thousandths, each item of database container performance data is processed: the mean and standard deviation of the database container performance data are calculated, and data less than μ-3σ or greater than μ+3σ is rejected as noise.
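A minimal sketch of the 3σ rejection step follows. The function name `remove_noise` is hypothetical, the use of the population standard deviation (`statistics.pstdev`) is an assumption since the text does not say which estimator is used, and rejected points are marked as `None` so that a later filling step can find them.

```python
import statistics


def remove_noise(values):
    """Replace values outside (mu - 3*sigma, mu + 3*sigma) with None.

    Assumes the population standard deviation; the specification only
    says "standard deviation".
    """
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    low, high = mu - 3 * sigma, mu + 3 * sigma
    return [v if low <= v <= high else None for v in values]


if __name__ == "__main__":
    cpu = [10.0] * 20 + [100.0]  # one obvious spike
    print(remove_noise(cpu)[-1])
```

Each metric (CPU usage, memory usage, and so on) would be processed as its own series, since the mean and standard deviation differ per metric.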

Median filling fills a removed value with the mean of the two adjacent data values. For example, between T₁ and T₁₀, if the data at time T₆ is deleted, the mean of the data at T₅ and T₇ is used to fill the data at T₆. If the data at the three times T₆, T₇, and T₈ is deleted, the mean of the data at T₅ and T₉ is first used to fill the data at T₇; then the mean of the data at T₅ and T₇ is used to fill the data at T₆, and the mean of the data at T₇ and T₉ is used to fill the data at T₈.
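The filling order described above (middle of the gap first, then each half) amounts to recursive midpoint filling between the nearest known neighbors. The sketch below assumes removed points are marked as `None`; the function name `fill_gaps` is hypothetical, and the handling of even-length gaps and of gaps at the start or end of the series is an assumption, since the text only gives the one-point and three-point cases.

```python
def fill_gaps(values):
    """Fill None entries with the mean of the surrounding known values.

    For a multi-point gap the middle point is filled first and the two
    halves are then filled recursively. Leading or trailing None entries
    have only one neighbor and are left unchanged (an assumption).
    """
    filled = list(values)

    def fill(lo, hi):
        # lo and hi index known values; fill everything strictly between.
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        filled[mid] = (filled[lo] + filled[hi]) / 2
        fill(lo, mid)
        fill(mid, hi)

    known = [i for i, v in enumerate(filled) if v is not None]
    for lo, hi in zip(known, known[1:]):
        fill(lo, hi)
    return filled


if __name__ == "__main__":
    print(fill_gaps([5.0, None, None, None, 9.0]))
```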

S403: Divide the denoised and median-filled database container performance data according to a preset time window.

Since the database container performance data is collected periodically by the data collection agent, it is discrete, time-ordered data. In practical applications, the identification period of the database container performance bottleneck (e.g., once every 5 minutes) is generally much longer than the data acquisition period (e.g., once every 10 seconds), so the collected database container performance data needs further discretization. The size of the time window equals the identification period of the database container performance bottleneck; for example, if the identification period is 5 minutes, the time window is set to 5 minutes.

S404: Determine the database container performance characteristics according to the partitioned database container performance data.

Wherein the database container performance characteristics include historical database container performance characteristics and current database container performance characteristics.

In specific implementation, the data in each time window can be averaged to obtain the database container performance characteristics of that window; finally, the database container performance characteristics of each time window are organized in time order and stored in a database container performance characteristic record table. The current database container performance characteristics can be used as input to the subsequent data analysis module. Table 4 is a database container performance characteristic record table. As shown in Table 4, T_Intv represents the start and end times of the time window, e.g., T₁-T₃₀, T₃₁-T₆₀, and T₆₁-T₉₀; host_ip1 is the container IP address; and cpu_ratio, disk_io, disk_rt, memory_ratio, network_io, thread_number, and max_thread respectively represent the average CPU usage, average disk IO, average disk IO response time, average memory usage, average network IO, average actual thread count, and average maximum thread count within the T_Intv time window.
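Splitting the cleaned series into fixed windows and averaging each metric per window can be sketched as follows. The function name `window_features`, the dictionary layout of each record, and the choice to drop a trailing incomplete window are assumptions for illustration. With one sample every 10 seconds and a 5-minute window, each window holds 30 samples, which matches the T₁-T₃₀ style of interval.

```python
def window_features(records, window_size, metrics):
    """Average each metric over consecutive windows of window_size samples.

    records: list of dicts ordered by collection time.
    A trailing window with fewer than window_size samples is dropped
    (an assumption; the text does not cover this case).
    """
    features = []
    for start in range(0, len(records) - window_size + 1, window_size):
        window = records[start:start + window_size]
        row = {"t_intv": (start + 1, start + window_size)}  # 1-based, like T1-T30
        for name in metrics:
            row[name] = sum(r[name] for r in window) / window_size
        features.append(row)
    return features


if __name__ == "__main__":
    data = [{"cpu_ratio": float(i)} for i in range(1, 61)]
    for row in window_features(data, 30, ("cpu_ratio",)):
        print(row)
```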

TABLE 4

FIG. 5 is a flow chart of determining SQL features in an embodiment of the invention. FIG. 9 is a flow chart of determining SQL features in an embodiment of the invention. As shown in fig. 5, 7 and 9, the database container fault locating method further includes:

S501: SQL data is collected.

The SQL data comprises historical SQL data and current SQL data. The data acquisition module for acquiring the SQL data is started automatically along with the database container; after it starts, it creates an SQL data acquisition thread to acquire the SQL data.

In a specific implementation, a data collection agent can be deployed in the database container, and the collected SQL data (MySQL slow query logs) is transmitted to the data storage module. The data acquisition period can be set according to the practical application, for example, once every 10 seconds. The SQL data comprises the SQL text, the SQL execution time, the IP address of the database container executing the SQL, the SQL time consumption, the SQL locking time, the number of rows sent by the SQL and the number of rows scanned by the SQL. To record more SQL information, the acquisition threshold can be set lower than the actual requirement before data acquisition, so that potentially inefficient SQL is not missed.

The data storage module receives and caches the SQL data acquired by the data collection agent and stores it in the SQL data record table in ascending order of acquisition time; each piece of SQL data has a corresponding acquisition time. The stored SQL data is cleaned periodically according to a cleaning period set by the system administrator. For example, if the cleaning period is one week, the data is cleaned once a week and only the data of the most recent week is retained, to avoid occupying excessive disk space. Table 5 is the SQL data record table. As shown in Table 5, T₂, SQL, SQL_exeTime, host_IP2, SQL_duration, SQL_LockTime, SQL_LowNum and SQL_Scan_LowNum respectively represent the SQL data collection time, the SQL text, the SQL execution time, the IP address of the database container executing the SQL, the SQL time consumption, the SQL locking time, the number of rows sent by the SQL and the number of rows scanned by the SQL.

TABLE 5
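As a hedged illustration of the storage behaviour described for Table 5, the following Python sketch keeps records in ascending order of collection time and discards records older than the cleaning period. The class names `SqlRecord` and `SqlRecordTable`, the lower-case field names and the in-memory list are assumptions of this sketch; the actual record table would typically live in a database.

```python
import bisect
from dataclasses import dataclass, field

WEEK_SECONDS = 7 * 24 * 3600  # one-week cleaning period from the example


@dataclass(order=True)
class SqlRecord:
    """One row of the SQL data record table (fields follow Table 5)."""
    t2: float                                      # collection time, the sort key
    sql: str = field(compare=False)                # SQL text
    sql_exe_time: float = field(compare=False)     # SQL execution time
    host_ip2: str = field(compare=False)           # container IP executing the SQL
    sql_duration: float = field(compare=False)     # SQL time consumption
    sql_lock_time: float = field(compare=False)    # SQL locking time
    sql_low_num: int = field(compare=False)        # rows sent by the SQL
    sql_scan_low_num: int = field(compare=False)   # rows scanned by the SQL


class SqlRecordTable:
    def __init__(self):
        self.rows = []

    def store(self, record):
        # Insert while keeping ascending order of collection time.
        bisect.insort(self.rows, record)

    def clean(self, now, retention=WEEK_SECONDS):
        # Keep only records collected within the retention period.
        cutoff = now - retention
        self.rows = [r for r in self.rows if r.t2 >= cutoff]
```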

S502: the SQL data is partitioned by time window.

Since the SQL data is collected periodically by the data collection agent, the collected SQL data is discrete, time-ordered data. In practical applications, the identification period of inefficient SQL (for example, once every 5 minutes) is usually much longer than the data acquisition period, so the acquired SQL data must be further discretized into time windows. The size of the time window equals the identification period of inefficient SQL; for example, if the identification period is 5 minutes, the time window is set to 5 minutes.

In addition, since one SQL statement (SQL text) may be executed multiple times within a time window and therefore produce multiple pieces of SQL data, the SQL data must be processed so that the features of the same SQL statement are combined.

S503: and determining SQL features according to the divided SQL data.

The SQL features comprise historical SQL features and current SQL features.

In a specific implementation, the median of the SQL execution time is taken, and the average is taken of the SQL time consumption, the SQL locking time, the number of rows sent by the SQL and the number of rows scanned by the SQL. Taking the median of the SQL execution time means taking the midpoint between the earliest and the latest execution time of the same SQL text within the time window. For example, if the earliest SQL execution time is T₁ and the latest SQL execution time is Tₙ, the value is (T₁ + Tₙ) / 2.

The SQL features of each time window are organized in time order and stored in an SQL feature record table. The current SQL features serve as input to the subsequent data analysis module. Table 6 is the SQL feature record table. As shown in Table 6, T_Intv represents the start and end time of the time window, e.g., T₁-T₃₀, T₃₁-T₆₀ and T₆₁-T₉₀; SQL and host_IP2 are respectively the SQL text and the IP address of the database container executing the SQL; SQL_exeTime is the median of the execution time of the SQL text in the time window T_Intv; and SQL_duration, SQL_LockTime, SQL_LowNum and SQL_Scan_LowNum respectively represent the average SQL time consumption, the average SQL locking time, the average number of rows sent by the SQL and the average number of rows scanned by the SQL in the T_Intv time window.

TABLE 6
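The merging of repeated executions of the same SQL text into one feature row per time window can be sketched in Python as follows. This is an illustrative sketch only: the dictionary layout of a record, the `T2` key used for windowing (the collection time) and the function name `sql_features` are assumptions; the column names follow Tables 5 and 6.

```python
from collections import defaultdict
from statistics import mean

# Columns that are averaged per SQL text and time window (Table 6).
AVERAGED = ("SQL_duration", "SQL_LockTime", "SQL_LowNum", "SQL_Scan_LowNum")


def sql_features(records, window_seconds=300):
    """Combine records of the same SQL text within each time window."""
    groups = defaultdict(list)
    for r in records:
        # Assumption: records are assigned to windows by collection time T2.
        start = int(r["T2"] // window_seconds) * window_seconds
        groups[(start, r["SQL"], r["host_IP2"])].append(r)

    rows = []
    for (start, sql, host), group in sorted(groups.items(), key=lambda kv: kv[0]):
        exe_times = [r["SQL_exeTime"] for r in group]
        row = {
            "T_Intv": (start, start + window_seconds),
            "SQL": sql,
            "host_IP2": host,
            # Midpoint of the earliest and latest execution time.
            "SQL_exeTime": (min(exe_times) + max(exe_times)) / 2,
        }
        for name in AVERAGED:
            row[name] = mean(r[name] for r in group)
        rows.append(row)
    return rows
```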

The specific flow of the embodiment of the invention is as follows:

1. Historical database container performance data and historical SQL data are collected.

2. Denoising processing and median filling processing are performed on the historical database container performance data.

3. The historical database container performance data subjected to denoising processing and median filling processing is divided according to a preset time window, and the historical database container performance characteristics are determined according to the divided historical database container performance data. The historical SQL data is divided according to the time window, and the historical SQL features are determined according to the divided historical SQL data.

4. A historical database container identification result is acquired, and a historical database container sample is formed from the historical database container performance characteristics and the historical database container identification result.

5. A first preset number of samples is selected from the historical database container samples as a database container training set, segmentation nodes of a database container decision tree are determined according to the historical database container identification results and the historical database container performance characteristics in the database container training set, and a database container decision tree model is created according to the segmentation nodes of the database container decision tree.

6. A historical SQL identification result is acquired, and a historical SQL sample is formed from the historical SQL features and the historical SQL identification result.

7. A second preset number of samples is selected from the historical SQL samples as an SQL training set, segmentation nodes of an SQL decision tree are determined according to the historical SQL identification results and the historical SQL features in the SQL training set, and an SQL decision tree model is created according to the segmentation nodes of the SQL decision tree.

8. Current database container performance data and current SQL data are collected.

9. Denoising processing and median filling processing are performed on the current database container performance data.

10. The current database container performance data subjected to denoising processing and median filling processing is divided according to the preset time window, and the current database container performance characteristics are determined according to the divided current database container performance data. The current SQL data is divided according to the time window, and the current SQL features are determined according to the divided current SQL data.

11. The current database container performance characteristics and the current SQL features located in the same time window are acquired.

12. The current database container performance characteristics are input into the database container decision tree model to obtain the current database container identification result, and the current SQL features are input into the SQL decision tree model to obtain the current SQL identification result.

13. When the current database container identification result is abnormal, whether the current SQL identification result is abnormal is judged.

14. The first fault location information is uploaded when the current SQL identification result is normal; otherwise the second fault location information is uploaded.
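The branching in steps 13 and 14 can be summarized by a short Python sketch. The function name `locate_fault`, the boolean inputs and the dictionary payloads are hypothetical and chosen only for illustration; the behaviour when the container identification result is normal is not specified by these steps, so the sketch simply returns nothing in that case.

```python
def locate_fault(container_abnormal, sql_abnormal, first_info, second_info):
    """Choose which fault location information to upload.

    `first_info` stands for the container-level information (first fault
    location information) and `second_info` for the SQL-level information
    (second fault location information); both are plain dicts here.
    """
    if not container_abnormal:
        # Step 13 only applies when the container result is abnormal.
        return None
    if not sql_abnormal:
        # Container abnormal, SQL normal: upload the first information.
        return {"type": "first", **first_info}
    # Container abnormal and SQL abnormal: upload the second information.
    return {"type": "second", **second_info}
```

For example, `locate_fault(True, False, {"thread_number": 95}, {})` selects the first fault location information, which points operation and maintenance personnel at the container configuration rather than at a specific SQL statement.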

In summary, to identify database container performance bottlenecks and the related inefficient SQL more accurately and efficiently, the invention discloses a database container fault locating method. Based on a random forest algorithm, the method divides and processes the database container performance data and the SQL data in fixed time windows and establishes a unified time-sequence relation between them, so that the database container performance bottleneck in each time window and the inefficient SQL causing it can be identified accurately and efficiently, faults are reported to operation and maintenance personnel in real time, and a decision basis is provided for daily production operation and maintenance of the database. On the one hand, this improves the efficiency of identifying database container performance bottlenecks and speeds up the response of production operation and maintenance personnel to database container performance problems; on the other hand, it improves the accuracy of identifying inefficient SQL, helping production operation and maintenance personnel quickly locate the cause of a problem and resolve it sooner.

Based on the same inventive concept, the embodiment of the invention also provides a database container fault positioning system, and because the principle of solving the problem of the system is similar to that of the database container fault positioning method, the implementation of the system can refer to the implementation of the method, and the repetition is omitted.

FIG. 13 is a block diagram of a database container fault localization system in accordance with an embodiment of the present invention. As shown in fig. 13, the database container fault localization system includes:

In one embodiment, the system further comprises: a database container decision tree model creation unit for:

acquiring a historical database container sample; wherein the historical database container sample comprises historical database container performance characteristics and historical database container identification results;

selecting a first preset number of samples from the historical database container samples as a database container training set;

determining segmentation nodes of a database container decision tree according to a historical database container identification result and historical database container performance characteristics in a database container training set;

creating a database container decision tree model according to the segmentation nodes of the database container decision tree.
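How a segmentation node might be determined from a training set can be illustrated with a small Python sketch. The text does not state the splitting criterion, so the use of Gini impurity here is an assumption (it is a common criterion for decision trees and random forests), as are the function names `gini` and `best_split` and the encoding of identification results as 0 (normal) and 1 (abnormal).

```python
def gini(labels):
    """Gini impurity of a list of binary labels (0 = normal, 1 = abnormal)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_split(rows, labels):
    """Return (feature index, threshold, weighted impurity) of the best split.

    `rows` is a list of feature vectors (for example, the windowed
    performance characteristics); candidate thresholds are midpoints
    between consecutive distinct feature values.
    """
    best = (None, None, float("inf"))
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= thr]
            right = [y for r, y in zip(rows, labels) if r[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, thr, score)
    return best
```

Applying `best_split` recursively to the two resulting subsets would grow one tree; a random forest would repeat this on resampled training sets and feature subsets.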

In one embodiment, the system further comprises: an SQL decision tree model creation unit for:

acquiring a historical SQL sample; the historical SQL sample comprises historical SQL features and historical SQL identification results;

selecting a second preset number of samples from the historical SQL samples as an SQL training set;

determining segmentation nodes of the SQL decision tree according to the historical SQL recognition result and the historical SQL features in the SQL training set;

creating an SQL decision tree model according to the segmentation nodes of the SQL decision tree.

In one embodiment, the system further comprises:

the database container performance data acquisition unit is used for acquiring database container performance data; wherein the database container performance data comprises historical database container performance data and current database container performance data;

the preprocessing unit is used for carrying out denoising processing and median filling processing on the database container performance data;

the first dividing unit is used for dividing the database container performance data subjected to noise reduction processing and median filling processing according to a preset time window;

a database container performance characteristic determining unit for determining database container performance characteristics according to the partitioned database container performance data; wherein the database container performance characteristics include historical database container performance characteristics and current database container performance characteristics.
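The preprocessing performed before partitioning can be sketched in Python as below. Only median filling is named in the text; the specific denoising rule used here (treating values outside a plausible range as noise and marking them missing) is an assumption of this sketch, as are the function names `denoise` and `median_fill`.

```python
from statistics import median


def denoise(values, lo, hi):
    """Mark values outside [lo, hi] as missing (hypothetical noise rule)."""
    return [v if v is not None and lo <= v <= hi else None for v in values]


def median_fill(values):
    """Replace missing entries (None) with the median of the present values."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to derive a median from
    m = median(present)
    return [m if v is None else v for v in values]
```

For a CPU utilization series expressed in percent, `median_fill(denoise(series, 0, 100))` would drop impossible readings and fill both the dropped and the originally missing points with the median of the remaining values.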

In one embodiment, the system further comprises:

the SQL data acquisition unit is used for acquiring SQL data; the SQL data comprises historical SQL data and current SQL data;

the second dividing unit is used for dividing SQL data according to the time window;

the SQL feature determining unit is used for determining SQL features according to the divided SQL data; the SQL features comprise historical SQL features and current SQL features.

In summary, the database container fault location system of the embodiment of the invention respectively inputs the current database container performance characteristics and the current SQL characteristics of the same time window into respective decision tree models to obtain the current database container identification result and the current SQL identification result, and uploads the fault location information according to the current SQL identification result when the current database container identification result is abnormal, so that the database container fault can be accurately and efficiently identified and located.

The embodiment of the invention also provides a concrete implementation mode of the computer equipment capable of realizing all the steps in the database container fault locating method in the embodiment. Fig. 14 is a block diagram of a computer device in an embodiment of the present invention, and referring to fig. 14, the computer device specifically includes:

a processor 1401 and a memory 1402.

The processor 1401 is configured to invoke a computer program in the memory 1402, where the processor executes the computer program to implement all the steps in the database container fault locating method in the above embodiment, for example, the processor executes the computer program to implement the following steps:

In summary, the computer device in the embodiment of the invention respectively inputs the performance characteristics of the current database container and the current SQL characteristics of the same time window into the respective decision tree models to obtain the current database container identification result and the current SQL identification result, and when the current database container identification result is abnormal, the computer device uploads the fault positioning information according to the current SQL identification result, so that the fault of the database container can be accurately and efficiently identified and positioned.

The embodiment of the present invention also provides a computer readable storage medium capable of implementing all the steps of the method for locating a fault in a database container in the above embodiment, where the computer readable storage medium stores a computer program, and when the computer program is executed by a processor, the computer program implements all the steps of the method for locating a fault in a database container in the above embodiment, for example, the processor implements the following steps when executing the computer program:

In summary, the computer readable storage medium of the embodiment of the invention respectively inputs the performance characteristics and the current SQL characteristics of the current database container in the same time window into the respective decision tree models to obtain the current database container identification result and the current SQL identification result, and when the current database container identification result is abnormal, the fault locating information is uploaded according to the current SQL identification result, so that the fault of the database container can be accurately and efficiently identified and located.

The foregoing description of the embodiments is provided to illustrate the general principles of the invention and is not intended to limit the scope of the invention to the particular embodiments; any modifications, equivalents, improvements and the like that fall within the spirit and principles of the invention are intended to be included within the scope of the invention.

Those of skill in the art will further appreciate that the various illustrative logical blocks, units, and steps described in connection with the embodiments of the invention may be implemented by electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, units, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design requirements of the overall system. Those skilled in the art may implement the described functionality in varying ways for each particular application, but such implementation is not to be understood as going beyond the scope of the embodiments of the present invention.

The various illustrative logical blocks, or units, or devices described in the embodiments of the invention may be implemented or performed with a general purpose processor, a digital signal processor, an Application Specific Integrated Circuit (ASIC), a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described. A general purpose processor may be a microprocessor, but in the alternative, the general purpose processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a digital signal processor and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a digital signal processor core, or any other similar configuration.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. In an example, a storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC, which may reside in a user terminal. In the alternative, the processor and the storage medium may reside as distinct components in a user terminal.

In one or more exemplary designs, the above-described functions of embodiments of the present invention may be implemented in hardware, software, firmware, or any combination of the three. If implemented in software, the functions may be stored on a computer-readable medium or transmitted as one or more instructions or code on the computer-readable medium. Computer-readable media include both computer storage media and communication media that facilitate transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer. For example, such computer-readable media may include, but are not limited to, RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that may be used to carry or store program code in the form of instructions or data structures and that may be read by a general or special purpose computer, or a general or special purpose processor. Further, any connection is properly termed a computer-readable medium; for example, if the software is transmitted from a website, server, or other remote source via a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then these are also included in the definition of computer-readable medium. Disk and disc, as used herein, include compact discs, laser discs, optical discs, DVDs, floppy disks, and Blu-ray discs, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above may also be included within the computer-readable media.

Claims

1. A method for locating a database container failure, comprising:

when the current database container identification result is abnormal, judging whether the current SQL identification result is abnormal or not;

uploading first fault location information when the current SQL identification result is normal, otherwise uploading second fault location information;

the first fault locating information comprises a performance analysis result of a database container, an actual thread number of the container, a set maximum thread number of the container, an actual connection number of the database and a set maximum connection number of the database; the second fault location information comprises a database container with performance problems, an IP port of a database instance with abnormal SQL, abnormal SQL and index names, index fields and index field ordering of tables related to the abnormal SQL.

2. The database container fault localization method of claim 1, wherein creating a database container decision tree model comprises:

acquiring a historical database container sample; wherein the historical database container sample comprises historical database container performance characteristics and historical database container identification results;

determining segmentation nodes of a database container decision tree according to the historical database container identification result and the historical database container performance characteristics in the database container training set;

and creating the database container decision tree model according to the segmentation nodes of the database container decision tree.

3. The method of claim 1, wherein creating an SQL decision tree model comprises:

determining segmentation nodes of an SQL decision tree according to the historical SQL recognition result and the historical SQL features in the SQL training set;

and creating the SQL decision tree model according to the segmentation nodes of the SQL decision tree.

4. The database container fault location method of claim 1, further comprising:

collecting database container performance data; wherein the database container performance data comprises historical database container performance data and current database container performance data;

denoising and median filling the database container performance data;

dividing database container performance data subjected to noise reduction processing and median filling processing according to a preset time window;

determining database container performance characteristics according to the partitioned database container performance data; wherein the database container performance characteristics include the historical database container performance characteristics and the current database container performance characteristics.

5. The database container fault location method as claimed in claim 4, further comprising:

collecting SQL data; the SQL data comprises historical SQL data and current SQL data;

dividing the SQL data according to the time window;

determining SQL features according to the divided SQL data; wherein the SQL features comprise a historical SQL feature and the current SQL feature.

6. A database container fault location system, comprising:

a database container identification result unit, configured to input the current database container performance characteristic into a database container decision tree model created based on the historical database container performance characteristic and the historical database container identification result under a random forest algorithm, to obtain a current database container identification result;

the SQL recognition result unit is used for inputting the current SQL feature into an SQL decision tree model created based on the historical SQL feature and the historical SQL recognition result under a random forest algorithm to obtain a current SQL recognition result;

the uploading unit is used for uploading the first fault location information when the current SQL identification result is normal, or uploading the second fault location information when the current SQL identification result is not normal;

7. The database container fault location system of claim 6, further comprising: a database container decision tree model creation unit for:

8. The database container fault location system of claim 6, further comprising: an SQL decision tree model creation unit for:

9. The database container fault location system of claim 6, further comprising:

a database container performance characteristic determining unit, configured to determine the database container performance characteristic according to the partitioned database container performance data; wherein the database container performance characteristics include the historical database container performance characteristics and the current database container performance characteristics.

10. The database container fault location system of claim 9, further comprising:

the second dividing unit is used for dividing the SQL data according to the time window;

the SQL feature determining unit is used for determining the SQL feature according to the divided SQL data; wherein the SQL features comprise a historical SQL feature and the current SQL feature.

11. A computer device comprising a memory, a processor and a computer program stored on the memory and running on the processor, characterized in that the processor implements the steps of the database container fault localization method of any one of claims 1 to 5 when the computer program is executed.

12. A computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor implements the steps of the database container fault localization method of any of claims 1 to 5.