CN113297307B

CN113297307B - Database request identification and anomaly detection methods, devices, equipment and media

Info

Publication number: CN113297307B
Application number: CN202011486855.1A
Authority: CN
Inventors: 殷征; 陈旭; 李广望; 李飞飞
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2020-08-15
Filing date: 2020-12-16
Publication date: 2024-03-05
Anticipated expiration: 2040-12-16
Also published as: CN113297307A

Abstract

The present disclosure provides a method, apparatus, device, and medium for identifying database requests and detecting database anomalies. In the embodiments of the present disclosure, a database request set within a set time period may be obtained. After a probability distribution of the response durations of the database requests in the set is determined for that time period, a target database request is determined according to the probability distribution. The target database request includes a database request whose response duration is greater than a first set threshold and whose distribution probability is lower than a set probability threshold. The identified target database request can be used to accurately locate the cause of abnormal database operation.

Description

Database request identification and anomaly detection methods, devices, equipment and media

Technical Field

The present disclosure relates to the field of database technologies, and in particular, to a method, an apparatus, a device, and a medium for identifying a database request and detecting an anomaly.

Background

With the growth of the cloud database market, identifying slow database requests (slow SQL, where SQL stands for Structured Query Language) is critical to maintaining the stability of services. Some database systems automatically record detailed processing information for each SQL request, and an SQL statement whose execution time or response time exceeds a set threshold is regarded as slow SQL. How to identify the cause of slow SQL is a technical problem to be solved.

Disclosure of Invention

In order to overcome the problems in the related art, the present specification provides a method, an apparatus, a device, and a medium for identifying database requests and detecting anomalies.

According to a first aspect of the embodiments of the present specification, there is provided a database request identification method, the method including:

acquiring a database request set in a set time period;

after determining a probability distribution of the response durations of the database requests in the database request set within the set time period, determining a target database request according to the probability distribution, where the target database request includes a database request whose response duration is greater than a first set threshold and whose distribution probability is lower than a set probability threshold.

According to a second aspect of embodiments of the present specification, there is provided a database anomaly detection method, the method comprising:

acquiring a database request set in a set time period;

after determining a probability distribution of the response durations of the database requests in the database request set within the set time period, determining a target database request according to the probability distribution, where the target database request includes a database request whose response duration is greater than a first set threshold and whose distribution probability is lower than a set probability threshold;

and identifying the database anomaly cause category to which the target database request belongs.

According to a third aspect of embodiments of the present specification, there is provided an acquisition method of a database anomaly detection model, including:

acquiring a set of historical target database requests, and acquiring a plurality of historical key anomalous performance indexes corresponding to each historical target database request in the set;

performing cluster analysis on the historical target database requests in the set according to the similarity of their historical key anomalous performance indexes to obtain a plurality of cluster category results, where each cluster category result includes at least one historical target database request, each historical target database request corresponds to historical key anomalous performance indexes and anomaly features, and each cluster category corresponds to one database anomaly cause category;

training a machine learning model using the plurality of cluster category results, where the trained machine learning model is used to identify the database anomaly cause category to which a target database request belongs.
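The clustering and training steps of the third aspect can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the specification: each historical target request is represented by the set of names of its key anomalous performance indexes (hypothetical names such as `cpu_util`), similarity is assumed to be Jaccard similarity, clustering is a simple greedy leader algorithm, the cause label of each cluster is assumed to be supplied by an operator, and a nearest-cluster classifier stands in for the machine learning model.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity between two sets of anomalous performance-index names."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


@dataclass
class Cluster:
    leader: FrozenSet[str]  # index set of the first request placed in this cluster
    members: List[str] = field(default_factory=list)


def cluster_history(history: Dict[str, FrozenSet[str]], min_sim: float = 0.5) -> List[Cluster]:
    """Greedy leader clustering (an assumed stand-in for the cluster analysis step):
    a request joins the first cluster whose leader is at least `min_sim` similar,
    otherwise it starts a new cluster."""
    clusters: List[Cluster] = []
    for req_id, indexes in history.items():
        for cluster in clusters:
            if jaccard(indexes, cluster.leader) >= min_sim:
                cluster.members.append(req_id)
                break
        else:
            clusters.append(Cluster(leader=indexes, members=[req_id]))
    return clusters


class NearestClusterClassifier:
    """Hypothetical stand-in for the trained model: predicts the cause label
    of the cluster whose leader is most similar to the new request."""

    def __init__(self, clusters: List[Cluster], cause_labels: List[str]):
        if len(clusters) != len(cause_labels):
            raise ValueError("need exactly one cause label per cluster")
        self.leaders = [c.leader for c in clusters]
        self.labels = cause_labels

    def predict(self, indexes: FrozenSet[str]) -> str:
        scores = [jaccard(indexes, leader) for leader in self.leaders]
        return self.labels[scores.index(max(scores))]
```

For example, with the hypothetical history `{"q1": frozenset({"cpu_util"}), "q2": frozenset({"cpu_util", "load"}), "q3": frozenset({"io_util", "io_wait"}), "q4": frozenset({"io_wait"})}`, `cluster_history` yields two clusters, `["q1", "q2"]` and `["q3", "q4"]`. A production system would more likely use an established clustering algorithm and a genuinely trained classifier.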

According to a fourth aspect of embodiments of the present specification, there is provided a database request identifying apparatus, the apparatus comprising:

A request acquisition module, configured to acquire a database request set within a set time period;

a request identification module, configured to: after a probability distribution of the response durations of the database requests in the database request set within the set time period is determined, determine a target database request according to the probability distribution, where the target database request includes a database request whose response duration is greater than a first set threshold and whose distribution probability is lower than a set probability threshold.

According to a fifth aspect of embodiments of the present specification, there is provided a database abnormality detection apparatus, the apparatus including:

a request identification module, configured to: after a probability distribution of the response durations of the database requests in a database request set within a set time period is determined, determine a target database request according to the probability distribution, where the target database request includes a database request whose response duration is greater than a first set threshold and whose distribution probability is lower than a set probability threshold;

a cause identification module, configured to identify the database anomaly cause category to which the target database request belongs.

According to a sixth aspect of the embodiments of the present specification, there is provided an acquisition apparatus of a database abnormality detection model, including:

an acquisition module, configured to acquire a set of historical target database requests and a plurality of historical key anomalous performance indexes corresponding to each historical target database request in the set;

a clustering module, configured to perform cluster analysis on the historical target database requests in the set according to the similarity of their historical key anomalous performance indexes to obtain a plurality of cluster category results, where each cluster category result includes at least one historical target database request, each historical target database request corresponds to historical key anomalous performance indexes and anomaly features, and each cluster category corresponds to one database anomaly cause category;

a training module, configured to train a machine learning model using the plurality of cluster category results, where the trained machine learning model is used to identify the database anomaly cause category to which a target database request belongs.

According to a seventh aspect of the embodiments of the present specification, there is provided a system, including a database anomaly detection end, a database server, and a user request end;

the user request end is configured to send a database request to the database server;

the database server is configured to respond to the database request;

the database anomaly detection end is configured for:

acquiring a database request set within a set time period;

after determining a probability distribution of the response durations of the database requests in the database request set within the set time period, determining a target database request according to the probability distribution, where the target database request includes a database request whose response duration is greater than a first set threshold and whose distribution probability is lower than a set probability threshold; and identifying the database anomaly cause category to which the target database request belongs.

According to an eighth aspect of embodiments of the present specification, there is provided a computer device including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the aforementioned database request identification method, database abnormality detection method, or acquisition method of a database abnormality detection model when executing the program.

According to a ninth aspect of the embodiments of the present specification, there is provided a computer-readable storage medium storing computer instructions that cause the computer to execute the aforementioned database request identification method, database abnormality detection method, or acquisition method of a database abnormality detection model.

The technical solutions provided by the embodiments of this specification may have the following beneficial effects:

In the embodiments of this specification, a target database request can be identified among a plurality of database requests: its response time is longer than the first set threshold and its distribution probability is lower than the set probability threshold. Because target database requests characterize an abnormal operating state of the database, they can be used to identify the database anomaly cause category, so that the root cause of slow database requests can be determined and the database system optimized.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the specification and together with the description, serve to explain the principles of the specification.

FIG. 1A is a flowchart illustrating a method of database request identification according to an exemplary embodiment of the present description.

FIG. 1B is a graph comparing the probability distributions of normal SQL, iSQ, and general slow SQL according to an exemplary embodiment of the present disclosure.

Fig. 2A is a flowchart illustrating a method of database anomaly detection according to an exemplary embodiment of the present disclosure.

Fig. 2B is a schematic diagram of spikes on two time-series features according to an exemplary embodiment of the present specification.

FIG. 2C is a schematic diagram of a mean up-shift or mean down-shift on two time-series features according to an exemplary embodiment of the present specification.

Fig. 2D is a flowchart illustrating a method for acquiring a database anomaly detection model according to an exemplary embodiment of the present disclosure.

Fig. 2E is a schematic diagram illustrating a similarity comparison of iSQ1 and iSQ2 according to an exemplary embodiment of the present disclosure.

Fig. 3 is a hardware architecture diagram of a computer device according to an exemplary embodiment shown in the present specification.

Fig. 4 is a schematic diagram of a database request recognition apparatus according to an exemplary embodiment of the present specification.

Fig. 5 is a schematic diagram of a database anomaly detection apparatus according to an exemplary embodiment of the present specification.

Fig. 6 is a schematic diagram of an acquisition apparatus of a database anomaly detection model according to an exemplary embodiment of the present specification.

Fig. 7 is a schematic diagram of a system architecture according to an exemplary embodiment of the present disclosure.

Detailed Description

Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present specification. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present description as detailed in the accompanying claims.

The terminology used in the description presented herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the description. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, third, etc. may be used in this specification to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present description. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".

With the growth of the cloud database market, identifying slow database requests is critical to maintaining the stability of database services. A database system may automatically record the details of SQL (Structured Query Language) database requests whose processing time exceeds a user-defined threshold; such requests are slow database requests (slow SQL). That is, a slow database request is a request with a long response time. Some solutions focus on identifying slow database requests and optimizing them accordingly.

One way to define slow SQL is to filter problem SQL with a threshold (e.g., SQL execution time exceeding 1 s), but this is too coarse-grained: slow SQL is not necessarily caused by problems inside the system, because much slow SQL is inherently slow to execute, such as full-table scans over large amounts of data, complex nested queries, or queries without indexes. SQL that is slow because of how the query is written can be optimized by rewriting it, and some SQL can be optimized through index recommendation. Some SQL, however, has little room for optimization yet still takes longer than 1 second to execute, so troubleshooting slow SQL is often disturbed by SQL that is slow for reasons intrinsic to the SQL itself. A filtering rule such as "execution time longer than 1 second" therefore cannot reliably screen out the SQL that truly indicates a problem, nor quickly reveal the real cause of slow SQL.

When analyzing slow SQL, the inventors found that among the many slow SQL statements there is a class whose execution time is much longer than its own historical execution time, and why the same SQL becomes slow deserves attention. Further research showed that operational faults of the database system are preceded by the appearance of such SQL: a fault in database system operation makes some SQL take a long time to execute, whereas when the database system operates normally, the same SQL executes in a normal, shorter time. Slow SQL caused by database system operational faults therefore executes more slowly than the same SQL historically executes under normal conditions, so identifying, among many slow SQL statements, those caused by database system operational faults is of great significance.

In the embodiments of this specification, a slow database request caused by an abnormal operating state of the database system is called an iSQ (iSQs, intermittent slow queries); that is, an iSQ is caused by something other than the SQL itself, for example, abnormal operation of the database system software layer or the database machine layer.

Based on this definition of iSQ, an embodiment of the present disclosure provides a database request identification method that can identify iSQs among a plurality of database requests. As shown in Fig. 1A, the method includes the following steps:

In step S102, a database request set within a set time period is acquired;

In step S104, after a probability distribution of the response durations of the database requests in the database request set within the set time period is determined, a target database request is determined according to the probability distribution, where the target database request includes a database request whose response duration is greater than a first set threshold and whose distribution probability is lower than a set probability threshold.

In this embodiment, the target database request needs to be identified from among many slow database requests. A study of historical slow database requests found that, among many slow database requests, iSQs execute much more slowly than their own historical execution times and are caused by intermittent operational anomalies of the database system. From a temporal perspective, general slow SQL, which is typically caused by the SQL itself, persists regardless of whether the database is operating abnormally, whereas an iSQ runs normally while the database operates normally and becomes slow only when the database system is abnormal; that is, iSQs are intermittent.

As an example, a database request set within a set time period is acquired first. The set time period in this embodiment can be understood as an observation time window whose duration can be configured flexibly according to the actual service. For example, if the time period is set to 10 minutes, 10 minutes serves as the observation window, and every 10 minutes iSQs are identified from the database request set of that window. The obtained database request set may include the full set of database requests, i.e., both normal and slow database requests, or only slow database requests. The response duration of each database request may be determined as the time from when the request is initiated until the database system returns feedback after executing it.
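Collecting requests per observation window and computing their response durations might look like the following minimal sketch. The record layout (a hypothetical `RequestRecord` with start and end timestamps in seconds) is an assumption; a real system would read these values from the database's request log. The 600-second default mirrors the 10-minute example above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RequestRecord:
    sql_id: str
    start_ts: float  # seconds since epoch when the request was initiated (assumed field)
    end_ts: float    # seconds since epoch when the database returned feedback (assumed field)

    @property
    def response_time(self) -> float:
        """Response duration: from initiation to feedback after execution."""
        return self.end_ts - self.start_ts


def group_by_window(records: List[RequestRecord], window_s: float = 600.0) -> Dict[int, List[RequestRecord]]:
    """Bucket records into consecutive observation windows keyed by window index."""
    windows: Dict[int, List[RequestRecord]] = defaultdict(list)
    for record in records:
        windows[int(record.start_ts // window_s)].append(record)
    return dict(windows)
```

Keying windows by the request start time is one design choice; keying by completion time would instead attribute long-running requests to the window in which they finished.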

Given the large number of database requests, iSQs as defined above are identified by means of a probability distribution: first, the probability distribution of the response durations of the database requests is determined, and then the target database request is determined from that distribution, where the target database request includes a slow database request whose response duration is greater than a first set threshold and whose distribution probability is lower than a set probability threshold. The response-duration threshold that separates normal database requests from slow database requests is referred to in this embodiment as a second set threshold and can be set according to the actual service. The first set threshold, used when determining the target database request, may be greater than or equal to the second set threshold and can likewise be set according to the actual service.
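The relationship between the two duration thresholds and the probability threshold can be captured in a small configuration object. This is a hypothetical sketch: the class and field names are illustrative, and the checks encode the constraint stated above (the first set threshold is not smaller than the second set threshold) plus the assumption that the probability threshold lies strictly between 0 and 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    second_s: float  # "second set threshold": boundary between normal and slow requests, in seconds
    first_s: float   # "first set threshold": minimum response duration of a target request, in seconds
    prob: float      # "set probability threshold" on the response-duration distribution

    def __post_init__(self) -> None:
        if self.first_s < self.second_s:
            raise ValueError("first set threshold must be >= second set threshold")
        if not 0.0 < self.prob < 1.0:
            raise ValueError("probability threshold must be in (0, 1)")


def classify_duration(rt_s: float, th: Thresholds) -> str:
    """Label a request 'normal' or 'slow' using the second set threshold."""
    return "slow" if rt_s > th.second_s else "normal"


def exceeds_first(rt_s: float, th: Thresholds) -> bool:
    """Duration condition for a target request; the probability condition is checked separately."""
    return rt_s > th.first_s
```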

Fig. 1B compares the probability distributions of normal SQL, iSQ, and general slow SQL. In Fig. 1B, 1 second is used as the boundary between slow database requests and normal requests; other values, such as 2 seconds, may be used in practice, and this embodiment is not limited thereto. The horizontal axis is the response duration (Query time) of a request, and the vertical axis is the probability distribution of the response duration, i.e., the density (Density) of each of the three kinds of SQL among all SQL. In scenarios such as cloud databases or distributed databases there are many database nodes, so even if some databases operate abnormally, normal SQL still makes up the large majority and therefore has the highest probability in the distribution: as shown in Fig. 1B, a large number of normal SQL with short response times are concentrated together with a large probability. General slow SQL (in Fig. 1B, SQL whose response time Xt exceeds 1 second) is also numerous, with a probability of about 0.5 in the distribution, while iSQs are few and have a low probability P. Therefore, in this embodiment, database requests whose response duration is longer than the first set threshold and whose distribution probability is lower than the set probability threshold are determined to be iSQs according to the probability distribution.
The first set threshold and the set probability threshold can be configured flexibly as needed. For example, the first set threshold may be 1 second, i.e., the same as the boundary between slow database requests and normal requests, or it may be greater than 1 second. In other words, if the boundary between slow database requests and normal requests is referred to in this embodiment as the second set threshold, the first set threshold may be greater than or equal to the second set threshold.
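The selection rule can be illustrated with an empirical distribution. In this minimal sketch the "distribution probability" of a request is assumed to be the fraction of requests in the window whose response duration falls into the same fixed-width bin; the specification does not prescribe a density estimator, and a kernel density estimate or a per-SQL historical distribution could be substituted. The function names and the 0.25-second bin width are illustrative.

```python
from collections import Counter
from typing import Dict, List


def bin_probabilities(durations: List[float], bin_width: float) -> Dict[int, float]:
    """Empirical probability of each fixed-width response-duration bin."""
    counts = Counter(int(d // bin_width) for d in durations)
    total = len(durations)
    return {b: c / total for b, c in counts.items()}


def identify_targets(durations: Dict[str, float], first_threshold_s: float,
                     prob_threshold: float, bin_width: float = 0.25) -> List[str]:
    """IDs of requests slower than the first set threshold whose bin probability
    is below the set probability threshold (assumed histogram-based estimate)."""
    if not durations:
        return []
    probs = bin_probabilities(list(durations.values()), bin_width)
    return [
        req_id
        for req_id, d in durations.items()
        if d > first_threshold_s and probs[int(d // bin_width)] < prob_threshold
    ]
```

With 80 normal requests at 0.05 s, 15 general slow requests at 1.6 s, and a single request at 3.0 s, only the 3.0 s request (bin probability 1/96) is selected when the first set threshold is 1 s and the probability threshold is 0.05; the 1.6 s requests are slow but too common (15/96) to qualify.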

In this embodiment, slow database requests can be obtained in several ways. Generally, slow database requests are initiated by requesters, and a database request whose response time is greater than the first set threshold can be determined to be a slow database request according to the database system's response time for each request. In some examples, this can be accomplished with a probe (liveness-probe) database request. A probe database request is a heartbeat request used to detect whether a database instance (also called a database node) is available: a connection to the database is maintained in real time, and an SQL statement that updates a timestamp in a probe table is sent to measure the heartbeat of the database.

As an example, obtaining a slow database request includes:

sending a probe database request to a database instance and monitoring the response time of the probe database request;

and if the response time of the probe database request is greater than the first set threshold, determining that the probe database request is a slow database request.
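A probe request of this kind can be sketched with Python's built-in `sqlite3` module standing in for a real database instance. The table name `probe`, its columns, and the helper names are hypothetical; in practice the probe would run against each production instance through that instance's own driver, and the threshold would be the first set threshold configured for the service.

```python
import sqlite3
import time


def make_probe_db() -> sqlite3.Connection:
    """Create an in-memory database with a one-row probe table (stand-in for a real instance)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE probe (id INTEGER PRIMARY KEY, ts REAL)")
    conn.execute("INSERT INTO probe (id, ts) VALUES (1, 0)")
    conn.commit()
    return conn


def send_probe(conn: sqlite3.Connection) -> float:
    """Update the probe timestamp and return the observed response time in seconds."""
    start = time.perf_counter()
    conn.execute("UPDATE probe SET ts = ? WHERE id = 1", (time.time(),))
    conn.commit()
    return time.perf_counter() - start


def is_slow_probe(conn: sqlite3.Connection, first_threshold_s: float) -> bool:
    """A probe whose response time exceeds the first set threshold is treated as a slow request."""
    return send_probe(conn) > first_threshold_s
```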

A database server can be understood as consisting of two parts: the physical database and the database management system software. The database management system is a software layer between the user and the physical database. This software layer has a structure, also called the instance structure. When the database is started, the database management system allocates, partitions, and reserves memory areas for various purposes and runs background processes for various purposes, i.e., it creates an instance; the instance then loads and opens the database and finally accesses and controls the various physical structures of the database. In some scenarios, such as large cloud databases, the number of database instances is very large, even exceeding 100,000. Whereas probe database requests are generally used to detect whether a database instance is online and whether its connection is available, this embodiment uses probe database requests to identify target database requests: if a probe SQL becomes an iSQ, the database system is very likely abnormal, so probe iSQs reflect database system anomalies in the embodiments of this specification and make stability detection more direct and effective. In general, the database system responds quickly to probe SQL, so probe iSQs are highly sensitive: the delay of a probe iSQ stands out more clearly than that of other slow SQL, which allows iSQs, and in turn system anomalies, to be identified more quickly.

As can be seen from the foregoing embodiments, the database request identification scheme of the embodiments of the present disclosure can further identify target database requests among a plurality of slow database requests, where the response time of a target database request is longer than its own historical response time; this is important for identifying anomalies in the database system. Once identified, target database requests can be used in various request-related processes. For example, they can be collected to train a machine learning model for tasks such as identifying the features of target database requests or the causes of their occurrence; they can trigger anomaly alarms; or they can be used to identify the cause of a database anomaly.

iSQs tend to appear before a cloud database fails. Because the response time of an iSQ is normal under normal conditions and not actually slow in other normal periods, a sudden increase in the RT (i.e., the response time or execution time of a database SQL request) of an SQL query can greatly affect the business. For example, an extra 0.1 seconds can cost a service provider 1% of sales, and every additional 0.5 seconds of delay can cause a 20% decline in a search service. Similar situations exist in cloud database service scenarios, so finding problem SQL precisely and determining its root cause are technical problems to be solved.

When locating the root cause of database anomalies in large-scale cloud database scenarios, first, slow SQL is common; second, locating iSQs among many slow SQL statements is relatively complex; and even after iSQs are located, identifying their cause is difficult. Database administrators and operations staff may spend a great deal of time checking system logs or system performance indexes, and the cost of locating problems manually keeps rising as the number of cloud database instances grows.

With the database request identification scheme of the foregoing embodiment, target database requests can be located among a plurality of database requests, which can significantly reduce the difficulty of identifying database anomalies. On this basis, Fig. 2A shows a flowchart of a database anomaly detection method according to an exemplary embodiment of the present disclosure, which includes the following steps:

In step S202, a database request set within a set time period is acquired.

In step S204, after a probability distribution of the response durations of the database requests in the database request set within the set time period is determined, a target database request is determined according to the probability distribution, where the target database request includes a database request whose response duration is greater than a first set threshold and whose distribution probability is lower than a set probability threshold.

In step S208, the database anomaly cause category to which the target database request belongs is identified.

For steps S202 to S204, refer to the description of steps S102 and S104 in the embodiment shown in Fig. 1A. Because traditional slow SQL is numerous, it is difficult to locate the root cause of database system anomalies from it, and many slow SQL statements (such as SQL that queries a large amount of data) do not need optimization and do not truly reflect SQL execution efficiency or system state. The embodiments of this application instead perform anomaly detection on the database through iSQs, accurately screening out problem SQL from a new dimension in a data-driven way. By locating this subset of SQL, the overall workload of database anomaly analysis can be significantly reduced, resource utilization improved, the difficulty of identifying database system anomalies lowered, and the causes of database system anomalies identified more quickly.

In this embodiment, the types of the causes of the database abnormality are various, and may be hardware problems of the device or software problems of the database. If the database runs abnormally, the SQL response time is longer because of longer execution time of the SQL from the appearance; but is due to the fact that some problems with the database instance (i.e., the database server devices in the database) cause operational anomalies, which may be due to CPU overload, I/O resource overload, link bottlenecks, etc. from a device perspective, that are represented in data as performance data for the database operating environment, from which one or more performance index data of interest may be extracted. When iSQ is generated, there is a performance anomaly associated with the database operating environment, i.e., iSQ has an association with performance index data, e.g., CPU overload may result in the generation of some class iSQ, I/O resource overload may result in the generation of some class iSQ, etc. Therefore, based on the relevance between iSQ and the performance index data, the iSQ can identify which performance index data are abnormal, and the performance index data are corresponding to specific database abnormal reasons, so that the positioning of the database abnormal reasons is realized. The index of the embodiment is used to measure the performance of the database running, taking the performance index as the CPU utilization rate as an example, the CPU utilization rate is not overloaded under normal conditions, does not reach a value close to 100%, and is normally maintained in a certain range, as an example, is normally kept to float up and down at 70% in a certain period of time. 
However, if a CPU hardware anomaly overloads the CPU, the CPU utilization may climb from the normal 70% to nearly 100%. In the data, a class of iSQ appears within a certain period of time, and the corresponding CPU performance index climbs from a normal value to a value indicating overload. Based on this, for iSQ identified from the SQL of the set time period, the performance index data of the database for the corresponding time period can be acquired, and the cause of the database anomaly can thus be located.

In this embodiment, the performance index data includes data representing at least one performance index. The specific performance indexes can be selected flexibly as needed and may include CPU utilization, I/O utilization, network throughput, controller workload, and the like. The correspondence between the target database request and the performance index data is a correspondence in time. It arises because, when the database runs abnormally, the performance index data of the database operating environment change; that is, the target database request and the change in the performance index data are two consequences of the same database anomaly. Thus, when a target database request is found in a certain time period, the performance index data of the corresponding time period can be acquired for database system anomaly analysis.
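The time correspondence described above amounts to slicing each performance index to a window around the time a target request was observed. The following is a minimal sketch, assuming each index is stored as a list of (timestamp, value) pairs sorted by time; the function names, the storage layout, and the ±60 s default window are hypothetical illustrations, not taken from the disclosure.

```python
from bisect import bisect_left, bisect_right

def kpi_window(series, center_ts, half_width=60.0):
    """Return the samples of one performance index whose timestamps lie in
    [center_ts - half_width, center_ts + half_width].

    `series` is a list of (timestamp, value) pairs sorted by timestamp
    (a hypothetical storage layout)."""
    timestamps = [ts for ts, _ in series]
    lo = bisect_left(timestamps, center_ts - half_width)
    hi = bisect_right(timestamps, center_ts + half_width)
    return series[lo:hi]

def kpis_for_request(kpi_store, request_ts, half_width=60.0):
    """Slice every performance index in `kpi_store` ({name: series}) to the
    time window around a target request's timestamp."""
    return {name: kpi_window(series, request_ts, half_width)
            for name, series in kpi_store.items()}

store = {"cpu_util": [(0, 70.0), (30, 71.0), (60, 98.0), (90, 99.0), (200, 70.0)]}
print(kpis_for_request(store, request_ts=60, half_width=30))
# {'cpu_util': [(30, 71.0), (60, 98.0), (90, 99.0)]}
```

Binary search keeps each lookup logarithmic in the length of the series, which matters when many target requests must be matched against long monitoring histories.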

From a temporal point of view, if the database system is normal, the performance index data do not change greatly; if the database system changes from normal to abnormal, the change is reflected in the performance index data, so the performance index data of this embodiment may be time-series data. Whether a performance index is abnormal can be determined by extracting its time-series features. For example, if the CPU utilization rises suddenly in the time series, the CPU utilization index exhibits a sudden-increase feature, which can indicate that the database system may be abnormal.

In this embodiment, the time-series features of a performance index represent how the index fluctuates over time. A normal database and an abnormal database cause the performance index to fluctuate differently, and certain characteristic fluctuations often indicate a database anomaly. For example, the time-series features may include a spike, a mean shift (upward or downward), or invalidity, which correspond respectively to a sudden increase in the index value, a rise in the overall mean of the index, a fall in the overall mean of the index, and the index dropping to zero or going missing. Fig. 2B is a schematic diagram showing spikes in two time series, and fig. 2C is a schematic diagram of an upward or downward mean shift. Of course, in practical applications, other types of time-series features may be configured as required to characterize whether a performance index is abnormal, which is not limited in this embodiment.
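The four feature types above (spike, upward mean shift, downward mean shift, invalid) can be illustrated with simple statistics that compare an observation window against a reference window. This is a hedged sketch, not the detector of the disclosure: the function name, the median-based shift test, the z-score spike test, and both thresholds are assumptions chosen for illustration.

```python
from statistics import mean, median, pstdev

def timing_features(values, split, z_thresh=3.0, shift_ratio=0.2):
    """Label the anomaly features of one performance-index series.

    values: samples in time order; None marks a missing sample.
    split:  index separating the reference window (before) from the
            observation window (at/after the target request).
    Thresholds are illustrative assumptions, not values from the source."""
    features = set()
    observed = values[split:]
    # Invalid: the index is missing or stuck at zero in the observation window.
    if not observed or any(v is None for v in observed) or all(v == 0 for v in observed):
        features.add("invalid")
        return features
    before = [v for v in values[:split] if v is not None]
    if len(before) < 2:
        return features  # not enough history to judge
    mu = mean(before)
    sigma = pstdev(before) or 1e-9   # avoid a zero spread
    scale = max(abs(mu), 1e-9)       # avoid dividing by a zero mean
    # Mean shift: the typical (median) observed level moved relative to the past.
    rel = (median(observed) - mu) / scale
    if rel > shift_ratio:
        features.add("mean_shift_up")
    elif rel < -shift_ratio:
        features.add("mean_shift_down")
    # Spike: the level did not move, but some sample jumped far above it.
    elif max(observed) > mu + z_thresh * sigma:
        features.add("spike")
    return features

cpu = [70, 71, 69, 70, 70, 71, 69, 70, 70, 99, 70, 70]
print(timing_features(cpu, split=8))  # {'spike'}
```

Using the median for the shift test keeps a single extreme sample from being mistaken for a sustained level change, so a short peak is reported as a spike rather than a mean shift.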

In some examples, the cause of a database anomaly may be located by a machine learning model. As an example, the machine learning model may be used to: identify at least one performance index from the performance index data; identify whether each performance index has at least one abnormal index feature; determine the cluster category to which the abnormal index features of the performance indexes having at least one abnormal feature belong; and determine the database abnormality cause category corresponding to the determined cluster category as the database abnormality cause category of the target database request. Therefore, when the database abnormality cause category needs to be identified, the performance index data of the database operating environment can be acquired, the target database request and the performance index data can be input into the machine learning model, and the machine learning model determines the database abnormality cause category to which the target database request belongs.
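The final mapping step (abnormal features, then cluster category, then cause category) can be pictured with a nearest-prototype lookup. This is a hypothetical stand-in for the machine learning model, not the disclosed model: each cluster is assumed to be summarized by a set of (performance index, feature) pairs, similarity is assumed to be Jaccard overlap, and `min_sim` is an illustrative cut-off.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 1.0 when both are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def classify_cause(anomalies, prototypes, causes, min_sim=0.5):
    """Map a target request's abnormal (index, feature) pairs to a cause category.

    anomalies:  set of (performance_index, feature) pairs found for the request
    prototypes: {cluster_id: set of (performance_index, feature) pairs}
    causes:     {cluster_id: cause category label}
    Returns the cause of the most similar cluster, or None when no cluster
    reaches `min_sim` (a hypothetical cut-off)."""
    best_id, best_sim = None, 0.0
    for cluster_id, proto in prototypes.items():
        sim = jaccard(anomalies, proto)
        if sim > best_sim:
            best_id, best_sim = cluster_id, sim
    if best_id is None or best_sim < min_sim:
        return None
    return causes[best_id]

prototypes = {0: {("cpu_util", "spike"), ("active_sessions", "spike")},
              1: {("io_wait", "mean_shift_up")}}
causes = {0: "CPU overload", 1: "I/O saturation"}
print(classify_cause({("cpu_util", "spike")}, prototypes, causes))  # CPU overload
```

Returning None when nothing is similar enough gives the "cause not identified" case a natural place to hand off to manual calibration.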

In some examples, the machine learning model may be trained using historical data, which may include, for example, historical known database operation anomaly data and historical iSQ data. Fig. 2D is a flowchart of a method of obtaining a database anomaly detection model according to an exemplary embodiment of the present disclosure, comprising the following steps:

in step 212, a historical target database request set and a plurality of historical key abnormal performance indexes corresponding to each historical target database request in the historical target database request set are obtained;

in step 214, cluster analysis is performed on each historical target database request in the historical target database request set by using the similarity of the historical key abnormal performance indexes, to obtain a plurality of cluster category results; each cluster category result comprises at least one historical target database request, each historical target database request corresponds to historical key abnormal performance indexes and abnormal features, and each cluster category corresponds to one database abnormality cause category;

in step 216, a machine learning model is trained using the plurality of cluster category results.

The machine learning model obtained through training is used for identifying the type of the database abnormality reason to which the target database request belongs.

In this embodiment, model training may be performed using historical data, and the training stage is an offline stage. Different database abnormality causes manifest as different abnormal performance indexes and abnormal features, and the abnormal performance indexes corresponding to the same database abnormality cause share a certain similarity. Therefore, by performing cluster analysis on multiple sets of performance index data, abnormal indexes with a certain similarity are clustered together and correspond to one database abnormality cause. Each cluster category thus corresponds to one type of database abnormality cause, and cluster analysis allows large amounts of performance index data to be grouped quickly and automatically.

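As an example, if two iSQs match with high similarity, they are considered to belong to the same class. The similarity S_{ij} between iSQ_i and iSQ_j, based on the categories of performance indexes corresponding to each iSQ, can be expressed as

S_{ij} = \frac{1}{T}\sum_{t=1}^{T}\left|k_{it}, k_{jt}\right|

where T is the number of performance index types and \left|k_{it}, k_{jt}\right| denotes the similarity between the type-t performance indexes of iSQ_i and iSQ_j.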
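The similarity between two iSQs described above can be sketched as an average of per-type similarities over the T performance index types (CPU, I/O, network and Workload in fig. 2E). The per-type similarity used here, a Jaccard overlap of the abnormal key index names within one type, is an assumption, as are the function names; the disclosure does not fix how the per-type term is computed.

```python
def type_similarity(a, b):
    """Hypothetical per-type similarity |k_it, k_jt|: Jaccard overlap of the
    abnormal key indexes of one type (two empty sets count as identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def isq_similarity(k_i, k_j, types=("CPU", "I/O", "Network", "Workload")):
    """Average of the per-type similarities over the T index types.

    k_i, k_j: {type: set of abnormal key index names} for iSQ_i and iSQ_j."""
    total = sum(type_similarity(k_i.get(t, set()), k_j.get(t, set())) for t in types)
    return total / len(types)

iSQ1 = {"CPU": {"cpu_util"}, "Workload": {"qps", "sessions"}}
iSQ2 = {"CPU": {"cpu_util"}, "Workload": {"qps"}}
print(isq_similarity(iSQ1, iSQ2))  # 0.875
```

Treating two empty sets as identical means that two iSQs with no abnormality of a given type agree on that type; a stricter variant could skip such types and average only over the types where at least one iSQ is abnormal.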

Fig. 2E is a schematic diagram of the similarity comparison between iSQ1 and iSQ2. The performance indexes are divided into four types, namely CPU, I/O, network, and Workload, each type containing at least one performance index. Each performance index of iSQ1 is compared with each performance index of iSQ2, so the similarity of iSQ1 and iSQ2 can be determined from the similarity of their historical key abnormal performance indexes, and it can then be decided whether iSQ1 and iSQ2 belong to the same cluster.

The database abnormality cause category corresponding to each cluster category can be obtained by manual annotation. For example, the abnormal index features of each cluster category identified by the machine learning model can be displayed, and a configuration interface for database abnormality cause category information can be provided; the database abnormality cause category corresponding to each cluster category is then determined from the configuration information acquired through the configuration interface. Because the abnormal index features of each cluster are displayed, a technician can review them, determine the database abnormality cause category of that cluster, and enter the corresponding cause category information through the configuration interface, thereby manually labeling the abnormal index features of each cluster with a database abnormality cause category.

In practical applications, the cluster analysis can be implemented with a clustering algorithm. Taking a dictionary tree (trie) algorithm as an example, the historical target database request set can be converted into a dictionary containing a character string for each historical target database request, where the string encodes the sequence of historical key abnormal performance indexes corresponding to that request. The similarity between historical target database requests is then determined from the similarity between their strings, and cluster analysis is performed on these similarities to obtain a plurality of cluster categories. Specifically, in this embodiment each historical iSQ is converted into a string, namely the sequence formed by its historical key abnormal performance indexes; the strings of all historical iSQs together form the dictionary, that is, the historical target database request set is converted into a dictionary, and each string consists of at least one character. Because the similarity of the many strings in the dictionary must be analyzed quickly, in some examples a KD tree (k-dimensional tree) can be constructed over the dictionary; the resulting tree structure allows nodes to be compared quickly, so the similarity of two historical iSQs can be analyzed rapidly and clustering efficiency is improved.
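The dictionary-based clustering above can be sketched as follows. This is a simplified illustration under stated assumptions: each historical iSQ is encoded as a fixed-length string with one character per key index ('1' if abnormal, '0' otherwise), string similarity is the fraction of matching positions, and clustering is a greedy single pass against cluster representatives. The encoding, the threshold, and the greedy scheme are hypothetical; the trie and KD-tree speed-ups mentioned in the text are omitted, and a plain Python dict stands in for the dictionary.

```python
from collections import defaultdict

def encode(abnormal_kpis, kpi_order):
    """Encode one historical iSQ as a string, one character per key index."""
    return "".join("1" if k in abnormal_kpis else "0" for k in kpi_order)

def string_similarity(a, b):
    """Fraction of positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def build_dictionary(history, kpi_order):
    """history: {isq_id: set of abnormal key index names}.
    Returns {pattern string: [isq ids]}; identical patterns share one entry."""
    dictionary = defaultdict(list)
    for isq_id, kpis in history.items():
        dictionary[encode(kpis, kpi_order)].append(isq_id)
    return dict(dictionary)

def cluster(dictionary, threshold=0.75):
    """Greedy single pass: each pattern joins the first cluster whose
    representative is at least `threshold` similar, else starts a new one."""
    reps, clusters = [], []
    for pattern, ids in dictionary.items():
        for i, rep in enumerate(reps):
            if string_similarity(pattern, rep) >= threshold:
                clusters[i].extend(ids)
                break
        else:
            reps.append(pattern)
            clusters.append(list(ids))
    return clusters

kpi_order = ["cpu_util", "io_wait", "net_rx", "qps"]
history = {"q1": {"cpu_util", "qps"}, "q2": {"cpu_util", "qps"},
           "q3": {"cpu_util"}, "q4": {"io_wait", "net_rx"}}
print(cluster(build_dictionary(history, kpi_order)))  # [['q1', 'q2', 'q3'], ['q4']]
```

The greedy pass depends on the order in which patterns are visited; it is shown only because it is short, and a production system would more likely use an order-independent clustering method.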

In some examples, the plurality of historical key abnormal performance indexes are obtained as follows:

historical monitoring data of the database operating environment are acquired; a plurality of historical initial abnormal performance indexes are calculated from the historical monitoring data; association analysis is performed on the historical initial abnormal performance indexes; and at least one of the associated historical initial abnormal performance indexes is deleted, to obtain the historical key abnormal performance indexes.

In practical applications, the historical monitoring data of the database operating environment are voluminous and may contain a very large number of abnormal indexes; for example, a database may have more than 50 or even hundreds of performance indexes. Among these many indexes, database administrators usually focus on the abnormal features of key indexes and on drastic changes in non-key indexes to locate problems. Some abnormal indexes are highly correlated: one abnormal performance index is usually accompanied by one or more others, because a fault propagates rapidly through the database and causes the related abnormal behaviors at the same time. For example, if abnormal index A always leads to abnormal index B, one index or a subset of indexes can be deleted as required, and only part of the abnormal indexes are retained for analysis. This improves processing efficiency and reduces the amount of data to be processed, so this embodiment extracts the key abnormal indexes from the many abnormal indexes. An association analysis algorithm can be used for this multi-index association analysis.
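The pruning of associated abnormal indexes can be sketched with a simple correlation test. The disclosure only says "an association analysis algorithm"; this sketch swaps in Pearson correlation between per-time-slot anomaly indicators (1 = abnormal in that slot) and keeps an index only if it is not strongly positively correlated with an index already kept. The priority order, the 0.9 threshold, and the function names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no association signal
    return cov / sqrt(vx * vy)

def key_indexes(anomaly_series, priority, threshold=0.9):
    """anomaly_series: {index name: 0/1 list, 1 = abnormal in that time slot}.
    priority: index names ordered from most to least important (assumed given).
    An index is kept only if its positive correlation with every index kept
    so far stays below `threshold`; otherwise it is treated as redundant."""
    kept = []
    for name in priority:
        if all(pearson(anomaly_series[name], anomaly_series[k]) < threshold for k in kept):
            kept.append(name)
    return kept

series = {"cpu_util": [0, 1, 1, 0, 0, 1],
          "load_avg": [0, 1, 1, 0, 0, 1],   # always abnormal together with cpu_util
          "io_wait":  [1, 0, 0, 0, 1, 0]}
print(key_indexes(series, ["cpu_util", "load_avg", "io_wait"]))  # ['cpu_util', 'io_wait']
```

Only positive correlation is treated as redundancy here, matching the text's case of one abnormal index always accompanying another.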

In practical applications, the clustering result comprises a plurality of cluster categories, and in some scenarios each cluster category may involve many historical abnormal performance indexes. The interpretability of the cluster categories therefore needs to be addressed so that database administrators can easily review each cluster category and label the corresponding database abnormality cause. Interpretability here means that, when we need to understand or resolve something, we can obtain enough information to understand it. In this solution, the meaning of the historical key abnormal performance indexes corresponding to each cluster category must be understandable, so that technicians can label the corresponding database abnormality cause category information more quickly.

In some examples, the clustering result may be input to a Bayesian Case Model, which integrates the abnormal performance indexes of each cluster category together with their abnormal features to obtain the typical abnormal performance indexes and typical abnormal features of each cluster category. These typical indexes and features form the feature meaning data of the cluster category. The Bayesian Case Model thus addresses the interpretability problem by producing a more intuitive result: the feature meaning data serve as a representation of each cluster category, so technicians can label database abnormality cause categories more quickly from this representative data.
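The idea of summarizing each cluster by its typical abnormal indexes and features can be illustrated with a much simpler stand-in than a Bayesian Case Model. The following sketch does not implement BCM; it only counts how often each (performance index, feature) pair occurs among a cluster's members and keeps the pairs whose support reaches a hypothetical `min_support`.

```python
from collections import Counter

def cluster_prototype(members, min_support=0.6):
    """Frequency-based stand-in for a cluster's 'feature meaning data'.

    members: list of sets of (performance_index, feature) pairs, one per iSQ
    in the cluster. Returns (pair, support) tuples for the pairs present in at
    least `min_support` of the members, most frequent first."""
    counts = Counter(pair for m in members for pair in m)
    n = len(members)
    typical = [(pair, c / n) for pair, c in counts.items() if c / n >= min_support]
    return sorted(typical, key=lambda item: (-item[1], item[0]))

members = [{("cpu_util", "spike"), ("qps", "spike")},
           {("cpu_util", "spike")},
           {("cpu_util", "spike"), ("io_wait", "spike")}]
print(cluster_prototype(members))  # [(('cpu_util', 'spike'), 1.0)]
```

A BCM additionally learns a prototype example and a subspace of important features per cluster in a probabilistic way; the frequency count only conveys the kind of compact summary a technician would review before labeling.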

In actual business, after model training is completed the model is applied online. Because the online environment is complex, the database abnormality causes determined in the training stage are limited, and new causes that the model cannot recognize may appear in practice. Accordingly, this embodiment can display the target database requests whose causes were not identified so that technicians can identify and calibrate them manually, and then train the machine learning model with the calibration results so that the model is continuously optimized. As an example, if the machine learning model does not identify the database abnormality cause category to which a target database request belongs, the abnormal index features of the target database request identified by the machine learning model are acquired, and a configuration interface is provided for database abnormality cause category information corresponding to those abnormal index features; the machine learning model is then trained using the configuration information acquired from the configuration interface and the abnormal index features of the target database request.
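The human-in-the-loop update described above can be sketched with a toy identifier that queues unrecognized patterns and learns from technicians' labels. Everything here is hypothetical: the class and method names, the nearest-example matching by Jaccard overlap, and the `min_sim` cut-off; "training" is reduced to storing the labelled example.

```python
class CauseIdentifier:
    """Toy nearest-example identifier with a manual-calibration fallback.
    Each labelled example is a frozenset of (performance_index, feature) pairs."""

    def __init__(self, min_sim=0.5):
        self.examples = []   # list of (pattern, cause) pairs
        self.min_sim = min_sim
        self.pending = []    # unrecognised patterns awaiting manual calibration

    @staticmethod
    def _jaccard(a, b):
        return len(a & b) / len(a | b) if (a or b) else 1.0

    def identify(self, pattern):
        """Return the cause of the most similar example, or None and queue it."""
        pattern = frozenset(pattern)
        scored = [(self._jaccard(pattern, p), cause) for p, cause in self.examples]
        if scored:
            sim, cause = max(scored)
            if sim >= self.min_sim:
                return cause
        self.pending.append(pattern)  # shown to a technician via the configuration interface
        return None

    def calibrate(self, pattern, cause):
        """Record a technician's label; here 'retraining' just adds the example."""
        pattern = frozenset(pattern)
        self.examples.append((pattern, cause))
        if pattern in self.pending:
            self.pending.remove(pattern)

ci = CauseIdentifier()
ci.calibrate({("cpu_util", "spike")}, "CPU overload")
print(ci.identify({("net_rx", "invalid")}))  # None, queued for manual calibration
ci.calibrate({("net_rx", "invalid")}, "Network link failure")
print(ci.identify({("net_rx", "invalid")}))  # Network link failure
```

With a real machine learning model, `calibrate` would append the labelled features to the training set and trigger retraining rather than simply storing the example.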

This embodiment provides a method for locating the root causes of iSQ. In the offline processing stage, for a large amount of historical data, anomaly detection finds the abnormal features of each performance index in the time-series performance index data; association analysis screens out the main key abnormal performance indexes; cluster analysis yields the key abnormal performance indexes of each iSQ and the cluster category to which it belongs; and a Bayesian case model extracts the feature subspace of each iSQ category, which improves model interpretability and makes it convenient for database administrators to label the abnormality cause of each cluster category. After labeling, the correspondence between each database abnormality cause category and the abnormal features of the iSQ in that category is obtained. The machine learning model is then applied online, and when an iSQ is identified online, the model can be used to identify the root cause of that iSQ anomaly.

Corresponding to the embodiments of the database request identification method and the database abnormality detection method, the present specification also provides embodiments of a database request identification device, a database abnormality detection device and a terminal to which the database request identification device and the database abnormality detection device are applied.

The embodiments of the database abnormality detection apparatus of this specification may be applied to a computer device, such as a server or a terminal device. The apparatus embodiments may be implemented by software, by hardware, or by a combination of hardware and software. Taking software implementation as an example, the apparatus in a logical sense is formed by the processor of the device in which it is located reading the corresponding computer program instructions from the nonvolatile memory into the memory. In terms of hardware, fig. 3 is a hardware structure diagram of a computer device of this specification. In addition to the processor 310, the memory 330, the network interface 320, and the nonvolatile memory 340 shown in fig. 3, the computer device in which the database request identification apparatus, the database abnormality detection apparatus, or the apparatus 331 for obtaining the database abnormality detection model of this embodiment is located may generally include other hardware according to its actual functions, which is not described here.

Accordingly, as shown in fig. 4, a schematic diagram of a database request identifying apparatus according to an exemplary embodiment of the present disclosure is shown, where the apparatus includes:

a request acquisition module 41 for: acquiring a database request set in a set time period;

a request identification module 42 for: after determining the probability distribution of the response times of the database requests in the database request set within the set time period, determining a target database request according to the probability distribution, wherein the target database request comprises a database request whose response time is greater than a first set threshold and whose distribution probability is lower than the set probability threshold.

Optionally, the target database request includes a slow database request caused by an abnormal running state of the database, wherein the response time of the slow database request is greater than a second set threshold.

Optionally, the database request set is a slow database request set comprising a plurality of slow database requests, wherein the response time of each slow database request is greater than a second set threshold.

Optionally, the first set threshold is greater than or equal to the second set threshold.

Optionally, the slow database requests in the slow database request set are determined by:

after a liveness-probe database request is sent to a database device, monitoring the response time of the database device to the liveness-probe database request;

if the response time of the liveness-probe database request is greater than a first set threshold, determining that the liveness-probe database request is a slow database request.
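The target-request rule of module 42 (response time above a threshold and low distribution probability) can be sketched as follows. The disclosure does not say how the probability distribution is estimated; this sketch assumes a fixed-width histogram of the response times in the set period, where a request's probability is the share of requests falling in its bin. The bin width and function name are hypothetical.

```python
from collections import Counter

def identify_target_requests(requests, rt_threshold, prob_threshold, bin_width=0.5):
    """requests: list of (request_id, response_time_seconds) within the set period.
    Estimates the response-time distribution with a fixed-width histogram (an
    assumption) and returns the ids whose response time exceeds `rt_threshold`
    and whose bin probability is below `prob_threshold`."""
    if not requests:
        return []
    bins = [int(rt // bin_width) for _, rt in requests]
    freq = Counter(bins)
    n = len(requests)
    return [rid for (rid, rt), b in zip(requests, bins)
            if rt > rt_threshold and freq[b] / n < prob_threshold]

# Nine requests are slow but common (~1.2 s); one is slow and rare (6.3 s).
reqs = [(f"r{i}", 1.2) for i in range(8)] + [("y", 1.3), ("x", 6.3)]
print(identify_target_requests(reqs, rt_threshold=1.0, prob_threshold=0.2))  # ['x']
```

The example shows why the probability condition matters: every request exceeds the 1.0 s threshold and would count as traditional slow SQL, but only the rare one is flagged as a target database request.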

Accordingly, as shown in fig. 5, a schematic diagram of a database anomaly detection apparatus according to an exemplary embodiment of the present disclosure is shown, where the apparatus includes:

a request acquisition module 51 for: acquiring a database request set in a set time period;

a request identification module 52 for: after determining the probability distribution of the response times of the database requests in the database request set within the set time period, determining a target database request according to the probability distribution, wherein the target database request comprises a database request whose response time is greater than a first set threshold and whose distribution probability is lower than the set probability threshold;

a cause identification module 53 for: identifying the database abnormality cause category to which the target database request belongs.

Optionally, the identifying a database abnormality cause category to which the target database request belongs includes:

Acquiring performance index data of a database operating environment;

inputting the target database request and the performance index data into a machine learning model, and determining, by the machine learning model, the database abnormality cause category to which the target database request belongs.

Optionally, the machine learning model is configured to: identify at least one performance index from the performance index data; after identifying whether each performance index has at least one abnormal feature, determine the cluster category to which the performance indexes having at least one abnormal feature belong; and determine the database abnormality cause category corresponding to the determined cluster category as the database abnormality cause category of the target database request.

Optionally, the machine learning model is trained by:

The machine learning model is trained using a plurality of cluster class results.

Optionally, the performance index data is time series data, and the abnormal feature is determined by analyzing a fluctuation of the performance index in time series.

Optionally, the performing cluster analysis on each historical target database request in the historical target database request set by using the similarity of the historical key abnormal performance indexes to obtain a plurality of cluster categories includes:

converting the historical target database request set into a dictionary, wherein the dictionary comprises character strings obtained by converting each historical target database request, and the character strings comprise sequences formed by a plurality of historical key abnormal performance indexes corresponding to the historical target database request;

determining the similarity between the historical target database requests by using the similarity between the character strings, and performing cluster analysis according to that similarity to obtain a plurality of cluster categories.

Optionally, the plurality of historical key abnormal performance indicators are obtained by:

Optionally, the database abnormality cause category corresponding to each cluster category is determined as follows:

extracting and displaying characteristic meaning data of historical key abnormal performance indexes corresponding to historical target database requests in each cluster type, and providing a configuration interface aiming at database abnormality cause type information;

determining the database abnormality cause category corresponding to each cluster category by using the configuration information acquired from the configuration interface.

Accordingly, as shown in fig. 6, a schematic diagram of an apparatus for acquiring a database anomaly detection model according to an exemplary embodiment of the present disclosure includes:

an acquisition module 61 for: acquiring a historical target database request set, and acquiring a plurality of historical key abnormal performance indexes corresponding to each historical target database request in the historical target database request set;

a clustering module 62 for: performing cluster analysis on each historical target database request in the historical target database request set by using the similarity of the historical key abnormal performance indexes, to obtain a plurality of cluster category results, wherein each cluster category result comprises at least one historical target database request, each historical target database request corresponds to historical key abnormal performance indexes and abnormal features, and each cluster category corresponds to one database abnormality cause category;

a training module 63 for: training a machine learning model using the plurality of cluster category results, wherein the trained machine learning model is used to identify the database abnormality cause category to which a target database request belongs.

Accordingly, fig. 7 is a schematic structural diagram of a system according to an exemplary embodiment of the present disclosure. The system includes a database abnormality detection end, a database server end, and a user request end;

the user request terminal 71 is configured to: sending a database request to the database server;

the database server 72 is configured to: responding to the database request;

the database anomaly detection end 73 is configured to:

acquiring slow database requests based on the response time of the database server to the database requests, wherein the response time of a slow database request is greater than a first set threshold;

acquiring historical response-time data of historical slow database requests, calculating the probability distribution of the slow database requests based on the historical data, and determining that a slow database request whose probability is lower than a set probability threshold and whose response time is greater than a second set threshold is a target database request;

Accordingly, a computer device comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor implements the database anomaly detection method when executing the program.

Accordingly, a computer-readable storage medium stores computer instructions that cause the computer to perform the aforementioned database anomaly detection method.

The implementation process of the functions and roles of each module in the above device is specifically shown in the implementation process of the corresponding steps in the above method, and will not be described herein again.

For the device embodiments, reference is made to the description of the method embodiments for the relevant points, since they essentially correspond to the method embodiments. The apparatus embodiments described above are merely illustrative, wherein the modules illustrated as separate components may or may not be physically separate, and the components shown as modules may or may not be physical, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purposes of the present description. Those of ordinary skill in the art will understand and implement the present invention without undue burden.

The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

Other embodiments of the present description will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This specification is intended to cover any variations, uses, or adaptations of the specification following, in general, the principles of the specification and including such departures from the present disclosure as come within known or customary practice within the art to which the specification pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the specification being indicated by the following claims.

It is to be understood that the present description is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present description is limited only by the appended claims.

The foregoing description of the preferred embodiments is provided for the purpose of illustration only, and is not intended to limit the scope of the disclosure, since any modifications, equivalents, improvements, etc. that fall within the spirit and principles of the disclosure are intended to be included within the scope of the disclosure.

Claims

1. A database request identification method, the method comprising:

acquiring a database request set in a set time period;

2. The method of claim 1, wherein the target database request comprises a slow database request caused by an abnormal running state of the database, and the response time of the slow database request is greater than a second set threshold.

3. The method of claim 1, wherein the database request set is a slow database request set comprising a plurality of slow database requests, and the response time of each slow database request is greater than a second set threshold.

4. The method of claim 2 or 3, wherein the first set threshold is greater than or equal to the second set threshold.

5. The method of claim 2, wherein the slow database requests in the database request set are determined by:

6. A database anomaly detection method, the method comprising:

acquiring a target database request identified by the database request identification method according to any one of claims 1 to 5;

7. The method of claim 6, wherein the identifying of the database abnormality cause category to which the target database request belongs comprises:

acquiring performance index data of a database operating environment;

8. The method of claim 7, wherein the machine learning model is configured to: identify at least one performance index from the performance index data; after identifying whether each performance index has at least one abnormal feature, determine the cluster category to which the performance indexes having at least one abnormal feature belong; and determine the database abnormality cause category corresponding to the determined cluster category as the database abnormality cause category of the target database request.

9. The method of claim 7, wherein the machine learning model is trained by:

10. The method of claim 8 or 9, wherein the performance index data are time-series data, and the abnormal feature is determined by analyzing the fluctuation of the performance index over time.

11. The method of claim 9, wherein the performing cluster analysis on each historical target database request in the set of historical target database requests by using the similarity of the historical key abnormal performance indicators to obtain a plurality of cluster categories comprises:

12. The method of claim 9, wherein the plurality of historical key abnormal performance indicators are obtained by:

13. The method of claim 9, wherein each cluster category corresponds to a database abnormality cause category determined by:

14. A database request identifying apparatus, the apparatus comprising:

15. A database anomaly detection device, the device comprising:

16. A system, comprising a database abnormality detection end, a database server end, and a user request end;

the database server is used for: responding to the database request;

the database abnormality detection end is used for:

acquiring a database request set in a set time period;

after determining the probability distribution of the response times of the database requests in the database request set within the set time period, determining a target database request according to the probability distribution, wherein the target database request comprises a database request whose response time is greater than the first set threshold and whose distribution probability is lower than the set probability threshold.

17. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1 to 13 when the program is executed by the processor.

18. A computer readable storage medium storing computer instructions that cause the computer to perform the method of any one of claims 1 to 13.