CN114356734A

CN114356734A - Service anomaly detection method and apparatus, device, and storage medium

Info

Publication number: CN114356734A
Application number: CN202111661265.2A
Authority: CN
Inventors: 孟凡欣
Original assignee: China Sports Lottery Hkjc Infotech Beijing Co ltd
Current assignee: China Sports Lottery Hkjc Infotech Beijing Co ltd
Priority date: 2021-12-31
Filing date: 2021-12-31
Publication date: 2022-04-15

Abstract

The present application relates to the field of intelligent operation and maintenance, and in particular to a service anomaly detection method and apparatus, a device, and a storage medium. The service anomaly detection method of the embodiments comprises: determining a business index related to a performance index of a service; constructing a relation model of the business index and the performance index; determining an anomaly threshold of the performance index using the relation model; and determining whether the service is abnormal according to the predicted value of the performance index at a first moment, the collected value of the performance index at the first moment, and the anomaly threshold of the performance index. The application can identify efficiently and accurately whether a service is abnormal, thereby effectively reducing false alarms and missed alarms.

Description

Service anomaly detection method and apparatus, device, and storage medium

Technical Field

The present application relates to the field of intelligent operation and maintenance technologies, and in particular, to a method and an apparatus for detecting a service anomaly, a device, and a storage medium.

Background

In recent years, with the rapid development of microservice technology, system software has grown in scale, changes have become more frequent, calling relationships have become increasingly complex, and the volume of monitoring data keeps growing. Operation and maintenance personnel can no longer effectively find anomalies in massive monitoring data, and the traditional operation and maintenance mode is greatly challenged. Data-driven intelligent operation and maintenance (AIOps) is receiving more and more attention. The AIOps white paper mentions a number of application scenarios in which artificial intelligence technology is applied to operation and maintenance; among them, accurately discovering anomalies from monitoring data with artificial intelligence algorithms is an important application scenario of intelligent operation and maintenance in the quality assurance direction.

At present, there are many methods for detecting anomalies in operation and maintenance services in high-dimensional, complex scenarios, for example: fixed-threshold year-over-year and period-over-period comparison methods, data probability distribution tests based on statistical hypotheses, anomaly detection methods based on time series data, and data-driven machine learning, deep learning, or other artificial intelligence detection methods.

At present, existing operation and maintenance anomaly detection methods mainly have the following defects:

1) Anomaly detection based on time series prediction does not need extra data as prior knowledge, but time series algorithms place stationarity requirements on the data to be analyzed; short-term prediction is relatively accurate, whereas prediction of data further into the future is poor, and labels for abnormal data are difficult to obtain.

2) The supervised machine learning algorithm needs labels of abnormal data, the label data needs to be labeled from a large amount of historical index data by operation and maintenance personnel based on experience, the cost is high, and the labeling accuracy is difficult to ensure.

3) Unsupervised statistical models based on parameter estimation, such as Gaussian models, regression models, and mixed parameter distribution models, need to assume in advance that the data satisfies a certain probability distribution, and learn the parameters of that distribution from training data to determine the data distribution and the anomalies. However, the assumed distribution often differs greatly from the actual situation, and some actual situations are difficult to describe by a function.

4) Unsupervised anomaly detection methods based on non-parametric estimation, such as clustering algorithms, do not need an assumed distribution or prior knowledge; they directly use data density and a threshold to detect data instances in low-density regions as anomalies.

In addition, the above service anomaly detection methods suffer from problems such as low efficiency, poor accuracy, false alarms, and missed alarms.

Therefore, how to efficiently and accurately identify the abnormality of the operation and maintenance monitoring index and avoid the false alarm and the missing alarm of the abnormal alarm is a technical problem to be solved urgently.

Disclosure of Invention

In view of the above problems in the prior art, the present application provides a service anomaly detection method and apparatus, a device, and a storage medium, so as to efficiently and accurately identify anomalies of a monitoring index and avoid false alarms and missed alarms.

In order to achieve the above object, a first aspect of the present application provides a service anomaly detection method, including:

determining a business index related to a performance index of a service;

constructing a relation model of the business index and the performance index;

determining an anomaly threshold of the performance index using the relation model;

calling the relation model to obtain a predicted value of the performance index at a first moment according to the collected value, at the first moment, of the business index related to the performance index;

and determining whether the service is abnormal according to the predicted value of the performance index at the first moment, the collected value of the performance index at the first moment, and the anomaly threshold of the performance index.

Therefore, by constructing a relation model of the performance index and the business index, an anomaly of the performance index of the service can be identified efficiently and accurately, and false alarms and missed alarms are effectively reduced.

In a possible implementation manner of the first aspect, the constructing a relation model of the business index and the performance index includes: building the relation model of the business index and the performance index based on a supervised learning algorithm by using a training data set, wherein the training data set comprises performance index historical data and business index historical data of the service, the performance index historical data comprises E historical collected values of the performance index, the business index historical data comprises E historical collected values of the business index, and E is an integer greater than 1. Therefore, a relation model that reflects the relationship between the business index and the performance index can be accurately obtained from the historical data of the service.

In a possible implementation manner of the first aspect, the determining an anomaly threshold of the performance index by using the relation model includes: obtaining prediction data of the performance index based on the relation model and the test data, in a test data set, of the business index related to the performance index; calculating a residual mean according to the prediction data of the performance index and the test data of the performance index in the test data set; and taking the product of the residual mean and a preset residual multiple as the anomaly threshold of the performance index. The test data set comprises test data of the business index and test data of the performance index, the test data of the business index comprises F collected values of the business index, the test data of the performance index comprises F collected values of the performance index, the prediction data of the performance index comprises F predicted values of the performance index, and F is an integer greater than or equal to 1.

Therefore, a more accurate anomaly threshold is obtained by combining the relation model of the business index and the performance index with the residual multiple, so that false alarms are further reduced.

In a possible implementation manner of the first aspect, the determining whether the service is abnormal according to the predicted value of the performance index at the first moment, the collected value of the performance index at the first moment, and the anomaly threshold of the performance index includes: determining a residual between the predicted value of the performance index at the first moment and the collected value of the performance index at the first moment; and comparing the residual of the performance index with the anomaly threshold of the performance index to determine whether the service is abnormal or normal.

Therefore, whether the service is abnormal or not can be accurately determined through the residual error and the threshold value, the calculation complexity is low, the consumption of calculation resources is low, and the detection efficiency of the service abnormality can be effectively improved.

A second aspect of the present application provides a service abnormality detection apparatus, including:

a correlation determination module, used for determining a business index related to a performance index of a service;

a model building module, used for constructing a relation model of the business index and the performance index;

a threshold determination module, used for determining an anomaly threshold of the performance index using the relation model;

an anomaly determination module, used for calling the relation model to obtain a predicted value of the performance index at a first moment according to the collected value, at the first moment, of the business index related to the performance index; and for determining whether the service is abnormal according to the predicted value of the performance index at the first moment, the collected value of the performance index at the first moment, and the anomaly threshold of the performance index.

In a possible implementation manner of the second aspect, the model building module is specifically configured to: build the relation model of the business index and the performance index based on a supervised learning algorithm by using a training data set, wherein the training data set comprises performance index historical data and business index historical data of the service, the performance index historical data comprises E historical collected values of the performance index, the business index historical data comprises E historical collected values of the business index, and E is an integer greater than 1.

In a possible implementation manner of the second aspect, the threshold determination module is specifically configured to:

obtain prediction data of the performance index based on the relation model and the test data, in a test data set, of the business index related to the performance index; calculate a residual mean according to the prediction data of the performance index and the test data of the performance index in the test data set; and take the product of the residual mean and a preset residual multiple as the anomaly threshold of the performance index. The test data set comprises test data of the business index and test data of the performance index, the test data of the business index comprises F collected values of the business index, the test data of the performance index comprises F collected values of the performance index, the prediction data of the performance index comprises F predicted values of the performance index, and F is an integer greater than or equal to 1.

In a possible implementation manner of the second aspect, the anomaly determination module is specifically configured to: determine a residual between the predicted value of the performance index at the first moment and the collected value of the performance index at the first moment; and compare the residual of the performance index with the anomaly threshold of the performance index to determine whether the service is abnormal or normal.

A third aspect of the present application provides a computing device comprising: a processor and a memory; the memory is for storing program instructions that, when executed by the processor, cause the computing device to implement the method of the first aspect described above.

A fourth aspect of the present application provides a computer readable storage medium having stored thereon program instructions, characterized in that the program instructions, when executed by a computer, cause the computer to carry out the method of the first aspect described above.

A fifth aspect of the application provides a computer program product comprising a computer program which, when executed by a processor, causes the processor to perform the method of the first aspect.

Drawings

The various features and the connections between the various features of the present application are further described below with reference to the drawings. The figures are exemplary, some features are not shown to scale, and some of the figures may omit features that are conventional in the art to which the application relates and are not essential to the application, or show additional features that are not essential to the application, and the combination of features shown in the figures is not intended to limit the application. In addition, the same reference numerals are used throughout the specification to designate the same components. The specific drawings are illustrated as follows:

FIG. 1 is a flow chart of a service anomaly detection method of the present application;

FIG. 2 is a schematic flow chart of a specific implementation of the service anomaly detection method according to the present application;

FIG. 3 is a schematic structural diagram of a service anomaly detection device according to the present application;

FIG. 4 is a schematic diagram of an exemplary specific structure of the service anomaly detection device according to the present application;

FIG. 5 is a schematic structural diagram of a computing device according to an embodiment of the present application.

Detailed Description

To accurately describe the technical contents in the present application and to accurately understand the present application, the terms used in the present specification are given the following explanations or definitions before the description of the specific embodiments.

Performance index: the performance index may be, but is not limited to, CPU utilization, disk read-write speed, memory occupancy, network access rate, and the like.

Business index: reflects the actual business types processed by the service, and may be obtained by counting the number of transactions processed per second (TPS, Transactions Per Second) separately for each business request type. Taking the gateway service as an example, the business index may be, but is not limited to, a ticket selling request, a ticket selling response, a prize exchanging request, a prize exchanging response, a terminal login request, a terminal login response, and the like.

Since the performance index of a service is strongly correlated with the business index of the business the service processes, the residual between the predicted and collected performance index stays within a certain range even if the collected performance index fluctuates to some extent. The embodiments of the application provide a service anomaly detection method and apparatus, a device, and a storage medium, in which a relation model of the performance index and the business index is established first, and the relation model is then used to identify anomalies of the performance index of the service. In this way, whether a monitoring index of the service is abnormal can be identified efficiently and accurately, and false alarms and missed alarms are effectively reduced.

The embodiments of the present application are suitable for various intelligent operation and maintenance scenarios, and are particularly suitable for scenarios that adopt microservice technology, complex operation and maintenance scenarios, and scenarios with massive monitoring data. For example, the embodiments of the present application may be applied to, but are not limited to, the following scenarios: scenarios with large-scale system software, scenarios in which system software changes frequently, scenarios with complex calling relationships in system software, scenarios with a large volume of monitoring data, and the like.

In the embodiments of the present application, "service" refers to a microservice. A microservice is a single small service with a business function, and one microservice is responsible for only one type of business. A service encapsulates a business capability under the single responsibility principle and can be deployed and operated independently. In practical application, one system or large-scale service can be split into a plurality of single-function microservices according to actual requirements. For example, the "service" of the embodiments of the present application may be a ticketing service, a prize exchanging service, and the like; the business indexes of the ticketing service may include, but are not limited to, a ticket selling request, a ticket selling response, and the like, and the business indexes of the prize exchanging service may include, but are not limited to, a prize exchanging request and a prize exchanging response.

Fig. 1 shows a schematic flow chart of a service anomaly detection method provided in an embodiment of the present application. Referring to fig. 1, a method for detecting service abnormality provided in the embodiment of the present application may include the following steps:

Step S110, determining a business index related to the performance index of the service;

In some embodiments, a correlation algorithm may be used to determine the business index related to the performance index of the service. A correlation algorithm is a method for measuring the correlation between two variables. In some examples, the correlation algorithm may be, but is not limited to, the Pearson correlation coefficient method, the Spearman rank correlation coefficient method, the Kendall rank correlation coefficient method, the Maximum Information Coefficient (MIC) method, and the like.

In practical applications, the business index related to a performance index depends on the business type processed by the service. For example, the CPU utilization of the ticketing service is related to the ticket selling request; the CPU utilization is a performance index, and the ticket selling request is a business index. As another example, disk utilization is typically a constant value and has no related business index. For another example, the CPU utilization of the prize exchanging service is related to the terminal prize exchanging request and also to the management end prize exchanging request; the CPU utilization is a performance index, and the terminal prize exchanging request, the management end prize exchanging request, and the terminal ticket selling request are business indexes. The correlation between performance indexes and business indexes can be determined by a correlation algorithm. In practical application, a business index library can be configured in advance through expert experience according to the business scenario, and the business indexes related to a performance index can then be found in the business index library by a correlation algorithm.
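The selection described in step S110 can be illustrated with a minimal sketch that uses the Pearson correlation coefficient, one of the algorithms listed above. The function names, the cutoff `min_abs_corr`, and all sample values are hypothetical and are not taken from the application.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series is treated as uncorrelated
    return cov / sqrt(var_x * var_y)


def related_business_indexes(performance, candidates, min_abs_corr=0.8):
    """Names of candidate business indexes whose |r| reaches the cutoff.

    min_abs_corr is an assumed cutoff; the application does not specify one.
    """
    return [
        name
        for name, series in candidates.items()
        if abs(pearson(performance, series)) >= min_abs_corr
    ]


# Made-up, time-aligned samples for illustration only.
cpu_utilization = [20.0, 35.0, 50.0, 65.0, 80.0]
business_index_library = {
    "ticket_selling_request": [100, 200, 300, 400, 500],
    "terminal_login_request": [7, 3, 9, 2, 8],
}
selected = related_business_indexes(cpu_utilization, business_index_library)
print(selected)
```

A constant series such as the disk utilization mentioned above has zero variance, so the sketch reports no correlation for it and it is never selected.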

Step S120, constructing a relation model of the business index and the performance index;

In some embodiments, the relation model of the business index and the performance index may be constructed based on a supervised learning algorithm using a training data set. For example, the relation model may be, but is not limited to, a fitted curve or another model.

In a specific application, the training data set can be obtained by collecting historical index data. In some embodiments, the training data set may include performance index historical data and business index historical data of the service, the performance index historical data including E historical collected values of the performance index, the business index historical data including E historical collected values of the business index, E being an integer greater than 1. For example, taking the ticketing service as an example, the training data set may include CPU utilization historical data, disk read-write speed historical data, memory occupancy historical data, and network access rate historical data, and may also include ticket selling request historical data, ticket selling response historical data, and the like. For another example, the CPU utilization historical data includes 100 historical collected values of the CPU utilization, the ticket selling request historical data includes 100 historical collected values of the ticket selling request, and the 100 historical collected values of the CPU utilization cover the same time period as the 100 historical collected values of the ticket selling request.

Specifically, an exemplary process of building the relation model may be: fitting a first curve based on a supervised learning algorithm by using the performance index historical data and the business index historical data, so as to obtain a relation model that expresses the functional relationship underlying the first curve, wherein the first curve is a relation curve of the performance index and the business index.

In some examples, the first curve may be fitted with a custom function to establish the relation model. For example, the custom function may be, but is not limited to, the linear relationship of the following equation (1), the logarithmic relationship of the following equation (2), and the like.

$$y = k x + b \tag{1}$$

$$y = a \log_2(b x) + c \tag{2}$$

where y represents the value of the performance index, x represents the value of the business index, and k, a, b and c are fitting parameters that can be determined through fitting.
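As an illustration of fitting with a custom function, the following minimal sketch fits the linear relationship of equation (1) by ordinary least squares. For the logarithmic case it assumes equation (2) reads y = a*log2(b*x) + c and fixes b = 1, because b and c cannot be identified separately from data in that form. Ordinary least squares, the function names, and the sample values are assumptions made for illustration; the application does not prescribe a fitting procedure.

```python
from math import log2


def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = k * x + b; returns (k, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("business index values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx
    b = mean_y - k * mean_x
    return k, b


def fit_log2(xs, ys):
    """Fit y = a * log2(x) + c (b fixed to 1, an assumption); returns (a, c)."""
    return fit_linear([log2(x) for x in xs], ys)


# Made-up historical data: ticket selling requests vs. CPU utilization.
ticket_requests = [100, 200, 300, 400, 500]
cpu_utilization = [15.0, 25.0, 35.0, 45.0, 55.0]
k, b = fit_linear(ticket_requests, cpu_utilization)


def relation_model(x):
    """Predicted performance index for a business index value x."""
    return k * x + b


print(k, b, relation_model(600))
```

The returned parameters define the relation model; any later prediction is just an evaluation of the fitted function at the collected business index value.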

In some examples, a machine learning algorithm may be employed for the fitting to build the relation model. For example, the machine learning algorithm may be, but is not limited to, support vector machines (SVMs), isotonic regression, polynomial regression, neural networks, decision trees, and the like.

Step S130, determining an anomaly threshold of the performance index by using the relation model;

In some embodiments, the anomaly threshold of the performance index is determined using a test data set. Specifically, prediction data of the performance index may be obtained based on the relation model and the test data, in the test data set, of the business index related to the performance index; a residual mean may be calculated according to the prediction data of the performance index and the test data of the performance index in the test data set; and the product of the residual mean and a predetermined residual multiple may be used as the anomaly threshold of the performance index. The test data set may include test data of the business index and test data of the performance index, the test data of the business index may include F collected values of the business index, the test data of the performance index may include F collected values of the performance index, the prediction data of the performance index includes F predicted values of the performance index, and F is an integer greater than or equal to 1. Here, the data in the test data set are also aligned in time.

Taking ticketing service as an example, the test data set may include CPU utilization rate test data, disk read-write speed test data, memory occupancy rate test data, and network access rate test data, and the test data set may also include ticketing request test data, ticketing response test data, and the like. The CPU utilization rate test data can comprise 50 acquisition values of the CPU utilization rate, the ticket selling request test data can comprise 50 acquisition values of the ticket selling request, and the 50 historical acquisition values of the CPU utilization rate are consistent with the 50 historical acquisition values of the ticket selling request in time.

In some embodiments, the residual multiple may take an empirical value, for example a value between 1.0 and 2.0. The residual multiple may also be calculated as follows: determine the mean and the variance of the residual distribution, and divide the value located 3 times the variance away from the mean by the mean to obtain the residual multiple. Specifically, the quotient may be kept to 1 digit after the decimal point and used as the residual multiple. In practical application, the residual multiple obtained in this way can be used as a default or recommended value, and the user can be allowed to reset the residual multiple based on experience.
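The calculation of a default residual multiple can be sketched as follows. The sketch reads "the value located 3 times the variance away from the mean" as mean + 3 * variance, uses the population variance, and rounds to one decimal place; all three readings are assumptions, as are the function name and the sample residuals.

```python
def default_residual_multiple(residuals):
    """(mean + 3 * variance) / mean of the residuals, kept to one decimal.

    Population variance and rounding (rather than truncation) are assumptions.
    """
    n = len(residuals)
    if n == 0:
        raise ValueError("at least one residual is required")
    mean = sum(residuals) / n
    if mean == 0:
        raise ValueError("residual mean is zero; the multiple is undefined")
    variance = sum((r - mean) ** 2 for r in residuals) / n
    return round((mean + 3 * variance) / mean, 1)


# Made-up residuals from a test data set.
test_residuals = [2.0, 2.5, 1.5, 2.0]
multiple = default_residual_multiple(test_residuals)
print(multiple)
```

For these sample residuals the mean is 2.0 and the variance is 0.125, so the quotient is 2.375 / 2.0 = 1.1875, which is kept as 1.2.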

Tests show that if the residual mean is used directly as the anomaly threshold, a large number of false alarms is generated. Using the product of the residual mean and a preset fixed multiple (namely the residual multiple) as the anomaly threshold tolerates normal index jitter and effectively reduces false alarms. Determining the anomaly threshold in this way has low complexity, is easy to implement, is efficient, and consumes few computing resources.

The method for determining the abnormality threshold is not limited to the above method, and the abnormality threshold may be determined by other methods. For example, the residual between the predicted data of the performance index and the real data thereof may be calculated and the maximum value of the residual may be used as the abnormal threshold.

The test data set comprises n samples (n is an integer greater than 1), indexed by i = 1, 2, 3, …, n; each sample comprises a collected value of a performance index and a value of the business index related to that performance index.

In some embodiments, the residual between the predicted data and the actual data of the performance index can be calculated by the following formula (3):

$$\mathrm{residual}_i = \left| y_{\mathrm{pred},i} - y_{\mathrm{true},i} \right| \tag{3}$$

where residual_i denotes the residual of the i-th sample, y_pred,i is the fitted value obtained for the i-th sample using the relation model, and y_true,i is the collected value of the performance index for the i-th sample.

In some embodiments, the residual mean may be calculated by the following equation (4):

$$\mathrm{residual\_mean} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{residual}_i \tag{4}$$

where residual_mean represents the residual mean, and n is the number of samples in the test data set.

In practical application, for each performance index, the residuals can be calculated according to formula (3) and the residual mean according to formula (4).
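The threshold determination of step S130 can be sketched as follows, taking the residual as the absolute difference between predicted and collected values (an assumption, since only the magnitude is compared with the threshold). The function names, the default multiple of 1.5, and the sample values are hypothetical.

```python
def absolute_residuals(predicted, collected):
    """Per-sample residuals |y_pred - y_true| (absolute value is an assumption)."""
    if len(predicted) != len(collected) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return [abs(p - c) for p, c in zip(predicted, collected)]


def anomaly_threshold(predicted, collected, residual_multiple=1.5):
    """Residual mean multiplied by the residual multiple."""
    residuals = absolute_residuals(predicted, collected)
    residual_mean = sum(residuals) / len(residuals)
    return residual_mean * residual_multiple


def max_residual_threshold(predicted, collected):
    """Alternative mentioned in the text: the maximum residual as threshold."""
    return max(absolute_residuals(predicted, collected))


# Made-up test data: model predictions vs. collected CPU utilization.
predicted_cpu = [30.0, 40.0, 50.0, 60.0]
collected_cpu = [32.0, 39.0, 47.0, 62.0]
threshold = anomaly_threshold(predicted_cpu, collected_cpu, residual_multiple=1.5)
print(threshold, max_residual_threshold(predicted_cpu, collected_cpu))
```

Here the residuals are 2, 1, 3 and 2, the residual mean is 2.0, and a residual multiple of 1.5 gives a threshold of 3.0. A smaller multiple makes alarms more sensitive, and a larger one suppresses more of the normal jitter.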

In some embodiments, data distribution of the performance index and the service index can be visualized through a scatter diagram, and operation and maintenance personnel can manually adjust the residual error multiple as required to change the abnormal threshold. Therefore, the abnormal threshold can be set more accurately or verified in a visual mode, and the user can conveniently adjust the abnormal threshold according to the service importance so as to more flexibly control the alarm effect of the service abnormality.

For example, for a service with a higher importance degree, it is more important to avoid missed alarm, and more false alarms can be accepted, and at this time, the operation and maintenance personnel can turn down the abnormal threshold value by manually adjusting the residual error multiple, so as to further reduce missed alarm, and make the abnormal alarm of the service more sensitive. For the service with lower importance degree, the alarm is allowed to have certain time delay, the false alarm needs to be reduced as far as possible, and at the moment, the operation and maintenance personnel can adjust the residual error multiple manually to increase the abnormal threshold so as to further reduce the false alarm.

Step S140, according to the collected value of the service index related to the performance index at the first moment, calling a relation model to obtain the predicted value of the performance index related to the service index at the first moment, and according to the predicted value of the performance index at the first moment, the collected value of the performance index at the first moment and the abnormal threshold value of the performance index, determining whether the service is abnormal or not.

Here, the first time may be any time, and may be, but not limited to, the current time, a past time, or a preset time.

In some embodiments, the predicted value of the performance index may be, but is not limited to, a predicted value of CPU utilization, a predicted value of disk read-write speed, a predicted value of memory occupancy, a predicted value of network access rate, and the like.

In some embodiments, the collected value of the performance index may be, but is not limited to, a collected value of CPU utilization, a collected value of disk read-write speed, a collected value of memory occupancy, a collected value of network access rate, and the like.

In some embodiments, an exemplary implementation flow of step S140 may include the following steps:

Step a1, acquiring performance index values and business index values of the service in real time, aggregating the performance index values by minute, and aggregating the business index values by minute, so as to align the performance index and the business index in time and thereby obtain the collected value of the performance index and the collected value of the business index;

Here, the time of the performance index refers to the collection time of the performance index value, and the time of the business index may be the time at which the business request is stored.

In some examples, the aggregation process for the performance index may be: taking the collection time of the performance index as the end time, calculating the mean of the performance index values collected within a predetermined time period (for example, 1 minute), and using this mean as the collected value of the performance index at that collection time. For example, the business index is counted once per second and the performance index is collected once every 30 seconds, so the collection times of the performance index may be, for example, 12:00:10, 12:00:40, and 12:01:10. Taking the collection time 12:00:40 as an example, the mean of all performance index values between 11:59:40 and 12:00:40 is used as the collected value of the performance index at 12:00:40.

In a specific application, the service index is aggregated in the same way as the performance index, and the termination time of the service index aggregation is the same as the termination time of the performance index aggregation.
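The windowed aggregation described above can be sketched as follows. This is only an illustrative sketch: the function name `aggregate_window`, the sample values, and the choice to treat the window as half-open (start excluded, termination time included) are assumptions, not details given in the application.

```python
from datetime import datetime, timedelta
from statistics import mean


def aggregate_window(samples, end_time, window=timedelta(minutes=1)):
    """Mean of the sample values whose timestamp lies in (end_time - window, end_time].

    `samples` is a list of (timestamp, value) pairs. Returns None if the window is empty.
    """
    start = end_time - window
    values = [value for timestamp, value in samples if start < timestamp <= end_time]
    return mean(values) if values else None


# Termination time shared by both aggregations, as in the 12:00:40 example.
end = datetime(2021, 12, 31, 12, 0, 40)

# Hypothetical CPU utilization (%) collected every 30 seconds.
cpu_samples = [
    (datetime(2021, 12, 31, 11, 59, 40), 40.0),  # outside the half-open window
    (datetime(2021, 12, 31, 12, 0, 10), 50.0),
    (datetime(2021, 12, 31, 12, 0, 40), 70.0),
]

# Hypothetical request counts recorded once per second for the last 60 seconds.
request_samples = [(end - timedelta(seconds=i), 10 + (i % 2)) for i in range(60)]

cpu_value = aggregate_window(cpu_samples, end)
request_value = aggregate_window(request_samples, end)
print(cpu_value, request_value)
```

Using the same `end` for both calls is what aligns the performance index and the service index on one time axis.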

Taking the gateway service as an example, the data of the service index of each service type can be acquired from the traffic routing gateway by minutes.

Step a2, calling the relation model to make a prediction from the processed data of the service index, so as to obtain a predicted value of the performance index related to the service index;

step a3, calculating the residual between the predicted value of the performance index and the collected value of the performance index, and judging whether the residual exceeds the abnormal threshold of the performance index to determine whether the service is abnormal or normal.

Here, multiple performance indicators may be monitored simultaneously for a service, each with its own anomaly threshold.

In some examples, the process of determining whether the service is abnormal or normal may include: if the service has g performance indexes and the residuals of at least h of the g performance indexes exceed their abnormal thresholds, the service is determined to be abnormal; if the number of performance indexes whose residuals exceed their abnormal thresholds is less than h, the service can be determined to be normal. Here, g is an integer greater than or equal to 1, and h is an integer greater than or equal to 1 and less than or equal to g.
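The h-of-g decision rule can be sketched as below. The function name, the index names, and the use of the absolute residual are assumptions made for illustration; the application does not state whether the residual is signed.

```python
def service_is_abnormal(residuals, thresholds, h):
    """Return True when at least h performance indexes exceed their own threshold.

    `residuals` and `thresholds` map an index name to a number; g = len(thresholds).
    """
    g = len(thresholds)
    if not 1 <= h <= g:
        raise ValueError("h must satisfy 1 <= h <= g")
    exceeded = sum(
        1 for name, limit in thresholds.items() if abs(residuals[name]) > limit
    )
    return exceeded >= h


# Hypothetical residuals and per-index thresholds for g = 3 indexes.
residuals = {"cpu": 12.0, "memory": 1.5, "disk": 0.4}
thresholds = {"cpu": 8.0, "memory": 3.0, "disk": 1.0}

print(service_is_abnormal(residuals, thresholds, h=1))  # only "cpu" exceeds
print(service_is_abnormal(residuals, thresholds, h=2))
```

A small h makes the detector more sensitive, while a larger h requires several indexes to deviate together before an alarm is raised.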

In some examples, one or more performance indicators (e.g., CPU utilization, disk read/write speed, memory occupancy, network access rate, etc.) may be monitored for a service, and determining whether the service is abnormal or normal may include: the service is determined to be abnormal if a predetermined performance indicator (e.g., CPU utilization) of the service exceeds its abnormal threshold m consecutive times, even if none of the other performance indicators of the service exceeds its abnormal threshold; the service is determined to be normal if the predetermined performance indicator (e.g., CPU utilization) of the service does not exceed its abnormal threshold m consecutive times. Here, m is an integer greater than or equal to 1, and m may be 3, for example.
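A rule based on m consecutive exceedances of one predetermined performance indicator can be sketched with a small stateful counter. The class name and the sample residuals are hypothetical, and the sketch assumes that a single in-range observation resets the count and that the absolute residual is compared with the threshold.

```python
class ConsecutiveExceedDetector:
    """Flags a service as abnormal after m consecutive threshold exceedances."""

    def __init__(self, threshold, m=3):
        if m < 1:
            raise ValueError("m must be an integer >= 1")
        self.threshold = threshold
        self.m = m
        self.run_length = 0  # number of consecutive exceedances seen so far

    def update(self, residual):
        """Feed the residual of the latest time point; return True if abnormal."""
        if abs(residual) > self.threshold:
            self.run_length += 1
        else:
            self.run_length = 0  # assumption: one normal point resets the run
        return self.run_length >= self.m


detector = ConsecutiveExceedDetector(threshold=5.0, m=3)
results = [detector.update(r) for r in [6.0, 7.0, 1.0, 6.0, 6.0, 6.0]]
print(results)
```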

Referring to fig. 2, an exemplary specific implementation flow of the method according to the embodiment of the present application may include the following steps:

step S210, the service index value of the service and the performance index value of each service are collected by using the index collection and analysis tool, and the collection value of the service index and the collection value of the performance index of the service are obtained.

In some embodiments, monitoring software such as Zabbix or Prometheus may be used to collect data related to the performance indexes of services, and a data analysis engine such as Elasticsearch may be used to count data related to the business indexes.

In some embodiments, the performance indicators of the service may be, but are not limited to: CPU utilization rate, memory usage size, disk read-write speed, network access speed and the like.

In some embodiments, the number of requests of each service type handled by the system/service may be counted separately, and each count may be used as a different service index. Here, a service type is a general term for service indexes of the same kind, such as a prize exchange service, a ticket selling service, and the like. Taking the prize exchange service as an example, the service indexes may include indexes related to the actual business, such as terminal prize exchange, management terminal prize exchange, prize exchange inquiry, and the like. Different service types handled by a service correspond to different service indexes, and the specific service indexes of a service can be customized according to the specific business.

Step S220, data preprocessing, including: processing missing data and outlier data in the index data, and performing data type conversion and data standardization, to obtain data in a format that can be used by a supervised learning algorithm or a correlation algorithm.

Here, the index data includes service index data and performance index data.

In some embodiments, the cleaning may include: cleaning the index data according to the missing values and/or the abnormal values in the index data. For example, the index data is collected at high density: some indexes are collected once every second, some once every 30 seconds, and the collection interval does not exceed 1 minute. Therefore, when the proportion of missing values is not more than 10%, the missing data can be removed directly by deletion to accomplish the cleaning.
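The deletion-based cleaning can be sketched as follows. The row layout (one dictionary per aligned minute), the function name, and the decision to raise an error when more than 10% of the rows are incomplete are assumptions; the application does not say what is done above that proportion.

```python
import math


def _is_missing(value):
    """Treat None and NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_missing_rows(rows, max_missing_ratio=0.10):
    """Delete rows that contain a missing value, if they are rare enough."""
    if not rows:
        return []
    complete = [r for r in rows if not any(_is_missing(v) for v in r.values())]
    missing_ratio = (len(rows) - len(complete)) / len(rows)
    if missing_ratio > max_missing_ratio:
        # Assumption: deletion is only considered safe at or below the limit.
        raise ValueError(f"too many incomplete rows: {missing_ratio:.0%}")
    return complete


# Hypothetical aligned data set: one row per minute.
rows = [{"minute": i, "cpu": 40.0 + i, "requests": 100 + i} for i in range(10)]
rows[3]["cpu"] = None  # a single missing collection value (10% of the rows)

cleaned = drop_missing_rows(rows)
print(len(cleaned))
```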

In some embodiments, the index data is aligned in a manner similar to the aggregation manner described above, and is not described herein again.

Step S230, a correlation algorithm is used to determine the service index related to the performance index.

In some embodiments, a correlation coefficient algorithm such as the Pearson correlation coefficient may be used to extract, from the plurality of business indexes, the business index that is most relevant to the performance index of the service. In a specific application, if the service is split finely enough, the business indexes most relevant to the performance index of the service may include only one type.
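Selecting the most relevant business index with the Pearson correlation coefficient can be sketched as below. The index names and values are hypothetical, and ranking by the absolute value of the coefficient is an assumption.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / (norm_x * norm_y)


def most_related_index(performance, business_indexes):
    """Return the business index with the largest |r| and all scores."""
    scores = {
        name: pearson(values, performance)
        for name, values in business_indexes.items()
    }
    best = max(scores, key=lambda name: abs(scores[name]))
    return best, scores


# Hypothetical per-minute values after aggregation and alignment.
cpu = [20.0, 30.0, 40.0, 50.0, 60.0]
business = {
    "terminal_prize_exchange": [100, 200, 300, 400, 500],
    "prize_exchange_query": [5, 3, 6, 4, 5],
}

best, scores = most_related_index(cpu, business)
print(best, scores)
```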

Step S240, fitting by using a supervised learning algorithm or a custom function and a training data set to obtain a relation model between the performance index and the service index of the service;

step S250, determining an abnormal threshold value of the performance index of the service based on the relation model;

step S260, determining the predicted value of each performance index based on the relation model and the collected value of the service index related to that performance index, calculating the residual between the predicted value of the performance index and the collected value of the performance index, and comparing the residual with the abnormal threshold of the performance index to determine whether the service is abnormal or normal. If the service is abnormal, step S270 is performed. If the service is normal, no action is taken. Step S210 and step S260 are then executed repeatedly to continue the service abnormality detection at the next moment or in the next period.

Step S270, when the service is abnormal, sending out an alarm prompt.

Fig. 3 shows an exemplary structural schematic diagram of a service anomaly detection device provided in an embodiment of the present application. Referring to fig. 3, a service anomaly detection apparatus provided in an embodiment of the present application may include:

a correlation determination module 31, configured to determine a service index related to a performance index of a service;

a model construction module 32, configured to construct a relationship model between the service index and the performance index;

a threshold determination module 33, configured to determine an abnormal threshold of the performance indicator by using the relationship model;

an anomaly determination module 34, configured to invoke the relationship model according to the acquired value, at a first time, of the service index related to the performance index, to obtain a predicted value of the performance index at the first time, and to determine whether the service is anomalous according to the predicted value of the performance index at the first time, the acquired value of the performance index at the first time, and the anomaly threshold of the performance index.

In some embodiments, the model construction module 32 may be specifically configured to: construct a relation model of the business index and the performance index based on a supervised learning algorithm by using a training data set, wherein the training data set comprises performance index historical data and business index historical data of the service, the performance index historical data comprises E historical acquisition values of the performance index, the business index historical data comprises E historical acquisition values of the business index, and E is an integer larger than 1.
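As an illustration of fitting a relation model from E pairs of historical values, the sketch below uses ordinary least-squares linear regression with a single business index. Linear regression is only one possible supervised learning algorithm and is an assumption here, as are the function name and the sample data.

```python
def fit_linear_relation(business_history, performance_history):
    """Fit performance = slope * business + intercept by least squares.

    Both inputs hold E historical acquisition values (E > 1), aligned in time.
    Returns (predict, slope, intercept), where predict is a callable model.
    """
    e = len(business_history)
    if e != len(performance_history) or e < 2:
        raise ValueError("need E > 1 aligned historical values")
    mean_x = sum(business_history) / e
    mean_y = sum(performance_history) / e
    sxx = sum((x - mean_x) ** 2 for x in business_history)
    if sxx == 0:
        raise ValueError("business index is constant; slope is undefined")
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(business_history, performance_history)
    )
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    def predict(business_value):
        return slope * business_value + intercept

    return predict, slope, intercept


# Hypothetical training data: requests per minute vs. CPU utilization (%).
requests_history = [100, 200, 300, 400]
cpu_history = [17.5, 30.0, 42.5, 55.0]

model, slope, intercept = fit_linear_relation(requests_history, cpu_history)
print(slope, intercept, model(500))
```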

In some embodiments, the threshold determination module 33 is specifically configured to: obtaining the prediction data of the performance index based on the test data of the service index related to the performance index in the test data set and the relation model, calculating to obtain a residual average value according to the prediction data of the performance index and the test data of the performance index in the test data set, and taking the product of the residual average value and a preset residual multiple as an abnormal threshold value of the performance index; the test data set comprises test data of a service index and test data of a performance index, the test data of the service index comprises F collected values of the service index, the test data of the performance index comprises F collected values of the performance index, the prediction data of the performance index comprises F predicted values of the performance index, and F is an integer greater than or equal to 1.
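The threshold computation (residual average multiplied by a preset residual multiple) can be sketched as follows. The function name, the default multiple of 3, and the use of absolute residuals are assumptions; the application only states that the threshold is the product of the residual average and a preset multiple.

```python
def anomaly_threshold(predicted, collected, residual_multiple=3.0):
    """Threshold = residual_multiple * mean(|predicted_i - collected_i|) over F test points."""
    f = len(predicted)
    if f < 1 or f != len(collected):
        raise ValueError("need F >= 1 aligned predicted and collected values")
    residuals = [abs(p - c) for p, c in zip(predicted, collected)]
    return residual_multiple * sum(residuals) / f


# Hypothetical test data set with F = 4 points.
predicted_cpu = [50.0, 60.0, 70.0, 80.0]
collected_cpu = [52.0, 59.0, 73.0, 78.0]

threshold = anomaly_threshold(predicted_cpu, collected_cpu)
print(threshold)  # residuals 2, 1, 3, 2 give a mean of 2.0
```

A larger residual multiple tolerates more deviation and tends to reduce false alarms at the cost of more missed alarms.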

In some embodiments, the anomaly determination module 34 is specifically operable to: determining a residual error between a predicted value of the performance index at a first moment and an acquired value of the performance index at the first moment; comparing the residual of the performance indicator to an anomaly threshold of the performance indicator to determine whether the service is anomalous or normal.
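The residual comparison at a single moment can be sketched as below. The linear model coefficients, the threshold, and the sample values are hypothetical, and the absolute residual is assumed.

```python
def detect(model, business_value, performance_value, threshold):
    """Compare the residual at one moment against the anomaly threshold.

    `model` is any callable mapping a business index value to a predicted
    performance index value. Returns (is_abnormal, predicted, residual).
    """
    predicted = model(business_value)
    residual = abs(predicted - performance_value)
    return residual > threshold, predicted, residual


def relation_model(requests):
    """Hypothetical relation model: CPU utilization as a linear function of requests."""
    return 0.125 * requests + 5.0


# 500 requests predict 67.5% CPU; 90.0% was collected, so the residual is 22.5.
abnormal, predicted, residual = detect(relation_model, 500, 90.0, threshold=6.0)
print(abnormal, predicted, residual)

# 70.0% collected gives a residual of 2.5, which stays below the threshold.
normal_case = detect(relation_model, 500, 70.0, threshold=6.0)
print(normal_case[0])
```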

In a specific application, the service anomaly detection device provided by the embodiment of the present application may be implemented by software, hardware, or a combination of the two.

Fig. 4 shows an exemplary specific implementation structure of a service anomaly detection apparatus provided in an embodiment of the present application. Referring to fig. 4, an exemplary specific implementation structure of the service anomaly detection apparatus provided in the embodiment of the present application may include:

the data processing unit 41, which includes an index collection module 411, an index aggregation module 412, and an index alignment module 413. The index collection module 411 is mainly used to collect data of the service index and the performance index. The index aggregation module 412 is mainly used to aggregate the index data by time to obtain a performance index collection value and a service index collection value. The index alignment module 413 is mainly used to align the data (for example, the collection values) of the service index and the performance index. Together, the index collection module 411, the index aggregation module 412, and the index alignment module 413 can generate a unified data set, which includes the aligned service index collection values and the aligned performance index collection values.

The dynamic fitting unit 42 includes five modules: the data cleaning module 421, the correlation determination module 31, the model construction module 32, the threshold determination module 33, and the anomaly determination module 34, wherein the data cleaning module 421 can be used for processing missing data, outlier data, data type conversion, and data normalization, and generating data which can be used for a supervised learning algorithm or a correlation algorithm.

Fig. 5 is a schematic structural diagram of a computing device 500 provided in an embodiment of the present application. The computing device 500 includes: processor 510, memory 520.

The processor 510 may be coupled to the memory 520. The memory 520 may be used to store program code and data. The memory 520 may be a storage unit inside the processor 510, may be an external storage unit independent of the processor 510, or may be a component including both a storage unit inside the processor 510 and an external storage unit independent of the processor 510.

Optionally, the computing device 500 may further include: a communication interface 530. It is to be appreciated that communication interface 530 in computing device 500 shown in FIG. 5 may be used to communicate with other devices.

Optionally, computing device 500 may also include a bus 540. The memory 520 and the communication interface 530 may be connected to the processor 510 via a bus 540. The bus 540 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus 540 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one line is shown in FIG. 5, but this does not represent only one bus or one type of bus.

It should be understood that, in the embodiment of the present application, the processor 510 may adopt a Central Processing Unit (CPU). The processor may also be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. Or the processor 510 may employ one or more integrated circuits for executing related programs to implement the technical solutions provided in the embodiments of the present application.

The memory 520 may include both read-only memory and random access memory, and provides instructions and data to the processor 510. A portion of processor 510 may also include non-volatile random access memory. For example, processor 510 may also store information of the device type.

When the computing device 500 is running, the processor 510 executes the computer-executable instructions in the memory 520 to perform the operational steps of the service anomaly detection method described above.

It should be understood that the computing device 500 according to the embodiment of the present application may correspond to a corresponding main body for executing the method according to the embodiments of the present application, and the above and other operations and/or functions of each module in the computing device 500 are respectively for implementing corresponding flows of each method of the embodiment, and are not described herein again for brevity.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

An embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is used to execute the service anomaly detection method described above when being executed by a processor.

The computer storage media of the embodiments of the present application may take any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

An embodiment of the present application further provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the processor executes the service anomaly detection method described above. Here, the computer program product may be written in one or more programming languages, which may include, but are not limited to, an object oriented programming language such as Java, C++, etc., and a conventional procedural programming language such as the "C" language, etc.

It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present application and the technical principles employed. It will be understood by those skilled in the art that the present application is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the application. Therefore, although the present application has been described in more detail with reference to the above embodiments, the present application is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present application.

Claims

1. A method for detecting a service anomaly, comprising:

determining a business index related to a performance index of a service;

constructing a relation model of the service index and the performance index;

and determining whether the service is abnormal or not according to the predicted value of the performance index at the first moment, the acquired value of the performance index at the first moment and the abnormal threshold value of the performance index.

2. The method according to claim 1, wherein the constructing a relational model of the business index and the performance index comprises: building a relation model of a business index and a performance index based on a supervised learning algorithm by utilizing a training data set, wherein the training data set comprises performance index historical data and business index historical data of the service, the performance index historical data comprises E historical acquisition values of the performance index, the business index historical data comprises E historical acquisition values of the business index, and E is an integer larger than 1.

3. The method of claim 1, wherein determining the anomaly threshold for the performance metric using the relational model comprises:

obtaining the prediction data of the performance index based on the test data of the service index related to the performance index in the test data set and the relation model, calculating to obtain a residual average value according to the prediction data of the performance index and the test data of the performance index in the test data set, and taking the product of the residual average value and a preset residual multiple as an abnormal threshold value of the performance index;

the test data set comprises test data of a service index and test data of a performance index, the test data of the service index comprises F collected values of the service index, the test data of the performance index comprises F collected values of the performance index, the prediction data of the performance index comprises F predicted values of the performance index, and F is an integer greater than or equal to 1.

4. The method according to claim 1, wherein determining whether the service is abnormal according to the predicted value of the performance index at the first time, the collected value of the performance index at the first time, and the abnormal threshold of the performance index comprises:

determining a residual error between a predicted value of the performance index at a first moment and an acquired value of the performance index at the first moment;

comparing the residual of the performance indicator to an anomaly threshold of the performance indicator to determine whether the service is anomalous or normal.

5. A service anomaly detection device, comprising:

6. The service anomaly detection device according to claim 5, wherein the model construction module is specifically configured to: construct a relation model of the service index and the performance index based on a supervised learning algorithm by utilizing a pre-obtained training data set, wherein the training data set comprises performance index historical data and service index historical data of the service, the performance index historical data comprises E historical acquisition values of the performance index, the service index historical data comprises E historical acquisition values of the service index, and E is an integer larger than 1.

7. The service anomaly detection device according to claim 5, wherein the threshold determination module is specifically configured to: obtaining the prediction data of the performance index based on the test data of the service index related to the performance index in the test data set and the relation model, calculating to obtain a residual average value according to the prediction data of the performance index and the test data of the performance index in the test data set, and taking the product of the residual average value and a preset residual multiple as an abnormal threshold value of the performance index;

the test data set comprises test data of service indexes and test data of performance indexes, the test data of the service indexes comprise F service index acquisition values, the test data of the performance indexes comprise F performance index acquisition values, the prediction data of the performance indexes comprise F performance index prediction values, and F is an integer greater than or equal to 1.

8. The service anomaly detection device according to claim 5, wherein the anomaly determination module is specifically configured to:

9. A computing device, comprising: a processor and a memory; the memory is to store program instructions that, when executed by the processor, cause the computing device to implement the method of any of claims 1 to 4.

10. A computer-readable storage medium having stored thereon program instructions, which, when executed by a computer, cause the computer to carry out the method of any one of claims 1 to 4.