CN113556258B

CN113556258B - Anomaly detection method and device

Info

Publication number: CN113556258B
Application number: CN202010331814.9A
Authority: CN
Inventors: 胡永昌
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2020-04-24
Filing date: 2020-04-24
Publication date: 2022-12-27
Anticipated expiration: 2040-04-24
Also published as: CN113556258A; WO2021213247A1

Abstract

An anomaly detection method and device for quickly finding network anomalies in a network change scenario. The method comprises: determining a first matrix according to first values of a plurality of KPIs of a first service and a first neural network model, the first matrix comprising differences between predicted values of the plurality of KPIs and the first values of the plurality of KPIs at N time points, wherein the predicted values of the plurality of KPIs are obtained based on the first neural network model; determining abnormal results corresponding to the N time points according to the first matrix, wherein the abnormal result corresponding to any time point indicates whether the first service is abnormal at that time point; determining the abnormality degrees of the plurality of KPIs at each abnormal time point according to the abnormal results and the first matrix, wherein the abnormality degree of any KPI at an abnormal time point is the percentage that the difference corresponding to that KPI accounts for in the sum of the differences corresponding to the plurality of KPIs; and determining the abnormality type of the first service at each abnormal time point according to the abnormality degrees of the plurality of KPIs at that abnormal time point.

Description

Anomaly detection method and device

Technical Field

The present application relates to the field of communications technologies, and in particular, to an anomaly detection method and apparatus.

Background

During network operation, network change operations are very frequent and involve numerous network elements. For example, network change operations may include upgrades, cutovers, capacity expansions, and the like. When a network is changed, a large number of core network elements are involved. In particular, in a fifth generation (5G) network, ensuring that network changes proceed smoothly and safely is even more important. Therefore, in a network change scenario, a suitable anomaly detection function is required so that network anomalies are discovered as soon as possible within a period of time (the observation period) after a network change operation, allowing maintenance to be performed so that the network change succeeds, or losses to be stopped in time (e.g., by a rollback operation).

At present, existing anomaly detection algorithms support daily network scenarios, and no anomaly detection method exists that is specifically suited to network change scenarios.

Disclosure of Invention

The application provides an anomaly detection method and device, which are used for providing an anomaly detection method suitable for a network change scene so as to find network anomalies in the network change scene quickly.

In a first aspect, the present application provides an anomaly detection method, including: determining a first matrix according to first values of a plurality of Key Performance Indicators (KPIs) of a first service and a first neural network model, wherein the first matrix comprises differences between predicted values of the plurality of KPIs and the first values of the plurality of KPIs at N time points; determining abnormal results corresponding to the N time points according to the first matrix, wherein the abnormal result corresponding to any time point indicates whether the first service is abnormal at that time point; then, determining the abnormality degrees of the plurality of KPIs at each abnormal time point according to the abnormal results and the first matrix, and determining the abnormality type of the first service at each abnormal time point according to the abnormality degrees of the plurality of KPIs at that abnormal time point; wherein the predicted values of the plurality of KPIs are obtained based on the first neural network model; the first neural network model is determined based on historical values of the plurality of KPIs of the first service; the first service is any one of a plurality of services; N is an integer greater than or equal to 1; and the abnormality degree of any KPI at an abnormal time point is the percentage that the difference corresponding to that KPI accounts for in the sum of the differences corresponding to the plurality of KPIs.

By the method, the problems of zero drop and deformation of the KPI in the network change scene can be solved, and the network abnormity can be found quickly in the network change scene, so that operation and maintenance personnel can find the network abnormity in the change period as soon as possible and stop loss in time.

In one possible design, before a first matrix is determined according to first values of a plurality of KPIs of a first service and a first neural network model, the plurality of KPIs are classified according to the services to obtain KPIs corresponding to the plurality of services respectively, and a KPI corresponding to any one service is selected from the KPIs corresponding to the plurality of services respectively as the plurality of KPIs of the first service.

By the method, anomaly detection for different services is free from mutual interference, and the granularity of anomaly detection is controlled at the service level, which makes anomaly localization more convenient and more accurate.

In one possible design, the first matrix is determined according to the first values of the KPIs of the first service and the first neural network model, and the specific method may be as follows: generating a second matrix based on the first values of the KPIs, wherein the second matrix comprises second values of the KPIs at N time points in each acquisition window in M acquisition windows before the current acquisition window; then inputting the second matrix into the first neural network model to obtain predicted values of the KPIs at N time points; finally, determining the difference value between the predicted values of the KPIs and the first values of the KPIs at N time points, and generating the first matrix; wherein M is an integer greater than or equal to 1.

By the method, the first matrix can be accurately obtained, namely the residual error between the actual data and the predicted data of the KPIs is obtained, so that the abnormal positioning can be accurately carried out subsequently.
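The three steps above (assemble the second matrix from the M earlier acquisition windows, predict, subtract) can be sketched as follows. This is only an illustration: `mean_window_predictor` is a hypothetical stand-in that averages the M windows, not the trained first neural network model, and the names `Window` and `residual_matrix` are assumptions introduced here.

```python
from typing import Callable, List

Window = List[List[float]]  # one acquisition window: N time points x K KPIs


def mean_window_predictor(history: List[Window]) -> Window:
    """Hypothetical stand-in for the first neural network model.

    Predicts each KPI value at each time point as the mean of the same
    position over the M previous acquisition windows.
    """
    m = len(history)
    n_points = len(history[0])
    n_kpis = len(history[0][0])
    return [
        [sum(window[t][k] for window in history) / m for k in range(n_kpis)]
        for t in range(n_points)
    ]


def residual_matrix(
    history: List[Window],
    current: Window,
    predict: Callable[[List[Window]], Window] = mean_window_predictor,
) -> Window:
    """Return the first matrix: predicted value minus actual (first) value."""
    predicted = predict(history)  # `history` plays the role of the second matrix
    return [
        [p - a for p, a in zip(pred_row, actual_row)]
        for pred_row, actual_row in zip(predicted, current)
    ]
```

Any model that maps the M previous windows to an N-by-K prediction could be passed as `predict`; the subtraction step is unchanged.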

In a possible design, the method for determining the abnormal results corresponding to the N time points according to the first matrix may include: determining a KPI comprehensive abnormal value corresponding to each time point based on the difference values corresponding to a plurality of KPIs at each time point in the first matrix; determining whether the first service at each time point is abnormal or not by using the KPI comprehensive abnormal value corresponding to each time point and a first threshold; determining that the first service is abnormal at a time point when the KPI comprehensive abnormal value at the time point is greater than the first threshold value; determining that the first service is not abnormal at a time point when the KPI comprehensive abnormal value at the time point is less than or equal to the first threshold value; wherein the first threshold is the maximum of the second threshold and the third threshold; the second threshold value is obtained based on the non-abnormal historical values of the KPIs of the first service; the third threshold is determined based on the first values of the KPIs at the N time points and a preset abnormity percentage.

By the method, the time points at which the first service is abnormal and the time points at which it is not can be accurately determined, so that specific anomaly localization can subsequently be performed at the abnormal time points.
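A minimal sketch of this decision rule is given below. The text does not fix, at this point, how the KPI comprehensive abnormal value is computed from the per-KPI differences, so the sketch assumes the sum of absolute differences per time point; the function names are likewise hypothetical.

```python
from typing import List


def composite_scores(residuals: List[List[float]]) -> List[float]:
    """Assumed KPI comprehensive abnormal value: sum of |difference| per time point."""
    return [sum(abs(r) for r in row) for row in residuals]


def detect_abnormal_points(
    residuals: List[List[float]],
    second_threshold: float,
    third_threshold: float,
) -> List[bool]:
    """Flag a time point as abnormal when its score exceeds the first threshold."""
    # The first threshold is the maximum of the second and third thresholds.
    first_threshold = max(second_threshold, third_threshold)
    # Strictly greater than: a score equal to the threshold is not abnormal.
    return [score > first_threshold for score in composite_scores(residuals)]
```

Taking the maximum of the two thresholds means a time point is flagged only if it exceeds both the history-based bound and the bound derived from the current detection data.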

In a possible design, the third threshold is determined based on the first values of the KPIs at the N time points and a preset abnormal percentage, and a specific method may be as follows: sorting the N KPI comprehensive abnormal values corresponding to the N time points from large to small to obtain sorted KPI comprehensive abnormal values; determining, according to the preset abnormal percentage, the target KPI comprehensive abnormal value corresponding to the abnormal percentage among the sorted KPI comprehensive abnormal values; and taking the determined target KPI comprehensive abnormal value as the third threshold; wherein the KPI comprehensive abnormal value corresponding to each time point is determined based on the differences corresponding to the plurality of KPIs at that time point in the first matrix obtained from the first values of the plurality of KPIs at the N time points.

The third threshold value can be accurately obtained through the method, so that the first threshold value is determined by flexibly combining the third threshold value and the second threshold value when the abnormity judgment at each time point is carried out, and the false alarm is suppressed.
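The selection of the third threshold can be sketched as below. How the abnormal percentage maps to a position in the sorted list is not spelled out above, so the sketch assumes the ceil(p × N)-th largest value is taken; `third_threshold` and its parameters are hypothetical names.

```python
import math
from typing import List


def third_threshold(scores: List[float], abnormal_percentage: float) -> float:
    """Pick the threshold from the N comprehensive abnormal values.

    `abnormal_percentage` is a fraction in (0, 1], e.g. 0.25 for 25 percent.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    if not 0 < abnormal_percentage <= 1:
        raise ValueError("abnormal_percentage must be in (0, 1]")
    ranked = sorted(scores, reverse=True)  # from large to small
    # Assumption: the target value is the ceil(p * N)-th largest score.
    rank = max(1, math.ceil(abnormal_percentage * len(ranked)))
    return ranked[rank - 1]
```

With eight scores and a 25 percent abnormal percentage, the second largest score becomes the third threshold, so at most one time point of this window can exceed it.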

In a possible design, the abnormality type of the first service at each abnormal time point may be determined according to the abnormality degrees of the KPIs at that abnormal time point as follows: sorting the abnormality degrees of the KPIs at each abnormal time point from high to low to obtain the sorted abnormality degrees of the KPIs at each abnormal time point; and taking the abnormality type corresponding to the abnormality degrees of the top H KPIs at each abnormal time point as the abnormality type of the first service at that abnormal time point, where H is an integer greater than or equal to 1.

By the method, the operation and maintenance personnel can be helped to quickly locate the abnormal problem.
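The abnormality degree and the top-H ranking can be illustrated with the sketch below. It assumes absolute differences are used so that the percentages are non-negative and sum to 100; the KPI names are invented for the example, and the mapping from a top-ranked KPI to a named abnormality type is left out because it is not specified here.

```python
from typing import Dict, List, Tuple


def abnormality_degrees(residual_row: List[float], kpi_names: List[str]) -> Dict[str, float]:
    """Percentage of each KPI's |difference| in the sum over all KPIs."""
    magnitudes = [abs(r) for r in residual_row]
    total = sum(magnitudes)
    if total == 0:
        return {name: 0.0 for name in kpi_names}
    return {name: 100.0 * m / total for name, m in zip(kpi_names, magnitudes)}


def top_h_kpis(degrees: Dict[str, float], h: int) -> List[Tuple[str, float]]:
    """Sort the abnormality degrees from high to low and keep the top H."""
    ranked = sorted(degrees.items(), key=lambda item: item[1], reverse=True)
    return ranked[:h]


# Hypothetical KPI names and one row of the first matrix at an abnormal time point.
names = ["success_rate", "failure_code_403", "attempts"]
degrees = abnormality_degrees([1.0, -6.0, 3.0], names)
suspects = top_h_kpis(degrees, 2)
```

Here `suspects` lists the two KPIs that contribute most to the residual at the abnormal time point, which is the information operation and maintenance personnel would inspect first.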

In a second aspect, the present application also provides an abnormality detection apparatus having a function of realizing the above first aspect or each of the possible design examples of the first aspect. The functions can be realized by hardware, and the functions can also be realized by executing corresponding software by hardware. The hardware or software includes one or more modules or units corresponding to the above functions.

In a possible design, the structure of the abnormality detection apparatus may include a plurality of processing units, such as a first processing unit, a second processing unit, a third processing unit, and the like, which may perform corresponding functions in the first aspect or each possible design example of the first aspect, for which specific reference is made to detailed description in the method example, and details are not repeated here.

In one possible design, the structure of the anomaly detection apparatus includes a memory and a processor, and the processor is configured to support the anomaly detection apparatus to perform corresponding functions in the first aspect or each possible design example of the first aspect. The memory is coupled to the processor and holds the program instructions and data necessary for the anomaly detection device.

In a third aspect, an embodiment of the present application provides a computer-readable storage medium, which stores program instructions, and when the program instructions are executed on a computer, the computer is caused to execute the first aspect and any possible design thereof. By way of example, computer readable storage media may be any available media that can be accessed by a computer. Taking this as an example but not limiting: the computer-readable medium may include a non-transitory computer-readable medium, a random-access memory (RAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.

In a fourth aspect, the present application provides a computer program product comprising computer program code or instructions which, when run on a computer, enable the computer to implement the method of any one of the possible designs of the first aspect.

In a fifth aspect, the present application further provides a chip, coupled to a memory, for reading and executing program instructions stored in the memory to implement the method in any one of the possible designs of the first aspect.

For each of the second aspect to the fifth aspect and possible technical effects of each aspect, please refer to the above description of the possible technical effects of each possible solution in the first aspect, and no repeated description is given here.

Drawings

FIG. 1 is a schematic diagram of a 5G network change scenario provided in the present application;

FIG. 2 is a flow chart of an anomaly detection method provided by the present application;

FIG. 3 is a schematic diagram of KPI grouping provided in the present application;

FIG. 4 is a schematic diagram of KPIs classified according to services provided by the present application;

FIG. 5 is a schematic diagram of offline training and online detection provided herein;

FIG. 6 is a schematic diagram of a conventional neural network provided herein;

FIG. 7 is a schematic diagram of a generative neural network provided herein;

FIG. 8 is a schematic diagram of a recurrent neural network provided herein;

FIG. 9 is a schematic diagram of an LSTM network provided by the present application;

FIG. 10 is a schematic diagram of a composite LSTM neural network provided herein;

FIG. 11 is a schematic diagram of a relationship between a current time point and n previous time points provided herein;

FIG. 12 is a schematic illustration of data in a current time window and all data in n consecutive previous time windows as provided herein;

FIG. 13 is a schematic diagram of an input-output view of a neural network provided herein;

FIG. 14 is a schematic diagram of the input and output of a Composite LSTM neural network provided herein;

FIG. 15 is an example of a network information view of Composite LSTM provided by the present application;

FIG. 16 is a diagram illustrating data before and after a network change operation according to the present application;

FIG. 17 is a schematic diagram illustrating an effect of using Composite LSTM to deal with the problem of zero dropping and deformation of a network change scene KPI provided by the present application;

FIG. 18 is a graph illustrating the effectiveness of the test provided herein based on multiple types of KPIs;

FIG. 19 is a schematic flow chart of a method for obtaining a first matrix according to the present application;

FIG. 20 is a schematic diagram of iForest provided in the present application;

FIG. 21 is a schematic illustration of a threshold value provided herein;

FIG. 22 is a schematic diagram of training data, detection data and KPI comprehensive abnormal values at each time point provided by the present application;

FIG. 23 is a diagram illustrating a ranking of a degree of abnormality for a plurality of KPIs at an abnormal time provided herein;

FIG. 24 is a schematic view of a flow chart of an anomaly detection method provided herein;

FIG. 25 is a schematic structural diagram of an abnormality detection apparatus according to the present application;

FIG. 26 is a block diagram of an abnormality detection device according to the present application.

Detailed Description

The present application will be described in further detail below with reference to the accompanying drawings.

The embodiment of the application provides an anomaly detection method and device, which are used for providing an anomaly detection method suitable for a network change scene to realize that network anomalies are quickly found in the network change scene. The method and the device are based on the same technical concept, and because the principle of solving the problems of the method and the device is similar, the implementation of the device and the method can be mutually referred, and repeated parts are not described again.

During network operation, compared with a daily monitoring scenario, Key Performance Indicators (KPIs) in a network change scenario have the following characteristics: (1) Success-rate and traffic-related KPIs drop to zero at the time of the change operation. Because a network change operation (such as a network element upgrade) involves operations such as resets or kicking out users, the success-rate and traffic-related KPIs drop to zero. (2) After dropping to zero, traffic-related KPIs climb slowly, and the KPI waveform is deformed during this stage. For example, after a reset or after users are kicked out, users gradually re-access the network (e.g., kicked-out users rejoin the upgraded network element), so the traffic ramps up slowly.

However, existing anomaly detection algorithms all target daily monitoring scenarios (that is, scenarios without KPI zero drop or ramp-up deformation), and none is specifically suited to network change scenarios. Because KPIs drop to zero and deform in a change scenario, existing anomaly detection algorithms designed for daily monitoring scenarios generate a large number of false alarms and missed alarms.

In addition, the prior art generally uses single-index anomaly detection, but in practice observing only a single index is insufficient to detect service anomalies. For example, when the service is normal, a KPI may occasionally jitter, and single-index anomaly detection then raises a false alarm. For another example, a local, small-scale anomaly may affect a KPI, but because the system is resilient and robust, the service quickly returns to normal and no anomaly needs to be reported; single-index anomaly detection nevertheless raises a false alarm. For another example, the system is sometimes abnormal but the observed KPI reflects this only weakly or with a delay, so single-index anomaly detection misses the anomaly or reports it late. Specifically, within one service, the related indexes include L1-layer main indexes (success rate class, etc.), L2-layer auxiliary indexes (number of attempts, etc.), and L3-layer negative indexes (failure error code class, pointing to specific failure causes). Traditional single-index detection generally monitors only the key L1 main index, but because the user base is large, anomalies do not show clearly in that index, so single-index anomaly detection suffers from missed, false, or delayed alarms.

Based on the above problems, the present application provides an anomaly detection method suitable for network change scenarios. The method solves the problems of KPI zero drop and deformation in a network change scenario and also addresses the shortcomings of single-index anomaly detection described above, so that network anomalies can be found quickly in a network change scenario.

In the description of the present application, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying relative importance, nor order.

It should be understood that "at least one" in the embodiments of the present application means one or more, and "a plurality" means two or more. "And/or" describes the association relationship of the associated objects and indicates that three relationships are possible; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone, where A and B can be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one of a, b, or c may represent: a, b, c, a and b, a and c, b and c, or a, b and c, where a, b and c can be single or multiple.

In order to more clearly describe the technical solution of the embodiment of the present application, the following describes in detail an abnormality detection method and apparatus provided by the embodiment of the present application with reference to the accompanying drawings.

The anomaly detection method provided by the embodiment of the application is suitable for networks in which network change operations occur, such as 5G networks or networks for future communication, such as 6G networks. Specifically, the anomaly detection method is applied to a plurality of network elements in a network. For example, in the 5G network change scenario shown in fig. 1, network change operations such as upgrade, cutover, and capacity expansion may involve, among the network elements included in the core network, a Home Subscriber Server (HSS), a Unified Service Node (USN), a unified policy and charging controller (UPCC), a universal voice service server (ATS), a Call Session Control Function (CSCF), a Unified Gateway (UGW), and the like. It should be noted that the 5G network change scenario shown in fig. 1 may also include other network elements, which are not shown here one by one. The names of the network elements in fig. 1 are only used as examples; in future communications, such as 6G, they may be referred to by other names, or the network elements referred to in this application may be replaced by other entities or devices with the same functions, which is not limited in this application. This is described here once and is not repeated below.

The anomaly detection method provided by the embodiment of the present application may be applied to the network element shown or not shown in fig. 1, and may also be applied to a chip or a chip set in the network element. Referring to fig. 2, an execution subject is taken as an example of the abnormality detection apparatus to describe the abnormality detection method provided in the present application, and a specific flow of the method may include:

Step 201: the abnormality detection device determines a first matrix according to first values of a plurality of KPIs of a first service and a first neural network model, wherein the first matrix comprises differences between predicted values of the plurality of KPIs and the first values of the plurality of KPIs at N time points; the predicted values of the plurality of KPIs are obtained based on the first neural network model; the first neural network model is determined based on historical values of the plurality of KPIs of the first service; the first service is any one of a plurality of services; and N is an integer greater than or equal to 1.

The plurality of KPIs are indicators related to the first service. The KPIs include all indicators of the first service, for example, L1-layer main indexes (success rate class, etc.), L2-layer auxiliary indexes (number of attempts, etc.), L3-layer negative indexes (failure error code class, pointing to specific failure causes), and so on.

In an alternative embodiment, the anomaly detection apparatus needs to perform service-based grouping on KPIs before determining the first matrix according to the first values of the KPIs of the first service and the first neural network model, specifically: the abnormality detection device classifies the plurality of KPIs according to services to obtain KPIs corresponding to the services respectively; and selecting the KPI corresponding to any service from the KPIs corresponding to the services as the KPIs of the first service. It should be understood that any one of the multiple services may be used as the first service, that is, the anomaly detection process of the first service in the present application may represent the anomaly detection processes of all services, the anomaly detection processes of all services may be performed according to the anomaly detection process of the first service, and the anomaly detection between the multiple services is independent of each other.

Specifically, the KPI corresponding to each service may exist in the form of a KPI grouping table, and the plurality of services may include registration related services (e.g., call services), access related services, and the like. In the offline training process of the first neural network model, grouping (classifying) the historical KPIs according to the services to obtain a KPI grouping table of a plurality of services, for example, as shown in (a) of fig. 3; in the online detection (i.e., real-time anomaly detection), real-time KPIs are grouped according to services to obtain a KPI grouping table of multiple services, for example, as shown in (b) of fig. 3.

By the method, KPIs of different services are separated from each other, so that the anomaly detection tasks of different services do not interfere with one another. If all KPIs in the whole network were detected together, an anomaly at service-level granularity would easily be drowned out unless a network-wide anomaly occurred. In addition, the granularity of anomaly detection can be controlled at the service level, which makes anomaly localization convenient. Furthermore, from an algorithmic perspective, different KPIs can be regarded as features in the anomaly detection task. Too many KPIs trigger the "curse of dimensionality": as the feature dimension increases, more irrelevant feature dimensions are introduced, which degrades the performance of data analysis or anomaly detection; that is, measures that work well in low-dimensional feature spaces, such as Euclidean distance, become significantly less effective in high-dimensional feature spaces. Therefore, separating KPIs by service keeps the number of KPIs under control and helps preserve algorithm performance.
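The service-based grouping described above can be sketched as building a KPI grouping table from a KPI-to-service mapping. The KPI and service names below are invented for illustration, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def group_kpis_by_service(kpi_to_service: Dict[str, str]) -> Dict[str, List[str]]:
    """Build a KPI grouping table: service name -> KPIs of that service."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for kpi, service in kpi_to_service.items():
        groups[service].append(kpi)
    return dict(groups)


# Hypothetical KPIs of two services; each group is later detected independently.
kpi_table = group_kpis_by_service(
    {
        "register_success_rate": "registration",
        "register_attempts": "registration",
        "register_fail_code_403": "registration",
        "access_success_rate": "access",
        "access_attempts": "access",
    }
)
```

Each entry of `kpi_table` would then feed its own model and its own detection run, which is what keeps the feature dimension per task small.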

Further, the traditional single index anomaly detection often monitors only the main index. In the initial stage of network abnormity, the main index is not obvious, but the auxiliary index or the negative index is displayed. For example, the network is abnormal, the failure times are increased, but the success rate index is not obviously reduced due to the larger base number of users. In the embodiment of the application, the index monitoring range (comprising the auxiliary index, the negative index and the like) is expanded in the same service, and the abnormality can be detected earlier.

For example, fig. 4 shows a schematic diagram of a KPI according to traffic classification. In fig. 4, only CSCF network elements are used as an example to describe, and KPIs are classified according to services in both the training process and the real-time anomaly detection process.

The first neural network model is determined based on the historical values of the KPIs of the first service, that is, the first neural network model is obtained by training based on the historical values of the KPIs of the first service, that is, an offline training process for the first neural network model, for example, the offline training process shown in fig. 5. Wherein the process of determining the first matrix by the first values of the plurality of KPIs of the first service and the first neural network model may be an online detection process as shown in fig. 5. Specifically, the features (waveform + correlation) of a plurality of KPIs are learned by the first neural network model, the plurality of KPIs are predicted at the same time during detection, the obtained predicted values of the plurality of KPIs are compared with the actual value (here, the first value), and the first matrix of the plurality of KPIs is calculated. For example, the first matrix in this application may be referred to as a residual matrix.

In an alternative embodiment, the first neural network model may be a synthetic (composite) Long Short Term Memory (LSTM) neural network. The composite LSTM neural network can be built by combining an encoding-decoding (Encoder-Decoder) framework and an LSTM recurrent neural network, and integrates the characteristics of the Encoder-Decoder framework and the LSTM recurrent neural network.

Specifically, conventional neural networks, such as the deep neural network (DNN) and the convolutional neural network (CNN), are generally supervised; that is, the output needs to be trained with labels, as in the conventional neural network shown in fig. 6. In practice, however, labels are often difficult to obtain in large quantities, for example in image recognition and anomaly detection scenarios. For this reason, the present application may train the neural network in the form of a generative neural network, in which the output reconstructs the input, such as the generative neural network shown in fig. 7. The number of neurons in the hidden layer is generally smaller than that in the input and output layers, so the input data is compressed in the hidden layer and the main features are effectively extracted. An example is an autoencoder (AutoEncoder) based on the Encoder-Decoder framework.

However, when processing time series data such as KPIs, correlation along the time dimension needs to be considered. For this purpose, the present application may employ a recurrent neural network, in which the hidden layer is updated over time, such as the recurrent neural network shown in fig. 8. The generative neural network is unsupervised, whereas the recurrent neural network is supervised; the most common recurrent network is the Recurrent Neural Network (RNN).

In anomaly detection for a network change scenario, anomaly labels are often unavailable, so an unsupervised algorithm is required. Therefore, a generative neural network is selected when building the first neural network model, and the Encoder-Decoder framework is borrowed. Because KPIs have obvious time series characteristics, a recurrent neural network is suitable; in addition, the LSTM network, which has a longer time series memory than the RNN, can be borrowed. The LSTM network memorizes long-term correlations by deciding which information to forget and which to store; fig. 9 is a schematic diagram of the LSTM network. On this basis, the composite LSTM neural network constructed in the present application fuses the Encoder-Decoder framework and the LSTM network. The composite LSTM neural network may be as shown in fig. 10, and may specifically include:

a reconstruction part: assists the neural network in automatically extracting KPI waveform and correlation features;

a prediction part: performs multi-KPI prediction based on the extracted features, and finally calculates a multi-KPI residual matrix;

a prediction (preparation) part: does not fuse the Encoder-Decoder framework, and is used to output prediction outputs and a multi-KPI residual matrix that do not take KPI correlation into account. The multi-KPI residual matrix can additionally be used as input by other algorithms; for requirements in which the residuals of different KPIs must not affect each other (correlation not considered), such as event (incident) aggregation, rating, and localization, this output can be used instead.

Further, during real-time anomaly detection, the core objective of the Composite LSTM neural network is to predict multiple KPIs simultaneously based on the learned KPI features (waveform + correlation), and then compare the predicted values with the actual values of the multiple KPIs to generate a multi-KPI residual matrix. A conventional prediction method establishes a relationship between time points, that is, between the current time point and all data at the previous n time points, as shown in fig. 11. Because KPIs in a network change scenario drop to zero and deform, a robust prediction method needs to be designed. The Composite LSTM neural network is itself already more robust than conventional approaches. On this basis, the present application improves the prediction mechanism to establish a relationship between time windows, that is, between the data in the current time window and all data in the previous n consecutive time windows, as shown in fig. 12. Whereas the conventional method relates the current point to the previous n points, the present method relates the data in the current window to the data in the previous n windows; the larger redundancy significantly reduces the influence of a zero drop of a KPI at a single time point.

The present application applies the prediction mechanism based on time windows to the prediction of multiple KPIs, which gives the input and output view of the neural network shown in fig. 13. As can be seen from fig. 13, a plurality of KPIs are collected as a whole from two-dimensional data (time points, different KPIs) by a window with a time length of L, and then a relationship is established between the data in the Lookback = b previous time windows {Ti-1, …, Ti-b} and the data in the Lookforward = f subsequent time windows {Ti, …, Ti+f}, where Ti is the current time window. In this way, the prediction of the KPIs becomes more robust, and the association relationship among the KPIs can be sufficiently learned by the neural network.

In the present application, the neural network needs to be improved accordingly to fit the above prediction mechanism based on time windows. First, the input and output of the Composite LSTM neural network can be converted into tensor form, as shown in FIG. 14. Fig. 14 shows a three-dimensional matrix (samples, times, features), where the samples correspond to time points within a time window, the times correspond to different time windows, and the features correspond to different KPIs. In the present application, Lookforward is generally set to 1, and Lookback can be adjusted as needed. It should be noted that Lookforward may be set to another value, which is not limited in this application; for convenience of description, this application only describes an example in which Lookforward is set to 1.
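The window-based input and output described above can be illustrated with a minimal sketch. It assumes consecutive, non-overlapping windows of L time points and Lookforward = 1; the function names split_windows and build_sample and the toy data are hypothetical and are not taken from the application.

```python
def split_windows(series, window_len):
    """Cut a (time point x KPI) series into consecutive, non-overlapping windows."""
    n_windows = len(series) // window_len
    return [series[w * window_len:(w + 1) * window_len] for w in range(n_windows)]


def build_sample(windows, i, lookback):
    """Return (x, y) for the current window with index i.

    x[s][t][k]: KPI k at point s of the t-th of the `lookback` previous windows,
                that is, a tensor laid out as (samples, times, features).
    y[s][k]:    KPI k at point s of the current window (Lookforward = 1).
    """
    if i < lookback:
        raise ValueError("not enough history for the requested lookback")
    previous = windows[i - lookback:i]
    window_len = len(windows[i])
    x = [[previous[t][s] for t in range(lookback)] for s in range(window_len)]
    y = windows[i]
    return x, y


# Toy data: 12 time points, 2 KPIs, window length L = 3, giving 4 windows.
series = [[float(t), float(10 * t)] for t in range(12)]
windows = split_windows(series, window_len=3)
x, y = build_sample(windows, i=3, lookback=2)

# Dimensions of x: samples (points per window), times (windows), features (KPIs).
print(len(x), len(x[0]), len(x[0][0]))
```

With the sampling rate of the example in fig. 15 (5 minutes/point and 4 hours of history), lookback would be 48 in this layout.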

For example, a network information view of Composite LSTM obtained based on the above method can be as shown in FIG. 15. The example shown in fig. 15 uses 17 KPIs and a sampling frequency of 5 minutes/point. Based on this sampling frequency, a relationship is established between the current window and the windows within the 4 hours before the current window, so that Lookback = 12 × 4 = 48 and Lookforward = 1. All hidden layers are set to 50 neurons, and the hidden layers and the input layer are connected to the output layer by an LSTM network. FIG. 15 also shows the input, reconstruction and prediction parts of Composite LSTM.

Based on the above method, when real-time abnormality detection is performed using the first neural network model obtained as described above, the abnormality detection apparatus determines the first matrix according to the first values of the multiple KPIs of the first service and the first neural network model, and the specific method may be: the abnormality detection device generates a second matrix based on the first values of the KPIs, and inputs the second matrix into the first neural network model to obtain predicted values of the KPIs at N time points; then the abnormality detection device determines the difference values between the predicted values of the KPIs and the first values of the KPIs at the N time points to generate the first matrix; wherein the second matrix includes second values of the multiple KPIs at N time points in each of M acquisition windows before the current acquisition window (i.e. the current time window); M is an integer greater than or equal to 1.

Wherein the second matrix is input in the form of input of the Composite LSTM neural network in FIG. 14, and the predicted values of the KPIs at the N time points are output in the form of output of the Composite LSTM neural network in FIG. 14. In an optional implementation manner, before the second matrix is input into the first neural network model, data preprocessing may be performed on the second matrix to complete operations such as normalization and data format adaptation.
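The generation of the first matrix (residual matrix) from predicted and actual KPI values can be sketched as follows. The sketch assumes that the predicted values have already been produced by the model and that the absolute difference is used as the residual; the application does not specify how the sign is handled, and the function name residual_matrix and the sample values are hypothetical.

```python
def residual_matrix(predicted, actual):
    """Element-wise residuals: one row per time point, one column per KPI.

    Assumption: the residual is the absolute difference |predicted - actual|.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must cover the same time points")
    return [
        [abs(p - a) for p, a in zip(p_row, a_row)]
        for p_row, a_row in zip(predicted, actual)
    ]


# Two time points (N = 2) and two KPIs; the first KPI drops to zero at the second point.
predicted = [[10.0, 5.0], [12.0, 5.5]]
actual = [[9.5, 5.0], [0.0, 5.25]]
first_matrix = residual_matrix(predicted, actual)
print(first_matrix)
```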

It should be noted that, in the training process of the first neural network model, the training data (i.e. the historical values of the KPIs) is generally normal data covering at least 3 days before the network change operation, and optionally normal data covering one week (7 days), as shown in fig. 16. The training algorithm is robust and has a certain fault tolerance, so a small number of abnormal data points are allowed in the training data. However, anomalies that affect a large part of the data for a long time can severely degrade training. In the abnormality detection process, the abnormality detection apparatus starts to execute the abnormality detection process when the network change operation is started, for example, as shown in fig. 16. After the network change operation is started, every time a new data point arrives, anomaly detection is performed on the historical data together with the latest data, that is, on the data from at least 3 days before the change (generally one week) up to the latest data time point. Each time new data arrives and the anomaly detection is finished, the result for the latest time point is reported. The detection results within one hour after the network change operation are generally not reliable and are therefore not reported, for the following reasons: after the network change operation, KPIs may drop to zero, causing false alarms; when the sampling frequency is 15 minutes/point, only 4 points are sampled within one hour, so the data volume is too small; and after the network change operation, data may not be collected for a short period of time, resulting in missing data. It should be noted that, when data is missing, zero padding or interpolation is performed when the data is reported, to ensure the continuity of the data in time.

Based on the above, fig. 17 shows a schematic diagram of the effect of using Composite LSTM to deal with zero dropping and deformation of KPIs in a network change scenario. For convenience of illustration, fig. 17 shows the processing effect after a network change for a test on a single KPI only. First, the KPI drops to zero at the time of the network change; because this time point is known, the resulting anomaly can be suppressed manually. Then, comparing the predicted data with the actual data shows that the zero drop at the time of the network change does not affect the prediction of subsequent data. Even while the KPI is climbing, the data is still predicted well, and the residual value stays low. In addition, it can be observed that Composite LSTM is sensitive to sudden changes but robust to slight deformation (e.g., climbing after dropping to zero).

Illustratively, fig. 18 shows a test effect graph based on multiple types of KPIs. The different types of KPIs need not be classified and can be processed simultaneously by the Composite LSTM algorithm. Composite LSTM predicts the plurality of KPIs simultaneously based on the learned waveform and correlation characteristics, and the deviation between the actual data and the predicted data is the residual value. In the diagram corresponding to each KPI in fig. 18, the two lines representing the actual data and the predicted data substantially coincide, and the lowest line represents the residual value of the KPI; for ease of viewing, this line is plotted as the square of the residual value. When the correlation between multiple KPIs (for example, rising and falling together) is broken, the residual value also increases. For example, in fig. 18, the second (from top to bottom) periodic KPI does not rise and fall together with the other KPIs, so the correlation is broken and a higher residual value appears. When the residual values of a large number of KPIs are high at the same moment, a service anomaly is likely; when only a single index or a small number of indexes have high residuals, the KPI is probably jittering occasionally, and the false alarm is suppressed after comprehensive judgment (whereas single-index anomaly detection would raise a false alarm). The Composite LSTM neural network outputs a multi-KPI residual matrix for the subsequent anomaly judgment; the matrix can also be used as input for other algorithms (for example, in items currently in use, such as lifetime learning based anomaly detection and incident aggregation and grading).

Step 202: and the abnormality detection device determines abnormal results corresponding to the N time points according to the first matrix, wherein the abnormal result corresponding to any time point is whether the first service at any time point is abnormal.

For example, when the abnormality detection apparatus performs step 202, the first matrix (i.e., the residual matrix; the difference values in the first matrix are the residuals) may be processed based on an isolation forest algorithm (Iforest), so as to obtain a 1/0 abnormality result for each time point, for example, as shown in fig. 19. Optionally, "1" indicates that the first service is abnormal (or not abnormal) at the corresponding time point, and "0" indicates that the first service is not abnormal (or abnormal) at the corresponding time point.

In fig. 19, the residual matrix (i.e. the first matrix) can be regarded as multi-dimensional data (samples, features), with different KPIs as features and different time points as samples. The timing characteristics may not be considered because the residuals of multiple KPIs have no apparent timing characteristics. Therefore, the detection of KPI anomalies for multiple indicators can be understood as: and comprehensively judging abnormal time points (samples) based on residual errors (characteristics) of the multiple KPIs.

It should be noted that the Iforest algorithm is adopted in the present application for the following reasons: when a large number of KPIs are abnormal at the same time, the service is very likely abnormal, so an algorithm suitable for detecting global abnormal points is needed; meanwhile, false alarms caused by the jitter of a single KPI or a small number of KPIs need to be suppressed, so an algorithm insensitive to local abnormal points is needed; and a large number of KPIs are detected simultaneously, so an algorithm suitable for high-dimensional data is needed. In summary, Iforest is a suitable algorithm.

Specifically, the basic principle of Iforest can be as shown in the schematic diagram in fig. 20: a binary search tree is constructed, and each time the feature space is split, a value is randomly selected between the maximum value and the minimum value as the split point. Global outliers are isolated early when the feature space is split, so they are close to the root node of the tree and have a shallow depth. Local outliers, like normal points, are difficult to isolate, so they are far from the root node and have a large depth. Iforest constructs a plurality of trees to form a forest, and combines the depths of a point across the trees into its abnormal value.
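The principle that a global outlier is isolated at a shallower depth than a normal point can be illustrated with a small standard-library sketch. This is a simplification and not the full Iforest algorithm: it follows only the branch containing the point of interest, uses no sub-sampling, and does not normalise the depth into a score. The function names and the toy data are hypothetical.

```python
import random


def isolation_depth(point, data, rng, depth=0, max_depth=12):
    """Number of random splits needed to isolate `point` within `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    # Only features whose values still differ can be split.
    splittable = [f for f in range(len(point))
                  if min(r[f] for r in data) < max(r[f] for r in data)]
    if not splittable:
        return depth
    f = rng.choice(splittable)
    lo = min(r[f] for r in data)
    hi = max(r[f] for r in data)
    split = rng.uniform(lo, hi)  # random value between the minimum and the maximum
    # Follow only the branch that contains `point`.
    if point[f] < split:
        branch = [r for r in data if r[f] < split]
    else:
        branch = [r for r in data if r[f] >= split]
    return isolation_depth(point, branch, rng, depth + 1, max_depth)


def mean_depth(point, data, n_trees=200, seed=0):
    """Average isolation depth over many random trees (the 'forest')."""
    rng = random.Random(seed)
    return sum(isolation_depth(point, data, rng) for _ in range(n_trees)) / n_trees


# 25 normal points on a small grid plus one global outlier far away.
normal = [[1.0 + 0.1 * i, 1.0 + 0.1 * j] for i in range(5) for j in range(5)]
outlier = [10.0, 10.0]
data = normal + [outlier]

outlier_depth = mean_depth(outlier, data)
normal_depth = mean_depth(normal[12], data)
print(outlier_depth, normal_depth)
```

In this toy data the outlier is usually separated by the first split, while the point in the middle of the grid needs several splits, which is the behaviour the depth-based abnormal value relies on.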

However, the conventional Iforest has the following problem: Iforest calculates a threshold based on prior knowledge (the percentage of anomalies), meaning that there will always be some samples (time points) detected as anomalies. This can be seen in the following algorithm code: self._threshold_ = np.percentile(self.decision_function(X), 100. * self._contamination), in which the (Scikit-learn) Iforest calculates the threshold from the percentage of anomalies (contamination) and the abnormal values (decision_function(X)). That is, it is assumed here that a certain percentage of the detected data is contaminated (anomalous), and the threshold is then calculated based on this percentage. Therefore, even in a normal network change operation, some samples are erroneously detected as abnormal, which results in false alarms.

In contrast, the present application improves the threshold of Iforest as follows: given that the training data is (or can be guaranteed to be) normal data, a lower limit of the threshold can be calculated from the comprehensive abnormal values of the training data, so that false alarms can be suppressed. The lower limit may be calculated using the interquartile range (IQR) of the comprehensive abnormal values of the training data. The IQR is a robust statistic, in the same way that the median is more robust than the mean. Let the first quartile of the training data be Q1 and the third quartile be Q3; the interquartile range is then IQR = Q3 - Q1. Let the threshold calculated by Iforest be T_iforest, and let k be a parameter that controls the lower limit of the threshold. The improved threshold may be calculated as: max(T_iforest, Q3 + k × IQR). An application program interface (API) of the improved Iforest algorithm may be as follows:

to_overall_anomalies_iForest(data=None, contamination=0.1, n_estimators=100, split_time=None, k=5);

wherein contamination controls the percentage of anomalies; n_estimators is the number of trees constructed; split_time is the time point of the network change operation and is used to separate out the training data; k controls the lower limit of the threshold: the higher k is, the higher the lower limit.
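The improved threshold max(T_iforest, Q3 + k × IQR) can be sketched with the Python standard library as follows. This is only an illustration under stated assumptions: the quartiles are computed with linear interpolation (the "inclusive" method), the function name improved_threshold is hypothetical, and the scores are made-up values rather than outputs of the API above.

```python
import statistics


def improved_threshold(train_scores, t_iforest, k=5.0):
    """Return max(T_iforest, Q3 + k * IQR).

    train_scores: comprehensive abnormal values of the (normal) training data.
    t_iforest:    threshold computed by the conventional Iforest.
    k:            controls the lower limit; a larger k gives a higher lower limit.
    """
    q1, _, q3 = statistics.quantiles(train_scores, n=4, method="inclusive")
    iqr = q3 - q1
    return max(t_iforest, q3 + k * iqr)


train_scores = [1.0, 2.0, 3.0, 4.0, 5.0]  # Q1 = 2, Q3 = 4, IQR = 2

# The lower limit Q3 + 5 * IQR = 14 wins over a small Iforest threshold ...
low = improved_threshold(train_scores, t_iforest=3.0, k=5.0)
# ... while a larger Iforest threshold is kept unchanged.
high = improved_threshold(train_scores, t_iforest=20.0, k=5.0)
print(low, high)
```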

For example, in the threshold diagram shown in fig. 21, it can be observed that the threshold calculated by the conventional Iforest causes many false positives, whereas the improved threshold is higher and the false positives are suppressed.

Based on the above method, the anomaly detection device determines the anomaly results corresponding to the N time points according to the first matrix, and the specific method may be (that is, a processing method based on an Iforest algorithm): the abnormality detection device determines a KPI comprehensive abnormal value corresponding to each time point based on the difference values corresponding to the plurality of KPIs at each time point in the first matrix; determining whether the first service at each time point is abnormal or not by using the KPI comprehensive abnormal value corresponding to each time point and a first threshold; when the KPI comprehensive abnormal value at a time point is larger than the first threshold value, the abnormal detection device determines that the first service is abnormal at the time point; when the KPI comprehensive abnormal value at a time point is less than or equal to the first threshold value, the abnormality detection device determines that the first service is not abnormal at the time point; wherein the first threshold is the maximum of the second threshold and the third threshold; the second threshold value is obtained based on the non-abnormal historical values of the plurality of KPIs of the first service; the third threshold is determined based on the first values of the KPIs at the N time points and a preset abnormal percentage. Wherein the non-abnormal historical values of the plurality of KPIs of the first service are some normal data before network change operation.

Wherein the first threshold is the improved threshold mentioned above, the second threshold is Q3 + k × IQR mentioned above, and the third threshold is T_iforest.

In an optional implementation manner, the third threshold is determined based on the first values of the KPIs at the N time points and a preset abnormality percentage, and specifically may be determined as follows: the abnormality detection device sorts the N KPI comprehensive abnormal values corresponding to the N time points from large to small to obtain the sorted KPI comprehensive abnormal values; wherein the comprehensive abnormal value corresponding to each time point is determined based on the difference values corresponding to the plurality of KPIs at each time point in the first matrix obtained from the first values of the plurality of KPIs at the N time points; the abnormality detection device determines, according to the preset abnormality percentage, a target KPI comprehensive abnormal value corresponding to the abnormality percentage in the sorted KPI comprehensive abnormal values, and takes the determined target KPI comprehensive abnormal value as the third threshold. For example, if the comprehensive abnormal values sorted from large to small are 5, 4, 3, 2, 1 and the abnormality percentage is 20%, then 1 point is abnormal, the third threshold is 4, and points greater than 4 are abnormal.
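The determination of the third threshold from a preset abnormality percentage can be sketched as follows, reproducing the numerical example above. The function name percentage_threshold and the rounding of the number of abnormal points are assumptions made for illustration.

```python
def percentage_threshold(scores, anomaly_pct):
    """Pick the threshold so that roughly `anomaly_pct` of the points exceed it.

    The scores are sorted from large to small; the first value after the
    assumed abnormal points is returned as the threshold.
    """
    ordered = sorted(scores, reverse=True)
    n_anomalies = round(len(ordered) * anomaly_pct)  # assumption: round to nearest
    n_anomalies = min(n_anomalies, len(ordered) - 1)  # keep the index valid
    return ordered[n_anomalies]


scores = [1, 2, 3, 4, 5]  # comprehensive abnormal values at 5 time points
third_threshold = percentage_threshold(scores, anomaly_pct=0.2)

# A time point is abnormal when its value is greater than the threshold.
flags = [s > third_threshold for s in scores]
print(third_threshold, flags)
```

In the method of the application this value is only the third threshold; the first threshold actually used for the decision is the maximum of the second and third thresholds.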

Step 203: the abnormality detection means determines abnormality degrees of the plurality of KPIs at each abnormal time point based on the abnormality result and the first matrix, and the abnormality degree of any one KPI at each abnormal time point is a percentage of a difference value corresponding to the any KPI to a sum value of the difference values corresponding to the plurality of KPIs.

The abnormality detection apparatus executes the process of step 203, which may be a KPI abnormality degree calculation process as shown in fig. 19, so that obtaining the abnormality degree of the KPI at each abnormal time point may facilitate troubleshooting of problems by operation and maintenance personnel. Specifically, the abnormal time point refers to a time point when the first service is abnormal.

Step 204: the abnormality detection means determines the abnormality type of the first service at each abnormal time point based on the abnormality degree of the plurality of KPIs at each abnormal time point.

Specifically, the abnormality detection apparatus may determine the abnormality type of the first service according to the abnormality degrees of the KPIs at each abnormal time point, and the specific method may be: the abnormality detection device sorts the abnormality degrees of the KPIs at each abnormal time point from high to low to obtain the sorted abnormality degrees of the KPIs at each abnormal time point, and takes the abnormality type corresponding to the abnormality degrees of the first H KPIs at each abnormal time point as the abnormality type of the first service at each abnormal time point; H is an integer greater than or equal to 1.

Specifically, determining the abnormality type of the first service means further locating the abnormality. Ranking the KPIs by abnormality degree makes it easier for operation and maintenance personnel to determine the abnormality type of the service. For example, if L3-layer indexes such as the number of authentication failures rank high in abnormality degree, the current abnormality most likely points to the authentication process.

In an exemplary embodiment, in step 203, after the abnormality detection apparatus performs KPI abnormality degree calculation, abnormality degree ranking results of a plurality of KPIs may be directly output, as shown in fig. 19.

Illustratively, when calculating the KPI abnormality degree ranking at each abnormal time point, the abnormality detection apparatus denotes the residual values of the A KPIs at the abnormal time point t as {x1, x2, …, xA}. The abnormality degrees (percentages) of the A KPIs are then {x1/Σxi, x2/Σxi, …, xA/Σxi}, where Σxi is the sum of all residual values at the current abnormal time point t. Finally, the apparatus ranks and outputs the KPI abnormality degrees.
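The abnormality degree calculation and ranking can be sketched as follows. The sketch assumes non-negative residual values at one abnormal time point; the KPI names, the function names and the choice H = 2 are hypothetical and serve only as an illustration.

```python
def anomaly_degrees(residuals):
    """Share of each KPI's residual in the sum of all residuals at one time point."""
    total = sum(residuals.values())
    if total == 0:
        return {name: 0.0 for name in residuals}
    return {name: value / total for name, value in residuals.items()}


def top_h(residuals, h):
    """The H KPIs with the highest abnormality degree, sorted from high to low."""
    degrees = anomaly_degrees(residuals)
    return sorted(degrees.items(), key=lambda item: item[1], reverse=True)[:h]


# Hypothetical residuals of A = 3 KPIs at one abnormal time point t.
residuals_at_t = {
    "auth_failures": 4.0,
    "attach_success_rate": 3.0,
    "paging_delay": 1.0,
}
ranking = top_h(residuals_at_t, h=2)
print(ranking)
```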

Exemplarily, based on the above description, as the training data shown in fig. 22, the detection data and the KPI comprehensive abnormal value at each time point, it can be seen that the abnormality of the single/small number of KPIs is not clearly reflected in the comprehensive abnormal value and thus is suppressed; after the network change operation, an abnormality occurs and the integrated abnormal value becomes high accordingly. Further, the time-based abnormality determination based on Iforest may be as shown in table 1 below:

TABLE 1

Time point	Abnormality determination result
2019/8/13 8:30	FALSE (abnormal)
2019/8/13 8:35	FALSE
2019/8/13 8:40	FALSE
2019/8/13 8:45	FALSE
2019/8/13 8:50	FALSE
2019/8/13 8:55	TRUE (not abnormal)
2019/8/13 9:00	TRUE
2019/8/13 9:05	TRUE
2019/8/13 9:10	TRUE
2019/8/13 9:15	TRUE
2019/8/13 9:20	TRUE

By calculating the abnormality degrees of the plurality of KPIs, the ranking at each abnormal time point can be as shown in fig. 23, in which only the top 4 ranked KPIs at 3 abnormal time points are shown. Based on this, it can be determined that the fault mainly involves the indexes and services related to the switching-on rate on the T side of the ATS.

Based on the above embodiments, in a specific exemplary embodiment, the flow of the anomaly detection method of the present application may be as shown in fig. 24, and may include an offline training process and an online real-time detection process. The present application performs anomaly detection on multiple KPIs jointly, which addresses the false alarms and missed alarms of single-index anomaly detection. KPIs are classified based on the service scenario and anomaly detection is performed separately for each class, so that the service granularity of anomaly detection can be refined. The KPI deformation problem in the change scenario is addressed by using a deep neural network to learn the historical characteristics (waveform + correlation) of the plurality of KPIs. The neural network outputs a multi-KPI residual matrix, which can be used for the anomaly determination of the present application and also as input for other algorithms (such as incident aggregation and grading). Finally, the improved Iforest algorithm outputs a comprehensive anomaly determination for each time point, and, to make positioning easier for operation and maintenance personnel, a KPI abnormality degree calculation module is added to output the abnormality degree ranking of the plurality of KPIs.

By adopting the abnormity detection method provided by the embodiment of the application, the problems of zero drop and deformation of the KPI in a network change scene can be solved, and meanwhile, the network abnormity can be quickly found in the network change scene aiming at the defects of the above single-index abnormity detection, so that operation and maintenance personnel can find the network abnormity in the change period as soon as possible and stop loss in time.

Based on the foregoing embodiments, an abnormality detection apparatus is further provided in the embodiments of the present application, and is used to implement the abnormality detection method provided in the embodiment shown in fig. 2. Referring to fig. 25, the abnormality detection apparatus 2500 includes a first processing unit 2501, a second processing unit 2502, and a third processing unit 2503, wherein:

the first processing unit 2501 is configured to determine a first matrix according to first values of a plurality of key performance indicators, KPIs, of a first service and a first neural network model, where the first matrix includes differences between predicted values of the plurality of KPIs and the first values of the plurality of KPIs at N time points; the predicted values of the plurality of KPIs are obtained based on the first neural network model; the first neural network model is determined based on historical values of the plurality of KPIs for the first business; the first service is any one of a plurality of services; n is an integer greater than or equal to 1;

the second processing unit 2502 is configured to determine, according to the first matrix, abnormal results corresponding to the N time points, where an abnormal result corresponding to any one time point is whether the first service at any one time point is abnormal;

the third processing unit 2503 is configured to determine an abnormality degree of the plurality of KPIs at each abnormal time point according to the abnormality result and the first matrix, where the abnormality degree of any KPI at each abnormal time point is a percentage of a difference value corresponding to the any KPI to a sum value of the difference values corresponding to the plurality of KPIs; and determining the abnormal type of the first service at each abnormal time point according to the abnormal degree of the plurality of KPIs at each abnormal time point.

In an alternative embodiment, the abnormality detection apparatus 2500 may further include: the fourth processing unit is used for classifying the plurality of KPIs according to the services to obtain KPIs corresponding to the plurality of services respectively before the first processing unit determines the first matrix according to the first values of the plurality of KPIs of the first service and the first neural network model; and selecting a KPI corresponding to any service from KPIs corresponding to a plurality of services respectively as a plurality of KPIs of the first service.

In a specific embodiment, the first processing unit 2501, when determining the first matrix according to the first values of the plurality of KPIs of the first service and the first neural network model, is specifically configured to: generate a second matrix based on the first values of the KPIs, wherein the second matrix comprises second values of the KPIs at N time points in each acquisition window in M acquisition windows before the current acquisition window; input the second matrix into the first neural network model to obtain predicted values of the KPIs at N time points; and determine the difference values between the predicted values of the KPIs and the first values of the KPIs at the N time points to generate the first matrix; M is an integer greater than or equal to 1.

For example, when determining the abnormal results corresponding to the N time points according to the first matrix, the second processing unit 2502 is specifically configured to: determining a KPI comprehensive abnormal value corresponding to each time point based on the difference values corresponding to a plurality of KPIs at each time point in the first matrix; determining whether the first service at each time point is abnormal or not by using the KPI comprehensive abnormal value corresponding to each time point and a first threshold; determining that the first service is abnormal at a time point when the KPI comprehensive abnormal value at the time point is greater than the first threshold value; determining that the first service is not abnormal at a time point when the KPI comprehensive abnormal value at the time point is less than or equal to the first threshold value; wherein the first threshold is the maximum of the second threshold and the third threshold; the second threshold value is obtained based on the non-abnormal historical values of the plurality of KPIs of the first service; the third threshold is determined based on the first values of the KPIs at the N time points and a preset abnormal percentage.

Specifically, when determining the third threshold based on the first values of the KPIs at the N time points and a preset abnormality percentage, the second processing unit 2502 is specifically configured to: sorting the N KPI comprehensive abnormal values corresponding to the N time points from large to small to obtain sorted KPI comprehensive abnormal values; wherein the comprehensive abnormal value corresponding to each time point is determined based on the difference value corresponding to the plurality of KPIs at each time point in the first matrix obtained by the first values of the plurality of KPIs at the N time points; according to the preset abnormal percentage, determining a target KPI comprehensive abnormal value corresponding to the abnormal percentage in the sorted KPI comprehensive abnormal values; and taking the determined target KPI comprehensive abnormal value as the third threshold value.

In an optional implementation manner, when determining the abnormality type of the first service according to the abnormality degrees of the plurality of KPIs at each abnormal time point, the third processing unit 2503 is specifically configured to: sorting the abnormal degrees of the KPIs at each abnormal time point from high to low to obtain the abnormal degrees of the sorted KPIs at each time point; taking the abnormal type corresponding to the abnormal degree of the first H KPIs at each abnormal time point as the abnormal type of the first service at each abnormal time point; h is an integer greater than or equal to 1.

By adopting the anomaly detection device provided by the embodiment of the application, the problems of zero drop and deformation of the KPI in a network change scene can be solved, and the network anomaly can be quickly found in the network change scene, so that operation and maintenance personnel can find the network anomaly in the change period as soon as possible and stop loss in time.

It should be noted that the division of the unit in the embodiment of the present application is schematic, and is only a logic function division, and there may be another division manner in actual implementation. Each functional unit in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the present application, which are essential or contributing to the prior art, or all or part of the technical solutions may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor (processor) to execute all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

Based on the above embodiments, the embodiments of the present application further provide an abnormality detection apparatus, where the abnormality detection apparatus is configured to implement the abnormality detection method shown in fig. 2. Referring to fig. 26, the abnormality detection device 2600 may include: a processor 2601 and a memory 2602, wherein:

the processor 2601 may be a Central Processing Unit (CPU), a Network Processor (NP), or a combination of CPU and NP. The processor 2601 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a Programmable Logic Device (PLD), or a combination thereof. The PLD may be a Complex Programmable Logic Device (CPLD), a field-programmable gate array (FPGA), a General Array Logic (GAL), or any combination thereof.

Wherein, the processor 2601 and the memory 2602 are connected to each other. Optionally, the processor 2601 and the memory 2602 are connected to each other via a bus 2603; the bus 2603 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 26, but this does not indicate only one bus or one type of bus.

In an alternative embodiment, the memory 2602 is used to store programs and the like. In particular, the program may include program code comprising computer operating instructions. The memory 2602 may include RAM and may also include non-volatile memory (non-volatile memory), such as one or more disk memories. The processor 2601 executes the application program stored in the memory 2602 to implement the above-described functions, thereby implementing the functions of the abnormality detection apparatus 2600.

Specifically, the processor 2601 is configured to be coupled to the memory 2602, invoke program instructions in the memory 2602, and perform the following operations to implement the anomaly detection method provided in the embodiment of the present application:

determining a first matrix from first values of a plurality of key performance indicators (KPIs) of a first service and a first neural network model, the first matrix comprising differences between predicted values of the plurality of KPIs and the first values of the plurality of KPIs at N time points; the predicted values of the plurality of KPIs are obtained based on the first neural network model; the first neural network model is determined based on historical values of the plurality of KPIs of the first service; the first service is any one of a plurality of services; N is an integer greater than or equal to 1;

determining abnormal results corresponding to the N time points respectively according to the first matrix, wherein the abnormal result corresponding to any time point indicates whether the first service is abnormal at that time point;

determining the abnormality degree of each of the plurality of KPIs at each abnormal time point according to the abnormal results and the first matrix, wherein the abnormality degree of any KPI at an abnormal time point is the percentage that the difference value corresponding to that KPI represents of the sum of the difference values corresponding to the plurality of KPIs;

and determining the abnormal type of the first service at each abnormal time point according to the abnormal degree of the plurality of KPIs at each abnormal time point.
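The abnormality degree defined in the operations above can be illustrated with a minimal sketch. The function name `abnormality_degrees` is hypothetical, and the use of absolute differences (so that positive and negative deviations do not cancel) is an assumption; the text only states that each KPI's difference is expressed as a percentage of the summed differences.

```python
def abnormality_degrees(diff_row):
    """Return each KPI's share (in percent) of the total deviation at one time point.

    diff_row holds one row of the first matrix: one difference value per KPI.
    """
    # Assumption: absolute differences are used as the deviation magnitude.
    magnitudes = [abs(d) for d in diff_row]
    total = sum(magnitudes)
    if total == 0:
        # No deviation at all: every KPI contributes 0 percent.
        return [0.0 for _ in magnitudes]
    return [100.0 * m / total for m in magnitudes]


degrees = abnormality_degrees([1.0, -3.0, 0.0])
```

In this example the second KPI accounts for three quarters of the total deviation, so it would be the first candidate when the abnormality type is determined.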

In an alternative embodiment, the processor 2601, prior to determining the first matrix from the first values of the plurality of KPIs for the first service and the first neural network model, is further configured to: classifying the plurality of KPIs according to services to obtain KPIs corresponding to the services respectively; and selecting a KPI corresponding to any service from KPIs corresponding to a plurality of services respectively as a plurality of KPIs of the first service.
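As a rough illustration of this classification step, the sketch below groups KPIs by the service they belong to and then selects the KPIs of one service. The KPI names, service names, and the function `group_kpis_by_service` are hypothetical and are not taken from the application.

```python
from collections import defaultdict


def group_kpis_by_service(kpi_to_service):
    """Group KPI names by service, preserving the input order within each group."""
    groups = defaultdict(list)
    for kpi, service in kpi_to_service.items():
        groups[service].append(kpi)
    return dict(groups)


# Hypothetical KPI-to-service mapping, for illustration only.
kpi_to_service = {
    "call_setup_success_rate": "voice",
    "call_drop_rate": "voice",
    "session_setup_success_rate": "data",
}
groups = group_kpis_by_service(kpi_to_service)
first_service_kpis = groups["voice"]  # KPIs of the service chosen as the first service
```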

Specifically, when determining the first matrix according to the first values of the plurality of KPIs of the first service and the first neural network model, the processor 2601 is specifically configured to: generate a second matrix based on the first values of the plurality of KPIs, wherein the second matrix comprises second values of the plurality of KPIs at N time points in each of M acquisition windows before the current acquisition window; input the second matrix into the first neural network model to obtain predicted values of the plurality of KPIs at the N time points; and determine the differences between the predicted values of the plurality of KPIs and the first values of the plurality of KPIs at the N time points to generate the first matrix; M is an integer greater than or equal to 1.
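The data flow of this step can be sketched as follows. This is only an illustration under stated assumptions: the stand-in predictor `mean_of_windows` simply averages the M earlier acquisition windows and is not the first neural network model, and the sign convention (predicted value minus observed value) is assumed because the text only speaks of a difference. All function names are hypothetical.

```python
def mean_of_windows(history):
    """Stand-in predictor (NOT a neural network): average over the M windows.

    history[w][t][k] is the value of KPI k at time point t in earlier window w.
    Returns an N x K list of predicted values.
    """
    m = len(history)
    n_points = len(history[0])
    n_kpis = len(history[0][0])
    return [
        [sum(history[w][t][k] for w in range(m)) / m for k in range(n_kpis)]
        for t in range(n_points)
    ]


def first_matrix(history, current, predict=mean_of_windows):
    """Return the N x K matrix of differences between predicted and observed values."""
    predicted = predict(history)
    # Assumption: difference = predicted - observed.
    return [
        [p - c for p, c in zip(pred_row, cur_row)]
        for pred_row, cur_row in zip(predicted, current)
    ]


# M = 2 earlier windows, N = 2 time points, 2 KPIs.
history = [
    [[10.0, 1.0], [12.0, 1.0]],
    [[14.0, 3.0], [12.0, 1.0]],
]
current = [[12.0, 2.0], [20.0, 1.0]]
diffs = first_matrix(history, current)
```

Any callable that maps the M-window input to an N x K prediction, such as a trained model's inference function, could be passed as `predict` in place of the averaging stand-in.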

For example, when determining the abnormal results corresponding to the N time points according to the first matrix, the processor 2601 is specifically configured to: determining a KPI comprehensive abnormal value corresponding to each time point based on the difference values corresponding to a plurality of KPIs at each time point in the first matrix; determining whether the first service at each time point is abnormal or not by using the KPI comprehensive abnormal value corresponding to each time point and a first threshold; determining that the first service is abnormal at a time point when the KPI comprehensive abnormal value at the time point is larger than the first threshold value; determining that the first service is not abnormal at a time point when the KPI comprehensive abnormal value at the time point is less than or equal to the first threshold value; wherein the first threshold is the maximum of the second threshold and the third threshold; the second threshold value is obtained based on the non-abnormal historical values of the plurality of KPIs of the first service; the third threshold is determined based on the first values of the KPIs at the N time points and a preset abnormal percentage.
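The threshold comparison described here can be sketched as below. How the KPI comprehensive abnormal value is aggregated from the per-KPI differences is not specified in this passage, so the sum of absolute differences used in `composite_values` is an assumption; the function names are hypothetical.

```python
def composite_values(first_matrix):
    """One KPI comprehensive abnormal value per time point.

    Assumption: the value is the sum of absolute per-KPI differences.
    """
    return [sum(abs(d) for d in row) for row in first_matrix]


def detect(first_matrix, second_threshold, third_threshold):
    """Return one boolean per time point: True if the first service is abnormal."""
    # The first threshold is the maximum of the second and third thresholds.
    first_threshold = max(second_threshold, third_threshold)
    # Strictly greater than the threshold means abnormal; less than or equal means not abnormal.
    return [v > first_threshold for v in composite_values(first_matrix)]


flags = detect([[0.5, 0.5], [3.0, -2.0], [1.0, 1.0]],
               second_threshold=2.0, third_threshold=1.5)
```

Taking the maximum of the two thresholds means a time point is flagged only if it exceeds both the history-based limit and the percentage-based limit of the current window.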

Specifically, when determining the third threshold based on the first values of the plurality of KPIs at the N time points and a preset abnormal percentage, the processor 2601 is specifically configured to: sort the N KPI comprehensive abnormal values corresponding to the N time points from large to small to obtain sorted KPI comprehensive abnormal values, wherein the KPI comprehensive abnormal value corresponding to each time point is determined based on the difference values corresponding to the plurality of KPIs at that time point in the first matrix obtained from the first values of the plurality of KPIs at the N time points; determine, according to the preset abnormal percentage, a target KPI comprehensive abnormal value corresponding to the abnormal percentage among the sorted KPI comprehensive abnormal values; and take the determined target KPI comprehensive abnormal value as the third threshold.
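A minimal sketch of this percentage-based threshold follows. The mapping from the preset abnormal percentage to a position in the sorted list is an assumption (here the index is the floor of N times the percentage divided by 100, capped at the last element), as is the function name `third_threshold`.

```python
import math


def third_threshold(composite, abnormal_percent):
    """Pick the target KPI comprehensive abnormal value from the sorted values.

    composite: N KPI comprehensive abnormal values, one per time point.
    abnormal_percent: preset abnormal percentage, for example 20 for 20 percent.
    """
    ranked = sorted(composite, reverse=True)  # large to small
    # Assumption: with a strict "greater than" comparison, choosing this index
    # leaves about abnormal_percent of the time points above the threshold.
    index = min(len(ranked) - 1, math.floor(len(ranked) * abnormal_percent / 100))
    return ranked[index]


values = [5, 1, 9, 3, 7, 2, 8, 4, 6, 10]
threshold = third_threshold(values, 20)
```

With ten values and a percentage of 20, the third largest value becomes the threshold, so only the two largest values exceed it.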

In an optional implementation, when determining the abnormal type of the first service according to the abnormality degrees of the plurality of KPIs at each abnormal time point, the processor 2601 is specifically configured to: sort the abnormality degrees of the plurality of KPIs at each abnormal time point from high to low to obtain the sorted abnormality degrees of the plurality of KPIs at each abnormal time point; and take the abnormal types corresponding to the abnormality degrees of the first H KPIs at each abnormal time point as the abnormal type of the first service at that abnormal time point; H is an integer greater than or equal to 1.
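The ranking step can be sketched as follows. The KPI names, the KPI-to-type mapping, and the function `top_h_abnormal_types` are hypothetical; the passage does not say how an abnormal type is associated with a KPI, so a simple lookup table is assumed.

```python
def top_h_abnormal_types(degrees, kpi_types, h=1):
    """Return the abnormal types of the H KPIs with the highest abnormality degree.

    degrees: KPI name -> abnormality degree (percent) at one abnormal time point.
    kpi_types: KPI name -> abnormal type label (assumed lookup table).
    """
    ranked = sorted(degrees.items(), key=lambda item: item[1], reverse=True)
    return [kpi_types[name] for name, _ in ranked[:h]]


# Hypothetical values for one abnormal time point.
degrees = {
    "paging_success_rate": 20.0,
    "attach_success_rate": 70.0,
    "handover_success_rate": 10.0,
}
kpi_types = {
    "attach_success_rate": "attach failure",
    "paging_success_rate": "paging failure",
    "handover_success_rate": "handover failure",
}
types = top_h_abnormal_types(degrees, kpi_types, h=2)
```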

Based on the foregoing embodiments, the present application further provides a computer-readable storage medium, where the computer-readable storage medium is used to store a computer program, and when the computer program is executed by a computer, the computer may implement any one of the anomaly detection methods provided by the foregoing method embodiments.

Embodiments of the present application further provide a computer program product, where the computer program product is used to store a computer program, and when the computer program is executed by a computer, the computer may implement any one of the anomaly detection methods provided in the foregoing method embodiments.

The embodiment of the present application further provides a chip, which includes a processor and a communication interface, where the processor is coupled with a memory, and is configured to call a program in the memory, so that the chip implements any one of the abnormality detection methods provided in the above method embodiments.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims

1. An anomaly detection method, applied to a network change scenario, comprising:

determining a first matrix from first values of a plurality of key performance indicators (KPIs) of a first service and a first neural network model, the first matrix comprising differences between predicted values of the plurality of KPIs and the first values of the plurality of KPIs at N time points; the predicted values of the plurality of KPIs are obtained based on the first neural network model; the first neural network model is determined based on historical values of the plurality of KPIs of the first service; the first service is any one of a plurality of services; N is an integer greater than or equal to 1;

determining abnormal results corresponding to the N time points according to the first matrix, wherein the abnormal result corresponding to any time point is whether the first service at any time point is abnormal or not;

2. The method of claim 1, wherein prior to determining the first matrix based on the first values for the plurality of KPIs for the first service and the first neural network model, the method further comprises:

classifying the KPIs according to services to obtain KPIs corresponding to the services respectively;

and selecting a KPI corresponding to any service from KPIs corresponding to a plurality of services respectively as a plurality of KPIs of the first service.

3. A method according to claim 1 or 2, wherein determining a first matrix from first values of a plurality of KPIs for a first service and a first neural network model comprises:

generating a second matrix based on the first values of the KPIs, wherein the second matrix comprises second values of the KPIs at N time points in each of M acquisition windows before the current acquisition window; M is an integer greater than or equal to 1;

inputting the second matrix into the first neural network model to obtain predicted values of the KPIs at N time points;

and determining the difference value of the predicted values of the KPIs and the first value of the KPIs at N time points, and generating the first matrix.

4. The method according to any one of claims 1-3, wherein determining the abnormal result corresponding to each of the N time points according to the first matrix comprises:

determining a KPI comprehensive abnormal value corresponding to each time point based on the difference value corresponding to the plurality of KPIs at each time point in the first matrix;

determining whether the first service at each time point is abnormal by using the KPI comprehensive abnormal value corresponding to each time point and a first threshold; wherein the first threshold is the maximum of the second threshold and the third threshold; the second threshold is obtained based on the non-abnormal historical values of the KPIs of the first service; the third threshold is determined based on the first values of the KPIs at the N time points and a preset abnormal percentage;

determining that the first service is abnormal at a time point when the KPI comprehensive abnormal value at the time point is larger than the first threshold value;

and when the KPI comprehensive abnormal value at a time point is less than or equal to the first threshold value, determining that the first service is not abnormal at the time point.

5. The method of claim 4, wherein the third threshold is determined based on the first values of the KPIs at the N time points and a preset anomaly percentage, and comprises:

sorting the N KPI comprehensive abnormal values corresponding to the N time points from large to small to obtain sorted KPI comprehensive abnormal values; wherein the KPI comprehensive abnormal value corresponding to each time point is determined based on the difference values corresponding to the plurality of KPIs at each time point in the first matrix obtained from the first values of the plurality of KPIs at the N time points;

according to the preset abnormal percentage, determining a target KPI comprehensive abnormal value corresponding to the abnormal percentage in the sorted KPI comprehensive abnormal values;

and taking the determined target KPI comprehensive abnormal value as the third threshold value.

6. The method according to any one of claims 1-5, wherein determining the anomaly type of the first service based on the anomaly degrees of the plurality of KPIs at each of the abnormal time points comprises:

sorting the abnormal degrees of the KPIs at each abnormal time point from high to low to obtain the abnormal degrees of the sorted KPIs at each time point;

taking the abnormal type corresponding to the abnormal degree of the first H KPIs at each abnormal time point as the abnormal type of the first service at each abnormal time point; H is an integer greater than or equal to 1.

7. An anomaly detection device, applied to a network change scenario, comprising:

a first processing unit, configured to determine a first matrix according to first values of a plurality of key performance indicators (KPIs) of a first service and a first neural network model, where the first matrix includes differences between predicted values of the plurality of KPIs and the first values of the plurality of KPIs at N time points; the predicted values of the plurality of KPIs are obtained based on the first neural network model; the first neural network model is determined based on historical values of the plurality of KPIs of the first service; the first service is any one of a plurality of services; N is an integer greater than or equal to 1;

a second processing unit, configured to determine, according to the first matrix, abnormal results corresponding to the N time points, where an abnormal result corresponding to any time point is whether the first service at any time point is abnormal;

a third processing unit, configured to determine an abnormality degree of the plurality of KPIs at each abnormal time point according to the abnormality result and the first matrix, where the abnormality degree of any KPI at each abnormal time point is a percentage of a difference value corresponding to the any KPI to a sum value of the difference values corresponding to the plurality of KPIs; and

8. The apparatus of claim 7, further comprising:

the fourth processing unit is used for classifying the KPIs according to the services to obtain KPIs corresponding to the services respectively before the first processing unit determines the first matrix according to the first values of the KPIs of the first service and the first neural network model; and selecting the KPI corresponding to any service from the KPIs corresponding to the services as the KPIs of the first service.

9. The apparatus according to claim 7 or 8, wherein the first processing unit, when determining the first matrix based on the first values of the plurality of KPIs for the first service and the first neural network model, is specifically configured to:

10. The apparatus according to any one of claims 7 to 9, wherein the second processing unit, when determining the abnormal results corresponding to the N time points respectively according to the first matrix, is specifically configured to:

determining whether the first service at each time point is abnormal or not by using the KPI comprehensive abnormal value corresponding to each time point and a first threshold; wherein the first threshold is the maximum of the second threshold and the third threshold; the second threshold value is obtained based on the non-abnormal historical values of the plurality of KPIs of the first service; the third threshold value is determined based on the first values of the KPIs at the N time points and a preset abnormal percentage;

11. The apparatus as claimed in claim 10, wherein the second processing unit, when determining the third threshold based on the first values of the KPIs at the N time points and a preset anomaly percentage, is specifically configured to:

sorting the N KPI comprehensive abnormal values corresponding to the N time points from large to small to obtain sorted KPI comprehensive abnormal values; wherein the comprehensive abnormal value corresponding to each time point is determined based on the difference value corresponding to the plurality of KPIs at each time point in the first matrix obtained by the first values of the plurality of KPIs at the N time points;

12. The apparatus according to any of claims 7-11, wherein the third processing unit, when determining the abnormality type of the first service based on the abnormality degrees of the plurality of KPIs at each of the abnormal time points, is specifically configured to:

sorting the abnormality degrees of the KPIs at each abnormal time point from high to low to obtain the abnormality degrees of the sorted KPIs at each time point;

taking the abnormal type corresponding to the abnormal degree of the first H KPIs at each abnormal time point as the abnormal type of the first service at each abnormal time point; H is an integer greater than or equal to 1.

13. An anomaly detection device, applied to a network change scenario, comprising:

a memory for storing program instructions;

a processor, coupled to the memory, for invoking program instructions in the memory and performing the following operations:

determining a first matrix from first values of a plurality of key performance indicators (KPIs) of a first service and a first neural network model, the first matrix comprising differences between predicted values of the plurality of KPIs and the first values of the plurality of KPIs at N time points; the predicted values of the plurality of KPIs are obtained based on the first neural network model; the first neural network model is determined based on historical values of the plurality of KPIs of the first service; the first service is any one of a plurality of services; N is an integer greater than or equal to 1;

14. The apparatus of claim 13, wherein the processor, prior to determining the first matrix from the first values of the plurality of KPIs for the first service and the first neural network model, is further configured to:

15. The apparatus of claim 13 or 14, wherein the processor, when determining the first matrix based on the first values of the plurality of KPIs for the first service and the first neural network model, is specifically configured to:

16. The apparatus according to any one of claims 13 to 15, wherein the processor, when determining the abnormal results corresponding to the N time points respectively according to the first matrix, is specifically configured to:

determining that the first service is abnormal at a time point when the KPI comprehensive abnormal value at the time point is greater than the first threshold value;

17. The apparatus as claimed in claim 16, wherein said processor, when determining said third threshold value based on the first values of the KPIs at the N time points and a preset abnormality percentage, is specifically configured to:

18. An apparatus according to any one of claims 13-17, wherein the processor, when determining the abnormality type for the first service based on the abnormality degrees for the plurality of KPIs at each of the abnormal time points, is specifically configured to:

19. A computer-readable storage medium, in which a computer program is stored which, when executed by a computer, causes the computer to carry out the method according to any one of claims 1 to 6.