CN116560963A

CN116560963A - Abnormality detection method, apparatus, device, and medium

Info

Publication number: CN116560963A
Application number: CN202310561551.4A
Authority: CN
Inventors: 彭奕铮; 吴利华; 张泳; 刘伟
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2023-05-18
Filing date: 2023-05-18
Publication date: 2023-08-08

Abstract

The disclosure provides an anomaly detection method and relates to the field of artificial intelligence. The method comprises: acquiring operation data of a continuous delivery system, wherein the operation data comprises N kinds of sub-operation data, each kind of sub-operation data differs in data type from at least one other kind, and N is greater than or equal to 2; inputting the N kinds of sub-operation data into N branch networks in one-to-one correspondence to obtain N first feature vectors output by the N branch networks, wherein the N branch networks are obtained based on a machine learning algorithm; fusing the N first feature vectors to obtain a second feature vector; and classifying the second feature vector to obtain an anomaly detection result. The present disclosure also provides an anomaly detection apparatus, device, storage medium, and program product.

Description

Abnormality detection method, apparatus, device, and medium

Technical Field

The present disclosure relates to the field of artificial intelligence, and more particularly, to an anomaly detection method, apparatus, device, medium, and program product.

Background

A DevOps continuous delivery system offers many advantages in software development and operations engineering: it lets teams develop faster and with greater agility, gives them the ability to respond quickly to changing requirements, and frees team members to work on higher-value tasks.

However, in production, because the system is large and the volume of log data is high, operation and maintenance personnel can hardly locate and diagnose anomalies in the DevOps system promptly, and manual investigation takes time.

Disclosure of Invention

In view of the foregoing, the present disclosure provides an abnormality detection method, apparatus, device, medium, and program product.

In one aspect of the embodiments of the present disclosure, there is provided an anomaly detection method, including: acquiring operation data of a continuous delivery system, wherein the operation data comprises N kinds of sub-operation data, each kind of sub-operation data differs in data type from at least one other kind, and N is greater than or equal to 2; inputting the N kinds of sub-operation data into N branch networks in one-to-one correspondence to obtain N first feature vectors output by the N branch networks, wherein the N branch networks are obtained based on a machine learning algorithm; fusing the N first feature vectors to obtain a second feature vector; and classifying the second feature vector to obtain an anomaly detection result.

In some embodiments, inputting the N kinds of sub-operation data into the N branch networks in one-to-one correspondence includes: determining M anomaly indexes of the continuous delivery system, where M is greater than or equal to 1; determining, in the N kinds of sub-operation data, the data associated with each anomaly index; and, for each anomaly index, inputting the data associated with that anomaly index in the N kinds of sub-operation data into the N branch networks in one-to-one correspondence.

In some embodiments, inputting, for each anomaly index, the data associated with that anomaly index in the N kinds of sub-operation data into the N branch networks in one-to-one correspondence includes: determining a relationship between at least two of the M anomaly indexes, wherein the relationship comprises at least one of a causal relationship, a temporal relationship, or a category relationship; determining a detection order of at least one of the M anomaly indexes according to the relationship; and, for each anomaly index, determining the input order of the data associated with that anomaly index based on the detection order.

In some embodiments, inputting, for each anomaly index, the data associated with that anomaly index in the N kinds of sub-operation data into the N branch networks in one-to-one correspondence includes: for at least two anomaly indexes having the relationship, simultaneously inputting the data associated with those indexes in the N kinds of sub-operation data into the N branch networks in one-to-one correspondence.

In some embodiments, inputting the N kinds of sub-operation data into the N branch networks in one-to-one correspondence includes: dividing each kind of sub-operation data according to a preset time period; and inputting the data of the N kinds of sub-operation data that falls within the same preset time period into the N branch networks in one-to-one correspondence.
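The time-period alignment can be sketched as follows. This is a minimal illustration, not the disclosure's implementation: it assumes each kind of sub-operation data is a list of (timestamp_in_seconds, value) records, and the window length and function names are hypothetical. Only windows covered by every kind of data are kept, so each branch network receives data from the same preset time period.

```python
from collections import defaultdict


def bucket_by_window(records, window_s):
    """Group (timestamp, value) records into fixed windows keyed by window start."""
    buckets = defaultdict(list)
    for ts, value in records:
        start = int(ts // window_s) * window_s
        buckets[start].append(value)
    return dict(buckets)


def align_windows(modalities, window_s):
    """Return {window_start: {kind: [values]}} for windows present in every kind.

    modalities: dict mapping a kind of sub-operation data (e.g. "log") to its
    records. The record layout and window length are illustrative assumptions.
    """
    per_kind = {name: bucket_by_window(recs, window_s) for name, recs in modalities.items()}
    common = set.intersection(*(set(b) for b in per_kind.values()))
    return {
        start: {name: per_kind[name][start] for name in modalities}
        for start in sorted(common)
    }


# Hypothetical example: two kinds of data, 60-second windows.
data = {
    "log": [(5, "ERROR timeout"), (65, "INFO ok")],
    "metric": [(10, 0.93), (70, 0.41), (130, 0.40)],
}
aligned = align_windows(data, window_s=60)
# The 120-second window holds only metric data, so it is dropped.
```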

In some embodiments, the N kinds of sub-operation data include image data, the N branch networks include a first branch network, and the N first feature vectors include an image feature vector; inputting the N kinds of sub-operation data into the N branch networks in one-to-one correspondence includes inputting the image data into the first branch network, which specifically includes: inputting the image data into an image feature extraction model of the first branch network to obtain intermediate image features; and inputting the intermediate image features into a convolutional neural network model of the first branch network to obtain the image feature vector.

In some embodiments, the N kinds of sub-operation data include log data, the N branch networks include a second branch network, and the N first feature vectors include a log feature vector; inputting the N kinds of sub-operation data into the N branch networks in one-to-one correspondence includes inputting the log data into the second branch network, which specifically includes: inputting the log data into a bidirectional long short-term memory (Bi-LSTM) neural network in the second branch network, the Bi-LSTM being configured to process the log data based on an attention mechanism to obtain intermediate log features; and inputting the intermediate log features into a convolutional neural network model of the second branch network to obtain the log feature vector.

In some embodiments, fusing the N first feature vectors to obtain a second feature vector includes: concatenating the image feature vector and the log feature vector to obtain the second feature vector; and classifying the second feature vector to obtain an anomaly detection result includes: inputting the second feature vector into a classification model to obtain the anomaly detection result output by the classification model.

In some embodiments, an anomaly detection model includes the N branch networks and the classification model, and the anomaly detection model is pre-trained by: obtaining historical log data from historical operation data of the continuous delivery system, wherein the historical log data comprises known log data and unknown log data, and the known log data carries normal labels or abnormal labels; clustering the unknown log data based on the known log data to obtain a log training set; and training the anomaly detection model based on the log training set and the training sets of the remaining kinds of sub-operation data.
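The disclosure does not name the clustering algorithm used to label the unknown log data. One simple reading, sketched below purely as an assumption, is nearest-centroid pseudo-labeling: the labeled (known) log vectors define one centroid per label, and each unlabeled vector takes the label of the nearest centroid if it is close enough. The vector representation, the distance cutoff, and the function names are hypothetical.

```python
import math


def centroid(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def pseudo_label(known, unknown, max_dist=None):
    """Build a log training set by labeling unknown vectors from known ones.

    known: list of (vector, label) pairs, label in {"normal", "abnormal"}.
    unknown: list of unlabeled vectors.
    max_dist: optional cutoff (an assumed parameter); vectors farther than this
    from every centroid are dropped from the training set instead of labeled.
    """
    by_label = {}
    for vec, label in known:
        by_label.setdefault(label, []).append(vec)
    centroids = {label: centroid(vecs) for label, vecs in by_label.items()}

    training_set = list(known)
    for vec in unknown:
        # Nearest centroid by Euclidean distance.
        label, dist = min(
            ((lab, math.dist(vec, c)) for lab, c in centroids.items()),
            key=lambda pair: pair[1],
        )
        if max_dist is None or dist <= max_dist:
            training_set.append((vec, label))
    return training_set


# Hypothetical 2-D log vectors.
known = [([0.0, 0.1], "normal"), ([0.1, 0.0], "normal"), ([0.9, 1.0], "abnormal")]
unknown = [[0.05, 0.05], [1.0, 0.9], [5.0, 5.0]]
train = pseudo_label(known, unknown, max_dist=1.0)
# [5.0, 5.0] is far from both centroids and is left out.
```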

In some embodiments, the N kinds of sub-operation data include at least two of log data, image data, time series data, code data, structured data, audio data, video data, or metadata.

Another aspect of the embodiments of the present disclosure provides an anomaly detection apparatus, including: a data acquisition module configured to acquire operation data of a continuous delivery system, wherein the operation data comprises N kinds of sub-operation data, each kind of sub-operation data differs in data type from at least one other kind, and N is greater than or equal to 2; a data processing module configured to input the N kinds of sub-operation data into N branch networks in one-to-one correspondence to obtain N first feature vectors output by the N branch networks, wherein the N branch networks are obtained based on a machine learning algorithm; a feature fusion module configured to fuse the N first feature vectors to obtain a second feature vector; and a feature classification module configured to classify the second feature vector to obtain an anomaly detection result.

The apparatus comprises modules for respectively performing the steps of the method according to any one of the above embodiments.

Another aspect of an embodiment of the present disclosure provides an electronic device, including: one or more processors; and a storage means for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method as described above.

Another aspect of the disclosed embodiments also provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the method as described above.

Another aspect of the disclosed embodiments also provides a computer program product comprising a computer program which, when executed by a processor, implements a method as described above.

One or more of the above embodiments have the following advantages. To address slow anomaly detection in actual production operation and to make monitoring more sensitive to operating anomalies, the operation data of the continuous delivery system is examined by combining data of multiple modalities: the N first feature vectors of the multimodal data are fused into a second feature vector, and the second feature vector is classified to obtain an anomaly detection result. This focuses on the effective information in the operation data, detects abnormal data more accurately, and speeds up the detection of abnormal operation of the DevOps system.

Drawings

The foregoing and other objects, features and advantages of the disclosure will be more apparent from the following description of embodiments of the disclosure with reference to the accompanying drawings, in which:

FIG. 1 schematically illustrates an application scenario diagram of an anomaly detection method according to an embodiment of the present disclosure;

FIG. 2 schematically illustrates a flow chart of an anomaly detection method according to an embodiment of the present disclosure;

FIG. 3 schematically illustrates an inference architecture diagram of an anomaly detection model in accordance with an embodiment of the present disclosure;

FIG. 4 schematically illustrates a flow chart of obtaining an image feature vector according to an embodiment of the disclosure;

FIG. 5 schematically illustrates a flow chart of obtaining a log feature vector according to an embodiment of the disclosure;

FIG. 6 schematically illustrates a flow chart of inputting sub-operational data according to an embodiment of the present disclosure;

FIG. 7 schematically illustrates a flow chart of determining an input order according to an embodiment of the disclosure;

FIG. 8 schematically illustrates a training architecture diagram of an anomaly detection model in accordance with an embodiment of the present disclosure;

FIG. 9 schematically illustrates a flow chart of a training anomaly detection model in accordance with an embodiment of the present disclosure;

FIG. 10 schematically shows a block diagram of the configuration of an anomaly detection apparatus according to an embodiment of the present disclosure; and

FIG. 11 schematically illustrates a block diagram of an electronic device adapted to implement an anomaly detection method according to an embodiment of the present disclosure.

Detailed Description

Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.

All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.

Where an expression such as "at least one of A, B, and C, etc." is used, it should generally be interpreted in accordance with the meaning commonly understood by those skilled in the art (e.g., "a system having at least one of A, B, and C" shall include, but not be limited to, a system having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.).

Fig. 1 schematically illustrates an application scenario diagram of an anomaly detection method according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied to assist those skilled in the art in understanding the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios.

As shown in fig. 1, an application scenario 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.

The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages and the like. Various communication client applications, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, and social platform software (by way of example only), may be installed on the terminal devices 101, 102, 103.

The terminal devices 101, 102, 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.

The server 105 may be a server providing various services, such as a background management server (by way of example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.

The server 105 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud computing, network service, and middleware service.

The anomaly detection method provided by the embodiments of the present disclosure may generally be performed by the server 105. Accordingly, the anomaly detection apparatus provided by the embodiments of the present disclosure may generally be provided in the server 105. In some embodiments, the continuous delivery system may be deployed in the server 105. In other embodiments, the continuous delivery system may be deployed in a server other than the server 105; in that case, the server 105 may obtain the operation data by communicating with the continuous delivery system, or a user may transmit the operation data to the server 105 via the terminal devices 101, 102, 103, for example by local upload or network download.

It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

Based on the scenario described in FIG. 1, the anomaly detection method of the embodiments of the present disclosure is described in detail below with reference to FIGS. 2 to 9.

Fig. 2 schematically illustrates a flowchart of an anomaly detection method according to an embodiment of the present disclosure. Fig. 3 schematically illustrates an inference architecture diagram of an anomaly detection model according to an embodiment of the present disclosure. The anomaly detection model includes N branch networks and a classification model.

As shown in fig. 2, the abnormality detection method of this embodiment includes:

In operation S210, operation data of the continuous delivery system is acquired, wherein the operation data includes N kinds of sub-operation data, each kind of sub-operation data differs in data type from at least one other kind, and N is greater than or equal to 2.

DevOps, a combination of Development and Operations whose practices include continuous delivery, is a culture, practice, or convention that emphasizes communication and collaboration between software developers (Dev) and IT operation and maintenance staff (Ops). By automating software delivery and architecture changes, it allows software to be built, tested, and released more quickly, frequently, and reliably.

In some embodiments, the N kinds of sub-operation data include at least two of log data, image data, time series data, code data, structured data, audio data, video data, or metadata. Metadata, also called intermediary data or relay data, is data that describes other data (data about data). It mainly describes attribute (property) information of data and supports functions such as indicating storage locations, recording historical data, searching resources, and keeping file records.

DevOps involves runtime data of many formats and types generated at the application and infrastructure levels; analyzing this data can reveal problems and show where improvements are possible. In DevOps, log data includes application and operating system logs. Image data may be a visual depiction of data, such as a scatter plot characterizing application performance. Time series data includes application and system metrics that change over time. Code data includes source code in a code repository and its historical versions. Structured data includes databases, configuration files, and the like. Audio data includes abnormal sounds that may come from hardware. Video data includes video recorded by monitoring cameras. Metadata includes data that describes and identifies other data, such as file names and creation times.

In operation S220, the N kinds of sub-operation data are input into the N branch networks in one-to-one correspondence to obtain N first feature vectors output by the N branch networks, where the N branch networks are obtained based on a machine learning algorithm. The N branch networks correspond one-to-one to the data types of the respective modalities.

Referring to FIG. 3, each branch network includes one or more models based on machine learning algorithms that perform feature extraction and classification on the input data, to improve the accuracy of anomaly detection.
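The flow of operations S220 to S240 can be sketched with placeholder callables standing in for the trained branch networks, the fusion step, and the classification model. This is a minimal illustration under stated assumptions: the toy branch functions, the concatenation fusion, and the threshold classifier below are hypothetical, and the sketch only shows how the one-to-one correspondence between data types and branches is enforced.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def detect(
    sub_data: Dict[str, object],
    branches: Dict[str, Callable[[object], Vector]],
    fuse: Callable[[Sequence[Vector]], Vector],
    classify: Callable[[Vector], str],
) -> str:
    """Run S220-S240 on already-acquired sub-operation data (S210).

    Each data type must map to exactly one branch (one-to-one), and at least
    two data types are required (N >= 2).
    """
    if set(sub_data) != set(branches):
        raise ValueError("each kind of sub-operation data needs exactly one branch")
    if len(sub_data) < 2:
        raise ValueError("N must be at least 2")
    order = sorted(sub_data)  # fixed order keeps the fused vector layout stable
    first_vectors = [branches[kind](sub_data[kind]) for kind in order]  # S220
    second_vector = fuse(first_vectors)  # S230
    return classify(second_vector)  # S240


# Hypothetical stand-ins for trained models.
branches = {
    "log": lambda lines: [float(sum("ERROR" in line for line in lines))],
    "metric": lambda values: [max(values)],
}
result = detect(
    {"log": ["INFO start", "ERROR timeout"], "metric": [0.42, 0.97]},
    branches,
    fuse=lambda vecs: [x for v in vecs for x in v],  # concatenation
    classify=lambda v: "abnormal" if v[0] > 0 or v[1] > 0.9 else "normal",
)
```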

In operation S230, N first feature vectors are fused to obtain a second feature vector.

In some embodiments, a neural network model may be used for fusion. For example, the N first feature vectors are input into a multi-layer perceptron (MLP); training the MLP yields the connection weights between its nodes, after which the MLP fuses the multiple feature vectors and outputs the second feature vector.
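The forward pass of such an MLP fusion step can be sketched as below. The disclosure only states that an MLP is used; the two-layer structure, the ReLU activation, and the hand-picked weights are assumptions for illustration (in practice the weights come from training).

```python
from typing import List, Sequence


def mlp_fuse(first_vectors: Sequence[List[float]],
             w1: List[List[float]], b1: List[float],
             w2: List[List[float]], b2: List[float]) -> List[float]:
    """Fuse N first feature vectors with an assumed two-layer perceptron.

    The N vectors are concatenated into one input; w1/b1 map it to a ReLU
    hidden layer, and w2/b2 map the hidden layer to the second feature vector.
    """
    x = [value for vec in first_vectors for value in vec]  # concatenated input
    if any(len(row) != len(x) for row in w1):
        raise ValueError("w1 rows must match the total input dimension")
    # Hidden layer: affine transform followed by ReLU.
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    # Output layer: affine transform producing the second feature vector.
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w2, b2)]


image_vec = [1.0, 0.0]  # hypothetical image feature vector
log_vec = [0.5]         # hypothetical log feature vector
second = mlp_fuse(
    [image_vec, log_vec],
    w1=[[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]], b1=[0.0, 0.0],  # hypothetical weights
    w2=[[1.0, 1.0], [2.0, 0.0]], b2=[0.0, 0.5],
)
# second == [1.5, 3.5]
```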

In other embodiments, the different feature vectors may be normalized or standardized and then averaged to obtain the second feature vector.

In still other embodiments, the image feature vector and the log feature vector are concatenated to obtain the second feature vector.
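The two non-learned fusion options can be contrasted in a few lines. This sketch assumes L2 normalization for the "normalize then average" option (the disclosure does not specify which normalization); averaging requires all first feature vectors to share one dimension, while concatenation accepts vectors of different lengths.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def fuse_by_average(first_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Normalize each vector (assumed: L2), then take the element-wise mean."""
    if len({len(v) for v in first_vectors}) != 1:
        raise ValueError("averaging requires vectors of equal dimension")
    normed = [l2_normalize(v) for v in first_vectors]
    n = len(normed)
    return [sum(col) / n for col in zip(*normed)]


def fuse_by_concat(first_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate vectors end to end; dimensions may differ."""
    return [x for vec in first_vectors for x in vec]


image_vec = [3.0, 4.0]  # hypothetical values
log_vec = [0.0, 2.0]
averaged = fuse_by_average([image_vec, log_vec])   # approximately [0.3, 0.9]
concatenated = fuse_by_concat([image_vec, log_vec])  # [3.0, 4.0, 0.0, 2.0]
```

Normalizing before averaging keeps one modality with large raw magnitudes from dominating the fused vector; concatenation preserves all components and leaves the weighting to the downstream classifier.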

In operation S240, the second feature vector is classified to obtain an abnormality detection result.

A first feature vector is the vector output by a branch network that represents the features of its input data. The second feature vector is obtained by fusing the multiple first feature vectors and is used for the final anomaly classification.

In some embodiments, the second feature vector is input into the classification model to obtain an anomaly detection result output by the classification model. The classification model may include models constructed by algorithms such as logistic regression, decision trees, SVMs, neural networks, etc.
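Logistic regression, one of the classifier options listed above, can be sketched as a weighted sum passed through a sigmoid and compared with a threshold. The weights, bias, and 0.5 threshold here are hypothetical; a trained model would supply them.

```python
import math
from typing import Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_classify(second_vector: Sequence[float], weights: Sequence[float],
                      bias: float, threshold: float = 0.5) -> Tuple[str, float]:
    """Return (label, probability of "abnormal") for one second feature vector."""
    if len(weights) != len(second_vector):
        raise ValueError("weights and feature vector must have equal length")
    z = sum(w * x for w, x in zip(weights, second_vector)) + bias
    p = sigmoid(z)
    return ("abnormal" if p >= threshold else "normal"), p


# Hypothetical weights: z = 1.2 - 0.7 + 0.1 = 0.6, so p is about 0.646.
label, prob = logistic_classify([1.5, 3.5], weights=[0.8, -0.2], bias=0.1)
```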

According to the embodiments of the present disclosure, to address slow anomaly detection in actual production operation and to make monitoring more sensitive to operating anomalies, the operation data of the continuous delivery system is examined by combining data of multiple modalities: the N first feature vectors of the multimodal data are fused into a second feature vector, and the second feature vector is classified to obtain an anomaly detection result. This focuses on the effective information in the operation data, detects abnormal data more accurately, and speeds up the detection of abnormal operation of the DevOps system.

Fig. 4 schematically illustrates a flowchart of obtaining an image feature vector according to an embodiment of the present disclosure.

As shown in FIG. 4, the N kinds of sub-operation data include image data, the N branch networks include a first branch network, and the N first feature vectors include an image feature vector. This embodiment is one implementation of operation S220 and includes:

In operation S410, the image data is input into the image feature extraction model of the first branch network to obtain intermediate image features.

In operation S420, the intermediate image feature is input to a convolutional neural network model of the first branch network, and an image feature vector is obtained.

Referring to FIG. 3, intermediate image features are extracted from the image data by RepLKNet (the image feature extraction model), and the image feature vector is then obtained through an FCN (the convolutional neural network model). An FCN (Fully Convolutional Network) is a convolutional neural network originally designed for image segmentation.

The image data undergoes feature extraction through RepLKNet, which uses large convolution kernels and therefore has a larger receptive field. The receptive field is a central concept in segmentation and detection: like a person's field of view, it is the region of the input that a convolution kernel covers, and if it is small, the information received is one-sided and local. Enlarging the receptive field captures more global information, which helps in interpreting the image content. The size of the effective receptive field depends on the kernel size K and the model depth L and is proportional to K√L; that is, it grows linearly with K but only with the square root of L. The effective receptive field is therefore more sensitive to K, and increasing the depth does not enlarge it as effectively as increasing the kernel size; greater depth also introduces optimization difficulties. In addition, RepLKNet uses depthwise separable convolution to address the large parameter count and floating-point operation count of large kernels, so that a lighter network with large kernels can be used for feature extraction from the image data.

According to the embodiments of the present disclosure, using the image feature extraction model together with the convolutional network model captures local image features better and yields higher data processing accuracy. Features can be extracted from different portions of the image as needed, and the strengths of convolutional neural networks are fully exploited to obtain better image feature vectors.
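The two quantitative points above, the K√L scaling of the effective receptive field and the parameter savings of depthwise separable convolution, can be checked with a little arithmetic. This sketch uses the O(K√L) scaling reported for deep CNNs (constant factors ignored) and hypothetical sizes (K = 31, C = 64); it counts weights only and ignores biases, normalization layers, and RepLKNet's structural re-parameterization branches.

```python
import math


def relative_erf(kernel_size: int, depth: int) -> float:
    """Effective receptive field size up to a constant factor: K * sqrt(L)."""
    return kernel_size * math.sqrt(depth)


def conv_params(kernel_size: int, channels: int) -> int:
    """Weights of a standard KxK convolution with C input and C output channels."""
    return kernel_size * kernel_size * channels * channels


def depthwise_separable_params(kernel_size: int, channels: int) -> int:
    """KxK depthwise convolution (one filter per channel) plus a 1x1 pointwise conv."""
    return kernel_size * kernel_size * channels + channels * channels


# Depth helps only as sqrt(L): 100 stacked 3x3 layers score 30.0,
# while a single 31x31 layer already scores 31.0.
deep_small = relative_erf(3, 100)
shallow_large = relative_erf(31, 1)

# Hypothetical channel count C = 64 with a 31x31 kernel.
dense = conv_params(31, 64)                      # 3,936,256 weights
separable = depthwise_separable_params(31, 64)   # 65,600 weights
```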

Fig. 5 schematically illustrates a flowchart of obtaining a log feature vector according to an embodiment of the present disclosure. As shown in FIG. 5, the N kinds of sub-operation data include log data, the N branch networks include a second branch network, and the N first feature vectors include a log feature vector. This embodiment is one implementation of operation S220 and includes:

In operation S510, the log data is input into a bidirectional long short-term memory (Bi-LSTM) neural network in the second branch network; the Bi-LSTM is configured to process the log data based on an attention mechanism to obtain intermediate log features.

In some embodiments, before operation S510, the log text may be pre-processed by log parsing and word embedding. First, log parsing: for example, the UniParser log parser is used. UniParser is a unified log parser trained across multiple log sources; it captures common patterns of templates and parameters across heterogeneous log sources, can be applied directly to new log data sources, and parses well. Its token encoder module and context encoder module learn semantic patterns from the log tokens themselves and from their neighboring context, and its context similarity module focuses the model on the commonalities of the learned patterns, which is why it can be applied directly to new log data sources. Second, word embedding: a pre-trained word2vec model converts each word in the parsed log events into a multidimensional vector, extracting semantic information from each word in a log event.
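The two pre-processing steps can be illustrated with a deliberately simplified stand-in. The sketch below does not reproduce UniParser or word2vec: it masks variable fields with regular expressions to obtain a template (a common heuristic, assumed here for illustration), then averages word vectors from a tiny hypothetical embedding table in place of a pre-trained word2vec model.

```python
import re
from typing import Dict, List

# Illustrative masking rules (not UniParser's): variable fields become
# placeholders so lines from the same template collapse to one event.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+(?:\.\d+)?\b"), "<NUM>"),
]


def to_template(line: str) -> str:
    """Replace variable fields in a raw log line with placeholders."""
    for pattern, token in _MASKS:
        line = pattern.sub(token, line)
    return line


def embed_event(template: str, table: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of words found in the (hypothetical) embedding table."""
    words = [w.lower() for w in re.findall(r"<\w+>|[A-Za-z]+", template)]
    vecs = [table[w] for w in words if w in table]
    if not vecs:
        return [0.0] * dim  # no known words: fall back to a zero vector
    return [sum(col) / len(vecs) for col in zip(*vecs)]


template = to_template("Connection to 10.0.0.12 timed out after 3000 ms")
toy_table = {"connection": [1.0, 0.0], "timed": [0.0, 2.0]}  # hypothetical vectors
vector = embed_event(template, toy_table, dim=2)
# template == "Connection to <IP> timed out after <NUM> ms"
```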

In operation S520, the intermediate log feature is input to a convolutional neural network model of the second branch network, and a log feature vector is obtained.

Referring to FIG. 3, an attention-based Bi-LSTM network produces the intermediate log features, and an FCN then produces a representative log feature vector, which facilitates the subsequent fusion with the image feature vector.

According to the embodiments of the present disclosure, context information in the log can be captured, key features in the log can be extracted effectively, and representative feature vectors can be derived from them. These feature vectors describe critical information in the log, such as anomalies and errors. The attention mechanism and the fully convolutional network also handle irregularities and noise in the log sequence effectively, which makes the model robust and able to cope with a variety of complex log analysis scenarios.

By fusing the log feature vector with other types of feature vectors, such as the image feature vector, the state of the system can be described more comprehensively, which improves the accuracy of anomaly detection. Meanwhile, using the FCN for feature fusion helps avoid problems such as information loss and overfitting.
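The attention step that condenses the Bi-LSTM's per-event hidden states into one intermediate log feature can be sketched as dot-product attention pooling. The disclosure does not specify the attention form; this sketch assumes a learned query vector scored against each hidden state, with softmax weights, and takes the hidden states as given (the Bi-LSTM itself is not implemented here).

```python
import math
from typing import List, Sequence, Tuple


def attention_pool(hidden_states: Sequence[Sequence[float]],
                   query: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Softmax-weighted sum of hidden states (assumed dot-product attention).

    hidden_states: one vector per log event, e.g. concatenated forward and
    backward Bi-LSTM outputs. query: a context vector that would be learned.
    Returns (pooled_vector, attention_weights).
    """
    scores = [sum(q * h for q, h in zip(query, state)) for state in hidden_states]
    peak = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    pooled = [sum(w * state[i] for w, state in zip(weights, hidden_states))
              for i in range(dim)]
    return pooled, weights


states = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical hidden states for two events
uniform, _ = attention_pool(states, query=[0.0, 0.0])        # equal attention
focused, weights = attention_pool(states, query=[2.0, 0.0])  # favors event 0
```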

By fusing different types of features, the accuracy of anomaly detection can be improved. For example, if the system is abnormal, abnormal information may appear in the log, and abnormal visual features may also appear in the image. By fusing the characteristics, abnormal signals can be better captured, so that the accuracy of abnormality detection is improved.

In some embodiments, the image feature extraction model is not limited to RepLKNet, nor is the convolutional neural network model limited to FCN full convolutional networks.

Fig. 6 schematically illustrates a flow chart of inputting sub-operational data according to an embodiment of the present disclosure. As shown in fig. 6, this embodiment is one of the embodiments of operation S220, including:

In operation S610, M anomaly indexes of the continuous delivery system are determined, where M is greater than or equal to 1.

Examples of anomaly indexes include CPU utilization, memory usage, response time, the number of exception log entries, the number and cycle time of releasable software packages, mean time between failures, mean time to recovery, stability, throughput, deployment failure rate, post-release bug indicators (such as the number of reported bugs and the number of critical bugs, which measure online quality), and system availability and performance indicators (such as response time and throughput, which indicate system stability). Operation S610 determines the possible anomaly indexes, such as CPU utilization and response time, thereby identifying the types of anomaly the system needs to monitor.

In operation S620, the data associated with each anomaly index in the N kinds of sub-operation data is determined. That is, the operation data corresponding to each anomaly index is identified, for example, CPU occupancy data for CPU utilization and call-duration data for response time.

In operation S630, for each anomaly index, the data associated with that anomaly index in the N kinds of sub-operation data is input into the N branch networks in one-to-one correspondence.

Illustratively, the input into the N branch networks may be performed in two steps. First, the corresponding operation data is extracted according to the related anomaly indexes, for example by querying the corresponding index data, log data, and so on, and by setting data-filtering rules so that only data related to the specified anomaly index is extracted. Second, the operation data extracted in the first step is input into the corresponding branch networks, either point by point in time-series order or as a whole data segment in batch mode.

For example, for the throughput index, the data associated with throughput is extracted from the N kinds of sub-operation data and input into the N branch networks in one-to-one correspondence according to data type.

By locating the different anomaly indexes and inputting the corresponding data into the branch networks, each branch network can mine the effective features of the operation data of its own modality, and the outputs of all branch networks are finally fused to form the second feature vector.
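Several of the listed anomaly indexes are simple aggregates of incident and deployment records. The sketch below computes mean time between failures, mean time to recovery, and deployment failure rate from hypothetical records; the record format and the exact definitions (for example, measuring the interval between failure start times) are assumptions, since teams define these metrics in slightly different ways.

```python
from typing import List, Tuple


def mtbf(failure_starts: List[float]) -> float:
    """Mean interval between consecutive failure start times (assumed definition)."""
    starts = sorted(failure_starts)
    if len(starts) < 2:
        raise ValueError("need at least two failures")
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    return sum(gaps) / len(gaps)


def mttr(incidents: List[Tuple[float, float]]) -> float:
    """Mean of (recovered_at - started_at) over incidents."""
    if not incidents:
        raise ValueError("need at least one incident")
    return sum(end - start for start, end in incidents) / len(incidents)


def deployment_failure_rate(deploy_ok: List[bool]) -> float:
    """Fraction of deployments that failed."""
    if not deploy_ok:
        raise ValueError("need at least one deployment")
    return deploy_ok.count(False) / len(deploy_ok)


# Hypothetical incidents as (start, recovered) in minutes.
incidents = [(0, 30), (600, 660), (1500, 1515)]
between = mtbf([start for start, _ in incidents])  # gaps 600 and 900 -> 750.0
recover = mttr(incidents)                          # (30 + 60 + 15) / 3 -> 35.0
fail_rate = deployment_failure_rate([True, True, False, True])  # 0.25
```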
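The two-step extraction and input can be sketched as a rule table mapping each anomaly index to the fields relevant to it, followed by grouping the matching records by data type so that each group goes to the branch network for that type. The rule table, field names, and record layout below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical data-filtering rules: which record fields relate to each index.
INDEX_RULES: Dict[str, Set[str]] = {
    "throughput": {"requests_per_second", "throughput_chart"},
    "response_time": {"call_duration_ms", "timeout_log"},
}


def route_for_index(index: str, records: List[dict]) -> Dict[str, list]:
    """Keep records associated with `index`, grouped by data type in time order.

    Each record is a dict with keys "type" (e.g. "time_series", "image",
    "log"), "field", "timestamp", and "value" (an assumed layout). Each group
    can then be fed to its branch point by point or as one batch.
    """
    wanted = INDEX_RULES[index]
    per_branch = defaultdict(list)
    for rec in sorted(records, key=lambda r: r["timestamp"]):
        if rec["field"] in wanted:
            per_branch[rec["type"]].append(rec["value"])
    return dict(per_branch)


records = [
    {"type": "time_series", "field": "requests_per_second", "timestamp": 2, "value": 480},
    {"type": "log", "field": "timeout_log", "timestamp": 1, "value": "ERROR upstream timeout"},
    {"type": "time_series", "field": "requests_per_second", "timestamp": 1, "value": 510},
    {"type": "image", "field": "throughput_chart", "timestamp": 1, "value": "chart_0001.png"},
]
throughput_inputs = route_for_index("throughput", records)
# The timeout log relates to response time, not throughput, so it is filtered out.
```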

According to the embodiments of the present disclosure, focusing on the information that is effective for a specific abnormality index and fusing the features of operation data from different modalities allows system operation anomalies to be detected more accurately and efficiently.

Fig. 7 schematically illustrates a flowchart of determining an input order according to an embodiment of the present disclosure. As shown in Fig. 7, this embodiment is one implementation of operation S630 and includes:

in operation S710, a relationship between at least two of the M abnormality indexes is determined, wherein the relationship includes at least one of a causal relationship, a timing relationship, or a category relationship.

By way of example, by analyzing system logs and index data, a relationship between two abnormality indexes can be established when they satisfy one of the following conditions. Causal relationship: a change in the value of one abnormality index is frequently accompanied by a change in the value of another. Timing relationship: one abnormality index changes before another. Category relationship: the two abnormality indexes belong to the same category, such as CPU, memory, or network.

For example, causal relationship: an increase in CPU utilization may lead to an increase in response time. Timing relationship: memory occupancy rises abnormally before response time does. Category relationship: CPU utilization and memory occupancy both belong to the system-resource index category.
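One simple way to test for the timing relationship described above, assuming each abnormality index is available as an equally sampled numeric series, is to look for the lag at which one series best correlates with later values of the other. The sketch below uses lagged Pearson correlation via NumPy; the function name `leading_lag` and the `max_lag` parameter are illustrative, and the disclosure does not specify how relationships are detected. Correlation only suggests, and does not prove, a causal relationship.

```python
import numpy as np


def leading_lag(a, b, max_lag=10):
    """Find the lag (in samples) at which series `a` best matches later values of `b`.

    A positive lag suggests `a` changes before `b` (a candidate timing relation);
    0 means no lead was detected. Returns (lag, correlation).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    best_lag, best_corr = 0, -np.inf
    for lag in range(max_lag + 1):
        x = a[: len(a) - lag]  # earlier part of a
        y = b[lag:]            # b shifted forward by `lag` samples
        if len(x) < 3 or x.std() == 0 or y.std() == 0:
            continue  # too short or constant: correlation undefined
        corr = np.corrcoef(x, y)[0, 1]
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr
```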

In operation S720, a detection order of at least one index of the M abnormal indexes is determined according to the relationship.

The detection order is determined according to the order in which anomalies occur. Indexes that may cause other anomalies can be detected first, which speeds up processing and helps prevent cascading anomalies. Abnormality indexes that share only a category relationship and have no causal relationship can be detected simultaneously. The final detection order can be decided by jointly considering actual system operation data and the relationships between abnormality indexes.
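The ordering rule above (causes before effects, indexes without a causal link in parallel) can be sketched as a layered topological sort over causal edges. This uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+); treating the causal relationships as a directed acyclic graph is an assumption, and the function name `detection_stages` and index names are illustrative.

```python
from graphlib import TopologicalSorter


def detection_stages(causal_edges, indexes):
    """Group abnormality indexes into detection stages.

    causal_edges: iterable of (cause, effect) pairs, e.g. ("cpu_utilization", "response_time").
    Indexes with no causal dependency between them share a stage and can be detected
    simultaneously; a cause always lands in an earlier stage than its effect.
    """
    graph = {name: set() for name in indexes}  # node -> set of predecessors
    for cause, effect in causal_edges:
        graph.setdefault(effect, set()).add(cause)
        graph.setdefault(cause, set())
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the relations are cyclic
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all indexes whose causes are already handled
        stages.append(ready)
        sorter.done(*ready)
    return stages
```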

In operation S730, for each abnormality index, an input order of data associated with the abnormality index is determined based on the detection order.

In summary, the relationships between abnormality indexes are identified, a more reasonable detection order is formulated from those relationships, and the associated data is input to the branch networks in that order for analysis so as to find system anomalies. Specifically, the relationships between abnormality indexes are determined first; for example, abnormality index A may lead to abnormality index B. Because A may cause B, A is placed earlier in the detection order. Alternatively, only the data associated with A may be input: once A is detected as abnormal, B can be expected to follow, which improves efficiency and saves computation.

According to the embodiments of the present disclosure, analyzing the causal, timing, and category relationships among abnormality indexes, determining a reasonable detection order, and examining the related data in that order allows root-cause indexes that may trigger other anomalies to be handled early, reducing system operation anomalies overall. In addition, the associated data of only one or several abnormality indexes, rather than all of the data, is input at a time, which speeds up locating the root cause of an anomaly.

In some embodiments, inputting, for each abnormality index, the associated data in the N types of sub-operation data into the N branch networks in a one-to-one correspondence includes: for at least two abnormality indexes that have a relationship, simultaneously inputting the data associated with those indexes in the N types of sub-operation data into the N branch networks in a one-to-one correspondence.

The detection order specifies whose associated data is input into the N branch networks at each step: each step may input the data of a single abnormality index or of several abnormality indexes simultaneously. For example, if CPU utilization and memory occupancy are related, the first step extracts the corresponding CPU index data and memory index data, and the second step inputs the associated data of both indexes into the N branch networks according to data type.
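The grouping of related indexes for simultaneous input could be sketched with a union-find over the relationship pairs, followed by merging each group's associated data per data type so that each branch network receives one combined input. The helpers `related_groups` and `merge_by_type` and the nested dict layout are hypothetical.

```python
def related_groups(indexes, relations):
    """Partition abnormality indexes into groups of related indexes (union-find).

    relations: iterable of (index_a, index_b) pairs with a causal, timing or category
    relation; both members must appear in `indexes`. Each returned group is one batch
    whose associated data is fed to the N branch networks at the same time.
    """
    parent = {i: i for i in indexes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in relations:
        parent[find(a)] = find(b)  # union the two groups

    groups = {}
    for i in indexes:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())


def merge_by_type(per_index_data, group):
    """Merge {index: {data_type: [samples]}} for one group into {data_type: [samples]}."""
    merged = {}
    for index in group:
        for data_type, samples in per_index_data.get(index, {}).items():
            merged.setdefault(data_type, []).extend(samples)
    return merged
```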

According to the embodiments of the present disclosure, when at least two abnormality indexes have a causal, timing, or category relationship, they can be detected at the same time, so that key information is found by taking the relationships among them into account and detection accuracy is improved. Analyzing related abnormality indexes together makes full use of the associations between them and improves overall model performance.

In other embodiments, unlike Figs. 6 and 7, which input associated data per abnormality index, inputting the N types of sub-operation data into the N branch networks in a one-to-one correspondence includes: dividing each type of sub-operation data by a preset time period (for example, 1 minute; this value is only an example), and inputting the data of the N types that falls within the same preset time period into the N branch networks in a one-to-one correspondence.

Illustratively, the N branch networks process the data of the N modalities respectively, and each branch network may include one or more models dedicated to a particular modality. To detect abnormal data efficiently, the operation data is segmented and the data of each time period is handled as one set of inputs: for a given period, the operation data of the N modalities is sent to the N branch networks in a one-to-one correspondence.
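The per-period segmentation can be sketched by flooring each timestamp to its window boundary and collecting the payloads of all modalities that share a window. The stream format (timestamp in seconds plus payload), the function name `window_batches`, and the 60-second default are assumptions that mirror the 1-minute example in the text.

```python
from collections import defaultdict


def window_batches(streams, period_s=60):
    """Align N modality streams into per-period input groups.

    streams: {modality: [(timestamp_seconds, payload), ...]}. Returns
    {window_start: {modality: [payload, ...]}} so that all modalities sharing a
    window can be fed to their branch networks together. A modality with no data
    in a window is simply absent from that window's dict.
    """
    windows = defaultdict(lambda: defaultdict(list))
    for modality, items in streams.items():
        for ts, payload in items:
            start = int(ts // period_s) * period_s  # floor to the window boundary
            windows[start][modality].append(payload)
    return {start: dict(mods) for start, mods in sorted(windows.items())}
```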

According to the embodiments of the present disclosure, real-time operation data of the continuous delivery system is acquired in operation S210 and, given limited processing capacity, is divided by preset time period as it is generated and analyzed period by period, which improves the timeliness of anomaly detection.

Existing technical solutions depend on large amounts of training data, including many abnormal samples, and the accuracy of conventional models suffers from the large amount of redundant information in image data and operation logs. Specifically:

1. Large training-data requirement: anomaly data is voluminous, often heterogeneous, and hard to interpret, so manual labeling is time-consuming and expensive, which makes supervised training difficult in practical DevOps systems.

2. Low accuracy: conventional machine learning methods struggle with redundant information, and excessive redundancy causes models to score well in the training environment but poorly in the test environment.

In an embodiment of the present disclosure, a semi-supervised multi-modal machine learning framework for anomaly detection, SMMLFAD (Semi-Supervised Multi-Modality Machine Learning Framework for Anomaly Detection; the name is only an example), is deployed, for example, in a DevOps continuous delivery system. Through multiple branch networks, the influence of redundant information is reduced by means such as clustering, an attention mechanism, and per-modality feature extraction with multiple models.

Referring to fig. 8 and 9, a training process incorporating multi-modal data is illustrated, enabling abnormal operation detection based on multi-modal data.

Fig. 8 schematically illustrates a training architecture diagram of an anomaly detection model according to an embodiment of the present disclosure. Fig. 9 schematically illustrates a flowchart of training an anomaly detection model according to an embodiment of the present disclosure. As shown in fig. 9, the training abnormality detection model of this embodiment includes:

in operation S910, historical log data is obtained based on historical operation data of the continuous delivery system, wherein the historical log data includes known log data and unknown log data, and the known log data carries a normal label or an abnormal label.

In operation S920, the unknown log data is clustered based on the known log data, and a log training set is obtained. The log training set includes one or more log training samples.

After the historical operation data is parsed with UniParser, the text of the known log data (Normal log text) is split into byte-level n-gram fragments (grams), the frequencies of all grams are counted, and the grams are filtered against a set threshold to form a key-gram list. The unknown, unlabeled log data (Unlabel log text) is vectorized using Word2vec.
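The key-gram step for known logs might be sketched as follows: split each log line into overlapping byte-level n-grams, count their frequencies, and keep those at or above a threshold. The n-gram length, the `min_count` threshold, and the function names are illustrative assumptions; UniParser parsing and the Word2vec embedding of unlabeled logs are not shown.

```python
from collections import Counter


def ngrams(text, n=3):
    """Split a log line into overlapping byte-level n-grams."""
    data = text.encode("utf-8")
    return [data[i : i + n] for i in range(len(data) - n + 1)]


def key_gram_list(normal_logs, n=3, min_count=2):
    """Count n-gram frequencies over known log lines and keep the frequent ones."""
    counts = Counter()
    for line in normal_logs:
        counts.update(ngrams(line, n))
    return {gram for gram, c in counts.items() if c >= min_count}
```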

Next, clustering is used to estimate labels for the unlabeled log data in the training set from the known log data, grouping log sequences with similar semantics into the same cluster. In some embodiments, the clustering method is HDBSCAN. HDBSCAN is chosen because clustering quality varies greatly with cluster sparsity: when data density is uneven, with some classes dense and others very sparse, a single density threshold cannot identify all clusters, whereas HDBSCAN can handle clusters of differing densities.

HDBSCAN algorithm
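Once log sequences have been clustered, one plausible way to estimate labels for the unlabeled logs from the known ones is a majority vote within each cluster. The sketch below takes precomputed cluster ids as input, for example the `labels_` attribute of scikit-learn's `sklearn.cluster.HDBSCAN` (available from scikit-learn 1.3), which marks noise points as -1. The voting rule and the function name `propagate_labels` are assumptions; the disclosure does not state how cluster assignments are turned into labels.

```python
from collections import Counter, defaultdict

NOISE = -1  # cluster id HDBSCAN assigns to points that belong to no cluster


def propagate_labels(cluster_ids, known_labels):
    """Assign pseudo-labels to unlabeled logs by majority vote within each cluster.

    cluster_ids: cluster id per log sequence (-1 for noise).
    known_labels: "normal"/"abnormal" for known logs, None for unlabeled ones.
    Unlabeled logs that are noise, or whose cluster has no labeled member, stay None.
    """
    votes = defaultdict(Counter)
    for cid, label in zip(cluster_ids, known_labels):
        if cid != NOISE and label is not None:
            votes[cid][label] += 1
    result = []
    for cid, label in zip(cluster_ids, known_labels):
        if label is None and cid in votes:  # `in` does not create defaultdict keys
            label = votes[cid].most_common(1)[0][0]
        result.append(label)
    return result
```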

In operation S930, the anomaly detection model is trained based on the log training set and the training sets of the remaining types of sub-operation data.

For the log training set, referring to Fig. 8, feature vectors are extracted by an attention-based Bi-LSTM followed by an FCN. The LSTM adds gating mechanisms and a memory cell to the RNN, which mitigates vanishing and exploding gradients and better captures long-range dependencies; the Bi-LSTM additionally captures semantic dependencies in both directions, which benefits the extraction of text vectors.
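The attention step on top of the Bi-LSTM can be sketched as additive attention pooling over the sequence of hidden states, which weights time steps by a learned relevance score before summing them. The hidden states are assumed to come from a framework layer such as PyTorch's `torch.nn.LSTM(..., bidirectional=True)`; the function name `attention_pool`, the parameter shapes, and the additive scoring form are assumptions, shown here in NumPy for brevity.

```python
import numpy as np


def attention_pool(hidden_states, w, v):
    """Additive attention pooling over a Bi-LSTM output sequence.

    hidden_states: (T, 2H) array, forward and backward states concatenated per step.
    w: (2H, A) projection and v: (A,) scoring vector (learned parameters in practice).
    Returns the (2H,) context vector and the (T,) attention weights.
    """
    scores = np.tanh(hidden_states @ w) @ v  # (T,) relevance of each time step
    scores = scores - scores.max()           # shift for numerical stability
    e = np.exp(scores)
    weights = e / e.sum()                    # softmax over time steps
    context = weights @ hidden_states        # weighted sum of states -> (2H,)
    return context, weights
```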

In addition, the image training set includes multiple image training samples. Because the number of images is insufficient, the images are augmented during preprocessing. The augmentations used are: (1) geometric transformations, including flipping, rotation, cropping, and scaling; and (2) data-level transformations, including adding Gaussian noise and color perturbation.
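The listed augmentations could be sketched in NumPy as below. The crop ratio, noise standard deviation, and jitter range are illustrative values not taken from the disclosure, and the nearest-neighbour resize after cropping stands in for whatever scaling the original implementation uses.

```python
import numpy as np


def augment(image, rng):
    """Return augmented copies of an H x W x C float image with values in [0, 1].

    Geometric: horizontal flip, 90-degree rotation, random crop scaled back to H x W.
    Data-level: additive Gaussian noise and a per-channel colour jitter.
    """
    h, w = image.shape[:2]
    out = [np.flip(image, axis=1), np.rot90(image, k=1, axes=(0, 1))]

    # Random crop covering 80% of each side, then nearest-neighbour zoom back to H x W.
    ch, cw = int(h * 0.8), int(w * 0.8)
    top, left = rng.integers(0, h - ch + 1), rng.integers(0, w - cw + 1)
    crop = image[top : top + ch, left : left + cw]
    rows = np.arange(h) * ch // h
    cols = np.arange(w) * cw // w
    out.append(crop[rows][:, cols])

    # Additive Gaussian noise, clipped back into the valid range.
    out.append(np.clip(image + rng.normal(0.0, 0.05, size=image.shape), 0.0, 1.0))

    # Colour disturbance: scale each channel by a random factor near 1.
    jitter = rng.uniform(0.9, 1.1, size=(1, 1, image.shape[2]))
    out.append(np.clip(image * jitter, 0.0, 1.0))
    return out
```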

Finally, the feature vectors of the multi-modal data, such as those obtained from log text data and from image data, are concatenated and classified by an FCN (fully convolutional network), which outputs the prediction result. The FCN recovers the category of each pixel from abstract features, thereby achieving pixel-level classification; features are fed into the network through a sliding window, which avoids the repeated storage and convolution computation caused by using pixel blocks.
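The fusion and classification step (concatenation, as in claim 8, followed by a classifier) can be sketched as follows. For brevity, a single linear layer with softmax stands in for the FCN classifier named in the text; the function name `fuse_and_classify` is hypothetical, and the weights would come from training and are passed in here as plain arrays.

```python
import numpy as np


def fuse_and_classify(feature_vectors, weight, bias):
    """Concatenate per-branch feature vectors and score them with a linear softmax head.

    feature_vectors: list of 1-D arrays, one per branch network (e.g. image and log).
    weight: (total feature length, num_classes); bias: (num_classes,).
    Returns class probabilities, e.g. [p_normal, p_abnormal].
    """
    fused = np.concatenate(feature_vectors)  # the second feature vector
    logits = fused @ weight + bias
    logits = logits - logits.max()           # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()
```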

During training, the training set of each modality is input into the anomaly detection model, and an optimization algorithm updates the model parameters to minimize the loss function. Training typically requires multiple passes over the data, each called an epoch. After training, a test dataset can be used to evaluate model performance by computing metrics such as accuracy, precision, and recall.
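The evaluation metrics mentioned above can be computed from predicted and true labels as in this sketch, which treats "abnormal" as the positive class (an assumption) and returns 0.0 where a ratio is undefined.

```python
def evaluate(y_true, y_pred, positive="abnormal"):
    """Accuracy, precision and recall for binary anomaly labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed anomalies
    correct = sum(t == p for t, p in pairs)
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}
```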

The embodiments of the present disclosure address a key pain point in detecting operation anomalies in DevOps. Operation anomalies are bursty and unpredictable and must be located and handled promptly; the anomaly detection method provided by the present disclosure can detect them rapidly, with the following advantages:

(1) High efficiency: anomalies in DevOps operation are detected by a deep model, without manual processing.

(2) High accuracy: the deep model focuses on the effective information in the operation data and combines data from multiple modalities to detect abnormal data more accurately, while the attention mechanism reduces interference from redundant information and further improves detection accuracy.

(3) Low cost: identification is automated by a model program, so the technical skill required of personnel is low.

(4) High generality: the scheme is applicable to scenarios involving different abnormality indexes.

Based on the abnormality detection method, the disclosure also provides an abnormality detection device. The device will be described in detail below in connection with fig. 10.

Fig. 10 schematically shows a block diagram of the abnormality detection apparatus according to the embodiment of the present disclosure.

As shown in fig. 10, the abnormality detection apparatus 1000 of this embodiment includes a data acquisition module 1010, a data processing module 1020, a feature fusion module 1030, and a feature classification module 1040.

The data acquisition module 1010 may perform operation S210 to acquire operation data of the continuous delivery system, wherein the operation data includes N types of sub-operation data, each type of sub-operation data differs in data type from at least one other type, and N is greater than or equal to 2.

The data processing module 1020 may perform operation S220 to input the N types of sub-operation data into N branch networks in a one-to-one correspondence and obtain N first feature vectors output by the N branch networks, wherein the N branch networks are obtained based on a machine learning algorithm.

In some embodiments, the data processing module 1020 may perform operations S410 to S420, operations S510 to S520, operations S610 to S630, and operations S710 to S730, which are not described herein.

The feature fusion module 1030 may perform operation S230 for fusing the N first feature vectors to obtain a second feature vector.

The feature classification module 1040 may perform operation S240 for classifying the second feature vector to obtain an anomaly detection result.
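How the four modules might be wired together can be sketched as a small dataclass in which each module is a plain callable. All field names and the class name `AnomalyDetectionApparatus` are hypothetical; in practice the branch, fusion, and classification callables would wrap the trained networks.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence


@dataclass
class AnomalyDetectionApparatus:
    """Chains the four modules; each field is a hypothetical stand-in for one module."""

    acquire: Callable[[], Dict[str, Any]]                 # data acquisition module (S210)
    branches: Dict[str, Callable[[Any], List[float]]]     # data processing module (S220)
    fuse: Callable[[Sequence[List[float]]], List[float]]  # feature fusion module (S230)
    classify: Callable[[List[float]], str]                # feature classification module (S240)

    def run(self) -> str:
        data = self.acquire()
        if len(data) < 2 or set(data) != set(self.branches):
            raise ValueError("need N >= 2 data types, one per branch network")
        # One-to-one: each type of sub-operation data goes to its own branch network.
        first_vectors = [self.branches[kind](data[kind]) for kind in sorted(data)]
        second_vector = self.fuse(first_vectors)
        return self.classify(second_vector)
```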

In some embodiments, the anomaly detection apparatus 1000 may further include a model training module that may perform operations S910 to S930, which are not described herein.

The abnormality detection device 1000 includes modules for performing the respective steps of any one of the embodiments described above with reference to fig. 2 to 9.

The implementation manner, the solved technical problems, the realized functions and the realized technical effects of each module/unit/sub-unit and the like in the apparatus part embodiment are the same as or similar to the implementation manner, the solved technical problems, the realized functions and the realized technical effects of each corresponding step in the method part embodiment, and are not repeated herein.

Any of the data acquisition module 1010, the data processing module 1020, the feature fusion module 1030, and the feature classification module 1040 may be combined in one module to be implemented, or any of the modules may be split into multiple modules, according to embodiments of the present disclosure. Alternatively, at least some of the functionality of one or more of the modules may be combined with at least some of the functionality of other modules and implemented in one module.

At least one of the data acquisition module 1010, the data processing module 1020, the feature fusion module 1030, and the feature classification module 1040 may be implemented at least in part as hardware circuitry, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or in hardware or firmware, such as any other reasonable manner of integrating or packaging the circuitry, or in any one of or a suitable combination of three of software, hardware, and firmware, in accordance with embodiments of the present disclosure. Alternatively, at least one of the data acquisition module 1010, the data processing module 1020, the feature fusion module 1030, and the feature classification module 1040 may be implemented at least in part as a computer program module that, when executed, performs the corresponding functions.

As shown in fig. 11, an electronic device 1100 according to an embodiment of the present disclosure includes a processor 1101 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1102 or a program loaded from a storage section 1108 into a Random Access Memory (RAM) 1103. The processor 1101 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 1101 may also include on-board memory for caching purposes. The processor 1101 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flow according to embodiments of the present disclosure.

In the RAM 1103, various programs and data necessary for the operation of the electronic device 1100 are stored. The processor 1101, ROM 1102, and RAM 1103 are connected to each other by a bus 1104. The processor 1101 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 1102 and/or the RAM 1103. Note that the program can also be stored in one or more memories other than the ROM 1102 and the RAM 1103. The processor 1101 may also perform various operations of the method flow according to embodiments of the present disclosure by executing programs stored in one or more memories.

According to an embodiment of the disclosure, the electronic device 1100 may also include an input/output (I/O) interface 1105, which is also connected to the bus 1104. The electronic device 1100 may also include one or more of the following components connected to the I/O interface 1105: an input portion 1106 including a keyboard, a mouse, and the like; an output portion 1107 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion 1108 including a hard disk and the like; and a communication section 1109 including a network interface card such as a LAN card or a modem. The communication section 1109 performs communication processing via a network such as the Internet. A drive 1110 is also connected to the I/O interface 1105 as needed. A removable medium 1111, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1110 as needed, so that a computer program read from it can be installed into the storage section 1108 as needed.

The present disclosure also provides a computer-readable storage medium, which may be included in the apparatus/device/system described in the above embodiments or may exist separately without being assembled into that apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods according to embodiments of the present disclosure.

According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include ROM 1102 and/or RAM 1103 described above and/or one or more memories other than ROM 1102 and RAM 1103.

Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the methods shown in the flowcharts. The program code, when executed in a computer system, causes the computer system to perform the methods provided by embodiments of the present disclosure.

The above-described functions defined in the system/apparatus of the embodiments of the present disclosure are performed when the computer program is executed by the processor 1101. The systems, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.

In one embodiment, the computer program may be based on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program can also be transmitted, distributed over a network medium in the form of signals, downloaded and installed via the communication portion 1109, and/or installed from the removable media 1111. The computer program may include program code that may be transmitted using any appropriate network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.

In such an embodiment, the computer program can be downloaded and installed from a network via the communication portion 1109, and/or installed from the removable media 1111. The above-described functions defined in the system of the embodiments of the present disclosure are performed when the computer program is executed by the processor 1101. The systems, devices, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.

According to embodiments of the present disclosure, program code for performing the computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, such computer programs may be implemented in high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, C, and similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure may be combined and/or integrated in many ways, even if such combinations or integrations are not explicitly recited in the present disclosure. In particular, these features may be combined and/or integrated in many ways without departing from the spirit and teachings of the present disclosure, and all such combinations and/or integrations fall within the scope of the present disclosure.

The embodiments of the present disclosure are described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the disclosure, and such alternatives and modifications are intended to fall within the scope of the disclosure.

Claims

1. An anomaly detection method, comprising:

acquiring operation data of a continuous delivery system, wherein the operation data comprises N types of sub-operation data, each type of sub-operation data differs in data type from at least one other type of sub-operation data, and N is greater than or equal to 2;

inputting the N types of sub-operation data into N branch networks in a one-to-one correspondence to obtain N first feature vectors output by the N branch networks, wherein the N branch networks are obtained based on a machine learning algorithm;

fusing the N first feature vectors to obtain a second feature vector;

and classifying the second feature vector to obtain an abnormality detection result.

2. The method of claim 1, wherein inputting the N types of sub-operation data into the N branch networks in a one-to-one correspondence comprises:

determining M abnormal indexes of the continuous delivery system, wherein M is greater than or equal to 1;

determining, in the N types of sub-operation data, the data associated with each abnormal index; and

inputting, for each abnormal index, the data associated with that abnormal index in the N types of sub-operation data into the N branch networks in a one-to-one correspondence.

3. The method of claim 2, wherein inputting, for each abnormal index, the data associated with the abnormal index in the N types of sub-operation data into the N branch networks in a one-to-one correspondence comprises:

determining a relationship between at least two of the M abnormal indexes, wherein the relationship comprises at least one of a causal relationship, a timing relationship, or a category relationship;

determining a detection order of at least one of the M abnormal indexes according to the relationship; and

determining, for each abnormal index, an input order of the data associated with the abnormal index based on the detection order.

4. The method of claim 3, wherein inputting, for each abnormal index, the data associated with the abnormal index in the N types of sub-operation data into the N branch networks in a one-to-one correspondence comprises:

simultaneously inputting, for at least two abnormal indexes having the relationship, the data associated with the at least two abnormal indexes in the N types of sub-operation data into the N branch networks in a one-to-one correspondence.

5. The method of claim 1, wherein inputting the N types of sub-operation data into the N branch networks in a one-to-one correspondence comprises:

dividing each type of sub-operation data by a preset time period; and

inputting the data of the N types of sub-operation data that falls within the same preset time period into the N branch networks in a one-to-one correspondence.

6. The method of claim 1, wherein,

the N types of sub-operation data comprise image data, the N branch networks comprise a first branch network, and the N first feature vectors comprise an image feature vector;

the inputting of the N types of sub-operation data into the N branch networks in a one-to-one correspondence includes inputting the image data into the first branch network, which specifically includes:

inputting the image data into an image feature extraction model of the first branch network to obtain intermediate image features;

and inputting the intermediate image features into a convolutional neural network model of the first branch network to obtain the image feature vector.

7. The method of claim 6, wherein,

the N types of sub-operation data comprise log data, the N branch networks comprise a second branch network, and the N first feature vectors comprise a log feature vector;

the inputting of the N types of sub-operation data into the N branch networks in a one-to-one correspondence includes inputting the log data into the second branch network, which specifically includes:

inputting the log data into a bidirectional long short-term memory (Bi-LSTM) neural network of the second branch network, wherein the Bi-LSTM neural network is configured to process the log data based on an attention mechanism to obtain an intermediate log feature; and

and inputting the intermediate log feature into a convolutional neural network model of the second branch network to obtain the log feature vector.

8. The method of claim 7, wherein:

the fusing the N first feature vectors to obtain a second feature vector includes: splicing the image feature vector and the log feature vector to obtain the second feature vector;

the classifying of the second feature vector to obtain an anomaly detection result includes: inputting the second feature vector into a classification model to obtain the anomaly detection result output by the classification model.

9. The method of claim 7, wherein an anomaly detection model comprises the N branch networks and the classification model, the anomaly detection model being pre-trained via:

obtaining historical log data based on the historical operation data of the continuous delivery system, wherein the historical log data comprises known log data and unknown log data, and the known log data comprises normal labels or abnormal labels;

clustering the unknown log data based on the known log data to obtain a log training set;

training the anomaly detection model based on the log training set and the training sets of the remaining types of sub-operation data.

10. The method according to claim 1, wherein:

the N types of sub-operation data include at least two of log data, image data, time series data, code data, structured data, audio data, video data, or metadata.

11. An abnormality detection apparatus comprising:

the data acquisition module is configured to acquire operation data of a continuous delivery system, wherein the operation data comprises N types of sub-operation data, each type of sub-operation data differs in data type from at least one other type of sub-operation data, and N is greater than or equal to 2;

the data processing module is configured to input the N types of sub-operation data into N branch networks in a one-to-one correspondence to obtain N first feature vectors output by the N branch networks, wherein the N branch networks are obtained based on a machine learning algorithm;

the feature fusion module is used for fusing the N first feature vectors to obtain a second feature vector;

and the feature classification module is used for classifying the second feature vector to obtain an abnormal detection result.

12. An electronic device, comprising:

one or more processors;

storage means for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-10.

13. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method according to any of claims 1 to 10.

14. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 10.