CN112948155A - Model training method, state prediction method, device, equipment and storage medium - Google Patents


Info

Publication number
CN112948155A
CN112948155A (application number CN201911268312.XA)
Authority
CN
China
Prior art keywords
prediction model
sample
energy
output sequence
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911268312.XA
Other languages
Chinese (zh)
Other versions
CN112948155B (en)
Inventor
叶尧罡
Current Assignee
China Mobile Communications Group Co Ltd
China Mobile Suzhou Software Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Suzhou Software Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Suzhou Software Technology Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN201911268312.XA priority Critical patent/CN112948155B/en
Publication of CN112948155A publication Critical patent/CN112948155A/en
Application granted granted Critical
Publication of CN112948155B publication Critical patent/CN112948155B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0709Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766Error or fault reporting or storing
    • G06F11/0775Content or structure details of the error report, e.g. specific table structure, specific error fields
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079Root cause analysis, i.e. error or fault diagnosis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Hardware Design (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The embodiments of the present application disclose a system anomaly prediction model training method, a system state prediction method, a device, equipment and a storage medium, wherein the method includes the following steps: obtaining sample features; initializing the system anomaly prediction model according to set weight parameters; processing the sample features through the system anomaly prediction model to obtain a predicted energy; constructing an objective function based on the predicted energy; and, in back propagation, updating the weight parameters of the system anomaly prediction model through the objective function. In this way, abnormal conditions of the system operation state can be predicted.

Description

Model training method, state prediction method, device, equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a system anomaly prediction model training method, a system state prediction method, an apparatus, a device, and a computer storage medium.
Background
With the development of big data technology, distributed computing clusters keep growing in size, and the complexity of distributed systems keeps increasing. Because a distributed cluster contains a large number of nodes, the probability that the cluster as a whole experiences a failure is considerable even though each individual node fails with low probability; for example, a large cluster is likely to have a hard disk fail every day. Such failures greatly affect the stable operation of large, complex systems and may even cause serious consequences. Therefore, for a large-scale system, faults must be discovered and resolved as soon as possible to effectively reduce the losses they cause, and predicting abnormal conditions of the system operation state has thus become a difficult problem in keeping such systems running normally.
Disclosure of Invention
The embodiment of the application provides a system anomaly prediction model training method, a system state prediction method, a device, equipment and a computer storage medium, which can predict the anomaly condition of the system operation state.
In order to achieve the above purpose, the technical solution of the embodiment of the present application is implemented as follows:
in a first aspect, an embodiment of the present application provides a system anomaly prediction model training method, where the method includes:
obtaining sample characteristics;
initializing the system anomaly prediction model according to the set weight parameters;
processing the sample characteristics through the system anomaly prediction model to obtain predicted energy;
constructing an objective function based on the predicted energy;
in back propagation, updating the weight parameters of the system anomaly prediction model through the objective function.
In some embodiments, the obtaining sample features comprises:
acquiring a log sequence, and processing the log sequence to obtain a vector sequence;
and performing feature extraction on the vector sequence to obtain sample features.
In some embodiments, the processing the sample features through the system anomaly prediction model to obtain the predicted energy includes:
obtaining a first output sequence based on the sample characteristics and a first decoder of a first submodel of the system anomaly prediction model;
obtaining a second output sequence based on the sample characteristics and a second decoder of a first submodel of the system anomaly prediction model;
and acquiring a third output sequence based on the first output sequence, the second output sequence and the sample characteristics.
In some embodiments, said obtaining a third output sequence based on said first output sequence, said second output sequence, and said sample features comprises:
determining a first reconstruction error based on the first output sequence and the sample features;
determining a second reconstruction error based on the second output sequence and the sample features;
and splicing the first reconstruction error, the second reconstruction error and the hidden space vector of the last time step of the encoder in the first submodel to obtain a third output sequence.
In some embodiments, the processing the sample features through the system anomaly prediction model to obtain the predicted energy includes:
clustering the third output sequence by using a second sub-model of the system anomaly prediction model to obtain K clusters, wherein K is a positive integer;
determining a mean and a covariance of the mth cluster based on samples within the mth cluster;
based on the mean and covariance, a predicted energy for the sample is determined.
In some embodiments, said determining a mean and covariance of said mth cluster based on samples within said mth cluster comprises:
estimating the third output sequence by using the second submodel of the system anomaly prediction model, and determining the probability that each sample in the third output sequence belongs to each distribution;
determining the mean and covariance of the mth cluster based on the samples within the mth cluster and the probability that each sample belongs to each distribution.
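The soft-assignment statistics described above can be sketched as follows; this is a minimal illustration under assumed names, not the patent's implementation. The mean and covariance of the mth cluster are estimated with the membership probabilities as weights:

```python
import numpy as np

def cluster_stats(Z, gamma_m):
    """Probability-weighted mean and covariance of one cluster.
    Z: (N, d) samples of the third output sequence; gamma_m: (N,)
    probability that each sample belongs to the mth distribution."""
    w = gamma_m / gamma_m.sum()          # normalized membership weights
    mu = w @ Z                           # weighted mean
    diff = Z - mu
    cov = (w[:, None] * diff).T @ diff   # weighted covariance
    return mu, cov
```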
In some embodiments, said constructing an objective function based on said predicted energy comprises:
determining a reconstruction loss based on the first output sequence, the second output sequence, and the sample characteristics;
and constructing an objective function based on the reconstruction loss and the predicted energy.
In a second aspect, an embodiment of the present application provides a method for predicting a system state, where the method includes:
acquiring sample characteristics of a system;
determining energy corresponding to the sample characteristics based on a system anomaly prediction model;
determining a state of the system based on the energy.
In some embodiments, determining a state of the system based on the energy comprises:
determining that the system is abnormal when the predicted energy is greater than a preset energy threshold;
or, determining that the system is normal when the predicted energy is less than or equal to the energy threshold.
In some embodiments, before determining the energy corresponding to the sample feature based on the system anomaly prediction model, the method further comprises:
acquiring training sample characteristics;
initializing the system anomaly prediction model according to the set weight parameters;
processing the training sample characteristics through the system anomaly prediction model to obtain the prediction energy of the training sample;
constructing an objective function based on the predicted energies of the training samples;
in back propagation, updating the weight parameters of the system anomaly prediction model through the objective function.
In a third aspect, an embodiment of the present application provides a system anomaly prediction model training apparatus, which includes a first obtaining module, an initializing module, a first processing module, a constructing module, and an updating module, wherein,
the first obtaining module is used for obtaining sample characteristics;
the initialization module is used for initializing the system anomaly prediction model according to the set weight parameters;
the first processing module is used for processing the sample characteristics through the system abnormity prediction model to obtain predicted energy;
the construction module is used for constructing an objective function based on the predicted energy;
and the updating module is used for updating the weight parameters of the system anomaly prediction model through the objective function in back propagation.
In a fourth aspect, embodiments of the present application provide a system state prediction apparatus, which includes a second obtaining module, an energy determination module, and a state determination module, wherein,
the second acquisition module is used for acquiring the sample characteristics of the system;
the energy determination module is used for determining the energy corresponding to the sample characteristics based on a system anomaly prediction model;
the state determination module is configured to determine a state of the system based on the energy.
In a fifth aspect, embodiments of the present application provide an apparatus, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the steps of the method for training a system anomaly prediction model provided in any embodiment of the present application, and/or the steps of the method for predicting a system state provided in any embodiment of the present application.
In a sixth aspect, embodiments of the present application provide a computer storage medium, where a system anomaly prediction model training program and/or a system state prediction program are stored on the computer storage medium, where the system anomaly prediction model training program, when executed by a processor, implements the steps of the system anomaly prediction model training method provided in any of the embodiments of the present application, and the system state prediction program, when executed by the processor, implements the steps of the system state prediction method provided in any of the embodiments of the present application.
The system anomaly prediction model training method provided by this embodiment includes: obtaining sample features; initializing the system anomaly prediction model according to set weight parameters; processing the sample features through the system anomaly prediction model to obtain predicted energy; constructing an objective function based on the predicted energy; and, in back propagation, updating the weight parameters of the system anomaly prediction model through the objective function. In this way, an end-to-end system anomaly prediction model is established, and because the objective function is constructed from the predicted energy, the model can be trained with unlabeled samples. This removes the manual labeling step, avoids the problem that a large number of positive samples are difficult to obtain (which would otherwise hurt model performance), and thus makes a large number of training samples available while preserving model performance. Meanwhile, back propagation with the objective function jointly optimizes the system anomaly prediction model, achieving optimal prediction performance.
Drawings
FIG. 1 is a schematic processing flow diagram illustrating a system anomaly prediction model training method according to an embodiment of the present disclosure;
FIG. 2 is a schematic processing flow diagram illustrating a system state prediction method according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram illustrating a structure of a system anomaly prediction model according to an embodiment of the present application;
FIG. 4 is a diagram illustrating a structure of a log sub-sequence according to an embodiment of the present application;
FIG. 5 is a schematic diagram illustrating an exemplary system anomaly prediction model training apparatus according to the present application;
FIG. 6 is a schematic diagram of a system state prediction apparatus according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a system anomaly prediction model training device according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a system state prediction device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the following will describe the specific technical solutions of the present application in further detail with reference to the accompanying drawings in the embodiments of the present application. The following examples are intended to illustrate the present application but are not intended to limit the scope of the present application.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
Before describing the system anomaly prediction model training method provided by the embodiment of the present application in detail, the technology related to the present application will be briefly introduced.
A log file is a record file or collection of files used to record system operational events and plays an important role in handling historical data, tracking diagnostic issues, and understanding the activities of the system. At present, a programmer utilizes a log file to locate problems when developing a program, and an operation and maintenance worker utilizes the log file to locate problems when a system has a fault, wherein the log file is composed of a plurality of logs.
Because log files reflect the current state of the system in a timely manner, existing methods detect anomalies in the system's operating state by analyzing log data. Specifically, each log is split into individual words using natural language processing techniques, converted into a word list, and optionally further encoded; the encoding is usually implemented with TF-IDF (term frequency-inverse document frequency), BOW (Bag-of-Words) or word2vec (word to vector). Based on the encoded log vectors, anomalies are then detected by clustering or by time-series analysis. However, such anomaly detection methods can only detect that the system is currently abnormal, and may even detect the anomaly only after the system has been in an abnormal operating state for some time; they therefore cannot guarantee high stability and high availability of the system.
In one aspect, an embodiment of the present application provides a method for training a system anomaly prediction model. Referring to fig. 1, the method includes:
step 101, obtaining sample characteristics.
Here, the system anomaly prediction model training device encodes the samples for training, and obtains the sample characteristics that can be input into the system anomaly prediction model, that is, converts the sample format into a normalized format, so as to facilitate training of the system anomaly prediction model.
For example, the system anomaly prediction model training device encodes each sample used for training into a fixed-length vector and inputs the fixed-length vector into the system anomaly prediction model as the sample features.
Step 102, initializing the system anomaly prediction model according to the set weight parameters.
Here, the system anomaly prediction model training device sets the weight parameters of the system anomaly prediction model in advance and initializes the model based on the set weight parameters; the accuracy of the freshly initialized system anomaly prediction model is not yet high.
Step 103, processing the sample characteristics through the system anomaly prediction model to obtain predicted energy.
Here, the system anomaly prediction model training device inputs the sample features into the system anomaly prediction model, and the predicted energy of the samples is obtained through the processing of the system anomaly prediction model.
For example, the system anomaly prediction model captures sequential information of sample characteristics by using a recurrent neural network, and then performs parameter estimation on the sample by using a Gaussian model, so as to obtain prediction energy.
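The notion of "energy" can be illustrated with a minimal sketch. The single-Gaussian simplification and all names below are assumptions for demonstration, not the patent's implementation: a sample's energy is taken as its negative log-likelihood under a Gaussian estimated from normal data, so anomalous samples receive higher energy.

```python
import numpy as np

def gaussian_energy(z, mu, cov):
    """Energy of sample z under N(mu, cov): the negative log-likelihood.
    Higher energy means the sample is less likely, i.e. more anomalous."""
    d = z.shape[-1]
    diff = z - mu
    mahalanobis = diff @ np.linalg.inv(cov) @ diff
    log_det = np.linalg.slogdet(cov)[1]
    return 0.5 * (mahalanobis + d * np.log(2.0 * np.pi) + log_det)

# Estimate mu and cov from "normal" samples, then score new points.
rng = np.random.default_rng(0)
normal_samples = rng.standard_normal((500, 2))
mu = normal_samples.mean(axis=0)
cov = np.cov(normal_samples, rowvar=False)

e_typical = gaussian_energy(np.array([0.0, 0.0]), mu, cov)
e_outlier = gaussian_energy(np.array([8.0, 8.0]), mu, cov)
assert e_outlier > e_typical  # outliers receive higher energy
```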
Step 104, constructing an objective function based on the predicted energy.
Here, since the system anomaly prediction model training device constructs the objective function based on the predicted energy, the samples used for training the system anomaly prediction model may be unlabeled samples; the training of the system anomaly prediction model therefore belongs to unsupervised learning.
Step 105, updating the weight parameters of the system anomaly prediction model through the objective function in back propagation.
The system anomaly prediction model training device uses the objective function to continuously correct the weight parameters of the system anomaly prediction model during back propagation, so that the reconstruction loss of the model becomes smaller and smaller until convergence is reached.
In the above implementation, the obtained sample features are input into the system anomaly prediction model to obtain the predicted energy, an objective function is constructed based on the predicted energy, and the weight parameters of the system anomaly prediction model are updated during back propagation using the objective function. In this way, an end-to-end system anomaly prediction model is established, and because the objective function is constructed from the predicted energy, the model can be trained with unlabeled samples. This removes the manual labeling step, avoids the problem that a large number of positive samples are difficult to obtain (which would otherwise hurt model performance), and thus makes a large number of training samples available while preserving model performance. Meanwhile, back propagation with the objective function jointly optimizes the system anomaly prediction model, achieving optimal prediction performance.
In some embodiments, the step 101 of obtaining sample features includes:
the system abnormity prediction model training device obtains a log sequence, and the log sequence is processed to obtain a vector sequence.
Here, the samples are log sequences, the log sequences usually include various types of variables, and the system anomaly prediction model is directly trained by using the log sequences, which introduces noise interference caused by the variables in the log sequences, thereby reducing the accuracy of the system anomaly prediction model. Therefore, a log sequence needs to be processed to convert the log sequence into a vector sequence.
Wherein the processing the log sequence comprises preprocessing the log sequence and encoding the preprocessed log sequence. Specifically, the preprocessing the log sequence includes cleaning variables in the log sequence, for example, replacing the variable values of the log sequence by using a regular expression matching method based on a unique format and artificial experience of the log, so that each log in the log sequence is converted into a word list, and encoding each log by using an integer encoding method based on a natural language processing idea to obtain a vector sequence.
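This preprocessing step can be sketched as follows; the regular expressions, placeholder names and helper functions here are illustrative assumptions, not the patterns used by the patent. Variable fields are masked so only the constant log template remains, then each token is integer-encoded:

```python
import re

def log_to_tokens(line):
    # Mask variable fields (IPs, hex ids, numbers) with placeholders so
    # that only the constant log template remains.
    line = re.sub(r"\d+\.\d+\.\d+\.\d+", "<IP>", line)
    line = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", line)
    line = re.sub(r"\d+", "<NUM>", line)
    return line.split()

def build_vocab(token_lists):
    # Assign each distinct token a positive integer code.
    vocab = {}
    for tokens in token_lists:
        for t in tokens:
            vocab.setdefault(t, len(vocab) + 1)
    return vocab

def encode(tokens, vocab):
    # 0 is reserved for tokens unseen at vocabulary-building time.
    return [vocab.get(t, 0) for t in tokens]
```

A log line such as `"Received block blk_123 of size 67108864 from 10.0.0.1"` is thus reduced to its template tokens before encoding, which removes the variable-induced noise described above.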
And the system anomaly prediction model training device extracts the characteristics of the vector sequence to obtain the characteristics of the sample.
Here, the system anomaly prediction model training device performs feature extraction on the vector sequence using a neural network to obtain the sample features. The neural network is trained in advance on massive amounts of data through a word embedding model.
In the above embodiment, when the sample is a log sequence, the log sequence needs to be processed to obtain a vector sequence, and feature extraction is performed on the vector sequence, so as to obtain sample features. Therefore, the log sequence is processed first, noise caused by variables in the log can be avoided when model training is carried out, and accuracy of the system abnormity prediction model is improved.
In some embodiments, the step 103 of processing the sample features through the system anomaly prediction model to obtain the predicted energy includes:
obtaining a first output sequence based on the sample characteristics and a first decoder of a first submodel of the system anomaly prediction model;
obtaining a second output sequence based on the sample characteristics and a second decoder of a first submodel of the system anomaly prediction model;
and acquiring a third output sequence based on the first output sequence, the second output sequence and the sample characteristics.
Here, the first submodel of the system anomaly prediction model includes an encoder and two decoders. The encoder of the first submodel is composed of unidirectional stacked GRUs (Gated Recurrent Units), so the first submodel can capture the sequential information of the samples. The first decoder is a normal decoder, and the second decoder is a decoder with an attention mechanism.
The encoder of the first submodel may include one or more coding layers. For example, the sample feature X = (x_1, x_2, …, x_n) obtained in step 101 is input to the encoder of the first submodel 302; after passing through the two stacked unidirectional GRU layers, a first hidden space vector h^1 = (h^1_1, h^1_2, …, h^1_n) of the first coding layer and a second hidden space vector h^2 = (h^2_1, h^2_2, …, h^2_n) of the second coding layer are obtained. The first hidden space vector h^1 and the second hidden space vector h^2 are concatenated to obtain the hidden space vector H = (h_1, h_2, …, h_n), where h_i = Concat(h^1_i, h^2_i), Concat denotes the concatenation function, and n is the number of time steps of the GRU.
The sample feature X obtained in step 101 is input to the first submodel, and the process of obtaining the third output sequence may be represented as:
z=Attentive_GRU(X) (1)
where z denotes the third output sequence and Attentive _ GRU denotes the non-linear transformation function of the first submodel.
Here, obtaining the first output sequence based on the sample features and the first decoder of the first submodel of the system anomaly prediction model includes: determining the current output of the first decoder according to the historical hidden space vector of the first decoder of the first submodel and the historical output of the first decoder.
For example, at time t, the hidden state s^1_{t-1} of the first decoder at the previous time step and the output y^1_{t-1} of the first decoder at the previous time step are input to the first decoder to obtain the output y^1_t at the current time step. This process can be expressed as y^1_t = GRU1(s^1_{t-1}, y^1_{t-1}), where GRU1 is the nonlinear transformation function of the first decoder and s_0 = h_n, with h_n the hidden space vector at the last time step of the encoder. In this way, the sample features are input into the first submodel, and the first output sequence Y^1 = (y^1_1, y^1_2, …, y^1_n) is obtained through the first decoder.
The update process of y^1_t = GRU1(s^1_{t-1}, y^1_{t-1}) is:

z^1_t = σ(W^1_z y^1_{t-1} + U^1_z s^1_{t-1}) (2)
r^1_t = σ(W^1_r y^1_{t-1} + U^1_r s^1_{t-1}) (3)
s̃^1_t = tanh(W^1_s y^1_{t-1} + U^1_s (r^1_t ⊙ s^1_{t-1})) (4)
s^1_t = (1 − z^1_t) ⊙ s^1_{t-1} + z^1_t ⊙ s̃^1_t (5)
y^1_t = σ(W^1_o · s^1_t) (6)

where W^1_z, W^1_r, W^1_s, W^1_o, U^1_z, U^1_r and U^1_s are the associated weight matrices, z^1_t is the update gate, r^1_t is the reset gate, s̃^1_t is the candidate hidden state, tanh denotes the hyperbolic tangent function, σ denotes the Sigmoid function, and ⊙ denotes element-wise multiplication.
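The update equations (2) to (6) can be sketched as a plain NumPy GRU decoder step; the class name and the random weight initialization below are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUDecoderCell:
    """One GRU decoder step following equations (2)-(6): update gate,
    reset gate, candidate state, new hidden state, and output."""
    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        init = lambda: rng.standard_normal((dim, dim)) * 0.1
        self.Wz, self.Uz = init(), init()
        self.Wr, self.Ur = init(), init()
        self.Ws, self.Us = init(), init()
        self.Wo = init()

    def step(self, y_prev, s_prev):
        z = sigmoid(self.Wz @ y_prev + self.Uz @ s_prev)              # (2) update gate
        r = sigmoid(self.Wr @ y_prev + self.Ur @ s_prev)              # (3) reset gate
        s_tilde = np.tanh(self.Ws @ y_prev + self.Us @ (r * s_prev))  # (4) candidate state
        s = (1 - z) * s_prev + z * s_tilde                            # (5) new hidden state
        y = sigmoid(self.Wo @ s)                                      # (6) output
        return y, s
```

Iterating `step` over n time steps, feeding each output and state back in, yields the first output sequence described above.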
Here, obtaining the second output sequence based on the sample features and the second decoder of the first submodel 302 of the system anomaly prediction model includes: determining the current output of the second decoder from the hidden space vector of the encoder, the historical hidden space vector of the second decoder, and the historical output of the second decoder.
For example, at time t, the hidden space vector of the encoder is passed through the attention module to obtain the weighted sum c_t of the attention mechanism. The weighted sum c_t, the hidden state s^2_{t-1} of the second decoder at the previous time step, and the output y^2_{t-1} of the second decoder at the previous time step are input to the second decoder to obtain the output y^2_t at the current time step. This process can be expressed as y^2_t = GRU2(s^2_{t-1}, y^2_{t-1}, c_t), where GRU2 is the nonlinear transformation function of the second decoder. In this way, the sample features are input into the first submodel 302, and the second output sequence Y^2 = (y^2_1, y^2_2, …, y^2_n) is obtained through the second decoder.
The weighted sum c_t of the attention mechanism is obtained as follows:

α_{tj} = exp(e_{tj}) / Σ_k exp(e_{tk}) (7)
c_t = Σ_j α_{tj} h_j (8)

Here, e_{tj} represents the correlation between the hidden space vector s^2_{t-1} of the second decoder and the encoder hidden space vector h_j, and α_{tj} is a weight coefficient representing the importance of the encoder hidden space vector h_j.
Wherein $y_t^2=\mathrm{GRU}_2(s_{t-1}^2,y_{t-1}^2,c_t)$, and the updating process of $y_t^2$ comprises the following steps:

$z_t^2=\sigma(W_z^2 y_{t-1}^2+U_z^2 s_{t-1}^2+C_z c_t)$ (9)

$r_t^2=\sigma(W_r^2 y_{t-1}^2+U_r^2 s_{t-1}^2+C_r c_t)$ (10)

$\tilde{s}_t^2=\tanh\!\left(W_s^2 y_{t-1}^2+U_s^2(r_t^2\odot s_{t-1}^2)+C_s c_t\right)$ (11)

$s_t^2=(1-z_t^2)\odot s_{t-1}^2+z_t^2\odot\tilde{s}_t^2$ (12)

$y_t^2=\sigma(W_o^2\cdot s_t^2)$ (13)

wherein $W_z^2$, $W_r^2$, $W_s^2$, $W_o^2$, $U_z^2$, $U_r^2$, $U_s^2$, $C_z$, $C_r$ and $C_s$ are the related weight matrices, $z_t^2$ is the update gate, $r_t^2$ is the reset gate, $\tilde{s}_t^2$ is the candidate hidden state, $\tanh$ represents the Tanh function, $\sigma$ represents the Sigmoid function, and $\odot$ represents element-wise multiplication of matrix elements.
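The attention weighted sum $c_t$ and the context-conditioned GRU update described above can be sketched in NumPy as follows. The additive score form $e_{tj}=v_a^{T}\tanh(W_a s_{t-1}^2+U_a h_j)$ is an assumption for illustration; the text only states that $e_{tj}$ measures the correlation between $s_{t-1}^2$ and $h_j$:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_context(s_prev, H, Wa, Ua, va):
    """Score each encoder hidden vector h_j against the decoder state,
    softmax the scores into weights alpha_tj, and return their weighted
    sum c_t together with the weights. H is (n, d); the additive score
    parameters Wa, Ua, va are hypothetical."""
    e = np.array([va @ np.tanh(Wa @ s_prev + Ua @ h) for h in H])
    alpha = softmax(e)
    return alpha @ H, alpha

def gru2_step(y_prev, s_prev, c, W, U, C, Wo):
    """One step of the attention decoder: the context c_t enters every
    gate through the C matrices."""
    sig = lambda x: 1.0 / (1.0 + np.exp(-x))
    z = sig(W['z'] @ y_prev + U['z'] @ s_prev + C['z'] @ c)          # update gate
    r = sig(W['r'] @ y_prev + U['r'] @ s_prev + C['r'] @ c)          # reset gate
    s_tilde = np.tanh(W['s'] @ y_prev + U['s'] @ (r * s_prev) + C['s'] @ c)
    s = (1 - z) * s_prev + z * s_tilde                               # new hidden state
    y = sig(Wo @ s)                                                  # decoder output
    return y, s
```

The attention weights always sum to one, so $c_t$ is a convex combination of the encoder hidden vectors.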
In the above embodiment, the first sub-model includes a normal decoder and a decoder with an attention mechanism. By introducing the attention mechanism, different parts of the input sequence are given different importance, so that the correlation information of the samples in time sequence is captured more effectively, and the prediction accuracy of the system anomaly prediction model is improved.
In some embodiments, said obtaining a third output sequence based on said first output sequence, said second output sequence, and said sample features comprises:
determining a first reconstruction error based on the first output sequence and the sample features;
determining a second reconstruction error based on the second output sequence and the sample features;
and splicing the first reconstruction error, the second reconstruction error and the hidden space vector of the last time step of the encoder in the first submodel to obtain a third output sequence.
Here, the first reconstruction error $e_1$ is calculated as:

$e_1=\dfrac{1}{D}\lVert X-Y_1\rVert_2$ (14)

wherein $Y_1$ is the first output sequence output by the first decoder, $X$ represents the sample features input into the first submodel, and $D$ represents the dimension of $X$.
Here, the second reconstruction error $e_2$ is calculated as:

$e_2=\dfrac{1}{D}\lVert X-Y_2\rVert_2$ (15)

wherein $Y_2$ is the second output sequence output by the second decoder, $X$ represents the sample features input into the first submodel, and $D$ represents the dimension of $X$.
Here, the first reconstruction error $e_1$, the second reconstruction error $e_2$ and the hidden space vector $h_n$ of the last time step of the encoder in the first submodel are spliced to obtain a third output sequence $z=[h_n,e_1,e_2]$. The third output sequence is the output sequence of the first sub-model and serves as the input sequence of the second sub-model.
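A minimal sketch of assembling the third output sequence, assuming the reconstruction errors are the 1/D-normalized 2-norms of the residuals (D being the dimension of X, as stated in the text):

```python
import numpy as np

def third_output(X, Y1, Y2, h_n):
    """Build z = [h_n, e1, e2]: the two decoders' dimension-normalized
    reconstruction errors concatenated with the encoder's last hidden
    space vector h_n."""
    D = X.size
    e1 = np.linalg.norm(X - Y1) / D   # first decoder's reconstruction error
    e2 = np.linalg.norm(X - Y2) / D   # second decoder's reconstruction error
    return np.concatenate([h_n, [e1, e2]])
```

The resulting vector is what the second sub-model receives, so its dimension is that of $h_n$ plus two.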
In some embodiments, the step 103 of processing the sample features through the system anomaly prediction model to obtain the predicted energy includes:
the system anomaly prediction model training device clusters the third output sequence by using a second submodel of the system anomaly prediction model to obtain K clusters, wherein K is a positive integer;
determining a mean and a covariance of the mth cluster based on samples within the mth cluster, 0< m ≦ K;
based on the mean and covariance, a predicted energy for the sample is determined.
Here, the second submodel is used to perform parameter estimation on the sample characteristics, and the predicted energy of the sample is determined according to the estimated parameters.
For example, the second submodel includes a Gaussian mixture model (GMM) based on K-means clustering and an estimation network, and the second submodel is used to determine the parameters of the Gaussian distributions corresponding to the third output sequence, thereby determining the predicted energy of the sample.
In some embodiments, said determining a mean and covariance of said mth cluster based on samples within said mth cluster comprises:
estimating the third output sequence by using the second submodel of the system anomaly prediction model, and determining the probability that the samples in the third output sequence belong to each distribution;
determining a mean and covariance of the mth cluster based on samples within the mth cluster and a probability that the samples belong to each distribution.
Here, the third output sequence is estimated by using the estimation network of the second submodel of the system anomaly prediction model, and the probability $\hat{\gamma}$ of the third output sequence belonging to each distribution in the GMM is determined.

The estimation network is a multilayer fully-connected neural network, and the last layer of the neural network is a normalization function layer. When the probability that the third output sequence belongs to each distribution in the GMM is estimated by using the multilayer fully-connected neural network, estimating the probability $\hat{\gamma}$ that the third output sequence belongs to each distribution is modeled as a multi-classification problem, and the probability of each distribution is obtained by the multilayer fully-connected neural network:

$\hat{\gamma}=\mathrm{softmax}\!\left(\mathrm{MLP}(z)\right)$ (16)
the MLP is a nonlinear function of a multilayer fully-connected neural network, the number of neurons in an input layer of the fully-connected neural network is the same as the z dimension, the number of neurons in an output layer is K, an activation function is a normalization function, and the value of K is obtained by observing the distribution characteristics of a log sequence. For example, if the log sequence is divided into a normal log sequence and an abnormal log sequence, then K is taken to be 2.
It should be noted that the number of neurons in the output layer is equal to the number of clusters after clustering, and the samples in each cluster after clustering correspond to one distribution output by the multilayer fully-connected neural network, that is, the mth cluster corresponds to the mth distribution.
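The estimation network described above (a multilayer fully-connected network whose last layer is a normalization layer with K units) can be sketched as follows. The tanh hidden activation and the layer sizes in the test are assumptions, since the text only fixes the input width (the dimension of z) and the K output units:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def estimation_network(z, weights, biases):
    """Minimal multilayer fully-connected estimation network: tanh hidden
    layers followed by a softmax output layer whose K units give the
    membership probabilities of z in each of the K distributions."""
    h = z
    for Wl, bl in zip(weights[:-1], biases[:-1]):
        h = np.tanh(Wl @ h + bl)                  # hidden layers (assumed tanh)
    return softmax(weights[-1] @ h + biases[-1])  # normalization output layer
```

Because of the softmax layer, the output is a proper probability vector over the K distributions.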
Determining a mean and covariance of the mth cluster based on samples within the mth cluster and a probability that the samples belong to each distribution.
The probability $\gamma$ that each sample in the third output sequence belongs to each distribution is determined by using the estimation network. Based on the samples in the mth cluster and the probability that the samples in the mth cluster belong to the mth distribution, the mean $\hat{\mu}_m$ and covariance $\hat{\Sigma}_m$ of the mth cluster are determined according to the calculation method of the Gaussian mixture model parameters, for example:

$\hat{\mu}_m=\dfrac{\sum_{i}\gamma_{im}z'_i}{\sum_{i}\gamma_{im}}$ (17)

$\hat{\Sigma}_m=\dfrac{\sum_{i}\gamma_{im}(z'_i-\hat{\mu}_m)(z'_i-\hat{\mu}_m)^{T}}{\sum_{i}\gamma_{im}}$ (18)

wherein $\gamma_{im}$ represents the probability that the clustered intra-cluster sample $z'_i$ belongs to the mth distribution, $\hat{\mu}_m$ represents the mean of the mth cluster, and $\hat{\Sigma}_m$ represents the covariance of the mth cluster.
Here, the determining the predicted energy of the sample based on the mean and covariance comprises: determining the predicted energy of the sample based on the probability, mean and covariance of the K distributions. Specifically, according to the calculation method of the Gaussian distribution, the probabilities that all samples belong to the mth distribution are averaged to obtain the probability $\hat{\phi}_m$ of the mth distribution, and the formula of $\hat{\phi}_m$ is:

$\hat{\phi}_m=\dfrac{1}{W}\sum_{i=1}^{W}\gamma_{im}$ (19)

wherein $W$ is the number of samples input to the second submodel of the system anomaly prediction model.
Here, from the parameters obtained by the estimation, namely the probability $\hat{\phi}_m$ of each distribution, the mean $\hat{\mu}_m$ and the covariance $\hat{\Sigma}_m$, the predicted energy $E(z)$ of each sample may be calculated as follows:

$E(z)=-\log\!\left(\sum_{m=1}^{K}\hat{\phi}_m\dfrac{\exp\!\left(-\frac{1}{2}(z-\hat{\mu}_m)^{T}\hat{\Sigma}_m^{-1}(z-\hat{\mu}_m)\right)}{\sqrt{\lvert 2\pi\hat{\Sigma}_m\rvert}}\right)$ (20)
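A compact NumPy sketch of the Gaussian-mixture parameter estimation and per-sample energy described above. The small ridge term added to each covariance is an implementation detail to keep the matrices invertible, not something stated in the text:

```python
import numpy as np

def gmm_parameters(Z, gamma):
    """Estimate mixture probabilities phi_m, means mu_m and covariances
    Sigma_m from W samples Z (W x d) and membership probabilities
    gamma (W x K)."""
    Wn = gamma.sum(axis=0)                 # soft sample counts per distribution
    phi = Wn / Z.shape[0]                  # averaged membership probabilities
    mu = (gamma.T @ Z) / Wn[:, None]       # weighted means
    Sigma = []
    for m in range(gamma.shape[1]):
        diff = Z - mu[m]
        outer = np.einsum('ni,nj->nij', diff, diff)
        Sigma.append((gamma[:, m, None, None] * outer).sum(0) / Wn[m])
    return phi, mu, np.array(Sigma)

def sample_energy(z, phi, mu, Sigma, eps=1e-6):
    """Negative log of the GMM density at z; eps ridges the covariances
    and guards the log."""
    d = z.size
    dens = 0.0
    for m in range(len(phi)):
        S = Sigma[m] + eps * np.eye(d)
        diff = z - mu[m]
        quad = diff @ np.linalg.solve(S, diff)
        dens += phi[m] * np.exp(-0.5 * quad) / np.sqrt(np.linalg.det(2 * np.pi * S))
    return -np.log(dens + eps)
```

A sample close to a cluster mean gets low energy, while a sample far from all clusters gets high energy, which is what the anomaly decision relies on.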
In the above embodiment, parameter estimation is performed on the samples by using the estimation network and the clustering model in the second sub-model, so that the estimated sample distribution is more accurate, and no labeled samples are required to participate in training.
In some embodiments, the step 104 of constructing an objective function based on the predicted energy comprises:
determining a reconstruction loss based on the first output sequence, the second output sequence, and the sample characteristics;
and constructing an objective function based on the reconstruction loss and the predicted energy.
In the first sub-model, the performance of the encoder and the decoders is described by the first reconstruction error and the second reconstruction error. Taking $W$ samples as the training samples for training the system anomaly prediction model, the formula of the reconstruction loss $L_{rc}$ is:

$L_{rc}=\dfrac{1}{W}\sum_{i=1}^{W}\left(\lVert x_i-y_i^1\rVert_2+\lVert x_i-y_i^2\rVert_2\right)$ (21)

wherein $\lVert\cdot\rVert_2$ denotes the 2-norm, $y_i^1$ represents the result of the first decoder reconstructing $x_i$, and $y_i^2$ represents the result of the second decoder reconstructing $x_i$.
And constructing an objective function based on the reconstruction loss and the predicted energy.
Here, the objective function $L$ is:

$L=L_{rc}+\dfrac{\lambda_1}{W}\sum_{i=1}^{W}E(z_i)$ (22)

wherein $\lambda_1$ is a hyper-parameter, and $\sum_{i=1}^{W}E(z_i)$ is the total predicted energy of the $W$ samples.
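The objective, combining the mean reconstruction loss of the two decoders with the mean sample energy, can be sketched as follows; the default value of the hyper-parameter lam1 is illustrative:

```python
import numpy as np

def objective(X, Y1, Y2, energies, lam1=0.1):
    """Mean per-sample reconstruction loss of both decoders plus lam1
    times the mean predicted energy over the W training samples."""
    W = len(X)
    L_rc = sum(np.linalg.norm(x - y1) + np.linalg.norm(x - y2)
               for x, y1, y2 in zip(X, Y1, Y2)) / W
    return L_rc + lam1 * np.mean(energies)
```

With perfect reconstruction and zero energies the objective is zero, and either term pushes it up independently.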
In the embodiment, in the system anomaly prediction model training, the reconstruction loss and the sample energy are introduced into the objective function to perform model training, and the required training samples are label-free data and belong to unsupervised learning.
In another aspect of the present invention, a method for predicting a system status is provided, please refer to fig. 2, where the method includes:
step 201, a system state prediction device acquires sample characteristics of a system;
step 202, the system state prediction device determines energy corresponding to the sample characteristics based on a system abnormity prediction model;
in step 203, the system state prediction device determines the state of the system based on the energy.
Here, the system state prediction means determines that the system is abnormal in a case where the energy is greater than a preset energy threshold, or determines that the system is normal in a case where the energy is less than or equal to the energy threshold.
For example, a sample with energy $E(z)>\theta$ is determined to be an abnormal sample, and the system state prediction device determines that an anomaly will occur during the operation of the system.
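The prediction-phase decision described above reduces to a threshold comparison, sketched here with theta standing for the preset energy threshold:

```python
def predict_state(energy, theta):
    """Energy strictly above the preset threshold theta flags the sample
    as abnormal; otherwise the system is considered normal."""
    return "abnormal" if energy > theta else "normal"
```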
In some embodiments, before said determining the energy corresponding to the sample feature based on the system anomaly prediction model in step 202, the method further comprises:
the system abnormity prediction obtains training sample characteristics;
initializing the system abnormity prediction model according to the set weight parameters;
processing the training sample characteristics through the system anomaly prediction model to obtain the prediction energy of the training sample;
constructing an objective function based on the predicted energies of the training samples;
in back propagation, updating the weight parameters of the system anomaly prediction model through the objective function.
In the above embodiment, when the system state prediction apparatus performs prediction by using a trained system anomaly prediction model, it is determined whether an anomaly of the system is about to occur by calculating the energy of the sample, and comparing the energy of the sample with a preset energy threshold, so as to perform unsupervised anomaly prediction.
In another aspect of the embodiment of the present application, another method for training a system anomaly prediction model is provided, so as to further understand the model training method provided in the embodiment of the present application. The description takes as an example samples drawn from the log data generated by each component of a Hadoop system over a period of time, the log data being composed of a plurality of logs. Referring to fig. 3, the system anomaly prediction model includes a first sub-model 302 and a second sub-model 303, and the sample features are processed based on the first sub-model 302 and the second sub-model 303 to obtain the predicted energy, wherein the first sub-model 302 includes a GRU layer with a dual decoder and an attention mechanism, and the second sub-model 303 is a Gaussian mixture model based on K-means clustering and an estimation network. In addition, the system anomaly prediction model may further include an embedding layer 301, where the embedding layer 301 is used to obtain the sample features. The system anomaly prediction model training method applied to the system anomaly prediction model training device comprises the following steps:
step 401, obtaining sample characteristics by using an embedding layer 301;
step 402, initializing the system anomaly prediction model according to the set weight parameters;
step 403, processing the sample characteristics through the first submodel 302 and the second submodel 303 to obtain predicted energy;
step 404, constructing an objective function based on the predicted energy;
step 405, updating the weight parameters of the first submodel 302 and the second submodel 303 of the system anomaly prediction model through the objective function in back propagation.
Here, the objective function is used as the loss function, and an adaptive moment estimation (Adam) optimizer is used to update the weight parameters in the first sub-model and the second sub-model by calculating the gradient of the loss function.
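A single Adam update on one parameter tensor can be sketched as follows (standard Adam defaults; the model itself would hold one such optimizer state per weight matrix):

```python
import numpy as np

def adam_step(param, grad, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient (m)
    and its square (v), bias-corrected, then a scaled step against the
    gradient. state = {'t': int, 'm': array, 'v': array} is mutated."""
    state['t'] += 1
    state['m'] = b1 * state['m'] + (1 - b1) * grad
    state['v'] = b2 * state['v'] + (1 - b2) * grad ** 2
    m_hat = state['m'] / (1 - b1 ** state['t'])   # bias correction
    v_hat = state['v'] / (1 - b2 ** state['t'])
    return param - lr * m_hat / (np.sqrt(v_hat) + eps)
```

Each parameter moves against the sign of its gradient, with a per-coordinate step size adapted by the second-moment estimate.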
In the embodiment, vectorization, feature extraction and anomaly prediction of the log are integrated into the system anomaly prediction model, so that the system anomaly prediction model is more convenient to predict and is suitable for different data. And the target function constructed based on the predicted energy performs joint optimization on the whole through a back propagation algorithm, so that the final result can be ensured to be optimal.
In some embodiments, before the sample features are acquired by using the embedding layer 301 in step 401, the method further includes a step of preprocessing the samples.
Here, the preprocessing of the samples is to convert a sample format into a data format input by the system anomaly prediction model. Wherein the preprocessing comprises transforming and clustering the samples. Taking a sample as log data as an example for explanation, referring to table 1, a format of a log is (timestamp, level, class, information message), and the converting the log data includes: and cleaning each log in the log data and performing word segmentation on the cleaned log.
Table 1 log preprocessing example
Specifically, the cleaning of each log in the log data includes replacing variable values in the log by using a regular expression matching method based on the specific format of the log and manual experience. The variables to be replaced include IP (internet protocol) addresses, timestamps, log levels, paths/URLs (uniform resource locators), decimal numbers, hexadecimal numbers, the block number identifier block_id of the Hadoop Distributed File System (HDFS), the application number identifier application_id, the job number identifier jobid, the task number identifier task_id, the container number identifier container_id, and the like.
Specifically, the word segmentation of the cleaned log includes segmenting the log with the variables removed. Unlike the segmentation of natural language, the segmentation of logs uses more separators; in the embodiment of the present application, symbols such as "#", "\", "-", """, "(", ")", ",", "?" and "!" are used as word separators. It should be noted that the separators are not limited to the symbols exemplified above.
In addition, after the log data is converted, each log log_i is converted into a word list tokens_i = {t_1, t_2, …, t_n}, where t_j represents a character string of indefinite length. The word lists of all logs together form a list set total = {tokens_1, tokens_2, …, tokens_n}. Since log data contains many noise interferences, each tokens_i also needs to be filtered. The filtering rules may include the following: there are many randomly generated strings of indefinite length in the log, composed of lower-case letters and numbers; in the variable-removal step, the numbers in the log data are replaced by "#". For example, the random string "1a8fb23e6" is processed into "#a#fb#e#" by the variable-removal step, and after word segmentation {"a", "fb", "e"} is obtained. Therefore, for each tokens_i, the t_j of a preset length are removed. For example, since many random character strings composed of lower-case letters with very short lengths (mostly of length 1) are obtained after word segmentation, the preset length is set to 1, that is, each t_j of length 1 is removed from each tokens_i.
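The cleaning, word-segmentation and filtering steps can be sketched with regular expressions as follows. The patterns cover only a few variable kinds (IP addresses with an optional port, hexadecimal and decimal numbers) as an illustration; a real deployment would also replace timestamps, paths/URLs and the HDFS/YARN identifiers listed above:

```python
import re

# Illustrative variable patterns only: IP[:port], hexadecimal, decimal.
VAR_PATTERNS = [
    (re.compile(r'\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b'), '#'),
    (re.compile(r'0x[0-9a-fA-F]+'), '#'),
    (re.compile(r'\d+'), '#'),
]
# A subset of the separator symbols listed above, plus whitespace.
SEPARATORS = re.compile(r'[#\\\-"()\s,?!]+')

def clean_and_tokenize(log_line, min_len=2):
    """Replace variable values with '#', split on the separator set, and
    drop tokens shorter than min_len (the text removes the length-1
    fragments left over from random strings)."""
    for pattern, repl in VAR_PATTERNS:
        log_line = pattern.sub(repl, log_line)
    return [t for t in SEPARATORS.split(log_line) if len(t) >= min_len]
```

Running it on the random string example above leaves only the fragments of length at least two.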
Here, the logs are often different from one another; even logs output by the same log print statement in the application program differ depending on the variables in the log. In order to reduce the complexity of log analysis, the converted log data is clustered and each log in the processed log data is assigned to a corresponding cluster. After the clustering is finished, each cluster is given a label p_i, and the logs in each cluster are given the label p_i of that cluster. Thus, logs originating from similar log print statements in the application program are marked with the same label and treated as the same log in subsequent analysis, so that tens of millions of different logs can be reduced to thousands of different labels, which greatly reduces the complexity of analysis. For example, the edit distance is used as the distance measure for clustering, that is, each token list is regarded as a sentence composed of several words, and the distance between any two logs is the edit distance between their two token sequences. The converted log sequences are clustered using the OPTICS (Ordering points to identify the clustering structure) algorithm.
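The edit distance between two token lists, used above as the clustering distance measure, can be sketched with the standard dynamic-programming recurrence:

```python
def edit_distance(a, b):
    """Levenshtein distance between two token lists: each token list is
    treated as a sentence of words, and the distance counts the minimum
    number of token insertions, deletions and substitutions."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))             # distances from the empty prefix
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(prev[j] + 1,                               # deletion
                         cur[j - 1] + 1,                            # insertion
                         prev[j - 1] + (a[i - 1] != b[j - 1]))      # substitution
        prev = cur
    return prev[n]
```

The two-row formulation keeps memory linear in the length of the shorter operand's row, which matters when comparing many log pairs.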
So far, after all the log data are converted and clustered, the log sequence is obtained: the logs are sorted in time order, and each log in the log sequence is represented by the cluster label corresponding to that log rather than by its specific log content.
In some embodiments, the step 401 of obtaining sample features using the embedding layer 301 includes:
step 4011, obtaining a log sequence, and processing the log sequence by using the embedded layer 301 to obtain a vector sequence.
Here, acquiring the log sequence means acquiring the log sequences used for training the system anomaly prediction model. Specifically, the log sequence obtained by the preprocessing is divided into a plurality of sub-sequences according to a sliding time window. Referring to fig. 4, time windows TC and TP of fixed duration are taken and moved over the log sequence with a certain step t to obtain log sub-sequences. For example, with t = 3, the time window TC1 corresponds to the logs at times 1 to 30 and the time window TP1 corresponds to the logs at times 31 to 40; after moving over the log sequence with step 3, the time window TC2 corresponds to the logs at times 4 to 33 and the time window TP2 corresponds to the logs at times 34 to 43. The first log sub-sequence is the log sub-sequence corresponding to the time windows TC1 and TP1, and the second log sub-sequence is the log sub-sequence corresponding to the time windows TC2 and TP2. The log sub-sequence corresponding to the time window TC is the log sequence used for prediction, and the log sequence corresponding to the time window TP is the log sub-sequence whose state is to be predicted; that is, the system anomaly prediction model predicts whether an anomaly occurs in the system during the time window TP by using the log sequence corresponding to the time window TC. The log sub-sequences corresponding to the time windows TC obtained in this way are used as the log sequences for training the system anomaly prediction model.
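The sliding-window split can be sketched as follows, with tc, tp and step defaulting to the 30/10/3 example above; the sequence elements would be the per-log cluster labels:

```python
def sliding_windows(seq, tc=30, tp=10, step=3):
    """Cut a time-ordered label sequence into (TC, TP) sub-sequence pairs:
    TC is the observation window used for prediction, TP the window whose
    state is predicted."""
    pairs = []
    for start in range(0, len(seq) - tc - tp + 1, step):
        pairs.append((seq[start:start + tc],
                      seq[start + tc:start + tc + tp]))
    return pairs
```

With the default parameters, consecutive TC windows overlap by 27 positions, matching the example in the text.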
It should be noted that, because the log sequences used for training are all historical data, the label of each time window TC may be determined by the system operating state during the corresponding TP window, for example, as shown in fig. 4, if a system operation abnormality occurs during the TP window, the time window TC label is 1, and if the system operation is normal during the TP window, the TC label is 0. Therefore, the accuracy of the trained system anomaly prediction model can be verified by using the log subsequence corresponding to the marked TC.
Here, the processing of the log sequence to obtain a vector sequence includes encoding the log sequence by using an integer encoding method and converting the log sequence into the vector sequence. For example, a time window TC contains n logs, and each log corresponds to a label p; the log sub-sequence corresponding to the time window TC can then be represented by the label corresponding to each log as P = {p_log1, p_log2, …, p_logn}. Based on the idea of natural language processing, P is regarded as a sentence and each p_logi is regarded as a word, and a one-hot encoding method is adopted to encode each word p_logi, thereby converting the log sequence into a vector sequence.
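Encoding a window of cluster labels into a one-hot vector sequence can be sketched as follows; the label vocabulary is assumed to be fixed in advance:

```python
import numpy as np

def encode_window(labels, vocab):
    """Integer-encode the cluster labels of one TC window against a fixed
    label vocabulary, then expand each label into a one-hot vector,
    yielding the vector sequence fed to the embedding layer."""
    index = {p: i for i, p in enumerate(vocab)}   # integer encoding
    onehot = np.zeros((len(labels), len(vocab)))
    for t, p in enumerate(labels):
        onehot[t, index[p]] = 1.0                 # one-hot expansion
    return onehot
```

Each row of the result is the one-hot vector of one log's cluster label, in time order.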
And step 4012, extracting features of the vector sequence by using the embedding layer 301 to obtain sample features.
Here, the embedding layer 301 includes a neural network. The vector sequence obtained in step 4011 is input into the neural network of the embedding layer 301, and the vectors in the vector sequence are subjected to feature extraction based on a word embedding algorithm and converted into feature vectors x_i of fixed length, thereby obtaining the sample features. The word embedding algorithm may adopt Word2Vec (word to vector) or GloVe (Global Vectors for Word Representation), but is not limited to the above algorithms. Taking the vector sequence corresponding to one time window TC as an example, the vector sequence comprising n vectors, the neural network of the embedding layer 301 converts each vector into a 64-dimensional feature vector, and the sample features are represented by X = (x_1, x_2, …, x_n), where each x_i is 64-dimensional.
It should be noted that the model parameters of the neural network in the embedding layer 301 are trained in advance by a word embedding model using massive data. When the system anomaly prediction model is trained, these neural network parameters are not updated.
In some embodiments, referring to fig. 3, in step 403, the processing the sample features by the first sub-model 302 and the second sub-model 303 to obtain the predicted energy includes:
step 4031a, a first output sequence is obtained based on the sample characteristics and a first decoder of a first submodel 302 of the system anomaly prediction model;
step 4031b, a second output sequence is obtained based on the sample characteristics and a second decoder of the first submodel 302 of the system anomaly prediction model;
step 4032, a third output sequence is obtained based on the first output sequence, the second output sequence and the sample characteristics.
Here, the system anomaly prediction model training apparatus captures the sequential-order information of the log sequence by using the first sub-model 302. The first sub-model 302 includes an encoder and two decoders; specifically, the first sub-model includes a GRU layer with a dual decoder and an attention mechanism, and the encoder is composed of unidirectionally stacked GRUs, for example, with 64 GRU units per encoder layer.
The GRU-layer encoder may include one or more encoding layers. For example, in order to achieve better practical effects, the GRU-layer encoder includes two encoding layers; the description takes as an example the sample features of the log sub-sequence, containing n logs, corresponding to one time window TC. The sample features $X=(x_1,x_2,\ldots,x_n)$ obtained in step 401 are input into the encoder of the first sub-model 302 and, after passing through the two layers of unidirectionally stacked GRU units, the first hidden space vector $h^1=(h_1^1,h_2^1,\ldots,h_n^1)$ of the first encoding layer and the second hidden space vector $h^2=(h_1^2,h_2^2,\ldots,h_n^2)$ of the second encoding layer are obtained respectively. The first hidden space vector $h^1$ and the second hidden space vector $h^2$ are spliced to obtain the hidden space vector $H=(h_1,h_2,\ldots,h_n)$, wherein $h_i=\mathrm{Concat}(h_i^1,h_i^2)$, Concat represents the splicing function, and n is the number of time steps of the GRU.
The sample features X obtained in step 401 are input into the first submodel 302, and the process of obtaining the third output sequence may be represented as:
z=Attentive_GRU(X) (23)
where z denotes a third output sequence and Attentive _ GRU denotes a non-linear transformation function of the first submodel 302.
In some embodiments, the first decoder comprises a normal decoder, and the step 4031a of obtaining the first output sequence based on the sample features and the first decoder of the first submodel 302 of the system anomaly prediction model comprises: determining the current output of the first decoder according to the historical hidden space vector of the first decoder of the first submodel and the historical output of the first decoder.
For example, taking time t as an example, the hidden state $s_{t-1}^1$ of the first decoder at the previous time and the output $y_{t-1}^1$ of the first decoder at the previous time are input into the first decoder to obtain the output $y_t^1$ at the current time. The process can be expressed as $y_t^1=\mathrm{GRU}_1(s_{t-1}^1,y_{t-1}^1)$, wherein $\mathrm{GRU}_1$ is the nonlinear transformation function of the first decoder, $s_0^1=h_n$, and $h_n$ is the hidden space vector at the last time step of the encoder. Thus, the sample features are input into the first submodel 302, and a first output sequence $Y_1=(y_1^1,y_2^1,\ldots,y_n^1)$ is obtained by the first decoder.
Wherein $y_t^1=\mathrm{GRU}_1(s_{t-1}^1,y_{t-1}^1)$, and the updating process of $y_t^1$ comprises the following steps:

$z_t^1=\sigma(W_z^1 y_{t-1}^1+U_z^1 s_{t-1}^1)$ (24)

$r_t^1=\sigma(W_r^1 y_{t-1}^1+U_r^1 s_{t-1}^1)$ (25)

$\tilde{s}_t^1=\tanh\!\left(W_s^1 y_{t-1}^1+U_s^1(r_t^1\odot s_{t-1}^1)\right)$ (26)

$s_t^1=(1-z_t^1)\odot s_{t-1}^1+z_t^1\odot\tilde{s}_t^1$ (27)

$y_t^1=\sigma(W_o^1\cdot s_t^1)$ (28)

wherein $W_z^1$, $W_r^1$, $W_s^1$, $W_o^1$, $U_z^1$, $U_r^1$ and $U_s^1$ are the related weight matrices, $z_t^1$ is the update gate, $r_t^1$ is the reset gate, $\tilde{s}_t^1$ is the candidate hidden state, $\tanh$ represents the Tanh function, $\sigma$ represents the Sigmoid function, and $\odot$ represents element-wise multiplication of matrix elements.
In some embodiments, the second decoder comprises a decoder with an attention mechanism, and the obtaining a second output sequence based on the sample features and the second decoder of the first sub-model 302 of the system anomaly prediction model in step 4031b comprises determining a current output of the second decoder according to the hidden spatial vector of the encoder, the historical hidden spatial vector of the first decoder, and the historical output of the second decoder.
For example, taking time t as an example, after the hidden space vector of the encoder passes through the attention module, the weighted sum $c_t$ of the attention mechanism is obtained. The weighted sum $c_t$, the hidden state $s_{t-1}^2$ of the second decoder at the previous time, and the output $y_{t-1}^2$ of the second decoder at the previous time are input into the second decoder to obtain the output $y_t^2$ at the current time. The process can be represented as $y_t^2=\mathrm{GRU}_2(s_{t-1}^2,y_{t-1}^2,c_t)$, wherein $\mathrm{GRU}_2$ is the nonlinear transformation function of the second decoder. Thus, the sample features are input into the first submodel 302, and a second output sequence $Y_2=(y_1^2,y_2^2,\ldots,y_n^2)$ is obtained by the second decoder.
Wherein the weighted sum $c_t$ of the attention mechanism is obtained as follows:

$\alpha_{tj}=\dfrac{\exp(e_{tj})}{\sum_{k=1}^{n}\exp(e_{tk})}$ (29)

$c_t=\sum_{j=1}^{n}\alpha_{tj}h_j$ (30)

Here, $e_{tj}$ represents the correlation between the hidden space vector $s_{t-1}^2$ of the second decoder and the encoder hidden space vector $h_j$, and $\alpha_{tj}$ is a weight coefficient representing the importance of the encoder hidden space vector $h_j$ to the second decoder at time t.
Wherein $y_t^2=\mathrm{GRU}_2(s_{t-1}^2,y_{t-1}^2,c_t)$, and the updating process of $y_t^2$ comprises the following steps:

$z_t^2=\sigma(W_z^2 y_{t-1}^2+U_z^2 s_{t-1}^2+C_z c_t)$ (31)

$r_t^2=\sigma(W_r^2 y_{t-1}^2+U_r^2 s_{t-1}^2+C_r c_t)$ (32)

$\tilde{s}_t^2=\tanh\!\left(W_s^2 y_{t-1}^2+U_s^2(r_t^2\odot s_{t-1}^2)+C_s c_t\right)$ (33)

$s_t^2=(1-z_t^2)\odot s_{t-1}^2+z_t^2\odot\tilde{s}_t^2$ (34)

$y_t^2=\sigma(W_o^2\cdot s_t^2)$ (35)

wherein $W_z^2$, $W_r^2$, $W_s^2$, $W_o^2$, $U_z^2$, $U_r^2$, $U_s^2$, $C_z$, $C_r$ and $C_s$ are the related weight matrices, $z_t^2$ is the update gate, $r_t^2$ is the reset gate, $\tilde{s}_t^2$ is the candidate hidden state, $\tanh$ represents the Tanh function, $\sigma$ represents the Sigmoid function, and $\odot$ represents element-wise multiplication of matrix elements.
In some embodiments, the step 4032, acquiring a third output sequence based on the first output sequence, the second output sequence and the sample feature, includes:
determining a first reconstruction error based on the first output sequence and the sample features;
determining a second reconstruction error based on the second output sequence and the sample features;
and splicing the first reconstruction error, the second reconstruction error and the hidden space vector of the last time step of the encoder in the first submodel to obtain a third output sequence.
Here, the first reconstruction error $e_1$ is determined from the sample features $X=(x_1,x_2,\ldots,x_n)$ obtained in step 401 and the first output sequence $Y_1=(y_1^1,y_2^1,\ldots,y_n^1)$ obtained in step 4031a.

Wherein the first reconstruction error $e_1$ is calculated as:

$e_1=\dfrac{1}{D}\lVert X-Y_1\rVert_2$ (36)

Here, the second reconstruction error $e_2$ is determined from the sample features $X=(x_1,x_2,\ldots,x_n)$ obtained in step 401 and the second output sequence $Y_2=(y_1^2,y_2^2,\ldots,y_n^2)$ obtained in step 4031b.

Wherein the second reconstruction error $e_2$ is calculated as:

$e_2=\dfrac{1}{D}\lVert X-Y_2\rVert_2$ (37)

Here, the first reconstruction error $e_1$, the second reconstruction error $e_2$ and the hidden space vector $h_n$ of the last time step of the encoder in the first submodel are spliced to obtain the third output sequence $z=[h_n,e_1,e_2]$, wherein the third output sequence is the output sequence of the first submodel 302 and serves as the input sequence of the second submodel 303.
It should be noted that z is a third output sequence of sample features of the log sub-sequence corresponding to one time window TC.
In some embodiments, the step 403 of processing the sample features through the first sub-model 302 and the second sub-model 303 to obtain the predicted energy includes:

clustering the third output sequence by using the second sub-model of the system anomaly prediction model to obtain K clusters, wherein K is a positive integer;
determining a mean and a covariance of the mth cluster based on samples within the mth cluster, 0< m ≦ K;
based on the mean and covariance, a predicted energy for the sample is determined.
Here, the second sub-model includes a Gaussian mixture model (GMM) based on K-means clustering and an estimation network, and the second sub-model is used to determine the parameters of the Gaussian distributions corresponding to the third output sequence, so as to determine the predicted energy of the sample.
Taking the log subsequences corresponding to W time windows TC as the log sequences for training the system anomaly prediction model, the W third output sequences (z1, z2, …, zW) are input into the second sub-model of the system anomaly prediction model, and the W third output sequences are clustered based on a clustering algorithm to obtain K clusters, where the samples in each cluster are denoted (z'1, z'2, …).
For example, 100 third output sequences are input into the second submodel; using the K-means clustering algorithm with Euclidean distance as the similarity measure, 2 clusters are obtained, where the samples in one cluster are (z'1, z'2, …, z'35). Here, z'i corresponds to a third output sequence zr with r ≤ 100.
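The clustering step in the example above can be sketched as follows; a minimal self-contained K-means in numpy (illustrative only, not the patent's implementation), using Euclidean distance as the similarity measure:

```python
import numpy as np

def kmeans(Z, K, iters=20, seed=0):
    """Minimal K-means: cluster the W third output sequences (rows of Z)
    into K clusters using Euclidean distance."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=K, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each sample to its nearest center.
        labels = np.argmin(((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        # Move each center to the mean of its assigned samples.
        for k in range(K):
            if np.any(labels == k):
                centers[k] = Z[labels == k].mean(axis=0)
    return labels, centers

# Two well-separated groups of third output sequences; K = 2 recovers them.
Z = np.vstack([np.zeros((5, 3)), 10.0 + np.zeros((5, 3))])
labels, centers = kmeans(Z, K=2)
```

Each of the two separated groups ends up in its own cluster, matching the "2 clusters" example in the text.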
Wherein the determining a mean and a covariance of the m-th cluster based on the samples within the m-th cluster comprises:
estimating the third output sequence by using the second submodel of the system anomaly prediction model, and determining the probability that each sample in the third output sequence belongs to each distribution.
Here, the third output sequence is estimated by the estimation network of the second submodel of the system anomaly prediction model, and the probability γ that the third output sequence belongs to each distribution in the GMM is determined.
The estimation network is a multilayer fully-connected neural network whose last layer is a normalization function layer. Estimating the probability γ that the third output sequence belongs to each distribution in the GMM with the multilayer fully-connected neural network is modeled as a multi-classification problem, and the probability of each distribution is obtained by the multilayer fully-connected neural network:
γ = softmax(MLP(z))
where MLP is the nonlinear function of the multilayer fully-connected neural network; the number of neurons in the input layer of the multilayer fully-connected neural network is the same as the dimension of z, the number of neurons in the output layer is K, and the activation function of the output layer is a normalization function. The value of K is obtained by observing the distribution characteristics of the log sequence. For example, if the log sequences are divided into normal log sequences and abnormal log sequences, then K is taken to be 2.
It should be noted that the number of neurons in the output layer is equal to the number of clusters after clustering, and the samples in each cluster after clustering correspond to one distribution output by the multilayer fully-connected neural network; that is, the m-th cluster corresponds to the m-th distribution.
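The estimation network described above can be sketched as a small fully-connected network whose final normalization layer is a softmax; a minimal numpy illustration under the stated assumptions (layer sizes and weights are placeholders, not from the patent):

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def estimation_network(z, W1, b1, W2, b2):
    """Fully-connected estimation network: input width equals dim(z),
    output width equals K, and the final softmax layer turns each output
    row into membership probabilities over the K distributions."""
    h = np.tanh(z @ W1 + b1)       # hidden layer
    return softmax(h @ W2 + b2)    # gamma: probability of each distribution

rng = np.random.default_rng(0)
d, hidden, K = 5, 8, 2             # e.g. K = 2 for normal vs. abnormal log sequences
Z = rng.normal(size=(10, d))       # ten third-output-sequence samples
gamma = estimation_network(Z,
                           rng.normal(size=(d, hidden)), np.zeros(hidden),
                           rng.normal(size=(hidden, K)), np.zeros(K))
```

Each row of gamma sums to 1, so it is directly usable as the per-sample distribution membership probability γ.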
Determining a mean and a covariance of the m-th cluster based on the samples within the m-th cluster and the probability that the samples belong to each distribution.
Here, the probability γ that each sample in the third output sequence belongs to each distribution is determined by the estimation network. Based on the samples in the m-th cluster and the probability that those samples belong to the m-th distribution, the mean μ̂m and the covariance Σ̂m of the m-th cluster are determined according to the calculation method of the Gaussian mixture model parameters. For example:
μ̂m = ( Σ_i γim z'i ) / ( Σ_i γim )
Σ̂m = ( Σ_i γim (z'i − μ̂m)(z'i − μ̂m)^T ) / ( Σ_i γim )
wherein γim denotes the probability that sample z'i belongs to the m-th distribution, μ̂m denotes the mean of the m-th cluster, and Σ̂m denotes the covariance of the m-th cluster.
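The γ-weighted mean and covariance above can be computed directly; a minimal numpy sketch (function name and toy data are illustrative, not from the patent):

```python
import numpy as np

def mixture_mean_cov(Z, gamma, m):
    """Gamma-weighted mean and covariance of the m-th distribution,
    following the standard GMM parameter estimates."""
    g = gamma[:, m]                                   # gamma_im for all samples i
    mu = (g[:, None] * Z).sum(axis=0) / g.sum()       # weighted mean
    diff = Z - mu
    cov = (g[:, None, None] * (diff[:, :, None] * diff[:, None, :])).sum(axis=0) / g.sum()
    return mu, cov

Z = np.array([[0.0, 0.0], [2.0, 2.0], [4.0, 0.0]])
gamma = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # hard memberships for clarity
mu0, cov0 = mixture_mean_cov(Z, gamma, 0)
```

With hard memberships the estimates reduce to the ordinary mean and (biased) covariance of the samples assigned to the cluster.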
Here, the determining the predicted energy of the sample based on the mean and the covariance includes: determining the predicted energy of the sample based on the probabilities, means, and covariances of the K distributions. Specifically, according to the calculation method of the Gaussian mixture model parameters, the probabilities that all samples belong to the m-th distribution are averaged to obtain the probability φ̂m of the m-th distribution:
φ̂m = (1/W) Σ_i γim
wherein W is the number of samples input to the second submodel of the system anomaly prediction model.
Here, from the estimated parameters, namely the probability φ̂m, the mean μ̂m, and the covariance Σ̂m of each distribution, the predicted energy E(z) of each sample can be calculated as follows:
E(z) = −log( Σ_m φ̂m · exp( −(1/2)(z − μ̂m)^T Σ̂m⁻¹ (z − μ̂m) ) / √|2π Σ̂m| )
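The energy formula above is the negative log-likelihood of z under the estimated Gaussian mixture; a minimal numpy sketch (single-component toy mixture, names illustrative):

```python
import numpy as np

def sample_energy(z, phi, mus, covs):
    """E(z) = -log( sum_m phi_m * N(z | mu_m, Sigma_m) )."""
    likelihood = 0.0
    for phi_m, mu, cov in zip(phi, mus, covs):
        diff = z - mu
        quad = diff @ np.linalg.inv(cov) @ diff              # Mahalanobis term
        norm = np.sqrt(np.linalg.det(2.0 * np.pi * cov))     # sqrt(|2*pi*Sigma|)
        likelihood += phi_m * np.exp(-0.5 * quad) / norm
    return -np.log(likelihood)

phi = [1.0]                 # single-component mixture for illustration
mus = [np.zeros(2)]
covs = [np.eye(2)]
e_near = sample_energy(np.zeros(2), phi, mus, covs)          # at the mean: low energy
e_far = sample_energy(np.array([5.0, 5.0]), phi, mus, covs)  # far from the mean: high energy
```

Samples far from every mixture component receive a higher energy, which is what the anomaly decision later relies on.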
in some embodiments, the step 404 of constructing an objective function based on the predicted energy comprises:
determining a reconstruction loss based on the first output sequence, the second output sequence, and the sample features.
Here, in the first sub-model 302, the performance of the encoder and the two decoders is described by the first reconstruction error and the second reconstruction error. Taking W log subsequences as the log sequences for training the system anomaly prediction model, the reconstruction loss Lrc is calculated as:
Lrc = (1/W) Σ_i ( ||xi − y1_i||₂² + ||xi − y2_i||₂² )
wherein || · ||₂ denotes the 2-norm, y1_i denotes the result of the first decoder reconstructing xi, and y2_i denotes the result of the second decoder reconstructing xi.
An objective function is then constructed based on the reconstruction loss and the predicted energy.
Here, the objective function L is:
L = Lrc + λ1 Σ_i E(zi)
wherein λ1 is a hyperparameter and Σ_i E(zi) is the total predicted energy of the W samples.
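The objective combining the reconstruction loss and the weighted total energy can be sketched as follows (a toy numpy illustration; the value of λ1 and the names are placeholders):

```python
import numpy as np

def objective(X, Y1, Y2, energies, lam1):
    """L = L_rc + lambda_1 * sum_i E(z_i): average squared reconstruction
    error of both decoders plus the weighted total predicted energy."""
    W = len(X)
    l_rc = sum(np.sum((x - y1) ** 2) + np.sum((x - y2) ** 2)
               for x, y1, y2 in zip(X, Y1, Y2)) / W
    return l_rc + lam1 * float(np.sum(energies))

X = [np.ones(3), np.zeros(3)]
loss = objective(X, X, X, energies=[2.0, 2.0], lam1=0.5)  # perfect reconstructions
```

With perfect reconstructions the loss reduces to the weighted energy term alone, so minimizing L pushes both decoders toward accurate reconstruction and the latent representation toward low energy.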
In some embodiments, when the trained system anomaly prediction model is used for prediction, the system state prediction apparatus inputs the log sequence for prediction into the system anomaly prediction model to obtain the energy of the prediction sample.
Here, the system state prediction apparatus inputs the log sequence for prediction into the embedding layer to obtain the sample features, inputs the sample features into the first submodel to obtain the third output sequence, and inputs the third output sequence into the estimation network of the second submodel to determine the probability that the sample belongs to each distribution; together with the mean and covariance of each distribution obtained during training, the energy E(z) of the predicted log sequence is determined.
The state of the system is then determined based on the energy.
Here, the system state prediction apparatus determines that the system is abnormal when the energy is greater than a preset energy threshold; alternatively, the system is determined to be normal when the energy is less than or equal to the energy threshold.
For example, the preset energy threshold is θ. When the log sequence corresponding to the time window TC is input into the anomaly prediction model, the energy E(z) is determined; when E(z) > θ, the prediction sample is determined to be an abnormal sample, and it is determined that the system will become abnormal during the TP period.
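The threshold decision above reduces to a one-line rule; a minimal sketch (names illustrative):

```python
def predict_state(energy, theta):
    """Threshold rule from the text: energy above theta means the system is
    predicted to become abnormal during the TP period, otherwise normal."""
    return "abnormal" if energy > theta else "normal"

state = predict_state(26.8, theta=10.0)  # high energy exceeds the threshold
```

Note the boundary case: energy exactly equal to θ is classified as normal, matching "less than or equal to the energy threshold" in the text.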
In another aspect of the present embodiment, a system anomaly prediction model training apparatus is further provided, referring to fig. 5, the apparatus includes a first obtaining module 401, an initializing module 402, a first processing module 403, a constructing module 404, and an updating module 405, wherein,
the first obtaining module 401 is configured to obtain sample characteristics;
the initialization module 402 is configured to initialize the system anomaly prediction model according to a set weight parameter;
the first processing module 403 is configured to process the sample features through the system anomaly prediction model to obtain prediction energy;
the constructing module 404 is configured to construct an objective function based on the predicted energy;
the updating module 405 is configured to update the weight parameter of the system anomaly prediction model through the objective function in the back propagation.
In some embodiments, the first obtaining module 401 comprises a vector obtaining unit and a feature extracting unit, wherein,
the vector acquisition unit is used for acquiring a log sequence and processing the log sequence to obtain a vector sequence;
and the feature extraction unit is used for extracting features of the vector sequence to obtain sample features.
In some embodiments, the first processing module 403 includes a first sequence obtaining unit, a second sequence obtaining unit, and a third sequence obtaining unit, wherein,
the first sequence obtaining unit is used for obtaining a first output sequence based on the sample characteristics and a first decoder of a first submodel of the system anomaly prediction model;
the second sequence obtaining unit is used for obtaining a second output sequence based on the sample characteristics and a second decoder of a first submodel of the system anomaly prediction model;
the third sequence obtaining unit is configured to obtain a third output sequence based on the first output sequence, the second output sequence, and the sample feature.
In some embodiments, the third sequence acquisition unit comprises a first error determination unit, a second error determination unit, and a stitching unit, wherein,
the first error determination unit is configured to determine a first reconstruction error based on the first output sequence and the sample characteristics;
the second error determination unit is configured to determine a second reconstruction error based on the second output sequence and the sample feature;
and the splicing unit is used for splicing the first reconstruction error, the second reconstruction error and the hidden space vector of the last time step of the encoder in the first sub-model to obtain a third output sequence.
In some embodiments, the first processing module 403 comprises a clustering unit, a determination unit, a prediction energy determination unit, wherein,
the clustering unit is used for clustering the third output sequence by using a second sub-model of the system anomaly prediction model to obtain K clusters, wherein K is a positive integer;
the determining unit is used for determining the mean value and the covariance of the mth cluster based on the samples in the mth cluster, wherein 0< m is less than or equal to K;
the prediction energy determination unit is used for determining the prediction energy of the sample based on the mean value and the covariance.
In some embodiments, the determining unit comprises a probability determining unit and a mean covariance determining unit, wherein,
the probability determining unit is configured to estimate the third output sequence by using the second sub-model of the system anomaly prediction model, and determine the probability that a sample in the third output sequence belongs to each distribution;
the mean covariance determination unit is configured to determine a mean and a covariance of the mth cluster based on the samples in the mth cluster and a probability that the samples belong to each distribution.
In some embodiments, the construction module 404 includes a reconstruction loss determination unit and a function construction unit, wherein,
the reconstruction loss determining unit is used for determining the reconstruction loss according to the first output sequence, the second output sequence and the sample characteristics;
the function construction unit is used for constructing an objective function based on the reconstruction loss and the predicted energy.
In another aspect of the present embodiment, a system state prediction apparatus is further provided. Referring to fig. 6, the apparatus includes a second obtaining module 501, an energy determining module 502, and a state determination module 503, wherein,
the second obtaining module 501 is configured to obtain sample characteristics of the system;
the energy determining module 502 is configured to determine energy corresponding to the sample characteristics based on a system anomaly prediction model;
the state determination module 503 is configured to determine a state of the system based on the energy.
In some embodiments, the state determination module 503 is specifically configured to determine that the system is abnormal when the energy is greater than a preset energy threshold; alternatively, the system is determined to be normal if the energy is less than or equal to the energy threshold.
In some embodiments, the system state prediction device further comprises a training module, wherein,
the second obtaining module 501 is further configured to obtain training sample characteristics;
the energy determining module 502 is further configured to process the training sample features through the system anomaly prediction model to obtain the predicted energy of the training sample;
the training module is specifically used for initializing the system anomaly prediction model according to a set weight parameter; constructing an objective function based on the predicted energies of the training samples; and, in back propagation, updating the weight parameters of the system anomaly prediction model through the objective function.
In another aspect of the embodiments of the present application, an apparatus is also provided. Referring to fig. 7, the computer apparatus includes at least one processor 601 and at least one memory 605, wherein the memory 605 is configured to store a computer program executable on the processor 601, and the processor 601 is configured to execute, when running the computer program, a system anomaly prediction model training method, the method comprising:
obtaining sample characteristics;
initializing the system anomaly prediction model according to the set weight parameters;
processing the sample characteristics through the system anomaly prediction model to obtain predicted energy;
constructing an objective function based on the predicted energy;
in back propagation, updating the weight parameters of the system anomaly prediction model through the objective function.
The processor 601 is further configured to execute, when the computer program runs, the following steps: the obtaining sample features includes:
acquiring a log sequence, and processing the log sequence to obtain a vector sequence;
and performing feature extraction on the vector sequence to obtain sample features.
The processor 601 is further configured to execute, when the computer program runs, the following steps: the processing the sample characteristics through the system anomaly prediction model to obtain the prediction energy comprises:
obtaining a first output sequence based on the sample characteristics and a first decoder of a first submodel of the system anomaly prediction model;
obtaining a second output sequence based on the sample characteristics and a second decoder of a first submodel of the system anomaly prediction model;
and acquiring a third output sequence based on the first output sequence, the second output sequence and the sample characteristics.
The processor 601 is further configured to execute, when the computer program runs, the following steps: obtaining a third output sequence based on the first output sequence, the second output sequence, and the sample features includes:
determining a first reconstruction error based on the first output sequence and the sample features;
determining a second reconstruction error based on the second output sequence and the sample features;
and splicing the first reconstruction error, the second reconstruction error and the hidden space vector of the last time step of the encoder in the first submodel to obtain a third output sequence.
The processor 601 is further configured to execute, when the computer program runs, the following steps: the processing the sample characteristics through the system anomaly prediction model to obtain the prediction energy comprises:
clustering the third output sequence by using a second sub-model of the system anomaly prediction model to obtain K clusters, wherein K is a positive integer;
determining a mean and a covariance of the mth cluster based on samples within the mth cluster;
based on the mean and covariance, a predicted energy for the sample is determined.
The processor 601 is further configured to execute, when the computer program runs, the following steps: the determining a mean and a covariance of the mth cluster based on the samples within the mth cluster comprises:
estimating the third output sequence by using the second submodel of the system anomaly prediction model, and determining the probability that each sample in the third output sequence belongs to each distribution;
determining a mean and covariance of the mth cluster based on samples within the mth cluster and a probability that the samples belong to each distribution.
The processor 601 is further configured to execute, when the computer program runs, the following steps: the constructing an objective function based on the predicted energy comprises:
determining a reconstruction loss based on the first output sequence, the second output sequence, and the sample characteristics;
and constructing an objective function based on the reconstruction loss and the predicted energy.
In some embodiments, the device further comprises a system bus 602, a user interface 603, and a communication interface 604, wherein the system bus 602 is configured to enable connection and communication between these components; the user interface 603 may include a display screen, and the communication interface 604 may include standard wired and wireless interfaces.
In another aspect of the embodiments of the present application, an apparatus is also provided. Referring to fig. 8, the computer apparatus includes at least one processor 701 and at least one memory 705, wherein the memory 705 is configured to store a computer program executable on the processor 701, and the processor 701 is configured to execute, when running the computer program, a system state prediction method, the method comprising:
acquiring sample characteristics of a system;
determining energy corresponding to the sample characteristics based on a system anomaly prediction model;
determining a state of the system based on the energy.
In some embodiments, the processor 701 is configured to execute, when running the computer program, the following: the determining a state of the system based on the energy includes:
determining that the system is abnormal when the energy is greater than a preset energy threshold;
alternatively, the system is determined to be normal if the energy is less than or equal to the energy threshold.
In some embodiments, the processor 701, when running the computer program, is configured to perform:
acquiring training sample characteristics;
initializing the system anomaly prediction model according to the set weight parameters;
processing the training sample characteristics through the system anomaly prediction model to obtain the prediction energy of the training sample;
constructing an objective function based on the predicted energies of the training samples;
in back propagation, updating the weight parameters of the system anomaly prediction model through the objective function.
In some embodiments, the device further comprises a system bus 702, a user interface 703, and a communication interface 704, wherein the system bus 702 is configured to enable connection and communication between these components; the user interface 703 may include a display screen, and the communication interface 704 may include standard wired and wireless interfaces.
In yet another aspect of the embodiments of the present application, a computer readable storage medium is further provided, on which a system anomaly prediction model training program and/or a system state prediction program are stored, where the system anomaly prediction model training program, when executed by a processor, implements the steps of the system anomaly prediction model training method provided in any one of the embodiments of the present application, and the system state prediction program, when executed by the processor, implements the steps of the system state prediction method provided in any one of the embodiments of the present application.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, and functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer.
In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. The protection scope of the present application shall be subject to the protection scope of the claims.

Claims (14)

1. A system anomaly prediction model training method is characterized by comprising the following steps:
obtaining sample characteristics;
initializing the system anomaly prediction model according to the set weight parameters;
processing the sample characteristics through the system anomaly prediction model to obtain predicted energy;
constructing an objective function based on the predicted energy;
in back propagation, updating the weight parameters of the system anomaly prediction model through the objective function.
2. The method of claim 1, wherein the obtaining sample features comprises:
acquiring a log sequence, and processing the log sequence to obtain a vector sequence;
and performing feature extraction on the vector sequence to obtain sample features.
3. The method of claim 1, wherein the processing the sample features by the system anomaly prediction model to obtain a predicted energy comprises:
obtaining a first output sequence based on the sample characteristics and a first decoder of a first submodel of the system anomaly prediction model;
obtaining a second output sequence based on the sample characteristics and a second decoder of a first submodel of the system anomaly prediction model;
and acquiring a third output sequence based on the first output sequence, the second output sequence and the sample characteristics.
4. The method of claim 3, wherein obtaining a third output sequence based on the first output sequence, the second output sequence, and the sample features comprises:
determining a first reconstruction error based on the first output sequence and the sample features;
determining a second reconstruction error based on the second output sequence and the sample features;
and splicing the first reconstruction error, the second reconstruction error and the hidden space vector of the last time step of the encoder in the first submodel to obtain a third output sequence.
5. The method of claim 3, wherein the processing the sample features by the system anomaly prediction model to obtain a predicted energy comprises:
clustering the third output sequence by using a second sub-model of the system anomaly prediction model to obtain K clusters, wherein K is a positive integer;
determining a mean and a covariance of the mth cluster based on samples within the mth cluster, 0< m ≦ K;
based on the mean and covariance, a predicted energy for the sample is determined.
6. The method of claim 5, wherein the determining the mean and covariance of the mth cluster based on the samples in the mth cluster comprises:
estimating the third output sequence by using the second submodel of the system anomaly prediction model, and determining the probability that each sample in the third output sequence belongs to each distribution;
determining a mean and covariance of the mth cluster based on samples within the mth cluster and a probability that the samples belong to each distribution.
7. The method of claim 4, wherein constructing an objective function based on the predicted energy comprises:
determining a reconstruction loss based on the first output sequence, the second output sequence, and the sample characteristics;
and constructing an objective function based on the reconstruction loss and the predicted energy.
8. A method for predicting a system state, the method comprising:
acquiring sample characteristics of a system;
determining energy corresponding to the sample characteristics based on a system anomaly prediction model;
determining a state of the system based on the energy.
9. The method of claim 8, wherein the determining the state of the system based on the energy comprises:
determining that the system is abnormal when the energy is greater than a preset energy threshold;
alternatively, the system is determined to be normal if the energy is less than or equal to the energy threshold.
10. The method according to claim 8 or 9, wherein before the determining the energy corresponding to the sample feature based on the system anomaly prediction model, the method further comprises:
acquiring training sample characteristics;
initializing the system anomaly prediction model according to the set weight parameters;
processing the training sample characteristics through the system anomaly prediction model to obtain the prediction energy of the training sample;
constructing an objective function based on the predicted energies of the training samples;
in back propagation, updating the weight parameters of the system anomaly prediction model through the objective function.
11. A system anomaly prediction model training device, characterized by comprising a first acquisition module, an initialization module, a first processing module, a construction module and an updating module, wherein,
the first obtaining module is used for obtaining sample characteristics;
the initialization module is used for initializing the system anomaly prediction model according to the set weight parameters;
the first processing module is used for processing the sample characteristics through the system anomaly prediction model to obtain predicted energy;
the construction module is used for constructing an objective function based on the predicted energy;
and the updating module is used for updating the weight parameters of the system anomaly prediction model through the objective function in back propagation.
12. A system state prediction apparatus comprising a second acquisition module, an energy determination module, and a state determination module, wherein,
the second acquisition module is used for acquiring the sample characteristics of the system;
the energy determining module is used for determining energy corresponding to the sample characteristics based on a system anomaly prediction model;
the state determination module is configured to determine a state of the system based on the energy.
13. An apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program performs the steps of a method of training a system anomaly prediction model according to any one of claims 1 to 7 and/or a method of predicting a system state according to any one of claims 8 to 10.
14. A storage medium having stored thereon a system anomaly prediction model training program which, when executed by a processor, implements the steps of the system anomaly prediction model training method according to any one of claims 1 to 7, and/or a system state prediction program which, when executed by a processor, implements the steps of the system state prediction method according to any one of claims 8 to 10.
CN201911268312.XA 2019-12-11 2019-12-11 Model training method, state prediction method, device, equipment and storage medium Active CN112948155B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911268312.XA CN112948155B (en) 2019-12-11 2019-12-11 Model training method, state prediction method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911268312.XA CN112948155B (en) 2019-12-11 2019-12-11 Model training method, state prediction method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112948155A true CN112948155A (en) 2021-06-11
CN112948155B CN112948155B (en) 2022-12-16

Family

ID=76234082

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911268312.XA Active CN112948155B (en) 2019-12-11 2019-12-11 Model training method, state prediction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112948155B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113254255A (en) * 2021-07-15 2021-08-13 苏州浪潮智能科技有限公司 Cloud platform log analysis method, system, device and medium
CN115410638A (en) * 2022-07-28 2022-11-29 南京航空航天大学 Magnetic disk fault detection system based on contrast clustering
WO2023272851A1 (en) * 2021-06-29 2023-01-05 未鲲(上海)科技服务有限公司 Anomaly data detection method and apparatus, device, and storage medium
CN117150407A (en) * 2023-09-04 2023-12-01 国网上海市电力公司 Abnormality detection method for industrial carbon emission data

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8554703B1 (en) * 2011-08-05 2013-10-08 Google Inc. Anomaly detection
CN106656637A (en) * 2017-02-24 2017-05-10 国网河南省电力公司电力科学研究院 Anomaly detection method and device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8554703B1 (en) * 2011-08-05 2013-10-08 Google Inc. Anomaly detection
CN106656637A (en) * 2017-02-24 2017-05-10 国网河南省电力公司电力科学研究院 Anomaly detection method and device

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023272851A1 (en) * 2021-06-29 2023-01-05 未鲲(上海)科技服务有限公司 Anomaly data detection method and apparatus, device, and storage medium
CN113254255A (en) * 2021-07-15 2021-08-13 苏州浪潮智能科技有限公司 Cloud platform log analysis method, system, device and medium
CN115410638A (en) * 2022-07-28 2022-11-29 南京航空航天大学 Magnetic disk fault detection system based on contrast clustering
CN115410638B (en) * 2022-07-28 2023-11-07 南京航空航天大学 Disk fault detection system based on contrast clustering
CN117150407A (en) * 2023-09-04 2023-12-01 国网上海市电力公司 Abnormality detection method for industrial carbon emission data

Also Published As

Publication number Publication date
CN112948155B (en) 2022-12-16

Similar Documents

Publication Publication Date Title
CN112948155B (en) Model training method, state prediction method, device, equipment and storage medium
CN114169330B (en) Chinese named entity recognition method integrating time sequence convolution and transform encoder
CN112966074B (en) Emotion analysis method and device, electronic equipment and storage medium
US10891540B2 (en) Adaptive neural network management system
JP6793774B2 (en) Systems and methods for classifying multidimensional time series of parameters
CN111382555B (en) Data processing method, medium, device and computing equipment
CN110659742A (en) Method and device for acquiring sequence representation vector of user behavior sequence
CN110674673A (en) Key video frame extraction method, device and storage medium
CN110766070A (en) Sparse signal identification method and device based on cyclic self-encoder
EP3798916A1 (en) Transformation of data samples to normal data
Azzalini et al. A minimally supervised approach based on variational autoencoders for anomaly detection in autonomous robots
CN111475622A (en) Text classification method, device, terminal and storage medium
CN114090326B (en) Alarm root cause determination method, device and equipment
CN110781818B (en) Video classification method, model training method, device and equipment
CN116402352A (en) Enterprise risk prediction method and device, electronic equipment and medium
CN115795038A (en) Intention identification method and device based on localization deep learning framework
CN116561748A (en) Log abnormality detection device for component subsequence correlation sensing
Behnaz et al. DEEPPBM: Deep probabilistic background model estimation from video sequences
CN113378178A (en) Deep learning-based graph confidence learning software vulnerability detection method
Wakchaure et al. A scheme of answer selection in community question answering using machine learning techniques
CN115035455A (en) Cross-category video time positioning method, system and storage medium based on multi-modal domain resisting self-adaptation
CN113392929A (en) Biological sequence feature extraction method based on word embedding and self-encoder fusion
CN115512693A (en) Audio recognition method, acoustic model training method, device and storage medium
CN112866257A (en) Domain name detection method, system and device
CN111967253A (en) Entity disambiguation method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant