CN113342597B - System fault prediction method based on Gaussian mixture hidden Markov model - Google Patents

Info

Publication number: CN113342597B
Authority: CN (China)
Prior art keywords: data set, fault, type, gaussian mixture, log file
Legal status: Active
Application number: CN202110597641.XA
Other languages: Chinese (zh)
Other versions: CN113342597A
Inventors: 应时 (Ying Shi), 田园 (Tian Yuan), 王冰明 (Wang Bingming)
Current Assignee: Wuhan University (WHU)
Original Assignee: Wuhan University (WHU)
Application filed by Wuhan University (WHU); priority to CN202110597641.XA
Publication of application CN113342597A; application granted; publication of grant CN113342597B


Classifications

    • G06F 11/302 — Monitoring arrangements specially adapted to the computing system or component being monitored, where the component is a software system
    • G06F 11/3072 — Monitoring arrangements determined by the means or processing involved in reporting the monitored data, where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
    • G06F 16/35 — Information retrieval of unstructured textual data: clustering; classification
    • G06F 17/18 — Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis


Abstract

The invention discloses a system fault prediction method based on a Gaussian mixture hidden Markov model, comprising the following steps: preprocessing and labeling the original log files; extracting log file features and constructing feature vectors; constructing, with a sliding window, a corresponding data set for each fault to be predicted; training a Gaussian mixture hidden Markov fault prediction model for each fault to be predicted; and predicting, with the trained Gaussian mixture hidden Markov models, whether the real-time logs indicate an impending fault and, if so, of which type. The technical scheme of the invention resolves the interleaving and redundancy problems of the original log files, so that the extracted features are fewer and more accurate; the system state and the logs preceding a system failure are modeled with a Gaussian mixture hidden Markov model, so that system faults are predicted rapidly and accurately and the availability of the system is improved.

Description

System fault prediction method based on Gaussian mixture hidden Markov model
Technical Field
The invention belongs to the field of intelligent operation and maintenance, particularly relates to a system fault prediction method based on a Gaussian mixture hidden Markov model, and is directed at the problem of system fault prediction.
Background
The complexity of software systems has increased over the past decade as demand has grown. Software complexity, the limits of human reasoning, and other resource constraints make it very difficult to develop fault-free software, yet highly complex software systems must still guarantee their reliability. Software failure prediction uses basic prediction indicators and historical failure data to predict the future failure tendency of the software, and potential failures are eliminated according to the prediction results. Preventing faults before they occur helps improve the availability and efficiency of a software system. Logs are semi-structured text data well suited to this task; however, for log-based system failure prediction there remain two notable points for improvement:
the effect and the efficiency of the fault prediction have further improved space
Based on the traditional machine learning models such as a support vector machine and a clustering fault prediction algorithm, the prediction accuracy and the recall rate are both about 80%, and the method can be further improved. Although the accuracy of the fault prediction algorithm based on deep learning such as CNN and LSTM reaches 90%, the training time and the prediction time of the model are obviously higher than those of the traditional machine learning model, so that the fault prediction efficiency can be further improved.
More effective data preprocessing methods are needed
Analysis shows that log sequences have three characteristics:
long-term ordering: the sequential log of the system in the state transition can be generated in a time sequence in a series of actions in a long time, so that the sequence of the log sequence cannot be damaged when the frequent log sequence is analyzed and mined.
Interleaving in the short term: because system clusters are large, multiple different tasks may execute on the same node or on different nodes, each generating its own logs as it runs. When these logs are arranged in time order into one sequence, the logs of other tasks become inserted into the log sequence of a given task, breaking the natural order of that task's logs.
Redundancy in the short term: a component of the system may be heavily accessed in a short period (especially when a failure occurs) and thus produce a large number of logs of the same type. For example, when a request connection error occurs, the system immediately issues the connection request again until the connection succeeds or some condition is reached. In log-based failure prediction, such redundant logs not only increase computation cost but also drown out other important logs, hindering the analysis of frequent log sequences. However, the mass generation of a certain kind of log in a short time may itself be a signature of a certain kind of failure, so a certain proportion of the redundant logs must also be kept.
Because part of the redundant logs is kept, the number of logs in a given period is large while the number of log types is small. The traditional log data preprocessing method treats each log as an independent sample and extracts a feature vector from it. On the one hand this yields far too many samples to analyze; on the other hand it leaves little useful information within each time period. A better log data preprocessing method is therefore needed, so that the processed data set is more representative.
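As a sketch of the redundancy-filtering idea described above (the function name, keep ratio, and log representation are illustrative assumptions, not taken from the patent), a deduplication pass can thin runs of consecutive same-type logs while keeping a fixed proportion, so a burst remains visible as a weaker signal:

```python
from itertools import groupby

def filter_redundant(logs, keep_ratio=0.2, min_keep=1):
    """Collapse runs of consecutive same-type logs, keeping a proportion.

    `logs` is a list of (type, message) pairs; each run of identical types
    is thinned to max(min_keep, round(len(run) * keep_ratio)) entries.
    """
    kept = []
    for _, run in groupby(logs, key=lambda log: log[0]):
        run = list(run)
        n = max(min_keep, round(len(run) * keep_ratio))
        kept.extend(run[:n])
    return kept

# A burst of 10 connection errors followed by one success:
burst = [("conn_err", f"retry {i}") for i in range(10)] + [("ok", "connected")]
print(len(filter_redundant(burst)))  # 10 conn_err logs thinned to 2, plus 1 ok -> 3
```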
Disclosure of Invention
In view of the above background and problems, the invention provides a system fault prediction method based on a Gaussian mixture hidden Markov model: according to the historical system logs, a GMM-HMM model is constructed for each fault type to be predicted; at prediction time, the real-time log sequence of the system is input into each GMM-HMM model, the probability of the log sequence under each model is calculated, and from these probabilities it is judged whether a fault will occur and, if so, of which type.
The technical scheme of the invention is a system fault prediction method based on a Gaussian mixture hidden Markov model, which comprises the following specific steps:
Step 1: preprocessing the original log file data set to obtain a preprocessed log file data set; extracting several keywords from each preprocessed log file by a keyword extraction method and constructing a word frequency matrix from them; clustering the preprocessed log files by applying agglomerative hierarchical clustering to the word frequency matrix; and manually labeling the type of each clustered preprocessed log file;
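A minimal sketch of the word-frequency-matrix construction in step 1 (the tokenizer, lower-casing, and top-k keyword selection are assumptions; the patent does not specify its keyword extraction method). Agglomerative hierarchical clustering would then run on the rows of this matrix:

```python
import re
from collections import Counter

def word_freq_matrix(logs, top_k=5):
    """Build a word-frequency matrix over per-log keywords.

    Each preprocessed log contributes its top_k most frequent tokens as
    keywords; rows correspond to logs, columns to the union vocabulary.
    """
    keyword_counts = []
    for log in logs:
        tokens = re.findall(r"[a-z]+", log.lower())
        keyword_counts.append(Counter(dict(Counter(tokens).most_common(top_k))))
    vocab = sorted(set().union(*keyword_counts))
    matrix = [[counts[word] for word in vocab] for counts in keyword_counts]
    return vocab, matrix

vocab, matrix = word_freq_matrix(["Connection refused by host", "Connection timed out"])
```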
Step 2: extracting a feature for the type of each clustered preprocessed log file, constructing from these features a feature vector of the clustered log-file types at each acquisition time, and arranging the feature vectors in the order of the original log files to obtain a feature vector data set;
Step 3: locating all occurrence positions of a specified fault on the feature vector data set; determining, on the feature vector data set, a start position and a stop position for a sliding window; starting from the start position, intercepting the feature vector sequence inside the sliding window and putting it into the data set of the specified fault; moving the sliding window backwards by one sliding step and again intercepting the feature vector sequence inside the window, repeating until the sliding window reaches or passes the stop position; the accumulated sequences form the data set of the specified fault;
Step 4: setting the hyper-parameters of the Gaussian mixture hidden Markov model of each specified fault to be predicted; taking the data set of each specified fault as the input of the model's training algorithm and optimizing the parameters to be estimated of each model through the Baum-Welch (expectation-maximization) algorithm, to obtain the optimized parameters and thereby construct the optimized Gaussian mixture hidden Markov prediction model of each specified fault;
Step 5: intercepting a segment of the real-time log sequence with the sliding window of step 3 as the log sequence to be predicted; converting the log sequence to be predicted into a sequence of clustered log-file types by the method of step 1; converting that type sequence into a feature vector sequence by the method of step 2; and predicting on the feature vector sequence with the optimized Gaussian mixture hidden Markov model of each specified fault to obtain the prediction result;
preferably, the pretreatment in step 1 is: cleaning meaningless parameters and filtering redundant logs on the original log file data set to obtain a preprocessed log file data set;
Step 1, the clustered preprocessed log files at each acquisition time, in formula notation, are defined as:
L_j = {l_{j,1}, l_{j,2}, ..., l_{j,N_j}}
wherein l_{j,i} represents the i-th log file in the preprocessed log file data set at the j-th acquisition time, N_j represents the total number of log files in the preprocessed log file data set at the j-th acquisition time, i ∈ [1, N_j], j ∈ [1, K], and K represents the number of acquisition times;
Step 1, the types of the clustered preprocessed log files at each acquisition time, in formula notation, are defined as:
type_j = {e_{j,1}, e_{j,2}, ..., e_{j,N_j}}
wherein e_{j,i} represents the type of the i-th log file in the preprocessed log file data set at the j-th acquisition time, N_j represents the total number of log files in the preprocessed log file data set at the j-th acquisition time, i ∈ [1, N_j], j ∈ [1, K], and K represents the number of acquisition times;
preferably, in the step 2, the specific calculation method for extracting the feature of the type of the preprocessed log file after each cluster is as follows:
Figure BDA0003091764420000041
wherein the content of the first and second substances,
Figure BDA0003091764420000042
indicating that the type m of the log file is in typejFrequency of middle energizer, FmIs represented in { type1,type2,...,typeKIn the structure, m e typeiIs true i e [1, M ∈]Frequency of (N)jRepresenting the total number of log files in the preprocessed log file data set at the jth acquisition time, wherein M belongs to [1, M ∈],j∈[1,K]M represents the total number of the types of the clustered log files in the step 1, and K represents the number of the acquisition moments;
Step 2, the feature vector of the clustered log-file types at each acquisition time is defined as:
v_j = (v_j^1, v_j^2, ..., v_j^M)
wherein component v_j^m represents the feature of log-file type m in type_j, m ∈ [1, M], j ∈ [1, K], M represents the total number of clustered log-file types from step 1, and K represents the number of acquisition times;
Step 2, the feature vector data set, in formula notation, is defined as:
V = {v_1, v_2, ..., v_K}
wherein v_j represents the feature vector extracted from type_j, j ∈ [1, K];
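The per-time feature above can be sketched in code under the TF-IDF reading given here (the tf · log(K/F_m) form is a reconstruction from the surrounding definitions, since the original formula image is not available; names are hypothetical):

```python
import math
from collections import Counter

def feature_vector(type_sets, j, M):
    """TF-IDF-style feature vector v_j for acquisition time j (0-based).

    type_sets[j] is the list of clustered log types observed at time j;
    types are integers in [0, M). Follows the tf * log(K / df) reading,
    which the patent's formula image does not confirm.
    """
    K = len(type_sets)
    counts = Counter(type_sets[j])
    N_j = len(type_sets[j])
    # df[m]: number of acquisition times whose type set contains m
    df = [sum(1 for ts in type_sets if m in ts) for m in range(M)]
    return [
        (counts[m] / N_j) * math.log(K / df[m]) if df[m] else 0.0
        for m in range(M)
    ]
```

A type appearing at every acquisition time gets weight log(K/K) = 0, suppressing uninformative ubiquitous logs.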
Preferably, the specific method of locating, in step 3, all occurrence positions of a specified fault on the feature vector data set is:
searching the original log files for the keyword of the specified fault f, locating each position d at which the fault appears, and recording its index;
f ∈ [1, F], d ∈ [1, D_f], where F denotes the number of specified fault types to be predicted and D_f denotes the number of occurrence positions of the specified fault type f;
from the recorded indexes, locating the index j_f^d of the acquisition time of the specified fault f in the preprocessed log file data set, d ∈ [1, D_f];
on the feature vector data set, locating the positions with indexes j_f^d; the total number of located positions is D_f;
Step 3, the sliding window, in formula notation, is defined as:
W_r^{f,d} = (v_{r,1}, v_{r,2}, ..., v_{r,L_w})
wherein v_{r,z} denotes the z-th feature vector in the sequence intercepted the r-th time before the d-th located position of the specified fault f on the feature vector data set,
f ∈ [1, F], d ∈ [1, D_f], r ∈ [1, S_d], z ∈ [1, L_w],
F denotes the number of specified fault types to be predicted, D_f the number of occurrence positions of fault type f, S_d the number of interceptions made before the d-th located position on the feature vector data set, and L_w the number of feature vectors contained in the sliding window;
Step 3, the sliding step, in formula notation, is defined as:
s = (v_1, v_2, ..., v_{L_s})
wherein v_v denotes the v-th feature vector contained in the sliding step, v ∈ [1, L_s], and L_s represents the number of feature vectors contained in the sliding step;
Step 3, the specified fault data set, in formula notation, is defined as:
Data_f = {Data_f^1, Data_f^2, ..., Data_f^{D_f}}
wherein Data_f^d = {W_1^{f,d}, W_2^{f,d}, ..., W_{S_d}^{f,d}} represents all feature vector sequence segments intercepted before the d-th located position of the specified fault f, and W_r^{f,d} represents the r-th intercepted feature vector segment before the d-th located position of the specified fault f,
f ∈ [1, F], d ∈ [1, D_f], r ∈ [1, S_d],
where F represents the number of specified fault types to be predicted, D_f the number of occurrence positions of fault type f, and S_d the number of interceptions made before the d-th located position on the feature vector data set.
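The sliding-window construction of step 3 can be sketched as follows (the `lookback` parameter is an assumed stand-in for the patent's window start/stop positions, which it leaves abstract):

```python
def build_fault_dataset(V, positions, L_w=5, L_s=1, lookback=20):
    """Sliding-window data set for one fault type.

    V is the feature vector sequence; positions are the (0-based) indices
    where the fault occurred. Before each position d, windows of L_w
    vectors are cut every L_s steps, starting `lookback` vectors back,
    stopping once the window would reach position d.
    """
    data = []
    for d in positions:
        start = max(0, d - lookback)
        segments = []
        r = start
        while r + L_w <= d:
            segments.append(V[r:r + L_w])
            r += L_s
        data.append(segments)
    return data

V = [[float(i)] for i in range(30)]
data = build_fault_dataset(V, positions=[25], L_w=5, L_s=2, lookback=10)
```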
Preferably, in step 4, the specified fault data sets are defined as:
Data = {Data_1, Data_2, ..., Data_F}
wherein Data_f represents the data set of the specified fault f, f ∈ [1, F], and F represents the number of specified fault types to be predicted;
Step 4, the hyper-parameters of the Gaussian mixture hidden Markov model of each specified fault comprise: the number of hidden states and the number of Gaussian component models;
the number of hidden states is Q_f, where f denotes the specified fault type, f ∈ [1, F];
the number of Gaussian component models is G_f, where f denotes the specified fault type, f ∈ [1, F];
Step 4, the parameters to be estimated of the Gaussian mixture hidden Markov prediction model of each specified fault comprise: the Gaussian component weights, the Gaussian component mean vectors, the Gaussian component covariance matrices, the state transition probability matrix, and the initial state probability vector;
the weight of a Gaussian component model is w_{p,g}^f, representing the weight of Gaussian component g in the Gaussian mixture corresponding to hidden state p of the specified fault f;
the mean vector of a Gaussian component model is μ_{p,g}^f, representing the mean vector of Gaussian component g in the Gaussian mixture corresponding to hidden state p of the specified fault f;
the covariance matrix of a Gaussian component model is Σ_{p,g}^f, representing the covariance matrix of Gaussian component g in the Gaussian mixture corresponding to hidden state p of the specified fault f;
the state transition probability matrix is A^f = [a_{p,q}^f], a Q_f × Q_f matrix whose element a_{p,q}^f represents the probability of hidden state p of the specified fault f transitioning to hidden state q;
the initial state probability vector is π^f = (π_1^f, π_2^f, ..., π_{Q_f}^f), wherein π_p^f indicates the probability that hidden state p occurs at the initial moment of the specified fault f, with π_1^f + π_2^f + ... + π_{Q_f}^f = 1;
wherein:
g denotes a Gaussian component model, g ∈ [1, G_f], G_f representing the number of Gaussian component models corresponding to a hidden state of the specified fault f;
f denotes the specified fault type, f ∈ [1, F], F representing the number of specified fault types to be predicted;
p denotes the hidden state at the current time, p ∈ [1, Q_f], Q_f representing the number of hidden states of fault type f;
q denotes the hidden state at the next time, q ∈ [1, Q_f].
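The parameter set listed above can be laid out as follows (shapes only; random initialization, diagonal covariances, and the function name are assumptions — training via Baum-Welch would then re-estimate these values):

```python
import random

def init_gmmhmm_params(Q, G, D, seed=0):
    """Randomly initialized parameter set for one fault's GMM-HMM.

    Q hidden states, G Gaussian components per state, D-dimensional
    features. Returns (pi, A, weights, means, variances); probability
    rows are normalized to sum to 1.
    """
    rng = random.Random(seed)

    def simplex(n):
        raw = [rng.random() + 1e-3 for _ in range(n)]
        total = sum(raw)
        return [x / total for x in raw]

    pi = simplex(Q)                           # initial state probabilities pi^f
    A = [simplex(Q) for _ in range(Q)]        # transition matrix A^f, Q x Q
    weights = [simplex(G) for _ in range(Q)]  # mixture weights w_{p,g}^f
    means = [[[rng.gauss(0, 1) for _ in range(D)] for _ in range(G)] for _ in range(Q)]
    variances = [[[1.0] * D for _ in range(G)] for _ in range(Q)]  # diagonal Sigma
    return pi, A, weights, means, variances

pi, A, w, mu, var = init_gmmhmm_params(Q=3, G=6, D=8)
```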
Preferably, in step 5, the specific method of predicting the feature vector sequence with the optimized Gaussian mixture hidden Markov model of each specified fault from step 4 is:
taking the feature vector sequence as the input of the backward algorithm of each Gaussian mixture hidden Markov model, to obtain the probability of the feature vector sequence under each model:
PR = {PR_1, PR_2, ..., PR_F}
wherein PR_f represents the probability, computed by the backward algorithm, of the feature vector sequence under the Gaussian mixture hidden Markov model of fault type f, f ∈ [1, F], and F denotes the number of specified fault types to be predicted.
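The sequence likelihood PR_f can equivalently be computed with the forward recursion (forward and backward algorithms yield the same P(sequence | model)). A self-contained log-space sketch, assuming diagonal covariances and strictly positive pi and A entries — not the patent's exact implementation:

```python
import math

def gmm_logpdf(x, weights, means, variances):
    """Log-density of a diagonal-covariance Gaussian mixture at vector x."""
    comps = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            ll -= 0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        comps.append(ll)
    m = max(comps)
    return m + math.log(sum(math.exp(c - m) for c in comps))

def forward_loglik(seq, pi, A, gmms):
    """log P(seq | model) by the forward recursion in log space.

    pi: initial state probabilities; A: transition matrix;
    gmms[p] = (weights, means, variances) for hidden state p.
    (Zero-probability entries in pi or A would need guarding.)
    """
    Q = len(pi)
    logb = [[gmm_logpdf(x, *gmms[p]) for p in range(Q)] for x in seq]
    alpha = [math.log(pi[p]) + logb[0][p] for p in range(Q)]
    for t in range(1, len(seq)):
        nxt = []
        for q in range(Q):
            terms = [alpha[p] + math.log(A[p][q]) for p in range(Q)]
            m = max(terms)
            nxt.append(m + math.log(sum(math.exp(s - m) for s in terms)) + logb[t][q])
        alpha = nxt
    m = max(alpha)
    return m + math.log(sum(math.exp(a - m) for a in alpha))
```

For a trivial one-state, one-component model with a standard normal emission, the log-likelihood of two zero observations is exactly 2 · log N(0; 0, 1) = −log(2π).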
Step 5, the specific method of obtaining the prediction result is:
defining a threshold T; if PR_result = {PR_t : PR_t > T, PR_t ∈ PR, t ∈ [1, F]} is not empty, where F denotes the number of specified fault types to be predicted,
then the fault type corresponding to max{PR_result} is taken as the prediction result; otherwise the prediction result is no fault.
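The thresholded argmax decision rule above can be sketched directly (the dict-based interface and fault names are illustrative assumptions; log-probabilities are used, as they would come out of an HMM forward/backward pass):

```python
def predict_fault(PR, T):
    """Step-5 decision rule: PR maps fault type -> sequence (log-)probability.

    Returns the fault type with the largest probability above threshold T,
    or None ("no fault") when no model exceeds the threshold.
    """
    over = {fault: p for fault, p in PR.items() if p > T}
    return max(over, key=over.get) if over else None

print(predict_fault({"disk": -120.0, "net": -80.0, "mem": -200.0}, T=-100.0))  # prints "net"
```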
The invention has the advantages that, by adopting a segmented-log feature extraction method, it resolves the interleaving and redundancy problems of the original log files, so that the constructed feature vectors are more discriminative; and that, by modeling the system state and the pre-failure logs with a Gaussian mixture hidden Markov model, it offers clear advantages in prediction effectiveness and efficiency over other system fault prediction models.
Drawings
FIG. 1: flow chart of the fault prediction method of an embodiment of the invention;
FIG. 2: the GMM-HMM-based fault prediction model of an embodiment of the invention;
FIG. 3: data preprocessing activity diagram of an embodiment of the invention;
FIG. 4: data change diagram of an embodiment of the invention;
FIG. 5: recognition rate heat map of an embodiment of the invention;
FIG. 6: log interleaving comparison diagram of an embodiment of the invention;
FIG. 7: comparison of the prediction effectiveness of different methods;
FIG. 8: comparison of the efficiency of different methods.
Detailed Description
In order to facilitate understanding and implementation of the present invention by those of ordinary skill in the art, the invention is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the embodiments described here are merely illustrative and explanatory of the invention and do not restrict it.
A specific embodiment of the present invention is described below with reference to FIGS. 1 to 8. The technical solution of this embodiment is a system fault prediction method based on a Gaussian mixture hidden Markov model, comprising the following specific steps:
Step 1: preprocessing the original log file data set to obtain a preprocessed log file data set; extracting several keywords from each preprocessed log file by a keyword extraction method and constructing a word frequency matrix from them; clustering the preprocessed log files by applying agglomerative hierarchical clustering to the word frequency matrix; and manually labeling the type of each clustered preprocessed log file;
The preprocessing in step 1 comprises: cleaning meaningless parameters from the original log file data set and filtering redundant logs, to obtain the preprocessed log file data set;
Step 1, the clustered preprocessed log files at each acquisition time, in formula notation, are defined as:
L_j = {l_{j,1}, l_{j,2}, ..., l_{j,N_j}}
wherein l_{j,i} represents the i-th log file in the preprocessed log file data set at the j-th acquisition time, N_j represents the total number of log files in the preprocessed log file data set at the j-th acquisition time, i ∈ [1, N_j], j ∈ [1, K], and K represents the number of acquisition times;
Step 1, the types of the clustered preprocessed log files at each acquisition time, in formula notation, are defined as:
type_j = {e_{j,1}, e_{j,2}, ..., e_{j,N_j}}
wherein e_{j,i} represents the type of the i-th log file in the preprocessed log file data set at the j-th acquisition time, N_j represents the total number of log files in the preprocessed log file data set at the j-th acquisition time, i ∈ [1, N_j], j ∈ [1, K], and K = 1024 represents the number of acquisition times;
Step 2: extracting a feature for the type of each clustered preprocessed log file, constructing from these features a feature vector of the clustered log-file types at each acquisition time, and arranging the feature vectors in the order of the original log files to obtain a feature vector data set;
Step 2, the feature of each clustered log-file type is a TF-IDF statistic, calculated as:
v_j^m = (n_{m,j} / N_j) · log(K / F_m)
wherein n_{m,j} / N_j is the frequency with which log-file type m occurs in type_j, n_{m,j} being the number of log files of type m at the j-th acquisition time and N_j the total number of log files in the preprocessed log file data set at the j-th acquisition time; F_m is the number of indices i ∈ [1, K] for which m ∈ type_i holds in {type_1, type_2, ..., type_K}; m ∈ [1, M], j ∈ [1, K], M = 80 represents the total number of clustered log-file types from step 1, and K = 1024 represents the number of acquisition times;
Step 2, the feature vector of the clustered log-file types at each acquisition time is defined as:
v_j = (v_j^1, v_j^2, ..., v_j^M)
wherein component v_j^m represents the feature of log-file type m in type_j, m ∈ [1, M], j ∈ [1, K], M = 80 represents the total number of clustered log-file types from step 1, and K = 1024 represents the number of acquisition times;
Step 2, the feature vector data set, in formula notation, is defined as:
V = {v_1, v_2, ..., v_K}
wherein v_j represents the feature vector extracted from type_j, j ∈ [1, K];
Step 3: locating all occurrence positions of a specified fault on the feature vector data set; determining, on the feature vector data set, a start position and a stop position for a sliding window; starting from the start position, intercepting the feature vector sequence inside the sliding window and putting it into the data set of the specified fault; moving the sliding window backwards by one sliding step and again intercepting the feature vector sequence inside the window, repeating until the sliding window reaches or passes the stop position; the accumulated sequences form the data set of the specified fault;
Step 3, the specific method of locating all occurrence positions of the specified fault on the feature vector data set is:
searching the original log files for the keyword of the specified fault f, locating each position d at which the fault appears, and recording its index;
f ∈ [1, F], d ∈ [1, D_f], where F denotes the number of specified fault types to be predicted and D_f denotes the number of occurrence positions of the specified fault type f;
from the recorded indexes, locating the index j_f^d of the acquisition time of the specified fault f in the preprocessed log file data set, d ∈ [1, D_f];
on the feature vector data set, locating the positions with indexes j_f^d; the total number of located positions is D_f;
Step 3, the sliding window, in formula notation, is defined as:
W_r^{f,d} = (v_{r,1}, v_{r,2}, ..., v_{r,L_w})
wherein v_{r,z} denotes the z-th feature vector in the sequence intercepted the r-th time before the d-th located position of the specified fault f on the feature vector data set,
f ∈ [1, F], d ∈ [1, D_f], r ∈ [1, S_d], z ∈ [1, L_w],
F denotes the number of specified fault types to be predicted, D_f the number of occurrence positions of fault type f, S_d the number of interceptions made before the d-th located position on the feature vector data set, and L_w the number of feature vectors contained in the sliding window;
Step 3, the sliding step, in formula notation, is defined as:
s = (v_1, v_2, ..., v_{L_s})
wherein v_v denotes the v-th feature vector contained in the sliding step, v ∈ [1, L_s], and L_s represents the number of feature vectors contained in the sliding step;
Step 3, the specified fault data set, in formula notation, is defined as:
Data_f = {Data_f^1, Data_f^2, ..., Data_f^{D_f}}
wherein Data_f^d = {W_1^{f,d}, W_2^{f,d}, ..., W_{S_d}^{f,d}} represents all feature vector sequence segments intercepted before the d-th located position of the specified fault f, and W_r^{f,d} represents the r-th intercepted feature vector segment before the d-th located position of the specified fault f,
f ∈ [1, F], d ∈ [1, D_f], r ∈ [1, S_d],
where F = 4 represents the number of specified fault types to be predicted, D_f the number of occurrence positions of fault type f, and S_d the number of interceptions made before the d-th located position on the feature vector data set.
Step 4: setting the hyper-parameters of the Gaussian mixture hidden Markov model of each specified fault to be predicted; taking the data set of each specified fault as the input of the model's training algorithm and optimizing the parameters to be estimated of each model through the Baum-Welch (expectation-maximization) algorithm, to obtain the optimized parameters and thereby construct the optimized Gaussian mixture hidden Markov prediction model of each specified fault;
step 4, specifying a fault data set, wherein a specific formula is defined as:
Data={Data1,Data2,...,DataF}
wherein Data_f represents the data set of the specified fault f, f ∈ [1, F], and F represents the number of all specified fault types to be predicted;
step 4, the hyper-parameters of the Gaussian mixture hidden Markov model for specifying the faults comprise: the number of hidden states and the number of Gaussian components in the hidden Markov model;
the number of hidden states in the hidden Markov model is Q_f = 3, where f denotes the specified fault type, f ∈ [1, F], and F represents the number of all specified fault types to be predicted;
the number of Gaussian component models is G_f = 6, where f denotes the specified fault type, f ∈ [1, F], and F represents the number of all specified fault types to be predicted;
step 4, the parameters to be estimated of the Gaussian mixture hidden Markov prediction model of each specified fault comprise: the weights of the Gaussian component models, the mean vectors of the Gaussian component models, the covariance matrices of the Gaussian component models, the state transition probability matrix, and the initial state probability vector;
the weight of a Gaussian component model is c_{p,g}^f, representing the weight of Gaussian component model g in the Gaussian mixture model corresponding to the hidden state p of the specified fault f;

the mean vector of a Gaussian component model is μ_{p,g}^f, representing the mean vector of Gaussian component model g in the Gaussian mixture model corresponding to the hidden state p of the specified fault f;

the covariance matrix of a Gaussian component model is Σ_{p,g}^f, representing the covariance matrix of Gaussian component model g in the Gaussian mixture model corresponding to the hidden state p of the specified fault f;

the state transition probability matrix is A^f = [a_{p,q}^f], wherein a_{p,q}^f represents the probability of the hidden state p of the specified fault f transitioning to the hidden state q;

the initial state probability vector is π^f = (π_1^f, π_2^f, ..., π_{Q_f}^f), wherein π_p^f indicates the probability of occurrence of the hidden state p at the initial moment of the specified fault f, and the components of π^f sum to 1;
wherein,
g represents a Gaussian component model, g ∈ [1, G_f], G_f representing the number of Gaussian component models corresponding to the hidden states of the specified fault f;
f represents the specified fault type, f ∈ [1, F], and F represents the number of all specified fault types to be predicted;
p represents the hidden state at the current time, p ∈ [1, Q_f], Q_f representing the number of hidden states of the specified fault type f;
q represents the hidden state at the next time, q ∈ [1, Q_f], Q_f representing the number of hidden states of the specified fault type f.
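The parameter set listed above can be pictured as a plain data structure. The sketch below builds a uniformly initialised GMM-HMM parameter dictionary with diagonal covariances; the dictionary keys and the helper name are illustrative assumptions, and in practice these values would be re-estimated by a Baum-Welch-style training algorithm.

```python
def init_gmm_hmm_params(Q, G, dim):
    """Uniformly initialised GMM-HMM parameters: mixture weights c,
    mean vectors mu, diagonal covariances sigma, transition matrix A,
    and initial state vector pi (one mixture of G Gaussians per state)."""
    return {
        "c":     [[1.0 / G] * G for _ in range(Q)],                    # weights per hidden state
        "mu":    [[[0.0] * dim for _ in range(G)] for _ in range(Q)],  # mean vectors
        "sigma": [[[1.0] * dim for _ in range(G)] for _ in range(Q)],  # diagonal covariances
        "A":     [[1.0 / Q] * Q for _ in range(Q)],                    # state transition matrix
        "pi":    [1.0 / Q] * Q,                                        # initial state distribution
    }
```

Off-the-shelf implementations expose the same hyper-parameters; for example, hmmlearn's GMMHMM takes the number of hidden states (n_components) and the number of mixture components per state (n_mix).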
Step 5: intercepting a section of the real-time log sequence through the sliding window of step 3 as the log sequence to be predicted; converting the log sequence to be predicted into a sequence of clustered preprocessed log file types by the method of step 1, converting that type sequence into a feature vector sequence by the method of step 2, and predicting on the feature vector sequence with the optimized Gaussian mixture hidden Markov model of each specified fault to obtain a prediction result;
step 5, the specific method for predicting on the feature vector sequence with the optimized Gaussian mixture hidden Markov model of each specified fault from step 4 is as follows:
and taking the characteristic vector sequence as the input of a backward algorithm of each Gaussian mixture hidden Markov model to obtain the probability of the characteristic vector sequence appearing under each Gaussian mixture hidden Markov model:
PR={PR1,PR2,...,PRF}
wherein PR_f represents the probability, found by the backward algorithm, of the feature vector sequence occurring under the Gaussian mixture hidden Markov model of fault type f, f ∈ [1, F], and F denotes the number of specified fault types to be predicted.
Step 5, the specific method for obtaining the prediction result is as follows:
defining a threshold T = 0.76: if PR_result = {PR_t} is not empty, where PR_t > T, PR_t ∈ PR, t ∈ [1, F], and F denotes the number of all specified fault types to be predicted,
then the fault type corresponding to max{PR_result} is taken as the prediction result; otherwise the prediction result is no fault.
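The decision rule of step 5 (keep the model probabilities above the threshold T and take the type of the maximum, otherwise report no fault) can be sketched as follows; the function name and dictionary layout are illustrative assumptions.

```python
def predict_fault(probs, threshold=0.76):
    """Step-5 decision rule: drop model probabilities at or below the
    threshold; return the fault type with the highest remaining
    probability, or None when no model clears the threshold (no fault)."""
    above = {fault: p for fault, p in probs.items() if p > threshold}
    if not above:
        return None                   # prediction result: no fault
    return max(above, key=above.get)  # fault type with the highest probability
```

Here probs maps each fault type to the probability PR_f returned by its model's backward algorithm.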
Modeling transitions between system states and the log generation process using a GMM-HMM
The system is divided into a system state layer, a task layer, and an observation layer. The system state layer is also referred to as the hidden layer, in which a finite number of system states transition to one another with certain probabilities. When the system state changes, a series of system tasks is executed, forming the task layer. Logs are generated while the tasks execute, forming the observation layer. In the observation layer, the system log is processed by the data processing method into numerical vectors that a machine can understand. To simplify the problem, the invention ignores the task layer and assumes that the transition of the system state depends only on the state at the previous time, and that the log generated at the current time likewise depends only on the system state at the current time. This process is modeled by a hidden Markov model, and a Gaussian mixture density is further used as the emission distribution of the logs generated by each system state, i.e., a GMM is used to fit the probability distribution of the observed values in the HMM.
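Under these simplifying assumptions the generative process is a Markov chain with Gaussian emissions. The toy sketch below samples such a sequence, using scalar observations and a single Gaussian component per state for brevity; all names and parameter values are illustrative, not the patent's model.

```python
import random

def sample_gmm_hmm(pi, A, means, T, seed=0):
    """Sample T scalar observations from a toy HMM: the state at time t
    depends only on the state at t-1, and the observation at t depends
    only on the current state (a unit-variance Gaussian per state)."""
    rng = random.Random(seed)

    def draw(dist):  # sample an index from a discrete distribution
        u, acc = rng.random(), 0.0
        for i, p in enumerate(dist):
            acc += p
            if u < acc:
                return i
        return len(dist) - 1

    state, obs = draw(pi), []
    for _ in range(T):
        obs.append(rng.gauss(means[state], 1.0))  # emission from the current state
        state = draw(A[state])                    # Markov transition
    return obs
```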
In light of the above-described discussion,
FIG. 1 is a flow chart illustrating the overall process of the present invention from step 1 to step 5.
Figure 2 shows the three-layer structure of the system and the connection mode between the layers.
FIG. 3 is a detailed flow diagram illustrating the operation of the data preprocessing of FIG. 1.
FIG. 4 depicts the change in data during the process of converting the original log sequence into feature vectors.
The embodiment of the invention verifies the described method from three aspects.
First, verifying the influence of the number of system states and the number of Gaussian component models on the fault recognition rate. After setting the number of hidden states and the number of Gaussian component models, a model is trained with the data set of each fault. The training data are then fed back into the trained models, the probabilities are calculated, and each input is marked with the type of highest probability. The recognition rate equals the fraction of correctly marked data in the original data. Multiple tests are performed with different numbers of states and Gaussian component models, and the combination yielding the highest recognition rate is used as the parameter values in subsequent experiments.
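The recognition-rate computation described above can be sketched as follows: each training sample is marked with its highest-probability model and the correct marks are counted. The function name and score layout are illustrative assumptions.

```python
def recognition_rate(scores, labels):
    """scores[i][f] is the probability of training sample i under the
    model of fault type f; each sample is marked with the highest-scoring
    type, and the rate is the fraction of correctly marked samples."""
    correct = 0
    for sample_scores, label in zip(scores, labels):
        predicted = max(range(len(sample_scores)), key=lambda f: sample_scores[f])
        if predicted == label:
            correct += 1
    return correct / len(labels)
```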
Second, verifying whether the feature construction method solves the problem of log interleaving. The original log sequence is artificially interleaved over short ranges: the log data set is traversed in order and, for each log, a log is randomly chosen from its preceding 50 or following 50 logs and the two positions are swapped, thereby artificially creating more interleaved logs. A comparison experiment is performed on the artificially interleaved log data set and the original log data set: models are trained, faults are predicted, and the resulting accuracy, recall, and F-value are compared.
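The artificial interleaving step can be sketched as follows; the function name, the fixed seed, and the radius parameter (defaulting to the 50 logs mentioned above) are illustrative assumptions.

```python
import random

def interleave_logs(logs, radius=50, seed=0):
    """Walk the log sequence in order and swap each entry with a randomly
    chosen entry among the `radius` preceding or following positions,
    producing short-range interleaving for the comparison experiment."""
    rng = random.Random(seed)
    out = list(logs)  # leave the original sequence untouched
    for i in range(len(out)):
        lo = max(0, i - radius)
        hi = min(len(out) - 1, i + radius)
        j = rng.randint(lo, hi)
        out[i], out[j] = out[j], out[i]
    return out
```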
Third, comparing the prediction effect of the GMM-HMM model with other models. The GMM-HMM-based prediction method is compared with a prediction method based on random indexing and a support vector machine (RI-SVN), a prediction method based on a combined convolutional and long short-term memory network (CNN-LSTM), and a prediction method based on log event sequence clustering (Cluster), evaluating the accuracy, recall, F-value, training time, and prediction time of each method.
The data set selected for verification is the log data generated by the supercomputer Spirit during actual operation, which is publicly available on the Internet. The scale of the test data set is shown in Table 1. The data set is divided into training data and test data; a training data set for 3 faults is constructed by the log data preprocessing method and the data set construction method, and the three fault events with their corresponding seed log quantities are shown in Table 2.
TABLE 1 Spirit data set Specification
Figure BDA0003091764420000151
Table 2 Fault event seed logs

Description of failure event      Corresponding seed log quantity
drive error: SCSI port ID         52
writing message file              38
unknown service                   83
After setting the number of hidden states and the number of Gaussian component models, the model is trained and its recognition rate calculated. Different numbers of hidden states and Gaussian component models are set and multiple tests performed to find better values. The recognition rates corresponding to the different numbers of hidden states and Gaussian component models are shown in Table 3.
Table 3 Recognition rates for different numbers of hidden states and Gaussian component models
Figure BDA0003091764420000152
It can be seen from Table 3 that when the number of hidden states is held constant, the recognition rate increases with the number of Gaussian component models, because more component models fit the distribution more accurately. However, an excessive number of component models leads, on the one hand, to a large amount of data and, on the other hand, to increased computation and storage, imposing a heavy cost on the system and reducing algorithm efficiency. Fig. 5 is a heat map of the recognition rate as a function of the numbers of hidden states and component models. It can be observed that the recognition rate is highest when the number of hidden states is 3 and the number of Gaussian component models is 6 or more. To reduce computation and memory, the number of component models is chosen as 6.
The artificially interleaved logs are compared with the original logs: the model is trained using the best combination of hidden state number and Gaussian component model number found above, and the accuracy, recall, and F-value of the model are calculated.
As can be seen from Fig. 6, the accuracy, recall, and F-value of the model change little after the logs are artificially interleaved. This shows that the log data preprocessing method of the present invention can handle short-range interleaving of logs.
Combining the above experimental verifications and analyses, the method provided by the invention achieves, under well-chosen hyper-parameters and threshold, an accuracy exceeding 80% and a recall above 75%. Each evaluation index for predicting fault events is good, and the time and space complexity of the algorithm are acceptable in model training and prediction. The fault prediction method of the present invention is therefore also highly feasible in practice.
The fault prediction effect of the GMM-HMM method is compared and analysed against other fault prediction methods: the random indexing and support vector machine (RI-SVN) method, the combined convolutional and long short-term memory network (CNN-LSTM) method, and the log event sequence clustering (Cluster) method. Fig. 7 illustrates the differences in accuracy and recall between these methods and the method of the present invention; Fig. 8 illustrates the differences in training time and prediction time. As can be seen from Fig. 7, the prediction effect of the GMM-HMM fault prediction model is second only to the deep learning method CNN-LSTM and is superior to statistical learning methods such as RI-SVN and Cluster. This is because the method of the present invention improves the log data preprocessing step, solves the log interleaving problem, and preserves a certain amount of redundant data according to the log distribution characteristics before a failure occurs; this improvement of the data set improves the prediction effect. CNN-LSTM predicts well because the CNN can read data locally and feed it directly to the LSTM for analysis, which also mitigates local interleaving of logs. However, the neural network has more complex structural parameters and requires a larger amount of data, and therefore consumes more computation time and resources. As can be seen from Fig. 8, the training time of CNN-LSTM is much longer than that of GMM-HMM, and its prediction time is also longer, while the statistical learning methods do not differ much from one another in overall efficiency. Considering both efficiency and effect, the GMM-HMM fault prediction method provided by the invention therefore has clear advantages.
It should be understood that parts of the specification not set forth in detail are well within the prior art.
It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (6)

1. A system fault prediction method based on a Gaussian mixture hidden Markov model is characterized by comprising the following steps:
step 1: preprocessing an original log file data set to obtain a preprocessed log file data set, extracting a plurality of keywords from each preprocessed log file of the preprocessed log file data set by a keyword extraction method, constructing a word frequency matrix from the keywords, clustering the preprocessed log files of the preprocessed log file data set with the word frequency matrix by an agglomerative hierarchical clustering method, and manually marking the type of each clustered preprocessed log file;
step 2: extracting the characteristic of the type of each clustered preprocessed log file, further constructing a characteristic vector of the type of each clustered preprocessed log file, and arranging the characteristic vectors of the types of all clustered preprocessed log files according to the sequence of original log files to obtain a characteristic vector data set;
step 3: locating all occurrence positions of each specified fault on the feature vector data set, locating a start position and a stop position of a sliding window on the feature vector data set, intercepting the feature vector sequence within the sliding window from the start position and putting it into the specified fault data set, moving the sliding window backwards by one sliding step and continuing to intercept the feature vector sequence within the sliding window into the specified fault data set until the sliding window reaches or exceeds the stop position, the intercepted feature vector sequences forming the specified fault data set;
step 4: respectively setting the hyper-parameters of the Gaussian mixture hidden Markov model of each specified fault to be predicted, taking the data set of each specified fault as the input of the training algorithm of the Gaussian mixture hidden Markov model, and optimizing the parameters to be estimated of the Gaussian mixture hidden Markov model of each specified fault through the training algorithm, so as to obtain the optimized parameters and construct the optimized Gaussian mixture hidden Markov prediction model of each specified fault;
step 5: intercepting a section of the real-time log sequence through the sliding window of step 3 as the log sequence to be predicted; converting the log sequence to be predicted into a sequence of clustered preprocessed log file types by the method of step 1, converting that type sequence into a feature vector sequence by the method of step 2, and predicting on the feature vector sequence with the optimized Gaussian mixture hidden Markov model of each specified fault to obtain a prediction result.
2. The method for predicting system faults based on the Gaussian mixture hidden Markov model according to claim 1, wherein the preprocessing in the step 1 is as follows: cleaning meaningless parameters and filtering redundant logs on the original log file data set to obtain a preprocessed log file data set;
step 1, the clustered preprocessed log files, with the specific formula symbols defined as:

{l_{j,1}, l_{j,2}, ..., l_{j,N_j}}

wherein l_{j,i} represents the i-th log file in the preprocessed log file data set at the j-th acquisition time, N_j represents the total number of log files in the preprocessed log file data set at the j-th acquisition time, i ∈ [1, N_j], j ∈ [1, K], and K represents the number of acquisition times;
step 1, the type of each clustered preprocessed log file, with the specific formula symbols defined as:

type_j = {e_{j,1}, e_{j,2}, ..., e_{j,N_j}}

wherein e_{j,i} represents the type of the i-th log file in the preprocessed log file data set at the j-th acquisition time, N_j represents the total number of log files in the preprocessed log file data set at the j-th acquisition time, i ∈ [1, N_j], j ∈ [1, K], and K represents the number of acquisition times.
3. The method for predicting system faults based on the Gaussian mixture hidden Markov model according to claim 1, wherein the step 2 is to extract the characteristics of the type of each pre-processed log file after clustering, and the specific calculation method is as follows:
Figure FDA0003541752710000023
wherein v_j^m indicates the frequency of occurrence of the log file type m in type_j; F_m represents the number of sets in {type_1, type_2, ..., type_K} for which m ∈ type_i holds, i ∈ [1, K]; N_j represents the total number of log files in the preprocessed log file data set at the j-th acquisition time; m ∈ [1, M], j ∈ [1, K], M represents the total number of types of clustered log files in step 1, and K represents the number of acquisition times;
step 2, the feature vector of each set of clustered preprocessed log file types is defined as:

v_j = (v_j^1, v_j^2, ..., v_j^M)

wherein the component v_j^m indicates the feature value of the log file type m in type_j, m ∈ [1, M], j ∈ [1, K], M represents the total number of types of clustered log files in step 1, and K represents the number of acquisition times;
step 2, the feature vector data set, with the specific formula symbol defined as:

V = {v_1, v_2, ..., v_K}

wherein v_j represents the feature vector extracted from type_j, j ∈ [1, K].
4. The method for predicting system faults based on the Gaussian mixture hidden Markov model according to claim 1, wherein the step 3 locates all occurrence positions of the specified faults on the characteristic vector data set by a specific method comprising the following steps:
searching and positioning a position d where the specified fault f appears in an original log file through a keyword of the specified fault f, and recording an index;
f ∈ [1, F], d ∈ [1, D_f], F denotes the number of all specified fault types to be predicted, and D_f represents the number of occurrence positions of the specified fault type f to be predicted;
positioning, through the recorded indexes, the index j_f of the acquisition time of the specified fault f in the preprocessed log file data set, j_f ∈ [1, D_f], D_f representing the number of occurrence positions of the specified fault type f to be predicted;
on the feature vector data set, locating the position with index j_f, the total number of located positions being N = Σ_{f=1}^{F} D_f;
step 3, the sliding window, with the specific formula symbols defined as:

window = (v_{r,1}, v_{r,2}, ..., v_{r,L_w})

wherein v_{r,z} indicates the z-th feature vector in the vector sequence intercepted for the r-th time before the d-th located position of the specified fault on the feature vector data set,

f ∈ [1, F], d ∈ [1, D_f], r ∈ [1, S_d], z ∈ [1, L_w],

F denotes the number of all specified fault types to be predicted, D_f represents the number of positions where the specified fault type f occurs, S_d represents the number of truncations made before the d-th position on the feature vector data set, d ∈ [1, N], N denotes the number of specified fault positions on the feature vector sequence, and L_w represents the number of feature vectors contained in the sliding window;
step 3, the sliding step, with the specific formula symbol defined as:

step = (v_1, v_2, ..., v_{L_s})

wherein v_v denotes the v-th feature vector contained in the sliding step, v ∈ [1, L_s], and L_s represents the number of feature vectors contained in the sliding step;
step 3, the specified fault data set, with the specific formula symbols defined as:

Data_f = {data_f^1, data_f^2, ..., data_f^{D_f}}

wherein data_f^d = {data_{f,d}^1, data_{f,d}^2, ..., data_{f,d}^{S_d}} represents all feature vector sequence segments intercepted before the d-th located position of the specified fault f, and data_{f,d}^r represents the r-th intercepted feature vector segment before the d-th located position of the specified fault f,

f ∈ [1, F], d ∈ [1, D_f], r ∈ [1, S_d],

where F represents the number of all specified fault types to be predicted, D_f represents the number of positions where the specified fault type f occurs, and S_d indicates the number of truncations made before the d-th position on the feature vector data set.
5. The Gaussian mixture hidden Markov model-based system failure prediction method of claim 4,
step 4, the hyper-parameters of the Gaussian mixture hidden Markov model of each specified fault comprise: the number of hidden states and the number of Gaussian component models in the hidden Markov model;
the number of hidden states in the hidden Markov model is Q_f, where f denotes the specified fault type, f ∈ [1, F], and F represents the number of all specified fault types to be predicted;
the number of Gaussian component models is G_f, where f denotes the specified fault type, f ∈ [1, F], and F represents the number of all specified fault types to be predicted;
step 4, the parameters to be estimated of the Gaussian mixture hidden Markov prediction model of each specified fault comprise: the weights of the Gaussian component models, the mean vectors of the Gaussian component models, the covariance matrices of the Gaussian component models, the state transition probability matrix, and the initial state probability vector;
the weight of a Gaussian component model is c_{p,g}^f, representing the weight of Gaussian component model g in the Gaussian mixture model corresponding to the hidden state p of the specified fault f;

the mean vector of a Gaussian component model is μ_{p,g}^f, representing the mean vector of Gaussian component model g in the Gaussian mixture model corresponding to the hidden state p of the specified fault f;

the covariance matrix of a Gaussian component model is Σ_{p,g}^f, representing the covariance matrix of Gaussian component model g in the Gaussian mixture model corresponding to the hidden state p of the specified fault f;

the state transition probability matrix is A^f = [a_{p,q}^f], wherein a_{p,q}^f represents the probability of the hidden state p of the specified fault f transitioning to the hidden state q;

the initial state probability vector is π^f = (π_1^f, π_2^f, ..., π_{Q_f}^f), wherein π_p^f indicates the probability of occurrence of the hidden state p at the initial moment of the specified fault f, and the components of π^f sum to 1;
wherein,
g represents a Gaussian component model, g ∈ [1, G_f], G_f representing the number of Gaussian component models corresponding to the hidden states of the specified fault f;
f represents the specified fault type, f ∈ [1, F], and F represents the number of all specified fault types to be predicted;
p represents the hidden state at the current time, p ∈ [1, Q_f], Q_f representing the number of hidden states of the specified fault type f;
q represents the hidden state at the next time, q ∈ [1, Q_f], Q_f representing the number of hidden states of the specified fault type f.
6. The method for predicting system faults based on the Gaussian mixture hidden Markov model according to claim 1, wherein the specific method for predicting the characteristic vector sequence by the optimized Gaussian mixture hidden Markov model for each fault in step 4 is as follows:
and taking the characteristic vector sequence as the input of a backward algorithm of each Gaussian mixture hidden Markov model to obtain the probability of the characteristic vector sequence appearing under each Gaussian mixture hidden Markov model:
PR={PR1,PR2,...,PRF}
wherein PR_f represents the probability, found by the backward algorithm, of the feature vector sequence occurring under the Gaussian mixture hidden Markov model of fault type f, f ∈ [1, F], and F represents the number of all specified fault types to be predicted;
step 5, the specific method for obtaining the prediction result is as follows:
defining a threshold T: if PR_result = {PR_t} is not empty, where PR_t > T, PR_t ∈ PR, t ∈ [1, F], and F denotes the number of all specified fault types to be predicted,
then the fault type corresponding to max{PR_result} is taken as the prediction result; otherwise the prediction result is no fault.
CN202110597641.XA 2021-05-31 2021-05-31 System fault prediction method based on Gaussian mixture hidden Markov model Active CN113342597B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110597641.XA CN113342597B (en) 2021-05-31 2021-05-31 System fault prediction method based on Gaussian mixture hidden Markov model


Publications (2)

Publication Number Publication Date
CN113342597A CN113342597A (en) 2021-09-03
CN113342597B true CN113342597B (en) 2022-04-29

Family

ID=77472623


Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113704075B (en) * 2021-09-23 2022-09-02 中国人民解放军国防科技大学 Fault log-based high-performance computing system fault prediction method
CN114816962B (en) * 2022-06-27 2022-11-04 南京争锋信息科技有限公司 ATTENTION-LSTM-based network fault prediction method
CN116150636B (en) * 2023-04-18 2023-07-07 苏州上舜精密工业科技有限公司 Fault monitoring method and system for transmission module




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant