CN113341919A - Computing system fault prediction method based on time sequence data length optimization - Google Patents
- Publication number
- CN113341919A (application number CN202110601375.3A)
- Authority
- CN
- China
- Prior art keywords
- model
- prediction
- data
- length
- precision
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B23/00—Testing or monitoring of control systems or parts thereof
- G05B23/02—Electric testing or monitoring
- G05B23/0205—Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults
- G05B23/0218—Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults characterised by the fault detection method dealing with either existing or incipient faults
- G05B23/0243—Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults characterised by the fault detection method dealing with either existing or incipient faults model based detection method, e.g. first-principles knowledge model
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B2219/00—Program-control systems
- G05B2219/20—Pc systems
- G05B2219/24—Pc safety
- G05B2219/24065—Real time diagnostics
Landscapes
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Automation & Control Theory (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention relates to a computing system fault prediction method based on time series data length optimization, and belongs to the field of fault detection. The method comprises the following steps. S1, offline training: based on historical system operation data, the data are sliced using different sequence lengths and a fault prediction model is constructed for each length; the sequence data length with the best prediction precision, together with its fault prediction model, is found using a binary-search approach. S2, online prediction: the optimal sequence data length produced by offline training is used for real-time fault prediction. S3, model updating: while the system runs continuously, the prediction precision of the model is computed in real time from actual data, and the fault prediction model parameters or the sequence data length are updated when the precision declines. The invention improves the precision of the fault prediction model, reduces the number of model training runs needed during the optimal length search, and improves the adaptability of the prediction model to changes in the system and environment.
Description
Technical Field
The invention belongs to the field of fault detection, and relates to a computing system fault prediction method based on time sequence data length optimization.
Background
Because computing systems are widely used across industries, unexpected system faults can have a large impact, and maintaining system reliability is crucial to keeping computing systems running continuously. However, a computing system usually consists of many different components, such as hardware processors, software modules, databases and network systems. The failure behavior of these components is unknown, and their relationships are complex and mutually dependent, so accurate fault analysis from the internal structure of the system is difficult. At present, the main way to maintain the reliability of a computing system is to take a system-level view: monitor the state or quality of the system using logs, probes and similar means, and use the monitoring data to evaluate and predict faults for the software modules and the underlying hardware as a whole.
The monitoring data of a computing system are both periodic and random, and continuous monitoring data with one or more attributes are the main basis for predicting and classifying system faults. Because of the technical constraints of newer computing technologies such as cloud platforms and microservices, or the limited computing resources of real-time embedded computing systems such as unmanned aerial vehicle flight control systems, such systems often hide their underlying hardware architecture, and even their software module composition, from the outside. Meanwhile, because the relationships among data attributes are complex, it is difficult to build a mathematical model of system state changes from the way the data attributes vary. Therefore, numerical analysis methods based on time series data, such as Bayesian analysis, machine learning and deep learning, are widely applied to fault prediction not only for computing systems but also in fields such as aerospace and intelligent manufacturing.
In the prior art, historical monitoring data are analyzed and data features are extracted to obtain the known or unknown patterns that the monitoring data exhibit before a historical fault occurs; by comparing these with the features of the current monitoring data, it is possible to predict whether a fault is about to occur and to determine the fault type. For fault analysis of computing systems, existing patents detect or predict faults using statistical analysis methods such as Bayesian models and ARIMA time series models, machine learning methods such as support vector machines and XGBoost, and deep learning methods based on deep neural network models such as LSTM, CNN and GRU. Compared with the other methods, deep learning can improve the accuracy and precision of system fault prediction and classification, but the models are usually built on time series data sets of fixed or variable length. Data acquisition by a computing system in a real environment is a long-running process, and the acquired data are continuous data that change over time. To generate a time series data set, the prior art discusses methods for slicing continuous data or for acquiring fixed-length data at fixed intervals, but does not discuss how to select the sequence data length or data slice length, which leads to the following problems:
(1) When predicting faults over different time horizons, the length of the time series data has a pronounced effect on the precision of the prediction model. In the model training phase of a real system, the model may have to be trained many times with sequence data of different lengths in order to compare model precision. Existing fault prediction algorithms generally do not consider how to select the length of the time series data used for training, or the effect of that length, so they are of limited practical use and cannot guarantee the performance of the algorithm in the training stage.
(2) During system operation, the patterns in the data may change over time, so a fault prediction model trained on historical data may not remain valid for long and needs to be updated dynamically. Although the prior art discusses dynamic training and updating of models, it does not further discuss changing the length of the time series data used for training.
In view of these shortcomings, a fault prediction method is needed that improves the precision of the fault prediction model, reduces the number of model training runs, and lets the fault prediction model adapt better to system changes.
Disclosure of Invention
In view of the above, the present invention provides a computing system fault prediction method based on time series data length optimization. Following a binary-search approach, it uses the precision of fault prediction models trained with different sequence data lengths as the evaluation index for comparing those lengths, so as to select the optimal time series data length for a specific fault prediction problem. In addition, by dynamically adjusting the time series data length, the model prediction precision is estimated and maintained in real time.
In order to achieve the purpose, the invention provides the following technical scheme:
A computing system fault prediction method based on time series data length optimization comprises three processes: offline training, online prediction and model updating. The offline training process selects the optimal sequence length and trains the model; the online prediction process uses the model trained offline for fault prediction and system control; and the model updating process checks the model during system operation and feeds back updates. As shown in FIG. 1, offline training must be performed before online prediction, and model updating can run concurrently with online prediction. The method comprises the following steps:
S1: offline training;
Based on historical system operation data, the data are sliced using different sequence lengths and a fault prediction model is constructed for each length; the sequence data length with the best prediction precision, together with its fault prediction model, is found using a binary-search approach;
S2: online prediction;
The optimal sequence data length produced by offline training is used for real-time fault prediction;
S3: model updating;
While the system runs continuously, the prediction precision of the model is computed in real time from actual data, and the fault prediction model parameters or the sequence data length are updated when the precision declines.
Further, in step S1, the offline training process (shown in FIG. 2) specifically comprises the following steps:
S11: Prediction period selection: determine the fault prediction period $n$ according to the characteristics of the computing system and the project requirements, that is, predict the probability that a certain type of fault occurs in the system after time $n$; query whether a stored tuple exists containing a trained model $f_w$ associated with $n$ and its optimal sequence data length $m_w$; if so, set the initial input data sequence length to be searched, $m_0$, to the previously recorded value $m_w$; otherwise, set the initial search length $m_0$ to the prediction period $n$;
S12: Initial search set: set the median of the set of sequence data lengths to be searched to $m_0$, the lower bound to $m_1 = m_0/2$ and the upper bound to $m_2 = 2m_0$, establishing the set of sequence data lengths to be searched $M = \{m_0, m_1, m_2\}$;
S13: Model training and evaluation: for each value $m_j \in M$ ($0 \le j \le 2$), if the trained model set $F$ does not contain a fault prediction model $f_j$ and prediction model precision $p_j$ corresponding to $m_j$, train a prediction model and evaluate its precision;
S14: Optimal sequence data length search: based on the sequence data length set $M$, the prediction models built from it and their precisions $p_j$, search for the data sequence length with the best prediction precision:
If $m_2 - m_1 \le 2$, end the search and execute the optimal result storage step;
If $m_2 - m_1 > 2$, regenerate the elements $m_j$ of the search set $M$ according to the following rules:
If $p_0 \ge p_1 \ge p_2$, continue searching in the interval $[m_1, m_0]$, resetting the median, lower bound and upper bound of the set to $m_0' = (m_0 + m_1)/2$, $m_1' = m_1$, $m_2' = m_0$;
If $p_0 \ge p_2 \ge p_1$, or $p_0 \ge p_1$ and $p_2 - p_0 \le \delta$, continue searching in the interval $[m_0, m_2]$, resetting the median, lower bound and upper bound of the set to $m_0' = (m_0 + m_2)/2$, $m_1' = m_0$, $m_2' = m_2$;
If $p_1 \ge p_2 \ge p_0$ or $p_1 \ge p_0 \ge p_2$, search in the direction of decreasing $m_1$, resetting the median, lower bound and upper bound of the set to $m_0' = m_1$, $m_1' = m_1/2$, $m_2' = m_0$;
If $p_2 \ge p_1 \ge p_0$, or $p_2 \ge p_0 \ge p_1$ and $p_2 - p_0 > \delta$, search in the direction of increasing $m_2$, resetting the median, lower bound and upper bound of the set to $m_0' = m_2$, $m_1' = m_0$, $m_2' = 2m_2$;
S15: Update the set to be searched: store the $p_0$, $p_1$, $p_2$ generated during the search in the model precision set $P$, update the set to be searched to $M = \{m_0', m_1', m_2'\}$, and return to the model training and evaluation of step S13;
S16: Store the optimal search result: compare all model precisions in the model precision set $P$, select the $k$ highest precision values $p_v \in P$ and calculate their average $p_A = \frac{1}{k}\sum_{v=1}^{k} p_v$; take the highest prediction precision in $P$, $p_w = \max\{p_v \mid p_v \in P\}$, and obtain the corresponding model $f_w$ and data length $m_w$; store the fault prediction period $n$, the prediction precision $p_w$, the prediction model $f_w$, the sequence data length $m_w$ and the average prediction precision $p_A$ as a tuple in the pre-trained model library.
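The search loop of steps S12 to S15 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: `evaluate` is a hypothetical callback standing in for "train a model on slices of length m and return its precision", the default value of `delta` and the `max_rounds` guard are not given in the text, and where two rules overlap they are tried in the order listed.

```python
from typing import Callable, Dict, Tuple


def search_optimal_length(
    evaluate: Callable[[float], float],  # hypothetical: trains f_j for length m, returns p_j
    m_start: float,                      # m_0: stored m_w if available, else the period n
    delta: float = 0.01,                 # tolerance delta; this default is an assumption
    max_rounds: int = 50,                # safety guard, not part of the described method
) -> Tuple[float, float, Dict[float, float]]:
    """Return (best length m_w, best precision p_w, all evaluated precisions)."""
    cache: Dict[float, float] = {}       # plays the role of the sets F and P

    def precision(m: float) -> float:
        if m not in cache:               # S13: train only for lengths not seen before
            cache[m] = evaluate(m)
        return cache[m]

    # S12: median, lower bound, upper bound
    m0, m1, m2 = m_start, m_start / 2, 2 * m_start
    for _ in range(max_rounds):
        p0, p1, p2 = precision(m0), precision(m1), precision(m2)
        if m2 - m1 <= 2:                 # S14: stop condition
            break
        if p0 >= p1 >= p2:               # continue in [m1, m0]
            m0, m1, m2 = (m0 + m1) / 2, m1, m0
        elif p0 >= p2 >= p1 or (p0 >= p1 and p2 - p0 <= delta):
            m0, m1, m2 = (m0 + m2) / 2, m0, m2      # continue in [m0, m2]
        elif p1 >= p2 >= p0 or p1 >= p0 >= p2:
            m0, m1, m2 = m1, m1 / 2, m0             # move toward smaller lengths
        else:                            # p2 is clearly best: move toward larger lengths
            m0, m1, m2 = m2, m0, 2 * m2

    best = max(cache, key=cache.get)     # S16: length with the highest precision
    return best, cache[best], cache
```

Because every evaluated length is cached, a length that reappears in a later search set does not trigger another training run, which is where the claimed reduction in training runs would come from.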
Further, in step S13, if the trained model set $F$ does not contain a fault prediction model $f_j$ and prediction model precision $p_j$ corresponding to $m_j$, the prediction model is trained and its precision evaluated by the following steps:
S131: Data set generation: slice the continuous monitoring data into multiple groups of sequence data of length $m_j$; use the probability $y$ that a specific fault occurs in the system at time $n$ after each group of sequence data as the label of that group; and randomly divide the labeled sequence data into a training data set $S_j$ and a test data set $T_j$;
S132: Fault prediction model training: train a time-series deep learning neural network model such as LSTM or GRU on the training data set $S_j$ to obtain the parameters of the fault prediction model $f_j$; the input variable of model $f_j$ is sequence data of length $m_j$, and the output variable is the probability that a specific type of fault occurs after time $n$; add model $f_j$ to the trained model set $F$;
S133: Model precision evaluation: use the prediction model $f_j$ to predict on the sequence data in the test data set $T_j$, and compare the predicted fault probability $\hat{y}$ with the actual fault probability $y$ to evaluate the model precision $p_j$.
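As a rough illustration of the data set generation in step S131, the following Python sketch slices a recorded series into overlapping windows and labels each window with the fault indicator observed n steps after the window ends. The function name `make_dataset`, the one-sample stride, the `test_ratio` split and the fixed random seed are assumptions made for the example, not details from the text.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[list, float]  # (window of length m, label y)


def make_dataset(
    series: Sequence,          # continuous monitoring data, one entry per time step
    faults: Sequence[float],   # fault probability or indicator per time step
    m: int,                    # sequence data length m_j
    n: int,                    # prediction period, in time steps
    test_ratio: float = 0.2,   # assumed split ratio
    seed: int = 0,             # assumed seed, for a reproducible shuffle
) -> Tuple[List[Sample], List[Sample]]:
    """Return (training set S_j, test set T_j)."""
    samples: List[Sample] = []
    # The last usable window must leave room for a label n steps after its end.
    for start in range(len(series) - m - n + 1):
        window = list(series[start:start + m])
        label = faults[start + m - 1 + n]
        samples.append((window, label))
    random.Random(seed).shuffle(samples)   # random division into S_j and T_j
    n_test = int(len(samples) * test_ratio)
    return samples[n_test:], samples[:n_test]
```

A different stride or a time-ordered split may suit a real system better; the text only states that the labeled sequence data are divided randomly.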
Further, in step S2, the optimal sequence data length generated by offline training is used in the real-time fault prediction process (shown in FIG. 3), which specifically comprises the following steps:
S21: Model lookup: according to the fault prediction period $n$, query whether a trained model $f_w$ associated with $n$ exists; if not, wait for the offline training process to be executed; if so, execute the real-time fault prediction step;
S22: Real-time fault prediction: continuously extract sequence data of length $m_w$ from the latest data and input it into the model $f_w$ to obtain the predicted probability $\hat{y}$ of each type of fault occurring in the system after time $n$; if the probability of a specific fault is greater than or equal to the system maintenance probability threshold, execute the corresponding system maintenance strategy and return to step S21; otherwise, repeat this step.
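The real-time prediction of step S22 could be sketched as a rolling window over incoming samples, as below. The names `run_online_prediction` and `model`, and the dictionary of per-fault probabilities, are hypothetical; the sketch only reports which faults reach the threshold and leaves the maintenance strategy and the return to S21 to the caller.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Optional, Sequence

# Hypothetical model interface: takes the latest m_w samples, returns
# a mapping from fault name to predicted probability.
Model = Callable[[List[Sequence[float]]], Dict[str, float]]


def run_online_prediction(
    stream: Iterable[Sequence[float]],  # incoming monitoring samples
    model: Model,                       # stands in for the stored model f_w
    m_w: int,                           # optimal sequence length, in samples
    threshold: float,                   # system maintenance probability threshold
) -> List[Optional[Dict[str, float]]]:
    """For each sample, return the faults at or above the threshold (None while warming up)."""
    window: deque = deque(maxlen=m_w)   # keeps only the latest m_w samples
    results: List[Optional[Dict[str, float]]] = []
    for sample in stream:
        window.append(sample)
        if len(window) < m_w:           # not enough data for a full window yet
            results.append(None)
            continue
        probabilities = model(list(window))
        alarms = {name: p for name, p in probabilities.items() if p >= threshold}
        results.append(alarms)          # a non-empty dict would trigger maintenance
    return results
```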
Further, in step S3, the model updating process (shown in FIG. 4) specifically comprises the following steps:
S31: Real-time data set update: extract sequence data of length $m_w$ from the latest operation data, together with the probability $y_w$ that a specific fault occurred in the system at time $n$ after each group of sequence data, and update the training data set $S_w$ and the test data set $T_w$;
S32: Real-time model evaluation: after the system has run continuously for a time $t$, use the prediction model $f_w$ to predict on the sequence data in the test data set $T_w$ and evaluate the model precision $p_w'$;
Using a scaling factor $x < 1$: if $p_w' \ge x p_w$, return to step S31 and continue updating the data set;
If $x p_A < p_w' < x p_w$, go to step S33;
If $p_w' < x p_A$, specify $m_w$ as the initial sequence search length, re-execute the offline training process to search for a new optimal sequence length and prediction model, and return to step S31 to continue updating the data set;
S33: Model update: using the updated test and training data sets, and without changing $m_w$, retrain the parameters of model $f_w$ with a time-series deep learning neural network such as LSTM or GRU; return to step S31 and continue updating the data set.
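The three-way decision in step S32 can be written as a small function. This is a sketch only: the action names are made up for illustration, and the case where the new precision exactly equals $x p_A$, which the text leaves open, is assigned here to the re-search branch as an assumption.

```python
def update_action(p_new: float, p_w: float, p_a: float, x: float) -> str:
    """Decide what to do after re-evaluating the model on the latest test set.

    p_new: current precision p_w' on the updated test data
    p_w:   best precision stored at offline training time
    p_a:   average of the top-k precisions stored at offline training time
    x:     scaling factor, expected to be below 1
    """
    if not 0 < x < 1:
        raise ValueError("x is expected to satisfy 0 < x < 1")
    if p_new >= x * p_w:
        return "keep_model"            # back to S31, keep collecting data
    if p_new > x * p_a:
        return "retrain_parameters"    # S33: same length m_w, new parameters
    # Precision dropped below x * p_A (equality is treated the same way here,
    # which is an assumption): search again, starting from the current m_w.
    return "research_length"
```

For example, with $p_w = 0.9$, $p_A = 0.8$ and $x = 0.9$, a new precision of 0.78 falls between 0.72 and 0.81 and leads to parameter retraining only.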
The invention has the beneficial effects that:
1) The invention provides a mechanism for selecting the optimal data sequence length for fault prediction methods based on time series data, which improves the precision of the fault prediction model and reduces the number of model training runs needed during the optimal length search.
2) During system operation, the invention provides a mechanism for dynamically changing the optimal data sequence length, which improves the adaptability of the prediction model to changes in the system and environment.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a block diagram of a method for computing system fault prediction based on time series data length optimization in accordance with the present invention;
FIG. 2 is a flow chart of the off-line training process steps in the method of the present invention;
FIG. 3 is a flow chart of the on-line prediction process steps in the method of the present invention;
FIG. 4 is a flowchart illustrating the steps of the model update process of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.
Referring to FIGS. 1 to 4, this embodiment takes fault prediction for the flight control system of a multi-rotor unmanned aerial vehicle as an example to describe the implementation steps of the fault prediction method of the invention. The flight control system is one of the key core systems of a multi-rotor unmanned aerial vehicle; it acquires information from sensors such as the angular velocity sensor, attitude sensor, altitude and airspeed sensor and position sensor, and implements flight management, attitude control and on-demand flight of the vehicle. Errors in the flight control system and its associated modules can have serious consequences during flight. However, the flight control system is generally implemented as an onboard embedded system, and the complexity of the associated modules and the limited resources restrict online fault tracing and elimination, so system fault prediction based on real-time operation data is crucial.
In this embodiment, fault prediction mainly uses 18 time-dependent sequence attributes: sensor information continuously acquired by the flight control system, such as gyroscope, accelerometer, barometer and GPS data; real-time software and hardware information during operation of the flight control system, such as CPU usage, memory usage and IO throughput; and software logs from the flight control process, such as flight status, flight time and flight distance. Depending on the attribute, the data acquisition frequency is 0.2 to 5 Hz. The fault/state types of the flight control system fall into 4 classes: GPS positioning fault, control instruction delay, unknown error and normal operation. To ensure that system faults can be handled in time, the fault prediction frequency is 1 Hz and the prediction period is no more than 5 seconds.
Following the method of the invention, three algorithm modules are first implemented: model training, optimal sequence length search and model updating:
(1) Model training module: given a data sequence length $m_j$ and a fault prediction period $n$, generate a fault prediction model and its model precision value.
① Data set generation: slice the continuous monitoring data into multiple $m_j \times 18$ data matrices, and use the probabilities $[y_1, y_2, y_3, y_4]$ that each type of fault/state occurs in the system 5 seconds after each matrix as the data label, where $y_4$ is the probability that the system is operating normally. Divide the labeled matrix data into a training data set $S_j$ and a test data set $T_j$.
② Fault prediction model training: train a time-series deep learning neural network such as LSTM or GRU on the training data set $S_j$ to obtain the parameters of the fault prediction model $f_j$. Add model $f_j$ to the trained model set $F$; the input variable of $f_j$ is an $m_j \times 18$ data matrix, and the output variable is the probability of each specific type of fault occurring after 5 seconds.
③ Model precision evaluation: use the prediction model $f_j$ to predict on the sequence data in the test data set $T_j$, compare the predicted fault probabilities $\hat{y}_i$ with the actual fault probabilities $y_i$ ($1 \le i \le 4$), and evaluate the model precision $p_j$. For the multi-class probability predictions, the average of the MAE (mean absolute error) and the RMSE (root mean square error) is used as the model precision evaluation index.
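A minimal sketch of the MAE/RMSE evaluation index is given below, using the standard definitions of the two error measures over all predicted class probabilities. The function name `mae_rmse_score` is hypothetical, and how the averaged error is turned into the precision value $p_j$ is not reproduced in the text, so the sketch returns the error measures only.

```python
import math
from typing import Sequence, Tuple


def mae_rmse_score(
    predicted: Sequence[Sequence[float]],  # one row [y1_hat, ..., y4_hat] per test sample
    actual: Sequence[Sequence[float]],     # one row [y1, ..., y4] per test sample
) -> Tuple[float, float, float]:
    """Return (MAE, RMSE, average of the two) over all samples and classes."""
    errors = [
        p - a
        for p_row, a_row in zip(predicted, actual)
        for p, a in zip(p_row, a_row)
    ]
    if not errors:
        raise ValueError("no predictions to evaluate")
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse, (mae + rmse) / 2
```

Since lower error means a better model while the search rules compare precisions where higher is better, a real implementation would need some monotone mapping from this error to $p_j$; which mapping is used cannot be determined from the text.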
(2) Optimal sequence length search module: given an initial data sequence length $m_w$ and a fault prediction period $n$, search for the data sequence length with the best prediction precision.
① Search initialization: set the initial search count $i = 0$; if a sequence data length $m_w$ is given, set the initial input sequence data length to be searched to $m_0 = m_w$; otherwise set $m_0 = n$. Establish the set of sequence data lengths to be searched $M_i = \{m_0, m_1, m_2\}$, where the lower bound is $m_1 = m_0/2$ and the upper bound is $m_2 = 2m_0$.
② Invoke the model training module: for each sequence length value $m_j \in M_i$ ($0 \le j \le 2$), call the model training module to generate the corresponding fault prediction model $f_j$ and obtain the prediction model precision $p_j$.
③ Generate the subsequent search set: compare the precision indices $p_j$ of the prediction models built from sequence data of different lengths, and generate the set of data lengths for the subsequent search:
If $m_2 - m_1 \le 2$, the search is finished; execute step ⑤.
If $m_2 - m_1 > 2$, regenerate the elements $m_j$ of the search set $M_i$ according to the following rules:
If $p_0 \ge p_1 \ge p_2$, continue searching in the interval $[m_1, m_0]$, setting $m_0' = (m_0 + m_1)/2$, $m_1' = m_1$, $m_2' = m_0$;
If $p_0 \ge p_2 \ge p_1$, or $p_0 \ge p_1$ and $p_2 - p_0 \le \delta$, continue searching in the interval $[m_0, m_2]$, setting $m_0' = (m_0 + m_2)/2$, $m_1' = m_0$, $m_2' = m_2$;
If $p_1 \ge p_2 \ge p_0$ or $p_1 \ge p_0 \ge p_2$, search in the direction of decreasing $m_1$, setting $m_0' = m_1$, $m_1' = m_1/2$, $m_2' = m_0$;
If $p_2 \ge p_1 \ge p_0$, or $p_2 \ge p_0 \ge p_1$ and $p_2 - p_0 > \delta$, search in the direction of increasing $m_2$, setting $m_0' = m_2$, $m_1' = m_0$, $m_2' = 2m_2$;
④ Update the set to be searched: store $p_0$, $p_1$, $p_2$ in the model precision set $P$, update the search count $i = i + 1$, update the set to be searched $M_i = \{m_0', m_1', m_2'\}$, and return to step ②.
⑤ Store the optimal search result: compare all model precisions stored in the set $P$, select the $k$ highest precision values $p_v \in P$ and calculate their average $p_A = \frac{1}{k}\sum_{v=1}^{k} p_v$; take the highest prediction precision in $P$, $p_w = \max\{p_v \mid p_v \in P\}$, and obtain the corresponding model $f_w$ and data length $m_w$; store the fault prediction period $n$, the prediction precision $p_w$, the prediction model $f_w$, the sequence data length $m_w$ and the average prediction precision $p_A$ as a tuple in the pre-trained model library.
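The result-storage step can be sketched as follows: given the precision recorded for each searched length, compute the top-k average $p_A$, pick the best length $m_w$, and bundle the values into one record. The `ModelRecord` class, the `model_ids` mapping (a stand-in for the trained model objects) and the function name are illustrative assumptions.

```python
import heapq
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ModelRecord:
    """One tuple of the pre-trained model library: (n, p_w, f_w, m_w, p_A)."""
    n: float          # fault prediction period
    p_w: float        # highest precision found
    model_id: str     # hypothetical handle for the trained model f_w
    m_w: float        # sequence data length that achieved p_w
    p_a: float        # average of the k highest precisions


def build_record(
    n: float,
    precisions: Dict[float, float],  # searched length -> precision (the set P)
    model_ids: Dict[float, str],     # searched length -> model handle (the set F)
    k: int,
) -> ModelRecord:
    if not precisions or k < 1:
        raise ValueError("need at least one precision value and k >= 1")
    top_k = heapq.nlargest(k, precisions.values())   # the k highest precisions
    p_a = sum(top_k) / len(top_k)                    # fewer than k values is tolerated
    m_w = max(precisions, key=precisions.get)        # length with the best precision
    return ModelRecord(n, precisions[m_w], model_ids[m_w], m_w, p_a)
```

Storing $p_A$ alongside $p_w$ matters later, because the model updating module compares the live precision against both values.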
(3) Model updating module: obtain the prediction model $f_w$ associated with the prediction period $n$ and its precisions $p_w$ and $p_A$, and update the model according to the evaluation result on the latest test data set.
① Use the prediction model $f_w$ to predict on the sequence data in the latest test data set $T_w$ and evaluate the average model precision $p_w'$.
② With a scaling factor $x = 0.9$: if $x p_A < p_w' < x p_w$, call the optimal sequence length search module with the latest data sets $S_w$ and $T_w$ to re-search the sequence length and train the algorithm model $f_w$.
③ If $p_w' < x p_A$, set the initial sequence search length to the current sequence length $m_w$ (e.g., 24 seconds), and call the algorithm training module with the latest data sets $S_w$ and $T_w$ to retrain the algorithm model $f_w$.
The operation of the flight control system is divided into two stages, non-flight and in-flight, so the three processes of the invention are executed at different stages of system operation.
(1) Offline training process: executed in the non-flight stage of the flight control system. Using historical data collected in advance from unmanned aerial vehicle flights, or generated simulation data, the optimal sequence length search algorithm is called; by successively searching matrix data with lengths of 5 seconds, 10 seconds, 2.5 seconds, 20 seconds, 40 seconds, 30 seconds, 25 seconds, 26 seconds, 24 seconds and so on, it obtains the optimal sequence length $m_w = 24$ seconds, and the model training module called during the search yields the optimal model $f_w$.
(2) Online prediction process: executed continuously in the in-flight stage of the flight control system. The stored trained prediction model $f_w$ is read, and 18-attribute matrix data with a length of 24 seconds are extracted from the latest data and input into model $f_w$ to obtain the probabilities $\hat{y}_i$ of each type of fault occurring in the system after 5 seconds. If the probability $\hat{y}_i$ of a specific fault is greater than or equal to the system maintenance probability threshold of 0.7, a fault warning is sent to the flight control back end and the system waits for the back end to take over with manual flight or take other control measures; otherwise, flight data continue to be read and the fault prediction for the next moment is performed.
(3) Model updating process: executed in both the in-flight and non-flight stages of the flight control system.
① The data acquisition process is executed in the in-flight stage: continuous data for the 18 attributes are recorded throughout the flight of the unmanned aerial vehicle.
② The data set updating process is executed in the non-flight stage: 18-attribute matrix data with a length of 24 seconds are continuously extracted from the latest recorded data at a data interval frequency of 1 Hz, the probability $y_w$ that a specific fault occurred in the system 5 seconds after each group of sequence data is obtained, and the training data set $S_w$ and the test data set $T_w$ are updated.
③ The model evaluation and updating process is executed in the non-flight stage: using the latest data sets $S_w$ and $T_w$, the model updating module is called to evaluate and update the algorithm model and the optimal sequence length value.
Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all of them should be covered by the claims of the present invention.
Claims (7)
1. A method for predicting faults of a computing system based on time series data length optimization is characterized by comprising the following steps:
S1: offline training;
based on historical system operation data, the data are sliced using different sequence lengths and a fault prediction model is constructed for each length; the sequence data length with the best prediction precision, together with its fault prediction model, is found using a binary-search approach;
S2: online prediction;
the optimal sequence data length produced by offline training is used for real-time fault prediction;
S3: model updating;
while the system runs continuously, the prediction precision of the model is computed in real time from actual data, and the fault prediction model parameters or the sequence data length are updated when the precision declines.
2. The method for predicting faults of a computing system according to claim 1, wherein in step S1 the offline training specifically comprises the following steps:
S11: prediction period selection: determine the fault prediction period $n$ according to the characteristics of the computing system and the project requirements, that is, predict the probability that a certain type of fault occurs in the system after time $n$; query whether a tuple containing a trained model $f_w$ associated with $n$ and an optimal sequence data length $m_w$ exists; if it exists, set the initial sequence length to be searched, $m_0$, to the last recorded value $m_w$; otherwise, set the initial search length $m_0$ to the prediction period $n$;
S12: setting the initial set to be searched: set the median of the set of sequence data lengths to be searched to $m_0$, the lower boundary to $m_1 = m_0/2$ and the upper boundary to $m_2 = 2m_0$, and establish the set of sequence data lengths to be searched $M = \{m_0, m_1, m_2\}$;
S13: model training and evaluation: for each value $m_j \in M$, $0 \le j \le 2$, if the trained model set $F$ contains no fault prediction model $f_j$ and prediction precision $p_j$ corresponding to $m_j$, train a prediction model and evaluate its precision;
S14: optimal sequence data length search: based on the sequence data length set $M$, the prediction models built for its elements and their precisions $p_j$, search for the data sequence length with the optimal prediction precision:
if $m_2 - m_1 \le 2$, end the search and execute the optimal result storage step;
if $m_2 - m_1 > 2$, regenerate the elements $m_j$ of the set $M$ according to the following rules:
if $p_0 \ge p_1 \ge p_2$, continue searching in the interval $[m_1, m_0]$, and reset the median, lower boundary and upper boundary of the set to $m_0' = (m_0 + m_1)/2$, $m_1' = m_1$, $m_2' = m_0$;
if $p_0 \ge p_2 \ge p_1$, or $p_0 \ge p_1$ and $p_2 - p_0 \le \delta$, continue searching in the interval $[m_0, m_2]$, and reset the median, lower boundary and upper boundary of the set to $m_0' = (m_0 + m_2)/2$, $m_1' = m_0$, $m_2' = m_2$;
if $p_1 \ge p_2 \ge p_0$ or $p_1 \ge p_0 \ge p_2$, extend the search in the decreasing direction of $m_1$, and reset the median, lower boundary and upper boundary of the set to $m_0' = m_1$, $m_1' = m_1/2$, $m_2' = m_0$;
if $p_2 \ge p_1 \ge p_0$, or $p_2 \ge p_0 \ge p_1$ and $p_2 - p_0 > \delta$, extend the search in the increasing direction of $m_2$, and reset the median, lower boundary and upper boundary of the set to $m_0' = m_2$, $m_1' = m_0$, $m_2' = 2m_2$;
S15: updating the set to be searched: store the precisions $p_0$, $p_1$, $p_2$ generated during the search in the model precision set $P$, update the set to be searched to $M = \{m_0', m_1', m_2'\}$, and return to the model training and evaluation of step S13;
S16: storing the optimal search result: compare all model precisions in the model precision set $P$, select the $k$ highest precision values $p_v \in P$ and calculate their average value $p_A$; take the highest prediction precision in the set $P$, $p_w = \max\{p_v \mid p_v \in P\}$, and obtain the model $f_w$ and data length $m_w$ corresponding to $p_w$; store the fault prediction period $n$, the prediction precision $p_w$, the prediction model $f_w$, the sequence data length $m_w$ and the average prediction precision $p_A$ as a tuple in a pre-trained model library.
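The search in steps S12 to S16 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `evaluate` callable is a hypothetical stand-in for the model training and evaluation of step S13, integer floor division is assumed for the halving and midpoint operations, the lower boundary is clamped to 1, and `max_rounds` is an added safety guard that the claim does not mention. The four rules are checked in the order in which the claim lists them.

```python
from typing import Callable, Dict, Tuple


def search_optimal_length(
    evaluate: Callable[[int], float],  # hypothetical: trains a model for length m, returns its precision
    m0: int,
    delta: float = 0.01,
    k: int = 3,
    max_rounds: int = 50,  # assumption: guard against endless expansion
) -> Tuple[int, float, float]:
    """Return (m_w, p_w, p_A) following steps S12 to S16."""
    precision: Dict[int, float] = {}  # plays the role of the sets F and P

    def p(m: int) -> float:
        # S13: only train and evaluate lengths that were not seen before.
        if m not in precision:
            precision[m] = evaluate(m)
        return precision[m]

    m1, m2 = max(1, m0 // 2), 2 * m0  # S12: lower and upper boundary
    for _ in range(max_rounds):
        p0, p1, p2 = p(m0), p(m1), p(m2)
        if m2 - m1 <= 2:  # S14: stop condition
            break
        if p0 >= p1 >= p2:
            # Rule 1: continue in [m1, m0].
            m0, m1, m2 = (m0 + m1) // 2, m1, m0
        elif p0 >= p2 >= p1 or (p0 >= p1 and p2 - p0 <= delta):
            # Rule 2: continue in [m0, m2].
            m0, m1, m2 = (m0 + m2) // 2, m0, m2
        elif p1 >= p2 >= p0 or p1 >= p0 >= p2:
            # Rule 3: extend the search toward shorter sequences.
            m0, m1, m2 = m1, max(1, m1 // 2), m0
        else:
            # Rule 4: p2 is the largest, extend toward longer sequences.
            m0, m1, m2 = m2, m0, 2 * m2

    # S16: best length, its precision, and the average of the k best precisions.
    m_w = max(precision, key=precision.get)
    top_k = sorted(precision.values(), reverse=True)[:k]
    return m_w, precision[m_w], sum(top_k) / len(top_k)
```

Because every evaluated length is cached, a length that reappears in a later set $M$ is not retrained, which mirrors the check against the trained model set $F$ in step S13.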
3. The method for predicting faults of a computing system according to claim 2, wherein in step S13, if the trained model set $F$ contains no fault prediction model $f_j$ and prediction precision $p_j$ corresponding to $m_j$, a prediction model is trained and its precision is evaluated using the following steps:
S131: data set generation: slice the continuous monitoring data to generate a plurality of groups of sequence data of length $m_j$; use the probability $y$ of whether the system has the specific fault at time $n$ after each group of sequence data as the label of that group; randomly divide the labeled sequence data into a training data set $S_j$ and a test data set $T_j$;
S132: training a fault prediction model: train a time-series deep learning neural network on the training data set $S_j$ to obtain the parameters of the fault prediction model $f_j$; the input variable of the model $f_j$ is sequence data of length $m_j$, and the output variable is whether a specific type of fault occurs after time $n$; add the model $f_j$ to the trained model set $F$;
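The data set generation of step S131 can be sketched as a sliding-window slicer. This is an illustration under assumptions the claim does not state: the monitoring data are a single univariate series sampled at uniform time steps, the label is a 0/1 fault flag rather than a probability, and the `test_ratio` and `seed` parameters are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], int]  # (window of length m_j, fault label n steps later)


def make_dataset(
    series: Sequence[float],
    fault_flags: Sequence[int],  # assumption: 1 if the specific fault occurred at that step
    m_j: int,
    n: int,
    test_ratio: float = 0.2,
    seed: int = 0,
) -> Tuple[List[Sample], List[Sample]]:
    """Slice the series into labeled windows and split them into (S_j, T_j)."""
    samples: List[Sample] = []
    # A window covers [start, start + m_j); its label is read n steps
    # after the last element of the window.
    for start in range(len(series) - m_j - n + 1):
        window = list(series[start:start + m_j])
        label = fault_flags[start + m_j - 1 + n]
        samples.append((window, label))
    # Random division into training and test data, reproducible via the seed.
    random.Random(seed).shuffle(samples)
    n_test = int(len(samples) * test_ratio)
    return samples[n_test:], samples[:n_test]
```

A longer $m_j$ yields fewer windows from the same history, which is one reason the precision can differ between candidate lengths.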
5. The method for predicting faults of a computing system according to claim 2, wherein in step S2 the optimal sequence data length generated by offline training is used in the real-time fault prediction process, which specifically comprises the following steps:
S21: model lookup: according to the fault prediction period $n$, query whether a trained model $f_w$ associated with $n$ exists; if it does not exist, wait for the offline training process to be executed; if it exists, execute the real-time fault prediction step;
S22: real-time fault prediction: continuously extract sequence data of length $m_w$ from the latest data and input it into the model $f_w$ to obtain the predicted probability of each type of fault occurring in the system after time $n$; if the probability of a specific fault is not less than the system maintenance probability threshold, execute the corresponding system maintenance strategy and return to step S21; otherwise, repeat this step.
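One pass through steps S21 and S22 can be sketched as follows. The `ModelLibrary` layout, the `Model` signature and the function name `predict_once` are hypothetical choices made for this illustration; the claim only requires that a model $f_w$ and a length $m_w$ can be looked up by the prediction period $n$.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical model type: maps a window of readings to {fault type: probability}.
Model = Callable[[Sequence[float]], Dict[str, float]]
# Hypothetical pre-trained model library: prediction period n -> (f_w, m_w).
ModelLibrary = Dict[int, Tuple[Model, int]]


def predict_once(
    library: ModelLibrary,
    n: int,
    latest: Sequence[float],
    threshold: float,
) -> Optional[List[str]]:
    """Return the fault types that need maintenance, or None if no model exists for n."""
    entry = library.get(n)
    if entry is None:
        return None  # S21: wait for offline training to produce a model
    f_w, m_w = entry
    if len(latest) < m_w:
        raise ValueError("not enough monitoring data for a window of length m_w")
    window = latest[-m_w:]  # S22: the most recent m_w readings
    probabilities = f_w(window)
    # A fault triggers maintenance when its probability is not less than the threshold.
    return [fault for fault, prob in probabilities.items() if prob >= threshold]
```

A caller would invoke this repeatedly as new data arrive, running the maintenance strategy whenever the returned list is non-empty.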
6. The method for predicting faults of a computing system according to claim 2, wherein in step S3 the model updating specifically comprises the following steps:
S31: updating the real-time data set: extract sequence data of length $m_w$ from the latest operation data, together with the probability $y_w$ of whether a specific fault occurred in the system at time $n$ after each group of sequence data, and update the training data set $S_w$ and the test data set $T_w$;
S32: real-time model evaluation: after the system has run continuously for a time $t$, use the prediction model $f_w$ to predict the sequence data in the test data set $T_w$ and evaluate the model precision $p_w'$;
using a scaling factor $x < 1$: if $p_w' \ge x p_w$, return to step S31 and continue updating the data set;
if $x p_A < p_w' < x p_w$, go to step S33;
if $p_w' < x p_A$, set the starting sequence search length to $m_w$, re-execute the offline training process to search for a new optimal sequence length and prediction model, and return to step S31 to continue updating the data set;
S33: updating the model: using the updated test data set and training data set, and without changing $m_w$, retrain the parameters of the model $f_w$ with the time-series deep learning neural network; return to step S31 and continue updating the data set.
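The three-way decision in step S32 can be written as a small function. The names `Action` and `choose_update` and the default `x = 0.9` are hypothetical. The claim leaves the boundary case $p_w' = x p_A$ unspecified; this sketch assigns it to the full re-search branch, which is an assumption.

```python
from enum import Enum


class Action(Enum):
    KEEP = "continue updating the data set (S31)"
    RETRAIN = "retrain f_w with m_w unchanged (S33)"
    RESEARCH = "re-run offline training starting from m_w (S1)"


def choose_update(p_new: float, p_w: float, p_a: float, x: float = 0.9) -> Action:
    """Pick the update action from the re-evaluated precision p_new (p_w' in the claim)."""
    if not 0 < x < 1:
        raise ValueError("the scaling factor x must satisfy 0 < x < 1")
    if p_new >= x * p_w:
        return Action.KEEP  # precision has not declined noticeably
    if p_new > x * p_a:
        return Action.RETRAIN  # moderate decline: refresh the parameters only
    return Action.RESEARCH  # large decline: search for a new sequence length
```

Since $p_A$ is an average of the best precisions and $p_w$ is their maximum, $x p_A \le x p_w$, so the three branches are checked from the mildest decline to the most severe.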
7. The method of predicting a failure in a computing system of claim 3 or 6, wherein the deep learning neural network comprises an LSTM or GRU network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110601375.3A CN113341919B (en) | 2021-05-31 | 2021-05-31 | Computing system fault prediction method based on time sequence data length optimization |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110601375.3A CN113341919B (en) | 2021-05-31 | 2021-05-31 | Computing system fault prediction method based on time sequence data length optimization |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113341919A (en) | 2021-09-03 |
CN113341919B (en) | 2022-11-08 |
Family
ID=77472832
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN202110601375.3A (CN113341919B) | Active | 2021-05-31 | 2021-05-31 | Computing system fault prediction method based on time sequence data length optimization |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113341919B (en) |
Citations (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007108809A (en) * | 2005-10-11 | 2007-04-26 | Hitachi Ltd | Time-series prediction system, time-series prediction method, and time-series prediction program |
CN104252406A (en) * | 2013-06-28 | 2014-12-31 | 华为技术有限公司 | Method and device for processing data |
CN104316801A (en) * | 2014-10-31 | 2015-01-28 | 国家电网公司 | Power system fault diagnosis method based on time sequence similarity matching |
CN107358311A (en) * | 2017-06-07 | 2017-11-17 | 西安工业大学 | A kind of Time Series Forecasting Methods |
CN109325060A (en) * | 2018-07-27 | 2019-02-12 | 山东大学 | A kind of Model of Time Series Streaming method for fast searching based on data characteristics |
CN110222329A (en) * | 2019-04-22 | 2019-09-10 | 平安科技(深圳)有限公司 | A kind of Chinese word cutting method and device based on deep learning |
US20190370603A1 (en) * | 2018-05-29 | 2019-12-05 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Method and apparatus for establishing an application prediction model, storage medium and terminal |
CN110570013A (en) * | 2019-08-06 | 2019-12-13 | 山东省科学院海洋仪器仪表研究所 | Single-station online wave period data prediction diagnosis method |
CN110865625A (en) * | 2018-08-28 | 2020-03-06 | 中国科学院沈阳自动化研究所 | Process data anomaly detection method based on time series |
CN110889190A (en) * | 2018-09-11 | 2020-03-17 | 湖南银杏可靠性技术研究所有限公司 | Performance degradation modeling data volume optimization method facing prediction precision requirement |
CN111273623A (en) * | 2020-02-25 | 2020-06-12 | 电子科技大学 | Fault diagnosis method based on Stacked LSTM |
CN111614504A (en) * | 2020-06-02 | 2020-09-01 | 国网山西省电力公司电力科学研究院 | Power grid regulation and control data center service characteristic fault positioning method and system based on time sequence and fault tree analysis |
CN111639798A (en) * | 2020-05-26 | 2020-09-08 | 华青融天(北京)软件股份有限公司 | Intelligent prediction model selection method and device |
CN111798018A (en) * | 2019-04-09 | 2020-10-20 | Oppo广东移动通信有限公司 | Behavior prediction method, behavior prediction device, storage medium and electronic equipment |
CN111815056A (en) * | 2020-07-10 | 2020-10-23 | 中国人民解放军空军工程大学 | Aircraft external field aircraft fuel system fault prediction method based on flight parameter data |
US20200380409A1 (en) * | 2019-05-29 | 2020-12-03 | Samsung Sds Co., Ltd. | Apparatus and method for analyzing time-series data based on machine learning |
US20200387753A1 (en) * | 2019-06-10 | 2020-12-10 | International Business Machines Corporation | Data slicing for machine learning performance testing and improvement |
CN112712166A (en) * | 2020-12-31 | 2021-04-27 | 深圳前海微众银行股份有限公司 | Prediction method and device based on time series |
Non-Patent Citations (8)
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114002597A (en) * | 2021-10-25 | 2022-02-01 | 浙江理工大学 | Motor fault diagnosis method and system based on GRU network stator current analysis |
CN113985207A (en) * | 2021-10-28 | 2022-01-28 | 国网北京市电力公司 | Method, system and device for monitoring faults of power grid operation equipment and storage medium |
CN115509789A (en) * | 2022-09-30 | 2022-12-23 | 中国科学院重庆绿色智能技术研究院 | Computing system fault prediction method and system based on component calling analysis |
CN115509789B (en) * | 2022-09-30 | 2023-08-11 | 中国科学院重庆绿色智能技术研究院 | Method and system for predicting faults of computing system based on component call analysis |
Also Published As
Publication number | Publication date |
---|---|
CN113341919B (en) | 2022-11-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||