CN113341919A - Computing system fault prediction method based on time sequence data length optimization - Google Patents

Info

Publication number
CN113341919A
Authority
CN
China
Prior art keywords
model
prediction
data
length
precision
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110601375.3A
Other languages
Chinese (zh)
Other versions
CN113341919B (en)
Inventor
何盼
刘刚
洪昌萍
江玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Institute of Green and Intelligent Technology of CAS
Original Assignee
Chongqing Institute of Green and Intelligent Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Institute of Green and Intelligent Technology of CAS
Priority to CN202110601375.3A
Publication of CN113341919A
Application granted
Publication of CN113341919B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B23/00 Testing or monitoring of control systems or parts thereof
    • G05B23/02 Electric testing or monitoring
    • G05B23/0205 Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults
    • G05B23/0218 Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults characterised by the fault detection method dealing with either existing or incipient faults
    • G05B23/0243 Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults characterised by the fault detection method dealing with either existing or incipient faults model based detection method, e.g. first-principles knowledge model
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00 Program-control systems
    • G05B2219/20 Pc systems
    • G05B2219/24 Pc safety
    • G05B2219/24065 Real time diagnostics

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a computing system fault prediction method based on time series data length optimization, and belongs to the field of fault detection. The method comprises the following steps. S1, offline training: based on historical system operation data, the data is sliced using different data sequence lengths and different fault prediction models are constructed; the sequence data length with the optimal prediction precision, and the corresponding fault prediction model, are found based on the binary search idea. S2, online prediction: the optimal sequence data length produced by offline training is used in the real-time fault prediction process. S3, model update: during continuous operation of the system, real data is used to evaluate the prediction precision of the model in real time, and the fault prediction model parameters or the sequence data length are updated when the precision declines. The invention improves the precision of the fault prediction model, reduces the number of model training runs needed to search for the optimal length, and improves the adaptability of the prediction model to changes in the system and its environment.

Description

Computing system fault prediction method based on time sequence data length optimization
Technical Field
The invention belongs to the field of fault detection, and relates to a computing system fault prediction method based on time sequence data length optimization.
Background
Computing systems are widely used across industries, and unknown system faults can have a major impact, so maintaining system reliability is crucial to keeping computing systems running continuously. However, a computing system usually consists of many different components, such as hardware processors, software modules, databases, and network systems. The failure behavior of these components is unknown, and their relationships are complex and mutually dependent, so accurate fault analysis based on the internal structure of the system is difficult. The main current approach to maintaining computing system reliability is therefore to take a system-level view: monitor the state or quality of the system with logs, probes, and similar means, and use the monitoring data to evaluate and predict faults for the software modules and the underlying hardware as a whole.
The monitoring data of a computing system is both periodic and random, and continuous monitoring data with one or more attributes is the main basis for predicting and classifying system faults. Because of technical constraints in newer computing technologies such as cloud platforms and microservices, or limited computing resources in real-time embedded computing systems such as unmanned aerial vehicle flight control systems, these systems often hide the underlying hardware architecture, and even the software module composition, from the outside. In addition, because the relationships among attribute data are complex, it is difficult to build a mathematical model of system state changes from the patterns of data attribute changes. As a result, numerical analysis methods based on time series data, such as Bayesian analysis, machine learning, and deep learning, are widely applied to fault prediction not only in computing systems but also in fields such as aerospace and intelligent manufacturing.
In the prior art, known or unknown patterns in the monitoring data preceding historical faults are obtained by analyzing historical monitoring data and extracting data features; by comparing these with the features of current monitoring data, it is possible to predict whether a fault is about to occur and to determine the fault type. For fault analysis of computing systems, existing patents detect or predict faults using statistical analysis methods such as Bayesian models and ARIMA time series models, machine learning methods such as support vector machines and XGBoost, and deep learning methods such as deep neural network models including LSTM, CNN, and GRU. Compared with the other methods, deep learning can improve the accuracy and precision of system fault prediction and classification, but it usually builds models on time series data sets of fixed or variable length. Data acquisition by a computing system in a real environment is a long-running process, and the acquired data is continuous and changes over time. To generate time series data sets, the prior art discusses slicing methods for continuous data and timed fixed-length data acquisition methods, but does not discuss how to select the sequence data length or slice length, which leads to the following problems:
(1) For fault prediction over different time periods, the length of the time series data has a marked influence on the precision of the prediction model. In the model training stage of a real system, the algorithm model may need to be trained multiple times with sequence data of different lengths in order to compare model precision. Existing fault prediction algorithms generally do not consider how to select the length of the time series data used for training, nor the effect of that length; they are therefore of limited practicality and cannot guarantee algorithm performance in the training stage.
(2) During system operation, the patterns in the data may change dynamically over time, so a fault prediction model trained on historical data may not remain suitable for long and needs to be updated dynamically. Although the prior art discusses dynamic training and updating methods for models, it does not further discuss changing the length of the time series data used for training.
In view of these shortcomings, a fault prediction method is needed that improves the precision of the fault prediction model, reduces the number of model training runs, and allows the fault prediction model to adapt better to system changes.
Disclosure of Invention
In view of the above, the present invention provides a computing system fault prediction method based on time series data length optimization. Based on the binary search idea, it uses the precision of fault prediction models trained with different sequence data lengths as the evaluation index for comparing those lengths, so as to select the optimal time series data length for a specific fault prediction problem. In addition, by dynamically adjusting the time series data length, it estimates and maintains the model prediction precision in real time.
In order to achieve this purpose, the invention provides the following technical solution:
A computing system fault prediction method based on time series data length optimization comprises three processes: offline training, online prediction, and model updating. The offline training process selects the optimal sequence length and trains the model; the online prediction process uses the model trained offline to perform fault prediction and system control; and the model updating process checks the model and feeds back updates during system operation. As shown in fig. 1, offline training must be performed before the online prediction process, and model updating can run concurrently with online prediction. The method comprises the following steps:
S1: Offline training:
Based on historical system operation data, slice the data using different data sequence lengths and construct different fault prediction models; search for the sequence data length with the optimal prediction precision, and the corresponding fault prediction model, based on the binary search idea;
S2: Online prediction:
Use the optimal sequence data length produced by offline training in the real-time fault prediction process;
S3: Model update:
During continuous operation of the system, use real data to evaluate the prediction precision of the model in real time, and update the fault prediction model parameters or the sequence data length when the precision declines.
Further, in step S1, the offline training process (as shown in fig. 2) specifically includes the following steps:
S11: Select the prediction period: determine the fault prediction period n according to the characteristics of the computing system and the project requirements, i.e., predict the probability that a certain type of fault occurs in the system after time n; query whether a trained model f_w associated with n and an optimal sequence data length m_w already exist. If they exist, set the initial sequence data length to be searched, m_0, to the previously recorded value m_w; otherwise, set the starting search length m_0 to the prediction period n;
S12: Set the initial set to be searched: set the median value of the sequence data length set to be searched to m_0, the lower boundary to m_1 = m_0/2, and the upper boundary to m_2 = 2m_0, and establish the set of sequence data lengths to be searched, M = {m_0, m_1, m_2};
S13: Model training and evaluation: for each value m_j ∈ M (0 ≤ j ≤ 2), if the trained model set F contains no fault prediction model f_j and prediction model precision p_j corresponding to m_j, train a prediction model and evaluate its precision;
S14: Optimal sequence data length search: based on the sequence data length set M, the prediction models built from it, and their precisions p_j, search for the data sequence length with the optimal prediction precision:
if m_2 - m_1 ≤ 2, end the search and execute the optimal-result storage step (S16);
if m_2 - m_1 > 2, regenerate the elements m_j of the search set M according to the following rules:
if p_0 ≥ p_1 ≥ p_2, continue searching in the interval [m_1, m_0], and reset the median, lower boundary, and upper boundary of the set to m_0' = (m_0 + m_1)/2, m_1' = m_1, m_2' = m_0;
if p_0 ≥ p_2 ≥ p_1, or p_0 ≥ p_1 and p_2 - p_0 ≤ δ, continue searching in the interval [m_0, m_2], and reset the median, lower boundary, and upper boundary of the set to m_0' = (m_0 + m_2)/2, m_1' = m_0, m_2' = m_2;
if p_1 ≥ p_2 ≥ p_0 or p_1 ≥ p_0 ≥ p_2, search in the direction of decreasing m_1, and reset the median, lower boundary, and upper boundary of the set to m_0' = m_1, m_1' = m_1/2, m_2' = m_0;
if p_2 ≥ p_1 ≥ p_0, or p_2 ≥ p_0 ≥ p_1 and p_2 - p_0 > δ, search in the direction of increasing m_2, and reset the median, lower boundary, and upper boundary of the set to m_0' = m_2, m_1' = m_0, m_2' = 2m_2;
S15: Update the set to be searched: store the values p_0, p_1, p_2 generated during the search in the model precision set P, update the set to be searched to M = {m_0', m_1', m_2'}, and return to step S13 for model training and evaluation;
S16: Store the optimal search result: compare all model precisions in the model precision set P, select the k highest precision values p_v ∈ P, and calculate their average
$p_A = \frac{1}{k} \sum_{v=1}^{k} p_v$
Take the highest prediction precision in P, p_w = max{p_v | p_v ∈ P}, and obtain the corresponding model f_w and data length m_w. Store the fault prediction period n, the prediction precision p_w, the prediction model f_w, the sequence data length m_w, and the average prediction precision p_A as a tuple in the pre-trained model library.
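Steps S12 to S16 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `evaluate` is a hypothetical callback standing in for step S13 (train a model for a given length and return its precision, higher meaning better), the default values of `delta` and `k` are arbitrary because the source does not fix them, and `max_rounds` is a safety cap added only for the sketch.

```python
from typing import Callable, Dict, Tuple


def search_optimal_length(
    evaluate: Callable[[float], float],  # hypothetical: precision of a model trained on length m
    m_start: float,
    delta: float = 0.01,   # assumed value; the source leaves delta unspecified
    k: int = 3,            # assumed number of top precisions averaged into p_A
    max_rounds: int = 50,  # safety cap, not part of the described method
) -> Tuple[float, float, float, Dict[float, float]]:
    """Return (m_w, p_w, p_A, precision recorded for every evaluated length)."""
    precisions: Dict[float, float] = {}

    def p(m: float) -> float:
        # S13: only train and evaluate lengths that have no recorded result yet.
        if m not in precisions:
            precisions[m] = evaluate(m)
        return precisions[m]

    m0, m1, m2 = m_start, m_start / 2, 2 * m_start  # S12: median, lower, upper
    for _ in range(max_rounds):
        p0, p1, p2 = p(m0), p(m1), p(m2)
        if m2 - m1 <= 2:  # S14: stop condition
            break
        if p0 >= p1 >= p2:  # continue in [m1, m0]
            m0, m1, m2 = (m0 + m1) / 2, m1, m0
        elif p0 >= p2 >= p1 or (p0 >= p1 and p2 - p0 <= delta):  # continue in [m0, m2]
            m0, m1, m2 = (m0 + m2) / 2, m0, m2
        elif p1 >= p2 >= p0 or p1 >= p0 >= p2:  # move toward shorter lengths
            m0, m1, m2 = m1, m1 / 2, m0
        else:  # p2 is clearly the best: move toward longer lengths
            m0, m1, m2 = m2, m0, 2 * m2

    # S16: best length, best precision, and mean of the k highest precisions.
    m_w = max(precisions, key=precisions.get)
    p_w = precisions[m_w]
    top_k = sorted(precisions.values(), reverse=True)[:k]
    p_a = sum(top_k) / len(top_k)
    return m_w, p_w, p_a, precisions
```

Caching results per length mirrors the check against the trained model set F in step S13, which is what keeps the number of training runs low when consecutive search sets share elements.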
Further, in step S13, if the trained model set F contains no fault prediction model f_j and prediction model precision p_j corresponding to m_j, the prediction model is trained and its precision evaluated through the following steps:
S131: Data set generation: slice the continuous monitoring data to generate multiple groups of sequence data of length m_j; take the probability y that the system exhibits a specific fault at time n after each group of sequence data as the label of that group, and randomly divide the labeled sequence data into a training data set S_j and a test data set T_j;
S132: Fault prediction model training: train a time-series deep learning neural network model such as LSTM or GRU on the training data set S_j to obtain the parameters of the fault prediction model f_j; the input variable of f_j is sequence data of length m_j, and the output variable is whether a specific type of fault occurs after time n; add f_j to the trained model set F;
S133: Model precision evaluation: use the prediction model f_j to predict on the sequence data in the test data set T_j, and compare the predicted fault probability $\hat{y}$ with the actual fault probability y to evaluate the model precision p_j.
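The data set generation of step S131 can be illustrated with a short Python sketch. It assumes a single-attribute series sampled at a uniform rate, a sliding step of one sample, and an 80/20 random split; none of these choices is specified in the source, and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], float]


def make_windows(series: Sequence[float], labels: Sequence[float], m: int, n: int) -> List[Sample]:
    """Slice `series` into windows of length m; each label is the fault
    indicator observed n steps after the last point of the window."""
    samples: List[Sample] = []
    for start in range(0, len(series) - m - n + 1):
        end = start + m                     # window covers indices start .. end-1
        samples.append((list(series[start:end]), labels[end - 1 + n]))
    return samples


def split_samples(samples: List[Sample], train_fraction: float = 0.8, seed: int = 0):
    """Randomly divide labeled windows into a training set S_j and a test set T_j."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is reproducible
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Usage: 10 monitoring points, a fault observed at the last one.
series = list(range(10))
labels = [0] * 9 + [1]
windows = make_windows(series, labels, m=3, n=2)
train_set, test_set = split_samples(windows)
```

Because each candidate length m_j produces a different set of windows, the data set has to be regenerated for every length evaluated during the search.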
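Further, in step S133, MAE and RMSE are used as the model precision evaluation indexes, where, in their standard form over N test samples,
$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i|, \quad \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$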
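For reference, the two error measures named in step S133 can be computed as in the sketch below, which uses their standard textbook definitions. How the source converts these error values into the precision p_j that the search maximizes is not stated, so the sketch stops at the raw metrics.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between actual and predicted fault probabilities."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between actual and predicted fault probabilities."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


# Usage: actual fault indicators versus predicted probabilities.
actual = [0.0, 1.0, 0.0, 1.0]
predicted = [0.1, 0.8, 0.3, 0.9]
print(mae(actual, predicted), rmse(actual, predicted))
```

RMSE penalizes large individual errors more heavily than MAE, which is one reason to report both.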
Further, in step S2, the optimal sequence data length produced by offline training is used in the real-time fault prediction process (as shown in fig. 3), which specifically includes the following steps:
S21: Model lookup: according to the fault prediction period n, query whether a trained model f_w associated with n exists; if not, wait for the offline training process to finish; if so, execute the real-time fault prediction step;
S22: Real-time fault prediction: continuously extract sequence data of length m_w from the latest data and input it into the model f_w to obtain the predicted probability $\hat{y}$ of each type of fault occurring in the system after time n. If the probability of a specific fault is not less than the system maintenance probability threshold, execute the corresponding system maintenance strategy and return to step S21; otherwise, repeat this step.
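One pass of step S22 might look like the following sketch. The `model` callable, its dictionary return type, and the fault names are assumptions made for illustration; the source only states that the latest m_w samples are fed to f_w and that each predicted probability is compared with a maintenance threshold.

```python
from typing import Callable, Dict, List, Optional, Sequence


def online_step(
    model: Callable[[List[float]], Dict[str, float]],  # hypothetical f_w: window -> fault probabilities
    recent: Sequence[float],
    m_w: int,
    threshold: float,
) -> Optional[Dict[str, float]]:
    """Return the faults whose predicted probability reaches the threshold,
    or None when fewer than m_w samples have been collected so far."""
    if len(recent) < m_w:
        return None
    window = list(recent[-m_w:])            # latest m_w samples
    probabilities = model(window)
    return {fault: prob for fault, prob in probabilities.items() if prob >= threshold}


# Usage with a stub in place of a trained network.
def stub_model(window: List[float]) -> Dict[str, float]:
    load = sum(window) / len(window)
    return {"overload": min(1.0, load), "timeout": 0.1}


alerts = online_step(stub_model, [0.2, 0.9, 0.95, 0.9], m_w=3, threshold=0.7)
if alerts:
    print("maintenance required for:", sorted(alerts))
```

An empty dictionary means the step simply repeats on the next sample, while a non-empty one triggers the maintenance strategy before returning to the model lookup of S21.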
Further, in step S3, the model updating process (as shown in fig. 4) specifically includes the following steps:
S31: Real-time data set update: extract sequence data of length m_w from the latest operation data, together with the probability y_w of whether a specific fault occurred in the system at time n after each group of sequence data, and update the training data set S_w and the test data set T_w;
S32: Real-time model evaluation: after the system has run continuously for time t, use the prediction model f_w to predict on the sequence data in the test data set T_w and evaluate the model precision p_w';
using a scaling factor x < 1: if p_w' ≥ x·p_w, return to step S31 and continue updating the data set;
if x·p_A < p_w' < x·p_w, go to step S33;
if p_w' < x·p_A, set the starting sequence search length to m_w, re-execute the offline training process to search for a new optimal sequence length and prediction model, and return to step S31 to continue updating the data set;
S33: Model update: using the updated training and test data sets, and without changing m_w, retrain the parameters of the model f_w with a time-series deep learning neural network such as LSTM or GRU; return to step S31 and continue updating the data set.
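The three-way decision of step S32 reduces to two comparisons, as sketched below. The action names are hypothetical labels, x = 0.9 is the value used later in the embodiment, and treating the boundary case p_w' = x·p_A as a full re-search is an assumption, since the source leaves that case unspecified.

```python
def update_action(p_new: float, p_w: float, p_a: float, x: float = 0.9) -> str:
    """Decide what to do after re-evaluating the model on fresh test data.

    p_new: current precision p_w' on the updated test set
    p_w:   best precision recorded during offline training
    p_a:   average of the k best precisions recorded during offline training
    """
    if p_new >= x * p_w:
        return "keep"        # precision still acceptable: keep collecting data (S31)
    if p_new > x * p_a:
        return "retrain"     # mild decline: retrain parameters, keep m_w (S33)
    return "re-search"       # strong decline: search again starting from m_w


# Usage with illustrative numbers.
for observed in (0.95, 0.85, 0.70):
    print(observed, update_action(observed, p_w=1.0, p_a=0.9))
```

Since p_A is the mean of the best precisions and cannot exceed p_w, the two thresholds are always ordered, so a mild drop triggers only the cheaper parameter retraining.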
The invention has the beneficial effects that:
1) The invention provides a mechanism for selecting the optimal data sequence length for fault prediction methods based on time series data, which improves the precision of the fault prediction model and reduces the number of model training runs needed during the optimal length search.
2) During system operation, the invention provides the fault prediction method with a mechanism for dynamically changing the optimal data sequence length, which improves the adaptability of the prediction model to changes in the system and its environment.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a block diagram of a method for computing system fault prediction based on time series data length optimization in accordance with the present invention;
FIG. 2 is a flow chart of the off-line training process steps in the method of the present invention;
FIG. 3 is a flow chart of the on-line prediction process steps in the method of the present invention;
FIG. 4 is a flowchart illustrating the steps of the model update process of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.
Referring to fig. 1 to 4, this embodiment takes fault prediction for the flight control system of a multi-rotor unmanned aerial vehicle as an example and describes the implementation steps of the fault prediction method of the invention. The flight control system is one of the key core systems of a multi-rotor unmanned aerial vehicle: it acquires information from angular velocity, attitude, altitude and airspeed, and position sensors, and implements flight management, attitude control, and on-demand flight of the unmanned aerial vehicle. Errors in the flight control system and its associated modules can have serious consequences during flight. However, the flight control system is generally implemented as an onboard embedded system, and the complexity of the associated modules and the limited resources restrict online fault tracing and elimination, so system fault prediction based on real-time operation data is crucial.
In this embodiment, fault prediction mainly uses 18 time-related sequence attributes: sensor information continuously acquired by the flight control system, such as gyroscope, accelerometer, barometer, and GPS data; real-time software and hardware information during operation of the flight control system, such as CPU occupancy, memory occupancy, and IO throughput; and software logs from the flight control process, such as flight status, flight time, and flight distance. Depending on the attribute, the data acquisition frequency is 0.2-5 Hz. The fault/state types of the flight control system fall into 4 classes: GPS positioning fault, control instruction delay, unknown error, and normal operation. To ensure that system faults can be handled in time, the fault prediction frequency is 1 Hz and the prediction period does not exceed 5 seconds.
Following the method of the invention, three algorithm modules are first implemented: model training, optimal sequence length search, and model updating:
(1) Model training module: given a data sequence length m_j and a fault prediction period n, generate a fault prediction model and its precision value.
① Data set generation: slice the continuous monitoring data to generate multiple matrices of size m_j × 18, and take the probabilities [y_1, y_2, y_3, y_4] of each type of fault occurring in the system 5 seconds after each matrix as the data label, where y_4 is the probability that the system operates normally. Divide the labeled matrix data into a training data set S_j and a test data set T_j.
② Fault prediction model training: train a time-series deep learning neural network such as LSTM or GRU on the training data set S_j to obtain the parameters of the fault prediction model f_j, and add f_j to the trained model set F. The input variable of f_j is m_j × 18 matrix data, and the output variable is the probability of each specific type of fault occurring after 5 seconds.
③ Model precision evaluation: use the prediction model f_j to predict on the sequence data in the test data set T_j, and compare the predicted fault probabilities $\hat{y}_i$ with the actual fault probabilities y_i (1 ≤ i ≤ 4) to evaluate the model precision p_j. For the multi-class probability predictions, the average of MAE and RMSE is used as the model precision evaluation index,
$p_j = \frac{\mathrm{MAE} + \mathrm{RMSE}}{2}$
where MAE and RMSE are computed between the predicted probabilities $\hat{y}_i$ and the actual probabilities y_i.
(2) Optimal sequence length search module: given an initial data sequence length m_w and a fault prediction period n, search for the data sequence length with the optimal prediction precision.
① Search initialization: set the initial search count i = 0. If a sequence data length m_w is given, set the initial sequence data length to be searched m_0 = m_w; otherwise set m_0 = n. Establish the set of sequence data lengths to be searched M_i = {m_0, m_1, m_2}, where the lower boundary m_1 = m_0/2 and the upper boundary m_2 = 2m_0.
② Invoke the model training module: for each sequence length value m_j ∈ M_i (0 ≤ j ≤ 2), call the model training module to generate the corresponding fault prediction model f_j and obtain its precision p_j.
③ Generate the subsequent search set: compare the precision indexes p_j of the prediction models built from sequence data of different lengths, and generate the data length set for the subsequent search:
If m_2 - m_1 ≤ 2, the search ends and step ⑤ is executed.
If m_2 - m_1 > 2, regenerate the elements m_j of the search set M_i according to the following rules:
If p_0 ≥ p_1 ≥ p_2, continue searching in the interval [m_1, m_0], setting m_0' = (m_0 + m_1)/2, m_1' = m_1, m_2' = m_0.
If p_0 ≥ p_2 ≥ p_1, or p_0 ≥ p_1 and p_2 - p_0 ≤ δ, continue searching in the interval [m_0, m_2], setting m_0' = (m_0 + m_2)/2, m_1' = m_0, m_2' = m_2.
If p_1 ≥ p_2 ≥ p_0 or p_1 ≥ p_0 ≥ p_2, search in the direction of decreasing m_1, setting m_0' = m_1, m_1' = m_1/2, m_2' = m_0.
If p_2 ≥ p_1 ≥ p_0, or p_2 ≥ p_0 ≥ p_1 and p_2 - p_0 > δ, search in the direction of increasing m_2, setting m_0' = m_2, m_1' = m_0, m_2' = 2m_2.
④ Update the set to be searched: store p_0, p_1, p_2 in the model precision set P, update the search count i = i + 1, update the set to be searched M_i = {m_0', m_1', m_2'}, and return to step ②.
⑤ Store the optimal search result: compare all model precisions stored in the set P, select the k highest precision values p_v ∈ P, and calculate their average
$p_A = \frac{1}{k} \sum_{v=1}^{k} p_v$
Take the highest prediction precision in P, p_w = max{p_v | p_v ∈ P}, and obtain the corresponding model f_w and data length m_w. Store the fault prediction period n, the prediction precision p_w, the prediction model f_w, the sequence data length m_w, and the average prediction precision p_A as a tuple in the pre-trained model library.
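The tuple stored in the pre-trained model library, and its reuse when choosing the starting search length, could be represented as in the sketch below. The class and function names are hypothetical, and keying the library by the prediction period n is an assumption based on the lookup described in the search initialization.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ModelRecord:
    n: float      # fault prediction period
    p_w: float    # best prediction precision found
    f_w: Any      # trained model object (any type in this sketch)
    m_w: float    # sequence data length that produced f_w
    p_a: float    # average of the k best precisions


class ModelLibrary:
    """In-memory stand-in for the pre-trained model library, keyed by n."""

    def __init__(self) -> None:
        self._records: Dict[float, ModelRecord] = {}

    def store(self, record: ModelRecord) -> None:
        self._records[record.n] = record

    def lookup(self, n: float) -> Optional[ModelRecord]:
        return self._records.get(n)


def initial_search_length(library: ModelLibrary, n: float) -> float:
    """Start from the recorded m_w when a model for n exists, otherwise from n."""
    record = library.lookup(n)
    return record.m_w if record is not None else n


# Usage: the first search starts at n; later searches start at the stored m_w.
library = ModelLibrary()
print(initial_search_length(library, 5))
library.store(ModelRecord(n=5, p_w=0.93, f_w=None, m_w=24, p_a=0.91))
print(initial_search_length(library, 5))
```

Keeping p_A alongside p_w is what allows the update module to distinguish a mild precision decline from a severe one without repeating the search.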
(3) Model updating module: obtain the prediction model f_w associated with the prediction period n and its precisions p_w and p_A, and update the model according to the evaluation results on the latest test data set.
① Use the prediction model f_w to predict on the sequence data in the latest test data set T_w and evaluate the average precision p_w' of the model.
② Using the scaling factor x = 0.9, if x·p_A < p_w' < x·p_w, call the optimal sequence length search module with the latest data sets S_w and T_w to re-search the sequence length and train the algorithm model f_w.
③ If p_w' < x·p_A, set the starting sequence search length to the current sequence length m_w (e.g., 24 seconds), and call the algorithm training module with the latest data sets S_w and T_w to retrain the algorithm model f_w.
The operation of the flight control system is divided into a non-flight stage and an in-flight stage, so the three processes of the invention are executed at different stages of system operation.
(1) Offline training process: executed in the non-flight stage of the flight control system. Using historical data collected in advance, or simulation data generated for the flight process of the unmanned aerial vehicle, the optimal sequence length search algorithm is called to search, in sequence, matrix data with lengths of 5 seconds, 10 seconds, 2.5 seconds, 20 seconds, 40 seconds, 30 seconds, 25 seconds, 26 seconds, 24 seconds, and so on, obtaining the optimal sequence length m_w = 24 seconds; the model training module is called during the search to obtain the optimal model f_w.
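The first candidate lengths listed for the embodiment can be reproduced by applying the set-update rules to the starting set for n = 5 seconds. The sketch below does this for an assumed sequence of rule choices (two extensions toward longer lengths, one search in [m_0, m_2], one search in [m_1, m_0]); the source does not report the precisions that drove these choices, and the later listed lengths of 26 and 24 seconds are not reproduced here.

```python
from typing import List, Tuple

SearchSet = Tuple[float, float, float]  # (m0 median, m1 lower, m2 upper)


def next_set(current: SearchSet, rule: str) -> SearchSet:
    """Apply one of the four set-update rules; the rule names are hypothetical labels."""
    m0, m1, m2 = current
    if rule == "left":     # continue in [m1, m0]
        return (m0 + m1) / 2, m1, m0
    if rule == "right":    # continue in [m0, m2]
        return (m0 + m2) / 2, m0, m2
    if rule == "shrink":   # move toward shorter lengths
        return m1, m1 / 2, m0
    if rule == "grow":     # move toward longer lengths
        return m2, m0, 2 * m2
    raise ValueError(f"unknown rule: {rule}")


def candidate_lengths(n: float, rules: List[str]) -> List[float]:
    """List every distinct length that would need a trained model, in first-seen order."""
    current: SearchSet = (n, n / 2, 2 * n)
    seen: List[float] = list(dict.fromkeys(current))
    for rule in rules:
        current = next_set(current, rule)
        for m in current:
            if m not in seen:
                seen.append(m)
    return seen


print(candidate_lengths(5, ["grow", "grow", "right", "left"]))
# -> [5, 2.5, 10, 20, 40, 30.0, 25.0]
```

Each round adds only one new length after the initial three, which is where the saving in training runs comes from.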
(2) Online prediction process: executed continuously in the in-flight stage of the flight control system. Read the stored trained prediction model f_w, extract 18-attribute matrix data with a length of 24 seconds from the latest data, and input it into f_w to obtain the probabilities $\hat{y}_i$ of each type of fault occurring in the system after 5 seconds. If the probability $\hat{y}_i$ of a specific fault is greater than or equal to the system maintenance probability threshold of 0.7, send a fault warning to the flight control background and wait for the background to take over with manual flight or take other control measures; otherwise, continue reading in-flight data and perform the fault prediction for the next moment.
(3) Model update process: executed in both the in-flight and non-flight stages of the flight control system.
① The data acquisition process is executed in the in-flight stage and continuously records the 18 attributes of the unmanned aerial vehicle during flight.
② The data set update process is executed in the non-flight stage: 18-attribute matrix data with a length of 24 seconds is continuously extracted from the latest recorded data at a data interval frequency of 1 Hz, the probability y_w of a specific fault occurring in the system 5 seconds after each group of sequence data is obtained, and the training data set S_w and test data set T_w are updated.
③ The model evaluation and update process is executed in the non-flight stage: using the latest data sets S_w and T_w, the model updating module is called to evaluate and update the algorithm model and the optimal sequence length value.
Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all of them should be covered by the claims of the present invention.

Claims (7)

1. A computing system fault prediction method based on time series data length optimization, characterized by comprising the following steps:
S1: offline training;
based on historical system operation data, data slicing is carried out using different data sequence lengths, and different fault prediction models are constructed; the sequence data length with the optimal prediction precision and the corresponding fault prediction model are searched for based on a binary search idea;
S2: online prediction;
the optimal sequence data length generated by offline training is used in the real-time fault prediction process;
S3: model updating;
during continuous operation of the system, the prediction precision of the model is computed in real time from real data, and the fault prediction model parameters or the sequence data length are updated according to the decline in precision.
2. The method for predicting faults of a computing system according to claim 1, wherein in step S1, the offline training specifically comprises the following steps:
S11: prediction period selection: determine a fault prediction time period n according to the characteristics of the computing system and the project requirements, i.e., predict the probability that a certain type of fault occurs in the system after time n; query whether a tuple of a trained model f_w and an optimal sequence data length m_w associated with n exists; if it exists, set the initial input data sequence length to be searched, m_0, to the last recorded search value m_w; otherwise, set the initial search length m_0 to the prediction period n;
S12: initial search set setup: set the median of the sequence data length set to be searched to m_0, set the lower boundary m_1 = m_0/2 and the upper boundary m_2 = 2m_0, and establish the sequence data length set to be searched M = {m_0, m_1, m_2};
S13: model training and evaluation: for each value m_j ∈ M, 0 ≤ j ≤ 2, if the trained model set F contains no fault prediction model f_j and prediction model precision p_j corresponding to m_j, train a prediction model and evaluate the model precision;
S14: optimal sequence data length search: according to the sequence data length set M, the different prediction models built from it, and the model precisions p_j, search for the data sequence length with the optimal prediction precision:
if m_2 − m_1 ≤ 2, end the search and execute the optimal result storage step;
if m_2 − m_1 > 2, regenerate the elements m_j of the set M according to the following rules:
if p_0 ≥ p_1 ≥ p_2, continue searching in the interval [m_1, m_0], and reset the median, lower boundary and upper boundary of the set to m_0' = (m_0 + m_1)/2, m_1' = m_1, m_2' = m_0;
if p_0 ≥ p_2 ≥ p_1, or p_0 ≥ p_1 and p_2 − p_0 ≤ δ, continue searching in the interval [m_0, m_2], and reset the median, lower boundary and upper boundary of the set to m_0' = (m_0 + m_2)/2, m_1' = m_0, m_2' = m_2;
if p_1 ≥ p_2 ≥ p_0 or p_1 ≥ p_0 ≥ p_2, extend the search in the direction of decreasing m_1, and reset the median, lower boundary and upper boundary of the set to m_0' = m_1, m_1' = m_1/2, m_2' = m_0;
if p_2 ≥ p_1 ≥ p_0, or p_2 ≥ p_0 ≥ p_1 and p_2 − p_0 > δ, extend the search in the direction of increasing m_2, and reset the median, lower boundary and upper boundary of the set to m_0' = m_2, m_1' = m_0, m_2' = 2m_2;
S15: search set update: store the p_0, p_1 and p_2 generated during the search in the model precision set P, update the set to be searched to M = {m_0', m_1', m_2'}, and return to the model training and evaluation of step S13;
S16: optimal search result storage: compare all model precisions in the model precision set P, select the k highest precision values p_v ∈ P, and calculate their average value
$$p_A = \frac{1}{k} \sum_{v=1}^{k} p_v$$
take the highest prediction precision in the set P, p_w = max{p_v | p_v ∈ P}, obtain the model f_w and data length m_w corresponding to p_w, and store the fault prediction period n, prediction precision p_w, prediction model f_w, sequence data length m_w and average prediction precision p_A as a tuple in the pre-trained model library.
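The search in steps S11 to S16 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: `evaluate` is a hypothetical callback that trains a model for a given length and returns a precision where higher is better (an error metric would have to be negated or inverted first), lengths are integers, the lower boundary is clamped at 1, and `max_iter` is an added safety guard that the claims do not mention. The rules are tested in the order listed in S14, so overlapping conditions resolve to the first match.

```python
from typing import Callable, Dict, Tuple


def search_sequence_length(
    evaluate: Callable[[int], float],  # hypothetical: trains a model for length m, returns precision
    n: int,                            # fault prediction period, used as the initial length m_0
    delta: float = 0.01,               # tolerance delta from step S14
    k: int = 3,                        # number of top precisions averaged into p_A
    max_iter: int = 50,                # assumption: guard against unbounded doubling
) -> Tuple[int, float, float]:
    """Return (m_w, p_w, p_A) following the rules of steps S12 to S16."""
    precision: Dict[int, float] = {}   # plays the role of the sets F and P

    def p(m: int) -> float:
        # S13: train and evaluate only lengths that have not been seen yet
        if m not in precision:
            precision[m] = evaluate(m)
        return precision[m]

    # S12: median, lower boundary, upper boundary
    m0, m1, m2 = n, max(1, n // 2), 2 * n
    for _ in range(max_iter):
        p0, p1, p2 = p(m0), p(m1), p(m2)
        if m2 - m1 <= 2:               # S14: stop condition
            break
        if p0 >= p1 >= p2:             # continue in [m1, m0]
            m0, m1, m2 = (m0 + m1) // 2, m1, m0
        elif p0 >= p2 >= p1 or (p0 >= p1 and p2 - p0 <= delta):
            m0, m1, m2 = (m0 + m2) // 2, m0, m2          # continue in [m0, m2]
        elif p1 >= p2 >= p0 or p1 >= p0 >= p2:
            m0, m1, m2 = m1, max(1, m1 // 2), m0         # extend towards shorter lengths
        else:
            m0, m1, m2 = m2, m0, 2 * m2                  # extend towards longer lengths

    # S16: best length, its precision, and the average of the top-k precisions
    m_w = max(precision, key=precision.get)
    p_w = precision[m_w]
    top_k = sorted(precision.values(), reverse=True)[:k]
    p_a = sum(top_k) / len(top_k)
    return m_w, p_w, p_a
```

Because the rules only compare three points at a time, the procedure is a heuristic: it returns the best length among those it evaluated, which need not be the global optimum.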
3. The method of claim 2, wherein in step S13, if the trained model set F contains no fault prediction model f_j and prediction model precision p_j corresponding to m_j, the prediction model is trained and the model precision is evaluated by the following steps:
S131: data set generation: slice the continuous monitoring data to generate multiple groups of sequence data of length m_j; use the probability y that a specific fault occurs in the system at time n after each group of sequence data as the sequence data label, and randomly divide the labelled sequence data into a training data set S_j and a test data set T_j;
S132: fault prediction model training: train a time-series-oriented deep learning neural network on the training data set S_j to obtain the relevant parameters of the fault prediction model f_j; the input variable of model f_j is sequence data of length m_j, and the output variable is whether a specific type of fault occurs after time n; add model f_j to the trained model set F;
S133: model precision evaluation: use the prediction model f_j to predict on the sequence data in the test data set T_j, and compare the predicted fault probability \hat{y} with the actual fault probability y to evaluate the model precision p_j.
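The data set generation of step S131 can be illustrated with a short Python sketch. The function name `make_datasets`, the `test_ratio` and `seed` parameters, and the choice of a stride of one record between consecutive windows are assumptions made for illustration; the claim only states that the data are sliced, labelled with the fault probability at time n after each slice, and split at random.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[List[Sequence[float]], float]  # (window of m_j records, label y)


def make_datasets(
    records: Sequence[Sequence[float]],  # continuous monitoring data, one row per time step
    fault_prob: Sequence[float],         # fault probability y observed at each time step
    m_j: int,                            # sequence data length
    n: int,                              # prediction period
    test_ratio: float = 0.2,             # assumption: share of samples held out for T_j
    seed: int = 0,                       # assumption: fixed seed for a reproducible split
) -> Tuple[List[Sample], List[Sample]]:
    """Slice records into windows of length m_j labelled with y at n steps after the window."""
    last_start = len(records) - m_j - n
    if last_start < 0:
        raise ValueError("not enough data for one window of length m_j plus horizon n")

    samples: List[Sample] = []
    for start in range(last_start + 1):
        window = list(records[start:start + m_j])
        label = fault_prob[start + m_j - 1 + n]  # n steps after the last record of the window
        samples.append((window, label))

    random.Random(seed).shuffle(samples)          # random division into S_j and T_j
    n_test = max(1, int(len(samples) * test_ratio))
    return samples[n_test:], samples[:n_test]     # (training set S_j, test set T_j)
```

With the embodiment's values (24-second windows at 1 Hz, a 5-second horizon, 18 attributes), a call would look like `make_datasets(records, fault_prob, 24, 5)`.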
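4. The method of claim 3, wherein in step S133, MAE and RMSE are used as model precision evaluation indexes, where
$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2}$$
with N the number of sequence data groups in the test data set, y_i the actual fault probability and \hat{y}_i the predicted fault probability of the i-th group.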
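As a small worked illustration of the two indexes, the following Python sketch computes MAE and RMSE from their standard definitions. The function names are chosen here for illustration. Both values are errors, so a lower value means a better model; how they are mapped onto the precision p_j that the search compares is not specified in this claim.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between actual and predicted fault probabilities."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between actual and predicted fault probabilities."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))
```

RMSE penalizes large individual errors more heavily than MAE, which is one reason to report both.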
5. The method for predicting faults of a computing system according to claim 2, wherein in step S2, the optimal sequence data length generated by offline training is used in the real-time fault prediction process, specifically comprising the following steps:
S21: model lookup: according to the fault prediction time period n, query whether a trained model f_w associated with n exists; if not, wait for the offline training process to be executed; if so, execute the real-time fault prediction step;
S22: real-time fault prediction: continuously extract sequence data of length m_w from the latest data and input it into the model f_w to obtain the predicted probability \hat{y} that each type of fault occurs in the system after time n; if the probability that a specific fault occurs in the system is greater than or equal to the system maintenance probability threshold, execute the corresponding system maintenance strategy and return to step S21; otherwise, repeat this step.
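The real-time prediction of step S22 can be sketched as a rolling window over incoming records. This is a minimal sketch under assumptions: `model` is a hypothetical callable standing in for the trained f_w, `stream` is any iterable of records, and the maintenance strategy is reduced to collecting the time steps at which the threshold is reached.

```python
from collections import deque
from typing import Callable, Iterable, List, Sequence, Tuple


def online_predict(
    stream: Iterable[Sequence[float]],                       # incoming monitoring records
    model: Callable[[List[Sequence[float]]], float],         # hypothetical stand-in for f_w
    m_w: int,                                                # optimal sequence data length
    threshold: float,                                        # system maintenance probability threshold
) -> List[Tuple[int, float]]:
    """Return (time step, predicted probability) for every prediction at or above the threshold."""
    window: deque = deque(maxlen=m_w)  # always holds the latest m_w records
    alerts: List[Tuple[int, float]] = []
    for t, record in enumerate(stream):
        window.append(record)
        if len(window) < m_w:
            continue                   # not enough history for one input sequence yet
        y_hat = model(list(window))
        if y_hat >= threshold:
            alerts.append((t, y_hat))  # here a real system would trigger its maintenance strategy
    return alerts
```

Using a bounded `deque` keeps the memory footprint fixed at m_w records regardless of how long the system runs.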
6. The method for predicting faults of a computing system according to claim 2, wherein in step S3, the model updating specifically comprises the following steps:
S31: real-time data set update: extract sequence data of length m_w from the latest operation data, together with the probability y_w that a specific fault occurs in the system at time n after each group of sequence data, and update the training data set S_w and the test data set T_w;
S32: real-time model evaluation: after the system has run continuously for a time t, use the prediction model f_w to predict on the sequence data in the test data set T_w and evaluate the model precision p_w';
with a scaling factor x < 1: if p_w' ≥ x·p_w, return to step S31 and continue updating the data set;
if x·p_A < p_w' < x·p_w, go to step S33;
if p_w' < x·p_A, set the initial sequence search length to m_w, re-execute the offline training process to search for a new optimal sequence length and prediction model, and return to step S31 to continue updating the data set;
S33: model updating: using the updated test and training data sets, and without changing m_w, retrain the relevant parameters of model f_w with the time-series-oriented deep learning neural network; return to step S31 and continue updating the data set.
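The three-way decision of step S32 can be written as a small Python function. The function and return-value names are illustrative assumptions, precision is assumed to be positive with higher values being better, and the boundary case p_w' = x·p_A, which the claim leaves open, is treated here as a full re-search.

```python
def update_decision(p_new: float, p_w: float, p_a: float, x: float = 0.9) -> str:
    """Map the re-evaluated precision p_w' (p_new) onto the action of steps S31 to S33.

    p_w is the stored best precision, p_a the stored top-k average, x < 1 the scaling factor.
    """
    if not 0 < x < 1:
        raise ValueError("x must lie strictly between 0 and 1")
    if p_new >= x * p_w:
        return "keep_model"           # S31: precision still acceptable, keep collecting data
    if p_new > x * p_a:
        return "retrain_parameters"   # S33: retrain f_w without changing m_w
    return "search_new_length"        # rerun offline training starting from m_w
```

Since p_A is an average of the best precisions, it never exceeds p_w, so the middle band between x·p_A and x·p_w is well defined whenever the two values differ.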
7. The method of predicting a failure in a computing system of claim 3 or 6, wherein the deep learning neural network comprises an LSTM or GRU network.
CN202110601375.3A 2021-05-31 2021-05-31 Computing system fault prediction method based on time sequence data length optimization Active CN113341919B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110601375.3A CN113341919B (en) 2021-05-31 2021-05-31 Computing system fault prediction method based on time sequence data length optimization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110601375.3A CN113341919B (en) 2021-05-31 2021-05-31 Computing system fault prediction method based on time sequence data length optimization

Publications (2)

Publication Number Publication Date
CN113341919A true CN113341919A (en) 2021-09-03
CN113341919B CN113341919B (en) 2022-11-08

Family

ID=77472832

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110601375.3A Active CN113341919B (en) 2021-05-31 2021-05-31 Computing system fault prediction method based on time sequence data length optimization

Country Status (1)

Country Link
CN (1) CN113341919B (en)

Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007108809A (en) * 2005-10-11 2007-04-26 Hitachi Ltd Time-series prediction system, time-series prediction method, and time-series prediction program
CN104252406A (en) * 2013-06-28 2014-12-31 华为技术有限公司 Method and device for processing data
CN104316801A (en) * 2014-10-31 2015-01-28 国家电网公司 Power system fault diagnosis method based on time sequence similarity matching
CN107358311A (en) * 2017-06-07 2017-11-17 西安工业大学 A kind of Time Series Forecasting Methods
CN109325060A (en) * 2018-07-27 2019-02-12 山东大学 A kind of Model of Time Series Streaming method for fast searching based on data characteristics
CN110222329A (en) * 2019-04-22 2019-09-10 平安科技(深圳)有限公司 A kind of Chinese word cutting method and device based on deep learning
US20190370603A1 (en) * 2018-05-29 2019-12-05 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method and apparatus for establishing an application prediction model, storage medium and terminal
CN110570013A (en) * 2019-08-06 2019-12-13 山东省科学院海洋仪器仪表研究所 Single-station online wave period data prediction diagnosis method
CN110865625A (en) * 2018-08-28 2020-03-06 中国科学院沈阳自动化研究所 Process data anomaly detection method based on time series
CN110889190A (en) * 2018-09-11 2020-03-17 湖南银杏可靠性技术研究所有限公司 Performance degradation modeling data volume optimization method facing prediction precision requirement
CN111273623A (en) * 2020-02-25 2020-06-12 电子科技大学 Fault diagnosis method based on Stacked LSTM
CN111614504A (en) * 2020-06-02 2020-09-01 国网山西省电力公司电力科学研究院 Power grid regulation and control data center service characteristic fault positioning method and system based on time sequence and fault tree analysis
CN111639798A (en) * 2020-05-26 2020-09-08 华青融天(北京)软件股份有限公司 Intelligent prediction model selection method and device
CN111798018A (en) * 2019-04-09 2020-10-20 Oppo广东移动通信有限公司 Behavior prediction method, behavior prediction device, storage medium and electronic equipment
CN111815056A (en) * 2020-07-10 2020-10-23 中国人民解放军空军工程大学 Aircraft external field aircraft fuel system fault prediction method based on flight parameter data
US20200380409A1 (en) * 2019-05-29 2020-12-03 Samsung Sds Co., Ltd. Apparatus and method for analyzing time-series data based on machine learning
US20200387753A1 (en) * 2019-06-10 2020-12-10 International Business Machines Corporation Data slicing for machine learning performance testing and improvement
CN112712166A (en) * 2020-12-31 2021-04-27 深圳前海微众银行股份有限公司 Prediction method and device based on time series

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
LI XR 等: "Discrimination and Prediction of Tool Wear State Based on Gray Theory", 《JOURNAL OF TESTING AND EVALUATION》 *
ROBERT A. SOWAH 等: "Design of Power Distribution Network Fault Data Collector for Fault Detection, Location and Classification using Machine Learning", 《2018 IEEE 7TH INTERNATIONAL CONFERENCE ON ADAPTIVE SCIENCE & TECHNOLOGY (ICAST)》 *
SHENGFANG LU 等: "Automatic Fault Detection of Multiple Targets in Railway Maintenance Based on Time-Scale Normalization", 《IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT》 *
SOUFIANEBELAGOUNE 等: "Deep learning through LSTM classification and regression for transmission line fault detection, diagnosis and location in large-scale multi-machine power systems", 《MEASUREMENT》 *
刘刚: "目标价格视角下主要畜产品价格风险预警研究", 《中国优秀博硕士学位论文全文数据库(博士) 经济与管理科学辑》 *
杨凤: "基于离线时间序列数据的设备突发大故障预测", 《中国优秀博硕士学位论文全文数据库(硕士) 信息科技辑》 *
胡姣姣: "基于深度学习的时间序列数据异常检测方法", 《信息与控制》 *
邓伟辉: "时间序列的多粒度智能分析方法研究", 《中国优秀博硕士学位论文全文数据库(博士) 基础科学辑》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114002597A (en) * 2021-10-25 2022-02-01 浙江理工大学 Motor fault diagnosis method and system based on GRU network stator current analysis
CN113985207A (en) * 2021-10-28 2022-01-28 国网北京市电力公司 Method, system and device for monitoring faults of power grid operation equipment and storage medium
CN115509789A (en) * 2022-09-30 2022-12-23 中国科学院重庆绿色智能技术研究院 Computing system fault prediction method and system based on component calling analysis
CN115509789B (en) * 2022-09-30 2023-08-11 中国科学院重庆绿色智能技术研究院 Method and system for predicting faults of computing system based on component call analysis

Also Published As

Publication number Publication date
CN113341919B (en) 2022-11-08

Similar Documents

Publication Publication Date Title
CN113341919B (en) Computing system fault prediction method based on time sequence data length optimization
CN109191922B (en) Large-scale four-dimensional track dynamic prediction method and device
CN109117883B (en) SAR image sea ice classification method and system based on long-time memory network
CN109766583A (en) Based on no label, unbalanced, initial value uncertain data aero-engine service life prediction technique
CN112906858A (en) Real-time prediction method for ship motion trail
CN106709588B (en) Prediction model construction method and device and real-time prediction method and device
CN110609524A (en) Industrial equipment residual life prediction model and construction method and application thereof
CN110018453A (en) Intelligent type recognition methods based on aircraft track feature
CN113366473A (en) Method and system for automatic selection of models for time series prediction of data streams
CN111046979A (en) Method and system for discovering badcase based on small sample learning
CN109298633A (en) Chemical production process fault monitoring method based on adaptive piecemeal Non-negative Matrix Factorization
CN115310674A (en) Long-time sequence prediction method based on parallel neural network model LDformer
CN113406623A (en) Target identification method, device and medium based on radar high-resolution range profile
CN114091752A (en) Method for improving time sequence prediction effect of time sequence prediction system
CN117349583A (en) Intelligent detection method and system for low-temperature liquid storage tank
Wang et al. Three‐stage feature selection approach for deep learning‐based RUL prediction methods
CN114510871A (en) Cloud server performance degradation prediction method based on thought evolution and LSTM
Li et al. A lightweight and explainable data-driven scheme for fault detection of aerospace sensors
CN116029379B (en) Method for constructing air target intention recognition model
CN114139589A (en) Fault diagnosis method, device, equipment and computer readable storage medium
Dai et al. Predicting go-around occurrence with input-output hidden Markov model
CN112257893A (en) Complex electromechanical system health state prediction method considering monitoring error
CN115130380A (en) Strategic flight schedule delay distribution prediction method based on machine learning
Hao et al. Ship trajectory anomaly detection based on TCN model
Nivitha et al. An Ensemble Approach for Flight Delay Prediction Through Spatiotemporal Parameters

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant