CN114282443A - Residual service life prediction method based on MLP-LSTM supervised joint model - Google Patents

Residual service life prediction method based on MLP-LSTM supervised joint model

Info

Publication number
CN114282443A
Authority
CN
China
Prior art keywords
mlp
lstm
neural network
layer
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111623573.6A
Other languages
Chinese (zh)
Other versions
CN114282443B (en)
Inventor
张新民
张雨桐
李乐清
朱哲人
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202111623573.6A priority Critical patent/CN114282443B/en
Publication of CN114282443A publication Critical patent/CN114282443A/en
Application granted granted Critical
Publication of CN114282443B publication Critical patent/CN114282443B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a residual service life prediction method based on an MLP-LSTM supervised joint model. First, a multi-layer perceptron (MLP) fuses the multi-dimensional time-series historical information to extract health-index features of a machine; the extracted health-index time series is then fed into an LSTM, which calculates the machine's current remaining useful life (RUL). Further, the two serially connected neural networks are trained in a supervised manner on a labeled sample data set to update the weights, the prediction results are evaluated on a verification set, and the parameters are adaptively adjusted to obtain an optimized model. The trained MLP-LSTM supervised joint model not only effectively improves the ability of the LSTM to predict the remaining useful life, but also provides a feature-fusion result of the multi-dimensional sensor data that effectively expresses the current health condition of the machine and offers an effective reference index for equipment maintenance.

Description

Residual service life prediction method based on MLP-LSTM supervised joint model
Technical Field
The invention belongs to the field of industrial process control, and particularly relates to a residual service life prediction method based on an MLP-LSTM supervised joint model.
Background
In the industrial field, the working performance and health of important machine equipment and industrial components tend to decline during continuous operation, owing to internal operating factors or external environmental factors. As the health condition keeps deteriorating, at some future time the equipment can no longer work normally: its efficiency drops rapidly or it stops operating altogether, reaching the end of its useful life, which can disturb or even interrupt the industrial process. It is therefore necessary to predict the Remaining Useful Life (RUL) of the system over its entire service life, i.e., the length of time from the current moment until the end of the useful life of the machine equipment.
In recent years, with the collection and accumulation of large amounts of industrial data, data-driven solutions have received much attention in RUL prediction. A data-driven solution does not require detailed knowledge of the operating mechanism of the mechanical system; it only needs to identify the condition of the system from sensor data using a data-driven algorithm, so the remaining useful life of modern plant equipment with complex mechanistic models can be predicted accurately. Traditional predictions are mainly based on physical degradation models, and the correct establishment of a degradation model relies heavily on expert knowledge; these assumptions and requirements severely limit practical industrial applications. Traditional machine learning methods need manually designed features, which demand substantial domain expertise from practitioners and a separate feature-extraction process, making wide application of such models difficult. Because deep learning can automatically extract features from data and depends less on prior knowledge of the system, recent research shows that, compared with physical degradation models and traditional machine learning algorithms, deep learning can better handle industrial big data and predict the remaining useful life of mechanical equipment more accurately.
However, current research on predicting RUL with deep learning methods still has some problems. First, data fusion and prediction are usually divided into two separate steps: data fusion is performed first to obtain a health index, and the fused signal is then used for RUL prediction; this conventional procedure lacks an intrinsic connection between the two tasks and cannot explain the relationship between the fused signal and the prediction result. A common alternative for deep-learning RUL prediction on multi-sensor data is therefore to produce an end-to-end prediction output directly. The advantage of this approach is that it is completely data-driven, without assuming a degradation model or parametric distribution and without manual feature extraction. However, it is a black-box approach and cannot provide any information about the performance-degradation process. The remaining useful life is a time value that decays linearly, whereas the physical condition of the machine does not change linearly but decays exponentially, and machine maintenance depends to a great extent on this exponential behaviour so that maintenance can be completed before the machine enters the period of rapid decay. There is therefore a need to predict the remaining useful life and build a health indicator from sensor information simultaneously.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a residual service life prediction method based on an MLP-LSTM supervised joint model, which can obtain information on the performance-degradation process while estimating the current remaining useful life of a machine, and at the same time improves the prediction performance of a single LSTM neural network.
A residual service life prediction method based on an MLP-LSTM supervised joint model is characterized in that an MLP neural network is added between an input layer and a deep LSTM neural network, and the MLP neural network is used for data fusion; the deep LSTM neural network is used for predicting the residual service life, namely RUL;
the method comprises the following steps:
step one: collecting equipment data to form a data set, dividing the data set into a training set and a verification set, and preprocessing the data according to different working conditions;
step two: inputting the training set into an MLP neural network, compressing the multi-dimensional sensor characteristics into HI health characteristic indexes by the MLP neural network, and obtaining a plurality of HI time sequences of health indexes;
step three: inputting the health index HI time sequence into a depth LSTM neural network, and calculating by the depth LSTM neural network to obtain an RUL predicted value;
step four: calculating a loss function based on an error between a predicted value and a true value of the RUL, and training an MLP-LSTM supervised joint model by adopting a training set through RMSprop gradient self-adaptation; when the error result obtained after the training set and the verification set are input into the current model is smaller than a certain value or the variation of the error result is smaller than a certain value, the loss function of the model training is converged, the model training is finished, and the MLP-LSTM supervised combined model is stored;
step five: and preprocessing the equipment data to be predicted, and inputting the preprocessed equipment data into a stored MLP-LSTM supervised joint model to obtain HI and RUL values output in real time.
Further, the tagged data set in the first step is:
X_o = {(x_{it}, rul_{it}) | i ≤ n, t ≤ T_i}    (1)

where rul_{it} is the value of the remaining service life at time t,

rul_{it} = T_i - t    (2)

When the device is completely out of use, rul_{it} is 0, and all rul_{it} increase in reverse time order; x_{it} is the i-th sensor data sequence from the initial time to time t,

x_{it} = [x_i(1), x_i(2), ..., x_i(t)]    (3)

where x_i is the i-th sensor data sequence from the initial time to time T_i,

x_i = [x_i(1), x_i(2), ..., x_i(T_i)]    (4)

The preprocessing comprises normalization and sliding-time-window sampling; when the equipment data come from different working conditions, conditional normalization is performed, otherwise global normalization is performed.
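To make the label construction in equations (1)-(2) concrete, the following Python sketch (illustrative only and not part of the patent text; the array names and shapes are assumptions) attaches the label rul_{it} = T_i - t to each run-to-failure sequence with NumPy:

import numpy as np

def build_rul_labels(sensor_runs):
    """sensor_runs: list of arrays, each of shape (T_i, p) for one run-to-failure unit.
    Returns (x_run, rul_run) pairs with rul_it = T_i - t as in equation (2)."""
    labeled = []
    for x_run in sensor_runs:
        T_i = x_run.shape[0]
        # RUL reaches 0 at the last cycle and increases in reverse time order
        rul_run = np.arange(T_i - 1, -1, -1, dtype=np.float32)
        labeled.append((x_run, rul_run))
    return labeled

# toy example: two units with 5 and 3 cycles of p = 4 sensors
runs = [np.random.rand(5, 4), np.random.rand(3, 4)]
for x_run, rul_run in build_rul_labels(runs):
    print(x_run.shape, rul_run)   # e.g. (5, 4) [4. 3. 2. 1. 0.]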
Further, in the second step, a multi-sensor information multi-dimensional time sequence is input into the MLP, the MLP compresses the multi-dimensional data into one dimension, and finally a set including a health index HI time sequence is output;
the MLP building and pre-training process is as follows:
inputting a multi-sensor information multi-dimensional time sequence into an MLP neural network, and compressing multi-dimensional data into one dimension by the MLP neural network; in the MLP neural network forward propagation process, each node is obtained by calculating all nodes of the previous layer, a weight W is given to each node of the previous layer, a bias b is added, and finally the value of a certain node of the next layer is obtained through an activation function:
wherein the value of the L +1 layer node j is
a_j^{l+1} = φ( Σ_i W_{ji}^{l} a_i^{l} + b_j^{l+1} )    (5)

where φ(·) is the activation function. The output of the last layer of the MLP neural network is the set H of HI time series:

H = {h_i(t_j) | i = 1, 2, ..., N; j = 1, 2, ..., T_i}    (6)

where H is the set composed of the health indices h_i(t_j) at each time point, h_i(t_j) = f(x_i(t_j)), and f(·) is the mapping implemented by the MLP neural network; T_i is the length of the time series; h denotes the health index HI; x_i(t_j) denotes the set of sensor readings l at time t_j, x_i(t_j) = [l_{i,1}(t_j), l_{i,2}(t_j), ..., l_{i,p}(t_j)] ∈ R^{1×p}; x denotes a raw sample and p the number of sensors; the set of all x_i(t_j) is denoted X, X = {x_i(t_j) | i = 1, 2, ..., N; j = 1, 2, ..., T_i}.
Further, the third step is specifically divided into the following sub-steps:
the depth LSTM neural network is formed by stacking a plurality of layers of LSTMs, and the vector dimension of each layer of LSTM is variable; the HI health index is decoded into a multidimensional sensor time sequence through a first layer LSTM, the output of the upper layer of a depth LSTM network is used as the input of the next layer, and the updating formula of the l layer is as follows:
i_t^l = σ( W_i^l · [h_t^{l-1}, h_{t-1}^l] + b_i^l )    (7)

f_t^l = σ( W_f^l · [h_t^{l-1}, h_{t-1}^l] + b_f^l )    (8)

o_t^l = σ( W_o^l · [h_t^{l-1}, h_{t-1}^l] + b_o^l )    (9)

c_t^l = f_t^l ⊙ c_{t-1}^l + i_t^l ⊙ tanh( W_c^l · [h_t^{l-1}, h_{t-1}^l] + b_c^l )    (10)

h_t^l = o_t^l ⊙ tanh( c_t^l )    (11)

where l denotes the layer index of the deep LSTM neural network and t denotes the time step of the LSTM unit; i_t^l denotes the input unit of the l-th layer at time t, f_t^l the forget unit of the l-th layer at time t, o_t^l the output unit of the l-th layer at time t, c_t^l the state cell of the l-th layer at time t, and h_t^l the hidden unit of the l-th layer at time t; σ denotes the sigmoid activation function, ⊙ denotes element-wise multiplication, and tanh denotes the tanh activation function; h_t^{l-1} is the hidden unit of layer l-1 at time t and h_{t-1}^l is the hidden unit of the l-th layer at time t-1, to which the gate weights W^l are applied, and b^l denotes the bias.

The last unit of the last LSTM layer outputs a multi-dimensional feature vector, from which the RUL predicted value is obtained through a linear-layer calculation.
Further, the step four is specifically divided into the following sub-steps:
(1) The input layer of the deep LSTM neural network is the l-th layer of the MLP-LSTM supervised joint model and contains n neurons, while the output layer of the MLP neural network is the (l-1)-th layer of the joint model and has only one neuron; the neuron errors δ^l and δ^{l-1} of the l-th and (l-1)-th layers of the MLP-LSTM supervised joint model are designed to realize synchronous training of the MLP neural network and the deep LSTM neural network:

δ^l = (w^{l+1})^T δ^{l+1}    (12)

and δ^{l-1} is obtained from δ^l and the layer weights through the corresponding back-propagation relation (13), where w and B are the weight parameters and the batch size of the neural network, respectively;

(2) In the supervised joint training, a squared-error loss function with an L2 regularization constraint is used for gradient-adaptive training of the parameters; a score function is adopted to evaluate the prediction accuracy of the MLP-LSTM supervised joint model and is added, with a certain weight, to the global loss function as a penalty, so that the squared-error loss function is optimized and an MLP-LSTM supervised joint model biased toward early prediction is obtained.

The squared-error loss function is calculated as:

Loss_MSE(Θ) = (1/B) Σ_{i=1}^{B} (ŷ_i - y_i)^2 + λ·Σ ||w||_2^2    (14)

where Θ, w, B, λ, ŷ_i and y_i denote, respectively, the set of parameters learned in the MLP-LSTM supervised joint model, the set of weight parameters in the MLP-LSTM supervised joint model, the batch size, the regularization parameter, the predicted RUL, and the true RUL of the i-th sample;

The scoring function Score is calculated as:

Score = Σ_{i=1}^{B} s_i,  with s_i = exp(-d_i/13) - 1 for d_i < 0 and s_i = exp(d_i/10) - 1 for d_i ≥ 0    (15)

d = RUL_pred - RUL_true    (16)

The global loss function Loss_total is calculated as:

Loss_total = α·Loss_score + (1 - α)·Loss_MSE    (17)

where α is the weight balancing the score penalty and the squared-error loss;

(3) The MLP-LSTM supervised joint model is trained with the labeled training set through RMSprop gradient adaptation, whose update is:

r ← ρ·r + (1 - ρ)·g ⊙ g,   θ ← θ - (η / (δ + √r)) ⊙ g    (18)

where r is the accumulated variable of the historical gradient, ρ is the contraction coefficient controlling how much historical information is retained, η is the learning rate, δ is a small constant, and g is the gradient of Loss_total.
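As an illustration of the loss defined in equations (14)-(17), the following sketch evaluates a weighted combination of the squared-error term and an asymmetric score-style penalty in plain NumPy; the exponential time constants 13 and 10 follow the scoring function commonly used with the C-MAPSS benchmark and are an assumption here, as are the variable names, the weight alpha and the regularization strength:

import numpy as np

def score_penalty(rul_pred, rul_true):
    # asymmetric penalty: late predictions (d >= 0) are punished more heavily than early ones
    d = rul_pred - rul_true                      # equation (16)
    return np.where(d < 0, np.exp(-d / 13.0) - 1.0, np.exp(d / 10.0) - 1.0).sum()

def total_loss(rul_pred, rul_true, weights, alpha=0.1, lam=1e-4):
    # squared error with L2 regularization (14) combined with the score penalty (17)
    mse = np.mean((rul_pred - rul_true) ** 2) + lam * sum(np.sum(w ** 2) for w in weights)
    return alpha * score_penalty(rul_pred, rul_true) + (1.0 - alpha) * mse

pred = np.array([48.0, 60.0, 100.0])
true = np.array([50.0, 55.0, 110.0])
print(total_loss(pred, true, weights=[np.ones((3, 3))]))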
The invention has the following beneficial effects:
the invention provides a general RUL semi-supervised joint prediction framework method aiming at analyzing the conventional RUL prediction method in the running degradation process of machine equipment and combining a deep learning theory and the problems existing in the previous research, thereby realizing the synchronous RUL prediction of health index data fusion and multi-sensor data. The training model provides a continuous visualization process of system degradation, but also ensures efficient prediction of the generated fusion signal for RUL and rapid convergence of the predictive model training process. Furthermore, a loss function of the RUL life prediction model is modified, so that the trained model is more biased to early prediction, the prediction result can ensure the maintenance to be advanced, and the prediction model is safer.
Drawings
FIG. 1 is a schematic diagram of an MLP-LSTM neural network model;
FIG. 2 is a flow diagram of a joint model framework training implementation;
FIG. 3 is a diagram illustrating results generated after different normalization strategies;
FIG. 4 is a plot of the data-fusion output health indicator HI over time (raw), wherein the upper graph in FIG. 4 shows the HI output for all turbine data in the test set and the lower graph shows the HI output for selected partial sample data of the test set.
FIG. 5 is a plot of the data-fusion output health indicator HI over time (filtered), wherein the upper graph in FIG. 5 shows the HI output for all turbine data in the test set and the lower graph shows the HI output for selected partial sample data of the test set.
FIG. 6 is a schematic diagram of a fitting curve of the MLP-LSTM neural network model.
FIG. 7 is a schematic diagram of an on-line prediction RUL fitting curve.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and preferred embodiments, so that its objects and effects become more apparent. It should be understood that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.
In the method for predicting the remaining useful life based on the MLP-LSTM supervised joint model, the joint model is realized by adding an MLP neural network between the input layer and a deep LSTM neural network; the MLP neural network is used for data fusion, and the deep LSTM neural network is used for remaining-life (RUL) prediction. The data-slice sequences are fed into the MLP neural network of the MLP-LSTM supervised joint model to compute a one-dimensional HI time series. The last layer of the MLP neural network is connected to the deep LSTM network model, i.e., the HI sequence fused by the MLP neural network is input into the deep LSTM network in order, according to the time_step length, for calculation. After the complete MLP-LSTM joint prediction model is obtained, gradient-descent iteration is performed with an RMSprop optimizer. The concrete structure is shown in FIG. 1. In addition, because the MLP neural network extracts deep data features, over-fitting can easily occur during training; the network is therefore optimized with batch-normalization layers, and a regularization term is added to the MLP neural network to address this problem. When the test-set data are input, the MLP network outputs the evolution of the health index HI, and the final network layer of the whole model outputs the RUL prediction. In the concrete engineering implementation, the network layers of the two models are connected within the same model and jointly trained iteratively with an RMSprop optimizer. The entire training process is shown in FIG. 2.
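For concreteness, a minimal sketch of the joint architecture described above is given below, written with TensorFlow/Keras; the framework choice, the number and width of the layers, the activation functions and the learning rate are illustrative assumptions rather than values disclosed in this patent. A TimeDistributed MLP compresses the p sensor channels at every time step into a one-dimensional HI, and the stacked LSTM maps the resulting HI sequence to a single RUL output:

import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

def build_mlp_lstm(time_steps, feature_nums, l2=1e-4):
    inputs = layers.Input(shape=(time_steps, feature_nums))
    # MLP applied at every time step: fuses the sensor channels into one HI value
    x = layers.TimeDistributed(layers.Dense(32, activation="relu",
                                            kernel_regularizer=regularizers.l2(l2)))(inputs)
    x = layers.TimeDistributed(layers.BatchNormalization())(x)
    hi = layers.TimeDistributed(layers.Dense(1, activation="sigmoid"), name="health_index")(x)
    # stacked (deep) LSTM on the HI time series, last unit feeds a linear RUL head
    x = layers.LSTM(64, return_sequences=True)(hi)
    x = layers.LSTM(32)(x)
    rul = layers.Dense(1, activation="linear", name="rul")(x)
    model = models.Model(inputs, [hi, rul])
    model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-3),
                  loss={"rul": "mse"})
    return model

model = build_mlp_lstm(time_steps=30, feature_nums=21)
model.summary()

Only the RUL head carries a loss term in this sketch, so the HI output emerges from the joint optimization and can be read out at prediction time together with the RUL, which matches the requirement that HI and RUL be produced simultaneously.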
The method comprises the following steps:
the method comprises the following steps: collecting equipment data to form a data set, dividing the data set into a training set and a verification set, and preprocessing the data according to different working conditions; wherein the data of the labeled training set comprises a time stamp of time sequence data, a value of each characteristic variable at each moment, and an RUL data label or equipment life end time for calculating the RUL label; the content of the labeled verification set is the same as that of the labeled training set, and the number of the labeled verification set is 10% -30% of that of the labeled training set.
The tagged data set in the step one is as follows:
X_o = {(x_{it}, rul_{it}) | i ≤ n, t ≤ T_i}    (1)

where rul_{it} is the value of the remaining service life at time t,

rul_{it} = T_i - t    (2)

When the device is completely out of use, rul_{it} is 0, and all rul_{it} increase in reverse time order; x_{it} is the i-th sensor data sequence from the initial time to time t,

x_{it} = [x_i(1), x_i(2), ..., x_i(t)]    (3)

where x_i is the i-th sensor data sequence from the initial time to time T_i,

x_i = [x_i(1), x_i(2), ..., x_i(T_i)]    (4)

The preprocessing comprises normalization and sliding-time-window sampling; when the equipment data come from different working conditions, conditional normalization is performed, otherwise global normalization is performed.
The LSTM recurrent neural network has a standard input form (batch_size, time_steps, feature_nums), where batch_size is the number of samples processed in one batch during training of the neural network model, time_steps is the time-step length of the time-series data in each sample, and feature_nums is the number of feature dimensions of the multi-sensor data. To process the data set into this standard form, samples are drawn with a sliding time window.
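A minimal sliding-time-window sampler that reshapes one unit's normalized sensor matrix into the (batch_size, time_steps, feature_nums) form expected by the LSTM might look as follows; the stride of 1 and the choice of labelling each window with the RUL at its last cycle are assumptions made for illustration:

import numpy as np

def sliding_windows(x_run, rul_run, time_steps):
    # x_run: (T_i, feature_nums) sensor matrix of one unit; rul_run: (T_i,) RUL labels
    windows, labels = [], []
    for end in range(time_steps, x_run.shape[0] + 1):
        windows.append(x_run[end - time_steps:end])
        labels.append(rul_run[end - 1])          # label = RUL at the last cycle of the window
    return np.stack(windows), np.array(labels)

X, y = sliding_windows(np.random.rand(120, 21), np.arange(119, -1, -1), time_steps=30)
print(X.shape, y.shape)   # (91, 30, 21) (91,)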
Step two: inputting the training set into an MLP neural network, compressing the multi-dimensional sensor characteristics into HI health characteristic indexes by the MLP neural network, and obtaining a plurality of HI time sequences of health indexes;
in the second step, a multi-sensor information multi-dimensional time sequence is input into the MLP, the MLP compresses the multi-dimensional data into one dimension, and finally a set comprising a health index HI time sequence is output;
the MLP building and pre-training process is as follows:
inputting a multi-sensor information multi-dimensional time sequence into an MLP neural network, and compressing multi-dimensional data into one dimension by the MLP neural network; in the MLP neural network forward propagation process, each node is obtained by calculating all nodes of the previous layer, a weight W is given to each node of the previous layer, a bias b is added, and finally the value of a certain node of the next layer is obtained through an activation function:
wherein the value of the L +1 layer node j is
a_j^{l+1} = φ( Σ_i W_{ji}^{l} a_i^{l} + b_j^{l+1} )    (5)

where φ(·) is the activation function. The output of the last layer of the MLP neural network is the set H of HI time series:

H = {h_i(t_j) | i = 1, 2, ..., N; j = 1, 2, ..., T_i}    (6)

where H is the set composed of the health indices h_i(t_j) at each time point, h_i(t_j) = f(x_i(t_j)), and f(·) is the mapping implemented by the MLP neural network; T_i is the length of the time series; h denotes the health index HI; x_i(t_j) denotes the set of sensor readings l at time t_j, x_i(t_j) = [l_{i,1}(t_j), l_{i,2}(t_j), ..., l_{i,p}(t_j)] ∈ R^{1×p}; x denotes a raw sample and p the number of sensors; the set of all x_i(t_j) is denoted X, X = {x_i(t_j) | i = 1, 2, ..., N; j = 1, 2, ..., T_i}.
Step three: inputting the health index HI time sequence into a depth LSTM neural network, and calculating by the depth LSTM neural network to obtain an RUL predicted value;
the third step is specifically divided into the following substeps:
the depth LSTM network is formed by stacking a plurality of layers of LSTMs, and the vector dimension of each layer of LSTM is variable; the HI health index is decoded into a multidimensional sensor time sequence through a first layer LSTM, the output of the upper layer of a depth LSTM network is used as the input of the next layer, and the updating formula of the l layer is as follows:
i_t^l = σ( W_i^l · [h_t^{l-1}, h_{t-1}^l] + b_i^l )    (7)

f_t^l = σ( W_f^l · [h_t^{l-1}, h_{t-1}^l] + b_f^l )    (8)

o_t^l = σ( W_o^l · [h_t^{l-1}, h_{t-1}^l] + b_o^l )    (9)

c_t^l = f_t^l ⊙ c_{t-1}^l + i_t^l ⊙ tanh( W_c^l · [h_t^{l-1}, h_{t-1}^l] + b_c^l )    (10)

h_t^l = o_t^l ⊙ tanh( c_t^l )    (11)

where l denotes the layer index of the deep LSTM neural network and t denotes the time step of the LSTM unit; i_t^l denotes the input unit of the l-th layer at time t, f_t^l the forget unit of the l-th layer at time t, o_t^l the output unit of the l-th layer at time t, c_t^l the state cell of the l-th layer at time t, and h_t^l the hidden unit of the l-th layer at time t; σ denotes the sigmoid activation function, ⊙ denotes element-wise multiplication, and tanh denotes the tanh activation function; h_t^{l-1} is the hidden unit of layer l-1 at time t and h_{t-1}^l is the hidden unit of the l-th layer at time t-1, to which the gate weights W^l are applied, and b^l denotes the bias.

The last unit of the last LSTM layer outputs a multi-dimensional feature vector, from which the RUL predicted value is obtained through a linear-layer calculation.
Step four: calculating a loss function based on an error between a predicted value and a true value of the RUL, and training an MLP-LSTM supervised joint model by adopting a training set through RMSprop gradient self-adaptation; when the error result obtained after the training set and the verification set are input into the current model is smaller than a certain value or the variation of the error result is smaller than a certain value, the loss function of the model training is converged, the model training is finished, and the MLP-LSTM supervised combined model is stored;
the fourth step is specifically divided into the following substeps:
(1) The input layer of the deep LSTM neural network is the l-th layer of the MLP-LSTM supervised joint model and contains n neurons, while the output layer of the MLP neural network is the (l-1)-th layer of the joint model and has only one neuron; the neuron errors δ^l and δ^{l-1} of the l-th and (l-1)-th layers of the MLP-LSTM supervised joint model are designed to realize synchronous training of the MLP neural network and the deep LSTM neural network:

δ^l = (w^{l+1})^T δ^{l+1}    (12)

and δ^{l-1} is obtained from δ^l and the layer weights through the corresponding back-propagation relation (13), where w and B are the weight parameters and the batch size of the neural network, respectively;

(2) In the supervised joint training, a squared-error loss function with an L2 regularization constraint is used for gradient-adaptive training of the parameters; a score function is adopted to evaluate the prediction accuracy of the MLP-LSTM supervised joint model and is added, with a certain weight, to the global loss function as a penalty, so that the squared-error loss function is optimized and an MLP-LSTM supervised joint model biased toward early prediction is obtained.

The squared-error loss function is calculated as:

Loss_MSE(Θ) = (1/B) Σ_{i=1}^{B} (ŷ_i - y_i)^2 + λ·Σ ||w||_2^2    (14)

where Θ, w, B, λ, ŷ_i and y_i denote, respectively, the set of parameters learned in the MLP-LSTM supervised joint model, the set of weight parameters in the MLP-LSTM supervised joint model, the batch size, the regularization parameter, the predicted RUL, and the true RUL of the i-th sample;

The scoring function Score is calculated as:

Score = Σ_{i=1}^{B} s_i,  with s_i = exp(-d_i/13) - 1 for d_i < 0 and s_i = exp(d_i/10) - 1 for d_i ≥ 0    (15)

d = RUL_pred - RUL_true    (16)

The global loss function Loss_total is calculated as:

Loss_total = α·Loss_score + (1 - α)·Loss_MSE    (17)

where α is the weight balancing the score penalty and the squared-error loss;

(3) The MLP-LSTM supervised joint model is trained with the labeled training set through RMSprop gradient adaptation, whose update is:

r ← ρ·r + (1 - ρ)·g ⊙ g,   θ ← θ - (η / (δ + √r)) ⊙ g    (18)

where r is the accumulated variable of the historical gradient, ρ is the contraction coefficient controlling how much historical information is retained, η is the learning rate, δ is a small constant, and g is the gradient of Loss_total.
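The RMSprop update of equation (18) can be written out directly; the sketch below is a plain NumPy version with illustrative hyper-parameter values for ρ, η and δ, which the patent does not specify:

import numpy as np

def rmsprop_update(theta, grad, r, rho=0.9, eta=1e-3, delta=1e-8):
    # r accumulates the squared history gradient g, as in equation (18)
    r = rho * r + (1.0 - rho) * grad * grad
    theta = theta - eta * grad / (delta + np.sqrt(r))
    return theta, r

theta, r = np.array([1.0, -2.0]), np.zeros(2)
for _ in range(3):
    grad = 2.0 * theta                           # gradient of a toy quadratic loss
    theta, r = rmsprop_update(theta, grad, r)
print(theta)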
Step five: and preprocessing the equipment data to be predicted, and inputting the preprocessed equipment data into a stored MLP-LSTM supervised joint model to obtain HI and RUL values output in real time.
The usefulness of the present invention is illustrated below with a specific industrial example. The invention uses the open-source turbofan engine degradation simulation data set C-MAPSS provided by NASA as the example. The data comprise four sub-data sets, FD001-FD004, with different operating conditions and failure modes, and each sub-data set contains three files, train_FD00X, test_FD00X and RUL_FD00X, which are, respectively, the training set, the test set and the RUL ground-truth labels of the test set. The details are shown in the following table:
table 1: details of C-MAPSS dataset
[Table 1 is provided as an image in the original publication.]
The data set FD002 is mainly used as the research object. Compared with FD001 or FD003, its multi-sensor data cover 6 working conditions, the external environment is more complex, more data are available, and the RUL is theoretically harder to predict. The specific meanings of the sensor dimensions are shown in the following table:
table 2: multi-sensor data specific representation of a turbomachine
[Table 2 is provided as an image in the original publication.]
After the data are obtained, the data set is divided to obtain an unlabeled training set, a labeled training set and a labeled verification set; the raw data are condition-normalized according to the 6 working conditions of the data set, and sliding-window processing is then performed to obtain the data-slice sequences. The operating conditions of the turbine have a great influence on the sensor values, and the readings of a sensor in different states lie in completely different value ranges. Global normalization ignores the operating conditions and normalizes all values of each sensor simultaneously, whereas conditional normalization normalizes the data of each sensor separately under each working condition. FIG. 3 shows the readings of sensors 4 and 7 of one turbine unit after processing under the different normalization strategies. If global normalization is used, the prediction accuracy of the RUL is not affected, but the output HI of the data-fusion model becomes a globally normalized variable and it is difficult to present a degradation trend. A conditional-normalization strategy is therefore used in the data preprocessing to obtain a health index HI that shows the degradation trend.
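A conditional-normalization step of the kind described above can be sketched with pandas by standardizing each sensor column within each operating-condition group; the column names used here (a discretized condition id and sensor columns s1, s2, ...) are assumptions made for illustration:

import pandas as pd

def conditional_normalize(df, sensor_cols, condition_col="condition"):
    # z-score each sensor column separately within each operating-condition group
    out = df.copy()
    grouped = out.groupby(condition_col)[sensor_cols]
    out[sensor_cols] = (out[sensor_cols] - grouped.transform("mean")) / (grouped.transform("std") + 1e-8)
    return out

# toy frame: 6 cycles, 2 operating conditions, 2 sensors
df = pd.DataFrame({"condition": [0, 0, 0, 1, 1, 1],
                   "s1": [518.0, 519.0, 520.0, 642.0, 641.5, 643.0],
                   "s2": [641.8, 642.1, 642.3, 1588.0, 1589.5, 1590.2]})
print(conditional_normalize(df, ["s1", "s2"]))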
The training data set and the test data set each contain 100 turbine units. A sliding window of size num_steps is used within the subset of each unit to generate the input sequences. The model structure itself is mainly influenced by two hyper-parameters, batch_size and num_steps; these are set to different values and different LSTM models are trained in order to compare the effects of the hyper-parameters. The appropriate parameters are then selected according to the Score obtained by each model on the test set.
The Dropout rate and the regularization parameters of the BN network are adjusted according to the changes and stability of train_loss and val_loss: too little regularization can cause over-fitting, while too much can degrade model accuracy. To prevent over-fitting and reduce training time, an early-stopping strategy is used: a threshold on the change of the loss decrease is set, and training is stopped when the change does not exceed the threshold for n consecutive epochs. An appropriate threshold parameter can be set to implement the early-stopping strategy according to the accuracy required of the model results.
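The early-stopping strategy described above maps directly onto the standard Keras EarlyStopping callback, with min_delta playing the role of the loss-change threshold and patience the role of n consecutive epochs; the concrete values below are illustrative assumptions:

import tensorflow as tf

# stop when val_loss has not improved by more than min_delta for `patience` consecutive epochs
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss",
                                              min_delta=1e-3,
                                              patience=10,
                                              restore_best_weights=True)

# hypothetical usage with the previously sketched joint model and windowed data:
# model.fit(X_train, {"rul": y_train}, validation_data=(X_val, {"rul": y_val}),
#           epochs=200, batch_size=256, callbacks=[early_stop])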
Based on the constructed joint training neural network, the output of the multi-sensor data-fusion model, i.e., the time series of the health index HI, can be obtained at the intermediate network layer. The health-indicator decay curve of each turbine is shown in FIG. 4. The HI time series are further filtered with a Savitzky-Golay filter, a method that performs smoothing in the time domain by combining convolution with local polynomial regression; its characteristic is that it filters noise while keeping the shape and width of the signal unchanged. The filtered HI time-series curves are shown in FIG. 5.
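Filtering the HI series with a Savitzky-Golay filter is a one-line call in SciPy; the window length and polynomial order below are illustrative assumptions:

import numpy as np
from scipy.signal import savgol_filter

# toy HI series: exponential-style decay plus noise
t = np.arange(200)
hi_raw = np.exp(-t / 80.0) + 0.05 * np.random.randn(200)

# window_length must be odd and larger than polyorder
hi_smooth = savgol_filter(hi_raw, window_length=21, polyorder=3)
print(hi_raw[:3], hi_smooth[:3])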
Based on the FD002 sub-data set of the C-MAPSS data set, RUL prediction is performed on the test set with the above neural network model; the RUL prediction results of the deep-learning model on the multi-sensor time series of the 100 turbines in the test set are shown in FIG. 6. The MLP-LSTM joint model is found to predict the RUL of each turbine well from the current multi-dimensional sensor time series. Taking each time-series slice as the state of one turbine unit, i.e., taking the historical information of the turbine at each moment as input, online real-time RUL prediction is realized. The prediction results for 4 of these turbines are shown in FIG. 7. The prediction results compared with other methods are shown in Table 3:
table 3: comparison result of each RUL prediction algorithm model under RMSE and SCORE indexes
Methods      RMSE     Score
MLP          80.03    7.80×10^6
SVR          42.00    1.74×10^4
RVR          31.30    5.90×10^5
CNN          30.29    1.36×10^4
Deep LSTM    14.93    465
MLP-LSTM     12.74    335
As can be seen from Table 3, the LSTM neural network shows a better prediction effect in this RUL prediction application than traditional machine learning algorithms and other deep-learning algorithms. The MLP-LSTM joint neural-network training model proposed here can additionally provide the data-fusion health index HI. Owing to the more complex neural-network structure, deep feature extraction is realized, the prediction error is smaller and the predicted values are more accurate, while the error on the Score evaluation metric is an order of magnitude lower.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention and is not intended to limit the invention. Although the invention has been described in detail with reference to the foregoing examples, those skilled in the art may still modify the described embodiments or substitute equivalents for some of their features. All modifications, equivalents and the like that come within the spirit and principle of the invention are intended to be included within its scope.

Claims (5)

1. A residual service life prediction method based on an MLP-LSTM supervised joint model, characterized in that the MLP-LSTM supervised joint model is formed by adding an MLP neural network between an input layer and a deep LSTM neural network; the MLP neural network is used for data fusion, and the deep LSTM neural network is used for predicting the residual service life, namely the RUL;
the method comprises the following steps:
step one: collecting equipment data to form a data set, dividing the data set into a training set and a verification set, and preprocessing the data according to different working conditions;
step two: inputting the training set into an MLP neural network, compressing the multi-dimensional sensor characteristics into HI health characteristic indexes by the MLP neural network, and obtaining a plurality of HI time sequences of health indexes;
step three: inputting the health index HI time sequence into a depth LSTM neural network, and calculating by the depth LSTM neural network to obtain an RUL predicted value;
step four: calculating a loss function based on an error between a predicted value and a true value of the RUL, and training an MLP-LSTM supervised joint model by adopting a training set through RMSprop gradient self-adaptation; when the error result obtained after the training set and the verification set are input into the current model is smaller than a certain value or the variation of the error result is smaller than a certain value, the loss function of the model training is converged, the model training is finished, and the MLP-LSTM supervised combined model is stored;
step five: and preprocessing the equipment data to be predicted, and inputting the preprocessed equipment data into a stored MLP-LSTM supervised joint model to obtain HI and RUL values output in real time.
2. The MLP-LSTM supervised joint model-based remaining service life prediction method as recited in claim 1, wherein the labeled data set in the first step is:
X_o = {(x_{it}, rul_{it}) | i ≤ n, t ≤ T_i}    (1)

where rul_{it} is the value of the remaining service life at time t,

rul_{it} = T_i - t    (2)

When the device is completely out of use, rul_{it} is 0, and all rul_{it} increase in reverse time order; x_{it} is the i-th sensor data sequence from the initial time to time t,

x_{it} = [x_i(1), x_i(2), ..., x_i(t)]    (3)

where x_i is the i-th sensor data sequence from the initial time to time T_i,

x_i = [x_i(1), x_i(2), ..., x_i(T_i)]    (4)

The preprocessing comprises normalization and sliding-time-window sampling; when the equipment data come from different working conditions, conditional normalization is performed, otherwise global normalization is performed.
3. The MLP-LSTM supervised joint model-based residual service life prediction method as recited in claim 1, wherein in the second step, a multi-sensor information multi-dimensional time series is input into the MLP, the MLP compresses the multi-dimensional data into one dimension, and finally, a set comprising a health index HI time series is output;
the MLP building and pre-training process is as follows:
inputting a multi-sensor information multi-dimensional time sequence into an MLP neural network, and compressing multi-dimensional data into one dimension by the MLP neural network; in the MLP neural network forward propagation process, each node is obtained by calculating all nodes of the previous layer, a weight W is given to each node of the previous layer, a bias b is added, and finally the value of a certain node of the next layer is obtained through an activation function:
wherein the value of the L +1 layer node j is
a_j^{l+1} = φ( Σ_i W_{ji}^{l} a_i^{l} + b_j^{l+1} )    (5)

where φ(·) is the activation function. The output of the last layer of the MLP neural network is the set H of HI time series:

H = {h_i(t_j) | i = 1, 2, ..., N; j = 1, 2, ..., T_i}    (6)

where H is the set composed of the health indices h_i(t_j) at each time point, h_i(t_j) = f(x_i(t_j)), and f(·) is the mapping implemented by the MLP neural network; T_i is the length of the time series; h denotes the health index HI; x_i(t_j) denotes the set of sensor readings l at time t_j, x_i(t_j) = [l_{i,1}(t_j), l_{i,2}(t_j), ..., l_{i,p}(t_j)] ∈ R^{1×p}; x denotes a raw sample and p the number of sensors; the set of all x_i(t_j) is denoted X, X = {x_i(t_j) | i = 1, 2, ..., N; j = 1, 2, ..., T_i}.
4. The MLP-LSTM supervised joint model-based remaining service life prediction method as recited in claim 1, wherein the step III is specifically divided into the following sub-steps:
the depth LSTM neural network is formed by stacking a plurality of layers of LSTMs, and the vector dimension of each layer of LSTM is variable; the HI health index is decoded into a multidimensional sensor time sequence through a first layer LSTM, the output of the upper layer of a depth LSTM network is used as the input of the next layer, and the updating formula of the l layer is as follows:
i_t^l = σ( W_i^l · [h_t^{l-1}, h_{t-1}^l] + b_i^l )    (7)

f_t^l = σ( W_f^l · [h_t^{l-1}, h_{t-1}^l] + b_f^l )    (8)

o_t^l = σ( W_o^l · [h_t^{l-1}, h_{t-1}^l] + b_o^l )    (9)

c_t^l = f_t^l ⊙ c_{t-1}^l + i_t^l ⊙ tanh( W_c^l · [h_t^{l-1}, h_{t-1}^l] + b_c^l )    (10)

h_t^l = o_t^l ⊙ tanh( c_t^l )    (11)

where l denotes the layer index of the deep LSTM neural network and t denotes the time step of the LSTM unit; i_t^l denotes the input unit of the l-th layer at time t, f_t^l the forget unit of the l-th layer at time t, o_t^l the output unit of the l-th layer at time t, c_t^l the state cell of the l-th layer at time t, and h_t^l the hidden unit of the l-th layer at time t; σ denotes the sigmoid activation function, ⊙ denotes element-wise multiplication, and tanh denotes the tanh activation function; h_t^{l-1} is the hidden unit of layer l-1 at time t and h_{t-1}^l is the hidden unit of the l-th layer at time t-1, to which the gate weights W^l are applied, and b^l denotes the bias.

The last unit of the last LSTM layer outputs a multi-dimensional feature vector, from which the RUL predicted value is obtained through a linear-layer calculation.
5. The MLP-LSTM supervised joint model-based remaining service life prediction method as recited in claim 1, wherein the step four is specifically divided into the following sub-steps:
(1) The input layer of the deep LSTM neural network is the l-th layer of the MLP-LSTM supervised joint model and contains n neurons, while the output layer of the MLP neural network is the (l-1)-th layer of the joint model and has only one neuron; the neuron errors δ^l and δ^{l-1} of the l-th and (l-1)-th layers of the MLP-LSTM supervised joint model are designed to realize synchronous training of the MLP neural network and the deep LSTM neural network:

δ^l = (w^{l+1})^T δ^{l+1}    (12)

and δ^{l-1} is obtained from δ^l and the layer weights through the corresponding back-propagation relation (13), where w and B are the weight parameters and the batch size of the neural network, respectively;

(2) In the supervised joint training, a squared-error loss function with an L2 regularization constraint is used for gradient-adaptive training of the parameters; a score function is adopted to evaluate the prediction accuracy of the MLP-LSTM supervised joint model and is added, with a certain weight, to the global loss function as a penalty, so that the squared-error loss function is optimized and an MLP-LSTM supervised joint model biased toward early prediction is obtained.

The squared-error loss function is calculated as:

Loss_MSE(Θ) = (1/B) Σ_{i=1}^{B} (ŷ_i - y_i)^2 + λ·Σ ||w||_2^2    (14)

where Θ, w, B, λ, ŷ_i and y_i denote, respectively, the set of parameters learned in the MLP-LSTM supervised joint model, the set of weight parameters in the MLP-LSTM supervised joint model, the batch size, the regularization parameter, the predicted RUL, and the true RUL of the i-th sample;

The scoring function Score is calculated as:

Score = Σ_{i=1}^{B} s_i,  with s_i = exp(-d_i/13) - 1 for d_i < 0 and s_i = exp(d_i/10) - 1 for d_i ≥ 0    (15)

d = RUL_pred - RUL_true    (16)

The global loss function Loss_total is calculated as:

Loss_total = α·Loss_score + (1 - α)·Loss_MSE    (17)

where α is the weight balancing the score penalty and the squared-error loss;

(3) The MLP-LSTM supervised joint model is trained with the labeled training set through RMSprop gradient adaptation, whose update is:

r ← ρ·r + (1 - ρ)·g ⊙ g,   θ ← θ - (η / (δ + √r)) ⊙ g    (18)

where r is the accumulated variable of the historical gradient, ρ is the contraction coefficient controlling how much historical information is retained, η is the learning rate, δ is a small constant, and g is the gradient of Loss_total.
CN202111623573.6A 2021-12-28 2021-12-28 Residual service life prediction method based on MLP-LSTM supervised joint model Active CN114282443B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111623573.6A CN114282443B (en) 2021-12-28 2021-12-28 Residual service life prediction method based on MLP-LSTM supervised joint model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111623573.6A CN114282443B (en) 2021-12-28 2021-12-28 Residual service life prediction method based on MLP-LSTM supervised joint model

Publications (2)

Publication Number Publication Date
CN114282443A true CN114282443A (en) 2022-04-05
CN114282443B CN114282443B (en) 2023-03-17

Family

ID=80876961

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111623573.6A Active CN114282443B (en) 2021-12-28 2021-12-28 Residual service life prediction method based on MLP-LSTM supervised joint model

Country Status (1)

Country Link
CN (1) CN114282443B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115987295A (en) * 2023-03-20 2023-04-18 河北省农林科学院 Crop monitoring data efficient processing method based on Internet of things
CN116697039A (en) * 2023-08-07 2023-09-05 德电北斗电动汽车有限公司 Self-adaptive control method and system for single-stage high-speed transmission
WO2024050782A1 (en) * 2022-09-08 2024-03-14 Siemens Aktiengesellschaft Method and apparatus for remaining useful life estimation and computer-readable storage medium
CN117874639A (en) * 2024-03-12 2024-04-12 山东能源数智云科技有限公司 Mechanical equipment service life prediction method and device based on artificial intelligence

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150262060A1 (en) * 2014-03-11 2015-09-17 SparkCognition, Inc. System and Method for Calculating Remaining Useful Time of Objects
CN109472110A (en) * 2018-11-29 2019-03-15 南京航空航天大学 A kind of aero-engine remaining life prediction technique based on LSTM network and ARIMA model
CN112580263A (en) * 2020-12-24 2021-03-30 湖南工业大学 Turbofan engine residual service life prediction method based on space-time feature fusion
CN113486578A (en) * 2021-06-28 2021-10-08 北京科技大学 Method for predicting residual life of equipment in industrial process
CN113743016A (en) * 2021-09-09 2021-12-03 湖南工业大学 Turbofan engine residual service life prediction method based on improved stacked sparse self-encoder and attention echo state network

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150262060A1 (en) * 2014-03-11 2015-09-17 SparkCognition, Inc. System and Method for Calculating Remaining Useful Time of Objects
CN109472110A (en) * 2018-11-29 2019-03-15 南京航空航天大学 A kind of aero-engine remaining life prediction technique based on LSTM network and ARIMA model
CN112580263A (en) * 2020-12-24 2021-03-30 湖南工业大学 Turbofan engine residual service life prediction method based on space-time feature fusion
CN113486578A (en) * 2021-06-28 2021-10-08 北京科技大学 Method for predicting residual life of equipment in industrial process
CN113743016A (en) * 2021-09-09 2021-12-03 湖南工业大学 Turbofan engine residual service life prediction method based on improved stacked sparse self-encoder and attention echo state network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ANDRÉ LISTOU ELLEFSEN et al.: "Remaining useful life predictions for turbofan engine degradation using semi-supervised deep architecture", Reliability Engineering and System Safety 183 *
ZHANG XINMIN et al.: "Dynamic Variational Bayesian Student's T Mixture Regression With Hidden Variables Propagation for Industrial Inferential Sensor Development", IEEE Transactions on Industrial Informatics *
LI JINGFENG et al.: "Remaining life prediction of aero-engine based on LSTM-DBN", Systems Engineering and Electronics *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024050782A1 (en) * 2022-09-08 2024-03-14 Siemens Aktiengesellschaft Method and apparatus for remaining useful life estimation and computer-readable storage medium
CN115987295A (en) * 2023-03-20 2023-04-18 河北省农林科学院 Crop monitoring data efficient processing method based on Internet of things
CN115987295B (en) * 2023-03-20 2023-05-12 河北省农林科学院 Crop monitoring data efficient processing method based on Internet of things
CN116697039A (en) * 2023-08-07 2023-09-05 德电北斗电动汽车有限公司 Self-adaptive control method and system for single-stage high-speed transmission
CN116697039B (en) * 2023-08-07 2023-09-29 德电北斗电动汽车有限公司 Self-adaptive control method and system for single-stage high-speed transmission
CN117874639A (en) * 2024-03-12 2024-04-12 山东能源数智云科技有限公司 Mechanical equipment service life prediction method and device based on artificial intelligence

Also Published As

Publication number Publication date
CN114282443B (en) 2023-03-17

Similar Documents

Publication Publication Date Title
CN114282443B (en) Residual service life prediction method based on MLP-LSTM supervised joint model
CN116757534B (en) Intelligent refrigerator reliability analysis method based on neural training network
CN108445752B (en) Random weight neural network integrated modeling method for self-adaptively selecting depth features
CN114218872B (en) DBN-LSTM semi-supervised joint model-based residual service life prediction method
CN112990556A (en) User power consumption prediction method based on Prophet-LSTM model
CN110689171A (en) Turbine health state prediction method based on E-LSTM
CN114048600A (en) Digital twin-driven multi-model fusion industrial system anomaly detection method
CN114015825B (en) Method for monitoring abnormal state of blast furnace heat load based on attention mechanism
CN113325721B (en) Model-free adaptive control method and system for industrial system
CN112668775A (en) Air quality prediction method based on time sequence convolution network algorithm
Liu et al. Complex engineered system health indexes extraction using low frequency raw time-series data based on deep learning methods
CN110757510A (en) Method and system for predicting remaining life of robot
Liu et al. Model fusion and multiscale feature learning for fault diagnosis of industrial processes
CN114118225A (en) Method, system, electronic device and storage medium for predicting remaining life of generator
CN114500004A (en) Anomaly detection method based on conditional diffusion probability generation model
CN117273440A (en) Engineering construction Internet of things monitoring and managing system and method based on deep learning
CN113984389A (en) Rolling bearing fault diagnosis method based on multi-receptive-field and improved capsule map neural network
CN113988210A (en) Method and device for restoring distorted data of structure monitoring sensor network and storage medium
CN116842323A (en) Abnormal detection method for operation data of water supply pipeline
CN116244596A (en) Industrial time sequence data anomaly detection method based on TCN and attention mechanism
CN113821974B (en) Engine residual life prediction method based on multiple fault modes
CN114648095A (en) Air quality concentration inversion method based on deep learning
Zhang et al. Remaining useful life predictions for turbofan engine using semi-supervised DBN-LSTM joint training model
Wang A new variable selection method for soft sensor based on deep learning
CN115309736B (en) Time sequence data anomaly detection method based on self-supervision learning multi-head attention network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant