CN117010263A - Residual life prediction method based on convolutional neural network and long short-term memory network - Google Patents
- Publication number
- Publication number: CN117010263A (application CN202310339141.5A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F30/27 — Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM]
- G06F18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/217 — Validation; performance evaluation; active pattern learning techniques
- G06F18/23213 — Non-hierarchical clustering with fixed number of clusters, e.g. K-means clustering
- G06F18/24 — Classification techniques
- G06N3/0442 — Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
- G06N3/045 — Combinations of networks
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06N3/048 — Activation functions
- G06N3/0499 — Feedforward networks
- G06N3/084 — Backpropagation, e.g. using gradient descent
- G06N3/09 — Supervised learning
- G06F2119/02 — Reliability analysis or reliability optimisation; failure analysis [FMEA]
- G06F2119/04 — Ageing analysis or optimisation against ageing
Abstract
The invention discloses a residual life prediction method based on a convolutional neural network (CNN) and a long short-term memory (LSTM) network, and relates to the field of remaining useful life (RUL) prediction. The method comprises two main parts: data preprocessing and model training. The preprocessing stage identifies the operating-condition information of the run-to-failure equipment monitoring data, normalizes and standardizes the data by operating-condition category, and applies sliding-window processing to the preprocessed monitoring data and the corresponding RUL to obtain input samples and output labels. In the model training stage, training samples are fed into a CNN-LSTM model for time-series feature extraction and degradation-correlation modeling; the predicted RUL of the input is obtained by forward propagation, the error between the true and predicted values is computed, the model parameters are updated by backpropagating the loss, and the process is repeated until the prediction loss falls within a certain range and stabilizes. Dropout and early stopping are introduced to reduce the negative effect of overfitting on model prediction performance, improving the accuracy of RUL prediction and providing a better solution for modeling the degradation process in RUL prediction.
Description
Technical Field
The invention relates to the field of residual life prediction, in particular to residual life prediction applications based on a convolutional neural network and a long short-term memory network.
Prior Art
Data-driven remaining useful life (Remaining Useful Life, RUL) prediction methods have become the most widely used class of prediction methods owing to their powerful data-processing capability. However, existing research mainly performs deep feature extraction and RUL prediction on the sensor data of monitored equipment, and gives little consideration to prediction under complex operating-condition environments. Given the many noise factors present in real industrial environments, accounting for the influence of operating conditions on RUL prediction results has important application value.
A literature survey shows that various neural-network models perform excellently in modeling the degradation process and the RUL of complex systems, exhibiting powerful nonlinear approximation capability. However, most algorithms only learn the mapping between data and RUL, without considering uncertainty in the modeling process. Sbarufatti et al., in Sequential Monte-Carlo sampling based on a committee of artificial neural networks for posterior state estimation and residual lifetime prediction, combine a feed-forward neural network with a Monte Carlo method to obtain the RUL probability distribution of a part affected by fatigue cracks, enabling real-time detection of the part's damage condition. Yang et al., in Remaining Useful Life Prediction Based on a Double-Convolutional Neural Network Architecture, propose a model combining two convolutional neural networks (Convolutional Neural Network, CNN) for RUL prediction: the first convolutional network identifies the initial failure point of each component, and the second establishes a reliable mapping between intermediate variables and RUL. Deep neural networks can also be used to model health indexes (HI): Chen et al., in An integrated deep learning-based approach for automobile maintenance prediction with GIS data, introduce a Cox proportional hazards model to construct the HI and model it with several proposed long short-term memory (Long Short-Term Memory, LSTM) deep network structures, showing that the LSTM model excels in prediction accuracy. The key to building an RUL prediction model based on deep neural networks is to accurately construct a mapping function from the input monitoring data to the target health-state index.
In addition, a neural network integrates data feature processing and modeling analysis into a single network structure, realizing end-to-end RUL prediction; this guarantees prediction accuracy while greatly simplifying the traditional RUL prediction workflow.
At present, deep-learning-based life prediction methods mainly perform deep feature extraction and RUL prediction on equipment monitoring data, while prediction under complex operating-condition environments remains little studied. Given the variability and randomness of the actual operating environment of equipment, accounting for the influence of operating conditions on life prediction results is both necessary and valuable. A deep-learning-based RUL method is therefore needed to solve RUL prediction under complex operating conditions.
Disclosure of Invention
Aiming at the above problems, the invention provides an RUL prediction method based on a convolutional neural network and a long short-term memory network. First, in the data preprocessing stage, a K-means method identifies the operating condition of the collected monitoring data to reduce the influence of the equipment's environmental conditions on model performance, and a sliding window generates the three-dimensional sample format that the LSTM handles well. Then a CNN extracts deep features from the condition-analyzed data. Finally, an LSTM fits the extracted features to establish a time-series degradation model, which is extrapolated to complete the RUL prediction task.
The overall data flow framework of the invention is shown in fig. 1, and is mainly divided into two parts of data preprocessing and training models.
(1) The preprocessing stage identifies the operating-condition information of the run-to-failure equipment monitoring data, normalizes and standardizes the data by operating-condition category, and applies sliding-window processing to the preprocessed monitoring data and the corresponding RUL to obtain input samples and output labels.
(2) In the model training stage, training samples are fed into the CNN-LSTM model for time-series feature extraction and degradation-correlation modeling. The predicted RUL of the input is obtained by forward propagation, the error between the true and predicted values is computed, the model parameters are updated by backpropagating the loss function, and the process is repeated until the prediction loss falls within a certain range and stabilizes.
Step one: selecting a raw data set
Several sensors are selected to acquire data according to the actual condition of the maintained equipment, yielding the raw data set.
Step two: dataset preprocessing
(1) Data condition information identification
Since the same equipment works differently under different configurations, the operating-condition information of the equipment in its different running states must be identified. Several data points are randomly selected from the raw data set as initial cluster centroids; the K-means algorithm is then run to learn the data distribution of the training set, the operating conditions of the training and test data are classified, and a new label characterizing the condition information is appended to the data.
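The condition-identification step above can be sketched as a plain-NumPy K-means. The patent selects random initial centroids; this sketch uses a deterministic farthest-point initialization so it is reproducible — the function name and that initialization choice are illustrative, not from the patent.

```python
import numpy as np

def kmeans_condition_labels(ops, k, n_iter=100):
    """Cluster operating-setting vectors `ops` (shape [N, d]) into k
    operating conditions and return one integer condition label per row."""
    # deterministic farthest-point init (patent uses random points)
    centroids = [ops[0].astype(float)]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(ops - c, axis=1) for c in centroids], axis=0)
        centroids.append(ops[d.argmax()].astype(float))
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # assign every point to its nearest centroid, then recompute means
        dist = np.linalg.norm(ops[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([ops[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```

The returned labels can then be appended to the monitoring data as the extra operating-condition column described above.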
(2) Data normalization and standardization
The running states of equipment working under different operating conditions necessarily differ, so the data must be normalized and standardized using the condition labels obtained in (1).
Equation (1) performs the min-max normalization of the data:

x′_d = (x_d − x_{d,min}) / (x_{d,max} − x_{d,min})  (1)

where x_d is the raw value of the d-th sensor, x_{d,max} and x_{d,min} are its maximum and minimum over the whole training set, and x′_d is the normalized value.

Equation (2) standardizes the data while retaining its statistical distribution information:

x″_d = (x_d − u_d) / σ_d  (2)

where u_d and σ_d denote the mean and standard deviation of the d-th sensor, respectively.
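A minimal sketch of equations (1) and (2) applied per condition label, assuming (as described above) that the statistics are computed within each operating-condition group of the training set; the function names are illustrative.

```python
import numpy as np

def normalize_by_condition(train, labels, eps=1e-8):
    """Min-max normalize each sensor column within each condition label,
    per equation (1); also returns per-condition stats for reuse on test data."""
    out = np.empty_like(train, dtype=float)
    stats = {}
    for c in np.unique(labels):
        m = labels == c
        lo, hi = train[m].min(axis=0), train[m].max(axis=0)
        out[m] = (train[m] - lo) / (hi - lo + eps)   # equation (1)
        stats[c] = (lo, hi)
    return out, stats

def standardize_by_condition(train, labels, eps=1e-8):
    """Z-score each sensor column within each condition label, per equation (2)."""
    out = np.empty_like(train, dtype=float)
    for c in np.unique(labels):
        m = labels == c
        out[m] = (train[m] - train[m].mean(axis=0)) / (train[m].std(axis=0) + eps)
    return out
```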
(3) Time series sample generation
To convert the raw two-dimensional signal into the three-dimensional matrix of the LSTM input format, sliding-window processing is required; the principle of generating multivariate time-series samples with a sliding time window is shown in fig. 2.
Equation (3) describes the sliding-window processing. The input X^i ∈ R^{T×n} denotes all monitoring data collected from the i-th device, where T is the number of time points and n is the number of sensors. N_tw denotes the sliding-window size, and the total information obtained by the window at the k-th time point is

X^i_k = [x^i_{k−N_tw+1}, …, x^i_k] ∈ R^{N_tw×n}  (3)

The window slides from left to right with step 1 until the last time point T.
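The windowing of equation (3) can be sketched as follows; pairing each window with the RUL at its right edge follows the sample/label construction described above, and the function name is illustrative.

```python
import numpy as np

def sliding_windows(X, rul, n_tw):
    """Slice a [T, n] monitoring matrix into overlapping [n_tw, n] windows
    (step 1, left to right), per equation (3); each window is paired with
    the RUL at its rightmost time point as the training label."""
    T = X.shape[0]
    samples = np.stack([X[k - n_tw:k] for k in range(n_tw, T + 1)])
    labels = rul[n_tw - 1:]
    return samples, labels   # samples: [T - n_tw + 1, n_tw, n]
```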
(4) RUL tag generation
Considering the actual operation of the devices, each device has an initial period of normal operation during which the RUL is treated as unchanged. According to the piecewise linear model shown in FIG. 3, a threshold representing the healthy RUL of the device is set: while the true RUL is above the threshold the label is held constant at the threshold, and once it falls below the threshold the device is considered to degrade linearly down to failure.
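The piecewise-linear labeling above reduces to a clipped countdown for one run-to-failure trajectory; a minimal sketch (function name illustrative):

```python
import numpy as np

def piecewise_rul(total_life, threshold):
    """Piecewise-linear RUL labels for one trajectory of length `total_life`:
    held at `threshold` during early healthy operation, then decreasing
    linearly to 0 at failure (Fig. 3)."""
    linear = np.arange(total_life - 1, -1, -1)   # T-1, T-2, ..., 0
    return np.minimum(linear, threshold)
```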
Step three: construction of CNN-LSTM network model
The network structure of the CNN-LSTM model is shown in fig. 4. An input sample is a two-dimensional matrix, with one dimension for the number of features and the other for the time-sequence length; weight-shared convolution kernels slide along the time dimension of the sample matrix with step 1 to extract features, realizing a 1-D convolution. To keep the input and output the same size so that a feature-vector representation is obtained at each time point, the convolution input is zero-padded in the time dimension; each convolution layer is followed by an activation layer, and no pooling downsampling is added, to avoid information loss. Feature vectors that fully describe the degradation information are obtained through feature screening by 3 CNN layers; a 1-layer LSTM network then models the temporal correlation of the degrading system's health index, and a multilayer perceptron (Multilayer Perceptron, MLP) fits the health-state representation vectors extracted by the deep network. To reduce the possibility of overfitting during training, a Dropout method is introduced: as shown in fig. 5, some neurons are randomly deactivated with a certain probability during training so that they do not participate in network training, enhancing the model's generalization capability.
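The architecture above — 3 same-padded 1-D convolution layers without pooling, one LSTM layer, dropout, and an MLP head — can be sketched in PyTorch. Layer widths, kernel size, activation choice, and dropout rate below are illustrative assumptions, not values fixed by the patent (Table 2 gives the selected settings).

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    """Sketch of the described CNN-LSTM: 3 same-padded 1-D conv layers
    (no pooling), a 1-layer LSTM, dropout, and an MLP head mapping the
    last hidden state to a scalar RUL."""
    def __init__(self, n_sensors, hidden=64, channels=32, p_drop=0.2):
        super().__init__()
        layers, c_in = [], n_sensors
        for _ in range(3):
            # kernel 3 with padding 1 keeps the sequence length unchanged
            layers += [nn.Conv1d(c_in, channels, kernel_size=3, padding=1),
                       nn.Tanh()]
            c_in = channels
        self.cnn = nn.Sequential(*layers)
        self.lstm = nn.LSTM(channels, hidden, batch_first=True)
        self.head = nn.Sequential(nn.Dropout(p_drop), nn.Linear(hidden, 1))

    def forward(self, x):          # x: [batch, time, n_sensors]
        z = self.cnn(x.transpose(1, 2)).transpose(1, 2)  # conv over time axis
        out, _ = self.lstm(z)
        return self.head(out[:, -1]).squeeze(-1)         # one RUL per sample
```

A forward pass on a batch of sliding-window samples, e.g. `CNNLSTM(n_sensors=14)(torch.randn(4, 30, 14))`, yields one RUL estimate per sample.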
Step four: model training
(1) Parameter update
During parameter updating, a mini-batch gradient descent (MBGD) algorithm is adopted: it exploits the efficient matrix operations of deep learning while avoiding the fluctuation that updating parameters with single-sample gradients may cause. The gradient of each layer is first computed by the chain rule of the backpropagation (BP) algorithm, and the weight parameters w^(l) of layer l are then updated by MBGD:

w^(l) ← w^(l) − η δ^(l)  (4)

where δ^(l) denotes the gradient of layer l and η is the learning rate.
When updating convolution-layer parameters, the convolution kernel must be zero-padded by one ring and rotated by 180 degrees to propagate the gradient error, from which the updated weight matrix W^(l) is obtained. The specific steps are given by equations (6)–(10), of which:

α^(l) = f_l(net^(l))  (6)

net^(l+1) = conv(W^(l+1), α^(l))  (7)

δ^(l) = δ^(l+1) ∗ rot180(W^(l+1)) ⊙ f_l′(net^(l))  (8)

where f_l(·) denotes the activation function after the l-th convolution layer, W^(l+1) is the weight matrix of the convolution layer, α^(l) is the output of layer l, conv(·) denotes the convolution operation, net^(l) is the feature map obtained by the convolution operation, and rot180(·) denotes a 180-degree rotation.
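A minimal single-channel 1-D sketch of equation (8), assuming the 'same'-padded convolutions described in step three; `np.convolve` flips its second argument internally, which realizes the rot180(W) term, and the function name is illustrative.

```python
import numpy as np

def backprop_delta(delta_next, W_next, f_prime_net):
    """Equation (8), single-channel 1-D case: propagate delta^(l+1) back
    through a same-padded convolution by convolving with the 180-degree
    rotated kernel, then gate by the activation derivative f_l'(net^(l))."""
    # np.convolve reverses W_next internally = rot180 for a 1-D kernel;
    # mode='same' matches the zero-padded, length-preserving forward conv
    return np.convolve(delta_next, W_next, mode='same') * f_prime_net
```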
The early-stopping method is adopted to improve training efficiency. As shown in fig. 6, when after several epochs the loss on the training set keeps decreasing while the loss on the validation set begins to rise, the model can be considered to be overfitting; network training is then stopped and the model weight parameters W^(l) are saved, completing the establishment of the prediction model.
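The stopping rule above can be sketched as a patience check on the validation-loss history; the patience value and function name are illustrative assumptions.

```python
def early_stopping(val_losses, patience=5):
    """Return the epoch whose weights should be kept: the best epoch, once
    the validation loss has failed to improve for `patience` consecutive
    epochs (Fig. 6); return -1 while training should continue."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch   # save weights W at this epoch
        elif epoch - best_epoch >= patience:
            return best_epoch                # stop and restore saved weights
    return -1
```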
(2) Model parameter optimization
The Adam adaptive optimization algorithm is adopted to optimize the model parameters: the first moment controls the update direction, and the second moment controls the learning rate. Compared with other optimization algorithms, Adam is insensitive to gradient scale and is suitable for optimizing deep models with sparse parameters or high complexity.
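One Adam update step, written out to show the two moments described above; default hyperparameters follow the common Adam settings, which are an assumption here.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: the first moment m steers the update direction,
    the second moment v adapts the per-parameter step size, so the step
    magnitude is roughly `lr` regardless of the raw gradient scale."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)              # bias correction, t >= 1
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```

Note the scale insensitivity: a gradient of 10 and a gradient of 1000 both produce a first step of about `lr`.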
Step five: model evaluation
To observe the optimization of the model's prediction performance more directly, a model loss function is constructed based on the root mean square error (Root Mean Square Error, RMSE):

RMSE = sqrt( (1/n) Σ_{i=1}^{n} (RUL_pred_i − RUL_true_i)² )  (11)

where n is the number of test units, and RUL_pred_i and RUL_true_i denote the predicted and true RUL of the i-th test unit, respectively.
The effectiveness of the model is evaluated using the Score function, as shown in equation (12):

Score = Σ_{i=1}^{n} s_i, with s_i = e^{−d_i/13} − 1 for d_i < 0 and s_i = e^{d_i/10} − 1 for d_i ≥ 0  (12)

where Score is the total prediction scoring function, n is the number of test units, and d_i = RUL_pred_i − RUL_true_i is the prediction error of the i-th unit; the asymmetric penalty punishes late predictions (d_i > 0) more heavily than early ones.
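Both evaluation metrics can be sketched directly from equations (11) and (12); the 13/10 denominators follow the standard asymmetric scoring function used with the C-MAPSS data sets.

```python
import numpy as np

def rmse(rul_pred, rul_true):
    """Root mean square error over the n test units, equation (11)."""
    return float(np.sqrt(np.mean((rul_pred - rul_true) ** 2)))

def score(rul_pred, rul_true):
    """Asymmetric scoring function, equation (12): late predictions
    (d_i > 0, i.e. overestimated RUL) are penalised more heavily."""
    d = rul_pred - rul_true
    s = np.where(d < 0, np.exp(-d / 13.0) - 1.0, np.exp(d / 10.0) - 1.0)
    return float(np.sum(s))
```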
Step six: RUL prediction
The test samples are input into the prediction model, and the RUL prediction results of the test samples at all times are output through forward calculation.
The invention provides a CNN-LSTM-based RUL prediction method: it analyzes the characteristics of data collected under complex operating conditions and, combining the structure and characteristics of the network, proposes a preprocessing method for multi-condition data. A specific CNN-LSTM network model is constructed, and concrete steps for equipment life prediction are given. Because of the overfitting risk in neural-network training, Dropout and early-stopping methods are introduced to reduce the negative influence of overfitting on model prediction performance, improving the accuracy of RUL prediction and providing a better solution for modeling the degradation process in RUL prediction.
Drawings
FIG. 1 is a flow chart of a RUL prediction method based on CNN-LSTM
FIG. 2 is a schematic diagram of a sliding window generation sequence sample
FIG. 3 is a piecewise linearized RUL tag model
FIG. 4 is a diagram of the network structure of CNN-LSTM
FIG. 5 is a dropout schematic diagram
FIG. 6 is a schematic diagram of an early stop method
FIG. 7 is a schematic view of raw degradation trends of 21 sensors of FD002 data set
FIG. 8 is a graph of the result of FD002 engine data clustering
FIG. 9 is a schematic view showing the degradation trend of 21 sensors in the FD002 data set after normalization
FIG. 10 is a diagram showing experimental results of parameters of network structure
FIG. 11 is a graph of experimental results of input sample parameters
FIG. 12 is a FD001 dataset 100 test set engine prediction results
FIG. 13 is a graph of C-MAPSS test subset single engine predictions
Detailed Description of Embodiments
The residual life prediction method based on the convolutional neural network and the long short-term memory network is verified on a concrete case; the process comprises the following steps:
step 1: selecting a raw data set
The effects of the invention are demonstrated and verified on the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) data sets.
The C-MAPSS data set comprises 4 subsets, with details shown in Table 1. FD001 and FD003 contain engine degradation data under a single operating condition, while FD002 and FD004 contain engine degradation data under six different operating conditions; the proposed CNN-LSTM life prediction method is therefore verified on data sets with different operating conditions.
Table 1. C-MAPSS engine data set profile
Each data subset contains 21 sensor channels and 3 operational-setting channels; according to the operational settings, the running condition of an engine can be further divided into 6 categories, each of which affects the corresponding sensor readings. FIG. 7 shows the raw degradation data of the 21 sensors of one engine in the FD002 data set. Because the operating conditions affect engine performance, the raw degradation data of the 21 sensors in FD002 fluctuate markedly across the whole degradation process, so further condition-wise analysis is required to screen the sensors and extract effective degradation information.
Step 2: data preprocessing
(1) Data condition information identification
The K-means algorithm is first used to identify and label the operating conditions of the FD002 and FD004 data sets, so that the monitoring data can be standardized according to the current running state. Fig. 8 shows the clustering result of the FD002 operational settings; the intervals between the settings of different conditions are large, so a good clustering effect is achieved.
(2) Data normalization and normalization process
The data are then normalized; the degradation traces of the 21 sensors after normalization are shown in fig. 9. Individual sensors such as T24, T30 and T50 show a clear trend after normalization, but sensors such as T2, P20 and P15 still yield no effective degradation information.
(3) Time series sample generation
The sensors retained after preliminary screening are processed with the sliding-window technique to generate time-series samples.
(4) RUL tag generation
The threshold for the RUL labels is set to 120 according to the piecewise linear model.
Step 3: constructing and training CNN-LSTM network model
The main parameters affecting prediction performance are: the number of CNN convolution layers, the number of hidden units of the LSTM network, the sequence length of the samples, and the training batch size. These parameters are tuned separately by a controlled single-variable method; in addition, to eliminate the influence of random network-parameter initialization on the experimental results, 5 repeated experiments are performed for each parameter setting and the evaluation results are averaged.
Fig. 10 shows the influence of the number of convolution layers and the LSTM hidden-unit dimension on model prediction performance. As the number of convolution layers increases, the feature-extraction capability of the model improves and the Score and RMSE values gradually decrease; when the number of convolution layers reaches 3, prediction performance is comparatively optimal. Increasing the number of layers further raises model complexity and the risk of overfitting, and prediction performance tends to decline. With the number of convolution layers fixed at 3, the results show a similar pattern as the LSTM hidden-unit dimension increases, and the prediction error reaches its lowest point when the dimension is 64.
Besides the network structure, the size of the training samples is another important parameter affecting the prediction performance of the proposed method, particularly the convergence speed of the model. As seen in fig. 11, as the sample sequence length increases, the network needs more time to analyze the input information, so the model converges more slowly. At the same time, a longer sequence carries more information per sample, so the prediction error gradually decreases and the prediction accuracy improves. In particular, when the sequence length increases from 20 to 25, the prediction accuracy improves markedly, from which it can be inferred that longer sample sequences contain enough historical information to help predict the degradation trend. The training batch size is also an important factor affecting prediction performance: as seen in the right panel of fig. 11, as the batch size increases, the same data are processed faster, the gradient-descent direction stabilizes, and the model converges faster.
In summary, the selected network configuration parameter settings are shown in Table 2.
Table 2 network parameter settings
Step 4: RUL prediction results and analysis
In the proposed CNN-LSTM life prediction method, a degradation model of the equipment is first obtained through supervised network training in the offline stage; then, in the online RUL prediction stage, the processed data are fed into the stored trained model and the RUL is obtained directly. Fig. 12 shows the RUL predictions at the last monitoring point of the 100 engines in the FD001 test set. The predicted RUL essentially coincides with the true RUL labels, and the prediction accuracy of the neural-network-based RUL model is greatly improved compared with the particle-filtering-based method.
One engine is randomly selected from each of the four test sets for RUL prediction. Fig. 13 shows the predicted RUL of engines No. 76, No. 190, No. 99 and No. 126 over time. Throughout the degradation process of the engines under the four different operating conditions, the RUL predicted by the proposed method closely follows the trend of the corresponding true RUL labels, and the error between the predicted and true values is small. From the figures the following can be concluded:
(1) The proposed algorithm gives accurate life estimates for the four pieces of equipment under different working conditions, demonstrating strong model generalization, and the estimated RUL converges toward the true value as input information accumulates. Early in monitoring the equipment can be considered to be in its normal operation phase, so the predicted value fluctuates around a constant. As monitoring time increases the equipment enters the degradation stage and the failure trend becomes more pronounced; the predicted RUL then gradually approaches the true label value and the prediction accuracy of the model improves. The method provides a high-accuracy health-state assessment in the period shortly before failure, which is of important guiding significance for preparing predictive maintenance plans and makes its prediction performance very valuable in practical industrial applications.
(2) The complexity of the operating conditions has a large impact on model prediction performance. The figure gives the RUL predictions for randomly selected engines under different working conditions; comparing the fluctuation of the RUL prediction curves, engines No. 76 and No. 99 (FD001 and FD003, single working condition) fluctuate less than engines No. 190 and No. 126 (FD002 and FD004, complex working conditions). In particular, in the stage close to failure, the predicted RUL under the simple working conditions gradually converges and essentially coincides with the true label value, whereas the predicted RUL under the complex working conditions still fluctuates about the true label. The proposed CNN-LSTM algorithm performs cluster analysis of the working conditions before the data are fed into the deep network model, which helps reduce the influence of working-condition noise on the prediction model and improves the prediction accuracy of the method in practical scenarios.
In addition, the model prediction performance was measured by RMSE and Score; the results are shown in Table 3.
Table 3 RUL prediction performance based on CNN-LSTM
Combining the RUL prediction results with the evaluations given by the two model-performance metrics, the CNN-LSTM-based RUL prediction method accurately predicts the RUL, is algorithmically robust, and delivers stable and effective prediction performance under different working-condition environments.
Claims (1)
1. The residual life prediction method based on the convolutional neural network and the long-term and short-term memory network is characterized by comprising the following steps:
step one: selecting a raw data set
Selecting a plurality of sensors to acquire data according to the actual condition of maintenance equipment, so as to obtain an original data set;
step two: dataset preprocessing
(1) Data condition information identification
Randomly selecting a plurality of data points from the original data set as initial clustering centroids, calling the K-means algorithm to learn the data distribution of the training set, classifying the working conditions of the training-set and test-set data, and adding to the data a new label representing the working-condition information;
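The condition-identification step above can be sketched with a minimal plain-NumPy k-means (the two-cluster toy data and the cluster count are illustrative assumptions, not values fixed by this method):

```python
import numpy as np

def kmeans(data, k, n_iter=100, seed=0):
    """Minimal k-means: returns (centroids, labels) for data of shape (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    # Randomly select k data points as the initial clustering centroids.
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every sample to its nearest centroid.
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned samples.
        new = np.array([data[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels

# Toy "operating condition" settings: two well-separated regimes.
ops = np.vstack([np.random.default_rng(1).normal(0.0, 0.1, (50, 3)),
                 np.random.default_rng(2).normal(5.0, 0.1, (50, 3))])
_, cond_label = kmeans(ops, k=2)   # cond_label plays the role of the working-condition tag
```

In practice a library implementation (e.g. scikit-learn's KMeans) would normally replace this hand-rolled loop.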
(2) Data normalization and normalization process
The running states of equipment working under different working-condition environments necessarily differ, so the data must be normalized and standardized in combination with the labels obtained in (1);
Equation (1) performs the min-max normalization of the data:

x̂_d = (x_d − x_d-min) / (x_d-max − x_d-min)  (1)

where x_d represents the original value of the d-th sensor, x_d-max and x_d-min represent the maximum and minimum values of that sensor over all training sets, and x̂_d represents the normalized value;
Equation (2) completes the standardization of the data and preserves its statistical distribution information:

x̄_d = (x̂_d − u_d) / σ_d  (2)

where u_d and σ_d represent the mean and standard deviation of the d-th sensor, respectively;
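The two scaling operations of equations (1) and (2) can be sketched in NumPy as follows (a minimal sketch; grouping by the working-condition label from step (1) is omitted for brevity, and the toy matrix is illustrative):

```python
import numpy as np

def min_max_normalize(x, x_min, x_max):
    # Equation (1): scale each sensor channel into [0, 1] using training-set extrema.
    return (x - x_min) / (x_max - x_min)

def standardize(x, mean, std):
    # Equation (2): zero-mean, unit-variance scaling that keeps the distribution shape.
    return (x - mean) / std

train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])  # rows: time, cols: sensors
x_min, x_max = train.min(axis=0), train.max(axis=0)
norm = min_max_normalize(train, x_min, x_max)
stdz = standardize(norm, norm.mean(axis=0), norm.std(axis=0))
```

Note that the extrema, mean and standard deviation are computed on the training set only and then reused unchanged on the test set.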
(3) Time series sample generation
In order to convert the original two-dimensional signal into a three-dimensional matrix in LSTM input data format, sliding window processing is required;
Equation (3) represents the sliding-window processing; the input X^i denotes all the monitoring data collected by the i-th device, X^i ∈ R^(T×n), where n represents the number of sensors and T the number of monitoring time points; N_tw indicates the size of the sliding window, so the total information obtained by the window at the k-th time point is X^i_(k−N_tw+1 : k) ∈ R^(N_tw×n); the window slides from left to right with step 1 until the last time point T;
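The sliding-window processing of equation (3) can be sketched as follows (a minimal NumPy sketch; the helper name and toy dimensions are illustrative):

```python
import numpy as np

def sliding_windows(x, n_tw):
    """Turn a (T, n) monitoring matrix into (T - n_tw + 1, n_tw, n) samples,
    sliding the window from left to right with step 1."""
    T, n = x.shape
    return np.stack([x[k:k + n_tw] for k in range(T - n_tw + 1)])

x = np.arange(24, dtype=float).reshape(8, 3)   # T = 8 time steps, n = 3 sensors
samples = sliding_windows(x, n_tw=5)            # 4 overlapping training samples
```

Each three-dimensional sample batch produced this way matches the LSTM input format (samples, time steps, features).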
(4) Actual RUL tag generation
Each piece of equipment has an initial normal operating period during which the RUL can be considered unchanged; therefore, a threshold representing the healthy RUL of the device is set according to a piecewise linear model: while the device operates normally the RUL is held constant, and once the RUL of the device falls below the threshold the device is considered to start degrading linearly;
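The piecewise linear RUL labeling can be sketched as below; the threshold of 125 cycles is a value commonly used with the C-MAPSS data and is an assumption here, not a value fixed by the method:

```python
import numpy as np

def piecewise_rul(total_life, max_rul=125):
    """Piecewise linear RUL label: held constant at max_rul while the device is
    healthy, then decreasing linearly by 1 per cycle down to 0 at failure."""
    linear = np.arange(total_life - 1, -1, -1)   # true remaining cycles at each step
    return np.minimum(linear, max_rul)           # clip the healthy early phase

labels = piecewise_rul(200)   # a unit that fails after 200 cycles
```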
step three: construction of CNN-LSTM network model
A two-dimensional matrix sample is input, one dimension being the number of features and the other the time-series length; a weight-shared convolution kernel slides over the time dimension of the sample matrix with step 1, realizing a 1-dimensional convolution operation; in addition, to keep the input and output the same size and obtain a feature-vector representation of every time point, the convolution input is zero-padded in the time dimension, and each convolutional layer is followed by an activation layer; feature vectors are obtained through feature screening by 3 layers of CNN, the health index of the degrading system is modeled by a 1-layer LSTM network, and an MLP is selected to fit the extracted health-state representation vectors; to reduce the likelihood of overfitting during training, the Dropout method is introduced: during training some neurons are randomly deactivated with a certain probability so that they do not participate in network training, which enhances the generalization capability of the model;
completing establishment of a CNN-LSTM network model;
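The zero-padded, stride-1, 1-dimensional convolution described in step three can be sketched in NumPy; this is an illustrative sketch (ReLU is assumed as the activation), not the patented implementation:

```python
import numpy as np

def conv1d_same(x, kernel):
    """1-D convolution over the time axis with stride 1 and zero 'same' padding,
    so every time point keeps a feature-vector representation.
    x: (T, n_in) time series; kernel: (k, n_in, n_out) weight-shared filter."""
    k, n_in, n_out = kernel.shape
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))   # fill the time dimension with zeros
    T = x.shape[0]
    out = np.empty((T, n_out))
    for t in range(T):
        # Slide the shared kernel along time; one dot product per output channel.
        out[t] = np.tensordot(xp[t:t + k], kernel, axes=([0, 1], [0, 1]))
    return np.maximum(out, 0.0)            # activation layer (ReLU assumed)

x = np.random.default_rng(0).normal(size=(30, 14))   # 30 time steps, 14 features
w = np.random.default_rng(1).normal(size=(3, 14, 8))
h = conv1d_same(x, w)                                 # same time length, 8 channels
```

In practice the three stacked convolutions, the LSTM layer and the MLP head would be built with a deep-learning framework; the sketch only illustrates why the output keeps one feature vector per time point.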
step four: model training
(1) Parameter update
A mini-batch gradient descent (MBGD) algorithm is adopted in the parameter-updating process; first the gradient of each layer is calculated using the chain rule of the BP algorithm, and then the weight parameter w^(l) of that layer is updated by the MBGD algorithm:

w^(l) ← w^(l) − η · δ^(l)

where δ^(l) represents the gradient of the l-th layer, and η is the learning rate;
When the parameters of a convolutional layer are updated, the convolution kernel is padded with a ring of zeros and rotated by 180 degrees to obtain the gradient error, and the updated weight parameter matrix W^(l) is obtained from that gradient error; the specific steps are shown in equations (6)-(10):
α^(l) = f_l(net^(l))  (6)

net^(l+1) = conv(W^(l+1), α^(l))  (7)

δ^(l) = δ^(l+1) ∗ rot180(W^(l+1)) ⊙ f_l′(net^(l))  (8)
where f_l(·) represents the activation function used after the l-th convolutional layer, W^(l+1) represents the weight matrix of the (l+1)-th convolutional layer, α^(l) represents the output of layer l, conv(·) represents the convolution operation, net^(l) represents the feature map obtained through the convolution operation, and rot180(·) represents rotation by 180 degrees;
The early-stopping method is adopted to improve the training efficiency of the model: when, after multiple epochs of training, the loss on the training set keeps decreasing but the loss on the validation set begins to rise, the model can be considered to be overfitting; at that point network training is stopped and the model weight parameters W^(l) are saved, completing the establishment of the prediction model;
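The early-stopping rule can be sketched as a simple patience loop (the patience value and the loss sequence are illustrative assumptions):

```python
def train_with_early_stopping(epochs, val_losses, patience=5):
    """Stop when the validation loss has not improved for `patience` epochs;
    return the index of the epoch whose weights should be kept."""
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch in range(epochs):
        loss = val_losses[epoch]   # stands in for one epoch of training + validation
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0   # save model weights here
        else:
            wait += 1
            if wait >= patience:
                break              # overfitting suspected: stop network training
    return best_epoch

# Validation loss falls, then rises: early stopping keeps the epoch-4 weights.
losses = [1.0, 0.8, 0.6, 0.5, 0.45, 0.47, 0.5, 0.55, 0.6, 0.7, 0.8]
kept = train_with_early_stopping(len(losses), losses)
```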
(2) Model parameter optimization
The model parameters are optimized using the Adam adaptive optimization algorithm, which controls the update direction of the model with the first moment and the learning rate with the second moment; compared with other optimization algorithms it is insensitive to gradient scale, making it suitable for optimizing deep models with sparse parameters or high complexity;
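A single Adam update can be sketched in NumPy as below (the standard default hyperparameters are assumed); the usage example minimizes the toy objective f(w) = w²:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: the first moment m steers the update direction,
    the second moment v adapts the per-parameter step size."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)            # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)            # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Minimize f(w) = w^2 starting from w = 1; the gradient is 2w.
w, m, v = np.array(1.0), 0.0, 0.0
for t in range(1, 2001):
    w, m, v = adam_step(w, 2 * w, m, v, t, lr=0.01)
```

Because the step size is normalized by sqrt(v_hat), the update magnitude stays near the learning rate regardless of the raw gradient scale, which is the insensitivity described above.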
step five: model evaluation
To observe the optimization of the model's prediction performance more directly, a model loss function is constructed based on the root mean square error (Root Mean Square Error, RMSE):

RMSE = sqrt( (1/n) · Σ_(i=1)^n (RUL_pred_i − RUL_true_i)² )  (11)

where n is the number of test units, and RUL_pred_i and RUL_true_i represent the predicted RUL and the true RUL of the i-th test unit, respectively;
evaluating the model for effectiveness using a Score function, as shown in (12)
Where Score is the total predictive scoring function, n represents the number of test units, d i Representing the prediction error of the i-th unit;
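The two evaluation metrics can be sketched in NumPy; the time constants 13 and 10 in the Score function are the values commonly used with the C-MAPSS benchmark and are assumed here:

```python
import numpy as np

def rmse(rul_pred, rul_true):
    # Root mean square error over the n test units.
    return float(np.sqrt(np.mean((rul_pred - rul_true) ** 2)))

def score(rul_pred, rul_true):
    # Asymmetric scoring: late predictions (d_i >= 0) are penalized
    # more heavily than early ones (d_i < 0).
    d = rul_pred - rul_true
    return float(np.sum(np.where(d < 0, np.exp(-d / 13.0), np.exp(d / 10.0)) - 1.0))

pred = np.array([48.0, 52.0, 30.0])   # illustrative predictions
true = np.array([50.0, 50.0, 30.0])   # illustrative true RUL labels
```

For the same absolute error, the overestimate (52 vs 50) contributes more to Score than the underestimate (48 vs 50), reflecting that predicting a failure too late is the costlier mistake.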
step six: RUL prediction
Inputting the test samples into the prediction model, and outputting the RUL prediction results of the test samples at all time points through forward computation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310339141.5A CN117010263A (en) | 2023-04-01 | 2023-04-01 | Residual life prediction method based on convolutional neural network and long-term and short-term memory network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117010263A true CN117010263A (en) | 2023-11-07 |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117545122A (en) * | 2023-12-07 | 2024-02-09 | 广东省科技基础条件平台中心 | LED lamp array control method, device, storage medium and equipment |
CN118194049A (en) * | 2024-05-17 | 2024-06-14 | 山东省盈鑫彩钢有限公司 | Method for predicting loss data of aluminum-zinc plated steel plate |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||