CN114880925A - Equipment life prediction method based on time convolution network and multi-layer self-attention - Google Patents


Info

Publication number
CN114880925A
Authority
CN
China
Prior art keywords
layer
attention
constructing
data
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210462882.8A
Other languages
Chinese (zh)
Inventor
尚志武
张宝仁
李万祥
高茂生
张洁
钱仕琪
刘虎
冯泽华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin Polytechnic University
Original Assignee
Tianjin Polytechnic University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin Polytechnic University filed Critical Tianjin Polytechnic University
Priority to CN202210462882.8A priority Critical patent/CN114880925A/en
Publication of CN114880925A publication Critical patent/CN114880925A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/20 Design optimisation, verification or simulation
    • G06F 30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F 2119/02 Reliability analysis or reliability optimisation; Failure analysis, e.g. worst case scenario performance, failure mode and effects analysis [FMEA]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F 2119/04 Ageing analysis or optimisation against ageing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an equipment life prediction method based on a time convolution network and multi-layer self-attention, which comprises the following steps: collecting various monitoring data of the equipment to be detected, and segmenting the data with a sliding window method to obtain original input samples; extracting statistical features such as the mean value and the trend coefficient from each sample, and meanwhile constructing a depth feature extraction module based on a Time Convolution Network (TCN) and Multi-Layer Self-Attention (MLSA) to extract depth features from the original samples; and finally, constructing a multi-source feature fusion module to fuse the statistical features with the depth features and predicting the remaining life through a regression layer. The method overcomes the problems that the receptive field of a convolutional neural network is limited and that gradient explosion and gradient vanishing easily occur, adaptively assigns weights to different information to strengthen the contribution of important information to life prediction, and makes full use of both the depth features and the statistical features, so that the remaining service life of equipment is accurately predicted and an important reference is provided for equipment maintenance personnel.

Description

Equipment life prediction method based on time convolution network and multilayer self-attention
Technical Field
The invention belongs to the field of equipment residual life prediction, and particularly relates to an equipment residual life prediction method based on a Time Convolution Network (TCN) and Multi-layer self-attention (MLSA).
Background
Modern equipment has an extremely complex structure and is composed of many components and electronic devices; the failure of any single component can cause a catastrophic failure of the whole system, which highlights the importance of system reliability. Implementing a flexible maintenance strategy can reduce maintenance costs and improve equipment reliability. Predicting the Remaining Useful Life (RUL) of equipment provides reaction time for maintenance personnel, allowing them to take measures that reduce system downtime and maintenance costs and avoid catastrophic failures.
With the development of computer science, artificial intelligence technology provides new solutions for life prediction; deep learning in particular has been successfully applied to the field of life prediction owing to its strong capability for automatic feature extraction. In current deep-learning-based RUL prediction methods, different sensor signals, or statistical features extracted from those signals, are generally used as the input of the model; the feature source is therefore single, the implied degradation information is incomplete, and the RUL prediction accuracy may be reduced. Moreover, during network construction these methods assume that the input data acquired from different sensors and at different times contribute equally to the output, whereas in practice the monitoring data from different sensors and different times contain different degrees of degradation information. If this difference is not recognized and the important degradation information is not highlighted, the predictive performance of the model will be affected by irrelevant or redundant information, resulting in poor RUL prediction accuracy and generalization ability.
Disclosure of Invention
The invention aims to solve the above problems and designs an equipment remaining life prediction method based on a time convolution network and multi-layer self-attention.
The purpose of the invention is realized by the following technical scheme, which comprises the following steps:
1) data acquisition: acquiring sensor detection data of equipment, and eliminating sensors irrelevant to degradation by calculating variance of data acquired by each sensor to establish an initial data set;
2) data preprocessing: normalizing the data of each sensor, and creating an original sample by sliding a window on a time dimension;
3) statistical feature extraction: calculating the mean value and the trend coefficient of each sensor in each original sample to obtain the statistical features;
4) depth feature extraction: constructing a depth feature extraction module consisting of a channel attention layer, a time convolution network (TCN) and a time attention layer, and extracting depth features from the original samples;
5) remaining life prediction: constructing a life prediction module, taking the statistical features and the depth features as its input, comparing the predicted remaining life with the life labels of the training set, updating the life prediction model parameters through an Adam optimizer, and completing model training.
Further, the construction steps of the depth feature extraction module in the step 4) are as follows:
step 4-1), constructing a channel attention layer, and adaptively distributing attention weights to different channels;
step 4-2), constructing a TCN model, wherein the input of the TCN model is the output of the channel attention layer; the TCN is formed by stacking residual blocks, each residual block comprises a dilated causal convolution layer, a Dropout layer and a batch normalization layer, and the input and the output of each residual block are connected by a residual connection (a sketch of such a residual block is given after this list);
step 4-3), constructing a time attention layer, wherein the input of the time attention layer is the output of TCN, and adaptively distributing attention weights to different time steps through the time attention layer;
and 4-4) constructing a flattening layer, wherein the input of the flattening layer is the output of the time attention layer, and the two-dimensional matrix sample is flattened to one dimension through the flattening layer.
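For illustration, a minimal sketch of one such residual block follows, assuming a PyTorch implementation; the kernel size, dilation factor, dropout rate and channel widths are placeholder choices and are not values prescribed by this application.

    import torch
    import torch.nn as nn

    class TemporalResidualBlock(nn.Module):
        """One TCN residual block: dilated causal convolution + batch normalization +
        Dropout, with a residual connection between the block input and output."""
        def __init__(self, in_ch, out_ch, kernel_size=3, dilation=1, dropout=0.2):
            super().__init__()
            self.pad = (kernel_size - 1) * dilation          # left padding keeps the convolution causal
            self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)
            self.bn = nn.BatchNorm1d(out_ch)                  # batch normalization layer
            self.drop = nn.Dropout(dropout)                   # Dropout layer
            self.relu = nn.ReLU()
            # 1x1 convolution so the residual connection matches the channel dimension
            self.downsample = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

        def forward(self, x):                                 # x: (batch, channels, time)
            y = nn.functional.pad(x, (self.pad, 0))           # pad on the left only (causal)
            y = self.drop(self.bn(self.relu(self.conv(y))))
            return self.relu(y + self.downsample(x))          # residual connection

A full TCN stacks several such blocks with increasing dilation (for example 1, 2, 4, ...), so that the receptive field grows with network depth.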
Further, the specific steps of the channel attention layer in the step 4-1) for carrying out weight distribution on different channels are as follows:
4-1-1) the original sample can be represented as $x = \{x_1, x_2, \ldots, x_{t_{\max}}\}$, where $x_t$ represents the sensor data at time t and $t_{\max}$ is the maximum time step; $x_t = \{x_{1,t}, x_{2,t}, \ldots, x_{k_{\max},t}\}$, where $x_{k,t}$ represents the value of the k-th sensor at time t and $k_{\max}$ is the total number of sensors;
4-1-2) first, the data of the different sensors at time t are scored, with the specific formula:

$s_t = \varphi(W x_t + b)$

where $\varphi(\cdot)$ is a scoring function, such as a Sigmoid or Linear function, $W$ and $b$ are respectively the weight matrix and the bias vector, and $s_t = \{s_{1,t}, s_{2,t}, \ldots, s_{k_{\max},t}\}$ contains the scores obtained by the different sensors at time t;
4-1-3) after the scores of the different sensors at time t are obtained, the scores $s_{k,t}$ are normalized by a Softmax function to obtain the corresponding attention weights:

$\alpha_{k,t} = \dfrac{\exp(s_{k,t})}{\sum_{j=1}^{k_{\max}} \exp(s_{j,t})}$
4-1-4) the weights assigned to the k-th sensor at all moments are averaged to obtain the weight $\alpha_k$ corresponding to the k-th sensor:

$\alpha_k = \dfrac{1}{t_{\max}} \sum_{t=1}^{t_{\max}} \alpha_{k,t}$
4-1-5) finally, the input data x are multiplied by the weights to obtain the processed data:

$\tilde{x}_{k,t} = \alpha_k \, x_{k,t}$

which is the output of the channel attention layer, where $\tilde{x} = \{\tilde{x}_{k,t}\}$, $k = 1, \ldots, k_{\max}$, $t = 1, \ldots, t_{\max}$.
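A minimal numerical sketch of steps 4-1-1) to 4-1-5) is given below, assuming NumPy; the choice of a linear map followed by a sigmoid as the scoring function is one of the options mentioned above and is only illustrative.

    import numpy as np

    def channel_attention(x, W, b):
        """x: (k_max, t_max) raw sample; W: (k_max, k_max) weight matrix; b: (k_max,) bias vector."""
        s = 1.0 / (1.0 + np.exp(-(W @ x + b[:, None])))              # scores s_{k,t} (step 4-1-2))
        alpha_kt = np.exp(s) / np.exp(s).sum(axis=0, keepdims=True)  # Softmax over sensors at each time t (4-1-3))
        alpha_k = alpha_kt.mean(axis=1, keepdims=True)               # average over all moments (4-1-4))
        return alpha_k * x                                           # re-weighted input, output of the layer (4-1-5))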
Further, the specific steps of the time attention layer in the step 4-3) for performing weight allocation on different time steps are as follows:
4-3-1) the output of the TCN is the input of the time attention layer, and the output of the TCN can be expressed as $h = \{h_1, h_2, \ldots, h_{k_{\max}}\}$, where the data corresponding to the k-th channel are represented as $h_k = \{h_{k,1}, h_{k,2}, \ldots, h_{k,t_{\max}}\}$;
4-3-2) first, the data of the different time steps are scored:

$s'_k = \varphi(W' h_k + b')$

where $s'_k = \{s'_{k,1}, s'_{k,2}, \ldots, s'_{k,t_{\max}}\}$ contains the scores obtained by the data of the k-th channel at the different time steps;
4-3-3) the scores $s'_{k,t}$ are then normalized into attention weights $\beta_{k,t}$ by a Softmax function, with the formula:

$\beta_{k,t} = \dfrac{\exp(s'_{k,t})}{\sum_{j=1}^{t_{\max}} \exp(s'_{k,j})}$
4-3-4) the weights assigned by all the sensors at time t are averaged to obtain the weight corresponding to the t-th time step:

$\beta_t = \dfrac{1}{k_{\max}} \sum_{k=1}^{k_{\max}} \beta_{k,t}$
4-3-5) the output of the time attention layer is:

$\tilde{h}_{k,t} = \beta_t \, h_{k,t}$

which is the depth feature representation of the sensor data, where $\tilde{h} = \{\tilde{h}_{k,t}\}$, $k = 1, \ldots, k_{\max}$, $t = 1, \ldots, t_{\max}$.
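The time attention layer mirrors the channel attention layer with the roles of sensors and time steps exchanged; a companion sketch under the same assumptions (NumPy, a linear-plus-sigmoid scoring function) follows.

    import numpy as np

    def temporal_attention(h, W, b):
        """h: (k_max, t_max) TCN output; W: (t_max, t_max) weight matrix; b: (t_max,) bias vector."""
        s = 1.0 / (1.0 + np.exp(-(h @ W + b[None, :])))              # scores s'_{k,t} (step 4-3-2))
        beta_kt = np.exp(s) / np.exp(s).sum(axis=1, keepdims=True)   # Softmax over time steps per channel (4-3-3))
        beta_t = beta_kt.mean(axis=0, keepdims=True)                 # average over all sensors (4-3-4))
        return beta_t * h                                            # re-weighted depth features (4-3-5))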
Further, the construction steps of the life prediction module in step 5) are as follows:
5-1) reducing the dimensions of the depth features and the statistical features by using the full-connection layer, and splicing the two reduced features;
5-2) constructing a feature attention layer, and adaptively distributing attention weight to the spliced features;
5-3) constructing a regression prediction layer, wherein the input of the regression prediction layer is the output of the characteristic attention layer, and the RUL is output through the regression prediction layer.
Further, the specific steps of the feature attention layer in the step 5-2) for performing weight assignment on the features of different sources are as follows:
5-2-1) the depth feature is denoted as $D = \{d_1, d_2, \ldots, d_m\}$ and the manual (statistical) feature is denoted as $H = \{h_1, h_2, \ldots, h_m\}$; the feature obtained by splicing the two is expressed as $F = \{D, H\} = \{d_1, d_2, \ldots, d_m, h_1, h_2, \ldots, h_m\} = \{f_1, f_2, \ldots, f_n\}$, where $n = 2m$;
5-2-2) in the feature attention layer, using a self-attention mechanism to perform weight distribution on features of different sources, wherein the calculation flow is as follows:
$s_n = \varphi(W_F f_n + b_F)$

$\gamma_n = \dfrac{\exp(s_n)}{\sum_{j=1}^{n} \exp(s_j)}$

where $\varphi(\cdot)$ is a scoring function, $s_n$ is the score corresponding to the feature $f_n$, and $\gamma_n$ is the attention weight corresponding to $f_n$;
5-2-3) the weights are multiplied by the features to obtain the final fused feature:

$\tilde{f}_n = \gamma_n \, f_n, \quad n = 1, 2, \ldots, 2m$

which is the fused multi-source feature representation of the original sample, where $\tilde{F} = \{\tilde{f}_1, \tilde{f}_2, \ldots, \tilde{f}_n\}$.
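The following is an illustrative sketch of the life prediction module of steps 5-1) to 5-3), assuming PyTorch; the reduced feature dimension m and the exact form of the feature attention layer are assumptions consistent with the description rather than values fixed by this application.

    import torch
    import torch.nn as nn

    class LifePredictionModule(nn.Module):
        def __init__(self, depth_dim, stat_dim, m=32):
            super().__init__()
            self.fc_depth = nn.Linear(depth_dim, m)    # step 5-1): reduce the depth features
            self.fc_stat = nn.Linear(stat_dim, m)      # step 5-1): reduce the statistical features
            self.score = nn.Linear(2 * m, 2 * m)       # step 5-2): scoring function of the feature attention layer
            self.regressor = nn.Linear(2 * m, 1)       # step 5-3): regression prediction layer

        def forward(self, depth_feat, stat_feat):
            f = torch.cat([self.fc_depth(depth_feat), self.fc_stat(stat_feat)], dim=-1)  # spliced feature F
            gamma = torch.softmax(self.score(f), dim=-1)   # attention weights gamma_n
            return self.regressor(gamma * f).squeeze(-1)   # predicted RUL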
Compared with the prior art, the invention has the following beneficial effects:
1. The method adaptively assigns attention weights to different channels, time steps and features through multi-layer self-attention, so that the differences in their contributions to RUL prediction can be taken into account; higher attention weights are assigned to important information, the contribution of that information to RUL prediction is further strengthened, and the accuracy of life prediction is therefore improved.
2. The invention obtains a multi-source fused feature representation of the original sample by fusing the depth features with the statistical features; the degradation information it contains is richer, which enhances the accuracy of the life prediction model.
Drawings
FIG. 1 is a flow chart of an equipment life prediction method of the present invention;
FIG. 2 is a schematic diagram of the present invention using a sliding window approach to sample capture;
FIG. 3 is a graph of a remaining life tag function employed by the present invention;
FIG. 4 is a schematic diagram of the principle of extracting the trend coefficient in the present invention.
Detailed Description
The invention is described in detail below with reference to the attached drawing figures:
as shown in fig. 1, a method for predicting remaining life of equipment based on a time convolution network and multi-layer self-attention comprises the following steps:
1) data acquisition: collecting monitoring data of a multi-dimensional sensor of equipment to be detected, and eliminating the sensor irrelevant to degradation by calculating variance of the data collected by each sensor to establish an initial data set.
The present embodiment uses the NASA turbofan engine data set to perform validation and evaluation of the proposed method. The data set is generated by the CMAPSS software, which simulates the degradation process of a turbofan engine and records how the sensor signals change over time as the engine runs under different operating conditions. The data set comprises four sub-data sets, denoted FD001, FD002, FD003 and FD004, each with different operating conditions and failure modes; each sub-data set comprises three parts, namely a training set, a test set and the RUL labels, and the test set and the training set do not overlap. Each training set and test set contains 26 columns of data: the first two columns are the engine ID and the number of operating cycles, the next three columns are the three operating parameters of the engine (altitude, Mach number and throttle resolver angle), and the remaining 21 columns are the 21 sensor measurements, as shown in Table 1. The training-set turbofan engines have complete run-to-failure data, while each test-set turbofan engine provides only the earlier segment of its full life cycle together with the corresponding RUL label; the details of the data sets are shown in Table 2.
TABLE 1 output of 21 Sensors when Engine is running
Table 2 data set details
By calculating the variance of each column of data, it can be seen that some sensors do not change during engine operation, indicating that there is no significant correlation between these sensors and the remaining life; the data of the 1st, 5th, 6th, 10th, 16th, 18th and 19th sensors were therefore discarded.
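As an illustration of this variance-based screening, the sketch below assumes the training data have been loaded into a pandas DataFrame with one column per sensor; the column names and the variance threshold are assumptions, not values from this application.

    import pandas as pd

    def drop_constant_sensors(df: pd.DataFrame, sensor_cols, var_threshold=1e-6):
        """Discard sensors whose readings barely change over the whole data set."""
        variances = df[sensor_cols].var()
        kept = [c for c in sensor_cols if variances[c] > var_threshold]
        return df[[c for c in df.columns if c not in sensor_cols] + kept]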
2) Data preprocessing: normalizing the data of each sensor, creating a raw sample by sliding a window over a time dimension, comprising the steps of:
2-1) data normalization: the dimensions and value ranges of the sensors differ, so to eliminate the influence of the data dimension on the prediction result, the data of each sensor are normalized to the range [0, 1] by min-max normalization:

$\hat{x}_{k,t} = \dfrac{x_{k,t} - \min(x_k)}{\max(x_k) - \min(x_k)}$

where $\max(x_k)$ and $\min(x_k)$ are respectively the maximum and minimum values of the k-th sensor over the full life cycle;
2-2) creating original samples with a sliding window:
The data at different moments are interdependent, which is important for time-series data processing. To capture this dependency, sequence samples are created by slicing the data along the time dimension with a sliding window method. As shown in fig. 2, a time window of length w slides over the normalized sensor sequence, and each slide yields a sample of length w; the remaining life RUL corresponding to the s-th sample is T - w - s, where T is the total number of operating cycles of the device. While the turbofan engine is in a normal operating state its condition is stable and degradation is not obvious, so the RUL is set to a constant value during this early period of operation. When a fault occurs, the performance of the engine begins to degrade, and as the fault progresses the condition decreases continuously until the remaining life reaches 0 and the engine fails. In the present application the RUL label is therefore set as a piecewise linear function with 130 taken as the critical value for the start of degradation, as shown in fig. 3: when the RUL is greater than 130, the engine is considered to be operating normally in a stable state, and all sample labels with RUL greater than 130 are set to 130; samples with RUL less than 130 are given their true RUL as the label;
3) statistical feature extraction: the mean value and the trend coefficient of each sensor are calculated from the original input samples obtained in step 2) to obtain the statistical features. The mean value quantitatively expresses the overall level of the sensor data over a period of time, and the trend coefficient reflects the degradation speed over that period. The regression coefficient between the time step and the sequence data is calculated as the trend coefficient feature: as shown in fig. 4, a best-fit straight line is obtained using the time step t as the X axis and the sequence data x as the Y axis, and the slope of this line is used as the regression coefficient of x. These two statistical features are extracted from the data of each sensor and fused with the depth features through the feature fusion framework for RUL prediction. An example of these two features is shown in fig. 4, where it can be seen that the mean and trend coefficients increase over time and reflect the properties of the raw sensor data well;
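A short sketch of the mean and trend-coefficient extraction, assuming NumPy; the trend coefficient is taken as the slope of the least-squares line fitted with the time step as the X axis and the sensor sequence as the Y axis, as described above.

    import numpy as np

    def statistical_features(sample):
        """sample: (w, k) window of w time steps over k sensors -> 2k statistical features."""
        t = np.arange(sample.shape[0])
        means = sample.mean(axis=0)                                   # mean of each sensor over the window
        trends = np.array([np.polyfit(t, sample[:, j], 1)[0]          # slope of the best-fit line = trend coefficient
                           for j in range(sample.shape[1])])
        return np.concatenate([means, trends])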
4) constructing a depth feature extraction module: the depth feature extraction module consists of a channel attention layer, a time convolution network (TCN) and a time attention layer, and extracts depth features by taking the original input samples obtained in step 2) as its input;
5) life prediction: the statistical features and depth features obtained in step 3) and step 4) are used as the input of the life prediction module; the predicted remaining life output by the model is compared with the life labels of the training set, and the life prediction model parameters are updated through an Adam optimizer to complete model training.
In the embodiment of the present invention, the Mean Square Error (MSE) is used as the loss function of training, and the calculation formula is:
$MSE = \dfrac{1}{N} \sum_{i=1}^{N} (\hat{r}_i - r_i)^2$

where N is the total number of samples in the training process, $\hat{r}_i$ is the predicted remaining life, and $r_i$ is the true remaining life.
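A minimal training-loop sketch for this step, assuming PyTorch; `model` stands for the combined depth feature extraction and life prediction modules, and the learning rate, number of epochs and data loader are illustrative assumptions.

    import torch
    import torch.nn as nn

    def train(model, loader, epochs=50, lr=1e-3):
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)   # Adam optimizer
        loss_fn = nn.MSELoss()                                     # mean square error loss
        for _ in range(epochs):
            for raw, stat, rul in loader:                          # raw windows, statistical features, RUL labels
                optimizer.zero_grad()
                loss = loss_fn(model(raw, stat), rul)              # compare prediction with the life label
                loss.backward()
                optimizer.step()                                   # update the life prediction model parameters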
The final determination of the hyperparameters of TCN by comparative experiments is shown in Table 3:
TABLE 3 TCN parameter settings
6) experimental verification:
6-1) constructing evaluation indexes: in order to fairly compare the generalization performance of the proposed method with that of other methods on the test set, objective and effective performance metrics are required; the Score function and the Root Mean Square Error (RMSE) are used to evaluate the RUL prediction effect and are calculated as follows:

$Score = \sum_{i=1}^{N} \begin{cases} e^{-d_i/13} - 1, & d_i < 0 \\ e^{d_i/10} - 1, & d_i \geq 0 \end{cases}, \qquad d_i = \hat{r}_i - r_i$

$RMSE = \sqrt{\dfrac{1}{N} \sum_{i=1}^{N} (\hat{r}_i - r_i)^2}$

where $\hat{r}_i$ is the predicted remaining life, $r_i$ is the true remaining life, and N is the total number of samples. The smaller the values of Score and RMSE, the better the prediction of the model.
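A sketch of both evaluation indexes, assuming NumPy and the asymmetric scoring function conventionally used with the CMAPSS data set, in which late predictions are penalized more heavily than early ones.

    import numpy as np

    def rmse(pred, true):
        return float(np.sqrt(np.mean((pred - true) ** 2)))

    def score(pred, true):
        d = pred - true                                            # d_i = predicted RUL - true RUL
        return float(np.sum(np.where(d < 0, np.exp(-d / 13.0), np.exp(d / 10.0)) - 1.0))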
6-2) prediction and comparison of remaining life: in order to verify the effectiveness of the method described in the present application, three sets of comparison experiments were performed:
6-2-1) influence experiment of multi-source characteristics on prediction performance:
the method disclosed by the application uses depth feature and artificial feature fusion for RUL prediction, and in order to verify the effectiveness of multi-source feature fusion for RUL prediction, different features are used as model inputs: use Depth Feature (DF) only for RUL prediction; using only Statistical Features (SF) for RUL prediction; the method (deployed) as referred to herein. The parameters were the same except for the different features used, each model was trained 10 times to eliminate random errors, and the results are shown in table 4.
TABLE 4 results of training models using different features
As can be seen from the table, the RMSE and Score of the feature fusion model are lower on the FD001 and FD002 data sets than those of the models using a single type of feature, confirming the effectiveness of multi-source feature fusion for RUL prediction. In other words, with the method proposed herein the depth features and the artificial features are complementary, the implied degradation information is more comprehensive, and the accuracy of RUL prediction is improved.
6-2-2) influence of the attention mechanism setting on prediction performance:
the method of the present application uses a three-layer self-attention mechanism to enhance the characteristics of different channels, different time steps, and different sources, respectively. This experiment was performed to verify the effect of each layer of attention mechanism, and four sets of comparative experiments were performed, respectively: no attention mechanism is used, only one layer of attention mechanism is used, two layers of attention mechanism are used and the Proposed method (deployed). The four models were trained 10 times with the same parameters except for the number of layers used in the attention mechanism. The results of the experiment are shown in table 5:
TABLE 5 Effect of attention settings on predicted Performance
According to the experimental result, the 3 models using the self-attention mechanism are superior to the model without the self-attention mechanism in performance, and the self-adaptive weight distribution performed by the self-attention mechanism can effectively improve the accuracy of model prediction. In addition, with the increase of the number of layers of the attention mechanism, the performance of the model is gradually improved, which shows that the data is enhanced in different dimensions through the self-attention mechanism, and the difference of different characteristics is considered more comprehensively. The method provided by the invention enhances the characteristics of different channels, different time steps and different sources, and has the lowest RMSE and Score, thereby obtaining the best prediction performance.
6-2-3) comparative experiments on comprehensive performance:
the experiment is used for comparing the effect of the RUL prediction by different algorithms, and compared models comprise shallow machine learning models such as Support Vector Regression (SVR), Decision Tree Regression (DTR) and Random Forest (RF), deep learning algorithms such as Deep Convolutional Neural Network (DCNN) and multi-layer attention convolutional neural network (MA-CNN). The MA-CNN is an algorithm combining a multilayer attention mechanism and a convolutional neural network. Experiments are carried out on four subdata sets of FD001, FD002, FD003 and FD004, each model is trained for 10 times, and except for different algorithms adopted in the experiments, other parameters are consistent. The results of the experiment are shown in tables 6 and 7:
TABLE 6 RMSE Performance of different methods
TABLE 7 Score Performance of different methods
As can be seen from the tables, the average RMSE of the three deep learning algorithms is 13.94% lower than that of the shallow machine learning algorithms and the average Score is 70.8% lower, indicating that deep learning algorithms such as CNN perform better overall than shallow machine learning algorithms. The method presented herein reduces the RMSE by 14.19% and the Score by 68.00% compared with the other methods and is superior to them overall, demonstrating the advantage of the proposed method in RUL prediction.
In summary, the equipment life prediction method based on a time convolution network and multi-layer self-attention adaptively assigns weights to different time steps and different channels through the self-attention mechanism, strengthening the contribution of important moments and channels to RUL prediction; the depth feature representation is obtained through the TCN, overcoming the shortcomings of the CNN; and, considering that the statistical features also contain rich degradation information, a feature fusion framework is provided that fuses the depth features and the statistical features to predict the RUL, while the self-attention mechanism adaptively assigns weights to features from different sources to account for their different contributions to RUL prediction, thereby achieving high-accuracy life prediction.
The technical solutions described above only represent preferred technical solutions of the present invention; modifications to parts of these technical solutions that may be made by those skilled in the art all embody the principles of the present invention and fall within the protection scope of the present invention.

Claims (3)

1. A method for predicting the service life of equipment based on a time convolution network and multi-layer self-attention, comprising the following steps:
1) data acquisition: acquiring sensor detection data of equipment, and eliminating sensors irrelevant to degradation by calculating variance of data acquired by each sensor to establish an initial data set;
2) data preprocessing: normalizing the data of each sensor, and creating an original sample by sliding a window on a time dimension;
3) statistical feature extraction: calculating the mean value and the trend coefficient of each sensor in each original sample to obtain the statistical features;
4) depth feature extraction: constructing a depth feature extraction module consisting of a channel attention layer, a time convolution network (TCN) and a time attention layer, and extracting depth features from the original samples;
5) remaining life prediction: constructing a life prediction module, taking the statistical features and the depth features as its input, comparing the predicted remaining life with the life labels of the training set, updating the life prediction model parameters through an Adam optimizer, and completing model training.
2. The method according to claim 1, wherein in step 4), the depth feature extraction module is constructed by the specific steps of:
step 4-1: constructing a channel attention layer, and adaptively distributing attention weights to different channels;
step 4-2: constructing a TCN model, wherein the input of the TCN model is the output of the channel attention layer; the TCN is formed by stacking residual blocks, each residual block comprises a dilated causal convolution layer, a Dropout layer and a batch normalization layer, and the input and the output of each residual block are connected by a residual connection;
step 4-3: constructing a time attention layer, wherein the input of the time attention layer is the output of TCN, and adaptively distributing attention weights to different time steps through the time attention layer;
step 4-4: and constructing a flattening layer, wherein the input of the flattening layer is the output of the time attention layer, and the two-dimensional matrix sample is flattened to one dimension through the flattening layer.
3. The method according to claim 1, wherein in the step 5), the life prediction module is constructed by the specific steps of:
step 5-1: respectively reducing the dimensions of the depth features and the statistical features by using a full-connection layer, and splicing the two reduced features;
step 5-2: constructing a feature attention layer, and adaptively distributing attention weight to the spliced features;
step 5-3: and constructing a regression prediction layer, wherein the input of the regression prediction layer is the output of the characteristic attention layer, and the output is the residual service life.
CN202210462882.8A 2022-04-16 2022-04-16 Equipment life prediction method based on time convolution network and multi-layer self-attention Pending CN114880925A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210462882.8A CN114880925A (en) 2022-04-16 2022-04-16 Equipment life prediction method based on time convolution network and multi-layer self-attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210462882.8A CN114880925A (en) 2022-04-16 2022-04-16 Equipment life prediction method based on time convolution network and multi-layer self-attention

Publications (1)

Publication Number Publication Date
CN114880925A true CN114880925A (en) 2022-08-09

Family

ID=82672696

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210462882.8A Pending CN114880925A (en) 2022-04-16 2022-04-16 Equipment life prediction method based on time convolution network and multi-layer self-attention

Country Status (1)

Country Link
CN (1) CN114880925A (en)


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115048873A (en) * 2022-08-12 2022-09-13 太原科技大学 Residual service life prediction system for aircraft engine
CN115374711A (en) * 2022-10-24 2022-11-22 广东工业大学 Service life prediction method of rotating multi-component system and related device
CN115374711B (en) * 2022-10-24 2022-12-27 广东工业大学 Service life prediction method of rotating multi-component system and related device
CN116223044A (en) * 2023-04-04 2023-06-06 兰州理工大学 Bearing residual life prediction system and method based on multi-sensor fusion and Bi-TACN
CN116223044B (en) * 2023-04-04 2023-10-13 兰州理工大学 Bearing residual life prediction system and method based on multi-sensor fusion and Bi-TACN
CN116780658A (en) * 2023-08-17 2023-09-19 国网浙江省电力有限公司金华供电公司 Multi-energy complementary optimization scheduling method considering source-load bilateral uncertainty
CN116780658B (en) * 2023-08-17 2023-11-10 国网浙江省电力有限公司金华供电公司 Multi-energy complementary optimization scheduling method considering source-load bilateral uncertainty
CN117054891A (en) * 2023-10-11 2023-11-14 中煤科工(上海)新能源有限公司 Method and device for predicting service life of battery

Similar Documents

Publication Publication Date Title
CN112100865B (en) Method for predicting remaining life of aircraft engine based on parallel CNN model
CN114880925A (en) Equipment life prediction method based on time convolution network and multi-layer self-attention
CN109766583B (en) Aircraft engine life prediction method based on unlabeled, unbalanced and initial value uncertain data
US20220358363A1 (en) Engine surge fault prediction system and method based on fusion neural network model
CN110609524B (en) Industrial equipment residual life prediction model and construction method and application thereof
CN112580263A (en) Turbofan engine residual service life prediction method based on space-time feature fusion
CN116434777B (en) Transformer fault diagnosis method and system based on multistage attention and feature fusion
CN115017826B (en) Method for predicting residual service life of equipment
CN114297918A (en) Aero-engine residual life prediction method based on full-attention depth network and dynamic ensemble learning
Liu et al. Complex engineered system health indexes extraction using low frequency raw time-series data based on deep learning methods
CN108875532A (en) A kind of video actions detection method based on sparse coding and length posterior probability
CN115964907A (en) Complex system health trend prediction method and system, electronic device and storage medium
Ruan et al. Remaining useful life prediction for aero-engine based on lstm and cnn
CN114169091A (en) Method for establishing prediction model of residual life of engineering mechanical part and prediction method
Huang et al. Attention-augmented recalibrated and compensatory network for machine remaining useful life prediction
CN112163474B (en) Intelligent gearbox diagnosis method based on model fusion
CN115660198B (en) Method for predicting residual service life of rolling bearing
CN115048873B (en) Residual service life prediction system for aircraft engine
CN112560252B (en) Method for predicting residual life of aeroengine
Chen et al. Integrated group-based valuable sensor selection approach for remaining machinery life estimation in the future industry 4.0 era
CN114298183B (en) Intelligent recognition method for flight actions
CN117390407B (en) Fault identification method, system, medium and equipment of substation equipment
CN116561528B (en) RUL prediction method of rotary machine
CN115688325A (en) Method and system for predicting remaining service life of aviation turbofan engine
Qi et al. RUL Prediction of Turbofan Engine Based on WGAN-Trans Under Small Samples

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination