CN112561035A

CN112561035A - Fault diagnosis method based on CNN and LSTM depth feature fusion

Info

Publication number: CN112561035A
Application number: CN202011446332.4A
Authority: CN
Inventors: 周福娜; 张志强
Original assignee: Shanghai Maritime University
Current assignee: Shanghai Maritime University
Priority date: 2020-12-08
Filing date: 2020-12-08
Publication date: 2021-03-26
Anticipated expiration: 2040-12-08
Also published as: CN112561035B

Abstract

The invention discloses a fault diagnosis method based on the depth feature fusion of CNN and LSTM. Deep learning has been widely applied in the field of fault diagnosis in recent years. However, feature extraction and fault diagnosis with a single deep learning model suffer from insufficient data utilization and incomplete feature extraction, which reduces the accuracy of fault diagnosis. To address these problems, a feature fusion mechanism is provided in which two different neural networks extract features from the one-dimensional sequence data and the two-dimensional oscillogram data respectively: the autocorrelation features of the one-dimensional sequence data are extracted using LSTM, and the cross-correlation features of the two-dimensional oscillogram data are extracted using CNN. Unlike the existing simple splicing of features, a feature fusion network is added to achieve complementary fusion of the features of the two neural networks. As a result, the data is utilized more fully, the feature extraction is more comprehensive, and the fault diagnosis is more accurate.

Description

Fault diagnosis method based on CNN and LSTM depth feature fusion

Technical Field

The invention relates to the technical field of deep-learning-based gearbox fault diagnosis, and in particular to a fault diagnosis method based on depth feature fusion of CNN and LSTM, by which deep-learning-based fault diagnosis of a gearbox is realized.

Background

With the rapid development of modern industrial technology, the structure of a large-scale automation system is more and more complex, the coupling degree between different parts of production equipment is higher and higher, and a fault occurring at one place can cause the breakdown of the whole system and even cause a catastrophic event. Therefore, accurate and reliable real-time fault diagnosis of mechanical equipment is crucial.

Common fault diagnosis methods are generally classified into three categories: methods based on empirical knowledge, methods based on analytical models, and data-driven methods. In engineering practice, the methods based on empirical knowledge and on analytical models are limited by the completeness of the prior knowledge and the accuracy of the mathematical model; the resulting models scale poorly and are of limited use in fault diagnosis. Data-driven methods require neither rich prior knowledge nor an accurate mechanism model. They can diagnose faults in a complex system simply by building a data-based fault diagnosis model through data feature extraction, and have therefore attracted wide attention in recent years. Deep learning is a data-driven method with strong adaptive capacity. It is a multi-level feature learning method in which nonlinear components convert the features of each layer into more abstract high-level features, and its strong feature representation capability has attracted wide attention from experts in the field of fault diagnosis. Among the various deep learning models, convolutional neural networks (CNNs), originally used for image recognition, have been successfully applied to feature extraction. The unique modeling properties of CNNs help to find local structures or configurable relationships in the observations, and CNN-based fault diagnosis methods have been widely studied in recent years. Although CNNs have achieved great success in fault diagnosis, they focus on local features and ignore the relationship between the whole signal and its local parts. For sequence signals, a CNN therefore fails to capture the long-term dependence within the sequence. This long-term dependence hidden within the sequence is considered a feature that is very helpful for fault diagnosis.

The long short-term memory network (LSTM) is an important branch of the recurrent neural network (RNN). It is well suited to problems that are highly related to time series and can learn the long-term dependence hidden in time series data. Because of its excellent ability to extract the autocorrelation features of a sequence, it has attracted wide attention from experts in the field of fault diagnosis and prediction. Although LSTM has been successful in diagnosing faults from time series data, its inherently sequential nature prevents the LSTM model from taking the local features of the data into account, which results in incomplete feature extraction, inefficient data utilization, and information loss.

Therefore, providing a more effective deep-learning-based feature fusion method for fault diagnosis is a problem that those skilled in the art need to solve.

Disclosure of Invention

Aiming at the technical problem that existing fault diagnosis methods cannot use the available training data with maximum efficiency because of insufficient data utilization and incomplete feature extraction, the invention provides an online fault diagnosis method based on the fusion of CNN and LSTM features.

Specifically, the invention realizes the above purpose by the following scheme:

An online fault diagnosis method based on CNN and LSTM feature fusion, characterized by comprising the following steps:

S1, establishing a data set, wherein the data set comprises a training set and a test set, the training set and the test set both comprise one-dimensional sequence data and the corresponding two-dimensional oscillogram data, and each two-dimensional oscillogram is drawn from the one-dimensional sequence data;

The step S1 includes the steps of:

S1.1, selecting one-dimensional sequence sample data of gearboxes with different fault types, and setting different fault type labels;

S1.2, drawing the corresponding oscillogram of the one-dimensional sequence sample data of step S1.1 using Matlab to obtain the two-dimensional oscillogram data;

S1.3, dividing the one-dimensional data and the two-dimensional data of steps S1.1 and S1.2 into a training set and a test set according to a certain proportion;

S2, extracting the local cross-correlation features and trend features F_CNN of the two-dimensional oscillogram data in the training set through a convolutional neural network (CNN):

The step S2 includes the steps of:

S2.1, building a convolutional neural network Net_CNN according to the two-dimensional oscillogram data in the training set, as shown in equation (7):

[Net_CNN,Tr_CNN]＝Feedforward(θ_CNN；M_CL,M_pool；SIZE_cl,SIZE_pool；X_2D) (7)

where Feedforward is the function that generates a neural network; M_CL is the number of convolutional layers of the CNN network; M_pool is the number of pooling layers of the CNN network; SIZE_cl represents the convolution kernel size; SIZE_pool represents the pooling kernel size; θ_CNN = {W_CNN, b_CNN} is the set of network parameters, where W_CNN is the weight matrix and b_CNN is the bias vector; X_2D represents the input two-dimensional oscillogram data. The CNN network is trained on the two-dimensional oscillogram data;

S2.2, extracting the feature F_CNN of the two-dimensional oscillogram using the trained convolutional neural network and its network parameters:

F_CNN＝G_CNN(Net_CNN,Tr_CNN,X_2D) (8)

where G_CNN is the nonlinear output function of the CNN network, and Tr_CNN represents the trained CNN network model parameters;

S3, extracting the autocorrelation features F_LSTM of the one-dimensional sequence data in the training set through a long short-term memory neural network (LSTM):

The step S3 includes the steps of:

S3.1, building a long short-term memory network Net_LSTM according to the one-dimensional sequence data in the training set, as shown in equation (9):

[Net_LSTM,Tr_LSTM]＝Feedforward(θ_LSTM；H_LSTM；X_1D) (9)

where θ_LSTM = {W_LSTM, b_LSTM} is the set of network parameters, W_LSTM is the weight matrix, b_LSTM is the bias vector, H_LSTM is the number of neurons in the hidden layer, and X_1D represents the input one-dimensional sequence data. The LSTM network is trained on the one-dimensional sequence data;

S3.2, extracting the feature F_LSTM of the one-dimensional sequence data using the trained network structure and parameters:

F_LSTM＝G_LSTM(Net_LSTM,Tr_LSTM,X_1D) (10)

where G_LSTM is the nonlinear output function of the LSTM network, and Tr_LSTM represents the trained LSTM network model parameters;

S4, fusing the two different types of features, namely the local cross-correlation feature F_CNN of the image extracted by the CNN in step S2 and the autocorrelation feature F_LSTM of the sequence extracted by the LSTM in step S3, through a multilayer fusion network to obtain the fused feature F_fusion;

The step S4 includes the steps of:

S4.1, splicing the local cross-correlation feature F_CNN of the image extracted by the CNN in step S2 and the autocorrelation feature F_LSTM of the sequence extracted by the LSTM in step S3;

S4.2, establishing a feature fusion network Net_fusion and training the fusion network parameters to obtain the fused feature F_fusion, as shown in equation (11):

F_fusion＝G_fusion(Net_fusion,Tr_fusion,X_1D,X_2D) (11)

where G_fusion is the nonlinear output function of the fusion network, and Tr_fusion represents the trained model parameters;

S5, using the fused feature F_fusion of step S4 as the input of the Softmax classifier to perform fault diagnosis classification, as shown in equation (12):

result＝Softmax(F_fusion,θ) (12)

where result represents the classification result, and θ represents the Softmax network model parameters;

S6, simultaneously adjusting the network parameters of the fault diagnosis networks Net_CNN, Net_LSTM, Net_fusion and Softmax;

S7, inputting all the data of the test set established in step S1 into the network model to obtain the fault diagnosis classification result of the test set, and evaluating the effect of the network model.
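Steps S4.1, S4.2 and S5 can be illustrated with a minimal sketch for a single sample. The layer sizes, the tanh activation, the single fusion layer, the random weights, and the helper names `dense`, `diagnose` and `random_matrix` are all assumptions made for illustration; the source does not specify the structure of Net_fusion. The two input vectors stand in for F_CNN and F_LSTM and are given the same length.

```python
import math
import random


def dense(x, weights, bias):
    """Affine layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def diagnose(f_cnn, f_lstm, fusion_w, fusion_b, cls_w, cls_b):
    """Steps S4.1, S4.2 and S5 for one sample (single fusion layer assumed)."""
    spliced = list(f_cnn) + list(f_lstm)            # S4.1: splice the two feature vectors
    f_fusion = [math.tanh(v) for v in dense(spliced, fusion_w, fusion_b)]  # S4.2, tanh assumed
    probs = softmax(dense(f_fusion, cls_w, cls_b))  # S5: Softmax classifier
    label = probs.index(max(probs)) + 1             # fault labels run from 1 to 6
    return f_fusion, probs, label


def random_matrix(rows, cols, rng):
    """Hypothetical untrained weights; a real model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
f_cnn = [rng.random() for _ in range(8)]    # stand-in for F_CNN
f_lstm = [rng.random() for _ in range(8)]   # stand-in for F_LSTM (same size as F_CNN)
fusion_w, fusion_b = random_matrix(10, 16, rng), [0.0] * 10
cls_w, cls_b = random_matrix(6, 10, rng), [0.0] * 6

f_fusion, probs, label = diagnose(f_cnn, f_lstm, fusion_w, fusion_b, cls_w, cls_b)
print(label, [round(p, 3) for p in probs])
```

With untrained weights the predicted label is arbitrary; the sketch only shows how the spliced vector flows through a fusion layer into the classifier.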

Compared with the prior art, the invention has the following beneficial effects. Features are extracted from the one-dimensional sequence data and the two-dimensional image data by two different neural networks, CNN and LSTM: the autocorrelation features of the one-dimensional sequence data are extracted using the LSTM, and the cross-correlation features of the two-dimensional image data are extracted using the CNN. The number of output nodes of the feature output layers of the two neural networks is adjusted so that the two types of extracted features have the same structure. By adding a feature fusion network, the cross-correlation features extracted by the CNN and the autocorrelation features extracted by the LSTM are fused, which achieves complementary fusion of the features of the two networks. The invention thereby addresses the high misclassification rate that arises when the features extracted by a CNN or LSTM network model alone are not accurate enough. Using the fused features for fault diagnosis utilizes the data more fully, makes the feature extraction more comprehensive, and makes the fault diagnosis more accurate. The invention can effectively improve the accuracy of fault diagnosis, promotes the further development, popularization and application of fault diagnosis and deep learning, and is of practical significance for the progress of industrial production.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 is a structural diagram of a fault diagnosis method based on the depth feature fusion of CNN and LSTM according to the present invention.

FIG. 2 is a global optimization diagram of the fault diagnosis method based on the depth feature fusion of CNN and LSTM.

Fig. 3 is a graph showing the accuracy of fault diagnosis of LSTM in the case where the sample sequence length is 100 in the experiment.

Fig. 4 is a diagram showing the accuracy of failure diagnosis of CNN in the case where the sample sequence length in the experiment is 100.

Fig. 5 is a fault diagnosis precision diagram based on the depth feature fusion of CNN and LSTM in the present invention under the condition that the sample sequence length in the experiment is 100.

Fig. 6 is a graph showing the accuracy of fault diagnosis of LSTM in the case where the sample sequence length is 400 in the experiment.

Fig. 7 is a diagram showing the accuracy of failure diagnosis of CNN in the case where the sample sequence length in the experiment is 400.

Fig. 8 is a fault diagnosis precision diagram based on the depth feature fusion of CNN and LSTM in the present invention under the condition that the sample sequence length in the experiment is 400.

Fig. 9 is a graph showing the accuracy of fault diagnosis of LSTM in the case where the sample sequence length is 900 in the experiment.

Fig. 10 is a diagram showing the accuracy of failure diagnosis of CNN in the case where the sample sequence length in the experiment is 900.

Fig. 11 is a fault diagnosis precision diagram based on the depth feature fusion of CNN and LSTM in the present invention under the condition that the sample sequence length in the experiment is 900.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.

Fig. 1 shows a structure diagram of a fault diagnosis method based on CNN and LSTM depth feature fusion, which includes the following steps:

S1, establishing a data set;

The original gearbox fault data set contains 15600 sequence samples covering six fault types: pitting, broken tooth, wear, pitting wear, broken tooth wear, and normal, with the fault labels set to 1, 2, 3, 4, 5 and 6 respectively. The data set is divided into a training set and a test set at a ratio of 10:3, and the corresponding oscillogram is drawn for each sample.

S2, extracting the local cross-correlation features and trend features F_CNN of the two-dimensional oscillogram data in the training set through a convolutional neural network (CNN):

The step S2 includes the steps of:

S2.1, building a convolutional neural network Net_CNN according to the two-dimensional oscillogram data in the training set, as shown in equation (13):

[Net_CNN,Tr_CNN]＝Feedforward(θ_CNN；M_CL,M_pool；SIZE_cl,SIZE_pool；X_2D) (13)

F_CNN＝G_CNN(Net_CNN,Tr_CNN,X_2D) (14)

The step S3 includes the steps of:

S3.1, building a long short-term memory network Net_LSTM according to the one-dimensional sequence data in the training set, as shown in equation (15):

[Net_LSTM,Tr_LSTM]＝Feedforward(θ_LSTM；H_LSTM；X_1D) (15)

where θ_LSTM = {W_LSTM, b_LSTM} is the set of network parameters, W_LSTM is the weight matrix, b_LSTM is the bias vector, H_LSTM is the number of neurons in the hidden layer, and X_1D represents the input one-dimensional sequence data. The LSTM network is trained on the one-dimensional sequence data;

F_LSTM＝G_LSTM(Net_LSTM,Tr_LSTM,X_1D) (16)

The step S4 includes the steps of:

S4.2, establishing a feature fusion network Net_fusion and training the fusion network parameters to obtain the fused feature F_fusion, as shown in equation (17):

F_fusion＝G_fusion(Net_fusion,Tr_fusion,X_1D,X_2D) (17)

S5, using the fused feature F_fusion of step S4 as the input of the Softmax classifier to perform fault diagnosis classification, as shown in equation (18):

result＝Softmax(F_fusion,θ) (18)

S6, simultaneously adjusting the network parameters of the fault diagnosis networks Net_CNN, Net_LSTM, Net_fusion and Softmax; the global optimization is shown in Fig. 2. The loss function is described as follows:

The loss function J(θ) is minimized on the basis of the global error Error_global = {Error_fusion, Error_softmax, Error_PeLSTM, Error_CNN}. The error of the fusion network, Error_fusion, comprises the error of the LSTM, Error_PeLSTM, and the error of the CNN, Error_CNN; the relationship can be expressed by the following formula.

Error_fusion＝Error_PeLSTM+Error_CNN
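One plausible reading of the simultaneous adjustment in step S6 is ordinary end-to-end backpropagation: the classification error reaches the spliced feature vector and is then divided between the CNN branch and the LSTM branch. The sketch below shows only that division. It assumes a cross-entropy loss for J, reduces the fusion network to the splice itself, and uses hypothetical weights and feature values; none of these choices is confirmed by the source.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def class_probs(features, weights, bias):
    """Softmax classifier applied directly to the spliced features."""
    scores = [sum(w * v for w, v in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    return softmax(scores)


def loss(features, weights, bias, true_label):
    """Cross-entropy loss for one sample (assumed form of J); labels start at 1."""
    return -math.log(class_probs(features, weights, bias)[true_label - 1])


def feature_gradient(features, weights, bias, true_label):
    """Gradient of the loss with respect to each spliced feature."""
    probs = class_probs(features, weights, bias)
    delta = [p - (1.0 if k == true_label - 1 else 0.0) for k, p in enumerate(probs)]
    return [sum(weights[k][j] * delta[k] for k in range(len(delta)))
            for j in range(len(features))]


f_cnn = [0.2, 0.7, 0.1]     # hypothetical CNN features
f_lstm = [0.5, 0.3, 0.9]    # hypothetical LSTM features
weights = [[0.1 * ((k + 2 * j) % 5 - 2) for j in range(6)] for k in range(6)]
bias = [0.0] * 6

grad = feature_gradient(f_cnn + f_lstm, weights, bias, true_label=2)
grad_cnn, grad_lstm = grad[:len(f_cnn)], grad[len(f_cnn):]   # error passed to each branch
print(grad_cnn, grad_lstm)
```

Each slice would then be propagated further back through its own network, so that all parameters are updated from the same classification error.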

In order to verify the effectiveness and the generalization performance of the invention, the following experiment is carried out by adopting a QPZZ-I experiment platform:

The QPZZ-I rotating machinery vibration test platform is used to simulate gear faults. The platform can quickly simulate various states and vibrations of rotating machinery, and gear faults are simulated by installing defective gears. The faults that can be simulated include pitting, wear, broken tooth, and the mixed faults pitting wear and broken tooth wear. In the test, at a rotating speed of 880 r/min and a load current of 0.05 A, the Y-direction acceleration data of the bearing on the motor side of the output shaft were recorded. Six health states of the gearbox were selected: pitting, wear, broken tooth, pitting wear, broken tooth wear, and normal. The feasibility of the invention is discussed using the gearbox fault data, and the invention is compared with two baselines: using only the one-dimensional sequence data as input to the LSTM network for fault diagnosis, and using only the oscillogram data, which contain the trend information of the vibration signal, as input to the CNN for fault diagnosis.

(1) Data pre-processing

As shown in the block diagram of Fig. 1, the invention uses a sliding window for data preprocessing. Each sliding window is one sample. The sliding window size is set to 100, 400 and 900 respectively, i.e. each sample contains 100, 400 or 900 data points, and the sliding step is set to 20. The screenshot size of the oscillogram is set to 28 x 28, and each fault type contains 2000 training samples and 600 test samples. The fault label settings are shown in Table 1.

Table 1 Fault label settings

Fault type	Label
Pitting	1
Broken tooth	2
Wear	3
Pitting wear	4
Broken tooth wear	5
Normal	6
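The sliding-window segmentation and the 28 x 28 oscillogram image described in the data pre-processing step can be sketched as follows. The source draws the oscillograms with Matlab and takes screenshots; the `rasterize` function here is a simplified stand-in, and its pixel mapping, the function names, and the synthetic sine signal are assumptions rather than the authors' procedure.

```python
import math


def sliding_windows(signal, size, step=20):
    """Cut a 1-D signal into overlapping samples of `size` points each."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]


def rasterize(window, side=28):
    """Draw one window as a side x side binary image (1 = waveform pixel).

    Simplified stand-in for the Matlab plot and screenshot; the mapping is assumed.
    """
    lo, hi = min(window), max(window)
    span = (hi - lo) or 1.0                     # avoid division by zero on flat windows
    image = [[0] * side for _ in range(side)]
    for i, value in enumerate(window):
        col = i * side // len(window)           # time axis, left to right
        row = (side - 1) - round((value - lo) / span * (side - 1))  # top row = maximum
        image[row][col] = 1
    return image


signal = [math.sin(0.05 * n) for n in range(1000)]   # synthetic signal, not gearbox data
samples = sliding_windows(signal, size=100)          # window size 100, step 20
images = [rasterize(w) for w in samples]
print(len(samples), len(samples[0]), len(images[0]), len(images[0][0]))
```

Each one-dimensional window and its image form one paired sample, so the LSTM and the CNN see the same stretch of signal. With 2000 training and 600 test samples for each of the six fault types, the totals are 12000 and 3600, which matches the 15600 samples and the 10:3 split stated in the embodiment.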

(2) Design of experiments

The feasibility of the method of the invention, CNN-FF-LSTM, is discussed using the gearbox fault data, and comparative experiments are set up as follows: a. fault diagnosis using only the screenshot data as input to the CNN; b. fault diagnosis using only the one-dimensional sequence data as input to the LSTM; c. fault diagnosis using the feature fusion method CNN-FF-LSTM. The specific experimental settings are shown in Table 2. Each set of experiments compares the three methods described above.

TABLE 2 Experimental design

(3) Parameter setting

A convolutional neural network (CNN) is a special type of feed-forward neural network that is well suited to image inputs, especially machine learning problems involving large images. A convolutional neural network generally consists of an input layer, convolutional layers, pooling layers, fully connected layers, and an output layer.
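The layer sequence above can be made concrete with a minimal sketch of one convolutional layer, a ReLU activation and one pooling layer applied to a 28 x 28 input. The 3 x 3 kernel, the ReLU activation, the 2 x 2 pooling size and the diagonal test image are illustrative assumptions; the actual network parameters of the invention are those referred to in Table 3.

```python
def conv2d(image, kernel):
    """Valid 2-D cross-correlation, the operation used by CNN convolutional layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(fmap):
    """Element-wise rectified linear activation."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows and columns that do not fill a window are dropped."""
    out_h, out_w = len(fmap) // size, len(fmap[0]) // size
    return [[max(fmap[r * size + i][c * size + j]
                 for i in range(size) for j in range(size))
             for c in range(out_w)]
            for r in range(out_h)]


# 28 x 28 test image with a diagonal line, standing in for an oscillogram screenshot.
image = [[1.0 if r == c else 0.0 for c in range(28)] for r in range(28)]
kernel = [[1.0, 0.0, -1.0],
          [0.0, 1.0, 0.0],
          [-1.0, 0.0, 1.0]]    # hypothetical kernel; real kernels are learned in training

fmap = max_pool(relu(conv2d(image, kernel)))
features = [v for row in fmap for v in row]   # flattened input for a fully connected layer
print(len(fmap), len(fmap[0]), len(features))
```

The kernel responds only where the local pixel pattern matches it, which is the sense in which a CNN extracts local features of the waveform image.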

The long short-term memory network (LSTM) is a special recurrent neural network (RNN). A recurrent neural network has a cyclic structure in which each output depends on the previous output, which enables it to model sequential inputs. The LSTM is an effective recurrent neural network that handles long sequences well. It has the same chain structure as an RNN, but the internal mechanism of its repeating module is different: information transfer is controlled mainly by three gates, namely the forget gate, the input gate and the output gate. The specific network parameters used in the experiments of the invention are shown in Table 3.

TABLE 3 values of model parameters
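The three-gate mechanism described in the parameter-setting paragraphs can be illustrated with one time step of a single-unit LSTM. This sketch uses the standard LSTM equations without peephole connections, scalar weights, and hypothetical parameter values; it is not the configuration of Table 3, and the function and gate names are chosen for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x_t, h_prev, c_prev, p):
    """One time step of a single-unit LSTM; `p` maps a gate name to (w_x, w_h, b)."""
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x_t + w_h * h_prev + b

    f = sigmoid(pre("forget"))    # forget gate: how much of the old cell state to keep
    i = sigmoid(pre("input"))     # input gate: how much new information to write
    o = sigmoid(pre("output"))    # output gate: how much of the cell state to expose
    g = math.tanh(pre("cand"))    # candidate cell content
    c_t = f * c_prev + i * g      # updated cell state carries the long-term memory
    h_t = o * math.tanh(c_t)      # hidden state passed to the next time step
    return h_t, c_t


# Hypothetical weights, identical for every gate only to keep the example short.
params = {name: (0.5, 0.1, 0.0) for name in ("forget", "input", "output", "cand")}
h, c = 0.0, 0.0
for x in [0.2, -0.1, 0.4, 0.3]:   # a toy one-dimensional sequence
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

When the forget gate is close to 1 and the input gate close to 0, the cell state passes through a step almost unchanged, which is how the LSTM retains long-term dependence within a sequence.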

(4) Analysis of Experimental results

The results of the experiments are shown in tables 4-6.

Table 4 Fault diagnosis accuracy with a sequence length of 100

Fault type	LSTM	CNN	CNN-FF-LSTM
Pitting	83.33%	76.00%	95.00%
Broken tooth	69.17%	61.83%	98.50%
Wear	92.67%	89.00%	97.50%
Pitting wear	95.67%	94.17%	98.17%
Broken tooth wear	82.83%	78.33%	71.00%
Normal	78.83%	85.00%	97.83%
Average accuracy	83.75%	80.72%	93.00%

Table 5 Fault diagnosis accuracy with a sequence length of 400

Fault type	LSTM	WFCNN	CNN-FF-LSTM
Pitting	84.50%	85.17%	99.50%
Broken tooth	82.83%	74.50%	99.67%
Wear	93.00%	93.00%	96.83%
Pitting wear	93.50%	96.33%	98.00%
Broken tooth wear	93.50%	85.17%	81.00%
Normal	93.83%	89.67%	100.00%
Average accuracy	90.19%	87.31%	95.83%

Table 6 Fault diagnosis accuracy with a sequence length of 900

Fault type	LSTM	CNN	CNN-FF-LSTM
Pitting	92.33%	93.00%	97.00%
Broken tooth	91.17%	88.17%	98.17%
Wear	95.67%	94.00%	99.33%
Pitting wear	97.67%	93.50%	98.50%
Broken tooth wear	89.50%	92.33%	99.67%
Normal	89.33%	89.83%	100.00%
Average accuracy	92.61%	91.81%	98.78%

As can be seen from Tables 4, 5 and 6, for the gear fault vibration signal the LSTM network is slightly better on average than the CNN network for time series fault diagnosis, but the average diagnostic accuracy obtained by the fusion method CNN-FF-LSTM of the invention is considerably higher than that of either model used alone.

As can be seen from Table 4, the diagnostic accuracy is lowest when the screenshot is used as the input of the CNN. After the features extracted by the LSTM are fused in, the average diagnostic accuracy is 12.28 percentage points higher than when the CNN is used alone for fault diagnosis, which is a significant improvement, and 9.25 percentage points higher than when the LSTM is used alone. The diagnosis result graphs of each model are shown in Figs. 3, 4 and 5, where the stars represent the fault diagnosis results, the circles represent the real fault types, and an overlap indicates a correct diagnosis.

As can be seen from Table 5, the average accuracy of each model is higher than in Table 4 because the sequence length of the training samples is increased, which indicates that a longer sample sequence contains more fault information and yields better fault diagnosis results. In Table 5, the average diagnostic accuracy of the invention is 8.52 percentage points higher than that of the CNN used alone and 5.64 percentage points higher than that of the LSTM used alone, which verifies the effectiveness of the method. The diagnosis result graphs of each model are shown in Figs. 6, 7 and 8, where the stars represent the fault diagnosis results, the circles represent the real fault types, and an overlap indicates a correct diagnosis.

Comparing Table 6 with Tables 4 and 5 shows that the average diagnostic accuracy of each model in Table 6 is higher than in Tables 4 and 5. The experimental sample sequences corresponding to Table 6 are the longest, so each sample contains more complete fault information, which again indicates that the sample sequence length influences the accuracy of fault diagnosis. In Table 6, the average diagnostic accuracy of the fusion method provided by the invention is 6.97 percentage points higher than that of the CNN model used alone and 6.17 percentage points higher than that of the LSTM model used alone, which verifies the effectiveness of the method. The diagnosis result graphs of each model are shown in Figs. 9, 10 and 11, where the stars represent the fault diagnosis results, the circles represent the real fault types, and an overlap indicates a correct diagnosis.
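The averages and improvements quoted in the analysis can be recomputed from the per-class accuracies in Tables 4 to 6. The sketch below copies the values from those tables; the dictionary layout and variable names are illustrative, and the WFCNN column of Table 5 is stored under the key "CNN".

```python
# Per-class accuracies (%) in the order: pitting, broken tooth, wear,
# pitting wear, broken tooth wear, normal. Keys are the sample sequence lengths.
tables = {
    100: {"LSTM": [83.33, 69.17, 92.67, 95.67, 82.83, 78.83],
          "CNN": [76.00, 61.83, 89.00, 94.17, 78.33, 85.00],
          "CNN-FF-LSTM": [95.00, 98.50, 97.50, 98.17, 71.00, 97.83]},
    400: {"LSTM": [84.50, 82.83, 93.00, 93.50, 93.50, 93.83],
          "CNN": [85.17, 74.50, 93.00, 96.33, 85.17, 89.67],
          "CNN-FF-LSTM": [99.50, 99.67, 96.83, 98.00, 81.00, 100.00]},
    900: {"LSTM": [92.33, 91.17, 95.67, 97.67, 89.50, 89.33],
          "CNN": [93.00, 88.17, 94.00, 93.50, 92.33, 89.83],
          "CNN-FF-LSTM": [97.00, 98.17, 99.33, 98.50, 99.67, 100.00]},
}
# Average accuracies as printed in the last row of each table.
reported = {
    100: {"LSTM": 83.75, "CNN": 80.72, "CNN-FF-LSTM": 93.00},
    400: {"LSTM": 90.19, "CNN": 87.31, "CNN-FF-LSTM": 95.83},
    900: {"LSTM": 92.61, "CNN": 91.81, "CNN-FF-LSTM": 98.78},
}

max_dev = 0.0   # largest gap between a recomputed mean and the printed average
gains = {}      # improvement of the fusion method, in percentage points
for length, models in tables.items():
    for name, accs in models.items():
        mean = sum(accs) / len(accs)
        max_dev = max(max_dev, abs(mean - reported[length][name]))
    fused = reported[length]["CNN-FF-LSTM"]
    gains[length] = {name: round(fused - reported[length][name], 2)
                     for name in ("CNN", "LSTM")}

print(max_dev < 0.01, gains)
```

The recomputed means agree with the printed averages to within rounding, and the differences reproduce the improvements quoted in the text for each sequence length.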

The invention provides a feature fusion mechanism in which two different neural networks extract features from the one-dimensional sequence data and the two-dimensional image data respectively: the features of the one-dimensional sequence data are extracted using LSTM, and the features of the two-dimensional image data are extracted using CNN. By adding the feature fusion layer, the local features extracted by the CNN and the long-term dependence within the sequence extracted by the LSTM are fused, which achieves complementary fusion of the features of the two networks, so that the data is utilized more fully, the feature extraction is more comprehensive, and the fault diagnosis is more accurate.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. An online fault diagnosis method based on CNN and LSTM depth feature fusion is characterized by comprising the following steps:

the step S1 includes the steps of:

The step S2 includes the steps of:

S2.1, building a convolutional neural network Net_CNN according to the two-dimensional oscillogram data in the training set, as shown in equation (1):

[Net_CNN,Tr_CNN]＝Feedforward(θ_CNN；M_CL,M_pool；SIZE_cl,SIZE_pool；X_2D) (1)

where Feedforward is the function that generates a neural network; M_CL is the number of convolutional layers of the CNN network; M_pool is the number of pooling layers of the CNN network; SIZE_cl represents the convolution kernel size; SIZE_pool represents the pooling kernel size; θ_CNN = {W_CNN, b_CNN} is the set of network parameters, where W_CNN is the weight matrix and b_CNN is the bias vector; X_2D represents the input two-dimensional oscillogram data; the CNN network is trained on the two-dimensional oscillogram data;

F_CNN＝G_CNN(Net_CNN,Tr_CNN,X_2D) (2)

The step S3 includes the steps of:

S3.1, building a long short-term memory network Net_LSTM according to the one-dimensional sequence data in the training set, as shown in equation (3):

[Net_LSTM,Tr_LSTM]＝Feedforward(θ_LSTM；H_LSTM；X_1D) (3)

F_LSTM＝G_LSTM(Net_LSTM,Tr_LSTM,X_1D) (4)

The step S4 includes the steps of:

S4.2, establishing a feature fusion network Net_fusion and training the fusion network parameters to obtain the fused feature F_fusion, as shown in equation (5):

F_fusion＝G_fusion(Net_fusion,Tr_fusion,X_1D,X_2D) (5)

S5, using the fused feature F_fusion of step S4 as the input of the Softmax classifier to perform fault diagnosis classification, as shown in equation (6):

result＝Softmax(F_fusion,θ) (6)

S6, simultaneously adjusting the network parameters of the networks Net_CNN, Net_LSTM, Net_fusion and Softmax;