CN117235490A

CN117235490A - Fault self-adaptive diagnosis method integrating deep convolution and self-attention network

Info

Publication number: CN117235490A
Application number: CN202310101522.XA
Authority: CN
Inventors: 孙磊; 王有杰; 梁中婷
Original assignee: China University of Mining and Technology CUMT
Current assignee: China University of Mining and Technology CUMT
Priority date: 2023-02-10
Filing date: 2023-02-10
Publication date: 2023-12-15

Abstract

The invention discloses a fault self-adaptive diagnosis method integrating deep convolution and a self-attention network, and relates to the field of fault diagnosis. The method first constructs a multi-scale time-frequency feature map and a global statistical feature matrix of vibration signals by wavelet packet transform; it then designs a depth feature extraction network that combines a residual network (ResNet) with a self-attention mechanism (SAM) network to fuse and extract local and global time-frequency features; finally, it constructs a joint loss function that combines the multi-kernel maximum mean discrepancy (MK-MMD) with the domain-adversarial neural network (DANN) to optimize the depth feature extraction network, thereby improving the cross-domain invariance and fault state discrimination capability of the depth features. The proposed optimization method combines the advantages of distribution discrepancy evaluation in a high-dimensional space and the gradient-reversal adversarial strategy. Its effectiveness is demonstrated on transfer fault diagnosis tasks under variable working conditions, where it outperforms other intelligent fault diagnosis methods.

Description

Fault self-adaptive diagnosis method integrating deep convolution and self-attention network

Technical Field

The invention relates to the field of fault diagnosis, in particular to a fault self-adaptive diagnosis method integrating deep convolution and a self-attention network.

Background

Rotating machines play an important role in industries such as manufacturing, aerospace and transportation. Rolling bearings are critical components of rotating machinery and are inevitably damaged by prolonged high-speed operation in complex working environments, which leads to accidents and reduced efficiency. Fault diagnosis of rolling bearings is therefore of great importance for the safe operation of rotating machinery. Traditional bearing fault diagnosis methods are mainly based on time-frequency analysis of vibration signals and on machine learning. However, conventional machine learning methods often require features to be selected on the basis of expert experience, and the choice of features directly affects the accuracy of bearing fault diagnosis. In addition, these methods have shallow structures, which makes it difficult for them to learn efficient feature representations and nonlinear mapping relationships in a complex system.

Compared with conventional machine learning, deep learning has attracted great attention in fault diagnosis in recent years because of its powerful automatic depth feature extraction and its end-to-end training integrated with a classifier. Meanwhile, considering the nonlinear and non-stationary characteristics of vibration signals, some researchers first apply time-frequency processing to the vibration signal and then extract features from the time-frequency representation with a deep learning method. Whereas a convolutional network extracts local features of a signal, the self-attention mechanism has strong global feature extraction capability, and combining it with deep learning allows the fault features of vibration signals to be extracted more effectively.

Fault diagnosis methods using deep learning typically require a large amount of labeled training data and assume that the training dataset and the test dataset have similar distributions. However, it is difficult to ensure that the sample datasets are similarly distributed, because the operating conditions of the equipment (such as mechanical load and speed) change constantly. Domain distribution adaptation methods address transfer across variable working conditions by searching for a space in which the source domain and the target domain have similar distributions, and have become an important tool in the field of fault diagnosis. However, the following problems still require further study:

(1) The time-frequency information of the vibration signals must be organized more reasonably to improve the fault feature extraction capability of the deep learning network.

(2) Common deep convolutional networks perform well at extracting local features of signals but ignore the influence of global time-frequency features. The processing of global time-frequency features therefore requires further investigation.

(3) The domain invariant features must be extracted more effectively to improve the domain adaptation capability of the deep learning network in the diagnostic model.

Disclosure of Invention

The technical problem to be solved by the invention is to meet the current requirements of fault diagnosis for rotating machinery in actual industrial production, to ensure that rotating machinery runs safely and reliably, and to reduce the risk of safety accidents caused by faults. To this end, the invention provides a rolling bearing fault self-adaptive diagnosis method integrating deep convolution and a self-attention network, which meets the high-precision fault classification requirements of fault diagnosis models for rotating machinery.

The invention adopts the following technical scheme for solving the technical problems:

a fault self-adaptive diagnosis method integrating deep convolution and a self-attention network, which specifically comprises the following steps:

step 1, performing time-frequency analysis of the rolling bearing vibration signal by wavelet packet transform (WPT), and establishing a WPT-based multi-scale time-frequency feature map (MTFFM) and global statistical feature matrix (GSFM);

step 2, constructing a depth fusion feature extraction network LRSAN that fuses the lightweight ResNet (LRN) and the self-attention mechanism (SAM) network, the LRSAN being used to fuse and extract local and global time-frequency features;

step 3, constructing a joint loss function from the multi-kernel maximum mean discrepancy (MK-MMD) and the domain-adversarial neural network (DANN) to optimize the depth feature extraction network, so as to reduce the distribution difference between depth feature domains under different working conditions and improve the cross-domain diagnosis capability of the depth transfer learning model LRSADTLM.

As a further preferable scheme of the fault self-adaptive diagnosis method integrating the deep convolution and the self-attention network, in the step 1, vibration signal time-frequency analysis and processing are specifically as follows:

performing time-frequency analysis processing on the vibration signal by adopting WPT, constructing a multi-scale time-frequency feature map MTFFM, and improving the local fault feature extraction capability of the LRN; constructing a feature matrix GSFM by adopting time-frequency statistical features of the vibration signals, and extracting global time-frequency features of the vibration signals;

multi-scale time-frequency feature map MTFFM:

decomposing the vibration signal sample by a four-level WPT to obtain 16 wavelet packet leaf nodes, each of which has 256 wavelet packet coefficients; arranging the wavelet packet coefficients of each leaf node into a 16 × 16 coefficient matrix; and rearranging the 16 coefficient matrices by Z-shaped stitching to obtain a 64 × 64 multi-scale time-frequency feature map MTFFM;

global statistical feature matrix GSFM:

carrying out WPT reconstruction on the 16 wavelet packet leaf nodes to obtain 16 reconstructed branch signals, and calculating the envelope spectra of the 16 branch signals; calculating seven statistical features of each branch signal and of each envelope spectrum, and taking them as the global statistical features of the vibration signal;

specifically, each statistical feature (such as amplitude) of the 16 branch signals is packed into a 16-dimensional vector, giving a statistical feature matrix $A \in \mathbb{R}^{7 \times 16}$; the statistical feature matrix $B \in \mathbb{R}^{7 \times 16}$ of the 16 envelope spectra is obtained in the same way; matrices $A$ and $B$ are concatenated to obtain $\mathrm{GSFM} \in \mathbb{R}^{14 \times 16}$, which is the input to the SAM network.

As a further preferable scheme of the fault self-adaptive diagnosis method integrating deep convolution and self-attention network of the present invention, in step 2, the depth fusion feature extraction network LRSAN is specifically as follows:

because a CNN extracts local features, it has difficulty capturing the intrinsic relationship between the overall trend and the local representation of the multi-scale time-frequency content of the vibration signal;

the feature extraction network consists of an LRN module and a SAM module, which extract LRN-based local time-frequency features and SAM-based global statistical features respectively, and the two sets of features are fused;

1) Lightweight ResNet LRN

Excitation response of defects of rotating machinery is often expressed in vibration signals with different time-frequency periods; the state characteristics of the rolling bearing are scattered in the MTFFM, and the image size of the MTFFM is small; in an LRN network, a convolution kernel of 3×3 is used, the convolution step size is set to 1, the padding value is set to 1 or 0, only three residual structures are used, and the pooling operation of the middle layer is canceled;

2) Self-attention mechanism SAM network:

the input sequence of the SAM network is set as $X_a = [x^1, x^2, \ldots, x^N] \in \mathbb{R}^{L \times d_i}$, where $L$ is the length of the sequence and $d_i$ is the size of each element in the sequence;

the input sequence $X_a$ is embedded by a trainable linear projection $W^P \in \mathbb{R}^{d_i \times d_m}$ into a sequence $X_p$; a randomly initialized $x_0$ is then prepended to $X_p$, and the position encoding $E_{pos} \in \mathbb{R}^{(L+1) \times d_m}$ is added, which yields the input embedding sequence $X \in \mathbb{R}^{(L+1) \times d_m}$;

the SAM network has four layers, each consisting of a multi-head self-attention (MSA) module, a feed-forward layer and a residual normalization layer;

given the input embedding $X \in \mathbb{R}^{(L+1) \times d_m}$, the output sequence $O \in \mathbb{R}^{(L+1) \times d_m}$ of the MSA layer is the feature sequence weighted by the attention matrix; the self-attention mechanism is calculated as follows:

$O = \mathrm{MSA}(X) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^O$

wherein $\mathrm{head}_i$ represents the $i$-th attention head; $W_i^Q \in \mathbb{R}^{d_m \times d_m}$, $W_i^K \in \mathbb{R}^{d_m \times d_m}$ and $W_i^V \in \mathbb{R}^{d_m \times d_m}$ are the linear projections applied to the input embedding $X$ in the $i$-th attention head to obtain the queries, keys and values respectively; and $W^O \in \mathbb{R}^{h \cdot d_m \times d_m}$ is the linear transformation applied to the concatenated heads;

according to the input $\mathrm{GSFM} \in \mathbb{R}^{14 \times 16}$, the structural parameters of the input embedding layer, the position encoding and the MSA network are set as listed in Table 3, and the feed-forward layer consists of two fully connected layers, $\mathrm{FC1} \in \mathbb{R}^{256 \times 128}$ and $\mathrm{FC2} \in \mathbb{R}^{128 \times 256}$; the SAM network thus extracts the global time-frequency features of the vibration signal.

As a further preferable scheme of the fault self-adaptive diagnosis method integrating deep convolution and self-attention network, in step 3, the model optimization strategy based on MK-MMD and DANN is specifically as follows:

the depth transfer learning model LRSADTLM is composed of a depth feature extractor $G_{LRSAN}$, a state classification network $G_y$ and a domain discrimination network $G_d$; to optimize the model, three loss functions are constructed, namely the classification loss $L_y$ of the source domain, the MK-MMD distribution difference loss $L_{MK\text{-}MMD}$ between the source domain and the target domain, and the domain discrimination loss $L_d$; the network parameters are updated by minimizing the joint loss function, which improves performance on transfer diagnosis tasks.

As a further preferable scheme of the fault self-adaptive diagnosis method integrating deep convolution and self-attention network, the classification loss $L_y$ of the source domain, the MK-MMD distribution difference loss $L_{MK\text{-}MMD}$ between the source domain and the target domain, and the domain discrimination loss $L_d$ are specifically as follows:

$L_{MK\text{-}MMD}$: under different working conditions the data distributions are dissimilar, which reduces the accuracy of the model; to reduce the distribution difference and strengthen the ability of the feature extractor to extract domain-invariant features, MK-MMD is selected as the distance measure between the source domain and the target domain to evaluate the distribution shift, and an MK-MMD loss function is constructed; the smaller the value of $L_{MK\text{-}MMD}$, the more similar the two sample distributions; $L_{MK\text{-}MMD}$ is the MK-MMD loss between the depth features obtained by passing the source domain data and the target domain data through the depth feature extractor, and is given by formula (2), where $E$ is the mathematical expectation, the depth features are mapped into a reproducing kernel Hilbert space, and $H_k$ denotes the reproducing kernel Hilbert space with characteristic kernel $k$;

$L_d$: in adversarial training, the domain discriminator $G_d$ identifies whether the sample data belongs to the source domain or the target domain; back-propagation of the loss function $L_d$ serves two purposes, updating the network parameters of $G_{LRSAN}$ and of $G_d$ respectively; the gradient reversal layer (GRL) places the two networks in an adversarial relationship, and they are optimized by back-propagation until a Nash equilibrium is reached; the loss function $L_d$ is given by formula (3);

$L_y$: the rolling bearing state classifier consists of a fully connected layer and a softmax activation function, and outputs the class prediction of the source domain data; cross entropy is used to calculate the fault classification loss $L_y$, which measures the difference between the predicted labels and the true labels of the source domain and is given by formula (4), where $n$ is the number of categories, $M$ is the number of samples, $F$ is the indicator function (0 or 1), and $p_{ic}$ is the predicted probability that sample $i$ belongs to category $c$;

training strategy:

combining the optimization objective functions (2), (3) and (4), the joint optimization objective of LRSADTLM is expressed by formula (5), in which the hyperparameters $\lambda$ and $\mu$ are weight coefficients that balance the losses;

the parameters of LRSADTLM are optimized by back-propagation and stochastic gradient descent, minimizing the joint loss function;

$\theta_f$, $\theta_y$ and $\theta_d$ denote the parameters of $G_{LRSAN}$, $G_y$ and $G_d$ respectively, and are updated according to formula (6), where $\alpha$ is the learning rate.

Compared with the prior art, the technical scheme provided by the invention has the following technical effects:

1. aiming at the problem that existing deep learning methods for rotating machinery tend to ignore the global information of vibration signals, the invention designs a global statistical feature matrix and a multi-scale wavelet packet time-frequency feature map, which organize the time-frequency information of the fault state of rotating machinery more reasonably;

2. aiming at the problem that the traditional deep learning model cannot fully extract effective characteristics of vibration signals, an LRN and SAM network are designed and fused, a deep fusion characteristic extraction network LRSAN is constructed, local and global information of the signals is effectively extracted, and characteristic expression capacity is enhanced;

3. aiming at the problem of insufficient diagnosis capability of a variable working condition model, the invention combines MK-MMD and DANN to evaluate the distribution difference between the depth characteristics of the data of the source domain and the target domain, combines the classification loss of the source domain, completes the optimization of the depth characteristic extraction network and improves the fault diagnosis accuracy of the variable working condition model;

4. the rolling bearing fault self-adaptive diagnosis method integrating deep convolution and self-attention network has good transfer diagnosis capability under variable working conditions, including state identification accuracy, scene adaptation capability and noise immunity.

Drawings

FIG. 1 is a schematic construction diagram of MTFFM and GSFM of the present invention;

FIG. 2 is a flow chart of the depth fusion feature extraction network LRSAN of the present invention;

FIG. 3 is a schematic diagram of the LRSADTLM network structure of the present invention;

FIG. 4 is a schematic view of a convergence curve of classification accuracy and loss in accordance with the present invention;

FIG. 5 shows the t-SNE plots and confusion matrices obtained by models M3-M6 on the transfer learning task P3 -> P2.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The invention discloses a rolling bearing fault self-adaptive diagnosis method integrating deep convolution and self-attention network, which is realized by the following technical scheme:

step 1, vibration signal time-frequency analysis and processing

In the invention, WPT is used for time-frequency analysis of the vibration signal, and a multi-scale time-frequency feature map (MTFFM) is constructed to improve the local fault feature extraction capability of the LRN. Meanwhile, a global statistical feature matrix (GSFM) is constructed from time-frequency statistical features of the vibration signal and is used to extract its global time-frequency features. Taking a vibration signal sample with 4096 sampling points as an example, the time-frequency analysis of vibration signals by WPT is described below, as shown in FIG. 1.

1) Multi-scale time-frequency feature map (MTFFM)

WPT is a time-frequency analysis method that can effectively analyze and process nonlinear and non-stationary vibration signals. In the invention, the vibration signal sample is decomposed by a four-level WPT to obtain 16 wavelet packet leaf nodes, each of which has 256 wavelet packet coefficients. The wavelet packet coefficients of each leaf node are then arranged into a 16 × 16 coefficient matrix. Finally, the 16 coefficient matrices are rearranged by Z-shaped stitching to obtain a 64 × 64 multi-scale time-frequency feature map (MTFFM).
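The four-level decomposition described here can be sketched in plain Python. This is a minimal illustration only: it swaps in the Haar wavelet so that the block needs no external library, whereas the experiments in this document use a Daubechies wavelet, and the function names `haar_split` and `wpt_leaves` are hypothetical. Leaves are returned in natural filter-bank order, not frequency order.

```python
import math

def haar_split(x):
    """Split a sequence into Haar approximation and detail halves (hypothetical helper)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail

def wpt_leaves(signal, levels=4):
    """Full wavelet packet tree: every node is split again at every level."""
    nodes = [list(signal)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_split(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return nodes

# Toy 4096-point signal: two sinusoids standing in for a vibration sample.
signal = [math.sin(0.05 * t) + 0.5 * math.sin(0.9 * t) for t in range(4096)]
leaves = wpt_leaves(signal, levels=4)

# Energy check: an orthonormal transform preserves the sum of squares.
energy_in = sum(v * v for v in signal)
energy_out = sum(v * v for leaf in leaves for v in leaf)
```

With a 4096-point input and four levels, the sketch yields 2^4 = 16 leaf nodes of 4096 / 16 = 256 coefficients each, matching the sizes stated in the text.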

2) Global Statistical Feature Matrix (GSFM)

WPT reconstruction is carried out on the 16 wavelet packet leaf nodes to obtain 16 reconstructed branch signals, and the envelope spectra of the 16 branch signals are calculated. Then seven statistical features are calculated for each branch signal and for each envelope spectrum; their calculation formulas are shown in Table 1 (where x denotes a sequence of length n), and they are taken as the global statistical features of the vibration signal. Specifically, each statistical feature (such as amplitude) of the 16 branch signals is packed into a 16-dimensional vector, giving a statistical feature matrix $A \in \mathbb{R}^{7 \times 16}$. Similarly, the statistical feature matrix $B \in \mathbb{R}^{7 \times 16}$ of the 16 envelope spectra is obtained. Finally, matrices $A$ and $B$ are concatenated to obtain $\mathrm{GSFM} \in \mathbb{R}^{14 \times 16}$, which is the input to the SAM network.

TABLE 1
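The assembly of the GSFM from the two 7 × 16 matrices can be sketched as follows. The formulas of Table 1 are not reproduced in this text, so the seven statistics below (mean, standard deviation, RMS, peak, skewness, kurtosis, crest factor) are placeholder assumptions, and the envelope spectra are taken as precomputed inputs rather than derived here. All function names are hypothetical.

```python
import math

def seven_stats(x):
    """Seven placeholder statistics of one sequence (assumed, not from Table 1)."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs(v) for v in x)
    skew = sum((v - mean) ** 3 for v in x) / n / std ** 3 if std > 0 else 0.0
    kurt = sum((v - mean) ** 4 for v in x) / n / std ** 4 if std > 0 else 0.0
    crest = peak / rms if rms > 0 else 0.0
    return [mean, std, rms, peak, skew, kurt, crest]

def feature_matrix(signals):
    """Rows are statistics, columns are the 16 signals: a 7 x 16 matrix."""
    columns = [seven_stats(s) for s in signals]
    return [[col[r] for col in columns] for r in range(7)]

def build_gsfm(branch_signals, envelope_spectra):
    """Stack matrix A (branch signals) on matrix B (envelope spectra): 14 x 16."""
    return feature_matrix(branch_signals) + feature_matrix(envelope_spectra)

# Toy data: 16 sinusoidal branches; absolute values stand in for envelope spectra.
branches = [[math.sin(0.1 * (i + 1) * t) for t in range(256)] for i in range(16)]
spectra = [[abs(v) for v in b] for b in branches]
gsfm = build_gsfm(branches, spectra)
check = seven_stats([1.0, -1.0, 1.0, -1.0])
```

Each column of the result describes one wavelet packet branch, so the SAM network receives 14 tokens of dimension 16 under this layout.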

Step 2. Deep fusion feature extraction network LRSAN

Because a CNN extracts local features, it has difficulty capturing the intrinsic relationship between the overall trend and the local representation of the multi-scale time-frequency content of the vibration signal. Therefore, based on the self-attention mechanism, the present invention designs a depth fusion feature extraction network, called LRSAN. As shown in FIG. 2, the feature extraction network consists of two modules, an LRN and a SAM, which extract LRN-based local time-frequency features and SAM-based global statistical features respectively; the two sets of features are then fused. The design details of each module are as follows.

1) Lightweight ResNet (LRN)

The excitation response of defects in rotating machinery often appears in the vibration signal at different time-frequency periods, so the state features of the rolling bearing are scattered across the MTFFM. Further, as can be seen from FIG. 1, the image size of the MTFFM is small. For these reasons, unlike conventional deep convolutional residual networks, the present invention designs a lightweight ResNet (LRN), which keeps the network from becoming so complex that the generalization capability of the model is reduced. In the LRN, 3 × 3 convolution kernels are used, the convolution stride is set to 1, the padding is set to 1 or 0, only three residual structures are used, and the pooling operation of the middle layers is removed. The specific structural parameters of the model are shown in Table 2.

TABLE 2
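The effect of the two padding choices on a 64 × 64 MTFFM follows from the usual convolution output-size formula, out = floor((in + 2·padding − kernel) / stride) + 1. The helper below is a small sketch of that arithmetic; the function name is hypothetical, and the actual layer sequence of Table 2 is not assumed.

```python
def conv_out_size(size, kernel=3, stride=1, padding=1):
    """Spatial size after one convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1

same = conv_out_size(64, kernel=3, stride=1, padding=1)    # padding 1 keeps the size
shrunk = conv_out_size(64, kernel=3, stride=1, padding=0)  # padding 0 trims one pixel per side
```

With stride 1 and no intermediate pooling, the feature map shrinks only where padding 0 is used, which suits a small input image.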

2) Self-attention mechanism (SAM) network

The input sequence of the SAM network is set as $X_a = [x^1, x^2, \ldots, x^N] \in \mathbb{R}^{L \times d_i}$, where $L$ is the length of the sequence and $d_i$ is the size of each element in the sequence. The input sequence $X_a$ is embedded by a trainable linear projection $W^P \in \mathbb{R}^{d_i \times d_m}$ into a sequence $X_p$. A randomly initialized $x_0$ is then prepended to $X_p$, and the position encoding $E_{pos} \in \mathbb{R}^{(L+1) \times d_m}$ is added. This yields the input embedding sequence $X \in \mathbb{R}^{(L+1) \times d_m}$.

The SAM network has four layers. Each layer consists of a multi-head self-attention (MSA) module, a feed-forward layer and a residual normalization layer. Given the input embedding $X \in \mathbb{R}^{(L+1) \times d_m}$, the output sequence $O \in \mathbb{R}^{(L+1) \times d_m}$ of the MSA layer is the feature sequence weighted by the attention matrix. The self-attention mechanism is calculated as follows:

$O = \mathrm{MSA}(X) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^O$

where $\mathrm{head}_i$ represents the $i$-th attention head. $W_i^Q \in \mathbb{R}^{d_m \times d_m}$, $W_i^K \in \mathbb{R}^{d_m \times d_m}$ and $W_i^V \in \mathbb{R}^{d_m \times d_m}$ are the linear projections applied to the input embedding $X$ in the $i$-th attention head to obtain the queries, keys and values respectively. $W^O \in \mathbb{R}^{h \cdot d_m \times d_m}$ is the linear transformation applied to the concatenated heads.
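A single attention head can be sketched in plain Python as follows. The sketch assumes the standard scaled dot-product form, softmax(QKᵀ/√d_k)·V, which this text does not spell out; the helper names and the toy 3 × 2 input with identity projections are illustrative assumptions, not the parameters of Table 3.

```python
import math

def matmul(a, b):
    """Plain-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax of one row."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_head(x, w_q, w_k, w_v):
    """One head: project to queries, keys, values, then weight the values."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]  # attention matrix, rows sum to 1
    return matmul(weights, v), weights

# Toy input: 3 tokens of dimension 2, identity projections (assumed values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
eye = [[1.0, 0.0], [0.0, 1.0]]
out, weights = attention_head(x, eye, eye, eye)
```

In a multi-head layer, the outputs of h such heads would be concatenated along the feature axis and multiplied by $W^O$, as in the formula for $O$.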

According to the input $\mathrm{GSFM} \in \mathbb{R}^{14 \times 16}$, the structural parameters of the input embedding layer, the position encoding and the MSA network are set as listed in Table 3, and the feed-forward layer consists of two fully connected layers, $\mathrm{FC1} \in \mathbb{R}^{256 \times 128}$ and $\mathrm{FC2} \in \mathbb{R}^{128 \times 256}$. The SAM network thus extracts the global time-frequency features of the vibration signal.

TABLE 3

Step 3, model optimization strategy based on MK-MMD and DANN

As shown in FIG. 3, LRSADTLM is composed of a depth feature extractor $G_{LRSAN}$, a state classification network $G_y$ and a domain discrimination network $G_d$. To optimize the model, three loss functions are constructed, namely the classification loss $L_y$ of the source domain, the MK-MMD distribution difference loss $L_{MK\text{-}MMD}$ between the source domain and the target domain, and the domain discrimination loss $L_d$. The network parameters are updated by minimizing the joint loss function, which improves performance on transfer diagnosis tasks.

1) Optimization objective

$L_{MK\text{-}MMD}$: under different working conditions the data distributions are dissimilar, which reduces the accuracy of the model. To reduce the distribution difference and strengthen the ability of the feature extractor to extract domain-invariant features, MK-MMD is selected as the distance measure between the source domain and the target domain to evaluate the distribution shift, and an MK-MMD loss function is constructed. The smaller the value of $L_{MK\text{-}MMD}$, the more similar the two sample distributions. $L_{MK\text{-}MMD}$ is the MK-MMD loss between the depth features obtained by passing the source domain data and the target domain data through the depth feature extractor, and is given by formula (2),

where $E$ is the mathematical expectation, the depth features are mapped into a reproducing kernel Hilbert space, and $H_k$ denotes the reproducing kernel Hilbert space with characteristic kernel $k$.
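A minimal sketch of an MK-MMD estimate follows. It assumes the commonly used squared form, the squared $H_k$-norm of the difference between the expected feature maps of the source and target features, with the multi-kernel $k$ taken as a convex combination of Gaussian kernels. The bandwidths, the equal kernel weights and the function names are illustrative assumptions, not values from this document, and the biased estimator is used for brevity.

```python
import math

def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def multi_kernel(a, b, sigmas, betas):
    """Convex combination of Gaussian kernels with weights betas."""
    return sum(beta * gaussian_kernel(a, b, s) for s, beta in zip(sigmas, betas))

def mk_mmd2(source, target, sigmas=(0.5, 1.0, 2.0), betas=None):
    """Biased estimate of the squared MK-MMD between two feature sets."""
    betas = betas or [1.0 / len(sigmas)] * len(sigmas)

    def mean_kernel(xs, ys):
        total = sum(multi_kernel(x, y, sigmas, betas) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (mean_kernel(source, source) + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))

# Toy depth features: the far target set is shifted away from the source.
src = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
tgt_far = [[3.0, 3.0], [4.0, 3.0], [3.0, 4.0]]
same = mk_mmd2(src, src)
far = mk_mmd2(src, tgt_far)
```

The estimate is zero for identical feature sets and grows as the two sets move apart, which is the behaviour the loss relies on.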

$L_d$: in adversarial training, the domain discriminator $G_d$ identifies whether the sample data belongs to the source domain or the target domain. Back-propagation of the loss function $L_d$ serves two purposes, updating the network parameters of $G_{LRSAN}$ and of $G_d$ respectively. The gradient reversal layer (GRL) places the two networks in an adversarial relationship, and they are optimized by back-propagation until a Nash equilibrium is reached. The loss function $L_d$ is given by formula (3).
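The role of the GRL can be illustrated without an autograd framework. The sketch below assumes the usual DANN arrangement: the layer is the identity in the forward pass and multiplies the incoming gradient by a negative coefficient in the backward pass, and the discriminator is trained with a binary cross-entropy loss. The class name, the coefficient and the label convention (probability of "source") are assumptions.

```python
import math

class GradientReversal:
    """Identity forward; scales gradients by -coeff backward (hypothetical class)."""

    def __init__(self, coeff=1.0):
        self.coeff = coeff

    def forward(self, features):
        return list(features)

    def backward(self, grad_output):
        return [-self.coeff * g for g in grad_output]

def domain_loss(p_source, p_target, eps=1e-12):
    """Binary cross-entropy; each p is the predicted probability of 'source'."""
    terms = [-math.log(max(p, eps)) for p in p_source]
    terms += [-math.log(max(1.0 - p, eps)) for p in p_target]
    return sum(terms) / len(terms)

grl = GradientReversal(coeff=0.5)
features = [0.2, -1.0, 3.0]
passed = grl.forward(features)            # unchanged on the way to the discriminator
flipped = grl.backward([1.0, 2.0, -4.0])  # sign-flipped on the way back to the extractor
confused = domain_loss([0.5, 0.5], [0.5, 0.5])
```

When the discriminator cannot tell the domains apart and outputs 0.5 everywhere, the loss equals ln 2, which is the equilibrium the adversarial training pushes towards.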

$L_y$: the rolling bearing state classifier consists of a fully connected layer and a softmax activation function, and outputs the class prediction of the source domain data. Cross entropy is used to calculate the fault classification loss $L_y$, which measures the difference between the predicted labels and the true labels of the source domain and is given by formula (4),

where $n$ is the number of categories, $M$ is the number of samples, $F$ is the indicator function (0 or 1), and $p_{ic}$ is the predicted probability that sample $i$ belongs to category $c$.
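Using the symbols defined here, a cross-entropy loss of the standard form, minus the average over the M samples of the sum over the n categories of F·log(p_ic), can be sketched as follows. The exact normalisation of formula (4) is an assumption, and the function name is hypothetical.

```python
import math

def cross_entropy(probs, labels, eps=1e-12):
    """Mean cross-entropy over M samples; labels are integer class indices."""
    m = len(probs)
    total = 0.0
    for p_i, label in zip(probs, labels):
        for c, p_ic in enumerate(p_i):
            indicator = 1.0 if c == label else 0.0  # F: 1 only for the true class
            total += indicator * math.log(max(p_ic, eps))
    return -total / m

# Three bearing states: a uniform prediction versus a confident correct one.
uniform = cross_entropy([[1 / 3, 1 / 3, 1 / 3]] * 2, [0, 2])
confident = cross_entropy([[0.98, 0.01, 0.01], [0.01, 0.01, 0.98]], [0, 2])
```

A uniform prediction over three classes gives ln 3, and the loss approaches zero as the probability of the true class approaches one.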

2) Training strategy

Combining the optimization objective functions (2), (3) and (4), the joint optimization objective of the LRSADTLM designed by the present invention is expressed by formula (5),

where the hyperparameters $\lambda$ and $\mu$ are weight coefficients that balance the losses. The parameters of LRSADTLM are optimized by back-propagation and stochastic gradient descent, minimizing the joint loss function. Here, $\theta_f$, $\theta_y$ and $\theta_d$ denote the parameters of $G_{LRSAN}$, $G_y$ and $G_d$ respectively. The parameters $\theta_f$, $\theta_y$ and $\theta_d$ are then updated according to formula (6),

where $\alpha$ represents the learning rate.
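How the three losses and the learning rate interact can be sketched as follows. The sketch assumes a weighted sum of the form L_y + λ·L_MK-MMD + μ·L_d, with the gradient reversal layer flipping the sign of the domain-loss gradient that reaches the feature extractor; the exact signs and weighting of formulas (5) and (6) are not reproduced in this text, so this is one conventional arrangement rather than the patent's definition. All names and numbers are illustrative.

```python
def joint_loss(l_y, l_mmd, l_d, lam, mu):
    """Weighted sum of the classification, MK-MMD and domain losses (assumed form)."""
    return l_y + lam * l_mmd + mu * l_d

def extractor_gradient(g_y, g_mmd, g_d, lam, mu):
    """Gradient reaching theta_f: the GRL flips the sign of the domain term."""
    return [a + lam * b - mu * c for a, b, c in zip(g_y, g_mmd, g_d)]

def sgd_step(params, grads, lr):
    """One stochastic gradient descent update: theta <- theta - alpha * grad."""
    return [p - lr * g for p, g in zip(params, grads)]

total = joint_loss(l_y=1.0, l_mmd=0.5, l_d=0.2, lam=2.0, mu=0.5)
g_f = extractor_gradient([1.0], [0.5], [0.2], lam=2.0, mu=0.5)
theta_f = sgd_step([1.0, -2.0], [0.5, 0.5], lr=0.001)
```

Under this arrangement the classifier and discriminator parameters follow plain descent on their own losses, while the extractor descends on the classification and MK-MMD terms and ascends on the domain term.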

Specific examples are as follows:

Step 1, collecting bearing vibration signals in various states under three different operating conditions, N15_M01_F10, N15_M07_F04 and N15_M07_F10, and dividing them into source domain data and target domain data. The source domain data comprises bearing sample data and the corresponding sample labels, whereas the target domain data comprises only bearing sample data without the corresponding labels. The dataset contains three bearing states: inner ring (IR) damage, outer ring (OR) damage and healthy.

Step 2, carrying out wavelet packet decomposition on the sample signals of the training set and the test set to obtain the wavelet packet coefficients of the four-level decomposition of each sample signal. The time-frequency feature map is constructed from vibration signal samples with 4096 sampling points: first, each sample is decomposed by a four-level wavelet packet transform into 16 wavelet packet leaf nodes, each with 256 wavelet packet coefficients; secondly, the wavelet packet coefficients of each leaf node are arranged into a 16 × 16 coefficient matrix; finally, the coefficient matrices of the 16 wavelet packet leaf nodes are stitched in a Z-shaped pattern in order of node number to obtain the MTFFM of the vibration signal sample.
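The stitching step can be sketched as tiling 16 blocks of 16 × 16 into a 64 × 64 grid. The sketch assumes that "Z-shaped" means filling the 4 × 4 grid of blocks left to right and then top to bottom in order of node number, and that each block is filled row by row; both are assumptions about the layout in FIG. 1, and the function name is hypothetical.

```python
def build_mtffm(leaf_coeffs, block=16, grid=4):
    """Tile 16 leaf-node coefficient vectors (256 values each) into a 64 x 64 map."""
    assert len(leaf_coeffs) == grid * grid
    size = block * grid
    fmap = [[0.0] * size for _ in range(size)]
    for node, coeffs in enumerate(leaf_coeffs):
        assert len(coeffs) == block * block
        row0 = (node // grid) * block  # block row in the 4 x 4 grid
        col0 = (node % grid) * block   # block column in the 4 x 4 grid
        for k, value in enumerate(coeffs):
            fmap[row0 + k // block][col0 + k % block] = value
    return fmap

# Toy leaves: node i is filled with the constant i so the tiling is easy to inspect.
toy_leaves = [[float(i)] * 256 for i in range(16)]
mtffm = build_mtffm(toy_leaves)
```

With the toy input, the top-left block holds node 0, the block to its right holds node 1, and the first block of the second block-row holds node 4.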

Step 3, performing WPT reconstruction on the 16 wavelet packet leaf nodes from step 2 to obtain 16 reconstructed branch signals, and calculating the envelope spectra of the 16 branch signals. Then seven statistical features are calculated for each branch signal and for each envelope spectrum; their calculation formulas are shown in Table 1, and they are taken as the global statistical features of the vibration signal. Specifically, each statistical feature (such as amplitude) of the 16 branch signals is packed into a 16-dimensional vector, giving a statistical feature matrix $A \in \mathbb{R}^{7 \times 16}$. Similarly, the statistical feature matrix $B \in \mathbb{R}^{7 \times 16}$ of the 16 envelope spectra is obtained. Finally, matrices $A$ and $B$ are concatenated to obtain $\mathrm{GSFM} \in \mathbb{R}^{14 \times 16}$.

Step 4, in order to better extract the depth features of the MTFFM and GSFM of the source domain and the target domain, a lightweight convolutional residual network (LRN) and a self-attention (SAM) network are designed to extract the local features of the MTFFM and the global features of the GSFM respectively. The depth features extracted by the two sub-networks are concatenated into a combined depth feature; the depth features of the source domain samples are fed into a fully connected layer with a softmax classifier and, together with the source domain label information, the cross-entropy loss $L_y$ is calculated.

Step 5, in order to increase the similarity of the depth feature probability distributions of the source domain data and the target domain data, the MK-MMD and DANN domain adaptation methods are combined to reduce the feature distribution difference between domains and to optimize the model, so that more effective domain-invariant features are extracted. The MK-MMD distance between the depth features of the source domain and target domain samples is calculated to obtain the MK-MMD loss $L_{MK\text{-}MMD}$ of the two depth feature spaces, and the classification loss of the discriminator, i.e. the discrimination loss $L_d$, is calculated.

Step 6, the source domain classification loss $L_y$ is back-propagated to optimize the state classification network $G_y$ and the depth feature extraction network $G_{LRSAN}$; $L_{MK\text{-}MMD}$ is back-propagated to optimize the depth feature extractor $G_{LRSAN}$; and $L_d$ is back-propagated to optimize the domain discrimination network $G_d$ and the depth feature extraction network $G_{LRSAN}$. Steps 4 and 5 are iterated until the number of iterations reaches the target requirement, giving the model parameters that minimize the loss. The trained LRSAN network is then used to extract the depth features of the time-frequency feature maps of the target domain samples and to obtain the class labels of the samples.

1 Description of experimental data

Experimental data were taken from the bearing dataset of Paderborn University (PU). The dataset contains three bearing states: inner ring (IR) damage, outer ring (OR) damage and healthy; each fault category contains bearings damaged by several damage generation methods. As shown in Table 4, the bearing experiments were performed under three operating conditions, N15_M01_F10, N15_M07_F04 and N15_M07_F10, denoted P1, P2 and P3 respectively, with 3120 vibration signal samples for each condition. The bearing codes and fault types used in this experiment are shown in Table 5. Finally, six transfer tasks (P1 -> P2, P1 -> P3, P2 -> P1, P2 -> P3, P3 -> P1, P3 -> P2) were set up in the experiment, where P1 -> P2 means that the model is transferred from P1 (the source domain dataset) to P2 (the target domain dataset).

TABLE 4

TABLE 5

2 Experimental procedure

To verify the transfer diagnosis performance of the proposed bearing fault self-adaptive diagnosis method integrating deep convolution and a self-attention network across different working conditions, five groups of experiments were set up. Different comparison methods were also set up to verify the superiority of the proposed method, as shown in Table 6. These methods fall into two parts. In the first part, the ablation models M0-M5 are designed as comparison methods for LRSADTLM (denoted M6). M0 uses a 7-layer CNN with MK-MMD and DANN. M1-M3 use different ResNet backbone networks as the depth feature extraction network, namely ResNet34, ResNet18 and LRN respectively, with the same transfer method as M6. M4 and M5 use the same depth feature extraction network as M6, with only MK-MMD or only DANN, respectively, as the transfer method. In the second part, the proposed method is compared with other state-of-the-art transfer learning methods: transfer component analysis (TCA), joint distribution adaptation (JDA), deep correlation alignment (D-CORAL), a deep domain adaptation network based on maximum mean discrepancy (D-MMD), and adversarial discriminative domain adaptation (ADDA).

In the experiments, the wavelet packet tool of Matlab was used to process the vibration signals. The mother wavelet is the Daubechies wavelet ('db5' in Matlab), with the decomposition level set to 4. During training, the batch size was set to 200, the number of iterations to 2000, and the learning rate to 0.001.
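The level-4 wavelet packet decomposition mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the Matlab procedure used in the experiments: the Haar filter pair is swapped in for 'db5' so that the example needs only NumPy, the input is a synthetic signal rather than PU bearing data, and the per-band energy is just one possible summary statistic. With the PyWavelets package, `pywt.WaveletPacket(data, 'db5', maxlevel=4)` would be a closer counterpart.

```python
import numpy as np

def haar_wavelet_packet(signal, level=4):
    """Full wavelet packet tree using the Haar filter pair (swapped in
    for 'db5'). Returns an array of shape (2**level, n // 2**level).
    The signal length must be divisible by 2**level."""
    nodes = [np.asarray(signal, dtype=float)]
    for _ in range(level):
        next_nodes = []
        for x in nodes:
            even, odd = x[0::2], x[1::2]
            next_nodes.append((even + odd) / np.sqrt(2.0))  # low-pass branch
            next_nodes.append((even - odd) / np.sqrt(2.0))  # high-pass branch
        nodes = next_nodes
    # Rows are in natural (tree) order, not in increasing-frequency order.
    return np.stack(nodes)

# Synthetic stand-in for a vibration signal (two sinusoids).
fs = 1024
t = np.arange(fs) / fs
vibration = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 300 * t)

coeffs = haar_wavelet_packet(vibration, level=4)  # 16 sub-bands x 64 coefficients
band_energy = np.sum(coeffs ** 2, axis=1)         # one energy value per sub-band
```

A level-4 decomposition yields 2^4 = 16 sub-bands, and because the transform is orthonormal, the sub-band energies sum to the energy of the original signal.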

TABLE 6

The classification accuracy of the LRSADTLM model and the ablation models in the first set of experiments is shown in Table 7. First, the LRSADTLM (M6) proposed by the present invention has the highest average accuracy, 95.28%. In contrast, the highest average accuracy among the other ablation models (M0-M5) is 92.36%, about 3 percentage points lower than M6. Comparing the results of M0, M1, M2 and M3 shows that the LRN designed in the present invention performs better than the classical CNN, ResNet18 and ResNet34 networks. This indicates that increasing the number of ResNet layers causes overfitting on the low-resolution MTFFM and reduces accuracy, which also demonstrates the effectiveness of the LRN architecture designed in the present invention. In the transfer tasks P1->P2, P1->P3, P3->P1 and P3->P2, the classification accuracy of M4 is higher than that of M5, while the opposite holds in the transfer tasks P2->P1 and P2->P3. In all transfer tasks, the results of M6 are better than those of M4 and M5, which shows that fusing the two strategies improves the accuracy of transfer diagnosis. Compared with M3, M6 adds the SAM network and therefore global time-frequency features, which effectively improves classification accuracy under the different transfer tasks.

TABLE 7


In the second set of experiments, FIG. 4 shows the accuracy convergence curves over the first 500 iterations for the four methods M3-M6 under the transfer tasks P1->P2, P2->P1, P2->P3 and P3->P2. The accuracy convergence curve of M5 is the least stable, which corresponds to the process by which DANN approaches Nash equilibrium. The M6 method, which also incorporates the MK-MMD strategy, effectively suppresses this fluctuation, and M6 is superior in accuracy and stability to M4, which uses only MK-MMD. Although the convergence speed and stability of M3 are comparable to those of M6, its diagnostic accuracy is lower. In summary, the M6 method achieves the best convergence result.

Meanwhile, FIG. 4(e)-(h) show the loss convergence curves of M3-M6 under the transfer tasks P1->P2 and P2->P1. The loss curves for MK-MMD and DANN are shown in FIG. 4(e)-(f) and FIG. 4(g)-(h), respectively. Once the curves have stabilized, the loss value of M6 is the smallest and its diagnostic performance is good. Comparing M3 and M6 shows that M6 reaches Nash equilibrium faster, indicating that the proposed LRSAN feature extraction network extracts domain-invariant features more efficiently because it fuses global and local features, which strengthens the representation of the fault. Compared with M4 and M5, the proposed M6 method accelerates loss convergence and, by combining the MK-MMD and DANN domain adaptation methods, effectively suppresses the fluctuation caused by DANN.

In the third set of experiments, the domain adaptation capability of the depth features extracted by M3-M6 is analyzed under the transfer task P3->P2 using the t-SNE method and confusion matrices, as shown in FIG. 5, where 0, 1 and 2 are the class labels of the bearing states. FIG. 5(a) and (d) show that, compared with M3, the depth features extracted by M6 have better class separability: adding the global time-frequency features of the vibration signals effectively reduces confusion among states. FIG. 5(b) and (c) show that the depth features of M4 have better inter-class distances, while the depth features extracted by M5 are more advantageous in intra-class aggregation. The depth features extracted by M6 combine the advantages of M4 and M5 and are better separated between classes. Thus, M6 achieves better classification results than M4 and M5.

In the fourth set of experiments, weight coefficients are used to adjust the relationship between MK-MMD and DANN in the optimization strategy. μ and λ are the weights of MK-MMD and DANN, respectively, as shown in equation (5). Table 8 lists the fault diagnosis results for different values of μ and λ. According to these results, when μ is greater than λ, the influence of the weights on diagnostic accuracy is insignificant; when μ is smaller than λ, diagnostic accuracy decreases as the gap between the two coefficients grows. Together with the convergence curves, this shows that increasing the weight of MK-MMD reduces the distribution difference between the deep features of the source domain and the target domain more quickly and further reduces the fluctuation introduced by DANN, thereby achieving better diagnostic accuracy. Thus, μ=1 and λ=1 are used herein.

TABLE 8

To further verify the effectiveness of the proposed method, a comparative experiment was performed against the existing methods listed in Table 6. The results are shown in Table 9. Compared with the other models, the M6 model has the highest accuracy under every task, and its average accuracy is nearly 5% higher than that of the other five comparison methods. This demonstrates that LRSADTLM is more advantageous for transfer diagnosis across working conditions.

TABLE 9

From the experimental results, the following conclusions can be drawn:

(1) In fault diagnosis, it is critical to make full use of the time-frequency information of vibration signals. The local time-frequency information structure MTFFM and the global time-frequency information structure GSFM of the vibration signal, both constructed using the WPT, effectively express the fault state of the rolling bearing.

(2) The specially designed deep network that fuses the LRN and the SAM shows a remarkable ability to extract fault state features from both global and local information.

(3) LRSADTLM learns domain-invariant features more effectively under the proposed DA strategy that combines MK-MMD and DANN.

The invention provides a rolling bearing fault self-adaptive diagnosis method integrating deep convolution and a self-attention network. The fault identification accuracy and generalization capability of the proposed LRSADTLM method are verified through multiple comparative experiments on the bearing data set of Paderborn University (PU).

The foregoing is a further detailed description of the invention in connection with specific embodiments, and the invention is not to be construed as limited to these embodiments; simple deductions or substitutions made by those skilled in the relevant art without departing from the spirit of the invention shall be regarded as falling within its scope.

Claims

1. A fault self-adaptive diagnosis method integrating deep convolution and a self-attention network, characterized in that the method specifically comprises the following steps:

2. The method for adaptively diagnosing faults of a fused deep convolution and self-attention network as set forth in claim 1, wherein: in step 1, the vibration signal time-frequency analysis and processing are specifically as follows:

a multi-scale time-frequency characteristic diagram MTFFM;

global statistical feature matrix GSFM:

3. The method for adaptively diagnosing faults of a fused deep convolution and self-attention network as set forth in claim 1, wherein: in step 2, the depth fusion feature extraction network LRSAN is specifically as follows:

1) Lightweight ResNet (LRN):

2) Self-attention mechanism (SAM) network:

$$O = \mathrm{MSA}(X) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^O$$

$$Q_i = X W_i^Q, \quad K_i = X W_i^K, \quad V_i = X W_i^V \tag{1}$$

wherein $\mathrm{head}_i$ denotes the $i$-th attention head; $W_i^Q \in \mathbb{R}^{d_m \times d_m}$, $W_i^K \in \mathbb{R}^{d_m \times d_m}$ and $W_i^V \in \mathbb{R}^{d_m \times d_m}$ denote the linear projections of the $i$-th attention head applied to the input embedding $X$, yielding the query, key and value respectively; and $W^O \in \mathbb{R}^{h d_m \times d_m}$ denotes the linear transformation applied to the concatenated heads;
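For illustration only, the multi-head self-attention of equation (1) can be sketched in NumPy with the projection shapes given above. The scaled dot-product form used inside each head (softmax of the query-key product divided by the square root of the key dimension) is the commonly used definition and is an assumption here, since the text above does not spell out how each head is computed; the sizes `n`, `d_m` and `h` and the random weights are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, w_q, w_k, w_v, w_o):
    """x: (n, d_m); w_q, w_k, w_v: (h, d_m, d_m); w_o: (h * d_m, d_m)."""
    heads = []
    for wq_i, wk_i, wv_i in zip(w_q, w_k, w_v):
        q, k, v = x @ wq_i, x @ wk_i, x @ wv_i         # equation (1)
        # Assumed scaled dot-product attention for each head.
        weights = softmax(q @ k.T / np.sqrt(k.shape[-1]))
        heads.append(weights @ v)
    # Concatenate the h heads and apply the output transformation W^O.
    return np.concatenate(heads, axis=-1) @ w_o

# Hypothetical sizes: n tokens, embedding width d_m, h heads.
rng = np.random.default_rng(0)
n, d_m, h = 6, 8, 2
x = rng.normal(size=(n, d_m))
w_q, w_k, w_v = (rng.normal(size=(h, d_m, d_m)) * 0.1 for _ in range(3))
w_o = rng.normal(size=(h * d_m, d_m)) * 0.1

out = multi_head_self_attention(x, w_q, w_k, w_v, w_o)  # shape (n, d_m)
```

Concatenating h heads of width d_m gives a matrix of width h·d_m, which is why the output transformation has h·d_m rows and d_m columns.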

4. The method for adaptively diagnosing faults of a fused deep convolution and self-attention network as set forth in claim 1, wherein: in step 3, the model optimization strategy based on MK-MMD and DANN is specifically as follows:

5. The method for adaptively diagnosing faults of a fused deep convolution and self-attention network as set forth in claim 4, wherein: the optimization objective is as follows:

$L_{\text{MK-MMD}}$: under different working conditions the data distributions are dissimilar, which reduces the accuracy of the model; to reduce this distribution difference and enhance the ability of the feature extractor to extract domain-invariant features, MK-MMD is selected as the distance measure between the source domain and the target domain to evaluate the distribution shift, and an MK-MMD loss function is constructed; the smaller the value of $L_{\text{MK-MMD}}$, the more similar the distributions of the two sample sets; $F_s = \{f_i^s\}_{i=1}^{n}$ and $F_t = \{f_i^t\}_{i=1}^{n}$ respectively denote the depth features of the source domain data and the target domain data after passing through the depth feature extractor; $L_{\text{MK-MMD}}$ denotes the MK-MMD loss between $F_s$ and $F_t$:

Training strategy:

wherein the hyperparameters λ and μ are weight coefficients that adjust the relationship between the losses;

where α represents the learning rate.