CN114239384A

CN114239384A - Rolling bearing fault diagnosis method based on nonlinear measurement prototype network

Info

Publication number: CN114239384A
Application number: CN202111429337.0A
Authority: CN
Inventors: 苏祖强; 吴然然; 韩冷; 张小龙; 姜维龙
Original assignee: Chongqing University of Post and Telecommunications
Current assignee: Chongqing University of Post and Telecommunications
Priority date: 2021-11-29
Filing date: 2021-11-29
Publication date: 2022-03-25

Abstract

The invention relates to the technical field of simulation analysis, and in particular to a rolling bearing fault diagnosis method based on a nonlinear measurement prototype network. The method comprises constructing a cascade attention prototype nonlinear measurement network, performing classification training on the constructed network, preprocessing the data to be diagnosed, inputting the data into the trained cascade attention prototype nonlinear measurement network, and outputting a diagnosis result. The invention extracts feature maps through the prototype calculation module and calculates prototypes from the support set features; in the cascade attention module it splices the query sample features with the prototypes of every class one by one and extracts the long-distance correlation of the spliced samples through the cascade attention mechanism; finally, it inputs the features extracted by the cascade attention module into the nonlinear measurement module, thereby achieving accurate and effective bearing fault diagnosis under small-sample conditions.

Description

Rolling bearing fault diagnosis method based on nonlinear measurement prototype network

Technical Field

The invention relates to the technical field of simulation analysis, in particular to a rolling bearing fault diagnosis method based on a nonlinear measurement prototype network.

Background

The rolling bearing is one of the most critical components of large rotating machinery. It is easily damaged after long-term operation in harsh environments, which can cause the whole unit to malfunction and lead to huge economic losses or casualties. Accurate and intelligent fault diagnosis of rolling bearings is therefore of great significance to both industry and academia.

Rolling bearing fault diagnosis methods based on deep learning have developed rapidly in recent years; they diagnose and identify faults from vibration signals by exploiting the strong feature dimension reduction and pattern recognition capabilities of neural networks. Compared with traditional diagnosis algorithms, deep learning extracts high-dimensional, nonlinear, abstract data features more effectively and recognizes patterns more accurately, without manual feature extraction. Deep learning methods such as the autoencoder (AE), the deep belief network (DBN), the convolutional neural network (CNN) and the deep residual network (DRN) have been widely used for rolling bearing fault diagnosis when sufficient labeled samples are available, and have shown good performance. However, the success of these methods depends largely on large amounts of labeled data. In practical industrial scenarios it is difficult to obtain sufficient labeled fault samples directly, because rolling bearings operate normally for most of their life cycle. The scarcity of labeled fault samples causes fault diagnosis models built with traditional deep learning methods to suffer from overfitting, poor robustness and low diagnosis accuracy. Research on rolling bearing fault diagnosis models for the case of few labeled fault samples therefore has important engineering significance.

Disclosure of Invention

Aiming at the problem that ideal recognition effect is difficult to obtain by a fault diagnosis method based on deep learning due to the scarcity of marked samples in the prior art, the invention provides a rolling bearing fault diagnosis method based on a nonlinear measurement prototype network.

Further, the cascade attention prototype nonlinear metric network comprises a sample set division module, a prototype calculation module, a cascade attention mechanism learning module and a nonlinear metric strategy classification training module, wherein:

dividing the sample set into a support set and a query set by using the sample set division module;

inputting the divided data sets into the prototype calculation module to obtain the feature maps corresponding to the samples in the data sets, and calculating the class prototypes from the feature maps of the support set;

splicing the feature maps of the query set samples with the prototypes of all classes one by one, and extracting the long-distance correlation of the spliced samples by using the cascade attention mechanism learning module;

and inputting the long-distance correlation extracted by the cascade attention mechanism learning module into the nonlinear measurement strategy classification training module for classification training.

Further, inputting the divided data set into the prototype calculation module to obtain the feature map corresponding to each sample in the data set means using a feature extractor $f_\varphi$ to embed each sample x_i in the sample set L into a feature space, represented as:

$$f_\varphi(x_i)$$

for type c faults, the prototype P_c is generated by using the support set S:

$$P_c = \frac{1}{|S_c|} \sum_{(x_i, y_i) \in S,\; y_i = c} f_\varphi(x_i)$$

wherein y_i represents the label of the i-th sample in the support set S, and S_c is the subset of S whose labels equal c.
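The prototype calculation can be illustrated with a minimal NumPy sketch. It assumes the support samples have already been embedded by the feature extractor; the function name `class_prototypes` and the toy feature values are hypothetical and chosen only for illustration.

```python
import numpy as np

def class_prototypes(support_features, support_labels, num_classes):
    """Return one prototype per class: the mean of that class's embedded support samples."""
    feats = np.asarray(support_features, dtype=float)
    labels = np.asarray(support_labels)
    protos = np.zeros((num_classes,) + feats.shape[1:])
    for c in range(num_classes):
        mask = labels == c
        if not mask.any():
            raise ValueError(f"class {c} has no support samples")
        protos[c] = feats[mask].mean(axis=0)
    return protos

# Toy embedded support set (illustrative values): 4 samples, 2-D features, 2 classes.
features = np.array([[0.0, 0.0], [2.0, 2.0], [10.0, 0.0], [12.0, 4.0]])
labels = np.array([0, 0, 1, 1])
prototypes = class_prototypes(features, labels, num_classes=2)
print(prototypes)
```

Each row of `prototypes` is the class mean of the embedded support samples, so adding support samples to a class only changes that class's row.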

Further, the cascade attention mechanism learning module comprises a channel attention submodule and a space attention submodule, and the extracting the long-distance correlation of the spliced sample comprises:

the cascade attention mechanism learning module performs convolution on the input spliced sample and extracts a feature F;

respectively inputting the feature F into the channel attention submodule and the spatial attention submodule, wherein the channel attention submodule adaptively adjusts the feature values among channels and establishes channel dependencies to obtain a channel attention feature F_c';

the spatial attention submodule focuses on the position information of the target sample in the input feature map to obtain a spatial attention feature F_s';

and fusing the channel attention feature F_c' with the spatial attention feature F_s', and adding the fused feature information to the input feature F to obtain the long-distance correlation of the spliced sample.

Further, the channel attention submodule comprises a global average pooling layer, a first convolution block and a second convolution block, each convolution block consisting of a convolution layer, a BN layer and an activation function; the feature F is input into the cascaded global average pooling layer, first convolution block and second convolution block to extract a channel information structure S, and the matrix product of the channel information structure S and the feature F is added to the feature F to form the output of the channel attention submodule.

Further, the channel attention feature F_c' is represented as:

$$F_c' = \left(\sigma\left(W_2\,\gamma\left(W_1 F^{c}_{avg}\right)\right) \otimes F\right) \oplus F$$

wherein $F^{c}_{avg}$ is the channel attention feature map obtained by global average pooling of the feature F; W₁ and W₂ are the weights of the convolution layers in the first convolution block and the second convolution block, respectively; σ(·) is the sigmoid activation function; γ(·) is the ReLU activation function; and $\otimes$ and $\oplus$ denote the matrix multiplication operation and the addition operation, respectively.
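The channel attention computation described above can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the two convolution blocks are modeled as 1 × 1 convolutions (plain matrices `W1` and `W2`) acting on the pooled channel vector, the BN layers are omitted, and the multiplication by S is applied as channel-wise scaling. All names and shapes are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(F, W1, W2):
    """F: (H, W, C) feature map; W1: (C, C_reduced); W2: (C_reduced, C)."""
    pooled = F.mean(axis=(0, 1))            # global average pooling over space -> (C,)
    hidden = np.maximum(pooled @ W1, 0.0)   # first block: reduce channels, ReLU
    S = sigmoid(hidden @ W2)                # second block: restore channels, sigmoid -> (C,)
    return F * S + F                        # scale each channel by S, then add the input F

rng = np.random.default_rng(0)
F = rng.normal(size=(4, 6, 8))              # toy feature map with 8 channels
W1 = rng.normal(size=(8, 2))                # assumed reduction from 8 to 2 channels
W2 = rng.normal(size=(2, 8))
F_c = channel_attention(F, W1, W2)
print(F_c.shape)
```

Because the weight S lies between 0 and 1 and the input is added back, each channel of the output is the input channel scaled by a factor between 1 and 2.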

Further, the spatial attention submodule comprises a third convolution block and a global average pooling layer, the third convolution block consisting of a convolution layer and a BN layer; the feature F is input into the cascaded third convolution block and global average pooling layer to extract a spatial information structure S', and the product of the spatial information structure S' and the input feature F is added to the feature F to obtain the spatial attention feature F_s'.

Further, the spatial attention feature F_s' is represented as:

$$F_s' = \left(\sigma\left(\mathrm{AvgPool}_{ch}\left(W_3 F\right)\right) \otimes F\right) \oplus F$$

wherein $\mathrm{AvgPool}_{ch}(\cdot)$ denotes average pooling along the channel dimension; W₃ is the weight of the convolution layer in the third convolution block; σ(·) is the sigmoid activation function; and $\otimes$ and $\oplus$ denote the matrix multiplication operation and the addition operation, respectively.
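A minimal NumPy sketch of the spatial attention computation follows. It assumes the order given in the prose (convolution block first, then average pooling over channels), omits the BN layer, uses zero padding so the spatial size is preserved, and applies the multiplication by S' as position-wise scaling. The helper `conv2d_same`, the kernel shape and all other names are hypothetical; `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv2d_same(x, kernel):
    """x: (H, W, C_in); kernel: (k, k, C_in, C_out) with odd k; zero padding, stride 1."""
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    windows = sliding_window_view(xp, (k, k), axis=(0, 1))   # (H, W, C_in, k, k)
    return np.einsum("hwcij,ijco->hwo", windows, kernel)     # (H, W, C_out)

def spatial_attention(F, kernel):
    conv_out = conv2d_same(F, kernel)        # convolution block (BN omitted)
    pooled = conv_out.mean(axis=2)           # average pooling over channels -> (H, W)
    S = sigmoid(pooled)                      # spatial attention weight per position
    return F * S[..., None] + F              # scale each position by S', then add F

rng = np.random.default_rng(0)
F = rng.normal(size=(9, 10, 4))              # toy feature map
kernel = rng.normal(size=(7, 7, 4, 4)) * 0.1 # 7 x 7 kernel as in the description
F_s = spatial_attention(F, kernel)
print(F_s.shape)
```

The sketch keeps one attention weight per spatial position, shared by all channels, which is what distinguishes it from the channel attention branch.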

Further, the additional convolution block adopted in the process of carrying out convolution on the input spliced sample by the cascade attention mechanism learning module comprises a convolution layer, a pooling layer, a BN layer and an activation function.

The invention extracts feature maps through the prototype calculation module and calculates prototypes from the support set features; in the cascade attention module it splices the query sample features with the prototypes of every class one by one and extracts the long-distance correlation of the spliced samples through the cascade attention mechanism; finally, it inputs the features extracted by the cascade attention module into the nonlinear measurement module, thereby achieving accurate and effective bearing fault diagnosis under small-sample conditions.

Drawings

FIG. 1 is a flow chart of an embodiment of a rolling bearing fault diagnosis method based on a nonlinear metric prototype network, which is disclosed by the invention;

FIG. 2 is a schematic diagram of a nonlinear metrology prototype network architecture according to the present invention;

FIG. 3 is a diagram of a prototype network architecture;

FIG. 4 is a schematic diagram of a linear metrology structure of a prototype;

FIG. 5 is a schematic view of a non-linear metrology structure in accordance with the present invention;

FIG. 6 is a schematic diagram of a cascade attention mechanism according to the present invention;

FIG. 7 is a schematic diagram of a vibration signal of a rolling bearing collected in a state a by the MFS experimental apparatus of the present invention;

FIG. 8 is a schematic diagram of the vibration signals of the rolling bearing collected in the MFS experimental apparatus of the present invention at state b;

FIG. 9 is a schematic diagram of the vibration signals of the rolling bearing collected in the state c of the MFS experimental apparatus of the present invention;

FIG. 10 is a schematic diagram of the vibration signals of the rolling bearing collected by the MFS experimental apparatus of the present invention at state d;

FIG. 11 is a schematic diagram of vibration signals of a rolling bearing collected in the MFS experimental apparatus of the present invention at state e;

FIG. 12 is a schematic diagram showing the comparison of diagnostic accuracy between different fault diagnosis and identification methods;

FIG. 13 is a schematic diagram of the output of a confusion matrix under the WDCNN fault diagnosis and identification method;

FIG. 14 is a schematic diagram of confusion matrix output under the SiaNet fault diagnosis and identification method;

FIG. 15 is a schematic diagram of confusion matrix output under the RelaNet fault diagnosis and identification method;

FIG. 16 is a schematic diagram of confusion matrix output under the ProNet fault diagnosis and identification method;

FIG. 17 is a schematic diagram of confusion matrix output under the NM-ProNet fault diagnosis and identification method;

FIG. 18 is a schematic diagram of confusion matrix output under the (CANM-ProNet) fault diagnosis and identification method of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The invention provides a rolling bearing fault diagnosis method based on a nonlinear measurement prototype network, which comprises constructing a cascade attention prototype nonlinear measurement network, performing classification training on the constructed network, preprocessing the data to be diagnosed, inputting the data into the trained cascade attention prototype nonlinear measurement network, and outputting a diagnosis result.

As shown in fig. 1, the present invention includes two parts, namely, nonlinear metric prototype network training and fault diagnosis and identification, specifically including:

1. Training the nonlinear metric prototype network: the limited labeled sample set is divided into a training set and a test set, and the training set is further divided into a support set and a query set. The samples are mapped into an embedding space by the prototype network, and the prototype of each class is calculated from the support set. The query samples are spliced with the class prototypes one by one in the embedding space and fed into the cascade attention module to extract non-local information. Finally, the nonlinear metric module measures the similarity between each sample and each prototype so as to improve fault diagnosis performance. Based on these steps, all parameters of the nonlinear metric prototype network are initialized, and training samples are fed in to optimize the network parameters by gradient descent;

2. Fault diagnosis of the rolling bearing under small-sample conditions: the vibration data of the rolling bearing to be diagnosed are processed; the data to be diagnosed are input into the nonlinear metric prototype network trained in process 1; and the trained nonlinear metric prototype network outputs the fault diagnosis result.

The bearing fault diagnosis model based on the nonlinear measurement prototype network performs the following operations:

S11, dividing the sample set: before the sample set is divided, the original vibration signal samples covering C fault categories are normalized into standardized one-dimensional samples; the limited labeled fault samples are then divided into a support set S and a query set, which together form the training sample set L of the nonlinear metric prototype network used in S12-S14;
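Step S11 can be sketched as follows. The text does not state which normalization is used, so per-sample z-score normalization is an assumption here, and the function names, sample counts and random data are hypothetical.

```python
import numpy as np

def zscore(samples):
    """Normalize each 1-D sample (row) to zero mean and unit standard deviation."""
    x = np.asarray(samples, dtype=float)
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, keepdims=True)
    return (x - mean) / np.where(std == 0, 1.0, std)

def split_support_query(labels, n_support, rng):
    """Per class, draw n_support indices for the support set; the rest form the query set."""
    labels = np.asarray(labels)
    support, query = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        support.extend(idx[:n_support].tolist())
        query.extend(idx[n_support:].tolist())
    return np.array(support), np.array(query)

rng = np.random.default_rng(0)
signals = rng.normal(3.0, 2.0, size=(20, 64))   # 20 toy samples, 64 points each
labels = np.repeat(np.arange(5), 4)             # 5 classes, 4 samples per class
normalized = zscore(signals)
support_idx, query_idx = split_support_query(labels, n_support=1, rng=rng)
print(len(support_idx), len(query_idx))
```

Splitting per class keeps every fault category represented in the support set, which the prototype calculation in S12 requires.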

S12, extracting feature maps from the divided data set in the prototype calculation module, and calculating the class prototypes from the support set features; namely:

the embedding module of the nonlinear metric prototype network uses a feature extractor $f_\varphi$ to embed each training sample x_i in the sample set L into a feature space:

$$f_\varphi(x_i)$$

for type c faults, the prototype P_c is generated by using the support set S:

$$P_c = \frac{1}{|S_c|} \sum_{(x_i, y_i) \in S,\; y_i = c} f_\varphi(x_i)$$

where y_i is the label of the i-th sample in the support set S and S_c is the subset of S whose labels equal c.

The prototype network measures the similarity between a sample and a class prototype linearly: a fixed, predefined metric (e.g. the Euclidean distance) directly computes the distance between features. This requires the feature extractor to produce clearly discriminative features as prototype representations, whereas highly recognizable fault features are difficult to extract from mechanical vibration signals when labeled samples are few. Moreover, a fixed linear metric cannot learn the nonlinear relationships between complex signals, which greatly reduces diagnostic performance. To address these shortcomings, the fixed linear metric of the prototype network is replaced with a learnable nonlinear classifier: the class prototypes are spliced with the query sample features, a nonlinear metric is learned by a nonlinear neural network, and a similarity score is assigned to each batch of spliced samples to identify the sample class.
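For contrast, the fixed linear metric of the original prototype network can be sketched as a nearest-prototype rule under the squared Euclidean distance. This illustrates the baseline being replaced, not the method of the invention; the function name and the toy values are hypothetical.

```python
import numpy as np

def euclidean_predict(query_features, prototypes):
    """Assign each query to the class of its nearest prototype (squared Euclidean distance)."""
    q = np.asarray(query_features, dtype=float)
    p = np.asarray(prototypes, dtype=float)
    # Distance between every query (rows) and every prototype (columns).
    dists = ((q[:, None, :] - p[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1), dists

prototypes = np.array([[1.0, 1.0], [11.0, 2.0]])
queries = np.array([[0.5, 1.5], [9.0, 3.0]])
pred, dists = euclidean_predict(queries, prototypes)
print(pred)   # [0 1]
```

The distance function here has no trainable parameters, so all of the discriminative power must come from the feature extractor, which is the limitation the nonlinear metric is meant to address.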

And S13, splicing the characteristics of the query sample with various types of prototypes one by one based on the calculated prototypes, and extracting the long-distance correlation of the spliced sample by adopting a cascade attention module.

A query sample is spliced with the prototype features of each of the C classes, and each spliced sample l(x_i) is input into the convolution block of the cascade attention module, which performs preliminary feature extraction on the spliced features to obtain a feature map $F \in \mathbb{R}^{H \times W \times C}$, where H × W × C denotes the height, width and number of channels of the feature map. In the cascade attention module, the feature map F flows into the channel attention module and the spatial attention module, respectively. The channel attention module adaptively adjusts the feature values among channels and establishes channel dependencies, yielding the channel attention feature F_c'; the spatial attention module focuses on the position information of the target sample in the input feature map and ignores unimportant target features, yielding the spatial attention feature F_s'. Finally, the channel attention feature F_c' and the spatial attention feature F_s' are fused, and the fused feature information is added to the input feature F to extract the important features of the spliced sample.
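The one-by-one splicing of a query feature map with every class prototype can be sketched as below. The text does not state the splicing axis; concatenation along the channel axis of channels-last 1-D feature maps is an assumption, and the function name and shapes are hypothetical.

```python
import numpy as np

def splice_query_with_prototypes(query_feature, prototypes):
    """Pair one query feature map with every class prototype.

    query_feature: (length, channels)
    prototypes:    (num_classes, length, channels)
    returns:       (num_classes, length, 2 * channels)
    """
    q = np.asarray(query_feature, dtype=float)
    p = np.asarray(prototypes, dtype=float)
    tiled = np.broadcast_to(q, p.shape)           # repeat the query once per class
    return np.concatenate([tiled, p], axis=-1)    # splice along the channel axis

rng = np.random.default_rng(0)
query = rng.normal(size=(32, 8))                  # toy query feature map
prototypes = rng.normal(size=(5, 32, 8))          # one prototype per class, 5 classes
spliced = splice_query_with_prototypes(query, prototypes)
print(spliced.shape)   # (5, 32, 16)
```

Each query therefore produces one spliced sample per class, and the feature dimension of every spliced sample is twice that of the query feature map.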

(1) The channel attention module. In the channel attention module, a global average pooling operation first compresses the feature F along the spatial dimensions, aggregating the spatial information of the feature map to generate a channel attention feature map $F^{c}_{avg}$. Two convolution blocks then extract the nonlinear relationships among the channels: the channel dimension is first reduced and then restored, and an activation function yields the channel attention weight S. The internal network structure of the channel attention is shown in FIG. 6, where CPBA denotes a convolution layer, pooling layer, BN layer and activation function, and CBA denotes a convolution layer, BN layer and activation function. The input feature F is then matrix-multiplied by the channel information structure S, and the result is fused with F to obtain the channel-attention-weighted feature F_c'. The final output of the channel attention module is:

$$F_c' = \left(\sigma\left(W_2\,\gamma\left(W_1 F^{c}_{avg}\right)\right) \otimes F\right) \oplus F$$

where $F^{c}_{avg}$ is the channel attention feature map obtained by global average pooling of F; W₁ and W₂ are the weights of the two convolutions in the CBA blocks; σ(·) and γ(·) are the sigmoid and ReLU activation functions, respectively; and $\otimes$ and $\oplus$ denote the matrix multiplication operation and the addition operation, respectively.

(2) The spatial attention module. In the spatial attention module, a convolution layer first extracts information from the feature F, and the output features of the convolution layer are fused across channels to obtain a spatial attention feature map. An activation function is then used to obtain the spatial attention weight S'. The spatial attention network structure is shown in FIG. 6, where CB denotes a convolution layer and a BN layer. The input feature F is then matrix-multiplied by the spatial information structure S', and the result is fused with F to obtain the spatial-attention-weighted feature F_s'. The final output of the spatial attention module is:

$$F_s' = \left(\sigma\left(\mathrm{AvgPool}_{ch}\left(W_3 F\right)\right) \otimes F\right) \oplus F$$

where $\mathrm{AvgPool}_{ch}(\cdot)$ denotes average pooling along the channel dimension; W₃ is the weight of the 7 × 7 convolution in CB; σ(·) is the sigmoid activation function; and $\otimes$ and $\oplus$ denote the matrix multiplication operation and the addition operation, respectively.

S14, inputting the features extracted by the attention module into the nonlinear metric module to achieve effective few-shot learning (FSL) bearing fault diagnosis. The spliced samples are fed to the nonlinear metric module; through a series of successive network-layer mappings, the module finally outputs, via softmax, C scalars V_{j,r} with values between 0 and 1. V_{j,r} represents the similarity between the j-th query sample and the prototype of the r-th class, i.e. the probability that the query sample belongs to that class. The difference between the linear metric and the nonlinear metric of the prototype network is shown in FIGS. 4 and 5.
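The scoring step can be sketched with a small stand-in network. The text does not give the layer structure of the nonlinear metric module, so the two-layer perceptron below, the flattening of the spliced features and all parameter names and shapes are assumptions made purely for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                 # subtract the maximum for numerical stability
    e = np.exp(z)
    return e / e.sum()

def nonlinear_metric_scores(spliced, W1, b1, W2, b2):
    """spliced: (num_classes, feature_dim), one flattened spliced sample per class.

    Returns num_classes similarity scores in [0, 1] that sum to 1.
    """
    hidden = np.maximum(spliced @ W1 + b1, 0.0)   # learnable nonlinear mapping (ReLU)
    logits = (hidden @ W2 + b2).ravel()           # one scalar per (query, prototype) pair
    return softmax(logits)

rng = np.random.default_rng(0)
spliced_flat = rng.normal(size=(5, 16))           # toy query spliced with 5 prototypes
W1, b1 = rng.normal(size=(16, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), 0.0
scores = nonlinear_metric_scores(spliced_flat, W1, b1, W2, b2)
predicted_class = int(scores.argmax())
print(scores, predicted_class)
```

Unlike a fixed distance, the weights `W1` and `W2` are trainable, so the comparison function itself is learned from the spliced samples.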

In order to improve the accuracy of the classifier, the network model is trained by minimizing the classification loss between the query samples and the class prototypes of the support set, with the mean square error as the loss function. The mean square error L_MSE is calculated from the similarity probability values V_{j,r} output above, the label y_j of the query sample and the label y_r of the class prototype:

$$L_{MSE} = \frac{1}{N_Q C} \sum_{j=1}^{N_Q} \sum_{r=1}^{C} \left(V_{j,r} - \mathbb{1}(y_j = y_r)\right)^2$$

where N_Q is the number of query samples and $\mathbb{1}(\cdot)$ equals 1 when the two labels match and 0 otherwise. Finally, the network model is trained by minimizing L_MSE.
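The mean square error between the similarity scores and the label-match targets can be sketched as follows. Averaging over all query and prototype pairs is an assumption (a plain sum would have the same minimizer); the function name and the toy scores are hypothetical.

```python
import numpy as np

def mse_similarity_loss(scores, query_labels, prototype_labels):
    """scores: (num_queries, num_classes) similarity values V[j, r].

    The target is 1 where the query label equals the prototype label, else 0.
    """
    scores = np.asarray(scores, dtype=float)
    q = np.asarray(query_labels)[:, None]
    p = np.asarray(prototype_labels)[None, :]
    targets = (q == p).astype(float)
    return ((scores - targets) ** 2).mean()

scores = np.array([[0.9, 0.1],
                   [0.5, 0.5]])
loss = mse_similarity_loss(scores, query_labels=[0, 1], prototype_labels=[0, 1])
print(loss)   # approximately 0.13
```

The loss is zero only when every query scores 1 against the prototype of its own class and 0 against all others.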

If the class prototype is spliced with the feature map of the query sample and the result is input directly into the nonlinear metric network, the long-distance correlation of the spliced sample, whose feature dimension is doubled, cannot be captured because of the limited receptive field. A cascade attention mechanism is therefore used to extract the long-distance correlation of the spliced sample, so that the nonlinear metric module can better extract the nonlinear relationship between the sample and the prototype.

The identification process of fault diagnosis identification comprises the following steps:

S21, processing the data to be diagnosed;

S22, inputting the data to be diagnosed into the nonlinear metric prototype network, which outputs the fault diagnosis result;

the nonlinear metric prototype network is a small-sample supervised learning model and mainly comprises a prototype calculation module, a cascade attention module and a nonlinear metric module.

In order to verify the effectiveness of the disclosed fault diagnosis and identification method, a comparison test was carried out using vibration signals from a Machine Fault Simulator (MFS). The experiment simulated 5 health states of the rolling bearing and collected the Y-axis vibration signal of the belt-end bearing of the simulator at a rotation frequency of 44 Hz, with a sampling frequency of 10240 Hz. Each set of health state data was collected 6 times. After the vibration signals of the five states were obtained, they were preprocessed: each vibration record of length 102400 was divided into 25 samples of 4096 data points, so that the number of samples per class is 25 × 6 = 150. The original vibration waveforms of the five states are shown in FIGS. 7-11, and their health states are listed in Table 1 below. To reflect fault diagnosis with few labeled samples in practical applications, all samples were randomly divided into a training set and a test set, with 20% of the samples used for training and the remaining 80% for testing. Four training-set sizes were used, with 5, 10, 15 and 20 labeled samples per class, referred to as 5-way 5/10/15/20-shot; for each shot setting, six methods are shown from left to right, namely WDCNN, SiaNet, RelaNet, ProNet, NM-ProNet and the method of the invention (CANM-ProNet).
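The sample counts in this paragraph can be checked with a short sketch. It assumes non-overlapping segmentation, which is consistent with 102400 = 25 × 4096; the function name is hypothetical and the signal is a placeholder array rather than measured data.

```python
import numpy as np

def segment_signal(signal, sample_length):
    """Cut a 1-D record into consecutive, non-overlapping samples of equal length."""
    signal = np.asarray(signal)
    n = len(signal) // sample_length
    return signal[: n * sample_length].reshape(n, sample_length)

recording = np.arange(102400)                    # placeholder for one vibration record
samples = segment_signal(recording, 4096)        # shape (25, 4096)

repeats_per_state = 6
samples_per_class = samples.shape[0] * repeats_per_state   # 25 * 6 = 150
n_train = samples_per_class * 20 // 100                    # 20 % for training -> 30
n_test = samples_per_class - n_train                       # remaining 80 % -> 120
print(samples.shape, samples_per_class, n_train, n_test)
```

The resulting 120 test samples per class match the per-class count reported for the confusion matrices.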

TABLE 1 health status of rolling bearings

Table 2 data set description

Based on the above sample set, the method proposed by the invention (CANM-ProNet) is compared with five other methods: WDCNN; the Siamese network (SiaNet) described in "Siamese Neural Networks for One-shot Image Recognition" (32nd International Conference on Machine Learning, 2015); the relation network (RelaNet) described in "Learning to Compare: Relation Network for Few-Shot Learning" (CVPR, 2018); the prototype network (ProNet) described in "Prototypical Networks for Few-shot Learning" (NIPS, 2017); and the present method without the cascade attention (NM-ProNet). To ensure a fair experiment, the six methods use the same feature extractor and the same hyperparameters and receive the same training and test samples for each batch of data; a comprehensive evaluation is made over ten batches of data by repeating the experiment.

The structural parameters of the feature extractor are shown in Table 3 below:

TABLE 3 parameters of the network layer

In order to reduce the influence of randomness of experimental data on experimental results, ten random experiments were performed, and the experimental results are shown in fig. 12 and table 4 below:

table 4 comparative experimental results

As is apparent from the table, the fault diagnosis performance of the conventional deep learning method WDCNN is not ideal, mainly because a small amount of training data cannot sufficiently reflect the true distribution of the data in a high-dimensional space. As the number of training samples increases, however, the accuracy of WDCNN improves greatly: with 20 training samples per class, the fault diagnosis recognition rate is about 24% higher than with 5 samples per class. The three FSL methods in the table, SiaNet, RelaNet and ProNet, diagnose significantly better than the conventional deep learning method, because all three acquire knowledge from small samples through similarity calculation and class expansion. Of these three FSL methods, ProNet has the best overall recognition, with average improvements over WDCNN of about 7%, 6% and 4% in the cases of 5-shot, 10-shot, 15-shot and 20-shot, respectively. This shows that ProNet better improves the classification accuracy for rolling bearings by fitting the center of the data distribution with prototypes. It can also be seen that the accuracy gain of the FSL methods gradually decreases as the number of training samples increases. Among the improved methods based on the prototype network, NM-ProNet uses a nonlinear metric strategy to judge whether spliced samples belong to the same class. When the number of training samples per class increases from 5 to 20, the recognition accuracy of NM-ProNet rises from 81% to 91%, about 16%, 11%, 5% and 5% higher than the original ProNet. This shows that, within the prototype network, a nonlinear metric strategy greatly improves fault diagnosis performance compared with the linear metric, mainly because a fixed similarity metric function cannot update the network model by learning additional parameters and easily overfits, whereas the nonlinear metric uses a learnable similarity metric function and thus classifies better. However, as the feature dimension of the spliced sample increases, the nonlinear metric module cannot capture long-distance correlation because of the limited receptive field, which hampers the extraction of fault features from complex vibration signals. Comparing NM-ProNet and CANM-ProNet in the table, the 5-shot, 10-shot, 15-shot and 20-shot cases improve by about 3%, 4% and 2%, respectively, which shows that adding an attention module to obtain the long-distance correlation of the spliced sample makes the features better suited to the nonlinear metric and improves fault diagnosis performance. The proposed CANM-ProNet therefore achieves the best test classification among the compared methods.

To compare the per-class classification of the methods in more detail, FIGS. 13-18 show the confusion matrices of the WDCNN, SiaNet, RelaNet, ProNet, NM-ProNet and CANM-ProNet methods at 5-shot. There are 120 test samples per class and 5 fault types in total. The overall classification results show that the misclassifications of these methods are concentrated mainly on IF and BF; the original signal waveforms in FIGS. 7-11 show that the IF and BF waveforms are somewhat similar, which makes the two classes difficult to separate completely. As can be seen from FIG. 13, the recognition performance of WDCNN is very poor: more than half of the BF samples are misclassified, and most of the wrong labels fall on IF. This indicates that WDCNN cannot learn the sample features well enough to classify when only 5-shot labeled samples are used for network training, and that WDCNN is not suitable for fault diagnosis under small samples. As can be seen from FIGS. 14 to 16, the three methods classify better than WDCNN, although they differ little from one another overall, which indicates that they are effective for fault diagnosis under FSL. As shown in FIG. 18, the method provided herein clearly improves classification over the comparison methods, strengthens the distinction between IF and BF, and also improves on the method of FIG. 17 to a certain extent. This is because the method judges the similarity of the spliced samples better through the nonlinear metric and at the same time uses cascade attention to calculate the long-distance correlation of the spliced samples, obtaining more discriminative features and hence higher recognition accuracy for every class. Compared with the other methods, the proposed CANM-ProNet therefore achieves the best small-sample fault diagnosis accuracy.
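A confusion matrix of the kind discussed above can be computed as in the following sketch. The labels are toy values chosen for illustration and are not experimental results; the function name is hypothetical.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Rows index the true class, columns the predicted class."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (np.asarray(true_labels), np.asarray(predicted_labels)), 1)
    return cm

# Toy labels (illustrative only).
true_labels = [0, 0, 1, 1, 2]
predicted_labels = [0, 1, 1, 1, 0]
cm = confusion_matrix(true_labels, predicted_labels, num_classes=3)
per_class_recall = cm.diagonal() / cm.sum(axis=1)
print(cm)
print(per_class_recall)
```

Off-diagonal entries show which classes are confused with one another, which is how the concentration of errors between two fault types becomes visible.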

Aiming at application scenarios in which labeled fault data are scarce, the invention provides an improved FSL method for rolling bearing fault diagnosis models, called the cascade attention and nonlinear metric improved prototype network (CANM-ProNet). First, the prototype calculation module extracts the feature maps of the support set and the query set and calculates the prototypes from the feature maps of the support set. The query feature map is then concatenated with each prototype, and a cascade attention module is introduced to extract the non-local information of the concatenated features. Finally, a nonlinear metric module measures the similarity between the samples and the prototypes to improve fault diagnosis performance. The experiments show that this method is more effective than the other methods when fault samples are few.

Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. A rolling bearing fault diagnosis method based on a nonlinear measurement prototype network, characterized by comprising constructing a cascade attention prototype nonlinear measurement network, performing classification training on the constructed network, preprocessing the data to be diagnosed, inputting the data into the trained cascade attention prototype nonlinear measurement network, and outputting a diagnosis result.

2. The rolling bearing fault diagnosis method based on the nonlinear measurement prototype network according to claim 1, wherein the cascade attention prototype nonlinear measurement network comprises a sample set division module, a prototype calculation module, a cascade attention mechanism learning module and a nonlinear measurement strategy classification training module, wherein:

3. The rolling bearing fault diagnosis method based on the nonlinear metric prototype network according to claim 2, characterized in that inputting the divided data sets into the prototype calculation module to obtain the feature map corresponding to each sample in the data sets means using a feature extractor $f_\varphi$ to embed each sample x_i in the sample set L into a feature space, represented as:

$$f_\varphi(x_i)$$

for type c faults, the prototype P_c is generated by using the support set S:

$$P_c = \frac{1}{|S_c|} \sum_{(x_i, y_i) \in S,\; y_i = c} f_\varphi(x_i)$$

wherein y_i represents the label of the i-th sample in the support set S, and S_c is the subset of S whose labels equal c.

4. The rolling bearing fault diagnosis method based on the nonlinear metric prototype network according to claim 1, wherein the cascade attention mechanism learning module comprises a channel attention submodule and a space attention submodule, and the extracting of the long-distance correlation of the spliced sample comprises:

5. The rolling bearing fault diagnosis method based on the nonlinear metric prototype network according to claim 4, wherein the channel attention submodule comprises a global average pooling layer, a first convolution block and a second convolution block, each convolution block consisting of a convolution layer, a BN layer and an activation function; the feature F is input into the cascaded global average pooling layer, first convolution block and second convolution block to extract a channel information structure S, and the matrix product of the channel information structure S and the feature F is added to the feature F to form the output of the channel attention submodule.

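6. The rolling bearing fault diagnosis method based on the nonlinear metric prototype network according to claim 5, wherein the channel attention feature F_c' is represented as:

$$F_c' = \left(\sigma\left(W_2\,\gamma\left(W_1 F^{c}_{avg}\right)\right) \otimes F\right) \oplus F$$

wherein $F^{c}_{avg}$ is the channel attention feature map obtained by global average pooling of the feature F; W₁ and W₂ are the weights of the convolution layers in the first convolution block and the second convolution block, respectively; σ(·) is the sigmoid activation function; γ(·) is the ReLU activation function; and $\otimes$ and $\oplus$ denote the matrix multiplication operation and the addition operation, respectively.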

7. The rolling bearing fault diagnosis method based on the nonlinear metric prototype network according to claim 4, wherein the spatial attention submodule comprises a third convolution block and a global average pooling layer, the third convolution block consisting of a convolution layer and a BN layer; the feature F is input into the cascaded third convolution block and global average pooling layer to extract a spatial information structure S', and the product of the spatial information structure S' and the input feature F is added to the feature F to obtain the spatial attention feature F_s'.

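8. The rolling bearing fault diagnosis method based on the nonlinear metric prototype network according to claim 7, wherein the spatial attention feature F_s' is represented as:

$$F_s' = \left(\sigma\left(\mathrm{AvgPool}_{ch}\left(W_3 F\right)\right) \otimes F\right) \oplus F$$

wherein $\mathrm{AvgPool}_{ch}(\cdot)$ denotes average pooling along the channel dimension; W₃ is the weight of the convolution layer in the third convolution block; σ(·) is the sigmoid activation function; and $\otimes$ and $\oplus$ denote the matrix multiplication operation and the addition operation, respectively.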

9. The rolling bearing fault diagnosis method based on the nonlinear metric prototype network according to claim 7, wherein the additional convolution blocks adopted in the convolution process of the cascaded attention mechanism learning module on the input spliced sample comprise a convolution layer, a pooling layer, a BN layer and an activation function.