CN117077815A - Bearing fault diagnosis method based on deep learning under limited sample - Google Patents
- Publication number
- CN117077815A (application CN202311323130.4A)
- Authority
- CN
- China
- Prior art keywords
- deep learning
- representing
- self
- data
- convolution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01M—TESTING STATIC OR DYNAMIC BALANCE OF MACHINES OR STRUCTURES; TESTING OF STRUCTURES OR APPARATUS, NOT OTHERWISE PROVIDED FOR
- G01M13/00—Testing of machine parts
- G01M13/04—Bearings
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01M—TESTING STATIC OR DYNAMIC BALANCE OF MACHINES OR STRUCTURES; TESTING OF STRUCTURES OR APPARATUS, NOT OTHERWISE PROVIDED FOR
- G01M13/00—Testing of machine parts
- G01M13/04—Bearings
- G01M13/045—Acoustic or vibration analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a bearing fault diagnosis method based on deep learning under a limited sample, which relates to the technical field of bearing fault diagnosis and comprises the following steps: dividing multi-source sensor data acquired from equipment into training set data and test set data, and then preprocessing the data to enhance the feature representation of the data from each source sensor; establishing labels for the preprocessed data; constructing an end-to-end deep learning prediction model; inputting the training set data into the deep learning prediction model to train it, obtaining a trained deep learning prediction model; and inputting the test set data into the trained deep learning prediction model to determine the model effect. The bearing fault diagnosis method based on deep learning under the limited sample can mine potential feature information in the data more deeply, and solves the problem of low fault classification accuracy under limited-sample conditions.
Description
Technical Field
The invention relates to the technical field of bearing fault diagnosis, in particular to a bearing fault diagnosis method based on deep learning under a limited sample.
Background
The development of modern industry has increased the complexity of mechanical equipment, and the failure of any component can lead to serious accidents, causing great economic losses and even personal injury. Prognostics and Health Management (PHM) can provide a guarantee for the safe operation of equipment. The rolling bearing plays a critical role in rotating machinery, and its overall health is critical to maintaining the performance, stability, and life of the machine. It is necessary to monitor the condition of rolling bearings under different loads to avoid any possible damage. Advanced fault diagnosis techniques are therefore required to ensure operational reliability and real-time health monitoring. With the deepening development of fault classification and identification technology, scholars have proposed various intelligent fault identification methods. Conventional machine learning methods typically employ time-domain, frequency-domain, and time-frequency-domain analysis as the primary techniques for feature extraction, followed by fault identification with models trained using Artificial Neural Networks (ANNs), Random Forests (RF), and Support Vector Machines (SVMs). Although these fault diagnosis methods can effectively solve the problem, the analysis process requires researcher expertise and the processing is cumbersome. With the continued development of technology and demand, the speed and accuracy of these methods must improve, and deep learning is favored for its powerful automatic feature extraction capability. Generally, a deeper network extracts more features, but classification accuracy is not proportional to the number of layers when a convolutional neural network is trained very deep.
Disclosure of Invention
The invention aims to provide a bearing fault diagnosis method based on deep learning under a limited sample, which can deeply mine potential characteristic information in data and solve the problem of low fault classification precision under the condition of the limited sample.
In order to achieve the above object, the present invention provides a bearing fault diagnosis method based on deep learning under a limited sample, comprising the steps of:
S1, using sensor data collected from the equipment under different rotation speeds, different loads, and different radial forces, covering different fault types and different fault positions, as the raw data; dividing the raw data into training set data and test set data; performing Fourier transformation on the collected raw data followed by normalization; and performing data enhancement on the training set data;
S2, constructing an end-to-end deep learning prediction model;
S3, inputting the training set data of step S1 into the deep learning prediction model to train it, so as to obtain a trained deep learning prediction model;
S4, inputting the test set data of step S1 into the deep learning prediction model trained in step S3, and determining the model effect.
Preferably, in step S1, the data enhancement method for the training set data includes: random scaling, random clipping, and random addition of Gaussian noise.
Preferably, in step S1, the Fourier transform takes the first 1024 points of the frequency domain, expressed as:

$$x_f = \left|\,FFT(x)\,\right|_{1:1024}$$

where $FFT$ represents the Fourier transform and the subscript denotes taking the first 1024 frequency-domain points.
Preferably, in step S2, constructing the end-to-end deep learning prediction model includes: combining a self-calibration attention module with the feature extraction module, and combining group normalization GN with batch normalization BN, so that their combination improves the performance of the deep learning prediction model.
Preferably, the batch normalization BN is calculated along the channel axis direction, and its expression is as follows:

$$\mu = \frac{1}{m}\sum_{i=1}^{m} x_i$$

$$\sigma^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_i-\mu\right)^2$$

$$\hat{x}_i = \frac{x_i-\mu}{\sqrt{\sigma^2+\varepsilon}}$$

$$y_i = \gamma\hat{x}_i+\beta$$

where $x_i$ and $y_i$ represent the input and output features of the $i$-th observation; $\gamma$ and $\beta$ are two trainable parameters for adjusting the distribution; $\varepsilon$ is a constant that tends to zero; $m$ represents the mini-batch size; $\mu$ represents the mean; $\sigma^2$ represents the variance; $\hat{x}_i$ represents the normalized data; $y_i$ represents the output feature;
the group normalization GN is calculated along the group axis direction, and the normalized set $S_i$ of GN is expressed as follows:

$$S_i = \left\{k \,\middle|\, k_N = i_N,\ \left\lfloor \frac{k_C}{C/G} \right\rfloor = \left\lfloor \frac{i_C}{C/G} \right\rfloor \right\}$$

where $G$ is the number of groups and $C$ is the number of channels; $C/G$ represents the number of channels per group; $\lfloor\cdot\rfloor$ is a downward rounding operation; $i$ and $k$ are index numbers; $i_N$ and $k_N$ index along the group (batch) axis; $i_C$ and $k_C$ index the channels within each group.
Preferably, the self-calibration attention module improves the fault feature extraction capability of the model under small-sample conditions by performing convolution feature conversion in two different scale spaces: the original scale space and a self-calibration scale space with smaller resolution. The self-calibration convolution comprises the following steps:

S31, splitting the input feature X into two equal parts by channel number, denoted $X_1$ and $X_2$; splitting the convolution kernel K into 3 parts of the same dimension, denoted $K_1$, $K_2$, $K_3$;

S32, processing the original scale feature space: the input feature $X_1$ is convolved by $K_1$ to obtain the feature $Y_1$;

S33, processing the self-calibration scale space: the feature $X_2$ is input into two channels with different resolutions; the resolution is reduced by a factor of 4 along the width direction of the input feature, and the input feature $X_2$ is downsampled by average pooling into a low-dimensional embedding that calibrates the convolution transformation of the high-resolution branch;

S34, the extracted features are convolved and upsampled, then passed through the ReLU function; the ReLU result is combined with the features extracted by the $K_3$ convolution for calibration, giving the output feature $Y_2$ of the self-calibration part. The operations are as follows:

$$T = AvgPool_r\left(X_2\right)$$

$$X_2' = \delta\left(Up\left(T * K_2\right) + X_2\right)$$

$$Y_2 = \left(X_2 * K_3\right) \odot X_2'$$

where $AvgPool_r$ represents average pooling with downsampling rate $r$; $Up(\cdot)$ represents upsampling; $\delta$ represents the ReLU activation function, which increases the nonlinearity of the model; $T$ and $X_2'$ represent intermediate output features;

S35, the two scale-space output features $Y_1$ and $Y_2$ are fused, and the final output feature Y is obtained through group normalization GN and a GELU activation function calculation.
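As a non-limiting illustration, steps S31 to S35 can be sketched for a single channel pair with NumPy. The multiplicative ReLU gate is one reading of the calibration step described above, and the kernels and lengths here are illustrative assumptions, not the patented configuration.

```python
import numpy as np

def conv1d(x, k):
    """'Same'-padded single-channel 1-D convolution."""
    return np.convolve(x, k, mode="same")

def avg_pool(x, r):
    """Average-pool by factor r (length assumed divisible by r)."""
    return x.reshape(-1, r).mean(axis=1)

def upsample(x, r):
    """Nearest-neighbour upsampling by factor r."""
    return np.repeat(x, r)

def relu(x):
    return np.maximum(0.0, x)

def self_calibrated(x1, x2, k1, k2, k3, r=4):
    """Sketch of the self-calibration convolution on a split pair (x1, x2)."""
    y1 = conv1d(x1, k1)                    # S32: original-scale branch
    t = avg_pool(x2, r)                    # S33: downsample to 1/r resolution
    cal = upsample(conv1d(t, k2), r)       # S34: convolve, then upsample
    gate = relu(cal + x2)                  # ReLU of calibration plus input
    y2 = conv1d(x2, k3) * gate             # calibrate the K3-extracted features
    return y1, y2                          # S35: fused (e.g. concatenated) downstream

rng = np.random.default_rng(1)
x1, x2 = rng.standard_normal(1024), rng.standard_normal(1024)
k = np.array([0.25, 0.5, 0.25])            # illustrative smoothing kernel
y1, y2 = self_calibrated(x1, x2, k, k, k)
print(y1.shape, y2.shape)
```

Both branch outputs keep the input length, so they can be concatenated along the channel axis before the GN and GELU stage of step S35.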
Preferably, the self-calibration attention module is combined with Resnet50 by replacing the bottleneck structure of Resnet50 with the self-calibration attention module structure, giving the deep learning prediction model. Specifically:

(1) The original one-dimensional vibration signal first passes through the FFT, and shallow features z are then extracted by a wide-kernel convolution module, where the wide kernel weakens the influence of environmental interference on capturing useful features;

(2) The shallow feature z is added directly to the output of the self-calibration convolution module through a residual connection to obtain the cross-layer output;

(3) The cross-layer output of step (2) enters four stages of self-calibration convolution modules, which learn multi-scale spatial features and enhance the model's ability to learn the multi-scale characteristics of the signal; a channel self-attention module accounts for the correlation of information across channels and highlights faults by giving more weight to similar features;

(4) A global average pooling layer feeds a softmax layer that identifies the various fault types.
Preferably, the ReLU activation function is expressed as follows:

$$\delta(x) = \max(0, x)$$

where $\delta$ represents the ReLU activation function;

the GELU activation function is expressed as follows:

$$GELU(x) = x\,\Phi(x)$$

where $\Phi(x)$ is the cumulative distribution function of the standard normal distribution.
Preferably, in step S4, the accuracy of the deep learning prediction model is checked using Last-Mean, Last-Std, Best-Mean, and Best-Std;

where Last-Mean and Last-Std represent the mean and standard deviation of the last epoch, and Best-Mean and Best-Std represent the mean and standard deviation of the highest accuracy achieved throughout the experiment.
Therefore, the bearing fault diagnosis method based on deep learning under the limited sample has the following technical effects:

(1) A classification method based on a convolutional neural network with a self-calibration attention mechanism is provided for bearing fault diagnosis under limited samples; it is an end-to-end prediction model requiring no prior knowledge. The preprocessed raw sensor data can be used directly, without additional feature engineering, and frequency-domain features are obtained through Fourier transformation.

(2) The invention combines GN, BN, ReLU, and GELU, and improves the feature extraction capability of the model by fusing multi-scale information.

(3) Through preprocessing and a series of data enhancement methods such as random scaling, random clipping, and random addition of Gaussian noise, training samples with better and clearer features are obtained.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a graph comparing GELU and ReLU;
FIG. 3 is a block diagram of a self-calibrating attention module;
FIG. 4 is a topology of a deep classification learning model;
fig. 5 is a bottleneck structure of the Resnet 50.
Detailed Description
The technical scheme of the invention is further described below through the attached drawings and the embodiments.
Unless defined otherwise, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this invention belongs.
Example 1
The proposed framework was evaluated on a bearing dataset consisting of real damage data from the Paderborn University (PU) bearing dataset. The sampling frequency of the dataset is 64 kHz; as shown in Table 1, 13 sets of bearing signals were used to test the model, and the classification information is shown in Table 2.
Table 1 data set introduction
TABLE 2 bearing information of actual failure
OR: outer ring; IR: inner ring; S: single damage; R: repetitive damage; M: multiple damage
Referring to fig. 1, a flow chart of a bearing fault diagnosis method based on deep learning under a limited sample of the present invention specifically includes the following steps:
Step 1: sensor data collected from the equipment under different rotating speeds, different loads, and different radial forces, covering different fault types and different fault positions, are used as the raw data; the data are divided into a training set and a test set at a ratio of 1:9; the collected raw data are then Fourier transformed, the first 1024 points of the frequency-domain data are taken, and Min-Max normalization is applied so that all inputs share the same scale; finally, data enhancement is performed on the training data;
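As a non-limiting illustration, the preprocessing of step 1 (FFT, keeping the first 1024 frequency-domain points, then Min-Max normalization to [0, 1]) can be sketched with NumPy; the synthetic 64 kHz segment below is an assumption for demonstration only.

```python
import numpy as np

def preprocess(signal, n_points=1024):
    """FFT a raw vibration segment, keep the first n_points of the
    magnitude spectrum, and Min-Max normalize to [0, 1]."""
    spectrum = np.abs(np.fft.fft(signal))[:n_points]
    lo, hi = spectrum.min(), spectrum.max()
    return (spectrum - lo) / (hi - lo + 1e-12)  # small eps avoids division by zero

# synthetic 64 kHz segment: 2048 samples of a 1 kHz tone plus mild noise
rng = np.random.default_rng(0)
t = np.arange(2048) / 64_000
x = np.sin(2 * np.pi * 1_000 * t) + 0.1 * rng.standard_normal(2048)
feat = preprocess(x)
print(feat.shape, feat.min(), feat.max())
```

With a 64 kHz sampling rate and 2048 samples, the frequency resolution is 31.25 Hz, so the 1 kHz tone lands exactly in bin 32 of the normalized feature vector.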
Step 2: labels are established for the data processed in step 1; different fault locations and different fault types correspond to different categories;
Step 3: an end-to-end deep learning prediction model is built; the end-to-end model directly uses the preprocessed raw data without additional feature engineering. The model improves performance mainly by combining the self-calibration attention module with a deep network, and by combining Group Normalization (GN) and Batch Normalization (BN) within it. Although both methods normalize sets of values, they differ in the dimensions over which they are defined: BN normalizes along the channel axis (Channel), while GN normalizes along the group axis (Group). Combining the two normalization techniques makes better use of sample diversity and improves the robustness of the model. The learning rate is 0.001, training runs for 100 epochs, and the optimizer is Adam; the batch size is 32, the downsampling rate is 4, and the number of groups in group normalization is 4;
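As a non-limiting illustration, a single Adam update with the learning rate of 0.001 used above can be sketched as follows; the remaining hyper-parameters are the conventional Adam defaults, assumed here rather than stated above.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: biased first/second moment estimates, bias
    correction, then the scaled parameter step."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# toy problem: minimize f(w) = ||w||^2, whose gradient is 2w
w = np.array([1.0, -2.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 201):
    w, m, v = adam_step(w, 2 * w, m, v, t)
print(np.round(w, 3))
```

Each step moves the parameters by roughly the learning rate, so after 200 iterations the toy weights have clearly contracted toward zero.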
Step 4: the training set data are input into the deep learning prediction model to train the end-to-end model. Four metrics are considered when analyzing the experimental results: Last-Mean, Last-Std, Best-Mean, and Best-Std. Last-Mean and Last-Std represent the mean and standard deviation of the last epoch; Best-Mean and Best-Std represent the mean and standard deviation of the highest accuracy achieved throughout the experiment. Accuracy is evaluated as mean ± standard deviation on the validation set. This yields a trained deep learning classification model;
Step 5: the test set data from step 1 are input into the trained deep learning prediction model, and the model effect is determined from the classification accuracy obtained by the model.
The data enhancement method in step 1 comprises the following steps:
(1) Random scaling
Random scaling multiplies the sample data by a single number to generate new data. In intelligent fault diagnosis, random scaling increases data diversity, effectively enlarges the training dataset, and improves the model's adaptability to sample values, while also alleviating overfitting and improving the generalization ability of the model.

$$x' = \beta x \quad (1)$$

where $\beta$ is drawn from the Gaussian distribution $N(1, 0.01)$.
(2) Random clipping

Random clipping multiplies each point of the sample data by a number that is randomly 0 or 1, thereby generating a new sample. In intelligent fault diagnosis, random clipping increases data diversity, improves the model's adaptability to missing sample values, enlarges the training dataset, and improves the robustness and generalization ability of the model.

$$x' = mask \odot x \quad (2)$$

where $mask$ represents randomly multiplying points of $x$ by zero.
(3) Random addition of Gaussian noise

Random addition of Gaussian noise adds random noise conforming to a Gaussian distribution to the signal. In intelligent fault diagnosis, randomly added Gaussian noise simulates the noise and interference in real signals, increasing the model's tolerance to them and improving the robustness and generalization ability of the model.

$$x' = x + n \quad (3)$$

where $n$ is a Gaussian-distributed vector with the same dimension as $x$.
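As a non-limiting illustration, the three augmentations of equations (1) to (3) can be sketched with NumPy; the mask keep-probability and noise standard deviation are illustrative assumptions not fixed by the text above.

```python
import numpy as np

rng = np.random.default_rng(42)

def random_scale(x):
    """Eq. (1): multiply the whole sample by beta ~ N(1, 0.01)."""
    beta = rng.normal(1.0, np.sqrt(0.01))  # 0.01 read as the variance
    return beta * x

def random_mask(x, keep_prob=0.9):
    """Eq. (2): multiply each point by a random 0/1 mask
    (keep_prob is an illustrative choice)."""
    mask = rng.random(x.shape) < keep_prob
    return mask * x

def add_gaussian_noise(x, std=0.01):
    """Eq. (3): add a Gaussian noise vector n with the same shape as x
    (std is an illustrative choice)."""
    return x + rng.normal(0.0, std, size=x.shape)

x = np.ones(1024)
for aug in (random_scale, random_mask, add_gaussian_noise):
    print(aug.__name__, aug(x).shape)
```

Each transform preserves the sample length, so augmented samples can be mixed freely with the originals in a training batch.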
In step 3, BN, GN, ReLU, and GELU are represented as follows:

(1) In a one-dimensional convolution layer, each input has three dimensions [N, C, H], representing the number of samples, the channels (convolution kernels), and the height (sample length), respectively. The BN operation is expressed as follows:

$$\mu = \frac{1}{m}\sum_{i=1}^{m} x_i$$

$$\sigma^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_i-\mu\right)^2$$

$$\hat{x}_i = \frac{x_i-\mu}{\sqrt{\sigma^2+\varepsilon}}$$

$$y_i = \gamma\hat{x}_i+\beta$$

where $x_i$ and $y_i$ represent the input and output features of the $i$-th observation; $\gamma$ and $\beta$ are two trainable parameters for adjusting the distribution; $\varepsilon$ is a constant that tends to zero; $m$ is the mini-batch size.
(2) GN solves BN's excessive dependence on batch size. The normalized set $S_i$ of GN is expressed as follows:

$$S_i = \left\{k \,\middle|\, k_N = i_N,\ \left\lfloor \frac{k_C}{C/G} \right\rfloor = \left\lfloor \frac{i_C}{C/G} \right\rfloor \right\}$$

where $G$ is the number of groups, $C$ is the number of channels, and $C/G$ represents the number of channels per group; $\lfloor\cdot\rfloor$ is a downward rounding operation.
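As a non-limiting illustration, the difference between the two normalizations can be sketched with NumPy on a [N, C, H] tensor (with $\gamma = 1$, $\beta = 0$ for clarity): BN pools statistics per channel across the batch, GN pools them per sample within each group of C/G channels.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """BN over [N, C, H]: mean/variance per channel, across batch and length."""
    mu = x.mean(axis=(0, 2), keepdims=True)
    var = x.var(axis=(0, 2), keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def group_norm(x, groups=4, eps=1e-5):
    """GN over [N, C, H]: mean/variance per sample, per group of C/G channels."""
    n, c, h = x.shape
    g = x.reshape(n, groups, c // groups, h)
    mu = g.mean(axis=(2, 3), keepdims=True)
    var = g.var(axis=(2, 3), keepdims=True)
    return ((g - mu) / np.sqrt(var + eps)).reshape(n, c, h)

# batch of 32 samples, 16 channels, length 1024 (matching the settings above)
x = np.random.default_rng(0).standard_normal((32, 16, 1024))
print(np.allclose(batch_norm(x).mean(axis=(0, 2)), 0, atol=1e-6))
print(np.allclose(group_norm(x).reshape(32, 4, -1).mean(axis=2), 0, atol=1e-6))
```

Note that GN's statistics do not involve the batch axis at all, which is why its behavior is unchanged at small batch sizes.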
(3) The ReLU activation function is expressed as follows:

$$\delta(x) = \max(0, x)$$

where $\delta$ represents the ReLU activation function, which takes the larger of $0$ and $x$.
(4) The GELU activation function is expressed as follows:

$$GELU(x) = x\,\Phi(x)$$

where $\Phi(x)$ is the cumulative distribution function of the standard normal distribution. To facilitate code implementation, the GELU activation function may be approximated using the following equation:

$$GELU(x) \approx 0.5x\left(1+\tanh\left(\sqrt{2/\pi}\left(x+0.044715x^3\right)\right)\right)$$
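As a non-limiting illustration, the exact GELU and its tanh approximation can be compared numerically; both definitions are the standard ones, restated here in code.

```python
import math
import numpy as np

def gelu_exact(x):
    """GELU(x) = x * Phi(x), with Phi the standard normal CDF (via erf)."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    """Tanh approximation: 0.5*x*(1 + tanh(sqrt(2/pi)*(x + 0.044715*x^3)))."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi)
                                      * (x + 0.044715 * x ** 3)))

xs = np.linspace(-4, 4, 9)
for v in xs:
    print(f"{v:+.1f}  exact={gelu_exact(float(v)):+.6f}"
          f"  approx={gelu_tanh(float(v)):+.6f}")
```

Over this range the two forms agree to roughly three decimal places, which is why the approximation is adequate in practice.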
in step 3, the self-calibrating attention module is represented as follows:
self-calibrating attention module convolution improves the feature transformation process of traditional convolution to improve the overall performance of the model, taking into account the inherent multi-scale nature of the input data. The model performs feature extraction on the features on a smaller resolution scale so as to reduce interference of useless information on fault feature information and improve the capability of the model for extracting useful fault information in a smaller sample size. The specific structure of the self-calibration attention module is shown in fig. 2.
The self-calibration attention module convolution improves the feature conversion process of the traditional convolution in consideration of the inherent multi-scale characteristics of the input data, and improves the overall performance of the model without adding additional parameters and complexity. By means of convolution feature conversion in two different scale spaces, fault feature extraction capacity of the model under the condition of small samples is improved, the method comprises an original scale space and a self-correction scale space with smaller resolution, and the specific steps of self-correction convolution are divided into the following 5 steps:
(1) The input feature X is split into two equal parts by channel number, denoted $X_1$ and $X_2$; the convolution kernel K is split into 3 parts of the same dimension, denoted $K_1$, $K_2$, $K_3$;

(2) The original scale feature space is processed: the input feature $X_1$ is convolved by $K_1$ to obtain the feature $Y_1$;

(3) The self-calibration scale space is processed: the feature $X_2$ is input into two channels with different resolutions; the resolution is reduced by a factor of 4 along the width direction of the input feature, and the input feature $X_2$ is downsampled by average pooling into a low-dimensional embedding that calibrates the convolution transformation of the high-resolution branch;

(4) The extracted features are convolved and upsampled, then passed through the ReLU function; the ReLU result is combined with the features extracted by the $K_3$ convolution for calibration, giving the output feature $Y_2$ of the self-calibration part. The operations are as follows:

$$T = AvgPool_r\left(X_2\right)$$

$$X_2' = \delta\left(Up\left(T * K_2\right) + X_2\right)$$

$$Y_2 = \left(X_2 * K_3\right) \odot X_2'$$

where $AvgPool_r$ represents average pooling with downsampling rate $r$; $Up(\cdot)$ represents upsampling; $\delta$ represents the ReLU activation function, which increases the nonlinearity of the model; $T$ and $X_2'$ represent intermediate output features;

(5) The two scale-space output features $Y_1$ and $Y_2$ are fused, and the final output feature Y is obtained through group normalization GN and a GELU activation function calculation.
The attention operation encodes multi-scale spatial information while considering only the information around each spatial location, avoiding contamination from irrelevant regions. The self-calibration convolution therefore effectively enlarges the receptive field of the convolution layer and, when fault samples are few, improves the model's ability to acquire useful fault feature information.
The bottleneck structure of Resnet50 is shown in FIG. 5. The self-calibration attention module is combined with Resnet50 by replacing the bottleneck structure of Resnet50 with the self-calibration attention module structure, giving the deep learning prediction model. Specifically:

(1) The original one-dimensional vibration signal passes through a wide-kernel convolution module, which weakens the influence of environmental interference on capturing useful features;

(2) A residual connection produces the cross-layer input, which is combined with the output of the upper convolution layer into a new output;

(3) The new output of step (2) enters four stages of self-calibration convolution modules, which learn multi-scale spatial features and enhance the model's ability to learn the multi-scale characteristics of the signal; a channel self-attention module accounts for the correlation of information across channels and highlights faults by giving more weight to similar features;

(4) A global average pooling layer feeds a softmax layer that identifies the various fault types.
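As a non-limiting illustration, the classification head of step (4) can be sketched with NumPy: global average pooling over the length axis followed by a softmax over the 13 fault classes of Table 2. The weight matrix and bias here are random illustrative placeholders, not trained parameters.

```python
import numpy as np

def gap_softmax(features, W, b):
    """Global average pooling then softmax classification.

    features: [C, H] feature map from the last convolution stage;
    W: [classes, C] classifier weights; b: [classes] bias."""
    pooled = features.mean(axis=1)   # global average pooling -> [C]
    logits = W @ pooled + b          # linear layer -> [classes]
    z = logits - logits.max()        # subtract max for numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return p

rng = np.random.default_rng(0)
probs = gap_softmax(rng.standard_normal((16, 64)),   # illustrative [C, H] map
                    rng.standard_normal((13, 16)),   # 13 fault classes
                    np.zeros(13))
print(probs.shape, probs.sum())
```

The predicted class is simply the index of the largest probability; pooling first means the classifier sees a fixed-size vector regardless of the signal length.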
Example two
The ratio of training set to test set in Example 1 was modified to 2:8, with all other conditions unchanged.
Table 3 compares the results of Examples 1 and 2 with the other methods. The experimental results show that whether the split is 2:8 or 1:9, the proposed network model achieves higher accuracy than the other four models. With a training-to-validation ratio of 2:8, an optimal accuracy of 96.35% is obtained; with a ratio of 1:9, an optimal accuracy of 91.68% is obtained. Furthermore, the proposed model shows a relatively small standard deviation. According to the experimental results, the proposed model achieves high classification accuracy under limited samples, helping to avoid the catastrophic consequences of faults.
Table 3 results (%)
Therefore, the bearing fault diagnosis method based on deep learning under the limited sample is an end-to-end prediction model that requires no prior knowledge. The preprocessed raw sensor data can be used directly, without additional feature engineering, and frequency-domain features are obtained through Fourier transformation. The invention combines GN, BN, ReLU, and GELU, and improves the feature extraction capability of the model by fusing multi-scale information. Through preprocessing and a series of data enhancement methods such as random scaling, random clipping, and random addition of Gaussian noise, training samples with better and clearer features are obtained.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention and not for limiting it, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that: the technical scheme of the invention can be modified or replaced by the same, and the modified technical scheme cannot deviate from the spirit and scope of the technical scheme of the invention.
Claims (9)
1. A bearing fault diagnosis method based on deep learning under a limited sample, comprising the steps of:
S1, using sensor data collected from the equipment under different rotation speeds, different loads, and different radial forces, covering different fault types and different fault positions, as the raw data; dividing the raw data into training set data and test set data; performing Fourier transformation on the collected raw data followed by normalization; and performing data enhancement on the training set data;
S2, constructing an end-to-end deep learning prediction model;
S3, inputting the training set data of step S1 into the deep learning prediction model to train it, so as to obtain a trained deep learning prediction model;
S4, inputting the test set data of step S1 into the deep learning prediction model trained in step S3, and determining the model effect.
2. The method for diagnosing bearing faults based on deep learning under limited samples as claimed in claim 1, wherein in step S1, the data enhancement method for the training set data comprises: random scaling, random clipping, and random addition of Gaussian noise.
3. The method for diagnosing bearing faults based on deep learning under limited samples as claimed in claim 1, wherein in step S1, the Fourier transform takes the first 1024 points of the frequency domain, expressed as:

$$x_f = \left|\,FFT(x)\,\right|_{1:1024}$$

where $FFT$ represents the Fourier transform and the subscript denotes taking the first 1024 frequency-domain points.
4. The method for diagnosing a bearing fault based on deep learning under a limited sample as set forth in claim 1, wherein in step S2, constructing the end-to-end deep learning prediction model includes: combining a self-calibration attention module with the feature extraction module, and combining group normalization GN with batch normalization BN, so that their combination improves the performance of the deep learning prediction model.
5. The method for diagnosing bearing faults based on deep learning under limited samples as claimed in claim 4, wherein the batch normalization BN is calculated along the channel axis direction, and its expression is as follows:

$$\mu = \frac{1}{m}\sum_{i=1}^{m} x_i$$

$$\sigma^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_i-\mu\right)^2$$

$$\hat{x}_i = \frac{x_i-\mu}{\sqrt{\sigma^2+\varepsilon}}$$

$$y_i = \gamma\hat{x}_i+\beta$$

where $x_i$ and $y_i$ represent the input and output features of the $i$-th observation; $\gamma$ and $\beta$ are two trainable parameters for adjusting the distribution; $\varepsilon$ is a constant that tends to zero; $m$ represents the mini-batch size; $\mu$ represents the mean; $\sigma^2$ represents the variance; $\hat{x}_i$ represents the normalized data; $y_i$ represents the output feature;
group normalization GN is calculated along the channel-group axis, and its normalization set S_i is expressed as follows:
S_i = { k | ⌊ k_C / (C/G) ⌋ = ⌊ i_C / (C/G) ⌋ } ;
where G is the number of groups and C is the number of channels; C/G represents the number of channels per group; ⌊·⌋ is the floor (round-down) operation; i and k are index numbers; ⌊i_C/(C/G)⌋ and ⌊k_C/(C/G)⌋ index the group; i_C and k_C represent the indices on the channel axis.
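A numpy sketch of GN over a 1-D feature map, grouping C channels into G groups of C/G channels and normalizing each group per sample (an illustration, not the patented implementation):

```python
import numpy as np

def group_norm(x, G, gamma, beta, eps=1e-5):
    # x: (N, C, L). Statistics are per sample, over each group of C/G channels.
    N, C, L = x.shape
    xg = x.reshape(N, G, C // G, L)
    mu = xg.mean(axis=(2, 3), keepdims=True)
    var = xg.var(axis=(2, 3), keepdims=True)
    xg = (xg - mu) / np.sqrt(var + eps)
    return gamma * xg.reshape(N, C, L) + beta

x = np.random.default_rng(0).standard_normal((2, 8, 16))
y = group_norm(x, G=4, gamma=np.ones((8, 1)), beta=np.zeros((8, 1)))
print(y.shape)  # (2, 8, 16)
```

Unlike BN, these statistics do not depend on the batch size, which is why GN remains stable for the small batches typical of limited-sample training.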
6. The method for bearing fault diagnosis based on deep learning under limited samples according to claim 5, wherein the self-calibration attention module improves the fault feature extraction capability of the model under small-sample conditions by performing convolutional feature transformation in two spaces of different scale, namely the original scale space and a self-calibration scale space of lower resolution; the self-calibration convolution comprises the following steps:
s31, dividing the input feature X into two equal parts along the channel dimension, denoted X1 and X2; splitting the convolution kernel K into three parts of the same dimension, denoted K1, K2 and K3;
S32, processing the original-scale feature space: the input feature X1 is convolved with K1 to obtain the feature Y1;
S33, processing the self-calibration scale space: the feature X2 is fed into two channels of different resolution; the resolution is reduced by a factor of 4 along the width direction of the input feature, with average pooling downsampling X2 into a low-dimensional embedding that calibrates the convolutional transformation of the high-resolution branch;
s34, convolving and upsampling the extracted feature, passing the result through a ReLU function, and using the ReLU output to correct the feature extracted by the K3 convolution, obtaining the output feature Y2 of the self-calibration branch; the operations are as follows:
T = AvgPool_r(X2) ;
X2′ = ReLU(Up(T ∗ K2)) ;
Y2 = X2′ ⊙ (X2 ∗ K3) ;
where AvgPool_r represents average pooling with downsampling rate r; Up(·) represents upsampling; ReLU(·) represents the ReLU activation function, which increases the nonlinearity of the model; ∗ represents convolution; T, X2′ and Y2 represent the output features of the successive operations, respectively;
s35, fusing the output features Y1 and Y2 of the two scale spaces, and obtaining the final output feature Y through group normalization GN and the GELU activation function.
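Assuming the correction in step S34 is an elementwise product, as in the self-calibrated convolutions of Liu et al. cited below, a single-channel numpy sketch of the self-calibration branch might read:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def avg_pool(x, r):
    # Average pooling with stride r (downsampling rate r).
    return x[: len(x) // r * r].reshape(-1, r).mean(axis=1)

def conv1d_same(x, k):
    # Same-length 1-D convolution.
    return np.convolve(x, k, mode="same")

def self_calibrated_branch(x2, k2, k3, r=4):
    t = avg_pool(x2, r)                             # T = AvgPool_r(X2)
    gate = relu(np.repeat(conv1d_same(t, k2), r))   # X2' = ReLU(Up(T * K2))
    return gate[: len(x2)] * conv1d_same(x2, k3)    # Y2 = X2' ⊙ (X2 * K3), product assumed

x2 = np.random.default_rng(0).standard_normal(64)
y2 = self_calibrated_branch(x2, k2=np.ones(3) / 3, k3=np.ones(3) / 3)
print(y2.shape)  # (64,)
```

The kernel values and the nearest-neighbour upsampling via `np.repeat` are illustrative choices; the patent does not specify them.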
7. The method for diagnosing bearing faults based on deep learning under limited samples as claimed in claim 6, wherein the self-calibration attention module is combined with ResNet50: the bottleneck structure of ResNet50 is replaced by the self-calibration attention module structure to obtain the deep learning prediction model, which specifically comprises the following steps:
(1) The original one-dimensional vibration signal is first transformed by the FFT, and shallow features z are then extracted by a wide-kernel convolution module, the wide kernel weakening the influence of environmental interference on the capture of useful features;
(2) The shallow feature z and the output of the self-calibration convolution module are added directly through a residual connection to obtain the cross-layer output;
(3) The cross-layer output of step (2) enters four stages of self-calibration convolution modules; the self-calibration convolution modules learn multi-scale spatial features, enhancing the model's ability to learn the multi-scale characteristics of the signal, while a channel self-attention module considers the correlation of the information in each channel and highlights faults by assigning greater weight to similar features;
(4) The output of the global average pooling layer enters the softmax layer to identify the various fault types.
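The four steps of claim 7 can be traced end-to-end with stand-in modules; every shape, weight, and the number of fault classes below is an illustrative assumption, not the patented architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def wide_kernel_conv(x, k=64, c=16):
    # Stand-in for the wide-kernel convolution module: stride-k windows -> c channels.
    n = len(x) // k
    w = rng.standard_normal((c, k)) * 0.1
    return w @ x[: n * k].reshape(n, k).T          # (c, n)

def stage(z):
    # Stand-in for one self-calibration stage: ReLU, halve the length, keep channels.
    return np.maximum(0.0, z[:, : z.shape[1] // 2 * 2].reshape(z.shape[0], -1, 2).mean(axis=2))

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

x = rng.standard_normal(2048)
z = wide_kernel_conv(np.abs(np.fft.fft(x))[:1024])  # step (1): FFT front end, shallow features z
for _ in range(4):                                  # step (3): four self-calibration stages
    z = stage(z)
w_cls = rng.standard_normal((10, z.shape[0])) * 0.1
probs = softmax(w_cls @ z.mean(axis=1))             # step (4): global average pooling -> softmax
print(probs.shape)  # (10,)
```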
8. The method for diagnosing bearing faults based on deep learning under limited samples according to claim 7, wherein the ReLU activation function is expressed as follows:
ReLU(x) = max(0, x) ;
where ReLU(·) represents the ReLU activation function;
the GELU activation function is expressed as follows:
GELU(x) = x · Φ(x) ≈ 0.5 · x · (1 + tanh(√(2/π) · (x + 0.044715 · x³))).
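Both activation expressions above can be checked with a short numpy sketch (the tanh form of GELU is the standard approximation):

```python
import numpy as np

def relu(x):
    # ReLU(x) = max(0, x)
    return np.maximum(0.0, x)

def gelu(x):
    # tanh approximation of x * Φ(x)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

print(relu(np.array([-1.0, 2.0])))  # [0. 2.]
print(float(gelu(np.array([1.0]))[0]))  # ≈ 0.8412, close to Φ(1) = 0.8413
```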
9. The method for diagnosing bearing faults based on deep learning under limited samples as claimed in claim 1, wherein in step S4, the precision of the deep learning prediction model is checked by using Last-Mean, Last-Std, Best-Mean and Best-Std;
wherein, last-Mean and Last-Std represent the Mean and standard deviation of the Last epoch, and Best-Mean and Best-Std represent the Mean and standard deviation of highest accuracy achieved throughout the experiment.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311323130.4A CN117077815A (en) | 2023-10-13 | 2023-10-13 | Bearing fault diagnosis method based on deep learning under limited sample |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117077815A true CN117077815A (en) | 2023-11-17 |
Family
ID=88719787
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311323130.4A Pending CN117077815A (en) | 2023-10-13 | 2023-10-13 | Bearing fault diagnosis method based on deep learning under limited sample |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117077815A (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108764336A (en) * | 2018-05-28 | 2018-11-06 | 北京陌上花科技有限公司 | For the deep learning method and device of image recognition, client, server |
CN108960053A (en) * | 2018-05-28 | 2018-12-07 | 北京陌上花科技有限公司 | Normalization processing method and device, client |
CN113516159A (en) * | 2021-04-15 | 2021-10-19 | 成都运达科技股份有限公司 | Fault diagnosis method and system for cracks of pinion shaft of running part of railway vehicle |
CN113723322A (en) * | 2021-09-02 | 2021-11-30 | 南京理工大学 | Pedestrian detection method and system based on single-stage anchor-free frame |
CN114398992A (en) * | 2022-01-18 | 2022-04-26 | 安徽大学 | Intelligent fault diagnosis method based on unsupervised domain adaptation |
CN114463759A (en) * | 2022-04-14 | 2022-05-10 | 浙江霖研精密科技有限公司 | Lightweight character detection method and device based on anchor-frame-free algorithm |
CN114742094A (en) * | 2022-03-17 | 2022-07-12 | 同济大学 | Vibration signal defect feature extraction method based on deep morphological convolution network |
CN115803752A (en) * | 2020-09-08 | 2023-03-14 | 华为技术有限公司 | Normalization in deep convolutional neural networks |
CN116304861A (en) * | 2023-02-09 | 2023-06-23 | 河北工业大学 | Time-frequency characteristic fusion fault diagnosis method based on self-attention |
Non-Patent Citations (1)
Title |
---|
JIANG-JIANG LIU ET AL.: "Improving Convolutional Networks with Self-Calibrated Convolutions", 《2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)》, pages 10093 - 10102 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Qian et al. | A new deep transfer learning network based on convolutional auto-encoder for mechanical fault diagnosis | |
Sinitsin et al. | Intelligent bearing fault diagnosis method combining mixed input and hybrid CNN-MLP model | |
CN110595780B (en) | Bearing fault identification method based on vibration gray level image and convolution neural network | |
CN109858352B (en) | Fault diagnosis method based on compressed sensing and improved multi-scale network | |
CN111914883B (en) | Spindle bearing state evaluation method and device based on deep fusion network | |
CN110702411B (en) | Residual error network rolling bearing fault diagnosis method based on time-frequency analysis | |
CN111562108A (en) | Rolling bearing intelligent fault diagnosis method based on CNN and FCMC | |
CN111238814A (en) | Rolling bearing fault diagnosis method based on short-time Hilbert transform | |
CN113865868B (en) | Rolling bearing fault diagnosis method based on time-frequency domain expression | |
CN110991471B (en) | Fault diagnosis method for high-speed train traction system | |
CN114034481A (en) | Fault diagnosis system and method for rolling mill gearbox | |
Wang et al. | Construction of the efficient attention prototypical net based on the time–frequency characterization of vibration signals under noisy small sample | |
CN111076934A (en) | Method for diagnosing potential fault of bearing based on S transformation | |
CN114048787B (en) | Method and system for intelligently diagnosing bearing fault in real time based on Attention CNN model | |
CN111855202A (en) | Gear box fault diagnosis method and system | |
CN116108346A (en) | Bearing increment fault diagnosis life learning method based on generated feature replay | |
CN113530850B (en) | Centrifugal pump fault diagnosis method based on ESA and stacked capsule self-encoder | |
CN113758709A (en) | Rolling bearing fault diagnosis method and system combining edge calculation and deep learning | |
CN117030263A (en) | Bearing fault diagnosis method based on improved residual error network under multi-sensor signal fusion | |
CN116484258A (en) | Elevator traction machine bearing fault diagnosis method | |
CN116541771A (en) | Unbalanced sample bearing fault diagnosis method based on multi-scale feature fusion | |
CN117077815A (en) | Bearing fault diagnosis method based on deep learning under limited sample | |
CN115100451A (en) | Data expansion method for monitoring oil leakage of hydraulic pump | |
CN113723592A (en) | Fault diagnosis method based on wind power gear box monitoring system | |
CN114548295A (en) | Bearing fault classification system and method based on multi-scale domain adaptive network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||