CN113834656B

CN113834656B - Bearing fault diagnosis method, system, equipment and terminal

Info

Publication number: CN113834656B
Application number: CN202110997171.6A
Authority: CN
Inventors: 刘立芳; 张梓锐; 和伟辉; 李飞龙; 齐小刚
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2021-08-27
Filing date: 2021-08-27
Publication date: 2024-04-30
Anticipated expiration: 2041-08-27
Also published as: CN113834656A

Abstract

The invention belongs to the technical field of bearing fault diagnosis and discloses a bearing fault diagnosis method, system, equipment and terminal. The method comprises the following steps: extracting time-frequency features from the original vibration signal of the bearing using the continuous wavelet transform and converting them into a two-dimensional image of 32 × 32 pixels; extracting fault features from the time-frequency image using an improved AlexNet model; and classifying faults with the LGBM classification algorithm, with the optimal model parameters selected by Bayesian optimization. In an experimental comparison with 7 other methods, the proposed method achieves the highest accuracy, 99.712%; its prediction time for 1800 samples is 1.47 seconds, the same order of magnitude as the other models; and the variance of its accuracy over five predictions is only 0.063, which is more stable than 6 of the other methods. The proposed method therefore has the best overall performance among the methods compared.

Description

Bearing fault diagnosis method, system, equipment and terminal

Technical Field

The invention belongs to the technical field of bearing fault diagnosis, and particularly relates to a bearing fault diagnosis method, a system, equipment and a terminal.

Background

Effective fault diagnosis of mechanical equipment can avert large economic losses in industrial production. In recent years the use of machine learning and deep learning has grown rapidly, and advanced measurement technology now allows large amounts of data to be collected in industrial environments. With such data available, machine learning and deep learning fault diagnosis models, such as deep neural networks (DNN), convolutional neural networks (CNN) and recurrent neural networks, perform very well.

Autoencoders and convolutional neural networks are currently the most common deep learning fault diagnosis models. Lei et al. propose a deep neural network for rotating machinery fault diagnosis based on frequency-domain data. Zong et al. propose a denoising autoencoder for bearing fault diagnosis based on frequency-domain data. Wei et al. propose a one-dimensional CNN for bearing fault diagnosis that works on the raw time signal and performs well in noisy environments. Guo X. et al. propose a hierarchical adaptive deep CNN for bearing fault diagnosis that converts the raw time signal into a 32 × 32 matrix as input. Wang Q. et al. propose a CNN-based method for bearing reliability assessment and remaining life prediction that converts the frequency-domain signal into a 32 × 32 matrix as input. Wang J. et al. propose a general bearing fault diagnosis model transferred from the well-known AlexNet model and compare the effects of eight time-frequency feature extraction methods. Wang L. H. et al. propose a CNN for motor fault diagnosis that uses the short-time Fourier transform (STFT) to convert fault signals into time-frequency images. Claessens et al. propose a locally connected network for bearing fault diagnosis built from normalized sparse autoencoders. Eren et al. use a one-dimensional CNN for time series prediction and obtain better efficiency by filtering, decimating and normalizing the input data during preprocessing. Ran et al. claim high accuracy using a DNN for time series prediction, but give no architectural details for the proposed network. The same problem occurs in the study by Mao et al., which claims high accuracy with a new deep learning approach but reports only training accuracy (rather than test accuracy) and provides no workable architecture for the proposed network, which makes the work difficult to reproduce. A more advanced paper combines a CNN with a long short-term memory (LSTM) network for bearing fault diagnosis, but does not explicitly explain the step-by-step construction of the proposed model. A new bearing fault diagnosis method is therefore needed to remedy the shortcomings of the existing methods.

Through the above analysis, the problems and defects existing in the prior art are as follows:

(1) Some existing bearing fault diagnosis methods give no architectural details for the proposed DNN.

(2) Some existing bearing fault diagnosis methods report only training accuracy (rather than test accuracy) and provide no workable architecture for the proposed network, which makes them difficult to reproduce.

(3) Existing schemes that combine a CNN with a long short-term memory (LSTM) network for bearing fault diagnosis do not explicitly explain the step-by-step construction of the model.

The difficulty of solving the problems and the defects is as follows:

(1) Many DNN models are very deep and structurally complex.

(2) When a model is trained and tested, the test accuracy is generally lower than the training accuracy, so reporting the training accuracy yields a higher figure; however, training accuracy demonstrates the merit of a method less well than test accuracy does.

(3) A model is sometimes built by repeatedly adjusting it according to result feedback until the final result is obtained, and such a construction process is difficult to explain.

The significance of solving the above problems and defects is as follows:

(1) Regarding the first problem, when the structural details of the DNN are given, the same network model can be built directly with a deep learning tool, so that a well-performing model can be used directly for fault diagnosis.

(2) Regarding the second problem, reporting test accuracy demonstrates the advantages and function of a fault diagnosis method, and providing a workable architecture for the proposed network makes the method easier to reproduce.

(3) Regarding the third problem, describing the step-by-step construction of the model makes the diagnosis method more interpretable, makes its principle easier to explore, and gives clearer guidance for improving it.

Disclosure of Invention

To address the problems existing in the prior art, the invention provides a bearing fault diagnosis method, system, equipment and terminal, and in particular a bearing fault diagnosis method, system, equipment and terminal based on the continuous wavelet transform (CWT) and an AlexNet-Light Gradient Boosted Machine fusion model (AlexNet-LGBM).

The invention is realized as a bearing fault diagnosis method comprising the following:

First, time-frequency features are extracted from the original vibration signal of the bearing using the continuous wavelet transform and converted into a two-dimensional image of 32 × 32 pixels; second, fault features are extracted from the time-frequency image using an improved AlexNet model; finally, faults are classified with the LGBM classification algorithm, with the optimal model parameters selected by Bayesian optimization.

Further, the bearing fault diagnosis method includes the steps of:

Step one, signal sampling: every sample_length consecutive data points of the original vibration data are taken as one sample, and samples are drawn successively at the sampling interval sample_interval in an overlapping manner. This step divides the original signal into samples of a size suitable for the subsequent steps; the overlapping division also yields more samples for training and testing, which improves the accuracy of the model.

Step two, Morlet continuous wavelet transform signal processing: the continuous wavelet transform is applied to each sample to generate the corresponding time-frequency image, which is resized to a color picture of N × N pixels; enough pictures are generated to be divided into a training set and a test set. For the training process, step three is executed; for the test process, the method jumps to step five. This step has two main effects: (1) processing the one-dimensional signal with the Morlet continuous wavelet transform extracts both its time-domain and its frequency-domain features; (2) the transform converts the one-dimensional signal into a two-dimensional picture for training the subsequent models and for further feature extraction.

Step three, AlexNet feature extraction: the N × N time-frequency images of the training set are input into the improved AlexNet model for training, and the model is saved. The main effect of this step is to train the improved AlexNet feature extraction model and tune its hyperparameters so that it has the best feature extraction capability; the model parameters are then stored for use in the subsequent test stage.

Step four, LGBM fault diagnosis: the N × N time-frequency images of the training set are input into the trained AlexNet model, and the output of the penultimate fully connected layer is taken out and input into the LGBM model for training. The data dimension is sample_num × 1000, where sample_num is the number of samples and 1000 is the number of neurons in the second fully connected layer of the AlexNet model. The main function of this step is to train the LGBM model, i.e. to train the final fault classifier on the fault features extracted by the AlexNet model.

Step five, testing process: the N × N time-frequency images of the test set are input into the trained AlexNet model, the output of its second fully connected layer is taken as the features extracted by AlexNet, and these features are input into the trained LGBM model, whose output is the fault diagnosis result. The main function of this step is to obtain the final fault classification result.

Further, in step one, the signal sampling includes:

sample_length consecutive data points are selected from the original vibration signal as one original sample; the continuous wavelet transform is applied to these points to generate the corresponding time-frequency image, which is resized to a suitable size of N × N. The sample_length consecutive data points starting sample_interval points later are then selected, overlapping the previous sample, as another sample to produce another N × N image, and the process is repeated until enough training and test images have been produced.
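The overlapped sampling described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `overlapped_samples` and the toy sizes are assumptions, while `sample_length` and `sample_interval` follow the names used in the text.

```python
from typing import List, Sequence


def overlapped_samples(signal: Sequence[float], sample_length: int,
                       sample_interval: int) -> List[List[float]]:
    """Cut a 1-D signal into windows of sample_length points whose start
    positions are sample_interval points apart (windows overlap whenever
    sample_interval < sample_length)."""
    if sample_length <= 0 or sample_interval <= 0:
        raise ValueError("sample_length and sample_interval must be positive")
    last_start = len(signal) - sample_length
    return [list(signal[start:start + sample_length])
            for start in range(0, last_start + 1, sample_interval)]


# Toy signal of 10 points; the sizes are illustrative, not the patent's values.
windows = overlapped_samples(list(range(10)), sample_length=4, sample_interval=2)
```

With these toy values, consecutive windows share two points, so a short recording yields more samples than non-overlapping slicing would.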

Further, in the second step, the Morlet continuous wavelet transform signal processing includes:

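The continuous wavelet transform of the signal x(t) with the wavelet function ψ(t) is:

$$W(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} x(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt \qquad (1)$$

where a is the scale parameter, b is the translation parameter and ψ* is the complex conjugate of ψ.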

Among the different wavelets, complex (analytic) wavelets have a Fourier transform that is zero at negative frequencies. With such complex wavelets the phase and amplitude components of the signal can be separated. The Morlet wavelet is the most commonly used complex wavelet; continuous wavelet analysis with the complex Morlet wavelet has the advantages of separating information in the wavelet domain and of a simpler relationship between the transform ridges and the instantaneous frequency. The bearing vibration signals are processed using the Morlet wavelet, which is defined as:

$$\psi(t) = \pi^{-1/4} \left( e^{i 2\pi f_0 t} - e^{-(2\pi f_0)^2 / 2} \right) e^{-t^2 / 2} \qquad (2)$$

where f₀ is the center frequency of the mother wavelet. The second term in the brackets is the correction term, which corrects for the non-zero mean of the complex sinusoid multiplied by the Gaussian term. In practice the correction term is negligible for f₀ ≫ 0 and is ignored, in which case the Morlet wavelet is written as:

$$\psi(t) = \pi^{-1/4}\, e^{i 2\pi f_0 t}\, e^{-t^2 / 2} \qquad (3)$$

That is, the Morlet wavelet is a simple complex sinusoid exp(i2πf₀t) within a Gaussian envelope exp(-t²/2); the term π^(-1/4) is a normalization factor that ensures the wavelet has unit energy.

The Fourier transform of the Morlet wavelet is:

$$\hat{\psi}(f) = \pi^{1/4} \sqrt{2}\, e^{-\frac{1}{2} (2\pi f - 2\pi f_0)^2} \qquad (4)$$

This expression has the form of a Gaussian function shifted by f₀ along the frequency axis. The center frequency of the Gaussian spectrum is typically taken as the characteristic frequency of the Morlet wavelet. The characteristic frequency is set for the mother wavelet and varies with the wavelet scale a as follows:

$$f = \frac{f_0}{a} \qquad (5)$$

The energy spectrum, i.e. the squared magnitude of the Fourier transform, is:

$$\left| \hat{\psi}(f) \right|^2 = 2 \pi^{1/2}\, e^{-(2\pi f - 2\pi f_0)^2} \qquad (6)$$

Integrating the energy of the Morlet wavelet of equation (3) gives 1.

The continuous wavelet transform thus converts the one-dimensional vibration signal into a picture that captures the relationship between time and frequency.
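As a quick numerical check of the unit-energy property, the sketch below evaluates the simplified Morlet wavelet π^(-1/4)·exp(i2πf₀t)·exp(-t²/2) and integrates its squared magnitude with a plain Riemann sum. The default f₀ = 1.0, the integration limits and the helper names are illustrative assumptions; `characteristic_frequency` assumes the usual relation f = f₀/a between scale and frequency.

```python
import cmath
import math


def morlet(t: float, f0: float = 1.0) -> complex:
    """Simplified Morlet wavelet (correction term dropped)."""
    return math.pi ** -0.25 * cmath.exp(2j * math.pi * f0 * t) * math.exp(-t * t / 2)


def energy(f0: float = 1.0, t_max: float = 8.0, steps: int = 16000) -> float:
    """Riemann-sum approximation of the integral of |psi(t)|^2 over [-t_max, t_max]."""
    dt = 2 * t_max / steps
    return sum(abs(morlet(-t_max + k * dt, f0)) ** 2 for k in range(steps + 1)) * dt


def characteristic_frequency(f0: float, a: float) -> float:
    """Frequency associated with wavelet scale a (assumed relation f = f0 / a)."""
    return f0 / a


total_energy = energy()
```

The energy does not depend on f₀, because the complex sinusoid has unit magnitude and only the Gaussian envelope contributes to |ψ(t)|².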

Further, in the third step, the AlexNet feature extraction includes:

AlexNet is modified as follows:

(1) Model input dimension improvement: the 224 × 224 input image size of the classical AlexNet is larger than necessary for bearing fault diagnosis based on vibration signals. If the bearing vibration signal is collected at a high frequency, the images generated by wavelet transforming all samples occupy a large amount of storage, so a color image of size 32 × 32 is used as input.

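(2) Convolutional layer activation function improvement: the ReLU function f(z) = max(0, z) has a limitation, because its gradient during iterative updates is:

$$\frac{\partial f(z)}{\partial z} = \begin{cases} 1, & z > 0 \\ 0, & z \le 0 \end{cases} \qquad (7)$$

so a unit whose input is negative receives a zero gradient and stops updating. The PReLU variant of ReLU is used instead:

$$f(z) = \begin{cases} z, & z > 0 \\ a z, & z \le 0 \end{cases} \qquad (8)$$

Unlike ReLU, PReLU is a linear function with slope a when z < 0, and its gradient is:

$$\frac{\partial f(z)}{\partial z} = \begin{cases} 1, & z > 0 \\ a, & z \le 0 \end{cases} \qquad (9)$$

The value of a is updated continuously by back-propagation and is optimized iteratively together with the weights and biases of the network.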
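The behaviour of PReLU and the way its slope a can be learned are illustrated below. This is a scalar sketch under stated assumptions: the initial slope 0.25, the learning rate and the function names are illustrative choices, not values given in the text.

```python
def prelu(z: float, a: float) -> float:
    """PReLU activation: identity for z > 0, slope a otherwise."""
    return z if z > 0 else a * z


def prelu_grad_input(z: float, a: float) -> float:
    """Derivative of PReLU with respect to its input z."""
    return 1.0 if z > 0 else a


def prelu_grad_slope(z: float) -> float:
    """Derivative of PReLU with respect to the learnable slope a."""
    return 0.0 if z > 0 else z


def update_slope(a: float, z: float, upstream_grad: float, lr: float) -> float:
    """One gradient-descent step on a, given the gradient arriving from the next layer."""
    return a - lr * upstream_grad * prelu_grad_slope(z)


# A negative input still passes a non-zero gradient, unlike plain ReLU.
new_a = update_slope(a=0.25, z=-2.0, upstream_grad=1.0, lr=0.1)
```

For z = -2 the input gradient is a rather than 0, so the unit keeps learning, and the slope itself moves according to the upstream gradient.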

(3) Fully connected layer and output layer improvement: the bearing fault diagnosis covers 1 normal type and 3 fault types, i.e. four classes, so the output layer of the improved AlexNet structure is set to size 4; because the output layer is smaller, the second fully connected layer is set to size 1000.

Further, the improved AlexNet structure includes:

(1) Convolutional layer

The convolutional layer is connected to the previous layer through local connections and weight sharing. The convolution operation is:

$$h_j = f\left( \sum_i x_i * W_{ij} + b_j \right) \qquad (10)$$

where h_j is the j-th output feature map of the current convolutional layer; x_i is the i-th output feature map of the previous layer, i.e. the i-th input of the current layer; * denotes the convolution operation; the parameter matrix W_ij is the convolution kernel that maps the i-th input feature map to the j-th output feature map of the current layer; b_j is the bias of the j-th output feature map of the current layer; and f(·) is the nonlinear activation function, here the PReLU function of equation (8).
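The computation of one output feature map, i.e. summing the convolutions of all input maps with their kernels, adding the bias and applying the activation, can be sketched in plain Python. The sketch assumes "valid" borders, stride 1 and the cross-correlation form that CNN frameworks conventionally call convolution; the function names and the slope 0.25 are illustrative.

```python
def correlate_valid(x, w):
    """Slide kernel w over 2-D input x without padding (stride 1)."""
    kh, kw = len(w), len(w[0])
    return [[sum(x[r + i][c + j] * w[i][j] for i in range(kh) for j in range(kw))
             for c in range(len(x[0]) - kw + 1)]
            for r in range(len(x) - kh + 1)]


def prelu(z, a):
    """PReLU activation: identity for z > 0, slope a otherwise."""
    return z if z > 0 else a * z


def output_feature_map(inputs, kernels, bias, a=0.25):
    """h_j = f(sum_i x_i * W_ij + b_j) for a single output map j."""
    maps = [correlate_valid(x_i, w_ij) for x_i, w_ij in zip(inputs, kernels)]
    # Add the per-input results element-wise, then the shared bias b_j.
    summed = [[sum(vals) + bias for vals in zip(*rows)] for rows in zip(*maps)]
    return [[prelu(v, a) for v in row] for row in summed]


x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
w = [[1, 0], [0, -1]]          # responds to the diagonal difference
h = output_feature_map([x], [w], bias=0.0)
```

The same kernel is reused at every position of the input, which is the weight sharing mentioned in the text.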

(2) Pooling layer

The pooling layer downsamples after the convolution operation and further reduces the dimension of the extracted features. Max pooling is used, which extracts the maximum values from the convolution output layer Y_cn as follows:

$$P_{cn} = \max_{S^{M \times N}} \left( Y_{cn} \right) \qquad (11)$$

where S^(M×N) is the pooling scale matrix and M and N are the dimensions of S. During pooling, the maximum is extracted from each M × N region of Y_cn, moving by a fixed stride until the whole of Y_cn has been scanned. If S is a 3 × 3 matrix, the number of parameters in Y_cn is reduced to 1/9, and the results are assigned to P_cn in the pooling output layer.
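A small sketch of max pooling follows. It assumes that the stride equals the window size (3), which is what the stated reduction to 1/9 implies; the function name and the 6 × 6 toy input are illustrative.

```python
def max_pool(y, size=3, stride=3):
    """Take the maximum of every size x size window of the 2-D list y."""
    rows = range(0, len(y) - size + 1, stride)
    cols = range(0, len(y[0]) - size + 1, stride)
    return [[max(y[r + i][c + j] for i in range(size) for j in range(size))
             for c in cols]
            for r in rows]


# 6 x 6 toy feature map holding the values 0..35 row by row.
y = [[r * 6 + c for c in range(6)] for r in range(6)]
p = max_pool(y)
```

Here the 36 input values shrink to 4 outputs, one per non-overlapping 3 × 3 window.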

(3) Fully connected layer

After the last convolutional and pooling layers, the features reach the Flatten layer, which flattens the data into one dimension, and then pass through two fully connected layers, in which each neuron is connected to all neurons of the previous layer. Dropout with a rate of 0.5 is applied to the output of both fully connected layers: a random subset of units is not updated, which is equivalent to the network discarding them at random. The network structure therefore changes at every iteration, which has an effect similar to ensemble learning over networks of many structures; averaging over these networks effectively prevents overfitting.

The last layer is the output layer. A Softmax classifier is used to classify multiple fault types. An input picture in the training dataset is denoted x, and the label y_k denotes the probability that x belongs to category k, where y ∈ {1, 2, ..., J} denotes the fault category. For each x, Softmax estimates the probability p(y = j | x) for each category j ∈ {1, 2, ..., J}. The Softmax activation function is:

$$p(y = j \mid x; \theta) = \frac{e^{\theta_j^{T} x}}{\sum_{i=1}^{J} e^{\theta_i^{T} x}} \qquad (12)$$

where θ is the weight matrix of the Softmax layer and θ_i is the i-th row vector of θ.
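The Softmax layer can be illustrated with a few lines of Python. The weight rows in `theta` and the feature vector below are made-up numbers for four classes, and subtracting the maximum logit is a standard numerical-stability step that is not mentioned in the text.

```python
import math


def softmax(logits):
    """Turn a list of scores into probabilities that sum to 1."""
    shift = max(logits)                      # avoids overflow in exp
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_probabilities(theta, x):
    """p(y = j | x) for each class j, with theta holding one weight row per class."""
    return softmax([sum(w * v for w, v in zip(row, x)) for row in theta])


# Hypothetical weights for 4 classes and a 2-dimensional feature vector.
theta = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [-1.0, -1.0]]
probs = class_probabilities(theta, [2.0, 1.0])
predicted_class = probs.index(max(probs))
```

The predicted fault category is simply the class with the largest probability.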

(4) Parameter update

To suit the multi-class fault diagnosis task, the loss function is the cross-entropy loss:

$$\mathcal{L}(W, b) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{J} y_k^{(i)} \log \hat{y}_k^{(i)} + \frac{\lambda}{2} \sum_{l} \left\| W^{(l)} \right\|_F^2 \qquad (13)$$

where m is the number of training samples; ŷ_k^(i) is the predicted probability that the i-th sample belongs to class k; y_k^(i) is the actual probability, equal to 1 if the true class of the i-th sample is k and 0 otherwise; and W^(l) is the parameter matrix of the l-th layer. The first term measures the cross entropy between the prediction ŷ^(i) and the true class y^(i): the closer the predicted value is to the true value, the smaller the cross entropy and hence the loss. The second term is the L2 regularization term, and the coefficient λ is the weight decay parameter.

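The model is trained by stochastic gradient descent. In each iteration the weights W and biases b are updated as:

$$W^{(l)} \leftarrow W^{(l)} - \alpha \frac{\partial \mathcal{L}}{\partial W^{(l)}}, \qquad b^{(l)} \leftarrow b^{(l)} - \alpha \frac{\partial \mathcal{L}}{\partial b^{(l)}} \qquad (14)$$

where 𝓛 is the loss function of equation (13) and α is the learning rate, which controls the size of the gradient step in each iteration. The residual produced by the loss function at the j-th node of the l-th layer is denoted δ_j^(l). Writing z^(l) = W^(l) a^(l-1) + b^(l) for the pre-activation and a^(l) = f(z^(l)) for the output of the l-th layer, the recurrence formula is:

$$\delta^{(l)} = \left( \left( W^{(l+1)} \right)^{T} \delta^{(l+1)} \right) \odot f'\!\left( z^{(l)} \right) \qquad (15)$$

For a single training sample, the gradients of the loss function with respect to the parameters are:

$$\frac{\partial \mathcal{L}}{\partial W^{(l)}} = \delta^{(l)} \left( a^{(l-1)} \right)^{T} + \lambda W^{(l)}, \qquad \frac{\partial \mathcal{L}}{\partial b^{(l)}} = \delta^{(l)} \qquad (16)$$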

In equation (13), y_k takes the value 1 for exactly one class k and 0 for the rest. Let the true class be c. For a single sample the cross-entropy term then reduces to:

$$\mathcal{L} = -\log \hat{y}_c \qquad (17)$$

Combining this with the Softmax activation function of equation (12) gives the residual of the last layer:

$$\delta_k^{(L)} = \hat{y}_k - y_k \qquad (18)$$

The residuals δ^(L-1), ..., δ^(1) of the other layers are calculated from the recurrence formula (15).
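For a Softmax output trained with cross-entropy, the residual of the last layer works out to the predicted probabilities minus the one-hot label. The sketch below checks this numerically by comparing the analytic residual with a central finite-difference gradient of the loss with respect to the logits. The logits, the step size `eps` and the function names are illustrative assumptions.

```python
import math


def softmax(z):
    shift = max(z)
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(z, true_class):
    """Loss for one sample with a one-hot label: -log of the true-class probability."""
    return -math.log(softmax(z)[true_class])


def output_residual(z, true_class):
    """Analytic gradient of the loss with respect to the logits: y_hat - y."""
    return [p - (1.0 if k == true_class else 0.0)
            for k, p in enumerate(softmax(z))]


def numeric_residual(z, true_class, eps=1e-6):
    """Central finite-difference estimate of the same gradient."""
    grads = []
    for k in range(len(z)):
        up = z[:k] + [z[k] + eps] + z[k + 1:]
        down = z[:k] + [z[k] - eps] + z[k + 1:]
        grads.append((cross_entropy(up, true_class)
                      - cross_entropy(down, true_class)) / (2 * eps))
    return grads


logits = [2.0, 1.0, 0.1, -1.0]
analytic = output_residual(logits, true_class=0)
numeric = numeric_residual(logits, true_class=0)
```

The two gradients agree closely, and the residual of the true class is negative, so a gradient-descent step raises that class's logit.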

The bearing fault feature extraction model is built with the Python-based Keras deep learning framework using the TensorFlow back end. The SGD optimizer, the cross-entropy loss function and the normalization method of Keras are used to train the parameters.

Further, in step four, the LGBM fault diagnosis includes gradient-based one-side sampling and exclusive feature bundling, as follows:

(1) Gradient-based one-side sampling (GOSS). Training concentrates on the instances with large gradients; instances with small gradients are randomly sampled, and the effect of this sampling on the data distribution is compensated by a constant multiplier when the information gain is computed. The GOSS algorithm is as follows:

Input: training data I with n instances {x₁, ..., x_n}, the number of iterations d, the sampling rates a and b for large-gradient and small-gradient data, the loss function loss and the weak learner L.

Output: a strong learner.

Step 1: initialization: let topN = a × len(I) be the number of large-gradient samples; add L to the model list models, and set the weight w of every training instance to 1;

Step 2: predict the training data with the model list, compute the loss g of each instance with the loss function loss, and sort the training data in descending order of g;

Step 3: take the first topN sorted instances as the large-gradient subset A, randomly draw b × |A^C| instances from the remaining data A^C as the small-gradient subset B, and combine the two subsets into usedSet;

Step 4: multiply the weight w of the small-gradient samples by the factor (1-a)/b;

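Step 5: input the data of usedSet, the corresponding negative gradients -g and the weights w into the learner L for training to obtain a new model newModel, and add newModel to the model list models. During training, instances are split according to the estimated variance gain Ṽ_j(d) of feature j at split point d over the subsets A and B:

$$\tilde{V}_j(d) = \frac{1}{n} \left( \frac{\left( \sum_{x_i \in A_l} g_i + \frac{1-a}{b} \sum_{x_i \in B_l} g_i \right)^2}{n_l^j(d)} + \frac{\left( \sum_{x_i \in A_r} g_i + \frac{1-a}{b} \sum_{x_i \in B_r} g_i \right)^2}{n_r^j(d)} \right)$$

where A_l = {x_i ∈ A : x_ij ≤ d}, A_r = {x_i ∈ A : x_ij > d}, B_l = {x_i ∈ B : x_ij ≤ d}, B_r = {x_i ∈ B : x_ij > d}; n_l^j(d) and n_r^j(d) are the numbers of instances on the left and right of the split; and the coefficient (1-a)/b normalizes the sum of gradients over B to the size of A^C;

Step 6: repeat steps 2 to 5 until the number of iterations d is reached or convergence is achieved.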
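The sampling part of one GOSS round (steps 3 and 4) can be sketched as below. This is an illustration only: ranking by absolute gradient value, the fixed random seed, the function name `goss_sample` and the toy gradients are assumptions, and the weak learner itself is left out.

```python
import random


def goss_sample(gradients, a, b, seed=0):
    """Return (indices, weights) of the instances used in one GOSS round.

    The top a*n instances by absolute gradient are always kept; a fraction b
    of the remaining instances is drawn at random and up-weighted by (1-a)/b.
    """
    n = len(gradients)
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top_n = int(a * n)
    large, rest = order[:top_n], order[top_n:]
    rand_n = int(b * len(rest))
    small = random.Random(seed).sample(rest, rand_n)
    weights = {i: 1.0 for i in large}
    weights.update({i: (1.0 - a) / b for i in small})
    return large + small, weights


grads = [0.9, -0.1, 0.05, -0.8, 0.2, 0.01, -0.3, 0.6, 0.02, -0.04]
used_set, w = goss_sample(grads, a=0.2, b=0.25)
```

With 10 instances, a = 0.2 and b = 0.25, the two largest-gradient instances are kept and two of the remaining eight are drawn, each carrying weight (1 - 0.2)/0.25 = 3.2.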

(2) The exclusive feature bundling algorithm comprises two steps: bundle generation and merging of mutually exclusive features.

The bundle generation algorithm determines which mutually exclusive features can be merged; features that are merged together are called a bundle. The merging step then combines each bundle into a single feature. Bundle generation uses the GreedyBundle procedure: first, taking the features as vertices and adding an edge between every two features that are not mutually exclusive reduces the optimal bundling problem to a graph coloring problem, which is then solved with a greedy algorithm. Merging constructs the feature bundle so that mutually exclusive features fall into different bins, which is done simply by adding an offset to the values of the original features.
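The two steps can be illustrated on toy data. The sketch below is a simplification made for illustration: it bundles features in their given order rather than by graph degree, treats 0 as the "empty" value of a feature, and uses hypothetical names (`greedy_bundle`, `merge_exclusive`) and bin counts.

```python
def greedy_bundle(features, max_conflicts=0):
    """Group feature columns so that columns in a bundle are (almost) never
    non-zero on the same row. Returns a list of lists of column indices."""
    bundles = []
    for j, col in enumerate(features):
        for bundle in bundles:
            conflicts = sum(1 for k in bundle
                            for u, v in zip(features[k], col)
                            if u != 0 and v != 0)
            if conflicts <= max_conflicts:
                bundle.append(j)
                break
        else:
            bundles.append([j])
    return bundles


def merge_exclusive(columns, bin_counts):
    """Merge exclusive columns into one feature by shifting each column's
    bin values by the total number of bins of the columns before it."""
    offsets, total = [], 0
    for count in bin_counts:
        offsets.append(total)
        total += count
    merged = []
    for row in zip(*columns):
        value = 0
        for v, off in zip(row, offsets):
            if v != 0:
                value = v + off
        merged.append(value)
    return merged, offsets


feat_a = [1, 0, 3, 0]   # bins 1..3
feat_b = [0, 2, 0, 4]   # bins 1..4, never non-zero together with feat_a
feat_c = [1, 1, 0, 0]   # overlaps with both feat_a and feat_b
bundles = greedy_bundle([feat_a, feat_b, feat_c])
merged, offsets = merge_exclusive([feat_a, feat_b], bin_counts=[3, 4])
```

After merging, values 1 to 3 of the bundled feature come from the first column and values 4 to 7 from the second, so a tree split on the bundle can still tell them apart.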

The outputs of the penultimate layer of AlexNet are classified using the LGBMClassifier of Python.

(3) Bayesian hyperparameter optimization

The parameters of the LGBM training process are tuned using HyperOpt. HyperOpt provides an easy-to-use Bayesian hyperparameter optimization algorithm that performs hyperparameter optimization through sequential model-based optimization, which is a Bayesian optimization technique.

Bayesian optimization is a model-based optimization algorithm tailored to an objective function, i.e. a cost function; a Bayesian optimization search finds the maximum of an unknown objective function from samples. As with all model-based optimization algorithms, a model of the objective function is built by regression, the next point to sample is selected according to the model, and the model is then updated.

The basic algorithm of bayesian optimization is as follows:

Step 1: place a Gaussian process prior on the objective function f;

Step 2: observe f at n₀ points chosen according to an initial space-filling experimental design, and set n = n₀;

Step 3: while n ≤ N, where N is the total number of evaluations allowed, repeat: update the posterior probability distribution of f using all available data; let x_n be a maximizer of the acquisition function over x, where the acquisition function is computed from the current posterior distribution; observe y_n = f(x_n); increase n by 1;

Step 4: return a solution: either the evaluated point with the largest f(x) or the point with the largest posterior mean.

The objective function f is usually unknown. A Gaussian process assigns to each point x a Gaussian probability distribution for f(x), determined by the mean μ and the standard deviation σ, and thereby defines a probability distribution over functions:

$$P(f(x) \mid x) = \mathcal{N}\left( \mu(x), \sigma^2(x) \right)$$

where 𝒩 denotes the normal distribution.

To estimate μ(x) and σ(x), a Gaussian process is fitted to the data. Each observation f(χ) is assumed to be a sample from a normal distribution. For a dataset of several observations f(χ₁), f(χ₂), ..., f(χ_t), the vector [f(χ₁), f(χ₂), ..., f(χ_t)] is a sample from a multivariate normal distribution defined by a mean vector and a covariance matrix, so the Gaussian process is an n-variate normal distribution, where n is the number of observations. The covariance matrix is defined by the kernel function k(χ₁, χ₂), under which distant samples are almost uncorrelated and nearby samples are highly correlated. This reflects the prior assumption that the function tends to be smooth: two observations at close values χ₁ and χ₂ are likely to be correlated.

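Given a set of observations P_1:t = f(χ_1:t) and sampling noise σ²_noise, the Gaussian process posterior is calculated as follows:

$$P(f(x) \mid P_{1:t}, x) = \mathcal{N}\left( \mu_t(x), \sigma_t^2(x) \right)$$

$$\mu_t(x) = k^{T} \left[ K + \sigma_{noise}^2 I \right]^{-1} P_{1:t}$$

$$\sigma_t^2(x) = k(x, x) - k^{T} \left[ K + \sigma_{noise}^2 I \right]^{-1} k$$

where k = [k(x, χ₁) k(x, χ₂) … k(x, χ_t)] and K is the t × t matrix with entries k(χ_i, χ_j).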
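The posterior mean and variance of a Gaussian process at a query point can be computed with a few lines of dependency-free Python. The sketch assumes a zero prior mean, a squared-exponential kernel with unit variance, and a small noise term; the function names, the kernel choice and the toy observations are illustrative and not taken from the text.

```python
import math


def rbf(x1, x2, length=1.0):
    """Squared-exponential kernel: close inputs are highly correlated."""
    return math.exp(-((x1 - x2) ** 2) / (2 * length ** 2))


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def gp_posterior(x, xs, ys, noise=1e-6, length=1.0):
    """Posterior mean and variance of f(x) given observations (xs, ys)."""
    big_k = [[rbf(p, q, length) + (noise if i == j else 0.0)
              for j, q in enumerate(xs)] for i, p in enumerate(xs)]
    k = [rbf(x, p, length) for p in xs]
    alpha = solve(big_k, ys)   # (K + noise * I)^-1 y
    beta = solve(big_k, k)     # (K + noise * I)^-1 k
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    var = rbf(x, x, length) - sum(ki * bi for ki, bi in zip(k, beta))
    return mean, max(var, 0.0)


xs, ys = [0.0, 1.0, 2.0], [0.0, 0.8, 0.9]
mean_seen, var_seen = gp_posterior(1.0, xs, ys)
mean_far, var_far = gp_posterior(10.0, xs, ys)
```

At an observed point the posterior mean reproduces the observation with almost no variance, while far from the data it falls back to the prior, which is what lets an acquisition function trade exploration against exploitation.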

The Bayesian optimization implementation uses this Gaussian process model to search for the maximum of the unknown objective function f(x). The next χ to test is selected by maximizing the acquisition function, which balances exploration, i.e. improving the model in the less explored parts of the search space, against exploitation, i.e. favoring the parts that the model predicts to be promising. After each observation the algorithm updates the Gaussian process to take the new data into account. Since all points of the search space are initially assumed to be equally promising, the Gaussian process is initialized with a constant mean. The model is refined gradually with each observation.

A Gaussian process is completely specified by its mean function μ(x) and its kernel function k(χ₁, χ₂).

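The goal is to learn the kernel hyperparameters θ, namely the characteristic length scale l² and the overall variance σ_f², that maximize the probability of the data given the kernel function. The marginal log-likelihood is calculated as follows:

$$\log p(P_{1:t} \mid \chi_{1:t}, \theta) = -\frac{1}{2} (P_{1:t} - \mu_0)^{T} \left( K^{\theta} + \sigma_{noise}^2 I \right)^{-1} (P_{1:t} - \mu_0) - \frac{1}{2} \log \left| K^{\theta} + \sigma_{noise}^2 I \right| - \frac{t}{2} \log 2\pi$$

where μ₀ is the mean function and K^θ is the kernel matrix computed with the hyperparameters θ.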

Another object of the present invention is to provide a bearing fault diagnosis system that applies the bearing fault diagnosis method, the bearing fault diagnosis system comprising:

a signal sampling module, configured to take every sample_length consecutive data points of the original vibration data as one sample and to draw samples successively at the sampling interval sample_interval in an overlapping manner;

a wavelet transform signal processing module, configured to perform Morlet continuous wavelet transform signal processing: applying the continuous wavelet transform to each sample to generate the corresponding time-frequency image, resizing the image to a color picture of N × N pixels, and generating enough pictures to be divided into a training set and a test set; for the training process, the AlexNet feature extraction module is executed; for the test process, control passes to the test module;

an AlexNet feature extraction module, configured to input the N × N time-frequency images of the training set into the improved AlexNet model for training and to save the model;

an LGBM fault diagnosis module, configured to input the N × N time-frequency images of the training set into the trained AlexNet model, take out the output of the penultimate fully connected layer and input it into the LGBM model for training, the data dimension being sample_num × 1000, where sample_num is the number of samples and 1000 is the number of neurons in the second fully connected layer of the AlexNet model;

a test module, configured to input the N × N time-frequency images of the test set into the trained AlexNet model, take the output of its second fully connected layer as the features extracted by AlexNet, and input these features into the trained LGBM model, whose output is the fault diagnosis result.

It is a further object of the present invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of:

(1) Signal sampling: every sample_length consecutive data points of the original vibration data are taken as one sample, and samples are drawn successively at the sampling interval sample_interval in an overlapping manner;

(2) Morlet continuous wavelet transform signal processing: the continuous wavelet transform is applied to each sample to generate the corresponding time-frequency image, which is resized to a color picture of N × N pixels; enough pictures are generated to be divided into a training set and a test set; for the training process, step (3) is performed; for the test process, the procedure jumps to step (5);

(3) AlexNet feature extraction: the N × N time-frequency images of the training set are input into the improved AlexNet model for training, and the model is saved;

(4) LGBM fault diagnosis: the N × N time-frequency images of the training set are input into the trained AlexNet model, and the output of the penultimate fully connected layer is taken out and input into the LGBM model for training; the data dimension is sample_num × 1000, where sample_num is the number of samples and 1000 is the number of neurons in the second fully connected layer of the AlexNet model;

(5) Testing process: the N × N time-frequency images of the test set are input into the trained AlexNet model, the output of its second fully connected layer is taken as the features extracted by AlexNet, and these features are input into the trained LGBM model, whose output is the fault diagnosis result.

Another object of the present invention is to provide an information data processing terminal for implementing the bearing failure diagnosis system.

Combining all the technical schemes above, the advantages and positive effects of the invention are as follows. To address the problem that the classification capability of the Softmax layer of a CNN is inferior to that of newer machine learning classification methods, the invention provides a bearing fault diagnosis method based on the continuous wavelet transform and an AlexNet-Light Gradient Boosted Machine fusion model (AlexNet-LGBM). The method has three parts: ① vibration signal processing based on the continuous wavelet transform: time-frequency features are extracted from the original vibration signal of the bearing using the continuous wavelet transform and converted into a two-dimensional image of 32 × 32 pixels; ② fault feature extraction: an improved AlexNet model extracts features from the time-frequency image; ③ fault diagnosis: the extracted fault features are classified with the LGBM classification algorithm, and the optimal model parameters are selected by Bayesian optimization. The invention also carries out comparison experiments on the Case Western Reserve University (CWRU) bearing dataset, comparing the various combinations of the improved AlexNet and LeNet-5 with multi-grained cascade forest, LGBM and CatBoost. The results show that the proposed AlexNet-LGBM fault diagnosis method based on the continuous wavelet transform has the best fault diagnosis accuracy among the methods compared.

The bearing fault diagnosis method provided by the invention has the following advantages:

(1) For equipment fault feature extraction, the vibration data is first converted into a time-frequency diagram by continuous wavelet transform (CWT). To adapt AlexNet to bearing fault feature extraction, the model is improved as follows: ① the input dimension is changed to 32 × 32 × 3 to reduce the memory occupied by the time-frequency diagrams; ② the convolutional layer activation function is replaced by the parametric rectified linear unit (PReLU) to overcome the limitations of the rectified linear unit (ReLU); ③ the fully connected layers and the output layer are resized to suit the number of fault classes. The improved AlexNet, LeNet-5 and an EfficientNet-B0 transfer-learning model were each used for feature extraction, and the feature extraction capabilities of the three neural network structures were compared.

(2) For equipment fault diagnosis, a fault diagnosis method based on continuous wavelet transformation and an AlexNet-Light Gradient Boosting Machine fusion model (AlexNet-LGBM) is proposed: fault features are first extracted from the vibration signals by continuous wavelet transformation and the improved AlexNet, the extracted features are then classified by the Light Gradient Boosting Machine classification algorithm, and the model parameters are optimized by Bayesian optimization. Various combinations of improved AlexNet or LeNet-5 feature extraction with the multi-grained cascade forest (gcForest), LGBM and CatBoost classification algorithms were also compared.

In order to solve the problems of fault feature extraction and fault diagnosis for rolling bearings, the invention provides a bearing fault diagnosis method based on continuous wavelet transformation and AlexNet-LGBM. Compared with 7 other methods, the method has the highest accuracy, 99.712%; its prediction time for 1800 samples is 1.47 seconds, the same order of magnitude as the other models; and the variance of its accuracy over five predictions is only 0.063, making it more stable than 6 of the other methods. The method therefore has the best overall performance among the compared methods.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments of the present invention will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flowchart of a bearing fault diagnosis method provided by an embodiment of the present invention.

FIG. 2 is a block diagram of a bearing fault diagnosis system provided by an embodiment of the present invention;

In the figure: 1. a signal sampling module; 2. a wavelet transformation signal processing module; 3. AlexNet feature extraction module; 4. LGBM fault diagnosis module; 5. and a test module.

Fig. 3 is a flow chart of processing a vibration signal of a bearing according to an embodiment of the present invention.

Fig. 4 is a schematic diagram of the Morlet continuous wavelet transform effect provided by the embodiment of the invention.

Fig. 5 is a schematic structural diagram of the improved AlexNet provided by an embodiment of the present invention.

Fig. 6 is a flow chart of bearing fault diagnosis provided by an embodiment of the present invention.

Fig. 7 is a schematic diagram of a continuous wavelet transform processing result according to an embodiment of the present invention.

Fig. 8 is a graph showing the accuracy variation of the improved AlexNet, LeNet-5 and EfficientNet provided by an embodiment of the present invention.

Fig. 9 is a schematic diagram of the loss variation of the improved AlexNet, LeNet-5 and EfficientNet provided by an embodiment of the present invention.

Fig. 10 is a schematic diagram of a t-SNE visualization of the extracted features provided by an embodiment of the present invention.

FIG. 11 is a schematic diagram of the accuracy of a test set of 5 experiments for six combined models provided by an embodiment of the present invention.

Fig. 12 is a schematic diagram of average accuracy of 5 experimental test sets of six combined models provided in an embodiment of the present invention.

Fig. 13 is a schematic diagram of the average time consumption of the test set of 5 experimental predictions of six combined models provided in an embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the following examples in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

Aiming at the problems existing in the prior art, the invention provides a bearing fault diagnosis method, a system, equipment and a terminal, and the invention is described in detail below with reference to the accompanying drawings.

As shown in fig. 1, the bearing fault diagnosis method provided by the embodiment of the invention includes the following steps:

S101, signal sampling: taking each sample_length of continuous data points as one sample of the original vibration data, and continuously sampling according to the sampling interval sample_interval in an overlapped sampling mode;

S102, Morlet continuous wavelet transformation signal processing: performing continuous wavelet transformation on each sample to generate a corresponding time-frequency image, resizing the time-frequency image into a color picture of size N × N, and generating enough pictures to be divided into a training set and a test set; for the training process, S103 is performed; for the test process, jump to S105;

S103, AlexNet feature extraction: inputting the N × N time-frequency diagrams of the training set into the improved AlexNet model for training, and saving the model;

S104, LGBM fault diagnosis: inputting the N × N time-frequency diagrams of the training set into the trained AlexNet model, taking out the output of the penultimate fully connected layer, and inputting it into the LGBM model for training, wherein the data dimension is sample_num × 1000, sample_num represents the number of samples, and 1000 is the number of neurons of the second fully connected layer of the AlexNet model;

S105, testing process: inputting a time-frequency diagram with the size of N multiplied by N of the test set into a trained AlexNet model, taking out the output of a second full-connection layer of the AlexNet model as the characteristics extracted by AlexNet, and inputting a trained LGBM model, wherein the output of the LGBM model is the fault diagnosis result.

As shown in fig. 2, the bearing fault diagnosis system provided by the embodiment of the invention includes:

The signal sampling module 1 is used for continuously sampling original vibration data according to a sampling interval sample_interval in an overlapped sampling mode by taking each sample_length continuous data point as a sample;

The wavelet transformation signal processing module 2 is used for performing Morlet continuous wavelet transformation signal processing, performing continuous wavelet transformation on each sample to generate a corresponding time-frequency image, readjusting the time-frequency image to a color picture with the size of N multiplied by N, and generating enough pictures to be divided into a training set and a test set; for a training process, executing the AlexNet feature extraction module; for a test process, jumping to the test module;

The AlexNet feature extraction module 3 is used for inputting the N × N time-frequency diagrams of the training set into the improved AlexNet model for training, and saving the model;

The LGBM fault diagnosis module 4 is used for inputting the N × N time-frequency diagrams of the training set into the trained AlexNet model, taking out the output of the penultimate fully connected layer, and inputting it into the LGBM model for training, wherein the data dimension is sample_num × 1000, sample_num represents the number of samples, and 1000 is the number of neurons of the second fully connected layer of the AlexNet model;

and the test module 5 is used for inputting a time-frequency diagram with the size of N multiplied by N of the test set into a trained AlexNet model, taking out the output of the second full-connection layer of the AlexNet model as the characteristics extracted by AlexNet, inputting the trained LGBM model, and outputting the LGBM model as a fault diagnosis result.

The technical scheme of the invention is further described below with reference to specific embodiments.

Aiming at the problems existing in the prior art, the invention provides a bearing fault diagnosis method based on continuous wavelet transformation and an AlexNet-Light Gradient Boosting Machine fusion model. Firstly, time-frequency characteristics are extracted from the original vibration signal of the bearing by continuous wavelet transformation and converted into a two-dimensional image of 32 × 32 pixels; secondly, an improved AlexNet model extracts fault features from the time-frequency spectrogram; finally, fault diagnosis classification is performed by the LGBM classification algorithm, with the optimal model parameters selected by Bayesian optimization.

1. Signal processing

1.1 Vibration Signal processing flow

First, sample_length consecutive points (set to 1024 in the present experiment) are selected from the original vibration signal as one original sample. A continuous wavelet transformation is then applied to these sample_length points to generate the corresponding time-frequency image, which is resized to a suitable size N × N (set to 32 × 32 in the present experiment). Next, the sample_length consecutive data points starting one sampling interval sample_interval (set to 384 in the present experiment) later are selected, in an overlapping manner, as another sample, and another image of size N × N is generated, as shown in Fig. 3. This process is repeated to produce enough training and test images.
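The overlapped sampling described above can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments; the function name `overlapped_samples` is hypothetical, and the defaults simply mirror the values of sample_length and sample_interval quoted in the text.

```python
def overlapped_samples(signal, sample_length=1024, sample_interval=384):
    """Slice a 1-D signal into overlapping windows of sample_length points.

    Consecutive windows start sample_interval points apart, so neighbouring
    windows share sample_length - sample_interval points.
    """
    last_start = len(signal) - sample_length
    return [signal[start:start + sample_length]
            for start in range(0, last_start + 1, sample_interval)]


# Usage: a synthetic record with as many points as a roughly 20 s recording at 12 kHz.
record = list(range(240000))
samples = overlapped_samples(record)
print(len(samples), len(samples[0]), samples[1][0])
```

With these settings each window overlaps its predecessor by 640 points, which is what allows several hundred samples to be cut from a single recording.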

1.2 Morlet continuous wavelet transform signal processing

Among the different wavelets, a complex (analytic) wavelet has a Fourier transform that is zero at negative frequencies. With such complex wavelets, the phase and amplitude components of the signal can be separated. Morlet is the most commonly used complex wavelet; continuous wavelet analysis with the Morlet complex wavelet has the advantage that information can be separated in the wavelet domain and that the relationship between the transform ridges and the instantaneous frequency is simpler. The invention uses the Morlet wavelet to process the bearing vibration signal. The Morlet wavelet is defined as:

$$\psi(t) = \pi^{-1/4}\left(e^{i 2\pi f_0 t} - e^{-(2\pi f_0)^2/2}\right) e^{-t^2/2} \tag{2}$$

where f₀ is the center frequency of the mother wavelet. The second term in brackets is called the correction term, because it corrects for the non-zero mean of the complex sinusoid (the first term in brackets) multiplied by the Gaussian term. In practice, the correction term is negligible for f₀ ≫ 0, in which case the Morlet wavelet can be expressed as follows:

$$\psi(t) = \pi^{-1/4}\, e^{i 2\pi f_0 t}\, e^{-t^2/2} \tag{3}$$

This wavelet is simply a complex sinusoid exp(i2πf₀t) within a Gaussian envelope exp(−t²/2). The term π^(−1/4) is a normalization factor that ensures that the wavelet has unit energy. Strictly speaking, the function given by equation (3) is not a true wavelet, because it has a non-zero mean, i.e. the zero-frequency term of its energy spectrum is non-zero, and it therefore does not satisfy the admissibility condition. In practice, however, it can be used with minimal error when f₀ ≫ 0.

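The Fourier transform of the Morlet wavelet of equation (3) is as follows:

$$\hat{\psi}(f) = \pi^{1/4}\sqrt{2}\; e^{-\frac{1}{2}\left(2\pi f - 2\pi f_0\right)^2} \tag{4}$$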

It has the form of a Gaussian function shifted along the frequency axis by f₀. The center frequency f₀ of this Gaussian spectrum is typically taken as the characteristic frequency of the Morlet wavelet. The characteristic frequency is set for the mother wavelet and varies with the wavelet scale a as follows:

$$f = \frac{f_0}{a} \tag{5}$$

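The energy spectrum (the squared magnitude of the Fourier transform) is calculated as follows:

$$\left|\hat{\psi}(f)\right|^2 = 2\,\pi^{1/2}\, e^{-\left(2\pi f - 2\pi f_0\right)^2} \tag{6}$$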

Integrating this energy spectrum over all frequencies gives 1, which confirms that the Morlet wavelet of equation (3) has unit energy.
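The two properties stated above (unit energy, and a mean that is non-zero but negligible once f₀ is large enough) can be checked numerically. The sketch below is only an illustration under stated assumptions: the values f₀ = 1.0 and f₀ = 0.2, the integration range [−10, 10] and the trapezoidal rule are arbitrary choices, and the helper names are hypothetical.

```python
import cmath
import math


def morlet(t, f0):
    """Simplified Morlet wavelet: a complex sinusoid inside a Gaussian envelope."""
    return math.pi ** -0.25 * cmath.exp(2j * math.pi * f0 * t) * math.exp(-t * t / 2)


def integrate(func, lo=-10.0, hi=10.0, steps=4000):
    """Composite trapezoidal rule on [lo, hi]; works for complex-valued func."""
    h = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for k in range(1, steps):
        total += func(lo + k * h)
    return total * h


def energy(f0):
    """Integral of |psi(t)|^2 over time."""
    return integrate(lambda t: abs(morlet(t, f0)) ** 2)


def mean_magnitude(f0):
    """Magnitude of the integral of psi(t), i.e. of its zero-frequency component."""
    return abs(integrate(lambda t: morlet(t, f0)))


print(energy(1.0))          # close to 1: unit energy
print(mean_magnitude(1.0))  # tiny: the correction term can be dropped
print(mean_magnitude(0.2))  # clearly non-zero: the correction term matters
```

For a small center frequency the mean is far from zero, which is why the simplified form is only acceptable when f₀ is well above zero.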

The effect of applying the Morlet continuous wavelet transform to a bearing vibration signal sample is shown in Fig. 4; the time axis follows from the sampling frequency of the signal (12 kHz in the present invention).

Fig. 4(a) is the record of the acceleration measured continuously over 86 ms by the bearing acceleration sensor while the bearing rotates at high speed; the abscissa is time and the ordinate is the acceleration of the monitoring point. Under ideal conditions, the acceleration of a bearing rotating perfectly uniformly would be 0. Fig. 4(a) shows that the actual acceleration fluctuates around a mean of 0; at about 27 ms and 61 ms the acceleration is larger, the corresponding energy is larger, and the color at the corresponding time positions in the right-hand graph is brighter.

Through continuous wavelet transformation, the one-dimensional vibration signal can be converted into a picture, and the picture contains a corresponding relationship between time and frequency.
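As a rough illustration of how a one-dimensional signal becomes a time-frequency map, the sketch below evaluates a continuous wavelet transform by direct summation with the simplified Morlet wavelet. It is not the implementation used in the invention: the 1/√a normalization, the Riemann-sum approximation, the choice f₀ = 1 and the 50 Hz test tone are all assumptions made for the example.

```python
import cmath
import math


def morlet(t, f0=1.0):
    """Simplified Morlet wavelet (complex sinusoid in a Gaussian envelope)."""
    return math.pi ** -0.25 * cmath.exp(2j * math.pi * f0 * t) * math.exp(-t * t / 2)


def cwt_power(signal, scales, dt, f0=1.0):
    """Return |W(a, b)|^2 as one row per scale a and one column per time index b."""
    n = len(signal)
    rows = []
    for a in scales:
        row = []
        for b in range(n):
            acc = 0j
            for k in range(n):
                # Correlate the signal with the scaled, shifted, conjugated wavelet.
                acc += signal[k] * morlet((k - b) * dt / a, f0).conjugate()
            row.append(abs(acc * dt / math.sqrt(a)) ** 2)
        rows.append(row)
    return rows


# Usage: a 50 Hz tone sampled at 1 kHz, analysed at 25, 50 and 100 Hz.
dt = 1.0 / 1000
tone = [math.sin(2 * math.pi * 50 * k * dt) for k in range(256)]
scales = [1.0 / 25, 1.0 / 50, 1.0 / 100]   # scale a = f0 / f with f0 = 1
power = cwt_power(tone, scales, dt)
mid = len(tone) // 2
print([row[mid] for row in power])          # the 50 Hz row dominates
```

Stacking many scales gives a two-dimensional array whose rows correspond to frequency and whose columns correspond to time; resizing that array yields the N × N picture used as network input.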

1.3 AlexNet feature extraction

The AlexNet model proposed by Krizhevsky et al. achieves better performance in image recognition than other methods, and it has since played an important role in many areas. To adapt it to bearing fault feature extraction, the invention improves AlexNet as follows:

(1) Model input dimension improvement. The 224 × 224 input image size of the classical AlexNet is unnecessarily large for vibration-signal-based bearing fault diagnosis; if the vibration signal of the bearing is acquired at a high frequency, the pictures generated by wavelet transformation of all samples occupy a large amount of storage space. Therefore, the present invention takes a 32 × 32 color picture as input.

(2) Convolutional layer activation function improvement. The ReLU function, f(z) = max(0, z), has a limitation that follows from its gradient in the iterative update:

$$\frac{\partial f}{\partial z} = \begin{cases} 1, & z > 0 \\ 0, & z \le 0 \end{cases} \tag{7}$$

Since the ReLU activation function sets the gradient of negative inputs to 0, such neurons cannot take part in subsequent propagation and activation, and their parameters cannot be updated. If the learning rate is set too large in actual training, some neurons may fail and no longer update effectively, so that training fails. For this reason, the invention uses PReLU, a variant of ReLU, expressed as:

$$f(z) = \begin{cases} z, & z > 0 \\ a z, & z \le 0 \end{cases} \tag{8}$$

Unlike ReLU, PReLU is a linear function with slope a (a small constant) for z < 0. Its gradient is calculated as follows:

$$\frac{\partial f}{\partial z} = \begin{cases} 1, & z > 0 \\ a, & z \le 0 \end{cases} \tag{9}$$

PReLU greatly reduces the loss of negative gradient information while still retaining one-sided suppression. The value of a is updated continuously through back propagation and is optimized iteratively together with the weights and biases of the network.
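The difference between the two activation functions can be made concrete with a few scalar helpers. This is a minimal sketch; the slope value 0.25 used in the usage lines is only an illustrative choice, and the function names are hypothetical.

```python
def relu(z):
    """ReLU: passes positive inputs and zeroes out the rest."""
    return max(0.0, z)


def relu_grad(z):
    """Gradient of ReLU with respect to z: zero for every non-positive input."""
    return 1.0 if z > 0 else 0.0


def prelu(z, a):
    """PReLU: identity for positive inputs, slope a for the rest."""
    return z if z > 0 else a * z


def prelu_grad_z(z, a):
    """Gradient of PReLU with respect to z: negative inputs still pass a gradient of a."""
    return 1.0 if z > 0 else a


def prelu_grad_a(z):
    """Gradient of PReLU with respect to the learnable slope a."""
    return 0.0 if z > 0 else z


# Usage: a negative pre-activation blocks the gradient in ReLU but not in PReLU.
print(relu_grad(-2.0), prelu_grad_z(-2.0, 0.25), prelu_grad_a(-2.0))
```

Because the gradient with respect to a is non-zero for negative inputs, the slope itself can be trained by back propagation along with the weights.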

(3) Fully connected layer and output layer improvements. Since the bearing fault diagnosis studied by the present invention covers 1 normal type and 3 fault types (described in detail in Section 4), i.e. a four-class problem, the output layer size of the improved AlexNet structure is set to 4. Because the output layer becomes smaller, the size of the second fully connected layer is set to 1000 to better extract key features. The improved AlexNet structure proposed by the present invention is shown in Fig. 5.

1.3.1 Convolutional layers

The convolutional layer is connected to the previous layer through local connections and weight sharing, which greatly reduces the number of parameters. The convolution operation is as follows:

$$h_j = f\left(\sum_i X_i * W_{ij} + b_j\right) \tag{10}$$

where h_j denotes the j-th output feature map of the current convolutional layer, X_i denotes the i-th output feature map of the previous layer (the input of the current convolutional layer), * denotes the convolution operation, W_ij is the convolution kernel that maps the i-th input feature map to the j-th output feature map of the current layer, and b_j is the bias corresponding to the j-th output feature map of the current layer. f(x) is a nonlinear activation function, which in the present invention is the PReLU function shown in equation (8).

1.3.2 Pooling layer

The gray blocks in Fig. 5 are pooling layers, which downsample the output of the convolution operation and further reduce the dimension of the extracted features. Common pooling layers include max pooling and average pooling. The present invention selects max pooling, which extracts the maximum value from the convolution output layer Y_cn as follows:

$$P_{cn} = \max_{S^{M \times N}} Y_{cn} \tag{11}$$

where S^(M×N) is the pooling window, and M and N are the dimensions of S. During pooling, the maximum value is extracted from each M × N window of Y_cn, and the window moves by a fixed step size until the whole of Y_cn has been scanned. In the present invention S is a 3 × 3 matrix, so the number of parameters in Y_cn is reduced to 1/9, and the results are assigned to P_cn in the pooling output layer.
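A small sketch of max pooling on a single feature map follows. It assumes a stride equal to the window size (3), which is what the stated 1/9 reduction implies; the function name and the 6 × 6 example input are hypothetical.

```python
def max_pool(feature_map, size=3, stride=3):
    """Max pooling over a 2-D list: keep the largest value of each window."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, stride):
        pooled.append([
            max(feature_map[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, cols - size + 1, stride)
        ])
    return pooled


# Usage: a 6 x 6 map (36 values) shrinks to 2 x 2 (4 values), i.e. to 1/9.
feature_map = [[r * 6 + c for c in range(6)] for r in range(6)]
pooled = max_pool(feature_map)
print(pooled)
```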

1.3.3 Full connection layer

After the last convolutional layer and pooling layer, the features reach the Flatten layer, which flattens the data into one dimension, and then pass through the two fully connected layers. Each neuron in a fully connected layer is connected to all neurons in the previous layer. Dropout with a rate of 0.5 is applied to the outputs of the two fully connected layers, so that some units are not updated, which is equivalent to their being randomly discarded by the network. The structure of the network therefore changes after each iteration, which has the effect of ensemble learning over networks with various structures; averaging over these networks effectively prevents overfitting.

The last layer is the output layer. To classify multiple fault types, a Softmax classifier is used, which handles multi-class problems effectively. An input picture in the training dataset is denoted x and its label y, where y ∈ {1, 2, ..., J} denotes the fault category. For each x, Softmax estimates the probability p(y = j | x) of each label j ∈ {1, 2, ..., J}. The Softmax activation function is expressed as follows:

$$p(y = j \mid x) = \frac{e^{z_j}}{\sum_{l=1}^{J} e^{z_l}} \tag{12}$$

where z_j is the input of the output layer for category j.
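A minimal Softmax sketch is given below. Subtracting the largest input before exponentiating is a common numerical safeguard added here for illustration, not something stated in the text; the four example inputs stand for the four bearing classes and are arbitrary.

```python
import math


def softmax(logits):
    """Turn output-layer inputs into class probabilities that sum to 1."""
    peak = max(logits)                           # shift for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Usage: four output-layer inputs, one per class.
probs = softmax([2.0, 1.0, 0.1, -1.0])
print(probs, probs.index(max(probs)))
```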

1.3.4 Parameter update

To accommodate the multi-class fault diagnosis task of the present invention, the loss function is set to the cross-entropy loss function, expressed as:

$$J(W) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k} y_k^{(i)} \log \hat{y}_k^{(i)} + \frac{\lambda}{2}\sum_{l}\left\|W^{(l)}\right\|^2 \tag{13}$$

where ŷ_k^(i) denotes the predicted probability that the i-th sample belongs to class k, y_k^(i) is the actual probability (1 if the actual class of the i-th sample is k, otherwise 0), m is the number of samples, and W^(l) is the parameter matrix of the l-th layer. The first term measures the cross entropy between the prediction ŷ and the true category y: the closer the prediction is to the true value, the smaller the cross entropy and hence the loss. The second term is an L2 regularization term whose coefficient λ is the weight decay parameter; it balances the relative weight of the two terms and effectively prevents overfitting.

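The parameters are updated by gradient descent,

$$W^{(l)} \leftarrow W^{(l)} - \alpha \frac{\partial J}{\partial W^{(l)}} \tag{14}$$

where J is the loss function and α is the learning rate, which controls the magnitude of the parameter change in each iteration. The residual generated by the loss function at the j-th node of the l-th layer is denoted δ_j^(l); its recurrence formula can be expressed as: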

The gradient of the loss function with respect to the parameters can be written as:

The residuals δ^(L-1), ..., δ^(1) of the other layers can be calculated according to the recurrence formula (15).
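The loss and the update rule of this subsection can be sketched as follows. This is an illustration under explicit assumptions: the cross-entropy term is averaged over the samples and the L2 term is scaled by λ/2 (the text does not fix these constants), labels are given as class indices, and the weights are flattened into a single list. The function names are hypothetical.

```python
import math


def regularized_cross_entropy(probs, labels, weights, lam):
    """Mean cross entropy of the true classes plus an L2 penalty (assumed lam / 2 scaling)."""
    data_term = -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(probs)
    reg_term = 0.5 * lam * sum(w * w for w in weights)
    return data_term + reg_term


def sgd_step(weights, grads, alpha):
    """One gradient-descent update: move each weight against its gradient."""
    return [w - alpha * g for w, g in zip(weights, grads)]


# Usage: two samples, four classes, two weights, learning rate 0.001.
probs = [[0.7, 0.1, 0.1, 0.1], [0.25, 0.25, 0.25, 0.25]]
labels = [0, 2]
loss = regularized_cross_entropy(probs, labels, weights=[1.0, -2.0], lam=0.1)
updated = sgd_step([1.0, -2.0], [0.5, -0.5], alpha=0.001)
print(loss, updated)
```

A confident correct prediction (0.7 for the true class) contributes less to the loss than an uninformative one (0.25), which is the behaviour the cross-entropy term is meant to reward.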

For the experiments, the invention constructs the bearing fault feature extraction model with the Python-based Keras deep learning framework running on the TensorFlow backend. The SGD optimizer, cross-entropy loss function and normalization method of Keras are selected to train the parameters.

1.4 LGBM fault classification

The LGBM classification algorithm mainly comprises gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB).

Output: a strong learner.

Step 1: Initialization: let topN = a × len(I) denote the number of large-gradient data samples, where I is the training data and a is the sampling ratio of large-gradient data. The learner L is added to the model list models. The weight w of each training instance is set to 1.

Step 2: The model list predicts the training data, and the gradient g of each instance is calculated using the loss function loss. The training data are sorted in descending order of the absolute value of g.

Step 3: The first topN instances of the sorted training data form the large-gradient subset A, and b × |A^C| instances are randomly drawn from the remaining data A^C as the small-gradient subset B. The union of the large-gradient and small-gradient subsets is denoted usedSet.

Step 4: The weight w of each small-gradient sample is multiplied by the factor (1-a)/b.

Step 5: The training data usedSet is input into the learner L for training to obtain a new model newModel.

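The instances are split according to the estimated variance gain Ṽ_j(d) of feature j at split point d over the subset A ∪ B:

$$\tilde{V}_j(d) = \frac{1}{n}\left(\frac{\left(\sum_{x_i \in A_l} g_i + \frac{1-a}{b}\sum_{x_i \in B_l} g_i\right)^2}{n_l^j(d)} + \frac{\left(\sum_{x_i \in A_r} g_i + \frac{1-a}{b}\sum_{x_i \in B_r} g_i\right)^2}{n_r^j(d)}\right)$$

where A_l = {x_i ∈ A : x_ij ≤ d}, A_r = {x_i ∈ A : x_ij > d}, B_l = {x_i ∈ B : x_ij ≤ d}, B_r = {x_i ∈ B : x_ij > d}, n is the number of instances, n_l^j(d) and n_r^j(d) are the numbers of instances on the left and right of the split, and the coefficient (1-a)/b normalizes the sum of gradients over B back to the size of A^C. The newModel is then added to the model list models.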
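The sampling part of Steps 1 to 4 can be sketched as below. This is a simplified illustration, not LightGBM's internal code: the function name and the `seed` parameter are hypothetical, and the size of the small-gradient subset follows the b × |A^C| stated in Step 3 (the pseudocode published with LightGBM draws b × len(I) instances instead).

```python
import random


def goss_sample(gradients, a, b, seed=0):
    """Gradient-based one-side sampling: return (used indices, weight per index)."""
    rng = random.Random(seed)
    n = len(gradients)
    # Sort instance indices by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top_n = int(a * n)
    large = order[:top_n]                      # subset A: keep every large gradient
    rest = order[top_n:]                       # A^C: the small-gradient remainder
    small = rng.sample(rest, int(b * len(rest)))   # subset B: random draw from A^C
    weights = {i: 1.0 for i in large}
    weights.update({i: (1.0 - a) / b for i in small})  # amplify the sampled small gradients
    return large + small, weights


# Usage: 100 instances whose gradient magnitude grows with the index.
grads = [(-1) ** i * i for i in range(100)]
used, weights = goss_sample(grads, a=0.2, b=0.1)
print(len(used), sorted(used[:20]))
```

The learner is then trained only on the returned indices, with the returned weights applied to the gradients of the small-gradient instances.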

Exclusive feature bundling consists of two parts. The bundle generation algorithm determines which mutually exclusive features can be merged (a group of features that can be merged is called a bundle), and mutually exclusive feature merging then combines each bundle into a single feature. The procedure that determines the bundles is GreedyBundle: the features are taken as vertices and an edge is added between every two features that are not mutually exclusive, which reduces the optimal bundling problem to a graph coloring problem that is then solved with a greedy algorithm. Mutually exclusive feature merging constructs a bundle by letting the exclusive features reside in different bins, which can be implemented simply by adding an offset to the values of the original features.
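The offset trick used when merging a bundle can be sketched as follows. This is a simplified illustration that assumes the features are already binned to non-negative integers with 0 as the default value; the function name and the example columns are hypothetical.

```python
def merge_exclusive(columns, bin_counts):
    """Merge mutually exclusive binned features into one column via offsets.

    columns[j][i] is the bin of feature j for instance i (0 means "default").
    bin_counts[j] is the number of bins of feature j.
    """
    offsets, total = [], 0
    for count in bin_counts:
        offsets.append(total)      # feature j occupies bins (total, total + count)
        total += count
    merged = []
    for row in zip(*columns):
        value = 0
        for bin_value, offset in zip(row, offsets):
            if bin_value != 0:
                value = bin_value + offset
        merged.append(value)
    return merged


# Usage: feature A has 10 bins, feature B has 20 bins, and they are never non-zero together.
feature_a = [1, 0, 3, 0]
feature_b = [0, 2, 0, 0]
merged = merge_exclusive([feature_a, feature_b], [10, 20])
print(merged)
```

Because the value ranges of the two features no longer overlap after the offset is added, a split on the merged column can still separate the original features.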

The present invention uses the LGBMClassifier class of the Python lightgbm package to classify the output of the penultimate layer of AlexNet.

1.5 Bayesian hyperparameter optimization

The invention uses HyperOpt to tune the parameters of the LGBM training process. HyperOpt provides an easy-to-use Bayesian hyperparameter optimization algorithm that performs hyperparameter optimization through sequential model-based optimization, which is a Bayesian optimization technique.

Bayesian optimization is a model-based optimization algorithm for an objective function (also known as a cost function). It searches for the maximum of an unknown objective function from which samples can be obtained. As with all model-based optimization algorithms, a model of the objective function is created by a regression method, the next point to be sampled is selected according to the model, and the model is then updated.

The basic algorithm of bayesian optimization is as follows:

Step 1: Place a Gaussian process prior on the objective function f.

Step 2: Observe f at n₀ points according to an initial space-filling experimental design, and set n = n₀.

Step 3: While n ≤ N, loop: update the posterior probability distribution over f using all available data; let x_n be the maximizer of the acquisition function over x, where the acquisition function is calculated using the current posterior distribution; observe y_n = f(x_n); increment n by 1.

Step 4: Return a solution: either the evaluated point with the largest f(x), or the point with the largest posterior mean.

The objective function f is typically unknown. A Gaussian process defines a Gaussian probability distribution of f(x) for each point x, which is therefore determined by a mean μ and a standard deviation σ. This defines a probability distribution over the function:

$$f(x) \sim \mathcal{N}\left(\mu(x), \sigma^2(x)\right)$$

where 𝒩 denotes the normal distribution.

To estimate μ(x) and σ(x), a Gaussian process needs to be fitted to the data. For this purpose, each observation f(χ) is assumed to be a sample from a normal distribution. If there is a dataset made up of multiple observations f(χ₁), f(χ₂), ..., f(χ_t), then the vector [f(χ₁), f(χ₂), ..., f(χ_t)] is a sample from a multivariate normal distribution defined by a mean vector and a covariance matrix. Thus, the Gaussian process is a normal distribution over n variables, where n is the number of observations. The covariance matrix is defined by the kernel function k(χ₁, χ₂), under which distant samples are hardly correlated while nearby samples are highly correlated. This reflects the prior assumption that the function tends to be smooth: two observations with close χ₁ and χ₂ values are likely to be correlated.

where k = [k(x, χ₁), k(x, χ₂), ..., k(x, χ_t)] is the vector of kernel values between x and the observed points.

Bayesian optimization uses this Gaussian process model to search for the maximum f(x) of the unknown objective function. The next χ to test is the maximizer of the acquisition function, which balances exploration (improving the model in less explored parts of the search space) and exploitation (favoring the parts that the model predicts to be promising). After each observation, the algorithm updates the Gaussian process to take the new data into account. Since all points of the search space are initially assumed to be equally promising, the Gaussian process is initialized with a constant mean. The model is gradually refined after each observation.

A Gaussian process is completely specified by its mean function μ(x) and its kernel function k(χ₁, χ₂).

The goal is to learn the characteristic length scale l² and the overall variance (the kernel parameters θ) such that the marginal probability of the data given the kernel function is maximized. The marginal probability is calculated as follows:

where μ₀ is the mean function.
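The four steps of the basic algorithm can be sketched in a few dozen lines. The code below is a toy illustration, not HyperOpt's implementation: it assumes a zero-mean Gaussian process with a squared-exponential kernel of fixed length scale, an upper-confidence-bound acquisition function (mean + kappa × standard deviation), a finite grid of candidate points and a one-dimensional objective. All function names and default values are hypothetical.

```python
import math


def rbf(x1, x2, length=0.3, variance=1.0):
    """Squared-exponential kernel: nearby points are highly correlated."""
    return variance * math.exp(-((x1 - x2) ** 2) / (2.0 * length ** 2))


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def posterior(x, xs, ys, noise=1e-6):
    """Posterior mean and standard deviation at x of a zero-mean Gaussian process."""
    gram = [[rbf(p, q) + (noise if i == j else 0.0) for j, q in enumerate(xs)]
            for i, p in enumerate(xs)]
    k = [rbf(x, p) for p in xs]
    mean = sum(ki * wi for ki, wi in zip(k, solve(gram, ys)))
    var = rbf(x, x) - sum(ki * vi for ki, vi in zip(k, solve(gram, k)))
    return mean, math.sqrt(max(var, 0.0))


def bayes_opt(f, candidates, n0=3, budget=10, kappa=2.0):
    """Maximize f over a candidate list; budget must not exceed len(candidates)."""
    step = max(1, len(candidates) // n0)
    xs = candidates[::step][:n0]               # Step 2: space-filling initial design
    ys = [f(x) for x in xs]
    while len(xs) < budget:                    # Step 3: sequential loop
        remaining = [c for c in candidates if c not in xs]

        def ucb(x):
            mean, std = posterior(x, xs, ys)   # posterior from all data so far
            return mean + kappa * std          # exploitation + exploration

        x_next = max(remaining, key=ucb)       # maximizer of the acquisition function
        xs.append(x_next)
        ys.append(f(x_next))
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best]                  # Step 4: best evaluated point


# Usage: a toy objective with its maximum at x = 0.7.
def objective(x):
    return -(x - 0.7) ** 2


grid = [i / 40 for i in range(41)]
best_x, best_y = bayes_opt(objective, grid, n0=3, budget=12)
print(best_x, best_y)
```

In hyperparameter tuning, `objective` would be replaced by a function that trains a classifier with the candidate parameters and returns its validation accuracy.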

3. Bearing fault diagnosis method based on CWT and AlexNet-LGBM

As shown in fig. 6, the bearing fault diagnosis flow based on the continuous wavelet transform and AlexNet-LGBM is as follows:

Step 1: Signal sampling: for the raw vibration data, every sample_length consecutive data points (set to 1024 in the experiments of Section 4) are taken as one sample, and samples are taken continuously at the sampling interval sample_interval (set to 384 in the experiments of Section 4) in an overlapping sampling manner.

Step 2: Continuous wavelet transform signal processing: each sample is subjected to continuous wavelet transformation to generate a corresponding time-frequency image, which is resized to a color picture of size N × N (set to 32 × 32 in the experiments of Section 4). Enough pictures are generated to be divided into training and test sets. For the training process, Step 3 is performed; for the test process, jump to Step 5.

Step 3: AlexNet feature extraction: the N × N time-frequency diagrams of the training set are input into the improved AlexNet model for training, and the model is saved.

Step 4: LGBM fault diagnosis: the N × N time-frequency diagrams of the training set are input into the trained AlexNet model, and the output of the penultimate fully connected layer is taken out and input into the LGBM model for training. The data dimension is sample_num × 1000, where sample_num represents the number of samples and 1000 is the number of neurons of the second fully connected layer of the AlexNet model.

Step 5: Testing process: the N × N time-frequency diagrams of the test set are input into the trained AlexNet model, the output of the second fully connected layer of the AlexNet model is taken as the features extracted by AlexNet, and these features are input into the trained LGBM model, whose output is the fault diagnosis result.

4. Experiment verification

4.1 Data set and experimental Environment introduction

The present invention uses the public bearing vibration dataset of Case Western Reserve University (CWRU). The CWRU bearing experiments involve four variables: fault location, fault depth, motor load and sampling frequency. The data files are in MATLAB format and contain fan-end and drive-end bearing acceleration data together with motor speed data.

In practice, a rotating machine works under non-zero load most of the time, fault diagnosis should be applicable to as many load conditions as possible, and the fault location matters more than the fault depth because it determines which part must be replaced. Therefore, the present invention sets the fault diagnosis target to identifying the bearing fault location, with four classes: inner ring fault, ball fault, outer ring fault and normal. Because data are missing for some conditions of the CWRU dataset, the present invention uses the normal data for loads of 1 to 3 horsepower and the drive-end bearing fault data sampled at 12 kHz; the CWRU data files actually used are listed in Tables 1 and 2.

Experiments were performed on a computer running the 64-bit Windows 10 operating system, with a GPU, an i5-4200U CPU and 12 GB of memory. Programming was done in Jupyter Notebook using Python 3.7 with the deep learning frameworks TensorFlow 2.3.1 and Keras 2.4.3.

Table 1 normal data files used in the present invention

Table 2 fault data files for use with the present invention

4.2 Data processing

In the CWRU dataset, each operating condition was run for about 20 s, so at a sampling frequency of 12,000 Hz each data file contains about 240,000 data points. The original vibration signal therefore has to be truncated to generate the training and test datasets. In the present invention, the overlapping sampling method described in section 3.1.1 is used: a truncation window of 1,024 data points slides along the original vibration signal with a sampling interval of 384 data points, and each movement of the window generates one sample of 1,024 data points. From the consecutive samples generated for each file, the first 300 were selected, so that the 30 files in Tables 1 and 2 yield 9,000 samples in total.
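As a quick check of these figures, assuming a file of exactly 240,000 points (the text gives this only as an approximate length), the number of windows available per file and the total sample count are:

```latex
\left\lfloor \frac{240{,}000 - 1{,}024}{384} \right\rfloor + 1 = 622 + 1 = 623 \ge 300,
\qquad 30 \times 300 = 9{,}000
```

Each file therefore offers roughly twice as many windows as the 300 that are kept.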

Continuous wavelet transformation signal processing as described in section 1.2 is performed on the 9,000 samples with the Morlet mother wavelet function, and the time-frequency spectrograms obtained by the wavelet transformation are resized to 32 × 32 pixels, giving 9,000 time-frequency pictures of uniform size. The processing results are shown in Fig. 7.

As can be seen from Fig. 7, the energy of the normal bearing is distributed more uniformly than that of the faulty bearings, whereas the faulty bearings exhibit periodic high-energy bands. Along the vertical (frequency) axis the distributions also differ: the energy of the normal bearing is concentrated in a lower frequency band.

Table 3 data set partitioning

4.3 Comparison of neural network feature extraction capability

To compare how well different neural network structures extract features from bearing vibration spectrograms, the invention compares the improved AlexNet proposed in section 3.2 with LeNet-5 and EfficientNet.

Table 4 LeNet-5 and EfficientNet configuration and parameter settings

The improved AlexNet structure is described in section 1.3. It has 17,289,484 parameters in total, 71.6% fewer than the 60,965,128 parameters of the original AlexNet, which speeds up training.
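As a check on the stated reduction, the arithmetic from the two parameter counts given above works out as:

```latex
\frac{60{,}965{,}128 - 17{,}289{,}484}{60{,}965{,}128}
  = \frac{43{,}675{,}644}{60{,}965{,}128}
  \approx 0.7164 \quad (\text{about } 71.6\%)
```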

Since the EfficientNet models B0 to B7 require progressively larger input images, only EfficientNet-B0 is suitable for the 32×32 images used here. The top-down structure and parameter settings of the modified LeNet-5 and EfficientNet-B0 models are shown in Table 4.

AlexNet, LeNet-5 and EfficientNet all use the cross-entropy loss function categorical_crossentropy and the SGD optimizer, with the learning rate set to 0.001. Training ran for 30 epochs, and the training results are shown in fig. 8 and 9.

The training accuracy and loss curves show that the accuracy and loss of all three models barely change by the end of the 30 epochs, i.e., training has converged. EfficientNet reaches only about 85% validation accuracy, whereas LeNet-5 and AlexNet both reach about 98%. EfficientNet is therefore not suitable for fault diagnosis on 32×32-pixel bearing fault spectrograms. LeNet-5 fluctuates more during training than AlexNet, so AlexNet is the more stable of the two.

The features extracted from the penultimate fully connected layers of AlexNet and LeNet-5 were reduced in dimensionality and visualized as clusters using the TSNE tool of sklearn, as shown in fig. 10.

It can be seen that the features extracted by LeNet-5 are hard to separate in two places (the dotted circles), where data of different categories overlap, whereas the features of the improved AlexNet are hard to separate in only one place. The improved AlexNet of the present invention therefore has the better feature extraction capability. In the following, AlexNet and LeNet-5 are both used for feature extraction, and fault diagnosis continues with LGBM classification.

4.4 Comprehensive comparison of bearing fault diagnosis methods

To verify that the proposed bearing fault diagnosis method based on the continuous wavelet transform and AlexNet-LGBM has the highest accuracy, the invention compares it with the fault diagnosis performance of other combined structures similar to the AlexNet and LGBM combination.

The LGBM classifier is tuned by Bayesian parameter optimization, and the parameter settings are shown in Table 5.

The penultimate-layer outputs of AlexNet and LeNet-5 are input into the LGBM, gcForest and CatBoost classifiers, respectively, to produce six combined classifiers, abbreviated as CWT-Alex-LGBM, CWT-Alex-GCF, CWT-Alex-Cat, CWT-LeNet5-LGBM, CWT-LeNet5-GCF and CWT-LeNet5-Cat. Adding CWT-AlexNet and CWT-LeNet5, in which the neural network outputs the classification result directly through a fully connected layer, gives a total of 8 models to be compared. CWT-AlexNet has the same structure as that studied by Wang, and CWT-LeNet5-GCF has the same structure as that studied by Xu. Xu's study concluded that the CWT-LeNet5-GCF model outperforms CWT-LeNet5, CWT-GCF and conventional CNN models.

The 8 models above were used for fault diagnosis on the 9,000 time-frequency spectrogram samples of size 32×32 obtained in section 4.2. Five experiments were performed, and the accuracy and prediction time on a test set of 1,800 samples were recorded, giving the experimental results in Table 6.

Table 5 LGBM parameter settings

Table 6 Test set fault diagnosis results for eight models

As can be seen from Table 6, the proposed bearing fault diagnosis method based on the continuous wavelet transform and AlexNet-LGBM (CWT-Alex-LGBM in the table) reaches an accuracy of 99.712%, the highest among the 8 models and higher than Wang's CWT-AlexNet model (98.788%) and Xu's CWT-LeNet5-gcForest model (99.598%).

CWT-AlexNet and CWT-LeNet5 classify with a fully connected Softmax layer, which performs worse than reclassifying the features extracted by the neural network with the LGBM, gcForest or CatBoost classifiers. Their average accuracies are only 98.788% and 98.186%, well below the accuracy obtained by reclassification with LGBM, gcForest and CatBoost (above 99.5%), and their results vary strongly across repeated predictions: the variances are 2.147 and 1.971 respectively, far higher than those of the other combined models.

To compare the reclassification performance of the LGBM, gcForest and CatBoost classifiers more intuitively, the results of the six combined models in the table are plotted in fig. 11 to 13.

As can be seen from fig. 12, for the features of both neural networks, the reclassification accuracy of the three classifiers ranks as LGBM > gcForest > CatBoost. As can be seen from fig. 13, LGBM and CatBoost take less prediction time than gcForest, and LeNet-5 generally takes less prediction time than AlexNet, but all prediction times are of the same order of magnitude.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in whole or in part by software, they take the form of a computer program product comprising one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., a solid state disk (SSD)).

The foregoing describes only specific embodiments of the present invention, and the scope of protection of the invention is not limited thereto. Any modification, equivalent replacement or improvement made by a person skilled in the art within the technical scope disclosed herein and within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A bearing failure diagnosis method, characterized by comprising:

firstly, extracting time-frequency features from the original vibration signal of a bearing by using the continuous wavelet transform, and converting the time-frequency features into a two-dimensional image of 32×32 pixels; secondly, performing fault feature extraction on the time-frequency spectrogram by using an improved AlexNet model; finally, for fault diagnosis classification, using the LGBM classification algorithm with the optimal model parameters selected by Bayesian optimization;

The bearing fault diagnosis method comprises the following steps:

step one, signal sampling: taking every sample_length consecutive data points of the original vibration data as one sample, and sampling continuously in an overlapping manner with the sampling interval sample_interval;

step two, Morlet continuous wavelet transform signal processing: performing the continuous wavelet transform on each sample to generate a corresponding time-frequency image, resizing the time-frequency image into a color picture of size N×N, and generating enough pictures to be divided into a training set and a test set; for the training process, executing step three; for the test process, jumping to step five;

step three, AlexNet feature extraction: inputting the N×N time-frequency images of the training set into the improved AlexNet model for training, and saving the model;

step four, LGBM fault diagnosis: inputting the N×N time-frequency images of the training set into the trained AlexNet model, taking the output of the penultimate fully connected layer, and inputting it into an LGBM model for training, the data dimension being sample_num×1000, wherein sample_num is the number of samples and 1000 is the number of neurons in the second fully connected layer of the AlexNet model;

step five, test process: inputting the N×N time-frequency images of the test set into the trained AlexNet model, taking the output of the second fully connected layer of the AlexNet model as the features extracted by AlexNet, inputting them into the trained LGBM model, and taking the output of the LGBM model as the fault diagnosis result;

in step four, the LGBM fault diagnosis uses gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB), including:

(1) the gradient-based one-side sampling algorithm: training focuses on instances with large gradients, while instances with small gradients are randomly sampled, and the effect on the data distribution is compensated by applying a constant multiplier to them when the information gain is calculated; the GOSS algorithm is as follows:

input: training data I with n instances {x₁, ..., x_n}, iteration number d, sampling rates a and b of the large-gradient data and the small-gradient data, loss function loss, and a number of weak learners L;

output: a trained strong learner;

step 6: repeating steps 2 to 5 until the iteration number d is reached or convergence is achieved;
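The sampling idea in item (1) can be illustrated with a short sketch. It follows the GOSS procedure as published for LightGBM (keep the top a·n instances by absolute gradient, draw b·n of the remainder at random, and weight the drawn instances by (1 − a)/b); it is not a reproduction of the steps of this claim, and the function name `goss_sample` and the example rates are assumptions.

```python
import numpy as np

def goss_sample(gradients, a=0.2, b=0.1, rng=None):
    """Return (indices, weights) chosen by gradient-based one-side sampling."""
    rng = np.random.default_rng() if rng is None else rng
    g = np.asarray(gradients, dtype=float)
    n = len(g)
    top_n = int(round(a * n))    # instances kept because their gradient is large
    rand_n = int(round(b * n))   # instances drawn at random from the rest
    order = np.argsort(-np.abs(g), kind="stable")  # descending |gradient|
    top_idx = order[:top_n]
    rand_idx = rng.choice(order[top_n:], size=rand_n, replace=False)
    idx = np.concatenate([top_idx, rand_idx])
    weights = np.ones(len(idx))
    # Constant multiplier that compensates for under-sampling small gradients.
    weights[top_n:] = (1.0 - a) / b
    return idx, weights

# Illustrative gradients only; in boosting these come from the loss function.
example_gradients = np.linspace(-1.0, 1.0, 100)
idx, weights = goss_sample(example_gradients, rng=np.random.default_rng(0))
```

With a = 0.2 and b = 0.1, each weak learner is fitted on 30% of the instances, and the randomly drawn ones count 8 times each when the information gain is computed.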

(2) the exclusive feature bundling algorithm, which comprises two steps: bundle generation and exclusive feature merging;

the bundle generation algorithm determines which mutually exclusive features can be combined, and the features to be combined are put together in what is called a bundle; exclusive feature merging then combines each bundle into a single feature; which mutually exclusive features can be combined is determined by the Greedy Bundle algorithm, as follows: first, the features are taken as vertices and an edge is added between every two features that are not mutually exclusive, which reduces the optimal bundling problem to a graph coloring problem, and then a greedy algorithm is applied; exclusive feature merging constructs a feature bundle by letting the mutually exclusive features reside in different bins, which can be implemented simply by adding an offset to the values of the original features;
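The two steps of item (2) can be sketched on a toy matrix. This is a simplified illustration, not LightGBM's implementation: it assumes the features are already binned into non-negative integers with 0 as the default bin, orders features by their count of nonzero values instead of building the conflict graph, and uses the hypothetical names `greedy_bundles` and `merge_bundle`.

```python
import numpy as np

def greedy_bundles(X, max_conflicts=0):
    """Greedily group columns of X that are (almost) never nonzero in the same row."""
    nonzero = X != 0
    order = np.argsort(-nonzero.sum(axis=0), kind="stable")  # densest features first
    bundles = []  # each entry: (list of feature indices, rows already occupied)
    for j in order:
        for feats, used in bundles:
            if np.sum(used & nonzero[:, j]) <= max_conflicts:
                feats.append(int(j))
                used |= nonzero[:, j]   # in-place update of the occupied rows
                break
        else:
            bundles.append(([int(j)], nonzero[:, j].copy()))
    return [feats for feats, _ in bundles]

def merge_bundle(X, feats):
    """Merge the features of one bundle by shifting each into its own value range."""
    merged = np.zeros(X.shape[0], dtype=np.int64)
    offset = 0
    for j in feats:
        col = X[:, j]
        mask = col != 0
        merged[mask] = col[mask] + offset
        offset += int(col.max())  # next feature starts after this one's bins
    return merged

# Features 0 and 1 are never nonzero in the same row; feature 2 conflicts with feature 0.
X = np.array([[1, 0, 2],
              [0, 3, 0],
              [2, 0, 1],
              [0, 1, 0]])
bundles = greedy_bundles(X)
merged = merge_bundle(X, bundles[0])
```

Here features 0 and 1 end up in one bundle, and after merging the values 1 to 2 still identify feature 0 while 3 to 5 identify feature 1, so no information is lost.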

programming with the LGBMClassifier class in Python to classify the output of the penultimate layer of AlexNet;

(3) Bayesian hyperparameter optimization

performing parameter tuning on the training process of the LGBM model by using HyperOpt; HyperOpt provides an easy-to-use Bayesian hyperparameter optimization algorithm that performs hyperparameter optimization by sequential model-based optimization, which is a Bayesian optimization technique;

Bayesian optimization is a model-based optimization algorithm tailored to objective functions that are costly to evaluate; it searches for the maximum of an unknown objective function from which samples can be obtained; as in all model-based optimization algorithms, a model of the objective function is built by a regression method, the next point to sample is selected according to the model, and the model is then updated;

the basic algorithm of Bayesian optimization is as follows:

step 1: setting a Gaussian process for the objective function f, so that the value f(x) at each point follows a normal distribution with mean μ(x) and standard deviation σ(x);

to estimate μ(x) and σ(x), a Gaussian process is fitted to the data; each observation f(χ) is assumed to be a sample from a normal distribution; if the dataset consists of multiple observations f(χ₁), f(χ₂), ..., f(χ_t), then the vector [f(χ₁), f(χ₂), ..., f(χ_t)] is a sample from a multivariate normal distribution defined by a mean vector and a covariance matrix, so the Gaussian process is an n-variate normal distribution, where n is the number of observations; the covariance matrix is defined by the kernel function k(χ₁, χ₂), under which samples far apart are hardly correlated while samples close together are highly correlated; this reflects the prior assumption that the function tends to be smooth, so that two observations with close χ₁ and χ₂ values are likely to be correlated;

given a set of observations P_1:t = f(χ_1:t) and the sampling noise, the Gaussian process is calculated as follows:

wherein k = [k(x, χ₁) k(x, χ₂) … k(x, χ_t)];

the Bayesian optimization implementation uses this Gaussian process model to search for the maximum of the unknown objective function f(x); the next χ to test is chosen by maximizing an acquisition function, which balances exploration, i.e., improving the model in the less explored parts of the search space, against exploitation, i.e., favoring the parts that the model predicts to be promising; after each observation, the algorithm updates the Gaussian process to take the new data into account; since all points of the search space are assumed equally likely to be good, the Gaussian process is initialized with a constant mean; the model is gradually refined with each observation;

the Gaussian process is completely specified by the mean function μ(x) and the kernel k(χ₁, χ₂);

the goal is to learn the characteristic length scale l² and the overall variance of the kernel, i.e., the parameters θ that maximize the likelihood of the data given the kernel function; the marginal likelihood is calculated as follows:

wherein μ₀ is the mean function.
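The loop described in item (3) can be sketched with a small Gaussian-process optimizer. The sketch uses the standard textbook posterior, μ_t(x) = μ₀ + kᵀ(K + σ²I)⁻¹(f − μ₀) and σ_t²(x) = k(x, x) − kᵀ(K + σ²I)⁻¹k, a squared-exponential kernel, and an upper-confidence-bound acquisition function. These choices, the fixed kernel parameters, the one-dimensional search space [0, 1] and all function names are assumptions for illustration; the sketch does not use HyperOpt and does not learn the kernel parameters θ.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2, variance=1.0):
    """Squared-exponential kernel: nearby points are highly correlated."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, f_obs, x_new, noise=1e-6, mu0=0.0):
    """Posterior mean and standard deviation of the Gaussian process at x_new."""
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k = rbf_kernel(x_obs, x_new)                      # shape (t, m)
    mean = mu0 + k.T @ np.linalg.solve(K, f_obs - mu0)
    var = rbf_kernel(x_new, x_new).diagonal() - np.sum(k * np.linalg.solve(K, k), axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

def bayes_opt(objective, n_init=3, n_iter=12, kappa=2.0, seed=0):
    """Maximize objective on [0, 1] with a GP model and a UCB acquisition function."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 201)
    x_obs = rng.uniform(0.0, 1.0, n_init)
    f_obs = np.array([objective(x) for x in x_obs])
    for _ in range(n_iter):
        # Constant prior mean, here set to the mean of the observations so far.
        mean, std = gp_posterior(x_obs, f_obs, grid, mu0=f_obs.mean())
        # High mean favors exploitation, high std favors exploration.
        x_next = grid[np.argmax(mean + kappa * std)]
        x_obs = np.append(x_obs, x_next)
        f_obs = np.append(f_obs, objective(x_next))
    best = np.argmax(f_obs)
    return x_obs[best], f_obs[best]

# Toy objective with its maximum at x = 0.3; a stand-in for validation accuracy.
x_best, f_best = bayes_opt(lambda x: -(x - 0.3) ** 2)
```

In hyperparameter tuning, the objective would be the validation score of the LGBM model for a given parameter setting, and each evaluation would correspond to one training run.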

2. The bearing fault diagnosis method as claimed in claim 1, wherein in step one, the signal sampling includes:

3. The bearing fault diagnosis method as claimed in claim 1, wherein in the second step, the Morlet continuous wavelet transform signal processing includes:

among the different wavelets, a complex or analytic wavelet is one whose Fourier transform is zero for negative frequencies; with such complex wavelets, the phase and amplitude components of the signal can be separated; Morlet is the most commonly used complex wavelet, and continuous wavelet analysis with the Morlet complex wavelet has the advantages that information can be separated in the wavelet domain and that the relationship between the transform ridge and the instantaneous frequency is simpler; the bearing vibration signals are processed using the Morlet wavelet, which is defined as:

ψ(t) = π^(-1/4) (exp(i2πf₀t) − exp(−(2πf₀)²/2)) exp(−t²/2) (2)

wherein f₀ is the center frequency of the mother wavelet; the second term in brackets is called the correction term, which corrects for the non-zero mean of the complex sinusoid multiplied by the Gaussian term; in practice, the correction term is negligible for f₀ ≫ 0 and can be ignored, in which case the Morlet wavelet is expressed as follows:

ψ(t) = π^(-1/4) exp(i2πf₀t) exp(−t²/2) (3)

wherein the Morlet wavelet is a simple complex sinusoid exp(i2πf₀t) within a Gaussian envelope exp(−t²/2); the term π^(-1/4) is a normalization factor that ensures that the wavelet has unit energy;

the Fourier transform of the Morlet wavelet is as follows:

wherein the Fourier transform of the Morlet wavelet has the form of a Gaussian function shifted by f₀ along the frequency axis, and the center frequency f₀ of this Gaussian spectrum is typically taken as the characteristic frequency of the Morlet wavelet; the characteristic frequency is set for the mother wavelet and varies with the wavelet scale a as follows:

according to equation (3), the integrated energy of the Morlet wavelet is equal to 1;
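The unit-energy statement can be verified directly for the simplified Morlet wavelet described in this claim, i.e., a complex sinusoid exp(i2πf₀t) inside a Gaussian envelope exp(−t²/2) with normalization factor π^(-1/4). Because the complex sinusoid has modulus 1, only the envelope contributes to the energy:

```latex
\int_{-\infty}^{\infty} \lvert \psi(t) \rvert^{2}\, dt
  = \pi^{-1/2} \int_{-\infty}^{\infty}
      \bigl\lvert e^{i 2\pi f_0 t} \bigr\rvert^{2} e^{-t^{2}}\, dt
  = \pi^{-1/2} \int_{-\infty}^{\infty} e^{-t^{2}}\, dt
  = \pi^{-1/2} \sqrt{\pi}
  = 1
```

The result does not depend on f₀, which is why the same normalization factor applies for any choice of center frequency.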

4. The bearing fault diagnosis method as claimed in claim 1, wherein in the third step, the AlexNet feature extraction includes:

AlexNet is modified as follows:

(1) Model input dimension improvement: the 224×224 input image size of classical AlexNet is unnecessarily large for bearing fault diagnosis based on vibration signals; if the vibration signals of the bearing are collected at a high frequency, the images generated by the wavelet transform of all samples occupy a large amount of storage space, so a color image of size 32×32 is adopted as input;

using PReLU, a variant of ReLU, expressed as:

f(x) = x for x > 0, and f(x) = a·x for x ≤ 0;

wherein the value of a is continuously updated through back propagation and is optimized iteratively together with the weights and bias parameters of the network;
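How the slope a of PReLU is learned can be illustrated with a few lines of NumPy. PReLU passes positive inputs unchanged and multiplies non-positive inputs by a, so the derivative of the output with respect to a is the input itself for non-positive inputs and 0 otherwise. The initial value a = 0.25, the squared-error loss, the target and the learning rate of 0.1 are assumptions chosen for the example, not settings taken from the claim.

```python
import numpy as np

def prelu(x, a):
    """PReLU: identity for positive inputs, slope a for non-positive inputs."""
    return np.where(x > 0, x, a * x)

def prelu_grad_a(x):
    """Derivative of prelu(x, a) with respect to a (0 for x > 0, x otherwise)."""
    return np.where(x > 0, 0.0, x)

x = np.array([-2.0, -0.5, 0.0, 3.0])
a = 0.25
y = prelu(x, a)

# One illustrative gradient-descent step on a for the loss 0.5 * sum((y - target)^2).
target = np.zeros_like(x)
grad_a = np.sum((y - target) * prelu_grad_a(x))
a_updated = a - 0.1 * grad_a
```

In a network, the same chain-rule term is accumulated during back propagation, so a is updated in the same pass as the weights and biases.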

5. A bearing failure diagnosis system that implements the bearing failure diagnosis method according to any one of claims 1 to 4, characterized in that the bearing failure diagnosis system comprises:

The wavelet transformation signal processing module is used for performing Morlet continuous wavelet transformation signal processing, performing continuous wavelet transformation on each sample to generate a corresponding time-frequency image, readjusting the time-frequency image to a color picture with the size of N multiplied by N, and generating enough pictures to be divided into a training set and a test set; for a training process, executing the AlexNet feature extraction module; for the test process, jumping to a test module;

6. A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the bearing fault diagnosis method according to any one of claims 1 to 4.

7. An information data processing terminal, characterized in that the information data processing terminal is adapted to realize the bearing failure diagnosis system according to claim 5.