CN113834656A

CN113834656A - Bearing fault diagnosis method, system, equipment and terminal

Info

Publication number: CN113834656A
Application number: CN202110997171.6A
Authority: CN
Inventors: 刘立芳; 张梓锐; 和伟辉; 李飞龙; 齐小刚
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2021-08-27
Filing date: 2021-08-27
Publication date: 2021-12-24
Anticipated expiration: 2041-08-27
Also published as: CN113834656B

Abstract

The invention belongs to the technical field of bearing fault diagnosis and discloses a bearing fault diagnosis method, system, equipment and terminal. The bearing fault diagnosis method comprises the following steps: extracting time-frequency features from the original vibration signals of the bearing using the continuous wavelet transform, and converting them into two-dimensional images of 32 × 32 pixels; extracting fault features from the time-frequency spectrograms using an improved AlexNet model; and, for fault diagnosis classification, classifying with the LGBM classification algorithm, whose optimal model parameters are selected by Bayesian optimization. The bearing fault diagnosis method provided by the invention achieves the best fault diagnosis accuracy. In experimental comparisons, the method reaches the highest accuracy, 99.712%, compared with 7 other methods; predicting 1800 samples takes 1.47 seconds, the same order of magnitude as the other models; the variance of the accuracy over five predictions is only 0.063, making it more stable than 6 other methods; and it has the best overall performance.

Description

Bearing fault diagnosis method, system, equipment and terminal

Technical Field

The invention belongs to the technical field of bearing fault diagnosis, and particularly relates to a bearing fault diagnosis method, system, equipment and terminal.

Background

At present, effective fault diagnosis of mechanical equipment can prevent huge economic losses in industrial production. In recent years, the application of machine learning and deep learning techniques has grown greatly, and advanced measurement techniques make it possible to collect large amounts of data in industrial environments. In this big-data setting, machine learning and deep learning fault diagnosis models, such as Deep Neural Networks (DNN), CNNs and Recurrent Neural Networks, have shown excellent results.

At present, autoencoders and convolutional neural networks are common in deep learning fault diagnosis models. Lei et al. proposed a deep neural network for rotating machinery fault diagnosis based on frequency-domain data. Zong et al. proposed a denoising autoencoder for bearing fault diagnosis based on frequency-domain data. Wei et al. proposed a one-dimensional CNN for bearing fault diagnosis that works on raw time signals and performs well in noisy environments. Guo X et al. proposed a hierarchical adaptive deep CNN for bearing fault diagnosis that converts the raw time signal into a 32 × 32 matrix as input. Wang Q et al. proposed a CNN-based method for bearing reliability assessment and remaining life prediction that converts frequency-domain signals into a 32 × 32 matrix as input. Wang J et al. proposed a general bearing fault diagnosis model transferred from the well-known AlexNet model and compared the effects of eight time-frequency feature extraction methods. Wang L H et al. proposed a motor fault diagnosis CNN that converts fault signals into time-frequency images using the Short-Time Fourier Transform (STFT). Claessens et al. proposed a locally connected network for bearing fault diagnosis consisting of normalized sparse autoencoders. Eren et al. used one-dimensional convolutional neural networks with data preprocessing for time series prediction, achieving better efficiency by filtering, decimating and normalizing the input data. Ran et al. claimed that time series prediction using a DNN achieves high accuracy, but did not provide any architectural details of their proposed DNN. The same problem occurs in the work of Mao et al., who claimed high accuracy with a new deep learning approach but reported only training accuracy (rather than test accuracy) and provided no feasible architecture for the proposed network, which makes the work difficult to reproduce.
More recent work combines CNNs and Long Short-Term Memory networks (LSTM) for bearing fault diagnosis; however, the step-by-step construction of the proposed models is not explicitly explained. Therefore, a new bearing fault diagnosis method is needed to overcome the shortcomings of conventional bearing fault diagnosis methods.

Through the above analysis, the problems and defects of the prior art are as follows:

(1) Existing bearing fault diagnosis methods provide no architectural details of the proposed DNN networks.

(2) Existing bearing fault diagnosis methods report only training accuracy (not test accuracy) and provide no feasible architecture for the proposed network, which makes them difficult to reproduce.

(3) Existing technical solutions that combine a CNN and a long short-term memory network (LSTM) for bearing fault diagnosis do not clearly explain the step-by-step construction of the model.

The difficulties in solving the above problems and defects are:

(1) Many DNN models are deep and structurally complex.

(2) When a model is trained and tested, the test accuracy is generally lower than the training accuracy. Reporting training accuracy gives a higher figure, but it demonstrates the merit of a method less convincingly than test accuracy does.

(3) A model is sometimes obtained by repeatedly tuning it against result feedback until a final result is reached, so its construction process is difficult to explain.

The significance of solving the above problems and defects is as follows:

(1) For the first problem: once the architectural details of a DNN are given, the same network model can be rebuilt directly with a deep learning toolkit, so a well-performing model can be reused directly for fault diagnosis.

(2) For the second problem: reporting test accuracy explains the advantages and effects of a fault diagnosis method better, and providing a feasible architecture for the proposed network makes it easier to reproduce.

(3) For the third problem: explaining the step-by-step model construction makes the diagnostic method more interpretable, makes its principle clearer to study, and gives clearer guidance for improving it.

Disclosure of Invention

The invention provides a bearing fault diagnosis method, system, equipment and terminal to address the problems in the prior art, and particularly relates to a bearing fault diagnosis method, system, equipment and terminal based on the continuous wavelet transform (CWT) and an AlexNet-Light Gradient Boosting Machine fusion model (AlexNet-LGBM).

The invention is realized as follows: a bearing fault diagnosis method comprises the following steps:

First, time-frequency features are extracted from the original vibration signal of the bearing using the continuous wavelet transform and converted into a two-dimensional image of 32 × 32 pixels; second, fault features are extracted from the time-frequency spectrogram using an improved AlexNet model; finally, for fault diagnosis classification, the LGBM classification algorithm is used, with the optimal model parameters selected by Bayesian optimization.

Further, the bearing fault diagnosis method comprises the following steps:

Step one, signal sampling: every sample_length consecutive data points of the original vibration data are taken as one sample, and sampling continues with a sampling interval of sample_interval in an overlapping manner. Segmenting the original signal produces samples of a suitable size for the subsequent steps; in addition, segmentation yields more samples for training and testing, which increases the accuracy of the model.

Step two, Morlet continuous wavelet transform signal processing: the continuous wavelet transform is applied to each sample to generate a corresponding time-frequency image, which is resized to an N × N color image, and enough images are generated to be divided into a training set and a test set. For the training process, go to step three; for the test process, go to step five. This step serves two purposes: (1) it processes the one-dimensional signal with the Morlet continuous wavelet transform to extract its time-domain and frequency-domain features; and (2) it converts the one-dimensional signal into a two-dimensional image used to train the subsequent model and to extract further features.

Step three, AlexNet feature extraction: the N × N time-frequency images of the training set are input into the improved AlexNet model for training, and the model is saved. The main purpose of this step is to train the improved AlexNet feature extraction model and tune its hyper-parameters so that it has the best feature extraction capability; the model parameters are then saved for use in the subsequent test stage.

Step four, LGBM fault diagnosis: the N × N time-frequency images of the training set are input into the trained AlexNet model, and the output of the penultimate fully connected layer is taken out and input into the LGBM model for training. This data has dimension sample_Num × 1000, where sample_Num is the number of samples and 1000 is the number of neurons in the second fully connected layer of the AlexNet model. The main purpose of this step is to train the LGBM model: the fault features extracted by the AlexNet model are input into the LGBM to train the final fault classifier.

Step five, testing: the N × N time-frequency images of the test set are input into the trained AlexNet model; the output of its second fully connected layer is taken as the features extracted by AlexNet and input into the trained LGBM model, whose output is the fault diagnosis result. The main purpose of this step is to obtain the final classification result of the fault diagnosis.

Further, in step one, the signal sampling includes:

A run of sample_length consecutive data points is selected from the original vibration signal as an original sample; these sample_length consecutive points are converted by the continuous wavelet transform into a corresponding time-frequency image, which is resized to a suitable N × N. The next sample_length consecutive data points, starting sample_interval points later, are then selected in an overlapping manner as another sample, generating another N × N image; this process is repeated to generate enough training and test images.
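The overlapping windowing can be sketched in NumPy as follows. This is a minimal illustration assuming the raw signal is already a 1-D array; the function name `overlapping_samples` is hypothetical, and its parameters mirror sample_length and sample_interval from the text. A signal of length len yields ⌊(len − sample_length)/sample_interval⌋ + 1 samples, so a smaller interval produces more, more strongly overlapping, samples.

```python
import numpy as np

def overlapping_samples(signal, sample_length, sample_interval):
    """Cut a 1-D vibration signal into overlapping windows.

    Each window holds `sample_length` consecutive points; the next window
    starts `sample_interval` points later, so windows overlap whenever
    sample_interval < sample_length.
    """
    signal = np.asarray(signal)
    if sample_length > signal.size:
        return np.empty((0, sample_length), dtype=signal.dtype)
    starts = range(0, signal.size - sample_length + 1, sample_interval)
    return np.stack([signal[s:s + sample_length] for s in starts])

x = np.arange(10)
windows = overlapping_samples(x, sample_length=4, sample_interval=2)
```

Here the 10-point signal yields 4 windows starting at points 0, 2, 4 and 6; each row would then be passed to the wavelet transform.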

Further, in step two, the Morlet continuous wavelet transform signal processing includes:

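The continuous wavelet transform of a signal x(t) with respect to a wavelet function ψ(t) is:

$$T(a,b)=\frac{1}{\sqrt{a}}\int_{-\infty}^{+\infty}x(t)\,\psi^{*}\!\left(\frac{t-b}{a}\right)dt \qquad (1)$$

where a > 0 is the scale, b is the time shift, and ψ* is the complex conjugate of ψ.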

Among wavelets, a complex (analytic) wavelet has a Fourier transform that is zero at negative frequencies. With such a wavelet, the amplitude and phase components of the signal can be separated. The Morlet wavelet is the most commonly used complex wavelet; continuous wavelet analysis with the complex Morlet wavelet separates information in the wavelet domain and makes the relationship between transform ridges and instantaneous frequency simpler. The bearing vibration signals are therefore processed with the Morlet wavelet, defined as:

$$\psi(t)=\pi^{-1/4}\left(e^{i2\pi f_0 t}-e^{-(2\pi f_0)^2/2}\right)e^{-t^2/2} \qquad (2)$$

where f₀ is the center frequency of the mother wavelet. The second term in the brackets is the correction term, which corrects for the non-zero mean of the complex sinusoid multiplied by the Gaussian. In practice, for f₀ ≫ 0 the correction term becomes negligible and is ignored, in which case the Morlet wavelet is written as:

$$\psi(t)=\pi^{-1/4}e^{i2\pi f_0 t}e^{-t^2/2} \qquad (3)$$

That is, the Morlet wavelet is a simple complex sinusoid exp(i2πf₀t) within a Gaussian envelope exp(−t²/2); the π^(−1/4) term is a normalization factor that ensures the wavelet has unit energy.

The Fourier transform of the Morlet wavelet is:

$$\hat{\psi}(f)=\sqrt{2}\,\pi^{1/4}e^{-\frac{1}{2}(2\pi f-2\pi f_0)^2} \qquad (4)$$

The Fourier transform of the Morlet wavelet thus has the form of a Gaussian function shifted by f₀ along the frequency axis. The center frequency of this Gaussian spectrum is usually taken as the characteristic frequency of the Morlet wavelet. The characteristic frequency is defined for the mother wavelet and varies with the wavelet scale a as follows:

$$f_a=\frac{f_0}{a} \qquad (5)$$

The energy spectrum, i.e. the squared magnitude of the Fourier transform, is:

$$|\hat{\psi}(f)|^2=2\sqrt{\pi}\,e^{-(2\pi f-2\pi f_0)^2} \qquad (6)$$

Integrating the energy spectrum over all frequencies gives 1, i.e., the Morlet wavelet of equation (3) has unit energy.
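The unit-energy property and the Gaussian form of the spectrum can be checked numerically. The sketch below assumes the simplified Morlet wavelet (correction term dropped) and an illustrative center frequency f0 = 1; it integrates |ψ(t)|² in time and |ψ̂(f)|² in frequency with a fine Riemann sum, and compares a numerical Fourier transform against the closed form √2·π^(1/4)·exp(−(2πf − 2πf0)²/2). The helper names are hypothetical.

```python
import numpy as np

F0 = 1.0  # illustrative center frequency

def morlet(t, f0=F0):
    # Simplified Morlet wavelet: pi^(-1/4) * exp(i 2 pi f0 t) * exp(-t^2 / 2)
    return np.pi ** -0.25 * np.exp(1j * 2 * np.pi * f0 * t) * np.exp(-t ** 2 / 2)

def morlet_ft(f, f0=F0):
    # Closed-form Fourier transform: sqrt(2) * pi^(1/4) * exp(-(2 pi f - 2 pi f0)^2 / 2)
    return np.sqrt(2) * np.pi ** 0.25 * np.exp(-0.5 * (2 * np.pi * f - 2 * np.pi * f0) ** 2)

t = np.linspace(-10, 10, 20001)
dt = t[1] - t[0]
f = np.linspace(F0 - 5, F0 + 5, 20001)
df = f[1] - f[0]

energy_time = np.sum(np.abs(morlet(t)) ** 2) * dt       # integral of |psi(t)|^2
energy_freq = np.sum(np.abs(morlet_ft(f)) ** 2) * df     # integral of |psi_hat(f)|^2

def numeric_ft(freq):
    # Riemann-sum approximation of the integral of psi(t) * exp(-i 2 pi freq t) dt
    return np.sum(morlet(t) * np.exp(-1j * 2 * np.pi * freq * t)) * dt

print(energy_time, energy_freq)   # both very close to 1
```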

The continuous wavelet transform thus converts the one-dimensional vibration signal into an image that encodes the correspondence between time and frequency.
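A direct, deliberately unoptimized discrete version of the CWT, T(a, b) = a^(−1/2) Σₙ x[n] ψ*((n − b)/a), is sketched below, followed by a resize of |T| to an N × N grid. Assumptions: scales a are measured in samples, so scale a corresponds to the characteristic frequency f0·fs/a Hz for a sampling rate fs; the resize uses nearest-neighbour index sampling rather than any particular image library; and mapping the magnitudes to a color image with a colormap, as the color images in the text imply, is left out. Function names are hypothetical.

```python
import numpy as np

def morlet(t, f0=1.0):
    # Simplified Morlet wavelet with the correction term dropped
    return np.pi ** -0.25 * np.exp(1j * 2 * np.pi * f0 * t) * np.exp(-t ** 2 / 2)

def cwt_morlet(x, scales, f0=1.0):
    """Direct CWT: out[i, b] = a_i^(-1/2) * sum_n x[n] * conj(psi((n - b) / a_i)).

    O(N^2) per scale; fine for short samples, too slow for long records.
    """
    x = np.asarray(x, dtype=float)
    n = np.arange(x.size)
    out = np.empty((len(scales), x.size), dtype=complex)
    for i, a in enumerate(scales):
        arg = (n[None, :] - n[:, None]) / a          # arg[b, n] = (n - b) / a
        out[i] = np.conj(morlet(arg, f0)) @ x / np.sqrt(a)
    return out

def to_image(coeffs, size=32):
    """Nearest-neighbour resize of |CWT| to size x size, scaled to [0, 1]."""
    mag = np.abs(coeffs)
    rows = np.linspace(0, mag.shape[0] - 1, size).round().astype(int)
    cols = np.linspace(0, mag.shape[1] - 1, size).round().astype(int)
    img = mag[np.ix_(rows, cols)]
    return (img - img.min()) / (img.max() - img.min() + 1e-12)

# A sinusoid at 0.1 cycles/sample should respond most strongly near a = f0 / 0.1 = 10.
x = np.cos(2 * np.pi * 0.1 * np.arange(256))
scales = np.arange(2, 31)
T = cwt_morlet(x, scales)
best_scale = scales[np.argmax(np.abs(T[:, 64:192]).mean(axis=1))]   # ignore edge columns
image = to_image(T)          # 32 x 32 scalogram
```

In practice a library CWT (for example one based on FFT convolution) would replace the O(N²) loop; the direct form is used here only because it mirrors the transform's definition term by term.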

Further, in step three, the AlexNet feature extraction includes:

AlexNet was modified as follows:

(1) Model input dimension: the 224 × 224 input of the classical AlexNet is still large for bearing fault diagnosis based on vibration signals. If the bearing vibration signal is acquired at a high sampling frequency, the images generated by applying the wavelet transform to all samples occupy a large amount of storage, so a 32 × 32 color image is used as input instead.

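(2) Convolutional-layer activation function: the ReLU function f(z) = max(0, z) has a limitation, because during iterative updates its gradient is

$$\frac{\partial f(z)}{\partial z}=\begin{cases}1, & z>0\\ 0, & z\le 0\end{cases} \qquad (7)$$

so a neuron whose input stays negative receives zero gradient and stops updating. A variant of ReLU, the parametric ReLU (PReLU), is therefore used:

$$f(z)=\begin{cases}z, & z>0\\ az, & z\le 0\end{cases} \qquad (8)$$

PReLU differs from ReLU in that, for z ≤ 0, its output is a linear function with slope a, and its gradients are

$$\frac{\partial f(z)}{\partial z}=\begin{cases}1, & z>0\\ a, & z\le 0\end{cases},\qquad \frac{\partial f(z)}{\partial a}=\begin{cases}0, & z>0\\ z, & z\le 0\end{cases} \qquad (9)$$

The value of a is updated continuously through back-propagation and optimized iteratively together with the weights and biases of the network.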

(3) Fully connected and output layers: bearing fault diagnosis here involves 1 normal class and 3 fault classes, i.e. four classes, so the output layer of the improved AlexNet has size 4; because the output layer is smaller, the second fully connected layer is set to 1000 neurons.
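The difference between the ReLU and PReLU gradients can be made concrete with a small NumPy sketch. It assumes a single slope a shared by all units of the layer; the text does not say whether a is shared or learned per channel or per unit (Keras's `PReLU` layer, for instance, learns one slope per unit unless `shared_axes` is set). The function names, the slope 0.25 and the step size 0.01 are illustrative.

```python
import numpy as np

def prelu(z, a):
    # f(z) = z for z > 0, a * z otherwise
    return np.where(z > 0, z, a * z)

def prelu_backward(z, a, upstream):
    """Back-propagate dLoss/df (`upstream`) through PReLU.

    Returns dLoss/dz (slope 1 for z > 0, a otherwise) and dLoss/da
    (0 for z > 0, z otherwise), summed because a is shared.
    """
    grad_z = upstream * np.where(z > 0, 1.0, a)
    grad_a = np.sum(upstream * np.where(z > 0, 0.0, z))
    return grad_z, grad_a

z = np.array([-2.0, -0.5, 0.0, 1.5])
a = 0.25
out = prelu(z, a)                                   # [-0.5, -0.125, 0.0, 1.5]
grad_z, grad_a = prelu_backward(z, a, np.ones_like(z))
# Unlike ReLU, negative inputs still pass a gradient (here 0.25), and a itself
# receives a gradient (here -2.5), so it can be updated by back-propagation:
a_new = a - 0.01 * grad_a
```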

Further, the improved AlexNet structure comprises:

(1) Convolutional layer

The convolutional layer is connected to the previous layer through local connections and weight sharing. The convolution operation is:

$$h_j=f\Big(\sum_i x_i * W_{ij}+b_j\Big) \qquad (10)$$

where h_j is the j-th output feature map of the current convolutional layer; x_i is the i-th output feature map of the previous convolutional layer, i.e. an input of the current layer; * denotes convolution; the parameter matrix W_ij is the convolution kernel that maps the i-th input feature map to the j-th output feature map of the current layer; b_j is the bias of the j-th output feature map; and f(·) is the nonlinear activation function, i.e. the PReLU function of equation (8).
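The convolution formula can be illustrated with a naive NumPy loop. This is a sketch under stated assumptions: stride 1, no padding, square kernels and an illustrative PReLU slope of 0.25; like common deep-learning frameworks, it computes cross-correlation (the kernel is not flipped). The function name `conv_layer` is hypothetical.

```python
import numpy as np

def conv_layer(x, W, b, a=0.25):
    """h_j = PReLU(sum_i x_i * W_ij + b_j) for every output map j.

    x: (C_in, H, Wd) input maps; W: (C_out, C_in, k, k) kernels; b: (C_out,).
    Returns an array of shape (C_out, H - k + 1, Wd - k + 1).
    """
    c_out, _, k, _ = W.shape
    _, H, Wd = x.shape
    out = np.empty((c_out, H - k + 1, Wd - k + 1))
    for j in range(c_out):
        for r in range(H - k + 1):
            for c in range(Wd - k + 1):
                # sums over all input maps i and the k x k window at once
                out[j, r, c] = np.sum(x[:, r:r + k, c:c + k] * W[j]) + b[j]
    return np.where(out > 0, out, a * out)   # PReLU activation f

x = np.ones((2, 4, 4))            # two 4 x 4 input maps
W = np.ones((3, 2, 3, 3))         # three output maps, 3 x 3 kernels
b = np.array([-20.0, 0.0, 1.0])
h = conv_layer(x, W, b)           # each window sums to 18 before the bias
```

With these inputs every pre-activation is 18 + b_j, so the three output maps hold −0.5 (PReLU of −2), 18 and 19.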

(2) Pooling layer

The pooling layer downsamples the output of the convolution and further reduces the dimension of the extracted features. Max pooling is used: it extracts the maximum values from the convolution output layer Y_cn as follows:

$$P_{cn}=\max_{S^{M\times N}}\big(Y_{cn}\big) \qquad (11)$$

where S^(M×N) is the pooling window matrix and M and N are its dimensions. During pooling, S slides over Y_cn with a fixed stride until the whole of Y_cn has been scanned. For example, if S is a 3 × 3 matrix and the windows do not overlap (a stride of 3), Y_cn is reduced to 1/9 of its size and the result is assigned to the pooling output layer P_cn.
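Non-overlapping max pooling, in which the window moves by its own size so that a 3 × 3 window leaves 1/9 as many elements, can be written in NumPy with a reshape. This is an illustrative sketch: the name `max_pool` is hypothetical, and it simply drops trailing rows or columns that do not fill a whole window. Note that the original AlexNet instead uses overlapping 3 × 3 windows with stride 2.

```python
import numpy as np

def max_pool(Y, M=3, N=3):
    """Non-overlapping max pooling with an M x N window (stride = window size)."""
    H, W = Y.shape
    Hp, Wp = H // M, W // N
    # blocks[p, m, q, n] = Y[p*M + m, q*N + n]
    blocks = Y[:Hp * M, :Wp * N].reshape(Hp, M, Wp, N)
    return blocks.max(axis=(1, 3))

Y = np.arange(36).reshape(6, 6)
P = max_pool(Y)          # [[14, 17], [32, 35]]
```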

(3) Fully connected layer

After the last convolutional and pooling layers, the data are flattened into one dimension by a Flatten layer and then pass through two fully connected layers, in which each neuron is connected to every neuron of the previous layer. Dropout with a rate of 0.5 is applied to the outputs of both fully connected layers: some units are dropped at random and not updated, so the network structure changes at every iteration. This is equivalent to ensemble learning over networks with different structures, and averaging over these networks effectively prevents overfitting.
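Dropout with rate 0.5 is commonly implemented as "inverted" dropout, which is what Keras's `Dropout` layer does: during training each unit is zeroed with probability `rate` and survivors are scaled by 1/(1 − rate), so no rescaling is needed at test time. A minimal NumPy sketch, with a hypothetical function name:

```python
import numpy as np

def dropout(h, rate=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with probability `rate` while training and
    rescale survivors by 1 / (1 - rate) to keep the expected activation unchanged;
    at test time the layer is the identity."""
    if not training or rate == 0.0:
        return h
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(h.shape) >= rate          # True with probability 1 - rate
    return h * keep / (1.0 - rate)

h = np.ones(100_000)
h_train = dropout(h, 0.5, rng=np.random.default_rng(0))   # entries are 0 or 2
h_test = dropout(h, 0.5, training=False)                  # unchanged
```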

The last layer is the output layer, which uses a Softmax classifier for multi-class fault classification. Denote an input image of the training set by x_k and its label by y_k, where y ∈ {1, 2, ..., J} indicates the fault class. For each input x, Softmax estimates the probability p(y = j | x) of every class j ∈ {1, 2, ..., J}. The Softmax activation function is:

$$h_\theta(x)=\begin{bmatrix}p(y=1\mid x;\theta)\\ p(y=2\mid x;\theta)\\ \vdots\\ p(y=J\mid x;\theta)\end{bmatrix}=\frac{1}{\sum_{j=1}^{J}e^{\theta_j^{\top}x}}\begin{bmatrix}e^{\theta_1^{\top}x}\\ e^{\theta_2^{\top}x}\\ \vdots\\ e^{\theta_J^{\top}x}\end{bmatrix} \qquad (12)$$

where θ is the weight matrix of the Softmax layer and θ_j is its j-th row vector.
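A numerically stable version of the Softmax function, for one input vector x and a weight matrix θ whose j-th row is θ_j (bias omitted for brevity), is sketched below. The function name and the 4-class numbers are illustrative only.

```python
import numpy as np

def softmax_probs(theta, x):
    """Return [p(y = 1 | x), ..., p(y = J | x)] for weights theta (J, d) and input x (d,)."""
    logits = theta @ x
    logits = logits - logits.max()    # stability shift; the probabilities are unchanged
    e = np.exp(logits)
    return e / e.sum()

theta = np.array([[0.5, -1.0], [0.0, 0.0], [1.0, 1.0], [-0.5, 0.2]])   # J = 4 classes
x = np.array([1.0, 2.0])
p = softmax_probs(theta, x)
predicted_class = int(np.argmax(p)) + 1        # classes numbered 1..J
```

Subtracting the largest logit before exponentiating avoids overflow for large θᵀx without changing the ratio in equation (12).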

(4) Parameter updating

To suit the multi-class fault diagnosis task, the loss function is the cross-entropy loss with L2 regularization:

$$\mathcal{L}(W,b)=-\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{J}y_k^{(i)}\log\hat{y}_k^{(i)}+\frac{\lambda}{2}\sum_{l}\big\|W^{(l)}\big\|_2^2 \qquad (13)$$

where m is the number of training samples; ŷ_k^(i) is the predicted probability that the i-th sample belongs to class k; y_k^(i) is the true probability, equal to 1 if the true class of the i-th sample is k and 0 otherwise; and W^(l) is the parameter matrix of layer l. The first term measures the cross-entropy between the prediction ŷ^(i) and the true class y^(i); it is smallest when the predicted distribution matches the true one, at which point the loss is minimized. The second term is the L2 regularization term, and the coefficient λ is the weight decay parameter.
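The regularized loss can be computed as follows. The sketch assumes 0-based integer labels in place of one-hot vectors (so only the true-class term of the inner sum survives) and averages the data term over the m samples; names and numbers are illustrative.

```python
import numpy as np

def cross_entropy_l2(probs, labels, weights, lam):
    """Mean cross-entropy over m samples plus (lam / 2) * sum_l ||W^(l)||^2.

    probs: (m, J) predicted class probabilities; labels: (m,) integer classes 0..J-1;
    weights: iterable of parameter matrices W^(l).
    """
    m = probs.shape[0]
    eps = 1e-12                                    # avoids log(0)
    data_term = -np.mean(np.log(probs[np.arange(m), labels] + eps))
    reg_term = 0.5 * lam * sum(np.sum(W ** 2) for W in weights)
    return data_term + reg_term

probs = np.array([[0.7, 0.1, 0.1, 0.1],
                  [0.25, 0.25, 0.25, 0.25]])
labels = np.array([0, 1])
loss = cross_entropy_l2(probs, labels, [np.ones((2, 2))], lam=0.01)
```

A perfect prediction (probability 1 on every true class) drives the data term to zero, which is the minimum described above.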

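The model is trained with stochastic gradient descent; in each iteration the parameters W and biases b are updated to minimize the loss ℒ:

$$W^{(l)}\leftarrow W^{(l)}-\alpha\frac{\partial\mathcal{L}}{\partial W^{(l)}},\qquad b^{(l)}\leftarrow b^{(l)}-\alpha\frac{\partial\mathcal{L}}{\partial b^{(l)}} \qquad (14)$$

where α is the learning rate, which controls the size of each update. The residual of the loss at the j-th node of layer l is denoted δ_j^(l), and it satisfies the recurrence

$$\delta_j^{(l)}=\Big(\sum_{k}W_{kj}^{(l)}\,\delta_k^{(l+1)}\Big)f'\big(z_j^{(l)}\big) \qquad (15)$$

where z_j^(l) is the weighted input of node j in layer l and W_kj^(l) is the weight connecting node j in layer l to node k in layer l+1. The gradients of the loss with respect to the parameters are

$$\frac{\partial\mathcal{L}}{\partial W_{kj}^{(l)}}=o_j^{(l)}\,\delta_k^{(l+1)}+\lambda W_{kj}^{(l)},\qquad \frac{\partial\mathcal{L}}{\partial b_k^{(l)}}=\delta_k^{(l+1)} \qquad (16)$$

where o_j^(l) is the output of node j in layer l.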

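In formula (13), y_k equals 1 for exactly one class k and 0 for all others. Let the true class be c, so that y_c = 1. For a single sample (omitting the regularization term), the loss then reduces to

$$\mathcal{L}=-\log\hat{y}_c$$

Using the Softmax activation function of formula (12), the residual of the last layer L with respect to the Softmax inputs z_k^(L) = θ_k^⊤x is

$$\delta_k^{(L)}=\frac{\partial\mathcal{L}}{\partial z_k^{(L)}}=\hat{y}_k-y_k$$

The residuals of the other layers, δ^(L−1), ..., δ^(1), are then computed with the recursion formula (15).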
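The last-layer residual ŷ − y can be checked against finite differences, and one SGD step on the Softmax weights can be sketched as below. Assumptions: a single sample, logits z = θx with no bias, illustrative values for θ, x, α and λ, and hypothetical helper names.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def last_layer_residual(z, true_class):
    """delta^(L) = y_hat - y for a Softmax output trained with cross-entropy."""
    delta = softmax(z)
    delta[true_class] -= 1.0
    return delta

# Central finite differences of -log(softmax(z)[c]) agree with y_hat - y.
z = np.array([0.3, -1.2, 2.0, 0.5])
c = 2
def loss(v):
    return -np.log(softmax(v)[c])
h = 1e-6
numeric = np.array([(loss(z + h * e) - loss(z - h * e)) / (2 * h) for e in np.eye(4)])
analytic = last_layer_residual(z, c)

def sgd_step(theta, x, true_class, alpha=0.1, lam=1e-4):
    """theta <- theta - alpha * (dLoss/dtheta + lam * theta), with z = theta @ x."""
    delta = last_layer_residual(theta @ x, true_class)
    return theta - alpha * (np.outer(delta, x) + lam * theta)

theta0 = np.zeros((4, 3))
x = np.array([1.0, 0.5, -0.5])
theta1 = sgd_step(theta0, x, true_class=1)
before = -np.log(softmax(theta0 @ x)[1])
after = -np.log(softmax(theta1 @ x)[1])      # smaller than `before`
```

For the Softmax weights the gradient is the outer product of δ^(L) with the layer input, which is equation (16) specialized to the last layer.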

The bearing fault feature extraction model is built with the Python-based Keras deep learning framework on a TensorFlow backend. Keras's SGD optimizer, cross-entropy loss function and normalization method are used to train the parameters.

Further, in step four, LGBM fault diagnosis relies on gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB), as follows:

(1) Gradient-based one-side sampling (GOSS). Instances with large gradients are all kept for training, while instances with small gradients are randomly sampled, and a constant multiplier is applied to them when computing the information gain to compensate for the change in the data distribution. The GOSS algorithm is as follows:

Input: training data I with n instances x₁, ..., x_n; number of iterations d; sampling rates a and b for large-gradient and small-gradient data; loss function loss; and weak learner L.

Output: a trained strong learner.

Step 1: Initialization: let topN = a × len(I) be the number of large-gradient samples; add L to the model list models, and set the weight w of every training instance to 1;

Step 2: Predict the training data with the model list, compute the gradient g of each instance with the loss function loss, and sort the training data by g in descending order;

Step 3: Take the first topN sorted instances as the large-gradient subset A; from the remaining set A^C, randomly draw b × |A^C| instances as the small-gradient subset B; and denote the union of A and B as usedSet;

Step 4: Multiply the weights w of the small-gradient samples by the coefficient (1 − a)/b;

Step 5: Train the learner L on the data I, negative gradients −g and weights w of the instances in usedSet to obtain a new model newModel;

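Instances are split at threshold d on feature j according to the estimated variance gain Ṽ_j(d) over subsets A and B:

$$\tilde{V}_j(d)=\frac{1}{n}\left(\frac{\Big(\sum_{x_i\in A_l}g_i+\frac{1-a}{b}\sum_{x_i\in B_l}g_i\Big)^2}{n_l^j(d)}+\frac{\Big(\sum_{x_i\in A_r}g_i+\frac{1-a}{b}\sum_{x_i\in B_r}g_i\Big)^2}{n_r^j(d)}\right)$$

where A_l = {x_i ∈ A : x_ij ≤ d}, A_r = {x_i ∈ A : x_ij > d}, B_l = {x_i ∈ B : x_ij ≤ d} and B_r = {x_i ∈ B : x_ij > d}; n_l^j(d) and n_r^j(d) are the numbers of instances to the left and right of d; and the coefficient (1 − a)/b normalizes the sum of gradients over B back to the size of A^C. Finally, newModel is added to the model list models;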

Step 6: Repeat steps 2 to 5 until d iterations have been performed or the model converges.
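The sampling and the split-gain estimate of GOSS can be sketched in NumPy. This is an illustrative sketch, not LightGBM's histogram-based implementation: the names are hypothetical; it draws b × n small-gradient instances, as in Algorithm 2 of the LightGBM paper, which is the setting in which (1 − a)/b rescales the sum over B back to A^C; and it takes n_l and n_r as weighted instance counts, which is an assumption.

```python
import numpy as np

def goss_sample(g, a, b, rng):
    """Gradient-based one-side sampling on gradients g (one per instance).

    Keeps the top a*n instances by |g| (subset A), randomly draws b*n of the
    remaining ones (subset B) and multiplies B's weights by (1 - a) / b.
    Returns the indices of usedSet (A first, then B) and their weights w.
    """
    n = g.size
    order = np.argsort(-np.abs(g))                 # descending |gradient|
    top_n, rand_n = int(a * n), int(b * n)
    A = order[:top_n]
    B = rng.choice(order[top_n:], size=rand_n, replace=False)
    used = np.concatenate([A, B])
    w = np.ones(used.size)
    w[top_n:] *= (1 - a) / b
    return used, w

def estimated_gain(xj, g, used, w, d):
    """Estimated variance gain of splitting feature values xj at threshold d.

    Weighted gradient sums equal sum_A g + (1 - a)/b * sum_B g on each side;
    the side counts n_l, n_r are taken as weighted counts (an assumption).
    """
    left = xj[used] <= d
    gu = g[used]
    g_l, g_r = np.sum(w[left] * gu[left]), np.sum(w[~left] * gu[~left])
    n_l, n_r = np.sum(w[left]), np.sum(w[~left])
    if n_l == 0 or n_r == 0:
        return 0.0
    return (g_l ** 2 / n_l + g_r ** 2 / n_r) / g.size

rng = np.random.default_rng(0)
g = rng.normal(size=1000)                  # hypothetical gradients
xj = rng.uniform(size=1000)                # hypothetical feature values
used, w = goss_sample(g, a=0.2, b=0.1, rng=rng)
gain = estimated_gain(xj, g, used, w, d=0.5)
```

With b = 1 − a every instance is kept with weight 1, and the estimate reduces to the exact variance gain on the full data.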

(2) Exclusive feature bundling (EFB) consists of two steps: generating bundles and merging the mutually exclusive features within each bundle.

The bundle-generation step determines which mutually exclusive features can be combined; features that can be combined are grouped together into a bundle. The merging step then combines each bundle into a single feature. Bundle generation uses the Greedy Bundling algorithm: each feature is taken as a vertex, and an edge is added between every pair of features that are not mutually exclusive, which reduces the optimal bundling problem to a graph coloring problem that is then solved with a greedy algorithm. Feature merging constructs the bundle so that the exclusive features reside in different bins, which can be implemented simply by adding an offset to the values of the original features.
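A simplified version of greedy bundling and offset-based merging is sketched below. Assumptions: features are already discretized into non-negative integer bins with 0 as the default (zero) value; `max_conflict` is the tolerated number of conflicting rows per bundle (0 means strictly mutually exclusive); and the function names are hypothetical. LightGBM's actual implementation works on histograms and differs in detail.

```python
import numpy as np

def greedy_bundling(X, max_conflict=0):
    """Group features into bundles of (nearly) mutually exclusive features.

    Each feature is a graph vertex; the edge weight between two features is the
    number of rows where both are non-zero. Features are visited in descending
    order of total conflicts and added to the first bundle that stays within
    max_conflict; otherwise they start a new bundle (a greedy coloring).
    """
    nz = (X != 0).astype(int)
    conflicts = nz.T @ nz                       # (F, F) co-occurrence counts
    np.fill_diagonal(conflicts, 0)
    order = np.argsort(-conflicts.sum(axis=1), kind="stable")
    bundles, bundle_conflicts = [], []
    for f in order:
        for i, bundle in enumerate(bundles):
            extra = conflicts[f, bundle].sum()
            if bundle_conflicts[i] + extra <= max_conflict:
                bundle.append(int(f))
                bundle_conflicts[i] += extra
                break
        else:                                   # no existing bundle fits
            bundles.append([int(f)])
            bundle_conflicts.append(0)
    return bundles

def merge_bundle(X, bundle):
    """Merge a bundle into one column by offsetting each feature's bin values
    so that different features occupy different bins."""
    merged = np.zeros(X.shape[0])
    offset = 0
    for f in bundle:
        col = X[:, f]
        merged = np.where(col != 0, col + offset, merged)
        offset += col.max()
    return merged

X = np.array([[1, 0, 0],
              [0, 2, 0],
              [0, 0, 3],
              [2, 0, 1]])
bundles = greedy_bundling(X)                # [[0, 1], [2]]
merged = merge_bundle(X, bundles[0])        # [1, 4, 0, 2]
```

Features 0 and 2 are both non-zero in the last row, so with no tolerated conflicts they land in different bundles, while feature 1 shares a bundle with feature 0; its bins are shifted by 2, the largest bin of feature 0.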

The output of the penultimate layer of AlexNet is classified with the LGBMClassifier class of the Python LightGBM package.

(3) Bayesian hyper-parameter optimization

HyperOpt is used to optimize the parameters of the LGBM model during training. HyperOpt provides easy-to-use Bayesian hyper-parameter optimization based on sequential model-based optimization (SMBO), which is a Bayesian optimization technique.

Bayesian optimization is a model-based optimization algorithm designed to search for the maximum of an unknown objective function that can only be evaluated by sampling it. Like all model-based optimization algorithms, it uses a regression method to build a model of the objective function, selects the next point to sample according to that model, and then updates the model.

The basic algorithm of Bayesian optimization is as follows:

Step 1: Place a Gaussian process prior on the objective function f;

Step 2: Observe f at n₀ points chosen according to an initial space-filling experimental design, and set n = n₀;

Step 3: While n ≤ N, loop: update the posterior probability distribution of f using all available data; let x_n be the maximizer over x of the acquisition function, which is computed from the current posterior distribution; observe y_n = f(x_n); increment n by 1;

Step 4: Return a solution: either the evaluated point with the largest f(x), or the point with the largest posterior mean.

The objective function f is usually unknown. A Gaussian process defines, for each point x, a Gaussian probability distribution of f(x) determined by a mean μ(x) and a standard deviation σ(x), which gives a probability distribution over functions:

$$f(x)\sim\mu(x)+\sigma(x)\,\mathcal{N}(0,1)$$

where N(0, 1) denotes the standard normal distribution.

To estimate μ(x) and σ(x), a Gaussian process is fitted to the data. Each observation f(χ) is assumed to be a normally distributed sample. Given a data set of observations f(χ₁), f(χ₂), ..., f(χ_t), the vector [f(χ₁), f(χ₂), ..., f(χ_t)] is a sample from a multivariate normal distribution defined by a mean vector and a covariance matrix; the Gaussian process is thus a t-variate normal distribution, where t is the number of observations. The covariance matrix is defined by a kernel function k(χ₁, χ₂) under which distant samples are nearly uncorrelated and nearby samples are highly correlated. This reflects the prior assumption that the function tends to be smooth, so two observations at similar values χ₁ and χ₂ are likely to be correlated.

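Given a set of observations P_{1:t} = f(χ_{1:t}) and sampling noise with variance σ²_noise, the Gaussian process posterior at a new point x is:

$$\mu_t(x)=\mathbf{k}^{\top}\left(\mathbf{K}+\sigma_{noise}^2\mathbf{I}\right)^{-1}P_{1:t},\qquad \sigma_t^2(x)=k(x,x)-\mathbf{k}^{\top}\left(\mathbf{K}+\sigma_{noise}^2\mathbf{I}\right)^{-1}\mathbf{k}$$

where

$$\mathbf{K}=\begin{bmatrix}k(\chi_1,\chi_1)&\cdots&k(\chi_1,\chi_t)\\ \vdots&\ddots&\vdots\\ k(\chi_t,\chi_1)&\cdots&k(\chi_t,\chi_t)\end{bmatrix},\qquad \mathbf{k}=[k(x,\chi_1)\;k(x,\chi_2)\;\cdots\;k(x,\chi_t)]^{\top}$$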
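The posterior mean and standard deviation can be computed with a Cholesky factorization instead of an explicit matrix inverse. The sketch assumes a zero prior mean and a squared-exponential kernel with fixed, illustrative hyper-parameters, since the text does not name the kernel; the function names are hypothetical.

```python
import numpy as np

def se_kernel(A, B, length=1.0, var=1.0):
    """Squared-exponential kernel matrix: var * exp(-||a - b||^2 / (2 length^2))."""
    d2 = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2 * A @ B.T
    return var * np.exp(-0.5 * np.maximum(d2, 0.0) / length ** 2)

def gp_posterior(X_obs, P_obs, X_new, noise=1e-6, length=1.0, var=1.0):
    """Zero-mean GP posterior mean mu_t(x) and standard deviation sigma_t(x).

    X_obs: (t, dim) observed points chi_1..chi_t; P_obs: (t,) observed values;
    X_new: (m, dim) query points; noise: sampling-noise variance.
    """
    K = se_kernel(X_obs, X_obs, length, var) + noise * np.eye(len(X_obs))
    k = se_kernel(X_obs, X_new, length, var)    # column q is the vector k for X_new[q]
    L = np.linalg.cholesky(K)                   # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, P_obs))
    mu = k.T @ alpha                            # k^T (K + noise I)^-1 P
    v = np.linalg.solve(L, k)
    sigma2 = var - np.sum(v ** 2, axis=0)      # k(x, x) - k^T (K + noise I)^-1 k
    return mu, np.sqrt(np.maximum(sigma2, 0.0))

X_obs = np.array([[0.0], [1.0], [2.0]])
P_obs = np.sin(X_obs[:, 0])
mu, sigma = gp_posterior(X_obs, P_obs, np.array([[0.0], [1.0], [2.0], [10.0]]))
```

At the observed points the posterior mean reproduces the data with almost no uncertainty, while far from them (x = 10) it reverts to the prior mean 0 and standard deviation 1.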

Bayesian optimization uses this Gaussian process model to search for the maximum of the unknown objective function f(x). The next χ to test is the maximizer of the acquisition function, which balances exploration (improving the model in less explored parts of the search space) and exploitation (favoring regions that the model predicts to be promising). After each observation, the algorithm updates the Gaussian process to account for the new data. The Gaussian process is initialized with a constant mean, since all points of the search space are initially assumed equally likely to be good; the model is refined after each observation.
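The loop of steps 1 to 4 can be sketched for a 1-D search space as below. Assumptions: expected improvement is used as the acquisition function (the text does not name one), the acquisition is maximized over a fixed candidate grid, the initial design is evenly spaced, and the kernel hyper-parameters are fixed. Note also that HyperOpt's commonly used `tpe.suggest` algorithm models the objective with Tree-structured Parzen Estimators rather than a Gaussian process; this sketch follows the Gaussian-process formulation described here, and all names are hypothetical.

```python
import math
import numpy as np

def kernel(a, b, length=0.2):
    # 1-D squared-exponential kernel with unit variance
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

def posterior(x_obs, y_obs, x, noise=1e-5):
    K = kernel(x_obs, x_obs) + noise * np.eye(x_obs.size)
    k = kernel(x_obs, x)
    mu = k.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(k * np.linalg.solve(K, k), axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    # EI = (mu - best) * Phi(z) + sigma * phi(z), with z = (mu - best) / sigma
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mu - best) * cdf + sigma * pdf

def bayes_opt(f, lo, hi, n0=3, N=15):
    x_obs = np.linspace(lo, hi, n0)                 # simple space-filling initial design
    y_obs = np.array([f(v) for v in x_obs])
    candidates = np.linspace(lo, hi, 1001)           # grid on which the acquisition is maximized
    for _ in range(N - n0):
        mu, sigma = posterior(x_obs, y_obs, candidates)      # update the posterior
        x_next = candidates[np.argmax(expected_improvement(mu, sigma, y_obs.max()))]
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, f(x_next))
    i = int(np.argmax(y_obs))                        # best evaluated point
    return x_obs[i], y_obs[i]

x_best, y_best = bayes_opt(lambda v: -(v - 0.3) ** 2, 0.0, 1.0)
```

When the posterior standard deviation is large, EI is dominated by the σ·φ(z) term and the search explores; as observations accumulate near the optimum, the (μ − best)·Φ(z) term takes over and the search exploits.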

The Gaussian process is fully specified by its mean function μ(x) and its kernel function k(χ₁, χ₂).

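The goal is to learn the kernel hyper-parameters θ, for example the characteristic length scale l and the total variance σ_f² of a squared-exponential kernel k(χ₁, χ₂) = σ_f² exp(−‖χ₁ − χ₂‖²/(2l²)), by maximizing the probability of the data given the kernel function. The log marginal likelihood is:

$$\log p\big(P_{1:t}\mid\chi_{1:t},\theta\big)=-\frac{1}{2}\big(P_{1:t}-\mu_0\big)^{\top}\big(\mathbf{K}+\sigma_{noise}^2\mathbf{I}\big)^{-1}\big(P_{1:t}-\mu_0\big)-\frac{1}{2}\log\big|\mathbf{K}+\sigma_{noise}^2\mathbf{I}\big|-\frac{t}{2}\log 2\pi$$

where μ₀ is the mean function evaluated at χ_{1:t} and K is the t × t kernel matrix of the observed points.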
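The log marginal likelihood can be evaluated stably with a Cholesky factorization, and the kernel hyper-parameters can then be chosen by maximizing it, here for illustration by a simple grid search over the length scale. The sketch assumes 1-D inputs, a constant mean μ0 and a squared-exponential kernel; names and values are hypothetical.

```python
import numpy as np

def log_marginal_likelihood(x, P, length, var, noise=1e-4, mu0=0.0):
    """log p(P | x, theta) for a GP with constant mean mu0 and kernel
    var * exp(-(x - x')^2 / (2 length^2)) plus noise * I."""
    t = x.size
    K = var * np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / length ** 2) + noise * np.eye(t)
    L = np.linalg.cholesky(K)
    r = P - mu0
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, r))      # (K + noise I)^-1 (P - mu0)
    return (-0.5 * r @ alpha
            - np.sum(np.log(np.diag(L)))                     # equals 0.5 * log|K + noise I|
            - 0.5 * t * np.log(2 * np.pi))

x = np.linspace(0.0, 5.0, 20)
P = np.sin(x)
lengths = [0.1, 0.3, 1.0, 3.0]
scores = [log_marginal_likelihood(x, P, l, 1.0) for l in lengths]
best_length = lengths[int(np.argmax(scores))]
```

Using the Cholesky factor avoids forming the inverse explicitly and gives the log-determinant as twice the sum of the logs of its diagonal.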

Another object of the present invention is to provide a bearing fault diagnosis system applying the bearing fault diagnosis method, the bearing fault diagnosis system including:

the signal sampling module, used to take every sample_length consecutive data points of the original vibration data as one sample and to sample continuously with a sampling interval of sample_interval in an overlapping manner;

the wavelet transform signal processing module, used to perform Morlet continuous wavelet transform signal processing: it applies the continuous wavelet transform to each sample to generate a corresponding time-frequency image, resizes the image to an N × N color image, and generates enough images to be divided into a training set and a test set; for the training process, the AlexNet feature extraction module is executed; for the test process, processing jumps to the test module;

the AlexNet feature extraction module, used to input the N × N time-frequency images of the training set into the improved AlexNet model for training and to save the model;

the LGBM fault diagnosis module, used to input the N × N time-frequency images of the training set into the trained AlexNet model, take out the output of the penultimate fully connected layer and input it into the LGBM model for training, the data dimension being sample_Num × 1000, where sample_Num is the number of samples and 1000 is the number of neurons in the second fully connected layer of the AlexNet model;

and the test module, used to input the N × N time-frequency images of the test set into the trained AlexNet model, take out the output of the second fully connected layer of the AlexNet model as the features extracted by AlexNet, and input them into the trained LGBM model, whose output is the fault diagnosis result.

It is a further object of the invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of:

(1) Signal sampling: every sample_length consecutive data points of the original vibration data are taken as one sample, and sampling continues with a sampling interval of sample_interval in an overlapping manner;

(2) Morlet continuous wavelet transform signal processing: the continuous wavelet transform is applied to each sample to generate a corresponding time-frequency image, which is resized to an N × N color image, and enough images are generated to be divided into a training set and a test set; for the training process, go to step (3); for the test process, go to step (5);

(3) AlexNet feature extraction: the N × N time-frequency images of the training set are input into the improved AlexNet model for training, and the model is saved;

(4) LGBM fault diagnosis: the N × N time-frequency images of the training set are input into the trained AlexNet model, and the output of the penultimate fully connected layer is taken out and input into the LGBM model for training, the data dimension being sample_Num × 1000, where sample_Num is the number of samples and 1000 is the number of neurons in the second fully connected layer of the AlexNet model;

(5) Testing: the N × N time-frequency images of the test set are input into the trained AlexNet model; the output of its second fully connected layer is taken as the features extracted by AlexNet and input into the trained LGBM model, whose output is the fault diagnosis result.

Another object of the present invention is to provide an information data processing terminal for implementing the bearing fault diagnosis system.

By combining all the above technical solutions, the advantages and positive effects of the invention are as follows. To address the problem that the classification capability of the Softmax layer of a CNN is weaker than that of newer machine learning classification methods, the invention provides a bearing fault diagnosis method based on the continuous wavelet transform and an AlexNet-Light Gradient Boosting Machine fusion model (AlexNet-LGBM). The method has three parts: first, vibration signal processing based on the continuous wavelet transform, which extracts time-frequency features from the original bearing vibration signals and converts them into two-dimensional images of 32 × 32 pixels; second, fault feature extraction, in which an improved AlexNet model extracts features from the time-frequency spectrograms; and third, fault diagnosis, in which the extracted fault features are classified by the LGBM classification algorithm, with the optimal model parameters selected by Bayesian optimization. The invention also performs comparison experiments on the Case Western Reserve University (CWRU) bearing dataset, comparing combinations of the improved AlexNet and LeNet-5 with multi-grained cascade forest, LGBM and CatBoost; the results show that the CWT-based AlexNet-LGBM fault diagnosis method of the invention has the best fault diagnosis accuracy.

The bearing fault diagnosis method provided by the invention also has the following advantages:

(1) For equipment fault feature extraction, continuous wavelet transform (CWT) is first applied to the vibration data to convert it into a time-frequency image. To suit bearing fault feature extraction, the AlexNet model is improved: first, the input dimension is changed to 32 × 32 × 3 to reduce the storage occupied by the time-frequency images; second, the convolutional layers use the Parametric Rectified Linear Unit (PReLU) as activation function to overcome the limitation of the Rectified Linear Unit (ReLU); third, the fully connected layers and the output layer are resized to fit the number of fault classes. The improved AlexNet, LeNet-5 and a transfer-learning EfficientNet-B0 model are each used for feature extraction, and the feature extraction capabilities of the three network structures are compared.

(2) For equipment fault diagnosis, a fault diagnosis method based on continuous wavelet transform and an AlexNet-Light Gradient Boosting Machine fusion model (AlexNet-LGBM) is proposed: fault features are first extracted from the vibration signal by continuous wavelet transform and the improved AlexNet, the extracted features are then classified by the LightGBM classification algorithm, and the model parameters are tuned by Bayesian optimization. Various combinations of the improved AlexNet and LeNet-5 feature extractors with the multi-grained cascade forest (gcForest), LGBM and CatBoost classification algorithms are compared.

To solve the problems of fault feature extraction and fault diagnosis in rolling bearings, the invention provides a bearing fault diagnosis method based on continuous wavelet transform and AlexNet-LGBM. In the experimental comparison with 7 other methods, the method has the highest accuracy, 99.712%; predicting 1800 samples takes 1.47 seconds, the same order of magnitude as the other models; the variance of the accuracy over five predictions is only 0.063, making the method more stable than the 6 other methods; overall, it has the best comprehensive performance.

Drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of a bearing fault diagnosis method provided in an embodiment of the present invention.

Fig. 2 is a block diagram of a bearing fault diagnosis system provided by an embodiment of the present invention.

In the figure: 1. signal sampling module; 2. wavelet transform signal processing module; 3. AlexNet feature extraction module; 4. LGBM fault diagnosis module; 5. testing module.

Fig. 3 is a flow chart of bearing vibration signal processing according to an embodiment of the present invention.

Fig. 4 is a schematic diagram of the effect of the Morlet continuous wavelet transform provided by the embodiment of the present invention.

Fig. 5 is a schematic structural diagram of an improved AlexNet according to an embodiment of the present invention.

Fig. 6 is a flowchart of bearing fault diagnosis provided by an embodiment of the present invention.

Fig. 7 is a schematic diagram of processing results of continuous wavelet transform according to an embodiment of the present invention.

Fig. 8 is a schematic diagram of the change in accuracy of the improved AlexNet, LeNet-5 and EfficientNet according to an embodiment of the present invention.

Fig. 9 is a schematic diagram of the change in loss of the improved AlexNet, LeNet-5 and EfficientNet according to an embodiment of the present invention.

Fig. 10 is a schematic diagram of a t-SNE visualization of the extracted features provided by an embodiment of the present invention.

Fig. 11 is a schematic diagram of the test-set accuracy of the six combined models in each of 5 experiments provided by an embodiment of the present invention.

Fig. 12 is a schematic diagram of the average test-set accuracy of the six combined models over 5 experiments provided by an embodiment of the present invention.

Fig. 13 is a schematic diagram of the average time taken by the six combined models to predict the test set over 5 experiments provided by an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

In view of the problems in the prior art, the present invention provides a method, a system, a device and a terminal for diagnosing a bearing fault, which are described in detail below with reference to the accompanying drawings.

As shown in fig. 1, a bearing fault diagnosis method provided by an embodiment of the present invention includes the following steps:

S101, signal sampling: every sample_length consecutive data points of the raw vibration data are taken as one sample, and samples are drawn continuously in an overlapping manner, each new window starting sample_interval points after the previous one;

S102, Morlet continuous wavelet transform signal processing: a continuous wavelet transform is applied to each sample to generate the corresponding time-frequency image, which is resized to an N × N color image, and enough images are generated to be divided into a training set and a test set; for the training process, S103 is executed; for the test process, execution jumps to S105;

S103, AlexNet feature extraction: the N × N time-frequency images of the training set are input into the improved AlexNet model for training, and the model is saved;

S104, LGBM fault diagnosis: the N × N time-frequency images of the training set are input into the trained AlexNet model, and the output of the penultimate layer (the second fully connected layer) is extracted and input into the LGBM model for training; the data dimension is sample_Num × 1000, where sample_Num is the number of samples and 1000 is the number of neurons in the second fully connected layer of the AlexNet model;

S105, testing: the N × N time-frequency images of the test set are input into the trained AlexNet model, the output of its second fully connected layer is taken as the AlexNet-extracted feature and input into the trained LGBM model, and the output of the LGBM model is the fault diagnosis result.

As shown in fig. 2, a bearing fault diagnosis system provided by an embodiment of the present invention includes:

the signal sampling module 1 is used to take every sample_length consecutive data points of the raw vibration data as one sample and to sample the data continuously in an overlapping manner, each new window starting sample_interval points after the previous one;

the wavelet transform signal processing module 2 is used for Morlet continuous wavelet transform signal processing: it applies a continuous wavelet transform to each sample to generate the corresponding time-frequency image, resizes the image to an N × N color image, and generates enough images to be divided into a training set and a test set; for the training process, the AlexNet feature extraction module is executed; for the test process, execution jumps to the test module;

the AlexNet feature extraction module 3 is used to input the N × N time-frequency images of the training set into the improved AlexNet model for training and to save the model;

the LGBM fault diagnosis module 4 is used to input the N × N time-frequency images of the training set into the trained AlexNet model, extract the output of the penultimate layer (the second fully connected layer), and input it into the LGBM model for training; the data dimension is sample_Num × 1000, where sample_Num is the number of samples and 1000 is the number of neurons in the second fully connected layer of the AlexNet model;

and the test module 5 is used to input the N × N time-frequency images of the test set into the trained AlexNet model, take the output of its second fully connected layer as the AlexNet-extracted feature, input it into the trained LGBM model, and take the output of the LGBM model as the fault diagnosis result.

The technical solution of the present invention is further described below with reference to specific examples.

To address the problems in the prior art, the invention provides a bearing fault diagnosis method based on continuous wavelet transform and an AlexNet-Light Gradient Boosting Machine fusion model. First, time-frequency features are extracted from the raw vibration signal of the bearing by continuous wavelet transform and converted into a two-dimensional image of 32 × 32 pixels; second, the AlexNet model is improved to extract fault features from the time-frequency spectrogram; finally, fault diagnosis and classification are performed by the LGBM classification algorithm, whose optimal model parameters are selected by Bayesian optimization.

1. Signal processing

1.1 Vibration signal processing flow

First, sample_length consecutive data points (1024 in the experiments of the invention) are selected from the raw vibration signal as one raw sample. These sample_length points are converted into a corresponding time-frequency image by continuous wavelet transform, and the image is then resized to a suitable N × N size (32 × 32 in the experiments). Next, the window is shifted by the sampling interval sample_interval (384 in the experiments), and the following sample_length consecutive points are taken as another, overlapping sample, producing another N × N image, as shown in Fig. 3. This process is repeated until enough training and test images have been generated.
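A minimal NumPy sketch of this overlapping sampling, assuming the raw vibration signal is a 1-D array. The function name `overlap_sample` and the `max_samples` argument are illustrative; the defaults are the experimental values sample_length = 1024 and sample_interval = 384.

```python
import numpy as np

def overlap_sample(signal, sample_length=1024, sample_interval=384, max_samples=None):
    """Cut a 1-D signal into overlapping windows of `sample_length` points.

    Consecutive windows start `sample_interval` points apart, so neighbouring
    windows share sample_length - sample_interval points (640 with the defaults).
    """
    signal = np.asarray(signal)
    n = max((signal.size - sample_length) // sample_interval + 1, 0)
    if max_samples is not None:
        n = min(n, max_samples)
    starts = sample_interval * np.arange(n)                    # window start indices
    idx = starts[:, None] + np.arange(sample_length)[None, :]  # (n, sample_length)
    return signal[idx]

windows = overlap_sample(np.arange(240_000))  # shape (623, 1024)
```

With these defaults, a record of 240,000 points yields 623 overlapping windows; passing `max_samples=300` keeps only the first 300 of them.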

1.2 Morlet continuous wavelet transform signal processing

Among wavelets, a complex, or analytic, wavelet has a Fourier transform that is zero at negative frequencies. With such a complex wavelet, the phase and amplitude components of the signal can be separated. The Morlet wavelet is the most commonly used complex wavelet; continuous wavelet analysis with the complex Morlet wavelet separates this information in the wavelet domain and makes the relationship between the ridges of the transform and the instantaneous frequency simpler. The invention uses the Morlet wavelet to process the bearing vibration signal. The Morlet wavelet is defined as:

$$\psi(t)=\pi^{-1/4}\left(e^{i2\pi f_0 t}-e^{-(2\pi f_0)^2/2}\right)e^{-t^2/2} \qquad (2)$$

where f₀ is the center frequency of the mother wavelet. The second term in the parentheses is called the correction term, because it corrects for the non-zero mean of the complex sinusoid multiplied by the Gaussian (the first term in the parentheses). In practice, the correction term is negligible for f₀ ≫ 0, in which case the Morlet wavelet can be written as:

$$\psi(t)=\pi^{-1/4}e^{i2\pi f_0 t}e^{-t^2/2} \qquad (3)$$

This wavelet is simply a complex sinusoid e^{i2πf₀t} within a Gaussian envelope e^{−t²/2}. The π^{−1/4} term is a normalization factor that ensures the wavelet has unit energy. Strictly speaking, the function given by equation (3) is not an admissible wavelet, because it has a non-zero mean, i.e., the zero-frequency term of its energy spectrum is non-zero. In practice, however, it can be used with minimal error when f₀ ≫ 0.

The Fourier transform of the Morlet wavelet of equation (3) is:

$$\hat{\psi}(f)=\sqrt{2}\,\pi^{1/4}e^{-\frac{1}{2}(2\pi f-2\pi f_0)^2} \qquad (4)$$

It has the form of a Gaussian function shifted by f₀ along the frequency axis. The center frequency f₀ of this Gaussian spectrum is usually taken as the characteristic frequency of the Morlet wavelet. The characteristic frequency is defined for the mother wavelet and varies with the wavelet scale a as:

$$f_c=\frac{f_0}{a} \qquad (5)$$

The energy spectrum (the squared magnitude of the Fourier transform) is:

$$\left|\hat{\psi}(f)\right|^2=2\sqrt{\pi}\,e^{-(2\pi f-2\pi f_0)^2} \qquad (6)$$

According to equation (3), the total energy of the Morlet wavelet, ∫|ψ(t)|²dt, is equal to 1.
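The statements about equations (2) and (3) can be checked numerically. The sketch below uses NumPy with a simple Riemann sum on a dense grid (the helper names are illustrative) to confirm that the simplified wavelet has unit energy, that its mean is negligible for f₀ = 1 but clearly non-zero for a small f₀ such as 0.2, and that the correction term of equation (2) removes the mean.

```python
import numpy as np

def morlet(t, f0=1.0, corrected=True):
    """Morlet wavelet: equation (2) if corrected, else the simplified equation (3)."""
    w0 = 2 * np.pi * f0
    carrier = np.exp(1j * w0 * t)
    if corrected:
        carrier = carrier - np.exp(-w0 ** 2 / 2)   # correction term
    return np.pi ** -0.25 * carrier * np.exp(-t ** 2 / 2)

t = np.linspace(-10.0, 10.0, 20001)
dt = t[1] - t[0]

def integral(values):
    """Riemann sum; accurate here because the Gaussian envelope decays fast."""
    return np.sum(values) * dt

energy = integral(np.abs(morlet(t, corrected=False)) ** 2)          # close to 1
mean_simple_f1 = abs(integral(morlet(t, f0=1.0, corrected=False)))   # about 5e-9
mean_simple_f02 = abs(integral(morlet(t, f0=0.2, corrected=False)))  # about 0.85
mean_corrected_f02 = abs(integral(morlet(t, f0=0.2)))                # close to 0
```

The small-f₀ case shows why the correction term matters when f₀ is not large, and why it is dropped for f₀ ≫ 0.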

Fig. 4 shows the effect of applying the Morlet continuous wavelet transform to a bearing vibration signal sample at the signal's sampling frequency (12 kHz in the invention).

Fig. 4(a) is a record of the acceleration measured continuously over 86 milliseconds by the acceleration sensor on a bearing rotating at high speed; the abscissa is time and the ordinate is the acceleration of the monitoring point as the bearing rotates. Under ideal conditions, with perfectly uniform rotation, the bearing acceleration would be 0. Fig. 4(a) shows that the actual acceleration fluctuates around a zero mean; at about 27 ms and 61 ms the acceleration is larger, the corresponding energy is larger, and the corresponding time positions in the right-hand panel appear brighter.

Through the continuous wavelet transform, the one-dimensional vibration signal is converted into an image that encodes the relationship between time and frequency.
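As a minimal sketch of this step, the NumPy code below computes a Morlet CWT magnitude map for one 1,024-point sample and reduces it to a 32 × 32 × 3 array. It assumes the simplified Morlet wavelet of equation (3) with f₀ = 1, evaluates the transform in the frequency domain, chooses the scale a = f₀/f_c for each frequency of a hypothetical grid of 32 target frequencies, block-averages the time axis, and repeats the normalized gray image into three channels as a stand-in for the color map used to render the images. The function names, the frequency grid and the synthetic 1 kHz tone are illustrative, not taken from the invention, and boundary effects of the circular FFT are ignored.

```python
import numpy as np

def morlet_ft(f, f0=1.0):
    """Fourier transform of the simplified Morlet wavelet of equation (3)."""
    return np.sqrt(2.0) * np.pi ** 0.25 * np.exp(-0.5 * (2 * np.pi * (f - f0)) ** 2)

def morlet_cwt(x, fs, target_freqs, f0=1.0):
    """|CWT| of x, one row per target frequency, computed via the FFT.

    The scale for characteristic frequency f_c is a = f0 / f_c (in seconds).
    """
    x = np.asarray(x, dtype=float)
    freqs = np.fft.fftfreq(x.size, d=1.0 / fs)            # bin frequencies in Hz
    x_hat = np.fft.fft(x - x.mean())
    rows = []
    for a in f0 / np.asarray(target_freqs, dtype=float):
        # FT of a^(-1/2) psi(t / a); it is real, so no complex conjugate is needed
        psi_hat_a = np.sqrt(a) * morlet_ft(a * freqs, f0)
        rows.append(np.abs(np.fft.ifft(x_hat * psi_hat_a)))
    return np.array(rows)

def to_image(tf_map, size=32):
    """Block-average time to `size` columns, scale to [0, 1], make 3 channels."""
    rows, cols = tf_map.shape
    assert rows == size, "use `size` target frequencies"
    usable = cols - cols % size
    img = tf_map[:, :usable].reshape(rows, size, -1).mean(axis=2)
    img = (img - img.min()) / (img.max() - img.min() + 1e-12)
    return np.repeat(img[:, :, None], 3, axis=2)          # gray stand-in for a color map

fs = 12_000                                               # sampling frequency in Hz
t = np.arange(1024) / fs
x = np.sin(2 * np.pi * 1000 * t)                          # synthetic 1 kHz test tone
target_freqs = np.linspace(200, 5000, 32)                 # hypothetical frequency grid
tf_map = morlet_cwt(x, fs, target_freqs)
image = to_image(tf_map)
```

For the 1 kHz tone, the row whose characteristic frequency is closest to 1 kHz carries the most energy, which is the time-frequency correspondence the image is meant to show.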

1.3 AlexNet feature extraction

The AlexNet model proposed by Krizhevsky et al. achieved better image recognition performance than the other methods of its time, and it still plays an important role in many fields. To suit bearing fault feature extraction, AlexNet is improved as follows:

(1) Model input dimension. The 224 × 224 input of the classical AlexNet is still large for bearing fault diagnosis based on vibration signals: if the bearing vibration signal is sampled at a high frequency, the images generated by wavelet-transforming all samples occupy a large amount of storage. The invention therefore takes a 32 × 32 color image as input.

(2) Convolutional layer activation function. The ReLU function f(z) = max(0, z) has a limitation, which is visible in the gradient used during iterative updates:

$$\frac{\partial f}{\partial z}=\begin{cases}1, & z>0\\ 0, & z\le 0\end{cases} \qquad (7)$$

Because ReLU sets the gradient to 0 for negative inputs, those units no longer take part in subsequent propagation and are not activated, so the parameters of the corresponding neurons cannot be updated. If the learning rate is set too large during training, some neurons can become permanently inactive, their parameters stop updating effectively, and training fails. The invention therefore uses PReLU, a variant of ReLU, defined as:

$$f(z)=\begin{cases}z, & z>0\\ az, & z\le 0\end{cases} \qquad (8)$$

Unlike ReLU, PReLU is a linear function with a small slope a when z < 0. Its gradients used in the updates are:

$$\frac{\partial f}{\partial z}=\begin{cases}1, & z>0\\ a, & z\le 0\end{cases},\qquad \frac{\partial f}{\partial a}=\begin{cases}0, & z>0\\ z, & z\le 0\end{cases} \qquad (9)$$

PReLU greatly reduces the loss of negative-gradient information while still partly suppressing the negative side. The slope a is updated continuously by back-propagation and optimized iteratively together with the weights and biases of the network.
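The ReLU and PReLU gradients discussed above can be written out in NumPy as a small sketch. It assumes a single slope a shared by all units; Keras's `tf.keras.layers.PReLU` learns one slope per input element by default unless `shared_axes` is set. The function names are illustrative.

```python
import numpy as np

def relu_grad(z):
    """ReLU gradient: 1 for z > 0, 0 otherwise, so negative units stop learning."""
    return (z > 0).astype(float)

def prelu(z, a):
    """PReLU forward pass: z for z > 0, a * z otherwise."""
    return np.where(z > 0, z, a * z)

def prelu_backward(z, a, upstream):
    """Chain the PReLU gradients with the upstream gradient dL/df.

    Returns dL/dz (element-wise) and dL/da for one slope shared by all units.
    """
    dz = np.where(z > 0, 1.0, a) * upstream
    da = np.sum(np.where(z > 0, 0.0, z) * upstream)
    return dz, da

z = np.array([-2.0, -0.5, 0.0, 1.5])
dz, da = prelu_backward(z, a=0.25, upstream=np.ones_like(z))
```

Unlike `relu_grad(z)`, which is zero for the three non-positive inputs, `dz` stays non-zero there, and `da` collects the negative inputs so that the slope itself can be learned.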

(3) Fully connected and output layers. The bearing fault diagnosis problem studied in the invention has 1 normal class and 3 fault classes (described in detail in Section 4), four classes in total, so the output layer of the improved AlexNet has size 4. Because the output layer is smaller, the second fully connected layer is set to 1000 units to better extract the key features. The improved AlexNet structure proposed by the invention is shown in Fig. 5.

1.3.1 Convolutional layer

The convolutional layer is connected to the previous layer through local connections and weight sharing, which greatly reduces the number of parameters. The convolution is computed as:

$$h_j=f\left(\sum_i X_i * W_{ij}+b_j\right)$$

where h_j is the j-th output feature map of the current convolutional layer, X_i is the i-th output feature map of the previous convolutional layer (an input of the current layer), * denotes the convolution operation, the parameter matrix W_ij is the convolution kernel that maps the i-th input feature map to the j-th output feature map of the current layer, b_j is the bias of the j-th output feature map of the current layer, and f(·) is the nonlinear activation function, which in the invention is the PReLU function of equation (8).
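The convolution formula above can be illustrated with a direct, loop-based NumPy sketch for stride 1 and no padding. The array layouts, the fixed PReLU slope a = 0.25 and the function names are assumptions for illustration. As in common deep learning libraries (including Keras `Conv2D`), the "*" operation is implemented as cross-correlation, i.e., the kernel is not flipped.

```python
import numpy as np

def prelu(z, a=0.25):
    """PReLU activation with a fixed slope a (learned during training in practice)."""
    return np.where(z > 0, z, a * z)

def conv_layer(X, W, b, a=0.25):
    """h_j = f(sum_i X_i * W_ij + b_j) with 'valid' padding and stride 1.

    X: (C_in, H, width) input maps; W: (C_in, C_out, k, k) kernels; b: (C_out,).
    """
    c_in, h, width = X.shape
    _, c_out, k, _ = W.shape
    out = np.zeros((c_out, h - k + 1, width - k + 1))
    for j in range(c_out):                       # each output feature map h_j
        for i in range(c_in):                    # sum over input feature maps X_i
            for r in range(h - k + 1):
                for c in range(width - k + 1):
                    out[j, r, c] += np.sum(X[i, r:r + k, c:c + k] * W[i, j])
        out[j] += b[j]                           # bias b_j
    return prelu(out, a)

X = np.ones((2, 4, 4))                           # two 4 x 4 input maps
W = np.ones((2, 2, 3, 3))                        # 3 x 3 kernels, two output maps
h = conv_layer(X, W, b=np.array([-20.0, 1.0]))   # shape (2, 2, 2)
```

Each output value here is 2 × 9 = 18 plus the bias: the first map becomes −2, which PReLU scales to −0.5, and the second becomes 19.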

1.3.2 Pooling layer

The grey layers in Fig. 5 are pooling layers, used for downsampling after the convolution operation to further reduce the dimensionality of the extracted features. Common pooling operations are max pooling and average pooling. The invention uses max pooling, which extracts the maximum values from the convolution output layer Y_cn:

$$P_{cn}=\max_{S^{M\times N}}\left(Y_{cn}\right)$$

where S^{M×N} is the pooling window (scale matrix) and M and N are its dimensions. During pooling, the window slides over Y_cn with a fixed stride until the whole of Y_cn has been scanned. In this chapter S is a 3 × 3 window; moved in non-overlapping steps of 3, it reduces Y_cn to 1/9 of its size, and the result is assigned to the pooling output layer P_cn.
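A sketch of 3 × 3 max pooling in NumPy, assuming non-overlapping windows (stride 3) so that the output keeps 1/9 of the elements; in Keras the same setting would be `MaxPooling2D(pool_size=3, strides=3)`. The function name is illustrative.

```python
import numpy as np

def max_pool2d(y, size=3, stride=3):
    """Slide a size x size window over y with the given stride and keep each maximum."""
    h, w = y.shape
    oh = (h - size) // stride + 1
    ow = (w - size) // stride + 1
    out = np.empty((oh, ow))
    for r in range(oh):
        for c in range(ow):
            window = y[r * stride:r * stride + size, c * stride:c * stride + size]
            out[r, c] = window.max()
    return out

y = np.arange(36, dtype=float).reshape(6, 6)
p = max_pool2d(y)   # 36 values reduced to 4
```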

1.3.3 Fully connected layer

The features from the last convolutional and pooling layers are flattened into one dimension by the Flatten layer and then pass through two fully connected layers. Each neuron in a fully connected layer is connected to all neurons in the previous layer. Dropout with a rate of 0.5 is applied to the outputs of both fully connected layers, so in each iteration some units are randomly dropped by the network and not updated. The network structure therefore changes from iteration to iteration, which is equivalent to an ensemble of networks with different structures; averaging over these networks effectively prevents overfitting.
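The dropout step can be sketched as inverted dropout, which is how Keras's `Dropout` layer behaves during training: units are zeroed with probability `rate` and the surviving units are scaled by 1/(1 − rate) so that the expected activation is unchanged, while at inference the input passes through untouched. The function below is an illustrative NumPy version, not the Keras code.

```python
import numpy as np

def dropout(x, rate=0.5, training=True, rng=None):
    """Inverted dropout: zero units with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return x                                   # inference: identity
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(x.shape) >= rate             # each unit kept with prob 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)

x = np.ones(100_000)
y = dropout(x, rate=0.5, rng=np.random.default_rng(0))
```

Because a new mask is drawn at every training step, each step effectively trains a different thinned sub-network, which is the ensemble effect described above.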

The last layer is the output layer, which uses a Softmax classifier for multi-class fault classification; the Softmax classifier handles multi-class problems effectively. Let x_k denote an input image in the training set and y_k its label, where y ∈ {1, 2, ..., J} denotes the fault class. For each input x, Softmax estimates the probability p(y = j | x) of each class j ∈ {1, 2, ..., J}. The Softmax activation function is:

$$p(y=j\mid x)=\frac{e^{z_j}}{\sum_{l=1}^{J}e^{z_l}}$$

where z_j is the j-th input to the output layer.

1.3.4 Parameter update

To suit the multi-class fault diagnosis task of this chapter, the loss function is the cross-entropy loss with L2 regularization:

$$\mathcal{L}(W)=-\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{J}y_k^{(i)}\log\hat{y}_k^{(i)}+\frac{\lambda}{2}\sum_{l}\left\|W^{(l)}\right\|^2$$

where m is the number of training samples, ŷ_k^(i) is the predicted probability that the i-th sample belongs to class k, y_k^(i) is the actual probability (y_k^(i) = 1 if the true class of the i-th sample is k, otherwise 0), and W^(l) is the parameter matrix of the l-th layer. The first term measures the cross entropy between the prediction ŷ^(i) and the true class y^(i); it reaches its minimum when the prediction equals the true distribution. The second term is an L2 regularization term whose coefficient λ, the weight decay parameter, balances the relative weight of the two terms and helps prevent overfitting.
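A NumPy sketch of this regularized loss, assuming one-hot labels, a list of layer weight matrices, and an L2 term of the form (λ/2)Σ‖W^(l)‖². Note that Keras's `l2` regularizer adds λ·Σw² without the factor 1/2, so λ values are not directly interchangeable between the two conventions. Probabilities are clipped before the logarithm to avoid log(0); the function name is illustrative.

```python
import numpy as np

def regularized_cross_entropy(probs, onehot, weights, lam):
    """Mean cross-entropy over m samples plus (lam / 2) * sum of squared weights."""
    m = probs.shape[0]
    ce = -np.sum(onehot * np.log(np.clip(probs, 1e-12, 1.0))) / m
    l2 = 0.5 * lam * sum(np.sum(W ** 2) for W in weights)
    return ce + l2

probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])   # predicted class probabilities
onehot = np.array([[1, 0, 0], [0, 1, 0]])              # true classes 1 and 2
loss = regularized_cross_entropy(probs, onehot, [np.ones((2, 2))], lam=0.1)
```

Here the cross-entropy part is −(ln 0.7 + ln 0.8)/2 and the L2 part is 0.05 × 4 = 0.2.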

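Its recurrence formula can be expressed as:

$$\delta^{(l)}=\left(\left(W^{(l)}\right)^{\mathsf T}\delta^{(l+1)}\right)\odot f'\!\left(z^{(l)}\right) \qquad (15)$$

where z^(l) and a^(l) = f(z^(l)) are the input and output of layer l, and ⊙ denotes element-wise multiplication. The gradient of the loss with respect to the parameters can be written as:

$$\frac{\partial\mathcal{L}}{\partial W^{(l)}}=\delta^{(l+1)}\left(a^{(l)}\right)^{\mathsf T}+\lambda W^{(l)},\qquad \frac{\partial\mathcal{L}}{\partial b^{(l)}}=\delta^{(l+1)}$$

For the Softmax output layer L with the cross-entropy loss, the residual is then:

$$\delta^{(L)}=\hat{y}-y$$

The residuals δ^(L−1), ..., δ^(1) of the other layers can then be calculated with the recursion formula (15).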

For the experiments, the invention builds the bearing fault feature extraction model with the Python-based Keras deep learning framework on a TensorFlow backend. Keras's SGD optimizer, cross-entropy loss function and normalization method are used to train the parameters.

1.4 LGBM Fault Classification

The LGBM classification algorithm is built mainly on Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB). The GOSS procedure is as follows.

Input: training data I, sampling ratio a of large-gradient data, sampling ratio b of small-gradient data, loss function loss and weak learner L. Output: a well-trained strong learner.

Step 1: Initialization: let topN = a × len(I) denote the number of large-gradient samples. The learner L is added to the model list models. The weight w of every training sample is set to 1.

Step 2: The models in the model list predict the training data, the gradient g of the loss function loss is computed for each sample, and the training data are sorted in descending order of |g|.

Step 3: The first topN sorted samples form the large-gradient subset A; from the remaining samples A^C, b × |I| samples are drawn at random as the small-gradient subset B. The large- and small-gradient subsets are merged and denoted usedSet.

Step 4: The weight w of each small-gradient sample is multiplied by the factor (1 − a)/b.

Step 5: The samples of usedSet, together with their negative gradients −g and weights w, are input into the learner L for training, which yields a new model newModel.

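The instances are split according to the variance gain Ṽ_j(d) estimated on subsets A and B:

$$\tilde{V}_j(d)=\frac{1}{n}\left(\frac{\left(\sum_{x_i\in A_l}g_i+\frac{1-a}{b}\sum_{x_i\in B_l}g_i\right)^2}{n_l^j(d)}+\frac{\left(\sum_{x_i\in A_r}g_i+\frac{1-a}{b}\sum_{x_i\in B_r}g_i\right)^2}{n_r^j(d)}\right)$$

where A_l = {x_i ∈ A : x_ij ≤ d}, A_r = {x_i ∈ A : x_ij > d}, B_l = {x_i ∈ B : x_ij ≤ d}, B_r = {x_i ∈ B : x_ij > d}, n is the number of training samples, and n_l^j(d) and n_r^j(d) are the numbers of samples on the left and right of split point d on feature j. The coefficient (1 − a)/b rescales the sum of gradients over B to the size of A^C. Finally, newModel is added to the model list models.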
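The sampling part of GOSS (Steps 1, 3 and 4) can be sketched in NumPy as follows. It follows the LightGBM convention of drawing b × |I| small-gradient samples, so that the weights (1 − a)/b make the weighted sample count equal to |I|. The function name is illustrative; the default ratios a = 0.2 and b = 0.1 mirror LightGBM's `top_rate` and `other_rate` parameters, but this is not LightGBM's implementation.

```python
import numpy as np

def goss_sample(grad, a=0.2, b=0.1, rng=None):
    """Gradient-based one-side sampling: return (used indices, sample weights)."""
    rng = np.random.default_rng() if rng is None else rng
    n = grad.size
    top_n = int(a * n)                         # topN = a * len(I)
    rand_n = int(b * n)                        # b * len(I) small-gradient samples
    order = np.argsort(-np.abs(grad))          # sort by |g|, descending
    large = order[:top_n]                      # subset A
    small = rng.choice(order[top_n:], size=rand_n, replace=False)  # subset B from A^C
    used = np.concatenate([large, small])      # usedSet
    weights = np.ones(used.size)
    weights[top_n:] *= (1.0 - a) / b           # amplify the small-gradient samples
    return used, weights

grad = np.arange(1000, dtype=float) - 500.0    # toy gradients
used, weights = goss_sample(grad, rng=np.random.default_rng(0))
```

With 1,000 samples, 200 large-gradient and 100 small-gradient samples are kept; the 100 small-gradient samples get weight 8, so the weighted count is again 1,000.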

The bundle-generation step decides which mutually exclusive features can be merged (features that can be merged are grouped together and called a bundle), and the feature-merging step then merges each bundle into a single feature. Deciding which mutually exclusive features to merge is done by Greedy Bundling: the features are taken as vertices, and an edge is added between every two features that are not mutually exclusive, which reduces the optimal bundling problem to a graph coloring problem that is then solved with a greedy algorithm. Merge Exclusive Features builds the bundle so that the values of the exclusive features fall into different bins, which can be implemented simply by adding an offset to the values of the original features.
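The offset trick used by Merge Exclusive Features can be sketched as below for already-binned integer features in which 0 marks the default (zero) bin. The greedy graph-coloring step that chooses the bundles is omitted, and the function name and bin counts are illustrative.

```python
import numpy as np

def merge_exclusive_features(binned, num_bins):
    """Merge mutually exclusive binned features into one column.

    binned: (n_samples, n_features) non-negative ints, 0 = default bin.
    num_bins: bin count of each feature; feature k is shifted by the total bin
    count of the features before it, so the value ranges never overlap.
    """
    binned = np.asarray(binned)
    if np.any((binned != 0).sum(axis=1) > 1):
        raise ValueError("features are not mutually exclusive")
    merged = np.zeros(binned.shape[0], dtype=int)
    offset = 0
    for col, bins in zip(binned.T, num_bins):
        nonzero = col != 0
        merged[nonzero] = col[nonzero] + offset
        offset += bins
    return merged

features = np.array([[3, 0], [0, 5], [0, 0], [9, 0]])
merged = merge_exclusive_features(features, num_bins=[10, 20])
```

The first feature keeps its values 1 to 9, while the second is shifted into 11 to 29, so the original feature and value can be recovered from the merged column.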

The invention uses the LGBMClassifier class of the Python lightgbm package to classify the output of the penultimate layer of AlexNet.

1.5 Bayesian hyper-parameter optimization

The invention uses HyperOpt to optimize the parameters of the LGBM training process. HyperOpt provides easy-to-use Bayesian hyperparameter optimization, performed via sequential model-based optimization, which is a Bayesian optimization technique.

Bayesian optimization is a model-based optimization algorithm that builds a model of the objective function (also called the cost function). It searches for the maximum of an unknown objective function that can only be observed through samples. Like all model-based optimization algorithms, it uses a regression method to build a model of the objective function, selects the next point to evaluate according to the model, and then updates the model.

The basic algorithm of Bayesian optimization is as follows:

Step 1: Place a Gaussian process prior on the objective function f.

Step 2: Observe f at n₀ points chosen by an initial space-filling experimental design. Set n = n₀.

Step 3: While n ≤ N, loop: update the posterior probability distribution of f using all available data; let x_n be the maximizer over x of the acquisition function, computed from the current posterior distribution; observe y_n = f(x_n); increase n by 1.

Step 4: Return a solution: either the evaluated point with the largest f(x), or the point with the largest posterior mean.

The objective function f is usually unknown. A Gaussian process defines, for each point x, a Gaussian distribution of f(x), which is therefore determined by its mean μ(x) and standard deviation σ(x). The probability distribution of the function is defined as:

$$f(x)=\mu(x)+\sigma(x)\,\varepsilon,\qquad \varepsilon\sim\mathcal{N}(0,1)$$

where N(0, 1) denotes the standard normal distribution.

To estimate μ(x) and σ(x), a Gaussian process is fitted to the data. For this, each observation f(χ) is assumed to be a sample from a normal distribution. Given a data set of observations f(χ₁), f(χ₂), ..., f(χ_t), the vector [f(χ₁), f(χ₂), ..., f(χ_t)] is a sample from a multivariate normal distribution defined by a mean vector and a covariance matrix. The Gaussian process is thus an n-variate normal distribution, where n is the number of observations. The covariance matrix is defined by a kernel function k(χ₁, χ₂), under which distant samples are nearly uncorrelated and nearby samples are highly correlated. This reflects the prior assumption that the function tends to be smooth: two observations at similar values χ₁ and χ₂ are likely to be correlated.

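Given a set of observations P_{1:t} = f(χ_{1:t}) and sampling noise σ²_noise, the Gaussian process posterior at a new point x is computed (for a zero prior mean) as:

$$\mu_t(x)=\mathbf{k}^{\mathsf T}\left(K+\sigma_{\text{noise}}^2 I\right)^{-1}P_{1:t}$$

$$\sigma_t^2(x)=k(x,x)-\mathbf{k}^{\mathsf T}\left(K+\sigma_{\text{noise}}^2 I\right)^{-1}\mathbf{k}$$

where K is the t × t kernel matrix with entries K_ij = k(χ_i, χ_j) and

k = [k(x, χ₁) k(x, χ₂) … k(x, χ_t)].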

Bayesian optimization uses this Gaussian process model to search for the maximum of the unknown objective function f(x). The next χ to test is selected by maximizing the acquisition function, which balances exploration (improving the model in less-explored parts of the search space) against exploitation (favoring the promising regions predicted by the model). After each observation, the algorithm updates the Gaussian process to take the new data into account. The Gaussian process is initialized with a constant mean, since all points of the search space are a priori assumed equally likely to be good, and the model is refined gradually with each observation.

The Gaussian process is fully specified by its mean function μ(x) and its kernel function k(χ₁, χ₂).

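The goal is to learn the kernel hyperparameters θ, namely the characteristic length scale l² and the overall variance σ_f², that maximize the probability of the data given the kernel function. The log marginal likelihood is calculated as:

$$\log p\left(P_{1:t}\mid\theta\right)=-\frac{1}{2}\left(P_{1:t}-\mu_0\right)^{\mathsf T}\left(K^{\theta}+\sigma_{\text{noise}}^2 I\right)^{-1}\left(P_{1:t}-\mu_0\right)-\frac{1}{2}\log\left|K^{\theta}+\sigma_{\text{noise}}^2 I\right|-\frac{t}{2}\log 2\pi$$

where μ₀ is the mean function and K^θ is the kernel matrix under the hyperparameters θ.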
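The Bayesian optimization loop (Steps 1 to 4) can be sketched with NumPy as a hypothetical one-dimensional illustration. It is not HyperOpt's implementation: HyperOpt's `tpe.suggest` uses a Tree-structured Parzen Estimator rather than a Gaussian process. The sketch assumes a zero-mean Gaussian process with a squared-exponential kernel of fixed length scale 0.2 and unit variance, a candidate grid on [0, 1], and an upper-confidence-bound acquisition μ + κσ; all names and settings are illustrative, and the kernel hyperparameters are not fitted by marginal likelihood here.

```python
import numpy as np

def rbf_kernel(x1, x2, length=0.2, variance=1.0):
    """Squared-exponential kernel k(x1, x2) on 1-D inputs."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-6):
    """Zero-mean GP posterior mean and standard deviation at x_query."""
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(x_obs.size)
    k = rbf_kernel(x_query, x_obs)                # row i holds k(x_i, chi_1..t)
    mu = k @ np.linalg.solve(K, y_obs)
    v = np.linalg.solve(K, k.T)
    var = rbf_kernel(x_query, x_query).diagonal() - np.sum(k * v.T, axis=1)
    return mu, np.sqrt(np.clip(var, 0.0, None))

def bayes_opt(f, n0=3, n_total=18, kappa=2.0, seed=0):
    """Maximize f on [0, 1] with a GP surrogate and a UCB acquisition."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 201)             # candidate points
    x_obs = rng.uniform(0.0, 1.0, size=n0)        # Step 2: initial design
    y_obs = np.array([f(x) for x in x_obs])
    while x_obs.size < n_total:                   # Step 3: n <= N loop
        mu, sigma = gp_posterior(x_obs, y_obs, grid)
        x_next = grid[np.argmax(mu + kappa * sigma)]   # maximize acquisition
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, f(x_next))
    best = np.argmax(y_obs)                       # Step 4: best evaluated point
    return x_obs[best], y_obs[best]

best_x, best_y = bayes_opt(lambda x: -(x - 0.3) ** 2)
```

A larger κ favors exploration of uncertain regions, while a smaller κ concentrates evaluations where the posterior mean is already high.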

3. Bearing fault diagnosis method based on CWT and AlexNet-LGBM

As shown in fig. 6, the bearing fault diagnosis process based on continuous wavelet transform and AlexNet-LGBM is as follows:

Step 1: Signal sampling: every sample_length (1024 in the experiments of Section 4) consecutive data points of the raw vibration data form one sample, and samples are drawn continuously in an overlapping manner with the sampling interval sample_interval (384 in the experiments of Section 4).

Step 2: Continuous wavelet transform signal processing: a continuous wavelet transform is applied to each sample to generate the corresponding time-frequency image, which is resized to an N × N color image (32 × 32 in the experiments of Section 4). Enough images are generated and divided into a training set and a test set. For the training process, Step 3 is executed; for the test process, execution jumps to Step 5.

Step 3: AlexNet feature extraction: the N × N time-frequency images of the training set are input into the improved AlexNet model for training, and the model is saved.

Step 4: LGBM fault diagnosis: the N × N time-frequency images of the training set are input into the trained AlexNet model, and the output of the last (second) fully connected layer is extracted and input into the LGBM model for training. The data dimension is sample_Num × 1000, where sample_Num is the number of samples and 1000 is the number of neurons in the second fully connected layer of the AlexNet model.

Step 5: Testing process: the N × N time-frequency images of the test set are input into the trained AlexNet model, the output of its second fully connected layer is taken as the AlexNet-extracted feature and input into the trained LGBM model, and the output of the LGBM model is the fault diagnosis result.

4. Experimental verification

4.1 Data set and experimental environment

The invention uses the bearing vibration data set published by Case Western Reserve University (CWRU). The CWRU bearing experiments have four variables: fault location, fault depth, motor load and sampling frequency. The data files are in MATLAB format and contain fan-end and drive-end bearing acceleration data and motor speed data.

In practice, rotating machinery operates under non-zero load most of the time, so fault diagnosis should cover all load conditions as far as possible; moreover, the fault location matters more than the fault depth, because it determines which part must be replaced. The diagnosis target is therefore the fault location of the bearing, with four classes: inner-race fault, ball fault, outer-race fault and normal. Because the CWRU data set lacks data for some individual conditions, the invention uses the normal data under 1 to 3 hp loads and the drive-end bearing fault data at a 12 kHz sampling frequency; the CWRU data files used are listed in Tables 1 and 2.

The experiments were run on a computer with a GPU, running 64-bit Windows 10, with an i5-4200U CPU and 12 GB of RAM. The code was written in Python 3.7 in Jupyter Notebook, using the TensorFlow 2.3.1 and Keras 2.4.3 deep learning frameworks.

Table 1 Normal data files used in the invention

Table 2 Fault data files used in the invention

4.2 Data processing

In the CWRU dataset, each operating condition was run for around 20s, i.e. about 240,000 data points per dataset, depending on 12,000Hz of the sample frequency. Therefore, it is necessary to truncate the original vibration signal to generate training and test data sets. In the present invention, the overlap-sampling method introduced in section 3.1.1 is used to generate training and test data sets. The truncation window slides along the original vibration signal with a sampling interval of 384 data points and a window size of 1,024 data points. Each movement of the window produces a data set of 1,024 data points. The first 300 samples were selected from a small sample consisting of several consecutive 1,024 consecutive data points generated for each file, so that a total of 30 files in tables 1 and 2 resulted in 9,000 samples.

9000 samples are processed by continuous wavelet transform signals in 1.2 knots, Morlet mother wavelet function is selected, and a time-frequency spectrogram obtained by wavelet transform is reset to be 32 multiplied by 32 pixels, so that 9,000 time-frequency pictures with uniform size are obtained. The processing results are shown in fig. 7.

As Fig. 7 shows, the energy of the normal bearing is distributed more uniformly than that of the faulty bearings, whose spectrograms show periodic high-energy bands. The fault frequencies also differ from the frequency distribution of the normal bearing along the vertical (frequency) axis, the energy of the normal bearing being concentrated in the lower frequency band.
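The sketch below illustrates how one 1,024-point sample can be turned into a 32 × 32 time-frequency image with a Morlet continuous wavelet transform. It is a pure-NumPy direct-convolution CWT written for illustration only (in practice a library such as PyWavelets, `pywt.cwt`, would normally be used). Assumptions not taken from the text: the centre frequency f₀ = 6/(2π) (i.e. 2πf₀ = 6), the scale range 2 to 65, the magnitude |W| rendered as a grayscale image rather than a colour-mapped one, and nearest-neighbour resizing. All function names are hypothetical.

```python
import numpy as np


def morlet(t, f0):
    """Simplified Morlet wavelet: complex sinusoid in a Gaussian envelope."""
    return np.pi ** -0.25 * np.exp(2j * np.pi * f0 * t) * np.exp(-t ** 2 / 2)


def cwt_morlet(x, scales, f0=6 / (2 * np.pi)):
    """Direct-convolution CWT; row k holds the coefficients at scale scales[k] (in samples)."""
    x = np.asarray(x, dtype=float)
    coefs = np.empty((len(scales), len(x)), dtype=complex)
    for k, a in enumerate(scales):
        half = int(np.ceil(4 * a))           # Gaussian envelope is ~3e-4 at |t/a| = 4
        if 2 * half + 1 > len(x):
            raise ValueError("scale too large for this signal length")
        t = np.arange(-half, half + 1) / a    # dilated time axis (t - b) / a
        psi = morlet(t, f0) / np.sqrt(a)      # 1/sqrt(a) normalization
        # W(a, b) = sum_t x(t) * conj(psi((t - b) / a)), written as a convolution
        coefs[k] = np.convolve(x, np.conj(psi)[::-1], mode="same")
    return coefs


def spectrogram_image(x, scales, size=32):
    """|CWT| resized to size x size by nearest-neighbour sampling, scaled to [0, 1]."""
    mag = np.abs(cwt_morlet(x, scales))
    rows = np.linspace(0, mag.shape[0] - 1, size).round().astype(int)
    cols = np.linspace(0, mag.shape[1] - 1, size).round().astype(int)
    img = mag[np.ix_(rows, cols)]
    return (img - img.min()) / (img.max() - img.min() + 1e-12)


fs = 12_000
n = np.arange(1024)
x = np.cos(2 * np.pi * 1000 * n / fs)     # 1 kHz test tone, one 1,024-point sample
scales = np.arange(2, 66)                  # 64 scales (assumed range)
img = spectrogram_image(x, scales)
print(img.shape)  # (32, 32)
```

With the pseudo-frequency f₀·f_s / a, the 1 kHz tone concentrates its energy near scale a ≈ 12, which is how periodic fault components show up as high-energy bands in the spectrograms.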

Table 3 Data set partitioning

4.3 neural network feature extraction capability comparison

To compare how well different neural network structures extract features from the bearing vibration spectrograms, the improved AlexNet and the LeNet-5 presented in Section 3.2 are compared with EfficientNet.

Table 4 Structure and parameter settings of LeNet-5 and EfficientNet

The structure of the improved AlexNet is given in Section 1.3. It has 17,289,484 parameters in total, 71.6% fewer than the 60,965,128 parameters of the original AlexNet, which speeds up training.

Because EfficientNet-B0 to B7 require progressively larger input images, only EfficientNet-B0 is suitable for the 32 × 32 images used in this chapter. The structures and parameter settings of the improved LeNet-5 and EfficientNet-B0 models are listed from top to bottom in Table 4.

AlexNet, LeNet-5, and EfficientNet all use the categorical cross-entropy loss function (categorical_crossentropy) and the SGD optimizer with a learning rate of 0.001. Training runs for 30 epochs, and the training results are shown in Figs. 8 and 9.

The training accuracy and loss curves show that, by the 30th epoch, the accuracy and loss of all three models have essentially stopped changing, i.e., the models have converged. The validation accuracy of EfficientNet only reaches about 85%, whereas LeNet-5 and AlexNet both reach about 98%. EfficientNet is therefore not suitable for fault diagnosis from 32 × 32 pixel bearing fault spectrograms. The training curve of LeNet-5 also fluctuates more than that of AlexNet, so AlexNet trains more stably.

The features extracted by the penultimate fully connected layers of AlexNet and LeNet-5 are reduced to two dimensions with the t-SNE tool of sklearn and visualized, as shown in Fig. 10.

The features extracted by LeNet-5 are hard to separate in two places (dashed circles), where data of different classes overlap, whereas the features of the improved AlexNet are hard to separate in only one place. The improved AlexNet of the invention therefore has better feature extraction capability. In the following experiments, AlexNet and LeNet-5 are used for feature extraction, and fault diagnosis continues with LGBM classification.

4.4 bearing fault diagnosis method comprehensive comparison

To verify that the proposed bearing fault diagnosis method based on the continuous wavelet transform and AlexNet-LGBM achieves the highest accuracy, the invention compares its fault diagnosis performance with that of other combinations with a similar neural network plus classifier structure.

The hyperparameters of the LGBM classifier are tuned by Bayesian optimization, with the settings shown in Table 5.

The outputs of the penultimate layers of AlexNet and LeNet-5 are each fed into LGBM, gcForest, and CatBoost classifiers, giving six combined classifiers abbreviated as CWT-Alex-LGBM, CWT-Alex-GCF, CWT-Alex-Cat, CWT-LeNet5-LGBM, CWT-LeNet5-GCF, and CWT-LeNet5-Cat. Adding CWT-AlexNet and CWT-LeNet5, in which the neural network outputs the classification result directly through its fully connected layer, gives eight models to compare in total. CWT-AlexNet has the same structure as the model studied by Wang, and CWT-LeNet5-GCF has the same structure as the model studied by Xu. Xu's study concluded that the CWT-LeNet5-GCF model outperforms CWT-LeNet5, CWT-GCF, and the traditional CNN model.

Each of the eight models performs fault diagnosis on the 9,000 time-frequency spectrogram samples of size 32 × 32 obtained in Section 4.2. Five experiments are run, and the accuracy and prediction time on the test set of 1,800 samples are recorded, giving the results in Table 6.

Table 5 LGBM parameter settings

Table 6 Test-set fault diagnosis results of the eight models

As Table 6 shows, the proposed bearing fault diagnosis method based on the continuous wavelet transform and AlexNet-LGBM (CWT-Alex-LGBM in the table) reaches an accuracy of 99.712%, the highest of the eight models and higher than both Wang's CWT-AlexNet model (98.788%) and Xu's CWT-LeNet5-gcForest model (99.598%).

CWT-AlexNet and CWT-LeNet5 classify with a fully connected Softmax layer, which performs worse than classifying the features extracted by the neural network with the LGBM, gcForest, or CatBoost classifiers. Their average accuracies are only 98.788% and 98.186%, well below the more than 99.5% obtained when the features are reclassified by LGBM, gcForest, or CatBoost. Their results over repeated predictions are also very unstable, with variances of 2.147 and 1.971, respectively, far higher than those of the other combined models.

To compare the reclassification performance of the LGBM, gcForest, and CatBoost classifiers more directly, the results of the six combined models in the table are plotted in Figs. 11 to 13.

As Fig. 12 shows, for both neural networks the reclassification accuracy ranks LGBM > gcForest > CatBoost. As Fig. 13 shows, LGBM and CatBoost predict faster than gcForest, and the LeNet-5-based combinations are generally faster than the AlexNet-based ones, although all prediction times are of the same order of magnitude.

In the above embodiments, the implementation may be realized wholly or partly by software, hardware, firmware, or any combination thereof. When implemented wholly or partly in software, it may take the form of a computer program product that includes one or more computer instructions. When the computer instructions are loaded or executed on a computer, the processes or functions according to the embodiments of the invention are produced wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, or magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), among others.

The above description only illustrates the present invention and is not intended to limit it; any modifications, equivalent replacements, and improvements made within the spirit and principle of the invention shall fall within the scope of protection defined by the appended claims.

Claims

1. A bearing fault diagnosis method, characterized by comprising:

2. The bearing fault diagnosis method according to claim 1, characterized by comprising the steps of:

step one, signal sampling: every sample_length consecutive data points of the original vibration data are taken as one sample, and samples are taken continuously with overlap at a sampling interval of sample_interval;

step two, Morlet continuous wavelet transform signal processing: a continuous wavelet transform is applied to each sample to generate the corresponding time-frequency image, which is resized to an N × N color image; enough images are generated to be divided into a training set and a test set; for the training process, go to step three; for the test process, go to step five;

step three, AlexNet feature extraction: the N × N time-frequency images of the training set are input into the improved AlexNet model for training, and the model is saved;

step four, LGBM fault diagnosis: the N × N time-frequency images of the training set are input into the trained AlexNet model, and the output of the penultimate fully connected layer is taken out and input into the LGBM model for training; the data dimension is sample_Num × 1000, where sample_Num is the number of samples and 1000 is the number of neurons in the second fully connected layer of the AlexNet model;

step five, the testing process: the N × N time-frequency images of the test set are input into the trained AlexNet model; the output of its second fully connected layer is taken as the features extracted by AlexNet and input into the trained LGBM model, whose output is the fault diagnosis result.

3. The bearing fault diagnosis method according to claim 2, wherein in the first step, the signal sampling comprises:

4. The bearing fault diagnosis method according to claim 2, wherein in the second step, the Morlet continuous wavelet transform signal processing comprises:

among wavelets, a complex or analytic wavelet has a Fourier transform that is zero at negative frequencies, and such a complex wavelet can be used to separate the phase and amplitude components of a signal; the Morlet wavelet is the most commonly used complex wavelet, and continuous wavelet analysis with the complex Morlet wavelet separates information in the wavelet domain and makes the relationship between the transform ridges and the instantaneous frequency simpler; the bearing vibration signals are therefore processed with the Morlet wavelet, which is defined as:

$$\psi(t) = \pi^{-1/4}\left(e^{i2\pi f_0 t} - e^{-(2\pi f_0)^2/2}\right)e^{-t^2/2} \qquad (2)$$

where f₀ is the center frequency of the mother wavelet; the second term in the brackets is the correction term, which corrects for the non-zero mean of the complex sinusoid multiplied by the Gaussian term; in practice it is negligible for f₀ ≫ 0 and can be ignored, in which case the Morlet wavelet is written as:

$$\psi(t) = \pi^{-1/4} e^{i2\pi f_0 t} e^{-t^2/2} \qquad (3)$$

i.e., the Morlet wavelet is simply a complex sinusoid exp(i2πf₀t) within a Gaussian envelope exp(−t²/2); the π^(−1/4) term is a normalization factor that ensures the wavelet has unit energy;

the Fourier transform of the Morlet wavelet is as follows:

$$\hat{\psi}(f) = \sqrt{2}\,\pi^{1/4}\, e^{-\frac{1}{2}\left(2\pi f - 2\pi f_0\right)^2}$$

the Fourier transform of the Morlet wavelet thus has the form of a Gaussian function shifted along the frequency axis by f₀, and the center frequency f₀ of this Gaussian spectrum is usually taken as the characteristic frequency of the Morlet wavelet; the characteristic frequency is defined for the mother wavelet and varies with the wavelet scale a as follows:

$$f = \frac{f_0}{a}$$

by equation (3), the integrated energy of the Morlet wavelet, ∫|ψ(t)|² dt, equals 1;
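As a sanity check on equation (3), the sketch below numerically confirms that the simplified Morlet wavelet has unit energy and that its Fourier transform is a Gaussian centred on f₀. The closed form used for comparison, √2·π^(1/4)·exp(−(2π(f − f₀))²/2), is the standard result for this wavelet; the value f₀ = 6/(2π) and the helper `characteristic_frequency` are illustrative assumptions, not taken from the text.

```python
import numpy as np

f0 = 6 / (2 * np.pi)      # assumed center frequency (2*pi*f0 = 6)
dt = 0.01
t = np.arange(-10, 10, dt)
psi = np.pi ** -0.25 * np.exp(2j * np.pi * f0 * t) * np.exp(-t ** 2 / 2)

# Unit energy: integral of |psi(t)|^2 dt, approximated by a Riemann sum
energy = np.sum(np.abs(psi) ** 2) * dt

# Fourier transform Psi(f) = integral psi(t) exp(-i 2 pi f t) dt, by direct summation
f = np.linspace(0, 2, 401)
Psi = (psi[None, :] * np.exp(-2j * np.pi * f[:, None] * t[None, :])).sum(axis=1) * dt

# Closed form: a Gaussian in f shifted to f0
Psi_closed = np.sqrt(2) * np.pi ** 0.25 * np.exp(-0.5 * (2 * np.pi * (f - f0)) ** 2)


def characteristic_frequency(a, f0=f0, fs=1.0):
    """Characteristic frequency f0 / a (cycles per sample, or Hz when fs is given)."""
    return f0 * fs / a


print(round(energy, 6))                          # 1.0
print(np.max(np.abs(Psi - Psi_closed)) < 1e-6)   # True
```

For a 12 kHz signal, `characteristic_frequency(12, fs=12_000)` is about 955 Hz, which is how a scale axis is read as a frequency axis in the spectrograms.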

5. The bearing fault diagnosis method according to claim 2, wherein in step three, the AlexNet feature extraction comprises:

AlexNet was modified as follows:

(1) Reducing the input dimension of the model: the 224 × 224 input image size of the classical AlexNet is still too large for bearing fault diagnosis based on vibration signals, and when the bearing vibration signal is acquired at a high sampling frequency, the images generated by the wavelet transform of all samples occupy a large amount of storage; color images of size 32 × 32 are therefore used as input;

a variant of ReLU, PReLU (parametric ReLU), is used, expressed as:

$$f(x) = \begin{cases} x, & x > 0 \\ a x, & x \le 0 \end{cases} \qquad (8)$$

where the slope a for negative inputs is learnable: it is updated continuously through back-propagation and optimized iteratively together with the weights and biases of the network;
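The forward pass of PReLU and the gradient that lets back-propagation update the slope a can be sketched as follows. For simplicity this assumes a single slope shared by all units; Keras's `PReLU` layer instead learns one slope per unit by default (its `shared_axes` argument controls sharing). The function names are hypothetical.

```python
import numpy as np


def prelu(x, a):
    """PReLU: x for x > 0, a * x otherwise."""
    return np.where(x > 0, x, a * x)


def prelu_backward(x, a, upstream):
    """Gradients of sum(upstream * prelu(x, a)) with respect to x and the shared slope a."""
    dx = np.where(x > 0, 1.0, a) * upstream
    da = np.sum(np.where(x > 0, 0.0, x) * upstream)
    return dx, da


x = np.array([-2.0, -0.5, 1.0, 3.0])
a = 0.25
upstream = np.ones_like(x)          # pretend dLoss/dOutput = 1 everywhere
dx, da = prelu_backward(x, a, upstream)
a -= 0.01 * da                      # one SGD step on the slope, learning rate 0.01
```

Only negative inputs contribute to the gradient of a, so units that are always positive leave the slope unchanged.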

6. The bearing fault diagnostic method of claim 5, wherein the modified AlexNet structure comprises:

(1) convolutional layer

$$h_j = f\left(\sum_i x_i * W_{ij} + b_j\right)$$

where h_j is the j-th output feature map of the current convolutional layer; x_i is the i-th output feature map of the previous convolutional layer, i.e., an input of the current layer; * denotes the convolution operation; the parameter matrix W_ij is the convolution kernel that maps the i-th input feature map to the j-th output feature map of the current layer; b_j is the bias of the j-th output feature map; and f(·) is the nonlinear activation function, here the PReLU function shown in equation (8);
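A minimal NumPy sketch of the convolutional layer h_j = f(Σ_i x_i * W_ij + b_j) with a PReLU activation follows. It assumes "valid" padding, stride 1, and a fixed illustrative slope a = 0.25; like deep-learning frameworks, it computes cross-correlation (the kernel is not flipped). The name `conv_layer` is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv_layer(x, W, b, a=0.25):
    """h_j = f(sum_i x_i * W_ij + b_j) with 'valid' padding and stride 1.

    x: (C_in, H, W) input feature maps
    W: (C_in, C_out, kh, kw) kernels; W[i, j] maps input map i to output map j
    b: (C_out,) biases; f is PReLU with slope a.
    """
    kh, kw = W.shape[2:]
    # windows[i, r, c] is the kh x kw patch of input map i with top-left corner (r, c)
    windows = sliding_window_view(x, (kh, kw), axis=(1, 2))
    z = np.einsum("irsuv,ijuv->jrs", windows, W) + b[:, None, None]
    return np.where(z > 0, z, a * z)  # PReLU activation


out = conv_layer(np.ones((2, 5, 5)), np.ones((2, 3, 2, 2)), np.zeros(3))
```

Here two 5 × 5 input maps and 2 × 2 kernels produce three 4 × 4 output maps, each entry summing 2 channels × 4 kernel weights.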

(2) pooling layer

where S^(M×N) is the pooling scale matrix and M and N are its dimensions; during pooling, S is moved across Y_cn with a fixed stride until the whole of Y_cn has been scanned; when S is a 3 × 3 matrix, Y_cn is reduced to 1/9 of its size and assigned to P_cn in the pooling output layer;
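The pooling step can be sketched as below. The text does not state whether max or average pooling is used, so max pooling is assumed by default (average pooling is available through `op=np.mean`), and the stride is assumed equal to the window size so that a 3 × 3 window reduces the map to 1/9 of its size. The name `pool2d` is hypothetical.

```python
import numpy as np


def pool2d(y, size=3, stride=3, op=np.max):
    """Slide a size x size window over y with a fixed stride and reduce each patch with `op`."""
    H, W = y.shape
    out_h = (H - size) // stride + 1
    out_w = (W - size) // stride + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = op(y[r * stride:r * stride + size, c * stride:c * stride + size])
    return out


y = np.arange(81, dtype=float).reshape(9, 9)
p = pool2d(y)          # 9 x 9 = 81 values -> 3 x 3 = 9 values, i.e. 1/9 of the size
```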

(3) Fully connected layer

after the last convolutional and pooling layers, the data are flattened to one dimension in a Flatten layer and passed through two fully connected layers, in which every neuron is connected to all neurons of the previous layer; a Dropout operation with a drop rate of 0.5 is applied to the outputs of both fully connected layers, so that some units are randomly dropped by the network and not updated; the network structure therefore changes at every iteration, which is equivalent to ensemble learning over networks of many structures, and averaging over this combination of networks effectively prevents overfitting;
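The dropout operation with a drop rate of 0.5 can be sketched as "inverted" dropout, which is also what Keras's `Dropout` layer does: during training each unit is zeroed with probability `rate` and the survivors are scaled by 1/(1 − rate), so no rescaling is needed at inference time. The function name is hypothetical.

```python
import numpy as np


def dropout(h, rate=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with probability `rate` during training and
    scale survivors by 1 / (1 - rate); identity at inference time."""
    if not training or rate == 0.0:
        return h
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(h.shape) >= rate
    return np.where(keep, h / (1.0 - rate), 0.0)


h = np.ones(10_000)
out = dropout(h, rate=0.5, rng=np.random.default_rng(0))
```

Because a different random subset of units is dropped at every step, each update trains a different "thinned" sub-network, which is the ensemble-averaging effect described above.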

the last layer is the output layer, which uses a Softmax classifier to classify the various fault types; the input images of the training set are denoted x_k with labels y_k, where y ∈ {1, 2, ..., J} denotes the fault class; for each x, Softmax estimates the probability p(y = j | x) of every class j ∈ {1, 2, ..., J}; the Softmax activation function is expressed as follows:

$$p(y = j \mid x; \theta) = \frac{e^{\theta_j x}}{\sum_{i=1}^{J} e^{\theta_i x}}, \qquad j = 1, 2, \ldots, J$$

where θ is the weight matrix of the Softmax layer and θ_i is the i-th row vector of θ;
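The Softmax output layer can be sketched in NumPy as follows, with one row θ_j of the weight matrix per class and, as in the text, no separate bias term. Subtracting the row maximum before exponentiating does not change the probabilities but prevents overflow. The name `softmax_layer` and the random inputs are illustrative.

```python
import numpy as np


def softmax_layer(X, theta):
    """p(y = j | x; theta) for every row x of X.

    X: (m, d) inputs; theta: (J, d) weight matrix whose row theta[j] belongs to class j.
    Returns an (m, J) matrix whose rows sum to 1.
    """
    logits = X @ theta.T
    logits = logits - logits.max(axis=1, keepdims=True)   # stability: exp never overflows
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))        # 5 feature vectors of dimension 8 (illustrative)
theta = rng.normal(size=(4, 8))    # J = 4 classes: normal, inner ring, ball, outer ring
P = softmax_layer(X, theta)
pred = P.argmax(axis=1)            # predicted class index per sample
```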

(4) parameter updating

where the predicted probability of each class is compared with its actual probability: if the true class of the i-th sample is k, the actual probability of class k is 1, and otherwise it is 0; W^(l) is the parameter matrix of layer l; the first term of the formula measures the cross-entropy between the prediction and the true class, which is smallest, and so is the loss, when the predicted value equals the true value; the second term is an L2 regularization term, and its coefficient λ is the weight-decay parameter;

where α is the learning rate, which controls the size of the gradient step in each iteration; the residual of the loss function produced at the j-th node of layer l is denoted δ_j^(l), and its recurrence formula is expressed as:

the gradient of the loss function with respect to the parameters is expressed as:

in formula (13), y_k is 1 for only one class k and 0 for all other classes; letting the true class be k, then:

the residuals of the other layers, δ^(L−1), ..., δ^(1), are then calculated with recurrence formula (15);
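For a Softmax output layer trained with cross-entropy, the output-layer residual reduces to "predicted probability minus one-hot truth", and the gradient of the L2 term adds λθ. The sketch below applies this to the Softmax layer alone, checks the analytic gradient against central finite differences, and takes one SGD step with learning rate α. The 1/m averaging, the λ/2 convention, and all names are assumptions for illustration; deeper layers would receive their residuals through the back-propagation recurrence.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def loss_and_grad(theta, X, labels, lam):
    """J(theta) = mean cross-entropy + (lam / 2) * ||theta||^2, and dJ/dtheta."""
    m = X.shape[0]
    P = softmax(X @ theta.T)                      # (m, J) predicted probabilities
    Y = np.zeros_like(P)
    Y[np.arange(m), labels] = 1.0                 # one-hot true classes
    loss = -np.sum(Y * np.log(P)) / m + 0.5 * lam * np.sum(theta ** 2)
    delta = (P - Y) / m                           # output-layer residual: prediction minus truth
    grad = delta.T @ X + lam * theta              # same shape as theta
    return loss, grad


def sgd_step(theta, grad, alpha):
    """theta <- theta - alpha * dJ/dtheta, where alpha is the learning rate."""
    return theta - alpha * grad


rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
labels = np.array([0, 1, 2, 3, 0, 1])
theta = 0.1 * rng.normal(size=(4, 3))
loss0, grad = loss_and_grad(theta, X, labels, lam=0.01)

# Central finite differences as an independent check of the analytic gradient
eps = 1e-6
num = np.zeros_like(theta)
for idx in np.ndindex(theta.shape):
    tp, tm = theta.copy(), theta.copy()
    tp[idx] += eps
    tm[idx] -= eps
    num[idx] = (loss_and_grad(tp, X, labels, 0.01)[0] - loss_and_grad(tm, X, labels, 0.01)[0]) / (2 * eps)

loss1, _ = loss_and_grad(sgd_step(theta, grad, alpha=0.05), X, labels, lam=0.01)
```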

the bearing fault feature extraction model is built with the Python-based Keras deep learning framework running on a TensorFlow backend; the SGD optimizer, cross-entropy loss function, and normalization method of Keras are chosen to train the parameters.

7. The bearing fault diagnosis method of claim 2, wherein in step four, the LGBM fault diagnosis, which includes gradient-based one-side sampling and exclusive feature bundling, comprises:

(1) Gradient-based one-side sampling (GOSS): training focuses on the instances with large gradients, while the instances with small gradients are randomly sampled, and their influence on the data distribution is compensated by multiplying them by a constant when computing the information gain; the GOSS algorithm is as follows:

Input: training data I with n instances x₁, ..., x_n; the number of iterations d; the sampling rates a and b of the large-gradient and small-gradient data; the loss function loss; and the weak learner L;

Output: a trained strong learner;

the instances are split according to the variance gain V_j(d) estimated over the subsets A and B:

step 6: repeat steps 2 to 5 until d iterations have been performed or the model converges;
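The sampling part of GOSS can be sketched as follows: keep the a·n instances with the largest absolute gradients (set A), randomly sample b·n of the rest (set B), and weight B by (1 − a)/b so that weighted gradient sums over B estimate the sums over all small-gradient instances. This is an illustrative sketch of the idea, not LightGBM's internal implementation; the name `goss_sample` and the random gradients are hypothetical.

```python
import numpy as np


def goss_sample(gradients, a, b, rng):
    """Gradient-based one-side sampling.

    Keeps the top a*n instances by |gradient| (set A), randomly samples b*n of the
    remaining instances (set B), and gives B the weight (1 - a) / b so that weighted
    gradient sums over B estimate the sums over all small-gradient instances.
    Returns (indices, weights).
    """
    g = np.asarray(gradients, dtype=float)
    n = len(g)
    order = np.argsort(-np.abs(g), kind="stable")   # largest |gradient| first
    top_n, rand_n = int(a * n), int(b * n)
    A = order[:top_n]
    B = rng.choice(order[top_n:], size=rand_n, replace=False)
    idx = np.concatenate([A, B])
    w = np.concatenate([np.ones(top_n), np.full(rand_n, (1 - a) / b)])
    return idx, w


rng = np.random.default_rng(0)
g = rng.normal(size=1000)                           # stand-in gradients of 1,000 instances
idx, w = goss_sample(g, a=0.2, b=0.1, rng=rng)      # the next tree is fit on 30% of the data
```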

(2) Exclusive feature bundling (EFB) consists of two steps: bundle generation and exclusive feature merging;

bundle generation determines which mutually exclusive features can be combined, and features that can be combined are grouped together into a bundle; exclusive feature merging then combines each bundle into a single feature; bundle generation uses Greedy Bundling: the features are taken as vertices, and an edge is added between every two features that are not mutually exclusive, which reduces the optimal bundling problem to a graph-coloring problem that is then solved with a greedy algorithm; exclusive feature merging builds the feature histogram so that the exclusive features reside in different bins, which can be done simply by adding offsets to the values of the original features;
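A simplified sketch of the two EFB steps follows. Greedy bundling counts, for each pair of features, the rows where both are non-zero (the conflicts), visits features in decreasing order of graph degree, and adds each to the first bundle that stays within a conflict budget. Merging then shifts each feature's values by an offset so that bundled features occupy disjoint value ranges. The inputs are assumed to be non-negative bin indices with 0 meaning "zero"; LightGBM's real implementation also works on histogram bins and limits bundle sizes, which is omitted here. Function names are hypothetical.

```python
import numpy as np


def greedy_bundling(X, max_conflict=0):
    """Group features whose non-zero entries (almost) never overlap into bundles."""
    nz = (X != 0).astype(int)
    conflicts = nz.T @ nz                 # [i, j] = rows where features i and j are both non-zero
    np.fill_diagonal(conflicts, 0)
    degree = (conflicts > 0).sum(axis=1)  # number of conflicting features (graph degree)
    bundles, bundle_conflicts = [], []
    for f in np.argsort(-degree, kind="stable"):
        for k, bundle in enumerate(bundles):
            c = conflicts[f, bundle].sum()
            if bundle_conflicts[k] + c <= max_conflict:
                bundle.append(int(f))
                bundle_conflicts[k] += c
                break
        else:                             # no existing bundle fits: open a new one
            bundles.append([int(f)])
            bundle_conflicts.append(0)
    return bundles


def merge_bundle(X, bundle):
    """Merge one bundle into a single feature by adding offsets to the original values."""
    merged = np.zeros(X.shape[0])
    offset = 0.0
    for f in bundle:
        col = X[:, f]
        nonzero = col != 0
        merged[nonzero] = col[nonzero] + offset
        offset += col.max()               # next feature starts past this feature's range
    return merged


X = np.array([[1, 0, 0],
              [0, 2, 0],
              [0, 0, 3],
              [1, 0, 3]])
bundles = greedy_bundling(X)              # features 0 and 2 conflict in the last row
```

Feature 1 never overlaps feature 0, so the two share a bundle; after merging, value 1 still means "feature 0 = 1" while value 3 means "feature 1 = 2".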

the output of the penultimate layer of AlexNet is classified with the LGBMClassifier class of Python's lightgbm package;

(3) bayesian hyper-parameter optimization

the training process of the LGBM model is tuned with HyperOpt, which provides an easy-to-use Bayesian hyperparameter optimization algorithm; hyperparameter optimization is performed by sequential model-based optimization, which is a Bayesian optimization technique;

Bayesian optimization is a model-based optimization algorithm designed around a user-defined objective function: it searches for the maximum of an unknown objective function from which samples can be obtained; like all model-based optimization algorithms, it builds a model of the objective function by regression, uses the model to select the next point to evaluate, and then updates the model;

the basic algorithm of bayesian optimization is as follows:

step 1: place a Gaussian process prior on the objective function f;

where the distribution symbol in this expression represents a normal distribution;

to estimate μ(x) and σ(x), a Gaussian process is fitted to the data; each observation f(χ) is assumed to be a sample from a normal distribution, so for a data set of multiple observations f(χ₁), f(χ₂), ..., f(χ_t), the vector [f(χ₁), f(χ₂), ..., f(χ_t)] is a sample from a multivariate normal distribution defined by a mean vector and a covariance matrix; the Gaussian process is therefore an n-variate normal distribution, where n is the number of observations; the covariance matrix is defined by a kernel function k(χ₁, χ₂), under which samples far apart are nearly uncorrelated while nearby samples are highly correlated; this reflects the prior assumption that the function tends to be smooth, so two observations at similar values χ₁ and χ₂ are likely to be correlated;

given a set of observations P_1:t = f(χ_1:t) and the sampling noise, the Gaussian process is calculated as follows:

where k = [k(x, χ₁), k(x, χ₂), …, k(x, χ_t)];

using this Gaussian process model, Bayesian optimization searches for the maximum of the unknown objective function f(x); the next χ to test is chosen by maximizing an acquisition function, which balances exploration (improving the model in less-explored parts of the search space) against exploitation (favoring the parts the model predicts to be promising); after each observation, the algorithm updates the Gaussian process to take the new data into account; because a priori every point of the search space is assumed equally likely to be good, the Gaussian process is initialized with a constant mean, and the model is refined gradually with each observation;
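The text does not name the acquisition function; the sketch below uses expected improvement (EI), one common choice, as an assumption. Given the Gaussian-process posterior mean μ and standard deviation σ at a candidate point and the best value observed so far, EI rewards both a high predicted mean (exploitation) and high uncertainty (exploration); the margin `xi` is a hypothetical tuning knob.

```python
import math


def expected_improvement(mu, sigma, best, xi=0.0):
    """Expected improvement over `best` (maximization) at a point whose GP posterior
    has mean mu and standard deviation sigma; xi > 0 pushes towards exploration."""
    if sigma <= 0.0:
        return max(mu - best - xi, 0.0)
    z = (mu - best - xi) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal PDF
    return (mu - best - xi) * cdf + sigma * pdf


# Same predicted mean, different uncertainty: the less-explored point scores higher.
print(expected_improvement(0.5, 2.0, best=1.0) > expected_improvement(0.5, 1.0, best=1.0))  # True
```

The next hyperparameter setting to evaluate would be the candidate with the largest EI.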

the Gaussian process is completely specified by its mean function μ(x) and its kernel function k(χ₁, χ₂);

The goal is to learn the characteristic length scale l² and the total variance;

where μ₀ is the mean function.
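A minimal Gaussian-process regression sketch with a constant prior mean μ₀ is shown below. It assumes a squared-exponential kernel parameterized by a length scale l and a total variance (matching the two quantities the text says are learned), a small noise variance, and the standard posterior formulas: mean μ₀ + kᵀ(K + σ²I)⁻¹(P − μ₀) and variance k(x, x) − kᵀ(K + σ²I)⁻¹k. It illustrates the model described in the text; it is not HyperOpt's internal algorithm (HyperOpt's usual `tpe.suggest` uses a Tree-structured Parzen Estimator). All names are hypothetical.

```python
import numpy as np


def sq_exp_kernel(A, B, length_scale=1.0, variance=1.0):
    """k(x1, x2) = variance * exp(-||x1 - x2||^2 / (2 * length_scale^2))."""
    d2 = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(d2, 0.0) / length_scale ** 2)


def gp_posterior(X_obs, y_obs, X_new, length_scale=1.0, variance=1.0, noise=1e-6, mu0=0.0):
    """Posterior mean and standard deviation of a GP with constant prior mean mu0."""
    K = sq_exp_kernel(X_obs, X_obs, length_scale, variance) + noise * np.eye(len(X_obs))
    k = sq_exp_kernel(X_obs, X_new, length_scale, variance)   # column j = vector k for X_new[j]
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs - mu0))   # (K + noise I)^-1 (y - mu0)
    mean = mu0 + k.T @ alpha
    v = np.linalg.solve(L, k)
    var = variance - np.sum(v ** 2, axis=0)                    # k(x, x) - k^T (K + noise I)^-1 k
    return mean, np.sqrt(np.maximum(var, 0.0))


X_obs = np.array([[0.0], [1.0], [2.0]])
y_obs = np.sin(X_obs[:, 0])
mean, std = gp_posterior(X_obs, y_obs, np.array([[0.0], [1.5], [10.0]]))
```

At an observed point the posterior reproduces the observation with almost no uncertainty; far from all observations it falls back to the prior mean μ₀ and the full prior standard deviation, which is why an acquisition function is drawn towards unexplored regions.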

8. A bearing fault diagnosis system for implementing the bearing fault diagnosis method according to any one of claims 1 to 7, characterized in that the bearing fault diagnosis system comprises:

9. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of:

10. An information data processing terminal, characterized by being used to implement the bearing fault diagnosis system according to claim 8.