CN113834656A - Bearing fault diagnosis method, system, equipment and terminal - Google Patents

Publication number
CN113834656A
Authority
CN
China
Prior art keywords
model
layer
alexnet
sample
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110997171.6A
Other languages
Chinese (zh)
Other versions
CN113834656B (en
Inventor
刘立芳
张梓锐
和伟辉
李飞龙
齐小刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN202110997171.6A
Publication of CN113834656A
Application granted
Publication of CN113834656B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01MTESTING STATIC OR DYNAMIC BALANCE OF MACHINES OR STRUCTURES; TESTING OF STRUCTURES OR APPARATUS, NOT OTHERWISE PROVIDED FOR
    • G01M13/00Testing of machine parts
    • G01M13/04Bearings
    • G01M13/045Acoustic or vibration analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Acoustics & Sound (AREA)
  • Complex Calculations (AREA)

Abstract

The invention belongs to the technical field of bearing fault diagnosis and discloses a bearing fault diagnosis method, system, equipment and terminal. The method comprises: extracting time-frequency features from the original bearing vibration signal using the continuous wavelet transform and converting them into a two-dimensional image of 32 × 32 pixels; extracting fault features from the time-frequency image using an improved AlexNet model; and, for fault classification, using the LGBM classification algorithm with Bayesian optimization to select the optimal model parameters. In an experimental comparison with 7 other methods, the method achieves the highest accuracy, 99.712%; predicting 1800 samples takes 1.47 seconds, the same order of magnitude as the other models; and the variance of the prediction accuracy over five runs is only 0.063, which is more stable than 6 of the other methods. The method therefore has the best overall performance.

Description

Bearing fault diagnosis method, system, equipment and terminal
Technical Field
The invention belongs to the technical field of bearing fault diagnosis, and particularly relates to a bearing fault diagnosis method, system, equipment and terminal.
Background
Effective fault diagnosis of mechanical equipment can reduce the large economic losses incurred in industrial production. In recent years the use of machine learning and deep learning techniques has grown substantially, and advanced measurement techniques allow large amounts of data to be collected in industrial environments. Against this big-data background, machine learning and deep learning fault diagnosis models such as the deep neural network (DNN), the convolutional neural network (CNN) and the recurrent neural network have shown excellent results.
At present, autoencoders and convolutional neural networks are the most common deep learning fault diagnosis models. Lei et al. propose a deep neural network for rotating machinery fault diagnosis based on frequency-domain data. Zong et al. propose a denoising autoencoder for bearing fault diagnosis based on frequency-domain data. Wei et al. propose a one-dimensional CNN that diagnoses bearing faults from the raw time signal and performs well in noisy environments. Guo X et al. propose a hierarchical adaptive deep CNN for bearing fault diagnosis that converts the raw time signal into a 32 × 32 matrix as input. Wang Q et al. propose a CNN-based method for bearing reliability assessment and remaining-life prediction that converts frequency-domain signals into a 32 × 32 matrix as input. Wang J et al. propose a generic bearing fault diagnosis model transferred from the well-known AlexNet model and compare the effects of eight time-frequency feature extraction methods. Wang L H et al. propose a CNN for motor fault diagnosis that converts the fault signal into a time-frequency image using the Short-Time Fourier Transform (STFT). Claessens et al. propose a locally connected network for bearing fault diagnosis composed of normalized sparse autoencoders. Eren et al. use a one-dimensional convolutional neural network for time-series prediction, preprocessing the input data by filtering, decimating and normalizing it to achieve better efficiency. Ran et al. claim that time-series prediction using a DNN achieves high accuracy, but provide no architectural details of the proposed DNN. The same problem occurs in the work of Mao et al., who claim high accuracy with a new deep learning approach but report only training accuracy (rather than test accuracy) and provide no feasible architecture for the proposed network, which makes the work difficult to reproduce.
More recent articles focus on combining CNN and Long Short-Term Memory (LSTM) networks for bearing fault diagnosis. However, the stepwise construction of the proposed models is not clearly explained. A new bearing fault diagnosis method is therefore needed to overcome the defects of existing methods.
Through the above analysis, the problems and defects of the prior art are as follows:
(1) In existing bearing fault diagnosis methods, no architectural details are provided for the proposed DNN network.
(2) Existing bearing fault diagnosis methods report only training accuracy (not test accuracy) and provide no feasible architecture for the proposed network, which makes them difficult to reproduce.
(3) In existing schemes that combine CNN and the long short-term memory network (LSTM) for bearing fault diagnosis, the stepwise construction of the model is not clearly explained.
The difficulty in solving the above problems and defects is:
(1) Many DNN models are deep and structurally complex.
(2) When a model is trained and tested, the test accuracy is generally lower than the training accuracy. Reporting the training accuracy therefore yields a higher figure, but it demonstrates the merit of a method less convincingly than the test accuracy does.
(3) A model is sometimes arrived at by repeatedly adjusting it according to feedback from the results, so the construction process is difficult to explain.
The significance of solving the problems and the defects is as follows:
(1) Regarding the first problem, given the architectural details of a DNN network, the same network model can be built directly with a deep learning tool, so that the resulting model can be used directly for fault diagnosis.
(2) Regarding the second problem, reporting the test accuracy better demonstrates the advantages and effect of a fault diagnosis method, and providing a feasible architecture for the proposed network makes the method easier to reproduce.
(3) Regarding the third problem, explaining the stepwise model construction process makes the diagnosis method more interpretable, makes its principle clearer to study, and gives clearer guidance for improving it.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a bearing fault diagnosis method, system, equipment and terminal, and in particular a bearing fault diagnosis method, system, equipment and terminal based on the continuous wavelet transform (CWT) and AlexNet-LGBM, a fusion model of AlexNet and the light gradient boosting machine (LGBM).
The invention is realized in such a way that a bearing fault diagnosis method comprises the following steps:
first, extracting time-frequency features from the original bearing vibration signal using the continuous wavelet transform and converting them into a two-dimensional image of 32 × 32 pixels; second, extracting fault features from the time-frequency image using an improved AlexNet model; and finally, for fault classification, using the LGBM classification algorithm with Bayesian optimization to select the optimal model parameters.
Further, the bearing fault diagnosis method comprises the following steps:
Step one, signal sampling: taking every sample_length consecutive data points of the original vibration data as one sample, and sampling continuously in an overlapping manner at the sampling interval sample_interval. The purpose of this step is to segment the original signal into samples of a suitable size for the subsequent steps; segmentation also yields more samples for training and testing, which improves the accuracy of the model.
Step two, Morlet continuous wavelet transform signal processing: performing the continuous wavelet transform on each sample to generate a corresponding time-frequency image, resizing the image to a color image of size N × N, and generating enough images to divide into a training set and a test set; for the training process, executing step three; for the test process, jumping to step five. This step has two main functions: (1) processing the one-dimensional signal with the Morlet continuous wavelet transform to extract its time-domain and frequency-domain features; and (2) converting the one-dimensional signal into a two-dimensional image with the Morlet continuous wavelet transform, for training the subsequent model and for further feature extraction.
Step three, AlexNet feature extraction: inputting the N × N time-frequency images of the training set into the improved AlexNet model for training, and saving the model. The main function of this step is to train the improved AlexNet feature extraction model and tune its hyper-parameters so that it has the best feature extraction capability, and then to save the model parameters for the subsequent test stage.
Step four, LGBM fault diagnosis: inputting the N × N time-frequency images of the training set into the trained AlexNet model, taking the output of the penultimate fully connected layer, and inputting it into the LGBM model for training; the data dimension is sample_Num × 1000, where sample_Num is the number of samples and 1000 is the number of neurons in the second fully connected layer of the AlexNet model. The main function of this step is to train the LGBM model: the fault features extracted by the AlexNet model are input into the LGBM to train the final fault classifier.
Step five, the testing process: inputting the N × N time-frequency images of the test set into the trained AlexNet model, taking the output of the second fully connected layer of the AlexNet model as the features extracted by AlexNet, and inputting them into the trained LGBM model; the output of the LGBM model is the fault diagnosis result. The main function of this step is to obtain the final fault classification result.
Further, in step one, the signal sampling includes:
selecting sample_length consecutive data points from the original vibration signal as an original sample; generating the corresponding time-frequency image from these sample_length points by the continuous wavelet transform; resizing the time-frequency image to a suitable size N × N; then selecting, in an overlapping manner, the sample_length consecutive data points that start sample_interval points later as the next sample and generating another N × N image; and repeating this process until sufficient training and test images have been generated.
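The overlapping sampling described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `overlapping_samples` and the NumPy-based windowing are assumptions, while `sample_length` and `sample_interval` are the parameters named in the text.

```python
import numpy as np

def overlapping_samples(signal, sample_length, sample_interval):
    """Split a 1-D signal into overlapping windows (hypothetical helper).

    Window k covers signal[k * sample_interval : k * sample_interval + sample_length].
    """
    signal = np.asarray(signal)
    if sample_length <= 0 or sample_interval <= 0:
        raise ValueError("sample_length and sample_interval must be positive")
    if signal.size < sample_length:
        return np.empty((0, sample_length), dtype=signal.dtype)
    n_samples = (signal.size - sample_length) // sample_interval + 1
    starts = np.arange(n_samples) * sample_interval
    # Fancy indexing builds an (n_samples, sample_length) array of windows.
    return signal[starts[:, None] + np.arange(sample_length)[None, :]]

# Toy signal of 10 points, windows of 4 points taken every 2 points.
windows = overlapping_samples(np.arange(10), sample_length=4, sample_interval=2)
```

Choosing sample_interval smaller than sample_length makes consecutive windows overlap, which is how the same recording yields more training and test samples.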
Further, in step two, the Morlet continuous wavelet transform signal processing includes:
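the continuous wavelet transform of the signal x(t) with the wavelet function ψ(t) is:
$$W_x(a,b)=\frac{1}{\sqrt{a}}\int_{-\infty}^{+\infty}x(t)\,\psi^{*}\!\left(\frac{t-b}{a}\right)dt \tag{1}$$
where a is the scale parameter, b is the translation parameter and ψ* is the complex conjugate of ψ.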
Among wavelets, a complex (analytic) wavelet is one whose Fourier transform is zero at negative frequencies. With such a complex wavelet, the phase and amplitude components of the signal can be separated. Morlet is the most commonly used complex wavelet; continuous wavelet analysis with the complex Morlet wavelet separates information in the wavelet domain and simplifies the relationship between transform ridges and instantaneous frequency. The bearing vibration signal is processed with the Morlet wavelet, defined as:
$$\psi(t)=\pi^{-1/4}\left(e^{i2\pi f_0 t}-e^{-(2\pi f_0)^2/2}\right)e^{-t^2/2} \tag{2}$$
where $f_0$ is the central frequency of the mother wavelet. The second term in brackets is the correction term, which corrects for the non-zero mean of the complex sinusoid multiplied by the Gaussian. In practice the correction term is negligible for $f_0 \gg 0$ and can be ignored, in which case the Morlet wavelet is:
$$\psi(t)=\pi^{-1/4}e^{i2\pi f_0 t}e^{-t^2/2} \tag{3}$$
That is, the Morlet wavelet is a simple complex sinusoid $e^{i2\pi f_0 t}$ within a Gaussian envelope $e^{-t^2/2}$; the $\pi^{-1/4}$ term is a normalization factor that ensures the wavelet has unit energy.
The Fourier transform of the Morlet wavelet is:
$$\hat{\psi}(f)=\pi^{1/4}\sqrt{2}\,e^{-\frac{1}{2}(2\pi f-2\pi f_0)^2} \tag{4}$$
This has the form of a Gaussian function shifted by $f_0$ along the frequency axis. The centre frequency of this Gaussian spectrum is typically taken as the characteristic frequency of the Morlet wavelet. The characteristic frequency is set for the mother wavelet and varies with the wavelet scale a as:
$$f=\frac{f_0}{a} \tag{5}$$
The energy spectrum, i.e. the squared magnitude of the Fourier transform, is:
$$\left|\hat{\psi}(f)\right|^2=2\pi^{1/2}e^{-(2\pi f-2\pi f_0)^2} \tag{6}$$
The integral of the energy spectrum, i.e. the energy of the Morlet wavelet, equals 1 owing to the normalization in equation (3).
Through the continuous wavelet transform, the one-dimensional vibration signal is converted into an image that captures the correspondence between time and frequency.
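A minimal NumPy sketch of the Morlet continuous wavelet transform may make this step concrete. It assumes the simplified Morlet wavelet without the correction term, a direct time-domain correlation, and a truncation of the wavelet at four envelope widths; the function names `morlet` and `cwt_morlet` are illustrative, and resizing the result to an N × N color image is not shown.

```python
import numpy as np

def morlet(t, f0=1.0):
    """Simplified complex Morlet wavelet (correction term ignored)."""
    return np.pi ** -0.25 * np.exp(2j * np.pi * f0 * t) * np.exp(-t ** 2 / 2)

def cwt_morlet(x, scales, fs, f0=1.0):
    """Return the CWT magnitude: rows are scales a, columns are time shifts b."""
    x = np.asarray(x, dtype=float)
    dt = 1.0 / fs
    out = np.empty((len(scales), x.size))
    for row, a in enumerate(scales):
        # Sample the scaled wavelet on +/- 4 envelope widths (an assumed cut-off).
        half = int(np.ceil(4 * a / dt))
        tau = np.arange(-half, half + 1) * dt
        psi = morlet(tau / a, f0) / np.sqrt(a)
        if psi.size > x.size:
            raise ValueError("scale too large for the signal length")
        # Correlating x with psi equals convolving with the reversed conjugate.
        coeff = np.convolve(x, np.conj(psi)[::-1], mode="same") * dt
        out[row] = np.abs(coeff)
    return out

# Unit-energy check of the wavelet by a Riemann sum.
t_grid = np.linspace(-8.0, 8.0, 4001)
energy = np.sum(np.abs(morlet(t_grid)) ** 2) * (t_grid[1] - t_grid[0])

# A 50 Hz sine analysed at scales a = f0 / f for 25, 50 and 100 Hz (f0 = 1).
fs = 1000.0
t = np.arange(1000) / fs
x = np.sin(2 * np.pi * 50 * t)
freqs = np.array([25.0, 50.0, 100.0])
scalogram = cwt_morlet(x, scales=1.0 / freqs, fs=fs)
```

The row of `scalogram` whose scale matches the 50 Hz component carries most of the magnitude, which is the time-frequency correspondence that the image encodes.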
Further, in step three, the AlexNet feature extraction includes:
AlexNet was modified as follows:
(1) Model input dimension: the 224 × 224 input image of the classical AlexNet is unnecessarily large for vibration-based bearing fault diagnosis, and if the bearing vibration signal is acquired at a high frequency, the images generated by wavelet-transforming all samples occupy a large amount of storage. Color images of size 32 × 32 are therefore used as input.
(2) Convolutional layer activation function: the ReLU function f(z) = max(0, z) has a limitation, because its gradient in the iterative update is:
$$\frac{\partial f(z)}{\partial z}=\begin{cases}1, & z>0\\ 0, & z\le 0\end{cases} \tag{7}$$
so for z ≤ 0 the gradient is zero and the corresponding parameters stop being updated. A variant of ReLU, PReLU, is therefore used, expressed as:
$$f(z)=\begin{cases}z, & z>0\\ az, & z\le 0\end{cases} \tag{8}$$
PReLU differs from ReLU in that for z < 0 its value is a linear function with slope a, and its gradient is:
$$\frac{\partial f(z)}{\partial z}=\begin{cases}1, & z>0\\ a, & z\le 0\end{cases} \tag{9}$$
The value of a is updated continuously through back-propagation and is optimized iteratively together with the weights and biases of the network.
(3) Fully connected and output layers: the bearing fault diagnosis covers 1 normal class and 3 fault classes, i.e. four classes, so the output layer of the improved AlexNet structure has size 4; because the output layer is smaller, the size of the second fully connected layer is set to 1000.
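The difference between ReLU and PReLU can be illustrated with a few lines of NumPy. This is only a sketch: the helper names and the slope value 0.25 are assumptions, and in the trained network a is learned rather than fixed.

```python
import numpy as np

def relu(z):
    """ReLU: max(0, z)."""
    return np.maximum(0.0, z)

def prelu(z, a):
    """PReLU: z for z > 0, a * z otherwise."""
    return np.where(z > 0, z, a * z)

def prelu_grad_z(z, a):
    """Gradient of PReLU with respect to its input z."""
    return np.where(z > 0, 1.0, a)

def prelu_grad_a(z):
    """Gradient of PReLU with respect to the slope a (used to learn a)."""
    return np.where(z > 0, 0.0, z)

z = np.array([-2.0, 0.0, 3.0])
slope = 0.25  # illustrative value only
activations = prelu(z, slope)
grads = prelu_grad_z(z, slope)
```

Unlike ReLU, the gradient for negative inputs is a rather than 0, so units with negative pre-activations keep receiving updates.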
Further, the improved AlexNet structure comprises:
(1) convolutional layer
The convolutional layer is connected to the previous layer by local connectivity and weight sharing. The convolution operation is:
$$h_j=f\left(\sum_{i}X_i * W_{ij}+b_j\right) \tag{10}$$
where $h_j$ is the j-th output feature map of the current convolutional layer; $X_i$ is the i-th output feature map of the previous layer, i.e. the input to the current convolutional layer; * denotes the convolution operation; the parameter matrix $W_{ij}$ is the convolution kernel that maps the i-th input feature map to the j-th output feature map of the current layer; $b_j$ is the bias of the j-th output feature map of the current layer; and f(·) is the non-linear activation function, here the PReLU function of equation (8).
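A loop-based NumPy sketch of one convolutional layer forward pass shows how each output map sums the contributions of all input maps before the activation. It assumes stride 1, no padding, and the cross-correlation convention used by deep learning frameworks; the name `conv_layer`, the array layout, and the fixed PReLU slope are illustrative assumptions.

```python
import numpy as np

def conv_layer(X, W, b, a=0.25):
    """Forward pass h_j = f(sum_i X_i * W_ij + b_j) with a PReLU activation.

    X: (C_in, H, W_in) input maps; W: (C_in, C_out, kH, kW) kernels; b: (C_out,).
    Stride 1, no padding; "convolution" is computed as cross-correlation.
    """
    c_in, height, width = X.shape
    _, c_out, kh, kw = W.shape
    out = np.zeros((c_out, height - kh + 1, width - kw + 1))
    for j in range(c_out):
        for r in range(out.shape[1]):
            for c in range(out.shape[2]):
                patch = X[:, r:r + kh, c:c + kw]
                # Sum over all input maps i and kernel positions, then add the bias.
                out[j, r, c] = np.sum(patch * W[:, j]) + b[j]
    return np.where(out > 0, out, a * out)  # PReLU with fixed illustrative slope

X_demo = np.ones((1, 3, 3))
W_demo = np.ones((1, 1, 2, 2))
h_neg = conv_layer(X_demo, W_demo, np.array([-5.0]))  # pre-activation 4 - 5 = -1
h_pos = conv_layer(X_demo, W_demo, np.array([1.0]))   # pre-activation 4 + 1 = 5
```

The same kernel `W[:, j]` is applied at every spatial position, which is the weight sharing mentioned in the text.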
(2) Pooling layer
The pooling layer downsamples after the convolution operation, further reducing the dimension of the extracted features. Max pooling is used, which extracts the maximum values from the convolution output $Y_{cn}$:
$$P_{cn}=\max_{Y_{cn}\in S_{M\times N}}\left(Y_{cn}\right) \tag{11}$$
where $S_{M\times N}$ is the pooling window matrix and M and N are its dimensions. During pooling, S slides over $Y_{cn}$ with a fixed stride until the whole of $Y_{cn}$ has been scanned. If S is a 3 × 3 matrix, $Y_{cn}$ is reduced to 1/9 of its size and assigned to $P_{cn}$ in the pooling output layer.
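The reduction to 1/9 can be checked with a small max-pooling sketch. It assumes a 3 × 3 window moved with stride 3 (non-overlapping windows), which is the setting in which the element count drops to exactly 1/9; the function name `max_pool` is illustrative.

```python
import numpy as np

def max_pool(Y, size=3, stride=3):
    """Max pooling of a 2-D feature map with a square window."""
    rows = (Y.shape[0] - size) // stride + 1
    cols = (Y.shape[1] - size) // stride + 1
    P = np.empty((rows, cols), dtype=Y.dtype)
    for r in range(rows):
        for c in range(cols):
            window = Y[r * stride:r * stride + size, c * stride:c * stride + size]
            P[r, c] = window.max()
    return P

Y = np.arange(36).reshape(6, 6)   # toy 6 x 6 convolution output
P = max_pool(Y)                   # 2 x 2 pooled output
```

Here the 36 input values are reduced to 4 outputs, one maximum per 3 × 3 window.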
(3) Full connection layer
After the last convolutional layer and pooling layer, the data reach the Flatten layer and are flattened into one dimension, then pass through two fully connected layers, in which each neuron is connected to all neurons of the previous layer. Dropout with a rate of 0.5 is applied to the output of both fully connected layers: the network randomly discards some units, which are not updated, so the network structure changes in each iteration. This is equivalent to ensemble learning over networks of many structures, and averaging over these networks effectively prevents overfitting.
The last layer is the output layer. For multi-class fault classification, a Softmax classifier is used. Denote an input image in the training data set by x and its label by y, where y ∈ {1, 2, ..., J} is the fault class. For each x, Softmax estimates the probability p(y = j | x) of each label j ∈ {1, 2, ..., J}. The Softmax activation function is expressed as follows:
$$p(y=j\mid x;\theta)=\frac{\exp\left(\theta_j^{T}x\right)}{\sum_{i=1}^{J}\exp\left(\theta_i^{T}x\right)} \tag{12}$$
where θ is the weight matrix of the Softmax layer and $\theta_i$ is its i-th row vector.
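As a small illustration of the Softmax output layer, the following NumPy sketch computes the class probabilities for one feature vector. The function name `softmax_probs` and the toy weight matrix are assumptions; subtracting the maximum score is a standard numerical-stability step that does not change the result.

```python
import numpy as np

def softmax_probs(theta, x):
    """Class probabilities p(y = j | x) for weight matrix theta (one row per class)."""
    scores = theta @ x
    scores = scores - scores.max()   # stabilises exp() without changing the ratios
    e = np.exp(scores)
    return e / e.sum()

x_feat = np.array([1.0, 2.0, 3.0])
uniform = softmax_probs(np.zeros((4, 3)), x_feat)        # all-zero weights
theta_demo = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0],
                       [0.0, 0.0, 1.0],
                       [0.0, 0.0, 0.0]])
probs = softmax_probs(theta_demo, x_feat)                # scores 1, 2, 3, 0
```

With four output units, as in the improved AlexNet, the probabilities always sum to 1 and the predicted class is the index of the largest one.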
(4) Parameter updating
To suit the multi-class fault diagnosis task, the loss function is the cross-entropy loss, expressed as:
$$E(W,b)=-\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{J}y_k^{(i)}\log\hat{y}_k^{(i)}+\frac{\lambda}{2}\sum_{l}\left\lVert W^{(l)}\right\rVert^{2} \tag{13}$$
where $\hat{y}_k^{(i)}$ is the predicted probability that the i-th sample belongs to class k; $y_k^{(i)}$ is the actual probability, equal to 1 if the true class of the i-th sample is k and 0 otherwise; m is the number of samples; and $W^{(l)}$ is the parameter matrix of layer l. The first term measures the cross entropy between the prediction $\hat{y}$ and the true class y; it is smallest when the predicted value equals the true value, at which point the loss function is minimal. The second term is the L2 regularization term, whose coefficient λ is the weight decay parameter.
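The two terms of the loss can be sketched in NumPy as follows. The averaging over samples and the factor 1/2 on the L2 term are common conventions assumed here rather than details confirmed by the source, and the function name `cross_entropy_l2` and the small `eps` guard are illustrative.

```python
import numpy as np

def cross_entropy_l2(y_true, y_pred, weights, lam, eps=1e-12):
    """Mean cross entropy plus an L2 weight penalty (assumed conventions).

    y_true: one-hot labels (m, J); y_pred: predicted probabilities (m, J);
    weights: list of parameter matrices; lam: weight decay coefficient.
    """
    ce = -np.mean(np.sum(y_true * np.log(y_pred + eps), axis=1))
    l2 = 0.5 * lam * sum(np.sum(W ** 2) for W in weights)
    return ce + l2

labels = np.eye(4)                                   # four samples, one per class
loss_uniform = cross_entropy_l2(labels, np.full((4, 4), 0.25),
                                [np.ones((2, 2))], lam=0.1)
loss_perfect = cross_entropy_l2(labels, np.eye(4), [], lam=0.1)
```

A uniform prediction over four classes gives a cross entropy of log 4, while a perfect prediction drives the first term to zero, leaving only the regularization term.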
Model training uses stochastic gradient descent. In each iteration the parameters W and biases b are updated as:
$$W^{(l)}\leftarrow W^{(l)}-\alpha\frac{\partial E}{\partial W^{(l)}},\qquad b^{(l)}\leftarrow b^{(l)}-\alpha\frac{\partial E}{\partial b^{(l)}} \tag{14}$$
where α is the learning rate, which controls the step size of each iteration. Let $z^{(l)}=W^{(l)}a^{(l-1)}+b^{(l)}$ be the input to the activation function of layer l and $a^{(l)}=f\left(z^{(l)}\right)$ its output. The residual of the loss function at the j-th node of layer l is denoted $\delta_j^{(l)}=\partial E/\partial z_j^{(l)}$, and its recurrence formula is:
$$\delta^{(l)}=\left(\left(W^{(l+1)}\right)^{T}\delta^{(l+1)}\right)\odot f'\left(z^{(l)}\right) \tag{15}$$
The gradients of the loss function with respect to the parameters are:
$$\frac{\partial E}{\partial W^{(l)}}=\delta^{(l)}\left(a^{(l-1)}\right)^{T}+\lambda W^{(l)}$$
$$\frac{\partial E}{\partial b^{(l)}}=\delta^{(l)}$$
In formula (13), $y_k$ is 1 for exactly one class k and 0 for the rest. Let the true class be c. Then, for a single sample, the cross-entropy term and its derivative with respect to the predicted probability are:
$$E=-\log\hat{y}_c$$
$$\frac{\partial E}{\partial\hat{y}_c}=-\frac{1}{\hat{y}_c}$$
Combining this with the Softmax activation function of formula (12) gives the residual of the last layer $\delta^{(L)}$:
$$\delta^{(L)}=\hat{y}-y$$
The residuals of the other layers $\delta^{(L-1)},\ldots,\delta^{(1)}$ are calculated with the recurrence formula (15).
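The last-layer residual for a Softmax output with cross-entropy loss takes the well-known form "prediction minus one-hot label". The sketch below checks that closed form against a finite-difference gradient of the single-sample loss; the variable names and the test vector are illustrative, and the regularization term is left out because it does not depend on the pre-activations.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sample_loss(z, y):
    """Single-sample cross entropy as a function of the output pre-activations z."""
    return -np.sum(y * np.log(softmax(z)))

def output_residual(z, y):
    """Closed-form residual dE/dz of the last layer: softmax(z) - y."""
    return softmax(z) - y

z_out = np.array([1.0, -0.5, 2.0, 0.3])   # illustrative pre-activations, 4 classes
y_onehot = np.array([0.0, 0.0, 1.0, 0.0])
step = 1e-6
basis = np.eye(4)
# Central finite differences, one output node at a time.
numeric = np.array([
    (sample_loss(z_out + step * basis[j], y_onehot)
     - sample_loss(z_out - step * basis[j], y_onehot)) / (2 * step)
    for j in range(4)
])
analytic = output_residual(z_out, y_onehot)
```

The agreement of `numeric` and `analytic` is what allows back-propagation to start from this simple expression before applying the layer-by-layer recurrence.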
The bearing fault feature extraction model is built with Keras, a Python-based deep learning framework, using the TensorFlow back end. The SGD optimizer, cross-entropy loss function and normalization method of Keras are used to train the parameters.
Further, in step four, the LGBM fault diagnosis includes gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB), as follows:
(1) Gradient-based one-side sampling (GOSS). Instances with large gradients are all kept for training, while instances with small gradients are randomly sampled, and the effect on the data distribution is compensated by a constant multiplier when the information gain is computed. The GOSS algorithm is as follows:
Input: training data I with n instances $x_1,\ldots,x_n$; number of iterations d; sampling rates a and b for large-gradient and small-gradient data; loss function loss; weak learner L.
Output: a trained strong learner.
Step 1: initialization: let topN = a × len(I) be the number of large-gradient samples; add L to the model list models, and set the weight w of every training instance to 1;
Step 2: predict the training data with the model list, compute the loss g of each instance with the loss function loss, and sort the training data in descending order of g;
Step 3: take the top topN sorted instances as the large-gradient subset A, randomly sample b × |A^C| instances from the remaining data A^C as the small-gradient subset B, and denote the union of the two subsets usedSet;
Step 4: multiply the weight w of the small-gradient samples by the coefficient (1 − a)/b;
Step 5: input the data of usedSet, the negative gradient −g and the weights w into the learner L for training to obtain a new model newModel, and add newModel to the model list models. Instances are split according to the estimated variance gain $\tilde{V}_j(d)$ over the subsets A and B, where d here denotes the split point of feature j:
$$\tilde{V}_j(d)=\frac{1}{n}\left(\frac{\left(\sum_{x_i\in A_l}g_i+\frac{1-a}{b}\sum_{x_i\in B_l}g_i\right)^{2}}{n_l^{j}(d)}+\frac{\left(\sum_{x_i\in A_r}g_i+\frac{1-a}{b}\sum_{x_i\in B_r}g_i\right)^{2}}{n_r^{j}(d)}\right)$$
where $A_l=\{x_i\in A:x_{ij}\le d\}$, $A_r=\{x_i\in A:x_{ij}>d\}$, $B_l=\{x_i\in B:x_{ij}\le d\}$, $B_r=\{x_i\in B:x_{ij}>d\}$, $n_l^{j}(d)$ and $n_r^{j}(d)$ are the numbers of instances on the left and right of the split, and the coefficient (1 − a)/b normalizes the sum of gradients over B to the size of A^C;
Step 6: repeat steps 2 to 5 until the number of iterations d is reached or the model converges.
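The sampling part of GOSS (steps 3 and 4) can be sketched in NumPy as below. Two points are assumptions that should be read with care: instances are ranked by the absolute value of the gradient, and the small-gradient subset is drawn with b × len(I) instances, as in the published LightGBM algorithm, because that is the size for which the multiplier (1 − a)/b rescales B to the size of the remaining data. The function name `goss_sample` is illustrative.

```python
import numpy as np

def goss_sample(gradients, a, b, rng):
    """Return indices of the used set and their instance weights (sketch).

    a: fraction kept as large-gradient instances; b: fraction sampled from the rest.
    Assumes ranking by |g| and a small-gradient subset of size b * len(I).
    """
    g = np.asarray(gradients, dtype=float)
    n = g.size
    top_n = int(a * n)
    order = np.argsort(-np.abs(g))          # descending absolute gradient
    large = order[:top_n]                   # subset A: always kept
    rest = order[top_n:]                    # remaining data A^C
    rand_n = min(int(b * n), rest.size)
    small = rng.choice(rest, size=rand_n, replace=False)   # subset B
    weights = np.ones(n)
    weights[small] *= (1.0 - a) / b         # compensate for the under-sampling
    used = np.concatenate([large, small])
    return used, weights[used]

rng = np.random.default_rng(0)
grad_demo = rng.normal(size=100)
used_idx, used_w = goss_sample(grad_demo, a=0.2, b=0.1, rng=rng)
```

With a = 0.2 and b = 0.1, only 30 of the 100 instances are used, yet their weights still sum to 100, so the weighted data keeps the scale of the full data set.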
(2) The exclusive feature bundling (EFB) algorithm comprises two steps: bundle generation and exclusive feature merging.
The bundle generation algorithm determines which mutually exclusive features can be combined; features that can be combined are grouped into a bundle, and exclusive feature merging then combines each bundle into a single feature. Bundle generation uses Greedy Bundling: the features are taken as vertices, and an edge is added between every two features that are not mutually exclusive, which reduces the optimal bundling problem to a graph coloring problem that is then solved with a greedy algorithm. Exclusive feature merging builds a feature bundle by letting the exclusive features reside in different bins, which can be implemented simply by adding an offset to the original feature values.
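The two EFB steps can be sketched on a tiny matrix of non-negative integer features. This is a simplified illustration under stated assumptions: features are ordered by their number of non-zero entries instead of by graph degree, conflicts are counted as rows where both features are non-zero, and the helper names `greedy_bundles` and `merge_bundle` are hypothetical.

```python
import numpy as np

def greedy_bundles(X, max_conflicts=0):
    """Greedily group columns that are (almost) never non-zero in the same row."""
    nonzero = X != 0
    order = np.argsort(-nonzero.sum(axis=0), kind="stable")  # densest features first
    bundles, used_rows = [], []
    for j in order:
        for k, rows in enumerate(used_rows):
            if np.sum(rows & nonzero[:, j]) <= max_conflicts:
                bundles[k].append(int(j))
                used_rows[k] = rows | nonzero[:, j]
                break
        else:
            # No existing bundle can take this feature: open a new one.
            bundles.append([int(j)])
            used_rows.append(nonzero[:, j].copy())
    return bundles

def merge_bundle(X, bundle):
    """Merge exclusive features into one column by offsetting their value ranges."""
    merged = np.zeros(X.shape[0], dtype=np.int64)
    offset = 0
    for j in bundle:
        col = X[:, j]
        mask = col != 0
        merged[mask] = col[mask] + offset
        offset += int(col.max())             # next feature starts after this range
    return merged

X_feat = np.array([[1, 0, 0, 1],
                   [0, 2, 0, 1],
                   [0, 0, 3, 0],
                   [2, 0, 0, 0]])
bundles = greedy_bundles(X_feat)
merged = merge_bundle(X_feat, bundles[0])
```

Features 0, 1 and 2 never take non-zero values in the same row and end up in one bundle, while feature 3 conflicts with features 0 and 1 and stays separate; after merging, each original value can still be identified from the range it falls in.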
The output of the penultimate layer of AlexNet is classified using the LGBMClassifier class of the Python LightGBM package.
(3) Bayesian hyper-parameter optimization
Parameter optimization of the LGBM training process is performed with HyperOpt. HyperOpt provides an easy-to-use Bayesian hyper-parameter optimization algorithm and performs hyper-parameter optimization by sequential model-based optimization, which is a Bayesian optimization technique.
Bayesian optimization is a model-based optimization algorithm for a user-defined objective function: it searches for the maximum of an unknown objective function from which samples can be obtained. As in all model-based optimization algorithms, a regression method is used to build a model of the objective function, the next point to sample is selected according to that model, and the model is then updated.
The basic algorithm of Bayesian optimization is as follows:
Step 1: place a Gaussian process prior on the objective function f;
Step 2: observe f at $n_0$ points chosen by an initial space-filling experimental design, and set n = $n_0$;
Step 3: while n ≤ N, loop: update the posterior probability distribution over f using all available data; let $x_n$ be a maximizer of the acquisition function over x, where the acquisition function is computed from the current posterior distribution; observe $y_n=f(x_n)$; increase n by 1;
Step 4: return a solution: either the evaluated point with the largest f(x), or the point with the largest posterior mean.
The objective function f is usually unknown. A Gaussian process defines a probability distribution over functions by assigning to each point x a Gaussian distribution for f(x), determined by the mean μ and the standard deviation σ:
$$f(x)\sim\mathcal{N}\left(\mu(x),\sigma^{2}(x)\right)$$
where $\mathcal{N}$ denotes the normal distribution.
To estimate μ(x) and σ(x), a Gaussian process is fitted to the data. Each observation f(χ) is assumed to be a sample from a normal distribution; for a data set of several observations $f(\chi_1),f(\chi_2),\ldots,f(\chi_t)$, the vector $[f(\chi_1),f(\chi_2),\ldots,f(\chi_t)]$ is a sample from a multivariate normal distribution defined by a mean vector and a covariance matrix. A Gaussian process is thus an n-variate normal distribution, where n is the number of observations. The covariance matrix is defined by a kernel function $k(\chi_1,\chi_2)$, under which samples far apart are nearly uncorrelated while nearby samples are highly correlated. This encodes the prior assumption that the function tends to be smooth: two observations at similar values $\chi_1$ and $\chi_2$ are likely to be correlated.
Given a set of observations $P_{1:t}=f(\chi_{1:t})$ and sampling noise $\sigma_{noise}^{2}$, the Gaussian process posterior at a point x is calculated as follows:
$$f(x)\mid P_{1:t}\sim\mathcal{N}\left(\mu_t(x),\sigma_t^{2}(x)\right)$$
where
$$\mu_t(x)=k^{T}\left[K+\sigma_{noise}^{2}I\right]^{-1}P_{1:t}$$
$$\sigma_t^{2}(x)=k(x,x)-k^{T}\left[K+\sigma_{noise}^{2}I\right]^{-1}k$$
$$K=\left[k(\chi_i,\chi_j)\right]_{i,j=1}^{t}$$
$$k=\left[k(x,\chi_1)\;\;k(x,\chi_2)\;\;\cdots\;\;k(x,\chi_t)\right]$$
Using this Gaussian process model, Bayesian optimization searches for the maximum f(x) of the unknown objective function. The next χ to test is chosen by maximizing the acquisition function, which balances exploration (improving the model in less explored parts of the search space) against exploitation (favoring the parts the model predicts to be promising). After each observation, the algorithm updates the Gaussian process to account for the new data. The Gaussian process is initialized with a constant mean, since all points of the search space are assumed to be equally good a priori; the model is then refined step by step with each observation.
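The loop of fitting a Gaussian process, maximizing an acquisition function and observing a new point can be sketched in plain NumPy. Everything specific here is an assumption made for illustration: the squared-exponential kernel with fixed length scale, the zero prior mean, the upper-confidence-bound acquisition function, the one-dimensional search grid and the toy objective. It does not reproduce HyperOpt's internals or the LGBM hyper-parameter search itself.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=0.2, variance=1.0):
    """Squared-exponential kernel between two sets of 1-D points."""
    d2 = (A[:, None] - B[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / length_scale ** 2)

def gp_posterior(x_obs, y_obs, x_new, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(x_obs.size)
    k = rbf_kernel(x_obs, x_new)                      # shape (t, m)
    mean = k.T @ np.linalg.solve(K, y_obs)
    var = rbf_kernel(x_new, x_new).diagonal() - np.sum(k * np.linalg.solve(K, k), axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

def bayes_opt(f, n_init=3, n_total=20, kappa=2.0, seed=0):
    """Maximise f on [0, 1] with a GP model and a UCB acquisition function."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 201)                 # candidate points
    x_obs = rng.uniform(0.0, 1.0, size=n_init)        # initial design
    y_obs = np.array([f(x) for x in x_obs])
    while x_obs.size < n_total:
        mean, std = gp_posterior(x_obs, y_obs, grid)  # update the posterior
        ucb = mean + kappa * std                      # exploration vs exploitation
        x_next = grid[np.argmax(ucb)]                 # maximise the acquisition
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, f(x_next))           # observe the objective
    best = int(np.argmax(y_obs))
    return x_obs[best], y_obs[best]

# Toy objective with its maximum at x = 0.3.
x_best, y_best = bayes_opt(lambda x: -(x - 0.3) ** 2)
```

The parameter kappa sets the balance: larger values favor points with high posterior uncertainty (exploration), smaller values favor points with a high posterior mean (exploitation).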
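The Gaussian process is completely specified by its mean function μ(x) and its kernel function $k(\chi_1,\chi_2)$.
The goal is to learn the kernel hyper-parameters θ, namely the characteristic length scale $l^{2}$ and the total variance $\sigma_f^{2}$, by maximizing the probability of the data given the kernel function. The marginal likelihood is calculated as follows:
$$\log p\left(P_{1:t}\mid\chi_{1:t},\theta\right)=-\frac{1}{2}\left(P_{1:t}-\mu_0\right)^{T}\left(K+\sigma_{noise}^{2}I\right)^{-1}\left(P_{1:t}-\mu_0\right)-\frac{1}{2}\log\left|K+\sigma_{noise}^{2}I\right|-\frac{t}{2}\log 2\pi$$
where $\mu_0$ is the mean function, K is the kernel matrix of the observed points and $\sigma_{noise}^{2}$ is the sampling noise variance.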
Another object of the present invention is to provide a bearing fault diagnosis system applying the bearing fault diagnosis method, the bearing fault diagnosis system including:
a signal sampling module, used for taking every sample_length consecutive data points of the original vibration data as one sample, and sampling continuously in an overlapping manner at the sampling interval sample_interval;
a wavelet transform signal processing module, used for Morlet continuous wavelet transform signal processing: performing the continuous wavelet transform on each sample to generate a corresponding time-frequency image, resizing it to a color image of size N × N, and generating enough images to divide into a training set and a test set; for the training process, the AlexNet feature extraction module is executed; for the test process, control passes to the test module;
an AlexNet feature extraction module, used for inputting the N × N time-frequency images of the training set into the improved AlexNet model for training, and saving the model;
an LGBM fault diagnosis module, used for inputting the N × N time-frequency images of the training set into the trained AlexNet model, taking the output of the penultimate fully connected layer, and inputting it into the LGBM model for training; the data dimension is sample_Num × 1000, where sample_Num is the number of samples and 1000 is the number of neurons in the second fully connected layer of the AlexNet model;
and a test module, used for inputting the N × N time-frequency images of the test set into the trained AlexNet model, taking the output of the second fully connected layer of the AlexNet model as the features extracted by AlexNet, and inputting them into the trained LGBM model, whose output is the fault diagnosis result.
It is a further object of the invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of:
(1) signal sampling: taking every sample_length consecutive data points of the original vibration data as one sample, and sampling successively at the sampling interval sample_interval in an overlapping manner;
(2) Morlet continuous wavelet transform signal processing: performing a continuous wavelet transform on each sample to generate a corresponding time-frequency image, resizing the time-frequency image to an N × N color image, and generating enough images to be divided into a training set and a test set; for the training process, executing step (3); for the test process, jumping to step (5);
(3) AlexNet feature extraction: inputting the N × N time-frequency images of the training set into the improved AlexNet model for training, and saving the model;
(4) LGBM fault diagnosis: inputting the N × N time-frequency images of the training set into the trained AlexNet model, taking the output of the penultimate layer (the second fully connected layer), and inputting that output into the LGBM model for training, the data dimension being sample_Num × 1000, wherein sample_Num is the number of samples and 1000 is the number of neurons in the second fully connected layer of the AlexNet model;
(5) testing: inputting the N × N time-frequency images of the test set into the trained AlexNet model, taking the output of the second fully connected layer of the AlexNet model as the features extracted by AlexNet, and inputting them into the trained LGBM model, the output of the LGBM model being the fault diagnosis result.
Another object of the present invention is to provide an information data processing terminal for implementing the bearing fault diagnosis system.
In combination with all the above technical solutions, the advantages and positive effects of the invention are as follows. To address the problem that the classification capability of the Softmax layer of a CNN is inferior to that of newer machine learning classification methods, the invention provides a bearing fault diagnosis method based on continuous wavelet transform and an AlexNet and Light Gradient Boosting Machine fusion model (AlexNet-Light Gradient Boosting Machine, AlexNet-LGBM). The method can be divided into three parts. First, vibration signal data processing based on continuous wavelet transform: time-frequency features are extracted from the original bearing vibration signal using the continuous wavelet transform and converted into a two-dimensional image of 32 × 32 pixels. Second, for fault feature extraction, the AlexNet model is improved to extract features from the time-frequency images. Third, for fault diagnosis, the extracted fault features are classified by the LGBM classification algorithm, and the optimal model parameters are selected using Bayesian optimization. The invention also performs comparison experiments on the bearing data set of Case Western Reserve University (CWRU), comparing various combinations of the improved AlexNet and LeNet-5 with the multi-grained cascade forest (gcForest), LGBM and CatBoost; the results show that the proposed AlexNet-LGBM fault diagnosis method based on continuous wavelet transform has the best fault diagnosis accuracy.
The bearing fault diagnosis method provided by the invention also has the following advantages:
(1) For equipment fault feature extraction, a continuous wavelet transform (CWT) is first applied to the vibration data to convert it into a time-frequency image. To adapt it to bearing fault feature extraction, the AlexNet model is improved in three ways: first, the input dimension is changed to 32 × 32 × 3 to reduce the storage space occupied by the time-frequency images; second, the convolutional layers use the Parametric Rectified Linear Unit (PReLU) as the activation function to overcome the limitations of the Rectified Linear Unit (ReLU); third, the fully connected layer and the output layer are resized to suit the number of fault classes. The improved AlexNet, LeNet-5 and an EfficientNet-B0 transfer learning model are each used for feature extraction, and the feature extraction capabilities of the three neural network structures are compared.
(2) For equipment fault diagnosis, a fault diagnosis method based on continuous wavelet transform and an AlexNet and Light Gradient Boosting Machine fusion model (AlexNet-Light Gradient Boosting Machine, AlexNet-LGBM) is proposed: fault features are first extracted from the vibration signal using the continuous wavelet transform and the improved AlexNet, the extracted features are then classified by the Light Gradient Boosting Machine classification algorithm, and the model parameters are tuned using Bayesian optimization. Various combinations of improved AlexNet or LeNet-5 feature extraction with the multi-grained cascade forest (gcForest), LGBM and CatBoost classification algorithms are also compared.
To solve the problems of fault feature extraction and fault diagnosis for rolling bearings, the invention provides a bearing fault diagnosis method based on continuous wavelet transform and AlexNet-LGBM. Experimental comparison shows that the method achieves the highest accuracy, 99.712%, compared with 7 other methods; predicting 1800 samples takes 1.47 seconds, the same order of magnitude as the other models; and the variance of the prediction accuracy over five runs is only 0.063, which makes the method more stable than 6 other methods. The method therefore has the best overall performance.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required by the embodiments are briefly described below. The drawings described below are obviously only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a bearing fault diagnosis method provided in an embodiment of the present invention.
Fig. 2 is a block diagram of a bearing fault diagnosis system provided by an embodiment of the present invention.
In the figure: 1. signal sampling module; 2. wavelet transform signal processing module; 3. AlexNet feature extraction module; 4. LGBM fault diagnosis module; 5. test module.
Fig. 3 is a flow chart of bearing vibration signal processing according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of the effect of the Morlet continuous wavelet transform provided by the embodiment of the present invention.
Fig. 5 is a schematic structural diagram of an improved AlexNet according to an embodiment of the present invention.
Fig. 6 is a flowchart of bearing fault diagnosis provided by an embodiment of the present invention.
Fig. 7 is a schematic diagram of processing results of continuous wavelet transform according to an embodiment of the present invention.
Fig. 8 is a schematic diagram of the accuracy curves of the improved AlexNet, LeNet-5 and EfficientNet provided by an embodiment of the present invention.
Fig. 9 is a schematic diagram of the loss curves of the improved AlexNet, LeNet-5 and EfficientNet provided by an embodiment of the present invention.
Fig. 10 is a schematic diagram of the t-SNE visualization of the extracted features provided by an embodiment of the present invention.
Fig. 11 is a schematic diagram of the test-set accuracy of the six combined models in each of 5 experiments, provided by an embodiment of the present invention.
Fig. 12 is a schematic diagram of the average test-set accuracy of the six combined models over 5 experiments, provided by an embodiment of the present invention.
Fig. 13 is a schematic diagram of the average test-set prediction time of the six combined models over 5 experiments, provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In view of the problems in the prior art, the present invention provides a method, a system, a device and a terminal for diagnosing a bearing fault, which are described in detail below with reference to the accompanying drawings.
As shown in fig. 1, a bearing fault diagnosis method provided by an embodiment of the present invention includes the following steps:
S101, signal sampling: taking every sample_length consecutive data points of the original vibration data as one sample, and sampling successively at the sampling interval sample_interval in an overlapping manner;
S102, Morlet continuous wavelet transform signal processing: performing a continuous wavelet transform on each sample to generate a corresponding time-frequency image, resizing the time-frequency image to an N × N color image, and generating enough images to be divided into a training set and a test set; for the training process, executing S103; for the test process, jumping to S105;
S103, AlexNet feature extraction: inputting the N × N time-frequency images of the training set into the improved AlexNet model for training, and saving the model;
S104, LGBM fault diagnosis: inputting the N × N time-frequency images of the training set into the trained AlexNet model, taking the output of the penultimate layer (the second fully connected layer), and inputting that output into the LGBM model for training, the data dimension being sample_Num × 1000, wherein sample_Num is the number of samples and 1000 is the number of neurons in the second fully connected layer of the AlexNet model;
S105, testing: inputting the N × N time-frequency images of the test set into the trained AlexNet model, taking the output of the second fully connected layer of the AlexNet model as the features extracted by AlexNet, and inputting them into the trained LGBM model, the output of the LGBM model being the fault diagnosis result.
As shown in fig. 2, a bearing fault diagnosis system provided by an embodiment of the present invention includes:
the signal sampling module 1 is used for taking every sample_length consecutive data points of the original vibration data as one sample, and sampling successively at the sampling interval sample_interval in an overlapping manner;
the wavelet transform signal processing module 2 is used for Morlet continuous wavelet transform signal processing: performing a continuous wavelet transform on each sample to generate a corresponding time-frequency image, resizing the time-frequency image to an N × N color image, and generating enough images to be divided into a training set and a test set; for the training process, the AlexNet feature extraction module is executed; for the test process, execution jumps to the test module;
the AlexNet feature extraction module 3 is used for inputting the N × N time-frequency images of the training set into the improved AlexNet model for training, and saving the model;
the LGBM fault diagnosis module 4 is used for inputting the N × N time-frequency images of the training set into the trained AlexNet model, taking the output of the penultimate layer (the second fully connected layer), and inputting that output into the LGBM model for training, the data dimension being sample_Num × 1000, wherein sample_Num is the number of samples and 1000 is the number of neurons in the second fully connected layer of the AlexNet model;
and the test module 5 is used for inputting the N × N time-frequency images of the test set into the trained AlexNet model, taking the output of the second fully connected layer of the AlexNet model as the features extracted by AlexNet, and inputting them into the trained LGBM model, the output of the LGBM model being the fault diagnosis result.
The technical solution of the present invention is further described below with reference to specific examples.
To address the problems in the prior art, the invention provides a bearing fault diagnosis method based on continuous wavelet transform and an AlexNet and Light Gradient Boosting Machine (AlexNet-LGBM) fusion model. First, time-frequency features are extracted from the original bearing vibration signal using the continuous wavelet transform and converted into a two-dimensional image of 32 × 32 pixels; second, the AlexNet model is improved to extract fault features from the time-frequency images; finally, for fault classification, the LGBM classification algorithm is used, with its optimal model parameters selected by Bayesian optimization.
1. Signal processing
1.1 Vibration signal processing flow
First, sample_length consecutive data points (sample_length is set to 1024 in the experiments of the present invention) are selected from the original vibration signal as an original sample. These sample_length consecutive points are then converted into a corresponding time-frequency image by continuous wavelet transform, and the image is resized to an appropriate N × N size (32 × 32 in the experiments of the present invention). Next, the sample_length consecutive data points starting one sampling interval sample_interval (set to 384 in the experiments of the present invention) later are selected as another sample in an overlapping manner, yielding another N × N image, as shown in Fig. 3. This process is repeated to generate sufficient training and test images.
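As an illustrative sketch only (not part of the disclosed implementation), the overlapping sampling described above might be written in Python as follows. The function name overlapping_samples and the synthetic input are assumptions made for illustration, as is the reading that consecutive samples start sample_interval points apart.

```python
import numpy as np

def overlapping_samples(signal, sample_length=1024, sample_interval=384):
    """Cut a 1-D signal into overlapping windows.

    Each window holds `sample_length` consecutive points; consecutive
    windows start `sample_interval` points apart (assumed interpretation).
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size < sample_length:
        return np.empty((0, sample_length))
    n_samples = (signal.size - sample_length) // sample_interval + 1
    starts = np.arange(n_samples) * sample_interval
    # Broadcasting builds an (n_samples, sample_length) grid of indices.
    return signal[starts[:, None] + np.arange(sample_length)[None, :]]

# Synthetic stand-in for a vibration record (not real bearing data).
demo_signal = np.arange(4096, dtype=float)
demo_samples = overlapping_samples(demo_signal)
```

With sample_length = 1024 and sample_interval = 384, neighbouring samples share 640 data points, which is how a short record can yield many training images.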
1.2 Morlet continuous wavelet transform signal processing
The continuous wavelet transform of a signal x(t) with a wavelet function ψ(t) is given by:
Figure BDA0003234219220000131
Among the different wavelets, a complex (analytic) wavelet has a Fourier transform that is zero for negative frequencies. With such a complex wavelet, the phase and amplitude components of the signal can be separated. Morlet is the most commonly used complex wavelet, and continuous wavelet analysis with the complex Morlet wavelet has the advantage of separating information in the wavelet domain and simplifying the relationship between the transform ridges and the instantaneous frequency. The invention uses the Morlet wavelet to process the bearing vibration signal. The Morlet wavelet is defined as:
$$\psi(t) = \pi^{-1/4}\left(\exp(i 2\pi f_0 t) - \exp\!\left(-(2\pi f_0)^2/2\right)\right)\exp(-t^2/2) \quad (2)$$
where $f_0$ is the center frequency of the mother wavelet. The second term in parentheses is called the correction term because it corrects for the nonzero mean of the complex sinusoid multiplied by the Gaussian term (the first term in parentheses). In practice, the correction term is negligible for $f_0 \gg 0$, in which case the Morlet wavelet can be expressed as follows:
$$\psi(t) = \pi^{-1/4}\exp(i 2\pi f_0 t)\exp(-t^2/2) \quad (3)$$
This wavelet is simply a complex sinusoid $\exp(i 2\pi f_0 t)$ within a Gaussian envelope $\exp(-t^2/2)$. The $\pi^{-1/4}$ term is a normalization factor that ensures the wavelet has unit energy. Strictly speaking, the function given by formula (3) is not a true wavelet because it has a nonzero mean, i.e., the zero-frequency term of its energy spectrum is nonzero, so it does not satisfy the admissibility condition. In practice, however, when $f_0 \gg 0$ it can be used with minimal error.
The Fourier transform of the Morlet wavelet is as follows:
Figure BDA0003234219220000133
It has the form of a Gaussian function shifted by $f_0$ along the frequency axis. The center frequency of this Gaussian spectrum is typically taken as the characteristic frequency of the Morlet wavelet. The characteristic frequency is set for the mother wavelet and varies with the wavelet scale a as follows:
Figure BDA0003234219220000134
The energy spectrum (the squared magnitude of the Fourier transform) is calculated as follows:
Figure BDA0003234219220000135
Integrating the energy spectrum shows that the energy of the Morlet wavelet of formula (3) is equal to 1.
The result of applying the Morlet continuous wavelet transform to a bearing vibration signal sample, taking into account the sampling frequency of the signal (12 kHz in the present invention), is shown in Fig. 4.
Fig. 4(a) records the acceleration measured continuously over 86 milliseconds by the acceleration sensor on a bearing rotating at high speed; the abscissa is time and the ordinate is the acceleration at the monitoring point during rotation. Ideally, for perfectly uniform rotation, the bearing acceleration would be 0. Fig. 4(a) shows that the actual acceleration fluctuates around a mean of 0. At about 27 ms and 61 ms the acceleration is larger, the corresponding energy is larger, and the corresponding time positions in the right-hand graph show higher color brightness.
Through continuous wavelet transformation, the one-dimensional vibration signal can be converted into a picture, and the picture contains the corresponding relation between time and frequency.
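The following NumPy sketch illustrates how a time-frequency image can be obtained from one sample with the simplified Morlet wavelet described above. It is a hedged illustration, not the implementation used in the experiments: the value f0 = 0.849, the analysis frequency grid, the truncation of the wavelet at four scaled standard deviations, the grayscale block-averaging used for resizing, and the synthetic 1 kHz test tone are all assumptions.

```python
import numpy as np

def morlet(t, f0):
    """Simplified complex Morlet wavelet (no correction term)."""
    return np.pi ** -0.25 * np.exp(2j * np.pi * f0 * t) * np.exp(-t ** 2 / 2)

def morlet_scalogram(x, fs, freqs, f0=0.849):
    """Return |CWT| of x, one row per analysis frequency in `freqs` (Hz)."""
    x = np.asarray(x, dtype=float)
    dt = 1.0 / fs
    out = np.empty((len(freqs), x.size))
    for row, f in enumerate(freqs):
        a = f0 / f                                   # scale whose characteristic frequency is f
        half = min(int(np.ceil(4 * a / dt)), (x.size - 1) // 2)
        t = np.arange(-half, half + 1) * dt
        psi = morlet(t / a, f0) / np.sqrt(a)         # scaled, energy-normalized wavelet
        # Reversing the kernel turns np.convolve into a correlation of x
        # with the complex conjugate of the wavelet.
        coef = np.convolve(x, np.conj(psi)[::-1], mode="same") * dt
        out[row] = np.abs(coef)
    return out

fs = 12_000                                  # sampling frequency stated in the text
time = np.arange(1024) / fs                  # one 1024-point sample
signal = np.sin(2 * np.pi * 1000.0 * time)   # synthetic 1 kHz tone, not bearing data
freqs = np.linspace(200.0, 3300.0, 32)       # 32 analysis frequencies (assumed)
scalogram = morlet_scalogram(signal, fs, freqs)
# Block-average the 1024 time steps down to 32 columns: a 32 x 32 grayscale image.
image = scalogram.reshape(32, 32, 32).mean(axis=2)
```

A color image as described in the text would additionally require mapping the magnitudes through a color map, which is omitted here.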
1.3 AlexNet feature extraction
The AlexNet model proposed by Krizhevsky et al. achieves better performance in image recognition than other methods, and it still plays an important role in many fields. To adapt it to bearing fault feature extraction, AlexNet is improved as follows:
(1) Model input dimension improvement. The 224 × 224 input image size of the classical AlexNet is unnecessarily large for bearing fault diagnosis based on vibration signals. If the vibration signal acquisition frequency of the bearing is high, the images generated by wavelet transformation of all samples occupy a large amount of storage space. Therefore, the present invention takes a 32 × 32 color image as input.
(2) Convolutional layer activation function improvement. The ReLU function, f(z) = max(0, z), has limitations. Its gradient during iterative updating is:
$$f'(z) = \begin{cases} 1, & z > 0 \\ 0, & z \le 0 \end{cases} \quad (7)$$
Because the ReLU activation function sets the gradient to 0 for negative inputs, a neuron in that state cannot take part in subsequent propagation or be activated again, so its parameters cannot be updated. If the learning rate is set too high in actual training, some neurons may become inactive in this way and their parameters can no longer be updated effectively, causing training to fail. For this reason, the invention uses PReLU, a variant of ReLU, which has the form:
$$f(z) = \begin{cases} z, & z > 0 \\ a z, & z \le 0 \end{cases} \quad (8)$$
Unlike ReLU, PReLU is a linear function with slope a (a small constant) when z < 0. Its gradient is calculated as:
$$f'(z) = \begin{cases} 1, & z > 0 \\ a, & z \le 0 \end{cases} \quad (9)$$
PReLU greatly reduces the loss of gradient information for negative inputs while retaining one-sided suppression. The value of a is continuously updated through backpropagation and is optimized iteratively together with the weights and biases of the network.
(3) Fully connected layer and output layer improvement. The bearing fault diagnosis studied by the present invention involves 1 normal type and 3 fault types (described in detail in the fourth section), four categories in total, so the output layer size of the improved AlexNet structure is set to 4. Because the output layer is smaller, the size of the second fully connected layer is set to 1000 to better extract the key features. The improved AlexNet structure proposed by the present invention is shown in Fig. 5.
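The PReLU activation of improvement (2) can be sketched in NumPy as below. This is an illustration under stated assumptions: the function names prelu and prelu_grads, the slope value 0.25 and the treatment of z = 0 as belonging to the negative branch are choices made here, not details given in the text.

```python
import numpy as np

def prelu(z, a):
    """PReLU forward pass: identity for z > 0, slope `a` otherwise."""
    return np.where(z > 0, z, a * z)

def prelu_grads(z, a):
    """Element-wise gradients of PReLU with respect to z and to the slope a."""
    dz = np.where(z > 0, 1.0, a)     # gradient never collapses to 0 for a != 0
    da = np.where(z > 0, 0.0, z)     # lets backpropagation learn the slope
    return dz, da

z = np.array([-2.0, -0.5, 0.0, 1.5])
out = prelu(z, a=0.25)
dz, da = prelu_grads(z, a=0.25)
```

Because dz equals a rather than 0 for negative inputs, a neuron that receives negative pre-activations still passes gradient back, which is the property the text relies on.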
1.3.1 convolutional layers
The convolutional layer is connected to the previous layer through local connections and weight sharing, which greatly reduces the number of parameters. The convolution operation is as follows:
$$h_j = f\!\left(\sum_i X_i * W_{ij} + b_j\right) \quad (10)$$
where $h_j$ is the j-th output feature map of the current convolutional layer, and $X_i$ is the i-th output feature map of the previous layer (the input of the current convolutional layer). The symbol $*$ denotes the convolution operation, the parameter matrix $W_{ij}$ is the convolution kernel that maps the i-th input feature map to the j-th output feature map in the current layer, and $b_j$ is the bias corresponding to the j-th output feature map of the current layer. $f(\cdot)$ is a nonlinear activation function, which in the present invention is the PReLU function shown in formula (8).
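The convolution described above can be illustrated with a deliberately simple NumPy sketch. The function names, array layouts, unit stride, "valid" borders and slope 0.25 are assumptions for illustration; as in most deep learning frameworks, the kernel is applied as a cross-correlation (without flipping).

```python
import numpy as np

def conv2d_valid(x, w):
    """'Valid' 2-D cross-correlation of one feature map x with one kernel w."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for r in range(oh):
        for c in range(ow):
            out[r, c] = np.sum(x[r:r + kh, c:c + kw] * w)
    return out

def conv_layer(X, W, b, a=0.25):
    """X: (C_in, H, W) inputs; W: (C_in, C_out, kh, kw) kernels; b: (C_out,) biases."""
    c_in, c_out = W.shape[:2]
    maps = []
    for j in range(c_out):
        # Sum the contributions of every input map i, then add the bias b_j.
        s = sum(conv2d_valid(X[i], W[i, j]) for i in range(c_in)) + b[j]
        maps.append(np.where(s > 0, s, a * s))       # PReLU activation
    return np.stack(maps)

X = np.ones((3, 5, 5))
W = np.ones((3, 2, 3, 3))
b = np.array([0.0, -30.0])
H = conv_layer(X, W, b)
```

The same kernel W[i, j] is reused at every spatial position, which is the weight sharing that keeps the parameter count small.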
1.3.2 pooling layer
The grey blocks in Fig. 5 are pooling layers, which downsample after the convolution operation to further reduce the dimensionality of the extracted features. Common pooling layers include max pooling and average pooling. The invention selects max pooling, which extracts the maximum values from the convolution output layer $Y_{cn}$ as follows:
Figure BDA0003234219220000152
where $S_{M \times N}$ is the pooling scale matrix, and M and N are the dimensions of S. During pooling, S slides over $Y_{cn}$ with a fixed stride until the whole of $Y_{cn}$ has been scanned. In the present invention S is a 3 × 3 matrix, so $Y_{cn}$ is reduced to 1/9 of its size and assigned to $P_{cn}$ in the pooling output layer.
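A minimal NumPy sketch of this pooling step is given below. It assumes non-overlapping 3 × 3 windows (stride 3), which is the reading consistent with the stated reduction to 1/9; the function name max_pool and the trimming of rows and columns that do not fill a window are illustrative assumptions.

```python
import numpy as np

def max_pool(y, size=3):
    """Max pooling over non-overlapping size x size windows."""
    h, w = y.shape
    h2, w2 = h // size, w // size
    trimmed = y[:h2 * size, :w2 * size]          # drop incomplete border windows
    return trimmed.reshape(h2, size, w2, size).max(axis=(1, 3))

y = np.arange(36, dtype=float).reshape(6, 6)
p = max_pool(y)
```

For the 6 × 6 input above, the output has 4 elements, i.e. 1/9 of the 36 input values.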
1.3.3 full connection layer
After the last convolutional and pooling layers, the features are flattened into one dimension by the Flatten layer and then pass through two fully connected layers. Each neuron in a fully connected layer is connected to all neurons in the previous layer. Dropout with a rate of 0.5 is applied to the outputs of both fully connected layers, so that some units are randomly discarded by the network and not updated. The structure of the network therefore changes at each iteration, which is equivalent to ensemble learning over networks of many different structures; averaging over these networks effectively prevents overfitting.
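The dropout operation can be sketched as follows. This uses the "inverted dropout" convention (surviving units are rescaled by 1 / (1 − rate) during training), which is how the Keras Dropout layer behaves; the function name and the fixed random seed are assumptions for illustration.

```python
import numpy as np

def dropout(x, rate=0.5, training=True, rng=None):
    """Inverted dropout: zero a random subset of units and rescale the rest."""
    if not training or rate == 0.0:
        return x                                  # inference: identity
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(x.shape) >= rate            # True for units that survive
    return x * keep / (1.0 - rate)

activations = np.ones(1000)
dropped = dropout(activations, rate=0.5, rng=np.random.default_rng(0))
```

Each training step draws a fresh mask, so a different sub-network is trained each time, which is the ensemble effect described above.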
The last layer is the output layer. For multi-class fault classification, a Softmax classifier is used, which handles multi-class problems effectively. Let an input picture in the training data set be denoted $x_k$ and its label $y_k$, which indicates the probability that $x_k$ belongs to class k, where $y \in \{1, 2, \ldots, J\}$ denotes the fault class. For each input x, Softmax estimates the probability $p(y = j \mid x)$ for each class $j \in \{1, 2, \ldots, J\}$. The Softmax activation function is expressed as follows:
Figure BDA0003234219220000161
where θ is the weight matrix of the Softmax layer and $\theta_i$ is the i-th row vector of θ.
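The Softmax layer can be sketched in NumPy as below, assuming the usual form in which class scores are the products of the rows of θ with the feature vector. The function name softmax_probs, the example matrices and the subtraction of the maximum score (a standard numerical safeguard that does not change the result) are illustrative assumptions.

```python
import numpy as np

def softmax_probs(theta, x):
    """theta: (J, D) weight matrix; x: (D,) feature vector; returns (J,) probabilities."""
    scores = theta @ x                 # one score per class: theta_i . x
    scores = scores - scores.max()     # stabilize the exponentials
    e = np.exp(scores)
    return e / e.sum()

uniform = softmax_probs(np.zeros((4, 3)), np.ones(3))
peaked = softmax_probs(5.0 * np.eye(4, 3), np.array([1.0, 0.0, 0.0]))
```

With four rows in θ, the output has four entries, matching the four bearing states (one normal, three faulty) of the improved network.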
1.3.4 parameter update
To suit the multi-class fault diagnosis task of the present invention, the loss function is set to the cross-entropy loss function, expressed as:
Figure BDA0003234219220000162
where
Figure BDA0003234219220000163
represents the predicted probability that the i-th sample belongs to class k, and
Figure BDA0003234219220000164
is the actual probability (if the true class of the i-th sample is k, then
Figure BDA0003234219220000165
otherwise it is 0); $W^{(l)}$ is the parameter matrix of the l-th layer. The first term in the formula measures the cross entropy between the prediction
Figure BDA0003234219220000166
and the true category
Figure BDA0003234219220000167
and the loss function reaches its minimum when the predicted value equals the true value. The second term is an L2 regularization term whose coefficient λ is the weight decay parameter; it balances the relative weight of the two terms and effectively prevents overfitting.
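A hedged NumPy sketch of such a loss is shown below. It uses one common form (mean cross entropy over m samples plus (λ/2) times the sum of squared weights); the exact constants of formula (13) are not recoverable from the text, so the 1/m and 1/2 factors, the function name and the small constant inside the logarithm are assumptions.

```python
import numpy as np

def cross_entropy_l2(probs, labels, weights, lam):
    """probs: (m, K) predicted probabilities; labels: (m,) true class indices;
    weights: list of layer weight arrays; lam: weight decay coefficient."""
    m = probs.shape[0]
    # Cross-entropy term: only the probability of the true class contributes.
    ce = -np.mean(np.log(probs[np.arange(m), labels] + 1e-12))
    # L2 regularization term over all layer weight matrices.
    l2 = 0.5 * lam * sum(np.sum(W ** 2) for W in weights)
    return ce + l2

probs = np.array([[0.5, 0.5], [0.25, 0.75]])
labels = np.array([0, 1])
loss_value = cross_entropy_l2(probs, labels, [np.ones((2, 2))], lam=0.1)
perfect = cross_entropy_l2(np.eye(2), labels, [np.zeros((2, 2))], lam=0.1)
```

When every prediction puts probability 1 on the true class and the weights are zero, the loss is (numerically) zero, consistent with the statement that the loss is smallest when prediction and truth agree.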
Model training uses stochastic gradient descent. In each iteration, the parameters W and the biases b are updated as follows:
Figure BDA0003234219220000168
Figure BDA0003234219220000169
where α is the learning rate, which controls the size of the update in each iteration. The residual of the loss function at the j-th node of the l-th layer is denoted
Figure BDA00032342192200001610
Its recurrence formula can be expressed as:
Figure BDA00032342192200001611
The gradient of the loss function with respect to the parameters can be written as:
Figure BDA00032342192200001612
Figure BDA00032342192200001613
For formula (13), $y_k$ is 1 for only one category k and 0 for all the others. Let the true category be
Figure BDA00032342192200001614
Then:
Figure BDA00032342192200001615
Figure BDA00032342192200001616
The residual of the last layer is then obtained from the Softmax activation function of formula (12):
Figure BDA00032342192200001617
Figure BDA0003234219220000171
The residuals of the other layers, $\delta^{(L-1)}, \ldots, \delta^{(1)}$, can be calculated according to the recursion formula (15).
For the experiments, the invention builds the bearing fault feature extraction model with the Python-based Keras deep learning framework, using TensorFlow as the backend. Keras's SGD optimizer, cross-entropy loss function and normalization method are used to train the parameters.
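For a Softmax output with a cross-entropy loss, the standard result is that the gradient of the loss with respect to the last-layer pre-activations equals the predicted probabilities minus the one-hot label (the patent's own residual may differ in sign convention). The sketch below checks this numerically and applies one gradient descent step; the names, the example vector and the learning rate 0.1 are assumptions for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def loss(z, k):
    """Cross-entropy loss of pre-activations z when the true class is k."""
    return -np.log(softmax(z)[k])

def numeric_grad(z, k, eps=1e-6):
    """Central-difference gradient of the loss with respect to z."""
    g = np.zeros_like(z)
    for j in range(z.size):
        d = np.zeros_like(z)
        d[j] = eps
        g[j] = (loss(z + d, k) - loss(z - d, k)) / (2 * eps)
    return g

z = np.array([1.0, -0.5, 2.0, 0.3])
k = 2
y = np.eye(4)[k]                      # one-hot label
analytic = softmax(z) - y             # closed-form last-layer residual
numeric = numeric_grad(z, k)

alpha = 0.1                           # learning rate (assumed value)
z_new = z - alpha * analytic          # one gradient descent step
```

The update z_new lowers the loss for the true class, which is the behaviour the parameter update rule aims for at every layer.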
1.4 LGBM Fault Classification
The LGBM classification algorithm mainly comprises Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB).
(1) Gradient-based one-side sampling algorithm. Instances with large gradients are all kept for training, whereas instances with small gradients are randomly sampled, and the effect of this sampling on the data distribution is compensated by a constant multiplier when the information gain is computed. The GOSS algorithm is as follows:
Input: training data I with n instances $x_1, \ldots, x_n$; number of iterations d; sampling rates a and b for large-gradient and small-gradient data; loss function loss; weak learner L.
Output: a trained strong learner.
Step 1: Initialization. Let topN = a × len(I) denote the number of large-gradient samples. Add L to the model list models. Set the weight w of every training instance to 1.
Step 2: Use the model list to predict the training data and compute the loss g of each instance with the loss function loss. Sort the training data in descending order of g.
Step 3: Take the top topN sorted training instances as the large-gradient subset A, and from the remaining set $A^C$ randomly draw $b \times |A^C|$ instances as the small-gradient subset B. The union of the large-gradient and small-gradient subsets is denoted usedSet.
Step 4: Multiply the weight w of the small-gradient samples by the coefficient (1 − a)/b.
Step 5: Input the data I, the negative gradients −g and the weights w corresponding to the training subset usedSet into the learner L for training to obtain a new model newModel.
The instances are split according to the estimated variance gain $V_j(d)$ over the subsets A and B:
Figure BDA0003234219220000172
where $A_l = \{x_i \in A : x_{ij} \le d\}$, $A_r = \{x_i \in A : x_{ij} > d\}$, $B_l = \{x_i \in B : x_{ij} \le d\}$, $B_r = \{x_i \in B : x_{ij} > d\}$, and the coefficient $(1-a)/b$ normalizes the sum of gradients over B to the size of $A^C$. newModel is then added to the model list models.
Step 6: Repeat steps 2 to 5 until the number of iterations d is reached or the model converges.
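The sampling part of GOSS (steps 3 and 4) can be sketched in NumPy as follows. Two assumptions are made and should be read as such: instances are ranked by the absolute value of the gradient, and the small-gradient subset has size b × n (the convention of the published LightGBM algorithm), for which the coefficient (1 − a)/b rescales B exactly to the size of the complement of A. The function name and the random test gradients are illustrative.

```python
import numpy as np

def goss_sample(grad, a=0.2, b=0.1, rng=None):
    """Return (used_idx, weights) for one round of gradient-based one-side sampling."""
    rng = np.random.default_rng() if rng is None else rng
    n = grad.size
    top_n = int(round(a * n))
    rand_n = int(round(b * n))                 # assumed: b x n small-gradient samples
    order = np.argsort(-np.abs(grad))          # descending |gradient|
    large = order[:top_n]                      # subset A: always kept
    rest = order[top_n:]                       # complement of A
    small = rng.choice(rest, size=rand_n, replace=False)   # subset B
    used = np.concatenate([large, small])
    w = np.ones(used.size)
    w[top_n:] *= (1 - a) / b                   # compensate for the undersampling
    return used, w

rng = np.random.default_rng(0)
grad = rng.normal(size=1000)
used, w = goss_sample(grad, a=0.2, b=0.1, rng=rng)
```

With a = 0.2 and b = 0.1, only 300 of the 1000 instances are used, yet their weights sum to 1000, so the reweighted subset stands in for the full data set when gains are estimated.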
(2) Exclusive feature bundling algorithm. It comprises two steps: bundle generation and exclusive feature merging.
The bundle generation algorithm determines which mutually exclusive features can be merged (features that can be merged are grouped together into a bundle), and exclusive feature merging then merges each bundle into a single feature. Which features can be bundled is determined by Greedy Bundling: the features are taken as vertices and an edge is added between every two features that are not mutually exclusive, which reduces the optimal bundling problem to a graph coloring problem that is then solved with a greedy algorithm. Exclusive feature merging builds a feature bundle by letting the exclusive features occupy different bins, which can be implemented simply by adding an offset to the original feature values.
The present invention uses the LGBMClassifier class of the Python LightGBM package to classify the output of the penultimate layer of AlexNet.
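The two steps above can be illustrated with a simplified NumPy sketch. It is not LightGBM's internal implementation: the ordering of features by non-zero count (a stand-in for the degree ordering of the conflict graph), the zero-conflict threshold, the restriction to non-negative integer features and all names are assumptions for illustration.

```python
import numpy as np

def greedy_bundles(X, max_conflicts=0):
    """Group feature columns that are (almost) never non-zero in the same row."""
    nonzero = X != 0
    order = np.argsort(-nonzero.sum(axis=0), kind="stable")
    bundles, masks = [], []
    for j in order:
        for k, mask in enumerate(masks):
            # A conflict is a row where both the bundle and feature j are non-zero.
            if np.sum(mask & nonzero[:, j]) <= max_conflicts:
                bundles[k].append(int(j))
                masks[k] = mask | nonzero[:, j]
                break
        else:
            bundles.append([int(j)])
            masks.append(nonzero[:, j].copy())
    return bundles

def merge_bundle(X, bundle):
    """Merge the features of one bundle by shifting each into its own value range."""
    merged = np.zeros(X.shape[0], dtype=int)
    offset = 0
    for j in bundle:
        col = X[:, j]
        merged = np.where(col != 0, col + offset, merged)
        offset += int(col.max())               # next feature starts above this range
    return merged

X = np.array([[1, 0, 0],
              [0, 2, 0],
              [0, 0, 1],
              [3, 0, 0],
              [0, 1, 1]])
bundles = greedy_bundles(X)
merged = merge_bundle(X, bundles[0])
```

In the example, features 0 and 1 are never non-zero together and are merged into one column whose values 1 to 3 come from feature 0 and 4 to 5 from feature 1, while feature 2 conflicts with feature 1 and stays in its own bundle.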
1.5 Bayesian hyper-parameter optimization
The invention uses HyperOpt to optimize the hyperparameters of the LGBM training process. HyperOpt provides an easy-to-use Bayesian hyperparameter optimization algorithm that works by sequential model-based optimization (SMBO), which is a Bayesian optimization technique.
Bayesian optimization is a model-based algorithm for optimizing an objective function (also called a cost function). It searches for the maximum of an unknown objective function from which samples can be obtained. As with all model-based optimization algorithms, a regression method is used to build a model of the objective function, the next point to evaluate is selected according to that model, and the model is then updated.
The basic algorithm of bayesian optimization is as follows:
Step 1: Place a Gaussian process prior on the objective function f.
Step 2: Observe f at $n_0$ points according to an initial space-filling experimental design. Set $n = n_0$.
Step 3: While n ≤ N, repeat: update the posterior probability distribution on f using all available data; let $x_n$ be the maximizer of the acquisition function over x, where the acquisition function is computed using the current posterior distribution; observe $y_n = f(x_n)$; increase n by 1.
Step 4: Return a solution: either the evaluated point with the largest f(x), or the point with the largest posterior mean.
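The four steps can be sketched as a small self-contained loop. This is an illustration only, not HyperOpt's algorithm: it assumes a one-dimensional search grid, a squared-exponential kernel with fixed hyperparameters, an upper-confidence-bound acquisition function and a toy objective, and all names are hypothetical.

```python
import numpy as np

LENGTH, SIGNAL_VAR, JITTER = 0.3, 1.0, 1e-6      # fixed kernel settings (assumed)

def rbf(p, q):
    d = p[:, None] - q[None, :]
    return SIGNAL_VAR * np.exp(-0.5 * (d / LENGTH) ** 2)

def gp_posterior(x_obs, y_obs, x_new):
    """Posterior mean and standard deviation of a GP with constant prior mean."""
    mean0 = y_obs.mean()
    K = rbf(x_obs, x_obs) + JITTER * np.eye(x_obs.size)
    k = rbf(x_new, x_obs)
    mu = mean0 + k @ np.linalg.solve(K, y_obs - mean0)
    var = SIGNAL_VAR - np.sum(k * np.linalg.solve(K, k.T).T, axis=1)
    return mu, np.sqrt(np.clip(var, 0.0, None))

def bayes_opt(f, grid, n0=3, n_total=12, kappa=2.0):
    # Step 2: initial space-filling design (evenly spaced grid points).
    chosen = [int(i) for i in np.linspace(0, grid.size - 1, n0)]
    y_obs = [f(grid[i]) for i in chosen]
    # Step 3: spend the remaining evaluation budget.
    while len(chosen) < n_total:
        mu, sigma = gp_posterior(grid[chosen], np.array(y_obs), grid)
        ucb = mu + kappa * sigma               # acquisition function
        ucb[chosen] = -np.inf                  # never re-evaluate a point
        nxt = int(np.argmax(ucb))
        chosen.append(nxt)
        y_obs.append(f(grid[nxt]))
    # Step 4: return the evaluated point with the largest f(x).
    best = int(np.argmax(y_obs))
    return grid[chosen[best]], y_obs[best]

def toy_objective(x):
    return -(x - 0.7) ** 2

grid = np.linspace(0.0, 1.0, 201)
x_best, y_best = bayes_opt(toy_objective, grid)
```

In hyperparameter tuning, the objective would instead be a validation score of the classifier as a function of its hyperparameters, and each evaluation would involve training a model.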
The objective function f is usually unknown. A Gaussian process defines, for each point x, a Gaussian probability distribution over f(x), which is therefore determined by a mean μ and a standard deviation σ. The probability distribution of the function is defined as:
Figure BDA0003234219220000181
where
Figure BDA0003234219220000182
represents a standard normal distribution.
To estimate μ(x) and σ(x), a Gaussian process must be fitted to the data. To this end, each observation f(χ) is assumed to be a sample from a normal distribution. Given a data set of several observations f(χ_1), f(χ_2), ..., f(χ_t), the vector [f(χ_1), f(χ_2), ..., f(χ_t)] is a sample from a multivariate normal distribution defined by a mean vector and a covariance matrix. The Gaussian process is thus an n-variate normal distribution, where n is the number of observations. The covariance matrix is defined by a kernel function k(χ_1, χ_2), under which distant samples are nearly uncorrelated while nearby samples are highly correlated. This encodes the prior assumption that the function tends to be smooth: two observations at similar values χ_1 and χ_2 are likely to be correlated.
Given a set of observations P_{1:t} = f(χ_{1:t}) and sampling noise
Figure BDA0003234219220000183
The gaussian process is calculated as follows:
Figure BDA0003234219220000191
wherein
Figure BDA0003234219220000196
Figure BDA0003234219220000192
Figure BDA0003234219220000193
k = [k(x, χ_1), k(x, χ_2), ..., k(x, χ_t)].
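The equation images are not reproduced in this text. For orientation only, the textbook form of the Gaussian process posterior that matches the symbols defined here (observations P_{1:t}, kernel vector k, sampling noise) is given below; the symbols K, I and σ_noise, and the assumption of a zero prior mean, are introduced for this sketch and may differ from the notation of the original figures.

```latex
\begin{align}
  P\left(f_{t+1} \mid P_{1:t}, x\right) &= \mathcal{N}\!\left(\mu_t(x),\, \sigma_t^2(x)\right) \\
  \mu_t(x) &= \mathbf{k}^{\top}\left(K + \sigma_{\text{noise}}^{2} I\right)^{-1} P_{1:t} \\
  \sigma_t^{2}(x) &= k(x, x) - \mathbf{k}^{\top}\left(K + \sigma_{\text{noise}}^{2} I\right)^{-1}\mathbf{k} \\
  K_{ij} &= k(\chi_i, \chi_j), \qquad i, j = 1, \dots, t
\end{align}
```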
Using this Gaussian process model, Bayesian optimization searches for the maximum of the unknown objective function f(x). The next χ to test is the maximizer of the acquisition function, which balances exploration (improving the model in less explored parts of the search space) against exploitation (favoring the regions that the model predicts to be promising). After each observation, the algorithm updates the Gaussian process to take the new data into account. The Gaussian process is initialized with a constant mean, since all points of the search space are initially assumed to be equally promising. The model is refined step by step with each observation.
The Gaussian process is completely specified by its mean function μ(x) and its kernel function k(χ_1, χ_2).
The goal is to learn the characteristic length scale l^2 and the total variance
Figure BDA0003234219220000194
that maximize the probability of the data given the kernel function; these kernel hyperparameters are denoted θ. The marginal likelihood is calculated as follows:
Figure BDA0003234219220000195
where μ_0 is the mean function.
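As a hedged illustration, the standard log marginal likelihood of a Gaussian process with mean function μ_0, kernel hyperparameters θ and t observations has the form below. The symbol K_θ for the kernel matrix and σ_noise for the sampling noise are notation assumed here; the original equation image may be written differently.

```latex
\begin{equation}
  \log p\left(P_{1:t} \mid \chi_{1:t}, \theta\right)
  = -\tfrac{1}{2}\left(P_{1:t} - \mu_0\right)^{\top}
     \left(K_\theta + \sigma_{\text{noise}}^{2} I\right)^{-1}
     \left(P_{1:t} - \mu_0\right)
    - \tfrac{1}{2}\log\left|K_\theta + \sigma_{\text{noise}}^{2} I\right|
    - \tfrac{t}{2}\log 2\pi
\end{equation}
```

The first term rewards fitting the data, the second penalizes overly flexible kernels, and the third is a constant.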
3. Bearing fault diagnosis method based on CWT and AlexNet-LGBM
As shown in fig. 6, the bearing fault diagnosis process based on continuous wavelet transform and AlexNet-LGBM is as follows:
step 1: signal sampling: for the original vibration data, each sample _ length (1024 in the fourth experiment) consecutive data points is used as a sample, and consecutive sampling is performed at a sampling interval sample _ interval (384 in the fourth experiment) in an overlapping sampling manner.
Step 2: continuous wavelet transform signal processing: each sample is subjected to continuous wavelet transform to generate a corresponding time-frequency image, and the time-frequency image is readjusted to be a color picture of size N × N (set to 32 × 32 in the fourth experiment). Sufficient picture partitions into training and test sets are generated. For the training process, step 3 is performed; for the test process, jump to step 5 execution.
And step 3: extracting AlexNet features: and inputting the time-frequency diagram with the size of N multiplied by N of the training set into an improved AlexNet model for training, and storing the model.
And 4, step 4: LGBM fault diagnosis: inputting a time-frequency diagram with the size of N multiplied by N of a training set into a trained AlexNet model, taking out the output of the last full-connected layer, inputting the time-frequency diagram into an LGBM model for training, wherein the data dimension is sample _ Num multiplied by 1000, the sample _ Num represents the number of samples, and 1000 is the number of neurons of the second full-connected layer of the AlexNet model.
And 5: the testing process comprises the following steps: and inputting a time-frequency graph with the size of NxN of the test set into the trained AlexNet model, taking out the output of the second full-connection layer of the AlexNet model as the characteristic extracted by AlexNet, and inputting the output of the AlexNet model into the trained LGBM model, wherein the output of the LGBM model is the fault diagnosis result.
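The overlapping sampling of step 1 can be sketched as follows. The function name overlap_sample and the synthetic record are hypothetical; only sample_length = 1024 and sample_interval = 384 come from the text.

```python
def overlap_sample(signal, sample_length=1024, sample_interval=384):
    """Cut a 1-D signal into overlapping windows of sample_length points,
    advancing the window start by sample_interval points each time."""
    last_start = len(signal) - sample_length
    return [signal[start:start + sample_length]
            for start in range(0, last_start + 1, sample_interval)]

# A synthetic record of 240,000 points (about 20 s at 12 kHz).
record = list(range(240_000))
samples = overlap_sample(record)
first_300 = samples[:300]   # keep the first 300 windows of one file
```

Consecutive windows share 1024 - 384 = 640 points, so a single record yields far more samples than non-overlapping truncation would.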
4. Experimental verification
4.1 data set and Experimental Environment introduction
The present invention uses the bearing vibration data set published by Case Western Reserve University (CWRU). The CWRU bearing experiments involve four variables: fault location, fault depth, motor load and sampling frequency. The data files are in MATLAB format and contain fan-end and drive-end bearing acceleration data together with motor speed data.
In practice a rotating machine mostly operates under non-zero load, so fault diagnosis should cover as many load conditions as possible; moreover, the fault location matters more than the fault depth, because it determines which part must be replaced. The diagnosis target is therefore set to identifying the fault location of the bearing, with four classes: inner ring fault, ball fault, outer ring fault and normal. Because the CWRU data set lacks data for some conditions, the present invention uses the normal data for 1 to 3 horsepower loads and the drive-end bearing fault data sampled at 12 kHz; the specific CWRU data files used are listed in Tables 1 and 2.
The experiments were run on a computer with a 64-bit Windows 10 operating system, a GPU, an i5-4200U CPU and 12 GB of memory. The programs were written in Python 3.7 in Jupyter Notebook, using the deep learning frameworks TensorFlow 2.3.1 and Keras 2.4.3.
Table 1 normal data file used by the present invention
Figure BDA0003234219220000201
Table 2 fault data file for use with the present invention
Figure BDA0003234219220000202
4.2 data processing
In the CWRU data set, each operating condition was recorded for about 20 s, which at a sampling frequency of 12,000 Hz gives about 240,000 data points per file. The original vibration signal therefore has to be truncated to generate the training and test data sets. In the present invention, the overlap-sampling method introduced in section 3.1.1 is used for this purpose. The truncation window slides along the original vibration signal with a sampling interval of 384 data points and a window size of 1,024 data points, and each movement of the window produces one sample of 1,024 data points. From the samples of 1,024 consecutive data points generated for each file, the first 300 were selected, so that the 30 files in Tables 1 and 2 yield 9,000 samples in total.
The 9,000 samples are processed by the continuous wavelet transform described in section 1.2 with the Morlet mother wavelet, and each resulting time-frequency spectrogram is resized to 32 × 32 pixels, giving 9,000 time-frequency pictures of uniform size. The processing results are shown in fig. 7.
As can be seen from fig. 7, the energy distribution of a normal bearing is more uniform than that of a faulty bearing and lies in the lower frequency band, whereas a faulty bearing shows periodic high-energy bands whose position along the frequency (vertical) axis differs from that of a normal bearing.
TABLE 3 data set partitioning
Figure BDA0003234219220000211
4.3 neural network feature extraction capability comparison
To compare the ability of different neural network structures to extract features from bearing vibration spectrograms, the improved AlexNet provided in section 3.2, LeNet-5 and EfficientNet are compared.
TABLE 4 LeNet-5 and EfficientNet structure and parameter settings
Figure BDA0003234219220000212
The improved AlexNet structure is described in section 1.3. It has 17,289,484 parameters in total, 71.6% fewer than the 60,965,128 parameters of the original AlexNet, which speeds up training.
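The stated reduction follows directly from the two parameter counts:

```latex
\begin{equation}
  \frac{60{,}965{,}128 - 17{,}289{,}484}{60{,}965{,}128}
  = \frac{43{,}675{,}644}{60{,}965{,}128}
  \approx 0.716 = 71.6\%
\end{equation}
```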
Since the EfficientNet models from B0 to B7 require increasingly large input pictures, only the EfficientNet-B0 model is suitable for the 32 × 32 pictures used here. The structures and parameter settings of the improved LeNet-5 and EfficientNet-B0 models are shown, from top to bottom, in Table 4.
AlexNet, LeNet-5 and EfficientNet all use the cross-entropy loss function categorical_crossentropy and the SGD optimizer, with the learning rate set to 0.001. Training runs for 30 epochs, and the training results are shown in fig. 8 and 9.
The training accuracy and loss curves show that after 30 epochs the accuracy and loss of all three models hardly change any more, i.e. the models have converged. The validation accuracy of EfficientNet only reaches about 85%, whereas LeNet-5 and AlexNet both reach about 98%. EfficientNet is therefore not suitable for diagnosing faults from 32 × 32 pixel bearing spectrograms. LeNet-5 fluctuates more during training than AlexNet, so AlexNet is the more stable of the two.
The features extracted from the penultimate fully connected layers of AlexNet and LeNet-5 are reduced in dimensionality with the t-SNE tool of sklearn and visualized as shown in FIG. 10.
It can be seen that the features extracted by LeNet-5 are hard to separate in two places (dotted circles), where data of different classes stick together, while the improved AlexNet has only one such place. The improved AlexNet of the invention therefore has the better feature extraction capability. In the following, both AlexNet and LeNet-5 are used for feature extraction, and fault diagnosis is continued with LGBM classification.
4.4 bearing fault diagnosis method comprehensive comparison
To verify that the bearing fault diagnosis method based on continuous wavelet transform and AlexNet-LGBM provided by the invention has the highest accuracy, the invention compares the fault diagnosis performance of different combinations whose structure is similar to the AlexNet and LGBM combination.
The LGBM classifier is tuned by Bayesian hyperparameter optimization, and its parameter settings are shown in Table 5.
The outputs of the penultimate layers of AlexNet and LeNet-5 are each fed into the LGBM, gcForest and CatBoost classifiers, producing six combined classifiers, abbreviated CWT-Alex-LGBM, CWT-Alex-GCF, CWT-Alex-Cat, CWT-LeNet5-LGBM, CWT-LeNet5-GCF and CWT-LeNet5-Cat. Adding CWT-AlexNet and CWT-LeNet5, in which the neural network outputs the classification result directly through its fully connected layer, gives 8 models to compare in total. Of these, CWT-Alex has the same structure as in the study by Wang, and CWT-LeNet5-GCF has the same structure as in the study by Xu. Xu's study concluded that the CWT-LeNet5-GCF model outperforms CWT-LeNet5, CWT-GCF and the traditional CNN model.
Each of the 8 models performs fault diagnosis on the 9,000 time-frequency spectrogram samples of size 32 × 32 obtained in section 4.2. The experiment is repeated 5 times, and the accuracy and prediction time on the test set of 1,800 samples are recorded, giving the experimental results in Table 6.
TABLE 5 LGBM parameter set
Figure BDA0003234219220000221
Table 6 test set fault diagnosis results of eight models
Figure BDA0003234219220000222
Figure BDA0003234219220000231
As can be seen from Table 6, the proposed bearing fault diagnosis method based on continuous wavelet transform and AlexNet-LGBM (CWT-Alex-LGBM in the table) reaches an accuracy of 99.712%, the highest of the 8 models and higher than the CWT-AlexNet model of Wang and the CWT-LeNet5-gcForest model of Xu (98.788% and 99.598%, respectively).
CWT-Alex and CWT-LeNet5 classify with a fully connected Softmax layer, which works less well than passing the features extracted by the neural network to the LGBM, gcForest or CatBoost classifiers: their average accuracies are only 98.788% and 98.186%, below the more than 99.5% obtained by reclassifying the features, and their results over repeated predictions are unstable, with variances of 2.147 and 1.971 respectively, far higher than those of the combined models.
To compare the reclassification effect of the LGBM, gcForest and CatBoost classifiers more intuitively, the results of the six combined models in the table are plotted in FIGS. 11-13.
As can be seen from FIG. 12, for both neural networks the reclassification accuracy ranks LGBM > gcForest > CatBoost. As can be seen from FIG. 13, the prediction times of LGBM and CatBoost are shorter than that of gcForest, and those with LeNet-5 are generally shorter than those with AlexNet, although all are of the same order of magnitude.
To solve the problems of fault feature extraction and fault diagnosis in rolling bearings, the invention provides a bearing fault diagnosis method based on continuous wavelet transform and AlexNet-LGBM. Experimental comparison shows that the method has the highest accuracy, 99.712%, among the 8 methods compared; predicting 1,800 samples takes 1.47 seconds, the same order of magnitude as the other models; and the variance of the prediction accuracy over five runs is only 0.063, so the method is more stable than 6 of the other methods. The method thus has the best overall performance.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented wholly or partially in software, it may take the form of a computer program product that includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
The above description covers only specific embodiments of the present invention and does not limit its scope of protection; all modifications, equivalents and improvements made within the spirit and principles of the invention fall within the scope of protection defined by the appended claims.

Claims (10)

1. A bearing fault diagnosis method, characterized by comprising:
firstly, extracting time-frequency characteristics from an original vibration signal of a bearing by using continuous wavelet transform, and converting them into a two-dimensional image of 32 × 32 pixels; secondly, extracting fault features from the time-frequency image by using an improved AlexNet model; and finally, performing fault diagnosis classification with an LGBM classification algorithm whose optimal model parameters are selected by Bayesian optimization.
2. The bearing fault diagnosis method according to claim 1, characterized by comprising the steps of:
step one, signal sampling: taking every sample_length consecutive data points of the original vibration data as one sample, and sampling successively at a sampling interval sample_interval in an overlapping manner;
step two, Morlet continuous wavelet transform signal processing: performing continuous wavelet transform on each sample to generate a corresponding time-frequency image, resizing the time-frequency image to a color image of size N × N, and generating enough images to be divided into a training set and a test set; for the training process, executing step three; for the test process, jumping to step five;
step three, AlexNet feature extraction: inputting the N × N time-frequency images of the training set into an improved AlexNet model for training, and saving the model;
step four, LGBM fault diagnosis: inputting the N × N time-frequency images of the training set into the trained AlexNet model, taking out the output of the penultimate fully connected layer, and inputting it into an LGBM model for training, the data dimension being sample_Num × 1000; wherein sample_Num represents the number of samples, and 1000 is the number of neurons of the second fully connected layer of the AlexNet model;
step five, testing: inputting the N × N time-frequency images of the test set into the trained AlexNet model, taking out the output of the second fully connected layer of the AlexNet model as the features extracted by AlexNet, and inputting them into the trained LGBM model, the output of the LGBM model being the fault diagnosis result.
3. The bearing fault diagnosis method according to claim 2, wherein in the first step, the signal sampling comprises:
selecting sample_length consecutive data points from the original vibration signal as an original sample; generating a corresponding time-frequency image from the sample_length consecutive sampling points by continuous wavelet transform; resizing the time-frequency image to a suitable size N × N; selecting, in an overlapping manner, the sample_length consecutive data points that start one sampling interval sample_interval later as another sample and generating another image of size N × N; and repeating the above process to generate sufficient training and test images.
4. The bearing fault diagnosis method according to claim 2, wherein in the second step, the Morlet continuous wavelet transform signal processing comprises:
the continuous wavelet transform of the signal x(t) with the wavelet function ψ(t) is given by:
Figure FDA0003234219210000021
among the different wavelets, a complex or analytic wavelet is one whose Fourier transform is zero at negative frequencies; such a complex wavelet can separate the phase and amplitude components of the signal; Morlet is the most commonly used complex wavelet, and continuous wavelet analysis with the Morlet complex wavelet has the advantages of separating information in the wavelet domain and simplifying the relationship between transform ridges and instantaneous frequency; the bearing vibration signal is processed using the Morlet wavelet, defined as:
ψ(t) = π^(-1/4) (exp(i2πf_0 t) - exp(-(2πf_0)^2/2)) exp(-t^2/2) (2)
wherein f_0 is the center frequency of the mother wavelet; the second term in brackets is called the correction term, which corrects for the non-zero mean of the complex sinusoid multiplied by the Gaussian term; in practice the correction term is negligible for f_0 ≫ 0, in which case the Morlet wavelet is represented as follows:
Figure FDA0003234219210000022
wherein the Morlet wavelet is simply a complex sinusoid exp(i2πf_0 t) within a Gaussian envelope exp(-t^2/2); the π^(-1/4) term is a normalization factor that ensures that the wavelet has unit energy;
the fourier transform of the Morlet wavelet is as follows:
Figure FDA0003234219210000023
wherein the Fourier transform of the Morlet wavelet has the form of a Gaussian function shifted by f_0 along the frequency axis; the center frequency of the Gaussian spectrum is typically chosen as the characteristic frequency of the Morlet wavelet; the characteristic frequency is set for the mother wavelet and varies with the wavelet scale a as follows:
Figure FDA0003234219210000031
the energy spectrum, i.e. the squared magnitude of the fourier transform, is calculated as follows:
Figure FDA0003234219210000032
the energy of the Morlet wavelet, obtained by integration, is equal to 1 according to equation (3);
and converting the one-dimensional vibration signal into a picture through continuous wavelet transformation, wherein the picture comprises the corresponding relation between time and frequency.
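The simplified Morlet wavelet and the transform described in this claim can be sketched in plain Python. This is a naive, slow illustration rather than the implementation used in the experiments: the function names morlet and cwt, the default center frequency f0 = 0.849 (so that 2πf_0 ≈ 5.33 and the correction term is negligible), the scale-to-frequency relation f = f0 / a and the test signal are all assumptions made here.

```python
import cmath
import math

def morlet(t, f0=0.849):
    """Simplified Morlet wavelet: pi^(-1/4) * exp(i 2 pi f0 t) * exp(-t^2 / 2)."""
    return math.pi ** -0.25 * cmath.exp(2j * math.pi * f0 * t) * math.exp(-t * t / 2)

def cwt(signal, scales, dt=1.0, f0=0.849):
    """Naive O(N^2) continuous wavelet transform.

    Returns one row per scale a, holding |T(a, b)| for every shift b, where
    T(a, b) = a^(-1/2) * sum_k x_k * conj(psi((t_k - b) / a)) * dt.
    """
    n = len(signal)
    rows = []
    for a in scales:
        row = []
        for b in range(n):
            acc = 0j
            for k in range(n):
                acc += signal[k] * morlet((k - b) * dt / a, f0).conjugate()
            row.append(abs(acc) * dt / math.sqrt(a))
        rows.append(row)
    return rows

# Numerical check of unit energy: the integral of |psi(t)|^2 over t.
step = 0.01
energy = sum(abs(morlet(i * step)) ** 2 for i in range(-800, 801)) * step

# A cosine at 0.1 cycles per sample, analysed at three candidate frequencies.
signal = [math.cos(2 * math.pi * 0.1 * k) for k in range(128)]
test_freqs = [0.05, 0.1, 0.2]
scalogram = cwt(signal, [0.849 / f for f in test_freqs])
row_means = [sum(row) / len(row) for row in scalogram]
```

Each row of the scalogram corresponds to one frequency and each column to one time shift, which is the time-frequency picture that is later resized and fed to the network.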
5. The bearing fault diagnosis method according to claim 2, wherein in step three, the AlexNet feature extraction comprises:
AlexNet was modified as follows:
(1) improving the dimension of model input: the 224 × 224 input image size of the classical AlexNet is unnecessarily large for bearing fault diagnosis based on vibration signals, and if the vibration signal of the bearing is acquired at a high frequency, the pictures generated by wavelet transformation of all samples occupy a large storage space, so color pictures of size 32 × 32 are adopted as input;
(2) convolutional layer activation function improvement: the ReLU function f(z) = max(0, z) has a limitation, because its gradient in an iterative update is calculated as:
Figure FDA0003234219210000033
a variant of ReLU, PReLU, is used instead, expressed as:
Figure FDA0003234219210000034
PReLU differs from ReLU in that when z <0, the value is a linear function with slope a, and the gradient update is calculated as:
Figure FDA0003234219210000035
the value of a is continuously updated through back propagation, and is iteratively optimized together with the weight and the bias parameters in the network;
(3) improvement of a full connection layer and an output layer: the bearing fault diagnosis comprises 1 normal type and 3 fault types, and four types are classified, so that the size of an output layer of the improved AlexNet structure is set to be 4; as the output layer becomes smaller, the size of the second fully connected layer is set to 1000.
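The PReLU activation of item (2) and the two gradients it needs during back propagation can be sketched as follows. The function names, the example values and the single update step are illustrative assumptions; the treatment of z = 0 (handled by the z ≤ 0 branch) is also a choice made here.

```python
def prelu(z, a):
    """PReLU activation: z for z > 0, a * z otherwise."""
    return z if z > 0 else a * z

def prelu_grad_z(z, a):
    """Derivative of PReLU with respect to its input z."""
    return 1.0 if z > 0 else a

def prelu_grad_a(z):
    """Derivative of PReLU with respect to the learnable slope a."""
    return 0.0 if z > 0 else z

# One illustrative gradient-descent step on the slope a for a negative input.
a = 0.25
z = -2.0
upstream = 1.0            # hypothetical dLoss/dOutput arriving from the next layer
learning_rate = 0.001
a_updated = a - learning_rate * upstream * prelu_grad_a(z)
```

With a = 0 the function reduces to ReLU, whose gradient is zero for every negative input; a non-zero learnable slope keeps those units trainable.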
6. The bearing fault diagnostic method of claim 5, wherein the modified AlexNet structure comprises:
(1) convolutional layer
The convolutional layer is connected to the previous layer through local connections and weight sharing, and the convolution operation is as follows:
Figure FDA0003234219210000041
wherein h_j represents the j-th output feature map of the current convolutional layer; X_i represents the i-th output feature map of the previous layer, i.e. the input of the current convolutional layer; * represents the convolution operation; the parameter matrix W_ij is the convolution kernel that maps the i-th input feature map to the j-th output feature map of the current layer; b_j is the bias corresponding to the j-th feature map of the current convolutional layer; f(x) is a nonlinear activation function, here the PReLU function shown in equation (8);
(2) pooling layer
The pooling layer downsamples after the convolution operation and further reduces the dimension of the extracted features; max pooling is used, and the maximum values are extracted from the convolution output layer Y_cn as follows:
Figure FDA0003234219210000042
wherein S_{M×N} is the pooling scale matrix; M and N are the dimensions of S; during pooling, S slides over Y_cn with a fixed step size until the whole of Y_cn has been scanned; if S is a 3 × 3 matrix, Y_cn is reduced to 1/9 of its size and assigned to P_cn in the pooling output layer;
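The max pooling described above can be sketched on a nested list as follows. The function name max_pool and the stride of 3 are assumptions; a stride equal to the 3 × 3 window size is what makes the output 1/9 the size of the input.

```python
def max_pool(feature_map, size=3, stride=3):
    """Slide a size x size window over a 2-D feature map with the given stride
    and keep the maximum value of each window."""
    rows = len(feature_map)
    cols = len(feature_map[0])
    return [[max(feature_map[r + i][c + j]
                 for i in range(size) for j in range(size))
             for c in range(0, cols - size + 1, stride)]
            for r in range(0, rows - size + 1, stride)]

# A 6 x 6 feature map whose entry at row r, column c is 6 * r + c.
feature_map = [[6 * r + c for c in range(6)] for r in range(6)]
pooled = max_pool(feature_map)
```

Here the 36 input values are reduced to 4 output values, one per non-overlapping 3 × 3 window.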
(3) Full connection layer
After the last convolutional layer and pooling layer, the data reach a Flatten layer and are flattened into one dimension; they then pass through two fully connected layers, in which each neuron is connected to all neurons of the preceding layer; Dropout with a drop rate of 0.5 is applied to the outputs of both fully connected layers, so that randomly chosen units are dropped by the network and not updated; the structure of the network therefore changes in each iteration, which is equivalent to ensemble learning over networks of various structures, and averaging over the combination of multiple networks effectively prevents overfitting;
the last layer is the output layer; a Softmax classifier is used to classify the fault types; an input picture in the training data set is denoted x_k and its label y_k, where y ∈ {1, 2, ..., J} denotes the fault class; for each x, Softmax estimates the probability p(y = j | x) of each label j ∈ {1, 2, ..., J}; the Softmax activation function is expressed as follows:
Figure FDA0003234219210000051
where θ is the weight matrix of the Softmax layer and θ_i is a row vector of θ;
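A minimal sketch of the Softmax output layer follows. The function names softmax and class_probabilities and the example weights are hypothetical; subtracting the largest score before exponentiation is a common numerical-stability measure added here and does not change the result.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def class_probabilities(theta, x):
    """p(y = j | x) for every class j, where row j of theta scores class j."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in theta]
    return softmax(scores)

# Four fault classes, three input features (hypothetical weights).
theta = [[0.2, -0.1, 0.0],
         [1.0, 0.5, -0.3],
         [-0.4, 0.3, 0.8],
         [0.0, 0.0, 0.0]]
probs = class_probabilities(theta, [1.0, 2.0, 0.5])
predicted_class = probs.index(max(probs))
```

The predicted class is the index of the largest probability, and the probabilities always sum to 1.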
(4) parameter updating
To accommodate the multi-classification fault diagnosis task, the loss function is set as a cross-entropy loss function, expressed as:
Figure FDA0003234219210000052
wherein
Figure FDA0003234219210000053
represents the predicted probability that the i-th sample belongs to class k;
Figure FDA0003234219210000054
is the actual probability: if the true class of the i-th sample is k, then
Figure FDA0003234219210000055
and otherwise the value is 0; W^(l) is the parameter matrix of the l-th layer; the first term of the formula measures the cross entropy between the prediction
Figure FDA0003234219210000056
and the true category
Figure FDA0003234219210000057
and the loss function is smallest when the predicted value and the true value agree; the second term is an L2 regularization term, and the coefficient λ is the weight decay parameter;
the model is trained with stochastic gradient descent, and the parameters W and the bias b are updated in each iteration as follows:
Figure FDA0003234219210000058
Figure FDA0003234219210000059
wherein α is the learning rate, which controls the size of the gradient step in each iteration; the residual of the loss function at the j-th node of the l-th layer is denoted
Figure FDA0003234219210000061
The recurrence formula is expressed as:
Figure FDA0003234219210000062
the gradients of the loss function with respect to the parameters are expressed as:
Figure FDA0003234219210000063
Figure FDA0003234219210000064
in formula (13), y_k is 1 for exactly one category k and 0 for all others; let the true category be
Figure FDA0003234219210000065
Then:
Figure FDA0003234219210000066
Figure FDA0003234219210000067
the residual of the last layer is obtained from the Softmax activation function of formula (12):
Figure FDA0003234219210000068
Figure FDA0003234219210000069
the residuals of the other layers, δ^(L-1), ..., δ^(1), are calculated with the recurrence formula (15);
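For a single sample, the last-layer residual of a Softmax output trained with cross-entropy loss has a well-known closed form, sketched below under assumed notation (z_j for the input of output node j, ŷ_j for its Softmax output, y_j for the one-hot label). The equation images of this claim may use different symbols or include a batch-averaging factor.

```latex
\begin{align}
  J &= -\sum_{k} y_k \log \hat{y}_k, \qquad
  \hat{y}_k = \frac{\exp(z_k)}{\sum_{m}\exp(z_m)} \\
  \frac{\partial \hat{y}_k}{\partial z_j} &= \hat{y}_k\left(\mathbb{1}[k = j] - \hat{y}_j\right) \\
  \delta_j^{(L)} = \frac{\partial J}{\partial z_j}
  &= -\sum_{k} \frac{y_k}{\hat{y}_k}\,\hat{y}_k\left(\mathbb{1}[k = j] - \hat{y}_j\right)
   = \hat{y}_j \sum_{k} y_k - y_j
   = \hat{y}_j - y_j
\end{align}
```

The last step uses the fact that the one-hot label sums to 1, so the residual is simply the predicted probability minus the label.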
a bearing fault feature extraction model is constructed with the Python-based Keras deep learning framework, which runs on a TensorFlow backend; the SGD optimizer, the cross-entropy loss function and the normalization method of Keras are chosen to train the parameters.
7. The bearing fault diagnosis method of claim 2, wherein in step four, the LGBM fault diagnosis, including gradient-based one-side sampling and exclusive feature bundling, comprises:
(1) a gradient-based one-side sampling (GOSS) algorithm; instances with large gradients are all kept for training, while instances with small gradients are randomly sampled, and the effect on the data distribution is compensated by a constant multiplier when the information gain is calculated; the GOSS algorithm is as follows:
input: training data I with n instances x_1, ..., x_n, number of iterations d, sampling rates a and b for large-gradient and small-gradient data, loss function loss, and a plurality of weak learners L;
output: a trained strong learner;
step 1: initialization: let topN = a × len(I) denote the number of large-gradient samples; add L to the model list models, and set the weight w of each training instance to 1;
step 2: predict the training data with the model list, calculate the loss g of each instance using the loss function loss, and sort the training data in descending order of g;
step 3: take the top topN sorted training instances as the large-gradient subset A, randomly draw b × |A^C| instances from the remaining data A^C as the small-gradient subset B, and record the union of the two subsets as usedSet;
step 4: multiply the weight w of the small-gradient samples by the coefficient (1-a)/b;
step 5: input the data I restricted to the used training set usedSet, together with the negative gradient -g and the weight w, into the learner L for training to obtain a new model newModel;
the instances are split according to the estimated variance gain V_j(d) over the subsets A and B:
Figure FDA0003234219210000071
wherein A_l = {x_i ∈ A: x_ij ≤ d}, A_r = {x_i ∈ A: x_ij > d}, B_l = {x_i ∈ B: x_ij ≤ d}, B_r = {x_i ∈ B: x_ij > d}; the coefficient (1-a)/b is used to normalize the sum of gradients over B to the size of A^C; newModel is added to the model list models;
step 6: repeat steps 2 to 5 until the number of iterations d is reached or convergence is achieved;
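The sampling part of one GOSS round (steps 2 to 4) can be sketched as follows. The function name goss_sample, the fixed seed and the example gradients are hypothetical. One assumption deserves emphasis: the sketch draws b × n small-gradient instances, as in the published LightGBM pseudocode, because with that count the factor (1 - a)/b rescales subset B exactly to the size of A^C; the wording of step 3 (b × |A^C|) would give a slightly smaller sample.

```python
import random

def goss_sample(gradients, a=0.2, b=0.1, seed=0):
    """One GOSS sampling round: returns (indices, weights) of the used set."""
    rng = random.Random(seed)
    n = len(gradients)
    # Step 2: sort instances by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top_n = round(a * n)
    rand_n = round(b * n)      # assumption: b * n draws, see lead-in
    # Step 3: subset A keeps all large gradients, subset B is a random draw.
    large = order[:top_n]
    small = rng.sample(order[top_n:], rand_n)
    # Step 4: amplify the small-gradient instances by (1 - a) / b.
    weights = {i: 1.0 for i in large}
    weights.update({i: (1.0 - a) / b for i in small})
    used = large + small
    return used, [weights[i] for i in used]

# 100 instances whose gradient magnitude grows with the index.
gradients = [float(i) for i in range(100)]
used_set, used_weights = goss_sample(gradients)
```

With a = 0.2 and b = 0.1, only 30 of the 100 instances are used, yet their weights sum to 100, so the gradient statistics remain on the scale of the full data set.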
(2) the exclusive feature bundling (EFB) algorithm comprises two steps: bundle generation and exclusive feature merging;
the bundle generation algorithm determines which mutually exclusive features can be merged; features that can be merged are put together and called a bundle; exclusive feature merging then combines each bundle into one feature; deciding which exclusive features can be bundled is done by greedy bundling: the features are first taken as vertices, and an edge is added between every two features that are not mutually exclusive, which reduces the optimal bundling problem to a graph coloring problem that is then solved by a greedy algorithm; exclusive feature merging constructs a feature bundle by letting the exclusive features reside in different bins, which can be implemented simply by adding offsets to the values of the original features;
the output of the penultimate layer of AlexNet is classified using the LGBMClassifier of the Python lightgbm package;
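The two steps of exclusive feature bundling in item (2) can be sketched as follows. The function names greedy_bundle and merge_bundle, the column-wise list layout, the bin_ranges argument and the use of the non-zero count as ordering criterion are simplifying assumptions for this example and do not reproduce LightGBM's internal implementation.

```python
def greedy_bundle(features, max_conflicts=0):
    """Greedily group feature columns that are (almost) never non-zero together."""
    def conflicts(f, g):
        return sum(1 for u, v in zip(f, g) if u != 0 and v != 0)

    # Visit features by decreasing number of non-zero entries.
    order = sorted(range(len(features)),
                   key=lambda i: sum(1 for v in features[i] if v != 0),
                   reverse=True)
    bundles = []
    for i in order:
        for bundle in bundles:
            total = sum(conflicts(features[i], features[j]) for j in bundle)
            if total <= max_conflicts:
                bundle.append(i)      # compatible: join the existing bundle
                break
        else:
            bundles.append([i])       # no compatible bundle: open a new one
    return bundles

def merge_bundle(features, bundle, bin_ranges):
    """Merge one bundle into a single column by offsetting each feature's values
    so that the original features occupy disjoint value ranges (bins)."""
    merged = [0] * len(features[0])
    offset = 0
    for j in bundle:
        for row, value in enumerate(features[j]):
            if value != 0:
                merged[row] = value + offset
        offset += bin_ranges[j]
    return merged

# Three sparse feature columns over four instances.
features = [[1, 0, 0, 2],
            [0, 3, 0, 0],
            [1, 1, 0, 0]]
bundles = greedy_bundle(features)
merged = merge_bundle(features, bundles[0], bin_ranges=[2, 3, 1])
```

In the merged column, values 1 to 2 still identify the first feature and values 3 to 5 the second, so no information is lost while the number of columns drops.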
(3) bayesian hyper-parameter optimization
parameter tuning of the LGBM training process is performed with HyperOpt; HyperOpt provides an easy-to-use Bayesian hyperparameter optimization algorithm that performs the search by sequential model-based optimization (SMBO), which is a Bayesian optimization technique;
Bayesian optimization is a model-based optimization algorithm designed for an objective function that is unknown but can be sampled, and it searches for the maximum of that function; as with all model-based optimization algorithms, a regression method is used to build a model of the objective function, the next point to evaluate is selected according to the model, and the model is then updated;
the basic algorithm of Bayesian optimization is as follows:
step 1: place a Gaussian process prior on the objective function f;
step 2: observe f at n_0 points according to an initial space-filling experimental design, and set n = n_0;
step 3: while n ≤ N, execute the loop: update the posterior probability distribution on f using all available data; let x_n be a maximizer of the acquisition function over x, where the acquisition function is computed using the current posterior distribution; observe y_n = f(x_n); increment n by 1;
step 4: return a solution: either the evaluated point with the largest f(x), or the point with the largest posterior mean;
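Steps 1 to 4 can be sketched in a few lines of NumPy for a one-dimensional objective. Everything specific here is an assumption chosen for illustration: the squared-exponential kernel with unit variance, the upper-confidence-bound acquisition function, the candidate grid, the evenly spaced initial design, and the use of the observed mean as the constant prior mean. The claim itself does not fix these choices, and HyperOpt uses its own surrogate model.

```python
import numpy as np

def rbf(a, b, length_scale=0.2):
    """Squared-exponential kernel with unit variance (assumed for this sketch)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-6):
    """Posterior mean and standard deviation of a constant-mean Gaussian process."""
    mu0 = y_obs.mean()                                  # constant prior mean (assumption)
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k = rbf(x_obs, x_query)                             # shape (t, m)
    mean = mu0 + k.T @ np.linalg.solve(K, y_obs - mu0)
    var = 1.0 - np.sum(k * np.linalg.solve(K, k), axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))          # clamp tiny negative variances

def bayes_opt(f, lower, upper, n0=3, N=15, kappa=2.0):
    grid = np.linspace(lower, upper, 201)               # candidate points x
    # Steps 1-2: the GP prior is implicit in gp_posterior; observe f at n0 points.
    x_obs = np.linspace(lower, upper, n0)
    y_obs = np.array([f(x) for x in x_obs])
    n = n0
    # Step 3: update the posterior, maximize the acquisition function, observe.
    while n <= N:
        mean, std = gp_posterior(x_obs, y_obs, grid)
        x_n = grid[np.argmax(mean + kappa * std)]       # upper confidence bound
        x_obs = np.append(x_obs, x_n)
        y_obs = np.append(y_obs, f(x_n))
        n += 1
    # Step 4: return the evaluated point with the largest f(x).
    best = int(np.argmax(y_obs))
    return float(x_obs[best]), float(y_obs[best])

# Hypothetical objective with its maximum at x = 0.3.
x_best, y_best = bayes_opt(lambda x: -(x - 0.3) ** 2, 0.0, 1.0)
```

The `kappa` parameter controls how strongly the acquisition function rewards posterior uncertainty relative to the posterior mean; larger values sample more widely before concentrating near the predicted maximum.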
the objective function f is usually unknown; a Gaussian process defines a probability distribution over functions by assigning to each point x a Gaussian distribution for f(x), determined by the mean μ and the standard deviation σ:
Figure FDA0003234219210000081
wherein
Figure FDA0003234219210000082
represents a standard normal distribution;
to estimate μ(x) and σ(x), a Gaussian process is fitted to the data; each observation f(χ) is assumed to be a sample from a normal distribution; if the data set consists of multiple observations f(χ_1), f(χ_2), ..., f(χ_t), then the vector [f(χ_1), f(χ_2), ..., f(χ_t)] is a sample from a multivariate normal distribution defined by a mean vector and a covariance matrix, so that the Gaussian process is an n-variable normal distribution, where n is the number of observations; the covariance matrix is defined by a kernel function k(χ_1, χ_2), chosen such that samples far apart are nearly uncorrelated while nearby samples are highly correlated; this reflects the prior assumption that the objective function tends to be smooth, so that two observations with similar values χ_1 and χ_2 are likely to have correlated function values;
given a set of observations P_{1:t} = f(χ_{1:t}) and sampling noise
Figure FDA0003234219210000091
the Gaussian process is calculated as follows:
Figure FDA0003234219210000092
wherein
Figure FDA0003234219210000096
k = [k(x, χ_1) k(x, χ_2) … k(x, χ_t)];
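The formula images referenced above are not reproduced in this text. For orientation only, the textbook Gaussian process regression posterior can be written with the symbols already defined (observations P_{1:t}, kernel k, vector k). The noise symbol \sigma_{noise}^2, the matrix name K, and the zero prior mean are assumptions of this sketch, and the expressions are not necessarily identical to the patent's figures.

```latex
P\bigl(f(x) \mid P_{1:t},\, x\bigr) = \mathcal{N}\bigl(\mu_t(x),\, \sigma_t^2(x)\bigr)

\mu_t(x) = \mathbf{k}^{\top} \mathbf{K}^{-1} P_{1:t},
\qquad
\sigma_t^2(x) = k(x, x) - \mathbf{k}^{\top} \mathbf{K}^{-1} \mathbf{k}

\mathbf{K} =
\begin{bmatrix}
k(\chi_1, \chi_1) & \cdots & k(\chi_1, \chi_t) \\
\vdots & \ddots & \vdots \\
k(\chi_t, \chi_1) & \cdots & k(\chi_t, \chi_t)
\end{bmatrix}
+ \sigma_{noise}^2 \, \mathbf{I}
```

With a non-zero constant prior mean \mu_0, the mean becomes \mu_0 + \mathbf{k}^{\top} \mathbf{K}^{-1} (P_{1:t} - \mu_0) and the variance is unchanged.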
Bayesian optimization uses this Gaussian process model to search for the maximum f(x) of the unknown objective function; the next χ to be tested is selected by maximizing the acquisition function, which balances exploration, i.e., improving the model in the less explored parts of the search space, against exploitation, i.e., favoring the parts that the model predicts to be promising; after each observation, the algorithm updates the Gaussian process to take the new data into account; since all points of the search space are initially assumed to be equally promising, the Gaussian process is initialized with a constant mean; the model is gradually improved after each observation;
the Gaussian process is completely specified by the mean function μ(x) and the kernel function k(χ_1, χ_2);
the goal is to learn the characteristic length scale l2 and the total variance
Figure FDA0003234219210000094
such that the probability of the data is maximized given the kernel function with hyperparameters θ; the marginal probability is calculated as follows:
Figure FDA0003234219210000095
wherein μ_0 is the mean function.
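The marginal-probability formula is an image that is not reproduced in this text. As a hedged sketch, the standard log marginal likelihood of a Gaussian process, log p(y | θ) = -½ (y - μ_0)ᵀ K⁻¹ (y - μ_0) - ½ log|K| - (t/2) log 2π, can be evaluated as below. The squared-exponential kernel, the parameter names `length_scale`, `signal_var` and `noise_var`, and the small grid search are assumptions made for illustration.

```python
import numpy as np

def log_marginal_likelihood(x, y, length_scale, signal_var, noise_var, mu0=0.0):
    """Standard GP log marginal likelihood for a squared-exponential kernel."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d = x[:, None] - x[None, :]
    K = signal_var * np.exp(-0.5 * (d / length_scale) ** 2)
    K = K + noise_var * np.eye(len(x))
    L = np.linalg.cholesky(K)                   # K = L L^T
    r = y - mu0                                 # residual about the constant mean
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, r))
    data_fit = -0.5 * r @ alpha
    complexity = -np.sum(np.log(np.diag(L)))    # equals -0.5 * log|K|
    constant = -0.5 * len(x) * np.log(2.0 * np.pi)
    return data_fit + complexity + constant

# Hypothetical usage: choose the length scale with the highest likelihood
# from a small candidate set, on made-up data.
x_demo = np.linspace(0.0, 1.0, 8)
y_demo = np.sin(2.0 * np.pi * x_demo)
candidates = [0.05, 0.1, 0.2, 0.4, 0.8]
best_length_scale = max(
    candidates,
    key=lambda l: log_marginal_likelihood(x_demo, y_demo, l, 1.0, 1e-4),
)
```

In practice the hyperparameters are usually found with a gradient-based optimizer rather than a grid; the grid is used here only to keep the example short.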
8. A bearing fault diagnosis system for implementing the bearing fault diagnosis method according to any one of claims 1 to 7, characterized in that the bearing fault diagnosis system comprises:
a signal sampling module, used for taking every sample_length consecutive data points of the original vibration data as one sample, and sampling the original vibration data continuously in an overlapped manner with the sampling interval sample_interval;
a wavelet transform signal processing module, used for Morlet continuous wavelet transform signal processing: performing a continuous wavelet transform on each sample to generate a corresponding time-frequency image, resizing the time-frequency image into a color image of size N×N, and generating enough images to be divided into a training set and a test set; for the training process, the AlexNet feature extraction module is executed; for the test process, execution jumps to the test module;
an AlexNet feature extraction module, used for inputting the N×N time-frequency images of the training set into the improved AlexNet model for training, and saving the model;
an LGBM fault diagnosis module, used for inputting the N×N time-frequency images of the training set into the trained AlexNet model, taking out the output of the penultimate fully-connected layer, and inputting that output into the LGBM model for training, the data dimension being sample_Num × 1000, where sample_Num is the number of samples and 1000 is the number of neurons of the second fully-connected layer of the AlexNet model;
and a test module, used for inputting the N×N time-frequency images of the test set into the trained AlexNet model, taking out the output of the second fully-connected layer of the AlexNet model as the features extracted by AlexNet, and inputting these features into the trained LGBM model, the output of the LGBM model being the fault diagnosis result.
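The overlapped sampling performed by the signal sampling module can be sketched as a sliding window. The function name and the toy signal are hypothetical; `sample_length` and `sample_interval` are the parameters named in the claim.

```python
def overlapped_samples(signal, sample_length, sample_interval):
    """Cut a 1-D signal into windows of sample_length points.

    Consecutive windows start sample_interval points apart, so they overlap
    whenever sample_interval < sample_length. A trailing remainder shorter
    than sample_length is discarded.
    """
    last_start = len(signal) - sample_length
    return [
        signal[start:start + sample_length]
        for start in range(0, last_start + 1, sample_interval)
    ]

# Toy vibration record of 10 points, windows of 4 points taken every 2 points.
samples = overlapped_samples(list(range(10)), sample_length=4, sample_interval=2)
```

For a record of length L this yields (L - sample_length) // sample_interval + 1 samples when L ≥ sample_length, which is how overlapped sampling produces more training images than non-overlapping segmentation of the same record.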
9. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of:
(1) signal sampling: taking every sample_length consecutive data points of the original vibration data as one sample, and sampling continuously in an overlapped manner with the sampling interval sample_interval;
(2) Morlet continuous wavelet transform signal processing: performing a continuous wavelet transform on each sample to generate a corresponding time-frequency image, resizing the time-frequency image into a color image of size N×N, and generating enough images to be divided into a training set and a test set; for the training process, executing step (3); for the test process, jumping to step (5);
(3) AlexNet feature extraction: inputting the N×N time-frequency images of the training set into the improved AlexNet model for training, and saving the model;
(4) LGBM fault diagnosis: inputting the N×N time-frequency images of the training set into the trained AlexNet model, taking out the output of the penultimate fully-connected layer, and inputting that output into the LGBM model for training, the data dimension being sample_Num × 1000, where sample_Num is the number of samples and 1000 is the number of neurons of the second fully-connected layer of the AlexNet model;
(5) testing process: inputting the N×N time-frequency images of the test set into the trained AlexNet model, taking out the output of the second fully-connected layer of the AlexNet model as the features extracted by AlexNet, and inputting these features into the trained LGBM model, the output of the LGBM model being the fault diagnosis result.
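The Morlet continuous wavelet transform of step (2) can be sketched directly in NumPy. This is an illustrative implementation under stated assumptions, not the patented one: the centre frequency `w0 = 6`, the truncation of each wavelet at ±4 scales, the 1/√s normalization and the test tone are all choices made for the example. Resizing the scalogram to an N×N color image is left out.

```python
import numpy as np

def morlet(t, w0=6.0):
    """Complex Morlet wavelet at unit scale (without the small correction term)."""
    return np.pi ** -0.25 * np.exp(1j * w0 * t) * np.exp(-0.5 * t ** 2)

def cwt_morlet(x, scales, w0=6.0):
    """Magnitude of the Morlet CWT; rows are scales, columns are time."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    out = np.empty((len(scales), n), dtype=complex)
    for i, s in enumerate(scales):
        half = int(min(4 * s, (n - 1) // 2))    # truncate the wavelet support
        t = np.arange(-half, half + 1) / s
        psi = morlet(t, w0) / np.sqrt(s)
        # Correlating x with conj(psi) equals convolving with its reversal.
        out[i] = np.convolve(x, np.conj(psi)[::-1], mode="same")
    return np.abs(out)

# Hypothetical sample: a pure tone at 0.05 cycles per data point.
n_points = 512
tone = np.sin(2.0 * np.pi * 0.05 * np.arange(n_points))
scales = np.arange(4, 41)
scalogram = cwt_morlet(tone, scales)
# Strongest scale, measured away from the edges of the record.
peak_scale = scales[np.argmax(scalogram[:, 160:-160].mean(axis=1))]
```

For this wavelet a tone of frequency f (in cycles per data point) responds most strongly near scale w0 / (2πf), about 19 here, so each row of the scalogram can be read as a frequency band of the time-frequency image.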
10. An information data processing terminal, characterized in that it is used to implement the bearing fault diagnosis system according to claim 8.
CN202110997171.6A 2021-08-27 2021-08-27 Bearing fault diagnosis method, system, equipment and terminal Active CN113834656B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110997171.6A CN113834656B (en) 2021-08-27 2021-08-27 Bearing fault diagnosis method, system, equipment and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110997171.6A CN113834656B (en) 2021-08-27 2021-08-27 Bearing fault diagnosis method, system, equipment and terminal

Publications (2)

Publication Number Publication Date
CN113834656A true CN113834656A (en) 2021-12-24
CN113834656B CN113834656B (en) 2024-04-30

Family

ID=78961351

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110997171.6A Active CN113834656B (en) 2021-08-27 2021-08-27 Bearing fault diagnosis method, system, equipment and terminal

Country Status (1)

Country Link
CN (1) CN113834656B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114282579A (en) * 2021-12-30 2022-04-05 浙大城市学院 Aviation bearing fault diagnosis method based on variational modal decomposition and residual error network
CN114609994A (en) * 2022-02-24 2022-06-10 天津大学 Fault diagnosis method and device based on multi-granularity regularization rebalance incremental learning
CN114646468A (en) * 2022-02-28 2022-06-21 南京航空航天大学 Subway wheel bearing fault diagnosis method based on small samples
CN114692694A (en) * 2022-04-11 2022-07-01 合肥工业大学 Equipment fault diagnosis method based on feature fusion and integrated clustering
CN114964476A (en) * 2022-05-27 2022-08-30 中国石油大学(北京) Fault diagnosis method, device and equipment for oil and gas pipeline system power equipment
CN115017121A (en) * 2022-08-05 2022-09-06 山东天意机械股份有限公司 Concrete production equipment data storage system
CN116434029A (en) * 2023-06-15 2023-07-14 西南石油大学 Drinking detection method
CN116577061A (en) * 2023-07-14 2023-08-11 常州市建筑科学研究院集团股份有限公司 Detection method for wind resistance of metal roof, computer equipment and medium
CN117171625A (en) * 2023-10-23 2023-12-05 云和恩墨(北京)信息技术有限公司 Intelligent classification method and device for working conditions, electronic equipment and storage medium
CN117686226A (en) * 2024-02-04 2024-03-12 南京凯奥思数据技术有限公司 Automatic bearing fault diagnosis method and system based on energy ratio and energy sum

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160041070A1 (en) * 2014-08-05 2016-02-11 01dB-METRAVIB, Société par Actions Simplifiée Automatic Rotating-Machine Fault Diagnosis With Confidence Level Indication
CN107179194A (en) * 2017-06-30 2017-09-19 安徽工业大学 Rotating machinery fault etiologic diagnosis method based on convolutional neural networks
CN108426713A (en) * 2018-02-26 2018-08-21 成都昊铭科技有限公司 Rolling bearing Weak fault diagnostic method based on wavelet transformation and deep learning
CN111274911A (en) * 2020-01-17 2020-06-12 河海大学 Dense fog monitoring method based on wireless microwave attenuation characteristic transfer learning
US20200209109A1 (en) * 2018-12-28 2020-07-02 Shanghai United Imaging Intelligence Co., Ltd. Systems and methods for fault diagnosis
CN111442926A (en) * 2020-01-11 2020-07-24 哈尔滨理工大学 Fault diagnosis method for rolling bearings of different models under variable load based on deep characteristic migration
CN111504675A (en) * 2020-04-14 2020-08-07 河海大学 On-line diagnosis method for mechanical fault of gas insulated switchgear
US20200302234A1 (en) * 2019-03-22 2020-09-24 Capital One Services, Llc System and method for efficient generation of machine-learning models
CN111721536A (en) * 2020-07-20 2020-09-29 哈尔滨理工大学 Rolling bearing fault diagnosis method for improving model migration strategy
CN112036435A (en) * 2020-07-22 2020-12-04 温州大学 Brushless direct current motor sensor fault detection method based on convolutional neural network
US20210020360A1 (en) * 2019-07-15 2021-01-21 Wuhan University Internal thermal fault diagnosis method of oil-immersed transformer based on deep convolutional neural network and image segmentation
US20210065065A1 (en) * 2019-09-03 2021-03-04 Palo Alto Research Center Incorporated Method for classification based diagnosis with partial system model information
CN112733612A (en) * 2020-12-18 2021-04-30 华中科技大学 Cross-domain rotating machinery fault diagnosis model establishing method and application thereof
CN113159218A (en) * 2021-05-12 2021-07-23 北京联合大学 Radar HRRP multi-target identification method and system based on improved CNN

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
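李向伟 (Li Xiangwei) et al.: "Transient stability assessment of power systems based on bidirectional long short-term memory networks and convolutional neural networks", 《科学技术与工程》 (Science Technology and Engineering), vol. 20, no. 7 *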

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114282579A (en) * 2021-12-30 2022-04-05 浙大城市学院 Aviation bearing fault diagnosis method based on variational modal decomposition and residual error network
CN114609994A (en) * 2022-02-24 2022-06-10 天津大学 Fault diagnosis method and device based on multi-granularity regularization rebalance incremental learning
CN114609994B (en) * 2022-02-24 2023-11-07 天津大学 Fault diagnosis method and device based on multi-granularity regularized rebalancing increment learning
CN114646468A (en) * 2022-02-28 2022-06-21 南京航空航天大学 Subway wheel bearing fault diagnosis method based on small samples
CN114646468B (en) * 2022-02-28 2022-12-23 南京航空航天大学 Subway wheel bearing fault diagnosis method based on small samples
CN114692694A (en) * 2022-04-11 2022-07-01 合肥工业大学 Equipment fault diagnosis method based on feature fusion and integrated clustering
CN114692694B (en) * 2022-04-11 2024-02-13 合肥工业大学 Equipment fault diagnosis method based on feature fusion and integrated clustering
CN114964476B (en) * 2022-05-27 2023-08-22 中国石油大学(北京) Fault diagnosis method, device and equipment for oil and gas pipeline system moving equipment
CN114964476A (en) * 2022-05-27 2022-08-30 中国石油大学(北京) Fault diagnosis method, device and equipment for oil and gas pipeline system power equipment
CN115017121A (en) * 2022-08-05 2022-09-06 山东天意机械股份有限公司 Concrete production equipment data storage system
CN115017121B (en) * 2022-08-05 2022-10-25 山东天意机械股份有限公司 Data storage system of concrete production equipment
CN116434029A (en) * 2023-06-15 2023-07-14 西南石油大学 Drinking detection method
CN116434029B (en) * 2023-06-15 2023-08-18 西南石油大学 Drinking detection method
CN116577061B (en) * 2023-07-14 2023-09-15 常州市建筑科学研究院集团股份有限公司 Detection method for wind resistance of metal roof, computer equipment and medium
CN116577061A (en) * 2023-07-14 2023-08-11 常州市建筑科学研究院集团股份有限公司 Detection method for wind resistance of metal roof, computer equipment and medium
CN117171625A (en) * 2023-10-23 2023-12-05 云和恩墨(北京)信息技术有限公司 Intelligent classification method and device for working conditions, electronic equipment and storage medium
CN117171625B (en) * 2023-10-23 2024-02-06 云和恩墨(北京)信息技术有限公司 Intelligent classification method and device for working conditions, electronic equipment and storage medium
CN117686226A (en) * 2024-02-04 2024-03-12 南京凯奥思数据技术有限公司 Automatic bearing fault diagnosis method and system based on energy ratio and energy sum
CN117686226B (en) * 2024-02-04 2024-04-16 南京凯奥思数据技术有限公司 Automatic bearing fault diagnosis method and system based on energy ratio and energy sum

Also Published As

Publication number Publication date
CN113834656B (en) 2024-04-30

Similar Documents

Publication Publication Date Title
CN113834656B (en) Bearing fault diagnosis method, system, equipment and terminal
Solanki et al. Music instrument recognition using deep convolutional neural networks
CN108231201B (en) Construction method, system and application method of disease data analysis processing model
CN110728360B (en) Micro-energy device energy identification method based on BP neural network
WO2022121289A1 (en) Methods and systems for mining minority-class data samples for training neural network
Corizzo et al. Scalable auto-encoders for gravitational waves detection from time series data
CN111860982A (en) Wind power plant short-term wind power prediction method based on VMD-FCM-GRU
JP2019207685A (en) Method, device and system for estimating causal relation between observation variables
CN109993236A (en) Few sample language of the Manchus matching process based on one-shot Siamese convolutional neural networks
US11830521B2 (en) Voice activity detection method and system based on joint deep neural network
CN114169110B (en) Motor bearing fault diagnosis method based on feature optimization and GWAA-XGboost
CN115290326A (en) Rolling bearing fault intelligent diagnosis method
CN113780160A (en) Electric energy quality disturbance signal classification method and system
CN116819423A (en) Method and system for detecting abnormal running state of gateway electric energy metering device
CN113792879A (en) Case reasoning attribute weight adjusting method based on introspection learning
CN113420870A (en) U-Net structure generation countermeasure network and method for underwater acoustic target recognition
Garcia-Cardona et al. Structure prediction from neutron scattering profiles: A data sciences approach
CN112884093B (en) Rotary machine fault diagnosis method and equipment based on DSCRN model and storage medium
Cai et al. Inductive Conformal Out-of-distribution Detection based on Adversarial Autoencoders
Vidnerová et al. Kernel Function Tuning for Single-Layer Neural Networks
JP7310937B2 (en) Abnormality degree calculation device, abnormal sound detection device, methods and programs thereof
Daneshfar et al. Speech Emotion Recognition System by Quaternion Nonlinear Echo State Network
Garcia-Cardona et al. Structure prediction from scattering profiles: A neutron-scattering use-case
CN117312920A (en) Weighting integration unbalance classification method, system, storage medium, equipment and terminal
CN117523278A (en) Semantic attention element learning method based on Bayesian estimation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant