CN113834656A - Bearing fault diagnosis method, system, equipment and terminal - Google Patents
- Publication number
- CN113834656A (CN202110997171.6A)
- Authority
- CN
- China
- Prior art keywords
- model
- layer
- alexnet
- sample
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01M—TESTING STATIC OR DYNAMIC BALANCE OF MACHINES OR STRUCTURES; TESTING OF STRUCTURES OR APPARATUS, NOT OTHERWISE PROVIDED FOR
- G01M13/00—Testing of machine parts
- G01M13/04—Bearings
- G01M13/045—Acoustic or vibration analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention belongs to the technical field of bearing fault diagnosis and discloses a bearing fault diagnosis method, system, equipment and terminal. The bearing fault diagnosis method comprises the following steps: extracting time-frequency features from the original vibration signal of the bearing by continuous wavelet transform and converting them into a two-dimensional image of 32 × 32 pixels; extracting fault features from the time-frequency image with an improved AlexNet model; and, for fault diagnosis classification, using the LGBM classification algorithm with model parameters selected by Bayesian optimization. In an experimental comparison with 7 other methods, the proposed method achieves the highest accuracy, 99.712%; predicting 1800 samples takes 1.47 seconds, the same order of magnitude as the other models; and the variance of the prediction accuracy over five runs is only 0.063, which is more stable than 6 of the other methods. The method therefore has the best overall performance among the compared methods.
Description
Technical Field
The invention belongs to the technical field of bearing fault diagnosis, and particularly relates to a bearing fault diagnosis method, system, equipment and terminal.
Background
Effective fault diagnosis of mechanical equipment can reduce the large economic losses incurred in industrial production. In recent years the application of machine learning and deep learning techniques has grown considerably, and advanced measurement techniques allow large amounts of data to be collected in industrial environments. Against this big-data background, machine learning and deep learning fault diagnosis models such as the deep neural network (DNN), the convolutional neural network (CNN) and the recurrent neural network have shown excellent results.
At present, autoencoders and convolutional neural networks are the most common deep learning fault diagnosis models. Lei et al. proposed a deep neural network for rotating machinery fault diagnosis based on frequency-domain data. Zong et al. proposed a denoising autoencoder for bearing fault diagnosis based on frequency-domain data. Wei et al. proposed a one-dimensional CNN for bearing fault diagnosis from raw time signals, which performs well in noisy environments. Guo X. et al. proposed a hierarchical adaptive deep CNN for bearing fault diagnosis that converts the raw time signal into a 32 × 32 matrix as input. Wang Q. et al. proposed a CNN-based bearing reliability assessment and remaining-life prediction method that converts frequency-domain signals into a 32 × 32 matrix as input. Wang J. et al. proposed a generic bearing fault diagnosis model transferred from the well-known AlexNet model and compared the effects of eight time-frequency feature extraction methods. Wang L. H. et al. proposed a CNN for motor fault diagnosis that converts the fault signal into a time-frequency image using the short-time Fourier transform (STFT). Claessens et al. proposed a locally connected network for bearing fault diagnosis consisting of normalized sparse autoencoders. Eren et al. used a one-dimensional convolutional neural network for time series prediction, with data preprocessing by filtering, decimating and normalizing the input data to achieve better efficiency. Ran et al. claim that time series prediction using a DNN achieves high accuracy but provide no architectural details for the proposed DNN. The same problem occurs in the work of Mao et al., who claim high accuracy with a new deep learning approach but report only training accuracy (not test accuracy) and give no workable architecture for the proposed network, which makes the results difficult to reproduce.
More recent articles focus on both CNNs and long short-term memory (LSTM) networks for bearing fault diagnosis. However, the step-by-step construction of the proposed models is not explicitly explained. A new bearing fault diagnosis method is therefore needed to overcome the shortcomings of existing methods.
Through the above analysis, the problems and defects of the prior art are as follows:
(1) In existing bearing fault diagnosis methods, no architectural details are provided for the proposed DNN.
(2) In existing bearing fault diagnosis methods, only training accuracy (not test accuracy) is reported and no workable architecture is given for the proposed network, which makes reproduction difficult.
(3) In existing schemes that use both CNN and the long short-term memory network (LSTM) for bearing fault diagnosis, the step-by-step construction of the model is not clearly explained.
The difficulty in solving the above problems and defects is:
(1) Many DNN models are deep and structurally complex.
(2) When a model is trained and tested, the test accuracy is generally lower than the training accuracy. Reporting the training accuracy gives a higher figure, but it demonstrates the quality of the method less convincingly than the test accuracy does.
(3) A model is sometimes arrived at by repeatedly adjusting it in response to result feedback, so its construction process is difficult to explain.
The significance of solving the problems and the defects is as follows:
(1) Regarding the first problem: given the architectural details of the DNN, the same network model can be built directly with a deep learning tool, so that the resulting model can be used directly for fault diagnosis.
(2) Regarding the second problem: reporting test accuracy better demonstrates the advantages and effect of the method, and providing a workable architecture for the proposed network makes it easier to reproduce.
(3) Regarding the third problem: explaining the step-by-step model building process makes the diagnosis method more interpretable, makes its principle clearer to study, and gives clearer guidance for improving it.
Disclosure of Invention
In view of the problems in the prior art, the invention provides a bearing fault diagnosis method, system, equipment and terminal, and in particular a bearing fault diagnosis method, system, equipment and terminal based on the continuous wavelet transform (CWT) and an AlexNet and light gradient boosting machine fusion model (AlexNet-LGBM).
The invention is realized in such a way that a bearing fault diagnosis method comprises the following steps:
first, extracting time-frequency features from the original vibration signal of the bearing by continuous wavelet transform and converting them into a two-dimensional image of 32 × 32 pixels; second, extracting fault features from the time-frequency image with an improved AlexNet model; and finally, for fault diagnosis classification, using the LGBM classification algorithm with optimal model parameters selected by Bayesian optimization.
Further, the bearing fault diagnosis method comprises the following steps:
Step one, signal sampling: every sample_length consecutive data points of the original vibration data are taken as one sample, and sampling proceeds continuously at the sampling interval sample_interval in an overlapped manner. Segmenting the original signal produces samples of a suitable size for the subsequent steps, and it also yields more samples for training and testing, which improves the accuracy of the model.
Step two, Morlet continuous wavelet transform signal processing: continuous wavelet transform is applied to each sample to generate a corresponding time-frequency image, each time-frequency image is resized to a color image of size N × N, and enough images are generated to be divided into a training set and a test set; the training process continues with step three, and the test process jumps to step five. This step has two main functions: (1) processing the one-dimensional signal with the Morlet continuous wavelet transform to extract its time-domain and frequency-domain features; and (2) converting the one-dimensional signal into a two-dimensional image for training the subsequent model and for further feature extraction.
Step three, AlexNet feature extraction: the N × N time-frequency images of the training set are input into the improved AlexNet model for training, and the model is saved. The main function of this step is to train the improved AlexNet feature extraction model and tune its hyper-parameters so that it has the best feature extraction capability; the model parameters are then saved for the subsequent test stage.
Step four, LGBM fault diagnosis: the N × N time-frequency images of the training set are input into the trained AlexNet model, the output of the penultimate layer (the second fully connected layer) is taken out and input into the LGBM model for training, and the data dimension is sample_Num × 1000, where sample_Num is the number of samples and 1000 is the number of neurons in the second fully connected layer of the AlexNet model. The main function of this step is to train the LGBM model: the fault features extracted by the AlexNet model are input into the LGBM to train the final fault classifier.
Step five, the testing process: the N × N time-frequency images of the test set are input into the trained AlexNet model, the output of the second fully connected layer is taken out as the features extracted by AlexNet and input into the trained LGBM model, and the output of the LGBM model is the fault diagnosis result. The main function of this step is to obtain the final fault diagnosis classification result.
Further, in step one, the signal sampling includes:
selecting sample_length consecutive data points from the original vibration signal as one original sample; generating the corresponding time-frequency image from these sample_length points by continuous wavelet transform; resizing the time-frequency image to a suitable size N × N; then selecting, in an overlapped manner, the sample_length consecutive data points that start sample_interval points later as the next sample and generating another N × N image; and repeating this process until sufficient training and test images have been generated.
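The overlapped sampling described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `overlapped_windows` and the example values are hypothetical, and it assumes that window k starts at index k × sample_interval.

```python
def overlapped_windows(signal, sample_length, sample_interval):
    """Split a 1-D signal into overlapping windows.

    Window k covers signal[k * sample_interval : k * sample_interval + sample_length].
    Trailing points that do not fill a whole window are dropped.
    """
    if sample_length <= 0 or sample_interval <= 0:
        raise ValueError("sample_length and sample_interval must be positive")
    last_start = len(signal) - sample_length
    return [
        signal[start:start + sample_length]
        for start in range(0, last_start + 1, sample_interval)
    ]


# Hypothetical values, chosen only for illustration.
signal = list(range(20))
windows = overlapped_windows(signal, sample_length=8, sample_interval=4)
```

With sample_interval smaller than sample_length, consecutive windows share points, which is what lets a recording of fixed length yield more training and test samples.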
Further, in step two, the Morlet continuous wavelet transform signal processing includes:
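the continuous wavelet transform of the signal x(t) with respect to the wavelet function ψ(t) is given by:
$$T(a,b)=\frac{1}{\sqrt{a}}\int_{-\infty}^{+\infty}x(t)\,\psi^{*}\!\left(\frac{t-b}{a}\right)\mathrm{d}t \qquad (1)$$
where a is the scale parameter, b is the translation (time) parameter, and ψ* denotes the complex conjugate of ψ.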
Among the different wavelets, a complex (analytic) wavelet is one whose Fourier transform is zero for negative frequencies. With such a complex wavelet, the phase and amplitude components of the signal can be separated. The Morlet wavelet is the most commonly used complex wavelet; continuous wavelet analysis with the complex Morlet wavelet allows information to be separated in the wavelet domain and simplifies the relationship between transform ridges and instantaneous frequency. The bearing vibration signal is processed using the Morlet wavelet, defined as:
$$\psi(t)=\pi^{-1/4}\left(e^{i2\pi f_0 t}-e^{-(2\pi f_0)^2/2}\right)e^{-t^2/2} \qquad (2)$$
where $f_0$ is the center frequency of the mother wavelet. The second term in the brackets is the correction term, which corrects for the non-zero mean of the complex sinusoid multiplied by the Gaussian term. In practice the correction term is negligible for $f_0 \gg 0$ and is ignored, in which case the Morlet wavelet is written as:
$$\psi(t)=\pi^{-1/4}\,e^{i2\pi f_0 t}\,e^{-t^2/2} \qquad (3)$$
That is, the Morlet wavelet is simply a complex sinusoid $e^{i2\pi f_0 t}$ within a Gaussian envelope $e^{-t^2/2}$; the $\pi^{-1/4}$ term is a normalization factor that ensures the wavelet has unit energy.
The Fourier transform of the Morlet wavelet is:
$$\hat{\psi}(f)=\pi^{1/4}\sqrt{2}\,e^{-\frac{1}{2}(2\pi f-2\pi f_0)^2} \qquad (4)$$
This expression has the form of a Gaussian function shifted along the frequency axis by $f_0$. The center frequency of this Gaussian spectrum is typically taken as the characteristic frequency of the Morlet wavelet. The characteristic frequency is set for the mother wavelet and varies with the wavelet scale a as follows:
$$f=\frac{f_0}{a} \qquad (5)$$
The energy spectrum, i.e. the squared magnitude of the Fourier transform, is:
$$|\hat{\psi}(f)|^2=2\pi^{1/2}\,e^{-(2\pi f-2\pi f_0)^2} \qquad (6)$$
Integrating the energy spectrum over frequency gives a Morlet wavelet energy of 1, consistent with the normalization in equation (3).
Continuous wavelet transformation thus converts the one-dimensional vibration signal into an image that encodes the correspondence between time and frequency.
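As a rough illustration of the transform described above, the following NumPy sketch evaluates the CWT by direct summation with the simplified Morlet wavelet of equation (3). Several points are assumptions rather than details from the source: the 1/√a weighting, the choice f0 = 1, the block-averaging resize (a stand-in for a proper image resize, with no color mapping), and the 50 Hz test tone. The helper names are hypothetical.

```python
import numpy as np


def morlet(t, f0=1.0):
    """Simplified complex Morlet wavelet (correction term omitted)."""
    return np.pi ** -0.25 * np.exp(2j * np.pi * f0 * t) * np.exp(-t ** 2 / 2)


def morlet_cwt(x, scales, dt=1.0, f0=1.0):
    """Direct CWT: T(a, b) = a^(-1/2) * sum_t x(t) * conj(psi((t - b) / a)) * dt."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x)) * dt
    out = np.empty((len(scales), len(x)), dtype=complex)
    for row, a in enumerate(scales):
        # kernel[b, t] = conj(psi((t - b) / a))
        kernel = np.conj(morlet((t[None, :] - t[:, None]) / a, f0))
        out[row] = kernel @ x * dt / np.sqrt(a)
    return out


def block_mean_resize(img, n):
    """Shrink a 2-D array to n x n by averaging blocks (crude resize)."""
    rows = np.array_split(np.arange(img.shape[0]), n)
    cols = np.array_split(np.arange(img.shape[1]), n)
    return np.array([[img[np.ix_(r, c)].mean() for c in cols] for r in rows])


# Hypothetical test signal: a 50 Hz tone sampled at 1 kHz.
fs = 1000.0
t = np.arange(256) / fs
x = np.sin(2 * np.pi * 50.0 * t)
freqs = np.linspace(10.0, 200.0, 64)   # analysis frequencies in Hz
scales = 1.0 / freqs                   # f = f0 / a with f0 = 1
scalogram = np.abs(morlet_cwt(x, scales, dt=1.0 / fs)) ** 2
image = block_mean_resize(scalogram, 32)          # 32 x 32 time-frequency map
peak_freq = freqs[np.argmax(scalogram.mean(axis=1))]
```

Direct summation costs O(n²) per scale and is used here only for clarity; an FFT-based implementation or a wavelet library would be the practical choice for long recordings.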
Further, in step three, the AlexNet feature extraction includes:
AlexNet was modified as follows:
(1) Model input dimension: the 224 × 224 input image of the classical AlexNet is unnecessarily large for bearing fault diagnosis based on vibration signals. If the vibration signal of a bearing is acquired at a high sampling frequency, the images generated by wavelet-transforming all samples occupy a large amount of storage, so a color image of size 32 × 32 is adopted as input.
(2) Convolutional layer activation function: the ReLU function f(z) = max(0, z) has a limitation. Its gradient in an iterative update is:
$$\frac{\partial f(z)}{\partial z}=\begin{cases}1, & z>0\\ 0, & z\le 0\end{cases} \qquad (7)$$
so a neuron with z ≤ 0 receives zero gradient and is not updated. A variant of ReLU, PReLU, is therefore used, expressed as:
$$f(z)=\begin{cases}z, & z>0\\ az, & z\le 0\end{cases} \qquad (8)$$
PReLU differs from ReLU in that for z < 0 its value is a linear function with slope a, and the gradient in an update is:
$$\frac{\partial f(z)}{\partial z}=\begin{cases}1, & z>0\\ a, & z\le 0\end{cases} \qquad (9)$$
The value of a is updated continuously by back-propagation and is optimized iteratively together with the weights and biases of the network.
(3) Fully connected layers and output layer: bearing fault diagnosis here involves 1 normal class and 3 fault classes, i.e. a four-class problem, so the output layer of the improved AlexNet structure has size 4; because the output layer is smaller, the size of the second fully connected layer is set to 1000.
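The PReLU activation of modification (2) can be sketched for a single scalar input as follows. The function names are hypothetical, and the sketch returns both the derivative with respect to the input z and the derivative with respect to the learnable slope a, since back-propagation needs the latter to update a.

```python
def prelu(z, a):
    """PReLU activation: z for z > 0, a * z otherwise."""
    return z if z > 0 else a * z


def prelu_grads(z, a):
    """Return (df/dz, df/da) at input z for slope parameter a."""
    if z > 0:
        return 1.0, 0.0
    return a, z


# ReLU is the special case a = 0, where the gradient vanishes for z <= 0.
relu_like = prelu(-2.0, 0.0)
leaky = prelu(-2.0, 0.25)
```

Because df/dz equals a rather than 0 on the negative side, units with negative input still pass gradient backwards.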
Further, the improved AlexNet structure comprises:
(1) convolutional layer
The convolutional layer is connected to the previous layer through local connections and weight sharing. The convolution operation is:
$$h_j=f\Big(\sum_i X_i * W_{ij}+b_j\Big) \qquad (10)$$
where $h_j$ is the jth output feature map of the current convolutional layer; $X_i$ is the ith output feature map of the previous layer, i.e. the input to the current convolutional layer; $*$ denotes the convolution operation; the parameter matrix $W_{ij}$ is the convolution kernel that maps the ith input feature map to the jth output feature map of the current layer; $b_j$ is the bias of the jth output feature map of the current layer; and $f(\cdot)$ is the non-linear activation function, here the PReLU function of equation (8).
(2) Pooling layer
The pooling layer downsamples the result of the convolution operation and further reduces the dimension of the extracted features. The pooling layer uses max pooling: the maximum values are extracted from the convolution output layer $Y_{cn}$ as follows:
$$P_{cn}=\max_{S_{M\times N}}\left(Y_{cn}\right) \qquad (11)$$
where $S_{M\times N}$ is the pooling scale matrix and M and N are the dimensions of S. During pooling, S slides over $Y_{cn}$ with a fixed stride until the whole of $Y_{cn}$ has been scanned. If S is a 3 × 3 matrix, $Y_{cn}$ is reduced to 1/9 of its size and the result is assigned to $P_{cn}$ in the pooling output layer.
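A small sketch of the max pooling step follows. It assumes non-overlapping windows (stride equal to the window size), which is what the stated reduction to 1/9 for a 3 × 3 window implies; the function name and the 6 × 6 example map are hypothetical.

```python
def max_pool2d(feature_map, pool=3, stride=3):
    """Max pooling over a 2-D list of numbers; partial windows at the edges are dropped."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[r + dr][c + dc]
                for dr in range(pool) for dc in range(pool))
            for c in range(0, cols - pool + 1, stride)
        ]
        for r in range(0, rows - pool + 1, stride)
    ]


# Hypothetical 6 x 6 feature map holding the values 0..35 row by row.
y = [[r * 6 + c for c in range(6)] for r in range(6)]
p = max_pool2d(y)  # 2 x 2 result: 36 values reduced to 4, i.e. 1/9
```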
(3) Full connection layer
After the last convolutional layer and pooling layer, the data reach the Flatten layer, where they are flattened into one dimension, and then pass through two fully connected layers; each neuron in a fully connected layer is connected to all neurons in the previous layer. Dropout with a drop rate of 0.5 is applied to the outputs of the two fully connected layers: a random subset of units is discarded and not updated in each iteration, so the structure of the network changes after every iteration. This is equivalent to ensemble learning over networks of many different structures, and jointly averaging many networks effectively prevents overfitting.
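The Dropout operation can be illustrated with the sketch below. It uses the inverted-dropout convention, in which surviving activations are scaled by 1/(1 − rate) during training so that no rescaling is needed at test time; this is how the Keras Dropout layer behaves, but the function itself is a hypothetical stand-alone illustration.

```python
import random


def dropout(activations, rate=0.5, rng=None, training=True):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# With rate 0.5, roughly half of the units are zeroed and the rest are doubled.
out = dropout([1.0] * 1000, rate=0.5, rng=random.Random(0))
```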
The last layer is the output layer. For multi-class fault classification, a Softmax classifier is used. Let an input image in the training data set be denoted by $x$ and its label by $y$, where $y\in\{1,2,\ldots,J\}$ is the fault class. For each $x$, Softmax estimates the probability $p(y=j\mid x)$ of every label $j\in\{1,2,\ldots,J\}$. The Softmax activation function is expressed as follows:
$$p(y=j\mid x;\theta)=\frac{\exp(\theta_j^{T}x)}{\sum_{i=1}^{J}\exp(\theta_i^{T}x)} \qquad (12)$$
where $\theta$ is the weight matrix of the Softmax layer and $\theta_i$ is its ith row vector.
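A minimal sketch of the Softmax function follows, taking the scores θ_j^T x as an already computed list. Subtracting the maximum score before exponentiating is a standard numerical-stability step that does not change the result; it is an implementation choice, not something stated in the source.

```python
import math


def softmax(scores):
    """Turn a list of class scores into probabilities that sum to 1."""
    m = max(scores)                       # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for the four classes (1 normal, 3 fault types).
probs = softmax([2.0, 0.5, -1.0, 0.1])
predicted_class = probs.index(max(probs))
```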
(4) Parameter updating
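To suit the multi-class fault diagnosis task, the loss function is the cross-entropy loss with L2 regularization, expressed as:
$$E=-\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{J}y_k^{(i)}\log\hat{y}_k^{(i)}+\frac{\lambda}{2}\sum_{l}\lVert W^{(l)}\rVert_F^2 \qquad (13)$$
where m is the number of samples; $\hat{y}_k^{(i)}$ is the predicted probability that the ith sample belongs to class k; $y_k^{(i)}$ is the actual probability, equal to 1 if the true class of the ith sample is k and 0 otherwise; and $W^{(l)}$ is the parameter matrix of the lth layer. The first term measures the cross entropy between the prediction $\hat{y}$ and the true class $y$: it is smallest, and the loss function is therefore minimal, when the predicted values equal the true values. The second term is the L2 regularization term, and the coefficient λ is the weight decay parameter.
Model training uses stochastic gradient descent. In each iteration the parameters W and the biases b are updated as:
$$W^{(l)}\leftarrow W^{(l)}-\alpha\frac{\partial E}{\partial W^{(l)}},\qquad b^{(l)}\leftarrow b^{(l)}-\alpha\frac{\partial E}{\partial b^{(l)}} \qquad (14)$$
where α is the learning rate, which controls the size of the gradient step in each iteration. Let $z^{(l)}=W^{(l)}h^{(l-1)}+b^{(l)}$ be the weighted input of layer l and $h^{(l)}=f(z^{(l)})$ its output. The residual of the loss function at the jth node of the lth layer is written $\delta_j^{(l)}=\partial E/\partial z_j^{(l)}$, and it satisfies the recurrence:
$$\delta_j^{(l)}=\Big(\sum_{k}W_{kj}^{(l+1)}\delta_k^{(l+1)}\Big)f'\big(z_j^{(l)}\big) \qquad (15)$$
For a single sample, the gradient of the loss function with respect to the parameters is:
$$\frac{\partial E}{\partial W_{jk}^{(l)}}=\delta_j^{(l)}h_k^{(l-1)}+\lambda W_{jk}^{(l)},\qquad \frac{\partial E}{\partial b_j^{(l)}}=\delta_j^{(l)} \qquad (16)$$
In formula (13), $y_k$ is 1 for exactly one class k and 0 for the rest. Let the true class be $\tilde{k}$. The cross-entropy term for one sample then reduces to:
$$E=-\log\hat{y}_{\tilde{k}} \qquad (17)$$
From the Softmax activation function of formula (12), the residual of the last layer L is:
$$\delta_k^{(L)}=\hat{y}_k-y_k \qquad (18)$$
The residuals of the other layers, $\delta^{(L-1)},\ldots,\delta^{(1)}$, are computed from the recurrence formula (15).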
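The last-layer residual that results from combining Softmax with the cross-entropy loss is the predicted probability vector minus the one-hot label. The sketch below checks that identity numerically for one sample by comparing it with a central finite-difference gradient of −log ŷ for the true class. The function names and the example scores are hypothetical, and the regularization term is left out because it does not depend on the layer inputs.

```python
import math


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(z, true_class):
    """Per-sample loss -log(y_hat[true_class]) as a function of the scores z."""
    return -math.log(softmax(z)[true_class])


def output_residual(z, true_class):
    """Analytic residual of the output layer: y_hat - y (y is one-hot)."""
    return [p - (1.0 if k == true_class else 0.0)
            for k, p in enumerate(softmax(z))]


def numeric_grad(z, true_class, eps=1e-6):
    """Central finite-difference estimate of d(loss)/dz."""
    grads = []
    for k in range(len(z)):
        up, down = list(z), list(z)
        up[k] += eps
        down[k] -= eps
        grads.append((cross_entropy(up, true_class)
                      - cross_entropy(down, true_class)) / (2 * eps))
    return grads


z = [0.5, -1.0, 2.0, 0.1]          # hypothetical output-layer scores
analytic = output_residual(z, true_class=2)
numeric = numeric_grad(z, true_class=2)
```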
The bearing fault feature extraction model is built with the Python-based Keras deep learning framework using the TensorFlow backend. Keras's SGD optimizer, cross-entropy loss function and normalization method are used to train the parameters.
Further, in step four, the LGBM fault diagnosis uses gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB), including:
(1) Gradient-based one-side sampling algorithm. Instances with large gradients are all kept for training, while instances with small gradients are randomly sampled, and the effect of the sampling on the data distribution is compensated by a constant multiplier when the information gain is computed. The GOSS algorithm is as follows:
Input: training data I with n instances x_1, ..., x_n; number of iterations d; sampling rates a and b for large-gradient and small-gradient data; loss function loss; and weak learner L.
Output: a trained strong learner.
Step 1: initialization: let topN = a × len(I) denote the number of large-gradient samples; add L to the model list models, and set the weight w of every training instance to 1;
Step 2: predict the training data with the model list, compute the gradient g of the loss function loss for each instance, and sort the training data in descending order of g;
Step 3: take the first topN sorted instances as the large-gradient subset A, randomly draw b × |A^C| instances from the remaining set A^C as the small-gradient subset B, and denote the union of the two subsets by usedSet;
Step 4: multiply the weight w of each small-gradient sample by the coefficient (1 − a)/b;
Step 5: input the data I, negative gradients −g and weights w corresponding to usedSet into the learner L for training to obtain a new model newModel. Instances are split according to the estimated variance gain $\tilde{V}_j(d)$ of feature j at split point d over the subsets A and B:
$$\tilde{V}_j(d)=\frac{1}{n}\left(\frac{\big(\sum_{x_i\in A_l}g_i+\frac{1-a}{b}\sum_{x_i\in B_l}g_i\big)^2}{n_l^j(d)}+\frac{\big(\sum_{x_i\in A_r}g_i+\frac{1-a}{b}\sum_{x_i\in B_r}g_i\big)^2}{n_r^j(d)}\right)$$
where $A_l=\{x_i\in A: x_{ij}\le d\}$, $A_r=\{x_i\in A: x_{ij}>d\}$, $B_l=\{x_i\in B: x_{ij}\le d\}$, $B_r=\{x_i\in B: x_{ij}>d\}$, and $n_l^j(d)$ and $n_r^j(d)$ are the numbers of instances on the left and right of the split; the coefficient (1 − a)/b is used to normalize the sum of gradients over B back to the size of A^C. Add newModel to the model list models;
Step 6: repeat steps 2 to 5 until the number of iterations d is reached or the model converges.
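The sampling part of GOSS (steps 2 to 4 above) can be sketched as follows. This is an illustrative stand-alone function, not LightGBM's internal code: the name `goss_sample` is hypothetical, instances are ranked by the absolute value of the gradient (an assumption about what "descending order of g" means), and the small-gradient subset has size b × |A^C| as stated in step 3.

```python
import random


def goss_sample(gradients, a, b, rng=None):
    """Return (indices, weights) selected by gradient-based one-side sampling."""
    rng = rng or random.Random()
    n = len(gradients)
    # Rank instances by gradient magnitude, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top_n = int(a * n)
    large = order[:top_n]                  # subset A: always kept
    rest = order[top_n:]                   # remaining set A^C
    small = rng.sample(rest, int(b * len(rest)))   # subset B: random draw
    # Small-gradient instances are up-weighted by (1 - a) / b.
    weights = [1.0] * len(large) + [(1.0 - a) / b] * len(small)
    return large + small, weights


# Hypothetical per-instance gradients.
grads = [0.1, -5.0, 0.2, 3.0, -0.3, 0.05, 4.0, -0.02]
used_idx, used_w = goss_sample(grads, a=0.25, b=0.5, rng=random.Random(0))
```

The returned indices and weights are what a weak learner would then be trained on in step 5.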
(2) The exclusive feature bundling algorithm comprises two steps: bundle generation and exclusive feature merging.
The bundle generation algorithm determines which mutually exclusive features can be combined; features that can be combined are put together in what is called a bundle. Exclusive feature merging then combines each bundle into a single feature. The algorithm that determines which mutually exclusive features can be combined is Greedy Bundling: the features are taken as vertices and an edge is added between every two features that are not mutually exclusive, which reduces the optimal bundling problem to a graph coloring problem that is then solved with a greedy algorithm. Exclusive feature merging builds a feature bundle by letting the exclusive features reside in different bins, which is implemented simply by adding an offset to the values of the original features.
The output of the penultimate layer of AlexNet is classified using the LGBMClassifier class of the Python lightgbm package.
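The two steps of exclusive feature bundling can be illustrated with the simplified sketch below. It is not LightGBM's implementation: features are assumed to be already binned into integers with 0 as the default (empty) bin, the greedy pass visits features in their given order rather than by graph degree, and the names `greedy_bundles`, `merge_bundle` and `bin_counts` are hypothetical.

```python
def greedy_bundles(features, max_conflicts=0):
    """Group feature columns that are (almost) never non-zero in the same row."""
    def conflicts(col_a, col_b):
        return sum(1 for x, y in zip(col_a, col_b) if x != 0 and y != 0)

    bundles = []  # each bundle is a list of feature indices
    for j, col in enumerate(features):
        for bundle in bundles:
            if all(conflicts(col, features[k]) <= max_conflicts for k in bundle):
                bundle.append(j)
                break
        else:
            bundles.append([j])   # no compatible bundle found: start a new one
    return bundles


def merge_bundle(features, bundle, bin_counts):
    """Merge bundled columns into one by shifting each column's bins by an offset."""
    merged = [0] * len(features[bundle[0]])
    offset = 0
    for j in bundle:
        for i, value in enumerate(features[j]):
            if value != 0:
                merged[i] = value + offset
        offset += bin_counts[j]   # next feature uses the following bin range
    return merged


# Three hypothetical binned feature columns over four rows.
features = [[1, 0, 2, 0], [0, 3, 0, 1], [1, 1, 0, 0]]
bin_counts = [2, 3, 1]            # number of non-zero bins per feature
bundles = greedy_bundles(features)
merged = merge_bundle(features, bundles[0], bin_counts)
```

Features 0 and 1 are never non-zero in the same row, so they share a bundle; after merging, bins 1 to 2 belong to feature 0 and bins 3 to 5 to feature 1, so the original values remain recoverable.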
(3) Bayesian hyper-parameter optimization
The training process of the LGBM model is tuned with HyperOpt. HyperOpt provides an easy-to-use Bayesian hyper-parameter optimization algorithm that performs hyper-parameter optimization through sequential model-based optimization, which is a Bayesian optimization technique.
Bayesian optimization is a model-based optimization algorithm designed for a user-defined objective function: it searches for the maximum of an unknown objective function from which samples can be obtained. As with all model-based optimization algorithms, a regression method is used to build a model of the objective function, the next point to evaluate is selected according to that model, and the model is then updated.
The basic algorithm of bayesian optimization is as follows:
Step 1: place a Gaussian process prior on the objective function f;
Step 2: observe f at n_0 points chosen according to an initial space-filling experimental design, and set n = n_0;
Step 3: while n ≤ N, repeat: update the posterior probability distribution on f using all available data; let x_n be a maximizer of the acquisition function over x, where the acquisition function is computed using the current posterior distribution; observe y_n = f(x_n); increase n by 1;
Step 4: return a solution: either the evaluated point with the largest f(x), or the point with the largest posterior mean.
The objective function f is usually unknown. A Gaussian process defines a probability distribution over functions by assigning to each point x a Gaussian distribution for f(x), determined by the mean μ and the standard deviation σ:
$$f(x)\sim\mathcal{N}\big(\mu(x),\sigma^2(x)\big)$$
To estimate μ (x) and σ (x), a gaussian process is fitted to the data. Assuming that each observation f (χ) is a normally distributed sample, if there is a data set consisting of multiple observations, f (χ)1),f(χ2),...,f(χt) Then the vector [ f (χ) of the data set1),f(χ2),...,f(χt)]Is a sample of multivariate normal distribution defined by a mean vector and a covariance matrix, so the gaussian process is an n-variable normal distribution, where n is the number of observations. The covariance matrix is determined by a kernel function k (χ)1,χ2) By definition, samples at a distance are nearly uncorrelated, while samples in the vicinity are highly correlated. Two observations correspond to similar χ values based on a priori assumptions of the fact that the function tends to be smooth and the likelihood of the prior function1Hexix-2The values are likely to be correlated.
Given a set of observations $P_{1:t} = f(\chi_{1:t})$ and the sampling noise, the Gaussian process is calculated as follows:
Using this Gaussian process model, Bayesian optimization searches for the maximum of the unknown objective function f(x). The next χ to evaluate is chosen by maximizing the acquisition function, which balances exploration (improving the model in the less explored parts of the search space) against exploitation (favoring the regions that the model predicts to be promising). After each observation, the algorithm updates the Gaussian process to take the new data into account. The Gaussian process is initialized with a constant mean, since all points of the search space are initially assumed to be equally likely to be good. The model is refined gradually with each observation.
The Gaussian process is completely specified by the mean function μ(x) and the kernel function $k(\chi_1, \chi_2)$.
The goal is to learn the kernel hyperparameters θ, namely the characteristic length scale $l^2$ and the total variance, such that the probability of the data given the kernel function is maximized. The marginal likelihood is calculated as follows:
where $\mu_0$ is the mean function.
Another object of the present invention is to provide a bearing fault diagnosis system applying the bearing fault diagnosis method, the bearing fault diagnosis system including:
the signal sampling module is used for taking every sample_length consecutive data points of the original vibration data as one sample, and sampling successively in an overlapping manner at the sampling interval sample_interval;
the wavelet transform signal processing module is used for Morlet continuous wavelet transform signal processing: performing a continuous wavelet transform on each sample to generate the corresponding time-frequency image, resizing the time-frequency image into a color image of size N×N, and generating enough images to be divided into a training set and a test set; for the training process, the AlexNet feature extraction module is executed; for the test process, execution jumps to the test module;
the AlexNet feature extraction module is used for inputting the N×N time-frequency images of the training set into the improved AlexNet model for training, and saving the model;
the LGBM fault diagnosis module is used for inputting the N×N time-frequency images of the training set into the trained AlexNet model, taking the output of the penultimate layer (the second fully-connected layer), and inputting it into the LGBM model for training, the data dimension being sample_Num × 1000, where sample_Num is the number of samples and 1000 is the number of neurons in the second fully-connected layer of the AlexNet model;
and the test module is used for inputting the N×N time-frequency images of the test set into the trained AlexNet model, taking the output of the second fully-connected layer of the AlexNet model as the features extracted by AlexNet, and inputting these features into the trained LGBM model, the output of the LGBM model being the fault diagnosis result.
It is a further object of the invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of:
(1) signal sampling: taking every sample_length consecutive data points of the original vibration data as one sample, and sampling successively in an overlapping manner at the sampling interval sample_interval;
(2) Morlet continuous wavelet transform signal processing: performing a continuous wavelet transform on each sample to generate the corresponding time-frequency image, resizing the time-frequency image into a color image of size N×N, and generating enough images to be divided into a training set and a test set; for the training process, executing step (3); for the test process, jumping to step (5);
(3) AlexNet feature extraction: inputting the N×N time-frequency images of the training set into the improved AlexNet model for training, and saving the model;
(4) LGBM fault diagnosis: inputting the N×N time-frequency images of the training set into the trained AlexNet model, taking the output of the penultimate layer (the second fully-connected layer), and inputting it into the LGBM model for training, the data dimension being sample_Num × 1000, where sample_Num is the number of samples and 1000 is the number of neurons in the second fully-connected layer of the AlexNet model;
(5) testing: inputting the N×N time-frequency images of the test set into the trained AlexNet model, taking the output of the second fully-connected layer of the AlexNet model as the features extracted by AlexNet, and inputting these features into the trained LGBM model, the output of the LGBM model being the fault diagnosis result.
Another object of the present invention is to provide an information data processing terminal for implementing the bearing fault diagnosis system.
Combining all of the above technical solutions, the advantages and positive effects of the invention are as follows. To address the problem that the classification capability of the Softmax layer of a CNN is inferior to that of newer machine learning classification methods, the invention provides a bearing fault diagnosis method based on continuous wavelet transform and an AlexNet-Light Gradient Boosting Machine fusion model (AlexNet-LGBM). The method can be divided into three parts. First, vibration signal data processing based on continuous wavelet transform: time-frequency features are extracted from the original bearing vibration signal by continuous wavelet transform and converted into a two-dimensional image of 32×32 pixels. Second, for fault feature extraction, the AlexNet model is improved to extract features from the time-frequency image. Third, for fault diagnosis, the extracted fault features are classified by the LGBM classification algorithm, and the optimal model parameters are selected by Bayesian optimization. The invention also carries out a comparison experiment on the bearing data set of Case Western Reserve University (CWRU), comparing combinations of the improved AlexNet and LeNet-5 with multi-grained cascade forest (gcForest), LGBM and CatBoost. The results show that the proposed AlexNet-LGBM fault diagnosis method based on continuous wavelet transform achieves the best fault diagnosis accuracy among the compared methods.
The bearing fault diagnosis method provided by the invention also has the following advantages:
(1) For equipment fault feature extraction, a continuous wavelet transform (CWT) is first applied to the vibration data to convert it into a time-frequency image. To suit the extraction of bearing fault features, the AlexNet model is improved as follows: first, the input dimension is changed to 32×32×3 to reduce the storage space occupied by the time-frequency images; second, the convolutional layers use the parametric rectified linear unit (PReLU) as the activation function to overcome the limitation of the rectified linear unit (ReLU); third, the fully-connected layers and the output layer are resized to suit the number of fault classes. The improved AlexNet, LeNet-5 and an EfficientNet-B0 transfer-learning model are each used for feature extraction, and the feature extraction capabilities of the three neural network structures are compared.
(2) For equipment fault diagnosis, a fault diagnosis method based on continuous wavelet transform and an AlexNet-Light Gradient Boosting Machine fusion model (AlexNet-LGBM) is proposed: fault features are first extracted from the vibration signal using continuous wavelet transform and the improved AlexNet; the extracted features are then classified with the LGBM classification algorithm, and the model parameters are optimized with Bayesian optimization. Combinations of the improved AlexNet and LeNet-5 feature extractors with the multi-grained cascade forest (gcForest), LGBM and CatBoost classification algorithms are also compared.
To solve the problems of fault feature extraction and fault diagnosis in rolling bearings, the invention provides a bearing fault diagnosis method based on continuous wavelet transform and AlexNet-LGBM. Experimental comparison shows that the method achieves the highest accuracy, 99.712%, compared with the other 7 methods; its prediction time for 1800 samples is 1.47 seconds, which is of the same order of magnitude as that of the other models; and the variance of its prediction accuracy over five runs is only 0.063, which is stable compared with the other 6 methods. The method therefore has the best overall performance.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments of the present invention will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart of a bearing fault diagnosis method provided in an embodiment of the present invention.
FIG. 2 is a block diagram of a bearing fault diagnosis system provided by an embodiment of the present invention;
In the figure: 1. signal sampling module; 2. wavelet transform signal processing module; 3. AlexNet feature extraction module; 4. LGBM fault diagnosis module; 5. test module.
Fig. 3 is a flow chart of bearing vibration signal processing according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of the effect of the Morlet continuous wavelet transform provided by the embodiment of the present invention.
Fig. 5 is a schematic structural diagram of an improved AlexNet according to an embodiment of the present invention.
Fig. 6 is a flowchart of bearing fault diagnosis provided by an embodiment of the present invention.
Fig. 7 is a schematic diagram of processing results of continuous wavelet transform according to an embodiment of the present invention.
FIG. 8 is a schematic diagram of the accuracy variation of the improved AlexNet, LeNet-5 and EfficientNet according to the embodiment of the present invention.
FIG. 9 is a schematic diagram of the loss variation of the improved AlexNet, LeNet-5 and EfficientNet according to the embodiment of the present invention.
Fig. 10 is a schematic diagram of the t-SNE visualization of the extracted features provided in the embodiment of the present invention.
Fig. 11 is a schematic diagram of the accuracy of 5 experimental test sets of six combination models provided by the embodiment of the present invention.
Fig. 12 is a schematic diagram of the average accuracy of 5 experimental test sets of six combined models provided in the embodiment of the present invention.
Fig. 13 is a schematic diagram of average time of 5 experimental prediction test sets of six combined models provided by the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In view of the problems in the prior art, the present invention provides a method, a system, a device and a terminal for diagnosing a bearing fault, which are described in detail below with reference to the accompanying drawings.
As shown in fig. 1, a bearing fault diagnosis method provided by an embodiment of the present invention includes the following steps:
S101, signal sampling: taking every sample_length consecutive data points of the original vibration data as one sample, and sampling successively in an overlapping manner at the sampling interval sample_interval;
S102, Morlet continuous wavelet transform signal processing: performing a continuous wavelet transform on each sample to generate the corresponding time-frequency image, resizing the time-frequency image into a color image of size N×N, and generating enough images to be divided into a training set and a test set; for the training process, S103 is performed; for the test process, execution jumps to S105;
S103, AlexNet feature extraction: inputting the N×N time-frequency images of the training set into the improved AlexNet model for training, and saving the model;
S104, LGBM fault diagnosis: inputting the N×N time-frequency images of the training set into the trained AlexNet model, taking the output of the penultimate layer (the second fully-connected layer), and inputting it into the LGBM model for training, the data dimension being sample_Num × 1000, where sample_Num is the number of samples and 1000 is the number of neurons in the second fully-connected layer of the AlexNet model;
S105, testing: inputting the N×N time-frequency images of the test set into the trained AlexNet model, taking the output of the second fully-connected layer of the AlexNet model as the features extracted by AlexNet, and inputting these features into the trained LGBM model, the output of the LGBM model being the fault diagnosis result.
As shown in fig. 2, a bearing fault diagnosis system provided by an embodiment of the present invention includes:
the signal sampling module 1 is used for taking every sample_length consecutive data points of the original vibration data as one sample, and sampling successively in an overlapping manner at the sampling interval sample_interval;
the wavelet transform signal processing module 2 is used for Morlet continuous wavelet transform signal processing: performing a continuous wavelet transform on each sample to generate the corresponding time-frequency image, resizing the time-frequency image into a color image of size N×N, and generating enough images to be divided into a training set and a test set; for the training process, the AlexNet feature extraction module is executed; for the test process, execution jumps to the test module;
the AlexNet feature extraction module 3 is used for inputting the N×N time-frequency images of the training set into the improved AlexNet model for training, and saving the model;
the LGBM fault diagnosis module 4 is used for inputting the N×N time-frequency images of the training set into the trained AlexNet model, taking the output of the penultimate layer (the second fully-connected layer), and inputting it into the LGBM model for training, the data dimension being sample_Num × 1000, where sample_Num is the number of samples and 1000 is the number of neurons in the second fully-connected layer of the AlexNet model;
and the test module 5 is used for inputting the N×N time-frequency images of the test set into the trained AlexNet model, taking the output of the second fully-connected layer of the AlexNet model as the features extracted by AlexNet, and inputting these features into the trained LGBM model, the output of the LGBM model being the fault diagnosis result.
The technical solution of the present invention is further described below with reference to specific examples.
To address the problems in the prior art, the invention provides a bearing fault diagnosis method based on continuous wavelet transform and an AlexNet-Light Gradient Boosting Machine (AlexNet-LGBM) fusion model. First, time-frequency features are extracted from the original bearing vibration signal by continuous wavelet transform and converted into a two-dimensional image of 32×32 pixels. Second, the AlexNet model is improved to extract fault features from the time-frequency image. Finally, for fault diagnosis, the extracted features are classified by the LGBM classification algorithm, with the optimal model parameters selected by Bayesian optimization.
1. Signal processing
1.1 Vibration signal processing flow
First, sample_length consecutive data points (sample_length is set to 1024 in the experiments of the present invention) are selected from the original vibration signal as an original sample. These sample_length consecutive points are then converted into the corresponding time-frequency image by continuous wavelet transform, and the time-frequency image is resized to a suitable N×N size (32×32 in the experiments of the present invention). Next, the sample_length consecutive data points that start one sampling interval sample_interval (set to 384 in the experiments of the present invention) later are selected, in an overlapping manner, as another sample, yielding another N×N image, as shown in Fig. 3. This process is repeated to generate sufficient training and test images.
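The overlapped sampling described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the experiments: the function name `overlapped_samples` is hypothetical, and it assumes that sample_interval is the offset between the starting points of two consecutive samples and that trailing points which cannot fill a whole sample are dropped.

```python
def overlapped_samples(signal, sample_length=1024, sample_interval=384):
    """Split a 1-D signal into overlapping windows.

    Window k starts at k * sample_interval (assumed meaning of the
    sampling interval) and holds sample_length consecutive points.
    Trailing points that cannot fill a whole window are dropped.
    """
    if sample_length <= 0 or sample_interval <= 0:
        raise ValueError("sample_length and sample_interval must be positive")
    last_start = len(signal) - sample_length
    return [signal[start:start + sample_length]
            for start in range(0, last_start + 1, sample_interval)]


signal = list(range(2000))            # stand-in for raw vibration data
samples = overlapped_samples(signal)  # defaults follow the experiment settings
print(len(samples), samples[1][0])
```

With the experiment settings of 1024 and 384, consecutive samples share 640 points, which is how a limited recording can yield enough images for training and testing.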
1.2 Morlet continuous wavelet transform signal processing
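The continuous wavelet transform of the signal x(t) with the wavelet function ψ(t) is given by:
$$T(a,b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} x(t)\, \psi^{*}\!\left(\frac{t-b}{a}\right) dt \tag{1}$$
where $a$ is the wavelet scale, $b$ is the translation (time location), and $\psi^{*}$ denotes the complex conjugate of $\psi$.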
Among the different wavelets, a complex (analytic) wavelet has a Fourier transform that is zero for negative frequencies. With such a complex wavelet, the phase and amplitude components of the signal can be separated. Morlet is the most commonly used complex wavelet; continuous wavelet analysis with the complex Morlet wavelet has the advantage of separating information in the wavelet domain and of simplifying the relationship between transform ridges and instantaneous frequency. The invention uses the Morlet wavelet to process the bearing vibration signal. The Morlet wavelet is defined as:
$$\psi(t) = \pi^{-1/4}\left(e^{i 2\pi f_0 t} - e^{-(2\pi f_0)^2/2}\right) e^{-t^2/2} \tag{2}$$
where $f_0$ is the center frequency of the mother wavelet. The second term in parentheses is known as the correction term, because it corrects for the non-zero mean of the complex sinusoid (the first term in parentheses) multiplied by the Gaussian term. In practice, the correction term is negligible for $f_0 \gg 0$, in which case the Morlet wavelet can be written as:
$$\psi(t) = \pi^{-1/4}\, e^{i 2\pi f_0 t}\, e^{-t^2/2} \tag{3}$$
This wavelet is simply a complex sinusoid $e^{i 2\pi f_0 t}$ within a Gaussian envelope $e^{-t^2/2}$. The $\pi^{-1/4}$ term is a normalization factor that ensures the wavelet has unit energy. The function given by equation (3) is not strictly a wavelet, because it has a non-zero mean, i.e. the zero-frequency term of its energy spectrum is non-zero, so it does not satisfy the admissibility condition. In practice, however, it can be used with minimal error when $f_0 \gg 0$.
The Fourier transform of the Morlet wavelet is:
$$\hat{\psi}(f) = \pi^{1/4}\sqrt{2}\; e^{-\frac{1}{2}(2\pi f - 2\pi f_0)^2} \tag{4}$$
It has the form of a Gaussian function shifted by $f_0$ along the frequency axis. The center frequency of this Gaussian spectrum is typically taken as the characteristic frequency of the Morlet wavelet. The characteristic frequency is set for the mother wavelet and varies with the wavelet scale $a$ as follows:
$$f = \frac{f_0}{a} \tag{5}$$
The energy spectrum (the squared magnitude of the Fourier transform) is:
$$\left|\hat{\psi}(f)\right|^2 = 2\pi^{1/2}\, e^{-(2\pi f - 2\pi f_0)^2} \tag{6}$$
According to equation (3), the integrated energy of the Morlet wavelet is equal to 1.
The effect of applying the Morlet continuous wavelet transform to a bearing vibration signal sample, taking into account the sampling frequency of the signal (12 kHz in the present invention), is shown in Fig. 4.
Fig. 4(a) is a record of the acceleration measured continuously over 86 milliseconds by the acceleration sensor of a bearing rotating at high speed; the abscissa is the time axis and the ordinate is the acceleration of the monitoring point during rotation. Under ideal conditions, the acceleration of a bearing in perfectly uniform rotation would be 0. Fig. 4(a) shows that the actual acceleration of the bearing fluctuates around a mean of 0; at about 27 ms and 61 ms the acceleration is larger, the corresponding energy is larger, and the corresponding time positions in the right-hand graph show higher color brightness.
Through continuous wavelet transformation, the one-dimensional vibration signal can be converted into a picture, and the picture contains the corresponding relation between time and frequency.
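As an illustration of how a time-frequency representation arises from the simplified Morlet wavelet of equation (3), the sketch below evaluates the transform by direct numerical summation. It is a deliberately naive pure-Python sketch: the function names, the 200 Hz sampling rate, the 20 Hz test tone and the choice f0 = 1 are all assumptions made for the example, the scale is taken as a = f0 / f, and the resizing of the result to an N×N color image is not shown.

```python
import cmath
import math


def morlet(t, f0=1.0):
    """Simplified Morlet wavelet: pi^(-1/4) * exp(i*2*pi*f0*t) * exp(-t^2/2)."""
    return math.pi ** -0.25 * cmath.exp(2j * math.pi * f0 * t) * math.exp(-t * t / 2)


def cwt_magnitude(x, fs, freqs, f0=1.0):
    """Return |T(a, b)|: one row per analysis frequency, one column per sample."""
    dt = 1.0 / fs
    rows = []
    for f in freqs:
        a = f0 / f                            # scale for this characteristic frequency
        row = []
        for b in range(len(x)):
            acc = 0j
            for k, x_k in enumerate(x):
                u = (k - b) * dt / a          # (t - b) / a in wavelet time units
                if abs(u) < 5.0:              # Gaussian envelope is negligible beyond 5
                    acc += x_k * morlet(u, f0).conjugate()
            row.append(abs(acc) * dt / math.sqrt(a))
        rows.append(row)
    return rows


fs = 200.0                                    # hypothetical sampling rate in Hz
signal = [math.sin(2 * math.pi * 20.0 * n / fs) for n in range(200)]
scalogram = cwt_magnitude(signal, fs, freqs=[10.0, 20.0, 40.0])
mid = len(signal) // 2
print([round(row[mid], 4) for row in scalogram])
```

For the 20 Hz test tone, the row analysed at 20 Hz carries far more magnitude than the rows at 10 Hz and 40 Hz, which is the time-frequency correspondence that the image encodes. A practical implementation would use an FFT-based routine from a wavelet library instead of the triple loop.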
1.3 AlexNet feature extraction
The AlexNet model proposed by Krizhevsky et al. achieves better performance in image recognition than other methods, and it still plays an important role in many fields. To suit the extraction of bearing fault features, AlexNet is improved as follows:
(1) Model input dimension improvement. The 224×224 input image size of the classical AlexNet is unnecessarily large for bearing fault diagnosis based on vibration signals; if the vibration signal acquisition frequency of the bearing is high, the images generated by wavelet transform of all samples occupy a large amount of storage space. The present invention therefore takes a 32×32 color image as input.
(2) Convolutional layer activation function improvement. The ReLU function $f(z) = \max(0, z)$ has a limitation, because its gradient in each iterative update is:
$$\frac{\partial f(z)}{\partial z} = \begin{cases} 1, & z > 0 \\ 0, & z \le 0 \end{cases} \tag{7}$$
Because the ReLU activation function sets the gradient to 0 for negative inputs, a neuron whose input is negative does not take part in subsequent propagation and cannot be activated, so its parameters cannot be updated. If the learning rate is set too large in actual training, some neurons may become inactive and their parameters can no longer be updated effectively, causing training to fail. The invention therefore uses PReLU, a variant of ReLU, which has the form:
$$f(z) = \begin{cases} z, & z > 0 \\ a z, & z \le 0 \end{cases} \tag{8}$$
Unlike ReLU, PReLU is a linear function with slope $a$ (a small constant) when $z < 0$. Its gradient is:
$$\frac{\partial f(z)}{\partial z} = \begin{cases} 1, & z > 0 \\ a, & z \le 0 \end{cases} \tag{9}$$
PReLU greatly reduces the loss of negative-gradient information while still providing one-sided suppression. The value of $a$ is updated continuously through back-propagation and is optimized iteratively together with the weight and bias parameters of the network.
(3) Fully-connected layer and output layer improvement. Since the bearing fault diagnosis studied by the present invention covers 1 normal type and 3 fault types (described in detail in the fourth section), four categories in total, the output layer size of the improved AlexNet structure is set to 4. Because the output layer becomes smaller, the size of the second fully-connected layer is set to 1000 to better extract the key features. The improved AlexNet structure proposed by the present invention is shown in Fig. 5.
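The difference between ReLU and PReLU described in item (2) can be made concrete with a short sketch. The function names and the slope value 0.25 are illustrative assumptions, not values stated for the invention; in the network itself the slope is a learned parameter.

```python
def relu(z):
    """ReLU: negative inputs are clamped to zero, so their gradient is zero."""
    return max(0.0, z)


def prelu(z, a=0.25):
    """PReLU: identity for z > 0, linear with slope a otherwise."""
    return z if z > 0 else a * z


def prelu_grad(z, a=0.25):
    """Return (df/dz, df/da) at z; df/da is what back-propagation uses to learn a."""
    return (1.0, 0.0) if z > 0 else (a, z)


for z in (-2.0, 3.0):
    print(z, relu(z), prelu(z), prelu_grad(z))
```

For a negative input such as -2.0, ReLU returns 0 and passes no gradient, while PReLU returns -0.5 and passes a gradient equal to the slope, so the neuron can still be updated.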
1.3.1 convolutional layers
The convolutional layer is connected to the previous layer through local connections and weight sharing, which greatly reduces the number of parameters. The convolution operation is:
$$h_j = f\Big(\sum_i X_i * W_{ij} + b_j\Big) \tag{10}$$
where $h_j$ is the j-th output feature map of the current convolutional layer and $X_i$ is the i-th output feature map of the previous layer (the input of the current convolutional layer). The symbol $*$ denotes the convolution operation, the parameter matrix $W_{ij}$ is the convolution kernel that maps the i-th input feature map to the j-th output feature map of the current layer, and $b_j$ is the bias corresponding to the j-th output feature map of the current convolutional layer. $f(\cdot)$ is a nonlinear activation function, which in the present invention is the PReLU function of equation (8).
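A small pure-Python sketch of one convolutional layer may help to read the indices i and j. It assumes "valid" borders, stride 1 and the cross-correlation form of convolution that deep learning frameworks commonly use; the function names, the toy input and the kernels are hypothetical.

```python
def correlate2d_valid(x, w):
    """Slide kernel w over map x without padding (stride 1)."""
    kh, kw = len(w), len(w[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[r + i][c + j] * w[i][j] for i in range(kh) for j in range(kw))
             for c in range(out_w)] for r in range(out_h)]


def conv_layer(inputs, kernels, biases, a=0.25):
    """Compute h_j = f(sum_i X_i * W_ij + b_j) with f = PReLU of slope a.

    inputs[i] is the i-th input map, kernels[i][j] maps input i to output j.
    """
    outputs = []
    for j, b_j in enumerate(biases):
        maps = [correlate2d_valid(x_i, kernels[i][j]) for i, x_i in enumerate(inputs)]
        out = []
        for r in range(len(maps[0])):
            row = []
            for c in range(len(maps[0][0])):
                z = sum(m[r][c] for m in maps) + b_j
                row.append(z if z > 0 else a * z)   # PReLU activation
            out.append(row)
        outputs.append(out)
    return outputs


x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]               # one 3x3 input map
kernels = [[[[1, 0], [0, -1]], [[0, 0], [0, 1]]]]   # one input, two output maps
feature_maps = conv_layer([x], kernels, biases=[0, 1])
print(feature_maps)
```

Because the same small kernel is reused at every position, the layer needs only the kernel entries and one bias per output map, which is the parameter saving that weight sharing provides.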
1.3.2 pooling layer
The grey blocks in Fig. 5 are pooling layers, which downsample after the convolution operation to further reduce the dimensionality of the extracted features. Common pooling layers are max pooling and average pooling. The invention uses max pooling, which extracts the maximum values from the convolution output layer $Y_{cn}$ as follows:
$$P_{cn} = \max_{S_{M \times N}} \left( Y_{cn} \right) \tag{11}$$
where $S_{M \times N}$ is the pooling scale matrix and $M$ and $N$ are the dimensions of $S$. During pooling, $S$ scans $Y_{cn}$ with a fixed stride until the whole of $Y_{cn}$ has been covered. Here, $S$ is a 3×3 matrix, so $Y_{cn}$ is reduced to 1/9 of its size and assigned to $P_{cn}$ in the pooling output layer.
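The pooling step can be sketched as follows. The sketch assumes that the stride equals the window size, which is what the stated reduction to 1/9 suggests for a 3×3 window; the function name and the 6×6 test matrix are hypothetical.

```python
def max_pool(y, size=3):
    """Non-overlapping max pooling: keep the largest value of each size x size block."""
    rows, cols = len(y) // size, len(y[0]) // size
    return [[max(y[r * size + i][c * size + j]
                 for i in range(size) for j in range(size))
             for c in range(cols)] for r in range(rows)]


feature_map = [[r * 6 + c for c in range(6)] for r in range(6)]  # values 0..35
pooled = max_pool(feature_map)
print(pooled)
```

The 36 input values are reduced to 4 outputs, one per 3×3 block.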
1.3.3 full connection layer
After the last convolutional and pooling layers, the features are flattened into one dimension by the Flatten layer and then pass through two fully-connected layers. Each neuron in a fully-connected layer is connected to all neurons in the previous layer. Dropout with a drop rate of 0.5 is applied to the outputs of the two fully-connected layers, so that some units are not updated, i.e. they are randomly dropped by the network. The structure of the network therefore changes after each iteration, which is equivalent to ensemble learning over networks of various structures, and averaging over these networks effectively prevents overfitting.
The last layer is the output layer. For multi-class fault classification, a Softmax classifier is used, which handles multi-class problems effectively. Denote an input picture in the training data set as $x$ and its label as $y$, where $y \in \{1, 2, \ldots, J\}$ represents the fault class and $y_k$ denotes the probability that $x$ belongs to class $k$. For each $x$, Softmax estimates the probability $p(y = j \mid x)$ for each class $j \in \{1, 2, \ldots, J\}$. The Softmax activation function is expressed as follows:
$$p(y = j \mid x; \theta) = \frac{\exp(\theta_j x)}{\sum_{i=1}^{J} \exp(\theta_i x)} \tag{12}$$
where $\theta$ is the weight matrix of the Softmax layer and $\theta_i$ is the i-th row vector of $\theta$.
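A numerically stable way to evaluate the Softmax probabilities is sketched below. The function name, the toy weight matrix and the input vector are hypothetical; subtracting the largest score before exponentiating is a standard trick that leaves the probabilities unchanged.

```python
import math


def softmax_probs(theta, x):
    """Class probabilities exp(theta_j . x) / sum_i exp(theta_i . x)."""
    scores = [sum(t * v for t, v in zip(row, x)) for row in theta]
    top = max(scores)                         # shift for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


theta = [[1.0, 0.0, 0.0],                     # one row per fault class (4 classes)
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0],
         [0.0, 0.0, 0.0]]
probs = softmax_probs(theta, [2.0, 1.0, 0.1])
print(probs, probs.index(max(probs)))
```

The predicted fault class is the index of the largest probability.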
1.3.4 parameter update
To suit the multi-class fault diagnosis task, the loss function is set to the cross-entropy loss function, expressed as:
where the predicted probability is the probability that the i-th sample is predicted to belong to class k; the actual probability equals 1 if the true class of the i-th sample is k and 0 otherwise; and $W^{(l)}$ is the parameter matrix of the l-th layer. The first term in the formula measures the cross entropy between the prediction and the true category, and the loss function is minimized when the predicted value equals the true value. The second term is an L2 regularization term whose coefficient λ is the weight decay parameter; it balances the relative weight of the two terms and effectively prevents overfitting.
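The two terms of the loss can be illustrated with the sketch below. Since the exact normalization is not recoverable from the text, the sketch assumes that the cross entropy is averaged over the samples and that the L2 term is scaled by λ/2; the function name and the toy values are hypothetical.

```python
import math


def cross_entropy_l2(pred, true, weights, lam=0.0, eps=1e-12):
    """Cross entropy averaged over samples (assumed) plus (lam / 2) * sum of squared weights (assumed)."""
    m = len(pred)
    ce = -sum(t * math.log(max(p, eps))
              for ps, ts in zip(pred, true)
              for p, t in zip(ps, ts)) / m
    l2 = 0.5 * lam * sum(w * w for layer in weights for row in layer for w in row)
    return ce + l2


pred = [[0.7, 0.1, 0.1, 0.1], [0.25, 0.25, 0.25, 0.25]]  # predicted class probabilities
true = [[1, 0, 0, 0], [0, 0, 1, 0]]                      # one-hot true classes
weights = [[[1.0, -2.0]]]                                # one tiny layer for illustration
loss = cross_entropy_l2(pred, true, weights, lam=0.1)
print(loss)
```

Only the predicted probability of the true class contributes to the first term, and a larger λ shifts the balance toward small weights.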
Model training uses stochastic gradient descent. In each iteration, the parameters W and the biases b are updated as follows:
where α is the learning rate, which controls the size of the gradient step in each iteration. The residual of the loss function at the j-th node of the l-th layer is denoted $\delta_j^{(l)}$, and its recurrence formula can be expressed as:
The gradient of the loss function with respect to the parameters can be written as:
In formula (13), $y_k$ takes the value 1 for exactly one category k and 0 for all the others. For the true category, then:
The residual of the last layer is obtained from the Softmax activation function of formula (12):
The residuals of the other layers, $\delta^{(L-1)}, \ldots, \delta^{(1)}$, can be calculated with the recurrence formula (15).
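For a Softmax output trained with cross-entropy loss, a well-known result is that the gradient of the loss with respect to the output-layer pre-activations equals the predicted probabilities minus the one-hot label. The sketch below checks this numerically, assuming that this is the quantity the last-layer residual refers to; the regularization term is omitted because it does not depend on the pre-activations, and all names and values are hypothetical.

```python
import math


def softmax(z):
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(z, k):
    """Loss for pre-activations z when the true class is k."""
    return -math.log(softmax(z)[k])


def output_residual(z, k):
    """Analytic gradient of the loss w.r.t. z: probabilities minus one-hot label."""
    return [p - (1.0 if i == k else 0.0) for i, p in enumerate(softmax(z))]


def numeric_gradient(z, k, h=1e-6):
    """Central finite differences, used only to verify the analytic form."""
    grad = []
    for i in range(len(z)):
        up = z[:i] + [z[i] + h] + z[i + 1:]
        down = z[:i] + [z[i] - h] + z[i + 1:]
        grad.append((cross_entropy(up, k) - cross_entropy(down, k)) / (2 * h))
    return grad


logits = [2.0, -1.0, 0.5, 0.0]
true_class = 2
analytic = output_residual(logits, true_class)
numeric = numeric_gradient(logits, true_class)
print(max(abs(p - q) for p, q in zip(analytic, numeric)))
```

The residuals of the earlier layers would then follow by propagating this vector backwards through the network.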
For the experiments, the invention builds the bearing fault feature extraction model with the Python-based Keras deep learning framework, using TensorFlow as the back end. The SGD optimizer, cross-entropy loss function and normalization method of Keras are used to train the parameters.
1.4 LGBM Fault Classification
The LGBM classification algorithm mainly comprises gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB).
(1) Gradient-based one-side sampling algorithm. Instances with large gradients are all kept for training, while instances with small gradients are randomly sampled, and the effect on the data distribution is compensated by a constant multiplier when the information gain is computed. The GOSS algorithm is as follows:
Input: training data I with n instances $x_1, \ldots, x_n$; the number of iterations d; the sampling ratios a and b for large-gradient and small-gradient data; the loss function loss; and the weak learners L.
Output: a trained strong learner.
Step 1: Initialization. Let topN = a × len(I) denote the number of large-gradient samples. Add L to the model list models. Set the weight w of every training instance to 1.
Step 2: Use the model list to predict the training data and compute the loss g of each instance with the loss function loss. Sort the training data in descending order of g.
Step 3: Take the first topN sorted training instances as the large-gradient subset A, and randomly draw $b \times |A^C|$ instances from the remaining data set $A^C$ as the small-gradient subset B. Merge the large-gradient and small-gradient subsets and denote the result usedSet.
Step 4: Multiply the weight w of each small-gradient sample by the factor (1-a)/b.
Step 5: Feed the data in I that correspond to usedSet, together with the negative gradient -g and the weight w, into the learner L for training to obtain a new model newModel.
Instances are split according to the estimated variance gain $V_j(d)$ over the subsets A and B:
where $A_l = \{x_i \in A : x_{ij} \le d\}$, $A_r = \{x_i \in A : x_{ij} > d\}$, $B_l = \{x_i \in B : x_{ij} \le d\}$ and $B_r = \{x_i \in B : x_{ij} > d\}$. The coefficient (1-a)/b normalizes the sum of the gradients over B back to the size of $A^C$. Add newModel to the model list models.
Step 6: Repeat Steps 2 to 5 until the number of iterations d is reached or the model converges.
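Steps 1 to 4 of the sampling procedure can be sketched as a single function. The function name and the seed parameter are hypothetical, ranking by the absolute value of the gradient is an assumption, and the size of the random subset follows the b × |A^C| rule stated in Step 3; training the weak learner on the result (Step 5) is not shown.

```python
import random


def goss_sample(gradients, a=0.2, b=0.1, seed=0):
    """Return (kept indices, weight per kept index) for one sampling round."""
    n = len(gradients)
    top_n = int(a * n)                                     # size of large-gradient subset A
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    large, rest = order[:top_n], order[top_n:]
    rand_n = int(b * len(rest))                            # size of small-gradient subset B
    small = random.Random(seed).sample(rest, rand_n)
    weights = {i: 1.0 for i in large}
    weights.update({i: (1.0 - a) / b for i in small})      # compensate the down-sampling
    return large + small, weights


gradients = [i / 100 for i in range(100)]                  # toy per-instance gradients
kept, weights = goss_sample(gradients, a=0.2, b=0.1)
print(len(kept), sorted(set(weights.values())))
```

In this toy run, 20 large-gradient instances are kept with weight 1 and 8 of the remaining 80 are drawn at random and up-weighted, so the learner sees 28 instances instead of 100.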
(2) The exclusive feature bundling algorithm consists of two steps: bundle generation and exclusive feature merging.
The bundle generation algorithm determines which mutually exclusive features can be merged (features that can be merged are placed together in what is called a bundle); exclusive feature merging then merges each bundle into a single feature. The algorithm that determines which mutually exclusive features can be combined is Greedy Bundling. Specifically, the features are first taken as vertices, and an edge is added between every two features that are not mutually exclusive, which reduces the optimal bundling problem to a graph coloring problem; a greedy algorithm is then applied. Exclusive feature merging constructs a feature bundle by letting the exclusive features reside in different bins, which can be implemented simply by adding an offset to the values of the original features.
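The two steps can be sketched on a toy data set. The sketch assumes that each feature is a column of non-negative integer bin indices in which 0 means "no value", that two features conflict when both are non-zero in the same row, and that features are visited in descending order of their non-zero counts; the function names and the data are hypothetical.

```python
def conflicts(f, g):
    """Number of rows in which both features are non-zero."""
    return sum(1 for u, v in zip(f, g) if u != 0 and v != 0)


def greedy_bundles(features, max_conflicts=0):
    """Greedily place each feature in the first bundle it does not conflict with."""
    order = sorted(range(len(features)),
                   key=lambda j: -sum(1 for v in features[j] if v != 0))
    bundles = []
    for j in order:
        for bundle in bundles:
            if sum(conflicts(features[j], features[k]) for k in bundle) <= max_conflicts:
                bundle.append(j)
                break
        else:                                   # no compatible bundle was found
            bundles.append([j])
    return bundles


def merge_bundle(features, bundle):
    """Merge a bundle into one feature by shifting each member into its own bin range."""
    merged = [0] * len(features[0])
    offset = 0
    for j in bundle:
        for r, v in enumerate(features[j]):
            if v != 0:
                merged[r] = v + offset
        offset += max(features[j])
    return merged


features = [[1, 2, 0, 0, 0, 0],                 # feature 0
            [0, 0, 3, 1, 0, 0],                 # feature 1
            [1, 0, 0, 0, 2, 2]]                 # feature 2, conflicts with feature 0
bundles = greedy_bundles(features)
merged = merge_bundle(features, bundles[0])
print(bundles, merged)
```

Features 2 and 1 never take a non-zero value in the same row, so they share one bundle; the offset keeps their value ranges apart inside the merged feature, so the original values remain distinguishable.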
The present invention uses LGBMClassifier from the Python lightgbm package to classify the output of the penultimate layer of AlexNet.
1.5 Bayesian hyper-parameter optimization
The invention uses HyperOpt to optimize the parameters of the LGBM model training process. HyperOpt provides an easy-to-use Bayesian hyperparameter optimization algorithm that performs hyperparameter optimization through sequential model-based optimization (SMBO), which is a Bayesian optimization technique.
Bayesian optimization is a model-based optimization algorithm that is specifically tailored to the objective function (also called cost function). Bayesian optimization searches for the maximum of the unknown objective function from which the sample can be obtained. As with all model-based optimization algorithms, a regression method is used to create a model of the objective function, and the next point to be acquired is selected according to the model, and then the model is updated.
The basic algorithm of bayesian optimization is as follows:
Step 1: Place a Gaussian process prior on the objective function f.
Step 2: Observe f at $n_0$ points chosen according to an initial space-filling experimental design. Set $n = n_0$.
Step 3: While $n \le N$, execute the loop: update the posterior probability distribution over f using all available data; let $x_n$ be the maximizer of the acquisition function over x, where the acquisition function is computed using the current posterior distribution; observe $y_n = f(x_n)$; increase n by 1.
Step 4: Return a solution: either the evaluated point with the largest f(x), or the point with the largest posterior mean.
The objective function f is usually unknown, and a Gaussian process defines for each point x a Gaussian probability distribution of f(x), which is therefore determined by the mean μ and the standard deviation σ. The probability distribution of the function is defined as:
$$f(x) \sim \mathcal{N}\left(\mu(x), \sigma^2(x)\right)$$
To estimate μ(x) and σ(x), a Gaussian process needs to be fitted to the data. For this purpose, each observation f(χ) is assumed to be a sample from a normal distribution. If there is a data set consisting of several observations $f(\chi_1), f(\chi_2), \ldots, f(\chi_t)$, then the vector $[f(\chi_1), f(\chi_2), \ldots, f(\chi_t)]$ is a sample from a multivariate normal distribution defined by a mean vector and a covariance matrix. The Gaussian process is thus an n-variate normal distribution, where n is the number of observations. The covariance matrix is defined by a kernel function $k(\chi_1, \chi_2)$ such that samples far apart are nearly uncorrelated, while nearby samples are highly correlated. This encodes the prior assumption that the function tends to be smooth: two observations at similar values $\chi_1$ and $\chi_2$ are likely to be correlated.
Given a set of observations $P_{1:t} = f(\chi_{1:t})$ and the sampling noise, the Gaussian process is calculated as follows:
Using this Gaussian process model, Bayesian optimization searches for the maximum f(x) of the unknown objective function. The next χ to test is chosen by maximizing the acquisition function, which balances exploration (improving the model in the less explored parts of the search space) against exploitation (favoring the promising parts predicted by the model). After each observation, the algorithm updates the Gaussian process to take the new data into account. The Gaussian process is initialized with a constant mean, since all points of the search space are initially assumed to be equally promising. With each observation, the model is gradually refined.
The Gaussian process is completely specified by its mean function μ(x) and kernel function $k(\chi_1, \chi_2)$.
The goal is to learn the characteristic length scale $l^2$ and the total variance (the kernel parameters θ) that maximize the probability of the data given the kernel function. The marginal likelihood is calculated as follows:
where $\mu_0$ is the mean function.
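Steps 1 to 4 and the Gaussian process model can be sketched in plain Python as follows. This is only an illustration under stated assumptions and does not reproduce HyperOpt's internals: the squared-exponential kernel, the zero (constant) prior mean, the upper-confidence-bound acquisition function with weight `kappa`, the candidate grid, and all function names are choices made here for the example.

```python
import math


def rbf_kernel(x1, x2, length_scale=0.2, variance=1.0):
    """Squared-exponential kernel: nearby inputs correlate strongly, distant ones hardly at all."""
    return variance * math.exp(-((x1 - x2) ** 2) / (2.0 * length_scale ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def gp_posterior(xs, ys, x_new, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[rbf_kernel(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k_star = [rbf_kernel(a, x_new) for a in xs]
    alpha = solve(K, ys)       # K^{-1} y
    v = solve(K, k_star)       # K^{-1} k*
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf_kernel(x_new, x_new) - sum(k * w for k, w in zip(k_star, v))
    return mean, math.sqrt(max(var, 0.0))


def bayes_opt(f, candidates, initial_xs, n_total, kappa=2.0):
    xs = list(initial_xs)
    ys = [f(x) for x in xs]                  # step 2: observe f at the n0 initial points
    while len(xs) < n_total:                 # step 3: loop until the budget N is used up
        def ucb(x):
            mean, std = gp_posterior(xs, ys, x)   # posterior from all data so far
            return mean + kappa * std             # exploitation + exploration
        x_next = max(candidates, key=ucb)    # maximize the acquisition function
        xs.append(x_next)
        ys.append(f(x_next))
    best = max(range(len(xs)), key=lambda i: ys[i])   # step 4: best evaluated point
    return xs[best], ys[best], xs


# Hypothetical 1-D objective with its maximum at x = 0.3.
objective = lambda x: -(x - 0.3) ** 2
grid = [i / 50 for i in range(51)]
best_x, best_y, visited = bayes_opt(objective, grid, [0.0, 0.5, 1.0], n_total=12)
```

Returning the best evaluated point corresponds to the first option of step 4; returning the point with the largest posterior mean would only require maximizing `gp_posterior(...)[0]` over the candidates instead.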
3. Bearing fault diagnosis method based on CWT and AlexNet-LGBM
As shown in fig. 6, the bearing fault diagnosis process based on continuous wavelet transform and AlexNet-LGBM is as follows:
Step 1: Signal sampling: from the original vibration data, every sample_length (1024 in the experiments of section 4) consecutive data points are taken as one sample, and sampling proceeds with overlap at a sampling interval of sample_interval (384 in the experiments of section 4).
Step 2: Continuous wavelet transform signal processing: each sample is subjected to the continuous wavelet transform to generate a corresponding time-frequency image, which is resized to a color picture of size N × N (32 × 32 in the experiments of section 4). Enough pictures are generated to be divided into a training set and a test set. For the training process, proceed to step 3; for the test process, jump to step 5.
Step 3: AlexNet feature extraction: the N × N time-frequency images of the training set are input into the improved AlexNet model for training, and the model is saved.
Step 4: LGBM fault diagnosis: the N × N time-frequency images of the training set are input into the trained AlexNet model, and the output of its second fully connected layer (the penultimate layer) is taken and input into the LGBM model for training. The data dimension is sample_Num × 1000, where sample_Num is the number of samples and 1000 is the number of neurons in the second fully connected layer of the AlexNet model.
Step 5: Testing process: the N × N time-frequency images of the test set are input into the trained AlexNet model, the output of its second fully connected layer is taken as the features extracted by AlexNet, and these features are input into the trained LGBM model, whose output is the fault diagnosis result.
4. Experimental verification
4.1 data set and Experimental Environment introduction
The present invention uses the bearing vibration data set published by Case Western Reserve University (CWRU). The CWRU bearing experiments have four variables: fault location, fault depth, motor load and sampling frequency. The data files are in MATLAB format and contain fan-end and drive-end bearing acceleration data together with motor speed data.
Since in practice a rotating machine usually operates under a non-zero load, fault diagnosis should cover as many load conditions as possible; moreover, the fault location matters more than the fault depth, because it determines which part to replace. The fault diagnosis target is therefore set to identifying the fault location of the bearing, with four classes: inner ring fault, ball fault, outer ring fault and normal. Because the CWRU data set lacks data for some conditions, the present invention uses the normal data for 1 to 3 horsepower loads and the drive-end bearing fault data at the 12 kHz sampling frequency; the specific CWRU data files used are listed in tables 1 and 2.
The experiment was performed on a computer with a GPU running the 64-bit Windows 10 operating system; the CPU model was i5-4200U and the memory was 12 GB. Programming was done in Jupyter Notebook using Python 3.7, with the deep learning frameworks TensorFlow 2.3.1 and Keras 2.4.3.
Table 1 normal data file used by the present invention
Table 2 fault data file for use with the present invention
4.2 data processing
In the CWRU data set, each operating condition was recorded for about 20 s, which at the 12,000 Hz sampling frequency gives about 240,000 data points per file. The original vibration signal therefore has to be cut into segments to generate the training and test data sets. In the present invention, the overlapping sampling method introduced in section 3.1.1 is used for this purpose. The truncation window slides along the original vibration signal with a sampling interval of 384 data points and a window size of 1,024 data points, and each movement of the window produces one sample of 1,024 data points. From the samples generated in this way for each file, the first 300 were selected, so that the 30 files in tables 1 and 2 yield 9,000 samples in total.
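The overlapping sampling and the sample counts quoted above can be checked with a short sketch. The function names `overlap_sample` and `count_windows` are hypothetical, and the signal is a placeholder list of 240,000 indices rather than real CWRU data.

```python
def overlap_sample(signal, sample_length=1024, sample_interval=384, max_samples=None):
    """Cut a 1-D signal into overlapping windows, one window every sample_interval points."""
    samples = []
    start = 0
    while start + sample_length <= len(signal):
        samples.append(signal[start:start + sample_length])
        if max_samples is not None and len(samples) == max_samples:
            break
        start += sample_interval
    return samples


def count_windows(n_points, sample_length=1024, sample_interval=384):
    """Number of complete windows that fit into a signal of n_points."""
    if n_points < sample_length:
        return 0
    return (n_points - sample_length) // sample_interval + 1


# Placeholder for one file: about 20 s at 12,000 Hz, i.e. roughly 240,000 points.
signal = list(range(240_000))
available = count_windows(len(signal))             # windows that one file can supply
samples = overlap_sample(signal, max_samples=300)  # keep only the first 300
total_samples = 30 * len(samples)                  # 30 files in tables 1 and 2
```

With these settings one file can supply 623 windows, so taking the first 300 per file is always possible, and 30 files give the 9,000 samples stated in the text. Consecutive windows share 1,024 - 384 = 640 data points.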
The 9,000 samples are processed with the continuous wavelet transform described in section 1.2, using the Morlet mother wavelet, and each time-frequency spectrogram obtained from the wavelet transform is resized to 32 × 32 pixels, giving 9,000 time-frequency pictures of uniform size. The processing results are shown in fig. 7.
As can be seen from fig. 7, the normal bearings have a more uniform energy distribution than the faulty bearings, whereas the faulty bearings show periodic high-energy bands. In the vertical direction, the fault frequencies differ from the frequency distribution of the normal bearings, whose energy is concentrated in the lower frequency band.
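How a Morlet continuous wavelet transform turns one sample into a time-frequency magnitude map can be sketched in plain Python. Several points are assumptions made for this illustration rather than details taken from the experiments: the transform is written in the standard form T(a, b) = a^(-1/2) ∫ x(t) ψ*((t - b)/a) dt, the simplified Morlet wavelet without the correction term is used, scale and frequency are related by f = f0 / a, and the test signal is a synthetic 50 Hz sine rather than bearing data. Resizing the map to 32 × 32 pixels and coloring it are not covered.

```python
import cmath
import math


def morlet(t, f0=1.0):
    """Simplified complex Morlet wavelet: a complex sinusoid inside a Gaussian envelope."""
    return math.pi ** -0.25 * cmath.exp(2j * math.pi * f0 * t) * math.exp(-t * t / 2.0)


def cwt_morlet(signal, fs, freqs, f0=1.0):
    """Return |T(a, b)| as one row per analysed frequency, one column per time shift b."""
    dt = 1.0 / fs
    n = len(signal)
    rows = []
    for f in freqs:
        a = f0 / f  # scale whose characteristic frequency equals f
        # Conjugated, scaled wavelet sampled once for every possible offset k - b.
        kernel = {k: morlet(k * dt / a, f0).conjugate() for k in range(-(n - 1), n)}
        row = []
        for b in range(n):
            total = sum(signal[k] * kernel[k - b] for k in range(n))
            row.append(abs(total) * dt / math.sqrt(a))  # Riemann sum of the integral
        rows.append(row)
    return rows


# Synthetic example: a 50 Hz sine sampled at 1,000 Hz (not CWRU data).
fs = 1000.0
signal = [math.sin(2 * math.pi * 50.0 * k / fs) for k in range(256)]
freqs = [25.0, 50.0, 100.0]
scalogram = cwt_morlet(signal, fs, freqs)
mid = len(signal) // 2
strongest = max(range(len(freqs)), key=lambda i: scalogram[i][mid])
```

In the middle of the signal the 50 Hz row carries by far the largest magnitude, which is the kind of energy band that appears in the time-frequency pictures. The direct summation is quadratic in the sample length and is meant for illustration only.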
TABLE 3 data set partitioning
4.3 neural network feature extraction capability comparison
In order to compare the feature extraction capability of different neural network structures on bearing vibration spectrograms, the improved AlexNet and LeNet-5 provided in section 3.2 are compared with EfficientNet.
TABLE 4 LeNet-5 and EfficientNet structure and parameter settings
The improved AlexNet structure is described in section 1.3. It has 17,289,484 parameters in total, 71.6% fewer than the 60,965,128 parameters of the original AlexNet, which improves the training speed of AlexNet.
Since the EfficientNet models from B0 to B7 require increasingly large input pictures, only EfficientNet-B0 is suitable for the 32 × 32 pictures used here. The structure and parameter settings of the improved LeNet-5 and EfficientNet-B0 models, from top to bottom, are shown in Table 4.
AlexNet, LeNet-5 and EfficientNet all use the cross-entropy loss function categorical_crossentropy and the SGD optimizer, with the learning rate set to 0.001. The number of training epochs is set to 30, and the training results are shown in fig. 8 and 9.
Judging from the training accuracy and loss curves, the accuracy and loss of the three models hardly change after 30 epochs, indicating convergence. The validation accuracy of EfficientNet only reaches about 85%, whereas LeNet-5 and AlexNet both reach about 98%. EfficientNet is therefore not suitable for fault diagnosis on 32 × 32 pixel bearing fault spectrograms. The training of LeNet-5 fluctuates more than that of AlexNet, so AlexNet is the more stable of the two.
The features extracted from the penultimate fully connected layers of AlexNet and LeNet-5 are reduced in dimensionality with the t-SNE tool of sklearn and visualized, as shown in FIG. 10.
It can be seen that the features extracted by LeNet-5 are difficult to classify in two places (dotted circles), where data of different classes overlap, whereas the improved AlexNet has only one such place. The improved AlexNet of the invention therefore has the better feature extraction capability. In what follows, both AlexNet and LeNet-5 are used for feature extraction, and fault diagnosis continues with LGBM classification.
4.4 bearing fault diagnosis method comprehensive comparison
In order to verify that the proposed bearing fault diagnosis method based on continuous wavelet transform and AlexNet-LGBM has the highest accuracy, the invention compares the fault diagnosis performance of several combined structures similar to AlexNet-LGBM.
The LGBM classifier is tuned by Bayesian hyperparameter optimization, and the parameter settings are shown in table 5.
The outputs of the penultimate layers of AlexNet and LeNet-5 are each input into the LGBM, gcForest and CatBoost classifiers, giving six combined classifiers, abbreviated CWT-Alex-LGBM, CWT-Alex-GCF, CWT-Alex-Cat, CWT-LeNet5-LGBM, CWT-LeNet5-GCF and CWT-LeNet5-Cat. Adding CWT-AlexNet and CWT-LeNet5, in which the neural network outputs the classification result directly through its fully connected layers, gives 8 models to compare in total. Of these, CWT-AlexNet has the same structure as in the study by Wang, and CWT-LeNet5-GCF has the same structure as in the study by Xu. Xu's study concluded that the CWT-LeNet5-GCF model outperforms CWT-LeNet5 and CWT-GCF as well as the traditional CNN model.
The above 8 models perform fault diagnosis on the 9,000 time-frequency spectrogram samples of size 32 × 32 obtained in section 4.2. The experiment is run 5 times, recording the accuracy and the prediction time on a test set of 1,800 samples, which gives the experimental results in table 6.
TABLE 5 LGBM parameter set
Table 6 test set fault diagnosis results of eight models
As can be seen from Table 6, the proposed bearing fault diagnosis method based on continuous wavelet transform and AlexNet-LGBM (CWT-Alex-LGBM in the table) reaches an accuracy of 99.712%, the highest among the 8 models and higher than the CWT-AlexNet model of Wang and the CWT-LeNet5-gcForest model of Xu (98.788% and 99.598%, respectively).
CWT-AlexNet and CWT-LeNet5 classify with the Softmax of the fully connected output layer, which is less effective than classifying the features extracted by the neural network with the LGBM, gcForest and CatBoost classifiers. Their average accuracies are only 98.788% and 98.186%, well below the accuracy obtained by reclassifying the features with LGBM, gcForest or CatBoost (more than 99.5%). Their results over repeated predictions are also very unstable, with variances of 2.147 and 1.971 respectively, far higher than those of the other combined models.
To compare the reclassification performance of the LGBM, gcForest and CatBoost classifiers more intuitively, the results of the six combined models in the table are plotted in FIGS. 11-13.
As can be seen from FIG. 12, for both neural networks the reclassification accuracy ranks as LGBM > gcForest > CatBoost. As can be seen from FIG. 13, the prediction time of LGBM and CatBoost is shorter than that of gcForest, and LeNet-5 is generally faster than AlexNet, although both are of the same order of magnitude.
To address fault feature extraction and fault diagnosis for rolling bearings, the invention provides a bearing fault diagnosis method based on continuous wavelet transform and AlexNet-LGBM. The experimental comparison shows that the method has the highest accuracy, 99.712%, compared with the other 7 methods; its prediction time for 1,800 samples is 1.47 seconds, of the same order of magnitude as the other models; and the variance of its accuracy over five predictions is only 0.063, making it more stable than the other 6 methods. It therefore has the best overall performance.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented wholly or partially in software, it may take the form of a computer program product that includes one or more computer instructions. When the computer instructions are loaded or executed on a computer, the flows or functions according to embodiments of the invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
The above description only illustrates specific embodiments of the present invention and is not to be construed as limiting its scope of protection; all modifications, equivalents and improvements made within the spirit and principle of the invention are intended to be covered by the scope of the appended claims.
Claims (10)
1. A bearing fault diagnosis method, characterized by comprising:
firstly, extracting time-frequency characteristics from an original vibration signal of a bearing by using continuous wavelet transform, and converting the time-frequency characteristics into a two-dimensional image with 32 multiplied by 32 pixels; secondly, extracting fault features of the time-frequency spectrum by using an improved AlexNet model; and finally, for fault diagnosis classification, selecting optimal model parameters by an LGBM classification algorithm and Bayesian optimization.
2. The bearing fault diagnosis method according to claim 1, characterized by comprising the steps of:
step one, signal sampling: taking every sample_length consecutive data points of the original vibration data as one sample, and sampling continuously with overlap at a sampling interval of sample_interval;
step two, Morlet continuous wavelet transform signal processing: carrying out the continuous wavelet transform on each sample to generate a corresponding time-frequency image, resizing the time-frequency image to a color image of size N × N, and generating enough images to be divided into a training set and a test set; for the training process, executing step three; for the test process, jumping to step five;
step three, AlexNet feature extraction: inputting the N × N time-frequency images of the training set into an improved AlexNet model for training, and saving the model;
step four, LGBM fault diagnosis: inputting the N × N time-frequency images of the training set into the trained AlexNet model, taking the output of the penultimate fully connected layer, and inputting it into an LGBM model for training, the data dimension being sample_Num × 1000; wherein sample_Num represents the number of samples, and 1000 is the number of neurons of the second fully connected layer of the AlexNet model;
step five, the testing process: inputting the N × N time-frequency images of the test set into the trained AlexNet model, taking the output of the second fully connected layer of the AlexNet model as the features extracted by AlexNet, and inputting these features into the trained LGBM model, wherein the output of the LGBM model is the fault diagnosis result.
3. The bearing fault diagnosis method according to claim 2, wherein in the first step, the signal sampling comprises:
selecting sample_length consecutive data points from the original vibration signal as an original sample; generating a corresponding time-frequency image from the sample_length consecutive sampling points through the continuous wavelet transform; resizing the time-frequency image to a suitable size N × N; selecting, with overlap, the sample_length consecutive data points that start one sampling interval sample_interval later as another sample and generating another image of size N × N; and repeating the above process to generate sufficient training and test images.
4. The bearing fault diagnosis method according to claim 2, wherein in the second step, the Morlet continuous wavelet transform signal processing comprises:
the continuous wavelet transform of the signal x(t) with the wavelet function ψ(t) is given by the following formula:
among the different wavelets, complex or analytic wavelets have a Fourier transform that is zero for negative frequencies; such a complex wavelet can be used to separate the phase and amplitude components of the signal; the Morlet wavelet is the most commonly used complex wavelet, and continuous wavelet analysis with the complex Morlet wavelet has the advantage of separating information in the wavelet domain and simplifying the relationship between transform ridges and instantaneous frequency; the bearing vibration signal is processed using the Morlet wavelet, defined as:
$$\psi(t)=\pi^{-1/4}\left(e^{i2\pi f_0 t}-e^{-(2\pi f_0)^2/2}\right)e^{-t^2/2} \qquad (2)$$
wherein $f_0$ is the center frequency of the mother wavelet; the second term in brackets is the correction term, which corrects for the non-zero mean of the complex sinusoid multiplied by the Gaussian term; in practice, this term is negligible for values of $f_0 \gg 0$ and can be ignored, in which case the Morlet wavelet is written as:
$$\psi(t)=\pi^{-1/4}e^{i2\pi f_0 t}e^{-t^2/2} \qquad (3)$$
wherein the Morlet wavelet is simply a complex sinusoid $e^{i2\pi f_0 t}$ within a Gaussian envelope $e^{-t^2/2}$; the $\pi^{-1/4}$ term is a normalization factor that ensures that the wavelet has unit energy;
the fourier transform of the Morlet wavelet is as follows:
wherein the Fourier transform of said Morlet wavelet has the form of a Gaussian function shifted by $f_0$ along the frequency axis; the center frequency of the Gaussian spectrum is typically chosen as the characteristic frequency of the Morlet wavelet; the characteristic frequency is set for the mother wavelet and varies with the wavelet scale a as follows:
the energy spectrum, i.e. the squared magnitude of the fourier transform, is calculated as follows:
the integrated Morlet wavelet energy is equal to 1 according to equation (3);
and converting the one-dimensional vibration signal into a picture through continuous wavelet transformation, wherein the picture comprises the corresponding relation between time and frequency.
5. The bearing fault diagnosis method according to claim 2, wherein in step three, the AlexNet feature extraction comprises:
AlexNet was modified as follows:
(1) improving the dimension of model input: the 224 × 224 input image size of the classical AlexNet is still too large for bearing fault diagnosis based on vibration signals; if the vibration signal of the bearing is acquired at a high frequency, the pictures generated by the wavelet transform of all samples occupy a large amount of storage space, so color pictures of size 32 × 32 are adopted as input;
(2) convolutional layer activation function improvement: the ReLU function f(z) = max(0, z) has limitations, because its gradient during iterative updating is calculated as:
a variant of ReLU, PReLU, is used, expressed as:
PReLU differs from ReLU in that when z <0, the value is a linear function with slope a, and the gradient update is calculated as:
the value of a is continuously updated through back propagation, and is iteratively optimized together with the weight and the bias parameters in the network;
(3) improvement of a full connection layer and an output layer: the bearing fault diagnosis comprises 1 normal type and 3 fault types, and four types are classified, so that the size of an output layer of the improved AlexNet structure is set to be 4; as the output layer becomes smaller, the size of the second fully connected layer is set to 1000.
6. The bearing fault diagnostic method of claim 5, wherein the modified AlexNet structure comprises:
(1) convolutional layer
The convolutional layer is connected to the previous layer by local connections and weight sharing, and the convolution operation is as follows:
wherein $h_j$ denotes the j-th output feature map of the current convolutional layer; $X_i$ denotes the i-th output feature map of the previous layer, i.e. the input of the current convolutional layer; the operator denotes the convolution operation; the parameter matrix $W_{ij}$ is the convolution kernel that maps the i-th input feature map to the j-th output feature map in the current layer; $b_j$ is the bias corresponding to the j-th output feature map of the current convolutional layer; f(x) is the nonlinear activation function, corresponding to the PReLU function shown in equation (8);
(2) pooling layer
The pooling layer downsamples after the convolution operation and further reduces the dimension of the extracted features; max pooling is used, extracting the maximum values from the convolution output layer $Y_{cn}$ as follows:
wherein $S_{M \times N}$ is the pooling scale matrix; M and N are the dimensions of S; during pooling, S slides over $Y_{cn}$ with a fixed step size until the whole of $Y_{cn}$ has been scanned; if S is a 3 × 3 matrix, $Y_{cn}$ is reduced to 1/9 of its size and assigned to $P_{cn}$ in the pooling output layer;
(3) Full connection layer
After the last convolutional layer and pooling layer, the data reach the Flatten layer, where they are flattened into one dimension, and then pass through two fully connected layers, in which each neuron is connected to all neurons of the previous layer; Dropout with a drop rate of 0.5 is applied to the outputs of the two fully connected layers, so that in each iteration part of the units are randomly discarded by the network and not updated; the structure of the network thus changes in every iteration, which is equivalent to ensemble learning over networks of various structures, and averaging over the combination of many networks effectively prevents overfitting;
the last layer is the output layer; a Softmax classifier is used to classify the fault types; the input pictures in the training data set are denoted $x_k$ and their labels $y_k$, where y ∈ (1, 2, ..., J) denotes the fault class; for each x, Softmax estimates the probability p(y = j | x) of each label j ∈ (1, 2, ..., J); the Softmax activation function is expressed as follows:
where θ is the weight matrix of the Softmax layer and $\theta_i$ is the i-th row vector of θ;
(4) parameter updating
To accommodate the multi-classification fault diagnosis task, the loss function is set as a cross-entropy loss function, expressed as:
wherein the first quantity is the predicted probability that the i-th sample belongs to class k; the second quantity is the true probability, which is 1 if the true class of the i-th sample is k and 0 otherwise; $W^{(l)}$ is the parameter matrix of the l-th layer; the first term of the formula measures the cross entropy between the prediction and the true class, and the loss function is at its minimum when the predicted value equals the true value; the second term is an L2 regularization term, and the coefficient λ is the weight decay parameter;
the model is trained using stochastic gradient descent, and the parameters W and bias b are updated in each iteration as follows:
wherein α is the learning rate, which controls the magnitude of the gradient step in each iteration; the residual of the loss function at the j-th node of the l-th layer is denoted $\delta_j^{(l)}$, and its recurrence formula is expressed as:
the gradient of the loss function with respect to the parameters is expressed as:
for formula (13), $y_k$ takes the value 1 for only one class k and 0 for the rest; given the true class, then:
the residual of the last layer is obtained from the Softmax activation function of formula (12):
the residuals of the other layers, $\delta^{(L-1)}, \ldots, \delta^{(1)}$, are calculated according to the recurrence formula (15);
the bearing fault feature extraction model is built with the Python-based Keras deep learning framework, which is supported by a TensorFlow backend; the SGD optimizer, cross-entropy loss function and normalization method of Keras are chosen to train the parameters.
7. The bearing fault diagnosis method of claim 2, wherein in step four, the LGBM fault diagnosis, including gradient-based one-side sampling and exclusive feature bundling, comprises:
(1) a gradient-based one-side sampling (GOSS) algorithm: GOSS concentrates on training instances with large gradients, while instances with small gradients are randomly sampled, and the effect of this sampling on the data distribution is compensated by a constant multiplier when the information gain is calculated; the GOSS algorithm is as follows:
input: training data I with n instances $x_1, \ldots, x_n$, number of iterations d, sampling rates a and b for large-gradient and small-gradient data, loss function loss, and weak learner L;
output: a trained strong learner;
step 1: initialization: let topN = a × len(I) denote the number of large-gradient data samples; add L to the model list models, and set the weight w of each training instance to 1;
step 2: predict the training data with the model list models, compute the gradient g of each instance from the loss function loss, and sort the training data in descending order of g;
step 3: take the top topN sorted training instances as the large-gradient subset A, randomly sample $b \times |A^c|$ instances from the remaining data set $A^c$ as the small-gradient subset B, and denote the union of the large-gradient and small-gradient subsets as usedSet;
step 4: multiply the weight w of the small-gradient samples by the coefficient (1 - a)/b;
step 5: input the data I, negative gradients -g and weights w corresponding to the training set usedSet into the learner L for training to obtain a new model newModel;
the instances are split according to the estimated variance gain $V_j(d)$ over the subsets A and B:
wherein $A_l=\{x_i\in A: x_{ij}\le d\}$, $A_r=\{x_i\in A: x_{ij}>d\}$, $B_l=\{x_i\in B: x_{ij}\le d\}$, $B_r=\{x_i\in B: x_{ij}>d\}$, and the coefficient (1 - a)/b is used to normalize the sum of gradients over B to the size of $A^c$; newModel is added to the model list models;
step 6: repeat steps 2 to 5 until the number of iterations d is reached or convergence is achieved;
(2) the exclusive feature bundling (EFB) algorithm comprises two steps: bundle generation and exclusive feature merging;
the bundle generation algorithm determines which mutually exclusive features can be merged, the features that can be merged being grouped together into a bundle; exclusive feature merging merges each bundle into a single feature; bundle generation uses Greedy Bundling, in which the features are first taken as vertices and an edge is added between every two features that are not mutually exclusive, so that the optimal bundling problem reduces to a graph coloring problem, which is then solved with a greedy algorithm; exclusive feature merging constructs the feature bundles by letting the mutually exclusive features reside in different bins, which can be implemented simply by adding offsets to the values of the original features;
the output of the penultimate layer of AlexNet is classified using the LGBMClassifier class of the Python lightgbm package;
(3) bayesian hyper-parameter optimization
performing hyperparameter tuning of the LGBM model training process using HyperOpt; HyperOpt provides an easy-to-use Bayesian hyperparameter optimization algorithm that performs hyperparameter optimization through sequential model-based optimization (SMBO), which is a Bayesian optimization technique;
Bayesian optimization is a model-based optimization algorithm built around a model of the objective function; it searches for the maximum of an unknown objective function from which samples can be obtained; as with all model-based optimization algorithms, a regression method is used to build a model of the objective function, the next point to evaluate is selected according to that model, and the model is then updated;
the basic algorithm of Bayesian optimization is as follows:
step 1: placing a Gaussian process prior on the objective function f;
step 2: observing f at $n_0$ points according to an initial space-filling experimental design, and setting $n = n_0$;
step 3: while $n \le N$, executing a loop: updating the posterior probability distribution on f using all available data; letting $x_n$ be a maximizer of the acquisition function over x, where the acquisition function is computed using the current posterior distribution; observing $y_n = f(x_n)$; incrementing n by 1;
step 4: returning a solution: either the evaluated point with the largest f(x), or the point with the largest posterior mean;
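The four steps above can be sketched as a small, dependency-free loop. Several choices here are assumptions made for illustration and are not specified by the text: the squared-exponential kernel and its `length` and `var` values, the zero prior mean, the upper-confidence-bound acquisition function with weight `kappa`, and the restriction of the search to a finite 1-D grid. It is not how HyperOpt is implemented.

```python
import math


def rbf(a, b, length=0.3, var=1.0):
    """Squared-exponential kernel (assumed; hyper-parameters are illustrative)."""
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k = [rbf(a, x_new) for a in xs]
    mean = sum(ki * ai for ki, ai in zip(k, solve(K, ys)))
    var = rbf(x_new, x_new) - sum(ki * vi for ki, vi in zip(k, solve(K, k)))
    return mean, math.sqrt(max(var, 0.0))


def bayes_opt(f, grid, n0=3, N=8, kappa=2.0):
    # Step 2: space-filling initial design (n0 evenly spaced grid points).
    step = (len(grid) - 1) // (n0 - 1)
    xs = [grid[i * step] for i in range(n0)]
    ys = [f(x) for x in xs]
    n = n0
    # Step 3: refit the posterior, maximize the acquisition function, observe.
    while n <= N:
        def ucb(x):
            mean, std = gp_posterior(xs, ys, x)
            return mean + kappa * std
        x_next = max((x for x in grid if x not in xs), key=ucb)
        xs.append(x_next)
        ys.append(f(x_next))
        n += 1
    # Step 4: return the evaluated point with the largest observed f(x).
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best], xs, ys


if __name__ == "__main__":
    grid = [i / 50 for i in range(51)]
    best_x, best_y, _, _ = bayes_opt(lambda x: -(x - 0.7) ** 2, grid)
    print(best_x, best_y)
```

The term `kappa * std` rewards uncertain regions (exploration) while `mean` rewards regions the model already predicts to be good (exploitation).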
the objective function f is usually unknown; a Gaussian process defines a probability distribution over functions: for each point x, f(x) follows a Gaussian distribution determined by the mean μ and the standard deviation σ, that is, $f(x) \sim \mathcal{N}(\mu(x), \sigma^2(x))$;
to estimate μ(x) and σ(x), a Gaussian process is fitted to the data; assuming that each observation f(χ) is a sample from a normal distribution, if there is a data set consisting of multiple observations $f(\chi_1), f(\chi_2), \ldots, f(\chi_t)$, then the vector $[f(\chi_1), f(\chi_2), \ldots, f(\chi_t)]$ is a sample from a multivariate normal distribution, which is defined by a mean vector and a covariance matrix, so that the Gaussian process is an n-variate normal distribution, where n is the number of observations; the covariance matrix is defined by a kernel function $k(\chi_1, \chi_2)$, which expresses that distant samples are nearly uncorrelated while nearby samples are highly correlated; this reflects the prior assumption that the function tends to be smooth, so that two observations with similar $\chi_1$ and $\chi_2$ values are likely to be correlated;
given a set of observations $P_{1:t} = f(\chi_{1:t})$ and the sampling noise, the posterior of the Gaussian process is calculated as follows:
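For reference, the standard Gaussian process regression posterior at a new point $\chi_{t+1}$, which this passage appears to follow, can be written as below. The symbols $\sigma_{noise}^2$ (sampling noise variance), $\mathbf{K}$ (the $t \times t$ kernel matrix with entries $k(\chi_i, \chi_j)$), $\mathbf{k}$ (the vector of $k(\chi_i, \chi_{t+1})$), and $\boldsymbol{\mu}_0$ (the mean function evaluated at $\chi_{1:t}$) are assumed notation and are not taken from the original.

```latex
\begin{aligned}
P\left(f_{t+1} \mid P_{1:t}, \chi_{t+1}\right) &= \mathcal{N}\!\left(\mu_t(\chi_{t+1}),\ \sigma_t^2(\chi_{t+1})\right) \\
\mu_t(\chi_{t+1}) &= \mu_0(\chi_{t+1}) + \mathbf{k}^{\top}\left[\mathbf{K} + \sigma_{noise}^2 I\right]^{-1}\left(P_{1:t} - \boldsymbol{\mu}_0\right) \\
\sigma_t^2(\chi_{t+1}) &= k(\chi_{t+1}, \chi_{t+1}) - \mathbf{k}^{\top}\left[\mathbf{K} + \sigma_{noise}^2 I\right]^{-1}\mathbf{k}
\end{aligned}
```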
using the Gaussian process model, Bayesian optimization searches for the maximum f(x) of the unknown objective function; the next χ to test is selected as the maximizer of the acquisition function, which balances exploration, i.e., improving the model in the less explored parts of the search space, against exploitation, i.e., favoring the parts that the model predicts to be promising; after each observation, the algorithm updates the Gaussian process to take the new data into account; since all points of the search space are initially assumed to be equally likely to be good, the Gaussian process is initialized with a constant mean; the model is gradually improved after each observation;
the Gaussian process is completely specified by the mean function μ(x) and the kernel function $k(\chi_1, \chi_2)$;
the goal is to learn the kernel hyper-parameters θ, namely the characteristic length scale $l^2$ and the total variance, so that the probability of the data given the kernel function is maximized; the marginal likelihood is calculated as follows:
wherein $\mu_0$ is the mean function.
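As a point of reference, the log marginal likelihood of Gaussian process regression has the standard form below. Here $\mathbf{K}_\theta$ (the kernel matrix under hyper-parameters θ), $\sigma_{noise}^2$ (the sampling noise variance), and $\boldsymbol{\mu}_0$ (the mean function evaluated at $\chi_{1:t}$) are assumed notation; the original formula may be arranged differently.

```latex
\log P\left(P_{1:t} \mid \chi_{1:t}, \theta\right)
  = -\tfrac{1}{2}\left(P_{1:t} - \boldsymbol{\mu}_0\right)^{\top}
      \left[\mathbf{K}_\theta + \sigma_{noise}^2 I\right]^{-1}
      \left(P_{1:t} - \boldsymbol{\mu}_0\right)
    - \tfrac{1}{2}\log\left|\mathbf{K}_\theta + \sigma_{noise}^2 I\right|
    - \tfrac{t}{2}\log 2\pi
```

The first term measures how well the model fits the observations, the second penalizes model complexity, and the third is a normalization constant.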
8. A bearing fault diagnosis system for implementing the bearing fault diagnosis method according to any one of claims 1 to 7, characterized in that the bearing fault diagnosis system comprises:
the signal sampling module is used for taking every sample_length consecutive data points of the original vibration data as one sample, and successively sampling the original vibration data at a sampling interval of sample_interval in an overlapped sampling manner;
the wavelet transform signal processing module is used for performing Morlet continuous wavelet transform signal processing: performing a continuous wavelet transform on each sample to generate a corresponding time-frequency image, resizing the time-frequency image to a color image of size N×N, and generating enough images to be divided into a training set and a test set; for the training process, the AlexNet feature extraction module is executed; for the test process, execution jumps to the test module;
the AlexNet feature extraction module is used for inputting the N×N time-frequency images of the training set into an improved AlexNet model for training, and saving the model;
the LGBM fault diagnosis module is used for inputting the N×N time-frequency images of the training set into the trained AlexNet model, taking out the output of the penultimate fully-connected layer, and inputting that output into the LGBM model for training, the data dimension being sample_Num × 1000; wherein sample_Num represents the number of samples, and 1000 is the number of neurons in the second fully-connected layer of the AlexNet model;
and the test module is used for inputting the N×N time-frequency images of the test set into the trained AlexNet model, taking out the output of the second fully-connected layer of the AlexNet model as the features extracted by AlexNet, and inputting these features into the trained LGBM model, the output of the LGBM model being the fault diagnosis result.
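The overlapped sampling performed by the signal sampling module can be sketched as a sliding window. The parameter names `sample_length` and `sample_interval` come from the claim; the function name and the choice to drop a trailing partial window are assumptions made for this example.

```python
from typing import List, Sequence


def overlapped_samples(signal: Sequence[float],
                       sample_length: int,
                       sample_interval: int) -> List[List[float]]:
    """Cut a 1-D vibration signal into windows of sample_length points,
    advancing the window start by sample_interval points each time.
    Windows overlap whenever sample_interval < sample_length."""
    if sample_length <= 0 or sample_interval <= 0:
        raise ValueError("sample_length and sample_interval must be positive")
    last_start = len(signal) - sample_length  # last start index that still fits
    return [
        list(signal[start:start + sample_length])
        for start in range(0, last_start + 1, sample_interval)
    ]


if __name__ == "__main__":
    windows = overlapped_samples(list(range(10)), sample_length=4, sample_interval=2)
    for w in windows:
        print(w)
```

With `sample_interval` smaller than `sample_length`, a recording of fixed length yields many more samples than non-overlapping segmentation would, which helps when enough images must be generated for both the training and test sets.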
9. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of:
(1) signal sampling: taking every sample_length consecutive data points of the original vibration data as one sample, and successively sampling at a sampling interval of sample_interval in an overlapped sampling manner;
(2) Morlet continuous wavelet transform signal processing: performing a continuous wavelet transform on each sample to generate a corresponding time-frequency image, resizing the time-frequency image to a color image of size N×N, and generating enough images to be divided into a training set and a test set; for the training process, executing step (3); for the test process, jumping to step (5);
(3) AlexNet feature extraction: inputting the N×N time-frequency images of the training set into an improved AlexNet model for training, and saving the model;
(4) LGBM fault diagnosis: inputting the N×N time-frequency images of the training set into the trained AlexNet model, taking out the output of the penultimate fully-connected layer, and inputting that output into the LGBM model for training, the data dimension being sample_Num × 1000; wherein sample_Num represents the number of samples, and 1000 is the number of neurons in the second fully-connected layer of the AlexNet model;
(5) testing process: inputting the N×N time-frequency images of the test set into the trained AlexNet model, taking out the output of the second fully-connected layer of the AlexNet model as the features extracted by AlexNet, and inputting these features into the trained LGBM model, the output of the LGBM model being the fault diagnosis result.
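Step (2) can be illustrated with a direct, unoptimized Morlet continuous wavelet transform. The sketch below only computes the magnitude matrix that a time-frequency image would be rendered from; the centre frequency `w0 = 6`, the truncation of the wavelet at five standard deviations, and the scale-to-frequency conversion are assumptions, and colour mapping and resizing to N×N are left out.

```python
import cmath
import math


def morlet(t: float, w0: float = 6.0) -> complex:
    """Complex Morlet mother wavelet (small admissibility correction omitted)."""
    return math.pi ** -0.25 * cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)


def freq_to_scale(freq: float, dt: float = 1.0, w0: float = 6.0) -> float:
    """Approximate scale (in samples) whose Morlet wavelet is centred on `freq`."""
    return w0 / (2.0 * math.pi * freq * dt)


def cwt_morlet(signal, scales, w0: float = 6.0):
    """Return |W(scale, time)| as a list of rows, one row per scale."""
    n = len(signal)
    rows = []
    for s in scales:
        half_width = int(5 * s)  # wavelet is negligible beyond 5 standard deviations
        row = []
        for b in range(n):
            acc = 0j
            for k in range(max(0, b - half_width), min(n, b + half_width + 1)):
                acc += signal[k] * morlet((k - b) / s, w0).conjugate()
            row.append(abs(acc) / math.sqrt(s))
        rows.append(row)
    return rows


if __name__ == "__main__":
    # A pure tone at 0.1 cycles per sample should light up the matching scale.
    tone = [math.sin(2 * math.pi * 0.1 * k) for k in range(128)]
    scales = [freq_to_scale(f) for f in (0.2, 0.1, 0.05)]
    tfr = cwt_morlet(tone, scales)
    print([round(row[64], 3) for row in tfr])
```

Each row of the result corresponds to one frequency band and each column to one time index, so mapping the magnitudes to colours gives the time-frequency image that is then resized and passed to the network.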
10. An information data processing terminal, characterized in that the information data processing terminal is used to implement the bearing fault diagnosis system according to claim 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110997171.6A CN113834656B (en) | 2021-08-27 | 2021-08-27 | Bearing fault diagnosis method, system, equipment and terminal |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113834656A (en) | 2021-12-24 |
CN113834656B (en) | 2024-04-30 |
Family
ID=78961351
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110997171.6A (Active; CN113834656B (en)) | Bearing fault diagnosis method, system, equipment and terminal | 2021-08-27 | 2021-08-27 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113834656B (en) |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160041070A1 (en) * | 2014-08-05 | 2016-02-11 | 01dB-METRAVIB, Société par Actions Simplifiée | Automatic Rotating-Machine Fault Diagnosis With Confidence Level Indication |
CN107179194A (en) * | 2017-06-30 | 2017-09-19 | 安徽工业大学 | Rotating machinery fault etiologic diagnosis method based on convolutional neural networks |
CN108426713A (en) * | 2018-02-26 | 2018-08-21 | 成都昊铭科技有限公司 | Rolling bearing Weak fault diagnostic method based on wavelet transformation and deep learning |
CN111274911A (en) * | 2020-01-17 | 2020-06-12 | 河海大学 | Dense fog monitoring method based on wireless microwave attenuation characteristic transfer learning |
US20200209109A1 (en) * | 2018-12-28 | 2020-07-02 | Shanghai United Imaging Intelligence Co., Ltd. | Systems and methods for fault diagnosis |
CN111442926A (en) * | 2020-01-11 | 2020-07-24 | 哈尔滨理工大学 | Fault diagnosis method for rolling bearings of different models under variable load based on deep characteristic migration |
CN111504675A (en) * | 2020-04-14 | 2020-08-07 | 河海大学 | On-line diagnosis method for mechanical fault of gas insulated switchgear |
US20200302234A1 (en) * | 2019-03-22 | 2020-09-24 | Capital One Services, Llc | System and method for efficient generation of machine-learning models |
CN111721536A (en) * | 2020-07-20 | 2020-09-29 | 哈尔滨理工大学 | Rolling bearing fault diagnosis method for improving model migration strategy |
CN112036435A (en) * | 2020-07-22 | 2020-12-04 | 温州大学 | Brushless direct current motor sensor fault detection method based on convolutional neural network |
US20210020360A1 (en) * | 2019-07-15 | 2021-01-21 | Wuhan University | Internal thermal fault diagnosis method of oil-immersed transformer based on deep convolutional neural network and image segmentation |
US20210065065A1 (en) * | 2019-09-03 | 2021-03-04 | Palo Alto Research Center Incorporated | Method for classification based diagnosis with partial system model information |
CN112733612A (en) * | 2020-12-18 | 2021-04-30 | 华中科技大学 | Cross-domain rotating machinery fault diagnosis model establishing method and application thereof |
CN113159218A (en) * | 2021-05-12 | 2021-07-23 | 北京联合大学 | Radar HRRP multi-target identification method and system based on improved CNN |
Non-Patent Citations (1)
Title |
---|
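Li Xiangwei et al.: "Power system transient stability assessment based on bidirectional long short-term memory network and convolutional neural network", Science Technology and Engineering (《科学技术与工程》), vol. 20, no. 7 * |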
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114282579A (en) * | 2021-12-30 | 2022-04-05 | 浙大城市学院 | Aviation bearing fault diagnosis method based on variational modal decomposition and residual error network |
CN114609994A (en) * | 2022-02-24 | 2022-06-10 | 天津大学 | Fault diagnosis method and device based on multi-granularity regularization rebalance incremental learning |
CN114609994B (en) * | 2022-02-24 | 2023-11-07 | 天津大学 | Fault diagnosis method and device based on multi-granularity regularized rebalancing increment learning |
CN114646468A (en) * | 2022-02-28 | 2022-06-21 | 南京航空航天大学 | Subway wheel bearing fault diagnosis method based on small samples |
CN114646468B (en) * | 2022-02-28 | 2022-12-23 | 南京航空航天大学 | Subway wheel bearing fault diagnosis method based on small samples |
CN114692694A (en) * | 2022-04-11 | 2022-07-01 | 合肥工业大学 | Equipment fault diagnosis method based on feature fusion and integrated clustering |
CN114692694B (en) * | 2022-04-11 | 2024-02-13 | 合肥工业大学 | Equipment fault diagnosis method based on feature fusion and integrated clustering |
CN114964476B (en) * | 2022-05-27 | 2023-08-22 | 中国石油大学(北京) | Fault diagnosis method, device and equipment for oil and gas pipeline system moving equipment |
CN114964476A (en) * | 2022-05-27 | 2022-08-30 | 中国石油大学(北京) | Fault diagnosis method, device and equipment for oil and gas pipeline system power equipment |
CN115017121A (en) * | 2022-08-05 | 2022-09-06 | 山东天意机械股份有限公司 | Concrete production equipment data storage system |
CN115017121B (en) * | 2022-08-05 | 2022-10-25 | 山东天意机械股份有限公司 | Data storage system of concrete production equipment |
CN116434029A (en) * | 2023-06-15 | 2023-07-14 | 西南石油大学 | Drinking detection method |
CN116434029B (en) * | 2023-06-15 | 2023-08-18 | 西南石油大学 | Drinking detection method |
CN116577061B (en) * | 2023-07-14 | 2023-09-15 | 常州市建筑科学研究院集团股份有限公司 | Detection method for wind resistance of metal roof, computer equipment and medium |
CN116577061A (en) * | 2023-07-14 | 2023-08-11 | 常州市建筑科学研究院集团股份有限公司 | Detection method for wind resistance of metal roof, computer equipment and medium |
CN117171625A (en) * | 2023-10-23 | 2023-12-05 | 云和恩墨(北京)信息技术有限公司 | Intelligent classification method and device for working conditions, electronic equipment and storage medium |
CN117171625B (en) * | 2023-10-23 | 2024-02-06 | 云和恩墨(北京)信息技术有限公司 | Intelligent classification method and device for working conditions, electronic equipment and storage medium |
CN117686226A (en) * | 2024-02-04 | 2024-03-12 | 南京凯奥思数据技术有限公司 | Automatic bearing fault diagnosis method and system based on energy ratio and energy sum |
CN117686226B (en) * | 2024-02-04 | 2024-04-16 | 南京凯奥思数据技术有限公司 | Automatic bearing fault diagnosis method and system based on energy ratio and energy sum |
Also Published As
Publication number | Publication date |
---|---|
CN113834656B (en) | 2024-04-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||