CN113281048A - Rolling bearing fault diagnosis method and system based on relational knowledge distillation - Google Patents
- Publication number: CN113281048A (application CN202110716619.2A / CN202110716619A)
- Authority: CN (China)
- Legal status: Granted (the legal status is an assumption and not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01M—TESTING STATIC OR DYNAMIC BALANCE OF MACHINES OR STRUCTURES; TESTING OF STRUCTURES OR APPARATUS, NOT OTHERWISE PROVIDED FOR
- G01M13/00—Testing of machine parts
- G01M13/04—Bearings
- G01M13/045—Acoustic or vibration analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2218/00—Aspects of pattern recognition specially adapted for signal processing
- G06F2218/12—Classification; Matching
Abstract
The invention discloses a rolling bearing fault diagnosis method and system based on relational knowledge distillation, belonging to the technical field of fault diagnosis. After the raw vibration signals of the bearing are collected, a time-frequency graph is constructed for each processing sample and used as a fault sample, which serves as the input of the fault diagnosis system. The student model simultaneously learns the soft labels output by the teacher model's Softmax layer and the multivariate relations among the outputs of multiple samples at the last pooling layer; that is, the student network learns both from the structure of the teacher network and from its single-sample outputs, which effectively improves the classification performance of the fault diagnosis system without increasing memory usage or training time. The invention realizes bearing fault diagnosis through transfer learning by relational knowledge distillation, and the idea of replacing a large model with a small one effectively reduces computational complexity.
Description
Technical Field
The invention belongs to the technical field of fault diagnosis, and particularly relates to a rolling bearing fault diagnosis method and system based on relational knowledge distillation.
Background
Rolling bearings are a key component of rotating machinery and also one of its highest-failure-rate elements; according to incomplete statistics, 30% of rotating-equipment failures are caused by rolling bearing faults. Condition monitoring and fault diagnosis of rolling bearings play an important role in understanding equipment operating performance and discovering potential faults, and can effectively improve the management level and maintenance efficiency of mechanical equipment.
At present, the new generation of artificial intelligence technology represented by deep learning has made the establishment of end-to-end, deeply integrated intelligent fault diagnosis methods a new goal of the industrial intelligence era. Compared with traditional models, deep learning models have deeper network layers and strong nonlinear computing capability, can better approximate complex functional relationships, and have been applied successfully in the field of fault diagnosis. However, the success of deep learning fault diagnosis depends on a large amount of labeled, high-quality data; when training on large-scale data sets, the number of network layers is often increased to handle complex data distributions, and the number of model parameters can reach the millions, so a great deal of computing power and resources are consumed during training to achieve higher accuracy. Such models are large in scale and expensive to train from the outset, and are difficult to deploy in actual engineering because of limits on computing resources, response speed, and the like.
Patent CN110162018A discloses an incremental equipment fault diagnosis method based on knowledge distillation and hidden-layer sharing. Its main idea is: using knowledge distillation and hidden-layer sharing, a shallow equipment fault diagnosis model is given better data feature extraction capability, and its fault classification performance is improved. To address the continual growth of industrial data and the updating of fault diagnosis models on edge devices, incremental learning of the model is realized through methods such as effective-sample identification, data-set reconstruction, and fine-tuning of the pre-trained model. The method overcomes the network-bandwidth and latency requirements of transmitting massive real-time industrial-equipment data, improves the accuracy of shallow equipment fault diagnosis, and supports incremental learning; simulation experiments on bearing operating-state data show that, under limited computing resources, it improves edge-cloud cooperative data-transmission efficiency, achieves accurate fault prediction and classification, and supports data learning and processing. Patent CN112504678A discloses a motor bearing fault diagnosis method based on knowledge distillation. Its main idea is: a model trained on vibration signals serves as the teacher model, current and rotating-speed signals are the inputs of the student model, and the student model is trained with the dark knowledge provided by the teacher model so that it converges stably and performs effective diagnosis.
However, the following drawbacks exist: 1) most methods extract effective features by analyzing the collected time-series signals, for example extracting the most relevant fault features through multiple rounds of screening such as preprocessing and feature selection of the collected vibration signals and using them as the input of a fault classifier; but manual screening is time-consuming and easily loses effective information; 2) the student model only learns the Softmax output at the end of the teacher model, i.e. only the behavior of a single sample on the teacher model is considered, so the fault diagnosis accuracy of the student model is low.
Disclosure of Invention
Aiming at the defects and the improvement requirements of the prior art, the invention provides a rolling bearing fault diagnosis method and system based on relational knowledge distillation, and aims to improve the real-time response efficiency and accuracy of a fault diagnosis model.
To achieve the above object, according to a first aspect of the present invention, there is provided a rolling bearing failure diagnosis method based on relational knowledge distillation, the method including:
a preparation stage:
acquiring vibration signal sections of a rolling bearing in a normal state and a fault state, and measuring each state for multiple times; taking a plurality of continuous sampling points with the same state type as a processing sample, constructing each processing sample into a time-frequency graph, and taking the < time-frequency graph and the corresponding state type > as training samples to obtain a training sample set; constructing a teacher model-student model;
a training stage:
pre-training a teacher model by using a training sample set; simultaneously inputting a plurality of training samples into a pre-trained teacher model to obtain a plurality of corresponding features output by the last pooling layer of the teacher model, and taking the features as a feature set T;
randomly initializing a student model; simultaneously inputting a plurality of training samples into the initialized student model to obtain a plurality of corresponding features output by the last pooling layer of the student model as a feature set S;
calculating the binary distance and the ternary angle between elements in the feature set T, and calculating the binary distance and the ternary angle between the elements in the feature set S;
constructing distance distillation loss based on binary distances between elements in the feature sets T and S, and constructing angle distillation loss based on ternary angles between elements in the feature sets T and S;
incorporating distance and angle distillation losses into the overall loss function of the entire model;
training a teacher model-student model by taking the minimization of the total loss function as a target to obtain a trained teacher model-student model;
an application stage:
acquiring a vibration signal section of a rolling bearing to be detected, and constructing a time-frequency diagram; inputting the time-frequency diagram into the trained student model to obtain a diagnosis result.
Preferably, continuous wavelet analysis is carried out on the normalized one-dimensional vibration signal segment to generate a continuous three-channel wavelet time-frequency map.
Has the advantages that: according to the invention, continuous wavelet analysis is preferably carried out on the normalized one-dimensional vibration signal segment to generate a continuous three-channel wavelet time-frequency map, and the direct generation of the time-frequency map does not need to carry out feature screening on signals, so that the loss of partial information caused by time-frequency domain feature extraction is reduced, and the performance of a fault diagnosis model is improved when a large number of fault samples are processed.
Preferably, the distance distillation loss is constructed from the binary distances between elements of the feature sets T and S, as follows:

L_{RKD-D} = \sum_{(x_i, x_j) \in \chi^2} l_\delta\big(\Psi_D(t_i, t_j),\ \Psi_D(s_i, s_j)\big)

wherein L_{RKD-D} denotes the distance distillation loss; x_i, x_j denote the i-th and j-th training samples; \chi^2 denotes the set of binary relations; l_\delta(\cdot) denotes the Huber loss function; \Psi_D(t_i, t_j) denotes the distance between t_i and t_j, and \Psi_D(s_i, s_j) the distance between s_i and s_j; t_i, t_j denote the features output by the last pooling layer of the teacher model for the i-th and j-th training samples, and s_i, s_j the corresponding features output by the last pooling layer of the student model.
Has the advantages that: the distance distillation loss is preferably constructed in the mode, and the distance difference in the characteristic space is punished to realize the transfer learning of the student model and the teacher model due to the distillation loss in the distance direction, so that the student model is not forced to be directly matched with the output of the teacher model, but the student model is encouraged to learn the distance structure output by the teacher model, and the fault diagnosis performance of the student model is closer to that of the teacher model.
Preferably, the angle distillation loss is constructed from the ternary angles between elements of the feature sets T and S, as follows:

L_{RKD-A} = \sum_{(x_i, x_j, x_k) \in \chi^3} l_\delta\big(\Psi_A(t_i, t_j, t_k),\ \Psi_A(s_i, s_j, s_k)\big)

wherein L_{RKD-A} denotes the angle distillation loss; x_i, x_j, x_k denote the i-th, j-th and k-th training samples; \chi^3 denotes the set of ternary relations; l_\delta(\cdot) denotes the Huber loss function; t_i, t_j, t_k denote the features output by the last pooling layer of the teacher model for the i-th, j-th and k-th training samples, and s_i, s_j, s_k the corresponding features output by the last pooling layer of the student model; \Psi_A(t_i, t_j, t_k) denotes the ternary angular relation among the teacher-model output features t_i, t_j, t_k, and \Psi_A(s_i, s_j, s_k) the ternary angular relation among the student-model output features s_i, s_j, s_k.
Has the advantages that: the angle distillation loss is preferably constructed in the mode, the angle distillation loss realizes the transfer learning of the embedding relation of the training samples in the student model and the teacher model by punishing the angle difference in the characteristic space, and because the angle is an attribute with a higher order than the distance, the angle distillation loss can effectively transmit the relation information and provide more flexibility for the student model in the training process, thereby having higher convergence and better performance.
Preferably,

\Psi_A(t_i, t_j, t_k) = \cos\angle t_i t_j t_k = \langle e^{ij}, e^{kj} \rangle, \quad e^{ij} = \frac{t_i - t_j}{\lVert t_i - t_j \rVert_2}, \quad e^{kj} = \frac{t_k - t_j}{\lVert t_k - t_j \rVert_2}

wherein \angle t_i t_j t_k denotes the angle formed at t_j by the ternary features t_i, t_j, t_k; e^{ij} denotes the unit vector from t_j to t_i and e^{kj} the unit vector from t_j to t_k; \langle \cdot, \cdot \rangle denotes the inner product, i.e. the cosine of the angle between e^{ij} and e^{kj}; t_i, t_j, t_k denote the features output by the last pooling layer of the teacher model for the i-th, j-th and k-th training samples.
Has the advantages that: the invention preferably transmits the relation knowledge of the characteristics in the high-order space in the mode, even if the output characteristic dimension is different between the teacher model and the student model, the high-order characteristic angle potential energy is invariable to the low-order characteristic space angle potential energy through the calculation, the high-order potential energy is possibly strong in capturing the high-order structure, but the high-order potential energy is high in calculation cost, and therefore the measurement of the characteristics in the high-order space relation knowledge can be realized under the condition of small calculation amount by using simple and effective ternary angle relation.
Preferably, the total loss function is calculated as follows:

L = \alpha \cdot L_{KD} + \beta \cdot (\omega_1 \cdot L_{RKD-D} + \omega_2 \cdot L_{RKD-A})

wherein L_{KD} denotes the knowledge distillation loss, L_{RKD-D} the distance distillation loss and L_{RKD-A} the angle distillation loss; \alpha, \beta are the weight coefficients of the loss terms, and \omega_1, \omega_2 are the weights of the distance and angle distillation losses.
Has the advantages that: the invention preferably calculates the total loss in the above mode, and the student model can learn stronger feature expression capability from the teacher model due to the added punishment on the binary distance distillation loss and the ternary angle distillation loss, thereby improving the fault diagnosis performance of the student model.
To achieve the above object, according to a second aspect of the present invention, there is provided a rolling bearing failure diagnosis system based on relational knowledge distillation, comprising: a computer-readable storage medium and a processor;
the computer-readable storage medium is used for storing executable instructions;
the processor is configured to read executable instructions stored in the computer-readable storage medium and execute the rolling bearing fault diagnosis method based on the relational knowledge distillation according to the first aspect.
Generally, by the above technical solution conceived by the present invention, the following beneficial effects can be obtained:
(1) In existing bearing fault diagnosis methods based on knowledge distillation, the student model only learns the Softmax output at the end of the teacher model, i.e. only the behavior of a single sample on the teacher model is considered, so the fault diagnosis accuracy of the student model is low. To address this, the student model of the invention simultaneously learns the Softmax output of the teacher model and the multivariate relations among the outputs of multiple samples at the last pooling layer; that is, it learns the structural information of the teacher network, taking the structure contained in the network into account and cooperatively learning over the input samples of the same mini-batch. The student network thus learns from both the teacher's structure and the single-sample outputs of the teacher network, effectively improving the classification performance of the fault diagnosis system without increasing memory usage or training time.
(2) In the prior art, effective features are extracted by analyzing the collected time-series signals, for example by extracting the most relevant fault features through multiple rounds of screening such as preprocessing and feature selection of the collected vibration signals and using them as the input of a fault classifier; but manual screening is time-consuming and easily loses effective information. To address this, after the raw vibration signals of the rolling bearing are collected, every 1000 sampling points are taken as a processing sample, and a time-frequency graph is constructed for each processing sample as a fault sample, which serves as the input of the teacher model. Because the time-frequency graph contains the complete time-frequency information of the vibration signal, the real-time response efficiency and accuracy of the fault diagnosis model are improved.
(3) The method realizes bearing fault diagnosis through transfer learning by relational knowledge distillation. Replacing a large model with a small one effectively reduces computational complexity, trains a simpler model better suited to actual engineering deployment while maintaining accuracy, and improves the response efficiency of the terminal model.
Drawings
FIG. 1 is a flow chart of a bearing fault diagnosis system based on relationship knowledge distillation provided by the present invention.
Fig. 2 is a schematic diagram of a network structure of a system model according to an embodiment of the present invention.
FIG. 3 is pseudo code of a loss function for batch computation knowledge distillation provided by the present invention.
FIG. 4 is a schematic diagram of a bearing fault diagnosis system based on relationship knowledge distillation provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
As shown in fig. 1, the present invention provides a rolling bearing fault diagnosis method based on relational knowledge distillation, including:
step 1: the method comprises the steps of collecting and marking a sensor signal installed on a rolling bearing, wherein the signal is a vibration signal capable of reflecting the running characteristics of the bearing, the original tag hardtarget of a data set is a one-hot tag, namely the positive tag is 1, and the negative tag is 0.
Step 2: signal preprocessing and data transformation, namely taking a continuous wavelet time-frequency diagram of an original signal of the rolling bearing as model input, and dividing the generated time-frequency diagram into a training set and a test set.
Specifically, an original vibration signal is selected, the sample length is 1000 data samples, mexh wavelets are used as basic wavelets of continuous wavelet analysis, and a 32 x 3 three-channel wavelet time-frequency graph is generated for each sample and used as a new data set. Randomly breaking the time-frequency graph samples of each category, and selecting 80% of the time-frequency graph samples as a training set and 20% of the time-frequency graph samples as a testing set.
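The segmentation and split described above can be sketched as follows; this is an illustrative helper (the names `make_samples` and `train_test_split` are our own), assuming the raw record is a 1-D NumPy array:

```python
import numpy as np

def make_samples(signal, label, win=1000):
    """Slice a long 1-D vibration record into non-overlapping win-point
    processing samples, each paired with its state label."""
    n = len(signal) // win
    segs = signal[:n * win].reshape(n, win)
    return segs, np.full(n, label)

def train_test_split(x, y, train_frac=0.8, seed=0):
    """Shuffle the samples, then split them 80/20 into train and test sets."""
    idx = np.random.default_rng(seed).permutation(len(x))
    cut = int(train_frac * len(x))
    return x[idx[:cut]], y[idx[:cut]], x[idx[cut:]], y[idx[cut:]]
```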
The Continuous Wavelet Transform (CWT) expands an arbitrary function f(t) of the space L^2(R) on a wavelet basis, essentially projecting the time function onto the time-scale phase plane:

WT_f(a, \tau) = \langle f(t), \psi_{a,\tau}(t) \rangle = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} f(t)\, \psi^{*}\!\left(\frac{t - \tau}{a}\right) dt

wherein WT_f(a, \tau) is the wavelet transform coefficient, \langle \cdot, \cdot \rangle denotes the inner product, and the wavelet basis \psi_{a,\tau}(t) has two parameters, the scale a and the translation \tau, both taking continuously varying values.
A wavelet is a wave \psi(t) that exists only over a small region, with \psi(t) \in L^2(R); \psi(t) is a basic (mother) wavelet if its Fourier transform \Psi(\omega) satisfies the admissibility condition:

C_\psi = \int_{-\infty}^{+\infty} \frac{|\Psi(\omega)|^2}{|\omega|}\, d\omega < \infty
The invention selects the mexh (Mexican hat) wavelet as the basic wavelet of the CWT, calculated as:

\psi(t) = \frac{2}{\sqrt{3}}\, \pi^{-1/4} \left(1 - t^2\right) e^{-t^2/2}

The mexh wavelet function is (up to normalization) the second derivative of a Gaussian function. It has good localization in both the time and frequency domains and is used to extract edges from signals and images; it has no scaling function and therefore no orthogonality.
And step 3: pre-training a teacher model, training a teacher network by using a time-frequency graph training set and a real data label, and storing the optimal model as the teacher model.
Specifically, the teacher model training process is as follows:
step 3.1, establishing a teacher network: as shown in fig. 2, the teacher network is a ResNet-20 with a multi-layer network, and is composed of 19 convolutional layers and 1 fully-connected layer (excluding the pooling layer and the BN layer), wherein three residual Block (Block) structures constitute a layer module. The student network is a ResNet-8 residual network and consists of 7 convolutional layers and 1 full-connection layer, the layer module is only provided with one residual block, the network output adopts a global average pooling layer to replace the full-connection layer of the traditional convolutional neural network, the GAP layer is closely associated with each class, a feature map can be generated for each class, the black box operation of the full-connection layer is avoided, the GAP layer does not have parameters needing to be learned, the parameter number is effectively reduced, and the problem of overfitting is avoided.
The invention uses residual networks as the backbone of the fault diagnosis system: the teacher network adopts a ResNet-20 structure and the student network a ResNet-8 structure. The similar network structures and consistent output dimensions favor feature-information extraction and make training more stable, overcoming the degradation and gradient vanishing/explosion problems of conventional CNNs as the number of layers grows, and effectively improving the feature extraction capability for fault diagnosis signals.
Step 3.2, pre-training the teacher model: the time-frequency training set generated in step 2 is input into the ResNet-20 network, and the difference between the model output and the true sample labels forms the loss function. The network weights are continuously updated by back-propagation to reduce the loss until the model converges, and the model with the highest accuracy on the test set is saved as the final teacher model.
Step 4: the student model learns the relational structure information of the pre-trained teacher model during training; the test set is used to validate the student model, which is saved as the finally deployed model when its prediction accuracy is best.
Specifically, the training learning process of the student model is as follows:
Step 4.1, calculating the relational knowledge distillation loss values: a mini-batch of time-frequency samples \{x_1, \ldots, x_n\} is input into the pre-trained teacher model and the initialized student model, and the features f_T and f_S output by their last pooling layers are taken as the structure-learning features; the relational knowledge distillation losses L_{RKD-D} and L_{RKD-A} are then calculated, as shown in fig. 3.
Step 4.2, calculating the knowledge distillation loss value: the pre-trained teacher network is distilled at a high temperature T, the soft target values of the teacher model at temperature T are calculated, and comparison with the student model yields Loss_soft. The soft label after knowledge distillation is calculated as follows, where the temperature coefficient T controls the smoothness of the soft-label distribution and T = 1 recovers the original Softmax output:

q_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}
As shown in fig. 4, when training the student model, the student network learns the soft targets generated by the teacher network at the same value of T, approaching them so as to learn the structural distribution characteristics of the data; comparing this output with the soft targets yields Loss_soft. Meanwhile, the student network computes its own Softmax predictions, and comparison with the hard targets yields Loss_hard. The total loss function, which serves as the objective, is the λ-weighted sum of the two:

Loss = \lambda \cdot Loss_{soft} + (1 - \lambda) \cdot Loss_{hard}
To bring the output of the student model still closer to that of the teacher model, the KL divergence is introduced to measure the two models' output distributions, and distillation is realized by continually minimizing it during learning; the total loss function L_{KD} then becomes:

L_{KD} = \alpha T^2 \cdot KL_{div}(Q_s, Q_T) + (1 - \alpha) \cdot Loss_{hard}

wherein Q_s and Q_T are the Softmax outputs of the student model and the teacher model respectively, and \alpha is an adjustment coefficient.
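The temperature-softened Softmax and the distillation objective above can be sketched as follows (NumPy for illustration; `soft_targets` and `kd_loss` are our own names, with the KL term computed as KL(Q_T‖Q_S) per the usual distillation convention):

```python
import numpy as np

def soft_targets(logits, T):
    """Temperature-softened softmax q_i = exp(z_i/T) / sum_j exp(z_j/T);
    T = 1 recovers the ordinary Softmax output."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.3):
    """L_KD = alpha * T^2 * KL(Q_T || Q_S) + (1 - alpha) * Loss_hard,
    where Loss_hard is cross-entropy against the one-hot labels."""
    qs = soft_targets(student_logits, T)
    qt = soft_targets(teacher_logits, T)
    kl = (qt * (np.log(qt + 1e-12) - np.log(qs + 1e-12))).sum(-1).mean()
    p = soft_targets(student_logits, 1.0)
    hard = -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()
    return alpha * T ** 2 * kl + (1.0 - alpha) * hard
```

When the student's logits equal the teacher's, the KL term vanishes and only the hard-label cross-entropy remains, scaled by (1 − α).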
The student model is trained under the supervision of the pre-trained teacher model: output features from different layers of the teacher network are recombined into structural information, and the student network learns both the structural relations of the teacher network and its single-sample outputs to improve model performance. This relational structure is expressed through the binary distance \Psi_D and the ternary angle \Psi_A.

\Psi_D measures the feature distance of a binary sample pair in the student network and the teacher network:

\Psi_D(t_i, t_j) = \frac{1}{\mu} \lVert t_i - t_j \rVert_2

wherein \mu is the distance normalization coefficient, taken as the mean pairwise distance over the mini-batch:

\mu = \frac{1}{|\chi^2|} \sum_{(t_i, t_j) \in \chi^2} \lVert t_i - t_j \rVert_2

and l_\delta is the Huber loss, defined as:

l_\delta(x, y) = \begin{cases} \frac{1}{2}(x - y)^2, & |x - y| \le 1 \\ |x - y| - \frac{1}{2}, & \text{otherwise} \end{cases}
ΨAthe method can measure the angular distance of the ternary samples in the student network, the core and teacher network, and the calculation formula is psiA(ti,tj,tk)=co∠titjtk=<eij,ejk>
the angular loss function at this time is:
Step 4.3, calculating the total loss of the fault diagnosis system based on relational knowledge distillation:

L = \gamma \cdot Loss_{hard} + \alpha T^2 \cdot Loss_{div} + \beta \cdot (\omega_1 \cdot L_{RKD-D} + \omega_2 \cdot L_{RKD-A})

wherein Loss_{hard} is the hard-label loss in knowledge distillation, Loss_{div} the KL-divergence loss, and T the knowledge distillation temperature; \gamma, \alpha, \beta are the weight coefficients of the loss terms, and \omega_1, \omega_2 adjust the weights of the distance and angle loss values.
Step 4.4, training the student model: the student model is initialized and the student network is trained with the training set and the hard targets. By learning both the structural information of the teacher network and the single-sample output information within it, the performance of the student network approaches that of the teacher network from both aspects. The total loss function is updated, the model parameters are updated by gradient descent with error back-propagation, and the model with the highest accuracy on the test set is saved as the finally deployed model.
To further elaborate, the invention was verified using rotating-machinery bearing data from the rolling bearing condition-monitoring experiments at Paderborn University, Germany, which provide current and vibration signals for rolling bearing condition monitoring. This verification uses the accelerated-life experimental data from those bearing experiments: the accelerated-life tests produce real bearing fault data that simulate data from actual engineering, so the trained model is better suited to engineering use. The test stand was operated at 1500 rpm with a load torque of 0.7 Nm and a radial force of F = 1000 N acting on the bearing. Fault data of type 6203 ball bearings were selected, with the states divided into outer-ring damage, inner-ring damage, and normal. Twenty measurements of 4 seconds each were recorded for each state. From the original vibration signal, samples of length 1000 data points were taken, and a 32 × 32 × 3 three-channel CWT time-frequency graph was generated for each sample to form a new data set. The time-frequency samples of each category were randomly shuffled, with 80% selected as the training set and 20% as the test set.
The teacher network used an initial learning rate lr = 0.05, momentum = 0.9 and weight_decay = 5e-4; the student network used an initial learning rate of 0.01. The SGD optimization algorithm was selected, with a batch size of 24. The system hyperparameters were set to \gamma = 0.5, \alpha = 0.3, \beta = 0.2, \omega_1 = 2, \omega_2 = 5, and the distillation temperature T = 4. The experimental results are shown in Table 1; the method effectively improves the classification performance of the fault diagnosis system without increasing memory usage or training time.
TABLE 1
The above-mentioned modules in the system for diagnosing the distillation fault based on the relationship knowledge can be wholly or partially realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (7)
1. A rolling bearing fault diagnosis method based on relational knowledge distillation is characterized by comprising the following steps:
a preparation stage:
acquiring vibration signal sections of a rolling bearing in a normal state and a fault state, and measuring each state for multiple times; taking a plurality of continuous sampling points with the same state type as a processing sample, constructing each processing sample into a time-frequency graph, and taking the < time-frequency graph and the corresponding state type > as training samples to obtain a training sample set; constructing a teacher model-student model;
a training stage:
pre-training a teacher model by using a training sample set; simultaneously inputting a plurality of training samples into a pre-trained teacher model to obtain a plurality of corresponding features output by the last pooling layer of the teacher model, and taking the features as a feature set T;
randomly initializing a student model; simultaneously inputting a plurality of training samples into the initialized student model to obtain a plurality of corresponding features output by the last pooling layer of the student model as a feature set S;
calculating the binary distance and the ternary angle between elements in the feature set T, and calculating the binary distance and the ternary angle between the elements in the feature set S;
constructing distance distillation loss based on binary distances between elements in the feature sets T and S, and constructing angle distillation loss based on ternary angles between elements in the feature sets T and S;
incorporating distance and angle distillation losses into the overall loss function of the entire model;
training a teacher model-student model by taking the minimization of the total loss function as a target to obtain a trained teacher model-student model;
an application stage:
acquiring a vibration signal section of the rolling bearing to be detected, and constructing a time-frequency diagram; inputting the time-frequency diagram into the trained student model to obtain a diagnosis result.
2. The method of claim 1, wherein the normalized one-dimensional vibration signal segment is subjected to continuous wavelet transform analysis to generate a three-channel continuous wavelet time-frequency map.
3. The method according to claim 1 or 2, characterized in that the distance distillation loss is constructed on the basis of the binary distances between the elements in the feature sets T and S, in particular as follows:
L_RKD-D = Σ_{(x_i, x_j) ∈ χ²} l_δ(Ψ_D(t_i, t_j), Ψ_D(s_i, s_j))

wherein L_RKD-D denotes the distance distillation loss; x_i, x_j denote the i-th and j-th training samples, respectively; χ² denotes the set of binary relations; l_δ(·) denotes the Huber loss function; Ψ_D(t_i, t_j) denotes the distance between t_i and t_j; Ψ_D(s_i, s_j) denotes the distance between s_i and s_j; t_i, t_j denote the features output by the last pooling layer of the teacher model for the i-th and j-th training samples, respectively; and s_i, s_j denote the features output by the last pooling layer of the student model for the i-th and j-th training samples, respectively.
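Under the usual reading of these definitions (following the relational knowledge distillation literature), the distance term can be sketched in numpy as below. The mean-distance normalisation μ inside Ψ_D is an assumption borrowed from that literature, since the claim leaves the distance function unspecified.

```python
import numpy as np

def huber(x, y, delta=1.0):
    # Huber loss l_delta: quadratic for small residuals, linear for large ones.
    d = np.abs(x - y)
    return np.where(d <= delta, 0.5 * d ** 2, delta * (d - 0.5 * delta))

def pairwise_dist(feats):
    # Psi_D: Euclidean distance between every feature pair, normalised by the
    # mean pairwise distance mu (assumed normalisation).
    d = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    mu = d[d > 0].mean()
    return d / (mu + 1e-12)

def rkd_distance_loss(t_feats, s_feats):
    # L_RKD-D: Huber loss between teacher and student pairwise distances,
    # averaged over the i != j pairs (x_i, x_j) in chi^2.
    dt, ds = pairwise_dist(t_feats), pairwise_dist(s_feats)
    mask = ~np.eye(len(dt), dtype=bool)
    return huber(dt[mask], ds[mask]).mean()

rng = np.random.default_rng(0)
t = rng.standard_normal((8, 16))     # teacher features from the last pooling layer
s = rng.standard_normal((8, 16))     # student features
loss_same = rkd_distance_loss(t, t.copy())
print(loss_same)  # 0.0: identical relational structure gives zero loss
loss_diff = rkd_distance_loss(t, s)  # > 0 when the relations differ
```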
4. The method according to claim 1 or 2, characterized in that the angular distillation loss is constructed on the basis of the ternary angles between the elements in the feature sets T and S, as follows:
L_RKD-A = Σ_{(x_i, x_j, x_k) ∈ χ³} l_δ(Ψ_A(t_i, t_j, t_k), Ψ_A(s_i, s_j, s_k))

wherein L_RKD-A denotes the angle distillation loss; x_i, x_j, x_k denote the i-th, j-th, and k-th training samples, respectively; χ³ denotes the set of ternary relations; l_δ(·) denotes the Huber loss function; t_i, t_j, t_k denote the features output by the last pooling layer of the teacher model for the i-th, j-th, and k-th training samples, respectively; s_i, s_j, s_k denote the corresponding features output by the last pooling layer of the student model; Ψ_A(t_i, t_j, t_k) denotes the ternary angular relation between the teacher model output features t_i, t_j, t_k; and Ψ_A(s_i, s_j, s_k) denotes the ternary angular relation between the student model output features s_i, s_j, s_k.
5. The method of claim 4,
Ψ_A(t_i, t_j, t_k) = cos ∠t_i t_j t_k = <e_ij, e_kj>

wherein ∠t_i t_j t_k denotes the angle formed by the ternary features t_i, t_j, t_k at the vertex t_j; e_ij denotes the unit vector pointing from t_j to t_i; e_kj denotes the unit vector pointing from t_j to t_k; <·,·> denotes the inner product, which for unit vectors equals the cosine of the angle between them; and t_i, t_j, t_k denote the features output by the last pooling layer of the teacher model for the i-th, j-th, and k-th training samples, respectively.
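The angle potential of claim 5 and the corresponding loss of claim 4 can be sketched together as follows (a numpy sketch; the Huber threshold δ = 1 is an assumed default).

```python
import numpy as np

def angle_potential(feats):
    # Psi_A(t_i, t_j, t_k) = cos(angle t_i t_j t_k) = <e_ij, e_kj>, where e_ij
    # and e_kj are the unit vectors pointing from the vertex t_j towards t_i
    # and towards t_k, respectively.
    diff = feats[None, :, :] - feats[:, None, :]    # diff[j, i] = t_i - t_j
    e = diff / (np.linalg.norm(diff, axis=-1, keepdims=True) + 1e-12)
    return np.einsum("jid,jkd->ijk", e, e)          # cosine for every (i, j, k)

def rkd_angle_loss(t_feats, s_feats, delta=1.0):
    # L_RKD-A: Huber loss between teacher and student angle potentials over chi^3.
    d = np.abs(angle_potential(t_feats) - angle_potential(s_feats))
    return np.where(d <= delta, 0.5 * d ** 2, delta * (d - 0.5 * delta)).mean()

# Right angle at the vertex t_j: the cosine should be 0.
p3 = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
A = angle_potential(p3)
print(round(float(A[0, 1, 2]), 6))  # 0.0
print(rkd_angle_loss(p3, p3))       # 0.0: identical angular relations
```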
6. The method of claim 1 or 2, wherein the total loss function is calculated as follows:
L = α·L_KD + β·(ω1·L_RKD-D + ω2·L_RKD-A)

wherein L_KD denotes the distillation loss value; L_RKD-D denotes the distance distillation loss; L_RKD-A denotes the angle distillation loss; α, β denote the weight coefficients of the respective loss values; and ω1, ω2 denote the weights of the distance distillation loss and the angle distillation loss, respectively.
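A worked sketch of the total loss of claim 6. L_KD is not spelled out in the claims, so a standard temperature-softened KL distillation term is assumed here; the loss weights default to the values reported in the described experiment.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T - (z / T).max(axis=-1, keepdims=True)  # temperature + stability shift
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(teacher_logits, student_logits, T=4.0):
    # Assumed Hinton-style soft-target term: KL divergence between the
    # temperature-softened teacher and student distributions, scaled by T^2.
    p, q = softmax(teacher_logits, T), softmax(student_logits, T)
    return (T ** 2) * (p * (np.log(p + 1e-12) - np.log(q + 1e-12))).sum(-1).mean()

def total_loss(l_kd, l_rkd_d, l_rkd_a, alpha=0.3, beta=0.2, w1=2.0, w2=5.0):
    # L = alpha * L_KD + beta * (w1 * L_RKD-D + w2 * L_RKD-A), per claim 6.
    return alpha * l_kd + beta * (w1 * l_rkd_d + w2 * l_rkd_a)

print(round(total_loss(1.0, 0.5, 0.2), 6))  # 0.7
```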
7. A rolling bearing fault diagnosis system based on relational knowledge distillation is characterized by comprising: a computer-readable storage medium and a processor;
the computer-readable storage medium is used for storing executable instructions;
the processor is used for reading executable instructions stored in the computer readable storage medium and executing the rolling bearing fault diagnosis method based on the relational knowledge distillation, which is disclosed by any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110716619.2A CN113281048B (en) | 2021-06-25 | 2021-06-25 | Rolling bearing fault diagnosis method and system based on relational knowledge distillation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113281048A true CN113281048A (en) | 2021-08-20 |
CN113281048B CN113281048B (en) | 2022-03-29 |
Family
ID=77285707
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110716619.2A Active CN113281048B (en) | 2021-06-25 | 2021-06-25 | Rolling bearing fault diagnosis method and system based on relational knowledge distillation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113281048B (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113849641A (en) * | 2021-09-26 | 2021-12-28 | 中山大学 | Knowledge distillation method and system for cross-domain hierarchical relationship |
CN114092918A (en) * | 2022-01-11 | 2022-02-25 | 深圳佑驾创新科技有限公司 | Model training method, device, equipment and storage medium |
CN114152441A (en) * | 2021-12-13 | 2022-03-08 | 山东大学 | Rolling bearing fault diagnosis method and system based on shift window converter network |
CN114429153A (en) * | 2021-12-31 | 2022-05-03 | 苏州大学 | Lifetime learning-based gearbox increment fault diagnosis method and system |
CN114722886A (en) * | 2022-06-10 | 2022-07-08 | 四川大学 | Knowledge distillation-based crankshaft internal defect detection method and detection equipment |
WO2022217853A1 (en) * | 2021-04-16 | 2022-10-20 | Huawei Technologies Co., Ltd. | Methods, devices and media for improving knowledge distillation using intermediate representations |
CN116110022A (en) * | 2022-12-10 | 2023-05-12 | 河南工业大学 | Lightweight traffic sign detection method and system based on response knowledge distillation |
CN116189874A (en) * | 2023-03-03 | 2023-05-30 | 海南大学 | Telemedicine system data sharing method based on federal learning and federation chain |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180268292A1 (en) * | 2017-03-17 | 2018-09-20 | Nec Laboratories America, Inc. | Learning efficient object detection models with knowledge distillation |
CN110162018A (en) * | 2019-05-31 | 2019-08-23 | 天津开发区精诺瀚海数据科技有限公司 | The increment type equipment fault diagnosis method that knowledge based distillation is shared with hidden layer |
CN111199242A (en) * | 2019-12-18 | 2020-05-26 | 浙江工业大学 | Image increment learning method based on dynamic correction vector |
CN112508169A (en) * | 2020-11-13 | 2021-03-16 | 华为技术有限公司 | Knowledge distillation method and system |
CN112504678A (en) * | 2020-11-12 | 2021-03-16 | 重庆科技学院 | Motor bearing fault diagnosis method based on knowledge distillation |
CN112560693A (en) * | 2020-12-17 | 2021-03-26 | 华中科技大学 | Highway foreign matter identification method and system based on deep learning target detection |
CN112560631A (en) * | 2020-12-09 | 2021-03-26 | 昆明理工大学 | Knowledge distillation-based pedestrian re-identification method |
CN112712052A (en) * | 2021-01-13 | 2021-04-27 | 安徽水天信息科技有限公司 | Method for detecting and identifying weak target in airport panoramic video |
Non-Patent Citations (4)
Title |
---|
WONPYO PARK: "Relational Knowledge Distillation", 《2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)》 * |
YIWEI CHENG: "Intelligent fault diagnosis of rotating machinery based on continuous wavelet transform-local binary convolutional neural network", 《KNOWLEDGE-BASED SYSTEMS》 * |
YUAN, Zehao: "Human Pose Estimation Based on Feature Knowledge Distillation", 《Software》 * |
GAO, Qinquan: "Compression Method for Super-Resolution Convolutional Neural Networks Based on Knowledge Distillation", 《Journal of Computer Applications》 * |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022217853A1 (en) * | 2021-04-16 | 2022-10-20 | Huawei Technologies Co., Ltd. | Methods, devices and media for improving knowledge distillation using intermediate representations |
CN113849641A (en) * | 2021-09-26 | 2021-12-28 | 中山大学 | Knowledge distillation method and system for cross-domain hierarchical relationship |
CN113849641B (en) * | 2021-09-26 | 2023-10-24 | 中山大学 | Knowledge distillation method and system for cross-domain hierarchical relationship |
CN114152441A (en) * | 2021-12-13 | 2022-03-08 | 山东大学 | Rolling bearing fault diagnosis method and system based on shift window converter network |
CN114429153A (en) * | 2021-12-31 | 2022-05-03 | 苏州大学 | Lifetime learning-based gearbox increment fault diagnosis method and system |
CN114429153B (en) * | 2021-12-31 | 2023-04-28 | 苏州大学 | Gear box increment fault diagnosis method and system based on life learning |
CN114092918A (en) * | 2022-01-11 | 2022-02-25 | 深圳佑驾创新科技有限公司 | Model training method, device, equipment and storage medium |
CN114722886A (en) * | 2022-06-10 | 2022-07-08 | 四川大学 | Knowledge distillation-based crankshaft internal defect detection method and detection equipment |
CN116110022A (en) * | 2022-12-10 | 2023-05-12 | 河南工业大学 | Lightweight traffic sign detection method and system based on response knowledge distillation |
CN116110022B (en) * | 2022-12-10 | 2023-09-05 | 河南工业大学 | Lightweight traffic sign detection method and system based on response knowledge distillation |
CN116189874A (en) * | 2023-03-03 | 2023-05-30 | 海南大学 | Telemedicine system data sharing method based on federal learning and federation chain |
CN116189874B (en) * | 2023-03-03 | 2023-11-28 | 海南大学 | Telemedicine system data sharing method based on federal learning and federation chain |
Also Published As
Publication number | Publication date |
---|---|
CN113281048B (en) | 2022-03-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113281048B (en) | Rolling bearing fault diagnosis method and system based on relational knowledge distillation | |
CN109580215B (en) | Wind power transmission system fault diagnosis method based on deep generation countermeasure network | |
Li et al. | Self-attention ConvLSTM and its application in RUL prediction of rolling bearings | |
Han et al. | Intelligent fault diagnosis method for rotating machinery via dictionary learning and sparse representation-based classification | |
CN110427654B (en) | Landslide prediction model construction method and system based on sensitive state | |
Maschler et al. | Continual learning of fault prediction for turbofan engines using deep learning with elastic weight consolidation | |
CN111458142A (en) | Sliding bearing fault diagnosis method based on generation of countermeasure network and convolutional neural network | |
CN114549925A (en) | Sea wave effective wave height time sequence prediction method based on deep learning | |
CN114004252A (en) | Bearing fault diagnosis method, device and equipment | |
CN113255432B (en) | Turbine vibration fault diagnosis method based on deep neural network and manifold alignment | |
Tian et al. | A multilevel convolutional recurrent neural network for blade icing detection of wind turbine | |
CN113076920B (en) | Intelligent fault diagnosis method based on asymmetric domain confrontation self-adaptive model | |
CN116451150A (en) | Equipment fault diagnosis method based on semi-supervised small sample | |
CN114048688A (en) | Method for predicting service life of bearing of wind power generator | |
Stojanovic et al. | Semi-supervised learning for structured regression on partially observed attributed graphs | |
CN112784920A (en) | Cloud-side-end-coordinated dual-anti-domain self-adaptive fault diagnosis method for rotating part | |
Du et al. | DCGAN based data generation for process monitoring | |
CN117708656B (en) | Rolling bearing cross-domain fault diagnosis method for single source domain | |
CN114972904B (en) | Zero sample knowledge distillation method and system based on fighting against triplet loss | |
Du et al. | Convolutional neural network-based data anomaly detection considering class imbalance with limited data | |
Djaballah et al. | Deep transfer learning for bearing fault diagnosis using CWT time–frequency images and convolutional neural networks | |
CN117113078A (en) | Small sample bearing fault mode identification method and system based on multi-source data integration | |
CN116399592A (en) | Bearing fault diagnosis method based on channel attention dual-path feature extraction | |
Long et al. | A customized meta-learning framework for diagnosing new faults from unseen working conditions with few labeled data | |
CN115423045A (en) | System log detection method and system based on GAN network and meta learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||