CN113281048A - Rolling bearing fault diagnosis method and system based on relational knowledge distillation - Google Patents

Rolling bearing fault diagnosis method and system based on relational knowledge distillation

Info

Publication number
CN113281048A
Authority
CN
China
Prior art keywords
model
distillation
representing
loss
teacher
Prior art date
Legal status
Granted
Application number
CN202110716619.2A
Other languages
Chinese (zh)
Other versions
CN113281048B (en)
Inventor
朱海平
王慧
陈志鹏
石海彬
冯世元
程佳欣
Current Assignee
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN202110716619.2A
Publication of CN113281048A
Application granted
Publication of CN113281048B
Active legal status
Anticipated expiration legal status

Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01M: TESTING STATIC OR DYNAMIC BALANCE OF MACHINES OR STRUCTURES; TESTING OF STRUCTURES OR APPARATUS, NOT OTHERWISE PROVIDED FOR
    • G01M13/00: Testing of machine parts
    • G01M13/04: Bearings
    • G01M13/045: Acoustic or vibration analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00: Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12: Classification; Matching

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Testing Of Devices, Machine Parts, Or Other Structures Thereof (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The invention discloses a rolling bearing fault diagnosis method and system based on relational knowledge distillation, belonging to the technical field of fault diagnosis. After the original vibration signals of the bearing are collected, a time-frequency map is constructed from each processing sample as a fault sample, and the fault samples are used as the input of the fault diagnosis system. The student model simultaneously learns the soft labels output by the Softmax of the teacher model and the multivariate relations among the outputs of multiple samples at the last pooling layer; that is, the student network learns both from the structure of the teacher network and from the teacher's single-sample outputs, which effectively improves the classification performance of the fault diagnosis system without increasing memory usage or training time. The invention realizes bearing fault diagnosis through transfer learning based on relational knowledge distillation, and effectively reduces computational complexity by replacing a large model with a small one.

Description

Rolling bearing fault diagnosis method and system based on relational knowledge distillation
Technical Field
The invention belongs to the technical field of fault diagnosis, and particularly relates to a rolling bearing fault diagnosis method and system based on relational knowledge distillation.
Background
Rolling bearings are key components of rotating machinery and also among its most failure-prone elements; according to incomplete statistics, 30% of failures of rotating equipment are caused by rolling bearing faults. Condition monitoring and fault diagnosis of rolling bearings play an important role in understanding equipment operating performance and discovering potential faults, and can effectively improve the management level and maintenance efficiency of mechanical equipment.
At present, a new wave of artificial intelligence technology represented by deep learning has made building end-to-end, deeply integrated intelligent fault diagnosis methods a new goal of the industrial-intelligence era. Compared with traditional models, deep learning models have more network layers and strong nonlinear computing capability, can better approximate complex functional relationships, and have been applied successfully in the field of fault diagnosis. However, the success of deep-learning-based fault diagnosis depends on a large amount of labeled, high-quality data. When training on large-scale data sets, the number of network layers is often increased to handle complex data distributions, and the number of model parameters can reach millions, so a large amount of computing power and resources is consumed during training to achieve high accuracy. Such models are large and costly to train from the outset and, limited by computing resources and response-speed requirements, are difficult to deploy in actual engineering.
Patent CN110162018A discloses an incremental equipment fault diagnosis method based on knowledge distillation and hidden-layer sharing, whose main idea is as follows: knowledge distillation and hidden-layer sharing techniques ensure that a shallow equipment fault diagnosis model retains good data feature extraction capability and improve its fault classification performance; in view of the continuous growth of industrial data and the need to update fault diagnosis models on edge devices, incremental learning is realized through effective sample identification, data set reconstruction, and fine-tuning of the pre-trained model. That method alleviates the demands on network bandwidth and network latency when transmitting massive real-time industrial equipment data, improves the accuracy of shallow fault diagnosis models, and supports incremental learning; simulation experiments on bearing running-state data show that, under limited computing resources, it improves edge-cloud cooperative data transmission efficiency, achieves accurate fault prediction and classification, and supports data learning and processing. Patent CN112504678A discloses a motor bearing fault diagnosis method based on knowledge distillation, whose main idea is as follows: a model trained on vibration signals serves as the teacher model, current and rotating-speed signals are the inputs of the student model, and the student model is trained with the dark knowledge provided by the teacher model so that it converges stably and performs effective diagnosis.
However, the prior art has the following drawbacks: 1) most methods analyze the collected time-series signals to extract effective features, for example by preprocessing the collected vibration signals and performing multiple rounds of feature screening to extract the most fault-relevant features as the input of the fault classifier; such manual screening is time-consuming and easily loses useful information; 2) the student model only learns the Softmax output at the end of the teacher model, i.e., only the behavior of single samples on the teacher model is considered, so the fault diagnosis accuracy of the student model is low.
Disclosure of Invention
In view of the above defects and improvement needs of the prior art, the invention provides a rolling bearing fault diagnosis method and system based on relational knowledge distillation, aiming to improve the real-time response efficiency and accuracy of the fault diagnosis model.
To achieve the above object, according to a first aspect of the present invention, there is provided a rolling bearing failure diagnosis method based on relational knowledge distillation, the method including:
a preparation stage:
acquiring vibration signal sections of a rolling bearing in a normal state and a fault state, and measuring each state for multiple times; taking a plurality of continuous sampling points with the same state type as a processing sample, constructing each processing sample into a time-frequency graph, and taking the < time-frequency graph and the corresponding state type > as training samples to obtain a training sample set; constructing a teacher model-student model;
a training stage:
pre-training a teacher model by using a training sample set; simultaneously inputting a plurality of training samples into a pre-trained teacher model to obtain a plurality of corresponding features output by the last pooling layer of the teacher model, and taking the features as a feature set T;
randomly initializing a student model; simultaneously inputting a plurality of training samples into the initialized student model to obtain a plurality of corresponding features output by the last pooling layer of the student model as a feature set S;
calculating the binary distance and the ternary angle between elements in the feature set T, and calculating the binary distance and the ternary angle between the elements in the feature set S;
constructing distance distillation loss based on binary distances between elements in the feature sets T and S, and constructing angle distillation loss based on ternary angles between elements in the feature sets T and S;
incorporating distance and angle distillation losses into the overall loss function of the entire model;
training a teacher model-student model by taking the minimization of the total loss function as a target to obtain a trained teacher model-student model;
an application stage:
acquiring a vibration signal section of the rolling bearing to be detected, constructing its time-frequency diagram, and inputting it into the trained student model to obtain the diagnosis result.
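For illustration only, a minimal sketch of this application stage is given below, assuming a trained `student` network and a `segment_to_cwt_image` preprocessing helper such as the one sketched later in the detailed description; the function name and the class-name ordering are assumptions, not part of the patent.

```python
import torch

def diagnose(student, segment,
             class_names=("normal", "inner-race damage", "outer-race damage")):
    """Build the time-frequency map for one raw vibration segment and classify it."""
    img = segment_to_cwt_image(segment)                        # (32, 32, 3) scalogram
    x = torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0)    # -> (1, 3, 32, 32) tensor
    student.eval()
    with torch.no_grad():
        pred = student(x).argmax(dim=1).item()
    return class_names[pred]
```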
Preferably, continuous wavelet analysis is carried out on the normalized one-dimensional vibration signal segment to generate a continuous three-channel wavelet time-frequency map.
Beneficial effects: continuous wavelet analysis is preferably performed on the normalized one-dimensional vibration signal segment to generate a continuous three-channel wavelet time-frequency map. Generating the time-frequency map directly requires no feature screening of the signal, which reduces the information loss caused by time-frequency-domain feature extraction and improves the performance of the fault diagnosis model when a large number of fault samples are processed.
Preferably, the distance distillation loss is constructed based on the binary distances between the elements in the feature sets T and S, specifically as follows:
$$L_{RKD\text{-}D}=\sum_{(x_i,x_j)\in\chi^2} l_\delta\!\left(\Psi_D(t_i,t_j),\ \Psi_D(s_i,s_j)\right)$$

wherein L_RKD-D denotes the distance distillation loss, x_i and x_j denote the i-th and j-th training samples, χ² denotes the set of binary sample relations, l_δ(·) denotes the Huber loss function, Ψ_D(t_i, t_j) denotes the distance between t_i and t_j, Ψ_D(s_i, s_j) denotes the distance between s_i and s_j, t_i and t_j denote the features output by the last pooling layer of the teacher model for the i-th and j-th training samples respectively, and s_i and s_j denote the features output by the last pooling layer of the student model for the i-th and j-th training samples respectively.
Beneficial effects: constructing the distance distillation loss in this way is preferred because the distance-wise distillation loss realizes transfer learning between the student model and the teacher model by penalizing distance differences in the feature space. Instead of forcing the student model to match the teacher's outputs directly, it encourages the student model to learn the distance structure of the teacher's outputs, so that the fault diagnosis performance of the student model approaches that of the teacher model.
Preferably, the angle distillation loss is constructed based on the ternary angles between the elements in the feature sets T and S, specifically as follows:
$$L_{RKD\text{-}A}=\sum_{(x_i,x_j,x_k)\in\chi^3} l_\delta\!\left(\Psi_A(t_i,t_j,t_k),\ \Psi_A(s_i,s_j,s_k)\right)$$

wherein L_RKD-A denotes the angle distillation loss, x_i, x_j and x_k denote the i-th, j-th and k-th training samples, χ³ denotes the set of ternary sample relations, l_δ(·) denotes the Huber loss function, t_i, t_j and t_k denote the features output by the last pooling layer of the teacher model for the i-th, j-th and k-th training samples respectively, s_i, s_j and s_k denote the features output by the last pooling layer of the student model for the i-th, j-th and k-th training samples respectively, Ψ_A(t_i, t_j, t_k) denotes the ternary angular relation among the teacher model output features t_i, t_j, t_k, and Ψ_A(s_i, s_j, s_k) denotes the ternary angular relation among the student model output features s_i, s_j, s_k.
Beneficial effects: constructing the angle distillation loss in this way is preferred because the angle-wise distillation loss realizes transfer learning of the embedding relations of the training samples between the student model and the teacher model by penalizing angle differences in the feature space. Since an angle is a higher-order property than a distance, the angle distillation loss transmits relational information effectively and gives the student model more flexibility during training, leading to faster convergence and better performance.
Preferably,

$$\Psi_A(t_i,t_j,t_k)=\cos\angle t_i t_j t_k=\langle e_{ij},\ e_{jk}\rangle,\qquad e_{ij}=\frac{t_i-t_j}{\lVert t_i-t_j\rVert_2},\quad e_{jk}=\frac{t_k-t_j}{\lVert t_k-t_j\rVert_2}$$

wherein ∠t_i t_j t_k denotes the angle formed by the ternary features t_i, t_j, t_k at the vertex t_j, e_ij denotes the unit vector along t_i − t_j, e_jk denotes the unit vector along t_k − t_j, ⟨·,·⟩ denotes the cosine of the angle between e_ij and e_jk, and t_i, t_j, t_k denote the features output by the last pooling layer of the teacher model for the i-th, j-th and k-th training samples respectively.
Beneficial effects: transferring the relational knowledge of features in a higher-order space in this way is preferred because, even when the teacher model and the student model output features of different dimensions, the angle potential computed above is invariant between the higher-order and lower-order feature spaces. Higher-order potentials are stronger at capturing higher-order structure but are expensive to compute, so the simple and effective ternary angle relation measures the relational knowledge of features in a higher-order space at low computational cost.
Preferably, the total loss function is calculated as follows:
$$L=\alpha\cdot L_{KD}+\beta\cdot\left(\omega_1 L_{RKD\text{-}D}+\omega_2 L_{RKD\text{-}A}\right)$$

wherein L_KD denotes the knowledge distillation loss, L_RKD-D denotes the distance distillation loss, L_RKD-A denotes the angle distillation loss, α and β denote the weight coefficients of the loss terms, and ω_1 and ω_2 denote the weights of the distance distillation loss and the angle distillation loss.
Beneficial effects: calculating the total loss in this way is preferred because the added penalties on the binary distance distillation loss and the ternary angle distillation loss allow the student model to learn a stronger feature representation capability from the teacher model, thereby improving the fault diagnosis performance of the student model.
To achieve the above object, according to a second aspect of the present invention, there is provided a rolling bearing failure diagnosis system based on relational knowledge distillation, comprising: a computer-readable storage medium and a processor;
the computer-readable storage medium is used for storing executable instructions;
the processor is configured to read executable instructions stored in the computer-readable storage medium and execute the rolling bearing fault diagnosis method based on the relational knowledge distillation according to the first aspect.
Generally, by the above technical solution conceived by the present invention, the following beneficial effects can be obtained:
(1) In existing bearing fault diagnosis methods based on knowledge distillation, the student model only learns the Softmax output at the end of the teacher model; that is, only the behavior of single samples on the teacher model is considered, so the fault diagnosis accuracy of the student model is low. To address this problem, the invention makes the student model simultaneously learn the Softmax outputs of the teacher model and the multivariate relations among the outputs of multiple samples at the last pooling layer. In other words, the student model also learns the structural information contained in the teacher network: input samples of the same mini-batch are learned cooperatively, and the student network learns both from the teacher's structure and from the teacher's single-sample outputs, which effectively improves the classification performance of the fault diagnosis system without increasing memory usage or training time.
(2) In the prior art, effective features are extracted by analyzing the acquired time-series signals, for example by preprocessing the collected vibration signals and performing multiple rounds of feature screening to obtain the most fault-relevant features as the input of the fault classifier; such manual screening is time-consuming and easily loses useful information. To address this problem, after the original vibration signals of the rolling bearing are collected, the invention takes every 1000 sampling points as one processing sample and constructs a time-frequency map for each processing sample as a fault sample, which serves as the input of the teacher model. Because the time-frequency map contains the complete time-frequency information of the vibration signal, the real-time response efficiency and accuracy of the fault diagnosis model are improved.
(3) The method realizes bearing fault diagnosis through transfer learning based on relational knowledge distillation, and effectively reduces computational complexity by replacing a large model with a small one. Under the premise of guaranteed accuracy, it trains a simple model that is better suited to actual engineering deployment and improves the response efficiency of the terminal model.
Drawings
FIG. 1 is a flow chart of a bearing fault diagnosis system based on relationship knowledge distillation provided by the present invention.
Fig. 2 is a schematic diagram of a network structure of a system model according to an embodiment of the present invention.
FIG. 3 is the pseudo-code for batch computation of the knowledge distillation loss functions provided by the present invention.
FIG. 4 is a schematic diagram of a bearing fault diagnosis system based on relationship knowledge distillation provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
As shown in fig. 1, the present invention provides a rolling bearing fault diagnosis method based on relational knowledge distillation, including:
step 1: the method comprises the steps of collecting and marking a sensor signal installed on a rolling bearing, wherein the signal is a vibration signal capable of reflecting the running characteristics of the bearing, the original tag hardtarget of a data set is a one-hot tag, namely the positive tag is 1, and the negative tag is 0.
Step 2: signal preprocessing and data transformation: the continuous wavelet time-frequency maps of the original rolling bearing signals are taken as the model input, and the generated time-frequency maps are divided into a training set and a test set.
Specifically, the original vibration signal is segmented with a sample length of 1000 data points, the mexh wavelet is used as the basic wavelet of the continuous wavelet analysis, and a 32 × 32 × 3 three-channel wavelet time-frequency map is generated for each sample to form a new data set. The time-frequency map samples of each category are randomly shuffled, with 80% selected as the training set and 20% as the test set.
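As a concrete illustration of this preprocessing step, the following is a minimal sketch of how one 1000-point segment could be turned into a 32 × 32 × 3 scalogram image with PyWavelets. The patent does not state how the three channels are produced; mapping the CWT magnitude through a colormap (here matplotlib's jet) is an assumption, as are the helper name and the simple column down-sampling.

```python
import numpy as np
import pywt
from matplotlib import cm

def segment_to_cwt_image(segment, n_scales=32, out_width=32, wavelet="mexh"):
    """Turn one 1000-point vibration segment into a (32, 32, 3) scalogram image."""
    segment = np.asarray(segment, dtype=float)
    segment = (segment - segment.mean()) / (segment.std() + 1e-12)   # normalise the segment
    scales = np.arange(1, n_scales + 1)
    coeffs, _ = pywt.cwt(segment, scales, wavelet)                    # shape (32, 1000)
    cols = np.linspace(0, coeffs.shape[1] - 1, out_width).astype(int) # down-sample time axis
    mag = np.abs(coeffs[:, cols])                                     # shape (32, 32)
    mag = (mag - mag.min()) / (mag.max() - mag.min() + 1e-12)         # scale to [0, 1]
    return cm.jet(mag)[..., :3].astype(np.float32)                    # drop alpha -> 3 channels
```

Each labeled segment processed this way becomes one <time-frequency map, state type> training sample.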
The Continuous Wavelet Transform (CWT) expands an arbitrary function f(t) in the space L²(R) on a wavelet basis; it is essentially a projection of the time-domain function onto the time-scale phase plane, expressed as:

$$WT_f(a,\tau)=\langle f(t),\,\psi_{a,\tau}(t)\rangle=\frac{1}{\sqrt{a}}\int_{R} f(t)\,\psi^{*}\!\left(\frac{t-\tau}{a}\right)dt$$

wherein WT_f(a, τ) is the wavelet transform coefficient, ⟨·,·⟩ denotes the inner product operation, and the wavelet basis ψ_{a,τ}(t) has two parameters, the scale a and the translation τ, both taking continuously varying values.
A wavelet is a wave ψ(t) that exists only in a small region, with ψ(t) ∈ L²(R). If its Fourier transform Ψ(ω) satisfies the following admissibility condition, ψ(t) is a basic wavelet:

$$C_{\psi}=\int_{R}\frac{|\Psi(\omega)|^{2}}{|\omega|}\,d\omega<\infty$$
The invention selects the mexh (Mexican hat) wavelet as the basic wavelet of the CWT; its expression and Fourier transform are:

$$\psi(t)=\frac{2}{\sqrt{3}\,\pi^{1/4}}\left(1-t^{2}\right)e^{-t^{2}/2}$$

$$\Psi(\omega)=\sqrt{\frac{8}{3}}\,\pi^{1/4}\,\omega^{2}e^{-\omega^{2}/2}$$
the mexh wavelet function is a second derivative of a Gaussian function, has good localization in time domain and frequency domain, is used for extracting the edge of a signal and an image, and has no scale function and therefore has no orthogonality.
Step 3: pre-train the teacher model. Train the teacher network with the time-frequency map training set and the true data labels, and save the optimal model as the teacher model.
Specifically, the teacher model training process is as follows:
Step 3.1, establish the teacher network: as shown in fig. 2, the teacher network is a multi-layer ResNet-20 composed of 19 convolutional layers and 1 fully-connected layer (not counting the pooling and BN layers), in which three residual block structures form one layer module. The student network is a ResNet-8 residual network composed of 7 convolutional layers and 1 fully-connected layer, with only one residual block per layer module. The network output uses a global average pooling (GAP) layer instead of the stacked fully-connected layers of a traditional convolutional neural network; the GAP layer is closely associated with each class and can generate one feature map per class, avoiding the black-box operation of fully-connected layers, and since it has no parameters to learn it effectively reduces the number of parameters and avoids overfitting.
The invention uses residual networks as the backbone of the fault diagnosis system: the teacher network adopts a ResNet-20 structure and the student network adopts a ResNet-8 structure. The two networks are structurally similar and their output dimensions are consistent, which facilitates the extraction of feature information and makes training more stable; the residual structure alleviates the degradation and gradient vanishing or exploding problems that conventional CNNs suffer as the number of layers increases, and effectively improves the feature extraction capability for fault diagnosis signals.
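For illustration, the following is a minimal sketch of a ResNet-8 student of the kind described above (7 convolutional layers, a GAP output and 1 fully-connected layer). The 16/32/64 channel widths and the `return_embedding` flag used to expose the pooled features for relational distillation are assumptions; the ResNet-20 teacher would follow the same pattern with three residual blocks per stage.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    """One residual block: two 3x3 convolutions with a (projected) shortcut."""
    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(c_in, c_out, 3, stride, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(c_out)
        self.conv2 = nn.Conv2d(c_out, c_out, 3, 1, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(c_out)
        self.down = None
        if stride != 1 or c_in != c_out:          # match shapes on the shortcut path
            self.down = nn.Sequential(
                nn.Conv2d(c_in, c_out, 1, stride, bias=False),
                nn.BatchNorm2d(c_out))

    def forward(self, x):
        identity = x if self.down is None else self.down(x)
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + identity)

class ResNet8(nn.Module):
    """7 convolutional layers, global average pooling, 1 fully-connected layer."""
    def __init__(self, num_classes=3):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 16, 3, 1, 1, bias=False),
            nn.BatchNorm2d(16),
            nn.ReLU(inplace=True))
        self.layer1 = BasicBlock(16, 16, stride=1)
        self.layer2 = BasicBlock(16, 32, stride=2)
        self.layer3 = BasicBlock(32, 64, stride=2)
        self.fc = nn.Linear(64, num_classes)

    def forward(self, x, return_embedding=False):
        x = self.layer3(self.layer2(self.layer1(self.stem(x))))
        emb = F.adaptive_avg_pool2d(x, 1).flatten(1)   # GAP features fed to the RKD losses
        logits = self.fc(emb)
        return (logits, emb) if return_embedding else logits
```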
Step 3.2, pre-train the teacher model: input the time-frequency map training set generated in step 2 into the ResNet-20 network, and compare the model output with the true sample labels to obtain their difference, which forms the loss function. The network weights are continuously updated by the back-propagation algorithm to reduce the loss function until the model converges, and the model with the highest accuracy on the test set is saved as the final teacher model.
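A minimal pre-training sketch of this step is shown below, assuming `train_loader`, `test_loader` and a ResNet-20 `teacher` already exist; the optimizer settings follow the values reported in the verification experiment (lr = 0.05, momentum = 0.9, weight_decay = 5e-4), and the epoch count is arbitrary.

```python
import torch
import torch.nn as nn

def pretrain_teacher(teacher, train_loader, test_loader, epochs=100, device="cuda"):
    """Cross-entropy pre-training; keeps the checkpoint with the best test accuracy."""
    teacher.to(device)
    opt = torch.optim.SGD(teacher.parameters(), lr=0.05, momentum=0.9, weight_decay=5e-4)
    ce = nn.CrossEntropyLoss()
    best_acc, best_state = 0.0, None
    for _ in range(epochs):
        teacher.train()
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            loss = ce(teacher(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
        teacher.eval()                                   # evaluate on the test set
        correct = total = 0
        with torch.no_grad():
            for x, y in test_loader:
                pred = teacher(x.to(device)).argmax(dim=1).cpu()
                correct += (pred == y).sum().item()
                total += y.numel()
        acc = correct / total
        if acc > best_acc:                               # save the best model so far
            best_acc = acc
            best_state = {k: v.detach().clone() for k, v in teacher.state_dict().items()}
    teacher.load_state_dict(best_state)
    return teacher
```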
Step 4: the student model learns the relational structure information of the pre-trained teacher model during training; the test set is used to validate the student model, and the student model is saved as the finally deployed model when its prediction accuracy is best.
Specifically, the training and learning process of the student model is as follows:

Step 4.1, compute the relational knowledge distillation loss values: a mini-batch of time-frequency map samples {x₁, …, xₙ} is input into the pre-trained teacher model and the initialized student model, and the features f_T and f_S output by their last pooling layers are taken as the features for structure learning; from them the relational knowledge distillation loss functions L_RKD-D and L_RKD-A are computed. The calculation process is shown in fig. 3.

Step 4.2, compute the knowledge distillation loss value: the pre-trained teacher network is distilled at a high temperature T, the soft target values of the teacher model at temperature T are calculated, and they are compared with the student model outputs to obtain Loss_soft. The soft label after knowledge distillation is calculated as follows, where the temperature coefficient T controls the smoothness of the soft label distribution and T = 1 recovers the original Softmax output:

$$q_i=\frac{\exp(z_i/T)}{\sum_{j}\exp(z_j/T)}$$

As shown in fig. 4, when the student model is trained, the student network learns the soft targets generated by the teacher network at the same temperature T; by approaching the soft targets it learns the structural distribution characteristics of the data, and comparing its softened output with the soft target yields Loss_soft. Meanwhile, the student network computes its Softmax prediction and compares it with the hard target to obtain Loss_hard. The total loss function Loss, which serves as the objective, is the λ-weighted sum of the two loss terms:

$$Loss=\lambda\,Loss_{soft}+(1-\lambda)\,Loss_{hard}$$

To make the output of the student model closer to that of the teacher model, the KL divergence is introduced to measure the output distributions of the two models, and distillation is realized by continually minimizing the KL divergence during learning. The knowledge distillation loss L_KD then becomes:

$$L_{KD}=\alpha T^{2}\cdot KLdiv(Q_S,Q_T)+(1-\alpha)\cdot Loss_{hard}$$

wherein Q_S and Q_T are the Softmax outputs of the student model and the teacher model respectively, and α is an adjustment coefficient.
The student model is trained under the supervision of the pre-trained teacher model: the output features of different layers of the teacher network are recombined into structural information, and the student network learns both the structural relations of the teacher network and its single-sample outputs to improve model performance. The structural information is expressed through the binary distance Ψ_D and the ternary angle Ψ_A.
Ψ_D measures the feature distance of a binary sample pair in both the student network and the teacher network:

$$\Psi_D(t_i,t_j)=\frac{1}{\mu}\lVert t_i-t_j\rVert_2$$

wherein μ is the normalization coefficient of the distance, calculated as the mean pairwise distance over the mini-batch:

$$\mu=\frac{1}{|\chi^{2}|}\sum_{(x_i,x_j)\in\chi^{2}}\lVert t_i-t_j\rVert_2$$

The distance distillation loss function L_RKD-D is then:

$$L_{RKD\text{-}D}=\sum_{(x_i,x_j)\in\chi^{2}} l_\delta\!\left(\Psi_D(t_i,t_j),\ \Psi_D(s_i,s_j)\right)$$

wherein l_δ is the Huber loss, defined as:

$$l_\delta(x,y)=\begin{cases}\dfrac{1}{2}(x-y)^{2}, & |x-y|\le 1\\[4pt] |x-y|-\dfrac{1}{2}, & |x-y|>1\end{cases}$$
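A minimal sketch of this distance-wise loss over one mini-batch of pooled features is shown below; the function names are illustrative, and `F.smooth_l1_loss` is used as the Huber penalty l_δ.

```python
import torch
import torch.nn.functional as F

def pairwise_dist(e):
    """Normalized binary distances Psi_D between all rows of e (shape n x d)."""
    d = torch.cdist(e, e, p=2)            # all pairwise Euclidean distances
    mu = d[d > 0].mean()                  # mean distance mu used for normalisation
    return d / (mu + 1e-12)

def rkd_distance_loss(t_emb, s_emb):
    """L_RKD-D: Huber penalty between teacher and student distance structures."""
    with torch.no_grad():
        dt = pairwise_dist(t_emb)         # teacher relations, no gradient
    ds = pairwise_dist(s_emb)             # student relations
    return F.smooth_l1_loss(ds, dt)
```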
Ψ_A measures the angular relation of a ternary sample group in both the student network and the teacher network:

$$\Psi_A(t_i,t_j,t_k)=\cos\angle t_i t_j t_k=\langle e_{ij},\ e_{jk}\rangle$$

wherein

$$e_{ij}=\frac{t_i-t_j}{\lVert t_i-t_j\rVert_2},\qquad e_{jk}=\frac{t_k-t_j}{\lVert t_k-t_j\rVert_2}$$

The angle distillation loss function is then:

$$L_{RKD\text{-}A}=\sum_{(x_i,x_j,x_k)\in\chi^{3}} l_\delta\!\left(\Psi_A(t_i,t_j,t_k),\ \Psi_A(s_i,s_j,s_k)\right)$$

Step 4.3, compute the total loss value of the fault diagnosis system based on relational knowledge distillation:

$$L=\gamma\cdot Loss_{hard}+\alpha T^{2}\cdot Loss_{div}+\beta\cdot\left(\omega_1 L_{RKD\text{-}D}+\omega_2 L_{RKD\text{-}A}\right)$$

wherein Loss_hard and Loss_div are the hard-label loss and the KL divergence loss in knowledge distillation, T is the knowledge distillation temperature, γ, α and β are the weight coefficients of the loss terms, and ω_1 and ω_2 adjust the weights of the distance loss and angle loss values.
Step 4.4, train the student model: initialize the student model and train the student network with the training set and the hard targets. By learning the structural information of the teacher network and the output information of single samples in the teacher network, the performance of the student network approaches that of the teacher network from both aspects. The total loss function is evaluated, the model parameters are updated by gradient updating and error back-propagation, and the model with the highest accuracy on the test set is saved as the finally deployed model.
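Putting the pieces together, the following is a minimal sketch of one student update with the total loss L = γ·Loss_hard + α·T²·Loss_div + β·(ω₁·L_RKD-D + ω₂·L_RKD-A), reusing the `rkd_distance_loss` and `rkd_angle_loss` sketches above and the `return_embedding` flag assumed in the ResNet-8 sketch; the default hyper-parameters follow the verification experiment below (γ = 0.5, α = 0.3, β = 0.2, ω₁ = 2, ω₂ = 5, T = 4).

```python
import torch
import torch.nn.functional as F

def student_step(student, teacher, optimizer, x, y,
                 gamma=0.5, alpha=0.3, beta=0.2, w1=2.0, w2=5.0, T=4.0):
    """One optimization step of the student under relational knowledge distillation."""
    teacher.eval()
    with torch.no_grad():
        t_logits, t_emb = teacher(x, return_embedding=True)
    s_logits, s_emb = student(x, return_embedding=True)

    loss_hard = F.cross_entropy(s_logits, y)                  # hard-label term
    loss_div = F.kl_div(F.log_softmax(s_logits / T, dim=1),   # KL divergence term
                        F.softmax(t_logits / T, dim=1),
                        reduction="batchmean")
    loss = (gamma * loss_hard
            + alpha * T * T * loss_div
            + beta * (w1 * rkd_distance_loss(t_emb, s_emb)
                      + w2 * rkd_angle_loss(t_emb, s_emb)))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```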
To further elaborate the invention, it was verified on bearing data of a rotating machine from the rolling bearing condition monitoring experiments at Paderborn University, Germany, which collect current and vibration signals of rolling bearings for condition monitoring. For this verification, the accelerated lifetime test data in the bearing experiments were chosen: the accelerated lifetime tests produce real bearing fault data that can simulate data in actual engineering, so the trained model is more suitable for engineering use. The test rig was operated at 1500 rpm with a load torque of 0.7 Nm and a radial force of F = 1000 N acting on the bearing. Fault data of the 6203-type ball bearing were selected, and the states were divided into outer-ring damage, inner-ring damage, and normal. For each state, 20 measurements of 4 seconds each were recorded. The original vibration signal was segmented with a sample length of 1000 data points, and a 32 × 32 × 3 three-channel CWT time-frequency map was generated for each sample as the new data set. The time-frequency map samples of each category were randomly shuffled, with 80% selected as the training set and 20% as the test set. The initial learning rate of the teacher network was lr = 0.05 with momentum = 0.9 and weight_decay = 5e-4; the initial learning rate of the student network was 0.01, and the SGD optimization algorithm was used with a batch size of 24. The system hyper-parameters were set to γ = 0.5, α = 0.3, β = 0.2, ω_1 = 2, ω_2 = 5, and the distillation temperature T = 4. The experimental results are shown in Table 1 and indicate that the method effectively improves the classification performance of the fault diagnosis system without increasing memory usage or training time.
TABLE 1: comparison of experimental results (the table appears as an image in the original publication and is not reproduced here).
The modules in the above fault diagnosis system based on relational knowledge distillation can be wholly or partially implemented in software, hardware, or a combination thereof. The modules can be embedded in hardware form in, or be independent of, the processor of the computer device, or stored in software form in the memory of the computer device, so that the processor can invoke and execute the operations corresponding to the modules.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (7)

1. A rolling bearing fault diagnosis method based on relational knowledge distillation is characterized by comprising the following steps:
a preparation stage:
acquiring vibration signal sections of a rolling bearing in a normal state and a fault state, and measuring each state for multiple times; taking a plurality of continuous sampling points with the same state type as a processing sample, constructing each processing sample into a time-frequency graph, and taking the < time-frequency graph and the corresponding state type > as training samples to obtain a training sample set; constructing a teacher model-student model;
a training stage:
pre-training a teacher model by using a training sample set; simultaneously inputting a plurality of training samples into a pre-trained teacher model to obtain a plurality of corresponding features output by the last pooling layer of the teacher model, and taking the features as a feature set T;
randomly initializing a student model; simultaneously inputting a plurality of training samples into the initialized student model to obtain a plurality of corresponding features output by the last pooling layer of the student model as a feature set S;
calculating the binary distance and the ternary angle between elements in the feature set T, and calculating the binary distance and the ternary angle between the elements in the feature set S;
constructing distance distillation loss based on binary distances between elements in the feature sets T and S, and constructing angle distillation loss based on ternary angles between elements in the feature sets T and S;
incorporating distance and angle distillation losses into the overall loss function of the entire model;
training a teacher model-student model by taking the minimization of the total loss function as a target to obtain a trained teacher model-student model;
an application stage:
acquiring a vibration signal section of a rolling bearing to be detected, and constructing a time-frequency diagram; inputting the result into a trained student model to obtain a diagnosis result.
2. The method of claim 1, wherein the normalized one-dimensional vibratory signal segment is subjected to continuous wavelet analysis to generate a continuous three-channel wavelet time-frequency map.
3. The method according to claim 1 or 2, characterized in that the distance distillation loss is constructed on the basis of the binary distances between the elements in the feature sets T and S, in particular as follows:
$$L_{RKD\text{-}D}=\sum_{(x_i,x_j)\in\chi^2} l_\delta\!\left(\Psi_D(t_i,t_j),\ \Psi_D(s_i,s_j)\right)$$

wherein L_RKD-D denotes the distance distillation loss, x_i and x_j denote the i-th and j-th training samples, χ² denotes the set of binary sample relations, l_δ(·) denotes the Huber loss function, Ψ_D(t_i, t_j) denotes the distance between t_i and t_j, Ψ_D(s_i, s_j) denotes the distance between s_i and s_j, t_i and t_j denote the features output by the last pooling layer of the teacher model for the i-th and j-th training samples respectively, and s_i and s_j denote the features output by the last pooling layer of the student model for the i-th and j-th training samples respectively.
4. The method according to claim 1 or 2, characterized in that the angular distillation loss is constructed on the basis of the ternary angles between the elements in the feature sets T and S, as follows:
$$L_{RKD\text{-}A}=\sum_{(x_i,x_j,x_k)\in\chi^3} l_\delta\!\left(\Psi_A(t_i,t_j,t_k),\ \Psi_A(s_i,s_j,s_k)\right)$$

wherein L_RKD-A denotes the angle distillation loss, x_i, x_j and x_k denote the i-th, j-th and k-th training samples, χ³ denotes the set of ternary sample relations, l_δ(·) denotes the Huber loss function, t_i, t_j and t_k denote the features output by the last pooling layer of the teacher model for the i-th, j-th and k-th training samples respectively, s_i, s_j and s_k denote the features output by the last pooling layer of the student model for the i-th, j-th and k-th training samples respectively, Ψ_A(t_i, t_j, t_k) denotes the ternary angular relation among the teacher model output features t_i, t_j, t_k, and Ψ_A(s_i, s_j, s_k) denotes the ternary angular relation among the student model output features s_i, s_j, s_k.
5. The method of claim 4,
$$\Psi_A(t_i,t_j,t_k)=\cos\angle t_i t_j t_k=\langle e_{ij},\ e_{jk}\rangle,\qquad e_{ij}=\frac{t_i-t_j}{\lVert t_i-t_j\rVert_2},\quad e_{jk}=\frac{t_k-t_j}{\lVert t_k-t_j\rVert_2}$$

wherein ∠t_i t_j t_k denotes the angle formed by the ternary features t_i, t_j, t_k at the vertex t_j, e_ij denotes the unit vector along t_i − t_j, e_jk denotes the unit vector along t_k − t_j, ⟨·,·⟩ denotes the cosine of the angle between the vectors, and t_i, t_j, t_k denote the features output by the last pooling layer of the teacher model for the i-th, j-th and k-th training samples respectively.
6. The method of claim 1 or 2, wherein the total loss function is calculated as follows:
$$L=\alpha\cdot L_{KD}+\beta\cdot\left(\omega_1 L_{RKD\text{-}D}+\omega_2 L_{RKD\text{-}A}\right)$$

wherein L_KD denotes the knowledge distillation loss, L_RKD-D denotes the distance distillation loss, L_RKD-A denotes the angle distillation loss, α and β denote the weight coefficients of the loss terms, and ω_1 and ω_2 denote the weights of the distance distillation loss and the angle distillation loss.
7. A rolling bearing fault diagnosis system based on relational knowledge distillation is characterized by comprising: a computer-readable storage medium and a processor;
the computer-readable storage medium is used for storing executable instructions;
the processor is used for reading executable instructions stored in the computer readable storage medium and executing the rolling bearing fault diagnosis method based on the relational knowledge distillation, which is disclosed by any one of claims 1 to 6.
CN202110716619.2A 2021-06-25 2021-06-25 Rolling bearing fault diagnosis method and system based on relational knowledge distillation Active CN113281048B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110716619.2A CN113281048B (en) 2021-06-25 2021-06-25 Rolling bearing fault diagnosis method and system based on relational knowledge distillation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110716619.2A CN113281048B (en) 2021-06-25 2021-06-25 Rolling bearing fault diagnosis method and system based on relational knowledge distillation

Publications (2)

Publication Number Publication Date
CN113281048A true CN113281048A (en) 2021-08-20
CN113281048B CN113281048B (en) 2022-03-29

Family

ID=77285707

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110716619.2A Active CN113281048B (en) 2021-06-25 2021-06-25 Rolling bearing fault diagnosis method and system based on relational knowledge distillation

Country Status (1)

Country Link
CN (1) CN113281048B (en)



Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180268292A1 (en) * 2017-03-17 2018-09-20 Nec Laboratories America, Inc. Learning efficient object detection models with knowledge distillation
CN110162018A (en) * 2019-05-31 2019-08-23 天津开发区精诺瀚海数据科技有限公司 The increment type equipment fault diagnosis method that knowledge based distillation is shared with hidden layer
CN111199242A (en) * 2019-12-18 2020-05-26 浙江工业大学 Image increment learning method based on dynamic correction vector
CN112504678A (en) * 2020-11-12 2021-03-16 重庆科技学院 Motor bearing fault diagnosis method based on knowledge distillation
CN112508169A (en) * 2020-11-13 2021-03-16 华为技术有限公司 Knowledge distillation method and system
CN112560631A (en) * 2020-12-09 2021-03-26 昆明理工大学 Knowledge distillation-based pedestrian re-identification method
CN112560693A (en) * 2020-12-17 2021-03-26 华中科技大学 Highway foreign matter identification method and system based on deep learning target detection
CN112712052A (en) * 2021-01-13 2021-04-27 安徽水天信息科技有限公司 Method for detecting and identifying weak target in airport panoramic video

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Wonpyo Park: "Relational Knowledge Distillation", 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) *
Yiwei Cheng: "Intelligent fault diagnosis of rotating machinery based on continuous wavelet transform-local binary convolutional neural network", Knowledge-Based Systems *
袁泽昊: "Human pose estimation based on feature knowledge distillation", 《软件》 *
高钦泉: "Compression method for super-resolution convolutional neural networks based on knowledge distillation", 《计算机应用》 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022217853A1 (en) * 2021-04-16 2022-10-20 Huawei Technologies Co., Ltd. Methods, devices and media for improving knowledge distillation using intermediate representations
CN113849641A (en) * 2021-09-26 2021-12-28 中山大学 Knowledge distillation method and system for cross-domain hierarchical relationship
CN113849641B (en) * 2021-09-26 2023-10-24 中山大学 Knowledge distillation method and system for cross-domain hierarchical relationship
CN114152441A (en) * 2021-12-13 2022-03-08 山东大学 Rolling bearing fault diagnosis method and system based on shift window converter network
CN114429153A (en) * 2021-12-31 2022-05-03 苏州大学 Lifetime learning-based gearbox increment fault diagnosis method and system
CN114429153B (en) * 2021-12-31 2023-04-28 苏州大学 Gear box increment fault diagnosis method and system based on life learning
CN114092918A (en) * 2022-01-11 2022-02-25 深圳佑驾创新科技有限公司 Model training method, device, equipment and storage medium
CN114722886A (en) * 2022-06-10 2022-07-08 四川大学 Knowledge distillation-based crankshaft internal defect detection method and detection equipment
CN116110022A (en) * 2022-12-10 2023-05-12 河南工业大学 Lightweight traffic sign detection method and system based on response knowledge distillation
CN116110022B (en) * 2022-12-10 2023-09-05 河南工业大学 Lightweight traffic sign detection method and system based on response knowledge distillation
CN116189874A (en) * 2023-03-03 2023-05-30 海南大学 Telemedicine system data sharing method based on federal learning and federation chain
CN116189874B (en) * 2023-03-03 2023-11-28 海南大学 Telemedicine system data sharing method based on federal learning and federation chain

Also Published As

Publication number Publication date
CN113281048B (en) 2022-03-29

Similar Documents

Publication Publication Date Title
CN113281048B (en) Rolling bearing fault diagnosis method and system based on relational knowledge distillation
CN109580215B (en) Wind power transmission system fault diagnosis method based on deep generation countermeasure network
Li et al. Self-attention ConvLSTM and its application in RUL prediction of rolling bearings
Han et al. Intelligent fault diagnosis method for rotating machinery via dictionary learning and sparse representation-based classification
CN110427654B (en) Landslide prediction model construction method and system based on sensitive state
Maschler et al. Continual learning of fault prediction for turbofan engines using deep learning with elastic weight consolidation
CN111458142A (en) Sliding bearing fault diagnosis method based on generation of countermeasure network and convolutional neural network
CN114549925A (en) Sea wave effective wave height time sequence prediction method based on deep learning
CN114004252A (en) Bearing fault diagnosis method, device and equipment
CN113255432B (en) Turbine vibration fault diagnosis method based on deep neural network and manifold alignment
Tian et al. A multilevel convolutional recurrent neural network for blade icing detection of wind turbine
CN113076920B (en) Intelligent fault diagnosis method based on asymmetric domain confrontation self-adaptive model
CN116451150A (en) Equipment fault diagnosis method based on semi-supervised small sample
CN114048688A (en) Method for predicting service life of bearing of wind power generator
Stojanovic et al. Semi-supervised learning for structured regression on partially observed attributed graphs
CN112784920A (en) Cloud-side-end-coordinated dual-anti-domain self-adaptive fault diagnosis method for rotating part
Du et al. DCGAN based data generation for process monitoring
CN117708656B (en) Rolling bearing cross-domain fault diagnosis method for single source domain
CN114972904B (en) Zero sample knowledge distillation method and system based on fighting against triplet loss
Du et al. Convolutional neural network-based data anomaly detection considering class imbalance with limited data
Djaballah et al. Deep transfer learning for bearing fault diagnosis using CWT time–frequency images and convolutional neural networks
CN117113078A (en) Small sample bearing fault mode identification method and system based on multi-source data integration
CN116399592A (en) Bearing fault diagnosis method based on channel attention dual-path feature extraction
Long et al. A customized meta-learning framework for diagnosing new faults from unseen working conditions with few labeled data
CN115423045A (en) System log detection method and system based on GAN network and meta learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant