CN110751207B - Fault diagnosis method based on deep convolutional domain-adversarial transfer learning - Google Patents

Fault diagnosis method based on deep convolutional domain-adversarial transfer learning

Info

Publication number
CN110751207B
CN110751207B (application CN201910991173.7A)
Authority
CN
China
Prior art keywords
domain
layer
label
sample
classifier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910991173.7A
Other languages
Chinese (zh)
Other versions
CN110751207A (en)
Inventor
李锋
唐拓江
韩国良
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University filed Critical Sichuan University
Priority to CN201910991173.7A priority Critical patent/CN110751207B/en
Publication of CN110751207A publication Critical patent/CN110751207A/en
Application granted granted Critical
Publication of CN110751207B publication Critical patent/CN110751207B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01MTESTING STATIC OR DYNAMIC BALANCE OF MACHINES OR STRUCTURES; TESTING OF STRUCTURES OR APPARATUS, NOT OTHERWISE PROVIDED FOR
    • G01M99/00Subject matter not provided for in other groups of this subclass
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Testing Of Devices, Machine Parts, Or Other Structures Thereof (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a fault diagnosis method based on deep convolutional domain-adversarial transfer learning (DCDATL). A deep convolutional residual feature extractor is used to extract high-level features in the DCDATL, improving the convergence and nonlinear approximation capability of the DCDATL; a feature joint distribution representation is obtained through the Kronecker product of the extracted high-level features and the label information and is embedded into the domain classifier, and domain adversarial training is performed to improve the transfer performance of the DCDATL; the feature transfer and classification process based on minimizing the joint distribution domain adversarial overall loss function of the DCDATL improves the classification accuracy after transfer. These advantages enable the DCDATL-based fault diagnosis method to perform high-precision fault diagnosis on the current sample to be tested of a rotating machine by using labeled samples from historical working conditions, even when no labeled samples exist under the current working condition of the rotating machine.

Description

Fault diagnosis method based on deep convolutional domain-adversarial transfer learning
Technical Field
The invention belongs to the technical field of fault diagnosis methods for rotating machinery, and particularly relates to a fault diagnosis method based on deep convolutional domain-adversarial transfer learning.
Background
Rotating machinery is widely applied in industries such as metallurgy, aviation, transportation, chemical engineering and energy, and its mechanical structure tends to develop toward large scale, heavy load, precision and high speed. Condition monitoring and fault diagnosis are therefore important means of ensuring the long-term safe and reliable operation of modern industrial equipment.
Throughout its service life, rotating machinery usually works in an industrial environment with variable working conditions, so the probability of fault occurrence is high and the machinery is easily damaged. Under variable working conditions (e.g., different rotating speeds and different loads), the state data under the current actual working condition is difficult to monitor directly, so sample data of known fault types for the rotating machine under the current actual working condition (i.e., labeled target domain sample data) is scarce because it is difficult to acquire. In addition, under variable working conditions, the distribution characteristics of the historical training fault data obtained under previous working conditions (i.e., labeled auxiliary domain sample data) and of the target domain data differ and drift. Although fault diagnosis of rotating machinery under variable working conditions has received some attention in recent years, many researchers and research institutes have attempted to solve this problem with conventional machine learning methods such as support vector machines (SVMs), nearest neighbor classifiers (NNCs), and auto-encoders (AEs). However, when a distribution difference exists between the training samples and the test samples, the generalization capability of the resulting classification model is poor, so these methods are not suitable for fault diagnosis under variable working conditions.
The transfer learning theory developed in recent years provides a new solution for fault diagnosis of rotating machinery under variable working conditions. Transfer learning does not require the identical-distribution assumption for training and test samples that traditional machine learning makes; its main idea is to learn knowledge from an existing auxiliary domain and then transfer that knowledge to the target domain to complete classification in the target domain. At present, transfer learning research for fault diagnosis of rotating machinery has only just begun, with few research cases. For example: Shen Fei et al. adopted a feature extraction method based on autocorrelation-matrix SVD combined with transfer learning for motor bearing fault diagnosis; others have adopted an improved least-squares support vector machine with transfer learning capability for spherical bearing fault diagnosis, and transfer component analysis (TCA) for rolling bearing fault diagnosis under variable working conditions. However, these transfer learning methods require a high similarity between the auxiliary domain and target domain samples; if the similarity is low, the classification accuracy declines and the problem of "negative transfer" occurs.
Disclosure of Invention
Aiming at the above defects in the prior art, the fault diagnosis method based on deep convolutional domain-adversarial transfer learning solves the problem that, in existing rotating machinery fault diagnosis methods, a large number of labeled samples (i.e., samples of known fault type) are difficult to obtain, so that labeled samples under the current working condition are scarce and the fault diagnosis accuracy for the current sample to be tested is low.
In order to achieve the above purpose, the invention adopts the following technical scheme: the fault diagnosis method based on deep convolutional domain-adversarial transfer learning comprises the following steps:

S1, performing segmented preprocessing on each rotating machine sample of the auxiliary domain and the target domain respectively to obtain corresponding preprocessing results;

S2, inputting the two preprocessing results as input samples into the deep convolutional domain-adversarial transfer learning model to obtain the domain adversarial loss function of the domain classifier and the joint loss function of the label predictor in the model;

S3, calculating the joint distribution domain adversarial overall loss function of the deep convolutional domain-adversarial transfer learning model based on the domain adversarial loss function of the domain classifier and the joint loss function of the label predictor, and training the model;

S4, inputting the target domain rotating machine sample to be tested into the trained deep convolutional domain-adversarial transfer learning model, and predicting its class label to realize rotating machine fault diagnosis.
Further, the segmented preprocessing in step S1 performs normalization and array reconstruction on each rotating machine sample in turn.
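The segmented preprocessing (normalization followed by array reconstruction) can be sketched as follows; the min-max normalization scheme and the 32×32 target shape are illustrative assumptions, not values specified by the patent:

```python
import numpy as np

def preprocess_segment(signal, side=32):
    # Min-max normalisation followed by reshaping the 1-D segment into a
    # 2-D array ("array reconstruction"); the exact normalisation scheme
    # is not given in the patent, so min-max scaling is assumed here.
    seg = np.asarray(signal[: side * side], dtype=float)
    seg = (seg - seg.min()) / (seg.max() - seg.min() + 1e-12)
    return seg.reshape(side, side)

# Stand-in vibration sample: 1024 points -> one 32 x 32 input array
sample = np.sin(np.linspace(0.0, 50.0, 1024))
image = preprocess_segment(sample, side=32)
```

Each preprocessed sample then enters the feature extractor as a small 2-D array suitable for convolution.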
Further, the deep convolutional domain-adversarial transfer learning model in the step S2 includes a deep convolutional residual feature extractor, a label predictor and a domain classifier;

the deep convolutional residual feature extractor is connected to the label predictor and the domain classifier respectively, and the label predictor is connected to the domain classifier;

the deep convolutional residual feature extractor comprises a first feature extraction unit and a second feature extraction unit, wherein the first feature extraction unit comprises a first convolution layer, a first residual layer, a first average pooling layer and a first fully connected layer connected in sequence, and the second feature extraction unit comprises a second convolution layer, a second residual layer, a second average pooling layer and a second fully connected layer connected in sequence;

the first residual layer and the second residual layer each comprise a plurality of stacked residual blocks; each residual block comprises a first weight layer, a first relu activation function, a second weight layer, an accumulator and a second relu activation function connected in sequence, the input of the first weight layer is also connected to the input of the accumulator, and the output of the second relu activation function is connected to the input of the next residual block, the input of the first average pooling layer, or the input of the second average pooling layer;
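The residual block just described (weight layer → relu → weight layer → accumulator → relu, with the block input also fed into the accumulator) can be sketched with plain matrix multiplies standing in for the weight layers; the sizes and random weights here are purely illustrative:

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def residual_block(x, w1, w2):
    # first weight layer + first relu activation
    out = relu(x @ w1)
    # second weight layer
    out = out @ w2
    # accumulator adds the block input (skip connection), then second relu
    return relu(out + x)

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
w1 = 0.1 * rng.standard_normal((8, 8))
w2 = 0.1 * rng.standard_normal((8, 8))
y = residual_block(x, w1, w2)
```

Stacking several such blocks gives a residual layer; because the skip connection adds the input directly, gradients can bypass the weight layers, which is the mechanism behind the convergence benefit claimed for the extractor.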
the label predictor comprises a first label prediction unit and a second label prediction unit that share parameters; the first label prediction unit comprises a third fully connected layer, a fourth fully connected layer and a first softmax activation function connected in sequence, and the second label prediction unit comprises a fifth fully connected layer, a sixth fully connected layer and a second softmax activation function connected in sequence;

the domain classifier comprises a gradient inversion layer; the first output of the gradient inversion layer, a seventh fully connected layer, an eighth fully connected layer and a first sigmoid activation function are connected in sequence, and the second output of the gradient inversion layer, a ninth fully connected layer, a tenth fully connected layer and a second sigmoid activation function are connected in sequence;

the output of the first fully connected layer is connected to the input of the third fully connected layer, the output of the first softmax activation function and the first input of the gradient inversion layer respectively, and the output of the second fully connected layer is connected to the input of the fifth fully connected layer and the second input of the gradient inversion layer respectively.
Further, the step S2 is specifically:

S21, extracting the auxiliary domain sample high-level features corresponding to the preprocessed auxiliary domain rotating machine samples and the target domain sample high-level features corresponding to the target domain rotating machine samples through the deep convolutional residual feature extractor, and inputting the high-level features of the auxiliary domain and target domain samples into the label predictor;

S22, training the label predictor according to the auxiliary domain sample high-level features input into the label predictor;

S23, processing the input target domain sample high-level features through the trained label predictor to obtain the target domain sample pseudo labels and the joint loss function of the label predictor;

S24, calculating the Kronecker product of the auxiliary domain sample high-level features and their labels and the Kronecker product of the target domain sample high-level features and their corresponding pseudo labels to obtain the feature joint distribution representation, and inputting it into the domain classifier;

S25, performing domain adversarial training on the domain classifier based on the feature joint distribution representation input into the domain classifier, and obtaining the domain adversarial loss function of the domain classifier.
Further, the step S21 is specifically:

A1, setting the auxiliary domain rotating machine samples input into the deep convolutional residual feature extractor as $\{(x_i^s, y_i^s)\}_{i=1}^{N_s}$ and the target domain rotating machine samples as $\{x_i^t\}_{i=1}^{N_t}$; the outputs of the first convolution layer and the second convolution layer are $x_s^{(1)}$ and $x_t^{(1)}$, which serve as the inputs of the first residual layer and the second residual layer respectively;

A2, setting the number of residual blocks in the first residual layer to $L$, the output $x_s^{(L+1)}$ of the first residual layer is:

$$x_s^{(L+1)} = x_s^{(1)} + \sum_{l=1}^{L} F\left(x_s^{(l)}, W^{(l)}\right)$$

where $F(\cdot)$ is the residual function and $W^{(l)}$ is the weight of the $l$-th residual block in the first residual layer, $l = 1, 2, \ldots, L$;

A3, sequentially performing average pooling and fully connected processing on the output of the first residual layer through the first average pooling layer and the first fully connected layer, the auxiliary domain sample high-level feature $f_s$ output by the first fully connected layer is:

$$f_s = G_f(x_i^s; \theta_f)$$

where $G_f(\cdot)$ is the overall mapping function of the deep convolutional residual feature extractor, and $\theta_f$ is the set of weight parameters and bias coefficients of the deep convolutional residual feature extractor;

A4, based on steps A2–A3, the target domain sample high-level feature $f_t$ output by the second fully connected layer is:

$$f_t = G_f(x_i^t; \theta_f)$$
Further, the joint loss function of the label predictor in the step S23 comprises the auxiliary domain label loss function $L_s^l$ and the target domain label loss function $L_t^l$;

The auxiliary domain label loss function $L_s^l$ is:

$$L_s^l = \frac{1}{N_s} \sum_{i=1}^{N_s} L_y\!\left(G_l\big(G_f(x_i^s; \theta_f); \theta_l\big),\, y_i^s\right)$$

where $N_s$ is the total number of original rotating machine samples in the auxiliary domain; $L_y(\cdot)$ is the logarithmic loss function; $G_l(\cdot)$ is the overall mapping function of the label predictor; $\theta_l$ is the set of weight parameters and bias coefficients of each layer of the label predictor; and $y_i^s$ is the label of the auxiliary domain rotating machine sample;

The target domain label loss function $L_t^l$ is:

$$L_t^l = -\frac{1}{N_t} \sum_{i=1}^{N_t} \sum_{c=1}^{C} \hat{y}_{i,c}^t \log \hat{y}_{i,c}^t$$

where $N_t$ is the total number of original rotating machine samples in the target domain; $C$ is the total number of categories of the target domain rotating machine samples; the inner sum is the class-conditional distribution entropy; and $\hat{y}_i^t$ is the pseudo label of the target domain rotating machine sample, with $\hat{y}_i^t = \arg\max_{c} \hat{y}_{i,c}^t$.
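A minimal numeric sketch of the two loss terms above — logarithmic loss on labelled auxiliary samples and class-conditional entropy on target predictions, whose argmax gives the pseudo label — assuming softmax outputs from the label predictor (the concrete logits below are invented for illustration):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def aux_label_loss(logits, labels):
    # mean logarithmic loss over the N_s labelled auxiliary-domain samples
    p = softmax(logits)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def target_entropy_and_pseudo_labels(logits):
    # mean class-conditional entropy over the N_t target-domain samples;
    # the pseudo label is the most probable class of each prediction
    p = softmax(logits)
    entropy = -np.sum(p * np.log(p + 1e-12), axis=1).mean()
    return entropy, p.argmax(axis=1)

aux_logits = np.array([[8.0, 0.0, 0.0], [0.0, 8.0, 0.0]])  # confident, correct
tgt_logits = np.array([[0.0, 0.0, 0.0]])                   # maximally uncertain
loss_s = aux_label_loss(aux_logits, np.array([0, 1]))
loss_t, pseudo = target_entropy_and_pseudo_labels(tgt_logits)
```

Confident correct auxiliary predictions give a near-zero label loss, while a uniform target prediction attains the maximum entropy $\log C$, which is exactly what minimizing $L_t^l$ pushes the predictor away from.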
Further, the step S25 is specifically:

The Kronecker product $f_i^s \otimes y_i^s$ of the auxiliary domain sample high-level features and their labels, and the Kronecker product $f_i^t \otimes \hat{y}_i^t$ of the target domain sample high-level features and their pseudo labels, are input into the domain classifier as the feature joint distribution representation; the deep convolutional residual feature extractor and the domain classifier form an adversarial network through the gradient inversion layer, and the domain adversarial training method is used to reduce the joint distribution difference between the auxiliary domain sample high-level features and the target domain sample high-level features, obtaining the domain adversarial loss function of the domain classifier;

where $\otimes$ is the Kronecker product operator.
Further, in the step S25, in training the countermeasure network with the domain countermeasure:
due to the existence of the gradient inversion layer, errors of the predicted value and the true value are transmitted backwards in the backward propagation process in the domain classifier, and the gradient is calculated through each layer of the network in the domain classifier according to the transmitted errors;
when the error is transmitted to the gradient inversion layer, multiplying the error by a parameter-lambda, and then transmitting the error to a depth convolution residual error feature extractor connected with the error, so that the training purposes of the depth convolution residual error feature extractor and a domain classifier are opposite, and the domain confrontation training is realized;
When determining the domain adversarial loss function of the domain classifier:

The gradient inversion layer in the domain classifier is defined in the forward pass as:

$$g(f(v), \lambda) = f(v)$$

where $g(\cdot)$ is the gradient inversion layer mapping function, $v$ is the independent variable input to the gradient inversion layer, and $\lambda$ is the gradient inversion layer hyperparameter;

The gradient of the gradient inversion layer in the domain classifier is then:

$$\frac{\partial g(f(v), \lambda)}{\partial v} = -\lambda \frac{\partial f(v)}{\partial v}$$

Due to the gradient inversion layer, the parameter set $\theta_f$ of the deep convolutional residual feature extractor and the parameter set $\theta_d$ of the domain classifier are updated by the standard back-propagation algorithm, and the auxiliary domain class label $\hat{d}_i^s$ and target domain class label $\hat{d}_i^t$ output by the domain classifier are respectively:

$$\hat{d}_i^s = G_d\big(f_i^s \otimes y_i^s; \theta_d\big)$$

$$\hat{d}_i^t = G_d\big(f_i^t \otimes \hat{y}_i^t; \theta_d\big)$$

where $\hat{d}_i^s$ and $\hat{d}_i^t$ are the class labels output by the domain classifier when predicting on the auxiliary domain and the target domain respectively, and $G_d(\cdot)$ is the domain classifier mapping function;

Setting the auxiliary domain original rotating machine samples and the target domain original rotating machine samples as $\{x_i^s\}_{i=1}^{N_s}$ and $\{x_i^t\}_{i=1}^{N_t}$, the domain adversarial loss function $L_d$ of the domain classifier is:

$$L_d = \frac{1}{N_s} \sum_{i=1}^{N_s} L_d^s\big(\hat{d}_i^s, d_i\big) + \frac{1}{N_t} \sum_{i=1}^{N_t} L_d^t\big(\hat{d}_i^t, d_i\big)$$

where $L_d^s$ is the auxiliary domain sample domain classifier loss function; $L_d^t$ is the target domain sample domain classifier loss function; $N_s$ is the total number of original rotating machine samples in the auxiliary domain; $N_t$ is the total number of original rotating machine samples in the target domain; and $d_i$ is the decision variable of the domain class label, with $d_i = 0$ for an auxiliary domain class label and $d_i = 1$ for a target domain class label.
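The gradient inversion layer's behaviour — identity in the forward pass, gradient scaled by $-\lambda$ in the backward pass — can be sketched with an explicit manual backward step (a deep-learning framework would normally hide this inside autograd):

```python
import numpy as np

def grl_forward(v):
    # forward pass: g(f(v), lambda) = f(v), i.e. the identity on its input
    return v

def grl_backward(upstream_grad, lam):
    # backward pass: the incoming gradient is multiplied by -lambda before
    # it reaches the feature extractor, so the extractor and the domain
    # classifier optimise opposite objectives (domain adversarial training)
    return -lam * upstream_grad

v = np.array([0.3, -1.2, 2.0])
g = np.array([1.0, 1.0, 1.0])
out = grl_forward(v)
back = grl_backward(g, lam=0.5)
```

The domain classifier descends its loss normally, while the reversed gradient makes the feature extractor ascend it, driving the two domains' joint feature distributions together.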
Further, in the step S3, the joint distribution domain adversarial overall loss function of the deep convolutional domain-adversarial transfer learning model is calculated, and the stochastic gradient descent algorithm is used to find the optimal parameters $\theta_f^*$, $\theta_d^*$ and $\theta_l^*$ that minimize the value of the overall loss function, completing the training of the deep convolutional domain-adversarial transfer learning model;

The joint distribution domain adversarial overall loss function $L$ is:

$$L = L_s^l + \alpha L_t^l + \beta L_d$$

where $L_s^l + \alpha L_t^l$ is the joint loss function of the label predictor, and $\alpha$ and $\beta$ are balance constraint parameters used respectively to constrain the local optimization behaviour of the label predictor and of the domain classifier.
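Numerically, the overall loss reduces to a weighted sum of the three terms; a trivially small sketch (the loss values and the settings of α and β below are invented, and the gradient inversion layer supplies the adversarial sign during back-propagation):

```python
def overall_loss(loss_aux, loss_target_entropy, loss_domain, alpha, beta):
    # joint distribution domain adversarial overall loss: the label
    # predictor's joint loss (supervised term + alpha * entropy term)
    # plus beta times the domain adversarial loss
    label_joint = loss_aux + alpha * loss_target_entropy
    return label_joint + beta * loss_domain

total = overall_loss(0.2, 1.0, 0.7, alpha=0.1, beta=0.5)
```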
Further, in the step S4, the target domain rotating machine sample to be tested is input into the trained deep convolutional domain-adversarial transfer learning model, and its class label is predicted to obtain the class prediction label $y_i^t$, realizing fault diagnosis of the rotating machine;

The expression of the class prediction label $y_i^t$ is:

$$y_i^t = G_l\big(G_f(x_i^t; \theta_f^*); \theta_l^*\big)$$
the invention has the beneficial effects that:
according to the fault diagnosis method based on the deep convolution domain pair anti-migration learning, disclosed by the invention, in the DCDATL, the high-level feature extraction is carried out by utilizing the deep convolution residual feature extractor, so that the gradient problems such as gradient disappearance, gradient divergence and the like in the DCDATL training process can be avoided, and the convergence and nonlinear approximation capability of the DCDATL can be improved; obtaining feature joint distribution representation through the obtained Crohn's product of the high-level features and the label information, embedding the feature joint distribution representation into a domain classifier, performing the countertraining of an auxiliary domain and a target domain in the domain classifier by utilizing gradient inversion, and performing the domain countertraining through the feature joint distribution of the labeled samples of the auxiliary domain and the unlabeled samples of the target domain to enhance the domain adaptability of the labeled samples of the auxiliary domain to the target domain, so that the domain invariant features of the auxiliary domain and the target domain are obtained through learning, and the migration performance of the DCDATL is improved; the feature migration and classification process based on minimizing the joint distribution domain of DCDATL against the global loss function can improve the classification accuracy after migration. The above advantages of the DCDATL enable the DCDATL-based fault diagnosis method to perform high-precision fault diagnosis on the current to-be-detected sample of the rotating machine by using the labeled sample (i.e., the labeled sample in the auxiliary domain) under the historical working condition under the condition that the labeled sample (i.e., the labeled sample in the target domain) does not exist under the current working condition of the rotating machine.
Drawings
FIG. 1 is a flow chart of the fault diagnosis method based on deep convolutional domain-adversarial transfer learning according to the present invention.
FIG. 2 is a diagram of the DCDATL model architecture provided by the present invention.
Fig. 3 is a diagram of a residual block provided by the present invention.
Fig. 4 is a photograph of the rolling bearing fault simulation test bench in the embodiment of the invention.
FIG. 5 shows the comparison result of the outer ring crack fault diagnosis accuracy in the embodiment of the invention.
FIG. 6 is a comparison result of rolling element crack fault diagnosis accuracy in the embodiment provided by the invention.
FIG. 7 shows the comparison result of the inner ring crack fault diagnosis accuracy in the embodiment of the invention.
Fig. 8 is a comparison result of the average fault diagnosis accuracy in the embodiment provided by the present invention.
Fig. 9 is a comparison result of the outer ring crack fault diagnosis accuracy in the embodiment provided by the invention.
FIG. 10 is a comparison result of rolling element crack fault diagnosis accuracy in the embodiment provided by the invention.
FIG. 11 is a comparison result of the inner ring crack fault diagnosis accuracy in the embodiment of the invention.
Fig. 12 is a comparison result of the average fault diagnosis accuracy in the embodiment provided by the present invention.
FIG. 13 shows the comparison result of the outer ring crack fault diagnosis accuracy in the embodiment of the present invention.
FIG. 14 is a comparison result of rolling element crack fault diagnosis accuracy in the embodiment provided by the invention.
FIG. 15 shows the comparison result of the inner ring crack fault diagnosis accuracy in the embodiment of the present invention.
Fig. 16 is a comparison result of the average fault diagnosis accuracy in the embodiment provided by the present invention.
Detailed Description
The following description of the embodiments of the present invention is provided to facilitate understanding by those skilled in the art. However, the invention is not limited to the scope of the embodiments; to those skilled in the art, various changes that remain within the spirit and scope of the invention as defined by the appended claims, and all matter produced using the inventive concept, are protected.
As shown in fig. 1, the fault diagnosis method based on deep convolutional domain-adversarial transfer learning includes the following steps:
S1, performing segmented preprocessing on each rotating machine sample of the auxiliary domain and the target domain respectively to obtain corresponding preprocessing results;

S2, inputting the two preprocessing results as input samples into the deep convolutional domain-adversarial transfer learning model (DCDATL) to obtain the domain adversarial loss function of the domain classifier and the joint loss function of the label predictor in the model;

S3, calculating the joint distribution domain adversarial overall loss function of the DCDATL based on the domain adversarial loss function of the domain classifier and the joint loss function of the label predictor, and training the DCDATL;

S4, inputting the target domain rotating machine sample to be tested into the trained DCDATL and predicting its class label to realize rotating machine fault diagnosis.
The segmented preprocessing in step S1 performs normalization and array reconstruction on each rotating machine sample in turn.
As shown in fig. 2, the deep convolution domain antagonistic learning model in step S2 includes a deep convolution residual feature extractor, a label predictor and a domain classifier, and migration and classification of fault features are completed through three processes of extracting high-level features by the deep convolution residual feature extractor, performing domain antagonistic training with the joint distribution of auxiliary domain labeled samples and target domain unlabeled samples, and minimizing the overall loss function of the domain classifier and the label predictor (i.e., minimizing the joint distribution domain antagonistic loss overall function of DCDATL). Specifically, the deep convolution residual feature extractor in the DCDATL in fig. 2 is connected to the label predictor and the domain classifier respectively, and the label predictor is connected to the domain classifier;
the depth convolution residual error feature extractor comprises a first feature extraction unit and a second feature extraction unit, wherein the first feature extraction unit comprises a first convolution layer, a first residual error layer, a first average pooling layer and a first full-connection layer which are sequentially connected, and the second feature extraction unit comprises a second convolution layer, a second residual error layer, a second average pooling layer and a second full-connection layer which are sequentially connected;
as shown in fig. 3, each of the first residual layer and the second residual layer includes a plurality of residual blocks connected in a stacked manner, each of the residual blocks includes a first weight layer, a first relu activation function, a second weight layer, an accumulator, and a second relu activation function connected in sequence, an input end of the first weight layer is further connected to an input end of the accumulator, and an output end of the second relu activation function is connected to an input end of a next residual block, an input end of the first averaging pooling layer, or an input end of the second averaging pooling layer;
the label predictor comprises a first label prediction unit and a second label prediction unit which share parameters; the first label prediction unit comprises a third fully-connected layer, a fourth fully-connected layer and a first softmax activation function connected in sequence, and the second label prediction unit comprises a fifth fully-connected layer, a sixth fully-connected layer and a second softmax activation function connected in sequence; the shared parameters comprise identical weights and biases of the corresponding layers, i.e. the parameters of the third and fifth fully-connected layers are the same, and the parameters of the fourth and sixth fully-connected layers are the same;
the domain classifier comprises a gradient inversion layer; a first output end of the gradient inversion layer is connected in sequence to a seventh fully-connected layer, an eighth fully-connected layer and a first sigmoid activation function, and a second output end of the gradient inversion layer is connected in sequence to a ninth fully-connected layer, a tenth fully-connected layer and a second sigmoid activation function;
the output end of the first full connection layer is connected with the input end of the third full connection layer, the output end of the first softmax activation function and the first input end of the gradient inversion layer respectively, and the output end of the second full connection layer is connected with the input end of the fifth full connection layer and the second input end of the gradient inversion layer respectively.
These theoretical advantages of DCDATL enable the DCDATL-based rotating machinery fault diagnosis method to perform high-precision fault diagnosis on the current sample to be detected by using the labeled samples collected under historical working conditions (the auxiliary-domain labeled samples), even when no labeled samples exist under the current working condition of the rotating machinery (i.e., no labeled samples in the target domain).
As shown in fig. 2, when performing a specific fault diagnosis based on the theoretical model of DCDATL, the step S2 is specifically:
s21, extracting the high-level characteristics of the auxiliary domain sample corresponding to the preprocessed auxiliary domain rotating mechanical sample and the high-level characteristics of the target domain sample corresponding to the target domain rotating mechanical sample through a deep convolution residual error characteristic extractor, and inputting the high-level characteristics of the auxiliary domain sample and the target domain sample into a label predictor;
s22, training the label predictor according to the high-level characteristics of the auxiliary domain samples input into the label predictor;
s23, processing the high-level characteristics of the input target domain sample through the trained label predictor to obtain a joint loss function of the target domain sample pseudo label and the label predictor;
s24, calculating a kronecker product of the high-level features of the auxiliary domain samples and the labels thereof and a kronecker product of the high-level features of the target domain samples and the corresponding pseudo labels thereof, obtaining a feature joint distribution representation, and inputting the feature joint distribution representation into the domain classifier;
and S25, performing domain confrontation training on the domain classifier based on the feature joint distribution representation input into the domain classifier, and obtaining a domain confrontation loss function of the domain classifier.
In step S21, when the deep convolution residual feature extractor is used to extract high-level features, the two residual layers form a residual network (ResNet), which builds the deep network inside the extractor by introducing residual blocks. The structure of a residual block is shown in fig. 3, where x is the input, H(x) is the output, F(x) is the residual mapping function, and each weight layer is a convolutional layer.
The basic constraint for constructing a deep network is that the error of the stacked network model should not be higher than that of its underlying shallow model (a shallow model contains one residual block). The deep model is constructed by the identity mapping method, i.e. H(x) = x represents the optimal mapping of a group of stacked network layers. When the network is deep, the model has difficulty fitting the actual mapping H(x) directly; the residual network converts the problem of fitting H(x) into fitting the residual mapping F(x) by introducing a "shortcut connection", and the actual mapping H(x) is expressed as:
H(x) = F(x) + x    (1)
At this point the residual network model only needs to minimize the residual function F(x) = H(x) − x to approximate the actual mapping, which solves the performance degradation problem of stacking network layers.
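The residual block computation H(x) = ReLU(F(x) + x) can be sketched as follows, with dense weight matrices standing in for the two convolutional weight layers of fig. 3 (an assumption to keep the example short; `residual_block`, `w1` and `w2` are illustrative names):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """One residual block: weight layer -> ReLU -> weight layer,
    then the accumulator adds the shortcut input x before the
    second ReLU, i.e. H(x) = ReLU(F(x) + x)."""
    f = relu(x @ w1) @ w2   # residual mapping F(x)
    return relu(f + x)      # shortcut connection + second ReLU
```

With zero weights the residual mapping F(x) vanishes and the block reduces to the identity (followed by ReLU), which is the identity-mapping intuition behind equation (1).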
Therefore, the step S21 is specifically:
A1, let the auxiliary-domain rotating machinery samples input to the deep convolution residual feature extractor be x_i^s (i = 1, 2, …, N_s) and the target-domain rotating machinery samples be x_i^t (i = 1, 2, …, N_t); the outputs x_s(l) and x_t(l) of the first convolutional layer and the second convolutional layer are taken as the inputs of the first residual layer and the second residual layer, respectively;
A2, let the number of residual blocks in the first residual layer be L; the output x_s(L+1) of the first residual layer is then:
x_s(L+1) = x_s(l) + Σ_{i=l}^{L} F(x_s(i), W(i))    (2)
where F(·) is the residual function and W(i) is the weight of the i-th residual block in the first residual layer, i = l, l+1, …, L;
As shown in formula (2), the residual network is characterized by the summation of the residual features of each layer, which guarantees that layer l+1 carries more feature information than layer l. In back propagation, according to the chain rule, the gradient of the error loss term loss with respect to the l-th residual block at the front end of the network is:
∂loss/∂x_s(l) = (∂loss/∂x_s(L+1)) · (1 + ∂(Σ_{i=l}^{L} F(x_s(i), W(i)))/∂x_s(l))    (3)
Formula (3) shows that the gradient of the deepest layer L can be transmitted directly to any shallower network layer l, and that the factor applied to the error loss term loss at each residual block during back propagation is greater than or equal to 1, so the problems of gradient vanishing and gradient divergence are avoided.
A3, the output of the first residual layer is average-pooled by the first average pooling layer and fully connected by the first fully-connected layer, and the auxiliary-domain sample high-level feature f_s output by the first fully-connected layer is obtained as:
f_s = G_f(x_i^s, θ_f)    (4)
where G_f(·) is the overall mapping function of the deep convolution residual feature extractor and θ_f is the set of weight parameters and bias coefficients of the deep convolution residual feature extractor;
A4, similarly, based on steps A2–A3, the target-domain sample high-level feature f_t output by the second fully-connected layer is obtained as:
f_t = G_f(x_i^t, θ_f)    (5)
The joint loss function of the label predictor in step S23 includes an auxiliary-domain label loss function L_s(θ_f, θ_l) and a target-domain label loss function L_t(θ_f, θ_l).
The auxiliary-domain label loss function L_s(θ_f, θ_l) is:
L_s(θ_f, θ_l) = (1/N_s) Σ_{i=1}^{N_s} L_y(G_s(G_f(x_i^s, θ_f), θ_l), y_i^s)    (6)
where N_s is the total number of original rotating machinery samples in the auxiliary domain, L_y(·) is the logarithmic loss function, G_s(·) is the overall mapping function of the label predictor, θ_l is the set of weight parameters and bias coefficients of each layer of the label predictor, and y_i^s is the label of the auxiliary-domain rotating machinery sample;
In order for the label predictor to better fit the distribution of the target samples, the invention takes minimizing the cross entropy of the class conditional probability distribution as the objective and, combined with the target-domain samples to be detected, defines the target-domain label loss function L_t(θ_f, θ_l) as:
L_t(θ_f, θ_l) = −(1/N_t) Σ_{i=1}^{N_t} Σ_{c=1}^{C} ŷ_{i,c}^t log ŷ_{i,c}^t    (7)
where N_t is the total number of original rotating machinery samples in the target domain, C is the total number of categories of target-domain rotating machinery samples, −Σ_{c=1}^{C} ŷ_{i,c}^t log ŷ_{i,c}^t is the class conditional distribution entropy, and ŷ_i^t is the pseudo label of the target-domain rotating machinery sample, with ŷ_i^t = G_s(G_f(x_i^t, θ_f), θ_l).
By minimizing the class conditional distribution entropy, the label predictor can predict the class labels of the target-domain samples to be detected more accurately.
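The entropy-minimization objective of equation (7) can be sketched as follows for a batch of target-domain logits; `target_entropy_loss` is a hypothetical name and the small clamp `1e-12` is an implementation assumption to keep the logarithm finite.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))  # numerically stable
    return e / e.sum(axis=1, keepdims=True)

def target_entropy_loss(logits):
    """Class conditional distribution entropy: the mean over target
    samples of -sum_c p_c * log p_c, where p is the label predictor's
    softmax output (the pseudo-label distribution).  Confident
    predictions give low entropy, uncertain ones give high entropy."""
    p = softmax(np.asarray(logits, dtype=np.float64))
    return float(np.mean(-np.sum(p * np.log(p + 1e-12), axis=1)))
```

A sharply peaked prediction yields near-zero entropy, while a uniform prediction over C classes yields log C, so minimizing this loss pushes the predictor toward confident pseudo labels on the target domain.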
The step S25 is specifically:
The Kronecker product f_s ⊗ y_i^s of the auxiliary-domain sample high-level features and their labels and the Kronecker product f_t ⊗ ŷ_i^t of the target-domain sample high-level features and their pseudo labels are input into the domain classifier as feature joint distribution representations (⊗ being the Kronecker operator). The deep convolution residual feature extractor and the domain classifier form an adversarial network through the gradient inversion layer; the joint distribution difference between the auxiliary-domain sample high-level features and the target-domain sample high-level features is reduced by domain adversarial training, and the domain adversarial loss function of the domain classifier is obtained. The deep convolution residual feature extractor tries to extract, from the auxiliary-domain and target-domain samples, features that the domain classifier cannot distinguish, while the domain classifier is trained to judge as accurately as possible whether its input comes from the auxiliary domain or the target domain; the two therefore form an adversarial relationship.
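For one sample, the feature joint distribution representation f ⊗ ŷ is simply the Kronecker (outer) product of the high-level feature vector and the (pseudo-)label probability vector, flattened for the domain classifier input; `joint_representation` is an illustrative name.

```python
import numpy as np

def joint_representation(feature, label_probs):
    """Feature joint distribution representation f (x) y-hat:
    the Kronecker product of a sample's high-level feature vector
    and its (pseudo-)label vector, used as domain classifier input."""
    return np.kron(np.asarray(feature), np.asarray(label_probs))
```

The resulting vector has dimension len(feature) × len(label_probs), so label information is woven into every feature coordinate before the domain classifier sees it.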
Therefore, in the above step S25, in training the countermeasure network with the domain countermeasure:
due to the existence of a Gradient Reversal Layer (GRL), the countermeasure network is easy to train, errors of a predicted value and a true value are transmitted backwards in the backward propagation process in the domain classifier, and the gradient is calculated according to the transmitted errors through each layer of the domain classifier;
when the error is transmitted to the gradient inversion layer, it is multiplied by the parameter −λ and then transmitted to the connected deep convolution residual feature extractor, so that the training objectives of the deep convolution residual feature extractor and the domain classifier are opposite, and domain adversarial training is realized;
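The gradient inversion layer's behavior described above (identity in the forward pass, error multiplied by −λ in the backward pass) can be sketched manually; this is a conceptual sketch, not a full autograd implementation, and the class name is illustrative.

```python
import numpy as np

class GradientReversalLayer:
    """Identity in the forward pass; in the backward pass the incoming
    error is multiplied by -lambda, so the feature extractor upstream
    is trained against the domain classifier downstream."""
    def __init__(self, lam):
        self.lam = lam
    def forward(self, x):
        return x                 # g(f(v), lambda) = f(v)
    def backward(self, grad):
        return -self.lam * grad  # gradient multiplied by -lambda
```

In a real framework this would be a custom autograd function, but the two methods above capture exactly why one loss can train two sub-networks toward opposite objectives.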
in determining the domain opposition loss function of the domain classifier:
the gradient inversion layer in the domain classifier is defined as:
g(f(v),λ)=f(v) (8)
wherein g (-) is a gradient inversion layer mapping function;
v is an independent variable input by the gradient inversion layer;
lambda is a gradient inversion layer superparameter;
the gradient of the gradient inversion layer in the domain classifier is obtained as:
∂g(f(v), λ)/∂v = −λ · ∂f(v)/∂v    (9)
due to the presence of the gradient inversion layer, the parameter set θ_f of the deep convolution residual feature extractor and the parameter set θ_d of the domain classifier are updated by the standard back propagation algorithm, and the auxiliary-domain class label d̂_i^s and the target-domain class label d̂_i^t output by the domain classifier are obtained as:
d̂_i^s = G_d(f_s ⊗ y_i^s, θ_d)    (10)
d̂_i^t = G_d(f_t ⊗ ŷ_i^t, θ_d)    (11)
where d̂_i^s and d̂_i^t are the domain class labels output by the domain classifier when predicting on the auxiliary domain and the target domain, respectively, and G_d(·) is the domain classifier mapping function;
let the auxiliary-domain original rotating machinery samples and the target-domain original rotating machinery samples be x_i^s (i = 1, 2, …, N_s) and x_i^t (i = 1, 2, …, N_t); the domain adversarial loss function L_d(θ_f, θ_d) of the domain classifier is obtained as:
L_d(θ_f, θ_d) = (1/N_s) Σ_{i=1}^{N_s} L_d^s(d̂_i^s, d_i) + (1/N_t) Σ_{i=1}^{N_t} L_d^t(d̂_i^t, d_i)    (12)
where L_d^s(·) is the auxiliary-domain sample domain classifier loss function, L_d^t(·) is the target-domain sample domain classifier loss function, N_s is the total number of original rotating machinery samples in the auxiliary domain, N_t is the total number of original rotating machinery samples in the target domain, and d_i is the decision variable of the domain class label: d_i = 0 for an auxiliary-domain class label and d_i = 1 for a target-domain class label.
In the process of minimizing the domain adversarial loss function L_d(θ_f, θ_d) of the domain classifier and updating the parameters through the back propagation algorithm, when the error between the predicted value and the true value of the domain classifier is transmitted to the gradient inversion layer it is multiplied by the negative number −λ, so that the training objectives of the deep convolution residual feature extractor and the domain classifier are opposite, and domain adversarial training is realized. Thus, with the feature joint distribution representations f_s ⊗ y_i^s and f_t ⊗ ŷ_i^t embedded in the domain classifier, the domain adversarial training of equations (8)–(12) ensures that the joint probability distributions of the auxiliary-domain and target-domain sample high-level features extracted by the deep convolution residual feature extractor become similar; that is, the auxiliary-domain labeled samples and the target-domain unlabeled samples (the samples to be detected) are mapped to a shared feature space with minimal domain difference, the domain-invariant features of the auxiliary domain and the target domain are learned, and the first half of the transfer learning process is completed. Performing domain adversarial training with the joint distribution of the inter-domain samples enhances the domain adaptability of the auxiliary-domain labeled samples to the target domain, so that the subsequent transfer learning process of DCDATL can proceed better.
In step S3, the joint distribution domain adversarial total loss function of the deep convolution domain adversarial transfer learning model is calculated, and the optimal parameters θ_f*, θ_d* and θ_l* that minimize the value of the total loss function are found using the stochastic gradient descent algorithm, completing the training of the deep convolution domain adversarial transfer learning model;
the joint distribution domain adversarial total loss function L(θ_f, θ_l, θ_d) is:
L(θ_f, θ_l, θ_d) = L_l(θ_f, θ_l) − β L_d(θ_f, θ_d)    (13)
where L_l(θ_f, θ_l) = L_s(θ_f, θ_l) + α L_t(θ_f, θ_l) is the joint loss function of the label predictor;
from the foregoing equations, the total loss function can be further written as:
L(θ_f, θ_l, θ_d) = L_s(θ_f, θ_l) + α L_t(θ_f, θ_l) − β L_d(θ_f, θ_d)    (14)
where α and β are balance constraint parameters used to constrain the local optimizing behaviors of the label predictor and the domain classifier, respectively.
Finally, the stochastic gradient descent algorithm is used to find the optimal parameters θ_f*, θ_d* and θ_l* that minimize the value of formula (14) (i.e., minimize the total loss), completing the training of DCDATL.
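Under the sign convention used here (label losses minimized, the domain loss entering with a negative sign because the gradient inversion layer flips its gradient for the feature extractor), the total loss combination can be sketched as a one-line helper; the function name and default values are illustrative.

```python
def total_loss(loss_s, loss_t, loss_d, alpha=0.5, beta=1.0):
    """Joint distribution domain adversarial total loss of DCDATL:
    auxiliary-domain label loss, plus alpha times the target-domain
    entropy loss, minus beta times the domain classifier loss (the
    minus sign realizes the adversarial objective via the GRL)."""
    return loss_s + alpha * loss_t - beta * loss_d
```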
In step S4, the target-domain sample to be detected is input into the trained DCDATL to obtain its class prediction label y_i^t, realizing fault diagnosis of the rotating machinery; this process can be expressed as:
y_i^t = G_s(G_f(x_i^t, θ_f*), θ_l*)    (15)
Based on the above process, the second half of the DCDATL migration learning process aims at minimizing the DCDATL joint distribution domain confrontation total loss function, so that the domain confrontation purpose is achieved, the feature migration is completed, and the high precision of the label predictor for the target domain to-be-detected sample class label prediction is ensured.
In one embodiment of the invention, the application of the method to rotating machinery fault diagnosis is illustrated by the following experiment:
1. experimental apparatus:
The experiment uses bearing data collected on the rolling bearing fault simulation test bed of the electrical engineering laboratory of Case Western Reserve University (a rolling bearing is a typical rotating machine). The test bed, shown in fig. 4, includes a motor, a torque sensor/encoder and other components. The rolling bearing to be detected at the drive end (model SKF6205-2RS) supports the rotating shaft of the motor. In the laboratory, small grooves with a diameter of 0.3556 mm and a depth of 0.279 mm were machined by electrical discharge machining on the outer ring, rolling element and inner ring of 3 rolling bearings, respectively, to simulate single-point cracks of the outer ring, rolling element and inner ring. Motor power and speed are measured by the torque sensor/encoder and recorded manually. An acceleration sensor is installed on the bearing seat at the drive end, and the fault-induced vibration acceleration signals of the rolling bearing monitored by the acceleration sensor under different working conditions (i.e., different rotating speeds and loads) are collected by a signal collector at a sampling frequency of 12 kHz. Every 1024 consecutive vibration acceleration data points are taken as one sample. The condition with a rotating speed of 1772 r/min and a load of 1 horsepower is recorded as working condition A; 1750 r/min and 2 horsepower as working condition B; and 1730 r/min and 3 horsepower as working condition C. About 110 fault samples each of the outer ring, rolling element and inner ring are collected under each working condition. Each sample is segmented and preprocessed to obtain a corresponding 32 × 32 matrix as an input sample of DCDATL.
2. DCDATL network structure design and parameter setting:
the DCDATL network structure is designed as follows:
(1) The first layer of the deep convolution residual feature extractor network is designed as an ordinary convolutional layer, the second layer as the 1st residual layer, the third layer as the 2nd residual layer, the fourth layer as the 3rd residual layer, the fifth layer as an average pooling layer, and the sixth layer as a fully-connected layer. The specific configuration is shown in table 1:
table 1 depth residual convolution feature extractor network structure table
In table 1, {3 × 3, 16; 3 × 3, 16} represents two weight layers connected to form a residual block, where 3 × 3 is the convolution kernel size of the weight layer and 16 is the number of feature maps output by the weight layer; {3 × 3, 16; 3 × 3, 16} × 3 indicates that one residual layer is composed of three such residual blocks.
(2) The structure of the label predictor is designed as shown in table 2: the first layer is a fully-connected layer connected to the deep residual convolution feature extractor, with a rectified linear unit (ReLU) activation function; the second layer is the softmax output layer of the label predictor, with a softmax activation function.
Table 2 label predictor network architecture table
(3) The structure of the domain classifier is designed as shown in table 3: the first layer is a fully-connected layer with a ReLU activation function, connected through the gradient inversion layer to the fully-connected layer of the deep residual convolution feature extractor; the second layer is the output layer of the domain classifier, with a sigmoid activation function.
TABLE 3 Domain classifier network architecture Table
The parameters of DCDATL are set as follows: the balance constraint parameter of the label predictor is α = 0.5; the domain-adaptive constraint parameter of the domain classifier is β = 1. To overcome the interference of noise signals in the early stage of DCDATL training, the hyper-parameter λ of the gradient inversion layer in the domain classifier is varied during training according to a schedule with data-sensitive parameters λ_0 = 1, γ = 2 and q = 0.6 ∈ (0, 1). The learning rate of the stochastic gradient descent algorithm is annealed with parameters η_0 = 0.02 and μ = 0.94, with a total number of training steps N = 2000, n = 1, 2, …, 2000.
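The exact schedule formulas are not legible in the source text; the sketch below uses the warm-up and annealing schedules commonly paired with such parameter names in domain adversarial training, and both formulas should be read as assumptions.

```python
import math

def grl_lambda(p, lam0=1.0, gamma=2.0):
    """Assumed warm-up for the GRL hyper-parameter: lambda grows
    smoothly from 0 to about lam0 as training progress p goes from
    0 to 1, suppressing noisy domain gradients early in training."""
    return lam0 * (2.0 / (1.0 + math.exp(-gamma * p)) - 1.0)

def learning_rate(p, eta0=0.02, mu=0.94, q=0.6):
    """Assumed annealing schedule for SGD: the rate decays from
    eta0 as training progress p increases."""
    return eta0 / (1.0 + mu * p) ** q
```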
3. Taking the sample under the working condition A as an auxiliary domain labeled sample, taking the sample under the working condition B as a target domain unlabeled sample (namely the current sample to be detected) to perform experiments and comparison:
Before the experiment, the number of labeled samples used as training samples in the auxiliary domain is kept fixed at 100 × 3 (3 is the number of fault types; the same below), and the maximum number of unlabeled samples used as training samples in the target domain is taken as 100 × 3; then 30 × 3 samples are randomly extracted from the 110 × 3 unlabeled samples in the target domain as the current samples to be detected. While changing the ratio of the number of auxiliary-domain training samples to the number of target-domain training samples (i.e., only gradually decreasing the number of unlabeled samples used as training samples in the target domain), the diagnosis accuracy and average diagnosis accuracy of the proposed DCDATL-based fault diagnosis method on the current samples to be detected of the three faults of bearing SKF6205-2RS are compared with a fault diagnosis method based on TCA and SVM (TCA + SVM) and a method based on TrAdaBoost; the comparison results are shown in figs. 5–8. Table 4 shows the fault diagnosis accuracy of the proposed method and the two compared methods when the ratio of the number of auxiliary-domain training samples to the number of target-domain training samples is r = 2.5.
Table 4 Fault diagnosis accuracy (%) when the ratio r of the number of auxiliary-domain training samples to the number of target-domain training samples is 2.5
As can be seen from figs. 5–8, as the number of target-domain training samples decreases (i.e., the ratio of auxiliary-domain to target-domain training samples increases), the diagnosis accuracy of all three methods gradually decreases: with fewer unlabeled samples serving as target-domain training samples, there are also fewer training samples with high inter-domain similarity and good transferability, so the training of the three algorithms becomes insufficient and their domain adaptability decreases to a certain extent. Nevertheless, the proposed DCDATL method always achieves higher fault diagnosis accuracy than the other two methods.
4. Taking the sample under the working condition C as an auxiliary domain labeled sample, taking the sample under the working condition B as a target domain unlabeled sample (namely the current sample to be detected) to perform experiment and comparison:
The experimental process is similar to that in section 3. The comparison results of the diagnosis accuracy and average diagnosis accuracy of the proposed DCDATL-based fault diagnosis method, the TCA + SVM-based method and the TrAdaBoost-based method on the current samples to be detected of the three faults of bearing SKF6205-2RS are shown in figs. 9–12.
The comparison result shows that the method still has higher fault diagnosis precision than the other two methods when the sample under the working condition C is used as the auxiliary domain labeled sample and the working condition B is used as the target domain unlabeled sample (namely the current sample to be detected).
5. Mixing samples under the A, C working condition to be used as auxiliary domain labeled samples, and using the B working condition samples as target domain unlabeled samples (namely, current samples to be detected) to perform experiments and comparison:
In this experiment, 50 × 3 labeled samples under working condition A and 50 × 3 labeled samples under working condition C are mixed as the auxiliary-domain training samples, so the total number of training samples, 100 × 3, remains unchanged; the maximum number of unlabeled samples used as training samples in the target domain is taken as 100 × 3; then 30 × 3 samples are randomly extracted from the 110 × 3 unlabeled samples in the target domain as the current samples to be detected. While changing the ratio of the number of auxiliary-domain training samples to the number of target-domain training samples (i.e., only gradually decreasing the number of unlabeled samples used as training samples in the target domain), the comparison results of the diagnosis accuracy and average diagnosis accuracy of the proposed method, the TCA + SVM-based method and the TrAdaBoost-based method on the current samples to be detected of the three bearing faults are shown in figs. 13–16.
The comparison result shows that in the case that the A, C working condition sample is mixed to serve as the auxiliary domain labeled sample and the B working condition sample serves as the target domain sample (i.e. the sample to be measured currently), the proposed method still has higher fault diagnosis precision than the other two methods, which shows that the proposed method has better robustness and practicability.
The invention has the beneficial effects that:
In the fault diagnosis method based on deep convolution domain adversarial transfer learning of the invention, high-level feature extraction in DCDATL is performed by the deep convolution residual feature extractor, which avoids gradient problems such as gradient vanishing and gradient divergence during DCDATL training and improves the convergence and nonlinear approximation capability of DCDATL. The feature joint distribution representation is obtained through the Kronecker product of the extracted high-level features and the label information and is embedded into the domain classifier; adversarial training between the auxiliary domain and the target domain is carried out in the domain classifier by means of gradient inversion. Performing domain adversarial training with the joint distribution of the auxiliary-domain labeled samples and the target-domain unlabeled samples enhances the domain adaptability of the auxiliary-domain labeled samples to the target domain, so the domain-invariant features of the auxiliary domain and the target domain are learned and the transfer performance of DCDATL is further improved. The feature transfer and classification process based on minimizing the joint distribution domain adversarial total loss function of DCDATL improves the classification accuracy after transfer. These advantages enable the DCDATL-based fault diagnosis method to perform high-precision fault diagnosis on the current sample to be detected of the rotating machinery by using the labeled samples under historical working conditions (i.e., the auxiliary-domain labeled samples), even when no labeled samples exist under the current working condition of the rotating machinery (i.e., no labeled samples in the target domain).

Claims (8)

1. The fault diagnosis method for the anti-migration learning based on the deep convolution domain is characterized by comprising the following steps of:
s1, performing segmented preprocessing on each rotating machine sample of the auxiliary domain and the target domain respectively to obtain corresponding preprocessing results;
s2, inputting the two preprocessing results serving as input samples into the deep convolution domain anti-migration learning model respectively to obtain a domain anti-loss function of a domain classifier in the deep convolution domain anti-migration learning model and a joint loss function of a label predictor;
s3, calculating a joint distribution domain confrontation total loss function of the deep convolution domain confrontation migration learning model based on the domain confrontation loss function of the domain classifier and the joint loss function of the label predictor, and training the deep convolution domain confrontation migration learning model;
s4, inputting the rotating machinery to-be-detected sample in the target domain into the trained deep convolution domain anti-migration learning model, and predicting the rotating machinery to-be-detected sample class label to realize rotating machinery fault diagnosis;
the step S2 specifically includes:
S21, extracting, by a deep convolutional residual feature extractor, the high-level features of the auxiliary-domain samples from the preprocessed auxiliary-domain rotating machinery samples and the high-level features of the target-domain samples from the preprocessed target-domain rotating machinery samples, and inputting the high-level features of both into the label predictor;
S22, training the label predictor with the high-level features of the auxiliary-domain samples input into it;
S23, processing the high-level features of the input target-domain samples with the trained label predictor to obtain the pseudo labels of the target-domain samples and the joint loss function of the label predictor;
S24, calculating the Kronecker product of the high-level features of the auxiliary-domain samples and their labels and the Kronecker product of the high-level features of the target-domain samples and their corresponding pseudo labels to obtain the joint feature distribution representation, and inputting it into the domain classifier;
S25, performing domain-adversarial training on the domain classifier with the joint feature distribution representation input into it, and obtaining the domain-adversarial loss function of the domain classifier;
in step S3, the joint-distribution domain-adversarial total loss function of the deep convolutional domain-adversarial transfer learning model is calculated, and the optimal parameters θ*_f, θ*_d and θ*_l that minimize the total loss value are found by the stochastic gradient descent algorithm, completing the training of the deep convolutional domain-adversarial transfer learning model;
the joint-distribution domain-adversarial total loss function L(θ_f, θ_d, θ_l) is:
L(θ_f, θ_d, θ_l) = L_l^s + α·L_l^t − β·L_d
where L_l^s + α·L_l^t is the joint loss function of the label predictor, L_d is the domain-adversarial loss function of the domain classifier, and α and β are balance constraint parameters that constrain the local optimization behavior of the label predictor and of the domain classifier, respectively.
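The total loss combination in claim 1 can be sketched as follows. The exact formula appears only as an image in the patent, so the DANN-style form below, with a supervised auxiliary-domain term, an entropy-based target-domain term, and a subtracted domain term, is an assumption, as are the toy loss values and balance parameters:

```python
def total_loss(loss_label_aux, loss_label_tgt, loss_domain, alpha, beta):
    """Assumed DANN-style joint-distribution total loss: the label-predictor
    terms are minimized while the domain-classifier loss enters with a minus
    sign, reflecting the adversarial objective enforced by gradient reversal."""
    return loss_label_aux + alpha * loss_label_tgt - beta * loss_domain

# toy component losses and balance constraint parameters (illustrative only)
value = total_loss(0.80, 0.20, 0.50, alpha=0.1, beta=0.3)  # 0.80 + 0.02 - 0.15
```

Minimizing this value over θ_f and θ_l while the gradient reversal layer flips the domain term's gradient gives the minimax behavior the claim describes.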
2. The fault diagnosis method based on deep convolutional domain-adversarial transfer learning of claim 1, wherein the segmented preprocessing in step S1 sequentially performs normalization and group reconstruction on each rotating machinery sample.
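A minimal sketch of such a preprocessing step. The claim fixes neither the normalization scheme nor the segment length, so min-max scaling and the reshape into equal-length segments below are assumptions:

```python
import numpy as np

def preprocess(signal, seg_len):
    # min-max normalization to [0, 1] (one plausible normalization choice)
    s = (signal - signal.min()) / (signal.max() - signal.min())
    # group reconstruction: cut the normalized 1-D signal into equal segments,
    # each segment becoming one input sample for the network
    n_seg = len(s) // seg_len
    return s[: n_seg * seg_len].reshape(n_seg, seg_len)

segments = preprocess(np.arange(10.0), seg_len=4)  # two segments of length 4
```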
3. The fault diagnosis method based on deep convolutional domain-adversarial transfer learning of claim 1, wherein the deep convolutional domain-adversarial transfer learning model in step S2 includes a deep convolutional residual feature extractor, a label predictor and a domain classifier;
the deep convolutional residual feature extractor is connected to the label predictor and to the domain classifier, and the label predictor is connected to the domain classifier;
the deep convolutional residual feature extractor comprises a first feature extraction unit and a second feature extraction unit; the first feature extraction unit comprises a first convolutional layer, a first residual layer, a first average pooling layer and a first fully connected layer connected in sequence, and the second feature extraction unit comprises a second convolutional layer, a second residual layer, a second average pooling layer and a second fully connected layer connected in sequence;
the first residual layer and the second residual layer each comprise a plurality of stacked residual blocks; each residual block comprises a first weight layer, a first ReLU activation function, a second weight layer, an accumulator and a second ReLU activation function connected in sequence, the input of the first weight layer is also connected to the input of the accumulator, and the output of the second ReLU activation function is connected to the input of the next residual block, the input of the first average pooling layer, or the input of the second average pooling layer;
the label predictor comprises a first label prediction unit and a second label prediction unit which share parameters; the first label prediction unit comprises a third fully connected layer, a fourth fully connected layer and a first softmax activation function connected in sequence, and the second label prediction unit comprises a fifth fully connected layer, a sixth fully connected layer and a second softmax activation function connected in sequence;
the domain classifier comprises a gradient reversal layer; the first output of the gradient reversal layer, a seventh fully connected layer, an eighth fully connected layer and a first sigmoid activation function are connected in sequence, and the second output of the gradient reversal layer, a ninth fully connected layer, a tenth fully connected layer and a second sigmoid activation function are connected in sequence;
the output of the first fully connected layer is connected to the input of the third fully connected layer, to the output of the first softmax activation function and to the first input of the gradient reversal layer, respectively, and the output of the second fully connected layer is connected to the input of the fifth fully connected layer and to the second input of the gradient reversal layer, respectively.
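The wiring described in claim 3 can be sketched with single weight matrices standing in for the convolutional/residual stacks; the layer sizes and the three-class output below are illustrative assumptions. The point is only the topology: the shared fully connected feature output feeds both the softmax label-predictor head and the sigmoid domain-classifier head.

```python
import numpy as np

rng = np.random.default_rng(0)
W_feat = rng.standard_normal((8, 4))    # stand-in for conv + residual + pooling + FC
W_label = rng.standard_normal((4, 3))   # label-predictor head (3 fault classes, illustrative)
W_dom = rng.standard_normal((4, 1))     # domain-classifier head

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

x = rng.standard_normal(8)              # one preprocessed sample
f = np.tanh(x @ W_feat)                 # shared high-level feature
y_prob = softmax(f @ W_label)           # class posterior (softmax activation)
d_logit = (f @ W_dom).item()
d_prob = 1.0 / (1.0 + np.exp(-d_logit)) # domain posterior (sigmoid activation)
```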
4. The fault diagnosis method based on deep convolutional domain-adversarial transfer learning of claim 1, wherein the step S21 specifically comprises:
A1, denoting the auxiliary-domain rotating machinery sample input to the deep convolutional residual feature extractor by x_i^s and the target-domain rotating machinery sample by x_i^t, the outputs of the first convolutional layer and the second convolutional layer are x_s(l) and x_t(l), which serve as the inputs of the first residual layer and the second residual layer, respectively;
A2, with the number of residual blocks in the first residual layer set to L, the output x_s(L+1) of the first residual layer is:
x_s(L+1) = x_s(l) + Σ_{i=l}^{L} F(x_s(i), W(i))
where F(·) is the residual function; W(i) are the weights of the i-th residual block in the first residual layer, i = l, l = 1, 2, 3, …, L, l being the index of the residual block;
A3, sequentially applying average pooling and fully connected processing to the output of the first residual layer through the first average pooling layer and the first fully connected layer, the high-level feature f_s of the auxiliary-domain sample output by the first fully connected layer is:
f_s = G_f(x_i^s; θ_f)
where G_f(·) is the overall mapping function of the deep convolutional residual feature extractor, and θ_f is the set of weight parameters and bias coefficients of the deep convolutional residual feature extractor;
A4, on the basis of steps A2–A3, the high-level feature f_t of the target-domain sample output by the second fully connected layer is obtained as:
f_t = G_f(x_i^t; θ_f).
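The stacked residual blocks of step A2 can be sketched in NumPy, with small random matrices standing in for the weight layers (sizes and weight scale are illustrative assumptions). The skip connection means each block adds a residual function on top of its input, which is what the claim's unrolled sum expresses before the final activations:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, W1, W2):
    # weight layer -> ReLU -> weight layer, then the accumulator (skip
    # connection) and the final ReLU, as in claim 3's residual block
    return relu(x + relu(x @ W1) @ W2)

def residual_layer(x, blocks):
    # stack the residual blocks: each block's output feeds the next one
    for W1, W2 in blocks:
        x = residual_block(x, W1, W2)
    return x

rng = np.random.default_rng(1)
blocks = [(0.1 * rng.standard_normal((4, 4)), 0.1 * rng.standard_normal((4, 4)))
          for _ in range(3)]
out = residual_layer(rng.standard_normal(4), blocks)
```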
5. The fault diagnosis method based on deep convolutional domain-adversarial transfer learning of claim 4, wherein the joint loss function of the label predictor in step S23 comprises the auxiliary-domain label predictor loss function L_l^s and the target-domain label predictor loss function L_l^t;
the auxiliary-domain label predictor loss function L_l^s is:
L_l^s = (1/N_S) Σ_{i=1}^{N_S} L_y(G_s(G_f(x_i^s; θ_f); θ_l), y_i^s)
where N_S is the total number of original rotating machinery samples in the auxiliary domain; L_y(·) is the logarithmic loss function; G_s(·) is the overall mapping function of the label predictor; θ_l is the set of weight parameters and bias coefficients of each layer of the label predictor; and y_i^s is the label of the auxiliary-domain rotating machinery sample;
the target-domain label predictor loss function L_l^t is:
L_l^t = (1/N_t) Σ_{i=1}^{N_t} Σ_{c=1}^{C} (−ŷ_{i,c}^t log ŷ_{i,c}^t)
where N_t is the total number of original rotating machinery samples in the target domain; C is the total number of categories of target-domain rotating machinery samples; −Σ_c ŷ_{i,c}^t log ŷ_{i,c}^t is the class-conditional distribution entropy; and ŷ_i^t is the pseudo label of the target-domain rotating machinery sample, where ŷ_i^t = G_s(G_f(x_i^t; θ_f); θ_l).
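The two loss terms of claim 5 can be sketched on a single softmax output vector (the vector below is illustrative): the logarithmic loss is the standard cross-entropy against the true label, and the class-conditional entropy is driven down so that target-domain predictions, and hence the pseudo labels, become confident:

```python
import numpy as np

def log_loss(prob, true_class):
    # auxiliary-domain supervised term: logarithmic (cross-entropy) loss
    return -np.log(prob[true_class] + 1e-12)

def conditional_entropy(prob):
    # target-domain term: class-conditional distribution entropy of the prediction
    return float(-np.sum(prob * np.log(prob + 1e-12)))

p = np.array([0.7, 0.2, 0.1])        # softmax output for one sample (illustrative)
supervised = log_loss(p, 0)          # loss if the true class is 0
ent = conditional_entropy(p)         # entropy term for an unlabeled target sample
pseudo_label = int(np.argmax(p))     # pseudo label = arg max of the posterior
```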
6. The fault diagnosis method based on deep convolutional domain-adversarial transfer learning of claim 5, wherein the step S25 specifically comprises:
inputting the Kronecker product f_s ⊗ y_i^s of the high-level features of the auxiliary-domain samples and their labels, together with the Kronecker product f_t ⊗ ŷ_i^t of the high-level features of the target-domain samples and their pseudo labels, into the domain classifier as the joint feature distribution representation; forming an adversarial network from the deep convolutional residual feature extractor and the domain classifier through the gradient reversal layer; and reducing the joint-distribution difference between the high-level features of the auxiliary-domain samples and those of the target-domain samples by domain-adversarial training, thereby obtaining the domain-adversarial loss function of the domain classifier;
where ⊗ is the Kronecker product operator.
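For a length-m feature vector and a length-n label vector, the Kronecker product of step S25 multiplies every feature component by every label component, giving a length m·n joint representation; a small numeric check with illustrative vectors:

```python
import numpy as np

f_s = np.array([0.5, -1.0, 2.0])   # high-level feature vector (illustrative)
y_s = np.array([0.0, 1.0])         # one-hot label (illustrative)

# feature-label joint distribution representation fed to the domain classifier;
# for vectors, np.kron equals the flattened outer product
joint = np.kron(f_s, y_s)
```

With a one-hot label, the product effectively routes the feature vector into the slice belonging to its class, which is how label information conditions the domain classifier.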
7. The fault diagnosis method based on deep convolutional domain-adversarial transfer learning of claim 6, wherein in step S25, during training of the domain-adversarial network:
owing to the gradient reversal layer, the error between the predicted value and the true value is propagated backwards through the domain classifier, and each layer of the network in the domain classifier computes its gradient from the propagated error;
when the error reaches the gradient reversal layer, it is multiplied by the parameter −λ and then passed on to the connected deep convolutional residual feature extractor, so that the training objectives of the deep convolutional residual feature extractor and of the domain classifier are opposed, realizing the domain-adversarial training;
in determining the domain-adversarial loss function of the domain classifier:
the gradient reversal layer in the domain classifier is defined as:
g(f(v), λ) = f(v)
where g(·) is the gradient-reversal-layer mapping function, v is the independent variable input to the gradient reversal layer, and λ is the gradient-reversal-layer hyperparameter;
the gradient of the gradient reversal layer in the domain classifier is obtained as:
∂g(f(v), λ)/∂v = −λ · ∂f(v)/∂v;
due to the presence of the gradient reversal layer, the parameter set θ_f of the deep convolutional residual feature extractor and the parameter set θ_d of the domain classifier are updated by the standard back-propagation algorithm, and the auxiliary-domain class label d̂_i^s and the target-domain class label d̂_i^t output by the domain classifier are obtained as:
d̂_i^s = G_d(f_s ⊗ y_i^s; θ_d)
d̂_i^t = G_d(f_t ⊗ ŷ_i^t; θ_d)
where d̂_i^s and d̂_i^t are the class labels output by the domain classifier when predicting on the auxiliary domain and on the target domain, respectively, and G_d(·) is the domain classifier mapping function;
with the original auxiliary-domain rotating machinery samples denoted x_i^s and the original target-domain rotating machinery samples denoted x_i^t, the domain-adversarial loss function L_d of the domain classifier is derived as:
L_d = (1/N_S) Σ_{i=1}^{N_S} L_d^s(d̂_i^s, d_i) + (1/N_t) Σ_{i=1}^{N_t} L_d^t(d̂_i^t, d_i)
where L_d^s is the auxiliary-domain sample domain-classifier loss function; L_d^t is the target-domain sample domain-classifier loss function; N_S is the total number of original auxiliary-domain rotating machinery samples; N_t is the total number of original target-domain rotating machinery samples; and d_i is the decision variable of the domain class label, with d_i = 0 when it is the auxiliary-domain class label and d_i = 1 when it is the target-domain class label.
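The gradient reversal layer of claim 7 can be sketched as a pair of functions, identity in the forward pass and multiplication of the incoming gradient by −λ in the backward pass; this is a NumPy stand-in for the custom autograd hook a deep-learning framework would use:

```python
import numpy as np

def grl_forward(v):
    # forward pass is the identity: g(f(v), lam) = f(v)
    return v

def grl_backward(upstream_grad, lam):
    # backward pass flips and scales the gradient, so the feature extractor
    # ascends the domain-classifier loss while the classifier descends it
    return -lam * upstream_grad

v = np.array([1.0, -2.0])
fwd = grl_forward(v)
bwd = grl_backward(np.array([0.3, -0.4]), lam=1.0)
```

Because only the backward pass is altered, a single back-propagation sweep updates θ_d to minimize the domain loss while pushing θ_f to maximize it, which is the opposed-objectives training the claim describes.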
8. The fault diagnosis method based on deep convolutional domain-adversarial transfer learning of claim 1, wherein in step S4, the rotating machinery sample to be tested in the target domain is input into the trained deep convolutional domain-adversarial transfer learning model, and the class label of the sample to be tested is predicted to obtain the class prediction label y_i^t, realizing fault diagnosis of the rotating machine;
the expression of the class prediction label y_i^t is:
y_i^t = arg max_c G_s(G_f(x_i^t; θ_f); θ_l).
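Prediction in step S4 reduces to taking the arg max of the label predictor's softmax output for each target-domain test sample; a toy sketch with an illustrative (assumed) weight matrix standing in for the trained model:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def predict_label(feature, W_label):
    # class prediction label: index of the largest softmax component
    return int(np.argmax(softmax(feature @ W_label)))

W = np.array([[2.0, 0.0],
              [0.0, 1.0]])
label = predict_label(np.array([1.0, 1.0]), W)  # logits [2, 1] -> class 0
```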
CN201910991173.7A 2019-10-18 2019-10-18 Fault diagnosis method for anti-migration learning based on deep convolution domain Active CN110751207B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910991173.7A CN110751207B (en) 2019-10-18 2019-10-18 Fault diagnosis method for anti-migration learning based on deep convolution domain

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910991173.7A CN110751207B (en) 2019-10-18 2019-10-18 Fault diagnosis method for anti-migration learning based on deep convolution domain

Publications (2)

Publication Number Publication Date
CN110751207A CN110751207A (en) 2020-02-04
CN110751207B true CN110751207B (en) 2022-08-05

Family

ID=69278749

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910991173.7A Active CN110751207B (en) 2019-10-18 2019-10-18 Fault diagnosis method for anti-migration learning based on deep convolution domain

Country Status (1)

Country Link
CN (1) CN110751207B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111353548B (en) * 2020-03-11 2020-10-20 中国人民解放军军事科学院国防科技创新研究院 Robust feature deep learning method based on confrontation space transformation network
CN111504644B (en) * 2020-04-01 2021-04-09 西安交通大学 Bearing fault diagnosis method for adaptive anti-domain based on embedding distinguishability condition
CN111598161A (en) * 2020-05-14 2020-08-28 哈尔滨工业大学(威海) Engine gas circuit state diagnosis system based on CNN transfer learning
CN111860677B (en) * 2020-07-29 2023-11-21 湖南科技大学 Rolling bearing migration learning fault diagnosis method based on partial domain countermeasure
CN112434602B (en) * 2020-11-23 2023-08-29 西安交通大学 Fault diagnosis method based on movable common feature space mining
CN112418175A (en) * 2020-12-09 2021-02-26 北京信息科技大学 Rolling bearing fault diagnosis method and system based on domain migration and storage medium
CN112733900B (en) * 2020-12-31 2024-01-23 华北电力大学 Boiler combustion state stability judging method based on deep migration learning
CN113076834B (en) * 2021-03-25 2022-05-13 华中科技大学 Rotating machine fault information processing method, processing system, processing terminal, and medium
CN113554075A (en) * 2021-07-12 2021-10-26 安徽容知日新科技股份有限公司 Method for generating fault classification model and method for determining fault type of rotating equipment
CN113821012B (en) * 2021-08-30 2023-12-29 南京航空航天大学 Fault diagnosis method for variable-working-condition satellite attitude control system
CN114118140B (en) * 2021-10-29 2022-09-23 新黎明科技股份有限公司 Multi-view intelligent fault diagnosis method and system for explosion-proof motor bearing
CN114021285B (en) * 2021-11-17 2024-04-12 上海大学 Rotary machine fault diagnosis method based on mutual local countermeasure migration learning
CN114202028B (en) * 2021-12-13 2023-04-28 四川大学 MAMTL-based rolling bearing life stage identification method
CN115964661B (en) * 2023-01-04 2023-09-08 东北林业大学 Rotary machine fault diagnosis method and system based on domain-by-domain network
CN116630679A (en) * 2023-03-29 2023-08-22 南通大学 Osteoporosis identification method based on CT image and domain invariant feature

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107958287A (en) * 2017-11-23 2018-04-24 清华大学 Towards the confrontation transfer learning method and system of big data analysis transboundary
CN109766921A (en) * 2018-12-19 2019-05-17 合肥工业大学 A kind of vibration data Fault Classification based on depth domain-adaptive
CN109902393A (en) * 2019-03-01 2019-06-18 哈尔滨理工大学 Fault Diagnosis of Roller Bearings under a kind of variable working condition based on further feature and transfer learning
CN109947086A (en) * 2019-04-11 2019-06-28 清华大学 Mechanical breakdown migration diagnostic method and system based on confrontation study
CN110186680A (en) * 2019-05-30 2019-08-30 盐城工学院 A kind of confrontation differentiation domain adapts to one-dimensional convolutional neural networks intelligent failure diagnosis method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6435581B2 (en) * 2015-01-29 2018-12-12 パナソニックIpマネジメント株式会社 Transfer learning device, transfer learning system, transfer learning method and program


Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
A transfer convolutional neural network for fault diagnosis based on ResNet-50;Long Wen 等;《Neural Computing and Applications》;20190126;全文 *
Adversarial adaptive 1-D convolutional neural networks for bearing fault diagnosis under varying working condition;Bo Zhang 等;《https://arxiv.org/pdf/1805.00778v1.pdf》;20180501;全文 *
Deep Adversarial Domain Adaptation Model for Bearing Fault Diagnosis;Zhao-Hua Liu 等;《IEEE Transactions on Systems, Man, and Cybernetics: Systems》;20190819;全文 *
Deep Convolutional Transfer Learning Network: A New Method for Intelligent Fault Diagnosis of Machines With Unlabeled Data;Liang Guo 等;《IEEE Transactions on Industrial Electronics》;20181026;全文 *
Network Transfer Learning via Adversarial Domain Adaptation with Graph Convolution;Quanyu Dai 等;《https://arxiv.org/pdf/1909.01541v1.pdf》;20190904;全文 *
Transfer learning with neural networks for bearing fault diagnosis in changing working conditions;Ran Zhang 等;《IEEE Access》;20170628;全文 *
Research on transfer learning algorithms based on generative adversarial networks; Zang Wenhua; China Masters' Theses Full-text Database (Information Science and Technology); 20180915; full text *
Deep transfer diagnosis method for machinery equipment faults under big data; Lei Yaguo et al.; Journal of Mechanical Engineering; 20190125 (No. 07); full text *

Also Published As

Publication number Publication date
CN110751207A (en) 2020-02-04

Similar Documents

Publication Publication Date Title
CN110751207B (en) Fault diagnosis method for anti-migration learning based on deep convolution domain
CN111651937B (en) Method for diagnosing faults of in-class self-adaptive bearing under variable working conditions
CN110361176B (en) Intelligent fault diagnosis method based on multitask feature sharing neural network
CN112629863B (en) Bearing fault diagnosis method for dynamic joint distribution alignment network under variable working conditions
CN109918752B (en) Mechanical fault diagnosis method, equipment and medium based on migration convolution neural network
CN111709448A (en) Mechanical fault diagnosis method based on migration relation network
CN112308147B (en) Rotary machinery fault diagnosis method based on multi-source domain anchor adapter integrated migration
CN113375941B (en) Open-set fault diagnosis method for bearings of high-speed motor train unit
CN113567130A (en) Bearing fault diagnosis method based on multiple working conditions of equipment
CN113792758B (en) Rolling bearing fault diagnosis method based on self-supervision learning and clustering
CN112132102B (en) Intelligent fault diagnosis method combining deep neural network with artificial bee colony optimization
CN113887136B (en) Electric automobile motor bearing fault diagnosis method based on improved GAN and ResNet
CN115127814A (en) Unsupervised bearing fault diagnosis method based on self-adaptive residual error countermeasure network
CN113505664B (en) Fault diagnosis method for planetary gear box of wind turbine generator
CN116793682A (en) Bearing fault diagnosis method based on iCORAL-MMD and anti-migration learning
CN115374820A (en) Rotary machine cross-domain fault diagnosis method based on multi-source sub-domain adaptive network
CN114970628B (en) Rotating part fault diagnosis method based on generation countermeasure network under condition of unbalanced samples
Liu et al. Rolling bearing fault diagnosis of train running gear based on optimized deep residual network
Han et al. A novel intelligent fault diagnosis method based on dual convolutional neural network with multi-level information fusion
CN113530850B (en) Centrifugal pump fault diagnosis method based on ESA and stacked capsule self-encoder
Ghazaly et al. Prediction of misfire location for SI engine by unsupervised vibration algorithm
Fu et al. Development of deep convolutional neural network with adaptive batch normalization algorithm for bearing fault diagnosis
CN116244640A (en) Unbalanced bearing fault diagnosis method and system based on federal learning
CN115795397A (en) Gearbox temperature prediction method based on 1DCNN-LSTM and BiLSTM parallel network
CN115587289A (en) Multi-source domain adaptive network-based aeroengine fault identification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant