CN110751207B - Fault diagnosis method based on deep convolutional domain-adversarial transfer learning - Google Patents

Fault diagnosis method based on deep convolutional domain-adversarial transfer learning

Info

Publication number
CN110751207B
CN110751207B (application CN201910991173.7A)
Authority
CN
China
Prior art keywords
domain
layer
label
sample
classifier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910991173.7A
Other languages
Chinese (zh)
Other versions
CN110751207A (en)
Inventor
李锋
唐拓江
韩国良
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University filed Critical Sichuan University
Priority to CN201910991173.7A priority Critical patent/CN110751207B/en
Publication of CN110751207A publication Critical patent/CN110751207A/en
Application granted granted Critical
Publication of CN110751207B publication Critical patent/CN110751207B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01MTESTING STATIC OR DYNAMIC BALANCE OF MACHINES OR STRUCTURES; TESTING OF STRUCTURES OR APPARATUS, NOT OTHERWISE PROVIDED FOR
    • G01M99/00Subject matter not provided for in other groups of this subclass
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Testing Of Devices, Machine Parts, Or Other Structures Thereof (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a fault diagnosis method based on deep convolutional domain-adversarial transfer learning (DCDATL). A deep convolutional residual feature extractor is used to extract high-level features in the DCDATL, improving the convergence and nonlinear approximation capability of the DCDATL; a feature joint distribution representation is obtained through the Kronecker product of the extracted high-level features and the label information and is embedded into the domain classifier, and domain adversarial training is performed to improve the transfer performance of the DCDATL; the feature transfer and classification process based on minimizing the joint distribution domain adversarial overall loss function of the DCDATL improves the classification accuracy after transfer. These advantages enable the DCDATL-based fault diagnosis method to perform high-precision fault diagnosis on the current sample to be tested of a rotating machine by using labeled samples from historical working conditions, even when no labeled samples exist under the current working condition of the rotating machine.

Description

Fault diagnosis method based on deep convolutional domain-adversarial transfer learning
Technical Field
The invention belongs to the technical field of fault diagnosis methods for rotating machinery, and particularly relates to a fault diagnosis method based on deep convolutional domain-adversarial transfer learning.
Background
Rotating machinery is widely applied in industries such as metallurgy, aviation, transportation, chemical engineering and energy, and its mechanical structure tends to develop toward large scale, heavy load, precision and high speed. Condition monitoring and fault diagnosis are therefore important means of ensuring the long-term safe and reliable operation of modern industrial equipment.
Throughout its service life, rotating machinery usually works in an industrial environment with variable working conditions, so the probability of fault occurrence is high and the machinery is easily damaged. Under variable working conditions (e.g., different rotating speeds and different loads), the state data under the current actual working condition is difficult to monitor directly, so sample data of known fault types for the rotating machine under the current actual working condition (i.e., labeled target domain sample data) is scarce because it is difficult to acquire. In addition, under variable working conditions, the distribution characteristics of the historical training fault data obtained under previous working conditions (i.e., labeled auxiliary domain sample data) and of the target domain data differ and drift. Although fault diagnosis of rotating machinery under variable working conditions has received some attention in recent years, many researchers and research institutes have attempted to solve this problem with conventional machine learning methods such as support vector machines (SVMs), nearest neighbor classifiers (NNCs), and auto-encoders (AEs). However, when a distribution difference exists between the training samples and the test samples, the generalization capability of the resulting classification model is poor, so these methods are not suitable for fault diagnosis under variable working conditions.
The transfer learning theory developed in recent years provides a new solution for fault diagnosis of rotating machinery under variable working conditions. Transfer learning does not require the identical-distribution assumption for training and test samples that traditional machine learning makes; its main idea is to learn knowledge from an existing auxiliary domain and then transfer that knowledge to the target domain to complete classification in the target domain. At present, transfer learning research for fault diagnosis of rotating machinery has only just begun, with few research cases. For example: Shen Fei et al. adopted a feature extraction method based on autocorrelation-matrix SVD combined with transfer learning for motor bearing fault diagnosis; others have adopted an improved least-squares support vector machine with transfer learning capability for spherical bearing fault diagnosis, and transfer component analysis (TCA) for rolling bearing fault diagnosis under variable working conditions. However, these transfer learning methods require a high similarity between the auxiliary domain and target domain samples; if the similarity is low, the classification accuracy declines and the problem of "negative transfer" occurs.
Disclosure of Invention
Aiming at the above defects in the prior art, the fault diagnosis method based on deep convolutional domain-adversarial transfer learning solves the problem that, in existing rotating machinery fault diagnosis methods, a large number of labeled samples (i.e., samples of known fault type) are difficult to obtain, so that labeled samples under the current working condition are scarce and the fault diagnosis accuracy for the current sample to be tested is low.
In order to achieve the above purpose, the invention adopts the following technical scheme: the fault diagnosis method based on deep convolutional domain-adversarial transfer learning comprises the following steps:

S1, performing segmented preprocessing on each rotating machine sample of the auxiliary domain and the target domain respectively to obtain corresponding preprocessing results;

S2, inputting the two preprocessing results as input samples into the deep convolutional domain-adversarial transfer learning model to obtain the domain adversarial loss function of the domain classifier and the joint loss function of the label predictor in the model;

S3, calculating the joint distribution domain adversarial overall loss function of the deep convolutional domain-adversarial transfer learning model based on the domain adversarial loss function of the domain classifier and the joint loss function of the label predictor, and training the model;

S4, inputting the target domain rotating machine sample to be tested into the trained deep convolutional domain-adversarial transfer learning model, and predicting its class label to realize rotating machine fault diagnosis.
Further, the segmented preprocessing in step S1 performs normalization and array reconstruction on each rotating machine sample in turn.
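The segmented preprocessing (normalization followed by array reconstruction) can be sketched as follows; the min-max normalization scheme and the 32×32 target shape are illustrative assumptions, not values specified by the patent:

```python
import numpy as np

def preprocess_segment(signal, side=32):
    # Min-max normalisation followed by reshaping the 1-D segment into a
    # 2-D array ("array reconstruction"); the exact normalisation scheme
    # is not given in the patent, so min-max scaling is assumed here.
    seg = np.asarray(signal[: side * side], dtype=float)
    seg = (seg - seg.min()) / (seg.max() - seg.min() + 1e-12)
    return seg.reshape(side, side)

# Stand-in vibration sample: 1024 points -> one 32 x 32 input array
sample = np.sin(np.linspace(0.0, 50.0, 1024))
image = preprocess_segment(sample, side=32)
```

Each preprocessed sample then enters the feature extractor as a small 2-D array suitable for convolution.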
Further, the deep convolutional domain-adversarial transfer learning model in the step S2 includes a deep convolutional residual feature extractor, a label predictor and a domain classifier;

the deep convolutional residual feature extractor is connected to the label predictor and the domain classifier respectively, and the label predictor is connected to the domain classifier;

the deep convolutional residual feature extractor comprises a first feature extraction unit and a second feature extraction unit, wherein the first feature extraction unit comprises a first convolution layer, a first residual layer, a first average pooling layer and a first fully connected layer connected in sequence, and the second feature extraction unit comprises a second convolution layer, a second residual layer, a second average pooling layer and a second fully connected layer connected in sequence;

the first residual layer and the second residual layer each comprise a plurality of stacked residual blocks; each residual block comprises a first weight layer, a first relu activation function, a second weight layer, an accumulator and a second relu activation function connected in sequence, the input of the first weight layer is also connected to the input of the accumulator, and the output of the second relu activation function is connected to the input of the next residual block, the input of the first average pooling layer, or the input of the second average pooling layer;
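The residual block just described (weight layer → relu → weight layer → accumulator → relu, with the block input also fed into the accumulator) can be sketched with plain matrix multiplies standing in for the weight layers; the sizes and random weights here are purely illustrative:

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def residual_block(x, w1, w2):
    # first weight layer + first relu activation
    out = relu(x @ w1)
    # second weight layer
    out = out @ w2
    # accumulator adds the block input (skip connection), then second relu
    return relu(out + x)

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
w1 = 0.1 * rng.standard_normal((8, 8))
w2 = 0.1 * rng.standard_normal((8, 8))
y = residual_block(x, w1, w2)
```

Stacking several such blocks gives a residual layer; because the skip connection adds the input directly, gradients can bypass the weight layers, which is the mechanism behind the convergence benefit claimed for the extractor.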
the label predictor comprises a first label prediction unit and a second label prediction unit that share parameters; the first label prediction unit comprises a third fully connected layer, a fourth fully connected layer and a first softmax activation function connected in sequence, and the second label prediction unit comprises a fifth fully connected layer, a sixth fully connected layer and a second softmax activation function connected in sequence;

the domain classifier comprises a gradient inversion layer; the first output of the gradient inversion layer, a seventh fully connected layer, an eighth fully connected layer and a first sigmoid activation function are connected in sequence, and the second output of the gradient inversion layer, a ninth fully connected layer, a tenth fully connected layer and a second sigmoid activation function are connected in sequence;

the output of the first fully connected layer is connected to the input of the third fully connected layer, the output of the first softmax activation function and the first input of the gradient inversion layer respectively, and the output of the second fully connected layer is connected to the input of the fifth fully connected layer and the second input of the gradient inversion layer respectively.
Further, the step S2 is specifically:

S21, extracting the auxiliary domain sample high-level features corresponding to the preprocessed auxiliary domain rotating machine samples and the target domain sample high-level features corresponding to the target domain rotating machine samples through the deep convolutional residual feature extractor, and inputting the high-level features of the auxiliary domain and target domain samples into the label predictor;

S22, training the label predictor according to the auxiliary domain sample high-level features input into the label predictor;

S23, processing the input target domain sample high-level features through the trained label predictor to obtain the target domain sample pseudo labels and the joint loss function of the label predictor;

S24, calculating the Kronecker product of the auxiliary domain sample high-level features and their labels and the Kronecker product of the target domain sample high-level features and their corresponding pseudo labels to obtain the feature joint distribution representation, and inputting it into the domain classifier;

S25, performing domain adversarial training on the domain classifier based on the feature joint distribution representation input into the domain classifier, and obtaining the domain adversarial loss function of the domain classifier.
Further, the step S21 is specifically:

A1, setting the auxiliary domain rotating machine samples input into the deep convolutional residual feature extractor as $\{(x_i^s, y_i^s)\}_{i=1}^{N_s}$ and the target domain rotating machine samples as $\{x_i^t\}_{i=1}^{N_t}$; the outputs of the first convolution layer and the second convolution layer are $x_s^{(1)}$ and $x_t^{(1)}$, which serve as the inputs of the first residual layer and the second residual layer respectively;

A2, setting the number of residual blocks in the first residual layer to $L$, the output $x_s^{(L+1)}$ of the first residual layer is:

$$x_s^{(L+1)} = x_s^{(1)} + \sum_{l=1}^{L} F\left(x_s^{(l)}, W^{(l)}\right)$$

where $F(\cdot)$ is the residual function and $W^{(l)}$ is the weight of the $l$-th residual block in the first residual layer, $l = 1, 2, \ldots, L$;

A3, sequentially performing average pooling and fully connected processing on the output of the first residual layer through the first average pooling layer and the first fully connected layer, the auxiliary domain sample high-level feature $f_s$ output by the first fully connected layer is:

$$f_s = G_f(x_i^s; \theta_f)$$

where $G_f(\cdot)$ is the overall mapping function of the deep convolutional residual feature extractor, and $\theta_f$ is the set of weight parameters and bias coefficients of the deep convolutional residual feature extractor;

A4, based on steps A2–A3, the target domain sample high-level feature $f_t$ output by the second fully connected layer is:

$$f_t = G_f(x_i^t; \theta_f)$$
Further, the joint loss function of the label predictor in the step S23 comprises the auxiliary domain label loss function $L_s^l$ and the target domain label loss function $L_t^l$;

The auxiliary domain label loss function $L_s^l$ is:

$$L_s^l = \frac{1}{N_s} \sum_{i=1}^{N_s} L_y\!\left(G_l\big(G_f(x_i^s; \theta_f); \theta_l\big),\, y_i^s\right)$$

where $N_s$ is the total number of original rotating machine samples in the auxiliary domain; $L_y(\cdot)$ is the logarithmic loss function; $G_l(\cdot)$ is the overall mapping function of the label predictor; $\theta_l$ is the set of weight parameters and bias coefficients of each layer of the label predictor; and $y_i^s$ is the label of the auxiliary domain rotating machine sample;

The target domain label loss function $L_t^l$ is:

$$L_t^l = -\frac{1}{N_t} \sum_{i=1}^{N_t} \sum_{c=1}^{C} \hat{y}_{i,c}^t \log \hat{y}_{i,c}^t$$

where $N_t$ is the total number of original rotating machine samples in the target domain; $C$ is the total number of categories of the target domain rotating machine samples; the inner sum is the class-conditional distribution entropy; and $\hat{y}_i^t$ is the pseudo label of the target domain rotating machine sample, with $\hat{y}_i^t = \arg\max_{c} \hat{y}_{i,c}^t$.
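A minimal numeric sketch of the two loss terms above — logarithmic loss on labelled auxiliary samples and class-conditional entropy on target predictions, whose argmax gives the pseudo label — assuming softmax outputs from the label predictor (the concrete logits below are invented for illustration):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def aux_label_loss(logits, labels):
    # mean logarithmic loss over the N_s labelled auxiliary-domain samples
    p = softmax(logits)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def target_entropy_and_pseudo_labels(logits):
    # mean class-conditional entropy over the N_t target-domain samples;
    # the pseudo label is the most probable class of each prediction
    p = softmax(logits)
    entropy = -np.sum(p * np.log(p + 1e-12), axis=1).mean()
    return entropy, p.argmax(axis=1)

aux_logits = np.array([[8.0, 0.0, 0.0], [0.0, 8.0, 0.0]])  # confident, correct
tgt_logits = np.array([[0.0, 0.0, 0.0]])                   # maximally uncertain
loss_s = aux_label_loss(aux_logits, np.array([0, 1]))
loss_t, pseudo = target_entropy_and_pseudo_labels(tgt_logits)
```

Confident correct auxiliary predictions give a near-zero label loss, while a uniform target prediction attains the maximum entropy $\log C$, which is exactly what minimizing $L_t^l$ pushes the predictor away from.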
Further, the step S25 is specifically:

The Kronecker product $f_i^s \otimes y_i^s$ of the auxiliary domain sample high-level features and their labels, and the Kronecker product $f_i^t \otimes \hat{y}_i^t$ of the target domain sample high-level features and their pseudo labels, are input into the domain classifier as the feature joint distribution representation; the deep convolutional residual feature extractor and the domain classifier form an adversarial network through the gradient inversion layer, and the domain adversarial training method is used to reduce the joint distribution difference between the auxiliary domain sample high-level features and the target domain sample high-level features, obtaining the domain adversarial loss function of the domain classifier;

where $\otimes$ is the Kronecker product operator.
Further, in the step S25, in training the countermeasure network with the domain countermeasure:
due to the existence of the gradient inversion layer, errors of the predicted value and the true value are transmitted backwards in the backward propagation process in the domain classifier, and the gradient is calculated through each layer of the network in the domain classifier according to the transmitted errors;
when the error is transmitted to the gradient inversion layer, multiplying the error by a parameter-lambda, and then transmitting the error to a depth convolution residual error feature extractor connected with the error, so that the training purposes of the depth convolution residual error feature extractor and a domain classifier are opposite, and the domain confrontation training is realized;
When determining the domain adversarial loss function of the domain classifier:

The gradient inversion layer in the domain classifier is defined in the forward pass as:

$$g(f(v), \lambda) = f(v)$$

where $g(\cdot)$ is the gradient inversion layer mapping function, $v$ is the independent variable input to the gradient inversion layer, and $\lambda$ is the gradient inversion layer hyperparameter;

The gradient of the gradient inversion layer in the domain classifier is then:

$$\frac{\partial g(f(v), \lambda)}{\partial v} = -\lambda \frac{\partial f(v)}{\partial v}$$

Due to the gradient inversion layer, the parameter set $\theta_f$ of the deep convolutional residual feature extractor and the parameter set $\theta_d$ of the domain classifier are updated by the standard back-propagation algorithm, and the auxiliary domain class label $\hat{d}_i^s$ and target domain class label $\hat{d}_i^t$ output by the domain classifier are respectively:

$$\hat{d}_i^s = G_d\big(f_i^s \otimes y_i^s; \theta_d\big)$$

$$\hat{d}_i^t = G_d\big(f_i^t \otimes \hat{y}_i^t; \theta_d\big)$$

where $\hat{d}_i^s$ and $\hat{d}_i^t$ are the class labels output by the domain classifier when predicting on the auxiliary domain and the target domain respectively, and $G_d(\cdot)$ is the domain classifier mapping function;

Setting the auxiliary domain original rotating machine samples and the target domain original rotating machine samples as $\{x_i^s\}_{i=1}^{N_s}$ and $\{x_i^t\}_{i=1}^{N_t}$, the domain adversarial loss function $L_d$ of the domain classifier is:

$$L_d = \frac{1}{N_s} \sum_{i=1}^{N_s} L_d^s\big(\hat{d}_i^s, d_i\big) + \frac{1}{N_t} \sum_{i=1}^{N_t} L_d^t\big(\hat{d}_i^t, d_i\big)$$

where $L_d^s$ is the auxiliary domain sample domain classifier loss function; $L_d^t$ is the target domain sample domain classifier loss function; $N_s$ is the total number of original rotating machine samples in the auxiliary domain; $N_t$ is the total number of original rotating machine samples in the target domain; and $d_i$ is the decision variable of the domain class label, with $d_i = 0$ for an auxiliary domain class label and $d_i = 1$ for a target domain class label.
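The gradient inversion layer's behaviour — identity in the forward pass, gradient scaled by $-\lambda$ in the backward pass — can be sketched with an explicit manual backward step (a deep-learning framework would normally hide this inside autograd):

```python
import numpy as np

def grl_forward(v):
    # forward pass: g(f(v), lambda) = f(v), i.e. the identity on its input
    return v

def grl_backward(upstream_grad, lam):
    # backward pass: the incoming gradient is multiplied by -lambda before
    # it reaches the feature extractor, so the extractor and the domain
    # classifier optimise opposite objectives (domain adversarial training)
    return -lam * upstream_grad

v = np.array([0.3, -1.2, 2.0])
g = np.array([1.0, 1.0, 1.0])
out = grl_forward(v)
back = grl_backward(g, lam=0.5)
```

The domain classifier descends its loss normally, while the reversed gradient makes the feature extractor ascend it, driving the two domains' joint feature distributions together.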
Further, in the step S3, the joint distribution domain adversarial overall loss function of the deep convolutional domain-adversarial transfer learning model is calculated, and the stochastic gradient descent algorithm is used to find the optimal parameters $\theta_f^*$, $\theta_d^*$ and $\theta_l^*$ that minimize the value of the overall loss function, completing the training of the deep convolutional domain-adversarial transfer learning model;

The joint distribution domain adversarial overall loss function $L$ is:

$$L = L_s^l + \alpha L_t^l + \beta L_d$$

where $L_s^l + \alpha L_t^l$ is the joint loss function of the label predictor, and $\alpha$ and $\beta$ are balance constraint parameters used respectively to constrain the local optimization behaviour of the label predictor and of the domain classifier.
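Numerically, the overall loss reduces to a weighted sum of the three terms; a trivially small sketch (the loss values and the settings of α and β below are invented, and the gradient inversion layer supplies the adversarial sign during back-propagation):

```python
def overall_loss(loss_aux, loss_target_entropy, loss_domain, alpha, beta):
    # joint distribution domain adversarial overall loss: the label
    # predictor's joint loss (supervised term + alpha * entropy term)
    # plus beta times the domain adversarial loss
    label_joint = loss_aux + alpha * loss_target_entropy
    return label_joint + beta * loss_domain

total = overall_loss(0.2, 1.0, 0.7, alpha=0.1, beta=0.5)
```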
Further, in the step S4, the target domain rotating machine sample to be tested is input into the trained deep convolutional domain-adversarial transfer learning model, and its class label is predicted to obtain the class prediction label $y_i^t$, realizing fault diagnosis of the rotating machine;

The expression of the class prediction label $y_i^t$ is:

$$y_i^t = G_l\big(G_f(x_i^t; \theta_f^*); \theta_l^*\big)$$
the invention has the beneficial effects that:
according to the fault diagnosis method based on the deep convolution domain pair anti-migration learning, disclosed by the invention, in the DCDATL, the high-level feature extraction is carried out by utilizing the deep convolution residual feature extractor, so that the gradient problems such as gradient disappearance, gradient divergence and the like in the DCDATL training process can be avoided, and the convergence and nonlinear approximation capability of the DCDATL can be improved; obtaining feature joint distribution representation through the obtained Crohn's product of the high-level features and the label information, embedding the feature joint distribution representation into a domain classifier, performing the countertraining of an auxiliary domain and a target domain in the domain classifier by utilizing gradient inversion, and performing the domain countertraining through the feature joint distribution of the labeled samples of the auxiliary domain and the unlabeled samples of the target domain to enhance the domain adaptability of the labeled samples of the auxiliary domain to the target domain, so that the domain invariant features of the auxiliary domain and the target domain are obtained through learning, and the migration performance of the DCDATL is improved; the feature migration and classification process based on minimizing the joint distribution domain of DCDATL against the global loss function can improve the classification accuracy after migration. The above advantages of the DCDATL enable the DCDATL-based fault diagnosis method to perform high-precision fault diagnosis on the current to-be-detected sample of the rotating machine by using the labeled sample (i.e., the labeled sample in the auxiliary domain) under the historical working condition under the condition that the labeled sample (i.e., the labeled sample in the target domain) does not exist under the current working condition of the rotating machine.
Drawings
FIG. 1 is a flow chart of the fault diagnosis method based on deep convolutional domain-adversarial transfer learning according to the present invention.
FIG. 2 is a diagram of the DCDATL model architecture provided by the present invention.
Fig. 3 is a diagram of a residual block provided by the present invention.
Fig. 4 is a photograph of the rolling bearing fault simulation test bench in the embodiment of the invention.
FIG. 5 shows the comparison result of the outer ring crack fault diagnosis accuracy in the embodiment of the invention.
FIG. 6 is a comparison result of rolling element crack fault diagnosis accuracy in the embodiment provided by the invention.
FIG. 7 shows the comparison result of the inner ring crack fault diagnosis accuracy in the embodiment of the invention.
Fig. 8 is a comparison result of the average fault diagnosis accuracy in the embodiment provided by the present invention.
Fig. 9 is a comparison result of the outer ring crack fault diagnosis accuracy in the embodiment provided by the invention.
FIG. 10 is a comparison result of rolling element crack fault diagnosis accuracy in the embodiment provided by the invention.
FIG. 11 is a comparison result of the inner ring crack fault diagnosis accuracy in the embodiment of the invention.
Fig. 12 is a comparison result of the average fault diagnosis accuracy in the embodiment provided by the present invention.
FIG. 13 shows the comparison result of the outer ring crack fault diagnosis accuracy in the embodiment of the present invention.
FIG. 14 is a comparison result of rolling element crack fault diagnosis accuracy in the embodiment provided by the invention.
FIG. 15 shows the comparison result of the inner ring crack fault diagnosis accuracy in the embodiment of the present invention.
Fig. 16 is a comparison result of the average fault diagnosis accuracy in the embodiment provided by the present invention.
Detailed Description
The following description of the embodiments of the present invention is provided to facilitate understanding by those skilled in the art. However, the invention is not limited to the scope of the embodiments; to those skilled in the art, various changes that remain within the spirit and scope of the invention as defined by the appended claims, and all matter produced using the inventive concept, are protected.
As shown in fig. 1, the fault diagnosis method based on deep convolutional domain-adversarial transfer learning includes the following steps:
S1, performing segmented preprocessing on each rotating machine sample of the auxiliary domain and the target domain respectively to obtain corresponding preprocessing results;

S2, inputting the two preprocessing results as input samples into the deep convolutional domain-adversarial transfer learning model (DCDATL) to obtain the domain adversarial loss function of the domain classifier and the joint loss function of the label predictor in the model;

S3, calculating the joint distribution domain adversarial overall loss function of the DCDATL based on the domain adversarial loss function of the domain classifier and the joint loss function of the label predictor, and training the DCDATL;

S4, inputting the target domain rotating machine sample to be tested into the trained DCDATL and predicting its class label to realize rotating machine fault diagnosis.
The segmented preprocessing in step S1 performs normalization and array reconstruction on each rotating machine sample in turn.
As shown in fig. 2, the deep convolution domain antagonistic learning model in step S2 includes a deep convolution residual feature extractor, a label predictor and a domain classifier, and migration and classification of fault features are completed through three processes of extracting high-level features by the deep convolution residual feature extractor, performing domain antagonistic training with the joint distribution of auxiliary domain labeled samples and target domain unlabeled samples, and minimizing the overall loss function of the domain classifier and the label predictor (i.e., minimizing the joint distribution domain antagonistic loss overall function of DCDATL). Specifically, the deep convolution residual feature extractor in the DCDATL in fig. 2 is connected to the label predictor and the domain classifier respectively, and the label predictor is connected to the domain classifier;
the depth convolution residual error feature extractor comprises a first feature extraction unit and a second feature extraction unit, wherein the first feature extraction unit comprises a first convolution layer, a first residual error layer, a first average pooling layer and a first full-connection layer which are sequentially connected, and the second feature extraction unit comprises a second convolution layer, a second residual error layer, a second average pooling layer and a second full-connection layer which are sequentially connected;
as shown in fig. 3, each of the first residual layer and the second residual layer includes a plurality of residual blocks connected in a stacked manner, each of the residual blocks includes a first weight layer, a first relu activation function, a second weight layer, an accumulator, and a second relu activation function connected in sequence, an input end of the first weight layer is further connected to an input end of the accumulator, and an output end of the second relu activation function is connected to an input end of a next residual block, an input end of the first averaging pooling layer, or an input end of the second averaging pooling layer;
the label predictor comprises a first label prediction unit and a second label prediction unit which share parameters; the first label prediction unit comprises a third fully-connected layer, a fourth fully-connected layer and a first softmax activation function connected in sequence, and the second label prediction unit comprises a fifth fully-connected layer, a sixth fully-connected layer and a second softmax activation function connected in sequence; the shared parameters comprise identical weights and biases of the corresponding layers, i.e. the parameters of the third and fifth fully-connected layers are the same, and the parameters of the fourth and sixth fully-connected layers are the same;
the domain classifier comprises a gradient inversion layer; a first output end of the gradient inversion layer is connected in sequence to a seventh fully-connected layer, an eighth fully-connected layer and a first sigmoid activation function, and a second output end of the gradient inversion layer is connected in sequence to a ninth fully-connected layer, a tenth fully-connected layer and a second sigmoid activation function;
the output end of the first full connection layer is connected with the input end of the third full connection layer, the output end of the first softmax activation function and the first input end of the gradient inversion layer respectively, and the output end of the second full connection layer is connected with the input end of the fifth full connection layer and the second input end of the gradient inversion layer respectively.
These theoretical advantages of DCDATL enable the DCDATL-based rotating machinery fault diagnosis method to perform high-precision fault diagnosis on the current sample to be detected by using the labeled samples collected under historical working conditions (the auxiliary-domain labeled samples), even when no labeled samples exist under the current working condition of the rotating machinery (i.e., no labeled samples in the target domain).
As shown in fig. 2, when performing a specific fault diagnosis based on the theoretical model of DCDATL, the step S2 is specifically:
s21, extracting the high-level characteristics of the auxiliary domain sample corresponding to the preprocessed auxiliary domain rotating mechanical sample and the high-level characteristics of the target domain sample corresponding to the target domain rotating mechanical sample through a deep convolution residual error characteristic extractor, and inputting the high-level characteristics of the auxiliary domain sample and the target domain sample into a label predictor;
s22, training the label predictor according to the high-level characteristics of the auxiliary domain samples input into the label predictor;
s23, processing the high-level characteristics of the input target domain sample through the trained label predictor to obtain a joint loss function of the target domain sample pseudo label and the label predictor;
s24, calculating a kronecker product of the high-level features of the auxiliary domain samples and the labels thereof and a kronecker product of the high-level features of the target domain samples and the corresponding pseudo labels thereof, obtaining a feature joint distribution representation, and inputting the feature joint distribution representation into the domain classifier;
and S25, performing domain confrontation training on the domain classifier based on the feature joint distribution representation input into the domain classifier, and obtaining a domain confrontation loss function of the domain classifier.
In step S21, when the deep convolution residual feature extractor is used to extract high-level features, the two residual layers form a residual network (ResNet), which builds the deep network inside the extractor by introducing residual blocks. The structure of a residual block is shown in fig. 3, where x is the input, H(x) is the output, F(x) is the residual mapping function, and each weight layer is a convolutional layer.
The basic constraint for constructing a deep network is that the error of the stacked network model should not be higher than that of its underlying shallow model (a shallow model contains one residual block). The deep model is constructed by the identity mapping method, i.e. H(x) = x represents the optimal mapping of a group of stacked network layers. When the network is deep, the model has difficulty fitting the actual mapping H(x) directly; the residual network converts the problem of fitting H(x) into fitting the residual mapping F(x) by introducing a "shortcut connection", and the actual mapping H(x) is expressed as:
H(x) = F(x) + x    (1)
At this point the residual network model only needs to minimize the residual function F(x) = H(x) − x to approximate the actual mapping, which solves the performance degradation problem of stacking network layers.
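The residual block computation H(x) = ReLU(F(x) + x) can be sketched as follows, with dense weight matrices standing in for the two convolutional weight layers of fig. 3 (an assumption to keep the example short; `residual_block`, `w1` and `w2` are illustrative names):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """One residual block: weight layer -> ReLU -> weight layer,
    then the accumulator adds the shortcut input x before the
    second ReLU, i.e. H(x) = ReLU(F(x) + x)."""
    f = relu(x @ w1) @ w2   # residual mapping F(x)
    return relu(f + x)      # shortcut connection + second ReLU
```

With zero weights the residual mapping F(x) vanishes and the block reduces to the identity (followed by ReLU), which is the identity-mapping intuition behind equation (1).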
Therefore, the step S21 is specifically:
A1, let the auxiliary-domain rotating machinery samples input to the deep convolution residual feature extractor be x_i^s (i = 1, 2, …, N_s) and the target-domain rotating machinery samples be x_i^t (i = 1, 2, …, N_t); the outputs x_s(l) and x_t(l) of the first convolutional layer and the second convolutional layer are taken as the inputs of the first residual layer and the second residual layer, respectively;
A2, let the number of residual blocks in the first residual layer be L; the output x_s(L+1) of the first residual layer is then:
x_s(L+1) = x_s(l) + Σ_{i=l}^{L} F(x_s(i), W(i))    (2)
where F(·) is the residual function and W(i) is the weight of the i-th residual block in the first residual layer, i = l, l+1, …, L;
As shown in formula (2), the residual network is characterized by the summation of the residual features of each layer, which guarantees that layer l+1 carries more feature information than layer l. In back propagation, according to the chain rule, the gradient of the error loss term loss with respect to the l-th residual block at the front end of the network is:
∂loss/∂x_s(l) = (∂loss/∂x_s(L+1)) · (1 + ∂(Σ_{i=l}^{L} F(x_s(i), W(i)))/∂x_s(l))    (3)
Formula (3) shows that the gradient of the deepest layer L can be transmitted directly to any shallower network layer l, and that the factor applied to the error loss term loss at each residual block during back propagation is greater than or equal to 1, so the problems of gradient vanishing and gradient divergence are avoided.
A3, the output of the first residual layer is average-pooled by the first average pooling layer and fully connected by the first fully-connected layer, and the auxiliary-domain sample high-level feature f_s output by the first fully-connected layer is obtained as:
f_s = G_f(x_i^s, θ_f)    (4)
where G_f(·) is the overall mapping function of the deep convolution residual feature extractor and θ_f is the set of weight parameters and bias coefficients of the deep convolution residual feature extractor;
A4, similarly, based on steps A2–A3, the target-domain sample high-level feature f_t output by the second fully-connected layer is obtained as:
f_t = G_f(x_i^t, θ_f)    (5)
The joint loss function of the label predictor in step S23 includes an auxiliary-domain label loss function L_s(θ_f, θ_l) and a target-domain label loss function L_t(θ_f, θ_l).
The auxiliary-domain label loss function L_s(θ_f, θ_l) is:
L_s(θ_f, θ_l) = (1/N_s) Σ_{i=1}^{N_s} L_y(G_s(G_f(x_i^s, θ_f), θ_l), y_i^s)    (6)
where N_s is the total number of original rotating machinery samples in the auxiliary domain, L_y(·) is the logarithmic loss function, G_s(·) is the overall mapping function of the label predictor, θ_l is the set of weight parameters and bias coefficients of each layer of the label predictor, and y_i^s is the label of the auxiliary-domain rotating machinery sample;
In order for the label predictor to better fit the distribution of the target samples, the invention takes minimizing the cross entropy of the class conditional probability distribution as the objective and, combined with the target-domain samples to be detected, defines the target-domain label loss function L_t(θ_f, θ_l) as:
L_t(θ_f, θ_l) = −(1/N_t) Σ_{i=1}^{N_t} Σ_{c=1}^{C} ŷ_{i,c}^t log ŷ_{i,c}^t    (7)
where N_t is the total number of original rotating machinery samples in the target domain, C is the total number of categories of target-domain rotating machinery samples, −Σ_{c=1}^{C} ŷ_{i,c}^t log ŷ_{i,c}^t is the class conditional distribution entropy, and ŷ_i^t is the pseudo label of the target-domain rotating machinery sample, with ŷ_i^t = G_s(G_f(x_i^t, θ_f), θ_l).
By minimizing the class conditional distribution entropy, the label predictor can predict the class labels of the target-domain samples to be detected more accurately.
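The entropy-minimization objective of equation (7) can be sketched as follows for a batch of target-domain logits; `target_entropy_loss` is a hypothetical name and the small clamp `1e-12` is an implementation assumption to keep the logarithm finite.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))  # numerically stable
    return e / e.sum(axis=1, keepdims=True)

def target_entropy_loss(logits):
    """Class conditional distribution entropy: the mean over target
    samples of -sum_c p_c * log p_c, where p is the label predictor's
    softmax output (the pseudo-label distribution).  Confident
    predictions give low entropy, uncertain ones give high entropy."""
    p = softmax(np.asarray(logits, dtype=np.float64))
    return float(np.mean(-np.sum(p * np.log(p + 1e-12), axis=1)))
```

A sharply peaked prediction yields near-zero entropy, while a uniform prediction over C classes yields log C, so minimizing this loss pushes the predictor toward confident pseudo labels on the target domain.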
The step S25 is specifically:
The Kronecker product f_s ⊗ y_i^s of the auxiliary-domain sample high-level features and their labels and the Kronecker product f_t ⊗ ŷ_i^t of the target-domain sample high-level features and their pseudo labels are input into the domain classifier as feature joint distribution representations (⊗ being the Kronecker operator). The deep convolution residual feature extractor and the domain classifier form an adversarial network through the gradient inversion layer; the joint distribution difference between the auxiliary-domain sample high-level features and the target-domain sample high-level features is reduced by domain adversarial training, and the domain adversarial loss function of the domain classifier is obtained. The deep convolution residual feature extractor tries to extract, from the auxiliary-domain and target-domain samples, features that the domain classifier cannot distinguish, while the domain classifier is trained to judge as accurately as possible whether its input comes from the auxiliary domain or the target domain; the two therefore form an adversarial relationship.
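For one sample, the feature joint distribution representation f ⊗ ŷ is simply the Kronecker (outer) product of the high-level feature vector and the (pseudo-)label probability vector, flattened for the domain classifier input; `joint_representation` is an illustrative name.

```python
import numpy as np

def joint_representation(feature, label_probs):
    """Feature joint distribution representation f (x) y-hat:
    the Kronecker product of a sample's high-level feature vector
    and its (pseudo-)label vector, used as domain classifier input."""
    return np.kron(np.asarray(feature), np.asarray(label_probs))
```

The resulting vector has dimension len(feature) × len(label_probs), so label information is woven into every feature coordinate before the domain classifier sees it.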
Therefore, in the above step S25, in training the countermeasure network with the domain countermeasure:
due to the existence of a Gradient Reversal Layer (GRL), the countermeasure network is easy to train, errors of a predicted value and a true value are transmitted backwards in the backward propagation process in the domain classifier, and the gradient is calculated according to the transmitted errors through each layer of the domain classifier;
when the error is transmitted to the gradient inversion layer, it is multiplied by the parameter −λ and then transmitted to the connected deep convolution residual feature extractor, so that the training objectives of the deep convolution residual feature extractor and the domain classifier are opposite, and domain adversarial training is realized;
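The gradient inversion layer's behavior described above (identity in the forward pass, error multiplied by −λ in the backward pass) can be sketched manually; this is a conceptual sketch, not a full autograd implementation, and the class name is illustrative.

```python
import numpy as np

class GradientReversalLayer:
    """Identity in the forward pass; in the backward pass the incoming
    error is multiplied by -lambda, so the feature extractor upstream
    is trained against the domain classifier downstream."""
    def __init__(self, lam):
        self.lam = lam
    def forward(self, x):
        return x                 # g(f(v), lambda) = f(v)
    def backward(self, grad):
        return -self.lam * grad  # gradient multiplied by -lambda
```

In a real framework this would be a custom autograd function, but the two methods above capture exactly why one loss can train two sub-networks toward opposite objectives.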
in determining the domain opposition loss function of the domain classifier:
the gradient inversion layer in the domain classifier is defined as:
g(f(v),λ)=f(v) (8)
wherein g (-) is a gradient inversion layer mapping function;
v is an independent variable input by the gradient inversion layer;
lambda is a gradient inversion layer superparameter;
the gradient of the gradient inversion layer in the domain classifier is obtained as:
∂g(f(v), λ)/∂v = −λ · ∂f(v)/∂v    (9)
due to the presence of the gradient inversion layer, the parameter set θ_f of the deep convolution residual feature extractor and the parameter set θ_d of the domain classifier are updated by the standard back propagation algorithm, and the auxiliary-domain class label d̂_i^s and the target-domain class label d̂_i^t output by the domain classifier are obtained as:
d̂_i^s = G_d(f_s ⊗ y_i^s, θ_d)    (10)
d̂_i^t = G_d(f_t ⊗ ŷ_i^t, θ_d)    (11)
where d̂_i^s and d̂_i^t are the domain class labels output by the domain classifier when predicting on the auxiliary domain and the target domain, respectively, and G_d(·) is the domain classifier mapping function;
let the auxiliary-domain original rotating machinery samples and the target-domain original rotating machinery samples be x_i^s (i = 1, 2, …, N_s) and x_i^t (i = 1, 2, …, N_t); the domain adversarial loss function L_d(θ_f, θ_d) of the domain classifier is obtained as:
L_d(θ_f, θ_d) = (1/N_s) Σ_{i=1}^{N_s} L_d^s(d̂_i^s, d_i) + (1/N_t) Σ_{i=1}^{N_t} L_d^t(d̂_i^t, d_i)    (12)
where L_d^s(·) is the auxiliary-domain sample domain classifier loss function, L_d^t(·) is the target-domain sample domain classifier loss function, N_s is the total number of original rotating machinery samples in the auxiliary domain, N_t is the total number of original rotating machinery samples in the target domain, and d_i is the decision variable of the domain class label: d_i = 0 for an auxiliary-domain class label and d_i = 1 for a target-domain class label.
In the process of minimizing the domain adversarial loss function L_d(θ_f, θ_d) of the domain classifier and updating the parameters through the back propagation algorithm, when the error between the predicted value and the true value of the domain classifier is transmitted to the gradient inversion layer it is multiplied by the negative number −λ, so that the training objectives of the deep convolution residual feature extractor and the domain classifier are opposite, and domain adversarial training is realized. Thus, with the feature joint distribution representations f_s ⊗ y_i^s and f_t ⊗ ŷ_i^t embedded in the domain classifier, the domain adversarial training of equations (8)–(12) ensures that the joint probability distributions of the auxiliary-domain and target-domain sample high-level features extracted by the deep convolution residual feature extractor become similar; that is, the auxiliary-domain labeled samples and the target-domain unlabeled samples (the samples to be detected) are mapped to a shared feature space with minimal domain difference, the domain-invariant features of the auxiliary domain and the target domain are learned, and the first half of the transfer learning process is completed. Performing domain adversarial training with the joint distribution of the inter-domain samples enhances the domain adaptability of the auxiliary-domain labeled samples to the target domain, so that the subsequent transfer learning process of DCDATL can proceed better.
In step S3, the joint distribution domain adversarial total loss function of the deep convolution domain adversarial transfer learning model is calculated, and the optimal parameters θ_f*, θ_d* and θ_l* that minimize the value of the total loss function are found using the stochastic gradient descent algorithm, completing the training of the deep convolution domain adversarial transfer learning model;
the joint distribution domain adversarial total loss function L(θ_f, θ_l, θ_d) is:
L(θ_f, θ_l, θ_d) = L_l(θ_f, θ_l) − β L_d(θ_f, θ_d)    (13)
where L_l(θ_f, θ_l) = L_s(θ_f, θ_l) + α L_t(θ_f, θ_l) is the joint loss function of the label predictor;
from the foregoing equations, the total loss function can be further written as:
L(θ_f, θ_l, θ_d) = L_s(θ_f, θ_l) + α L_t(θ_f, θ_l) − β L_d(θ_f, θ_d)    (14)
where α and β are balance constraint parameters used to constrain the local optimizing behaviors of the label predictor and the domain classifier, respectively.
Finally, the stochastic gradient descent algorithm is used to find the optimal parameters θ_f*, θ_d* and θ_l* that minimize the value of formula (14) (i.e., minimize the total loss), completing the training of DCDATL.
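Under the sign convention used here (label losses minimized, the domain loss entering with a negative sign because the gradient inversion layer flips its gradient for the feature extractor), the total loss combination can be sketched as a one-line helper; the function name and default values are illustrative.

```python
def total_loss(loss_s, loss_t, loss_d, alpha=0.5, beta=1.0):
    """Joint distribution domain adversarial total loss of DCDATL:
    auxiliary-domain label loss, plus alpha times the target-domain
    entropy loss, minus beta times the domain classifier loss (the
    minus sign realizes the adversarial objective via the GRL)."""
    return loss_s + alpha * loss_t - beta * loss_d
```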
In step S4, the target-domain sample to be detected is input into the trained DCDATL to obtain its class prediction label y_i^t, realizing fault diagnosis of the rotating machinery; this process can be expressed as:
y_i^t = G_s(G_f(x_i^t, θ_f*), θ_l*)    (15)
Based on the above process, the second half of the DCDATL migration learning process aims at minimizing the DCDATL joint distribution domain confrontation total loss function, so that the domain confrontation purpose is achieved, the feature migration is completed, and the high precision of the label predictor for the target domain to-be-detected sample class label prediction is ensured.
In one embodiment of the invention, the application of the method to rotating machinery fault diagnosis is illustrated by the following experiment:
1. experimental apparatus:
The experiment uses bearing data collected on the rolling bearing fault simulation test bed of the electrical engineering laboratory of Case Western Reserve University (a rolling bearing is a typical rotating machine). The test bed, shown in fig. 4, includes a motor, a torque sensor/encoder and other components. The rolling bearing to be detected at the drive end (model SKF6205-2RS) supports the rotating shaft of the motor. In the laboratory, small grooves with a diameter of 0.3556 mm and a depth of 0.279 mm were machined by electrical discharge machining on the outer ring, rolling element and inner ring of 3 rolling bearings, respectively, to simulate single-point cracks of the outer ring, rolling element and inner ring. Motor power and speed are measured by the torque sensor/encoder and recorded manually. An acceleration sensor is installed on the bearing seat at the drive end, and the fault-induced vibration acceleration signals of the rolling bearing monitored by the acceleration sensor under different working conditions (i.e., different rotating speeds and loads) are collected by a signal collector at a sampling frequency of 12 kHz. Every 1024 consecutive vibration acceleration data points are taken as one sample. The condition with a rotating speed of 1772 r/min and a load of 1 horsepower is recorded as working condition A; 1750 r/min and 2 horsepower as working condition B; and 1730 r/min and 3 horsepower as working condition C. About 110 fault samples each of the outer ring, rolling element and inner ring are collected under each working condition. Each sample is segmented and preprocessed to obtain a corresponding 32 × 32 matrix as an input sample of DCDATL.
2. DCDATL network structure design and parameter setting:
the DCDATL network structure is designed as follows:
(1) The first layer of the deep convolution residual feature extractor network is designed as an ordinary convolutional layer, the second layer as the 1st residual layer, the third layer as the 2nd residual layer, the fourth layer as the 3rd residual layer, the fifth layer as an average pooling layer, and the sixth layer as a fully-connected layer. The specific configuration is shown in table 1:
table 1 depth residual convolution feature extractor network structure table
In table 1, {3 × 3, 16; 3 × 3, 16} represents two weight layers connected to form a residual block, where 3 × 3 is the convolution kernel size of the weight layer and 16 is the number of feature maps output by the weight layer; {3 × 3, 16; 3 × 3, 16} × 3 indicates that one residual layer is composed of three such residual blocks.
(2) The structure of the label predictor is designed as shown in table 2: the first layer is a fully-connected layer connected to the deep residual convolution feature extractor, with a rectified linear unit (ReLU) activation function; the second layer is the softmax output layer of the label predictor, with a softmax activation function.
Table 2 label predictor network architecture table
(3) The structure of the domain classifier is designed as shown in table 3: the first layer is a fully-connected layer with a ReLU activation function, connected through the gradient inversion layer to the fully-connected layer of the deep residual convolution feature extractor; the second layer is the output layer of the domain classifier, with a sigmoid activation function.
TABLE 3 Domain classifier network architecture Table
The parameters of DCDATL are set as follows: the balance constraint parameter of the label predictor is α = 0.5; the domain-adaptive constraint parameter of the domain classifier is β = 1. To overcome the interference of noise signals in the early stage of DCDATL training, the hyper-parameter λ of the gradient inversion layer in the domain classifier is varied during training according to a schedule with data-sensitive parameters λ_0 = 1, γ = 2 and q = 0.6 ∈ (0, 1). The learning rate of the stochastic gradient descent algorithm is annealed with parameters η_0 = 0.02 and μ = 0.94, with a total number of training steps N = 2000, n = 1, 2, …, 2000.
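The exact schedule formulas are not legible in the source text; the sketch below uses the warm-up and annealing schedules commonly paired with such parameter names in domain adversarial training, and both formulas should be read as assumptions.

```python
import math

def grl_lambda(p, lam0=1.0, gamma=2.0):
    """Assumed warm-up for the GRL hyper-parameter: lambda grows
    smoothly from 0 to about lam0 as training progress p goes from
    0 to 1, suppressing noisy domain gradients early in training."""
    return lam0 * (2.0 / (1.0 + math.exp(-gamma * p)) - 1.0)

def learning_rate(p, eta0=0.02, mu=0.94, q=0.6):
    """Assumed annealing schedule for SGD: the rate decays from
    eta0 as training progress p increases."""
    return eta0 / (1.0 + mu * p) ** q
```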
3. Taking the sample under the working condition A as an auxiliary domain labeled sample, taking the sample under the working condition B as a target domain unlabeled sample (namely the current sample to be detected) to perform experiments and comparison:
Before the experiment, the number of labeled samples used as training samples in the auxiliary domain is kept fixed at 100 × 3 (3 is the number of fault types; the same below), and the maximum number of unlabeled samples used as training samples in the target domain is taken as 100 × 3; then 30 × 3 samples are randomly extracted from the 110 × 3 unlabeled samples in the target domain as the current samples to be detected. While changing the ratio of the number of auxiliary-domain training samples to the number of target-domain training samples (i.e., only gradually decreasing the number of unlabeled samples used as training samples in the target domain), the diagnosis accuracy and average diagnosis accuracy of the proposed DCDATL-based fault diagnosis method on the current samples to be detected of the three faults of bearing SKF6205-2RS are compared with a fault diagnosis method based on TCA and SVM (TCA + SVM) and a method based on TrAdaBoost; the comparison results are shown in figs. 5–8. Table 4 shows the fault diagnosis accuracy of the proposed method and the two compared methods when the ratio of the number of auxiliary-domain training samples to the number of target-domain training samples is r = 2.5.
Table 4 Fault diagnosis accuracy (%) when the ratio r of the number of auxiliary-domain training samples to the number of target-domain training samples is 2.5
As can be seen from figs. 5–8, as the number of target-domain training samples decreases (i.e., the ratio of auxiliary-domain to target-domain training samples increases), the diagnosis accuracy of all three methods gradually decreases: with fewer unlabeled samples serving as target-domain training samples, there are also fewer training samples with high inter-domain similarity and good transferability, so the training of the three algorithms becomes insufficient and their domain adaptability decreases to a certain extent. Nevertheless, the proposed DCDATL method always achieves higher fault diagnosis accuracy than the other two methods.
4. Taking the sample under the working condition C as an auxiliary domain labeled sample, taking the sample under the working condition B as a target domain unlabeled sample (namely the current sample to be detected) to perform experiment and comparison:
The experimental process is similar to that in section 3. The comparison results of the diagnosis accuracy and average diagnosis accuracy of the proposed DCDATL-based fault diagnosis method, the TCA + SVM-based method and the TrAdaBoost-based method on the current samples to be detected of the three faults of bearing SKF6205-2RS are shown in figs. 9–12.
The comparison result shows that the method still has higher fault diagnosis precision than the other two methods when the sample under the working condition C is used as the auxiliary domain labeled sample and the working condition B is used as the target domain unlabeled sample (namely the current sample to be detected).
5. Mixing samples under the A, C working condition to be used as auxiliary domain labeled samples, and using the B working condition samples as target domain unlabeled samples (namely, current samples to be detected) to perform experiments and comparison:
In this experiment, 50 × 3 labeled samples under working condition A and 50 × 3 labeled samples under working condition C are mixed as the auxiliary-domain training samples, so the total number of training samples, 100 × 3, remains unchanged; the maximum number of unlabeled samples used as training samples in the target domain is taken as 100 × 3; then 30 × 3 samples are randomly extracted from the 110 × 3 unlabeled samples in the target domain as the current samples to be detected. While changing the ratio of the number of auxiliary-domain training samples to the number of target-domain training samples (i.e., only gradually decreasing the number of unlabeled samples used as training samples in the target domain), the comparison results of the diagnosis accuracy and average diagnosis accuracy of the proposed method, the TCA + SVM-based method and the TrAdaBoost-based method on the current samples to be detected of the three bearing faults are shown in figs. 13–16.
The comparison result shows that in the case that the A, C working condition sample is mixed to serve as the auxiliary domain labeled sample and the B working condition sample serves as the target domain sample (i.e. the sample to be measured currently), the proposed method still has higher fault diagnosis precision than the other two methods, which shows that the proposed method has better robustness and practicability.
The invention has the beneficial effects that:
In the fault diagnosis method based on deep convolution domain adversarial transfer learning of the invention, high-level feature extraction in DCDATL is performed by the deep convolution residual feature extractor, which avoids gradient problems such as gradient vanishing and gradient divergence during DCDATL training and improves the convergence and nonlinear approximation capability of DCDATL. The feature joint distribution representation is obtained through the Kronecker product of the extracted high-level features and the label information and is embedded into the domain classifier; adversarial training between the auxiliary domain and the target domain is carried out in the domain classifier by means of gradient inversion. Performing domain adversarial training with the joint distribution of the auxiliary-domain labeled samples and the target-domain unlabeled samples enhances the domain adaptability of the auxiliary-domain labeled samples to the target domain, so the domain-invariant features of the auxiliary domain and the target domain are learned and the transfer performance of DCDATL is further improved. The feature transfer and classification process based on minimizing the joint distribution domain adversarial total loss function of DCDATL improves the classification accuracy after transfer. These advantages enable the DCDATL-based fault diagnosis method to perform high-precision fault diagnosis on the current sample to be detected of the rotating machinery by using the labeled samples under historical working conditions (i.e., the auxiliary-domain labeled samples), even when no labeled samples exist under the current working condition of the rotating machinery (i.e., no labeled samples in the target domain).

Claims (8)

1. The fault diagnosis method for the anti-migration learning based on the deep convolution domain is characterized by comprising the following steps of:
s1, performing segmented preprocessing on each rotating machine sample of the auxiliary domain and the target domain respectively to obtain corresponding preprocessing results;
s2, inputting the two preprocessing results serving as input samples into the deep convolution domain anti-migration learning model respectively to obtain a domain anti-loss function of a domain classifier in the deep convolution domain anti-migration learning model and a joint loss function of a label predictor;
s3, calculating a joint distribution domain confrontation total loss function of the deep convolution domain confrontation migration learning model based on the domain confrontation loss function of the domain classifier and the joint loss function of the label predictor, and training the deep convolution domain confrontation migration learning model;
s4, inputting the rotating machinery to-be-detected sample in the target domain into the trained deep convolution domain anti-migration learning model, and predicting the rotating machinery to-be-detected sample class label to realize rotating machinery fault diagnosis;
the step S2 specifically includes:
S21, extracting, by a deep convolutional residual feature extractor, the high-level features of the auxiliary-domain samples from the preprocessed auxiliary-domain rotating machinery samples and the high-level features of the target-domain samples from the preprocessed target-domain rotating machinery samples, and inputting the high-level features of both into the label predictor;
S22, training the label predictor with the high-level features of the auxiliary-domain samples input into it;
S23, processing the high-level features of the input target-domain samples with the trained label predictor to obtain the pseudo labels of the target-domain samples and the joint loss function of the label predictor;
S24, calculating the Kronecker product of the high-level features of the auxiliary-domain samples and their labels and the Kronecker product of the high-level features of the target-domain samples and their corresponding pseudo labels to obtain the joint feature distribution representation, and inputting it into the domain classifier;
S25, performing domain-adversarial training on the domain classifier with the joint feature distribution representation input into it, and obtaining the domain-adversarial loss function of the domain classifier;
in step S3, the joint-distribution domain-adversarial total loss function of the deep convolutional domain-adversarial transfer learning model is calculated, and the optimal parameters θ*_f, θ*_d and θ*_l that minimize the total loss value are found by the stochastic gradient descent algorithm, completing the training of the deep convolutional domain-adversarial transfer learning model;
the joint-distribution domain-adversarial total loss function L(θ_f, θ_d, θ_l) is:
L(θ_f, θ_d, θ_l) = L_l^s + α·L_l^t − β·L_d
where L_l^s + α·L_l^t is the joint loss function of the label predictor, L_d is the domain-adversarial loss function of the domain classifier, and α and β are balance constraint parameters that constrain the local optimization behavior of the label predictor and of the domain classifier, respectively.
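The total loss combination in claim 1 can be sketched as follows. The exact formula appears only as an image in the patent, so the DANN-style form below, with a supervised auxiliary-domain term, an entropy-based target-domain term, and a subtracted domain term, is an assumption, as are the toy loss values and balance parameters:

```python
def total_loss(loss_label_aux, loss_label_tgt, loss_domain, alpha, beta):
    """Assumed DANN-style joint-distribution total loss: the label-predictor
    terms are minimized while the domain-classifier loss enters with a minus
    sign, reflecting the adversarial objective enforced by gradient reversal."""
    return loss_label_aux + alpha * loss_label_tgt - beta * loss_domain

# toy component losses and balance constraint parameters (illustrative only)
value = total_loss(0.80, 0.20, 0.50, alpha=0.1, beta=0.3)  # 0.80 + 0.02 - 0.15
```

Minimizing this value over θ_f and θ_l while the gradient reversal layer flips the domain term's gradient gives the minimax behavior the claim describes.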
2. The fault diagnosis method based on deep convolutional domain-adversarial transfer learning of claim 1, wherein the segmented preprocessing in step S1 sequentially performs normalization and group reconstruction on each rotating machinery sample.
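A minimal sketch of such a preprocessing step. The claim fixes neither the normalization scheme nor the segment length, so min-max scaling and the reshape into equal-length segments below are assumptions:

```python
import numpy as np

def preprocess(signal, seg_len):
    # min-max normalization to [0, 1] (one plausible normalization choice)
    s = (signal - signal.min()) / (signal.max() - signal.min())
    # group reconstruction: cut the normalized 1-D signal into equal segments,
    # each segment becoming one input sample for the network
    n_seg = len(s) // seg_len
    return s[: n_seg * seg_len].reshape(n_seg, seg_len)

segments = preprocess(np.arange(10.0), seg_len=4)  # two segments of length 4
```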
3. The fault diagnosis method based on deep convolutional domain-adversarial transfer learning of claim 1, wherein the deep convolutional domain-adversarial transfer learning model in step S2 includes a deep convolutional residual feature extractor, a label predictor and a domain classifier;
the deep convolutional residual feature extractor is connected to the label predictor and to the domain classifier, and the label predictor is connected to the domain classifier;
the deep convolutional residual feature extractor comprises a first feature extraction unit and a second feature extraction unit; the first feature extraction unit comprises a first convolutional layer, a first residual layer, a first average pooling layer and a first fully connected layer connected in sequence, and the second feature extraction unit comprises a second convolutional layer, a second residual layer, a second average pooling layer and a second fully connected layer connected in sequence;
the first residual layer and the second residual layer each comprise a plurality of stacked residual blocks; each residual block comprises a first weight layer, a first ReLU activation function, a second weight layer, an accumulator and a second ReLU activation function connected in sequence, the input of the first weight layer is also connected to the input of the accumulator, and the output of the second ReLU activation function is connected to the input of the next residual block, the input of the first average pooling layer, or the input of the second average pooling layer;
the label predictor comprises a first label prediction unit and a second label prediction unit which share parameters; the first label prediction unit comprises a third fully connected layer, a fourth fully connected layer and a first softmax activation function connected in sequence, and the second label prediction unit comprises a fifth fully connected layer, a sixth fully connected layer and a second softmax activation function connected in sequence;
the domain classifier comprises a gradient reversal layer; the first output of the gradient reversal layer, a seventh fully connected layer, an eighth fully connected layer and a first sigmoid activation function are connected in sequence, and the second output of the gradient reversal layer, a ninth fully connected layer, a tenth fully connected layer and a second sigmoid activation function are connected in sequence;
the output of the first fully connected layer is connected to the input of the third fully connected layer, to the output of the first softmax activation function and to the first input of the gradient reversal layer, respectively, and the output of the second fully connected layer is connected to the input of the fifth fully connected layer and to the second input of the gradient reversal layer, respectively.
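The wiring described in claim 3 can be sketched with single weight matrices standing in for the convolutional/residual stacks; the layer sizes and the three-class output below are illustrative assumptions. The point is only the topology: the shared fully connected feature output feeds both the softmax label-predictor head and the sigmoid domain-classifier head.

```python
import numpy as np

rng = np.random.default_rng(0)
W_feat = rng.standard_normal((8, 4))    # stand-in for conv + residual + pooling + FC
W_label = rng.standard_normal((4, 3))   # label-predictor head (3 fault classes, illustrative)
W_dom = rng.standard_normal((4, 1))     # domain-classifier head

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

x = rng.standard_normal(8)              # one preprocessed sample
f = np.tanh(x @ W_feat)                 # shared high-level feature
y_prob = softmax(f @ W_label)           # class posterior (softmax activation)
d_logit = (f @ W_dom).item()
d_prob = 1.0 / (1.0 + np.exp(-d_logit)) # domain posterior (sigmoid activation)
```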
4. The fault diagnosis method based on deep convolutional domain-adversarial transfer learning of claim 1, wherein the step S21 specifically comprises:
A1, denoting the auxiliary-domain rotating machinery sample input to the deep convolutional residual feature extractor by x_i^s and the target-domain rotating machinery sample by x_i^t, the outputs of the first convolutional layer and the second convolutional layer are x_s(l) and x_t(l), which serve as the inputs of the first residual layer and the second residual layer, respectively;
A2, with the number of residual blocks in the first residual layer set to L, the output x_s(L+1) of the first residual layer is:
x_s(L+1) = x_s(l) + Σ_{i=l}^{L} F(x_s(i), W(i))
where F(·) is the residual function; W(i) are the weights of the i-th residual block in the first residual layer, i = l, l = 1, 2, 3, …, L, l being the index of the residual block;
A3, sequentially applying average pooling and fully connected processing to the output of the first residual layer through the first average pooling layer and the first fully connected layer, the high-level feature f_s of the auxiliary-domain sample output by the first fully connected layer is:
f_s = G_f(x_i^s; θ_f)
where G_f(·) is the overall mapping function of the deep convolutional residual feature extractor, and θ_f is the set of weight parameters and bias coefficients of the deep convolutional residual feature extractor;
A4, on the basis of steps A2–A3, the high-level feature f_t of the target-domain sample output by the second fully connected layer is obtained as:
f_t = G_f(x_i^t; θ_f).
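The stacked residual blocks of step A2 can be sketched in NumPy, with small random matrices standing in for the weight layers (sizes and weight scale are illustrative assumptions). The skip connection means each block adds a residual function on top of its input, which is what the claim's unrolled sum expresses before the final activations:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, W1, W2):
    # weight layer -> ReLU -> weight layer, then the accumulator (skip
    # connection) and the final ReLU, as in claim 3's residual block
    return relu(x + relu(x @ W1) @ W2)

def residual_layer(x, blocks):
    # stack the residual blocks: each block's output feeds the next one
    for W1, W2 in blocks:
        x = residual_block(x, W1, W2)
    return x

rng = np.random.default_rng(1)
blocks = [(0.1 * rng.standard_normal((4, 4)), 0.1 * rng.standard_normal((4, 4)))
          for _ in range(3)]
out = residual_layer(rng.standard_normal(4), blocks)
```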
5. The fault diagnosis method based on deep convolutional domain-adversarial transfer learning of claim 4, wherein the joint loss function of the label predictor in step S23 comprises the auxiliary-domain label predictor loss function L_l^s and the target-domain label predictor loss function L_l^t;
the auxiliary-domain label predictor loss function L_l^s is:
L_l^s = (1/N_S) Σ_{i=1}^{N_S} L_y(G_s(G_f(x_i^s; θ_f); θ_l), y_i^s)
where N_S is the total number of original rotating machinery samples in the auxiliary domain; L_y(·) is the logarithmic loss function; G_s(·) is the overall mapping function of the label predictor; θ_l is the set of weight parameters and bias coefficients of each layer of the label predictor; and y_i^s is the label of the auxiliary-domain rotating machinery sample;
the target-domain label predictor loss function L_l^t is:
L_l^t = (1/N_t) Σ_{i=1}^{N_t} Σ_{c=1}^{C} (−ŷ_{i,c}^t log ŷ_{i,c}^t)
where N_t is the total number of original rotating machinery samples in the target domain; C is the total number of categories of target-domain rotating machinery samples; −Σ_c ŷ_{i,c}^t log ŷ_{i,c}^t is the class-conditional distribution entropy; and ŷ_i^t is the pseudo label of the target-domain rotating machinery sample, where ŷ_i^t = G_s(G_f(x_i^t; θ_f); θ_l).
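The two loss terms of claim 5 can be sketched on a single softmax output vector (the vector below is illustrative): the logarithmic loss is the standard cross-entropy against the true label, and the class-conditional entropy is driven down so that target-domain predictions, and hence the pseudo labels, become confident:

```python
import numpy as np

def log_loss(prob, true_class):
    # auxiliary-domain supervised term: logarithmic (cross-entropy) loss
    return -np.log(prob[true_class] + 1e-12)

def conditional_entropy(prob):
    # target-domain term: class-conditional distribution entropy of the prediction
    return float(-np.sum(prob * np.log(prob + 1e-12)))

p = np.array([0.7, 0.2, 0.1])        # softmax output for one sample (illustrative)
supervised = log_loss(p, 0)          # loss if the true class is 0
ent = conditional_entropy(p)         # entropy term for an unlabeled target sample
pseudo_label = int(np.argmax(p))     # pseudo label = arg max of the posterior
```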
6. The fault diagnosis method based on deep convolutional domain-adversarial transfer learning of claim 5, wherein the step S25 specifically comprises:
inputting the Kronecker product f_s ⊗ y_i^s of the high-level features of the auxiliary-domain samples and their labels, together with the Kronecker product f_t ⊗ ŷ_i^t of the high-level features of the target-domain samples and their pseudo labels, into the domain classifier as the joint feature distribution representation; forming an adversarial network from the deep convolutional residual feature extractor and the domain classifier through the gradient reversal layer; and reducing the joint-distribution difference between the high-level features of the auxiliary-domain samples and those of the target-domain samples by domain-adversarial training, thereby obtaining the domain-adversarial loss function of the domain classifier;
where ⊗ is the Kronecker product operator.
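For a length-m feature vector and a length-n label vector, the Kronecker product of step S25 multiplies every feature component by every label component, giving a length m·n joint representation; a small numeric check with illustrative vectors:

```python
import numpy as np

f_s = np.array([0.5, -1.0, 2.0])   # high-level feature vector (illustrative)
y_s = np.array([0.0, 1.0])         # one-hot label (illustrative)

# feature-label joint distribution representation fed to the domain classifier;
# for vectors, np.kron equals the flattened outer product
joint = np.kron(f_s, y_s)
```

With a one-hot label, the product effectively routes the feature vector into the slice belonging to its class, which is how label information conditions the domain classifier.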
7. The fault diagnosis method based on deep convolutional domain-adversarial transfer learning of claim 6, wherein in step S25, during training of the domain-adversarial network:
owing to the gradient reversal layer, the error between the predicted value and the true value is propagated backwards through the domain classifier, and each layer of the network in the domain classifier computes its gradient from the propagated error;
when the error reaches the gradient reversal layer, it is multiplied by the parameter −λ and then passed on to the connected deep convolutional residual feature extractor, so that the training objectives of the deep convolutional residual feature extractor and of the domain classifier are opposed, realizing the domain-adversarial training;
in determining the domain-adversarial loss function of the domain classifier:
the gradient reversal layer in the domain classifier is defined as:
g(f(v), λ) = f(v)
where g(·) is the gradient-reversal-layer mapping function, v is the independent variable input to the gradient reversal layer, and λ is the gradient-reversal-layer hyperparameter;
the gradient of the gradient reversal layer in the domain classifier is obtained as:
∂g(f(v), λ)/∂v = −λ · ∂f(v)/∂v;
due to the presence of the gradient reversal layer, the parameter set θ_f of the deep convolutional residual feature extractor and the parameter set θ_d of the domain classifier are updated by the standard back-propagation algorithm, and the auxiliary-domain class label d̂_i^s and the target-domain class label d̂_i^t output by the domain classifier are obtained as:
d̂_i^s = G_d(f_s ⊗ y_i^s; θ_d)
d̂_i^t = G_d(f_t ⊗ ŷ_i^t; θ_d)
where d̂_i^s and d̂_i^t are the class labels output by the domain classifier when predicting on the auxiliary domain and on the target domain, respectively, and G_d(·) is the domain classifier mapping function;
with the original auxiliary-domain rotating machinery samples denoted x_i^s and the original target-domain rotating machinery samples denoted x_i^t, the domain-adversarial loss function L_d of the domain classifier is derived as:
L_d = (1/N_S) Σ_{i=1}^{N_S} L_d^s(d̂_i^s, d_i) + (1/N_t) Σ_{i=1}^{N_t} L_d^t(d̂_i^t, d_i)
where L_d^s is the auxiliary-domain sample domain-classifier loss function; L_d^t is the target-domain sample domain-classifier loss function; N_S is the total number of original auxiliary-domain rotating machinery samples; N_t is the total number of original target-domain rotating machinery samples; and d_i is the decision variable of the domain class label, with d_i = 0 when it is the auxiliary-domain class label and d_i = 1 when it is the target-domain class label.
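The gradient reversal layer of claim 7 can be sketched as a pair of functions, identity in the forward pass and multiplication of the incoming gradient by −λ in the backward pass; this is a NumPy stand-in for the custom autograd hook a deep-learning framework would use:

```python
import numpy as np

def grl_forward(v):
    # forward pass is the identity: g(f(v), lam) = f(v)
    return v

def grl_backward(upstream_grad, lam):
    # backward pass flips and scales the gradient, so the feature extractor
    # ascends the domain-classifier loss while the classifier descends it
    return -lam * upstream_grad

v = np.array([1.0, -2.0])
fwd = grl_forward(v)
bwd = grl_backward(np.array([0.3, -0.4]), lam=1.0)
```

Because only the backward pass is altered, a single back-propagation sweep updates θ_d to minimize the domain loss while pushing θ_f to maximize it, which is the opposed-objectives training the claim describes.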
8. The fault diagnosis method based on deep convolutional domain-adversarial transfer learning of claim 1, wherein in step S4, the rotating machinery sample to be tested in the target domain is input into the trained deep convolutional domain-adversarial transfer learning model, and the class label of the sample to be tested is predicted to obtain the class prediction label y_i^t, realizing fault diagnosis of the rotating machine;
the expression of the class prediction label y_i^t is:
y_i^t = arg max_c G_s(G_f(x_i^t; θ_f); θ_l).
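Prediction in step S4 reduces to taking the arg max of the label predictor's softmax output for each target-domain test sample; a toy sketch with an illustrative (assumed) weight matrix standing in for the trained model:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def predict_label(feature, W_label):
    # class prediction label: index of the largest softmax component
    return int(np.argmax(softmax(feature @ W_label)))

W = np.array([[2.0, 0.0],
              [0.0, 1.0]])
label = predict_label(np.array([1.0, 1.0]), W)  # logits [2, 1] -> class 0
```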
CN201910991173.7A 2019-10-18 2019-10-18 Fault diagnosis method for anti-migration learning based on deep convolution domain Active CN110751207B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910991173.7A CN110751207B (en) 2019-10-18 2019-10-18 Fault diagnosis method for anti-migration learning based on deep convolution domain

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910991173.7A CN110751207B (en) 2019-10-18 2019-10-18 Fault diagnosis method for anti-migration learning based on deep convolution domain

Publications (2)

Publication Number Publication Date
CN110751207A CN110751207A (en) 2020-02-04
CN110751207B true CN110751207B (en) 2022-08-05

Family

ID=69278749

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910991173.7A Active CN110751207B (en) 2019-10-18 2019-10-18 Fault diagnosis method for anti-migration learning based on deep convolution domain

Country Status (1)

Country Link
CN (1) CN110751207B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111353548B (en) * 2020-03-11 2020-10-20 中国人民解放军军事科学院国防科技创新研究院 Robust feature deep learning method based on confrontation space transformation network
CN111504644B (en) * 2020-04-01 2021-04-09 西安交通大学 Bearing fault diagnosis method for adaptive anti-domain based on embedding distinguishability condition
CN111598161A (en) * 2020-05-14 2020-08-28 哈尔滨工业大学(威海) Engine gas circuit state diagnosis system based on CNN transfer learning
CN111860677B (en) * 2020-07-29 2023-11-21 湖南科技大学 Rolling bearing migration learning fault diagnosis method based on partial domain countermeasure
CN112434602B (en) * 2020-11-23 2023-08-29 西安交通大学 Fault diagnosis method based on movable common feature space mining
CN112418175A (en) * 2020-12-09 2021-02-26 北京信息科技大学 Rolling bearing fault diagnosis method and system based on domain migration and storage medium
CN112733900B (en) * 2020-12-31 2024-01-23 华北电力大学 Boiler combustion state stability judging method based on deep migration learning
CN113076834B (en) * 2021-03-25 2022-05-13 华中科技大学 Rotating machine fault information processing method, processing system, processing terminal, and medium
CN113554075A (en) * 2021-07-12 2021-10-26 安徽容知日新科技股份有限公司 Method for generating fault classification model and method for determining fault type of rotating equipment
CN113821012B (en) * 2021-08-30 2023-12-29 南京航空航天大学 Fault diagnosis method for variable-working-condition satellite attitude control system
CN114118140B (en) * 2021-10-29 2022-09-23 新黎明科技股份有限公司 Multi-view intelligent fault diagnosis method and system for explosion-proof motor bearing
CN114021285B (en) * 2021-11-17 2024-04-12 上海大学 Rotary machine fault diagnosis method based on mutual local countermeasure migration learning
CN114202028B (en) * 2021-12-13 2023-04-28 四川大学 MAMTL-based rolling bearing life stage identification method
CN115964661B (en) * 2023-01-04 2023-09-08 东北林业大学 Rotary machine fault diagnosis method and system based on domain-by-domain network
CN116630679A (en) * 2023-03-29 2023-08-22 南通大学 Osteoporosis identification method based on CT image and domain invariant feature

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107958287A (en) * 2017-11-23 2018-04-24 清华大学 Towards the confrontation transfer learning method and system of big data analysis transboundary
CN109766921A (en) * 2018-12-19 2019-05-17 合肥工业大学 A kind of vibration data Fault Classification based on depth domain-adaptive
CN109902393A (en) * 2019-03-01 2019-06-18 哈尔滨理工大学 Fault Diagnosis of Roller Bearings under a kind of variable working condition based on further feature and transfer learning
CN109947086A (en) * 2019-04-11 2019-06-28 清华大学 Mechanical breakdown migration diagnostic method and system based on confrontation study
CN110186680A (en) * 2019-05-30 2019-08-30 盐城工学院 A kind of confrontation differentiation domain adapts to one-dimensional convolutional neural networks intelligent failure diagnosis method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6435581B2 (en) * 2015-01-29 2018-12-12 パナソニックIpマネジメント株式会社 Transfer learning device, transfer learning system, transfer learning method and program


Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
A transfer convolutional neural network for fault diagnosis based on ResNet-50;Long Wen 等;《Neural Computing and Applications》;20190126;全文 *
Adversarial adaptive 1-D convolutional neural networks for bearing fault diagnosis under varying working condition;Bo Zhang 等;《https://arxiv.org/pdf/1805.00778v1.pdf》;20180501;全文 *
Deep Adversarial Domain Adaptation Model for Bearing Fault Diagnosis;Zhao-Hua Liu 等;《IEEE Transactions on Systems, Man, and Cybernetics: Systems》;20190819;全文 *
Deep Convolutional Transfer Learning Network: A New Method for Intelligent Fault Diagnosis of Machines With Unlabeled Data;Liang Guo 等;《IEEE Transactions on Industrial Electronics》;20181026;全文 *
Network Transfer Learning via Adversarial Domain Adaptation with Graph Convolution;Quanyu Dai 等;《https://arxiv.org/pdf/1909.01541v1.pdf》;20190904;全文 *
Transfer learning with neural networks for bearing fault diagnosis in changing working conditions;Ran Zhang 等;《IEEE Access》;20170628;全文 *
Research on transfer learning algorithms based on generative adversarial networks; Zang Wenhua; China Masters' Theses Full-text Database (Information Science and Technology); 20180915; full text *
Deep transfer diagnosis method for machinery equipment faults under big data; Lei Yaguo et al.; Journal of Mechanical Engineering; 20190125 (No. 07); full text *

Also Published As

Publication number Publication date
CN110751207A (en) 2020-02-04

Similar Documents

Publication Publication Date Title
CN110751207B (en) Fault diagnosis method for anti-migration learning based on deep convolution domain
CN111651937B (en) Method for diagnosing faults of in-class self-adaptive bearing under variable working conditions
CN110361176B (en) Intelligent fault diagnosis method based on multitask feature sharing neural network
CN112629863B (en) Bearing fault diagnosis method for dynamic joint distribution alignment network under variable working conditions
CN109918752B (en) Mechanical fault diagnosis method, equipment and medium based on migration convolution neural network
CN111709448A (en) Mechanical fault diagnosis method based on migration relation network
CN112308147B (en) Rotary machinery fault diagnosis method based on multi-source domain anchor adapter integrated migration
CN113375941B (en) Open-set fault diagnosis method for bearings of high-speed motor train unit
CN113567130A (en) Bearing fault diagnosis method based on multiple working conditions of equipment
CN113792758B (en) Rolling bearing fault diagnosis method based on self-supervision learning and clustering
CN112132102B (en) Intelligent fault diagnosis method combining deep neural network with artificial bee colony optimization
CN113887136B (en) Electric automobile motor bearing fault diagnosis method based on improved GAN and ResNet
CN115127814A (en) Unsupervised bearing fault diagnosis method based on self-adaptive residual error countermeasure network
CN113505664B (en) Fault diagnosis method for planetary gear box of wind turbine generator
CN116793682A (en) Bearing fault diagnosis method based on iCORAL-MMD and anti-migration learning
CN115374820A (en) Rotary machine cross-domain fault diagnosis method based on multi-source sub-domain adaptive network
CN114970628B (en) Rotating part fault diagnosis method based on generation countermeasure network under condition of unbalanced samples
Liu et al. Rolling bearing fault diagnosis of train running gear based on optimized deep residual network
Han et al. A novel intelligent fault diagnosis method based on dual convolutional neural network with multi-level information fusion
CN113530850B (en) Centrifugal pump fault diagnosis method based on ESA and stacked capsule self-encoder
Ghazaly et al. Prediction of misfire location for SI engine by unsupervised vibration algorithm
Fu et al. Development of deep convolutional neural network with adaptive batch normalization algorithm for bearing fault diagnosis
CN116244640A (en) Unbalanced bearing fault diagnosis method and system based on federal learning
CN115795397A (en) Gearbox temperature prediction method based on 1DCNN-LSTM and BiLSTM parallel network
CN115587289A (en) Multi-source domain adaptive network-based aeroengine fault identification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant