CN116992953B - Model training method, fault diagnosis method and device


Info

Publication number: CN116992953B
Authority: CN (China)
Prior art keywords: domain, domain sample, data, sample, source domain
Legal status: Active
Application number: CN202311253060.XA
Other languages: Chinese (zh)
Other versions: CN116992953A (en)
Inventors: 柳雅倩, 王建国, 蔡浩原, 廖建辉
Current Assignee: Suzhou Geniitek Sensor Co ltd
Original Assignee: Suzhou Geniitek Sensor Co ltd
Application filed by Suzhou Geniitek Sensor Co ltd; application granted; publication of CN116992953A and CN116992953B.


Classifications

    • G06N3/094 Adversarial learning
    • G06F18/24 Classification techniques
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The application discloses a model training method, a fault diagnosis method and a device. The model training method comprises: preprocessing pre-acquired source domain samples and target domain samples and inputting them into a pre-constructed domain-adversarial neural network model; extracting features of the source domain samples and target domain samples with the domain-adversarial neural network to obtain feature data; performing class prediction on the source domain samples according to the feature data, performing domain prediction on the source domain samples and target domain samples, and calculating a total model loss from the prediction results and the feature data; and updating the parameters of the pre-constructed domain-adversarial neural network model according to the total model loss until a predetermined update stop condition is met, to obtain the target domain-adversarial neural network model. The application can align the features of similar fault data across different domains (distributions), avoid gradient vanishing, improve the whole training process, and ultimately greatly improve the performance of the fault diagnosis model.

Description

Model training method, fault diagnosis method and device
Technical Field
The present application relates to the field of fault diagnosis technologies, and in particular, to a model training method, a fault diagnosis method, and a device.
Background
In the field of industrial production, fault diagnosis of equipment is of great importance. Some equipment (such as motors) often operates under various loads, which makes fault diagnosis under different loads challenging. Equipment fault diagnosis based on deep learning is a current research hotspot, but deep learning models are usually trained on equipment vibration data collected under a single load (e.g., the same rotating speed and the same load). On the one hand, such models do not generalize well in engineering practice: the amplitude characteristics of vibration signals change with the load, causing dynamic changes in the fault features. On the other hand, a single feature extraction method can hardly cope with the distribution differences of fault data across loads. Therefore, a deep-learning-based model can achieve a high recognition rate under different load conditions only if labeled vibration data of the motor running under the various loads is collected; in reality, data is often abundant but labels are scarce, so deep-learning-based fault diagnosis methods have great limitations in practical applications.
In view of the above problems, transfer learning is the most representative of current research schemes. Transfer learning is a method of transferring knowledge learned in one domain to another. In recent years, transfer learning methods based on domain adaptation (Domain Adaptation, DA) have been increasingly applied to equipment fault diagnosis. How to effectively extract the features of the source domain and the target domain, measure the distance between them, and optimize that distance is the key point and difficulty of DA-based fault diagnosis.
At present, besides transfer learning methods that align features by reducing the inter-domain distribution distance, adding adversarial training to a DA algorithm is also an important means of optimizing a model. Inspired by the generative adversarial network, the domain-adversarial neural network (Domain-Adversarial Neural Network, DANN) learns a domain-invariant feature representation through adversarial learning: it adaptively adjusts the weights of the deep learning model and separates the feature representation of a signal from its domain information, so that the signal maintains consistent features across different data domains. To better align feature distributions and improve network performance, a multi-layer MMD (Maximum Mean Discrepancy) can be adopted to reduce the distribution difference between the source domain and the target domain, and embedded into a CNN-based DANN, so that the high-quality features extracted by convolution are fully utilized and model diagnosis performance is improved. However, this approach imposes strict requirements on the feature-distribution-distance algorithm: once the data volume of the source domain is too large, or the target domain distribution is far from the source domain distribution, problems such as gradient vanishing arise, training of the diagnosis model cannot be completed effectively, and motor faults under variable-load conditions cannot be effectively identified.
Therefore, a solution that avoids gradient vanishing and effectively completes the training of the diagnosis model is needed for motor fault diagnosis.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a model training method, a fault diagnosis method, an apparatus, a computer device and a storage medium, so as to solve the problem that feature extraction and effective identification across different domains cannot be achieved in variable-load equipment fault diagnosis in the prior art.
In order to solve one or more of the technical problems, the application adopts the following technical scheme:
in a first aspect, a model training method is provided, the method comprising:
preprocessing pre-acquired source domain samples and target domain samples, and inputting the preprocessed samples into a pre-constructed domain-adversarial neural network model;
extracting features of the source domain samples and the target domain samples with the domain-adversarial neural network to obtain feature data; performing class prediction on the source domain samples according to the feature data and performing domain prediction on the source domain samples and the target domain samples to obtain prediction results; and calculating a total model loss from the prediction results and the feature data, wherein the total model loss comprises a classification loss, an adversarial loss and a distribution loss, the distribution loss being determined from the earth mover's (Wasserstein) distance between the feature data of the source domain samples and the feature data of the target domain samples;
and updating the parameters of the pre-constructed domain-adversarial neural network model according to the total model loss until a predetermined update stop condition is met, to obtain a target domain-adversarial neural network model.
In a specific embodiment, the domain-adversarial neural network model comprises a feature extractor, a label classifier and a domain discriminator;
the feature extractor is used for extracting feature data of the source domain samples and the target domain samples, and comprises at least two one-dimensional convolution layers and a global average pooling layer arranged in sequence;
the one-dimensional convolution layers are used for performing one-dimensional convolution on the data in the source domain samples and the target domain samples to obtain initial features, and the global average pooling layer is used for reducing the dimension of the initial features to obtain the feature data;
the label classifier is used for performing class prediction on the source domain samples according to their feature data;
the domain discriminator is used for performing domain prediction on the source domain samples and the target domain samples according to their feature data.
In a specific embodiment, the feature extractor further comprises an input layer and a flattening layer, arranged in sequence as: the input layer, the two one-dimensional convolution layers, the global average pooling layer and the flattening layer.
In a specific embodiment, the label classifier comprises at least two fully connected layers, with a random deactivation (Dropout) layer disposed between two adjacent fully connected layers.
In a specific embodiment, the earth mover's (Wasserstein) distance is calculated as:

$$W(P_s, P_t) = \inf_{\gamma \in \Pi(P_s, P_t)} \mathbb{E}_{(x_s, x_t) \sim \gamma}\left[\, d(x_s, x_t) \,\right]$$

wherein $\gamma$ is a joint probability distribution, $P_s$ and $P_t$ are two probability distributions over a given metric space, $\Pi(P_s, P_t)$ represents the set of all possible joint probability distributions combining $P_s$ and $P_t$, $(x_s, x_t)$ are source domain and target domain samples drawn from $\gamma$, $d(x_s, x_t)$ represents the distance between $x_s$ and $x_t$, and $\mathbb{E}_{(x_s, x_t) \sim \gamma}[\cdot]$ is the expected value of the sample-pair distance under the joint probability distribution $\gamma$.
In a specific embodiment, the total model loss is calculated as a weighted sum of the classification loss, the adversarial loss and the distribution loss;
wherein the classification loss is determined from the feature data of the source domain samples;
and/or, the adversarial loss is determined from the feature data of the source domain samples and the target domain samples.
In a specific embodiment, the classification loss is calculated as:

$$L_c(\theta_f, \theta_y) = -\,\mathbb{E}_{(x_s, y_s) \sim D_s}\big[\, y_s \log G_y\big(G_f(x_s; \theta_f); \theta_y\big) \,\big]$$

wherein $x_s$ and $y_s$ are the data of a source domain sample and its corresponding label, $G_f(\cdot)$ denotes the output of the feature extractor, $\mathbb{E}_{(x_s, y_s) \sim D_s}$ denotes the expectation over the labeled data of the source domain dataset, $D_s$ denotes the distribution of fault source domain samples, $G_y(\cdot)$ denotes the output of the label classifier, $\theta_f$ is the weight parameter of the feature extractor, and $\theta_y$ is the weight parameter of the label classifier;
and/or, the adversarial loss is calculated as:

$$L_{adv}(\theta_f, \theta_d) = -\,\mathbb{E}_{x_s \sim D_s}\big[\log G_d\big(G_f(x_s)\big)\big] - \mathbb{E}_{x_t \sim D_t}\big[\log\big(1 - G_d\big(G_f(x_t)\big)\big)\big]$$

wherein $x_s$ and $y_s$ are the data of a source domain sample and its corresponding label, $\mathbb{E}_{x_s \sim D_s}$ denotes the expectation over the labeled data of the source domain dataset, $G_d(\cdot)$ is the output of the domain discriminator, $G_f(\cdot)$ denotes the output of the feature extractor, and $x_t$ is the data of a target domain sample;
and/or, the distribution loss is calculated as:

$$L_{wd} = \sup_{\|f\|_L \le 1} \; \mathbb{E}_{x_s \sim P_s}\big[f\big(G_f(x_s)\big)\big] - \mathbb{E}_{x_t \sim P_t}\big[f\big(G_f(x_t)\big)\big]$$

wherein $P_s$ and $P_t$ are two probability distributions over a given metric space, $\mathbb{E}_{x_s \sim P_s}$ denotes the expectation over the unlabeled data of the source domain samples, $\mathbb{E}_{x_t \sim P_t}$ denotes the expectation over the unlabeled data of the target domain samples, and $\sup$ refers to the upper bound over all functions $f$ satisfying the 1-Lipschitz condition $\|f\|_L \le 1$.
In a specific embodiment, the feature extractor is connected to the domain discriminator through a gradient reversal layer; the input of the gradient reversal layer is the feature data, and its output is a feature $\tilde{f}$ with the same dimension as the feature data, expressed as:

$$\tilde{f} = R_\lambda(f), \qquad R_\lambda(f) = f, \qquad \frac{\partial R_\lambda}{\partial f} = -\lambda I$$

so that during backpropagation the gradient passed to the feature extractor becomes $-\lambda\, \partial L_{adv} / \partial \theta_f$, wherein $f$ represents the feature data extracted by the feature extractor, $\partial L_{adv} / \partial \theta_f$ represents the gradient of the adversarial loss with respect to the feature extractor parameters, and $\lambda$ is a hyperparameter of the gradient reversal layer.
In a specific embodiment, the acquisition sources of the source domain samples and the target domain samples comprise faulty motors under different working conditions.
In a specific embodiment, the different working conditions include different loads of the faulty motor;
or, the source domain samples and the target domain samples comprise vibration signals acquired from faulty motors under different working conditions.
In a specific embodiment, the source domain samples and the target domain samples are time-series signals.
In a specific embodiment, preprocessing the pre-acquired source domain samples and target domain samples comprises:
performing a Fourier transform on the source domain samples and the target domain samples.
In a specific embodiment, the predetermined update stop condition includes at least one of: the domain-adversarial neural network model reaching a preset number of iterations; the error rate between the labels predicted by the label classifier from the feature data and the true labels of the data in the source domain samples being smaller than a first set value; or the accuracy of the domain prediction performed on the feature data by the domain discriminator being larger than a second set value.
In a second aspect, there is also provided a fault diagnosis method based on the target domain-adversarial neural network model trained by the model training method described above, the method comprising:
obtaining measured data of equipment to be diagnosed, wherein the measured data comprises vibration signals of the equipment to be diagnosed;
and inputting the measured data into the target domain-adversarial neural network model for identification and classification, to obtain a diagnosis result.
In a third aspect, there is also provided a model training apparatus, the apparatus comprising:
a processing module, used for preprocessing pre-acquired source domain samples and target domain samples and inputting them into a pre-constructed domain-adversarial neural network model; extracting features of the source domain samples and target domain samples with the domain-adversarial neural network to obtain feature data; performing class prediction on the source domain samples and domain prediction on the source domain samples and target domain samples according to the feature data; and calculating a total model loss from the prediction results and the feature data, wherein the total model loss comprises a classification loss, an adversarial loss and a distribution loss, the distribution loss being determined from the earth mover's (Wasserstein) distance between the feature data of the source domain samples and that of the target domain samples;
and an updating module, used for updating the parameters of the pre-constructed domain-adversarial neural network model according to the total model loss until a predetermined update stop condition is met, to obtain the target domain-adversarial neural network model.
In a fourth aspect, there is also provided a fault diagnosis apparatus, the apparatus comprising:
a data acquisition module, used for acquiring measured data of equipment to be diagnosed, wherein the measured data comprises vibration signals of the equipment to be diagnosed;
and a fault diagnosis module, used for inputting the measured data into a pre-trained target domain-adversarial neural network model for identification and classification, to obtain a diagnosis result.
In a fifth aspect, there is also provided a computer device comprising a memory and a processor, the memory having stored thereon a computer program executable on the processor, the computer program, when executed by the processor, implementing the model training method.
In a sixth aspect, there is also provided a computer readable storage medium having a computer program stored therein, which when executed, implements the model training method.
According to the specific embodiments provided by the application, the application discloses the following technical effects:
The application provides a model training method, a fault diagnosis method, an apparatus, a computer device and a storage medium. The model training method comprises: preprocessing pre-acquired source domain samples and target domain samples and inputting them into a pre-constructed domain-adversarial neural network model; extracting features of the source domain samples and target domain samples with the domain-adversarial neural network to obtain feature data; performing class prediction on the source domain samples according to the feature data and domain prediction on the source domain samples and target domain samples to obtain prediction results, and calculating a total model loss from the prediction results and the feature data, wherein the total model loss comprises a classification loss, an adversarial loss and a distribution loss, the distribution loss being determined from the earth mover's (Wasserstein) distance between the feature data of the source domain samples and that of the target domain samples; and updating the parameters of the pre-constructed domain-adversarial neural network model according to the total model loss until a predetermined update stop condition is met, to obtain the trained target domain-adversarial neural network model. By fusing the Wasserstein (earth mover's) distance with the DANN to obtain the total loss function, the distance between the source domain and the target domain can still be reflected even when the two distributions overlap very little or not at all, and one distribution can be continuously transformed into the other while the geometric morphology of the distributions is preserved. The features of similar fault data in different domains (distributions) are thus aligned, gradient vanishing is avoided, the whole training process is improved, and the performance of the fault diagnosis model is ultimately greatly improved;
Further, the model training method, fault diagnosis method, apparatus, computer device and storage medium improve the feature extraction capability of the feature extractor by adding one-dimensional convolution layers and a global average pooling layer, which greatly enhances the model's ability to extract features from time-series signals while reducing model complexity and improving computational efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a model training method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of a domain countermeasure neural network model provided by an embodiment of the present application;
FIG. 3 is a flowchart of a loss optimization method for fusing Wasserstein distances provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of the variation of the hyperparameter λ during the training process;
FIG. 5 is a schematic diagram of the variation of the learning rate during the training process;
FIG. 6 is a schematic diagram of the probability distribution of the spectrum under different loads with the DC component removed;
FIGS. 7a to 7d are confusion matrices showing the diagnosis results of the four working-condition migration experiments on fault data;
FIG. 8 is a flow chart of a fault diagnosis method provided by an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a model training apparatus according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a fault diagnosis apparatus according to an embodiment of the present application;
fig. 11 is a schematic diagram of a computer device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
As described in the background, during the training of a domain-adversarial neural network model in the prior art, when the data volume of the source domain is too large, or the data distribution of the target domain differs greatly from that of the source domain, the real data distribution and the generated data distribution overlap very little at the beginning of training. This causes the gradient to become very small and finally vanish, so the training of the model cannot be completed effectively and equipment faults under variable-load conditions cannot be identified effectively.
In order to solve one or more of the above technical problems, the embodiments of the present application provide a new model training method, fault diagnosis method, apparatus, computer device and storage medium. Feature extraction is performed on the source domain samples and target domain samples by the domain-adversarial neural network model to obtain feature data; class prediction is performed according to the feature data, and domain prediction is performed on the source domain samples and target domain samples; the total model loss is then calculated from the prediction results and the feature data, and the parameters of the model are adjusted according to the total model loss. The total model loss comprises a classification loss, an adversarial loss and a distribution loss, the distribution loss being calculated from the earth mover's (Wasserstein) distance between the feature data of the source domain samples and that of the target domain samples. Since the earth mover's distance can still reflect the distance between the source domain and target domain sample data even when the two are far apart, the features of similar fault data can be aligned and a domain-adversarial neural network model with excellent performance can be trained, so as to effectively identify equipment faults under variable-load conditions.
The following describes the embodiments of the present application in detail with reference to the drawings.
Example 1
The embodiment of the application provides a model training method, as shown in FIG. 1, which mainly comprises the following steps:
S100: The pre-acquired source domain samples and target domain samples are preprocessed and then input into a pre-constructed domain-adversarial neural network model.
Specifically, the acquisition sources of the source domain samples and the target domain samples comprise faulty motors under different working conditions. Preferably, the different working conditions include different loads of the faulty motor. Further preferably, the source domain samples and the target domain samples comprise vibration signals acquired from faulty motors under different working conditions.
Specifically, in the embodiments of the present application, the source domain refers to the existing knowledge domain, which is different from the domain of the samples to be tested but has rich supervision information; the target domain refers to the domain to be studied, i.e., the domain where the samples to be tested are located, which has no labels or only a small number of labels.
It will be appreciated that a difficulty in optimizing deep-learning-based approaches for variable-load fault diagnosis is reducing the distribution difference between the source domain dataset and the target domain dataset so as to align them as much as possible. To solve this problem and further enhance the domain adaptation capability of equipment fault diagnosis methods (the equipment including but not limited to motors), in the embodiments of the present application the earth mover's (Wasserstein) distance is fused into a domain-adversarial neural network for domain-adaptive analysis of the vibration signals of variable-load equipment; that is, the pre-constructed domain-adversarial neural network model in the embodiments of the present application includes, but is not limited to, a domain-adversarial neural network model fused with the Wasserstein distance (WD-DANN). The earth mover's distance, proposed by Rubner et al., is a distance function measuring the distance between two distributions; it can be used to measure the similarity between two distributions and is widely applied in statistics, optimal transport and other fields.
Preferably, the source domain samples and the target domain samples are time-series signals.
S200: Features of the source domain samples and the target domain samples are extracted with the domain-adversarial neural network to obtain feature data; class prediction is performed on the source domain samples according to the feature data, and domain prediction is performed on the source domain samples and the target domain samples to obtain prediction results; a total model loss is then calculated from the prediction results and the feature data, wherein the total model loss comprises a classification loss, an adversarial loss and a distribution loss, the distribution loss being determined from the earth mover's (Wasserstein) distance between the feature data of the source domain samples and that of the target domain samples.
Specifically, in the embodiments of the present application, the earth mover's distance between the features of the source domain samples and the target domain samples is calculated and used as a component of the model loss. By minimizing the total loss function, the features of the source domain and the target domain are aligned, so the features of variable-load data can be extracted more accurately and accurate identification of variable-load data is realized.
S300: The parameters of the pre-constructed domain-adversarial neural network model are updated according to the total model loss until a predetermined update stop condition is met, to obtain the target domain-adversarial neural network model.
Specifically, through continuous training and learning on the input data, the model can learn feature representations that are invariant across different domains: the weights of the deep learning model are adjusted adaptively, and the feature representation of a signal is separated from its domain information, so that the signal keeps consistent features in different data domains, finally realizing migration diagnosis of equipment fault states under different working conditions.
Specifically, in the embodiments of the present application the predetermined update stop condition is not particularly limited, and the user may set it according to actual requirements. As an exemplary but non-limiting illustration, the predetermined update stop condition may be that the domain-adversarial neural network model reaches a preset number of iterations, or that the error rate between the labels predicted by the label classifier from the feature data and the true labels of the data in the source domain samples is smaller than a first set value, and/or that the accuracy of the domain prediction performed on the feature data by the domain discriminator is larger than a second set value; the first set value and the second set value may be set according to actual requirements, and the present application is not limited thereto.
As a preferred implementation, in the embodiments of the present application, the domain-adversarial neural network model comprises a feature extractor, a label classifier and a domain discriminator;
the feature extractor is used for extracting feature data of the source domain samples and the target domain samples, and comprises at least two one-dimensional convolution layers and a global average pooling layer arranged in sequence;
the one-dimensional convolution layers are used for performing one-dimensional convolution on the data in the source domain samples and the target domain samples to obtain initial features, and the global average pooling layer is used for reducing the dimension of the initial features to obtain the feature data;
the label classifier is used for performing class prediction on the source domain samples according to their feature data;
the domain discriminator is used for performing domain prediction on the source domain samples and the target domain samples according to their feature data.
Specifically, in the embodiments of the present application, the features of the source domain samples and target domain samples are extracted by the feature extractor, providing better input for the subsequent classification task. The feature extractor therefore needs to generate features with good domain adaptability so that they transfer between the source domain and the target domain. The label classifier accepts the extracted features as input and, through training, learns the mapping from input data to output labels to perform the specific classification task. In the embodiments of the present application, the feature extractor and the label classifier form a fault-state identification network of the equipment (such as a motor), which identifies the type of fault that the input fault data belongs to. The domain discriminator classifies the input features, predicting with a binary classifier whether they come from the source domain or the target domain. The domain discriminator receives the data features extracted by the feature extractor and learns to assign the features to the correct domain, thereby improving the generalization ability of the algorithm and realizing adaptation to data in different domains so as to eliminate the difference between the source domain and the target domain.
Further, taking the domain-adversarial neural network model fused with the earth mover's distance (WD-DANN) as an example of the pre-constructed model: referring to FIG. 2, the WD-DANN model in the embodiments of the present application has a structure and computation flow similar to those of the original DANN. The improvements are that a one-dimensional convolutional neural network (i.e., one-dimensional convolution layers) and a global average pooling layer are applied as the feature extractor to extract the features of the source domain samples and target domain samples, and that the Wasserstein distance between the feature distributions of the source domain samples and target domain samples is calculated and used as a component of the DANN loss function. By minimizing the total loss function, the features of the source domain and the target domain are aligned and the features of variable-load data can be extracted more accurately, realizing accurate identification of variable-load data. Compared with the fully connected layers in the original DANN, the one-dimensional convolutional neural network can better extract the features of time-series signals.
As described above, the feature extractor in the embodiments of the present application comprises at least three network structures: two one-dimensional convolution layers (Conv) and one global average pooling layer (GAP). In a specific implementation, the initial feature vector obtained from the one-dimensional convolution computation on the data (including the source domain samples and target domain samples) is fed into the global average pooling layer (GAP). GAP is also a key part of the feature extractor in the embodiments of the present application: the main idea of the GAP operation is to capture the spatial information of the whole feature map by computing its average value, thereby reducing dimensionality and extracting features. The data features are therefore dimension-reduced by GAP so that the feature distribution of the data can be fitted better while the model complexity is reduced. The final GAP output is a one-dimensional vector containing three floating-point numbers, corresponding to the three axes of the input data.
As a preferred implementation, the feature extractor in the embodiments of the present application further comprises an input layer (Input) and a flattening layer (Flatten), arranged in sequence as: the input layer, two one-dimensional convolution layers (Conv1 and Conv2, respectively), the global average pooling layer (GAP) and the flattening layer (Flatten). Illustratively, the structure and some parameters of the feature extractor are shown in Table 1, where None represents the number of data samples input to the model.
TABLE 1 feature extractor structure and parameters
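By way of illustration only (not part of the original disclosure), a minimal PyTorch sketch of such a feature extractor follows. The kernel sizes and channel counts are assumptions, since the concrete parameters of Table 1 are not reproduced here; the three-channel output of the second convolution matches the stated three-value GAP output:

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Two 1-D convolution layers followed by global average pooling
    and flattening. Channel counts and kernel sizes are illustrative
    assumptions, not the patented parameters."""
    def __init__(self, in_channels: int = 3):  # triaxial vibration input
        super().__init__()
        self.conv1 = nn.Sequential(
            nn.Conv1d(in_channels, 16, kernel_size=9, padding=4), nn.ReLU())
        self.conv2 = nn.Sequential(
            nn.Conv1d(16, 3, kernel_size=5, padding=2), nn.ReLU())
        self.gap = nn.AdaptiveAvgPool1d(1)   # global average pooling
        self.flatten = nn.Flatten()          # -> (batch, 3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 3, L) Fourier-transformed triaxial signal
        return self.flatten(self.gap(self.conv2(self.conv1(x))))
```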
Furthermore, in the embodiments of the present application, by adding a domain discriminator to the neural network, the network can learn the feature representations of the source domain and the target domain at the same time; adversarial training reduces the domain difference between the source domain and the target domain while a label classifier is trained to identify the input data. The feature extractor, label classifier and domain discriminator act together, and alternating training improves the generalization performance of the algorithm.
It will be appreciated that in diagnosing an equipment fault (e.g., a motor fault), the DANN model separates the feature representation of the vibration signal from the domain information, i.e., it separates the features of the fault from the information about the working condition under which the motor operates, so that the feature representation of the motor vibration signal remains consistent even under different working conditions. These feature representations can be used to train a classification model to analyze motor vibration signals under different working conditions.
In a preferred embodiment of the present application, the label classifier comprises at least two fully connected layers, with a random deactivation layer disposed between two adjacent fully connected layers.
Specifically, the random deactivation layer (Dropout) is added to reduce overfitting during training.
As a preferred implementation, in the embodiments of the present application, the domain discriminator comprises at least two fully connected layers.
Exemplarily, in the WD-DANN model provided by the embodiments of the present application, the label classifier $G_y$ comprises three fully connected layers (FC1, FC2 and FC3, respectively), with a random deactivation layer (Dropout) disposed between two adjacent fully connected layers, and the domain discriminator $G_d$ comprises two fully connected layers (denoted FC4 and FC5, respectively). The specific structure and parameters are shown in Table 2:
TABLE 2 Label classifier and domain discriminator structural parameters
In a preferred embodiment of the present application, the earth mover's (Wasserstein) distance is calculated as:

$$W(P_s, P_t) = \inf_{\gamma \in \Pi(P_s, P_t)} \mathbb{E}_{(x_s, x_t) \sim \gamma}\left[\, d(x_s, x_t) \,\right]$$

wherein $\gamma$ is a joint probability distribution, $P_s$ and $P_t$ are two probability distributions over a given metric space, $\Pi(P_s, P_t)$ represents the set of all possible joint probability distributions combining $P_s$ and $P_t$, $(x_s, x_t)$ are source domain and target domain samples drawn from $\gamma$, $d(x_s, x_t)$ represents the distance between $x_s$ and $x_t$, and $\mathbb{E}_{(x_s, x_t) \sim \gamma}[\cdot]$ is the expected value of the sample-pair distance under the joint distribution $\gamma$.
Specifically, the infimum (greatest lower bound) of this expected distance over all possible joint distributions is the Wasserstein distance. The Wasserstein distance describes the minimum cost required to convert one distribution into another, and can be applied to the problem of optimizing the distance between the source domain distribution and the target domain distribution in transfer learning (Transfer Learning, TL).
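For intuition (an illustrative aside, not from the original text), the 1-D Wasserstein distance between two empirical samples can be computed with SciPy; the distributions below are stand-ins for spectral features under two load conditions:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
# stand-ins for features of the no-load (source) and loaded (target)
# conditions; the shift mimics a load-induced distribution change
source_feats = rng.normal(loc=0.0, scale=1.0, size=2048)
target_feats = rng.normal(loc=0.8, scale=1.2, size=2048)

print(wasserstein_distance(source_feats, target_feats))
# stays finite and informative even when the two samples barely
# overlap, which is the property WD-DANN relies on
```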
In a preferred embodiment of the present application, the total model loss is calculated as a weighted sum of the classification loss, the adversarial loss and the distribution loss;
wherein the classification loss is determined from the feature data of the source domain samples;
and/or, the adversarial loss is determined from the feature data of the source domain samples and the target domain samples.
In a preferred embodiment of the present application, the classification loss is calculated as:

$$L_c(\theta_f, \theta_y) = -\,\mathbb{E}_{(x_s, y_s) \sim D_s}\big[\, y_s \log G_y\big(G_f(x_s; \theta_f); \theta_y\big) \,\big]$$

wherein $x_s$ and $y_s$ are the data of a source domain sample and its corresponding label, $G_f(\cdot)$ denotes the output of the feature extractor, $\mathbb{E}_{(x_s, y_s) \sim D_s}$ denotes the expectation over the labeled data of the source domain dataset, $D_s$ denotes the distribution of fault source domain samples, $G_y(\cdot)$ denotes the output of the label classifier, $\theta_f$ is the weight parameter of the feature extractor, and $\theta_y$ is the weight parameter of the label classifier.
In particular, assume there are two datasets $D_s = \{(x_s, y_s)\}$ and $D_t = \{x_t\}$, wherein $x_s$ denotes the data of the source domain samples and $x_t$ denotes the data of the target domain samples. For each input $x_s$, its corresponding label is $y_s$; the goal is to realize transfer learning between the source domain samples and the target domain samples.
First, a domain discriminator $G_d$ and a label classifier $G_y$ are defined; the feature extractor $G_f$ shared by them can be expressed as $G_f(x; \theta_f)$, wherein $\theta_f$ represents the weight parameters of the feature extractor. The purpose of the domain discriminator $G_d$ is to predict whether the input data $x$ comes from the source domain or the target domain; its output is $d = G_d(G_f(x); \theta_d)$, where $d = 0$ indicates that $x$ comes from the source domain and $d = 1$ indicates that $x$ comes from the target domain. The goal of the label classifier $G_y$ is to train a classifier on the source domain data, completing identification with the features of the input data extracted by the feature extractor and its weight parameters; the label classifier is defined as $G_y(G_f(x; \theta_f); \theta_y)$, and this task is achieved by minimizing the classification loss $L_c$.
As a preferred implementation, in the embodiments of the present application, the adversarial loss is calculated as:

$$L_{adv}(\theta_f, \theta_d) = -\,\mathbb{E}_{x_s \sim D_s}\big[\log G_d\big(G_f(x_s)\big)\big] - \mathbb{E}_{x_t \sim D_t}\big[\log\big(1 - G_d\big(G_f(x_t)\big)\big)\big]$$

wherein $x_s$ and $y_s$ are the data of a source domain sample and its corresponding label, $\mathbb{E}_{x_s \sim D_s}$ denotes the expectation over the labeled source domain data, $G_d(\cdot)$ is the output of the domain discriminator, $G_f(\cdot)$ denotes the output of the feature extractor, and $x_t$ is the data of a target domain sample.
Specifically, the domain discriminator $G_d$ judges, from the features output by $G_f$, the domain of the input fault data, i.e., the working condition under which the fault occurred; the loss function $L_{adv}$ of $G_d$ is calculated and propagated back to $G_f$ to adjust the parameters of the feature extraction network, so that the extracted features reduce the difference between the source domain and the target domain, thereby improving the classification precision on both domains. In order to make the features extracted by the feature extractor independent of the domain and related only to the fault itself, the domain discriminator loss $L_{adv}$ needs to be minimized.
As a preferred implementation, in the embodiments of the present application, the distribution loss is calculated as:

$$L_{wd} = \sup_{\|f\|_L \le 1} \; \mathbb{E}_{x_s \sim P_s}\big[f\big(G_f(x_s)\big)\big] - \mathbb{E}_{x_t \sim P_t}\big[f\big(G_f(x_t)\big)\big]$$

wherein $P_s$ and $P_t$ are two probability distributions over a given metric space, $\mathbb{E}_{x_s \sim P_s}$ denotes the expectation over the unlabeled data of the source domain samples, $\mathbb{E}_{x_t \sim P_t}$ denotes the expectation over the unlabeled data of the target domain samples, and $\sup$ refers to the upper bound over all functions $f$ satisfying the 1-Lipschitz condition $\|f\|_L \le 1$.
In a preferred embodiment of the present application, the feature extractor is connected to the domain discriminator through a gradient reversal layer; the input of the gradient reversal layer is the feature data, and its output is a feature $\tilde{f}$ with the same dimension as the feature data, expressed as:

$$\tilde{f} = R_\lambda(f), \qquad R_\lambda(f) = f, \qquad \frac{\partial R_\lambda}{\partial f} = -\lambda I$$

so that during backpropagation the gradient passed to the feature extractor becomes $-\lambda\, \partial L_{adv} / \partial \theta_f$, wherein $f$ represents the feature data extracted by the feature extractor, $\partial L_{adv} / \partial \theta_f$ represents the gradient of the adversarial loss with respect to the feature extractor parameters, and $\lambda$ is a hyperparameter of the gradient reversal layer.
In particular, the gradient reversal layer (Gradient Reversal Layer, GRL) is a key component of the DANN: it connects the feature extractor and the domain discriminator and achieves domain adaptation of the feature space. The input of the gradient reversal layer is the features extracted by the feature extractor, and the output is a feature of the same dimension. Since the gradient reversal layer changes neither the dimension nor the shape of the features, it can be seamlessly connected and stacked with other layers to construct an end-to-end domain-adaptive network.
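A common way to realize such a layer (a sketch under the assumption of a PyTorch implementation; the original disclosure does not prescribe a framework) is a custom autograd function that is the identity in the forward pass and multiplies the gradient by -λ in the backward pass:

```python
import torch

class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; scales the gradient by -lambda
    in the backward pass, implementing the GRL pseudo-function."""
    @staticmethod
    def forward(ctx, x, lambd: float):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # reversed, scaled gradient flows back to the feature extractor
        return -ctx.lambd * grad_output, None

def grl(x, lambd):
    return GradientReversal.apply(x, lambd)
```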
It will be appreciated that the overall DANN loss function is generally expressed as:

$$E(\theta_f, \theta_y, \theta_d) = L_c(\theta_f, \theta_y) - \lambda\, L_{adv}(\theta_f, \theta_d)$$

wherein $y_s$ refers to the real label of a source domain sample, and $\lambda$ is a hyperparameter controlling the domain adaptation strength, used to balance the contributions of the label classifier loss and the domain discriminator loss.
The input features are multiplied by an attenuation coefficient whose magnitude is determined by the gradient of the adversarial loss with respect to the feature extractor parameters: the larger the gradient, the larger the attenuation coefficient, i.e., the smaller the contribution of that feature. In this way the features learned by the feature extractor are adjusted correspondingly during domain adaptation, achieving the purpose of domain adaptation. The larger the value of $\lambda$, the greater the importance of the domain discriminator, and for harder-to-distinguish samples the label classifier depends more on the predictions of the domain discriminator. During training, $\lambda$ is expressed as:

$$\lambda_p = \frac{2}{1 + \exp(-\gamma p)} - 1$$

wherein $\gamma$ is set according to the concrete behavior of the network and the data during training, and $p$ changes with the training process.
As described above, the feature data of the source domain samples and target domain samples extracted by the feature extractor is used as follows:
1. The source domain features $f_s$ are input into the label classifier to identify the fault type, and the classification loss $L_c$ is calculated from the identification results;
2. The source domain features $f_s$ and target domain features $f_t$ are jointly input into the domain discriminator to discriminate the domain, and the adversarial loss $L_{adv}$ is calculated from the discrimination results through gradient reversal (GRL);
3. As a preferred implementation in the embodiments of the present application, the Wasserstein distance between the source domain features $f_s$ and target domain features $f_t$ is calculated to construct the distribution loss $L_{wd}$.
As shown in FIG. 3, the total loss function of the WD-DANN, obtained by merging the above losses, is:

$$L_{total}(\theta_f, \theta_y, \theta_d) = L_c(\theta_f, \theta_y) - \lambda\, L_{adv}(\theta_f, \theta_d) + \lambda\, L_{wd}$$

Optimizing this loss takes derivatives with respect to the different parameter groups separately; the total loss can be developed as the saddle-point problem

$$(\hat{\theta}_f, \hat{\theta}_y) = \arg\min_{\theta_f, \theta_y} L_{total}, \qquad \hat{\theta}_d = \arg\max_{\theta_d} L_{total}$$

The Wasserstein distance is added to the loss function to further reduce the distance between the source and target feature distributions; the goal of the WD-DANN training process is to minimize the distribution difference between the source domain samples and the target domain samples. The weight $\lambda$ takes values between 0 and 1 and is computed by the schedule given above, with $p$ changing linearly from 0 to 1 along with the training process:

$$p = \frac{\text{Iter}}{\text{Iter}_{\max}}, \qquad \text{Iter}_{\max} = \text{epoch} \times \frac{\text{sample\_num}}{\text{batch\_size}}$$

wherein Iter is the number of iterations and $p$ varies from 0 to 1 with Iter; epoch denotes the round, representing one forward pass and one backward pass over all training samples, batch_size denotes the batch size, and sample_num denotes the number of samples.
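Putting the pieces together, a hedged sketch of a single training step follows, reusing the modules and the grl function sketched above (all names are ours). With the GRL folded in, the adversarial term is simply added, since the GRL flips its gradient for the feature extractor; the Wasserstein term is estimated here with a simple sorted 1-D surrogate per feature dimension rather than the full Kantorovich dual:

```python
import torch
import torch.nn.functional as F

def wasserstein_1d(fs: torch.Tensor, ft: torch.Tensor) -> torch.Tensor:
    """1-D W1 distance per feature dimension, averaged; assumes equal
    batch sizes (32 + 32 per the text). A surrogate for the dual form."""
    return (torch.sort(fs, dim=0).values
            - torch.sort(ft, dim=0).values).abs().mean()

def train_step(G_f, G_y, G_d, optimizer, xs, ys, xt, lambd):
    fs, ft = G_f(xs), G_f(xt)                      # shared feature extractor
    loss_c = F.cross_entropy(G_y(fs), ys)          # classification loss L_c
    # domain labels: source -> 0, target -> 1; GRL reverses the gradient
    d_pred = G_d(grl(torch.cat([fs, ft]), lambd)).squeeze(1)
    d_true = torch.cat([torch.zeros(len(fs)),
                        torch.ones(len(ft))]).to(d_pred.device)
    loss_adv = F.binary_cross_entropy(d_pred, d_true)  # adversarial loss
    loss_wd = wasserstein_1d(fs, ft)               # distribution loss L_wd
    loss = loss_c + lambd * (loss_adv + loss_wd)   # assumed weighting
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```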
By way of example and not limitation, the relevant hyperparameters in the embodiments of the present application are set as follows:
1. epoch is set to 200;
2. batch_size: its size affects the training speed and the model fit; increasing the batch_size accelerates training but reduces the accuracy of the model to a certain extent. Based on comprehensive experimental results, the batch_size is set to 64, so 64 samples are input into the network in each training step, of which 32 are source domain samples and 32 are target domain samples;
3. At the beginning of training, $\lambda$ is close to 0, so the Wasserstein distance has a small weight in the WD-DANN loss and the model focuses more on classification accuracy; as training proceeds, the value of $\lambda$ gradually increases, the Wasserstein-distance-based loss optimization takes up a larger weight, and the WD-DANN focuses more on domain adaptation capability. Finally, $\lambda$ tends to 1, so that over the whole training process a balance point is reached between domain adaptation and classification accuracy;
4. The initial learning rate $\mu_0$ is set to 0.01, and the learning rate $\mu_p$ is given by:

$$\mu_p = \frac{\mu_0}{(1 + \alpha p)^{\beta}}$$

wherein $\alpha$ is set to 10 and $\beta$ is set to 0.75.
Then, during the training process, the change of $\lambda$ is shown in FIG. 4 and the change of the learning rate $\mu_p$ is shown in FIG. 5.
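The two schedules are simple enough to state in a few lines; a sketch follows (illustrative only, with $\gamma$ kept as a configurable constant since its exact value is not reproduced in the text above):

```python
import math

def lambda_schedule(iteration: int, max_iter: int, gamma: float = 10.0) -> float:
    """lambda_p = 2 / (1 + exp(-gamma * p)) - 1, rising from 0 toward 1."""
    p = iteration / max_iter
    return 2.0 / (1.0 + math.exp(-gamma * p)) - 1.0

def lr_schedule(iteration: int, max_iter: int, mu0: float = 0.01,
                alpha: float = 10.0, beta: float = 0.75) -> float:
    """mu_p = mu0 / (1 + alpha * p)**beta, decaying from mu0 = 0.01."""
    p = iteration / max_iter
    return mu0 / (1.0 + alpha * p) ** beta
```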
As a preferred implementation, in the embodiments of the present application, preprocessing the pre-acquired source domain samples and target domain samples comprises:
performing a Fourier transform on the source domain samples and the target domain samples.
Specifically, the Fourier transform is applied to the source domain samples and the target domain samples to extract the frequency-domain information of the faults. The frequency-domain data of the motor faults is divided into different working conditions according to the motor load: the no-load condition is used as the source domain working condition, and the loaded conditions are used as the target domain working conditions. The source domain working condition contains the no-load data of the faulty motor and the corresponding fault labels, while the target domain conditions contain only the load data of the faulty motor, without labels.
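A sketch of this preprocessing step (illustrative only; the 512-point window follows the sample length stated below, and dropping the DC bin matches the spectral distributions shown in FIG. 6):

```python
import numpy as np

def preprocess(samples: np.ndarray) -> np.ndarray:
    """FFT magnitude spectrum per axis; `samples` has shape
    (num_samples, num_axes, 512)."""
    spectrum = np.abs(np.fft.rfft(samples, axis=-1))
    return spectrum[..., 1:]  # remove the DC component
```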
The model training method provided by the practice of the present application is described below, taking motor fault diagnosis as an example.
Step one: a training dataset is obtained, comprising source domain samples and target domain samples.
Specifically, the fault diagnosis experiments use variable-load motor vibration acceleration signals collected on a rotating-equipment fault simulation test bed. The motor speed is 1495 rpm, the sampling frequency of the acceleration sensor is 3300 Hz, and each sample comprises 512 data points of the one-dimensional vibration signal. The dataset is divided as follows:
the load 0 NM (i.e., no-load) data is used as the source domain samples, and the data with loads of 2 NM, 4 NM and 6 NM is used as the target domain samples; the source domain samples contain signals and the corresponding fault labels, while the target domain samples contain only signals, without labels.
The variable-load motor vibration signals are divided into five working conditions, denoted working conditions A to E, where A is the no-load condition and E merges the 2/4/6 NM loads. Each working condition contains 11 motor running states, including normal operation (0 NOR) and various faults. Because no-load data is easier to collect than loaded data in actual industrial scenarios and its data volume is larger, the sample volume of the source domain input to the model during training is far larger than that of the target domain. The specific information is shown in Table 3.
Table 3 introduction of variable load dataset operating mode types
Preferably, in order to avoid the model repeatedly memorizing the same data during training, the data can be randomly shuffled before extraction, and in each training step the same number of samples is randomly drawn from the source domain and the target domain as model input. This processing ensures that the training data used during WD-DANN training is closer to the real distribution of the actual data, reducing the possibility of overfitting.
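A minimal sketch of such balanced random sampling (illustrative; the function and variable names are ours, and 32 + 32 follows the batch size described above):

```python
import numpy as np

def sample_batch(source_x, source_y, target_x, half_batch: int = 32):
    """Randomly draw the same number of source and target samples."""
    si = np.random.choice(len(source_x), half_batch, replace=False)
    ti = np.random.choice(len(target_x), half_batch, replace=False)
    return source_x[si], source_y[si], target_x[ti]
```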
Step two: data preprocessing, namely performing the Fourier transform on the source domain samples and the target domain samples.
It can be understood that in the embodiments of the present application, the Fourier-transformed data of the triaxial acceleration signals measured on the faulty motor is used as the input of the WD-DANN model.
Specifically, Python's scipy tools are used to fit the distribution histograms of the motor vibration frequency-domain data under different loads. Taking the y-axis acceleration signal as an example, FIG. 6 shows the probability distribution of the spectrum under different loads with the DC component removed, where the abscissa is the frequency amplitude. It can be seen that the data distributions under different loads are similar but not identical.
Step three: diagnosis based on the WD-DANN method under different loads.
Specifically, to parallel the situations in which fault data occurs in actual industrial production, 5 experiments were designed; their serial numbers and experimental interpretations are shown in Table 4.
Table 4 variable load experimental condition settings
For experiment 1, the no-load motor data A is used as the training set to train a one-dimensional convolutional neural network, and the obtained model is tested for accuracy on the mixed-load data E; experiment 1 does not involve transfer learning. For experiments 2, 3, 4 and 5, deep transfer learning experiments on the WD-DANN model are carried out with A as the source domain data and the different load working conditions as the target domain data.
The diagnosis accuracy of the WD-DANN-based variable-load fault diagnosis method under the different working conditions is shown in Table 5, covering the traditional deep-learning-based method and the four working-condition migration cases; detailed experimental parameters can also be found in Table 5.
TABLE 5 WD-DANN based variable load fault diagnosis accuracy
As can be seen from the table above, without transfer learning the accuracy on the E test is very low. This shows that with only no-load data as the training set, even a deep learning network cannot fully extract the detailed features of the faults, although recognition succeeds with a certain probability; in other words, the deep learning method can extract only part of the fault features and cannot reject the load features. The features under different loads must therefore be aligned using transfer learning.
The WD-DANN algorithm reaches an average fault diagnosis accuracy of 95.07% over the other four migration experiments and achieves its highest recognition accuracy, up to 98.95%, in one of the migration experiments. Compared with the method without transfer learning, the performance is improved by 289.46 percent. The results show that the fault diagnosis performance of the WD-DANN-based transfer learning fault analysis method is greatly improved under variable-load conditions.
To show the diagnostic performance of the WD-DANN method under each working condition more intuitively, confusion matrices are used to present the diagnosis results of the four working-condition migration experiments on the fault data, as shown in FIGS. 7a to 7d. It can be seen from the confusion matrices that when the no-load data is used as the source domain and the load data as the target domain, performance is good under the mixed-load working condition E, but bearing outer-ring faults are easily identified as bearing rolling-element faults. In addition, the diagnosis accuracy is higher when the load conditions are similar; for example, there is a certain gap between the test results on load B and load D, mainly in diagnosing looseness of the bearing and the motor base. This also provides a direction for future model improvements.
The misclassification after iteration can be read off the confusion matrix: the diagonal represents correct model predictions, and the other positions represent incorrect predictions. The grayscale bar on the right represents the number of correct predictions, corresponding to the color shades on the left.
Example two
Corresponding to the first embodiment, the present invention further provides a fault diagnosis method, which is based on the training method to obtain the target domain antagonistic neural network model, wherein in this embodiment, the same or similar content as that in the first embodiment can be referred to the above description, and will not be repeated. Referring to fig. 8, the method includes the steps of:
S10: obtaining measured data of the equipment to be diagnosed, wherein the measured data comprises a vibration signal of the equipment to be diagnosed;
S20: inputting the measured data into the target domain adversarial neural network model for identification and classification to obtain a diagnosis result.
Specifically, the measured data of the equipment to be diagnosed includes a vibration signal of the equipment to be diagnosed.
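As a minimal sketch of steps S10 and S20, the following assumes the trained target domain adversarial neural network model was saved as a whole PyTorch module whose forward pass returns the label-classifier logits, and that the same Fourier-transform preprocessing used in training is applied to the raw vibration signal. The file names, sample length, and class names are hypothetical.

```python
import numpy as np
import torch

# Hypothetical label set; the real model's output classes may differ.
FAULT_CLASSES = ["normal", "inner ring", "outer ring",
                 "rolling element", "base loose"]

def diagnose(model: torch.nn.Module, vibration: np.ndarray) -> str:
    """S10/S20: take one measured vibration signal, return a fault label."""
    # Preprocessing as in training: magnitude spectrum of the signal.
    spectrum = np.abs(np.fft.fft(vibration)).astype(np.float32)
    x = torch.from_numpy(spectrum).reshape(1, 1, -1)   # (batch, channel, length)
    model.eval()
    with torch.no_grad():
        logits = model(x)                              # assumed: logits only
    return FAULT_CLASSES[int(logits.argmax(dim=1))]

# Hypothetical usage:
# model = torch.load("wd_dann_target.pt")
# signal = np.load("measured_vibration.npy")           # e.g. 1024 samples
# print(diagnose(model, signal))
```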
Example 3
The present invention further provides a model training device corresponding to the first embodiment. Content that is the same as or similar to the first embodiment is described above and will not be repeated here. Referring to fig. 9, the device includes:
a processing module, configured to: preprocess a pre-acquired source domain sample and target domain sample and input them into a pre-constructed domain adversarial neural network model; extract features of the source domain sample and the target domain sample by using the domain adversarial neural network to obtain feature data; perform category prediction on the source domain sample and domain prediction on the source domain sample and the target domain sample according to the feature data; and calculate a model total loss according to the prediction results and the feature data, wherein the model total loss is a weighted combination of the classification loss, the adversarial loss and the distribution loss, and the distribution loss is determined from the earth mover's distance (Wasserstein distance) between the feature data of the source domain sample and that of the target domain sample;
and an updating module, configured to update the parameters of the pre-constructed domain adversarial neural network model according to the model total loss until a preset update stop condition is met, to obtain the trained target domain adversarial neural network model.
In some embodiments, when the model training device executes the model training method, it may further implement the steps corresponding to the method in the first embodiment; reference may be made to the detailed description in the first embodiment, which is not repeated here.
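For concreteness, the following is a minimal PyTorch sketch of a network with the structure this application describes: a feature extractor with two one-dimensional convolution layers and global average pooling, a label classifier whose two fully connected layers are separated by a dropout (random inactivation) layer, and a domain discriminator attached through a gradient reversal layer. All channel counts, kernel sizes, and the dropout rate are illustrative assumptions, not the parameters of the patented model.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Gradient reversal: identity forward, gradient multiplied by -lam backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

class WDDANN(nn.Module):
    def __init__(self, n_classes=5, lam=1.0):          # sizes are assumptions
        super().__init__()
        self.lam = lam
        self.feature = nn.Sequential(                  # feature extractor
            nn.Conv1d(1, 16, kernel_size=15), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=15), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),                   # global average pooling
            nn.Flatten(),                              # flattening layer
        )
        self.classifier = nn.Sequential(               # label classifier
            nn.Linear(32, 64), nn.ReLU(),
            nn.Dropout(0.5),                           # random inactivation layer
            nn.Linear(64, n_classes),
        )
        self.discriminator = nn.Sequential(            # domain discriminator
            nn.Linear(32, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),
        )

    def forward(self, x):                              # x: (batch, 1, length)
        f = self.feature(x)                            # feature data
        y = self.classifier(f)                         # category prediction
        d = self.discriminator(GradReverse.apply(f, self.lam))  # domain prediction
        return y, d, f
```

Because the gradient reversal layer flips the sign of the discriminator's gradient, a single optimizer can train the feature extractor to confuse the discriminator while the discriminator itself learns to separate the two domains.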
Example 4
The present invention further provides a fault diagnosis device corresponding to the second embodiment. Content that is the same as or similar to the first embodiment is described above and will not be repeated here. Referring to fig. 10, the device includes:
a data acquisition module, configured to acquire measured data of the equipment to be diagnosed, wherein the measured data comprises a vibration signal of the equipment to be diagnosed;
and a fault diagnosis module, configured to input the measured data into the target domain adversarial neural network model for identification and classification to obtain a diagnosis result.
In some embodiments, when the fault diagnosis device executes the fault diagnosis method of the embodiments of the present application, it may further implement the steps corresponding to the method in the second embodiment; reference may be made to the detailed description in the second embodiment, which is not repeated here.
Example 5
Corresponding to the first to fourth embodiments, the present invention further provides a computer device, including a processor and a memory, the memory storing a computer program executable on the processor; when the computer program is executed by the processor, the model training method provided by any of the above embodiments is performed.
FIG. 11 illustrates a computer device 1500 that may include, inter alia, a processor 1510, a video display adapter 1511, a disk drive 1512, an input/output interface 1513, a network interface 1514, and a memory 1520. These components may be communicatively connected by a communication bus 1530.
The processor 1510 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application-Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs to implement the technical solution provided by the present invention.
The memory 1520 may be implemented in the form of ROM (Read-Only Memory), RAM (Random Access Memory), static storage, dynamic storage, or the like. The memory 1520 may store an operating system 1521 for controlling the operation of the electronic device and a Basic Input/Output System (BIOS) for controlling its low-level operation. In addition, a web browser 1523, a data storage management system 1524, a device identification information processing system 1525, and the like may also be stored. The device identification information processing system 1525 may be an application program that implements the operations of the steps described above in the embodiments of the present invention. In general, when the present invention is implemented in software or firmware, the relevant program code is stored in the memory 1520 and executed by the processor 1510.
The input/output interface 1513 is used for connecting with an input/output module to realize information input and output. The input/output module may be configured as a component in a device (not shown) or may be external to the device to provide corresponding functionality. Wherein the input devices may include a keyboard, mouse, touch screen, microphone, various types of sensors, etc., and the output devices may include a display, speaker, vibrator, indicator lights, etc.
The network interface 1514 is used to connect communication modules (not shown) to enable communication interactions of the present device with other devices. The communication module may implement communication through a wired manner (such as USB, network cable, etc.), or may implement communication through a wireless manner (such as mobile network, WIFI, bluetooth, etc.).
The bus includes a path to transfer information between various components of the device (e.g., the processor 1510, the video display adapter 1511, the disk drive 1512, the input/output interface 1513, the network interface 1514, and the memory 1520).
In addition, the electronic device may also obtain specific acquisition condition information from the virtual resource object acquisition condition information database for use in condition judgment and the like.
It is noted that although the above device shows only the processor 1510, the video display adapter 1511, the disk drive 1512, the input/output interface 1513, the network interface 1514, the memory 1520, the bus, and so on, in specific implementations the device may include other components necessary for proper operation. Furthermore, those skilled in the art will appreciate that the device may include only the components necessary to implement the present invention, rather than all of the components shown in the drawings.
Example 6
Corresponding to the first to fifth embodiments, the present invention further provides a computer-readable storage medium. Content that is the same as or similar to the first to fourth embodiments is described above and will not be repeated here.
The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the following steps:
preprocessing a pre-acquired source domain sample and a pre-acquired target domain sample, and inputting the preprocessed samples into a pre-constructed domain adversarial neural network model;
extracting features of the source domain sample and the target domain sample by using the domain adversarial neural network to obtain feature data; performing category prediction on the source domain sample and domain prediction on the source domain sample and the target domain sample according to the feature data; and calculating a model total loss according to the prediction results and the feature data, wherein the model total loss comprises a classification loss, an adversarial loss and a distribution loss, and the distribution loss is determined from the earth mover's (Wasserstein) distance between the feature data of the source domain sample and that of the target domain sample;
and updating parameters of the pre-constructed domain adversarial neural network model according to the model total loss until a preset update stop condition is met, to obtain the trained target domain adversarial neural network model.
In some embodiments, when the computer program is executed by the processor, it may further implement the steps corresponding to the method in the first embodiment; reference may be made to the detailed description in the first embodiment, which is not repeated here.
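As an illustration of how these three steps could be combined in code, the sketch below performs one training update that weights the classification, adversarial, and distribution losses into the total loss L = L_y + λ₂(L_d + L_wd). It assumes a model with the interface of the WDDANN sketch above; the simple feature-mean surrogate used for the earth mover's distance term is an assumption for brevity, not the estimator of this application.

```python
import torch
import torch.nn.functional as F

def train_step(model, opt, x_src, y_src, x_tgt, lambda2=0.1):  # lambda2 assumed
    """One WD-DANN update: forward both domains, weight the three losses."""
    y_pred, d_src, f_src = model(x_src)     # source: class + domain + features
    _,      d_tgt, f_tgt = model(x_tgt)     # target: domain + features only

    # Classification loss L_y: cross-entropy on labeled source samples.
    l_y = F.cross_entropy(y_pred, y_src)

    # Adversarial loss L_d: the discriminator labels source 1 and target 0.
    l_d = F.binary_cross_entropy(d_src, torch.ones_like(d_src)) \
        + F.binary_cross_entropy(d_tgt, torch.zeros_like(d_tgt))

    # Distribution loss L_wd: crude surrogate for the earth mover's distance
    # between the two feature batches (illustrative assumption only).
    l_wd = torch.abs(f_src.mean(dim=0) - f_tgt.mean(dim=0)).sum()

    loss = l_y + lambda2 * (l_d + l_wd)     # L = L_y + λ2 (L_d + L_wd)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```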
From the above description of the embodiments, it will be apparent to those skilled in the art that the present invention may be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present invention, or the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments, or in certain parts of the embodiments, of the present invention.
In this specification, the embodiments are described in a progressive manner; identical or similar parts of the embodiments may be referred to each other, and each embodiment mainly describes its differences from the others. In particular, the system embodiments are described relatively briefly because they are substantially similar to the method embodiments, and reference may be made to the description of the method embodiments for the relevant parts. The systems and system embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement this without undue burden.
The foregoing is a detailed description of the preferred embodiments of the present invention. Specific examples have been used herein to explain the principles and implementations of the invention, and the above embodiments are provided only to help understand the method of the invention and its core ideas. Modifications made by those of ordinary skill in the art in light of these teachings remain within the scope of the invention. In view of the foregoing, this description should not be construed as limiting the invention.

Claims (14)

1. A method of model training, the method comprising:
preprocessing a pre-acquired source domain sample and a pre-acquired target domain sample, and inputting the preprocessed samples into a pre-constructed domain adversarial neural network model, wherein the domain adversarial neural network model comprises a feature extractor, a label classifier and a domain discriminator; the feature extractor comprises at least two one-dimensional convolution layers and a global average pooling layer arranged in sequence; the label classifier comprises at least two fully connected layers, with a random inactivation (dropout) layer arranged between the two fully connected layers;
wherein the acquisition sources of the source domain sample and the target domain sample comprise faulty motors under different working conditions;
extracting features of the source domain sample and the target domain sample by using the domain adversarial neural network to obtain feature data; performing category prediction on the source domain sample and domain prediction on the source domain sample and the target domain sample according to the feature data to obtain prediction results, and calculating a model total loss according to the prediction results and the feature data, wherein the model total loss is calculated by the formula $L = L_y + \lambda_2 (L_d + L_{wd})$, in which $L$ is the model total loss, $L_y$ is the classification loss, $L_d$ is the adversarial loss, $L_{wd}$ is the distribution loss, and $\lambda_2$ is a weight coefficient; the distribution loss $L_{wd}$ is determined from the earth mover's (Wasserstein) distance between the feature data of the source domain sample and that of the target domain sample;
and updating parameters of the pre-constructed domain adversarial neural network model according to the model total loss until a preset update stop condition is met, to obtain a target domain adversarial neural network model.
2. The model training method according to claim 1, wherein:
The feature extractor is used for extracting feature data of the source domain sample and the target domain sample;
the one-dimensional convolution layers are used for performing one-dimensional convolution calculation on the data in the source domain sample and the target domain sample to obtain initial features, and the global average pooling layer is used for reducing the dimension of the initial features to obtain the feature data;
the label classifier is used for performing category prediction on the source domain sample according to the feature data of the source domain sample;
the domain discriminator is used for performing domain prediction on the source domain sample and the target domain sample according to their feature data.
3. The model training method of claim 2, wherein the feature extractor further comprises an input layer and a flattening layer, the input layer, the two one-dimensional convolution layers, the global averaging pooling layer, and the flattening layer being disposed in sequence.
4. The model training method according to any one of claims 1 to 3, wherein the earth mover's (Wasserstein) distance is calculated as:

$$W(P_s, P_t) = \inf_{\gamma \in \Pi(P_s, P_t)} \mathbb{E}_{(x_s, x_t) \sim \gamma}\big[\lVert x_s - x_t \rVert\big]$$

wherein $\gamma$ is a joint probability distribution, $P_s$ and $P_t$ are two probability distributions over a given metric space, $\Pi(P_s, P_t)$ denotes the set of all possible joint distributions combining $P_s$ and $P_t$, $(x_s, x_t)$ are source domain and target domain samples drawn from $\gamma$, $\lVert x_s - x_t \rVert$ denotes the distance between $x_s$ and $x_t$, and $\mathbb{E}_{(x_s, x_t) \sim \gamma}[\cdot]$ denotes the expected value of the sample-pair distance under the joint distribution $\gamma$.
5. The model training method according to any one of claims 1 to 3, wherein the model total loss is obtained by weighting the classification loss, the adversarial loss and the distribution loss;

wherein the classification loss is determined from the feature data of the source domain sample; and/or the adversarial loss is determined from the feature data of the source domain sample and the target domain sample.
6. The model training method according to claim 5, wherein the classification loss is calculated as:

$$L_y = -\,\mathbb{E}_{(x_s, y_s) \sim P_s}\big[\log G_y\big(G_f(x_s; \theta_f); \theta_y\big)\big]$$

wherein $x_s$ and $y_s$ are the data of the source domain sample and the corresponding labels, $G_f$ denotes the output of the feature extractor, $(x_s, y_s) \sim P_s$ denotes labeled data drawn from the source domain, $P_s$ denotes the distribution of the fault source domain samples, $G_y$ denotes the output of the label classifier, $\theta_f$ is the weight parameter of the feature extractor, and $\theta_y$ is the weight parameter of the label classifier;

and/or, the adversarial loss is calculated as:

$$L_d = -\,\mathbb{E}_{(x_s, y_s) \sim P_s}\big[\log G_d\big(G_f(x_s)\big)\big] - \mathbb{E}_{x_t}\big[\log\big(1 - G_d\big(G_f(x_t)\big)\big)\big]$$

wherein $x_s$ and $y_s$ are the data of the source domain sample and the corresponding labels, $(x_s, y_s) \sim P_s$ denotes labeled data drawn from the source domain, $G_d$ is the output of the domain discriminator, $G_f$ denotes the output of the feature extractor, and $x_t$ is the data of the target domain sample;

and/or, the distribution loss is calculated as:

$$L_{wd} = \sup_{\lVert f \rVert_L \le 1} \; \mathbb{E}_{x_s \sim P_s}\big[f(x_s)\big] - \mathbb{E}_{x_t \sim P_t}\big[f(x_t)\big]$$

wherein $P_s$ and $P_t$ are two probability distributions over a given metric space, $x_s \sim P_s$ denotes unlabeled data drawn from the source domain sample, $x_t \sim P_t$ denotes unlabeled data drawn from the target domain, and $\sup_{\lVert f \rVert_L \le 1}$ denotes the supremum taken over functions $f$ satisfying the Lipschitz constraint $\lVert f \rVert_L \le 1$.
7. The model training method according to claim 2, wherein the feature extractor is connected to the domain discriminator through a gradient reversal layer; the input of the gradient reversal layer is the feature data, and its output is a feature $\tilde{f}$ of the same dimension as the feature data, where in the forward pass $\tilde{f} = f$, and in the backward pass the layer reverses and scales the gradient passed back to the feature extractor, propagating $-\lambda \, \partial L_d / \partial \theta_f$;

wherein $f$ denotes the feature data extracted by the feature extractor, $\partial L_d / \partial \theta_f$ denotes the gradient of the adversarial loss with respect to the feature extractor parameters, and $\lambda$ is a hyperparameter of the gradient reversal layer.
8. The model training method according to claim 1, wherein the different working conditions include different loads of the faulty motor; or the source domain sample and the target domain sample comprise vibration signals acquired from faulty motors under different working conditions.
9. The model training method according to any one of claims 1 to 3, wherein the source domain sample and the target domain sample are time-series signals.
10. The model training method according to any one of claims 1 to 3, wherein the preprocessing of the pre-acquired source domain sample and target domain sample comprises:

performing Fourier transform on the source domain sample and the target domain sample.
11. The model training method according to any one of claims 1 to 3, wherein the preset update stop condition comprises at least one of: the domain adversarial neural network model reaching a preset number of iterations; the error rate between the true labels of the data in the source domain sample and the labels predicted by the label classifier from the feature data being smaller than a first set value; or the domain prediction accuracy of the domain discriminator on the feature data being larger than a second set value.
12. A fault diagnosis method based on a target domain adversarial neural network model trained by the model training method according to any one of claims 1 to 11, the method comprising:
Obtaining measured data of equipment to be diagnosed, wherein the measured data comprises vibration signals of the equipment to be diagnosed;
and inputting the measured data into the target domain adversarial neural network model for identification and classification to obtain a diagnosis result.
13. A model training apparatus, the apparatus comprising:
a processing module, configured to: preprocess a pre-acquired source domain sample and target domain sample and input them into a pre-constructed domain adversarial neural network model; extract features of the source domain sample and the target domain sample by using the domain adversarial neural network to obtain feature data; perform category prediction on the source domain sample and domain prediction on the source domain sample and the target domain sample according to the feature data; and calculate a model total loss according to the prediction results and the feature data, wherein the model total loss is calculated by the formula $L = L_y + \lambda_2 (L_d + L_{wd})$, in which $L$ is the model total loss, $L_y$ is the classification loss, $L_d$ is the adversarial loss, $L_{wd}$ is the distribution loss, and $\lambda_2$ is a weight coefficient, and the distribution loss $L_{wd}$ is determined from the earth mover's (Wasserstein) distance between the feature data of the source domain sample and that of the target domain sample;
an updating module, configured to update the parameters of the pre-constructed domain adversarial neural network model according to the model total loss until a preset update stop condition is met, to obtain a target domain adversarial neural network model;
wherein the domain adversarial neural network model comprises a feature extractor, a label classifier and a domain discriminator; the feature extractor comprises at least two one-dimensional convolution layers and a global average pooling layer arranged in sequence; the label classifier comprises at least two fully connected layers, with a random inactivation (dropout) layer arranged between the two fully connected layers;
and wherein the acquisition sources of the source domain sample and the target domain sample comprise faulty motors under different working conditions.
14. A fault diagnosis device based on the target domain adversarial neural network model obtained by the model training method according to any one of claims 1 to 11, the device comprising: a data acquisition module, configured to acquire measured data of the equipment to be diagnosed, wherein the measured data comprises a vibration signal of the equipment to be diagnosed;
and a fault diagnosis module, configured to input the measured data into the target domain adversarial neural network model for identification and classification to obtain a diagnosis result.
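For illustration, the earth mover's distance defined in claim 4 can be estimated empirically for one-dimensional feature batches. The sketch below uses SciPy's wasserstein_distance with randomly generated stand-ins for the source and target feature values; the distributions shown are assumptions, not data from the experiments in this application.

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Random stand-ins for one-dimensional feature values from the two domains.
rng = np.random.default_rng(42)
source_feats = rng.normal(loc=0.0, scale=1.0, size=1000)   # source domain
target_feats = rng.normal(loc=0.5, scale=1.2, size=1000)   # shifted target domain

# Empirical one-dimensional earth mover's (Wasserstein-1) distance.
d = wasserstein_distance(source_feats, target_feats)
print(f"empirical earth mover's distance: {d:.4f}")
```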
CN202311253060.XA 2023-09-27 2023-09-27 Model training method, fault diagnosis method and device Active CN116992953B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311253060.XA CN116992953B (en) 2023-09-27 2023-09-27 Model training method, fault diagnosis method and device

Publications (2)

Publication Number Publication Date
CN116992953A CN116992953A (en) 2023-11-03
CN116992953B true CN116992953B (en) 2024-04-19

Family

ID=88526968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311253060.XA Active CN116992953B (en) 2023-09-27 2023-09-27 Model training method, fault diagnosis method and device

Country Status (1)

Country Link
CN (1) CN116992953B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109947086A (en) * 2019-04-11 2019-06-28 清华大学 Mechanical breakdown migration diagnostic method and system based on confrontation study
CN110414383A (en) * 2019-07-11 2019-11-05 华中科技大学 Convolutional neural networks based on Wasserstein distance fight transfer learning method and its application
CN110907176A (en) * 2019-09-30 2020-03-24 合肥工业大学 Wasserstein distance-based fault diagnosis method for deep countermeasure migration network
CN111898634A (en) * 2020-06-22 2020-11-06 西安交通大学 Intelligent fault diagnosis method based on depth-to-reactance-domain self-adaption
CN112183581A (en) * 2020-09-07 2021-01-05 华南理工大学 Semi-supervised mechanical fault diagnosis method based on self-adaptive migration neural network
CN114358124A (en) * 2021-12-03 2022-04-15 华南理工大学 Rotary machine new fault diagnosis method based on deep-antithetical-convolution neural network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party

Title

Shen, Jian, et al. "Wasserstein Distance Guided Representation Learning for Domain Adaptation." National Conference on Artificial Intelligence, 2018, pp. 1-12. *

Ganin, Yaroslav, et al. "Unsupervised Domain Adaptation by Backpropagation." arXiv, pp. 1-11. *

Wu, Xinxiao, et al. Fundamentals and Applications of Transfer Learning. Beijing Institute of Technology Press, 2021, pp. 115-120. *

Also Published As

Publication number Publication date
CN116992953A (en) 2023-11-03

Similar Documents

Publication Publication Date Title
Zhao et al. Deep multi-scale convolutional transfer learning network: A novel method for intelligent fault diagnosis of rolling bearings under variable working conditions and domains
CN111964908B (en) Bearing fault diagnosis method under variable working condition based on multi-scale weight distribution convolutional neural network model
CN112964469B (en) Online fault diagnosis method for rolling bearing under variable load of transfer learning
CN110657984B (en) Planetary gearbox fault diagnosis method based on reinforced capsule network
US11681913B2 (en) Method and system with neural network model updating
CN110879982A (en) Crowd counting system and method
CN108416373A (en) A kind of unbalanced data categorizing system based on regularization Fisher threshold value selection strategies
US20220245405A1 (en) Deterioration suppression program, deterioration suppression method, and non-transitory computer-readable storage medium
CN111325224A (en) Computer-readable storage medium, input data checking method, and computing device
CN114358124A (en) Rotary machine new fault diagnosis method based on deep-antithetical-convolution neural network
CN117474918B (en) Abnormality detection method and device, electronic device, and storage medium
CN116579616B (en) Risk identification method based on deep learning
CN116992953B (en) Model training method, fault diagnosis method and device
CN116862893A (en) Industrial part anomaly detection method and system based on dynamic feature center
Hou et al. Imbalanced fault identification via embedding-augmented Gaussian prototype network with meta-learning perspective
CN116757533A (en) Industrial equipment abnormality detection method and related device
Liu et al. A dual-branch balance saliency model based on discriminative feature for fabric defect detection
CN114387524B (en) Image identification method and system for small sample learning based on multilevel second-order representation
CN110705631A (en) SVM-based bulk cargo ship equipment state detection method
Wu et al. Remaining useful life prediction of bearings with different failure types based on multi-feature and deep convolution transfer learning
CN116151319A (en) Method and device for searching neural network integration model and electronic equipment
CN114528906A (en) Fault diagnosis method, device, equipment and medium for rotary machine
Pristyanto et al. Comparison of ensemble models as solutions for imbalanced class classification of datasets
Chen et al. A residual convolution transfer framework based on slow feature for cross-domain machinery fault diagnosis
CN110728292A (en) Self-adaptive feature selection algorithm under multi-task joint optimization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant