CN114818996A - Method and system for diagnosing mechanical faults based on federated domain generalization

Info

Publication number: CN114818996A
Application number: CN202210738070.1A
Authority: CN
Inventors: 宋艳; 李沂滨; 贾磊; 崔明; 王代超
Original assignee: Shandong University
Current assignee: Shandong University
Priority date: 2022-06-28
Filing date: 2022-06-28
Publication date: 2022-07-29
Anticipated expiration: 2042-06-28
Also published as: CN114818996B

Abstract

The invention discloses a method and system for diagnosing mechanical faults based on federated domain generalization, in the technical field of fault diagnosis. First, the central server randomly initializes a global model and sends it to all clients. Second, each client independently trains the model on its own training data set. Third, the models trained by all clients are sent to the server, which averages all model parameters to obtain a global model. Fourth, the clients and the central server cooperatively train the global model. In the testing stage, the server sends the global model to the client holding the target domain data to perform fault diagnosis. The invention exploits the inherent relationship between the labels and features of the source domain data, and completes the training of the global fault diagnosis model by weighting and aggregating the training losses and model parameters of the different client models in the central server.

Description

Method and system for diagnosing mechanical faults based on federated domain generalization

Technical Field

The invention relates to the technical field of fault diagnosis, and in particular to a method and system for diagnosing mechanical faults based on federated domain generalization.

Background

Mechanical fault data generally come from devices of different models, under different operating conditions or in different operating environments, and a fault diagnosis model trained jointly on such data tends to have low accuracy and poor generalization when predicting new data. Domain generalization and domain adaptation methods in transfer learning address this domain shift problem by aligning the feature spaces of the data. In both approaches, labeled data are generally called source domain data, and the unlabeled data to be predicted are called target domain data. Domain adaptation predicts faults in the target domain data by aligning the feature spaces of the source and target domain data. Unlike domain adaptation, domain generalization has only source domain data and no target domain data, and achieves domain transfer by exploiting the internal relationship between data features and labels in the source domain.

In the prior art, there has been some research on fault diagnosis based on domain generalization. The document [Deep Domain Generalization Combining A Priori Diagnosis Knowledge Toward Cross-Domain Fault Diagnosis of Rolling Bearing] proposes a rolling bearing fault diagnosis method based on domain generalization, which eliminates the latent differences among multiple domains when the target domain contains only healthy samples and achieves efficient fault diagnosis. The document [Conditional Adversarial Domain Generalization With a Single Discriminator for Bearing Fault Diagnosis] proposes a conditional adversarial domain generalization method with a single discriminator, which aims to extract domain-invariant features from data collected under different operating conditions and generalize them to new fault data. The document [Intelligent Fault Identification Based on Multisource Domain Generalization Towards Actual Diagnosis Scenario] provides a novel intelligent fault identification method based on multi-source domain generalization. It uses local Fisher discriminant analysis to describe the discriminant structure of each source domain as a point on a Grassmann manifold; by preserving the local within-class structure, local Fisher discriminant analysis can learn an effective discriminator from multi-modal fault data. The document [A New Multiple Source Domain Adaptation Fault Diagnosis Method Between Different Rotating Machines] proposes a multi-source domain adaptation fault diagnosis method that uses a multi-adversarial learning strategy to obtain a domain-aligned feature representation that is also discriminative for the target domain. The document [Deep Adversarial Domain Adaptation Model for Bearing Fault Diagnosis] proposes a deep adversarial domain adaptation model for rolling bearing fault diagnosis, which constructs an adversarial domain adaptation network to resolve the inconsistent feature distributions of the source and target domains.

Data-driven fault diagnosis algorithms train the diagnostic model on large amounts of fault data. Therefore, to guarantee the effectiveness of deep learning methods, as much fault data as possible needs to be aggregated. However, because of data security and privacy requirements, aggregating data from different clients is not allowed in most cases. Federated learning therefore emerged to make effective aggregate use of data while keeping each client's data secure, solving the data-island problem in deep learning. In federated learning, multiple participating devices (i.e., clients) jointly solve the learning task under the coordination of a central server. On the theoretical side, researchers in China and abroad have studied common scientific problems in federated learning, such as non-independent and identically distributed (non-IID) data, unlabeled data, and security. On the application side, much research has addressed how to combine federated learning with specific scenarios such as finance, healthcare, robotics and smart cities.

Federated learning trains a model cooperatively on the data of different clients, but because the devices of different clients differ in operating conditions or models, the data usually suffer from domain shift, so federated transfer learning has attracted growing attention. Zhang et al. [Federated Transfer Learning for Intelligent Fault Diagnostics Using Deep Adversarial Networks With Data Privacy] propose a federated transfer learning method for fault diagnosis that designs different network model structures for different clients. The document [Data Privacy Preserving Federated Transfer Learning in Machinery Fault Diagnostics Using Prior Distributions] proposes a federated transfer learning method for mechanical fault diagnosis, which addresses domain shift indirectly through prior distributions and performs fault diagnosis by extracting the domain-invariant features of different users.

Existing federated transfer learning fault diagnosis methods consider data security and the domain shift between the source and target domains. However, they assume that target domain data exist and participate in training, and do not consider the case in which target domain data are unavailable or the need to train personalized models.

Disclosure of Invention

To address these problems, the invention provides a method and system for diagnosing mechanical faults based on federated domain generalization, which trains a personalized fault diagnosis model for each client according to the differences among the source domain data.

In the invention, to ensure data security, neither fault data nor fault features are shared among clients or between the clients and the central server. In addition, the method trains a global fault diagnosis model in the central server using partial model parameters and the weighted losses of different clients.

To achieve the above objective, the invention adopts the following technical solution:

A first aspect of the disclosure provides a mechanical fault diagnosis method based on federated domain generalization, comprising the following steps:

the central server randomly initializes the global model and sends it to all clients;

each client independently trains the model using its own training data set;

the models trained by all clients are sent to the central server, which averages all model parameters to obtain a global model;

the central server sends the global model to all clients, and the clients and the central server cooperatively train the global model;

the central server sends the trained global model to the client holding the target domain data to perform fault diagnosis.
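The parameter-averaging step in the list above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: each model is assumed to be a hypothetical dictionary from parameter names to flat lists of floats, and the average is unweighted, matching the plain averaging described above (the common FedAvg variant that weights clients by sample count is not used here).

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def average_models(client_models: List[Params]) -> Params:
    """Unweighted, element-wise average of the parameters of all client models."""
    if not client_models:
        raise ValueError("need at least one client model")
    n = len(client_models)
    global_model: Params = {}
    for name in client_models[0]:
        # zip(*...) groups the same parameter position across all clients
        columns = zip(*(model[name] for model in client_models))
        global_model[name] = [sum(values) / n for values in columns]
    return global_model


# Two hypothetical clients sharing a toy two-tensor model
client_a = {"conv1.weight": [0.0, 2.0], "fc.bias": [1.0]}
client_b = {"conv1.weight": [2.0, 4.0], "fc.bias": [3.0]}
print(average_models([client_a, client_b]))
# {'conv1.weight': [1.0, 3.0], 'fc.bias': [2.0]}
```

Only parameters cross the network in this step; the training data of each client are never read by the averaging function.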

Further, the central server sends the global model to all clients, and each client performs the following tasks:

calculating the classification loss with the classification loss function and sending it to the central server;

obtaining the output features of its feature extraction network and sending the covariance matrix of these features to the central server;

calculating the invariant risk minimization loss with the invariant risk minimization loss function and sending it to the central server.
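As a sketch of the second client task, the following computes the covariance matrix of a batch of feature vectors, so that only a d x d summary (d being the feature dimension) rather than the per-sample features leaves the client. The layout (one row per sample) and the unbiased 1/(n - 1) normalization are assumptions; the patent does not specify them.

```python
from typing import List

Matrix = List[List[float]]


def feature_covariance(features: Matrix) -> Matrix:
    """Unbiased d x d covariance of n feature vectors (rows are samples, columns are feature dimensions)."""
    n = len(features)
    if n < 2:
        raise ValueError("need at least two samples")
    d = len(features[0])
    means = [sum(row[c] for row in features) / n for c in range(d)]
    centered = [[row[c] - means[c] for c in range(d)] for row in features]
    return [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]


# A client with n = 2 feature vectors of dimension d = 2 sends only this 2 x 2 matrix
print(feature_covariance([[1.0, 2.0], [3.0, 6.0]]))
# [[2.0, 4.0], [4.0, 8.0]]
```

The size of the transmitted matrix depends only on d, not on the number of local samples.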

Further, the classification loss is $L_c^{k}$, and the classification loss function is:

$$L_c^{k} = -\sum y_k \log \hat{y}_k$$

where $x_k$ is the training data of the $k$-th client, $y_k$ is the true label of the $k$-th client's training data set, and $\hat{y}_k$ is the prediction result.
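A minimal sketch of the classification loss, assuming it is the standard softmax cross-entropy -Σ y log ŷ with one-hot labels (consistent with the Softmax output layer of the classification network described in this document). Averaging over the batch is an assumption; the patent does not state how per-sample losses are reduced.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # shift by the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classification_loss(batch_logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean of -sum(y * log(y_hat)) over a batch, with y one-hot and y_hat the softmax output."""
    total = 0.0
    for logits, label in zip(batch_logits, labels):
        y_hat = softmax(logits)
        # with a one-hot y, only the true-class term of the sum survives
        total -= math.log(y_hat[label])
    return total / len(labels)


print(classification_loss([[0.0, 0.0]], [1]))  # log(2), about 0.6931
```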

Further, the invariant risk minimization loss function is as follows:

$$L_{IRM}^{k} = \frac{1}{b}\sum_{(x_i,y_i),\,(x_j,y_j)} \nabla_{w\mid w=1.0}\,\ell\big(w\cdot\Phi(x_i),\,y_i\big)\cdot\nabla_{w\mid w=1.0}\,\ell\big(w\cdot\Phi(x_j),\,y_j\big)$$

where $L_{IRM}^{k}$ is the invariant risk minimization loss of the $k$-th client, $\nabla$ denotes the gradient operation, $b$ is the number of pairs $(x_i, y_i)$ and $(x_j, y_j)$, $x_i$ and $x_j$ are input data, $y_i$ and $y_j$ are input labels, $\Phi(x_i)$ and $\Phi(x_j)$ are the features output by the feature extraction network $\Phi$ for $x_i$ and $x_j$, $(\Phi(x_i), y_i)$ and $(\Phi(x_j), y_j)$ are the $i$-th and $j$-th groups of features and labels of the $k$-th client, $\ell$ is the classification loss function, and $w$ is a scalar.
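The invariant risk minimization penalty can be sketched in the IRMv1 style, with the gradient with respect to the scalar w at w = 1.0 written out analytically instead of being obtained by automatic differentiation. Assumptions not stated in the patent: ℓ is softmax cross-entropy, Φ(x) yields one score per fault class so that w·Φ(x) can be fed to ℓ directly, and the products of the b pairs are averaged. Using two different samples in each product gives an estimate of the squared gradient that IRMv1 penalizes.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # shift by the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_wrt_scale(logits: Sequence[float], label: int) -> float:
    """d/dw of cross-entropy(w * logits, label), evaluated at w = 1.0.

    For l(w) = -w*z_y + log(sum_c exp(w*z_c)), the derivative is sum_c p_c*z_c - z_y,
    where p = softmax(w*z), i.e. p = softmax(z) at w = 1.0.
    """
    p = softmax(logits)
    return sum(pc * zc for pc, zc in zip(p, logits)) - logits[label]


Sample = Tuple[Sequence[float], int]  # (network output for one sample, class label)


def irm_penalty(pairs: Sequence[Tuple[Sample, Sample]]) -> float:
    """Mean over the b pairs of the product of the two w-gradients at w = 1.0."""
    b = len(pairs)
    total = 0.0
    for (z_i, y_i), (z_j, y_j) in pairs:
        total += grad_wrt_scale(z_i, y_i) * grad_wrt_scale(z_j, y_j)
    return total / b


print(grad_wrt_scale([2.0, 0.0], 1))  # about 1.7616
```

A zero penalty means that rescaling the predictor would not reduce the loss, which is the invariance IRM encourages in every domain.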

Further, the central server receives the classification losses, the invariant risk minimization losses and the feature covariance matrices of all clients; it computes the second-order statistical distance between the feature covariance matrices of every pair of clients and obtains the feature distance metric loss from the feature distance metric loss function.

Further, the feature distance metric loss function is as follows:

$$L_D = \sum_{i=1}^{N}\sum_{j=i+1}^{N}\frac{1}{4d^{2}}\left\lVert C_i - C_j\right\rVert_F^{2}$$

where $L_D$ is the feature distance metric loss, $N$ is the number of clients, $\lVert\cdot\rVert_F$ denotes the Frobenius norm of a matrix, $d$ is the dimension of the feature vector, and $C_i$ and $C_j$ are the feature covariance matrices of any two clients.
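The pairwise covariance distance can be sketched as below, assuming the CORAL-style form: the sum over all unordered client pairs i < j of ||C_i - C_j||_F^2 / (4d^2). The 1/(4d^2) scaling and the summation over unordered pairs are assumptions about the lost equation, based on the quantities the text names (Frobenius norm, feature dimension d, covariance matrices of any two clients).

```python
from itertools import combinations
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def frobenius_sq(a: Matrix, b: Matrix) -> float:
    """Squared Frobenius norm ||a - b||_F^2 of the difference of two equal-sized matrices."""
    return sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def feature_distance_loss(covariances: List[Matrix]) -> float:
    """Sum over all client pairs (i < j) of ||C_i - C_j||_F^2 / (4 d^2)."""
    d = len(covariances[0])  # feature dimension; each C_k is d x d
    return sum(
        frobenius_sq(c_i, c_j) / (4 * d * d)
        for c_i, c_j in combinations(covariances, 2)
    )


eye = [[1.0, 0.0], [0.0, 1.0]]
zero = [[0.0, 0.0], [0.0, 0.0]]
print(feature_distance_loss([eye, zero]))  # 2 / 16 = 0.125
```

The server needs only the covariance matrices sent by the clients, never their features, to evaluate this loss.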

Further, a global loss value is calculated on the central server based on the global loss function, and the global model of the central server is trained by back-propagating the global loss value.

Further, the global loss function is as follows:

$$L = \sum_{k=1}^{N}\alpha_k L_c^{k} + \sum_{k=1}^{N} L_{IRM}^{k} + L_D,\qquad \alpha_k = \frac{l_k}{\sum_{i=1}^{N} l_i}$$

where $L_c^{k}$, $L_{IRM}^{k}$ and $L_D$ are the classification loss, the invariant risk minimization loss and the feature distance metric loss, respectively; $k$ denotes the $k$-th client and is an integer between 1 and $N$; $\alpha_k$ is the classification loss weight of each client; $l_k$ denotes the loss value of the samples of the $k$-th source domain data set; and $N$ is the number of clients.
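A sketch of the loss-proportional client weighting and the global loss. Assumptions: the per-client loss value l_k is taken to be that client's classification loss, and the IRM losses of all clients are summed. Whether the weights are held constant during back-propagation is not stated in the source; here they are plain numbers.

```python
from typing import List, Sequence


def loss_weights(client_losses: Sequence[float]) -> List[float]:
    """alpha_k = l_k / sum_i l_i, so clients with larger loss get larger weights."""
    total = sum(client_losses)
    return [l / total for l in client_losses]


def global_loss(cls_losses: Sequence[float], irm_losses: Sequence[float], distance_loss: float) -> float:
    """Weighted classification losses + summed IRM losses + feature distance loss."""
    alphas = loss_weights(cls_losses)
    weighted_cls = sum(a * l for a, l in zip(alphas, cls_losses))
    return weighted_cls + sum(irm_losses) + distance_loss


print(loss_weights([0.2, 0.6]))  # about [0.25, 0.75]
print(global_loss([0.2, 0.6], [0.1, 0.1], 0.125))  # about 0.825
```

With losses 0.2 and 0.6, the worse client receives three times the weight of the better one, so its errors dominate the classification term.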

A second aspect of the disclosure provides a mechanical fault diagnosis system based on federated domain generalization, comprising:

a central server and clients; the central server comprises a global feature extraction network and a global classification network, exchanges information with multiple clients simultaneously, and is also used to initialize the global model.

Further, each client comprises a feature extraction network and a classification network; N clients hold N source domain data sets, and the (N+1)-th client holds the target domain data set.

The above embodiments of the invention have the following beneficial effects:

By adopting federated learning, the method does not need to disclose source domain data to untrusted third parties, which protects the privacy of the source domain data and ensures data security, while a global fault diagnosis model is trained on the source domain data of all clients. Compared with other domain adaptation methods, the federated domain generalization fault diagnosis method of the invention improves both data security and fault diagnosis accuracy, improves the interpretability of the machine learning method, and fundamentally addresses the problem of cross-domain fault diagnosis.

The method jointly considers the inherent relationship between data features and labels, the spatial distance between client data features, and transfer among different client models. It takes the intrinsic causal relationship between the fault features and fault types of each client as the training objective and trains under a federated learning framework to obtain a fault diagnosis model with strong generalization capability.

The method does not share fault data or features. It reduces the differences between data from different domains by training and transferring part of the model parameters of each client, and adopts a model transfer strategy in the feature extraction layers of the client model, reducing the training workload without affecting the generalization capability of the model.

The method quantifies the differences in feature-space distance between different source domains and achieves domain generalization by weighting the model losses of different clients.

Drawings

The accompanying drawings, which constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention without limiting it.

FIG. 1 is a diagram of the structural relationship between the central server and the clients in conventional federated learning;

FIG. 2 is a framework diagram of the federated domain generalization method of the present invention.

Detailed Description

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.

It should also be noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit example embodiments of the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.

Embodiment 1:

Let N clients hold N source domain data sets $D_S^{1}, D_S^{2}, \dots, D_S^{N}$, and let the (N+1)-th client hold the target domain data set $D_T = \{x_T^{m}\}_{m=1}^{n_T}$, where $n_T$ is the total number of samples in the target domain data set, $x_T^{m}$ denotes the $m$-th sample of the target domain data set, and $m$ is an integer from 1 to $n_T$. $D_S^{k}$ denotes the samples of the $k$-th source domain data set, where $k$ is an integer between 1 and N. The target domain data set does not participate in model training and is used only for testing; only the source domain data participate in training.

Each client has a feature extraction network and a classification network. Let the set of the N feature extraction network models be $\{\Phi_1, \Phi_2, \dots, \Phi_N\}$ and the set of the N classification network models be $\{h_1, h_2, \dots, h_N\}$. Let the global feature extraction network model in the central server be $\Phi_G$ and the global classification network model be $h_G$.

Federated learning emerged to make effective aggregate use of data while ensuring the data security of different clients, solving the data-island problem in deep learning. The architecture of the clients and the central server in conventional federated learning is shown in FIG. 1. In federated learning, multiple participating devices (i.e., clients) jointly solve the learning task under the coordination of a central server.

Existing federated transfer learning fault diagnosis methods consider data security and the domain shift between the source and target domains. However, they assume that target domain data exist and participate in training, and do not consider the case in which target domain data are unavailable or the need to train personalized models. To address these problems, the invention trains a personalized fault diagnosis model for each client based on the differences among the source domain data. As shown in FIG. 2, it exploits the inherent relationship between the labels and features of the source domain data and completes the training of the global model in the central server by aggregating the loss gradients of the weighted training losses and the model parameters of the different client models. To ensure data security, neither fault data nor fault features are shared among clients or between the clients and the central server. In addition, the method aggregates loss gradients in the central server using partial model parameters and the weighted losses of different clients, thereby training the model and overcoming the limitations that target domain data are unavailable and that models are not trained in a personalized way.

The first embodiment of the disclosure provides a mechanical fault diagnosis method based on federated domain generalization. In the training phase, the central server first sends a randomly initialized global model to all clients. Each client independently trains the model on its own training data set; all client models are then sent to the central server, which averages all model parameters to obtain a global model and sends it back to the clients. The clients send their losses to the central server, which trains the global model based on these loss values. Finally, the trained global model is sent to the client holding the target domain data, and the fault data to be tested are input into the trained global model for fault diagnosis. The method specifically comprises the following steps:

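First, the central server randomly initializes the global feature extraction network model $\Phi_G$ and the global classification network model $h_G$ and sends them to each client.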

Second, each client takes the global models $\Phi_G$ and $h_G$ as its initial model and trains the model parameters on its own data set $D_S^{k}$ to obtain the client models $\Phi_k$ and $h_k$, where $D_S^{k}$ denotes the samples of the $k$-th source domain data set and $k$ is an integer between 1 and N.

Third, the sets of all client feature extraction models $\{\Phi_k\}_{k=1}^{N}$ and classification models $\{h_k\}_{k=1}^{N}$ are sent to the central server, which averages all model parameters to obtain the global models $\Phi_G$ and $h_G$, where $\Phi_G$ is the global feature extraction network model and $h_G$ is the global classification network model of the central server. The specific structures of the feature extraction network and the classification network are shown in the following table:

Table 1 Network architecture

As shown in the table above, the feature extraction network consists of three groups connected in series, each comprising a one-dimensional convolutional layer, a batch normalization layer, a rectified linear unit layer and a one-dimensional max pooling layer. Each of the three one-dimensional convolutional layers has 128 convolution kernels, with kernel sizes of 17, 17 and 3, respectively, and a convolution stride of 1; the batch normalization layers have no parameters; the parameters of the three rectified linear unit layers are all 0.2; and the parameters of the three one-dimensional max pooling layers are 16, 16 and 2, respectively.

The classification network consists of a fully connected layer, a batch normalization layer, a rectified linear unit layer, a Dropout layer and a Softmax layer. The parameter of the fully connected layer is 512, the parameter of the rectified linear unit layer is 0.2, the random zeroing (dropout) rate is 0.3, and the parameter of the Softmax layer is the number of fault categories.
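To make Table 1 concrete, the following sketch traces the length of a 4096-point sample (the sample length used in the experiments below) through the three convolution and pooling groups. It assumes no padding and pooling windows whose stride equals their size, the usual default in deep learning frameworks; neither is stated in the patent, and "same" padding would give a different result.

```python
from typing import Tuple


def conv1d_len(length: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length of a 1-D convolution (no dilation)."""
    return (length + 2 * padding - kernel) // stride + 1


def maxpool1d_len(length: int, pool: int) -> int:
    """Output length of 1-D max pooling whose stride equals the pool size."""
    return (length - pool) // pool + 1


def feature_extractor_output(length: int = 4096, channels: int = 128) -> Tuple[int, int, int]:
    """Return (channels, length, flattened size) after the three groups of Table 1."""
    # (kernel size, pool size) of each conv/BN/ReLU/pool group
    for kernel, pool in [(17, 16), (17, 16), (3, 2)]:
        length = conv1d_len(length, kernel)  # stride 1, no padding assumed
        length = maxpool1d_len(length, pool)
    return channels, length, channels * length


print(feature_extractor_output())  # (128, 6, 768)
```

Under these assumptions the lengths run 4096 → 4080 → 255 → 239 → 14 → 12 → 6, so the 512-unit fully connected layer of the classification network would receive a 768-element vector.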

Fourth, the central server sends the global models $\Phi_G$ and $h_G$ to all clients, and each client performs the following tasks:

Calculating the classification loss $L_c^{k} = -\sum y_k \log \hat{y}_k$, where $x_k$ is the training data of the $k$-th client, $y_k$ is the true label of the $k$-th client's training data set, and $\hat{y}_k$ is the prediction result, and sending the classification loss $L_c^{k}$ to the central server.

Obtaining the output features of the client's feature extraction network and sending their covariance matrix $C_k$ to the central server.

Calculating the invariant risk minimization (IRM) loss. Invariant risk minimization assumes that data distributions differ across domains but the causal relationship between data features and labels is invariant: it does not change as operating conditions or environments change. The goal of invariant risk minimization is to find the latent invariance shared by different domains. The invariant risk minimization loss function is as follows:

$$L_{IRM}^{k} = \frac{1}{b}\sum_{(x_i,y_i),\,(x_j,y_j)} \nabla_{w\mid w=1.0}\,\ell\big(w\cdot\Phi(x_i),\,y_i\big)\cdot\nabla_{w\mid w=1.0}\,\ell\big(w\cdot\Phi(x_j),\,y_j\big)$$

where $L_{IRM}^{k}$ is the invariant risk minimization loss of the $k$-th client, $\nabla$ denotes the gradient operation, $b$ is the number of pairs $(x_i, y_i)$ and $(x_j, y_j)$, $x_i$ and $x_j$ are input data, $y_i$ and $y_j$ are input labels, $\Phi(x_i)$ and $\Phi(x_j)$ are the features output by the feature extraction network $\Phi$ for $x_i$ and $x_j$, $(\Phi(x_i), y_i)$ and $(\Phi(x_j), y_j)$ are the $i$-th and $j$-th groups of features and labels of the $k$-th client, $\ell$ is the classification loss function, and $w$ is a scalar.

Fifth, the central server receives the losses $L_c^{k}$ and $L_{IRM}^{k}$ and the feature covariance matrices $C_k$ of all clients. It computes the second-order statistical distance between the feature covariance matrices of every pair of clients, giving the following feature distance metric loss function:

$$L_D = \sum_{i=1}^{N}\sum_{j=i+1}^{N}\frac{1}{4d^{2}}\left\lVert C_i - C_j\right\rVert_F^{2}$$

where $L_D$ is the feature distance metric loss, $d$ is the dimension of the feature vector, $N$ is the number of clients, $\lVert\cdot\rVert_F$ denotes the Frobenius norm of a matrix, $C_k$ is the feature covariance matrix of the $k$-th client, and $C_i$ and $C_j$ are the feature covariance matrices of any two clients.

Sixth, a global loss value is calculated on the central server with the following global loss function:

$$L = \sum_{k=1}^{N}\alpha_k L_c^{k} + \sum_{k=1}^{N} L_{IRM}^{k} + L_D,\qquad \alpha_k = \frac{l_k}{\sum_{i=1}^{N} l_i}$$

where $L_c^{k}$, $L_{IRM}^{k}$ and $L_D$ are the classification loss, the invariant risk minimization loss and the feature distance metric loss, respectively; $k$ denotes the $k$-th client and is an integer between 1 and $N$; $\alpha_k$ is the classification loss weight of each client; and $l_k$ denotes the loss value of the samples of the $k$-th source domain data set. Through this weighting, clients with poor classification performance (large loss values) contribute a larger share of the total loss, so the worst-performing models are given more consideration when training the global model.

Seventh, the fault diagnosis model of the central server is trained by back-propagating the global loss value. Because the low-level feature extraction network extracts general features of the fault data, its parameters differ little across clients. To reduce the training burden, a model transfer strategy is therefore adopted: the low-level network parameters of the global model are frozen and only the parameters of the high-level network are trained, which reduces the training workload without affecting the generalization capability of the model.

The model transfer strategy means that, while the clients and the server cooperatively train the model, the features extracted by the low-level parameters of the feature extraction layers are general features; therefore, when the central server trains the model, the low-level network parameters can be frozen and only the parameters of the higher-level network are trained.
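The freezing step of the model transfer strategy can be sketched as a plain gradient-descent update that skips frozen parameters. Which layers count as "low-level" is not specified in the patent; the parameter names below are hypothetical, and the example freezes only the first feature extraction group.

```python
from typing import Dict, List, Tuple

Params = Dict[str, List[float]]


def sgd_step(params: Params, grads: Params, lr: float, frozen_prefixes: Tuple[str, ...]) -> Params:
    """One gradient-descent step that leaves frozen (low-level) parameters unchanged."""
    updated: Params = {}
    for name, values in params.items():
        if name.startswith(frozen_prefixes):
            updated[name] = list(values)  # frozen: copied unchanged, no gradient applied
        else:
            updated[name] = [v - lr * g for v, g in zip(values, grads[name])]
    return updated


params = {"feature.block1.weight": [1.0], "feature.block3.weight": [1.0], "classifier.fc.weight": [1.0]}
grads = {name: [0.5] for name in params}
new_params = sgd_step(params, grads, lr=0.1, frozen_prefixes=("feature.block1.",))
print(new_params)  # block1 stays at 1.0; the other two move to about 0.95
```

Skipping frozen parameters also means their gradients need not be computed or aggregated, which is where the saving in training workload comes from.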

Finally, the trained global model is sent to the client holding the target domain data, and the fault data to be tested are input to perform fault diagnosis.

Experimental verification:

The following experiments verify the fault diagnosis accuracy and data security of the fault diagnosis model obtained by the invention.

1) Data set: the bearing fault data set was provided by Case Western Reserve University (CWRU). In this data set, the bearings have three different fault diameters: 7, 14 and 21 mils. The label information is shown in Table 2 and the operating condition information in Table 3. The sample length is 4096. The training and testing protocols of the invention are shown in Table 4.

Case 1: the first experiment uses the data sets numbered 0 and 1 in Table 3 as source domain 1 and source domain 2, respectively, and the data set numbered 3 as the target domain.

Case 2: the second experiment uses the data sets numbered 1 and 2 in Table 3 as source domain 1 and source domain 2, respectively, and the data set numbered 3 as the target domain.
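The two experimental cases can be written down as a small configuration. The client numbering follows the system description (clients 1 to N hold the source domains, client N+1 holds the target domain); the dictionary layout and the function name are hypothetical.

```python
from typing import Dict

# Operating-condition numbers refer to Table 3.
CASES = {
    "Case 1": {"sources": [0, 1], "target": 3},
    "Case 2": {"sources": [1, 2], "target": 3},
}


def build_clients(case: str) -> Dict[int, int]:
    """Map client index -> condition number: clients 1..N are sources, client N+1 is the target."""
    spec = CASES[case]
    if spec["target"] in spec["sources"]:
        raise ValueError("the target domain must not be used for training")
    clients = {k + 1: cond for k, cond in enumerate(spec["sources"])}
    clients[len(spec["sources"]) + 1] = spec["target"]
    return clients


print(build_clients("Case 1"))  # {1: 0, 2: 1, 3: 3}
```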

Table 2 Label information of the CWRU fault data set

Table 3 CWRU equipment operating conditions

Table 4 Source and target domain information in the CWRU experiments

2) Experimental results: the results are shown in Table 5. Table 5 shows that, compared with the method in the paper [Conditional Adversarial Domain Generalization With a Single Discriminator for Bearing Fault Diagnosis], the proposed method improves the fault diagnosis accuracy on the target domain while ensuring the data security of the source domains.

Table 5 Experimental results

Embodiment 2:

The second embodiment of the disclosure provides a mechanical fault diagnosis system based on federated domain generalization, comprising:

a central server used to initialize the global model, which comprises a global feature extraction network and a global classification network and exchanges information with multiple clients simultaneously; each client comprises a feature extraction network and a classification network, N clients hold N source domain data sets, and the (N+1)-th client holds the target domain data set.

The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A mechanical fault diagnosis method based on federated domain generalization, characterized by comprising the following steps:

2. The mechanical fault diagnosis method based on federated domain generalization according to claim 1, characterized in that

the central server sends the global model to all clients, and the clients perform the following tasks:

3. The mechanical fault diagnosis method based on federated domain generalization according to claim 2, characterized in that

the classification loss is $L_c^{k}$, and the classification loss function is:

$$L_c^{k} = -\sum y_k \log \hat{y}_k$$

where $x_k$ is the training data of the $k$-th client, $y_k$ is the true label of the $k$-th client's training data set, and $\hat{y}_k$ is the prediction result.

4. The mechanical fault diagnosis method based on federated domain generalization according to claim 3, characterized in that

the invariant risk minimization loss function is as follows:

$$L_{IRM}^{k} = \frac{1}{b}\sum_{(x_i,y_i),\,(x_j,y_j)} \nabla_{w\mid w=1.0}\,\ell\big(w\cdot\Phi(x_i),\,y_i\big)\cdot\nabla_{w\mid w=1.0}\,\ell\big(w\cdot\Phi(x_j),\,y_j\big)$$

where $L_{IRM}^{k}$ is the invariant risk minimization loss of the $k$-th client, $\nabla$ denotes the gradient operation, $b$ is the number of pairs $(x_i, y_i)$ and $(x_j, y_j)$, $x_i$ and $x_j$ are input data, $y_i$ and $y_j$ are input labels, $\Phi(x_i)$ and $\Phi(x_j)$ are the features output by the feature extraction network $\Phi$ for $x_i$ and $x_j$, $(\Phi(x_i), y_i)$ and $(\Phi(x_j), y_j)$ are the $i$-th and $j$-th groups of features and labels of the $k$-th client, $\ell$ is the classification loss function, and $w$ is a scalar.

5. The mechanical fault diagnosis method based on federated domain generalization according to claim 4, characterized in that the central server receives the classification losses, the invariant risk minimization losses and the feature covariance matrices of all clients, computes the second-order statistical distance between the feature covariance matrices of every pair of clients, and obtains the feature distance metric loss from the feature distance metric loss function.

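6. The mechanical fault diagnosis method based on federated domain generalization according to claim 5, characterized in that the feature distance metric loss function is as follows:

$$L_D = \sum_{i=1}^{N}\sum_{j=i+1}^{N}\frac{1}{4d^{2}}\left\lVert C_i - C_j\right\rVert_F^{2}$$

where $L_D$ is the feature distance metric loss, $d$ is the dimension of the feature vector, $N$ is the number of clients, $\lVert\cdot\rVert_F$ denotes the Frobenius norm of a matrix, $C_k$ is the feature covariance matrix of the $k$-th client, and $C_i$ and $C_j$ are the feature covariance matrices of any two clients.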

7. The mechanical fault diagnosis method based on federated domain generalization according to claim 6, characterized in that a global loss value is calculated on the central server based on a global loss function, and the global model of the central server is trained by back-propagating the global loss value.

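8. The mechanical fault diagnosis method based on federated domain generalization according to claim 7, characterized in that the global loss function is as follows:

$$L = \sum_{k=1}^{N}\alpha_k L_c^{k} + \sum_{k=1}^{N} L_{IRM}^{k} + L_D,\qquad \alpha_k = \frac{l_k}{\sum_{i=1}^{N} l_i}$$

where $L_c^{k}$, $L_{IRM}^{k}$ and $L_D$ are the classification loss, the invariant risk minimization loss and the feature distance metric loss, respectively; $k$ denotes the $k$-th client and is an integer between 1 and $N$; $\alpha_k$ is the classification loss weight of each client; $l_k$ denotes the loss value of the samples of the $k$-th source domain data set; and $N$ is the number of clients.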

9. A mechanical fault diagnosis system based on federated domain generalization, characterized by comprising:

a central server and clients; the central server comprises a global feature extraction network and a global classification network, exchanges information with multiple clients simultaneously, and is also used to initialize the global model; the feature extraction network consists of three groups connected in series, each comprising a one-dimensional convolutional layer, a batch normalization layer, a rectified linear unit layer and a one-dimensional max pooling layer; the classification network consists of a fully connected layer, a batch normalization layer, a rectified linear unit layer, a Dropout layer and a Softmax layer.

10. The system according to claim 9, characterized in that each client comprises a feature extraction network and a classification network, N clients hold N source domain data sets, and the (N+1)-th client holds the target domain data set; the feature extraction network consists of three groups connected in series, each comprising a one-dimensional convolutional layer, a batch normalization layer, a rectified linear unit layer and a one-dimensional max pooling layer; the classification network consists of a fully connected layer, a batch normalization layer, a rectified linear unit layer, a Dropout layer and a Softmax layer.